
Article

A Two-Stage Feature Selection Approach Based on Artificial Bee Colony and Adaptive LASSO in High-Dimensional Data
Efe Precious Onakpojeruo 1,2, * and Nuriye Sancar 3, *

1 Operational Research Center in Healthcare, Near East University, TRNC Mersin 10, Nicosia 99138, Turkey
2 Department of Biomedical Engineering, Near East University, TRNC Mersin 10, Nicosia 99138, Turkey
3 Department of Mathematics, Near East University, TRNC Mersin 10, Nicosia 99138, Turkey
* Correspondence: [email protected] (E.P.O.); [email protected] (N.S.)

Abstract: High-dimensional datasets, where the number of features far exceeds the number of
observations, present significant challenges in feature selection and model performance. This study
proposes a novel two-stage feature-selection approach that integrates Artificial Bee Colony (ABC)
optimization with Adaptive Least Absolute Shrinkage and Selection Operator (AD_LASSO). The
initial stage reduces dimensionality while effectively dealing with complex, high-dimensional search
spaces by using ABC to conduct a global search for the ideal subset of features. The second stage
applies AD_LASSO, refining the selected features by eliminating redundant features and enhancing
model interpretability. The proposed ABC-ADLASSO method was compared with the AD_LASSO,
LASSO, stepwise, and LARS methods under different simulation settings in high-dimensional data
and various real datasets. According to the results obtained from simulations and applications on
various real datasets, ABC-ADLASSO has shown significantly superior performance in terms of
accuracy, precision, and overall model performance, particularly in scenarios with high correlation
and a large number of features compared to the other methods evaluated. This two-stage approach
offers robust feature selection and improves predictive accuracy, making it an effective tool for
analyzing high-dimensional data.

Keywords: feature selection; artificial bee colony; adaptive LASSO; high-dimensional data

Citation: Onakpojeruo, E.P.; Sancar, N. A Two-Stage Feature Selection Approach Based on Artificial Bee Colony and Adaptive LASSO in High-Dimensional Data. AppliedMath 2024, 4, 1522–1538. https://doi.org/10.3390/appliedmath4040081

Academic Editor: Yu Chen
Received: 28 October 2024; Revised: 5 December 2024; Accepted: 10 December 2024; Published: 12 December 2024

Copyright: © 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

1. Introduction

High-dimensional data are data in which the number of features (p) is significantly higher than the number of observations (n), i.e., p >> n [1]. High-dimensional datasets present challenges in the feature-selection process due to the increased complexity resulting from the large number of features, multicollinearity, and the presence of irrelevant or redundant features. This situation presents researchers with additional difficulties in interpreting and evaluating these data. In high-dimensional data, redundant features can reduce model performance, increase computational time, and cause overfitting problems in the modeling process. These challenges require advanced methods that can effectively identify the most relevant predictors while maintaining computational efficiency. The increasing use of high-dimensional data in various fields, such as genetics and bioinformatics, has made effective feature-selection methods rapidly gain importance. Existing approaches, such as the High Dimensional Selection with Interactions (HDSI) algorithm, which integrates bootstrapping and random subspace sampling with classical statistical techniques, represent a significant advancement in addressing feature-selection challenges in high-dimensional data [2]. Furthermore, high-dimensional analysis of semidefinite relaxations for sparse principal component analysis highlights trade-offs between statistical and computational efficiency [3]. On the other hand, the Greedy Anytime Algorithm for sparse PCA provides an efficient solution for high-dimensional sparse PCA problems [4].
For a given model, feature selection can be formulated as an optimization problem.

Given a dataset X = {x_1, x_2, . . . , x_p}, where x_i represents the i-th feature, the aim of feature selection is to identify a subset of features X_S ⊂ X that maximizes the model's performance according to a certain evaluation criterion. Feature selection is the process of reducing the number of input features by eliminating the least significant or redundant ones, hence enhancing model interpretability and decreasing computational effort [5]. Depending on how the features are used during selection, feature-selection methods can be broadly divided into three categories: filter, wrapper, and embedded [6,7]. Filter methods use statistical measures to rank each feature independently of the learning algorithm, which is simple and fast but ignores interactions between features. Wrapper methods apply a machine learning model to score candidate feature subsets, which results in a more precise selection of features but comes at the cost of greater computation time [6–8]. Embedded methods incorporate feature selection into the model-fitting process itself, for instance, LASSO or Elastic Net, balancing accuracy against the complexity of the model and the number of samples. Despite the strengths of existing selection methods, the high-
dimensional data complexity still requires hybrid approaches that leverage the advantages
of multiple feature-selection techniques [9].
There are many methods available for optimizing the feature-selection process, and
most of them have their advantages as well as disadvantages. Of the swarm intelligence
algorithms, nature-inspired Artificial Bee Colony (ABC) has emerged as a powerful opti-
mization algorithm [10,11]. ABC was originally designed based on the foraging pattern of
honeybees. ABC mimics the bee colony to search for the optimal solution to hundreds of
optimization problems [12,13]. When applied to feature selection, it seeks the best subsets
of features by balancing exploitation and exploration. The algorithm stands out for its simplicity and its ability to avoid becoming trapped in local optima. Compared with conventional approaches, which often fail in high-dimensional data because of computational issues, ABC is more flexible to apply [12,14,15]. This study
proposes the development of a two-stage feature-selection approach involving ABC and
Adaptive Least Absolute Shrinkage and Selection Operator (AD_LASSO) to increase model
accuracy. The motivation for using ABC in this framework stems from its demonstrated
effectiveness in global optimization tasks and its capability to handle the intricacies of
high-dimensional data.

2. Related Studies
Feature selection plays an important role in machine learning, especially in dealing
with high-dimensional datasets where the number of features is higher than the number
of observations. Different feature-selection techniques have been proposed in the existing
literature, broadly categorized into filter methods, wrapper methods, embedded methods,
and a combination of these methods, known as hybrid methods. In this section, the authors
review some studies and advances dedicated to the hybrid feature-selection techniques, as
well as present the potential of ABC relative to other types of metaheuristic approaches.

2.1. Filtering Method, Wrapper Technique, and Embedded Algorithm


Among the simplest selection techniques are filter methods, where each feature is examined independently of the learning algorithm through criteria such as mutual information, correlation, or statistical tests [16]. While these methods are computationally efficient, they do not consider interactions among features and, hence, the resulting feature subsets can be far from ideal. Wrapper methods, on the other hand, evaluate candidate feature subsets with a machine learning model, selecting features based on their contribution to the model's performance [17]. Although wrapper methods
offer better accuracy, they are time-consuming and computationally expensive, particularly
when the data is high-dimensional; this is because, in a wrapper, all or several features are
tested by training a model on a sample of the given data [6,7]. Embedded approaches such
as the LASSO, Adaptive LASSO, and Elastic Net integrate feature selection directly into the
model training process, where variables are selected as part of the model building [18]. The
LASSO method adds an L1-norm penalty to the objective function, shrinking the coefficients of features that are not useful for prediction to exactly zero and thereby performing an automatic selection of features [19,20]. Because of this, LASSO is especially helpful when the data
contain many unimportant factors. However, LASSO often chooses one predictor from a
set of correlated features, which may not always be desirable in scenarios where predictors
are highly correlated. Adaptive LASSO considers LASSO’s strengths and applies better
penalty strategies for feature selection. It overcomes LASSO’s limitations by offering better
feature selection, less bias, and better consistency in high-dimensional data [21,22].

2.2. Metaheuristic-Based Feature Selection


Metaheuristic algorithms, such as ABC, genetic algorithm (GA), Ant Colony Optimiza-
tion (ACO), and particle swarm optimization (PSO), have been mostly used for feature
selection in recent years due to their efficiency in searching large solution spaces. These
algorithms mimic natural processes to search for promising feature subsets without evaluating all possible combinations, which is useful in high-dimensional data
problems. ABC has received more attention lately in the literature as an alternative to
GA and PSO. ABC, first developed by Karaboga in 2005 [12], seeks to mimic the honey
bees' intelligent foraging pattern. The colony comprises employed (worker) bees that exploit particular regions of the solution space, onlooker bees, and scout bees that discover new regions. Employed bees search promising areas, onlookers decide which of those areas deserve further effort, and scouts introduce new areas to work on. This enables ABC to balance exploration and exploitation as effectively as crossover and mutation do in GA. Since feature selection is one of the most crucial stages of a pattern-recognition pipeline, the efficiency of ABC for this step has been verified experimentally in several studies. For instance, ref. [23] applied ABC to large-scale gene expression profiles and revealed
that ABC could achieve a considerable decrease in dimensionality while achieving high
performance. Also, ref. [24] applied ABC to consider the features in a medical diagnosis
system, which showed better results than GA and Simulated Annealing. Studies that
have compared these algorithms suggest that ABC yields better-performing models thanks to its stronger exploration of large, multifactorial search spaces. For instance, refs. [25,26] used ABC to select features and reported that it achieved a better global search owing to its resistance to premature convergence.

2.3. Hybrid Feature-Selection Approaches


Due to the shortcomings of single feature-selection strategies, there have been attempts to integrate several methods into a unified framework. Hybrid methods mostly combine filter or wrapper methods with metaheuristic algorithms, with the aim of increasing computational efficiency and predictive accuracy. ABC has inspired several such approaches, and combinations of ABC with other optimization techniques have attracted particular attention. Refs. [27,28] proposed an integrated ABC approach, incorporating Tabu Search
to improve both local and global searching. The results of this experiment showed that the
proposed randomized hybrid method converged faster and provided better solutions as
compared to more conventional methods of feature selection for bioinformatics problems.
Similarly, ref. [29] combined ABC with a support vector machine for feature selection in high dimensions and demonstrated good classification performance at relatively low computational cost. ABC has thus become an effective alternative tool for feature-selection problems, as employed in studies [12–15]. Its biologically inspired balance of exploration and exploitation has been successfully used to find optimal feature subsets in areas ranging from medical diagnosis to text classification. In this study, we propose an ABC-based method because of its well-developed global search ability and its greater robustness, compared with other metaheuristic optimization methods, against becoming trapped in local optima. The primary
aim of this study is to develop a two-stage feature-selection technique using the ABC
optimization method alongside AD_LASSO in a high-dimensional dataset. This hybrid
framework seeks to achieve maximum feature-selection performance while at the same
time minimizing model complexity.

3. Materials and Methods


3.1. Linear Regression Model
Linear regression describes the association between a dependent feature (y) and one
or more independent features (predictors). The goal is to predict the values of y using the
predictors by estimating the coefficients β in the following linear equation:

y = Xβ + ϵ (1)

where X ∈ R^{n×p} is the data (design) matrix of independent features (predictors). The i-th row of the data matrix X is the vector x_i = (x_{i1}, . . . , x_{ip}). y ∈ R^n is the vector of observed values of the dependent feature, with n the number of observations and p the number of independent features. β = (β_1, . . . , β_p) ∈ R^p is the vector of unknown coefficients (parameters to be estimated), and ϵ ∈ R^n is the error term, assumed to be normally distributed with a mean of zero and constant variance (i.e., homoscedastic), i.e., E(ϵ) = 0 and ϵ ∼ N(0, σ²I_n), where σ² is the variance.

Suppose that β̂ is an estimator of β. Then, the residuals are defined as r_i = r_i(β̂) = y_i − X_i β̂, where ŷ = Xβ̂. Because the residuals measure the error of the model fit, they should be kept small. The estimate β̂ is obtained by minimizing the sum of squared residuals:

β̂ = argmin_β ∥y − Xβ∥₂² = argmin_β ∑_{i=1}^{n} (y_i − X_i β)²    (2)

where ∥y − Xβ∥₂ represents the L2 norm (Euclidean norm) of the residual vector y − Xβ.
This minimization problem has a closed-form solution which is called an ordinary least
square estimator (OLS), defined as

β̂ = (X^T X)^{−1} X^T y    (3)

Like most statistical models, computation of the OLS estimator depends on certain assumptions [30]; in particular, the matrix X^T X must have full rank, i.e., rank(X) = p. When p >> n, the matrix X^T X becomes singular and non-invertible, making the OLS solution undefined. Overfitting and high correlation among independent features are further challenges of high-dimensional data. Stepwise regression [31] and LARS (Least Angle Regression) [32] are widely used methods for feature selection in high-dimensional data; they help manage model complexity while selecting important features. Regularization methods in regression, such as LASSO and Adaptive LASSO, apply penalties on the size of the coefficients to avoid overfitting and enhance model performance in high-dimensional data by shrinking less relevant feature coefficients toward zero.
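As a brief illustration of this rank problem, the following minimal R sketch (with arbitrary dimensions) shows that X^T X is rank-deficient when p > n, so the closed-form solution in Equation (3) cannot be computed:

```r
# Why OLS breaks down when p >> n: t(X) %*% X is rank-deficient, so Equation (3) has no unique solution.
set.seed(1)
n <- 50; p <- 60                       # more features than observations
X <- matrix(rnorm(n * p), nrow = n)
y <- drop(X %*% c(rep(1.5, 10), rep(0, p - 10)) + rnorm(n, sd = 1.5))

XtX <- crossprod(X)                    # p x p matrix t(X) %*% X
qr(XtX)$rank                           # at most n = 50 < p = 60, so XtX is singular
# solve(XtX, crossprod(X, y))          # would fail: the system is (computationally) singular
```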

3.2. Stepwise Regression


Stepwise regression [31] is a feature-selection method that iteratively adds or removes features based on a selection criterion; here, the extended BIC (ExBIC) is used as the criterion for high-dimensional data.
The process can follow a forward selection approach, which starts with no features and
adds the most significant features step-by-step, or a backward elimination approach, which
starts with all features and removes the least significant ones. Alternatively, a combination
of both, called stepwise selection, evaluates adding and removing variables at each step.
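A minimal forward stepwise sketch in base R is shown below; it uses step() with a BIC-type penalty (k = log(n)), since base step() does not implement ExBIC (the extra 2·γ·log(p) term would have to be added manually), and the simulated data are only illustrative:

```r
# Forward stepwise selection sketch with base R step() and a BIC-type penalty (k = log(n)).
set.seed(1)
n <- 50; p <- 60
X <- matrix(rnorm(n * p), nrow = n)
y <- drop(X %*% c(rep(1.5, 10), rep(0, p - 10)) + rnorm(n, sd = 1.5))
df <- data.frame(y = y, X)

null_fit <- lm(y ~ 1, data = df)
scope    <- as.formula(paste("~", paste(names(df)[-1], collapse = " + ")))
fwd <- step(null_fit, scope = scope, direction = "forward", k = log(n), trace = 0)
names(coef(fwd))[-1]      # features added by forward selection
```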
3.3. Least Angle Regression (LARS)


The Least Angle Regression (LARS) method is a less greedy variant of forward stepwise selection that is closely connected to the LASSO approach used in regression for feature selection [32]. LARS is similar to forward stepwise
regression. At each step, it identifies the feature most correlated with the target. When
multiple features have equal correlation with the target, instead of continuing along the
same feature, it proceeds in an equiangular direction between the features. The LARS
method follows the following steps:
1. Start with β̂ = 0 and r = y where r is residual.
2. Identify the feature X that has the highest correlation with the residual (or equivalently,
the feature that forms the least angle with the residual).
3. Continue following the highly correlated predictor until the residual and another
feature x have an equal correlation.
4. Move in a direction equiangular to both the features.
5. Repeat steps until all the features are included in the model.
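A short sketch of the LARS path using the lars R package is given below (the package is assumed to be installed and the simulated data are illustrative only):

```r
# Sketch of the LARS path using the 'lars' package.
library(lars)
set.seed(1)
n <- 50; p <- 100
X <- matrix(rnorm(n * p), nrow = n)
y <- drop(X %*% c(rep(1.5, 10), rep(0, p - 10)) + rnorm(n, sd = 1.5))

fit <- lars(X, y, type = "lar")        # full Least Angle Regression path
coef_path <- coef(fit)                 # one row of coefficients per step of the path
round(coef_path[1:6, 1:8], 3)          # first few steps and features: one feature enters per step
```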

3.4. Regularization Methods


Regularization methods such as LASSO and Adaptive LASSO are frequently used in regression analysis; they help overcome overfitting by adding penalty terms to the loss function, thus improving model generalization on high-dimensional data. Equation (4) shows the general mechanics of regularization:

L_total(β; ·) = L(β) + ϕ(β; ·)    (4)

where L(β) = ∥y − Xβ∥₂² is the loss function of the linear model and ϕ(β; ·) is the regularization penalty.

3.4.1. Least Absolute Shrinkage and Selection Operator (LASSO)


LASSO (Least Absolute Shrinkage and Selection Operator) is one of the regularization
methods that adds an L1 penalty (absolute norm) to the loss function [20]. The optimization
problem can then be formulated for some t > 0 as

min_β ∥y − Xβ∥₂²,  subject to ∥β∥₁ ≤ t    (5)

where ∥β∥₁ = ∑_{j=1}^{p} |β_j| is the L1 norm of the coefficient vector. By solving this minimization problem as an unconstrained problem by incorporating a Lagrange multiplier λ > 0, the LASSO estimator is given by

β̂_LASSO = argmin_β ∥y − Xβ∥₂² + λ∥β∥₁    (6)

where λ is a tuning parameter that controls the shrinkage of the LASSO coefficient with
λ ≥ 0. LASSO stands out due to its ability to perform feature selection by shrinking some
coefficients to exactly zero [33]. Because of this, LASSO is especially helpful when the data
contains many unimportant factors. However, LASSO often chooses one predictor from a
set of correlated features, which may not always be desirable in scenarios where predictors
are highly correlated.
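As an illustration of Equation (6), the following sketch fits LASSO with the glmnet package on simulated data and reports which coefficients are shrunk to exactly zero; the data-generation settings are arbitrary and chosen only for demonstration:

```r
# LASSO sketch with glmnet (alpha = 1 gives the L1 penalty of Equation (6));
# lambda is tuned by cross-validation.
library(glmnet)
set.seed(1)
n <- 50; p <- 100
X <- matrix(rnorm(n * p), nrow = n)
y <- drop(X %*% c(rep(1.5, 10), rep(0, p - 10)) + rnorm(n, sd = 1.5))

cv_lasso <- cv.glmnet(X, y, alpha = 1)             # cross-validated choice of lambda
b_lasso  <- as.matrix(coef(cv_lasso, s = "lambda.min"))
which(b_lasso[-1, 1] != 0)                         # features whose coefficients are not shrunk to zero
```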

3.4.2. Adaptive LASSO


Adaptive LASSO considers LASSO’s strengths and applies better penalty strategies
for feature selection [21,22]. It overcomes LASSO’s limitations by offering better feature
selection, less bias, and better consistency in high-dimensional data. The Adaptive LASSO
estimator is defined as

β̂_ADLASSO = argmin_β ∥y − Xβ∥₂² + λ ∑_{j=1}^{p} w_j |β_j|    (7)

where w_j is the weight assigned to each coefficient. Generally, it is set to w_j = 1/|β̂_j|, with β̂_j the LASSO estimator. These weights penalize small coefficients more and large coefficients less, thus keeping the necessary features in the model.
Adaptive LASSO handles high-dimensional settings better than LASSO because of its data-driven weights. The weights w_j, inversely proportional to the initial coefficient estimates β̂_j, reduce the bias that uniform shrinkage imposes on large coefficients. The technique thus favors significant features with larger coefficients and penalizes, and ultimately eliminates, unimportant ones, improving feature-selection precision. However, Adaptive
LASSO may have limitations, especially when working with enormous data or data with
significant correlation or nonlinearity.
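A minimal glmnet-based sketch of Adaptive LASSO is shown below; the initial cross-validated LASSO estimate used to build the weights, the small constant added to avoid division by zero, and the simulated data are assumptions made for illustration:

```r
# Adaptive LASSO sketch via glmnet's penalty.factor argument (Equation (7)).
library(glmnet)
set.seed(1)
n <- 50; p <- 100
X <- matrix(rnorm(n * p), nrow = n)
y <- drop(X %*% c(rep(1.5, 10), rep(0, p - 10)) + rnorm(n, sd = 1.5))

# Initial coefficient estimates from a cross-validated LASSO fit
b_init <- as.matrix(coef(cv.glmnet(X, y, alpha = 1), s = "lambda.min"))[-1, 1]
w <- 1 / (abs(b_init) + 1e-6)          # w_j = 1/|beta_hat_j|; small constant avoids division by zero

# Adaptive LASSO: feature-specific penalty weights passed through penalty.factor
cv_ad <- cv.glmnet(X, y, alpha = 1, penalty.factor = w)
b_ad  <- as.matrix(coef(cv_ad, s = "lambda.min"))
which(b_ad[-1, 1] != 0)                # features retained by Adaptive LASSO
```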
One-stage selection methods like Adaptive LASSO may perform poorly when there are complex interdependencies among features. This is where two-stage techniques become crucial. A two-stage approach enhances one-stage methods by adding a second refinement step, here by pairing Adaptive LASSO with a metaheuristic optimization algorithm. Two-stage techniques allow feature importance to be reassessed and fine-tuned, reducing variability and increasing stability. These methods first perform an initial selection, followed by a
secondary process to refine the feature subset, ensuring that only the most significant
features are retained in the model. This proposed approach enhances the reliability
of feature selection in high-dimensional data, where traditional methods may fall
short. In this study, a two-stage ABC-Adaptive LASSO-based hybrid variable-selection
method has been proposed.

3.5. Artificial Bee Colony Optimization (ABC)


The ABC algorithm was introduced by Karaboga [12] for solving continuous optimization problems. It mimics the foraging behavior of honey bee colonies, where the search process is divided into three roles: employed bees, onlooker bees, and scout bees. These roles work together to search the decision space for the best possible solution. The first half of the bees in the ABC algorithm are employed (worker) bees, and the second half are onlookers. Each employed bee is associated with one food source, exploits it, and then returns to the hive to share information about it with the other bees. Onlooker bees identify food sources by following the employed bees. In the method, a food source's nectar amount symbolizes the
solution’s quality (i.e., fitness), while a food source’s position represents a potential solution
(i.e., food source) to the problem. The number of solutions in the swarm is equal to the
number of employed or onlooker bees and food sources [12,13].
The ABC algorithm has seven steps, which include the following:
Step 1: Initialization: First of all, ABC is initialized with SN food sources where SN
is the size of food sources (worker bees). Each food source Xi where i = 1, 2, . . ., SN is a
vector with dimension D, which stands for the number of parameters to be optimized. The
first food source locations are randomly generated by Equation (8):

X_i^j = X_min^j + rand(0, 1) · (X_max^j − X_min^j)    (8)

where j = 1, 2, . . ., D; X_max^j and X_min^j represent the upper and lower bounds of the j-th parameter, and rand(0, 1) is a random number between 0 and 1.
Step 2: This step involves evaluating the food sources by objective function. In
this context, we determine the nectar amount (or objective value) associated with each
food source.
Step 3: The process of worker bees: Upon initialization, each worker bee visits its
food source and searches for a neighboring food source with superior nectar quality. The
location of the neighboring food source V_i for a worker bee at X_i is given by the following equation:

V_i^{j_rand} = X_i^{j_rand} + rand(−1, 1) · (X_i^{j_rand} − X_k^{j_rand})    (9)

where X_k is a randomly selected food source, k ∈ {1, 2, . . ., SN} is chosen at random and different from i, j_rand ∈ {1, 2, . . . , D} is a random integer index, and rand(−1, 1) is a random value between −1 and 1.
Step 4: Selection and assessment of quality: The quality of the new food source is
assessed after the identification of the new food source. Bees will abandon their current
food source in favor of a new one if the latter exhibits superior quality.
Step 5: The process for onlooker bees: The onlooker bees acquire knowledge regarding
the characteristics of the food sources from worker bees after all the worker bees complete
their foraging operations. An onlooker bee then chooses a food source X_i with a probability π_i that reflects the quality information obtained from the worker bees. For each food source X_i, the probability value π_i is computed from the quality of food source i, as assessed by its worker bee, using Equation (10).

π_i = f_i / ∑_{n=1}^{SN} f_n    (10)

where f_i is the fitness value associated with food source i. This probability value π_i is compared with a randomly generated number between 0 and 1; if π_i exceeds that random value, the onlooker bee moves to food source i and searches its neighborhood for a new food source.
Step 6: This stage involves preserving the best food source with the best quality.
Step 7: The scout bee: In the scout bee process, a bee substitutes an abandoned food
source with one it has discovered. Each bee in the swarm is assigned its own counter for this process. Upon reaching a specific threshold in its counter value, a bee will abandon the
food source (the solution) and commence the search for alternative food sources. According
to Equation (8), a scout bee seeks a new food source.
The procedure persists until a specified termination criterion is met by repeating steps
3 to 7.
Since the feature-selection problem is defined as a discrete optimization problem, a
binary version of ABC is needed. Generally, the sigmoid function is applied to convert
continuous values into binary.
S(V_i) = 1 / (1 + e^{−V_i})    (11)

If S(V_i) > rand(0, 1), set V_i = 1; otherwise, set V_i = 0.
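The following minimal R sketch illustrates Equations (8), (9), and (11) for a single candidate move; the dimension, colony size, and [0, 1] search bounds are illustrative assumptions:

```r
# Illustration of Equations (8), (9) and (11): initialize food sources, generate one
# neighboring candidate, and map it to a binary feature-inclusion vector.
set.seed(1)
D  <- 20                                   # number of features (dimension of a food source)
SN <- 10                                   # number of food sources / employed bees

# Equation (8): random initialization within assumed bounds [0, 1]
foods <- matrix(runif(SN * D, min = 0, max = 1), nrow = SN)

neighbor_move <- function(foods, i) {
  k <- sample(setdiff(seq_len(nrow(foods)), i), 1)        # partner source, k != i
  j <- sample(ncol(foods), 1)                             # randomly chosen dimension j_rand
  v <- foods[i, ]
  v[j] <- foods[i, j] + runif(1, -1, 1) * (foods[i, j] - foods[k, j])   # Equation (9)
  v
}

sigmoid <- function(v) 1 / (1 + exp(-v))                  # Equation (11)
v_cont <- neighbor_move(foods, i = 1)
v_bin  <- as.integer(sigmoid(v_cont) > runif(D))          # 1 = feature selected, 0 = excluded
v_bin
```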

3.6. The Proposed ABC-ADLASSO Method for Feature Selection


The proposed feature-selection method consists of two phases: In the first phase,
ABC optimization is applied to narrow the search space and identify a subset of relevant
features, reducing computing costs and improving model accuracy. In the second phase,
the AD_LASSO method is used to further refine the selected features, eliminating any
remaining irrelevant features. By reducing the dimensionality with ABC first, the com-
plexity of the AD_LASSO process is minimized, improving its performance accuracy and
preventing overfitting. The advantages of ABC over other heuristic methods have been
widely discussed in the literature. For example, ref. [34] highlights that ABC requires fewer
control parameters and has a more advanced ability to balance exploration and exploitation.
Ref. [35] emphasizes that, despite its simplicity, ABC is more effective in global optimiza-
tion problems. Additionally, ABC can be easily adapted to various optimization problems,
giving it a significant advantage over other algorithms and faster convergence, making it a
suitable choice for high-dimensional data. To use the ABC algorithm efficiently and reap
its benefits, a few crucial factors must be taken into account:
1. Representation of Bees
Each bee in the ABC algorithm represents a potential solution, which is a binary vector
corresponding to a subset of features. For example, given a dataset with 100 features,
a bee might be represented as a vector [1, 0, 0, 1, 0, 1, 0, . . ., 1], where a 1 indicates that the corresponding feature is selected.
2. Objective Function
Choosing the appropriate objective function in the optimization process is critical
to ensure the accuracy and effectiveness of the solution. The Extended-Bayesian
Information Criterion (ExBIC) was utilized as a fitness function for the proposed
feature-selection method. ExBIC is a model selection criterion developed especially
for high-dimensional data and is commonly used for feature selection [36]. ExBIC is
also effective in controlling false positives while balancing model fit and complexity
and is defined by the Equation (12):

ExBIC = −2·logL + d·log(n) + 2·γ·log(p)    (12)

where d denotes the number of selected features, n is the number of total observations,
p is the number of all features in the data matrix, and γ is a parameter ranging
between 0 and 1. A more optimal model will have a lower ExBIC value, reflecting
an improved trade-off between model accuracy and complexity. In this case, γ is a
fixed-value parameter, commonly assigned as 0.5, as suggested by [36]. logL is the
logarithm of the likelihood of the model (which is related to the residual sum of
squares in linear regression). In the proposed method, the fitness function ExBIC will be minimized in the first stage using the ABC algorithm for feature selection (a small sketch of this fitness computation is given after this list).
Then, in the second step, the remaining unnecessary features will be eliminated using
Adaptive LASSO. Adaptive LASSO will be enabled to work on a more refined feature
set in the second stage after unnecessary features are removed using ABC in the first
stage. In order to provide more precise and efficient results, this strategy seeks to
integrate the advantages of both approaches.
3. The Control Parameters for ABC
By trial and error, the following parameters have been defined for the ABC-based
proposed method:
Number of food sources: SN, as SN is the number of features in the data.
Maximum number of iterations: 100.
Max Limit: 10, where the max limit is how many times a food source can be selected
without improvement before it is abandoned.
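A minimal R sketch of the ExBIC fitness of item 2 for a candidate binary feature subset follows; the guard against empty or overly large subsets and the simulated example data are assumptions made for illustration.

```r
# ExBIC fitness (Equation (12)) of a binary feature-subset vector, with gamma = 0.5 as in [36];
# the guard against empty or overly large subsets is an implementation assumption.
exbic_fitness <- function(subset_bin, X, y, gamma = 0.5) {
  d <- sum(subset_bin)                            # number of selected features
  n <- length(y); p <- ncol(X)
  if (d == 0 || d >= n - 2) return(Inf)           # unusable candidate subsets
  fit  <- lm(y ~ X[, subset_bin == 1, drop = FALSE])
  logL <- as.numeric(logLik(fit))
  -2 * logL + d * log(n) + 2 * gamma * log(p)     # lower ExBIC = better subset
}

# Example: a random candidate "bee" selecting roughly 10 of 100 simulated features
set.seed(1)
n <- 50; p <- 100
X <- matrix(rnorm(n * p), nrow = n)
y <- drop(X %*% c(rep(1.5, 10), rep(0, p - 10)) + rnorm(n, sd = 1.5))
bee <- rbinom(p, 1, 0.1)
exbic_fitness(bee, X, y)
```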
The flow chart of the proposed two-stage ABC-ADLASSO method is presented in
Figure 1.
Figure 1. Flow chart of the proposed method.
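To make the two-stage flow of Figure 1 concrete, the following self-contained R sketch combines a simplified binary ABC search (minimizing ExBIC) with an Adaptive LASSO refinement via glmnet. It is an illustrative simplification rather than the authors' implementation: the merged employed/onlooker phase, the population size, the iteration and limit values, the initialization probability, and the small constant in the weights are all assumptions.

```r
library(glmnet)
set.seed(1)
n <- 50; p <- 100
X <- matrix(rnorm(n * p), nrow = n)
y <- drop(X %*% c(rep(1.5, 10), rep(0, p - 10)) + rnorm(n, sd = 1.5))

# ExBIC fitness (Equation (12)) of a binary feature subset; lower is better
exbic <- function(s, gamma = 0.5) {
  d <- sum(s)
  if (d == 0 || d >= n - 2) return(Inf)           # unusable candidates
  -2 * as.numeric(logLik(lm(y ~ X[, s == 1, drop = FALSE]))) +
    d * log(n) + 2 * gamma * log(p)
}

# Stage 1: simplified binary ABC search over feature subsets
SN <- 20; iters <- 50; limit <- 10                # illustrative control parameters
foods <- matrix(rbinom(SN * p, 1, 0.1), nrow = SN)
fit   <- apply(foods, 1, exbic)
trial <- rep(0, SN)

mutate <- function(s, partner) {                  # binary analogue of Equations (9) and (11)
  j <- sample(p, 1)
  v <- s[j] + runif(1, -1, 1) * (s[j] - partner[j])
  s[j] <- as.integer(1 / (1 + exp(-v)) > runif(1))
  s
}

for (it in seq_len(iters)) {
  for (i in seq_len(SN)) {                        # employed + onlooker phases, merged for brevity
    cand <- mutate(foods[i, ], foods[sample(setdiff(seq_len(SN), i), 1), ])
    f <- exbic(cand)
    if (f < fit[i]) { foods[i, ] <- cand; fit[i] <- f; trial[i] <- 0 } else trial[i] <- trial[i] + 1
  }
  scout <- trial > limit                          # scout phase: abandon exhausted food sources
  if (any(scout)) {
    foods[scout, ] <- rbinom(sum(scout) * p, 1, 0.1)
    fit[scout]   <- apply(foods[scout, , drop = FALSE], 1, exbic)
    trial[scout] <- 0
  }
}
stage1 <- which(foods[which.min(fit), ] == 1)     # features surviving the ABC stage

# Stage 2: Adaptive LASSO refinement on the reduced design (assumes >= 2 surviving features)
X1 <- X[, stage1, drop = FALSE]
w  <- 1 / (abs(as.matrix(coef(cv.glmnet(X1, y), s = "lambda.min"))[-1, 1]) + 1e-6)
b  <- as.matrix(coef(cv.glmnet(X1, y, penalty.factor = w), s = "lambda.min"))
stage1[b[-1, 1] != 0]                             # indices of the finally selected features
```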

4. Simulation Study

The simulation study was conducted to show the feature-selection performance of the developed ABC-ADLASSO method by comparing it with the AD_LASSO, LASSO, stepwise, and LARS methods under different simulation settings in high-dimensional data. R Studio was used for all processes in the simulation study. The linear model was used for data generation:

y = Xβ + ϵ, ϵ ∼ N(0, σ²)    (13)

Six simulation scenarios with high-dimensional settings were considered. The sample size n = 50 is used for each setting. The six scenarios are considered as follows [18]:

Scenario1: p = 60 and σ = 1.5. The rows of the data matrix X are independent. In the j-th row, the first 10 features x_{j1}, . . . , x_{j10} and the remaining 50 features x_{j11}, . . . , x_{j60} are independent of each other. The pairwise correlation among the r-th and d-th components in x_{j1}, . . . , x_{j10} is ρ^{|r−d|}, where ρ = 0.5 and r, d = 1, . . ., 10. Also, the pairwise correlation among the r-th and d-th components in x_{j11}, . . . , x_{j60} is ρ^{|r−d|}, where ρ = 0.5 and r, d = 11, . . ., 60.
Scenario2: This is identical to Scenario1, with the exception that ρ = 0.90.
Scenario3: This is identical to Scenario1, with the exception that p = 100.
Scenario4: This is identical to Scenario2, with the exception that p = 100.
Scenario5: p = 60 and σ = 1.5. The features are generated as
x_{ji} = Z_{1i} + e_{ji} for j = 1, 2, . . ., 5 and
x_{ji} = Z_{2i} + e_{ji} for j = 6, 7, . . ., 10, where Z_{ji} ∼ N(0, 1) and e_{ji} ∼ N(0, 1/100). The βs are 1.5 for the first 10 components and 0 for the rest of the components.
Scenario6: This is identical to Scenario5, with the exception that p = 100.

The confusion matrix was used to evaluate the performances of the developed and the traditional methods. In this matrix, True Positives (TP) are features correctly identified as relevant (correctly determining significant or non-zero coefficients), and False Positives (FP) are irrelevant features incorrectly identified as relevant (zero coefficients incorrectly determined as significant or non-zero). True Negatives (TN) are irrelevant features correctly identified as irrelevant (correctly determining zero coefficients), and False Negatives (FN) are relevant features incorrectly identified as irrelevant (non-zero coefficients incorrectly determined as non-significant or zero). Based on the confusion matrix, accuracy, sensitivity, and specificity values were computed for the methods:

Accuracy = (TP + TN) / (TP + TN + FN + FP)    (14)
Specificity = TN / (TN + FP)    (15)
Sensitivity = TP / (TP + FN)    (16)
A total of 300 random repetitions of the simulations are performed. Every simulated
dataset is split into a training set (80%) and a test set (20%) for each iteration of the
simulation. The proposed ABC-ADLASSO, AD_LASSO, LASSO, stepwise, and LARS
methods were implemented on the training set, and the performances of the methods were
analyzed on the testing set.
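As an illustration of how one Scenario 1 replication and the metrics in Equations (14)–(16) can be produced in R, a minimal sketch follows; the coefficient pattern (1.5 for the first 10 features, 0 elsewhere) and the example "selected" subset are assumptions made for demonstration:

```r
# One Scenario 1 replication: block-wise AR(1)-type correlation rho^|r-d| within the first
# 10 and the remaining 50 features, then the metrics of Equations (14)-(16). The coefficient
# pattern and the example selected subset are assumptions for demonstration.
library(MASS)
set.seed(1)
n <- 50; p <- 60; rho <- 0.5; sigma <- 1.5

ar1 <- function(m, rho) rho^abs(outer(seq_len(m), seq_len(m), "-"))   # correlation matrix
Sigma <- matrix(0, p, p)
Sigma[1:10, 1:10] <- ar1(10, rho)
Sigma[11:p, 11:p] <- ar1(p - 10, rho)
X <- mvrnorm(n, mu = rep(0, p), Sigma = Sigma)

beta <- c(rep(1.5, 10), rep(0, p - 10))            # assumed: only the first 10 features relevant
y <- drop(X %*% beta + rnorm(n, sd = sigma))

true_support <- beta != 0
selected <- seq_len(p) %in% c(1:8, 11:13)          # hypothetical subset returned by some method
TP <- sum(selected & true_support);  FP <- sum(selected & !true_support)
TN <- sum(!selected & !true_support); FN <- sum(!selected & true_support)
c(accuracy    = (TP + TN) / (TP + TN + FP + FN),   # Equation (14)
  specificity = TN / (TN + FP),                    # Equation (15)
  sensitivity = TP / (TP + FN))                    # Equation (16)
```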

5. Simulation Results
The simulation study has been performed to demonstrate the impact of increasing
the number of features (p) and the correlation among the features on feature-selection
performance in high-dimensional data. As p increases, the complexity of the data grows,
making feature selection more challenging due to the higher likelihood of including ir-
relevant or redundant features. Among the feature-selection methods compared, the
standard AD_LASSO consistently outperforms traditional one-stage feature-selection meth-
ods, LASSO, LARS, and stepwise, across all scenarios, particularly when dimensionality
increases. The simulation study demonstrates that the proposed two-stage ABC-ADLASSO
method enhances AD_LASSO’s feature-selection performance, achieving superior sensitiv-
ity, specificity, and accuracy values, resulting in more successful outcomes than AD_LASSO
and other compared methods across all scenarios.
In scenarios with lower correlation and lower dimension (p = 60, ρ = 0.50), LASSO
performs similarly to stepwise and LARS, managing acceptable feature selection. As the
correlation among features rises (ρ = 0.90), stepwise demonstrates the most significant
decline in performance, struggling to handle multicollinearity effectively. LASSO and
LARS also have experienced noticeable performance decreases, but LASSO generally
outperforms LARS by providing a slightly better feature selection in high-correlation
settings. However, in these difficult scenarios, both approaches are not as effective as
AD_LASSO.
Our proposed ABC-ADLASSO feature-selection method consistently outperforms
AD_LASSO, LASSO, stepwise, and LARS across all simulation scenarios, particularly as p
increases and the correlation between features increases. This improvement is attributed to
the ABC’s ability to explore a broader solution space, enabling it to handle multicollinearity
more effectively and avoid local minima. As the correlation among features increases,
traditional one-stage methods like AD_LASSO, LASSO, LARS, and stepwise (BICP) struggle
with bias and selection accuracy, while the proposed two-stage approach achieves feature
selection more robustly. Table 1 shows the performance results for all methods across
different simulation scenarios.
Table 1. Simulation results.

Scenario  Method  Sensitivity  Specificity  Accuracy

Scenario1 (n = 50, p = 60, ρ = 0.50)
ABC-ADLASSO 0.879 0.958 0.914
AD_LASSO 0.715 0.892 0.817
LASSO 0.686 0.887 0.799
STEPWISE 0.652 0.857 0.795
LARS 0.633 0.861 0.789

Scenario2 (n = 50, p = 60, ρ = 0.90)
ABC-ADLASSO 0.911 0.961 0.924
AD_LASSO 0.742 0.919 0.846
LASSO 0.674 0.893 0.802
STEPWISE 0.564 0.731 0.633
LARS 0.643 0.890 0.786

Scenario3 (n = 50, p = 100, ρ = 0.50)
ABC-ADLASSO 0.928 0.973 0.938
AD_LASSO 0.784 0.925 0.909
LASSO 0.627 0.849 0.742
STEPWISE 0.577 0.721 0.609
LARS 0.615 0.856 0.743

Scenario4 (n = 50, p = 100, ρ = 0.90)
ABC-ADLASSO 0.905 0.961 0.922
AD_LASSO 0.734 0.905 0.825
LASSO 0.600 0.868 0.734
STEPWISE 0.482 0.659 0.527
LARS 0.617 0.872 0.755

Scenario5 (n = 50, p = 60, grouping effect)
ABC-ADLASSO 0.923 0.974 0.931
AD_LASSO 0.739 0.916 0.859
LASSO 0.573 0.831 0.744
STEPWISE 0.432 0.630 0.508
LARS 0.548 0.862 0.783

Scenario6 (n = 50, p = 100, grouping effect)
ABC-ADLASSO 0.935 0.984 0.943
AD_LASSO 0.793 0.933 0.850
LASSO 0.552 0.824 0.735
STEPWISE 0.337 0.442 0.411
LARS 0.524 0.846 0.731

a. Real dataset application


The proposed ABC-ADLASSO, AD_LASSO, LASSO, stepwise, and LARS methods
were implemented in the Communities and Crime [37], Large-scale Wave Energy Farm [38],
Insurance Company Benchmark (COIL 2000) [39], and Federal Reserve Economic Data
(FRED) [40] real datasets for feature selection to identify significant features from the
training set, and their efficacy was assessed on the test set.
The Communities and Crime dataset contains a significant amount of missing data; its response feature is the per capita violent crime rate. The features in the dataset describe the community, such as the percentage of the population considered urban and the median family income, and law enforcement, such as the per capita number of police officers and the percentage of officers assigned to drug units. After data
cleaning, a clean dataset with 101 explanatory features and 1996 observations was obtained.
Since the goal is to select important features on a high-dimensional dataset, a random index
was created to select 80 observations for the training set and 20 observations for the test set.
The Large-scale Wave Energy Farm dataset includes 99 WECs, or wave energy con-
verters, with 6300 observations based on Perth and Sydney wave scenarios as predictors
and total power output as the response variable. The main goal is to predict the total power
output of the wave farm based on the coordination of WECs. Since the goal is to select
significant features on a high-dimensional dataset, a random index was created to select
80 observations for the training set and 20 observations for the test set.
The Insurance Company Benchmark (COIL 2000) dataset includes 5000 customer
records, each with 86 features. Among these, 85 are independent variables: 43 sociode-
mographic features and 42 product ownership features. The target variable is number of
mobile home policies, which indicates the number of mobile home insurance policies. Since
the study aims to perform feature selection on a high-dimensional dataset, a random index
was used to partition the data into an 80-observation training set and a 20-observation
test set.
The FRED data used in this study consist of 115 macroeconomic variables obtained
from the Federal Reserve Economic Data (FRED) database of the St. Louis Federal Re-
serve Bank. For this analysis, we focus on the period between 2008 and 2016 to evaluate
high-dimensional regression models with 102 observations. The goal is to perform variable
selection on 114 predictors with one output variable, “Personal Consumption Expendi-
tures Price Index” (PCEPI), using Adaptive-LASSO and the proposed method and to
compare the performance of these approaches using 82 observations for the training set
and 20 observations for the test set.
Each method was applied to every dataset 10 times for feature selection, and the mean,
standard deviation, median, interquartile range (IQR), minimum, and maximum values for
Adjusted R2 , MAE, and RMSE are presented in Tables 2–5.

Table 2. Real dataset results on Communities and Crime Dataset.

Method  Metric  Mean  Standard Deviation  Median  Min  Max  IQR (25th–75th Percentile)
Adjusted R2 0.769 0.023 0.764 0.739 0.796 0.750–0.791
ABC-ADLASSO RMSE 0.100 0.014 0.091 0.073 0.116 0.084–0.113
MAE 0.076 0.013 0.080 0.057 0.095 0.065–0.084
Adjusted R2 0.575 0.047 0.580 0.517 0.643 0.532–0.607
AD_LASSO RMSE 0.202 0.021 0.199 0.175 0.247 0.186–0.211
MAE 0.161 0.022 0.166 0.123 0.190 0.146–0.174
Adjusted R2 0.496 0.014 0.497 0.472 0.515 0.486–0.507
LASSO RMSE 0.239 0.007 0.243 0.232 0.268 0.225–0.249
MAE 0.192 0.006 0.190 0.184 0.202 0.187–0.197
Adjusted R2 0.320 0.006 0.318 0.311 0.330 0.316–0.324
STEPWISE RMSE 0.321 0.005 0.317 0.312 0.330 0.318–0.324
MAE 0.227 0.002 0.227 0.224 0.230 0.225–0.228
Adjusted R2 0.460 0.007 0.461 0.448 0.470 0.456–0.464
LARS RMSE 0.249 0.006 0.247 0.240 0.260 0.245–0.252
MAE 0.206 0.005 0.210 0.198 0.224 0.203–0.217
Table 3. Real dataset results on Large-scale Wave Energy Farm dataset.

Method  Metric  Mean  Standard Deviation  Median  Min  Max  IQR (25th–75th Percentile)
Adjusted R2 0.695 0.010 0.690 0.684 0.708 0.689–0.700
ABC-ADLASSO RMSE 0.110 0.002 0.113 0.108 0.119 0.109–0.115
MAE 0.084 0.002 0.084 0.082 0.087 0.083–0.085
Adjusted R2 0.616 0.006 0.614 0.608 0.623 0.610–0.619
AD_LASSO RMSE 0.187 0.005 0.187 0.180 0.192 0.185–0.190
MAE 0.147 0.005 0.147 0.140 0.152 0.145–0.150
Adjusted R2 0.527 0.008 0.529 0.517 0.536 0.521–0.530
LASSO RMSE 0.279 0.006 0.274 0.270 0.285 0.277–0.282
MAE 0.205 0.007 0.204 0.197 0.214 0.200–0.209
Adjusted R2 0.369 0.006 0.370 0.361 0.379 0.365–0.372
STEPWISE RMSE 0.523 0.005 0.520 0.508 0.535 0.522–0.530
MAE 0.447 0.008 0.445 0.438 0.460 0.442–0.449
Adjusted R2 0.425 0.005 0.428 0.418 0.432 0.421–0.429
LARS RMSE 0.355 0.006 0.356 0.347 0.364 0.349–0.360
MAE 0.273 0.007 0.272 0.265 0.283 0.267–0.278

Table 4. Real dataset results on Insurance Company Benchmark (COIL 2000) dataset.

Method  Metric  Mean  Standard Deviation  Median  Min  Max  IQR (25th–75th Percentile)
Adjusted R2 0.636 0.012 0.628 0.620 0.653 0.624–0.647
ABC-ADLASSO RMSE 0.113 0.002 0.113 0.110 0.118 0.112–0.115
MAE 0.087 0.002 0.087 0.084 0.092 0.085–0.089
Adjusted R2 0.611 0.005 0.612 0.600 0.619 0.608–0.615
AD_LASSO RMSE 0.170 0.003 0.171 0.165 0.176 0.168–0.172
MAE 0.140 0.003 0.140 0.135 0.146 0.138–0.142
Adjusted R2 0.566 0.004 0.568 0.559 0.574 0.563–0.570
LASSO RMSE 0.228 0.003 0.228 0.223 0.235 0.225–0.230
MAE 0.183 0.003 0.183 0.180 0.191 0.181–0.186
Adjusted R2 0.476 0.004 0.477 0.470 0.484 0.473–0.479
STEPWISE RMSE 0.522 0.003 0.523 0.514 0.527 0.520–0.525
MAE 0.428 0.003 0.428 0.421 0.432 0.426–0.430
Adjusted R2 0.514 0.005 0.514 0.505 0.522 0.510–0.518
LARS RMSE 0.468 0.004 0.470 0.462 0.475 0.466–0.471
MAE 0.378 0.004 0.379 0.372 0.385 0.376–0.381
Table 5. Real dataset results on Federal Reserve Economic Data (FRED) dataset.

Method  Metric  Mean  Standard Deviation  Median  Min  Max  IQR (25th–75th Percentile)
Adjusted R2 0.758 0.013 0.754 0.736 0.776 0.746–0.766
ABC-ADLASSO RMSE 0.112 0.002 0.117 0.093 0.133 0.103–0.123
MAE 0.089 0.002 0.092 0.067 0.107 0.077–0.097
Adjusted R2 0.703 0.009 0.700 0.680 0.720 0.690–0.710
AD_LASSO RMSE 0.169 0.003 0.168 0.148 0.188 0.158–0.178
MAE 0.139 0.003 0.140 0.120 0.160 0.130–0.150
Adjusted R2 0.625 0.017 0.620 0.603 0.643 0.613–0.633
LASSO RMSE 0.228 0.006 0.231 0.210 0.250 0.220–0.240
MAE 0.188 0.006 0.185 0.165 0.205 0.175–0.195
Adjusted R2 0.483 0.008 0.487 0.465 0.505 0.475–0.495
STEPWISE RMSE 0.414 0.006 0.410 0.392 0.434 0.402–0.422
MAE 0.325 0.007 0.324 0.303 0.345 0.313–0.333
Adjusted R2 0.555 0.016 0.552 0.532 0.572 0.542–0.562
LARS RMSE 0.416 0.005 0.418 0.398 0.438 0.408–0.428
MAE 0.317 0.005 0.319 0.299 0.339 0.309–0.329

b. Real dataset application results


The results from the real datasets are generally consistent with the simulation find-
ings. Similar to the simulation scenarios, the proposed method demonstrated significantly
superior performance compared to other methods in each real dataset. Among traditional
methods, AD_LASSO outperformed LASSO, LARS, and stepwise, but the proposed method
consistently achieved the best performance across all real datasets. The findings in each
real dataset indicate that the ABC-ADLASSO approach regularly surpasses AD_LASSO,
LASSO, stepwise, and LARS methods based on Adjusted R2 , Root Mean Square Error
(RMSE), and Mean Absolute Error (MAE). ABC-ADLASSO attained elevated Adjusted R2
values, indicating its enhanced capacity to elucidate variance in the response feature. It
yields reduced RMSE and MAE values, signifying enhanced predictive accuracy. These
findings show that the proposed two-stage method is a very suitable method for feature
selection in the context of high-dimensional data. All methods show stable results within the min–max value range; this stability is particularly important for ABC-ADLASSO, since it is a heuristic optimization-based method.

6. Discussion
The findings from both the simulation study and the empirical data application vali-
date the benefits of the proposed two-stage feature-selection method utilizing ABC and
Adaptive LASSO. In high-dimensional contexts, feature selection is essential to prevent
overfitting and enhance model interpretability. Our results indicate that ABC-ADLASSO
provides enhanced feature selection and predictive accuracy relative to one-stage ap-
proaches such as AD_LASSO, LASSO, stepwise, and LARS. The initial stage utilizes the
ABC method to effectively reduce the search space, select the most promising features,
and address issues related to high multicollinearity. The second stage employs Adaptive
LASSO to further improve the selected features, guaranteeing that the final model is both
concise and precise. The proposed ABC-ADLASSO method offers advantages such as im-
proved feature-selection accuracy by combining global exploration (ABC) with AD_LASSO.
However, a potential drawback is the need for careful tuning of hyperparameters. The
performance of both ABC and AD_LASSO depends on the chosen hyperparameter settings,
and incorrect tuning may reduce the method’s effectiveness and impact the accuracy of the
results. In the simulation analysis, the suggested method performs well in every situation,
especially as the correlation rises. This approach reduces the complexity of the model and
increases its performance in high-dimensional data.

7. Conclusions
This study has introduced an innovative two-stage feature-selection method that integrates
the ABC metaheuristic optimization method with Adaptive LASSO for high-dimensional data.
The proposed ABC-ADLASSO method was compared with the AD_LASSO, LASSO, stepwise,
and LARS methods under different simulation settings in high-dimensional data and various
real datasets to show the feature-selection performance of the proposed method. The ABC-
ADLASSO method has overcome the shortcomings of single-stage feature-selection methods
by integrating a global optimization algorithm (ABC) in the first stage and enhancing feature
selection using a penalization technique (Adaptive LASSO) in the second stage. According to
the results obtained from simulations and applications on various real datasets, ABC-ADLASSO
has shown significantly superior performance in terms of accuracy, precision, and overall model
performance, particularly in scenarios with high correlation and a large number of features
compared to the other methods evaluated. This two-stage methodology offers a robust and
adaptable solution to handling high-dimensional data, rendering it particularly relevant in
domains such as genetics, bioinformatics, and intricate predictive modeling. Future research
may investigate the integration of this methodology with alternative machine learning classifiers
and its use across different datasets from various fields. Also, in future studies, a comprehensive
comparative analysis of the proposed method with other optimization-based feature-selection
techniques can be performed.

Author Contributions: Conceptualization, E.P.O. and N.S.; methodology, E.P.O. and N.S.; software,
E.P.O. and N.S.; validation, E.P.O. and N.S.; formal analysis, E.P.O. and N.S.; investigation, E.P.O. and
N.S.; resources, E.P.O. and N.S.; data curation, E.P.O. and N.S.; writing—original draft preparation;
writing—review and editing, E.P.O. and N.S.; visualization, E.P.O. and N.S.; supervision, E.P.O. and
N.S.; project administration, E.P.O. and N.S. All authors have read and agreed to the published
version of the manuscript.
Funding: This research received no external funding.
Institutional Review Board Statement: Not Applicable.
Informed Consent Statement: Not Applicable.
Data Availability Statement: The data that support the findings of this study are available on request
from the corresponding author.
Conflicts of Interest: The authors declare no conflicts of interest.

References
1. Sancar, N.; Onakpojeruo, E.P.; Inan, D.; Uzun, O.D. Adaptive Elastic Net Based on Modified PSO for Variable Selection in Cox
Model with High-Dimensional Data: A Comprehensive Simulation Study. IEEE Access 2023, 11, 127302–127316. [CrossRef]
2. Jain, R.; Xu, W. HDSI: High dimensional selection with interactions algorithm on feature selection and testing. PLoS ONE 2021,
16, e0246159. [CrossRef] [PubMed]
3. Amini, A.A.; Wainwright, M.J. High-dimensional analysis of semidefinite relaxations for sparse principal components. In
Proceedings of the IEEE International Symposium on Information Theory ISIT 2008, Toronto, ON, Canada, 6–11 July 2008;
pp. 2454–2458.
4. Holtzman, G.; Soffer, A.; Vilenchik, D. A greedy anytime algorithm for sparse PCA. In Proceedings of the 33rd Conference on
Learning Theory (COLT 2020), Graz, Austria, 9–12 July 2020; pp. 1939–1956.
5. Rouhi, A.; Nezamabadi-Pour, H. Feature Selection in High-Dimensional Data. In Advances in Intelligent Systems and Computing;
Springer: Cham, Switzerland, 2020; Volume 1123, pp. 85–128. Available online: https://link.springer.com/chapter/10.1007/978-
3-030-34094-0_5 (accessed on 7 October 2024).
6. Pudjihartono, N.; Fadason, T.; Kempa-Liehr, A.W.; O’Sullivan, J.M. A Review of Feature Selection Methods for Machine Learning-
Based Disease Risk Prediction. Front. Bioinform. 2022, 2, 927312. Available online: www.frontiersin.org (accessed on 25 October
2024). [CrossRef] [PubMed]
7. Curreri, F.; Fiumara, G.; Xibilia, M.G. Input Selection Methods for Soft Sensor Design: A Survey. Future Internet 2020, 12, 97.
Available online: https://www.mdpi.com/1999-5903/12/6/97/htm (accessed on 25 October 2024). [CrossRef]
8. Maseno, E.M.; Wang, Z. Hybrid Wrapper Feature Selection Method Based on Genetic Algorithm and Extreme Learning Machine
for Intrusion Detection. J. Big Data 2024, 11, 24. [CrossRef]
9. Bohrer, J.S.; Dorn, M. Enhancing Classification with Hybrid Feature Selection: A Multi-Objective Genetic Algorithm for High-
Dimensional Data. Expert Syst. Appl. 2024, 255, 124518. [CrossRef]
10. Owoc, M.L. Usability of Honeybee Algorithms in Practice. In Towards Nature-Inspired Sustainable Development; IFIP Advances in
Information and Communication Technology; Springer: Cham, Switzerland, 2024; Volume 693, pp. 161–176. Available online:
https://link.springer.com/chapter/10.1007/978-3-031-61069-1_12 (accessed on 15 October 2024).
11. Stamadianos, T.; Taxidou, A.; Marinaki, M.; Marinakis, Y. Swarm Intelligence and Nature-Inspired Algorithms for Solving Vehicle
Routing Problems: A Survey. Oper. Res. 2024, 24, 47. Available online: https://link.springer.com/article/10.1007/s12351-024-008
62-5 (accessed on 15 October 2024). [CrossRef]
12. Karaboga, D. An Idea Based on Honey Bee Swarm for Numerical Optimization; Technical Report TR06; Computer Engineering
Department, Engineering Faculty, Erciyes University: Kayseri, Türkiye, 2005.
13. Karaboga, D.; Kaya, E. An Adaptive and Hybrid Artificial Bee Colony Algorithm (aABC) for ANFIS Training. Appl. Soft Comput.
2016, 49, 423–436. [CrossRef]
14. Nozohour-Leilabady, B.; Fazelabdolabadi, B. On the Application of Artificial Bee Colony (ABC) Algorithm for Optimization
of Well Placements in Fractured Reservoirs: Efficiency Comparison with the Particle Swarm Optimization (PSO) Methodology.
Petroleum 2016, 2, 79–89. [CrossRef]
15. Yarat, S.; Senan, S.; Orman, Z. A Comparative Study on PSO with Other Metaheuristic Methods. In International Series in
Operations Research and Management Science; Springer: Cham, Switzerland, 2021; Volume 306, pp. 49–72. Available online:
https://link.springer.com/chapter/10.1007/978-3-030-70281-6_4 (accessed on 15 October 2024).
16. Theng, D.; Bhoyar, K.K. Feature Selection Techniques for Machine Learning: A Survey of More Than Two Decades of Research.
Knowl. Inf. Syst. 2024, 66, 1575–1637. Available online: https://link.springer.com/article/10.1007/s10115-023-02010-5 (accessed
on 7 October 2024). [CrossRef]
17. Liu, X.Y.; Liang, Y.; Wang, S.; Yang, Z.Y.; Ye, H.S. A Hybrid Genetic Algorithm with Wrapper-Embedded Approaches for Feature
Selection. IEEE Access 2018, 6, 22863–22874. [CrossRef]
18. Guyon, I.; Elisseeff, A. An Introduction to Variable and Feature Selection. J. Mach. Learn. Res. 2003, 3, 1157–1182.
19. Yerlikaya-Özkurt, F.; Taylan, P. Enhancing Classification Modeling Through Feature Selection and Smoothness: A Conic-
Fused Lasso Approach Integrated with Mean Shift Outlier Modelling. J. Dyn. Games 2024, 12, 1–23. Available online: http:
//staging.xml2html.mdpi.lab/articles/appliedmath-3309531 (accessed on 7 October 2024). [CrossRef]
20. Tibshirani, R. Regression Shrinkage and Selection via the Lasso. J. R. Stat. Soc. B 1996, 58, 267–288. [CrossRef]
21. Huang, J.; Ma, S.; Zhang, C.H. Adaptive Lasso for Sparse High-Dimensional Regression Models. Ann. Stat. 2008, 18, 1603–1618.
22. Zou, H. The Adaptive Lasso and Its Oracle Properties. J. Am. Stat. Assoc. 2006, 101, 1418–1429. Available online: https:
//www.tandfonline.com/doi/abs/10.1198/016214506000000735 (accessed on 25 October 2024). [CrossRef]
23. Zhang, Z.; Tong, T.; Fang, Y.; Zheng, J.; Zhang, X.; Niu, C.; Li, J.; Zhang, X.; Xue, D. Genome-Wide Identification of Barley ABC
Genes and Their Expression in Response to Abiotic Stress Treatment. Plants 2020, 9, 1281. [CrossRef]
24. Garg, S.; Kaur, K.; Batra, S.; Aujla, G.S.; Morgan, G.; Kumar, N.; Zomaya, A.Y.; Ranjan, R. En-ABC: An Ensemble Artificial Bee
Colony Based Anomaly Detection Scheme for Cloud Environment. J. Parallel Distrib. Comput. 2020, 135, 219–233. [CrossRef]
25. Hancer, E.; Xue, B.; Karaboga, D.; Zhang, M. A Binary ABC Algorithm Based on Advanced Similarity Scheme for Feature
Selection. Appl. Soft Comput. 2015, 36, 334–348. [CrossRef]
26. Chamchuen, S.; Siritaratiwat, A.; Fuangfoo, P.; Suthisopapan, P.; Khunkitti, P. High-Accuracy Power Quality Disturbance
Classification Using the Adaptive ABC-PSO as Optimal Feature Selection Algorithm. Energies 2021, 14, 1238. [CrossRef]
27. Guo, Y.; Zhang, C. A Hybrid Artificial Bee Colony Algorithm for Satisfiability Problems Based on Tabu Search. In Proceedings of
the 3rd IEEE International Conference on Computer and Communications (ICCC 2017), Chengdu, China, 13–16 October 2017;
IEEE: New York, NY, USA, 2018; pp. 2226–2230.
28. Gu, T.; Chen, H.; Chang, L.; Li, L. Intrusion Detection System Based on Improved ABC Algorithm with Tabu Search. IEEJ Trans.
Electr. Electron. Eng. 2019, 14, 1652–1660. Available online: https://onlinelibrary.wiley.com/doi/full/10.1002/tee.22987 (accessed
on 15 October 2024). [CrossRef]
29. Kiliçarslan, S.; Dönmez, E. Improved Multi-Layer Hybrid Adaptive Particle Swarm Optimization Based Artificial Bee Colony for
Optimizing Feature Selection and Classification of Microarray Data. Multimed. Tools Appl. 2024, 83, 67259–67281. Available online:
https://link.springer.com/article/10.1007/s11042-023-17234-4 (accessed on 7 October 2024). [CrossRef]
30. Kumar, H. Decision Making for Hotel Selection Using Rough Set Theory: A Case Study of Indian Hotels. Int. J. Appl. Eng. Res.
2018, 13, 3988–3998.
31. Kutner, M.H.; Nachtsheim, C.J.; Neter, J.; Li, W. Applied Linear Statistical Models, 5th ed.; McGraw-Hill: New York, NY, USA, 2005.
32. Efron, B.; Hastie, T.; Johnstone, I.; Tibshirani, R. Least Angle Regression. Ann. Statist. 2004, 32, 407–499. [CrossRef]
33. Sirimongkolkasem, T.; Drikvandi, R. On Regularisation Methods for Analysis of High Dimensional Data. Ann. Data Sci. 2019,
6, 737–763. Available online: https://link.springer.com/article/10.1007/s40745-019-00209-4 (accessed on 26 November 2024).
[CrossRef]
34. Akay, B.; Karaboga, D.; Gorkemli, B.; Kaya, E. A Survey on the Artificial Bee Colony Algorithm Variants for Binary, Integer, and
Mixed Integer Programming Problems. Appl. Soft Comput. 2021, 106, 107351. [CrossRef]
35. Bansal, J.C.; Joshi, S.K.; Sharma, H. Modified Global Best Artificial Bee Colony for Constrained Optimization Problems. Comput.
Electr. Eng. 2018, 67, 365–382. [CrossRef]
36. Chen, J.; Chen, Z. Extended Bayesian Information Criteria for Model Selection with Large Model Spaces. Biometrika 2008, 95,
759–771. [CrossRef]
37. Communities and Crime—UCI Machine Learning Repository. Available online: https://archive.ics.uci.edu/dataset/183/
communities+and+crime (accessed on 24 October 2024).
38. Neshat, M.; Alexander, B.; Sergiienko, N.Y.; Wagner, M. Optimization of Large Wave Farms Using a Multi-Strategy Evolutionary
Framework. In Proceedings of the 2020 Genetic and Evolutionary Computation Conference, Cancún, Mexico, 8–12 July 2020.
39. Putten, P. Insurance Company Benchmark (COIL 2000) [Dataset]. UCI Machine Learning Repository. [CrossRef]
40. Federal Reserve Bank of St. Louis. Federal Reserve Economic Data (FRED). Available online: https://fred.stlouisfed.org (accessed
on 27 November 2024).

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual
author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to
people or property resulting from any ideas, methods, instructions or products referred to in the content.
