
Computers in Biology and Medicine 159 (2023) 106930


Effective detection of Alzheimer's disease by optimizing fuzzy K-nearest neighbors based on salp swarm algorithm

Dongwan Lu a, Yinggao Yue b, Zhongyi Hu a, c, d, *, Minghai Xu b, **, Yinsheng Tong a, Hanjie Ma a
a School of Computer Science and Artificial Intelligence, Wenzhou University, Wenzhou, 325035, China
b School of Intelligent Manufacturing and Electronic Engineering, Wenzhou University of Technology, Wenzhou, 325035, China
c Intelligent Information Systems Institute, Wenzhou University, Wenzhou, 325035, China
d Key Laboratory of Intelligent Image Processing and Analysis, Wenzhou, China

A R T I C L E  I N F O

Keywords: Feature selection; Salp swarm algorithm; Alzheimer's disease; Medical diagnosis; Swarm intelligence algorithm

A B S T R A C T

Alzheimer's disease (AD) is a typical senile degenerative disease that has received increasing attention worldwide. Many artificial intelligence methods have been used in the diagnosis of AD. In this paper, a fuzzy k-nearest neighbor method based on the improved binary salp swarm algorithm (IBSSA-FKNN) is proposed for the early diagnosis of AD, so as to distinguish between patients with mild cognitive impairment (MCI), Alzheimer's disease (AD), and normal controls (NC). First, the performance and feature selection accuracy of the method are validated on 5 different benchmark datasets. Secondly, using a structural magnetic resonance imaging (sMRI) dataset, the effectiveness of the method on AD data is verified in terms of classification accuracy, sensitivity, specificity, etc. The simulation results show that the classification accuracies of this method for AD vs. MCI, AD vs. NC, and MCI vs. NC are 95.37%, 100%, and 93.95%, respectively. These accuracies are better than those of the other five comparison methods. The method proposed in this paper can learn better feature subsets from serial multimodal features, so as to improve the performance of early AD diagnosis. It has a good application prospect and will bring great convenience for clinicians to make better decisions in clinical diagnosis.

1. Introduction

Alzheimer's disease (AD) is a typical senile degenerative disease. Its clinical symptoms include memory loss, mood changes, cognitive decline, and difficulty speaking, writing, and walking. This disease is one of the important ones that endanger the health of the elderly at present [1]. It currently affects more than 50 million people worldwide and is expected to affect 150 million people by 2050. Since the pathogenic mechanism of AD has not been fully elucidated, there is currently no cure for AD in humans. One of the important reasons is that by the time the disease is detected, it has already progressed irreversibly, resulting in significant memory loss and neurological decline [2]. Generally speaking, the course of AD is divided into three stages: first, pre-symptomatic AD, then mild cognitive impairment (MCI), and finally, a gradual development into AD. Among these, MCI is often mistaken for a manifestation of normal aging, which misses the best time for treatment. Therefore, early diagnosis of AD is crucial to delaying the disease, changing the disease process, or even preventing the disease through early intervention strategies [3].

In recent years, in the field of computer-aided diagnosis, many studies have carried out in-depth explorations of the disease. Most studies have adopted the combination of a dimension reduction method and an effective classifier to further improve the early diagnosis of AD. Gharaibeh M et al. used different pre-training models, Inception V3 and DenseNet201, to extract features; the PCA method is used to select features with a 0.99 explained variance ratio, the combination of selected features from the two pre-training models is fed into a machine learning classifier, and the accuracy of Alzheimer's disease classification is 99.14% [4]. Singh S et al. adopted a hybrid strategy of ant colony optimization (ACO) and a feedforward convolutional neural network (CNN or ConvNet), achieving an accuracy of 98.67% [5]. Pan D adopts an adaptive interpretable ensemble model based on a 3D convolutional neural network (3DCNN) and a genetic algorithm (GA), that is, 3DCNN + EL + GA is used to distinguish AD and MCI, and the discriminative brain regions that are significantly helpful for classification are identified in a data-driven manner [6].

* Corresponding author. School of Computer Science and Artificial Intelligence, Wenzhou University, Wenzhou, 325035, China.
** Corresponding author.
E-mail addresses: [email protected] (Z. Hu), [email protected] (M. Xu).

https://doi.org/10.1016/j.compbiomed.2023.106930
Received 23 August 2022; Received in revised form 15 March 2023; Accepted 13 April 2023
Available online 14 April 2023
0010-4825/© 2023 Elsevier Ltd. All rights reserved.

Velliangiri S et al. used depth feature reduction technology and a dual support vector machine classifier optimized by the gradient face optimizer (TSVM-GDO) to classify AD, which greatly improved the classification accuracy and greatly shortened the execution time [7]. Seo Jungryul et al. used a deep learning model combining a multi-layer perceptron, SVM, and RNN and achieved an experimental accuracy rate of 70.97% [8].

The advantages of KNN are that it is user-friendly, easy to understand, interpretable, and has a high accuracy rate. The KNN method weighs the selected neighbors equally, without considering their distance to specific points [9]. Usually, a more advanced form of the KNN method is used. Keller introduced fuzzy sets to improve KNN and proposed the fuzzy K-nearest neighbor (FKNN) method, which applies fuzzy logic by assigning a degree of class membership based on the distance of each of the k nearest neighbors [10]. Since FKNN was proposed, it has been widely used in various classification tasks and applied in many fields, such as biological and image data classification [11], face recognition [12], Parkinson's disease diagnosis [13], tracking moving targets in videos [14], etc. Meanwhile, some researchers use meta-heuristics to solve practical problems, such as medical diagnosis [15,16], financial distress prediction [17], parameter extraction of solar cells [18], engineering design problems [19,20], feature selection [21,22], education prediction [23], PID control [24], wind speed prediction [25], rolling bearing fault diagnosis [26], gate resource allocation [27], and scheduling problems [28]. When using FKNN to solve practical problems, there are two problems to deal with. On the one hand, proper parameter settings play an important role in designing an effective FKNN model: the first parameter is the neighborhood size k, and the second is the fuzzy intensity parameter m. On the other hand, choosing the optimal subset of input features also greatly affects the performance of the FKNN model.

Feature selection is a commonly used dimensionality reduction method that refers to selecting a subset of attributes from the original set of attributes. Its main purpose is to identify important features, eliminate unnecessary and irrelevant features, and build a good learning model. Feature selection greatly reduces the computational time of the induction algorithm and improves the accuracy of the resulting model. Feature selection can be divided into two categories: correlation-based filtered feature selection and search-based heuristic feature selection [29]. In recent years, algorithms inspired by nature have become very popular for solving various optimization problems. Some meta-heuristic algorithms proposed recently, for example, monarch butterfly optimization (MBO) [30], the slime mould algorithm (SMA) [31], the moth search algorithm (MSA) [32], hunger games search (HGS) [33], the Runge Kutta method (RUN) [34], the colony predation algorithm (CPA) [35], weighted mean of vectors (INFO) [36], and Harris hawks optimization (HHO) [37], have attracted the attention of many scholars. In this paper, the binary salp swarm algorithm (BSSA) is used to optimize the FKNN classifier and perform feature selection at the same time. The salp swarm algorithm (SSA) is a global optimization algorithm based on swarm intelligence that was proposed by Mirjalili et al. in 2017 [38]. The algorithm is simple and effective, and since it was proposed, it has been applied to various optimization tasks.

The choice of dimensionality reduction method and classifier is of great significance for the early diagnosis of AD. Classifiers based on FKNN have achieved excellent performance on disease diagnosis problems such as the early diagnosis of AD [39] and thyroid disease diagnosis [40]. In summary, this paper proposes a FKNN feature selection method based on the improved binary salp swarm algorithm. First, the Cubic mapping method is used to initialize the population, so that the initial salp population covers the feasible region more evenly. Secondly, the variable helix factor is introduced, which makes full use of the individual's opposite solution about the origin, reduces the number of individuals beyond the boundary, and ensures the algorithm has a detailed and flexible search ability. Finally, the best and the worst of the updated individuals are selected to carry out dimensional random difference mutation. To further study the role of this method on practical problems, this paper discretizes it into the binary ISSA (IBSSA) and applies it to feature selection with the goal of finding the optimal feature subset. On the one hand, on the BreastCancer, glass, hepatitisfulldata, Lymphography, and WDBC datasets obtained from the UCI Machine Learning Repository, the effectiveness of this method is tested in terms of classification accuracy, sensitivity, specificity, and other indicators. On the other hand, in order to verify the effectiveness of this method in the diagnosis of early AD, we used MRI, PET, and CSF multimodal feature data from the international Alzheimer's disease neuroimaging initiative (ADNI) and compared the method with other methods, namely swarm intelligence algorithms combined with a FKNN classifier. The experimental results show that the IBSSA-FKNN method can effectively improve the classification performance and the performance of early AD diagnosis. It has a good application prospect and will bring great convenience for clinicians to make better decisions in clinical diagnosis.

The rest of this paper is organized as follows: Section 2 introduces the FKNN classifier; Section 3 introduces the salp swarm algorithm and the improved binary salp swarm algorithm; Section 4 introduces the classification method proposed in this paper, namely IBSSA-FKNN; Section 5 conducts experiments and results analysis on traditional datasets; and Section 6 conducts experiments and results analysis on the sMRI dataset. Finally, Section 7 discusses the conclusions and introduces prospects for future work.

2. Background materials

2.1. Fuzzy K-nearest neighbors (FKNN)

KNN is one of the simplest classifiers. For a sample to be classified, KNN determines the sample's class as the mode of its neighbors' classes, according to the k neighbors closest to the sample. However, this method assumes by default that each neighbor has the same weight and that each sample has only one class, which is not the case in reality. In order to solve these two problems, Keller introduced fuzzy set theory into KNN and proposed the FKNN algorithm. In FKNN, each sample belongs to multiple classes with different membership degrees, and no longer belongs to only one class. Furthermore, FKNN assigns different weights to each neighbor according to the distance between samples. Simply put, nearer neighbors have greater weight in determining the class than farther ones. In FKNN, the fuzzy membership of a sample is assigned to the different classes according to the following formula:

$$u_i(x) = \frac{\sum_{j=1}^{k} u_{ij}\,\left\| x - x_j \right\|^{-2/(m-1)}}{\sum_{j=1}^{k} \left\| x - x_j \right\|^{-2/(m-1)}} \quad (1)$$

where i = 1, 2, …, C, j = 1, 2, …, k, the number of classes is C, and the number of nearest neighbors is k. When calculating the contribution of each neighbor to the membership value, the fuzzy intensity parameter m is used to determine the weight of the distance, and its value is usually chosen as m ∈ (1, ∞). $\| x - x_j \|$ is the distance between x and its j-th nearest neighbor $x_j$; Euclidean distance is usually chosen as the distance metric. $u_{ij}$ is the membership degree of pattern $x_j$ from the training set to class i, among the k nearest neighbors of x. In this study, we adopted constrained fuzzy membership; that is, we find the k nearest neighbors of each training pattern (such as $x_k$), and the membership of $x_k$ in each class is assigned as:

$$u_{ij}(x_k) = \begin{cases} 0.51 + (n_j/k) \times 0.49, & \text{if } j = i \\ (n_j/k) \times 0.49, & \text{if } j \neq i \end{cases} \quad (2)$$

The value $n_j$ is the number of the k neighbors found belonging to the j-th class. Note that the memberships calculated by equation (2) should meet

the following equation:

$$\sum_{i=1}^{C} u_{ij} = 1, \quad j = 1, 2, \ldots, n; \qquad 0 < \sum_{j=1}^{n} u_{ij} < n; \qquad u_{ij} \in [0, 1] \quad (3)$$

After calculating all memberships of the query sample, assign it to the class with the highest membership value, that is:

$$C(x) = \arg\max_{i=1,\ldots,C} \left( u_i(x) \right) \quad (4)$$

The steps of FKNN are as follows.

1) Calculate the membership degree of all training samples for each category by Eq. (2);
2) For the test sample, find its k nearest neighbors by distance measurement, and calculate the membership degree of the test sample for each class by Eq. (1);
3) Get the predicted label by Eq. (4).
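To make the FKNN decision rule above concrete, the following is a minimal Python sketch of Eqs. (1), (2), and (4). This is illustrative only (the paper's own implementation is in MATLAB); the function names and the small 1e-12 guard against zero distances are our assumptions.

```python
import numpy as np

def init_memberships(X_train, y_train, n_classes, k=5):
    """Constrained fuzzy membership of each training pattern, Eq. (2)."""
    n = len(X_train)
    U = np.zeros((n, n_classes))
    for idx in range(n):
        d = np.linalg.norm(X_train - X_train[idx], axis=1)
        nn = np.argsort(d)[1:k + 1]                 # exclude the point itself
        counts = np.bincount(y_train[nn], minlength=n_classes)
        U[idx] = 0.49 * counts / k                  # off-class share
        U[idx, y_train[idx]] += 0.51                # own-class bonus
    return U

def fknn_predict(X_train, U_train, x, k=5, m=2.0):
    """Fuzzy KNN prediction: Eq. (1) membership, Eq. (4) decision."""
    dists = np.linalg.norm(X_train - x, axis=1)
    nn = np.argsort(dists)[:k]                      # k nearest neighbors
    w = 1.0 / (dists[nn] ** (2.0 / (m - 1.0)) + 1e-12)  # distance weights
    u = (U_train[nn] * w[:, None]).sum(axis=0) / w.sum()  # Eq. (1)
    return int(np.argmax(u))                        # Eq. (4)
```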

2.2. Salp swarm algorithm (SSA)

The salp swarm algorithm (SSA) is a global optimization algorithm based on swarm intelligence that was proposed by Mirjalili et al. in 2017 [38]. The salp is a marine creature whose body tissue and movement mode are highly similar to those of jellyfish; it is a kind of capsule animal that floats freely. In the chain-like group behavior of salps, individuals usually connect head to tail to form a "chain" and move in sequence. The salp chain is divided into leaders and followers: the leader moves towards the food and guides the movement of the followers behind it. According to the strict "hierarchy" system, the movement of each follower is affected only by the individual in front of it. Such a movement mode gives the salp chain a strong ability for global exploration and local development.

Population initialization: Let the search space be the Euclidean space of D × N, where D is the space dimension and N is the population size. The position of a salp in space is denoted by $X_n = [X_n^1, X_n^2, X_n^3, \cdots, X_n^D]^T$ and the position of the food by $F_n = [F_n^1, F_n^2, F_n^3, \cdots, F_n^D]^T$, n = 1, 2, 3, ⋯, N. The upper bound of the search space is $ub = [ub_1, ub_2, ub_3, \cdots, ub_j, \cdots, ub_D]$ and the lower bound is $lb = [lb_1, lb_2, lb_3, \cdots, lb_j, \cdots, lb_D]$, j = 1, 2, 3, ⋯, D.

$$X_{D \times N} = rand(D, N) \cdot (ub - lb) + lb \quad (5)$$

The leader of the population is represented by $X_d^l$ and the followers by $X_d^i$, i = 2, 3, 4, ⋯, N; d = 1, 2, 3, ⋯, D.

Leader position update: During salp chain movement and foraging, the leader's position update is expressed as:

$$X_d^l = \begin{cases} F_d + c_1 \left( (ub - lb) c_2 + lb \right), & c_3 \geq 0.5 \\ F_d - c_1 \left( (ub - lb) c_2 + lb \right), & c_3 < 0.5 \end{cases} \quad (6)$$

where $X_d^l$ and $F_d$ are the position of the first salp (the leader) and the position of the food in the d-th dimension, respectively; ub and lb are the corresponding upper and lower bounds; and $c_1$, $c_2$, and $c_3$ are control parameters.

Equation (6) shows that the leader's location update is only related to the location of the food. $c_1$ is the convergence factor in the optimization algorithm, which plays the role of balancing global exploration and local development and is the most important control parameter in SSA. The expression of $c_1$ is:

$$c_1 = 2 e^{-\left( \frac{4l}{L} \right)^2} \quad (7)$$

where l is the current iteration number and L is the maximum iteration number. The convergence factor decreases from 2 to 0 over the iterations. The control parameters $c_2$ and $c_3$ are random numbers in [0,1], which are used to enhance the randomness of $X_d^l$ and improve the global search and individual diversity of the chain group.

Follower position update: In the process of movement and foraging in the salp chain, the followers move forward in a chain-like manner through the mutual influence between the front and rear individuals. Their displacement conforms to Newton's law of motion:

$$X = \frac{1}{2} a t^2 + v_0 t \quad (8)$$

where t is the time; a is the acceleration, given by $a = (v_{final} - v_0)/t$; $v_0$ is the initial velocity; and $v_{final} = (X_d^i - X_d^{i-1})/t$.

Considering that t corresponds to one iteration of the optimization algorithm, let t = 1 and $v_0 = 0$. Then formula (8) can be expressed as:

$$X = \frac{X_d^i - X_d^{i-1}}{2} \quad (9)$$

where i ≥ 2, and $X_d^i$ and $X_d^{i-1}$ are the positions of two salps that are closely connected to each other in the d-th dimension. Therefore, the position of the follower is expressed as:

$$X_d^i = \frac{X_d^i + X_d^{i-1}}{2} \quad (10)$$

where the left-hand $X_d^i$ is the updated follower's position and the right-hand $X_d^i$ is the pre-update follower's position in the d-th dimension.
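As a compact illustration of the update rules in Eqs. (6), (7), and (10), here is a hedged Python sketch of one SSA iteration. This is not the authors' code; `ssa_step` and its signature are our own.

```python
import numpy as np

def ssa_step(X, F, l, L, lb, ub):
    """One iteration of basic SSA.

    X: (N, D) salp positions; F: (D,) food (best-so-far) position;
    l: current iteration; L: maximum iterations; lb, ub: (D,) bounds.
    """
    N, D = X.shape
    c1 = 2.0 * np.exp(-(4.0 * l / L) ** 2)          # Eq. (7)
    c2 = np.random.rand(D)
    c3 = np.random.rand(D)
    step = c1 * ((ub - lb) * c2 + lb)               # Eq. (6), leader
    X[0] = np.where(c3 >= 0.5, F + step, F - step)
    for i in range(1, N):                           # Eq. (10), followers
        X[i] = (X[i] + X[i - 1]) / 2.0
    return np.clip(X, lb, ub)                       # keep within bounds
```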

The pseudocode of the SSA algorithm is as follows.

Algorithm 1. Salp Swarm Algorithm

[Pseudocode given as a figure in the original.]

3. Improved binary salp swarm algorithm (IBSSA)

3.1. Population initialization strategy based on the cubic chaotic map

Chaotic sequences have the advantages of easy implementation, short execution time, and the ability to jump out of local optima, so they are widely used in randomness-based optimization algorithms. The Lyapunov exponent is often used to judge the dynamic performance of a system: the larger the value, the higher the degree of chaos. FENG et al. analyzed the chaotic sequences generated by 16 common chaotic maps [41], and the results showed that the running time of the cubic chaotic map is short and its Lyapunov exponent is close to the optimal value. In this paper, the cubic chaotic map is used to optimize the initial solutions and improve the search efficiency.

The expression of the standard cubic chaotic mapping function is:

$$x_{n+1} = \alpha x_n^3 - \beta x_n \quad (11)$$

where α and β are chaos influencing factors, and the range of the cubic map differs for different α and β values. Generally, when β ∈ (2.3, 3), the sequence generated by the cubic map is chaotic. In addition, when α = 1, $x_n \in (-2, 2)$; when α = 4, $x_n \in (-1, 1)$. To make $x_n \in (0, 1)$, the cubic map used in the improved algorithm takes the following form:

$$x_{n+1} = \rho x_n \left( 1 - x_n^2 \right), \quad x_n \in (0, 1) \quad (12)$$

where ρ is the control parameter. The chaos of the cubic map is closely related to the value of ρ. Here, we take the initial value $x_0 = 0.3$ and 10000 iterations. The simulation results of the cubic map are shown in Fig. 1.

[Fig. 1. Cubic mapping when ρ = 2.59, x0 = 0.3.]

It can be seen from the figure that when ρ = 2.59, the cubic map fully covers (0,1) and has the best chaotic ergodicity.

3.2. Binary mechanism

In practice, according to the type of solution, a problem has either a continuous or a discrete solution space, while the standard salp swarm algorithm can only use position vectors in the continuous domain to move around the search space. The transformation from the continuous to the discrete solution space can be performed through a specific transfer function, generally a sigmoid transfer function. At the same time, the positions of the salps may stay at some local points and remain unchanged when the values are large. To avoid this weakness, the sigmoid transfer function is used here [42], using the particle's velocity to give the probability of changing the value of an element:

$$S\left( x_i^j(t) \right) = \frac{1}{1 + \exp\left( -x_i^j(t) \right)} \quad (13)$$

where $x_i^j(t)$ is the velocity of the i-th individual in the j-th dimension at time t, and $S(x_i^j(t))$ is the probability that the position $x_i^j(t)$ takes 1 or 0.

After calculating the transition probability, equation (14) is used to update the position of the salps:

$$x_i^j(t) = \begin{cases} 1, & rand \geq S\left( x_i^j(t) \right) \\ 0, & rand < S\left( x_i^j(t) \right) \end{cases} \quad (14)$$

where $x_i^j(t)$ is the position of the i-th salp in the j-th dimension at time t. $X_{max}$ is selected as the maximum position value to limit the range of $x_i^j(t)$, that is, $x_i^j(t) \in [-X_{max}, X_{max}]$, which also limits the probability that the position $x_i^j(t)$ is converted to 1 or 0.
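The two mechanisms of Sections 3.1 and 3.2 are easy to prototype. Below is an illustrative Python sketch (our own, under the stated assumptions) of the cubic-map initialization of Eq. (12) and the sigmoid binarization of Eqs. (13)-(14); the thresholding follows the inequality exactly as printed in Eq. (14).

```python
import numpy as np

def cubic_map_init(n_agents, dim, rho=2.59, x0=0.3):
    """Population initialization with the cubic chaotic map, Eq. (12)."""
    pop = np.empty((n_agents, dim))
    x = x0
    for i in range(n_agents):
        for j in range(dim):
            x = rho * x * (1.0 - x ** 2)            # Eq. (12), x stays in (0, 1)
            pop[i, j] = x
    return pop

def binarize(X):
    """Sigmoid transfer (Eq. (13)) and thresholding (Eq. (14), as printed)."""
    S = 1.0 / (1.0 + np.exp(-X))
    return (np.random.rand(*X.shape) >= S).astype(int)
```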

3.3. Follower position updating strategy based on the variable helix mechanism

In the salp swarm algorithm, the location update of the i-th follower is determined by the location coordinates of the i-th and (i−1)-th salps; this update rule depends only on the positions of the previous individual and the current individual in the salp chain. The updated followers are therefore highly dependent on the leader individuals of the previous update, which easily limits the global search ability and local search speed of the algorithm. To solve this problem, the variable helix factor is introduced, which makes full use of the individual's opposite solution about the origin, reduces the number of individuals beyond the boundary, and ensures the algorithm has a detailed and flexible search ability.

The variable helix factor is calculated as follows:

$$H = a \cdot \cos(k \cdot l \cdot \pi) \quad (15)$$

$$a = \begin{cases} 1, & t < \dfrac{M}{2} \\ e^{5 \cdot l}, & \text{otherwise} \end{cases} \quad (16)$$

$$l = 1 - 2 \cdot \frac{t}{M} \quad (17)$$

where H represents the variable helix factor; a is the parameter used to control the spiral, whose value is close to 1 in early iterations and gradually decreases in later iterations; k is the parameter representing the spiral cycle, with value M/10; and l is a parameter that decreases linearly from 1 to −1 as the number of iterations increases.

This improvement of the follower's search enables the follower to make full use of the entire search space, escape the attraction of local optima more easily, strengthen the search of the entire space, maintain the diversity of the population, enhance the exploration ability of the algorithm in early iterations, and improve its exploitation ability in later iterations. Based on this, the follower update formula becomes:

$$x_d^i = \begin{cases} \dfrac{1}{2} \cdot \cos(a \cdot l \cdot \pi) \cdot \left( x_d^i + x_d^{i-1} \right), & t < \dfrac{M}{2} \\ \dfrac{1}{2} \cdot e^{5 \cdot l} \cdot \cos(a \cdot l \cdot \pi) \cdot \left( x_d^i + x_d^{i-1} \right), & t \geq \dfrac{M}{2} \end{cases} \quad (18)$$
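Below is a hedged Python sketch of the helix-based follower update of Eqs. (15)-(18), together with the dimension-wise random difference mutation that Section 3.4 below defines in Eq. (19). Note that the printed Eqs. (15) and (18) disagree on the cosine argument (k·l·π vs. a·l·π); the sketch follows Eq. (15). The maximization-style acceptance test and all function names are our assumptions, not the paper's MATLAB code.

```python
import numpy as np

def helix_follower_update(x_i, x_prev, t, M):
    """Follower update with the variable helix factor, Eqs. (15)-(18)."""
    l = 1.0 - 2.0 * t / M                           # Eq. (17): 1 -> -1
    k = M / 10.0                                    # spiral cycle parameter
    a = 1.0 if t < M / 2 else np.exp(5.0 * l)       # Eq. (16)
    H = a * np.cos(k * l * np.pi)                   # Eq. (15)
    return 0.5 * H * (x_i + x_prev)                 # Eq. (18)

def dim_random_diff_mutation(x, food, x_rand, score):
    """Dimension-by-dimension random difference mutation, Eq. (19).

    Each dimension is mutated in turn and the change is kept only if
    the fitness score improves (here: higher is better).
    """
    best = x.copy()
    s_best = score(best)
    for j in range(len(x)):
        r1, r2 = np.random.rand(2)
        trial = best.copy()
        trial[j] = r1 * (food[j] - best[j]) + r2 * (x_rand[j] - best[j])  # Eq. (19)
        if score(trial) > s_best:                   # keep only helpful dimensions
            best, s_best = trial, score(trial)
    return best
```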
3.4. Dimensional random difference mutation

Random difference mutation is used to mutate individuals dimension by dimension, obtaining a new value for each dimension through the mutation. The specific formula is:

$$x_i^j = r_1 \times \left( F_j - x_i^j \right) + r_2 \times \left( x_j' - x_i^j \right) \quad (19)$$

where $x_i^j$ is the j-th dimension of the i-th individual in the salp group; $F_j$ is the j-th dimension of the food source location; $x_j'$ is the j-th dimension of a random individual in the population; and $r_1$ and $r_2$ are random numbers in [0,1]. After the population location update is completed, dimension-by-dimension random differential mutation is used to mutate each dimension of the individual, and each dimension is evaluated after it mutates. If the result is better, the mutated solution is retained; if the evaluation result after the mutation becomes worse, the poor dimension information is discarded. This reduces the interference between dimensions and increases the search scope. Due to the blindness of the mutation operation, the search efficiency of the algorithm would be reduced and the computational cost greatly increased if all individuals were subjected to dimensional random differential mutation. Therefore, only the best and worst individuals in the population are selected for mutation: mutating the best individual improves the search efficiency, and mutating the worst individual widens the search range and helps jump out of local optima.

4. Proposed IBSSA-FKNN model

On the one hand, in FKNN the distance weights of the k neighbors are calculated from distance measures alone, without distinguishing the importance of features and without taking into account the impact of the different distances from the different neighbors to the center of the sample class. Moreover, FKNN needs the distances to all training set samples in order to find the k neighbors, resulting in a large amount of computation. In view of these problems, many scholars have studied and improved FKNN, mainly addressing the parameter selection or optimization problems involved in the algorithm. These improved methods raise accuracy and reliability to a certain extent, but still have shortcomings. On the other hand, SSA has strong optimization ability and high optimization accuracy, but on complex problems it can also fall into local extrema. Therefore, first, the Cubic mapping method is used to initialize the population, so that the initial salp population covers the feasible region more evenly. Secondly, the variable helix factor is introduced, which makes full use of the individual's opposite solution about the origin, reduces the number of individuals beyond the boundary, and ensures the algorithm has a detailed and flexible search ability. Finally, the best and the worst of the updated individuals are selected to carry out dimensional random difference mutation. To further study the role of this method on practical problems, this paper discretizes it into the binary ISSA (IBSSA) and applies it to feature selection with the goal of finding the optimal feature subset.

In this section, we apply the IBSSA algorithm for feature selection to the original FKNN and create a model called IBSSA-FKNN. The main goals of this model in optimizing the FKNN classifier are: (1) determine the number of nearest neighbors k and the fuzzy strength parameter m; (2) identify the best subset of discriminative features through feature selection. The obtained feature subset is used as input to the optimized FKNN model for classification. The IBSSA-FKNN method takes diagnostic accuracy as the fitness for feature selection. The flowchart of the overall architecture of the proposed IBSSA-FKNN model is shown in Fig. 2.

[Fig. 2. Flowchart of the proposed IBSSA-FKNN diagnostic system.]

A flag vector for feature selection is shown in Fig. 3. The vector, consisting of a series of binary values 0 and 1, represents a subset of features, that is, an actual feature vector, which has been normalized. For a problem with D dimensions, there are D bits in the vector. The i-th feature is selected if the value of the i-th bit equals one; otherwise, the feature is not selected (i = 1, 2, …, D). The size of a feature subset is the number of bits whose values are one in the vector. The pseudocode of the IBSSA algorithm is presented in Algorithm 2.

[Fig. 3. A flag vector for feature selection.]

Algorithm 2. Pseudo-code for the feature selection procedure

[Pseudocode given as a figure in the original.]
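Since the pseudocode itself is reproduced as a figure, the following Python fragment sketches the core of the wrapper evaluation: the fitness of a 0/1 flag vector is the 10-fold cross-validation accuracy on the selected feature columns. A plain KNN stands in for FKNN here, and `feature_subset_fitness` is a hypothetical helper, not the paper's MATLAB routine.

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier  # stand-in for FKNN

def feature_subset_fitness(flag, X, y, k=5):
    """Fitness of a binary flag vector: mean 10-fold CV accuracy on
    the selected columns (diagnostic accuracy, as in IBSSA-FKNN)."""
    cols = np.flatnonzero(flag)
    if cols.size == 0:                  # an empty subset is invalid
        return 0.0
    clf = KNeighborsClassifier(n_neighbors=k)
    return cross_val_score(clf, X[:, cols], y, cv=10).mean()
```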
After the parameter pair and feature subset were obtained, the FKNN model performed the classification tasks. First, the FKNN was trained on the reduced training feature space using the parameter pair to evolve an optimal model, and then the optimal FKNN model was employed to predict the new samples on the reduced testing feature space. The whole process was done via 10-fold CV analysis, and finally the average results over the 10 folds were computed. The detailed pseudo-code for the classification phase is as follows.

Algorithm 3. Pseudo-code for the classification procedure

[Pseudocode given as a figure in the original.]

5. Traditional dataset experiments

5.1. Dataset description

In order to verify the effectiveness of the proposed method, this section conducts experiments with the IBSSA-FKNN method on 5 classification datasets: BreastCancer, glass, hepatitisfulldata, Lymphography, and WDBC. The datasets are from the UCI Machine Learning Repository (http://archive.ics.uci.edu/ml/datasets). Among them, the BreastCancer dataset has 699 samples with 9 features and 2 categories; the glass dataset has 214 samples with 10 features and 2 categories; the hepatitisfulldata dataset has 155 samples with 20 features and 2 categories; the Lymphography dataset has 148 samples with 18 features and 4 categories; and the WDBC dataset has 569 samples with 30 features and 2 categories. The detailed description of the datasets is shown in Table 1.
Before the experiments, the data need to be preprocessed. Since the BreastCancer dataset has missing feature values, the missing entries are replaced in this experiment by the average value of the corresponding records, in order to ensure the integrity of the sample data. At the same time, in order to reduce the differences between eigenvalues and prevent larger eigenvalues from excessively affecting smaller ones, we normalize each eigenvalue to the [−1, 1] interval. The normalization formula is:

$$x^{*} = \frac{x - \min_a}{\max_a - \min_a} \times 2 - 1 \quad (20)$$

where x is the original value of the data, $x^{*}$ is the normalized value, $\max_a$ is the maximum value in feature a, and $\min_a$ is the minimum value in feature a.

5.2. Experimental setup and description

The proposed IBSSA-FKNN method is implemented on the MATLAB 2018b platform. The experiments are performed on an NVIDIA GeForce GTX 1660 with Windows 10 as the operating system. The detailed parameters of IBSSA-FKNN are set as follows: the population size is 20, and the maximum number of iterations is set to 1000. In order to verify the effectiveness of the improved IBSSA algorithm in feature selection, a total of 5 comparison algorithms are set up: the Binary Bat Algorithm (BBA) [43], the Binary Moth Flame Optimizer (BMFO) [44], the Quantum Gaussian Dragonfly Algorithm (QGDA) [45], the Binary Quantum Grey Wolf Optimization Algorithm (BQGWO) [46], and the Binary Spread Strategy with Chaotic Local Search Grey Wolf Optimization (BSCGWO) [47]. The parameter settings of the comparison swarm intelligence optimization algorithms are shown in Table 2.

The experiments are mainly carried out using the wrapper feature selection method. During the experiments, the IBSSA algorithm is used to generate feature subsets, and the resulting feature subsets are evaluated using the results obtained by the FKNN classifier. In the feature selection process, the IBSSA algorithm realizes the search through a ten-fold cross-validation strategy and applies it to the practical problems through the KNN model. K-fold cross-validation is mainly used to obtain an unbiased estimate of the generalization accuracy. If K is set to 10, the dataset is divided into 10 subsets, one of which is taken as the test set while the remaining part is taken as the training set; the average error over all 10 tests is then calculated. During the implementation of the K-fold cross-validation strategy, all test sets are independent, so relatively stable and reliable results can be obtained. In addition, this section uses the IBSSA algorithm to generate the optimal feature subset on the training set and then uses the validation dataset filtered by the optimal feature subset for classification with the FKNN classifier to obtain the final result. In the subsequent experiments, the best results for each evaluation indicator are bolded in the tables.

Table 1
Detailed description of the datasets.

NO.  dataset            categories  samples  features  missing values?
1    BreastCancer       2           699      9         yes
2    glass              2           214      10        no
3    hepatitisfulldata  2           155      20        no
4    Lymphography       4           148      18        no
5    WDBC               2           569      30        no

Table 2
Parameter settings of the swarm intelligence optimization algorithms.

Algorithm  Parameters
BBA        [fmin, fmax] = [0, 2]; A = 0.5; r = 0.5; α = 0.95; γ = 0.05
BMFO       b = 1
BQGWO      a = 2 − FEs × (2/MaxFEs); r1 = r2 = rand(0, 1); A = 2 × a × r1; C = 2 × r2; β = ω = 10
BSCGWO     a = 2 − FEs × (2/MaxFEs); r1 = r2 = rand(0, 1); A = 2 × a × r1; C = 2 × r2; β = a ∗ rand(0, 1)
IBSSA      c1 = 2 × e^(−(4 × FEs / MaxFEs)^2); c2 = c3 = rand(0, 1); ρ = 2.59; x0 = 0.3

5.3. Evaluation criteria

The classification performance of the method is evaluated with the classification Accuracy (ACC), Sensitivity (SEN), Specificity (SPE), Precision (PRE), and F-measure, defined as follows.

Accuracy is the proportion of correct predictions in the total number of predictions:

$$ACC = \frac{TP + TN}{TP + TN + FN + FP} \times 100\% \quad (21)$$

Sensitivity is an index used to measure the classifier's recognition of abnormal records, often also called the TP rate:

$$SEN = \frac{TP}{TP + FN} \times 100\% \quad (22)$$

Specificity is often used to estimate the ability of a classification model to identify normal examples, often also called the TN rate:

$$SPE = \frac{TN}{TN + FP} \times 100\% \quad (23)$$

Precision is the correct proportion of predicted positive instances:

$$PRE = \frac{TP}{TP + FP} \times 100\% \quad (24)$$

Here, TP (True Positive), FP (False Positive), TN (True Negative), and FN (False Negative) denote the numbers of true positives, false positives, true negatives, and false negatives, respectively.

Lewis and Gale proposed the F-measure in 1994, which is defined as follows:

$$F_{\beta} = \frac{\left( \beta^2 + 1 \right) \ast Precision \ast Sensitivity}{\beta^2 \ast Precision + Sensitivity} \quad (25)$$

In Equation (25), β takes a value from 0 to infinity and controls the weights assigned to precision and sensitivity. If all positive instances are classified incorrectly, any classifier evaluated with this measure has a value of 0. In this experiment, the β value was set to 1.
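For reference, the evaluation pipeline of Eqs. (20)-(25) reduces to a few lines. The sketch below is our own Python (not the paper's MATLAB code) and bundles the [-1, 1] normalization with the confusion-matrix metrics; it assumes non-degenerate columns and non-empty classes.

```python
import numpy as np

def normalize(X):
    """Column-wise scaling of the raw features to [-1, 1], Eq. (20)."""
    mn, mx = X.min(axis=0), X.max(axis=0)
    return (X - mn) / (mx - mn) * 2.0 - 1.0

def classification_metrics(tp, tn, fp, fn, beta=1.0):
    """ACC, SEN, SPE, PRE, and F-measure of Eqs. (21)-(25), as fractions."""
    acc = (tp + tn) / (tp + tn + fp + fn)                   # Eq. (21)
    sen = tp / (tp + fn)                                    # Eq. (22)
    spe = tn / (tn + fp)                                    # Eq. (23)
    pre = tp / (tp + fp)                                    # Eq. (24)
    f = (beta**2 + 1) * pre * sen / (beta**2 * pre + sen)   # Eq. (25)
    return acc, sen, spe, pre, f
```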

Table 3
The detailed results of the IBSSA-FKNN model on the BreastCancer dataset.

Runs of 10-fold CV  ACC      SEN      SPE      PRE      No. of selected features
#1                  1        1        1        1        4
#2                  1        1        1        1        4
#3                  1        1        1        1        4
#4                  0.98571  1        1        0.97872  4
#5                  1        1        0.95833  1        4
#6                  1        1        1        1        4
#7                  1        1        1        1        3
#8                  0.98571  0.97826  1        1        3
#9                  0.98551  0.97778  1        1        4
#10                 0.97183  0.95652  1        1        5
Mean                0.9928   0.9913   0.9958   0.9979   3.9

Table 4
The detailed results of the IBSSA-FKNN model on the glass dataset.

Runs of 10-fold CV  ACC      SEN  SPE  PRE      No. of selected features
#1                  0.95238  0    0    0.95238  3
#2                  0.85     0    0    0.85     3
#3                  0.90476  0    0    0.90476  4
#4                  0.80952  0    0    0.80952  3
#5                  0.90909  0    0    0.90909  3
#6                  0.95     0    0    0.95     4
#7                  0.86957  0    0    0.86957  0
#8                  0.90909  0    0    0.90909  0
#9                  0.78261  0    0    0.78261  4
#10                 0.90476  0    0    0.90476  6
Mean                0.8842   0    0    0.8842   3

Table 5
The detailed results of the IBSSA-FKNN model on the hepatitisfulldata dataset.

Runs of 10-fold CV  ACC  SEN  SPE  PRE  No. of selected features
#1                  1    1    1    1    2
#2                  1    1    1    1    5
#3                  1    1    1    1    4
#4                  1    1    1    1    2
#5                  1    1    1    1    2
#6                  1    1    1    1    3
#7                  1    1    1    1    3
#8                  1    1    1    1    5
#9                  1    1    1    1    4
#10                 1    1    1    1    1
Mean                1    1    1    1    3.1

Table 6
The detailed results of the IBSSA-FKNN model on the Lymphography dataset.

Runs of 10-fold CV  ACC      SEN  SPE  PRE      No. of selected features
#1                  1        0    0    1        4
#2                  0.9375   0    0    0.9375   3
#3                  1        0    0    1        3
#4                  1        0    0    1        7
#5                  1        0    0    1        7
#6                  1        0    0    1        7
#7                  1        0    0    1        3
#8                  1        0    0    1        6
#9                  0.92857  0    0    0.92857  3
#10                 1        0    0    1        8
Mean                0.9866   1    1    0.9866   5.1

Table 7
The detailed results of the IBSSA-FKNN model on the WDBC dataset.

Runs of 10-fold CV  ACC      SEN      SPE  PRE  No. of selected features
#1                  0.98276  0.95455  1    1    13
#2                  1        1        1    1    5
#3                  1        1        1    1    9
#4                  1        1        1    1    2
#5                  0.98246  1        1    1    2
#6                  1        1        1    1    6
#7                  1        1        1    1    8
#8                  1        1        1    1    3
#9                  1        1        1    1    4
#10                 1        1        1    1    3
Mean                0.9965   0.9955   1    1    5.5

Table 8
Experimental results of the six methods on the BreastCancer dataset.

Algorithm     Features' size  ACC (%)  SEN (%)  SPE (%)  PRE (%)  F-measure (%)
IBSSA-FKNN    3.3             0.9929   0.9913   0.9958   0.9979   0.9945
BBA-FKNN      3.8             0.9411   0.9453   0.9333   0.9650   0.9543
BMFO-FKNN     3.6             0.9871   0.9847   0.9917   0.9957   0.9901
QGDA-FKNN     3.6             0.9872   0.9847   0.9920   0.9957   0.9901
BQGWO-FKNN    4.1             0.9885   0.9913   0.9833   0.9914   0.9913
BSCGWO-FKNN   3.4             0.9857   0.9869   0.9833   0.9914   0.9890

Table 9
Experimental results of the six methods on the glass dataset.

Algorithm     Features' size  ACC (%)  SEN (%)  SPE (%)  PRE (%)  F-measure (%)
IBSSA-FKNN    3               0.8842   0        0        0.8842   0
BBA-FKNN      4.2             0.6856   0        0        0.6856   0
BMFO-FKNN     3.7             0.8788   0        0        0.8788   0
QGDA-FKNN     4.3             0.8773   0        0        0.8773   0
BQGWO-FKNN    3.9             0.8595   0        0        0.8595   0
BSCGWO-FKNN   3.7             0.8744   0        0        0.8744   0

Table 10
Experimental results of the six methods on the hepatitisfulldata dataset.

Algorithm     Features' size  ACC (%)  SEN (%)  SPE (%)  PRE (%)  F-measure (%)
IBSSA-FKNN    3.1             1        1        1        1        1
BBA-FKNN      8.2             0.8410   0.6417   0.8949   0.6683   0.6309
BMFO-FKNN     7.6             1        1        1        1        1
QGDA-FKNN     3.6             1        1        1        1        1
BQGWO-FKNN    3.9             0.9875   0.9750   0.9923   0.9750   0.9714
BSCGWO-FKNN   3.7             0.9933   0.9667   1        1        0.9800

Table 11
Experimental results of the six methods on the Lymphography dataset.

Algorithm     Features' size  ACC (%)  SEN (%)  SPE (%)  PRE (%)  F-measure (%)
IBSSA-FKNN    5.1             0.9866   0        0        0.9866   0
BBA-FKNN      7.1             0.8268   0        0        0.8268   0
BMFO-FKNN     6.7             0.9749   0        0        0.9749   0
QGDA-FKNN     5.7             0.9799   0        0        0.9799   0
BQGWO-FKNN    4.5             0.9804   0        0        0.9804   0
BSCGWO-FKNN   4               0.9518   0        0        0.9518   0

Table 12
Experimental results of the six methods on the WDBC dataset.

Algorithm     Features' size  ACC (%)  SEN (%)  SPE (%)  PRE (%)  F-measure (%)
IBSSA-FKNN    5.5             0.9965   0.9955   1        1        0.9953
BBA-FKNN      10              0.9474   0.9288   0.9583   0.9353   0.9297
BMFO-FKNN     12.5            0.9965   0.9952   0.9972   0.9955   0.9952
QGDA-FKNN     5.7             0.9982   0.9952   0.9972   0.9955   0.9977
BQGWO-FKNN    4.1             0.9948   0.9907   0.9972   0.9952   0.9930
BSCGWO-FKNN   4.3             0.9947   0.9859   1        1        0.9928

[Fig. 4. The frequency of selected features in 10-fold CV on the BreastCancer dataset.]
[Fig. 5. The frequency of selected features in 10-fold CV on the glass dataset.]
[Fig. 6. The frequency of selected features in 10-fold CV on the hepatitisfulldata dataset.]
[Fig. 7. The frequency of selected features in 10-fold CV on the Lymphography dataset.]
[Fig. 8. The frequency of selected features in 10-fold CV on the WDBC dataset.]

5.4. Experimental results

Tables 3, 4, 5, 6, and 7 show the results of the IBSSA-FKNN algorithm over 10 runs of 10-fold cross-validation on the 5 datasets, respectively. The performance evaluation criteria include the training classification Accuracy (ACC), Sensitivity (SEN), Specificity (SPE), Precision (PRE), and the number of selected features. As can be seen from Table 3, which records the detailed experimental results of the IBSSA-FKNN method on the BreastCancer dataset, the average values of the five evaluation indicators over the 10-fold runs are 0.9928, 0.9913, 0.9958, 0.9979, and 3.9, respectively. Similarly, Table 4 records the detailed results on the glass dataset, with averages of 0.8842, 0, 0, 0.8842, and 3, respectively. Table 5 records the detailed results on the hepatitisfulldata dataset, with averages of 1, 1, 1, 1, and 3.1, respectively. Table 6 records the detailed results on the Lymphography dataset, with averages of 0.9866, 1, 1, 0.9866, and 5.1, respectively. Table 7 records the detailed results on the WDBC dataset, with averages of 0.9965, 0.9955, 1, 1, and 5.5, respectively.

This section compares the IBSSA algorithm with the other 5 metaheuristic optimization algorithms on the 5 datasets to test its performance on feature selection problems. Tables 8–12 record the mean selected feature length, classification accuracy, sensitivity, specificity, precision, and F-measure obtained by the BMFO, BBA, QGDA, BQGWO, BSCGWO, and IBSSA algorithms under 10-fold cross-validation.

It can be seen from the experimental results in Tables 8–12 that, for the IBSSA algorithm, only on the BreastCancer dataset is the sensitivity index slightly inferior to the other algorithms. On the other four datasets, including glass, the algorithm achieves the best selected feature length, classification accuracy, sensitivity, specificity, precision, and F-measure. For example, on the BreastCancer dataset, the IBSSA algorithm obtained the optimal average number of selected features of 3.3, the optimal average classification accuracy of 99.29%, the optimal average sensitivity of 99.13%, the optimal average specificity of 99.58%, the optimal average precision of 99.79%, and the optimal average F-measure of 99.45%. The experimental results show that the IBSSA algorithm improves the classification accuracy, sensitivity, specificity, precision, and F-measure of the selected feature subsets to a certain extent. It is worth noting that even where the algorithm does not stand out in classification accuracy, it performs better in reducing the data dimension.

In order to explore how many and which features are selected in the feature selection process, we further conduct experiments on the 5 datasets to investigate the details of the feature selection mechanism of the salp swarm optimization algorithm. Figs. 4–8 show the number of times each feature is selected in the 10-fold cross-validation experiments of the IBSSA-FKNN method. From these figures, we can find that some features are selected more often, while others are selected less often.

The 10-fold selected features on the BreastCancer dataset are shown in Fig. 4. The original number of features in the BreastCancer dataset is 9. After feature selection, not all features are selected for classification. The average number of selected features for the IBSSA-FKNN method is 3.3, and its most important features are F1, F3, F4, and F7, i.e., clump thickness, cell shape uniformity, marginal adhesion, and bland chromatin, all selected more than 5 times; we therefore consider that these features can be used as a reference for distinguishing breast cancer, as can be seen in the 10-fold CV selection frequencies.

The 10-fold selected features on the glass dataset are shown in Fig. 5. The original number of features in the glass dataset is 9. After feature selection, not all features are selected for classification. The average number of selected features of the IBSSA-FKNN method is 3.6, and its most important features are F1, F4, F6, F7, and F8, all selected 4 times or more; we therefore consider that these features can be used as a reference for distinguishing, as can be seen in the 10-fold CV selection frequencies.

The 10-fold selected features on the hepatitisfulldata dataset are shown in Fig. 6. The original number of features in the hepatitisfulldata dataset is 19. After feature selection, not all features are selected for classification. The average number of selected features of the IBSSA-FKNN method is 3.3, and its most important features are F1, F2, F3, and F11, all selected 3 times or more; we therefore consider that these features can be used as a reference for distinguishing, as can be seen in the 10-fold CV selection frequencies.

The 10-fold selected features on the Lymphography dataset are shown in Fig. 7. The original number of features in the Lymphography dataset is 17. After feature selection, not all features are selected for classification. The average number of selected features of the IBSSA-FKNN method is 3.6, and its most important features are F2, F7, F11, F13, and F14, all selected 4 times or more; we therefore consider that these features can be used as a reference for distinguishing, as can be seen in the 10-fold CV selection frequencies.

The 10-fold selected features on the WDBC dataset are shown in Fig. 8. The original number of features on the WDBC dataset is 29. After feature selection, not all features are selected for classification. The average number of selected features of the IBSSA-FKNN method is 4.3, and its most important features are F1, F2, F3, F14, F17, F21, and F24, all selected 3 times or more; we therefore consider that these features can be used as a reference for distinguishing, as can be seen in the 10-fold CV selection frequencies.
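The frequency statistics behind Figs. 4-8 amount to summing the 0/1 flag vectors over the folds. A minimal sketch follows (the helper name and the three-fold example data are hypothetical):

```python
import numpy as np

def selection_frequency(flag_vectors):
    """Count how often each feature is selected across CV folds.

    flag_vectors: one 0/1 array per fold, as produced by the IBSSA
    feature selection. Returns per-feature counts, i.e. the data
    behind bar charts like Figs. 4-8.
    """
    return np.asarray(flag_vectors).sum(axis=0)

# Example with three hypothetical folds over a 5-feature problem:
freq = selection_frequency([[1, 0, 1, 0, 0],
                            [1, 0, 0, 1, 0],
                            [1, 1, 0, 1, 0]])
print(freq)  # -> [3 1 1 2 0]: F1 was selected in every fold
```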

6. sMRI dataset experiment

6.1. Dataset description

The experimental data are obtained from the international Alzheimer's disease neuroimaging initiative (ADNI) database (http://adni.loni.usc.edu/). ADNI was established in 2003 by the National Institute on Aging (NIA), the National Institute of Biomedical Imaging and Bioengineering (NIBIB), the Food and Drug Administration (FDA), private pharmaceutical enterprises, and non-profit organizations. Its main goal is to test whether the progression of MCI and early AD can be measured by combining MRI, PET, other biomarkers, and clinical neuropsychological evaluation. The database contains several data modalities, including time-series MRI image data, PET image data, and other types of biomarker values, such as CSF, as well as clinical neuropsychological assessment scores, such as the mini-mental state examination (MMSE) and the Alzheimer's disease assessment scale-cognitive (ADAS-Cog). The data categories are mainly: patients with early AD, patients with mild cognitive impairment (MCI), and the cognitively normal control group (NC). Mild cognitive impairment (MCI) is usually considered an early stage of AD, a transition state from normal control (NC) to AD; late-stage MCI in particular is likely to develop into AD. Therefore, MCI is generally divided into MCI converted to AD (MCI patients who will convert to AD, MCI-C) and MCI not converted to AD (MCI patients who will not convert to AD, MCI-NC). The subjects of the ADNI database were recruited from 50 sites across the United States and Canada. The initial goal was to recruit 800 adult volunteers ranging in age from 55 to 90 years old; among them, 200 were elderly people with normal cognition followed up for three consecutive years, 400 were patients with mild cognitive impairment followed up for three consecutive years, and 200 were patients with AD followed up for two consecutive years. The basic personal information of these subjects can be obtained from the official website of ADNI.

In this paper, sample data of subjects having all three modalities, MRI, PET, and CSF, are selected for the experiment, and only the data collected at the baseline time point for these subjects are used. In the ADNI database, 202 subjects have all three modalities at the same time. Table 13 lists the demographic information of these subjects.

Table 13
Subject information (mean ± std).

category  Number of subjects  Age         Years of Education  MMSE        ADAS-Cog
AD        51                  75.2 ± 7.4  14.7 ± 3.6          23.8 ± 2.0  18.3 ± 6.0
NC        52                  75.3 ± 5.2  15.8 ± 3.2          29.0 ± 1.2  7.4 ± 3.2
MCI-C     43                  75.8 ± 6.8  16.1 ± 2.6          26.6 ± 1.7  12.9 ± 3.9
MCI-NC    56                  74.7 ± 7.7  16.1 ± 3.0          27.5 ± 1.5  10.2 ± 4.3

6.2. Experimental setup and description

This paper adopts a 10-fold cross-validation strategy to evaluate the classification performance of the proposed method. Specifically, the sample set is divided evenly into 10 pieces, each of which is selected in turn as the test set, with the remaining 9 pieces used as the training set. The features' size, average accuracy, sensitivity, specificity, and F-measure of these 10 experiments are calculated as the experimental results of one division. Then the order of the samples is randomly shuffled, the 10-fold cross-validation division is performed once more, and the features' size, average accuracy, sensitivity, specificity, and F-measure are calculated again. The division is repeated 10 times, and the features' size, average accuracy, sensitivity, specificity, and F-measure over these 10 divisions are computed. The experiments adopt the two-class setting (AD/MCI, AD/NC, and MCI/NC) to fully verify the influence of the different classification tasks on the experimental results.

In order to verify the performance of the method proposed in this paper for the diagnosis of early AD, it is compared with five classification methods that likewise combine a swarm intelligence optimization algorithm with the FKNN classifier.

6.3. Experimental results and discussion

In order to verify the performance of the IBSSA-FKNN method proposed in this paper for early AD diagnosis, it is compared with other methods of swarm intelligence optimization combined with classifiers.

Table 14
Classification results of the different methods for AD/MCI, AD/NC, and MCI/NC on multimodal data.

AD vs. MCI
Algorithm     Features' size  ACC (%)  SEN (%)  SPE (%)  PRE (%)  F-measure (%)
IBSSA-FKNN    11.5            0.9537   0.99     0.9233   0.9627   0.9657
BBA-FKNN      73.2            0.5608   0.6656   0.36     0.6696   0.6638
BMFO-FKNN     117             0.8803   0.9289   0.79     0.8999   0.9109
QGDA-FKNN     44.7            0.9533   0.98     0.86     0.9409   0.9574
BQGWO-FKNN    8.5             0.9667   0.9889   0.92     0.9642   0.9761
BSCGWO-FKNN   5.6             0.9667   0.97     0.94     0.9727   0.9747

AD vs. NC
Algorithm     Features' size  ACC (%)  SEN (%)  SPE (%)  PRE (%)  F-measure (%)
IBSSA-FKNN    11.4            1        1        1        1        1
BBA-FKNN      76.5            0.8145   0.8      0.83     0.8223   0.8037
BMFO-FKNN     103.6           0.96     0.94     0.98     0.98     0.9578
QGDA-FKNN     32.3            0.97     0.98     0.96     0.9667   0.9707
BQGWO-FKNN    20.8            0.9909   0.98     1        1        0.9889
BSCGWO-FKNN   2.6             0.99     0.98     1        0.9817   0.9889

MCI vs. NC
Algorithm     Features' size  ACC (%)  SEN (%)  SPE (%)  PRE (%)  F-measure (%)
IBSSA-FKNN    27.3            0.9395   0.9789   0.94     0.9718   0.9555
BBA-FKNN      76.5            0.7432   0.8      0.6367   0.8100   0.8024
BMFO-FKNN     103.6           0.8686   0.9089   0.79     0.8994   0.8970
QGDA-FKNN     39.3            0.9252   0.9478   0.88     0.9436   0.9437
BQGWO-FKNN    21.6            0.9354   0.9589   0.8567   0.9292   0.9527
BSCGWO-FKNN   5.8             0.9137   0.92     0.9067   0.9496   0.9314

Table 15
Sample sizes and classification results of AD prediction and diagnosis methods.

Literature [49]: SVM with Gaussian kernel; baseline MRI: 198 AD, 409 MCI (pMCI and sMCI), 231 NC; baseline MRI accuracy: AD vs. NC 87.9%, pMCI vs. NC 83.2%, pMCI vs. sMCI 70.4%.
Literature [50]: Bagging algorithm and SVM; 56 AD, 60 MCI, 60 NC; accuracy: AD vs. NC 89%, MCI vs. NC 72%.
Literature [51]: deep fully connected network and stacked auto-encoder; 65 AD, 67 cMCI, 102 ncMCI, 77 HC; accuracy: AD vs. HC 87.76%, MCI vs. HC 76.92%.
Literature [52]: graph-based multi-instance learning; 198 AD, 238 sMCI, 167 pMCI, 234 NC; accuracy: AD vs. NC 88.8%, pMCI vs. sMCI 69.6%.
Literature [53]: Independent Component Analysis (ICA) and SVM; 202 AD, 410 MCI, 236 NC; with 75% of data in the training set: AD vs. NC 78.4%, MCI vs. NC 71.2%; with 90% of data in the training set: AD vs. NC 85.7%, MCI vs. NC 79.2%.
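Before turning to the five comparison methods, here is a hedged Python sketch of the repeated 10-fold cross-validation protocol described in Section 6.2; the `evaluate` callback and the seeding scheme are our assumptions, not the authors' MATLAB code.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold

def repeated_10fold(X, y, evaluate, repeats=10, seed=0):
    """Section 6.2 protocol: shuffle, split into 10 folds, average the
    per-fold scores, and repeat the whole division 10 times.

    evaluate(X_tr, y_tr, X_te, y_te) is a hypothetical callback that
    trains the model and returns a score (e.g. accuracy).
    """
    rng = np.random.RandomState(seed)
    division_means = []
    for _ in range(repeats):
        skf = StratifiedKFold(n_splits=10, shuffle=True,
                              random_state=rng.randint(1 << 30))
        fold_scores = [evaluate(X[tr], y[tr], X[te], y[te])
                       for tr, te in skf.split(X, y)]
        division_means.append(np.mean(fold_scores))
    return float(np.mean(division_means))   # average over the 10 divisions
```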

a binary bat algorithm combined with a fuzzy k-nearest neighbor clas­ Batmanghelich N et al. used Bagging algorithm and SVM for AD/NC
sifier; Feature selection method based on a binary Moth-Flame Opti­ classification, and logistic regression model using Boosting algorithm
mization combined with a fuzzy k-nearest neighbor classifier; Feature was used for MCI/NC classification [49]. Liu S et al. realized the diag­
selection method based on binary Gaussian discriminant analysis algo­ nosis and prediction of AD by using deep full link network and stacked
rithm combined with fuzzy k-nearest neighbor classifier; And two self-encoder [50]. Tong T et al. used multi-instance learning techniques
feature selection methods based on binary improved grey wolf optimi­ of graph to classify samples by extracting local density blocks as features
zation algorithm combined with a fuzzy k-nearest neighbor classifier. [51]. Yang W et al. used Independent Component Analysis (ICA) for
Table 14 shows the experimental results of the performance com­ feature extraction and combines it with SVM algorithm for AD predic­
parison between the IBSSA-FKNN method and the other five methods on tion [52].
the concatenated multimodal data, respectively, for classifying AD/NC,
AD/MCI, and MCI/NC. In Table 14, IBSSA-FKNN indicates that the bi­ 7. Conclusion and future work
nary salp swarm algorithm is first used for feature selection, and then the
FKNN classification model is used for classification experiments. Other In this study, we propose a FKNN feature selection method based on
methods are the same. Among them, all the experimental results listed in the binary salp swarm algorithm and apply this method to the early
Table 14 are the average value of each index divided by 10 times of 10- diagnosis of AD. First, on the datasets of BreastCancer, glass, hep­
fold cross-validation. atitisfulldata, Lymphography, and WDBC obtained from the UCI Ma­
The experimental results in Table 14 show that employing the feature selection step improves the performance of the classification model in diagnosing early AD. The classification accuracy of the IBSSA-FKNN method for AD vs. MCI, AD vs. NC, and MCI vs. NC is 95.37%, 100%, and 93.95%, respectively. Across the six indicators of the three groups of classification results, the method proposed in this paper is better than the other five methods. The advantages of the BQGWO-FKNN and BSCGWO-FKNN methods are also obvious, second only to the IBSSA-FKNN method. In the AD/NC and MCI/NC classification experiments, BSCGWO-FKNN is superior to IBSSA-FKNN in the number of selected features, but IBSSA-FKNN ranks first on the other indicators. In the AD/MCI classification experiment, IBSSA-FKNN is slightly lower than BQGWO-FKNN or BSCGWO-FKNN on a single index. The experimental results nevertheless show that IBSSA-FKNN remains better than the other methods across the three groups of classification experiments.

Based on the experimental analysis of Table 14 above, the following conclusions can be drawn: the FKNN feature selection method based on the salp swarm algorithm proposed in this paper significantly improves the classification performance obtained by using the FKNN classifier alone. Compared with other swarm intelligence methods combined with the FKNN classifier, the proposed method improves indicators such as classification accuracy, sensitivity, and specificity; the improvement in AD/MCI classification performance is particularly significant. The IBSSA-FKNN method proposed in this paper can therefore be applied well to the diagnosis of early AD.

From another perspective, this paper analyzes the time consumption of the different algorithms on AD classification, as shown in Fig. 9. The figure shows that the method proposed in this paper takes a relatively long time, although, as the indicator results above show, it is superior to the other methods in classification accuracy and the other indicators; in future work we will explore how to maintain this accuracy while reducing the runtime. The extra time relative to the other classification techniques is caused by improvement approach 4, the dimensional random difference mutation, because this technique mutates each dimension of an individual and then evaluates and screens the outcomes. The blindness of the mutation process inevitably reduces the algorithm's search efficiency and considerably increases the amount of computation. However, this mechanism helps the algorithm escape from local optima, boosting the accuracy of AD classification.
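To make the source of this overhead concrete, the hypothetical sketch below mutates a candidate feature mask one dimension at a time and keeps a flip only if it improves fitness. The function name, flip probability, and greedy screening rule are illustrative assumptions rather than the exact operator defined for IBSSA, but they show why the step can cost up to one extra fitness evaluation per dimension for every individual.

```python
import numpy as np

def per_dimension_mutation(mask, fitness, rng, p_flip=0.5):
    """Illustrative dimension-wise mutation with greedy screening.

    mask    : binary feature-selection vector of length D (1 = feature kept)
    fitness : callable scoring a mask, e.g., FKNN cross-validation accuracy
    rng     : numpy random Generator
    """
    best = mask.copy()
    best_fit = fitness(best)
    for d in range(best.size):           # visit every dimension in turn
        if rng.random() >= p_flip:
            continue
        trial = best.copy()
        trial[d] = 1 - trial[d]          # random flip of the d-th bit
        trial_fit = fitness(trial)       # one classifier run per mutated dimension
        if trial_fit > best_fit:         # screening: keep the flip only if it helps
            best, best_fit = trial, trial_fit
    return best
```

Because the fitness here is itself a cross-validated classifier run, the up-to-D extra evaluations per individual per generation account for the runtime gap visible in Fig. 9.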
In recent years, scholars have proposed many diagnostic algorithms for AD. Since these algorithms use different databases and different preprocessing methods, it is difficult to compare them directly in experiments; therefore, relevant algorithms that perform well at different sample sizes are selected for comparison. Table 15 lists the sample size and classification results of each algorithm. Janoušová E et al. combined penalized regression with data resampling to extract features and classified the data using an SVM with a Gaussian kernel [48]. The general framework for feature construction proposed by Batmanghelich N et al. was used for MCI/NC classification [49]. Liu S et al. realized the diagnosis and prediction of AD using a deep fully connected network and a stacked autoencoder [50]. Tong T et al. used graph-based multi-instance learning to classify samples by extracting local density blocks as features [51]. Yang W et al. used Independent Component Analysis (ICA) for feature extraction and combined it with an SVM for AD prediction [52].

7. Conclusion and future work

In this study, we propose an FKNN feature selection method based on the binary salp swarm algorithm and apply it to the early diagnosis of AD. First, the effectiveness of the method is tested in terms of classification accuracy, sensitivity, specificity, and other aspects on the BreastCancer, glass, hepatitisfulldata, Lymphography, and WDBC datasets obtained from the UCI Machine Learning Repository. Second, to verify the effectiveness of the method in the diagnosis of early AD, multimodal feature data comprising MRI, PET, and CSF measures from the Alzheimer's Disease Neuroimaging Initiative (ADNI) were used, and the method was compared with other swarm intelligence algorithms combined with the FKNN classifier. The experimental results show that the proposed IBSSA-FKNN method is superior to the other five swarm-intelligence-based FKNN models on the various performance indicators and can effectively improve both classification performance and early AD diagnosis. This promising application prospect will bring great convenience to clinicians in making better decisions in clinical diagnosis.
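For clarity, the sketch below shows, under stated assumptions, how a wrapper of this kind couples a binary position vector to the classifier: a sigmoid transfer function binarizes the salp position, and each candidate mask is scored by the cross-validated accuracy of a KNN-style stand-in for FKNN. The fitness weighting alpha and all names are illustrative assumptions, not our released code.

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

def binarize(position):
    """Map a continuous salp position to a 0/1 feature mask via a sigmoid
    transfer function (a common choice in binary swarm variants)."""
    return (1.0 / (1.0 + np.exp(-position)) > 0.5).astype(int)

def subset_fitness(mask, X, y, alpha=0.99):
    """Wrapper-style fitness: reward accuracy, lightly penalize subset size."""
    if mask.sum() == 0:
        return 0.0                                 # an empty subset is invalid
    X_sub = X[:, mask.astype(bool)]                # keep only selected features
    clf = KNeighborsClassifier(n_neighbors=5)      # stand-in for FKNN
    acc = cross_val_score(clf, X_sub, y, cv=10).mean()
    return alpha * acc + (1.0 - alpha) * (1.0 - mask.mean())
```

Evaluating the classifier inside the optimizer's fitness in this way is what allows feature selection and classification accuracy to improve jointly.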

On the one hand, future research will extend the proposed approach to considerably larger datasets, and the approach may be developed further to improve its AD classification performance; we also aim to integrate deep learning with a swarm intelligence optimization method and apply the combination to the early detection of Alzheimer's disease. On the other hand, this work uses only a small quantity of labeled training data, although a large amount of unlabeled multimodal data is accessible in the clinic, along with a considerable amount of incomplete multimodal data. Making full use of this incomplete multimodal labeled data would not only enlarge the set of training samples but also motivate learning methods for incomplete multimodal data, which can improve the model's generalization performance. In a nutshell, this work provides a useful research idea and algorithm for the study of Alzheimer's disease, and it demonstrates that swarm intelligence optimization algorithms have a positive influence on the early detection of Alzheimer's disease.

Data availability

The data used to support the findings of this study are included in the article.

Declaration of competing interest

The authors declare that they have no conflicts of interest.

Acknowledgments

Dongwan Lu and Yinggao Yue contributed equally to this work and should be considered as co-first authors. This work was supported in part by the Natural Science Foundation of Zhejiang Province under Grant LY23F010002, in part by the Wenzhou basic scientific research project under Grant R20210030, the Service science and technology innovation project of the Wenzhou Science and Technology Association under Grant kjfw36, the general scientific research projects of the Zhejiang Provincial Department of Education under Grant Y202250103, in part by Major scientific and technological innovation projects of the Wenzhou Science and Technology Plan under Grant ZG2021021, School Level Scientific Research Projects of Wenzhou University of Technology under Grants ky202201 and ky202209, the Teaching Reform Research Project of Wenzhou University of Technology under Grant 2022JG12, the Major Project of the Zhejiang Natural Science Foundation under Grants LD21F020001 and LSZ19F020001, the National Natural Science Foundation of China under Grant U1809209, and the Wenzhou Intelligent Image Processing and Analysis Key Laboratory Construction Project under Grant 2021HZSY007105.

References

[1] J. Weller, A. Budson, Current understanding of Alzheimer's disease diagnosis and treatment[J], F1000Research 7 (7) (2018) 1–9.
[2] J. Rasmussen, H. Langerman, Alzheimer's disease–why we need early diagnosis[J], Degener. Neurol. Neuromuscul. Dis. 9 (12) (2019) 123–130.
[3] D. Lu, K. Popuri, G.W. Ding, et al., Multimodal and multiscale deep neural networks for the early diagnosis of Alzheimer's disease using structural MR and FDG-PET images[J], Sci. Rep. 8 (1) (2018) 1–13.
[4] M. Gharaibeh, M. Almahmoud, M.Z. Ali, et al., Early diagnosis of Alzheimer's disease using cerebral catheter angiogram neuroimaging: a novel model based on deep learning approaches[J], Big Data and Cognitive Computing 6 (1) (2022) 2.
[5] S. Singh, R.R. Janghel, Early diagnosis of Alzheimer's disease using ACO optimized deep CNN classifier[C]//Ubiquitous Intelligent Systems, in: Proceedings of ICUIS 2021, Springer, Singapore, 2022, pp. 15–31.
[6] D. Pan, G. Luo, A. Zeng, et al., Adaptive 3DCNN-Based Interpretable Ensemble Model for Early Diagnosis of Alzheimer's Disease[J], IEEE Transactions on Computational Social Systems, 2022.
[7] S. Velliangiri, S. Pandiaraj, S. Muthubalaji, Multiclass recognition of AD neurological diseases using a bag of deep reduced features coupled with gradient descent optimized twin support vector machine classifier for early diagnosis[J], Concurrency Comput. Pract. Ex. 34 (21) (2022), e7099.
[8] J. Seo, T.H. Laine, G. Oh, et al., EEG-based emotion classification for Alzheimer's disease patients using conventional machine learning and recurrent neural network models[J], Sensors 20 (24) (2020) 7212–7225.
[9] T. Zheng, Y. Yu, H. Lei, et al., Compositionally graded KNN-based multilayer composite with excellent piezoelectric temperature stability[J], Adv. Mater. 34 (8) (2022), 2109175.
[10] A. Shokrzade, M. Ramezani, F.A. Tab, et al., A novel extreme learning machine based kNN classification method for dealing with big data[J], Expert Syst. Appl. 183 (11) (2021), 115293.
[11] Y. Huang, Y. Li, Prediction of protein subcellular locations using fuzzy k-NN method[J], Bioinformatics 20 (1) (2004) 21–28.
[12] K.C. Kwak, W. Pedrycz, Face recognition using a fuzzy fisherface classifier[J], Pattern Recogn. 38 (10) (2005) 1717–1732.
[13] H.L. Chen, C.C. Huang, X.G. Yu, et al., An efficient diagnosis system for detection of Parkinson's disease using fuzzy k-nearest neighbor approach[J], Expert Syst. Appl. 40 (1) (2013) 263–271.
[14] A. Mondal, S. Ghosh, A. Ghosh, Efficient silhouette-based contour tracking using local information[J], Soft Comput. 20 (2) (2016) 785–805.
[15] W. Shan, Z. Qiao, A.A. Heidari, et al., Double adaptive weights for stabilization of moth flame optimizer: balance analysis, engineering cases, and medical diagnosis[J], Knowl. Base Syst. 214 (2) (2021), 106728.
[16] S. Wang, Y. Zhao, J. Li, et al., Neurostructural correlates of hope: dispositional hope mediates the impact of the SMA gray matter volume on subjective well-being in late adolescence[J], Soc. Cognit. Affect Neurosci. 15 (4) (2020) 395–404.
[17] Y. Zhang, R. Liu, A.A. Heidari, et al., Towards augmented kernel extreme learning models for bankruptcy prediction: algorithmic behavior and comprehensive analysis[J], Neurocomputing 430 (3) (2021) 185–212.
[18] S. Jiao, G. Chong, C. Huang, et al., Orthogonally adapted Harris hawks optimization for parameter estimation of photovoltaic models[J], Energy 203 (7) (2020), 117804.
[19] J. Tu, H. Chen, J. Liu, et al., Evolutionary biogeography-based whale optimization methods with communication structure: towards measuring the balance[J], Knowl. Base Syst. 212 (1) (2021), 106642.
[20] S. Song, P. Wang, A.A. Heidari, et al., Dimension decided Harris hawks optimization with Gaussian mutation: balance analysis and diversity patterns[J], Knowl. Base Syst. 215 (3) (2021), 106425.
[21] Y. Fan, P. Wang, M. Mafarja, et al., A bioinformatic variant fruit fly optimizer for tackling optimization problems[J], Knowl. Base Syst. 213 (2) (2021), 106704.
[22] X. Zhang, Y. Xu, C. Yu, et al., Gaussian mutational chaotic fruit fly-built optimization and feature selection[J], Expert Syst. Appl. 141 (3) (2020), 112976.
[23] W. Zhu, C. Ma, X. Zhao, et al., Evaluation of sino foreign cooperative education project using orthogonal sine cosine optimized kernel extreme learning machine[J], IEEE Access 8 (3) (2020) 61107–61123.
[24] G.Q. Zeng, X.Q. Xie, M.R. Chen, et al., Adaptive population extremal optimization-based PID neural network for multivariable nonlinear control systems[J], Swarm Evol. Comput. 44 (2) (2019) 320–334.
[25] M.R. Chen, G.Q. Zeng, K.D. Lu, et al., A two-layer nonlinear combination method for short-term wind speed prediction based on ELM, ENN, and LSTM[J], IEEE Internet Things J. 6 (4) (2019) 6997–7010.
[26] W. Deng, H. Liu, J. Xu, et al., An improved quantum-inspired differential evolution algorithm for deep belief network[J], IEEE Trans. Instrum. Meas. 69 (10) (2020) 7319–7327.
[27] W. Deng, J. Xu, H. Zhao, et al., A novel gate resource allocation method using improved PSO-based QEA[J], IEEE Trans. Intell. Transport. Syst. 23 (3) (2020) 1737–1745.
[28] J. Pang, H. Zhou, Y.C. Tsai, et al., A scatter simulated annealing algorithm for the bi-objective scheduling problem for the wet station of semiconductor manufacturing[J], Comput. Ind. Eng. 123 (9) (2018) 54–66.
[29] B. Venkatesh, J. Anuradha, A review of feature selection and its methods[J], Cybern. Inf. Technol. 19 (1) (2019) 3–26.
[30] G.G. Wang, S. Deb, Z. Cui, Monarch butterfly optimization[J], Neural Comput. Appl. 31 (7) (2019) 1995–2014.
[31] S. Li, H. Chen, M. Wang, et al., Slime mould algorithm: a new method for stochastic optimization[J], Future Generat. Comput. Syst. 111 (2020) 300–323.
[32] G.G. Wang, Moth search algorithm: a bio-inspired metaheuristic algorithm for global optimization problems[J], Memetic Computing 10 (2) (2018) 151–164.
[33] Y. Yang, H. Chen, A.A. Heidari, et al., Hunger games search: visions, conception, implementation, deep analysis, perspectives, and towards performance shifts[J], Expert Syst. Appl. 177 (2021), 114864.
[34] J.C. Butcher, A history of Runge-Kutta methods[J], Appl. Numer. Math. 20 (3) (1996) 247–260.
[35] J. Tu, H. Chen, M. Wang, et al., The colony predation algorithm[J], J. Bionic Eng. 18 (3) (2021) 674–710.
[36] I. Ahmadianfar, A.A. Heidari, S. Noshadian, et al., INFO: an efficient optimization algorithm based on weighted mean of vectors[J], Expert Syst. Appl. 195 (2022), 116516.
[37] A.A. Heidari, S. Mirjalili, H. Faris, et al., Harris hawks optimization: algorithm and applications[J], Future Generat. Comput. Syst. 97 (2019) 849–872.
[38] S. Mirjalili, A.H. Gandomi, S.Z. Mirjalili, et al., Salp Swarm Algorithm: a bio-inspired optimizer for engineering design problems[J], Adv. Eng. Software 114 (12) (2017) 163–191.
[39] M. Emmanuel, J. Jabez, An enhanced fuzzy based KNN classification method for Alzheimer's disease identification from SMRI images[J], Journal of Algebraic Statistics 13 (3) (2022) 89–103.
[40] H. Abbad Ur Rehman, C.Y. Lin, Z. Mushtaq, et al., Performance analysis of machine learning algorithms for thyroid disease[J], Arabian J. Sci. Eng. 46 (10) (2021) 9437–9449.
[41] J. Feng, J. Zhang, X. Zhu, et al., A novel chaos optimization algorithm[J], Multimed. Tool. Appl. 76 (16) (2017) 17405–17436.
[42] S.A. Mirjalili, S.Z.M. Hashim, BMOA: binary magnetic optimization algorithm[J], International Journal of Machine Learning and Computing 2 (3) (2012) 204.
[43] R.Y.M. Nakamura, L.A.M. Pereira, K.A. Costa, et al., BBA: a binary bat algorithm for feature selection[C], in: 2012 25th SIBGRAPI Conference on Graphics, Patterns and Images, IEEE, 2012, pp. 291–297.
[44] A. Patil, G. Soni, A. Prakash, A BMFO-KNN based intelligent fault detection approach for reciprocating compressor[J], International Journal of System Assurance Engineering and Management 13 (2) (2022) 797–809.
[45] C. Yu, Z. Cai, X. Ye, et al., Quantum-like mutation-induced dragonfly-inspired optimization approach[J], Math. Comput. Simulat. 178 (2020) 259–289.
[46] E. Emary, H.M. Zawbaa, A.E. Hassanien, Binary grey wolf optimization approaches for feature selection[J], Neurocomputing 172 (2016) 371–381.
[47] J. Hu, A.A. Heidari, L. Zhang, et al., Chaotic diffusion-limited aggregation enhanced grey wolf optimizer: insights, analysis, binarization, and feature selection[J], Int. J. Intell. Syst. 37 (8) (2022) 4864–4927.
[48] E. Janoušová, M. Vounou, R. Wolz, et al., Biomarker discovery for sparse classification of brain images in Alzheimer's disease[J], Annals of the BMVA (2) (2012) 1–11.
[49] N. Batmanghelich, B. Taskar, C. Davatzikos, A general and unifying framework for feature construction in image-based pattern classification[C]//International Conference on Information Processing in Medical Imaging, vol. 5636, Springer, Berlin, Heidelberg, 2009, pp. 423–434.
[50] S. Liu, S. Liu, W. Cai, et al., Early diagnosis of Alzheimer's disease with deep learning[C], in: 2014 IEEE 11th International Symposium on Biomedical Imaging (ISBI), IEEE, 2014, pp. 1015–1018.
[51] T. Tong, R. Wolz, Q. Gao, et al., Multiple instance learning for classification of dementia in brain MRI[J], Med. Image Anal. 18 (5) (2014) 808–818.
[52] W. Yang, R.L.M. Lui, J.H. Gao, et al., Independent component analysis-based classification of Alzheimer's disease MRI data[J], J. Alzheim. Dis. 24 (4) (2011) 775–783.
