
IEEE/CAA JOURNAL OF AUTOMATICA SINICA, VOL. 11, NO. 2, FEBRUARY 2024

A Self-Adapting and Efficient Dandelion Algorithm and Its Application to Feature Selection for Credit Card Fraud Detection
Honghao Zhu, Member, IEEE, MengChu Zhou, Fellow, IEEE, Yu Xie, and Aiiad Albeshri

Abstract—A dandelion algorithm (DA) is a recently developed intelligent optimization algorithm for function optimization problems. Many of its parameters need to be set by experience, which might not be appropriate for all optimization problems. A self-adapting and efficient dandelion algorithm is proposed in this work to lower the number of DA's parameters and simplify its structure. Only the normal sowing operator is retained, while the other operators are discarded. An adaptive seeding radius strategy is designed for the core dandelion. The results show that the proposed algorithm achieves better performance on the standard test functions with less time consumption than its competitive peers. In addition, the proposed algorithm is applied to feature selection for credit card fraud detection (CCFD), and the results indicate that it can obtain higher classification and detection performance than state-of-the-art methods.

Index Terms—Credit card fraud detection (CCFD), dandelion algorithm (DA), feature selection, normal sowing operator.

Manuscript received June 23, 2023; revised August 17, 2023 and September 27, 2023; accepted October 2, 2023. This work was supported by the Institutional Fund Projects (IFPIP-1481-611-1443), the Key Projects of Natural Science Research in Anhui Higher Education Institutions (2022AH051909), the Provincial Quality Project of Colleges and Universities in Anhui Province (2022sdxx020, 2022xqhz044), and the Bengbu University 2021 High-Level Scientific Research and Cultivation Project (2021pyxm04). Recommended by Associate Editor Xin Luo. (Corresponding authors: MengChu Zhou and Yu Xie.)

Citation: H. Zhu, M. C. Zhou, Y. Xie, and A. Albeshri, "A self-adapting and efficient dandelion algorithm and its application to feature selection for credit card fraud detection," IEEE/CAA J. Autom. Sinica, vol. 11, no. 2, pp. 377–390, Feb. 2024.

H. Zhu is with the School of Computer and Information Engineering, Bengbu University, Bengbu 233030, China (e-mail: [email protected]).
M. C. Zhou is with the School of Information and Electronic Engineering, Zhejiang Gongshang University, Hangzhou 310018, China, and also with the Helen and John C. Hartmann Department of Electrical and Computer Engineering, New Jersey Institute of Technology, Newark NJ 07102 USA (e-mail: [email protected]).
Y. Xie is with the College of Information Engineering, Shanghai Maritime University, Shanghai 201306, China (e-mail: [email protected]).
A. Albeshri is with the Department of Computer Science, King Abdulaziz University, Jeddah 21481, Saudi Arabia (e-mail: [email protected]).
Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.
Digital Object Identifier 10.1109/JAS.2023.124008

I. Introduction

Optimization problems exist in daily life and can be solved by using mathematical algorithms. When their scales grow, though, they become incredibly challenging to solve. It is found that meta-heuristic algorithms are a class of efficient algorithms for such complex problems. In the past decades, numerous meta-heuristic algorithms were proposed, such as genetic algorithms (GA) [1], artificial bee colony algorithm (ABC) [2], particle swarm optimizer (PSO) [3], artificial fish swarm algorithm (AFSA) [4], firefly optimization algorithm (FA) [5], cuckoo optimization algorithm (COA) [6], bat algorithm (BA) [7], grey wolf optimization algorithm (GWO) [8], evolution strategy with covariance matrix adaptation (CMA-ES) [9], and fireworks algorithm (FWA) [10]. Although they have achieved some success in some fields, the no-free-lunch theorem [11] indicates that an algorithm cannot perform optimally on all problems, and thus many try to propose new algorithms to face new challenges.

A dandelion algorithm (DA) is a recently proposed swarm intelligence algorithm inspired by dandelion sowing [12]. Due to its excellent performance in function optimization, DA has attracted the attention of many scholars, who have proposed many variants of it to further improve its performance. Gong et al. [13] propose an improved dandelion algorithm called MDA, in which Levy mutation is replaced by the mean position of all seeds; they also give a proof of convergence and an analysis of parameters. Zhu et al. [12] propose a dandelion algorithm with probability-based mutation, in which three probability models, namely linear (DAPML), binomial (DAPMB), and exponential (DAPME) models, are designed to select whether to utilize Levy or Gaussian mutation. Han and Zhu [14] propose a fusion dandelion algorithm with an adaptive selection strategy (FSS-DA), in which a distance-aware selection strategy (DSS), which jointly considers the function fitness value and the individual position distance, and other well-known selection strategies are put into its fusion pool together; the best selection strategy, decided by the overall reward, is chosen to select the suitable dandelions for the next generation. To determine whether a seed is excellent or not, an improvement of the evolution process of DA with an extreme learning machine (ELMDA) is proposed [15], where a training model is first built based on the extreme learning machine (ELM) [16] with a training set constructed from excellent and poor dandelions assigned their corresponding labels (i.e., +1 if excellent or −1 if poor), and then the training model is applied to classify the seeds into excellent or poor ones. Excellent seeds are chosen to participate in the evolution process, while the poor ones are abandoned to save function evaluations. Han et al. [17] propose competition-driven dandelion algorithms in which a novel competition mechanism is designed to find the loser, and then three historical information models are designed by exploiting historical information with an estimation-of-distribution algorithm [18] to generate a new individual to replace the loser. In addition, DA variants have been applied in many fields, such as traffic flow prediction [19], credit card fraud detection (CCFD) [20], and multi-skill resource-constrained project scheduling [21]. Even if they perform better than their peers, it is found that new parameters are introduced by them. For all these methods, the structures are complex and have a large number of parameters to tune.
Filter, wrapper, and embedded methods are commonly used in feature selection. The filter method is completely independent of any machine learning algorithm: it sorts data features according to certain criteria and selects the best feature subset. The wrapper method relies on a specific classification model to evaluate the selected feature subset; because it directly optimizes the target, the wrapper method performs better. The embedded method embeds feature selection into the classification model and combines the advantages of filter and wrapper methods [22]. In this paper, a random-search wrapper method is used for feature selection, which mainly uses an intelligent optimization algorithm to iteratively add and delete features to establish feature subsets.

Sun et al. [23] proposed to use a genetic algorithm to select good feature subsets and a support vector machine (SVM) as the base classifier to improve the detection rate. Xue et al. [24] proposed three new PSO initialization strategies and three new personal-best and global-best update mechanisms for feature selection, aiming to maximize classification performance, minimize the number of features, and reduce computation time; the base classifier used in this method is the K-nearest neighbor (KNN). Masood et al. [25] proposed a new rank-based incremental search strategy method, Wrank-ELM, which adopted a feature selection method based on wrapper ranking and used the ELM classifier to evaluate the selected features. Xue et al. [26] proposed an adaptive particle swarm optimization algorithm for feature selection, especially large-scale feature selection; KNN is used as the base classifier, and the experimental results show that the designed method is effective. Sagban et al. [27] combined the bat algorithm and the ant colony algorithm to conduct a feature selection algorithm for the classification of cervical cancer datasets, but this method only takes accuracy as an evaluation index in its experimental results. Currently, existing intelligent optimization algorithms primarily focus on feature selection in the context of balanced data classification. Therefore, investigating how to design an improved intelligent algorithm for feature selection that enhances the classification performance on imbalanced data is a worthwhile research problem.

According to the analysis of the literature, existing work uses intelligent optimization algorithms to select features for balanced data, and there is a lack of intelligent optimization algorithms for feature selection under imbalanced classification. Among the existing intelligent optimization algorithms, the dandelion algorithm, as a newly proposed algorithm, has superior performance, but it needs to set many parameters. Once they are fixed, it may perform well on some problems but not others. Therefore, it is worthwhile to reduce the number of parameters and simplify the dandelion structure such that it can select more discriminative features and minimize feature redundancy and feature noise to improve the classification results of imbalanced data. In addition, many researchers have used intelligent algorithms to optimize the extreme learning machine and achieved excellent results, but they only deal with balanced data and do not consider the task of imbalanced data. The motivation of this paper stems from the aforementioned problems. Finding solutions that address these issues holds significant importance for researchers and financial professionals in recognizing the role of feature selection in detecting credit card fraud under imbalanced classification scenarios.

In order to address these issues well, this work proposes a self-adapting and efficient dandelion algorithm, namely SEDA. Specifically, basic DA consists of normal sowing, mutation sowing, and a selection strategy, but only normal sowing is retained to generate seeds in SEDA, i.e., mutation sowing is removed. Meanwhile, the population of SEDA has only one dandelion, i.e., the core dandelion, which is different from basic DA, whose population includes the core dandelion and assistant ones. In addition, an adaptive seeding radius strategy is designed to replace the two radius parameters (the withering factor and the growth factor). Finally, a greedy selection strategy is applied to carry the best individual in the population into the next generation. Based on the above design, it is noted that all unnecessary mechanisms of DA are removed in SEDA. The proposed algorithm has the following advantages:
1) It is easy to implement.
2) It has only one parameter, i.e., the number of seeds.
3) It reduces time consumption.
The proposed algorithm is used to select proper features for credit card fraud detection since these benefits are advantageous to the solution of real-world problems. This work aims to make the following new contributions:
1) Proposing a simple and efficient dandelion algorithm in which all unnecessary mechanisms of DA are removed.
2) Developing an adaptive seeding radius strategy to reduce the number of DA's parameters.
3) Applying the proposed algorithm successfully to feature selection for credit card fraud detection.
The paper is organized as follows. Section II provides a brief introduction to basic DA. The proposed algorithm is given in detail in Section III. Section IV presents the experimental results, including the application of SEDA to credit card fraud detection. Finally, Section V concludes this paper.


Fig. 1. Variation trend of the adaptive radius factor: (a) γ = 1 and (b) γ ≠ 1 (x-axis: evaluations; y-axis: values).

II. Dandelion Algorithm

DA is divided into four parts: initialization, normal sowing, mutation sowing, and selection strategy [12], [28]:
1) Initialization: The first generation of the population, consisting of N dandelions, is generated randomly in the search range.
2) Normal Sowing: DA solves a minimization problem where the number of seeds is calculated based on the fitness value, i.e., the smaller the fitness value, the more seeds are produced. Each dandelion produces dandelion seeds within a certain seeding radius that is dynamically adjusted.
3) Mutation Sowing: The best dandelion, i.e., the dandelion with the minimum fitness, jumps out of a local optimum by Levy mutation.
4) Selection Strategy: The best dandelion is always selected and retained for the next generation. Then N − 1 dandelions are selected from the rest according to a selection operator.
Normal sowing and selection strategies are repeated until the given termination condition is satisfied.
In DA, the number of dandelion seeds is calculated as

$$N_i = \begin{cases} S_M \times \dfrac{f_M - f(x_i) + \varepsilon}{f_M - f_m + \varepsilon}, & N_i > S_m \\ S_m, & N_i \le S_m \end{cases} \tag{1}$$

where $S_M$ is used to control the maximum number of produced seeds, $S_m$ is used to control the minimum number of produced seeds, $f_M$ and $f_m$ are respectively the maximum and minimum fitness values of all dandelions, $\varepsilon$ is machine epsilon, used to avoid division by zero, and the fitness value of the i-th dandelion is expressed as $f(x_i)$.
Dandelions are divided into the best dandelion and other dandelions according to their fitness values. Among them, the sowing radius of the best dandelion is calculated as

$$R_b(t) = \begin{cases} U_b - L_b, & t = 1 \\ R_b(t-1) \times \alpha, & \gamma = 1 \\ R_b(t-1) \times \beta, & \gamma \neq 1 \end{cases} \tag{2}$$

where $L_b$ and $U_b$ are respectively the lower and upper bounds of the search space, t means the t-th generation, α and β are respectively the withering factor and the growth factor, and γ reflects the growth trend and is calculated as follows:

$$\gamma = \frac{f_b(t) + \varepsilon}{f_b(t-1) + \varepsilon}. \tag{3}$$

The sowing radius of other dandelions is calculated as

$$R_i(t) = \begin{cases} U_b - L_b, & t = 1 \\ \delta \times R_i(t-1) + (\lVert x_b \rVert_\infty - \lVert x_i \rVert_\infty), & \text{otherwise} \end{cases} \tag{4}$$

where δ is a weight factor calculated as

$$\delta = 1 - \frac{T}{\hat{T}} \tag{5}$$

where $\hat{T}$ is the maximum number of function evaluations, and T is the current number.
The best dandelion is sown with the mutation operation, i.e.,

$$X_b' = X_b \times (1 + \mathrm{Levy}()) \tag{6}$$

where Levy() is a random number following the Levy distribution.
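To make the normal sowing rule concrete, the following is a minimal Python sketch of the seed-count computation in (1). The function name and the vectorized form are ours, not from the paper.

```python
import numpy as np

def seed_counts(fitness, s_max, s_min):
    """Seeds per dandelion under (1): the smaller the fitness,
    the more seeds, clipped from below by s_min."""
    f_max, f_min = fitness.max(), fitness.min()
    eps = np.finfo(float).eps  # machine epsilon, avoids division by zero
    raw = s_max * (f_max - fitness + eps) / (f_max - f_min + eps)
    return np.maximum(raw, s_min).astype(int)
```

For example, seed_counts(np.array([1.0, 2.0, 4.0]), s_max=100, s_min=10) assigns the most seeds to the first (best) dandelion and the floor of 10 seeds to the worst one.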
III. Proposed Algorithms

A. A Self-Adapting and Efficient Dandelion Algorithm

In order to simplify DA's structure, a self-adapting and efficient dandelion algorithm (SEDA) is presented in this paper. In it, an adaptive seeding radius strategy is designed to further reduce the number of parameters when the sowing radius is calculated. The sowing radius for the core dandelion can be calculated as follows.

As for DA, dandelions are divided into two types, the optimal dandelion and other dandelions, whose radii are calculated via (2) and (4), respectively. They show that the optimal dandelion has two parameters when performing a seeding process: the withering and growth factors. To simplify DA's structure and reduce its parameter count, the adaptive dandelion algorithm keeps only one dandelion, i.e., the core dandelion, and removes the withering and growth factors. The seeding radius of the core dandelion is calculated as

$$R_b(t) = \begin{cases} U_b - L_b, & t = 1 \\ R_b(t-1) \times \dfrac{1 + \beta^{-t^2/\hat{T}^2}}{2}, & \gamma = 1 \\ R_b(t-1) \times \dfrac{1 + \beta^{t^2/\hat{T}^2}}{2}, & \gamma \neq 1. \end{cases} \tag{7}$$

From (7), it can be seen that the withering and growth factors have been replaced by an adaptive radius strategy that is updated adaptively by the number of function evaluations; the resulting trends are shown in Fig. 1.


As can be seen from Fig. 1, when γ = 1, the seeding radius becomes smaller as t increases, with a downward trend; when γ ≠ 1, the seeding radius becomes larger as t increases, with an upward trend. These trends match the original intent of the underlying dandelion algorithm.

The proposed SEDA has only one dandelion in the initial population, and it can be considered the best dandelion, which means that only the best dandelion generates seeds and needs the calculation of its sowing radius. It is evident that SEDA is simpler than basic DA and its variants. In SEDA, the sowing radius of the best dandelion is first calculated by (7), and then N_i seeds are generated within that radius. If new seeds are out of bounds, they are replaced by random ones generated in the search space. Finally, the best individual is selected, by a greedy selection strategy, from the population consisting of the current best one and all seeds. If the termination condition is not satisfied, the above steps are repeated.

In SEDA, all additional mechanisms (Gaussian mutation, Levy mutation, and the seeding of common dandelions) have been removed, as shown in Algorithm 1.

Algorithm 1 Self-Adapting and Efficient Dandelion Algorithm
Input: X_b and N_i // X_b is the current best dandelion; N_i is the number of seeds
Output: the best dandelion found
1: repeat
2:   for i = 1 to N_i do
3:     Calculate R_b by (7)
4:     for k = 1 to d do
5:       X_i^k = X_b^k + rand(0, R_b)
6:       if X_i^k is out of range then
7:         Generate a random location in the search space
8:       end if
9:     end for
10:  end for
11:  Select the best dandelion in the current population
12: until the termination condition is satisfied
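The following Python sketch mirrors Algorithm 1 under a few stated assumptions: rand(0, R_b) in line 5 is read as a uniform draw from [0, R_b) per dimension, β in (7) is treated as a constant base greater than 1 (so that the radius shrinks when γ = 1 and grows otherwise, matching Fig. 1), and T̂ is taken as the maximum number of generations. All identifiers are illustrative, not from the paper.

```python
import numpy as np

def seda(objective, lb, ub, n_seeds=300, max_evals=300_000, beta=1.1, seed=None):
    """A sketch of SEDA (Algorithm 1): one core dandelion, normal sowing
    only, adaptive seeding radius per (7), and greedy selection."""
    rng = np.random.default_rng(seed)
    lb, ub = np.asarray(lb, float), np.asarray(ub, float)
    d = lb.size
    eps = np.finfo(float).eps
    t_hat = max(max_evals // n_seeds, 1)   # assumption: T_hat = max generations
    x_best = rng.uniform(lb, ub)           # randomly generate one dandelion
    f_best = objective(x_best)
    f_prev, radius, evals = f_best, ub - lb, 1   # R_b(1) = U_b - L_b
    for t in range(1, t_hat + 1):
        if t > 1:                          # adaptive radius update, (7)
            gamma = (f_best + eps) / (f_prev + eps)
            expo = (t / t_hat) ** 2        # t^2 / T_hat^2
            radius = radius * (1 + beta ** (-expo if gamma == 1 else expo)) / 2
        f_prev = f_best
        cand_x, cand_f = x_best, f_best    # population: core dandelion + seeds
        for _ in range(n_seeds):           # normal sowing (lines 2-10)
            x = x_best + rng.uniform(0, radius, size=d)
            bad = (x < lb) | (x > ub)
            x[bad] = rng.uniform(lb[bad], ub[bad])  # re-sow out-of-range dims
            fx = objective(x)
            evals += 1
            if fx < cand_f:
                cand_x, cand_f = x, fx
            if evals >= max_evals:
                break
        x_best, f_best = cand_x, cand_f    # line 11: keep the best individual
        if evals >= max_evals:
            break
    return x_best, f_best
```

A call such as seda(lambda x: float(np.sum(x * x)), lb=[-100.0] * 30, ub=[100.0] * 30) reflects the single-parameter usage described above, with the number of seeds as the only tunable.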
To express the proposed algorithm more clearly, the flowchart of SEDA is given in Fig. 2. The computational complexity of Algorithm 1 is O(T̂ × M), where M is the total number of seeds and T̂ is the maximum number of generations.

Fig. 2. Flowchart of SEDA: randomly generate one dandelion; generate N_i seeds; replace out-of-range seeds with random ones in the search space; assess the seeds; select one dandelion; repeat until the termination condition is satisfied.
B. An Imbalance Classification Method Based on SEDA Feature Selection

Credit card fraud is an important issue and is quite costly for banks and card-related companies: billions of dollars are lost each year due to credit card fraud. Due to confidentiality issues, there is a lack of research analyzing real-world credit card data. Therefore, it is extremely important to detect whether a credit card transaction is fraudulent. Credit card fraud detection is a classic imbalanced classification task [29], [30], and feature selection is considered an effective way to handle it [31]. In this paper, the proposed SEDA is applied to feature selection for solving it.

In SEDA, the dimension of each dandelion is set to the number of features in a dataset. If the value of a dimension is greater than 0.5, that dimension (feature) is selected; otherwise, it is removed. For example, the dandelion (0.4, 0.1, 0.6, 0.7, 0.2) selects the third and fourth features from the dataset to form the new dataset. In addition, the aims are to obtain better classification performance with fewer features, and thus the objective function is designed as

$$f = \lambda \times (1 - G) + \mu \times N_F \tag{8}$$

where λ and μ are weight factors that trade off the classification performance and the number of features. They are set to 0.99 and 0.01 in this paper since the classification performance is more important than the number of features. G is the performance indicator for imbalanced classification (G-mean) [20], and N_F is the number of selected features. The smaller the value of f, the better the classification performance. Moreover, several state-of-the-art methods that are often used as comparisons, DA, MDA, DAPML, DAPMB, DAPME, BA, and IPSO, are selected for comparison with SEDA. The maximum number of evaluations is set to 1000. It is noted that the weighted extreme learning machine (WELM) [32] is selected as the base classifier for two reasons: 1) it has a fast learning speed; and 2) it is suitable for imbalanced classification problems.

G-mean and AUC (area under curve) are selected as the evaluation indicators. In the following, experiments are first conducted on 10 imbalanced classification datasets, and then on three credit card fraud detection datasets.
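As an illustration of the encoding and of objective (8), the sketch below decodes a dandelion into a feature mask and scores it. Here eval_gmean stands for the G-mean obtained by training the base classifier (WELM in this paper) on the masked features; the helper names are ours.

```python
import numpy as np

def fitness(dandelion, eval_gmean, lam=0.99, mu=0.01):
    """Objective (8): f = lam * (1 - G) + mu * NF, to be minimized.
    A dimension greater than 0.5 marks its feature as selected."""
    mask = np.asarray(dandelion) > 0.5
    g = eval_gmean(mask)          # G-mean of the classifier on the subset
    return lam * (1.0 - g) + mu * int(mask.sum())
```

For the example above, np.asarray((0.4, 0.1, 0.6, 0.7, 0.2)) > 0.5 yields the mask (False, False, True, True, False), i.e., exactly the third and fourth features.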
In summary, the imbalance classification method based on the adaptive and efficient dandelion algorithm feature selection (SEDA-W) is shown in Algorithm 2.

 ( )
the adaptive efficient dandelion algorithm feature selection 
 P X(t) ∈ W ∗ | X(t − 1) ∈ W ∗ = 1

 (
(SEDA-W) is shown in Algorithm 2. 

 ∗ ∗)
P X(t) < W | X(t − 1) ∈ W = 0

P(X(t − 1), X(t)) = 
 ( ) (11)
Algorithm 2 Imbalanced Classification Algorithm Based on 

 P X(t) ∈ W ∗ | X(t − 1) < W ∗ > 0



SEDA Feature Selection  ( )
P X(t) < W ∗ | X(t − 1) < W ∗ > 0.
Input: Training dataset, maximum number of evaluations. According to the total probability formula, we have
Output: G-mean. ( ) ( ( ))
1: Initializing the parameters of SEDA and WELM P X(t) ∈ W ∗ = 1 − P X(t − 1) ∈ W ∗
( )
2: Randomly initialize the population, calculate the fitness value of × P X(t) ∈ W ∗ | X(t − 1) < W ∗
the population and evaluate the population according to (8) ( )
+ P X(t − 1) ∈ W ∗
3: repeat ( )
4: Use (7) to generate seeds and evaluate the seeds × P X(t) ∈ W ∗ | X(t − 1) ∈ W ∗ . (12)
5: Selection of individuals from existing populations for the Let δ(t) = P((X(t) ∈ W )), u(t) = P((X(t) ∈ W | X(t−1) < W ∗ )),
∗ ∗

next generation through a selection strategy (12) can be changed to


6: until meet the stopping conditions
δ(t) = (1 − δ(t − 1)) × u(t) + δ(t − 1). (13)
7: Find the best dandelion and its location
8: Get G-mean based on the best dandelion and its location From (13), we have
9: return 1 − δ(t) = (1 − δ(t − 1)) × (1 − u(t))
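To connect Algorithm 2 with objective (8), the following sketch builds the eval_gmean callback used by the fitness function above. The paper's base classifier is WELM [32]; since WELM is not a standard library component, a class-weighted logistic regression from scikit-learn is used here purely as an illustrative stand-in, and the single stratified train/test split is likewise our assumption.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression   # stand-in for WELM [32]
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split

def make_gmean_evaluator(X, y, random_state=0):
    """Builds eval_gmean(mask) for the fitness of (8). The paper trains
    WELM on the selected features; a class-weighted logistic regression
    is used here only as a stand-in (minority class labeled 1)."""
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, stratify=y, random_state=random_state)
    def eval_gmean(mask):
        if not mask.any():
            return 0.0                      # an empty feature subset is useless
        clf = LogisticRegression(class_weight="balanced", max_iter=1000)
        clf.fit(X_tr[:, mask], y_tr)
        tn, fp, fn, tp = confusion_matrix(
            y_te, clf.predict(X_te[:, mask]), labels=[0, 1]).ravel()
        return float(np.sqrt((tp / (tp + fn)) * (tn / (tn + fp))))
    return eval_gmean
```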
In Algorithm 2, Lines 1 and 2 are the initialization phase, which requires N function evaluations. Lines 4–6 require M function evaluations, where M is the total number of seeds. The computational complexity of WELM classification is $O(\hat{N}mL + \hat{N}^2L)$, where $\hat{N}$ and m denote the number of training instances and the dimension of each training instance, respectively [33], and L denotes the number of hidden nodes in WELM. If the maximum number of generations is set to T̂, the computational complexity of Algorithm 2 is $O(N(\hat{N}mL + \hat{N}^2L) + \hat{T}M(\hat{N}mL + \hat{N}^2L))$.

C. Convergence Analysis of SEDA-W

The convergence of SEDA-W is mathematically analyzed. We assume that X is a dandelion in SEDA-W's population. Since there is only one dandelion in the initial population, it can be considered the best one.

Theorem 1: The evolution direction of SEDA-W is monotonically nonincreasing, i.e., $f(X(t+1)) \le f(X(t))$, ∀ t ≥ 1.

Proof: Through the selection strategy, the best dandelion is always retained, and its fitness value is monotonically nonincreasing. Thus, $f(X(t+1)) \le f(X(t))$ is established. ■

Theorem 2: $\{X(t),\, t \in Z^+\}$ of SEDA-W forms a homogeneous Markov chain.

Proof: X(t) changes to X(t + 1), denoted as

$$X(t+1) = T(X(t)). \tag{9}$$

From (9), we can see that X(t + 1) is only related to the present state X(t). Thus X(t) of SEDA-W, t > 0, forms a Markov chain. Also, we can observe that T depends on X(t), but not on t [15]. ■

Theorem 3: $\{X(t),\, t \in Z^+\}$ of SEDA-W converges to the global/local optimal solution set $W^*$ with probability 1, i.e.,

$$\lim_{t\to\infty} P\left(X(t) \in W^*\right) = 1. \tag{10}$$

Proof: Based on Theorems 1 and 2, $\{X(t)\}$ is a monotonically nonincreasing homogeneous Markov chain, and the transition probability from $\{X(t-1)\}$ to $\{X(t)\}$ is [15]

$$P(X(t-1), X(t)) = \begin{cases} P\left(X(t) \in W^* \mid X(t-1) \in W^*\right) = 1 \\ P\left(X(t) \notin W^* \mid X(t-1) \in W^*\right) = 0 \\ P\left(X(t) \in W^* \mid X(t-1) \notin W^*\right) > 0 \\ P\left(X(t) \notin W^* \mid X(t-1) \notin W^*\right) > 0. \end{cases} \tag{11}$$

According to the total probability formula, we have

$$P\left(X(t) \in W^*\right) = \left(1 - P\left(X(t-1) \in W^*\right)\right) \times P\left(X(t) \in W^* \mid X(t-1) \notin W^*\right) + P\left(X(t-1) \in W^*\right) \times P\left(X(t) \in W^* \mid X(t-1) \in W^*\right). \tag{12}$$

Let $\delta(t) = P\left(X(t) \in W^*\right)$ and $u(t) = P\left(X(t) \in W^* \mid X(t-1) \notin W^*\right)$; then (12) can be changed to

$$\delta(t) = (1 - \delta(t-1)) \times u(t) + \delta(t-1). \tag{13}$$

From (13), we have

$$1 - \delta(t) = (1 - \delta(t-1)) \times (1 - u(t)) = (1 - \delta(t-2)) \times (1 - u(t)) \times (1 - u(t-1)) = \cdots = (1 - \delta(0)) \times \prod_{k=1}^{t} (1 - u(k)). \tag{14}$$

Taking the limit on both sides of (14), we obtain

$$\lim_{t\to\infty} \delta(t) = 1 - \lim_{t\to\infty}\left[(1 - \delta(0)) \times \prod_{k=1}^{t}(1 - u(k))\right] = 1 - (1 - \delta(0)) \times \lim_{t\to\infty}\prod_{k=1}^{t}(1 - u(k)). \tag{15}$$

From (15), it is easy to find that the result depends mainly on $u(k) = P\left(X(k) \in W^* \mid X(k-1) \notin W^*\right)$. If enough function evaluations are given, SEDA-W can find the optimal solution, i.e., $0 < u(t) \le 1$. Thus we have $\lim_{t\to\infty}\prod_{k=1}^{t}(1 - u(k)) = 0$ and $\lim_{t\to\infty}\delta(t) = 1$, i.e., $\lim_{t\to\infty} P\left(X(t) \in W^*\right) = 1$. ■

IV. Experimental Results and Their Analysis

A. Experimental Settings

1) Benchmark Functions: In order to evaluate the performance of the proposed algorithm, 28 benchmark functions from CEC2013 are used in our experiments. The CEC2013 test suite is listed in [34]. In this paper, each algorithm is run 51 times, the dimension is set to 30, and the number of function evaluations is set to 300 000. Note that no additional parameters need to be set in SEDA except the number of seeds.

2) Impact of the Number of Seeds on SEDA Performance: The number of seeds is set to 100, 200, 300, 400 and 500, respectively. The mean error and Friedman ranking are shown in Table I; the best mean error on each function is marked in bold. From Table I, it can be seen that SEDA performs the best when the number of seeds is set to 300 in terms of Friedman ranking (2.5000). Thus, it is set to 300 in the following experiment.
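For reference, average Friedman ranks of the kind reported in Tables I–III can be computed as sketched below. This is our illustration, assuming a matrix of mean errors with one row per function and one column per algorithm; ties share average ranks, as with the identical 0.00E+00 entries.

```python
import numpy as np
from scipy.stats import rankdata

def friedman_ranking(errors):
    """errors: (n_functions, n_algorithms); lower error = better.
    Returns the average rank of each algorithm across functions."""
    ranks = np.apply_along_axis(rankdata, 1, np.asarray(errors, float))
    return ranks.mean(axis=0)
```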


TABLE I
Mean Error and Friedman Ranking of SEDA for Different Numbers of Seeds
Function ē(100) ē(200) ē(300) ē(400) ē(500)
f1 0.00E+00 0.00E+00 0.00E+00 1.94E−08 5.85E−06
f2 3.11E+05 6.12E+05 9.92E+05 1.28E+06 1.73E+06
f3 5.98E+07 5.69E+07 3.51E+07 3.93E+07 7.87E+07
f4 2.09E−03 1.10E−03 1.11E−02 1.10E−01 6.64E−01
f5 1.87E−04 1.72E−04 3.12E−04 9.63E−04 4.33E−03
f6 2.32E+01 2.76E+01 2.76E+01 3.19E+01 2.95E+01
f7 5.37E+01 5.18E+01 5.20E+01 4.74E+01 4.86E+01
f8 2.09E+01 2.09E+01 2.09E+01 2.09E+01 2.09E+01
f9 1.91E+01 1.92E+01 1.79E+01 1.83E+01 1.73E+01
f10 2.98E−02 2.85E−02 2.36E−02 2.84E−02 3.05E−02
f11 7.29E+01 7.25E+01 6.86E+01 7.35E+01 6.93E+01
f12 1.06E+02 1.16E+02 1.14E+02 1.04E+02 9.81E+01
f13 2.09E+02 2.05E+02 1.91E+02 1.97E+02 1.84E+02
f14 2.21E+03 2.24E+03 2.14E+03 2.26E+03 2.16E+03
f15 3.47E+03 3.48E+03 3.50E+03 3.28E+03 3.31E+03
f16 2.86E−01 3.09E−01 2.16E−01 2.14E−01 2.49E−01
f17 1.19E+02 1.09E+02 1.08E+02 1.07E+02 1.08E+02
f18 1.56E+02 1.63E+02 1.60E+02 1.53E+02 1.57E+02
f19 5.62E+00 5.69E+00 5.69E+00 5.22E+00 5.40E+00
f20 1.21E+01 1.18E+01 1.18E+01 1.19E+01 1.16E+01
f21 3.30E+02 3.28E+02 2.98E+02 3.07E+02 3.27E+02
f22 2.63E+03 2.52E+03 2.39E+03 2.58E+03 2.27E+03
f23 3.88E+03 3.90E+03 4.09E+03 3.75E+03 3.79E+03
f24 2.55E+02 2.51E+02 2.55E+02 2.53E+02 2.48E+02
f25 2.79E+02 2.79E+02 2.77E+02 2.74E+02 2.73E+02
f26 2.06E+02 2.03E+02 2.00E+02 2.03E+02 2.03E+02
f27 8.36E+02 8.14E+02 7.96E+02 7.87E+02 7.99E+02
f28 3.40E+02 3.45E+02 2.84E+02 3.21E+02 3.11E+02
Friedman ranking 3.7142 3.5357 2.5000 2.6428 2.6071

B. Comparison of SEDA With Other Methods

1) Comparison of SEDA With DA and Its Variants: In this section, the proposed SEDA is compared with basic DA and its variants, including MDA, DAPML, DAPMB, and DAPME. Their parameter settings follow the corresponding references [12]. The mean errors and Friedman rankings are shown in Table II; the best mean error and Friedman ranking are marked in bold.

From Table II, it can be seen that the six algorithms have the same performance on f1 and f8, while SEDA has a better mean error than the other five algorithms on the other functions. Judging from the Friedman ranking, it can be concluded that the proposed SEDA is the best among them. The convergence curves of f8, f9, f13, and f14 are shown in Fig. 3, while the other 24 convergence curves are in the Supplementary File. From Fig. 3, it can be seen that our method can jump out of local optima in the process of running.

In addition, the Wilcoxon rank-sum test is conducted to verify the performance of all compared algorithms, in which "+" indicates that SEDA performs significantly better than its peer, "−" indicates that SEDA performs worse than its peer, and "=" indicates that SEDA and its peer have no significant difference. The results, presented at the bottom of Table II, show that SEDA is significantly better than DA and its variants.

Finally, the average time consumption of the six algorithms on the 28 test functions is shown in Fig. 4. From it, it can be concluded that the proposed SEDA achieves better performance with less time consumption.

2) Comparison of SEDA With Other Swarm Intelligence Algorithms: In order to further verify the performance of the proposed SEDA, a comparison with ABC, IPSO, DE and CMA-ES is conducted on CEC2013, and the results of the four algorithms are taken from their references [35]–[38]. The comparison results among ABC, DE, CMA-ES, IPSO, and SEDA are presented in Table III, with the best result marked in bold.
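The +/=/− entries at the bottom of Table II follow the Wilcoxon rank-sum protocol; a sketch of one pairwise comparison is given below. The significance level (0.05 here) is our assumption, as the excerpt does not state it.

```python
from scipy.stats import ranksums

def compare(seda_errors, peer_errors, alpha=0.05):
    """Returns '+' if SEDA is significantly better (lower error),
    '-' if significantly worse, '=' if no significant difference."""
    stat, p = ranksums(seda_errors, peer_errors)
    if p >= alpha:
        return "="
    return "+" if stat < 0 else "-"
```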


TABLE II
Mean Error of the Test Function and Friedman Ranking

Function DA MDA DAPML DAPMB DAPME SEDA


f1 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00
f2 3.16E+05 4.57E+05 3.15E+05 3.13E+05 3.01E+05 9.92E+05
f3 1.28E+08 1.16E+08 9.03E+07 1.13E+08 7.16E+07 3.51E+07
f4 4.02E−01 2.13E−01 4.02E−01 3.34E−01 3.00E−01 1.11E−02
f5 2.72E−04 3.10E−04 2.82E−04 2.59E−04 2.61E−04 3.12E−04
f6 2.66E+01 2.69E+01 2.59E+01 2.59E+01 1.96E+01 2.76E+01
f7 8.41E+01 8.77E+01 7.97E+01 8.50E+01 9.06E+01 5.20E+01
f8 2.09E+01 2.09E+01 2.09E+01 2.09E+01 2.09E+01 2.09E+01
f9 2.13E+01 2.27E+01 2.11E+01 2.18E+01 2.16E+01 1.79E+01
f10 2.95E−02 4.36E−02 2.82E−02 3.10E−02 3.23E−02 2.36E−02
f11 8.76E+01 8.34E+01 9.40E+01 8.87E+01 9.18E+01 6.86E+01
f12 1.35E+02 1.38E+02 1.25E+02 1.24E+02 1.37E+02 1.14E+02
f13 2.30E+02 2.36E+02 2.25E+02 2.24E+02 2.26E+02 1.91E+02
f14 2.72E+03 2.83E+03 2.76E+03 2.72E+03 2.73E+03 2.14E+03
f15 3.74E+03 3.62E+03 3.46E+03 3.68E+03 3.67E+03 3.50E+03
f16 3.41E−01 5.21E−01 3.16E−01 3.41E−01 3.34E−01 2.16E−01
f17 1.35E+02 1.33E+02 1.31E+02 1.30E+02 1.37E+02 1.08E+02
f18 1.89E+02 1.88E+02 1.76E+02 1.80E+02 1.85E+02 1.60E+02
f19 6.51E+00 6.17E+00 6.06E+00 5.93E+00 6.11E+00 5.69E+00
f20 1.22E+01 1.20E+01 1.20E+01 1.20E+01 1.20E+01 1.18E+01
f21 3.39E+02 3.30E+02 3.20E+02 3.45E+02 3.47E+02 2.98E+02
f22 3.17E+03 3.20E+03 3.15E+03 3.18E+03 3.10E+03 2.39E+03
f23 4.08E+03 4.07E+03 4.14E+03 4.30E+03 4.17E+03 4.09E+03
f24 2.62E+02 2.68E+02 2.65E+02 2.65E+02 2.65E+02 2.55E+02
f25 2.86E+02 2.90E+02 2.86E+02 2.89E+02 2.87E+02 2.77E+02
f26 2.06E+02 2.10E+02 2.03E+02 2.06E+02 2.06E+02 2.00E+02
f27 8.96E+02 9.15E+02 8.95E+02 8.76E+02 8.99E+02 7.96E+02
f28 4.05E+02 3.19E+02 3.68E+02 3.41E+02 3.95E+02 2.84E+02
Friedman ranking 4.1964 4.625 3.0892 3.4821 3.875 1.7321
+/ = /− 17/8/3 19/7/2 17/9/2 16/9/3 18/7/3 N/A

As shown in Table III, CMA-ES performs extremely well on unimodal functions but suffers from premature convergence on some complex functions. ABC performs best on 10 functions. Judging from the Friedman ranking, SEDA is the best among all the compared algorithms, which indicates that SEDA is more stable across the test functions.

C. Comparison on 13 Imbalanced Datasets

In order to verify the performance of SEDA, comparisons among DA, MDA, DAPML, DAPMB, DAPME, BA, and IPSO, each combined with WELM, are conducted on 13 imbalanced classification datasets in this section, as shown in Table IV. The datasets can be found in the UCI and KEEL databases, where column "Abbr." lists the assigned code of each dataset, "#N" is the number of samples, "#mi/#Ma" represents the numbers of minority and majority samples, "#A" is the number of features in a sample, and "#R" is the imbalance ratio between minority and majority samples in a dataset.

In this simulation, the regularization coefficient and the bandwidth of the radial basis function kernel in WELM are set to 1 and 100, respectively. Each algorithm runs 10 times. The results for G-mean [20] and AUC [39] are shown in Tables V and VI.

In addition, AUC and G-mean are evaluation indexes that measure the quality of binary classifiers. AUC is defined as the area under the ROC (receiver operating characteristic) curve. F1 represents the harmonic mean of Recall and Precision. G-mean is the geometric mean of the prediction accuracies on positive and negative samples.

$$Precision = \frac{TP}{TP + FP} \tag{16}$$

$$Recall = \frac{TP}{TP + FN} \tag{17}$$


Fig. 3. Convergence curves of DA, DAPML, DAPMB, DAPME, MDA, and SEDA on some benchmark functions: (a) f8; (b) f9; (c) f13; (d) f14 (x-axis: evaluations; y-axis: optima).

Fig. 4. Average time cost of the six algorithms on the 28 test functions (y-axis: time consumption).

$$F1 = \frac{2 \times Precision \times Recall}{Precision + Recall} \tag{18}$$

$$\text{G-mean} = \sqrt{\frac{TP}{TP + FN} \times \frac{TN}{TN + FP}} \tag{19}$$

where TP, TN, FP, and FN are respectively the numbers of true positive, true negative, false positive, and false negative cases.

From Table V, it can be seen that SEDA performs the best among them on S1 and S4−S13, and it is a bit worse than IPSO on S2 and S3. DAPML and SEDA have the same performance on S13. The other algorithms perform poorly on the remaining datasets. In addition, the results of the t-test, shown at the bottom of Table V, indicate that SEDA is significantly better than its peers in terms of G-mean values. From Table VI, it can be observed that SEDA is the best on the 12 datasets other than S2 in terms of AUC.

Based on the results of G-mean and AUC, we can conclude that SEDA can pick out a better feature subset from all features and obtain better classification performance than its peers.

D. Comparison With Three Datasets With Large Imbalance Ratio

In order to investigate the performance of the proposed SEDA on datasets with large imbalance ratios, we have selected three datasets from [40], as shown in Table VII. Dataset S16 contains a total of 284 807 samples, with only 492 samples belonging to the minority class, accounting for only 0.172% of the total. The results of G-mean and AUC are presented in Tables VIII and IX, and the best results on each dataset are marked in bold. As can be seen from Tables VIII and IX, on all three large imbalance ratio datasets, our proposed method outperforms the other methods on G-mean, while its AUC value on S15 is slightly lower than that of MDA-W.

E. Comparison via Three CCFD Datasets
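The five reported metrics can be computed from a confusion matrix and prediction scores as sketched below, following (16)–(19). This is our illustration; it assumes the minority (positive) class is labeled 1.

```python
import numpy as np
from sklearn.metrics import confusion_matrix, roc_auc_score

def imbalance_metrics(y_true, y_pred, y_score):
    """Precision, Recall, F1, G-mean per (16)-(19), plus AUC."""
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred, labels=[0, 1]).ravel()
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)               # also the true positive rate
    f1 = 2 * precision * recall / (precision + recall)
    g_mean = np.sqrt(recall * tn / (tn + fp))
    return precision, recall, f1, g_mean, roc_auc_score(y_true, y_score)
```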


TABLE III
Mean Error and Friedman Ranking of the Five Algorithms
Function ABC DE CMA-ES IPSO SEDA
f1 0.00E+00 1.89E-03 0.00E+00 0.00E+00 0.00E+00
f2 6.20E+06 5.52E+04 0.00E+00 3.38E+05 9.92E+05
f3 5.74E+08 2.16E+06 1.41E+01 2.88E+08 3.51E+07
f4 8.75E+04 1.32E-01 0.00E+00 3.86E+04 1.11E-02
f5 0.00E+00 2.48E-03 0.00E+00 5.42E-04 3.12E-04
f6 1.46E+01 7.82E+00 7.82E-02 3.79E+01 2.76E+01
f7 1.25E+02 4.89E+01 1.91E+01 8.79E+01 5.20E+01
f8 2.09E+01 2.09E+01 2.14E+01 2.09E+01 2.09E+01
f9 3.01E+01 1.59E+01 4.81E+01 2.88E+01 1.79E+01
f10 2.27E-01 3.42E-02 1.78E-02 3.40E-01 2.36E-02
f11 0.00E+00 7.88E+01 4.00E+02 1.05E+02 6.86E+01
f12 3.19E+02 8.14E+01 9.42E+02 1.04E+02 1.14E+02
f13 3.29E+02 1.61E+02 1.08E+03 1.94E+02 1.91E+02
f14 3.58E-01 2.38E+03 4.94E+03 3.99E+03 2.14E+03
f15 3.88E+03 5.19E+03 5.02E+03 3.81E+03 3.50E+03
f16 1.07E+00 1.97E+00 5.42E-02 1.31E+00 2.16E-01
f17 3.04E+01 9.29E+01 7.44E+02 1.16E+02 1.08E+02
f18 3.04E+02 2.34E+02 5.17E+02 1.21E+02 1.60E+02
f19 2.62E-01 4.51E+00 3.54E+00 9.51E+00 5.69E+00
f20 1.44E+01 1.43E+01 1.49E+01 1.35E+01 1.18E+01
f21 1.65E+02 3.20E+02 3.44E+02 3.09E+02 2.98E+02
f22 2.41E+01 1.72E+03 7.97E+03 4.30E+03 2.39E+03
f23 4.95E+03 5.28E+03 6.95E+03 4.83E+03 4.09E+03
f24 2.90E+02 2.47E+02 6.62E+02 2.67E+02 2.55E+02
f25 3.06E+02 2.89E+02 4.41E+02 2.99E+02 2.77E+02
f26 2.01E+02 2.52E+02 3.29E+02 2.86E+02 2.00E+02
f27 4.16E+02 7.64E+02 5.39E+02 1.00E+03 7.96E+02
f28 2.58E+02 4.02E+02 4.78E+03 4.01E+02 2.84E+02
Friedman ranking 2.875 2.8393 3.5357 3.3929 2.3571

TABLE IV
Public Data
Abbr. Dataset #N #mi/Ma #A #R
S1 German 1000 300/700 20 2.3
S2 Phoneme 5404 1586/3818 5 2.4
S3 Haberman 306 81/225 3 2.8
S4 Cmc 1473 333/1140 9 3.4
S5 Balance 625 49/576 4 11.8
S6 Vehicle 846 212/634 18 3.0
S7 Glass2 214 17/197 9 11.6
S8 Boston housing 506 106/400 13 3.8
S9 Hepatitis 155 32/123 19 3.8
S10 Nursery 12 960 328/12 632 8 38.5
S11 Wpbc 198 47/151 33 3.2
S12 Satimage 6435 626/5809 36 9.3
S13 Abalone 4177 36/4141 9 115.0


TABLE V
G-Mean

Dataset BA-W DA-W DAPML-W DAPMB-W DAPME-W IPSO-W MDA-W SEDA-W


S1 0.6875 0.6797 0.6808 0.6666 0.6523 0.7170 0.6853 0.7451
S2 0.7243 0.7427 0.7429 0.7430 0.7429 0.7519 0.7428 0.7478
S3 0.5843 0.6013 0.6039 0.6018 0.6060 0.6395 0.6013 0.6361
S4 0.6374 0.6601 0.6614 0.6603 0.6604 0.6778 0.6614 0.6922
S5 0.3973 0.5169 0.5179 0.5205 0.5179 0.5429 0.5175 0.5469
S6 0.6529 0.6692 0.6641 0.6583 0.6581 0.6737 0.6667 0.7120
S7 0.5758 0.7291 0.7291 0.7219 0.7220 0.6270 0.7291 0.7597
S8 0.5724 0.6297 0.6306 0.6309 0.6314 0.6136 0.6324 0.7015
S9 0.7381 0.8984 0.8905 0.8921 0.8343 0.8579 0.8764 0.9537
S10 0.8459 0.8348 0.8445 0.8343 0.8412 0.8580 0.8343 0.8623
S11 0.6651 0.7636 0.7403 0.6972 0.6963 0.7109 0.7625 0.7921
S12 0.6017 0.7152 0.6921 0.6821 0.6800 0.6431 0.6753 0.7240
S13 0.7083 0.7971 0.7981 0.7862 0.7864 0.7731 0.7917 0.7981
t-test (+/ = /−) 13/0/0 11/2/0 11/2/0 12/1/0 11/2/0 9/1/3 11/2/0 N/A

TABLE VI
AUC

Dataset BA-W DA-W DAPML-W DAPMB-W DAPME-W IPSO-W MDA-W SEDA-W


S1 0.6910 0.6864 0.6912 0.6710 0.6683 0.7250 0.6921 0.7514
S2 0.7262 0.7438 0.7437 0.7439 0.7437 0.7526 0.7436 0.7502
S3 0.6229 0.6233 0.6260 0.6219 0.6334 0.6573 0.6233 0.6608
S4 0.6630 0.6637 0.6640 0.6621 0.6642 0.6828 0.6640 0.6983
S5 0.4749 0.5272 0.5278 0.5338 0.5277 0.5542 0.5262 0.5559
S6 0.6580 0.6764 0.6708 0.6663 0.6658 0.6756 0.6716 0.7151
S7 0.6599 0.7686 0.7696 0.7616 0.7636 0.6999 0.7684 0.7911
S8 0.6010 0.6501 0.6506 0.6592 0.6583 0.6457 0.6526 0.7348
S9 0.7500 0.9054 0.8971 0.8769 0.8452 0.8660 0.8804 0.9554
S10 0.8580 0.8501 0.8555 0.8692 0.8556 0.8682 0.8651 0.8718
S11 0.6779 0.7819 0.7588 0.7119 0.7121 0.7269 0.7721 0.7994
S12 0.6767 0.7317 0.7185 0.7195 0.7104 0.7042 0.7127 0.7534
S13 0.7267 0.8038 0.8041 0.7933 0.7936 0.7906 0.7977 0.8041
t-test (+/ = /−) 13/0/0 11/2/0 11/2/0 10/3/0 11/2/0 9/3/1 10/3/0 N/A

Three real-world transaction datasets are selected for this experiment to validate the performance of SEDA; they are LoanPrediction¹, Creditcardcsvpresent², and Default of Credit Card Clients³. For convenience, they are abbreviated as D1−D3, respectively, and listed in Table X.

To reflect the classification performance comprehensively, in addition to G-mean and AUC, three evaluation metrics, Precision, Recall, and F1, are added to measure the performance of each algorithm, and the experimental results are shown in Tables XI−XIII.

The experimental results are analysed as follows.
1) IPSO has the best Precision on dataset D1; SEDA has better results than the other algorithms on Recall, F1, AUC and G-mean. On the key metric G-mean, SEDA beats the second best method IPSO-W by 4.50% and the worst one BA-W by 9.06%, respectively.
2) On dataset D2, SEDA has the best results on all five evaluation metrics. On the key metric G-mean, SEDA outperforms the second best method DA-W by 0.85% and the worst one DAPME-W by 12.08%, respectively.
3) IPSO has the best Precision and F1 on dataset D3; SEDA has the best results on Recall, AUC and G-mean. On the key metric G-mean, SEDA performs 1.44% and 9.41% better than the second best method IPSO-W and the worst one DAPME-W, respectively.
From the analysis of the above experimental results, it can be seen that the method proposed in this paper outperforms the compared methods on most of the indicators, especially the key indicator G-mean, on which it is better than all other methods.

¹ https://github.com/Paliking/ML_examples/blob/master/LoanPrediction/train_u6lujuX_CVtuZ9i.csv
² https://github.com/gksj7/creditcardcsvpresent
³ http://archive.ics.uci.edu/ml/datasets/default+of+credit+card+clients


TABLE VII
The Datasets of Large Imbalance Ratio
Abbr. Dataset #N #mi/Ma #A #R
S14 Statlog (Shuttle) 46 036 450/44 586 8 101.3
S15 Default of credit card clients 23 430 66/23 364 22 354.0
S16 Credit card fraud detection 284 807 492/284 315 30 577.9

TABLE VIII
G-Mean of the Large Imbalance Ratio Datasets

Dataset BA-W DA-W DAPML-W DAPMB-W DAPME-W IPSO-W MDA-W SEDA-W


S14 0.8237 0.9100 0.9102 0.6114 0.9102 0.9180 0.9102 0.9285
S15 0.5004 0.5746 0.6307 0.5707 0.6306 0.5229 0.6308 0.6319
S16 0.5002 0.5746 0.5787 0.5652 0.5446 0.5819 0.5380 0.5942
t-test (+/ = /−) 3/0/0 3/0/0 2/1/0 3/0/0 3/0/0 2/1/0 3/0/0 N/A

TABLE IX
AUC of the Large Imbalance Ratio Datasets

Dataset BA-W DA-W DAPML-W DAPMB-W DAPME-W IPSO-W MDA-W SEDA-W


S14 0.8696 0.9148 0.9150 0.7635 0.9150 0.9224 0.9150 0.9318
S15 0.5035 0.5864 0.6325 0.5726 0.6329 0.5756 0.6387 0.6385
S16 0.5009 0.5864 0.5898 0.5726 0.5555 0.5862 0.5481 0.5943
t-test (+/ = /−) 3/0/0 3/0/0 2/1/0 3/0/0 3/0/0 2/1/0 1/1/1 N/A

TABLE X
Public Data
Abbr. Dataset #N #mi/Ma #A #R
D1 LoanPrediction 614 192/422 11 0.45
D2 Default of credit card clients 30 000 6636/23 364 22 0.28
D3 CreditCardCSVPresent 3075 448/2627 9 0.17

TABLE XI
Comparison of Results for D1

Method Precision Recall F1 AUC G-mean


BA-W 0.7768 0.4539 0.5643 0.6915 0.6457
DA-W 0.7713 0.5408 0.5932 0.7039 0.6618
DAPML-W 0.7713 0.5408 0.5932 0.7039 0.6618
DAPMB-W 0.7519 0.5462 0.5907 0.7012 0.6602
DAPME-W 0.7521 0.5461 0.5898 0.6994 0.6608
IPSO-W 0.8236 0.4945 0.6026 0.7153 0.6739
MDA-W 0.7713 0.5408 0.5932 0.7039 0.6618
SEDA-W 0.7179 0.5989 0.6320 0.7210 0.7042

Finally, we examine the features selected by each algorithm with the largest G-mean value on datasets D1−D3, as shown in Figs. 5−7, where red cells are selected features and white cells are unselected ones. As can be seen in Fig. 5, in D1, the feature in column 10 is selected by all the algorithms in the experiment, and therefore this feature contributes more to the differentiation of categories than the rest, while the features in columns 1 and 7 contribute less. As can be seen in Fig. 6, in D2, the features in column 5 contribute more to distinguishing categories than the rest, while the features in columns 1, 4, 8, 11, 16, 17 and 19 contribute less. As can be seen in Fig. 7, in D3, the feature in column 5 contributes more to distinguishing categories than the rest, while the feature in column 3 contributes less.

TABLE XII
Comparison of Results for D2

Method Precision Recall F1 AUC G-mean


BA-W 0.5265 0.4985 0.5016 0.6681 0.6373
DA-W 0.5537 0.5458 0.5481 0.6983 0.6799
DAPML-W 0.5547 0.5207 0.5346 0.6904 0.6656
DAPMB-W 0.5988 0.4485 0.5129 0.6527 0.6118
DAPME-W 0.5608 0.4497 0.4822 0.6539 0.6129
IPSO-W 0.5393 0.4853 0.5090 0.6722 0.6439
MDA-W 0.5032 0.5367 0.4978 0.6622 0.6255
SEDA-W 0.5666 0.5502 0.5571 0.7039 0.6857

TABLE XIII
Comparison of Results for D3

Method Precision Recall F1 AUC G-mean


BA-W 0.6580 0.7102 0.6426 0.8089 0.7962
DA-W 0.5025 0.7346 0.5951 0.8043 0.8007
DAPML-W 0.5026 0.7347 0.5969 0.8053 0.8018
DAPMB-W 0.5015 0.7343 0.5960 0.8032 0.7908
DAPME-W 0.4818 0.7213 0.5766 0.7936 0.7899
IPSO-W 0.7155 0.7928 0.7250 0.8583 0.8519
MDA-W 0.5025 0.7346 0.5951 0.8043 0.8007
SEDA-W 0.6093 0.8440 0.6885 0.8674 0.8642

Fig. 5. D1 selected features (rows: algorithms; columns: features 1−11; red: selected, white: unselected).

Fig. 7. D3 selected features (rows: algorithms; columns: features 1−9).

Fig. 6. D2 selected features (rows: algorithms; columns: features 1−22).

V. Conclusion

This paper has presented a simple and efficient dandelion algorithm (SEDA) that removes all unnecessary mechanisms of basic DA; an adaptive seeding radius strategy is designed to further reduce the number of DA's parameters. Experimental results show that our proposed algorithm outperforms basic DA and its variants, as well as other swarm intelligence algorithms, on the CEC2013 benchmark functions. In addition, it is found that the proposed algorithm requires less time consumption than its compared peers. Note that SEDA has been combined with WELM for the classification of imbalanced datasets. The experiments are conducted on 13 public datasets plus 3 large-imbalance-ratio datasets, and the results show the superiority of our proposed method. Finally, it is applied to feature selection for credit card fraud detection, and the results show its effectiveness. All experimental data in this paper can be obtained from https://github.com/bbxyzhh/Experimental-data.

Our future work aims to analyze other optimization technologies [41]–[49] and combine them with SEDA to further improve its performance. More real-world datasets [42], [50]–[52] should be used to test the proposed algorithm and its peers. In addition, in practice, credit card data may be incomplete due to physical constraints. Hence, how to deal with incomplete data efficiently should be the focus of our future work [53]–[55].


References

[1] S. Katoch, S. S. Chauhan, and V. Kumar, "A review on genetic algorithm: Past, present, and future," Multimed. Tools Appl., vol. 80, no. 5, pp. 8091–8126, Feb. 2021.
[2] Ş. Öztürk, R. Ahmad, and N. Akhtar, "Variants of artificial bee colony algorithm and its applications in medical image processing," Appl. Soft Comput., vol. 97, p. 106799, Dec. 2020.
[3] W. Deng, J. Xu, H. Zhao, and Y. Song, "A novel gate resource allocation method using improved PSO-based QEA," IEEE Trans. Intell. Transp. Syst., vol. 23, no. 3, pp. 1737–1745, Mar. 2022.
[4] E. B. Tirkolaee, A. Goli, and G. W. Weber, "Fuzzy mathematical programming and self-adaptive artificial fish swarm algorithm for just-in-time energy-aware flow shop scheduling problem with outsourcing option," IEEE Trans. Fuzzy Syst., vol. 28, no. 11, pp. 2772–2783, Nov. 2020.
[5] J. R. Albert, A. Sharma, B. Rajani, A. Mishra, A. Saxena, C. Nandagopal, and S. Mewada, "Investigation on load harmonic reduction through solar-power utilization in intermittent SSFI using particle swarm, genetic, and modified firefly optimization algorithms," J. Intell. Fuzzy Syst., vol. 42, no. 4, pp. 4117–4133, Mar. 2022.
[6] R. Rajabioun, "Cuckoo optimization algorithm," Appl. Soft Comput., vol. 11, no. 8, pp. 5508–5518, Dec. 2011.
[7] J. Bi, H. Yuan, J. Zhai, M. Zhou, and H. V. Poor, "Self-adaptive bat algorithm with genetic operations," IEEE/CAA J. Autom. Sinica, vol. 9, no. 7, pp. 1284–1294, Jul. 2022.
[8] S. Mirjalili, S. Saremi, S. M. Mirjalili, and L. D. S. Coelho, "Multi-objective grey wolf optimizer: A novel algorithm for multi-criterion optimization," Expert Syst. Appl., vol. 47, pp. 106–119, Apr. 2016.
[9] N. Hansen, S. D. Müller, and P. Koumoutsakos, "Reducing the time complexity of the derandomized evolution strategy with covariance matrix adaptation (CMA-ES)," Evol. Comput., vol. 11, no. 1, pp. 1–18, Mar. 2003.
[10] S. Han, K. Zhu, M. Zhou, X. Liu, H. Liu, Y. Al-Turki, and A. Abusorrah, "A novel multiobjective fireworks algorithm and its applications to imbalanced distance minimization problems," IEEE/CAA J. Autom. Sinica, vol. 9, no. 8, pp. 1476–1489, Aug. 2022.
[11] D. H. Wolpert and W. G. Macready, "No free lunch theorems for optimization," IEEE Trans. Evol. Comput., vol. 1, no. 1, pp. 67–82, Apr. 1997.
[12] H. Zhu, G. Liu, M. Zhou, Y. Xie, and Q. Kang, "Dandelion algorithm with probability-based mutation," IEEE Access, vol. 7, pp. 97974–97985, Jul. 2019.
[13] C. Gong, S. Han, X. Li, L. Zhao, and X. Liu, "A new dandelion algorithm and optimization for extreme learning machine," J. Exp. Theor. Artif. Intell., vol. 30, no. 1, pp. 39–52, 2018.
[14] S. Han and K. Zhu, "Fusion with distance-aware selection strategy for dandelion algorithm," Knowl.-Based Syst., vol. 205, p. 106282, Oct. 2020.
[15] S. Han, K. Zhu, and R. Wang, "Improvement of evolution process of dandelion algorithm with extreme learning machine for global optimization problems," Expert Syst. Appl., vol. 163, p. 113803, Jan. 2021.
[16] G.-B. Huang, Q.-Y. Zhu, and C. K. Siew, "Extreme learning machine: A new learning scheme of feedforward neural networks," in Proc. IEEE Int. Joint Conf. Neural Networks, Budapest, Hungary, 2004, pp. 985–990.
[17] S. Han, K. Zhu, and M. Zhou, "Competition-driven dandelion algorithms with historical information feedback," IEEE Trans. Syst. Man Cybern. Syst., vol. 52, no. 2, pp. 966–979, Feb. 2022.
[18] J. Li, J. Q. Zhang, C. J. Jiang, and M. Zhou, "Composite particle swarm optimizer with historical memory for function optimization," IEEE Trans. Cybern., vol. 45, no. 10, pp. 2350–2363, Oct. 2015.
[19] X. Liu and X. Qin, "A probability-based core dandelion guided dandelion algorithm and application to traffic flow prediction," Eng. Appl. Artif. Intell., vol. 96, p. 103922, Nov. 2020.
[20] H. Zhu, G. Liu, M. Zhou, Y. Xie, A. Abusorrah, and Q. Kang, "Optimizing weighted extreme learning machines for imbalanced classification and application to credit card fraud detection," Neurocomputing, vol. 407, pp. 50–62, Sept. 2020.
[21] A. H. Hosseinian and V. Baradaran, "Detecting communities of workforces for the multi-skill resource-constrained project scheduling problem: A dandelion solution approach," J. Ind. Syst. Eng., vol. 12, Special Issue on Project Management and Control, pp. 72–99, Jan. 2019.
[22] V. Bolón-Canedo, N. Sánchez-Maroño, A. Alonso-Betanzos, J. M. Benítez, and F. Herrera, "A review of microarray datasets and applied feature selection methods," Inf. Sci., vol. 282, pp. 111–135, Oct. 2014.
[23] Z. Sun, G. Bebis, and R. Miller, "Object detection using feature subset selection," Pattern Recognit., vol. 37, no. 11, pp. 2165–2176, Nov. 2004.
[24] B. Xue, M. Zhang, and W. N. Browne, "Particle swarm optimisation for feature selection in classification: Novel initialisation and updating mechanisms," Appl. Soft Comput., vol. 18, pp. 261–276, May 2014.
[25] M. K. Masood, Y. C. Soh, and C. Jiang, "Occupancy estimation from environmental parameters using wrapper and hybrid feature selection," Appl. Soft Comput., vol. 60, pp. 482–494, Nov. 2017.
[26] Y. Xue, B. Xue, and M. Zhang, "Self-adaptive particle swarm optimization for large-scale feature selection in classification," ACM Trans. Knowl. Discovery Data, vol. 13, no. 5, p. 50, Oct. 2019.
[27] R. Sagban, H. A. Marhoon, and R. Alubady, "Hybrid bat-ant colony optimization algorithm for rule-based feature selection in health care," Int. J. Electr. Comput. Eng., vol. 10, no. 6, pp. 6655–6663, Dec. 2020.
[28] X. Li, S. Han, L. Zhao, C. Gong, and X. Liu, "New dandelion algorithm optimizes extreme learning machine for biomedical classification problems," Comput. Intell. Neurosci., vol. 2017, p. 4523754, Sept. 2017.
[29] Q. Kang, L. Shi, M. Zhou, X. S. Wang, Q. D. Wu, and Z. Wei, "A distance-based weighted undersampling scheme for support vector machines and its application to imbalanced classification," IEEE Trans. Neural Netw. Learn. Syst., vol. 29, no. 9, pp. 4152–4165, Sept. 2018.
[30] L. Zheng, G. Liu, C. Yan, and C. Jiang, "Transaction fraud detection based on total order relation and behavior diversity," IEEE Trans. Comput. Soc. Syst., vol. 5, no. 3, pp. 796–806, Sept. 2018.
[31] H. Liu, M. Zhou, and Q. Liu, "An embedded feature selection method for imbalanced data classification," IEEE/CAA J. Autom. Sinica, vol. 6, no. 3, pp. 703–715, May 2019.
[32] W. Zong, G.-B. Huang, and Y. Chen, "Weighted extreme learning machine for imbalance learning," Neurocomputing, vol. 101, pp. 229–242, Feb. 2013.
[33] H. Yu, C. Sun, X. Yang, S. Zheng, Q. Wang, and X. Xi, "LW-ELM: A fast and flexible cost-sensitive learning framework for classifying imbalanced data," IEEE Access, vol. 6, pp. 28488–28500, May 2018.
[34] J. J. Liang, B. Y. Qu, P. N. Suganthan, and A. G. Hernández-Díaz, "Problem definitions and evaluation criteria for the CEC 2013 special session on real-parameter optimization," Comput. Intell. Lab., Zhengzhou Univ., Zhengzhou, China and Nanyang Technol. Univ., Singapore, Tech. Rep., Jan. 2013.
[35] D. Karaboga and B. Basturk, "A powerful and efficient algorithm for numerical function optimization: Artificial bee colony (ABC) algorithm," J. Global Optim., vol. 39, no. 3, pp. 459–471, Nov. 2007.
[36] M. Zambrano-Bigiarini, M. Clerc, and R. Rojas, "Standard particle swarm optimisation 2011 at CEC-2013: A baseline for future PSO improvements," in Proc. IEEE Congr. Evolutionary Computation, Cancun, Mexico, 2013, pp. 2337–2344.
[37] R. Storn and K. Price, "Differential evolution - a simple and efficient heuristic for global optimization over continuous spaces," J. Global Optim., vol. 11, no. 4, pp. 341–359, Dec. 1997.
[38] N. Hansen and A. Ostermeier, "Adapting arbitrary normal mutation distributions in evolution strategies: The covariance matrix adaptation," in Proc. IEEE Int. Conf. Evolutionary Computation, Nagoya, Japan, 1996, pp. 312–317.
[39] J. M. Lobo, A. Jiménez-Valverde, and R. Real, "AUC: A misleading measure of the performance of predictive distribution models," Global Ecol. Biogeogr., vol. 17, no. 2, pp. 145–151, Mar. 2008.
[40] S. Han, K. Zhu, M. Zhou, H. Alhumade, and A. Abusorrah, "Locating multiple equivalent feature subsets in feature selection for imbalanced classification," IEEE Trans. Knowl. Data Eng., vol. 35, no. 9, pp. 9195–9209, Sept. 2023.

[41] Y. Xie, G. Liu, C. Yan, C. Jiang, M. Zhou, and M. Li, “Learning transactional behavioral representations for credit card fraud detection,” IEEE Trans. Neural Netw. Learn. Syst., 2022. DOI: 10.1109/TNNLS.2022.3208967.
[42] Y. Xie, G. Liu, C. Yan, C. Jiang, and M. Zhou, “Time-aware attention-based gated network for credit card fraud detection by extracting transactional behaviors,” IEEE Trans. Comput. Soc. Syst., vol. 10, no. 3, pp. 1004–1016, Jun. 2023.
[43] S. Han, K. Zhu, M. Zhou, and X. Cai, “Competition-driven multimodal multiobjective optimization and its application to feature selection for credit card fraud detection,” IEEE Trans. Syst. Man Cybern. Syst., vol. 52, no. 12, pp. 7845–7857, Dec. 2022.
[44] S. Han, K. Zhu, M. Zhou, and X. Liu, “Joint deployment optimization and flight trajectory planning for UAV assisted IoT data collection: A bilevel optimization approach,” IEEE Trans. Intell. Transp. Syst., vol. 23, no. 11, pp. 21492–21504, Nov. 2022.
[45] Z. Huang, Y. Liu, C. Zhan, C. Lin, W. Cai, and Y. Chen, “A novel group recommendation model with two-stage deep learning,” IEEE Trans. Syst. Man Cybern. Syst., vol. 52, no. 9, pp. 5853–5864, Sept. 2022.
[46] M. Cui, L. Li, M. Zhou, and A. Abusorrah, “Surrogate-assisted autoencoder-embedded evolutionary optimization algorithm to solve high-dimensional expensive problems,” IEEE Trans. Evol. Comput., vol. 26, no. 4, pp. 676–689, Aug. 2022.
[47] M. Cui, L. Li, M. Zhou, J. Li, A. Abusorrah, and K. Sedraoui, “A Bi-population cooperative optimization algorithm assisted by an autoencoder for medium-scale expensive problems,” IEEE/CAA J. Autom. Sinica, vol. 9, no. 11, pp. 1952–1966, Nov. 2022.
[48] Z. Zhao, S. Liu, M. Zhou, and A. Abusorrah, “Dual-objective mixed integer linear program and memetic algorithm for an industrial group scheduling problem,” IEEE/CAA J. Autom. Sinica, vol. 8, no. 6, pp. 1199–1209, Jun. 2021.
[49] X. Zhu and M. Zhou, “Multiobjective optimized deployment of edge-enabled wireless visual sensor networks for target coverage,” IEEE Internet Things J., vol. 10, no. 17, pp. 15325–15337, Sept. 2023.
[50] D. Li, Q. Wu, M. Zhou, and F. Luo, “HHFS: A hybrid hierarchical feature selection method for ageing gene classification,” IEEE Trans. Cognit. Dev. Syst., vol. 15, no. 2, pp. 690–699, Jun. 2023.
[51] H. Liu, M. Zhou, X. Lu, A. Abusorrah, and Y. Al-Turki, “Analysis of evolutionary social media activities: Pre-vaccine and post-vaccine emergency use,” IEEE/CAA J. Autom. Sinica, vol. 10, no. 4, pp. 1090–1092, Apr. 2023.
[52] W. Duo, M. Zhou, and A. Abusorrah, “A survey of cyber attacks on cyber physical systems: Recent advances and challenges,” IEEE/CAA J. Autom. Sinica, vol. 9, no. 5, pp. 784–800, May 2022.
[53] X. Luo, Y. Yuan, S. Chen, N. Zeng, and Z. Wang, “Position-transitional particle swarm optimization-incorporated latent factor analysis,” IEEE Trans. Knowl. Data Eng., vol. 34, no. 8, pp. 3958–3970, Aug. 2022.
[54] D. Wu, Y. He, X. Luo, and M. Zhou, “A latent factor analysis-based approach to online sparse streaming feature selection,” IEEE Trans. Syst. Man Cybern. Syst., vol. 52, no. 11, pp. 6744–6758, Nov. 2022.
[55] X. Luo, H. Wu, and Z. Li, “Neulft: A novel approach to nonlinear canonical polyadic decomposition on high-dimensional incomplete tensors,” IEEE Trans. Knowl. Data Eng., vol. 35, no. 6, pp. 6148–6166, Jun. 2023.

Honghao Zhu (Member, IEEE) received the B.S. degree in computer science from Huaibei Normal University in 2003, the M.S. degree from the University of Electronic Science and Technology of China in 2009, and the Ph.D. degree in computer science and technology from Tongji University in 2021. He is an Associate Professor at Bengbu University. His research interests include machine learning, evolutionary algorithms, recommendation systems, and credit card fraud detection.

MengChu Zhou (Fellow, IEEE) received the B.S. degree in control engineering from Nanjing University of Science and Technology in 1983, the M.S. degree in automatic control from Beijing Institute of Technology in 1986, and the Ph.D. degree in computer and systems engineering from Rensselaer Polytechnic Institute, USA, in 1990. He joined the Department of Electrical and Computer Engineering, New Jersey Institute of Technology in 1990, and has been a Distinguished Professor since 2013. His research interests include intelligent automation, robotics, Petri nets, Internet of Things, edge/cloud computing, and big data analytics. He has over 1200 publications, including 17 books, over 800 journal papers (over 650 of them in IEEE Transactions), 31 patents, and 32 book chapters. He is a recipient of the Excellence in Research Prize and Medal from NJIT, the Humboldt Research Award for US Senior Scientists from the Alexander von Humboldt Foundation, the Franklin V. Taylor Memorial Award and the Norbert Wiener Award from the IEEE Systems, Man, and Cybernetics Society, and the Edison Patent Award from the Research & Development Council of New Jersey. He is a Life Member of the Chinese Association for Science and Technology-USA and served as its President in 1999. He is a Fellow of IEEE, the International Federation of Automatic Control (IFAC), the American Association for the Advancement of Science (AAAS), the Chinese Association of Automation (CAA), and the National Academy of Inventors (NAI).

Yu Xie received the B.S. degree in information security from Qingdao University in 2017 and the Ph.D. degree in computer software and theory from Tongji University in 2022. He joined the College of Information Engineering, Shanghai Maritime University, in 2022. His research interests include credit card fraud detection, machine learning, representation learning, and big data.

Aiiad Albeshri received the M.S. and Ph.D. degrees in information technology from Queensland University of Technology, Australia, in 2007 and 2013, respectively. He has been an Associate Professor at the Department of Computer Science, King Abdulaziz University, Saudi Arabia, since 2018. His research interests include information security, trust in cloud computing, big data, and high-performance computing (HPC).