
Received 23 December 2022, accepted 10 January 2023, date of publication 13 January 2023, date of current version 19 January 2023.

Digital Object Identifier 10.1109/ACCESS.2023.3236794

Two Novel SMOTE Methods for Solving Imbalanced Classification Problems
YUAN BAO 1 AND SIBO YANG 2
1 School of Information Science and Technology, Dalian Maritime University, Dalian 116026, China
2 School of Science, Dalian Maritime University, Dalian 116026, China

Corresponding author: Sibo Yang ([email protected])


This work was supported by the National Natural Science Foundation of China under Grant 61720106005.

ABSTRACT The imbalanced classification problem has always been one of the important challenges in neural networks and machine learning. As an effective method for dealing with imbalanced classification problems, the synthetic minority oversampling technique (SMOTE) has a disadvantage: some noise samples may participate in the process of synthesizing new samples, so the new synthetic samples lack rationality, which reduces the classification performance of the network. To remedy this shortcoming, two novel improved SMOTE methods are proposed in this paper: the Center point SMOTE (CP-SMOTE) method and the Inner and outer SMOTE (IO-SMOTE) method. The CP-SMOTE method generates new samples by finding several center points and then linearly combining the minority samples with their corresponding center points. The IO-SMOTE method divides minority samples into inner and outer samples, and then uses inner samples as much as possible in the subsequent process of generating new samples. Numerical experiments show that, compared with the no-sampling and conventional SMOTE methods, the CP-SMOTE and IO-SMOTE methods achieve better classification performances.

INDEX TERMS Imbalanced classification problems, IO-SMOTE method, CP-SMOTE method, machine learning.

I. INTRODUCTION

For general balanced classification problems, conventional neural networks can achieve good classification results. However, in the real world there are many imbalanced problems, such as transaction fraud, cancer diagnosis [1], [2], and virus script judgment. As far as cancer diagnosis is concerned, the number of cancer patients is necessarily small, yet it is precisely these few cancer patients who are the most important research objects. In this situation, the original neural networks [3] can no longer obtain satisfactory classification results, especially for the minority samples. The reason is that too few minority samples make the networks unable to learn the dataset efficiently. Therefore, how to deal with imbalanced problems is an important issue in machine learning.

The approaches to dealing with imbalanced problems roughly come from two directions: algorithm improvement [4], [5] and data processing. Algorithm improvement includes feature selection [6], cost-sensitive learning [7], and integrated learning. One effective data-processing approach is the resampling method [8], which includes undersampling [9] and oversampling [10] methods. The undersampling method removes some samples from the majority class to balance the numbers of positive and negative samples, and then trains the network. The random undersampling (RUS) method [11] is one of the simpler undersampling methods. As the name suggests, the RUS method randomly selects some samples from the majority class Smajor to form a sample set E, and then removes E from Smajor to obtain a new dataset Sminor + Smajor − E. The RUS method modifies the sample distribution by changing the proportion of the majority samples, so as to make the classes more balanced. However, it also has some disadvantages. Since the sample number of the new dataset is less than that of the original dataset, some information will be lost. That is, deleting majority samples might cause the classifier to lose important information about the majority class.
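
To make the RUS step concrete, a minimal Python sketch is given below; it assumes the two classes are stored as NumPy arrays, and the function name and signature are illustrative rather than taken from the paper.

    import numpy as np

    def random_undersample(S_major, S_minor, rng=None):
        # RUS: randomly drop majority samples until both classes have equal size.
        rng = np.random.default_rng(rng)
        # "E" is the randomly chosen majority subset that gets removed; keep the rest.
        keep = rng.choice(len(S_major), size=len(S_minor), replace=False)
        return S_major[keep], S_minor
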
To overcome this shortcoming of the undersampling method, researchers have proposed oversampling methods [12], [13]. The basic idea of oversampling is to add minority samples so that the numbers of positive and negative samples become balanced. The simplest random oversampling (ROS) method [14] randomly selects some samples from the minority set Sminor, generates a sample set E by copying the selected samples, and then adds E to Sminor to obtain a new minority class set Sminor + E. However, with the ROS method the complexity of training the networks increases because of the duplicated minority samples. Moreover, ROS easily causes over-fitting, because it is simply a copying process of the initial samples, which is not conducive to the generalization performance of the network.
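
Under the same assumptions (NumPy arrays, illustrative names), a matching sketch of ROS simply copies randomly chosen minority samples until the class sizes match:

    import numpy as np

    def random_oversample(S_major, S_minor, rng=None):
        # ROS: randomly duplicate minority samples until both classes have equal size.
        rng = np.random.default_rng(rng)
        extra = rng.choice(len(S_minor), size=len(S_major) - len(S_minor), replace=True)
        return S_major, np.vstack([S_minor, S_minor[extra]])
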
To solve the over-fitting problem [15] caused by the ROS method while keeping the dataset balanced, Chawla [16] proposed the synthetic minority oversampling technique (SMOTE). The basic idea of SMOTE is as follows: for each minority sample xi, randomly choose a sample xi′ from its neighbors (xi′ is also a minority sample); then randomly select a point on the line segment between xi and xi′ as the new synthetic minority sample.
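
A minimal sketch of this interpolation step follows; the neighborhood size k and the function name are illustrative assumptions, and the nearest minority neighbors are found by brute force.

    import numpy as np

    def smote(S_minor, n_new, k=5, rng=None):
        # Conventional SMOTE: interpolate between a minority sample and one of its
        # k nearest minority neighbors.
        rng = np.random.default_rng(rng)
        synthetic = []
        for _ in range(n_new):
            xi = S_minor[rng.integers(len(S_minor))]
            # k nearest minority neighbors of xi (excluding xi itself)
            neighbors = np.argsort(np.linalg.norm(S_minor - xi, axis=1))[1:k + 1]
            xj = S_minor[rng.choice(neighbors)]
            eta = rng.random()                      # random point on the segment [xi, xj]
            synthetic.append(xi + eta * (xj - xi))
        return np.array(synthetic)
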
Based on the SMOTE method, many researchers have made improvements and achieved better classification results. Borderline-SMOTE [17] divides the minority samples into three categories, safe, danger and noise, and then only the danger samples are employed to generate new samples. Radius-SMOTE first selects a minority sample xi and calculates a radius according to its k-nearest neighbors; it then takes xi as the center and randomly generates several points whose distance to xi is less than the radius. The R-SMOTE method [18] eliminates the limitation on the distribution of generated minority class instances and improves the classification accuracy of the minority class. ADASYN [19] generates new minority class samples near the original samples that are misclassified by a k-nearest neighbor classifier.
For the original SMOTE method, some noise samples might participate in the process of synthesizing new samples. Thus, the new synthetic samples lack rationality, which reduces the classification performance of the classifier. The purpose of this paper is to propose two novel improved SMOTE methods: the Center point SMOTE (CP-SMOTE) method and the Inner and outer SMOTE (IO-SMOTE) method. The CP-SMOTE method generates new samples by finding several center points and then linearly combining the minority samples with their corresponding center points. As an alternative way to avoid noise samples, the IO-SMOTE method divides minority samples into inner and outer parts, and then uses inner samples as much as possible in the subsequent process of generating new samples. Numerical experiments are carried out to compare the CP-SMOTE and IO-SMOTE methods with the no-sampling and conventional SMOTE methods. Comparing the classification accuracy, prediction rate, recall rate, F1-measure and some other indicators, the CP-SMOTE and IO-SMOTE methods have their own advantages and, on the whole, both are much better than the SMOTE method.

The remainder of this paper is organized as follows: the descriptions of the CP-SMOTE and IO-SMOTE methods are given in Section II. In Section III, numerical experiments on four datasets and the corresponding analysis are carried out after the experiment settings are described. Finally, the conclusion is presented in Section IV.

II. CP-SMOTE AND IO-SMOTE METHODS

A. CP-SMOTE METHOD (CENTER POINT SMOTE METHOD)

For solving the imbalanced classification problem, the conventional SMOTE method synthesizes several minority points to balance the numbers of the various samples. However, this method blurs the boundary between the majority and minority samples. As shown in Fig. 1, suppose that A is chosen as an oversampling point; then point B is randomly selected among the k-nearest neighbor points of A, and point C is randomly generated on the line between A and B. It is not difficult to see that the neighbor points of C are majority points, and even point C itself might be a majority sample. Therefore, the new sample synthesized by the SMOTE method is an extremely unreasonable sample point, which will cause a particularly large error in the subsequent network training and affect the performance of the classifier.



FIGURE 1. Special case of SMOTE method. The stars, circles and square denote the minority samples, majority samples and new synthetic sample, respectively.

FIGURE 2. CP-SMOTE method. The stars, circles and triangle denote the minority samples, majority samples and center point, respectively.

To overcome this shortcoming of the SMOTE method, we propose a new center point SMOTE (CP-SMOTE) method. First, the k-clustering method [20], [21] is used to find several regions of the minority sample distribution. For each region, the Euclidean center point of all the minority points in that region is calculated, and the distance from each minority point to this center point is computed. If this distance is less than the distance from any majority sample point to the center point, then a new point is randomly generated between the minority sample point and the center point; otherwise, the minority sample point is abandoned. As shown in Fig. 2, we find the two regions where the minority samples are located. For the right region, we calculate the distances from the center point O to all points in this region, and the closest distance d from all the majority sample points to O. For each minority sample D, if the distance between D and O is less than d, then we randomly synthesize a point between D and O; otherwise, D does not participate in synthesizing new sample points. For the left region, the same process is applied.

The process is given in Algorithm 1:

Step 1: Divide the imbalanced dataset into majority class samples and minority class samples.
Step 2: The k-clustering method is employed to find n regions of the minority sample distribution and the corresponding center points {O1, O2, . . . , On}, where Oi = (1/m) Σ_{j=1}^{m} Dij, Dij is the j-th point in the i-th region and m is the number of minority points in that region.
Step 3: For i = 1 to i = n, calculate the closest distance di from all the majority sample points to the point Oi.
Step 4: For each minority class sample P, calculate the distance dis from this sample to its corresponding center point.
Step 5: Compare dis with its corresponding di. If dis < di, then synthesize a point by the following criterion:

Pnew = ηP + (1 − η)Oi, (1)

where 0 < η < 1. Otherwise, the point P does not participate in synthesizing new sample points.
Step 6: Put the dataset obtained in Steps 2-5 and the original sample set together, and then train the networks.
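
The following Python sketch follows Algorithm 1 under two stated assumptions: scikit-learn's KMeans is used as a stand-in for the k-clustering method cited above, and η is drawn uniformly from (0, 1), which the algorithm leaves open.

    import numpy as np
    from sklearn.cluster import KMeans

    def cp_smote(S_minor, S_major, n_regions, rng=None):
        # Sketch of Algorithm 1 (CP-SMOTE); KMeans stands in for the k-clustering step.
        rng = np.random.default_rng(rng)
        labels = KMeans(n_clusters=n_regions, n_init=10).fit_predict(S_minor)
        synthetic = []
        for i in range(n_regions):
            region = S_minor[labels == i]
            O_i = region.mean(axis=0)                            # Step 2: center point of region i
            d_i = np.linalg.norm(S_major - O_i, axis=1).min()    # Step 3: closest majority distance
            for P in region:                                     # Steps 4-5
                if np.linalg.norm(P - O_i) < d_i:
                    eta = rng.random()                           # 0 < eta < 1
                    synthetic.append(eta * P + (1 - eta) * O_i)  # Eq. (1)
        return np.array(synthetic)

Step 6 then simply appends the returned points to the original dataset before training.
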
B. IO-SMOTE METHOD (INNER AND OUTER SMOTE METHOD)

Given an imbalanced dataset, let M denote the minority (positive) set and N the majority (negative) set, with |M| < |N|. Here |M| and |N| denote the numbers of samples in M and N, respectively.

For each point x ∈ M, it can obviously be classified into the positive class if most neighbors of x are positive; in this case we call x an inner point. On the other hand, if most neighbors of x are negative points, the class of x is not easy to determine, and x is called an outer point. Therefore, the minority set M is divided into two parts: the inner point set and the outer point set. Here, the k-nearest neighbor method is applied to find the neighbors of point x. Specifically, select two fixed positive integers c1 and c2, where c1 < c2. For any x ∈ M, if there exists c ∈ [c1, c2] such that more than half of the c adjacent points of x are positive, then x is an inner point; otherwise, x is an outer point. As shown in Fig. 3, for the minority sample x1, only one of the six nearest neighbors is a minority sample, so x1 is an outer point; on the contrary, for x2, five of the six nearest neighbors are minority samples, so x2 is an inner point.



FIGURE 3. IO-SMOTE method. The stars and circles denote the minority and majority samples, respectively. x1 is an outer point, x2 is an inner point.

The process is given in Algorithm 2:

Step 1: Divide the imbalanced dataset into the majority set N and the minority set M.
Step 2: Divide the minority set M into two parts: the inner set inner and the outer set outer.
Step 3: In the case inner ≠ ∅ and outer ≠ ∅, for each point x ∈ inner find the point y ∈ outer closest to x. The IO-SMOTE method synthesizes a new point z by the following criterion:

z = ηx + (1 − η)y, (2)

where 0 < η < 1. In this way, the number of points that the IO-SMOTE method synthesizes is equal to the number of inner points.
Step 4: In the case inner = ∅ or outer = ∅, randomly choose three points x1, x2 and x3 from the minority set M. The IO-SMOTE method synthesizes a new point z by the following criterion:

z = η2 x1 + (1 − η2)y, (3)

where

y = η1 x2 + (1 − η1)x3, (4)

and 0 < η1, η2 < 1.
Step 5: Put the dataset obtained in Steps 3-4 and the original sample set together, and train the networks.
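
A sketch of Algorithm 2 under stated assumptions is given below: the bounds c1 and c2 are illustrative defaults, the "more than half" test is applied to the c nearest neighbors (the reading consistent with the Fig. 3 example), and the number of points generated in the Step 4 branch is taken equal to |M|, which the algorithm does not fix.

    import numpy as np

    def split_inner_outer(S_minor, S_major, c1=3, c2=7):
        # Label each minority point as inner or outer from its c nearest neighbors,
        # sweeping c over [c1, c2].
        X_all = np.vstack([S_minor, S_major])
        is_minor = np.arange(len(X_all)) < len(S_minor)
        inner, outer = [], []
        for idx, x in enumerate(S_minor):
            d = np.linalg.norm(X_all - x, axis=1)
            d[idx] = np.inf                                   # exclude the point itself
            order = np.argsort(d)
            if any(is_minor[order[:c]].sum() > c / 2 for c in range(c1, c2 + 1)):
                inner.append(x)
            else:
                outer.append(x)
        return np.array(inner), np.array(outer)

    def io_smote(S_minor, S_major, rng=None, **kw):
        # Sketch of Algorithm 2 (IO-SMOTE): one synthetic point per inner point.
        rng = np.random.default_rng(rng)
        inner, outer = split_inner_outer(S_minor, S_major, **kw)
        synthetic = []
        if len(inner) and len(outer):
            for x in inner:                                   # Step 3: nearest outer point
                y = outer[np.linalg.norm(outer - x, axis=1).argmin()]
                eta = rng.random()
                synthetic.append(eta * x + (1 - eta) * y)     # Eq. (2)
        else:                                                 # Step 4: one of the two sets is empty
            for _ in range(len(S_minor)):                     # assumed count, not fixed by the paper
                x1, x2, x3 = S_minor[rng.choice(len(S_minor), 3, replace=False)]
                eta1, eta2 = rng.random(2)
                y = eta1 * x2 + (1 - eta1) * x3               # Eq. (4)
                synthetic.append(eta2 * x1 + (1 - eta2) * y)  # Eq. (3)
        return np.array(synthetic)
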
III. NUMERICAL EXPERIMENTS

To verify the validity of the CP-SMOTE and IO-SMOTE methods, we compare them with the no-sampling and SMOTE methods on four real classification problems: ecoli1, yeast1, yeast3 and newthyroid1.

A. EXPERIMENT SETTINGS

In our experiments, five-fold cross validation is used [22], [23], [24], [25]. In detail, the dataset is equally divided into five parts; in each run, each part takes its turn as the test set while the remaining parts form the training set, and this five-fold procedure is repeated twenty times. Adding all runs together, one hundred classification results are obtained for each method-data pair. The contents of Tabs. 1-3 are obtained by averaging the corresponding 100 results.

We evaluate the class of a sample according to the actual output: if the actual output is less than 0.50, we regard it as approximately equal to 0 and classify the sample into the negative class; otherwise, if the actual output is greater than 0.50, we regard it as approximately equal to 1 and classify the sample into the positive class. Here, the sigmoidal function is employed as the activation function:

g(x) = 1 / (1 + e^(−x)). (5)

The experiment process is given in Algorithm 3:

Step 1: Input the imbalanced dataset, the minority (positive) set M = {mj | mj ∈ Rn, j = 1, . . . , |M|} and the majority (negative) set N = {nj | nj ∈ Rn, j = 1, . . . , |N|}.
Step 2: The above four methods are applied to generate positive samples Q to balance the numbers of positive and negative samples, respectively.
Step 3: Five-fold cross validation: Φ = M ∪ N ∪ Q = {(xj, oj) | xj ∈ Rn, oj = 0 or 1, j = 1, . . . , T} is equally divided into five parts Φ1, . . . , Φ5.
Step 4: For i = 1 to i = 5, do Steps 4 to 7. Let Φi be the test samples, while Φ \ Φi is the training samples.
Step 5: Train an FNN with the datasets generated by each of the above-mentioned four methods, and test the performances of these four networks.
Step 6: Train an ELM with the datasets generated by each of the above-mentioned four methods, and test the performances of these four networks.
Step 7: Repeat the above procedure (Steps 3-6) twenty times.
Step 8: Compare the one hundred experimental results of these four methods.
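
A compact sketch of this protocol is shown below; StratifiedKFold from scikit-learn is used for the five-fold split, and oversample_fn and train_fn are placeholders standing in for the four sampling methods and the FNN/ELM networks, which the paper does not specify at code level.

    import numpy as np
    from sklearn.model_selection import StratifiedKFold

    def evaluate(oversample_fn, train_fn, X, y, repeats=20, seed=0):
        # Algorithm 3 sketch: five-fold cross validation repeated twenty times.
        # oversample_fn(X_tr, y_tr) returns a balanced training set; train_fn(X_tr, y_tr)
        # returns a trained network whose outputs lie in (0, 1) (sigmoidal activation, Eq. (5)).
        accuracies = []
        for r in range(repeats):
            folds = StratifiedKFold(n_splits=5, shuffle=True, random_state=seed + r)
            for train_idx, test_idx in folds.split(X, y):
                X_bal, y_bal = oversample_fn(X[train_idx], y[train_idx])
                net = train_fn(X_bal, y_bal)
                # output < 0.5 -> negative class (0), otherwise positive class (1)
                pred = (net(X[test_idx]) >= 0.5).astype(int)
                accuracies.append(np.mean(pred == y[test_idx]))
        return np.mean(accuracies)   # 5 folds x 20 repeats = 100 results, as in Tabs. 1-3
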




FIGURE 4. Discrete point models based on four oversampling methods in two dimensions. Red plus signs represent minority sample points, blue dots denote majority samples, green snowflakes are newly generated samples.

B. EXPERIMENTAL RESULTS

For the four datasets, the SMOTE, IO-SMOTE and CP-SMOTE methods are respectively applied to oversample the minority class samples. The newly generated samples are high-dimensional; to visualize them, the PCA technique [26], [27] is employed to reduce the dimensionality of the sample points from the n-dimensional space to a two-dimensional space. The distributions of these points are shown in Fig. 4, where blue represents the majority sample points, red represents the minority sample points, and green represents the synthetic sample points. Obviously, compared with the SMOTE method, the new synthetic sample points of the IO-SMOTE and CP-SMOTE methods are more compact, especially for the CP-SMOTE method. For the CP-SMOTE method, the new synthetic sample points rarely appear near the class boundary, which makes the error smaller in the learning process.
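
A sketch of this visualization step is given below, using scikit-learn's PCA and matplotlib; fitting the projection on the original (majority and minority) samples only is an assumption, since the paper does not state whether the synthetic points are included in the fit.

    import numpy as np
    import matplotlib.pyplot as plt
    from sklearn.decomposition import PCA

    def plot_oversampling(S_major, S_minor, S_new, title):
        # Project all points to two dimensions with PCA and plot them in the colors of Fig. 4.
        pca = PCA(n_components=2).fit(np.vstack([S_major, S_minor]))
        for X, color, marker, label in [(S_major, "blue", ".", "majority"),
                                        (S_minor, "red", "+", "minority"),
                                        (S_new, "green", "*", "synthetic")]:
            Z = pca.transform(X)
            plt.scatter(Z[:, 0], Z[:, 1], c=color, marker=marker, label=label, s=15)
        plt.legend()
        plt.title(title)
        plt.show()
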
Furthermore, the feedforward neural network (FNN) [28] and the extreme learning machine (ELM) [29], [30] are employed to train on the original dataset and on the new datasets obtained by the three oversampling methods (cf. Tabs. 1-2). According to these two tables, the IO-SMOTE and CP-SMOTE methods are both better than the no-sampling and SMOTE methods in terms of training and test accuracies. Moreover, the classification accuracies of the CP-SMOTE method are slightly higher than those of the IO-SMOTE method.

At the same time, we compare the error function in the neural network model (cf. Fig. 5). It can be seen that the dataset without oversampling has the largest error, while the dataset processed with the SMOTE method shows a significant improvement. In addition, the errors of the two novel SMOTE methods are both clearly lower than that of the SMOTE method.



FIGURE 5. Error functions based on four oversampling methods for four datasets.

TABLE 1. Classification accuracies for four oversampling methods in ELM.
TABLE 2. Classification accuracies for four oversampling methods in FNN.

For the purpose of evaluating the error in the learning process of these four methods, besides the classification accuracy we also compare the following five criteria: the prediction rate (PR), the recall rate (RR) [31], the F1-measure [32], the standard deviation (σ) [33] and the root mean square error (RMSE) [34]. The specific calculation formulae are as follows:

PR := TP / (TP + FP),
RR := TP / (TP + FN),
σ := sqrt( (1/(S − 1)) Σ_{i=1}^{S−1} (yi − ȳ)² ),
RMSE := sqrt( (1/S) Σ_{i=1}^{S} (yi − ti)² ),
F1-measure := 2 × PR × RR / (PR + RR),

where TP, FP and FN denote the numbers of true positive, false positive and false negative samples, yi denotes the network output, ti the corresponding target, ȳ the mean output and S the number of results.
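
The five criteria can be computed directly from these formulae; the short sketch below assumes 0/1 labels and real-valued network outputs stored as NumPy arrays, and reads σ as the sample standard deviation of the outputs, which is one natural reading of the definition above.

    import numpy as np

    def classification_criteria(y_true, y_out, threshold=0.5):
        # PR, RR, F1-measure, sigma and RMSE following the formulae above;
        # y_out are real-valued network outputs, y_true are the 0/1 labels.
        y_pred = (y_out >= threshold).astype(int)
        TP = np.sum((y_pred == 1) & (y_true == 1))
        FP = np.sum((y_pred == 1) & (y_true == 0))
        FN = np.sum((y_pred == 0) & (y_true == 1))
        PR = TP / (TP + FP)
        RR = TP / (TP + FN)
        F1 = 2 * PR * RR / (PR + RR)
        sigma = np.std(y_out, ddof=1)                    # sample standard deviation
        rmse = np.sqrt(np.mean((y_out - y_true) ** 2))
        return PR, RR, F1, sigma, rmse
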



TABLE 3. Five classification criteria for four datasets.

Tab. 3 shows the prediction rate, recall rate, F1-measure, σ and RMSE of the four methods. On these four datasets, the IO-SMOTE and CP-SMOTE methods both have better performances than the no-sampling and SMOTE methods on all five criteria. Furthermore, on the Ecoli1, Yeast1 and Yeast3 datasets, the CP-SMOTE method performs better than the IO-SMOTE method, while on the remaining dataset, Newthyroid1, the two methods have their own advantages under different evaluation criteria. Combined with the classification accuracies, the ranking of the four oversampling methods is: CP-SMOTE > IO-SMOTE > SMOTE > No-sampling. SMOTE and the two proposed methods are oversampling methods and do not involve the network structure, so the ELM and FNN networks are sufficient for the experiments; in fact, similar results would be obtained under other network models.

IV. CONCLUSION

This paper proposes two novel improved SMOTE methods to generate new samples: the Center point SMOTE (CP-SMOTE) method and the Inner and outer SMOTE (IO-SMOTE) method. The CP-SMOTE method generates new samples by finding several center points and then making a linear combination of the minority samples and their corresponding center points. The IO-SMOTE method divides minority samples into inner and outer samples, and then uses inner samples as much as possible in the subsequent process of generating new samples. Most of the samples generated by these two methods are far away from the classification boundary, which makes the error smaller in the process of training the network.

Experiments are conducted on four classification problems. The experimental results reveal that the IO-SMOTE and CP-SMOTE methods both have better performances than the traditional SMOTE method.

REFERENCES
[1] M. Saini and S. Susan, "Deep transfer with minority data augmentation for imbalanced breast cancer dataset," Appl. Soft Comput., vol. 97, Dec. 2020, Art. no. 106759.
[2] Q. Li, G. Yu, J. Wang, and Y. Liu, "A deep multimodal generative and fusion framework for class-imbalanced multimodal data," Multimedia Tools Appl., vol. 79, nos. 33-34, pp. 25023-25050, Sep. 2020.
[3] E. Judith and J. M. Deleo, "Artificial neural networks," Cancer, vol. 91, no. 8, pp. 1615-1635, 2001.
[4] J.-J. Zhang and P. Zhong, "Learning biased SVM with weighted within-class scatter for imbalanced classification," Neural Process. Lett., vol. 51, no. 1, pp. 797-817, Feb. 2020.
[5] H. Zhu, H. Liu, and A. Fu, "Class-weighted neural network for monotonic imbalanced classification," Int. J. Mach. Learn. Cybern., vol. 12, no. 4, pp. 1191-1201, Apr. 2021.
[6] B. Selvalakshmi and M. Subramaniam, "Intelligent ontology based semantic information retrieval using feature selection and classification," Cluster Comput., vol. 22, no. 5, pp. 12871-12881, Sep. 2019.
[7] H. Hu, Q. Wang, M. Cheng, and Z. Gao, "Cost-sensitive semi-supervised deep learning to assess driving risk by application of naturalistic vehicle trajectories," Exp. Syst. Appl., vol. 178, Sep. 2021, Art. no. 115041.
[8] N. M. Faber, "Comment on a recently proposed resampling method," J. Chemometrics, vol. 15, no. 3, pp. 169-188, Mar. 2001.
[9] H. Yu, J. Ni, and J. Zhao, "ACOSampling: An ant colony optimization-based undersampling method for classifying imbalanced DNA microarray data," Neurocomputing, vol. 101, pp. 309-318, Feb. 2013.
[10] S. Park and H. Park, "Combined oversampling and undersampling method based on slow-start algorithm for imbalanced network traffic," Computing, vol. 103, no. 1, pp. 1-24, 2021.
[11] M. A. Tahir, J. Kittler, F. Yan, and K. Mikolajczyk, "Concept learning for image and video retrieval: The inverse random under sampling approach," in Proc. 17th Eur. Signal Process. Conf., 2015, pp. 574-578.
[12] S. Kumar, M. S. Chaudhari, R. Gupta, and S. Majhi, "Multiple CFOs estimation and implementation of SC-FDMA uplink system using oversampling and iterative method," IEEE Trans. Veh. Technol., vol. 69, no. 6, pp. 6254-6263, Jun. 2020.
[13] Y. Yang, S. Fu, and E. T. Chung, "Online mixed multiscale finite element method with oversampling and its applications," J. Sci. Comput., vol. 82, no. 2, pp. 1-20, Feb. 2020.
[14] Y. Pang, Z. Chen, L. Peng, K. Ma, C. Zhao, and K. Ji, "A signature-based assistant random oversampling method for malware detection," in Proc. 18th IEEE Int. Conf. Trust, Secur. Privacy Comput. Commun./13th IEEE Int. Conf. Big Data Sci. Eng. (TrustCom/BigDataSE), Aug. 2019, pp. 256-263.
[15] J. Kolluri, V. K. Kotte, M. S. B. Phridviraj, and S. Razia, "Reducing overfitting problem in machine learning using novel L1/4 regularization method," in Proc. 4th Int. Conf. Trends Electron. Informat. (ICOEI), Jun. 2020, pp. 934-938.
[16] N. V. Chawla, A. Lazarevic, L. O. Hall, and K. W. Bowyer, "SMOTEBoost: Improving prediction of the minority class in boosting," in Proc. Eur. Conf. Knowl. Discovery Databases, 2003, pp. 107-119.
[17] H. Hui, W. Y. Wang, and B. H. Mao, "Borderline-SMOTE: A new oversampling method in imbalanced data sets learning," in Proc. Int. Conf. Adv. Intell. Comput., 2005, pp. 878-887.
[18] M. Naseriparsa, A. Al-Shammari, M. Sheng, Y. Zhang, and R. Zhou, "RSMOTE: Improving classification performance over imbalanced medical datasets," Health Inf. Sci. Syst., vol. 8, no. 1, pp. 1-13, Dec. 2020.




[19] H. He, Y. Bai, E. A. Garcia, and S. Li, "ADASYN: Adaptive synthetic sampling approach for imbalanced learning," in Proc. IEEE Int. Joint Conf. Neural Netw., Jun. 2008, pp. 1322-1328.
[20] R. Aishwarya and V. Nagaraju, "Automatic region of interest based medical image segmentation using spatial fuzzy K clustering method 1," Int. J. Electron. Commun. Technol., vol. 3, no. 1, pp. 226-229, Mar. 2012.
[21] S. Mahak, "Image segmentation with modified K-means clustering method," Int. J. Recent Technol. Eng., vol. 1, no. 2, pp. 176-179, 2012.
[22] T.-T. Wong and N.-Y. Yang, "Dependency analysis of accuracy estimates in k-fold cross validation," IEEE Trans. Knowl. Data Eng., vol. 29, no. 11, pp. 2417-2427, Nov. 2017.
[23] P. Jiang and J. Chen, "Displacement prediction of landslide based on generalized regression neural networks with K-fold cross-validation," Neurocomputing, vol. 198, pp. 40-47, Jul. 2016.
[24] J. He and X. Fan, "Evaluating the performance of the K-fold cross-validation approach for model selection in growth mixture modeling," Struct. Equation Model., Multidisciplinary J., vol. 26, no. 1, pp. 66-79, Jan. 2019.
[25] T. Fushiki, "Estimation of prediction error by using K-fold cross-validation," Statist. Comput., vol. 21, no. 2, pp. 137-146, Apr. 2011.
[26] B. C. Moore, "Principal component analysis in linear systems: Controllability, observability, and model reduction," IEEE Trans. Autom. Control, vol. AC-26, no. 1, pp. 17-32, Feb. 1981.
[27] L. E. Pirogov and P. M. Zemlyanukha, "Principal component analysis for estimating parameters of the L1287 dense core by fitting model spectral maps into observed ones," Astron. Rep., vol. 65, no. 2, pp. 82-94, Feb. 2021.
[28] M. Frean, "The upstart algorithm: A method for constructing and training feedforward neural networks," Neural Comput., vol. 2, no. 2, pp. 198-209, Jun. 1990.
[29] G.-B. Huang, Q.-Y. Zhu, and C.-K. Siew, "Extreme learning machine: Theory and applications," Neurocomputing, vol. 70, nos. 1-3, pp. 489-501, 2006.
[30] G. B. Huang, H. Zhou, X. Ding, and R. Zhang, "Extreme learning machine for regression and multiclass classification," IEEE Trans. Syst., Man, Cybern. B, Cybern., vol. 42, no. 2, pp. 513-529, Feb. 2012.
[31] J. M. DuBois, L. S. Boylan, M. Shiyko, W. B. Barr, and O. Devinsky, "Seizure prediction and recall," Epilepsy Behav., vol. 18, nos. 1-2, pp. 106-109, May 2010.
[32] R. Wang and J. Li, "Bayes test of precision, recall, and F1 measure for comparison of two natural language processing models," in Proc. 57th Annu. Meeting Assoc. Comput. Linguistics, 2019, pp. 4135-4145.
[33] H. Azami, A. Fernández, and J. Escudero, "Refined multiscale fuzzy entropy based on standard deviation for biomedical signal analysis," Med. Biol. Eng. Comput., vol. 55, no. 11, pp. 2037-2052, 2017.
[34] C. J. Willmott and K. Matsuura, "Advantages of the mean absolute error (MAE) over the root mean square error (RMSE) in assessing average model performance," Climate Res., vol. 30, no. 1, pp. 79-82, Dec. 2005.

YUAN BAO received the B.S. degree in mathematics and applied mathematics from Henan University, Kaifeng, China, in 2013, and the Ph.D. degree in computational mathematics from the Dalian University of Technology, Dalian, China, in 2020. She is currently a Postdoctoral Fellow with the School of Information Science and Technology, Dalian Maritime University. Her research interests include finite element methods and computer networks.

SIBO YANG received the B.S. and Ph.D. degrees in computational mathematics from the Dalian University of Technology, Dalian, China, in 2013 and 2020, respectively. He is currently a Lecturer with the School of Science, Dalian Maritime University, Dalian. His research interests include extreme learning machine and improvement of learning algorithms in neural networks.

