A Novel Resampling Technique For Imbalanced Classification in Software Defect Prediction by A Re-Sampling Method With Filtering
II. LITERATURE REVIEW

Scholars have noted that finding a practical solution to misclassification is almost always a challenge, because misclassification is most likely to occur in overlap areas or near decision boundaries18. For instance, Napierala et al.18 showed in a number of experiments that the number of borderline samples has a direct influence on a classifier's degradation in an imbalanced scenario. Two distinct approaches have been taken in the literature to address this issue.

A. Modifications of SMOTE

The first approach comprises change-direction techniques associated with SMOTE modifications. These direct SMOTE variants generate positive examples toward specified areas of the data space while taking particular data features into account. This category includes the following methods: ADOMS19, ADASYN20, Borderline-SMOTE21, and Safe-Level-SMOTE22. These techniques are intended to generate positive examples only within the region of the positive class, or around regions where the density of positive examples is high.

B. Extensions of SMOTE

The second approach combines SMOTE with further data preprocessing methods, including filtering. Noise filters are usually applied in ordinary classification problems to screen out possibly noisy samples and to make the classification boundaries clearer and more definite when defining the training sets17. An empirical study of the behavior of classifiers on imbalanced data, and of the filters used when balancing the training data, is given in23; the usefulness of integrating filters with over-sampling approaches is established there. Some generalizations of SMOTE are SMOTE-RSB24, SMOTE-FRST25, SMOTE-Tomek Links (TL)26, SMOTE-ENN [Edited Nearest Neighbor rule (ENN)]27, and SMOTE-IPF17, in which a form of filtering takes place after the SMOTE operation.

Tumar et al.28 proposed an SDP model in which binary moth flame optimization (BMFO) is combined with ADASYN to overcome data imbalance problems. When used on the PROMISE dataset, the suggested strategy improves the results of numerous classifiers.

Rathore et al.29 investigated three generative oversampling techniques for handling imbalanced data in software fault prediction: conditional generative adversarial networks (CTGAN), vanilla GAN, and Wasserstein GAN with gradient penalty (WGANGP). Their experiments were carried out on PROMISE, JIRA, and Eclipse datasets; when these sampling strategies are used with baseline models on fault datasets, the baseline model results are significantly improved.

For intra-release and cross-release SDP, linear and non-linear Bayesian regression have been applied by Singh and Rathore30, integrated with SMOTE data sampling and compared against Random Forest (RF), Support Vector Machine (SVM), Linear Regression (LR), Linear Bayesian Regression (LBR), and Non-linear Bayesian Regression (NLBR). The study demonstrated that, on a dataset of 46 independent software products, the non-linear Bayesian model outperforms the linear regression models.

Elahi et al.31 carried out a study in which a number of ensemble methods used in SDP were analyzed. For classification, Logistic Regression (LR), Naive Bayes (NB), binomial and multinomial NB, Decision Tree (DT), and K-nearest neighbor (KNN) were applied. The experiment was performed on four datasets from the PROMISE repository, with the F-measure as the performance measure. The finding that model averaging outperforms the voting and stacking ensemble approaches was derived directly from the data.

Tong et al.32 proposed SHSE, which can be described as a mixture of hybrid sampling, feature subspace selection, and ensemble learning; the data imbalance problem is addressed by the subspace hybrid sampling approach. In experiments on 27 datasets, SHSE performed better than the other algorithms used for software defect number prediction, and DT performs best when implemented together with SHSE.

To address the data imbalance issue in SDP, Goyal33 proposed an innovative sampling technique known as neighborhood-based undersampling (N-US). ANN, DT, KNN, SVM, and NB classifier models are used in the modeling process, and the study also makes use of the PROMISE dataset. Accuracy, AUC, and ROC are used to measure model performance. As evidenced in that work, the classifiers' accuracy increases when the N-US approach is applied.

Similarly, Pandey et al.34 used the NASA and PROMISE repositories for SDP. To address the data imbalance problem, SMOTE is applied, and Kernel Principal Component Analysis (K-PCA) is further used on the dataset as a feature selection procedure to exclude irrelevant features. Compared with conventional fault prediction methods based on NB, LR, a Multi-Layer Perceptron neural network (MLP), and SVM, the K-PCA and SMOTE methods incorporated with the Extreme Learning Machine (PCA-ELM) achieved technically higher ROC indices in that study, and the recommended technique provides more objective results than other reliable methods.

To overcome the data imbalance problem, Yedida and Menzies35 put forward a new oversampling technique called fuzzy sampling. An SDP model is developed by means of a deep belief network, and the experiment is implemented on the PROMISE dataset, using AUC, recall, and false alarm rates to assess the performance of the methodologies. The authors find that oversampling is necessary before applying deep learning for SDP.

Pandey et al.36 performed experiments on raw NASA datasets to detect software faults. These datasets are highly imbalanced, so the SMOTE technique is applied to them, and the SqueezeNet and Bottleneck deep learning models are then trained on the balanced data.

Tantithamthavorn et al.37 applied four class rebalancing procedures, namely oversampling, undersampling, SMOTE, and Random Oversampling Examples (ROSE), combined with the NB, AVNNet, xGBTree, C5.0, RF, LR, and GBM classification algorithms. The study revealed that AUC can be improved by optimizing the parameters of SMOTE.

For defect prediction, Nitin et al.38 used four ensemble techniques (random forest, bagging, random subspace, and boosting) together with SMOTE for handling imbalanced data. The ensemble techniques employ DT, LR, and KNN as base learners. Fifteen datasets from the Eclipse and PROMISE repositories are used in the experiment.

Balaram and Vasundra39 proposed a model, E-RF-ADASYN, in which BOA is combined with an ensemble random forest (E-RF) linked to ADASYN; the PROMISE dataset is used in the study. BOA is used to address the overfitting issue, and ADASYN is employed to address the class imbalance issue. Evaluations in terms of specificity, AUC, and sensitivity showed that the proposed E-RF-ADASYN performs slightly better than the KNN and DT classifiers. Table I shows some existing SDP models that used balancing techniques.
III. THE PROPOSED METHOD

In this section, we introduce the SMOTE-RSTNF technique for improving the SMOTE algorithm, in which each stage removes noisy and borderline instances that can deteriorate learning performance. In the suggested algorithm, new synthetic minority class instances are first added to the training set using SMOTE. Synthetic instances and majority class instances are then removed when rough set analysis places them outside the lower approximation of their decision class, and the resulting data is finally cleaned with IPF noise filtering. The three components are detailed in the subsections below; a high-level sketch of the pipeline follows.
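The sketch below renders that three-stage description as code under simple assumptions (numeric features, binary labels). The helper names smote_oversample, rst_keep_mask, and ipf_keep_mask are hypothetical stand-ins for the components described in the following subsections, not the exact implementation evaluated in this work.

```python
import numpy as np

def smote_rstnf(X, y, smote_oversample, rst_keep_mask, ipf_keep_mask):
    """Sketch of the SMOTE-RSTNF preprocessing pipeline:
    (1) SMOTE oversampling, (2) rough-set screening, (3) IPF cleaning."""
    # Stage 1: add synthetic minority instances to balance the classes.
    X_bal, y_bal = smote_oversample(X, y)

    # Stage 2: keep only instances inside the lower approximation
    # of their own decision class (rough set analysis).
    mask = rst_keep_mask(X_bal, y_bal)
    X_rst, y_rst = X_bal[mask], y_bal[mask]

    # Stage 3: iteratively filter the remaining noisy instances (IPF).
    mask = ipf_keep_mask(X_rst, y_rst)
    return X_rst[mask], y_rst[mask]
```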
A. Synthetic Minority Over-Sampling Technique (SMOTE)

The vast majority of under-sampling and minority over-sampling methods have been discussed elaborately in the literature on data sampling. This work employs SMOTE2, an algorithm that generates new synthetic instances of the minority class, doing so not in data space but in feature space. A SMOTE instance is given by s = x + u × (x′ − x), with 0 ≤ u ≤ 1, where x and x′ are two similar samples belonging to the minority class and x′ is randomly selected from the k nearest neighbors of x within the minority class. The construction of the new examples expands the fullness and generality of the minority class while reducing the rarity of its occurrences.
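As a concrete reading of the interpolation rule above, the sketch below creates one synthetic sample per minority instance. It is a minimal NumPy illustration of s = x + u × (x′ − x), not a full SMOTE implementation (which would also control the oversampling ratio and handle nominal attributes).

```python
import numpy as np

def smote_sample(X_min, k=5, rng=np.random.default_rng(0)):
    """Generate one synthetic sample per minority instance via
    s = x + u * (x' - x), with x' among the k nearest minority neighbors."""
    synthetic = []
    for i, x in enumerate(X_min):
        # distances from x to every other minority instance
        d = np.linalg.norm(X_min - x, axis=1)
        d[i] = np.inf                      # exclude x itself
        neighbors = np.argsort(d)[:k]      # indices of the k nearest neighbors
        x_prime = X_min[rng.choice(neighbors)]
        u = rng.uniform(0.0, 1.0)          # interpolation factor in [0, 1]
        synthetic.append(x + u * (x_prime - x))
    return np.array(synthetic)
```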
B. Rough Set Analysis

In rough set analysis40, data is represented as an information system S = (X, A), where X = {x1, ..., xn} and A = {a1, ..., am} are the non-empty, finite sets of objects and attributes, respectively. There is a mapping a : X → V_a for every a ∈ A, where V_a is the value set of attribute a. The B-indiscernibility relation R_B is defined with regard to any subset B ⊆ A:

R_B = {(x, y) ∈ X² : a(x) = a(y), ∀a ∈ B}   (1)

R_B is an equivalence relation, which creates a partition of the universe X, denoted X/R_B. The equivalence class of x is [x]_{R_B} = {y ∈ X : (x, y) ∈ R_B}, and X/R_B = {[x]_{R_B} : x ∈ X}. Given U ⊆ X, the lower and upper approximations with respect to R_B are determined by

R_B↓U = {x ∈ X : [x]_{R_B} ⊆ U}   (2)

R_B↑U = {x ∈ X : [x]_{R_B} ∩ U ≠ ∅}   (3)

In the context of classification, a decision system (X, A ∪ {d}) is a special type of information system in which the designated attribute d (d ∉ A) is called the decision attribute. The decision classes with respect to d are given by X/R_d = {[x]_{R_d} : x ∈ X}. Given B ⊆ A, the objects of X for which the values of B allow an unambiguous prediction of the decision class are included in the B-positive region POS_B:

POS_B = ⋃_{x∈X} R_B↓[x]_{R_d}   (4)
Indeed, if x ∈ POS_B, then every object indiscernible from x with respect to B is a member of the same decision class as x. The following number (the degree of dependency of d on B) represents the predictive ability of the attributes in B with respect to d:

γ_B = |POS_B| / |X|   (5)

(X, A ∪ {d}) is called consistent if γ_A = 1. A subset B of A is referred to as a decision reduct if it meets two requirements: (1) POS_B = POS_A, meaning that B maintains A's ability to make decisions; and (2) it cannot be reduced further, that is, POS_B′ = POS_A does not hold for any proper subset B′ of B. We refer to B as a decision superreduct if the latter requirement is lifted, that is, if B is not necessarily minimal.
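The toy sketch below makes Eqs. (1)–(5) concrete for a small decision table with discrete attributes, computing equivalence classes, the positive region, and the dependency degree γ_B. It is illustrative only and is not the rough-set component used in the reported experiments.

```python
from collections import defaultdict

def equivalence_classes(rows, attrs):
    """Partition object indices by their values on `attrs` (Eq. 1)."""
    classes = defaultdict(set)
    for i, row in enumerate(rows):
        classes[tuple(row[a] for a in attrs)].add(i)
    return list(classes.values())

def positive_region(rows, decisions, attrs):
    """POS_B: union of lower approximations of the decision classes (Eqs. 2, 4)."""
    pos = set()
    for eq in equivalence_classes(rows, attrs):
        # an equivalence class counts only if it lies inside one decision class
        if len({decisions[i] for i in eq}) == 1:
            pos |= eq
    return pos

# toy decision table: two condition attributes (0, 1), decision labels below
rows = [(1, 0), (1, 0), (0, 1), (0, 1), (1, 1)]
decisions = ['faulty', 'faulty', 'clean', 'faulty', 'clean']
pos = positive_region(rows, decisions, attrs=[0, 1])
gamma = len(pos) / len(rows)   # degree of dependency (Eq. 5)
print(pos, gamma)              # {0, 1, 4} 0.6
```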
C. Iterative Partitioning Filtering Based Noise Filtering

IPF arises from the work of Khoshgoftaar and Rebours41. The method eliminates noisy instances over a number of iterations, and the iterative process stops when, for a number of consecutive iterations k, the number of identified noisy examples in each of them is less than a percentage p of the size of the initial training data set. Initially, the approach starts with an empty set of noisy instances, A = ∅. In each iteration, the current training set is split into n partitions, a base classifier is built on each partition and used to classify every instance, and instances are flagged as noisy according to a vote over these classifiers.
Two voting techniques can be employed to identify noisy examples: consensus and majority. The former removes an example if it is misclassified by all the classifiers, while the latter eliminates an example if it is misclassified by more than half of the classifiers. The parameter setting of the IPF implementation used in this study was chosen with regard to the degree of balance and the level of noisy and borderline samples found in imbalanced datasets once preprocessed with SMOTE. In particular, the majority scheme is applied to recognize the noisy examples, the number of partitions with random examples is n = 9, k = 3 consecutive iterations are used for the stop criterion, and the threshold on deleted examples is p = 1%. Scholarly investigations into the effect of these parameters on the results were used to determine this configuration.
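One possible rendering of this filter is sketched below, using scikit-learn decision trees as the base classifier (the study itself ran C4.5 inside WEKA, so this is only a stand-in). The partition count, stop criterion, and voting scheme follow the n = 9, k = 3, p = 1% majority-vote setting described above, but the code is an illustrative approximation rather than the exact filter used here.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def ipf_filter(X, y, n=9, k=3, p=0.01, rng=np.random.default_rng(0)):
    """Iterative Partitioning Filter with majority voting (sketch)."""
    X, y = np.asarray(X), np.asarray(y)
    keep = np.arange(len(y))           # indices of currently retained samples
    below_threshold = 0
    while below_threshold < k:         # stop after k consecutive quiet rounds
        idx = rng.permutation(len(keep))
        votes = np.zeros(len(keep))
        # build one tree per partition; each tree classifies *all* instances
        for part in np.array_split(idx, n):
            if part.size == 0:
                continue
            tree = DecisionTreeClassifier().fit(X[keep][part], y[keep][part])
            votes += tree.predict(X[keep]) != y[keep]
        noisy = votes > n / 2          # majority vote marks an instance noisy
        # quiet round: fewer noisy examples than p% of the initial data size
        below_threshold = below_threshold + 1 if noisy.sum() < p * len(y) else 0
        keep = keep[~noisy]            # drop the detected noisy instances
    return keep
```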
IV. EXPERIMENTAL DESIGN

In this work, the training data is divided randomly into ten mutually exclusive subsets (folds) of nearly equal size. In each round of the 10-fold procedure, nine folds are chosen to train the models, which are then tested on the remaining fold; this continues until every fold has been used for both training and testing. Empirical data is generated and analyzed using WEKA version 3.8.1, MATLAB R2016a, the KEEL software tool, and the R statistical program. The C4.5 learning algorithm is applied through the WEKA tools with the default settings, and the AUC statistic determines how well the constructed models perform in terms of classification. Datasets: all the datasets in the experiment, except for JDT and PDE, which are mined from42, were derived from public software project data repositories43. The characteristics of the data are presented in Table II.
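The evaluation protocol can be reproduced approximately in Python as below, applying the preprocessing inside each fold and scoring with AUC. The paper ran C4.5 in WEKA, so the scikit-learn entropy-based decision tree is only a stand-in, and the preprocess argument refers to a pipeline such as the hypothetical smote_rstnf sketch given earlier.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import roc_auc_score

def evaluate(X, y, preprocess, seed=0):
    """10-fold cross-validation with in-fold preprocessing, mean AUC."""
    aucs = []
    folds = StratifiedKFold(10, shuffle=True, random_state=seed)
    for tr, te in folds.split(X, y):
        # resample/filter only the training fold to avoid leakage
        X_tr, y_tr = preprocess(X[tr], y[tr])
        clf = DecisionTreeClassifier(criterion="entropy", random_state=seed)
        clf.fit(X_tr, y_tr)
        aucs.append(roc_auc_score(y[te], clf.predict_proba(X[te])[:, 1]))
    return float(np.mean(aucs))
```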
FIG. 2: Average AUC over all datasets for the C4.5 classifier.

V. RESULTS AND DISCUSSION

To evaluate the results obtained in this paper, several statistical tests are run to compare our method both with the case without preprocessing and with eleven other preprocessing techniques selected from the literature. The outcomes of the experimental investigation on the test partitions are summarized in Table V, where the approaches are ranked for each data set with the best one listed first.
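The nonparametric comparisons referred to here and in the conclusion (a Friedman test over all methods, with signed-rank post hoc checks) can be carried out as sketched below; the AUC matrix is a placeholder shape, not the paper's actual numbers.

```python
import numpy as np
from scipy.stats import friedmanchisquare, wilcoxon

# auc[i, j]: AUC of method j on dataset i (placeholder values)
auc = np.random.default_rng(0).uniform(0.6, 0.95, size=(6, 13))

# Friedman test over all 13 methods across the 6 datasets
stat, p = friedmanchisquare(*auc.T)
print(f"Friedman: chi2={stat:.2f}, p={p:.4f}")

# pairwise Wilcoxon signed-rank: proposed method (last column) vs SMOTE
stat, p = wilcoxon(auc[:, -1], auc[:, 1])
print(f"Wilcoxon: W={stat:.2f}, p={p:.4f}")
```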
TABLE III: AUC Classification Results Over All Datasets for the C4.5 Classifier.
Dataset Normal SMOTE ADASYN ADOMS MWMOTE SL-SMOTE BL-SMOTE SMOTE-ENN SMOTE-RSB SMOTE-FRS SMOTE-TL SMOTE-IPF Proposed Method
ant-1.7 0.665 0.781 0.745 0.846 0.809 0.741 0.815 0.796 0.765 0.847 0.799 0.778 0.808
arc 0.502 0.836 0.813 0.864 0.88 0.783 0.931 0.843 0.75 0.856 0.869 0.833 0.883
EQ 0.686 0.75 0.726 0.771 0.779 0.702 0.785 0.842 0.739 0.646 0.836 0.799 0.851
jm1 0.617 0.7 0.672 0.819 0.833 0.695 0.795 0.757 0.728 0.692 0.734 0.756 0.872
LC 0.654 0.844 0.838 0.92 0.927 0.737 0.937 0.797 0.808 0.904 0.835 0.866 0.95
PDE 0.577 0.805 0.794 0.879 0.879 0.705 0.869 0.854 0.831 0.807 0.803 0.85 0.908
Average 0.617 0.786 0.765 0.85 0.851 0.72 0.855 0.815 0.782 0.797 0.815 0.805 0.879
Winner 0 0 0 0 0 0 1 0 0 1 0 0 4
TABLE V: Ranking of the compared methods for each data set (best first).
arc: BL-SMOTE, SMOTE-FRS, Proposed Method, MWMOTE, SMOTE-TL, ADOMS, SMOTE-ENN, SMOTE, SMOTE-IPF, ADASYN, SL-SMOTE, SMOTE-RSB, Normal
EQ: Proposed Method, SMOTE-ENN, SMOTE-TL, SMOTE-IPF, BL-SMOTE, MWMOTE, ADOMS, SMOTE, SMOTE-RSB, ADASYN, SL-SMOTE, Normal, SMOTE-FRS
jm1: Proposed Method, MWMOTE, ADOMS, BL-SMOTE, SMOTE-ENN, SMOTE-IPF, SMOTE-TL, SMOTE-RSB, SMOTE, SMOTE-FRS, ADASYN, SL-SMOTE, Normal
LC: Proposed Method, BL-SMOTE, MWMOTE, ADOMS, SMOTE-FRS, SMOTE-RSB, SMOTE-IPF, SMOTE-TL, SMOTE, ADASYN, SMOTE-ENN, SL-SMOTE, Normal
PDE: Proposed Method, ADOMS, MWMOTE, BL-SMOTE, SMOTE-ENN, SMOTE-RSB, SMOTE-FRS, SMOTE, SMOTE-TL, SMOTE-IPF, ADASYN, SL-SMOTE, Normal
TABLE VI: Post hoc comparison for α = 0.05; our proposed technique is the control method.
Because the proposed method discards, in its first stage, the samples that fall outside the lower approximations, the subsequent IPF stage can concentrate on the remaining noise and borderline examples. This indicates more efficient noise filtering, since instances that were discarded in the first stage do not interfere with the detection process at the next stage. Furthermore, the ensemble nature of IPF combined with RST allows it to pool together predictions made by different classifiers, leading to a better estimation of hard-to-classify noisy samples than relying on a single classifier.
IPF + RST might be effective in removing noise and borderline samples because such characteristics appear to be its main strength, rather than the versatility attributed to lower-ranked noise filters. Most of the noise filters paired with SMOTE, such as ENN or TL, define noise with respect to nearest-neighbor instances rather than with respect to the two classes. Even though it has not received attention in the literature, this issue can be a source of problems: since SMOTE positions a new positive example relative to a nearest neighbor, faulty synthetic examples are likely to escape detection by noise filters that are themselves based on nearest neighbors. While such filters will catch some exceptions that have been classified as noise, noise identification methods such as IPF and RST, which are based on more complex criteria, are able to group samples with similar characteristics. This solves the problem mentioned earlier and makes the outliers considerably easier to spot.

Among all its characteristics, IPF's parameter selection can be considered one of its major weaknesses, as there are many parameters and their values dictate most of the filter's performance. From our numerous experiments, however, we may draw some conclusions as to the effects of the different parameters on the performance results. Since there are enough noisy and borderline examples with regard to the number of safe examples, we have confirmed that the majority scheme is superior to the consensus scheme. The consensus scheme has been noted to be quite conservative with regard to deleting examples, and it does not permit deletion to the extent that would significantly improve the performance.

VI. CONCLUSION

When it comes to learning from imprecise data, handling the presence of noise and outliers is still an active area of research. In this work, we proposed SMOTE-RSTNF, a combined preprocessing technique for imbalanced, multifaceted data. After applying SMOTE to generate synthetic samples, we employed RST to remove the original majority samples and the synthetic samples that were not part of the lower approximation of their class; IPF is then applied to clean up all of the data. Employing real-world software datasets, the suggested methodology was used to develop classifiers that could identify malfunctioning modules. The techniques were tested using the C4.5 classifier, and the outcomes were assessed using the AUC performance metric. The experimental results show that our proposed approach performs better than the existing methods reported in the current literature, and the Friedman scheme with post hoc signed-rank tests revealed that the advantage of the suggested strategies is statistically significant across the independent experimental investigations. In our further work, we plan to address other problems associated with data, such as class overlapping, which is not considered in this work; taking it into consideration may enhance the performance further. We also plan to apply boosting to bring greater improvements when implementing the
suggested method for SDP.

DECLARATION OF COMPETING INTEREST

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

7. M. Galar, A. Fernández, E. Barrenechea, H. Bustince, and F. Herrera, "A review on ensembles for the class imbalance problem: Bagging-, boosting-, and hybrid-based approaches," IEEE Transactions on Systems, Man, and Cybernetics, Part C (Applications and Reviews) 42, 463–484 (2011).
8. T. M. Khoshgoftaar, K. Gao, and N. Seliya, "Attribute selection and imbalanced data: Problems in software defect prediction," in Proceedings of the 2010 22nd IEEE International Conference on Tools with Artificial Intelligence, Vol. 1 (IEEE, 2010) pp. 137–144.
9. D. Van Nguyen, K. Ogawa, K.-i. Matsumoto, and M. Hashimoto, "Editing training sets from imbalanced data using fuzzy-rough sets," in Artificial Intelligence Applications and Innovations: 11th IFIP WG 12.5 International Conference, AIAI 2015, Bayonne, France, September 14–17, 2015, Proceedings (Springer, 2015) pp. 115–129.
10. C. Seiffert, T. M. Khoshgoftaar, J. Van Hulse, and A. Napolitano, "RUSBoost: A hybrid approach to alleviating class imbalance," IEEE Transactions on Systems, Man, and Cybernetics—Part A: Systems and Humans 40, 185–197 (2009).
11. C. W. Yohannese, T. Li, M. Simfukwe, and F. Khurshid, "Ensembles based combined learning for improved software fault prediction: A comparative study," in Proceedings of the 2017 12th International Conference on Intelligent Systems and Knowledge Engineering (ISKE) (IEEE, 2017) pp. 1–6.
12. S. Wang and X. Yao, "Using class imbalance learning for software defect prediction," IEEE Transactions on Reliability 62, 434–443 (2013).
13. C. W. Yohannese, T. Li, and K. Bashir, "A three-stage based ensemble learning for improved software fault prediction: An empirical comparative study," International Journal of Computational Intelligence Systems 11, 1229–1247 (2018).
14. K. Bashir, T. Li, C. W. Yohannese, M. Yahaya, and T. Ali, "A…
21. H. Han, W.-Y. Wang, and B.-H. Mao, "Borderline-SMOTE: A new over-sampling method in imbalanced data sets learning," in International Conference on Intelligent Computing (Springer, 2005) pp. 878–887.
22. C. Bunkhumpornpat, K. Sinapiromsaran, and C. Lursinsap, "Safe-level-SMOTE: Safe-level-synthetic minority over-sampling technique for handling the class imbalanced problem," in Advances in Knowledge Discovery and Data Mining (Springer, 2009) pp. 475–482.
23. V. García, J. Sánchez, and R. A. Mollineda, "An empirical study of the behavior of classifiers on imbalanced and overlapped data sets," in Progress in Pattern Recognition, Image Analysis and Applications, 397–406 (2007).
24. E. Ramentol, Y. Caballero, R. Bello, and F. Herrera, "SMOTE-RSB*: A hybrid preprocessing approach based on oversampling and undersampling for high imbalanced data-sets using SMOTE and rough sets theory," Knowledge and Information Systems 33, 245–265 (2012).
25. E. Ramentol, N. Verbiest, R. Bello, Y. Caballero, C. Cornelis, and F. Herrera, "SMOTE-FRST: A new resampling method using fuzzy rough set theory," in Uncertainty Modeling in Knowledge Engineering and Decision Making (World Scientific, 2012) pp. 800–805.
26. G. E. Batista, A. L. Bazzan, and M. C. Monard, "Balancing training data for automated annotation of keywords: A case study," WIT Transactions on Information and Communication Technologies 29, 10–18 (2003).
27. G. E. Batista, R. C. Prati, and M. C. Monard, "A study of the behavior of several methods for balancing machine learning training data," ACM SIGKDD Explorations Newsletter 6, 20–29 (2004).
28. I. Tumar, Y. Hassouneh, and H. Turabieh, "Enhanced binary moth flame optimization as a feature selection algorithm to predict software fault prediction," IEEE Access 8, 8041–8055 (2020).
29. S. S. Rathore, S. Chouhan, D. Jain, and A. Vachhani, "Generative oversampling methods for handling imbalanced data in software fault prediction," IEEE Transactions on Reliability 71, 747–762 (2022).
30. R. Singh and S. S. Rathore, "Linear and non-linear bayesian regression methods for software fault prediction," International Journal of System Assurance Engineering and Management 13, 1864–1884 (2022).
31. E. Elahi, S. Kanwal, and A. N. Asif, "A new ensemble approach for software fault prediction," in 2020 17th International Bhurban Conference on Applied Sciences and Technology (IBCAST) (IEEE, 2020) pp. 407–412.
32. H. Tong, W. Lu, W. Xing, B. Liu, and S. Wang, "SHSE: A subspace hybrid sampling ensemble method for software defect number prediction," Information and Software Technology 142, 106747 (2022).
33. S. Goyal, "Handling class-imbalance with KNN (neighbourhood) under-sampling for software defect prediction," Artificial Intelligence Review 55, 2023–2064 (2022).
34. S. K. Pandey, D. Rathee, and A. K. Tripathi, "Software defect prediction using K-PCA and various kernel-based extreme learning machine: an empirical study," IET Software 14, 768–782 (2020).
35. R. Yedida and T. Menzies, "On the value of oversampling for deep learning in software defect prediction," IEEE Transactions on Software Engineering 48, 3103–3116 (2021).
36. S. K. Pandey, A. Haldar, and A. K. Tripathi, "Is deep learning good enough for software defect prediction?" Innovations in Systems and Software Engineering, 1–16 (2023).
37. C. Tantithamthavorn, A. E. Hassan, and K. Matsumoto, "The impact of class rebalancing techniques on the performance and interpretation of defect prediction models," IEEE Transactions on Software Engineering 46, 1200–1219 (2018).
38. Nitin, K. Kumar, and S. S. Rathore, "Analyzing ensemble methods for software fault prediction," in Advances in Communication and Computational Technology: Select Proceedings of ICACCT 2019 (Springer, 2021) pp. 1253–1267.
39. A. Balaram and S. Vasundra, "Prediction of software fault-prone classes using ensemble random forest with adaptive synthetic sampling algorithm," Automated Software Engineering 29, 6 (2022).
40. Z. Pawlak, "Rough sets," International Journal of Computer & Information Sciences 11, 341–356 (1982).
41. T. M. Khoshgoftaar and P. Rebours, "Improving software quality prediction by noise filtering techniques," Journal of Computer Science and Technology 22, 387–396 (2007).
42. T. Menzies, B. Caglayan, E. Kocaguneli, J. Krall, F. Peters, and B. Turhan, "The PROMISE repository of empirical software engineering data," (2012), accessed: 2025-02-10.
43. M. D'Ambros, M. Lanza, and R. Robbes, "Evaluating defect prediction approaches: a benchmark and an extensive comparison," Empirical Software Engineering 17, 531–577 (2012).