Software Defect Prediction Based on Multi-filter Wrapper Feature (2)

This study introduces a novel software defect prediction (SDP) model that utilizes a multi-filter wrapper feature selection technique combined with a one-dimensional convolutional neural network (1D-CNN) and an attention mechanism. The approach aims to enhance predictive accuracy by selecting relevant features from software metrics while addressing issues related to irrelevant and redundant data. Experimental results on 17 open-source datasets demonstrate the model's superior performance compared to existing machine learning and hybrid algorithms.

Neural Computing and Applications

https://doi.org/10.1007/s00521-024-10902-y

S.I.: HYBRID APPROACHES TO NATURE-INSPIRED OPTIMIZATION ALGORITHMS AND THEIR APPLICATIONS

Software defect prediction based on multi-filter wrapper feature selection and deep neural network with attention mechanism
Ruchika Malhotra1 • Sonali Chawla1 • Anjali Sharma2

Received: 10 May 2023 / Accepted: 12 December 2024


© The Author(s), under exclusive licence to Springer-Verlag London Ltd., part of Springer Nature 2025

Abstract
Software defect prediction (SDP) models rely on various software metrics and defect data to identify potential defects in new software modules. However, the performance of these predictive models can be negatively impacted by irrelevant and redundant metrics and by the imbalanced nature of defect datasets. Additionally, previous studies mainly use conventional machine learning (ML) techniques, whose predictive performance leaves room for improvement. Addressing these issues is crucial to improve the accuracy and effectiveness of SDP models. This study presents a novel approach to SDP using a multi-filter wrapper feature selection technique (MFWFS). To identify a subset of relevant and informative features, we combine three filter techniques, Information Gain (IG), Chi-square (CS), and Relief-F (RF), with a wrapper technique, the Opposition-Based Whale Optimization Algorithm (OBWOA). A one-dimensional convolutional neural network (1D-CNN) with an attention mechanism is employed to enhance the classification performance of the predictive model by efficiently integrating the selected features into abstract deep semantic features. We conduct experiments on seventeen open-source software datasets using four performance measures (AUC, G-mean, F-measure, and MCC) and compare the results with existing state-of-the-art ML and hybrid algorithms. The experimental findings demonstrate the greater efficiency of our approach, highlighting the usefulness of the multi-filter wrapper feature selection technique and of 1D-CNN with attention for SDP.

Keywords Software defects · Filter techniques · Wrapper techniques · Opposition-based whale optimization algorithm · CNN · Attention mechanism

✉ Anjali Sharma
[email protected]

Ruchika Malhotra
[email protected]

Sonali Chawla
[email protected]

1 Department of Software Engineering, Delhi Technological University, Bawana Road, Delhi 110042, India
2 Department of Planning Monitoring and Evaluation, CSIR-National Physical Laboratory, Dr. KS Krishnan Marg, Pusa, New Delhi 110012, India

1 Introduction

SDP plays a significant role as a software quality assurance technique. It is a process of developing an efficient predictive model by leveraging data mining or ML algorithms using the software metrics and historical defect data obtained from different software repositories [1]. The early estimation of software defects during the development phase helps minimize the effort and time required to complete a project, where developers and managers can efficiently find potential defects by allocating the testing resources judiciously toward defect-prone modules.

Researchers have identified metrics that strongly correlate with defects in software defect datasets based on factors such as code complexity and development process analysis. These metrics can be considered attributes or
features. However, not all the attributes are desirable for creating defect prediction models. The time needed to train a model and the quality of its predictions may be negatively impacted by redundant and irrelevant attributes, a problem commonly known as the curse of dimensionality. Previous works [2–5] have shown that feature selection techniques help mitigate the high-dimensionality problem in software defect datasets.

Feature selection identifies the most representative feature subset from the initial set by assessing each feature's significance while preserving the model's high classification performance. The three categories of feature selection techniques that help select informative features are filter, wrapper, and hybrid approaches. A filter approach relies on the data's characteristics, rather than on any classification algorithm, to assess feature relevance and evaluate the feature subset [6, 7]. Wrapper approaches outperform filter methods by employing a greedy approach to evaluate potential optimal feature subsets with a classification algorithm [8, 9]. Though filter methods are computationally less expensive and faster, wrapper methods produce a better subset of features. Hybrid methods merge the complementary strengths of both filter and wrapper techniques [10].

Metaheuristic algorithms have been frequently applied to a variety of optimization problems in recent years, and feature selection is another vital study area where these methods have been studied. Unfortunately, there is no assurance that the feature selection will provide the best combination of attributes, due to the stochastic nature of metaheuristic algorithms. Several metaheuristic algorithms have been applied in SDP, such as genetic algorithms [11], particle swarm optimization [12], artificial bee colony [13, 14], ant colony [15], moth flame optimization [2], salp swarm optimization [16], etc. The whale optimization algorithm (WOA), a recently developed metaheuristic algorithm, mimics the intelligent search strategy used by humpback whales to find food, a method known as bubble-net feeding. Although WOA has superior search ability compared to other existing algorithms, it has limitations [17] such as getting stuck in local optima. WOA starts with random positions for each agent in the group and switches locations with either a randomly assigned search agent or the current best solution available throughout the search. The WOA is combined with the concept of opposition-based learning to form OBWOA, which improves on the base version [18]. By determining whether the opposite position of each agent is better, the opposition-based idea can assist WOA in finding accurate solutions.

Instead of employing the filter and wrapper techniques independently, we combine the features produced by both techniques to take maximum advantage of both. This helps to avoid missing any relevant information and enhances the prediction model's overall search performance. This research work uses multiple filter methods, namely Information Gain, Chi-square, and Relief-F, as per previous research findings [19, 20]. The features obtained are then integrated with the feature subset obtained from the wrapper method, opposition-based whale optimization, to maximize the classification performance (AUC) of the predictive model. Filter methods extract information from the features, and the wrapper uses the learning algorithm for judgment. The computational time and complexity are reduced compared to the pure wrapper method [10]. We believe that this multi-filter wrapper technique has never been applied earlier in the SDP domain.

A predictive model's performance mainly depends on two important factors: feature representation and classification algorithm. Researchers have been applying ML and deep learning (DL) algorithms to efficiently construct models for predicting software defects [3, 21–27]. ML algorithms need manually extracted software metrics for building classifiers to predict defect-prone or non-defect-prone areas. DL algorithms, in contrast, automatically extract high-level features and learn from high-dimensional and more complex data [28]. As a result, many researchers are now concentrating on creating SDP models using DL algorithms. CNN has recently gained popularity in many fields, including speech recognition [29], semantic search [30], object detection [31], etc., because of its capability to extract semantic features with powerful discriminating capacity compared to ML algorithms. CNNs are useful for developing SDP models, but they appear complicated and yield insufficiently high accuracy. This may be because CNNs were originally built using 2D structures based on 2D data such as images and videos. Recently, a modified variant of 2D-CNN called 1D-CNN has been developed [32–34]. 1D-CNNs are preferable to 2D-CNNs for several reasons. For example, the complexity of the 1D variant, O(NK), is lower than that of the 2D variant, O(N²K²) [32]. One-dimensional variant implementation is feasible on a standard computer and comparatively faster due to fewer hidden layers and fewer parameters involved. Its exceptional performance in real-time electrocardiogram (ECG) monitoring [34], structural damage detection [35], high-power engine fault monitoring [36], and automatic speech recognition [37] has contributed to its increased popularity. Inspired by the successful application of 1D-CNN in these areas, 1D-CNN with an attention mechanism has been employed as the classification algorithm for creating a defect prediction model.

The following are the main objectives of the paper:

• A novel hybrid model based on the multiple filter wrapper feature selection (MFWFS) techniques is
proposed. This algorithm combines three filter methods (Information Gain, Chi-square, and Relief-F) with a swarm-based oppositional whale optimization algorithm (OBWOA) that eliminates redundant and irrelevant features and selects the optimal feature subset to improve the overall model performance.
• A powerful unified defect predictor based on CNN with an attention mechanism is created that integrates the automatic features generated from the output of CNN and the features obtained from the MFWFS technique.
• The hybrid model is evaluated on 17 software defect datasets obtained from the NASA, AEEEM, and Promise repositories. We have computed four performance metrics (AUC, G-mean, F-measure, and MCC) to validate and analyze the classification performance of the proposed model.
• The predictive capability of the proposed model is compared to other state-of-the-art ML and hybrid algorithms present in the literature to show that our model has produced better results across all the datasets. We employed the Wilcoxon test to establish the statistical significance of our results.

The above objectives were achieved by addressing the research questions outlined below:

RQ1. How efficient is the SDP model developed using 1D-CNN with and without multi-filter wrapper feature selection?
RQ2. How efficient is the SDP model developed using MFWFS-1D-CNN with and without attention?
RQ3. How effective is the SDP model developed using the proposed hybrid model MFWFS-1D-CNN-Attention in comparison with the models created with individual filters and the wrapper-based CNN model?
RQ4. How effective is the SDP model developed using the proposed hybrid classifier (MFWFS-1D-CNN-Attention) in comparison with other state-of-the-art ML and hybrid techniques?

The remainder of the paper is organized as follows. Section 2 analyzes the relevant works in the field of SDP, including DL models and other hybrid models that use feature selection techniques as a part of pre-processing. Section 3 gives a background on the feature selection algorithms used in this study. In Sect. 4, an in-depth explanation of the proposed methodology is presented. Section 5 describes the experimental layout of the paper. Section 6 presents the obtained results and analyzes them in the form of answers to the research questions. Section 7 mentions the validity threats posed in this study, and Sect. 8 gives the conclusion and future work of our study.

2 Related work

Many researchers in the past have explored the use of feature selection techniques with DL-based classification methods in SDP due to their significant performance characteristics.

Zhu et al. [3] proposed an improved feature selection algorithm called EMWS, which combines two metaheuristic algorithms, WOA and simulated annealing. A defect predictor named WSHCKE was constructed as a hybrid deep neural network combining CNN and KELM. The experiment was conducted on 20 different software projects using four performance evaluation measures (F1 score, MCC, G-measure, and AUC) and concluded that EMWS and WSHCKE performed better than baseline feature selection and classification algorithms.

Arar and Ayan [14] proposed a hybrid classifier where an artificial bee colony (ABC) algorithm optimizes the connection weights of an artificial neural network (ANN). The cost-sensitive feature was introduced to the ANN by implementing an error function that considers the misclassification cost of each class and tries to minimize the total misclassification cost. The suggested method was evaluated using five publicly accessible NASA software datasets, taking into account accuracy, probability of detection, probability of false alarm, balance, and AUC measures. The obtained results show that the proposed approach outperformed other ML algorithms.

Kassaymeh et al. [16] optimized the parameters of the BPNN algorithm using the metaheuristic technique of salp swarm optimization, aiming to enhance the accuracy of the SDP model. The evaluation of results was conducted on 22 software datasets using accuracy, AUC, specificity, sensitivity, and error rate as the performance measures. In comparison with state-of-the-art algorithms, the SSA-BPNN method demonstrated superior performance, outperforming all other methods.

Jin et al. [38] developed a model that combines ANN and quantum particle swarm optimization (QPSO). ANN acts as a classifier for predicting defects and non-defects, whereas QPSO is used for feature selection. The results were evaluated on four NASA software projects using the AUC measure. The performance of this hybrid model was compared to other baseline ML classifiers, which showed the superiority of the QPSO + ANN model.

Turabieh et al. [39] employed binary variants of three wrapper-based feature selection methods, namely particle swarm optimization (PSO), ant colony optimization (ACO), and genetic algorithm (GA), iteratively, to improve the performance of a layered recurrent neural network (LRNN) for the SFP problem. The results were compared with and without 5 cross-validation, and statistical analysis proved
that cross-validation with feature selection was better than other approaches using AUC values.

Li et al. [21] presented a framework named Defect Prediction via CNN (DP-CNN) that automatically produces syntactic and semantic features obtained from source code using abstract syntax trees and integrates them with traditional features for SDP. According to the results, DP-CNN improves on existing techniques, achieving an average 12% increase in F-measure.

Hassouneh et al. [40] presented an efficient feature selection approach for SDP using WOA augmented with five natural selection operators. Four different classifiers were used: KNN, DT, LDA, and SVM. The tournament-based WOA with the DT classifier outperformed other variants of WOA in terms of the AUC measure over 17 public software datasets.

Balogun et al. [41] introduced a new hybrid multi-filter wrapper feature selection technique that uses a rank aggregation method to filter out irrelevant and redundant features in software defect datasets. Twenty-five datasets were selected from various software repositories to check the feasibility of the novel technique, with Naïve Bayes and Decision Tree classifiers used to evaluate its performance. The evaluation metrics used were F-measure, AUC, and accuracy. The results indicate that this feature selection method effectively addresses issues such as filter rank selection and local optima stagnation.

Zain et al. [42] proposed a novel 1D-CNN structure for SDP that is efficient in comparison with other 1D-CNN models varying in filter size, kernel size, dropout layers, and the number of convolutional and max-pool layers. The models were tuned with different hyperparameters and evaluated based on F-measure, accuracy, and training and testing time. The comparison with other ML algorithms showcased the exceptional performance of the proposed model.

Fan et al. [43] presented a DL-based approach named defect prediction via an attention-based RNN (DP-ARNN). This model makes use of abstract syntax trees that help to learn syntactic and semantic features from the source code of seven open-source Java software projects. F-measure and AUC were chosen as the performance metrics. DP-ARNN, on average, improved the F-measure and AUC scores by 14% and 7%, respectively, in comparison with state-of-the-art techniques.

Akay and Akay [44] utilized the ABC algorithm to train the ANN by optimizing its weights and bias for predicting the number of software defects in five object-oriented software datasets. They compared the results with three other training algorithms, and the performance metrics recall, precision, accuracy, and F1 score were computed. In SDP, the ABC algorithm proved to be a superior training algorithm for ANN.

However, the hybrid approaches applied in the existing literature make very limited use of feature selection methods combined with deep neural network architectures for SDP. The main focus of this study is to propose a unique hybrid feature selection technique that integrates the existing filter methods (IG, CS, and RF) with the modified wrapper method OBWOA. The OBWOA is chosen because of its significant competitiveness over other metaheuristic algorithms in solving complex neural network problems. The features obtained from the above unification, along with the superior feature extraction capabilities of 1D-CNN combined with the coordinate attention mechanism, result in a powerful unified defect predictor that has superior classification performance compared to other existing ML algorithms. Further, the Wilcoxon test is employed for statistical analysis to validate the results and assess the robustness of the model's performance compared to other algorithms present in the literature.

3 Background

3.1 Filter methods

Filter methods rely on univariate statistics to capture the inherent characteristics of features, independent of any learning algorithm. These approaches are quicker and less computationally intensive compared to wrapper methods [7]. In this work, we have used the information gain, chi-square, and relief-F methods described below.

• Information gain
It is a technique that determines the relevance of a given attribute for the class label prediction. It is an entropy-based concept that measures the reduction in uncertainty about the class label once the attribute value is observed [45]. It can be found using Eq. (1).

$$IG(y) = -\sum_{i}^{m} p(x_i)\log p(x_i) + p(y)\sum_{i}^{m} p(x_i \mid y)\log p(x_i \mid y) + p(y')\sum_{i}^{m} p(x_i \mid y')\log p(x_i \mid y') \tag{1}$$

where $x_i$ denotes the ith class (defective or non-defective), $p(x_i)$ denotes the probability of the ith class, $p(y)$ and $p(y')$ signify the probabilities of the presence and absence of the attribute y, and $p(x_i \mid y)$ and $p(x_i \mid y')$ denote the conditional probabilities given the presence and absence of the attribute y, respectively.

• Chi-Square
It is a goodness-of-fit test that assesses the
independence of two variables [46]. It determines the most relevant features by considering the highest values of the chi-squared statistic with respect to the class label, computed using Eq. (2).

$$\chi^2 = \sum_{i} \frac{(\mathrm{Obs}_i - \mathrm{Exp}_i)^2}{\mathrm{Exp}_i} \tag{2}$$

where $\mathrm{Obs}_i$ and $\mathrm{Exp}_i$ are the observed and expected values. Higher values of the chi-squared statistic indicate a stronger dependence of the feature on the output variable.

• Relief-F method
This is an instance-based method that relies on the concept of nearest neighbors to estimate the quality of the attributes [47]. For each instance in a training set, two nearest neighbors are determined: one belonging to the same class (HIT) and the other to a different class (MISS). The importance of the attribute is then calculated using Eq. (3).

$$I(Y) = P(y \mid \text{nearest instance, different class}) - P(y \mid \text{nearest instance, same class}) \tag{3}$$

where P is the probability, and y denotes the value of an attribute. An important attribute should be able to effectively differentiate between instances of different classes while maintaining similar values for instances within the same class [48].

3.2 Wrapper method

Wrapper methods assess the quality of various feature subsets utilizing a particular machine learning algorithm. In contrast with filter methods, which rely on intrinsic properties, wrappers measure the usefulness of the features by training and evaluating a classifier based on the chosen algorithm [49]. In this research, we apply the swarm-based whale optimization algorithm as a wrapper method for finding the optimal feature subset.

• Whale Optimization Algorithm
It is a stochastic population-based metaheuristic algorithm introduced in 2016 [17]. This algorithm mimics the intelligent search strategy adopted by humpback whales, known as bubble-net feeding, for searching their food. They create bubbles in a spiral motion around their food, i.e., small fishes, and then slowly swim toward the surface.
There are two stages in the WOA algorithm: the first involves encircling the prey and using a spiral bubble-net attacking mechanism (the exploitation stage), and the second involves randomly searching for prey (the exploration stage) [50].

• Exploitation phase: shrinking encircling mechanism
The humpback whales swim in a spiral-like pattern as they approach their prey in a shrinking encircling mechanism. They update their positions and move toward the best solution according to the current optimal solution, using Eqs. (4) and (5):

$$\vec{D} = \left| \vec{C} \cdot \vec{Z}_{best}^{t} - \vec{Z}^{t} \right| \tag{4}$$

$$\vec{Z}^{t+1} = \vec{Z}_{best}^{t} - \vec{A} \cdot \vec{D} \tag{5}$$

where $\vec{Z}_{best}^{t}$ denotes the position of the best solution thus far. The coefficients $\vec{A}$ and $\vec{C}$ can be computed using Eqs. (6) and (7),

$$\vec{A} = 2\vec{a} \cdot \vec{r} - \vec{a} \tag{6}$$

$$\vec{C} = 2 \cdot \vec{r} \tag{7}$$

where $a = 2 - t\left(\frac{2}{M}\right)$ is linearly decreased from 2 to 0, M refers to the maximum number of iterations, and $\vec{r}$ refers to a random vector in [0, 1].

• Spiral path updating position
The bubble-net feeding mechanism calculates the distance $\vec{D'}$ between the current and the best solution thus far and updates the position using the spiral Eq. (8), as illustrated below.

$$\vec{Z}^{t+1} = \vec{D'} \cdot e^{bl} \cdot \cos(2\pi l) + \vec{Z}_{best}^{t} \tag{8}$$

where b is the constant that dictates the shape of the logarithmic spiral function, and l represents a random value in [-1, 1].
According to Eq. (9), humpback whales have a 50% chance of randomly selecting either the shrinking encircling mechanism or the spiral-shaped path model to approach their prey.

$$\vec{Z}^{t+1} = \begin{cases} \vec{Z}_{best}^{t} - \vec{A} \cdot \vec{D}, & p < 0.5 \\ \vec{D'} \cdot e^{bl} \cdot \cos(2\pi l) + \vec{Z}_{best}^{t}, & p \geq 0.5 \end{cases} \tag{9}$$

where p is a random number in [0, 1].

• Exploration phase
This phase is controlled by the coefficient vector $\vec{A}$, whose components lie in $[-a, a]$ by Eq. (6). If $|\vec{A}| \geq 1$, the position of each whale is modified according to the position of a randomly chosen whale, and if $|\vec{A}| < 1$, the position of each whale is modified according to the current optimal solution. This global search capability of WOA is shown in Eqs. (10) and (11):

$$\vec{D} = \left| \vec{C} \cdot \vec{Z}_{rand}^{t} - \vec{Z}^{t} \right| \tag{10}$$
$$\vec{Z}^{t+1} = \vec{Z}_{rand}^{t} - \vec{A} \cdot \vec{D} \tag{11}$$

where $\vec{Z}_{rand}$ is the position of a whale selected randomly from the current population.

3.3 Opposition-based learning

This technique was proposed by Hamid R. Tizhoosh [51, 52], and it has been shown to greatly enhance the performance of optimization algorithms such as PSO [53], DE [54], GWO [55], and ACO [56] by evaluating the current and opposite solutions simultaneously. It increases the exploitation of the search space and ultimately helps to reach a globally optimal solution more quickly. The equations below are used to find the opposite values.

Opposite number: Assume $z \in [a, b]$ is a real number, where a and b represent the lower and upper limits. The opposite number is defined using Eq. (12).

$$\tilde{z} = a + b - z \tag{12}$$

Opposite point: Let $Z(z_1, z_2, z_3, \ldots, z_n)$ be a point where $z_1, z_2, z_3, \ldots, z_n$ are all real numbers and $z_i \in [a_i, b_i]$, i = 1, 2, 3, …, n. The opposite of Z is represented as $\tilde{Z}(\tilde{z}_1, \tilde{z}_2, \tilde{z}_3, \ldots, \tilde{z}_n)$ and defined using Eq. (13).

$$\tilde{z}_i = a_i + b_i - z_i \tag{13}$$

Opposition-based optimization: Let $Z(z_1, z_2, z_3, \ldots, z_n)$ be a candidate solution in the search space. Its objective function is given by f(Z). According to Eq. (13), $\tilde{Z}(\tilde{z}_1, \tilde{z}_2, \tilde{z}_3, \ldots, \tilde{z}_n)$ is the opposite of $Z(z_1, z_2, z_3, \ldots, z_n)$, and its fitness function is given by $f(\tilde{Z})$. When the fitness of both points is evaluated simultaneously, if $f(\tilde{Z})$ is found to be better than f(Z), then the point $\tilde{Z}$ is considered the better solution and replaces Z; otherwise, we retain Z as the better solution.

The algorithm for opposition-based whale optimization is shown below.
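The algorithm listing itself did not survive in this copy, so the following is only a minimal Python sketch of how Eqs. (4) to (13) might fit together on a continuous search space. It is not the authors' listing. The function names, the use of scalar rather than vector coefficients A and C, the clipping to bounds, and the choice to apply the opposition step at initialization and after every position update are all assumptions made for illustration. Lower fitness is treated as better.

```python
import math
import random


def opposite_point(z, lower, upper):
    """Eq. (13): component-wise opposite of z inside [lower_i, upper_i]."""
    return [a + b - zi for zi, a, b in zip(z, lower, upper)]


def keep_better(z, lower, upper, fitness):
    """Opposition step: evaluate z and its opposite, keep the lower-fitness one."""
    z_opp = opposite_point(z, lower, upper)
    return z_opp if fitness(z_opp) < fitness(z) else z


def woa_step(z, best, rand_agent, a, b=1.0, rng=random):
    """One WOA update for a single agent (assumption: scalar A and C)."""
    A = 2 * a * rng.random() - a                      # Eq. (6)
    C = 2 * rng.random()                              # Eq. (7)
    if rng.random() < 0.5:                            # Eq. (9), encircling branch
        # |A| < 1: move toward the best agent (Eqs. 4-5);
        # otherwise explore around a random agent (Eqs. 10-11).
        ref = best if abs(A) < 1 else rand_agent
        return [r - A * abs(C * r - zi) for r, zi in zip(ref, z)]
    l = rng.uniform(-1, 1)                            # Eq. (8), spiral branch
    return [abs(bi - zi) * math.exp(b * l) * math.cos(2 * math.pi * l) + bi
            for bi, zi in zip(best, z)]


def obwoa_minimize(fitness, lower, upper, n_agents=10, max_iter=50, seed=0):
    """Hypothetical driver: WOA with an opposition step after every move."""
    rng = random.Random(seed)

    def clip(z):
        return [min(max(zi, a), b) for zi, a, b in zip(z, lower, upper)]

    pop = [[rng.uniform(a, b) for a, b in zip(lower, upper)]
           for _ in range(n_agents)]
    pop = [keep_better(z, lower, upper, fitness) for z in pop]
    best = min(pop, key=fitness)
    for t in range(max_iter):
        a = 2 - t * (2 / max_iter)                    # decreases from 2 toward 0
        moved = [clip(woa_step(z, best, rng.choice(pop), a, rng=rng)) for z in pop]
        pop = [keep_better(z, lower, upper, fitness) for z in moved]
        best = min(pop + [best], key=fitness)         # never lose the best so far
    return best
```

For feature selection, each agent would additionally have to be mapped to a binary mask (for example by thresholding), and the fitness would come from a classifier; neither step is shown here.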


4 Proposed methodology

This research paper presents a hybrid of the multi-filter wrapper feature selection technique with the attention-based 1D-CNN model. The proposed methodology is explained in Fig. 1.

4.1 Feature selection based on multi-filter wrapper technique

The proposed MFWFS technique aims to analyze and combine the computational strengths of various independent filter and wrapper-based feature selection methods. The filter methods (IG, CS, and RF) evaluate the features on the basis of their usefulness toward better prediction performance to construct an optimal subset of features with excellent prediction characteristics. The swarm-based oppositional whale optimization algorithm helps accelerate the convergence to the global optimum by finding solutions in opposite directions, increasing the chances of finding better regions effectively [57]. The MFWFS is carried out using the algorithm described below.

The input software defect datasets are split into training and test datasets in the ratio of 70:30. To determine the ranks of all the features for each filter method, we independently apply the information gain, chi-square, and relief-F methods to the training set. The top-ranked features are selected from the generated rank list using log2 n, where n stands for the total number of features in a dataset. The choice of log2 n is based on previous empirical studies [45, 58]. As a result, software defect datasets with fewer optimal features will be produced. In OBWOA, each solution refers to a feature subset for a project, encoded as a 1D binary vector with n elements, each of which represents one of the n features and has a value of either 0 or 1. Two optimization goals should be considered: reducing the number of features and maximizing the classification performance. We combine the above two goals into an objective (fitness) function in Eq. (14) [40].

$$\mathrm{Fitness} = \alpha \cdot (1 - p) + (1 - \alpha) \cdot \frac{|S|}{|N|} \tag{14}$$

where $\alpha \in [0, 1]$ is a user-defined hyperparameter that strikes a balance between both objectives, p denotes the AUC measure, |S| represents the feature subset length, and |N| indicates the total number of features. We set α = 0.99, as suggested in the literature [59, 60]. We apply the OBWOA on the original training input dataset to extract the subset of features with high AUC values, using KNN (K = 5) as the classifier. To provide the best possible set of features, the feature subsets from the filter and wrapper approaches are combined. The MFWFS is used only for feature selection, separately for every software project. For classification, we use the attention-based 1D-CNN defect predictor.

The primary contribution of the above feature selection technique is to explore the complementary nature of filter and wrapper feature selection, with the aim of reducing the feature subset of software defect datasets and enhancing the classification performance without compromising on the computational cost. The selection process of features for filter methods is fast, as no learning algorithm is required, whereas in the wrapper method the selection process is slow due to the involvement of the learning algorithm. The integration of multiple filter methods (IG, CS, and RF) with the OBWOA wrapper-based method aims to leverage the strengths of both, resulting in a novel, optimized, and robust hybrid feature selection process that is unprecedented in the domain of SDP.

4.2 Defect prediction based on attention-based CNN

CNN is a deep neural network that efficiently handles image data. It expands the traditional artificial neural network by adding more layers, such as a convolutional layer, pooling layer, flattening layer, etc. A classic CNN applies to 2D data such as images and videos; therefore, it is commonly referred to as 2D-CNN. Another recent development is 1D-CNN, a modified form of 2D-CNN [32–34].
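Returning briefly to the feature selection step of Sect. 4.1, the bookkeeping around the rank cut-off, the fitness of Eq. (14), and the merging of subsets can be illustrated with a small Python sketch. It is illustrative only: the function names are hypothetical, the filter scores are assumed to be precomputed (higher meaning more relevant), rounding log2 n up with a ceiling is an assumption because the text does not state the rounding rule, and taking the union of the subsets is one plausible reading of "combined", not a confirmed detail of the method.

```python
import math


def top_log2_features(scores):
    """Indices of the top ceil(log2 n) features by filter score (assumed rounding)."""
    n = len(scores)
    k = max(1, math.ceil(math.log2(n)))
    ranked = sorted(range(n), key=lambda i: scores[i], reverse=True)
    return set(ranked[:k])


def fitness(auc, mask, alpha=0.99):
    """Eq. (14): alpha * (1 - AUC) + (1 - alpha) * |S| / |N|; lower is better."""
    return alpha * (1.0 - auc) + (1.0 - alpha) * sum(mask) / len(mask)


def combine_subsets(filter_scores, wrapper_mask):
    """Union (an assumption) of each filter's top features and the wrapper's picks."""
    selected = {i for i, bit in enumerate(wrapper_mask) if bit}
    for scores in filter_scores.values():
        selected |= top_log2_features(scores)
    return sorted(selected)


# Toy example with n = 8 features, so each filter keeps log2(8) = 3 of them.
ig_scores = [0.1, 0.9, 0.3, 0.8, 0.05, 0.7, 0.2, 0.0]   # made-up IG scores
cs_scores = [0.0, 9.0, 0.5, 8.0, 0.1, 0.2, 7.0, 0.0]    # made-up chi-square scores
wrapper_mask = [0, 1, 0, 0, 1, 0, 0, 0]                 # made-up OBWOA solution
selected = combine_subsets({"ig": ig_scores, "cs": cs_scores}, wrapper_mask)
```

With α = 0.99 the AUC term dominates Eq. (14), so the subset-size term mostly acts as a tie-breaker between masks of similar AUC.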


Fig. 1 Proposed methodology

One-dimensional variant implementation is feasible on a standard computer and comparatively faster due to fewer hidden layers and fewer parameters involved.
Figure 2 illustrates the structure of the proposed 1D-CNN model with attention, comprising the following important layers:

a) Input layer: The 1D input vector K[n], where n = 0, 1, 2, …, N-1, is given to the first layer of the CNN architecture. Here ‘n’ indexes the software metrics present in the dataset. For example, the JM1 dataset contains 21 software metrics, so the input layer will accept 21 metrics.
b) Convolutional layer: It is the foundational layer of 1D-CNN. A series of convolutions are performed using filters or kernels to acquire local features from the input data. The filters help detect deep semantic features from the low-level feature representation in the defect instances. We have used two convolutional layers with 64 and 32 filters and a kernel size of 1 for each filter.
c) ReLU layer: The output from the preceding layer (feature map) is subjected to a nonlinear activation function by the ReLU layer. It gives the model nonlinearity and enhances its capacity for expression.
d) Flatten layer: The feature maps produced by the second convolutional layer are transformed into a single long continuous linear vector through the process of flattening.
e) Fully connected (FC) layer: The output from the flattening layer is linearly transformed into a new representation through a set of weights and biases. To classify inputs accurately, we employed two FC layers. The first FC layer used ReLU activation, while the second FC layer used softmax activation, generating a probability ranging from 0 to 1.
f) Dropout layer: A regularization approach that randomly removes a particular proportion of the neurons in the FC layer during training. Hence, the model’s generalization performance is enhanced, and overfitting is prevented.
g) Attention mechanism: It is a technique commonly used in DL models to focus on the most important


features or regions of an input, enabling the model to focus on relevant information and improve its performance. This mechanism has been effectively utilized in several computer vision tasks, such as object recognition [61], segmentation [62], and detection [63].

In the context of SDP, attention mechanisms can be used in a CNN to learn discriminative features from software artifacts, such as source code or execution traces. This facilitates the identification of potential defects in a software system before they turn into failures or errors. We use the coordinate attention as described by [64]. The coordinate attention captures direction-aware and position-sensitive data that aid models in precisely locating and identifying the defects in software. As a computing unit, a coordinate attention block tries to increase the expressive power of the learned features. Any intermediate feature tensor, $X = [x_1, x_2, \ldots, x_C] \in \mathbb{R}^{C \times H \times W}$, may be used as input; the output is a transformed tensor $Y = [y_1, y_2, \ldots, y_C]$ of the same size as X.

The coordinate attention algorithm modifies the global pooling operation used in channel attention by factorizing the pooling into two 1D feature encoding processes, one along the horizontal and one along the vertical coordinates [64]. The input X undergoes encoding through two pooling kernels, (H, 1) and (1, W), capturing information along both horizontal and vertical coordinates for each channel. Therefore, Eq. (15) can be used to express the output of the c-th channel at height ‘h.’

$$o_c^h(h) = \frac{1}{W} \sum_{i=0}^{W-1} x_c(h, i) \tag{15}$$

Likewise, the c-th channel’s output at width ‘w’ can be written as

$$o_c^w(w) = \frac{1}{H} \sum_{j=0}^{H-1} x_c(j, w) \tag{16}$$

This allows the attention block to capture long-range spatial interactions while preserving precise positional information. This modification can be used in SDP to help 1D-CNN accurately locate defects of interest and improve the overall structure to capture deep semantic features.

The integration of CNNs with attention mechanisms aims to leverage the powerful feature extraction capabilities of deep neural networks while enhancing focus on discriminative regions within the input data. This enhances the overall performance and interpretability of the model, making it more effective in tasks that require discerning important features from complex input data.

5 Experimental setup

5.1 Dataset description

We perform experiments on 17 open-source datasets obtained from NASA [65], PROMISE [66], and AEEEM [67], which are publicly available benchmark datasets for the SDP problem, to ensure that the research results can be verified, replicated, and refuted [3, 16, 68, 69]. We have selected six datasets from NASA (JM1, KC1, MC1, PC3, PC4, and PC5), which contain traditional software metrics such as McCabe, Halstead, Line Count, etc. Five datasets are selected from AEEEM (Equinox, Eclipse JDT Core, Eclipse PDE, Lucene, and Mylyn), which contain 61 measures, including 17 source code metrics, 17 entropy of source code metrics, 17 churn of source code metrics, 5 previous-defect metrics, and 5 entropy of change metrics. The PROMISE datasets (ant-1.7, camel-1.6, jEdit-4.3, poi-3.0, xalan-2.7, and xerces-1.4) contain 20 object-oriented metrics, McCabe’s metrics, and CK metrics.
Table 1 provides basic statistical information for the 17 projects across the three repositories, including the number of instances, attributes, defects, defective ratio, and imbalance ratio. All these datasets vary in both size and complexity.

Fig. 2 Architecture of attention-based 1D-CNN
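The directional pooling that the coordinate attention block in this architecture applies (Eqs. 15 and 16) can be illustrated in plain Python. This is a minimal sketch rather than the authors' implementation: the function name `directional_pool`, the nested-list layout of a single channel, and the example values are assumptions, and Eq. (16) is taken here as the mean over the height axis, following the coordinate attention formulation of [64].

```python
def directional_pool(x):
    """Average one channel x (H rows, W columns) along each axis.

    Returns (pooled_h, pooled_w):
      pooled_h[h] = mean over the width of row h      (Eq. 15)
      pooled_w[w] = mean over the height of column w  (Eq. 16)
    """
    height = len(x)
    width = len(x[0])
    pooled_h = [sum(row) / width for row in x]
    pooled_w = [sum(x[j][w] for j in range(height)) / height
                for w in range(width)]
    return pooled_h, pooled_w


# Hypothetical 2 x 3 feature map for a single channel.
feature_map = [[1.0, 2.0, 3.0],
               [4.0, 5.0, 6.0]]
pooled_h, pooled_w = directional_pool(feature_map)
print(pooled_h)  # [2.0, 5.0]
print(pooled_w)  # [2.5, 3.5, 4.5]
```

Each output keeps the position along one axis while summarizing the other, which is what lets the block retain positional information.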


Also, the imbalance ratio in all the datasets is quite high. Therefore, there is a need to address the imbalance issue in these software projects. The independent variables in these datasets denote the features/attributes, and the dependent variable is the class label (‘1’ for defective, ‘0’ for non-defective). In some datasets, the dependent variable shows the count of defects within a particular module; in that case, we represent the class label as defective if there is more than one defect present; otherwise, we label it non-defective.

5.2 Data pre-processing

• Missing values and Removal of Outliers: This stage involved searching the datasets for any outliers and missing values that might have been present. The data points are analyzed in each dataset, and those that vary extremely from the remaining data points are considered outliers. These outliers are critical and need to be handled. In our study, we have used median imputation to treat the outliers since deleting the outliers would lead to degradation of the dataset and a reduction in the available data volume.
• Class Imbalance processing: The datasets used to solve SDP problems are usually imbalanced. There is a relatively small percentage of defective modules (minority class) as compared to non-defective modules (majority class) [70]. If an imbalanced dataset is used directly to train a model, the results may be poor and biased in favor of the majority class. Oversampling and undersampling are common approaches to resolve the class imbalance problem [71]. Oversampling includes generating duplicates of minority class instances, while undersampling entails eliminating instances from the majority class. A part of the training dataset may be lost in undersampling, resulting in underfitting. We use the random oversampling method to create a class-balanced training set that replicates the instances from the minority class randomly [72].
• Data Standardization: It is crucial to first normalize the input dataset before initiating the training process because each dimension in an input space may have a distinct range of values. The original dimension values must be standardized in order to produce reliable results, and the processed dimension values must correspond to the original distribution. In a training process, it is a common practice to normalize the data, ensuring that input values fall within the range [0, 1] [73]. This is typically done to boost the neural network training process and solve model learning issues. Equation (18) is used to normalize the dataset.

$$X_{\text{normalized}} = \frac{X - \min(X)}{\max(X) - \min(X)} \tag{18}$$

In the above equation, the minimum and maximum values of software metric X are denoted by min(X) and max(X), respectively.

5.3 Performance measures

It is possible to assess the effectiveness of predictive models for datasets with imbalanced class distributions by utilizing appropriate performance measures. In our study, we have considered AUC, F-measure, MCC, and G-mean, as these metrics are widely utilized in the literature [3, 39, 74–76]. The most traditional and commonly used performance metric to evaluate a classifier is accuracy. However, it gives biased results when handling imbalanced datasets [77]. Studies have advocated the use of AUC (Area Under the Receiver Operating Characteristic Curve), a robust measure particularly effective for imbalanced and noisy datasets [68, 78]. Thus, in our research, we have employed AUC to examine the relationship between true-positive rate (recall or sensitivity) and false-positive rate [5, 79]. The true-positive rate is defined by Eq. (19).

$$\text{TPR} = \frac{\text{True}_{pos}}{\text{True}_{pos} + \text{False}_{neg}} \tag{19}$$

Precision measures the proportion of correctly classified defective instances among all the instances predicted as defective. It can be calculated using Eq. (20):

$$\text{Precision} = \frac{\text{True}_{pos}}{\text{True}_{pos} + \text{False}_{pos}} \tag{20}$$

F-measure combines recall and precision into a single measure and is the harmonic mean of the two, as shown in Eq. (21).

$$\text{F-measure} = 2 \times \frac{\text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}} \tag{21}$$

Matthews Correlation Coefficient (MCC) measures the correlation between predicted and actual instances. Equation (22) is used to calculate its value, which falls in the range [-1, 1].

$$\text{MCC} = \frac{\text{True}_{pos} \times \text{True}_{neg} - \text{False}_{pos} \times \text{False}_{neg}}{\sqrt{(\text{True}_{pos} + \text{False}_{pos}) \times (\text{True}_{pos} + \text{False}_{neg}) \times (\text{True}_{neg} + \text{False}_{pos}) \times (\text{True}_{neg} + \text{False}_{neg})}} \tag{22}$$
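Eqs. (19) to (22), together with the G-mean of Eq. (23), can be evaluated directly from the four confusion-matrix counts. The sketch below is illustrative only: the function name, argument names, and example counts are hypothetical and not taken from the study, and reporting an undefined ratio (zero denominator) as 0.0 is a convention assumed here.

```python
import math


def defect_metrics(tp, fp, tn, fn):
    """Recall (TPR), precision, F-measure, MCC, and G-mean from confusion-matrix counts."""

    def ratio(num, den):
        # Assumed convention: a ratio with a zero denominator is reported as 0.0.
        return num / den if den else 0.0

    recall = ratio(tp, tp + fn)                                    # Eq. (19)
    precision = ratio(tp, tp + fp)                                 # Eq. (20)
    f_measure = ratio(2 * precision * recall, precision + recall)  # Eq. (21)
    mcc = ratio(tp * tn - fp * fn,
                math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)))  # Eq. (22)
    specificity = ratio(tn, tn + fp)
    g_mean = math.sqrt(recall * specificity)                       # Eq. (23)
    return {
        "recall": recall,
        "precision": precision,
        "f_measure": f_measure,
        "mcc": mcc,
        "g_mean": g_mean,
    }


# Hypothetical counts: 40 true positives, 10 false positives,
# 40 true negatives, 10 false negatives.
scores = defect_metrics(tp=40, fp=10, tn=40, fn=10)
print({name: round(value, 3) for name, value in scores.items()})
```

For these counts every measure rounds to 0.8 except MCC, which is 0.6, showing how MCC penalizes errors on both classes more strongly than the other measures do.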


Table 1 Statistical information of software projects

| Datasets | No. of instances | No. of attributes | No. of defective instances | No. of non-defective instances | Defective ratio (%) | Imbalance ratio |
|---|---|---|---|---|---|---|
| JM1 | 7783 | 22 | 1672 | 6111 | 21.48 | 3.65 |
| MC1 | 1988 | 38 | 46 | 1942 | 2.31 | 42.22 |
| KC1 | 2109 | 22 | 326 | 1783 | 15.46 | 5.47 |
| PC3 | 1077 | 37 | 134 | 943 | 12.44 | 7.04 |
| PC4 | 1458 | 37 | 178 | 1280 | 12.21 | 7.19 |
| PC5 | 1711 | 38 | 471 | 1240 | 27.53 | 2.63 |
| Eclipse JDT Core | 997 | 61 | 206 | 791 | 20.66 | 3.84 |
| Eclipse PDE | 1492 | 61 | 209 | 1283 | 14.01 | 6.14 |
| Equinox | 324 | 61 | 129 | 195 | 39.81 | 1.51 |
| Lucene | 691 | 61 | 64 | 627 | 9.26 | 9.80 |
| Mylyn | 1862 | 61 | 245 | 1617 | 13.16 | 6.60 |
| ant-1.7 | 745 | 20 | 166 | 579 | 22.28 | 3.49 |
| camel-1.6 | 965 | 20 | 188 | 777 | 19.48 | 4.13 |
| jedit-4.3 | 492 | 20 | 11 | 481 | 2.24 | 43.73 |
| poi-3.0 | 442 | 20 | 281 | 161 | 63.57 | 0.57 |
| Xalan-2.7 | 909 | 20 | 898 | 11 | 98.79 | 0.01 |
| Xerces-1.4 | 588 | 20 | 437 | 151 | 74.32 | 0.35 |
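The imbalance ratios in Table 1 and the two pre-processing steps of Sect. 5.2 that act on them (random oversampling and the min-max scaling of Eq. 18) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' code: the function names are hypothetical, labels are assumed to be 0/1 with both classes present, the seed is arbitrary, and mapping a constant column to 0.0 is a convention chosen here.

```python
import random


def imbalance_ratio(n_defective, n_non_defective):
    """Imbalance ratio as tabulated in Table 1: non-defective / defective."""
    return n_non_defective / n_defective


def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen minority-class rows until both classes are equal in size.

    Assumes binary labels (0 or 1) and at least one row in each class.
    """
    rng = random.Random(seed)
    by_class = {0: [], 1: []}
    for row, label in zip(rows, labels):
        by_class[label].append(row)
    minority = min(by_class, key=lambda c: len(by_class[c]))
    majority = 1 - minority
    deficit = len(by_class[majority]) - len(by_class[minority])
    extra = [rng.choice(by_class[minority]) for _ in range(deficit)]
    return rows + extra, labels + [minority] * deficit


def min_max_normalize(column):
    """Scale one software metric to [0, 1] as in Eq. (18)."""
    lo, hi = min(column), max(column)
    if hi == lo:
        return [0.0 for _ in column]  # assumed convention for a constant column
    return [(value - lo) / (hi - lo) for value in column]


# JM1 row of Table 1: 1672 defective and 6111 non-defective instances.
print(round(imbalance_ratio(1672, 6111), 2))  # 3.65

# Hypothetical toy training set with three non-defective rows and one defective row.
train_rows = [[1.0], [2.0], [3.0], [4.0]]
train_labels = [0, 0, 0, 1]
balanced_rows, balanced_labels = random_oversample(train_rows, train_labels)
print(balanced_labels.count(0), balanced_labels.count(1))  # 3 3

print(min_max_normalize([2, 4, 6]))  # [0.0, 0.5, 1.0]
```

In line with the text, oversampling would be applied to the training portion only, so that duplicated instances never appear in the test set.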

G-mean represents a balanced measure that is computed using the geometric mean of sensitivity and specificity, as shown in Eq. (23).

$$\text{G-mean} = \sqrt{\text{Sensitivity} \times \text{Specificity}} \tag{23}$$

5.4 Parameter settings

For performing feature selection, only the training dataset is used in this paper. The model is evaluated using the same features that were selected from the training dataset. The filter methods (IG, CS, and RF) are implemented using the Weka Tool (default settings), and the top $\log_2 n$ features are selected. The wrapper-based oppositional whale optimization algorithm and the 1D-CNN model are implemented in Python 3.9 on Spyder 5.2.2. We have divided the data into 70% training and 30% testing before executing the CNN algorithm. Tables 2 and 3 show the parameter settings for OBWOA and 1D-CNN.

5.5 Statistical analysis

The study employed statistical tests, mainly the Wilcoxon test, to support and reinforce the findings and conclusions. Demšar [80] has suggested using non-parametric tests because assumptions regarding data distribution, such as normality and homogeneity, are relaxed.
We applied the Wilcoxon signed-rank test [81] to detect significant differences between each pair of techniques. With the help of this test, we may determine whether the null hypothesis, which states that there is no difference in the performance of the techniques, can be rejected. We compute the p-value for every technique pair and decide whether to accept or reject the null hypothesis at a chosen significance level of 0.05.

6 Results and discussion

RQ1. How efficient is the SDP model developed using 1D-CNN with and without multi-filter wrapper feature selection?

The study utilized 17 software defect datasets to implement the 1D-CNN model with and without MFWFS. The model was designed with the parameter settings specified in Table 3. The datasets are initially balanced using the random oversampling technique before training the model. The top $\log_2 n$ features are selected from the IG, CS, and Relief-F methods applied to the training set. The OBWOA uses KNN (K = 5) as the internal classifier with the parameter settings specified in Table 2. To handle the stochastic nature of OBWOA, we performed 20 runs of this technique to find the average results. Table 4 shows the original number of features in all


datasets, the average number of selected features after performing OBWOA, the average selection ratio, the best fitness value, and the number of features in the MFWFS technique. For instance, the JM1 dataset originally has 21 features. We select the top $\log_2 n$ features, i.e., four features each from the IG, CS, and RF methods, nine features are selected by OBWOA, and after their union we get 12 features in MFWFS. We can observe from Table 4 that the attribute reduction rate varies from 20 to 77% across all seventeen datasets, resulting in a total average reduction of 50% of the attributes using the above MFWFS technique. This implies that the new MFWFS technique generates feature subsets with substantially fewer features that produce higher-quality solutions.
The classification performance of the SDP model developed using the 1D-CNN defect predictor without and with MFWFS is evaluated across 17 software projects, measuring performance using F-measure, AUC, G-mean, and MCC. The results are displayed in Tables 5, 6, 7, and 8. Results show that CNN with feature selection achieves the best F-measure, AUC, G-mean, and MCC values in nine, fifteen, ten, and ten out of seventeen datasets, respectively. There is an improvement in values of F-measure (1–8%), AUC (0.1–30%), G-mean (0.2–6%), and MCC (1–23.5%) for all the datasets except for the PDE and camel-1.6 datasets. The performance values of the PDE and camel-1.6 datasets did not improve after feature selection; hence, the feature selection was not helpful for those particular datasets.
To assess how the number of features affects classifier performance, the average F-measure, AUC, G-mean, and MCC values were compared using the Wilcoxon signed-rank test with a significance threshold of 0.05. The test results are presented in Table 9, showing the p-values for each comparison. A p-value below 0.05 suggests a statistically significant difference between the two techniques. The performance difference between the two techniques is denoted as either significantly different (S+) or not significantly different (NS). The analysis revealed a significant difference between the two methods in terms of AUC (p-value = 0.015) but not for F-measure, G-mean, or MCC.
This clearly implies that the impact of MFWFS on the 1D-CNN classifier is significant with respect to AUC values. Although 1D-CNN extracts high-level deep semantic features using various layers, feature selection using a hybrid filter wrapper approach is applied prior to training the model, removing the defect features that are irrelevant or redundant, introduce noise into the dataset, and cause the model to overfit. As a result, this integration of static defect feature subsets with abstract deep semantic features in CNN improves the model’s efficiency and builds a stronger ability to distinguish between defective and non-defective classes.

RQ2. How efficient is the SDP model developed using MFWFS-1D-CNN with and without attention?

The study evaluated the performance of SDP models created with MFWFS-1D-CNN, both with and without an attention layer, by measuring their F-measure, AUC, G-mean, and MCC values. The results are presented in Tables 10, 11, 12, and 13. Across all datasets, the F-measure, AUC, G-mean, and MCC values range from 0.613 to 0.901, 0.788 to 0.974, 0.761 to 0.958, and 0.381 to 0.897, respectively. Adding the attention layer improves the predictive model’s performance in all the datasets except for one, i.e., xalan. We can clearly observe that there is an improvement of about 0.1–11% in the values of F-measure, 0.5–11% in the values of AUC, 0.3–12% in the values of G-mean, and 0.12–35% in the values of MCC obtained after applying the attention layer in the 1D-CNN model. The improvement in the values is due to the application of the attention mechanism in the 1D-CNN model, which can detect the direction and position of the data, thus enabling the precise identification of the defects in the software.

Table 2 Parameter settings for OBWOA

| Parameters | Values |
|---|---|
| No. of iterations | 100 |
| No. of search agents (whales) | 50 |
| Spiral factor b | 1 |
| Convergence constant ‘a’ | 1 |
| No. of runs | 20 |

Table 3 Parameter settings for 1D-CNN

| Parameters | Values |
|---|---|
| No. of convolutional layers | 2 |
| No. of dense layers | 2 |
| Activation function | ReLU + Softmax |
| Number of attention layers | 2 |
| Filter | 64, 32 |
| Kernel | 1, 1 |
| Optimizer | Adam |
| Learning rate | 0.01 |
| Loss function | Binary cross entropy |
| No. of epochs | 250 |
| Dropout | 0.2 |


Table 4 Average selected features obtained by MFWFS

| Datasets | Original #features | #features (OBWOA) | Selection ratio | Best fitness value (OBWOA) | #features (IG + CS + RF + OBWOA) |
|---|---|---|---|---|---|
| JM1 | 21 | 9 | 0.4286 | 0.410967 | 12 |
| MC1 | 38 | 5 | 0.1316 | 0.40894 | 13 |
| KC1 | 21 | 8 | 0.381 | 0.338598 | 15 |
| PC3 | 37 | 7 | 0.1892 | 0.359309 | 16 |
| PC4 | 37 | 6 | 0.1622 | 0.251322 | 16 |
| PC5 | 38 | 12 | 0.3158 | 0.31292 | 21 |
| EQ | 61 | 4 | 0.0656 | 0.178979 | 14 |
| JDT | 61 | 15 | 0.2459 | 0.22763 | 24 |
| PDE | 61 | 9 | 0.1475 | 0.36215 | 22 |
| Lucene | 61 | 3 | 0.0492 | 0.321484 | 17 |
| Mylyn | 61 | 9 | 0.1475 | 0.342688 | 17 |
| ant-1.7 | 20 | 6 | 0.3 | 0.267295 | 12 |
| camel-1.6 | 20 | 11 | 0.55 | 0.3794 | 16 |
| jedit-4.3 | 20 | 6 | 0.3 | 0.410779 | 12 |
| poi-3.0 | 20 | 9 | 0.45 | 0.15855 | 14 |
| Xalan-2.7 | 20 | 3 | 0.15 | 0.158033 | 12 |
| xerces-1.4 | 20 | 5 | 0.25 | 0.086125 | 12 |
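The quantities in Table 4 follow from simple arithmetic, sketched below in plain Python. The helper names are hypothetical. Two points are assumptions rather than statements from the paper: the rounding applied to log2 n is not specified, so the floor is used here because it reproduces the four features quoted for JM1; and the attribute reduction rate is taken as the share of original features removed by the final MFWFS subset, an interpretation that reproduces the quoted 20% and 77% extremes.

```python
import math


def top_k_per_filter(n_features):
    """Number of top-ranked features kept per filter, assuming floor(log2 n)."""
    return int(math.log2(n_features))


def selection_ratio(n_selected, n_original):
    """Share of the original features chosen by OBWOA (Table 4, 'Selection ratio')."""
    return n_selected / n_original


def reduction_rate(n_original, n_final):
    """Share of the original features removed by the final MFWFS subset (assumed definition)."""
    return (n_original - n_final) / n_original


# JM1: 21 original features, 9 selected by OBWOA, 12 in the final MFWFS subset.
print(top_k_per_filter(21))               # 4
print(round(selection_ratio(9, 21), 4))   # 0.4286
print(round(reduction_rate(21, 12), 2))   # 0.43

# Extremes of the reduction rate: camel-1.6 (20 -> 16) and EQ (61 -> 14).
print(round(reduction_rate(20, 16), 2))   # 0.2
print(round(reduction_rate(61, 14), 2))   # 0.77
```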

To statistically validate our results, we performed the Wilcoxon test on all four performance measures. Table 14 presents the p-values obtained when the MFWFS-1D-CNN-Attention model is compared with MFWFS-1D-CNN. The p-values obtained with respect to F-measure (0.03), AUC (0.041), and MCC (0.012) show a significant difference between the two approaches, whereas the difference with respect to the G-mean measure is not significant. These results suggest that the MFWFS-1D-CNN-Attention model outperforms the MFWFS-1D-CNN model without attention, signifying the importance of the attention layer in the neural network.

RQ3. How effective are the SDP models developed using the proposed hybrid model MFWFS-1D-CNN-Attention in comparison with the models created with individual filters and wrapper-based CNN model?

Answer: To assess whether the proposed MFWFS-1D-CNN-Attention model is able to effectively classify the software defects, we compared this model with the sole wrapper method (OBWOA) and with combinations of a filter (IG, CS, and RF) and the wrapper method. Table 15 reports the results using the four performance measures: F-measure, AUC, G-mean, and MCC.
In terms of all the evaluation measures, MFWFS (IG + CS + RF + OBWOA) on the CNN attention model produced the highest values in the JM1, MC1, KC1, PC1, PC3, PC4, PC5, EQ, JDT, Lucene, Mylyn, ant-1.7, camel-1.6, and poi-3.0 datasets. However, for the jedit-4.3 dataset, the performance of the IG + CS + RF + OBWOA method was better in terms of AUC and G-mean, while the OBWOA wrapper method alone was better in terms of F-measure and MCC scores. For Xalan-2.7, IG + OBWOA provided the best F-measure and MCC value, while OBWOA gave the best performance in terms of AUC and G-mean. For the Xerces-1.4 dataset, IG + OBWOA performed the best, with the highest values for all evaluation measures.
We also present box plots in Fig. 3 to visually compare the difference in the prediction performances between OBWOA, IG + OBWOA, CS + OBWOA, RF + OBWOA, and IG + CS + RF + OBWOA over seventeen datasets, with respect to the four performance measures. The x-axis denotes the different methodologies being compared, and the y-axis denotes the four evaluation measures. We observe that for each of the four evaluation metrics, the median value achieved by IG + CS + RF + OBWOA is higher than that obtained by other approaches. The average predicted F-measure value by IG + CS + RF + OBWOA is 0.794 (for all the datasets), that is, an improvement between 7% (IG + OBWOA) and 10.41% (OBWOA), the average AUC (0.896) by IG + CS + RF + OBWOA obtains improvement between 4.4% (RF + OBWOA) and 7% (OBWOA), the average G-mean (0.875) by IG + CS + RF + OBWOA obtains improvement between 5.4% (IG +

Table 5 Comparative results of 1D-CNN with and without MFWFS based on F-measure

| Model | JM1 | MC1 | KC1 | PC3 | PC4 | PC5 | EQ | JDT | PDE | Lucene | Mylyn | Ant-1.7 | Camel-1.6 | Jedit-4.3 | Poi-3.0 | Xalan-2.7 | Xerces-1.4 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Without FS | 0.614 | 0.66 | 0.726 | 0.764 | 0.782 | 0.731 | 0.854 | 0.859 | 0.846 | 0.825 | 0.732 | 0.79 | 0.746 | 0.69 | 0.802 | 0.831 | 0.909 |
| With FS | 0.651 | 0.653 | 0.721 | 0.776 | 0.841 | 0.739 | 0.898 | 0.901 | 0.839 | 0.776 | 0.753 | 0.789 | 0.708 | 0.706 | 0.796 | 0.677 | 0.9 |

Table 6 Comparative results of 1D-CNN with and without MFWFS based on AUC

| Model | JM1 | MC1 | KC1 | PC3 | PC4 | PC5 | EQ | JDT | PDE | Lucene | Mylyn | Ant-1.7 | Camel-1.6 | Jedit-4.3 | Poi-3.0 | Xalan-2.7 | Xerces-1.4 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Without FS | 0.712 | 0.878 | 0.847 | 0.915 | 0.916 | 0.806 | 0.871 | 0.902 | 0.912 | 0.847 | 0.833 | 0.805 | 0.826 | 0.935 | 0.825 | 0.754 | 0.895 |
| With FS | 0.752 | 0.888 | 0.848 | 0.919 | 0.94 | 0.813 | 0.932 | 0.926 | 0.89 | 0.861 | 0.875 | 0.833 | 0.826 | 0.88 | 0.833 | 0.98 | 0.942 |

Table 7 Comparative results of 1D-CNN with and without MFWFS based on G-mean

| Model | JM1 | MC1 | KC1 | PC3 | PC4 | PC5 | EQ | JDT | PDE | Lucene | Mylyn | Ant-1.7 | Camel-1.6 | Jedit-4.3 | Poi-3.0 | Xalan-2.7 | Xerces-1.4 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Without FS | 0.662 | 0.865 | 0.821 | 0.91 | 0.89 | 0.771 | 0.864 | 0.891 | 0.904 | 0.848 | 0.827 | 0.813 | 0.802 | 0.888 | 0.828 | 0.393 | 0.899 |
| With FS | 0.701 | 0.858 | 0.804 | 0.906 | 0.914 | 0.775 | 0.914 | 0.914 | 0.883 | 0.863 | 0.853 | 0.815 | 0.789 | 0.855 | 0.824 | 0.969 | 0.917 |

Table 8 Comparative results of 1D-CNN with and without MFWFS based on MCC

| Model | JM1 | MC1 | KC1 | PC3 | PC4 | PC5 | EQ | JDT | PDE | Lucene | Mylyn | Ant-1.7 | Camel-1.6 | Jedit-4.3 | Poi-3.0 | Xalan-2.7 | Xerces-1.4 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Without FS | 0.281 | 0.406 | 0.511 | 0.61 | 0.623 | 0.496 | 0.717 | 0.732 | 0.713 | 0.668 | 0.515 | 0.59 | 0.53 | 0.466 | 0.637 | 0.227 | 0.822 |
| With FS | 0.347 | 0.475 | 0.491 | 0.624 | 0.709 | 0.506 | 0.815 | 0.806 | 0.671 | 0.586 | 0.56 | 0.588 | 0.479 | 0.471 | 0.627 | 0.467 | 0.807 |


OBWOA) and 8.65% (OBWOA), and the average MCC (0.632) by IG + CS + RF + OBWOA obtains improvement between 18% (IG + OBWOA) and 28.78% (OBWOA).
We also compared the performance of the proposed feature selection method IG + CS + RF + OBWOA with the transformer-based model BERT [82], used in a previous literature study, to validate the performance and robustness of the proposed method against the original 1D-CNN-Attention-based model. The F-measure, AUC, G-mean, and MCC values are reported in Table 16. The results indicate that the IG + CS + RF + OBWOA with 1D-CNN model consistently outperforms the IG + CS + RF + OBWOA with BERT approach across all datasets in terms of F-measure, AUC, G-mean, and MCC values. Higher F-measure values for 1D-CNN indicate a better balance between precision and recall, which is essential for accurate defect prediction. AUC values are also consistently higher for 1D-CNN, indicating stronger discriminatory power and better classification ability. Similarly, G-mean values show that 1D-CNN is better at controlling the trade-off between sensitivity and specificity, resulting in more balanced classification results. Additionally, MCC values, which provide insight into the overall quality of predictions by taking into account both positive and negative classes, are notably higher for CNN. This indicates a stronger correlation between predicted and actual classes compared to BERT. Although BERT-like transformers generally perform well in handling complex data patterns and have shown successful results in existing literature studies, in this specific case the IG + CS + RF + OBWOA with 1D-CNN-Attention combination leads to enhanced performance across these critical metrics, making it a more accurate model for predicting software defects.
In conclusion, we can say that the IG + CS + RF + OBWOA-based CNN attention model behaves consistently over all the considered datasets, mainly performing better than other combinations of the filter and wrapper methods. Since all of the measures (F-measure, AUC, G-mean, and MCC) show significant improvement in the results, we can justify the use of this combination of feature selection techniques for SDP.

Table 9 Wilcoxon test result based on average F-measure, AUC, G-mean, and MCC

| Comparison approach | p-value (F-measure) | p-value (AUC) | p-value (G-mean) | p-value (MCC) |
|---|---|---|---|---|
| CNN without feature selection / CNN with MFWFS | NS (0.463) | S+ (0.015) | NS (0.193) | NS (0.201) |

Table 10 Comparative results of MFWFS-1D-CNN with and without attention based on F-measure

| Model | JM1 | MC1 | KC1 | PC3 | PC4 | PC5 | EQ | JDT | PDE | Lucene | Mylyn | Ant-1.7 | Camel-1.6 | Jedit-4.3 | Poi-3.0 | Xalan-2.7 | Xerces-1.4 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Without attention | 0.651 | 0.653 | 0.721 | 0.776 | 0.841 | 0.739 | 0.898 | 0.901 | 0.839 | 0.776 | 0.753 | 0.789 | 0.708 | 0.706 | 0.796 | 0.677 | 0.9 |
| With attention | 0.722 | 0.712 | 0.753 | 0.803 | 0.854 | 0.75 | 0.889 | 0.862 | 0.879 | 0.831 | 0.811 | 0.809 | 0.761 | 0.706 | 0.818 | 0.613 | 0.901 |

Table 11 Comparative results of MFWFS-1D-CNN with and without attention based on AUC

| Model | JM1 | MC1 | KC1 | PC3 | PC4 | PC5 | EQ | JDT | PDE | Lucene | Mylyn | Ant-1.7 | Camel-1.6 | Jedit-4.3 | Poi-3.0 | Xalan-2.7 | Xerces-1.4 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Without attention | 0.752 | 0.888 | 0.848 | 0.919 | 0.94 | 0.813 | 0.932 | 0.926 | 0.89 | 0.861 | 0.875 | 0.833 | 0.826 | 0.88 | 0.833 | 0.98 | 0.942 |
| With attention | 0.788 | 0.933 | 0.868 | 0.924 | 0.938 | 0.835 | 0.93 | 0.926 | 0.937 | 0.9 | 0.901 | 0.837 | 0.85 | 0.974 | 0.838 | 0.948 | 0.891 |

Table 12 Comparative results of MFWFS-1D-CNN with and without attention based on G-mean

| Model | JM1 | MC1 | KC1 | PC3 | PC4 | PC5 | EQ | JDT | PDE | Lucene | Mylyn | Ant-1.7 | Camel-1.6 | Jedit-4.3 | Poi-3.0 | Xalan-2.7 | Xerces-1.4 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Without attention | 0.701 | 0.858 | 0.804 | 0.906 | 0.914 | 0.775 | 0.914 | 0.914 | 0.883 | 0.863 | 0.853 | 0.815 | 0.789 | 0.855 | 0.824 | 0.969 | 0.917 |
| With attention | 0.761 | 0.921 | 0.834 | 0.911 | 0.917 | 0.79 | 0.897 | 0.896 | 0.927 | 0.881 | 0.876 | 0.822 | 0.816 | 0.958 | 0.835 | 0.935 | 0.89 |


Table 13 Comparative results of MFWFS-1D-CNN with and without attention based on MCC

| Model | JM1 | MC1 | KC1 | PC3 | PC4 | PC5 | EQ | JDT | PDE | Lucene | Mylyn | Ant-1.7 | Camel-1.6 | Jedit-4.3 | Poi-3.0 | Xalan-2.7 | Xerces-1.4 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Without attention | 0.347 | 0.475 | 0.491 | 0.624 | 0.709 | 0.506 | 0.815 | 0.806 | 0.671 | 0.586 | 0.56 | 0.588 | 0.479 | 0.471 | 0.627 | 0.467 | 0.807 |
| With attention | 0.469 | 0.509 | 0.551 | 0.667 | 0.73 | 0.534 | 0.897 | 0.735 | 0.773 | 0.678 | 0.646 | 0.623 | 0.555 | 0.517 | 0.649 | 0.381 | 0.808 |

RQ4. How effective is the SDP model developed using the proposed hybrid classifier (MFWFS-1D-CNN-Attention) in comparison with other state-of-the-art ML and hybrid techniques?

In order to assess the effectiveness of the proposed approach (MFWFS-1D-CNN-Attention), we compared its results with those of other state-of-the-art techniques that are currently available in the literature. We report the results of only those techniques evaluated on the same datasets used in our study and on a common performance measure (AUC), since it is commonly available in the literature studies. Table 17 provides a list of the abbreviated comparison algorithms along with the particular implementation/parameter values utilized in evaluating the study.
Table 18 compares the proposed algorithm MFWFS-1D-CNN-Attention and the other algorithms listed in Table 17 with respect to the AUC performance measure. The best values are highlighted in bold. We observe that the proposed approach achieves the highest AUC values in seven datasets (KC1, PC3, camel-1.6, jedit-4.3, poi-3.0, xalan-2.7, and xerces-1.4) out of the ten datasets used for comparison with eighteen algorithms and the second best in the other three datasets (JM1, PC4, and ant-1.7). For the JM1 dataset, PCA-ANN obtained the highest AUC value of 0.81. For the PC4 dataset, RF and PCA-ANN obtained the best AUC value of 0.97. For the ant-1.7 dataset, ANN obtained the best AUC value of 0.847.
We have also compared MFWFS-1D-CNN-Attention with state-of-the-art models, including the BERT [82] and CodeBERT [83] transformer models used in previous studies for software defect prediction, in Table 19. The average performance values are recorded across all three repositories: NASA, AEEEM, and PROMISE. Our proposed approach consistently outperformed all other models in terms of F-measure, AUC, G-mean, and MCC, demonstrating its superiority. For the AEEEM datasets, an F-measure of 0.855 is achieved, which is substantially higher than the values obtained by CNN-WSHCKE (0.691), BERT (0.540), and CodeBERT (0.32). Similarly, for the PROMISE datasets, an F-measure of 0.768 is obtained, outperforming the F-measure values of CNN-WSHCKE (0.485), BERT (0.652), and CodeBERT (0.306). Furthermore, our approach exhibits notable improvements in AUC, G-mean, and MCC metrics across all datasets compared to the other models.
In a nutshell, we can say that the MFWFS-1D-CNN-based attention model has a powerful ability to address the SDP issue. In order to obtain the best feature subset for each software project, the unique properties of wrapper method OBWOA are integrated with IG + CS + RF filter


Table 14 Wilcoxon test result based on F-measure, AUC, G-mean, and MCC

| Comparison approach | p-value (F-measure) | p-value (AUC) | p-value (G-mean) | p-value (MCC) |
|---|---|---|---|---|
| MFWFS-1D-CNN without attention / MFWFS-1D-CNN with attention | S+ (0.030) | S+ (0.041) | NS (0.061) | S+ (0.012) |
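The S+/NS labels in this table follow from comparing each p-value with the 0.05 significance level stated in Sect. 5.5. The short sketch below applies that decision rule to the tabulated p-values; the function and variable names are hypothetical, and the p-values themselves are copied from the table rather than recomputed.

```python
ALPHA = 0.05  # significance level stated in the paper


def significance_label(p_value, alpha=ALPHA):
    """Return 'S+' when the null hypothesis of no difference is rejected, else 'NS'."""
    return "S+" if p_value < alpha else "NS"


# p-values reported for MFWFS-1D-CNN without vs. with attention.
reported_p_values = {"F-measure": 0.030, "AUC": 0.041, "G-mean": 0.061, "MCC": 0.012}
labels = {measure: significance_label(p) for measure, p in reported_p_values.items()}
print(labels)
```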

Table 15 Performance of different feature selection methods on 1D-CNN

| Datasets | Methodology | F-measure | AUC | G-mean | MCC |
|---|---|---|---|---|---|
| JM1 | OBWOA | 0.632 | 0.734 | 0.682 | 0.315 |
| | IG + OBWOA | 0.632 | 0.73 | 0.68 | 0.314 |
| | CS + OBWOA | 0.616 | 0.712 | 0.659 | 0.279 |
| | RF + OBWOA | 0.638 | 0.73 | 0.688 | 0.325 |
| | IG + CS + RF + OBWOA | 0.722 | 0.788 | 0.761 | 0.469 |
| MC1 | OBWOA | 0.581 | 0.836 | 0.815 | 0.289 |
| | IG + OBWOA | 0.585 | 0.872 | 0.86 | 0.313 |
| | CS + OBWOA | 0.594 | 0.852 | 0.842 | 0.315 |
| | RF + OBWOA | 0.592 | 0.887 | 0.885 | 0.329 |
| | IG + CS + RF + OBWOA | 0.712 | 0.933 | 0.921 | 0.509 |
| KC1 | OBWOA | 0.709 | 0.834 | 0.78 | 0.463 |
| | IG + OBWOA | 0.701 | 0.832 | 0.767 | 0.444 |
| | CS + OBWOA | 0.66 | 0.845 | 0.757 | 0.42 |
| | RF + OBWOA | 0.672 | 0.821 | 0.76 | 0.412 |
| | IG + CS + RF + OBWOA | 0.753 | 0.868 | 0.834 | 0.551 |
| PC3 | OBWOA | 0.691 | 0.827 | 0.824 | 0.469 |
| | IG + OBWOA | 0.697 | 0.856 | 0.849 | 0.512 |
| | CS + OBWOA | 0.721 | 0.85 | 0.854 | 0.523 |
| | RF + OBWOA | 0.732 | 0.89 | 0.868 | 0.547 |
| | IG + CS + RF + OBWOA | 0.803 | 0.924 | 0.911 | 0.667 |
| PC4 | OBWOA | 0.745 | 0.891 | 0.861 | 0.556 |
| | IG + OBWOA | 0.794 | 0.923 | 0.901 | 0.641 |
| | CS + OBWOA | 0.788 | 0.864 | 0.849 | 0.598 |
| | RF + OBWOA | 0.811 | 0.925 | 0.906 | 0.664 |
| | IG + CS + RF + OBWOA | 0.854 | 0.938 | 0.917 | 0.73 |
| PC5 | OBWOA | 0.692 | 0.796 | 0.73 | 0.422 |
| | IG + OBWOA | 0.68 | 0.778 | 0.719 | 0.402 |
| | CS + OBWOA | 0.696 | 0.789 | 0.735 | 0.431 |
| | RF + OBWOA | 0.702 | 0.798 | 0.727 | 0.427 |
| | IG + CS + RF + OBWOA | 0.75 | 0.835 | 0.79 | 0.534 |
| EQ | OBWOA | 0.78 | 0.816 | 0.796 | 0.583 |
| | IG + OBWOA | 0.806 | 0.862 | 0.833 | 0.658 |
| | CS + OBWOA | 0.797 | 0.845 | 0.815 | 0.624 |
| | RF + OBWOA | 0.84 | 0.879 | 0.856 | 0.701 |
| | IG + CS + RF + OBWOA | 0.889 | 0.93 | 0.897 | 0.897 |
| JDT | OBWOA | 0.824 | 0.881 | 0.866 | 0.672 |
| | IG + OBWOA | 0.861 | 0.897 | 0.892 | 0.733 |
| | CS + OBWOA | 0.841 | 0.882 | 0.874 | 0.697 |
| | RF + OBWOA | 0.855 | 0.897 | 0.887 | 0.724 |
| | IG + CS + RF + OBWOA | 0.862 | 0.926 | 0.896 | 0.735 |
| PDE | OBWOA | 0.734 | 0.827 | 0.805 | 0.505 |
| | IG + OBWOA | 0.805 | 0.846 | 0.848 | 0.624 |
| | CS + OBWOA | 0.799 | 0.861 | 0.853 | 0.619 |
| | RF + OBWOA | 0.778 | 0.841 | 0.843 | 0.583 |
| | IG + CS + RF + OBWOA | 0.879 | 0.937 | 0.927 | 0.773 |
| Lucene | OBWOA | 0.613 | 0.732 | 0.72 | 0.305 |
| | IG + OBWOA | 0.704 | 0.864 | 0.828 | 0.475 |
| | CS + OBWOA | 0.673 | 0.771 | 0.747 | 0.385 |
| | RF + OBWOA | 0.708 | 0.835 | 0.828 | 0.48 |
| | IG + CS + RF + OBWOA | 0.831 | 0.9 | 0.881 | 0.678 |
| Mylyn | OBWOA | 0.773 | 0.864 | 0.849 | 0.578 |
| | IG + OBWOA | 0.757 | 0.863 | 0.829 | 0.546 |
| | CS + OBWOA | 0.751 | 0.851 | 0.835 | 0.54 |
| | RF + OBWOA | 0.738 | 0.837 | 0.821 | 0.517 |
| | IG + CS + RF + OBWOA | 0.811 | 0.901 | 0.876 | 0.646 |
| ant-1.7 | OBWOA | 0.771 | 0.837 | 0.798 | 0.555 |
| | IG + OBWOA | 0.757 | 0.831 | 0.784 | 0.527 |
| | CS + OBWOA | 0.786 | 0.839 | 0.813 | 0.585 |
| | RF + OBWOA | 0.794 | 0.821 | 0.811 | 0.601 |
| | IG + CS + RF + OBWOA | 0.809 | 0.837 | 0.822 | 0.623 |
| camel-1.6 | OBWOA | 0.578 | 0.739 | 0.668 | 0.271 |
| | IG + OBWOA | 0.698 | 0.81 | 0.773 | 0.46 |
| | CS + OBWOA | 0.681 | 0.809 | 0.754 | 0.424 |
| | RF + OBWOA | 0.689 | 0.79 | 0.75 | 0.424 |
| | IG + CS + RF + OBWOA | 0.761 | 0.85 | 0.816 | 0.555 |
| jedit-4.3 | OBWOA | 0.761 | 0.97 | 0.869 | 0.558 |
| | IG + OBWOA | 0.669 | 0.958 | 0.942 | 0.46 |
| | CS + OBWOA | 0.741 | 0.942 | 0.871 | 0.525 |
| | RF + OBWOA | 0.652 | 0.96 | 0.945 | 0.446 |
| | IG + CS + RF + OBWOA | 0.722 | 0.982 | 0.964 | 0.544 |
| poi-3.0 | OBWOA | 0.774 | 0.797 | 0.794 | 0.574 |
| | IG + OBWOA | 0.795 | 0.822 | 0.82 | 0.619 |
| | CS + OBWOA | 0.8 | 0.821 | 0.818 | 0.641 |
| | RF + OBWOA | 0.807 | 0.833 | 0.832 | 0.64 |
| | IG + CS + RF + OBWOA | 0.818 | 0.838 | 0.835 | 0.649 |
| Xalan-2.7 | OBWOA | 0.687 | 0.989 | 0.97 | 0.482 |
| | IG + OBWOA | 0.755 | 0.879 | 0.873 | 0.554 |
| | CS + OBWOA | 0.687 | 0.974 | 0.8 | 0.403 |
| | RF + OBWOA | 0.612 | 0.97 | 0.815 | 0.323 |
| | IG + CS + RF + OBWOA | 0.613 | 0.948 | 0.935 | 0.381 |
| xerces-1.4 | OBWOA | 0.873 | 0.873 | 0.863 | 0.75 |
| | IG + OBWOA | 0.912 | 0.917 | 0.907 | 0.827 |
| | CS + OBWOA | 0.873 | 0.874 | 0.872 | 0.748 |
| | RF + OBWOA | 0.873 | 0.872 | 0.87 | 0.749 |
| | IG + CS + RF + OBWOA | 0.901 | 0.891 | 0.89 | 0.808 |
Bold values represent the highest values among all the techniques for each dataset
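
For reference, the three threshold-based metrics reported in Table 15 can be computed from confusion-matrix counts, and AUC from ranked prediction scores. The sketch below assumes the standard definitions (in particular, G-mean as the geometric mean of recall and specificity); the function names, counts, and scores are hypothetical and are not taken from the study.

```python
import math

def threshold_metrics(tp, fp, tn, fn):
    """F-measure, G-mean, and MCC from confusion-matrix counts."""
    recall = tp / (tp + fn)        # true positive rate (defective modules found)
    specificity = tn / (tn + fp)   # true negative rate (clean modules kept clean)
    f_measure = 2 * tp / (2 * tp + fp + fn)
    g_mean = math.sqrt(recall * specificity)
    mcc = (tp * tn - fp * fn) / math.sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)
    )
    return {"f_measure": f_measure, "g_mean": g_mean, "mcc": mcc}

def auc(scores, labels):
    """AUC as the share of (defective, clean) pairs ranked correctly; ties count 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical counts and scores, for illustration only.
example = threshold_metrics(tp=40, fp=10, tn=45, fn=5)
example_auc = auc([0.9, 0.8, 0.7, 0.3, 0.6, 0.2], [1, 1, 0, 0, 1, 0])
```

Because MCC uses all four cells of the confusion matrix, it can be much lower than AUC or G-mean on imbalanced data, which may explain rows such as jedit-4.3 where AUC is close to 1 while MCC stays near 0.5.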


Fig. 3 Box plot for F-measure, AUC, G-mean, and MCC values for different feature selection methodologies


Table 16 Comparison of performance of proposed feature selection with 1D-CNN and BERT transformer
Datasets Methodology F-measure AUC G-mean MCC
JM1 IG + CS + RF + OBWOA with CNN 0.722 0.788 0.761 0.469
IG + CS + RF + OBWOA with BERT 0.516 0.571 0.503 0.287
MC1 IG + CS + RF + OBWOA with CNN 0.712 0.933 0.921 0.509
IG + CS + RF + OBWOA with BERT 0.579 0.5 0.235 0.117
KC1 IG + CS + RF + OBWOA with CNN 0.753 0.868 0.834 0.551
IG + CS + RF + OBWOA with BERT 0.611 0.818 0.521 0.475
PC3 IG + CS + RF + OBWOA with CNN 0.803 0.924 0.911 0.667
IG + CS + RF + OBWOA with BERT 0.47 0.58 0.569 0.18
PC4 IG + CS + RF + OBWOA with CNN 0.854 0.938 0.917 0.73
IG + CS + RF + OBWOA with BERT 0.381 0.589 0.53 0.153
PC5 IG + CS + RF + OBWOA with CNN 0.75 0.835 0.79 0.534
IG + CS + RF + OBWOA with BERT 0.463 0.562 0.534 0.173
EQ IG + CS + RF + OBWOA with CNN 0.889 0.93 0.897 0.897
IG + CS + RF + OBWOA with BERT 0.677 0.792 0.673 0.509
JDT IG + CS + RF + OBWOA with CNN 0.862 0.926 0.896 0.735
IG + CS + RF + OBWOA with BERT 0.861 0.914 0.805 0.729
PDE IG + CS + RF + OBWOA with CNN 0.879 0.937 0.927 0.773
IG + CS + RF + OBWOA with BERT 0.388 0.557 0.463 0.119
Lucene IG + CS + RF + OBWOA with CNN 0.831 0.9 0.881 0.678
IG + CS + RF + OBWOA with BERT 0.456 0.763 0.617 0.272
Mylyn IG + CS + RF + OBWOA with CNN 0.811 0.901 0.876 0.646
IG + CS + RF + OBWOA with BERT 0.393 0.615 0.554 0.155
Ant-1.7 IG + CS + RF + OBWOA with CNN 0.809 0.837 0.822 0.623
IG + CS + RF + OBWOA with BERT 0.452 0.752 0.621 0.304
Camel-1.6 IG + CS + RF + OBWOA with CNN 0.761 0.85 0.816 0.555
IG + CS + RF + OBWOA with BERT 0.314 0.684 0.563 0.218
Jedit-4.3 IG + CS + RF + OBWOA with CNN 0.722 0.982 0.964 0.544
IG + CS + RF + OBWOA with BERT 0.65 0.98 0.703 0.393
Poi-3.0 IG + CS + RF + OBWOA with CNN 0.818 0.838 0.835 0.649
IG + CS + RF + OBWOA with BERT 0.851 0.912 0.874 0.732
Xalan-2.7 IG + CS + RF + OBWOA with CNN 0.613 0.948 0.935 0.381
IG + CS + RF + OBWOA with BERT 0.643 0.874 0.723 0.548
Xerces-1.4 IG + CS + RF + OBWOA with CNN 0.901 0.891 0.89 0.808
IG + CS + RF + OBWOA with BERT 0.854 0.832 0.819 0.747
Bold values represent the highest values among all the techniques for each dataset
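
The per-dataset values in Table 16 can be aggregated by repository to give a compact view of the comparison. The following sketch averages the AUC of the proposed feature selection with CNN; the AUC values are copied from Table 16, while the variable names and the assignment of datasets to the NASA, AEEEM, and PROMISE repositories are assumptions made for illustration.

```python
from statistics import mean

# AUC of "IG + CS + RF + OBWOA with CNN", copied from Table 16.
cnn_auc = {
    "JM1": 0.788, "MC1": 0.933, "KC1": 0.868,
    "PC3": 0.924, "PC4": 0.938, "PC5": 0.835,
    "EQ": 0.93, "JDT": 0.926, "PDE": 0.937, "Lucene": 0.9, "Mylyn": 0.901,
    "Ant-1.7": 0.837, "Camel-1.6": 0.85, "Jedit-4.3": 0.982,
    "Poi-3.0": 0.838, "Xalan-2.7": 0.948, "Xerces-1.4": 0.891,
}

# Assumed grouping of the 17 datasets by source repository.
repositories = {
    "NASA": ["JM1", "MC1", "KC1", "PC3", "PC4", "PC5"],
    "AEEEM": ["EQ", "JDT", "PDE", "Lucene", "Mylyn"],
    "PROMISE": ["Ant-1.7", "Camel-1.6", "Jedit-4.3",
                "Poi-3.0", "Xalan-2.7", "Xerces-1.4"],
}

# Mean AUC per repository, rounded to three decimals like the tables.
repo_mean_auc = {
    repo: round(mean(cnn_auc[name] for name in names), 3)
    for repo, names in repositories.items()
}
```

The same dictionary-based aggregation applies unchanged to F-measure, G-mean, and MCC, or to the BERT rows.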

methods significantly outperformed using them separately. Further, applying a deep learning-based CNN model with an attention mechanism results in the identification of crucial features that improve the performance of SDP.

7 Threats to validity

This section presents a comprehensive list of potential threats that could impact the results of the study. Examining these threats is essential to enhance the reliability of the obtained results.

Internal validity threats may arise from uncontrolled factors impacting experimental outcomes, such as experimental errors, parameter settings, and implementation errors. Although the experiment process was carried out very carefully, some errors may have gone unnoticed [84].

External validity threat is associated with the generalization and replication of our results. This threat is reduced by applying the proposed model to 17 open-source datasets from PROMISE, NASA, and AEEEM repositories that are publicly available and used in earlier SDP studies. The projects differ in programming language, size, and application domain. The parameter settings and fitness


Table 17 Abbreviations of the algorithms used for comparison with the proposed model
Abbr. | Algorithm | References | Implementation/parameter values
k-NN | k-nearest neighbor | [68, 69] | The number of neighbors was varied between 1, 3, 5, up to 15
SVM | Support vector machine | [68, 69] | Kernel: RBF; a multilevel grid search was employed to tune the kernel width and the regularization parameter C, ranging from log(C) = [-6, -5, …, 20]
RF | Random forest | [68] | Number of trees tuned between 10, 50, 100, 250, 500, and 1000, and the number of attributes selected per tree as [0.5, 1, 2] × √M, where M is the number of attributes in the dataset
L-SVM | Lagrangian support vector machine | [68, 69] | No kernel function used; a range of log(C) = [-6, -5, …, 20] was evaluated
LS-SVM | Least squares support vector machine | [68, 69] | Kernel: RBF; a multilevel grid search was employed to tune the kernel width and the regularization parameter C, ranging from log(C) = [-6, -5, …, 20]
NB | Naïve Bayes | [69] | Default (Gaussian NB)
LDA | Linear discriminant analysis | [68, 69] | Solver: SVD; shrinkage: None
C4.5 | Decision tree | [68] | Pruning strategies varied confidence levels from 0.05 to 0.7, with and without Laplacian smoothing and subtree raising
ANN | Artificial neural network | [38] | Three-layer network; hidden neurons: 10; number of epochs: 300; activation function: sigmoid; learning rate: 0.1
CNN-WSHCKE | CNN-whale optimization-simulated annealing-based kernel extreme learning machine | [3] | Number of neurons in convolutional layers: 32, 64, 16; kernel sizes: 3 × 3, 3 × 3, and 4 × 4; activation function: ReLU; KELM hyperparameter: the Gaussian kernel's bandwidth σ
SSA-BPNN | Salp swarm algorithm-backpropagation neural network | [16] | SSA: population size: 30, max iterations: 300. BPNN: three-layer network, number of hidden layers: 1, number of hidden neurons: 2n + 1
PCA-ANN | Principal component analysis-ANN | [69] | PCA uses maximum likelihood estimation. ANN: number of hidden neurons: 2n + 1, sigmoid activation function, minimizes MSE during training
LRNN | Layered recurrent neural network | [39] | Number of iterations: 1000; number of input layer neurons: number of features; number of hidden layer neurons: number of features/2; number of neurons in output layer: 1; threshold value: 0.5
BGA-BPSO-BACO-LRNN | Binary genetic algorithm-binary particle swarm optimization-binary ant colony optimization-layered recurrent neural network | [39] | BGA: number of iterations: 300, population size: 40, crossover rate: 0.7, mutation rate: 0.1, selection type: roulette wheel selection, crossover type: single, double, or uniform (randomly selected each iteration). BPSO: number of iterations: 300, swarm size: 40, degree of influence (c1, c2): both set to 1.5, inertia weight (w): 0.8, maximum and minimum velocity (v_max, v_min): 1 and 0, respectively. BACO: number of iterations: 300, number of ants (agents): 20, initial pheromone level: 1, pheromone exponential weight (α): 0.8, heuristic exponential weight (β): 0.8, evaporation rate: 0.6
TBWOA-DT | Tournament selection method with binary whale optimization algorithm-decision tree | [40] | Population size: 10; number of iterations: 100; tournament size (T): 3
EBMFOV3 | Enhanced binary moth flame optimization | [2] | Population size: 20–50; maximum iterations: 100; transfer functions used: S-shaped and V-shaped


Table 17 (continued)
Abbr. | Algorithm | References | Implementation/parameter values
DP-ARNN | Defect prediction via attention-based recurrent neural network | [43] | Embedding dimension: 30; AST vector length: 2000; Bi-LSTM units: 40 per layer; first hidden layer nodes: 16; second hidden layer nodes: 24; batch size: 32; epochs: 20; activation functions: tanh in the first layer, linear in the second layer, sigmoid in the output layer; loss function: binary cross-entropy; optimizer: RMSprop
CNN | CNN | [43] | Number of filters: 10; filter length: 5; fully connected layer nodes: 100
RNN | Recurrent neural network | [43] | Same as for DP-ARNN [43]
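
Several of the search grids listed in Table 17 can be written out explicitly. The sketch below enumerates the k-NN, SVM, and random forest grids; the helper name `rf_attribute_grid`, the rounding to whole attribute counts, and the example value M = 21 are assumptions made for illustration and are not specified in the table.

```python
import math

# k-NN: odd neighbor counts 1, 3, 5, ..., 15.
knn_neighbors = list(range(1, 16, 2))

# SVM, L-SVM, LS-SVM: exponents of the regularization parameter, log(C) = -6, ..., 20.
# The base of the logarithm is not stated in the table, so only exponents are listed.
log_c_grid = list(range(-6, 21))

# Random forest: candidate numbers of trees.
rf_tree_counts = [10, 50, 100, 250, 500, 1000]

def rf_attribute_grid(num_attributes):
    """Attributes tried per tree: [0.5, 1, 2] x sqrt(M), rounded (assumed) to integers."""
    root = math.sqrt(num_attributes)
    return [max(1, round(factor * root)) for factor in (0.5, 1, 2)]

# Hypothetical dataset with M = 21 attributes.
example_attribute_grid = rf_attribute_grid(21)
```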

Table 18 Results of the proposed approach against eighteen other algorithms with respect to AUC
Techniques JM1 KC1 PC3 PC4 ant-1.7 camel-1.6 jedit-4.3 poi-3.0 xalan-2.7 xerces-1.4

KNN 0.71 0.7 0.77 0.87 – – – – – –


SVM 0.72 0.76 0.77 0.92 – – – – – –
RF 0.76 0.78 0.82 0.97 – 0.677 0.797 0.636 0.674 0.576
L-SVM 0.73 0.76 0.84 0.92 – – – – – –
LS-SVM 0.74 0.77 0.83 0.94 – – – – – –
NB 0.69 0.76 0.81 0.84 – – – – – –
LDA 0.73 0.78 0.82 0.88 – – – – – –
C4.5 0.72 0.71 0.78 0.93 – – – – – –
ANN 0.713 0.735 – – 0.847 0.681 0.461 – 0.817 –
SSA-BPNN 0.7 0.79 0.87 0.9 0.79 – 0.97 – –
PCA-ANN 0.81 0.79 0.89 0.97 – – – – – –
LRNN – – – – 0.605 0.542 0.696 – 0.443 –
BGA-BPSO-BACO-LRNN 0.523 0.505 0.684 – 0.42 –
TBWOA-DT – – – – 0.737 0.677 0.847 – 0.796 –
EBMFOV3 – – – – 0.762 0.658 0.75 – 0.846 –
DP-ARNN – – – – 0.79 0.82 0.796 0.674 0.761
CNN 0.732 0.841 0.745 0.674 0.671
RNN 0.766 0.842 0.764 0.654 0.73
Proposed approach 0.788 0.868 0.924 0.938 0.837 0.85 0.974 0.838 0.948 0.891
Bold values represent the highest values among all the techniques for each dataset

functions are clearly specified to make it easy for researchers to replicate the results [85, 86].

Conclusion validity concerns whether the drawn conclusions are statistically valid. To address this concern, we use the non-parametric Wilcoxon test to ensure we obtain statistically valid results. Using stable performance measures such as F-measure, AUC, MCC, and G-mean to evaluate results further reduces the threat to conclusion validity [87].

Construct validity threats involve the appropriate selection of evaluation metrics. AUC, F-measure, MCC, and G-mean are well-established metrics used in previous SDP studies. Therefore, this threat is also considered acceptable [88, 89].
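
To illustrate the kind of paired, non-parametric comparison that the Wilcoxon test provides, the sketch below applies a signed-rank test to the 17 paired AUC values from Table 16 (CNN versus BERT on the proposed feature selection). It is a minimal pure-Python version that uses the normal approximation without tie or continuity corrections; the function name is hypothetical, and this is not necessarily the exact procedure or pairing used in the study.

```python
import math

def wilcoxon_signed_rank(x, y):
    """Two-sided Wilcoxon signed-rank test (normal approximation, no tie correction)."""
    # Round to suppress floating-point noise, then drop zero differences.
    diffs = [round(a - b, 6) for a, b in zip(x, y)]
    diffs = [d for d in diffs if d != 0]
    n = len(diffs)
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        # Extend j over a run of tied absolute differences.
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    w = min(w_plus, w_minus)
    mean_w = n * (n + 1) / 4
    sd_w = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    z = (w - mean_w) / sd_w
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided p-value
    return w, p

# AUC values copied from Table 16, in dataset order JM1 ... Xerces-1.4.
cnn_auc = [0.788, 0.933, 0.868, 0.924, 0.938, 0.835, 0.93, 0.926, 0.937,
           0.9, 0.901, 0.837, 0.85, 0.982, 0.838, 0.948, 0.891]
bert_auc = [0.571, 0.5, 0.818, 0.58, 0.589, 0.562, 0.792, 0.914, 0.557,
            0.763, 0.615, 0.752, 0.684, 0.98, 0.912, 0.874, 0.832]

w_stat, p_value = wilcoxon_signed_rank(cnn_auc, bert_auc)
```

Only one dataset (Poi-3.0) has a negative difference, so the smaller rank sum is small and the p-value falls well below the usual 0.05 threshold.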


Table 19 Average performance values of the proposed approach against other models
Dataset groups: NASA, PROMISE, AEEEM
Model | F-measure | AUC | G-mean | MCC
Proposed approach | 0.855, 0.768, 0.766 | 0.919, 0.881, 0.89 | 0.895, 0.876, 0.856 | 0.746, 0.589, 0.577
CNN-WSHCKE | 0.691, 0.485 | 0.792, 0.744 | 0.799, 0.646 | 0.484, 0.353
BERT | 0.540, 0.652, 0.349 | 0.730, 0.883, 0.548 | 0.652, 0.685, 0.448 | 0.300, 0.407, 0.002
CodeBERT | 0.306, 0.355, 0.32 | 0.497, 0.503, 0.5 | 0.425, 0.396, 0.5 | 0, 0, 0
Bold values represent the highest values among all the techniques for each dataset

8 Conclusion and future work

In conclusion, this study offers a novel hybrid model for SDP utilizing a multiple filter wrapper feature selection (MFWFS) technique, which combines the features obtained from three filter methods (Information Gain, Chi-square, and Relief-F) and one swarm-based oppositional whale optimization algorithm. It effectively selects fewer optimal features by taking advantage of both the filter and wrapper methods. The features obtained from the MFWFS are integrated with the automatic features generated from the output of attention-based 1D-CNN to construct a powerful unified defect predictor. The hybrid model is evaluated on 17 different software defect datasets using four performance metrics (F-measure, MCC, AUC, and G-mean) to validate its effectiveness in SDP. The Wilcoxon test is used for statistical analysis to validate our findings.

The results emphasize the significance of feature selection (MFWFS) in SDP so that developers can mainly focus on those software modules that are defect-prone, which helps minimize the software maintenance cost. The attention layer is also able to capture more crucial features that ultimately enhance the classification performance of 1D-CNN. The MFWFS-1D-CNN-Attention model yielded a better SDP model than individual filter and wrapper-based models as well as other state-of-the-art methods from the previous literature on most of the datasets used in the study.

In future studies, it would be valuable to evaluate the effectiveness of our SDP models on a wider range of open-source and commercial projects written in various programming languages, such as Python. We can explore automated parameter tuning techniques to optimize the parameter configuration of our models. Additionally, we can extend our implementation to address cross-version SDP problems.

Author contributions All authors jointly contributed to the conceptualization and design of the study. Sonali Chawla undertook the data collection and analysis process. Sonali Chawla drafted the initial manuscript, which was subsequently reviewed and commented on by Anjali Sharma and Ruchika Malhotra. All authors participated in the review process and provided final approval for the manuscript.

Funding The authors declare that no funds, grants, or other support were received during the preparation of this manuscript.

Data availability The datasets that support this study are publicly available. They are taken from http://promise.site.uottawa.ca/SERepository/, http://purl.org/MarianJureczko/MetricsRepo, and http://bug.inf.usi.ch/.


Declarations

Conflict of interest The authors have no competing interests to declare that are relevant to the content of this article.

Ethical approval This study does not contain any studies with human or animal subjects performed by any of the authors.

References

1. Ma Y, Luo G, Zeng X, Chen A (2012) Transfer learning for cross-company software defect prediction. Inf Softw Technol 54(3):248–256. https://doi.org/10.1016/J.INFSOF.2011.09.007
2. Tumar I, Hassouneh Y, Turabieh H, Thaher T (2020) Enhanced binary moth flame optimization as a feature selection algorithm to predict software fault prediction. IEEE Access 8:8041–8055. https://doi.org/10.1109/ACCESS.2020.2964321
3. Zhu K, Ying S, Zhang N, Zhu D (2021) Software defect prediction based on enhanced metaheuristic feature selection optimization and a hybrid deep neural network. J Syst Softw 180:111026. https://doi.org/10.1016/J.JSS.2021.111026
4. Wahono RS, Suryana N, Ahmad S (2014) Metaheuristic optimization based feature selection for software defect prediction. J Softw 9:5. https://doi.org/10.4304/jsw.9.5.1324-1333
5. Liu S, Chen X, Liu W, Chen J, Gu Q, Chen D (2014) FECAR: A feature selection framework for software defect prediction. In: Proceedings - international computer software and applications conference. IEEE Computer Society, pp 426–435. https://doi.org/10.1109/COMPSAC.2014.66
6. Abe N, Kudo M (2006) Non-parametric classifier-independent feature selection. Pattern Recognit 39(5):737–746. https://doi.org/10.1016/J.PATCOG.2005.11.007
7. Laura Emmanuella LEA, De Paula Canuto AM (2014) Filter-based optimization techniques for selection of feature subsets in ensemble systems. Expert Syst Appl 41(4):1622–1631. https://doi.org/10.1016/J.ESWA.2013.08.059
8. Llobet E et al (2007) Efficient feature selection for mass spectrometry based electronic nose applications. Chemom Intell Lab Syst 85(2):253–261. https://doi.org/10.1016/J.CHEMOLAB.2006.07.002
9. Dash M, Liu H (1997) Feature selection for classification. Intell Data Anal 1(1–4):131–156. https://doi.org/10.1016/S1088-467X(97)00008-5
10. Hsu HH, Hsieh CW, Da Lu M (2011) Hybrid feature selection by combining filters and wrappers. Expert Syst Appl 38(7):8144–8150. https://doi.org/10.1016/J.ESWA.2010.12.156
11. Sarro F, Di Martino S, Ferrucci F, Gravino C (2012) A further analysis on the use of genetic algorithm to configure support vector machines for inter-release fault prediction. In: Proceedings of the ACM symposium on applied computing, pp 1215–1220. https://doi.org/10.1145/2245276.2231967
12. Moussa R, Azar D (2017) A PSO-GA approach targeting fault-prone software modules. J Syst Softw 132:41–49. https://doi.org/10.1016/j.jss.2017.06.059
13. Kumar PR, Saradhi Varma GP (2018) A novel probabilistic-ABC based boosting model for software defect detection. In: Proceedings of 2017 international conference on innovations in information, embedded and communication systems, ICIIECS 2017, Institute of Electrical and Electronics Engineers Inc., pp 1–6. https://doi.org/10.1109/ICIIECS.2017.8276059
14. Arar ÖF, Ayan K (2015) Software defect prediction using cost-sensitive neural network. Appl Soft Comput J 33:263–277. https://doi.org/10.1016/j.asoc.2015.04.045
15. Manivasagam G, Gunasundari R (2018) An optimized feature selection using fuzzy mutual information based ant colony optimization for software defect prediction. Int J Eng Technol 7:456–460
16. Kassaymeh S, Abdullah S, Al-Betar MA, Alweshah M (2022) Salp swarm optimizer for modeling the software fault prediction problem. J King Saud Univ - Comput Inf Sci 34(6):3365–3378. https://doi.org/10.1016/J.JKSUCI.2021.01.015
17. Mirjalili S, Lewis A (2016) The whale optimization algorithm. Adv Eng Softw 95:51–67. https://doi.org/10.1016/J.ADVENGSOFT.2016.01.008
18. Alamri HS, Alsariera YA, Zamli KZ (2018) Opposition-based whale optimization algorithm. J Comput Theor Nanosci 24(10):7461–7464. https://doi.org/10.1166/ASL.2018.12959
19. Balogun AO et al (2020) Impact of feature selection methods on the predictive performance of software defect prediction models: an extensive empirical study. Symmetry. https://doi.org/10.3390/SYM12071147
20. Balogun AO, Basri S, Abdulkadir SJ, Hashim AS (2019) Performance analysis of feature selection methods in software defect prediction: a search method approach. Appl Sci. https://doi.org/10.3390/APP9132764
21. Li J, He P, Zhu J, Lyu MR (2017) Software defect prediction via convolutional neural network. In: Proceedings - 2017 IEEE international conference on software quality, reliability and security, QRS 2017, Institute of Electrical and Electronics Engineers Inc., pp 318–328. https://doi.org/10.1109/QRS.2017.42
22. Singh Y, Kaur A, Malhotra R (2009) Software fault proneness prediction using support vector machines. Eng Comput Sci 2176(1):240–245
23. Huang CL, Wang CJ (2006) A GA-based feature selection and parameters optimization for support vector machines. Expert Syst Appl 31(2):231–240. https://doi.org/10.1016/j.eswa.2005.09.024
24. Ceylan E, Kutlubay FO, Bener AB (2006) Software defect identification using machine learning techniques
25. Ceylan E, Kutlubay FO, Bener AB (2006) Software Defect Identification Using Machine Learning Techniques. In: 32nd EUROMICRO conference on software engineering and advanced applications (EUROMICRO06). https://doi.org/10.1109/EUROMICRO.2006.56
26. Manjula C, Florence L (2019) Deep neural network based hybrid approach for software defect prediction using software metrics. Cluster Comput 22(4):9847–9863. https://doi.org/10.1007/s10586-018-1696-z
27. Li Z, Li T, Wu Y, Yang L, Miao H, Wang D (2021) Software defect prediction based on hybrid swarm intelligence and deep learning. Comput Intell Neurosci. https://doi.org/10.1155/2021/4997459
28. Lecun Y, Bengio Y, Hinton G (2015) Deep learning. Nature 521(7553):436–444. https://doi.org/10.1038/nature14539
29. Anvarjon T, Mustaqeem, Kwon S (2020) Deep-Net: A lightweight CNN-based speech emotion recognition system using deep frequency features. Sensors 20(18):5212. https://doi.org/10.3390/S20185212
30. Shen Y, He X, Gao J, Deng L, Mesnil G (2014) Learning semantic representations using convolutional neural networks for web search. In: Proceedings of the 23rd International conference on World Wide Web, Association for Computing Machinery, Inc, pp 373–374. https://doi.org/10.1145/2567948.2577348
31. Galvez RL, Bandala AA, Dadios EP, Vicerra RRP, Maningo JMZ (2019) Object detection using convolutional neural networks. In: IEEE Region 10 Annual International Conference, Proceedings/TENCON, Institute of Electrical and Electronics Engineers Inc., pp 2023–2027. https://doi.org/10.1109/TENCON.2018.8650517


32. Kiranyaz S, Avci O, Abdeljaber O, Ince T, Gabbouj M, Inman DJ (2021) 1D convolutional neural networks and applications: a survey. Mech Syst Signal Process. https://doi.org/10.1016/J.YMSSP.2020.107398
33. Avci O, Abdeljaber O, Kiranyaz S, Boashash B, Sodano H, Inman D (2018) Efficiency validation of one dimensional convolutional neural networks for structural damage detection using A SHM Benchmark Data. In: 25th International congress on sound and vibration, Hiroshima, pp 4600–4607
34. Kiranyaz S, Ince T, Hamila R, Gabbouj M (2015) Convolutional Neural Networks for patient-specific ECG classification. In: Proceedings of the annual international conference of the IEEE Engineering in Medicine and Biology Society, EMBS, Institute of Electrical and Electronics Engineers Inc., pp 2608–2611. https://doi.org/10.1109/EMBC.2015.7318926
35. Avci O, Abdeljaber O, Kiranyaz S, Hussein M, Gabbouj M, Inman DJ (2021) A review of vibration-based damage detection in civil structures: From traditional methods to Machine Learning and Deep Learning applications. Mech Syst Signal Process 147:107077. https://doi.org/10.1016/J.YMSSP.2020.107077
36. Ince T, Kiranyaz S, Eren L, Askar M, Gabbouj M (2016) Real-time motor fault detection by 1-D convolutional neural networks. IEEE Trans Industr Electron 63(11):7067–7075. https://doi.org/10.1109/TIE.2016.2582729
37. Jiang H (2010) Discriminative training of HMMs for automatic speech recognition: a survey. Comput Speech Lang 24(4):589–608. https://doi.org/10.1016/J.CSL.2009.08.002
38. Jin C, Jin SW (2015) Prediction approach of software fault-proneness based on hybrid artificial neural network and quantum particle swarm optimization. Appl Soft Comput J 35:717–725. https://doi.org/10.1016/j.asoc.2015.07.006
39. Turabieh H, Mafarja M, Li X (2019) Iterated feature selection algorithms with layered recurrent neural network for software fault prediction. Expert Syst Appl 122:27–42. https://doi.org/10.1016/J.ESWA.2018.12.033
40. Hassouneh Y, Turabieh H, Thaher T, Tumar I, Chantar H, Too J (2021) Boosted whale optimization algorithm with natural selection operators for software fault prediction. IEEE Access 9:14239–14258. https://doi.org/10.1109/ACCESS.2021.3052149
41. Balogun AO et al (2021) A novel rank aggregation-based hybrid multifilter wrapper feature selection method in software defect prediction. Comput Intell Neurosci. https://doi.org/10.1155/2021/5069016
42. Zain ZM, Sakri S, Ismail NHA, Parizi RM (2022) Software defect prediction harnessing on multi 1-dimensional convolutional neural network structure. Comput, Mater Continua 71(1):1521–1546. https://doi.org/10.32604/cmc.2022.022085
43. Fan G, Diao X, Yu H, Yang K, Chen L (2019) Software defect prediction via attention-based recurrent neural network. Sci Program. https://doi.org/10.1155/2019/6230953
44. Akay R, Akay B (2020) Artificial bee colony algorithm and an application to software defect prediction. In: Nature-inspired methods for metaheuristics optimization, vol. 16, Springer, pp 73–92. https://doi.org/10.1007/978-3-030-26458-1_5
45. Xu Z, Liu J, Yang Z, An G, Jia X (2016) The impact of feature selection on defect prediction performance: an empirical comparison. In: Proceedings - International symposium on software reliability engineering, ISSRE. IEEE Computer Society, pp 309–320. https://doi.org/10.1109/ISSRE.2016.13
46. Liu H, Setiono R (1995) Chi2: feature selection and discretization of numeric attributes. In: Proceedings of 7th IEEE International conference on tools with artificial intelligence. IEEE, Singapore, pp 388–391. https://doi.org/10.1109/TAI.1995.479783
47. Kononenko I (1994) Estimating attributes: analysis and extensions of RELIEF. In: Estimating attributes: analysis and extensions of RELIEF. Springer Verlag, Berlin, pp 171–182. https://doi.org/10.1007/3-540-57868-4_57
48. Ahmed S, Zhang M, Peng L (2014) Improving feature ranking for biomarker discovery in proteomics mass spectrometry data using genetic programming. Connect Sci 26(3):215–243. https://doi.org/10.1080/09540091.2014.906388
49. Chandrashekar G, Sahin F (2014) A survey on feature selection methods. Comput Electr Eng 40(1):16–28. https://doi.org/10.1016/j.compeleceng.2013.11.024
50. Mafarja M, Mirjalili S (2018) Whale optimization approaches for wrapper feature selection. Appl Soft Comput 62:441–453. https://doi.org/10.1016/J.ASOC.2017.11.006
51. Rahnamayan S, Tizhoosh HR, Salama MMA (2008) Opposition versus randomness in soft computing techniques. Appl Soft Comput 8(2):906–918. https://doi.org/10.1016/J.ASOC.2007.07.010
52. Tizhoosh HR (2005) Opposition-based learning: a new scheme for machine intelligence. In: International conference on computational intelligence for modelling, control and automation and international conference on intelligent agents, web technologies and internet commerce (CIMCA-IAWTIC’06), pp 695–701. https://doi.org/10.1109/CIMCA.2005.1631345
53. Wang H, Wu Z, Rahnamayan S, Liu Y, Ventresca M (2011) Enhancing particle swarm optimization using generalized opposition-based learning. Inf Sci (N Y) 181(20):4699–4714. https://doi.org/10.1016/j.ins.2011.03.016
54. Rahnamayan RS, Tizhoosh HR, Salama MMA (2008) Opposition-based differential evolution. IEEE Trans Evol Comput 12(1):64–79. https://doi.org/10.1109/TEVC.2007.894200
55. Ibrahim RA, Elaziz MA, Lu S (2018) Chaotic opposition-based grey-wolf optimization algorithm based on differential evolution and disruption operator for global optimization. Expert Syst Appl 108:1–27. https://doi.org/10.1016/J.ESWA.2018.04.028
56. Malisia AR, Tizhoosh HR (2007) Applying opposition-based ideas to the ant colony system. In: Proceedings of the 2007 IEEE swarm intelligence symposium, SIS 2007, Honolulu, pp 182–189. https://doi.org/10.1109/SIS.2007.368044
57. Wang WL, Li WK, Wang Z, Li L (2019) Opposition-based multi-objective whale optimization algorithm with global grid ranking. Neurocomputing 341:41–59. https://doi.org/10.1016/J.NEUCOM.2019.02.054
58. Ghotra B, McIntosh S, Hassan AE (2017) A large-scale study of the impact of feature selection techniques on defect classification models. In: IEEE International working conference on mining software repositories. IEEE Computer Society, pp 146–157. https://doi.org/10.1109/MSR.2017.18
59. Emary E, Zawbaa HM, Hassanien AE (2016) Binary grey wolf optimization approaches for feature selection. Neurocomputing 172:371–381. https://doi.org/10.1016/j.neucom.2015.06.083
60. Mafarja M, Jarrar R, Ahmad S, Abusnaina AA (2018) Feature selection using Binary Particle Swarm optimization with time varying inertia weight strategies. In: ACM international conference proceeding series, Association for Computing Machinery. https://doi.org/10.1145/3231053.3231071
61. Sumbul G, Cinbis RG, Aksoy S (2019) Multisource region attention network for fine-grained object recognition in remote sensing imagery. IEEE Trans Geosci Remote Sens 57(7):4929–4937. https://doi.org/10.1109/TGRS.2019.2894425
62. Qi X, Li K, Liu P, Zhou X, Sun M (2020) Deep attention and multi-scale networks for accurate remote sensing image segmentation. IEEE Access 8:146627–146639. https://doi.org/10.1109/ACCESS.2020.3015587
63. Li W, Liu K, Zhang L, Cheng F (2020) Object detection based on an adaptive attention mechanism. Sci Rep 10(1):1–13. https://doi.org/10.1038/s41598-020-67529-x


64. Hou Q, Zhou D, Feng J (2021) Coordinate attention for efficient mobile network design. In: Proceedings of the IEEE computer society conference on computer vision and pattern recognition, IEEE Computer Society, pp 13708–13717. https://doi.org/10.1109/CVPR46437.2021.01350
65. Shepperd M, Song Q, Sun Z, Mair C (2013) Data quality: some comments on the NASA software defect datasets. IEEE Trans Software Eng 39(9):1208–1215. https://doi.org/10.1109/TSE.2013.11
66. Jureczko M, Madeyski L (2010) Towards identifying software project clusters with regard to defect prediction. In: ACM international conference proceeding series, pp 1–10. https://doi.org/10.1145/1868328.1868342
67. D’Ambros M, Lanza M, Robbes R (2010) An extensive comparison of bug prediction approaches. In: Proceedings - international conference on software engineering, pp 31–41. https://doi.org/10.1109/MSR.2010.5463279
68. Lessmann S, Baesens B, Mues C, Pietsch S (2008) Benchmarking classification models for software defect prediction: a proposed framework and novel findings. In: IEEE transactions on software engineering, pp 485–496. https://doi.org/10.1109/TSE.2008.35
69. Jayanthi R, Florence L (2019) Software defect prediction techniques using metrics based on neural network classifier. Cluster Comput 22(1):77–88. https://doi.org/10.1007/S10586-018-1730-1
70. Li J et al. (2017) Rare event prediction using similarity majority under-sampling technique. In: Soft computing in data science. SCDS 2017 communications in computer and information science. Springer, Singapore, pp 23–39. https://doi.org/10.1007/978-981-10-7242-0_3
71. Tan M, Tan L, Dara S, Mayeux C (2015) Online defect prediction for imbalanced data. In: 2015 IEEE/ACM 37th IEEE International conference on software engineering. IEEE Computer Society, pp 99–108. https://doi.org/10.1109/ICSE.2015.139
72. Farid AB, Fathy EM, Eldin AS, Abd-Elmegid LA (2021) Software defect prediction using hybrid model (CBIL) of convolutional neural network (CNN) and bidirectional long short-term memory (Bi-LSTM). PeerJ Comput Sci 7:1–22. https://doi.org/10.7717/PEERJ-CS.739
73. Srinivasan K, Fisher D (1995) Machine learning approaches to estimating software development effort. IEEE Trans Software Eng 21(2):126–137. https://doi.org/10.1109/32.345828
74. Wei H, Hu C, Chen S, Xue Y, Zhang Q (2019) Establishing a software defect prediction model via effective dimension reduction. Inf Sci (N Y) 477:399–409. https://doi.org/10.1016/J.INS.2018.10.056
75. Tantithamthavorn C, McIntosh S, Hassan AE, Matsumoto K (2017) An empirical comparison of model validation techniques for defect prediction models. IEEE Trans Software Eng 43(1):1–18. https://doi.org/10.1109/TSE.2016.2584050
prediction. IEEE Trans Software Eng 45(12):1253–1269. https://doi.org/10.1109/TSE.2018.2836442
78. Ghotra B, McIntosh S, Hassan AE (2015) Revisiting the impact of classification techniques on the performance of defect prediction models. In: 2015 IEEE/ACM 37th IEEE International conference on software engineering. IEEE, Florence, pp 789–800. https://doi.org/10.1109/ICSE.2015.91
79. De Carvalho AB, Pozo A, Vergilio S, Lenz A (2008) Predicting fault proneness of classes trough a multiobjective particle swarm optimization algorithm. In: Proceedings - international conference on tools with artificial intelligence. ICTAI, pp 387–394. https://doi.org/10.1109/ICTAI.2008.76
80. Demšar J (2006) Statistical comparisons of classifiers over multiple data sets
81. Wilcoxon F (1945) Individual comparisons by ranking methods. Biometrics Bullet 1(6):80–83
82. Uddin MN, Li B, Ali Z, Kefalas P, Khan I, Zada I (2022) Software defect prediction employing BiLSTM and BERT-based semantic feature. Soft comput 26(16):7877–7891. https://doi.org/10.1007/s00500-022-06830-5
83. Pan C, Lu M, Xu B (2021) An empirical study on software defect prediction using codebert model. Appl Sci (Switzerland) 11:11. https://doi.org/10.3390/app11114793
84. Malhotra R, Khanna M (2018) Threats to validity in search-based predictive modelling for software engineering. IET Software 12(4):293–305. https://doi.org/10.1049/IET-SEN.2018.5143
85. Kondo M, Bezemer CP, Kamei Y, Hassan AE, Mizuno O (2019) The impact of feature reduction techniques on defect prediction models. Empir Softw Eng 24(4):1925–1963. https://doi.org/10.1007/S10664-018-9679-5
86. Lu H, Kocaguneli E, Cukic B (2014) Defect prediction between software versions with active learning and dimensionality reduction. In: 2014 IEEE 25th International symposium on software reliability engineering. IEEE Computer Society, pp 312–322. https://doi.org/10.1109/ISSRE.2014.35
87. Malhotra R, Lata K (2021) An empirical study to investigate the impact of data resampling techniques on the performance of class maintainability prediction models. Neurocomputing 459:432–453. https://doi.org/10.1016/J.NEUCOM.2020.01.120
88. Li M, Zhang H, Wu R, Zhou ZH (2012) Sample-based software defect prediction with active and semi-supervised learning. Autom Softw Eng 19(2):201–230. https://doi.org/10.1007/S10515-011-0092-1
89. Zhou T, Sun X, Xia X, Li B, Chen X (2019) Improving defect prediction with deep forest. Inf Softw Technol 114:204–216. https://doi.org/10.1016/J.INFSOF.2019.07.003

Publisher’s Note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
76. Zhu K, Zhang N, Ying S, Wang X (2020) Within-project and
Springer Nature or its licensor (e.g. a society or other partner) holds
cross-project software defect prediction based on improved
exclusive rights to this article under a publishing agreement with the
transfer naive bayes algorithm. Comput, Mater Continua
author(s) or other rightsholder(s); author self-archiving of the
63(2):891–910. https://doi.org/10.32604/CMC.2020.08096
accepted manuscript version of this article is solely governed by the
77. Song Q, Guo Y, Shepperd M (2019) A comprehensive investi-
terms of such publishing agreement and applicable law.
gation of the role of imbalanced learning for software defect
