www.nature.com/scientificreports

Robust malicious software detection and classification using global whale optimization algorithm
with deep learning approach
Mohammed Assiri
Software malware detection and classification apply methods from the cybersecurity domain to
identify and categorize malicious software, generally called malware. The process analyses code
behaviour, file structures, and other features to distinguish between benign and malicious
programs. Machine learning (ML) and artificial intelligence (AI) are vital in this domain,
enabling dynamic and adaptive systems that identify novel and evolving malware attacks. By
training on massive datasets of benign and malicious instances, these systems learn patterns and
signatures indicative of malware, which lets them categorize and respond to potential attacks in
real time. This study presents a Global Whale Optimization Algorithm with Neutrosophic Logic for
Software Malware Detection and Classification (GWOANL-SMDC) technique. The GWOANL-SMDC technique
secures software through an Android malware recognition process. First, it employs the
Neutrosophic Cognitive Maps (NCM) model for feature selection. It then uses a convolutional long
short-term memory (ConvLSTM) model for software malware detection. Finally, GWOA-based parameter
tuning is performed to improve the performance of the ConvLSTM model. The GWOANL-SMDC technique is
evaluated on a malware dataset, and the results indicate that it improves the detection of
software malware.

Keywords Malware detection, Neutrosophic logic, Parameter tuning, Neutrosophic cognitive maps, Feature
selection

Recently, cyberattacks have become one of the most severe problems in modern technology. The term
refers to exploiting a system’s flaws for malicious activities such as altering, stealing, or
destroying data. Malware is one instance of a cyberattack [1]. Malware is a set of instructions or
code developed to harm a user, computer, business, or computer system. The word “malware” covers a
wide range of attacks, such as scareware, viruses, Trojan horses, rogue software, spyware, adware,
wipers, and ransomware. Malicious software is a piece of code that is executed without the user’s
knowledge or permission [2]. Malware detection techniques evaluate collected data and are trained
to identify whether a specific piece of software or network link poses a security issue. For
example, an ML method can reveal the principles underlying the patterns it detects [3]. Methods
trained with an ML approach improve their predictive capability by using feedback on how well they
performed prior tasks to make modifications [4]. Malware classification is the process of
assigning a malware sample to a particular malware family. Malware within a family shares features
that can be used to create signatures for detection and classification [5]. Signatures are
considered static or dynamic depending on how they are extracted. A major cause of the high volume
of malware instances is malware developers’ widespread use of obfuscation, which means that
malicious files from the same malware family (for example, with similar code and a common origin)
are incessantly adapted and obfuscated. Consequently, generalized ML-based malware analysis is
regarded as a practical solution that performs well on unseen samples [6]. In this context, both
dynamic and static analysis can be employed when training malware detection and classification
models.

Department of Computer Science, College of Computer Engineering and Sciences, Prince Sattam bin Abdulaziz
University, 16273 Al-Kharj, Saudi Arabia. email: [email protected]

Scientific Reports | (2024) 14:25383 | https://doi.org/10.1038/s41598-024-76770-7


Content courtesy of Springer Nature, terms of use apply. Rights reserved

Static techniques typically analyze the malware’s code (machine or assembly) without executing it
[7], whereas dynamic techniques observe the malware’s behaviour during its execution. Both
categories of analysis have their disadvantages [8]. For instance, static methods can pinpoint a
vulnerability at its location in the code, although dynamic techniques may perform this function
better. On the other hand, the benefit of static analysis is that malware can be identified before
its execution, while dynamic methods permit regaining control of affected systems, which static
methods cannot [9]. Malware classification is a significant part of malware analysis because
distinguishing different types of malware is vital to knowing how they affect personal computers,
the level of risk they pose, and how to defend against them. Once malware is recognized, a
classification method can assign it to the most appropriate malware family [10].
This study presents a Global Whale Optimization Algorithm with Neutrosophic Logic for Software
Malware Detection and Classification (GWOANL-SMDC) technique. First, the GWOANL-SMDC technique
employs the Neutrosophic Cognitive Maps (NCM) model for the feature selection (FS) process. It
then uses a convolutional long short-term memory (ConvLSTM) approach for software malware
detection. Finally, GWOA-based parameter tuning is performed to improve the performance of the
ConvLSTM model. The experimental outcomes of the GWOANL-SMDC methodology are examined using a
malware dataset.

Literature works
Madhloom et al. [11] developed an innovative packet-filter firewall system that overcomes the
restrictions of existing FPN-based filter techniques. The main contribution is the use of SNPNs as
a tool for designing discrete event structures in the firewall packet-filter domain, which can be
represented by inexact knowledge. Yasser et al. [12] presented a robust and intelligent analytical
tool for automatically recognizing COVID-19 using readily available digital chest X-rays (CXR).
The method is a hybrid architecture that combines ML and neutrosophic techniques (NTs).
Classification features are extracted from X-ray images using principal component analysis (PCA)
and morphological features (MFs). In [13], a hybrid technique combining intuitionistic fuzzy sets
(IFS) and rough set theory is developed. This classification model draws on the benefits of both
rough sets and IFS to handle indiscernibility, vagueness, and intrinsic uncertainty in the
database. The method categorizes data samples in a form that can be expressed in natural language.
Rahman et al. [14] aimed to develop a new notion of parameterization of fuzzy sets in the
hypersoft set setting with the indeterminate components of neutrosophic sets and IFS.
Kadali et al. [15] introduced a game theory model, an analytical technique to evaluate
individuals’ diverse criminal behaviour maps. Based on neutrosophic logic (NL) analysis, the
game-theoretic model separates identified individual crimes from randomized crimes using the
collected randomization clusters. The method assesses the intra- or inter-cluster correlation
coefficient (ICCC) on criminal data (uncertain and certain) to determine the sizes of crime
instances. In [16], an innovative approach for classifying BC using NTs and ML methods, called the
BC Classification Strategy (BC2S), is presented; it contains two stages. The data preprocessing
stage aims to (1) extract features, (2) choose the informative features using an FS technique
named Efficient ACO (EACO), and (3) transfer the chosen features from the traditional domain into
the neutrosophic domain using NTs. The classification stage employs a deep neural network (DNN).
Jennifer and Sharmila [17] applied NTs to categorize image content into true (T), false (F), and
indeterminacy (I) set memberships. First, the images are preprocessed with alpha-mean and
beta-enhancement functions to decrease the indeterminacy and enrich the image components across
the range of lung opacity for determining the categories. Subsequently, the NT-enhanced images are
passed to diverse DL methods such as ConvLSTM, VGG-16, and ResNet-50 for classification. Alomari
et al. [18] present a high-performance malware detection system utilizing DL and feature
selection. Two malware datasets are preprocessed, and correlation-based feature selection creates
various feature-selected datasets.
Şahin et al. [19] introduce a novel Android malware detection system employing filter-based
feature selection techniques for static analysis with ML. It uses permissions from application
files as features and applies eight dimension-reduction models: four are tailored for Android
malware detection, while the other four are adapted from text classification. Akhiat et al. [20]
propose an effective ensemble feature selection for intrusion detection systems (IDS-EFS) to
choose the best-performing feature subset for attack detection. Ngo et al. [21] compare two
feature reduction methods. Feature selection usually yields improved detection performance and
faster processing as the feature count increases, while feature extraction is more reliable with
fewer features and less sensitive to changes in feature count. Varzaneh and Hosseini [22] present
an enhanced equilibrium optimization method called Levy-opposition-equilibrium optimization (LOEO)
for feature selection in network IDSs. By integrating opposition-based learning to improve
population diversity and the Levy flight method to avoid local optima, the binary version, BLOEO,
chooses the most informative features from high-dimensional data. Li et al. [23] compare feature
extraction and selection for IoT network intrusion detection. Feature extraction generally
performs better with fewer features and is less sensitive to changes. Eljialy, Uddin, and Ahmad
[24] introduce a multi-step feature selection process followed by classification. It uses various
feature selection methods to identify high-scoring features for anomaly detection, creating a
candidate dataset. Multiple classification algorithms are then applied to this dataset to develop
the models.
An innovative packet-filter firewall system utilizing SNPNs addresses the limitations of existing
FPN models but may face scalability difficulties against evolving threats. A hybrid architecture
for COVID-19 recognition integrates ML and neutrosophic methods, yet its efficiency relies heavily
on the quality of available X-ray data. A classification model that integrates IFS and rough set
theory encounters difficulties with high-dimensional data, potentially resulting in computational
complexity. The parameterization of fuzzy sets in hypersoft contexts introduces interpretative
difficulties. Meanwhile, an Android malware detection system dependent on specific permissions
might overlook critical detection features. Lastly, while the IDS-EFS technique improves feature
selection, it risks losing crucial data, and comparisons of feature reduction methodologies may
not adequately account for dataset discrepancies. Current malware detection and intrusion
detection methodologies often concentrate on specific feature selection or extraction models
without adequately addressing the dynamic behaviour of evolving threats. Furthermore, there is a
lack of comprehensive studies comparing the efficiency of these techniques across various
datasets, specifically in real-world scenarios. This gap emphasizes the need for more robust,
adaptable models that incorporate diverse feature selection strategies to improve detection
performance.

The proposed method


This study develops a novel GWOANL-SMDC technique. The technique secures the software via the Android
malware recognition process. To accomplish that, the GWOANL-SMDC approach involves three different
procedures: NCM-based FS, ConvLSTM-based classification, and GWOA-based parameter tuning. Figure 1
demonstrates the entire flow of the GWOANL-SMDC method.

Feature selection using NCM


In the first stage, the GWOANL-SMDC technique employs NCM for the FS process [25]. The NCM model
is chosen because it can efficiently handle uncertainty and imprecision in data. Unlike
conventional methods, the NCM model incorporates both qualitative and quantitative data, allowing
for a more comprehensive understanding of complex feature relationships. This methodology performs
well in scenarios where data may be incomplete or ambiguous, making it appropriate for malware
detection tasks. Furthermore, the NCM model facilitates the visualization of feature
interdependencies, assisting in detecting key influences on the target variable. Its flexibility
and adaptability to diverse contexts further strengthen its merit over other feature selection
methodologies, promoting enhanced performance and interpretability of the model. Figure 2
illustrates the NCM model.
NL integrates paraconsistent logic, intuitionistic logic, three-valued logic, and fuzzy logic.
Here, the logical variables T, I, and F represent the degrees of truth, indeterminacy, and
falsehood. These variables can take the form of single finite elements, subsets, intervals, finite
or infinite sets, real sub-unitary subsets, continuous or discrete sets, or unions and
intersections of these. Because knowledge is incomplete, NL attempts to capture the imprecision
arising from observers’ vagueness or uncertainty, which makes T, I, and F subsets. The edge values
of the NL map are drawn from the set {0, 1, I}, representing falsehood (0), truth (1), and
indeterminacy (I). NL is beneficial for malware detection because (i) it shows that specific
features that are useful for one system might be false in another system, and (ii) it represents
indeterminacy explicitly. The major difference between NL and intuitionistic fuzzy logic lies in
distinguishing relative and absolute truth. NL is used to transform logical statements into a 3D
neutrosophic space. The NCM is defined as follows:

(i): An NCM is a directed graph representing the causal relations among the features.
(ii): If each node is a fuzzy set, the nodes of the NCM are called fuzzy nodes.
(iii): Each node in the graph represents a feature. A weight is assigned to the directed edge
between nodes $C_i$ and $C_j$; the weight values lie within $\{-1, 0, 1, I\}$.
(iv): The neutrosophic adjacency matrix of the NCM is $N(E) = (e_{ij})$, where
$e_{ij} \in \{-1, 0, 1, I\}$ is the weight of the directed edge from $C_i$ to $C_j$.
(v): Let $A = (a_1, a_2, \ldots, a_n)$ be the instantaneous state vector, where
$a_i \in \{0, 1, I\}$: $a_i = 0$ if node $i$ is in the off state, $a_i = 1$ if it is in the on
state, and $a_i = I$ if its state is indeterminate.
(vi): Let $\overrightarrow{C_1C_2}, \overrightarrow{C_2C_3}, \ldots, \overrightarrow{C_iC_j}$ be
the edges of the NCM. If the NCM possesses a directed cycle, it is called cyclic; otherwise, it is
called acyclic.
(vii): When there is feedback in the NCM, i.e., the causal relations flow through a cycle, the
system is called a dynamic system.
(viii): Assume $\overrightarrow{C_1C_2}, \overrightarrow{C_2C_3}, \ldots, \overrightarrow{C_iC_j}$
form a cycle. When $C_i$ is switched ON and the causality flows through the edges of the cycle,
the dynamic system goes round and round until it attains an equilibrium state, which is known as
the hidden pattern.
(ix): If the equilibrium state of the dynamic system is a unique state vector, it is called a
fixed point.
(x): If the NCM settles down with a state vector repeating in the form
$A_1 \to A_2 \to \ldots \to A_i \to A_1$, the equilibrium is called an NCM limit cycle.
(xi): A finite number of NCMs can be combined as $N(E) = N(E_1) + N(E_2) + \ldots + N(E_n)$ to
obtain their joint effect as a combined NCM.
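
The state-vector definitions above can be illustrated with a small inference sketch. The paper does not spell out how a state vector is propagated or thresholded, so the following is only a minimal illustration under stated assumptions: neutrosophic values a + b·I are stored as pairs `(a, b)` with I·I = I, the update multiplies the state vector by N(E), and the thresholding rule (any indeterminate part gives I, a real part of at least 1 gives 1, otherwise 0) is one common convention rather than the author's confirmed choice. The three-feature map and all function names are hypothetical.

```python
# Neutrosophic values a + b*I are stored as pairs (a, b); I*I = I is assumed.
ZERO, ONE, IND = (0, 0), (1, 0), (0, 1)


def n_add(x, y):
    """Add two neutrosophic values component-wise."""
    return (x[0] + y[0], x[1] + y[1])


def n_mul(x, y):
    """(a + bI)(c + dI) = ac + (ad + bc + bd)I, using I*I = I."""
    a, b = x
    c, d = y
    return (a * c, a * d + b * c + b * d)


def threshold(x):
    """Assumed rule: any indeterminate part -> I, real part >= 1 -> 1, else 0."""
    a, b = x
    if b != 0:
        return IND
    return ONE if a >= 1 else ZERO


def step(state, adjacency, clamped):
    """One update A -> threshold(A * N(E)), keeping the clamped nodes ON."""
    n = len(state)
    nxt = []
    for j in range(n):
        total = ZERO
        for i in range(n):
            total = n_add(total, n_mul(state[i], adjacency[i][j]))
        nxt.append(threshold(total))
    for idx in clamped:
        nxt[idx] = ONE
    return nxt


def run_ncm(state, adjacency, clamped, max_iter=50):
    """Iterate until a state repeats; report a fixed point or a limit cycle."""
    seen = [state]
    for _ in range(max_iter):
        state = step(state, adjacency, clamped)
        if state in seen:
            cycle_len = len(seen) - seen.index(state)
            return state, ("fixed point" if cycle_len == 1 else "limit cycle")
        seen.append(state)
    return state, "not converged"


# Hypothetical 3-feature map: C1 -> C2 with weight 1, C2 -> C3 with weight I.
N_E = [
    [ZERO, ONE, ZERO],
    [ZERO, ZERO, IND],
    [ZERO, ZERO, ZERO],
]
final_state, kind = run_ncm([ONE, ZERO, ZERO], N_E, clamped=[0])
print(final_state, kind)
```

Switching C1 on drives C2 on and leaves C3 indeterminate, after which the state repeats, so the run ends at a unique equilibrium state. A two-node map whose nodes excite each other alternates between two states instead, which corresponds to the limit-cycle case.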

In the NCM approach, the objectives are combined into a single formula in which weights express
the significance of each objective [26]. In this work, a fitness function (FF) that joins both
objectives of FS is used, as given in Eq. (1).

$$\mathrm{Fitness}(X) = \alpha \cdot E(X) + \beta \cdot \left(1 - \frac{|R|}{|N|}\right) \tag{1}$$

Here, $\mathrm{Fitness}(X)$ denotes the fitness value of subset $X$, $E(X)$ is the classifier
error obtained using the features selected in $X$, $|R|$ and $|N|$ denote the number of selected
features and the number of original features in the data, and $\alpha$ and $\beta$ are the weights
of the classifier error and the reduction ratio, with $\alpha \in [0, 1]$ and
$\beta = 1 - \alpha$.

Fig. 1. Overall flow of GWOANL-SMDC approach.
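
As a rough illustration of how the FS fitness in Eq. (1) combines its two terms, the sketch below evaluates the formula exactly as printed. The function name `fs_fitness`, the default `alpha = 0.9`, and the example numbers are assumptions made for illustration; the paper does not report the weight it used.

```python
def fs_fitness(error_rate, n_selected, n_total, alpha=0.9):
    """Eq. (1) as printed: alpha * E(X) + beta * (1 - |R| / |N|), beta = 1 - alpha."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    if n_total <= 0 or not 0 <= n_selected <= n_total:
        raise ValueError("need 0 <= n_selected <= n_total and n_total > 0")
    beta = 1.0 - alpha
    return alpha * error_rate + beta * (1.0 - n_selected / n_total)


# Hypothetical subset: 5% classifier error using 20 of 100 original features.
print(fs_fitness(error_rate=0.05, n_selected=20, n_total=100))
```

With `alpha = 1.0` the score reduces to the classifier error alone, which makes the role of the reduction-ratio term easy to isolate when experimenting with the weights.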

ConvLSTM-based classification
The GWOANL-SMDC technique uses the ConvLSTM model [27] for software malware detection because of
its capability to capture both spatial and temporal data patterns. Unlike conventional methods
that may concentrate solely on one dimension, the ConvLSTM model combines convolutional layers
with LSTM units, making it efficient for analyzing sequences of images or binary representations
of malware. This dual capability improves the model’s performance in recognizing malware behaviour
over time, which is significant for accurate detection. Furthermore, the structure of the ConvLSTM
model allows for effective processing of complex data formats, resulting in enhanced
classification outcomes. Its robustness in handling varying input sizes and its adaptability to
different malware types justify its selection over conventional methods. Figure 3 portrays the
architecture of the ConvLSTM model.

Fig. 2. Architecture of NCM model.
The ConvLSTM unit combines convolution with the fully connected LSTM (FC-LSTM) by replacing the
weight multiplications with convolution filters. The ConvLSTM unit is summarized in Eqs. (2)–(6),
where the convolutions are performed at the weighted connections.

$$I_z = \sigma\left(W_{XI} * X_z + W_{HI} * H_{z-1} + W_{CI} \circ C_{z-1} + b_I\right) \tag{2}$$
$$F_z = \sigma\left(W_{XF} * X_z + W_{HF} * H_{z-1} + W_{CF} \circ C_{z-1} + b_F\right) \tag{3}$$
$$C_z = F_z \circ C_{z-1} + I_z \circ \tanh\left(W_{XC} * X_z + W_{HC} * H_{z-1} + b_C\right) \tag{4}$$
$$O_z = \sigma\left(W_{XO} * X_z + W_{HO} * H_{z-1} + W_{CO} \circ C_{z-1} + b_O\right) \tag{5}$$
$$H_z = O_z \circ \tanh\left(C_z\right) \tag{6}$$

Here, $O$, $I$, $C$, $F$, and $H$ denote the output gate, input gate, cell state, forget gate, and
hidden layer (HL) at timestep $z$, $\sigma$ is the activation function, $*$ is the convolution
operator, $\circ$ is the element-wise (Hadamard) product, and $W$ denotes the sets of weights
connecting the layers. The output gate regulates how much data is propagated from the prior
timestep, while the HL comprises the data passed on to the next timestep and layer. The peephole
connections enable the LSTM unit to access and propagate data from the cell state of the prior
timestep.
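
To make the gate equations concrete, the following is a minimal single-channel sketch of one ConvLSTM timestep in plain Python. It is an illustration under stated assumptions, not the author's implementation: the grids are tiny nested lists, the convolution is a zero-padded sliding-window product, the peephole terms use the previous cell state for all three gates, and the candidate cell input is passed through tanh as in the standard ConvLSTM formulation. All names (`convlstm_step`, `params`, and so on) are hypothetical.

```python
import math


def conv2d_same(x, k):
    """Zero-padded 'same' sliding-window product of grid x with odd-sized kernel k."""
    h, w = len(x), len(x[0])
    ph, pw = len(k) // 2, len(k[0]) // 2
    out = [[0.0] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            for i, k_row in enumerate(k):
                for j, k_val in enumerate(k_row):
                    rr, cc = r + i - ph, c + j - pw
                    if 0 <= rr < h and 0 <= cc < w:
                        out[r][c] += x[rr][cc] * k_val
    return out


def zipmap(fn, *grids):
    """Apply fn element-wise across equally sized 2D grids."""
    return [[fn(*vals) for vals in zip(*rows)] for rows in zip(*grids)]


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def gate(x, h_prev, c_prev, w_x, w_h, w_c, bias):
    """sigma(W_X * X_z + W_H * H_{z-1} + W_C o C_{z-1} + b), as in Eqs. (2), (3), (5)."""
    return zipmap(
        lambda a, b, p, c: sigmoid(a + b + p * c + bias),
        conv2d_same(x, w_x), conv2d_same(h_prev, w_h), w_c, c_prev,
    )


def convlstm_step(x, h_prev, c_prev, params):
    """One timestep: returns the new hidden state H_z and cell state C_z."""
    i_gate = gate(x, h_prev, c_prev, *params["I"])
    f_gate = gate(x, h_prev, c_prev, *params["F"])
    o_gate = gate(x, h_prev, c_prev, *params["O"])
    w_x, w_h, bias = params["C"]
    cand = zipmap(lambda a, b: math.tanh(a + b + bias),
                  conv2d_same(x, w_x), conv2d_same(h_prev, w_h))
    c_new = zipmap(lambda f, c, i, g: f * c + i * g, f_gate, c_prev, i_gate, cand)
    h_new = zipmap(lambda o, c: o * math.tanh(c), o_gate, c_new)
    return h_new, c_new


# Hypothetical 2x2 input with all-zero weights, so every gate equals 0.5.
zeros3 = [[0.0] * 3 for _ in range(3)]
zeros2 = [[0.0] * 2 for _ in range(2)]
params = {"I": (zeros3, zeros3, zeros2, 0.0), "F": (zeros3, zeros3, zeros2, 0.0),
          "O": (zeros3, zeros3, zeros2, 0.0), "C": (zeros3, zeros3, 0.0)}
h1, c1 = convlstm_step([[1.0, 2.0], [3.0, 4.0]], zeros2, [[1.0, 1.0], [1.0, 1.0]], params)
print(c1)  # each entry is 0.5 * 1.0 + 0.5 * tanh(0) = 0.5
```

The zero-weight case is chosen only because it can be checked by hand; in practice a deep learning framework's ConvLSTM layer with multi-channel kernels would be used.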
When working with images, the ConvLSTM network is more valuable than the FC-LSTM because each
ConvLSTM layer can propagate spatial features through time. The FC-LSTM can be regarded as a
special case of ConvLSTM in which the filter dimension equals that of the input image and a single
convolution is performed, such that every ConvLSTM unit shares the same parameters across
timesteps. The resolution of the feature maps generated from the input is determined by the
convolution filters of the input-to-hidden connections, while the filter sizes of the
hidden-to-hidden connections define how much data the ConvLSTM unit aggregates from the prior
timestep. The transition between timesteps in the ConvLSTM unit can be interpreted as motion
between frames.

Fig. 3. Architecture of ConvLSTM approach.

Hyperparameter tuning using GWOA


Finally, GWOA-based parameter tuning is used to improve the performance of the ConvLSTM model
[28]. The GWOA is chosen for its robust search capabilities and its effectiveness in exploring the
hyperparameter space. By mimicking whales’ social behaviour, the GWOA balances exploration and
exploitation, leading to parameter settings that enhance the model’s accuracy. This is
advantageous for complex models such as ConvLSTM, where various hyperparameters can substantially
impact performance. Moreover, the capability of the GWOA to escape local optima makes it more
reliable than conventional optimization techniques. Incorporating the GWOA accelerates the tuning
process and results in a more generalized model that can adapt to varying datasets, ultimately
improving the efficiency of malware detection tasks. Figure 4 demonstrates the workflow of the
GWOA model.
To enhance the global search ability and convergence speed of the traditional WOA, an enhanced
variant (GSWOA) is developed based on three strategies: adaptive weight, variable spiral position
update, and optimal neighbourhood perturbation. First, the adaptive weight strategy introduces an
adaptive inertia weight, which depends on the iteration count $t$, into the whale position update:

$$w(t) = 0.2 \cos\left(\frac{\pi}{2} \cdot \left(1 - \frac{t}{t_{\max}}\right)\right) \tag{7}$$

where $t$ is the current iteration number, $t_{\max}$ is the maximum number of iterations, and
$w(t)$ is the adaptive inertia weight, which takes values within [0, 1].
According to Eq. (7), the weight is small in the initial phase but changes rapidly; in the later
phase, as the iteration count grows, the weight is large but changes slowly, which improves the
convergence of the algorithm.
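
The behaviour described for Eq. (7) can be checked numerically with a few lines of Python. This is only a sketch: the function name `adaptive_weight` and the iteration budget of 100 are assumptions chosen for illustration.

```python
import math


def adaptive_weight(t, t_max):
    """Eq. (7): w(t) = 0.2 * cos(pi/2 * (1 - t / t_max))."""
    return 0.2 * math.cos(math.pi / 2 * (1 - t / t_max))


t_max = 100  # hypothetical iteration budget
for t in (0, 10, 50, 90, 100):
    print(t, round(adaptive_weight(t, t_max), 4))
```

Evaluated this way, the weight starts near zero and rises towards 0.2, with the increase between iterations 0 and 10 much larger than the increase between iterations 90 and 100.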
The position update formulas of the enhanced WOA are

$$X(t+1) = \begin{cases} w(t) \cdot X^{*}(t) - A \cdot \left|C \cdot X^{*}(t) - X(t)\right|, & p < 0.5 \\ w(t) \cdot X^{*}(t) + D \cdot e^{bl} \cos(2\pi l), & p \ge 0.5 \end{cases} \tag{8}$$
$$X(t+1) = w(t) \cdot X_{rand}(t) - A \cdot \left|C \cdot X_{rand}(t) - X(t)\right| \tag{9}$$


Fig. 4. Workflow of the GWOA model.

Second, the variable spiral position update strategy replaces the constant $b$, which defines the
spiral shape in the bubble-net attack phase, with a variable that changes dynamically with the
iteration count:

$$b = e^{5 \cdot \cos\left(\pi \cdot \left(1 - \frac{t}{t_{\max}}\right)\right)} \tag{10}$$

From Eq. (10), the spiral shape is larger in the initial stage, so the whale can search for the
optimum over a wider range and has a stronger global search capability. As the iteration count
grows, the spiral shape becomes smaller, and the whale searches a narrower range to improve the
accuracy of the optimum. The position update formula of the improved WOA is

$$X(t+1) = w(t) \cdot X^{*}(t) + b \cdot D \cdot e^{bl} \cos(2\pi l) \tag{11}$$
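
The update rules in Eqs. (8)–(11) can be sketched for a single whale as follows. Several pieces are assumptions rather than details given in the text: the coefficients `A` and `C`, the control parameter `a`, the distance `D = |X* - X|`, and the choice between the best and a random whale based on `|A| < 1` all follow the standard WOA, and the function names are hypothetical. The inertia weight `w` is passed in as a plain number.

```python
import math
import random


def spiral_constant(t, t_max):
    """Eq. (10): b = exp(5 * cos(pi * (1 - t / t_max)))."""
    return math.exp(5 * math.cos(math.pi * (1 - t / t_max)))


def encircle(x, ref, w, A, C):
    """Eqs. (8)/(9): w * ref - A * |C * ref - x|, element-wise."""
    return [w * r - A * abs(C * r - xi) for xi, r in zip(x, ref)]


def spiral(x, best, w, b, l):
    """Eq. (11): w * X* + b * D * exp(b * l) * cos(2 * pi * l), with D = |X* - X|."""
    factor = b * math.exp(b * l) * math.cos(2 * math.pi * l)
    return [w * s + factor * abs(s - xi) for xi, s in zip(x, best)]


def move_whale(x, best, x_rand, w, b, a, rng):
    """Pick one update rule for a whale, following the standard WOA branching."""
    A = 2 * a * rng.random() - a      # standard WOA coefficient (assumption)
    C = 2 * rng.random()              # standard WOA coefficient (assumption)
    if rng.random() < 0.5:            # p < 0.5: encircle the best or a random whale
        ref = best if abs(A) < 1 else x_rand
        return encircle(x, ref, w, A, C)
    return spiral(x, best, w, b, rng.uniform(-1, 1))  # p >= 0.5: spiral move


rng = random.Random(0)
new_x = move_whale([1.0, 2.0], [3.0, 2.0], [0.5, -1.0], w=0.1,
                   b=spiral_constant(50, 100), a=1.0, rng=rng)
print(new_x)
```

Keeping `encircle` and `spiral` as pure functions makes each formula easy to verify in isolation before wiring them into a full optimization loop.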

Lastly, the optimal neighbourhood perturbation strategy extends the search around the current best
position: after each whale position update, the neighbourhood of the best position is also
searched rather than being restricted to the existing best position. In this way, the search
efficiency and convergence speed of the algorithm can be improved. A random perturbation in the
neighbourhood of the current best position generates a new position as

$$\tilde{X}(t) = \begin{cases} X^{*}(t) + 0.5 \cdot rand_1 \cdot X^{*}(t), & rand_2 < 0.5 \\ X^{*}(t), & rand_2 \ge 0.5 \end{cases} \tag{12}$$

where $\tilde{X}(t)$ is the newly generated position, and $rand_1$ and $rand_2$ are uniform random
numbers between zero and one.
The new position is retained only if its fitness is better than that of the original best
position:

$$X^{*}(t) = \begin{cases} \tilde{X}(t), & f\left(\tilde{X}(t)\right) < f\left(X^{*}(t)\right) \\ X^{*}(t), & f\left(X^{*}(t)\right) \le f\left(\tilde{X}(t)\right) \end{cases} \tag{13}$$

Here, $f(x)$ denotes the fitness value at position $x$.
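
The perturbation and greedy selection in Eqs. (12) and (13) amount to a few lines of code. The sketch below passes the two random numbers in explicitly so that each branch can be checked by hand; the function names and the toy objective are hypothetical and stand in for the classifier-based fitness used in the paper.

```python
import random


def perturb_best(best, fitness, rand1, rand2):
    """Eqs. (12)-(13): perturb the best position, keep it only if fitness drops."""
    if rand2 < 0.5:
        candidate = [x + 0.5 * rand1 * x for x in best]
    else:
        candidate = list(best)
    return candidate if fitness(candidate) < fitness(best) else best


def toy_fitness(pos):
    """Hypothetical objective with its minimum at (2, 2)."""
    return sum((x - 2.0) ** 2 for x in pos)


rng = random.Random(0)
best = [1.0, 1.0]
for _ in range(5):
    best = perturb_best(best, toy_fitness, rng.random(), rng.random())
print(best, toy_fitness(best))
```

Because the candidate replaces the best position only when its fitness is strictly lower, repeated calls can never make the stored best position worse.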


The GWOA approach derives an FF to attain improved classifier performance. The FF returns a
positive value that indicates the quality of a candidate solution. Here, the classifier error
rate, which is to be minimized, is taken as the FF, as given in Eq. (14).

$$fitness(x_i) = ClassifierErrorRate(x_i) = \frac{\text{No. of misclassified instances}}{\text{Total no. of instances}} \times 100 \tag{14}$$
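
Eq. (14) is simply the percentage of wrong predictions. A minimal sketch, with a hypothetical function name and made-up labels, is:

```python
def error_rate_fitness(y_true, y_pred):
    """Eq. (14): percentage of misclassified instances (lower is better)."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("label lists must be non-empty and of equal length")
    wrong = sum(1 for t, p in zip(y_true, y_pred) if t != p)
    return wrong / len(y_true) * 100


# Hypothetical labels: 1 = malware, 0 = benign; one of four predictions is wrong.
print(error_rate_fitness([0, 1, 1, 0], [0, 1, 0, 0]))  # 25.0
```

In a tuning loop, each candidate hyperparameter setting would be scored by training the classifier and passing its validation predictions to a function of this kind.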

Performance validation
The simulation outcomes of the GWOANL-SMDC approach are assessed on the malware database [29]. It
contains 7500 instances in two classes, as shown in Table 1.
Figure 5 shows the confusion matrices obtained by the GWOANL-SMDC approach under various epochs.
The results indicate that the GWOANL-SMDC model effectively recognizes the benign and malware
instances.
The malware detection outcomes of the GWOANL-SMDC approach are provided in Table 2 and Fig. 6. The
results show that the GWOANL-SMDC approach performs effectively on both classes. With 500 epochs,
the GWOANL-SMDC methodology reaches average accuracy ($accu_y$), precision ($prec_n$), recall
($reca_l$), F-score ($F_{score}$), and Matthews correlation coefficient (MCC) of 99.09%, 98.79%,
99.19%, 98.98%, and 97.98%, respectively. With 1000 epochs, it obtains average $accu_y$, $prec_n$,
$reca_l$, $F_{score}$, and MCC of 99.17%, 98.80%, 99.17%, 98.98%, and 97.97%, respectively. With
2000 epochs, it reaches average $accu_y$, $prec_n$, $reca_l$, $F_{score}$, and MCC of 98.84%,
98.39%, 98.84%, 98.61%, and 97.23%, respectively.
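
For readers who want to see how such measures are derived from a confusion matrix, the sketch below computes them for each class and then averages the two classes. The counts are hypothetical and are not taken from the paper's confusion matrices; the function name `binary_metrics` is likewise an assumption, and the formulas are the standard definitions.

```python
import math


def binary_metrics(tp, fp, fn, tn):
    """Accuracy, precision, recall, F-score, and MCC from confusion-matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return {
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
        "precision": precision,
        "recall": recall,
        "f_score": 2 * precision * recall / (precision + recall),
        "mcc": (tp * tn - fp * fn)
        / math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)),
    }


# Hypothetical counts (not the paper's): 100 benign and 100 malware samples.
malware = binary_metrics(tp=90, fp=5, fn=10, tn=95)   # malware as positive class
benign = binary_metrics(tp=95, fp=10, fn=5, tn=90)    # benign as positive class
macro = {k: (malware[k] + benign[k]) / 2 for k in malware}
for name, value in macro.items():
    print(f"{name}: {100 * value:.2f}%")
```

The MCC comes out the same whichever class is treated as positive, which matches the identical per-class MCC values listed in Table 2. The sketch does not guard against empty rows or columns, where the ratios would be undefined.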
Figure 7 presents the training accuracy (TRAAC) and validation accuracy (VALAC) of the GWOANL-SMDC
approach at 1000 epochs. The curves give insight into the learning behaviour and generalization
ability of the approach over the epochs. Notably, both TRAAC and VALAC improve steadily as the
number of epochs increases, which indicates that the GWOANL-SMDC technique adapts well to the
patterns in both the training (TRA) and validation data. The rising trend in VALAC reflects the
technique’s ability to fit the TRA data while still classifying unseen data correctly, indicating
strong generalization capability.
Figure 8 illustrates the training loss (TRALS) and validation loss (VALLS) curves of the
GWOANL-SMDC technique at 1000 epochs. The progressive decrease in TRALS shows that the technique
optimizes its weights and reduces the classification error on both datasets. The GWOANL-SMDC
approach continually adjusts its parameters to diminish the difference between the predicted and
actual TRA class labels, which highlights its ability to capture patterns from the data.
The PR curve in Fig. 9 shows that the GWOANL-SMDC approach progressively attains improved PR
values for both classes at 1000 epochs, demonstrating its proficiency in detecting the two
classes.

Class                      No. of Samples
Benign                     5000
Malware                    2500
Total Number of Samples    7500

Table 1. Detailed database.

Fig. 5. Confusion matrices of GWOANL-SMDC technique (a-d) Epochs 500–2000.
Besides, Fig. 10 shows the ROC curves obtained by the GWOANL-SMDC approach for the two class
labels at 1000 epochs. The curves illustrate the tradeoff between the true positive rate (TPR) and
the false positive rate (FPR) at various detection thresholds. The outcome demonstrates the strong
classification results of the GWOANL-SMDC model on both classes.
The comparative malware detection outcome of the GWOANL-SMDC technique is given in Table 330.
Concerning accuy , the GWOANL-SMDC technique provides an improved accuy of 99.17%, but the J48,
RF, DT, SMO, logistic, and AAMD-OELAC approaches have obtained lesser accuy values of 96.86%, 97.87%,
94.68%, 96.47%, 96.38%, and 98.97%, correspondingly. Additionally, based on Fscore , the GWOANL-SMDC
methodology provides a higher Fscore of 98.98%. At the same time, the J48, RF, DT, SMO, logistic, and AAMD-

Scientific Reports | (2024) 14:25383 | https://doi.org/10.1038/s41598-024-76770-7 9


Content courtesy of Springer Nature, terms of use apply. Rights reserved
www.nature.com/scientificreports/

Classes Accuy Precn Recal FScore MCC


Epoch − 500
Benign 99.09 99.74 98.90 99.32 97.98
Malware 99.09 97.84 99.48 98.65 97.98
Average 99.09 98.79 99.19 98.98 97.98
Epoch − 1000
Benign 98.94 99.70 98.94 99.32 97.97
Malware 99.40 97.91 99.40 98.65 97.97
Average 99.17 98.80 99.17 98.98 97.97
Epoch − 1500
Benign 98.86 99.68 98.86 99.27 97.83
Malware 99.36 97.76 99.36 98.55 97.83
Average 99.11 98.72 99.11 98.91 97.83
Epoch − 2000
Benign 98.60 99.54 98.60 99.07 97.23
Malware 99.08 97.25 99.08 98.16 97.23
Average 98.84 98.39 98.84 98.61 97.23

Table 2. Malware detection result of GWOANL-SMDC technique under various epochs.

Fig. 6. Average result of GWOANL-SMDC technique under various epochs.


Fig. 7. Accu_y curve of GWOANL-SMDC technique under 1000 epochs.

Fig. 8. Loss curve of GWOANL-SMDC technique under 1000 epochs.


Fig. 9. PR curve of GWOANL-SMDC technique at 1000 epochs.

OELAC approaches obtain lower F_score values of 97.29%, 96.67%, 97.85%, 96.31%, 96.63%, and
98.44%, respectively.
Table 4 and Fig. 11 give the comparative time cost (TC) analysis of the GWOANL-SMDC methodology.
Based on TC, the GWOANL-SMDC methodology achieves the lowest TC of 0.52 s, while the J48, RF, DT, SMO,
logistic, and AAMD-OELAC models incur higher TCs of 1.94 s, 2.42 s, 8.98 s, 10.24 s, 2.65 s, and 1.61 s,
respectively.
These results indicate the enhanced detection outcomes of the GWOANL-SMDC approach.
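To put the TC figures in relative terms, the short sketch below divides each baseline's time cost by that of GWOANL-SMDC. The numbers are copied from Table 4; the dictionary layout and variable names are assumptions for illustration, and the text does not state which phase (training or inference) the TC covers.

```python
# Time costs in seconds, as reported in Table 4.
time_cost_s = {
    "J48": 1.94,
    "Random Forest": 2.42,
    "Decision Table": 8.98,
    "SMO": 10.24,
    "Logistic": 2.65,
    "AAMD-OELAC": 1.61,
    "GWOANL-SMDC": 0.52,
}

proposed = time_cost_s["GWOANL-SMDC"]

# Ratio of each baseline's TC to the proposed method's TC (values above 1 mean slower).
speedup = {
    name: round(tc / proposed, 2)
    for name, tc in time_cost_s.items()
    if name != "GWOANL-SMDC"
}

for name, ratio in sorted(speedup.items(), key=lambda item: item[1]):
    print(f"{name:15s} {ratio:6.2f}x the TC of GWOANL-SMDC")
```

By this measure the closest competitor on time cost is AAMD-OELAC and the slowest is SMO.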

Conclusion
In this study, a novel GWOANL-SMDC methodology is developed. The GWOANL-SMDC methodology
secures software through an Android malware recognition process. To accomplish that, the GWOANL-SMDC
technique encompasses three processes: NCM-based FS, ConvLSTM-based classification, and
GWOA-based parameter tuning. Initially, the GWOANL-SMDC methodology employs NCM for the FS
process. For software malware detection, the GWOANL-SMDC technique uses the ConvLSTM model. Finally,
the GWOA-based parameter tuning procedure is used to boost the performance of the ConvLSTM model. The
GWOANL-SMDC methodology was assessed experimentally on the malware dataset. The results indicate that
the GWOANL-SMDC technique improves software malware detection over the compared approaches.
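As an illustration of the kind of search used in the parameter tuning stage, the following is a minimal sketch of the standard whale optimization algorithm (WOA), not the global variant (GWOA) used in this study. It assumes a generic objective to minimize; here a toy sphere function stands in for the validation error of a ConvLSTM configuration. The function name `woa_minimize` and all settings (population size, iteration count, bounds, spiral constant) are hypothetical.

```python
import numpy as np

def woa_minimize(objective, lower, upper, n_whales=30, n_iter=200, b=1.0, seed=0):
    """Standard WOA with scalar coefficients; returns (best_x, best_f, history)."""
    rng = np.random.default_rng(seed)
    lower = np.asarray(lower, dtype=float)
    upper = np.asarray(upper, dtype=float)
    whales = rng.uniform(lower, upper, size=(n_whales, lower.size))
    fitness = np.array([objective(w) for w in whales])
    best_idx = int(np.argmin(fitness))
    best_x, best_f = whales[best_idx].copy(), float(fitness[best_idx])
    history = [best_f]  # best objective value found so far, per iteration

    for t in range(n_iter):
        a = 2.0 * (1.0 - t / n_iter)  # decreases linearly from 2 toward 0
        for i in range(n_whales):
            A = 2.0 * a * rng.random() - a
            C = 2.0 * rng.random()
            if rng.random() < 0.5:
                if abs(A) < 1.0:
                    target = best_x  # exploitation: encircle the best solution
                else:
                    target = whales[rng.integers(n_whales)]  # exploration: random whale
                D = np.abs(C * target - whales[i])
                new_x = target - A * D
            else:
                # Spiral (bubble-net) move around the best solution.
                l = rng.uniform(-1.0, 1.0)
                D = np.abs(best_x - whales[i])
                new_x = D * np.exp(b * l) * np.cos(2.0 * np.pi * l) + best_x
            whales[i] = np.clip(new_x, lower, upper)  # keep candidates inside the bounds
            f = float(objective(whales[i]))
            if f < best_f:  # the best is updated immediately (asynchronous variant)
                best_x, best_f = whales[i].copy(), f
        history.append(best_f)
    return best_x, best_f, history

def sphere(x):
    """Toy objective (assumption): a stand-in for a model's validation error."""
    return float(np.sum(x ** 2))

best_x, best_f, history = woa_minimize(sphere, lower=[-5.0] * 5, upper=[5.0] * 5)
print("best value:", best_f)
```

In a tuning setting, each whale position would encode a candidate hyperparameter vector (for example, learning rate and number of units), and the objective would train and validate the classifier for that candidate.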


Fig. 10. ROC curve of GWOANL-SMDC technique under 1000 epochs.

Technique        Accu_y   Prec_n   Reca_l   F_score   FPR

J48              96.86    95.25    97.47    97.29     0.0307
Random Forest    97.87    96.67    97.29    96.67     0.0223
Decision Table   94.68    91.68    97.60    97.85     0.0427
SMO              96.47    94.66    97.13    96.31     0.0312
Logistic         96.38    94.47    97.81    96.63     0.0256
AAMD-OELAC       98.97    98.55    98.97    98.44     0.0218
GWOANL-SMDC      99.17    98.80    99.17    98.98     0.0209

Table 3. Comparative outcome of GWOANL-SMDC technique with recent models.

Technique        Time cost (s)

J48              1.94
Random Forest    2.42
Decision Table   8.98
SMO              10.24
Logistic         2.65
AAMD-OELAC       1.61
GWOANL-SMDC      0.52

Table 4. TC analysis of GWOANL-SMDC methodology with recent methods.


Fig. 11. TC outcome of GWOANL-SMDC methodology with other methods.

Data availability
The datasets used and analyzed during the current study are available from the corresponding author upon
reasonable request.

Received: 23 May 2024; Accepted: 16 October 2024

References
1. Zhao, Y. et al. On the impact of sample duplication in machine-learning-based android malware detection. ACM Trans. Softw. Eng.
Methodol. 30(3), 1–38 (2021).
2. Bayazit, E. C., Sahingoz, O. K. & Dogan, B. Deep learning based malware detection for android systems: A comparative analysis.
Tehnički vjesnik 30(3), 787–796 (2023).
3. Rathore, H., Nandanwar, A., Sahay, S. K. & Sewak, M. Adversarial superiority in android malware detection: Lessons from
reinforcement learning based evasion attacks and defenses. Forens. Sci. Int. Digit. Invest. 44, 301511 (2023).
4. Ibrahim, M., Issa, B. & Jasser, M. B. A method for automatic android malware detection based on static analysis and deep learning.
IEEE Access 10, 117334–117352 (2022).
5. Hammood, L., Doğru, İA. & Kılıç, K. Machine learning-based adaptive genetic algorithm for android malware detection in auto-
driving vehicles. Appl. Sci. 13(9), 5403 (2023).
6. Bhat, P. & Dutta, K. A multi-tiered feature selection model for Android malware detection based on feature discrimination and
information gain. J. King Saud. Univ. Comput. Inf. Sci. 34(10), 9464–9477 (2022).
7. Wang, D., Chen, T., Zhang, Z., & Zhang, N. A survey of Android malware detection based on deep learning, In Proceedings of the
International Conference on Machine Learning and Cyber Security. Cham, Switzerland: Springer, 2023, pp. 228–242.
8. Zhu, H.-J., Gu, W., Wang, L.-M., Xu, Z.-C. & Sheng, V. S. Android malware detection based on multi-head squeeze-and-excitation
residual network. Expert Syst. Appl. 212, 118705 (2023).
9. Wang, H., Zhang, W. & He, H. You are what the permissions told me! android malware detection based on hybrid tactics. J. Inf.
Secur. Appl. 66, 103159 (2022).
10. Albakri, A., Alhayan, F., Alturki, N., Ahamed, S. & Shamsudheen, S. Metaheuristics with deep learning model for cybersecurity
and android malware detection and classification. Appl. Sci. 13(4), 2172 (2023).
11. Madhloom, J. K., Noori, Z. H., Ebis, S. K., Hassen, O. A. & Darwish, S. M. An information security engineering framework for
modeling packet filtering firewall using neutrosophic petri nets. Computers 12(10), 202 (2023).


12. Yasser, I., Abd El-Khalek, A. A., Twakol, A., Abo-Elsoud, M. E., Salama, A. A. & Khalifa, F. A hybrid automated intelligent
COVID-19 classification system based on neutrosophic logic and machine learning techniques using chest X-ray images. In
Advances in Data Science and Intelligent Data Communication Technologies for COVID-19: Innovative Solutions Against COVID-19,
119–137 (2022).
13. Mazarbhuiya, F. A. & Shenify, M. An intuitionistic fuzzy-rough set-based classification for anomaly detection. Appl. Sci. 13(9),
5578 (2023).
14. Rahman, A. U., Saeed, M., Alburaikan, A. & Khalifa, H. A. E. W. An intelligent multiattribute decision-support framework based on
parameterization of neutrosophic hypersoft set. Comput. Intell. Neurosci. (2022).
15. Kadali, D. K., Mohan, R. N. V. & Naik, M. C. Enhancing crime cluster reliability using neutrosophic logic and a three-stage model. J.
Eng. Sci. Technol. Rev. 16(4) (2023).
16. Shaban, W.M. Classification of breast cancer using neutrosophic techniques and deep neural network. (2021).
17. Jennifer, J. S. & Sharmila, T. S. A neutrosophic set approach on chest x-rays for automatic lung infection detection. Inf. Technol.
Control 52(1), 37–52 (2023).
18. Alomari, E. S. et al. Malware detection using deep learning and correlation-based feature selection. Symmetry 15(1), 123 (2023).
19. Şahin, D. Ö., Kural, O. E., Akleylek, S. & Kılıç, E. A novel Android malware detection system: adaption of filter-based
feature selection methods. J. Amb. Intell. Human. Comput. 1–15 (2023).
20. Akhiat, Y., Touchanti, K., Zinedine, A. & Chahhou, M. IDS-EFS: Ensemble feature selection-based method for intrusion detection
system. Multimed. Tools Appl. 83(5), 12917–12937 (2024).
21. Ngo, V. D., Vuong, T. C., Van Luong, T. & Tran, H. Machine learning-based intrusion detection: Feature selection versus feature
extraction. Cluster Comput. 27(3), 2365–2379 (2024).
22. Varzaneh, Z. A. & Hosseini, S. An improved equilibrium optimization algorithm for feature selection problem in network intrusion
detection. Sci. Rep. 14(1), 18696 (2024).
23. Li, J., Othman, M. S., Chen, H. & Yusuf, L. M. Optimizing IoT intrusion detection system: Feature selection versus feature
extraction in machine learning. J. Big Data 11(1), 36 (2024).
24. Eljialy, A. E. M., Uddin, M. Y. & Ahmad, S. Novel framework for an intrusion detection system using multiple feature selection
methods based on deep learning. Tsinghua Sci. Technol. 29(4), 948–958 (2024).
25. Shadrach, F. D. & Kandasamy, G. Neutrosophic cognitive maps (NCM) based feature selection approach for early leaf disease
diagnosis. J. Amb. Intell. Human. Comput. 12, 5627–5638 (2021).
26. Mafarja, M., Thaher, T., Al-Betar, M. A., Too, J., Awadallah, M. A., Abu Doush, I. & Turabieh, H. Classification framework for faulty-
software using enhanced exploratory whale optimizer-based feature selection scheme and random forest ensemble learning. Appl.
Intell. 1–43 (2023).
27. Medel, J. R. & Savakis, A. Anomaly detection in video using predictive convolutional long short-term memory networks. arXiv
preprint arXiv:1612.00390 (2016).
28. Hu, Q. et al. Time-frequency fusion features-based GSWOA-KELM model for gear fault diagnosis. Lubricants 12(1), 10 (2024).
29. Andro-AutoPsy. Accessed: Feb. 12, 2023. [Online]. Available: https://ocslab.hksecurity.net/andro-autopsy
30. Alamro, H., Mtouaa, W., Aljameel, S., Salama, A.S., Hamza, M.A. & Othman, A.Y. Automated android malware detection using
optimal ensemble learning approach for cybersecurity. IEEE Access. (2023).

Author contributions
Dr. M.A. made all contributions to this manuscript.

Funding
This study is supported by Prince Sattam bin Abdulaziz University Project Number (PSAU/2024/R/1445).

Declarations

Competing interests
The authors declare no competing interests.

Additional information
Correspondence and requests for materials should be addressed to M.A.
Reprints and permissions information is available at www.nature.com/reprints.
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and
institutional affiliations.
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives
4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in
any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide
a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have
permission under this licence to share adapted material derived from this article or parts of it. The images or
other third party material in this article are included in the article’s Creative Commons licence, unless indicated
otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and
your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain
permission directly from the copyright holder. To view a copy of this licence, visit
http://creativecommons.org/licenses/by-nc-nd/4.0/.

© The Author(s) 2024
