1-Malicious Software Identification Based on Deep Learning Algorithms and API Feature Extraction
https://doi.org/10.1186/s13635-025-00197-4
Information Security
Abstract
With the popularization of mobile Internet, the Android operating system has become the main target of malware
attacks because of its openness. Traditional malware detection methods face challenges in handling complex feature
representations, especially in utilizing the semantic information and call order of application programming interface
call sequences. Therefore, this study develops a deep learning method to identify malicious software by analyzing
the application programming interface calls and constructing heterogeneous graphs of Android applications. The
results showed that the proposed method achieved accuracies of 92.80% and 94.24% on the Drebin and AndroZoo
datasets, demonstrating excellent robustness and generalization ability. The ablation experiment showed
that the accuracy of the complete model was 94.71%, verifying the key role of each part of the method. In comparison
with existing methods, the proposed method led with an average accuracy of 94.27%, while maintaining
detection time within 5–10 s, demonstrating high efficiency and practicality. This study contributes to the in-depth
exploration of semantic information and behavioral patterns of application programming interface call sequences.
The efficient malware identification method developed can cope with the constantly evolving malware threats.
Keywords Malicious software detection, Deep learning algorithms, Application programming interface, Feature
extraction techniques, Relational Graph Convolutional Network
© The Author(s) 2025. Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0
International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long
as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if
you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or
parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated
otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not
permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To
view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
Sun EURASIP Journal on Information Security (2025) 2025:10 Page 2 of 15
meet the need for rapid detection of large quantities of malicious software [4]. In contrast, static analysis directly builds detection models based on the code structure and call characteristics of malicious software, which is efficient and applicable. However, existing static analysis methods face many challenges when dealing with complex feature representations. For example, traditional syntax-based feature representations struggle to fully characterize the behavioral patterns of malicious software. In addition, most detection methods do not fully utilize the semantic information and call order of Application Programming Interface (API) call sequences, making it difficult to effectively capture deep behavioral features [5]. Existing methods still have significant room for improvement in terms of data dependency and model robustness. Given this, this study proposes an MSD method based on DL algorithms and API feature extraction. This method extracts API call sequences and entry functions from Android Package (APK) files and uses them to construct heterogeneous graph features that carry semantic and positional information. It then uses heterogeneous Graph Convolutional Networks (GCN) to aggregate semantic information and identifies malicious software through classifiers. The research objectives are to improve the semantic understanding of API call sequences, optimize feature expression, and enhance the generalization ability of malware detection. Based on the limitations of existing static analysis methods in feature representation, API call sequence utilization, and model robustness, a heterogeneous GCN-based malware detection model is proposed for Android malware static detection tasks. In addition, the heterogeneous graph with embedded semantic and location information is constructed by extracting the API call sequence and entry function to improve detection accuracy.
The contribution of this research is to propose an Android malware detection method based on heterogeneous GCNs, which can improve the recognition of complex malicious behaviors by modeling the semantic information and call order of API call sequences. Location coding is introduced to enhance API call relationship modeling, capture the context features of API call order, and improve the robustness and generalization ability of the detection model. Experiments on the Drebin and AndroZoo datasets show that the proposed method is superior to existing malware detection methods in terms of detection accuracy, generalization ability, and computational efficiency.
The rest of this paper is organized as follows. The second section reviews the research progress of malware detection. The third section introduces the proposed detection methods in detail, including API feature extraction, heterogeneous graph construction, and model architecture design. The fourth section describes the experimental setup, evaluation index, and analysis of experimental results. The fifth section discusses the limitations of the research and the direction of future work. The sixth section summarizes the main conclusions of the study.
2 Related works
In recent years, the complexity and diversity of malicious software have significantly increased, especially malicious attacks targeting mobile platforms and smart devices, posing a huge challenge to traditional signature-based detection methods. In response to this issue, numerous studies have proposed innovative methods based on Machine Learning (ML) and DL to enhance detection and protection capabilities. Ban et al. investigated how new malware variants evade ML and DL-based malware classifiers through adversarial examples and proposed a method to bypass classifiers using black-box attacks. It used various perturbation techniques to generate adversarial examples that were executable and retained malicious behavior. This method achieved evasion rates of 65.6% and 99% for multiple detectors [6]. Mercaldo et al. proposed a method for detecting Android malware and identifying its family through audio signal processing technology. This method converted the executable file of the application into an audio file, extracted numerical features from it, and constructed multiple ML models. This method achieved an accuracy of 0.952 in MSD and 0.922 in family identification [7]. Tuncer et al. analyzed the threat of network attacks to information security and proposed a malicious software classification method based on image processing technology. It utilized local binary patterns, singular value decomposition, and local ternary pattern networks for feature extraction, and combined principal component analysis and linear discriminant analysis to optimize classification. The classification success rate of this method was 88.08% [8]. Ali et al. proposed a multi-task DL method based on Long Short-Term Memory (LSTM) models, which could simultaneously detect whether traffic is benign or malicious and identify malicious software types. It utilized a large-scale dataset generated by 18 Internet of Things (IoT) devices, extracted features through time series analysis, and optimized model performance. This method achieved the highest accuracy of 95.83% in multi-task classification [9].
The widespread application of IoT devices and the continuous upgrading of network threats have led to in-depth research on MSD and defense. Multiple studies focus on exploring optimization algorithms, feature extraction, and multi-task learning to address
the complex behavior and dynamic changes of malicious software. Kaithal et al. proposed a decision tree model based on African vulture optimization, aiming to improve the accuracy and optimization efficiency of MSD. It used static and dynamic detection methods to train the dataset and developed a prevention and detection system to address the optimization challenges of multidimensional problems. This model could effectively detect and predict malicious software activity [10]. Kim et al. proposed a detection method based on various ML models such as gradient boosting and Random Forest (RF) to address the threat of adversarial malware to smart devices and analyzed the characteristics of adversarial malware in the dataset. This method improved classification performance through data normalization and preprocessing and generated two new datasets. The prediction accuracy of the gradient boosting model reached 88% [11]. Habtor et al. proposed an ML-based malware assessment framework to address the growing trend of ransomware. This study integrated the data processing module and decision-making module and optimized the MSD process through methods such as grayscale images and Opcode n-grams. Based on multiple classifiers, this method had high accuracy in MSD and classification [12]. Jhansi K S et al. studied the role of API call features in Android MSD, proposed three population optimization methods to reduce feature space, and constructed a hybrid artificial neural classifier. By optimizing and retaining only 7 key features, a high classification accuracy could be achieved, providing an efficient solution for improving MSD [13].
Vaiyapuri et al. proposed a network security technology that is based on improved reptile search optimization and integrated DL. This technology was designed to address the network intrusion risks brought about by automation and intelligence in the industrial IoT environment. It also addressed the insufficient ability of traditional intrusion detection systems to detect complex attacks. Meanwhile, the improved gray wolf optimization algorithm was used for hyperparameter adjustment. As a result, the identification accuracy of network attacks was improved [14]. The IoT is under threat of botnet attacks, and the data set is complex. The cost of hardware is high, and memory is limited in the IoT environment. Manimaran et al. proposed a mixed principal component analysis and autoencoder approach to reduce the dimensionality and information content of IoT network traffic data in response to these issues. The accuracy, precision, recall rate, and F1 score were increased to 96.51%, 100%, 96.51%, and 98.22%, respectively [15]. Alotaibi et al. proposed a method combining the flamingo search optimization algorithm based on inverse Chi-square distribution with ML to address the security threats and resource constraints faced by edge devices of the IoT. By selecting features, detecting threats, and optimizing algorithms to adjust hyperparameters, the detection accuracy was improved to 98.22%, strengthening the security performance of IoT edge devices [16]. In response to the increasingly serious global problem of cyber attacks, Swaminathan A et al. proposed to use supervised ML methods to analyze cyber crimes and predict the characteristics of attack means and attackers. A comparative analysis of the efficacy of three algorithms in various models was conducted to enhance the threat recognition capability of the network security system. This analysis resulted in the optimization of the system, enabling it to proactively prevent threats and respond to malicious behaviors in real time [17].
In summary, research in MSD has made significant progress in recent years, but there is still room for improvement in modeling the order of function calls and semantic context, as well as the depth and generalization performance of feature extraction. Therefore, based on the DL algorithm, this study proposes an MSD method with API feature extraction as the core. The innovation of the research lies in introducing the encoding of API call location information and call order into heterogeneous graph features, enhancing the expressive power of semantic information. In addition, a multi-level GCN is designed to fully explore and utilize the multi-type node information of APIs, entry functions, and applications, improving the efficiency and accuracy of software feature extraction.
3 Methods and materials
This section introduces the MSD method in two subsections. The first subsection introduces the specific process of API feature extraction and heterogeneous graph construction. The second subsection provides a detailed description of the design and implementation of a DL-based feature extraction and classification model.
3.1 API feature extraction and heterogeneous graph construction
In MSD, API call sequences can accurately reflect the behavioral characteristics of programs, making them an important means of characterizing the behavior patterns of malicious software [18, 19]. However, traditional methods often extract only static features of API calls and ignore the expressive power of call order and contextual information, making it difficult to capture the potentially complex behavioral patterns of malicious software [20, 21]. Therefore, this study proposes an MSD method based on the DL algorithm and API feature extraction. This method focuses on API feature extraction and
combines the DL algorithm to effectively identify malicious software. The method framework is shown in Fig. 1.
In Fig. 1, the method first parses the Android APK file, extracts the function call sequence and entry function from the classes.dex file, and generates the smali file through a decompilation tool. The second step extracts the API call sequence, applies semantic embedding to it, and generates the location encoding of each API by combining the entry function information. The next step is to construct a heterogeneous graph containing APP nodes, entry function nodes, and API call nodes based on the extracted API call and location information, and to embed the semantic and sequential features in the edges. The final step is to extract graph features through heterogeneous GCN, generate high-dimensional feature representations that can be used for classification, and use classifiers to distinguish malicious software from benign software.
The Androguard tool is used to parse APK files and extract the information of custom methods in Dalvik executable files. Through static analysis, the calling block of the method is obtained, the calling instructions are analyzed, and the function name and calling relationship are extracted. To optimize memory, functions are numbered in order of first occurrence, and call pair relationships are recorded. Finally, the function call graph is constructed to support API feature extraction and behavior analysis.
The process of obtaining Android API location codes involves three stages: extracting API call sequences, generating location codes, and embedding codes into edge features of graph structures. For each APK file, the first step is to extract the function call sequence and identify the entry function. Subsequently, starting from the entry function, a depth-first search is adopted to traverse the function call relationship, extract the Android API call sequence, and disconnect call loops to avoid interference. Finally, each APK file is converted into an entry function and its corresponding ordered API call sequence. After obtaining the API call sequence, this study adopts the sine-cosine position encoding method to represent the information of the call location. Assuming $p_t$ is the encoding vector at position $t$ in the sequence, the $i$-th dimension of the encoding is represented by Eq. (1).
\[
p_t^{(i)} =
\begin{cases}
\sin(\omega_k \cdot t), & \text{if } i = 2k \\
\cos(\omega_k \cdot t), & \text{if } i = 2k + 1
\end{cases}
\tag{1}
\]
In Eq. (1), $k$ is a non-negative integer and $\omega_k$ is a frequency parameter, as shown in Eq. (2).
\[
\omega_k = \frac{1}{10000^{2k/d}}
\tag{2}
\]
In Eq. (2), $d$ is the total dimension of the encoding vector. The sine and cosine functions generate position encoding at different frequencies, ensuring that each position has a unique encoding representation while keeping the Euclidean distance between adjacent positions in the sequence close. The final generated encoding vector $p_t$ is shown in Eq. (3).
\[
p_t = \left[ \sin(\omega_1 \cdot t), \cos(\omega_1 \cdot t), \sin(\omega_2 \cdot t), \cos(\omega_2 \cdot t), \ldots, \sin(\omega_{d/2} \cdot t), \cos(\omega_{d/2} \cdot t) \right]
\tag{3}
\]
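As an illustrative sketch of the three stages described above (not the implementation used in this study), the following Python code traverses a toy function call graph depth-first from an entry function, cuts call loops by not re-entering functions that are already on the current path, and attaches a sine-cosine code to each position of the resulting API sequence. The helper names `api_call_sequence`, `position_encoding`, and `build_edge_features`, the toy call graph, the zero-based positions $t$, and the index range $k = 0, \ldots, d/2 - 1$ (following the indexing of Eq. (1)) are assumptions made for illustration only.

```python
import math


def api_call_sequence(call_graph, entry, is_api):
    """Depth-first traversal from `entry` that records API calls in the
    order they are reached. Callees already on the current path are not
    followed again, which cuts call loops."""
    sequence = []
    on_path = set()

    def visit(func):
        on_path.add(func)
        for callee in call_graph.get(func, []):
            if is_api(callee):
                sequence.append(callee)   # repeated API calls are kept: order matters
            elif callee not in on_path:   # back-edge means a call loop, so skip it
                visit(callee)
        on_path.discard(func)

    visit(entry)
    return sequence


def position_encoding(t, d):
    """Sine-cosine code of position t with d dimensions (d assumed even).
    Assumption: k runs over 0 .. d/2 - 1, as the indexing in Eq. (1) suggests."""
    code = []
    for k in range(d // 2):
        omega_k = 1.0 / (10000 ** (2 * k / d))   # Eq. (2)
        code.append(math.sin(omega_k * t))       # dimension i = 2k
        code.append(math.cos(omega_k * t))       # dimension i = 2k + 1
    return code


def build_edge_features(entry, sequence, d):
    """One (entry function, API, position code) triple per call position."""
    return [(entry, api, position_encoding(t, d)) for t, api in enumerate(sequence)]


# Hypothetical toy call graph: custom functions map to their callees;
# any name that is not a key is treated as an Android API.
toy_graph = {
    "onCreate": ["helper", "getSubscriberId"],
    "helper": ["openSocket", "onCreate"],   # call loop back to onCreate
}
toy_sequence = api_call_sequence(toy_graph, "onCreate", lambda f: f not in toy_graph)
toy_edges = build_edge_features("onCreate", toy_sequence, d=4)
```

In this toy example the loop from `helper` back to `onCreate` is dropped, so the traversal terminates and the API order is preserved. A function reachable along several distinct paths is visited once per path in this sketch; whether the original pipeline deduplicates such visits is not stated in the text.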
Position coding is applied to both node representation and edge features. In the node representation, a unique code is assigned to each API call through the use of sine and cosine functions. As a result, the same API appears semantically different at different call locations, which allows the model to distinguish between normal and malicious behavior patterns. In the edge feature, location coding not only represents the connection relationship of API calls but also incorporates the call order information, thereby enabling the model to learn the temporal features of the behavior pattern. In the process of graph convolution, position encoding and edge features jointly affect the aggregation of information, so that node updates depend not only on their neighbors but are also constrained by the calling sequence. For example, the same API calls generate different feature representations in different sequences, allowing the model to more accurately identify unusual call patterns, such as data theft and unauthorized access. Therefore, the introduction of location coding significantly improves the model's ability to model the API call sequence, thereby improving the accuracy and robustness of malware detection [22].
It is worth noting that the study uses the sine-cosine position encoding method to enhance the temporal information modeling capability of API call sequences. Compared with learnable position embeddings and RNN-based encoding, sine-cosine encoding does not require additional parameter learning, can avoid overfitting problems, and encodes the relative positions of API calls with a fixed periodicity, thereby effectively capturing long-term dependencies [23, 24]. In addition, this method has good numerical stability and is not prone to gradient vanishing or gradient exploding problems during graph convolution calculations, making the model more robust and generalizable in malware detection tasks. After completing the API location encoding, this study extracts the features of three types of nodes in heterogeneous graphs, as shown in Fig. 2.
In Fig. 2, the three types of nodes are application nodes, starting nodes, and system function nodes. The application nodes are abstracted from Android applications. The initial features are set as 100-dimensional all-one vectors to represent the global characteristics of the application. Through the feature aggregation mechanism of heterogeneous GCN, application nodes collect information from the starting node and update it to global semantic features. The starting nodes are the custom functions in the function call graph that are not called by any other function. They are initialized as 100-dimensional all-one vectors and represent the starting point of the application logic. By aggregating the characteristics of system function nodes, the starting node is dynamically updated to a feature representation with more behavioral patterns. The system function node is based on Android API strings and is embedded as a low-dimensional dense vector using Word2Vec's continuous bag of words model, capturing the semantic characteristics of the system function and providing support for feature updates of the starting node [25]. The final application behavior heterogeneous graph constructed on the basis of three types of nodes is shown in Fig. 3.
In Fig. 3, the heterogeneous behavior graph of the application is constructed by application nodes, starting
nodes, and system function nodes. By combining edge position information and call order, it comprehensively characterizes the global and local behavior characteristics of Android applications. The application node serves as the global core, responsible for integrating the feature information of the starting node and dynamically updating the global behavior representation. The starting node serves as the logical starting point, connecting the application node and the system function node. On the one hand, it transmits local behavioral characteristics to the application node, and on the other hand, it aggregates semantic information from the system function node. System function nodes exist as function providers, and their features are low-dimensional dense vectors generated by embedding models, accurately representing the semantics and behavioral patterns of APIs. Through the hierarchical design and feature interaction of three types of nodes, this heterogeneous graph focuses on API feature extraction, which can provide key support for modeling Android application behavior and precise detection of malicious software.
3.2 DL-based heterogeneous graph feature extraction and classification
Although simple API feature extraction and the construction of heterogeneous graphs of application behavior can provide rich behavioral and semantic information, the lack of deep mining and dynamic association modeling of multi-level features makes it difficult to fully capture complex behavioral patterns and their semantic relationships, which is insufficient for accurate identification of malicious software. Therefore, further deep mining and semantic aggregation of multi-level features in heterogeneous graphs are needed to comprehensively characterize the behavioral patterns of applications [26, 27]. This study utilizes the DL algorithm to extract and classify heterogeneous graph features, achieving precise differentiation between malicious software and benign software. This study chooses the Relational GCN (R-GCN) as the fundamental algorithm for feature extraction in heterogeneous graphs. R-GCN classifies and aggregates node features in the graph through relationship types, and its node update process is shown in Fig. 4.
In Fig. 4, the node update of R-GCN first calculates the feature aggregation of its neighboring nodes separately for each relationship type and then obtains the feature update representation of the target node through a specific aggregation method. Finally, the aggregated results of different relationships are added together, along with the transformed features of the target node itself, and the updated features of the target node are generated through an activation function. Taking relationship r as an example, for each target node i, the features of neighboring nodes from relationship r are first mapped
and aggregated through weight matrix $W_r^{(l)}$, as shown in Eq. (4).
\[
h_i^{(l+1)} = \sigma\left( \sum_{r \in R} \sum_{j \in N_i^r} \frac{1}{c_{i,r}} W_r^{(l)} h_j^{(l)} + W_0^{(l)} h_i^{(l)} \right)
\tag{4}
\]
In Eq. (4), $h_i^{(l)}$ is the feature vector of the target node $i$ input at layer $l$, and $h_i^{(l+1)}$ is the updated feature vector of that node at layer $l + 1$. $R$ is the set of all relationships in a heterogeneous graph. $N_i^r$ is the set of neighboring nodes that are connected to node $i$ through relationship $r$. $c_{i,r}$ is a normalization constant, usually equal to the number of neighboring nodes. $W_0^{(l)}$ is a weight matrix used to update the characteristics of the target node itself. $\sigma$ is the ReLU activation function. However, the standard R-GCN model is only applicable to heterogeneous graphs that do not contain edge features, whereas the constructed heterogeneous graphs contain important edge features such as position information. Therefore, this study improves the process of R-GCN to adapt to the proposed application behavior heterogeneous graph. The feature extraction process of application behavior heterogeneous graph based on improved R-GCN is shown in Fig. 5.
Fig. 5 Feature extraction flow of application behavior heterogeneous graph based on improved R-GCN
In Fig. 5, the feature extraction process includes multiple heterogeneous graph convolution operations. Firstly, through the first heterogeneous graph convolution operation, the semantic and positional information
of the application nodes are transmitted to the starting node. Subsequently, the starting node convolves again to pass the aggregated features to the system function nodes, completing the layer-by-layer aggregation of features between nodes. Finally, through the feature readout module, the global features of system function nodes are used as inputs for classification tasks. The key to the improved R-GCN lies in the full utilization of edge features, and the update process of node features is shown in Eq. (5).
\[
h_i^{(l+1)} = \sigma\left( \sum_{r \in R} \sum_{j \in N_i^r} \frac{1}{c_{i,r}} W_r^{(l)} \left( h_j^{(l)} + e_{ji} \right) + W_0^{(l)} h_i^{(l)} \right)
\tag{5}
\]
In Eq. (5), $e_{ji}$ is the edge feature, which is the positional encoding of the API. When edge features are directly added to the features of neighboring nodes, each edge can not only convey the relationship information between nodes but also reflect the positional semantic differences of the API. Therefore, even for the same API node, its semantic contribution to the target node will change in different applications due to different positional encodings, thereby enhancing the model's ability to represent complex behavioral patterns. The training process for feature extraction of application behavior heterogeneous graph based on improved R-GCN is shown in Fig. 6.
Fig. 6 Training process for feature extraction of heterogeneous graphs based on improved R-GCN
In Fig. 6, the training process is divided into two stages: forward propagation and backward propagation. The first step is to initialize the parameters and determine the input data and target label. In forward propagation, input data are passed layer by layer to generate predicted outputs, which are compared with real labels to calculate loss values. Subsequently, it enters the backpropagation stage, where parameters are adjusted using a gradient descent algorithm based on the loss value, and weights and biases are updated layer by layer to minimize errors. After each round of training, performance is evaluated. If the accuracy requirements are met, the parameters are saved and the training ends. Otherwise, iteration continues until the preset goal is achieved. When the feature extraction approaches the optimal performance through alternating forward and backward passes, the extracted global features are input into the classifier to achieve accurate classification of malicious software and benign software. The pseudo-code of the proposed malware identification method based on the DL algorithm and API feature extraction is shown in Fig. 7.
4 Results
To verify the superiority of the MSD method based on the DL algorithm and API feature extraction, this study conducts extensive experiments on this method from different aspects. Before starting the experiment, this study first configures the experimental environment and sets the experimental parameters, as shown in Table 1.
In Table 1, the research selects a three-layer R-GCN for feature extraction to balance the expressiveness and computational complexity of the model. In malware detection tasks, API call patterns have multi-level dependencies.
Experimental configuration
Category                                          Specific configuration
------------------------------------------------  ------------------------------------------
CPU                                               Intel Xeon E5-2698 v4, 2.2 GHz, 16 cores
GPU                                               NVIDIA Tesla V100, 32 GB VRAM
Memory                                            64 GB DDR4
Storage                                           2 TB SSD
Operating system                                  Ubuntu 20.04 LTS
DL framework                                      PyTorch 2.0
Graph Neural Network Library                      DGL 1.1.0
Programming language                              Python 3.9
Experimental parameter settings
Category                                          Specific parameters
------------------------------------------------  ------------------------------------------
Embedding dimension                               128
Number of heterogeneous graph convolution layers  3 layers
Node feature dimension per layer                  128 for the first layer, 64 for the second, 32 for the third
Learning rate                                     0.001
Weight decay                                      0.0001
Dropout                                           0.5
Position encoding dimension                       64
Edge feature dimension                            64
Batch size                                        64
Maximum training epochs                           200
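To make the edge-aware node update of Eq. (5) concrete, the following minimal pure-Python sketch computes one layer on a tiny graph. It is an illustration under stated assumptions rather than the study's PyTorch/DGL implementation: the function names `matvec` and `rgcn_edge_layer`, the dictionary-based graph format, and the toy values are hypothetical, and the edge feature $e_{ji}$ is assumed to have the same dimension as the neighbor feature $h_j$ so that the two can be added directly (the settings above list different node and edge dimensions, so a real model would need a projection step that the text does not specify).

```python
def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]


def rgcn_edge_layer(h, edges, W_rel, W_self):
    """One edge-aware R-GCN update in the spirit of Eq. (5).

    h      : dict node -> feature vector h_i
    edges  : list of (j, i, r, e_ji) meaning j -> i under relation r,
             with edge feature e_ji (assumed same length as h[j])
    W_rel  : dict relation -> weight matrix W_r
    W_self : weight matrix W_0 applied to the node's own features
    """
    # c_{i,r}: number of neighbours of node i under relation r
    counts = {}
    for _, i, r, _ in edges:
        counts[(i, r)] = counts.get((i, r), 0) + 1

    # Self term W_0 h_i for every node
    agg = {i: matvec(W_self, h_i) for i, h_i in h.items()}

    # Neighbour term: (1 / c_{i,r}) * W_r (h_j + e_ji), summed over edges
    for j, i, r, e_ji in edges:
        shifted = [a + b for a, b in zip(h[j], e_ji)]
        message = matvec(W_rel[r], shifted)
        c = counts[(i, r)]
        agg[i] = [a + m / c for a, m in zip(agg[i], message)]

    # sigma = ReLU
    return {i: [max(0.0, v) for v in vec] for i, vec in agg.items()}


# Hypothetical toy graph: one API node connected to one entry-function node.
identity = [[1.0, 0.0], [0.0, 1.0]]
toy_h = {"api": [1.0, 2.0], "entry": [0.0, 0.0]}
toy_edges = [("api", "entry", "calls", [0.5, -3.0])]
toy_out = rgcn_edge_layer(toy_h, toy_edges, {"calls": identity}, identity)
```

Setting every `e_ji` to a zero vector reduces this sketch to the standard update of Eq. (4). Stacking three such layers with output widths 128, 64, and 32 would mirror the layer configuration listed in the parameter settings.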
Shallow GCNs (one or two layers) can only capture local neighborhood information, making it difficult to learn complex call patterns. Deep GCNs (four or more layers) may over-smooth, so that the features of all nodes converge and classification ability suffers. Therefore, the three-layer GCN can not only effectively capture the global behavior characteristics but also maintain the
effectiveness of information transmission, and the calculation cost is relatively controllable. Furthermore, the embedding dimension (128) and the location encoding dimension (64) are balanced, ensuring a trade-off between information richness and computational cost. The learning rate (0.001) and weight decay (0.0001) are tuned to promote stable convergence and prevent overfitting, while Dropout (0.5) is employed to enhance generalization. The batch size (64) and maximum training epochs (200) balance training efficiency and model effectiveness, while the edge feature dimension (64) is used to enhance API call relationship modeling.
Based on Table 1, this study selects Drebin and AndroZoo as experimental data sources. Among them, the Drebin dataset contains 5,560 malicious software samples and 9,476 benign software samples, providing static features and family annotations, suitable for malicious software classification tasks. The AndroZoo dataset is a large-scale repository of Android applications, containing over 13 million APK files covering multiple Android markets. It supports complex MSD experiments by extracting API call sequences. The Drebin dataset covers common malware families such as Trojans, Ransomware, Adware, Spyware, Remote Access Tools (RATs), and backdoors. Different families differ in their attack targets and behavior patterns. For example, information theft malware often calls getSubscriberId() and getSimSerialNumber() to obtain user identity information, while RATs use execHttpRequest() and openSocket() to establish remote connections. In contrast, the AndroZoo dataset, which contains large APK samples collected from multiple Android markets, is broader in scope and contains a large number of recent malware variants.
Both datasets are divided into training set, validation set, and testing set in a ratio of 7:1.5:1.5. Firstly, this study conducts basic performance verification experiments. Four common ML and DL models, namely Support Vector Machine (SVM), RF, LSTM, and CNN, as well as the traditional R-GCN, are compared with the research method. The results on the test sets of the two datasets are shown in Fig. 8.
In Fig. 8a, on the Drebin dataset with a small sample size and clear features, the accuracy and F1 score of the research method are 92.80% and 91.30%, which are 1.30% and 1.20% higher than traditional R-GCN. This small margin indicates that traditional methods already capture the characteristics of malicious software well on this dataset. The research method further enhances semantic expression ability through edge feature modeling and API location information, demonstrating certain advantages. In Fig. 8b, on the AndroZoo dataset with a large sample size and complex features, the accuracy and F1 score of the research method are 94.24% and 92.76%, which are 3.28% and 3.04% higher than traditional R-GCN, demonstrating a more significant performance advantage. This indicates that the improved method exhibits higher robustness and generalization ability in modeling complex behavioral features and processing large-scale data. On this basis, this study conducts experiments on the effectiveness of heterogeneous graph models, selecting traditional R-GCN, GCN, and LSTM as comparison methods for the improved R-GCN, as shown in Fig. 9.
In Fig. 9a, for the False Positive Rate (FPR), the improved R-GCN achieves convergence after 10.8 s of training, with an FPR of 1.08%. The convergence time of the three comparison methods exceeds 20 s, and the FPR during convergence exceeds 2%. In Fig. 9b, for the False Negative Rate (FNR), the improved R-GCN achieves convergence after 17.7 s of training, with an FNR of 0.67%. The performance of the three comparison methods is still inferior to the improved R-GCN. Improved R-GCN training has higher efficiency, with fewer false positives and
false negatives, is suitable for rapid deployment requirements in large-scale MSD tasks, and can perform MSD more accurately. Furthermore, this study validates the impact of location information in API call sequences on MSD performance. Three groups are set up: a complete API call sequence group, a group without location information, and a randomized location encoding group, named G1–G3, respectively. The experimental results are shown in Fig. 10.
In Fig. 10a, G1 has the best accuracy, converging to 91.16% after 200 iterations. The accuracy of G2 is lower than that of G1, converging to 86.58%, indicating that removing positional information weakens the model’s ability to model contextual logic. The accuracy of G3 is the lowest, only converging to 83.62%,
indicating that the randomization of position encoding destroys the semantic consistency of the sequence and has a significant impact on the performance of the model. In Fig. 10b, the FPR of G1 decreases the fastest and eventually converges to 1.06%, demonstrating the strongest false alarm control capability. The FPR of G2 is higher, converging to 2.15%, indicating that removing location information leads to an increase in the model’s misjudgment rate. The FPR of G3 remains at a relatively high level, eventually converging to 2.97%, further verifying that randomized positional encoding can disrupt the model’s ability to model behavioral patterns. Table 2 presents the results of ablation experiments conducted to
validate the contributions of each module in the research methodology.
The results in Table 2 show that API location information, heterogeneous graph node types, edge features, and the improved R-GCN all contribute significantly to malware detection performance. After removing the API location information, the accuracy decreases by 5.5% and the FPR rises to 4.15%, indicating its importance in modeling the order of API calls. The removal of heterogeneous graph node types and edge features reduces the accuracy to 87.65% and 86.33%, respectively, indicating that the introduction of different node types and the modeling of API interaction are crucial to the detection performance. The unimproved R-GCN model only achieves 87.48% accuracy and 4.23% FPR, indicating that the improved R-GCN improves the detection capability by enhancing the feature propagation mechanism. The complete method performs best with all modules enabled: the accuracy is 94.71% and the FPR is reduced to 1.28%, which verifies the synergistic effect of all modules. Quantitative analysis shows that API position information contributes the most, followed by heterogeneous graph node types, edge features, and the R-GCN improvement. In addition, although the memory usage of the complete method is the highest, reaching 72.36%, it is still in a reasonable range, which ensures computational feasibility. In summary, all modules play an important role in feature expression, temporal relationship modeling, and structured data learning, and cooperate with each other to significantly improve the accuracy and robustness of malware detection.
The methods in references [10–13] are selected as comparison methods for final verification under the same experimental environment and dataset conditions. All methods are tested on the Drebin and AndroZoo datasets, and the same data partitioning strategy is used to ensure the fairness and reproducibility of the experiment. In addition, to eliminate the performance deviation caused by different parameter tuning across methods, the hyperparameter configuration of each comparison method strictly follows the settings of its original paper. No additional optimization is carried out, so that the baseline performance of the original method is maintained. The result is shown in Fig. 11.
In Fig. 11a, the average recognition accuracy of the method in reference [10] is 87.74%, and that of the method in reference [11] is 89.18%. The methods in references [12] and [13] reach relatively high recognition accuracies of 91.24% and 91.33%, but both show strong volatility. The average recognition accuracy of the research method is 94.27%, which is the best. In Fig. 11b, the detection times of the research method and of the methods in references [10–13] are all between 5 and 10 s. Overall, compared with existing methods, the research method improves recognition accuracy while maintaining comparable efficiency.
To assess the adaptability of the model in a zero-day attack environment, the research employs the Leave-One-Family-Out (LOFO) training strategy. One malware family is excluded during training, and the remaining families are used for training. The detection performance of the model is then tested on the unseen malware family. In the experiment, six major malware families are selected from the Drebin dataset, LOFO training and testing are conducted for each of them, and the methods in references [10–13] are compared. The results are shown in Table 3.
In Table 3, the reference [10] method achieves an accuracy of 85.08% in zero-day attack environments, while the reference [11] method achieves an accuracy of 86.96%. The methods in references [12] and [13] reach higher accuracies of 89.34% and 89.94%, respectively. However, these methods have large performance fluctuations on the unseen malware
family. The average recognition accuracy of the proposed method across all test categories is 93.67%, which is clearly better than the existing methods, indicating that the method in this study has stronger generalization ability in a zero-day attack environment. To evaluate the scalability of the model, experiments are conducted along three dimensions: training time, inference speed, and computing resource consumption. The results are compared with those of references [10–13], as shown in Table 4.
From Table 4, the proposed model is superior in training time and inference time, indicating that it has higher training efficiency and can improve the real-time detection of malware. In terms of memory usage, floating-point computation, and GPU utilization, the performance of the proposed model is inferior to that of references [10] and [11], but superior to that of references [12] and [13]. This result shows that the computational resource consumption of the proposed model is at a good level while ensuring optimal detection performance. In the future, optimization strategies such as model pruning, quantization, and edge computing can be combined to further reduce computing resource consumption and make the model more suitable for large-scale deployment and real-time detection environments.

5 Discussion
The malware detection method based on DL and API feature extraction proposed in this study shows advantages in behavior modeling, generalization ability,
computational efficiency, and false alarm rate control. Compared with the optimization method based on API call features proposed by Jhansi et al. [13], this study more accurately models API semantic relationships through R-GCN and API position encoding, thereby improving detection capabilities. In terms of adversarial sample detection, Ban et al. [6] showed that the evasion rate of existing classifiers under black-box attacks can reach 99%, while this method effectively reduces detection vulnerabilities by enhancing feature learning. Compared with the detection method based on image processing proposed by Tuncer et al. [8], this study more deeply describes malicious behavior patterns through heterogeneous graph modeling and improves feature expression capabilities. At the same time, although Vaiyapuri et al. [14] used optimization methods to improve detection efficiency in industrial IoT environments, the cost of parameter tuning was high, while this method completed detection within 5 to 10 s, maintaining high efficiency and stability. In summary, this method achieves a balance between detection accuracy, robustness, and efficiency, providing a more reliable and efficient solution for malware detection.
The research method shows high accuracy and generalization ability in malware detection, but there is still room for improvement in scalability, real-time deployment, and dynamic threat adaptability. Firstly, in terms of scalability, as the data scale grows, the GNN computing cost may become a bottleneck. In the future, resource allocation can be optimized through graph sampling, model pruning, and distributed computing to improve system scalability. Secondly, in terms of real-time deployment, the computing overhead of the model is large for mobile terminals and IoT devices with limited computing resources. In the future, lightweight models, edge computing, and incremental learning can be explored to improve detection speed while ensuring accuracy. Finally, in terms of dynamic threat adaptability, malware continues to evolve, and attackers use code obfuscation, mutation, and evasion techniques to bypass the detection system. In the future, online learning, transfer learning, and hybrid static-dynamic analysis can be combined to improve the model’s ability to detect unknown threats. In response to these challenges, the research can further optimize computing efficiency, real-time detection capabilities, and dynamic defense strategies to improve the practicality and security of the method.
This study mainly focuses on the accuracy of malware detection, but in network security, the interpretability of AI models is equally crucial. Feature attribution and visualization methods, such as Shapley Additive Explanations (SHAP) and Local Interpretable Model-agnostic Explanations (LIME), should be explored in future research. These methods could help investigate the impact of individual API calls on malware classification. In addition, attention-based models such as the Transformer, or an added attention layer, can highlight the most critical sequences of API calls, making the detection decision process more transparent. Improving model interpretability not only helps security researchers understand the detection logic but also improves the reliability and acceptability of the model in practical applications.

6 Conclusion
Aiming at the accuracy and efficiency of malware detection, this paper proposed a malware detection method based on DL algorithms and API feature extraction. This method extracted the API call sequence and entry function of the Android application, constructed a heterogeneous graph, and used an improved R-GCN for feature learning to improve the detection capability of malware. The experimental results showed that the accuracy of the proposed method on the Drebin and AndroZoo datasets reached 92.80% and 94.24%, and the F1 scores were 91.30% and 92.76%, respectively. Compared with traditional R-GCN, this method improved the detection accuracy and enhanced the robustness and generalization ability of the model. The ablation experiment and the API position encoding experiment further verified the effectiveness of each module, indicating that API position information, heterogeneous graph node types, and edge features all contributed significantly to the detection performance. In the comparative experiment, the research method outperformed the existing methods in both conventional malware detection and zero-day attack detection. In the conventional detection tasks, the average recognition accuracy of the research method was 94.27%, which was significantly improved compared with the existing methods. The detection was completed within 5 to 10 s, ensuring computational efficiency. In the zero-day attack detection experiment, the average recognition accuracy of this method for all tested malware families reached 93.67%, significantly higher than the reference methods. Especially in the detection of remote access Trojans and information theft software, the accuracy rates reached 94.35% and 94.01%, respectively. It shows stronger generalization ability and adaptability to complex attack behavior. In contrast, the existing methods have poor stability in zero-day attack detection, and their performance fluctuates greatly, which further
validates the effectiveness of this method. In summary, the research method combines API location information, heterogeneous graph modeling, and improved R-GCN. This combination improves detection accuracy and ensures computational efficiency. It also demonstrates stronger adaptability under zero-day attack environments. This provides a more reliable malware detection scheme for practical applications.

Acknowledgements
Not applicable.

Author’s contributions
W.S. draft manuscript preparation; study conception and design.

Funding
This paper didn’t receive any funding support.

Data availability
The datasets used and/or analysed during the current study are available from the corresponding author on reasonable request.

Declarations

Ethics approval and consent to participate
Not applicable.

Consent for publication
Not applicable.

Competing interests
The authors declare no competing interests.

Received: 20 December 2024 Accepted: 24 February 2025

References
1. H. Mokayed, T.Z. Quan, L. Alkhaled, V. Sivakumar, Real-time human detection and counting system using deep learning computer vision techniques. Artificial. Intellig. Appl. 1(4), 221–229 (2023)
2. A. Al-Marghilani, Comprehensive analysis of IoT malware evasion techniques. Eng. Technol. Appl. Sci. Res. 11(4), 7495–7500 (2021)
3. P. Yadav, N. Menon, V. Ravi, S. Vishvanathan, T.D. Pham, A two-stage deep learning framework for image-based android malware detection and variant classification. Comput. Intell. 38(5), 1748–1771 (2022)
4. S.A. Ajagbe, J.B. Awotunde, H. Florez, Ensuring intrusion detection for IOT services through an improved CNN. SN Comput. Sci. 5(1), 49 (2023)
5. S.A. Ajagbe, M.O. Adigun, Deep learning techniques for detection and prediction of pandemic diseases: a systematic literature review. Multimedia Tools Appl. 83(2), 5893–5927 (2024)
6. X. Pei, X. Deng, S. Tian, L. Zhang, K. Xue, A knowledge transfer-based semi-supervised federated learning for IoT malware detection. IEEE Trans. Dependable Secure Comput. 20(3), 2127–2143 (2022)
7. F. Mercaldo, A. Santone, Audio signal processing for Android malware detection and family identification. J. Comput. Virol. Hacking Techniq. 17(2), 139–152 (2021)
8. T. Tuncer, F. Ertam, S. Dogan, Automated malware identification method using image descriptors and singular value decomposition. Multimedia Tools Appl. 80(7), 10881–10900 (2021)
9. S. Ali, O. Abusabha, F. Ali, M. Imran, T. Abuhmed, Effective multitask deep learning for iot malware detection and identification using behavioral traffic analysis. IEEE Trans. Netw. Serv. Manage. 20(2), 1199–1209 (2022)
10. P.K. Kaithal, V. Sharma, African Vulture Optimization-Based Decision Tree (AVO-DT): an innovative method for malware identification and evaluation through the application of meta-heuristic optimization algorithm. Cybernet. Inform. Technol. 24(2), 142–155 (2024)
11. T. Kim, M. Krichen, M.A. Alamro, A. Mihoub, G. Avelino Sampedro, S. Abbas, Exploiting smartphone defence: a novel adversarial malware dataset and approach for adversarial malware detection. Peer-to-Peer Net. Appl. 17(5), 3369–3384 (2024)
12. S.A. Habtor, A.H.H. Dahah, Machine-learning classifiers for malware detection using data features. J. ICT Res. Appl 15(3), 265–290 (2021)
13. K.S. Jhansi, P.R.K. Varma, S. Chakravarty, Swarm optimization and machine learning for android malware detection. Comput. Mater. Continua 73(3), 6327–6345 (2022)
14. T. Vaiyapuri, K. Shankar, S. Rajendran, S. Kumar, V. Gaur, D. Gupta, M. Alharbi, Automated cyberattack detection using optimal ensemble deep learning model. Transact. Emerg. Telecommun. Technol. 35(4), e4899 (2024)
15. A. Manimaran, L. Kartheesan, D. Kumutha, R. Surendran, An optimized hybrid deep learning framework for monitoring botnet attacks in IoT networks. 2024 International Conference on IoT Based Control Networks and Intelligent Systems (ICICNIS). (Bengaluru, 2024), pp. 487–492
16. Y. Alotaibi, R. Deepa, K. Shankar, S. Rajendran, Inverse chi-square-based flamingo search optimization with machine learning-based security solution for Internet of Things edge devices. AIMS Math 9, 22–37 (2024)
17. A. Swaminathan, B. Ramakrishnan, M. Kanishka, R. Surendran, Prediction of cyber-attacks and criminality using machine learning algorithms. 2022 International Conference on Innovation and Intelligence for Informatics, Computing, and Technologies (3ICT). (Sakheer, Bahrain, 2022), pp. 547–552
18. M.E. Farfoura, A. Alkhatib, D.M. Alsekait et al., A low complexity ML-based methods for malware classification. Comput. Mater. Continua 80(3), 4833–4857 (2024)
19. K.A. Dhanya, P. Vinod, S.Y. Yerima et al., Obfuscated malware detection in IoT Android applications using Markov images and CNN. IEEE Syst. J. 17(2), 2756–2766 (2023)
20. X. Hu, C. Zhu, G. Cheng, R. Li, H. Wu, J. Gong, A deep subdomain adaptation network with attention mechanism for malware variant traffic identification at an IoT edge gateway. IEEE Internet Things J. 10(5), 3814–3826 (2023)
21. K.A. Dahri, M.S. Vighio, B.A. Zardari, Detection and prevention of malware in Android operating system. Mehran Univ. Res. J. Eng. Technol. 40(4), 847–859 (2021)
22. G. Sun, Q. Qian, Deep learning and visualization for identifying malware families. IEEE Trans. Dependable Secure Comput. 18(1), 283–295 (2021)
23. J. Kim, Y. Ban, E. Ko, H. Cho, J.H. Yi, MAPAS: a practical deep learning-based android malware detection system. Int. J. Inf. Secur. 21(4), 725–738 (2022)
24. Y. Chai, L. Du, J. Qiu, L. Yin, Z. Tian, Dynamic prototype network based on sample adaptation for few-shot malware detection. IEEE Trans. Knowl. Data Eng. 35(5), 4754–4766 (2022)
25. R. Yumlembam, B. Issac, S.M. Jacob, L. Yang, IoT-based android malware detection using graph neural network with adversarial defense. IEEE Internet Things J. 10(10), 8432–8444 (2022)
26. S. Li, Q. Zhou, R. Zhou, Q. Lv, Intelligent malware detection based on graph convolutional network. J. Supercomput. 78(3), 4182–4198 (2022)
27. O.J. Falana, A.S. Sodiya, S.A. Onashoga, B.S. Badmus, Mal-Detect: an intelligent visualization approach for malware detection. J. King Saud Univ. Comput. Inform. Sci. 34(5), 1968–1983 (2022)

Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.