Expert Systems With Applications 235 (2024) 121125

Contents lists available at ScienceDirect

Expert Systems With Applications


journal homepage: www.elsevier.com/locate/eswa

SeGDroid: An Android malware detection method based on sensitive function call graph learning

Zhen Liu a, Ruoyu Wang b,∗, Nathalie Japkowicz c, Heitor Murilo Gomes d, Bitao Peng a, Wenbin Zhang e

a School of Information Science and Technology/School of Cyber Security, Guangdong University of Foreign Studies, Guangzhou, 510006, PR China
b Information and Network Engineering Research Center, South China University of Technology, Guangzhou, 510641, PR China
c Department of Computer Science, American University, Washington DC, 20016, USA
d School of Engineering and Computer Science, Victoria University of Wellington, Wellington, 6140, New Zealand
e Department of Computer Science, Michigan Technological University, Houghton, 49931, USA

ARTICLE INFO

Keywords:
Android malware detection
Function call graph
Graph neural network
Semantic knowledge
Model explanation

ABSTRACT

Malware is still a challenging security problem in the Android ecosystem, as malware is often obfuscated to evade detection. In such cases, semantic behavior feature extraction is crucial for training a robust malware detection model. In this paper, we propose a novel Android malware detection method (named SeGDroid) that focuses on learning the semantic knowledge from sensitive function call graphs (FCGs). Specifically, we devise a graph pruning method to build a sensitive FCG on the basis of an original FCG. The method preserves the sensitive API (security-related API) call context and removes the irrelevant nodes of FCGs. We propose a node representation method based on word2vec and social-network-based centrality to extract attributes for graph nodes. Our representation aims at extracting the semantic knowledge of the function calls and the structure of graphs. Using this representation, we induce graph embeddings of the sensitive FCGs associated with node attributes using a graph convolutional neural network algorithm. To provide a model explanation, we further propose a method that calculates node importance. This creates a mechanism for understanding malicious behavior. The experimental results show that SeGDroid achieves an F-score of 98% in the case of malware detection on the CICMal2020 dataset and an F-score of 96% in the case of malware family classification on the MalRadar dataset. In addition, the provided model explanation is able to trace the malicious behavior of the Android malware.

The code (and data) in this article has been certified as Reproducible by Code Ocean: (https://codeocean.com/). More information on the Reproducibility Badge Initiative is available at https://www.elsevier.com/physical-sciences-and-engineering/computer-science/journals.
∗ Corresponding author.
E-mail addresses: [email protected] (Z. Liu), [email protected] (R. Wang), [email protected] (N. Japkowicz), [email protected] (H.M. Gomes), [email protected] (B. Peng), [email protected] (W. Zhang).

https://doi.org/10.1016/j.eswa.2023.121125
Received 14 February 2023; Received in revised form 3 August 2023; Accepted 4 August 2023
Available online 11 August 2023
0957-4174/© 2023 Elsevier Ltd. All rights reserved.

1. Introduction

Due to the open-source nature of the Android operating system, Android apps have become the main target of cybercriminals (Lo et al., 2022). Malicious behaviors of cybercriminals include browser hijacking, malicious collection of user information (such as credit card and contact data), malicious bundling and launching of unwanted advertisements (Android Statistics, 2022; Xu et al., 2021). The emergence of a massive number of malware attacks poses a considerable challenge to malware mitigation (Gao et al., 2021).

Many methods have been proposed to cope with malware detection, including signature-based (Grace et al., 2012; Zheng et al., 2013) and machine learning-based (Razgallah et al., 2021) methods. Recently, machine learning, specifically deep learning, has been a promising solution for malware detection because it has the potential to keep up with the speed at which malware evolves (Guerra-Manzanares et al., 2021; Ou & Xu, 2022). Feature extraction is crucial to improve malware detection performance in machine learning approaches.

Commonly, two types of techniques are used to extract features from Android apps, i.e., static analysis (Martín et al., 2019; Ou & Xu, 2022; Vasan et al., 2020) and dynamic analysis (Ananya et al., 2020; Lin et al., 2022). Static analysis-based techniques extract features from the assembled files of app installation packages. For example, AndroPyTool (Martín et al., 2019) extracts permissions, intents, services and providers from AndroidManifest.xml and API calls from Smali codes. Dynamic analysis-based techniques mainly focus on the system or network traffic data while running apps. These techniques require
executing the apps, which causes overhead and inconvenience (Cai et al., 2021). Most approaches thus rely on static analysis for feature extraction (Shar et al., 2020).

In static analysis, recent works pay more attention to semantic feature extraction from graphs (Lo et al., 2022; Onwuzurike et al., 2019; Ou & Xu, 2022). This is because semantic features are more robust than syntax features when confronting malware evasion (Wu, Li, et al., 2019). A function call graph (FCG) consists of a set of program functions and their interprocedural calls (Lo et al., 2022), which can capture the caller-callee relationship between methods inside an APK. The structure of the graph can be further embedded as the input of machine learning algorithms. In particular, recent work has used Graph Neural Networks (GNNs) to embed FCGs (Xu et al., 2021). A GNN is able to leverage the topological structure and node features to generate an informative embedding for each node (Gao et al., 2021).

The nodes in the FCG may be system APIs or self-defined functions. A variety of methods have been proposed for node representation (obtaining node feature vectors). Xu et al. (2021) applied SIF (Arora et al., 2017) to vectorize the opcode sequences on a node. Cai et al. (2021) took the functions as words and leveraged word2vec to embed the functions into vectors; however, attackers can easily obfuscate the self-defined functions. Vinayaka and Jaidhar (2021) leveraged the API package list for representing the external node (system API) and the opcode list for the internal node (self-defined function). This allowed them to handle the obfuscation problem of self-defined functions. In addition, they also proposed a node balance method for handling the node imbalance between benign apps and malware. However, their node attributes are based solely on the occurrence of the APIs or the opcodes. They do not carry the semantic knowledge in the API and opcode sequences. The node balance method is implemented by randomly removing samples from the training set based on occurrence and not on semantic knowledge, and it is likely that some informative samples are removed. To address the shortcomings mentioned in this discussion, this paper proposes a new Android malware detection method named SeGDroid that takes semantic knowledge into account. The main contributions of this paper include the following five items:

(1) We propose a new Android malware detection method based on the sensitive FCG and GNN. Our method first builds the sensitive FCGs and extracts the semantic attributes for function nodes. It then implements a GNN for embedding FCGs into feature vectors. These vectors can be further used to train a malware detection model using a machine learning algorithm.

(2) Regarding sensitive FCG building, we propose a graph pruning method that preserves the context of sensitive API calls (i.e., security-related API calls that require specific permissions) while leaving out the others. This reduces the number of nodes included in huge FCGs while simultaneously handling the node imbalance problem without removing any app samples. In addition, it causes the model to pay more attention to the malicious behavior that correlates with sensitive APIs.

(3) From the aspect of feature representation for graph nodes, we propose to train two models based on the word2vec (Mikolov et al., 2013a) algorithm: an API2vec model and an opcode2vec model for embedding external nodes (system APIs) and internal nodes (opcode sequences), respectively. This preserves the semantics of APIs and opcode sequences. System APIs and opcodes are not easily obfuscated by attackers, so the node representation based on them is robust to obfuscation techniques. In addition, social-network-based centrality (Wasserman & Faust, 1994) is further used to weight the feature vectors. The feature vectors thus carry the graph structure knowledge. This mechanism is able to accelerate the convergence of graph learning.

(4) To explain the FCG-based malware detection model, we propose a method to visualize the importance of different nodes and find that the majority of nodes with high importance correspond to the sensitive APIs. This provides useful information to the user.

(5) We perform multiple experiments to evaluate the performance of our method, including binary classification, category classification and malware family classification. Experimental results show that our method achieves an F-score of 98% for detecting malware and an F-score of 96% for malware family classification on average.

The remainder of the paper is organized as follows. Section 2 overviews related works. Section 3 analyzes the characteristics of FCGs and proposes our malware detection method. Section 4 introduces our experimental datasets, analyzes the experimental results and visualizes the node importance in FCGs. Section 5 concludes this paper.

2. Related work

2.1. Related work on feature extraction in Android malware detection

Android malware detection methods have evolved from signature-based methods to machine learning-based methods. Machine learning approaches critically depend on feature extraction. The feature extraction methods for malware detection can be categorized into static analysis and dynamic analysis methods. Static analysis methods (Liu et al., 2021; Onwuzurike et al., 2019) extract features from app installation files, including AndroidManifest.xml and decompiled Smali codes. Various features, such as permissions, API calls, intent filters, opcode, FCG, etc., have been widely researched (Kabakus, 2022; Qiu et al., 2023; Tang et al., 2022; Zhang et al., 2021). Static analysis-based methods implement malware detection before running apps. The dynamic analysis-based methods (Ananya et al., 2020; Guerra-Manzanares et al., 2022; Lin et al., 2022; Wang et al., 2020) extract features from the data (such as system calls and network traffic) generated while running apps. This requires executing the apps to detect malware. Static analysis and dynamic analysis have also been combined for Android malware detection (Alzaylaee et al., 2020; Li et al., 2018; Martín et al., 2019).

In static analysis approaches, the extracted features are categorized into: (1) occurrence-based features; (2) image-based features; (3) text-based features; and (4) graph-based features.

Regarding occurrence-based features, the API calls, permissions, intents, activities, and others extracted from the static APK files are combined as features (Badhani & Muttoo, 2019; Scalas et al., 2019). The feature value is 1 if the feature exists in an app, and the resulting binary feature vector is used to represent an app (Badhani & Muttoo, 2019; Kong et al., 2022). Alternatively, the frequency with which a feature occurs in an app can be recorded for each feature (Martín et al., 2019). Badhani and Muttoo (2019) propose a malware detection method named CENDroid, which uses APIs and permissions as features. The feature value is binary according to whether the feature exists in the Smali codes and the AndroidManifest.xml file. Clustering and an ensemble of classifiers are applied for malware detection. Experimental results on their datasets show that it obtains 98% accuracy in the best case. OmniDroid (Martín et al., 2019) builds benchmark datasets taking API, Permission, Intents, Service, Activity, Receiver, System commands, FlowDroid, Package name, and Strings as features. The feature value is the frequency with which the feature occurs in an app.

Regarding image-based features, the DEX files are transformed into images, and the images are fed into a CNN (Convolutional Neural Network) for training an Android malware detection model (D’Angelo et al., 2020; Naït-Abdesselam et al., 2020; Vasan et al., 2020). Vasan et al. (2020) propose an image-based malware detection method using a fine-tuned CNN. It first converts the raw malware binaries into color images. A CNN pretrained on ImageNet datasets is then fine-tuned on the malware datasets. The results on the IoT-android mobile dataset show that it obtains 97% accuracy. Naït-Abdesselam et al. (2020) also propose an image-based malware detection method. It transforms each APK into an RGB image. The permissions and components from AndroidManifest.xml form the green channel of the image. The API calls and unique opcode sequences of the DEX file form the red channel of the image. The blue channel of the image is obtained

by the strains, suspected permission, app components and API calls. The experiments on the AndroZoo dataset show that it obtains 99% accuracy.

Concerning the text-based features, the APIs and opcodes are handled as texts (Sun et al., 2019; Zhang et al., 2021), and then a feature learning method from NLP (Natural Language Processing), such as word2vec (Rong, 2014), is adopted. Sun et al. (2019) utilize APIs, permissions and metadata for characterizing apps. The metadata include the category and description of apps. Word2vec is applied to vectorize the metadata. Zhang et al. (2021) propose the TC-Droid method based on the text sequence of permissions, service, intent and receiver. They apply a text CNN to learn the features from the original text. The results on the Genome dataset show that it obtains approximately 96% accuracy.

Regarding graph-based features, graphs provide an abstract representation for modeling the behaviors of Android applications. A variety of graphs have been researched for Android malware detection, including the graph with control flow and data flow (Alhanahnah et al., 2020), the heterogeneous information network among apps and APIs (Hou et al., 2017), and the FCG (Lei et al., 2019; Onwuzurike et al., 2019; Wu, Li, et al., 2019). MaMaDroid (Onwuzurike et al., 2019) builds API paths for each app obtained from an FCG and then abstracts APIs to their corresponding package families. It then transforms all abstracted paths into a feature vector for an app using a Markov model. Their experimental results show that the model achieves an F-measure of 98% in the best case. MalScan (Wu, Li, et al., 2019) combines all sensitive APIs as the feature set. The feature vector values denote the occurrence frequency of the corresponding APIs. To learn the structure of FCGs, it leverages the centrality metrics defined in social network analysis to weight the feature vectors. On the AndroZoo dataset, MalScan achieves 98% accuracy.

2.2. Related work on GNN based Android malware detection

Recently, researchers have paid more attention to utilizing GNNs to extract the feature vectors for representing FCGs. Cai et al. (2021) propose enhanced FCGs (E-FCGs) to characterize app behaviors. They build the corpus of functions by putting together all the function call records and adopt the CBOW (Continuous Bag of Words) algorithm for embedding functions. They further acquire the enhanced FCGs with node attributes obtained by function embedding. Then, the GCN algorithm is used to learn a feature vector for each FCG. The learned features are used as the input of the linear regression, decision tree, SVM (support vector machine), KNN (K nearest neighbor), random forest, MLP (Multilayer Perceptron) and CNN algorithms. The experimental results on the AndroZoo and Google app store datasets show that it obtains 99% accuracy in the best case. However, the function-based node attributes rely on function names, which are easily obfuscated by renaming.

Xu et al. (2021) apply the opcode sequences of each function for node representation. They utilize the SIF network (Arora et al., 2017) to learn the feature vectors of the opcode sequence and the structure2vec algorithm for graph embedding. The obtained vectors are used as the input of an MLP. The results on the Drebin, AMD, AndroZoo and PraGuard datasets show that it obtains 99% accuracy. However, the DEX file does not contain the implementation of external functions (Vinayaka & Jaidhar, 2021), so this method cannot obtain the opcode sequence of their implementation. Vinayaka and Jaidhar (2021) apply APIs to represent the external nodes and opcode sequences to represent the internal nodes. Then, they utilize multiple GNNs for graph embedding and acquire the feature vector for each app. Specifically, they proposed a node balance method for handling the node imbalance problem in FCGs. The results on the CIC and AndroZoo datasets show that it obtains 92% accuracy with the GraphSAGE algorithm. Regarding node representation, this method only considers the existence of API calls and opcodes but omits the semantic knowledge of the API calls and opcode sequences. The node balance method may remove informative samples for malware detection. In addition, there is no work involving model explanation in the field of FCG-based Android malware detection.

To cope with the challenges mentioned above, we propose a new Android malware detection method called SeGDroid. Our method utilizes word2vec (Mikolov et al., 2013a) and centrality (Wasserman & Faust, 1994) for the node semantic representation. Specifically, with the domain knowledge of Android malware detection, we propose a graph pruning method for simplifying the FCG. It preserves the nodes correlated with sensitive APIs and removes the irrelevant nodes to decrease the node imbalance ratio between malware and benign apps, reduce the resource consumption of training, and strengthen the effect of nodes with sensitive APIs on malware detection. In addition, to explain the FCG-based model, we devise a method to visualize the importance of graph nodes for Android malware detection.

3. Methodology

3.1. Function call graph analysis

This section mainly analyzes the characteristics of FCGs. The FCG is defined as:

Definition 1 (Function call graph): An FCG is a directed graph $FCG = \langle \mathcal{V}, \mathcal{E} \rangle$, where $\mathcal{V}$ is the set of functions (i.e., callers and callees) and $\mathcal{E}$ is the set of directed edges between callers and callees.

One example of an FCG is shown in Fig. 1, in which a function is denoted by a node. If a function A is invoked in another function B, there is an edge from B to A. The external functions are represented in red, and the internal functions are in blue. A function node with no incoming edges (zero indegree) is the entry point of a function running path (Lei et al., 2019), such as onReceive() and onCreate(). Previous works (Ou & Xu, 2022; Wu, Li, et al., 2019) have utilized sensitive API calls for malware detection. The context of sensitive API calls is also important for recognizing malicious behavior. For example, if the functions that acquire user information are related to GUI events, they may be triggered by the users; otherwise, they may be triggered by attackers (Meng et al., 2018). Therefore, we also consider the context of the invoked sensitive API, i.e., the nodes on the path from the entry point to the sensitive API and on the path from the sensitive API to the leaf node.

Fig. 1. The example of an FCG.

Next, we analyze the sensitive API node distribution in the malware FCGs and benign FCGs. The CDF (Cumulative Distribution Function) of the sensitive API ratio is shown in Fig. 2. It shows that the sensitive API ratio in the malware class is larger than that in the benign class. This indicates that malware invokes sensitive APIs with a higher probability.

3.2. The framework of SeGDroid

The framework of SeGDroid is shown in Fig. 3. It mainly includes three parts.

(1) Function call graph building and pruning: We first unzip, decompile and build the FCGs from the APKs using Androguard (Androguard, 2022). Some graphs may have a large number of nodes,

which would increase the complexity of graph learning. To simplify the graphs while preserving the semantic knowledge of malicious behavior, we propose a graph pruning method that retains the nodes in the function running paths with sensitive APIs. The sensitive APIs used for building sensitive FCGs are based on the mappings of APIs and permissions reported by PScout (Au et al., 2012). There are 21,986 sensitive APIs (Wu, Li, et al., 2019).

(2) Node representation: We transform the API and opcode sequence into a vector for each node to obtain the node attributes. We use a centrality measure to weight the node vectors to improve the representation ability of node vectors. Centrality measures (Wasserman & Faust, 1994) are widely used in social network studies to denote the importance of nodes to some extent. To the best of our knowledge, this is the first work that utilizes a centrality measure to weight the node feature vectors used for a GNN.

(3) Function call graph learning: On the FCGs associated with node feature vectors, we utilize GraphSAGE (Hamilton et al., 2017) to perform graph embedding. GraphSAGE embeds each graph node by iteratively learning the knowledge from the corresponding neighbor nodes. All graph node vectors are combined into a single vector by the readout. We then train a malware detection model on the obtained feature vectors using machine learning algorithms.

Fig. 2. The CDFs of the sensitive API ratios in malware and benign apps.

Fig. 3. The framework of SeGDroid.

3.3. Function call graph pruning

A graph with a large number of nodes increases the time consumption of graph learning. Graph pruning is a way to decrease the number of nodes in a graph. In malware detection, it is important to preserve malicious behavior-related nodes. The sensitive APIs are correlated with malicious behavior (Wu, Li, et al., 2019). In addition, the context of sensitive APIs is also helpful for malware detection. Therefore, this paper proposes a sensitive API-based graph pruning method. It aims at preserving the sensitive APIs and their context. The pruned graph is called the sensitive FCG. The sensitive node and sensitive FCG are respectively defined as below.

Definition 2 (Sensitive node): Among the nodes in an FCG, a node whose function matches a sensitive API is defined as a sensitive node, i.e., $\{S_{sv} \mid S_{sv} \in S_{api}, S_{sv} \in \mathcal{V}\}$. $S_{sv}$ denotes the set of sensitive nodes and $S_{api}$ denotes the sensitive API set.

Definition 3 (Sensitive function call graph): The sensitive function call graph is a subgraph of an FCG, $SG = \{\mathcal{V}', \mathcal{E}' \mid \mathcal{V}' \in \mathcal{N}(S_{sv})\}$, where $\mathcal{N}(S_{sv})$ is the set of all neighbors that can reach the sensitive nodes in $S_{sv}$.

Graph pruning is shown in Algorithm 1. It aims to preserve the sensitive API-related nodes and leave out the remaining ones. Fig. 4 is an example of graph pruning. The details of Algorithm 1 are as follows.

(1) Lines 1 to 5 search for the sensitive nodes, obtaining a set $S_{sv}$ that includes all sensitive nodes in the graph. This corresponds to step 1 in Fig. 4, acquiring $S_{sv} = \{v_8\}$. The node $v_8$ is highlighted in yellow.

(2) Lines 6 to 8 search the neighbors of the nodes in $S_{sv}$ in the upward direction of the graph, i.e., all the ancestor nodes of the sensitive nodes up to the entry point. This step obtains the ancestor node set of the sensitive nodes, denoted by $S_{sav}$. This corresponds to step 2 in Fig. 4, acquiring $S_{sav} = \{v_2, v_4, v_1\}$.

(3) Lines 9 to 11 further search the descendant nodes of the nodes in $S_{sav}$. For each node in $S_{sav}$, the search collects all the descendant nodes down to the leaf nodes in the downward direction of the graph. This step acquires the descendant node set, denoted by $S_{sdv}$. This corresponds to step 3 in Fig. 4, acquiring $S_{sdv} = \{v_2, v_3, v_4, v_5, v_6, v_7, v_8, v_9, v_{10}\}$.

(4) Line 12 removes the nodes, and their correlated edges, that are in neither $S_{sdv}$ nor $S_{sav}$. This corresponds to step 4 in Fig. 4. According to the obtained node set (the union of $S_{sdv}$ and $S_{sav}$), $\{v_1, v_2, v_3, v_4, v_5, v_6, v_7, v_8, v_9, v_{10}\}$, we preserve those nodes and remove all other nodes of the graph in Fig. 4. As a result, we acquire the simplified graph used for graph embedding.

The graph pruning method has the following four advantages. (1) It reduces the training resource consumption of graph embedding by decreasing the number of nodes. (2) It handles the node imbalance problem among apps because it significantly reduces the number of nodes in large graphs. (3) It strengthens the effect of the sensitive nodes in graph embedding. We implement graph readout on all nodes' vectors and acquire the final vector for a graph. The effect of the sensitive nodes in the readout is strengthened by removing the nodes that are not correlated with sensitive nodes. (4) It preserves the context of invoking sensitive APIs, because it retains all function running paths passing through the sensitive nodes.
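The pruning procedure of Section 3.3 can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the FCG is assumed to be given as a list of (caller, callee) pairs, the helper names `reachable` and `prune_fcg` and all function names in the toy graph are hypothetical, and the sensitive nodes are additionally used as seeds of the descendant search so that a sensitive node without any ancestor is still kept.

```python
from collections import deque


def reachable(start_nodes, adjacency):
    """Return all nodes reachable from start_nodes by following adjacency (BFS)."""
    seen = set()
    queue = deque(start_nodes)
    while queue:
        node = queue.popleft()
        for nxt in adjacency.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def prune_fcg(edges, sensitive_apis):
    """Prune an FCG given as (caller, callee) pairs; return (kept_nodes, kept_edges)."""
    edges = list(edges)
    callees, callers, nodes = {}, {}, set()
    for caller, callee in edges:
        callees.setdefault(caller, set()).add(callee)
        callers.setdefault(callee, set()).add(caller)
        nodes.update((caller, callee))

    s_sv = nodes & set(sensitive_apis)         # sensitive nodes present in the graph
    s_sav = reachable(s_sv, callers)           # ancestors, up to the entry points
    s_sdv = reachable(s_sav | s_sv, callees)   # descendants, down to the leaves
    kept = s_sv | s_sav | s_sdv                # nodes to preserve
    kept_edges = [(u, v) for u, v in edges if u in kept and v in kept]
    return kept, kept_edges


# Toy FCG (hypothetical names): only the onCreate path touches a sensitive API.
toy_edges = [
    ("onCreate", "collectInfo"),
    ("collectInfo", "getDeviceId"),
    ("collectInfo", "writeLog"),
    ("onReceive", "showBanner"),
    ("showBanner", "loadImage"),
]
kept_nodes, kept_edges = prune_fcg(toy_edges, {"getDeviceId"})
print(sorted(kept_nodes))
```

In this toy graph the path starting at onReceive never reaches a sensitive API, so its three nodes are dropped, while writeLog is kept because it is a descendant of an ancestor of the sensitive node. Representing the graph with two adjacency maps (callers and callees) keeps both searches linear in the number of edges.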

Fig. 4. The example of FCG pruning process.

Algorithm 1 Function call graph pruning algorithm
Input: an FCG $\mathcal{G} = (\mathcal{V}, \mathcal{E})$, sensitive API set $S_{api}$
Output: a sensitive FCG $\mathcal{G}' = (\mathcal{V}', \mathcal{E}')$
1: for $v_i$ in $\mathcal{V}$ do
2:     if $v_i \in S_{api}$ then
3:         $S_{sv} \leftarrow S_{sv} \cup \{v_i\}$
4:     end if
5: end for
6: for $sv_i$ in $S_{sv}$ do
7:     addAncestor($sv_i$, $S_{sav}$, $\mathcal{G}$)
8: end for
9: for $sav_i$ in $S_{sav}$ do
10:     addDescendant($sav_i$, $S_{sdv}$, $\mathcal{G}$)
11: end for
12: $\mathcal{G}'$ = obtainPrunedGraph($\mathcal{G}$, $S_{sdv}$, $S_{sav}$)

3.4. Node representation

3.4.1. The node embedding based on word2vec

This section aims at learning a feature vector for each node in an FCG while considering the semantic information of the function in each node. To represent the function calls, we handle the internal and external functions differently. This is because the internal functions are usually those defined by the programmers, whose function names are easily obfuscated. Instead of using function names, we utilize the opcode sequence to represent the internal function. External functions, which are usually the APIs of existing program libraries, may also change when the libraries are updated. The package names are more stable than the function names. Therefore, the API package names are used to represent the external nodes.

To learn the semantics of APIs and opcode sequences, we train one embedding model for each based on the word2vec algorithm, obtaining the API2vec and opcode2vec models. To build the API2vec model, the API packages are collected to form the corpus. Word2vec is applied to acquire the feature vector for each word in the package. The average of the vectors of all words in a package is the feature vector for an external node. The package names carry the semantic knowledge of APIs. According to the principle of high cohesion in software design, the APIs in the same package may have similar usage purposes, so their feature vectors should be close.

To build the opcode2vec model, the opcode corpus is first built from the opcode sequences in the training set. An opcode is handled as a word. The opcode2vec model is trained on the opcode corpus. For a node with an opcode sequence, the average over the vectors of its opcodes is used as the vector for the opcode sequence. The surrounding opcodes of an opcode in the sequence represent its usage context. Opcodes that appear in similar usage contexts should have close vectors (Khan et al., 2022).

Word2vec is an unsupervised NLP technique that generates context-aware embeddings for words (Gao et al., 2021; Mikolov et al., 2013a). Word2vec contains two models: Skip-gram and CBOW (Rong, 2014). Empirically, Skip-gram performs better for infrequent words (Gao et al., 2021). We adopt Skip-gram for node embedding because sensitive APIs are not invoked frequently.

Next, taking API2vec as an example, we further illustrate the process of node embedding. Since the self-defined functions are easily obfuscated by changing their names, we only handle the system APIs. On the system API corpus collected from the apps in the training set, we train the API2vec model shown in Fig. 5.

Skip-gram utilizes a fixed-size sliding window that moves over texts with multiple words to generate the training samples. The objective of training is to update the word embeddings so as to predict the surrounding context words. Given the words in an API package, $a_1, a_2, \ldots, a_K$, and a window size of $2m+1$, the model maximizes the average log probability

$$J(a) = \frac{1}{K} \sum_{t=1}^{K} \sum_{-m \le j \le m} \log P(a_{t+j} \mid a_t) \tag{1}$$

$P(a_{t+j} \mid a_t)$ is defined as

$$P(a_{t+j} \mid a_t) = \frac{\exp\left(\mathbf{V}_{a_t}^{\mathsf{T}} \mathbf{V}_{a_{t+j}}\right)}{\sum_{i=1}^{L} \exp\left(\mathbf{V}_{a_t}^{\mathsf{T}} \mathbf{V}_{a_i}\right)} \tag{2}$$

where $\mathbf{V}_{a_t}$ and $\mathbf{V}_{a_{t+j}}$ are the embeddings of $a_t$ and $a_{t+j}$, respectively, and $L$ is the size of the vocabulary. However, this formulation is expensive to optimize because the number of parameters to be updated is very high when the vocabulary is very large. In practice, negative sampling (Mikolov et al., 2013b) and hierarchical softmax (Mikolov et al., 2013) are used to decrease the resource consumption and to improve the embedding quality.

In an FCG, each node is represented by the concatenation of the feature vectors obtained by API2vec and opcode2vec. For an external node $v_i \in \mathcal{V}$, $\mathbf{V}_a^i$ is a vector obtained by API2vec, and $\mathbf{V}_o^i$ is a zero vector. For an internal node $v_j \in \mathcal{V}$, $\mathbf{V}_o^j$ is a vector obtained by opcode2vec, and $\mathbf{V}_a^j$ is a zero vector. Therefore, the vector $\mathbf{V}^i$ of a node $v_i$ is formulated as

$$\mathbf{V}^i = [\mathbf{V}_a^i, \mathbf{V}_o^i] \tag{3}$$

where $\mathbf{V}^i \in \mathbb{R}^{(|\mathbf{V}_a^i| + |\mathbf{V}_o^i|)}$; $\mathbf{V}_a^i = \mathrm{mean}(\mathbf{V}_{a_1}^i, \mathbf{V}_{a_2}^i, \ldots, \mathbf{V}_{a_k}^i)$ for the external node; $\mathbf{V}_o^i = \mathrm{mean}(\mathbf{V}_{o_1}^i, \mathbf{V}_{o_2}^i, \ldots, \mathbf{V}_{o_q}^i)$ for the internal node; $a_i$ denotes the $i$th word in an API package; and $o_i$ denotes the $i$th opcode in an opcode sequence.

3.4.2. Feature vector weighting based on the centrality metric

Considering the importance of different functions, we introduce the centrality concept into FCG-based malware detection. To the best of our knowledge, this is the first paper to utilize the centrality in

Fig. 5. The architecture of API2vec model.

Fig. 6. The architecture of graph convolutional network model.

graph learning-based Android malware detection. The centrality concepts were first devised in social network analysis and used to measure the importance of a node in the network (Wu, Li, et al., 2019). Centrality analysis has been successfully utilized in different areas (e.g. program dependency networks (Wu et al., 2022), transportation networks (Guimerà et al., 2005)). Different types of centrality have been proposed to quantify the importance of a node in a network from different aspects, such as degree centrality (Freeman, 1978), closeness centrality (Freeman, 1978), EigenCentrality (Newman, 2010), and others. According to our empirical experiments, we apply degree centrality as the weight for each node's vector because it is efficient and effective. It is defined in Eq. (4), where $deg(i)$ denotes the degree of the $i$th node and $n$ denotes the number of nodes in a graph.

$$d_i = \frac{deg(i)}{n - 1} \quad (4)$$

Finally, the vector of a node is represented by

$$\mathbf{h}_i = d_i \ast \mathbf{V}_i \quad (5)$$

where $\mathbf{V}_i$ is the vector obtained by API2vec and opcode2vec, and $\mathbf{h}_i$ is the final vector obtained by node representation in this paper.

3.5. Function call graph learning

After building the sensitive FCGs with the nodes associated with vectors, we further transform the graphs into vectors by a graph embedding algorithm. GNN (Kipf & Welling, 2017) embeds nodes of graphs while considering the topological information of the graph. Different kinds of GNN algorithms have been proposed. In Vinayaka and Jaidhar (2021), the authors compared GCN (Zhang et al., 2022), GraphSAGE (Hamilton et al., 2017), DotGAT (Velickovic et al., 2017), and TAG (Du et al., 2017), and the results show that GraphSAGE performs the best in malware detection. In this paper, we utilize GraphSAGE for graph embedding.

The main structure of the GraphSAGE-based neural network is shown in Fig. 6. Two convolutional layers are used to learn the latent representations of FCGs. The vector obtained by readout is fed into a fully connected layer, which is followed by an output layer. The output predicts the class label of an unknown app. GraphSAGE computes the node embeddings by aggregating the features of neighboring nodes (Lo et al., 2022), and iteratively updates the node vectors with the objective of minimizing the cross-entropy loss of predicting the class label of an app. The vector of the $i$th node at layer $(l+1)$ is formulated as

$$\mathbf{h}_i^{(l+1)} = \sigma\left(\mathbf{W}^{(l)} \, \mathrm{concat}\left(\mathbf{h}_i^{(l)}, \mathbf{h}_{\mathcal{N}(i)}^{(l+1)}\right)\right) \quad (6)$$

$$\mathbf{h}_{\mathcal{N}(i)}^{(l+1)} = \mathrm{aggregate}\left(\mathbf{h}_j^{(l)}, \forall j \in \mathcal{N}(i)\right) \quad (7)$$

$$\mathbf{h}_i^{(l+1)} = \mathrm{norm}\left(\mathbf{h}_i^{(l+1)}\right) \quad (8)$$

$$\mathrm{norm}(\mathbf{h}) = \frac{\mathbf{h}}{\|\mathbf{h}\|_2} \quad (9)$$

$$\mathbf{h} = \frac{1}{|\mathcal{V}|} \sum_{v \in \mathcal{V}} \mathbf{h}_v \quad (10)$$

where $\mathbf{W}^{(l)}$ is the weight matrix, aggregate is the mean of the representations of neighboring nodes, $\sigma$ is the activation function (i.e. ReLU), and norm is the normalization function of the new node representation. $\mathbf{h}$ is the vector of a graph obtained by readout, i.e. the average over the vectors of all nodes.

3.6. Model explanation

Inspired by the visualization work in the vulnerability detection research field (Wu et al., 2022), we visualize the node importance of the FCGs in malware detection. The visualization helps to understand graph learning-based malware detection.

After graph embedding, each node is represented by a vector with semantic and topology knowledge. This paper utilizes mean readout for acquiring a vector for each graph. The relation between the node vector and graph vector is shown in Fig. 7. The graph vector is formulated as

$$\mathbf{h} = [g_1, g_2, \cdots, g_k]^{\mathsf{T}} = \frac{1}{n} \begin{bmatrix} x_1^1 + \cdots + x_1^n \\ \vdots \\ x_k^1 + \cdots + x_k^n \end{bmatrix} \quad (11)$$

where $x_i^j$ denotes the $i$th feature of the $j$th node, $k$ is the number of features after graph embedding, and $n$ is the number of nodes in a graph.

The output $\hat{y}$ of a sample is formulated as

$$\hat{y} = \mathbf{W} \ast \mathbf{H} + b \quad (12)$$

where $\mathbf{W}$ denotes the weight vector learned by training the graph neural network, as shown in Fig. 7, and $\mathbf{H}$ is the graph vector of Eq. (11).

This could be further detailed as:

$$\hat{y} = [w_1, w_2, \ldots, w_k] \ast [g_1, g_2, \ldots, g_k]^{\mathsf{T}} + b = [w_1, w_2, \ldots, w_k] \ast \frac{1}{n} \begin{bmatrix} x_1^1 + \cdots + x_1^n \\ \vdots \\ x_k^1 + \cdots + x_k^n \end{bmatrix} + b$$
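The node weighting of Eqs. (4) and (5) and the GraphSAGE update of Eqs. (6) to (10) can be illustrated with a small pure-Python sketch. It is not the authors' DGL implementation: the graph is treated as an undirected adjacency list, isolated nodes receive a zero neighbour vector, and the weight matrix `W` is a hand-picked toy value. These choices and all function names are assumptions made for illustration.

```python
import math

def mean_vec(vectors):
    """Element-wise mean of equally sized vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def degree_centrality(adj):
    """Eq. (4): d_i = deg(i) / (n - 1); adj is an undirected adjacency list (assumption)."""
    n = len(adj)
    return [len(neigh) / (n - 1) for neigh in adj]

def weight_by_centrality(adj, V):
    """Eq. (5): h_i = d_i * V_i."""
    d = degree_centrality(adj)
    return [[di * x for x in vi] for di, vi in zip(d, V)]

def l2_normalize(h):
    """Eq. (9): h / ||h||_2; a zero vector is returned unchanged (assumption)."""
    norm = math.sqrt(sum(x * x for x in h))
    return h if norm == 0 else [x / norm for x in h]

def sage_mean_layer(adj, H, W):
    """Eqs. (6)-(8): mean-aggregate neighbours, concatenate, linear map, ReLU, normalise."""
    out = []
    for i, neigh in enumerate(adj):
        # Eq. (7): mean of the neighbour vectors; zeros for isolated nodes (assumption).
        agg = mean_vec([H[j] for j in neigh]) if neigh else [0.0] * len(H[i])
        z = H[i] + agg  # list concatenation: concat(h_i, h_N(i))
        # Eq. (6): one output unit per row of W, followed by ReLU.
        h_new = [max(0.0, sum(w * x for w, x in zip(row, z))) for row in W]
        out.append(l2_normalize(h_new))  # Eq. (8)
    return out

def mean_readout(H):
    """Eq. (10): the graph vector is the mean of all node vectors."""
    return mean_vec(H)

# Toy 3-node path graph 0 - 1 - 2 with 2-dimensional node vectors (made-up values).
adj = [[1], [0, 2], [1]]
V = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
W = [[1.0, 0.0, 1.0, 0.0], [0.0, 1.0, 0.0, 1.0]]  # 2 outputs x 4 concatenated inputs

H0 = weight_by_centrality(adj, V)
H1 = sage_mean_layer(adj, H0, W)
graph_vector = mean_readout(H1)
```

In practice the weight matrices are learned by minimizing the cross-entropy loss, and a library such as DGL would replace the explicit loops.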


Table 1
The number of samples in each class of MalRadar dataset.
Families #apps Families #apps Families #apps
KBuster 54 FAKEBANK 80 GhostClicker 181
ZNIU 59 Lucy 80 HiddenAd 287
SpyNote 63 GhostCtrl 109 LIBSKIN 240
Joker 72 EventBot 124 Xavier 589
FakeSpy 74 MilkyDoor 208 RuMMS 795

Fig. 7. Relation among the node vector, graph vector and weight vector.

$$\hat{y} = \frac{1}{n}\left(w_1 \ast (x_1^1 + \cdots + x_1^n) + \cdots + w_k \ast (x_k^1 + \cdots + x_k^n)\right) + b = \frac{1}{n}\left((w_1 \ast x_1^1 + w_2 \ast x_2^1 + \cdots + w_k \ast x_k^1) + \cdots + (w_1 \ast x_1^n + w_2 \ast x_2^n + \cdots + w_k \ast x_k^n)\right) + b \quad (13)$$

According to Eq. (13), $(w_1 \ast x_1^j + w_2 \ast x_2^j + \cdots + w_k \ast x_k^j)$ denotes the contribution of the $j$th node to the prediction. The higher the value is, the larger the contribution is for malware detection. Therefore, we take the value of $w_i \ast x_i^j$ to denote the importance of the $i$th feature of the $j$th node.

In an FCG, the importance value for each node is calculated in the following three steps.
Step 1: Obtain the weight vector $\mathbf{W} = [w_1, w_2, \ldots, w_k]$ at the output layer.
Step 2: Extract the vector of each node $\mathbf{h}_i = [x_1^i, x_2^i, \ldots, x_k^i]$ $(i = 1, \ldots, n)$ by performing graph embedding.
Step 3: Calculate the importance vector $[w_1 \ast x_1^i, w_2 \ast x_2^i, \ldots, w_k \ast x_k^i]$ $(i = 1, \ldots, n)$ for each node.

4. Experiments

4.1. Datasets

In our experiments, two publicly shared benchmark datasets are applied. They are introduced below.

4.1.1. CICMal2020 dataset
The samples in CICMal2020 (Mahdavifar et al., 2022) are provided by the CIC institute and can be downloaded from Mahdavifar et al. (2022). The Android apks are from several sources, including VirusTotal service, Contagio security blog, AMD and MalDozer. These malware samples were collected from December 2017 to December 2018. They are from the five categories of Benign, Adware, Banking malware, SMS malware and Riskware. There are respectively 4043, 1511, 2282, 4821 and 3938 apks in the five categories.

4.1.2. MalRadar dataset
MalRadar (Wang et al., 2022) is a growing and up-to-date Android malware dataset. The family labels of some samples have been provided by this dataset. Therefore, we perform the family classification experiments on this dataset. Some malware families contain only a few samples, which are not sufficient for training the graph embedding model. Therefore, in the following experiments, the 15 families with the highest number of samples are chosen for experiments. The details of the MalRadar dataset are shown in Table 1. Since there are only malware samples in this dataset, we further downloaded benign samples from AndroZoo. AndroZoo is also an up-to-date dataset. We downloaded the apks with a VTScan timestamp in the year 2022. To balance the number of samples between the benign class and the family classes in MalRadar, we randomly selected about one thousand apks with the benign label from AndroZoo. The benign samples (1024 apks) are combined with the malware samples of MalRadar for the comparison experiments.

4.2. Experiments

To evaluate the performance of SeGDroid, we carry out experiments from the following four aspects.
(1) Ablation experiment: We evaluate the performance of different parts in SeGDroid, specifically analyzing whether graph pruning and node representation are able to improve malware detection performance.
(2) Graph pruning experiment: We check if graph pruning is able to decrease the node imbalance ratio and to decrease the graph learning time.
(3) Comparison experiment: We compare SeGDroid with related works.
(4) Discussion experiment: We mainly discuss the performance of SeGDroid using different graph learning algorithms.

On each dataset, 80% is used as the training set and 20% as the testing set. Within the training set, 80% is used for training the model and 20% for validation. All experiments are performed on a server with the following environment: (1) operating system: Linux-3.10.0-957.el7.x86_64-x86_64-with-centos-7.6.1810-Core; (2) GPU: Tesla V100-PCIE-32 GB. We adopt Androguard to build the FCGs and extract the APIs and opcodes from APKs. The DGL library (Wang et al., 2019) is used for implementing graph learning algorithms.

Our graph pruning method aims at building sensitive FCGs. Using graph pruning, the pruned graph may only have a few nodes if the corresponding complete graph originally has a small number of nodes. To preserve nodes for the small graphs, we utilize a threshold $\lambda$ to check whether graph pruning is performed on a graph or not. The threshold $\lambda$ is empirically set as 8000. That is, if the number of nodes of an FCG is higher than 8000, we implement graph pruning; otherwise, we do not perform graph pruning on it. The hyperparameters of GraphSAGE are set as follows: (1) the number of convolutional layers: 2; (2) the number of nodes in each layer: [64, 32]; (3) learning rate: 0.001; (4) optimizer: Adam; and (5) the number of epochs: 100.

In the following experiments, the accuracy, F-score, recall, and precision metrics are applied to evaluate the performance of different methods. Acc. denotes the accuracy; Prec.(m), Rec.(m) and F-score(m) respectively denote the malware class's precision, recall and F-score; Prec.(b), Rec.(b) and F-score(b) respectively denote the benign class's precision, recall and F-score. The best performance is highlighted in bold in the tables of experimental results.

4.2.1. Ablation experiment
Our work's contributions mainly include graph pruning (denoted by P) and node representation, which includes node embedding (denoted by E) and vector weighting with centrality (denoted by W) parts. To analyze the contributions of the three parts, we carried out experiments to evaluate the performance of the variant models of SeGDroid. These variant models are described below. The symbol "−" means removing. For example, SeGDroid-P-E-W means that the P, E and W parts are removed from the original SeGDroid.
(1) SeGDroid-P-E-W: this model does not apply the three parts of P, E and W. That is, the input data are the complete graphs associated with the raw features for nodes (the occurrence of APIs and opcodes used in Vinayaka and Jaidhar (2021)).


Fig. 8. The training loss and validation loss of variant models.

(2) SeGDroid-P-W: this model only adopts the E part. That is, the input data are the complete graphs associated with node features obtained by our node embedding method.
(3) SeGDroid-P: this model adopts the E and W parts. That is, the input data are the complete graphs associated with node features obtained by our node representation method.
(4) SeGDroid-E-W: this model adopts the P part. That is, the input data are the pruned graphs associated with the raw features for nodes.
(5) SeGDroid-W: this model adopts the P and E parts. That is, the input data are the pruned graphs associated with node feature vectors obtained by our node embedding method.

The training and validation losses of the variant models are shown in Fig. 8. Fig. 8(a) shows that SeGDroid obtains the lowest training loss. The models that adopt graph pruning achieve a lower training loss (approximately 0.064 at best among the models taking pruned graphs as input) than those with complete graphs (approximately 0.074 at best among the models taking complete graphs as input). Fig. 8(b) shows that SeGDroid obtains the lowest validation loss (approximately 0.09). We can see that the validation loss of SeGDroid is much smaller than that of SeGDroid-W at the first epoch, and that the validation loss of SeGDroid is more stable than that of SeGDroid-W in later epochs. This demonstrates that the feature vector weighting with centrality is able to accelerate the convergence of graph learning.

The malware detection performance of different models in terms of accuracy, precision, recall and F-score is shown in Table 2. The experimental results are analyzed from the following three aspects.
(1) When analyzing the performance of the graph pruning of SeGDroid, we compare the two models in each pair of (SeGDroid-P-E-W, SeGDroid-E-W), (SeGDroid-P-W, SeGDroid-W), and (SeGDroid-P, SeGDroid). Between the two models in each pair, the first one does not implement graph pruning, but the second one does. According to Table 2, the model that implements graph pruning can further decrease the loss and improve the malware detection performance in most cases. For example, when compared with SeGDroid-P, SeGDroid improves the F-score of the malware class from 98.39% to 98.71%, and improves the F-score of the benign class from 95.09% to 96.11%.
(2) When analyzing the performance of the API and opcode embedding part, we compare the two models in each pair of (SeGDroid-P-E-W, SeGDroid-P-W) and (SeGDroid-E-W, SeGDroid-W). The results show that the loss of SeGDroid-P-W is less than that of SeGDroid-P-E-W. In addition, the malware F-score of SeGDroid-W (98.46%) is higher than that of SeGDroid-E-W (98.13%).
(3) When analyzing the performance of vector weighting with centrality, we compare the two models in each pair of (SeGDroid-P-W, SeGDroid-P) and (SeGDroid-W, SeGDroid). We observed that SeGDroid outperforms SeGDroid-W in terms of accuracy, precision and F-score. Similarly, when using the complete graph, the model that relies on the centrality measure (SeGDroid-P) outperforms the model without any centrality measure (SeGDroid-P-W).

4.2.2. Graph pruning experiment
This paper proposes a graph pruning method based on sensitive APIs. It aims to preserve the context of invoking sensitive APIs and reduce the number of nodes in a graph. The above section shows that graph pruning is able to improve malware detection performance. This section further evaluates the performance of graph pruning in terms of decreasing node imbalance and decreasing model training time.

Regarding handling the node imbalance, we compare the number of nodes before and after graph pruning, as shown in Fig. 9. Benign-A denotes the results after graph pruning, and Benign-B denotes the results before pruning the graph. The meaning of the other x-axis tick labels is similar. It shows that the number of nodes of the benign samples


Fig. 9. The number of nodes before and after graph pruning.

Table 2
The malware detection results of variant models.
Methods Acc. Prec.(m) Rec.(m) F-score(m) Prec.(b) Rec.(b) F-score(b)
SeGDroid-P-E-W 0.9715 0.9808 0.9816 0.9812 0.9423 0.9399 0.9411
SeGDroid-P-W 0.9732 0.9832 0.9815 0.9824 0.9432 0.9482 0.9457
SeGDroid-P 0.9757 0.9874 0.9804 0.9839 0.9408 0.9612 0.9509
SeGDroid-E-W 0.9715 0.9732 𝟎.𝟗𝟖𝟗𝟔 0.9813 𝟎.𝟗𝟔𝟓𝟕 0.9149 0.9396
SeGDroid-W 0.9768 0.9841 0.9852 0.9846 0.9540 0.9054 0.9522
SeGDroid 𝟎.𝟗𝟖𝟎𝟕 𝟎.𝟗𝟗𝟑𝟏 0.9812 𝟎.𝟗𝟖𝟕𝟏 0.9439 𝟎.𝟗𝟕𝟗𝟎 𝟎.𝟗𝟔𝟏𝟏

significantly decreased. Before graph pruning, the benign samples have approximately 46,795 nodes on average. After graph pruning, the benign samples have approximately 18,315 nodes on average. The largest node count among benign samples decreases from 250,000 to 120,000. The node imbalance ratio decreases from 10.3 to 4.2. The node imbalance ratio is calculated as the ratio between the number of nodes in benign samples and malware samples.

Regarding decreasing the time consumption of graph learning, we conduct experiments to compare the time consumption of training the GNN on the data with and without graph pruning. The time consumption is shown in Fig. 10. We mainly compare the following three pairs of models: (SeGDroid-P-E-W, SeGDroid-E-W), (SeGDroid-P-W, SeGDroid-W), and (SeGDroid-P, SeGDroid). In each pair, the only difference between the two models is whether graph pruning is used. Fig. 10 shows that graph pruning can further decrease the time consumption of training the graph learning model. This is because the number of nodes is significantly decreased after graph pruning.

Next, we further analyze the performance of graph pruning with different parameter values. We adopt a threshold 𝜆 in graph pruning to preserve the node information for apps with a small number of nodes. If the number of nodes in an app is higher than 𝜆, it will be handled by graph pruning. The malware detection performance of SeGDroid with different thresholds in graph pruning is shown in Table 3. The smaller the value of 𝜆, the more graphs will be pruned. The results show no trend in which a smaller value of 𝜆 yields better performance. When further analyzing the results, we found that graph pruning may decrease the performance on graphs with a small number of nodes. The model with 𝜆 = 8000 performs the best in terms of accuracy and F-score. Therefore, we empirically set 𝜆 as 8000 in our experiments.

Table 3
The results of SeGDroid with different thresholds of graph pruning.
𝜆 Acc. Prec.(m) Rec.(m) F-score(m) Prec.(b) Rec.(b) F-score(b)
2000 0.9782 0.9817 𝟎.𝟗𝟖𝟗𝟔 0.9856 𝟎.𝟗𝟔𝟔𝟕 0.9426 0.9545
4000 0.9765 0.9907 0.9780 0.9843 0.9345 0.9715 0.9527
6000 0.9780 0.9864 0.9844 0.9854 0.9520 0.9579 0.9550
8000 𝟎.𝟗𝟖𝟎𝟕 𝟎.𝟗𝟗𝟑𝟏 0.9812 𝟎.𝟗𝟖𝟕𝟏 0.9439 𝟎.𝟗𝟕𝟗𝟎 𝟎.𝟗𝟔𝟏𝟏
10000 0.9780 0.9822 0.9888 0.9855 0.9646 0.9442 0.9543
12000 0.9749 0.9844 0.9824 0.9834 0.9458 0.9517 0.9487

4.2.3. Comparison experiments
(1) Results on CIC dataset
The novel aspect of SeGDroid is the feature learning based on sensitive FCGs. This section mainly compares the feature vectors obtained by SeGDroid with those obtained by previous works, including Permission, MaMaDroid (Onwuzurike et al., 2019), MalScan (Wu, Li, et al., 2019) and GraphSAGE-Occ (Vinayaka & Jaidhar, 2021). Permission is the feature set of required permissions. MalScan and MaMaDroid are generally used with machine learning algorithms for malware detection. MalScan achieves better performance when it applies 1NN (1 Nearest Neighbor) according to the results in Wu, Li, et al. (2019). MaMaDroid performs better when it adopts random forest according to the results in Onwuzurike et al. (2019). Our empirical results show that Permission achieves better performance when it utilizes 3NN (3 Nearest Neighbor), and that SeGDroid performs better when it adopts random forest. Therefore, the selected machine learning algorithms are 3NN, random forest, 1NN and random forest for Permission, MaMaDroid, MalScan and SeGDroid, respectively. In SeGDroid, we train a classification model on the vectors obtained by graph embedding. The binary classification results on the CIC dataset are shown in Table 4.
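The threshold rule and the node imbalance ratio described in Section 4.2.2 reduce to two small computations, sketched below. The function names are hypothetical, the ratio is assumed to be taken over per-class average node counts, and the node counts in the usage lines are made-up toy values rather than numbers from the datasets.

```python
from statistics import mean

PRUNE_THRESHOLD = 8000  # lambda, the value selected empirically in the text

def should_prune(num_nodes, lam=PRUNE_THRESHOLD):
    """Graph pruning is applied only to graphs with more than lam nodes."""
    return num_nodes > lam

def node_imbalance_ratio(benign_node_counts, malware_node_counts):
    """Average benign node count divided by average malware node count (assumed definition)."""
    return mean(benign_node_counts) / mean(malware_node_counts)

# Toy node counts per app (illustrative only).
benign_counts = [100, 300]
malware_counts = [50, 50]

ratio = node_imbalance_ratio(benign_counts, malware_counts)
prune_flags = [should_prune(n) for n in (7999, 8000, 8001)]
```

A smaller threshold sends more graphs through pruning, which is the behaviour varied in Table 3.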


Fig. 10. The training time of different models.

Table 4
Binary classification results obtained by different methods.
Methods Acc. Prec.(m) Rec.(m) F-score(m) Prec.(b) Rec.(b) F-score(b)
Permission 0.9470 0.8380 0.9690 0.8987 𝟎.𝟗𝟖𝟗𝟓 0.9400 0.9641
MaMaDroid 0.9645 0.9717 0.9817 0.9767 0.9410 0.9107 0.9256
MalScan 0.9789 0.9857 0.9865 0.9861 0.9577 0.9553 0.9565
GraphSAGE-Occ 0.9049 0.9357 0.8696 0.9014 0.8782 0.9402 0.9081
SeGDroid 𝟎.𝟗𝟖𝟑𝟕 𝟎.𝟗𝟖𝟖𝟗 𝟎.𝟗𝟖𝟗𝟔 𝟎.𝟗𝟖𝟗𝟐 0.9677 𝟎.𝟗𝟔𝟓𝟑 𝟎.𝟗𝟔𝟔𝟓

MaMaDroid is also based on the FCG for extracting the feature vectors. MalScan also weights features through centrality measures after combining all sensitive APIs. GraphSAGE-Occ utilizes GraphSAGE to learn the feature vector from the FCG, in which the occurrence of APIs and opcodes represents each node. In addition, GraphSAGE-Occ includes a node balance method with the objective of balancing the number of nodes between benign and malware samples. The source code of these methods is publicly shared, and we implement the experiments based on it.

Table 4 shows that SeGDroid performs the best. It obtains 98.37% accuracy, 98.92% F-score for the malware class and 96.65% F-score for the benign class. Compared with GraphSAGE-Occ, it raises the accuracy from 90.49% to 98.37%. We also found that the node balance in GraphSAGE-Occ decreases the malware detection performance. GraphSAGE-Occ without node balance is the same as SeGDroid-P-E-W, as shown in Table 2. SeGDroid-P-E-W utilizes the complete graph, and its nodes are characterized by the occurrence of APIs and opcodes, which is the same as in Vinayaka and Jaidhar (2021). This implies that the node balance method that removes samples may decrease malware detection performance. MalScan also utilizes centrality for malware detection, but without using graph pruning for simplifying the graphs or word2vec for semantic node representation. The results in Table 4 show that SeGDroid also outperforms MalScan in malware detection, especially in terms of the F-score of the benign class.

Concerning category classification on the CIC dataset, the classification results are shown in Fig. 11. Similarly, SeGDroid also performs the best among these methods. SeGDroid obtains 95.29% accuracy. It achieves 96.26%, 94.44%, 89.99%, 93.53%, and 98.65% F-score for the Benign, Adware, Banking, Riskware and SMS classes, respectively. When compared with Permission, MaMaDroid, MalScan and GraphSAGE-Occ, SeGDroid improves the F-score by about 6.94%, 11.68%, 0.37% and 4.37% on average, respectively.

(2) Results on MalRadar dataset
To evaluate our method in more cases, we further perform experiments on the MalRadar dataset. Similarly, the binary classification and multiclass classification results are discussed in this section. SeGDroid is again compared with Permission, MaMaDroid (Onwuzurike et al., 2019), MalScan (Wu, Li, et al., 2019) and GraphSAGE-Occ (Vinayaka & Jaidhar, 2021).

To carry out the binary classification experiment, all malware samples shown in Table 1 are combined into the malware class. The benign samples from AndroZoo form the benign class. The binary classification results are shown in Table 5. The results show that SeGDroid obtains 98.13% accuracy, 98.75% F-score for the malware class and 96.33% F-score for the benign class. It obtains comparable performance when compared with MalScan. It outperforms GraphSAGE-Occ, which also utilizes a graph learning method for Android malware detection.

To further analyze the performance of our method on fine-grained malware detection (i.e. malware family identification), we perform multiclass classification on the MalRadar dataset; the results are shown in Fig. 12. The results show that SeGDroid achieves the best accuracy (97.01%). There are 15 malware families and a benign class in the MalRadar dataset. No model obtains the best performance on all classes in terms of precision, recall and F-score. The average values for the three metrics are shown in Table 6 to assess the performance of different models on malware family classification. Table 6 shows that SeGDroid obtains the best precision (0.9619) and F-score (0.9548) on average. SeGDroid improves the average F-score by about 8.12%, 7.25% and 47.80% when compared with MaMaDroid, MalScan and GraphSAGE-Occ, respectively.

Multiclass classification is more complex than binary classification. The classification performance metrics of all models are lower than those obtained in the case of binary classification. The performance metrics of some methods, such as GraphSAGE-Occ, are reduced considerably. A possible reason is that the class imbalance problem exists among those families. The performance on some families (such as GhostClicker and Joker) is not good. The class imbalance problem in FCG embedding is an interesting topic that we will research in the future.


Fig. 11. The results of category classification on CIC.

Table 5
The binary classification results on MalRadar dataset.
Methods Acc. Prec.(m) Rec.(m) F-score(m) Prec.(b) Rec.(b) F-score(b)
Permission 0.9739 0.9723 𝟎.𝟗𝟗𝟑𝟑 0.9827 𝟎.𝟗𝟕𝟗𝟏 0.9167 0.9468
MaMaDroid 0.9727 0.9866 0.9867 0.9816 0.9333 0.9608 0.9469
MalScan 𝟎.𝟗𝟖𝟏𝟑 0.9867 0.9883 𝟎.𝟗𝟖𝟕𝟓 0.9653 0.9606 0.9630
GraphSAGE-Occ 0.9763 0.9882 0.9800 0.9841 0.9423 0.9655 0.9538
SeGDroid 𝟎.𝟗𝟖𝟏𝟑 𝟎.𝟗𝟖𝟗𝟗 0.9850 𝟎.𝟗𝟖𝟕𝟓 0.9563 𝟎.𝟗𝟕𝟎𝟒 𝟎.𝟗𝟔𝟑𝟑

Table 6
The average metric values among these malware families.
Metrics Permission MaMaDroid MalScan GraphSAGE-Occ SeGDroid
Precision 0.9409 0.9087 0.9018 0.5960 𝟎.𝟗𝟔𝟏𝟗
Recall 𝟎.𝟗𝟔𝟑𝟑 0.8562 0.8986 0.4888 0.9527
F-score 0.9511 0.8736 0.8824 0.4768 𝟎.𝟗𝟓𝟒𝟖

4.3. Discussion on SeGDroid using different GNN algorithms

In our experiments, GraphSAGE is used for graph embedding. In this section, we discuss the performance of SeGDroid using different GNN algorithms for graph embedding, including GCN (Graph Convolutional Networks) (Zhang et al., 2022), GraphSAGE (Hamilton et al., 2017), DotGAT (dot product version of self attention in Graph Attention Network) (Velickovic et al., 2017), TAG (Topology Adaptive Graph) (Du et al., 2017), and SGC (Simplifying Graph Convolutional Networks) (Wu, Souza, et al., 2019). The binary classification results are shown in Table 7. The results are consistent with the results reported in Vinayaka and Jaidhar (2021). GraphSAGE performs the best among the GNN algorithms. GraphSAGE obtains 98.07% accuracy, 98.71% F-score for malware and 96.11% F-score for benign apps.

Next, we carry out experiments of category classification on the CIC dataset, as shown in Fig. 13. The results show that GraphSAGE also achieves the best accuracy (92.3%), and 95.24%, 92.71%, 81.21%, 89.78% and 97.63% F-scores for the Benign, Adware, Banking, Riskware and SMS classes, respectively. This is because GraphSAGE is an inductive framework that leverages node attribute information to efficiently generate representations on previously unseen data. With the dynamic nature of Android malware, the graphs in testing data may have some unseen nodes. Therefore, this paper utilizes GraphSAGE for graph embedding of FCGs in our experiments.

4.4. Model explanation

To explain the SeGDroid model, we propose a method to visualize the importance of each graph node (the functions in an app). According to the model visualization method proposed in Section 3.6, the visualization first depicts the nodes' importance and then summarizes the functions with high importance values.

On the CIC dataset, taking SMS malware as an example, the importance vector of each node is shown in Fig. 14. The results show that android.telephony.SmsManager.getDefault and android.telephony.SmsManager.sendTextMessage achieve higher values than other functions. This indicates that the two functions are more likely to relate to malicious behavior.

We summarize the functions of the nodes with higher importance values in SMS malware, as shown in Table 8. First, we rank the nodes according to the importance value and select the top 10 nodes in each malware. We further rank the selected nodes according to their selected frequency, and then the top 10 nodes are summarized in Table 8. The


Fig. 12. The family classification results on the MalRadar dataset.

Table 7
The binary classification results obtained by different graph learning algorithms.
Methods Acc. Prec.(m) Rec.(m) F-score(m) Prec.(b) Rec.(b) F-score(b)
GCN 0.9710 0.9882 0.9732 0.9807 0.9208 0.9641 0.9420
DotGAT 0.9257 0.9800 0.9205 0.9493 0.7927 0.9418 0.8609
TAG 0.9586 0.9736 0.9716 0.9726 0.9127 0.9183 0.9155
SGC 0.9372 0.9682 0.9481 0.9580 0.8488 0.9035 0.8753
GraphSAGE 𝟎.𝟗𝟖𝟎𝟕 𝟎.𝟗𝟗𝟑𝟏 𝟎.𝟗𝟖𝟏𝟐 𝟎.𝟗𝟖𝟕𝟏 𝟎.𝟗𝟒𝟑𝟗 𝟎.𝟗𝟕𝟗𝟎 𝟎.𝟗𝟔𝟏𝟏

Table 8
The APIs of the top 10 nodes.
ClassName FunctionName Explanation
Landroid/telephony/SmsManager sendTextMessage Send a text-based SMS.
Landroid/telephony/SmsManager getDefault Get the SmsManager associated with the default subscription id.
Landroid/view/animation/Animation setDuration Sets the duration of the animation
Landroid/view/animation/TranslateAnimation <init> Initialize this animation
Landroid/telephony/TelephonyManager getLine1Number Get the phone number
Landroid/telephony/TelephonyManager getNetworkOperator Get the MCC+MNC of current registered operator
Landroid/telephony/TelephonyManager getSimOperator Get the MCC+MNC of the provider of the SIM
Landroid/telephony/SmsMessage createFromPdu Create a SmsMessage from a raw PDU
Landroid/telephony/SmsMessage getOriginatingAddress Get the originating address(sender) of this SMS message
Landroid/widget/Toast makeText Make a standard toast

description of these APIs is available from their documentation (Google, 2022).

This benefits tracing the malicious functions of Android malware. The sendTextMessage method of the SmsManager class is usually used to send a text-based SMS. SMS malware is any malicious software delivered to victims by text messaging. It involves the malicious usage of TelephonyManager (acquiring the SIM information of mobile devices) and SmsManager (sending malicious messages such as malicious links to mobile devices). In addition, the SmsManager and TelephonyManager APIs with high occurrence frequency in Table 8 are also among the sensitive APIs found by PScout (Androguard, 2022). This further demonstrates that malicious behaviors invoke sensitive APIs with high probability.

5. Conclusion and future work

This paper proposes a novel Android malware detection method named SeGDroid. It aims at learning semantic features from the sensitive FCGs. Our method first builds the FCG from the Smali code, and then extracts the sensitive FCG using our proposed graph pruning method. Regarding node representation, we build API2vec and opcode2vec to embed the external and internal nodes, respectively, and weight the feature vectors on the basis of the centrality measure. The resulting sensitive FCGs, together with their node features, are embedded by the GraphSAGE algorithm. To provide a mechanism for understanding malicious behaviors, we propose an explanation method for our model.


Fig. 13. The category classification results obtained by different graph learning algorithms.

Fig. 14. The visualization of the importance of each graph node in a SMS malware.
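Fig. 14 visualizes the importance vectors defined by Steps 1 to 3 of Section 3.6, and Section 4.4 ranks nodes by importance before counting how often each function is selected. The sketch below follows that procedure under two assumptions that the text leaves open: a node's scalar importance is taken as the sum of its importance vector (its contribution in Eq. (13)), and ties are left to `Counter` ordering. All names and numbers are hypothetical.

```python
from collections import Counter

def importance_vectors(W, H):
    """Step 3: element-wise products [w_1*x_1, ..., w_k*x_k] for every node vector in H."""
    return [[w * x for w, x in zip(W, h)] for h in H]

def node_contributions(W, H):
    """Per-node term of Eq. (13): the sum of each node's importance vector."""
    return [sum(vec) for vec in importance_vectors(W, H)]

def predict(W, H, b):
    """Eqs. (12)-(13): mean of the node contributions plus the bias."""
    contributions = node_contributions(W, H)
    return sum(contributions) / len(contributions) + b

def top_functions(apps, W, k=10):
    """Take the top-k nodes of each app, then rank functions by how often they were selected."""
    counts = Counter()
    for names, H in apps:
        scores = node_contributions(W, H)
        ranked = sorted(zip(names, scores), key=lambda pair: pair[1], reverse=True)
        counts.update(name for name, _ in ranked[:k])
    return [name for name, _ in counts.most_common(k)]

# Toy output-layer weights and node embeddings for two apps (illustrative values).
W = [1.0, -1.0]
apps = [
    (["sendTextMessage", "makeText"], [[2.0, 1.0], [0.0, 3.0]]),
    (["sendTextMessage", "getDefault"], [[1.0, 0.0], [0.5, 0.0]]),
]

first_app_vectors = importance_vectors(W, apps[0][1])
first_app_output = predict(W, apps[0][1], b=0.5)
most_selected = top_functions(apps, W, k=1)
```

Because the readout is a mean and the output layer is linear, the prediction decomposes exactly into per-node terms, which is what makes this attribution possible without any extra approximation.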

The experimental results on the CICMal2020 and MalRadar datasets are summarized below.
(1) The ablation experiments show that each of our proposals, namely graph pruning and node representation (node embedding and vector weighting), can further improve malware detection performance.
(2) The graph pruning experiments show that graph pruning indeed decreases the node imbalance ratio and decreases the time consumption of training the graph embedding model.
(3) The comparison results show that SeGDroid outperforms previous works in terms of accuracy and F-score. It achieves an F-score of 98% in the case of malware detection, and 96% in the case of family classification on average.
(4) Using our model explanation method, we visualized the importance values of FCG nodes. The malicious APIs can be highlighted by our method, which is helpful for understanding the malicious behavior.

A limitation of this paper is that it does not consider the concept drift and class imbalance problems in FCG learning. In future work, we will further research how to improve the performance of SeGDroid when confronting these two problems.

CRediT authorship contribution statement

Zhen Liu: Conceptualization, Methodology, Writing – original draft, Formal analysis, Software, Writing – review & editing. Ruoyu Wang: Conceptualization, Methodology, Visualization, Software, Writing – review & editing. Nathalie Japkowicz: Conceptualization, Writing – review & editing. Heitor Murilo Gomes: Conceptualization, Writing – review & editing. Bitao Peng: Funding acquisition, Writing – review & editing. Wenbin Zhang: Resources, Writing – review & editing.


Declaration of competing interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Data availability

Our experimental datasets have been shared in public by other research institutes. The source of these datasets can be found in our manuscript.

Acknowledgments

We thank the anonymous reviewers for their constructive comments. This work is supported by the Starting Research Fund from the Guangdong University of Foreign Studies [Grant No. 2022RC049], Science and Technology Projects of Guangzhou [Grant No. 202201010100], Key Research Platforms and Projects of Colleges and Universities in Guangdong Province [Grant No. 2020ZDZX3060], National Natural Science Foundation of China [Grant No. 61501128].
