Expert Systems With Applications 218 (2023) 119593

Contents lists available at ScienceDirect

Expert Systems With Applications

journal homepage: www.elsevier.com/locate/eswa

An effective end-to-end android malware detection method✩

Huijuan Zhu a, Huahui Wei a, Liangmin Wang b,∗, Zhicheng Xu c, Victor S. Sheng d,∗
a School of Computer Science and Communication Engineering, Jiangsu University, Zhenjiang, 212013, Jiangsu, China
b School of Cyber Science and Engineering, Southeast University, Nanjing, 211189, Jiangsu, China
c School of Mathematical Sciences, Jiangsu University, Zhenjiang, 212013, Jiangsu, China
d Department of Computer Science, Texas Tech University, Lubbock, TX 79409, TX, USA

ARTICLE INFO

Keywords: Android; Malware detection; Convolution neural network; Image feature

ABSTRACT

Android has rapidly become the most popular mobile operating system because it is open source, supports a wide choice of hardware, and offers millions of applications (Apps). Meanwhile, its openness makes Android the main target of malware. Malware detection methods based on manual features are easily bypassed by code obfuscation techniques and suffer from low code coverage. Thus, we propose an automated extraction method without any manual expert intervention. Specifically, we convert the vital parts of the Dalvik executable (Dex) file into an RGB (Red/Green/Blue) image. Furthermore, we propose a novel convolutional neural network (CNN) variant with diverse receptive fields using max pooling and average pooling simultaneously (MADRF), named MADRF-CNN, which can capture the dependencies between different parts of the image (converted from the Dex file) by capitalizing on multi-scale context information. To evaluate the effectiveness of the proposed method, we conducted extensive experiments, and our experimental results showed that the Accuracy of our method is 96.9%, which is much better than that of state-of-the-art solutions.

1. Introduction

With the rapid development of the mobile Internet, smartphones have become an indispensable part of people's lives. According to Statista's smartphone operating system shipment market share report in 2022, Android's market share increased to 83.8% (Statista, 2021). The huge market has also promoted the development of Android malicious software (malware). A special report on Android malware published by the QiAnXin Threat Intelligence Center indicated that about 2.3 million new malware samples were detected on mobile terminals, and about 6,301 new mobile phone malware samples were intercepted every day on average. Among the malicious behaviors, malicious fee deductions accounted for 34.9%, followed by resource consumption with 24.2%, rogue behavior with 22.8%, privacy theft with 12.3%, deception and fraud with 4.3% and remote control with 1.5% (QiAnXin, 2021).

Android malware detection has attracted considerable attention in recent years, and the existing malware detection works can be roughly divided into two categories: static analysis and dynamic analysis. Static analysis (Arp, Spreitzenbarth, Hübner, Gascon, & Rieck, 2014; Gao, Cheng, & Zhang, 2021; Mahindru & Sangal, 2021; Rathore, Sahay, Nikam, & Sewak, 2021; Wang et al., 2017) relies on features extracted from Android package (APK) files without running them. In addition to the classic permissions and APIs, some studies have found that Android-related intents, strings and components can also effectively characterize malware. One of the typical works is Drebin, proposed by Arp et al. (2014). Besides, Mahindru and Sangal (2021) proposed MLDroid to detect real-world malware by selecting permissions and API calls as raw features. Gao et al. (2021) converted the malware detection problem into a node classification task by mapping Apps and APIs into a large heterogeneous graph. Dynamic detection (Cai, Jiang, Gao, Li, & Yuan, 2021; D'Angelo, Palmieri, Robustelli, & Castiglione, 2021; Enck et al., 2014; Haq, Khan, & Akhunzada, 2021; Hasan, Ladani, & Zamani, 2021; Liu et al., 2015; Millar, McLaughlin, del Rincon, & Miller, 2021; Sihag, Vardhan, Singh, Choudhary, & Son, 2021) refers to monitoring the runtime behaviors of an App. For instance, Surendran, Thomas, and Emmanuel (2020) proposed a graph signal-based malware detection method, GSDroid, which captures the runtime system call dependencies as raw features. Enck et al. (2014) proposed a dynamic taint analysis tool, TaintDroid, to detect malware by monitoring sensitive information flow. Liu et al. (2015) used API hooks to obtain the software runtime

✩ This study was funded by the National Key R&D Program of China (2020YFB1005500), the Leading-edge Technology Program of Jiangsu Natural Science
Foundation (BK20202001) and the National Natural Science Foundation of China (62272204).
∗ Corresponding authors.
E-mail addresses: [email protected] (H. Zhu), [email protected] (H. Wei), [email protected] (L. Wang), [email protected]
(Z. Xu), [email protected] (V.S. Sheng).

https://doi.org/10.1016/j.eswa.2023.119593
Received 27 September 2022; Received in revised form 17 January 2023; Accepted 19 January 2023
Available online 28 January 2023
0957-4174/© 2023 Elsevier Ltd. All rights reserved.
H. Zhu et al. Expert Systems With Applications 218 (2023) 119593

information to detect malware. They applied a real-time system to monitor potential privacy data abuse, system call behavior and so on.

These methods have achieved excellent performance. However, static methods have difficulty dealing with code obfuscation techniques, and dynamic methods are time-consuming. Another interesting research branch of malware detection, namely, directly converting a Dex file to an RGB (Red/Green/Blue) image, has achieved unexpected success (Fang, Gao, Jing, & Zhang, 2020; Hu et al., 2014; Marastoni, Continella, Quarta, Zanero, & Dalla Preda, 2017; Meng et al., 2016; Wang, Zhou, Lu, & Zhang, 2019). The executable code of Android Apps is stored in the Dex files in hexadecimal. Correspondingly, colors in the computer can also be expressed in hexadecimal. Compared with traditional features (such as APIs, permissions and components), image features only require simple conversion processing, which completely removes manual intervention. Meanwhile, they are more effective than traditional features in dealing with code obfuscation and cover almost all the code of an App.

The existing image-based malware detection methods mainly convert the whole Dex or manifest file into an image and feed the image into the neural network. However, after in-depth analysis, we find that not all parts of these files make a positive contribution to malware detection. Moreover, these efforts often ignore the relationship between different sections of the Dex file and different parts of the corresponding converted image. Therefore, we propose a novel Android malware detection framework, named MADRF-CNN, which converts the cropped, compact Dex files into an image. The main contributions of this article are as follows:

• We propose an image-based end-to-end Android malware detection method without any manual expert intervention, which improves the resistance to code obfuscation. Moreover, the framework includes a novel method, MADRF-CNN, to mine the association relationship between different parts of the Dex file.
• We propose a novel preprocessing method for Dex files, which can not only reduce the resource consumption of network training but also effectively improve detection performance by eliminating unimportant redundant sections.
• We collected the latest Apps from Google Play Store, VirusShare and so on to build a dataset that effectively characterizes the current situation of malware. We evaluated the effectiveness of the proposed method in various aspects and compared it with similar state-of-the-art solutions.

The rest of this paper is arranged as follows. The second section discusses related work. The third section illustrates the overall architecture of MADRF-CNN. The fourth section shows and discusses our experimental results. The fifth section summarizes our work and points out future work.

2. Related works

In recent years, machine learning technologies based on manual features (e.g., permissions, APIs, components, intents) have been widely used in malware detection (Calleja, Martín, Menéndez, Tapiador, & Clark, 2018). As the most commonly used features, permissions play an important role in malware detection (Li & Li, 2020; Wu et al., 2021; Zhu, Wang, Zhong, Li, & Sheng, 2022). For instance, Zhu et al. (2022) proposed a malware detection framework based on hybrid deep learning by using permissions and sensitive APIs to represent an App. Li and Li (2020) extracted multiple kinds of information such as permissions, components, system calls and IP addresses of Apps to construct feature vectors and improved the robustness of the proposed malware detection model based on integrated deep learning from the perspective of adversarial training. Wu et al. (2021) characterized malware with API calls and permissions and introduced an attention mechanism based on Multi-layer Perceptron (MLP) to detect malware and interpret the result. Huang, Lu, Ping, Sun, and Ye (2020) extracted the API call sequences by running the App in virtual environments and processed the extracted features for subsequent modules, such as a Temporal Convolution Network (TCN) and an attention layer. To exploit the benefits of various types of features and deep learning methods, Gibert, Mateu, and Planes (2020) proposed a multimodal deep learning based method, HYDRA, which uses CNN and a fully connected layer to learn meaningful features from multiple types of features (e.g., API-based features, Byte-based features and Opcode-based features); the learned features are input into softmax after being integrated by the fully connected layer. Alazab, Alazab, Shalaginov, Mesleh, and Awajan (2020) proposed an effective machine learning based malware detection method employing requested permissions and API calls. It is worth mentioning that they divided the APIs into three groups, namely ambiguous, risky and disruptive. Their experiments showed that the combination of disruptive and risky API calls plays an important role in malware detection. Kabakus (2022) proposed an end-to-end Android malware detection framework, DroidMalwareDetector, based on CNN, which uses intents and API calls alongside the permissions to perform comprehensive malware analysis. Lakshmanarao and Shashi (2022) extracted opcode sequences from Android APK files. The extracted raw features are preprocessed and input to a Long Short-term Memory (LSTM) network for malware detection.

A series of malware detection methods based on dynamic feature analysis have also been proposed (Liu, Li, Zhao, Su, & Liu, 2021; Wang et al., 2020; Zhang, Qi, & Wang, 2020). For instance, Zhang et al. (2020) proposed a stacked deep network architecture to automatically learn the correlation between API features, combining the advantages of CNN and Bi-LSTM. They used the Cuckoo sandbox to run the samples and simulate the user's operation, and then extracted the API call sequence to construct the feature vector. Xue, Zhou, Chen, Luo, and Gu (2017) proposed an on-device dynamic analysis tool that tracks information flow and monitors the device at the system level and instruction level. Wang et al. (2020) monitored kernel-level source data to capture the dynamic behavior of each target process, and then built an anomaly detection model based on a neural embedded network. Alzaylaee, Yerima, and Sezer (2020) proposed a deep learning based malware detection system, DL-Droid, which utilizes dynamic analysis (e.g., API calls, Actions/Events) combined with a stateful input generation method. Martín, Rodríguez-Fernández, and Camacho (2018) proposed an Android malware family classification method, CANDYMAN, by exploiting dynamic traces and Markov chains. Zhou (2021) performed taint analysis and dynamic function tracing to identify private information leaks. Cai, Meng, Ryder, and Yao (2018) presented a novel dynamic malware detection method, DroidCat, based on Random Forest (RF) by profiling inter-component communication (ICC) intents and method calls.

The existing machine learning-based malware detection works usually extract limited features from Dex (manifest) files or part of the App runtime behaviors. These features are usually one-sided and not enough to fully characterize malware. The recent great success of CNN in image recognition provides a new direction. Bakour and Ünver (2021) proposed DeepVisDroid, which converts manifest files and Dex files into grayscale images and feeds them to CNN to detect malware. Similarly, Ghouti and Imam (2020) first converted the executable code of an App into a grayscale image, and then adopted a Support Vector Machine (SVM) for classification. Xiao and Yang (2019) proposed a CNN-based Android malware detection method by converting the whole Dalvik bytecode into an RGB image. Bourebaa and Benmohammed (2020) transformed Dex files into grayscale images and then used CNN for recognition. Yadav, Menon, Ravi, Vishvanathan, and Pham (2022) proposed an image-based Android malware detection method without relying on manual analysis. It mainly explores the performance of the pre-trained EfficientNet-B4 model in malware detection by inputting RGB images converted from Dex files. Sun, Daoudi, Allix, and Bissyandé (2021) exploited the CNN model to detect malware by feeding gray-scale images converted from binary code and metadata/configuration


Fig. 1. The proposed end-to-end android malware detection framework.
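
The Dex file cutting phase of the framework in Fig. 1 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' released extractor. The names `SECTIONS`, `cut_dex` and `to_hex_text` are hypothetical. The header offsets and per-item byte sizes are taken from the public Dex format description rather than from the paper. Algorithm 1 slices by the raw size field, so treating that field as an item count and multiplying it by the item size is an interpretation.

```python
import struct

# (section name, header position of its size field, bytes per item).
# The offset field follows the size field, 4 bytes later.
# Positions and item sizes come from the public Dex format description
# (an assumption here; the paper only gives the 56..96 header range).
SECTIONS = [
    ("string_ids", 56, 4),
    ("type_ids", 64, 4),
    ("proto_ids", 72, 12),
    ("field_ids", 80, 8),
    ("method_ids", 88, 8),
    ("class_defs", 96, 32),
]
HEADER_SIZE = 112


def cut_dex(dex: bytes) -> bytes:
    """Return the six index sections of a Dex file, concatenated in order."""
    if len(dex) < HEADER_SIZE:
        raise ValueError("input is shorter than a Dex header")
    kept = bytearray()
    for _name, pos, item_size in SECTIONS:
        # "<II" reads two little-endian unsigned 32-bit integers, which
        # replaces the explicit byte reversal used in the pseudocode.
        count, offset = struct.unpack_from("<II", dex, pos)
        kept += dex[offset:offset + count * item_size]
    return bytes(kept)


def to_hex_text(data: bytes) -> str:
    """Render the retained bytes as hexadecimal characters."""
    return data.hex()
```

Reading the header fields with `struct` keeps the little-endian handling in one place; the header and data portions are simply never copied into the result.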

files of Android APKs. Vasan et al. (2020) proposed an image-based malware detection method, IMCFN, using a fine-tuned CNN. Specifically, it converts the raw malware binaries into color images, and then the fine-tuned CNN architecture is employed to detect and identify malware families. Zhang, Luktarhan, Ding, and Lu (2021) proposed an effective Android malware detection method based on TCN by inputting bytecode gray images. The gray images are converted from the combination of AndroidManifest.xml and the data section of classes.dex.

Unfortunately, in these works the entire Dex files or most of them are converted to images and input to CNN, which is excessively time-consuming and inefficient. More importantly, most parts of Dex files, such as the data and header portions, hardly provide cost-effective information for malware analysis. Therefore, we propose a cutting method for Dex files, which can reduce the overhead of subsequent deep network training and improve detection performance by removing redundant sections. Specifically, the compact and high-value RGB images produced by our cutting method are input to the proposed MADRF-CNN network to learn more efficient features. Finally, an efficient end-to-end malware detection method is proposed in this work.

3. The proposed end-to-end malware detection framework

We propose an end-to-end malware detection framework that does not rely on manual features, named MADRF-CNN, to efficiently detect Android malware. As shown in Fig. 1, the proposed end-to-end malware detection framework can be divided into three main phases: Dex file cutting, image feature generation and classification. Dex files can be obtained by decompressing the Android APK files. Once the necessary sections of the Dex files are filtered, one pixel is generated from every three hexadecimal byte values (six hexadecimal characters) of these sections. Then, these pixels form an image. Classification is conducted by our proposed network MADRF-CNN. The source code to extract features is available at https://github.com/MADRF-CNN/extractor.

Algorithm 1: Preprocess Dex files
  Data: a Dex file
  Result: a txt file with hexadecimal characters
  Dex ← read current Dex file as an array;
  header ← Dex[0, 112];
  // the six sections are processed with the same operation;
  // start_position advances from 56 to 96 in steps of 8
  start_position ← 56;
  for name ← string, type, proto, field, method, class do
      // Reverse is a function that reverses the order of the input array
      // hexToDec is a function that translates a hex number to a dec number
      name_size ← hexToDec(Reverse(header[start_position, start_position+4]));
      name_offset ← hexToDec(Reverse(header[start_position+4, start_position+8]));
      start_position ← start_position + 8;
      // function write puts the needed hex characters into a new txt file
      write(Dex[name_offset, name_offset + name_size]);
  end

3.1. The cut method of the Dex file

An APK file is essentially a zip file which contains the entire project of an App, including resource files, the signature, Dex files, manifest files and so on. The Dex files contain all operation instructions and runtime data of an Android App, which can provide a panorama of the App. To obtain the Dex files, we first decompress the APK file of each App, and then retain the files ending with .dex by matching the file suffix. As shown in Fig. 2, a Dex file can be divided into three portions: header, index and data. The header portion stores basic information such as the sizes and offsets of the other sections. The index portion consists of the following parts: the string index, the type index, the proto index, the field index, and the method index. The data portion contains the data section, class definitions and so on. Therefore, beyond this general division, a Dex file can be divided into eight sections according to the offset and size of each section recorded in the header portion. But not


all of them can provide cost-effective information for malware analysis. Therefore, we abandon the header and data portions. This is because the header portion is used to locate every single section of the Dex file and to identify some basic information of the Dex file, which is essentially the same across files. The reasons that we abandon the data portion are as follows.

• A small amount of malicious behavior is enough for an App to be identified as malware. However, the major parts of the data portion in malicious Apps are legitimate, and the data portion usually accounts for 80% of a Dex file. Intuitively, the data portion makes an App look benign.
• Images that include the data portion place a heavy load on deep learning-based methods. Specifically, they are not only extremely time-consuming to process but also sharply increase the difficulty of learning efficient features.
• Adjusting the size of the image to fit the input of the deep network is a conventional operation. However, information loss is inevitable during resizing. Therefore, images with a large amount of redundant information increase the risk of losing crucial information.

Fig. 2. The structure of a Dex file.

As shown in Algorithm 1, we extract six parts (i.e., String_ids, Type_ids, Proto_ids, Field_ids, Method_ids and Class_ids) of the Dex file according to the offset and size information provided by the header, and then write them into a new file. Hence, the final retained parts should be compact and efficient. It is worth mentioning that the Dex file adopts little-endian byte order, so when translating a header field into a decimal number, it is necessary to reverse the byte sequence to obtain the correct value.

3.2. Image features generation process

One APK file usually contains multiple Dex files with the same structure when the APK stores more than 65,536 methods. Therefore, as shown in Fig. 3, we integrate multiple Dex files by following the method in Fang et al. (2020). In addition, similar to the Dex file, the manifest file is another important file in an APK, which contains significant configuration information of an App. Because the manifest file is small and is also a hexadecimal file before decompilation, it can also be used to generate an image.

The RGB color contains three components, known as red, green and blue. The value of each color component ranges from 0 to 255, which is the same as the value range represented by two hexadecimal characters. As a result, this transformation process takes less effort than extracting text features from manifest or other files. Compared with grayscale images (single channel, which can represent 256 colors), RGB images (three channels, which can represent 2^24 colors) can store more information. Compared with text features, image features show more connections between functions inside the App, which means more malicious combinations of functions will be learned when one image is fed into a deep network. As shown in Algorithm 2, the filtered Dex file is first converted into a file composed of hexadecimal numbers. After reading the hexadecimal file, we specify the required image height and width, and use the "idx" variable to identify the starting position of each group of six hexadecimal characters. Next, the RGB image is constructed according to the following conversion rule: every six hexadecimal characters form a pixel containing three channels, which are red, green and blue in turn. The values of the three channels are handled through "and" and "shift" operations and converted to decimal numbers, which eventually form a pixel matrix (for example: 0x868816 = R:134, G:136, B:22). Finally, the generated pixel matrix is written into a jpg file to complete the conversion process.

Algorithm 2: Image converting
  Data: a txt file with hex characters
  Result: an image of the APK file
  // hexToByteArray is a function that converts hex characters to bytes
  byteArray ← hexToByteArray(readFileContent(txt file));
  Input: byteArray, width, height
  Output: pixelArray
  for i ← 0 to height − 1 do
      for j ← 0 to width − 1 do
          idx ← width × i + j;
          rgbIdx ← idx × 3;
          red ← byteArray[rgbIdx];
          green ← byteArray[rgbIdx+1];
          blue ← byteArray[rgbIdx+2];
          // pack the three byte values into one 24-bit RGB value
          color ← ((red & 0xFF) << 16) | ((green & 0xFF) << 8) | (blue & 0xFF);
          pixelArray[idx] ← color;
      end
  end
  Input: pixelArray
  Output: image
  // set the pixels of the image and then write it to a file with 'jpg' as its suffix
  // setRGB and write are functions of class BufferedImage
  image ← BufferedImage;
  setRGB(pixelArray);
  write(image);

3.3. The classification based on MADRF-CNN algorithm

Due to its strong data fitting ability, CNN performs well in various fields as a feature extractor or classifier (Oquab, Bottou, Laptev, & Sivic, 2014). However, some recent deep learning based methods ignore the global context information as well as the receptive fields of pixels, and do not consider the reuse of pixel features during the feature extraction stage (Peng, Yu, Peng, & Lu, 2021; Wang, Guo, & Wang, 2021). Therefore, in this work, we propose a novel variant of CNN with diverse receptive fields using max pooling and average pooling simultaneously, known as MADRF-CNN. Specifically, we first stack two CNN blocks as the feature extractor and attach a MADRF block. The classifier is implemented by three Fully Connected (FC) layers and the Softmax function. In addition, we name the network structures that use only one pooling method (i.e., max pooling or average pooling) as


Fig. 3. Merging multiple Dex files.
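
The merging step in Fig. 3 and the byte-to-pixel rule of Algorithm 2 can be sketched together in Python. This is an illustrative sketch only. It assumes that merging is a plain concatenation of the retained sections in the order given (the paper defers the details to Fang et al. (2020)) and that a short tail is padded with zero bytes. The names `merge_sections`, `bytes_to_pixels` and `pack_rgb` are hypothetical. Writing the actual jpg file would need an imaging library and is left out.

```python
from typing import Iterable, List, Tuple

Pixel = Tuple[int, int, int]


def merge_sections(cut_dex_files: Iterable[bytes]) -> bytes:
    """Concatenate the retained sections of several Dex files, in the order given."""
    return b"".join(cut_dex_files)


def bytes_to_pixels(data: bytes, width: int) -> List[List[Pixel]]:
    """Group bytes into (R, G, B) pixels and pixels into rows of `width`."""
    if width <= 0:
        raise ValueError("width must be positive")
    row_bytes = width * 3
    # Zero-pad so the data fills whole rows (padding policy is an assumption).
    padded = data + b"\x00" * (-len(data) % row_bytes)
    rows = []
    for start in range(0, len(padded), row_bytes):
        row = padded[start:start + row_bytes]
        rows.append([(row[i], row[i + 1], row[i + 2])
                     for i in range(0, row_bytes, 3)])
    return rows


def pack_rgb(pixel: Pixel) -> int:
    """Pack one pixel into a single 24-bit integer of the form 0xRRGGBB."""
    r, g, b = pixel
    return (r << 16) | (g << 8) | b
```

For the example in the text, `bytes_to_pixels(bytes.fromhex("868816"), 1)` yields one pixel `(134, 136, 22)`, and `pack_rgb` maps it back to `0x868816`.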

Fig. 4. The architectural overview of model.
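
The feature-map sizes implied by the architecture in Fig. 4 can be checked with a short calculation. The sketch below uses the hyper-parameters reported in Section 4.1 (200 × 200 input, two 3 × 3 convolutions with stride 1 and padding 1, each followed by a max pool with window 2, three MADRF scales, 16 filters after each pool). The pooling stride equal to the window size and the resulting flattened length are assumptions derived here; the paper does not state them. The helper names are hypothetical.

```python
def conv_out(size: int, kernel: int, stride: int, padding: int) -> int:
    """Output size of a convolution along one spatial dimension."""
    return (size + 2 * padding - kernel) // stride + 1


def pool_out(size: int, window: int) -> int:
    """Output size of non-overlapping pooling (stride = window, an assumption)."""
    return size // window


size = 200                               # input images are resized to 200 x 200
for _ in range(2):                       # two stacked CNN blocks
    size = conv_out(size, kernel=3, stride=1, padding=1)
    size = pool_out(size, window=2)

scale_percent = [100, 60, 20]            # r1 = original, r2 = 60%, r3 = 20%
madrf_sizes = [size * p // 100 for p in scale_percent]

filters = 16                             # filters of the convolution after each pool
branches = 2                             # max pooling and average pooling
flat_len = branches * filters * sum(s * s for s in madrf_sizes)

print(size, madrf_sizes, flat_len)       # prints: 50 [50, 30, 10] 112000
```

Under these assumptions, the six flattened feature maps form a vector of length 112,000 that feeds the first FC layer.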

MaxDRF and AvgDRF for comparison in subsequent experiments. The architectural overview of the proposed method MADRF-CNN is shown in Fig. 4.

Before being fed into the deep network, each image is resized to 200 × 200 and normalized to [0, 1]. The CNN block is the combination of a convolutional layer with ReLU as its activation and a max pooling layer. The forward propagation of the convolutional layer is as follows.

\[ z^{l,m}_{u,v} = f\left( \sum_{i=0}^{kw-1} \sum_{j=0}^{kh-1} \sum_{d=0}^{c} x^{l-1,d}_{u \times s+i,\, v \times s+j} \, k^{l,m,d}_{i,j} + b^{l,m} \right) \tag{1} \]

where $z^{l,m}$ represents the $m$th feature map of the output of the $l$th layer, and $x^{l-1,d}$ represents the $d$th channel of the input of the $l$th layer after padding. The channel number of the output is denoted by $c$. The width, height and stride of the convolutional kernel are represented as $kw$, $kh$ and $s$, respectively. Additionally, $k^{l,m,d}$ represents the weights of the $d$th kernel of the $m$th convolutional filter in the $l$th layer, and $b^{l,m}$ represents the bias of the filter. Besides, $f$ is the activation function, which is ReLU in this work.

To further capture the dependencies between different parts of a Dex file (image) by capitalizing on multi-scale context information, we propose a novel CNN block with MADRF. As shown in Fig. 4, the input of MADRF is the feature maps produced by the CNN blocks. MADRF can freely expand the receptive field to acquire multi-scale context information. To balance cost and efficiency, we retain three different receptive fields. In addition, average pooling and max pooling are employed in MADRF for scale conversion. Then, three convolutional layers are utilized to process the feature maps at those three scales. After the parallel pooling operations, the acquired features are concatenated as the input of the fully connected layers. The procedure is shown as follows.

\[ p^{r,m}_{u,v} = \max_{0 \le i \le ww_r - 1,\; 0 \le j \le wh_r - 1} z^{m}_{u \times s_r + i,\, v \times s_r + j} \tag{2} \]

\[ p^{r,m}_{u,v} = \operatorname*{average}_{0 \le i \le ww_r - 1,\; 0 \le j \le wh_r - 1} z^{m}_{u \times s_r + i,\, v \times s_r + j} \tag{3} \]

\[ a^{r} = Conv(p^{r}), \quad r \in \{r_1, r_2, r_3\} \tag{4} \]

where $r$ represents the scale of the width or height of the output feature map of a pool relative to the input one; $ww_r$, $wh_r$ and $s_r$ denote the window width, window height and stride of the pool, which are not fixed but depend on $r$; and $Conv$ denotes the convolutional layer. MADRF greatly enlarges the receptive fields (Luo, Li, Urtasun, & Zemel, 2016) of the feature maps. The receptive field of the feature map generated at the original scale is only 3 × 3, while that at the 60% scale reaches 5 × 5 and that at the 20% scale even reaches 15 × 15. The receptive field is expanded to capture long-distance dependencies, such as the interaction between String_ids and Method_ids in the Dex file. This multi-scale information is concatenated and input into the FC layers to realize classification. The detailed process is shown below, where $W$ and $b$ are the


weights and bias of the FC layers, $f$ is the ReLU function, and $x$ represents the multi-scale features achieved by MADRF.

\[ x = concatenate\left\{ flatten(a^{r}_{pool}),\; r \in \{r_1, r_2, r_3\},\; pool \in \{max, average\} \right\} \tag{5} \]

\[ FC(x) = f(xW + b) \tag{6} \]

\[ \hat{y} = softmax(FC(FC(FC(x)))) \tag{7} \]

where $r_1$ = original, $r_2$ = 60%, $r_3$ = 20%. The cross-entropy loss function is used to calculate the classification loss,

\[ loss(\hat{y}, y) = -\frac{1}{N} \sum_{i=1}^{N} y_i \log(\hat{y}_i) \tag{8} \]

where $N$ represents the number of classes, and $y$ is the actual label in one-hot representation.

4. Experiment

All the experiments were conducted on an Intel Core i7-11800H with 8 cores and 16 GB RAM. The k-fold cross-validation with k=5 was applied in this work. The source code to reproduce the results of our work is available at https://github.com/MADRF-CNN/malware_network.

4.1. Sample set and parameter setting

Given that some existing public datasets usually lag behind the latest released Apps, the benign Apps of the dataset are collected from Google Play Store (GooglePlayStore, 2022) and other legitimate Android markets in the past two years. APKs that have not been flagged by any antivirus tool of VirusTotal (VirusTotal, 2022) are labeled as benign. Similarly, malicious Apps are downloaded from VirusShare (VirusShare, 2022). Finally, the dataset consists of 2,507 malicious Apps and 1,417 benign ones. It is worth mentioning that the dataset contains three types of features, i.e., the image features of Dex files (which are obtained by our cutting method), the image features of manifest files, and the combination features of Dex and manifest files.

We have carried out a series of pre-experiments to determine the network hyper-parameters. As we know, an inappropriate learning rate will lead to a decline in the performance of a deep learning model, while an appropriate batch size will improve the training speed and make the gradient descent direction more accurate. So, we first vary the learning rate in the range [1e−5, 1e−3] with the other parameters fixed. After that, we adjust the batch size in the range [16, 128]. Considering training efficiency and performance, after a certain amount of pre-experiments and parameter adjustment, the training parameters of MADRF-CNN are as follows: the batch size is 64, the learning rate is 0.001, the number of epochs is set to 5 and the optimizer is Adam.

In our MADRF-CNN method, we first stack two CNN blocks, each of which consists of a convolution layer with ReLU activation and a max pooling layer with a window size of 2. For these two blocks, we set the kernel size, filter number, stride, and padding of the two convolution layers to 3 × 3, 32, 1, 1 and 3 × 3, 64, 1, 1, respectively. The shape of the output feature map is $N$ × 50 × 50, where $N$ is the batch size. Then, there is a MADRF block, in which the output sizes of each type of pool are $N$ × 50 × 50, $N$ × 30 × 30 and $N$ × 10 × 10, and every pool is followed by a convolution layer with parameters 3 × 3, 16, 1, 1. We flatten and concatenate the six output feature maps. Finally, we stack three FC layers with structure 128-64-2 as a classifier, and their activation functions are ReLU, ReLU and Softmax, respectively.

4.2. Evaluation index

In this work, we regard malware detection as a binary classification problem. We employ the five most common and representative evaluation indicators, i.e., Accuracy, Precision, Recall, F1-Score and Matthews Correlation Coefficient (MCC). It is worth mentioning that MCC is a relatively appropriate evaluation indicator for unbalanced datasets. Because our dataset (2,507 malicious Apps and 1,417 benign ones) is unbalanced, we specially introduce the MCC indicator. The metrics are computed from the true positive (TP), true negative (TN), false positive (FP) and false negative (FN) values, where TP represents the number of malicious Apps correctly identified as malware, TN represents the number of benign Apps correctly identified as benign Apps, FP represents the number of benign Apps misclassified as malware, and FN represents the number of malware misclassified as benign Apps.

\[ \text{Accuracy} = \frac{TP + TN}{TP + FP + TN + FN} \tag{9} \]

\[ \text{Precision} = \frac{TP}{TP + FP} \tag{10} \]

\[ \text{Recall} = \frac{TP}{TP + FN} \tag{11} \]

\[ \text{F1-Score} = \frac{2 \times (\text{Precision} \times \text{Recall})}{\text{Precision} + \text{Recall}} \tag{12} \]

\[ \text{MCC} = \frac{TP \times TN - FP \times FN}{\sqrt{(TP + FP)(TP + FN)(TN + FP)(TN + FN)}} \tag{13} \]

4.3. Performance evaluation

To evaluate the effectiveness of the proposed method, we conducted six sets of experiments using three types of features (i.e., Dex images, manifest images and combination images), and the results are reported in Table 1. The first set of experiments is based on the image features of Dex files with a traditional CNN, and it is named Dex features (CNN) in Table 1. The second and third sets are based on the manifest images and combination images, respectively, with a traditional CNN, and they are named Manifest features (CNN) and combination features (CNN) in Table 1. The first three sets of experiments are the selected baselines. Correspondingly, the fourth, fifth and sixth sets of experiments run our proposed method MADRF-CNN on these three types of features, and they are named Dex features (MADRF-CNN), Manifest features (MADRF-CNN) and combination features (MADRF-CNN), respectively, in Table 1.

As we know, the manifest file is loaded first during App startup, because it contains a lot of important configuration information. Numerous manual features are extracted from this file, e.g., permissions and components. Given the importance of the manifest file, some image-based malware detection methods also convert the file into an image, but as shown in Table 1, it does not perform well in our experiment. We further investigate whether it can be used as a supplement to the Dex image to improve the detection performance. Therefore, we conduct the third and sixth sets of experiments on combination features, which are composed of manifest images and Dex images. We can observe that the performance of combination features is significantly improved compared with using the manifest file alone. For instance, all indicators are more than 90%, except MCC. Unfortunately, it does not improve on the performance achieved by Dex image features. Therefore, we consider that files such as the manifest are not suitable to be converted into an image for malware detection, because they contain a lot of redundant information in their structure.

The Dex file is a highly structured and standardized file. There are internal links between different parts, and the proposed MADRF block focuses on mining this kind of association or interaction by exploiting multi-scale context information. In fact, the process of converting Dex files to images is completed byte by byte

H. Zhu et al. Expert Systems With Applications 218 (2023) 119593

Table 1
MADRF-CNN performance evaluation results (bold is the best, and italic is the second).
Feature types                        Accuracy  Precision  Recall  F1-Score  MCC
Dex features (CNN)                   95.9%     96.5%      98.0%   94.5%     89.3%
Manifest features (CNN)              78.1%     83.2%      86.1%   73.5%     48.3%
Combination features (CNN)           95.1%     96.1%      97.0%   93.6%     87.4%
Dex features (MADRF-CNN)             96.9%     97.1%      98.9%   95.9%     92.0%
Manifest features (MADRF-CNN)        81.0%     84.1%      89.8%   76.3%     54.2%
Combination features (MADRF-CNN)     95.5%     96.8%      97.2%   94.1%     88.4%
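
The CNN and MADRF-CNN rows of Table 1 differ only in the MADRF block described in the experimental setup. As a rough sanity check of that description, the sketch below does the shape bookkeeping for the network in plain Python. It rests on several assumptions that this section does not state: an input image of 200 × 200 pixels (any size from 200 to 203 would also give 50 × 50), pooling with a stride equal to its window, and a reading of "six output feature maps" as max and average pooling at each of the three scales. All function names are illustrative.

```python
def conv2d_out(size, kernel=3, stride=1, padding=1):
    """Spatial output size of a convolution along one dimension."""
    return (size + 2 * padding - kernel) // stride + 1


def max_pool_out(size, window=2):
    """Spatial output size of pooling whose stride equals its window."""
    return size // window


def backbone_out(size):
    """Two CNN blocks: 3x3 conv (stride 1, padding 1), then 2x2 max pooling."""
    for _ in range(2):
        size = max_pool_out(conv2d_out(size))
    return size


def madrf_flat_length(scales=(50, 30, 10), pool_types=2, filters=16):
    """Length of the concatenated vector fed to the 128-64-2 classifier.

    Assumes one branch per (pooling type, scale) pair, i.e. 2 x 3 = 6 maps,
    each passed through a size-preserving 3x3 convolution with 16 filters.
    """
    per_type = sum(filters * conv2d_out(s) ** 2 for s in scales)
    return pool_types * per_type


INPUT_SIZE = 200  # hypothetical: the input size is not given in this section
print(backbone_out(INPUT_SIZE))  # spatial size entering the MADRF block
print(madrf_flat_length())       # input width of the first FC layer
```

Under these assumptions the backbone yields the 50 × 50 map quoted in the setup, and the first FC layer receives 2 × 16 × (50² + 30² + 10²) = 112,000 values.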

and part by part. It is therefore necessary to increase the flow of long-distance information. The results in Table 1 further indicate that the proposed MADRF block can capture the dependence or interaction between different parts of the image, which improves the feature learning ability of the model. Table 1 shows that MADRF-CNN fed with the compact Dex image features produced by our cutting method performs best, with an average Accuracy of 96.9%, Precision of 97.1%, Recall of 98.9%, F1-Score of 95.9% and MCC of 92.0%. To further verify the effectiveness of MADRF-CNN, we input the compact Dex image features into the native CNN obtained by removing our MADRF block, that is, our first set of baseline experiments. As shown in Table 1, under the same experimental setup, MADRF-CNN improves on the native CNN by 1.0 percentage points in Accuracy, 0.6 in Precision, 0.9 in Recall, 1.4 in F1-Score and 2.7 in MCC. MADRF-CNN also brings performance improvements on the other two types of features. For instance, when fed manifest image features, MADRF-CNN improves on the native CNN by 2.9 percentage points in Accuracy, 0.9 in Precision, 3.7 in Recall, 2.8 in F1-Score and 5.9 in MCC.

Average pooling (Hu, Shen, & Sun, 2018) has been widely used to aggregate spatial information, while max pooling gathers another important clue for obtaining distinctive features (Woo, Park, Lee, & Kweon, 2018). Thus, we utilize both average pooling and max pooling simultaneously, and we empirically confirm that exploiting both effectively improves the representation power of the proposed network. Specifically, one set of experiments uses only average pooling, another uses only max pooling, and the third uses both, which is the structure we propose. All three sets of experiments are based on the image features of Dex files, which is also the feature type recommended in this work. The results are reported in Table 2 and indicate that the combination of max and average pooling performs best. Compared with the model that uses only max pooling, the model using combination pooling improves Accuracy by 0.6, Precision by 0.3, Recall by 0.6, F1-Score by 0.8 and MCC by 1.7 percentage points. Compared with the model that uses only average pooling, it improves Accuracy by 0.8, Precision by 0.9, Recall by 0.3, F1-Score by 1.1 and MCC by 2.3 percentage points. To further verify the reasonableness of our proposed method, we plot the accuracy and loss values obtained during training for both the training and test sets in Fig. 5. Specifically, during the training process, we saved the model after each epoch to obtain the accuracy and loss values on the test set. Overall, Table 2 and Fig. 5 show that combination pooling can improve the detection performance.

Table 2
Pooling performance evaluation results using Dex features (bold is the best, and italic is the second).
Pooling types          Accuracy  Precision  Recall  F1-Score  MCC
Max pooling            96.3%     96.8%      98.3%   95.1%     90.3%
Average pooling        96.1%     96.2%      98.6%   94.8%     89.7%
Max&average pooling    96.9%     97.1%      98.9%   95.9%     92.0%

4.4. Comparison with existing works

To further verify the performance of our proposed method, we studied similar systems and selected four of them for comparison. Two are based on grayscale images (Bakour & Ünver, 2021; Sun et al., 2021) and the other two are based on RGB images (Fang et al., 2020; Xiao & Yang, 2019). These comparison schemes have not released their source code or datasets. Therefore, we re-implemented their feature extraction and detection methods and conducted experiments on the datasets we collected. Because some parameter settings are missing from the original studies, and considering both our hardware conditions and the performance of the models themselves, we set the parameters as follows: the batch size is 64, the learning rate is 0.001, and the numbers of epochs are 30, 20, 60 and 60 (in the order of the comparison schemes in Table 3 from top to bottom).

Bakour and Ünver (2021) proposed a hybrid deep learning model, DeepVisDroid, to detect Android malware based on grayscale image features. In that work, the manifest.xml file, resources.arsc file and Dex file from each APK are converted into grayscale images, and the images are fed into a CNN for learning. The malware detection method proposed by Fang et al. (2020) converts the Dex file into an RGB image and plain text, respectively, and then inputs these image features and text features into multiple kernel learning for classification. It is worth mentioning that we only reproduce the part similar to our scheme. That is, in the feature processing stage, we only convert the Dex file into a grayscale image or RGB image. Sun et al. (2021) used a CNN model to detect malware by feeding it grayscale images converted from the binary code and metadata/configuration files of Android APK files. Xiao and Yang (2019) proposed a detection approach that directly learns meaningful features from Dalvik bytecode based on a CNN. Unlike our proposed cutting scheme, these four works transform the entire Dex file into a grayscale or RGB image, which is extremely time-consuming. There are also some differences in the methods of generating images. For example, the method proposed by Fang et al. (2020) expands a single hexadecimal character to an RGB pixel, and the size of the generated image exceeds 10 MB. The comparison results based on our reproduction process are shown in Table 3.

From Table 3, we can see that our proposed MADRF-CNN performs the best compared with the other existing similar works. In detail, in terms of Accuracy, Precision, Recall, F1-Score and MCC, our method achieves improvements of 1.7, 8.3, 6.0, 5.3 and 4.3 percentage points over DeepVisDroid. Compared with the method proposed by Sun et al. (2021), our method improves Accuracy by 6.0, Precision by 4.6, Recall by 4.3, F1-Score by 2.4 and MCC by 13.3 percentage points. Compared with the method proposed by Xiao and Yang (2019), our method improves Accuracy by 4.2, Precision by 3.8, Recall by 3.4, F1-Score by 1.5 and MCC by 7.8 percentage points. In addition, our method is superior to the method proposed by Fang et al. (2020) by a wide margin (for example, 14.1 percentage points in Accuracy and 31.1 in MCC). We believe that this huge gap is mainly caused by the differences in feature learning. Fang et al. (2020) extracted the texture features and color features of RGB images by using the GIST algorithm and Color Moment. This suggests that excessive preprocessing of the original features does not always bring better performance.

These four works can classify malware with gratifying performance, but our method still has certain advantages. The main reason is that,


Fig. 5. Loss and accuracy of MADRF-CNN.

Table 3
Comparison results with the existing works using Dex features (bold is the best, and italic is the second).
Method                                  Accuracy  Precision  Recall  F1-Score  MCC
Fang et al. (2020)                      82.8%     75.5%      71.5%   73.1%     60.9%
DeepVisDroid (Bakour & Ünver, 2021)     95.2%     88.8%      92.9%   90.6%     87.7%
Xiao and Yang (2019)                    92.7%     93.3%      95.5%   94.4%     84.2%
Sun et al. (2021)                       90.9%     92.5%      94.6%   93.5%     78.7%
MADRF-CNN (ours)                        96.9%     97.1%      98.9%   95.9%     92.0%
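
The five columns reported in the tables are standard binary classification metrics. For readers who want to recompute them, the sketch below derives all five from the four cells of a confusion matrix using the textbook definitions. Treating malware as the positive class is an assumption of this sketch, the counts are hypothetical rather than taken from the experiments, and the function name `binary_metrics` is illustrative. How the reported values were averaged across runs is not specified in this section.

```python
import math


def binary_metrics(tp, fp, fn, tn):
    """Accuracy, Precision, Recall, F1-Score and MCC from confusion counts.

    Malware is treated as the positive class (an assumption of this sketch).
    """
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    # MCC uses all four cells of the confusion matrix.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1, "mcc": mcc}


# Hypothetical counts, not taken from the paper's experiments.
scores = binary_metrics(tp=90, fp=10, fn=5, tn=95)
for name, value in scores.items():
    print(f"{name}: {value:.1%}")
```

The zero-denominator guards only keep the sketch from raising on degenerate inputs, such as a classifier that never predicts the positive class.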

as mentioned before, our method generates RGB images that are not dominated by the sections that commonly appear in both malware and benign apps, such as the Dex header and the data section. In addition, our work not only capitalizes on the excellent feature learning ability of CNN, but also proposes a novel deep learning block, MADRF, to augment the pixel receptive field and enhance the global awareness of context information. Moreover, by combining two pooling methods in the MADRF block, more comprehensive and effective features in the image are learned. Thus, the network performance is further improved.

4.5. Discussion

In this work, an end-to-end malware detection method has been proposed based on RGB image representation and MADRF-CNN. Three types of image-based features have been constructed to verify the effectiveness of MADRF-CNN. The image converted from the manifest file is far less effective than the image converted from Dex files. Under the same network structure and parameters, the average Accuracy of the latter reaches 96.9%, while that of manifest features is only 81.0%. We summarize the main reasons as follows. Firstly, compared with Dex files, manifest files are far smaller and fewer. The manifest is generally less than 100 KB, while the Dex file is usually measured in MB, which means that the information stored in the Dex file is more comprehensive and sufficient than that in the manifest file. Moreover, the hexadecimal view of the manifest file shows that this small file is also filled with numerous zero characters. In other words, the information density of the manifest file is much lower than that of the Dex file. In addition, the file contains massive repetitive content, such as "android:name" and "uses-permission". Given the already insufficient information density, this redundant content increases the difficulty of feature recognition and learning. However, it is worth mentioning that the important configuration information contained in this file is very suitable for manual feature extraction.

5. Conclusion

With the rapid development of mobile App programming and anti-reverse-engineering techniques, the difficulty of malware detection is exacerbated. To address challenges in existing detection solutions based on manual features, such as code obfuscation and limited coverage, we proposed an effective end-to-end Android malware detection


framework that does not rely on prior knowledge and manual features. Specifically, the proposed method directly learns representative features from compact Dex images based on a Convolutional Neural Network. To further capture the dependency or interaction between different parts of each Dex image, we proposed a novel CNN variant, MADRF-CNN. We conducted a series of experiments using the dataset collected in the recent two years to verify the effectiveness of the proposed methods, including the image cutting scheme and the MADRF-CNN method. Additionally, we compared MADRF-CNN with the existing state-of-the-art solutions. Experimental results demonstrated that MADRF-CNN obtains better performance in malware detection. The proposed method inherits the advantages of malware detection methods based on image features, such as not relying on manual intervention and being able to deal with code obfuscation techniques, but it also inherits some inherent disadvantages of such methods, such as a lack of good interpretability. Therefore, we will design more targeted and interpretable solutions based on the inherent structure of Dex files and the visualization technology of CNN to promote the applications of such end-to-end solutions.

CRediT authorship contribution statement

Huijuan Zhu: Conceptualization, Methodology, Writing – original draft. Huahui Wei: Investigation, Software, Validation. Liangmin Wang: Conceptualization, Methodology, Supervision. Zhicheng Xu: Software, Validation. Victor S. Sheng: Conceptualization, Methodology, Writing – review & editing.

Declaration of competing interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Data availability

Data will be made available on request.

References

Alazab, M., Alazab, M., Shalaginov, A., Mesleh, A., & Awajan, A. (2020). Intelligent mobile malware detection using permission requests and API calls. Future Generation Computer Systems, 107, 509–521.
Alzaylaee, M. K., Yerima, S. Y., & Sezer, S. (2020). DL-Droid: Deep learning based android malware detection using real devices. Computers & Security, 89, Article 101663.
Arp, D., Spreitzenbarth, M., Hübner, M., Gascon, H., & Rieck, K. (2014). DREBIN: Effective and explainable detection of android malware in your pocket. In Network & distributed system security symposium (pp. 23–26).
Bakour, K., & Ünver, H. M. (2021). DeepVisDroid: Android malware detection by hybridizing image-based features with deep learning techniques. Neural Computing and Applications, 33(18), 11499–11516.
Bourebaa, F., & Benmohammed, M. (2020). Android malware detection using convolutional deep neural networks. In 2020 international conference on advanced aspects of software engineering (pp. 1–7).
Cai, M., Jiang, Y., Gao, C., Li, H., & Yuan, W. (2021). Learning features from enhanced function call graphs for android malware detection. Neurocomputing, 423, 301–307.
Cai, H., Meng, N., Ryder, B., & Yao, D. (2018). Droidcat: Effective android malware detection and categorization via app-level profiling. IEEE Transactions on Information Forensics and Security, 14(6), 1455–1470.
Calleja, A., Martín, A., Menéndez, H. D., Tapiador, J., & Clark, D. (2018). Picking on the family: Disrupting android malware triage by forcing misclassification. Expert Systems with Applications, 95, 113–126.
D’Angelo, G., Palmieri, F., Robustelli, A., & Castiglione, A. (2021). Effective classification of android malware families through dynamic features and neural networks. Connection Science, 33(3), 786–801.
Enck, W., Gilbert, P., Han, S., Tendulkar, V., Chun, B., Cox, L. P., et al. (2014). Taintdroid: An information-flow tracking system for realtime privacy monitoring on smartphones. ACM Transactions on Computer Systems, 32(2), 1–29.
Fang, Y., Gao, Y., Jing, F., & Zhang, L. (2020). Android malware familial classification based on dex file section features. IEEE Access, 8, 10614–10627.
Gao, H., Cheng, S., & Zhang, W. (2021). Gdroid: Android malware detection and classification with graph convolutional network. Computers & Security, 106, Article 102264.
Ghouti, L., & Imam, M. (2020). Malware classification using compact image features and multiclass support vector machines. IET Information Security, 14(4), 419–429.
Gibert, D., Mateu, C., & Planes, J. (2020). HYDRA: A multimodal deep learning framework for malware classification. Computers & Security, 95, Article 101873.
GooglePlayStore (2022). Google play store. https://play.google.com/store/apps.
Haq, I. U., Khan, T. A., & Akhunzada, A. (2021). A dynamic robust DL-based model for android malware detection. IEEE Access, 9, 74510–74521.
Hasan, H., Ladani, B. T., & Zamani, B. (2021). MEGDroid: A model-driven event generation framework for dynamic android malware analysis. Information and Software Technology, 135, Article 106569.
Hu, J., Shen, L., & Sun, G. (2018). Squeeze-and-excitation networks. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 7132–7141).
Hu, W., Tao, J., Ma, X., Zhou, W., Zhao, S., & Han, T. (2014). MIGDroid: Detecting APP-repackaging android malware via method invocation graph. In 2014 23rd international conference on computer communication and networks (pp. 1–7).
Huang, J., Lu, C., Ping, G., Sun, L., & Ye, X. (2020). TCN-ATT: A non-recurrent model for sequence-based malware detection. In Pacific-asia conference on knowledge discovery and data mining (pp. 178–190). Springer.
Kabakus, A. T. (2022). DroidMalwareDetector: A novel android malware detection framework based on convolutional neural network. Expert Systems with Applications, 206, Article 117833.
Lakshmanarao, A., & Shashi, M. (2022). Android malware detection with deep learning using RNN from opcode sequences. International Journal of Interactive Mobile Technologies, 16(01), 145–157.
Li, S., Chen, J., Spyridopoulos, T., Andriotis, P., Ludwiniak, R., & Russell, G. (2015). Real-time monitoring of privacy abuses and intrusion detection in android system. In International conference on human aspects of information security, privacy, and trust (pp. 379–390). Springer.
Li, D., & Li, Q. (2020). Adversarial deep ensemble: Evasion attacks and defenses for malware detection. IEEE Transactions on Information Forensics and Security, 15, 3886–3900.
Liu, C., Li, B., Zhao, J., Su, M., & Liu, X. (2021). MG-DVD: A real-time framework for malware variant detection based on dynamic heterogeneous graph learning. arXiv preprint arXiv:2106.12288.
Luo, W., Li, Y., Urtasun, R., & Zemel, R. (2016). Understanding the effective receptive field in deep convolutional neural networks. Advances in neural Information processing systems, 29.
Mahindru, A., & Sangal, L. A. (2021). Mldroid-framework for android malware detection using machine learning techniques. Neural Computing and Applications, 33(10), 5183–5240.
Marastoni, N., Continella, A., Quarta, D., Zanero, S., & Dalla Preda, M. (2017). GroupDroid: Automatically grouping mobile malware by extracting code similarities. In Proceedings of the 7th software security and protection workshop (pp. 1–12).
Martín, A., Rodríguez-Fernández, V., & Camacho, D. (2018). CANDYMAN: Classifying android malware families by modelling dynamic traces with Markov chains. Engineering Applications of Artificial Intelligence, 74, 121–133.
Meng, G., Xue, Y., Xu, Z., Liu, Y., Zhang, J., & Narayanan, A. (2016). Semantic modelling of android malware for effective malware comprehension, detection, and classification. In The 25th international symposium (pp. 306–317).
Millar, S., McLaughlin, N., del Rincon, J. M., & Miller, P. (2021). Multi-view deep learning for zero-day android malware detection. Journal of Information Security and Applications, 58, Article 102718.
Oquab, M., Bottou, L., Laptev, I., & Sivic, J. (2014). Learning and transferring mid-level image representations using convolutional neural networks. In Proceedings of the IEEE computer society conference on computer vision and pattern recognition (pp. 1717–1724).
Peng, D., Yu, X., Peng, W., & Lu, J. (2021). DGFAU-net: Global feature attention upsampling network for medical image segmentation. Neural Computing and Applications, 33(18), 12023–12037.
QiAnXin (2021). Security situation analysis report of android platform in 2020. https://www.qianxin.com/threat/reportdetail?report_id=125.
Rathore, H., Sahay, S. K., Nikam, P., & Sewak, M. (2021). Robust android malware detection system against adversarial attacks using Q-learning. Information Systems Frontiers, 23(4), 867–882.
Sihag, V., Vardhan, M., Singh, P., Choudhary, G., & Son, S. (2021). De-LADY: Deep learning based android malware detection using dynamic features. Journal of Internet Services and Information Security, 11(2), 34–45.
Statista (2021). Smartphone market share. https://www.statista.com/statistics/1236760/worldwide-smartphone-operating-system-shipment-market-share/.
Sun, T., Daoudi, N., Allix, K., & Bissyandé, T. F. (2021). Android malware detection: Looking beyond Dalvik bytecode. In 2021 36th IEEE/ACM international conference on automated software engineering workshops (pp. S34–39).
Surendran, R., Thomas, T., & Emmanuel, S. (2020). GSDroid: Graph signal based compact feature representation for android malware detection. Expert Systems with Applications, 159, Article 113581.
Vasan, D., Alazab, M., Wassan, S., Naeem, H., Safaei, B., & Zheng, Q. (2020). IMCFN: Image-based malware classification using fine-tuned convolutional neural network architecture. Computer Networks, 171, Article 107138.


VirusShare (2022). Virusshare. https://virusshare.com.
VirusTotal (2022). Virustotal. https://www.virustotal.com/gui/home/upload.
Wang, Z., Guo, Y., & Wang, J. (2021). Empower Chinese event detection with improved atrous convolution neural networks. Neural Computing and Applications, 33(11), 5805–5820.
Wang, Q., Hassan, W. U., Li, D., Jee, K., Yu, X., Zou, K., et al. (2020). You are what you do: Hunting stealthy malware via data provenance analysis. In 27th annual network and distributed system security symposium.
Wang, X., Wang, W., He, Y., Liu, J., Han, Z., & Zhang, X. (2017). Characterizing android apps’ behavior for effective detection of malapps at large scale. Future Generation Computer Systems, 75, 30–45.
Wang, S., Zhou, G., Lu, J., & Zhang, F. (2019). A novel malware detection and classification method based on capsule network. In International conference on artificial intelligence and security (pp. 573–584). Springer.
Woo, S., Park, J., Lee, J., & Kweon, I. S. (2018). Cbam: Convolutional block attention module. In Proceedings of the european conference on computer vision (pp. 3–19).
Wu, B., Chen, S., Gao, C., Fan, L., Liu, Y., Wen, W., et al. (2021). Why an android app is classified as malware: Toward malware classification interpretation. ACM Transactions on Software Engineering and Methodology, 30(2), 1–29.
Xiao, X., & Yang, S. (2019). An image-inspired and cnn-based android malware detection approach. In 2019 34th IEEE/ACM international conference on automated software engineering (pp. 1259–1261). IEEE.
Xue, L., Zhou, Y., Chen, T., Luo, X., & Gu, G. (2017). Malton: Towards on-device non-invasive mobile malware analysis for ART. In 26th USENIX security symposium (pp. 289–306).
Yadav, P., Menon, N., Ravi, V., Vishvanathan, S., & Pham, T. D. (2022). EfficientNet convolutional neural networks-based android malware detection. Computers & Security, 115, Article 102622.
Zhang, W., Luktarhan, N., Ding, C., & Lu, B. (2021). Android malware detection using TCN with bytecode image. Symmetry, 13(7), 1107.
Zhang, Z., Qi, P., & Wang, W. (2020). Dynamic malware analysis with feature engineering and feature learning. In Proceedings of the AAAI conference on artificial intelligence (pp. 1210–1217).
Zhou, Y. (2021). An automated pipeline for privacy leak analysis of android applications. In 2021 36th IEEE/ACM international conference on automated software engineering (pp. 1048–1050). IEEE.
Zhu, H., Wang, L., Zhong, S., Li, Y., & Sheng, V. S. (2022). A hybrid deep network framework for android malware detection. IEEE Transactions on Knowledge and Data Engineering, 34(12), 5558–5570.

No ratings yet
Improved Chimp Optimization Algorithm (ICOA) Feature Selection and Deep Neural Network Framework For Internet of Things (IOT) Based Android Malware Detection
8 pages
Research BT4260
No ratings yet
Research BT4260
5 pages
Machine Learning Aided Android Malware Classification
No ratings yet
Machine Learning Aided Android Malware Classification
21 pages
Report Documentation: Key Words: - Android Security, Malware Detection, Characterization, Deep Learning
No ratings yet
Report Documentation: Key Words: - Android Security, Malware Detection, Characterization, Deep Learning
1 page
CYCLOPENTANE
No ratings yet
CYCLOPENTANE
2 pages
Droiddetector: Android Malware Characterization and Detection Using Deep Learning
No ratings yet
Droiddetector: Android Malware Characterization and Detection Using Deep Learning
10 pages
Paulo Coelho'S: Aleph
No ratings yet
Paulo Coelho'S: Aleph
1 page
Significant Permission Identification For Machine Learning Based Android Malware Detection
No ratings yet
Significant Permission Identification For Machine Learning Based Android Malware Detection
10 pages
Android Malware Detection Using Machine Learning
No ratings yet
Android Malware Detection Using Machine Learning
4 pages
Effective Vulnerability Management: Managing Risk in the Vulnerable Digital Ecosystem
From Everand
Effective Vulnerability Management: Managing Risk in the Vulnerable Digital Ecosystem
Chris Hughes
5/5 (1)
Digital Technologies – an Overview of Concepts, Tools and Techniques Associated with it
From Everand
Digital Technologies – an Overview of Concepts, Tools and Techniques Associated with it
Editor IJSMI
No ratings yet
Wikitude Development Essentials: Definitive Reference for Developers and Engineers
From Everand
Wikitude Development Essentials: Definitive Reference for Developers and Engineers
Richard Johnson
No ratings yet

Fig. 1. The proposed end-to-end android malware detection framework.

Algorithm 2: image converting

Fig. 3. Merging multiple Dex files.

Fig. 4. The architectural overview of model.

4.3. Performance evaluation

To evaluate the effectiveness of the proposed method, we con-

Table 2

4.4. Comparison with existing works

Fig. 5. Loss and accuracy of MADRF-CNN.
