
Received December 13, 2021, accepted December 29, 2021, date of publication January 4, 2022, date of current version January 7, 2022.
Digital Object Identifier 10.1109/ACCESS.2022.3140341

An Automated Vision-Based Deep Learning Model for Efficient Detection of Android Malware Attacks
IMAN ALMOMANI 1,2 (Senior Member, IEEE), AALA ALKHAYER 1, AND WALID EL-SHAFAI 1,3


1 Security Engineering Laboratory, Computer Science Department, Prince Sultan University, Riyadh 11586, Saudi Arabia
2 Computer Science Department, King Abdullah II School of Information Technology, The University of Jordan, Amman 11942, Jordan
3 Department of Electronics and Electrical Communications Engineering, Faculty of Electronic Engineering, Menoufia University, Menouf 32952, Egypt
Corresponding authors: Iman Almomani and Walid El-Shafai
This work was supported by Prince Sultan University, Saudi Arabia, under Grant SEED-CCIS-2021-84.

ABSTRACT Recently, cybersecurity experts and researchers have given special attention to developing
cost-effective deep learning (DL)-based algorithms for Android malware detection (AMD) systems. However,
the conventional AMD solutions necessitate extensive computations to achieve high accuracy in
detecting Android malware apps. Consequently, there is a significant benefit in utilizing convolutional neural
networks (CNNs) in vision-based AMD applications to quickly and efficiently learn without prior stages of
reverse engineering processes. Thus, this paper introduces an efficient and automated vision-based AMD
model composed of 16 well-developed and fine-tuned CNN algorithms. This model precludes the need for
a pre-designated features extraction process while generating accurate predictions of malware images with
minimum cost and high detection speed. Such performance is achieved with colored or grayscale malware
images, whether by using balanced or imbalanced datasets. Firstly, the bytecodes of the "classes.dex" files
extracted from the Android benign and malware apps were converted to color and grayscale visual images
before forwarding them to the developed CNN algorithms for classification. Then, the detection efficiency of
the proposed AMD model was examined and evaluated using the imbalanced benchmark Leopard Android
dataset that comprises 14733 samples of malware apps and 2486 samples of benign apps. Finally, different
experimental scenarios were conducted using balanced and imbalanced Android samples of color and
grayscale images generated from the Leopard dataset to extensively validate the detection
and classification performance of the suggested model. Comprehensive assessment classification parameters
in the evaluation experiments were applied to prove the high capability of the developed fine-tuned CNN
algorithms in recognizing Android malware attacks with low computational overhead. As a result, the
detection accuracy reached 99.40% for balanced samples and 98.05% for imbalanced samples. Furthermore,
the proposed AMD model outperforms the existing approaches that utilize conventional vision-based
algorithms and are tested on the same benchmark Android dataset.

INDEX TERMS Cyberattacks, android, malware detection, visualization, color and grayscale images,
imbalanced datasets, deep learning, machine learning, convolution neural network (CNN), fine-tuning,
transfer learning.

I. INTRODUCTION

Android Operating System (OS) dominates the smartphone marketplace, holding 72.84% of the mobile market share [1]. Furthermore, due to the open-source nature of the Android platform, users can download applications from several markets such as Google Play Store or third-party marketplaces. However, the open nature of Android, along with its popularity, has attracted malware attackers. Any application with malicious intent is malicious software (malware). Malware is developed to control the user's device, steal his or her information, and interrupt the OS functionality.

The associate editor coordinating the review of this manuscript and approving it for publication was Shuihua Wang.

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 License.


For more information, see https://creativecommons.org/licenses/by-nc-nd/4.0/ VOLUME 10, 2022
I. Almomani et al.: Automated Vision-Based Deep Learning Model for Efficient Detection of Android Malware Attacks

Android malware can be categorized into several types such as Riskware, SMS, Adware, and Banking [2]. To evade malware detection systems, malicious software developers usually apply small modifications to the original source code of the malware app to generate new malicious software variants. In consequence, identifying the new malicious software variants becomes challenging even if they belong to the same family [3].

In order to overcome the aforementioned challenge, a model can be trained using machine learning (ML) to efficiently identify malicious software families with regard to their source code variants [4]. ML has been deployed in developing malware detection systems using different approaches such as static, dynamic, or hybrid analysis [5]–[7]. In static analysis, the original source code of the Android application is parsed without executing the app. On the other hand, the dynamic analysis studies the app's features and its behaviour during run-time. However, in both approaches, retrieving the features of the Android Application Package (APK) by reverse engineering or run-time execution consumes processing time and computational resources. Alternatively, a model can be easily trained utilizing the deep learning (DL) approach by converting the malware classification issue to an image classification issue [8]–[11].

Convolutional Neural Networks (CNNs) are a type of deep learning model implemented as a multi-layer algorithm to classify a large set of images. Inspired by the effective classification of CNNs, this paper proposes an automated vision-based DL model for Android malware detection (AMD) systems. The substantial contributions of this work are:
• Presenting a comprehensive review of the static-based, dynamic-based, and vision-based AMD approaches.
• Introducing an automated vision-based AMD model for accurate and efficient detection of malware attacks existing in the Android operating system.
• Developing 16 different fine-tuned DL-based CNN algorithms (Xception, VGG16, VGG19, DarkNet53, MobileNetV2, ResNet101, AlexNet, ResNet50, ResNet18, InceptionV3, DarkNet19, ShuffleNet, Places365-GoogleNet, NasNetMobile, GoogleNet, and SqueezeNet) to proficiently classify benign apps from malware apps without the need for extensive computations of reverse engineering or features extraction stages.
• Testing two binary classification scenarios using imbalanced (14733 malware samples and 2486 benign samples) and balanced (2486 malware samples and 2486 benign samples) Android apps datasets to demonstrate that the developed fine-tuned CNN algorithms work on balanced and imbalanced datasets of different sizes without the data augmentation techniques needed by other conventional classification approaches.
• Accomplishing lower computational overhead and higher detection accuracy for the proposed vision-based automated AMD model compared to conventional detection models. This is achieved with fewer training iterations by using only the fine-tuning process of the CNN layers, hyperparameters, and CNN optimization techniques.
• Performing extensive experiments using both visual color and grayscale images of Android benign and malware apps to precisely evaluate the detection performance of the suggested AMD model even when it is applied to different visual image representations.
• Executing comprehensive simulation tests to prove the validity and efficiency of the proposed automated AMD model using 16 various detection and classification parameters.
• Implementing different experiments to check the storage capacity and complexity performance of the developed CNN algorithms to prove the simplicity and efficiency of the proposed automated AMD model in recognizing Android malware attacks.
• Conducting a comparative study in terms of the obtained classification accuracy of Android malware attacks to confirm the superiority of the proposed AMD model in comparison to recent related and conventional AMD models.

The rest of the paper is structured as follows. Section II discusses a background on the Android application package and related works. Section III introduces the proposed automated vision-based AMD model. Section IV displays the experimental results and discussions. Finally, Section V concludes this paper and presents possible future work.

II. BACKGROUND AND LITERATURE REVIEW

A. ANDROID APPLICATION PACKAGE

Android Application Package, APK, is a zipped file for distributing and installing applications by the Android OS [12]. Unzipping the APK file mainly retrieves the following:
• AndroidManifest.xml: a binary XML file format that contains metadata of the app such as app name, permissions, and version.
• classes.dex: a Dex file format that contains the app code.
• resources.arsc: a file that contains the pre-compiled resources of the app, such as styles, colors, and strings.
• assets: a directory that contains the app assets.
• res: a directory that contains all the app resources which are not included in the resources.arsc file.
• lib: a directory that contains all the app libraries.
• META-INF: a directory that contains the metadata of the APK, such as the APK signature.

A further step can be implemented by utilizing some reverse-engineering tools to get different formats of the .dex file as shown in Fig. 1. For example, the app classes can be retrieved in .smali format by using APKtool. Furthermore, the classes in Java format can be restored from the .dex file by deploying the Dex2jar tool [13]. This reverse engineering step might be necessary for some research works to implement deep feature extraction [14]–[16].
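
As a rough illustration of the APK layout listed above, the following sketch opens an APK as an ordinary ZIP archive with Python's standard `zipfile` module and groups its entries under the top-level items named in the text. The helper names `summarize_apk` and `read_dex_bytes` are hypothetical and not part of the paper. The sketch only inspects the archive; it does not decode the binary XML manifest or decompile the .dex file.

```python
import zipfile
from collections import Counter

# Top-level APK items named in the text; anything else is counted as "other".
KNOWN_FILES = {"AndroidManifest.xml", "classes.dex", "resources.arsc"}
KNOWN_DIRS = {"assets", "res", "lib", "META-INF"}


def summarize_apk(apk_path):
    """Count archive entries per top-level APK item (hypothetical helper)."""
    counts = Counter()
    with zipfile.ZipFile(apk_path) as apk:  # an APK is a ZIP archive
        for name in apk.namelist():
            top = name.split("/", 1)[0]
            if name in KNOWN_FILES or top in KNOWN_DIRS:
                counts[top] += 1
            else:
                counts["other"] += 1
    return dict(counts)


def read_dex_bytes(apk_path):
    """Return the raw bytes of classes.dex without any reverse engineering."""
    with zipfile.ZipFile(apk_path) as apk:
        return apk.read("classes.dex")
```

Calling `summarize_apk("app.apk")` (the path is a placeholder) returns a dictionary of entry counts, and `read_dex_bytes` yields the byte string that a vision-based approach can work on directly, without APKtool or Dex2jar.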


FIGURE 1. Android APK decompilation process.

B. STATIC/DYNAMIC ML-BASED ANALYSIS

Machine learning (ML) is an approach in which the system learns a pattern, develops a model, and generates predictions by observing only the input data. In implementing malware detection systems for Android applications, the machine learning analysis approach utilizes several features of the Android app including API calls, permissions, and control flows [17]. Such features can be obtained by static, dynamic, or hybrid approaches [18]. In the static analysis approach, the source code of the Android malware is parsed without executing the application. Several static features can be extracted by scanning the binary files of the malicious software, including permissions and API calls [19]. Subsequently, various machine learning algorithms can be deployed in the classification process. Although static analysis is inexpensive, it fails to detect zero-day and obfuscated malware attacks [20].

Besides static-based analysis applications, there are various contributions to dynamic-based malware analysis [21]–[24]. The dynamic analysis approach observes the behavioral features of the malware by running the Android APK. To avoid any damage to real devices, the execution is performed in isolated virtual machines. Various behavior-based characteristics can be acquired during the dynamic analysis, such as network traffic activities [25], API calls [26], and system log files [27]. For further classification, the obtained behavioral features are combined in the features dataset [17]. However, since the dynamic analysis is performed in an isolated environment, the malware may change its behavior during run-time. Therefore, the dynamic analysis may be insufficient in capturing the real behavior of the malware attack.

C. VISION ML-BASED ANALYSIS

The research work on the visual-based analysis approach offered a new direction to deploy convolutional neural network (CNN) algorithms to effectively detect malicious software. Huang et al. developed R2-D2, a color-based CNN detection system for the Android platform [8]. The system converted the classes.dex file of the Android application into an RGB (Red, Green, Blue) image. Subsequently, the colored image was utilized in the feature extraction and training of the CNN model. The research work resulted in the creation of a database, namely the Leopard Mobile database.

Several papers have utilized the Leopard Mobile database [9]–[11], [28]. The authors of [9] developed a malware threat hunting system (MTHS) using deep CNN (DCNN) and ML. MTHS aims to detect malware by applying machine learning and deep learning on the converted binary files of malware applications. However, the proposed system was trained on colored images only. Another system, built with TensorFlow, was proposed in [10]. Initially, the malware source code was filtered, and then a deep learning algorithm was applied to identify source code plagiarism. In [11], a visual-based malware detection framework was implemented using three fine-tuned CNN models: InceptionV3, ResNet50, and VGG16. An accuracy of 97.35% was obtained. Nevertheless, the framework incurred additional complexity due to the augmentation techniques applied to handle the imbalanced sample distribution. Furthermore, Naeem et al. developed an industrial Internet of Things malicious software detection scheme utilizing a visual-based CNN algorithm [28]. They showed that malware detection based on colored visualization outperforms the utilization of grayscale images.

However, in some works, the malware and benign samples were obtained as APK files from several sources such as Drebin, AMD, and Google Play Store [29], [30]. Subsequently, the APK samples were converted into images that match the CNN model requirements. In [29], the authors created an adjacency matrix of the Android APK and converted it into an image as an input to the CNN model. Darwaish et al. developed an intelligent mapping algorithm of APK files to RGB images [31]. Their proposed system mapped the Manifest file to the green channel. Furthermore, API calls and opcodes were mapped to the red channel. Finally, the malicious behaviors were mapped to the blue channel. A further investigation was implemented in [32] utilizing the Drebin database. In that system, the malware is detected by identifying the locations of malicious opcode sequences in the Android app. Machine learning algorithms can also be combined with the CNN model to enhance the detection performance. In [33], the authors substituted the softmax layer of the CNN with a Support Vector Machine (SVM). The results showed that the fine-tuned CNN-SVM model surpassed the original CNN.

D. RELATED WORK COMPARISON

Table 1 presents a summary and comparison among current works on vision-based Android malware detection. It is concluded from Table 1 that most of the conventional DL-based or ML-based AMD models in the literature have accomplished certain detection accuracy levels that are not


TABLE 1. Summary and comparison among current works on vision-based Android malware detection.

highly appreciated and recommended for efficient Android malware identification in cybersecurity applications. Furthermore, some of them require features engineering steps before performing the learning process. In addition, the conventional detection models used datasets with a small number of samples in the training process, which dramatically reduced the detection efficiency. Thus, because the number of malware apps is increasing considerably and daily, an automated and accurate vision-based AMD model is introduced in this article to accurately and efficiently detect Android malware attacks. The suggested AMD model comprises 16 different CNN algorithms that have been fine-tuned efficiently and adequately to achieve high malware detection accuracy and low malware misclassification.

Consequently, the fine-tuned and developed CNN algorithms suggested for the vision-based AMD process in this


paper are different from conventional AMD models that introduce additional steps for extracting features. In the proposed AMD model, the bytecodes of the benign and malware APKs were converted to color and grayscale visual images before resizing and forwarding them to the developed CNN algorithms to classify them. The goal of transforming and resizing benign and malware apps to graphical images is to generate an Android dataset in a proper structure adapted to the input format and size of the utilized CNN algorithms. The main advantage of using the pre-trained CNN algorithms in the proposed AMD model is that they were previously well-trained on more than 14 million digital images of many different classes of the ImageNet database [34]. So, in the proposed AMD model, the transfer learning concept was exploited by employing the already trained features and the obtained optimal weights of the pre-trained CNN algorithms for detecting malware attacks efficiently. This benefit of transfer learning is recommended in AMD tasks, especially when examining and analyzing the performance of malware detection models on imbalanced Android datasets. Moreover, the fine-tuning of weights and hyperparameters of the CNN layers significantly improved the operation of the utilized pre-trained CNN algorithms. Consequently, the detection performance of the proposed AMD model increased without using reverse-engineering tools or signal processing-based augmentation algorithms.

III. PROPOSED AUTOMATED VISION-BASED AMD MODEL

In recent years, it has been evident that the number of Android malware cyberattacks has increased gradually. As a result, cybersecurity scholars and experts are interested in developing cost-effective and reliable solutions to mitigate the severe impact of such attacks. Therefore, this paper proposes an accurate and automated vision-based Android malware detection (AMD) model that deals with this critical cybersecurity challenge that cannot be neglected. This model comprises different fine-tuned DL-based CNN algorithms developed and exploited to detect malware attacks in Android OS efficiently.

The proposed vision-based AMD model is different from the conventional and existing AMD solutions. In contrast to the preceding static-based or dynamic-based AMD solutions that necessitate manual procedures for features extraction and collection, the proposed AMD model in this paper can efficiently detect Android malware attacks without the extensive computations that result from extracting many complex features from the analyzed Android apps. To be more specific, as indicated in Fig. 2, the proposed AMD model comprises 16 different well-developed and fine-tuned CNN algorithms that preclude the need for pre-designated extracted features. Thus, the proposed AMD model can quickly learn and efficiently differentiate and recognize Android malware and benign apps more accurately.

The main steps of the proposed automated vision-based AMD model are demonstrated in Fig. 2. It comprises three different main modules: (1) Pre-processing module, (2) Training, fine-tuning, and classification module, and (3) Detection evaluation module. The explanations and discussions of these three modules are as follows:

A. PRE-PROCESSING MODULE

In the proposed AMD model, the bytecodes of the classes.dex files obtained from the Android dataset of benign and malware apps have been converted into the three-channel format of visual color images (Red, Green, Blue). Because the type of image file affects the performance of the Android malware detection system, the classes.dex files of the Android APK files are converted to ".png" format image files since it is the most effective file type compared to other image formats. Furthermore, the ".png" format is better than other image formats at preserving the information included in the image file. The main objective of transforming Android apps into visual images is to acquire additional features and extra texture details that cannot be obtained and extracted from the original benign and malware apps in their binary formats. So, the Android dataset conversion to visual images avoids the need for reverse feature engineering steps or any specific domain knowledge, as is the case in the existing conventional signature-based (static-based) or behavior-based (dynamic-based) Android analysis techniques.

In the conversion process, each 8 bits (one byte of bytecode) in the classes.dex file is transformed into an RGB pixel. This process was repeated for all binary bits in the .dex file of all benign and malware apps in the Android dataset. After that, all obtained RGB pixels were accumulated and reformatted to generate the final 2D color image of each Android app (benign or malware).

To precisely evaluate how well the suggested AMD model works on different image visualizations and representations, the visual grayscale images of Android benign and malware apps have also been generated. Fig. 3 presents samples of the generated color and grayscale images of the benign and malware Android APKs in the Leopard mobile dataset. As shown in Fig. 3, the resulting color or grayscale images have various resolutions with different widths based on the size of their original .dex files extracted from the benign and malware APKs. Table 2 shows the relation between the Android app sizes and the specific widths of the generated visual images.

Furthermore, the visual color and grayscale images presented in Fig. 3 demonstrate that the generated images have various layouts, styles, and forms. The malware images have particular visual similarities and attributes that are entirely dissimilar from those of benign images, where each category has various distinctive stripes. These remarkable differences in the visualization features of the acquired benign and malware images inspired us to adapt and exploit the common DL-based pre-learned CNN algorithms for AMD challenges and mobile cybersecurity applications. Therefore, these CNN algorithms utilized for general image processing applications of detection,


FIGURE 2. The proposed vision-based AMD model.

classification, and recognition tasks have been exploited in the proposed work to detect malware attacks in Android OS.

After obtaining the visual color and grayscale images, they were resized before redirecting them to the suggested fine-tuned CNN algorithms for automated features extraction, training, and classification purposes. The resizing process for the generated Android benign or malware images is a mandatory step because each of the employed CNN algorithms has its specific resolution for the input image size, as depicted in Table 3.

Additionally, the obtained visual Android dataset of benign and malware images was distributed into two different percentages for testing and training objectives. More simulation experiments were carried out to decide the optimal


TABLE 2. The relation between Android APK sizes and the generated visual image widths.

TABLE 3. The image sizes of the CNN algorithms.

percentages that can be utilized to achieve efficient malware detection with high recognition accuracy and fast training. The simulations' outcomes disclosed that earmarking 80% of the visual dataset for the training process and 20% of the visual dataset for the testing process realized superior AMD performance compared to the other testing and training percentages for the examined fine-tuned CNN algorithms. In our work, we did not use a validation set because we used transfer learning CNN algorithms, not CNN algorithms developed from scratch. This is the common practice in the literature when using transfer learning CNN algorithms. In that case, the validation set and test set are combined, and they are considered the testing ratio of the utilized dataset. Therefore, in the experimental analysis, both the 80% and 20% portions of the visual images were chosen randomly by the suggested AMD model.

B. TRAINING, FINE-TUNING, AND CLASSIFICATION MODULE

Most of the existing DL-based AMD algorithms trained malware detection models on Android datasets with a limited number of APKs. Thus, these conventional AMD algorithms have significant malware detection and classification problems because they were not well trained. Consequently, they cannot efficiently discriminate the behaviors of benign apps from malware apps due to the limited number of tested samples.

The dataset size used for training the CNN models has significant impacts on the detection efficacy, classification accuracy, and the number of computations of the training and testing processes. So, the DL-based transfer learning CNN models are efficient solutions for malware attacks analysis and AMD applications, mainly when the examined Android dataset contains a small number of benign and malware samples, as in the benchmark Leopard mobile dataset used in this paper.

Therefore, the proposed AMD model has exploited and employed transfer learning. The already learned features, weights, and hyperparameters of previously pre-learned CNN algorithms tested for image recognition challenges with a general images dataset were transferred to the proposed malware image recognition challenge that used a different Android images dataset. Hence, transfer learning was an effective solution for malware detection analysis in the proposed AMD model. The tested Android dataset has a limited and imbalanced number of benign and malware images. Thus, transfer learning helped avoid over-fitting as much as possible during the training and testing processes while validating the suggested model performance.

Different DL-based pre-trained CNN algorithms were previously trained on various natural images, such as Xception [35], VGG16 [36], VGG19 [37], DarkNet53 [38], MobileNetV2 [39], ResNet101 [40], AlexNet [41], ResNet50 [42], ResNet18 [43], InceptionV3 [44], DarkNet19 [45], ShuffleNet [46], Places365-GoogleNet [47], NasNetMobile [48], GoogleNet [49], and SqueezeNet [50]. In the proposed vision-based AMD model, the fine-tuned versions of these sixteen CNN algorithms have been employed to extract the significant texture features of the Android malware and benign images. These algorithms were pre-trained on the ImageNet dataset [34] to distinguish different types of visual objects. Consequently, these CNN algorithms can be exploited and re-trained quickly using Android benign and malware images to extract their main visible details and texture features; this is the key advantage of the transfer learning process. Therefore, the transfer learning-based CNN algorithms were employed in the proposed model to detect Android malware attacks. They offered effective detection performance through knowledge transfer from general image detection and classification challenges to the Android malware image detection and classification challenge studied in this paper.

The optimized and fine-tuned versions of two different categories of pre-trained CNN algorithms were utilized in the proposed AMD model: series CNN algorithms and Directed Acyclic Graph (DAG) CNN algorithms. In each of the series CNN algorithms, such as AlexNet, VGG16, VGG19, and DarkNet19, the deep CNN layers are organized one after the other. In addition, each series CNN architecture has a single output layer and a single input layer. So, the series CNN algorithms are considered single-path deep CNN designs with no parallel paths of convolutional layers. On the other hand, the DAG CNN


FIGURE 3. 2D visual samples for malware and benign APKs.

algorithms are considered multi-path deep CNN designs, where they have concatenated parallel multi-paths of numerous convolutional layers with different filter numbers/sizes. Thus, the DAG CNN algorithms have deep CNN layers organized as a directed acyclic graph; therefore, they have more complex structures than series CNN algorithms. In addition, each DAG architecture has inputs from different CNN layers and outputs to various CNN layers. Xception, DarkNet53, MobileNetV2, ResNet101, ResNet50, ResNet18, InceptionV3, ShuffleNet, Places365-GoogleNet, NasNetMobile, GoogleNet, and SqueezeNet are examples of DAG CNN algorithms. In terms of detection accuracy, the DAG CNN algorithms achieve higher detection and classification performance than the series CNN algorithms because they can extract more informative texture features in the training process from the input malware and benign images.

Among the employed CNN algorithms tested by the proposed AMD model, the fine-tuned Xception CNN algorithm achieves the best detection results for visual Android benign and malware classification compared to the other CNN algorithms. Consequently, this paper discusses in-depth details and insights into its structure, training behavior, fine-tuning and optimization hyperparameters, and detection outcomes. Thus, the proposed vision-based AMD model has implemented and utilized the fine-tuned structure of the pre-trained Xception CNN algorithm shown in Fig. 4 to classify and detect visualized images of Android malware and benign apps. The DL-based Xception CNN algorithm was previously trained on more general digital images (approximately 14 million images with 1000 different classes) of the ImageNet database [34]. So, the already pre-trained features have been exploited and transferred in the proposed AMD model to quickly and accurately detect Android malware attacks.

The Xception CNN algorithm is a modern and enhanced version of the InceptionV3 CNN algorithm [44]. The Xception CNN algorithm is called the 'Extreme Inception' algorithm: it follows the Inception design but replaces more of the standard convolutional (Conv.) layers with SeparableConv. layers. The SeparableConv. layers are utilized instead of the Conv. layers to factorize the convolution kernel into two smaller kernels. So, the detection and classification performance of the Xception algorithm outperforms that of the InceptionV3 algorithm through proper and efficient use of the algorithm hyperparameters while using a small number of training iterations. The complete structure with the full specifications of the fine-tuned Xception algorithm utilized in the proposed vision-based AMD model is given in Fig. 4.

The input layer of the Xception CNN algorithm has an input image resolution of 299×299×3. Therefore, before forwarding the visual benign and malware images to the Xception CNN algorithm, they must be resized to 299×299×3 to meet the proper input size of the input layer. As shown in Fig. 4, the visual malware and benign images are first forwarded to the entry flow. Then, the resulting feature maps pass to the middle flow, which is repeated eight times. Finally, the resulting feature maps go through the exit flow.

The Xception CNN algorithm consists of 36 Conv. and SeparableConv. layers used for extracting the main informative texture features from the input visual malware and benign

VOLUME 10, 2022 2707


I. Almomani et al.: Automated Vision-Based Deep Learning Model for Efficient Detection of Android Malware Attacks

images. These 36 stacked Conv. layers are structured and arranged into 14 separable modules (blocks). These modules have linear residual connections except for the first and last modules. In the proposed Xception CNN algorithm, only one fully-connected layer is utilized before the final softmax layer used for detection and classification purposes. Thus, the Xception CNN algorithm is composed of a linear stack of SeparableConv. layers with linear residual connections. In the Xception CNN algorithm, all Conv. and SeparableConv. layers are followed by batch normalization layers, which are omitted from Fig. 4 for simplicity. In addition, all SeparableConv. layers utilize a depth multiplier of 1, that is, no depth expansion. A stride of 2 × 2 is used for all Conv. and MaxPooling layers. The ReLU activation function is used to accelerate the training process. Also, the Xception CNN algorithm comprises one GlobalAveragePooling layer and four MaxPooling layers with a kernel size of 3 × 3. The MaxPooling (Maximum Pooling) layer computes the maximum value for every patch of the feature map, while the GlobalAveragePooling layer computes the average value over each entire feature map.

The most important advantage of the fine-tuned Xception CNN algorithm compared to other CNN algorithms is that it can be improved easily, since its stacked modules have internally repeated types of layers that can be simply adapted and modified. In addition, the fine-tuned Xception algorithm improves the detection performance without the need to perform deeper training, since its Conv. and SeparableConv. layers have different kernels that can discover and learn distinctive texture features in the benign and malware images with a small number of training iterations. Therefore, it is computationally efficient and attractive for detecting Android malware attacks. Further information on the other 15 pre-trained CNN algorithms used (VGG16, VGG19, DarkNet-53, MobileNet-V2, ResNet101, AlexNet, ResNet-50, ResNet18, InceptionV3, DarkNet19, ShuffleNet, Places365-GoogleNet, NasNetMobile, GoogleNet, and SqueezeNet) can be found in [36]–[50].

Besides exploiting the advantages of transfer learning in the proposed AMD model, all hyperparameters of the employed CNN algorithms are fine-tuned. So, the proposed AMD model utilized fine-tuning, not other types of tuning like shallow tuning or deep tuning [51]. This is because fine-tuning achieves higher detection accuracy than shallow tuning and lower computational complexity than deep tuning. Therefore, all the hyperparameters of the employed CNN algorithms were optimized and fine-tuned in the proposed AMD model until an efficient and high detection rate was achieved. After running many tests and experiments, the final fine-tuning and optimization parameters used in the proposed vision-based AMD model are: learning rate of 0.00001, ADAM optimizer [52], ridge regression regularizer (L2-regularization) [53] with a weight decay rate of 0.001, maximum number of epochs of 10, mini-batch size of 16, validation frequency of 16, dropout rate of 0.5, learnRateSchedule parameter set to "piecewise", LearnRateDropPeriod parameter set to 3, LearnRateDropFactor parameter set to 0.9, and the categorical cross-entropy loss function. All these fine-tuned hyperparameters were carefully chosen to avoid overfitting and to optimize the performance of the training and validation processes.

Furthermore, in all employed CNN algorithms, the softmax and fully-connected classifiers were utilized to classify between Android malware and benign samples. Thus, the output layer of each of the 16 employed CNN algorithms, which originally has 1000 classes, is customized and fine-tuned to have only two classes (malware and benign). Also, the back-propagation technique [54] is utilized in the proposed AMD model to fine-tune and optimize the hyperparameters and weights of the layers in the employed CNN algorithms that were initially trained on the ImageNet dataset; this is to achieve high detection efficiency in identifying malware attacks.

C. DETECTION EVALUATION MODULE
The detection evaluation module is concerned with comprehensively evaluating the proposed vision-based AMD model using 16 different detection assessment parameters. Consequently, the classification and detection efficiency of the suggested 16 different CNN algorithms have been examined in terms of (1) recognition accuracy, (2) recall (sensitivity) (TPR) (true positive rate), (3) precision (PPV) (positive predictive value), (4) NPV (negative predictive value), (5) specificity (TNR) (true negative rate), (6) FNR (false negative rate), (7) FPR (false positive rate), (8) FOR (false omission rate), (9) FDR (false discovery rate), (10) misclassification rate, (11) F1-Score, (12) AROC (Area under the receiver operating characteristic) score, (13) accuracy curve, (14) loss curve, (15) confusion matrix, and (16) AROC curve. Further details and explanations of these detection assessment parameters can be found in [55], [56], and they can be mathematically expressed as follows:

$$\text{Accuracy} = \frac{TN + TP}{FP + TP + FN + TN} \tag{1}$$

$$\text{Sensitivity} = \text{Recall (TPR)} = \frac{TP}{FN + TP} \tag{2}$$

$$\text{Precision (PPV)} = \frac{TP}{FP + TP} \tag{3}$$

$$\text{NPV} = \frac{TN}{FN + TN} \tag{4}$$

$$\text{Specificity (TNR)} = \frac{TN}{FP + TN} \tag{5}$$

$$\text{FNR} = \frac{FN}{TP + FN} \tag{6}$$

$$\text{FPR} = \frac{FP}{TN + FP} \tag{7}$$


FIGURE 4. Flow structure of the fine-tuned Xception CNN algorithm.
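As a rough illustration of why the SeparableConv. layers in Fig. 4 are cheaper than standard Conv. layers, the following Python sketch counts the weights of a standard convolution and of its depthwise separable factorization (one depthwise kernel per input channel followed by a 1×1 pointwise convolution). The function names and the 3×3, 728-channel layer size are illustrative assumptions rather than values taken from this paper, and biases are ignored.

```python
def conv_params(kernel: int, c_in: int, c_out: int) -> int:
    """Weights of a standard kernel x kernel convolution (biases ignored)."""
    return kernel * kernel * c_in * c_out


def separable_conv_params(kernel: int, c_in: int, c_out: int,
                          depth_multiplier: int = 1) -> int:
    """Weights of a depthwise separable convolution (biases ignored).

    depthwise: one kernel x kernel filter per input channel
    pointwise: a 1x1 convolution that mixes channels
    """
    depthwise = kernel * kernel * c_in * depth_multiplier
    pointwise = c_in * depth_multiplier * c_out
    return depthwise + pointwise


if __name__ == "__main__":
    # Hypothetical layer size chosen only for illustration.
    k, c_in, c_out = 3, 728, 728
    std = conv_params(k, c_in, c_out)
    sep = separable_conv_params(k, c_in, c_out)
    print(f"standard: {std:,}  separable: {sep:,}  ratio: {std / sep:.1f}x")
```

With a depth multiplier of 1, as stated for the SeparableConv. layers above, the separable form needs roughly one ninth of the weights of a 3×3 standard convolution when the channel count is large.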

$$\text{FOR} = \frac{FN}{TN + FN} \tag{8}$$

$$\text{FDR} = \frac{FP}{TP + FP} \tag{9}$$

$$\text{Misclassification rate} = \frac{FN + FP}{FP + TP + FN + TN} \tag{10}$$

$$\text{F1-Score} = \frac{2TP}{2TP + FN + FP} \tag{11}$$


where TN (true negative), TP (true positive), FN (false negative), and FP (false positive) are obtained from the confusion matrix shown in Fig. 5. The confusion matrix, also called an error matrix, visualizes the different output predictions of the analyzed detection task. TP means that the prediction output is positive and it is actually positive; FN means that the prediction output is negative, but it is actually positive; FP means that the prediction output is positive, while it is actually negative; and TN means that the prediction output is negative and it is actually negative.

The ROC curve is a graphical representation of the tradeoff between the TPR and FPR (1-specificity). The ROC score is the area under the ROC curve. The accuracy curve is a graphical representation that traces the accuracy percentage over all training iterations (epochs), while the loss curve traces the loss over all training iterations (epochs).
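The count-based metrics in (1)-(11) follow directly from the four confusion-matrix entries. The sketch below is a minimal Python illustration of those formulas; the function name, dictionary keys, and example counts are hypothetical and are not results from this paper. It assumes every denominator is non-zero.

```python
def detection_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Compute the metrics of Eqs. (1)-(11) from confusion-matrix counts.

    Assumes all denominators are non-zero (no empty class or prediction).
    """
    total = tp + fp + tn + fn
    return {
        "accuracy": (tn + tp) / total,                # Eq. (1)
        "recall_tpr": tp / (fn + tp),                 # Eq. (2)
        "precision_ppv": tp / (fp + tp),              # Eq. (3)
        "npv": tn / (fn + tn),                        # Eq. (4)
        "specificity_tnr": tn / (fp + tn),            # Eq. (5)
        "fnr": fn / (tp + fn),                        # Eq. (6)
        "fpr": fp / (tn + fp),                        # Eq. (7)
        "for": fn / (tn + fn),                        # Eq. (8)
        "fdr": fp / (tp + fp),                        # Eq. (9)
        "misclassification_rate": (fn + fp) / total,  # Eq. (10)
        "f1_score": 2 * tp / (2 * tp + fn + fp),      # Eq. (11)
    }


if __name__ == "__main__":
    # Hypothetical counts, for illustration only.
    for name, value in detection_metrics(tp=90, fp=5, tn=95, fn=10).items():
        print(f"{name}: {value:.4f}")
```

Several of these quantities are complements of one another (TPR + FNR = 1, TNR + FPR = 1, accuracy + misclassification rate = 1), which gives a quick consistency check on any reported table.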

IV. EXPERIMENTAL RESULTS AND DISCUSSIONS


FIGURE 5. Binary confusion matrix.
This section introduces extensive experimental results, discussions, and comprehensive detection and complexity analysis for the performance validation of the proposed AMD model. The Leopard Android dataset, which originally contained an imbalanced number of malware and benign APKs, was used by the proposed model. All training and classification simulations are performed using MATLAB 2020b software on a personal laptop with 8GB RAM and Intel Core i7-4500 processor.

In the experimental results, two binary classification scenarios for validating the performance of the proposed vision-based AMD model are tested. The first scenario tested the detection efficiency of the proposed model using imbalanced Android samples, as presented in section IV-A. The second scenario tested the detection efficiency of the proposed model using balanced Android samples, as introduced in section IV-B. These two classification scenarios are investigated to demonstrate the performance of the developed fine-tuned CNN algorithms even when applied on different sizes of balanced and imbalanced datasets without utilizing data augmentation techniques employed in the conventional malware detection and classification approaches.

Furthermore, for these two classification scenarios, additional extensive experiments for the developed 16 fine-tuned CNN algorithms utilized in the proposed AMD model are presented using two different image modalities. So, the detection performance of the suggested vision-based AMD model was validated using both visual color and grayscale images of Android benign and malware apps. These comprehensive experiments were run for the suggested AMD model to confirm its high detection capability and classification efficiency on different representations of visual images.

For simplicity in displaying the detection evaluation outcomes, the confusion matrix, the loss & accuracy curves, the ROC curve, and the other estimated assessment parameters are introduced in detail for Xception, the best-performing fine-tuned CNN algorithm amongst the 16 different tested CNN algorithms. In addition, the average outcomes of all estimated detection assessment metrics are reported for the other CNN algorithms to deliver in-depth comparisons and evaluations among them.

In addition, a complexity analysis in terms of the storage capacity and execution time of the utilized Android datasets and CNN algorithms is presented in section IV-C. Finally, a comparative detection analysis between the proposed vision-based AMD model and other recent AMD models that used the same Android Leopard mobile apps dataset is discussed in section IV-D. This comparative study is presented to confirm the superiority of the proposed automated vision-based AMD model in comparison with recent related and conventional DL and ML-based AMD models in detecting and identifying Android malware attacks.

A. PERFORMANCE ANALYSIS ON COLOR AND GRAYSCALE IMAGES OF IMBALANCED ANDROID SAMPLES
This section provides the performance analysis of the proposed vision-based AMD model using imbalanced color and grayscale visual images (14733 malware images and 2486 benign images). So, more experiments were carried out for testing the proposed model performance using the 16 different fine-tuned CNN algorithms utilizing the imbalanced color and grayscale malware and benign images.

The training and testing accuracy & loss curves of the superior fine-tuned Xception CNN algorithm utilizing the imbalanced visual color and grayscale images across ten epochs are demonstrated in Figs. 6 and 7, respectively. It is observed from these curves that both the accuracy and loss


curves of the training and testing processes for the color and grayscale images are compatible with each other. There is only slight overfitting in the loss curves, resulting from the imbalanced samples of both visual color and grayscale images of the malware and benign apps. But the validation losses in both image cases are still lower than 0.1 at epoch 10, which are acceptable values. In general, both the loss and accuracy curves of the training and testing operations on the imbalanced visual color and grayscale images stabilized in fewer than five epochs. Thus, the employed Xception CNN algorithm achieved high detection efficiency at a low number of iterations (epochs) for both imbalanced color and grayscale images. So, it is highly advocated for recognizing malware attacks efficiently and accurately in Android cybersecurity applications. Similarly, analogous loss and accuracy curves were observed for the other 15 examined fine-tuned CNN algorithms in all tested experimental scenarios on color and grayscale images.

The confusion matrices obtained for the superior fine-tuned Xception CNN algorithm utilizing the imbalanced visual color and grayscale images are presented in Fig. 8. These are binary confusion matrices for the examined benign and malware color and grayscale images of the imbalanced Android samples. It is observed that the TP, FP, TN, and FN values achieved for the visual color images are better than those of the grayscale images. But, in general, the obtained values of both image visualizations for the fine-tuned Xception CNN algorithm were acceptable, especially given the highly imbalanced Android datasets. Thus, the fine-tuned Xception CNN algorithm achieved 98.05% and 97.93% accuracy in correctly detecting malware and benign samples for imbalanced color images and imbalanced grayscale images, respectively. These results are also confirmed and supported by the obtained outcomes for the fine-tuned Xception CNN algorithm, which achieved high sensitivity, specificity, and ROC values of 0.9095, 0.9925, and 0.9957, respectively, for the visual color images. Also, this CNN algorithm attained high sensitivity, specificity, and ROC values of 0.9074, 0.9915, and 0.9953, respectively, for the visual grayscale images. All these results are excellent due to exploiting the benefits of transfer learning and fine-tuning the hyperparameters and CNN layers of the suggested CNN algorithms.

In addition, the detection performance of all 16 analyzed fine-tuned CNN algorithms in recognizing color or grayscale benign and malware images has been quantitatively examined. So, the accuracy (Acc.), recall (Rec.), precision (Prec.), NPV, specificity (Spec.), FNR, FPR, FOR, FDR, misclassification rate (Mis. Class. Rate), F1-Score, and AROC score are computed for the suggested CNN algorithms. Table 4 demonstrates the detection outcomes of the employed CNN algorithms on the imbalanced visual color images. Similarly, the detection outcomes of the examined CNN algorithms on the imbalanced visual grayscale images are depicted in Table 5. These detection comparisons disclosed that the proposed fine-tuned Xception CNN algorithm achieves better values than the other CNN algorithms for all considered and calculated detection assessment parameters for both color and grayscale image representations. Consequently, this CNN algorithm is strongly recommended for effectively detecting Android malware attacks in visualized Android apps.

B. PERFORMANCE ANALYSIS ON COLOR AND GRAYSCALE IMAGES OF BALANCED ANDROID SAMPLES
This section provides the performance analysis of the proposed vision-based AMD model using balanced color and grayscale visual images (2486 malware images and 2486 benign images). So, more experiments were carried out for testing the proposed model performance using the 16 different fine-tuned CNN algorithms utilizing the balanced color and grayscale malware and benign images.

The training and testing accuracy & loss curves of the superior fine-tuned Xception CNN algorithm utilizing the balanced visual color and grayscale images across ten epochs are demonstrated in Figs. 9 and 10, respectively. It is observed from these curves that both the accuracy and loss curves of the training and testing processes for the color and grayscale images are fully compatible with each other. In general, both the loss and accuracy curves of the training and testing operations on the balanced visual color and grayscale images stabilized in fewer than five epochs. Thus, the employed Xception CNN algorithm achieved high detection efficiency at a low number of iterations (epochs) for both balanced color and grayscale images. So, it is highly advocated for recognizing malware attacks efficiently and accurately in Android cybersecurity applications. Similarly, analogous loss and accuracy curves were observed for the other 15 examined fine-tuned CNN algorithms in all tested experimental scenarios on color and grayscale images.

The confusion matrices obtained for the superior fine-tuned Xception CNN algorithm utilizing the balanced visual color and grayscale images are presented in Fig. 11. These are binary confusion matrices for the examined benign and malware color and grayscale images of the balanced Android samples. It is observed that the TP, FP, TN, and FN values achieved for the visual color images are better than those of the grayscale images. But, in general, the obtained values of both image visualizations for the fine-tuned Xception CNN algorithm were good. All these results are excellent due to exploiting the benefits of transfer learning and fine-tuning the hyperparameters and CNN layers of the suggested CNN algorithms. Thus, the fine-tuned Xception CNN algorithm achieved malware and benign sample detection accuracy of 99.40% and 99.20% for balanced color images and balanced grayscale images, respectively. These results are also confirmed and supported by the obtained outcomes for the fine-tuned Xception CNN algorithm, which achieved high sensitivity, specificity, and ROC values of 0.9940, 0.9940, and 0.9995, respectively, for the visual color images. Also,


FIGURE 6. Training and testing accuracy curves of the superior fine-tuned Xception CNN algorithm on imbalanced samples of (a) visual color
images and (b) visual grayscale images.

TABLE 4. Outcomes of detection assessment of the employed CNN algorithms on the imbalanced visual color images.


FIGURE 7. Training and testing loss curves of the superior fine-tuned Xception CNN algorithm on imbalanced samples of (a) visual color images
and (b) visual grayscale images.

FIGURE 8. Confusion matrix of the superior fine-tuned Xception CNN algorithm on imbalanced samples of (a) visual color images
and (b) visual grayscale images.

this CNN algorithm attained high sensitivity, specificity, and ROC values of 0.9920, 0.9920, and 0.9998, respectively, for the visual grayscale images.

Furthermore, the detection performance of all 16 analyzed fine-tuned CNN algorithms in recognizing balanced color or grayscale benign and malware images has been examined. So, the accuracy (Acc.), recall (Rec.), precision (Prec.), NPV, specificity (Spec.), FNR, FPR, FOR, FDR, misclassification rate (Mis. Class. Rate), F1-Score, and AROC are computed for the suggested CNN algorithms. Table 6 demonstrates the detection outcomes of the employed CNN algorithms on the balanced visual color images, while the detection outcomes of the employed CNN algorithms on the balanced visual grayscale images are depicted in Table 7. These detection comparisons disclosed that the proposed fine-tuned Xception CNN algorithm achieves better values than the other CNN algorithms for all considered and calculated detection assessment parameters for balanced color and grayscale image representations. Consequently, this CNN algorithm is strongly recommended for effectively detecting Android malware attacks in visualized Android apps.

Overall, all examined fine-tuned CNN algorithms achieved good detection results, and thus, they can be utilized effectively for detecting


TABLE 5. Outcomes of detection assessment of the best four performed CNN algorithms on the imbalanced visual grayscale images.

FIGURE 9. Training and testing accuracy curves of the superior fine-tuned Xception CNN algorithm on balanced samples of (a) visual color
images and (b) visual grayscale images.

malware attacks in the form of visual color or grayscale images using imbalanced or balanced Android datasets. Furthermore, as observed, all results achieved for the visual color images are better than the


TABLE 6. Outcomes of detection assessment of the employed CNN algorithms on the balanced visual color images.

FIGURE 10. Training and testing loss curves of the superior fine-tuned Xception CNN algorithm on balanced samples of (a) visual color images
and (b) visual grayscale images.

results for the visual grayscale images, for both balanced and imbalanced samples. This is because color images contain more visualization features and texture details than grayscale images. Also, as noticed, all detection results obtained on the balanced Android samples are better than those obtained on the imbalanced Android samples for all examined fine-tuned CNN algorithms.

C. COMPLEXITY ANALYSIS
This section discusses the complexity performance in terms of the storage capacity, experimental analysis, and execution time of the utilized Android datasets and CNN algorithms. So, the quantitative computational analysis of the utilized CNN algorithms in the proposed automated vision-based AMD model is examined in terms of (1) storage capacity of the used color and grayscale Android samples of the imbalanced and balanced datasets, (2) experimental analysis in terms of the (a) number of layers, (b) storage capacity, (c) total number of the trainable and non-trainable parameters, and (d) reduction percentage in the training parameters of the examined CNN algorithms used in the detection experiments, and (3) execution time analysis of the examined CNN algorithms on the color and grayscale Android samples of the imbalanced and balanced datasets.


TABLE 7. Outcomes of detection assessment of the employed CNN algorithms on the balanced visual gray-scale images.

FIGURE 11. Confusion matrix of the superior fine-tuned Xception CNN algorithm on balanced samples of (a) visual color images and
(b) visual grayscale images.

TABLE 8. Total number of Android malware and benign samples over imbalanced and balanced datasets.

TABLE 9. Storage capacity (MB) analysis of the color and grayscale Android samples of the imbalanced and balanced datasets.

Table 8 shows the total number of Android malware and benign samples over the imbalanced and balanced datasets used in the experiments. Table 9 introduces the storage capacity analysis of the color and grayscale Android samples of the imbalanced and balanced datasets. It is noticed from the last row in Table 9 that color images require 3.4% more storage than grayscale images for the imbalanced Android samples and 5.3% more for the balanced Android samples. However, this additional storage capacity is considered low compared to the detection accuracy achieved when using color images, as explained in subsections IV-A and IV-B.

Table 10 presents the experimental analysis of the examined CNN algorithms used in the detection experiments. This experimental analysis is represented by (i) size (storage capacity) of the employed CNN algorithm on a disk, (ii) depth (layers) of the studied CNN algorithm containing the number of successive parallel or series fully connected or convolutional layers on a path from the input layer to the output layer utilized for feature extraction purposes, (iii) total


TABLE 10. Experimental analysis of the examined CNN algorithms used in the detection experiments.

TABLE 11. Execution time analysis of the examined CNN algorithms of the color and grayscale Android samples of the imbalanced and balanced datasets.

parameters of the tested CNN algorithm from the input layer to the output layer, (iv) trainable parameters of the unfrozen CNN layers, (v) non-trainable parameters of the frozen CNN layers, and (vi) reduction percentage in the training parameters of the examined CNN algorithm. It is observed from Table 10 that the number of layers, non-trainable parameters, and trainable parameters vary from one CNN algorithm to another. Also, due to exploiting transfer learning advantages in the proposed AMD model, there is a considerable reduction in the training parameters of all employed CNN algorithms. So, most of the layers and training parameters of the employed CNN algorithms were frozen, as discussed in subsection III-B. For example, the most accurate fine-tuned Xception algorithm used in the proposed vision-based AMD model trained only 4,096 of the 22,900,000 parameters in the original Xception CNN algorithm.

Table 11 illustrates the execution time analysis of the examined CNN algorithms for the color and grayscale Android samples of the imbalanced and balanced datasets. So, the computational overhead of the CNN algorithms used in the proposed vision-based AMD model is estimated in terms of (1) the total computational time of the validation and training processes and (2) the average computational time to identify an Android malware or benign sample, which is calculated by dividing the whole computational time by the total number of Android malware and benign samples. It is noticed from Table 11 that the computational overhead varies from one CNN algorithm to another due to the variation in the number of layers and parameters amongst the employed CNN algorithms, as demonstrated in Table 10. Nevertheless, the obtained outcomes showed that the average computational time spent to detect an Android malware or benign sample is adequate for all examined CNN algorithms. For example, the most accurate fine-tuned Xception algorithm used in the proposed vision-based AMD model achieved low computational times of 0.4565 sec and 0.60335 sec to identify an Android sample in the imbalanced and balanced datasets, respectively.
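The two derived quantities discussed here, the reduction percentage in training parameters and the average time per sample, are simple ratios. The following Python sketch uses the Xception figures quoted in the text (4,096 trainable out of 22,900,000 parameters) and the imbalanced sample count (14733 + 2486); the function names are illustrative, and the total time in the usage lines is a hypothetical placeholder, not a value from Table 11.

```python
def trainable_reduction_percent(trainable: int, total: int) -> float:
    """Percentage of parameters that are frozen (not trained)."""
    return 100.0 * (1.0 - trainable / total)


def average_time_per_sample(total_seconds: float, n_samples: int) -> float:
    """Whole computational time divided by the number of samples."""
    return total_seconds / n_samples


if __name__ == "__main__":
    # Parameter counts quoted in the text for the fine-tuned Xception.
    reduction = trainable_reduction_percent(4_096, 22_900_000)
    print(f"reduction in training parameters: {reduction:.2f}%")

    # Hypothetical total time; the sample count is 14733 malware + 2486 benign.
    n_samples = 14_733 + 2_486
    print(f"avg time per sample: {average_time_per_sample(8000.0, n_samples):.4f} s")
```

As a side observation, 4,096 equals 2048 × 2, which would be consistent with only the weight matrix of a two-class fully connected layer on top of a 2048-dimensional feature vector being trained; the text does not state this explicitly, so it should be read as an assumption.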

TABLE 12. Comparison between the proposed vision-based AMD model and the recent conventional vision-based AMD models tested on the Leopard
Android dataset.

D. COMPARATIVE ANALYSIS WITH RELATED MODELS hand-crafted features that increase the complexity of malware
This section compares the detection and classification perfor- detection algorithms. Thus, this paper introduced an auto-
mance of the proposed automated vision-based AMD model mated vision-based AMD model that composed 16 different
with the most recent vision-based AMD models. The purpose fine-tuned CNN algorithms to efficiently and quickly detect
of this comparison is to highlight and demonstrate the pro- Android malware attacks. The proposed AMD model was
posed AMD model’s superior accomplishment in recognizing developed based on the visualization of Android APKs,
and detecting visualized Android malware attacks using the transfer learning concept, and fine-tuning process; to pro-
Leopard mobile apps dataset, either with balanced or imbal- ficiently classify benign APKs from malware APKs with-
anced Android samples. out extensive computations, reverse engineering, or feature
Table 12 demonstrates the comparison between the extraction stages. Different experiments have been carried
proposed AMD model using fine-tuned Xception CNN out using balanced and imbalanced Android samples of
algorithm and the recent conventional AMD models. It is color and grayscale images generated from the benchmark
remarked that the suggested AMD model achieved supe- Leopard dataset. The purpose of these comprehensive exper-
rior detection accuracy that reached 98.05% for imbalanced iments is to extensively and sufficiently validate the detection
color dataset and 99.40% for balanced color dataset. These and classification achievement of the suggested automated
attained detection accuracies are higher than those of all other vision-based AMD model.
baseline-related AMD models that used the same Leopard The experiments results of various classification assess-
Android mobile dataset. The proposed model did not employ ment metrics revealed that the 16 different fine-tuned CNN
any augmentation algorithms or feature engineering tech- algorithms included in the proposed AMD model have effi-
niques like other conventional detection models.
Thus, in contrast to most recent related vision-based AMD models, which add stages of feature engineering and/or data augmentation to their malware detection pipelines, the proposed automated vision-based AMD model avoids the need for these computational stages. As observed, the proposed model achieved higher classification performance and higher detection efficiency than the conventional models by employing only transfer learning and fine-tuning for the utilized CNN algorithms to detect Android malware attacks efficiently.

V. CONCLUSION AND FUTURE WORK
There are enormous limitations and difficulties in processing and analyzing unknown and massive Android malware samples using dynamic analysis, static analysis, or traditional ML techniques. Consequently, there is an essential need to develop innovative artificial intelligence algorithms to mitigate the critical cybersecurity problems resulting from Android mobile malware attacks.
The vision-based DL techniques utilized for recognizing Android malware samples have significant detection merits by avoiding feature-engineering steps required to obtain

ciently performed with the visualized color and grayscale images in case of balanced and imbalanced Android apps datasets. Moreover, compared to the related and conventional AMD models, the proposed AMD model achieved higher detection accuracy, lower computational overhead, and better recognition performance without employing any augmentation algorithms or complicated feature-engineering tools.
Future work can consider further enhanced versions of the designed CNN algorithms that perform adequately with other highly imbalanced Android datasets. Accordingly, different Android datasets of new malware attack families can be examined and investigated. In addition, the authors intend to collect and build their own Android dataset comprising ransomware attacks. This is to further test the classification efficiency of the developed CNN algorithms and their detection capabilities in identifying and recognizing different recent families of Android malware or ransomware attacks. Moreover, the authors intend to propose and develop an image-based real-time Android malware detection system in future work. To this end, the authors have already started developing an Android mobile application and a cloud-based back-end web service that can detect malware APK files while they are being downloaded from the Google store by first converting them into images.
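As an illustration of the image-conversion step mentioned above, the following minimal Python sketch maps the raw bytes of a file to a grayscale pixel grid. It is not the authors' implementation: the function names `bytes_to_grayscale` and `to_pgm`, the default row width of 256, the zero padding of the last row, and the PGM output format are all assumptions made for this example.

```python
import math
from typing import List


def bytes_to_grayscale(data: bytes, width: int = 256) -> List[List[int]]:
    """Map each byte to one pixel (0-255) and wrap the bytes into rows.

    Hypothetical helper: the row width and the zero padding are
    illustrative choices, not values taken from the paper.
    """
    if width <= 0:
        raise ValueError("width must be positive")
    if not data:
        raise ValueError("data must not be empty")
    height = math.ceil(len(data) / width)
    # Zero-pad the tail so that the last row is full.
    padded = data + bytes(height * width - len(data))
    return [list(padded[r * width:(r + 1) * width]) for r in range(height)]


def to_pgm(pixels: List[List[int]]) -> bytes:
    """Serialize the pixel grid as a binary PGM (P5) grayscale image."""
    height, width = len(pixels), len(pixels[0])
    header = f"P5\n{width} {height}\n255\n".encode("ascii")
    return header + bytes(value for row in pixels for value in row)


# Tiny stand-in for the contents of an APK file.
sample = bytes(range(10))
image = bytes_to_grayscale(sample, width=4)
pgm = to_pgm(image)
```

In practice the input would be the bytes read from an APK file (for example with `open(path, "rb").read()`), and the resulting image would be resized to the input resolution expected by the chosen CNN before classification.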


ACKNOWLEDGMENT
The authors would like to thank Prince Sultan University for its support in paying the Article Processing Charges (APC) of this publication. Moreover, this research was done during the author Iman Almomani’s sabbatical year 2021/2022 from The University of Jordan, Amman, Jordan.


IMAN ALMOMANI (Senior Member, IEEE) received the bachelor’s degree from United Arab Emirates, in 2000, the master’s degree in computer science from Jordan, in 2002, and the Ph.D. degree in wireless network security from De Montfort University, U.K., in 2007. She is currently an Associate Professor in cybersecurity. She is the Associate Director with the Research and Initiatives Centre (RIC) and also the Leader with the Security Engineering Laboratory (SEL), Prince Sultan University (PSU), Riyadh, Saudi Arabia. Before joining PSU, she worked as an Associate Professor and the Head of the Computer Science Department, The University of Jordan, Jordan. Her research interests include wireless networks and security, mainly wireless mobile ad-hoc networks (WMANETs), wireless sensor networks (WSNs), multimedia networking (VoIP), and security issues in wireless networks. She is also interested in the area of electronic learning (e-learning) and mobile learning (m-learning). She has several publications in the above areas in a number of reputable international and local journals and conferences. She is in the organizing and technical committees for a number of local and international conferences. She is also a Senior Member of IEEE WIE. Also, she serves as a reviewer and a member for the editorial board in a number of international journals.

AALA ALKHAYER received the Bachelor of Engineering degree in information technology engineering from SVU University, Damascus, in 2017, and the bachelor’s degree in software engineering from Prince Sultan University (PSU), Riyadh, Saudi Arabia, in 2018. She is currently a Research Engineer at the Security Engineering Laboratory (SEL), PSU. Her research interests include software engineering, networks security, malware analysis, multimedia networking, and computer vision.

WALID EL-SHAFAI was born in Alexandria, Egypt. He received the B.Sc. degree (Hons.) in electronics and electrical communication engineering from the Faculty of Electronic Engineering (FEE), Menoufia University, Menouf, Egypt, in 2008, the M.Sc. degree from the Egypt-Japan University of Science and Technology (E-JUST), in 2012, and the Ph.D. degree from the Faculty of Electronic Engineering, Menoufia University, in 2019. Since January 2021, he has been a Postdoctoral Research Fellow at the Security Engineering Laboratory (SEL), Prince Sultan University (PSU), Riyadh, Saudi Arabia. He is currently working as a Lecturer and an Assistant Professor with the Electronics and Communication Engineering (ECE) Department, FEE, Menoufia University. His research interests include wireless mobile and multimedia communications systems, image and video signal processing, efficient 2D video/3D multi-view video coding, multi-view video plus depth coding, 3D multi-view video coding and transmission, quality of service and experience, digital communication techniques, cognitive radio networks, adaptive filters design, 3D video watermarking, steganography, and encryption, error resilience and concealment algorithms for H.264/AVC, H.264/MVC, and H.265/HEVC video codecs standards, cognitive cryptography, medical image processing, speech processing, security algorithms, software defined networks, the Internet of Things, medical diagnoses applications, FPGA implementations for signal processing algorithms and communication systems, cancellable biometrics and pattern recognition, image and video magnification, artificial intelligence for signal processing algorithms and communication systems, modulation identification and classification, image and video super-resolution and denoising, cybersecurity applications, malware and ransomware detection and analysis, deep learning in signal processing, and communication systems applications. He has several publications in the above research areas in several reputable international and local journals and conferences. Also, he serves as a reviewer for several international journals.
