An Automated Vision-Based Deep Learning Model For Efficient Detection of Android Malware Attacks
January 7, 2022.
Digital Object Identifier 10.1109/ACCESS.2022.3140341
ABSTRACT Recently, cybersecurity experts and researchers have given special attention to developing cost-effective deep learning (DL)-based algorithms for Android malware detection (AMD) systems. However, conventional AMD solutions require extensive computation to achieve high accuracy in detecting Android malware apps. Consequently, there is a significant benefit in utilizing convolutional neural networks (CNNs) in vision-based AMD applications, since they can learn quickly and efficiently without prior reverse-engineering stages. Thus, this paper introduces an efficient and automated vision-based AMD model composed of 16 well-developed and fine-tuned CNN algorithms. This model removes the need for a predefined feature extraction process while generating accurate predictions on malware images with minimum cost and high detection speed. This performance is achieved with color or grayscale malware images and with both balanced and imbalanced datasets. First, the bytecodes of the ‘‘classes.dex’’ files extracted from the Android benign and malware apps were converted into color and grayscale visual images before being forwarded to the developed CNN algorithms for classification. Then, the detection efficiency of the proposed AMD model was examined and evaluated using the imbalanced benchmark Leopard Android dataset, which comprises 14733 malware samples and 2486 benign samples. Finally, different experimental scenarios were conducted using balanced and imbalanced Android samples of color and grayscale images generated from the Leopard dataset to extensively validate the detection and classification performance of the suggested model. Comprehensive classification assessment parameters were applied in the evaluation experiments to demonstrate the capability of the developed fine-tuned CNN algorithms in recognizing Android malware attacks with low computational overhead. As a result, the detection accuracy reached 99.40% for balanced samples and 98.05% for imbalanced samples. Furthermore, the proposed AMD model outperforms existing approaches that utilize conventional vision-based algorithms and are tested on the same benchmark Android dataset.
INDEX TERMS Cyberattacks, android, malware detection, visualization, color and grayscale images,
imbalanced datasets, deep learning, machine learning, convolution neural network (CNN), fine-tuning,
transfer learning.
Android malware can be categorized into several types, such as Riskware, SMS, Adware, and Banking [2]. To evade malware detection systems, malicious software developers usually make small modifications to the original source code of a malware app to generate new malicious software variants. Consequently, identifying the new variants becomes challenging even when they belong to the same family [3].
In order to overcome this challenge, a model can be trained using machine learning (ML) to efficiently identify malicious software families across source code variants [4]. ML has been deployed in developing malware detection systems using different approaches, such as static, dynamic, or hybrid analysis [5]–[7]. In static analysis, the original source code of the Android application is parsed without executing the app. In contrast, dynamic analysis studies the app's features and its behaviour at run-time. However, in both approaches, retrieving the features of the Android Application Package (APK) by reverse engineering or run-time execution consumes processing time and computational resources. Alternatively, a model can be trained more easily with the deep learning (DL) approach by converting the malware classification problem into an image classification problem [8]–[11].
A convolutional neural network (CNN) is a multi-layer DL architecture that can effectively classify large sets of images. Inspired by the effective classification performance of CNNs, this paper proposes an automated vision-based DL model for Android malware detection (AMD) systems. The substantial contributions of this work are:
• Presenting a comprehensive review of the static-based, dynamic-based, and vision-based AMD approaches.
• Introducing an automated vision-based AMD model for accurate and efficient detection of malware attacks in the Android operating system.
• Developing 16 different fine-tuned DL-based CNN algorithms (Xception, VGG16, VGG19, DarkNet53, MobileNetV2, ResNet101, AlexNet, ResNet50, ResNet18, InceptionV3, DarkNet19, ShuffleNet, Places365-GoogleNet, NasNetMobile, GoogleNet, and SqueezeNet) to classify benign apps and malware apps without the extensive computation of reverse-engineering or feature extraction stages.
• Testing two binary classification scenarios using imbalanced (14733 malware samples and 2486 benign samples) and balanced (2486 malware samples and 2486 benign samples) Android app datasets to demonstrate that the developed fine-tuned CNN algorithms work on balanced and imbalanced datasets of different sizes without the data augmentation techniques used by other conventional classification approaches.
• Accomplishing lower computational overhead and higher detection accuracy for the proposed vision-based automated AMD model compared to conventional detection models. This is achieved with fewer training iterations by relying only on fine-tuning of the CNN layers, the hyperparameters, and the CNN optimization techniques.
• Performing extensive experiments using both visual color and grayscale images of Android benign and malware apps to precisely evaluate the detection performance of the suggested AMD model on different visual image representations.
• Executing comprehensive simulation tests to prove the validity and efficiency of the proposed automated AMD model using 16 different detection and classification parameters.
• Implementing different experiments to check the storage capacity and complexity of the developed CNN algorithms, in order to demonstrate the simplicity and efficiency of the proposed automated AMD model in recognizing Android malware attacks.
• Conducting a comparative study, in terms of the obtained classification accuracy of Android malware attacks, to confirm the superiority of the proposed AMD model over recent related and conventional AMD models.
The rest of the paper is structured as follows. Section II presents background on the Android application package and related works. Section III introduces the proposed automated vision-based AMD model. Section IV presents the experimental results and discussions. Finally, Section V concludes this paper and presents possible future work.
II. BACKGROUND AND LITERATURE REVIEW
A. ANDROID APPLICATION PACKAGE
An Android Application Package (APK) is a zipped file used by the Android OS for distributing and installing applications [12]. Unzipping the APK file mainly yields the following:
• AndroidManifest.xml: a binary XML file that contains the metadata of the app, such as the app name, permissions, and version.
• classes.dex: a Dex-format file that contains the app code.
• resources.arsc: a file that contains the pre-compiled resources of the app, such as styles, colors, and strings.
• assets: a directory that contains the app assets.
• res: a directory that contains all the app resources that are not included in the resources.arsc file.
• lib: a directory that contains all the app libraries.
• META-INF: a directory that contains the metadata of the APK, such as the APK signature.
A further step can be implemented by utilizing reverse-engineering tools to obtain different formats of the .dex file, as shown in Fig. 1. For example, the app classes can be retrieved in .smali format using APKtool. Furthermore, the classes can be restored in Java format from the .dex file using the Dex2jar tool [13]. This reverse-engineering step might be necessary in some research works to implement deep feature extraction [14]–[16].
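Because an APK is an ordinary ZIP archive, the classes.dex bytecode that the proposed model later converts into images can be read without any reverse-engineering tool. The following minimal sketch uses only Python's standard zipfile module; the entry names come from the list above, while the in-memory "fake APK" in the usage part is a hypothetical stand-in for a real app. It is not the authors' code, and it reads only classes.dex: multidex apps that also ship classes2.dex, classes3.dex, and so on would need an extra loop.

```python
"""Read the classes.dex bytecode from an APK, treating the APK as a ZIP archive.

Illustrative sketch only: it reads classes.dex and ignores any additional
classes2.dex, classes3.dex, ... files that multidex apps may contain.
"""
import io
import zipfile


def read_classes_dex(apk_source) -> bytes:
    """Return the raw bytes of classes.dex from an APK path or binary file object."""
    with zipfile.ZipFile(apk_source) as apk:
        return apk.read("classes.dex")  # raises KeyError if the entry is missing


def top_level_entries(apk_source) -> set:
    """Return the top-level file and directory names stored in the APK."""
    with zipfile.ZipFile(apk_source) as apk:
        return {name.split("/", 1)[0] for name in apk.namelist()}


if __name__ == "__main__":
    # Hypothetical in-memory stand-in for a real APK, so the sketch runs without data.
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as fake_apk:
        fake_apk.writestr("AndroidManifest.xml", b"\x03\x00\x08\x00")
        fake_apk.writestr("classes.dex", b"dex\n035\x00" + bytes(8))
        fake_apk.writestr("META-INF/CERT.RSA", b"")
    buf.seek(0)
    print(sorted(top_level_entries(buf)))  # ['AndroidManifest.xml', 'META-INF', 'classes.dex']
    buf.seek(0)
    print(len(read_classes_dex(buf)))      # 16
```

For a real app, passing the path of the .apk file to read_classes_dex returns the bytes that the pre-processing module turns into an image.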
TABLE 1. Summary and comparison among current works on vision-based Android malware detection.
highly appreciated and recommended for efficient Android malware identification in cybersecurity applications. Furthermore, some of them require feature engineering steps before performing the learning process. In addition, the conventional detection models used datasets with a small number of training samples, which dramatically reduced their detection efficiency. Thus, because the number of malware apps is increasing considerably every day, an automated and accurate vision-based AMD model is introduced in this article to accurately and efficiently detect Android malware attacks. The suggested AMD model comprises 16 different CNN algorithms that have been fine-tuned to achieve high malware detection accuracy and low malware misclassification.
Consequently, the fine-tuned CNN algorithms developed for the vision-based AMD process in this paper differ from conventional AMD models that introduce additional steps for extracting features. In the proposed AMD model, the bytecodes of the benign and malware APKs were converted into color and grayscale visual images, which were then resized and forwarded to the developed CNN algorithms for classification. The goal of transforming and resizing benign and malware apps into graphical images is to generate an Android dataset whose structure matches the input format and size of the utilized CNN algorithms. The main advantage of using pre-trained CNN algorithms in the proposed AMD model is that they were previously trained on more than 14 million digital images from many different classes of the ImageNet database [34]. Thus, the proposed AMD model exploits transfer learning by reusing the already learned features and optimized weights of the pre-trained CNN algorithms to detect malware attacks efficiently. This benefit of transfer learning is particularly valuable in AMD tasks, especially when analyzing the performance of malware detection models on imbalanced Android datasets. Moreover, fine-tuning the weights and hyperparameters of the CNN layers significantly improved the operation of the utilized pre-trained CNN algorithms, which increases the detection performance of the proposed AMD model without using reverse-engineering tools or signal processing-based augmentation algorithms.
III. PROPOSED AUTOMATED VISION-BASED AMD MODEL
In recent years, it has become evident that the number of Android malware cyberattacks has increased gradually. As a result, cybersecurity scholars and experts are interested in developing cost-effective and reliable solutions to mitigate the severe impact of such attacks. Therefore, this paper proposes an accurate and automated vision-based Android malware detection (AMD) model that addresses this critical cybersecurity challenge. This model comprises different fine-tuned DL-based CNN algorithms developed and exploited to detect malware attacks in Android OS efficiently.
The proposed vision-based AMD model differs from conventional and existing AMD solutions. In contrast to previous static-based or dynamic-based AMD solutions, which require manual procedures for feature extraction and collection, the proposed AMD model can efficiently detect Android malware attacks without the extensive computation needed to extract many complex features from the analyzed Android apps. More specifically, as indicated in Fig. 2, the proposed AMD model comprises 16 different well-developed and fine-tuned CNN algorithms that remove the need for predefined, manually extracted features. Thus, the proposed AMD model can learn quickly and differentiate Android malware from benign apps more accurately.
The main steps of the proposed automated vision-based AMD model are shown in Fig. 2. It comprises three main modules: (1) Pre-processing module, (2) Training, fine-tuning, and classification module, and (3) Detection evaluation module. These three modules are explained and discussed as follows:
A. PRE-PROCESSING MODULE
In the proposed AMD model, the bytecodes of the classes.dex files obtained from the Android dataset of benign and malware apps have been converted into the three-channel (Red, Green, Blue) format of visual color images. Because the image file type affects the performance of the Android malware detection system, the classes.dex files of the Android APKs are converted into ‘‘.png’’ image files, which are considered the most effective file type compared with other image formats. In particular, the lossless ‘‘.png’’ format preserves the information included in the image file better than other image formats. The main objective of transforming Android apps into visual images is to acquire additional features and texture details that cannot be obtained directly from the benign and malware apps in their original binary format. Thus, converting the Android dataset into visual images avoids the need for reverse-engineering and feature engineering steps or any specific domain knowledge, as is the case in the existing conventional signature-based (static-based) or behavior-based (dynamic-based) Android analysis techniques.
In the conversion process, each byte (8 bits) of the classes.dex file is transformed into an RGB pixel. This process was repeated for all bytes of the .dex file of every benign and malware app in the Android dataset. After that, all obtained RGB pixels were accumulated and reshaped to generate the final 2D color image of each Android app (benign or malware).
To precisely evaluate whether the suggested AMD model works well on different image visualizations and representations, visual grayscale images of the Android benign and malware apps have also been generated. Fig. 3 presents samples of the generated color and grayscale images of the benign and malware Android APKs in the Leopard mobile dataset. As shown in Fig. 3, the resulting color or grayscale images have various resolutions, with widths that depend on the size of the original .dex files extracted from the benign and malware APKs. Table 2 shows the relation between the Android app sizes and the widths of the generated visual images.
Furthermore, the visual color and grayscale images presented in Fig. 3 show that the generated images have various layouts, styles, and forms. The malware images share particular visual similarities and attributes that are entirely dissimilar from those of the benign images, with each category exhibiting its own distinctive stripes. These remarkable differences in the visual features of the benign and malware images inspired us to adapt and exploit common DL-based pre-trained CNN algorithms for AMD challenges and mobile cybersecurity applications. Therefore, CNN algorithms that are used in general image processing applications for detection, classification, and recognition tasks have been exploited in the proposed work to detect malware attacks in Android OS.
After obtaining the visual color and grayscale images, they were resized before being forwarded to the suggested fine-tuned CNN algorithms for automated feature extraction, training, and classification. Resizing the generated Android benign or malware images is a mandatory step, because each employed CNN algorithm requires its own input image resolution, as listed in Table 3.
Additionally, the obtained visual Android dataset of benign and malware images was divided into training and testing sets according to two different percentages. More simulation experiments were carried out to decide the optimal
TABLE 2. The relation between Android APK sizes and the generated visual image widths.
TABLE 3. The image sizes of the CNN algorithms.
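The byte-to-pixel conversion and the resizing step of the pre-processing module can be sketched in plain Python as follows. This is an illustrative sketch under stated assumptions, not the authors' pipeline. The section says that each byte becomes an RGB pixel but does not detail how one byte fills three channels, so the color helper assumes (hypothetically) that three consecutive bytes form one (R, G, B) pixel, while the grayscale helper maps one byte to one gray level. The image width is left to the caller because the size-to-width mapping of Table 2 is not reproduced here, zero-padding of the last row is an assumption, and nearest-neighbour sampling stands in for whatever interpolation the authors used. Only the 299×299 target size for Xception comes from the text.

```python
"""Illustrative sketch of the byte-to-image step (not the authors' code).

Hypothetical assumptions, not taken from the paper:
  * grayscale: one byte -> one 8-bit gray pixel;
  * color: three consecutive bytes -> one (R, G, B) pixel;
  * the image width is chosen by the caller (Table 2 maps file size to width);
  * the last row is zero-padded;
  * resizing uses nearest-neighbour sampling.
"""


def bytes_to_gray(data: bytes, width: int) -> list:
    """Reshape a byte string into rows of `width` gray pixels (0-255)."""
    padded = data + bytes(-len(data) % width)      # zero-pad the final row
    return [list(padded[i:i + width]) for i in range(0, len(padded), width)]


def bytes_to_rgb(data: bytes, width: int) -> list:
    """Reshape a byte string into rows of `width` (R, G, B) pixels."""
    row_bytes = 3 * width
    padded = data + bytes(-len(data) % row_bytes)  # zero-pad the final row
    pixels = [tuple(padded[i:i + 3]) for i in range(0, len(padded), 3)]
    return [pixels[i:i + width] for i in range(0, len(pixels), width)]


def resize_nearest(image: list, out_h: int, out_w: int) -> list:
    """Nearest-neighbour resize of a 2-D list of pixels to out_h x out_w."""
    in_h, in_w = len(image), len(image[0])
    return [[image[r * in_h // out_h][c * in_w // out_w] for c in range(out_w)]
            for r in range(out_h)]


if __name__ == "__main__":
    dex = bytes(range(10))                          # stand-in for classes.dex bytecode
    print(bytes_to_gray(dex, width=4))              # [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9, 0, 0]]
    rgb = bytes_to_rgb(dex, width=2)
    print(rgb)                                      # [[(0, 1, 2), (3, 4, 5)], [(6, 7, 8), (9, 0, 0)]]
    x_in = resize_nearest(rgb, 299, 299)            # 299 x 299 x 3 input for Xception
    print(len(x_in), len(x_in[0]), len(x_in[0][0])) # 299 299 3
```

In practice an array library and an imaging library would replace these nested lists (and would write the ‘‘.png’’ files); the pure-Python version only makes the reshaping arithmetic explicit.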
algorithms are considered multi-path deep CNN designs, in which parallel paths of numerous convolutional layers with different filter numbers and sizes are concatenated. Thus, the DAG CNN algorithms have deep CNN layers organized as a directed acyclic graph; therefore, they have more complex structures than series CNN algorithms. In addition, a layer in a DAG architecture can receive inputs from several CNN layers and send outputs to several CNN layers. The Xception, DarkNet53, MobileNetV2, ResNet101, ResNet50, ResNet18, InceptionV3, ShuffleNet, Places365-GoogleNet, NasNetMobile, GoogleNet, and SqueezeNet algorithms are examples of DAG CNN algorithms. In terms of detection accuracy, the DAG CNN algorithms achieve higher detection and classification performance than the series CNN algorithms because they can extract more informative texture features from the input malware and benign images during training.
Among the employed CNN algorithms tested in the proposed AMD model, the fine-tuned Xception CNN algorithm achieves the best detection results for visual Android benign and malware classification. Consequently, this paper discusses in depth its structure, training behavior, fine-tuning and optimization hyperparameters, and detection outcomes. The proposed vision-based AMD model uses the fine-tuned structure of the pre-trained Xception CNN algorithm shown in Fig. 4 to classify and detect visualized images of Android malware and benign apps. The DL-based Xception CNN algorithm was previously trained on general digital images from the ImageNet database [34] (the database contains approximately 14 million images, and the pre-trained classifier outputs 1000 classes). These pre-trained features are therefore transferred to the proposed AMD model to quickly and accurately detect Android malware attacks.
The Xception CNN algorithm is a modern, enhanced version of the InceptionV3 CNN algorithm [44]. Its name stands for ‘‘Extreme Inception’’: it keeps the Inception design but replaces the standard convolutional (Conv.) layers with SeparableConv. (depthwise separable convolution) layers. A SeparableConv. layer factorizes a standard convolution into two smaller operations: a depthwise convolution that filters each input channel spatially, followed by a 1 × 1 pointwise convolution that combines the channels. As a result, the Xception algorithm outperforms the InceptionV3 algorithm in detection and classification through a more efficient use of its model parameters, while requiring a small number of training iterations. The complete structure, with the full specifications of the fine-tuned Xception algorithm utilized in the proposed vision-based AMD model, is given in Fig. 4.
The input layer of the Xception CNN algorithm has an input image resolution of 299×299×3. Therefore, the visual benign and malware images must be resized to 299×299×3 before being forwarded to the Xception CNN algorithm. As shown in Fig. 4, the visual malware and benign images are first forwarded to the entry flow. Then, the resulting feature maps pass through the middle flow, which is repeated eight times. Finally, the resulting feature maps go through the exit flow.
The Xception CNN algorithm consists of 36 Conv. and SeparableConv. layers that extract the main informative texture features from the input visual malware and benign images. These 36 stacked convolutional layers are arranged into 14 modules (blocks), all of which have linear residual connections around them except for the first and last modules. In the proposed Xception CNN algorithm, only one fully-connected layer is utilized before the final softmax layer used for detection and classification. In short, the Xception CNN algorithm is a linear stack of SeparableConv. layers with linear residual connections. All Conv. and SeparableConv. layers are followed by batch normalization layers, which are omitted from Fig. 4 for simplicity. In addition, all SeparableConv. layers use a depth multiplier of 1 (no depth expansion). A stride of 2 × 2 is used for the MaxPooling layers and for the downsampling (strided) Conv. layers. The ReLU activation function is used to accelerate the training process. The Xception CNN algorithm also contains one GlobalAveragePooling layer and four MaxPooling layers with a kernel size of 3 × 3. The MaxPooling (maximum pooling) layer takes the maximum value of every patch of the feature map, whereas the GlobalAveragePooling layer computes the average over each entire feature map, producing one value per channel.
The most important advantage of the fine-tuned Xception CNN algorithm compared with other CNN algorithms is that it can be improved easily, because its stacked modules contain repeated types of layers that can be simply adapted and modified. In addition, the fine-tuned Xception algorithm improves detection performance without deeper training, because its Conv. and SeparableConv. layers have different kernels that can discover and learn distinctive texture features in the benign and malware images with a small number of training iterations. Therefore, it is computationally efficient and attractive for detecting Android malware attacks. Further information on the other 15 pre-trained CNN algorithms utilized (VGG16, VGG19, DarkNet53, MobileNetV2, ResNet101, AlexNet, ResNet50, ResNet18, InceptionV3, DarkNet19, ShuffleNet, Places365-GoogleNet, NasNetMobile, GoogleNet, and SqueezeNet) can be found in [36]–[50].
Besides exploiting the advantages of transfer learning, all hyperparameters of the employed CNN algorithms are fine-tuned in the proposed AMD model. The proposed AMD model uses fine-tuning rather than other types of tuning, such as shallow tuning or deep tuning [51], because fine-tuning achieves higher detection accuracy than shallow tuning and lower computational complexity than deep tuning. Therefore, all the hyperparameters of the employed CNN algorithms have been optimized and fine-tuned until an efficient and high detection rate is achieved. After many tests and experiments, the final fine-tuning and optimization parameters used in the proposed vision-based AMD model are: a learning rate of 0.00001, the ADAM optimizer [52], ridge regression regularization (L2-regularization) [53] with a weight decay rate of 0.001, a maximum number of epochs of 10, a minimum batch size of 16, a validation frequency of 16, a dropout rate of 0.5, the learnRateSchedule parameter set to ‘‘piecewise’’, the LearnRateDropPeriod parameter set to 3, the LearnRateDropFactor parameter set to 0.9, and the categorical cross-entropy loss function. These fine-tuned hyperparameters were carefully chosen to avoid overfitting and to optimize the performance of the training and validation processes.
Furthermore, in all employed CNN algorithms, fully-connected and softmax classifiers were utilized to classify Android samples as malware or benign. Thus, the output layer of each of the 16 employed CNN algorithms, which originally includes 1000 classes, is customized and fine-tuned to have only two classes (malware and benign). Also, the back-propagation technique [54] is utilized in the proposed AMD model to fine-tune the weights of the layers in the employed CNN algorithms that were initially trained on the ImageNet dataset, in order to achieve high detection efficiency in identifying malware attacks.
C. DETECTION EVALUATION MODULE
The detection evaluation module comprehensively evaluates the proposed vision-based AMD model using 16 different detection assessment parameters. The classification and detection efficiency of the suggested 16 CNN algorithms has been examined in terms of (1) recognition accuracy, (2) recall or sensitivity (TPR, true positive rate), (3) precision (PPV, positive predictive value), (4) NPV (negative predictive value), (5) specificity (TNR, true negative rate), (6) FNR (false negative rate), (7) FPR (false positive rate), (8) FOR (false omission rate), (9) FDR (false discovery rate), (10) misclassification rate, (11) F1-Score, (12) AROC (area under the receiver operating characteristic curve) score, (13) accuracy curve, (14) loss curve, (15) confusion matrix, and (16) AROC curve. Further details and explanations of these detection assessment parameters can be found in [55], [56]; parameters (1) to (11) are expressed mathematically as follows:
$$\text{Accuracy} = \frac{TN + TP}{FP + TP + FN + TN} \tag{1}$$
$$\text{Sensitivity} = \text{Recall (TPR)} = \frac{TP}{FN + TP} \tag{2}$$
$$\text{Precision (PPV)} = \frac{TP}{FP + TP} \tag{3}$$
$$\text{NPV} = \frac{TN}{FN + TN} \tag{4}$$
$$\text{Specificity (TNR)} = \frac{TN}{FP + TN} \tag{5}$$
$$\text{FNR} = \frac{FN}{TP + FN} \tag{6}$$
$$\text{FPR} = \frac{FP}{TN + FP} \tag{7}$$
$$\text{FOR} = \frac{FN}{TN + FN} \tag{8}$$
$$\text{FDR} = \frac{FP}{TP + FP} \tag{9}$$
$$\text{Misclassification rate} = \frac{FN + FP}{FP + TP + FN + TN} \tag{10}$$
$$\text{F1-Score} = \frac{2TP}{2TP + FN + FP} \tag{11}$$
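The scalar parameters in Eqs. (1) to (11) can be computed directly from the four confusion-matrix counts, as in the following minimal Python sketch (illustrative only, not the authors' implementation; the toy counts are invented). A second helper rewrites Eq. (1) as a class-weighted average of sensitivity and specificity, since TP = TPR × (number of positives) and TN = TNR × (number of negatives). This gives a quick consistency check on the figures reported in Section IV: assuming that benign apps are treated as the positive class and that the test split keeps the 2486:14733 benign-to-malware ratio (neither is stated in this section), the imbalanced color-image rates of 0.9095 and 0.9925 yield about 98.05% accuracy, matching the reported value, whereas treating malware as positive would yield only about 92.15%.

```python
"""Compute the scalar assessment parameters of Eqs. (1)-(11) from confusion-matrix counts.

Illustrative sketch, not the authors' code. "Positive" is whichever class the
caller counts in TP/FN; all denominators are assumed to be non-zero.
"""


def detection_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    total = fp + tp + fn + tn
    return {
        "accuracy": (tn + tp) / total,                 # Eq. (1)
        "recall_tpr": tp / (fn + tp),                  # Eq. (2)
        "precision_ppv": tp / (fp + tp),               # Eq. (3)
        "npv": tn / (fn + tn),                         # Eq. (4)
        "specificity_tnr": tn / (fp + tn),             # Eq. (5)
        "fnr": fn / (tp + fn),                         # Eq. (6)
        "fpr": fp / (tn + fp),                         # Eq. (7)
        "for": fn / (tn + fn),                         # Eq. (8)
        "fdr": fp / (tp + fp),                         # Eq. (9)
        "misclassification_rate": (fn + fp) / total,   # Eq. (10)
        "f1": 2 * tp / (2 * tp + fn + fp),             # Eq. (11)
    }


def accuracy_from_rates(sensitivity: float, specificity: float,
                        n_pos: int, n_neg: int) -> float:
    """Eq. (1) rewritten as a class-weighted mean of TPR and TNR.

    Since TP = TPR * n_pos and TN = TNR * n_neg, Accuracy = (TP + TN) / (n_pos + n_neg).
    """
    return (sensitivity * n_pos + specificity * n_neg) / (n_pos + n_neg)


if __name__ == "__main__":
    # Made-up toy counts, not results from this paper.
    for name, value in detection_metrics(tp=90, fp=20, tn=880, fn=10).items():
        print(f"{name:>24s}: {value:.4f}")

    # Consistency check on the imbalanced color-image rates (0.9095 / 0.9925),
    # ASSUMING benign (2486) is the positive class and the test split keeps the
    # 2486:14733 ratio; neither assumption is stated in this section.
    print(round(accuracy_from_rates(0.9095, 0.9925, 2486, 14733), 4))  # 0.9805
    print(round(accuracy_from_rates(0.9095, 0.9925, 14733, 2486), 4))  # 0.9215
```

For the balanced set (2486 samples per class) the weights are equal, so accuracy reduces to the mean of sensitivity and specificity; this agrees with the reported 0.9940/0.9940 rates and 99.40% accuracy for the balanced color images.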
curves of the training and testing processes for the color and grayscale images are consistent with each other. There is only slight overfitting in the loss curves, resulting from the imbalanced samples of both the visual color and grayscale images of the malware and benign apps. However, the validation losses in both image cases remain below 0.1 at epoch 10, which is acceptable. In general, the loss and accuracy curves of the training and testing operations for the imbalanced visual color and grayscale images stabilized within fewer than five epochs. Thus, the employed Xception CNN algorithm achieved high detection efficiency with a low number of iterations (epochs) for both imbalanced color and grayscale images, which makes it well suited for recognizing malware attacks efficiently and accurately in Android cybersecurity applications. Analogous loss and accuracy curves were observed for the other 15 fine-tuned CNN algorithms in all tested experimental scenarios on color and grayscale images.
The confusion matrices obtained for the superior fine-tuned Xception CNN algorithm on the imbalanced visual color and grayscale images are presented in Fig. 8. These are binary confusion matrices for the examined benign and malware color and grayscale images of the imbalanced Android samples. The TP, FP, TN, and FN values obtained for the visual color images are better than those of the grayscale images. In general, however, the values obtained by the fine-tuned Xception CNN algorithm for both image visualizations were acceptable, especially given the highly imbalanced Android dataset. The fine-tuned Xception CNN algorithm achieved accuracies of 98.05% and 97.93% in correctly detecting malware and benign samples for the imbalanced color and grayscale images, respectively. These results are supported by the high sensitivity, specificity, and AROC values of 0.9095, 0.9925, and 0.9957, respectively, obtained by the fine-tuned Xception CNN algorithm for the visual color images, and of 0.9074, 0.9915, and 0.9953, respectively, for the visual grayscale images. These results are attributed to exploiting the benefits of transfer learning and fine-tuning the hyperparameters and CNN layers of the suggested CNN algorithms.
In addition, the detection performance of all 16 analyzed fine-tuned CNN algorithms in recognizing color or grayscale benign and malware images has been quantitatively examined. The accuracy (Acc.), recall (Rec.), precision (Prec.), NPV, specificity (Spec.), FNR, FPR, FOR, FDR, misclassification rate (Mis. Class. Rate), F1-Score, and AROC score are computed for the suggested CNN algorithms. Table 4 presents the detection outcomes of the employed CNN algorithms on the imbalanced visual color images, and Table 5 presents the corresponding outcomes on the imbalanced visual grayscale images. These comparisons show that the proposed fine-tuned Xception CNN algorithm achieves better values than the other CNN algorithms for all considered detection assessment parameters, for both color and grayscale image representations. Consequently, this CNN algorithm is recommended for effectively detecting Android malware attacks in visualized Android apps.
FIGURE 6. Training and testing accuracy curves of the superior fine-tuned Xception CNN algorithm on imbalanced samples of (a) visual color images and (b) visual grayscale images.
TABLE 4. Outcomes of detection assessment of the employed CNN algorithms on the imbalanced visual color images.
FIGURE 7. Training and testing loss curves of the superior fine-tuned Xception CNN algorithm on imbalanced samples of (a) visual color images and (b) visual grayscale images.
FIGURE 8. Confusion matrix of the superior fine-tuned Xception CNN algorithm on imbalanced samples of (a) visual color images and (b) visual grayscale images.
B. PERFORMANCE ANALYSIS ON COLOR AND GRAYSCALE IMAGES OF BALANCED ANDROID SAMPLES
This section provides the performance analysis of the proposed vision-based AMD model using balanced color and grayscale visual images (2486 malware images and 2486 benign images). Further experiments were carried out to test the proposed model with the 16 different fine-tuned CNN algorithms on the balanced color and grayscale malware and benign images.
The training and testing accuracy and loss curves of the superior fine-tuned Xception CNN algorithm on the balanced visual color and grayscale images across ten epochs are shown in Figs. 9 and 10, respectively. These curves show that the accuracy and loss curves of the training and testing processes for the color and grayscale images are fully consistent with each other. In general, the loss and accuracy curves of the training and testing operations for the balanced visual color and grayscale images stabilized within fewer than five epochs. Thus, the employed Xception CNN algorithm achieved high detection efficiency with a low number of iterations (epochs) for both balanced color and grayscale images, which makes it well suited for recognizing malware attacks efficiently and accurately in Android cybersecurity applications. Analogous loss and accuracy curves were observed for the other 15 fine-tuned CNN algorithms in all tested experimental scenarios on color and grayscale images.
The confusion matrices obtained for the superior fine-tuned Xception CNN algorithm on the balanced visual color and grayscale images are presented in Fig. 11. These are binary confusion matrices for the examined benign and malware color and grayscale images of the balanced Android samples. The TP, FP, TN, and FN values obtained for the visual color images are better than those of the grayscale images. In general, however, the values obtained by the fine-tuned Xception CNN algorithm for both image visualizations were good. These results are attributed to exploiting the benefits of transfer learning and fine-tuning the hyperparameters and CNN layers of the suggested CNN algorithms. The fine-tuned Xception CNN algorithm achieved detection accuracies of 99.40% and 99.20% on malware and benign samples for the balanced color and grayscale images, respectively. These results are supported by the high sensitivity, specificity, and AROC values of 0.9940, 0.9940, and 0.9995, respectively, for the visual color images, and of 0.9920, 0.9920, and 0.9998, respectively, for the visual grayscale images.
Furthermore, the detection performance of all 16 analyzed fine-tuned CNN algorithms in recognizing balanced color or grayscale benign and malware images has been examined. The accuracy (Acc.), recall (Rec.), precision (Prec.), NPV, specificity (Spec.), FNR, FPR, FOR, FDR, misclassification rate (Mis. Class. Rate), F1-Score, and AROC are computed for the suggested CNN algorithms. Table 6 presents the detection outcomes of the employed CNN algorithms on the balanced visual color images, while Table 7 presents the detection outcomes on the balanced visual grayscale images. These comparisons show that the proposed fine-tuned Xception CNN algorithm achieves better values than the other CNN algorithms for all considered detection assessment parameters for the balanced color and grayscale image representations. Consequently, this CNN algorithm is recommended for effectively detecting Android malware attacks in visualized Android apps.
Overall, all the examined fine-tuned CNN algorithms achieved good detection results, and thus, they can be utilized effectively for detecting
TABLE 5. Outcomes of detection assessment of the best four performed CNN algorithms on the imbalanced visual grayscale images.
FIGURE 9. Training and testing accuracy curves of the superior fine-tuned Xception CNN algorithm on balanced samples of (a) visual color
images and (b) visual grayscale images.
malware attacks in the form of visual color or grayscale Furthermore, as observed, the whole accomplished results for
images using imbalanced or balanced Android datasets. the visual color images are better than those of accomplished
TABLE 6. Outcomes of detection assessment of the employed CNN algorithms on the balanced visual color images.
FIGURE 10. Training and testing loss curves of the superior fine-tuned Xception CNN algorithm on balanced samples of (a) visual color images
and (b) visual grayscale images.
results for the visual grayscale images either for balanced or algorithms. So, the quantitative computational analysis of
imbalanced samples. This is because color images contain the utilized CNN algorithms in the proposed automated
more visualization features and texture details than those vision-based AMD model is examined in terms of (1) stor-
included in the grayscale images. Also, as noticed, the whole age capacity of the used color and grayscale Android
obtained detection results on testing the balanced Android samples of the imbalanced and balanced datasets, (2) exper-
samples are better than those obtained for testing imbal- imental analysis in terms of the (a) number of layers,
anced Android samples for all examined fine-tuned CNN (b) storage capacity, (c) total number of the trainable and
algorithms. non-trainable parameters, and (d) reduction percentage in
the training parameters of the examined CNN algorithms
C. COMPLEXITY ANALYSIS used in the detection experiments, and (3) execution time
This section discusses the complexity performance in terms analysis of the examined CNN algorithms of the color and
of the storage capacity, experimental analysis, and exe- grayscale Android samples of the imbalanced and balanced
cution time of the utilized Android datasets and CNN datasets.
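The assessment parameters reported in Tables 4 to 7 all follow from the four confusion-matrix counts (TP, FP, TN, FN) plus, for AROC, the classifier's scores. The minimal sketch below uses the standard definitions. It assumes malware is the positive class and reads "AROC" as the area under the ROC curve; the names `ConfusionCounts`, `assessment_metrics`, and `roc_auc` are hypothetical helpers for illustration, not the authors' code.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Binary confusion-matrix counts; malware is assumed to be the positive class."""
    tp: int  # malware predicted as malware
    fp: int  # benign predicted as malware
    tn: int  # benign predicted as benign
    fn: int  # malware predicted as benign


def _ratio(num: float, den: float) -> float:
    # Return NaN instead of raising when a denominator is zero.
    return num / den if den else math.nan


def assessment_metrics(c: ConfusionCounts) -> dict[str, float]:
    total = c.tp + c.fp + c.tn + c.fn
    precision = _ratio(c.tp, c.tp + c.fp)
    recall = _ratio(c.tp, c.tp + c.fn)
    return {
        "accuracy": _ratio(c.tp + c.tn, total),
        "recall": recall,                          # sensitivity / TPR
        "precision": precision,
        "npv": _ratio(c.tn, c.tn + c.fn),          # negative predictive value
        "specificity": _ratio(c.tn, c.tn + c.fp),  # TNR
        "fnr": _ratio(c.fn, c.fn + c.tp),          # = 1 - recall
        "fpr": _ratio(c.fp, c.fp + c.tn),          # = 1 - specificity
        "for": _ratio(c.fn, c.fn + c.tn),          # false omission rate = 1 - NPV
        "fdr": _ratio(c.fp, c.fp + c.tp),          # false discovery rate = 1 - precision
        "misclassification_rate": _ratio(c.fp + c.fn, total),
        "f1": _ratio(2 * precision * recall, precision + recall),
    }


def roc_auc(scores: list[float], labels: list[int]) -> float:
    """Area under the ROC curve as the Mann-Whitney probability that a random
    malware sample (label 1) scores higher than a random benign one (label 0);
    ties count as one half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        return math.nan
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Hypothetical counts, for illustration only.
    m = assessment_metrics(ConfusionCounts(tp=90, fp=5, tn=95, fn=10))
    print(round(m["accuracy"], 3), round(m["f1"], 4))  # 0.925 0.9231
```

The complementary pairs (FNR and recall, FPR and specificity, FOR and NPV, FDR and precision, misclassification rate and accuracy) each sum to one, which is a quick consistency check when reading the tables. The quadratic pairwise AUC is chosen for clarity; a sort-based rank computation would be preferable for large test sets.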
TABLE 7. Outcomes of detection assessment of the employed CNN algorithms on the balanced visual gray-scale images.
FIGURE 11. Confusion matrix of the superior fine-tuned Xception CNN algorithm on balanced samples of (a) visual color images and
(b) visual grayscale images.
TABLE 8. Total number of Android malware and benign samples over imbalanced and balanced datasets.
TABLE 9. Storage capacity (MB) analysis of the color and grayscale Android samples of the imbalanced and balanced datasets.
TABLE 10. Experimental analysis of the examined CNN algorithms used in the detection experiments.
TABLE 11. Execution time analysis of the examined CNN algorithms of the color and grayscale Android samples of the imbalanced and balanced datasets.
parameters of the tested CNN algorithm from the input layer to the output layer, (iv) trainable parameters of the unfrozen CNN layers, (v) non-trainable parameters of the frozen CNN layers, and (vi) the reduction percentage in the training parameters of the examined CNN algorithm. It is observed from Table 10 that the number of layers, non-trainable parameters, and trainable parameters vary from one CNN algorithm to another. Also, owing to the use of transfer learning in the proposed AMD model, there is a considerable reduction in the training parameters of all employed CNN algorithms, because most of their layers and training parameters were frozen, as discussed in subsection III-B. For example, the most accurate fine-tuned Xception algorithm used in the proposed vision-based AMD model trained only 4,096 of the 22,900,000 parameters in the original Xception CNN algorithm.
Table 11 presents the execution time analysis of the examined CNN algorithms for the color and grayscale Android samples of the imbalanced and balanced datasets. The computational overhead of the CNN algorithms used in the proposed vision-based AMD model is estimated in terms of (1) the total computational time of the training and validation processes and (2) the average computational time to identify an Android malware or benign sample, which is calculated by dividing the total computational time by the total number of Android malware and benign samples. It is noticed from Table 11 that the computational overhead varies from one CNN algorithm to another due to the variation in the number of layers and parameters among the employed CNN algorithms, as shown in Table 10. Nevertheless, the results show that the average computational time spent to detect an Android malware or benign sample is adequate for all examined CNN algorithms. For example, the most accurate fine-tuned Xception algorithm used in the proposed vision-based AMD model achieved low computational times of 0.4565 sec and 0.60335 sec to identify an Android sample in the imbalanced and balanced datasets, respectively.
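The two derived quantities in this subsection, the reduction percentage in training parameters and the average computational time per sample, are simple ratios. The sketch below assumes the reduction percentage is the share of the original parameters left frozen, that is (total - trainable) / total x 100; the text describes Table 10 but does not spell out the formula, so this definition, the function names, and the timing numbers in the usage lines are assumptions for illustration.

```python
def reduction_percentage(total_params: int, trainable_params: int) -> float:
    """Percentage of the original parameters that are frozen (not retrained).

    Assumed definition: (total - trainable) / total * 100.
    """
    if total_params <= 0 or not 0 <= trainable_params <= total_params:
        raise ValueError("require total_params > 0 and 0 <= trainable_params <= total_params")
    return 100.0 * (total_params - trainable_params) / total_params


def average_time_per_sample(total_time_s: float, n_malware: int, n_benign: int) -> float:
    """Total computational time divided by the number of malware plus benign samples."""
    n_samples = n_malware + n_benign
    if n_samples <= 0:
        raise ValueError("at least one sample is required")
    return total_time_s / n_samples


if __name__ == "__main__":
    # Xception figures quoted in the text: 4,096 trainable out of 22,900,000.
    print(f"{reduction_percentage(22_900_000, 4_096):.2f}%")  # 99.98%
    # Hypothetical timing: 100 s spent over 150 malware + 50 benign samples.
    print(average_time_per_sample(100.0, 150, 50))  # 0.5
```

Under this assumed definition, the Xception figures correspond to roughly 99.98% of the original parameters being frozen, which is consistent with the statement that most layers were not retrained.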
TABLE 12. Comparison between the proposed vision-based AMD model and the recent conventional vision-based AMD models tested on the Leopard
Android dataset.
D. COMPARATIVE ANALYSIS WITH RELATED MODELS
This section compares the detection and classification performance of the proposed automated vision-based AMD model with that of the most recent vision-based AMD models. The purpose of this comparison is to demonstrate the superior performance of the proposed AMD model in recognizing and detecting visualized Android malware attacks using the Leopard mobile apps dataset, with either balanced or imbalanced Android samples.
Table 12 presents the comparison between the proposed AMD model using the fine-tuned Xception CNN algorithm and the recent conventional AMD models. The suggested AMD model achieved detection accuracies of 98.05% for the imbalanced color dataset and 99.40% for the balanced color dataset. These detection accuracies are higher than those of all other baseline AMD models that used the same Leopard Android mobile dataset. Moreover, unlike other conventional detection models, the proposed model did not employ any augmentation algorithms or feature engineering techniques.
Thus, in contrast to most recent related vision-based AMD models, which add feature-engineering and/or data augmentation stages to their malware detection pipelines, the proposed automated vision-based AMD model avoids the need for these computational stages. As observed, the proposed model achieved higher classification performance and detection efficiency than the conventional models by employing only transfer learning and fine-tuning of the utilized CNN algorithms to detect Android malware attacks.
V. CONCLUSION AND FUTURE WORK
There are considerable limitations and difficulties in processing and analyzing massive numbers of unknown Android malware samples using dynamic analysis, static analysis, or traditional ML techniques. Consequently, there is an essential need to develop innovative artificial intelligence algorithms to mitigate the critical cybersecurity problems resulting from Android mobile malware attacks.
The vision-based DL techniques utilized for recognizing Android malware samples have significant detection merits because they avoid the feature-engineering steps required to obtain hand-crafted features, which increase the complexity of malware detection algorithms. Thus, this paper introduced an automated vision-based AMD model composed of 16 different fine-tuned CNN algorithms to detect Android malware attacks efficiently and quickly. The proposed AMD model was developed based on the visualization of Android APKs, the transfer learning concept, and the fine-tuning process to proficiently distinguish benign APKs from malware APKs without extensive computations, reverse engineering, or feature extraction stages. Different experiments were carried out using balanced and imbalanced Android samples of color and grayscale images generated from the benchmark Leopard dataset. The purpose of these comprehensive experiments is to extensively and sufficiently validate the detection and classification performance of the suggested automated vision-based AMD model.
The experimental results for various classification assessment metrics revealed that the 16 different fine-tuned CNN algorithms included in the proposed AMD model performed efficiently with the visualized color and grayscale images for both balanced and imbalanced Android apps datasets. Moreover, compared to the related and conventional AMD models, the proposed AMD model achieved higher detection accuracy, lower computational overhead, and better recognition performance without employing any augmentation algorithms or complicated feature-engineering tools.
Future work can consider further enhanced versions of the designed CNN algorithms that perform adequately with other highly imbalanced Android datasets. In this way, different Android datasets of new malware attack families can be examined and investigated. In addition, the authors intend to collect and build their own Android dataset that includes ransomware attacks. This will allow further testing of the classification efficiency of the developed CNN algorithms and of their capability to identify and recognize different recent families of Android malware or ransomware attacks. Moreover, the authors intend to propose and develop an image-based real-time Android malware detection system in future work. To this end, the authors have already started developing an Android mobile application and a cloud-based back-end web service that can detect malware APK files while they are being downloaded from the Google store, by first converting them into images.
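The conversion of APK bytecode into images, on which both the proposed model and the planned real-time service rely, can be sketched as a byte plot. This is a common scheme and not necessarily the authors' exact procedure: the row width of 256 pixels, the zero-padding of the final row, and the packing of consecutive byte triples into RGB pixels are assumptions, and `bytes_to_grayscale` and `bytes_to_rgb` are hypothetical names.

```python
from __future__ import annotations

import math


def bytes_to_grayscale(data: bytes, width: int = 256) -> list[list[int]]:
    """Lay out a byte string row by row as an 8-bit grayscale image.

    Each byte (0-255) becomes one pixel intensity; the last row is zero-padded.
    """
    if width <= 0:
        raise ValueError("width must be positive")
    height = max(1, math.ceil(len(data) / width))
    padded = data + bytes(height * width - len(data))
    return [list(padded[r * width:(r + 1) * width]) for r in range(height)]


def bytes_to_rgb(data: bytes, width: int = 256) -> list[list[tuple[int, ...]]]:
    """Group consecutive byte triples into (R, G, B) pixels, zero-padding the tail."""
    if width <= 0:
        raise ValueError("width must be positive")
    n_pixels = math.ceil(len(data) / 3)
    height = max(1, math.ceil(n_pixels / width))
    padded = data + bytes(height * width * 3 - len(data))
    pixels = [tuple(padded[i:i + 3]) for i in range(0, len(padded), 3)]
    return [pixels[r * width:(r + 1) * width] for r in range(height)]
```

With the Python standard library, the bytecode itself can be read from an APK (a ZIP archive) via `zipfile.ZipFile(apk_path).read("classes.dex")`, where `apk_path` is a placeholder. The resulting pixel grid would still have to be resized to the input size expected by the chosen CNN before classification.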
[43] X. Ou, P. Yan, Y. Zhang, B. Tu, G. Zhang, J. Wu, and W. Li, ‘‘Moving object detection method via ResNet-18 with encoder–decoder structure in complex scenes,’’ IEEE Access, vol. 7, pp. 108152–108160, 2019.
[44] C. Szegedy, V. Vanhoucke, S. Ioffe, J. Shlens, and Z. Wojna, ‘‘Rethinking the inception architecture for computer vision,’’ in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2016, pp. 2818–2826.
[45] Q. A. Al-Haija, M. Smadi, and O. M. Al-Bataineh, ‘‘Identifying phasic dopamine releases using DarkNet-19 convolutional neural network,’’ in Proc. IEEE Int. IoT, Electron. Mechatronics Conf. (IEMTRONICS), Apr. 2021, pp. 1–5.
[46] X. Zhang, X. Zhou, M. Lin, and J. Sun, ‘‘ShuffleNet: An extremely efficient convolutional neural network for mobile devices,’’ in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit., Jun. 2018, pp. 6848–6856.
[47] B. Zhou, A. Lapedriza, A. Khosla, A. Oliva, and A. Torralba, ‘‘Places: A 10 million image database for scene recognition,’’ IEEE Trans. Pattern Anal. Mach. Intell., vol. 40, no. 6, pp. 1452–1464, Jun. 2018.
[48] F. Saxen, P. Werner, S. Handrich, E. Othman, L. Dinges, and A. Al-Hamadi, ‘‘Face attribute detection with MobileNetV2 and NasNet-mobile,’’ in Proc. 11th Int. Symp. Image Signal Process. Anal. (ISPA), Sep. 2019, pp. 176–180.
[49] R. U. Khan, X. Zhang, and R. Kumar, ‘‘Analysis of ResNet and GoogleNet models for malware detection,’’ J. Comput. Virol. Hacking Techn., vol. 15, no. 1, pp. 29–37, 2019.
[50] H. Lee, I. Ullah, W. Wan, Y. Gao, and Z. Fang, ‘‘Real-time vehicle make and model recognition with the residual SqueezeNet architecture,’’ Sensors, vol. 19, no. 5, p. 982, Feb. 2019.
[51] N. A. El-Hag, A. Sedik, W. El-Shafai, H. M. El-Hoseny, A. A. Khalaf, A. S. El-Fishawy, W. Al-Nuaimy, F. E. A. El-Samie, and G. M. El-Banby, ‘‘Classification of retinal images based on convolutional neural network,’’ Microsc. Res. Tech., vol. 84, no. 3, pp. 394–414, 2021.
[52] I. K. M. Jais, A. R. Ismail, and S. Q. Nisa, ‘‘Adam optimization algorithm for wide and deep neural network,’’ Knowl. Eng. Data Sci., vol. 2, no. 1, pp. 41–46, 2019.
[53] H. Gao, Y. Yang, S. Lei, C. Li, H. Zhou, and X. Qu, ‘‘Multi-branch fusion network for hyperspectral image classification,’’ Knowl.-Based Syst., vol. 167, pp. 11–25, Mar. 2019.
[54] T. Hegazy, P. Fazio, and O. Moselhi, ‘‘Developing practical neural network applications using back-propagation,’’ Comput.-Aided Civil Infrastruct. Eng., vol. 9, no. 2, pp. 145–159, Mar. 1994.
[55] M. Stamp, M. Alazab, and A. Shalaginov, Malware Analysis Using Artificial Intelligence and Deep Learning. Switzerland: Springer, 2021.
[56] A. P. Namanya, I. U. Awan, J. P. Disso, and M. Younas, ‘‘Similarity hash based scoring of portable executable files for efficient malware detection in IoT,’’ Future Gener. Comput. Syst., vol. 110, pp. 824–832, Sep. 2020.
IMAN ALMOMANI (Senior Member, IEEE) received the bachelor’s degree from United Arab Emirates, in 2000, the master’s degree in computer science from Jordan, in 2002, and the Ph.D. degree in wireless network security from De Montfort University, U.K., in 2007. She is currently an Associate Professor in cybersecurity. She is the Associate Director with the Research and Initiatives Centre (RIC) and also the Leader with the Security Engineering Laboratory (SEL), Prince Sultan University (PSU), Riyadh, Saudi Arabia. Before joining PSU, she worked as an Associate Professor and the Head of the Computer Science Department, The University of Jordan, Jordan. Her research interests include wireless networks and security, mainly wireless mobile ad-hoc networks (WMANETs), wireless sensor networks (WSNs), multimedia networking (VoIP), and security issues in wireless networks. She is also interested in the area of electronic learning (e-learning) and mobile learning (m-learning). She has several publications in the above areas in a number of reputable international and local journals and conferences. She is in the organizing and technical committees for a number of local and international conferences. She is also a Senior Member of IEEE WIE. Also, she serves as a reviewer and a member of the editorial board in a number of international journals.
AALA ALKHAYER received the Bachelor of Engineering degree in information technology engineering from SVU University, Damascus, in 2017, and the bachelor’s degree in software engineering from Prince Sultan University (PSU), Riyadh, Saudi Arabia, in 2018. She is currently a Research Engineer at the Security Engineering Laboratory (SEL), PSU. Her research interests include software engineering, networks security, malware analysis, multimedia networking, and computer vision.
WALID EL-SHAFAI was born in Alexandria, Egypt. He received the B.Sc. degree (Hons.) in electronics and electrical communication engineering from the Faculty of Electronic Engineering (FEE), Menoufia University, Menouf, Egypt, in 2008, the M.Sc. degree from the Egypt-Japan University of Science and Technology (E-JUST), in 2012, and the Ph.D. degree from the Faculty of Electronic Engineering, Menoufia University, in 2019. Since January 2021, he has been a Postdoctoral Research Fellow at the Security Engineering Laboratory (SEL), Prince Sultan University (PSU), Riyadh, Saudi Arabia. He is currently working as a Lecturer and an Assistant Professor with the Electronics and Communication Engineering (ECE) Department, FEE, Menoufia University. His research interests include wireless mobile and multimedia communications systems, image and video signal processing, efficient 2D video/3D multi-view video coding, multi-view video plus depth coding, 3D multi-view video coding and transmission, quality of service and experience, digital communication techniques, cognitive radio networks, adaptive filters design, 3D video watermarking, steganography, and encryption, error resilience and concealment algorithms for H.264/AVC, H.264/MVC, and H.265/HEVC video codecs standards, cognitive cryptography, medical image processing, speech processing, security algorithms, software defined networks, the Internet of Things, medical diagnoses applications, FPGA implementations for signal processing algorithms and communication systems, cancellable biometrics and pattern recognition, image and video magnification, artificial intelligence for signal processing algorithms and communication systems, modulation identification and classification, image and video super-resolution and denoising, cybersecurity applications, malware and ransomware detection and analysis, deep learning in signal processing, and communication systems applications. He has several publications in the above research areas in several reputable international and local journals and conferences. Also, he serves as a reviewer for several international journals.