A Deep Learning Based Android Malware Detection System With Static Analysis
All content following this page was uploaded by Esra Çalık Bayazıt on 08 July 2022.
Abstract—In recent years, smart mobile devices have become indispensable due to the availability of office applications, the Internet, game applications, vehicle guidance, and similar applications used in most of our daily lives, in addition to traditional services such as voice calls, SMS, and multimedia services. Due to Android’s open-source structure and easy development platforms, the number of applications on Google Play, the official Android app store, increases day by day. This also brings some security-related issues for end users. The increased popularity of the Android operating system on mobile devices, and the associated financial benefits, have attracted attackers to develop malware for these devices, which has resulted in a significant increase in the number of Android malware applications. To detect this type of security threat, signature-based detection (static detection) is generally preferred due to its easy applicability and fast identification ability. Therefore, this study aims to implement an up-to-date, effective, and reliable malware detection system with the help of deep learning algorithms. In the proposed system, RNN-based LSTM, BiLSTM, and GRU algorithms are evaluated on the CICInvesAndMal2019 data set, which contains 8115 static features for malware detection. Experimental results show that the BiLSTM model outperforms the other proposed RNN-based deep learning methods with an accuracy rate of 98.85%.
Index Terms—malware detection, static analysis, deep learning, RNN, Android system
I. INTRODUCTION
In recent years, with the digital transformation that has taken place, the rate of smart mobile device usage has been increasing day by day. According to the “Digital 2022” report, the number of smart mobile devices worldwide has reached 5.3 billion [1], and the Android operating system is used in 71.63% of these devices [2]. This is an indication that Android mobile devices are an indispensable part of daily life. Android systems have become the target of cyber attackers because of the valuable personal information they contain, which is directly proportional to factors such as the number of users, usage areas, and ease of application access. Attackers may gain access to users’ devices through applications containing malware that they succeeded in uploading to Google Play. In particular, the fact that Android applications can be downloaded from third-party sources as well as from the Google Play Store allows attackers to easily access systems through an application. The number of downloadable mobile apps was 3.14 million in the fourth quarter of 2020 and 3.48 million as of the first quarter of 2021 [3]. However, the total number of Android malware samples in 2021 is 3.36 million [4]. This shows that as the number of Android users increases, the amount of malware increases accordingly, because the amount of valuable information that attackers can access also increases. This clearly requires effective malware detection systems against malware applications that are increasing in complexity.
Various techniques are available for Android malware detection. These are classified as static, dynamic, and hybrid analysis. Static analysis enables malware detection without running an application. Thus, the mobile device is not affected by the malicious functionality of the application, which makes static analysis a safe way to analyze malware. Dynamic analysis requires executing the application in an isolated environment to observe application behavior. In contrast to static analysis, dynamic analysis has the advantage of revealing the natural behavior of malware, because the code is actually executed. On the other hand, the static analysis technique is less costly than the dynamic analysis technique in terms of time and computation, as it does not require an isolated environment. Finally, the hybrid analysis technique consists of a combination of these two techniques. Static and dynamic analysis techniques have advantages and disadvantages relative to each other. For example, malware can often evade static analysis techniques with code obfuscation methods, while it can be more easily detected at execution time with dynamic analysis techniques.
In this study, RNN-based deep learning models are proposed on a current, public data set containing static features. The development of detection and classification tools is extremely important, as the popularity of Android systems on mobile platforms leads to an increase in threats in this area. In the literature, the success of malware detection systems is evident in many machine learning-based studies [5]–[7], [23]. However, the rapid growth of Android malware applications and of technologies to evade detection systems is rendering defenses based on traditional methods ineffective. For this reason, in this study, we compare the RNN, LSTM, BiLSTM, and GRU deep learning approaches on the CICInvesAndMal2019 data set instead of traditional machine learning algorithms.
Authorized licensed use limited to: FATIH SULTAN MEHMET UNIVERSITY. Downloaded on July 08,2022 at 10:18:01 UTC from IEEE Xplore. Restrictions apply.
The archive file called Android Package Kit (APK) contains all the necessary data and resource files for Android applications to perform the required functions. The permissions required by an application are kept in the Android Manifest file and are used in malware detection as a feature. Android applications require user permission to use shared memory or access functions. The permissions granted to an application can be used for the desired purpose without asking for consent later. The CICInvesAndMal2019 data set contains permissions and intents, which are static features of applications.
The rest of the paper is organized as follows. Section II briefly summarizes the related works on Android malware detection and identification. Section III introduces the data set. Section IV covers the background of the Android file structure, static analysis techniques, and deep learning methods. Section V presents our proposed approach for the detection of Android malware and comparative experimental results. Section VI concludes the work.
II. RELATED WORKS
In essence, attackers develop various methods aiming to harm the functioning of systems through malware, to steal users’ personal data and use them in line with their goals, or to block systems. Researchers have proposed many studies in the literature to detect malware. In this part of the study, related studies in the literature are reviewed.
Tang et al. [8] proposed a malware classification model to detect Android malware samples, as well as an algorithmic model and an artificial intelligence-based learning solution based on static analysis and feature extraction from source code. The accuracy was found to be approximately 94.4% after 200 epochs of training, and the validation set was used to validate the model.
In 2021, the authors of [9] proposed a hybrid DL-based Android malware detection framework that uses permissions, API calls, and intents to detect malware in Android apps. With the designed approach, CNN and BiLSTM were used with 10-fold cross-validation to classify malware [9]. The study was carried out on the Androzoo and AMD data sets. In the proposed study, hybrid DL models and comparative DL-based algorithms were critically evaluated. The proposed hybrid CNN-BiLSTM model has an accuracy rate of 99.05%. Wenbo et al. [10] proposed a multimodel deep learning-based Android malware detection system. In this system, which uses API calls, permissions, hardware components, and intent features of applications, deep learning algorithms are modeled in three different ways. Using DNN, CNN, and CNN-GRU deep learning algorithms, the system is shown to obtain a 98.74% accuracy rate on 5,560 malware and 16,666 benign samples.
Mu et al. [11] proposed an Android malware detection system that extracts the API sequence of the malware using text processing in the Cuckoo sandbox. A total of 11000 samples, containing 8000 Android malware and 3000 benign applications, were used in the study. Among them, Android malware is mostly collected from Virus Share and the Google Play Store. Benign apps are usually downloaded from the Android app store. They used a BiLSTM method based on Dalvik analysis, one of the malware static analysis methods, to evaluate the performance of the system. It was stated that the accuracy rate obtained in the study was 96.74%.
In a different static analysis study, an Android malware detection model is proposed that uses static analysis features of benign and malicious apps collected from Google Play and Virus Share [12]. The DBN classification framework presented in the study uses 331 features, including API calls and permissions. The obtained accuracy rate was reported as 94.64%. The study also models the permissions required by Android systems by looking at the features that regular applications require. Chen et al. [13] proposed a new preprocessing technique that uses Dalvik opcodes as features. Their study combined this preprocessing approach with an LSTM model to detect malware, solving the long sequence problem to achieve fast training and high accuracy. It was stated that a 95.58% accuracy rate was obtained with the model trained in the study using the Drebin data set.
Elayan et al. [14] proposed a malware detection model using the GRU deep learning algorithm. In the study, the GRU deep learning algorithm and machine learning algorithms were compared. In the study, which uses the CICAndMal2017 data set, the system proposed with GRU has an accuracy rate of 98.2%. The study by Zhang et al. [15], which presents the deep learning-based DeepClassifyDroid detection system, consists of three steps: feature extraction, feature embedding, and detection. In the first stage, five different feature sets are created using the static analysis method, and in the next stage, CNN-based malware detection is carried out. According to the performance results of the proposed study, an accuracy rate of 97.4% is obtained.
In related studies, data sets that generally do not contain real-world data were used. The data sets used were either collected from various sources or outdated. Malware collected from platforms such as the Google Play Store may be removed by the official market, and application data may change in the time between two snapshots. This situation imposes a limitation on the data sets created by users. The main point that distinguishes this work from the others is the use of deep learning approaches that can best represent the data, make the most of nonlinear functions, and solve problems from start to finish, using the CICInvesAndMal2019 data set, which contains up-to-date and valid application data.
III. DATASET
In this study, we used the CICInvesAndMal2019 data set, which is provided by the University of New Brunswick, Canadian Institute for Cybersecurity. The CICInvesAndMal2019 data set is the second part of the CICAndMal2017 data set produced and published by Lashkari et al. [16]. The CICInvesAndMal2019 data set produced by Taheri et al. [17], which includes static features, contains 426 malware and 5,065 benign labeled samples, with the malware divided into four categories: Ransomware, Adware, SMS Malware, and Scareware. There are family types belonging to each category. For example, the Charger and Simplocker families belong to the Ransomware category, and the other categories contain families of their own. Incorporating these features, CICInvesAndMal2019 is an up-to-date data set consisting of application data from real devices and labeled data based on multiple reliable sources. The data set used in this study consists of static features, namely permissions and intents. It includes 8115 features, 396 malware samples, and 1126 benign samples.
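As an illustration of how permission and intent features of this kind can be turned into model input, the following is a minimal sketch and not the authors' pipeline. It assumes the manifest has already been decoded to plain-text XML (for example with a tool such as apktool). The function names, the sample manifest, and the four-entry vocabulary are hypothetical and exist only for this example.

```python
import xml.etree.ElementTree as ET

# Namespace used for android:* attributes in a decoded AndroidManifest.xml.
ANDROID_NS = "{http://schemas.android.com/apk/res/android}"


def extract_static_features(manifest_xml):
    """Return the set of permission and intent-action names declared in a manifest."""
    root = ET.fromstring(manifest_xml)
    features = set()
    # Requested permissions: <uses-permission android:name="..."/>
    for node in root.iter("uses-permission"):
        name = node.get(ANDROID_NS + "name")
        if name:
            features.add(name)
    # Intent actions: <intent-filter><action android:name="..."/></intent-filter>
    for node in root.iter("action"):
        name = node.get(ANDROID_NS + "name")
        if name:
            features.add(name)
    return features


def to_binary_vector(features, vocabulary):
    """Encode features as 0/1 flags in the fixed order given by vocabulary."""
    return [1 if name in features else 0 for name in vocabulary]


# Hypothetical, hand-written manifest used only for illustration.
SAMPLE_MANIFEST = """<manifest xmlns:android="http://schemas.android.com/apk/res/android" package="com.example.app">
  <uses-permission android:name="android.permission.SEND_SMS"/>
  <uses-permission android:name="android.permission.INTERNET"/>
  <application>
    <receiver android:name=".BootReceiver">
      <intent-filter>
        <action android:name="android.intent.action.BOOT_COMPLETED"/>
      </intent-filter>
    </receiver>
  </application>
</manifest>"""

# Hypothetical vocabulary; the data set described above has 8115 such columns.
VOCABULARY = [
    "android.permission.INTERNET",
    "android.permission.READ_CONTACTS",
    "android.permission.SEND_SMS",
    "android.intent.action.BOOT_COMPLETED",
]

vector = to_binary_vector(extract_static_features(SAMPLE_MANIFEST), VOCABULARY)
print(vector)
```

Each application becomes one fixed-length row of 0/1 flags, so rows from different applications can be stacked into a matrix regardless of how many permissions each one declares.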
AndroidManifest.xml file and converted to vector form for training with learning methods.
TABLE I
PROPOSED MODELS ARCHITECTURE.
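The models in this paper are compared by accuracy, precision, recall, and F1 score, all of which are derived from the four confusion-matrix counts. The following is a minimal sketch of that computation using the standard definitions. The function name and the counts in the usage lines are hypothetical and are not taken from the paper's experiments.

```python
def classification_metrics(tp, tn, fp, fn):
    """Compute precision, recall, accuracy and F1 score from confusion-matrix counts."""
    # Guard each ratio against an empty denominator.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total if total else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {
        "precision": precision,
        "recall": recall,
        "accuracy": accuracy,
        "f1": f1,
    }


# Hypothetical counts for illustration only.
metrics = classification_metrics(tp=90, tn=200, fp=5, fn=10)
for name, value in metrics.items():
    print(f"{name}: {value * 100:.2f}%")
```

Because benign samples outnumber malware samples in the data set, accuracy alone can look high even when malware is missed, which is one reason to report precision, recall, and F1 score alongside it.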
TABLE II
CLASSIFICATION PERFORMANCE EXAMINATION RESULTS.
Models   Accuracy (%)   F1-Score (%)   Recall (%)   Precision (%)
RNN      98.02          97.36          96.88        97.85
LSTM     98.75          97.35          97.03        97.69
BiLSTM   98.85          98.21          97.52        98.91
GRU      98.10          97.53          97.23        97.85
the problems of standard RNNs, which are very similar to LSTM, are new generation RNNs with fewer parameters since they have two gates.
The confusion matrix is an evaluation tool that provides information about the classification performance of the learning algorithm with a holistic approach. We evaluated accuracy, precision, recall, and F1 score, computed from the confusion matrix, to assess the performance of the classifiers. The formulas of the performance evaluation metrics used are given in (1)–(4) [28]. Here, True Positive (TP) is the number of positive predicted values that are actually positive. True Negative (TN) is the number of negative predicted values that are actually negative. False Positive (FP) is the number of values predicted as positive but actually negative. False Negative (FN) is the number of values predicted as negative but actually positive.
$$\text{Precision} = \frac{TP}{TP + FP} \tag{1}$$
$$\text{Recall} = \frac{TP}{TP + FN} \tag{2}$$
$$\text{Accuracy} = \frac{TP + TN}{TP + FP + FN + TN} \tag{3}$$
$$\text{F1-Score} = 2 \cdot \frac{\text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}} \tag{4}$$
D. Experimental Results
This study aims to compare deep learning models in terms of performance using static features in malware detection. The performance results of the experimental studies performed are shown in Table II.
VI. CONCLUSIONS
The fact that smart devices in the Android market are becoming more accessible day by day has increased their usability in all areas of our daily life. This situation increases the need for Android applications, as Android is the dominant operating system in this area. Malicious applications are a great threat not only on desktop computers but also in this rapidly growing mobile operating system market, which requires a robust and effective detection system. In this study, a deep learning-based model with the static analysis technique is proposed to detect malware in Android operating systems. The proposed detection system was tested on the publicly available CICInvesAndMal2019 data set using some up-to-date deep learning algorithms based on RNN. The proposed models have competent performance values in terms of detection and accuracy. The BiLSTM model achieved an accuracy rate of 98.85%, providing better performance than the other proposed algorithms.
As ongoing work, we are planning to propose a hybrid model using supervised and semi-supervised deep learning algorithms against cyber threats that continue to increase and change methods. In addition, the size of the data set is increasing day by day, which is also important from a data science perspective. Currently, the data set used contains 8115 features. Choosing a better feature set is a nontrivial problem for researchers. In our future work, we plan to use different feature selection algorithms to decrease the noise and make the analysis more efficient. Additionally, to decrease the training time and increase the effectiveness of the system, GPU power can be taken into consideration, as mentioned in [29].
ACKNOWLEDGMENT
This work has been supported by Marmara University Scientific Research Projects Coordination Unit under grant number FDK-2020-10066.
REFERENCES
[1] S. Kemp, Digital 2022: Global Overview Report, DataReportal – Global Digital Insights, Jan. 26, 2022. https://datareportal.com/reports/digital-2022-global-overview-report. (accessed Apr. 19, 2022).
[2] Mobile Operating System Market Share Worldwide, StatCounter Global Stats. https://gs.statcounter.com/os-market-share/mobile/worldwide/monthly-202103-202203-bar (accessed Apr. 19, 2022).
[3] Google Play: growth of available apps 2021, Statista. https://www.statista.com/statistics/185729/google-play-quarterly-growth-of-available-app. (accessed Apr. 12, 2022).
[4] “Malware Statistics and Trends Report — AV-TEST,” Av-test.org. https://www.av-test.org/en/statistics/malware/ (accessed Apr. 19, 2022).
[5] E. C. Bayazit, O. K. Sahingoz, and B. Dogan, “Neural Network Based Android Malware Detection with Different IP Coding Methods,” IEEE Xplore, Jun. 01, 2021. https://ieeexplore.ieee.org/document/9461302. (accessed Apr. 19, 2022).
[6] A. Kapratwar, F. Di Troia, and M. Stamp, “Static and Dynamic Analysis of Android Malware,” Proceedings of the 3rd International Conference on Information Systems Security and Privacy, 2017, doi: 10.5220/0006256706530662.
[7] T. Kim, B. Kang, M. Rho, S. Sezer, and E. G. Im, “A Multimodal Deep Learning Method for Android Malware Detection Using Various Features,” IEEE Transactions on Information Forensics and Security, vol. 14, no. 3, pp. 773–788, Mar. 2019, doi: 10.1109/TIFS.2018.2866319.
[8] B. H. Tang et al., “Android Malware Detection Based on Deep Learning Techniques,” 2021 4th International Conference on Pattern Recognition and Artificial Intelligence (PRAI), Aug. 2021, doi: 10.1109/prai53619.2021.9551073.
[9] I. U. Haq, T. A. Khan, and A. Akhunzada, “A Dynamic Robust DL-Based Model for Android Malware Detection,” IEEE Access, vol. 9, pp. 74510–74521, 2021, doi: 10.1109/access.2021.3079370.
[10] F. Wenbo, Z. Linlin, W. Chenyue, H. Yingjie, Y. Yuaner and Z. Kai, “AMC-MDL: A Novel Approach of Android Malware Classification using Multimodel Deep Learning,” 2020 IEEE Intl Conf on Dependable, Autonomic and Secure Computing, Intl Conf on Pervasive Intelligence and Computing, Intl Conf on Cloud and Big Data Computing, Intl Conf on Cyber Science and Technology Congress (DASC/PiCom/CBDCom/CyberSciTech), 2020, pp. 251-256, doi: 10.1109/DASC-PICom-CBDCom-CyberSciTech49142.2020.00052.
[11] T. Mu, H. Chen, J. Du, and A. Xu, “An Android Malware Detection Method Using Deep Learning Based on API Calls,” 2019 IEEE 3rd Advanced Information Management, Communicates, Electronic and Automation Control Conference (IMCEC), Oct. 2019, doi: 10.1109/imcec46724.2019.8983860.
[12] S. HR, “Static Analysis of Android Malware Detection using Deep Learning,” 2019 International Conference on Intelligent Computing and Control Systems (ICCS), 2019, pp. 841-845, doi: 10.1109/ICCS45141.2019.9065765.
[13] Y. M. Chen, C. H. Hsu, and K. C. Kuo Chung, “A Novel Preprocessing Method for Solving Long Sequence Problem in Android Malware Detection,” 2019 Twelfth International Conference on Ubi-Media Computing (Ubi-Media), Aug. 2019, doi: 10.1109/ubi-media.2019.00012.
[14] O. N. Elayan and A. M. Mustafa, “Android Malware Detection Using Deep Learning,” Procedia Computer Science, vol. 184, pp. 847–852, 2021, doi: 10.1016/j.procs.2021.03.106.
[15] Y. Zhang, Y. Yang, and X. Wang, “A Novel Android Malware Detection Approach Based on Convolutional Neural Network,” Proceedings of the 2nd International Conference on Cryptography, Security and Privacy, Mar. 2018, doi: 10.1145/3199478.3199492.
[16] A. H. Lashkari, A. F. A. Kadir, L. Taheri and A. A. Ghorbani, “Toward Developing a Systematic Approach to Generate Benchmark Android Malware Datasets and Classification,” 2018 International Carnahan Conference on Security Technology (ICCST), 2018, pp. 1-7, doi: 10.1109/CCST.2018.8585560.
[17] L. Taheri, A. F. A. Kadir and A. H. Lashkari, “Extensible Android Malware Detection and Family Classification Using Network-Flows and API-Calls,” 2019 International Carnahan Conference on Security Technology (ICCST), 2019, pp. 1-8, doi: 10.1109/CCST.2019.8888430.
[18] S. C. Kalkan and O. K. Sahingoz, “Deep Learning Based Classification of Malaria from Slide Images,” 2019 Scientific Meeting on Electrical-Electronics Biomedical Engineering and Computer Science (EBBT), 2019, pp. 1-4, doi: 10.1109/EBBT.2019.8741702.
[19] G. Sismanoglu, M. A. Onde, F. Kocer and O. K. Sahingoz, “Deep Learning Based Forecasting in Stock Market with Big Data Analytics,” 2019 Scientific Meeting on Electrical-Electronics Biomedical Engineering and Computer Science (EBBT), 2019, pp. 1-4, doi: 10.1109/EBBT.2019.8741818.
[20] “Android Security Features,” Android Open Source Project. https://source.android.com/security/features (accessed Apr. 20, 2022).
[21] H. Kim, T. Cho, G.-J. Ahn, and J. Yi, “Risk assessment of mobile applications based on machine learned malware dataset,” Multimedia Tools and Applications, 2017, doi: 10.1007/s11042-017-4756-0.
[22] M. G. Rekoff, “On reverse engineering,” IEEE Transactions on Systems, Man, and Cybernetics, vol. SMC-15, no. 2, pp. 244–252, Mar. 1985, doi: 10.1109/tsmc.1985.6313354.
[23] E. C. Bayazit, O. Koray Sahingoz and B. Dogan, “Malware Detection in Android Systems with Traditional Machine Learning Models: A Survey,” 2020 International Congress on Human-Computer Interaction, Optimization and Robotic Applications (HORA), 2020, pp. 1-8, doi: 10.1109/HORA49412.2020.9152840.
[24] S. Jha, D. Prashar, H. V. Long, and D. Taniar, “Recurrent Neural Network for Detecting Malware,” Computers and Security, p. 102037, Sep. 2020, doi: 10.1016/j.cose.2020.102037.
[25] F. A. Gers, “Learning to forget: continual prediction with LSTM,” 9th International Conference on Artificial Neural Networks: ICANN ’99, 1999, doi: 10.1049/cp:19991218.
[26] T. Zhang and R. Xu, “Performance Comparisons of Bi-LSTM and Bi-GRU Networks in Chinese Word Segmentation,” 2021 5th International Conference on Deep Learning Technologies (ICDLT), Jul. 2021, doi: 10.1145/3480001.3480011.
[27] B. Athiwaratkun and J. W. Stokes, “Malware classification with LSTM and GRU language models and a character-level CNN,” IEEE Xplore, Mar. 01, 2017. https://ieeexplore.ieee.org/abstract/document/7952603 (accessed Apr. 20, 2022).
[28] D. M. W. Powers, “Evaluation: from precision, recall and F-measure to ROC, informedness, markedness and correlation,” arXiv:2010.16061 [cs, stat], Oct. 2020, [Online]. Available: https://arxiv.org/abs/2010.16061. (accessed Apr. 19, 2022).
[29] S. I. Baykal, D. Bulut and O. K. Sahingoz, “Comparing deep learning performance on BigData by using CPUs and GPUs,” 2018 Electric Electronics, Computer Science, Biomedical Engineerings’ Meeting (EBBT), 2018, pp. 1-6, doi: 10.1109/EBBT.2018.8391429.