Malware Detection in Android in Different Application Categories
ISBN-978-969-23372-1-2

Abstract—In this smart era, mobile phones are in very common use. Most people use them as everyday gadgets: doing calculations, saving memos, taking notes, keeping account details, and even controlling their homes through smart sensors. Smartphones are among the many inventions and innovations of this technological era, and they are at high risk from attackers and hackers, who try to inject malicious code that the user accepts as a permission when installing an application. Detection is necessary to protect devices from such harmful malware. In this paper, we collected applications from five different categories, i.e., Islamic, Education, Home, Shopping, and Medical. The applications were downloaded from their official marketplace, and their permissions were extracted using VirusTotal. The pre-analyzed dataset contains 2,500 Android applications. Malicious and benign applications are identified with the help of permission keywords. Machine learning classifiers, including Support Vector Machine, Decision Tree, Gradient Boosted Tree, and Random Forest, are used to analyze the pre-processed data with RapidMiner 9.8. The Decision Tree and Random Forest classifiers give similar results and run faster than the Gradient Boosted Tree and Support Vector Machine.

Keywords—Android Malware, Malware, Non-Malware, Detection, Detection Techniques, Permissions, Machine Learning Classifiers

I. INTRODUCTION

Smartphones are the most-used devices, handling tasks that range from taking selfies and checking the weather to financial activities and business. Nowadays, a very large number of users rely on them. Because smartphones are relatively new, cyber-attackers can easily exploit their weaknesses, spread viruses, and gain access to emails, contacts, and other activities. Most of the cyber-attacks previously developed for personal computers have been repackaged to attack mobile devices. Smartphones running the Android operating system are widely affected by severe Android malware. Malware is considered the most serious threat to businesses: it keeps them from reaching their goals and steals classified information. The term malware combines two words, i.e., Malicious + Software [7], and refers to software that attackers use to compromise the security of devices. Understanding malware is becoming more and more complicated, and it is a topic of strong interest in the mobile security industry. Malware countermeasures also make it increasingly complex for the industry to detoxify applications with cleaning tools.

A. Malware Detection Background

Malware detection techniques are commonly of two types: (i) static detection techniques and (ii) dynamic detection techniques.

The static detection technique analyzes an application's code before the application downloaded by the end-user is executed [9]. This approach unpacks and disassembles the code to extract features of the app that can identify malware. Static techniques are easily automated, flexible, and lightweight. Some well-known static techniques are permission-based techniques, signature-based techniques, etc. [12]

The dynamic detection technique identifies malicious behavior by executing the application in a controlled environment. Some dynamic techniques are anomaly-based techniques, taint analysis, emulation-based techniques, etc. [12]

Several researchers have noted the complexity of modern malware, which makes it very difficult to detect. Agobot [17] is a malware program released in 2002 that attacks systems, steals bank account details, and propagates itself over a network while evading detection. Malware comes in different forms and uses different techniques to carry out its actions, all depending on the hacker's intent.

B. Different Permission Types

Permissions can be categorized into three types: Normal Permissions, Signature Permissions, and Dangerous Permissions [8]-[10].

Normal Permissions:
Normal permissions are granted automatically by the system at installation time, and users cannot revoke them. They pose very little risk to the user's privacy and to device protection.

Signature Permissions:
These permissions are granted by the system at installation time, but only when the app that attempts to use the permission is signed with the same certificate as the app that defines the permission [5].

Dangerous Permissions:
Dangerous permissions cover data that involves the user's private information. They are granted only if the user accepts the permission request (at installation on older Android versions, at runtime since Android 6.0). Dangerous permissions guard sensitive API calls that can cause the user serious loss if abused. Permissions such as SEND_SMS, RECEIVE_SMS, or CALL_PHONE can expose personal information, bank account details, website logins, passwords, etc.

94.5%. The experiments used 1,700 benign apps from the Xiaomi market and 1,600 malicious apps from malware families.

In another study in 2020, Agrawal et al. [2] reviewed the existing detection techniques and analysis approaches for Android malicious code, using machine learning algorithms to analyze malware through semantic analysis.

In a 2020 research article [3], an Android malware detection algorithm based on a hybrid deep-learning model is proposed. It combines a Deep Belief Network (DBN) and a Gated Recurrent Unit (GRU). Experiments comparing it with traditional machine learning algorithms show that the hybrid deep-learning model gives higher detection accuracy.

In another study in 2019 [4], a detection model is proposed to track the behavior of malware apps on such devices. A novel machine learning classifier is employed, and an alarm is triggered when malicious code is detected in an Android app. The proposed classification algorithm achieves high accuracy and performs well on measures such as the true-positive and false-positive rates.

In 2019, the authors of [18] proposed a new Android malware detection method based on the correlation between the API calls of applications. First, they split the source code into transactions of abstracted API calls, and then they computed the confidence of the association rules between the abstracted API calls.
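The association-rule step attributed to [18] can be illustrated with a minimal sketch. It assumes each app has already been reduced to transactions, each a set of abstracted API-call names; the call names and the 0.5 confidence threshold below are hypothetical placeholders, not data or settings from [18]. The confidence of a rule A → B is the fraction of transactions containing A that also contain B.

```python
from itertools import permutations


def rule_confidence(transactions, antecedent, consequent):
    """Confidence of antecedent -> consequent over a list of sets of API calls.

    confidence = (# transactions containing antecedent and consequent)
                 / (# transactions containing antecedent)
    """
    with_antecedent = [t for t in transactions if antecedent <= t]
    if not with_antecedent:
        return 0.0
    both = [t for t in with_antecedent if consequent <= t]
    return len(both) / len(with_antecedent)


def pairwise_rules(transactions, min_confidence=0.5):
    """Return (a, b, confidence) for every ordered pair of single API calls
    whose rule a -> b meets the (hypothetical) confidence threshold."""
    calls = sorted(set().union(*transactions))
    rules = []
    for a, b in permutations(calls, 2):
        conf = rule_confidence(transactions, {a}, {b})
        if conf >= min_confidence:
            rules.append((a, b, conf))
    return rules


# Hypothetical transactions of abstracted API calls (placeholders only).
transactions = [
    {"getDeviceId", "sendTextMessage"},
    {"getDeviceId", "sendTextMessage", "openConnection"},
    {"getDeviceId", "openConnection"},
    {"openConnection"},
]

print(rule_confidence(transactions, {"getDeviceId"}, {"sendTextMessage"}))  # 0.6666666666666666
```

Only single-call rules are enumerated here to keep the sketch short; a full miner would also consider multi-call antecedents and a minimum support.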
of the application system calls. The behavior is analyzed by executing the app on an emulator or in a controlled environment.

Several apps are available in the dataset provided on the http://amd.arguslab.org website. This dataset contains data on more than 24,000 apps. It defines 135 types of malware, but it does not explicitly define the categories of the analyzed apps [14].

Instead, we collected our own data of 2,500 samples (from five nominated categories) from different sources.

II. METHODOLOGY

The methodology consists of four phases. The first phase is downloading and installing the apps of each category. The apps are downloaded from the well-known website APKPure.com [16], which provides a huge number of applications verified against the official marketplace, Google Play. In the second phase, the manifest file is extracted. The third phase covers the detection of malware and benign apps using the online tool VirusTotal [15]. The fourth phase analyzes the apps containing severe and harmful malware using machine learning algorithms.

A. Pre-processed Data Analytics of Android Apps

The dataset was built by collecting random applications containing both malicious and non-malicious content. The data was collected from an online source, i.e., APKPure. APKPure.com is a website providing smartphone software downloads. It was founded in 2014 by the APKPure Team and has grown into one of the leading websites in the smartphone software industry. APKPure is not affiliated with Google.com, Google Play, or Android in any way. The .apk files were downloaded selectively to achieve a ratio of malicious and benign applications across different categories of the Google Play Store, which is available on Android smartphones and gadgets. The five targeted Play Store categories are Islamic, Education, Medical, Shopping, and Home. Five hundred .apk files of each category were downloaded and extracted, so the total number of downloaded .apk files is 2,500. The .apk files were then uploaded to VirusTotal one by one to obtain the desired permission keywords. Each application was scanned and classified as benign or malicious, and the results were recorded based on hazardous and non-hazardous permissions. Table 1 shows the permission keywords of the benign apps, and Table 2 shows the permission keywords of the malicious apps.
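The manifest-extraction phase and the keyword-based labeling described above can be sketched as follows. This is a minimal illustration, not the authors' pipeline (which relied on VirusTotal and RapidMiner): it assumes the AndroidManifest.xml has already been decoded to plain-text XML (for example by a decoding tool such as apktool), and the HAZARDOUS_KEYWORDS set is a hypothetical stand-in for the keywords of Table 2. Labels follow the paper's NM/M abbreviations.

```python
import xml.etree.ElementTree as ET

# Attributes in AndroidManifest.xml live in the "android" XML namespace.
ANDROID_NS = "{http://schemas.android.com/apk/res/android}"

# Hypothetical subset of hazardous permission keywords (stand-in for Table 2).
HAZARDOUS_KEYWORDS = {
    "android.permission.SEND_SMS",
    "android.permission.RECEIVE_SMS",
    "android.permission.CALL_PHONE",
}


def requested_permissions(manifest_xml: str) -> set[str]:
    """Return the permission names declared with <uses-permission> in a decoded manifest."""
    root = ET.fromstring(manifest_xml)
    return {
        element.get(ANDROID_NS + "name")
        for element in root.iter("uses-permission")
        if element.get(ANDROID_NS + "name") is not None
    }


def label_app(manifest_xml: str) -> tuple[str, set[str]]:
    """Label an app 'M' (malware) if it requests any hazardous keyword, else 'NM'."""
    hits = requested_permissions(manifest_xml) & HAZARDOUS_KEYWORDS
    return ("M" if hits else "NM"), hits


example_manifest = """<manifest xmlns:android="http://schemas.android.com/apk/res/android"
          package="com.example.shop">
    <uses-permission android:name="android.permission.ACCESS_NETWORK_STATE"/>
    <uses-permission android:name="android.permission.SEND_SMS"/>
</manifest>"""

print(label_app(example_manifest))  # ('M', {'android.permission.SEND_SMS'})
```

Flagging an app on a single hazardous keyword is a deliberate simplification for illustration; a real labeling rule would weigh the scanner verdicts as well.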
III. IMPLEMENTATION
However, android.permission.SEND_SMS allows an application to send SMS messages, android.permission.READ_EXTERNAL_STORAGE allows an application to read from external storage, and android.permission.ACCESS_NETWORK_STATE allows an application to access information about networks, etc.
(NM=Non-malware; M=Malware)
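Before classification, permission keywords such as those above have to be turned into features. One common encoding, assumed here rather than taken from the paper, is a binary vector with one column per permission in a fixed vocabulary. The helper names and the two apps below are hypothetical.

```python
def build_vocabulary(apps: dict[str, set[str]]) -> list[str]:
    """Collect every permission requested by any app, in a stable (sorted) order."""
    return sorted(set().union(*apps.values()))


def to_feature_rows(apps: dict[str, set[str]], vocabulary: list[str]) -> dict[str, list[int]]:
    """Map each app to a 0/1 vector: 1 where the app requests that column's permission."""
    return {
        name: [1 if permission in permissions else 0 for permission in vocabulary]
        for name, permissions in apps.items()
    }


# Hypothetical apps and their requested permissions.
apps = {
    "app_a": {"android.permission.SEND_SMS", "android.permission.ACCESS_NETWORK_STATE"},
    "app_b": {"android.permission.READ_EXTERNAL_STORAGE"},
}

vocabulary = build_vocabulary(apps)
rows = to_feature_rows(apps, vocabulary)
print(rows)  # {'app_a': [1, 0, 1], 'app_b': [0, 1, 0]}
```

Rows in this shape, together with an NM/M label column, are the kind of tabular input that a tool such as RapidMiner or any classifier library accepts; the export step itself is not shown.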
Fig. 2. Data loading process of different classifiers
IV. RESULTS
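The class entropy of equation (1) in this section can be checked numerically with a short sketch. It uses only the two class counts reported in this section (2,364 and 136 of 2,500 apps); the function name class_entropy is our own, hypothetical choice.

```python
import math


def class_entropy(counts):
    """Shannon entropy (in bits) of a class distribution given raw counts, as in (1)."""
    total = sum(counts)
    entropy = 0.0
    for count in counts:
        if count == 0:
            continue  # 0 * log2(0) is taken as 0 by convention
        p = count / total
        entropy -= p * math.log2(p)
    return entropy


print(round(class_entropy([2364, 136]), 2))  # 0.3
print(class_entropy([1250, 1250]))  # 1.0, the maximum for two equally sized classes
```

The second call shows the balanced two-class case for contrast: the further the result falls below 1 bit, the more imbalanced the dataset.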
To measure the purity of the dataset, the entropy is calculated with the formula given in (1), where n is the number of classes and p(s_i) is the fraction of samples belonging to class s_i:

$$\mathrm{Entropy} = -\sum_{i=1}^{n} p(s_i)\,\log_2 p(s_i) \tag{1}$$

For the 2,500 apps in this dataset, the two classes contain 2,364 and 136 samples, so

$$\mathrm{Entropy} = -\frac{2364}{2500}\log_2\frac{2364}{2500} - \frac{136}{2500}\log_2\frac{136}{2500} \approx 0.3$$

The low entropy value indicates an imbalanced dataset (for two classes, entropy ranges from 0 bits when all samples share one class to 1 bit when both classes are equally sized). That is why the classifiers applied in this work achieve lower accuracy.

V. CONCLUSION

In this paper, we have targeted the categories of Android applications. There are hundreds of thousands of datasets available on the internet, but they do not explicitly address app categories, so we constructed our own dataset. We aim to identify which Play Store category contains the most malware and which contains the fewest malicious apps. One future work direction is to apply techniques that balance the data in order to improve the accuracy of the classifiers. Another extension of this work is to apply cross-validation techniques.

ACKNOWLEDGMENT

The authors would like to thank the anonymous reviewers for their insightful feedback on this work.

REFERENCES

[1] Jiang, X., Mao, B., Guan, J., and Huang, X. "Android malware detection using fine-grained features." Scientific Programming 2020 (2020).

[2] Agrawal, Rishab, et al. "Android Malware Detection Using Machine Learning." 2020 International Conference on Emerging Trends in Information Technology and Engineering (ic-ETITE). IEEE, 2020.

[3] Lu, Tianliang, et al. "Android Malware Detection Based on a Hybrid Deep Learning Model." Security and Communication Networks 2020 (2020).

[4] Zhou, Qingguo, et al. "A novel approach for mobile malware classification and detection in Android systems." Multimedia Tools and Applications 78.3 (2019): 3529-3552.

[6] Li, Jin, et al. "Significant permission identification for machine-learning-based android malware detection." IEEE Transactions on Industrial Informatics 14.7 (2018): 3216-3225.

[7] Ali, Kashif, Muhammad Saleem, and Baqar Ali. "Detection and Prevention of Malware in Android Operating System." 1st International Conference on Computational Sciences and Technologies (INCCST), 2019.

[8] Zachariah, Raima, et al. "Android malware detection a survey." 2017 IEEE International Conference on Circuits and Systems (ICCS). IEEE, 2017.

[9] Abubaker, Howida, Siti Mariyam Shamsuddin, and Aida Ali. "Analytics on malicious android applications." International Journal of Advances in Soft Computing & Its Applications 10.1 (2018).

[10] Yan, Ping, and Zheng Yan. "A survey of dynamic mobile malware detection." Software Quality Journal 26.3 (2018): 891-919.

[11] Bai, Chongyang, et al. "DBank: Predictive Behavioral Analysis of Recent Android Banking Trojans." IEEE Transactions on Dependable and Secure Computing (2019).

[12] Murtaz, Muhammad, et al. "A framework for Android Malware detection and classification." 2018 IEEE 5th International Conference on Engineering Technologies and Applied Sciences (ICETAS). IEEE, 2018.

[13] Mostafa, Ahmad H., Marwa M. A. Elfattah, and Aliaa A. A. Youssif. "An intelligent methodology for malware detection in android smartphones based on static analysis." International Journal of Communications 10 (2016).

[14] http://amd.arguslab.org/behaviors

[15] https://www.virustotal.com/gui/

[16] https://apkpure.com/app

[17] Ayed, Ahmed Ben. "Android Security: Permission Request Analysis Using Self-Organizing Maps in Android Malware Applications." Diss. Colorado Technical University, 2017.

[18] Zhang, Hanqing, et al. "An efficient Android malware detection system based on method-level behavioral semantic analysis." IEEE Access 7 (2019): 69246-69256.