Ijatcse 46915 SL 2020
ABSTRACT
Over the past year, the amount of malware targeting the Android operating system has risen dramatically. Android malware detection is therefore required to catch malware before the problem becomes more serious. Static analysis meticulously examines the full code of an application, while dynamic analysis identifies malicious applications by monitoring their behavior. This study proposes a malware detection system that uses a machine learning approach to detect malware that attacks the Android operating system. In this paper, the Android malware detection system is trained using five types of classifiers, and WEKA is used for the simulation process. The dataset contains 10k malware and 10k benign samples. The results show that the Random Forest classifier attained the highest accuracy, 89.36%, compared with 89.2% for Naïve Bayes. The true positive rate (TPR) is used as the detection rate, i.e. the proportion of malware correctly predicted as malware. The false positive rate (FPR) is the proportion of normal (benign) applications incorrectly predicted as malware. To evaluate whether detection quality is good or bad, this study also applies the area under the ROC curve (AUC). The results show that Naïve Bayes has the lowest model complexity, as it takes the least time to build the model. Hence, it can be concluded that achieving reasonable accuracy and effectiveness in classifying unknown malware helps to determine the performance of the classifiers.

Key words: Android, Intrusion, Machine Learning, Malware.

1. INTRODUCTION

Nowadays, people use mobile devices and smartphones more and more in daily life, and almost everyone around the world owns a smartphone. According to global market share figures, 88% of the smartphones sold to end users during the second quarter of 2018 ran the Android system [13].

Smartphones are also becoming more and more popular because of their portability and convenience. For example, a smartphone offers various functions and services. It can hold personal information and access files that are usually stored in the cloud, such as bank account information, email details, and passwords. It also allows users to interact with each other by sending messages or making calls. However, the growing popularity of Android has brought many security concerns and threats from attackers. These attackers may spread malware that makes the system behave differently than it is supposed to. Such malware typically sends fraudulent messages and charges the user for fake services.

According to the Security Threat Report released by Symantec in 2018 [12], overall targeted attack activity rose by 10 percent in 2017. In fact, by March 12, 2018, 4,964,460 devices had been infected by the RottenSys malware [4]. This situation urgently calls for an effective method to detect malware before it harms more Android smartphones. In this era of globalization, people commonly use smartphones in many ways. For example, they use a network connection to interact with the world through online shopping, online banking, and cloud storage. Naturally, this kind of network connectivity also has disadvantages for the user. Storing confidential information on a smartphone may attract attackers who try to obtain user details. They do so by spreading malware through software or applications that users install on their smartphones, knowingly or not, especially on Android.

Besides, many existing studies have proposed detecting malware using various techniques and methods implemented in applications. For example, Google released an automated scanning system for potential malware called Bouncer [8]. However, there is still room to improve Android malware detection, because different methods and techniques produce different error rates.

Furthermore, malicious applications still reach Android devices and trick users. For example, 600,000 Android users downloaded fake guide applications for games such as Pokemon Go and FIFA Mobile. They mistakenly downloaded the malware applications while looking for guides to these games [14]. This shows that not all techniques have succeeded in protecting Android smartphones. Hence, the main contributions of this research are as follows:

i. To review current issues related to Android malware detection systems.

ii. To evaluate how applying a machine learning approach improves the malware detection system.
Nuren Natasha Maulat Nasri et al., International Journal of Advanced Trends in Computer Science and Engineering, 9(1.5), 2020, 327 – 333
iii. To evaluate the proposed system in terms of malware detection accuracy.

The rest of this paper is organized as follows. Section 2 discusses related work by previous researchers. Section 3 explains the research method used in the experiments, and Section 4 evaluates how successfully the malware is detected. Lastly, Section 5 presents the conclusion and future work of this paper.

2. RELATED WORK

Machine learning is mainly known as an artificial intelligence (AI) application. It gives a system the ability to learn and improve from experience automatically, without being explicitly programmed. Therefore, a machine learning approach can help improve the decision-making process [9]. Recently, machine learning approaches have been used for decision-making tasks such as text-based sentiment analysis, pattern recognition, malware detection, and network intrusion detection [3][25][26]. There are several types of machine learning methods, as stated in [21]:
• Supervised machine learning algorithms apply what was learned from past data to new data, using labeled examples to predict future events. Starting from the analysis of a known training dataset, the learning algorithm produces an inferred function that predicts the output values. The algorithm can also compare its output with the correct, intended output to find errors and adjust the model accordingly.
• Unsupervised machine learning algorithms are trained on information that is neither classified nor labeled. They study how a system can infer a function that describes the hidden structure of unlabeled data. The system explores the data and draws inferences from datasets to describe these hidden structures, without determining a correct output.
• Semi-supervised machine learning algorithms fall somewhere between supervised and unsupervised learning, because both labeled and unlabeled data are used for training.
• Reinforcement machine learning algorithms interact with their environment by producing actions and discovering errors. This method allows machines and software agents to automatically determine the ideal behavior within a specific context in order to maximize their performance.

Machine learning enables the analysis of huge amounts of data. It generally delivers accurate results quickly, which helps to recognize profitable opportunities or dangerous threats. However, it requires additional time and resources to train properly [22]. Combining machine learning and AI with cognitive technologies can make it even more effective at processing large volumes of information.

In the first study, [18] used machine learning and reverse engineering techniques to detect Android malware. The authors focused on a static approach based on automatic analysis of decompiled mobile application code. In this research, a unique feature vector with 696 features was built from the applications' Java code. The authors divided the features into three categories. The first is implementations of the onReceive() method of the BroadcastReceiver component. The second is groups of commands that obtain administrative access to the device, expand the opportunities for attack, and hide the operation of malware on the device. The third is API calls. According to the selection basis in [5], API calls form the largest group, with 616 features.

In the second study, [2] also used machine learning to improve Android malware detection, using big data analytics. The authors compared seven different machine learning classifiers on the SherLock dataset [19], one of the largest Android malware datasets. They used 35 GB of data and a 17-node Spark cluster. The classifiers compared were Logistic Regression, Isotonic Regression, Random Forest, Gradient Boosted Trees, Decision Trees, Support Vector Machine (SVM), and Multilayer Perceptron. They observed that tree-based techniques generally gave better results. Gradient Boosted Trees achieved approximately 91% precision, the highest of the seven techniques. The authors also compared the FPR in detecting benign applications and found that Gradient Boosted Trees had the lowest false positive rate. Furthermore, they deployed their trained model on a private cloud to enable real-time detection of malware applications. They envisioned that such a service could be extremely useful to the community.

In the third study, [17] aimed to perform malware detection through network monitoring instead of static approaches. To do so, the authors analyzed different network-based detection solutions that employ machine learning techniques and proposed enhancements to detect malware precisely. The evaluation process consists of two stages: performing experiments and analysis. The experiment comprises three sub-stages: data collection, machine learning classification, and feature selection and extraction. During the data collection stage, network traffic from benign and malicious applications was captured.

Existing malware detection studies use datasets, features, precision, and similar measures to determine whether malware exists in the system. However, none of the existing machine learning systems uses permissions to prevent malware from entering the system. Therefore, this research aims to improve an existing system by granting permission to software that is free of malicious code. Software that contains malware, meanwhile, is not given permission to enter the system.
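The related work above relies on supervised learning over feature vectors extracted statically from applications. Examples are the code-derived features in [18] and the permission features analyzed in this study. Purely as an illustration, the following sketch uses Python with the standard library only. It turns the set of permissions and API calls observed in an app into a binary vector and then trains a Bernoulli Naïve Bayes classifier, which is one form of the Naïve Bayes classifier evaluated here. Everything in it is a hypothetical assumption: the feature names, the toy training set, and the helper names (to_vector, train_bernoulli_nb, predict). The study itself ran its experiments in WEKA, not with this code.

```python
import math
from collections import Counter

# HYPOTHETICAL feature vocabulary: each permission or API-call name becomes one
# binary feature (1 = present in the decompiled app, 0 = absent). Real studies
# such as [18] use hundreds of features; five are enough for a sketch.
FEATURES = ["SEND_SMS", "READ_CONTACTS", "RECEIVE_BOOT_COMPLETED",
            "INTERNET", "getDeviceId"]


def to_vector(observed):
    """Map the set of names observed in one app to a 0/1 feature vector."""
    return [1 if f in observed else 0 for f in FEATURES]


def train_bernoulli_nb(X, y, alpha=1.0):
    """Fit a Bernoulli Naive Bayes model on labeled 0/1 vectors.

    Returns {label: (log prior, [P(feature_j = 1 | label), ...])}; alpha is
    Laplace smoothing, so no feature probability is ever exactly 0 or 1.
    """
    counts = Counter(y)
    model = {}
    for label, n_label in counts.items():
        rows = [x for x, lab in zip(X, y) if lab == label]
        probs = [(sum(r[j] for r in rows) + alpha) / (n_label + 2 * alpha)
                 for j in range(len(FEATURES))]
        model[label] = (math.log(n_label / len(y)), probs)
    return model


def predict(model, x):
    """Return the label with the highest log posterior for vector x."""
    best_label, best_score = None, -math.inf
    for label, (log_prior, probs) in model.items():
        score = log_prior + sum(math.log(p if xj else 1.0 - p)
                                for xj, p in zip(x, probs))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Toy, hand-made training set (NOT data from this study or from [18]).
train = [
    ({"SEND_SMS", "RECEIVE_BOOT_COMPLETED", "getDeviceId", "INTERNET"}, "malware"),
    ({"SEND_SMS", "READ_CONTACTS", "INTERNET"}, "malware"),
    ({"SEND_SMS", "getDeviceId"}, "malware"),
    ({"INTERNET"}, "benign"),
    ({"INTERNET", "READ_CONTACTS"}, "benign"),
    ({"INTERNET", "RECEIVE_BOOT_COMPLETED"}, "benign"),
]
X = [to_vector(names) for names, _ in train]
y = [label for _, label in train]
model = train_bernoulli_nb(X, y)

print(predict(model, to_vector({"SEND_SMS", "getDeviceId", "INTERNET"})))  # -> malware
print(predict(model, to_vector({"INTERNET"})))  # -> benign
```

Working in log probabilities avoids numeric underflow once the vector grows to hundreds of features, as in [18].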
3. RESEARCH METHOD
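The abstract defines TPR as the detection rate (malware correctly predicted as malware) and FPR as the rate of benign applications wrongly predicted as malware. The related work also reports precision. The following is a minimal sketch in standard-library Python of how these rates and accuracy follow from the per-classifier confusion-matrix counts reported in the table in this section. It assumes that in each table row the first count is "predicted benign" and the second is "predicted malware", and that malware is the positive class. The row whose classifier name was not recovered is left out. The helper names (CONFUSION, rates, auc) are hypothetical. AUC cannot be derived from these counts alone, so the auc() function is exercised only on made-up scores.

```python
# ASSUMPTION: first count = predicted benign, second = predicted malware;
# malware is the positive class. Counts transcribed from the table.
CONFUSION = {
    # classifier: ((TN, FP) for actual benign, (FN, TP) for actual malware)
    "Naive Bayes": ((9135, 865), (1929, 8071)),
    "J48": ((9095, 905), (1259, 8741)),
    "Decision Table": ((9126, 874), (1300, 8700)),
    "MLP": ((9100, 900), (1276, 8724)),
}


def rates(benign_row, malware_row):
    """Return (TPR, FPR, precision, accuracy) for one classifier."""
    tn, fp = benign_row
    fn, tp = malware_row
    tpr = tp / (tp + fn)        # detection rate: malware predicted as malware
    fpr = fp / (fp + tn)        # benign (normal) apps wrongly flagged as malware
    precision = tp / (tp + fp)  # share of "malware" predictions that are correct
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    return tpr, fpr, precision, accuracy


def auc(malware_scores, benign_scores):
    """ROC AUC via its rank interpretation: the probability that a randomly
    chosen malware sample gets a higher malware score than a randomly chosen
    benign sample (ties count as one half). O(n*m), fine for a sketch."""
    wins = 0.0
    for m in malware_scores:
        for b in benign_scores:
            if m > b:
                wins += 1.0
            elif m == b:
                wins += 0.5
    return wins / (len(malware_scores) * len(benign_scores))


for name, (benign_row, malware_row) in CONFUSION.items():
    tpr, fpr, prec, acc = rates(benign_row, malware_row)
    print(f"{name}: TPR={tpr:.4f} FPR={fpr:.4f} "
          f"precision={prec:.4f} accuracy={acc:.4f}")

# Made-up scores purely to exercise auc(); these are not results of this study.
print(auc([0.9, 0.8, 0.4], [0.7, 0.3, 0.2]))
```

Computing AUC needs per-sample classifier scores, such as predicted class probabilities, because it sweeps the decision threshold. A single confusion matrix fixes one threshold.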
Classifier       Actual class     Predicted Benign   Predicted Malware
                 Actual Malware   1288               8712
Naïve Bayes      Actual Benign    9135               865
                 Actual Malware   1929               8071
J48              Actual Benign    9095               905
                 Actual Malware   1259               8741
Decision Table   Actual Benign    9126               874
                 Actual Malware   1300               8700
MLP              Actual Benign    9100               900
                 Actual Malware   1276               8724

Table 5 presents the time, in seconds, taken to produce the results. The results show that Naïve Bayes has the lowest model complexity, as it takes the least time to build the model. Hence, achieving reasonable accuracy and effectiveness in classifying unknown malware helps to determine the performance of the classifiers.

After completing the experiments, the findings of the study show that machine learning is able to produce highly accurate detection using five types of machine learning classifiers. The results of previous research papers also appear to agree that machine learning provides the best prediction results. Based on this experiment, the results from analyzing the permission features of the dataset provide better
https://www.statista.com/statistics/266136/global-market-share-held-by-smartphone-operating-systems/
14. D. Palmer, “FalseGuide malware dupes 600,000 Android
users into joining botnet,” ZDNet, 2017.
15. S. Y. Yerima, S. Sezer, and G. McWilliams, “Analysis of
Bayesian classification-based approaches for Android
malware detection,” IET Inf. Secur., vol. 8, no. 1, pp.
25–36, 2014.
16. Z. Mas’Ud, S. Sahib, M. F. Abdollah, S. R. Selamat, and
R. Yusof, “Analysis of features selection and machine
learning classifier in android malware detection,” ICISA
2014 - 2014 5th Int. Conf. Inf. Sci. Appl., pp. 1–5, 2014.
17. D. A. Alotaibi, M. F. Aldakheel, N. S. Al-serhani, and R.
Zagrouba, “Machine learning based on malware detection in
mobile,” vol. 31, no. 3, pp. 505–511, 2019.
18. M. Kedziora, P. Gawin, and M. Szczepanik, “Android
Malware Detection Using Machine Learning And
Reverse Engineering,” no. 616, pp. 95–107, 2018.
19. Y. Mirsky, A. Shabtai, L. Rokach, B. Shapira, and Y.
Elovici, “SherLock vs moriarty: A smartphone dataset for
cybersecurity research,” AISec 2016 - Proc. 2016 ACM
Work. Artif. Intell. Secur. co-located with CCS 2016, pp.
1–12, 2016.
20. T. S. Chou, J. Pickard, and C. Popoviciu, “Machine
learning based IP network traffic classification using
feature significance analysis,” WMSCI 2018 - 22nd
World Multi-Conference Syst. Cyber Informatics, Proc.,
vol. 1, no. 3, pp. 1–3, 2018.
21. E. S. Team, “What is Machine Learning? A definition,”
Expert System, 2017.
22. N. S. Zaini et al., “Phishing detection system using
machine learning classifiers,” Indones. J. Electr. Eng.
Comput. Sci., vol. 17, no. 3, pp. 1165–1171, 2019.
23. M. F. A. Razak, N. B. Anuar, R. Salleh, A. Firdaus, M.
Faiz, and H. S. Alamri, “‘Less Give More’: Evaluate and
zoning Android applications,” Meas. J. Int. Meas.
Confed., vol. 133, pp. 396–411, 2019.
24. M. H. Kamarudin, C. Maple, T. Watson, and N. S. Safa,
“A LogitBoost-based Algorithm for Detecting Known
and Unknown Web Attacks,” IEEE Access, vol. 3536,
pp. 1–12, 2017.
25. O. V. Lee et al., “A malicious URLs detection system
using optimization and machine learning classifiers,”
Indones. J. Electr. Eng. Comput. Sci., vol. 17, no. 3, pp.
1210–1214, 2020.
26. N. S. Zaini, D. Stiawan, A. F. Mohd Faizal Ab Razak, S.
K. Wan Isni Sofiah Wan Din, and T. Sutikno, “Phishing
detection system using machine learning classifiers,”
Indones. J. Electr. Eng. Comput. Sci., vol. 17, no. 3, pp.
1165–1171, 2020.