
Hybrid Intelligent Android Malware Detection Using Evolving Support Vector Machine Based on Genetic Algorithm and Particle Swarm Optimization

This document discusses the development of hybrid intelligent approaches for Android malware detection using an evolving support vector machine optimized with genetic algorithms and particle swarm optimization. The fast growth of Android applications has led to an increase in malware, which can be difficult to detect with conventional signature-based methods. The proposed approaches aim to overcome limitations of single machine learning classifiers and other hybrid detection methods by integrating support vector machines with genetic algorithms and particle swarm optimization to improve detection accuracy, especially for newly developed malware.



Hybrid Intelligent Android Malware Detection Using Evolving Support Vector


Machine Based on Genetic Algorithm and Particle Swarm Optimization

Article · October 2019


1 author:

Waleed Ali


All content following this page was uploaded by Waleed Ali on 24 October 2019.



IJCSNS International Journal of Computer Science and Network Security, VOL.19 No.9, September 2019 15

Hybrid Intelligent Android Malware Detection Using Evolving Support Vector Machine Based on Genetic Algorithm and Particle Swarm Optimization
Waleed Ali

Information Technology Department, Faculty of Computing and Information Technology, King Abdulaziz University,
Rabigh, Kingdom of Saudi Arabia

Summary
The Android platform has become the most common platform for smart mobile devices, attracting many users, developers and vendors. Accordingly, millions of Android applications have been created to offer many functionalities and services to users. However, the fast growth rate of such applications has led to a huge increase in the development and spread of Android malware applications by cyber attackers and criminals. In order to overcome the difficulties faced by the conventional signature-based methods, this paper suggests hybrid intelligent Android malware detection approaches based on an evolving support vector machine with evolutionary algorithms. In the proposed hybrid intelligent evolving approaches, the optimization problem in the support vector machine is solved using a genetic algorithm (GA) and particle swarm optimization (PSO), referred to as Droid-HESVMGA and Droid-HESVMPSO respectively, in order to increase the accuracy of Android malware detection. The experimental results showed that the proposed Droid-HESVMGA and Droid-HESVMPSO approaches achieved the best detection results and substantially outperformed the most popular machine learning classifiers and other existing hybrid malware detection approaches.

Key words:
Android, Malware, Support vector machine, Genetic algorithm, Particle swarm optimization.

1. Introduction

Recently, smartphones and mobile devices have become the most commonly used devices for personal and business use. Recent reports and studies have reported that the number of mobile device users has been increasing rapidly and will reach 6.1 billion by 2020 [1, 2].

Over the past few years, the Android platform has become the most common mobile platform of smart mobile devices as it is free and open-source. In addition, it can be customized simply by users, developers and vendors. Accordingly, millions of Android applications have been developed to offer many functionalities and services to users. Recent reports have indicated that over 2.5 million Android apps are available on the Google Play store, which is considered the largest app store [3].

The fast growth rate of Android applications has led to a huge increase in the development and spread of Android malware applications by cyber attackers and criminals. Primarily, malware applications can be developed by Android app developers and then distributed in third-party Android markets, since there are often no constraints on Android app developers. Even at the official Google Play store, many new malware apps are periodically discovered and not all Android malware apps can be accurately detected, especially in the early stages of publication in the Google Android Market [4, 5].

Many commercial Android malware detection tools and anti-virus programs have used traditional signature-based methods, which are based on fixed identifiers called signatures [6-8], to detect Android malware apps. However, the conventional signature-based approaches cannot detect recently developed malware apps, especially zero-day Android malware apps [6-8]. Hence, there is a need to develop more effective and adaptive solutions for Android malware detection.

In order to overcome the limitations of the conventional Android malware detection approaches, numerous single popular machine learning algorithms [6, 7, 9-16] have been trained and then applied to detect Android malware apps. The support vector machine (SVM) algorithm has been commonly used in the literature for detecting malware apps, as it has many advantages over the other machine learning techniques. However, only the classical SVM has been employed in Android malware detection, although the classical SVM is still not good enough compared to the advanced machine learning classifiers. Furthermore, its learning phase can be quite time-consuming, as it is based on an analytical approach or complex mathematical calculations [17, 18].

In addition to single classifiers, ensemble learning methods [14, 19] and fusion approaches [8] for multiple machine learning classifiers have also been exploited in order to enhance the detection accuracy of Android malware apps. The single classifiers and ensemble methods have achieved better detection performance compared to traditional signature-based methods. However, the question: “Which

are the most effective machine learning techniques to maximize the performance of Android malware detection” is still a popular research subject [8, 20-22].

Although many intelligent approaches based on machine learning have become frequently utilized to detect Android malware apps in recent years, not much work has focused on hybrid intelligent Android malware detection approaches. Hybrid intelligent approaches can be utilized to produce promising solutions with a higher detection accuracy of Android malware apps, since they benefit from the advantages of the integrated algorithms and overcome their individual drawbacks.

Over the past few years, evolutionary algorithms have been effectively implemented in many recent applications and fields, including optimization, feature selection, pattern recognition, classification, and clustering. The genetic algorithm (GA) and particle swarm optimization (PSO) are the most well-known evolutionary algorithms used in complicated optimization problems. The GA is the most common evolutionary algorithm, based on imitation of biological evolution in chromosomes. The PSO is a well-known optimization and search algorithm inspired by the social and cooperative behavior of flocking birds, and belongs to the family of evolutionary algorithms.

In this paper, hybrid intelligent evolving approaches based on the support vector machine with evolutionary algorithms are proposed to enhance Android malware detection. The proposed methods integrate the attractive advantages of evolutionary algorithms and the good performance of the support vector machine in order to produce hybrid intelligent evolving Android malware detection approaches with outstanding performance. In these proposed approaches, the genetic algorithm (GA) and particle swarm optimization (PSO) are adopted to solve the optimization problem of the support vector machine, giving the so-called Droid-HESVMGA and Droid-HESVMPSO, in order to increase the accuracy of Android malware detection.

The remainder of the paper is arranged as follows. Section 2 reviews some existing works on intelligent Android malware detection based on static malware analysis. Section 3 describes the basics of Android malware applications, including Android Architecture, Android Malware, and Android Permissions. Section 4 provides an explanation of the support vector machine. The genetic algorithm and particle swarm optimization are described in Section 5. Section 6 presents the methodology of the proposed hybrid intelligent Android malware detection based on evolving SVM with GA and PSO. The experimental results of the proposed hybrid evolving intelligent Android malware detection are analyzed and discussed in Section 7. Finally, the conclusion of the proposed work is given in Section 8.

2. Related Work

In order to overcome the difficulties faced by the conventional signature-based methods, machine learning techniques have been employed to discriminate new malware from benign apps [8, 19, 20, 23]. In Android malware detection, the static analysis-based machine learning approach and the dynamic analysis-based machine learning approach are two popular approaches used to detect malware apps.

In the static analysis-based approach, the machine learning models are trained on static features of Android apps, which are extracted without installing or running these apps, in order to detect the malware applications. On the other hand, the dynamic analysis-based approach requires installing, running and then monitoring the dynamic behavior of the Android application to collect the dynamic features used to train the machine learning models.

The static analysis-based malware detection methods are easier, faster, and less resource-intensive [8, 20-22] compared with the dynamic analysis-based methods, since they do not require the Android applications to be installed or run. Therefore, this study focuses on intelligent malware detection methods based on static malware analysis.

In recent years, several intelligent methods using machine learning have been developed based on static features of Android apps in order to detect Android malware applications. Support vector machine (SVM) [10-12, 16], naive Bayes [11, 13], k-NN [7, 9, 13], neural network [11], decision tree [7, 12], random forest [7, 11, 12, 14, 16], regularized logistic regression [6], neuro-fuzzy [24], hybrid evolving neuro-fuzzy [20], deep belief networks [15, 23], ensemble learning method [14, 19], fusion approach [8] and other machine learning classifiers have been trained and constructed using static features for detecting Android malware applications. Some previous intelligent Android malware detection methods based on static malware analysis are summarized in Table 1.

By examining the existing works in Table 1, it can be observed that permissions, intents, and API calls were extracted and then used to train some popular conventional machine learning classifiers. In this study, instead of the classical machine learning classifiers used in the literature, alternative hybrid intelligent approaches based on an evolving support vector machine with a genetic algorithm and particle swarm optimization are suggested in order to enhance the performance of Android malware detection.

Table 1: Summary of the existing intelligent Android malware detection based on the static malware analysis

Approach | Features | Machine learning | Feature selection | Dataset source
DroidMat [9] | Permissions, intents, and API calls | k-means and k-NN | None | Google Play and Contagio Mobile
DREBIN [10] | Permissions, API calls and network addresses | SVM | None | Several markets, Google Play and Genome
Mining API calls and permissions for Android malware detection [13] | Permissions and API calls | Naive Bayes and k-NN | Correlation-based feature selection and information gain | Official, third party Android markets and Android malware Genome project
Static detection of Android malware by using permissions and API calls [11] | Permissions and API calls | Naive Bayes, SVM, MLP, random forest, and J48 | Information gain | Anzhi Market and Contagio Mobile
Exploring Permission-induced Risk in Android Applications for Malicious Application Detection [12] | Individual permission and group of collaborative permissions | SVM, decision trees, and random forest | Forward selection (SFS) and principal component analysis (PCA) | Google Play, Mal Zhou and VirusShare
A probabilistic discriminative model for Android malware detection with decompiled source code [6] | API calls and permissions | Regularized logistic regression | Information gain and Chi-square | Google Play and Genome
K-ANFIS [24] | Permission-based features | kEFCM-based Adaptive Neuro-Fuzzy Inference System | Information gain ratio | Google Play and Genome
High Accuracy Android Malware Detection Using Ensemble Learning [14] | Permissions and API call features | Random forest as an ensemble learning method | None | McAfee's internal repository
DroidDetector [15] | Static analysis-based features and dynamic analysis-based features | Deep belief networks | Frequency analysis-based feature evaluation | Google Play, Contagio Community and Genome Project
EHNFC [20] | Permission-based features | Hybrid neuro-fuzzy classifier with evolving clustering | Information gain ratio | Google Play and Genome Project
Identification of malicious Android app using manifest and opcode features [16] | Static features from the manifest and executable files | SVM, random forest, and rotation forests | Entropy based Category Coverage Difference and Weighted Mutual Information | Google Play store and Drebin dataset
DAPASA [7] | Utilizing sensitive subgraphs to construct five features depicting invocation patterns | Random forest, decision tree, k-NN, and PART | None | Google Play, Anzhi Market, Android Malware Genome Project and piggybacked families
Detecting Android malicious apps and categorizing benign apps with ensemble of classifiers [19] | 11 types of static features from each app to characterize the behaviors of the app | Ensemble of multiple classifiers | SVM was used to sort the weight of each feature | Markets in China called Anzhi and Wild
DroidFusion [8] | Permissions, API calls and intents | Fusion approach | Information gain | DREBIN and Malgenome project

Compared to the existing studies, the hybrid evolving support vector machine proposed for Android malware detection is more accurate and effective than the usual support vector machine and other classical machine learning classifiers used in the literature.

As in most of the existing intelligent approaches, the most important permission features were extracted and then used to train the proposed hybrid evolving support vector machine. Compared to intents and API call features, the permission features are the most common and are considered the first line of defense in the Android system. Furthermore, extracting the permission features and then applying them to machine learning models is easier, consumes fewer resources and requires a shorter time [20, 24]. Thus, the permission features are more suitable for training machine learning models in the mobile environment.

3. Android Malware Applications

3.1 Android Architecture

The Android platform was introduced by Google on September 23, 2008 based on a Linux kernel, and has become a leading mobile operating system. Android consists of four layers: the Linux kernel layer, a native library layer, an application framework layer, and the application layer [25, 26].

The most significant layer, which represents the core of the Android system, is the Linux kernel layer. This layer is responsible for managing the services and hardware functions. The native library layer deploys system libraries and the Android Dalvik virtual machine, which provides a variety of functionalities and the runtime environment for Android applications. The application framework layer is responsible for the Android APIs required to interact with the running apps and manage the essential functions. Finally, all Android functionalities and applications run in, and are provided to the end-user by, the application layer.

3.2 Android Malware

Malware is malicious software and refers to several forms of hostile or intrusive applications, which are intentionally developed to damage, disrupt, steal, or otherwise inflict some illegitimate and harmful actions on the computer or network. There are several common types of Android malware, depending on their purpose, such as worm, virus, trojan, adware, spyware, rootkit, backdoor, keylogger, ransomware and remote administration tools (RAT).

Basically, Android applications are Java code compiled by the Android SDK tools into an Android package (APK). The APK file is an archive file with a .apk suffix, which consists of the files and folders shown in Table 2 that are used by Android to install the application [27, 28].

Table 2: Components of Android package (APK)

Component | Description
AndroidManifest.xml | A meta-data XML file including information related to the application's descriptions, package information and security permissions.
classes.dex | A file that contains the code of an Android application, written in the Java programming language and compiled into .dex format (Dalvik Executable).
resources.arsc | A binary XML file that contains precompiled application resources.
Resources folder (res/) | A folder that includes non-pre-compiled resources that the application needs at runtime, such as pictures, layout, use of a database and data stored in the database, etc.
Assets (assets/) | An optional folder that contains application assets that can be retrieved by AssetManager.
Libraries (lib/) | An optional folder that contains compiled code that is specific to different processors, such as arm, mips, x86, etc.
META-INF | A folder that contains the MANIFEST.MF file, APK signature, etc.

According to the McAfee threat report in September 2018 [29], there were over 25 million mobile malware applications, as shown in Fig. 1. According to this report, over 2 million new mobile malware applications were detected during the second quarter of 2018 alone, mostly targeting Android, due to the vast distribution of Android devices as well as the relatively open system for the distribution of apps.

Fig. 1 The total number of mobile malware applications, mostly targeting Android [29]

Over the past few years, the fast growth rate of Android applications has led to a huge increase in developing and spreading Android malware applications by cyber attackers and criminals. Although numerous malware apps are frequently detected, other new Android malware apps cannot be accurately discovered in the early stages by third-party Android markets, or even by the official Market [4, 5].

Android malware apps can use different techniques in order to hijack the mobile device and access personal data. Primarily, repackaging, update attack and drive-by-download techniques are used to trick users into downloading the Android malware apps [27, 30]. In repackaging, some popular applications from legitimate sources are downloaded and disassembled by the malware developer. The malware developer can enclose the malware payload in these popular applications, and then resubmit them to official or other Android markets. Over 80% of Android malware is a repackaged application. In an update attack, the malware developer attaches only an update component to these popular applications, instead of attaching the whole payload to the application code. The update component can then download the entire malware payload at the app's runtime. The drive-by-download is a conventional social-engineering method implemented in the mobile devices field, through which the malware developer deceives the mobile user into installing interesting apps, which will perform other expected actions.

3.3 Android Permissions

In the Android platform, security is mainly based on a permission-based mechanism, which is utilized to protect Android users from undesirable activities of some Android apps, such as accessing sensitive user data, system resources, and other apps' data. The Android platform has more than 130 official permissions [30, 31, 32]. Some permissions are commonly requested by both malware and benign apps, while other permissions are rarely requested. At the installation time, some permissions are immediately granted by Android without user confirmation, while the

user’s approval is required with other permissions, according to the category of the permission requested, either normal or potentially dangerous permission.

The low-risk permissions are classified under the normal permission category. They are not particularly harmful and do not present any risk to the user's privacy or the device's operation; examples are INTERNET, ACCESS_NETWORK_STATE, and MODIFY_AUDIO_SETTINGS [20, 30]. On the other hand, the higher risk permissions could potentially affect the user's privacy, hardware, software or system. These high-risk permissions are categorized under the dangerous permission category. The malware apps are highly interested in requesting the dangerous permissions to gain the required privileges in order to access sensitive information. For example, READ_CONTACTS, WRITE_CONTACTS, CALL_PHONE, and SEND_SMS are four dangerous permissions, requiring the explicit approval of the user at the installation time [20, 30].

The permission request can be approved or rejected by the user without stopping the application, which will run with limited capabilities. In Android 6.0 or higher, the dangerous permissions must be granted by the user at runtime, in the case where the user is not notified of any app permissions at the installation time. Even if the dangerous permissions are granted by the user at the installation time, the user can enable and disable permissions one-by-one in system settings at runtime [33].

4. Support Vector Machine

The support vector machine (SVM) was introduced by Vapnik [34] and has become one of the most popular machine learning techniques. SVM has many advantages over others. The generalization ability of SVM can be maximized, since SVM is trained to maximize the margin. In addition, there is a global optimum solution in SVM training. Furthermore, SVM is robust to outliers, because the margin parameter C controls the misclassification error. Therefore, SVM has been successfully applied in many complex classification applications.

Consider a set of training data vectors $X = \{x_1, \ldots, x_n\}$, $x_i \in \mathbb{R}^d$, and a set of corresponding labels $Y = \{y_1, \ldots, y_n\}$, $y_i \in \{1, -1\}$. SVM aims to maximize the margin between the separating hyperplane and the closest instance in each class in order to obtain the ideal hyperplane between the two different classes. The hyperplane can be expressed as in Eq. (1).

$$(w \cdot x) + b = 0, \quad w \in \mathbb{R}^d, \ b \in \mathbb{R} \tag{1}$$

where the vector $w$ defines the boundary, $x$ is the input vector of dimension $d$, and $b$ is a scalar threshold.

The optimal hyperplane can be obtained as a solution to the following optimization problem.

$$\text{minimize} \quad \frac{1}{2}\|w\|^2 \tag{2}$$

$$\text{subject to} \quad y_i\big((w \cdot x_i) + b\big) \ge 1, \quad \forall i \tag{3}$$

In real-world applications, the data are usually influenced by outliers, which are affected by noise. The decision boundaries can be softened by introducing a positive slack variable $\xi_i$ for each training pattern. Eq. (4) is called the primal optimization of SVM.

$$\text{minimize} \quad \frac{1}{2}\|w\|^2 + C \sum_{i=1}^{n} \xi_i \tag{4}$$

$$\text{subject to} \quad y_i\big((w \cdot x_i) + b\big) \ge 1 - \xi_i, \quad \forall i \tag{5}$$

where $C$ is a positive regularization constant, which controls the degree of penalization of $\xi$. Therefore, $C$ controls allowable errors in the trained solution: high $C$ permits few errors while low $C$ allows a higher proportion of errors in the solution.

To solve the convex optimization problem, Lagrangian multipliers $\alpha_i$ are used to produce the dual optimization problem, as shown in Eq. (6), which must be solved in order to find a separating maximum margin hyperplane for a given set of data points.

$$\text{maximize} \quad \sum_{i=1}^{n} \alpha_i - \frac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n} y_i y_j \alpha_i \alpha_j (x_i \cdot x_j) \tag{6}$$

$$\text{subject to} \quad 0 \le \alpha_i \le C \quad \text{for all } i = 1, \ldots, n \tag{7}$$

$$\text{and} \quad \sum_{i=1}^{n} \alpha_i y_i = 0 \tag{8}$$

In most cases, the data points are not linearly separable. Thus, the SVM will transform the data to a higher-dimensional space and then classify them using the same principle as the linear case. A kernel function $K(x_i, x_j)$ is used to perform this transformation and the dot product in a single step. Thus, the final dual optimization problem using the kernel function can be expressed using Eq. (9) to find a separating maximum margin hyperplane for non-separable data points.

$$\text{maximize} \quad \sum_{i=1}^{n} \alpha_i - \frac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n} y_i y_j \alpha_i \alpha_j K(x_i, x_j) \tag{9}$$

$$\text{subject to} \quad 0 \le \alpha_i \le C \quad \text{for all } i = 1, \ldots, n \tag{10}$$
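As a rough illustration of the kernel dual in Eq. (9), the sketch below evaluates the dual objective for one candidate vector of multipliers and checks the box constraint together with the equality constraint on the multipliers (the same form as Eq. (8)). It is only a minimal sketch: the RBF kernel, the value of `gamma`, the toy 0/1 feature vectors and the function names are assumptions made for illustration and are not taken from the paper.

```python
import math

def rbf_kernel(x, z, gamma=0.5):
    """RBF kernel K(x, z) = exp(-gamma * ||x - z||^2); gamma is an assumed setting."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, z)))

def dual_objective(alpha, X, y, kernel=rbf_kernel):
    """Value of the dual objective in Eq. (9) for a candidate alpha."""
    n = len(alpha)
    linear = sum(alpha)
    quad = sum(
        y[i] * y[j] * alpha[i] * alpha[j] * kernel(X[i], X[j])
        for i in range(n)
        for j in range(n)
    )
    return linear - 0.5 * quad

def is_feasible(alpha, y, C, tol=1e-9):
    """Check 0 <= alpha_i <= C and sum(alpha_i * y_i) = 0 (up to tol)."""
    in_box = all(-tol <= a <= C + tol for a in alpha)
    balanced = abs(sum(a * t for a, t in zip(alpha, y))) <= tol
    return in_box and balanced

# Toy data (hypothetical): four apps described by three 0/1 features.
X = [[1, 0, 1], [1, 1, 0], [0, 0, 1], [0, 1, 0]]
y = [1, 1, -1, -1]
alpha = [0.5, 0.5, 0.5, 0.5]
print(is_feasible(alpha, y, C=1.0), dual_objective(alpha, X, y))
```

A search method that proposes candidate multipliers could, under these assumptions, score each feasible candidate with `dual_objective` and keep the one with the largest value.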

The kernel dual problem of Eq. (9) is further subject to the equality constraint

$$\sum_{i=1}^{n} \alpha_i y_i = 0 \tag{11}$$

The dual optimization problem of SVM is usually solved by classical optimization methods such as Sequential Minimal Optimization (SMO) [35], Kernel Adatron (KA) [36, 37] and Quadratic Program (QP) [38]. However, these classical optimization methods are based on an analytical approach or complex mathematical calculations. Furthermore, their performances are modest compared to those of the evolutionary algorithms used in this paper.

5. Evolutionary Algorithms

In the past few years, evolutionary algorithms have become a very popular research topic and have been effectively employed in many applications and fields such as optimization, feature selection, pattern recognition, classification, and clustering. Evolutionary algorithms are a set of modern metaheuristic optimization algorithms based on the evolution of populations, which are primarily developed to solve complicated optimization problems. The most well-known evolutionary algorithms used in optimization problems are the genetic algorithm (GA) [39] and particle swarm optimization (PSO) [40].

5.1 Genetic Algorithm

The genetic algorithm [39] is the most common evolutionary algorithm based on the simulation of the biological evolution process in chromosomes. In other words, the genetic algorithm mimics the survival of the fittest among chromosomes of consecutive generations in order to solve a certain optimization problem. The genetic algorithm (GA) is commonly utilized to solve optimization problems with a large search space [41, 42].

In GA, all possible candidate solutions construct the search space or population of a specific optimization problem. A basic GA mainly implements the following four major steps:

i. Encoding of chromosomes: Each candidate solution represents a chromosome in a population, which is encoded with several genes. Each gene is a small part of a candidate solution, which can represent one parameter to be optimized.
ii. Initialization of the population: An initial population of chromosomes is randomly generated, which consists of the initial solutions of a specific optimization problem.
iii. Fitness evaluation: The GA computes the fitness value of each individual chromosome, which indicates the goodness of the solution provided by the individual chromosome. The chromosomes in the population are then evaluated using the fitness function. The fittest chromosomes will be given more opportunities to reproduce and evolve.
iv. Reproduction: As in biological evolution, a GA can recombine the fittest chromosomes to create new better chromosomes and solutions. The reproduction process is conducted through three genetic operators: selection, crossover, and mutation.
• Selection: The better chromosomes are selected based on the fitness values to become parents to produce new chromosomes (offspring).
• Crossover: In the crossover operator, GA randomly chooses a crossover point, where two parent chromosomes break, and then exchanges the chromosome parts after that point in order to create new offspring.
• Mutation: The mutation operator changes the gene value in some randomly chosen location of the chromosome.

Some selected chromosomes are iteratively evolved to produce a new generation of new better solutions. The reproduction and fitness evaluation are repeated until the termination criterion is satisfied.

5.2 Particle Swarm Optimization

The particle swarm optimization algorithm (PSO) is a common population-based optimization algorithm tied to evolutionary computation, which was introduced by Kennedy and Eberhart [40]. PSO is a simpler and faster evolutionary algorithm and has fewer parameters compared to GA. Therefore, PSO has been widely applied in many problems and areas such as optimization, feature selection, pattern recognition, classification and clustering [42-46].

Unlike the chromosome's evolution in GA, PSO is inspired by the social behavior of flocking birds, which interact and cooperate to find food. Like evolutionary algorithms, a PSO population (called a swarm) consists of candidate solutions or individuals (called particles) which are randomly initialized. Each particle then moves in the search space with a velocity $v$ in order to find the optimal solution. The particles learn over time based on their own experience and the experience of the other particles in the swarm.

Let $x_i = (x_{i1}, x_{i2}, x_{i3}, \ldots, x_{iD})$ be the current position of particle $i$, and $v_i = (v_{i1}, v_{i2}, v_{i3}, \ldots, v_{iD})$ be the velocity of particle $i$, where $D$ is the dimensionality of the search space. To find the best solution in PSO, each particle changes its velocity, as shown in Eq. (12) and Eq. (13), according to pbest and gbest, which represent the best previous position of a particle (personal best position) and the best position obtained by the whole population (global best position), respectively.
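The velocity and position updates that the text refers to as Eq. (12) and Eq. (13) can be sketched in a few lines. The following is a minimal, hedged sketch rather than the paper's implementation: the function name `pso_minimize`, the inertia weight `w=0.7`, the swarm size, the clamping of positions to the search bounds, and the sphere test function are all assumptions chosen for illustration. Only the acceleration values of 2.0 follow the common setting mentioned in the text.

```python
import random

def pso_minimize(f, dim, bounds, n_particles=20, iters=100,
                 w=0.7, c1=2.0, c2=2.0, seed=0):
    """Minimize f over a box using a basic PSO loop (assumed settings)."""
    rng = random.Random(seed)
    lo, hi = bounds
    # Random initial positions, zero initial velocities.
    pos = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_val = [f(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]
    for _ in range(iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Velocity update, Eq. (13).
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                # Position update, Eq. (12), clamped to the bounds (an assumption).
                pos[i][d] = min(hi, max(lo, pos[i][d] + vel[i][d]))
            val = f(pos[i])
            if val < pbest_val[i]:
                # New personal best; gbest is refreshed as soon as it improves.
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val

def sphere(p):
    """Toy objective with its minimum at the origin."""
    return sum(v * v for v in p)

best, best_val = pso_minimize(sphere, dim=2, bounds=(-5.0, 5.0))
print(best, best_val)
```

In a setting like the one the paper describes, a particle would encode a candidate solution of the SVM optimization problem and `f` would score it; that mapping is left out here because it depends on details outside this section.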

$$x_{id}^{t+1} = x_{id}^{t} + v_{id}^{t+1} \qquad (12)$$

$$v_{id}^{t+1} = w \, v_{id}^{t} + c_1 r_1 \left(p_{id} - x_{id}^{t}\right) + c_2 r_2 \left(p_{gd} - x_{id}^{t}\right) \qquad (13)$$

where $d = 1, 2, 3, \ldots, D$, $t$ represents the $t$-th iteration, $p_{id}$ and $p_{gd}$ denote the pbest and gbest, $w$ is the inertia weight, $c_1$ and $c_2$ are acceleration parameters which are commonly set to 2.0, and $r_1$ and $r_2$ are random values in the range [0, 1].

6. Methodology

This section describes the methodology of the proposed hybrid intelligent Android malware detection approaches using evolving support vector machine (SVM) based on genetic algorithm (GA) and particle swarm optimization (PSO). In the proposed approaches, called Droid-HESVMGA and Droid-HESVMPSO, GA and PSO are adopted to solve the dual optimization problem in SVM in order to increase the accuracy of Android malware detection. Fig. 2 shows the methodology of the hybrid evolving support vector machine approach suggested for Android malware detection.

6.1 Collection of Malware and Benign Apps

Numerous researchers have collected and analyzed many malware and benign apps from several sources such as Google Play store [47], Genome [48], Contagiodump [49], VirusTotal [50], MalShare [51], VirusShare [52], and theZoo [53].
In this study, we used the same dataset as that used in [20], which consists of 500 malware and benign apps, in order to evaluate the performance of the proposed methods. In this dataset, 250 benign apps and 250 malware apps were collected from the official Google Play store [54] and Genome [55], respectively, which are the most common sources of benign and malware apps. These apps have many permission features that can be used as input features to train and test the proposed hybrid intelligent Android malware detection.

6.2 Feature Extraction

The extraction and selection of the important features of Android apps play an extremely important role in distinguishing malware from benign apps. In this step, some popular permission features are extracted from these Android apps, since permissions are among the most significant features that can be utilized in Android malware detection.
In the development phase of an Android app, the developers must declare all the permissions required to access system resources using <uses-permission> tags in the AndroidManifest.xml [31], as shown in Fig. 3.
As can be seen from Fig. 3, AndroidManifest.xml is an XML file that includes the permission features of Android applications. In this study, Apktool was used to decompress the Android application package (APK file) and then extract the AndroidManifest.xml in order to obtain the permission features. Only 50 frequently requested permissions were collected from Android applications to be used as input features to train and test the proposed hybrid Droid-HESVMGA and Droid-HESVMPSO.
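As a minimal sketch of the permission extraction step, the snippet below reads the `<uses-permission>` entries from a manifest that has already been decoded to plain-text XML (for example by Apktool). The function name `requested_permissions` and the `SAMPLE_MANIFEST` string are hypothetical and serve only as illustration; they are not taken from the paper.

```python
import xml.etree.ElementTree as ET

# XML namespace used by the android: attribute prefix in AndroidManifest.xml.
ANDROID_NS = "http://schemas.android.com/apk/res/android"

# Hypothetical decoded manifest, used only for illustration.
SAMPLE_MANIFEST = """
<manifest xmlns:android="http://schemas.android.com/apk/res/android"
          package="com.example.app">
    <uses-permission android:name="android.permission.INTERNET" />
    <uses-permission android:name="android.permission.SEND_SMS" />
    <application android:label="Example" />
</manifest>
""".strip()


def requested_permissions(manifest_xml):
    """Return the set of permission names declared via <uses-permission> tags."""
    root = ET.fromstring(manifest_xml)
    permissions = set()
    for node in root.iter("uses-permission"):
        name = node.get("{%s}name" % ANDROID_NS)
        if name:
            permissions.add(name)
    return permissions


print(sorted(requested_permissions(SAMPLE_MANIFEST)))
```

The binary manifest inside an APK cannot be parsed this way directly; the sketch assumes the decoding step described in the text has already been performed.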

Fig. 2 A methodology of the proposed hybrid intelligent android malware detection using evolving SVM based on GA and PSO
Fig. 3 The permissions declared to access system resources in AndroidManifest.xml of Android apps

6.3 Dataset Preparation

In this phase, the permission features of Android apps are converted into numerical form to be effectively used to train and construct the proposed hybrid Droid-HESVMGA and Droid-HESVMPSO. Initially, a base vector is prepared, which includes a set of the frequently requested permissions that can be effectively utilized in Android malware detection. For each Android app, a binary vector of permission features is then created as an instance of the base vector.
Each Android app represents a single training pattern, which is encoded with a binary vector of permission features and a class label that indicates whether the Android app is benign or malware. A permission feature is assigned a binary value based on whether it is requested by the Android app: it is represented by 1 in the binary vector if it is requested by the app, and by 0 otherwise. Furthermore, the last value in the binary vector represents the class of the Android app, either a malware or a benign app.

6.4 Feature Selection

In the mobile environment, feature selection is a vital step used to remove redundant and irrelevant permission features that can produce noisy data, causing a negative impact on the performance of intelligent classifiers. Therefore, a feature selection method should be used to identify the most significant permissions that can be effective in distinguishing malware from benign apps.
Generally, there are two primary feature selection approaches used in data mining: the filter approach and the wrapper approach. The methods under the filter approach are simpler and faster than the methods of the wrapper approach, since they analyze and evaluate the features without training the classifiers. Hence, the feature selection methods under the filter approach are more suitable for the mobile environment.
In this study, one of the most common filter-based methods, known as information gain ratio, is applied to select highly significant permission features of Android apps.
Eq. (14) computes the information gain, Gain(S, A), of a feature A relative to a collection of examples S.

$$\mathrm{Gain}(S, A) = \mathrm{Entropy}(S) - \sum_{v \in \mathrm{Value}(A)} \frac{|S_v|}{|S|} \, \mathrm{Entropy}(S_v) \qquad (14)$$

where Value(A) is the set of all possible values for a feature A, and $S_v$ is the subset of S for which feature A has value v. Entropy(S) is defined as Eq. (15):

$$\mathrm{Entropy}(S) = - \sum_{j=1}^{|C|} \Pr(c_j) \log_2 \Pr(c_j) \qquad (15)$$

where $\Pr(c_j)$ denotes the probability of class $c_j$ in S. It is the number of examples of class $c_j$ in S divided by the total number of examples in S.
The information gain ratio is an enhancement of the information gain that decreases its bias toward high-branch attributes. The information gain ratio is employed in feature selection to achieve better performance. In the information gain ratio, Eq. (16) is used in order to evaluate the features:

$$\mathrm{Gain\ ratio}(S, A) = \frac{\mathrm{Gain}(S, A)}{\mathrm{Split\ information}(S, A)} \qquad (16)$$

where Split information(S, A) is computed using Eq. (17):

$$\mathrm{Split\ information}(S, A) = - \sum_{i=1}^{k} \frac{|S_i|}{|S|} \log_2 \frac{|S_i|}{|S|} \qquad (17)$$

where $S_1$ through $S_k$ are the k subsets of examples resulting from partitioning S by the k-valued feature A.
In the proposed hybrid Droid-HESVMGA and Droid-HESVMPSO, only the best 25 permission features that have a high impact on Android malware detection are selected using the information gain ratio (IGR), in order to contribute toward enhancing the performance of the evolving support vector machine classifiers suggested to detect the Android malware apps.

6.5 Training of Droid-HESVMGA and Droid-HESVMPSO

In this phase, the proposed Droid-HESVMGA and Droid-HESVMPSO are trained using the prepared training dataset with the Android permission features selected by information gain ratio. The significant permission features are used as input features of Droid-HESVMGA and Droid-HESVMPSO, which are trained in order to classify the Android apps into two classes, either malware or benign apps.
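The binary encoding of Section 6.3 and the information gain ratio of Eqs. (14) to (17) can be sketched as follows. This is an illustrative reading of the formulas, not the authors' implementation; the helper names (`to_binary_vector`, `gain_ratio`, `select_top_k`) are assumptions introduced here, and returning 0 for a constant feature is a convention chosen to avoid division by zero.

```python
import math
from collections import Counter


def to_binary_vector(requested, base_vector):
    """Encode an app as 1/0 flags over the base vector of permissions."""
    return [1 if perm in requested else 0 for perm in base_vector]


def entropy(labels):
    """Eq. (15): entropy of the class labels in S."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def gain_ratio(feature_values, labels):
    """Eqs. (14), (16) and (17) for a single feature column."""
    total = len(labels)
    subsets = {}
    for value, label in zip(feature_values, labels):
        subsets.setdefault(value, []).append(label)
    # Eq. (14): entropy of S minus the weighted entropy of each subset S_v.
    gain = entropy(labels) - sum(len(s) / total * entropy(s) for s in subsets.values())
    # Eq. (17): split information of the partition induced by the feature.
    split_info = -sum(len(s) / total * math.log2(len(s) / total) for s in subsets.values())
    # Eq. (16); a constant feature has zero split information (assumed score 0).
    return gain / split_info if split_info > 0 else 0.0


def select_top_k(rows, labels, k):
    """Return the indices of the k features with the highest gain ratio."""
    n_features = len(rows[0])
    scores = [gain_ratio([row[j] for row in rows], labels) for j in range(n_features)]
    return sorted(range(n_features), key=lambda j: scores[j], reverse=True)[:k]
```

With `k=25` and one row per app, `select_top_k` would mirror the selection of the best 25 permissions described in the text.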
Unlike the conventional SVM, the GA and PSO are used in the proposed Droid-HESVMGA and Droid-HESVMPSO to solve the dual optimization problem in the support vector machine in order to increase the accuracy of Android malware detection.
In the proposed Droid-HESVMGA and Droid-HESVMPSO, each candidate solution or individual is represented by a vector and denoted as a chromosome in the GA population or a particle in the PSO swarm. The GA chromosome and the position of each PSO particle are represented by a vector of real values, in which each value represents the value of the Lagrange multiplier for a training example, as shown in Fig. 4. Fig. 4 illustrates an example of encoding the Lagrange multipliers vector in a GA chromosome and PSO particle in the proposed Droid-HESVMGA and Droid-HESVMPSO.

$y$: $y_1, y_2, y_3, \ldots, y_{n-1}, y_n$
$\alpha$: $\alpha_1, \alpha_2, \alpha_3, \ldots, \alpha_{n-1}, \alpha_n$

Fig. 4 Encoding of Lagrange multipliers vector in GA chromosome and PSO particle

In the evolutionary algorithms, once the candidate solutions are encoded, the fitness function is used to evaluate the candidate solutions or individuals. In order to find a separating maximum margin hyperplane of SVM for a given set of data points, Eq. (9) is used as the fitness function in the proposed Droid-HESVMGA and Droid-HESVMPSO to evaluate the GA chromosomes and PSO positions.
In the proposed Droid-HESVMGA, an initial population of chromosomes is randomly generated, which represent the values of Lagrange multipliers for training patterns. The chromosomes' performances are then computed and evaluated by the fitness function shown in Eq. (9). The GA will stop the search and return the optimal vector of Lagrange multipliers if a sufficiently good fitness or the maximum number of generations is reached. Otherwise, the GA implements selection, crossover, and mutation to produce a new generation of chromosomes in order to find the optimal vector of Lagrange multipliers that can maximize the performance of SVM. The fittest chromosomes are the most appropriate candidates for mating to produce a new generation. Crossover and mutation are then employed to produce child chromosomes, used as alternative chromosomes to their parent chromosomes in the GA population. The parent chromosomes are chosen to exchange chromosome genes using the crossover process to offer a child chromosome with genetic material from both parents. In GA mutation, a gene in the child chromosome can be changed to a random value between 0 and C in the proposed Droid-HESVMGA.
In the proposed Droid-HESVMPSO, an initial swarm of particles is randomly generated; each particle represents the values of the Lagrange multipliers for the training examples. Each particle's fitness is then computed using Eq. (9) and evaluated accordingly. The PSO fitness function aims at finding a separating maximum margin hyperplane for the given training examples. If the current particle fitness is better than the best fitness of that particle (pbest), then pbest is updated to the current particle fitness. The global best fitness (gbest) is then updated to the particle with the best fitness value of all the particles. If the stopping criteria (sufficiently good fitness or maximum iterations) are met, the PSO will terminate the search and return the optimal values of the Lagrange multipliers. Otherwise, the pbest and gbest are utilized to update the velocity and position of every particle using Eq. (12) and Eq. (13). This process is repeated until the stop conditions are met.
After solving the optimization problem and obtaining the Lagrange multipliers by using the GA and PSO, the proposed Droid-HESVMGA and Droid-HESVMPSO can be used in Android malware detection. The proposed Droid-HESVMGA and Droid-HESVMPSO use the decision function in Eq. (18) to classify each input vector x into the positive or negative class. In Android malware detection, the positive class refers to the malware apps, while the negative class represents the benign apps.

$$y(x) = \mathrm{sgn}\left( \sum_{i=1}^{n} \alpha_i y_i K(x_i, x) + b \right) \qquad (18)$$

In the proposed Droid-HESVMGA and Droid-HESVMPSO, the radial basis function (RBF) defined as Eq. (19) was used as the kernel function $K(x_i, x)$, since it achieved a better performance in many applications compared to other kernel functions. The parameter $\gamma$ represents the width of the RBF.

$$K(x_i, x) = \exp\left( -\gamma \lVert x_i - x \rVert^2 \right), \quad \gamma > 0 \qquad (19)$$

7. Analysis and Discussion of Results

7.1 Dataset Collection and Preparation

In this study, the dataset with 500 Android apps used by [20] was adopted in our experiments in order to train and evaluate the proposed hybrid intelligent Android malware detection based on the evolving support vector machine: Droid-HESVMGA and Droid-HESVMPSO. In this dataset, 250 benign apps were collected from the official Google Play [54] while 250 malware apps were collected from Genome [55], which is commonly used in the literature to collect malware apps.
In order to prepare the training dataset, the permission features of these Android apps were extracted and
converted to binary forms based on whether the permission feature is requested or not by the Android apps. Accordingly, the best 25 permission features were selected by using information gain ratio in order to help in improving the performance of the proposed hybrid intelligent Android malware detection method based on evolving support vector machine.

7.2 Evaluation Measures

In order to evaluate the proposed methods, Droid-HESVMGA and Droid-HESVMPSO were trained and evaluated using 5-fold cross-validation.
In our experiments, we used five popular metrics, which are commonly used in the literature for detecting malware apps, to evaluate the performance of the proposed Droid-HESVMGA and Droid-HESVMPSO. Correct classification rate, true positive rate, false positive rate, false negative rate and area under ROC curve were calculated in order to judge the effectiveness of the proposed Droid-HESVMGA and Droid-HESVMPSO. The correct classification rate (CCR) is the rate of malware and benign apps that are correctly classified with respect to all Android apps. True positive rate (TPR) is the rate of malware apps classified as malware out of total malware apps. False positive rate (FPR) is the rate of benign apps classified as malware out of total benign apps. False negative rate (FNR) is the rate of malware apps classified as benign out of total malware apps. The area under ROC curve (AUC) is a measure used to evaluate the trade-off between TPR and FPR.
Table 3 shows the measures used to evaluate the performance of the proposed evolving support vector machine classifiers in Android malware detection. In Table 3, true positive (TP) is the number of correctly classified malware apps, false negative (FN) is the number of incorrectly classified malware apps, true negative (TN) is the number of correctly classified benign apps, and false positive (FP) is the number of incorrectly classified benign apps.

7.3 Comparison Against Popular Machine Learning Classifiers

In this study, the proposed Droid-HESVMGA and Droid-HESVMPSO were trained and compared with two common implementations of SVM, known as LibSVM [56] and mySVM [57], which use classical optimization techniques to solve the quadratic programming problem. In all SVMs, RBF was used as the kernel function while the best parameters C (margin softness) and γ (RBF width) were obtained by using a grid search algorithm in order to achieve the best performance for Android malware detection. In addition, the proposed Droid-HESVMGA and Droid-HESVMPSO were compared with four other machine learning classifiers commonly used in the literature to detect Android malware applications: back-propagation neural network (BPNN), naïve Bayes classifier (NB), random forest (RF), and k-nearest neighbour (kNN).
In the proposed Droid-HESVMGA and Droid-HESVMPSO, it was found by trial and error that the parameter settings of the GA and PSO shown in Tables 4 and 5 produced good results.

Table 3: The performance measures used to evaluate the proposed methods

Measure name | Formula (%) | Description
Correct classification rate (CCR) | $CCR = \frac{TP + TN}{TP + FP + FN + TN} \times 100$ | The rate of malware and benign apps correctly classified with respect to all the apps.
True positive rate (TPR) | $TPR = \frac{TP}{TP + FN} \times 100$ | Rate of malware apps classified as malware out of total malware apps.
False positive rate (FPR) | $FPR = \frac{FP}{FP + TN} \times 100$ | Rate of benign apps classified as malware out of total benign apps.
False negative rate (FNR) | $FNR = \frac{FN}{FN + TP} \times 100$ | Rate of malware apps classified as benign out of total malware apps.
Area under ROC curve (AUC) | $AUC = \frac{1 + TPR - FPR}{2} \times 100$ | This measures the trade-off between TPR and FPR.
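The formulas of Table 3 can be checked with a short sketch. The function name `evaluation_measures` and the confusion counts in the usage line are hypothetical and chosen only to make the arithmetic easy to follow. TPR and FPR enter the AUC formula as fractions before the result is scaled to a percentage, which is one reading of the table.

```python
def evaluation_measures(tp, fn, tn, fp):
    """Compute the five measures of Table 3 (as percentages) from confusion counts."""
    tpr = tp / (tp + fn)   # malware correctly flagged
    fpr = fp / (fp + tn)   # benign apps wrongly flagged
    return {
        "CCR": 100 * (tp + tn) / (tp + fp + fn + tn),
        "TPR": 100 * tpr,
        "FPR": 100 * fpr,
        "FNR": 100 * fn / (fn + tp),
        "AUC": 100 * (1 + tpr - fpr) / 2,
    }


# Hypothetical counts: 100 malware apps and 100 benign apps.
print(evaluation_measures(tp=90, fn=10, tn=80, fp=20))
```

For these counts the sketch gives CCR 85, TPR 90, FPR 20, FNR 10 and AUC 85, up to floating-point rounding. By construction TPR and FNR always sum to 100.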

Table 4: Parameters settings of GA used in the proposed Droid-HESVMGA

Parameter | Value
Population size | 20
Maximum generation | 1000
Crossover probability | 0.9
Mutation type | Switching mutation
Selection scheme | Tournament (0.75)

Table 5: Parameters settings of PSO used in the proposed Droid-HESVMPSO

Parameter | Value
Number of particles | 20
Maximum iterations (generations) | 1000
C1 | 2
C2 | 2
Stop condition | maximum number of iterations
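To make the GA-based training of Section 6.5 concrete, the following is a minimal sketch under several loudly stated assumptions. Eq. (9) is not reproduced in this part of the paper, so the fitness is assumed to be the standard SVM dual objective $\sum_i \alpha_i - \frac{1}{2}\sum_{i,j} \alpha_i \alpha_j y_i y_j K(x_i, x_j)$ with only the box constraint $0 \le \alpha_i \le C$ enforced. Binary tournament selection, one-point crossover, the mutation rate `p_mut`, elitism, and a bias of `b = 0` are illustrative choices and not the authors' exact operators. The population size (20) and crossover probability (0.9) follow Table 4.

```python
import math
import random


def rbf(a, b, gamma):
    """Eq. (19): RBF kernel between two feature vectors."""
    return math.exp(-gamma * sum((p - q) ** 2 for p, q in zip(a, b)))


def dual_objective(alpha, y, K):
    """Assumed fitness: the standard SVM dual objective (to be maximised)."""
    n = len(alpha)
    quad = sum(alpha[i] * alpha[j] * y[i] * y[j] * K[i][j]
               for i in range(n) for j in range(n))
    return sum(alpha) - 0.5 * quad


def evolve_alpha(X, y, C=1.0, gamma=0.5, pop_size=20, generations=200,
                 p_cross=0.9, p_mut=0.1, seed=0):
    """Evolve a vector of Lagrange multipliers, one gene per training example."""
    rng = random.Random(seed)
    n = len(X)
    K = [[rbf(a, b, gamma) for b in X] for a in X]

    def fitness(alpha):
        return dual_objective(alpha, y, K)

    pop = [[rng.uniform(0, C) for _ in range(n)] for _ in range(pop_size)]
    best = max(pop, key=fitness)
    for _ in range(generations):
        children = [best[:]]  # elitism (assumption): keep the best chromosome
        while len(children) < pop_size:
            p1 = max(rng.sample(pop, 2), key=fitness)  # binary tournament
            p2 = max(rng.sample(pop, 2), key=fitness)
            child = p1[:]
            if rng.random() < p_cross:
                cut = rng.randrange(1, n)              # one-point crossover
                child = p1[:cut] + p2[cut:]
            for g in range(n):
                if rng.random() < p_mut:
                    child[g] = rng.uniform(0, C)       # gene reset within [0, C]
            children.append(child)
        pop = children
        best = max(pop, key=fitness)
    return best


def classify(x, X, y, alpha, gamma, b=0.0):
    """Eq. (18): sign of the kernel expansion (+1 malware, -1 benign)."""
    s = sum(a * yi * rbf(xi, x, gamma) for a, yi, xi in zip(alpha, y, X)) + b
    return 1 if s >= 0 else -1
```

On a toy set such as `X = [[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5]]` with `y = [-1, -1, -1, 1, 1, 1]`, the evolved multipliers separate the two clusters. A PSO variant would keep the same fitness and replace the genetic operators with the position and velocity updates of Eq. (12) and Eq. (13).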

Table 6 shows the performance in terms of CCR, TPR, FPR, FNR, and AUC for the proposed Droid-HESVMGA and Droid-HESVMPSO against other machine learning
classifiers used in Android malware detection. It is clear from Table 6 that the proposed Droid-HESVMGA and Droid-HESVMPSO outperformed BPNN, NB, RF, kNN, mySVM, and LibSVM in most of the performance measures.

Table 6: Comparison of the proposed Droid-HESVMGA and Droid-HESVMPSO against popular machine learning classifiers used in Android malware detection

Classifier | CCR | TPR | FPR | FNR | AUC
BPNN | 87.80 | 91.20 | 15.60 | 8.80 | 95.40
NB | 74.80 | 99.20 | 49.60 | 0.80 | 70.60
RF | 88.80 | 89.60 | 12.00 | 10.40 | 97.30
kNN | 86.80 | 77.60 | 4.00 | 22.40 | 95.70
mySVM | 82.00 | 85.60 | 21.60 | 14.40 | 96.10
LibSVM | 88.20 | 91.60 | 15.20 | 8.40 | 96.20
Proposed Droid-HESVMGA | 95.60 | 98.00 | 6.80 | 2.00 | 96.90
Proposed Droid-HESVMPSO | 94.80 | 98.00 | 8.40 | 2.00 | 96.00

As can be observed from Table 6, the proposed Droid-HESVMGA and Droid-HESVMPSO achieved much better CCR than the other machine learning classifiers used in Android malware detection. In particular, the proposed Droid-HESVMGA produced the highest CCR (95.60%), followed by Droid-HESVMPSO (94.80%). This indicates that the proposed Droid-HESVMGA and Droid-HESVMPSO were able to correctly detect both malware and benign apps with respect to all the Android apps.
In terms of other measures, the results shown in Table 6 demonstrate that the proposed Droid-HESVMGA and Droid-HESVMPSO also achieved better performance in both TPR and FPR compared to most of the other machine learning classifiers used in Android malware detection. There is a trade-off between TPR and FPR. Therefore, a balanced performance between TPR and FPR should be provided in a good malware detection system.
Although the highest TPR was accomplished by NB, NB also produced the poorest FPR among the machine learning classifiers used in this study. This was because NB produced unbalanced detection between malware and benign apps, which negatively affected the overall accuracy and AUC of NB in Android malware detection. On the other hand, the proposed Droid-HESVMGA and Droid-HESVMPSO produced balanced detection performance between the positive and negative classes in both TPR and FPR. This indicates that the proposed Droid-HESVMGA and Droid-HESVMPSO were able to precisely detect both malware and benign apps. Consequently, the proposed Droid-HESVMGA and Droid-HESVMPSO achieved better overall performance across CCR, TPR, FPR and AUC than the other machine learning classifiers.
In addition, Table 6 presents the performance in terms of FNR for the proposed Droid-HESVMGA and Droid-HESVMPSO compared to other machine learning classifiers. FNR is also an important measure in Android malware detection, since it calculates the rate of malware apps classified as benign out of total malware apps. It can be seen in Table 6 that the proposed Droid-HESVMGA and Droid-HESVMPSO achieved lower FNR compared to most of the other machine learning classifiers used in Android malware detection. Only 2% of malware apps were incorrectly classified as benign apps by the proposed Droid-HESVMGA and Droid-HESVMPSO.

7.4 Comparison Against Other Hybrid Android Malware Detection Works

In this section, the proposed Droid-HESVMGA and Droid-HESVMPSO were compared with other existing hybrid malware detection approaches, which combined several algorithms into classifiers to enhance the performance of malware detection. The proposed Droid-HESVMGA and Droid-HESVMPSO were compared to the following previous works: evolving hybrid neuro-fuzzy classifier (EHNFC) [20], dynamic evolving neural-fuzzy inference system (DENFIS) [20, 58] and adaptive neuro-fuzzy inference system with triangular membership function (TRIMF–ANFIS) [20]. For a fair comparison, the proposed Droid-HESVMGA and Droid-HESVMPSO were trained and then evaluated using the same dataset used in these previous works.
The results in Table 7 show the overall classification accuracy (CCR), TPR, FPR, FNR and AUC for the proposed Droid-HESVMGA and Droid-HESVMPSO compared to those of EHNFC, DENFIS, and TRIMF–ANFIS.

Table 7: Comparison of the proposed Droid-HESVMGA and Droid-HESVMPSO against other hybrid Android malware detection works

Classifier | CCR | TPR | FPR | FNR | AUC
EHNFC | 90.00 | 88.24 | 5.00 | 5.00 | 95.00
DENFIS | 82.20 | 87.50 | 19.05 | 12.50 | 92.20
TRIMF–ANFIS | 88.00 | 78.95 | 11.11 | 21.05 | 93.00
Proposed Droid-HESVMGA | 95.60 | 98.00 | 6.80 | 2.00 | 96.90
Proposed Droid-HESVMPSO | 94.80 | 98.00 | 8.40 | 2.00 | 96.00

In terms of CCR, the results in Table 7 show that the proposed Droid-HESVMGA accomplished the highest accuracy (95.60%), followed by the proposed Droid-HESVMPSO (94.80%), EHNFC (90.00%), TRIMF–ANFIS (88.00%), and DENFIS (82.20%).
In terms of TPR, FPR, and FNR, the proposed Droid-HESVMGA and Droid-HESVMPSO achieved much better TPR than EHNFC, DENFIS, and TRIMF–ANFIS. Furthermore, the lowest FNR (only 2.00%) was accomplished by the proposed Droid-HESVMGA and Droid-HESVMPSO. Meanwhile, the proposed Droid-HESVMGA and Droid-HESVMPSO produced lower FPR compared to the FPRs obtained by DENFIS and TRIMF–ANFIS. This was primarily due to the capability of the proposed Droid-HESVMGA and Droid-HESVMPSO to successfully detect both malware and benign apps. On the
other hand, EHNFC showed unbalanced performance between malware and benign apps, since it produced lower TPR and the best FPR among the malware detection approaches.
The high TPR and low FPR of the proposed Droid-HESVMGA and Droid-HESVMPSO produced the best AUCs among those achieved by the malware detection approaches compared. This was because the AUC metric is used to measure the trade-off between TPR and FPR, as shown in Table 3. The results in Table 7 demonstrate that the proposed Droid-HESVMGA had the highest AUC (96.90%), followed by Droid-HESVMPSO (96.00%), EHNFC (95.00%), TRIMF–ANFIS (93.00%), and then DENFIS (92.20%).

8. Conclusion and Future Work

This paper has presented a methodology for the proposed hybrid intelligent Android malware detection approaches using evolving support vector machine based on evolutionary algorithms. In the proposed approaches, referred to as Droid-HESVMGA and Droid-HESVMPSO, GA and PSO were exploited in SVM to solve the dual optimization problem in order to enhance the detection accuracy of Android malware apps. The proposed Droid-HESVMGA and Droid-HESVMPSO produced promising solutions with higher detection accuracy of the Android malware apps, since they had the potential gains derived from exploiting both GA and PSO optimization methods in the SVM classifier. The experimental results demonstrated that the proposed Droid-HESVMGA and Droid-HESVMPSO accomplished much better CCRs than popular machine learning classifiers and other existing hybrid malware detection approaches used in Android malware detection. Furthermore, the best TPR, FPR, FNR and AUC measures were accomplished by the proposed Droid-HESVMGA, followed by the proposed Droid-HESVMPSO.
In this study, the proposed Droid-HESVMGA and Droid-HESVMPSO were trained using only the permission features of Android malware applications. The proposed Droid-HESVMGA and Droid-HESVMPSO can be improved further by utilizing the intents and API call features of Android malware applications in the training phase.

Acknowledgment

This project was funded by the Deanship of Scientific Research (DSR), King Abdulaziz University, Jeddah, under grant No. (D-212-830-1440). The authors, therefore, gratefully acknowledge the DSR technical and financial support.

References

[1] A. Boxall, In 2020 6.1 Billion People Will Use A Smartphone Digital Trends, 2019. On the WWW, URL https://www.digitaltrends.com/mobile/smartphone-users-number-6-1-billion-by-2020
[2] N. Milosevic, A. Dehghantanha, and K. K. R. Choo, “Machine learning aided Android malware classification,” Computers and Electrical Engineering, vol. 61, pp. 266–274, 2017.
[3] AppBrain, Number of Android applications on the Google Play store | AppBrain, 2019. On the WWW, URL https://www.appbrain.com/stats/number-of-android-apps
[4] W. J. Buchanan, S. Chiale, and R. Macfarlane, “A methodology for the security evaluation within third-party Android Marketplaces,” Digital Investigation, vol. 23, pp. 88–98, 2017.
[5] F. Martinelli, I. Matteucci, M. Petrocchi, A. Saracino, G. Dini, and D. Sgandurra, “Risk analysis of Android applications: A user-centric solution,” Future Generation Computer Systems, vol. 80, pp. 505–518.
[6] L. Cen, C. S. Gates, L. Si, and N. Li, “A probabilistic discriminative model for Android malware detection with decompiled source code,” IEEE Trans. Depend. Secure Comput., vol. 12(4), pp. 400–412.
[7] Fan Ming, Liu Jun, Wang Wei, Li Haifei, Tian Zhenzhou, and Liu Ting, “DAPASA: Detecting Android piggybacked apps through sensitive subgraph analysis,” IEEE Trans. Inf. Forensics Security, vol. 12(8), pp. 1772–1785, 2017.
[8] S. Y. Yerima, and S. Sezer, “DroidFusion: A Novel Multilevel Classifier Fusion Approach for Android Malware Detection,” IEEE Transactions on Cybernetics, vol. 49(2), pp. 453–466, 2019.
[9] D.-J. Wu, C.-H. Mao, T.-E. Wei, H.-M Lee, and K.-P Wu, “DroidMat: Android malware detection through manifest and API calls tracing,” In Proc. 7th Asia Joint Conf. Inf. Security (Asia JCIS), pp. 62–69, 2012.
[10] Arp, Daniel, M. Spreitzenbarth, M. Hübner, H. Gascon, and K. Rieck, “Drebin: Effective and Explainable Detection of Android Malware in Your Pocket,” In Proceedings 2014 Network and Distributed System Security Symposium, pp. 1–15, 2014.
[11] P. P. K. Chan, and W.-K. Song, “Static detection of Android malware by using permissions and API calls,” In Proc. Int. Conf. Mach. Learn. Cybern., Lanzhou, pp. 82–87, 2014.
[12] Wang, Wei, X. Wang, D. Feng, J. Liu, Z. Han, and X. Zhang, “Exploring Permission-Induced Risk in Android Applications for Malicious Application Detection,” IEEE Transactions on Information Forensics and Security, vol. 9(11), pp. 1869–1882, 2014.
[13] A. Sharma and S. K. Dash, “Mining API calls and permissions for Android malware detection,” In Cryptology and Network Security, Cham, Switzerland: Springer Int., pp. 191–205, 2014.
[14] Yerima, S. Sezer, and Igor Muttik, “High Accuracy Android Malware Detection Using Ensemble Learning,” IET Information Security, vol. 9(6), pp. 313–320, 2015.
[15] Yuan, Zhenlong, Yongqiang Lu, and Yibo Xue, “Droiddetector: Android Malware Characterization and Detection Using Deep Learning,” Tsinghua Science and Technology, vol. 21(1), pp. 114–123, 2016.
[16] M. V. Varsha, P. Vinod, and K. A Dhanya, “Identification of malicious Android app using manifest and opcode features,” J. Comput. Virol. Hacking Tech., vol. 13(2), pp. 125–138, 2017.
[17] M. L. Dantas Dias, and A. R. R. Neto, “Evolutionary support vector machines: A dual approach,” In 2016 IEEE Congress on Evolutionary Computation, pp. 2185–2192, 2016.
[18] I. Mierswa, “Evolutionary Learning with Kernels: A Generic Solution for Large Margin Problems,” In Proceedings of the 8th annual conference on Genetic and evolutionary computation - GECCO ’06 (p. 1553), 2006.
[19] W. Wang, Y. Li, X. Wang, J. Liu, and X. Zhang, “Detecting Android malicious apps and categorizing benign apps with ensemble of classifiers,” Future Generation Computer Systems, vol. 78, pp. 987–994, 2018.
[20] A. Altaher, “An improved Android malware detection scheme based on an evolving hybrid neuro-fuzzy classifier (EHNFC) and permission-based features,” Neural Computing and Applications, vol. 28(12), pp. 4147–4157, 2017.
[21] A. T. Kabakus, and I. A. Dogru, “An in-depth analysis of Android malware using hybrid techniques,” Digital Investigation, vol. 24, pp. 25–33, 2018.
[22] C. Zhao, C. Wang, and W. Zheng, “Android Malware Detection Based on Sensitive Permissions and APIs,” In International Conference on Security and Privacy in New Computing Environments (SPNCE), pp. 96–104, 2019.
[23] E. M. B. Karbab, M. Debbabi, A. Derhab, and D. Mouheb, “MalDozer: Automatic framework for android malware detection using deep learning,” Digital Investigation, vol. 24, pp. S48–S59, 2018.
[24] A. Shubair, and A. Altaher, “Intelligent Approach for Android Malware Detection,” KSII Transactions on Internet and Information Systems, vol. 9(8), pp. 2964–2983, 2015.
[25] K. Tam, A. Feizollah, N. B. Anuar, R. Salleh, and L. Cavallaro, “The Evolution of Android Malware and Android Analysis Techniques,” ACM Computing Surveys, vol. 49(4), pp. 1–41, 2017.
[26] H. S. Ham, and M. J. Choi, “Analysis of Android malware detection performance using machine learning classifiers,” In International Conference on ICT Convergence, 2013. https://doi.org/10.1109/ICTC.2013.6675404
[27] A. Guerra, “Application of Full Machine Learning Workflow for Malware Detection in Android on the Basis of System Calls and Permissions,” MS Thesis, Tallinn University of Technology, School of Information Technologies, 2018. See also URL https://digi.lib.ttu.ee/i/?10770
[28] Yerima, S. Sezer, and G. McWilliams, “Analysis of Bayesian classification-based approaches for Android malware detection,” IET Information Security, vol. 8(1), pp. 25–36, 2014.
[29] McAfee, McAfee Labs Threats Report, 2018. On the WWW, URL https://www.mcafee.com/enterprise/en-us/assets/reports/rp-quarterly-threats-sep-2018.pdf
[30] N. Peiravian, and X. Zhu, “Machine learning for Android malware detection using permission and API calls,” In Proceedings - International Conference on Tools with Artificial Intelligence, ICTAI, pp. 300–305, 2013.
[31] Android, Manifest.permission | Android Developers, 2019. On the WWW, URL https://developer.android.com/reference/android/Manifest.permission#summary
[32] F. Idrees, M. Rajarajan, M. Conti, T. M. Chen, and Y. Rahulamathavan, “PIndroid: A novel Android malware detection system using ensemble learning methods,” Computers and Security, vol. 68, pp. 36–46, 2017.
[33] Google, Control your app permissions on Android 6.0 and up - Google Play Help, 2019. On the WWW, URL https://support.google.com/googleplay/answer/6270602?hl=en
[34] V. Vapnik, “The nature of statistical learning theory,” (2nd edition), New York: Springer, 1995.
[35] J. C. Platt, “Fast training of support vector machines using sequential minimal optimization,” In Advances in Kernel Methods - Support Vector Learning. Cambridge, MA, USA: MIT Press, 1999.
[36] J. K. Anlauf and M. Biehl, “The adatron: An adaptive perceptron algorithm,” Europhysics Letters, vol. 10(7), pp. 687, 1989.
[37] T-T. Frie, N. Cristianini, and C. Campbell, “The kernel-adatron algorithm: a fast and simple learning procedure for support vector machines,” In Machine Learning: Proceedings of the Fifteenth International Conference (ICML’98), Citeseer, pp. 188–196, 1998.
[38] P. E. Gill, W. Murray, and M. H. Wright, Practical optimization, 1981.
[39] D. E. Goldberg, “Genetic Algorithms in Search Optimization and Machine Learning,” Addison-Wesley, 1989.
[40] J. Kennedy and R. Eberhart, “Particle swarm optimization,” In IEEE International Conference on Neural Networks, pp. 1942–1948, 1995.
[41] B. Chakraborty, “Evolutionary Computational Approaches to Feature Subset Selection,” International Journal of Soft computing and Bioinformatics, vol. 1(2), pp. 59–65, 2010.
[42] A. Kawamura, and B. Chakraborty, “A hybrid approach for optimal feature subset selection with evolutionary algorithms,” Proceedings - 2017 IEEE 8th International Conference on Awareness Science and Technology, ICAST 2017, pp. 564–568, 2018.
[43] M-Y Cho, and T. T. Hoang, “Feature Selection and Parameters Optimization of SVM Using Particle Swarm Optimization for Fault Classification in Power Distribution Systems,” Computational Intelligence and Neuroscience, Article ID 4135465, 9 pages, 2017.
[44] L. M. Abualigah, A. T. Khader, and E. S. Hanandeh, “A new feature selection method to improve the document clustering using particle swarm optimization algorithm,” Journal of Computational Science, vol. 25, pp. 456–466, 2018.
[45] J. Wei, Z. Jian-Qi, and Z. Xiang, “Face recognition method based on support vector machine and particle swarm optimization,” Expert Systems with Applications, vol. 38(4), pp. 4390–4393, 2011.
[46] D. O’Neill, A. Lensen, B. Xue, and M. Zhang, “Particle Swarm Optimisation for Feature Selection and Weighting in High-Dimensional Clustering,” In 2018 IEEE Congress on Evolutionary Computation (CEC), 2018.
[47] Google Play, Google Play Store, 2019. On the WWW, URL https://play.google.com/store?hl=en
28 IJCSNS International Journal of Computer Science and Network Security, VOL.19 No.9, September 2019

[48] Genome, Android Malware Genome Project, 2019. On the WWW, URL http://www.malgenomeproject.org
[49] Contagio, Contagio Mobile: mobile malware mini dump, 2019. On the WWW, URL http://contagiominidump.blogspot.co.uk
[50] VirusTotal, VirusTotal for Android, 2019. On the WWW, URL https://www.virustotal.com/en/documentation/mobile-applications
[51] MalShare, MalShare project, 2019. On the WWW, URL http://malshare.com/about.php
[52] VirusShare, VirusShare.com, 2019. On the WWW, URL https://virusshare.com
[53] theZoo, theZoo aka Malware DB, 2019. On the WWW, URL http://ytisf.github.io/theZoo
[54] Google Play, Google Play Store, 2014. On the WWW, URL https://play.google.com/store?hl=en
[55] Genome, Android Malware Genome Project, 2014. On the WWW, URL http://www.malgenomeproject.org
[56] C. C. Chang, and C. J. Lin, LIBSVM: A library for support vector machines, 2001. http://www.csie.ntu.edu.tw/~cjlin/libsvm
[57] S. Ruping, mySVM Manual, Universität Dortmund, Lehrstuhl Informatik VIII, 2000. http://www-ai.cs.tu-dortmund.de/SOFTWARE/MYSVM/index.html
[58] N. Kasabov, and Q. Song, “DENFIS: dynamic evolving neural-fuzzy inference system and its application for time-series prediction,” IEEE Trans Fuzzy Syst, vol. 10(2), pp. 144–154, 2002.

Waleed Ali received his B.Sc. in Computer Science from the Faculty of Science, Taiz University, Yemen, in 2005. He obtained his M.Sc. and Ph.D. (Computer Science) from the Faculty of Computing, Universiti Teknologi Malaysia (UTM), Malaysia, in 2009 and 2012 respectively. He has been an Assistant Professor in the IT department, Faculty of Computing and Information Technology in Rabigh, King Abdulaziz University, since September 2013. He has published many papers in international journals, conferences and book chapters. His research interests include intelligent Web caching, intelligent Web prefetching, Web usage mining, intelligent phishing website detection, intelligent Android malware detection, and machine learning techniques and their applications.
