Hybrid Intelligent Android Malware Detection Using Evolving Support Vector Machine Based on Genetic Algorithm and Particle Swarm Optimization
Waleed Ali
All content following this page was uploaded by Waleed Ali on 24 October 2019.
Information Technology Department, Faculty of Computing and Information Technology, King Abdulaziz University,
Rabigh, Kingdom of Saudi Arabia
Table 1: Summary of the existing intelligent Android malware detection based on the static malware analysis
| Approach | Features | Machine learning | Feature selection | Dataset source |
|---|---|---|---|---|
| DroidMat [9] | Permissions, intents, and API calls | k-means and k-NN | None | Google Play and Contagio Mobile |
| DREBIN [10] | Permissions, API calls and network addresses | SVM | None | Several markets, Google Play and Genome |
| Mining API calls and permissions for Android malware detection [13] | Permissions and API calls | Naive Bayes and k-NN | Correlation-based feature selection and information gain | Official, third party Android markets and Android malware Genome project |
| Static detection of Android malware by using permissions and API calls [11] | Permissions and API calls | Naive Bayes, SVM, MLP, random forest, and J48 | Information gain | Anzhi Market and Contagio Mobile |
| Exploring Permission-induced Risk in Android Applications for Malicious Application Detection [12] | Individual permission and group of collaborative permissions | SVM, decision trees, and random forest | Forward selection (SFS) and principal component analysis (PCA) | Google Play, Mal Zhou and VirusShare |
| A probabilistic discriminative model for Android malware detection with decompiled source code [6] | API calls and permissions | Regularized logistic regression | Information gain and Chi-square | Google Play and Genome |
| K-ANFIS [24] | Permission-based features | kEFCM-based Adaptive Neuro-Fuzzy Inference System | Information gain ratio | Google Play and Genome |
| High Accuracy Android Malware Detection Using Ensemble Learning [14] | Permissions and API call features | Random forest as an ensemble learning method | None | McAfee’s internal repository |
| DroidDetector [15] | Static analysis-based features and dynamic analysis-based features | Deep belief networks | Frequency analysis-based feature evaluation | Google Play, Contagio Community and Genome Project |
| EHNFC [20] | Permission-based features | Hybrid neuro-fuzzy classifier with evolving clustering | Information gain ratio | Google Play and Genome Project |
| Identification of malicious Android app using manifest and opcode features [16] | Static features from the manifest and executable files | SVM, random forest, and rotation forests | Entropy based Category Coverage Difference and Weighted Mutual Information | Google Play store and Drebin dataset |
| DAPASA [7] | Utilizing sensitive subgraphs to construct five features depicting invocation patterns | Random forest, decision tree, k-NN, and PART | None | Google Play, Anzhi Market, Android Malware Genome Project and piggybacked families |
| Detecting Android malicious apps and categorizing benign apps with ensemble of classifiers [19] | 11 types of static features from each app to characterize the behaviors of the app | Ensemble of multiple classifiers | SVM was used to sort the weight of each feature | Markets in China called Anzhi and Wild |
| DroidFusion [8] | Permissions, API calls and intents | Fusion approach | Information gain | DREBIN and Malgenome project |
user’s approval is required with other permissions, according to the category of the permission requested, either normal or potentially dangerous permission.
The low-risk permissions are classified under the normal permission category. They are not particularly harmful and do not present any risk to the user's privacy or the device's operation; examples include INTERNET, ACCESS_NETWORK_STATE, and MODIFY_AUDIO_SETTINGS [20, 30]. On the other hand, the higher-risk permissions could potentially affect the user's privacy, hardware, software or system. These high-risk permissions are categorized under the dangerous permission category. Malware apps are highly interested in requesting the dangerous permissions to gain the privileges required to access sensitive information. For example, READ_CONTACTS, WRITE_CONTACTS, CALL_PHONE, and SEND_SMS are four dangerous permissions that require the explicit approval of the user at the installation time [20, 30].
The permission request can be approved or rejected by the user without stopping the application, which will run with limited capabilities. In Android 6.0 or higher, the dangerous permissions must be granted by the user at runtime, in the case where the user is not notified of any app permissions at the installation time. Even if the dangerous permissions are granted by the user at the installation time, the user can enable and disable permissions one-by-one in system settings at runtime [33].

4. Support Vector Machine

The support vector machine (SVM) was introduced by Vapnik [34] and has become one of the most popular machine learning techniques. SVM has several advantages over other techniques. The generalization ability of SVM can be maximized, since SVM is trained to maximize the margin. In addition, there is a global optimum solution in SVM training. Furthermore, SVM is robust to outliers, because the margin parameter C controls the misclassification error. Therefore, SVM has been successfully applied in many complex classification applications.
Consider a set of training data vectors $X = \{x_1, \dots, x_n\}$, $x_i \in R^d$, and a set of corresponding labels $Y = \{y_1, \dots, y_n\}$, $y_i \in \{1, -1\}$. SVM aims to maximize the margin between the separating hyperplane and the closest instance in each class in order to obtain the ideal hyperplane between the two different classes. The hyperplane can be expressed as in Eq. (1).

$$(w \cdot x) + b = 0, \quad w \in R^d, \; b \in R \tag{1}$$

where the vector w defines the boundary, x is the input vector of dimension d, and b is a scalar threshold.
The optimal hyperplane can be obtained as a solution to the following optimization problem.

$$\text{minimize} \quad \frac{1}{2}\|w\|^2 \tag{2}$$

$$\text{subject to} \quad y_i\big((w \cdot x_i) + b\big) \ge 1, \quad \forall i \tag{3}$$

In real-world applications, the data are usually affected by noise and outliers. The decision boundaries can be softened by introducing a positive slack variable $\xi_i$ for each training pattern. Eq. (4) is called the primal optimization of SVM.

$$\text{minimize} \quad \frac{1}{2}\|w\|^2 + C \sum_{i=1}^{n} \xi_i \tag{4}$$

$$\text{subject to} \quad y_i\big((w \cdot x_i) + b\big) \ge 1 - \xi_i, \quad \forall i \tag{5}$$

where C is a positive regularization constant, which controls the degree of penalization of $\xi_i$. Therefore, C controls allowable errors in the trained solution: high C permits few errors while low C allows a higher proportion of errors in the solution.
To solve the convex optimization problem, Lagrangian multipliers $\alpha_i$ are used to produce the dual optimization problem, as shown in Eq. (6), which must be solved in order to find a separating maximum margin hyperplane for a given set of data points.

$$\text{maximize} \quad \sum_{i=1}^{n} \alpha_i - \frac{1}{2}\sum_{i=1}^{n}\sum_{j=1}^{n} y_i y_j \alpha_i \alpha_j (x_i \cdot x_j) \tag{6}$$

$$\text{subject to} \quad 0 \le \alpha_i \le C \quad \text{for all } i = 1, \dots, n \tag{7}$$

$$\text{and} \quad \sum_{i=1}^{n} \alpha_i y_i = 0 \tag{8}$$

In most cases, the data points are not linearly separable. Thus, the SVM will transform the data to a higher-dimensional space and then classify them using the same principle as the linear case. A kernel function $K(x_i, x_j)$ is used to perform this transformation and the dot product in a single step. Thus, the final dual optimization problem using the kernel function can be expressed using Eq. (9) to find a separating maximum margin hyperplane for non-separable data points.

$$\text{maximize} \quad \sum_{i=1}^{n} \alpha_i - \frac{1}{2}\sum_{i=1}^{n}\sum_{j=1}^{n} y_i y_j \alpha_i \alpha_j K(x_i, x_j) \tag{9}$$

$$\text{subject to} \quad 0 \le \alpha_i \le C \quad \text{for all } i = 1, \dots, n \tag{10}$$
20 IJCSNS International Journal of Computer Science and Network Security, VOL.19 No.9, September 2019
$$\text{and} \quad \sum_{i=1}^{n} \alpha_i y_i = 0 \tag{11}$$

The dual optimization problem of SVM is usually solved by classical optimization methods such as Sequential Minimal Optimization (SMO) [35], Kernel Adatron (KA) [36, 37] and Quadratic Program (QP) [38]. However, these classical optimization methods are based on an analytical approach or complex mathematical calculations. Furthermore, their performances are modest compared to those of the evolutionary algorithms used in this paper.

5. Evolutionary Algorithms

In the past few years, evolutionary algorithms have become a very popular research topic and have been effectively employed in many applications and fields such as optimization, feature selection, pattern recognition, classification, and clustering. Evolutionary algorithms are a set of modern metaheuristic optimization algorithms based on the evolution of populations, which are primarily developed to solve complicated optimization problems. The most well-known evolutionary algorithms used in optimization problems are the genetic algorithm (GA) [39] and particle swarm optimization (PSO) [40].

5.1 Genetic Algorithm

The genetic algorithm [39] is the most common evolutionary algorithm based on the simulation of the biological evolution process in chromosomes. In other words, the genetic algorithm mimics the survival of the fittest among chromosomes of consecutive generations in order to solve a certain optimization problem. The genetic algorithm (GA) is commonly utilized to solve optimization problems with a large search space [41, 42].
In GA, all possible candidate solutions construct the search space or population of a specific optimization problem. A basic GA mainly implements the following four major steps:
i. Encoding of chromosomes: Each candidate solution represents a chromosome in a population, which is encoded with several genes. Each gene is a small part of a candidate solution, which can represent one parameter to be optimized.
ii. Initialization of the population: An initial population of chromosomes is randomly generated, which consists of the initial solutions of a specific optimization problem.
iii. Fitness evaluation: The GA computes the fitness value of each individual chromosome, which indicates the goodness of the solution provided by the individual chromosome. The chromosomes in the population are then evaluated using the fitness function. The fittest chromosomes will be given more opportunities to reproduce and evolve.
iv. Reproduction: As in biological evolution, a GA can recombine the fittest chromosomes to create new, better chromosomes and solutions. The reproduction process is conducted through three genetic operators: selection, crossover, and mutation.
• Selection: The better chromosomes are selected based on the fitness values to become the parents to produce new chromosomes (offspring).
• Crossover: In the crossover operator, GA randomly chooses a crossover point, where two parent chromosomes break, and then exchanges the chromosome parts after that point in order to create new offspring.
• Mutation: The mutation operator changes the gene value in some randomly chosen location of the chromosome.

Some selected chromosomes are iteratively evolved to produce a new generation of better solutions. The reproduction and fitness evaluation are repeated until the termination criterion is satisfied.

5.2 Particle Swarm Optimization

The particle swarm optimization algorithm (PSO) is a common population-based optimization algorithm tied to evolutionary computation, which was introduced by Kennedy and Eberhart [40]. PSO is a simpler and faster evolutionary algorithm and has fewer parameters compared to GA. Therefore, PSO has been widely applied in many problems and areas such as optimization, feature selection, pattern recognition, classification and clustering [42-46].
Unlike the chromosome’s evolution in GA, PSO is inspired by the social behavior of birds flocking in interacting and cooperating to find food. Like evolutionary algorithms, a PSO population (called a swarm) consists of candidate solutions or individuals (called particles) which are randomly initialized. Each particle then moves in the search space with a velocity $v$ in order to find the optimal solution. The particles learn over time based on their own experience and the experience of the other particles in the swarm.
Let $x_i = (x_{i1}, x_{i2}, x_{i3}, \dots, x_{iD})$ be the current position of particle i, and $v_i = (v_{i1}, v_{i2}, v_{i3}, \dots, v_{iD})$ be the velocity of particle i, where D is the dimensionality of the search space. To find the best solution in PSO, each particle changes its velocity, as shown in Eq. (12) and Eq. (13), according to pbest and gbest, which represent the best previous position of a particle (personal best position) and the best position obtained by the whole population (global best position), respectively.
$$x_{id}^{t+1} = x_{id}^{t} + v_{id}^{t+1} \tag{12}$$

$$v_{id}^{t+1} = w \, v_{id}^{t} + c_1 r_1 \big(p_{id} - x_{id}^{t}\big) + c_2 r_2 \big(p_{gd} - x_{id}^{t}\big) \tag{13}$$

where $d = 1, 2, 3, \dots, D$, t represents the t-th iteration, $p_{id}$ and $p_{gd}$ denote the pbest and gbest, $w$ is the inertia weight, $c_1$ and $c_2$ are acceleration parameters which are commonly set to 2.0, and $r_1$ and $r_2$ are random values in the range [0, 1].
In this study, we used the same dataset as that used in [20], which consists of 500 malware and benign apps, in order to evaluate the performance of the proposed methods. In the dataset used in this study, 250 benign apps and 250 malware apps were collected from the official Google Play store [54] and Genome [55], respectively, which are the most common sources of benign and malware apps. These apps have many permission features that can be used as input features to train and test the proposed hybrid intelligent Android malware detection.
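The update rules in Eq. (12) and Eq. (13) can be illustrated with a minimal sketch. It is not the implementation used in this study: the function name `pso_maximize`, the inertia weight value 0.7, the search bounds, and the toy objective `neg_sphere` are all assumptions chosen for illustration, and gbest is updated as soon as a particle improves on it rather than once per iteration.

```python
import random


def pso_maximize(fitness, dim, n_particles=20, iters=100,
                 w=0.7, c1=2.0, c2=2.0, lo=-5.0, hi=5.0, seed=0):
    """Maximize `fitness` with a basic PSO (hypothetical helper)."""
    rng = random.Random(seed)
    # Random initial positions, zero initial velocities.
    x = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    v = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in x]
    pbest_fit = [fitness(p) for p in x]
    g = max(range(n_particles), key=lambda i: pbest_fit[i])
    gbest, gbest_fit = pbest[g][:], pbest_fit[g]
    history = [gbest_fit]  # best fitness seen so far, per iteration
    for _ in range(iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Eq. (13): inertia + pull toward pbest + pull toward gbest.
                v[i][d] = (w * v[i][d]
                           + c1 * r1 * (pbest[i][d] - x[i][d])
                           + c2 * r2 * (gbest[d] - x[i][d]))
                # Eq. (12): move the particle with its new velocity.
                x[i][d] = x[i][d] + v[i][d]
            f = fitness(x[i])
            if f > pbest_fit[i]:
                pbest[i], pbest_fit[i] = x[i][:], f
                if f > gbest_fit:
                    gbest, gbest_fit = x[i][:], f
        history.append(gbest_fit)
    return gbest, gbest_fit, history


def neg_sphere(p):
    # Toy objective (assumption): maximum value 0 at the origin.
    return -sum(c * c for c in p)


best, best_fit, history = pso_maximize(neg_sphere, dim=3)
```

Because pbest and gbest are only replaced by strictly better positions, the recorded best fitness never decreases, even when individual particles overshoot. In the proposed method the objective would be the dual function of Eq. (9) rather than the toy function used here.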
Fig. 2 A methodology of the proposed hybrid intelligent android malware detection using evolving SVM based on GA and PSO
Unlike the conventional SVM, the GA and PSO are used in the proposed Droid-HESVMGA and Droid-HESVMPSO to solve the dual optimization problem in support vector machine in order to increase the accuracy of the Android malware detection.
In the proposed Droid-HESVMGA and Droid-HESVMPSO, each candidate solution or individual is represented by a vector and denoted as a chromosome in the GA population or a particle in the PSO swarm. The GA chromosome and the position of each PSO particle are represented by a vector of real values, in which each value represents the value of the Lagrange multiplier for a training example as shown in Fig. 4. Fig. 4 illustrates an example of encoding the Lagrange multipliers vector in a GA chromosome and PSO particle in the proposed Droid-HESVMGA and Droid-HESVMPSO.

y: $y_1 \;\; y_2 \;\; y_3 \;\; \dots \;\; y_{n-1} \;\; y_n$
α: $\alpha_1 \;\; \alpha_2 \;\; \alpha_3 \;\; \dots \;\; \alpha_{n-1} \;\; \alpha_n$

Fig. 4 Encoding of Lagrange multipliers vector in GA chromosome and PSO particle

In the evolutionary algorithms, once the candidate solutions are encoded, the fitness function is used to evaluate the candidate solutions or individuals. In order to find a separating maximum margin hyperplane of SVM for a given set of data points, Eq. (9) is used as the fitness function in the proposed Droid-HESVMGA and Droid-HESVMPSO to evaluate the GA chromosomes and PSO positions.
In the proposed Droid-HESVMGA, an initial population of chromosomes is randomly generated, which represent the values of Lagrange multipliers for training patterns. The chromosomes’ performances are then computed and evaluated by the fitness function shown in Eq. (9). The GA will stop the search and return the optimal vector of Lagrange multipliers if a sufficiently good fitness or the maximum number of generations is reached. Otherwise, the GA implements selection, crossover, and mutation to produce a new generation of chromosomes in order to find the optimal vector of Lagrange multipliers that can maximize the performance of SVM. The fittest chromosomes are the most appropriate candidates for mating to produce a new generation. Crossover and mutation are then employed to produce child chromosomes, used as alternative chromosomes to their parent chromosomes in the GA population. The parent chromosomes are then chosen to exchange the chromosome genes using the crossover process to offer a child chromosome with genetic materials. In GA mutation, a gene in the child chromosome can be changed to a random value between 0 and C in the proposed Droid-HESVMGA.
In the proposed Droid-HESVMPSO, an initial swarm of particles is randomly generated; each particle represents the value of the Lagrange multiplier for a certain training example. Each particle’s fitness is then computed using Eq. (9) and evaluated accordingly. The PSO fitness function aims at finding a separating maximum margin hyperplane for given training examples. If the current particle fitness is better than the best fitness of that particle (pbest), then pbest is updated to the current particle fitness. The global best fitness (gbest) is then updated to the particle with the best fitness value of all the particles. If the stopping criteria (sufficiently good fitness or maximum iterations) are met, the PSO will terminate the search and return the optimal values of the Lagrange multipliers. Otherwise, the pbest and gbest are utilized to update the velocity and position for every particle using Eq. (12) and Eq. (13). This process is repeated until the stop conditions are met.
After solving the optimization problem and obtaining the Lagrange multipliers by using the GA and PSO, the proposed Droid-HESVMGA and Droid-HESVMPSO can be used in Android malware detection. The proposed Droid-HESVMGA and Droid-HESVMPSO use the decision Eq. (18) to classify each input vector x into the positive or negative class. In Android malware detection, the positive class refers to the malware apps, while the negative class represents the benign apps.

$$y(x) = \mathrm{sgn}\left(\sum_{i=1}^{n} \alpha_i y_i K(x_i, x) + b\right) \tag{18}$$

In the proposed Droid-HESVMGA and Droid-HESVMPSO, the radial basis function (RBF) defined as Eq. (19) was used as the kernel function $K(x_i, x)$, since it achieved a better performance in many applications compared to other kernel functions. The parameter $\gamma$ represents the width of the RBF.

$$K(x_i, x) = \exp\big(-\gamma \|x_i - x\|^2\big), \quad \gamma > 0 \tag{19}$$

7. Analysis and Discussion of Results

7.1 Dataset Collection and Preparation

In this study, the dataset with 500 Android apps used by [20] was adopted in our experiments in order to train and evaluate the proposed hybrid intelligent Android malware detection based on the evolving support vector machine: Droid-HESVMGA and Droid-HESVMPSO. In this dataset, 250 benign apps were collected from the official Google Play [54] while 250 malware apps were collected from Genome [55], which is commonly used in the literature to collect malware apps.
In order to prepare the training dataset, the permission features of these Android apps were extracted and
converted to binary forms based on whether the permission feature is requested or not by the Android apps. Accordingly, the best 25 permission features were selected by using information gain ratio in order to help in improving the performance of the proposed hybrid intelligent Android malware detection method based on evolving support vector machine.

7.2 Evaluation Measures

In order to evaluate the proposed methods, Droid-HESVMGA and Droid-HESVMPSO were trained and evaluated using 5-fold cross-validation.
In our experiments, we used five popular metrics, which are commonly used in the literature for detecting malware apps, to evaluate the performance of the proposed Droid-HESVMGA and Droid-HESVMPSO. Correct classification rate, true positive rate, false positive rate, false negative rate and area under ROC curve were calculated in order to judge the effectiveness of the proposed Droid-HESVMGA and Droid-HESVMPSO. The correct classification rate (CCR) is the rate of malware and benign apps that are correctly classified with respect to all Android apps. True positive rate (TPR) is the rate of malware apps classified as malware out of total malware apps. False positive rate (FPR) is the rate of benign apps classified as malware out of total benign apps. False negative rate (FNR) is the rate of malware apps classified as benign out of total malware apps. The area under ROC curve (AUC) is a measure used to evaluate the trade-off between TPR and FPR.
Table 3 shows the measures used to evaluate the performance of the proposed evolving support vector machine classifiers in Android malware detection. In Table 3, true positive (TP) is the number of correctly classified malware apps, false negative (FN) is the number of incorrectly classified malware apps, true negative (TN) is the number of correctly classified benign apps, and false positive (FP) is the number of incorrectly classified benign apps.

7.3 Comparison Against Popular Machine Learning Classifiers

In this study, the proposed Droid-HESVMGA and Droid-HESVMPSO were trained and compared with two common implementations of SVM, known as LibSVM [56] and mySVM [57], which use the classical optimization techniques to solve the quadratic programming problem. In all SVMs, RBF was used as the kernel function while the best parameters C (margin softness) and γ (RBF width) were obtained by using a grid search algorithm in order to achieve the best performance for Android malware detection. In addition, the proposed Droid-HESVMGA and Droid-HESVMPSO were compared with four other machine learning classifiers commonly used in the literature to detect Android malware applications: back-propagation neural network (BPNN), naïve Bayes classifier (NB), random forest (RF), and k-nearest neighbour (kNN).
In the proposed Droid-HESVMGA and Droid-HESVMPSO, it was found by trial and error that the parameter settings of the GA and PSO shown in Tables 4 and 5 produced good results.
Table 4: Parameters settings of GA used in the proposed Droid-HESVMGA
| Parameter | Value |
|---|---|
| Population size | 20 |
| Maximum generation | 1000 |
| Crossover probability | 0.9 |
| Mutation type | Switching mutation |
| Selection scheme | Tournament (0.75) |

Table 5: Parameters settings of PSO used in the proposed Droid-HESVMPSO
| Parameter | Value |
|---|---|
| Number of particles | 20 |
| Maximum iterations (generations) | 1000 |
| C1 | 2 |
| C2 | 2 |

Stop condition: maximum number of iterations
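To make the role of these settings concrete, the following is a minimal sketch of a GA that evolves a vector of Lagrange multipliers with Eq. (9) as the fitness function and the RBF kernel of Eq. (19). It is an illustration under stated assumptions rather than the implementation evaluated in this paper: the helper names, the toy data, the penalty used to encourage the equality constraint on the multipliers, the per-gene mutation rate, the use of elitism, and the reading of "Tournament (0.75)" as the probability of keeping the fitter of two sampled chromosomes are all hypothetical choices.

```python
import math
import random


def rbf(a, b, gamma):
    # Eq. (19): K(a, b) = exp(-gamma * ||a - b||^2)
    return math.exp(-gamma * sum((p - q) ** 2 for p, q in zip(a, b)))


def dual_fitness(alpha, K, y, penalty=10.0):
    # Eq. (9), minus a penalty on |sum(alpha_i * y_i)| (assumption: the
    # equality constraint is handled by a penalty term).
    n = len(alpha)
    quad = sum(y[i] * y[j] * alpha[i] * alpha[j] * K[i][j]
               for i in range(n) for j in range(n))
    balance = abs(sum(a * t for a, t in zip(alpha, y)))
    return sum(alpha) - 0.5 * quad - penalty * balance


def tournament(pop, fits, rng, p=0.75):
    # Binary tournament: keep the fitter chromosome with probability p.
    i, j = rng.sample(range(len(pop)), 2)
    better, worse = (i, j) if fits[i] >= fits[j] else (j, i)
    return pop[better] if rng.random() < p else pop[worse]


def evolve_alphas(X, y, C=1.0, gamma=0.5, pop_size=20, generations=50,
                  p_cross=0.9, p_mut=0.05, seed=0):
    rng = random.Random(seed)
    n = len(X)
    K = [[rbf(X[i], X[j], gamma) for j in range(n)] for i in range(n)]
    # Each chromosome holds one multiplier per training example, in [0, C].
    pop = [[rng.uniform(0, C) for _ in range(n)] for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        fits = [dual_fitness(c, K, y) for c in pop]
        best = max(range(pop_size), key=lambda k: fits[k])
        history.append(fits[best])
        nxt = [pop[best][:]]  # elitism (assumption): keep the best as-is
        while len(nxt) < pop_size:
            a, b = tournament(pop, fits, rng), tournament(pop, fits, rng)
            child = a[:]
            if n > 1 and rng.random() < p_cross:
                cut = rng.randrange(1, n)  # one-point crossover
                child = a[:cut] + b[cut:]
            for g in range(n):
                if rng.random() < p_mut:
                    child[g] = rng.uniform(0, C)  # random value in [0, C]
            nxt.append(child)
        pop = nxt
    fits = [dual_fitness(c, K, y) for c in pop]
    best = max(range(pop_size), key=lambda k: fits[k])
    history.append(fits[best])
    return pop[best], history


# Toy data (assumption): two well-separated clusters, labels -1 and +1.
X_toy = [[0, 0], [0, 1], [1, 0], [1, 1], [3, 3], [3, 4], [4, 3], [4, 4]]
y_toy = [-1, -1, -1, -1, 1, 1, 1, 1]
best_alpha, ga_history = evolve_alphas(X_toy, y_toy, C=1.0)
```

Keeping the best chromosome unchanged in each generation guarantees that the best fitness never decreases; a run with the Table 4 settings would use 1000 generations instead of the 50 used here to keep the sketch quick.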
classifiers used in Android malware detection. It is clear from Table 6 that the proposed Droid-HESVMGA and Droid-HESVMPSO outperformed BPNN, NB, RF, kNN, mySVM, and LibSVM in most of the performance measures.

Table 6: Comparison of the proposed Droid-HESVMGA and Droid-HESVMPSO against popular machine learning classifiers used in Android malware detection
| Classifier | CCR (%) | TPR (%) | FPR (%) | FNR (%) | AUC (%) |
|---|---|---|---|---|---|
| BPNN | 87.80 | 91.20 | 15.60 | 8.80 | 95.40 |
| NB | 74.80 | 99.20 | 49.60 | 0.80 | 70.60 |
| RF | 88.80 | 89.60 | 12.00 | 10.40 | 97.30 |
| kNN | 86.80 | 77.60 | 4.00 | 22.40 | 95.70 |
| mySVM | 82.00 | 85.60 | 21.60 | 14.40 | 96.10 |
| LibSVM | 88.20 | 91.60 | 15.20 | 8.40 | 96.20 |
| Proposed Droid-HESVMGA | 95.60 | 98.00 | 6.80 | 2.00 | 96.90 |
| Proposed Droid-HESVMPSO | 94.80 | 98.00 | 8.40 | 2.00 | 96.00 |

As can be observed from Table 6, the proposed Droid-HESVMGA and Droid-HESVMPSO achieved much better CCR than other machine learning classifiers used in Android malware detection. In particular, the proposed Droid-HESVMGA produced the highest CCR (95.60%), followed by Droid-HESVMPSO (94.80%), among the other machine learning classifiers. This indicates that the proposed Droid-HESVMGA and Droid-HESVMPSO were able to correctly detect both malware and benign apps with respect to all the Android apps.
In terms of other measures, the results shown in Table 6 demonstrate that the proposed Droid-HESVMGA and Droid-HESVMPSO also achieved better performance in both TPR and FPR compared to most of the other machine learning classifiers used in Android malware detection. Actually, there is a trade-off between TPR and FPR. Therefore, a balanced performance between TPR and FPR should be provided in a good malware detection system.
Although the highest TPR was accomplished by NB, NB also produced the poorest FPR among the machine learning classifiers used in this study. This was because NB produced unbalanced detection between malware and benign apps, which negatively affected the overall accuracy and AUC of NB used in Android malware detection. On the other hand, the proposed Droid-HESVMGA and Droid-HESVMPSO produced balanced detection performance between the positive and negative classes in both TPR and FPR. This indicates that the proposed Droid-HESVMGA and Droid-HESVMPSO were able to precisely detect both malware and benign apps. Consequently, the proposed Droid-HESVMGA and Droid-HESVMPSO achieved better performance in terms of the overall accuracy (CCR), TPR, FPR and AUC compared to most of the other machine learning classifiers.
In addition, Table 6 presents the performance in terms of FNR for the proposed Droid-HESVMGA and Droid-HESVMPSO compared to other machine learning classifiers. FNR is also an important measure in Android malware detection, since it calculates the rate of malware apps classified as benign out of total malware apps. It can be seen in Table 6 that the proposed Droid-HESVMGA and Droid-HESVMPSO achieved lower FNR compared to most of the other machine learning classifiers used in Android malware detection. Only 2% of malware apps were incorrectly classified as benign apps by the proposed Droid-HESVMGA and Droid-HESVMPSO.

7.4 Comparison Against Other Hybrid Android Malware Detection Works

In this section, the proposed Droid-HESVMGA and Droid-HESVMPSO were compared with other existing hybrid malware detection approaches, which combined several algorithms into classifiers to enhance the performance of malware detection. The proposed Droid-HESVMGA and Droid-HESVMPSO were compared to other previous works: evolving hybrid neuro-fuzzy classifier (EHNFC) [20], dynamic evolving fuzzy inference system (DENFIS) [20, 58] and adaptive fuzzy inference system with triangular membership function (TRIMF–ANFIS) [20]. For a fair comparison, the proposed Droid-HESVMGA and Droid-HESVMPSO were trained and then evaluated using the same dataset used in these previous works.
The results in Table 7 clearly depict the overall classification accuracy (CCR), TPR, FPR, FNR and AUC for the proposed Droid-HESVMGA and Droid-HESVMPSO compared to those of EHNFC, DENFIS, and TRIMF–ANFIS.

Table 7: Comparison of the proposed Droid-HESVMGA and Droid-HESVMPSO against other hybrid Android malware detection works
| Classifier | CCR (%) | TPR (%) | FPR (%) | FNR (%) | AUC (%) |
|---|---|---|---|---|---|
| EHNFC | 90.00 | 88.24 | 5.00 | 5.00 | 95.00 |
| DENFIS | 82.20 | 87.50 | 19.05 | 12.50 | 92.20 |
| TRIMF–ANFIS | 88.00 | 78.95 | 11.11 | 21.05 | 93.00 |
| Proposed Droid-HESVMGA | 95.60 | 98.00 | 6.80 | 2.00 | 96.90 |
| Proposed Droid-HESVMPSO | 94.80 | 98.00 | 8.40 | 2.00 | 96.00 |

In terms of CCR, the results in Table 7 show that the proposed Droid-HESVMGA accomplished the highest accuracy (95.60%), followed by the proposed Droid-HESVMPSO (94.80%), EHNFC (90.00%), TRIMF–ANFIS (88.00%), and DENFIS (82.20%).
In terms of TPR, FPR, and FNR, the proposed Droid-HESVMGA and Droid-HESVMPSO achieved much better TPR than EHNFC, DENFIS, and TRIMF–ANFIS. Furthermore, the lowest FNR (only 2.00%) was accomplished by the proposed Droid-HESVMGA and Droid-HESVMPSO. Meanwhile, the proposed Droid-HESVMGA and Droid-HESVMPSO produced lower FPR compared to the FPRs obtained by DENFIS and TRIMF–ANFIS. This was primarily due to the capability of the proposed Droid-HESVMGA and Droid-HESVMPSO to successfully detect both malware and benign apps. On the
[16] M. V. Varsha, P. Vinod, and K. A Dhanya, “Identification of malicious Android app using manifest and opcode features,” J. Comput. Virol. Hacking Tech., vol. 13(2), pp. 125–138, 2017.
[17] M. L. Dantas Dias, and A. R. R. Neto, “Evolutionary support vector machines: A dual approach,” In 2016 IEEE Congress on Evolutionary Computation, pp. 2185–2192, 2016.
[18] I. Mierswa, “Evolutionary Learning with Kernels: A Generic Solution for Large Margin Problems,” In Proceedings of the 8th annual conference on Genetic and evolutionary computation - GECCO ’06 (p. 1553), 2006.
[19] W. Wang, Y. Li, X. Wang, J. Liu, and X. Zhang, “Detecting Android malicious apps and categorizing benign apps with ensemble of classifiers,” Future Generation Computer Systems, vol. 78, pp. 987–994, 2018.
[20] A. Altaher, “An improved Android malware detection scheme based on an evolving hybrid neuro-fuzzy classifier (EHNFC) and permission-based features,” Neural Computing and Applications, vol. 28(12), pp. 4147–4157, 2017.
[21] A. T. Kabakus, and I. A. Dogru, “An in-depth analysis of Android malware using hybrid techniques,” Digital Investigation, vol. 24, pp. 25–33, 2018.
[22] C. Zhao, C. Wang, and W. Zheng, “Android Malware Detection Based on Sensitive Permissions and APIs,” In International Conference on Security and Privacy in New Computing Environments (SPNCE), pp. 96–104, 2019.
[23] E. M. B. Karbab, M. Debbabi, A. Derhab, and D. Mouheb, “MalDozer: Automatic framework for android malware detection using deep learning,” Digital Investigation, vol. 24, pp. S48–S59, 2018.
[24] A. Shubair, and A. Altaher, “Intelligent Approach for Android Malware Detection,” KSII Transactions on Internet and Information Systems, vol. 9(8), pp. 2964–2983, 2015.
[25] K. Tam, A. Feizollah, N. B. Anuar, R. Salleh, and L. Cavallaro, “The Evolution of Android Malware and Android Analysis Techniques,” ACM Computing Surveys, vol. 49(4), pp. 1–41, 2017.
[26] H. S. Ham, and M. J. Choi, “Analysis of Android malware detection performance using machine learning classifiers,” In International Conference on ICT Convergence, 2013. https://doi.org/10.1109/ICTC.2013.6675404
[27] A. Guerra, “Application of Full Machine Learning Workflow for Malware Detection in Android on the Basis of System Calls and Permissions,” MS Thesis, Tallinn University of Technology, School of Information Technologies, 2018. See also URL https://digi.lib.ttu.ee/i/?10770
[28] Yerima, S. Sezer, and G. McWilliams, “Analysis of Bayesian classification-based approaches for Android malware detection,” IET Information Security, vol. 8(1), pp. 25–36, 2014.
[29] McAfee, McAfee Labs Threats Report, 2018. On the WWW, URL https://www.mcafee.com/enterprise/en-us/assets/reports/rp-quarterly-threats-sep-2018.pdf
[30] N. Peiravian, and X. Zhu, “Machine learning for Android malware detection using permission and API calls,” In Proceedings - International Conference on Tools with Artificial Intelligence, ICTAI, pp. 300–305, 2013.
[31] Android, Manifest.permission | Android Developers, 2019. On the WWW, URL https://developer.android.com/reference/android/Manifest.permission#summary
[32] F. Idrees, M. Rajarajan, M. Conti, T. M. Chen, and Y. Rahulamathavan, “PIndroid: A novel Android malware detection system using ensemble learning methods,” Computers and Security, vol. 68, pp. 36–46, 2017.
[33] Google, Control your app permissions on Android 6.0 and up - Google Play Help, 2019. On the WWW, URL https://support.google.com/googleplay/answer/6270602?hl=en
[34] V. Vapnik, “The nature of statistical learning theory,” (2nd edition), New York: Springer, 1995.
[35] J. C. Platt, “Fast training of support vector machines using sequential minimal optimization,” In Advances in Kernel Methods - Support Vector Learning. Cambridge, MA, USA: MIT Press, 1999.
[36] J. K. Anlauf and M. Biehl, “The adatron: An adaptive perceptron algorithm,” Europhysics Letters, vol. 10(7), pp. 687, 1989.
[37] T.-T. Frieß, N. Cristianini, and C. Campbell, “The kernel-adatron algorithm: a fast and simple learning procedure for support vector machines,” In Machine Learning: Proceedings of the Fifteenth International Conference (ICML’98), Citeseer, pp. 188–196, 1998.
[38] P. E. Gill, W. Murray, and M. H. Wright, Practical optimization, 1981.
[39] D. E. Goldberg, “Genetic Algorithms in Search Optimization and Machine Learning,” Addison-Wesley, 1989.
[40] J. Kennedy and R. Eberhart, “Particle swarm optimization,” In IEEE International Conference on Neural Networks, pp. 1942–1948, 1995.
[41] B. Chakraborty, “Evolutionary Computational Approaches to Feature Subset Selection,” International Journal of Soft computing and Bioinformatics, vol. 1(2), pp. 59–65, 2010.
[42] A. Kawamura, and B. Chakraborty, “A hybrid approach for optimal feature subset selection with evolutionary algorithms,” Proceedings - 2017 IEEE 8th International Conference on Awareness Science and Technology, ICAST 2017, pp. 564–568, 2018.
[43] M-Y Cho, and T. T. Hoang, “Feature Selection and Parameters Optimization of SVM Using Particle Swarm Optimization for Fault Classification in Power Distribution Systems,” Computational Intelligence and Neuroscience, Article ID 4135465, 9 pages, 2017.
[44] L. M. Abualigah, A. T. Khader, and E. S. Hanandeh, “A new feature selection method to improve the document clustering using particle swarm optimization algorithm,” Journal of Computational Science, vol. 25, pp. 456–466, 2018.
[45] J. Wei, Z. Jian-Qi, and Z. Xiang, “Face recognition method based on support vector machine and particle swarm optimization,” Expert Systems with Applications, vol. 38(4), pp. 4390–4393, 2011.
[46] D. O’Neill, A. Lensen, B. Xue, and M. Zhang, “Particle Swarm Optimisation for Feature Selection and Weighting in High-Dimensional Clustering,” In 2018 IEEE Congress on Evolutionary Computation (CEC), 2018.
[47] Google Play, Google Play Store, 2019. On the WWW, URL https://play.google.com/store?hl=en