A Hybrid Approach for Android Malware Detection
Abstract
With the increase in the popularity of mobile devices, malicious applications targeting the Android platform have greatly increased. Malware is now written so carefully that it has become very difficult to identify. The large number of new malware samples appearing every day has made manual approaches inadequate for detecting malware. Modern malware is characterized by sophisticated and complex obfuscation techniques, so static malware analysis alone is not enough to detect it. Dynamic malware analysis, on the other hand, is suited to tackling evasion techniques, but it cannot investigate all execution paths and is very time-consuming. Therefore, for better detection and classification of Android malware, we propose a hybrid approach that integrates the features obtained from static and dynamic malware analysis. This approach tackles the problem of analyzing, detecting and classifying Android malware more efficiently. In this paper, we use a robust set of features from static and dynamic malware analysis to create two datasets, i.e. a binary and a multiclass (family) classification dataset. These are made publicly available on GitHub and Kaggle with the aim of helping researchers and anti-malware tool creators to enhance existing techniques and tools, or develop new ones, for detecting and classifying Android malware. Various machine learning algorithms are employed to detect and classify malware using the features extracted from static and dynamic malware analysis. The experimental outcomes indicate that the hybrid approach improves the accuracy of detection and classification of Android malware compared with using static or dynamic features alone.

Keywords
Android Malware, Dynamic Malware Analysis, Machine Learning, Static Malware Analysis.

DOI: 10.9781/ijimai.2020.09.001
I. Introduction
31 million Android malware samples were found in 2018, and also shows that
approximately 1.9 million new samples are identified every year [5]. As
Regular Issue
accuracy, the features acquired from both static and dynamic analysis can be integrated [8]. Moreover, only a limited number of benchmark datasets are publicly available to evaluate proposed machine learning techniques.

In this paper, we work on both detection and family classification of Android malware. Here, detection refers to a binary classification problem with two classes, "malware" and "benign", and family classification refers to a multiclass classification problem with 13 malicious families. An Android malware family is a group of malicious programs that share common behavior and are generated from the same source code. We propose a hybrid approach for the detection and classification of malicious Android apps, based on the fusion of static and dynamic malware analysis. First, we perform static malware analysis to extract static features based on API calls, command strings, permissions and intents. Then, we perform dynamic malware analysis to extract features using CuckooDroid [9], an extension of the Cuckoo sandbox used for automatic analysis of suspicious Android files [10]. The features considered for dynamic malware analysis are based on cryptographic operations, dynamic permissions, information leaks and system calls. To improve accuracy, we integrate the features acquired from both static and dynamic malware analysis. Because irrelevant, noisy and redundant features may be present, an information gain ranking algorithm is applied to select the relevant features.

A. Research Contributions
The major contributions of the paper are as follows:
1. Two datasets, i.e. binary and multiclass (family) classification datasets, are created using static and dynamic malware analysis and shared publicly on GitHub and Kaggle.
2. A feature selection method is used to choose the appropriate set of features for both datasets.
3. The relevant features selected from static and dynamic malware analysis are integrated.
4. Machine learning (ML) algorithms belonging to different categories are employed and evaluated on both datasets for static, dynamic and integrated features.

B. Organization
The rest of the paper is structured as follows: Section II summarizes the related work on classification and identification of Android malware. Section III describes the proposed methodology. Section IV presents the experimental outcomes based on different evaluation parameters. Section V concludes the paper and outlines the future scope.

II. Related Work
In the literature, researchers have developed various novel techniques for identification and classification of Android malware using ML methods. Current malware identification methods fall under two categories, i.e. static and dynamic malware analysis [11]. This section discusses work on malware detection and classification based on static and dynamic malware analysis using ML methods.

A. Static Malware Analysis
Static malware analysis discovers malicious patterns in an app by examining its code. To find the malicious patterns [12], it uses disassembly techniques to decompile the app source code [13]. This subsection covers research on static malware analysis that focuses on detection and classification of Android malware.

Li et al. [14] suggested a malware identification system known as Significant Permission Identification (SigPID). They built three levels of pruning by extracting permission data to determine the relevant permissions that can be used to distinguish between malware and benign apps. The authors employed ML methods to classify the Android apps. The experimental results show that SigPID performs better than existing approaches, with 93.62% accuracy. In [15], the authors suggested a highly efficient method to extract API calls, permission-rate, surveillance system events and permissions as features. They constructed a model based on an ensemble Rotation Forest to identify whether an app is malicious or benign. The results demonstrate that the proposed approach obtained the highest precision of 88.16%, with 88.26% accuracy at a sensitivity of 88.40%. Yerima and Sezer [16] introduced a novel fusion technique (DroidFusion) which combines various ML techniques to improve accuracy. DroidFusion creates a model by training classifiers, and then a feature ranking algorithm is applied to the predictive accuracies to obtain a final classifier. The results indicate that DroidFusion outperforms the stacking ensemble method. In [17], the authors presented a multimodal deep-learning-based framework for the identification of Android malware. They extracted diverse features and refined them using similarity-based or existence-based methods. The results show that the multimodal deep learning framework achieves 98% accuracy. Feizollah et al. [18] analyzed the usefulness of intents for classifying malicious apps. They reported that intents are a more important feature than permissions for malware classification. The results demonstrate that the detection rates of intents and permissions are 91% and 83% respectively. The authors also report that the detection accuracy of the combined features is 95.5%, which is higher than that of either feature type alone. In [19], the authors explored permission-based risks in Android apps. They applied the T-test, correlation coefficient and mutual information to rank individual permissions according to their risk. Principal component analysis and sequential forward selection were employed to determine subsets of risky permissions. They evaluated the effectiveness of risky permissions for malapp detection with Decision Tree (DT), Support Vector Machine (SVM) and Random Forest (RF). The results indicate that the detection accuracy of the malapp detector is 94.62% with a 0.6 False Positive Rate (FPR). Dhalaria et al. [20] performed a comparative analysis of different base classifiers, such as SVM, Logistic Regression (LR), Naive Bayes (NB), K-Nearest Neighbor (K-NN), DT and RF, and ensemble techniques (Bagging, Stacking and Boosting). The experimental results demonstrate that the stacking ensemble technique outperforms the base classifiers. Dhalaria et al. [21] employed a convolutional neural network (CNN) to classify malicious Android apps. Grayscale images are created from the classes.dex and AndroidManifest.xml files extracted from the Android package. The experimental results indicate that the classes.dex file performs better than AndroidManifest.xml.

Static malware analysis analyzes code quickly, but it fails against code obfuscation techniques and morphed malware. Dynamic malware analysis overcomes these limitations of static malware analysis.

B. Dynamic Malware Analysis
Dynamic malware analysis executes the samples in a runtime environment, such as an emulator or a virtual machine, to track the behavior of the app. This subsection covers the literature on detection and classification of Android malware using dynamic malware analysis.

Cai et al. [22] presented a novel classification approach (DroidCat) based on dynamic analysis. The authors used a set of dynamic features such as method calls, app resources and Inter-Component Communication. The experimental outcomes indicate that DroidCat
International Journal of Interactive Multimedia and Artificial Intelligence, Vol. 6, Nº6
obtained 97% accuracy and F-measure for classifying malicious Android apps. In [23], the authors proposed a dynamic analysis framework, EnDroid, which uses different types of dynamic features for the identification of malware. They employed a chi-square algorithm to select the relevant features and applied an ensemble learning technique to differentiate between malware and benign apps. Das et al. [24] proposed a frequency-centric model for feature construction using system calls to identify malware effectively. The authors built an ML method using a Multilayer Perceptron (MLP) in an FPGA to train a classifier. They found that the proposed approach achieved low power consumption, fast detection and high accuracy. In [25], the authors presented TaintDroid, a dynamic taint tracking system capable of continuously tracking multiple sources of sensitive data. As a result, it provides security service firms and Android users with essential input for identifying malicious apps. Chen et al. [26] presented a framework that uses a classification scheme named Model-Based Semi-Supervised (MBSS). The authors also compared their approach with existing approaches such as K-NN, Linear Discriminant Analysis (LDA) and SVM. The results indicate that the proposed approach achieves 98% accuracy at a very low FPR. In [27], the authors designed and implemented a dynamic analysis method named DroidTrace. It examines the system calls executed in dynamic payloads. DroidTrace also carries out physical modification to trigger numerous dynamic loading behaviors within an app.

Dynamic malware analysis can detect unfamiliar malware that static analysis cannot, but it takes more time and resources. Moreover, it explores only a single execution path.

C. Hybrid Malware Analysis
Gandotra et al. [8] suggested that a single approach, either dynamic or static, is not sufficient for accurately classifying malware due to obfuscation and execution stalling. To overcome this problem, researchers have started to use a hybrid analysis approach. This subsection covers work in the field of hybrid malware analysis that focuses on detection and classification of Android malware.

Yuan et al. [28] introduced an engine named DroidDetector which automatically characterizes an app as either malware or benign. The authors extracted features using static and dynamic analysis. The experimental results demonstrate that DroidDetector obtained the highest accuracy, 96.76%, when compared with conventional ML techniques. In [29], the authors proposed a hybrid approach for identification of malware using static and dynamic analysis. They created normal and malicious pattern sets by matching the patterns of benign and malware apps with each other. To classify an unknown app, the authors compared it offline with both the normal and malicious pattern sets. The results demonstrate that the proposed approach obtained a better detection rate. Martin et al. [30] presented the OmniDroid dataset, consisting of 22,000 malware and benign samples. They developed a framework for static and dynamic analysis of apps and applied ensemble learning classifiers for identification of malicious apps. In [31], the authors presented the Android Application Sandbox (AASandbox), which is capable of carrying out both dynamic and static analysis to identify malicious apps. To provide distributed and fast detection, they deployed the detection algorithm and sandbox in the cloud. The results show that AASandbox is more efficient than the antivirus apps available for Android OS.

From the literature survey, it is found that the hybrid approach classifies Android apps more accurately. However, although a lot of work has been reported on detection (binary classification) of Android apps using the hybrid approach, little attention has been paid to family classification of Android malware.

Moreover, only two benchmark datasets, i.e. Malgenome [3] and Drebin [32], have been made public over the past few years. These datasets contain old Android apps and were created in 2012 and 2014 respectively. Nowadays, however, evolving malware is so sophisticated and complex that it cannot be recognized easily. This paper presents the approach used to create our own datasets. These consist of recent Android apps, and we have made them publicly available on GitHub and Kaggle. They should help the research community evaluate proposed ML techniques for malware classification. Different machine learning algorithms are employed on these two datasets to perform binary and family classification of Android apps when both static and dynamic features are integrated.

III. Proposed Methodology
This section discusses the proposed methodology for detection and family classification of Android apps. It consists of three phases, i.e. data collection, data preparation, and detection & family classification. In the first phase, data is collected from various sources such as virusshare [33], apkmirror [34] and apkpure [35]. In the second phase, MD5 hashing is applied to remove duplicate apps, and the remaining apps are examined using the Avira Antivirus (AV) tool [36]. Static and dynamic malware analysis is then performed to extract features from the Android apps. Static features are extracted using a self-developed Python script which uses multiple automated tools such as Baksmali Disassembler [37], String [38] and AXMLPrinter2 [39]. The features extracted using static malware analysis include API calls, command strings, permissions and intents. Dynamic features are extracted using CuckooDroid [9], which analyzes the behavior of an app at runtime. The features extracted using dynamic malware analysis include dynamic permissions, cryptographic operations, information leaks and system calls. After feature extraction, an information gain feature ranking algorithm is employed to remove noisy, irrelevant and redundant features. Various ML classifiers such as SVM, DT, RF, NB, K-NN, PART and MLP are employed to identify and classify the Android apps. Fig. 1 shows the workflow of the proposed methodology.

A. Data Collection (Phase-I)
The initial phase of the proposed methodology is data collection. The Android apps are collected from multiple sources such as apkpure, apkmirror and virusshare. These apps are stored in the Android application package (.apk) file format. A total of 4400 recent Android apps are downloaded from these sources. The malicious apps are downloaded from virusshare after registering with the website and obtaining permission from the administrator. The benign apps are collected from apkpure and apkmirror.

B. Data Preparation (Phase-II)
This subsection discusses the steps used for data preparation: removing duplicate applications, labelling, feature extraction and feature selection.
1. Removing Duplicate Applications
The MD5 hash algorithm is applied to the collected Android apps to eliminate duplicates. After removing the duplicates, 3547 Android apps remain.
2. Labelling
The unique Android apps obtained from the previous step are scanned using the Avira Antivirus (AV) tool for labelling. After labelling, out of 3547 apps, 1747 are malicious and 1800 are benign. Furthermore, the 1747 malicious apps are labelled into 13 malware families, as shown in Fig. 2.
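The duplicate-removal step of data preparation can be illustrated with a short Python sketch using only the standard library. It is not the authors' script: the function names `md5_of_file` and `deduplicate_apks` are hypothetical, and the sketch assumes the downloaded apps are stored on disk as .apk files. Because the digest is computed over the raw bytes, only byte-identical packages count as duplicates; a repackaged app that differs by even one byte would be kept.

```python
import hashlib
import tempfile
from pathlib import Path


def md5_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 digest of a file, reading it in fixed-size chunks."""
    digest = hashlib.md5()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def deduplicate_apks(apk_paths):
    """Keep the first file seen for each MD5 digest; byte-identical copies are dropped."""
    unique = {}
    for raw_path in apk_paths:
        path = Path(raw_path)
        unique.setdefault(md5_of_file(path), path)
    return list(unique.values())


if __name__ == "__main__":
    # Toy demo with three fake "apps", two of which are byte-identical.
    with tempfile.TemporaryDirectory() as workdir:
        root = Path(workdir)
        (root / "a.apk").write_bytes(b"same bytes")
        (root / "b.apk").write_bytes(b"same bytes")
        (root / "c.apk").write_bytes(b"other bytes")
        kept = deduplicate_apks(sorted(root.glob("*.apk")))
        print([p.name for p in kept])  # ['a.apk', 'c.apk']
```

Reading in chunks keeps memory use bounded even for large packages, and `setdefault` makes the first occurrence of each digest the one that survives.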
[Fig. 2: bar chart of the number of applications (x-axis, 0 to 450) in each of the 13 malware families: Adware/ANDR.Fengvi.B.Gen, Adware/ANDR.Dianjin.A.Gen, Adware/ANDR.Waps.I.Gen, Android/TrojanSMS.Boxer.B.Gen, Android/SmsAgent.AAV.Gen, Android/Plankton.C.Gen, Android/MTK.F.Gen, Android/Mseg.E.Gen, Adware/ANDR.Mobwin.A.Gen, Adware/ANDR.Kuguo.K.Gen, Adware/ANDR.AdsWo.CG.Gen, Adware/ANDR.AdsMogo.FAN.Gen, Android/AdLoad.A.Gen.]
Fig. 2. Graphical representation of Android malware families.

3. Feature Extraction
Various features are extracted using static and dynamic malware analysis. In static malware analysis, we extract four types of static features, i.e. API calls, intents, permissions and command strings, using a self-developed Python script which uses several automated tools such as Baksmali Disassembler, AXMLPrinter2 and string. In dynamic malware analysis, we extract four types of dynamic features, i.e. cryptographic operations, dynamic permissions, information leaks and system calls, using CuckooDroid (an Android malware analysis tool). Feature extraction using static and dynamic malware analysis is described in detail below.

a) Using Static Malware Analysis
Static analysis is performed without executing the code. It uses various disassembly techniques to decompile the app source code. To extract the static features, we developed a Python script which uses various automated tools, i.e. Baksmali Disassembler, AXMLPrinter2 and string. The features extracted using these tools are API calls, permissions, intents and command strings. The feature extraction process is shown in Fig. 3. An .apk file is stored in compressed zip format, so to view its content we first need to unzip (unpack) it. The .apk file consists of the classes.dex file, the AndroidManifest.xml file, and the res, lib and assets folders. From these, we extracted four types of static features using different static tools. The classes.dex file contains information about API calls, the AndroidManifest.xml file contains information about permissions and intents, and the remaining folders contain information about command strings. These features are selected on the basis of existing literature and the official Android site, which indicate that these specific features are more prominent in malicious applications [16], [40].
• API calls: API calls are used to interact with the device. They comprise the methods, classes and packages that help developers build apps. Android apps are written in the Java programming language; the Java compiler converts the source code into Java bytecode, which is then converted into Dalvik bytecode (classes.dex) for the Dalvik Virtual Machine (DVM). Disassembling this bytecode gives information about packages, methods and classes. A total of 47 API calls are extracted using a self-developed Python script after decompiling classes.dex with Baksmali Disassembler.
• Permissions: The main purpose of permissions is to protect user privacy. Apps must request permission to access sensitive user information and system features. The system may grant some permissions itself or prompt the user to accept the request. Permissions are declared in AndroidManifest.xml. A total of 277 permissions are extracted using a self-developed Python script after decoding AndroidManifest.xml with AXMLPrinter2.
• Command strings: Command strings are another static feature used for identification of Android malware. The script analyzes the command strings present in the lib, res and assets folders. A total of 6 command strings are extracted using a self-developed Python script after processing the lib, res and assets folders with string.
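The permission features described above can be turned into a fixed-length vector as in the following sketch. It assumes that AndroidManifest.xml has already been decoded to plain-text XML (the copy inside an .apk is binary XML, which is why a decoder such as AXMLPrinter2 is needed first) and that each feature is a 0/1 presence indicator. The function names, the sample manifest and the three-permission vocabulary are hypothetical; the paper's vocabulary holds 277 permissions.

```python
import xml.etree.ElementTree as ET

# In a decoded manifest, android:name is stored under the Android XML namespace URI.
ANDROID_NAME = "{http://schemas.android.com/apk/res/android}name"


def requested_permissions(manifest_xml: str) -> set:
    """Return the android:name of every <uses-permission> element in plain-text manifest XML."""
    root = ET.fromstring(manifest_xml)
    names = (elem.get(ANDROID_NAME) for elem in root.iter("uses-permission"))
    return {name for name in names if name}


def permission_vector(manifest_xml: str, vocabulary: list) -> list:
    """Binary presence vector over a fixed, ordered permission vocabulary."""
    requested = requested_permissions(manifest_xml)
    return [1 if perm in requested else 0 for perm in vocabulary]


# Hypothetical decoded manifest used only for illustration.
SAMPLE_MANIFEST = """<manifest xmlns:android="http://schemas.android.com/apk/res/android"
          package="com.example.demo">
  <uses-permission android:name="android.permission.INTERNET"/>
  <uses-permission android:name="android.permission.SEND_SMS"/>
  <application android:label="Demo"/>
</manifest>"""

VOCABULARY = [
    "android.permission.INTERNET",
    "android.permission.READ_CONTACTS",
    "android.permission.SEND_SMS",
]

if __name__ == "__main__":
    print(permission_vector(SAMPLE_MANIFEST, VOCABULARY))  # [1, 0, 1]
```

Keeping the vocabulary as an ordered list guarantees that every app's vector has the same length and column order, which the classifiers require.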
[Table: Features | Examples | Number of features | Feature value]
[Table: Features | Examples | Number of features | Feature value]
in reducing the space and time complexity and also help in increasing the accuracy. In this work, we employ an information gain feature ranking algorithm [44] to select the relevant features for better detection and classification of Android malware. Information gain measures the amount of information a feature provides about the class. It uses entropy to measure the homogeneity of samples. The entropy H(X) of the dataset X (having c classes) is calculated as given in equation (1).

$$H(X) = -\sum_{i=1}^{c} p_i \log_2 p_i \qquad (1)$$

where p_i is the probability of class i in the dataset X. The dataset is then split on the different attributes A. The entropy of the dataset with respect to attribute A, i.e. H(X, A), is calculated using equation (2).

$$H(X, A) = \sum_{j=1}^{k} \frac{|X_j|}{|X|} H(X_j) \qquad (2)$$

Here k represents the number of possible values of attribute A, and X_j is the subset of X in which A takes its j-th value.

The information gain achieved by an attribute is expressed as shown in equation (3). The greater the Information Gain (IG) of a feature, the more important the feature is.

$$IG(X, A) = H(X) - H(X, A) \qquad (3)$$

The information gain method assigns a rank and a weight to each feature. We discard attributes with zero weight. Thus, out of 352 features, we are left with 110 static features for the binary classification dataset (named Dataset-1) and 47 static features for the family classification dataset (named Dataset-2). Fig. 5 and Fig. 6 show the top 20 selected attributes for the detection (Dataset-1) and family classification (Dataset-2) datasets respectively.

The datasets created using dynamic malware analysis consist of 323 features. Out of these 323 features, 99 dynamic features remain in Dataset-1 and 35 in Dataset-2. Fig. 7 and Fig. 8 show the top 20 selected dynamic features for the detection (Dataset-1) and family classification (Dataset-2) datasets respectively.

A summary of Dataset-1 and Dataset-2 before and after feature selection is given in Table III. Fig. 9 shows the steps for preparing these two datasets.

TABLE III. Description of Dataset (Where, # Stands for Number of)
Dataset Name | #Benign apps | #Malicious apps | #Feature extracted (Static) | #Feature extracted (Dynamic) | #Feature selected (Static) | #Feature selected (Dynamic)
Dataset-1 | 1800 | 1747 | 352 | 323 | 110 | 99
Dataset-2 | ----- | 1747 (with 13 families) | 352 | 323 | 47 | 35
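Equations (1) to (3) can be checked with a small pure-Python implementation. This is a sketch under stated assumptions, not the ranking tool cited as [44]: it treats every feature as discrete (for example binary presence values), uses log base 2, and mirrors the zero-weight filtering described above. The function names and the feature names in the demo are hypothetical.

```python
import math
from collections import Counter


def entropy(labels) -> float:
    """Equation (1): H(X) = -sum_i p_i * log2(p_i) over the class distribution."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def conditional_entropy(values, labels) -> float:
    """Equation (2): sum_j |X_j| / |X| * H(X_j), where X_j holds the samples with A == v_j."""
    total = len(labels)
    result = 0.0
    for v in set(values):
        subset = [y for x, y in zip(values, labels) if x == v]
        result += len(subset) / total * entropy(subset)
    return result


def information_gain(values, labels) -> float:
    """Equation (3): IG(X, A) = H(X) - H(X, A)."""
    return entropy(labels) - conditional_entropy(values, labels)


def rank_features(columns: dict, labels, tol: float = 1e-12):
    """Score every feature, drop zero-weight ones, and sort the rest by decreasing IG."""
    scores = {name: information_gain(col, labels) for name, col in columns.items()}
    kept = [(name, score) for name, score in scores.items() if score > tol]
    return sorted(kept, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Toy data: 4 apps, 3 hypothetical binary features.
    y = ["malware", "malware", "benign", "benign"]
    X = {
        "SEND_SMS": [1, 1, 0, 0],  # perfectly separates the classes
        "INTERNET": [1, 1, 1, 1],  # constant, so it carries no information
        "CAMERA": [1, 0, 1, 0],    # independent of the label in this toy set
    }
    print(rank_features(X, y))  # [('SEND_SMS', 1.0)]
```

The small tolerance guards against floating-point residue, so features whose gain is zero up to rounding are discarded, just as zero-weight attributes are in the text.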
[Bar charts: Weights (y-axis) of the top 20 selected Features (x-axis).]
Fig. 5. Top 20 selected static features for detection dataset (Dataset-1).
Fig. 6. Top 20 selected static features for family classification dataset (Dataset-2).
Fig. 8. Top 20 selected dynamic features for family classification dataset (Dataset-2).
• PART is a partial decision tree algorithm. It is a separate-and-conquer rule learner that produces a set of rules known as a decision list. A new sample is compared with each rule in turn and is assigned the class of the first matching rule [50].
• Multilayer Perceptron (MLP), also called a multilayer neural network [51], consists of an input layer, one or more hidden layers and an output layer, which may have several output units. The outputs of each hidden layer become the inputs of the next layer. Semwal et al. [52], [53] worked on different classification problems using deep learning techniques such as DNN-based classifiers and ANNs. In [54], the authors used the Extreme Learning Machine (ELM) for classification and prediction of gait data. In our work, we apply MLP for detection and classification of Android malware. We run the MLP with h = 3 and h = 5 for the hidden layer for Dataset-1 and Dataset-2 respectively. The activation functions used for Dataset-1 and Dataset-2 are sigmoid and softmax respectively. The learning rate is set to 0.3. Fig. 10 shows the general framework of backpropagation in a neural network [53].

[Fig. 10: backpropagation framework of a neural network, with connections weighted by W_ij.]

(5)

• Precision: It is defined as the number of true positive instances divided by the total number of instances predicted as positive, where TP, TN, FP and FN denote true positives, true negatives, false positives and false negatives. It is computed as shown in equation (6).

$$\text{Precision} = \frac{TP}{TP + FP} \qquad (6)$$

• F-measure: It is the harmonic mean of recall and precision. It is calculated as shown in equation (7).

$$F\text{-measure} = \frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}} \qquad (7)$$

• Accuracy: It is the number of true positive and true negative instances divided by the total number of instances. It is calculated as shown in equation (8).

$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \qquad (8)$$

• MCC: It measures the quality of binary classifications. Its value lies between -1 and +1, where -1 indicates a completely inverse prediction and +1 a perfect prediction. It is calculated as shown in equation (9).

$$MCC = \frac{TP \times TN - FP \times FN}{\sqrt{(TP + FP)(TP + FN)(TN + FP)(TN + FN)}} \qquad (9)$$

• AUC: The area under the ROC curve (AUC) is one of the most significant parameters for measuring the performance of classification models. It measures how well the model separates the classes.
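The binary measures in equations (6) to (9), together with the TPR and FPR reported in Table X, can be computed from a confusion matrix as sketched below. The function names are hypothetical, malware is assumed to be the positive class, and a zero denominator is mapped to 0.0 by convention; the counts in the demo are made up and are not taken from the paper's experiments.

```python
import math


def _ratio(num: float, den: float) -> float:
    """Division that returns 0.0 when the denominator is zero."""
    return num / den if den else 0.0


def binary_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Evaluation measures from a binary confusion matrix (malware = positive class)."""
    precision = _ratio(tp, tp + fp)                     # equation (6)
    recall = _ratio(tp, tp + fn)                        # recall = true positive rate (TPR)
    f_measure = _ratio(2 * precision * recall, precision + recall)  # equation (7)
    accuracy = _ratio(tp + tn, tp + tn + fp + fn)       # equation (8)
    fpr = _ratio(fp, fp + tn)                           # false positive rate
    mcc = _ratio(tp * tn - fp * fn,
                 math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)))  # equation (9)
    return {"TPR": recall, "FPR": fpr, "Precision": precision,
            "F-measure": f_measure, "Accuracy": accuracy, "MCC": mcc}


if __name__ == "__main__":
    # Hypothetical counts for illustration only.
    for name, value in binary_metrics(tp=90, fp=10, tn=80, fn=20).items():
        print(f"{name}: {value:.4f}")
```

For the 13-family Dataset-2, these quantities would have to be computed per family and then averaged; the averaging scheme is a separate choice that this sketch does not cover.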
[Fig. 11 panels: (a) Accuracy (%) and (b) MCC of the SVM, DT, NB, RF, K-NN, PART and MLP classifiers.]
Fig. 11. Comparison of different classifiers based on (a) Accuracy (b) MCC using static features for Dataset-1.
[Fig. 13 panels: (a) Accuracy (%) and (b) MCC of the SVM, DT, NB, RF, K-NN, PART and MLP classifiers.]
Fig. 13. Comparison of different classifiers based on (a) Accuracy (b) MCC using dynamic features for Dataset-1.
[Fig. 15 panels: (a) Accuracy (%) and (b) MCC of the SVM, DT, NB, RF, K-NN, PART and MLP classifiers, each shown for Static, Dynamic and Integrated features.]
Fig. 15. Comparison of different classifiers based on (a) Accuracy (b) MCC using static, dynamic and integrated features for Dataset-1.
TABLE X. Classification Results of Best Classifier Using Static, Dynamic and Integrated Features for Dataset-1 and Dataset-2
Dataset | Classifier | Approach | TPR | FPR | Precision | F-measure | MCC | Accuracy (%)
Dataset-1 | RF | Static | 0.965 | 0.035 | 0.965 | 0.965 | 0.933 | 96.50
Dataset-1 | RF | Dynamic | 0.970 | 0.030 | 0.970 | 0.970 | 0.940 | 97.01
Dataset-1 | RF | Integrated | 0.985 | 0.015 | 0.985 | 0.985 | 0.971 | 98.53
Dataset-2 | RF | Static | 0.867 | 0.024 | 0.870 | 0.866 | -- | 86.72
Dataset-2 | RF | Dynamic | 0.886 | 0.018 | 0.888 | 0.885 | -- | 88.60
Dataset-2 | RF | Integrated | 0.901 | 0.016 | 0.902 | 0.901 | -- | 90.10
* MCC: not applicable to the multiclass dataset, i.e. Dataset-2.
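The integrated setting in Table X combines static and dynamic features for the same apps. One simple way to do this, assumed here rather than taken from the paper, is early fusion: concatenating each app's selected static and dynamic feature vectors, keyed by an app identifier such as its MD5 digest. The function name, app identifiers and feature names below are hypothetical.

```python
def integrate_features(static_rows, dynamic_rows, static_names, dynamic_names):
    """Early fusion: concatenate each app's static and dynamic feature vectors.

    static_rows / dynamic_rows map an app identifier (for example its MD5 digest)
    to a list of feature values ordered like static_names / dynamic_names.
    Apps that are missing from either view are skipped.
    """
    # Prefix the column names so that, e.g., a static permission and a dynamic
    # permission with the same name remain distinct columns.
    names = [f"static:{n}" for n in static_names] + [f"dynamic:{n}" for n in dynamic_names]
    merged = {}
    for app_id, static_values in static_rows.items():
        if app_id not in dynamic_rows:
            continue
        dynamic_values = dynamic_rows[app_id]
        if len(static_values) != len(static_names) or len(dynamic_values) != len(dynamic_names):
            raise ValueError(f"feature length mismatch for app {app_id}")
        merged[app_id] = list(static_values) + list(dynamic_values)
    return names, merged


if __name__ == "__main__":
    # Hypothetical app identifiers and feature names, for illustration only.
    static = {"app1": [1, 0], "app2": [0, 1]}
    dynamic = {"app1": [1], "app3": [0]}
    names, rows = integrate_features(static, dynamic, ["SEND_SMS", "INTERNET"], ["crypto_op"])
    print(names)  # ['static:SEND_SMS', 'static:INTERNET', 'dynamic:crypto_op']
    print(rows)   # {'app1': [1, 0, 1]}
```

Skipping apps that lack either view keeps every fused row complete; an alternative design would impute missing values, which changes the data and is not covered here.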