DroidFusion: A Novel Multilevel Classifier Fusion Approach for Android Malware Detection
Abstract—Android malware has continued to grow in volume and complexity, posing significant threats to the security of mobile devices and the services they enable. This has prompted increasing interest in employing machine learning to improve Android malware detection. In this paper, we present a novel classifier fusion approach based on a multilevel architecture that enables effective combination of machine learning algorithms for improved accuracy. The framework, called DroidFusion, generates a model by training base classifiers at a lower level and then applies a set of ranking-based algorithms on their predictive accuracies at the higher level in order to derive a final classifier. The induced multilevel DroidFusion model can then be utilized as an improved accuracy predictor for Android malware detection. We present experimental results on four separate datasets to demonstrate the effectiveness of our proposed approach. Furthermore, we demonstrate that the DroidFusion method can also effectively enable the fusion of ensemble learning algorithms for improved accuracy. Finally, we show that the prediction accuracy of DroidFusion, despite only utilizing a computational approach in the higher level, can outperform stacked generalization, a well-known classifier fusion method that employs a meta-classifier approach in its higher level.

Index Terms—Android malware detection, classifier fusion, ensemble learning, machine learning, mobile security, stacked generalization.

Manuscript received June 3, 2017; revised September 11, 2017; accepted November 11, 2017. This work was supported by the U.K. Engineering and Physical Sciences Research Council through the Centre for Secure Information Security (CSIT-2) under Grant EP/N508664/1. This paper was recommended by Associate Editor P. P. Angelov. (Corresponding author: Suleiman Y. Yerima.)
S. Y. Yerima was with the Centre for Secure Information Technologies, Queen’s University Belfast, Belfast BT3 9DT, Northern Ireland. He is now with the Faculty of Technology, De Montfort University, Leicester LE1 9BH, U.K. (e-mail: [email protected]).
S. Sezer is with the Centre for Secure Information Technologies, Queen’s University Belfast, Belfast, Northern Ireland (e-mail: [email protected]).
This paper has supplementary downloadable multimedia material available at http://ieeexplore.ieee.org provided by the authors.
Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.
Digital Object Identifier 10.1109/TCYB.2017.2777960
2168-2267 © 2018 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.
This article has been accepted for inclusion in a future issue of this journal. Content is final as presented, with the exception of pagination.

I. INTRODUCTION

IN RECENT years, Android has become the leading mobile operating system with a substantially higher percentage of the global market share. Over 1 billion Android devices have been sold with an estimated 65 billion app downloads from Google Play alone [1]. The growth in the popularity of Android and the proliferation of third party app markets have also made it a popular target for malware. Last year, McAfee reported that there were more than 12 million Android malware samples with nearly 2.5 million new samples discovered every year [2]. Android malware can be embedded in a variety of applications such as banking apps, gaming apps, lifestyle apps, educational apps, etc. These malware-infected apps can then compromise security and privacy by allowing unauthorized access to privacy-sensitive information, rooting devices, turning devices into remotely controlled bots, etc.

Zero-day Android malware has the ability to evade traditional signature-based defences. Hence, there is an urgent need to develop more effective detection methods. Recently, machine learning-based methods have increasingly been applied to Android malware detection. However, classifier fusion approaches have not been explored as extensively as they have been in other domains like network intrusion detection.

In this paper, we present and investigate a novel classifier fusion approach that utilizes a multilevel architecture to increase the predictive power of machine learning algorithms. The framework, called DroidFusion, is designed to induce a classification model for Android malware detection by training a number of base classifiers at the lower level. A set of ranking-based algorithms is then utilized to derive combination schemes at the higher level, one of which is selected to build a final model. The framework is capable of leveraging not only traditional singular learning algorithms like decision trees or naive Bayes, but also ensemble learning algorithms like random forest, random subspace, boosting, etc. for improved classification accuracy.

In order to demonstrate the effectiveness of the DroidFusion approach, we performed extensive experiments on four datasets derived from extracting features from two publicly available and widely used malware sample collections (i.e., Android Malgenome project [3] and DREBIN [4]) and a collection of samples provided by Intel Security (formerly, McAfee). The unique contributions of this paper can be summarized as follows.
1) We propose a novel general-purpose classifier fusion approach (DroidFusion) and present its evaluation on four different datasets. DroidFusion can be applied to not only traditional learners but also ensemble learners.
2) We propose four ranking-based algorithms that enable classifier fusion within the DroidFusion framework. The algorithms are utilized in building a final improved classification model for Android malware detection.
3) We present the results of extensive experiments to demonstrate the effectiveness of our proposed approach.
The results of experiments with singular classifiers and ensemble classifiers are presented.
4) Furthermore, we present results of a performance comparison of DroidFusion with stacked generalization (or stacking), a well-known classifier fusion method that is also based on a multilevel architecture.
5) Datasets that we created from the feature extraction process with DREBIN and Malgenome project malware samples are released in the supplementary material.

The rest of this paper is structured as follows. Section II discusses related work while Section III presents the DroidFusion framework. The investigation methodology is presented in Section IV, while Section V presents results with analyses and discussion. Finally, the conclusion is given in Section VI.

II. RELATED WORK

In this section, we review related work on machine learning-based Android malware detection. Static and/or dynamic analysis is used to extract model training features, and both methods have pros and cons. Static analysis is prone to obfuscation [5], but is generally faster and less resource intensive than dynamic analysis. Dynamic analysis is resistant to obfuscation but can be hampered by anti-virtualization [6]–[9] and code coverage limitations [10], [34].

A. Static Analysis With Traditional Classifiers

Recent Android malware detection work that employs machine learning with static features includes the following. DroidMat [11] proposed applying k-means and k-nearest neighbor (k-NN) algorithms based on static features from permissions, intents, and application program interface (API) calls, to classify apps as benign or malware. Arp et al. [4] proposed SVM based on permissions, API calls, network access, etc. for lightweight on-device detection. Yerima et al. [12], [14] proposed an eigenspace analysis approach, as well as random forest ensemble learning models. The machine learning-based detection proposed in the papers was based on API calls, intents, permissions, and embedded commands. Varsha et al. [15] investigated SVM, random forest, and rotation forests on three datasets; their detection method employed static features extracted from the manifest and application executable files.

Sharma and Dash [16] utilized API calls and permissions to build naive Bayes and k-NN-based detection systems. In [17], API classes were used with random forest, J48, and SVM classifiers. Wang et al. [18] evaluated the usefulness of risky permissions for malware detection using SVM, decision trees, and random forest. DAPASA [19] focused on detecting malware piggybacked onto benign apps by utilizing sensitive subgraphs to construct five features depicting invocation patterns. The features are fed into machine learning algorithms, i.e., random forest, decision tree, k-NN, and PART, with random forest yielding the best detection performance. Cen et al. [20] proposed a detection method based on API calls from decompiled code and permissions. Their proposed method applies a probabilistic discriminative model based on regularized logistic regression (RLR). RLR is compared to SVM, decision tree, k-NN, and naive Bayes with information priors and hierarchical mixture of naive Bayes.

Wang et al. [52] applied logistic regression, linear SVM, decision tree, and random forest with static analysis for the detection of malicious apps. They utilized app-specific static features and platform-specific static features for training the machine learning algorithms. The authors reported a maximum true positive rate (TPR) of 96% and false positive rate (FPR) of 0.06% with the logistic regression classifier based on experiments conducted on 18 363 malware apps and 217 619 benign apps.

Other research papers that have investigated static features with machine learning for Android malware detection include [21]–[23], [45], [47], [48], and [54].

B. Dynamic and Hybrid Analysis With Traditional Classifiers

Some of the detection methods utilized dynamic features with machine learning, for example AntiMalDroid [24]. AntiMalDroid is a dynamic analysis behavior-based malware detection framework that uses logged behavior sequences as features with SVM. DroidDolphin [25] also employed SVM with dynamically obtained features. Afonso et al. [26] utilized dynamic API calls and system call traces and investigated SVM, J48, IBk (an instance-based classifier), BayesNet K2, BayesNet TAN, random forest, and naive Bayes. Alzaylaee et al. [27] investigated SVM, naive Bayes, PART, random forest, J48, multilayer perceptron (MLP), and simple logistic by comparing their performances on real phones versus emulators using dynamically obtained features. Ni et al. [46] proposed a real-time malicious behavior detection system that records API calls, permission uses, and other real-time features such as user operations. In their paper, they used SVM and naive Bayes algorithms for detection with these run-time features.

Mahindru and Singh [53] extracted 123 dynamic permissions from 11 000 Android applications which were subsequently applied to several individual machine learning classifiers including naive Bayes, decision tree, random forest, simple logistic, and k-star. In their experiments, simple logistic was found to perform marginally better than the others, but the malware classification accuracies of random forest, decision tree (J48), and simple logistic were comparable.

Other works, such as MARVIN [28], adopt a hybrid static and dynamic feature-based approach with machine learning (SVM and L2 regularized linear classifier). MARVIN assesses the risk associated with unknown Android apps in the form of a malice score ranging from 0 to 10. Similarly, Su et al. [49] adopted a hybrid static and dynamic feature approach by performing experiments on 1200 (900 clean and 300 malware) samples. Several machine learning algorithms were investigated including Bayes net, naive Bayes, k-NN, J48, and SVM. The best overall accuracy of 91.1% was attained with SVM.

C. Android Malware Detection With Classifier Fusion

Previous works in intrusion detection systems such as [29]–[32] investigated classifier fusion for improving detection accuracy. This method is also being applied
YERIMA AND SEZER: DroidFusion: NOVEL MULTILEVEL CLASSIFIER FUSION APPROACH FOR ANDROID MALWARE DETECTION 3
to the detection of Android malware. For example, Milosevic et al. [50] investigated a classifier fusion approach with static analysis based on Android permissions and source code-based analysis. They used SVM, C4.5, decision trees, random tree, random forests, JRip, and linear regression classifiers. The authors experimented with ensembles that contained odd combinations of three and five classifiers using the majority voting fusion method. The best fusion model achieved an accuracy rate of 95.6% using the source-code-based features. However, the number of samples used in the experiments was limited (387 samples for the permissions-based experiments and 368 for source code-based analysis).

Yerima et al. [13] compared several classifier fusion methods, i.e., majority vote, product of probabilities, maximum probability, and average of probabilities using J48, naive Bayes, PART, RIDOR, and simple logistic classifiers. The classifiers were trained with static features extracted from 6863 app samples, and in the experiments presented, the fused models performed better than the single classifiers.

Wang et al. [51] extracted 11 types of static features and employed multiple classifiers in a majority vote fusion approach. The classifiers include SVM, k-NN, naive Bayes, classification and regression tree (CART), and random forest. Their experiments on 116 028 app samples showed more robustness with the majority voting ensemble than with the individual base classifiers.

Idrees et al. [55] utilized permissions and intents as features to train machine learning models and applied classifier fusion for improved performance. Their experiments were performed on 1745 app samples starting with a performance comparison between MLP, decision table, decision tree, random forest, naive Bayes, and sequential minimal optimization classifiers. The decision table, MLP, and decision tree classifiers were then combined using three schemes: 1) average of probabilities; 2) product of probabilities; and 3) majority voting. Coronado-De-Alba et al. [33] proposed and investigated a classifier fusion method based on random forest and random committee ensemble classifiers. Their approach embeds random forest within random committee to produce a meta-ensemble model. The meta-model outperformed the individual classifiers in experiments performed with 1531 malware and 1531 benign samples. Table I summarizes papers that have investigated classifier fusion for Android malware detection.

TABLE I. Overview of some of the papers that apply classifier fusion for Android malware detection. NB = naive Bayes; SL = simple logistic; LR = linear regression; DT = decision tree; VP = voted perceptron; Ave P = average of probabilities; Prod P = product of probabilities; and Max P = maximum probability.

In contrast to all of the existing Android malware detection works, this paper proposes a novel classifier fusion approach that utilizes four ranking-based algorithms within a multilevel framework (DroidFusion). We evaluated DroidFusion extensively and compared its performance to stacking and other classifier fusion methods. Next, we present DroidFusion.

III. DROIDFUSION: GENERAL PURPOSE FRAMEWORK FOR CLASSIFIER FUSION

The DroidFusion framework consists of a multilevel architecture for classifier fusion. It is designed as a general purpose classifier fusion system, so that it can be applied to both traditional singular classifiers and ensemble classifiers (which themselves employ a base classifier usually to produce different randomly induced models that are subsequently combined). At the lower level, the (DroidFusion) base classifiers are trained on a training set using a stratified N-fold cross-validation technique to estimate their relative predictive accuracies. The outcomes are utilized by four different ranking-based algorithms (in the higher layer) that define certain criteria for the selection and subsequent combination of a subset (or all) of the applicable base classifiers. The outcomes of the ranking algorithms are combined in pairs in order to find the strongest pair, which is subsequently used to build the final DroidFusion model (after testing against an unweighted parallel combination of the base classifiers).

A. DroidFusion Model Construction

The model building, i.e., training, process is distinct from the prediction or testing phase, as the former utilizes a training-validation set to build a multilevel ensemble classifier which is then evaluated on a separate test set in the latter phase. Fig. 1 illustrates the two-level architecture of DroidFusion. It shows the training paths (solid arrows) and the testing/prediction path (dashed arrows). First, at the lower level each base classifier undergoes an N-fold cross-validation-based estimate of class performance accuracies. Let the N-fold cross validated predictive accuracies for K base classifiers be expressed by $P_{\text{base}}$, a K-tuple of the class accuracies of the K base classifiers

$$P_{\text{base}} = \{[P_{1m}, P_{1b}], [P_{2m}, P_{2b}], \ldots, [P_{Km}, P_{Kb}]\}. \tag{1}$$

The elements of $P_{\text{base}}$ are applied to the ranking-based algorithms average accuracy-based (AAB) ranking scheme, class differential-based (CDB) ranking scheme, ranked aggregate of per class performance-based (RAPC) scheme, and ranked aggregate of average accuracy and class differential-based (RACD) scheme described later in Section III-B. Let X be the total number of instances with M malware and B benign instances, where the M instances possess a label $L = 1$ denoting malware and the B instances from X possess a label $L = 0$
denoting benign. All X instances are also represented by feature vectors with f binary representations, where f is the number of features extracted from the given app. The features in the vectors take on 0 or 1, representing the absence or presence of the given feature. Additionally, after the N-fold cross-validation process (as shown in Fig. 1), a set of K-tuple class predictions is derived for every instance x, given by

$$V(x) = \{v_1, v_2, \ldots, v_k\}, \quad \forall k \in \{1, \ldots, K\}. \tag{2}$$

Note that $v_1, v_2, \ldots, v_k$ could be crisp predictions or probability estimates from the base classifiers. Adding the original (known) class label, l, we obtain

$$\dot{V}(x) = \{v_1, v_2, \ldots, v_k, l\}, \quad \forall k \in \{1, \ldots, K\},\ l \in \{0, 1\}. \tag{3}$$

$P_{\text{base}}$ and $\dot{V}(x), \forall x \in X$ will be utilized in the level-2 computation during the DroidFusion model construction. Let us denote the set of four ranking-based schemes by $S = \{S1, S2, S3, S4\}$. The pairwise combinations of the elements of S will result in six possibilities

$$\varphi = \{S1S2, S1S3, S1S4, S2S3, S2S4, S3S4\}. \tag{4}$$

Our goal is to select the best pair of ranking-based schemes from S, and if its performance exceeds that of an unweighted combination of the original base classifiers, it would be selected to construct the final DroidFusion model. In the event that the unweighted combination performance is greater, DroidFusion will be configured to apply a majority vote (or average of probabilities) of the base classifiers in the final constructed model. In order to estimate the accuracy performance of each scheme in S or each pairwise combination in set $\varphi$, a reclassification of the X instances (in the training-validation set) is performed for each scheme or pair of schemes. The reclassification is accomplished using $\dot{V}(x), x \in X$ based on the criteria defined by the schemes in S using $P_{\text{base}}$. Each scheme in S derives a set of Z weights that will be applied with $\dot{V}(x), x \in X$ for every instance during the reclassification process.

Let $\omega_i, i \in \{1, \ldots, Z\}, Z \le K$ be the set of weights derived for a particular scheme in S. Then, to reclassify an instance x according to the scheme’s criterion, its class prediction will be given by

$$C_{Sj}(x) = \begin{cases} 1, & \text{if } \dfrac{\sum_{i=1}^{Z} \omega_i v_i}{\sum_{i=1}^{Z} \omega_i} \ge 0.5 \\ 0, & \text{otherwise} \end{cases} \quad \forall j \in \{1, 2, 3, 4\}. \tag{5}$$

Hence, the benign class accuracy performance for the given scheme is calculated from

$$P^{\text{ben}}_{Sj} = \frac{\sum_{x=1}^{X} \big(C_{Sj}(x) + 1\big) \,\big|\, C_{Sj}(x) = 0,\ l(x) = 0}{B} \tag{6}$$

where B is the number of benign instances, while the malware accuracy performance is calculated from

$$P^{\text{mal}}_{Sj} = \frac{\sum_{x=1}^{X} C_{Sj}(x) \,\big|\, C_{Sj}(x) = 1,\ l(x) = 1}{X - B}. \tag{7}$$

Thus the average performance accuracy is simply

$$\dot{P}_{Sj} = \frac{B \cdot P^{\text{ben}}_{Sj} + (X - B) \cdot P^{\text{mal}}_{Sj}}{X}. \tag{8}$$

Likewise, to determine the performance of each pairwise combination in $\varphi$: let $\omega_i, i \in \{1, \ldots, Z\}, Z \le K$ be the first set of weights derived for the first scheme in the pair, and let $\mu_i, i \in \{1, \ldots, Z\}, Z \le K$ be those derived for the second scheme in the pair. Then, to reclassify the X instances in the
training-validation set according to the combination pair, the class prediction of each instance x will be given by

$$C_{SjSn}(x) = \begin{cases} 1, & \text{if } \dfrac{\sum_{i=1}^{Z} \omega_i v_i + \sum_{i=1}^{Z} \mu_i v_i}{\sum_{i=1}^{Z} \omega_i + \sum_{i=1}^{Z} \mu_i} \ge 0.5 \\ 0, & \text{otherwise} \end{cases} \quad \forall j \in \{1, 2, 3, 4\},\ \forall n \in \{1, 2, 3, 4\},\ j \ne n,\ SjSn \equiv SnSj. \tag{9}$$

Therefore, computing benign class accuracy and malware class accuracy will utilize

$$P^{\text{ben}}_{SjSn} = \frac{\sum_{x=1}^{X} \big(C_{SjSn}(x) + 1\big) \,\big|\, C_{SjSn}(x) = 0,\ l(x) = 0}{B} \tag{10}$$

and

$$P^{\text{mal}}_{SjSn} = \frac{\sum_{x=1}^{X} C_{SjSn}(x) \,\big|\, C_{SjSn}(x) = 1,\ l(x) = 1}{X - B} \tag{11}$$

respectively. The average performance accuracy for the pairwise schemes will then be given by

$$\dot{P}_{SjSn} = \frac{B \cdot P^{\text{ben}}_{SjSn} + (X - B) \cdot P^{\text{mal}}_{SjSn}}{X}, \quad \forall j \in \{1, 2, 3, 4\},\ \forall n \in \{1, 2, 3, 4\},\ j \ne n,\ SjSn \equiv SnSj. \tag{12}$$

Equivalently, the unweighted majority vote class prediction for instance x is given by

$$C_{mv}(x) = \begin{cases} 1, & \text{if } \dfrac{\sum_{k=1}^{K} v_k}{K} \ge 0.5 \\ 0, & \text{otherwise} \end{cases} \quad \forall k \in \{1, \ldots, K\}. \tag{13}$$

Hence, the benign class accuracy performance for the unweighted scheme will be given by

$$P^{\text{ben}}_{mv} = \frac{\sum_{x=1}^{X} \big(C_{mv}(x) + 1\big) \,\big|\, C_{mv}(x) = 0,\ l(x) = 0}{B}. \tag{14}$$

Likewise, the malware class accuracy performance for the unweighted scheme is given by

$$P^{\text{mal}}_{mv} = \frac{\sum_{x=1}^{X} C_{mv}(x) \,\big|\, C_{mv}(x) = 1,\ l(x) = 1}{X - B}. \tag{15}$$

Finally, the average accuracy performance for the unweighted scheme is given by

$$\dot{P}_{mv} = \frac{B \cdot P^{\text{ben}}_{mv} + (X - B) \cdot P^{\text{mal}}_{mv}}{X}. \tag{16}$$

After all the reclassifications are completed, and the average accuracies computed, the applicable scheme that will be utilized to construct the DroidFusion model is selected thus

B. Proposed Ranking-Based Algorithms

The design of our proposed algorithms is influenced by the observation that most typical classifiers perform differently for the two classes. That is, the class accuracies for benign and malware are very rarely equal in magnitude. The proposed ranking-based algorithms include the following.
1) An AAB ranking scheme.
2) A CDB ranking scheme.
3) An RAPC-based scheme.
4) An RACD-based scheme.

1) Average Accuracy-Based Ranking Scheme: With the AAB method, the ranking is designed to be directly proportional to the average prediction accuracies across the classes. In this case, base classifiers with larger overall accuracy performance will rank higher. AAB does not take into account how well a base classifier performs for a particular class. Let AAB be the first scheme S1, from set S. The algorithm is summarized as follows.

Let $P_{\text{base}}$ be the set of performance accuracies $P_{k,c} \in P_{\text{base}}$ of K base classifiers. If m denotes malware and b, benign, then the average accuracy of the kth base classifier is given by

$$a_k = 0.5 \times \sum_{c=m,b} P_{k,c} \;\Big|\; k \in \{1, \ldots, K\},\ 0 < P_{k,c} \le 1. \tag{18}$$

Let $A \leftarrow a_k, \forall k \in \{1, \ldots, K\}$ be a set of the average predictive accuracies, to which a ranking function $\text{Rank}_{\text{desc}}(\cdot)$ is applied

$$\bar{A} \leftarrow \text{Rank}_{\text{desc}}(A). \tag{19}$$

Thus, $\bar{A}$ contains an ordered ranking of the level-1 base classifiers' average predictive accuracies in descending order. Next, the top Z rankings are utilized in weight assignments as follows:

$$\omega_1 = Z,\ \omega_2 = Z - 1,\ \ldots,\ \omega_Z = 1,\quad Z \le K. \tag{20}$$

Thus, the AAB class prediction C(x) for instance x in the training-validation set is given by (5), or by (9) when used in the pairwise combination with another scheme.

2) Class Differential-Based Ranking Scheme: With the CDB method, the ranking is directly proportional to the average predictive accuracy and inversely proportional to the absolute value of the performance difference between the classes. Assuming a binary classification problem, this approach will be less likely to favor the decision from a base classifier that exhibits much higher accuracy in one class over the other but will assign larger weights to good classifiers that perform relatively well in both classes. The CDB procedure is described as follows.

Suppose the CDB method is taken as scheme S2, let the average accuracy of each base classifier be given by $a_k$ in (18)
With $\bar{D}$ containing the ordered rankings of $d_k$ values, the top Z rankings are also utilized to assign weights according to (20). Thus, the S2 = CDB class prediction for an instance

Then, for each base classifier, aggregate the values and apply the ranking function $\text{Rank}_{\text{desc}}(\cdot)$
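
As an illustration only, the AAB weighting in (18)–(20), the weighted reclassification in (5), the unweighted vote in (13), and the accuracy measures in (6)–(8) can be sketched in Python with NumPy. This is a minimal sketch under stated assumptions, not the authors' implementation: all function names are hypothetical, base classifier outputs are taken to be crisp 0/1 predictions, classifiers outside the top Z are represented by a weight of zero, and ties in the ranking are broken by classifier index (the text does not specify tie handling).

```python
import numpy as np


def class_accuracies(pred, labels):
    """Return (malware accuracy, benign accuracy), in the spirit of (6)-(7)."""
    pred = np.asarray(pred)
    labels = np.asarray(labels)
    p_mal = float(np.mean(pred[labels == 1] == 1))
    p_ben = float(np.mean(pred[labels == 0] == 0))
    return p_mal, p_ben


def average_accuracy(p_mal, p_ben, n_mal, n_ben):
    """Class-size-weighted average accuracy, as in (8)."""
    return (n_ben * p_ben + n_mal * p_mal) / (n_ben + n_mal)


def aab_weights(p_base, z):
    """AAB weights: rank by a_k (18)-(19), give the top Z weights Z..1 (20).

    p_base has shape (K, 2) holding [P_km, P_kb] per base classifier.
    Assumption: classifiers outside the top Z receive weight 0.
    """
    p_base = np.asarray(p_base, dtype=float)
    a = 0.5 * p_base.sum(axis=1)            # average accuracy per classifier
    order = np.argsort(-a, kind="stable")   # descending; ties by index
    weights = np.zeros(len(a))
    weights[order[:z]] = np.arange(z, 0, -1)
    return weights


def weighted_vote(votes, weights):
    """Weighted reclassification as in (5); votes has shape (X, K)."""
    votes = np.asarray(votes, dtype=float)
    score = votes @ weights / weights.sum()
    return (score >= 0.5).astype(int)


def majority_vote(votes):
    """Unweighted vote as in (13)."""
    votes = np.asarray(votes, dtype=float)
    return (votes.mean(axis=1) >= 0.5).astype(int)


# Toy data (made up for illustration): 6 instances, 3 base classifiers.
labels = [1, 1, 1, 0, 0, 0]
votes = [[1, 1, 0], [1, 0, 1], [0, 1, 1], [0, 0, 1], [0, 1, 1], [0, 0, 0]]

p_base = [class_accuracies([row[k] for row in votes], labels) for k in range(3)]
w = aab_weights(p_base, z=2)
fused = weighted_vote(votes, w)
p_mal, p_ben = class_accuracies(fused, labels)
print("AAB weights:", w)
print("fused predictions:", fused)
print("average accuracy:", average_accuracy(p_mal, p_ben, n_mal=3, n_ben=3))
print("majority vote:", majority_vote(votes))
```

Under the same zero-weight representation, the pairwise rule in (9) reduces to calling `weighted_vote(votes, w_a + w_b)` with the length-K weight vectors of the two schemes, since the numerators and denominators in (9) simply add.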
TABLE V. Malgenome 215 comparison of DroidFusion with base classifiers and traditional combination schemes on test set.
TABLE VI. DREBIN 215 train-validation set results and level-2 algorithm-based rankings for the base classifiers (5 = highest rank and 1 = lowest).
TABLE VII. DREBIN 215 train-validation set level-2 combination schemes intermediate results.
TABLE X. McAfee 350 train-validation set level-2 combination schemes intermediate results.
TABLE XIII. McAfee-100 train-validation set level-2 combination schemes intermediate results.
TABLE XIV. McAfee 100 comparison of DroidFusion with (ensemble) base classifiers and traditional combination schemes on test set.
TABLE XV. DroidFusion versus stacked generalization for the four datasets.
deploy the system for scenarios requiring rapid analyses for large scale vetting or screening of apps.

Note that although this paper is based on specific static features, classifiers trained from other types of features can also be combined using DroidFusion. Basically, DroidFusion is agnostic to the feature engineering process.

G. Limitations of DroidFusion

Although the proposed general-purpose DroidFusion approach has been demonstrated empirically to enable improved accuracy performance by classifier fusion, there is scope for further improvement. The current DroidFusion design is aimed at binary classification. Future work could investigate extending the algorithms in the DroidFusion framework to handle multiclass problems.

VI. CONCLUSION

In this paper, we proposed a novel general purpose multilevel classifier fusion approach (DroidFusion) for Android malware detection. The DroidFusion framework is based on four proposed ranking-based algorithms that enable higher-level fusion using a computational approach rather than the traditional meta classifier training that is used, for example, in stacked generalization. We empirically evaluated DroidFusion using four separate datasets. The results presented demonstrate its effectiveness for improving performance using both nonensemble and ensemble base classifiers. Furthermore, we showed that our proposed approach can outperform stacked generalization whilst utilizing only computational processes for model building rather than training a meta classifier at the higher level.

REFERENCES

[1] Smartphone OS Market Share Worldwide 2009-2015 Statistics, Statista, Hamburg, Germany, 2017. [Online]. Available: https://www.statista.com/statistics/263453/global-market-share-held-by-smartphone-operating-systems
[2] McAfee Labs Threat Predictions Report, McAfee Labs, Santa Clara, CA, USA, Mar. 2016.
[3] Y. Zhou and X. Jiang, “Dissecting Android malware: Characterization and evolution,” in Proc. IEEE Symp. Security Privacy (SP), San Francisco, CA, USA, May 2012, pp. 95–109.
[4] D. Arp, M. Spreitzenbarth, M. Hubner, H. Gascon, and K. Rieck, “Drebin: Efficient and explainable detection of Android malware in your pocket,” in Proc. 20th Annu. Netw. Distrib. Syst. Security Symp. (NDSS), San Diego, CA, USA, Feb. 2014, pp. 1–15.
[5] A. Apvrille and R. Nigam. (Jul. 2014). Obfuscation in Android Malware and How to Fight Back Virus Bulletin. Accessed: Sep. 2017. [Online]. Available: https://www.virusbulletin.com/virusbulletin/2014/07/obfuscation-android-malware-and-how-fight-back
[6] Y. Jing, Z. Zhao, G.-J. Ahn, and H. Hu, “Morpheus: Automatically generating heuristics to detect Android emulators,” in Proc. 30th Annu. Comput. Security Appl. Conf. (ACSAC), New Orleans, LA, USA, Dec. 2014, pp. 216–225.
[10] S. R. Choudhary, A. Gorla, and A. Orso, “Automated test input generation for Android: Are we there yet?” in Proc. 30th IEEE/ACM Int. Conf. Autom. Softw. Eng. (ASE), Nov. 2015, pp. 429–440.
[11] D.-J. Wu, C.-H. Mao, T.-E. Wei, H.-M. Lee, and K.-P. Wu, “DroidMat: Android malware detection through manifest and API calls tracing,” in Proc. 7th Asia Joint Conf. Inf. Security (Asia JCIS), 2012, pp. 62–69.
[12] S. Y. Yerima, S. Sezer, and I. Muttik, “Android malware detection: An eigenspace analysis approach,” in Proc. Sci. Inf. Conf. (SAI), London, U.K., Jul. 2015, pp. 1236–1242.
[13] S. Y. Yerima, S. Sezer, and I. Muttik, “Android malware detection using parallel machine learning classifiers,” in Proc. 8th Int. Conf. Next Gener. Mobile Apps Services Technol. (NGMAST), Oxford, U.K., Sep. 2014, pp. 37–42.
[14] S. Y. Yerima, S. Sezer, and I. Muttik, “High accuracy Android malware detection using ensemble learning,” IET Inf. Security, vol. 9, no. 6, pp. 313–320, Nov. 2015.
[15] M. V. Varsha, P. Vinod, and K. A. Dhanya, “Identification of malicious Android app using manifest and opcode features,” J. Comput. Virol. Hacking Tech., vol. 13, no. 2, pp. 125–138, 2017.
[16] A. Sharma and S. K. Dash, “Mining API calls and permissions for Android malware detection,” in Cryptology and Network Security. Cham, Switzerland: Springer Int., 2014, pp. 191–205.
[17] P. P. K. Chan and W.-K. Song, “Static detection of Android malware by using permissions and API calls,” in Proc. Int. Conf. Mach. Learn. Cybern., vol. 1. Lanzhou, China, Jul. 2014, pp. 82–87.
[18] W. Wang et al., “Exploring permission-induced risk in Android applications for malicious application detection,” IEEE Trans. Inf. Forensics Security, vol. 9, no. 11, pp. 1869–1882, Nov. 2014.
[19] M. Fan et al., “DAPASA: Detecting Android piggybacked apps through sensitive subgraph analysis,” IEEE Trans. Inf. Forensics Security, vol. 12, no. 8, pp. 1772–1785, Aug. 2017.
[20] L. Cen, C. S. Gates, L. Si, and N. Li, “A probabilistic discriminative model for Android malware detection with decompiled source code,” IEEE Trans. Depend. Secure Comput., vol. 12, no. 4, pp. 400–412, Jul./Aug. 2015.
[21] Westyarian, Y. Rosmansyah, and B. Dabarsyan, “Malware detection on Android smartphones using API class and machine learning,” in Proc. Int. Conf. Elect. Eng. Informat. (ICEEI), Aug. 2015, pp. 294–297.
[22] F. Idrees and M. Rajarajan, “Investigating the Android intents and permissions for malware detection,” in Proc. 10th IEEE Int. Conf. Wireless Mobile Comput. Netw. Commun. (WiMob), Oct. 2014, pp. 354–358.
[23] B. Kang, S. Y. Yerima, S. Sezer, and K. McLaughlin, “N-gram opcode analysis for Android malware detection,” Int. J. Cyber Situational Awareness, vol. 1, no. 1, pp. 231–254, Nov. 2016.
[24] M. Zhao, F. Ge, T. Zhang, and Z. Yuan, “AntiMalDroid: An efficient SVM-based malware detection framework for Android,” in Communications in Computer and Information Science, vol. 243, C. Liu, J. Chang, and A. Yang, Eds. Heidelberg, Germany: Springer, 2011, pp. 158–166.
[25] W.-C. Wu and S.-H. Hung, “DroidDolphin: A dynamic Android malware detection framework using big data and machine learning,” in Proc. ACM Conf. Res. Adapt. Convergent Syst. (RACS), Towson, MD, USA, 2014, pp. 247–252.
[26] V. M. Afonso, M. F. de Amorim, A. R. A. Grégio, G. B. Junquera, and P. L. de Geus, “Identifying Android malware using dynamically obtained features,” J. Comput. Virol. Hacking Tech., vol. 11, no. 1, pp. 9–17, 2014.
[27] M. K. Alzaylaee, S. Y. Yerima, and S. Sezer, “EMULATOR vs REAL PHONE: Android malware detection using machine learning,” in Proc. 3rd ACM Int. Workshop Security Privacy Anal. (IWSPA), Scottsdale, AZ, USA, Mar. 2017, pp. 65–72.
[28] M. Lindorfer, M. Neugschwandtner, and C. Platzer, “MARVIN: Efficient and comprehensive mobile app classification through static and dynamic analysis,” in Proc. IEEE 39th Annu. Comput. Softw. Appl. Conf. (COMPSAC), 2015, pp. 422–433.
[7] T. Vidas and N. Christin, “Evading Android runtime analysis via sand- [29] D. Gaikwad and R. Thool, “DAREnsemble: Decision tree and rule
box detection,” in Proc. 9th ACM Symp. Inf. Comput. Commun. Security, learner based ensemble for network intrusion detection system,”
Kyoto, Japan, Jun. 2014, pp. 447–458. in Proc. 1st Int. Conf. Inf. Commun. Technol. Intell. Syst., 2016,
[8] T. Petsas, G. Voyatzis, E. Athanasopoulos, M. Polychronakis, and pp. 185–193.
S. Ioannidis, “Rage against the virtual machine: Hindering dynamic [30] A. Balon-Perlin and B. Gambäck, “Ensembles of decision trees for
analysis of Android malware,” in Proc. 7th Eur. Workshop Syst. Security network intrusion detection systems,” Int. J. Adv. Security, vol. 6,
(EuroSec), Amsterdam, The Netherlands, Apr. 2014, p. 5. nos. 1–2, pp. 62–77, 2013.
[9] F. Matenaar and P. Schulz. (Aug. 2012). Detecting Android [31] M. Panda and M. R. Patra, “Ensembling rule based classifiers for
Sandboxes. Accessed: Nov. 2017. [Online]. Available: detecting network intrusions,” in Proc. Int. Conf. Adv. Recent Technol.
http://www.dexlabs.org/blog/btdetect Commun. Comput., 2009, pp. 19–22, doi: 10.1109/ARTCom.2009.121.
This article has been accepted for inclusion in a future issue of this journal. Content is final as presented, with the exception of pagination.
[32] A. Zainal, M. A. Maarof, S. M. Shamsuddin, and A. Abraham, [53] A. Mahindru and P. Singh, “Dynamic permissions based Android mal-
“Ensemble of one-class classifiers for network intrusion detection ware detection using machine learning techniques,” in Proc. 10th Innov.
system,” in Proc. 4th Int. Conf. Inf. Assurance Security, 2008, Softw. Eng. Conf., Jaipur, India, Feb. 2017, pp. 202–210.
pp. 180–185, doi: 10.1109/IAS.2008.35. [54] M. Yang, S. Wang, Z. Ling, Y. Liu, and Z. Ni, “Detection of mali-
[33] L. D. Coronado-De-Alba, A. Rodriguez-Mota, and cious behavior in Android apps through API calls and permission uses
P. J. Escamilla-Ambrosio, “Feature Selection and ensemble of analysis,” Concurrency Comput. Pract. Exp., vol. 29, no. 19, 2017,
classifiers for Android malware detection,” in Proc. 8th IEEE Latin Art. no. e4172, doi: 10.1002/cpe.4172.
Amer. Conf. Commun. (LATINCOM), Nov. 2016, pp. 1–6. [55] F. Idrees, M. Rajarajan, M. Conti, T. M. Chen, and Y. Rahulamathavan,
[34] M. K. Alzaylaee, S. Y. Yerima, and S. Sezer, “Improving dynamic anal- “PIndroid: A novel Android malware detection system using ensemble
ysis of Android apps using hybrid test input generation,” in Proc. Int. learning methods,” Comput. Security, vol. 68, pp. 36–46, Jul. 2017.
Conf. CyberSecurity Protect. Digit. Services (Cyber Security), London,
U.K., Jun. 2017, pp. 1–8.
[35] Y. Aafer, W. Du, and H. Yin, “DroidAPIMiner: Mining API-level
features for robust malware detection in Android,” in Proc. 9th Int.
Conf. Security Privacy Commun. Netw. (SecureComm), Sydney, NSW,
Suleiman Y. Yerima (M’04) received the B.Eng. degree (First Class) in electrical and computer engineering from the Federal University of Technology, Minna, Nigeria, the M.Sc. degree (with distinction) in personal, mobile, and satellite communications from the University of Bradford, Bradford, U.K., and the Ph.D. degree in mobile computing and communications from the University of South Wales, Pontypridd, U.K. (formerly, the University of Glamorgan), in 2009.
He is a Senior Lecturer of cyber security with De Montfort University, Leicester, U.K. He was a Research Fellow with the Centre for Secure Information Technologies, Queen’s University Belfast, Belfast, Northern Ireland, where he led the mobile security research theme, from 2012 to 2017. He was a member of the Mobile Computing Communications and Networking Research Group with the University of Glamorgan, from 2005 to 2009. From 2010 to 2012, he was with the U.K.-India Advanced Technology Centre of Excellence in Next Generation Networks, Systems and Services, University of Ulster, Coleraine, Northern Ireland.
Dr. Yerima is a member of IAENG professional societies. He is also a Certified Information Systems Security Professional and a Certified Ethical Hacker. He was the recipient of the 2017 IET Information Security premium (best paper) award.
Sakir Sezer (M’00) received the Dipl.Ing. degree in electrical and electronic engineering from RWTH Aachen University, Aachen, Germany, and the Ph.D. degree from Queen’s University Belfast, Belfast, Northern Ireland, in 1999.
He is currently the Secure Digital Systems Research Director and the Head of network security research with the School of Electronics, Electrical Engineering and Computer Science, Queen’s University Belfast. He is also the Cofounder and the CTO of Titan IC Systems, Belfast. His research is leading major (patented) advances in the field of high-performance content processing and is currently commercialized by Titan IC Systems. He has co-authored over 120 conference and journal papers in the areas of high-performance network, content processing, and system on chip.
Prof. Sezer was a recipient of a number of prestigious awards, including InvestNI, Enterprise Ireland and Intertrade Ireland innovation and Enterprise Awards, and the InvestNI Enterprise Fellowship. He is a member of the IEEE International System-on-Chip Conference Executive Committee.