PermPair Android Malware Detection Using Permission Pairs

1968 IEEE TRANSACTIONS ON INFORMATION FORENSICS AND SECURITY, VOL.
15, 2020
PermPair: Android Malware Detection Using Permission Pairs
Anshul Arora, Sateesh K. Peddoju, Senior Member, IEEE, and Mauro Conti, Senior Member, IEEE
Abstract: Android smartphones are highly prone to the spread of malware because of an intrinsic weakness that permits an application to access internal resources when the user grants permissions, knowingly or unknowingly. Hence, researchers have focused on identifying the conspicuous permissions that lead to malware detection. Most of these permissions, common to malware and normal applications, present themselves in different patterns and contribute to attacks. Therefore, it is essential to find the significant combinations of permissions that can be dangerous. Hence, this paper aims to identify the pairs of permissions that can be dangerous. To the best of our knowledge, none of the existing works have used permission pairs to detect malware. In this paper, we propose an innovative detection model, named PermPair, that constructs and compares the graphs for malware and normal samples by extracting the permission pairs from the manifest file of an application. The evaluation results indicate that the proposed scheme is successful in detecting malicious samples with an accuracy of 95.44% when compared to other similar approaches and popular mobile anti-malware apps. Further, we also propose an efficient edge elimination algorithm that removed 7% of the unnecessary edges from the malware graph and 41% from the normal graph. This led to minimum space utilization and also a 28% decrease in the detection time.

Index Terms: Android malware, android security, malware detection, permissions pair graph, smartphone security.

I. INTRODUCTION

SMARTPHONES have gained popularity with the presence of feature-rich apps that provide services like social networking, online banking, online gaming, and location-based services, in addition to conventional services like phone calls and messaging. A report [1] shows that there is tremendous growth in smartphone sales, with 82% Android smartphone users. Rising popularity has made them susceptible to malware attacks. The year 2013 recorded 145,000 new malware samples, with 97% of them targeted towards the Android platform [2]. This indicates that Android is evidently the major target, with nearly 5000-6000 malware samples attacking it every 14 seconds. The figure touched 3.5 million malicious samples in 2017 and is expected to rise to 25 million by 2019 [3]. This trend shows how malware attackers are relentlessly developing new malware samples targeting smartphones, especially Android. Apart from the traditional drive-by-download way of infecting the system, malware is also injected into smartphones through repackaging and update attacks. The threats posed by mobile malware include financial loss to users, information leakage, system damage, and mobile bots [4].

The increase in the number of Android malware attacks comes mainly from three major sources: (a) app markets, an easy distribution gateway for malware developers; (b) users, through drive-by-downloads; and (c) developers, through weak code.

The design of the Android platform secures the system by restricting the access of applications (apps) to local resources using permission constraints. A user is prompted with the list of permissions during the installation of an application. This list is supposed to alert the users about the resources that the application accesses. Most users ignore the list and grant the permissions liberally. They do not have adequate expertise to understand the significance of these permissions, and the harm caused by them, if any [10]. This weakness of the users drew the attention of the attackers. Consequently, researchers aim to analyze the permissions for detecting malicious behavior.

Several related works such as [5], [8], [14]–[16], [22], [23], [25] have used permissions to detect Android malware. They have analyzed the permissions in malicious apps detected during the period 2010-2012. These studies have examined the permissions of normal and malicious samples, and have reported that most of the top permissions in malware and normal apps are quite similar, and hence not distinguishable. For instance, the authors in [14] reported that the top five permissions in both the categories are exactly the same. Therefore, it is important to find the vital combinations of the permissions present in malware and normal samples. None of the previous works, except the one proposed in [22], have aimed at finding the permission patterns that can launch any malicious activity. The authors of [22] identified risky permission patterns in malware samples. However, they did not find the permission patterns that occur prominently in normal apps. Moreover, they did not propose any detection model.

Manuscript received August 9, 2018; revised December 11, 2018, April 3, 2019, May 17, 2019, July 14, 2019, and August 26, 2019; accepted October 11, 2019. Date of publication October 29, 2019; date of current version January 30, 2020. The associate editor coordinating the review of this manuscript and approving it for publication was Prof. Wojciech Mazurczyk. (Corresponding author: Anshul Arora.)
A. Arora is with the Discipline of Mathematics and Computing, Delhi Technological University Delhi, Delhi 110042, India (e-mail: [email protected]).
S. K. Peddoju is with the Department of Computer Science and Engineering, IIT Roorkee, Roorkee 247667, India (e-mail: [email protected]).
M. Conti is with the Department of Mathematics, University of Padua, 35122 Padova, Italy, and also with the Department of Electrical Engineering, University of Washington, Seattle WA 98195 USA (e-mail: conti@math.unipd.it).
Digital Object Identifier 10.1109/TIFS.2019.2950134
1556-6013 © 2019 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission.
See https://www.ieee.org/publications/rights/index.html for more information.
Motivation: We believe that the pairing of dangerous permissions together can be effective in detecting malicious apps. For instance, to leak device-specific information to the server, an application requires only two permissions: INTERNET and READ_PHONE_STATE. This permission pair, alone, is dangerous and can launch a malicious activity. However, to evade the detection, malware developers may supplement some additional permissions [11]. The presence of such dangerous permission pairs can help detect malicious behavior. Therefore, this work aims to analyze permissions in a group of two and proposes a new methodology to find such pairs that can distinguish normal and malicious samples. The following research questions emerge in the light of permission pair analysis:
1) How to represent the permission pairs extracted from the applications?
2) How to build a detection model using permission pairs?
3) Is there any change in the permission pairs of malware samples over a period of time?
4) What are the top dangerous permission pairs present in malicious samples but not in benign ones?
5) What are the top permission pairs present in normal apps and how are they different from malicious samples?

We are motivated to answer these questions with a vision to develop an Android malware detector based on permission pairs. We present PermPair: Permission Pair Based Android Malware Detection model, based upon permissions extracted from the manifest file of the applications. We use the graph data structure to represent these permission pairs. Our detection results are relatively better than the mobile anti-malware apps which we evaluate against the same dataset of malicious apps. The work proposed in this paper employs a mix of old and recent datasets for evaluation.

Contributions: The contributions of this research are highlighted below:
• Built the permission pair graphs for different malware datasets and analyzed the impact of the permission pairs on both, old and recent, malicious apps.
• Proposed a novel algorithm to merge graphs of different malware datasets to construct a single final malware graph of permission pairs (named as Malicious-Graph ($G_M$)). Similarly, a separate permission pair graph of normal apps known as Normal-Graph ($G_N$) was also established.
• Designed an algorithm to detect malicious apps by comparing both malicious ($G_M$) and normal ($G_N$) graphs.
• Compared the detection results with that of widely used mobile anti-malware apps and other similar defense mechanisms proposed in the literature. Concluded that the proposed approach is more effective in detecting malicious apps.
• Performed edge elimination to remove insignificant permission pairs from both the graphs, to reduce the size of the graphs and the detection time.

Organization: The rest of the paper is organized as follows: Section II provides the background knowledge of Android permissions, and related work proposed in the literature for Android malware detection. The discussion of the proposed PermPair model is in section III. A report of results and findings of the proposed work is presented in section IV. Finally, section V concludes with the scope of future work.

II. BACKGROUND AND RELATED WORK

This section, initially, presents a brief description of Android permission system followed by a critical review of the studies that have been proposed for Android malware detection. In the end, a summary of the important takeouts from the review of the literature is presented.

A. Background

Every Android application¹ consists of AndroidManifest.xml file having permissions and other parameters required by the application. The user receives this list of permissions for additional resources at the time of installation. Once the user grants all the permissions, the app gets installed. The Java code of the application houses, possibly, the malicious component of the malware samples. If the manifest file has the required permissions, it invokes the API calls in the code. This is the primary reason why permissions have been the most used static feature in Android malware detection.

¹ https://developer.android.com/guide/topics/manifest/manifest-intro

B. Related Work

Android malware detection is broadly categorized into three types: Static, Dynamic, and Hybrid Detection. This section reviews all these detection types published in the literature in the following subsections.

1) Static Detection: Static solutions aim to analyze the app's manifest file components, Java code or the sequence of API calls within the code. These related works can further be sub-divided into six categories: Permissions Analysis, Permissions Based Malware Detection, Permission Pattern Analysis of Android Malware, Manifest File-Based Detection, Permission Graph Analysis, and API Calls Based Detection.

a) Permissions analysis: Some of the earlier works like [8], [9] analyzed permissions to detect malicious behavior within the normal apps. Grace et al. [8] evaluated potential risks associated with in-app advertisement libraries by analyzing permissions and API calls. Kirin [9] model developed the security rules to identify the risky applications based upon permission combinations.

Holavanalli et al. [12] analyzed the cross-app, i.e., colluding apps, flow permissions to identify the interaction of apps with each other. Grace et al. [13] identified the permission leaks, i.e., several permissions that protect access to sensitive user data are unsafely exposed to other apps.

In all of these studies, the authors have analyzed permissions within the normal apps to look for any signs of dangerous behavior. They did not consider the malware samples in their analysis. However, we aim to find the dangerous permission pairs found in malware samples by analyzing different malware datasets.
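To make the notion of a permission pair concrete, the following minimal Python sketch lists the permissions declared in a decoded manifest and enumerates every unordered pair among them. It is an illustration under stated assumptions rather than the authors' implementation: the manifest is assumed to be available as plain-text XML (for example, after decoding the app package), and the function name `permission_pairs`, the sample manifest, and the package name `com.example.demo` are hypothetical.

```python
import xml.etree.ElementTree as ET
from itertools import combinations

# XML namespace used by the android:name attribute in AndroidManifest.xml.
ANDROID_NS = "{http://schemas.android.com/apk/res/android}"


def permission_pairs(manifest_xml):
    """Return (sorted unique permissions, all unordered pairs among them).

    Hypothetical helper for illustration; assumes a plain-text manifest.
    """
    root = ET.fromstring(manifest_xml)
    perms = sorted({
        el.get(ANDROID_NS + "name")
        for el in root.iter("uses-permission")
        if el.get(ANDROID_NS + "name")
    })
    return perms, list(combinations(perms, 2))


# Hypothetical sample manifest (not taken from any dataset in the paper).
SAMPLE = """<manifest xmlns:android="http://schemas.android.com/apk/res/android"
          package="com.example.demo">
  <uses-permission android:name="android.permission.INTERNET"/>
  <uses-permission android:name="android.permission.READ_PHONE_STATE"/>
  <uses-permission android:name="android.permission.SEND_SMS"/>
</manifest>"""

perms, pairs = permission_pairs(SAMPLE)
print(len(perms), len(pairs))  # 3 permissions give 3 pairs
for a, b in pairs:
    print(a, "+", b)
```

Among the three pairs printed for this sample is INTERNET with READ_PHONE_STATE, the pair singled out in the Motivation paragraph as sufficient to leak device-specific information.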
b) Permissions based malware detection: Sanz et al. [14] extracted top permissions in malicious and normal apps using machine learning classifiers. MAMA model [15] extracted not only the permissions but also other features of the manifest file. In both these works, the authors focused on extracting the widely used permissions by malware and normal apps.

Talha et al. [16] extracted permissions from the apps. For each permission, they calculated the score based upon the number of malware containing the permission, to the total number of malware. Tao et al. [17] analyzed permissions, APIs, and the correlation between them to detect malware.

Cen et al. [18] applied the probabilistic discriminative model on decompiled source code and the permissions for detecting malicious samples. In similar lines, Peng et al. [19] applied probabilistic generative models like Naïve Bayes for evaluating risks of the Android apps based upon the permissions requested. Few recent studies like [20], [21] applied permissions to detect Android malware.

However, none of these works discussed the significant presence of permission patterns in malware samples that can launch any malicious activity. In comparison to these studies, our work focuses on analyzing which permission patterns, in pairs, are significantly present in normal and malicious apps.

c) Permission patterns of android malware: Few studies like [22]–[24] have focused on the analysis of the permission patterns found in Android malware. In [22], permission patterns mining algorithm is applied to identify dangerous permission patterns. However, the model did not consider the normal apps and did not present any model for detection. Similarly, the model presented in [23] used the feature ranking methods to rank the risky permissions. It is identified that SVM gives high accuracy when the model uses a set of 40 permissions, and Random Forest gives better results with ten permissions.

DroidRanger [24] model identified that a few permission combinations like (SEND_SMS, RECEIVE_SMS) and (INTERNET, RECEIVE_SMS) help in detecting malware. However, it does not discuss in detail which specific permission pairs are prominently present in normal or malicious apps.

Having realized that a large permission set is too complex to analyze, the proposed model selects permission pairs for analysis and detection. Unlike the results reported in [23] and [24], in our study, a thorough examination of which permission pairs exist in normal and malicious apps is made, and a detection model that was missing in [22] is proposed.

d) Manifest file based malware detection: Few studies like [5], [25]–[27] analyzed manifest file components in addition to the permissions. DroidMat [25] model analyzed permissions, intents, components, and API calls, and applied K-means clustering for detection. Arp et al. [5] used features like permissions, hardware components, API calls, and network addresses to detect malware. The work proposed in [26] used permissions and intents to detect malware. Kim et al. [27] formed a vector of the apps consisting of static features such as manifest file components, strings, and API calls.

The proposed work in this paper employs only significant permission pairs, instead of extracting all the components from the manifest file, to reduce the substantial computation overhead. It is also observed in our studies that many recent malware samples like Koodous dataset include only the permissions, not the other components from their manifest files.

e) Permission graph analysis: Solokova et al. [28] used graphs to represent the correlation between the permissions of normal apps. They grouped the apps of the same category and calculated metrics like node degree, weighted degree, and page rank score for each graph. However, they considered only the categorized normal apps and not the malicious samples. The proposed work in this paper considers both malware and normal apps to construct separate permission pair graphs.

Zhu et al. [29] built a system to evaluate the risks involved in an app based on the permissions. They created a bipartite graph, where one set of vertices constitute the apps and the other set constitute the permissions. There was no relation between the permissions, unlike in our proposed work.

f) API calls based detection: Fan et al. [30] formed API calls based frequent subgraphs to classify Android malware into their corresponding families. Zhang et al. [31] built dependency graphs of the API calls and applied similarity metrics to classify malware into their families. The Stowaway model in [32] determined the set of API calls used by an app and mapped them with the permissions. Apposcopy [33] model aimed to detect the apps that steal the user information, by analyzing control-flow and data-flow properties of the apps. Elish et al. [34] focused on user-trigger dependence and sensitive APIs to detect malware.

Often, if the APIs used are not related to any manifest file component, they act as noise in the detection process.

Table I compares the existing static works with the proposed work. Most of the works have used outdated malware samples for their experiments and have not analyzed dangerous permission patterns in malware apps. They have also not analyzed the detection rate of their proposed model on unknown samples.

TABLE I: Comparison of Proposed Work With State-of-the-Art Static Mobile Malware Detection Models

2) Dynamic Detection: Dynamic solutions intend to analyze the run-time behavior of the applications. Related works in this field mainly fall into two aspects: OS-level Detection and Network-level detection.

a) OS-level detection: TaintDroid [35] model, based on dynamic taint analysis, tracked the flow of privacy-sensitive information through third-party apps. Many systems such as [36] are built on TaintDroid. TaintART [37] detected the privacy leakage from the apps on the Android Run Time (ART). Yang et al. [38] extended the TaintDroid model to detect the data leaks from the apps and also determine whether the leak is due to user intention or not. All these works focused
on analyzing data-leaks from the apps, rather than detecting malicious apps.

Shabtai et al. [39] analyzed dynamic features such as the CPU usage, number of running processes, and the number of packets sent through Wi-Fi to detect malware. CopperDroid [40] model analyzed system calls of malware samples and described whether the malicious behavior is initiated from Java, JNI or native code execution. Afonso et al. [41] analyzed dynamic API calls and system calls to detect malicious apps.

All these solutions have relatively high computational overheads compared to static solutions. Moreover, stealthy malware samples try to evade such detection by gaining awareness about the simulation environment. Hence, we focus on static permissions based detection.

b) Network-level detection: Wang et al. [42] applied Natural Language Processing methods on the HTTP headers to detect malicious apps. Shabtai et al. [43] applied machine learning algorithms on traffic features to generate the patterns of normal traffic and used those patterns to detect malicious apps. Chen et al. [44] analyzed network traffic of malicious samples and found that the majority of those samples generated their malicious traffic within the first five minutes.

In one of our previous works [45], we deployed Android emulator to capture the network traffic of malicious and normal apps. Out of 16 network traffic features, 7 of them were found to distinguish normal and malicious traffic. In our other work [46], we used smartphones rather than emulators to capture the traffic. We did not observe any distinguishable features. Therefore, we minimized the number of features required to detect Android malware by ranking them.

All such network traffic based solutions detect only those samples that have network connectivity. For instance, any malware that only sends SMS in the background, will not generate any network traffic. The proposed approach in this paper, however, can be used to detect such type of malware.

Table II compares the existing dynamic models with the proposed work. All the dynamic solutions have feature extraction overheads, and none of them except one has evaluated their model on unknown samples. Moreover, the malware dataset used by other works is relatively older, whereas we have used recent malware samples for our experiments.

TABLE II: Comparison of Proposed Work With State-of-the-Art Dynamic Mobile Malware Detection Models

3) Hybrid Detection: Hybrid solutions aim to combine static and dynamic components to detect malware. Saracino et al. [47] analyzed various dynamic and static features such as system calls, API calls, user activity logs, and permissions. Sun et al. [48] proposed a hybrid model by generating static and dynamic graphs from manifest file components, and system calls respectively.

Xia et al. [49] performed static APIs analysis and dynamic bytecode analysis to detect the data leaks from the apps. Riskranker [50] analyzed dynamic features like run-time Dalvik code loading, and static features like permissions to detect malware. Arora et al. [51], [52] presented two different hybrid models by combining traffic features and permissions.

Table III compares the existing hybrid models with the proposed work. All these approaches also suffer from the drawbacks of the dynamic solutions, i.e., high computational overheads. Hence, the proposed work aims to analyze static permission pairs to detect Android malware. Additionally, it also analyzes and detects the recent unknown malware samples.

TABLE III: Comparison of Proposed Work With State-of-the-Art Hybrid Mobile Malware Detection Models

III. PROPOSED PERMPAIR DETECTION

Analyzing permissions in a group of more than two can be complex, as it can lead to a high number of permission patterns. If an application contains N number of permissions, and we want to analyze permission patterns in the group of R, then the total number of permission patterns will be $^{N}C_{R}$, i.e., $\frac{N!}{R!(N-R)!}$. Increasing the value of R from two will result in more number of permission patterns or will be complex to represent. Therefore, we focus on analyzing permission pairs.

In this section, we present our proposed novel PermPair model for detecting malicious Android apps.

A. System Design

In order to conduct extensive analysis on permission pairs, we considered malware samples from three different sources: Genome [4], Drebin [5], and Koodous [6]. Besides, we downloaded normal Android apps from the Google Play Store.

The proposed model consists of three phases as shown in Figure 1. The first phase, Graph Construction phase, extracts permission pairs from each application to form a graph. Four different permission pair graphs: Genome Graph ($G_G$), Drebin Graph ($G_D$), Koodous Graph ($G_K$) and Normal Graph ($G_N$) are built during this phase. Typically, every new dataset used in the analysis and detection needs the construction of the graph.

The next phase, Graph Merge phase, deals with the merging of all malicious graphs into a single malicious graph ($G_M$).
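The Graph Construction phase can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: it assumes each application has already been reduced to its list of permission names (the paper obtains these from the manifest file), it normalizes each edge weight by the number of apps in the dataset, and it runs on a hypothetical toy dataset; the name `build_pair_graph` is likewise an assumption. The `math.comb` calls show how the number of candidate patterns grows with the group size R, which is the reason given at the start of this section for restricting the analysis to pairs.

```python
from collections import Counter
from itertools import combinations
from math import comb


def build_pair_graph(apps):
    """Build a normalized permission pair graph for one dataset.

    apps: list of permission collections, one per application.
    Returns {(perm_a, perm_b): weight}, where weight is the fraction of
    apps in the dataset that request both permissions.
    """
    counts = Counter()
    for perms in apps:
        unique = sorted(set(perms))
        # An app with zero or one permission yields no pair, so it adds
        # nothing to the graph (combinations() is empty in that case).
        counts.update(combinations(unique, 2))
    total = len(apps)
    return {pair: n / total for pair, n in counts.items()}


# Hypothetical toy dataset of three apps (illustrative only).
toy_dataset = [
    ["INTERNET", "READ_PHONE_STATE"],
    ["INTERNET", "READ_PHONE_STATE", "SEND_SMS"],
    ["INTERNET"],
]
graph = build_pair_graph(toy_dataset)
for pair, weight in sorted(graph.items()):
    print(pair, round(weight, 3))

# Number of permission patterns for an app with N = 10 permissions.
print(comb(10, 2), comb(10, 3), comb(10, 4))  # 45 120 210
```

Running the same function once per dataset would give one graph per source (Genome, Drebin, Koodous, and normal apps), each on the same 0-to-1 weight scale, which is what makes the malware graphs comparable when they are merged.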
Hence, the final model consists of two graphs: a normal graph ($G_N$) and a malicious graph ($G_M$). The subsequent phase proposes a detection algorithm to distinguish normal and malicious apps. Finally, the model eliminates the irrelevant edges, that do not affect the detection results, from both $G_M$ and $G_N$, to optimize it.

Fig. 1. System design of PermPair detection model.

1) Graph Construction Algorithm: The proposed model uses graphs to represent the relationship between various permissions in each application. The model applies the Algorithm 1 to construct a graph G(V, E), by representing a unique permission at vertex V and a pair of permissions connected by a weighted edge E. Each edge weight increases with the occurrence of the same permission pair from different applications. An application with single or no permission does not contribute to any analysis or detection.

The algorithm extracts the permissions from each application and generates all the permission pairs. If an app has N number of unique permissions, then the algorithm creates N number of vertices, one for each permission. The number of possible permission pairs added to the graph among all is $\binom{N}{2}$. For instance, in an app, if there are two permissions $P_1$ and $P_2$, then, an edge, with an edge weight of one is created between them if it did not exist earlier. Otherwise, edge weight is incremented by one, as presented in Algorithm 1.

Once we process all the apps in a dataset, we divide each edge weight by the total number of apps in that dataset to normalize weights of permission pairs between different graphs. This phase answers our research question 1, describing how to represent the permission pairs extracted from the apps. After generating separate Genome ($G_G$), Drebin ($G_D$) and Koodous ($G_K$) graphs, they are merged into a single graph ($G_M$) to represent a common malicious graph.

Algorithm 1 Graph Construction Algorithm
1: Input: Set of Applications (N) in a dataset (D)
2: Output: Permission Pair Graph G(V,E)
3: Initially V ← ∅ and E ← ∅
4: for ∀ N ∈ D do
5:   Extract its Manifest File M using apktool
6:   Let P = (P_1, P_2, P_3 ... P_m) be set of permissions in M
7:   if (|P| == 0 or |P| == 1) then
8:     V ← V + ∅ and E ← E + ∅   (zero or single permission)
9:   else
10:    For every permission P_i ∈ M, create a node V_i if V_i ∉ V
11:    Add an edge between every permission pair (P_i, P_j) derived from M
12:    if (P_i, P_j) ∉ E then
13:      W(P_i, P_j) = 1
14:    else
15:      W(P_i, P_j)++
16:    end if
17:  end if
18: end for
19: Divide each edge weight by N
20: Return Graph G(V,E)

2) Graph Merge Algorithm: Graph Merge Algorithm, Algorithm 2, merges three malware graphs ($G_G$, $G_D$, $G_K$) to form a common malware graph ($G_M$) by combining their edges. There are two types of edges: common and disjoint. Inserting all disjoint edges along with their edge weights to $G_M$ is straightforward. The challenge lies in deciding the weights of the common edges. Consider a common edge e, with weights $e_{w_1}$, $e_{w_2}$, and $e_{w_3}$ in $G_G$, $G_D$, and $G_K$ respectively. There are different possible ways the weight of edge e can be considered in the final graph $G_M$ such as taking their average, minimum or maximum. However, we have adapted the multi-objective optimization approach for the reasons explained below.

The important merge criteria, to find the weights for common edges are (i) a high number of true positives, and (ii) a low number of false positives. This is similar to a multi-objective optimization where objectives are conflicting, i.e., no single solution can give optimum value for all the objectives. Weighted sum method is the most suited classical solution for solving such problems in which we assign weights to all the objectives, as shown in Equation 1.

$$\underset{x}{\text{minimize/maximize}} \; F(x) = [f_0(x), f_1(x), \ldots, f_k(x)], \quad \text{subject to } g_i(x) \le b_i, \; i = 1, \ldots, m. \tag{1}$$

where $f_0, f_1, \ldots, f_k$ are k objective functions to be optimized, with m constraints. Weighted sum method applies scalar weights $\alpha_i$ for each objective such that the sum of all weights is one. Then, each objective is multiplied with its corresponding weight and all of them are added to form a single objective function which is to be optimized, as shown in Equation 2.

$$\underset{x}{\text{minimize}} \; Z, \quad \text{where } Z = \sum_{j=1}^{k} \alpha_j f_j \text{ and } \sum_{j=1}^{k} \alpha_j = 1, \quad \text{subject to } g_i(x) \le b_i, \; i = 1, \ldots, m. \tag{2}$$

Graph merging problem has two objective functions: one for maximizing true positives, say $F_1$, and other for minimizing false positives, say $F_2$. The weighted sum method requires weights for both the objectives. Consider an edge e having weights $e_{w_1}$ (from $G_G$), $e_{w_2}$ (from $G_D$), $e_{w_3}$ (from $G_K$), and
$e_{w_n}$ (from $G_N$) respectively. The weight assigned to function $F_1$, say $\alpha_1$, is the ratio of the percentage of malware samples containing that edge e to the percentage of the total number of samples containing e, as mentioned in Equation 3.

$$\alpha_1 = \frac{\sum_i e_{w_i}}{\sum_i e_{w_i} + e_{w_n}}. \tag{3}$$

Here $\sum_i e_{w_i}$ is the sum of edge weights of any edge e in all the malicious graphs and $e_{w_n}$ represents its edge weight in the normal graph.

Similarly, the weight assigned to function $F_2$, say $\alpha_2$, is the ratio of the percentage of normal samples containing that edge e to the percentage of the total number of samples containing the edge e.

Algorithm 2 describes the method to merge three malware graphs. We divide the complete edge set into disjoint and collectively exhaustive sets called $Common_{edge-set}$ and $Disjoint_{edge-set}$. We place the edges common in two or three graphs in $Common_{edge-set}$. We place the remaining edges in $Disjoint_{edge-set}$ and then add them forthwith to $G_M$.

Algorithm 2 Graph Merge Algorithm
1: Input: Three separate malware graphs: G_G(V_g, E_g), G_D(V_d, E_d) and G_K(V_k, E_k)
2: Output: Final Malware Graph G_M(V_m, E_m)
3: Distribute edge set of all three graphs in two subsets: Common_edge-set and Disjoint_edge-set.
4: for each edge e_i ∈ Disjoint_edge-set do
5:   E_m ← (e_i, W(e_i))
6: end for
7: for each edge e_j ∈ Common_edge-set do
8:   w_1 ← weight of e_j ∈ G_G
9:   w_2 ← weight of e_j ∈ G_D
10:  w_3 ← weight of e_j ∈ G_K
11:  w_n ← weight of e_j ∈ G_N
12:  if minimum{w_1, w_2, w_3} > w_n then
13:    W(e_j) ← minimum{w_1, w_2, w_3}
14:    E_m ← (e_j, W(e_j))
15:  else if maximum{w_1, w_2, w_3} < w_n then
16:    W(e_j) ← maximum{w_1, w_2, w_3}
17:    E_m ← (e_j, W(e_j))
18:  else
19:    Let m_j be weight of edge e_j in final graph G_M
20:    Solve for m_j the following optimization problem:
21:    Maximize Z = α_1 (m_j − w_n)(w_1 + w_2 + w_3) + α_2 (w_n − m_j)(w_n);
       Subject to constraints: α_1 + α_2 = 1 and m_j <= max(w_1, w_2, w_3) and m_j >= min(w_1, w_2, w_3)
22:    W(e_j) ← m_j
23:    E_m ← (e_j, W(e_j))
24:  end if
25: end for
26: Return G_M(V_m, E_m)

We apply the weighted sum method to find the weight for the common edges. Let $m_i$ be the weight of a common edge $e_i$ in $G_M$. The Equation 4 denotes functions $F_1$ and $F_2$.

$$F_1 = (m_i - w_n)(w_1 + w_2 + w_3), \quad F_2 = (w_n - m_i)(w_n). \tag{4}$$

where $w_1$, $w_2$, $w_3$, and $w_n$ denotes the weights for $e_i$ in $G_G$, $G_D$, $G_K$, and $G_N$ respectively. The probability of a sample to be classified as malware increases when the weight of any permission pair in $G_M$ is higher than $G_N$. If $m_i$ is greater than $w_n$, then the probability of $(w_1 + w_2 + w_3)$ percentage of samples to be classified as malware increases by $(m_i - w_n)$. If $w_n$ is more than $m_i$, the probability of $w_n$ percentage of samples to be classified as normal increases by $(w_n - m_i)$. Hence, we choose $m_i$ such that both $F_1$ and $F_2$ are optimized. No single value of $m_i$ can optimize both the objectives. Therefore, the weighted sum method is opted to solve, which reduces this to single-objective optimization as:

$$\text{Maximize } Z = \alpha_1 (m_i - w_n)(w_1 + w_2 + w_3) + \alpha_2 (w_n - m_i)(w_n), \tag{5}$$

subject to the three constraints:

$$\{\alpha_1 + \alpha_2 = 1, \; m_i \le \max(w_1, w_2, w_3), \text{ and } m_i \ge \min(w_1, w_2, w_3)\}. \tag{6}$$

For every edge $e_i$ in $Common_{edge-set}$, we calculate its weight $m_i$ using equations 5 and 6. Then we add $e_i$ and its weight $m_i$ to the final malware graph $G_M$.

Let us consider an example where there is a common edge, say $E_C$, in all the three malware graphs and the normal graph. Suppose $E_C$ has weights $w_1$, $w_2$, and $w_3$ in the three malware graphs and $w_n$ in the normal graph respectively. We apply Algorithm 2 to find the weight of $E_C$ in $G_M$. There can be three possibilities:
• If the edge weight of $E_C$ in all the three malware graphs is higher than that in the normal graph, Algorithm 2 (Steps 12-14) gives Minimum($w_1$, $w_2$, $w_3$) as the weight for the edge $E_C$ in $G_M$.
• If the edge weight of $E_C$ in all the three malware graphs is lower than that in the normal graph, Algorithm 2 (Steps 15-17) gives Maximum($w_1$, $w_2$, $w_3$) as the weight for the edge $E_C$ in $G_M$.
• Let $w_1$, $w_2$, $w_3$, and $w_n$ be 0.6, 0.7, 0.9, and 0.8 respectively, i.e., edge weight of $E_C$ in $G_N$ lies between Minimum($w_1$, $w_2$, $w_3$) and Maximum($w_1$, $w_2$, $w_3$). Algorithm 2 (Steps 18-23), formulates the following multi-objective optimization problem:

$$\text{Maximize } Z = \alpha_1 (m_i - 0.8)(0.6 + 0.7 + 0.9) + \alpha_2 (0.8 - m_i)(0.8), \tag{7}$$

where $m_i$ is the required weight of edge $E_C$ in $G_M$,

$$\alpha_1 = \frac{0.6 + 0.7 + 0.9}{0.6 + 0.7 + 0.9 + 0.8} = 0.73, \tag{8}$$

and

$$\alpha_2 = 1 - 0.73 = 0.27. \tag{9}$$
Hence, the multi-objective optimization problem becomes:

$$\text{Maximize } Z = 0.73\,(m_i - 0.8)(2.2) + 0.27\,(0.8 - m_i)(0.8). \tag{10}$$

Solving the above equation, keeping Equation 6 in mind, gives the value of $m_i$ as 0.9: $Z$ is linear in $m_i$ with slope $0.73 \times 2.2 - 0.27 \times 0.8 = 1.39 > 0$, so it is maximized at the largest value the constraints allow, which is Maximum($w_1$, $w_2$, $w_3$) = 0.9. Therefore, the weight of the edge $E_C$ in $G_M$ turns out to be 0.9.

This graph merge procedure is not required for $G_N$ because we get only one normal graph from Algorithm 1.

Algorithm 3 Detection Algorithm
1: Input: Set of Applications ($A_1, A_2, \ldots, A_N$) to be Tested
2: Output: Outputs each application as Malware or Normal
3: for each $A_i$ do
4:   Extract its Manifest File $M$ using apktool
5:   Let $P = (P_1, P_2, P_3, \ldots, P_m)$ be set of permissions in $M$
6:   if ($|P| == 0$ or $|P| == 1$) then
7:     Return zero or single permission
8:   else
9:     $Malicious_{score} = 0$; $Normal_{score} = 0$
10:     for every permission pair $(P_i, P_j) \in M$ do
11:       $Malicious_{score}$ += $W(P_i, P_j) \in G_M$
12:       $Normal_{score}$ += $W(P_i, P_j) \in G_N$
13:     end for
14:     if $Malicious_{score} > Normal_{score}$ then
15:       Return $A_i$ as malware
16:     else
17:       Return $A_i$ as normal
18:     end if
19:   end if
20: end for

3) Malicious App Detection Algorithm: Algorithm 3 describes the procedure that determines whether an app is malicious or not. For every testing app, the algorithm extracts its permission pairs using Algorithm 1. The procedure uses two scores, $Normal_{score}$ and $Malicious_{score}$, for the testing apps. These scores are calculated by searching for every permission pair of the testing app in both $G_M$ and $G_N$ and adding the corresponding weights. If $Malicious_{score}$ is greater than $Normal_{score}$, the procedure declares the app malicious. This technique is based on the observation that the higher the weight of a permission pair in the malicious graph, the higher the chance that an app containing that permission pair is malicious. This phase answers our research question 2, describing the detection model using permission pairs.

4) Edge Elimination: The graphs $G_M$ and $G_N$ may contain several edges that do not contribute to the detection, such as the edges that have similar weights in both $G_M$ and $G_N$. We believe the removal of such edges may not alter the detection results. Consider an edge $e_1$ that has weight 0.1 in both the graphs. It is probable that the deletion of $e_1$ from both the graphs does not alter the accuracy because $e_1$ contributes equally to $Malicious_{score}$ and $Normal_{score}$. If, on the removal of such edges, the True Positive Rate (TPR) and True Negative Rate (TNR) do not decrease, then we can delete the aforesaid edges from the graphs. We call this procedure Edge Elimination.

Fig. 2. Phases of edge elimination.

Figure 2 describes the Edge Elimination mechanism that generates the Reduced Malicious and Normal graphs. The procedure consists of two phases: 1) Irrelevant Edges Identification, 2) Finding Maximal Set of Irrelevant Edges (MS).

a) Irrelevant edges identification: For each unique edge existing in the union of $G_M$ and $G_N$, we delete one edge, common or disjoint, at a time from both the graphs and apply Algorithm 3 to check the detection accuracy. Only if both TPR and TNR do not decrease is the edge considered irrelevant, and it is added to a set, say $E_{reduced}$. Suppose an irrelevant edge has weights $\lambda$ and $\mu$ in $G_M$ and $G_N$ respectively; then its weight difference between the two graphs is $|\lambda - \mu|$. We define the Sum of Weight Differences (SWD) of all these irrelevant edges as $\beta$. We insert the weight difference of every irrelevant edge of $E_{reduced}$ in a separate list, say $E_{list}$, sorted in increasing order. Now, we need to find the maximal subset of $E_{list}$ that we can remove from the graphs without lowering the TPR or TNR.

b) Find maximal set of irrelevant edges: We aim to find the maximal set of irrelevant edges that can be eliminated from the graphs. To begin with, we delete all the irrelevant edges, i.e., we delete the edges whose SWD is $\beta$, and check the detection rate. If both TPR and TNR do not decrease, then we eliminate all the irrelevant edges and the algorithm ends. However, if this lowers the TPR or TNR, then we need to find the maximum possible subset of $E_{list}$ that we can remove from the graphs.

We apply a technique similar to the Bisection method [53] to find the maximum possible subset. We set a lower limit, say $\delta_1$, initialized to zero, and an upper limit, say $\delta_2$, equal to $\beta$. Our objective is to find $\delta_{max}$, between $\delta_1$ and $\delta_2$, such that TPR and TNR remain the same or higher on deleting all the edges whose weight differences sum up to $\delta_{max}$.

Initially, we set $\delta_{max}$ equal to $\delta_2$. If, on deleting all the edges of set $E_{list}$ from the graphs, either TPR or TNR decreases, we halve the upper limit, i.e., $\delta_2 = \beta/2$, whereas the lower limit $\delta_1$ remains zero. The procedure terminates when both $\delta_1$ and $\delta_2$ converge to the same value, which we call $\delta_{max}$. We delete the edges from the graphs whose SWD sums up to $\delta_2$ and test the detection results. Two cases may arise:
• If the detection rate does not decrease, then $\beta/2$ is the new lower limit and the previous value of $\delta_2$ becomes the new upper limit.
• If the detection rate decreases, then zero remains the lower limit and $\beta/4$ is the new upper limit.
The above steps repeat until $\delta_1 = \delta_2$, and we name that point $\delta_{max}$. We finally remove the set of edges whose weight differences sum up to $\delta_{max}$ from $G_M$ and $G_N$.
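The second phase can be sketched in a few lines of Python. This is only an illustrative reading of the procedure, not the authors' implementation: it uses a textbook bisection on the SWD threshold rather than the exact limit updates described in the text, the names `prefix_within`, `maximal_removable`, and `keeps_accuracy` are hypothetical, and `keeps_accuracy` stands in for re-running the detection and checking that neither TPR nor TNR drops. It also assumes that removing more edges never restores accuracy, which the paper does not state.

```python
def prefix_within(edges, limit):
    """Longest prefix of `edges` (sorted by weight difference) whose SWD <= limit."""
    chosen, total = [], 0.0
    for name, diff in edges:
        if total + diff > limit:
            break
        total += diff
        chosen.append(name)
    return chosen


def maximal_removable(edges, keeps_accuracy, tol=1e-4):
    """Bisection-style search for the largest removable prefix of irrelevant edges.

    edges: list of (edge_name, |lambda - mu|) pairs found in phase 1.
    keeps_accuracy: hypothetical callback, True if TPR and TNR do not decrease
                    when the given edges are deleted from both graphs.
    """
    edges = sorted(edges, key=lambda e: e[1])      # E_list in increasing order
    everything = [name for name, _ in edges]
    if keeps_accuracy(everything):                 # deleting SWD = beta is fine
        return everything
    lo, hi = 0.0, sum(diff for _, diff in edges)   # delta_1 = 0, delta_2 = beta
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if keeps_accuracy(prefix_within(edges, mid)):
            lo = mid                               # safe: raise the lower limit
        else:
            hi = mid                               # unsafe: lower the upper limit
    return prefix_within(edges, lo)


# Four hypothetical irrelevant edges; assume accuracy drops only if E4 is removed.
example = [("E1", 0.1), ("E2", 0.2), ("E3", 0.3), ("E4", 0.4)]
print(maximal_removable(example, lambda removed: "E4" not in removed))
```

The tolerance `tol` plays the role of the convergence of the two limits; a smaller value costs more detection runs, since each bisection step requires re-testing the whole evaluation set.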
Fig. 3. Example of edge elimination from the graphs.

The following example, along with Figure 3, demonstrates the working of edge elimination. For simplicity, we consider four irrelevant edges, found at the end of phase 1. Let these edges between the permission pairs be $E_1$, $E_2$, $E_3$, and $E_4$. Let $E_1$ have a weight of 0.6 in $G_M$ and 0.5 in $G_N$. Hence, its weight difference between the two graphs is $|0.6 - 0.5| = 0.1$. Similarly, let us assume the weight differences for the edges $E_2$, $E_3$, and $E_4$ are 0.2, 0.3, and 0.4 respectively. We arrange these irrelevant edges in increasing order of weight difference.

Deletion of these edges, individually, from $G_M$ and $G_N$ does not decrease the detection rate. We assume that the deletion of all four irrelevant edges from the graphs decreases the detection rate. Hence, phase 2 of the procedure is required to find the maximal set of irrelevant edges.

The Sum of Weight Differences (SWD) from $E_{list}$ comes out to be $\beta = 0.1 + 0.2 + 0.3 + 0.4 = 1.0$. According to the procedure, the lower limit $\delta_1$ is set to zero, the upper limit $\delta_2 = \beta = 1.0$, and $\delta_{max} = \delta_2 = 1.0$. Since it was assumed that deletion of all four edges decreases the detection rate, in the next iteration we decrease the upper limit $\delta_2$ from $\beta$ to $\beta/2$, i.e., the new $\delta_2$ is 0.5. Then we delete all the irrelevant edges whose SWD does not exceed $\delta_2$, i.e., 0.5. In this example, the SWD for the edges $E_1$ and $E_2$ is $0.1 + 0.2 = 0.3$. Therefore, we delete only the edges $E_1$ and $E_2$ from both the graphs and check the detection results. Two further cases may arise:
• If the detection rate decreases, then we further lower the upper limit from $\beta/2$ to $\beta/4$, i.e., the new $\delta_2$ is 0.25. The lower limit remains zero. Then we find the irrelevant edges whose SWD does not exceed 0.25. In this case only edge $E_1$ is selected for deletion from the graphs.
• If the detection rate does not decrease, then $\delta_1$ is increased from zero to $\beta/2$, i.e., $\delta_1 = 0.5$, and $\delta_2$ is increased from $\beta/2$ to $\beta$, i.e., $\delta_2 = 1.0$. All the irrelevant edges whose SWD lies in the range from $\delta_1$ to $\delta_2$ are selected for deletion. We further test the detection rate by deleting the corresponding edges from both the graphs.
At every iteration, we check the detection results after deleting some edges from the graphs, and we repeat the steps to decide the values of $\delta_1$ and $\delta_2$. The procedure terminates when $\delta_1 = \delta_2$. Finally, we eliminate all the edges whose SWD does not exceed $\delta_2$ and return the reduced graphs.

TABLE IV
ANDROID MALWARE DATASET

TABLE V
VARIOUS ATTACK CATEGORIES OF ANDROID MALWARE

IV. RESULTS AND DISCUSSION
This section reports the results of each phase of the PermPair model. We performed all the experiments on a desktop system with Ubuntu 12.04 OS and 8 GB RAM and used apktool to extract the manifest files of the applications. We wrote shell scripts to analyze all the phases.

Datasets: In order to perform an extensive analysis, we considered a total of 7533 malware apps from three different sources. We used 2944 samples for training during the Graph Construction phase and 3264 samples for testing the accuracy of the proposed model, as shown in Table IV. For the detection phase, we divided the malware samples into two types: Known and Unknown samples. Consider a typical malware family, say AnserverBot, with a total of 180 samples. We used 100 of its samples in the training phase and the remaining ones in the testing phase. Though the malware family is the same, the samples used in the detection phase are different. Hence, we named them Known samples. Table IV summarizes the number of Known samples from each dataset used for detection. Similarly, the Unknown samples are the ones whose family has not been considered in the graph construction phase. We used 1325 Unknown samples to test the detection accuracy of our proposed model. All such unknown samples date from 2018. We collected diverse malware samples covering various attack categories, as summarized in Table V.
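Since the experiments start from manifests decoded with apktool, a small Python sketch may help make the pipeline concrete: it reads the `uses-permission` entries of a decoded `AndroidManifest.xml`, forms all permission pairs, and scores them in the manner of Algorithm 3. The authors report using shell scripts, so this is an assumed re-creation rather than their code; the function names and the dictionary representation of $G_M$ and $G_N$ (sorted permission pair mapped to edge weight) are hypothetical, and the normal-graph weight in the usage example is invented for illustration.

```python
import xml.etree.ElementTree as ET
from itertools import combinations

# Namespace that prefixes the android:name attribute in a decoded manifest.
ANDROID_NS = "{http://schemas.android.com/apk/res/android}"


def permissions_from_manifest(xml_text):
    """Return the sorted, de-duplicated permissions declared in a manifest."""
    root = ET.fromstring(xml_text)
    names = {el.get(ANDROID_NS + "name") for el in root.iter("uses-permission")}
    names.discard(None)  # ignore malformed entries without android:name
    return sorted(names)


def classify(permissions, malware_graph, normal_graph):
    """Score every permission pair against both graphs, as in Algorithm 3.

    The graphs are assumed to be dicts mapping a sorted (perm_a, perm_b)
    tuple to its edge weight; pairs absent from a graph contribute zero.
    """
    if len(permissions) < 2:
        return "zero or single permission"
    malicious_score = normal_score = 0.0
    for pair in combinations(sorted(permissions), 2):
        malicious_score += malware_graph.get(pair, 0.0)
        normal_score += normal_graph.get(pair, 0.0)
    return "malware" if malicious_score > normal_score else "normal"


manifest = (
    '<manifest xmlns:android="http://schemas.android.com/apk/res/android" '
    'package="com.example.app">'
    '<uses-permission android:name="android.permission.INTERNET"/>'
    '<uses-permission android:name="android.permission.READ_PHONE_STATE"/>'
    "</manifest>"
)
perms = permissions_from_manifest(manifest)
pair = tuple(perms)
# 0.9574 is the weight reported for this pair in G_G; 0.3 is a made-up normal weight.
print(classify(perms, {pair: 0.9574}, {pair: 0.3}))
```

Storing each pair as a sorted tuple keeps the graph undirected, so (INTERNET, READ_PHONE_STATE) and its reverse map to the same edge.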
TABLE VI
NUMBER OF NORMAL APPS USED (CATEGORY-WISE)

TABLE VII
NOTATIONS USED FOR TOP PERMISSIONS

TABLE VIII
TOP TEN PERMISSION PAIRS ALONG WITH EDGE WEIGHTS IN MALWARE AND NORMAL GRAPHS

TABLE IX
DIFFERENT MERGING SCHEMES TO GET $G_M$ AND THEIR COMPARISON OF DETECTION RESULTS

TABLE X
TOP TEN PERMISSION PAIRS IN FINAL MALICIOUS GRAPH TAKEN FROM DIFFERENT COMBINATIONS

Additionally, we downloaded 1493 normal apps for training and 4500 for testing. We manually downloaded the normal apps on the same desktop machine via a freely available interface.2 We selected 15 prominent categories from the Google Play Store having the highest number of apps, as summarized in Table VI. In each category, we selected the freely available apps having a rating of at least 4 in the Google Play Store. We have considered only those normal apps which pass the VirusTotal3 test. The complete dataset of benign apps used in our experiments is made available at “www.iitr.ac.in/media/facspace/drpskfec/Dataset”.

2 https://www.apkpure.com
3 https://www.virustotal.com/

A. Phase-1: Permission Pair Graphs Analysis
First, we defined short notations for permissions, as shown in Table VII, for easy reference. We highlighted the top ten permission pairs found in the $G_G$, $G_D$, $G_K$, and $G_N$ datasets in decreasing order of weights in Table VIII. The permission pair INT : RPS with a weight of 0.9574 in $G_G$ (Table VIII) indicates that the permissions INTERNET and READ_PHONE_STATE occur together in 95.74% of the Genome samples. Similarly, one can infer the other pairs and their weights.

It can be observed that the top five permission pairs of all three malware graphs are quite similar. The pair {INT : RPS} has a high weight in all the malware graphs compared to that of $G_N$, indicating the leakage of private information of the device to a server. Permission patterns consisting of ANS, INT, AWS, and WES are more prominent in recent malware than in the older samples. This leads to the conclusion that recent malware samples try to evade detection by retaining permission patterns similar to those of normal apps. It is also observed that some pairs containing the permissions WAKE_LOCK and GET_ACCOUNTS existed in the top ten pairs of normal apps but were missing from the top pairs of malicious apps. Similarly, pairs like {ANS : RPS} and {RPS : WES} appeared in the top ten pairs of malware samples but were not found in normal ones.

On closely analyzing the pairs, we found that Koodous samples are similar not only to Genome and Drebin, but also to the normal apps. This answers our research question 3: recent malware not only contains permission patterns of yesteryear but also integrates permission patterns which largely match those of the normal apps to evade detection.

B. Phase-2: Graph Merge
For every common edge $e$ with weights $e_{w1}$, $e_{w2}$ and $e_{w3}$ in the three malware graphs, we considered different merging schemes, as shown in Table IX. Note that Graph-4 and Graph-5 are different from Graph-3. In Graph-3, the final weight of an edge is the average of its weights from $G_G$, $G_D$, and $G_K$; whereas in Graph-4, the final weight of an edge is the average of its minimum and its maximum weight.

Table X presented the top ten pairs found from three merging possibilities: taking the minimum (Graph-1),
maximum (Graph-2) and average (Graph-3) of the edge weights. Similarly, Table X showed the top permission pairs found from the weighted sum approach (Graph-6). It is observed that there is some similarity between the top pairs of $G_D$ and Graph-1, and of $G_K$ and Graph-2, because $G_K$ and $G_D$ have the highest and the lowest weights, respectively, for most of the common edges.

Fig. 4. Comparison of top permission pairs for the detection of Malware samples.

TABLE XI
COMPARISON OF DETECTION RESULTS

Table IX summarized the TPR and False Positive Rate (FPR) for the different malware graphs. $G_N$ remained the same in all the experiments. The lowest TPR and FPR are observed for Graph-1: because the minimum weights are assigned to the edges of $G_M$, the normal score is likely to beat the malware score. Graph-2, on the other hand, displayed the highest TPR and FPR. We noted that with an increase in the weights of the edges in $G_M$, both TPR and FPR increased.

To calculate the accuracy, we considered the average of TPR and True Negative Rate (TNR). As can be seen from Table IX, the weighted sum method gave a relatively better accuracy of 95.44%. Hence, our proposed graph merge algorithm is better than the other possibilities for merging the malicious graphs.

C. Phase-3: Detection
In this section, we analyzed the detection results obtained from the proposed PermPair approach for two cases: (i) Individual Detection, in which we do not perform any graph merge and every individual malicious graph is taken as $G_M$ for detection, and (ii) Graph Merge based Detection, in which we apply graph merging to get the $G_M$ for detection.

1) Individual Detection: As an outcome of Section III(1), three malware graphs $G_G$, $G_D$, and $G_K$ were produced. These graphs were individually considered as $G_M$ to observe the detection results. We tested all the datasets with all these malware graphs. Note that we considered different samples for the graph construction and detection phases. Table XI summarized the detection results.

a) Detection of Genome and Drebin samples: Table XI indicates that $G_K$ detected the malware samples with high accuracy because the permission pairs in $G_K$ had a high similarity to those in $G_G$ and $G_D$ (Table VIII). Figure 4 showed the detection rate of Genome and Drebin samples. Figure 4(b) showed that among the top ten pairs in $G_G$, $G_K$ scored more than $G_N$ for nine pairs. Similarly, Figure 4(d) demonstrated that for the top pairs in $G_D$, $G_K$ scored more than $G_N$ for nine pairs. Therefore, $G_K$ detected 99.17% and 99.13% of the Genome and Drebin samples respectively. On the contrary, $G_G$ and $G_D$ detected the malware samples with relatively lower accuracy. This is because $G_N$ scored higher for a few of the top permission pairs, as depicted in Figures 4(a) and 4(c).

b) Detection of Koodous samples: Figures 4(e) and 4(f) showed that $G_N$ scored higher than $G_G$ and $G_D$ for three and five permission pairs respectively out of the top ten pairs of $G_K$. As a result, $G_G$ and $G_D$ detected only 58.92% and 34.92% of the Koodous samples respectively.

c) Detection of normal samples: Figure 5(c) showed that many top permission pairs of normal apps scored higher in $G_K$ than in $G_N$. Hence, $G_K$ generated a higher false positive rate than $G_G$ and $G_D$. Some of the permissions existed in only one or two of the malicious datasets. Consider a case where some permission pairs existed only in $G_G$ and $G_D$. Even on adding such pairs to $G_K$, the detection results did not alter. Similarly, some pairs existed only in $G_K$. Even on adding these pairs to $G_G$ and $G_D$, the results did not improve, because such pairs existed in very few samples, had a very low edge weight, and therefore did not contribute to the detection results. Hence, the overall detection accuracy from individual malware graphs was relatively low.

2) Graph Merge Based Detection: Refer to Tables IX and XI, as discussed in the previous subsection, the overall accuracy achieved by merging of malware graphs using
weighted sum approach is 95.44%, which is better than the accuracy achieved by any of the individual malware graphs. We merged two malware graphs at a time, using the same Weighted Sum approach, and checked the detection accuracy, as summarized in Table XI. None of the two-graph merge approaches yielded better accuracy than the three-graph merge approach. Hence, we focused our discussion on the three-graph merge approach. This subsection deals with the analysis of false positives and false negatives when we merge all three graphs to get $G_M$.

Fig. 5. Comparison of top permission pairs for the detection of normal samples.

TABLE XII
FALSE POSITIVES: PRESENCE OF TOP DANGEROUS PERMISSION PAIRS IN NORMAL APPS

a) False positives analysis: Graph-6, from Table IX, when used as $G_M$, identified 4.25% False Positives (FPs). Table XII, listing the top ten dangerous permission pairs, presented the reason behind these results. A pair is said to be dangerous if it has more weight in $G_M$ than in $G_N$. As many as 35% of the total FPs recorded a high malware score. Table XII summarized many such apps. We observed that apps like Fonetastic, Block SMS and Call, Android Assistant, etc. had all top ten dangerous permission pairs. Hence, they are detected as malicious with a high score. Besides, most of these normal apps showed signs of suspicious behavior. For instance, they can block calls/SMS, kill apps, delete files, and disable features like Wi-Fi. Though our model detected such apps as malware, we observed that the majority of the FPs belong to the Social and Communication category of the normal apps. Interestingly, normal chat apps like TextPlus and Chaton also had the dangerous permission pairs, but they did not pose any serious threat. We tested other similar apps like WhatsApp and found that the proposed approach detected WhatsApp as malware with a low score difference of 0.04 in favor of $G_M$. On the contrary, TextPlus and Chaton scored a high difference of 21 and 10 respectively. WhatsApp also contained 6 of the top ten dangerous permission pairs, but it was the presence of permissions like SYSTEM_ALERT_WINDOW, RECEIVE_MMS, BROADCAST_SMS, READ_CALL_LOG and WRITE_CALL_LOG in TextPlus and Chaton which gave a high malicious score. The pairing of these permissions with INTERNET and ACCESS_NETWORK_STATE had a higher weight in $G_M$ than in $G_N$, hence giving a high malware score.

The remaining 65% of the FPs had a very low score difference, and they belonged to categories like online shopping apps, taxi booking apps, spy camera apps, and online games. Figure 6(a) showed the difference in the score for the testing set of normal apps. Most of the FPs had a score difference as low as 0.05. Apps like Fonetastic and Block SMS and Call had a high score difference of 41 in favor of $G_M$.

b) False negatives analysis: Table XIII and Figure 6(b) showed the False Negatives (FNs) analysis. Analogous to dangerous pairs, we define the term normal permission pairs, i.e., pairs with the highest difference in favor of $G_N$. The proposed model reported samples of 19 malware families as FNs. Eight of them contained many of the top normal pairs. The remaining ones did not include any of the top normal permission pairs; instead, they had a very small number of permissions (between two and five), as shown in Table XIII(b). The edge weights of these pairs were almost equal in both the graphs, with very little difference in favor of $G_N$. 84% of the FNs had a score difference of less than or equal to one.

We conclude that our approach is effective in detecting Android malware with an accuracy of 95.44%, and the majority of the FPs (65%) and FNs (84%) had a very low
score difference. Tables XII and XIII answered our research questions 4 and 5, highlighting the dangerous and normal permission pairs significantly present in malware and normal apps respectively.

TABLE XIII
FALSE NEGATIVES ANALYSIS. (A) TOP NORMAL PERMISSION PAIRS IN MALICIOUS APPS. (B) LESS NUMBER OF PERMISSIONS IN MALICIOUS APPS

TABLE XIV
NUMBER OF EDGES REMOVED IN $G_M$ AND $G_N$ AFTER EVERY ITERATION OF PHASE 2 OF EDGE ELIMINATION

Fig. 6. Difference in detection score.

c) Detection of unknown samples: We considered 775 recent samples from Koodous, 250 samples from Contagio, and 300 samples from Pwnzen Infotech [55] to test the detection rate of the proposed approach on the unknown samples. The results demonstrated that the proposed approach detected 714 of the Koodous, 205 of the Contagio, and 238 of the Pwnzen samples correctly, with detection rates of 92.12%, 82%, and 79.33% respectively. Hence, the overall detection rate on unknown samples, taken as the mean of these three rates, was 84.48%. Many recent samples from Contagio and Pwnzen contained permissions similar to those of the normal apps and hence gave relatively lower detection rates of 82% and 79% respectively. This highlighted the fact that the proposed model should be continuously trained with recent malware so that it can detect the unknown samples with high accuracy. While analyzing the false negatives for Koodous samples, we found permission patterns similar to those in the false negatives of known samples, i.e., a small number of permissions such as INT, ANS, and VIB.

D. Phase-4: Edge Elimination
The number of edges in $G_M$ and $G_N$ is 7,658 and 10,661 respectively. We applied the Edge Elimination algorithm to eliminate the unnecessary edges. After the first phase, we found a total of 12,130 irrelevant edges, i.e., deleting these edges from both the graphs, one at a time, did not decrease the detection accuracy. However, deleting all these edges together decreased the accuracy. Phase-2 of the algorithm aimed at finding the maximal subset of edges that can be deleted from the graphs without decreasing the accuracy. We noticed that the accuracy of our model was 95.44%. Table XIV summarized the results of Phase-2. After a certain number of iterations, $\delta_1$ and $\delta_2$ agreed to nearly four decimal places. We observed that the algorithm could remove a maximum of 552 and 4382 edges from $G_M$ and $G_N$ respectively, thus eliminating 7.2% of the edges in $G_M$ and 41.1% of the edges in $G_N$. The detection time for testing 3264 malicious and 4500 normal apps was measured to be 2 minutes 45 seconds without edge elimination. This was reduced to 1 minute 58 seconds, i.e., a reduction of around 28%. Even though this reduction is in seconds, it will have an impact if there are hundreds of thousands of files to be tested. Hence, the edge elimination algorithm reduces both the number of edges in $G_M$ and $G_N$ and the detection time, as shown in Table XV.

E. Comparison With Anti-Malware Apps
To appraise the achieved results, we compared the malware detection rate of our model with 12 popular mobile anti-malware apps, taken from VirusTotal.4

4 https://www.virustotal.com/

For comparison, we chose 275 random unknown malware samples from Koodous, from a total of 775 unknown malicious samples, to check manually for
each sample through VirusTotal. Figure 7 compared the results of the PermPair model with widely known anti-malware apps. We noticed that except QuickHeal, all of the anti-malware apps had a lower malware detection rate than PermPair. QuickHeal detected 265 samples, three more than PermPair. Those three samples had a single permission defined in their manifest file and hence went undetected by our model. We conclude that (a) our approach is relatively better than eleven of the mobile anti-malware apps, and (b) most of the anti-malware apps are unable to detect newer malicious samples.

TABLE XV
COMPARISON OF DIFFERENT PARAMETERS WITH AND WITHOUT GRAPH REDUCTION ALGORITHM

Fig. 7. Comparison of the proposed approach with different anti-malware apps.

TABLE XVI
COMPARISON OF THE PROPOSED APPROACH WITH EXISTING PERMISSIONS-BASED MALWARE DETECTION TECHNIQUES. (a) TPR AND FPR. (b) PRECISION AND RECALL

F. Comparison With Existing Approaches
In this subsection, we presented a comparative evaluation with other state-of-the-art permissions-based detection techniques, though they used different datasets for their experiments. To the best of our knowledge, no other work in the literature used the same datasets as ours. Some of the works reported their results in TPR and FPR, whereas others measured Precision and Recall. Table XVI compared the detection results of the proposed approach with other permissions-based solutions. The proposed approach gave nearly the same detection rate as the related works which used Precision and Recall for evaluation. Furthermore, the proposed approach outperformed all the works in terms of TPR. The better performance of our approach was due to the dangerous permission pairs analysis. The presence of such pairs contributed to detecting a high number of malicious samples.

Our model reported a few FPs that showed some malicious behavior: a few normal apps behaved maliciously, for example by blocking phone calls and deleting files. Therefore, our model reported a slightly higher FPR than [5] and [23]. However, the authors in [5] did not analyze the permission patterns of malicious samples. The authors in [23] reported dangerous permission patterns, but in sets of 40. Representing and analyzing a large set of permissions can be complex. We analyzed permissions in groups of two, which can be easily represented by graphs.

G. Limitations
The proposed approach has some gray areas which we intend to discuss in this subsection. The proposed model requires permission pairs to test the apps. Hence, apps containing a single permission or no permission cannot be analyzed. Some of the malware samples with few permissions evade detection. Moreover, the proposed model gives a relatively high FPR because many normal apps from the Social and Communication category of the Google Play Store have been identified as malicious due to the presence of dangerous permission pairs. To evade detection, attackers may add widely used normal permissions to the malware samples, thus generating a larger number of normal permission pairs. Besides, the proposed approach does not focus on detecting colluding apps [56], [57]. To overcome these limitations, in the future, we will analyze how efficiently other components such as intent filters, hardware components, and API call logs can be used for detection, in addition to the permissions. Since the proposed approach is static, malware with update capabilities (downloading of malicious components at update time) may evade detection. To overcome this limitation, a dynamic detector can be deployed that could analyze the run-time behavior of the apps. However, this will come at the cost of some computational overhead.

V. CONCLUSION AND FUTURE WORK
In this paper, we proposed a novel approach for detecting malicious apps by using permission pairs extracted from the manifest files. We constructed graphs to analyze permission pairs for both normal and malicious samples and assigned an edge weight to every pair depending upon the number of samples in which the pair is present. We subjected three malware datasets, namely Genome, Drebin, and Koodous, to analysis. The datasets contained newer samples detected in 2014-18 in addition to older samples detected in 2010-2014. Initially, we constructed three different graphs, one for each dataset, and we observed certain deviations in the permission pairs of newer samples compared to the older ones. We merged the different malware graphs into a single graph using the weighted sum method. We further performed edge elimination to remove the unnecessary edges. Results showed that our proposed method is better than eleven of the popular mobile anti-malware apps. Our future work will focus on analyzing the
other components of the manifest file like intent filters and hardware components to increase the detection accuracy.

ACKNOWLEDGMENT
We have taken the Genome dataset from the authors of [4], Drebin [5] dataset from Institute for System Security, Technische Universität Braunschweig, Germany, and Koodous [6] dataset from their website that hosts a large number of malicious applications for analysis. We express deep gratitude to Genome, Drebin and Koodous projects for providing us the malware samples.

REFERENCES
[1] Market Share Alert: Preliminary, Mobile Phones, Worldwide, Gartner, Stamford, CT, USA, 2017.
[2] 97% of Mobile Malware is on Android. This is the Easy Way You Stay Safe, Forbes Media, Jersey City, NJ, USA, 2014.
[3] 2018 Malware Forecast: The Onward March of Android Malware, Security Report, 2017.
[4] Y. Zhou and X. Jiang, “Dissecting Android malware: Characterization and evolution,” in Proc. IEEE Symp. Secur. Privacy, May 2012, pp. 95–109.
[5] D. Arp, M. Spreitzenbarth, M. Hubner, H. Gascon, and K. Rieck, “DREBIN: Effective and explainable detection of Android malware in your pocket,” in Proc. NDSS, 2014, pp. 23–26.
[6] Koodous Malware Dataset. Accessed: Nov. 25, 2019. [Online]. Available: https://www.koodous.com
[7] Android Developers Guide. Accessed: Nov. 25, 2019. [Online]. Available: https://developer.android.com/guide/index
[8] M. C. Grace, W. Zhou, X. Jiang, and A.-R. Sadeghi, “Unsafe exposure analysis of mobile in-app advertisements,” in Proc. 5th ACM WiSec, 2012, pp. 101–112.
[9] W. Enck, M. Ongtang, and P. McDaniel, “On lightweight mobile phone application certification,” in Proc. 16th ACM CCS, 2009, pp. 235–245.
[10] A. P. Felt, E. Ha, S. Egelman, A. Haney, E. Chin, and D. Wagner, “Android permissions: User attention, comprehension, and behavior,” in Proc. 8th Symp. Usable Privacy Secur., 2012, Art. no. 3.
[11] K. W. Y. Au, Y. F. Zhou, Z. Huang, and D. Lie, “PScout: Analyzing the Android permission specification,” in Proc. 19th ACM CCS, 2012,
[22] V. Moonsamy, J. Rong, and S. Liu, “Mining permission patterns for contrasting clean and malicious Android applications,” Future Gener. Comput. Syst., vol. 36, pp. 122–132, Jul. 2014.
[23] W. Wang, X. Wang, D. Feng, J. Liu, Z. Han, and X. Zang, “Exploring permission-induced risk in Android applications for malicious application detection,” IEEE Trans. Inf. Forensics Security, vol. 9, no. 11, pp. 1869–1882, Nov. 2014.
[24] Y. Zhou, Z. Wang, W. Zhou, and X. Jiang, “Hey, you, get off of my market: Detecting malicious apps in official and alternative Android markets,” in Proc. NDSS, 2012, pp. 50–52.
[25] D.-J. Wu, C.-H. Mao, T.-E. Wei, H.-M. Lee, and K.-P. Wu, “Droidmat: Android malware detection through manifest and API calls tracing,” in Proc. 7th Asia Joint Conf. Inf. Secur. (Asia JCIS), Aug. 2012, pp. 62–69.
[26] F. Idrees, M. Rajarajan, M. Conti, T. M. Chen, and Y. Rahulamathavan, “Pindroid: A novel Android malware detection system using ensemble learning methods,” Comput. Secur., vol. 68, pp. 36–46, Jul. 2017.
[27] T. Kim, B. Kang, M. Rho, S. Sezer, and E. G. Im, “A multimodal deep learning method for Android malware detection using various features,” IEEE Trans. Inf. Forensics Security, vol. 14, no. 3, pp. 773–788, Mar. 2019.
[28] K. Sokolova, C. Perez, and M. Lemercier, “Android application classification and anomaly detection with graph-based permission patterns,” Decis. Support Syst., vol. 93, pp. 62–76, Jan. 2016.
[29] H. Zhu, H. Xiong, Y. Ge, and E. Chen, “Mobile app recommendations with security and privacy awareness,” in Proc. ACM KDD, 2014, pp. 951–960.
[30] M. Fan et al., “Android malware familial classification and representative sample selection via frequent subgraph analysis,” IEEE Trans. Inf. Forensics Security, vol. 13, no. 8, pp. 1890–1905, Aug. 2018.
[31] M. Zhang, Y. Duan, H. Yin, and Z. Zhao, “Semantics-aware Android malware classification using weighted contextual API dependency graphs,” in Proc. ACM CCS, 2014, pp. 1105–1116.
[32] A. Felt, E. Chin, S. Hanna, D. Song, and D. Wagner, “Android permissions demystified,” in Proc. ACM CCS, 2011, pp. 627–638.
[33] Y. Feng, S. Anand, I. Dillig, and A. Aiken, “Apposcopy: Semantics-based detection of Android malware through static analysis,” in Proc. 22nd ACM SIGSOFT Symp. Found. Softw. Eng., 2014, pp. 576–587.
[34] K. O. Elish, X. Shu, D. Yao, B. G. Ryder, and X. Jiang, “Profiling user-trigger dependence for Android malware detection,” Comput. Secur., vol. 49, pp. 255–273, Mar. 2015.
[35] W. Enck et al., “TaintDroid: An information-flow tracking system for realtime privacy monitoring on smartphones,” ACM Trans. Comput. Syst., vol. 32, no. 2, 2014, Art. no. 5.
[36] V. Rastogi, Y. Chen, and W. Enck, “AppsPlayground: Automatic security
pp. 217–228. analysis of smartphone applications,” in Proc. ACM CODASPY, 2013,
[12] S. Holavanalli et al., “Flow permissions for Android,” in Proc. 28th pp. 209–220.
IEEE/ACM Int. Conf. Automated Softw. Eng., Nov. 2013, pp. 652–657. [37] M. Sun, T. Wei, and J. C. S. Lui, “TaintART: A practical multi-level
[13] M. Grace, Y. Zhou, Z. Wang, and X. Jiang, “Systematic detection of information-flow tracking system for Android runtime,” in Proc. ACM
capability leaks in stock Android smartphones,” in Proc. NDSS, 2012, CCS, 2016, pp. 331–342.
p. 19.
[38] Z. Yang, M. Yang, Y. Zhang, G. Gu, P. Ning, and X. S. Wang,
[14] B. Sanz, I. Santos, C. Laorden, X. Ugarte-Pedrero, P. G. Bringas, and
“AppIntent: Analyzing sensitive data transmission in Android for privacy
G. Álvarez, “PUMA: Permission usage to detect malware in Android,” in
leakage detection,” in Proc. ACM CCS, 2013, pp. 1043–1054.
Proc. Int. Joint Conf. CISIS’12-ICEUTE’12-SOCO’12 Special Sessions.
[39] A. Shabtai, U. Kanonov, Y. Elovici, C. Glezer, and Y. Weiss,
Berlin, Germany: Springer, 2013.
“‘Andromaly’: A behavioral malware detection framework for Android
[15] B. Sanz et al., “MAMA: Manifest analysis for malware
devices,” J. Intell. Inf. Syst., vol. 38, no. 1, pp. 161–190, 2011.
detection in Android,” Cybern. Syst., vol. 44, nos. 6–7,
pp. 469–488, 2013. [40] A. Reina, A. Fattori, and L. Cavallaro, “A system call-centric analysis
[16] K. A. Talha, D. I. Alper, and C. Aydin, “APK Auditor: Permission-based and stimulation technique to automatically reconstruct Android malware
Android malware detection system,” Digital Invest., vol. 13, pp. 1–14, behaviors,” in Proc. 6th Eur. Workshop Syst. Secur., 2013, pp. 1–6.
Jun. 2015. [41] V. M. Afonso, M. F. de Amorim, A. R. A. Grégio, G. B. Junquera, and
[17] G. Tao, Z. Zheng, Z. Guo, and M. R. Lyu, “MalPat: Mining patterns of P. L. de Geus, “Identifying Android malware using dynamically obtained
malicious and benign Android apps via permission-related APIs,” IEEE features,” J. Comput. Virology Hacking Techn., vol. 11, no. 1, pp. 9–17,
Trans. Rel., vol. 67, no. 1, pp. 355–369, Mar. 2018. 2015.
[18] L. Cen, C. S. Gates, L. Si, and N. Li, “A probabilistic discriminative [42] S. Wang, Q. Yan, Z. Chen, B. Yang, C. Zhao, and M. Conti, “Detecting
model for Android malware detection with decompiled source code,” Android malware leveraging text semantics of network flows,” IEEE
IEEE Trans. Depend. Secure Comput., vol. 12, no. 4, pp. 400–412, Trans. Inf. Forensics Security, vol. 13, no. 5, pp. 1096–1109, May 2018.
Jul./Aug. 2015. [43] A. Shabtai, L. Tenenboim-Chekina, D. Mimran, L. Rokach, B. Shapira,
[19] H. Peng et al., “Using probabilistic generative models for ranking risks and Y. Elovici, “Mobile malware detection through analysis of devia-
of Android apps,” in Proc. ACM CCS, 2012, pp. 241–252. tions in application network behavior,” Comput. Secur., vol. 43, no. 6,
[20] N. Milosevic, A. Dehghantanha, and K.-K. R. Choo, “Machine learning pp. 1–18, 2014.
aided Android malware classification,” Comput. Elect. Eng., vol. 61, [44] Z. Chen et al., “A first look at Android malware traffic in first few
pp. 266–274, Jul. 2017. minutes,” in Proc. IEEE Trustcom, Aug. 2015, pp. 206–213.
[21] H.-J. Zhu, Z.-H. You, Z.-X. Zhu, W.-L. Shi, X. Chen, and L. Cheng, [45] A. Arora, S. Garg, and S. K. Peddoju, “Malware detection using network
“DroidDet: Effective and robust detection of Android malware using traffic analysis in Android based mobile devices,” in Proc. 8th Int. Conf.
static analysis along with rotation forest model,” Neurocomputing, Next Gener. Next Gener. Mobile Apps, Services Technol., Sep. 2014,
vol. 272, pp. 638–646, Jan. 2018. pp. 66–71.
[46] A. Arora and S. K. Peddoju, “Minimizing network traffic features for Android mobile malware detection,” in Proc. 18th Int. Conf. Distrib. Comput. Netw., 2017, Art. no. 32.
[47] A. Saracino, D. Sgandurra, G. Dini, and F. Martinelli, “MADAM: Effective and efficient behavior-based Android malware detection and prevention,” IEEE Trans. Depend. Sec. Comput., vol. 15, no. 1, pp. 83–97, Jan./Feb. 2018.
[48] M. Sun, X. Li, J. C. S. Lui, R. T. B. Ma, and Z. Liang, “Monet: A user-oriented behavior-based malware variants detection system for Android,” IEEE Trans. Inf. Forensics Security, vol. 12, no. 5, pp. 1103–1112, May 2017.
[49] M. Xia, L. Gong, Y. Lyu, Z. Qi, and X. Liu, “Effective real-time Android application auditing,” in Proc. IEEE Symp. Secur. Privacy, May 2015, pp. 899–914.
[50] M. Grace, Y. Zhou, Q. Zhang, S. Zou, and X. Jiang, “Riskranker: Scalable and accurate zero-day Android malware detection,” in Proc. 10th Int. Conf. Mobile Syst., Appl., Services, 2012, pp. 281–294.
[51] A. Arora and S. Peddoju, “NTPDroid: A hybrid Android malware detector using network traffic and system permissions,” in Proc. 17th IEEE TrustCom, Aug. 2018, pp. 808–813.
[52] A. Arora, S. K. Peddoju, V. Chouhan, and A. Chaudhary, “Hybrid Android malware detection by combining supervised and unsupervised learning,” in Proc. 24th ACM MobiCom, 2018, pp. 798–800.
[53] K. Atkinson, An Introduction to Numerical Analysis. Hoboken, NJ, USA: Wiley, 2008.
[54] Contagio Mobile Malware Dump. Accessed: Nov. 25, 2019. [Online]. Available: https://www.contagiominidump.blogspot.com
[55] S. Chen et al., “Automated poisoning attacks and defenses in malware detection systems: An adversarial machine learning approach,” Comput. Secur., vol. 73, pp. 326–344, Mar. 2018.
[56] S. Bugiel, L. Davi, A. Dmitrienko, T. Fischer, A. R. Sadeghi, and B. Shastry, “Towards taming privilege-escalation attacks on Android,” in Proc. 19th Annu. Netw. Distrib. Syst. Secur. Symp., 2012, p. 19.
[57] K. Elish, H. Cai, D. Barton, D. Yao, and B. Ryder, “Identifying mobile inter-app communication risks,” IEEE Trans. Mobile Comput., to be published.

Anshul Arora is currently pursuing the Ph.D. degree from the Department of Computer Science and Engineering, IIT Roorkee, India, under the guidance of Dr. S. K. Peddoju. He is also an Assistant Professor in the discipline of mathematics and computing with the Delhi Technological University, Delhi, India. His research interests include mobile security, mobile malware detection, and network traffic analysis.

Sateesh K. Peddoju (SM’18) has been with IIT Roorkee, India, since 2010. He has publications in reputed journals like IEEE POTENTIALS, MTAP, WPC, and IJIS and conferences, including TrustCom, MASS, ICDCN, and ISC. His research interests include cloud computing, ubiquitous computing, high-performance computing and security. He is currently a Senior Member of ACM. He was a recipient of University Rank and scholarship, and several Best Paper Awards and Best Teacher Award. He is also the Secretary of the IEEE Roorkee section, the Vice-Chair of IEEE Computer Society, India council, and a Founding Faculty Advisor of ACM Student Chapter-IIT Roorkee. He is also a Reviewer of top-rated journals like IEEE TCC, IEEE TSC, MTAP, COSE, COMNET, and JNCA. He is also the Founding General Co-Chair of SLICE-2018 and the General Chair of DSEA-2018. He is involved in several conferences, such as IEEE MASS, IEEE ATC, IEEE SmartComp, IEEE iNIS, and IoTSMS.

Mauro Conti (SM’14) received the Ph.D. degree from the Sapienza University of Rome, Italy, in 2009. After that, he was a Post-Doctoral Researcher with Vrije Universiteit Amsterdam, The Netherlands. In 2011, he joined the University of Padua as an Assistant Professor, where he became an Associate Professor in 2015, and a Full Professor in 2018. He was a Visiting Researcher with GMU in 2008 and 2016, UCLA in 2010, UCI from 2012 to 2014 and in 2017, TU Darmstadt in 2013, UF in 2015, and FIU from 2015 to 2016. He was awarded a Marie Curie Fellowship in 2012 by the European Commission, and a Fellowship by the German DAAD in 2013. He is currently a Full Professor with the University of Padua, Italy, and an Affiliate Professor with the University of Washington, Seattle, USA. His research is also funded by companies, including Cisco and Intel. His main research interests include security and privacy. In this area, he has published more than 200 articles in leading international peer-reviewed journals and conferences. He is also an Area Editor-in-Chief of the IEEE COMMUNICATIONS SURVEYS AND TUTORIALS, and an Associate Editor for several journals, including the IEEE COMMUNICATIONS SURVEYS AND TUTORIALS, the IEEE TRANSACTIONS ON INFORMATION FORENSICS AND SECURITY, and the IEEE TRANSACTIONS ON NETWORK AND SERVICE MANAGEMENT. He was the Program Chair of TRUST 2015, ICISS 2016, WiSec 2017, and the General Chair of SecureComm 2012 and ACM SACMAT 2013.
