Real-Time Behavior Analysis and Identification For Android Application
ABSTRACT The number of applications based on the Android platform is increasing rapidly.
However, as the supervision and review of Android applications are inadequate, a reasonable chance exists
that users will download malware. This malware can lead to information leakage, monetary loss, and other
damages. At present, a variety of applications exist for detecting malware, but most of them
cannot show specific malicious behaviors. Moreover, such detection software relies on
a database of known viruses, and thus, it cannot identify unknown malware. To solve these problems, we
implemented a system to detect the behaviors of Android applications and identify known or unknown
malware. Our system monitors specified applications by loading a kernel module. After the
detection process, the related documents are uploaded to the server, and the dynamic behaviors are
reconstructed. As a result, a behavior diagram is generated. In addition, if the user needs to know whether
the application is malware, the related Android package is sent to the server and analyzed. Then, the server
computes the result and returns it to the client.
INDEX TERMS Android malware, behavior analysis, dynamic detection, software identification
2169-3536 © 2017 IEEE. Translations and content mining are permitted for academic research only.
Personal use is also permitted, but republication/redistribution requires IEEE permission.
See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.
This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final
publication. Citation information: DOI 10.1109/ACCESS.2018.2853121, IEEE Access
known malware, and therefore are unable to identify unknown malware.
In this paper, we apply hybrid analysis in our system to resolve the problems described above. We
implement an Android application to detect, analyze, and identify applications, and evaluate it using
many sample applications. The key contributions of this paper are summarized as follows:
1) We create a new approach for detecting real-time behaviors based on an Android kernel, which uses
kernel-level monitoring mechanisms.
2) We create a new approach for identifying malware using the results of both dynamic detection and
static analysis. An application is then classified from the statistical results as malware or not
by using a naive Bayesian algorithm. In addition, it is noteworthy that this method can identify
both known and unknown software.
3) Our approach analyzes behaviors using a data-centric technique, which differs from the traditional
analysis of a single application or the entire system [1] [14]. The analysis approach presented in
this paper is capable of reconstructing the behaviors of multiple applications while incurring less
overhead.
4) We implement a complete system with graphical interfaces, which is more convenient to use. Moreover,
we present the evaluation of our system in detail.
The rest of this paper is structured as follows. In Section II, we introduce the techniques applied in
our system and give an overview of the system. The detailed design of the system is presented in
Section III. Then, we present the evaluation in Section IV. In Section V, we review and compare related
work. Finally, Section VI discusses limitations and future work.

II. OVERVIEW

A. TECHNIQUES
Several techniques are applied in our system to manage data storage, network transmission [23], and the
service terminal. MySQL is used to store data, including the permissions and APIs of applications and
the probability that selected permissions and APIs are used. Using these data, our system can identify
Android applications. Moreover, the small scale and fast speed of MySQL suit our system. The process of
identifying applications is executed on the server. To build the server, we apply Struts 2, a Web
application framework. Unlike Struts 1, Struts 2 is based on WebWork and handles requests using an
interceptor mechanism, so that it can completely separate the business logic from Servlet.

B. SYSTEM OVERVIEW
An overview of the system of real-time behavior analysis and identification for Android applications is
presented in Fig. 1. The system is composed of six modules, two of which are on the client side and four
on the server side. Specifically, on the client side, an Android application is installed that is
composed of two modules. The first time the application is run on a device, the required files are
initialized. Then, each time the application is opened, additional initialization work is
completed. The system loads a kernel module to detect the behaviors of the selected applications, and
all behaviors are recorded in the general log file.
Our server is composed of four modules. If required, the dynamic behaviors of an application are
reconstructed, and then the behavior graph is generated. The configuration files and codes of an Android
application can be acquired by using a decompiler. In addition, relevant content such as permissions and
APIs can be extracted from these files. Samples of malicious or benign applications are parsed, and the
statistical results for permissions and APIs can be acquired. After feature selection, selected
permissions and APIs are used to train a classifier. When it has received an APK, the server analyzes it
and utilizes the classifier to identify the application.

III. DETAILED DESIGN

A. INITIALIZATION
The initialization module is responsible for generating resource files when the Android application is
used for the first time. Specifically, some resource files need to be copied, including all Android
interface definition language (AIDL) files and the loadable kernel module [24] [28], and some files need
to be created, such as the uid_file and directories for log files and behavior graphs.
Nonsystem applications are shown as a list each time the user opens our Android application.
PackageManager is used to enumerate all applications, and system applications are filtered out. The
detailed information of each nonsystem application includes the name of the application, the name of the
application package, the id of the application, the name of the version, the number of the version, the
installation date, the update date, and the icon of the application.

B. DYNAMIC BEHAVIOR DETECTION
An important task in the behavior detection module is to add a hook to the kernel to monitor system
calls. A loadable kernel module can implement this task. Most operating systems, including Unix and
Windows, support loadable kernel modules, and thus such a module can be used to detect behaviors of
applications without recompiling the Linux kernel or restarting the system.
In rootkit.ko, the first step is to obtain sys_call_table, which contains all system calls. When an
interrupt or an exception occurs, the kernel jumps to the exception vector table to handle it. In the
Linux kernel of the Android system, a section of space starting at the address 0xffff0000 represents all
interrupt routines, and an instruction, which is stored at the address 0xffff0008, is able to copy the
address of exception handling to the current instruction register. In the process of exception handling,
an instruction exists that loads the address of sys_call_table to a register. Therefore, it is feasible
to search for the loading instruction in the process of handling, and then find the address of
sys_call_table.
After obtaining the address of sys_call_table, specific system calls can be intercepted. The original
addresses of the required system calls should first be stored and then replaced with the addresses of the
handling functions designed in this study. In these handling functions, the original system calls are
still called to handle interrupts or exceptions. Four types of system calls are intercepted in our
system: Android interprocess communication, file operations, network operations, and process management.
Among these system calls, Android interprocess communication can be parsed by Binder Parser, as most of
the system calls depend on the Android Binder mechanism; other system calls can be directly parsed using
System Call Parser. The two parsers are introduced below.

1) BINDER PARSER
The Binder framework is the standardized interprocess communication mechanism. The Binder mechanism
consists of four components: Client, Server, Service Manager, and Binder Driver. Client, Server, and
Service Manager run in the user space, whereas Binder Driver runs in the kernel space. In addition, in
communications, Binder Driver provides /dev/binder, which is a type of device file for communicating
with the user space. At the same time, Client, Server, and Service Manager communicate with Binder
Driver by file manipulation functions, including open and ioctl. Finally, Service Manager is used as a
daemon for managing Server and providing Client with the capacity to query the interfaces of Server.
AIDL is used to implement the one-to-one correspondence above in the Android system, as it allows an
application to define interfaces between Client and Server. The Android software development kit (SDK)
automatically generates a Java interface file when an AIDL file has been completed.
The source function of the ioctl system call is ioctl(unsigned int fd, unsigned int cmd, unsigned long
arg). The first parameter represents a file descriptor of one binder device, the second represents the
IO control command, and the third represents the content sent from the user space. If the second
parameter is BINDER_WRITE_READ, the third must be a binder_write_read structure. In the
binder_write_read structure, the member variable “write_buffer” records content that is transmitted from
the user space to Binder Driver, while the member variable “read_buffer” preserves data from Binder
Driver in the user space.
Further, “write_buffer” is stored as an array, each element of which consists of a communication
protocol code and a set of communication data. The communication protocol code BC_TRANSACTION represents
communication data between processes; the corresponding structure is binder_transaction_data. The
proposed system extracts the required information from the two member variables of
binder_transaction_data described in the following.
“Data_buffer_address,” which stores information from user space, is the first member variable required.
The first content
in the buffer is the request header. The following content comprises the parameters of the interface,
which can be parsed by Server. Contents other than primitive information and contents in Intents are
unreadable. Therefore, we directly parse the primitive information of specific functions and information
in Intents to gain data in the buffer.
The second member variable is “function_feature_code,” which matches the function in the AIDL file. The
Android SDK may generate some constants according to rules when compiling the AIDL file. These constants
are stored at member variable “code,” and Server searches for the corresponding function through the
constants. To parse “code,” it is necessary to parse the AIDL files in advance for interfaces. In this
system, the AIDL files are parsed automatically and stored in memory. In addition, some bound services
such as ActivityManager and ContentProvider are services themselves, and thus we convert them into AIDL
files manually.

2) SYSTEM CALL PARSER
System calls of file operations contain mainly open, read, write, and close, and parameter parsing is
aimed at file descriptors. To improve the performance, we apply a hash table to store file descriptors
that have been parsed. Furthermore, some file operations are triggered by an application itself; these
are probably operations on processes, device files, and class libraries. In this system, file operations
on /proc, /dev, /vendor/lib, and /system/lib are not recorded, so that they do not interfere with the
later behavior reconstruction.
For network operations, we choose to record the transmitted data and its sources and destinations. For
the system calls sendto and recvfrom, attention should be paid to the first three parameters. In detail,
the first parameter is the file descriptor of the socket and is used to record the IP addresses and port
numbers of sources and destinations. The second represents the data to be sent or received, and the
third specifies the length of the data.
All system calls are eventually recorded in a general file in the format [Rootkit] (process id)
(application id) system call (parameter 1, …, parameter n). In the Android system, each Android
application has a unique id, which distinguishes it from other applications.

C. BEHAVIOR RECONSTRUCTION
The records in log files are too complicated to be understood by users; therefore, log records are
reconstructed to generate behavior graphs in the proposed system. As the behavior reconstruction module
is deployed on the server, the log file and packages.list must be sent over the network to the server.

1) GRAPH GENERATING ALGORITHM
The code for the graph-generating algorithm is presented in Algorithm 1. In the algorithm, “uid” stands
for the unique id of an Android application, “pid” stands for the id of a process, and “cid” stands for
the id of a child process. The main steps of the algorithm are to create a node and add the node to the
tree for each record in the log file. Therefore, the time complexity of the algorithm is O(n), where n
stands for the number of records in the log file.
The details of the algorithm flow are as follows:
1) Initialize the array nodes and the hash table map.
2) Get log records from the log file one by one. If no log record remains, the system goes to Step 9.
3) Extract pid, uid, the name of the function, and parameters from each log record, and then store them
in a behavior node. If the system call is a clone call, cid also needs to be stored.
4) If uid has been stored in map, this indicates that the node of the application has been created, and
the system goes to Step 5. If the node of the application has not been created, the system goes to
Step 8.
5) When pid has also been stored in map, the process tree to which the current node belongs is already
in the graph, and the system goes to Step 6. It goes to Step 7 when pid is not in map.
6) If the system call is clone, insert this node after the node recorded in map[uid][pid], and update
map[uid][cid] to this node. For other system calls, insert this node after the node recorded in
map[uid][pid], and then update map[uid][pid] to this node.
7) If pid does not exist, make the current behavior node a child node of the application node, and then
update map[uid][pid].
8) If uid does not exist, create an application node and make it a child node of the root node. In
addition, make the current behavior node a child node of the application node, and update map[uid][pid].
9) A primary behavior graph has been generated.

Algorithm 1. Graph-generating algorithm
    nodes[0] = root
    map = empty hash table indexed by uid and pid
    i = 1
    for each line in log file do
        store pid, uid, function, and parameters into a node
        for clone function store child id (cid) into the node
        nodes[i] = this node
        if uid exists then
            if pid exists then
                if function == clone then
                    let this become the child of nodes[map[uid][pid]]
                    map[uid][cid] = i
                else
                    let this become the child of nodes[map[uid][pid]]
                    map[uid][pid] = i
                end if
            else
                this node becomes the child of application node
                map[uid][pid] = i
            end if
        else
            create an application node
            this node becomes the child of application node
            map[uid][pid] = i
        end if
        i++
    end for
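As a rough illustration, the branching of the graph-generating algorithm can be sketched in Python. This is a minimal sketch under stated assumptions, not the implementation used in the system: the Node class, the build_graph function, and the (pid, uid, function, params, cid) record tuples are hypothetical names introduced here, and the log records are assumed to have been parsed already.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """One node of the behavior graph (hypothetical structure)."""
    label: str
    children: list = field(default_factory=list)


def build_graph(records):
    """Build a behavior tree from parsed log records.

    Each record is an assumed tuple (pid, uid, function, params, cid),
    where cid is None unless the function is "clone".
    """
    root = Node("root")
    app_nodes = {}  # uid -> application node
    last = {}       # (uid, pid) -> most recent node of that process
    for pid, uid, function, params, cid in records:
        node = Node(f"{function}({', '.join(params)})")
        if uid not in app_nodes:
            # Step 8: first record of this application.
            app = Node(f"app:{uid}")
            root.children.append(app)
            app_nodes[uid] = app
            app.children.append(node)
            last[(uid, pid)] = node
        elif (uid, pid) in last:
            # Step 6: attach to the node currently recorded for this process.
            last[(uid, pid)].children.append(node)
            if function == "clone":
                last[(uid, cid)] = node  # the child process continues here
            else:
                last[(uid, pid)] = node
        else:
            # Step 7: known application, new process.
            app_nodes[uid].children.append(node)
            last[(uid, pid)] = node
    return root


# Made-up sample records, for illustration only.
records = [
    (100, 10001, "open", ["/sdcard/a.txt"], None),
    (100, 10001, "clone", [], 101),
    (101, 10001, "write", ["3"], None),
    (100, 10001, "close", ["3"], None),
    (200, 10002, "sendto", ["5"], None),
]
root = build_graph(records)
```

Each record is handled with a constant number of dictionary lookups, which is consistent with the O(n) time complexity stated above.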
cycle, and also make the current node the last function of the broadcast life cycle. At the same time,
make the two nodes the children of the node of the schedule method of the broadcast life cycle.
9) Remove the queued node from the queue, and return to Step 1.
10) The broadcast-matching algorithm is completed, and a behavior graph with abundant semantic
information has been generated.

3) GRAPH-SIMPLIFYING ALGORITHM
The behavior graph generated above still contains redundant information and needs to be simplified.
First, the redundant nodes of clones without children should be removed. Second, duplicate nodes that
are called consecutively are combined into one node, and the number of repetitions is marked. Finally, in
this system, each broadcast life cycle can be abstracted into a node, which will contain abstract
behavior information. There are three types of nodes after abstraction, as follows:
1) File operations, including the “open,” “write,” “read,” and “close” system calls, are classified as
File Access.
2) Network operations, including the “sendto” and “recvfrom” system calls, are classified as Network
Access.
3) The third type contains IPC calls, but they are too numerous to arrange. Therefore, the permission
mechanism of the Android system is used. In the Android system, the permissions of an application are
checked if one of the key APIs is called. For example, the system checks whether the application owns
the android.permission.SEND_SMS permission if a message is sent. In practice, permission checking is
completed through the function checkPermission in the class ActivityManager, and the checked permission
is added to the abstract node as abstract information.

The server receives the APK of an Android application from the client and analyzes it to extract
permissions and APIs [25]. In this system, APKTool, an open-source decompiling tool, is used to
decompile APKs. The tool generates files
program, we execute commands of CMD by obtaining the runtime environment.

E. APPLICATION STATISTICS AND ANALYSIS
During the preparation phase, a large number of Android application samples was essential. As it was
unrealistic to download all of the applications manually, benign applications were obtained by means of
a Web crawler from the Android Market, and malware applications were downloaded from several forums such
as the Kafan and phpBB forums. For this system, it was sufficient to use a simple Web crawler program to
grab information from the Internet automatically.
The statistics of the permissions and APIs are required. Permissions are considered as an example in
this paper. The first step is to count the permissions. Thus, for each permission, four numbers need to
be calculated: the number of malware applications using the permission, the number of benign
applications using the permission, the number of malware applications not using the permission, and the
number of benign applications not using the permission. This is the process of collecting the statistics
of permissions, as well as of APIs.
The second step is to analyze the permissions and obtain characteristic attributes. In this system, a
chi-square test is applied to determine whether the presence of a permission and the nature of an
application owning the permission are related. For the algorithm, the chi-square values can be
calculated by using the formula for four-fold tables. In the formula, a stands for the number of malware
applications using the permission, b stands for the number of benign applications using the permission,
c stands for the number of malware applications not using the permission, d stands for the number of
benign applications not using the permission, and n = a + b + c + d stands for the number of
applications. In the special case where the value of a, b, c, or d is less than 5 and that of n is
greater than 40, the correction formula

$$\chi^2 = \frac{n\left(|ad - bc| - \frac{n}{2}\right)^2}{(a+b)(c+d)(a+c)(b+d)} \tag{1}$$

is required. In addition,

$$\chi^2 = \frac{n\,(ad - bc)^2}{(a+b)(c+d)(a+c)(b+d)} \tag{2}$$
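For illustration, the two four-fold-table formulas can be evaluated as in the following Python sketch. It is a minimal example under stated assumptions rather than the system's code: the function name chi_square is hypothetical, the switch between (1) and (2) follows the condition given in the text, and returning 0.0 for a table with an empty row or column is a choice made here.

```python
def chi_square(a, b, c, d):
    """Chi-square value of a four-fold table.

    a: malware applications using the permission
    b: benign applications using the permission
    c: malware applications not using the permission
    d: benign applications not using the permission
    """
    n = a + b + c + d
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    if denominator == 0:
        # Assumption made here: an empty row or column carries no information.
        return 0.0
    if n > 40 and min(a, b, c, d) < 5:
        # Correction formula (1), used when a cell count is small.
        return n * (abs(a * d - b * c) - n / 2) ** 2 / denominator
    # Formula (2).
    return n * (a * d - b * c) ** 2 / denominator
```

A larger value indicates a stronger association between the presence of the permission and the class of the application, so permissions can be ranked by this value during feature selection.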
FIGURE 3. Monitoring behaviors of GGTracker

received. Under normal circumstances, if users decide to subscribe to a service by using messages, the
service provider sends a message to the users to confirm the fees. After receiving the message, users
need to respond with specific content such as “Confirm” or “Y.” However, GGTracker is able to subscribe
to fee-based services without the user being aware, as it intercepts and listens to messages.
The second malicious behavior is the interception of messages, which is presented in Fig. 3. When a
mobile device has received a message, GGTracker intercepts the message and parses it. The analysis of
behaviors showed that GGTracker intercepts messages sent from phone numbers including 99735, 46621,
96512, 33335, 00033335, 00036397, 36397, 55991, 55999, 56255, and 41001. Then, GGTracker sends the
message to a remote server, the domain name of which is www.amaz0n-cloud.com. In particular, GGTracker
may reply to the message sent by phone number 41001 with “Yes.”

B. EVALUATION OF PERFORMANCE
In the case of dynamic detection, we evaluated the performance when our system was run on Android
devices. Two custom test schemes were applied. One was applied to test the upper limit on the
performance overhead of parsing system calls, which include mainly ioctl system calls and file system
calls. The second scheme tested the performance overhead in an actual operation process. We tested the
start-up time of applications to evaluate their performance overhead, as applications are disturbed less
during this time. In addition, during the start-up time, operations including ActivityManager analysis,
interprocess communication, display of graphical interfaces, and behaviors in the life cycle of
onCreate() are executed.
For the first performance test, a program was designed to request multiple file operations or IPC calls
repeatedly, and then record the execution time to test the upper limit of the performance overhead.
Table I shows that the overhead of file operations is 17%, and that of parsing IPC calls is 18%.
Moreover, as Table II shows, the overhead of our proposed method is at the lower end of the range
reported for previous methods.

TABLE I
PERFORMANCE OVERHEAD
Operation        Without Proposed Method   With Proposed Method   Overhead
                 (average time, ms)        (average time, ms)
IPC              11110.5                   13141                  18%
File operation   4132.1                    4853.2                 17%

TABLE II
COMPARISON WITH PREVIOUS METHODS
Research Name            Overhead
Research in this paper   17% - 18%
TaintDroid               >= 32%
VetDroid                 >= 32%
Aurasium                 14% - 35%
CopperDroid              20% - 30%

In the second test process, an Android Debug Bridge (ADB) instruction was used to test the start-up time
of Android applications. To avoid interference, we tested the start-up time of each application 10 times
and then calculated the average time. Ultimately, 16 different applications were tested, of which 14
were provided by the system, and 2 were installed manually. The behaviors of applications in the
start-up process differ, and therefore their overheads are not the same. According to the results, the
performance overhead of the start-up process ranges from 0.20% to 10.67%, and the average start-up time
increases by 5.25% with dynamic analysis.
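The overhead percentages in Table I are consistent with a relative increase in average execution time. The short Python sketch below reproduces them from the reported timings; the function name overhead_percent and the dictionary layout are hypothetical, and the definition of overhead as (monitored - baseline) / baseline is an assumption that happens to match the published figures.

```python
def overhead_percent(baseline_ms, monitored_ms):
    """Relative increase in execution time, in percent (assumed definition)."""
    return (monitored_ms - baseline_ms) / baseline_ms * 100.0


# Average times from Table I: (without proposed method, with proposed method).
table_i = {
    "IPC": (11110.5, 13141.0),
    "File operation": (4132.1, 4853.2),
}
overheads = {op: overhead_percent(*times) for op, times in table_i.items()}
```

The same function could be applied to the averaged start-up times of the second test, given the per-application measurements.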
C. EVALUATION OF MALWARE APPLICATION IDENTIFICATION
We evaluated the results of the malware application identification. After being downloaded, 122 malware
applications and 166 benign applications were available for testing. We traversed all applications to
test them in our system one by one, to get the results of the identification process.
Fig. 4 presents the results of our experiment. As we focused on identifying malware applications, we
treated malware applications as the positive class. In the experiment, 104 malware applications and 150
benign applications were accurately identified. The accuracy of the malware application identification
is therefore (104 + 150)/288 = 88.2%, the precision is 104/(104 + 16) = 86.7%, and the recall is
104/(104 + 18) = 85.2%. These results show that the identification method is effective.

[Figure 4: chart of identification results. True positives: 104; false negatives: 18; true negatives:
150; false positives: 16.]
FIGURE 4. Results of identifying applications. Malware application is confirmed as the positive class
and benign application is confirmed as the negative class. For example, “False positives” means that an
instance of benign application is identified as malware application.

V. RELATED WORK
Static Analysis. Static analysis extracts the required information by analyzing source codes or binary
files. It analyzes and covers all codes rather than executing the application, and therefore its code
coverage is high. However, the method lacks practical execution paths and relevant contextual
information [2]. Moreover, it is faced with the challenges of code obfuscation and dynamic code loading.
As compared with most static analysis methods, including RiskRanker [9], our static analysis method
extracts all of the permissions and APIs and then analyzes them using statistics, instead of looking up
sensitive or dangerous codes. Furthermore, we use permissions and APIs from the APK and the results of
dynamic detection to solve the problem of dynamic code loading.
Dynamic Analysis. Dynamic analysis is performed by executing applications and observing their behaviors
while they are running. It is able to avoid the problems of code obfuscation and dynamic code loading.
Dynamic analysis based on a Dalvik VM is widely used for taint analysis. TaintDroid [6], which performs
taint tracking by modifying the Dalvik VM, was the original dynamic taint analysis method. Based on
TaintDroid, several researchers proposed different methods [18] [26]. However, all of these methods have
the same shortcoming: it is difficult to analyze native codes. Thus, to solve the problem, VMI-based
research studies were presented. For example, DroidScope [22] runs the entire Android system on a QEMU
VM in order to seamlessly reconstruct the semantics of the OS and the Java layer. These approaches need
to be run in a simulated environment, and it is unlikely that they will be ported to real devices. They
are unable to obtain real behaviors of applications and are faced with problems such as anti-forensic
techniques.
In this paper, we propose a new method of dynamic analysis based on a kernel to detect the behaviors of
applications. As compared with the approaches above, the most important characteristic of our method is
that it can be used on real devices to yield reliable results. Furthermore, our method uses kernel-level
monitoring mechanisms in order to monitor both Java codes and native codes, while dynamic analysis based
on a Dalvik VM is unable to achieve this.
In addition, malware cannot discover our method and thereby evade detection, because our method runs at
the kernel level and owns the highest-level permission, whereas most applications can own only
lower-level permissions. Although Jarvis also operates in the Linux kernel, its main goal is to bridge
the semantic gap between high-level Android APIs and low-level system calls, not to analyze application
behaviors. In addition, our method transforms the detection results into a behavior graph, and thus,
users can understand the analysis results more easily.

VI. DISCUSSION
In the study described in this paper, we implemented a system of real-time behavior analysis and
identification for Android applications. The system can be improved to enhance security and accuracy in
the future. Firstly, the process of identification is completed through the network, and thus it is
faced with severe security issues [4] [5] [10] [21] such as information leakage. In the future, we can
create an effective algorithm to select the best relay to assist the secure transmission, as in [7] [8].
Moreover, we can ensure the transmission security by implementing data encryption and decryption. To
improve the system performance, we can also apply cache techniques [20]. If the identification results
are pre-stored at the relay nodes around the user, the data transmission will be directly performed from
the relays to the user, instead of going through the identification process again. Thus, it is an
effective way to reduce the transmission load and to speed up the transmission of identification
results. Considering that a large number of users are likely to identify applications at the same time,
we should improve cache techniques [13] to deal with such a situation. However,
outdated channel state information (CSI) [13] may be a tricky problem for us.
Secondly, the limitations of the dynamic analysis method in our system are essentially the same as those of traditional dynamic analysis methods. On the one hand, dynamic analysis methods usually rely on custom automated tools, or on the users themselves, to trigger events. Thus, it takes a long time to analyze a large number of applications. On the other hand, an application contains many logic branches, and this is likely to give rise to a path explosion problem. In addition, our system is designed mainly for Android 4.4.2 and is unable to deal with some services automatically. Therefore, our system may be improved to enhance the accuracy in the future as follows:
1) We should attempt to apply the corresponding AIDL files in different Android systems to assist binder parsing.
2) The dynamic analysis should automatically extract bound services from the Android system and analyze them to provide information for binder parsing.
3) We can combine static analysis and dynamic analysis to generate more comprehensive graphs.

REFERENCES
[1] Adity and D. Kaur, “Detection and prevention of malicious node using data centric techniques,” International Journal of Emerging Trends and Technology in Computer Science, vol. 5, no. 2, pp. 95-97, Mar. 2016.
[2] S. Arzt, S. Rasthofer, C. Fritz, E. Bodden, A. Bartel, J. Klein, Y. L. Traon, D. Octeau and P. McDaniel, “FlowDroid: Precise context, flow, field, object-sensitive and lifecycle-aware taint analysis for Android apps,” in Proc. 35th ACM SIGPLAN Conf. Programming Language Design and Implementation, Edinburgh, United Kingdom, 2014, pp. 259-269.
[3] Y. Cheng, et al., “A lightweight live memory forensic approach based on hardware virtualization,” Elsevier Information Sciences, vol. 379, pp. 23-41, Feb. 2017.
[4] X. Du and H. H. Chen, “Security in wireless sensor networks,” IEEE Wireless Communications, vol. 15, no. 4, pp. 60-66, Aug. 2008, DOI. 10.1109/MWC.2008.4599222.
[5] X. Du, M. Guizani, Y. Xiao, and H. H. Chen, “Secure and efficient time synchronization in heterogeneous sensor networks,” IEEE Trans. Vehicular Technology, vol. 57, no. 4, pp. 2387-2394, Jul. 2008, DOI. 10.1109/TVT.2007.912327.
[6] W. Enck, P. Gilbert, B. G. Chun, L. P. Cox, J. Y. Jung, P. McDaniel and A. N. Sheth, “TaintDroid: An information-flow tracking system for realtime privacy monitoring on smartphones,” in Proc. 9th USENIX Conf. Operating systems design and implementation, Vancouver, BC, Canada, 2010, pp. 393-407.
[7] L. S. Fan, X. F. Lei, N. Yang, T. Q. Duong and G. K. Karagiannidis, “Secure multiple amplify-and-forward relaying with cochannel interference,” IEEE Journal of Selected Topics in Signal Processing, vol. 10, no. 8, pp. 1494-1505, Dec. 2016, DOI. 10.1109/JSTSP.2016.2607692.
[8] L. S. Fan, X. F. Lei, N. Yang, T. Q. Duong and G. K. Karagiannidis, “Secrecy cooperative networks with outdated relay selection over correlated fading channels,” IEEE Trans. Vehicular Technology, vol. 66, no. 8, pp. 7599-7603, Aug. 2017, DOI. 10.1109/TVT.2017.2669240.
[9] M. Grace, Y. Zhou, Q. Zhang, S. H. Zou and X. X. Jiang, “Riskranker: scalable and accurate zero-day android malware detection,” in Proc. 10th Int. Conf. Mobile systems, applications, and services, Low Wood Bay, Lake District, UK, 2012, pp. 281-294.
[10] X. Hei, X. J. Du and S. Lin, “PIPAC: Patient Infusion Pattern based Access Control Scheme for Wireless Insulin Pump System,” in Proc. of IEEE INFOCOM, Turin, Italy, 2013, pp. 3030-3038.
[11] J. Hoffmann, M. Ussath, T. Holz and M. Spreitzenbarth, “Slicing droids: program slicing for smali code,” in Proc. 28th Annual ACM Symposium on Applied Computing, Coimbra, Portugal, 2013, pp. 1844-1851.
[12] A. Kovacheva, “Efficient code obfuscation for Android,” in Proc. Int. Conf. Advances in Information Technology, Bangkok, 2013, pp. 104-119.
[13] X. Z. Lai, J. J. Xia, M. B. Tang, H. C. Zhang and J. H. Zhao, “Cache-aided multiuser cognitive relay networks with outdated channel state information,” IEEE Access, vol. 6, pp. 21879-21887, 2018.
[14] A. Lanzi, D. Balzarotti, C. Kruegel, M. Christodorescu and E. Kirda, “AccessMiner: using system-centric models for malware protection,” in Proc. 17th ACM Conf. Computer and Communications Security, Chicago, IL, USA, 2010, pp. 399-412.
[15] S. Poeplau, Y. Fratantonio, A. Bianchi, C. Kruegel and G. Vigna, “Execute this! Analyzing unsafe and malicious dynamic code loading in Android applications,” in Proc. NDSS Symposium, San Diego, California, USA, 2014.
[16] A. Reina, A. Fattori and L. Cavallaro, “A system call-centric analysis and stimulation technique to automatically reconstruct android malware behaviors,” in Proc. ACM European Workshop on Systems Security, Prague, 2013, pp. 1-6.
[17] J. Sahs and L. Khan, “A machine learning approach to Android malware detection,” in Proc. EISIC, Odense, Denmark, 2012, pp. 141-147.
[18] D. Schreckling, J. Köstler and M. Schaff, “Kynoid: real-time enforcement of fine-grained, user-defined, and data-centric security policies for android,” in Proc. 6th IFIP WG 11.2 Int. Conf. Information Security Theory and Practice: security, privacy and trust in computing systems and ambient intelligent ecosystems, Egham, UK, 2012, pp. 208-223.
[19] L. Wu, X. J. Du and J. Wu, “Effective Defense Schemes for Phishing Attacks on Mobile Computing Platforms,” IEEE Transactions on Vehicular Technology, vol. 65, pp. 6678-6691, Aug. 2016, DOI. 10.1109/TVT.2015.2472993.
[20] J. J. Xia, F. S. Zhou, X. Z. Lai, H. C. Zhang, H. B. Chen, Q. H. Yang, X. Liu, J. H. Zhao, “Cache Aided Decode-and-Forward Relaying Networks: From the Spatial View,” Wireless Communications and Mobile Computing, pp. 1-9, Apr. 2018, DOI. 10.1155/2018/5963584.
[21] Y. Xiao, V. Rayi, B. Sun, X. Du, F. Hu and M. Galloway, “A survey of key management schemes in wireless sensor networks,” Journal of Computer Communications, vol. 20, no. 11-12, pp. 2314-2341, Sep. 2007.
[22] L. K. Yan and H. Yin, “DroidScope: seamlessly reconstructing the OS and dalvik semantic views for dynamic android malware analysis,” in Proc. 21st USENIX Conf. Security symposium, Bellevue, WA, 2012, pp. 29-29.
[23] S. L. Yang and J. P. He, “Research and implementation of web services in Android network communication framework Volley,” in Proc. 11th Int. Conf. Service Systems and Service Management, Beijing, China, 2014, pp. 1-3.
[24] D. H. You and B. N. Noh, “Android platform based linux kernel rootkit,” in Proc. 6th Int. Conf. Malicious and Unwanted Software, Fajardo, Puerto Rico, 2011, pp. 79-87.
[25] F. Yu, S. Anand, I. Dillig and A. Aiken, “Apposcopy: semantics-based detection of Android malware through static analysis,” in Proc. 22nd ACM SIGSOFT International Symposium on Foundations of Software Engineering, Hong Kong, China, 2014, pp. 576-587.
[26] Y. Zhang, M. Yang, B. Q. Xu, Z. M. Yang, G. F. Gu, P. Ning, X. S. Wang and B. Y. Zang, “Vetting undesirable behaviors in android apps with permission use analysis,” in Proc. ACM SIGSAC Conf. Computer and communications security, Berlin, Germany, 2013, pp. 611-622.
[27] Y. Zhou and X. Jiang, “Dissecting android malware: characterization and evolution,” in Proc. IEEE Symposium on Security and Privacy, San Francisco, CA, USA, 2012, pp. 95-109.
[28] W. Zhu, Y. J. Wang and Z. Xue, “Study on Android rootkit based on VFS,” Information Security and Communications Privacy, vol. 1, pp. 68-69, Jan. 2013.
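
As a rough illustration of the caching idea raised in Section VI (returning a pre-stored identification result instead of repeating the identification process), the following Python sketch may help. It is not part of the system described in this paper. The class name IdentificationCache, the use of a SHA-256 digest of the APK as the cache key, and the placeholder classifier fake_identify are all assumptions made for this example only.

```python
import hashlib
from typing import Callable, Dict


class IdentificationCache:
    """Hypothetical relay-side cache of identification results.

    Results are keyed by the SHA-256 digest of the APK bytes (an assumption
    of this sketch), so an APK that was already identified is answered
    locally without calling the identification routine again.
    """

    def __init__(self, identify: Callable[[bytes], str]) -> None:
        self._identify = identify            # stand-in for the server-side identification
        self._results: Dict[str, str] = {}   # digest -> stored result
        self.hits = 0
        self.misses = 0

    def lookup(self, apk_bytes: bytes) -> str:
        key = hashlib.sha256(apk_bytes).hexdigest()
        if key in self._results:
            # Cache hit: reuse the pre-stored result.
            self.hits += 1
            return self._results[key]
        # Cache miss: run the identification once and store its result.
        self.misses += 1
        result = self._identify(apk_bytes)
        self._results[key] = result
        return result


def fake_identify(apk_bytes: bytes) -> str:
    # Placeholder classifier for the demo only; it is NOT the paper's method.
    return "malware" if b"evil" in apk_bytes else "benign"


if __name__ == "__main__":
    cache = IdentificationCache(fake_identify)
    cache.lookup(b"evil payload")   # first request: identification runs
    cache.lookup(b"evil payload")   # repeated request: served from the cache
    print(cache.hits, cache.misses)
```

In this sketch a repeated request for the same APK increments the hit counter and never reaches the identification routine, which is the saving the cache proposal aims for. Eviction, result expiry, and the outdated-CSI issue mentioned in Section VI are deliberately left out.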