System Call Graphs
Abstract—With the explosive growth of Android malware and the damage it causes to smart phone users (e.g., stealing user credentials, resource abuse), Android malware detection has become a cyber security topic of great interest. Currently, the most significant line of defense against Android malware is anti-malware software products, such as Norton, Lookout, and Comodo Mobile Security, which mainly use the signature-based method to recognize threats. However, malware attackers increasingly employ techniques such as repackaging and obfuscation to bypass signatures and defeat attempts to analyze their inner mechanisms. The increasing sophistication of Android malware calls for new defensive techniques that are harder to evade and are capable of protecting users against novel threats. In this paper, we propose a novel dynamic analysis method named Component Traversal that can automatically execute the code routines of each given Android application (app) as completely as possible. Based on the extracted Linux kernel system calls, we further construct weighted directed graphs and then apply a deep learning framework resting on the graph based features to detect previously unknown Android malware. A comprehensive experimental study on a real sample collection from Comodo Cloud Security Center is performed to compare various malware detection approaches. Promising experimental results demonstrate that our proposed method outperforms other alternative Android malware detection techniques. Our developed system Deep4MalDroid has also been integrated into a commercial Android anti-malware software.

I. INTRODUCTION

Smart phones have been widely used to perform tasks such as banking, automated home control, and bill paying in people’s daily life. In recent years, there has been an exponential growth in the number of smart phone users around the world: according to a recent report [27], there were over 1.91 billion smart phone users across the globe in 2015. Designed as an open, free, and programmable operating system, Android, as one of the most popular smart phone platforms, dominates the current market share [3]. However, the openness of Android not only attracts developers producing legitimate apps, but also attackers who deliver malware (short for malicious software) onto unsuspecting users. Google’s Android market is the official online platform for delivering apps to an Android based smart phone. Because of the lack of trustworthiness review methods, developers can easily upload their Android apps, including cracked apps, repackaged apps, or trojans, to the market. The presence of other third-party Android markets (e.g., Opera Mobile Store, Wandoujia) makes this problem worse. Today, a lot of Android malware (e.g., Geinimi, DroidKungFu and Hongtoutou) is released on the markets, which poses serious threats to smart phone users, such as stealing user information, making premium calls, and sending SMS advertisement spam without the user’s permission [14]. According to Symantec’s latest Internet Security Threat Report [29], one in every five Android apps (nearly one million total) was actually malware. To protect legitimate users from the attacks of Android malware, the most significant line of defense is currently anti-malware software products, such as Norton, Lookout, and Comodo Mobile Security, which mainly use the signature-based method to recognize threats. However, malware attackers increasingly employ techniques such as repackaging and obfuscation to bypass signatures and defeat attempts to analyze their inner mechanisms. The increasing sophistication of Android malware calls for new defensive techniques that are harder to evade and are capable of protecting users against novel threats.

In this paper, we propose a novel dynamic analysis method named Component Traversal that can automatically execute the code routines of each given Android app as completely as possible. Based on the extracted Linux kernel system calls, we further construct weighted directed graphs and then apply a deep learning framework resting on the graph based features to detect previously unknown Android malware. A comprehensive experimental study on a real sample collection from Comodo Cloud Security Center is performed to compare various malware detection approaches. Promising experimental results demonstrate that our proposed method outperforms other alternative Android malware detection techniques. Our developed system Deep4MalDroid has also been integrated into a commercial Android anti-malware software. The major contributions of our work can be summarized as follows:

* Corresponding author
and Decision Tree (DT) have been successfully applied in Android malware detection [24], [30], [31], [11], [12]. Deep learning, a new frontier in machine learning and data mining, is starting to be leveraged in industrial and academic research for different applications (e.g., Computer Vision) [6], [23]. Yuan et al. have worked with deep learning methods in Android malware detection [34], [35]. Their works combined static and dynamic analysis of files with a Deep Belief Network (DBN) and reported a 15.5% higher accuracy compared to traditional shallow learning models. Based on our collected sample set, we intend to explore a deep learning framework resting on our constructed Linux kernel system call graphs.

III. SYSTEM ARCHITECTURE

In this paper, based on the collected Android apps, we extract their Linux kernel system calls and construct the corresponding weighted directed graphs for feature representations, and then apply a deep learning framework for model construction and thus for newly unknown Android malware detection. Figure 1 shows the system architecture of our developed Android malware detection system Deep4MalDroid, which consists of the following five major components.

system calls are recorded by using Strace, a diagnostic, debugging and instructional userspace utility for Linux. (See Section IV-A for details.)
• Graph Constructor: For each Android app, based on the extracted system calls, a weighted directed graph will be constructed. Each vertex in the graph is a unique Linux kernel system call weighted by its frequency, while each directed edge indicates which system call is performed afterwards and is weighted to show how frequently that sequence appears. (See Section IV-B for details.)
• Deep Learning Classifier: Resting on the constructed system call graphs, a deep learning framework is used for model construction and thus for newly unknown Android malware detection. (See Section IV-C for details.)
• Malware Detector: A newly collected unknown Android app is passed through the preprocessor, dynamic feature extractor and graph constructor for feature extraction, and then the classification model is used to predict whether it is benign or malicious.

IV. PROPOSED METHOD

To be resilient towards typical malware evasion techniques (e.g., repackaging, obfuscation), in this paper, we choose dynamic analysis for Android malware detection and propose a novel method named Component Traversal that can automatically execute the code routines of each given Android app as completely as possible. Based on the extracted Linux kernel system calls, the weighted directed graphs will be constructed. Then, resting on the constructed system call graphs, a deep learning framework is used for model construction and thus for newly unknown Android malware detection.
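As an illustration of the weighted directed graph described above, the following Python sketch counts how often each system call occurs (vertex weights) and how often one call directly follows another (edge weights). The function name `build_syscall_graph` and the example trace are hypothetical and are not taken from Deep4MalDroid; the sketch only assumes that the extracted system calls are available as an ordered list of names.

```python
from collections import Counter


def build_syscall_graph(calls):
    """Return (node_weights, edge_weights) for an ordered list of system call names.

    node_weights[c]      = number of times system call c occurs
    edge_weights[(u, v)] = number of times v directly follows u
    """
    node_weights = Counter(calls)
    edge_weights = Counter(zip(calls, calls[1:]))
    return node_weights, edge_weights


# Hypothetical trace for illustration only (not the paper's sample log).
trace = ["clock_gettime", "read", "clock_gettime", "ioctl",
         "ioctl", "clock_gettime", "read"]
nodes, edges = build_syscall_graph(trace)

print(nodes["clock_gettime"])            # vertex weight
print(edges[("clock_gettime", "read")])  # weight of edge clock_gettime -> read
```

Using two counters keeps vertex and edge weights separate, so a pair that never occurs simply has weight zero and no edge needs to be materialised for it.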
buttons for this starting page: Login Account (Figure 2 (b)) with “com.ispyoo.android.activity.LoginActivity” and Register New Account (Figure 2 (c)) with “com.ispyoo.android.activity.LoginActivity”. When using ADT Monkey to auto-execute this app, no matter which events ADT Monkey generates, it cannot get past the login page and thus fails to extract the sensitive behaviors from the app. Actually, the sensitive behaviors hide in the services “com.ispyoo.common.monitor.AndroidWatchdogService” and “com.ispyoo.common.calltracker.receiver.ProcessCall”, which are triggered in an activity called “com.ispyoo.android.activity.Splash” (as shown in Figure 2 (d)). In order to solve this problem, we propose the Component Traversal method as a way of automating the execution of an entire app’s code. By locating all the executable components from the manifest file, a more complete system call list can be generated since all the runnable code has an opportunity to be executed.

Fig. 2: Screen shots of “SecureKid” App

1) Preprocessing: An Android app is compiled and packaged in a single archive file with an .apk suffix, which includes all of the app code (.dex files), resources, assets, and manifest file. Android defines a component-based framework for developing mobile apps [32] and an Android app is composed of four different types of components [2]: Activities provide Graphical User Interface (GUI) functionality to enable user interactivity; Services are background communication processes that pass messages between the components of the app and communicate with other apps; Broadcast Receivers are background processes that respond to system-wide broadcast messages as necessary; Content Providers act as database management systems that manage the app data. The Android app must declare its components in a manifest file (“AndroidManifest.xml”), which is one of the most important files in the Android project structure. Before the Android system can start an app component, the system must know that the component exists by reading the app’s manifest file. The manifest file actually works as a road map to ensure that each app can function properly in the Android system.

By using the APKTool [4], which is a tool for reverse engineering the binary Android apps, we first unzip the APK file and then access the manifest file to retrieve the package name and the list of runnable components, specifically the Services and Activities. The Component List, including the extracted package name and all runnable component names, will be stored locally for use in manipulating the Android app during execution in the emulator.

2) Emulation and Feature Extraction: Android provides a sandboxed app execution environment [19]. A customized embedded Linux system interacts with the hardware in the smart phone. The middleware and application framework APIs run on top of Linux. Compared with the framework APIs, Linux kernel system calls are more resilient towards malware evasion techniques (e.g., the attacker can substitute the framework APIs to evade the detection). The Linux operating system has about 300 system calls, which can be classified into different categories depending on the operating system function they serve, such as process management, memory management and device management [19]. If an Android app needs to request services (e.g., network transmission), developers can use “HttpUrlConnection”, which is a network API provided by the Java Development Kit (JDK), or “HttpComponents”, which is a Java networking framework provided by Apache, or they can even develop a networking framework by themselves. The framework APIs can be substituted, but the implementations of such request services have to rely on the system calls provided by the Linux kernel, such as “sendto()” and “recvfrom()”. Moreover, framework APIs vary from version to version of Android OS, while Linux kernel system calls are independent of the Android OS version. Therefore, in this paper, we extract the Linux kernel system calls from the executing Android apps, rather than the framework APIs.

a) Emulation: Since Genymotion is a relatively fast and robust Android emulator with over two million users worldwide [16], we use it to execute the Android app for dynamic behavior extraction. Using the Android Debug Bridge (ADB), a versatile command line tool that allows communication with an emulator instance or connected Android-powered device [20], the app to be analyzed is loaded into the Genymotion emulator.

b) Linux kernel system call extraction: After successful loading, by traversing each component in the Component List generated in the preprocessing step, each Activity and Service of the Android app is executed to completion. To record the executed Linux kernel system calls, Strace is used for collecting logs. Strace is a diagnostic, debugging and instructional userspace utility for Linux, which is used to monitor interactions between processes and the Linux kernel, including system calls, signal deliveries, and changes of process state [25]. When all of the components have completed execution, using ADB and the Visual Studio file pipe, the full Linux kernel system call list is offloaded from the emulator and brought back to the host machine. Genymotion is then reset to prevent malware infection from skewing further extractions. The proposed Component Traversal method allows for an automated, complete approach to dynamic analysis of Android apps rather than relying on user interactions or a random event generator. In order to enhance the process, a multi-threaded approach is implemented: one thread is used for Android app execution, while a separate thread is created to run Strace for logging Linux kernel system calls.
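The preprocessing and extraction steps above can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the authors' tooling: it assumes the manifest has already been decoded to plain XML (for example by APKTool), that components are launched with the standard `adb shell am start -n` and `adb shell am startservice -n` commands (the paper does not say which commands are used), and that Strace writes one `name(args) = result` line per call. The helper names `component_list`, `launch_command`, and `syscall_names`, the example manifest, and the example log lines are all hypothetical.

```python
import re
import xml.etree.ElementTree as ET

ANDROID_NS = "{http://schemas.android.com/apk/res/android}"


def component_list(manifest_xml):
    """Return (package, [(kind, fully_qualified_name), ...]) for Activities and Services."""
    root = ET.fromstring(manifest_xml)
    package = root.get("package")
    components = []
    for kind in ("activity", "service"):
        for node in root.iter(kind):
            name = node.get(ANDROID_NS + "name")
            if name is None:
                continue
            if name.startswith("."):      # ".Main" is relative to the package
                name = package + name
            elif "." not in name:         # "Main" is also relative to the package
                name = package + "." + name
            components.append((kind, name))
    return package, components


def launch_command(package, kind, name):
    """Build the adb command that starts one component (assumed launch mechanism)."""
    verb = "start" if kind == "activity" else "startservice"
    return ["adb", "shell", "am", verb, "-n", f"{package}/{name}"]


# Matches "name(" at the start of a line, optionally preceded by a pid prefix.
SYSCALL_RE = re.compile(r"^(?:\[pid\s+\d+\]\s+|\d+\s+)?([A-Za-z_]\w*)\(")


def syscall_names(strace_lines):
    """Keep only system call names, ignoring parameters, signals and resumed calls."""
    return [m.group(1) for m in map(SYSCALL_RE.match, strace_lines) if m]


# Hypothetical inputs for illustration only.
MANIFEST = """<manifest xmlns:android="http://schemas.android.com/apk/res/android"
    package="com.example.app">
  <application>
    <activity android:name=".MainActivity"/>
    <service android:name="com.example.app.sync.SyncService"/>
  </application>
</manifest>"""
LOG = [
    "clock_gettime(CLOCK_MONOTONIC, {1234, 5678}) = 0",
    "[pid  4321] read(3, \"abc\", 4096) = 3",
    "--- SIGCHLD {si_signo=SIGCHLD} ---",
    "ioctl(5, 0xc0186201, 0xbe9f2a48) = 0",
]

pkg, comps = component_list(MANIFEST)
commands = [launch_command(pkg, kind, name) for kind, name in comps]
names = syscall_names(LOG)
```

The command lists can be passed to `subprocess.run` on a host with ADB installed; the sketch deliberately stops short of executing them.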
B. Graph Construction

In the previous step of our method, an Android app is analyzed and a list of its Linux kernel system calls is automatically extracted. Using independent Linux kernel system calls alone, however, does not provide sufficient information to model the app’s behaviors [15]. Simply keeping a count of the system calls ignores any sequential relationships among them. In order to capture the relationships among the extracted system calls, a weighted directed graph G = {V, E} is generated, where V is a set of nodes and each v ∈ V is a unique Linux kernel system call, and E ⊆ V × V represents a set of directed weighted edges, where an edge $\overrightarrow{v_i v_j}$ indicates a sequential pair of system calls. Using a weighted directed graph maintains not just the number of each Linux kernel call made, but also the sequence of the system calls and the frequency of that sequence.

To construct the graph, each extracted unique Linux kernel system call is first mapped to an integer node. The size of the node is directly proportional to the frequency of the system call. For each sequential pair of system calls extracted, a directed edge between the nodes is created. This edge is weighted, and its weight is incremented each time that sequence occurs in the sample app. To further illustrate the process, Figure 3 is a segment of the collected Linux kernel system call list from “Live wallpaper.APK” (MD5: 2d38973d442ae070f76399ec4ef730e7), which is a live wallpaper app embedded with malicious code that can steal the user’s credentials. In this paper, we only consider the names of Linux kernel system calls while ignoring the parameters. This log contains five Linux system calls: “clock_gettime”, “read”, “getuid32”, “ioctl”, “fcntl64” with corresponding frequencies of ⟨4, 1, 4, 4, 1⟩. A node is created for each system call in Figure 4 with a different size based on its frequency. In the recorded log, the system call “read” (line 2) is followed by “clock_gettime” (line 3), which means a directed edge is created from the “read” node to the “clock_gettime” node. The weight of this edge is the number of times this pair of Linux kernel system calls appears in the generated log.

C. Deep Learning Framework for Android Malware Detection

Although classification methods based on shallow learning architectures, such as Support Vector Machine (SVM), Artificial Neural Network (ANN), Naïve Bayes (NB), and Decision Tree (DT), can be used to solve the Android malware detection problem [24], [30], [31], [11], [12], deep learning has been demonstrated to be one of the most promising architectures for its superior layerwise feature learning models and can thus achieve comparable or better performance [18]. In this paper, we explore a deep learning architecture with a Stacked AutoEncoders (SAEs) model to detect Android malware. The SAEs model is a stack of AutoEncoders, which are used as building blocks to create a deep network.

1) AutoEncoder: An AutoEncoder, also called AutoAssociator, is an artificial neural network used for learning efficient codings [28]. Architecturally, an AutoEncoder is composed of an input layer, an output layer, and one or more hidden layers connecting them. The goal of an AutoEncoder is to encode a representation of the input layer into the hidden layer, which is then decoded into the output layer, yielding the same (or as close as possible) value as the input layer [5]. In this way, the hidden layer acts as another representation of the feature space, and in the case when the hidden layer is narrower (has fewer nodes) than the input/output layers, the activations of the final hidden layer can be regarded as a compressed representation of the input [5], [28], [6]. Figure 5 illustrates a one-layer AutoEncoder model with one input layer, one hidden layer, and one output layer. Typically, the number of hidden units (d′) is much less than the number of visible (input/output) ones (d). As a result, when passing data through such a network, it first compresses (encodes) the input vector to fit in a smaller representation, and then tries to reconstruct (decode) it back. The task of training is to minimize the reconstruction error (Equation 1), i.e., to find the most efficient compact representation (encoding) for the input data (Equation 2).

$$E(x, z) = \frac{1}{2} \sum_{i=1}^{n} \|x_i - z_i\|^2, \qquad (1)$$

where x is an input vector, z is a reconstructed d-dimensional vector in the input space, and n is the number of training samples.
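For concreteness, Equation 1 can be evaluated directly as in the short Python sketch below. The function name `reconstruction_error` and the two sample vectors are hypothetical; the sketch only computes the error for given inputs and reconstructions and does not train an AutoEncoder.

```python
def reconstruction_error(xs, zs):
    """Equation 1: half the summed squared Euclidean distance between
    each input vector x_i and its reconstruction z_i."""
    total = 0.0
    for x, z in zip(xs, zs):
        total += sum((a - b) ** 2 for a, b in zip(x, z))
    return 0.5 * total


# Two hypothetical 3-dimensional training samples and their reconstructions.
xs = [[1.0, 0.0, 1.0], [0.0, 1.0, 0.0]]
zs = [[0.5, 0.0, 1.0], [0.0, 1.0, 1.0]]

# Squared distances are 0.25 and 1.0, so the error is 0.5 * 1.25.
err = reconstruction_error(xs, zs)
print(err)
```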
2) Deep Learning Framework with SAEs: To form a deep network, an SAEs model is created by daisy chaining AutoEncoders together, known as stacking: the output of one AutoEncoder in the current layer is used as the input of the AutoEncoder in the next [6]. More rigorously, for an SAEs deep network with h hidden layers, the first layer takes the input from the training dataset and is trained simply as an AutoEncoder. Then, after the k-th hidden layer is obtained, its output is used as the input of the (k+1)-th hidden layer, which is trained similarly. Finally, the h-th layer’s output is used as the output of the entire SAEs model. In this manner, AutoEncoders can form a hierarchical stack. Figure 6 illustrates an SAEs model with h hidden layers. To use the SAEs for Android malware detection, a classifier needs to be added on the top layer. In this paper, we combine the SAEs and the classifier together as the entire deep architecture model for Android malware detection, which is illustrated in Figure 7.

Fig. 6: Framework of Stacked AutoEncoders
Fig. 7: Deep learning framework for Android malware detection

V. EXPERIMENTAL RESULTS AND ANALYSIS

In this section, we conduct three sets of experimental studies using a real sample collection obtained from Comodo Cloud Security Center to fully evaluate the performance of our developed Android malware detection system: (1) In the first set of experiments, we compare the detection performances using ADT Monkey and our proposed Component Traversal method for Linux kernel system call extraction; (2) In the second set of experiments, we evaluate the detection performance of our graph construction method; (3) In the last set of experiments, we evaluate the detection performance of the deep learning framework by comparisons with typical shallow learning methods.

A. Experimental Setup

The sample set obtained from Comodo Cloud Security Center includes 3,000 Android apps, half of which are benign, while the other half are malicious, including the popular malware families of Geinimi, GinMaster, DroidKungFu, Hongtoutou, FakePlayer, etc. We evaluate the Android malware detection performance of different methods using the measures shown in Table I. The emulations are conducted on Genymotion. All the experiments are performed under a 64-bit Windows 8.1 operating system with an Intel(R) Core(TM) i7-4790 CPU @ 3.60 GHz plus 16 GB of RAM.

TABLE I: Performance indices of Android malware detection

Indices               Specification
True Positive (TP)    Num. of apps correctly classified as malicious
True Negative (TN)    Num. of apps correctly classified as benign
False Positive (FP)   Num. of apps mistakenly classified as malicious
False Negative (FN)   Num. of apps mistakenly classified as benign
TP Rate (TPR)         TP/(TP + FN)
FP Rate (FPR)         FP/(FP + TN)
Accuracy (ACY)        (TP + TN)/(TP + TN + FP + FN)

B. Comparisons of Different Dynamic Feature Extraction Methods

In this set of experiments, we compare the performance of the two methods of automatic Android app execution: ADT Monkey and our proposed Component Traversal method (CompTrav for short in the following), using four typical classification methods (i.e., SVM, ANN, NB and DT). Based on the sample set described in Section V-A, we use both ADT Monkey and the CompTrav method to extract the Linux kernel system calls from the Android apps. The extracted Linux kernel system calls are directly used as the inputs for each classifier. We conduct 10-fold cross validations for the evaluation. The experimental results shown in Table II demonstrate that our proposed CompTrav method outperforms ADT Monkey for Linux kernel system call extraction in Android malware detection. This should come as no surprise, since the CompTrav method allows more of the app components to be executed, resulting in a more complete listing of Linux kernel system calls. The CompTrav method will therefore be used for the remaining experiments.

TABLE II: Comparisons of different dynamic feature extraction methods

Method                    Accuracy   TP     FN    TN     FP
ADT Monkey+SVM (M_SVM)    67.46%     1667   833   1706   794
ADT Monkey+ANN (M_ANN)    66.36%     1635   865   1683   817
ADT Monkey+NB (M_NB)      62.74%     1553   947   1584   916
ADT Monkey+DT (M_DT)      66.86%     1660   840   1683   817
CompTrav+SVM (C_SVM)      73.66%     1829   671   1854   646
CompTrav+ANN (C_ANN)      71.98%     1779   721   1820   680
CompTrav+NB (C_NB)        67.30%     1653   847   1712   788
CompTrav+DT (C_DT)        71.92%     1787   713   1809   691

For feature extraction efficiency, CompTrav also shows better performance than ADT Monkey. In our experiments, we find that in order to activate the code routines as completely as possible, ADT Monkey usually needs to generate at least 500 random events, which takes more than 2 minutes for executing an app. With CompTrav, 5 seconds is enough to execute an activity or a service. For our collected sample set, most of the malicious apps have fewer than 10 components, and the average execution time of an app using CompTrav is about 41 seconds, which is about three times as fast as using ADT Monkey.
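The indices of Table I can be recomputed from the raw counts reported in the result tables. The Python sketch below does so for the CompTrav+SVM row of Table II; the function name `detection_metrics` is hypothetical, and the counts are copied from that row.

```python
def detection_metrics(tp, fn, tn, fp):
    """Compute the Table I indices from the four confusion-matrix counts."""
    return {
        "TPR": tp / (tp + fn),                   # TP rate
        "FPR": fp / (fp + tn),                   # FP rate
        "ACY": (tp + tn) / (tp + tn + fp + fn),  # accuracy
    }


# Counts from the CompTrav+SVM row of Table II.
m = detection_metrics(tp=1829, fn=671, tn=1854, fp=646)
print({name: round(value, 4) for name, value in m.items()})
```

The accuracy obtained this way, 3683/5000 = 73.66%, matches the value listed for that row.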
TABLE III: Evaluation of graph based features

Structure                          Accuracy   TP     FN    TN     FP
CompTrav+SVM (C_SVM)               73.66%     1829   671   1854   646
CompTrav+ANN (C_ANN)               71.98%     1779   721   1820   680
CompTrav+NB (C_NB)                 67.30%     1653   847   1712   788
CompTrav+DT (C_DT)                 71.92%     1787   713   1809   691
CompTrav Graph+SVM (CG_SVM)        88.24%     2181   319   2231   269
CompTrav Graph+ANN (CG_ANN)        87.88%     2190   310   2204   296
CompTrav Graph+NB (CG_NB)          77.94%     1942   558   1955   545
CompTrav Graph+DT (CG_DT)          87.42%     2185   315   2186   314

the different deep learning models. While building the deep learning model, there are two key parameters: the number of hidden layers and the number of neurons in each hidden layer. Table IV shows how the detection performance changes with different deep learning model constructions. We also compare the deep learning model with four other typical shallow learning classification models, as shown in Table V. Figure 9 shows the ROC curves of the different classification methods. From Table V and Figure 9, we can see that, compared with the typical shallow learning methods, the detection performance is greatly improved by using the deep learning framework.
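To make the two model parameters concrete (the number of hidden layers and the number of neurons in each), the following Python sketch builds the encoder side of a stack of AutoEncoders and passes one input vector through it. It is a structural sketch only: the weights are random and untrained, the sigmoid activation is an assumption (the activation function is not stated in the text above), the layer sizes are hypothetical, and the layer-wise training and top-layer classifier are omitted.

```python
import math
import random


def make_layer(n_in, n_out, rng):
    """Randomly initialised (untrained) encoder weights and biases for one AutoEncoder."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases


def encode(layer, x):
    """Hidden activations of one layer: sigmoid(W x + b). Sigmoid is an assumption."""
    weights, biases = layer
    return [1.0 / (1.0 + math.exp(-(sum(w * v for w, v in zip(row, x)) + b)))
            for row, b in zip(weights, biases)]


def build_stack(input_dim, hidden_sizes, seed=0):
    """One encoder per hidden layer; layer k+1 takes layer k's output size as its input size."""
    rng = random.Random(seed)
    dims = [input_dim] + list(hidden_sizes)
    return [make_layer(dims[k], dims[k + 1], rng) for k in range(len(hidden_sizes))]


def stack_output(stack, x):
    """Feed x through every hidden layer in order; the last layer's output is the stack's output."""
    for layer in stack:
        x = encode(layer, x)
    return x


# Hypothetical configuration: 8 input features, three hidden layers of 6, 4 and 2 neurons.
stack = build_stack(input_dim=8, hidden_sizes=[6, 4, 2])
code = stack_output(stack, [1.0, 0.0, 0.0, 1.0, 0.0, 1.0, 1.0, 0.0])
print(len(stack), len(code))
```

Changing `hidden_sizes` is all that is needed to vary the depth and width of the stack, which is the kind of model construction that the comparison of deep learning models varies.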
TABLE V: Comparisons between deep learning and shallow learning models

Method                             Accuracy   TP     FN    TN     FP
Deep Learning                      93.68%     2334   166   2350   150
CompTrav Graph+SVM (CG_SVM)        88.24%     2181   319   2231   269
CompTrav Graph+ANN (CG_ANN)        87.88%     2190   310   2204   296
CompTrav Graph+NB (CG_NB)          77.94%     1942   558   1955   545
CompTrav Graph+DT (CG_DT)          87.42%     2185   315   2186   314

Fig. 9: ROC curves of different classification models

analysis method named Component Traversal is proposed which can automatically execute the code routines of each given Android app as completely as possible. Based on the extracted system calls, we construct the weighted directed graphs and then apply a deep learning framework for newly unknown Android malware detection. To the best of our knowledge, this is a unique approach to automated dynamic analysis. A comprehensive experimental study on a real sample collection from Comodo Cloud Security Center is performed to compare various malware detection approaches. Promising experimental results demonstrate that our proposed method outperforms other alternative Android malware detection techniques. The developed system Deep4MalDroid has also been integrated into a commercial Android anti-malware software.

For future work, we want to further improve the Component Traversal method by introducing generation of random events on each component. We will also further explore how sparsity constraints can be imposed on the AutoEncoder to yield better detection performance. Meanwhile, it would be interesting to investigate other deep learning models for Android malware detection.

ACKNOWLEDGMENT

The authors would also like to thank the anti-malware experts of Comodo Security Lab for the data collection as well as helpful discussions and support. This work is partially supported by the U.S. National Science Foundation under grant CNS-1618629.

REFERENCES

[1] ADT Monkey, http://developer.android.com/tools/help/monkey.html.
[2] Android application fundamentals, http://developer.android.com/guide/components/fundamentals.html.
[3] Android, iOS combine for 91 percent of market, http://www.cnet.com.
[4] Apktool, http://ibotpeaches.github.io/Apktool/.
[5] Y. Bengio. Learning Deep Architectures for AI. In Foundations and Trends in Machine Learning, Vol 2(1), 1-127, (2009).
[6] Y. Bengio, P. Lamblin, D. Popovici, and H. Larochelle. Greedy Layer-Wise Training of Deep Networks. In NIPS, (2007).
[7] M. Bierma, E. Gustafson, J. Erickson, D. Fritz, Y. Choe. Andlantis: Large-scale Android Dynamic Analysis. In SPW, (2014).
[8] I. Burguera, U. Zurutuza, S. Nadjm-Tehrani. Crowdroid: Behavior-Based Malware Detection System for Android. In ICDM, (2011).
[9] S.J. Chang. Ape: A smart automatic testing environment for android malware. (2013).
[10] DroidBox, https://github.com/pjlantz/droidbox.
[11] M. Dimjašević, S. Atzeni, I. Ugrina, and Z. Rakamarić. Evaluation of android malware detection based on system calls. In IWSPA, (2016).
[12] M. Dimjašević, S. Atzeni, I. Ugrina, Z. Rakamarić. Android Malware Detection Based on System Calls. In UUCS, (2015).
[13] W. Enck, P. Gilbert, B. Chun, L. P. Cox, J. Jung, P. McDaniel, and A. N. Sheth. TaintDroid: an information-flow tracking system for realtime privacy monitoring on smartphones. In OSDI, (2010).
[14] A. P. Felt, M. Finifter, E. Chin, S. Hanna, and D. Wagner. A survey of mobile malware in the wild. In SPSM, (2011).
[15] H. Gascon, F. Yamaguchi, D. Arp, and K. Rieck. Structural detection of android malware using embedded call graphs. In AISec, (2013).
[16] Genymotion, https://www.genymotion.com/.
[17] M. Grace, Y. Zhou, Q. Zhang, S. Zou, and X. Jiang. RiskRanker: scalable and accurate zero-day android malware detection. In MobiSys, (2012).
[18] G. E. Hinton, S. Osindero, and Y. Teh. A fast learning algorithm for deep belief nets. In Neural Computation, Vol 18, 1527-1554, (2006).
[19] T. Isohara, K. Takemori, A. Kubota. Kernel-based Behavior Analysis for Android Malware Detection. In CIS, (2011).
[20] Kaspersky Security Bulletin 2015. https://securelist.com/.
[21] A. Krizhevsky, I. Sutskever and G. Hinton. Imagenet classification with deep convolutional neural networks. Proc. Adv. Neural Inf. Process. Syst. Vol. 25, 1106–1114, (2012).
[22] H. Larochelle, Y. Bengio, J. Louradour and P. Lamblin. Exploring strategies for training deep neural networks. J. Mach. Learn. Res. Vol. 10, 1–40, (2009).
[23] Y. Lv, Y. Duan, W. Kang, Z. Li, F. Wang. Traffic Flow Prediction With Big Data: A Deep Learning Approach. In Intelligent Transportation Systems, 865–873, (2014).
[24] N. Peiravian, X. Zhu. Machine Learning for Android Malware Detection Using Permission and API Calls. In ICDM, (2013).
[25] Strace, https://en.wikipedia.org/wiki/Strace.
[26] A. Shabtai, U. Kanonov, Y. Elovici, C. Glezer, Y. Weiss. Andromaly: a behavioral malware detection framework for android device. J. Intell. Inf. Syst., 161-190, (2012).
[27] Two Billion Consumers Worldwide to Get Smart(phones) by 2016, http://www.emarketer.com.
[28] P. Vincent, H. Larochelle, I. Lajoie, Y. Bengio, and P. Manzagol. Stacked Denoising Autoencoders: Learning Useful Representations in a Deep Network with a Local Denoising Criterion. In Journal of Machine Learning Research, Vol 11, 3371-3408, (2010).
[29] P. Wood. Internet Security Threat Report 2015. Symantec, California (2015).
[30] D. Wu, C. Mao, T. Wei, H. Lee, K. Wu. DroidMat: Android Malware Detection through Manifest and API Calls Tracing. In Asia JCIS, (2012).
[31] W. Wu, S. Hung. DroidDolphin: a dynamic Android malware detection framework using big data and machine learning. In RACS, (2014).
[32] C. Yang, Z. Xu, G. Gu, V. Yegneswaran, P. Porras. DroidMiner: Automated Mining and Characterization of Fine-grained Malicious Behaviors in Android Applications. In Computer Science, Vol 8712, 163–182, (2014).
[33] R. Xu, H. Saïdi, R. Anderson. Aurasium: practical policy enforcement for Android applications. In USENIX, (2012).
[34] Z. Yuan, Y. Lu, Z. Wang, Y. Xue. Droid-Sec: Deep Learning in Android Malware Detection. In SIGCOMM, (2014).
[35] Z. Yuan, Y. Lu, Y. Xue. Droiddetector: android malware characterization and detection using deep learning. Tsinghua Science and Technology, Vol 21, 114–123, (2016).
[36] C. Zheng, S. Zhu, S. Dai, G. Gu, X. Gong, X. Han, W. Zou. SmartDroid: an Automatic System for Revealing UI-based Trigger Conditions in Android Applications. In SPSM, (2012).