DroidData: Tracking and Monitoring Data Transmission in the Android Operating System
http://www.scirp.org/journal/cn
ISSN Online: 1947-3826
ISSN Print: 1949-2421
Hani Alshahrani (1, 2), Abdulrahman Alzahrani (1), Alexandra Hanton (3), Ali Alshehri (1), Huirong Fu (1),
Ye Zhu (4)
(1) Oakland University, Rochester, MI, USA
(2) Najran University, Najran, KSA
(3) Loyola University Chicago, Chicago, IL, USA
(4) Cleveland State University, Cleveland, OH, USA
Keywords
Android, Security, Privacy, Tracking Data, Malware
1. Introduction
Smartphones continue to become increasingly ubiquitous, and Android leads
the way in the field with over 75 percent of the market share [1]. As smartphones
gain popularity and functionality, an increasing number of people use their de-
• More informative results about the transmissions than other tools, providing
information such as the security of the data transmission.
The paper is organized as follows. Sections 2 and 3 present our problem
statement and an overview of Android’s architecture respectively. Section 4 pro-
vides an example of code that leaks sensitive data. Section 5 explains how static
and dynamic analysis will cooperate in DroidData in order to track and monitor
data. Section 6 explains how we use symbolic execution to identify which GUI
events caused a data leak. Section 7 reviews related work. Section 8 discusses
some limitations of our tool and how we will address them in the future,
and Section 9 contains the conclusion.
3. Android Overview
We begin by presenting an overview of Android in order to explain how Droid-
Data will function and protect users. Android is an open-source platform developed
by Google and the Open Handset Alliance [6]. Initially released in 2007,
Android has gone through many updates and changes to improve its security
and user experience. At the time of this writing, the latest version is Marshmal-
low. We will discuss some of the updated features in this section.
telephony manager to provide calling abilities to the device [7]. Most of these are
written in Java, like most of the Android applications they serve [7].
Android applications, which may come with the system or be downloaded by
the user, are packaged in an APK file, which contains a variety of items besides
the application code. It may also contain XML layout files, unique libraries, the
META-INF directory, which holds the signature for the package, the assets
directory of application assets, and the XML manifest [8]. This manifest is one of
the most important pieces of the application and contains key information, such
as the name, components, permissions, and API level of the application [9]. The
components are key parts of the app and include activities, services, broadcast
receivers, and content providers. Each activity represents a single screen in the
user interface of an application. Services provide long-running functions in the
4. Example
We begin by giving an example of how a malicious application disguised as a
normal one can transmit a user’s privacy-sensitive data without his or her
knowledge.
The example code in Figure 2 is taken from an actual malicious application
that poses as a kitchen timer. One of its malicious activities involves stealing the
user’s account information, such as usernames and passwords, and transmitting
them to an outside URI.
Figure 2 illustrates both sources and sinks, all of which are underlined in the
code snippets. A source is the origin of privacy-sensitive information, which we
see here with the method GetAccounts(). Privacy-sensitive information is anything
that is unique to the user. It is not limited to items such as account information
or location information, which are typically seen as vulnerable. For example, a
less obvious source of privacy-sensitive information is device identifiers, which
can be misused by malicious applications. A sink is the part of the code where
sensitive data is transmitted from the device. In this case, we see that occur at
Main.this.DoPost, followed by the URL of the malicious site. This shows that it
by the user. It exploits the user, but not the Android security model.
5. Implementation
DroidData will use both static and dynamic analysis to track and monitor data
transmissions. In order to accomplish this type of analysis, our application will
sit below the application frameworks and above the libraries, Dalvik Virtual
Machine, and core libraries, as illustrated in Figure 4. This is similar to the position
that a traditional desktop antivirus has. This will give our application
access to the ICC components, device sensors, and resources it needs to
accurately track data as it moves through the system. Static and dynamic analysis
will complement each other, each generating a separate report when a threat is found.
The two will not necessarily work together; rather, each will look
for data leaks independently, so that a leak missed by one analysis can still be
caught by the other.
Taint analysis is a widely accepted method for tracking data transmission in
the Android OS. It can be performed either dynamically or statically, and both
our static and dynamic analysis will use taint tracking [13]. Taint analysis in-
volves tagging data of interest at sources, in this case privacy-sensitive data, with
a “taint tag” and tracking the propagation of the taint through the rest of the
code [3]. Taint tags are stored in adjacent memory, and logs are used to determine
when the tainted data leaves the system [14]. Taint propagation must be
performed carefully in order to prevent taint explosion, in which almost everything
in the system accidentally becomes tainted [3].
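The tagging and propagation steps described above can be illustrated with a minimal sketch in plain Java. It is not DroidData's or TaintDroid's implementation: the class `TaintSketch`, the `Tag` categories, and the single propagation rule (string concatenation carries the union of both operands' tags) are assumptions chosen only to show the idea that a value derived from a tagged source stays tagged until it reaches a sink.

```java
import java.util.EnumSet;
import java.util.Set;

/** Illustrative only: values carry a set of taint tags that propagates on use. */
final class TaintSketch {

    /** Hypothetical categories of privacy-sensitive sources. */
    enum Tag { ACCOUNT, LOCATION, DEVICE_ID }

    /** A string value paired with the taint tags attached to it. */
    static final class Tainted {
        final String value;
        final Set<Tag> tags;

        Tainted(String value, Set<Tag> tags) {
            this.value = value;
            this.tags = tags;
        }

        /** An untainted value, such as a literal in the program. */
        static Tainted clean(String value) {
            return new Tainted(value, EnumSet.noneOf(Tag.class));
        }

        /** A value read from a source is tagged at its point of origin. */
        static Tainted fromSource(String value, Tag tag) {
            return new Tainted(value, EnumSet.of(tag));
        }

        /** Propagation rule: the result carries the union of both operands' tags. */
        Tainted concat(Tainted other) {
            Set<Tag> merged = EnumSet.noneOf(Tag.class);
            merged.addAll(tags);
            merged.addAll(other.tags);
            return new Tainted(value + other.value, merged);
        }
    }

    /** Sink check: true when the outgoing value still carries any taint tag. */
    static boolean leaksAtSink(Tainted outgoing) {
        return !outgoing.tags.isEmpty();
    }

    public static void main(String[] args) {
        Tainted account = Tainted.fromSource("alice@example.com", Tag.ACCOUNT);
        Tainted body = Tainted.clean("user=").concat(account);
        System.out.println("leak detected: " + leaksAtSink(body));
    }
}
```

A rule that tainted every value touched by a tagged one, rather than only values computed from it, would quickly mark most of the program state and produce the taint explosion mentioned above.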
does not actually execute [5]. For this reason, it was crucial that we include static
analysis in our tool, so that users are not notified about the loss of sensitive
information only when it is already too late. Static analysis begins by converting the Dalvik
bytecode from an APK file back into Java code so that it can be analyzed [12].
We did this manually with tools such as JD-GUI, apktool, and dex2jar [16]
[17] [18]. We plan to automate this process in DroidData.
Once we have extracted the code, we must analyze it in order to identify
the sources and sinks. This analysis must be comprehensive in order to identify all
possible places of origin and loss of sensitive data. To make it so, we must look not
only at the Java classes, but also at the XML manifest and layouts, any files and
libraries associated with the application, and any other items that may be in
the .apk file [12]. We will implement a modified version of the open-source tool
SuSi, which uses supervised machine learning to generate a categorized list of
sources and sinks in an Android application [19]. With the sources and sinks
identified, we can use static taint analysis, discussed earlier, to find all of the
potential paths that connect sources to sinks. These potential paths
are then presented to the user in a report that details the types of data that may be
transmitted and other known information about the paths.
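To make the path-finding step concrete, the following is a minimal sketch that treats the program as a directed graph of data flows and enumerates every acyclic path from a source to a sink with a depth-first search. It is not SuSi or any other published tool, and real static taint analysis works on a far richer program representation. The class `SourceSinkPaths` and the node names `buildPayload`, `getTimerLabel`, and `updateScreen` are hypothetical; `GetAccounts` and `DoPost` are borrowed from the example in Section 4.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Illustrative only: enumerates source-to-sink paths in a small data-flow graph. */
final class SourceSinkPaths {

    /** Data-flow edges: each node maps to the nodes that receive its data. */
    private final Map<String, List<String>> edges = new LinkedHashMap<>();

    void addFlow(String from, String to) {
        edges.computeIfAbsent(from, k -> new ArrayList<>()).add(to);
    }

    /** Returns every acyclic path that starts at a source and ends at a sink. */
    List<List<String>> findPaths(Set<String> sources, Set<String> sinks) {
        List<List<String>> found = new ArrayList<>();
        for (String source : sources) {
            List<String> path = new ArrayList<>();
            path.add(source);
            walk(source, sinks, path, found);
        }
        return found;
    }

    /** Depth-first search that records the current path whenever it reaches a sink. */
    private void walk(String node, Set<String> sinks,
                      List<String> path, List<List<String>> found) {
        if (sinks.contains(node)) {
            found.add(new ArrayList<>(path));
            return;
        }
        for (String next : edges.getOrDefault(node, Collections.emptyList())) {
            if (path.contains(next)) {
                continue; // do not revisit a node already on this path
            }
            path.add(next);
            walk(next, sinks, path, found);
            path.remove(path.size() - 1);
        }
    }

    public static void main(String[] args) {
        SourceSinkPaths graph = new SourceSinkPaths();
        graph.addFlow("GetAccounts", "buildPayload");   // sensitive data enters a payload
        graph.addFlow("buildPayload", "DoPost");        // payload reaches the network sink
        graph.addFlow("getTimerLabel", "updateScreen"); // harmless flow, never reported

        List<List<String>> paths = graph.findPaths(
                Collections.singleton("GetAccounts"),
                Collections.singleton("DoPost"));
        for (List<String> path : paths) {
            System.out.println(String.join(" -> ", path));
        }
    }
}
```

Each reported path is only a potential transmission: the search shows that the data could reach the sink, not that it will at run time.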
There are some potential flaws in static analysis, and these must be noted to
the user in the initial report. It will be made clear that these are only potential
transmissions, and that use of the application does not guarantee that the specific
code path will ever be executed and the data transmitted [5]. It may also
be the case that not all information about the transmission will be known
through static analysis. It is difficult to tell what type of channel the data will be
transmitted through via static analysis, and code paths that interact with other
parts of the system, such as libraries, may not be uncovered [5]. These are all
reasons that we also implement dynamic analysis, but we will make clear to the
user in this initial report that the results of static analysis may not reflect actual
data transmissions that will occur.
As an initial method of analysis, we went
through the code of known malicious applications to determine potential sources
and sinks of sensitive information. This is the same process that will be performed
automatically in our application during static analysis. We found a wide
variety of sources and sinks during our manual analysis, including several instances
of location and account information being sent to outside parties.
Our application will also run an additional type of static analysis, the extrac-
tion of event-space constraints. We will discuss this in the next section where we
explain symbolic execution.
events [20]. These are then used to create the event constraint graph, which
shows all possible data transmission paths and the GUI events related to each
one [20]. The graph is built carefully so as not to violate the Android lifecycle, which is
the set of callback methods that Android uses instead of a main method to start
and stop the application [21]. Symbolic execution then runs and is used to determine
the possible event sequences that lead to data transmission [20]. This
involves narrowing the search space of event sequences by traversing the graph
to find the minimal chains of events that lead to a transmission [20]. The result
is the event inputs and data constraints that must be present to transmit the da-
ta, thus providing the preconditions of data transmission [20]. These are not in
an easily understandable form, so a dynamic analysis platform identifies which
function of the application is used when each GUI manipulation occurs to de-
termine which caused the transmission.
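The idea of traversing the event graph to find a minimal chain of events can be sketched as a breadth-first search over a small state machine. This is a deliberate simplification and not AppIntent's algorithm: it ignores the data constraints that symbolic execution produces and finds only the shortest sequence of event labels. The class `EventChainSketch` and all state and event names are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.LinkedList;
import java.util.List;
import java.util.Map;

/** Illustrative only: shortest chain of GUI events that reaches a transmission. */
final class EventChainSketch {

    /** For each state, the events available there and the state each one leads to. */
    private final Map<String, Map<String, String>> transitions = new HashMap<>();

    void addTransition(String from, String event, String to) {
        transitions.computeIfAbsent(from, k -> new LinkedHashMap<>()).put(event, to);
    }

    /** Breadth-first search; returns the shortest event sequence, or null if unreachable. */
    List<String> shortestEventChain(String start, String target) {
        Map<String, String> cameFrom = new HashMap<>(); // state -> previous state
        Map<String, String> viaEvent = new HashMap<>(); // state -> event that reached it
        Deque<String> queue = new ArrayDeque<>();
        cameFrom.put(start, null);
        queue.add(start);

        while (!queue.isEmpty()) {
            String state = queue.remove();
            if (state.equals(target)) {
                // Walk back to the start, collecting events in forward order.
                LinkedList<String> chain = new LinkedList<>();
                for (String s = state; cameFrom.get(s) != null; s = cameFrom.get(s)) {
                    chain.addFirst(viaEvent.get(s));
                }
                return chain;
            }
            Map<String, String> outgoing =
                    transitions.getOrDefault(state, Collections.emptyMap());
            for (Map.Entry<String, String> edge : outgoing.entrySet()) {
                String next = edge.getValue();
                if (!cameFrom.containsKey(next)) {
                    cameFrom.put(next, state);
                    viaEvent.put(next, edge.getKey());
                    queue.add(next);
                }
            }
        }
        return null;
    }

    public static void main(String[] args) {
        EventChainSketch graph = new EventChainSketch();
        graph.addTransition("Main", "click:start", "Running");
        graph.addTransition("Running", "click:stop", "Settings");
        graph.addTransition("Main", "click:settings", "Settings");
        graph.addTransition("Settings", "click:sync", "Transmit");
        System.out.println(graph.shortestEventChain("Main", "Transmit"));
    }
}
```

Breadth-first search is used here because it reaches every state by the fewest events first, which corresponds to the minimal chains described above.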
AppIntent relies on a human analyst to look at the results of the dynamic analysis
and determine whether a transmission was user-intended, which is
not something we are able to implement in a downloadable mobile application
[20]. For this reason, DroidData will provide the sequence of events that leads to
the data transmission so that a technologically aware user can use it to determine
whether the transmission was intentional. For users who may not be
able to understand the sequence of GUI manipulations, the static and dynamic
taint analysis and their resulting reports will provide a clear enough picture for
them to understand the transmission, even if they do not know whether it was
intentional.
7. Related Works
A variety of tools that use taint analysis to analyze Android applications for
data loss are listed in Table 1.
TaintDroid [3] uses dynamic taint analysis to track sensitive data as it moves
through the Android operating system and provides the user with a notification
when data leaves the system. Widely used in the field, it tracks taint
Features
Applications | Static Analysis | Dynamic Analysis | Creates Report | Requires Tech Knowledge | User Can Block App
DroidData X X X X
TaintDroid [3] X X
Securacy [22] X X X
FlowDroid [12] X X X
AppIntent [20] X X X
DroidScope [13] X X
such as sensors that are difficult to emulate. This makes it difficult to protect all
sensitive data.
8. Future Work
We are currently in the process of implementing DroidData as it is explained in
this paper. We will continue to test and improve it moving forward. In the future,
we would like to address the following issues that exist in the proposed
version. DroidData, like most other current analysis tools for
Android, is unable to track implicit (control) data flows. Some malicious code
uses implicit flows to evade security mechanisms and avoid detection, so this is
something to protect against in the future [23]. Tools such as Flow Caml support
implicit data flow monitoring in specific languages, and we are interested in exploring
such an implementation in mobile software in the future [24].
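A small, self-contained Java sketch shows why implicit flows are hard to track. The method below is a hypothetical illustration, not code from any analyzed application: it reproduces a secret value without ever assigning the secret, or anything computed from it, to the output. The secret appears only in branch conditions, so a tracker that propagates taint solely through assignments and arithmetic would treat the result as untainted.

```java
/** Illustrative only: leaking a value through control flow alone. */
final class ImplicitFlowSketch {

    /**
     * Rebuilds the secret bit by bit. The secret appears only in the branch
     * condition; every value written to copy is a constant derived from the
     * loop counter, so no explicit data flow links secret to copy.
     */
    static int copyViaControlFlow(int secret) {
        int copy = 0;
        for (int bit = 0; bit < 32; bit++) {
            if ((secret & (1 << bit)) != 0) { // control dependence on the secret
                copy |= (1 << bit);           // assignment of an untainted constant
            }
        }
        return copy;
    }

    public static void main(String[] args) {
        int secret = 0x1234ABCD;
        int leaked = copyViaControlFlow(secret);
        System.out.println("values match: " + (leaked == secret));
    }
}
```

Catching this pattern requires tainting values written under a tainted branch condition, which in turn raises the risk of taint explosion.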
The symbolic execution features of our tool that are meant to determine the
origin of data leaks are of little use to users who are not technologically know-
ledgeable. The results of the dynamic analysis produce the event that was the
origin of the data leak, but it is difficult for someone without an understanding
of the steps of the process to interpret that information and determine whether
the data transmission was intentional. One direction for future work is to translate
that GUI event into a description that the average user can understand, so that
he or she is better able to decide whether the transmission was something he or she
requested.
9. Conclusion
We have proposed DroidData, a novel tool that uses both static and dynamic
analysis to track and monitor data transmission in Android applications. This
approach is intended to reduce false positives and increase code coverage in order
to catch as many data leaks as possible. We also implement symbolic execution in
order to determine the origin of a data leak. In addition, we provide a clear user
interface that allows users without a technological background to understand where
applications send their data and whether it is being sent through secure channels,
and that gives them the opportunity to block applications that transmit sensitive
information inappropriately.
Acknowledgements
This work was supported in part by NSF under grants CNS-1460897 and
DGE-1623713. Any opinions, findings, and conclusions or recommendations
expressed in this material are those of the authors and do not necessarily re-
flect the views of the NSF.
References
[1] Faruki, P., Bharmal, A., Laxmi, V., Ganmoor, V., Gaur, M.S., Conti, M. and Rajara-
jan, M. (2015) Android Security: A Survey of Issues, Malware Penetration, and De-
https://ibotpeaches.github.io/Apktool/
[18] SourceForge (2016) Dex2jar. https://sourceforge.net/projects/dex2jar
[19] Arzt, S., Rasthofer, S. and Bodden, E. (2013) SuSi: A Tool for the Fully Automated
Classification and Categorization of Android Sources and Sinks. University of
Darmstadt, Darmstadt.
[20] Yang, Z.M., Yang, M., Zhang, Y., Gu, G.F., Ning, P. and Wang, X.S. (2013) AppIntent:
Analyzing Sensitive Data Transmission in Android for Privacy Leakage Detection.
Proceedings of the 2013 ACM SIGSAC Conference on Computer & Communications
Security, Berlin, 4-8 November 2013, 1043-1054.
https://doi.org/10.1145/2508859.2516676
[21] Android Developer (2016) Managing the Activity Lifecycle.
https://developer.android.com/training/basics/activity-lifecycle/index.html
[22] Ferreira, D., Kostakos, V., Beresford, A.R., Lindqvist, J. and Dey, A.K. (2015) Securacy:
An Empirical Investigation of Android Applications’ Network Usage, Privacy
and Security. Proceedings of the 8th ACM Conference on Security & Privacy in
Wireless and Mobile Networks, New York, 22-26 June 2015, 1-11.
https://doi.org/10.1145/2766498.2766506
[23] Russo, A., Sabelfeld, A. and Li, K. (2009) Implicit Flows in Malicious and Nonmalicious
Code. Proceedings of the 2009 Marktoberdorf Summer School, Garching,
4-16 August 2009, 301-322.
[24] Simonet, V. (2015) Flow Caml.
http://www.normalesup.org/~simonet/soft/flowcaml