TaintDroid - An Information-Flow Tracking System For Realtime Privacy
To appear at the 9th USENIX Symposium on Operating Systems Design and Implementation (OSDI’10)
lutions that rely on heavy-weight whole-system emulation [7, 57], we leveraged Android’s virtualized architecture to integrate four granularities of taint propagation: variable-level, method-level, message-level, and file-level. Though the individual techniques are not new, our contributions lie in the integration of these techniques and in identifying an appropriate trade-off between performance and accuracy for resource constrained smartphones. Experiments with our prototype for Android show that tracking incurs a runtime overhead of less than 14% for a CPU-bound microbenchmark. More importantly, interactive third-party applications can be monitored with negligible perceived latency.

We evaluated the accuracy of TaintDroid using 30 randomly selected, popular Android applications that use location, camera, or microphone data. TaintDroid correctly flagged 105 instances in which these applications transmitted tainted data; of the 105, we determined that 37 were clearly legitimate. TaintDroid also revealed that 15 of the 30 applications reported users’ locations to remote advertising servers. Seven applications collected the device ID and, in some cases, the phone number and the SIM card serial number. In all, two-thirds of the applications in our study used sensitive data suspiciously. Our findings demonstrate that TaintDroid can help expose potential misbehavior by third-party applications.

Like similar information-flow tracking systems [7, 57], TaintDroid has a fundamental limitation: it can be circumvented through leaks via implicit flows. The use of implicit flows to avoid taint detection is, in and of itself, an indicator of malicious intent, and may well be detectable through other techniques such as automated static code analysis [14, 46], as we discuss in Section 8.

The rest of this paper is organized as follows: Section 2 provides a high-level overview of TaintDroid, Section 3 describes background information on the Android platform, Section 4 describes our TaintDroid design, Section 5 describes the taint sources tracked by TaintDroid, Section 6 presents results from our Android application study, Section 7 characterizes the performance of our prototype implementation, Section 8 discusses the limitations of our approach, Section 9 describes related work, and Section 10 summarizes our conclusions.

2 Approach Overview

We seek to design a framework that allows users to monitor how third-party smartphone applications handle their private data in realtime. Many smartphone applications are closed-source; therefore, static source code analysis is infeasible. Even if source code is available, runtime events and configuration often dictate information use; realtime monitoring accounts for these environment specific dependencies.

Monitoring network disclosure of privacy sensitive information on smartphones presents several challenges:

• Smartphones are resource constrained. The resource limitations of smartphones preclude the use of heavyweight information tracking systems such as Panorama [57].
• Third-party applications are entrusted with several types of privacy sensitive information. The monitoring system must distinguish multiple information types, which requires additional computation and storage.
• Context-based privacy sensitive information is dynamic and can be difficult to identify even when sent in the clear. For example, geographic locations are pairs of floating point numbers that frequently change and are hard to predict.
• Applications can share information. Limiting the monitoring system to a single application does not account for flows via files and IPC between applications, including core system applications designed to disseminate privacy sensitive information.

We use dynamic taint analysis [57, 44, 8, 61, 39] (also called “taint tracking”) to monitor privacy sensitive information on smartphones. Sensitive information is first identified at a taint source, where a taint marking indicating the information type is assigned. Dynamic taint analysis tracks how labeled data impacts other data in a way that might leak the original sensitive information. This tracking is often performed at the instruction level. Finally, the impacted data is identified before it leaves the system at a taint sink (usually the network interface).

Existing taint tracking approaches have several limitations. First and foremost, approaches that rely on instruction-level dynamic taint analysis using whole system emulation [57, 7, 26] incur high performance penalties. Instruction-level instrumentation incurs a 2-20 times slowdown [57, 7] in addition to the slowdown introduced by emulation, which is not suitable for realtime analysis. Second, developing accurate taint propagation logic has proven challenging for the x86 instruction set [40, 48]. Implementations of instruction-level tracking can experience taint explosion if the stack pointer becomes falsely tainted [49] and taint loss if complicated instructions such as CMPXCHG, REP MOV are not instrumented properly [61]. While most smartphones use the ARM instruction set, similar false positives and false negatives could arise.

Figure 1 presents our approach to taint tracking on smartphones. We leverage architectural features of virtual machine-based smartphones (e.g., Android, BlackBerry, and J2ME-based phones) to enable efficient, system-wide taint tracking using fine-grained labels with
clear semantics. First, we instrument the VM interpreter to provide variable-level tracking within untrusted application code.¹ The variable semantics provided by the interpreter offer valuable context for avoiding the taint explosion observed in the x86 instruction set. Additionally, by tracking variables, we maintain taint markings only for data and not code. Second, we use message-level tracking between applications. Tracking taint on messages instead of data within messages minimizes IPC overhead while extending the analysis system-wide. Third, for system-provided native libraries, we use method-level tracking. Here, we run native code without instrumentation and patch the taint propagation on return. These methods accompany the system and have known information flow semantics. Finally, we use file-level tracking to ensure persistent information conservatively retains its taint markings.

[Figure 1 (diagram): two applications, each consisting of application code running in a virtual machine, exchange a message (Msg). Both sit above the native system libraries, which sit above the network interface and secondary storage. Labels: message-level tracking, variable-level tracking, method-level tracking, file-level tracking.]
Figure 1: Multi-level approach for performance efficient taint tracking within a common smartphone architecture.

To assign labels, we take advantage of the well-defined interfaces through which applications access sensitive data. For example, all information retrieved from GPS hardware is location-sensitive, and all information retrieved from an address book database is contact-sensitive. This avoids relying on heuristics [10] or manual specification [61] for labels. We expand on information sources in Section 5.

In order to achieve this tracking at multiple granularities, our approach relies on the firmware’s integrity. The taint tracking system’s trusted computing base includes the virtual machine executing in userspace and any native system libraries loaded by the untrusted interpreted application. However, this code is part of the firmware, and is therefore trusted. Applications can only escape the virtual machine by executing native methods. In our target platform (Android), we modified the native library loader to ensure that applications can only load native libraries from the firmware and not those downloaded by the application. Note that an early 2010 survey of the top 50 most popular free applications in each category of the Android Market [2] (1100 applications in total) revealed that fewer than 4% included a .so file. A similar survey conducted in mid 2010 revealed this fraction increased to 5%, which indicates there is growth in the number of applications using native third-party libraries, but that the number of affected applications remains small.

In summary, we provide a novel, efficient, system-wide, multiple-marking, taint tracking design by combining multiple granularities of information tracking. While some techniques such as variable tracking within an interpreter have been previously proposed (see Section 9), to our knowledge, our approach is the first to extend such tracking system-wide. By choosing a multiple granularity approach, we balance performance and accuracy. As we show in Sections 6 and 7, our system-wide approach is both highly efficient (∼14% CPU overhead and ∼4.4% memory overhead for simultaneously tracking 32 taint markings per data unit) and accurately detects many suspicious network packets.

3 Background: Android

Android [1] is a Linux-based, open source, mobile phone platform. Most core phone functionality is implemented as applications running on top of a customized middleware. The middleware itself is written in Java and C/C++. Applications are written in Java and compiled to a custom byte-code known as the Dalvik EXecutable (DEX) byte-code format. Each application executes within its own Dalvik VM interpreter instance. Each instance runs under a unique UNIX user identity to isolate applications within the Linux platform subsystem. Applications communicate via the binder IPC mechanism. Binder provides transparent message passing based on parcels. We now discuss topics necessary to understand our tracking system.

Dalvik VM Interpreter: DEX is a register-based machine language, as opposed to Java byte-code, which is stack-based. Each DEX method has its own predefined number of virtual registers (which we frequently refer to as simply “registers”). The Dalvik VM interpreter manages method registers with an internal execution state stack; the current method’s registers are always on the top stack frame. These registers loosely correspond to local variables in the Java method and store primitive types and object references. All computation occurs on registers; therefore, values must be loaded from class fields before use and stored back to them after use. Note that DEX uses class fields for all long term storage, unlike hardware register-based machine languages (e.g., x86), which store values in arbitrary memory locations.

Native Methods: The Android middleware provides access to native libraries for performance optimization and third-party libraries such as OpenGL and Webkit. Android also uses Apache Harmony Java [3], which frequently uses system libraries (e.g., math routines). Native methods are written in C/C++ and expose functionality provided by the underlying Linux kernel and services. They can also access Java internals, and hence are
included in our trusted computing base (see Section 2).

Android contains two types of native methods: internal VM methods and JNI methods. The internal VM methods access interpreter-specific structures and APIs. JNI methods conform to the Java Native Interface specification [32], which requires Dalvik to separate Java arguments into variables using a JNI call bridge. Conversely, internal VM methods must manually parse arguments from the interpreter’s byte array of arguments.

Binder IPC: All Android IPC occurs through binder. Binder is a component-based processing and IPC framework designed for BeOS, extended by Palm Inc., and customized for Android by Google. Fundamental to binder are parcels, which serialize both active and standard data objects. The former includes references to binder objects, which allows the framework to manage shared data objects between processes. A binder kernel module passes parcel messages between processes.

4 TaintDroid

TaintDroid is a realization of our multiple granularity taint tracking approach within Android. TaintDroid uses variable-level tracking within the VM interpreter. Multiple taint markings are stored as one taint tag. When applications execute native methods, variable taint tags are patched on return. Finally, taint tags are assigned to parcels and propagated through binder. Note that the Technical Report [17] version of this paper contains more implementation details.

[Figure 2 (diagram): a trusted application and an untrusted application, each running interpreted code in userspace, connected through binder hooks and the binder kernel module in the kernel. Labels include Taint Source, Trusted Library, Taint Sink, Binder Hook, Userspace, Kernel, Binder Kernel Module, and the numbered steps (1) through (9) referenced in the text.]
Figure 2: TaintDroid architecture within Android.

Figure 2 depicts TaintDroid’s architecture. Information is tainted (1) in a trusted application with sufficient context (e.g., the location provider). The taint interface invokes a native method (2) that interfaces with the Dalvik VM interpreter, storing specified taint markings in the virtual taint map. The Dalvik VM propagates taint tags (3) according to data flow rules as the trusted application uses the tainted information. Every interpreter instance simultaneously propagates taint tags. When the trusted application uses the tainted information in an IPC transaction, the modified binder library (4) ensures the parcel has a taint tag reflecting the combined taint markings of all contained data. The parcel is passed transparently through the kernel (5) and received by the remote untrusted application. Note that only the interpreted code is untrusted. The modified binder library retrieves the taint tag from the parcel and assigns it to all values read from the parcel (6). The remote Dalvik VM instance propagates taint tags (7) identically for the untrusted application. When the untrusted application invokes a library specified as a taint sink (8), e.g., network send, the library retrieves the taint tag for the data in question (9) and reports the event.

Implementing this architecture requires addressing several system challenges, including: a) taint tag storage, b) interpreted code taint propagation, c) native code taint propagation, d) IPC taint propagation, and e) secondary storage taint propagation. The remainder of this section describes our design.

4.1 Taint Tag Storage

The choice of how to store taint tags influences performance and memory overhead. Dynamic taint tracking systems commonly store tags for every data byte or word [57, 7]. Tracked memory is unstructured and without content semantics. Frequently taint tags are stored in non-adjacent shadow memory [57] and tag maps [61]. TaintDroid uses variable semantics within the Dalvik interpreter. We store taint tags adjacent to variables in memory, providing spatial locality.

Dalvik has five variable types that require taint storage: method local variables, method arguments, class static fields, class instance fields, and arrays. In all cases, we store a 32-bit bitvector with each variable to encode the taint tag, allowing 32 different taint markings.

Dalvik stores method local variables and arguments on an internal stack. When an application invokes a method, a new stack frame is allocated for all local variables. Method arguments are also passed via the internal stack. Before calling a method, the caller places the arguments on the top of the stack such that they become high numbered registers in the callee’s stack frame. We allocate taint tag storage by doubling the size of the stack frame allocation. Taint tags are interleaved between values such that register vi, originally accessed via fp[i], is accessed as fp[2·i] after modification. Note that Dalvik stores 64-bit variables as two adjacent 32-bit registers on the internal stack. While the byte-code interprets these adjacent registers as a single 64-bit value, the interpreter manages these registers as separate values. Therefore, our modified stack transparently stores and retrieves 64-bit values to and from separate 32-bit registers (at fp[2·i] and fp[2·i+2]). Finally, native method targets require a slightly different stack frame organization for reasons
discussed in Section 4.3. The modified stack format is shown in Figure 3.

[Figure 3 (stack layout diagram): low addresses (0x00000000) at the top, high addresses (0xffffffff) at the bottom. Interpreted targets: each register is followed by its taint tag (v0 == local0, v0 taint tag, v1 == in0, v1 taint tag, v2 == in1, v2 taint tag). Native targets: arg0, arg1, return taint, arg0 taint tag, arg1 taint tag. Other labels: stack pointer (top), frame pointer (current), frame pointer (previous), VM goop, and outgoing arguments out0 and out1 with their taint tags.]
Figure 3: Modified Stack Format. Taint tags are interleaved between registers for interpreted method targets and appended for native methods. Dark grayed boxes represent taint tags.

Taint tags are stored adjacent to class fields and arrays inside the VM interpreter’s internal data structures. TaintDroid stores only one taint tag per array to minimize storage overhead. Per-value taint tag storage is severely inefficient for Java String objects, as all characters have the same tag. Unfortunately, storing one taint tag per array may result in false positives during taint propagation. For example, if untainted variable u is stored into array A at index 0 (A[0]) and tainted variable t is stored into A[1], then array A is tainted. Later, if variable v is assigned the value at A[0], v will be tainted, even though u was untainted. Fortunately, Java frequently uses objects, and object references are infrequently tainted (see Section 4.2); therefore, this coding practice leads to fewer false positives.

4.2 Interpreted Code Taint Propagation

Taint tracking granularity and flow semantics influence performance and accuracy. TaintDroid implements variable-level taint tracking within the Dalvik VM interpreter. Variables provide valuable semantics for taint propagation, distinguishing data pointers from scalar values. TaintDroid primarily tracks primitive type variables (e.g., int, float, etc.); however, there are cases when object references must become tainted to ensure taint propagation operates correctly, and this section addresses why these cases exist. First, however, we present taint tracking in the Dalvik machine language as a formal logic.

4.2.1 Taint Propagation Logic

The Dalvik VM operates on the unique DEX machine language instruction set; therefore, we must design an appropriate propagation logic. We use a data flow logic, as tracking implicit flows requires static analysis and causes significant performance overhead and overestimation in tracking [29] (see Section 8). We begin by defining taint markings, taint tags, variables, and taint propagation. We then present our logic rules for DEX.

Let L be the universe of taint markings for a particular system. A taint tag t is a set of taint markings, t ⊆ L. Each variable has an associated taint tag. A variable is an instance of one of the five types described in Section 4.1. We use a different representation for each type. The local and argument variables correspond to virtual registers, denoted vx. Class field variables are denoted as fx to indicate a field variable with class index x. Instance fields require an instance object and are denoted vy(fx), where vy is the instance object reference (note that both the object reference and the dereferenced value are variables). Static fields are denoted as fx alone, which is shorthand for S(fx), where S() is the static scope. Finally, vx[·] denotes an array, where vx is an array object reference variable.

Our virtual taint map function is τ(·). τ(v) returns the taint tag t for variable v. τ(v) is also used to assign a taint tag to a variable. Retrieval and assignment are distinguished by the position of τ(·) w.r.t. the ← symbol. When τ(v) appears on the right hand side of ←, τ(v) retrieves the taint tag for v. When τ(v) appears on the left hand side, τ(v) assigns the taint tag for v. For example, τ(v1) ← τ(v2) copies the taint tag from v2 to v1.

Table 1 captures our propagation logic. The table enumerates abstracted versions of the byte-code instructions specified in the DEX documentation. Register variables and class fields are referenced by vX and fX, respectively. R and E are the return and exception variables maintained within the interpreter, respectively. A, B, and C are constants in the byte-code. The table does not list instructions that clear the taint tag of the destination register. For example, we do not consider the array-length instruction to return a tainted value even if the array is tainted. Note that the array length is sometimes used to aid direct control flow propagation (e.g., Vogt et al. [53]).

4.2.2 Tainting Object References

The propagation rules in Table 1 are straightforward with two exceptions. First, taint propagation logics commonly include the taint tag of an array index during lookup to handle translation tables (e.g., ASCII/UNICODE or character case conversion). For example, consider a translation table from lowercase to upper case characters: if a tainted value “a” is used as an array index, the resulting “A” value should be tainted even though the “A” value in the array is not. Hence, the taint logic for aget-op uses both the array and array index taint. Second, when the array contains object references (e.g., an
Table 1: DEX Taint Propagation Logic. Register variables and class fields are referenced by vX and fX, respectively. R and E are the return and exception variables maintained within the interpreter. A, B, and C are byte-code constants.

Op Format           Op Semantics    Taint Propagation              Description
------------------  --------------  -----------------------------  ------------------------------------------------
const-op vA C       vA ← C          τ(vA) ← ∅                      Clear vA taint
move-op vA vB       vA ← vB         τ(vA) ← τ(vB)                  Set vA taint to vB taint
move-op-R vA        vA ← R          τ(vA) ← τ(R)                   Set vA taint to return taint
return-op vA        R ← vA          τ(R) ← τ(vA)                   Set return taint (∅ if void)
move-op-E vA        vA ← E          τ(vA) ← τ(E)                   Set vA taint to exception taint
throw-op vA         E ← vA          τ(E) ← τ(vA)                   Set exception taint
unary-op vA vB      vA ← ⊗vB        τ(vA) ← τ(vB)                  Set vA taint to vB taint
binary-op vA vB vC  vA ← vB ⊗ vC    τ(vA) ← τ(vB) ∪ τ(vC)          Set vA taint to vB taint ∪ vC taint
binary-op vA vB     vA ← vA ⊗ vB    τ(vA) ← τ(vA) ∪ τ(vB)          Update vA taint with vB taint
binary-op vA vB C   vA ← vB ⊗ C     τ(vA) ← τ(vB)                  Set vA taint to vB taint
aput-op vA vB vC    vB[vC] ← vA     τ(vB[·]) ← τ(vB[·]) ∪ τ(vA)    Update array vB taint with vA taint
aget-op vA vB vC    vA ← vB[vC]     τ(vA) ← τ(vB[·]) ∪ τ(vC)       Set vA taint to array and index taint
sput-op vA fB       fB ← vA         τ(fB) ← τ(vA)                  Set field fB taint to vA taint
sget-op vA fB       vA ← fB         τ(vA) ← τ(fB)                  Set vA taint to field fB taint
iput-op vA vB fC    vB(fC) ← vA     τ(vB(fC)) ← τ(vA)              Set field fC taint to vA taint
iget-op vA vB fC    vA ← vB(fC)     τ(vA) ← τ(vB(fC)) ∪ τ(vB)      Set vA taint to field fC and object reference taint
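
To make the rules in Table 1 concrete, the following is a minimal sketch of a subset of them in Java. It is an illustration under stated assumptions, not TaintDroid's implementation: the class name DexTaintSketch, its method names, and the LOCATION and DEVICE_ID markings are hypothetical. The sketch assumes a taint tag is a 32-bit bitvector (one bit per marking), so that the set union ∪ becomes a bitwise OR, and it keeps one tag per array object. For brevity, the array object is passed directly rather than through register vB.

```java
import java.util.IdentityHashMap;
import java.util.Map;

/** Hypothetical model of a few Table 1 rules; not TaintDroid's actual code. */
final class DexTaintSketch {
    // Illustrative markings: one bit each in a 32-bit tag (names are assumptions).
    static final int LOCATION = 1 << 0;
    static final int DEVICE_ID = 1 << 1;

    final int[] regTaint;                                            // tau(vX): one tag per register
    final Map<Object, Integer> arrayTaint = new IdentityHashMap<>(); // tau(vB[.]): one tag per array
    int returnTaint;                                                 // tau(R)

    DexTaintSketch(int registerCount) {
        regTaint = new int[registerCount];
    }

    /** Stand-in for a taint source assigning a tag to register vA. */
    void taintRegister(int a, int tag) { regTaint[a] = tag; }

    // const-op vA C: tau(vA) <- empty set
    void constOp(int a) { regTaint[a] = 0; }

    // move-op vA vB: tau(vA) <- tau(vB)
    void moveOp(int a, int b) { regTaint[a] = regTaint[b]; }

    // binary-op vA vB vC: tau(vA) <- tau(vB) U tau(vC); union is bitwise OR
    void binaryOp(int a, int b, int c) { regTaint[a] = regTaint[b] | regTaint[c]; }

    // return-op vA: tau(R) <- tau(vA)
    void returnOp(int a) { returnTaint = regTaint[a]; }

    // move-op-R vA: tau(vA) <- tau(R)
    void moveResultOp(int a) { regTaint[a] = returnTaint; }

    // aput-op vA vB vC: tau(vB[.]) <- tau(vB[.]) U tau(vA)
    // Simplification: the array object is passed directly instead of via vB.
    void aputOp(int a, Object array) {
        arrayTaint.merge(array, regTaint[a], (x, y) -> x | y);
    }

    // aget-op vA vB vC: tau(vA) <- tau(vB[.]) U tau(vC), where vC is the index register
    void agetOp(int a, Object array, int c) {
        regTaint[a] = arrayTaint.getOrDefault(array, 0) | regTaint[c];
    }
}
```

In this model, storing an untainted register and then a LOCATION-tainted register into the same array leaves the whole array tagged LOCATION, so a later agetOp of either element yields a LOCATION-tagged register. This is the per-array false positive that Section 4.1 accepts in exchange for lower storage overhead.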
    public static Integer valueOf(int i) {
        if (i < -128 || i > 127) {
            return new Integer(i);
        }
        return valueOfCache.CACHE[i + 128];
    }

    static class valueOfCache {
        static final Integer[] CACHE = new Integer[256];
        static {
            for (int i = -128; i <= 127; i++) {
                CACHE[i + 128] = new Integer(i);
            }
        }
    }

Figure 4: Excerpt from Android’s Integer class illustrating the need for object reference taint propagation.

Integer array), the index taint tag is propagated to the object reference and not the object value. Therefore, we include the object reference taint tag in the instance get (iget-op) rule.

The code listed in Figure 4 demonstrates a real instance of where object reference tainting is needed. Here, valueOf() returns an Integer object for a passed int. If the int argument is between −128 and 127, valueOf() returns a reference to a statically defined Integer object. valueOf() is implicitly called for conversion to an object. Consider the following definition and use of a method intProxy().

    Object intProxy(int val) { return val; }
    int out = (Integer) intProxy(tVal);

Consider the case where tVal is an int with value 1 and taint tag TAG. When intProxy() is passed tVal, TAG is propagated to val. When intProxy() returns val, it calls Integer.valueOf() to obtain an Integer instance corresponding to the scalar variable val. In this case, Integer.valueOf() returns a reference to the static Integer object with value 1. The value field (of the Integer class) in the object has a taint tag of ∅; however, since the aget-op propagation rule includes the taint of the index register, the object reference has a taint tag of TAG. Therefore, only by including the object reference taint tag when the value field is read from the Integer (i.e., the iget-op propagation rule), will the correct taint tag of TAG be assigned to out.

4.3 Native Code Taint Propagation

Native code is unmonitored in TaintDroid. Ideally, we achieve the same propagation semantics as the interpreted counterpart. Hence, we define two necessary postconditions for accurate taint tracking in the Java-like environment: 1) all accessed external variables (i.e., class fields referenced by other methods) are assigned taint tags according to data flow rules; and 2) the return value is assigned a taint tag according to data flow rules. TaintDroid achieves these postconditions through an assortment of manual instrumentation, heuristics, and method profiles, depending on situational requirements.

Internal VM Methods: Internal VM methods are called directly by interpreted code, passing a pointer to an array of 32-bit register arguments and a pointer to a return value. The stack augmentation shown in Figure 3 provides access to taint tags for both Java arguments and the return value. As there are a relatively small number of internal VM methods which are infrequently added between versions,2 we manually inspected and patched them for taint propagation as needed. We identified 185 internal VM methods in Android version 2.1; however, only 5 required patching: the System.arraycopy() native method for copying array contents, and several native methods implementing Java reflection.

JNI Methods: JNI methods are invoked through the JNI call bridge. The call bridge parses Java arguments and assigns a return value using the method’s descriptor string. We patched the call bridge to provide taint propagation for all JNI methods. When a JNI method returns, TaintDroid consults a method profile table for tag propagation updates. A method profile is a list of (from, to) pairs indicating flows between variables, which may be method parameters, class variables, or return values. Enumerating the information flows for all JNI methods is a time-consuming task best completed automatically using source code analysis (a task we leave for future work). We currently include an additional propagation heuristic patch. The heuristic is conservative for JNI methods that only operate on primitive and String arguments and return values. It assigns the union of the method argument taint tags to the taint tag of the return value. While the heuristic has false negatives for methods using objects, it covers many existing methods.

We performed a survey of the JNI methods included in the official Android source code (version 2.1) to determine specific properties. We found 2,844 JNI methods with a Java interface and C or C++ implementation.3 Of these methods, 913 did not reference objects (as arguments, return value, or method body) and hence are automatically covered by our heuristic. The remaining methods may or may not have information flows that produce false negatives. Currently, we define method profiles as needed. For example, methods in the IBM NativeConverter class require propagation for conversion between character and byte arrays.

4.4 IPC Taint Propagation

Taint tags must propagate between applications when they exchange data. The tracking granularity affects performance and memory overhead. TaintDroid uses message-level taint tracking. A message taint tag represents the upper bound of taint markings assigned to variables contained in the message. We use message-level granularity to minimize performance and storage overhead during IPC.

We chose to implement message-level over variable-level taint propagation, because in a variable-level system, a devious receiver could game the monitoring by unpacking variables in a different way to acquire values without taint propagation. For example, if an IPC parcel message contains a sequence of scalar values, the receiver may unpack a string instead, thereby acquiring values without propagating all the taint tags on scalar values in the sequence. Hence, to prevent applications from removing taint tags in this way, the current implementation protects taint tags at the message level.

Message-level taint propagation for IPC leads to false positives. Similar to arrays, all data items in a parcel share the same taint tag. For example, Section 8 discusses limitations for tracking the IMSI that result from passing portions of the value as configuration parameters in parcels. Future implementations will consider word-level taint tags along with additional consistency checks to ensure accurate propagation for unpacked variables. However, this additional complexity will negatively impact IPC performance.

4.5 Secondary Storage Taint Propagation

Taint tags may be lost when data is written to a file. Our design stores one taint tag per file. The taint tag is updated on file write and propagated to data on file read. TaintDroid stores file taint tags in the file system’s extended attributes. To do this, we implemented extended attribute support for Android’s host file system (YAFFS2) and formatted the removable SDcard with the ext2 file system. As with arrays and IPC, storing one taint tag per file leads to false positives and limits the granularity of taint markings for information databases (see Section 5). Alternatively, we could track taint tags at a finer granularity at the expense of added memory and performance overhead.

4.6 Taint Interface Library

Taint sources and sinks defined within the virtualized environment must communicate taint tags with the tracking system. We abstract the taint source and sink logic into a single taint interface library. The interface performs two functions: 1) add taint markings to variables; and 2) retrieve taint markings from variables. The library only provides the ability to add and not set or clear taint tags, as such functionality could be used by untrusted Java code to remove taint markings.

Adding taint tags to arrays and strings via internal VM methods is straightforward, as both are stored in data objects. Primitive type variables, on the other hand, are stored on the interpreter’s internal stack and disappear after a method is called. Therefore, the taint library uses the method return value as a means of tainting primitive type variables. The developer passes a value or variable into the appropriate add taint method (e.g., addTaintInt()) and the returned variable has the same value but additionally has the specified taint tag. Note that the stack storage does not pose complications for taint tag retrieval.

5 Privacy Hook Placement

Using TaintDroid for privacy analysis requires identifying privacy sensitive sources and instrumenting taint sources within the operating system. Historically, dynamic taint analysis systems assume taint source and sink placement is trivial. However, complex operating systems such as Android provide applications information in a variety of ways, e.g., direct access and service interface. Each potential type of privacy sensitive information must be studied carefully to determine the best method of defining the taint source.

Taint sources can only add taint tags to memory for which TaintDroid provides tag storage. Currently, taint source and sink placement is limited to variables in interpreted code, IPC messages, and files. This section discusses how valuable taint sources and sinks can be implemented within these restrictions. We generalize such taint sources based on information characteristics.

Low-bandwidth Sensors: A variety of privacy sensitive information types are acquired through low-bandwidth sensors, e.g., location and accelerometer. Such information often changes frequently and is simultaneously used by multiple applications. Therefore, it is common for a smartphone OS to multiplex access to low-bandwidth sensors using a manager. This sensor manager represents an ideal point for taint source hook placement. For our analysis, we placed hooks in Android’s LocationManager and SensorManager applications.

High-bandwidth Sensors: Privacy sensitive information sources such as the microphone and camera are high-bandwidth. Each request from the sensor frequently returns a large amount of data that is only used by one application. Therefore, the smartphone OS may share sensor information via large data buffers, files, or both. When sensor information is shared via files, the file must be tainted with the appropriate tag. Due to flexible APIs, we placed hooks for both data buffer and file tainting for tracking microphone and camera information.

Information Databases: Shared information such as address books and SMS messages is often stored in file-based databases. This organization provides a useful unambiguous taint source similar to hardware sensors. By adding a taint tag to such database files, all information read from the file will be automatically tainted. We used this technique for tracking address book information. Note that while TaintDroid’s file-level granularity was appropriate for these valuable information sources, others may exist for which files are too coarse-grained. However, we have not yet encountered such sources.

Device Identifiers: Information that uniquely identifies the phone or the user is privacy sensitive. Not all personally identifiable information can be easily tainted. However, the phone contains several easily tainted identifiers: the phone number, SIM card identifiers (IMSI, ICC-ID), and device identifier (IMEI) are all accessed through well-defined APIs. We instrumented the APIs for the phone number, ICC-ID, and IMEI. An IMSI taint source has inherent limitations discussed in Section 8.

Network Taint Sink: Our privacy analysis identifies when tainted information is transmitted out the network interface. The VM interpreter-based approach requires the taint sink to be placed within interpreted code. Hence, we instrumented the Java framework libraries at the point the native socket library is invoked.

6 Application Study

This section reports on an application study that uses TaintDroid to analyze how 30 popular third-party Android applications use privacy sensitive user data. Existing applications acquire a variety of user data along with permissions to access the Internet. Our study finds that two thirds of these applications expose detailed location data, the phone’s unique ID, and the phone number using the combination of the seemingly innocuous access permissions granted at install. This finding was made possible by TaintDroid’s ability to monitor runtime access of sensitive user data and to precisely relate the monitored accesses with the data exposure by applications.

6.1 Experimental Setup

An early 2010 survey of the 50 most popular free applications in each category of the Android Market [2] (1,100 applications, in total) revealed that roughly a third of the applications (358 of the 1,100 applications) require Internet permissions along with permissions to access either location, camera, or audio data. From this set, we randomly selected 30 popular applications (an 8.4% sample size), which span twelve categories. Table 2 enumerates these applications along with permissions they request at install time. Note that this does not reflect actual access or use of sensitive data.

We studied each of the thirty downloaded applications by starting the application, performing any initialization or registration that was required, and then manually exercising the functionality offered by the application. We recorded system logs including detailed information from TaintDroid: tainted binder messages, tainted file output, and tainted network messages with the remote address. The overall experiment (conducted in May 2010) lasted slightly over 100 minutes, generating 22,594 packets (8.6MB) and 1,130 TCP connections. To verify our results, we also logged the network traffic using tcpdump on the WiFi interface and repeated experiments on multiple Nexus One phones, running the same version of TaintDroid built on Android 2.1. Though the phones used for experiments had a valid SIM card installed, the SIM card was inactive, forcing all the packets to be transmitted via the WiFi interface. The packet trace was used only to verify the exposure of tainted data flagged by TaintDroid.

In addition to the network trace, we also noted whether applications acquired user consent (either explicit or implicit) for exporting sensitive information. This provides additional context information to identify possible privacy violations. For example, by selecting the “use my location” option in a weather application, the user implicitly consents to disclosing geographic coordinates to the weather server.
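The tainted network records mentioned in the experimental setup pair a taint tag with a remote address. As a rough illustration of what a sink hook at the socket boundary could compute, the sketch below checks the tag of an outgoing buffer and formats a log line. Everything in it is an assumption for illustration: the class TaintSinkSketch, the method names, the bit positions assigned to each source, and the host ads.example.com are hypothetical, and the tag is assumed to be a 32-bit vector with one bit per source type.

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch of a network taint sink check; all names and bit positions are hypothetical. */
public class TaintSinkSketch {
    static final int TAINT_CLEAR = 0;

    // Hypothetical assignment of one bit per taint source, lowest bit first.
    static final String[] SOURCE_NAMES = {
        "LOCATION", "MICROPHONE", "CAMERA", "PHONE_NUMBER", "ICCID", "IMEI"
    };
    static final int TAINT_LOCATION = 1 << 0;
    static final int TAINT_IMEI = 1 << 5;

    /** Expands a tag into the names of the sources whose bits are set. */
    static List<String> decode(int tag) {
        List<String> names = new ArrayList<>();
        for (int bit = 0; bit < SOURCE_NAMES.length; bit++) {
            if ((tag & (1 << bit)) != 0) {
                names.add(SOURCE_NAMES[bit]);
            }
        }
        return names;
    }

    /** Returns a log line for a tainted send, or null when the buffer is untainted. */
    static String checkSend(int bufferTag, String remoteAddress, int length) {
        if (bufferTag == TAINT_CLEAR) {
            return null;   // nothing to report; the send proceeds unlogged
        }
        return "tainted send: " + length + " bytes to " + remoteAddress
                + " tag=0x" + Integer.toHexString(bufferTag) + " " + decode(bufferTag);
    }

    public static void main(String[] args) {
        int tag = TAINT_LOCATION | TAINT_IMEI;
        System.out.println(checkSend(tag, "ads.example.com", 128));
        // prints "tainted send: 128 bytes to ads.example.com tag=0x21 [LOCATION, IMEI]"
    }
}
```

Because the check only inspects the tag, it reports binary and plaintext payloads alike, which a pattern match over packet contents could not do.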
Table 2: Applications grouped by the requested permissions (L: location, C: camera, A: audio, P: phone state). Android Market categories are indicated in parentheses, showing the diversity of the studied applications.
Applications∗ | # | Permissions† (L C A P)
The Weather Channel (News & Weather); Cestos, Solitaire (Game); Movies (Entertainment); Babble (Social); Manga Browser (Comics) | 6 | x
Bump, Wertago (Social); Antivirus (Communication); ABC — Animals, Traffic Jam, Hearts, Blackjack (Games); Horoscope (Lifestyle); Yellow Pages (Reference); 3001 Wisdom Quotes Lite, Dastelefonbuch, Astrid (Productivity); BBC News Live Stream (News & Weather); Ringtones (Entertainment) | 14 | x x
Layar (Lifestyle); Knocking (Social); Coupons (Shopping); Trapster (Travel); Spongebob Slide (Game); ProBasketBall (Sports) | 6 | x x x
MySpace (Social); Barcode Scanner, ixMAT (Shopping) | 3 | x
Evernote (Productivity) | 1 | x x x
∗ Listed names correspond to the name displayed on the phone and not necessarily the name listed in the Android Market.
† All listed applications also require access to the Internet.
Table 3: Potential privacy violations by 20 of the studied applications. Note that three applications had multiple violations, one of which had a violation in all three categories.
Observed Behavior (# of apps) | Details
Phone Information to Content Servers (2) | 2 apps sent out the phone number, IMSI, and ICC-ID along with the geo-coordinates to the app’s content server.
Device ID to Content Servers (7)∗ | 2 Social, 1 Shopping, 1 Reference and three other apps transmitted the IMEI number to the app’s content server.
Location to Advertisement Servers (15) | 5 apps sent geo-coordinates to ad.qwapi.com, 5 apps to admob.com, 2 apps to ads.mobclix.com (1 sent location both to admob.com and ads.mobclix.com) and 4 apps sent location† to data.flurry.com.
∗ TaintDroid flagged nine applications in this category, but only seven transmitted the raw IMEI without mentioning such practice in the EULA.
† To the best of our knowledge, the binary messages contained tainted location data (see the discussion below).
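As a quick consistency check on the Table 3 caption (a reading of the published counts, not an additional result), the per-category counts sum to more than 20 because applications with multiple violations are counted once per category. One application appears in all three categories, and the two remaining multi-violation applications must therefore each appear in exactly two:

```latex
\begin{align*}
\text{category entries} &= 2 + 7 + 15 = 24,\\
\text{distinct applications} &= 24
  - \underbrace{(3-1)}_{\text{one app in all three}}
  - \underbrace{2\,(2-1)}_{\text{two apps in two each}} = 20 .
\end{align*}
```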
6.2 Findings

Table 3 summarizes our findings. TaintDroid flagged 105 TCP connections as containing tainted privacy sensitive information. We manually labeled each message based on available context, including remote server names and temporally relevant application log messages. We used remote hostnames as an indication of whether data was being sent to a server providing application functionality or to a third party. Frequently, messages contained plaintext that aided categorization, e.g., an HTTP GET request containing geographic coordinates. However, 21 flagged messages contained binary data. Our investigation indicates these messages were generated by the Google Maps for Mobile [21] and FlurryAgent [20] APIs and contained tainted privacy sensitive data. These conclusions are supported by message transmissions immediately after the application received a tainted parcel from the system location manager. We now expand on our findings for each category and reflect on potential privacy violations.

Phone Information: Table 2 shows that 21 out of the 30 applications require permissions to read phone state and access the Internet. We found that 2 of the 21 applications transmitted to their server (1) the device’s phone number, (2) the IMSI, which is a unique 15-digit code used to identify an individual user on a GSM network, and (3) the ICC-ID number, which is a unique SIM card serial number. We verified messages were flagged correctly by inspecting the plaintext payload.4 In neither case was the user informed that this information was transmitted off the phone.

This finding demonstrates that Android’s coarse-grained access control provides insufficient protection against third-party applications seeking to collect sensitive data. Moreover, we found that one application transmits the phone information every time the phone boots. While this application displays a terms of use on first use, the terms of use does not specify collection of this highly sensitive data. Surprisingly, this application transmits the phone data immediately after install, before first use.

Device Unique ID: The device’s IMEI was also exposed by applications. The IMEI uniquely identifies a specific mobile phone and is used to prevent a stolen handset from accessing the cellular network. TaintDroid flags indicated that nine applications transmitted the IMEI. Seven out of the nine applications either do not present an End User License Agreement (EULA) or do not specify IMEI collection in the EULA. One of the seven applications is a popular social networking application and another is a location-based search application. Furthermore, we found two of the seven applications include the IMEI when transmitting the device’s geographic coordinates to their content server, potentially repurposing the IMEI as a client ID.

In comparison, two of the nine applications treat the IMEI with more care, thus we do not classify them as potential privacy violators. One application displays a privacy statement that clearly indicates that the application collects the device ID. The other uses the hash of the IMEI instead of the number itself. We verified this practice by comparing results from two different phones.

Location Data to Advertisement Servers: Half of the studied applications exposed location data to third-party advertisement servers without requiring implicit or explicit user consent. Of the fifteen applications, only two presented a EULA on first run; however, neither EULA indicated this practice. Exposure of location information occurred both in plaintext and in binary format. The latter highlights TaintDroid’s advantages over simple pattern-based packet scanning. Applications sent location data in plaintext to admob.com, ad.qwapi.com, ads.mobclix.com (11 applications) and in binary format to FlurryAgent (4 applications). The plaintext location exposure to AdMob occurred in the HTTP GET string:

    ...&s=a14a4a93f1e4c68&..&t=062A1CB1D476DE85B717D9195A6722A9&d%5Bcoord%5D=47.661227890000006%2C-122.31589477&...

Investigating the AdMob SDK revealed the s= parameter is an identifier unique to an application publisher, and the coord= parameter provides the geographic coordinates.

For FlurryAgent, we confirmed location exposure by the following sequence of events. First, a component named “FlurryAgent” registers with the location manager to receive location updates. Then, TaintDroid log messages show the application receiving a tainted parcel from the location manager. Finally, the application reports “sending report to http://data.flurry.com/aar.do” after receiving the tainted parcel.

Our experimentation indicates these fifteen applications collect location data and send it to advertisement servers. In some cases, location data was transmitted to advertisement servers even when no advertisement was displayed in the application. However, we note that TaintDroid helped us verify that three of the studied applications (not included in Table 3) only transmitted location data per user’s request to pull localized content from their servers. This finding demonstrates the importance of monitoring exercised functionality of an application that reflects how the application actually uses or abuses the granted permissions.

Legitimate Flags: Out of 105 connections flagged by TaintDroid, 37 were deemed clearly legitimate use. The flags resulted from four applications and the OS itself while using the Google Maps for Mobile (GMM) API. The TaintDroid logs indicate an HTTP request with the “User-Agent: GMM . . . ” header, but a binary payload. Given that GMM functionality includes downloading maps based on geographic coordinates, it is obvious that TaintDroid correctly identified location information in the payload. Our manual inspection of each message along with the network packet trace confirmed that there were no false positives. We note that there is a possibility of false negatives, which is difficult to verify with the lack of the source code of the third-party applications.

Summary: Our study of 30 popular applications shows the effectiveness of the TaintDroid system in accurately tracking applications’ use of privacy sensitive data. While monitoring these applications, TaintDroid generated no false positives (with the exception of the IMSI taint source which we disabled for experiments, see Section 8). The flags raised by TaintDroid helped to identify potential privacy violations by the tested applications. Half of the studied applications share location data with advertisement servers. Approximately one third of the applications expose the device ID, sometimes with the phone number and the SIM card serial number. The analysis was simplified by the taint tag provided by TaintDroid that precisely describes which privacy relevant data is included in the payload, especially for binary payloads. We also note that there was almost no perceived latency while running experiments with TaintDroid.

7 Performance Evaluation

We now study TaintDroid’s taint tracking overhead. Experiments were performed on a Google Nexus One running Android OS version 2.1 modified for TaintDroid. Within the interpreted environment, TaintDroid incurs the same performance and memory overhead regardless of the existence of taint markings. Hence, we only need to ensure file access includes appropriate taint tags.

7.1 Macrobenchmarks

During the application study, we anecdotally observed limited performance overhead. We hypothesize that this is because: 1) most applications are primarily in a “wait state,” and 2) heavyweight operations (e.g., screen updates and webpage rendering) occur in unmonitored native libraries.

To gain further insight into perceived overhead, we devised five macrobenchmarks for common high-level smartphone operations. Each experiment was measured 50 times, and the observed 95% confidence intervals were at least an order of magnitude smaller than the mean. In each case, we excluded the first run to remove unrelated initialization costs. Experimental results are shown in Table 4.
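The measurement procedure for the macrobenchmarks (discard the first run, then report a mean with a 95% confidence interval) can be sketched as follows. This is a generic illustration rather than the evaluation harness used for the paper: the class and method names are hypothetical, the sample timings are made up, and the interval uses the normal approximation with z = 1.96, which is an assumption (a Student t critical value would be slightly larger for 50 samples).

```java
import java.util.Arrays;

/** Mean and 95% confidence half-width for repeated timings (normal approximation). */
public class ConfidenceInterval {

    /** Drops the first measurement, which may include one-time initialization cost. */
    static double[] dropFirst(double[] runs) {
        return Arrays.copyOfRange(runs, 1, runs.length);
    }

    static double mean(double[] xs) {
        double sum = 0.0;
        for (double x : xs) {
            sum += x;
        }
        return sum / xs.length;
    }

    /** Half-width of the 95% interval: 1.96 * s / sqrt(n), with s the sample std. deviation. */
    static double halfWidth95(double[] xs) {
        double m = mean(xs);
        double squares = 0.0;
        for (double x : xs) {
            squares += (x - m) * (x - m);
        }
        double sampleStdDev = Math.sqrt(squares / (xs.length - 1));
        return 1.96 * sampleStdDev / Math.sqrt(xs.length);
    }

    public static void main(String[] args) {
        // Hypothetical timings in milliseconds; the first run is an outlier on purpose.
        double[] runs = {140.0, 63.0, 64.0, 62.0, 63.5, 62.5, 63.0};
        double[] kept = dropFirst(runs);
        double m = mean(kept);
        double h = halfWidth95(kept);
        System.out.printf("mean=%.2f ms, 95%% CI half-width=%.2f ms%n", m, h);
        System.out.println("interval at least 10x smaller than mean: " + (h * 10 <= m));
    }
}
```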
Table 4: Macrobenchmark Results
Benchmark | Android | TaintDroid
App Load Time | 63 ms | 65 ms

Application Load Time: The application load time measures from when Android’s Activity Manager re-
Table 5: IPC Throughput Test (10,000 msgs).
Metric | Android | TaintDroid
Time (s) | 8.58 | 10.89
Memory (client) | 21.06MB | 21.88MB
Memory (service) | 18.92MB | 19.48MB

count objects (a username string and a balance integer) and provides two interfaces: setAccount() and getAccount(). The experiment measures the time for the client to invoke each interface pair 10,000 times.

Table 5 summarizes the results of the IPC benchmark. TaintDroid was 27% slower than Android. TaintDroid only adds four bytes to each IPC object; therefore, overhead due to data size is unlikely. The more likely cause of the overhead is the continual copying of taint tags as values are marshalled into and out of the parcel byte buffer. Finally, TaintDroid used 3.5% more memory than Android, which is comparable to the consumption observed during the CaffeineMark benchmarks.

8 Discussion

Approach Limitations: TaintDroid only tracks data flows (i.e., explicit flows) and does not track control flows (i.e., implicit flows) to minimize performance overhead. Section 6 shows that TaintDroid can track applications’ expected data exposure and also reveal suspicious actions. However, applications that are truly malicious can game our system and exfiltrate privacy sensitive information through control flows. Fully tracking control flow requires static analysis [14, 37], which is not applicable to analyzing third-party applications whose source code is unavailable. Direct control flows can be tracked dynamically if a taint scope can be determined [53]; however, DEX does not maintain branch structures that TaintDroid can leverage. On-demand static analysis to determine method control flow graphs (CFGs) provides this context [39]; however, TaintDroid does not currently perform such analysis in order to avoid false positives and significant performance overhead. Our data flow taint propagation logic is consistent with existing, well-known taint tracking systems [7, 57]. Finally, once information leaves the phone, it may return in a network reply. TaintDroid cannot track such information.

Implementation Limitations: Android uses the Apache Harmony [3] implementation of Java with a few custom modifications. This implementation includes support for the PlatformAddress class, which contains a native address and is used by DirectBuffer objects. The file and network IO APIs include write and read “direct” variants that consume the native address from a DirectBuffer. TaintDroid does not currently track taint tags on DirectBuffer objects, because the data is stored in opaque native data structures. Currently, TaintDroid logs when a read or write “direct” variant is used, which anecdotally occurred with minimal frequency. Similar implementation limitations exist with the sun.misc.Unsafe class, which also operates on native addresses.

Taint Source Limitations: While TaintDroid is very effective for tracking sensitive information, it causes significant false positives when the tracked information contains configuration identifiers. For example, the IMSI numeric string consists of a Mobile Country Code (MCC), Mobile Network Code (MNC), and Mobile Station Identifier Number (MSIN), which are all tainted together.5 Android uses the MCC and MNC extensively as configuration parameters when communicating other data. This causes all information in a parcel to become tainted, eventually resulting in an explosion of tainted information. Thus, for taint sources that contain configuration parameters, tainting individual variables within parcels is more appropriate. However, as our analysis results in Section 6 show, message-level taint tracking is effective for the majority of our taint sources.

9 Related Work

Mobile phone host security is a growing concern. OS-level protections such as Kirin [18], Saint [42], and Security-by-Contract [15] provide enhanced security mechanisms for Android and Windows Mobile. These approaches prevent access to sensitive information; however, once information enters the application, no additional mediation occurs. In systems with larger displays, a graphical widget [27] can help users visualize sensor access policies. Mulliner et al. [36] provide information tracking by labeling smartphone processes based on the interfaces they access, effectively limiting access to future interfaces based on acquired labels.

Decentralized information flow control (DIFC) enhanced operating systems such as Asbestos [52] and HiStar [60] label processes and enforce access control based on Denning’s lattice model for information flow security [13]. Flume [30] provides similar enhancements for legacy OS abstractions. DEFCon [34] uses a logic similar to these DIFC OSes, but focuses on events and modifies a Java runtime with lightweight isolation. Related to these system-level approaches, PRECIP [54] labels both processes and shared kernel objects such as the clipboard and display buffer. However, these process-level information flow models are coarse-grained and cannot track sensitive information within untrusted applications.

Tools that analyze applications for privacy sensitive information leaks include Privacy Oracle [28] and TightLip [59]. These tools investigate applications while treating them as a black box, thus enabling analysis of off-the-shelf applications. However, these black-box analysis tools become ineffective when applications use encryption prior to releasing sensitive information.
Language-based information flow security [46] extends existing programming languages by labeling variables with security attributes. Compilers use the security labels to generate security proofs, e.g., Jif [37, 38] and SLam [24]. Laminar [45] provides DIFC guarantees based on programmer defined security regions. However, these languages require careful development and are often incompatible with legacy software designs [25].
Dynamic taint analysis provides information tracking for legacy programs. The approach has been used to enhance system integrity (e.g., defend against software attacks [41, 44, 8]) and confidentiality (e.g., discover privacy exposure [57, 16, 61]), as well as track Internet worms [9]. Dynamic tracking approaches range from whole-system analysis using hardware extensions [51, 11, 50] and emulation environments [7, 57] to per-process tracking using dynamic binary translation (DBT) [6, 44, 8, 61]. The performance and memory overhead associated with dynamic tracking has resulted in an array of optimizations, including optimizing context switches [44], on-demand tracking [26] based on hypervisor introspection, and function summaries for code with known information flow properties [61]. If source code is available, significant performance improvements can be achieved by automatically instrumenting legacy programs with dynamic tracking functionality [56, 31]. Automatic instrumentation has also been performed on x86 binaries [47], providing a compromise between source code translation and DBT. Our TaintDroid design was inspired by these prior works, but addressed different challenges unique to mobile phones. To our knowledge, TaintDroid is the first taint tracking system for a mobile phone and is the first dynamic taint analysis system to achieve practical system-wide analysis through the integration of tracking multiple data object granularities.
Finally, dynamic taint analysis has been applied to virtual machines and interpreters. Haldar et al. [22] instrument the Java String class with taint tracking to prevent SQL injection attacks. WASP [23] has similar motivations; however, it uses positive tainting of individual characters to ensure the SQL query contains only high-integrity substrings. Chandra and Franz [5] propose fine-grained information flow tracking within the JVM and instrument Java byte-code to aid control flow analysis. Similarly, Nair et al. [39] instrument the Kaffe JVM. Vogt et al. [53] instrument a Javascript interpreter to prevent cross-site scripting attacks. Xu et al. [56] automatically instrument the PHP interpreter source code with dynamic information tracking to prevent SQL injection attacks. Finally, the Resin [58] environment for PHP and Python uses data flow tracking to prevent an assortment of Web application attacks. When data leaves the interpreted environment, Resin implements filters for files and SQL databases to serialize and de-serialize objects and policy with byte-level granularity. TaintDroid’s interpreted code taint propagation bears similarity to some of these works. However, TaintDroid implements system-wide information flow tracking, seamlessly connecting interpreter taint tracking with a range of operating system sharing mechanisms.
10 Conclusions
While some mobile phone operating systems allow users to control applications’ access to sensitive information, such as location sensors, camera images, and contact lists, users lack visibility into how applications use their private data. To address this, we present TaintDroid, an efficient, system-wide information flow tracking tool that can simultaneously track multiple sources of sensitive data. A key design goal of TaintDroid is efficiency, and TaintDroid achieves this by integrating four granularities of taint propagation (variable-level, message-level, method-level, and file-level) to achieve a 14% performance overhead on a CPU-bound microbenchmark.
We also used our TaintDroid implementation to study the behavior of 30 popular third-party applications, chosen at random from the Android Marketplace. Our study revealed that two-thirds of the applications in our study exhibit suspicious handling of sensitive data, and that 15 of the 30 applications reported users’ locations to remote advertising servers. Our findings demonstrate the effectiveness and value of enhancing smartphone platforms with monitoring tools such as TaintDroid.
Acknowledgments
We would like to thank Intel Labs, Berkeley and Seattle for its support and feedback during the design and prototype implementation of this work. We thank Jayanth Kannon, Stuart Schechter, and Ben Greenstein for their feedback during the writing of this paper. We also thank Kevin Butler, Stephen McLaughlin, Machigar Ongtang, and the SIIS lab as a whole for their helpful comments. This material is based upon work supported by the National Science Foundation. William Enck and Patrick McDaniel were partially supported by NSF Grant No. CNS-0905447, CNS-0721579 and CNS-0643907. Landon Cox and Peter Gilbert were partially supported by NSF CAREER Award CNS-0747283. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.
References
[1] Android. http://www.android.com.
[2] Android Market. http://market.android.com.
[3] Apache Harmony – Open Source Java Platform. http://harmony.apache.org.
[4] APPLE, INC. Apple's App Store Downloads Top Three Billion. http://www.apple.com/pr/library/2010/01/05appstore.html, January 2010.
[5] CHANDRA, D., AND FRANZ, M. Fine-Grained Information Flow Analysis and Enforcement in a Java Virtual Machine. In Proceedings of the 23rd Annual Computer Security Applications Conference (ACSAC) (December 2007).
[6] CHENG, W., ZHAO, Q., YU, B., AND HIROSHIGE, S. TaintTrace: Efficient Flow Tracing with Dynamic Binary Rewriting. In Proceedings of the IEEE Symposium on Computers and Communications (ISCC) (June 2006), pp. 749–754.
[7] CHOW, J., PFAFF, B., GARFINKEL, T., CHRISTOPHER, K., AND ROSENBLUM, M. Understanding Data Lifetime via Whole System Simulation. In Proceedings of the 13th USENIX Security Symposium (August 2004).
[8] CLAUSE, J., LI, W., AND ORSO, A. Dytan: A Generic Dynamic Taint Analysis Framework. In Proceedings of the 2007 international symposium on Software testing and analysis (2007), pp. 196–206.
[9] COSTA, M., CROWCROFT, J., CASTRO, M., ROWSTRON, A., ZHOU, L., ZHANG, L., AND BARHAM, P. Vigilante: End-to-End Containment of Internet Worms. In Proceedings of the ACM Symposium on Operating Systems Principles (2005).
[10] COX, L. P., AND GILBERT, P. RedFlag: Reducing Inadvertent Leaks by Personal Machines. Tech. Rep. TR-2009-02, Duke University, 2009.
[11] CRANDALL, J. R., AND CHONG, F. T. Minos: Control Data Attack Prevention Orthogonal to Memory Model. In Proceedings of the International Symposium on Microarchitecture (December 2004), pp. 221–232.
[12] DAVIES, C. iPhone spyware debated as app library “phones home”. http://www.slashgear.com/iphone-spyware-debated-as-app-library-phones-home-1752491/, August 17, 2009.
[13] DENNING, D. E. A Lattice Model of Secure Information Flow. Communications of the ACM 19, 5 (May 1976), 236–243.
[14] DENNING, D. E., AND DENNING, P. J. Certification of Programs for Secure Information Flow. Communications of the ACM 20, 7 (July 1977).
[15] DESMET, L., JOOSEN, W., MASSACCI, F., PHILIPPAERTS, P., PIESSENS, F., SIAHAAN, I., AND VANOVERBERGHE, D. Security-by-contract on the .NET platform. Information Security Technical Report 13, 1 (January 2008), 25–32.
[16] EGELE, M., KRUEGEL, C., KIRDA, E., YIN, H., AND SONG, D. Dynamic Spyware Analysis. In Proceedings of the USENIX Annual Technical Conference (June 2007), pp. 233–246.
[17] ENCK, W., GILBERT, P., CHUN, B.-G., COX, L. P., JUNG, J., MCDANIEL, P., AND SHETH, A. N. TaintDroid: An Information-Flow Tracking System for Realtime Privacy Monitoring on Smartphones. Tech. Rep. NAS-TR-0120-2010, Network and Security Research Center, Department of Computer Science and Engineering, Pennsylvania State University, University Park, PA, USA, August 2010.
[18] ENCK, W., ONGTANG, M., AND MCDANIEL, P. On Lightweight Mobile Phone Application Certification. In Proceedings of the 16th ACM Conference on Computer and Communications Security (CCS) (November 2009).
[19] FITZPATRICK, M. Mobile that allows bosses to snoop on staff developed. BBC News, March 2010. http://news.bbc.co.uk/2/hi/technology/8559683.stm.
[20] Flurry Mobile Application Analytics. http://www.flurry.com/product/technical-info.html.
[21] Google Maps for Mobile. http://www.google.com/mobile/products/maps.html.
[22] HALDAR, V., CHANDRA, D., AND FRANZ, M. Dynamic Taint Propagation for Java. In Proceedings of the 21st Annual Computer Security Applications Conference (ACSAC) (December 2005), pp. 303–311.
[23] HALFOND, W. G., ORSO, A., AND MANOLIOS, P. WASP: Protecting Web Applications Using Positive Tainting and Syntax-Aware Evaluation. IEEE Transactions on Software Engineering 34, 1 (2008), 65–81.
[24] HEINTZE, N., AND RIECKE, J. G. The SLam Calculus: Programming with Secrecy and Integrity. In Proceedings of the Symposium on Principles of Programming Languages (POPL) (1998), pp. 365–377.
[25] HICKS, B., AHMADIZADEH, K., AND MCDANIEL, P. Understanding practical application development in security-typed languages. In 22nd Annual Computer Security Applications Conference (ACSAC) (2006), pp. 153–164.
[26] HO, A., FETTERMAN, M., CLARK, C., WARFIELD, A., AND HAND, S. Practical Taint-Based Protection using Demand Emulation. In Proceedings of the European Conference on Computer Systems (EuroSys) (2006), pp. 29–41.
[27] HOWELL, J., AND SCHECHTER, S. What You See is What they Get: Protecting users from unwanted use of microphones, camera, and other sensors. In Proceedings of Web 2.0 Security and Privacy Workshop (2010).
[28] JUNG, J., SHETH, A., GREENSTEIN, B., WETHERALL, D., MAGANIS, G., AND KOHNO, T. Privacy Oracle: A System for Finding Application Leaks with Black Box Differential Testing. In Proceedings of ACM CCS (2008).
[29] KING, D., HICKS, B., HICKS, M., AND JAEGER, T. Implicit Flows: Can’t Live with ’Em, Can’t Live without ’Em. In Proceedings of the International Conference on Information Systems Security (2008).
[30] KROHN, M., YIP, A., BRODSKY, M., CLIFFER, N., KAASHOEK, M. F., KOHLER, E., AND MORRIS, R. Information Flow Control for Standard OS Abstractions. In Proceedings of ACM Symposium on Operating Systems Principles (2007).
[31] LAM, L. C., AND CHIUEH, T.-C. A General Dynamic Information Flow Tracking Framework for Security Applications. In Proceedings of the Annual Computer Security Applications Conference (ACSAC) (2006).
[32] LIANG, S. Java Native Interface: Programmer’s Guide and Specification. Prentice Hall PTR, 1999.
[33] LOOKOUT. Introducing the App Genome Project. http://blog.mylookout.com/2010/07/introducing-the-app-genome-project/, July 2010.
[34] MIGLIAVACCA, M., PAPAGIANNIS, I., EYERS, D. M., SHAND, B., BACON, J., AND PIETZUCH, P. DEFCon: High-Performance Event Processing with Information Security. In Proceedings of the USENIX Annual Technical Conference (2010).
[35] MOREN, D. Retrievable iPhone numbers mean potential privacy issues. http://www.macworld.com/article/143047/2009/09/phone_hole.html, September 29, 2009.
[36] MULLINER, C., VIGNA, G., DAGON, D., AND LEE, W. Using Labeling to Prevent Cross-Service Attacks Against Smart Phones. In Proceedings of Detection of Intrusions and Malware & Vulnerability Assessment (DIMVA) (2006).
[37] MYERS, A. C. JFlow: Practical Mostly-Static Information Flow Control. In Proceedings of the ACM Symposium on Principles of Programming Languages (POPL) (January 1999).
[38] MYERS, A. C., AND LISKOV, B. Protecting Privacy Using the Decentralized Label Model. ACM Transactions on Software Engineering and Methodology 9, 4 (October 2000), 410–442.
[39] NAIR, S. K., SIMPSON, P. N., CRISPO, B., AND TANENBAUM, A. S. A Virtual Machine Based Information Flow Control System for Policy Enforcement. In the 1st International Workshop on Run Time Enforcement for Mobile and Distributed Systems (REM) (2007).
[40] NEWSOME, J., MCCAMANT, S., AND SONG, D. Measuring channel capacity to distinguish undue influence. In ACM SIGPLAN Workshop on Programming Languages and Analysis for Security (2009).
[41] NEWSOME, J., AND SONG, D. Dynamic Taint Analysis for Automatic Detection, Analysis, and Signature Generation of Exploits on Commodity Software. In Proc. of Network and Distributed System Security Symposium (2005).
[42] ONGTANG, M., MCLAUGHLIN, S., ENCK, W., AND MCDANIEL, P. Semantically Rich Application-Centric Security in Android. In Proceedings of the 25th Annual Computer Security Applications Conference (ACSAC) (2009).
[43] PENDRAGON SOFTWARE CORPORATION. CaffeineMark 3.0. http://www.benchmarkhq.ru/cm30/.
[44] QIN, F., WANG, C., LI, Z., KIM, H.-S., ZHOU, Y., AND WU, Y. LIFT: A Low-Overhead Practical Information Flow Tracking System for Detecting Security Attacks. In Proceedings of the 39th Annual IEEE/ACM International Symposium on Microarchitecture (2006), pp. 135–148.
[45] ROY, I., PORTER, D. E., BOND, M. D., MCKINLEY, K. S., AND WITCHEL, E. Laminar: Practical Fine-Grained Decentralized Information Flow Control. In Proceedings of Programming Language Design and Implementation (2009).
[46] SABELFELD, A., AND MYERS, A. C. Language-based information-flow security. IEEE Journal on Selected Areas in Communications 21, 1 (January 2003), 5–19.
[47] SAXENA, P., SEKAR, R., AND PURANIK, V. Efficient Fine-Grained Binary Instrumentation with Applications to Taint-Tracking. In Proceedings of the IEEE/ACM symposium on Code Generation and Optimization (CGO) (2008).
[49] SLOWINSKA, A., AND BOS, H. Pointless Tainting? Evaluating the Practicality of Pointer Tainting. In Proceedings of the European Conference on Computer Systems (EuroSys) (April 2009), pp. 61–74.
[50] SUH, G. E., LEE, J. W., ZHANG, D., AND DEVADAS, S. Secure Program Execution via Dynamic Information Flow Tracking. In Proceedings of Architectural Support for Programming Languages and Operating Systems (2004).
[51] VACHHARAJANI, N., BRIDGES, M. J., CHANG, J., RANGAN, R., OTTONI, G., BLOME, J. A., REIS, G. A., VACHHARAJANI, M., AND AUGUST, D. I. RIFLE: An Architectural Framework for User-Centric Information-Flow Security. In Proceedings of the 37th annual IEEE/ACM International Symposium on Microarchitecture (2004), pp. 243–254.
[52] VANDEBOGART, S., EFSTATHOPOULOS, P., KOHLER, E., KROHN, M., FREY, C., ZIEGLER, D., KAASHOEK, F., MORRIS, R., AND MAZIÈRES, D. Labels and Event Processes in the Asbestos Operating System. ACM Transactions on Computer Systems (TOCS) 25, 4 (December 2007).
[53] VOGT, P., NENTWICH, F., JOVANOVIC, N., KIRDA, E., KRUEGEL, C., AND VIGNA, G. Cross-Site Scripting Prevention with Dynamic Data Tainting and Static Analysis. In Proc. of Network & Distributed System Security (2007).
[54] WANG, X., LI, Z., LI, N., AND CHOI, J. Y. PRECIP: Towards Practical and Retrofittable Confidential Information Protection. In Proceedings of 15th Network and Distributed System Security Symposium (NDSS) (2008).
[55] WhatApp. http://www.whatapp.org. Accessed April 2010.
[56] XU, W., BHATKAR, S., AND SEKAR, R. Taint-Enhanced Policy Enforcement: A Practical Approach to Defeat a Wide Range of Attacks. In Proceedings of the USENIX Security Symposium (August 2006), pp. 121–136.
[57] YIN, H., SONG, D., EGELE, M., KRUEGEL, C., AND KIRDA, E. Panorama: Capturing System-wide Information Flow for Malware Detection and Analysis. In Proceedings of ACM Computer and Communications Security (2007).
[58] YIP, A., WANG, X., ZELDOVICH, N., AND KAASHOEK, M. F. Improving Application Security with Data Flow Assertions. In Proceedings of the ACM Symposium on Operating Systems Principles (Oct. 2009).
[59] YUMEREFENDI, A. R., MICKLE, B., AND COX, L. P. TightLip: Keeping Applications from Spilling the Beans. In Proceedings of the 4th USENIX Symposium on Network Systems Design & Implementation (NSDI) (2007).
[60] ZELDOVICH, N., BOYD-WICKIZER, S., KOHLER, E., AND MAZIÈRES, D. Making Information Flow Explicit in HiStar. In Proceedings of the 7th symposium on Operating Systems Design and Implementation (OSDI) (2006).
[61] ZHU, D., JUNG, J., SONG, D., KOHNO, T., AND WETHERALL, D. Privacy Scope: A Precise Information Flow Tracking System For Finding Application Leaks. Tech. Rep. EECS-2009-145, Department of Computer Science, UC Berkeley, 2009.
Notes
1 A similar approach can be applied to just-in-time compilation by inserting tracking code within the generated binary.
2 Only 11 internal VM methods were added between versions 1.5
Section 8, we disabled the IMSI taint source for experiments. Nonetheless, TaintDroid’s flag of the ICC-ID and the phone number led us to find the IMSI contained in the same payload.
5 Regardless of the string separation, the MCC and MNC are identifiers that warrant taint sources.