
2014 IEEE 38th Annual International Computers, Software and Applications Conference

A Forensic Analysis of Android Malware


How is Malware Written and How it Could be Detected?

Kevin Allix, Quentin Jerome, Tegawendé F. Bissyandé, Jacques Klein, Radu State and Yves Le Traon
Interdisciplinary Centre for Security, Reliability and Trust, University of Luxembourg
Luxembourg
{firstname.lastname}@uni.lu

Abstract—We consider in this paper the analysis of a large set of malware and benign applications from the Android ecosystem. Although a large body of research work has dealt with Android malware over the last years, none has addressed it from a forensic point of view. After collecting over 500 000 applications from user markets and research repositories, we perform an analysis that yields precious insights on the writing process of Android malware. This study also explores some strange artifacts in the datasets, and the divergent capabilities of state-of-the-art antivirus products to recognize/define malware. We further highlight major weaknesses in the use and understanding of Android security by the criminal community and show some patterns in their operational flow. Finally, using insights from this analysis, we build a naive malware detection scheme that could complement existing antivirus software.

Keywords-Android Security, Digital Forensics, Malware Analysis, Malware development

I. INTRODUCTION

Android has grown in a few years to become the most widely used smartphone operating system [6]. With more and more users relying on Android-enabled handheld devices, and able to install third-party applications from official and alternative markets, the security of both devices and the underlying network becomes an essential concern for both the end user and their service provider. In recent years, practitioners and researchers have witnessed the emergence of a variety of Android malware. The associated threats range from simple user tracking and disclosure of personal information to advanced fraud and premium-rate SMS services subscription, or even unwarranted involvement in botnets. Although most users are nowadays aware that personal computers can and will be attacked by malware, very few realize that their smartphone is prone to an equally dangerous threat.

To assess the threat of software downloaded from the internet, discerning users rely on scan results yielded by antivirus products. Unfortunately, each antivirus vendor has its secret recipe on how/why it decides to assign a malware label to a given application. Thus, an application can be assessed differently by distinct antivirus products, leading to damaging confusion. Indeed, both practitioners and researchers heavily rely on antivirus products, whether to trust apps or to build the ground truths for assessment tasks.

For the purpose of our study, we have collected a large and up-to-date dataset of hundreds of thousands of Android applications from markets and repositories. We have then scanned each of these applications using about 45 antivirus products generously hosted by VirusTotal to assess whether they are labelled as malware or not. This effort was made to obtain a clear view of the business of malware writing and some insights into the evolution of malware and its detection by antivirus products. We also take this opportunity to investigate, indirectly, how skilled malware creators are.

Several research studies have investigated Android malware [1], [8], [23], [24]. Most of these academic works are however about using advanced code analysis and data mining techniques to study applications. Thus, there are scarce reports on the actual artefacts that a typical incident responder would rely on in practice. Our study aims at filling this gap by performing such an analysis and reporting our findings based on a large dataset. The main contributions of this paper are:

• We provide extensive evidence that malware labelling is not a precise science. Applications are flagged or not depending on the antivirus product;
• We show that most malware writers basically copy/paste code from fellow developers and from public tutorials/samples from the Web;
• We find that malware writing is almost a regular business, with work cycles that follow a five-day working week;
• We also highlight that almost all malware writers are unable to use digital certificates properly;
• Finally, we propose roadmaps for basic detection of malware that has not yet been detected by antivirus products.

The paper is structured as follows: Section II provides detailed information on the construction of our dataset of Android applications, and on the labelling process to categorize benign and malware applications. Section III depicts the main findings of our experimental analysis. We also provide a discussion to guide this analysis. We discuss related work in Section V. Section VI concludes the paper and outlines future work.

0730-3157/14 $31.00 © 2014 IEEE
DOI 10.1109/COMPSAC.2014.61
II. PRELIMINARIES

An investigation into the business of malware requires a significant dataset representing real-world applications. We have built our dataset by collecting applications from markets, i.e., the online stores where developers distribute their applications to end-users. Indeed, although Google, the main developer of the Android software stack, operates an official market named Google Play, the policy of Google makes it possible for Android users to download and install apps from any other alternative market.

Alternative markets are often created to distribute a specific selection of applications. For example, some of these markets may focus on a specific geographical area, e.g., Russia or China, providing users with apps in their local languages. Other markets focus exclusively on free software, and at least one market is known to be dedicated to adult content. Users may also directly share apps, either in close circles, or with application bundles released through BitTorrent. Such apps are often distributed by other users who have paid for them in non-free markets. Finally, we have included in our datasets apps that have been collected by others to construct research repositories.

In the remainder of this section, we provide details on the different sources of our dataset, on the scanning process that was used to label each application as malware or benign, and on the artifacts that we have extracted from application packages to perform our study.

A. Dataset sources

We have developed specialized crawlers for several market places to automatically browse their content, find Android applications that could be retrieved for free, and download them into our repository. In this step we have found that several market owners took various steps to prevent their market from being automatically mined. Thus, for two of these markets, we cannot guarantee that we have retrieved their whole content. However, to the best of our knowledge, the total number of apps that we have collected constitutes the largest dataset of Android apps ever used in research studies.

Google Play¹: The official market of Android is a web-site that allows users to browse its content through a web browser. Applications cannot however be downloaded through the web browser as any other file would be. Instead, Google provides an Android application² that uses a proprietary protocol to communicate with Google Play servers. Furthermore, no application, not even a free one, can be downloaded from Google Play without a valid Google account. Both issues were overcome by using open-source implementations of the proprietary protocol and by creating free Google accounts. The remaining constraint was time, as Google also enforces strict account-level rate-limiting. Indeed, a given account is not allowed to download more than a given quantity of applications in a given time frame.

AppChina³: This market is by far the largest alternative market of our dataset. At the time of collection, AppChina was enforcing drastic scraping protections such as a 1Mb/s bandwidth limitation and a several-hour ban when more than one connection to the service was used simultaneously.

Anzhi⁴: The Anzhi market is operated from China and for the Chinese Android user base. It stores and distributes apps that are written in the Chinese languages, and applies a less strict screening policy than, e.g., Google Play.

Slideme⁵: Operated from the United States of America, this alternative market is a direct competitor of Google Play: it provides both free and paid apps for the Android platform.

FreewareLovers⁶: A market run by a German company, FreewareLovers provides freeware for every major mobile platform, including Android. A big advantage of FreewareLovers is that it does not require any specific application and can be used with any web browser.

ProAndroid⁷: Operated from Russia, the ProAndroid market is the smallest market that we crawled. It distributes free apps only.

F-Droid⁸: This repository of free and open-source software on the Android platform also provides a number of apps that users can download and install on their devices.

1mobile⁹: This market offers free Android apps for direct download. It is a very large market that lets users browse and retrieve thousands of apps.

In addition to market places, we also looked into other distribution channels to collect applications that are shared in bundles.

Torrents: We have collected a small set of apps which were made available through BitTorrent. We note that such applications are usually distributed without their authors' consent, and often include paid apps. Nevertheless, when considering the number of leechers, we were able to notice that such collections of Android applications appear to attract a significant number of user downloads, increasing the interest in investigating malware distributed through such channels.

Genome¹⁰: Zhou et al. [25] have collected Android malware samples and gave the research community access to their dataset. This dataset is divided into families, each containing malware samples that are closely related to each other.

¹ http://play.google.com (previously known as Google Market)
² Also named Google Play
³ http://www.appchina.com
⁴ http://www.anzhi.com
⁵ http://slideme.org
⁶ http://www.freewarelovers.com
⁷ http://proandroid.net
⁸ http://f-droid.org/
⁹ market.1mobile.com
¹⁰ http://www.malgenomeproject.org/

Table I summarizes the number of applications collected from each market used to build our dataset. The largest share of applications is from the official Android market, Google Play. Using the SHA256 hash function on applications, we noticed that several thousand applications are found in more
than one market. Hence, the total number of unique apps in Table I is less than the sum of unique applications in each market.

Table I
ORIGIN OF THE ANDROID APPS IN OUR DATASET

Marketplace        # of Android apps    Percentage
Google Play                  325 214        54.73%
appchina                     125 248        21.13%
anzhi                         76 414        12.86%
1mobile                       57 506         9.68%
slideme                       27 274         4.68%
torrents                       5 294         0.89%
freewarelovers                 4 145         0.70%
proandroid                     3 683         0.62%
fdroid                         2 023         0.34%
genome                         1 247         0.21%
apk_bang                         363         0.06%
Total                        594 000    Unique apps

B. Artifacts of study

To perform our study we have mined information from the application packages, focusing on two metadata artifacts in Android package files.

Packaging dates: An Android application is distributed as an .apk file, which is actually a ZIP archive containing all the resources an application needs to run, such as the application binary code and images. An interesting side-effect of this package format is that all the files that make up an application go from the developer's computer to end-users' devices without any modification. In particular, all metadata of the files contained in the .apk package, such as the last modification date, are preserved.

All bytecode, representing the application binary code, is assembled into a classes.dex file that is produced at packaging time. Thus the last modification date of this file represents the packaging time. In the remainder of this paper, packaging date will refer to this date.

Certificate Metadata: In the Android platform, a first security measure was made mandatory to guarantee that the authenticity of each application can be traced back to its creator. Thus, all Android applications must be signed with a cryptographic certificate. Certificates are included in the app package to allow end-users to verify the package's signature. For each application from our dataset, we have collected the certificates and analyzed their attributes, including owner and issuer, as described by the X.509 standard [18].

C. Malware Labelling

Over the course of several months, while collecting the dataset, we analyzed the applications with antivirus products actually used in the software market. For our study, we have relied on VirusTotal¹¹, a web portal that hosts about 40 products from renowned antivirus vendors, including McAfee, Symantec or Avast. We have sent all applications from our dataset to VirusTotal and collected the scan results for analysis and correlation studies.

¹¹ http://www.virustotal.com

D. Test of Statistical Significance

Our forensic analysis is based on a sample of Android applications. Although, to the best of our knowledge, no related study involving Android malware has ever exploited that many applications, there is a need to ensure, for some of our findings, that they are significant. To this end, we resort to the common metric of the Mann-Whitney-Wilcoxon (MWW) test.

The MWW test is a non-parametric statistical hypothesis test that assesses the statistical significance of the difference between the distributions in two datasets [19]. We adopt this test as it does not assume any specific distribution, a suitable property for our experimental setting. Once the Mann-Whitney U value is computed, it is used to determine the p-value. Given a significance level α = 0.001, if p-value < α, then the test rejects the null hypothesis, implying that the two datasets have different distributions at the significance level of α = 0.001: there is one chance in a thousand that this is due to a coincidence.

III. ANALYSIS

In this section, we describe and interpret the results of our findings on how malware is written, in comparison with benign applications, and how antivirus products perform in detecting it.

A. Malware identification by antivirus products

Malware identification by antivirus products is critical to practitioners and researchers alike. Indeed, antivirus products remain the most trusted means to flag an application as malware. Traditionally, the common detection scheme of antivirus products is signature-based. Thus, to identify malware statically, antivirus software compares the contents of application files to its secret dictionary of virus signatures. This approach can be very effective, but can only help identify malware for which samples have already been obtained and associated signatures created. Some antivirus products add heuristics to their process in order to identify new malware or variants of known malware.

In Figure 1, we see that most of our data sources contain Android applications that are flagged as malware by at least 1 antivirus product hosted by VirusTotal. Even Google Play, where each application goes through the Bouncer¹², shows a malware rate of 22%. This malware is often in the form of adware, i.e., applications that continuously display undesired advertisement during use. Anzhi and AppChina include the largest share of flagged applications. Every malware sample from the Genome dataset is indeed flagged by at least one antivirus product.

¹² Google's in-house environment for screening malware
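The packaging date defined in Section II-B can be read directly from a package, since an .apk is an ordinary ZIP archive. The following is a minimal Python sketch of that idea and not the authors' tooling: the function names `packaging_date` and `build_demo_apk` are illustrative, and the demo archive is a synthetic stand-in for a real application. It relies only on the standard `zipfile` module.

```python
import io
import zipfile
from datetime import datetime
from typing import BinaryIO, Optional, Union


def packaging_date(apk: Union[str, BinaryIO]) -> Optional[datetime]:
    """Return the last-modification time stored for classes.dex, or None."""
    with zipfile.ZipFile(apk) as archive:
        try:
            info = archive.getinfo("classes.dex")
        except KeyError:
            return None  # the archive holds no classes.dex entry
        # ZipInfo.date_time is (year, month, day, hour, minute, second).
        return datetime(*info.date_time)


def build_demo_apk() -> io.BytesIO:
    """Build an in-memory ZIP that mimics the layout of an .apk (demo only)."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        # Hypothetical entry with a fixed modification time.
        entry = zipfile.ZipInfo("classes.dex", date_time=(2012, 1, 7, 14, 25, 6))
        archive.writestr(entry, b"placeholder bytecode")
    buffer.seek(0)
    return buffer


if __name__ == "__main__":
    print(packaging_date(build_demo_apk()))
```

One caveat when interpreting such dates: the classic ZIP header records local time without a time zone and with a two-second granularity, so timestamps from different machines are comparable only approximately.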

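The decision rule of Section II-D (reject the null hypothesis when the p-value is below α = 0.001) can be illustrated with a small pure-Python version of the test. This is a sketch under stated assumptions: it uses the normal approximation of U, which suits large samples, applies the usual tie correction, and omits the continuity correction. The paper does not say which implementation or variant the authors used, and the sample values below are invented.

```python
import math
from typing import Sequence, Tuple

ALPHA = 0.001  # significance level quoted in Section II-D


def mann_whitney_u(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Return (U, two-sided p-value) using the normal approximation."""
    n1, n2 = len(x), len(y)
    if n1 == 0 or n2 == 0:
        raise ValueError("both samples must be non-empty")
    # Pool the samples, remembering which one each value came from.
    pooled = sorted([(value, 0) for value in x] + [(value, 1) for value in y])
    n = n1 + n2
    rank_sum_x = 0.0
    tie_term = 0.0
    start = 0
    while start < n:
        end = start
        while end < n and pooled[end][0] == pooled[start][0]:
            end += 1
        # Tied values share the average of the ranks start+1 .. end.
        average_rank = (start + 1 + end) / 2.0
        group = end - start
        tie_term += group ** 3 - group
        in_x = sum(1 for k in range(start, end) if pooled[k][1] == 0)
        rank_sum_x += average_rank * in_x
        start = end
    u_x = rank_sum_x - n1 * (n1 + 1) / 2.0
    u = min(u_x, n1 * n2 - u_x)
    mean_u = n1 * n2 / 2.0
    variance = n1 * n2 / 12.0 * ((n + 1) - tie_term / (n * (n - 1)))
    if variance <= 0:
        return u, 1.0  # every value is tied: no evidence of a difference
    z = (u - mean_u) / math.sqrt(variance)
    return u, math.erfc(abs(z) / math.sqrt(2.0))


if __name__ == "__main__":
    sample_a = list(range(1, 31))   # invented values
    sample_b = list(range(31, 61))  # invented values, shifted upwards
    u_value, p_value = mann_whitney_u(sample_a, sample_b)
    print(u_value, p_value, "reject H0" if p_value < ALPHA else "keep H0")
```

For small samples, exact tables or a library routine with an exact method would be the safer choice; the approximation here is only meant to show how U leads to a p-value.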
[Figure 1: pie charts of the Malware and Goodware shares in each dataset (GooglePlay, AppChina, Anzhi, 1mobile, Slideme, ProAndroid, Genome)]
Figure 1. Share of Malware in Datasets: Applications are flagged by at least 1 antivirus product

[Figure 2: pie charts of the Malware and Goodware shares in each dataset (GooglePlay, AppChina, Anzhi, 1mobile, Slideme, ProAndroid, Genome)]
Figure 2. Share of Malware in Datasets: Applications are flagged by at least 10 antivirus products

[Figure 3: pie charts of the Malware and Goodware shares in each dataset (GooglePlay, AppChina, Anzhi, 1mobile, Slideme, ProAndroid, Genome)]
Figure 3. Share of Malware in Datasets: Applications are flagged by at least 26 antivirus products
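Figures 1 to 3 differ only in the number of antivirus products that must flag an application before it counts as malware. The sketch below shows how such a threshold turns per-product verdicts into a malware share. It is an illustration only: `DEMO_SCANS`, the app identifiers and the verdicts are invented, and the actual VirusTotal report format is not modelled.

```python
from typing import Dict, Iterable, List


def is_malware(verdicts: Iterable[bool], threshold: int) -> bool:
    """True when at least `threshold` products flagged the application."""
    return sum(1 for flagged in verdicts if flagged) >= threshold


def malware_share(scans: Dict[str, List[bool]], threshold: int) -> float:
    """Fraction of applications labelled malware at the given threshold."""
    if not scans:
        return 0.0
    labelled = sum(1 for verdicts in scans.values() if is_malware(verdicts, threshold))
    return labelled / len(scans)


# Invented scan results: one boolean per (hypothetical) antivirus product.
DEMO_SCANS: Dict[str, List[bool]] = {
    "app_a": [True] * 1 + [False] * 39,
    "app_b": [True] * 12 + [False] * 28,
    "app_c": [True] * 30 + [False] * 10,
    "app_d": [False] * 40,
}

if __name__ == "__main__":
    for threshold in (1, 10, 26):
        print(threshold, malware_share(DEMO_SCANS, threshold))
```

Raising the threshold can only shrink the set of labelled applications, which is the monotone drop visible across the three figures.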

Malware shares depicted in Figure 2 indicate that antivirus products yield divergent scanning results. Indeed, if we require that an application be tagged as malware only if at least 10 antivirus products have found it suspicious, then the malware rate drops significantly for all our data sources. Google Play now only contains 2% of malware, while all Genome samples are still identified as true malware.

The Genome dataset being a reliable source of known malware, we raise the threshold of antivirus products until some of the applications in the dataset are missed in the scanning process. Figure 3 provides the malware shares when at least 26 antivirus products, out of more than 40, are required to flag an application before it is considered malware.

No single antivirus product identifies all existing malware. Only a small subset of widely known malware is recognized by a large number of antivirus products.

B. Android Malware Production

We proceed to investigate the production of Android malware to draw insights. The analysis of packaging dates of Android applications yields some distinct patterns. In Figure 4, we plot the packaging date, subdivided by hour, for benign applications and for all applications flagged by at least one antivirus product. Despite the potential noise due to the threshold set by each antivirus product to tag malware, we note a pattern in the compilation dates: there are many more peaks of malware packaging. This suggests that malware is often compiled in batches, while the compilation of benign applications is more spread over time.

To further investigate and strengthen the validity of our finding, we consider the samples of confirmed malware from known families exposed in the Genome dataset, and consider all other applications from our datasets as benign. This process is valid when considering a very strict threshold where an application is labelled as malware if at least half, i.e., 22, of the antivirus products from VirusTotal flag it. Figure 5 thus confirms more strongly that Android malware is compiled in batches. The 1258 malware samples of the Genome dataset have been packaged on only 244 different days. 51 malware samples were packaged on 2011-09-21 alone, representing 16% of all Android apps packaged on this day. Only 72 malware samples were each packaged alone, on a day when
no other malware was packaged. We counted 78 cases where at least two malware samples were packaged in the same second. In 15 instances, four or more malware samples were packaged in the same second; two of those instances saw ten or more new malware samples being packaged. Such a strong time locality suggests that malware writers have set up tools to automate the malware packaging process. One single certificate (md5: 264BF7D71E0EDC4FCB8A9A16AB7C3357) was even used to sign 781 apps detected as malware by at least one antivirus product in the same second (2012-01-07 14:25:06).

[Figure 4: number of malware and benign applications packaged over time; x-axis: packaging date from 2012-01-01 to 2012-06-01, y-axis: number of applications from 0 to 1000]
Figure 4. Number of benign and malware packaged between 01 January 2012 and 01 June 2012

[Figure 5: total number of apps and number of malware in the Genome dataset packaged per day; x-axis: packaging date from 2011-06-02 to 2011-06-16, y-axis: number of applications from 0 to 400]
Figure 5. Number of packaged application and of packaged malware over time: Focus on period 2011-06-01 to 2011-06-17

Malware development is often a standardized process that aims at producing a large number of malware samples at once. Aside from rare cases of target-specialized malware, malware is built in bulk, in the form of slightly different applications.

C. Business of Malware Writing

We now look into the process of malware writing. We focus our analysis on the cycles in which malware appears by clustering Android applications based on the week day during which they were packaged. For this experiment, we only consider malware that was detected by at least half of the antivirus software operated by VirusTotal.

Table II highlights the distribution of app packaging dates for both benign applications and malware across week days. The percentage of apps packaged during business days is actually similar for malware and benign applications. A test of significance with the MWW test further confirms that the statistical difference is near null.

Table II
DISTRIBUTION OF ANDROID PACKAGING DATES ACROSS WEEK DAYS

                          Monday           Tuesday          Wednesday        Thursday         Friday           Saturday         Sunday
Benign Apps               56,476 15.75%    57,728 16.10%    58,078 16.20%    58,995 16.45%    58,926 16.43%    34,223  9.54%    34,182  9.53%
Malware (Threshold=24)       236 14.68%       376 23.38%       284 17.66%       276 17.16%       225 13.99%        90  5.60%       121  7.52%
Malware (Threshold=25)       211 14.46%       342 23.44%       265 18.16%       254 17.41%       205 14.05%        81  5.55%       101  6.92%
Malware (Threshold=26)       200 14.60%       328 23.94%       249 18.18%       234 17.08%       190 13.87%        74  5.40%        95  6.93%

On average, 19% of benign applications are packaged during weekends, while this is the case for only 13% of malware. We further use the MWW test to confirm that the difference between weekdays and weekend days is statistically more significant for malware than for benign applications. There is thus a clear pattern of a five-day working week. A possible explanation for this pattern could be that malware writing is performed by some developers during their regular office hours while working for their employer. A second reason might be that malware writers follow a standard work schedule and do not work during weekends, thus suggesting an industrial process in the building of malware rather than a spare-time hobby.

There appears to be evidence that the business of malware writing, or at least its proliferation, is at an industrial scale.

D. Digital Certificates

Android applications rely on digital certificates to build a trust model between developers and end users. Applications signed using the same certificate can share information and data at runtime (if allowed by explicit permissions). Certificates also make it possible to link a set of applications with their developer, although this linking does not ensure that the identity of a developer is certified. Indeed, certificates can be self-signed, rather than signed by a competent trustworthy authority, and therefore do not necessarily lead to the real developer. However, finding the same certificate (serial number, fingerprint, issuer and owner) in several

applications is a strong indicator of either a unique origin, or of advanced certificate stealing and reuse.

Our analysis of certificates aims at understanding the practice of certificate use by malware writers. We first note, based on our datasets, that self-signed certificates are the norm rather than the exception for Android developers. Of the 165 542 certificates in our dataset, only 51 are not self-signed. Self-signed certificates were used to sign 99.88% of the apps in our dataset. Consequently, most certificates carry no information that could be trusted about the identity of the application developer.

Our findings apply particularly to malware development. We focus our study on the subset of malware in the Genome dataset. These are well established malware samples that most antivirus products can identify. For instance, the certificate that holds the serial number E6EFD52A17E0DCE7 was used in at least two different malware applications. Manually searching for the Issuer-related fields¹³ leads to the blog of a well known Android developer. One entry of this blog addressed the issue of signing Android applications. After reading this entry, we found out that the writers of this malware just copy/pasted the command in the posted example without any effort to change the basic information that indicates what a certificate is supposed to certify.

¹³ Issuer: C=ID, ST=Jawa Barat, L=Bandung, O=Londatiga, OU=AndroidDev, CN=Lorensius W. L. T/[email protected]

We have further investigated this copy/paste strategy and found that it occurs too often. Thus, although a certificate issued to Android Debug can be used to develop and test an application, the release version cannot be published with such a certificate. This basic rule is stated in almost every online tutorial and Android textbook. Yet, we identified more than 50 well-known malware samples which use such a certificate: this questions the competency, or may highlight the laziness, of malware writers, as in a day-to-day job.

Finally, our manual investigation into the attributes of certificates in malware reveals that, sometimes, malware writers brag or use obviously offensive names. For instance, a certificate whose owner is named PhoneSniper appears in at least 281 different malware samples. If users were able to carefully inspect certificates before installation, such malware would have propagated less. Similarly, this information could be used with natural language processing techniques to silently filter some malware in application markets.

The vast majority of Android apps in our datasets are signed with a certificate that was used to sign very few other applications. Indeed, we have found that 95% of the certificates signed fewer than 10 apps. However, for the remainder of certificates that were used in large numbers of applications, different patterns emerge.

Table III summarizes the top certificates used in malware packages. Once again, we consider as malware all applications that were flagged as suspicious by at least half of the antivirus products in VirusTotal. The numbers distinctly provide evidence that a mass development and deployment of benign and malware Android apps was put in place. For instance, three certificates were each used for more than 160 malware samples. The certificate most used by malware is also used by over 4 623 benign applications: a realistic hypothesis to support this fact would be that the private key was somehow leaked, leading many otherwise unrelated writers to use and share the same certificate.

We further consider the overlap between benign and malicious applications that share the same certificate. In Table IV, we indicate the top certificates that are used by both malware and goodware. We note that there is a clear overlap showing the usage of certificates for both malicious and benign applications. A number of explanations can be provided for this phenomenon:

• Dr Jekyll and Mr Hyde syndrome. Developers use the same development tools and environment for both legitimate and malicious applications. This observation supports the five-working-day behavior shown in Table II. This means that developers write malware during their regular working hours.
• Reputation biasing. In this hypothesis, a developer might increase her/his reputation by developing benign applications. As soon as enough positive reviews have been obtained, subsequent malware might be more easily downloaded and installed. For instance, the certificate with the serial 4DFF5300 has been observed signing both a malicious and a non-malicious application on 2011-08-30, at the very same time: 21:52:38. Overall, 1 benign application and 176 malware samples are associated with this certificate. The most recent application in our dataset using this certificate was packaged on 2012-03-11 17:19:54, while its first usage can be traced back to 2011-07-14 21:45:12.
• Antivirus false negatives. Probably, some of the applications tagged as benign are in fact malicious. It is possible that existing tools have not yet detected them as malicious, due to better obfuscation and stealthier behavior.
Table III
TOP 20 CERTIFICATES WHICH WERE USED TO SIGN THE MOST MALWARE

Certificate MD5   Number of Benign   Number of Malware   Certificate Issuer & Owner
E8...87                      4 623                 192   C=US, ST=California, L=Mountain View, O=Android, OU=Android, CN=Android/[email protected]
E5...3F                          0                 167   C=keji0003
CF...26                          1                 166   C=cn, ST=shenzhen, L=china, O=Phone, OU=Phone, CN=PhoneSniper
50...BA                          0                  98   C=kejikeji, ST=kejikeji, L=kejikeji, O=kejikeji, OU=kejikeji, CN=kejikeji
E5...C2                          0                  95   C=US, OU=Google Inc.
8B...D2                          0                  52   CN=Fujian Kaimo Network Tech
3C...3E                          0                  29   C=a, ST=a, L=a, O=a, OU=a, CN=a
AC...A7                          1                  21   CN=Sexy
C4...2B                          0                  20   C=CA, ST=Ontario, L=Toronto, O=Typ3 Studios, OU=Typ3 Studios, CN=Typ3 Studios
CF...6C                          0                  19   C=0
1D...07                          6                  17   C=CN, ST=Sichuan, L=Chengdu, O=jiemai-tech, OU=jiemai-tech, CN=Jiemai Technology
77...F3                          8                  17   CN=alan
B1...A4                          0                  17   OU=Safe System Inc., CN=Safe System Inc.
74...50                          0                  16   C=cn, ST=guangdong, L=shenzhen, O=hynoo, OU=hynoo, CN=wang
21...37                          2                  15   C=cn, ST=fujian, L=xiamen, O=guopai, OU=guopai, CN=jtwang
76...A8                          1                  14   C=CN, CN=picshow1
AC...94                          0                  13   C=86, ST=BeiJing, L=BeiJing, O=Gold Dream Studio, OU=Gold Dream Studio, CN=Hong Fu
73...A3                          0                  12   C=001, ST=US, L=LSA, O=www.android.com, OU=www.android.com, CN=Android
C6...1B                          0                  12   C=86, ST=SH, L=CN, O=MJ, OU=MJ, CN=MJ
E7...AE                         34                  12   C=0086, ST=Beijing, L=Beijing, O=Gall me, OU=Android, CN=Gall me
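Rankings such as those in Tables III and IV can be obtained by grouping applications by signing certificate and counting benign and malware labels per group. The sketch below assumes the input is a sequence of (certificate fingerprint, is-malware) pairs; the function names and the fingerprints "aa", "bb" and "cc" are invented for illustration and do not come from the dataset.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, Tuple

Row = Tuple[str, int, int]  # (fingerprint, number of benign, number of malware)


def usage_per_certificate(apps: Iterable[Tuple[str, bool]]) -> Dict[str, Counter]:
    """Count benign and malware applications signed by each certificate."""
    usage: Dict[str, Counter] = defaultdict(Counter)
    for fingerprint, malicious in apps:
        usage[fingerprint]["malware" if malicious else "benign"] += 1
    return dict(usage)


def top_malware_certificates(usage: Dict[str, Counter], limit: int = 20) -> List[Row]:
    """Certificates that signed malware, most malware first (Table III style)."""
    rows = [
        (fingerprint, counts["benign"], counts["malware"])
        for fingerprint, counts in usage.items()
        if counts["malware"] > 0
    ]
    rows.sort(key=lambda row: (-row[2], row[0]))
    return rows[:limit]


def shared_certificates(rows: List[Row]) -> List[Row]:
    """Keep only certificates that also signed benign apps (Table IV style)."""
    return [row for row in rows if row[1] > 0]


if __name__ == "__main__":
    # Invented sample: "aa" signs 2 benign and 3 malware apps, and so on.
    demo = [("aa", True)] * 3 + [("aa", False)] * 2 + [("bb", True)] + [("cc", False)] * 4
    ranking = top_malware_certificates(usage_per_certificate(demo))
    print(ranking)
    print(shared_certificates(ranking))
```

Ties in the malware count are broken by fingerprint only to keep the output deterministic; the paper does not state how ties were ordered.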

Table IV
T OP 15 CERTIFICATES WHICH WERE USED TO SIGN MANY MALWARE AND WHICH SIGNED BENIGN APPS AS WELL
Certificate MD5 Number of Benign Number of Malware Certificate Issuer & Owner
C=US, ST=California, L=Mountain View, O=Android, OU=Android,
E8. . . 87 4 623 192
CN=Android/[email protected]
1D. . . 07 6 17 C=CN, ST=Sichuan, L=Chengdu, O=jiemai-tech, OU=jiemai-tech, CN=Jiemai Technology
77. . . F3 8 17 CN=alan
21. . . 37 2 15 C=cn, ST=fujian, L=xiamen, O=guopai, OU=guopai, CN=jtwang
E7. . . AE 34 12 C=0086, ST=Beijing, L=Beijing, O=Gall me, OU=Android, CN=Gall me
C=US, ST=California, L=Mountain View, O=Android, OU=Android,
8D. . . F9 92 10
CN=Android/[email protected]
DE. . . 92 3 9 C=CN, ST=Guangdong, L=Guangzhou, O=synkay, OU=sunkay, CN=sunkay
C7. . . 80 56 8 C=US, ST=Fl, L=Miami, O=Gp Imports, OU=Gp Imports, CN=Gp Imports
69. . . A5 87 7 C=CN, ST=beijing, L=beijing, O=Wali, OU=Wali, CN=Lee
34. . . F5 2 6 C=KR, ST=South Korea, L=Suwon City, O=Samsung Corporation, OU=DMC, CN=Samsung Cert/[email protected]
3D. . . 10 6 4 CN=Ngan Viet Dung
BA. . . 26 48 3 C=CN, ST=Zhejiang, L=Hangzhou, O=Feelingtouch, OU=Feelingtouch, CN=Feelingtouch
82. . . C5 2 3 C=86, ST=china, L=ysler, O=ysler, OU=ysler, CN=ysler.com
59. . . EE 178 2 C=86, ST=Guangdong, L=Guangzhou, O=3g.cn, OU=GAU, CN=Jarod Yv
51. . . B3 7 2 C=CN, ST=ShenZhen, L=ShenZhen, O=nmting.com, OU=nmting.com, CN=Ale Zhao
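
Certificates like those in Table IV are the ones on which a purely certificate-based detector will err, since they sign both benign apps and malware. As a hedged illustration of the tagging scheme the paper evaluates in Section IV-B (flag certificates seen on malware packaged before 01/Jan/2013, then tag later apps signed with them), here is a minimal Python sketch. The record layout (`cert`, `packaged`, `malware`), the function names, and the toy records are hypothetical; only the cutoff date and the Table V counts come from the paper.

```python
from datetime import datetime

CUTOFF = datetime(2013, 1, 1)  # split date used in the paper's experiment


def flagged_certificates(apps, cutoff=CUTOFF):
    """Certificates that signed known malware packaged before the cutoff."""
    return {a["cert"] for a in apps if a["packaged"] < cutoff and a["malware"]}


def evaluate(apps, cutoff=CUTOFF):
    """Tag later apps signed with a flagged certificate.

    Returns (true_positives, false_positives) among the tagged apps.
    """
    flagged = flagged_certificates(apps, cutoff)
    tagged = [a for a in apps if a["packaged"] >= cutoff and a["cert"] in flagged]
    true_pos = sum(1 for a in tagged if a["malware"])
    false_pos = len(tagged) - true_pos
    return true_pos, false_pos


def precision(true_pos, false_pos):
    """Fraction of tagged apps that are actual malware."""
    return true_pos / (true_pos + false_pos)


# Hypothetical toy records: certificate "AA" is flagged by its 2012 malware.
apps = [
    {"cert": "AA", "packaged": datetime(2012, 6, 1), "malware": True},
    {"cert": "AA", "packaged": datetime(2013, 3, 1), "malware": True},
    {"cert": "AA", "packaged": datetime(2013, 4, 1), "malware": False},
    {"cert": "BB", "packaged": datetime(2012, 7, 1), "malware": False},
    {"cert": "BB", "packaged": datetime(2013, 5, 1), "malware": True},
]
toy_result = evaluate(apps)

# Precision recomputed from the counts reported in Table V:
# 11 460 malware tagged, 2 166 benign apps tagged.
paper_precision = precision(11460, 2166)
```

With the Table V counts, `paper_precision` is 11460 / (11460 + 2166), which rounds to the 84% quoted in the paper. The toy run also shows the failure mode: the benign app signed with "AA" is tagged, while the malware signed with the never-flagged "BB" is missed.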

• Anti virus false positives. Anti virus products can also wrongly flag a benign application
as malware. For instance, the digital certificate whose MD5 is
75BDB3531C04EB8246846532A7AE2050 has been observed to sign 2 844 applications in total,
only one (1) of which is tagged as malicious. One could suspect that the certificate was
stolen, but using a stolen certificate for a single malicious application does not really
make sense. More probable is the hypothesis that the single malicious application is a
false positive. We have also correlated this information with the time-line of the
packaging dates for this certificate. The single malicious application was packaged on
2013-11-15 19:16:04; on this very same day, this certificate signed 55 other apps that are
all undetected by anti virus products. The usage pattern for this certificate shows very
frequent application signing, often with just a few minutes between two apps, and the
application detected as malware exhibits no deviation from this pattern. Furthermore, it
would make sense for a developer who once wrote malware to create a new certificate, in
order to avoid having his/her future benign applications signed with a certificate that is
associated with malware.

Malware writers do not use digital certificates properly, and often reuse compromised keys
that were used to build certificates of benign applications.

IV. DISCUSSION

The forensic analysis that we have performed, and whose results were outlined in the
previous section, has yielded a number of insights for the research and practice of malware
detection. In this section, we summarize these insights and discuss how this empirical
study could be leveraged in our work on malware detection.

A. Summary of findings

On Anti virus software: Our large-scale analysis of hundreds of thousands of Android
applications with over
40 anti virus products has revealed that most malware are not simultaneously identified by
several anti virus products. Only a small subset of common malware is detected by most anti
virus software. This finding supports the idea that there is a need to invest in
alternative tools for malware detection, such as machine-learning based approaches, which
are promising to flag more malware variants.

On malware business: We have presented empirical evidence that malware were mass produced.
This raises a number of questions leading to hypotheses on how malware developers manage to
remain productive. The first hypothesis would be that malware is not written from scratch,
thus providing an opportunity to detect malware by discovering the piece of code that was
grafted to existing, potentially popular, apps.

B. Insights

Building a naive anti virus software: Exploring the rate of shared certificates within
malware, we were able to devise a naive malware detection mechanism based on the appearance
of a tagged certificate. In its simplest form, the scheme consists in tagging any
application as malicious if its signing key has already been observed for a confirmed
malicious application.

To assess this naive approach, we have considered that in a first phase we have manually
discovered all malware packaged before 01/Jan/2013 in our dataset. We consider for this
step only malware that are detected by at least half of the anti virus products. Then,
based on the certificates recorded for the found malware, we arbitrarily tag as malicious
all applications packaged after 01/Jan/2013 that are signed with any of the flagged
certificates. Table V provides the results for this experiment. We were able to build a
malware detector with a Precision of 84% (2,166 false positives out of 2,166 + 11,460
tagged). While we succeed in flagging almost 1 actual malware out of 10, we only wrongly
tag as malicious about 1 benign app in 100.

Table V
PERFORMANCE OF A NAIVE ANTI VIRUS SOFTWARE BASED ON CERTIFICATES
            Benign apps tagged   Malware tagged
Number      2 166                11 460
Percentage  1.19%                8.82%

At the minimum, the obtained results show that our naive approach could be used by anti
virus vendors to improve their recall, by being suspicious of more apps, and to improve
precision by trusting apps signed with certificates that have been used in a large number
of benign apps.

Localizing malware: Our findings on the potential mass production of malware could be
leveraged in an approach of malware localization. Indeed, simultaneous development and
packaging of malware suggests a redundant insertion of malware code in all applications.
Thus, a similarity measure of the bytecode could make it possible to isolate this code and
then locate it in other malware samples.

V. RELATED WORK

In this section, we enumerate a number of related works to emphasize the importance of
understanding the development of malware in order to devise efficient techniques for their
detection. These works span from empirical studies on datasets of malware to malware
detection schemes.

A. Malicious datasets analysis

Researchers have already shown interest in the analysis of malicious application datasets.
Felt et al. have analyzed several instances of malware deployed on various mobile platforms
such as iOS, Android and Symbian [15]. They detail the wide range of incentives for malware
writing, such as exfiltration of users’ personal information and credentials, ransom
attacks and the easiest way to profit from smartphone malware, premium-rate SMS services.
Their study is however qualitative, while we have focused on a quantitative study to draw
generalizable findings on common patterns.

The Genome dataset, our source of well-established malware, was built as part of a study by
Zhou et al. [25]. They expose in detail the features and incentives of the current malware
threat on Android. They also suggest that existing anti virus software still needs
improvement. Our analysis also comes to this conclusion when we demonstrate that most
malware cannot be found by all anti virus products.

In contrast to our lightweight forensic analysis approach, Enck et al. [13] did an in-depth
analysis of Android applications by using advanced static analysis methods. Doing so, they
were able to discover some risky features of the Android framework that could be used by
malicious applications. However, our approach allowed us to highlight interesting patterns
that could be leveraged more easily.

In recent work [1], Allix et al. have devised a sophisticated feature set to use in machine
learning-based malware detection for Android. This approach has however proved to be
resource-intensive, suggesting further investigations into more straightforward features.
The study detailed in this paper is part of the roadmap we have devised for our
investigations.

B. Dynamic analysis

Various solutions have been proposed to detect malicious Android applications. Crowdroid,
presented by Burguera et al. [8], performs dynamic analysis of Android applications by
first collecting system call patterns of applications, and then applying clustering
algorithms to discriminate benign from suspicious behaviors. Crowdroid strongly relies on
crowd sourcing for the collection of system call patterns.

Vidas and Christin have investigated applications from alternative markets and compared
them to applications from
the official market [22]. They have found that certain alternative markets almost
exclusively distribute repackaged applications containing malware. They have proceeded to
propose AppIntegrity to strengthen the authentication properties offered in application
marketplaces. Our findings are in line with theirs, as we note that malware seem to be mass
produced, and that the same certificates overlap between malware and benign applications.

C. Similarity and Heuristics based malware detection

In order to detect repackaged applications, which malware authors often produce to embed
their malicious payloads, Zhou et al. [23] presented DroidMOSS. Their approach consists in
building a signature of the whole application by using a fuzzy hashing technique on the
application’s opcodes. Then a similarity score is computed against all apps of a reference
dataset, and repackaging is reported if a similarity score is higher than a given
threshold.

DroidRanger, presented by Zhou et al. [24], tries to detect suspicious applications by
first performing a fast filtering step based on the permissions requested by an
application. It then analyzes the application code structure, as well as other properties
of applications. Finally, a heuristics-based detection engine is run with the data gathered
about applications. With this approach, the authors were able to find malware on the
official Android market, as well as two zero-day malware.

Regarding information leakage detection, Zhou et al. also proposed TISSA [26], allowing an
end-user to have fine-grained control of the access to her personal data.

D. On device mitigation

The topic of embedded mitigation solutions was covered by a wide range of previous works.

XManDroid [7] provides a mechanism capable of analyzing Inter Process Communications and
deciding if connections between applications are compliant with the system policy. This
fully dynamic solution addresses the problem of application level privilege escalation
introduced in [10].

DroidChecker [9] attempts to address the same issue by tracking permissions from the
manifest files until their utilization within the application. To achieve this, Chan et al.
proposed the use of control flow graphs and taint checking techniques.

Apex [20] proposes an extension to the Android permission manager allowing users to
customize permissions owned by applications.

Kirin [14] extends the package installer and analyzes permissions before installation. It
embeds security rules based on permission sets and can prevent a program from being
installed according to the permissions it requests. A similar approach has been presented
in [3].

Enck et al. introduced Taintdroid [12], which uses taint analysis techniques to detect
sensitive data leaks and warn the end-user by showing him/her relevant information. Data
leaks have also been studied in [17], where Hornyack et al. present a framework capable of
shadowing sensitive user data and of blocking outgoing connections that would leak data.

E. Miscellaneous approaches

An offensive framework was presented in [16] which embeds a broad range of available
Android exploits such as Rage Against The Cage, known to overflow the number of processes
allowed. In the wild, this exploit is used by various malware. The framework is able to run
arbitrary root exploits and to maintain privileges across reboots.

Rootkit possibilities on smartphones are exposed in [5], showing that smartphones are as
vulnerable as desktop computers. The most valuable incentive to deploy rootkits on
smartphones would be the interesting personal data such as voice communications and
location.

With Androguard14, Desnos et al. provide a tool to decompile Android applications and
perform code analyses [11]. Built on top of these features, Androguard also provides a way
to detect a large selection of malware, and to measure the similarity of two applications,
to detect repackaging for instance.

14 http://code.google.com/p/androguard/

Finally, concerning the detection of private data leaks, static analysis tools [4], [21],
including taint analysis [2], have been proposed to deal with the specificities of Android.

VI. CONCLUSION

The recent and steady rise of Android malware over the past four years has led to a rapidly
growing automation in the malware creation process. Due to the specific nature of
development of Android applications, important artifacts leak out and can provide some
insights about their creators. We have analyzed the available data through this
perspective. For our large-scale study, we have considered over 500,000 Android
applications, which included both benign applications and malware.

Packaging dates show substantial time localization behavior. Waves of packaging can be
observed, thus shedding a new light on the malware creation process. Digital certificates,
albeit self-signed, also provide valuable pieces of information. We have observed huge
quantities of malware sharing the same private key, which proves that either keys have been
stolen, or those malware have the same origin. On the other hand, massive copy/paste
coding, relying on directly copying code from popular tutorials and blogs, shows that
malware programming is done at a fast pace by developers lacking elementary cryptography
knowledge. This unfortunately shows that current Android malware as well as mitigation
techniques are still in their infancy.
It’s surprising to see that most malware writers do not use digital certificates properly
and that many of the current mitigation techniques do not check them. However, more
troubling is the extent to which private keys seem to have been compromised and that both
benign applications and malware share the same certificates.

In the future, we plan to leverage the insights discussed in Section IV. Furthermore, we
plan to extend this work by also considering the automated analysis of the bytecode. Some
preliminary work has been done and the results are promising.

ACKNOWLEDGEMENTS

This work was supported by the Fonds National de la Recherche (FNR), Luxembourg, under the
project AndroMap C13/IS/5921289. We would like to thank VirusTotal for providing us access
to their tool.

REFERENCES

[1] K. Allix, T. F. Bissyandé, Q. Jerome, J. Klein, R. State, and Y. Le Traon, “Large-scale
    machine learning-based malware detection: Confronting the “10-fold cross validation
    scheme” with reality,” in CODASPY ’14, 2014.
[2] S. Arzt, S. Rasthofer, E. Bodden, A. Bartel, J. Klein, Y. Le Traon, D. Octeau, and
    P. McDaniel, “Flowdroid: Precise context, flow, field, object-sensitive and
    lifecycle-aware taint analysis for android apps,” in Conference on Programming Language
    Design and Implementation (PLDI), 2014.
[3] A. Bartel, J. Klein, M. Monperrus, K. Allix, and Y. Le Traon, “Improving privacy on
    android smartphones through in-vivo bytecode instrumentation,” Technical Report,
    May 2012.
[4] A. Bartel, J. Klein, M. Monperrus, and Y. Le Traon, “Dexpler: Converting Android Dalvik
    Bytecode to Jimple for Static Analysis with Soot,” in ACM Sigplan Workshop on the State
    Of The Art in Java Program Analysis (SOAP), 2012.
[5] J. Bickford, R. O’Hare, A. Baliga, V. Ganapathy, and L. Iftode, “Rootkits on smart
    phones: attacks, implications and opportunities,” in HotMobile ’10, Maryland, 2010.
[6] J. Brodkin, “On its 5th birthday, 5 things we love about android,” Nov. 2012,
    http://arstechnica.com/gadgets/2012/11/on-androids-5th-birthday-5-things-we-love-about-android/.
[7] S. Bugiel, L. Davi, A. Dmitrienko, T. Fischer, and A.-R. Sadeghi, “Xmandroid: A new
    android evolution to mitigate privilege escalation attacks,” Technische Universität
    Darmstadt, Technical Report TR-2011-04, Apr. 2011.
[8] I. Burguera, U. Zurutuza, and S. Nadjm-Tehrani, “Crowdroid: behavior-based malware
    detection system for android,” in SPSM ’11, Chicago, Illinois, USA, 2011, pp. 15–26.
[9] P. P. Chan, L. C. Hui, and S. M. Yiu, “Droidchecker: analyzing android applications for
    capability leak,” in WISEC ’12. Tucson, Arizona, USA: ACM, 2012, pp. 125–136.
[10] L. Davi, A. Dmitrienko, A.-R. Sadeghi, and M. Winandy, “Privilege escalation attacks on
    android,” in ISC’10. Boca Raton, FL, USA: Springer-Verlag, 2011, pp. 346–360.
[11] A. Desnos, “Android: Static analysis using similarity distance,” in HICSS ’12.
    Washington, DC, USA: IEEE Computer Society, 2012, pp. 5394–5403.
[12] W. Enck, P. Gilbert, B.-G. Chun, L. P. Cox, J. Jung, P. McDaniel, and A. N. Sheth,
    “Taintdroid: an information-flow tracking system for realtime privacy monitoring on
    smartphones,” in OSDI’10. Vancouver, BC, Canada: USENIX Association, 2010, pp. 1–6.
[13] W. Enck, D. Octeau, P. McDaniel, and S. Chaudhuri, “A study of android application
    security,” in SEC’11. San Francisco, CA: USENIX Association, 2011, pp. 21–21.
[14] W. Enck, M. Ongtang, and P. McDaniel, “On lightweight mobile phone application
    certification,” in CCS ’09. Chicago, Illinois, USA: ACM, 2009, pp. 235–245.
[15] A. P. Felt, M. Finifter, E. Chin, S. Hanna, and D. Wagner, “A survey of mobile malware
    in the wild,” in SPSM ’11. New York, NY, USA: ACM, 2011, pp. 3–14.
[16] S. Höbarth and R. Mayrhofer, “A framework for on-device privilege escalation exploit
    execution on android,” in Proc. IWSSI/SPMU, June 2011.
[17] P. Hornyack, S. Han, J. Jung, S. Schechter, and D. Wetherall, “These aren’t the droids
    you’re looking for: Retrofitting android to protect data from imperious applications,”
    in Proceedings of the 18th ACM Conference on Computer and Communications Security,
    ser. CCS ’11. New York, NY, USA: ACM, 2011, pp. 639–652.
[18] ITU, “Information technology Open Systems Interconnection The Directory: Public-key and
    attribute certificate frameworks Technical Corrigendum 2,” ITU, Genebra, Series X: Data
    Networks, Open System Communications and Security Directory, nov 2008, ITU-T
    Recommendation X.509.
[19] H. B. Mann and D. R. Whitney, “On a test of whether one of two random variables is
    stochastically larger than the other,” The Annals of Mathematical Statistics, vol. 18,
    no. 1, pp. 50–60, 1947.
[20] M. Nauman, S. Khan, and X. Zhang, “Apex: extending android permission model and
    enforcement with user-defined runtime constraints,” in ASIACCS ’10, 2010, pp. 328–332.
[21] D. Octeau, P. McDaniel, S. Jha, A. Bartel, E. Bodden, J. Klein, and Y. Le Traon,
    “Effective inter-component communication mapping in android with epicc: An essential
    step towards holistic security analysis,” in Proceedings of the 22nd USENIX Security
    Symposium, 2013.
[22] T. Vidas and N. Christin, “Sweetening android lemon markets: Measuring and combating
    malware in application marketplaces,” in CODASPY ’13, 2013.
[23] W. Zhou, Y. Zhou, X. Jiang, and P. Ning, “Detecting repackaged smartphone applications
    in third-party android marketplaces,” in CODASPY ’12. ACM, 2012, pp. 317–326.
[24] Y. Zhou, Z. Wang, W. Zhou, and X. Jiang, “Hey, you, get off of my market: Detecting
    malicious apps in official and alternative android markets,” in NDSS’12, 2012.
[25] Y. Zhou and X. Jiang, “Dissecting android malware: Characterization and evolution,” in
    SP ’12, Washington, DC, USA, 2012, pp. 95–109.
[26] Y. Zhou, X. Zhang, X. Jiang, and V. W. Freeh, “Taming information-stealing smartphone
    applications (on android),” in TRUST’11. Pittsburgh, PA: Springer-Verlag, 2011,
    pp. 93–107.
