Automated Classification and Analysis of Internet Malware
{mibailey, jonojono, janderse, zmao, farnam}@umich.edu
1 Introduction
Many of the most visible and serious problems facing the Internet today depend
on a vast ecosystem of malicious software and tools. Spam, phishing, denial of
service attacks, botnets, and worms largely depend on some form of malicious
code, commonly referred to as malware. Malware is often used to infect the com-
puters of unsuspecting victims by exploiting software vulnerabilities or tricking
users into running malicious code. Understanding this process and how attackers
use the backdoors, key loggers, password stealers, and other malware functions
is becoming an increasingly difficult and important problem.

C. Kruegel, R. Lippmann, and A. Clark (Eds.): RAID 2007, LNCS 4637, pp. 178–197, 2007.
© Springer-Verlag Berlin Heidelberg 2007
Unfortunately, the complexity of modern malware is making this problem
more difficult. For example, Agobot [3] has been observed to have more than
580 variants since its initial release in 2002. Modern Agobot variants have the
ability to perform DoS attacks, steal bank passwords and account details, prop-
agate over the network using a diverse set of remote exploits, use polymorphism
to evade detection and disassembly, and even patch vulnerabilities and remove
competing malware from an infected system [3]. Making the problem even more
challenging is the increase in the number and diversity of Internet malware. A
recent Microsoft survey found more than 43,000 new variants of backdoor trojans
and bots during the first half of 2006 [20]. Automated and robust approaches to
understanding malware are required to successfully stem the tide.
Previous efforts to automatically classify and analyze malware (e.g., AV, IDS)
focused primarily on content-based signatures. Unfortunately, content-based sig-
natures are inherently susceptible to inaccuracies due to polymorphic and meta-
morphic techniques. In addition, the signatures used by these systems often
focus on a specific exploit behavior—an approach increasingly complicated by
the emergence of multi-vector attacks. As a result, IDS and AV products charac-
terize malware in ways that are inconsistent across products, incomplete across
malware, and that fail to be concise in their semantics. This creates an en-
vironment in which defenders are limited in their ability to share intelligence
across organizations, to detect the emergence of new threats, and to assess risk
in quarantine and cleanup of infections.
To address the limitations of existing automated classification and analysis
tools, we have developed and evaluated a dynamic analysis approach, based on
the execution of malware in virtualized environments and the causal tracing of
the operating system objects created due to malware’s execution. The reduced
collection of these user-visible system state changes (e.g., files written, processes
created) is used to create a fingerprint of the malware’s behavior. These fin-
gerprints are more invariant and directly useful than abstract code sequences
representing programmatic behavior and can be directly used in assessing the
potential damage incurred, enabling detection and classification of new threats,
and assisting in the risk assessment of these threats in mitigation and clean
up. To address the sheer volume of malware and the diversity of its behavior,
we provide a method for automatically categorizing these malware profiles into
groups that reflect similar classes of behaviors. These methods are thoroughly
evaluated in the context of a malware dataset that is large, recent, and diverse
in the set of attack vectors it represents (e.g., spam, worms, bots, spyware).
This paper is organized as follows: Section 2 describes the shortcomings of
existing AV software and enumerates requirements for effective malware clas-
sification. We present our behavior-based fingerprint extraction and fingerprint
clustering algorithm in Section 3. Our detailed evaluation is shown in Section 4.
We present existing work in Section 5, offer limitations and future directions in
Section 6, and conclude in Section 7.
Host-based AV systems detect and remove malicious threats from end systems.
As a normal part of this process, these AV programs provide a description for the
malware they detected. The ability of these products to successfully characterize
these threats has far-reaching effects—from facilitating sharing across organiza-
tions, to detecting the emergence of new threats, and assessing risk in quarantine
and cleanup. However, for this information to be effective, the descriptions pro-
vided by these systems must be meaningful. In this section, we evaluate the
ability of host-based AV to provide meaningful intelligence on Internet malware.
Table 1. The datasets used in this paper: A large collection of legacy binaries from
2004, a small six-week collection from 2006, and a large six-month collection of malware
from 2006/2007. The number of unique labels provided by five AV systems is listed for
each dataset.
Dataset  Date                         Number of     Number of Unique Labels
Name     Collected                    Unique MD5s   McAfee  F-Prot  ClamAV  Trend  Symantec
legacy   01 Jan 2004 - 31 Dec 2004    3,637            116   1,216     590    416        57
small    03 Sep 2006 - 22 Oct 2006      893            112     379     253    246        90
large    03 Sep 2006 - 18 Mar 2007    3,698            310   1,544   1,102  2,035        50
After collecting the binaries, we analyzed them using the AV scanners shown
in Table 2. Each of the scanners was the most recent available from each vendor
at the time of the analysis. The virus definitions and engines were updated
uniformly on November 20th, 2006, and then again on March 31st, 2007. Note
that the first update occurred more than a year after the legacy collection ended
and one month after the end of the small set collection. The second update was
13 days after the end of the large set collection.
Table 2. Anti-virus software, vendors, versions, and signature files used in this paper.
The small and legacy datasets were evaluated with a version of these systems in No-
vember of 2006 and both small and large were evaluated again with a version of these
systems in March of 2007.
AV systems rarely use the exact same labels for a threat, and users of these
systems have come to expect simple naming differences (e.g., W32Lovsan.worm.a
versus Lovsan versus WORM MSBLAST.A) across vendors. It has always been
assumed, however, that there existed a simple mapping from one system’s name
space to another, and recently investigators have begun creating projects to unify
these name spaces [4]. Unfortunately, the task appears daunting. Consider, for
example, the number of unique labels created by various systems. The result
in Table 1 is striking—there is a substantial difference in the number of unique
labels created by each AV system. While one might expect small differences, it
is clear that AV vendors disagree not only on what to label a piece of malware,
but also on how many unique labels exist for malware in general.
One simple explanation for these differences in the number of labels is that
some of these AV systems provide a finer level of detail into the threat landscape
than the others. For example, the greater number of unique labels in Table 1 for
F-Prot may be the result of F-Prot’s ability to more effectively differentiate small
variations in a family of malware. To investigate this conjecture, we examined the
labels of the legacy dataset produced by the AV systems and, using a collection
of simple heuristics for the labels, we created a pool of malware classified by
F-Prot, McAfee, and ClamAV as SDBot [19]. We then examined the percentage
of time each of the three AV systems classified these malware samples as part
of the same family. The result of this analysis can be seen in Figure 1. Each AV
classifies a number of samples as SDBot, yet the intersection of these different
SDBot families is not clean, since there are many samples that are classified as
SDBot by one AV and as something else by the others. It is clear that these
differences go beyond simple differences in labeling—anti-virus products assign
distinct semantics to differing pieces of malware.
Our previous analysis has provided a great deal of evidence indicating that la-
beling across AV systems does not operate in a way that is useful to researchers,
operators, and end users. Before we evaluate these systems any further, it is im-
portant to precisely define the properties an ideal labeling system should have.
We have identified three key design goals for such a labeling system:
– Consistency. Identical items must and similar items should be assigned the
same label.
– Completeness. A label should be generated for as many items as possible.
– Conciseness. The labels should be sufficient in number to reflect the unique
properties of interest, while avoiding superfluous labels.
Table 3. The percentage of time two binaries classified as the same by one AV are
classified the same by other AV systems. Malware is inconsistently classified across AV
vendors.
                        legacy                                   small
          McAfee  F-Prot  ClamAV  Trend  Symantec   McAfee  F-Prot  ClamAV  Trend  Symantec
McAfee       100      13      27     39        59      100      25      54     38        17
F-Prot        50     100      96     41        61       45     100      57     35        18
ClamAV        62      57     100     34        68       39      23     100     32        13
Trend         67      18      25    100        55       45      23      52    100        16
Symantec      27       7      13     14       100       42      25      46     33       100
Table 4. The percentage of malware samples detected across datasets and AV vendors.
AV does not provide a complete categorization of the datasets.
Table 5. The ways in which various AV products label and group malware. AV labeling
schemes vary widely in how concisely they represent the malware they classify.
as well as the number of unique families these labels belong to. In this analysis,
the family is a generalized label heuristically extracted from the literal string,
which contains the portion intended to be human-readable. For example, the
literal labels W32-Sdbot.AC and Sdbot.42 returned by an AV system are both
in the “sdbot” family. An interesting observation from this table is that these
systems vary widely in how concisely they represent malware. Vendors such as
Symantec appear to employ a general approach, reducing samples to a small
handful of labels and families. On the other extreme, F-Prot appears to aggres-
sively label new instances, providing thousands of unique labels for malware,
but still maintaining a small number of groups or families to which these labels
belong.
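The family extraction described above can be illustrated with a short heuristic. The prefix and suffix patterns below are illustrative guesses at the kind of rules involved, not the authors' actual heuristics:

```python
import re

def family(label: str) -> str:
    """Heuristically reduce a literal AV label to its family name,
    e.g. 'W32-Sdbot.AC' and 'Sdbot.42' both map to 'sdbot'."""
    label = label.lower()
    # Strip a leading platform/type prefix (illustrative list only).
    label = re.sub(r'^(w32|win32|worm|trojan|backdoor)[-._/ ]', '', label)
    # Strip a trailing variant suffix such as '.AC' or '.42'.
    label = re.sub(r'[._-][a-z0-9]+$', '', label)
    return label

print(family("W32-Sdbot.AC"))  # sdbot
print(family("Sdbot.42"))      # sdbot
```

Real AV naming schemes are messier than this, which is precisely why the paper treats family extraction as a heuristic.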
impact of any immediate attack behaviors (e.g., scanning, DDoS, and spam) is
minimized during the limited execution period. The system events are captured
and exported to an external server using the Backtracker system [12]. In addition
to exporting system events, the Backtracker system provides a means of building
causal dependency graphs of these events. The benefit of this approach is that
we can validate that the changes we observe are a direct result of the malware,
and not of some normal system operation.
While the choice of abstraction and generation of behaviors provides useful infor-
mation to users, operators, and security personnel, the sheer volume of malware
makes manual analysis of each new malware intractable. Our malware source
observed 3,700 samples in a six-month period—over 20 new pieces per day. Each
generated fingerprint, in turn, can exhibit many thousands of individual state
changes (e.g., infecting every .exe on a Windows host). For example, consider
the tiny subset of malware in Table 6. The 10 distinct pieces of malware generate
from 10 to 66 different behaviors with a variety of different labels, including
disjoint families, variants, and undetected malware. While some items obviously
belong together in spite of their differences (e.g., C and D), even the composition
of labels across AV systems cannot provide a complete grouping of the malware.
Obviously, for these new behavioral fingerprints to be effective, similar behaviors
need to be grouped and appropriate meanings assigned.
Table 6. Ten unique malware samples. For each sample, the number of process, file,
registry, and network behaviors observed and the classifications given by various AV
vendors are listed.
Table 7. A matrix of the NCD between each of the 10 malware samples in our example
        A     B     C     D     E     F     G     H     I     J
A    0.06  0.07  0.84  0.84  0.82  0.73  0.80  0.82  0.68  0.77
B    0.07  0.06  0.84  0.85  0.82  0.73  0.80  0.82  0.68  0.77
C    0.84  0.84  0.04  0.22  0.45  0.77  0.64  0.45  0.84  0.86
D    0.85  0.85  0.23  0.05  0.45  0.76  0.62  0.43  0.83  0.86
E    0.83  0.83  0.48  0.47  0.03  0.72  0.38  0.09  0.80  0.85
F    0.71  0.71  0.77  0.76  0.72  0.05  0.77  0.72  0.37  0.54
G    0.80  0.80  0.65  0.62  0.38  0.78  0.04  0.35  0.78  0.86
H    0.83  0.83  0.48  0.46  0.09  0.73  0.36  0.04  0.80  0.85
I    0.67  0.67  0.83  0.82  0.79  0.38  0.77  0.79  0.05  0.53
J    0.75  0.75  0.86  0.85  0.83  0.52  0.85  0.83  0.52  0.08
metric for clustering. Our initial naive approach to defining similarity was based
on the concept of edit distance [7]. In this approach, each behavior is treated
as an atomic unit and we measure the number of inserts or deletes of these
atomic behaviors required to transform one behavioral fingerprint into another.
The method is fairly intuitive and straightforward to implement (think the Unix
command diff here); however, it suffers from two major drawbacks:
– Overemphasizing size. When the number of behaviors is large, the edit
distance is effectively equivalent to clustering based on the length of the
feature set. This overemphasizes differences over similarities.
– Behavioral polymorphism. Many of the clusters we observed had few
exact matches for behaviors. This is because the state changes made by mal-
ware may contain simple behavioral polymorphism (e.g., random file names).
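Treating each behavior as an atomic unit, the naive comparison reduces to counting unmatched behaviors. A minimal sketch (with hypothetical behavior strings) shows how behavioral polymorphism defeats it:

```python
def edit_distance(fp1, fp2):
    """Naive distance between two behavioral fingerprints: the number of
    inserts and deletes of atomic behaviors needed to turn one fingerprint
    into the other, i.e. the size of the symmetric difference of the sets."""
    a, b = set(fp1), set(fp2)
    return len(a - b) + len(b - a)

# Two fingerprints that differ only in a randomized file name.
x = {"write C:\\WINDOWS\\abQ3.exe", "scan port 25"}
y = {"write C:\\WINDOWS\\kzT7.exe", "scan port 25"}

print(edit_distance(x, y))  # 2: the near-identical file writes never match exactly
```

Even though the two file writes are nearly identical, exact matching scores them as completely different behaviors.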
To solve these shortcomings we turned to normalized compression distance
(NCD). NCD provides an approximation of shared information content and has
been successfully applied in a number of areas [25,29]. NCD is defined as:

    NCD(x, y) = (C(x + y) − min(C(x), C(y))) / max(C(x), C(y))

where “x + y” is the concatenation of x and y, and C(x) is the zlib-compressed
length of x. Intuitively, NCD represents the overlap in information between two
samples. As a result, behaviors that are similar, but not identical, are viewed as
close (e.g., two registry entries with different values, random file names in the
same locations). Normalization addresses the issue of differing information con-
tent. Table 7 shows the normalized compression distance matrix for the malware
described in Table 6.
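The definition above translates directly into code. The behavior strings below are hypothetical stand-ins for the serialized fingerprints the system would compare:

```python
import zlib

def ncd(x: bytes, y: bytes) -> float:
    """Normalized compression distance, with zlib playing the role of C."""
    cx = len(zlib.compress(x))
    cy = len(zlib.compress(y))
    cxy = len(zlib.compress(x + y))
    return (cxy - min(cx, cy)) / max(cx, cy)

# Hypothetical fingerprints: the first two share most state changes but
# differ in a randomized file name; the third is unrelated.
log_a = b"write C:\\WINDOWS\\abQ3.exe; set run-on-reboot key; scan port 25"
log_b = b"write C:\\WINDOWS\\kzT7.exe; set run-on-reboot key; scan port 25"
log_c = b"join IRC channel on port 6667; download update over HTTP port 80"

print(ncd(log_a, log_b))  # small: behaviors overlap despite the renamed file
print(ncd(log_a, log_c))  # large: little shared information
```

Unlike exact matching, the compressor exploits the long shared substrings in log_a and log_b, so behaviorally polymorphic variants land close together.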
Constructing Relationships Between Malware. Once we know the in-
formation content shared between two sets of behavioral fingerprints, we can
combine various pieces of malware based on their similarity. In our approach,
we construct a tree structure based on the well-known hierarchical clustering
algorithm [11]. In particular, we use pairwise single-linkage clustering, which de-
fines the distance between two clusters as the minimum distance between any
two members of the clusters. We output the hierarchical cluster results as a tree
graph in graphviz’s dot format [14]. Figure 2 shows the generated tree for the
malware in Table 6.
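Pairwise single-linkage clustering can be sketched in a few lines. The greedy merge loop below is a simplified stand-in for the library implementation, and the distances are a subset of the Table 7 values for samples A, B, C, and D:

```python
def single_linkage(labels, dist, threshold):
    """Agglomerative single-linkage clustering: repeatedly merge the two
    clusters whose closest members are nearest, stopping when the minimum
    inter-cluster (single-link) distance exceeds the threshold."""
    clusters = [[l] for l in labels]
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = min(dist(a, b) for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        if d > threshold:
            break
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters

# NCD values taken from Table 7 for the subset {A, B, C, D}.
D = {("A", "B"): 0.07, ("A", "C"): 0.84, ("A", "D"): 0.84,
     ("B", "C"): 0.84, ("B", "D"): 0.85, ("C", "D"): 0.22}

def dist(a, b):
    return 0.0 if a == b else D.get((a, b), D.get((b, a)))

print(single_linkage(["A", "B", "C", "D"], dist, threshold=0.5))
# [['A', 'B'], ['C', 'D']] — the {A, B} and {C, D} groupings match the tree
```

The threshold here is arbitrary; the system instead cuts the full tree using an inconsistency measure, as described later.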
Fig. 2. On the left, a tree consisting of the malware from Table 6 has been clustered via
a hierarchical clustering algorithm whose distance function is normalized compression
distance. On the right, a dendrogram illustrating the distance between various subtrees.
Table 8. The clusters generated via our technique for the malware listed in Table 6
Fig. 3. The memory and runtime required for performing clustering based on the num-
ber of malware clustered (for a variety of different sized malware behaviors)
link to consider in the calculation. All the links at the current level in the hier-
archy, as well as links down to the given depth below the current level, are used
in the inconsistency calculation.
In Table 8 we see the result of the application of this approach to the exam-
ple malware in Table 6. The 10 unique pieces of malware generate four unique
clusters. Each cluster shows the elements in that cluster, the average number of
unique behaviors in common between the clusters, and an example of a high-level
behavior in common between each binary in the cluster. For example, cluster one
consists of C and D and represents two unique behaviors of mytob, a mass mail-
ing scanning worm. Five of the behaviors observed for C and D are identical
(e.g., scans port 25), but several others exhibit some behavioral polymorphism
(e.g., different run on reboot registry entries). The other three clusters exhibit
similar expected results, with cluster two representing the cygwin backdoors,
cluster three the bancos variants, and cluster four a class of IRC backdoors.
4 Evaluation
To demonstrate the effectiveness of behavioral clustering, we evaluate our tech-
nique on the large and small datasets discussed in Section 2. We begin by demon-
strating the runtime performance and the effects of various parameters on the
system. We then show the quality or goodness of the clusters generated by our
system by comparing existing AV groups (e.g., those labeled as SDBot) to our
clusters. Next we discuss our clusters in the context of our completeness, concise-
ness, and consistency criteria presented earlier. Finally, we illustrate the utility
of the clusters by answering relevant questions about the malware samples.
Fig. 4. On the left, the number of clusters generated for various values of the inconsis-
tency parameter and depth. On the right, the trade-off between the number of clusters,
the average cluster size, and the inconsistency value.
we analyze its run time and memory consumption by running ten trials for each.
The experiments were performed on a Dell PowerEdge 4600 with two Intel Xeon
MP CPUs (3.00GHz), 4 GB of DDR ECC RAM, 146G Cheetah Seagate drive
with an Adaptec 3960D Ultra160 SCSI adapter, running Fedora Core Linux.
We first decompose the entire execution process into five logical steps: (1)
trace collection, (2) state change extraction, (3) NCD distance matrix compu-
tation (an O(N^2) operation), (4) clustering the distance matrix into a tree, and
(5) cutting the tree into clusters. We focus on the latter three operations specific to
our algorithm for performance evaluation. Figure 3 shows the memory usage for
those three steps. As expected, computing NCD requires the most memory with
quadratic growth with an increasing number of malware for clustering. However,
clustering 500 malware samples requires less than 300MB of memory. The mem-
ory usage for the other two components grows at a much slower rate. Examining
the run-time in Figure 3 indicates that all three components can complete within
hundreds of seconds for clustering several hundred malware samples.
Phases 1-4 of the system operate without any parameters. However, the tree-
cutting algorithm of phase 5 has two parameters: the inconsistency measure and
the depth value. Intuitively, larger inconsistency measures lead to fewer clus-
ters and larger depth values for computing inconsistency result in more clusters.
Figure 4 illustrates the effects of depth on the number of clusters produced
for the small dataset for various inconsistency values. Values between 4 and 6
for the depth (the 3rd and 4th colored lines) appear to bound the knee of the
curve. In order to evaluate the effect of inconsistency, we fixed the depth to 4
and evaluated the number of clusters versus the average size of the clusters for
various inconsistency values in the large dataset. The results of this analysis,
shown in Figure 4, show a smooth trade-off until an inconsistency value between
2.2 and 2.3, where the clusters quickly collapse into a single cluster. In order
to generate clusters that are as concise as possible without losing important
feature information, the experiments in the next section utilize values of depth
and inconsistency just at the knee of these curves. In this case, it is a depth
value of 4 and an inconsistency value of 2.22.
We previously examined how the clusters resulting from the application of our
algorithm to the large dataset compared to classification of AV systems. In this
section, we examine more general characteristics of our clusters in an effort
to demonstrate their quality. In particular, we demonstrate the completeness,
conciseness, and consistency of the generated clusters. Our analysis of these
properties, summarized in Table 9, highlights each property in turn:
Completeness. To measure completeness, we examined the number of times
we created a meaningful label for a binary and compared this to the detection
rates of the various AV products. For AV software, “not detected” means no
Table 9. The completeness, conciseness, and consistency of the clusters created with
our algorithm on the large dataset as compared to various AV vendors
Examining the Malware Behaviors. Clearly one of the values of any type
of automated security system is not to simply provide detailed information on
individual malware, but also to provide broad analysis on future directions of
malware. Using the behavioral signatures created by our system, we extracted
the most prevalent behaviors for each of the various categories of behaviors we
monitor. The top five such behaviors in each category are shown in Table 10.
The network behavior seems to conform with agreed notions of how the tasks
are being performed by most malware today. Two of the top five network behav-
iors involve the use of mail ports, presumably for spam. Port 6667 is a common
IRC port and is often used for remote control of the malware. Two of the ports
are HTTP ports used by systems to check for jailed environments, download
code via the web, or tunnel command and control over what is often an unfil-
tered port. The process behaviors are interesting in that many process executa-
bles are named like common Windows utilities to avoid arousing suspicion (e.g.,
svchost.exe, tasklist32.exe). In addition, some malware uses IEXPLORE.EXE
directly to launch popup ads and redirect users to potential phishing sites. This
use of existing programs and libraries will make simple anomaly detection tech-
niques more difficult. The file writes show common executable names and data
files written to the filesystem by malware. For example, the winhlp32.dat file
is a data file common to many Bancos trojans. Registry keys are also fairly in-
teresting indications of behavior and the prevalence of wininet.dll keys shows
heavy use of existing libraries for network support. The writing to PRNG keys
indicates a heavy use of randomization, as the seed is updated every time a
PRNG-related function is used. As expected, the malware does examine and
modify the registered application on a machine, the TCP/IP proxy settings (in
part to avoid AV), and it queries mounted drives.
5 Related Work
Our work is the first to apply automated clustering to understand malware be-
havior using resulting state changes on the host to identify various malware
families. Related work in malware collection, analysis, and signature generation
has primarily explored static and byte-level signatures [23,17] focusing on in-
variant content. Content-based signatures are insufficient to cope with emerging
threats due to intentional evasion. Behavioral analysis has been proposed as a
solution to deal with polymorphism and metamorphism, where malware changes
its visible instruction sequence (typically the decryptor routine) as it spreads.
Similar to our work, emulating malware to discover spyware behavior by using
anti-spyware tools has been used in measurement studies [22].
There are several abstraction layers at which behavioral profiles can be cre-
ated. Previous work has focused on lower layers, such as individual system
calls [15,10], instruction-based code templates [6], the initial code run on malware
infection (shellcode) [18], and network connection and session behavior [30]. Such
behavior needs to be effectively elicited. In our work, we chose a higher abstrac-
tion layer for several reasons. In considering the actions of malware, it is not the
individual system calls that define the significant actions that a piece of malware
inflicts upon the infected host; rather, it is the resulting changes in state of the
host. Also, although lower levels may allow signatures that differentiate mal-
ware, they do not provide semantic value in explaining behaviors exhibited by a
malware variant or family. In our work, we define malware by what it actually
does, and thereby build in more semantic meanings to the profiles and clusters
generated.
Various aspects of high-level behavior could be included in the definition of a
behavioral profile. Network behavior may be indicative of malware and has been
used to detect malware infections. For example, Ellis et al. [9] extracted network-
level features, such as similar data being sent from one machine to the next. In
our work, we focus on individual host behavior, including network connection
information but not the data transmitted over the network. Thus, we focus more
on the malware behavior on individual host systems instead of the pattern across
a network.
Recently, Kolter and Maloof [13] studied applying machine learning to clas-
sify malicious executables using n-grams of byte codes. Our use of hierarchical
clustering based on normalized compression distance is a first step at examining
how statistical techniques are useful in classifying malware, but the features used
are the resulting state changes on the host to be more resistant to evasion and
inaccuracies. Normalized information distance was proposed by Li et al. [16] as
an optimal similarity metric to approximate all other effective similarity metrics.
In previous work [29], NCD was applied to worm executables directly and to the
network traffic generated by worms. Our work applies NCD at a different layer
of abstraction. Rather than applying NCD to the literal malware executables,
we apply NCD to the malware behavior.
Our system is not without limitations and shares common weaknesses asso-
ciated with dynamic analysis. Since the malware samples were executed within
VMware, samples that employ anti-VM evasion techniques may not exhibit their
malicious behavior. To mitigate this limitation, the samples could be run on a
real, non-virtualized system, which would be restored to a clean state after each
simulation. Another limitation is the time period in which behaviors are collected
from the malware execution. In our experiments, each binary was able to run for
five minutes before the virtual machine was terminated. It is possible that cer-
tain behaviors were not observed within this period due to time-dependent or de-
layed activities. Previous research has been done to detect such time-dependent
triggers [8]. A similar limitation is malware that depends on user input, such
as responding to a popup message box, before exhibiting further malicious be-
havior, as mentioned in [22]. Finally, the capabilities and environment of our
7 Conclusion
Acknowledgments
This work was supported in part by the Department of Homeland Security (DHS)
under contract numbers NBCHC040146 and NBCHC060090, by the National
Science Foundation (NSF) under contract number CNS 0627445 and by corpo-
rate gifts from Intel Corporation and Cisco Corporation. We would like to thank
our shepherd, Jonathon Giffin, for providing valuable feedback on our submission
as well as the anonymous reviewers for critical and useful comments.
References
1. Arbor malware library (AML) (2006), http://www.arbornetworks.com/
2. Baecher, P., Koetter, M., Holz, T., Dornseif, M., Freiling, F.: The nepenthes plat-
form: An efficient approach to collect malware. In: Zamboni, D., Kruegel, C. (eds.)
RAID 2006. LNCS, vol. 4219, Springer, Heidelberg (2006)
3. Barford, P., Yegneswaran, V.: An inside look at botnets. In: Series: Advances in
Information Security, Springer, Heidelberg (2006)
4. Beck, D., Connolly, J.: The Common Malware Enumeration Initiative. In: Virus
Bulletin Conference (October 2006)
5. Willems, C., Holz, T.: CWSandbox (2007), http://www.cwsandbox.org/
6. Christodorescu, M., Jha, S., Seshia, S.A., Song, D., Bryant, R.E.: Semantics-aware
malware detection. In: Proceedings of the 2005 IEEE Symposium on Security and
Privacy (Oakland 2005), Oakland, CA, USA, May 2005, pp. 32–46. ACM Press,
New York (2005)
7. Cormen, T.H., Leiserson, C.E., Rivest, R.L., Stein, C.: Introduction to Algorithms.
MIT Press, Cambridge, MA (1990)
8. Crandall, J.R., Wassermann, G., de Oliveira, D.A.S., Su, Z., Wu, S.F., Chong, F.T.:
Temporal Search: Detecting Hidden Malware Timebombs with Virtual Machines.
In: Proceedings of ASPLOS, San Jose, CA, October 2006, ACM Press, New York
(2006)
9. Ellis, D., Aiken, J., Attwood, K., Tenaglia, S.: A Behavioral Approach to Worm
Detection. In: Proceedings of the ACM Workshop on Rapid Malcode (WORM04),
October 2004, ACM Press, New York (2004)
10. Gao, D., Reiter, M.K., Song, D.X.: Behavioral distance measurement
using hidden markov models. In: Zamboni, D., Kruegel, C. (eds.) RAID 2006.
LNCS, vol. 4219, pp. 19–40. Springer, Heidelberg (2006)
11. Hastie, T., Tibshirani, R., Friedman, J.: The Elements of Statistical Learning: Data
Mining, Inference, and Prediction. Springer, Heidelberg (2001)
12. King, S.T., Chen, P.M.: Backtracking intrusions. In: Proceedings of the 19th ACM
Symposium on Operating Systems Principles (SOSP’03), Bolton Landing, NY,
USA, October 2003, pp. 223–236. ACM Press, New York (2003)
13. Kolter, J.Z., Maloof, M.A.: Learning to Detect and Classify Malicious Executables
in the Wild. Journal of Machine Learning Research (2007)
14. Koutsofios, E., North, S.C.: Drawing graphs with dot. Technical report, AT&T
Bell Laboratories, Murray Hill, NJ (October 8, 1993)
15. Lee, T., Mody, J.J.: Behavioral classification. In: Proceedings of EICAR 2006 (April
2006)
16. Li, M., Chen, X., Li, X., Ma, B., Vitányi, P.: The similarity metric. In: SODA ’03:
Proceedings of the fourteenth annual ACM-SIAM symposium on Discrete algo-
rithms, Philadelphia, PA, USA. Society for Industrial and Applied Mathematics,
pp. 863–872 (2003)
17. Li, Z., Sanghi, M., Chen, Y., Kao, M., Chavez, B.: Hamsa: Fast Signature Gener-
ation for Zero-day Polymorphic Worms with Provable Attack Resilience. In: Proc.
of IEEE Symposium on Security and Privacy, IEEE Computer Society Press, Los
Alamitos (2006)
18. Ma, J., Dunagan, J., Wang, H., Savage, S., Voelker, G.: Finding Diversity in Remote
Code Injection Exploits. In: Proceedings of the USENIX/ACM Internet Measure-
ment Conference, October 2006, ACM Press, New York (2006)
19. McAfee: W32/Sdbot.worm (April 2003),
http://vil.nai.com/vil/content/v_100454.htm
20. Microsoft: Microsoft security intelligence report: (January-June 2006) (October
2006), http://www.microsoft.com/technet/security/default.mspx
21. Moser, A., Kruegel, C., Kirda, E.: Exploring multiple execution paths for mal-
ware analysis. In: Proceedings of the IEEE Symposium on Security and Privacy
(Oakland 2007), May 2007, IEEE Computer Society Press, Los Alamitos (2007)
22. Moshchuk, A., Bragin, T., Gribble, S.D., Levy, H.M.: A Crawler-based Study of
Spyware in the Web. In: Proceedings of the Network and Distributed System Se-
curity Symposium (NDSS), San Diego, CA (2006)
23. Newsome, J., Karp, B., Song, D.: Polygraph: Automatically generating signatures
for polymorphic worms. In: Proceedings 2005 IEEE Symposium on Security and
Privacy, Oakland, CA, USA, May 8–11, 2005, IEEE Computer Society Press, Los
Alamitos (2005)
24. Norman Solutions: Norman sandbox whitepaper (2003),
http://download.norman.no/whitepapers/whitepaper_Norman_SandBox.pdf
25. Nykter, M., Yli-Harja, O., Shmulevich, I.: Normalized compression distance for
gene expression analysis. In: Workshop on Genomic Signal Processing and Statistics
(GENSIPS) (May 2005)
26. Prince, M.B., Dahl, B.M., Holloway, L., Keller, A.M., Langheinrich, E.: Under-
standing how spammers steal your e-mail address: An analysis of the first six
months of data from project honey pot. In: Second Conference on Email and Anti-
Spam (CEAS 2005) (July 2005)
27. Walters, B.: VMware virtual platform. j-LINUX-J 63 (July 1999)
28. Wang, Y.-M., Beck, D., Jiang, X., Roussev, R., Verbowski, C., Chen, S., King,
S.T.: Automated web patrol with strider honeymonkeys: Finding web sites that
exploit browser vulnerabilities. In: Proceedings of the Network and Distributed
System Security Symposium, NDSS 2006, San Diego, California, USA (2006)
29. Wehner, S.: Analyzing worms and network traffic using compression. Technical
report, CWI, Amsterdam (2005)
30. Yegneswaran, V., Giffin, J.T., Barford, P., Jha, S.: An Architecture for Generat-
ing Semantics-Aware Signatures. In: Proceedings of the 14th USENIX Security
Symposium, Baltimore, MD, USA, August 2005, pp. 97–112 (2005)