Contents

1 Introduction
2 What is Malware?
2.1 Who are the Users and Creators of Malware?
3 Malware Detectors
4 Malware Detection Techniques
5 Summary
References
1 Introduction
Malware has had a tremendous impact on the world as we know it. The rising number
of computer security incidents since 1988 [7, 8] suggests that malware is an epidemic.
Surfing the World Wide Web with all anti-virus and firewall protection disabled for a day
should convince any reader of the widespread and malicious nature of malware. A sim-
ilar experiment was conducted by the San Diego Supercomputer Center (SDSC) [39].
In December of 1999, SDSC installed Red Hat Linux 5.2 with no security patches on
a computer connected to the Internet. Within eight hours of installation, the computer
had been attacked; twenty-one days after installation, it had experienced 20 attacks; and approximately 40 days after installation, it was deemed “compromised.” Malware can result in consequences ranging from Web site defacement [39] to the loss of human life [6].
Given malware's capabilities, the detection of malware is an area of major concern not only to the research community but also to the general public. Techniques that researchers develop for malware detection are realized through the implementation of malware detectors. In this report we survey malware detection techniques. Section 2 defines malware. Section 3 gives a general description of malware detectors. Section 4 surveys various malware detection techniques proposed in the literature. Finally, Section 5 summarizes this report.
2 What is Malware?
Viruses: A computer virus is code that replicates by inserting itself into other programs.
A program that a virus has inserted itself into is infected, and is referred to as the virus’s
host. An important caveat is that a virus requires its host in order to function; that is, a virus needs an existing host program in order to cause harm. For example, in order to get into a computer system, a virus may attach itself to some software utility (e.g. a word processing application). Launching the word processing application could then activate the virus, which may, for example, duplicate itself and disable malware detectors running on the computer system.
Worms: A computer worm replicates itself by executing its own code independent of
any other program. The primary distinction between a virus and a worm is that a worm
does not need a host to cause harm. Another distinction between viruses and worms is
their propagation model. In general, viruses attempt to spread through programs/files on a single computer system, whereas worms spread via network connections with the goal of infecting as many computer systems connected to the network as possible.

2.1 Who are the Users and Creators of Malware?
Malware writers/users go by a variety of names. Some of the most popular names are
black hats, hackers, and crackers. The actual persons or organizations that take on the
aforementioned names could be an external/internal threat, a foreign government, or
an industrial spy [6].
There are essentially two phases in the software lifecycle during which malware can be inserted: the pre-release phase and the post-release phase. An internal threat, or insider, is generally the only type of hacker capable of inserting malware into software before its release to end users. An insider is a trusted developer, typically within an organization, of some software to be deployed to its end users. All other persons or organizations that take on the hacker role insert malware during the post-release phase, when the software is available to its intended audience.
In creating new malware, black hats generally employ one or both of the following techniques to circumvent malware detectors: obfuscation and behavior addition/modification [11]. Obfuscation attempts to hide the true intentions of malicious code without extending the behaviors exhibited by the malware. Behavior addition/modification effectively creates new malware, although the essence of the malware may not have changed. The widespread use of these techniques by malware coders, along with those mentioned by researchers [12, 19], suggests that reused code is a major component in the development of new malware. This implication plays a critical role in some of the signature-based malware detection methods (sometimes referred to as misuse detection), as we shall see in Section 4.3.
3 Malware Detectors

A malware detector takes as one input its knowledge of what constitutes malicious behavior. In anomaly-based detection, the inverse of this knowledge comes from the learning phase: theoretically, anomaly-based detection knows what is anomalous based on its knowledge of what is normal. Since anomalous behavior subsumes malicious behavior, some sense of maliciousness is captured by anomaly-based detection. If the malware detector employs a signature-based method, its knowledge of what is malicious comes from its repository, which is usually updated and maintained manually by people who were able to identify the malicious behavior and express it in a form amenable to the signature repository, and ultimately readable by a machine.
The other input to the malware detector is the program under inspection (PUI). Once the malware detector has its knowledge of what is considered malicious (or normal) behavior and the PUI, it can employ its detection technique to decide whether the program is malicious or benign. Although the terms Intrusion Detection System (IDS) and malware detector are sometimes used synonymously, a malware detector is usually only a component of a complete IDS.
4 Malware Detection Techniques

Techniques used for detecting malware can be broadly categorized into two groups: anomaly-based detection and signature-based detection. An anomaly-based detection technique uses its knowledge of what constitutes normal behavior to decide the
maliciousness of a program under inspection. A special type of anomaly-based de-
tection is referred to as specification-based detection. Specification-based techniques
leverage some specification or rule set of what is valid behavior in order to decide
the maliciousness of a program under inspection. Programs violating the specification
are considered anomalous and, usually, malicious. Signature-based detection uses its characterization of what is known to be malicious to decide the maliciousness of a program under inspection. As one may imagine, this characterization, or signature, of the malicious behavior is the key to a signature-based detection method's effectiveness.
Figure 1 depicts the relationship between the various types of malware detection
techniques. Each of the detection techniques can employ one of three different ap-
proaches: static, dynamic, or hybrid (see Figure 1). The specific approach or analysis
of an anomaly-based or signature-based technique is determined by how the technique
gathers information to detect malware. Static analysis uses syntax or structural properties of the program under inspection without executing it, whereas dynamic analysis uses information gathered at runtime; hybrid analysis combines the two.
Anomaly-based detection usually occurs in two phases: a training (learning) phase and a detection (monitoring) phase. During the training phase, the detector attempts to learn the normal behavior; it could be learning the behavior of the host, the PUI, or a combination of both. A key advantage of anomaly-based detection is its ability to detect zero-day attacks. Weaver et al. [53] describe zero-day exploits; like zero-day exploits, zero-day attacks are attacks previously unknown to the malware detector. The two fundamental limitations of this technique are its high false alarm rate and the complexity involved in determining which features should be learned in the training phase.
Figure 2 illustrates why anomaly-based detection alone is insufficient for malware
detection. As shown, V is the set of all valid behaviors of a system derived from a set
of non-conflicting requirements, and V’ is the set of all invalid behaviors. As is often
the case, an implementation approximates its requirements, and anomaly-based detection, in turn, attempts to approximate the implementation. This compounding of approximations helps explain the high false positive rate commonly associated with anomaly-based detection techniques. The possibility of a system exhibiting previously unseen behavior during the detection phase is not zero; therefore, the probability of an anomaly-based technique raising a false positive is not zero. Developing better approximations of a computer system's normal behavior remains an open problem in computer science.
PAYL
Wang and Stolfo [51] present PAYL, a tool that calculates the expected payload for each service (port) on a system. A byte frequency distribution is created, which allows a centroid model to be developed for each of the host's services during the learning phase. The detector compares incoming payloads with the centroid model by measuring the Mahalanobis distance between the two. The Mahalanobis distance takes into account not only the mean values of a feature vector but also its variance and covariance, yielding a stronger statistical measure of similarity. If the incoming payload is too far from the centroid model (a large Mahalanobis distance), the payload is considered malicious.
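To make the centroid comparison concrete, the following sketch models one service's payload distribution and scores new payloads against it. It is a minimal illustration in Python, not Wang and Stolfo's implementation: the diagonal (variance-only) distance, the smoothing constant, and the threshold are all assumptions.

    import numpy as np

    def byte_frequency(payload: bytes) -> np.ndarray:
        # Relative frequency of each of the 256 byte values in a payload.
        counts = np.bincount(np.frombuffer(payload, dtype=np.uint8), minlength=256)
        return counts / max(len(payload), 1)

    class CentroidModel:
        # Per-port centroid of byte-frequency distributions (simplified).
        def __init__(self, training_payloads):
            freqs = np.stack([byte_frequency(p) for p in training_payloads])
            self.mean = freqs.mean(axis=0)
            self.std = freqs.std(axis=0)

        def distance(self, payload: bytes, smoothing: float = 1e-3) -> float:
            # Simplified Mahalanobis-style distance: per-byte deviation
            # scaled by that byte value's observed variability.
            f = byte_frequency(payload)
            return float(np.sum(np.abs(f - self.mean) / (self.std + smoothing)))

    # Usage: train one model per service (port); flag payloads whose distance
    # exceeds a threshold calibrated on held-out normal traffic.
    model = CentroidModel([b"GET /index.html HTTP/1.0\r\n",
                           b"GET /about.html HTTP/1.0\r\n"])
    suspicious = model.distance(b"\x90" * 200) > 50.0  # threshold is an assumption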
Wang and Stolfo evaluated their technique on the 1999 MIT Lincoln Labs data, which contains three weeks of training data and two weeks of testing data. Of the 201 attacks in the Lincoln Labs data, 97 should be detectable by Wang and Stolfo's technique; it detected 57 of those 97. Overall, the detection rate of their technique was approximately 60 percent at a false positive rate of 1 percent or lower.
Wang and Stolfo also evaluated their technique on a Columbia University Computer
Science (CUCS) dataset. The value in running their technique on this data was that it was real: the MIT Lincoln Labs data, although very thorough, is simulated. Using real data gives some confidence that the technique is actually effective in a real network environment. Due to the privacy policies of Columbia University, this dataset has since been destroyed; consequently, other researchers cannot use it as another basis of comparison for intrusion detectors. PAYL was able to detect the buffer overflow attack of the Code Red II malware in the CUCS dataset.
The authors claim that their technique is robust against buffer overflow attacks, Trojan horses, maliciously crafted input, password guessing attacks, and Denial-of-Service attacks.
If there is a significant mismatch between the audit-trail patterns of the normal process and the corresponding process under inspection, then there is a high likelihood that the process under inspection is a possible intrusion.
Wespi et al. [54] evaluate their method using the ftpd process, comparing it to the comparable method of Hofmeyr et al. [21], which uses fixed-length audit trail patterns. From the ftpd process, 65 unique benign sequences were derived. Only 17 percent of the fixed-length patterns matched these benign sequences, whereas 72 percent of the benign sequences were matched by the variable-length approach. Based on this experiment, the variable-length approach appears significantly more accurate than the fixed-length approach.
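The matching idea can be illustrated with a small sketch that greedily tiles an audit-event sequence with known variable-length benign patterns and counts the events left uncovered. This is an illustrative simplification, not Wespi et al.'s pattern-table algorithm, and the event names are hypothetical.

    def covered_by_patterns(events, patterns):
        # Greedily tile the event sequence with known benign patterns,
        # preferring the longest pattern at each position; events that no
        # pattern can cover are counted as anomalous.
        i, uncovered = 0, 0
        while i < len(events):
            match = max((p for p in patterns if events[i:i + len(p)] == p),
                        key=len, default=None)
            if match:
                i += len(match)
            else:
                uncovered += 1
                i += 1
        return uncovered

    benign = [("open", "read", "close"), ("stat", "open")]
    trace = ("stat", "open", "open", "read", "close", "unlink")
    print(covered_by_patterns(trace, benign))  # 1 uncovered event ("unlink")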
NATE
Taylor and Alves-Foss [48] propose a computationally low-cost approach to detecting anomalous traffic, referred to as Network Analysis of Anomalous Traffic Events (NATE). The technique focuses on attacks that exploit network protocol vulnerabilities. Their approach relies on the assumption that malicious sessions tend to contain large numbers of syn, fin, and reset packets while having few ack packets; it also relies on abnormalities showing up in the number of bytes transferred per packet. A session is defined as information flow from a source to some IP address and port number pair. Multivariate cluster analysis is used to group the normal TCP/IP sessions.
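A minimal sketch of the detection step follows: normal sessions are summarized by cluster centroids, and a new session is scored by its distance to the nearest centroid. The feature layout and the threshold are assumptions, and plain Euclidean distance stands in for the Mahalanobis distance used in NATE's evaluation.

    import numpy as np

    def nearest_cluster_distance(session_features, centroids):
        # Distance from a session's feature vector to the closest
        # normal-traffic cluster centroid; large distances are anomalous.
        return min(np.linalg.norm(session_features - c) for c in centroids)

    # Hypothetical feature order: [syn, fin, rst, ack, kbytes]
    normal_centroids = [np.array([1, 1, 0, 40, 12.0]),
                        np.array([1, 1, 0, 8, 1.5])]
    portsweep_like = np.array([30, 2, 25, 3, 0.1])  # many SYN/RST, few ACKs
    if nearest_cluster_distance(portsweep_like, normal_centroids) > 20.0:
        print("anomalous session")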
To evaluate their technique, Taylor and Alves-Foss used the 1999 MIT Lincoln Labs data, namely the FTP, HTTP, and SMTP data. The authors noted that the drawback of this data set is that it is simulated; NATE may therefore behave differently in a real environment. They used the Mahalanobis distance to measure the distance between known attacks and the normal clusters. Portsweep, Satan, and Neptune were significantly distant from the normal clusters and consequently easily found to be anomalous; however, Mailbomb seemed to match some of the generated normal clusters.
In the approach of Sekar et al., network behavior is specified by an extended finite state automaton (EFSA). An EFSA can (1) make transitions on events that have arguments, and (2) use a finite set of state variables in which values can be stored. EFSAs model the network interface of the gateway host of the target network. Sekar et al. leverage statistical properties of network traffic to determine the maliciousness of network events on the target network. For example, the number of timeout transitions taken over some subset of the EFSA's traces can be used to identify useful properties of the data stream; another example is analyzing the distribution of state-variable values seen over some period of time. Anomalous behavior is based on repetition. The authors use the Lincoln Labs 1999 evaluation data. Sekar et al.'s approach was able to detect all attacks in the 1999 data that were within the scope of their method, while generating 5.5 false alarms a day, which is low compared to the false alarm rates reported in the Lincoln Labs evaluation.
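The following sketch shows the two defining features of an EFSA, transitions on events with arguments and a set of state variables, in a few lines of Python. It is a toy model of the formalism, not Sekar et al.'s system; the TCP fragment and its transitions are assumptions.

    class EFSA:
        # Minimal extended finite-state automaton: transitions fire on named
        # events carrying arguments and may read/update state variables.
        def __init__(self, state, variables):
            self.state = state
            self.vars = dict(variables)
            self.transitions = {}  # (state, event) -> (guard, action, next_state)

        def on(self, state, event, guard, action, next_state):
            self.transitions[(state, event)] = (guard, action, next_state)

        def step(self, event, **args):
            # Undefined transitions raise KeyError in this toy model.
            guard, action, next_state = self.transitions[(self.state, event)]
            if guard(self.vars, args):
                action(self.vars, args)
                self.state = next_state

    def count_syn(v, a):
        # State-variable update: tally SYNs per source to expose repetition.
        v["syn_count"][a["src"]] = v["syn_count"].get(a["src"], 0) + 1

    m = EFSA("LISTEN", {"syn_count": {}})
    m.on("LISTEN", "syn", guard=lambda v, a: True,
         action=count_syn, next_state="SYN_RCVD")
    m.on("SYN_RCVD", "ack", lambda v, a: True, lambda v, a: None, "ESTABLISHED")
    m.step("syn", src="10.0.0.5")
    print(m.vars["syn_count"])  # {'10.0.0.5': 1}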
In static anomaly-based detection, characteristics of the file structure of the program under inspection are used to detect malicious code. A key advantage of static anomaly-based detection is that it may make it possible to detect malware without allowing the program carrying the malware to execute on the host system.
Fileprint Analysis
Li et al. [31] describe fileprint (n-gram) analysis as a means for detecting malware. During the training phase, a model or set of models is derived that attempts to characterize the various file types on a system based on their structural (byte) composition, learned from the file types the system intends to handle. The authors' premise is that benign files have predictable, regular byte compositions for their respective types; for instance, benign .pdf files have a byte distribution distinct from that of .exe or .doc files. Any file under inspection that is deemed to vary "too greatly" from the given model or set of models is marked as suspicious, and suspicious files are passed to some other mechanism or decider to determine whether they are actually malicious.
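A 1-gram fileprint can be sketched as follows: average the byte-value distributions of known-good files of a type, then score a candidate by its deviation from that average. The Manhattan distance and the training files here are illustrative stand-ins, not Li et al.'s exact statistical model.

    import numpy as np

    def one_gram_model(files):
        # Average byte-value distribution for a file type (e.g. trained on
        # known-good .pdf files).
        freqs = [np.bincount(np.frombuffer(f, dtype=np.uint8), minlength=256)
                 / len(f) for f in files]
        return np.mean(freqs, axis=0)

    def deviation(model, candidate: bytes) -> float:
        # Manhattan distance between distributions, used here for brevity.
        f = np.bincount(np.frombuffer(candidate, dtype=np.uint8),
                        minlength=256) / len(candidate)
        return float(np.abs(model - f).sum())

    # Files deviating "too greatly" from their type's model are suspicious.
    pdf_model = one_gram_model([b"%PDF-1.4 ... %%EOF", b"%PDF-1.5 ... %%EOF"])
    print(deviation(pdf_model, b"MZ\x90\x00" + b"\x00" * 60))  # executable-like bytes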
Li et al. found that 1-gram analysis detected PDF files with embedded malware rather effectively relative to the COTS AV scanner used as the comparison in their experiments. Their technique exhibited detection rates between 72.1 percent and 94.5 percent for PDF files with embedded malware, whereas the COTS AV scanner's detection rate was effectively zero. The caveat of this experiment is that the malware was embedded in either the head or the tail of the PDF files tested. Since it is also possible to carefully embed malware in the middle of a PDF file, such that a reader may still be able to open the file, the authors found it worthwhile to assess their technique's ability to detect malware embedded in the middle portion of PDF files. The authors believe more work is needed to determine the viability and effectiveness of 2-gram or 3-gram analysis, and detection results varied when the technique was tested on different file types.
Strider GhostBuster
Wang et al. [52] propose a method for detecting a type of malware they refer to as "ghostware." Ghostware is malware that attempts to hide its existence from the Operating System's querying utilities, typically by intercepting the results of these queries and modifying them so that traces of the ghostware cannot be found via API queries. For example, if a user runs the "dir" command to list the files in the current directory, the ghostware would remove any of its resources from the results returned by the command.
Wang et al. offer a "cross-view diff-based" approach to detecting this type of malware, along with two ways of scanning for it: an inside-the-box approach and an outside-the-box approach. Since return values and actual arguments must pass through many layers when a system call is made, ghostware is afforded many opportunities to intercept function calls. To counter this vulnerability, the authors' proposed method compares the results of a high-level system call like "dir" with a low-level access of the same data that does not use a system call, for example, reading the Master File Table (MFT) directly. This process constitutes the "cross-view diff-based" approach.
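The core of the cross-view diff is a set comparison between the two vantage points. The sketch below assumes the two listings have already been gathered; obtaining a trustworthy low-level listing is, of course, the hard part of the technique.

    def cross_view_diff(high_level_listing, low_level_listing):
        # Entries visible to a low-level scan (e.g. reading file-system
        # structures directly) but hidden from the high-level API view are
        # candidate ghostware resources.
        return set(low_level_listing) - set(high_level_listing)

    api_view = {"notepad.exe", "report.doc"}
    raw_view = {"notepad.exe", "report.doc", "rootkit.sys"}
    print(cross_view_diff(api_view, raw_view))  # {'rootkit.sys'}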
The inside-the-box approach mandates that the comparison of the high-level and low-level results occur within the same machine. However, one may imagine that the ghostware compromises the entire Operating System, in which case the low-level results gathered from inside the box can no longer be trusted; the outside-the-box approach addresses this by taking the low-level view from outside the potentially compromised system.
Self-Nonself
Forrest et al.'s technique in [17] is generally infeasible for real detection tools. The goal of the proposed technique is to detect modifications to the data being protected; a caveat is that it cannot detect the removal of items from the protected data collection. "Self" is defined as the protected data, and "Other" is all data that does not match Self. It may already be clear why this is infeasible: to be effective, Other must be approximated, and the approximation proposed by the authors still carries unpalatable computational costs. The idea is that if Self is modified, a match will be made with an element from the Other collection. Since the probability of two arbitrarily sized strings matching exactly is extremely low, the authors relax the notion of matching and allow it to be defined by the user. For example, when comparing strings of size ten, a match can be defined as a contiguous subsequence of length two starting at the same position in both strings.
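The relaxed matching rule can be written directly from the description above: two strings match if some length-r substring is identical at the same position in both. The detector string and the value r = 2 below are hypothetical.

    def r_contiguous_match(a: str, b: str, r: int) -> bool:
        # Match if some length-r contiguous piece agrees at the same position.
        return any(a[i:i + r] == b[i:i + r] for i in range(len(a) - r + 1))

    def detects_change(modified: str, detectors, r: int) -> bool:
        # Detectors are strings drawn from "Other" (they match no Self string).
        # If a modification makes the protected data match a detector, flag it.
        return any(r_contiguous_match(modified, d, r) for d in detectors)

    protected = "ABCDEFGHIJ"
    detectors = ["QQXDQQQQQQ"]      # matches protected nowhere under r = 2
    mutated = "ABXDEFGHIJ"          # single-character mutation of Self
    print(detects_change(mutated, detectors, r=2))  # True ("XD" at position 2)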
In some of the tests conducted, the authors created infected files by mutating a single character in a file. The results showed that with more detectors available, this modification is discovered at a higher rate. They also evaluated their technique on a real virus, the TIMID virus: with 2 detectors, the virus was detected 76 percent of the time, and with 10 detectors it was found virtually all the time.
The technique builds a profile of normal audit-trails that is compared to the audit-trails of the PUI; the PUI's audit-trails are captured by the Operating System.
A potential disadvantage of this technique is that the malware would be detected only after the attack. Another is that the technique can only be as granular as the Operating System's auditing mechanism. No empirical study of the approach's effectiveness was given.
To handle legitimate non-LIFO control flow, Lee et al. describe several alternatives. One allows users to disable SRAS at will. Another forces the user to recompile the application and allows only certain types of non-LIFO behavior, namely going deeper into the stack. A third involves no recompilation and instead dynamically inserts SRAS push and pop instructions into the executable for the known non-LIFO procedures. Lee et al.'s simulation revealed that their technique had negligible performance consequences (less than 1 percent degradation for an SRAS of 128 entries) on 12 SPEC2000 integer benchmarks; on one benchmark (parser), with a 64-entry SRAS, the degradation was 2.11 percent.
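The underlying mechanism is a shadow stack of return addresses: push on call, pop and compare on return. The following Python simulation of that hardware behavior is illustrative only; the addresses and entry count are assumptions.

    class SRAS:
        def __init__(self, entries=128):
            self.stack = []
            self.entries = entries

        def call(self, return_addr):
            # Hardware pushes the return address on every procedure call.
            if len(self.stack) < self.entries:  # a real SRAS handles overflow
                self.stack.append(return_addr)

        def ret(self, return_addr_on_stack):
            # Hardware pops and compares on return; a mismatch means the
            # in-memory return address was corrupted.
            return bool(self.stack) and self.stack.pop() == return_addr_on_stack

    sras = SRAS()
    sras.call(0x4011A0)
    print(sras.ret(0x4011A0))    # True: clean return
    sras.call(0x4011A0)
    print(sras.ret(0xDEADBEEF))  # False: smashed return address detected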
Suh et al. [45] propose dynamic information flow tracking, which requires support from both the Operating System and the processor. Their method is based on the notion that data can be marked as either safe or unsafe (authentic or spurious, respectively).
A security policy specified by the Operating System identifies the spurious data, typically the privileged I/O of networking mechanisms, by setting a tag bit on the identified data. The processor then ensures that control flow in the program is not contingent on spurious data: spurious data should not be a jump target, nor should it be the address of a load/store operation, unless it is within a known bound that has been explicitly checked. If the processor finds that control depends on spurious data (by checking the bit tagged by the Operating System), it throws an exception that is caught by the Operating System, which typically kills the offending process.
The data is tracked through dependencies: if any data is created that depends on spurious data, then it too is spurious. For example, if one operand of an add operation is marked spurious and the other is not, the result will be marked spurious.
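The propagation rule can be illustrated with a toy tagged-value sketch in which every derived value inherits the spurious bit of its operands and control transfers on spurious data raise an exception, modeling the processor/Operating System interaction described above. The values are hypothetical.

    from dataclasses import dataclass

    @dataclass
    class Value:
        v: int
        spurious: bool = False  # tag bit set by the OS on untrusted input

    def add(a, b):
        # Dependency tracking: a result derived from any spurious operand
        # is itself spurious.
        return Value(a.v + b.v, a.spurious or b.spurious)

    def jump_to(target):
        # Models the processor check: control must not depend on spurious data.
        if target.spurious:
            raise RuntimeError("control transfer depends on spurious data")

    base = Value(0x400000)                 # trusted code address
    offset = Value(0x1337, spurious=True)  # attacker-influenced network input
    try:
        jump_to(add(base, offset))
    except RuntimeError as exc:
        print("process would be killed:", exc)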
The authors evaluated their approach against stack buffer overflows, heap buffer
overflows, vudo (heap buffer overflow), and format string attacks. Their approach was
able to stop all attacks with no false alarms.
In Milenkovic et al.'s approach, if a basic block's runtime signature does not match its stored signature, indicating injected code, then the basic block under inspection is considered malicious. Milenkovic et al. determined that their technique did not decrease system efficiency significantly, and consequently suggest that it is viable.
The authors compared their reverse sequential hypothesis testing to their implementation of virus throttling, a mechanism used to reduce the number of outgoing first-contact connections from a host. The basic idea of virus throttling is to queue outgoing first-contact connections whenever the current working set exceeds some threshold. For example, if the threshold is 5 and the host has made 5 first-contact connections with other hosts, subsequent first-contact connections are queued; then, once per second, the least recently used first-contact connection in the working set is removed, and a queued first-contact connection is dequeued and placed in the current working set. The authors' algorithm, though operationally slower than virus throttling, was more than twice as effective.
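A minimal sketch of virus throttling as just described, with an assumed threshold of 5 and a once-per-tick release, follows.

    from collections import OrderedDict, deque

    class VirusThrottle:
        def __init__(self, threshold=5):
            self.working = OrderedDict()  # hosts, least recently used first
            self.queue = deque()          # delayed first-contact connections
            self.threshold = threshold

        def connect(self, host):
            if host in self.working:
                self.working.move_to_end(host)  # not a first contact
            elif len(self.working) < self.threshold:
                self.working[host] = True       # allowed immediately
            else:
                self.queue.append(host)         # delayed

        def tick(self):
            # Once per second: evict the LRU host so one queued connection
            # can enter the working set and proceed.
            if self.queue:
                self.working.popitem(last=False)
                self.working[self.queue.popleft()] = True

    t = VirusThrottle()
    for h in ["a", "b", "c", "d", "e", "f", "g"]:
        t.connect(h)
    print(len(t.queue))  # 2 connections ("f", "g") held back
    t.tick()
    print(len(t.queue))  # 1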
During the detection phase, static specification-based detection uses the structural
properties of the PUI to determine its maliciousness.
In Bergeron et al.'s approach, the program under inspection is checked against the behavioral specifications, which simultaneously makes the detection problem easier. If the specifications are violated, then the program under inspection is deemed malicious. Bergeron et al. give no empirical study of their proposed method.
This work differs from [2] primarily in its specification of how to derive a high-level imperative representation and its use of program slicing. For example, to make the disassembled code easier to analyze, the program stack is eliminated: all pop() and push() calls are replaced with mov() instructions that move values into or out of temporary variables. Program slicing produces the subset of program statements considered security-relevant, given a slicing criterion (a node from the CFG and a subset of the program variables).
The safety properties checked include the following: control flow safety, memory safety, and stack safety. At the time of the paper, the authors were still developing the implementation, so no preliminary results were given. The ideas in this paper are very similar to those in [14]; however, no comparison of the two approaches is given.
DOME
Rabek et al. [38] offer a technique called DOME (Detection Of Malicious Executables), designed to detect injected, dynamically generated, and obfuscated code. DOME is characterized by two steps. In the first step, DOME statically preprocesses the PUI, saving (1) the addresses of system calls in the executable, (2) their names, and (3) the address of the instruction directly following each system call, that is, the system call's return address. In the second step, DOME monitors the executable at runtime, ensuring that all system calls made at runtime match those recorded during the static analysis of the first step. The API is instrumented with pre-stub and, optionally, post-stub code at load time; the pre-stub code ensures that items 1 through 3 from the preprocessing stage match what is seen at execution time. In their proof-of-concept study, Rabek et al. found that DOME was able to detect all system calls made by the malicious code.
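The two steps can be sketched as follows. The disassembly format, the module naming, and the 5-byte near-call assumption are all illustrative; DOME's actual preprocessing and Win32 API instrumentation are more involved.

    def preprocess(disassembly):
        # Static pass: record (call name, expected return address) for each
        # Win32 API call site found in the disassembly. The disassembly is a
        # hypothetical list of (address, mnemonic, target) triples.
        expected = set()
        for addr, mnemonic, target in disassembly:
            if mnemonic == "call" and target.startswith("kernel32!"):
                expected.add((target, addr + 5))  # x86 near call is 5 bytes
        return expected

    def monitor(call_name, return_addr, expected):
        # Runtime pass: a system call whose (name, return address) pair was
        # not seen statically indicates injected or generated code.
        return (call_name, return_addr) in expected

    code = [(0x401000, "call", "kernel32!CreateFileW"),
            (0x401005, "mov", "eax")]
    ok = monitor("kernel32!CreateFileW", 0x401005, preprocess(code))
    bad = monitor("kernel32!WinExec", 0x7FFD0000, preprocess(code))
    print(ok, bad)  # True False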
In general, this approach is intractable. Wagner and Dean use the branching factor to help evaluate their approach: if an application's execution were frozen at some point, the branching factor would be the number of system calls allowed to execute next without setting off any alarms. Having a small branching factor is desirable, as it suggests that attackers who try to circumvent IDSs by mimicking the normal behavior of the PUI (a mimicry attack) have fewer ways to successfully attack a system without being detected. Wagner and Dean found that checking the arguments to system calls helped with performance and precision, suggesting that it is always desirable to check system call arguments, as doing so makes the models more precise and decreases the number of valid paths for a given model and, consequently, a given execution. In evaluating their models, the authors found that the abstract stack model was generally the most precise, followed by the call graph model and then the digraph model.
Wagner and Dean implemented their system on Red Hat Linux, whereas the related approach of Giffin et al. [18] targeted Solaris 8.
StackGuard
StackGuard [13] prevents a type of buffer overflow attack called "stack smashing." StackGuard is a compiler extension that can detect changes to active return addresses, or simply prevent writing to them, for programs compiled with StackGuard. Detection of the attack exploits the fact that stack smashing is a linear attack: typically, stack smashing overwrites everything up to and including the return address. StackGuard therefore inserts a "canary word" adjacent to the return address and checks it before control transfers to the return address; if the canary word has changed, control is not transferred. To prevent overwriting of a return address altogether, StackGuard leverages MemGuard, a tool offering memory protection: when a function is active (has been called), its return address is made read-only via its virtual page, and the location is made writable again after the function returns. In all experiments conducted by the authors, StackGuard was able to detect the stack smashing attacks.
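A toy simulation of the canary mechanism illustrates why a linear overwrite is caught: any write that reaches the return address must first cross the canary. The frame layout and 4-byte canary are simplifications of the real compiler-inserted arrangement.

    import secrets

    CANARY = secrets.token_bytes(4)  # random canary word chosen at startup

    def make_frame(buf_size):
        # A toy stack frame: [buffer | canary | return address].
        return bytearray(buf_size) + bytearray(CANARY) + bytearray(b"\x40\x10\x00\x00")

    def unsafe_copy(frame, data):
        frame[:len(data)] = data  # no bounds check: a linear overwrite

    def check_and_return(frame, buf_size):
        # StackGuard's check: validate the canary before using the return address.
        if bytes(frame[buf_size:buf_size + 4]) != CANARY:
            raise RuntimeError("stack smashing detected")
        return frame[buf_size + 4:]

    frame = make_frame(16)
    unsafe_copy(frame, b"A" * 24)  # overruns the 16-byte buffer
    try:
        check_and_return(frame, 16)
    except RuntimeError as e:
        print(e)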
SPiKE
SPiKE [49] is a framework designed to help users monitor the behavior of applications, with the goal of finding malicious behavior. The monitoring is done by instrumenting Operating System services (system calls). The authors claim that "most if not all malware are sensitive to code modification." If this is true, then for binary instrumentation to be useful, it must be stealthy to malware. For example, instrumentation introduces abnormal latency into a system call; malware may read the real-time clock and infer that it is being monitored because system calls are taking too long to complete. SPiKE hides itself from such checks by applying a clock patch so that any request returns a time close to what the malware would expect, i.e., a time consistent with what the real-time clock would have shown had the system call executed normally.
SPiKE allows instrumentation anywhere in an executable through the use of "drifters." A drifter has two components: a code breakpoint and an instrument. Code breakpoints are implemented by setting the "not-present" attribute of the virtual page containing the to-be-instrumented memory location; once the page-fault exception is raised, an opportunity for stealthy instrumentation presents itself. The instrument component of a drifter is simply whatever monitoring functionality the user desires to perform. In Vasudevan and Yerraballi's assessment, SPiKE successfully tracked the W32.MyDoom Trojan, a malware instance intelligent enough to identify traditional binary instrumentation.
Signature-based detection decides maliciousness by checking the PUI against a known signature. Currently, we rely primarily on human expertise to create the signatures that represent the malicious behavior exhibited by programs. Once a signature has been created, it is added to the signature-based method's knowledge base (i.e., its repository). A major drawback of the signature-based method is that it cannot detect zero-day attacks, that is, attacks for which there is no corresponding signature in the repository.
Figure 3 illustrates the major disadvantage of signature-based methods. Since the set of possible malicious behaviors, U, is infinitely large, there is no known technique for accurately representing U via signatures, and a repository of signatures is a weak approximation of U. Another drawback of signature-based methods is that human involvement and expertise are usually needed to develop the signatures. This not only allows for the introduction of human error but also takes considerably more time than if signature development were completely automated. Given that some malware can spread extremely fast, the capability to quickly develop an accurate signature becomes paramount. Automated signature builders do exist [26], but more work needs to be done in this area.
Building generalized signatures makes them less susceptible to obfuscations. Another salient reason for building signatures in this manner is to minimize the number of malware signatures stored in the repository. Although storage is not currently an issue, it could become a serious one over time, as repository size directly affects the time complexity of the malware detector.
Behavioral signatures are derived from the data coming into and going out of a single node. One base signature is a server changing into a client: since a worm must propagate itself, after compromising a server it must once again act as a client toward another host in hopes of infecting more machines, typically via the same vulnerability. This approach to detection is less effective in a peer-to-peer environment.
Another base signature is alpha-in and alpha-out. This base signature simply says that worms typically send similar data across nodes, and therefore often have similar if not identical ingress and egress data flows. This signature is limited in that, for some services, it is not unusual to send out similar data; for example, there is nothing unusual about file servers receiving and sending similar data.
Another form of behavioral signature is called fanout. Fanout simply places a threshold on the number of descendants a host can have at any given time. The descendant relation is an example of an inductive signature. Inductive signatures assume there is some set of infected hosts responsible for infecting other hosts, which in turn each infect more than one other host, causing exponential growth in infections, a strong indicator that a worm is spreading rapidly. To create a signature for this type of behavior, thresholds must be set on the following: (1) the tree's depth, (2) the number of descendants in the tree, (3) the average branching factor, and (4) the time it takes to reach a particular tree depth.
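A fanout-style check reduces to computing statistics over the contact tree and comparing them to thresholds, as in the following sketch; the adjacency map and the thresholds are hypothetical.

    def tree_stats(children, root):
        # Depth and descendant count of a contact/infection tree given as a
        # parent -> [children] adjacency map.
        def walk(node, depth):
            kids = children.get(node, [])
            sub = sum(walk(k, depth + 1) for k in kids)
            walk.max_depth = max(walk.max_depth, depth)
            return len(kids) + sub
        walk.max_depth = 0
        descendants = walk(root, 0)
        return descendants, walk.max_depth

    edges = {"h0": ["h1", "h2"], "h1": ["h3", "h4"],
             "h2": ["h5"], "h3": ["h6"]}
    desc, depth = tree_stats(edges, "h0")
    print(desc, depth, desc > 5 and depth > 2)  # 6 3 True -> flag as worm-like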
The authors analyzed the server-to-client and alpha-in/alpha-out signatures. The server-to-client signature was found to be perfectly sensitive to active worms that change a server into a client. The alpha-in/alpha-out signature is dependent on the threshold value chosen for alpha; for example, an alpha value of 1 would be very unhelpful, as the false alarm rate would be exceedingly high.
Static signature-based detection examines the PUI against known signatures for a match. A major advantage of the static signature-based method is that the PUI can be analyzed, and its maliciousness accurately determined, without having to run the executable.
SAVE
Sung et al. [47] propose a method called Static Analysis for Vicious Executables (SAVE). The signature of a given virus is a sequence of Windows API calls, with each API call represented by a 32-bit number: the most significant 16 bits identify the module the API call belongs to, and the least significant 16 bits give the API function's position in a vector of API functions. The Euclidean distance is calculated between known signatures and the sequence of API calls found in the program under inspection, and the average of three similarity functions gives the similarity of the PUI's API sequence with the signatures from the repository. If the difference is 10 percent or less, the PUI is flagged as malicious.
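The encoding and the comparison can be sketched as follows. The module identifiers are hypothetical, and a single positional-agreement score stands in for the three similarity functions that SAVE averages.

    def encode(module_id: int, func_index: int) -> int:
        # 32-bit API-call code: high 16 bits identify the module, low 16 bits
        # the function's index within that module's API vector.
        return (module_id << 16) | (func_index & 0xFFFF)

    def decode(code: int):
        return code >> 16, code & 0xFFFF

    def similarity(sig, candidate) -> float:
        # Fraction of positions that agree; an illustrative stand-in for
        # SAVE's average of three similarity functions.
        n = min(len(sig), len(candidate))
        same = sum(1 for i in range(n) if sig[i] == candidate[i])
        return same / max(len(sig), len(candidate))

    KERNEL32, ADVAPI32 = 1, 2  # hypothetical module identifiers
    signature = [encode(KERNEL32, 10), encode(KERNEL32, 42), encode(ADVAPI32, 7)]
    sample = [encode(KERNEL32, 10), encode(KERNEL32, 42), encode(ADVAPI32, 9)]
    print(similarity(signature, sample))  # 0.66...; flag if within tolerance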
Sung et al. compared SAVE to 8 malware detectors: Norton, McAfee UNIX Scanner, McAfee, Dr. Web, Panda, Kaspersky, F-Secure, and Anti Ghostbusters. They tested all of these scanners against variants of W32.Mydoom, W32.Bika, W32.Beagle, and W32.Blaster.Worm. SAVE was the only detector able to detect all variants of the aforementioned malware in the study.
Semantics-aware
In the work of Christodorescu et al. [11], malware signatures are represented by templates. Each template is a 3-tuple of instructions, variables, and symbolic constants. Templates attempt to generalize the signature of a malware instance while maintaining the essence of the malicious code's behavior. Three steps are needed to decide whether the PUI is malicious. First, the PUI is converted into a platform-independent intermediate representation (IR), which is a variant of the x86 language. Next, a control flow graph is computed for the intermediate representation of the PUI and compared to the control flow graph of the template. Finally, comparison is done via def-use pairs: if for each def-use pair found in the template there is a corresponding def-use pair in the IR of the PUI, then the program is malicious.
Christodorescu et al.'s results indicate that their template-based approach can detect malware variants with zero false positives. The 21 email worm instances tested were derived from the following malware families: Netsky, B[e]agle, and Sober. This detection ability was achieved with only 2 templates: a decryption loop template and a mass-mailing template. The mass-mailing behavior of the Sober worm was not detected; however, this was because the implementation at the time of the work did not support the Microsoft runtime library.
Christodorescu et al. also found that on 2,000 benign Windows programs, their algorithm did not produce any false positives. The authors also evaluated their method's resilience to obfuscation, namely garbage insertion, using 3 types of garbage insertion: nop insertion (inserting nop instructions), stack-op insertion (inserting stack operations that do not change the semantics of the malware), and math-op insertion (inserting arithmetic operations into the malware). B[e]agle.Y was chosen as the malware to obfuscate. Christodorescu et al.'s algorithm outperformed McAfee VirusScan on nop insertion, stack-op insertion, and math-op insertion by 25 percent, 75 percent, and 90 percent, respectively.
Honeycomb
Kreibich and Crowcroft [26] proposed Honeycomb, a system that uses honeypots to generate signatures and detect malware in network traffic. The technique operates under the assumption that any traffic directed to a honeypot is suspicious.
Honeycomb stores information about each connection, even after the connection has terminated, though the number of connections it can save is limited. The reassembled stream of each connection is stored, and the Longest Common Subsequence (LCS) algorithm is used to determine whether a match exists between stored connections and new connections that Honeycomb receives.
Since Honeycomb needs a set of signatures to compare against, it initially uses anomalies in the connection stream to create signatures. For example, if odd TCP flags are found, a signature is generated for that stream; the signature is the incoming stream minus Honeycomb's responses to it.
To detect malicious code, horizontal and vertical detection schemes are used. In the horizontal approach, the last (nth) message of an incoming stream is compared to the nth message of all streams stored by Honeycomb. In the vertical approach, the messages of a stream are aggregated, and the LCS algorithm is run on the aggregated form of the newly arrived stream and the aggregated forms of the stored streams.
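The signature-extraction core is the LCS computation itself. The following dynamic-programming sketch follows the survey's terminology (longest common subsequence); the sample streams are illustrative.

    def lcs(a: bytes, b: bytes) -> bytes:
        # Longest common subsequence of two reassembled streams; the shared
        # content between suspicious connections becomes a candidate signature.
        m, n = len(a), len(b)
        dp = [[b""] * (n + 1) for _ in range(m + 1)]
        for i in range(m):
            for j in range(n):
                if a[i] == b[j]:
                    dp[i + 1][j + 1] = dp[i][j] + a[i:i + 1]
                else:
                    dp[i + 1][j + 1] = max(dp[i][j + 1], dp[i + 1][j], key=len)
        return dp[m][n]

    s1 = b"GET /default.ida?NNNNNN HTTP/1.0"
    s2 = b"GET /default.ida?NNNNXX HTTP/1.0"
    print(lcs(s1, s2))  # shared content -> candidate signature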
Signatures that see little use are deleted from the signature queue. Also, if a new signature is identical to, or a subset of, an existing signature, it is not added to the signature pool; this keeps the pool as small as possible. At regular intervals, discovered signatures are reported and logged to another module. In their empirical study, Kreibich and Crowcroft were able to develop "precise" signatures for the Slammer and CodeRed II worms.
MEDiC
MEDiC stands for New Mexico Tech's Malware Examiner using Disassembled Code [46]. The method proposed by Sulaiman et al. attempts to identify malicious code by comparing the assembly code of the PUI with known malicious signatures. The PUI is disassembled using PE Explorer, producing ASM code. Since assembly code is essentially a collection of key/value pairs, where the key is a code label and the value is the instructions for that label, MEDiC compares programs on the basis of these key/value pairs. A dictionary threshold determines whether a key/value pair is "important"; pairs deemed important are recorded as checkpoints. The virus threshold decides whether a PUI is malicious: it is the lowest acceptable ratio of matches to the number of checks performed. MEDiC has three scanning phases. In the first phase, the key/value pairs are compared with those in the signature set; if the virus threshold is not surpassed, MEDiC proceeds to the second phase. In the second phase, comparisons are done only on the value component of each key/value pair, effectively disregarding the key. This phase is robust against malicious code that renames program labels in an attempt to obfuscate its attack. The third phase is described as a "more thorough" process: a search threshold relaxes the matching constraint, allowing some minimum number of instructions to match before MEDiC decides it has found a match between a known signature's instructions and the PUI's instructions.
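Phases one and two can be sketched as a comparison of label-to-instructions maps, first by key and value and then by value alone, as below; the thresholds, labels, and instruction lists are hypothetical.

    def match_ratio(pui, signature, ignore_labels=False):
        # Phase one compares full key/value pairs; phase two
        # (ignore_labels=True) compares instruction lists only, which
        # defeats label renaming.
        checks, matches = 0, 0
        sig_values = list(signature.values())
        for label, instrs in pui.items():
            checks += 1
            if ignore_labels:
                matches += instrs in sig_values
            else:
                matches += signature.get(label) == instrs
        return matches / checks if checks else 0.0

    sig = {"start": ["push ebp", "mov ebp,esp"], "decrypt": ["xor eax,eax", "loop"]}
    pui = {"entry": ["push ebp", "mov ebp,esp"], "d1": ["xor eax,eax", "loop"]}
    print(match_ratio(pui, sig))                      # 0.0: labels renamed
    print(match_ratio(pui, sig, ignore_labels=True))  # 1.0: exceeds a virus threshold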
When tested against 13 different viruses, MEDiC outperformed Norton, McAfee, Dr. Web, Panda, Kaspersky, F-Secure, and PC-cillin. The 13 viruses were variants of Dos, CodeGreen, Aircop, Badboy, and Yaha. Also to MEDiC's credit, no false positives were observed.
The hybrid signature-based detection approach uses static and dynamic properties to
determine the maliciousness of the PUI.
In this approach, malicious behaviors are modeled as state machines. The user specifies the detection policy (or signature), and when a match occurs, the mobile application is considered malicious. In an experiment, the tool was able to detect 600 virus/worm samples.
Under a given condition, the anti-worm enters the passive anti-worm scheme. An example condition could be a timer: for instance, after three hours of the active anti-worm scheme, the anti-worm changes into the passive scheme. The hybrid was just as successful as the active anti-worm scheme, with less network traffic.
The IDS-based anti-worm scheme was the last propagation scheme evaluated by the authors. The idea is that IDS sensors would be deployed throughout the Internet to detect malicious traffic traveling between nodes. Once malicious packets were identified, anti-worms would be sent to the sender and the intended recipients. The effectiveness of this approach hinges on the packet capture rate of the sensors placed at various nodes around the Internet: with a packet capture rate of 0.2 percent, the number of infected hosts stays well below 15 percent, while with a lower capture rate of 0.05 percent, the number of infected hosts appears to exceed 30 percent of all hosts.
A second class of tell-tale signs mandates that checks be done on dereferenced pointers to ensure they hold valid addresses and that pointers and arrays do not overflow. The third class of tell-tale signs requires human expertise and may utilize program slicing. A tell-tale sign from this class is "identification of changes": for example, if what a user writes is maliciously altered before it reaches the reading application on his system, slicing on the write() and read() system calls may be useful. An example of how the technique works on a .c file is given in the paper; however, no empirical study evaluating the method is given.
5 Summary
In this survey we have presented a series of techniques, examples, issues, and topics within the area of malware detection. We have proposed a novel classification scheme for malware detection techniques, and we have identified inadequacies in both the signature-based and the anomaly-based (including specification-based) detection methods.
Table 1 classifies the detection techniques described in this survey. The sample of malware detection techniques depicted here is an indicator of current trends in malware detection and raises some interesting questions. For example, why are there so few (only one in this survey) static anomaly-based techniques that are not specification-based? Is specification-based detection the most promising malware detection technique, given that it seems to have the most techniques in the literature? Does the majority of the research community simply find anomaly-based and specification-based detection more interesting than signature-based detection? Given Figure 2 and Figure 3, should we be contemplating other strategies for malware detection?
To answer these questions effectively, a universal metric for malware detection ability needs to be developed. Since certain Operating Systems and file systems are more susceptible to malware attacks than others, this should also be considered in the development of such a metric. Currently, evaluations of malware detection techniques seem rather arbitrary: when the creator of technique A compares it to technique B, it is usually unclear whether technique B had its parameters optimized for detection. It is also unclear whether the set of test subjects (malware) used for techniques A and B had properties that made technique A appear better and, if so, whether another set of test subjects would make technique B appear better.
The literature suggests that COTS malware detectors are easily circumvented by obfuscation. Every malware detector prototype from the literature that is compared to COTS malware detectors outperforms them. Given the current state of malware detection research, outperforming COTS malware detectors should be treated as a baseline achievement, not as a sign of a particularly strong malware detection technique.
References
[1] F. Adelstein, M. Stillerman, and D. Kozen. Malicious code detection for open firmware. In Proceedings of the 18th Annual Computer Security Applications Conference, 2002.
[2] J. Bergeron, M. Debbabi, J. Desharnais, M.M. Erhioui, and N. Tawbi. Static detection of malicious code in executable programs. Int. J. of Req. Eng., 2001.
[3] J. Bergeron, M. Debbabi, M.M. Erhioui, and B. Ktari. Static analysis of binary code to isolate malicious behavior. In 8th Workshop on Enabling Technologies: Infrastructure for Collaborative Enterprises, 1999.
[5] F. Castaneda, E. C. Sezer, and J. Xu. Worm vs. worm: preliminary study of an active counter-attack mechanism. In Proceedings of the 2004 ACM Workshop on Rapid Malcode, 2004.
[6] CERT/CC, Carnegie Mellon University. http://www.cert.org/present/cert-overview-trends/module-2.pdf, May 2003.
[7] CERT/CC, Carnegie Mellon University. http://www.cert.org/present/cert-overview-trends/module-4.pdf, May 2003.
[8] CERT/CC, Carnegie Mellon University. http://www.cert.org/stats/cert_stats.html#incidents, last updated April 2006.
[16] E. Filiol. Malware pattern scanning schemes secure against black-box analysis. Journal in Computer Virology, 2006.
[18] J. T. Giffin, S. Jha, and B. Miller. Detecting manipulated remote call streams. In 11th USENIX Security Symposium, 2002.
[19] J. Gordon. Lessons from virus developers: The Beagle worm history through April 24, 2004. SecurityFocus, May 2004.
[20] W. Halfond and A. Orso. AMNESIA: Analysis and monitoring for neutralizing SQL-injection attacks. In Proceedings of the 20th IEEE/ACM International Conference on Automated Software Engineering, pages 174-183, 2005.
[30] W. Lee and S. Stolfo. Data mining approaches for intrusion detection. In Proceedings of the 7th USENIX Security Symposium, 1998.
[31] W. Li, K. Wang, S. Stolfo, and B. Herzog. Fileprints: Identifying file types by n-gram analysis. In 6th IEEE Information Assurance Workshop, June 2005.
[33] R.W. Lo, K.N. Levitt, and R.A. Olsson. MCF: A malicious code filter. Computers and Security, pages 541-566, 1995.
[34] W. Masri and A. Podgurski. Using dynamic information flow analysis to detect attacks against applications. In Proceedings of the 2005 Workshop on Software Engineering for Secure Systems: Building Trustworthy Applications, May 2005.
[35] G. McGraw and G. Morrisett. Attacking malicious code: A report to the Infosec Research Council. IEEE Software, 17(5):33-44, 2000.
[37] A. Mori, T. Izumida, T. Sawada, and T. Inoue. A tool for analyzing and detecting malicious mobile code. In Proceedings of the 28th International Conference on Software Engineering, pages 831-834, 2006.
[39] San Diego Supercomputer Center. http://security.sdsc.edu/incidents/worm.2000.01.18.shtml, June 2002.
[40] I. Sato, Y. Okazaki, and S. Goto. An improved intrusion detection method based on process profiling. IPSJ Journal, 43:3316-3326, 2002.
[45] G. E. Suh, J. Lee, and S. Devadas. Secure program execution via dynamic information flow tracking. In International Conference on Architectural Support for Programming Languages and Operating Systems, 2004.
[47] A. Sung, J. Xu, P. Chavez, and S. Mukkamala. Static analyzer of vicious executables (SAVE). In Proceedings of the 20th Annual Computer Security Applications Conference (ACSAC '04), pages 326-334, 2004.
[48] C. Taylor and J. Alves-Foss. NATE: Network Analysis of Anomalous Traffic Events, a low-cost approach. In New Security Paradigms Workshop, 2001.
[49] A. Vasudevan and R. Yerraballi. SPiKE: Engineering malware analysis tools using unobtrusive binary-instrumentation. In Proceedings of the 29th Australasian Computer Science Conference, pages 311-320, 2006.
[50] D. Wagner and D. Dean. Intrusion detection via static analysis. In IEEE Symposium on Security and Privacy, 2001.
[54] A. Wespi, M. Dacier, and H. Debar. Intrusion detection using variable-length audit trail patterns. In Recent Advances in Intrusion Detection (RAID), 2000.
[55] J. Xiong. ACT: Attachment chain tracing scheme for email virus detection and control. In Proceedings of the ACM Workshop on Rapid Malcode (WORM), 2004.