MalGene: Automatic Extraction of Malware Analysis Evasion Signature

Dhilung Kirat, University of California, Santa Barbara, [email protected]
Giovanni Vigna, University of California, Santa Barbara, [email protected]

ABSTRACT

Automated dynamic malware analysis is a common approach for detecting malicious software. However, many malware samples identify the presence of the analysis environment and evade detection by not performing any malicious activity. Recently, an approach to the automated detection of such evasive malware was proposed. In this approach, a malware sample is analyzed in multiple analysis environments, including a bare-metal environment, and its various behaviors are compared. Malware whose behavior deviates substantially is identified as evasive malware. However, a malware analyst still needs to re-analyze the identified evasive sample to understand the technique used for evasion. Different tools are available to help malware analysts in this process. However, these tools in practice require considerable manual input along with auxiliary information. This manual process is resource-intensive and not scalable.

In this paper, we present MalGene, an automated technique for extracting analysis evasion signatures. MalGene leverages algorithms borrowed from bioinformatics to automatically locate evasive behavior in system call sequences. Data flow analysis and data mining techniques are used to identify call events and data comparison events used to perform the evasion. These events are used to construct a succinct evasion signature, which can be used by an analyst to quickly understand evasions. Finally, evasive malware samples are clustered based on their underlying evasive techniques. We evaluated our techniques on 2810 evasive samples. We were able to automatically extract their analysis evasion signatures and group them into 78 similar evasion techniques.

Categories and Subject Descriptors

C.2.0 [Computer-Communication Networks]: General—Security and protection; D.4.6 [Software Engineering]: Security and Protection—Invasive software (malware); J.3 [Computer Applications]: Life and Medical Sciences—Biology and genetics

Keywords

computer security; malware analysis; evasive malware; sequence alignment; bioinformatics

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].
CCS’15, October 12–16, 2015, Denver, Colorado, USA.
Copyright is held by the owner/author(s). Publication rights licensed to ACM.
ACM 978-1-4503-3832-5/15/10 ...$15.00.
DOI: http://dx.doi.org/10.1145/2810103.2813642

1. INTRODUCTION

Automated dynamic malware analysis is a common approach for analyzing and detecting a wide variety of malicious software. Dynamic analysis systems have become more popular because signature-based and static-analysis-based detection approaches are easily evaded using widely available techniques such as obfuscation, polymorphism, and encryption. However, many malware samples identify the presence of the analysis environment and evade detection by avoiding the execution of suspicious operations. Malware authors have developed several ways to detect the presence of malware analysis systems [13, 25, 26, 28, 29]. The most common approach is based on the inspection of specific artifacts related to the analysis systems. This includes checking for the presence of registry keys or I/O ports, background processes, function hooks, or IP addresses that are specific to some known malware analysis service. For example, a malware sample running inside a VirtualBox guest operating system can simply inspect VirtualBox-specific service names, or the hardware IDs of the available virtual devices, and check for the substring VBOX. Another approach to evasion is to fingerprint the underlying CPU that is executing the malware. For example, fingerprinting can be achieved by detecting differences in the timing of the execution of certain instructions, or a small variation in the CPU execution semantics [25, 29].

Recently, an approach to the automated detection of evasive malware has been proposed [17]. In this approach, malware is executed in a bare-metal execution environment as well as in environments that leverage virtualization and emulation. Malware behaviors are extracted from these executions and compared to detect deviations in the behavior, under the assumption that bare-metal execution represents the “real” behavior of the malware. Malware whose behavior deviates substantially among the execution environments is labeled as evasive malware. This way, evasive malware is identified without knowing the underlying evasion technique. This approach requires each malware sample to be run in a bare-metal environment. However, compared to a bare-metal environment, emulated and virtualized environments are easier to scale and they provide far better control and visibility over malware execution. For these practical reasons, emulated or virtualized sandboxes are widely used for large-scale automated malware analysis. However, keeping
emulated and virtualized sandboxes resistant to evolving evasion techniques is a current industry challenge. To combat sandbox evasion attacks, a complete understanding of evasion techniques is the first fundamental step, as this knowledge can help “fix” sandboxes and make them robust against evasion attacks. Currently, understanding evasion techniques is largely a manual process.

Several analysis tools are available to help analyze malware behavior differences [7, 15]. These tools are effective in performing manual, fine-grained analysis of evasive malware. However, they require additional auxiliary information, such as a set of system calls corresponding to malicious behavior or the selection of control-flow differences. Finding this auxiliary information is a manual process. This manual process is resource-intensive and not scalable. However, performing such analysis on a large scale is necessary to combat rapidly evolving evasion attacks.

In general, the manual process required to understand an evasion instance starts from two sequences of system call traces of the same malware sample when executed in two different execution environments. The malware sample evades one of the environments, creating a difference between the system call sequences. The first step of the evasion analysis involves finding the location in the system call traces where the execution deviates due to evasion. After accurately locating the deviation, understanding the evasion requires identifying environment-specific artifacts that are used for fingerprinting the analysis environment. In the first step, manually finding the location of the deviation in the system call sequence can be difficult. The naive approach of looking for the first call that differs between the two sequences does not work. System call traces are usually noisy, and there can be thousands of events in the sequence. Even when running the same program in exactly the same environment twice, the system call traces can be quite different. Thread scheduling is one of the main reasons for these differences. However, other factors, such as operating-system and library-specific aberrations, initialization characteristics, and timing, can play a substantial role. Another approach would be to take a diff of the sequences, under the assumption that there will be a large gap in the alignment corresponding to the evasion in one of the environments. However, this approach may not accurately align the sequences. A generic diff algorithm finds the longest common subsequence (LCS) of the sequences. This approach is effective when large portions of the sequences have unique alphabets, such as the lines of source code. However, a system call sequence has a limited alphabet, while the sequence itself is usually long. Because of this, instead of forming a gap, some subsequences of system calls corresponding to malicious behavior are likely to align with another sequence where the malicious behavior is absent.

In this paper, we present MalGene, an automatic technique for extracting human-readable evasion signatures from evasive malware. MalGene leverages local sequence alignment techniques borrowed from bioinformatics to automatically locate evasions in a system call sequence. Such sequence alignment techniques are widely used for aligning long sequences of DNA or proteins [11, 14, 27]. These algorithms are known to be effective even if there are large gaps and the size of the alphabet is limited, such as the alphabet of four bases, Thymine (T), Adenine (A), Cytosine (C), and Guanine (G), in the case of DNA sequences. We use data flow analysis and inverse document frequency-based techniques to automatically identify call events and data comparisons used by the evasion techniques. We build evasion signatures from these identified events. Finally, malware samples are clustered based on their underlying evasive techniques.

Our work makes the following contributions:

• We present MalGene, a system for automatically extracting evasion signatures from evasive malware. Our system leverages a combination of data mining and data flow analysis techniques to automate the signature extraction process, which can be applied to a large-scale sample set.

• We propose a novel bioinformatics-inspired approach to system call sequence alignment for locating evasions. The proposed algorithm performs deduplication and difference pruning, and can handle branched sequences.

• We evaluated our techniques on 2810 evasive samples. We were able to automatically extract their analysis evasion signatures and group them into 78 similar evasion techniques.

2. EVASION SIGNATURE MODEL

In general, malware evades analysis in two steps. First, it extracts information about the execution environment. Second, it performs some comparison on the extracted information to decide whether to evade or not. Usually, malware uses system calls and user-mode API calls in the first step to probe the execution environment. In the second step, it uses some predefined constant values or information extracted from previous system or user API calls. With this generalization, we define an evasion signature as a set of system call events, user API call events, and comparison events that are used as the basis for evading the analysis system. A comparison event is an execution of a comparison instruction, such as a CMP instruction in the x86 instruction set. Usually, the execution of one such instruction is necessary to make the control flow decision during evasion, which is the second step of the evasion process mentioned earlier.

Formally, let P be the set of all call events (both system calls and API calls) and Q be the set of all comparison events that are used by an evasion technique; we define the evasion signature ∆ of this technique as:

\[
\Delta = P \cup Q
\]

We represent a call event p ∈ P as a pair (name(p), attrib(p)), where name(p) represents the name of the call, e.g., NtCreateFile, and attrib(p) represents the name of the operating system object associated with the call, e.g., C:/boot.ini. We represent a comparison event q ∈ Q as a pair (p, v), where p is a call event that produced the information in the first operand compared by event q, and v represents either some constant value used in the second operand, or another call event that produced the information for the second operand.

We extract the evasion signature ∆ of an evasive malware sample in two steps. In the first step, we locate the evasion in the call sequences resulting from the execution of the malware in different environments, as described in Section 3.2.3. In the second step, we identify the elements of ∆ used for the evasion, as described in Section 4.
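
The signature model above maps naturally onto a small data structure. The following is a minimal sketch, not the authors' implementation: the class and function names (`CallEvent`, `ComparisonEvent`, `evasion_signature`) are hypothetical, and the sample events are illustrative values assembled from examples mentioned in the text (a BIOS-version registry query and the VBOX substring), not output from MalGene.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, Union


@dataclass(frozen=True)
class CallEvent:
    """A call event p, modeled as the pair (name(p), attrib(p))."""
    name: str          # e.g. "NtCreateFile"
    attrib: str = ""   # name of the associated OS object, if any


@dataclass(frozen=True)
class ComparisonEvent:
    """A comparison event q, modeled as the pair (p, v)."""
    source: CallEvent                    # call that produced the first operand
    value: Union[str, int, CallEvent]    # constant, or call that produced the second operand


Event = Union[CallEvent, ComparisonEvent]


def evasion_signature(calls: Iterable[CallEvent],
                      comparisons: Iterable[ComparisonEvent]) -> FrozenSet[Event]:
    """Return the signature as the set union of P (calls) and Q (comparisons)."""
    return frozenset(calls) | frozenset(comparisons)


# Hypothetical example: a registry query whose result is compared against "VBOX".
probe = CallEvent("NtQueryValueKey", ".../SystemBiosVersion")
check = ComparisonEvent(probe, "VBOX")
signature = evasion_signature([probe, probe], [check])
```

Frozen dataclasses are hashable, so repeated events collapse when the set is built, which matches the set-based definition of ∆.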
The model defined by ∆ only captures those evasion techniques that must trigger some system or user API calls. The majority of known evasion techniques fall in this category. Some techniques may not directly make a system or API call, such as forced exception-based CPU fingerprinting [25]. However, such techniques indirectly trigger calls to exception handlers, which are captured by P. That said, in the case of an emulated CPU, there are known evasion techniques that are entirely based on the inspection of the FPU, memory, or register state after the execution of certain instructions. Some evasion techniques are based on stalling code. Our current model does not capture such evasion techniques.

3. SEQUENCE ALIGNMENT

The input to our system is a set of evasive malware samples detected by an automatic evasion detection system called BareCloud [17]. BareCloud provides information about which of the analysis environments a malware sample evades. To extract the evasion signature of an evasive malware sample, we analyze the sample in two analysis environments where it evades one of the environments while showing malicious activity in the other. In the first step, we start from the two sequences of system call events from these two analysis environments. Because the system calls related to the malicious activities are entirely missing in one of the sequences, there must be an observable deviation between the two sequences. The goal here is to efficiently and accurately find the location of the deviation in the sequence corresponding to the evasion. To do this, we first align the two sequences starting from the beginning, introducing gaps as required for an optimal alignment. We locate the deviation by finding the largest gap in the aligned sequence. We consider this location as the evasion point. The malware activity significantly differs after this point, implying evasion.

The intuition here is that an evasive malware sample must perform its evasion “check” in both environments before the evasion point. Once we locate the evasion point, we extract the evasion signature from the detailed analysis log, which contains user API calls and comparison events, as described in Section 4. Note that only system-call-level monitoring is required for locating the evasion point. This is advantageous because the monitoring of user API calls and comparison events may not be available in both analysis environments. However, most of the existing malware analysis systems are capable of producing system-call-level execution profiles.

Apart from malware evasion, other factors can cause deviation in the malware execution. We followed all strategies proposed in BareCloud [17] to limit deviations due to external factors. That is, we used an identical local network and identical internal software configurations for all execution environments. We executed each malware sample in both environments at the same time to mitigate date- and time-related deviations. We used network service filters to provide consistent responses to DNS and SMTP communications for all environments.

One simple approach to finding the largest gap in the alignment of system call sequences would be to take a diff of the sequences. However, the generic diff algorithm finds the longest common subsequence (LCS) of the sequences, which may not accurately align the sequences in our context. This is because a) a system call sequence is usually a long series of events drawn from a limited alphabet, e.g., around 300 system calls in the Windows platform, and b) the difference between the sequences tends to be large. System call names, when combined with their arguments, can increase the size of the alphabet. However, there are frequent system calls, such as NtAllocateVirtualMemory, that act on unnamed OS objects or nondeterministic argument values. Using nondeterministic argument values, such as memory addresses, creates too many undesirable mismatches, resulting in a poor alignment. In such cases, we discard the attrib() values of the system call events to get a more stable alignment. To illustrate this, let us take example sequences A and B as shown in Figure 1(a). Here, sequences A and B are system call sequences of the same malware sample when executed in two different execution environments. Sequence A corresponds to the execution environment where the malware evades analysis, while sequence B corresponds to the execution environment where the malware shows its malicious activity. The “malicious section” of sequence B, corresponding to the malicious activity of the malware sample, is illustrated with a darker background. This malicious section is missing in sequence A because the malware sample evades analysis. In this example, the LCS-based alignment matches the first three calls from A1 with B1, as expected. However, the rest of sequence A is matched with common subsequences from the malicious section of B to maximize the length of the common subsequence. In this case, it is an algorithmically optimal but semantically incorrect alignment. However, this is likely to happen because the malicious sections are usually long and the alphabet is limited in size. Note that the system call NtTerminateProcess does not align because such an alignment would result in a shorter common subsequence. However, the alignment of important call events is critical for accurately locating the evasion point. This LCS-based alignment example shows that the longest common subsequence may not always produce the most meaningful alignment of the system call sequences.

To address this problem, we propose to apply sequence alignment algorithms borrowed from bioinformatics. Such algorithms are used to identify regions of similarity in sequences of DNA, RNA, or proteins [11, 14, 27]. These regions of similarity usually correspond to evolutionary relationships between the sequences [22]. In the case of system call sequences, such similarity regions correspond to the execution of similar code or the same high-level library functions. While aligning system call sequences, the alignments of some system calls are more critical than others, such as the alignment of NtTerminateProcess in Figure 1(a), because they represent important events in the program execution. Sequence alignment algorithms from bioinformatics can prioritize such critical alignments. Furthermore, these algorithms support more versatile similarity scores among system calls, which can produce a better approximation of the alignments in the presence of noise in the sequences.

There are two approaches to sequence alignment: Global Alignment and Local Alignment. In the next section, we briefly describe these approaches.

3.1 Global and Local Alignments

When finding alignments, global alignment algorithms, such as Needleman-Wunsch [24], take the entirety of both sequences into consideration. It is a form of global optimization that forces the alignment to span the entire length [27].
[Figure 1: Sequence Alignments. Panels: (a) Diff (LCS); (b) Local Alignment. Each panel shows system call sequence A beside sequence B, from Start to End, with the Evasion point marked.]
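
The LCS-based misalignment illustrated in Figure 1(a) can be reproduced on a small scale. The sketch below is an illustration under stated assumptions, not the authors' tooling: the helper names (`lcs_alignment`, `largest_gap`) are hypothetical, and the two toy traces use invented, abbreviated call names rather than the sequences in the figure. It computes one longest common subsequence and then reports the largest run of unmatched events in B, which is the "largest gap" heuristic described in Section 3.

```python
from typing import List, Sequence, Tuple


def lcs_alignment(a: Sequence[str], b: Sequence[str]) -> List[Tuple[int, int]]:
    """Return matched index pairs (i, j) of one longest common subsequence."""
    n, m = len(a), len(b)
    # length[i][j] holds the LCS length of the suffixes a[i:] and b[j:].
    length = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n - 1, -1, -1):
        for j in range(m - 1, -1, -1):
            if a[i] == b[j]:
                length[i][j] = length[i + 1][j + 1] + 1
            else:
                length[i][j] = max(length[i + 1][j], length[i][j + 1])
    # Walk forward through the table to recover the matched positions.
    pairs, i, j = [], 0, 0
    while i < n and j < m:
        if a[i] == b[j]:
            pairs.append((i, j))
            i, j = i + 1, j + 1
        elif length[i + 1][j] >= length[i][j + 1]:
            i += 1
        else:
            j += 1
    return pairs


def largest_gap(pairs: List[Tuple[int, int]], len_b: int) -> Tuple[int, int]:
    """Return (start, size) of the longest run of events in B left unmatched."""
    best_start, best_size, prev = 0, 0, -1
    for j in [j for _, j in pairs] + [len_b]:
        size = j - prev - 1
        if size > best_size:
            best_start, best_size = prev + 1, size
        prev = j
    return best_start, best_size


# Toy traces (hypothetical). A evades after the check; B runs a malicious
# section (indices 2..7) before exiting with Map, Terminate, Close.
A = ["Query", "Read", "Map", "Terminate", "SetInfo", "Unmap", "Close"]
B = ["Query", "Read", "Open", "Map", "SetInfo", "Unmap", "Close",
     "Protect", "Map", "Terminate", "Close"]

lcs_pairs = lcs_alignment(A, B)
lcs_gap = largest_gap(lcs_pairs, len(B))

# The alignment an analyst would want anchors the exit path of A to the end of B.
intended_pairs = [(0, 0), (1, 1), (2, 8), (3, 9), (6, 10)]
intended_gap = largest_gap(intended_pairs, len(B))
```

On these toy traces the LCS matches six events by pairing the tail of A with calls inside the malicious section of B, leaves Terminate unaligned, and places the largest gap at index 7 of B. The intended alignment matches only five events but exposes the whole malicious section as a single gap starting at index 2.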

This approach is useful when there is no deviation in the malware behavior, or the deviation is minimal.

Local alignment algorithms, such as Smith-Waterman [30], tend to find good matches of local subsequences between two sequences. Hence, these algorithms identify regions of similarity within long sequences that are often widely divergent overall. This approach is better if there are large missing parts in the sequence. This is true for a system call sequence corresponding to evasion, such as sequence A in Figure 1(b), which is missing system calls corresponding to B2, the malicious section of B. For this reason, we use a Local Alignment algorithm for aligning system call sequences. Figure 1(b) represents the alignment using a local alignment algorithm. Notice that there is no undesirable alignment with the malicious section of sequence B. The NtTerminateProcess system call is aligned even though the total number of matches is smaller compared to the LCS-based alignment (8 vs. 9 matches). The alignment in Figure 1(b) is clearly the better alignment for locating the evasion point compared to the LCS-based alignment in Figure 1(a).

3.1.1 Local Alignment

In this section, we briefly describe the Smith-Waterman [30] local alignment algorithm.

Given two sequences A = a_1, a_2, ..., a_n and B = b_1, b_2, ..., b_m of length n and m respectively, a maximum similarity matrix H is computed using the following induction:

\[
H(i, 0) = 0, \quad 0 \le i \le n,
\]
\[
H(0, j) = 0, \quad 0 \le j \le m,
\]
and
\[
H(i, j) = \max \left\{
\begin{array}{l}
0 \\
H(i-1, j-1) + \mathit{Sim}(a_i, b_j) \\
\max_{k \ge 1} \{ H(i-k, j) + W_k \} \\
\max_{l \ge 1} \{ H(i, j-l) + W_l \}
\end{array}
\right\}, \quad 1 \le i \le n, \; 1 \le j \le m
\]

where A and B are strings over the alphabet Σ, Sim(a, b) is a similarity score function on the alphabet, and W_i is the gap penalty schema. Here, H(i, j) represents the maximum similarity score between suffixes of [a_1, a_2, ..., a_i] and [b_1, b_2, ..., b_j]. To obtain the optimal local alignment, backtracking is performed starting from the highest value in the matrix H(i, j). We used a scalable implementation of the local alignment algorithm [14]. We provide more information about the similarity score function and gap penalty schema in the next sections.

3.2 System Call Alignment

A system call sequence consists of a sequence of system call events. While the order of biological sequences represents a structural property, the order of a system call sequence represents the temporal execution order. The order of system call events has stronger significance when events are interdependent. For example, in order to create a thread in a foreign process to run arbitrary code, one must follow a certain order of system calls. Even with the insertion of gaps, sequence alignment preserves this order while aligning sequences.

3.2.1 Similarity Score

One of the most important parts of the sequence alignment algorithm is the similarity-scoring schema. Based on domain knowledge, the scoring schema computes a similarity score between two elements in the sequence. A straightforward approach would be to simply assign a value µ > 0 for a match and σ < 0 for a mismatch. Values of µ and σ can be constant values or they may depend on the pair of sequence elements being compared.

There are many studies on modeling similarity schemata for biological sequence alignment [3, 9]. These schemata are based on biological evidence, where a mismatch is treated as a mutation. In general, the match score µ is based on the functional significance of the match, and the mismatch score σ is statistically computed from the observed mutations seen in nature. Point Accepted Mutation (PAM) [9] and Blocks Substitution Matrix (BLOSUM) [3] are the two most widely used similarity schemata. The main focus of these schemata is to model mismatch scores based on the observed probability of the mutation under comparison. A similar approach may be useful while comparing system call
sequences of polymorphic variants of malware. However, we are comparing system call sequences of the same code. We observed that malware polymorphism happens mostly during the propagation step, i.e., while the malware sample creates a copy of itself, while runtime polymorphism is less common. Moreover, achieving the same functionality by replacing the system call is difficult. That is, the probability of mutation in the system call sequences extracted from two executions of the same malware sample is very small. This means that mismatches of system calls are less common.

In our case, the challenge is to meaningfully quantify match and mismatch in the case of system calls. There may be a varying number of arguments associated with each system call event. Not all arguments are equally important for similarity computation. As discussed earlier, alignments of some system calls are more important than others. For example, we want to prioritize the alignment of NtCreateProcess over NtQueryValueKey because creating a process is a more critical event compared to reading a registry value. We can assign a high similarity value for a match of a critical system call, which helps build an “anchor point” during the alignment process. In our current model, the list of such critical system calls includes system calls that create and terminate processes and threads. We propose the following similarity-scoring schema for computing similarity between two system calls.

\[
\mathit{Sim}(a, b) = \mathit{Bias}(a, b) \cdot \big( \mathit{NameSim}(a, b) + \mathit{AttribSim}(a, b) \big)
\]

where,

\[
\mathit{NameSim}(a, b) =
\begin{cases}
w_t & \text{if } name(a) = name(b), \\
nw_t & \text{if } name(a) \ne name(b),
\end{cases}
\]

\[
\mathit{AttribSim}(a, b) =
\begin{cases}
w_a & \text{if } name(a) = name(b) \text{ and } attrib(a) = attrib(b), \\
nw_a & \text{if } name(a) = name(b) \text{ and } attrib(a) \ne attrib(b), \\
0 & \text{if } name(a) \ne name(b),
\end{cases}
\]

and

\[
\mathit{Bias}(a, b) =
\begin{cases}
w_b & \text{if } name(a) \text{ or } name(b) \text{ is an important system call}, \\
1 & \text{otherwise.}
\end{cases}
\]

Here, a and b are system call events, and name() and attrib() have the meaning described in Section 2. In practice, similar system calls are those calls that perform similar actions on similar operating system objects.

3.2.2 Gap Penalty

Another important component of the sequence alignment algorithm is the gap penalty schema. In general, a gap penalty is a negative score added to the similarity score to discourage indels (insertions or deletions). A large gap penalty is effective in aligning sequences properly if the majority of the sequences are identical. However, in our case, we expect to have gaps in the sequence because of the noise and the evasion. Since our goal is to properly identify the gap introduced by evasion, in some way we want to encourage long gaps in the alignment.

There are three main types of gap penalties used in the context of biological sequences: constant, linear, and affine gap penalty. The constant gap penalty simply gives a fixed negative score for each gap opening. This value does not depend on the length of the gap. This is a simple and fast schema. However, this schema gives too much freedom for sequence alignment, resulting in unnecessarily long gaps. The linear gap penalty, as the name implies, linearly increases the penalty score in proportion to the length of the indel. This method favors shorter gaps by severely penalizing long indels, which is not suitable in our context. The affine gap penalty combines both constant and linear gap penalties, taking the form $g_a + (g_b \cdot L)$, where L is the length of the gap. That is, it assigns an opening gap penalty $g_a$, which increases at the rate of $g_b$. We can use a smaller value for $g_b$ to favor longer gaps. By choosing $|g_a| > |g_b|$, we can model a gap penalty such that it is easier to extend a gap than to open it. We use this model of gap penalty when aligning system call sequences.

3.2.3 Parameter Selection

In our approach to system call alignment, as in any other alignment problem, there are certain constraints we need to follow while designing similarity score and gap penalty parameters. More precisely, we want to have the following inequality relation as a guideline for choosing parameter values:

\[
nw_t \le g_a < g_b < 0 < w_t + nw_a < w_t < w_t + w_a
\]

Here, we want all mismatches and indels to have negative values and all matches, including partial matches, to have positive values. Intuitively, a match where both name() and attrib() match gets the highest score ($w_t + w_a$). Similarly, a name() match with an attrib() mismatch gets a lower score ($w_t + nw_a$) than when name() matches but there is no attrib() associated with the events being compared, such as for the NtYieldExecution system call event. By choosing $nw_t \le g_a$, we favor gaps over mismatched alignments. The inequality relation among the parameters and their relative values are more important than the actual values of the parameters. If all parameters are scaled by the same factor, the final alignment output of the algorithm remains the same.

Furthermore, the bias multiplier $w_b$ used to compute Bias(a, b) needs to be large enough to overcome the possible penalty introduced by the long gaps expected in the case of evasive samples. For example, we want to prioritize the alignment of the NtTerminateThread system call, which is usually located towards the end of the sequences and therefore requires a long gap.

3.2.4 Deduplication

Sometimes a tight loop in the execution may produce a long sequence of repeated short subsequences. Such repetition may contain thousands of system calls, excessively increasing the space and time complexity of sequence alignment. To address this, we identify contiguously repeating subsequences of system calls of length one, two, and three. If such a subsequence repeats more than five times contiguously, we discard all remaining repetitions during sequence alignment. There are two advantages in doing this. First, it greatly reduces the space and time requirements for sequence alignment. Second, it prevents possible inaccurate
detection of the longest gap due to the difference in the repetition count between two call sequences.

3.2.5 Difference Pruning
Accurately identifying the largest gap corresponding to the evasion is critical in finding the evasion point. However, there is a possibility of short subsequence alignments breaking the large gap associated with the evasion. This may cause the algorithm to incorrectly pick the largest gap and, in turn, return the incorrect evasion point. To mitigate this problem, we apply a difference pruning process to prune possibly-incorrect small alignments in between large gaps.

Let S be the sequence alignment output of two sequences A and B. Let Sa, Sb, and Sc be three consecutive regions of S, where Sb is a match alignment and Sa and Sc are gap alignments such that they are both insertion or both deletion alignments. We discard the Sb alignment region to combine the two gaps corresponding to Sa and Sc if Sb is very small relative to the combined length of Sa and Sc. More precisely, if length(Sb)/(length(Sa) + length(Sc)) < td, we discard Sb and join Sa and Sc to find the largest gap. Through a series of experiments described in Section 5, we obtained the optimal value of td = 0.02. That is, if the length of a match region between two gaps is less than 2% of the sum of the lengths of the gaps, we prune the match region and connect the gaps. This process only affects the calculation of the largest gap without affecting the actual sequence alignment output. The pruning is performed in a single pass without updating the underlying sequence to avoid a newly-formed longer gap destabilizing the pruning process.

3.3 Handling Sequence Branching
All sequence alignment algorithms from bioinformatics can only handle monolithic single sequences. However, a sequence of system calls may include calls from multiple threads. The main process thread can create multiple threads, which in turn can create more threads. Hence, a system call sequence has an inherent tree structure. A naïve way of combining system calls from multiple threads into a single call sequence can produce anomalous sequence alignment.

We propose a recursive algorithm to handle branched sequences. The input of this algorithm is a single system call sequence of a process where system calls from all threads are chronologically merged. Each event in the sequence is tagged with its corresponding thread ID. First, we preprocess this system call sequence to generate a branching sequence structure by sequentially inspecting events from the start of the sequence. Whenever a new thread is encountered, we insert a new meta-node at the location where the thread was created. This is the location of the NtCreateThread system call corresponding to the thread. We create a new blank sequence and associate it with the new meta-node. A meta-node represents a branching point in the main sequence. We remove all occurrences of system calls associated with the new thread from the main sequence and append them to the newly created sequence associated with the meta-node. The one-to-one mapping of a new thread event and its corresponding NtCreateThread may not always be available in the execution profile. To this end, we associate a new thread event with the last unassigned call of NtCreateThread. During the alignment process, two meta-nodes are recursively processed to compute the similarity score. That is, to compute the similarity score between two meta-nodes, we first perform sequence alignment of the sequences corresponding to the meta-nodes. Similarity is then computed as the difference between the total length of the matching sections and the total length of the mismatch sections of the alignment output. If at least one of the two arguments to Sim(a, b) is a meta-node, the following similarity-scoring schema is used.

\[ \mathrm{Sim}(a, b) = \mathrm{MSim}(a, b) \tag{1} \]

where

\[
\mathrm{MSim}(a, b) =
\begin{cases}
s_m - s_g & \text{if } a \text{ and } b \text{ are meta-nodes,} \\
-n_a & \text{if only } a \text{ is a meta-node,} \\
-n_b & \text{if only } b \text{ is a meta-node.}
\end{cases}
\]

Here, sm is the total length of all matching sections of the alignment output corresponding to meta-nodes a and b, sg is the total length of all gap sections of the alignment output, na is the length of the sequence corresponding to meta-node a, and nb is the length of the sequence corresponding to meta-node b. Note that if meta-nodes a and b correspond to two completely different threads, sg will be greater than sm, resulting in a negative similarity score.

4. EVASION SIGNATURE EXTRACTION
In the previous section, we described an evasion point as the location of deviation in the call event sequence corresponding to the evasion. In this section, we describe how we use this information to extract the evasion signature.

4.1 Evasion Section
Intuitively, all system calls, API calls, and comparison events used to make an evasion decision must happen before the evasion point. We observed that such events are usually located close to the evasion point. To capture the locality of such events in the sequence, we define an evasion section, which consists of the event sequence prior to but close to the evasion point. More precisely, let E be the sequence of malware execution events that consists of all system call events, user API call events, and comparison events. The evasion section E′ of the event sequence E is defined as:

\[ E' = \{\, e \in E(i) : k - \omega \le i < k \,\}, \]

where k is the index of the evasion point, and ω is the size of the evasion section.

If ω is large enough, E′ can extend all the way to the beginning of the event sequence E. This case guarantees that P ⊂ E′ and Q ⊂ E′, where P and Q are the call events and comparison events related to evasion, as introduced in Section 2. That is, the evasion signature ∆ ⊂ E′, since ∆ = P ∪ Q. However, with large values of ω, the evasion section E′ also includes many other events that are not related to evasion. By reducing the value of ω, we can reduce the number of such unrelated events and improve the relation ∆ ≈ E′. We also observed that the comparison events in Q that are used for evasion are likely to be performed very close to the evasion point k. This allows us to reduce ω to smaller values and still have Q ⊂ E′. This approach might exclude call events made earlier in the sequence whose results are used later for evasion. To mitigate this, we include all call events that are related to comparison events in Q into the evasion signature.
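The evasion section definition can be illustrated with a minimal Python sketch, assuming the execution events are held in a list indexed from zero. The `Event` record and the function name `evasion_section` are hypothetical names introduced here for illustration only; they are not taken from the MalGene implementation. The sketch simply takes the ω events that immediately precede the evasion point k, and clamps the window at the start of the sequence when ω is large.

```python
from typing import List, NamedTuple, Sequence


class Event(NamedTuple):
    """Hypothetical event record: kind is 'syscall', 'api', or 'cmp'."""
    kind: str
    name: str
    attrib: str = ""


def evasion_section(events: Sequence[Event], k: int, omega: int) -> List[Event]:
    """Return the events E(i) with k - omega <= i < k.

    If omega exceeds k, the window is clamped so that it starts at the
    beginning of the event sequence.
    """
    if not 0 <= k <= len(events):
        raise ValueError("evasion point k must lie within the event sequence")
    if omega < 0:
        raise ValueError("omega must be non-negative")
    start = max(0, k - omega)
    return list(events[start:k])
```

With a small ω the window holds only the events closest to the evasion point; with a very large ω it degenerates to the whole prefix of the sequence, which is the case the text describes as including many unrelated events.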
Notice that, unlike the previous sequence alignment step, in this step a call event includes both system calls and API calls. Although many user API calls correspond to system calls, many user mode APIs may not trigger any system call. For example, the user mode API GetTickCount in Windows does not invoke any system call (native API). However, this API is widely used in timing-based evasions. We must include such call events in the evasion signature to make it more accurate and complete.

Initially, we set P as the set of all call events in the evasion section E′, and Q as the set of all comparison events in E′. However, even with smaller values of ω, the evasion section E′ still contains unrelated call events. In the next section, we describe our approach to filtering out these unrelated events using statistical observations.

4.2 Inverse Document Frequency
A call event used to retrieve information from the analysis environment for fingerprinting is usually unique to the evasive behavior. The majority of the malware samples that are not evasive do not retrieve those unique pieces of information. Similarly, if the same call event e (same name(e) and attrib(e)) is present in the call sequences of all non-evasive malware, such a call event is less likely to be used for evasion. We can filter out call events from the evasion section E′ that occur too often in the collection of call sequences of non-evasive malware. To perform such filtering, we use an inverse document frequency-based metric.

Inverse document frequency (idf) is commonly used in information retrieval [31]. It is a measure of whether a term is common or rare across all documents. Formally, the inverse document frequency of a term t in a collection of documents D is defined as:

\[ \mathrm{idf}(t, D) = \log \frac{N}{\mathrm{df}_t}, \]

where N is the total number of documents in the corpus and dft is the document frequency, defined as the number of documents in the collection D that contain the term t.

In our case, a call event is a term, and the collection of call sequences of non-evasive malware is the document corpus D. For a call event e, a large value of idf(e, D) implies that the call event e is unique, and a small value of idf(e, D) implies that e is commonplace. Here, idf(e, D) = 0 means the call event e is present in all call sequences of D.

We define a threshold τ such that, if idf(e, D) < τ, we consider the call event e to be a common event having little or no discriminating power for building evasion signatures. We remove such call events {e : idf(e, D) < τ} from P.

4.3 Event Dependency Analysis
The next component of the evasion signature is the comparison events Q used for altering control flow during evasion. Comparison events can be monitored with any fine-grained instruction-level execution monitoring. However, we are interested in only those comparisons that involve the use of information generated by previous call events. To track the information returned by call events, we leverage taint analysis. To this end, we build upon the work of the Anubis extension proposed in [5]. Anubis [1] is a malware analysis framework, which uses Qemu-based full-system emulation as the execution environment. In this approach, information returned by all call events is tainted at the byte level. Inside the Qemu intermediate language, all comparison instructions of the x86 architecture are translated into the same intermediate comparison instruction. For each comparison, the taint labels of the operands are examined to determine the corresponding call events that produced the data byte. Consecutive comparisons are merged into a single comparison event. In case the comparison is performed with some constant, the constant value is also extracted.

Besides taint analysis, we also analyze handle dependencies between call events. This allows us to generate a more descriptive value of attrib(e) for the call event e. For example, if a registry key HKLM/System is opened by a call to NtOpenKey and the returned handle is later used for a call to NtEnumerateKey, we use the registry key name as the attrib(e) for the call event NtEnumerateKey.

An execution of a program contains many comparisons even if only comparisons with tainted operands are considered. However, many of such comparisons originate from within API functions rather than the actual malware code. For this reason, comparisons inside user API calls are discarded, except for API calls that are designed specifically for data type comparison, such as strings and dates. Comparison events are included in the execution profile of the malware along with the system call and user API call events.

We build the sequence of malware execution events E from the execution profile generated by Anubis. We also extract the system call sequence from another execution environment that the malware evaded. Since the evasion code executes in both environments, we can extract its evasion signature from the Anubis execution profile regardless of which environment is evaded. We identify the evasion point and evasion section E′ using the approach described in the previous sections. We extract the call events P and the comparison events Q from the evasion section E′. We filter P using the idf-based method described previously. Finally, all call events associated with Q are added to the set P. The union of P and Q represents our final evasion signature ∆ = P ∪ Q.

4.4 Clustering
Given a collection of evasive samples, we propose to assess different evasion techniques present in the collection based on the extracted evasion signatures. To do this, we perform hierarchical clustering of evasive samples. This allows a malware analyst to prioritize and selectively study different evasion techniques without analyzing randomly selected samples. To perform manual assessment of a particular cluster, we can take an intersection of the evasion signatures of all samples from that cluster. That is, we inspect the evasion signature elements that are common to all samples in the cluster.

A hierarchical clustering requires a method to compute pairwise similarity between two evasion signatures. An evasion signature is essentially a set. We compute similarity between two evasion signatures ∆a and ∆b as a Jaccard similarity J, which is given as:

\[ J(\Delta_a, \Delta_b) = \frac{|\Delta_a \cap \Delta_b|}{|\Delta_a \cup \Delta_b|}. \]

The result of a hierarchical clustering depends on the choice of the linkage method and the similarity measure, where the former is usually more critical than the latter [32]. There are two main choices of linkage methods: single-linkage
and complete-linkage. We use the complete-linkage method for our clustering. This is because the complete-linkage method prefers compact clusters with small diameters over long, straggly clusters [21]. As we want maximum similarity between all pairs of members in a cluster for assessment, the complete-linkage method best fits our purpose.

5. EVALUATION
We evaluated our approach on real-world Windows-based evasive malware samples. We made this choice because the majority of the evasive malware is observed on this platform. Moreover, the majority of the malware analysis systems are also focused on the same platform.

5.1 Execution Environments
In our evaluation, we provide two execution environments based on emulation and hardware virtualization, respectively.

5.1.1 Emulation
We use Anubis [1] to extract malware execution events from an emulated environment. Anubis performs execution monitoring by observing the execution of precomputed guest memory addresses. These memory addresses correspond to system call functions and user API functions. Anubis is able to extract additional information about the API execution by inserting its own instructions into the emulator's instruction execution chain. Besides system calls, we are able to extract additional information, such as user API calls and comparison events, which are necessary for building evasion signatures.

5.1.2 Hypervisor
We use Ether [10] to extract malware execution events from a hardware-based virtualized environment. Ether is a Xen-hypervisor-based transparent malware analysis framework that utilizes Intel VT's hardware virtualization extensions [2]. The hardware virtualization makes it possible to execute most of the malware instructions as native CPU instructions on the real hardware without any interception. Thus, it does not suffer from inaccurate or incomplete system emulation. It was observed that Ether can be evaded in its default setup because it uses QEMU's device model to provide virtualized hardware peripherals [17]. We modified the device model used by Ether to prevent such evasion.

5.2 Dataset
The input for our system is a collection of known evasive malware samples. For the evaluation of our system, we received 3,107 evasive samples identified by the BareCloud [17] system. We analyzed those samples in Anubis and our modified Ether environments. We extracted system call traces and computed behavior deviation scores as proposed in [17]. We found that 2810 samples evaded Anubis with respect to our Ether environment.

To build the ground truth dataset, we randomly selected 52 samples out of the 2810 evasive samples. We manually analyzed those samples and identified the calls and the comparisons that are related to the evasion. This information constitutes the evasion signature ∆ of the malware samples. To evaluate the alignment algorithm, which works only on the system call sequences, we identified the most important system call that is critical to the evasion technique as the evasion call. In case multiple related system calls are used, we selected the last system call as the evasion call. For instance, let us take an example evasion instance that opens a registry key HKLM/HARDWARE/Description/System using NtOpenKey and reads the value of the key SystemBiosVersion using NtQueryValueKey. Inside Anubis, the returned value is QEMU -1 because of the underlying Qemu subsystem, which can be checked for evasion. In this example, both system calls are related to evasion. However, we select the last call to NtQueryValueKey as the evasion call. We note the index of this instance of the system call in the sequence as the data point used later in the experiments.

5.3 Algorithm Evaluation
In our approach, accurately finding the evasion point is the first and critical step towards extracting evasion signatures. This depends on the accuracy of the proposed sequence alignment algorithm for system calls. The accuracy depends on several parameters used by the algorithm. In Section 3.2, we discussed some guidelines for choosing optimal parameters for the algorithm. However, there is no previous work in this area. Unlike in the field of bioinformatics, an appropriate labeled dataset is lacking to build a statistical model of similarity score for system call sequences. We use an incremental approximation-based approach to find optimal values of the parameters, which we describe in the next section.

5.3.1 Experiment with Scoring Function
To evaluate our guideline, we performed several experiments by varying different scoring parameters. For this, we first chose to vary a set of four main parameters (ga, gb, nwt, wt, see Section 3.2). Our preliminary experiments showed that the values of these parameters play a major role in the algorithm output. For the remaining parameters, we empirically assigned constant values. For each set of parameter values (ga, gb, nwt, wt), we performed sequence alignment to find the corresponding evasion point. Let Am be a sequence corresponding to a malware m and let km be the index of the calculated evasion point. Let e′m be the index of the evasion call in Am, which is known as the ground truth. We say that an evasion section of width w′ successfully captures the evasion call if km − w′ ≤ e′m < km. That is, the evasion call is within the evasion section defined by w′. For a set of N samples, we compute the recall rate corresponding to the parameter set (ga, gb, nwt, wt) and evasion section w′ as TP/N, where TP is the number of samples whose evasion call is within the evasion section defined by w′.

Figure 2 shows the results of the recall rate of some parameter sets when varying the evasion section of width w′. We used the ground truth dataset as described in Section 5.2. The area under the curve (AUC) represents the relative performance of the choice of the parameters. The result validates some of our initial intuitions. For example, a choice of |ga| > |nwt| decreases the algorithm performance (compare top and second curves), a relatively large score for a match compared to the gap penalty degrades performance (third curve), and a large gap extension penalty gb is not favorable (top and bottom curves).

There are many possible combinations of parameter choices. To find the optimal choice, we computed AUC values for all possible combinations when ga, gb, nwt, and wt are selected from sets of 10 values for each parameter ranging from −10
to +10. That is, for each malware sample there are 10,000 test cases. To test the correctness of our guidelines, we also included values that do not satisfy the inequality guideline. Figure 3 shows the result of the AUC values for all parameter combinations. The combination of parameters ga = −2, gb = −0.10, nwt = −2, wt = 3 produces the highest value of AUC (=85.214).

[Figure 2: Parameter tuning experiment (wa = 2 and nwa = −2). Plot of recall rate versus evasion section width (ω) for the parameter sets (ga, gb, nwt, wt) = (−2, −0.10, −2, 3), (−10, −0.01, −2, 3), (−1, −0.10, −2, 10), and (−2, −0.50, −2, 3).]

[Figure 3: Average AUC values of recall rates corresponding to 10,000 parameter combinations. Plot of AUC versus parameter combinations (ga, gb, nwt, wt).]

In the next step, we performed another set of similar experiments by varying the values of other parameters while keeping the values of ga, gb, nwt, and wt set to the optimal values obtained from the previous experiments. Namely, we obtained optimal values for wa = 2, nwa = −2, wb = 20, and difference pruning threshold td = 0.02.

5.3.2 Comparison with LCS
In this experiment, we compared our sequence alignment algorithm with the standard diff algorithm used in the Unix diff utility [23]. We computed the corresponding evasion point using both algorithms and compared their performances by computing their recall rates when varying w. The result of this experiment, shown in Figure 4, clearly shows that our proposed alignment algorithm outperforms the LCS-based algorithm. This also shows that the LCS-based approach is weak. More than half of the time, the evasion locations identified using the LCS-based approach were incorrect.

In this result, we can see that with ω > 83 we achieved a 100% recall rate. That is, all evasion calls of the ground truth dataset are captured when ω > 83. We selected a more conservative value of ω = 100 for our next signature extraction experiments.

5.3.3 Evasion Signature Extraction
The next step in the extraction of evasion signatures is to build the idf-based filter as described in Section 4.2. For this, we obtained 119 non-evasive malware samples from the BareCloud system [17]. These are the samples that did not exhibit evasive behavior in bare metal, Anubis, Ether, and VirtualBox analysis environments. This dataset represents the non-evasive malware sample set D as described in Section 4.2. We analyzed these samples in the Anubis environment and extracted system calls, user API calls, and comparisons along with the taint dependency information. From those extracted events, we calculated the idf values for all observed events.

In the next step, we find the optimal idf-based filter threshold τ for filtering out the common execution events. As described in Section 4.2, we filter out a call event e if idf(e, D) < τ. We want a larger value of τ because we want to filter out as many common events as possible. A value of τ too small may not filter anything, and a value too large may filter out events that are part of the evasion signature. To find the optimal value for τ, we first extracted multiple evasion signatures of the ground truth samples by setting different values of τ. To compare the quality of the extracted signatures, we performed a precision-recall analysis of the extracted evasion signatures. Let ∆′ be the automatically extracted signature and let ∆ be the true evasion signature, which is available from the ground truth samples; the precision and recall of the evasion signature extraction are given as:

\[ \mathrm{precision} = \frac{|\Delta' \cap \Delta|}{|\Delta'|}, \qquad \mathrm{recall} = \frac{|\Delta' \cap \Delta|}{|\Delta|}. \]

Figure 5 shows the results of the precision and recall analysis. The curves represent the average characteristics of all samples. We can see that the precision decreases and recall increases as we lower the value of τ. Smaller values of τ make the idf-based filter weaker, and, hence, the signature includes a lot of common events, lowering its precision. We can see that the idf-based filter significantly increases the quality of the extracted evasion signatures if τ is selected optimally. The precision and recall at the crossover point, where the precision and recall curves meet, are both 0.83, and the value of the threshold at this point is τ = 2.75. That is,
at τ = 2.75, the algorithm is able to extract 83% of the elements of true evasion signatures with a precision of 83%. We use this value for our next experiment on cluster analysis. Figure 6 shows a sample evasion signature automatically extracted by our system from a malware sample.

[Figure 4: Comparison of evasion gap detection. Plot of recall rate versus evasion section width (ω) for MalGene and LCS.]

[Figure 5: Precision-recall analysis of the idf threshold τ. Plot of precision and recall rate versus idf threshold (τ).]

NtOpenKey, HKLM/System/ControlSet001/Services/Disk/Enum
NtQueryValueKey, HKLM/System/ControlSet001/Services/Disk/Enum->0
CMP, NtQueryValueKey.KeyValueInformation->’Z’
CMP, NtQueryValueKey.KeyValueInformation->’wmwavboxqemu’
CMP, NtQueryValueKey.KeyValueInformation->’qemu’

Figure 6: A sample of an automatically extracted evasion signature (8964683b959a9256c1d35d9a6f9aa4ef).

5.4 Evaluation on Real-world Samples
In this experiment, we applied our approach to 2810 real-world evasive malware samples. We extracted the corresponding evasion signatures and performed hierarchical clustering as described in Section 4.4. Figure 7 shows the graphical representation of the clusters. Here, the smallest rectangles represent samples, larger rectangular patches of color represent clusters, and the shades of the color inside the patch represent the degree of similarity among individual samples within the cluster. To generate unique clusters, we cut the corresponding dendrogram close to the root of the tree. This way, the clusters formed are very distinct from each other, representing distinct evasion techniques. A cut at the height h = 0.99 produced 78 clusters. Bold lines in Figure 7 separate these clusters. A cut at the height h = 0 produced 1051 clusters. This is the number of distinct automatically extracted evasion signatures, since at h = 0 only samples with identical signatures share a cluster.

We manually analyzed a few samples from the top five clusters. Table 1 presents a summary of the findings.

Table 1: The summary of top five clusters.

Cluster  Count  Evasion signature summary
c6       898    Exception-based emulation detection
c4       582    Cumulative timing of system calls
c5       225    Timing of exception processing
c8       172    SystemMetrics-based fingerprinting
c18      106    Variant of exception-based detection

[Figure 7: Hierarchical clustering of evasive malware based on their evasion signature.]

6. LIMITATIONS
The main limitation of our approach is the requirement of system call sequences from both analysis environments. This limitation prevents us from using pure bare-metal execution-based malware profiles that lack system call monitoring.
One of the ways to achieve such monitoring is to use SMM-based monitoring systems, such as MALT [35].

Potentially, malware can have multiple evasion points. That is, a malware can perform multiple evasion checks at different sections of the system call sequence. Since we locate the deviation by finding the largest gap in the aligned sequence, our approach only finds one evasion point, and, as a result, extracts only one evasion signature corresponding to that evasion point. Instead of only using the largest gap, we could consider other large gaps in the aligned sequence, if any, to identify multiple evasion points. This work is limited to one evasion signature per evasive malware sample.

An adversary with the knowledge of our system can develop a mimicry attack to foil evasion signature extraction. For example, an attacker can use a Pseudo Random Number Generator (PRNG) to introduce artificial large gaps with random system calls to evade evasion point detection. However, since the malware sample evades one of the analysis systems, the actual malicious activity also causes another large gap in the sequence alignment. In this case, we could support multiple evasion points and extract separate evasion signatures from the respective evasion points. However, false positive evasion signatures, such as the ones produced by PRNG techniques, need to be manually identified and filtered out. The clustering of evasion signatures as described in Section 4.4 can help improve this manual process.

One of the common limitations inherent to all dynamic analysis systems is the use of stalling code. A malware sample can wait for a long time before performing any malicious activity. Kolbitsch et al. have proposed a technique to detect and mitigate malware stalling code [18]. Our current system will not be able to extract signatures for such evasions if the stalling part of the code is deterministic, producing the same call sequence.

If a malware sample has a high level of randomization in the code execution, our approach to system call alignment may not be effective. However, if the malicious activity is long enough in one of the analysis environments, the alignment algorithm may provide an approximate location of the evasion, which can help a malware analyst in further analysis. Another approach is to analyze the same malware sample multiple times in the same environment to detect and normalize such inherent randomization [20].

The proposed approach of handling sequence branching may not be effective for system call sequences produced by thread pools. This is because the order in which a thread in the thread pool is scheduled to handle callbacks can be different among instances of the malware executions.

7. RELATED WORK

7.1 Sequence Alignment
The sequence alignment problem is widely studied in bioinformatics [11, 14, 27]. Our work is based on the algorithms proposed for biological sequence alignment. We extended the algorithms to handle sequences with branches. Additionally, we adapted the algorithm and proposed optimal parameters in the context of system call sequence alignment. Sequence alignment techniques have previously been used in malware detection for finding common subsequences as signatures and for pattern matching [8, 19, 34]. Eskin [12] proposes a sparse sequence model to find outliers in the sequences for anomaly detection. Our use of the sequence alignment is orthogonal to those works. MalGene uses sequence alignment for identifying deviations between sequences rather than finding common patterns as signatures. Furthermore, our algorithm performs deduplication and difference pruning, and can handle branched sequences. Our approach to the extraction of evasion signatures leverages data-flow dependencies to extract relevant but potentially distant events in the sequence. Data-mining techniques are used to discard irrelevant events from the evasion signature.

7.2 Differential Program Analysis
The problem of analyzing the differences between two runs of a program has been previously studied [15, 33]. The work most similar to ours is the approach of differential slicing [15]. Given a pair of execution traces of the same program and a location of observed difference in the trace, differential slicing can identify the input difference that caused the observed difference. The main difference with our work is that the differential slicing approach requires fine-grained analysis on both analysis environments. This may not always be available in all malware analysis environments. Our approach does not require fine-grained instruction-level monitoring from both analysis environments. Furthermore, to find the source of the execution difference, an analyst must first manually identify the location of the observed difference before applying the differential slicing analysis. We automate the process of identifying the location of the difference. Therefore, while previous work on differential slicing is suitable for more focused individual analysis, our approach is designed to provide an automated and practical solution to approximate program difference analysis on a large scale.

7.3 Evasion Detection
Chen et al. proposed a detailed taxonomy of evasion techniques used by malware against dynamic analysis systems [6]. Lau et al. employed a dynamic-static tracing technique to identify VM detection techniques. Kang et al. [16] proposed a scalable trace-matching algorithm to locate the point of execution diversion between two executions. The system is able to dynamically modify the execution of the whole-system emulator to defeat anti-emulation checks. Balzarotti et al. [4] proposed a system for detecting dynamic behavior deviation of malware by comparing behaviors between an instrumented environment and a reference host. The comparison method is based on deterministic program execution replay. That is, the malware under analysis is first executed in a reference host while recording the interaction of the malware with the operating system. Later, the execution is replayed deterministically in an analysis environment such that any deviation in the execution is evidence of an evasion. Deterministic replay of a malware sample may be challenging if it depends on the external network environment. In our approach, a malware can be simultaneously executed in two environments and analyzed later.

8. CONCLUSION
In this paper, we presented MalGene, a system for automatically extracting evasion signatures from evasive malware. We propose a combination of bioinformatic algorithms, data mining, and data flow analysis techniques to automate the signature extraction process, so that it can be applied to a large-scale dataset.
9. ACKNOWLEDGMENTS
We want to thank our shepherd Konrad Rieck and the anonymous reviewers for their valuable comments, and Christopher Kruegel for his insight and discussions throughout this project.
This work is sponsored by the Defense Advanced Research Projects Agency (DARPA) under grant N66001-13-2-4039 and by the Army Research Office (ARO) under grant W911NF-09-1-0553. The U.S. Government is authorized to reproduce and distribute reprints for Governmental purposes notwithstanding any copyright notation thereon.

10. REFERENCES
[1] Anubis. http://anubis.cs.ucsb.edu.
[2] Intel Virtualization Technology. http://www.intel.com/technology/virtualization/.
[3] S. F. Altschul, T. L. Madden, A. A. Schäffer, J. Zhang, Z. Zhang, W. Miller, and D. J. Lipman. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Research, 1997.
[4] D. Balzarotti, M. Cova, C. Karlberger, C. Kruegel, E. Kirda, G. Vigna, and S. Antipolis. Efficient Detection of Split Personalities in Malware. In Symposium on Network and Distributed System Security (NDSS), 2010.
[5] U. Bayer, P. M. Comparetti, C. Hlauschek, C. Kruegel, and E. Kirda. Scalable, behavior-based malware clustering. In Symposium on Network and Distributed System Security (NDSS), 2009.
[6] X. Chen, J. Andersen, Z. M. Mao, M. Bailey, and J. Nazario. Towards an Understanding of Anti-Virtualization and Anti-Debugging Behavior in Modern Malware. In Dependable Systems and Networks With FTCS and DCC, 2008.
[7] P. M. Comparetti, G. Salvaneschi, E. Kirda, C. Kolbitsch, C. Kruegel, and S. Zanero. Identifying Dormant Functionality in Malware Programs. In IEEE Symposium on Security and Privacy, 2010.
[8] S. E. Coull and B. K. Szymanski. Sequence alignment for masquerade detection. Computational Statistics & Data Analysis, 52(8):4116–4131, 2008.
[9] M. O. Dayhoff and R. M. Schwartz. A model of evolutionary change in proteins. In Atlas of Protein Sequence and Structure, 1978.
[10] A. Dinaburg, P. Royal, M. Sharif, and W. Lee. Ether: Malware Analysis via Hardware Virtualization Extensions. In ACM Conference on Computer and Communications Security (CCS), 2008.
[11] R. C. Edgar. Muscle: multiple sequence alignment with high accuracy and high throughput. Nucleic acids research, 2004.
[12] E. Eskin. Sparse sequence modeling with applications to computational biology and intrusion detection. PhD thesis, 2002.
[13] P. Ferrie. Attacks on virtual machine emulators. Technical report, Symantec Corporation, 2007.
[14] O. Gotoh. An improved algorithm for matching biological sequences. Journal of molecular biology.
[15] N. M. Johnson, J. Caballero, K. Z. Chen, S. McCamant, P. Poosankam, D. Reynaud, and D. Song. Differential slicing: Identifying causal execution differences for security applications. In IEEE Symposium on Security and Privacy, 2011.
[16] M. Kang, H. Yin, and S. Hanna. Emulating emulation-resistant malware. ACM workshop on Virtual machine security. ACM, 2009.
[17] D. Kirat, G. Vigna, and C. Kruegel. BareCloud: bare-metal analysis-based evasive malware detection. In USENIX Security Symposium (USENIX), 2014.
[18] C. Kolbitsch, E. Kirda, and C. Kruegel. The Power of Procrastination: Detection and Mitigation of Execution-stalling Malicious Code. In ACM Conference on Computer and Communications Security (CCS), 2011.
[19] V. Kumar, S. K. Mishra, and L. Bhopal. Detection of malware by using sequence alignment strategy and data mining techniques. International Journal of Computer Applications, 62(22), 2013.
[20] M. Lindorfer, C. Kolbitsch, and P. M. Comparetti. Detecting Environment-Sensitive Malware. Symposium on Recent Advances in Intrusion Detection (RAID), pages 338–357, 2011.
[21] C. D. Manning, P. Raghavan, and H. Schütze. Introduction to information retrieval. 2008.
[22] D. W. Mount. Sequence and genome analysis. Bioinformatics: Cold Spring Harbour Laboratory Press: Cold Spring Harbour, 2, 2004.
[23] E. W. Myers. An O(ND) difference algorithm and its variations. Algorithmica, 1986.
[24] S. B. Needleman and C. D. Wunsch. A general method applicable to the search for similarities in the amino acid sequence of two proteins. Journal of molecular biology, 1970.
[25] R. Paleari, L. Martignoni, G. Fresi Roglia, and D. Bruschi. A fistful of red-pills: How to automatically generate procedures to detect CPU emulators. In USENIX Workshop on Offensive Technologies (WOOT).
[26] G. Pék. nEther: In-guest Detection of Out-of-the-guest Malware Analyzers. Proceedings of the Fourth European Workshop on System Security. ACM, 2011.
[27] V. Polyanovsky, M. A. Roytberg, and V. G. Tumanyan. Comparative analysis of the quality of a global algorithm and a local algorithm for alignment of two sequences. Algorithms for Molecular Biology, 2011.
[28] T. Raffetseder, C. Kruegel, and E. Kirda. Detecting System Emulators. Information Security, pages 1–18, 2007.
[29] J. Rutkowska. Red pill... or how to detect vmm using (almost) one cpu instruction, 2004.
[30] T. F. Smith and M. S. Waterman. Identification of common molecular subsequences. Journal of molecular biology, 1981.
[31] K. Sparck Jones. A statistical interpretation of term specificity and its application in retrieval. Journal of documentation, 1972.
[32] A. J. Vakharia and U. Wemmerlöv. A comparative investigation of hierarchical clustering techniques and dissimilarity measures applied to the cell formation problem. Journal of operations management, 1995.
[33] D. Weeratunge, X. Zhang, W. N. Sumner, and S. Jagannathan. Analyzing concurrency bugs using dual slicing. In Symposium on Software Testing and Analysis, 2010.
[34] A. Wespi, M. Dacier, and H. Debar. An intrusion-detection system based on the Teiresias pattern-discovery algorithm. IBM Thomas J. Watson Research Division, 1999.
[35] F. Zhang, K. Leach, A. Stavrou, H. Wang, and K. Sun. Using hardware features for increased debugging transparency. In IEEE Symposium on Security and Privacy, May 2015.
