MalGene: Automatic Extraction of Malware Analysis Evasion Signature

ABSTRACT

Automated dynamic malware analysis is a common approach for detecting malicious software. However, many malware samples identify the presence of the analysis environment and evade detection by not performing any malicious activity. Recently, an approach to the automated detection of such evasive malware was proposed. In this approach, a malware sample is analyzed in multiple analysis environments, including a bare-metal environment, and its various behaviors are compared. Malware whose behavior deviates substantially is identified as evasive malware. However, a malware analyst still needs to re-analyze the identified evasive sample to understand the technique used for evasion. Different tools are available to help malware analysts in this process. However, these tools in practice require considerable manual input along with auxiliary information. This manual process is resource-intensive and not scalable.

In this paper, we present MalGene, an automated technique for extracting analysis evasion signatures. MalGene leverages algorithms borrowed from bioinformatics to automatically locate evasive behavior in system call sequences. Data flow analysis and data mining techniques are used to identify call events and data comparison events used to perform the evasion. These events are used to construct a succinct evasion signature, which can be used by an analyst to quickly understand evasions. Finally, evasive malware samples are clustered based on their underlying evasive techniques. We evaluated our techniques on 2810 evasive samples. We were able to automatically extract their analysis evasion signatures and group them into 78 similar evasion techniques.

Categories and Subject Descriptors

C.2.0 [Computer-Communication Networks]: General - Security and protection; D.4.6 [Software Engineering]: Security and Protection - Invasive software (malware); J.3 [Computer Applications]: Life and Medical Sciences - Biology and genetics

Keywords

computer security; malware analysis; evasive malware; sequence alignment; bioinformatics

CCS’15, October 12–16, 2015, Denver, Colorado, USA.
Copyright is held by the owner/author(s). Publication rights licensed to ACM.
ACM 978-1-4503-3832-5/15/10 ...$15.00.
DOI: http://dx.doi.org/10.1145/2810103.2813642

1. INTRODUCTION

Automated dynamic malware analysis is a common approach for analyzing and detecting a wide variety of malicious software. Dynamic analysis systems have become more popular because signature-based and static-analysis-based detection approaches are easily evaded using widely available techniques such as obfuscation, polymorphism, and encryption. However, many malware samples identify the presence of the analysis environment and evade detection by avoiding the execution of suspicious operations. Malware authors have developed several ways to detect the presence of malware analysis systems [13, 25, 26, 28, 29]. The most common approach is based on the inspection of specific artifacts related to the analysis systems. This includes checking for the presence of registry keys or I/O ports, background processes, function hooks, or IP addresses that are specific to some known malware analysis service. For example, malware running inside a VirtualBox guest operating system can simply inspect VirtualBox-specific service names, or the hardware IDs of the available virtual devices, and check for the substring VBOX. Another approach to evasion is to fingerprint the underlying CPU that is executing the malware. For example, fingerprinting can be achieved by detecting differences in the timing of the execution of certain instructions, or a small variation in the CPU execution semantics [25, 29].

Recently, an approach to the automated detection of evasive malware has been proposed [17]. In this approach, malware is executed in a bare-metal execution environment as well as in environments that leverage virtualization and emulation. Malware behaviors are extracted from these executions and compared to detect deviations in the behavior, under the assumption that bare-metal execution represents the “real” behavior of the malware. Malware whose behavior deviates substantially among the execution environments is labeled as evasive malware. This way, evasive malware is identified without knowing the underlying evasion technique. This approach requires each malware sample to be run in a bare-metal environment. However, compared to a bare-metal environment, emulated and virtualized environments are easier to scale, and they provide far better control and visibility over malware execution. For these practical reasons, emulated or virtualized sandboxes are widely used for large-scale automated malware analysis. However, keeping
emulated and virtualized sandboxes resistant to evolving evasion techniques is a current industry challenge. To combat sandbox evasion attacks, a complete understanding of evasion techniques is the first fundamental step, as this knowledge can help “fix” sandboxes and make them robust against evasion attacks. Currently, understanding evasion techniques is largely a manual process.

Several analysis tools are available to help analyze malware behavior differences [7, 15]. These tools are effective in performing manual, fine-grained analysis of evasive malware. However, they require additional auxiliary information, such as a set of system calls corresponding to malicious behavior or the selection of control-flow differences. Finding this auxiliary information is a manual process that is resource-intensive and not scalable. However, performing such analysis on a large scale is necessary to combat rapidly evolving evasion attacks.

In general, the manual process required to understand an evasion instance starts from two system call traces of the same malware sample, executed in two different execution environments. The malware sample evades one of the environments, creating a difference between the system call sequences. The first step of the evasion analysis involves finding the location in the system call traces where the execution deviates due to evasion. After accurately locating the deviation, understanding the evasion requires identifying environment-specific artifacts that are used for fingerprinting the analysis environment. In the first step, manually finding the location of the deviation in the system call sequence can be difficult. The naïve approach of looking for the first call that differs between the two sequences does not work. System call traces are usually noisy, and there can be thousands of events in the sequence. Even when running the same program in exactly the same environment twice, the system call traces can be quite different. Thread scheduling is one of the main reasons for these differences; however, other factors, such as operating-system and library-specific aberrations, initialization characteristics, and timing, can play a substantial role. Another approach would be to take a diff of the sequences, on the assumption that there will be a large gap in the alignment corresponding to the evasion in one of the environments. However, this approach may not accurately align the sequences. A generic diff algorithm finds the longest common subsequence (LCS) of the sequences. This approach is effective when large portions of the sequences are made of unique symbols, such as the lines of source code. However, a system call sequence has a limited alphabet, while the sequence itself is usually long. Because of this, instead of forming a gap, some subsequence of system calls corresponding to malicious behavior is likely to align with the other sequence, where the malicious behavior is absent.

In this paper, we present MalGene, an automatic technique for extracting human-readable evasion signatures from evasive malware. MalGene leverages local sequence alignment techniques borrowed from bioinformatics to automatically locate evasions in a system call sequence. Such sequence alignment techniques are widely used for aligning long sequences of DNA or proteins [11, 14, 27]. These algorithms are known to be effective even if there are large gaps and the size of the alphabet is limited, such as the alphabet of four bases in a DNA sequence: Thymine (T), Adenine (A), Cytosine (C), and Guanine (G). We use data flow analysis and inverse document frequency-based techniques to automatically identify call events and data comparisons used by the evasion techniques. We build evasion signatures from these identified events. Finally, malware samples are clustered based on their underlying evasive techniques.

Our work makes the following contributions:

• We present MalGene, a system for automatically extracting evasion signatures from evasive malware. Our system leverages a combination of data mining and data flow analysis techniques to automate the signature extraction process, which can be applied to a large-scale sample set.

• We propose a novel bioinformatics-inspired approach to system call sequence alignment for locating evasions. The proposed algorithm performs deduplication and difference pruning, and can handle branched sequences.

• We evaluated our techniques on 2810 evasive samples. We were able to automatically extract their analysis evasion signatures and group them into 78 similar evasion techniques.

2. EVASION SIGNATURE MODEL

In general, malware evades analysis in two steps. First, it extracts information about the execution environment. Second, it performs some comparison on the extracted information to decide whether to evade or not. Usually, malware uses system calls and user-mode API calls in the first step to probe the execution environment. In the second step, it uses some predefined constant values or information extracted from previous system or user API calls. With this generalization, we define an evasion signature as a set of system call events, user API call events, and comparison events that are used as the basis for evading the analysis system. A comparison event is an execution of a comparison instruction, such as a CMP instruction in the x86 instruction set. Usually, the execution of such an instruction is necessary to make the control flow decision during evasion, which is the second step of the evasion process mentioned earlier.

Formally, let P be the set of all call events (both system calls and API calls) and Q be the set of all comparison events that are used by an evasion technique; we define the evasion signature ∆ of this technique as:

∆ = P ∪ Q

We represent a call event p ∈ P as a pair (name(p), attrib(p)), where name(p) represents the name of the call, e.g., NtCreateFile, and attrib(p) represents the name of the operating system object associated with the call, e.g., C:/boot.ini. We represent a comparison event q ∈ Q as a pair (p, v), where p is a call event that produced the information in the first operand compared by event q, and v represents either some constant value used in the second operand, or another call event that produced the information for the second operand.

We extract the evasion signature ∆ of an evasive malware sample in two steps. In the first step, we locate the evasion in the call sequences resulting from the execution of the malware in different environments, as described in Section 3.2.3. In the second step, we identify the elements of ∆ used for the evasion, as described in Section 4.
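As a minimal sketch of how the model above could be represented in code, the following Python fragment encodes a call event as the pair (name, attrib), a comparison event as the pair (p, v), and the signature as the union of the two sets. The class and function names, and the pairing of the example registry query with the constant VBOX, are illustrative assumptions and are not taken from the MalGene implementation.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, Union


@dataclass(frozen=True)
class CallEvent:
    """A call event p, represented as the pair (name(p), attrib(p))."""
    name: str    # name of the system or API call, e.g. "NtCreateFile"
    attrib: str  # name of the associated OS object, e.g. "C:/boot.ini"


@dataclass(frozen=True)
class ComparisonEvent:
    """A comparison event q, represented as the pair (p, v)."""
    p: CallEvent                   # call that produced the first operand
    v: Union[CallEvent, str, int]  # constant, or call that produced the second operand


Event = Union[CallEvent, ComparisonEvent]


def evasion_signature(P: Iterable[CallEvent],
                      Q: Iterable[ComparisonEvent]) -> FrozenSet[Event]:
    """Return the signature as the union of call events P and comparison events Q."""
    return frozenset(P) | frozenset(Q)


# Hypothetical example: a registry query whose result is compared to "VBOX".
probe = CallEvent("NtQueryValueKey", ".../SystemBiosVersion")
check = ComparisonEvent(p=probe, v="VBOX")
signature = evasion_signature({probe}, {check})
print(len(signature))  # 2
```

Frozen dataclasses are hashable, so events can be stored in sets and the union in the definition maps directly onto a set union.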
The model defined by ∆ only captures those evasion techniques that must trigger some system or user API calls. The majority of known evasion techniques fall into this category. Some techniques may not directly make a system or API call, such as forced exception-based CPU fingerprinting [25]. However, such techniques indirectly trigger calls to exception handlers, which are captured by P. In the case of an emulated CPU, however, there are known evasion techniques that are entirely based on the inspection of the FPU, memory, or register state after the execution of certain instructions. Some evasion techniques are based on stalling code. Our current model does not capture such evasion techniques.

3. SEQUENCE ALIGNMENT

The input to our system is a set of evasive malware samples detected by an automatic evasion detection system called BareCloud [17]. BareCloud provides information about which of the analysis environments a malware sample evades. To extract the evasion signature of an evasive malware sample, we analyze the sample in two analysis environments, where it evades one of the environments while showing malicious activity in the other. In the first step, we start from the two sequences of system call events from these two analysis environments. Because the system calls related to the malicious activities are entirely missing in one of the sequences, there must be an observable deviation between the two sequences. The goal here is to efficiently and accurately find the location of the deviation in the sequence corresponding to the evasion. To do this, we first align the two sequences starting from the beginning, introducing gaps as required for an optimal alignment. We locate the deviation by finding the largest gap in the aligned sequence. We consider this location as the evasion point. The malware activity significantly differs after this point, implying evasion.

The intuition here is that an evasive malware sample must perform its evasion “check” in both environments before the evasion point. Once we locate the evasion point, we extract the evasion signature from the detailed analysis log, which contains user API calls and comparison events, as described in Section 4. Note that only system-call level monitoring is required for locating the evasion point. This is advantageous because the monitoring of user API calls and comparison events may not be available in both analysis environments. However, most of the existing malware analysis systems are capable of producing system-call level execution profiles.

Apart from malware evasion, there can be other factors that cause deviation in the malware execution. We followed all strategies proposed in BareCloud [17] to limit deviations due to external factors. That is, we used identical local network and identical internal software configurations for all execution environments. We executed each malware sample in both environments at the same time to mitigate date- and time-related deviations. We used network service filters to provide consistent responses to DNS and SMTP communications for all environments.

One simple approach to finding the largest gap in the alignment of system call sequences would be to take a diff of the sequences. However, the generic diff algorithm finds the longest common subsequence (LCS) of the sequences, which may not accurately align the sequences in our context. This is because a) a system call sequence is usually a long series of events drawn from a limited alphabet, e.g., around 300 system calls in the Windows platform, and b) the difference between the sequences tends to be large. System call names, when combined with their arguments, can increase the size of the alphabet. However, there are frequent system calls, such as NtAllocateVirtualMemory, that act on unnamed OS objects or take nondeterministic argument values. Using nondeterministic argument values, such as memory addresses, creates too many undesirable mismatches, resulting in a poor alignment. In such cases, we discard the attrib() values of the system call events to get a more stable alignment. To illustrate this, let us take example sequences A and B as shown in Figure 1(a). Here, sequences A and B are system call sequences of the same malware sample when executed in two different execution environments. Sequence A corresponds to the execution environment where the malware evades analysis, while sequence B corresponds to the execution environment where the malware shows its malicious activity. The “malicious section” of the sequence B, corresponding to the malicious activity of the malware sample, is illustrated with a darker background. This malicious section is missing in the sequence A because the malware sample evades analysis. In this example, the LCS-based alignment matches the first three calls from A1 with B1, as expected. However, the rest of the sequence A is matched with common subsequences from the malicious section of B to maximize the length of the common subsequence. In this case, the alignment is algorithmically optimal but semantically incorrect. This is likely to happen because the malicious sections are usually long and the alphabet is limited in size. Note that the system call NtTerminateProcess does not align because such an alignment would result in a shorter common subsequence. However, the alignment of important call events is critical for accurately locating the evasion point. This LCS-based alignment example shows that the longest common subsequence may not always produce the most meaningful alignment of the system call sequences.

To address this problem, we propose to apply sequence alignment algorithms borrowed from bioinformatics. Such algorithms are used to identify regions of similarity in sequences of DNA, RNA, or proteins [11, 14, 27]. These regions of similarity usually correspond to evolutionary relationships between the sequences [22]. In the case of system call sequences, such similarity regions correspond to the execution of similar code or the same high-level library functions. While aligning system call sequences, the alignments of some system calls are more critical than others, such as the alignment of NtTerminateProcess in Figure 1(a), because they represent important events in the program execution. Sequence alignment algorithms from bioinformatics can prioritize such critical alignments. Furthermore, these algorithms support more versatile similarity scores among system calls, which can produce a better approximation of the alignments in the presence of noise in the sequences.

There are two approaches to sequence alignment: Global Alignment and Local Alignment. In the next section, we briefly describe these approaches.

3.1 Global and Local Alignments

When finding alignments, global alignment algorithms, such as Needleman-Wunsch [24], take the entirety of both sequences into consideration. It is a form of global optimization that forces the alignment to span the entire length [27].
[Figure 1: Alignments of the system call sequences A and B of the same malware sample, with the Start, Evasion, and End points marked and the malicious section of B shaded. (a) LCS-based alignment. (b) Local alignment.]
This approach is useful when there is no deviation in the malware behavior, or the deviation is minimal.

Local alignment algorithms, such as Smith-Waterman [30], tend to find good matches of local subsequences between two sequences. Hence, these algorithms identify regions of similarity within long sequences that are often widely divergent overall. This approach is better if there are large missing parts in the sequence. This is true for a system call sequence corresponding to evasion, such as sequence A in Figure 1(b), which is missing system calls corresponding to B2, the malicious section of B. For this reason, we use a Local Alignment algorithm for aligning system call sequences. Figure 1(b) represents the alignment using a local alignment algorithm. Notice that there is no undesirable alignment with the malicious section of the sequence B. The NtTerminateProcess system call is aligned even though the total number of matches is smaller compared to the LCS-based alignment (8 vs. 9 matches). The alignment in Figure 1(b) is clearly the better alignment for locating the evasion point compared to the LCS-based alignment in Figure 1(a).

3.1.1 Local Alignment

In this section, we briefly describe the Smith-Waterman [30] local alignment algorithm.

Given two sequences A = a_1, a_2, ..., a_n and B = b_1, b_2, ..., b_m of length n and m respectively, a maximum similarity matrix H is computed using the following induction:

\[
H(i, 0) = 0,\quad 0 \le i \le n, \qquad
H(0, j) = 0,\quad 0 \le j \le m,
\]

and

\[
H(i, j) = \max
\begin{cases}
0 \\
H(i-1, j-1) + \mathrm{Sim}(a_i, b_j) \\
\max_{k \ge 1} \{ H(i-k, j) + W_k \} \\
\max_{l \ge 1} \{ H(i, j-l) + W_l \}
\end{cases},
\qquad 1 \le i \le n,\ 1 \le j \le m
\]

where a and b are strings over the alphabet Σ, Sim(a, b) is a similarity score function on the alphabet, and W_i is the gap penalty schema. Here, H(i, j) represents the maximum similarity score between suffixes of [a_1, a_2, ..., a_i] and [b_1, b_2, ..., b_j].

To obtain the optimal local alignment, backtracking is performed starting from the highest value in the matrix H(i, j). We used a scalable implementation of the local alignment algorithm [14]. We provide more information about the similarity score function and gap penalty schema in the next sections.

3.2 System Call Alignment

A system call sequence consists of a sequence of system call events. While the order of biological sequences represents a structural property, the order of a system call sequence represents the temporal execution order. The order of system call events has stronger significance when events are interdependent. For example, in order to create a thread in a foreign process to run arbitrary code, one must follow a certain order of system calls. Even with the insertion of gaps, sequence alignment preserves this order while aligning sequences.

3.2.1 Similarity Score

One of the most important parts of the sequence alignment algorithm is the similarity-scoring schema. Based on the domain knowledge, the scoring schema computes a similarity score between two elements in the sequence. A straightforward approach would be to simply assign a value µ > 0 for a match and σ < 0 for a mismatch. Values of µ and σ can be constant values or they may depend on the pair of sequence elements being compared.

There are many studies on modeling similarity schemata for biological sequence alignment [3, 9]. These schemata are based on biological evidence, where a mismatch is treated as a mutation. In general, the match score µ is based on the functional significance of the match, and the mismatch score σ is statistically computed from the observed mutations seen in nature. Point Accepted Mutation (PAM) [9] and Blocks Substitution Matrix (BLOSUM) [3] are the two most widely used similarity schemata. The main focus of these schemata is to model mismatch scores based on the observed probability of the mutation under comparison. A similar approach may be useful while comparing system call
sequences of polymorphic variants of malware. However, we are comparing system call sequences of the same code. We observed that malware polymorphism happens mostly during the propagation step, i.e., while the malware sample creates a copy of itself, while runtime polymorphism is less common. Moreover, achieving the same functionality by replacing the system call is difficult. That is, the probability of mutation in the system call sequences extracted from two executions of the same malware sample is very small. This means that mismatches of system calls are less common. In our case, the challenge is to meaningfully quantify match and mismatch in the case of system calls. There may be a varying number of arguments associated with each system call event. Not all arguments are equally important for similarity computation. As discussed earlier, alignments of some system calls are more important than others. For example, we want to prioritize the alignment of NtCreateProcess over NtQueryValueKey because creating a process is a more critical event compared to reading a registry value. We can assign a high similarity value for a match of a critical system call, which helps build an “anchor point” during the alignment process. In our current model, the list of such critical system calls includes system calls that create and terminate processes and threads. We propose the following similarity-scoring schema for computing similarity between two system calls.

There are three main types of gap penalties used in the context of biological sequences: constant, linear, and affine gap penalty. The constant gap penalty simply gives a fixed negative score for each gap opening. This value does not depend on the length of the gap. This is a simple and fast schema. However, this schema gives too much freedom for sequence alignment, resulting in unnecessarily long gaps. The linear gap penalty, as the name implies, linearly increases the penalty score in proportion to the length of the indel. This method favors shorter gaps by severely penalizing long indels, which is not suitable in our context. The affine gap penalty combines both constant and linear gap penalties, taking the form g_a + (g_b ∗ L), where L is the length of the gap. That is, it assigns an opening gap penalty g_a, which increases with the rate of g_b. We can use a smaller value for g_b to favor longer gaps. By choosing |g_a| > |g_b|, we can model a gap penalty such that it is easier to extend a gap than to open it. We use this model of gap penalty when aligning system call sequences.

3.2.3 Parameter Selection

In our approach to system call alignment, like any other alignment problem, there are certain constraints we need to follow while designing similarity score and gap penalty parameters. More precisely, we want to have the following inequality relation as a guideline for choosing parameter values:
call corresponding to the thread. We create a new blank sequence and associate it with the new meta-node. A meta-node represents a branching point in the main sequence. We remove all occurrences of system calls associated with the new thread from the main sequence and append them to the newly created sequence associated with the meta-node. The one-to-one mapping of a new thread event and its corresponding NtCreateThread may not always be available in the execution profile. To this end, we assign a new thread event with the last unassigned call of NtCreateThread. During the alignment process, two meta-nodes are recursively processed to compute the similarity score. That is, to com-

Section 2. That is, evasion signature ∆ ⊂ E′, since ∆ = P ∪ Q. However, with large values of ω, evasion section E′ also includes many other events that are not related to evasion. By reducing the value of ω we can reduce the number of such unrelated events and improve the relation ∆ ≈ E′. We also observed that the comparison events in Q that are used for evasion are likely to be performed very close to the evasion point k. This allows us to reduce ω to smaller values and still have Q ⊂ E′. This approach might exclude call events made earlier in the sequence whose results are used later for evasion. To mitigate this, we include all call events that are related to comparison events in Q into the evasion signature.
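The effect of the window size ω and of the mitigation step can be sketched as follows. This is only an illustration under stated assumptions: the evasion section is taken to be the ω events immediately preceding the evasion point k (the exact definition is not reproduced in this section), events are referred to by their index in the trace, and the `Event` class, its `source` field, and `candidate_signature` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List, Optional, Set, Tuple


@dataclass(frozen=True)
class Event:
    kind: str                     # "call" or "cmp"
    name: str                     # call name, or a label for the comparison
    source: Optional[int] = None  # for "cmp": index of the call that produced the operand


def candidate_signature(events: List[Event], k: int,
                        omega: int) -> Tuple[Set[int], Set[int]]:
    """Return index sets (P, Q) taken from the assumed evasion section.

    Assumption: the evasion section is the omega events just before index k.
    """
    window = range(max(0, k - omega), k)
    Q = {i for i in window if events[i].kind == "cmp"}
    P = {i for i in window if events[i].kind == "call"}
    # Mitigation: add calls that feed a comparison in Q, even when they
    # were made earlier than the window.
    P |= {events[i].source for i in Q if events[i].source is not None}
    return P, Q


trace = [
    Event("call", "NtQueryValueKey"),
    Event("call", "NtAllocateVirtualMemory"),
    Event("call", "NtReadVirtualMemory"),
    Event("cmp", "CMP", source=0),
]
P, Q = candidate_signature(trace, k=4, omega=2)
print(sorted(P), sorted(Q))  # [0, 2] [3]
```

With ω = 2 the window covers only indices 2 and 3, so the registry query at index 0 would be lost without the mitigation step; it is recovered because the comparison at index 3 depends on it.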
Notice that, unlike the previous sequence alignment step, in this step a call event includes both system calls and API calls. Although many user API calls correspond to system calls, many user mode APIs may not trigger any system call. For example, the user mode API GetTickCount in Windows does not invoke any system call (native API). However, this API is widely used in timing-based evasions. We must include such call events in the evasion signature to make it more accurate and complete.

Initially, we set P as the set of all call events in the evasion section E′, and Q as the set of all comparison events in E′. However, even with smaller values of ω, the evasion section E′ still contains unrelated call events. In the next section, we describe our approach to filtering out these unrelated events using statistical observations.

4.2 Inverse Document Frequency

A call event used to retrieve information from the analysis environment for fingerprinting is usually unique to the evasive behavior. The majority of the malware samples that are not evasive do not retrieve those unique pieces of information. Similarly, if the same call event e (same name(e) and attrib(e)) is present in the call sequences of all non-evasive malware, such a call event is less likely to be used for evasion. We can filter out call events from the evasion section E′ that occur too often in the collection of call sequences of non-evasive malware. To perform such filtering, we use an inverse document frequency-based metric.

Inverse document frequency (idf) is commonly used in information retrieval [31]. It is a measure of whether a term is common or rare across all documents. Formally, the inverse document frequency of a term t in a collection of documents D is defined as:

\[
\mathrm{idf}(t, D) = \log \frac{N}{\mathrm{df}_t}
\]

where N is the total number of documents in the corpus and df_t is the document frequency, defined as the number of documents in the collection D that contain the term t.

In our case, a call event is a term, and the collection of call sequences of non-evasive malware is the document corpus D. For a call event e, a large value of idf(e, D) implies that the call event e is unique, and a small value of idf(e, D) implies that e is commonplace. Here, idf(e, D) = 0 means the call event e is present in all call sequences of D.

We define a threshold τ such that, if idf(e, D) < τ, we consider the call event e to be a common event having little

Qemu intermediate language, all comparison instructions of x86 architecture are translated into the same intermediate comparison instruction. For each comparison, taint labels of the operands are examined to determine the corresponding call events that produced the data byte. Consecutive comparisons are merged into a single comparison event. In case the comparison is performed with some constant, the constant value is also extracted.

Besides taint analysis, we also analyze handle dependencies between call events. This allows us to generate a more descriptive value of attrib(e) for the call event e. For example, if a registry key HKLM/System is opened by a call to NtOpenKey and the returned handle is later used for a call to NtEnumerateKey, we use the registry key name as the attrib(e) for the call event NtEnumerateKey.

An execution of a program contains many comparisons even if only comparisons with tainted operands are considered. However, many such comparisons originate from within API functions rather than the actual malware code. For this reason, comparisons inside user API calls are discarded, except for API calls that are designed specifically for data type comparison, such as strings and dates. Comparison events are included in the execution profile of the malware along with the system call and user API call events.

We build the sequence of malware execution events E from the execution profile generated by Anubis. We also extract the system call sequence from another execution environment that the malware evaded. Since the evasion code executes in both environments, we can extract its evasion signature from the Anubis execution profile regardless of which environment is evaded. We identify the evasion point and evasion section E′ using the approach described in the previous sections. We extract the call events P and the comparison events Q from the evasion section E′. We filter P using the idf-based method described previously.

Finally, all call events associated with Q are added to the set P. The union of P and Q represents our final evasion signature ∆ = P ∪ Q.

4.4 Clustering

Given a collection of evasive samples, we propose to assess different evasion techniques present in the collection based on the extracted evasion signatures. To do this, we perform hierarchical clustering of evasive samples. This allows a malware analyst to prioritize and selectively study different evasion techniques without analyzing randomly selected samples. To perform manual assessment of a particular clus-
or no discriminating power for building evasion signatures. ter, we can take an intersection of the evasion signatures of
We remove such call events {e : idf (e, D) < τ } from P . all samples from that cluster. That is, we inspect the eva-
sion signature elements that are common to all samples in
4.3 Event Dependency Analysis the cluster.
The next component of the evasion signature is the com- A hierarchical clustering requires a method to compute
parison events Q used for altering control flow during eva- pairwise similarity between two evasion signatures. An eva-
sion. Comparison events can be monitored with any fine- sion signature is essentially a set. We compute similarity
grained instruction-level execution monitoring. However, we between two evasion signatures ∆a and ∆b as a Jaccard
are interested in only those comparisons that involve the use Similarity J, which is given as:
of information generated by previous call events. To track
the information returned by call events we leverage taint | ∆a ∩ ∆b |
J(∆a , ∆b ) = .
analysis. To this end, we build upon the work of the Anubis | ∆a ∪ ∆b |
extension proposed in [5]. Anubis [1] is a malware analysis The result of a hierarchical clustering depends on the
framework, which uses Qemu-based full-system emulation as choice of the linkage method and the similarity measure,
the execution environment. In this approach, information where, the former is usually more critical than the latter [32].
returned by all call events is tainted at the byte level. Inside There are two main choices of linkage methods; single-linkage
and complete-linkage. We use the complete-linkage method for our clustering.
This is because the complete-linkage method prefers compact clusters with
small diameters over long, straggly clusters [21]. As we want maximum
similarity between all pairs of members in a cluster for assessment, the
complete-linkage method best fits our purpose.

5. EVALUATION

We evaluated our approach on real-world Windows-based evasive malware samples.
We made this choice because the majority of the evasive malware is observed on
this platform. Moreover, the majority of the malware analysis systems are also
focused on the same platform.

5.1 Execution Environments

In our evaluation, we provide two execution environments based on emulation
and hardware virtualization, respectively.

5.1.1 Emulation

We use Anubis [1] to extract malware execution events from an emulated
environment. Anubis performs execution monitoring by observing the execution
of precomputed guest memory addresses. These memory addresses correspond to
system call functions and user API functions. Anubis is able to extract
additional information about the API execution by inserting its own
instructions into the emulator’s instruction execution chain. Besides system
calls, we are able to extract additional information, such as user API calls
and comparison events, which are necessary for building evasion signatures.

5.1.2 Hypervisor

We use Ether [10] to extract malware execution events from a hardware-based
virtualized environment. Ether is a Xen-hypervisor-based transparent malware
analysis framework that utilizes Intel VT’s hardware virtualization extensions
[2]. The hardware virtualization makes it possible to execute most of the
malware instructions as native CPU instructions on the real hardware without
any interception. Thus, it does not suffer from inaccurate or incomplete
system emulation. It was observed that Ether can be evaded in its default
setup because it uses QEMU’s device model to provide virtualized hardware
peripherals [17]. We modified the device model used by Ether to prevent such
evasion.

5.2 Dataset

The input for our system is a collection of known evasive malware samples. For
the evaluation of our system, we received 3,107 evasive samples identified by
the BareCloud [17] system. We analyzed those samples in Anubis and our
modified Ether environments. We extracted system call traces and computed
behavior deviation scores as proposed in [17]. We found that 2,810 samples
evaded Anubis with respect to our Ether environment.

To build the ground truth dataset, we randomly selected 52 samples out of the
2,810 evasive samples. We manually analyzed those samples and identified the
calls and the comparisons that are related to the evasion. This information
constitutes the evasion signature ∆ of the malware samples. To evaluate the
alignment algorithm, which works only on the system call sequences, we
identified the most important system call that is critical to the evasion
technique as the evasion call. In case multiple related system calls are used,
we selected the last system call as the evasion call. For instance, let us
take an example evasion instance that opens a registry key
HKLM/HARDWARE/Description/System using NtOpenKey and reads the value of the
key SystemBiosVersion using NtQueryValueKey. Inside Anubis, the returned value
is QEMU -1 because of the underlying Qemu subsystem, which can be checked for
evasion. In this example, both system calls are related to evasion. However,
we select the last call to NtQueryValueKey as the evasion call. We note the
index of this instance of the system call in the sequence as the data point
used later in the experiments.

5.3 Algorithm Evaluation

In our approach, accurately finding the evasion point is the first and
critical step towards extracting evasion signatures. This depends on the
accuracy of the proposed sequence alignment algorithm for system calls. The
accuracy depends on several parameters used by the algorithm. In Section 3.2,
we discussed some guidelines for choosing optimal parameters for the
algorithm. However, there is no previous work in this area. Unlike in the
field of bioinformatics, an appropriate labeled dataset is lacking to build a
statistical model of similarity score for system call sequences. We use an
incremental approximation-based approach to find optimal values of the
parameters, which we describe in the next section.

5.3.1 Experiment with Scoring Function

To evaluate our guideline, we performed several experiments by varying
different scoring parameters. For this, we first chose to vary a set of four
main parameters (ga, gb, nwt, wt; see Section 3.2). Our preliminary
experiments showed that the values of these parameters play a major role in
the algorithm output. For the remaining parameters, we empirically assigned
constant values. For each set of parameter values (ga, gb, nwt, wt) we
performed sequence alignment to find the corresponding evasion point. Let A_m
be a sequence corresponding to a malware m and let k_m be the index to the
calculated evasion point. Let e′_m be the index to the evasion call in A_m,
which is known as the ground truth. We say that an evasion section of width w′
successfully captures the evasion call if k_m − w′ ≤ e′_m < k_m. That is, the
evasion call is within the evasion section defined by w′. For a set of N
samples, we compute the recall rate corresponding to the parameter set (ga,
gb, nwt, wt) and evasion section w′ as TP/N, where TP is the number of samples
whose evasion call is within the evasion section defined by w′.

Figure 2 shows the results of the recall rate of some parameter sets when
varying the evasion section of width w′. We used the ground truth dataset as
described in Section 5.2. The area under the curve (AUC) represents the
relative performance of the choice of the parameters. The result validates
some of our initial intuitions. For example, a choice of |ga| > |nwt|
decreases the algorithm performance (compare top and second curves), a
relatively large score for a match compared to the gap penalty degrades
performance (third curve), and a large gap extension penalty gb is not
favorable (top and bottom curves).

There are many possible combinations of parameter choices. To find the optimal
choice, we computed AUC values for all possible combinations when ga, gb, nwt,
and wt are selected from the sets of 10 values for each parameter ranging from
-10
[Figure 2: recall rate (y-axis) for selected parameter sets. Legend:
ga = −10, gb = −0.01, nwt = −2, wt = 3; ga = −1, gb = −0.10, nwt = −2,
wt = 10; ga = −2, gb = −0.50, nwt = −2, wt = 3.]
[Figure: AUC (y-axis, ticks 65 to 85); category label: LCS.]
tree. This way, the clusters formed are very distinct from
each other, representing distinct evasion techniques. A cut
[Figure: precision and recall (y-axis) versus idf threshold τ (x-axis, 0 to 10).]
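As an illustration of the two set-based steps in Sections 4.2 and 4.4, the following is a minimal Python sketch of the idf filter and the Jaccard similarity between evasion signatures. It is not the MalGene implementation. The function names, the representation of a call event as a (name, attrib) tuple, the toy traces, the use of the natural logarithm, and the choice to treat an event that never occurs in the non-evasive corpus as having infinite idf are all assumptions made for this example.

```python
import math


def idf(event, corpus):
    """idf(e, D) = log(N / df_e) over a corpus of non-evasive call sequences.

    Assumptions: natural logarithm; an event absent from every trace
    (df_e = 0, where the formula is undefined) is treated as infinitely rare.
    """
    n_docs = len(corpus)
    df = sum(1 for trace in corpus if event in trace)
    if df == 0:
        return math.inf
    return math.log(n_docs / df)


def filter_common_calls(calls, corpus, tau):
    """Drop call events with idf(e, D) < tau, keeping the rarer ones."""
    return {e for e in calls if idf(e, corpus) >= tau}


def jaccard(sig_a, sig_b):
    """Jaccard similarity |A intersect B| / |A union B| of two signatures."""
    union = sig_a | sig_b
    if not union:
        return 1.0  # assumption: two empty signatures count as identical
    return len(sig_a & sig_b) / len(union)


# Hypothetical corpus D: each non-evasive trace is a set of (name, attrib).
NON_EVASIVE = [
    {("NtClose", ""), ("NtOpenFile", "C:/a")},
    {("NtClose", "")},
    {("NtClose", ""), ("NtOpenKey", "HKLM/Software")},
    {("NtClose", "")},
]

# Hypothetical call events P taken from an evasion section.
P = {
    ("NtClose", ""),
    ("NtOpenKey", "HKLM/Software"),
    ("NtQueryValueKey", "SystemBiosVersion"),
}

filtered = filter_common_calls(P, NON_EVASIVE, tau=1.0)
similarity = jaccard({"a", "b", "c"}, {"b", "c", "d"})
```

In this toy corpus, NtClose appears in every trace, so its idf is 0 and it is removed for any positive τ, while the registry query that never appears in non-evasive traces is retained. Raising τ removes more events, which matches the trade-off between precision and recall that the threshold controls.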