MetaHunt: Towards Taming Malware Mutation via Studying the Evolution of Metamorphic Virus


Li Wang, The Pennsylvania State University, University Park, PA 16802, USA
Dongpeng Xu, University of New Hampshire, Durham, NH 03824, USA
Jiang Ming, University of Texas at Arlington, Arlington, TX 76019, USA
Yu Fu, The Pennsylvania State University, University Park, PA 16802, USA
Dinghao Wu, The Pennsylvania State University, University Park, PA 16802, USA

ABSTRACT
As the underground malware industry prospers, malware developers constantly try to camouflage malicious code and undermine malware detection with various obfuscation schemes. Among these schemes, metamorphism has the potential to defeat popular signature-based malware detection. A metamorphic malware sample mutates its code during propagation, so each instance of the same family bears little resemblance to any other variant. Advances in compiler and binary rewriting techniques will make metamorphic malware much easier to develop, and it may eventually break out. To fully understand the metamorphic engine, which is the core of metamorphic malware, we systematically study how metamorphic malware evolves over time. Unlike previous work, we require no prior knowledge about the metamorphic engine in use. Instead, we perform trace-based semantic binary diffing to compare mutated code iteratively and to memoize semantically equivalent basic blocks. We have developed a prototype called MetaHunt and evaluated it with 1,400 metamorphic malware variants. Our experimental results show that MetaHunt accurately captures the semantics of unknown metamorphic engines and that all of the comparisons converge in a reasonable time. MetaHunt also identifies several metamorphic engine bugs that lead to semantics-breaking transformations. We summarize the lessons learned from our empirical study, hoping to stimulate the design of mutation-aware solutions that defend against this threat proactively.

CCS CONCEPTS
• Security and privacy → Intrusion/anomaly detection and malware mitigation.

KEYWORDS
Malware detection, metamorphic virus, binary diffing, binary code semantics analysis

ACM Reference Format:
Li Wang, Dongpeng Xu, Jiang Ming, Yu Fu, and Dinghao Wu. 2019. MetaHunt: Towards Taming Malware Mutation via Studying the Evolution of Metamorphic Virus. In 3rd Software Protection Workshop (SPRO’19), November 15, 2019, London, United Kingdom. ACM, New York, NY, USA, 12 pages. https://doi.org/10.1145/3338503.3357720

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].
SPRO’19, November 15, 2019, London, United Kingdom
© 2019 Association for Computing Machinery.
ACM ISBN 978-1-4503-6835-3/19/11. . . $15.00
https://doi.org/10.1145/3338503.3357720

1 INTRODUCTION
The malicious software (malware) underground market has evolved into a multi-billion dollar industry [6]. Driven by rich profits, the number and diversity of malware have grown steadily. According to a Panda Security Lab annual report [40], the total number of malware samples in circulation reached 75 million in 2017 alone, 1.4 times the number found in 2016. Malware developers typically apply various obfuscation schemes (e.g., packing, polymorphism, and metamorphism) [37, 45] to hide distinctive features, circumvent malware detection, and impede reverse engineering. Among these obfuscation techniques, metamorphism is widely believed to be a panacea for thwarting signature-based malware scanning [1, 47, 56], which is still by far the most widely used anti-malware solution in practice [8]. The core of metamorphic malware is a metamorphic engine (i.e., morphing engine). Each time a metamorphic malware sample executes or propagates, the metamorphic engine mutates the instructions loaded into memory. It uses methods such as register swapping, instruction substitution, instruction reordering, and junk code insertion. As a result, the old version is transformed into a syntactically different but semantically equivalent variant. A metamorphic malware sample therefore becomes a moving target for analysis as the archetype¹ evolves from generation to generation. Consequently, signature-based anti-malware approaches cannot capture the many ostensibly different variants of a particular piece of malware, as illustrated in Figure 1. Leder et al.’s 2009 study [27] gives a striking example. They report that a total of 40 malware scanners in VirusTotal² detect only 12.6% of the files infected by the metamorphic malware Lexotan32, and no single scanner identifies all the infected samples.

The prototype of metamorphic malware first emerged in the DOS days [48] and has been evolving constantly on Windows platforms [3, 17]. However, metamorphic obfuscation was not widely adopted in the past, compared with packing and polymorphism. The main reason is that developing a full-fledged metamorphic engine is highly complicated, especially for self-propagating malware, which typically carries the metamorphic engine in its own code. For example, the relatively sophisticated sample MetaPHOR (a.k.a. W32.Simile and W32.Etap) has about 14,000 lines of assembly code, and the metamorphic engine occupies more than 90% of that code [19]. In recent years, automated development toolkits such as LLVM [26], SecondWrite [2], and Uroboros [51, 52] have emerged, and they will make developing a powerful metamorphic engine relatively easy. For example, LLVM has been actively employed to facilitate malware mutation and diversification [24, 42, 49]. We therefore expect new malware variants with advanced metamorphic engines to break out in the foreseeable future. To stay ahead in the malware defense arms race, we have to measure the risks of metamorphic malware and develop effective countermeasures.

A major challenge in metamorphic malware analysis is to design a general and automatic technique that captures all possible mutations [44]. Previous research studies the similarities before and after metamorphism and falls into three categories. The first category measures the similarity of static features such as control flow graphs [4], opcode statistical signatures [10], instruction hidden Markov models [55], and characteristic value sets [27]. Chouchane et al. [9] introduce an engine-specific scoring signature to match metamorphic engines. These approaches are geared toward quickly filtering out simple metamorphic malware. However, they are too brittle to defeat sophisticated samples, whose code may even be encrypted [43]. Metamorphic engines can also be decoupled from the malicious code to mutate non-propagating malware offline. For example, the engines of the highly metamorphic malware created by NGVCK (Next Generation Virus Creation Kit) [55] are separate from the malicious body. In that case, the “engine signature” approach is futile. The second category assumes that the malicious behavior does not change during code mutation. These approaches detect metamorphic malware by measuring the similarity of API call sequences or graphs [28, 34, 56]. Their main drawback is that they treat code mutation as a “black box” and give no insight into the metamorphic engine itself.

The third category, also the most advanced one, aims to capture the metamorphic engine’s semantics [11]. These approaches model metamorphism by semantic juice [25], algebraic specification [53], or abstract interpretation [13, 14]. A metamorphic engine is built around a set of morphing rules (e.g., equivalent instruction substitution patterns). These rules guide how instructions are transformed into equivalent ones with different syntax. Work in the third category commonly assumes that the metamorphic transformation rules are well known. Several prototypes of metamorphic malware have been well studied or open sourced [19, 38], so it may seem that prior knowledge about morphing rules is easy to collect. However, this optimistic assumption does not always hold in practice, because an expert malware developer can always design an alternative mutation scheme [39]. Unfortunately, manually tracing metamorphic mutations often takes several days or even weeks of tedious work, and the results are incomplete and error-prone.

In this work, we present MetaHunt, a tool to study the evolution of metamorphic malware mutation over time. Our goal is to understand the diversity of metamorphic transformations comprehensively and to give insight into the mutation mechanism behind metamorphic malware. This insight can in turn stimulate the development of mutation-insensitive malware protection. Unlike previous work, we do not assume knowledge about the specific metamorphic engine in use. Instead, we study how a metamorphic engine mutates code by iteratively comparing input-related mutated code and memoizing equivalent basic blocks. Our approach rests on two key observations. First, metamorphic mutation is a semantics-preserving transformation. Code pairs that look different but have the same function can therefore be matched by state-of-the-art semantics-based binary diffing techniques [21, 30, 35]. Second, equivalent instruction substitution is harder to reverse than other metamorphic transformations (e.g., via code normalization [5]) because of the cumbersome x86 instruction set architecture. At the same time, the set of pure equivalent instruction substitution patterns is limited [33, 50]. For example, the code substitution table of MetaPHOR consists of 94 alternative instruction sequences [19]. Consequently, once our preprocessing removes mutation methods such as junk code and opaque predicates, the iterative comparison of metamorphic mutations does not go on forever. It converges when no new semantically equivalent code is discovered.

More specifically, given two metamorphic mutations, we first use multi-tag taint analysis to identify the basic blocks that inputs can affect. Next, we perform normalization to reverse the mutation methods that may change the boundaries of a basic block. We then use symbolic execution to represent the semantics of each basic block as a set of logical formulas. A theorem prover compares these formulas to find semantically equivalent basic block pairs. The semantically equivalent basic blocks are memoized in a union-find set [12], an efficient tree-based data structure. We keep comparing metamorphic variants and maintaining the corresponding union-find sets until we reach a fixed point, where the union-find sets grow little or not at all. At that point, we consider the mutation evolution of the metamorphic malware explored.

In theory, finding all metamorphic mutations is equivalent to solving the halting problem [25]. Even so, the collected information has many interesting practical implications. For example, it can be used to generate a mutation-insensitive signature that captures all possible metamorphic variants, and it may even be used to recover malware lineage information [23].

¹ The term “archetype” means the initial unmutated version from which the mutation starts.
² https://www.virustotal.com/
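The fixed-point exploration described above can be illustrated with a loop over successive variants that stops once the union-find sets stop growing. This is a minimal sketch under stated assumptions, not MetaHunt's implementation. The `equivalent` oracle is a hypothetical stand-in for the symbolic execution and theorem-prover check. Basic blocks are plain hashable keys. The `patience` parameter, also hypothetical, models "little or no increase" as a number of consecutive variants that add no new equivalence.

```python
from typing import Callable, Dict, Hashable, Iterable, List


class UnionFind:
    """Minimal union-find over hashable basic-block keys."""

    def __init__(self) -> None:
        self.parent: Dict[Hashable, Hashable] = {}

    def find(self, x: Hashable) -> Hashable:
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a: Hashable, b: Hashable) -> bool:
        """Merge the two subsets; return True only if they were separate."""
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return False
        self.parent[ra] = rb
        return True


def explore(variants: Iterable[List[Hashable]],
            equivalent: Callable[[Hashable, Hashable], bool],
            patience: int = 2) -> UnionFind:
    """Fold each variant's blocks into the union-find sets and stop once
    `patience` consecutive variants add no new equivalence (a fixed point)."""
    uf = UnionFind()
    seen: List[Hashable] = []
    quiet = 0
    for blocks in variants:
        new_links = 0
        for b in blocks:
            for old in seen:
                # Only ask the (expensive) oracle about blocks in different subsets.
                if uf.find(b) != uf.find(old) and equivalent(b, old):
                    new_links += uf.union(b, old)
            if b not in seen:
                seen.append(b)
        quiet = quiet + 1 if new_links == 0 else 0
        if quiet >= patience:
            break
    return uf
```

Variants arriving after the stopping point are never examined. This matches the idea that exploration ends at the fixed point rather than after an exhaustive pass.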
[Figure 1 image: Variants A to E and further mutations, each checked against an instance-specific signature, with outcomes labeled "signature matching" and "no matching signature".]

Figure 1: Metamorphic malware can evade conventional signature-based anti-malware solutions.
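The introduction notes that equivalent instruction substitution is hard to reverse on x86. Subtle side effects are part of the difficulty. The sketch below is an illustrative model, not taken from MetaHunt. It compares `neg eax` with the two's-complement rewrite `not eax; add eax, 1`, tracking only the destination register and the carry flag (CF). The other flags are omitted. It checks every value exhaustively at a reduced 16-bit width, whereas MetaHunt itself reasons symbolically with STP.

```python
from typing import Tuple

WIDTH = 16                      # reduced register width so enumeration stays cheap
MASK = (1 << WIDTH) - 1


def neg(x: int) -> Tuple[int, int]:
    """NEG reg: two's-complement result; CF = 0 if the operand is 0, else 1."""
    return (-x) & MASK, 0 if x == 0 else 1


def not_add1(x: int) -> Tuple[int, int]:
    """NOT reg (flags untouched), then ADD reg, 1 (CF = unsigned carry out)."""
    total = (~x & MASK) + 1
    return total & MASK, 1 if total > MASK else 0


def compare_all() -> Tuple[int, int]:
    """Count inputs on which the register results, and the carry flags, agree."""
    same_result = same_cf = 0
    for x in range(1 << WIDTH):
        r1, cf1 = neg(x)
        r2, cf2 = not_add1(x)
        same_result += r1 == r2
        same_cf += cf1 == cf2
    return same_result, same_cf


if __name__ == "__main__":
    same_result, same_cf = compare_all()
    print(f"register equal on {same_result} of {1 << WIDTH} inputs")
    print(f"carry flag equal on {same_cf} of {1 << WIDTH} inputs")
```

The register results agree on all 65,536 inputs, but the carry flag never agrees. NEG sets CF whenever the operand is nonzero, while the ADD carries out only when the original operand was 0. A substitution like this is therefore safe only if no later instruction, such as adc or a conditional jump, reads CF before it is overwritten.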

We have implemented a prototype of MetaHunt on top of the BitBlaze [46] binary analysis platform. MetaHunt improves the semantics-based binary diffing technique in two ways: it is more resilient to highly obfuscated binary code, and it performs better. We conduct a solid empirical study with 1,400 metamorphic malware samples. The samples are generated by nine metamorphic engines, including two advanced malware mutation tools based on LLVM [24, 41]. The evaluation shows that the iterative comparison of metamorphic malware variants converges in a reasonable time. Compared to manual reverse engineering of malware, MetaHunt’s exploration results give a comprehensive understanding of the mechanism of a metamorphic engine. In addition, MetaHunt identifies several buggy metamorphic engine implementations that ignore subtle side effects of x86 instructions. The MetaHunt prototype provides a way to record and compare the semantics of metamorphic malware, which offers practical hints for mutation-insensitive anti-malware solutions. The results demonstrate that MetaHunt is an appealing complement to existing metamorphic malware defenses.

In summary, the contributions of this paper are as follows.

(1) To the best of our knowledge, we are the first to study metamorphic malware evolution systematically.
(2) Our approach is not specific to one metamorphic engine. It is a general solution that automatically compares the possible mutations and memoizes semantically equivalent basic blocks. Our exploration results give a comprehensive understanding of metamorphic engine semantics.
(3) We present MetaHunt, a novel approach to comparing code similarities before and after metamorphic mutation. MetaHunt brings the advanced semantics-based binary diffing technique into metamorphic malware analysis and improves its accuracy and performance.

The rest of the paper is organized as follows. Section 2 provides background information and related work. Sections 3 and 4 present our system design and implementation in detail. We evaluate MetaHunt in Section 5. Section 6 discusses limitations, and Section 7 concludes the paper.

2 BACKGROUND AND RELATED WORK

2.1 Metamorphic Malware
Metamorphic malware mutates its code in each generation. Each newly generated version has different instructions from the previous one, but its semantics are preserved. This differs from polymorphic malware (e.g., via binary packing), which does not rewrite its own code [37]. Because the code keeps changing, signature-based anti-malware approaches find it difficult to recognize all the mutations of the same metamorphic malware. The core of metamorphic malware is a metamorphic engine, which performs a set of transformations to mutate the code. Commonly used code morphing methods are register swapping, instruction substitution, instruction reordering, junk code insertion, and control flow obfuscation (e.g., opaque predicates and control flow flattening). We refer the reader to the literature [37, 47] for more detailed information. As shown in Figure 2, MetaPHOR [19] substitutes one instruction with a semantics-preserving sequence of instructions. Lexotan32 mutates its code by inserting junk code (the instructions in italics) and reordering instructions. Note that after mutation, the original single basic block in Figure 2(b) has been divided into multiple basic blocks.

Among the various mutation methods, instruction substitution is the most sophisticated. Because the x86 ISA is cumbersome, checking whether two instruction sequences are semantically equivalent is challenging. Advanced semantics-based binary diffing has to rely on symbolic execution and theorem proving to match equivalent instructions. Typically, a metamorphic engine performs code substitution by looking up instructions in a fixed table of alternative sequences and then choosing one at random. Figure 2(a) presents part of MetaPHOR’s code substitution table. On the other hand, the set of pure equivalent instruction substitution rules is not unlimited either [33, 50]; that is, the code substitution table has a fixed length. These observations form the basis of our approach.

Section 1 introduced existing work on metamorphic malware analysis. However, a systematic study of metamorphic malware evolution is still missing. Understanding how a morphing engine mutates code over time without a priori knowledge is an
interesting and challenging research problem. In this paper, we propose MetaHunt to explore this problem.

[Figure 2 listing: (a) MetaPHOR [19] instruction substitution pairs, before and after; (b) Lexotan32 [38] code before and after mutation, where the mutated version is split into blocks labeled start, loc_0003, loc_0017, and loc_0034 and interleaved with junk instructions.]

Figure 2: Metamorphism transformation examples.

2.2 Semantics-based Binary Diffing
Most malware spreads in binary form, so techniques that detect the differences between two binaries (binary diffing) have been widely applied to malware reverse engineering. Conventional binary diffing tools identify syntactic differences such as instruction sequences, byte N-grams, and basic block hashes [36]. However, various obfuscation methods can easily evade them. Advanced semantics-based binary diffing [21, 29, 30, 35] first identifies semantically equivalent basic block pairs. It represents the inputs to a basic block as symbolic values and simulates each instruction by updating the corresponding symbolic formula. Symbolic execution yields a set of formulas that describe the behavior of the basic block. The tool then checks whether an equivalent mapping exists between the output formulas of two basic blocks. If so, the two basic blocks are semantically equivalent. Figure 3 presents two semantically equivalent basic blocks whose output symbolic formulas a constraint solver (e.g., STP [20]) verifies as equivalent. Obfuscation such as register renaming can make basic blocks use different registers or variables to implement the same functionality. Current approaches therefore exhaustively try all possible pairs to find a bijective mapping between output formulas. However, symbolic execution is slow and the constraint solver is invoked many times, so semantics-based binary diffing suffers a significant performance slowdown [29].

The work most relevant to MetaHunt is the memoized binary diffing method [32], another trace-oriented binary diffing tool for matching basic block pairs. However, MetaHunt is designed to compare a large number of obfuscated metamorphic malware variants, whereas the method in [32] compares different versions of normal programs. Compared to that method, MetaHunt adds better resilience to various code obfuscation methods (e.g., call/return obfuscation and opaque predicates) and a set of optimizations. MetaHunt therefore achieves better accuracy and performance when analyzing metamorphic malware.

3 SYSTEM DESIGN

3.1 Overview
Figure 4 illustrates the architecture of MetaHunt. It comprises two main parts: online trace logging and offline comparison. The online part produces a sequence of executed basic blocks together with their associated taint tags and passes them to the offline part for comparison. MetaHunt’s offline stage consists of three components. The first is normalization. The second is basic block comparison with the semantics-based binary diffing technique. The third is a union-find set structure that records semantically equivalent basic blocks. The normalization component performs several transformations to remove
obfuscation effects. A symbolic-execution-based method then compares the normalized basic blocks. Finally, equivalent basic blocks are inserted into the same union-find set. The following sections discuss each component in detail.

Basic block 1 (symbolic input: eax = i)
    xor eax, -1
    add eax, 1
    jmp loc_0022
    Output: eax = (i ^ -1) + 1;

Basic block 2 (symbolic input: ebx = j)
    not ebx
    not ebx
    neg ebx
    jmp loc_0022
    Output: ebx = ((j ^ -1) ^ -1) × -1;

Figure 3: Example: basic block symbolic execution.

3.2 Trace Logging
The online trace logger records the basic blocks executed at runtime. Not all executed instructions are of interest; code from packers or standard libraries, for example, is not. We want to compare the basic blocks that represent the virus behavior. Our online stage therefore records the execution trace of the real payload rather than of various unpacking routines [45]. When a packed binary starts running, the generic unpacking plug-in monitors whether the original code has been recovered. Once it has, the trace logging plug-in is activated to record the execution trace. Different metamorphic variants usually still call standard libraries, but the basic blocks in these libraries should not be compared. Our trace logger therefore records only code from the metamorphic virus and ignores standard library calls.

Besides ignoring unrelated basic blocks at runtime, we also limit our comparison to input-related code. The insight is that the basic blocks related to inputs implement the core functions of a virus, so these are the blocks to record and compare. To this end, we use multi-tag forward taint tracking to record input-related code, which also reduces the number of possible basic block matches. We treat each system call that receives outside input as a separate taint seed. We also treat as seeds the system calls commonly used to carry out malicious behavior, such as download and execution, replication, and remote injection. For example, a MetaPHOR variant invokes about 20 Windows Native API calls³ when it executes, to replicate itself and display its messages. For file-infecting metamorphic viruses (e.g., MetaPHOR and W32.Evol), multi-tag taint tracking can also distinguish the host file code from the virus body code. The input-related basic blocks, together with their associated taint tags, are passed to MetaHunt’s offline stage for comparison.

³ System calls in Windows are called the Native API.

3.3 Basic Block Normalization
After logging the execution trace, MetaHunt first lifts x86 instructions to an intermediate representation (IR), which facilitates the subsequent analysis. Most semantics-based binary diffing work uses the basic block as its unit of comparison [21, 29, 30]. However, many obfuscation methods can split a single basic block into multiple basic blocks. Directly comparing the split basic blocks with the original block then leads to false negatives, and the extra basic block comparisons increase the performance cost. We therefore run a normalization pass to reverse these obfuscation effects. Currently, we handle three major obfuscation methods: instruction reordering, call/return obfuscation, and opaque predicate obfuscation. Instruction reordering splits one basic block into multiple new basic blocks connected through direct jumps. Call/return obfuscation uses the call and ret instructions in non-standard ways [45]. For example, push ADDR; ret is equivalent to jmp ADDR. Reverting instruction reordering and call/return obfuscation is straightforward: we merge all adjacent basic blocks that have only one predecessor and one successor into a single basic block.

Our normalization also removes opaque predicate obfuscation. The obfuscator knows the value of an opaque predicate at obfuscation time, but an attacker finds it difficult to work out afterward. For example, the predicate $x^3 - x \equiv 0 \pmod{3}$ is true for all integers $x$. The same holds for the predicate $x^2 + x \equiv 0 \pmod{2}$ used in Figure 5. Opaque predicates have been widely used to introduce redundant branches for control flow obfuscation [31]. To handle them, we submit each branch condition to a constraint solver to check whether it is always true or always false. If it is, we conclude that the branch condition is an opaque predicate. Then, as shown in Figure 5, we discard the unreachable paths and redundant predicates and merge the basic blocks that the opaque predicate had split.

Normalization also makes basic blocks independent of offsets that may change due to code relocation, and it drops certain nop instructions. Binary code compiled from the same source code often contains different address values because of memory relocation during compilation. Malware authors may also insert instruction idioms such as nop and xchg eax, eax on purpose to disrupt the subsequent hash value calculation (see Section 3.4). Normalization removes these effects so that the hash value is more general.

3.4 Basic Block Comparison and Memoization
The candidates for comparison are basic blocks tainted by the same taint tags. Our basic block comparison builds on semantics-based binary diffing and improves it in several ways. First, we introduce a union-find set structure that records semantically equivalent basic blocks. Because this structure is maintained across successive comparisons, previously computed results can be reused directly rather than recomputed. Specifically, after basic block normalization, we first calculate the MD5 value of the byte sequence of each basic block. We then dynamically maintain a collection of union-find subsets that record semantically equivalent basic blocks, each block represented by its MD5 value. All basic blocks within the same subset are semantically equivalent to each other. To avoid a highly unbalanced search tree, we adopt an improved path compression and weighted union algorithm [12]. Alongside the union-find set, we also maintain a DiffMap that records pairs of subsets verified as not equivalent. If two
basic blocks from two different subsets are not equivalent, we can safely conclude that no other pair of basic blocks drawn from those two subsets matches either. Our evaluation data confirm that the basic blocks within the same union-find set are mutations caused mainly by instruction substitution.

Algorithm 1 presents the method for quickly comparing basic block variants. To compare two basic blocks, we first normalize them and compare their hash values (Line 4). This step quickly filters out basic blocks with nearly identical instructions. If the two hash values differ, we check whether the blocks belong to the same union-find subset (Line 7), since basic blocks within the same subset are semantically equivalent. If they are in different subsets, we check DiffMap to see whether these two subsets have already been shown to be not equivalent (Line 10). Only as a last resort do we compare them with symbolic execution and STP, which is accurate but computationally more expensive. We then update the union-find set and DiffMap accordingly (Lines 16∼20).

[Figure 4 diagram. Online: Variant, Variant → TEMU (generic unpacking, multi-tag taint). Offline: Normalization (merge basic blocks, remove opaque predicate, reverse code relocation effects) → basic block pairs comparison → equivalent basic block union-find set.]

Figure 4: The architecture of MetaHunt.

[Figure 5 diagram: before normalization, block A; branches on the always-true predicate $x^2 + x \equiv 0 \pmod{2}$, with the true edge leading to B; and the false edge leading to junk code; after normalization, A; and B; form a single block.]

Figure 5: Remove opaque predicate (x is an integer).

Algorithm 1 A Fast Comparison of Basic Block Variants
Input: v1, v2: two basic block variants
 1: function FastCompare(v1, v2)
 2:     v1′ ← Normalize(v1)
 3:     v2′ ← Normalize(v2)
 4:     if MD5(v1′) = MD5(v2′) then
 5:         return True
 6:     end if
 7:     if Find(v1′) = Find(v2′) then        // within the same subset
 8:         return True
 9:     end if
10:     if v1′ and v2′ in DiffMap set then
11:         // semantically different subsets
12:         return False
13:     else
14:         if Sym_Exec(v1′) ∼ Sym_Exec(v2′) then
15:             // v1′, v2′ are semantically equivalent
16:             Union(v1′, v2′)
17:             Update DiffMap
18:             return True
19:         else    // v1′, v2′ are not semantically equivalent
20:             Add DiffMap(Find(v1′), Find(v2′))
21:             return False
22:         end if
23:     end if
24: end function

4 IMPLEMENTATION
We have implemented MetaHunt on top of BitBlaze [46], a binary analysis platform. MetaHunt’s trace logging is built on TEMU, BitBlaze’s whole-system emulator for dynamic analysis [46]. The online part consists of two plug-ins: generic unpacking and multi-tag taint tracking. TEMU also serves as the malware execution sandbox in our evaluation. We extend TEMU to perform multi-tag taint analysis on system calls. We intercept input-related system calls and assign taint tags to their return values, labeling each taint source with a different tag. The online logging part adds or modifies 2,200 lines of code in TEMU. MetaHunt’s offline stage is based on Vine, BitBlaze’s static analysis platform, and consists of 2,900 lines of OCaml code. Our data flow analysis for removing junk code extends Vine’s chopping module, and the theorem prover is STP [20]. Vine’s intermediate representation is RISC-like and in static single assignment (SSA) form, which fits the needs of our analysis. It can represent many functionally equivalent instructions (e.g., xor eax, eax and and eax, 0) in the same way, and we extend this feature for the normalization component. The union-find set and the query hash map are saved and loaded with the OCaml Marshal API, which encodes arbitrary data structures as byte sequences and stores them in a disk file. We also wrote 500 lines of Perl scripts to glue all components together and automate the comparison process.
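Section 2.2 notes that register renaming forces semantics-based diffing to try all pairings of output formulas in search of a bijective mapping. The sketch below illustrates that search with `itertools.permutations`, under simplifying assumptions that are not from the paper. Each block's outputs are modeled as Python functions of a shared input vector, with inputs assumed to be already aligned by position. Equivalence is tested on sampled 32-bit inputs rather than proved with a solver such as STP, so a match is evidence, not a proof.

```python
import itertools
import random
from typing import Callable, Dict, Optional, Sequence

MASK32 = 0xFFFFFFFF
Formula = Callable[[Sequence[int]], int]  # input values -> one output value


def find_bijection(outs_a: Dict[str, Formula],
                   outs_b: Dict[str, Formula],
                   num_inputs: int,
                   trials: int = 64,
                   seed: int = 0) -> Optional[Dict[str, str]]:
    """Search for a one-to-one mapping from A's outputs to B's outputs under
    which every paired formula agrees on all sampled inputs."""
    if len(outs_a) != len(outs_b):
        return None
    rng = random.Random(seed)
    samples = [[0] * num_inputs, [MASK32] * num_inputs]   # edge cases first
    samples += [[rng.getrandbits(32) for _ in range(num_inputs)]
                for _ in range(trials)]
    names_a = list(outs_a)
    for perm in itertools.permutations(outs_b):           # n! candidate mappings
        if all(outs_a[a](s) == outs_b[b](s)
               for a, b in zip(names_a, perm) for s in samples):
            return dict(zip(names_a, perm))
    return None


if __name__ == "__main__":
    # Figure 3 style blocks; the second output of each block is hypothetical.
    block_a = {
        "eax": lambda v: ((v[0] ^ MASK32) + 1) & MASK32,          # xor eax, -1; add eax, 1
        "ecx": lambda v: (v[1] + 1) & MASK32,
    }
    block_b = {
        "edx": lambda v: (v[1] + 1) & MASK32,
        "ebx": lambda v: (-((v[0] ^ MASK32) ^ MASK32)) & MASK32,  # not; not; neg
    }
    print(find_bijection(block_a, block_b, num_inputs=2))
```

Here the first candidate pairing (eax with edx) fails on the all-zero sample, and the second succeeds, printing `{'eax': 'ebx', 'ecx': 'edx'}`. The factorial number of candidates is one reason the text describes this step as expensive.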
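The multi-tag forward taint tracking of Section 3.2 can be illustrated on a toy register-level trace. This is a hypothetical sketch, far simpler than TEMU's whole-system taint engine. Each trace entry is a (block, opcode, destination, sources) tuple, and each seed location gets its own tag named after a system call. `NtReadFile` and `NtCreateFile` serve purely as example tag labels. A block is recorded, with the tags it saw, whenever one of its instructions reads tainted data.

```python
from collections import defaultdict
from typing import DefaultDict, Dict, List, Set, Tuple

# One trace entry: (basic block id, opcode, destination, source locations)
Instr = Tuple[str, str, str, Tuple[str, ...]]


def track(trace: List[Instr], seeds: Dict[str, str]) -> Dict[str, Set[str]]:
    """Forward multi-tag taint over a toy trace.

    `seeds` maps a location holding a system call's result to a tag naming
    that call. Returns, for each block that reads tainted data, the union of
    tags it saw. Idioms such as `xor eax, eax` are not special-cased here."""
    shadow: DefaultDict[str, Set[str]] = defaultdict(set)
    for loc, tag in seeds.items():
        shadow[loc].add(tag)
    block_tags: DefaultDict[str, Set[str]] = defaultdict(set)
    for block, opcode, dst, srcs in trace:
        reads: Set[str] = set()
        for src in srcs:
            reads |= shadow[src]
        if opcode != "mov":
            reads |= shadow[dst]  # two-operand ops (add, xor, ...) also read dst
        shadow[dst] = reads       # for mov this drops dst's stale taint
        if reads:
            block_tags[block] |= reads
    return dict(block_tags)


if __name__ == "__main__":
    seeds = {"eax": "NtReadFile", "esi": "NtCreateFile"}   # example tags
    trace = [
        ("B1", "mov", "ecx", ("eax",)),   # ecx <- input from NtReadFile
        ("B1", "mov", "edx", ()),         # mov edx, imm: untainted
        ("B2", "add", "ecx", ("esi",)),   # mixes both inputs
        ("B3", "mov", "edi", ("edx",)),   # untainted block, not recorded
    ]
    print(track(trace, seeds))
```

In this example, B2 ends up carrying both tags because it combines both inputs. B3 never touches tainted data and drops out, which is how input-related filtering shrinks the set of candidate block pairs.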
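The normalization steps of Section 3.3 can be approximated on textual listings. This sketch rests on assumptions and is not MetaHunt's Vine-based pass. Blocks are lists of Intel-syntax strings keyed by label. `push X; ret` is first rewritten to `jmp X`. A block ending in an unconditional jmp is merged with its target when that target has exactly one predecessor. `nop` and `xchg eax, eax` are dropped, and relocatable branch targets (hex addresses or `loc_` labels) become a placeholder before hashing with `hashlib.md5`. The paper hashes byte sequences, whereas this sketch hashes text.

```python
import hashlib
import re
from collections import Counter
from typing import Dict, List, Optional

Block = List[str]                                     # Intel-syntax instruction strings
IDIOMS = {"nop", "xchg eax, eax"}                     # no-op idioms to ignore
BRANCH = re.compile(r"^(jmp|call|j[a-z]+)\s+(\S+)$")
ADDRESS = re.compile(r"^(0x[0-9a-fA-F]+|loc_\w+)$")   # relocatable targets


def fix_push_ret(block: Block) -> Block:
    """Call/return obfuscation: rewrite 'push X; ret' as 'jmp X'."""
    out: Block = []
    for ins in block:
        if ins == "ret" and out and out[-1].startswith("push "):
            out[-1] = "jmp " + out[-1][len("push "):]
        else:
            out.append(ins)
    return out


def merge_chains(blocks: Dict[str, Block], entry: str) -> Block:
    """Undo instruction reordering: starting at `entry`, follow each trailing
    unconditional jmp into a successor that has exactly one predecessor."""
    blocks = {label: fix_push_ret(body) for label, body in blocks.items()}
    preds: Counter = Counter()
    for body in blocks.values():
        for ins in body:
            m = BRANCH.match(ins)
            if m and m.group(2) in blocks:
                preds[m.group(2)] += 1
    merged: Block = []
    label, visited = entry, set()
    while True:
        visited.add(label)
        body = blocks[label]
        last = body[-1] if body else ""
        target: Optional[str] = last[4:] if last.startswith("jmp ") else None
        if target in blocks and preds[target] == 1 and target not in visited:
            merged.extend(body[:-1])      # drop the connecting jmp
            label = target
        else:
            merged.extend(body)
            return merged


def normalized_md5(block: Block) -> str:
    """Drop no-op idioms, abstract relocatable branch targets, then hash."""
    kept = []
    for ins in block:
        ins = " ".join(ins.split())       # canonical whitespace
        if ins in IDIOMS:
            continue
        m = BRANCH.match(ins)
        if m and ADDRESS.match(m.group(2)):
            ins = m.group(1) + " ADDR"
        kept.append(ins)
    return hashlib.md5("\n".join(kept).encode()).hexdigest()
```

For example, consider three blocks `start`, `loc_0003`, and `loc_0017` chained by jumps, with a `nop` and a `push loc_0040; ret` pair. They merge back into one listing whose normalized MD5 equals that of the unsplit block ending in `jmp 0x401000`. Immediates that are not branch targets are left alone, so `add eax, 1` and `add eax, 2` still hash differently.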
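Section 3.3 identifies opaque predicates by asking a constraint solver whether a branch condition is constant. For the modular-arithmetic predicates cited there, a self-contained residue check is enough to show the idea. A polynomial with integer coefficients satisfies $p(x) \equiv p(x \bmod m) \pmod{m}$, so checking the $m$ residues decides the predicate for all integers. This is a stand-in that works only for predicates of that form, not MetaHunt's solver-based method. The `holds_wrapped` helper, an addition for illustration, evaluates the predicate under 32-bit wraparound. That shows why a bit-precise solver must model machine arithmetic.

```python
from typing import Callable, Optional

Poly = Callable[[int], int]


def opaque_value(poly: Poly, modulus: int) -> Optional[bool]:
    """Return the constant truth value of `poly(x) % modulus == 0` over all
    integers x, or None if it varies. Sound for integer-coefficient
    polynomials, since poly(x) and poly(x mod m) agree modulo m."""
    outcomes = {poly(r) % modulus == 0 for r in range(modulus)}
    return outcomes.pop() if len(outcomes) == 1 else None


def holds_wrapped(poly: Poly, modulus: int, x: int, bits: int = 32) -> bool:
    """Evaluate the predicate after reducing poly(x) modulo 2**bits, as
    fixed-width machine arithmetic would."""
    return (poly(x) % (1 << bits)) % modulus == 0


def cube_minus_x(x: int) -> int:   # x^3 - x = 0 (mod 3)
    return x ** 3 - x


def square_plus_x(x: int) -> int:  # x^2 + x = 0 (mod 2), as in Figure 5
    return x ** 2 + x


if __name__ == "__main__":
    print(opaque_value(cube_minus_x, 3), opaque_value(square_plus_x, 2))
    print(opaque_value(lambda x: x * x, 3))          # not opaque
    print(holds_wrapped(cube_minus_x, 3, 2048))      # wraparound breaks it
```

The script prints `True True`, then `None`, then `False`. Over the integers both predicates are always true. Under 32-bit wraparound, however, $x^3 - x \equiv 0 \pmod{3}$ fails at $x = 2048$, because $2^{32}$ is not a multiple of 3. The Figure 5 predicate survives wraparound because $2^{32}$ is even.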
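Algorithm 1 and the memoization of Section 3.4 can be sketched in Python. Several assumptions here are hypothetical. Blocks arrive already normalized as strings, and their MD5 hex digest serves as the union-find key. `sym_equiv` is a caller-supplied stand-in for Sym_Exec plus the STP query. The union-find uses path compression and union by size, one common form of weighted union. DiffMap is a set of root pairs that is re-keyed after each union, which is one possible reading of the "Update DiffMap" step.

```python
import hashlib
from typing import Callable, Dict, FrozenSet, Set


class Memo:
    """Union-find over MD5 keys of normalized blocks, plus a DiffMap."""

    def __init__(self, sym_equiv: Callable[[str, str], bool]) -> None:
        self.parent: Dict[str, str] = {}
        self.size: Dict[str, int] = {}
        self.diff: Set[FrozenSet[str]] = set()   # pairs of known-different roots
        self.sym_equiv = sym_equiv               # stand-in for Sym_Exec + STP
        self.solver_calls = 0

    def find(self, k: str) -> str:
        if k not in self.parent:
            self.parent[k], self.size[k] = k, 1
        root = k
        while self.parent[root] != root:
            root = self.parent[root]
        while self.parent[k] != root:            # path compression
            nxt = self.parent[k]
            self.parent[k] = root
            k = nxt
        return root

    def union(self, a: str, b: str) -> None:
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return
        if self.size[ra] < self.size[rb]:        # weighted union by size
            ra, rb = rb, ra
        self.parent[rb] = ra
        self.size[ra] += self.size[rb]
        # Update DiffMap: pairs that named the absorbed root now name ra.
        self.diff = {frozenset(ra if r == rb else r for r in pair)
                     for pair in self.diff}

    def fast_compare(self, v1: str, v2: str) -> bool:
        """Algorithm 1 on blocks that are assumed to be normalized already."""
        k1 = hashlib.md5(v1.encode()).hexdigest()
        k2 = hashlib.md5(v2.encode()).hexdigest()
        if k1 == k2:                             # Line 4: identical hashes
            return True
        r1, r2 = self.find(k1), self.find(k2)
        if r1 == r2:                             # Line 7: same subset
            return True
        if frozenset((r1, r2)) in self.diff:     # Line 10: known different
            return False
        self.solver_calls += 1                   # Line 14: expensive check
        if self.sym_equiv(v1, v2):
            self.union(k1, k2)                   # Lines 16-17
            return True
        self.diff.add(frozenset((r1, r2)))       # Line 20
        return False
```

The `solver_calls` counter makes the benefit visible. Once two subsets are merged, or recorded as different, later queries between their members are answered without calling the oracle. This holds even for members that were never compared directly.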
5 EVALUATION
We evaluate MetaHunt with several objectives in mind. First, we want to evaluate whether our iterative comparison of metamorphic variants converges in a reasonable time, that is, whether MetaHunt is capable of exploring the morphing code evolution. At the same time, we make sure that MetaHunt's exploration results are comprehensive and accurate. We provide case studies of the metamorphic engines in MetaPHOR and W32.Evol to show more details about each engine's mechanism and how MetaHunt explores the variants generated by the engine. We also test the optimization methods for speeding up the malware comparison. Finally, we report some interesting findings from our evaluation.
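The convergence question raised above can be made concrete with a toy simulation. The sketch below is hypothetical Python, not MetaHunt's code: it assumes an engine with a fixed transformation table, so each original basic block can only be emitted in a small, finite number of equivalent forms, and it looks up a form's equivalence class directly instead of proving equivalence with a theorem prover. Exploration stops once several consecutive generations add no new form, mirroring the point at which the union-find set stops growing (Section 5.2).

```python
import random

# Toy engine (assumption): a fixed transformation table means each of NUM_BLOCKS
# original basic blocks can be emitted in one of FORMS_PER_BLOCK equivalent forms.
NUM_BLOCKS = 20
FORMS_PER_BLOCK = 4


def next_generation(rng):
    """Emit one variant: every original block in a randomly chosen equivalent form."""
    return [(block, rng.randrange(FORMS_PER_BLOCK)) for block in range(NUM_BLOCKS)]


def explore(seed=0, patience=5, max_generations=1000):
    """Compare generations until the known equivalence classes stop growing.

    Returns (generations_used, classes), where classes maps each original block
    to the set of forms found equivalent to it.
    """
    rng = random.Random(seed)
    classes = {}
    stable = 0
    for gen in range(1, max_generations + 1):
        grew = False
        for block, form in next_generation(rng):
            # MetaHunt would prove equivalence by semantic diffing; the toy
            # engine simply tells us which original block a form belongs to.
            forms = classes.setdefault(block, set())
            if form not in forms:
                forms.add(form)
                grew = True
        stable = 0 if grew else stable + 1
        if stable >= patience:
            return gen, classes
    return max_generations, classes


gens, classes = explore()
print("converged after", gens, "generations")
print("subsets:", len(classes), "max subset size:", max(len(f) for f in classes.values()))
```

Because the table is finite, the loop always stops; a larger table only delays the generation after which no new forms appear.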

5.1 Experiment Setup

Our testbed consists of an Intel Core i7-3770 processor (quad core, 3.40 GHz) with 8 GB of memory, running Ubuntu 12.04. The guest OS running in TEMU is Windows XP SP3. The dataset used for our experiment consists of a total of 1,400 metamorphic variants, generated by nine mutation engines collected from different sources. Table 1 shows our dataset statistics and the morphing engine information. The second column indicates whether the morphing engine is attached to the malicious body or decoupled from it. The third column presents the number of metamorphic variants we generate. Columns 4–13 represent the major code morphing methods adopted by these engines: register renaming, dead code insertion, instruction reordering, equivalent code substitution, opaque constants, call/return obfuscation, indirect jumps, opaque predicates, control flow graph flattening, and function inlining. In addition, some metamorphic viruses also integrate polymorphic encryption. For example, when binaries infected by Lexotan32 or MetaPHOR execute, the main virus body is first decrypted [38, 43]. We mark these two cases in the 14th column (Decryption) of Table 1.

As shown in Table 1, the nine mutation engines in our experiment are categorized into three groups. The first group contains three well-known self-propagating viruses: Lexotan32, MetaPHOR, and W32.Evol. They all embed the metamorphic engine within the virus body. In our study, we select 100 copies of the Cygwin⁴ utility bzip2 as the “goat” binaries, and the metamorphic viruses in the first group are used to infect them. Since these three viruses do not mutate their host code, using identical copies of the goat file helps us identify the morphing code: it is modified in every generation, while the host code stays the same. During our evaluation, the running goat executables infect themselves iteratively, and each infection yields a new-generation variant. The sizes of the malware samples in this group range from 28 KB to 410 KB.

Virus construction kits are designed to simplify the development of virus code, and some kits are also used as decoupled metamorphic engines to mutate non-propagating malware [47]. The second group in Table 1 consists of four mutators collected from VX Heaven⁵. For each tool, we create 200 metamorphic virus variants. Since the output of these virus generators is assembly code, we use the TASM 5.0 assembler to compile the source code into binaries. The sizes of the viruses in this group vary from 1 KB to 300 KB. The third group consists of two open-source metamorphic generators based on LLVM: MalDiv [41] and Obfuscator-LLVM [24]. They perform code mutation by manipulating the LLVM IR code. We select the source code of the Linux utility gzip as the base file and generate metamorphic variants of gzip by applying MalDiv and Obfuscator-LLVM iteratively to the mutated code. The sizes of the gzip mutations in this group range from 61 KB to 728 KB.

4 https://www.cygwin.com
5 http://vxheaven.org

Table 1: Metamorphic engine statistics and various code mutation methods adopted.
Columns: 1 Engine; 2 Type; 3 # Mutations; 4–13 code morphing methods (Reg. renaming, Dead code, Instr. reorder, Instr. substitut., Opaque constant, call/return obfus., Indirect jump, Opaque pred., CFG flattening, Funct. inlining); 14 Decryption; 15 Conv. time (hrs); 16 # UF subsets; 17 Max. subset size.

Engine            Type        # Mutations   Marks in cols. 4–14   Conv. time (hrs)   # UF subsets   Max. subset size
Lexotan32         attached    100           X X X X X X           1.5                90             8
MetaPHOR          attached    100           X X X X X X           2.2                132            12
W32.Evol          attached    100           X X X X X             1.0                52             6
NGVCK             decoupled   200           X X X X X X X X       4.7                346            16
G2                decoupled   200           X X X X               1.4                115            8
VCL32             decoupled   200           X X X X               1.8                130            10
MPCGEN            decoupled   200           X X X X               2.2                96             8
MalDiv            decoupled   150           X X X X X X X X       6.8                522            34
Obfuscator-LLVM   decoupled   150           X X X X X X X X       4.6                304            18

5.2 Converging Time and Union-Find Set Size
We run MetaHunt to compare the metamorphic variants in Table 1 and report the results in columns 15–17. Column 15 shows the converging time of our iterative comparison of metamorphic mutations. Columns 16 and 17 present statistics of the union-find set: the number of union-find subsets and the maximum number of basic blocks in one subset. The results in Table 1 and Figure 6 show that MetaHunt reaches a converging point within 7 hours for all the metamorphic engines. For variants from simpler engines such as Lexotan32, W32.Evol, G2, and VCL32, MetaHunt takes less than 2 hours to reach the converging point. After reaching the converging point, the size and number of union-find sets in MetaHunt stop growing, which means MetaHunt has covered the evolution of the variants produced by the metamorphic engine. We examined the basic blocks inside one subset and manually verified that they are all semantically equivalent variants of the same basic block. Our evaluation result shows that the mutation capability of metamorphic malware is not unlimited, and the evolution of variants eventually reaches a converging point. Since the number of variants cannot increase indefinitely, this may provide a starting point for malware defenders and stimulate the design of mutation-insensitive anti-malware solutions.

Figure 6: Size of union-find set of the variants generated by MetaPHOR.

5.3 Case Study
5.3.1 MetaPHOR. We analyze the source code of the MetaPHOR virus (version 1.1), which was first published by the virus and worm
group 29A⁶. It has about 14,000 lines of assembly code. As shown in Figure 7, the structure of MetaPHOR consists of two parts: the metamorphic engine and the virus body.

Figure 7: Structure of the MetaPHOR virus (a mutation engine made up of the Disassembler, Shrinker, Permutator, Expander, and Assembler, plus the virus body).

Unlike other metamorphic viruses, MetaPHOR has a complete metamorphic engine, which provides full support for absolute metamorphism. The metamorphic engine includes five components: disassembler, shrinker, permutator, expander, and assembler. The disassembler first decodes the virus body and transforms its instructions into a pseudo-assembly language, which facilitates the subsequent mutation steps performed by the other components of the engine. The shrinker is responsible for compressing the disassembled code produced by the disassembler, in order to avoid explosive growth of code size within a few generations. The permutator further mutates the code by redefining the “code frame” size and shuffling the code frame sequence. As the counterpart of the shrinker, the expander performs the inverse operation, recoding a single instruction into several instructions that perform the same function. Finally, the assembler reassembles the pseudo-assembly code back into machine code.

Based on our analysis, the function layout of the source code is as follows. The disassembler part is between lines 1520 and 3041. The shrinker part resides in lines 3049 to 5765. The permutator part starts at line 5929 and ends at line 6413. The expander part is on lines 6453 to 9279. Lines 9306 to 10485 implement the assembler. Besides these, there are some other parts: the infector (the virus body) and the polymorphic engine. The infector is on lines 10541 to 12578, and the polymorphic engine code is between lines 12621 and 13887. The rest of the code implements other minor features.

In each propagation, MetaPHOR first reserves a fixed space of 0x340000 bytes for the next-generation variant. This partially ensures that the code size of MetaPHOR generations does not grow out of control. Besides, the shrinker and expander use fixed transformation tables to compress or expand the virus body. Figure 2(a) shows transformation examples used by the expander, while transformation examples used by the shrinker are shown in Figure 9. When producing a next-generation variant, both the shrinker and the expander can process the virus instructions for multiple rounds. The virus body changes in each round, and the transformations from all rounds accumulate in the next-generation variant.

The permutator sits between the shrinker and the expander. Its algorithm consists of redefining “code frames” and shuffling them. The first step is redefining “code frames”: given an initial and a final offset, new “code frame” sizes are selected randomly between F0h and 1E0h until the last “code frame” reaches the end of the code. All the new “code frame” entries are stored in a table. Then, the “code frame” sequence in the table is shuffled. After shuffling, the permutator copies the instructions according to the shuffled sequence of “code frames”. Finally, JMP instructions are inserted at the end of each new “code frame” so that the virus behavior remains unchanged. Figure 8 shows a permutation example. These basic block transformations are listed to show the typical mutation methods used in metamorphic malware samples. Since the permuted basic blocks are connected by unconditional jumps, they are normalized to one basic block in MetaHunt's analysis. In fact, the permutation transformation is removed by the normalization component and does not affect binary diffing in MetaHunt. Therefore, MetaHunt is able to reverse the changes made by the permutator.
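The permutator steps above, and the reason normalization undoes them, can be modelled in a few lines. This is a hypothetical Python sketch rather than MetaPHOR's assembly routine or MetaHunt's normalizer: a toy program is a list of integers standing in for instruction bytes (one element per byte, an assumption), frame sizes are drawn between F0h and 1E0h as described above, and each frame's trailing JMP is modelled as a pointer to the frame that followed it in the original order. Following those unconditional jumps from the entry frame and concatenating the frames recovers the original sequence, which is why the permuted frames collapse into one basic block after normalization.

```python
import random

FRAME_MIN, FRAME_MAX = 0xF0, 0x1E0   # frame-size range stated for MetaPHOR


def split_into_frames(code, rng):
    """Step 1: redefine code frames with random sizes until the end of the code."""
    frames, start = [], 0
    while start < len(code):
        size = rng.randint(FRAME_MIN, FRAME_MAX)
        frames.append(code[start:start + size])
        start += size
    return frames


def permute(code, rng):
    """Steps 2-3: shuffle the frame table, emit frames, and chain them with jumps.

    Returns (entry, emitted_order, layout): layout maps a frame label (its index
    in the original order) to (instructions, label targeted by its trailing JMP).
    """
    frames = split_into_frames(code, rng)
    emitted_order = list(range(len(frames)))
    rng.shuffle(emitted_order)
    layout = {}
    for label in emitted_order:
        target = label + 1 if label + 1 < len(frames) else None   # None: last frame
        layout[label] = (frames[label], target)
    return 0, emitted_order, layout


def normalize(entry, layout):
    """Follow unconditional jumps and merge the chained frames into one block."""
    merged, label = [], entry
    while label is not None:
        body, label = layout[label]
        merged.extend(body)
    return merged


rng = random.Random(29)
program = list(range(2000))            # toy program: "instruction" i at offset i
entry, order, layout = permute(program, rng)
print("emitted frame order:", order)
print(normalize(entry, layout) == program)   # True
```

However the frames are shuffled, the jump chain fixes the execution order, so a diffing tool that merges jump-connected blocks sees the same code in every permuted generation.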
6 http://virus.wikidot.com/metaphor

Based on our analysis, we found that MetaPHOR is capable of mutating itself. However, its mutation is not infinite, which properly
explains our experiment results. Since the metamorphic engine uses fixed transformation tables (in the shrinker and expander) and the space reserved for the virus body is fixed, we can conclude that although MetaPHOR employs a full metamorphic engine, it only has a finite length of evolution, which can be studied by MetaHunt.

Figure 8: An example of one-time permutation (P: the original program, instructions 1–10; P′: the permuted program based on P; code frame size randomization splits P into Frames 1–4, and code frame shuffling reorders them, with jumps connecting the frames).

Before                               After
mov eax, 1                           lea eax, [ecx+1]
add eax, ecx

push 3                               mov eax, 3
pop eax

mov eax, ebx                         lea eax, [ebx+8]
add eax, 8

mov [eax], 3                         push 3
push [eax]

mov [eax], ebx                       add ebx, ecx
add [eax], ecx
mov ebx, [eax]

mov [eax], 2                         add ecx, 2
add [eax], ecx
mov ebx, [eax]

or eax, 0                            nop

Figure 9: Code compressing examples in MetaPHOR

5.3.2 W32.Evol. The W32.Evol virus was first discovered in July 2000⁷. It was the first virus to use a “true” 32-bit metamorphic engine instead of a polymorphic engine, which is susceptible to AV scanners that can trace virus decryption in memory. A metamorphic engine is used to transform the executable code: it implements an internal disassembler to parse the input code, and then transforms the program code to produce new, different code that retains its functionality.

7 https://www.symantec.com/security-center/writeup/2000-122010-0045-99

The instruction transformations supported by the engine can be divided into two parts. Internal transformations are inlined inside the engine as a part of the engine's core. External transformations take place outside the main engine function, yet they act as if they were inside the engine itself and jump back to the engine when they are finished. The engine's decision on whether or not to transform a given instruction is based upon a random factor: the engine asks for a random number between 0 and 7, and the transformation is applied only if it is 0. Hence, there is a probability of 12.5% (1 in 8) that an instruction is transformed. Furthermore, the engine only disassembles the instructions that the author had included.

As shown in Figure 10, the left column lists the disassembly of the virus code before transformation, and the right column lists the corresponding transformed code. For the first row, the transformation is unconditionally semantics-preserving. However, in the last two rows, eax is assigned the values 0x04 and 0x09, respectively. Therefore, the last two transformations
are semantics-preserving only if the register eax is not live when they are applied.

Similar to MetaPHOR, the W32.Evol engine adopts a fixed transformation table. Moreover, because the W32.Evol engine only disassembles instructions that the virus author included, and the conditional transformations can only be applied at points where certain registers are not live, these restrictions further limit the possible mutations of the W32.Evol virus, which is consistent with our experiments.

Figure 10: An example of conditional transformation in W32.Evol (disassembly before and after transformation).

5.4 Optimization
The binary comparison component in MetaHunt is optimized for quickly checking the equivalence of two basic blocks. Several methods in MetaHunt contribute to the performance of the comparison. First, preprocessing normalizes the trace and removes the obfuscations. Second, the union-find set and DiffMap keep the checked basic blocks in memory so as to accelerate future comparisons. Third, concretizing the formulas in symbolic execution replaces some unnecessary symbols with concrete values, which reduces the complexity of the symbolic formulas so that the comparison is faster. Last, the QueryMap in MetaHunt calculates hash values of the symbolic formulas and uses them to quickly match formulas. To show the speedup effect of each of these methods, we incrementally add one method to MetaHunt and use it to compare the variants of all the metamorphic engines. The result of the experiment is shown in Figure 11. Every method yields a significant speedup, and MetaHunt achieves an over 5X speedup with all optimizations turned on.

Figure 11: The impact of fast basic block matching when applied cumulatively: O1 (preprocessing), O2 (O1 + union-find set and DiffMap), O3 (O2 + concretizing symbolic formulas), O4 (O3 + QueryMap). Bar chart per metamorphic engine; y-axis: Speedup (times).

We conduct another experiment on the NGVCK metamorphic virus family; the result is shown in Figure 12. We observe that the optimizations in MetaHunt speed up the comparison, especially when there is a large number of basic block candidates. Therefore, MetaHunt's optimizations improve its performance in comparing basic block variants from metamorphic engines.

Figure 12: The effect of our optimizations over time on the NGVCK family (x-axis: Number of Executed Basic Blocks (Normalized), 0.0–1.0; y-axis: Metamorphic Mutation Comparison Time (s), 0–1000; curves: None, O1, O2, O3, O4).

Before               After (mutation)
mov ecx, imm         mov ecx, random
                     add ecx, imm - random
rcr [esp], cl        rcr [esp], cl

Figure 13: Example: buggy metamorphic engine implementation (the add instruction may modify the value of the carry flag).

5.5 Finding Metamorphic Engine Bugs
Most metamorphic malware runs on the Intel x86 platform because of its popularity. However, the x86 Instruction Set Architecture
is complicated as well, which makes the design of metamorphic transformation rules very difficult. In particular, certain instructions have implicit side effects: they exhibit different semantics depending on the value of the EFLAGS register. If a metamorphic engine neglects such subtleties of x86 instructions, semantics-breaking mutations are very likely to happen. Table 2 lists some instructions and their substitutions that are only conditionally equivalent, i.e., equivalent only when certain EFLAGS bits are dead. For example, the Intel manual indicates that “inc/dec” does not affect the carry flag while “add/sub” does; the instruction “pop reg” (the third row in Table 2) does not modify any EFLAGS bits, while “add” may set as many as six bits. Unfortunately, the substitutions shown in Table 2 are misused by many of the metamorphic engines we tested. Figure 13 shows a possible semantics-breaking mutation we found in NGVCK. The instruction “rcr” rotates right using the carry flag as the “extra” bit. Therefore, modifying the carry flag before the “rcr” instruction may lead to an incorrect rotation result, and the “add” instruction in the mutated version may modify the carry flag. Since MetaHunt also tracks the symbolic value of each EFLAGS bit, we can find metamorphic engine bugs in terms of conditionally equivalent transformations. In our evaluation, we find 62 semantics-breaking bugs in total. These metamorphic engine bugs lead to fatal runtime errors in many cases.

Table 2: Conditionally equivalent instructions (reg, imm, and random stand for register, immediate value, and random number, respectively).

Instruction      Substitution             Condition
inc reg          add reg, 1               carry flag is not set
dec reg          sub reg, 1               carry flag is not set
pop reg          mov reg, [esp]           no EFLAGS bit is set
                 add esp, 4
push reg         sub esp, 4               no EFLAGS bit is set
                 mov [esp], reg
add reg, imm     sub reg, -imm            overflow and carry flags are not set
mov reg, imm     mov reg, random          no EFLAGS bit is set
                 add reg, imm - random

6 DISCUSSIONS AND LIMITATIONS
The power of MetaHunt is limited by imperfect path coverage, which is mainly due to the limitations of dynamic malware analysis. We can leverage automatic input generation techniques [22] to explore more paths. Since MetaHunt depends on multi-tag taint analysis to reduce the number of basic block comparisons, MetaHunt exhibits the limitations of taint analysis in general, e.g., implicit information flow evasions [7]. One possible solution is to leverage statistical binary similarity comparison [15, 16] to reduce the number of constraint-solving queries on multiple paths. Another threat to dynamic malware analysis is environment-sensitive malware. Since we analyze metamorphic malware in TEMU, a malware sample can detect that it is running in an emulator instead of on a physical machine and then quit immediately. To evade such sandbox environment checks, a possible countermeasure is to analyze malware in a transparent analysis platform via hardware virtualization (e.g., Ether [18]). Currently, MetaHunt's detection of opaque predicates focuses on invariant opaque predicates, whose value remains the same for all possible inputs. Recent work can detect more advanced cases such as contextual and dynamic opaque predicates [31]. Although we did not see such complicated opaque predicates in our evaluation, we will extend our work to handle advanced opaque predicates proactively.

Another argument against studying the evolution of metamorphic malware is its relatively high cost. In fact, compared to the number and diversity of the malware samples in circulation, metamorphic engines evolve rather slowly because of their great development complexity. A successful metamorphic engine tends to be reused and shared by malware authors. For example, NGVCK [55] is widely applied to generate metamorphic viruses, and Obfuscator-LLVM is also used to mutate both desktop and Android applications [24, 54]. Therefore, our one-time effort to approximate the semantics of nontrivial metamorphic engines is worthwhile. Furthermore, considering that manually tracing metamorphic mutations usually takes several days to weeks of hard work, MetaHunt's overhead is acceptable.

7 CONCLUSION
Metamorphic malware relies on its morphing engine to mutate the malicious code from generation to generation so that each variant is syntactically different. Metamorphic malware has been demonstrated to successfully evade conventional signature-based malware detection. The mutation engine itself is also constantly evolving. In this paper, we attempt to tame metamorphic mutation by systematically chasing the morphing code evolution. We apply trace-based semantic binary diffing to compare possible mutation variants iteratively and memoize equivalent basic blocks. Without prior knowledge about a particular metamorphic engine, our exploration result can approximate its mutation mechanism. We have implemented our approach, called MetaHunt, and performed empirical evaluations on a large set of metamorphic malware. Our generalized approach can be seen as a first step towards designing mutation-insensitive anti-malware solutions.

ACKNOWLEDGMENTS
We thank the anonymous reviewers for their valuable feedback. This research was supported in part by the National Science Foundation (NSF) grant CNS-1652790 and the Office of Naval Research (ONR) grants N00014-16-1-2265, N00014-16-1-2912, and N00014-17-1-2894. Jiang Ming was also supported by the University of Texas System STARs Program.

REFERENCES
[1] Shahid Alam, Issa Traore, and Ibrahim Sogukpinar. 2014. Current Trends and the Future of Metamorphic Malware Detection. In Proceedings of the 7th International Conference on Security of Information and Networks (SIN’14).
[2] Kapil Anand, Matthew Smithson, Khaled Elwazeer, Aparna Kotha, Jim Gruen, Nathan Giles, and Rajeev Barua. 2013. A Compiler-level Intermediate Representation Based Binary Analysis and Rewriting System. In Proceedings of the 8th ACM European Conference on Computer Systems (EuroSys’13).
[3] Philippe Beaucamps. 2007. Advanced Metamorphic Techniques in Computer Viruses. In Proceedings of the 2007 International Conference on Computer, Electrical, and Systems Science, and Engineering (CESSE’07).
[4] D. Bruschi, L. Martignoni, and M. Monga. 2006. Detecting Self-mutating Malware Using Control-Flow Graph Matching. In Proceedings of Detection of Intrusions and Malware & Vulnerability Assessment (DIMVA’06).
[5] Danilo Bruschi, Lorenzo Martignoni, and Mattia Monga. 2007. Code Normalization for Self-Mutating Malware. IEEE Security and Privacy 5, 2 (2007).
[6] Lorenzo Cavallaro. 2014. Malicious Software and its Underground Economy. https://www.coursera.org/course/malsoftware.
[7] L. Cavallaro, P. Saxena, and R. Sekar. 2008. On the Limits of Information Flow Techniques for Malware Analysis and Containment. In Proceedings of the GI International Conference on Detection of Intrusions & Malware, and Vulnerability Assessment (DIMVA’08).
[8] Sang Kil Cha, Iulian Moraru, Jiyong Jang, John Truelove, David Brumley, and David G. Andersen. 2010. SplitScreen: Enabling Efficient, Distributed Malware Detection. In Proceedings of the 7th USENIX Conference on Networked Systems Design and Implementation (NSDI’10).
[9] Mohamed R. Chouchane and Arun Lakhotia. 2006. Using Engine Signature to Detect Metamorphic Malware. In Proceedings of the 4th ACM Workshop on Recurring Malcode (WORM’06).
[10] Mohamed R. Chouchane, Andrew Walenstein, and Arun Lakhotia. 2007. Statistical Signatures for Fast Filtering of Instruction-substituting Metamorphic Malware. In Proceedings of the 2007 ACM Workshop on Recurring Malcode (WORM’07).
[11] M. Christodorescu, S. Jha, S. Seshia, D. Song, and R. Bryant. 2005. Semantics-aware malware detection. In Proceedings of the IEEE Symposium on Security and Privacy.
[12] Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest, and Clifford Stein. 2001. Introduction to Algorithms (Second ed.). MIT Press, Chapter 21: Data structures for Disjoint Sets, 498–524.
[13] Mila Dalla Preda, Roberto Giacobazzi, and Saumya Debray. 2015. Unveiling metamorphism by abstract interpretation of code properties. Theoretical Computer Science 577 (2015), 74–97.
[14] Mila Dalla Preda, Roberto Giacobazzi, Saumya Debray, Kevin Coogan, and Gregg M Townsend. 2010. Modelling metamorphism by abstract interpretation. In International Static Analysis Symposium. 218–235.
[15] Yaniv David, Nimrod Partush, and Eran Yahav. 2016. Statistical Similarity of Binaries. In Proceedings of the 37th ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI).
[16] Yaniv David, Nimrod Partush, and Eran Yahav. 2017. Similarity of Binaries Through Re-optimization. In Proceedings of the 38th ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI).
[17] Priti Desai and Mark Stamp. 2010. A highly metamorphic virus generator. International Journal of Multimedia Intelligence and Security 1, 4 (2010).
[18] A. Dinaburg, P. Royal, M. Sharif, and W. Lee. 2008. Ether: Malware Analysis via Hardware Virtualization Extensions. In Proceedings of the ACM Conference on Computer and Communications Security (CCS’08).
[19] The Mental Driller. last reviewed, 04/14/2015. Metamorphism in practice or How I made MetaPHOR and what I’ve learnt. http://vxheaven.org/lib/vmd01.html.
[20] Vijay Ganesh and David L. Dill. 2007. A Decision Procedure for Bit-vectors and Arrays. In Proceedings of the 2007 International Conference in Computer Aided Verification (CAV’07).
[21] Debin Gao, Michael K. Reiter, and Dawn Song. 2008. BinHunt: Automatically finding semantic differences in binary programs. In Proceedings of the 10th International Conference on Information and Communications Security (ICICS’08).
[22] P. Godefroid, M. Y. Levin, and D. Molnar. 2008. Automated Whitebox Fuzz Testing. In Proceedings of the 15th Annual Network and Distributed System Security Symposium (NDSS’08).
[23] Jiyong Jang, Maverick Woo, and David Brumley. 2013. Towards Automatic Software Lineage Inference. In Proceedings of the 22nd USENIX Security Symposium.
[24] Pascal Junod, Julien Rinaldini, Johan Wehrli, and Julie Michielin. 2015. Obfuscator-LLVM – Software Protection for the Masses. In Proceedings of the IEEE/ACM 1st International Workshop on Software Protection (SPRO’15).
[25] Arun Lakhotia, Mila Dalla Preda, and Roberto Giacobazzi. 2013. Fast Location of Similar Code Fragments Using Semantic ’Juice’. In Proceedings of the 2nd ACM SIGPLAN Program Protection and Reverse Engineering Workshop (PPREW’13).
[26] Chris Lattner and Vikram Adve. 2004. LLVM: A Compilation Framework for Lifelong Program Analysis & Transformation. In Proceedings of the International Symposium on Code Generation and Optimization (CGO’04).
[27] Felix Leder, Bastian Steinbock, and Peter Martini. 2009. Classification and detection of metamorphic malware using value set analysis. In Proceedings of the 4th International Conference on Malicious and Unwanted Software (MALWARE’09).
[28] Jusuk Lee, Kyoochang Jeong, and Heejo Lee. 2010. Detecting Metamorphic Malwares using Code Graphs. In Proceedings of the 2010 ACM Symposium on Applied Computing (SAC’10).
[29] Lannan Luo, Jiang Ming, Dinghao Wu, Peng Liu, and Sencun Zhu. 2014. Semantics-based Obfuscation-resilient Binary Code Similarity Comparison with Applications to Software Plagiarism Detection. In Proceedings of the 22nd ACM SIGSOFT International Symposium on Foundations of Software Engineering (FSE’14).
[30] Jiang Ming, Meng Pan, and Debin Gao. 2012. iBinHunt: Binary Hunting with Inter-Procedural Control Flow. In Proceedings of the 15th Annual International Conference on Information Security and Cryptology (ICISC’12).
[31] Jiang Ming, Dongpeng Xu, Li Wang, and Dinghao Wu. 2015. LOOP: Logic-Oriented Opaque Predicates Detection in Obfuscated Binary Code. In Proceedings of the 22nd ACM Conference on Computer and Communications Security (CCS’15).
[32] Jiang Ming, Dongpeng Xu, and Dinghao Wu. 2015. Memoized Semantics-Based Binary Diffing with Application to Malware Lineage Inference. In Proceedings of the 30th IFIP International Information Security and Privacy Conference (IFIP SEC’15).
[33] Vishwath Mohan and Kevin W Hamlen. 2012. Frankenstein: Stitching Malware from Benign Binaries. WOOT 12 (2012), 77–84.
[34] Vinod P. Nair, Harshit Jain, Yashwant K. Golecha, Manoj Singh Gaur, and Vijay Laxmi. 2010. MEDUSA: MEtamorphic Malware Dynamic Analysis Using Signature from API. In Proceedings of the 3rd International Conference on Security of Information and Networks (SIN’10).
[35] Beng Heng Ng and Atul Prakash. 2013. Exposé: Discovering Potential Binary Code Re-use. In Proceedings of the 37th IEEE Annual Computer Software and Applications Conference (COMPSAC’13).
[36] Jeong Wook Oh. 2009. Fight against 1-day exploits: Diffing Binaries vs Anti-diffing Binaries. In Proceedings of the 2009 Black Hat USA.
[37] Philip OKane, Sakir Sezer, and Kieran McLaughlin. 2011. Obfuscation: The Hidden Malware. IEEE Security and Privacy 9, 5 (2011).
[38] Orr. last reviewed, 04/14/2015. The Molecular Virology of Lexotan32: Metamorphism Illustrated. http://www.openrce.org/articles/full_view/29.
[39] Rodney Owens and Weichao Wang. 2011. Non-normalizable Functions: a New Method to Generate Metamorphic Malware. In Proceedings of the 2011 IEEE Military Communications Conference (MILCOM’11).
[40] Panda Security. 2017. PandaLabs Annual Report 2017. https://www.pandasecurity.com/mediacenter/src/uploads/2017/11/PandaLabs_Annual_Report_2017.pdf.
[41] Mathias Payer. 2014. Embracing the new threat: towards automatically, self-diversifying malware. Symposium on Security for Asia Network (SyScan’14).
[42] Mathias Payer, Stephen Crane, Per Larsen, Stefan Brunthaler, Richard Wartell, and Michael Franz. 2014. Similarity-based matching meets Malware Diversity. arXiv Technical Report (2014).
[43] Frédéric Perriot, Peter Ferrie, and Péter Ször. 2003. Striking Similarities: Win32/Simile and Metamorphic Virus Code. Symantec Security Response.
[44] Mila Dalla Preda. 2012. The Grand Challenge in Metamorphic Analysis. In Proceedings of the 6th International Conference on Information Systems, Technology and Management (ICISTM’12).
[45] Kevin A. Roundy and Barton P. Miller. 2013. Binary-code Obfuscations in Prevalent Packer Tools. Comput. Surveys 46, 1 (2013).
[46] Dawn Song, David Brumley, Heng Yin, Juan Caballero, Ivan Jager, Min Gyung Kang, Zhenkai Liang, James Newsome, Pongsin Poosankam, and Prateek Saxena. 2008. BitBlaze: A New Approach to Computer Security via Binary Analysis. In Proceedings of the 4th International Conference on Information Systems Security (ICISS’08).
[47] Peter Szor. 2005. The Art of Computer Virus Research and Defense. Addison-Wesley Professional.
[48] Péter Ször and Peter Ferrie. 2001. Hunting For Metamorphic. Symantec White Paper.
[49] Teja Tamboli, Thomas H. Austin, and Mark Stamp. 2014. Metamorphic code generation from LLVM bytecode. Computer Virology and Hacking Techniques 10, 3 (2014), 177–187.
[50] Andrew Walenstein, Rachit Mathur, Mohamed R. Chouchane, and Arun Lakhotia. 2008. Constructing malware normalizers using term rewriting. Computer Virology 4, 4 (2008), 307–322.
[51] Shuai Wang, Pei Wang, and Dinghao Wu. 2015. Reassembleable Disassembling. In Proceedings of the 24th USENIX Security Symposium (USENIX Security ’15). USENIX Association.
[52] Shuai Wang, Pei Wang, and Dinghao Wu. 2016. Uroboros: Instrumenting Stripped Binaries with Static Reassembling. In Proceedings of the 23rd IEEE International Conference on Software Analysis, Evolution, and Reengineering (SANER ’16). USENIX Association.
[53] Matt Webster and Grant Malcolm. 2009. Detection of metamorphic and virtualization-based malware using algebraic specification. Computer Virology 5, 3 (2009), 221–245.
[54] Ryan Welton. 2015. Obfuscating Android Applications using O-LLVM and the NDK. http://fuzion24.github.io/.
[55] Wing Wong and Mark Stamp. 2006. Hunting for metamorphic engines. Computer Virology 2, 3 (2006), 211–229.
[56] Qinghua Zhang and Douglas S. Reeves. 2007. MetaAware: Identifying Metamorphic Malware. In Proceedings of the 23rd Annual Computer Security Applications Conference (ACSAC’07).
