DEFEATING POLYMORPHISM: BEYOND EMULATION STEPAN

DEFEATING POLYMORPHISM: BEYOND EMULATION

Adrian E. Stepan
Microsoft Corp., One Microsoft Way, Redmond, WA 98052, USA
Tel +1 425 706 9498 • Fax +1 425 936 7329

ABSTRACT

The most common method of detecting malware relies on signatures extracted from the malware body. Attempting to defeat this method and evade detection, malware writers have resorted to code obfuscation techniques, thus creating polymorphic viruses.

There are several well-known methods of decrypting polymorphic viruses, such as emulation, cryptanalysis (x-ray) and dedicated decryption routines. Each of these methods has some limitations: x-ray can handle only simple decryptions; dedicated routines require significant development effort and neither scales well with the number of detected viruses. Emulation doesn't have these weaknesses, but emulating code is significantly slower than executing it on a real CPU. Therefore a very complex polymorphic virus would take an unreasonable length of time to emulate until it is decrypted.

This paper proposes a new method of dealing with polymorphic malware. The method relies on disassembling the analysed code dynamically and performing just-in-time compilation targeted for the host CPU. The code obtained as a result can be executed safely on the host CPU, with little degradation in execution speed compared to the original code. This provides the same flexibility as emulation, but with dramatically improved speed. Additionally, the method could be used for other purposes, such as generic unpacking of packed executables and behaviour-based analysis of complex code.

1. DETECTING MALWARE, AN ONGOING BATTLE

Malware has a long history of evolution. During the past two decades, malware has evolved with regard to replication and spreading mechanisms, as well as techniques used to prevent analysis and/or detection. Such techniques include anti-debugging, encryption, packing, entry point obscuring, etc.

One of the first methods used to detect malware relied on signatures extracted from the malware body. Despite the significant evolution of malware, using signatures is still the most common detection method used today. However, some things have changed, such as the types of signature data and the methods used to search for such signatures. A modern AV engine may use many different types of data extracted from the malware body:

• Patterns, with or without wildcards, also known as ‘strings’
• Checksums (CRC, MD5, SHA1)
• Behaviour patterns
• File geometry, execution flow geometry
• Statistical distribution of code instructions

Of course, any combination of the above could be used as a malware signature, and the list is not exhaustive. In an attempt to defeat detection by signatures, malware writers started to use code obfuscation techniques, such as encryption. In the beginning, viruses used fairly simple encryption schemes and only the keys changed from one generation to another, while the encryption algorithm remained constant; these are known as oligomorphic viruses. Later, more sophisticated polymorphic techniques were developed. Such viruses were able to change both the encryption algorithm and the keys used to encrypt themselves upon each replication; some were able to generate multiple encryption layers.

It is still possible to detect a polymorphic virus using signatures, but the virus body must be decrypted first. There are several methods that are widely used in the AV industry for the purpose of decrypting polymorphic viruses: cryptanalysis (also known as x-ray), dedicated decryption routines, emulation, etc.

2. TECHNIQUES CURRENTLY USED TO DEFEAT POLYMORPHISM

X-ray works by attempting to find the decryption key by using the known decryption algorithm and a fragment of decrypted code, which is part of the signature. For each key, an equation is written that expresses this key as a function of the encrypted code, the decrypted code and the other keys. Solving the system will produce the correct set of keys required to decrypt the virus. The method is fairly simple to implement and has good performance for a single given decryption algorithm. In some cases, it is also able to detect fragments of the virus code, even if the entire virus is not present or is not functional. However, the method does not scale well with the number of detected viruses, because it needs to be run for each different encryption algorithm. Also, it can only handle simple algorithms, as the equation system becomes impossible to solve for more complex ones. Its usefulness is very limited in the case of viruses having multiple encryption layers.

Dedicated decryption routines can be developed to detect any virus, and performance for any given virus is usually better compared to the x-ray method. Unfortunately, writing such a routine requires that the virus is analysed completely and that the developer fully understands all the possible variants of encryption that the virus can generate. Thoroughly analysing the malware and then developing and testing a specific detection routine can take a lot of work and time. Therefore, the response time when using this solution can often be quite long. Additionally, this method does not scale well with the number of detected viruses, as each file must be checked with all the available routines.

Executing the decryption code from the virus itself would decrypt the virus body with excellent speed performance, but this is not a very good idea, for a number of reasons:

• There isn't any simple, reliable way of stopping the execution when the decryption is complete; therefore the virus could replicate, do some damage to the host system and re-encrypt itself without being detected
• The virus could do some damage even during decryption
• The code could be buggy and crash or enter an infinite loop
• The virus code could be written for a different hardware/software platform, so it might not run at all
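As a minimal sketch of the x-ray idea described in section 2, assume the simplest possible scheme: every byte of the virus body is XORed with one unknown single-byte key. The names `xor_bytes` and `xray_xor` and the sample data are hypothetical, chosen only for illustration; they are not taken from any AV engine. At each candidate offset the single equation key = ciphertext XOR plaintext is solved for the first byte of the known fragment, and the resulting key is then verified against the rest of the fragment.

```python
from typing import Optional, Tuple


def xor_bytes(data: bytes, key: int) -> bytes:
    """Apply a single-byte XOR key to every byte (encrypts and decrypts)."""
    return bytes(b ^ key for b in data)


def xray_xor(encrypted: bytes, fragment: bytes) -> Optional[Tuple[int, int]]:
    """Return (offset, key) where the known plaintext fragment matches, or None.

    At each offset the equation  key = encrypted[offset] ^ fragment[0]
    is solved, and the key is checked against the remaining fragment bytes.
    """
    if not fragment:
        return None
    for offset in range(len(encrypted) - len(fragment) + 1):
        key = encrypted[offset] ^ fragment[0]
        window = encrypted[offset:offset + len(fragment)]
        if xor_bytes(window, key) == fragment:
            return offset, key
    return None


# Hypothetical sample: a stand-in "virus body" encrypted with key 0x5A.
body = b"\x90\x90decrypted-signature-bytes\xc3"
sample = xor_bytes(body, 0x5A)
result = xray_xor(sample, b"signature")
print(result)
```

The sketch also shows why the approach scales poorly: a different cipher (ADD, ROL, a sliding key, several layers) needs its own equation and its own search loop, and the equations quickly become hard to solve.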

40 VIRUS BULLETIN CONFERENCE OCTOBER 2005 ©2005 Virus Bulletin Ltd. No part of this reprint may be
reproduced, stored in a retrieval system, or transmitted in any form without the prior written permission of the publishers.
Of course, the code could be executed in a controlled environment, such as a virtual machine, but a general-purpose virtual machine is very complex software. Including one in an AV product would require software emulation of the full system environment, from device drivers to system APIs, which is simply too much overhead.

None of the methods described above is able to detect new malware in a generic way. Their purpose is just to decrypt polymorphic viruses, so that signature-based detection can be used. While it's possible to develop dedicated routines that are able to detect entire malware families (and often new variants as well), writing a routine that is able to analyse an arbitrary program and determine malicious behaviour is not feasible.

Use of emulation solves all the above problems. Potentially malicious code runs in a controlled, simulated environment. Hardware resources such as CPU registers are modelled using software data structures. The behaviour of each instruction is reproduced by a set of software routines designed to update the corresponding data structures, in the same way the instructions would update the hardware resources when executed on a real CPU. Each instruction is first decoded, in order to find the instruction type, length, operands that need to be updated, etc. After this, the appropriate emulation routine is called to update the data structure describing hardware resources. The address of the next instruction is obtained either as a result of instruction decoding or computed by the emulation routine (in the case of branch instructions).

The emulation process usually starts at the program's entry point and instructions are emulated sequentially until a malware signature is found, the emulator is able to conclude that the program is not malicious, or emulator resources are exhausted. In addition, the emulator has to decide when to scan for malware signatures and what data to scan, call the scanner, and collect and analyse data obtained during emulation for behaviour-based heuristic detection or for the purpose of deciding that the program is not malicious.

Emulation can be used to decrypt any encrypted code, regardless of the complexity of the encryption algorithm, given that the decryption code is available and the emulator is provided with enough resources to complete the decryption. Providing resources such as memory is not very difficult, as the requirement is comparable with that of the analysed program (the overhead caused by the emulator's internal data structures is typically negligible). However, emulating code is significantly slower than running the code on a CPU that can execute it natively. This limitation is impossible to overcome, because for each emulated instruction an emulator has to execute hundreds of instructions to decode it, update the internal data structures, decide if scanning is needed, decide if more instructions should be emulated or not, find the address of the next instruction, etc. Typically, emulation is hundreds of times slower than execution.

An emulator in an AV engine is required to analyse any given file in a finite time, during which it must determine if the file is malicious or not. When the maximum allowed time for a file has elapsed and no malware signatures have been detected, the emulator must stop the analysis and conclude the given file is not malicious. It is always possible that the maximum time limit was set too short for a particular piece of malware to be detected. On the other hand, increasing this time limit will degrade the emulator's average speed. This happens because there will always be a small percentage of clean files for which the emulator will never be able to determine that they are clean, no matter how long it analyses them. Of course, the time limit can be adjusted dynamically: the emulator could increase it if suspicious behaviour is detected or decrease it otherwise. However, there are instances in which legitimate programs are encrypted with the same encryption engines as certain viruses, or viruses use code that looks benign, etc. Even when the time limit is adjusted dynamically, there will have to be a hard limit, to avoid having the emulator analyse a file in an infinite loop. This means that it will always be possible for a virus writer to determine what this limit is for a particular AV engine and write a virus that would need to be emulated longer than that in order to be detected. Viruses such as Win32/Coke, KME-based viruses, etc. would take unreasonably long to emulate, but they would still decrypt and replicate on the real machine in a reasonable time, because native execution is typically hundreds of times faster than emulation.

3. DYNAMIC TRANSLATION

The ‘dynamic translation’ (DT) method described below offers the same flexibility as emulation, while improving performance significantly. It relies on disassembling the code to be analysed dynamically and translating it into functionally equivalent code that is safe to execute on the host machine. The executable code obtained as a result of the translation is persisted; if the code is executed inside a loop, the persisted code can, in most cases, be executed directly, without requiring retranslation.

Disassembling and translating an instruction requires a computational effort that is comparable to emulating an instruction; executing the obtained code is typically slower than executing the original instruction, but much less so than emulating the instruction. If a code sequence is executed in a loop, the code will be translated and executed at the first loop iteration, and for all subsequent iterations the persisted code obtained at the first iteration will be executed. Thus, the method eliminates redundant analysis of repeating code sequences. Compared to emulation, the time required to complete the first loop iteration would be approximately the same, while the subsequent iterations will take considerably less time.

3.1. Partitioning the code into blocks

One of the problems that the implementation of the DT engine has to address is determining whether translated code is available for any given instruction, and if so, locating the corresponding code. One possible solution is maintaining a table with virtual addresses of translated instructions and addresses of corresponding executable code. However, searching for a virtual address in this table for each processed instruction is computationally very expensive, negating the speed advantage of executing translated code. A much more efficient way of solving this issue would be to partition the original code into blocks of instructions and only store a table entry for each block. This way, the table would have significantly fewer entries and searching would need to be performed for each block, as opposed to each instruction.

Dividing the original code into blocks cannot be done in an arbitrary way; blocks need to have some specific properties, as described below, that limit the size of each block. On the
other hand, the bigger the blocks are, the more efficient the storage and searching will be.

For the rest of this paper, a ‘basic block’ (BB) will be defined as a contiguous block of code having a single entry point at the beginning of the block and a single exit point at the end of the block. If the code within such a block is executed via a call instruction to the beginning address of the block, all the instructions in the block will be executed. A single instruction is needed, at the end of the block, to return control to the caller. As a consequence, any basic block of original code will contain at most one jump instruction. Basic blocks that don't contain any jump instructions are valid, but suboptimal. They could be created as a result of splitting different blocks or for other practical reasons, such as the DT engine having insufficient resources to translate a bigger block. Such blocks don't need to be treated differently, as we can consider that they end with a virtual unconditional jump to the following instruction.

Each discovered BB will be described by a set of properties, such as:

• Block boundaries; the address of the first instruction in the BB and the address immediately following the last instruction in the BB will be used to delimit the block; these are linear addresses in the address space of the original code.
• Executable code obtained as a result of translating the original code in the block.
• Miscellaneous flags, indicating if the block was translated or not, if it was scanned for malware signatures, if the block has any known successor blocks, etc.

Discovering and delimiting basic blocks is a dynamic process, meaning that new blocks may be discovered or existing blocks could be modified as a result of processing previously discovered blocks. After translating a block and executing the resulting code, the beginning address of the next block to be processed will be the destination address of the jump instruction at the end of the block that was just executed. Let's call this address the ‘current address’. The next block to be analysed has to be a BB starting at the current address. This block will be determined by searching for the current address in the list of block address ranges for previously discovered blocks; the following situations are possible:

i An existing BB is found that begins at the current address; this becomes the current BB and will be processed in the same way as the previous BB.

ii The current address is found inside the address range of an existing BB. According to our definition, a BB must have a single entry point; therefore this block must be split. The existing block will be modified such that its address range ends at the current address, and a new BB will be created, starting at the current address. If the split block was already translated, the existing code will not be truncated, but the code starting at the current address will be retranslated for the newly created block.

iii The current address is not found inside the address range of any existing BB. In this case, a new BB will be created, starting at the current address.

If the previous BB ended with an immediate jump instruction (one for which the destination address is constant), this means that the current BB will always be a successor of the previous BB. This information can be stored, in order to avoid unnecessary searching in the BB address ranges in case these blocks are inside a loop. If the previous jump instruction was unconditional, the current BB is the only possible successor; in the case of a conditional jump instruction, there can be at most two different successor blocks. If a block ends with a computed jump instruction, the destination address could be different each time the block is executed, so in this case determining the successor block requires searching, as described above, in the list of existing block address ranges. A list of successor blocks determined at previous iterations could also be stored and used to speed up searching, in case the number of different possible successor blocks is reasonably small.

Delimiting the original code into basic blocks as described above and maintaining a data structure describing the blocks and the relations between them has several advantages:

• If several blocks are executed inside a loop, searching for a successor block needs to be done only once for each successor; after all the successors of a particular block have been determined, there is no need to search for a successor of that block at any subsequent loop iteration.
• The list of beginning addresses for the discovered blocks can be used as a list of ‘entry points’ to scan for malware signatures; each discovered BB needs to be scanned only once.
• It provides data for applying dynamic code optimizations. Since optimizing code is computationally expensive, blocks that are executed more frequently are better candidates for optimizations.

3.2. Scanning for malware signatures

After the virus body has been decrypted, the virus can be detected by finding a signature in the decrypted body. If the virus is not metamorphic, any fragment of the virus body could be used as a signature, provided that it is specific enough to give accurate identification and not cause false positives. Finding a signature in the decrypted virus body raises the problem of deciding when and where to search for signatures; the search method must guarantee that no signature could go undetected, but it also has to be as fast as possible. In the case of a polymorphic virus, the scanner can guarantee detection only if at least one signature search is performed after the virus body has been completely decrypted, but before it starts encrypting itself again.

Given an arbitrary virus, it is extremely difficult to determine the exact moment when the above condition is met during the analysis. This is true for both the emulation and the dynamic translation methods. One possible approach would be to detect decryptor loops and scan the code decrypted by each such loop at the end of the loop. However, this is not very easy to implement, as determining decryptor loops, loop exit criteria and the range of decrypted code are fairly complex tasks. Also, this method does not guarantee a minimum number of scans; if the virus has multiple decryption layers, or the same layer is executed multiple times (e.g. in brute-force decryption), the code will be scanned redundantly for each decryption layer. There are also polymorphic viruses for which the entire decrypted body cannot be found in memory at any time (e.g. Dark_Paranoid). In such cases, an instruction
or a small code sequence is decrypted, executed and then immediately encrypted again. For these viruses, the decrypted body can be obtained by logging each instruction the first time it is encountered and sorting the logged instructions by address. Scanning the resultant log guarantees that the entire virus code will be scanned for signatures, and no redundant search operations are performed, because no code sequence will be scanned more than once.

It is generally preferable to extract signatures from the constant part of a polymorphic virus and not from the decryptor code; therefore the code that is used for decryption doesn't necessarily need to be scanned. In practice, however, it is typically more computationally expensive to identify the code used for decryption precisely and exclude it from scanning than it is to scan it.

Another difficulty comes from the fact that one cannot predict the location in the decrypted code where a signature might be present. Therefore, scanning a chunk of code without any knowledge about the code structure requires searching for signatures starting from up to N - M + 1 offsets in the given code chunk, where N is the size of the chunk in bytes and M is the size of the shortest signature searched.

Dividing the code into basic blocks, as described in section 3.1, provides a means to scan the entire code that was analysed, in a reliable way and with a minimum number of signature searches. Having the data structure describing the basic blocks, it is easy to determine, for any particular block, whether it is the first time we're analysing it (in which case we also need to scan it) or whether we have analysed it before and no scanning is needed. As part of maintaining the BB data structure, we are also required to detect when any block that has previously been analysed is being overwritten (because if this happened, the block would need to be retranslated). A block can be scanned at any time after it was discovered and before it is overwritten, and it needs to be scanned only once. If we choose, for instance, to use only signatures starting at the BB boundary, the maximum number of scans needed will be the number of unique basic blocks in the code. Of course, a signature could span multiple blocks, in which case all of these blocks need to be discovered and decrypted, so that the signature can be detected. If some of the blocks are overwritten before a signature is detected, the code must be logged as described above.

3.3. Translating code

Arbitrary program code to be analysed by an AV engine must not only be considered unsafe; it could also be compiled to run on a different hardware platform than the AV engine. Even if the code to be analysed executes natively on the same CPU as the AV engine, it might need to run in a different CPU mode (e.g. x86 real mode vs. protected mode), have a different memory mapping mode, execute under a different operating system, etc. Therefore, even if the code was safe or we could somehow make sure the host machine cannot be damaged, it still wouldn't be possible to correctly execute any arbitrary part of the given code from the AV engine process. It is, however, possible to translate the given code into another code sequence that is functionally equivalent to the original one and that can be safely and correctly executed on the host machine. In our case, the host CPU will be used to execute the translated code directly, while in the case of emulation a virtual CPU is used instead of the real one. Other hardware resources (I/O ports, IRQ controllers, disk drives, etc.) are typically virtualized to protect the host machine from any damage.

There are multiple ways in which a code translation that meets the above criteria can be achieved:

i Translating directly from the original code to target code: each original instruction will be decoded and then an equivalent instruction or instruction sequence will be generated for the target code. This is the simplest translation method to implement, for any source and target binary languages.

ii Translating using an intermediate language (IL): each original instruction will first be translated into an intermediate code sequence and then the intermediate code will be translated to target native code. This method is preferable when we have multiple source (S) and multiple target (T) languages. With direct translation, we would need in this case S * T translators, while using an intermediate language we only need S + T translators (S translators from a source language to IL and T translators from IL to a target language). There are other advantages as well, such as the possibility to perform code optimizations using the IL form. The IL should be platform-independent, but could be designed in a way that favours translation speed for some particular translators (those that are more frequently used). As a drawback, the IL would need to support all the possible operators, operand types and combinations of these, from all the source languages, raising its degree of complexity. This could prove to be a problem, because translating to or from languages with a reduced instruction set is generally faster, per translated instruction, than in the case of complex languages.

iii Combining the above methods could be achieved in a way that preserves the advantages of both, without any of their disadvantages. Most instructions could be translated using a fairly simple intermediate language, while the most exotic and complex ones, that would also require a complex IL, will be translated directly.

Typically, the code obtained as a result of translation will not be as efficient as the original code. This may happen for various reasons: some instructions or operand encodings from the source language might not have a 1:1 correspondence in either the IL or the target language, some hardware resources used in the source language need to be preserved or not used at all in the target language because of specific restrictions, memory mapping has to be virtualized because the source and target languages might use different mappings, etc.

It is possible to perform some optimizations at basic block level, at translation time, to improve the efficiency of the translated code. If the translation uses an IL, the best idea would be to perform the optimizations on the IL, because the algorithms involved will only need to support this language, as opposed to all the source and target languages. The IL could also be designed to facilitate optimizations such as define-usage chains, copy propagation, etc., while the other languages might not be very suitable for performing such optimizations. Also, the IL may be used to pass translation ‘hints’ from the source translator to the target translator.
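To make the idea of optimizing at basic block level concrete, here is a minimal sketch of copy propagation over a toy intermediate language. The IL is an assumption made purely for illustration (the paper does not define one): each instruction is a tuple `(op, dst, source, ...)`, string operands are register names, other operands are constants, and `mov` copies its single source into `dst`. The function name `propagate_copies` is likewise hypothetical.

```python
from typing import Dict, List, Tuple

Instr = Tuple  # (op, dst, *sources); string operands are register names


def propagate_copies(block: List[Instr]) -> List[Instr]:
    """Replace uses of copied registers with the register they were copied from.

    Operates on a single basic block, so no control-flow analysis is needed.
    """
    copies: Dict[str, str] = {}  # register -> register it currently equals
    out: List[Instr] = []
    for op, dst, *srcs in block:
        # Rewrite each register source through the currently known copies.
        srcs = [copies.get(s, s) if isinstance(s, str) else s for s in srcs]
        # Writing dst invalidates every copy relation that involves dst.
        copies = {d: s for d, s in copies.items() if d != dst and s != dst}
        if op == "mov" and isinstance(srcs[0], str) and srcs[0] != dst:
            copies[dst] = srcs[0]
        out.append((op, dst, *srcs))
    return out


# Hypothetical IL for one basic block.
block = [
    ("mov", "t1", "eax"),         # t1 = eax
    ("add", "ebx", "ebx", "t1"),  # may read eax directly instead of t1
    ("mov", "eax", 5),            # eax changes, so t1 no longer equals eax
    ("add", "ecx", "ecx", "t1"),  # must keep reading t1
]
optimized = propagate_copies(block)
for instr in optimized:
    print(instr)
```

Because the pass never looks beyond one basic block, it stays cheap and needs no knowledge of any source or target instruction set, which is the argument the text makes for running such optimizations on the IL.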

If the analysed code is linear, the code will be translated once and executed once. As the execution time is negligible compared to translation time, in this case translation would account for most of the analysis time. For clean files that don't unpack or decrypt themselves and don't have any suspicious behaviour, there is usually no need to analyse lots of looping code; in this case, the translators must be optimized for speed. On the other hand, for polymorphic malware and occasional clean files that contain unpacking or decryption code, most time will be spent analysing loops and executing code that is already translated. In this case, the speed of the code generated by the translators is more important than the translation speed. As improving the efficiency of the translated code is done at the cost of translation speed, a compromise between these two must be obtained.

An example of code translation is given in Appendix A.

3.4. DT execution flow

For each file that needs to be scanned for malware, analysis consists of sequentially identifying and processing basic blocks, as defined in section 3.1. At the beginning of the analysis, the current address is initialized to the entry point of the program to be scanned. After each BB is processed, the current address is updated to the destination address of the jump instruction at the end of this block. The analysis continues with the BB starting at the new current address, until a malware signature is detected or the program is determined to be clean.

In a simplified description, after each BB is analysed, the following processing algorithm has to be performed for the next one:

1 Look in the data structure describing relationships between blocks for a known successor of the last analysed block; if a known successor exists, make this block the current BB and continue from step 6.

2 Search the block address range table for a previously discovered block starting at the current address. If such a block is found, make it the current BB and continue from step 5.

3 If the current address is found inside the address range of an existing BB, split this BB to end at the current address. Create a new BB starting at the current address, make the new block the current BB and continue from step 5.

4 If no existing block was found that either starts at or includes the current address in its address range, create a new BB starting at the current address; this will be the current BB.

5 Update the data structure describing relationships between blocks: store the information that the current BB is a successor of the previously processed BB.

6 If the current BB was not scanned for signatures, scan for signatures starting at the current address. If a signature is found, stop the analysis and report the file as malicious, otherwise mark the block as scanned.

7 If the current BB is already translated and the code was

8 If the current BB was previously translated but the original code was overwritten, discard the translated code, as it is no longer valid.

9 Translate the current BB.

10 Prepare for executing the translated code for the current BB: save hardware resources that are used by both this algorithm and the translated code and need to be preserved, such as CPU registers, etc., depending on implementation.

11 Execute the translated code for the current BB via a call instruction; after execution is complete, control will be returned by the executed code to the caller.

12 Restore resources saved at step 10.

13 Handle any errors that might have happened during execution, decide whether to continue analysing the next block or stop.

During execution, the translated code has to check, before each write-to-memory operation, whether an existing block would be overwritten by the data being written. The check may be skipped only if it can be determined, at translation time, that this particular write operation could never overwrite an existing block. If one or more blocks are overwritten, the corresponding translated code will be marked as ‘dirty’, meaning that it will be discarded at step 8, when those blocks are processed again. If one of the blocks that were overwritten is the current BB and code beyond the current execution point was overwritten, the execution must not be allowed to continue to the end of the current BB, as this would mean executing code that is no longer valid. In this case, the write operation is allowed to happen and then execution is allowed to continue until the entire translated code sequence corresponding to the current original instruction is executed. This is done because execution cannot be interrupted in the middle of the code sequence for an original instruction, otherwise resuming the execution would not be possible. After all overwritten blocks are marked as dirty, execution is interrupted and control is returned to the caller, with the current address set to the address where execution was interrupted.

Handling of exceptions such as division by 0, page fault, etc. is done in a similar way: execution is interrupted at the address of the instruction that generated the exception; if an exception handler is present, execution may continue with the exception handler code. Information is passed to the exception handler, enabling it to resume execution at the point where it was interrupted.

Obtaining the address of the original instruction corresponding to a given instruction in the translated code requires some computational effort. For the original code, the real CPU keeps an instruction pointer register, updating it after each instruction is executed. Reproducing this behaviour in the translated code would mean generating and executing code for an extra

<add instruction_pointer, instruction_code_size>

operation, for each translated instruction. For simple arithmetic instructions, this would mean doubling the translated code size and execution time, which is unacceptable. Therefore, computing the value of the current
not overwritten since last translation, continue from instruction pointer will only be done when needed, if an
step 10. exception happens or the current block is being overwritten.
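The block-selection part of the algorithm (steps 1 to 5) can be illustrated with a minimal Python sketch. It is not the prototype's implementation: the names `BasicBlock`, `BlockTable` and `select` are hypothetical, the block end address is supplied by the caller (a real engine would discover it while decoding), the table is searched linearly for brevity, and the way successors are redistributed when a block is split is an assumption, since the paper does not describe it.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass(eq=False)
class BasicBlock:
    start: int                      # address of the first instruction
    end: int                        # one past the last address (exclusive)
    successors: Dict[int, "BasicBlock"] = field(default_factory=dict)
    scanned: bool = False           # set once step 6 has scanned the block


class BlockTable:
    """Hypothetical block address range table plus successor links."""

    def __init__(self) -> None:
        self.blocks: List[BasicBlock] = []

    def select(self, prev: Optional[BasicBlock], addr: int, end: int) -> BasicBlock:
        # Step 1: a known successor of the last analysed block.
        if prev is not None and addr in prev.successors:
            return prev.successors[addr]
        # Step 2: a previously discovered block starting at addr.
        block = next((b for b in self.blocks if b.start == addr), None)
        if block is None:
            owner = next((b for b in self.blocks if b.start < addr < b.end), None)
            if owner is not None:
                # Step 3: split the owner; the tail becomes the new block.
                # Assumption: the tail inherits the owner's successors and
                # the shortened head falls through into the tail.
                block = BasicBlock(addr, owner.end, owner.successors)
                owner.end, owner.successors = addr, {addr: block}
                if prev is owner:
                    # The jump that led here now belongs to the tail.
                    prev = block
            else:
                # Step 4: nothing starts at or covers addr.
                block = BasicBlock(addr, end)
            self.blocks.append(block)
        # Step 5: record the predecessor/successor relationship.
        if prev is not None:
            prev.successors[addr] = block
        return block
```

A decryptor loop that jumps back into the middle of its own block exercises step 3 on the first iteration and step 1 on every later one, which is the case the successor structure is meant to make cheap.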

44 VIRUS BULLETIN CONFERENCE OCTOBER 2005 ©2005 Virus Bulletin Ltd. No part of this reprint may be
reproduced, stored in a retrieval system, or transmitted in any form without the prior written permission of the publishers.

For this purpose, a table is used that relates offsets of
instructions within the current BB to addresses of the
corresponding instructions in the original code.

Special care has to be taken when calling high-level language
compiled functions from the translated code. CPU registers
that might be used by the compiler in global optimizations
must be saved at step 10 and restored before calling a
high-level language function, in case they were changed by
the translated code. The stack pointer and frames must also be
preserved. The translated code must also save and restore any
registers computed before the high-level language function is
called, whose values are still needed after the function returns.

3.5. Environment

In order to obtain a correct behaviour while analysing
programs, the DT engine must provide access to various
hardware devices (disk drives, keyboard, mouse, network
interface, video card, real-time clock, etc.), as well as
software resources, such as BIOS data structures and routines
and operating system APIs. If the code provided as input to
the DT engine could be malicious, most of these resources
need to be virtualized. This requires a lot of development
effort, given the large number of devices and system APIs that
need to be supported. However, virtualizing the system
environment for a dynamic translation engine is done in
almost the same way as in the case of an emulator, so code
reuse is possible if an emulator is available.

Accessing virtualized devices, as opposed to real ones,
offers improved speed performance. A virtual device is in
fact a data structure in memory, which can be accessed much
faster than a disk drive, for instance. Often there’s no need to
fully implement all the functionality of a device, because no
existing malware would require it. Therefore, a virtual device
will be less complex than a real one, making it even faster
in operation.

Real devices may be used in a few cases – for instance, the
current time could be obtained by using the real-time clock.
However, this may cause the analysed code to behave
incorrectly. As the code actually runs slower inside the DT
engine than it would run natively, time inside the DT engine
should also pass proportionally slower. Otherwise, it would be
possible for the analysed code to determine, based on this
inconsistency, that it’s running in a virtual environment. This
is used by malware as an anti-emulation technique.

Some malware would function correctly only if the
environment is configured in a particular way. For instance,
they might require a specific OS version, work only on a
certain file system type, check for the presence of a specific
file in a known place or work only within a certain calendar
date range, etc. In these cases, it is difficult to configure the
environment so that the code will run correctly, as different
malware may have different conflicting requirements.

4. DEVELOPING FURTHER APPLICATIONS

Decrypting code in order to provide signature-based detection
for polymorphic executable file infectors is just one possible
application of dynamic translation. The code to be analysed
doesn’t need to be a CPU instruction code, it could also be a
platform-independent byte code, such as MSIL byte code or a
script. Translating from a non-binary language to native
executable code for the host CPU can be done in a similar
fashion as translating from a binary language. Using an
intermediate language might be more suitable for translating
from a non-binary language, because it could provide support
for a variable number of operands and operand allocation,
which are required by virtually all script languages.

Using DT to analyse scripts, however, has to deal with some
specifics. For instance, a script might not be able to modify
itself while it is being interpreted, but it could generate other
script files dynamically and call the interpreter to run these.
In this case, each particular file can be translated statically,
but the analysis is still a dynamic process, because not all of
the component files are available at the beginning of the
analysis. Providing an environment for analysing scripts is
more difficult than in the case of executable files. Some of
the major challenges are accurately reproducing the
behaviour of the script interpreter and of OS commands and
utilities that might be called from the script. Different
versions of the operating system or script interpreter might
behave differently.

Dynamic translation can also be used to translate malware
detection routines from a platform-independent byte code into
executable code for the host CPU. This way, an AV engine can
easily be updated with new detection capabilities, without the
need to actually change the engine code. The benefits
provided are smaller engine code, smaller (and faster) updates
and less testing effort required for the new routines. The new
detection routines will be sent as ‘data’, the same way as
signature/pattern database updates. They will be loaded and
translated to native code when the engine is loaded in
memory and then executed, as needed, with little speed
penalty compared to native code generated by a compiler.

The approach used to translate such routines will be slightly
different from the one used to translate files that are scanned
for malware, because in this case it is already known that the
detection routine code is not malicious. This allows the use of
a real environment as opposed to an emulated one. Also, there
is no need to scan any of the translated code and there is no
need to check for overwritten blocks, because the code
doesn’t decrypt or otherwise modify itself. All the successors
for all basic blocks can be determined when the engine is
loaded, which means that we don’t have to search for any
successor block at execution time. For these reasons, the
translated code obtained for detection routines will be a lot
more efficient than the code typically obtained by translating
files for the purpose of scanning.

In the case of packed executable malware, unpacking is
typically needed before a signature can be detected. Using
signatures extracted from packed code is not always practical,
even for non-polymorphic malware. Some strings or other
data might change inside the binary, upon each replication,
causing the packed file to change in such a way that the
signature won’t match any more. Detection based on
packed code is not possible if the packer is polymorphic.
Heuristic or generic detection can be achieved only by using
unpacked code.

Writing unpacking routines for all the packers publicly
available takes a lot of development and test effort. In some
cases, writing an unpacker routine would require reverse
engineering the packer and there might be some legal
restrictions preventing this. In the absence of a dedicated
unpacking routine, a packed executable could be emulated
until the unpacked code is obtained. However, unpacking with
an emulator could be very slow, especially for large packed
files that would typically require emulating several millions of
instructions. Using dynamic translation, a file could be
unpacked significantly faster, compared to emulation,
providing detection for malware packed with new packers,
with reasonable speed performance, before a dedicated
routine is developed. In some cases, the generic unpacking
using DT could prove fast enough that dedicated routines
won’t even be needed.
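Why translation can outpace emulation on an unpacking or decryption loop may be easier to see in a toy model. The Python sketch below is only an illustration under stated assumptions: the three-instruction block, the `decode` helper and both runner functions are hypothetical, and Python closures stand in for the native code that a real DT engine would generate. The emulated path decodes every instruction on every iteration, while the translated path decodes the block once and reuses the result from a cache.

```python
from typing import Callable, Dict, List, Tuple

Instr = Tuple[str, ...]
Handler = Callable[[Dict[str, int], bytearray], bool]

# Hypothetical toy block: XOR-decrypt mem[i] with key k, loop while i < n.
DECRYPT_LOOP: List[Instr] = [
    ("xor_mem", "i", "k"),
    ("inc", "i"),
    ("jlt", "i", "n"),
]


def decode(instr: Instr, stats: Dict[str, int]) -> Handler:
    """Decode one toy instruction into a callable and count the decode.

    The callable returns True only when the loop should run again.
    """
    stats["decodes"] += 1
    op = instr[0]
    if op == "xor_mem":
        idx, key = instr[1], instr[2]

        def handler(regs: Dict[str, int], mem: bytearray) -> bool:
            mem[regs[idx]] ^= regs[key]
            return False
    elif op == "inc":
        reg = instr[1]

        def handler(regs: Dict[str, int], mem: bytearray) -> bool:
            regs[reg] += 1
            return False
    elif op == "jlt":
        left, right = instr[1], instr[2]

        def handler(regs: Dict[str, int], mem: bytearray) -> bool:
            return regs[left] < regs[right]
    else:
        raise ValueError(f"unknown opcode {op}")
    return handler


def run_emulated(mem: bytearray, key: int) -> int:
    """Decode and execute each instruction every time it is reached."""
    stats = {"decodes": 0}
    regs = {"i": 0, "n": len(mem), "k": key}
    again = True
    while again:
        for instr in DECRYPT_LOOP:
            again = decode(instr, stats)(regs, mem)
    return stats["decodes"]


def run_translated(mem: bytearray, key: int, cache: Dict[str, List[Handler]]) -> int:
    """Translate the block once, then reuse the cached translation."""
    stats = {"decodes": 0}
    regs = {"i": 0, "n": len(mem), "k": key}
    if "loop" not in cache:
        cache["loop"] = [decode(instr, stats) for instr in DECRYPT_LOOP]
    again = True
    while again:
        for handler in cache["loop"]:
            again = handler(regs, mem)
    return stats["decodes"]
```

Both runners assume a non-empty buffer, because the toy loop tests its condition at the end. The decode count of the emulated path grows with the number of iterations, whereas the translated path pays the decode cost once per block.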

APPENDIX A
The table below shows an example of translation of a 16-bit
x86 code sequence sample to 32-bit x86 target code, as
generated by the prototype implementation of the Dynamic
Translation method.

Original code:          L_decrypt: mov eax, [bx]
Intermediate language:  mov -, d[(reg.DS<<4)+reg.BX], reg.EAX
Translated code:
    movzx edx, w[ebp+offset_reg_BX]
    movzx eax, w[ebp+offset_reg_DS]
    shl eax, 4
    add edx, eax
    call [ebp+offset_memory_mapper]
    mov ecx, [eax]
    mov [ebp+offset_reg_EAX], ecx
Comments: The memory mapper receives the virtual address in edx and
returns the real address in eax.

Original code:          add ax, 1234h
Intermediate language:  add reg.AX, 1234h, reg.AX
Translated code:
    lea eax, [ebp+offset_reg_AX]
    add w[eax], 1234h
Comments: Possible optimization:
    add w[ebp+offset_reg_AX], 1234h

Original code:          xor al, ah
Intermediate language:  xor reg.AL, reg.AH, reg.AL
                        Saveflags reg.Flags
Translated code:
    mov eax, [ebp+offset_reg_AH]
    xor [ebp+offset_reg_AL], al
    lahf
    seto al
    mov [ebp+offset_reg_Flags], ax
Comments: The layout of the Flags register image in memory may be
different than the actual layout of the native Flags register, for
speed reasons.

Original code:          mov dx, ax
Intermediate language:  mov -, reg.AX, reg.DX
Translated code:
    lea eax, [ebp+offset_reg_DX]
    movzx ecx, w[ebp+offset_reg_AX]
    mov w[eax], cx
Comments: Possible optimization:
    movzx ecx, w[ebp+offset_reg_AX]
    mov w[ebp+offset_reg_DX], cx

Original code:          shr eax, 16
Intermediate language:  shr reg.EAX, 10h, reg.EAX
Translated code:
    lea eax, [ebp+offset_reg_EAX]
    shr d[eax], 10h
Comments: Possible optimization:
    shr d[ebp+offset_reg_EAX], 10h
Flags don’t need to be stored, because they will be overwritten by
the next instruction.

Original code:          ror ax, cl
Intermediate language:  ror reg.AX, reg.CL, reg.AX
                        Saveflags reg.Flags
Translated code:
    mov cl, [ebp+offset_reg_CL]
    ror w[ebp+offset_reg_AX], cl
    lahf
    seto al
    mov [ebp+offset_reg_Flags], ax

Original code:          mov [bx], ax
Intermediate language:  mov -, reg.AX, w[(reg.DS<<4)+reg.BX]
Translated code:
    movzx edx, w[ebp+offset_reg_BX]
    movzx eax, w[ebp+offset_reg_DS]
    shl eax, 4
    add edx, eax
    call [ebp+offset_memory_mapper]
    mov cx, [ebp+offset_reg_AX]
    mov [eax], cx

Original code:          mov [bx+2],dx
Intermediate language:  mov -, reg.DX, w[(reg.DS<<4)+reg.BX+2]
Translated code:
    mov edx, 2
    add dx, [ebp+offset_reg_BX]
    movzx eax, w[ebp+offset_reg_DS]
    shl eax, 4
    add edx, eax
    call [ebp+offset_memory_mapper]
    mov cx, [ebp+offset_reg_DX]
    mov [eax], cx
Comments: The virtual address could be obtained by adding 2 to the
virtual address computed for the previous instruction. However,
implementing such optimizations in a generic way is computationally
expensive.


Original code:          inc bx
Intermediate language:  Loadflags reg.Flags
                        inc reg.BX, -, reg.BX
                        Saveflags reg.Flags
Translated code:
    mov ax, [ebp+offset_reg_Flags]
    add al, 7Fh
    sahf
    lea eax, [ebp+offset_reg_BX]
    inc w[eax]
    lahf
    seto al
    mov [ebp+offset_reg_Flags], ax
Comments: The previous ‘mov’ instructions in the original code do not
affect CF, but the corresponding translated code sequences do so. The
IL only allows saving all flags and not individual ones, so we must
load the native flags prior to executing the ‘inc’ instruction that
doesn’t affect CF, in order to preserve the correct value of the CF
flag.

Original code:          loop L_decrypt
                        L_endloop:
Intermediate language:  sub addr_L_endloop, (reg.CS<<4), reg.IP
                        dec reg.CX, -, reg.CX
                        setz -, -, reg.T32
                        jopz reg.T32, (reg.CS<<4)+reg.IP-24h
Translated code:
    lea eax, [ebp+offset_reg_IP]
    mov ebx, addr_L_decrypt
    movzx ecx, w[ebp+offset_reg_CS]
    shl ecx, 4
    sub ebx, ecx
    mov [eax], bx
    lea eax, [ebp+offset_reg_CX]
    dec w[eax]
    lea eax, [ebp+offset_reg_T32]
    mov ebx, eax
    mov eax, 0
    setz al
    mov [ebx], eax
    mov ecx, [ebp+offset_reg_T32]
    jecxz L_jump_taken
    mov b[ebp+offset_jump_info], 2
    jmp L_return
  L_jump_taken:
    mov edx, -24h
    add dx, [ebp+offset_reg_IP]
    movzx eax, w[ebp+offset_reg_CS]
    shl eax, 4
    add edx, eax
    mov [ebp+offset_crt_address], edx
    mov b[ebp+offset_jump_info], 3
  L_return:
    ret
Comments: The code sequence for a jump instruction must update the
instruction pointer register, compute the virtual address of the next
instruction and provide the DT engine information about the jump
instruction, such as type of jump and whether the jump was taken or
not. It is possible to replace ‘jmp L_return’ with a ‘ret’
instruction, but this would negate the possibility of calling a debug
trace function from the generated code after each instruction
sequence, making debugging of generated code more difficult. The
final ‘ret’ instruction is not generated while translating the jump
instruction; it is generated after the entire block has been
translated. This ensures that any translated block will be able to
return control to the caller, even if the original code in that block
doesn’t end with a jump instruction.
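The address computation that recurs in the translated code above (shift the segment left by 4, add the 16-bit offset, then ask the memory mapper for the real location) can be sketched in Python. This is an illustrative model only: `linear_address`, `MemoryMapper`, the 4096-byte page size and the on-demand page allocation are assumptions, not details taken from the prototype.

```python
from typing import Dict, Tuple

PAGE_SIZE = 4096  # assumed page granularity of the virtual memory image


def linear_address(segment: int, offset: int, disp: int = 0) -> int:
    """Real-mode address: (segment << 4) + 16-bit offset.

    The displacement is added to the offset with 16-bit wrap-around,
    mirroring the 'add dx, [...]' step in the translated code.
    """
    return ((segment & 0xFFFF) << 4) + ((offset + disp) & 0xFFFF)


class MemoryMapper:
    """Hypothetical mapper from virtual addresses to host buffers."""

    def __init__(self) -> None:
        self.pages: Dict[int, bytearray] = {}

    def translate(self, vaddr: int) -> Tuple[bytearray, int]:
        # Allocate a zero-filled page on first touch (an assumption).
        page = self.pages.setdefault(vaddr // PAGE_SIZE, bytearray(PAGE_SIZE))
        return page, vaddr % PAGE_SIZE

    def write16(self, vaddr: int, value: int) -> None:
        # Written byte by byte so a word may straddle two pages.
        for i, byte in enumerate((value & 0xFFFF).to_bytes(2, "little")):
            page, off = self.translate(vaddr + i)
            page[off] = byte

    def read16(self, vaddr: int) -> int:
        lo_page, lo = self.translate(vaddr)
        hi_page, hi = self.translate(vaddr + 1)
        return lo_page[lo] | (hi_page[hi] << 8)
```

For example, `linear_address(0x1234, 0x0010)` yields `0x12350`, and a word written with `write16` at the last byte of a page is read back intact by `read16` even though it spans two host buffers.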

The following format was used for IL instructions: <opcode source_operand_1, source_operand_2, destination_operand>. For
example, ‘add x, y, d’ means ‘d = x + y’. Loadflags and Saveflags are not separate IL instructions; they are encoded in the binary
form of the affected IL instructions. In this particular implementation, the IL accepts simple operands, such as registers and
constants, as well as more complex operands, such as a register shifted by a constant or a sum of register, constant or shifted
operands. It is possible to design the IL to support only simple operands, in which case each IL instruction would be simpler and
faster to generate, but more IL instructions would be needed, on average, to translate an original instruction.
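The three-operand format can be made concrete with a small Python sketch that parses the textual form used in the table and evaluates a few opcodes. It is a hedged illustration: `ILInstr`, `parse_il` and `execute` are hypothetical names, only simple operands (registers, constants and ‘-’) are handled, registers are stored under their textual names without modelling aliasing between AL, AX and EAX, and flags are ignored.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass(frozen=True)
class ILInstr:
    opcode: str
    src1: str
    src2: str
    dst: str


def parse_il(text: str) -> ILInstr:
    """Parse '<opcode src1, src2, dst>' as written in the table."""
    opcode, rest = text.strip().split(None, 1)
    src1, src2, dst = (part.strip() for part in rest.split(","))
    return ILInstr(opcode, src1, src2, dst)


def operand_value(operand: str, regs: Dict[str, int]) -> int:
    if operand == "-":                      # unused operand slot
        return 0
    if operand.startswith("reg."):          # simple register operand
        return regs[operand]
    if operand.lower().endswith("h"):       # hexadecimal constant, e.g. 1234h
        return int(operand[:-1], 16)
    return int(operand)                     # decimal constant


# Subset of opcodes seen in the table; 'mov' uses the second source slot.
OPS: Dict[str, Callable[[int, int], int]] = {
    "add": lambda x, y: x + y,
    "xor": lambda x, y: x ^ y,
    "shr": lambda x, y: x >> y,
    "mov": lambda x, y: y,
    "inc": lambda x, y: x + 1,
    "dec": lambda x, y: x - 1,
}


def execute(instr: ILInstr, regs: Dict[str, int], width: int = 16) -> None:
    """Evaluate dst = op(src1, src2), truncated to the operand width."""
    x = operand_value(instr.src1, regs)
    y = operand_value(instr.src2, regs)
    regs[instr.dst] = OPS[instr.opcode](x, y) & ((1 << width) - 1)
```

Complex operands such as `d[(reg.DS<<4)+reg.BX]` parse as a single operand string but are not evaluated by this sketch; supporting them would require an expression evaluator for the operand syntax.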


APPENDIX B
The following table shows comparative speed test results,
obtained by benchmarking a prototype implementation of the
dynamic translation method presented in this paper, versus the
emulator in the last version of the RAV anti-virus engine. Within
each test, the DT prototype and the emulator analysed the
exact same instructions and scanned for malware with the
same signature set. The best time of three consecutive runs
was selected, for each test.
The files used for the first three tests were infected with
polymorphic file infectors; both the emulator and the DT
prototype had 100% detection rate on these test sets. The test
set for the last test consisted of 100 copies of an executable
file containing a nested decryptor loop – interior loop having
1,020 iterations, exterior loop having 256 iterations.
The benchmark results indicate that:
• The DT method provides a significant speed
improvement over emulation, in all tests.
• In the case of emulation, speed performance is not
affected, in most cases, by the complexity of the analysed
files. The time required to emulate a given code sequence
is determined by the type and number of instructions
emulated. An emulator takes little or even no advantage
of repeating code sequences (a slight improvement is
noticed for the last test set).
• The speed performance provided by the DT method,
relative to native execution, improves with the average
number of analysed instructions per file, as the
probability of finding repeating code sequences
increases. Thus, detection of heavily polymorphic
viruses, requiring millions of instructions to be analysed,
can be accomplished in a reasonable time by using
dynamic translation.

test    method     analyzed      analyzed  average       total      MIPS     average
number             instructions  files     instructions  time       P4@3GHz  slowdown
                   x 1,000,000             per file      [seconds]           rate*
                                           x 1000
1       emulation  394           7070      58            63         6.3      317
1       DT         394           7070      58            15         26       77
2       emulation  422           2000      211           65         6.5      307
2       DT         422           2000      211           11         38.3     52
3       emulation  427           738       579           68         6.3      317
3       DT         427           738       579           8          53.4     37
4       emulation  418           100       4180          54         7.7      260
4       DT         418           100       4180          7          60       33

* the average slowdown rate is defined as the time needed by emulation / DT to analyse a given test code divided by the time
needed by a real CPU to natively execute the same code. An average of 1.5 clock cycles per instruction was used for the purpose of
estimating the total time required for native execution of the files in the test sets. Actual execution time would be difficult to
measure given that infected files were used.
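The definition in the footnote can be checked against the table with a few lines of Python. This is a sketch of the arithmetic only; the function names are illustrative, and the constants are the 3 GHz clock and the 1.5 cycles per instruction stated above.

```python
CLOCK_HZ = 3_000_000_000          # P4 @ 3 GHz, as in the table header
CYCLES_PER_INSTRUCTION = 1.5      # average assumed in the footnote


def mips(instructions_millions: float, seconds: float) -> float:
    """Millions of analysed instructions per second."""
    return instructions_millions / seconds


def slowdown(instructions_millions: float, seconds: float) -> float:
    """Analysis time divided by the estimated native execution time."""
    native_seconds = (
        instructions_millions * 1_000_000 * CYCLES_PER_INSTRUCTION / CLOCK_HZ
    )
    return seconds / native_seconds
```

For test 1, `slowdown(394, 63)` gives roughly 320 for emulation and `slowdown(394, 15)` roughly 76 for DT, close to the 317 and 77 listed in the table; the small differences appear consistent with the table deriving the rate from the rounded MIPS figures.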

