VMProtect 2 - Part Two, Complete Static Analysis - Back Engineering
Table Of Contents
Purpose
Intentions
Definitions
VMProtect 2 - Project’s Overview
VMHook - Overview
VMHook - Example, um-hook
VMProfiler - Overview
VMProfiler - Virtual Machine Handler Profiling
VMProfiler - Virtual Branch Detection Algorithm
VMProfiler Qt - Overview
VMProfiler CLI - Overview
VMEmu - Overview
VMEmu - Unicorn Engine, Static Decryption Of Opcodes
VMEmu - Virtual Branching
VMAssembler - Overview
VMAssembler - Assembler Stages
VMAssembler - Stage One, Lexical Analysis and Parsing
VMAssembler - Stage Two, Virtual Instruction Encoding
VMAssembler - Stage Three, Virtual Instruction Encryption
VMAssembler - Stage Four, C++ Header Generation
VMAssembler - Example
VTIL - Getting Started
VTIL - The Basic Block
VTIL - VMProfiler Lifting
https://back.engineering/21/06/2021/#definitions 1/23
2021/6/24 VMProtect 2 - Part Two, Complete Static Analysis // Back Engineering
Purpose
The purpose of this article is to expound upon the prior work disclosed in the last
article titled “VMProtect 2 - Detailed Analysis of the Virtual Machine Architecture”, as
well as correct a few mistakes. In addition, this post will focus primarily on the creation
of static analysis tools using the knowledge disclosed in the prior post, and providing
some detailed, albeit unofficial, VTIL documentation. This article will also showcase all
projects on githacks.org/vmp2, however, these projects are subject to change.
Intentions
Definitions
VMHook - Overview
VMHook is a very small C++ framework for hooking into VMProtect 2 virtual machines.
um-hook inherits this framework and provides a demonstration of how to use the
framework. VMHook is not used to uncover virtual instructions and their functionality,
but rather to alter them.
VMHook - Example, um-hook
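.data
    __mbase dq 0h
    public __mbase
.code
__lconstbzx proc
    xor al, bl          ; decrypt the operand byte with the rolling key in BL
    dec al
    ror al, 1
    neg al
    xor bl, al          ; fold the decrypted operand back into the rolling key
    je swap_val
    sub rbp, 2          ; make room on the virtual stack (RBP is VSP)
    mov [rbp], ax       ; push the zero extended constant
    jmp rax
swap_val:
    sub rbp, 2
    mov [rbp], ax
    jmp rax
__lconstbzx endp
end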
VMProfiler - Overview
VMProfiler is a C++ library which is used for static analysis of VMProtect 2 binaries.
This is the base project for VMProfiler Qt, VMProfiler CLI, VMEmu, and VMAssembler.
VMProfiler also inherits VTIL and contains virtual machine handler profiles and lifters.
VMProfiler - Virtual Machine Handler Profiling
Virtual machine handlers are found and categorized via a pattern matching algorithm.
The first iteration of this algorithm simply compared native instruction bytes.
However, this proved ineffective: a change to a native instruction that does not
alter its outcome but does alter its bytes causes the algorithm to miscategorize,
or even fail to recognize, virtual machine handlers. Consider the following
instruction variants, all of which have the same result when executed but each of
which has its own unique sequence of bytes.
7: 00 00
In order to handle such cases, a new iteration of the profiling algorithm has been
designed and implemented. This new rendition still pattern matches, however for each
instruction of a virtual machine handler a lambda is defined. This lambda takes in a
ZydisDecodedInstruction parameter, by reference, and returns a boolean. The result
being true if a given decoded instruction meets all of the comparison cases. Using
Zydis for this purpose allows operands to be compared at a much finer level. For
example, operand two from both instructions in the figure above is of type
ZYDIS_OPERAND_TYPE_MEMORY. In addition, the base of this memory operand for both
instructions is RAX. The mnemonic of both instructions is the same. This sort of
minimalist comparison is what this rendition of the profiling algorithm is
based on.
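As a rough illustration of the idea, the sketch below matches a handler against a list of per-instruction predicates. It is not VMProfiler's code: instr_t, operand_t, and the enums are hypothetical stand-ins for the few ZydisDecodedInstruction fields being compared, and the "skip unmatched instructions" matching strategy is an assumption made for this sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical stand-ins for the few ZydisDecodedInstruction fields compared here.
enum class mnemonic_t { mov, add, pop };
enum class operand_kind_t { reg, mem };
enum class reg_t { none, rax, rbp };

struct operand_t
{
    operand_kind_t kind;
    reg_t reg; // the register itself, or the base register of a memory operand
};

struct instr_t
{
    mnemonic_t mnemonic;
    operand_t operands[ 2 ];
};

using instr_pred_t = std::function< bool( const instr_t & ) >;

// Matches "MOV RAX, [RAX]" by mnemonic, operand kinds and memory base only, so
// prefixes or alternative encodings of the same instruction still match.
inline bool is_mov_rax_deref_rax( const instr_t &instr )
{
    return instr.mnemonic == mnemonic_t::mov &&
           instr.operands[ 0 ].kind == operand_kind_t::reg &&
           instr.operands[ 0 ].reg == reg_t::rax &&
           instr.operands[ 1 ].kind == operand_kind_t::mem &&
           instr.operands[ 1 ].reg == reg_t::rax;
}

// Returns true when every predicate of the profile matches some instruction of
// the handler, in order. Instructions in between are skipped (an assumption of
// this sketch), which is why a profile needs only a few predicates.
inline bool matches_profile( const std::vector< instr_t > &handler,
                             const std::vector< instr_pred_t > &profile )
{
    std::size_t next = 0;
    for ( const auto &instr : handler )
        if ( next < profile.size() && profile[ next ]( instr ) )
            ++next;
    return next == profile.size();
}
```

Because the predicate never looks at raw bytes, two encodings that decode to the same mnemonic and operands are indistinguishable to it, which is the property the byte comparison lacked.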
vm::handler::profile_t readq = {
"READQ",
READQ,
NULL,
},
} } } };
In the figure above, the READQ profile is displayed. Notice that not every
instruction of a virtual machine handler needs a Zydis lambda, only enough of them
to construct a unique profile. There are in fact additional native
instructions for READQ which are not accounted for with Zydis comparison lambdas.
VMProfiler - Virtual Branch Detection Algorithm
The most glaring consistency in a virtual branch is the usage of PUSHVSP. This virtual
instruction is executed when two encrypted values are on the stack at VSP + 0, and VSP
+ 8. These encrypted values are decrypted using the last LCONSTDW value of a given
block. Thus a trivially small algorithm can be created based upon these two
consistencies. The first part of the algorithm will simply use std::find_if with reverse
iterators to locate the last LCONSTDW in a given code block. This DWORD value will be
interpreted as the XOR key used to decrypt the encrypted relative virtual addresses of
both branches. A second std::find_if is then executed to locate a PUSHVSP virtual
instruction which, at the time it executes, has two encrypted relative virtual
addresses on the stack. The algorithm interprets the top two stack values of every
PUSHVSP instruction as encrypted relative virtual addresses and applies an XOR
operation with the last LCONSTDW value.
if ( code_block.vinstrs.back().mnemonic_t == vm::handler::VMEXIT )
return {};
// find the last LCONSTDW... the imm value is the JMP xor decrypt key...
jcc_data jcc;
//
result = std::find_if(
code_block.vinstrs.rbegin(), code_block.vinstrs.rend(),
vinstr.trace_data.vsp.qword[ 0 ] ^ xor_key ),
vinstr.trace_data.vsp.qword[ 1 ] ^ xor_key );
// our hands dirty and look into trying to emulate each branch
return false;
} );
if ( result == code_block.vinstrs.rend() )
jcc.has_jcc = false;
jcc.type = jcc_type::absolute;
else
result->trace_data.vsp.qword[ 0 ] ^ xor_key );
result->trace_data.vsp.qword[ 1 ] ^ xor_key );
jcc.has_jcc = true;
jcc.type = jcc_type::branching;
return jcc;
Note: the underlying flag on which the virtual branch depends is not extracted
by this algorithm. This is one of its shortcomings as it stands.
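The two-step search described in this section can be sketched in a self-contained form as follows. The types are simplified, hypothetical stand-ins for VMProfiler's trace structures, and is_code_addr is an assumed caller-supplied check that a decrypted value lands inside the protected image; only the overall shape (reverse std::find_if for the last LCONSTDW, then a reverse std::find_if for a qualifying PUSHVSP) follows the text above.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical, simplified stand-ins for VMProfiler's trace structures.
enum class vm_mnemonic_t { LCONSTQ, LCONSTDW, PUSHVSP, JMP, VMEXIT };

struct virt_instr_t
{
    vm_mnemonic_t mnemonic;
    std::uint64_t imm;        // immediate operand (0 when the instruction has none)
    std::uint64_t stack[ 2 ]; // traced values at VSP + 0 and VSP + 8
};

struct jcc_data
{
    bool has_jcc = false;
    std::uint64_t block_addr[ 2 ] = { 0, 0 };
};

// is_code_addr is a caller-supplied plausibility check for decrypted addresses.
template < typename pred_t >
jcc_data find_branches( const std::vector< virt_instr_t > &block, pred_t is_code_addr )
{
    jcc_data jcc;
    if ( block.empty() || block.back().mnemonic == vm_mnemonic_t::VMEXIT )
        return jcc;

    // step one: the last LCONSTDW of the block holds the XOR key
    const auto key = std::find_if( block.rbegin(), block.rend(), []( const virt_instr_t &v ) {
        return v.mnemonic == vm_mnemonic_t::LCONSTDW;
    } );
    if ( key == block.rend() )
        return jcc;
    const auto xor_key = static_cast< std::uint32_t >( key->imm );

    // step two: a PUSHVSP whose top two stack values decrypt to code addresses
    const auto pushvsp = std::find_if( block.rbegin(), block.rend(), [ & ]( const virt_instr_t &v ) {
        return v.mnemonic == vm_mnemonic_t::PUSHVSP &&
               is_code_addr( v.stack[ 0 ] ^ xor_key ) && is_code_addr( v.stack[ 1 ] ^ xor_key );
    } );
    if ( pushvsp == block.rend() )
        return jcc;

    jcc.has_jcc = true;
    jcc.block_addr[ 0 ] = pushvsp->stack[ 0 ] ^ xor_key;
    jcc.block_addr[ 1 ] = pushvsp->stack[ 1 ] ^ xor_key;
    return jcc;
}
```

A block for which has_jcc stays false is treated here as non-branching; distinguishing a VMEXIT from an absolute jump is left to the caller in this sketch.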
VMProfiler Qt - Overview
VMProfiler Qt is a small C++ Qt based GUI that allows for inspection of virtual
instruction traces. These traces are generated via VMEmu and contain all information
for every virtual instruction. The GUI contains a window for virtual register values,
native register values, the virtual stack, virtual instructions, expandable virtual
branches, and lastly a tab containing all virtual machine handlers and their native
instructions, and transformations.
VMProfiler CLI - Overview
VMProfiler CLI is a command line project which is used to demonstrate all VMProfiler
features. This project only consists of a single file (main.cpp), however it’s a good
reference for those who are interested in inheriting VMProfiler as their code base.
Options:
VMEmu - Overview
Options:
VMEmu - Unicorn Engine, Static Decryption Of Opcodes
In order to statically decrypt virtual instruction operands, one must first understand
how these operands are encrypted in the first place. The algorithm VMProtect 2 uses
to encrypt virtual instruction operands can be represented as a mathematical formula.
Thus:
Furthermore:
Considering the above figure, decryption of operands is merely the inverse of function
$F$. This inverse is generated into native x86_64 instructions and embedded into each
virtual machine handler as well as calc_jmp. One could simply emulate these
instructions via reimplementation of them in C/C++, however my implementation of
such instructions is merely for the purpose of encryption, not decryption. Instead, the
usage of unicorn-engine is preferred in this situation as by simply emulating these
virtual machine handlers, decrypted operands will be produced.
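To make the idea of a transformation chain and its inverse concrete, here is a minimal sketch for a single 8-bit operand. The chain (xor with the key, dec, ror 1, neg, then folding the result back into the key) is taken from the __lconstbzx listing earlier in this article and is specific to that one handler; other handlers and other binaries use different chains. The function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>

constexpr std::uint8_t ror8( std::uint8_t v ) { return static_cast< std::uint8_t >( ( v >> 1 ) | ( v << 7 ) ); }
constexpr std::uint8_t rol8( std::uint8_t v ) { return static_cast< std::uint8_t >( ( v << 1 ) | ( v >> 7 ) ); }

// Mirrors the handler's native instructions. Returns { operand, updated key }.
inline std::pair< std::uint8_t, std::uint8_t > decrypt_operand( std::uint8_t key, std::uint8_t enc )
{
    std::uint8_t al = static_cast< std::uint8_t >( enc ^ key ); // xor al, bl
    al = static_cast< std::uint8_t >( al - 1 );                 // dec al
    al = ror8( al );                                            // ror al, 1
    al = static_cast< std::uint8_t >( -al );                    // neg al
    return { al, static_cast< std::uint8_t >( key ^ al ) };     // xor bl, al
}

// Applies the inverse of each step in reverse order. Returns { encrypted, updated key }.
inline std::pair< std::uint8_t, std::uint8_t > encrypt_operand( std::uint8_t key, std::uint8_t plain )
{
    std::uint8_t al = static_cast< std::uint8_t >( -plain ); // undo neg
    al = rol8( al );                                         // undo ror
    al = static_cast< std::uint8_t >( al + 1 );              // undo dec
    al = static_cast< std::uint8_t >( al ^ key );            // undo xor with the key
    return { al, static_cast< std::uint8_t >( key ^ plain ) };
}
```

Both directions return the same updated key for a given operand, which is what allows an assembler and the virtual machine to stay in step from one operand to the next.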
Understand that no runtime value can possibly affect the decryption of operands, thus
invalid memory accesses can be ignored. However, runtime values can alter which
virtual instruction blocks are decrypted, thus the need for saving the context of the
emulated CPU prior to execution of a branching virtual instruction. This will allow for
restoring the state of the emulated CPU prior to the branching instruction, but
additionally altering which branch the emulated CPU will take, allowing for complete
decryption of all virtual instruction blocks statically.
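The save, restore, and redirect loop described above can be sketched without any emulator at all. In the sketch below cpu_state_t is a hypothetical stand-in for a saved emulator context, and emulate_block is an assumed callback that runs one virtual instruction block and reports the state just before its final instruction along with the possible targets. With unicorn-engine the copy of cpu_state_t would correspond to saving and restoring a CPU context.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <set>
#include <vector>

struct cpu_state_t
{
    std::uint64_t vip = 0;         // virtual instruction pointer of the next block
    std::uint64_t regs[ 16 ] = {}; // remaining emulated register state
};

struct block_result_t
{
    cpu_state_t state_at_branch;          // snapshot taken just before the block's last instruction
    std::vector< std::uint64_t > targets; // none (VMEXIT), one (absolute jump) or two (branch)
};

using emulate_fn = std::function< block_result_t( const cpu_state_t & ) >;

// Visits every reachable block by restoring the snapshot once per target and
// forcing the emulated CPU down that path.
inline std::set< std::uint64_t > explore_all( const cpu_state_t &entry, const emulate_fn &emulate_block )
{
    std::set< std::uint64_t > visited;
    std::vector< cpu_state_t > worklist{ entry };
    while ( !worklist.empty() )
    {
        const cpu_state_t state = worklist.back();
        worklist.pop_back();
        if ( !visited.insert( state.vip ).second )
            continue; // this block has already been decrypted

        const block_result_t result = emulate_block( state );
        for ( const auto target : result.targets )
        {
            cpu_state_t forced = result.state_at_branch; // restore the saved context
            forced.vip = target;                         // steer execution down this path
            worklist.push_back( forced );
        }
    }
    return visited;
}
```

The visited set keeps loops in the virtualized code from being explored forever; keying it on the block address alone is a simplification made for this sketch.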
To reiterate, the usage of unicorn-engine is for computing $F(e, o)$ and $G(e, o)$, where
$e$ takes the form of the native register RBX, $o$ takes the form of the native register
RAX, and Tm,Fn takes the form of transformation mth.
In addition, not only can decrypted operands be obtained using unicorn-engine, but
views of the virtual stack can be snapshotted for every single virtual instruction. This
allows for algorithms to take advantage of values that are on the stack. Calls to native
WinAPIs are done outside of the virtual machine, except for rare cases such as the
VMProtect 2 packer virtual machine handler which calls LoadLibrary with a pointer to
the string “NTDLL.DLL” in RCX.
VMEmu - Virtual Branching
Seeing all code paths is extremely important. Consider the most basic situation where
a parameter is checked to see if it’s a nullptr.
auto demo(int* a)
if (!a)
return {};
Analysis of the above code without being able to see all code paths would result in
something useless. Thus seeing all branches inside of the virtual machine was the top
priority. In this section I will detail how virtual branching works inside of the
VMProtect 2 virtual machine, as well as the algorithms I’ve designed to recognize and
analyze all paths.
To begin, not all code blocks end with a branching virtual instruction. Some end with
virtual machine exits or absolute jumps. Hence the need for an algorithm which can
determine whether a given virtual instruction block will branch. In order to produce
such an algorithm, intimate knowledge of the virtual machine branching mechanism is
required, specifically how native JCCs are translated to virtual instructions.
Consider the possible affected flag bits of the native ADD instruction. Flags OF, SF, ZF,
AF, CF, and PF can all be affected depending on the computation. Native branching is
done via JCC instructions which depend upon the state of a specific flag or flags.
jz branch_1
Figure 2.
Consider figure 2. The JZ native instruction will jump to “branch_1” if
the ZF flag is set. One could reimplement figure 2 in such a way that only the native
JMP instruction and a few other math and stack operations are used, reducing
the number of branching instructions to a single native JMP instruction.
Consider that the native TEST instruction performs a bitwise AND on both operands,
sets flags accordingly, and disregards the AND result. One could simply replace the
native TEST instruction with a few stack operations and the native AND instruction.
0: 50 push rax
1: 48 21 c0 and rax,rax
4: 9c pushf
f: 58 pop rax
Figure 3.
Note: bittest/test is not used here as it is implemented via AND, and SHR.
Although converting a single instruction into several may seem
counterproductive and to require more work in the end, this is not the case, as these
instructions will be reused in other orientations. Reimplementation of all JCC
instructions could be done quite simply using the above assembly code template. Even
such branching instructions as the JRCXZ, JECXZ, and JCXZ instructions could be
implemented by simply swapping RAX with RCX/EAX/CX in the above example.
Figure 3, although in native x86_64, provides a solid example of how VMProtect 2 does
branching inside of the virtual machine. However, VMProtect 2 adds additional
obfuscation via math obfuscation. Firstly, both addresses pushed onto the stack are
encrypted relative virtual addresses. These addresses are decrypted via XOR, although
XOR, SUB, and other math operations are themselves obfuscated into NAND
operations.
LCONSTQ 0x19edc194
LCONSTQ 0x19ed8382
PUSHVSP
; calculate which branch will be executed, then read its encrypted address on the stack
LCONSTBZXW 0x3
LCONSTBSXQ 0xbf
LREGQ 0x80
NANDQ
SREGQ 0x68
SHRQ
SREGQ 0x70
ADDQ
SREGQ 0x48
READQ
SREGQ 0x68
SREGQ 0x70
SREGQ 0x90
; put the selected branch encrypted address back onto the stack...
LREGQ 0x68
LREGQ 0x68
LCONSTDW 0xa60934c9
NANDDW
SREGQ 0x48
LCONSTDW 0x59f6cb36
LREGDW 0x68
NANDDW
SREGQ 0x48
NANDDW
SREGQ 0x90
SREGQ 0x70
; …
LREGQ 0x70
JMP
Figure 4.
As discussed prior, VMProtect 2 uses the XOR operation to decrypt and subsequently
encrypt the relative virtual addresses pushed onto the stack. Selection of a specific
encrypted relative virtual address is done by shifting a given flag to result in its value
being either zero or eight. Then, adding VSP to the resulting shift computes the
address in which the encrypted relative virtual address is located.
return result;
Figure 5. Note: Notice that FIRST_CONSTANT and SECOND_CONSTANT are inverses of each
other.
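The selection and decryption steps can be sketched together in a few lines. Two assumptions are made here and should be checked against the handlers of the binary at hand: that the virtual machine's NAND primitive computes ~a & ~b (NOT applied to both inputs, then AND), and that the flag being tested is ZF, which is bit 6 of RFLAGS, so that isolating it and shifting right by three yields zero or eight. All function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Assumed behaviour of the VM's NAND primitive: NOT both inputs, then AND.
constexpr std::uint32_t vm_nand( std::uint32_t a, std::uint32_t b ) { return ~a & ~b; }

// NOT and XOR composed purely from that primitive.
constexpr std::uint32_t vm_not( std::uint32_t a ) { return vm_nand( a, a ); }
constexpr std::uint32_t vm_xor( std::uint32_t a, std::uint32_t b )
{
    return vm_nand( vm_nand( a, b ), vm_nand( vm_not( a ), vm_not( b ) ) );
}

// ZF is bit 6 (0x40) of RFLAGS; shifting the isolated bit right by 3 gives 0 or 8.
constexpr std::uint64_t slot_offset( std::uint64_t rflags ) { return ( rflags & 0x40 ) >> 3; }

// slots[ 0 ] models the value at VSP + 0 and slots[ 1 ] the value at VSP + 8.
inline std::uint32_t select_branch( const std::uint64_t ( &slots )[ 2 ], std::uint64_t rflags,
                                    std::uint32_t xor_key )
{
    const auto encrypted = static_cast< std::uint32_t >( slots[ slot_offset( rflags ) / 8 ] );
    return vm_xor( encrypted, xor_key );
}
```

Note that with this primitive, XOR needs both an operand and its complement, which is one plausible reason for a constant and its inverse to appear as a pair in the virtual instruction stream.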
VMAssembler - Overview
VMAssembler uses LEX and YACC to parse text files for virtual instruction names and
immediate values. There are four main stages to VMAssembler: lexical analysis and
parsing, virtual instruction encoding, virtual instruction encryption, and lastly C++ code
generation.
VMAssembler - Stage One, Lexical Analysis and Parsing
Lexical analysis and token parsing are two stages in themselves. However, I will
refer to them as one, since together they produce data structures manageable
by C++.
The first stage of VMAssembler is almost entirely handled by LEX and YACC. Text is
converted into C++ structures representing virtual instructions. These structures are
referred to as _vinstr_meta and _vlable_meta. These structures are then used by stage
two to validate that each virtual instruction exists, as well as to encode these higher level
representations of virtual instructions into decrypted virtual operands.
VMAssembler - Stage Two, Virtual Instruction Encoding
The virtual instruction encoding stage also validates the existence of all
virtual instructions for each virtual label. This is done by comparing profiled vm
handler names with the virtual instruction name token. If a virtual instruction does not
exist, assembling will cease.
label_data->vinstrs.begin(), label_data->vinstrs.end(),
std::printf( "> vinstr name = %s, has imm = %d, imm = 0x%p\n", vinstr.
vinstr.has_imm, vinstr.imm );
std::printf( "[!] this vm protected file does not have the vm handler
vinstr.name.c_str() );
return true;
} );
} ) )
exit( -1 );
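A self-contained sketch of this validation step is shown below. The structures are simplified, hypothetical versions of the parser's output, and handler_names stands in for the names of the handlers profiled in the target binary; the nested std::all_of mirrors the shape of the check rather than reproducing VMAssembler's code.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <string>
#include <vector>

// Hypothetical, simplified versions of the structures produced by the parser.
struct vinstr_meta_t
{
    std::string name;
    bool has_imm;
    std::uint64_t imm;
};

struct vlabel_meta_t
{
    std::string label_name;
    std::vector< vinstr_meta_t > vinstrs;
};

// Returns true only if every virtual instruction of every label names a
// handler that was profiled in the target binary.
inline bool validate_labels( const std::vector< vlabel_meta_t > &labels,
                             const std::vector< std::string > &handler_names )
{
    return std::all_of( labels.begin(), labels.end(), [ & ]( const vlabel_meta_t &label ) {
        return std::all_of( label.vinstrs.begin(), label.vinstrs.end(), [ & ]( const vinstr_meta_t &vinstr ) {
            const bool known = std::find( handler_names.begin(), handler_names.end(), vinstr.name ) !=
                               handler_names.end();
            if ( !known )
                std::fprintf( stderr, "[!] no vm handler named %s in this binary\n", vinstr.name.c_str() );
            return known;
        } );
    } );
}
```

A caller would stop assembling when this returns false, since an instruction without a matching handler cannot be encoded for that particular virtual machine.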
Once all virtual instruction IL is validated, encoding of these virtual instructions can
commence. The order in which the virtual instruction pointer advances is important to
note throughout the process of encoding and encrypting. The direction dictates the
ordering of operands and virtual instructions.
VMAssembler - Stage Three, Virtual Instruction Encryption
Just like stage two of assembly, stage three must also take into consideration which
way the virtual instruction pointer advances. This is because operands must be
encrypted in an order based upon the direction of VIP’s advancement. The encryption
key produced by the previous operand's encryption is used as the starting encryption key
for the next, as detailed in “VMEmu - Unicorn Engine, Static Decryption Of Opcodes”.
This stage computes $F^{-1}(e, o)$ and $G^{-1}(e, o)$ for each virtual instruction operand of
each label. Lastly, the relative virtual address from vm_entry to the first operand of
the first virtual instruction is calculated and then encrypted using the inverse of the
transformations used to decrypt the relative virtual address of the virtual instructions
themselves. You can find more details about these transformations inside of the
vm_entry section of the last article.
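The interplay between the rolling key and the direction of VIP can be illustrated with a toy example. Everything specific here is an assumption made for the sketch: operands are single bytes, the per-operand transformation is just "xor with the key, then dec", and a backward advancing VIP is modelled by reversing the byte layout so that the first operand executed sits at the highest address.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

enum class vip_direction_t { forward, backward };

// Encrypts operands in execution order; each operand's key depends on the
// plaintext of the operand before it.
inline std::vector< std::uint8_t > encrypt_stream( const std::vector< std::uint8_t > &operands,
                                                   std::uint8_t key, vip_direction_t direction )
{
    std::vector< std::uint8_t > bytes;
    for ( const std::uint8_t plain : operands )
    {
        bytes.push_back( static_cast< std::uint8_t >( ( plain + 1 ) ^ key ) ); // inverse of "xor key, dec"
        key = static_cast< std::uint8_t >( key ^ plain );                      // rolling key update
    }
    if ( direction == vip_direction_t::backward )
        std::reverse( bytes.begin(), bytes.end() ); // VIP walks from high to low addresses
    return bytes;
}

// What the virtual machine would do: walk the bytes in VIP order and decrypt.
inline std::vector< std::uint8_t > decrypt_stream( std::vector< std::uint8_t > bytes, std::uint8_t key,
                                                   vip_direction_t direction )
{
    if ( direction == vip_direction_t::backward )
        std::reverse( bytes.begin(), bytes.end() );
    std::vector< std::uint8_t > operands;
    for ( const std::uint8_t enc : bytes )
    {
        const auto plain = static_cast< std::uint8_t >( ( enc ^ key ) - 1 ); // "xor key, dec"
        operands.push_back( plain );
        key = static_cast< std::uint8_t >( key ^ plain );
    }
    return operands;
}
```

Encrypting in execution order and only then fixing the layout keeps the key chain identical for both directions, which is the reason the ordering has to be decided before any operand is encrypted.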
VMAssembler - Stage Four, C++ Header Generation
Stage four is the final stage of virtual instruction assembly. In this stage C++ code is
generated. The code is completely self contained and environment agnostic. However,
there are a few limitations to the current implementation. Most glaring is the need for
a RWX (read, write, and execute) section. If one were to use this generated C++
code in a Windows kernel driver, the driver would not support HVCI systems. Also,
as of 6/19/2021, MSVC cannot compile the generated header: the static initializer
for the raw module causes the compiler to hang. You must use
clang-cl if you want to compile with the generated header file from VMAssembler.
VMAssembler - Example
Once a C++ header has been generated using VMAssembler, you can include it
in your project and compile with any compiler other than MSVC. The MSVC
compiler cannot handle the large static initializer which contains the
protected binary; clang-cl, however, handles it. Each label that you
define will be inserted into the vm::calls enum. The value for each enum entry is the
encrypted relative virtual address to the virtual instructions of the label.
namespace vm
get_hello = 0xbffd6fa5,
get_world = 0xbffd6f49,
};
//
// ...
//
template < calls e_call, class T, class... Ts > auto call( const Ts... args ) -> T
for ( auto idx = 0u; idx < sizeof( call_map ) / sizeof( _pair_t< u8, calls > );
if ( call_map[ idx ].second == e_call )
You can now call any label from your C++ code by simply specifying the vm::calls
enum entry and the label's return type as template parameters.
#include <iostream>
#include "test.hpp"
int main()
Output
VTIL - Getting Started
The VTIL project as it currently stands on GitHub has some undocumented requirements and
dependencies which are not submoduled. I have created a fork of VTIL which
submodules keystone and capstone, and which describes the Visual Studio
configurations that must be applied to a project which inherits VTIL. VTIL uses C++20
features such as the concept keyword, so the latest Visual Studio (2019) must
be used; Visual Studio 2017 is not supported. If you are compiling in a non-Windows or
non-Visual Studio environment you can ignore the last sentence.
Note: maybe this will become a branch in VTIL-Core, if so, you should refer to the
official VTIL-Core repository if/when that happens.
Another requirement to compile VTIL is that you must define the NOMINMAX macro prior
to any inclusion of Windows.h as std::numeric_limits has static member functions (max,
and min). These static member function names are treated as min/max macros and
thus cause compilation errors.
#define NOMINMAX
#include <Windows.h>
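The collision itself is easy to reproduce without Windows.h. The sketch below defines stand-in function-like macros named min and max, which is what Windows.h provides when NOMINMAX is absent, and shows the parenthesized call form that sidesteps them in your own code. This form does not help with VTIL's headers, which call the members directly, so defining NOMINMAX is still required there. The function names are hypothetical.

```cpp
#include <cassert>
#include <climits>
#include <limits>

// Stand-ins for the function-like macros Windows.h defines when NOMINMAX is
// absent. They are defined after the standard headers so those stay intact.
#define max( a, b ) ( ( ( a ) > ( b ) ) ? ( a ) : ( b ) )
#define min( a, b ) ( ( ( a ) < ( b ) ) ? ( a ) : ( b ) )

// std::numeric_limits< int >::max() would be expanded as a macro call here and
// fail to compile. Wrapping the name in parentheses stops the preprocessor from
// seeing "max(" and leaves the member function call alone.
inline int largest_int() { return ( std::numeric_limits< int >::max )(); }
inline int smallest_int() { return ( std::numeric_limits< int >::min )(); }

#undef max
#undef min
```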
The last requirement has to do with dynamic initializers causing stack overflows. In
order for your compiled executable containing VTIL to not crash instantly you must
increase the initial stack size. I set mine to 4MB as a precaution, as I have a large
number of dynamic initializers in VMProfiler.
VTIL - The Basic Block
// Creates a new block connected to this block at the given vip, if already explored re
// should still be called if the caller knowns it is explored since this function creat
//
//
fassert( is_complete() );
//
//
Once a basic block has been created, one can start appending VTIL instructions
documented at https://docs.vtil.org/ to the basic block object. For every defined VTIL
instruction a templated function is created using the “WRAP_LAZY” macro. You can
now “emplace_back” any VTIL instruction with ease in your virtual machine handler
lifters.
//
#define WRAP_LAZY(x) \
template<typename... Tx> \
{ \
return this; \
#undef WRAP_LAZY
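The pattern behind such a macro, one variadic member function generated per instruction name and each returning the block so that calls can be chained, can be shown with a much smaller stand-in. The types and the WRAP_SKETCH macro below are hypothetical and far simpler than VTIL's; operands are reduced to plain integers.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical, minimal instruction record: a name and integer operands.
struct instruction_t
{
    std::string name;
    std::vector< std::int64_t > operands;
};

struct basic_block_t
{
    std::vector< instruction_t > stream;

// Generates a member function named after the instruction. It appends the
// instruction with its operands and returns the block for chaining.
#define WRAP_SKETCH( x )                                                                \
    template < typename... Tx >                                                         \
    basic_block_t *x( Tx... operands )                                                  \
    {                                                                                   \
        stream.push_back( { #x, { static_cast< std::int64_t >( operands )... } } );     \
        return this;                                                                    \
    }
    WRAP_SKETCH( push )
    WRAP_SKETCH( pop )
    WRAP_SKETCH( add )
#undef WRAP_SKETCH
};
```

With this in place a lifter can write block.push( 0x1234 )->pop( 1 )->add( 1, 2 ) and get three instructions appended in order.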
VTIL - VMProfiler Lifting
Take the lifter for the LCONSTQ virtual machine handler as an example. The lifter simply adds
a VTIL push instruction which pushes a 64-bit value onto the stack. Note the usage of
vtil::operand to create a 64-bit immediate value operand.
vm::lifters::lifter_t lconstq = {
// push imm<N>
vm::handler::LCONSTQ,
} };
VMProfiler simply loops over all virtual instructions for a given block and applies
lifters. Once all code blocks are exhausted, vtil::optimizer::apply_all is called. This is the
climax of VTIL currently, as some of these optimization passes are targeted toward
stack machine based obfuscation. The purpose of submoduling VTIL in vmprofiler is
to gain these optimizations, as programming them myself would take months of research.
Compiler optimization is a field of its own: interesting, but not something I have the
time to pursue at the moment, so VTIL will suffice.
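The lifting loop can be pictured as a table lookup per virtual instruction. The sketch below is not VMProfiler's code: il_block_t is a hypothetical stand-in for a VTIL basic block that records instructions as text, and only the two constant loading handlers are given lifters, both of which simply push their immediate.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <map>
#include <string>
#include <vector>

enum class vm_mnemonic_t { LCONSTQ, LCONSTDW, JMP };

struct virt_instr_t
{
    vm_mnemonic_t mnemonic;
    std::uint64_t imm;
};

// Hypothetical stand-in for a VTIL basic block: instructions kept as text.
struct il_block_t
{
    std::vector< std::string > instrs;
};

using lifter_fn = std::function< void( il_block_t &, const virt_instr_t & ) >;

// One lifter per virtual machine handler, keyed by mnemonic.
inline const std::map< vm_mnemonic_t, lifter_fn > &lifters()
{
    static const std::map< vm_mnemonic_t, lifter_fn > table = {
        { vm_mnemonic_t::LCONSTQ,
          []( il_block_t &blk, const virt_instr_t &vinstr ) {
              blk.instrs.push_back( "push " + std::to_string( vinstr.imm ) );
          } },
        { vm_mnemonic_t::LCONSTDW,
          []( il_block_t &blk, const virt_instr_t &vinstr ) {
              blk.instrs.push_back( "push " + std::to_string( vinstr.imm & 0xffffffff ) );
          } } };
    return table;
}

// Applies the matching lifter to each virtual instruction and returns how many
// instructions had no lifter.
inline std::size_t lift_block( const std::vector< virt_instr_t > &vinstrs, il_block_t &out )
{
    std::size_t unlifted = 0;
    for ( const auto &vinstr : vinstrs )
    {
        const auto lifter = lifters().find( vinstr.mnemonic );
        if ( lifter == lifters().end() )
        {
            ++unlifted;
            continue;
        }
        lifter->second( out, vinstr );
    }
    return unlifted;
}
```

Returning the count of unlifted instructions is a choice made for this sketch; it gives the caller a cheap way to tell whether a block was fully translated before any optimization pass is run over it.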
Although I have done much work on VMProtect 2, the main success of my endeavors
has truly been statically uncovering all virtual branches and producing a legible IL,
and additionally doing all of this in a well documented, open source C++ library which can
be inherited further by other researchers. I would not consider the work I’ve done
anything close to a “finished product” or something that could be presented as such; it
is merely a step in the right direction for devirtualization. The last word of the last
sentence leads me to my next point.
elegant way of going about this that I am simply oblivious to at this time. Thus my
conclusion to devirtualization: it is not a job for a single person, thus the goal of my
project(s) has never been devirtualization, it’s always been an IL view of the virtual
instructions with VTIL providing deobfuscation pseudo code. The IL alone is enough
for a dedicated individual to begin research, the VTIL pseudo code makes it easier for
the rest of us. VMProfiler Qt combined with IDA Pro as it currently exists can be used
to analyze binaries protected with VMProtect 2. It may not be a beginner friendly
solution, but in my opinion, it will suffice.
I must note that it is not a far stretch of the mind to assume private entities have well
rounded solutions for VMProtect 2. I can imagine what a team of individuals, much
more skilled than myself, working on devirtualization day in and day out would
produce. On top of this, considering the length of time VMProtect 2 has been public,
there has been ample time for these private entities to create such tools.