Multi-stage Binary Code Obfuscation using Improved Virtual Machine
1 Introduction
X. Lai, J. Zhou, and H. Li (Eds.): ISC 2011, LNCS 7001, pp. 168–181, 2011.
© Springer-Verlag Berlin Heidelberg 2011
2 Related Work
Most existing obfuscation techniques on binary code fall into three categories:
– data transformation, such as name renaming and string encryption.
– instruction transformation, which replaces binary instructions using a library
of equivalent instructions.
– control flow transformation, which transforms the graph structure of pro-
gram control flow.
Data transformation does not alter program control flow. Even encrypted data
must be decrypted inside the program before use, and the decryption code is
itself exposed to reverse engineering. Therefore data obfuscation is usually
combined with other, more complex obfuscation techniques to increase security
[26,16,35].
Control flow transformation is relatively complicated [41,18,14,30,1]. Typi-
cally, control flow flattening puts all basic blocks into a single switch
statement that governs the whole control flow. It obscures the order in which
the computations are carried out, in order to resist static analysis. However,
constant propagation on the switch variable will expose the next block to be
executed. Besides, one large switch statement generates many jumps, which
decreases program performance. Opaque predicates are boolean expressions
whose values are known to the obfuscator but difficult for an adversary to
deduce. Junk code is usually inserted into the dead path of an opaque predicate.
However, for the same reason as above, there is still a risk that an adversary
may figure out the value of an opaque predicate by static analysis.
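As an illustration only, the following Python sketch flattens a small loop into a single dispatcher driven by a `state` variable, and guards a junk path with a simple opaque predicate. The functions `sum_odd`, `sum_odd_flat`, `opaque_true` and `guarded_increment` are hypothetical examples, not code from the cited works, and Python stands in for binary code for readability. Note that every `state = ...` assignment is a constant, which is exactly what constant propagation exploits to recover the successor of each block.

```python
# Original, structured version: sum of the odd numbers below n.
def sum_odd(n):
    total = 0
    i = 0
    while i < n:
        if i % 2 == 1:
            total += i
        i += 1
    return total


# Flattened version: each basic block becomes one case of a dispatcher
# loop, and the state variable selects which block runs next.
def sum_odd_flat(n):
    state = 0
    while True:
        if state == 0:          # entry block: initialize locals
            total, i = 0, 0
            state = 1
        elif state == 1:        # loop header: test i < n
            state = 2 if i < n else 4
        elif state == 2:        # loop body: conditional add
            if i % 2 == 1:
                total += i
            state = 3
        elif state == 3:        # loop latch: increment, back to header
            i += 1
            state = 1
        else:                   # state == 4: exit block
            return total


# Opaque predicate: x * (x + 1) is a product of two consecutive integers,
# so it is always even; the obfuscator knows the else-path is dead.
def opaque_true(x):
    return (x * (x + 1)) % 2 == 0


def guarded_increment(x):
    if opaque_true(x):
        return x + 1
    else:
        return x - 12345        # junk code on the never-taken path
```

An analyzer that can prove `opaque_true` always holds, for instance by reasoning about parity, removes the junk branch just as easily as constant propagation un-flattens `sum_odd_flat`.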
Instruction transformation refers to the replacement of a protected binary
instruction with a functionally equivalent block of instructions [20,19,23,29,32].
The blocks representing native instructions are written as byte-codes
into the program. These byte-codes are often maintained by a virtual machine
integrated with the obfuscated program. In practice, instruction transformation
works well against static analysis, but not against runtime disassembly. However,
little theoretical work has been carried out to establish guarantees on its
security and performance for obfuscated software.
Virtual machine (VM) based obfuscation has recently become popular for software
protection, and it is probably the most sophisticated technique in the literature
[36,34,32]. It usually integrates several obfuscation techniques, including data
permutation, instruction substitution, and control flow transformation. As a result,
VM obfuscation is fairly good against dynamic analysis in practice [40,37,31].
We observe how VM obfuscators commonly work, and summarize the general code
structure of a program before and after obfuscation as shown in Figure 1.
Generally speaking, a VM section is appended to the original program, and the
protected binary code is transformed into byte-code, which is interpreted by a
VM core. Finally, the entry point of the program is redirected into the VM code.
To perform byte-code fetching, the VM core still needs to save all registers and
flags in its own context, and to restore them upon exiting byte-code interpretation.
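The structure in Figure 1 can be sketched as a tiny stack-based interpreter. Everything here is hypothetical: the opcode names and numbering, the `(opcode, operand)` encoding, `ADD_EAX_5`, and the register dictionary are our own assumptions, and flags are omitted. The point is the shape of the loop: save the native context, fetch a byte-code, dispatch through a handler table indexed by the opcode, and restore the context on exit.

```python
# Hypothetical opcode numbering; a real VM obfuscator picks its own.
PUSH_REG, PUSH_IMM, ADD, POP_REG, EXIT = range(5)
MASK32 = 0xFFFFFFFF  # emulate 32-bit registers


def h_push_reg(ctx, stack, arg):
    stack.append(ctx[arg])


def h_push_imm(ctx, stack, arg):
    stack.append(arg)


def h_add(ctx, stack, arg):
    b, a = stack.pop(), stack.pop()
    stack.append((a + b) & MASK32)


def h_pop_reg(ctx, stack, arg):
    ctx[arg] = stack.pop()


# Handler table indexed by opcode, playing the role of a jump table.
HANDLERS = [h_push_reg, h_push_imm, h_add, h_pop_reg]

# Hypothetical byte-code a per-instruction VM might emit for `add eax, 5`.
ADD_EAX_5 = [(PUSH_REG, "eax"), (PUSH_IMM, 5), (ADD, None),
             (POP_REG, "eax"), (EXIT, None)]


def vm_run(native_regs, bytecode):
    ctx = dict(native_regs)  # save native registers into the VM context
    stack = []               # the VM's own operand stack
    pc = 0
    while True:
        op, arg = bytecode[pc]  # fetch
        pc += 1
        if op == EXIT:
            break
        HANDLERS[op](ctx, stack, arg)  # decode and dispatch
    native_regs.update(ctx)  # restore the context when leaving the VM
    return native_regs
```

For example, `vm_run({"eax": 10, "ebx": 1}, ADD_EAX_5)` yields `{"eax": 15, "ebx": 1}`. A single native instruction here costs four handler dispatches, which illustrates why per-instruction interpretation is slow.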
Classical VM obfuscators suffer from two drawbacks. Firstly, the obfuscated
software they generate runs much slower than the original, largely because of
their byte-code interpretation style [37,40]. Secondly, the security of a
VM-obfuscated program relies merely on a generic, uncustomized VM core integrated
with the program, rather than on anything specific to each individual program.
The VM never restores byte-codes to the original instructions. Therefore a
successful attack on an obfuscated program requires two steps: understanding the
VM code, and decoding the mapping between binary instructions and byte-codes. One
round of VM obfuscation outputs a relatively intelligible mapping, which allows an
adversary to perform instruction-level analysis, and further to reconstruct the
structure of the original software [34,32].
Existing work is promising in certain situations. However, the threat of software
cracking keeps changing and growing [38,24]. We therefore propose a new approach
to software obfuscation in the next section, introducing a more lightweight
obfuscator that generates code that is harder to understand.
3 Our Approach
In this section we first introduce the concept of black box security, then present
a new design of a block-to-byte virtual machine, and finally describe a framework
for multi-stage code obfuscation based on the improved virtual machine.
A program obfuscator is often regarded as a transformer of computer programs that
outputs a new program with the same functionality but an unreadable code structure
[28,10]. More precisely, a program obfuscator O is theoretically defined as a
probabilistic Turing machine or Boolean circuit that satisfies three requirements [3]:
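For reference, the three requirements can be paraphrased as follows in the circuit formulation of Barak et al. [3]; this is our own rendering of their definition, and the symbols (A for the adversary, S for the simulator, α for the negligible bound) are notation we introduce here. Their Turing machine version additionally bounds the running time of O(P) polynomially.

```latex
\begin{align*}
&\text{(functionality)} && \forall C:\ \mathcal{O}(C) \text{ computes the same function as } C,\\
&\text{(polynomial slowdown)} && \exists\, p \text{ polynomial}\ \forall C:\ |\mathcal{O}(C)| \le p(|C|),\\
&\text{(virtual black box)} && \forall\, \text{PPT } A\ \exists\, \text{PPT } S,\ \alpha \text{ negligible},\ \forall C:\\
& && \bigl|\Pr[A(\mathcal{O}(C)) = 1] - \Pr[S^{C}(1^{|C|}) = 1]\bigr| \le \alpha(|C|).
\end{align*}
```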
Although Barak et al. [3] further proved that such a universal black box obfuscator
does not exist, the theoretical concept is still useful for evaluating the performance
of code obfuscators. In practice, a good obfuscator should satisfy three properties
as closely as possible: functional equivalence, code efficiency, and black box
security. In light of these requirements, we present our customized VM obfuscator
below.
Figure 2 shows the formats of binary instructions and VM byte-codes, respectively,
and gives an example of how a binary instruction is transformed into byte-code,
together with an implementation.
The VM dispatcher works in a stack-based style: it saves the registers of the native
code and creates its own VM stack. The return value of the last execution of each
byte-code is saved in VM registers (var RegEip and var RegDI in Figure 3) for the
next byte-code execution. The VM dispatcher then obtains the target address by
looking up a jump table, using the byte-code as index. The target address is the
location to which the current instruction transfers control. The VM obfuscator
retrieves all target addresses of the
original program in four different ways: for a direct jump, the target address is
specified in the original instruction; for a conditional jump, there are two target
addresses selected by a predicate; for a call instruction, one target address is
the called function and another is the return address; and for a return
instruction, the target address is stored on the stack.
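The following sketch shows a block-level dispatcher under the same caveats: each byte-code indexes a jump table whose entry stands for a whole translated basic block, and each block ends with a terminator that the dispatcher resolves in one of the four ways listed above. The tuple encoding of terminators, the block numbering, the toy program, and the use of Python functions as block handlers are our own assumptions, not the byte-code format of Figure 2 or the code of Figure 3.

```python
# Hypothetical toy program: eax = 0; ecx = 3; loop { call add_two; ecx -= 1 }.
def b0(ctx):                     # entry: eax = 0; ecx = 3
    ctx["eax"], ctx["ecx"] = 0, 3
    return ("jmp", 1)            # direct jump: target encoded in the block


def b1(ctx):                     # loop header
    return ("jcc", ctx["ecx"] != 0, 2, 4)  # predicate plus two targets


def b2(ctx):                     # call add_two, then continue at block 3
    return ("call", 5, 3)        # callee target plus return target


def b3(ctx):                     # ecx -= 1; back to the header
    ctx["ecx"] -= 1
    return ("jmp", 1)


def b4(ctx):                     # program exit
    return ("exit",)


def b5(ctx):                     # add_two: eax += 2; ret
    ctx["eax"] += 2
    return ("ret",)              # target taken from the VM stack


# Jump table: byte-code value -> handler for one whole basic block.
JUMP_TABLE = [b0, b1, b2, b3, b4, b5]


def dispatch(ctx, entry=0):
    stack = []                   # VM stack holding return targets
    bc = entry
    dispatches = 0
    while True:
        dispatches += 1
        term = JUMP_TABLE[bc](ctx)
        kind = term[0]
        if kind == "jmp":
            bc = term[1]
        elif kind == "jcc":
            _, cond, taken, fallthrough = term
            bc = taken if cond else fallthrough
        elif kind == "call":
            _, callee, ret_to = term
            stack.append(ret_to)
            bc = callee
        elif kind == "ret":
            bc = stack.pop()
        else:                    # "exit"
            return ctx, dispatches
```

Running `dispatch({})` returns `({"eax": 6, "ecx": 0}, 15)`: 15 dispatches, one per executed block rather than one per native instruction.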
$$K_i = f(P_i), \qquad P_{i+1} = \mathrm{Obf}(P_i, K_i).$$
The function f maps any program to a key, represented as a binary string, and must
satisfy two conditions: f must be one-way (hard to invert), and the output key must
characterize the program. Examples of such functions include the MD5 hash of the
program, where the program is fed in as data, or the number of nodes in the
program's control flow graph.
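The key chain above can be sketched as follows, taking f to be MD5 over the program bytes, one of the examples just given. The stage transform `obf` is a hypothetical stand-in (a key-seeded re-encoding of every byte, not the authors' Obf), and `multi_stage` and `decode` are illustrative helpers. Together they show that each stage is keyed by the previous stage's output and that the stages can be undone in reverse order.

```python
import hashlib
import random


def f(program: bytes) -> bytes:
    # K_i = f(P_i): MD5 digest of the program, fed in as plain data.
    return hashlib.md5(program).digest()


def obf(program: bytes, key: bytes):
    # Hypothetical stage transform: re-encode every byte through a
    # key-seeded permutation of 0..255. Also return the inverse table
    # that a stage-specific VM would need in order to decode it.
    perm = list(range(256))
    random.Random(key).shuffle(perm)
    inverse = [0] * 256
    for plain, coded in enumerate(perm):
        inverse[coded] = plain
    return bytes(perm[b] for b in program), inverse


def multi_stage(p0: bytes, n: int):
    # Apply n stages: P_{i+1} = Obf(P_i, K_i) with K_i = f(P_i).
    programs, keys, tables = [p0], [], []
    for _ in range(n):
        k = f(programs[-1])
        p_next, inv = obf(programs[-1], k)
        keys.append(k)
        tables.append(inv)
        programs.append(p_next)
    return programs, keys, tables


def decode(pn: bytes, tables):
    # Undo the stages in reverse order to recover P_0.
    p = pn
    for inv in reversed(tables):
        p = bytes(inv[b] for b in p)
    return p
```

Because each key depends on the previous stage's output, changing any earlier stage changes every later key, so the n variants cannot be decoded independently.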
4 Security Analysis
This section analyzes the security of multi-stage obfuscated programs in two
aspects: code efficiency and black box security. Specifically, we strengthen black
box security by introducing code polymorphism during multi-stage obfuscation, and
improve code efficiency by removing unnecessary jump instructions during
block-to-byte VM obfuscation.
$$
\begin{aligned}
C(k+1) &= W \cdot C(k)^{L} \\
       &= W \cdot \left(W^{L^{k-1}+\cdots+L+1}\right)^{L} \\
       &= W \cdot W^{L^{k}+\cdots+L^{2}+L} \\
       &= W^{L^{k}+\cdots+L^{2}+L+1},
\end{aligned}
$$
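The closed form can be checked numerically. This sketch assumes the base case C(1) = W, which matches the exponent pattern above but is not shown in this excerpt, and treats W and L as positive integers; `cost_recurrence` and `cost_closed_form` are illustrative names. It compares the recurrence with W raised to the geometric sum 1 + L + ... + L^(k-1).

```python
def cost_recurrence(W: int, L: int, k: int) -> int:
    # Assumed base case C(1) = W; then C(j+1) = W * C(j)**L.
    c = W
    for _ in range(k - 1):
        c = W * c ** L
    return c


def cost_closed_form(W: int, L: int, k: int) -> int:
    # C(k) = W^(L^(k-1) + ... + L + 1); the exponent is a geometric sum.
    exponent = sum(L ** j for j in range(k))
    return W ** exponent
```

For example, `cost_recurrence(2, 2, 3)` and `cost_closed_form(2, 2, 3)` both give 2^7 = 128. For L ≥ 2 the exponent equals (L^k − 1)/(L − 1), so it grows exponentially in the number of stages k.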
5 Experiments
The experiments on our multi-stage VM obfuscation module were carried out on a
Windows XP platform with a 2.4 GHz CPU and 1 GB of RAM. A demo of obfuscation output is given
code than the classical VM obfuscator in one stage. However, under multi-stage
obfuscation, the execution time of the obfuscated program increases quickly because
of the more complicated obfuscation.
6 Conclusion
We have presented a new method to obfuscate code in multiple stages to protect
software from reverse engineering. The key idea is to implement a block-to-byte
virtual machine to interpret byte-codes, while iteratively modifying the program
structure. Block obfuscation hides binary details in byte-codes while improving
program execution efficiency; multi-stage obfuscation hides the program's control
flow at a more complicated level by using a polymorphism tree. In effect, an
adversary would have to decode all n variants of the program to obtain the
structure of the original program. Meanwhile, compared with classical byte-code
virtual machine obfuscation, block obfuscation makes the program run more
efficiently by removing unnecessary jump instructions.
References
1. Abadi, M., Plotkin, G.: On protection by layout randomization. In: 23rd IEEE
Computer Security Foundations Symposium, pp. 337–351 (2010)
2. Anckaert, B., Madou, M., De Sutter, B., De Bus, B., De Bosschere, K., Preneel,
B.: Program obfuscation: a quantitative approach. In: ACM Workshop on Quality
of Protection, pp. 15–20 (2007)
3. Barak, B., Goldreich, O., Impagliazzo, R., Rudich, S., Sahai, A., Vadhan, S., Yang,
K.: On the (Im)possibility of obfuscating programs. In: Kilian, J. (ed.) CRYPTO
2001. LNCS, vol. 2139, pp. 1–18. Springer, Heidelberg (2001)
4. Beaucamps, P., Filiol, E.: On the possibility of practically obfuscating programs
towards a unified perspective of code protection. Journal in Computer Virology 3,
3–21 (2007)
5. Bitansky, N., Canetti, R.: On Strong Simulation and Composable Point Obfusca-
tion. In: Rabin, T. (ed.) CRYPTO 2010. LNCS, vol. 6223, pp. 520–537. Springer,
Heidelberg (2010)
6. Canetti, R., Dakdouk, R.R.: Obfuscating Point Functions with Multibit Output.
In: Smart, N.P. (ed.) EUROCRYPT 2008. LNCS, vol. 4965, pp. 489–508. Springer,
Heidelberg (2008)
7. Canetti, R., Tauman Kalai, Y., Varia, M., Wichs, D.: On Symmetric Encryption
and Point Obfuscation. In: Micciancio, D. (ed.) TCC 2010. LNCS, vol. 5978, pp.
52–71. Springer, Heidelberg (2010)
8. Cappaert, J., Preneel, B., Anckaert, B., Madou, M., De Bosschere, K.: Towards
tamper resistant code encryption: Practice and experience. In: Chen, L., Mu, Y.,
Susilo, W. (eds.) ISPEC 2008. LNCS, vol. 4991, pp. 86–100. Springer, Heidelberg
(2008)
9. Ceccato, M., Di Penta, M., Nagra, J., Falcarin, P., Ricca, F., Torchiano, M.,
Tonella, P.: The effectiveness of source code obfuscation: an experimental
assessment. In: The 17th IEEE International Conference on Program Comprehension
(ICPC), pp. 178–187. IEEE Computer Society, Los Alamitos (2009)
10. Collberg, C.: Tutorial: code transformation techniques for software protection. In:
ACM SIGPLAN 2009 Conference on Programming Language Design and Imple-
mentation, PLDI 2009 (2009)
11. Collberg, C., Thomborson, C.: Watermarking, tamper-proofing, and obfuscation
- tools for software protection. IEEE Transactions on Software Engineering 28,
735–746 (2002)
12. Collberg, C., Thomborson, C., Low, D.: A taxonomy of obfuscating transforma-
tions. Technical report (1997)
13. DataRescue: The IDA Pro disassembler and debugger (2005),
http://www.hex-rays.com/idapro/
14. Ge, J.: Control flow based obfuscation. In: Proceedings of the 5th ACM Workshop
on Digital Rights Management (DRM), pp. 83–92. ACM Press, New York (2005)
15. Goldwasser, S., Kalai, Y.T.: On the impossibility of obfuscation with auxiliary
input. In: 46th Annual IEEE Symposium on Foundations of Computer Science (FOCS
2005), pp. 553–562. IEEE Computer Society, Los Alamitos (2005)
16. Hohenberger, S., Rothblum, G.N., Shelat, A., Vaikuntanathan, V.: Securely Ob-
fuscating Re-encryption. In: Vadhan, S.P. (ed.) TCC 2007. LNCS, vol. 4392, pp.
233–252. Springer, Heidelberg (2007)
17. Hohenberger, S., Waters, B.: Constructing Verifiable Random Functions with Large
Input Spaces. In: Gilbert, H. (ed.) EUROCRYPT 2010. LNCS, vol. 6110, pp. 656–
672. Springer, Heidelberg (2010)
18. Jhala, R., Majumdar, R.: Path slicing. In: Proceedings of ACM SIGPLAN Con-
ference on Programming Language Design and Implementation, PLDI 2005, pp.
38–47. ACM, New York (2005)
19. Kanzaki, Y., Monden, A., Nakamura, M.: A software protection method based
on instruction camouflage. IEICE Transactions on Fundamentals of Electronics,
Communications and Computer Sciences (Japanese Edition) J87-A(6):755-767, 47–
59 (2004)
20. Linn, C., Debray, S.: Obfuscation of executable code to improve resistance to
static disassembly. In: ACM Conference on Computer and Communications Se-
curity (CCS), pp. 290–299. ACM Press, New York (2003)
21. Lynn, B., Prabhakaran, M., Sahai, A.: Positive Results and Techniques for Obfusca-
tion. In: Cachin, C., Camenisch, J.L. (eds.) EUROCRYPT 2004. LNCS, vol. 3027,
pp. 20–39. Springer, Heidelberg (2004)
22. Madou, M., Anckaert, B., De Bus, B., De Bosschere, K.: On the effectiveness of
source code transformations for binary obfuscation. In: Proc. of the Int’l Conf. on
Software Engineering Research and Practice (SERP 2006), pp. 527–533 (2006)
23. Madou, M., Anckaert, B., Moseley, P., Debray, S.K., De Sutter, B., De Bosschere,
K.: Software protection through dynamic code mutation. In: Song, J.-S., Kwon, T.,
Yung, M. (eds.) WISA 2005. LNCS, vol. 3786, pp. 194–206. Springer, Heidelberg
(2006)
24. Madou, M., Van Put, L., De Bosschere, K.: Understanding obfuscated code. In:
14th IEEE Int’l Conf. on Program Comprehension (ICPC), pp. 268–274 (2006)
25. Ernst, M.D.: Static and dynamic analysis: synergy and duality. In:
WODA 2003: ICSE Workshop on Dynamic Analysis, pp. 24–27 (2003)
26. Monden, A., Monsifrot, A., Thomborson, C.: Security improvements for encrypted
interpretation. In: Proc. 3rd Workshop on Application Specific Processors (WASP)
Digest, pp. 19–26 (2004)
27. Naeem, N.A., Batchelder, M., Hendren, L.: Metrics for measuring the effectiveness
of decompilers and obfuscators. In: 15th IEEE Int'l. Conf. on Program
Comprehension, pp. 253–258 (2007)
28. Ogiso, T., Sakabe, Y., Soshi, M., Miyaji, A.: Software obfuscation on a theoretical
basis and its implementation. IEICE Transactions on Fundamentals of Electronics,
Communications and Computer Sciences E86-A(1), 176–186 (2003)
29. Popov, I.V., Debray, S.K., Andrews, G.R.: Binary obfuscation using signals. In:
USENIX Security Symposium (2007)
30. Dalla Preda, M., Madou, M., De Bosschere, K., Giacobazzi, R.: Opaque Predicates
Detection by Abstract Interpretation. In: Johnson, M., Vene, V. (eds.) AMAST
2006. LNCS, vol. 4019, pp. 81–95. Springer, Heidelberg (2006)
31. Rolles, R.: X86 virtualizer (2008), http://rewolf.pl/
32. Rolles, R.: Unpacking virtualization obfuscators. In: Proceedings of the 3rd
USENIX Conference on Offensive Technologies, WOOT 2009, p. 1. USENIX As-
sociation (2009)
33. Schwarz, B., Debray, S.K., Andrews, G.R.: Disassembly of executable code revis-
ited. In: 10th Working Conference on Reverse Engineering, pp. 45–54 (2002)
34. Sharif, M., Lanzi, A., Giffin, J., Lee, W.: Automatic reverse engineering of malware
emulators. In: Proceedings of the 30th IEEE Symposium on Security and Privacy,
pp. 94–109. IEEE Computer Society, Los Alamitos (2009)
35. Sivadasan, P., Sojan Lal, P.: Jconsthide: a framework for java source code constant
hiding. CoRR (2009)
36. Smith, J.E., Nair, R.: Virtual machines: versatile platforms for systems and pro-
cesses. Morgan Kaufmann, San Francisco (2005)
37. Oreans Technologies: Code Virtualizer, http://oreans.com/codevirtualizer.php
38. Udupa, S.K., Debray, S.K., Madou, M.: Deobfuscation: reverse engineering obfus-
cated code. In: 12th Working Conference on Reverse Engineering, pp. 45–54 (2005)
39. van Oorschot, P.C.: Revisiting Software Protection. In: Boyd, C., Mao, W. (eds.)
ISC 2003. LNCS, vol. 2851, pp. 1–13. Springer, Heidelberg (2003)
40. VMPsoft: VMProtect software, http://www.vmprotect.ru/
41. Wang, C., Hill, J., Knight, J.C., Davidson, J.W.: Protection of software-based
survivability mechanism. In: Proceedings of the International Conference on De-
pendable Systems and Networks (formerly: FTCS), DSN 2001, pp. 193–202. IEEE
Computer Society, Los Alamitos (2001)
42. Wee, H.: On obfuscating point functions. In: Proceedings of the 37th Annual ACM
Symposium on Theory of Computing, STOC 2005, pp. 523–532. ACM, New York
(2005)