This paper describes how code protection is implemented with “virtual machines” and the techniques used in popular virtual machines, giving readers from beginners to professionals a solid understanding of how such virtual machines work.
by Nooby

Why Virtual Machines?

In the early years of software protection, techniques such as obfuscation and mutation were developed. These methods insert junk code into the original code flow, replace the original instructions with synonyms, replace constants with calculations, insert conditional and unconditional branches, and write random bytes into the gaps between executable code (making sequential disassembly fail), among other tricks. An example of such processing is included as example_obfuscated.exe.

Over time, the level of complexity introduced by these methods became insufficient, owing to the development of debugging tools (many now feature run-time tracing) and the rising skill of the average reverser. A reverse engineer of average skill can manually or semi-automatically clean up the code to make it readable and alterable. Heavier obfuscation increases the size of the protected code dramatically but brings little increase in complexity.

People began to seek a method of code protection that resists such brute force. By breaking instructions down into an execution loop with a set of reusable micro-operations, the length of the code being executed can be increased exponentially without growing much in size. Such loop code acts like an “emulator” of the original code (the interpreter): it takes in a flow of data (pseudo-code, or P-Code) and performs micro-operations (handlers), much like a “virtual machine” executing its own instruction set. This gave rise to the term code virtualization.

How Does a Virtual Machine Work?
We know that a real processor has registers, instruction decoders and execution logic. Virtual machines are much the same. The virtual machine entry code collects context information from the real processor and stores it in its own context, then the execution loop reads P-Code and dispatches to the corresponding handler. When the virtual machine exits, it updates the real processor registers from its stored context.

As a quick example, consider a function executed via a pseudo virtual machine.

Original instructions:

    add eax, ebx
    retn

After being transformed into virtualized code:

    push address_of_pcode
    jmp VMEntry

    VMEntry:
        push all register values
        jmp VMLoop

    VMLoop:
        fetch p-code from VMEIP
        dispatch to handler

    VMInit:
        pop all register values into VMContext
        pop address_of_pcode into VMEIP
        jmp VMLoop

    Add_EAX_EBX_Handler:
        do "add eax, ebx" on VMContext
        jmp VMLoop

    VMRetn:
        restore register values from VMContext
        do "retn"

Note that a virtual machine does not, and need not, emulate all x86 instructions. Some can be executed as-is by the real processor: the virtual machine exits at a certain point, points EIP to the raw instruction, and then re-enters the VM.

Actual virtual machine handlers are usually designed to be more generic than the example handler above. Usually the P-Code also determines the operands. The “Add_EAX_EBX_Handler” can be defined as an “Add_Handler” which takes 2 parameters and produces a result. There will also be load/store register handlers which produce parameters and save results. This makes the handlers more reusable, so tracing through them without understanding the virtual machine architecture does not yield a good understanding of the original code.

Now let us see how this works on a stack-based virtual machine:

    Add_Handler:
        pop REG                    ; REG = parameter2
        add [STACK], REG           ; [STACK] points to parameter1

    GetREG_Handler:
        fetch P-Code for operand
        push VMCONTEXT[operand]    ; push value of REG on stack

    SetREG_Handler:
        fetch P-Code for operand
        pop VMCONTEXT[operand]     ; pop value of REG from stack
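The three stack-based handlers above can be sketched as a small interpreter. This is only an illustration under stated assumptions: the class name StackVM, the flat-list P-Code encoding (handler name followed by its operand) and the dictionary used as VMCONTEXT are all hypothetical and do not reflect how any real protector encodes its P-Code.

```python
MASK32 = 0xFFFFFFFF  # registers are treated as 32-bit values


class StackVM:
    """Toy stack-based VM; the P-Code encoding used here is an assumption."""

    def __init__(self, pcode, registers):
        self.pcode = pcode
        self.vmeip = 0                  # index of the next P-Code item (VMEIP)
        self.context = dict(registers)  # VMCONTEXT: private copy of the registers
        self.stack = []                 # operand stack shared by all handlers
        self.running = True
        self.handlers = {
            "Init": self.init,
            "GetREG": self.get_reg,
            "SetREG": self.set_reg,
            "Add": self.add,
            "Retn": self.retn,
        }

    def fetch(self):
        item = self.pcode[self.vmeip]
        self.vmeip += 1
        return item

    def init(self):
        pass  # the context was already captured in __init__

    def get_reg(self):
        self.stack.append(self.context[self.fetch()])  # push value of REG

    def set_reg(self):
        self.context[self.fetch()] = self.stack.pop()  # pop value into REG

    def add(self):
        reg = self.stack.pop()                              # REG = parameter2
        self.stack[-1] = (self.stack[-1] + reg) & MASK32    # top of stack is parameter1

    def retn(self):
        self.running = False  # leave the interpreter loop

    def run(self):
        while self.running:
            self.handlers[self.fetch()]()  # VMLoop: fetch, then dispatch
        return self.context


# P-Code for "add eax, ebx; retn" in the assumed encoding.
PCODE = ["Init", "GetREG", "EBX", "GetREG", "EAX", "Add", "SetREG", "EAX", "Retn"]

print(StackVM(PCODE, {"EAX": 5, "EBX": 7}).run())  # {'EAX': 12, 'EBX': 7}
```

Note that none of the handlers mentions EAX or EBX: the operands come entirely from the P-Code, which is what makes the handlers reusable.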
For the example function (add eax, ebx; retn), the P-Code will be:

    Init
    GetREG EBX
    GetREG EAX
    Add
    SetREG EAX
    Retn

What Modern Virtual Machines Do Against Reverse Engineering?

Code obfuscation and mutation are important to code virtualization: because the virtual machine interpreters are directly exposed, these techniques help protect the virtual machine against automated analysis tools. Heavily obfuscated virtual machine handlers can take a while to de-obfuscate without knowledge of the underlying virtual machine architecture. Since some of the processor registers are not used (the VMContext is stored separately, and the virtual machine interpreter can be designed to use only a few registers), they can be used for extra obfuscation. Virtual machine handlers can be designed to have as little operand/context dependency as possible. Furthermore, since the real stack pointer can be tracked within the VMContext, the stack can be filled with junk during the interpreter loop. With these properties, code obfuscation and mutation can be very effective. An example of such an obfuscated virtual machine can be found in example_virtualized.exe.

Now that we know how the execution part of a virtual machine is protected, let's look at the techniques used while transforming instructions into P-Code, which is where the awesomeness of code virtualization lies.

Instruction Decomposition

Logical Instructions

To increase complexity and reusability, logical operations can be broken down into operations such as NAND/NOR, according to the following:

    NOT(X)    = NAND(X, X) = NOR(X, X)
    AND(X, Y) = NOT(NAND(X, Y)) = NOR(NOT(X), NOT(Y))
    OR(X, Y)  = NAND(NOT(X), NOT(Y)) = NOT(NOR(X, Y))
    XOR(X, Y) = NAND(NAND(NOT(X), Y), NAND(X, NOT(Y))) = NOR(AND(X, Y), NOR(X, Y))

Arithmetic Instructions

Subtraction can be substituted by addition, with the overhead of an EFLAGS calculation:

    SUB(X, Y) = NOT(ADD(NOT(X), Y))

Taking the EFLAGS before the final NOT as A and the EFLAGS after the final NOT as B, the calculation is as follows:

    EFLAGS = OR(AND(A, 0x815), AND(B, NOT(0x815)))    ; 0x815 masks OF, AF, PF and CF
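The identities above can be checked numerically. The sketch below rebuilds NOT, AND, OR and XOR from NAND on 32-bit values, then performs SUB as NOT(ADD(NOT(X), Y)) and merges the two flag sets with the 0x815 mask. All function names are hypothetical. One assumption is made loudly: the final NOT is taken to compute SF, ZF and PF the way a logical instruction would (as a NAND/NOR-based handler might), whereas the native x86 NOT instruction leaves EFLAGS untouched.

```python
import random

MASK32 = 0xFFFFFFFF
CF, PF, AF, ZF, SF, OF = 0x001, 0x004, 0x010, 0x040, 0x080, 0x800  # EFLAGS bits
ARITH_MASK = 0x815  # OF | AF | PF | CF, the mask used in the text


def nand(x, y): return ~(x & y) & MASK32
def nor(x, y): return ~(x | y) & MASK32

# Logical operations rebuilt from NAND only, following the identities above.
def not_(x): return nand(x, x)
def and_(x, y): return not_(nand(x, y))
def or_(x, y): return nand(not_(x), not_(y))
def xor_(x, y): return nand(nand(not_(x), y), nand(x, not_(y)))


def result_flags(r):
    """SF, ZF and PF derived from a 32-bit result."""
    flags = SF if r & 0x80000000 else 0
    flags |= ZF if r == 0 else 0
    flags |= PF if bin(r & 0xFF).count("1") % 2 == 0 else 0
    return flags


def add_with_flags(x, y):
    r = (x + y) & MASK32
    flags = result_flags(r)
    flags |= CF if x + y > MASK32 else 0
    flags |= AF if (x & 0xF) + (y & 0xF) > 0xF else 0
    flags |= OF if ~(x ^ y) & (x ^ r) & 0x80000000 else 0
    return r, flags


def sub_reference(x, y):
    """Result and flags of a plain x86 SUB, used only as a cross-check."""
    r = (x - y) & MASK32
    flags = result_flags(r)
    flags |= CF if x < y else 0
    flags |= AF if (x & 0xF) < (y & 0xF) else 0
    flags |= OF if (x ^ y) & (x ^ r) & 0x80000000 else 0
    return r, flags


def sub_virtualized(x, y):
    s, a = add_with_flags(not_(x), y)  # A: EFLAGS before the final NOT
    r = not_(s)
    b = result_flags(r)                # B: EFLAGS after the final NOT (assumed logical-style)
    return r, or_(and_(a, ARITH_MASK), and_(b, not_(ARITH_MASK)))


random.seed(0)
for _ in range(1000):
    x, y = random.getrandbits(32), random.getrandbits(32)
    assert xor_(x, y) == x ^ y == nor(x & y, nor(x, y))
    assert sub_virtualized(x, y) == sub_reference(x, y)
```

The arithmetic flags (OF, AF, PF, CF) are taken from the ADD because they describe the carry and overflow behaviour of the subtraction, while SF and ZF must come from the final result, which only exists after the NOT.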
Register Abstraction

Since a virtual machine can have more registers than an actual x86 processor, real processor registers can be dynamically mapped to virtual machine registers. The extra registers can be used to store intermediate values or simply to cause confusion. This also allows further obfuscation and optimization across instructions, as described below.

Context Rotation

Due to register abstraction, different pieces of P-Code can have different register mappings, and this correspondence can be designed to change from time to time, making reverse engineering more difficult. The virtual machine only swaps the values in its context when the next piece of P-Code has different register mappings. When transforming instructions like XCHG, the generator can simply change the register mappings and produce no P-Code at all. See the following example.

Current Register Mappings:

    Real Registers    Virtual Registers
    EAX               R0
    EBX               R1
    ECX               R2

Original Instructions:

    xchg ebx, ecx
    add eax, ecx

P-Code Without Context Rotation:

    GetREG R2    ; R2 = ECX
    GetREG R1    ; R1 = EBX
    SetREG R2    ; ECX = value of EBX
    SetREG R1    ; EBX = value of ECX
    GetREG R2
    GetREG R0    ; R0 = EAX
    Add
    SetREG R0

Before Exchange:

    Real Registers    Virtual Registers
    EAX               R0
    EBX               R1
    ECX               R2

After Exchange:

    Real Registers    Virtual Registers
    EAX               R0
    EBX               R2
    ECX               R1

P-Code With Context Rotation (exchange done during P-Code generation):

    [Map R1 = ECX, R2 = EBX]    ; exchange
    GetREG R1                   ; R1 = ECX
    GetREG R0                   ; R0 = EAX
    Add
    SetREG R0                   ; R0 = EAX

Such rotation can also be applied to the last SetREG operation, so that the result of the addition is written to another, unused virtual machine register (e.g. R3), leaving R0 with useless data. The following piece of P-Code operates on 3 virtual machine registers, which makes it hard for reverse engineers to find its x86 equivalent.

P-Code With Context Rotation 2:

    [Map R1 = ECX, R2 = EBX]         ; exchange
    GetREG R1                        ; R1 = ECX
    GetREG R0                        ; R0 = EAX
    Add
    [Map R0 = Unused, R3 = EAX]      ; rotation
    SetREG R3                        ; R3 = EAX
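A minimal sketch of how a P-Code generator could apply context rotation is shown below. The function name generate_pcode, the tuple encoding of instructions and P-Code, and the policy of always rotating the destination onto the first unused virtual register are assumptions made for illustration only. The sketch also assumes that at least one unused virtual register is available.

```python
def generate_pcode(instructions, mapping, unused):
    """Translate a tiny instruction list to P-Code using context rotation.

    mapping: real register -> virtual register
    unused:  virtual registers that currently hold no real register
    """
    mapping = dict(mapping)
    unused = list(unused)
    pcode = []
    for op, dst, src in instructions:
        if op == "xchg":
            # Exchange is done at generation time: swap the mappings, emit nothing.
            mapping[dst], mapping[src] = mapping[src], mapping[dst]
        elif op == "add":
            pcode.append(("GetREG", mapping[src]))
            pcode.append(("GetREG", mapping[dst]))
            pcode.append(("Add",))
            # Rotation: the result goes to an unused virtual register, and the
            # old one is left holding stale data for later reuse.
            old = mapping[dst]
            mapping[dst] = unused.pop(0)
            unused.append(old)
            pcode.append(("SetREG", mapping[dst]))
        else:
            raise ValueError(f"unsupported instruction: {op}")
    return pcode, mapping


program = [("xchg", "EBX", "ECX"), ("add", "EAX", "ECX")]
pcode, final_mapping = generate_pcode(
    program, {"EAX": "R0", "EBX": "R1", "ECX": "R2"}, ["R3"]
)
for item in pcode:
    print(*item)
print(final_mapping)
```

With the mappings from the example, the generated sequence is GetREG R1, GetREG R0, Add, SetREG R3, and the final mapping is EAX to R3, EBX to R2 and ECX to R1.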
Register Aliasing

When processing assignment instructions, especially assignments between registers, it is possible to make temporary mappings between source and destination registers. Unless the source register is about to be changed (which forces a remapping or a GetREG and SetREG operation), this mapping can redirect read accesses to the destination register to its source without actually performing the assignment. Take the following piece of code as an example.

Current Register Mappings:

    Real Registers    Virtual Registers
    EAX               R0
    EBX               R1
    ECX               R2

Original Instructions:

    mov eax, ecx
    add eax, ebx
    mov ecx, eax
    mov eax, ebx

P-Code:

    [Make alias R0 = R2]
    GetREG R1                        ; R1 = EBX
    GetREG R2                        ; reading of R0 redirects to R2
    Add
    [R0(EAX) is being changed; since R0 is the destination of an alias, just clear its alias]
    [Map R0 = Unused, R3 = EAX]      ; rotation
    SetREG R3                        ; R3 = EAX
    [Make alias R2 = R3]
    GetREG R1
    [R3(EAX) is being changed; since R3 is the source of an alias, we need to do the assignment]
    [Map R3 = ECX, R2 = EAX]         ; we can simplify the R2 = R3 assignment by rotation
    [Map R0 = EAX, R2 = Unused]      ; another rotation
    SetREG R0                        ; R0 = EAX

Register Usage Analysis

Given the context of a set of instructions, it can be determined that at some points the values of certain registers can be changed without affecting the program logic, and that some of the overhead of EFLAGS calculation can be omitted. For example, take a piece of code at 0x406?A8 in example.exe:

    PUSH EBP
    MOV EBP, ESP                        ; EAX|ECX|EBP|OF|SF|ZF|PF|CF
    SUB ESP, 0x10                       ; EAX|ECX|OF|SF|ZF|PF|CF
    MOV ECX, DWORD PTR [EBP+0x8]        ; EAX|ECX|OF|SF|ZF|PF|CF
    MOV EAX, DWORD PTR [ECX+0x10]       ; EAX|OF|SF|ZF|PF|CF
    PUSH ESI                            ; OF|SF|ZF|PF|CF
    MOV ESI, DWORD PTR [EBP+0xC]        ; ESI|OF|SF|ZF|PF|CF
    PUSH EDI                            ; OF|SF|ZF|PF|CF
    MOV EDI, ESI                        ; EDI|OF|SF|ZF|PF|CF
    SUB EDI, DWORD PTR [ECX+0xC]        ; OF|SF|ZF|PF|CF
    ADD ESI, -0x4                       ; ECX|OF|SF|ZF|PF|CF
    SHR EDI, 0xF                        ; ECX|OF|SF|ZF|PF|CF
    MOV ECX, EDI                        ; ECX|OF|SF|ZF|PF|CF
    IMUL ECX, ECX, 0x204                ; OF|SF|ZF|PF|CF
    LEA ECX, DWORD PTR [ECX+EAX+0x144]  ; OF|SF|ZF|PF|CF
    MOV DWORD PTR [EBP-0x10], ECX       ; OF|SF|ZF|PF|CF
    MOV ECX, DWORD PTR [ESI]            ; ECX|OF|SF|ZF|PF|CF
    DEC ECX                             ; OF|SF|ZF|PF|CF
    TEST CL, 0x1                        ; OF|SF|ZF|PF|CF
    MOV DWORD PTR [EBP-0x4], ECX
    JNZ 0x406CB8

The analysis in the comments shows the unused register/flag state before each instruction. This information is used to generate register rotations, to omit EFLAGS calculations, and to insert junk operations, all of which make the generated P-Code even harder to analyze.

Other P-Code Obfuscations & Optimizations

Const Encryption

Intermediate values and constants from the original instructions can be transformed into results calculated at run-time, which decreases the chance of constants being directly exposed in the P-Code.

Stack Obfuscation

The virtual machine's stack can be obfuscated by pushing/writing random values, because the real ESP can be calculated/tracked from the VMContext.

Multiple Virtual Machine Interpreters

It is possible to use multiple virtual machines to execute one series of P-Code. At certain points, a special handler leading to another interpreter loop is executed, and the P-Code data after such points is processed in a different virtual machine. These virtual machines need only share the intermediate run-time information, such as register mappings, at the switch points. Tracing such P-Code requires analyzing all virtual machine instances, which is considerably more work.

References

    VMProtect                   http://vmpsoft.com/
    Code Virtualizer            http://www.oreans.com/
    Safengine                   http://www.safengine.com/
    ReWolf's x86 Virtualizer    http://rewolf.pl/stuff/x86.virt.pdf
    OllyDBG                     http://www.ollydbg.de/
    VMSweeper                   http://forum.tuts4you.com/topic/25077-vmsweeper/