Obfuscation of Executable Code to Improve Resistance to Static Disassembly

Cullen Linn, Saumya Debray

Department of Computer Science
University of Arizona
Tucson, AZ 85721.
{linnc, debray}@cs.arizona.edu

ABSTRACT

A great deal of software is distributed in the form of executable code. The ability to reverse engineer such executables can create opportunities for theft of intellectual property via software piracy, as well as security breaches by allowing attackers to discover vulnerabilities in an application. The process of reverse engineering an executable program typically begins with disassembly, which translates machine code to assembly code. This is then followed by various decompilation steps that aim to recover higher-level abstractions from the assembly code. Most of the work to date on code obfuscation has focused on disrupting or confusing the decompilation phase. This paper, by contrast, focuses on the initial disassembly phase. Our goal is to disrupt the static disassembly process so as to make programs harder to disassemble correctly. We describe two widely used static disassembly algorithms, and discuss techniques to thwart each of them. Experimental results indicate that significant portions of executables that have been obfuscated using our techniques are disassembled incorrectly, thereby showing the efficacy of our methods.

Categories and Subject Descriptors

K.6.5 [Management of Computing and Information Systems]: Security and Protection - unauthorized access

General Terms

Security

Keywords

disassembly, code obfuscation

This work was supported in part by the National Science Foundation under grants EIA-0080123 and CCR-0113633.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.
CCS'03, October 27-30, 2003, Washington, DC, USA
Copyright 2003 ACM 1-58113-738-9/03/0010 ...$5.00.

1. INTRODUCTION

Advances in program analysis and software engineering technology in recent years have led to significant improvements in tools for program analysis and software development. Unfortunately, this same technology can, in many cases, be subverted to reverse engineer software systems with the goal of discovering vulnerabilities, making unauthorized modifications, or stealing intellectual property. These all require an ability to take an executable program and reconstruct its high-level structure to some extent. For example, to identify vulnerabilities in a software system, a hacker has to be able to figure out how it works and where it may be attacked. Similarly, to steal a piece of software with an embedded copyright notice or software watermark, a pirate must reconstruct enough of its internal structure to be able to identify and delete the copyright notice or watermark without affecting the functionality of the program.

The problem can be addressed by maintaining the software in encrypted form and decrypting it as needed during execution [1], or using specialized hardware (e.g., see [16]). While effective, such approaches have the disadvantages of high performance overhead or loss of flexibility because the software can no longer be run on stock hardware. An alternative approach, which we focus on, is to use code obfuscation techniques to enhance software security [9, 10, 11, 12, 28]. The goal is to deter attackers by making the cost of reconstructing the high-level structure of the program prohibitively high.

[Figure 1: The processes of compilation and reverse engineering. Program representations shown, from highest to lowest level: source code, syntax tree, control flow graph, assembly code, machine code. Step labels in the figure: parsing, intermediate code gen., control flow analysis, final code gen., assembly, disassembly, decompilation.]

The processes of compilation and reverse engineering are illustrated in Figure 1. Compilation refers to the translation of a source-language program to machine code; it consists of a series of steps, each producing successively lower-level program representations. Reverse engineering is the dual process of recovering higher-level structure and semantics from a machine code program. Broadly speaking, we can divide the reverse engineering process into two parts: disassembly, which produces assembly code from machine code; and decompilation, which reconstructs the higher-level semantic structure of the program from the assembly code. Most of the prior work on code obfuscation and tamper-proofing focuses
on various aspects of decompilation. For example, a number of researchers suggest relying on the use of difficult static analysis problems, e.g., involving complex Boolean expressions, pointers, or indirect control flow, to make it harder to construct a precise control flow graph for a program [3, 12, 20, 26, 27].

The work described in this paper, by contrast, focuses on the disassembly process. Our goal is to increase the difficulty of statically disassembling a program. This is independent of, and complementary to, current approaches to code obfuscation. It is independent of them because our techniques can be applied regardless of whether or not any of these other obfuscating transformations are being used. It is complementary to them because, by making a program harder to disassemble accurately, we add yet another barrier to recovering high-level semantic information about a program.

2. BACKGROUND: DISASSEMBLY

A machine code file typically consists of a number of different sections, e.g., text, read-only data, etc., that contain various sorts of information about the program, together with a header describing these sections. Among other things, the header contains information about the program entry point, i.e., the location in the file where the machine instructions begin (and where program execution begins), and the total size or extent of these instructions (see Figure 2 and footnote 1) [15]. Disassembly refers to the process of recovering a sequence of assembly code instructions from such a file, e.g., in a textual format readable by a human being.

Footnote 1: This applies to most file formats commonly encountered in practice, including Unix a.out, ELF, COFF, and DOS EXE files. The information about the entry point and code section size is implicit in DOS COM files.

[Figure 2: The structure of an executable file. Labels in the figure: header, program entry point, text section size, sections.]

Broadly speaking, there are two approaches to disassembly: static disassembly, where the file being disassembled is examined by the disassembler but is not itself executed during the course of disassembly; and dynamic disassembly, where the file is executed on some input and this execution is monitored by an external tool (e.g., a debugger) to identify the instructions that are being executed. Static disassembly has the advantage of being able to process the entire file all at once, while dynamic disassembly only disassembles a slice of the program, i.e., those instructions that were executed for the particular input that was used. Another advantage of static disassembly is that it takes time proportional to the size of the program, while the time taken by dynamic disassembly is typically proportional to the number of instructions executed by the program at runtime. The former tends to be considerably less than the latter (often by several orders of magnitude), making static disassembly considerably more efficient than dynamic disassembly.

This paper focuses on static disassembly. There are two generally used techniques for this: linear sweep and recursive traversal [22]. The remainder of this section sketches each of them.

2.1 Linear Sweep

The linear sweep algorithm begins disassembly at the input program's first executable byte, and simply sweeps through the entire text section disassembling each instruction as it is encountered:

    global startAddr, endAddr;
    proc DisasmLinear(addr)
    begin
        while (startAddr <= addr < endAddr) do
            I := decode instruction at address addr;
            addr += length(I);
        od
    end

    proc main()
    begin
        startAddr := address of the first executable byte;
        endAddr := startAddr + text section size;
        DisasmLinear(startAddr);
    end

This method is used by programs such as the GNU utility objdump [19] as well as a number of link-time optimization tools [8, 18, 24].

The main weakness of this algorithm is that it is prone to disassembly errors resulting from the misinterpretation of data that is embedded in the instruction stream. Only under special circumstances, e.g., when an invalid opcode is encountered, can the disassembler become aware of such disassembly errors.

2.2 Recursive Traversal

The problem with the linear sweep algorithm is that, because it does not take into account the control flow behavior of the program, it cannot "go around" data (e.g., alignment bytes, jump tables, etc.) embedded in the instruction stream, and mistakenly interprets them as executable code. An obvious fix would be to take into account the control flow behavior of the program being disassembled in order to determine what to disassemble. Intuitively, whenever we encounter a branch instruction during disassembly, we determine the possible control flow successors of that instruction, i.e., addresses where execution could continue, and proceed with disassembly at those addresses (e.g., for a conditional branch instruction we would consider the branch target and the fall-through address):

    global startAddr, endAddr;
    proc DisasmRec(addr)
    begin
        while (startAddr <= addr < endAddr) do
            if (addr has been visited already) return;
            I := decode instruction at address addr;
            mark addr as visited;
            if (I is a branch or function call)
                for each possible target t of I do
                    DisasmRec(t);
                od
                return;
            else addr += length(I);
        od
    end

    proc main()
    begin
        startAddr := program entry point;
        endAddr := startAddr + text section size;
        DisasmRec(startAddr);
    end

Variations on this basic approach to disassembly, which we term recursive traversal, are used by a number of binary translation and optimization systems [4, 23, 25].

A virtue of this algorithm is that, by following the control flow behavior of the program being processed, it is able to "go around" and thus avoid disassembly of data embedded in the text section. Its main weakness is that its key assumption, that we can precisely identify the set of control flow successors of each control transfer operation in the program, may not always hold in the case of indirect jumps. Imprecision in determining the set of possible targets of such a jump will result either in a failure to disassemble some reachable code (if the set of targets is underestimated) or erroneous disassembly of data (if the set of targets is overestimated).

Some researchers have proposed ad hoc extensions to the basic algorithm outlined above to handle common cases of indirect jumps. As an example, one of the most common uses of indirect jumps involves jump tables, a construct used by compilers to implement C-style switch statements [2]. This is illustrated in Figure 3. The jump table itself is a contiguous array of N code addresses, corresponding to the N cases in the switch statement. The code to access the jump table evaluates the index expression i; checks to see whether this expression falls within the bounds of the table, i.e., whether $0 \le i < N$; adds the scaled value of the index expression to the base address of the table to obtain the address of the ith entry in the table; then jumps indirectly through this location. The check of whether the index expression falls within the table bounds can be accomplished using a single unsigned comparison (denoted by $\ge_u$, written >=u in instruction (2) in Figure 3(b)) [2]. To determine the possible target addresses of an indirect jump through a jump table, a disassembler needs to know the base address of the table and its extent, i.e., the values of BaseAddr and N in Figure 3(b). This can be done by scanning back from the indirect jump instruction to find the instruction that adds the scaled index to the base address (instruction (4) in Figure 3(b)), whence the base address can be extracted; and the unsigned compare of the index (instruction (2) in Figure 3(b)), whence the table size can be determined. Once this has been done, disassembly can continue at each target identified from the N table entries starting at location BaseAddr [6].

(a) Source code

    switch (i) {
        case 0 : ...
        case 1 : ...
        ...
        case N-1 : ...
        default: ...
    }

(b) Implementation using a jump table

    code for accessing the jump table:
        (1) r := evaluate i
        (2) if r >=u N goto default
        (3) r *= 4
        (4) r += BaseAddr
        (5) jmp *r

    jump table, starting at BaseAddr:
        entry 0    -> code for case 0
        entry 1    -> code for case 1
        ...
        entry N-1  -> code for case N-1

Figure 3: An example of a C switch statement and its implementation using a jump table

Code that is reachable only through indirect control transfers may not be found using the basic algorithm above. To handle this problem, some systems, e.g., the UQBT binary translation system [5], resort to speculative disassembly. The idea is to process undisassembled portions of the text segment that appear to be code, in the expectation that they might be the targets of indirect function calls; a "speculative bit" is set when this is done, and speculative disassembly of a particular region of memory is abandoned if an invalid instruction is encountered.

3. THWARTING DISASSEMBLY

In order to thwart a disassembler, we have to somehow confuse, as much as possible, its notion of where the instruction boundaries in a program lie. This section discusses some ways in which this can be achieved. We first discuss a phenomenon that we had not expected: that of disassembly errors that "repair themselves" within a relatively short distance. This is followed by a discussion of a general technique we use to inject "junk bytes" into the instruction stream to introduce disassembly errors. After this we discuss specific details of the way in which this is done to confuse the two disassembly algorithms discussed in the previous section.

3.1 Self-Repairing Disassembly

On some instruction sets, most notably that of the Intel IA-32 architecture, the instruction structure is such that, very often, the disassembly process is self-repairing: even when a disassembly error occurs (e.g., due to the disassembly of data), the disassembler eventually ends up re-synchronizing with the actual instruction stream. This is illustrated by the example in Figure 4, which shows a typical byte sequence in memory, together with the actual disassembly (on the left), and the disassemblies we obtain if the disassembler is off by 1, 2, or 3 bytes, on the right. When the disassembly is initially off by a single byte, the disassembler produces two erroneous instructions but is back in synchrony with the original disassembly by the second instruction in the actual disassembly sequence. A similar phenomenon occurs when the disassembler is initially off by two bytes: it resynchronizes with the second instruction in the actual disassembly after producing a single incorrectly disassembled instruction. If the disassembler is initially off by three bytes, it generates three incorrectly disassembled instructions but resynchronizes by the third instruction in the actual disassembly.

Obviously, the actual resynchronization behavior on a particular program will depend on its particular distribution of instructions. In practice, however, we have found that disassembly errors usually resynchronize quite quickly, often within just one or two instructions beyond the point at which the disassembly error occurred. Efforts to confuse disassembly have to take this self-repairing aspect of disassembly into account.

3.2 Junk Insertion
    memory        actual                1 byte off        2 bytes off       3 bytes off
    bytes (hex)   disassembly           (synchronizes     (synchronizes     (synchronizes
                                        in 2 instrs.)     in 1 instr.)      in 3 instrs.)

    8b            mov 4(%esp), %eax
    44                                  inc %esp
    24                                  and $4, %al       and $4, %al
    04                                                                      add $3, %al
    03            add 12(%esp), %eax
    44                                                                      inc %esp
    24                                                                      and $12, %al
    0c
    83            sub $6, %eax
    e8
    06

Figure 4: An example of self-repairing disassembly
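
The resynchronization counts in Figure 4 can be checked with a small linear sweep (Section 2.1) over the same eleven bytes. The following Python sketch is illustrative only and is not part of the system described in this paper: the names CODE, TOY_LENGTH, linear_sweep, and resync are hypothetical, and the decoder is a deliberate simplification that looks up an instruction's length from its first byte alone. That shortcut is correct for the decodes that occur in Figure 4, but a real IA-32 decoder must also examine prefixes, ModRM, and SIB bytes.

```python
# Bytes from Figure 4 (IA-32 encodings of three instructions).
CODE = bytes([0x8B, 0x44, 0x24, 0x04,   # mov 4(%esp), %eax
              0x03, 0x44, 0x24, 0x0C,   # add 12(%esp), %eax
              0x83, 0xE8, 0x06])        # sub $6, %eax

# Hypothetical toy decoder: instruction length keyed by the first byte only.
# Assumption for illustration; real IA-32 lengths depend on more than this.
TOY_LENGTH = {0x8B: 4, 0x03: 4, 0x83: 3, 0x44: 1, 0x24: 2, 0x04: 2}


def linear_sweep(code, start):
    """Return the offsets at which a linear sweep from `start` decodes instructions."""
    offsets = []
    addr = start
    while 0 <= addr < len(code):
        offsets.append(addr)
        addr += TOY_LENGTH[code[addr]]
    return offsets


def resync(code, skew):
    """Return (number of wrong instructions, offset where the sweep resynchronizes)."""
    actual = set(linear_sweep(code, 0))
    wrong = 0
    for off in linear_sweep(code, skew):
        if off in actual:
            return wrong, off
        wrong += 1
    return wrong, None  # never resynchronized within `code`


for skew in (1, 2, 3):
    print(skew, resync(CODE, skew))
```

Under these assumptions the sweep that starts one byte off decodes two wrong instructions before rejoining the actual stream at offset 4, the sweep that starts two bytes off decodes one, and the sweep that starts three bytes off decodes three and rejoins at offset 8, which agrees with the three right-hand columns of Figure 4.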

We can introduce disassembly errors by inserting "junk bytes" at selected locations in the instruction stream where the disassembler is likely to expect code. (An alternative approach involves partially or fully overlapping instructions, e.g., see [7]: this is discussed in Section 5.) It is not difficult to see that any such junk bytes must satisfy two properties. First, in order to actually confuse the disassembler, the junk bytes must be partial instructions, not complete instructions. Second, in order to preserve program semantics, such partial instructions must be inserted in such a way that they are unreachable at runtime. To this end, define a basic block as a candidate block if it can have such junk bytes inserted before it. In order to ensure that any junk so inserted is unreachable during execution, a candidate basic block cannot have execution fall through into it. In other words, the basic block immediately before a candidate block must end in an unconditional control transfer, e.g., an unconditional jump or a return from a function. Candidate blocks can be identified in a straightforward way by scanning the basic blocks of the program after their final memory layout has been determined.

As mentioned in Section 3.1, the static disassembly process very often manages to re-synchronize itself after a disassembly error. Once a candidate block B has been identified, we have to determine what junk bytes to insert before it so as to confuse the disassembler as much as possible, i.e., delay this re-synchronization for as long as possible. To do this, we take a particular n-byte instruction I (our current implementation considers a 6-byte bitwise-OR instruction, but it is easy to extend this to other instructions), and determine how far away this re-synchronization would occur if the first k bytes of I were to be inserted immediately before the candidate block B, for each k, $0 < k < n$. To determine the re-synchronization point, for each such k we simulate disassembly for the candidate block, assuming that the disassembler encounters the first k bytes of I at the beginning of B, then continuing with the byte sequence comprising the machine-level encodings of the instructions actually in B. Using this approach we determine the value $k_{max}$ of k for which the re-synchronization distance is maximized, and insert the first $k_{max}$ bytes of I immediately before block B.

3.3 Thwarting Linear Sweep

As observed in Section 2.1, linear sweep disassembly is generally unable to distinguish data embedded in the text section. We can exploit this weakness by inserting junk bytes at selected locations in the instruction stream, as discussed in Section 3.2. One point to note here is that since the simulation of disassembly scans forward from each candidate to determine the number of junk bytes to be inserted there, it is important to ensure that such decisions made for one candidate are not subsequently invalidated by the insertion of junk into subsequent candidates. To avoid such effects, we consider candidate blocks in reverse order when inserting junk.

With the approach described thus far, we find that we are typically able to attain a "confusion factor" of about 15% on average, i.e., 15% of the instructions in a program are incorrectly disassembled (confusion factors are discussed in more detail in Section 4). The reason that it is not higher is that candidates for the insertion of junk bytes cannot have execution fall through into them: the preceding block has to end in an unconditional control transfer. We have found that, in programs obtained from a typical optimizing compiler, candidate blocks tend to be around 30 instructions apart on average (footnote 2). This distance, combined with the self-repairing nature of disassembly, means that when disassembly goes wrong after the insertion of junk before a candidate, it typically manages to re-synchronize before the next candidate is encountered.

Footnote 2: These data reflect the SPECint-95 benchmark suite compiled using gcc at optimization level -O3. The averages given here were computed as geometric means.

We increase the number of candidates by a transformation called branch flipping. The idea is to invert the sense of conditional jumps, by converting code of the form

        bcc Addr

where cc represents a condition, e.g., eq or ne, to

        bcc' L
        jmp Addr
    L:

where cc' is the complementary condition to cc, e.g., a beq ... is converted to a bne .... Because the block at L is now preceded by an unconditional jump, the basic block at L becomes a candidate. With this transformation, the distance between candidate blocks drops to about 12 instructions on average, and the confusion factor rises to about 37%. Yet another measure that can be taken to increase candidates for junk insertion is call conversion, which raises instruction confusion to about 42%. This method is discussed in more detail in Section 3.4.2.
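
The choice of $k_{max}$ described in Section 3.2 can be sketched as a small simulation. The Python below is a hypothetical illustration, not the implementation used in this work: the names sweep, resync_distance, and best_prefix_length are invented for the example, the candidate block reuses the bytes of Figure 4, and the junk instruction is assumed to be the 6-byte encoding 81 c8 followed by a 32-bit immediate (a bitwise OR into %eax, with an arbitrary immediate). As a loud simplification, instruction length is looked up from the first byte only, and unknown first bytes are treated as 1-byte instructions; a real IA-32 decoder would give different distances.

```python
# Hypothetical toy decoder: total instruction length keyed by the first byte.
# Assumption for illustration; real IA-32 lengths also depend on prefixes,
# ModRM and SIB bytes.
TOY_LENGTH = {0x8B: 4, 0x03: 4, 0x83: 3, 0x44: 1, 0x24: 2, 0x04: 2, 0x81: 6}


def sweep(code, start=0):
    """Offsets at which a linear sweep from `start` decodes instructions."""
    offsets, addr = [], start
    while addr < len(code):
        offsets.append(addr)
        addr += TOY_LENGTH.get(code[addr], 1)  # unknown first byte: assume length 1
    return offsets


def resync_distance(block, junk):
    """Offset into `block` at which a sweep over junk + block rejoins the
    block's true instruction boundaries (len(block) if it never does)."""
    true_starts = set(sweep(block))
    for off in sweep(junk + block):
        if off >= len(junk) and off - len(junk) in true_starts:
            return off - len(junk)
    return len(block)


def best_prefix_length(block, instr):
    """Return k_max with 0 < k < len(instr); ties go to the smallest k."""
    return max(range(1, len(instr)),
               key=lambda k: resync_distance(block, instr[:k]))


BLOCK = bytes([0x8B, 0x44, 0x24, 0x04, 0x03, 0x44, 0x24, 0x0C, 0x83, 0xE8, 0x06])
JUNK_INSTR = bytes([0x81, 0xC8, 0x11, 0x22, 0x33, 0x44])  # or $0x44332211, %eax

k_max = best_prefix_length(BLOCK, JUNK_INSTR)
print(k_max, resync_distance(BLOCK, JUNK_INSTR[:k_max]))
```

In this toy setting, inserting 1 or 3 junk bytes delays resynchronization until offset 8 of the block, while 2, 4, or 5 junk bytes let the sweep rejoin at offset 4, so the sketch selects k = 1. Breaking ties toward the smallest k is a choice made here for the example, not something the text specifies.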
3.4 Thwarting Recursive Traversal

The main strength of the recursive disassembly algorithm, its ability to deal intelligently with control flow and thereby disassemble around data embedded in the text segment, also turns out to be a weakness that we can take advantage of to confuse the disassembly process. There are two (related) aspects of recursive traversal that we can exploit. The first is that when it encounters a control transfer, disassembly continues at those locations that are deemed to be the possible control transfer targets. In this context, disassemblers typically assume that commonly encountered control transfers, such as conditional branches and function calls, behave "reasonably." For example, a conditional branch is assumed to have two possible targets: the branch target and the fall through to the next instruction. Similarly, a function call is assumed to return to the instruction immediately following the call instruction.

The second aspect of recursive traversal is that identifying the set of possible targets of indirect control transfers is difficult. Recursive traversal disassemblers therefore generally resort to ad hoc techniques, such as examining bounds checks associated with jump tables, or disassembling speculatively, to handle commonly encountered situations involving indirect jumps.

Below we discuss different ways in which these characteristics can be exploited to confuse recursive traversal disassembly.

3.4.1 Branch Functions

The assumption that a function returns to the instruction following the call instruction can be exploited using what we term branch functions. The idea is illustrated in Figure 5. Given a finite map over locations in a program

    $\{a_1 \mapsto b_1, \ldots, a_n \mapsto b_n\}$

a branch function f is a function that, whenever it is called from one of the locations $a_i$, causes control to be transferred to the corresponding location $b_i$, $1 \le i \le n$. Given such a branch function f, we can replace n unconditional branches in a program,

    a1: jmp b1
        ...
    a2: jmp b2
        ...
    an: jmp bn

by calls to the branch function:

    a1: call f
        ...
    a2: call f
        ...
    an: call f

The code for the branch function is responsible for determining the target location $b_i$ based on the location $a_i$ it was called from, then branching to the appropriate $b_i$. Moreover, it has to do this in such a way that the program state is that which would have been encountered at the location $b_i$ in the original code with unconditional branches. Note that a branch function does not behave like normal functions, in that it typically does not return to the instruction following the call instruction, but instead branches to some other location in the program that depends, in general, on where it was called from (footnote 3).

Footnote 3: This can, however, have adverse performance implications on some architectures, e.g., the Intel Pentium, by interfering with the branch prediction and/or return stack buffer mechanisms.

Branch functions serve two distinct purposes. The first is to obscure the flow of control in the program: by sufficiently obscuring the computation of the target address $b_i$ within the branch function, we can make it difficult for an attacker to reconstruct the original map it realizes. The second is to create opportunities for misleading a disassembler: since a disassembler will typically continue disassembly at the instruction following the call instruction, we can introduce errors in the disassembly by inserting junk bytes at the point immediately after each call f instruction in a manner similar to that discussed in Section 3.3.

Branch functions can be implemented in a number of ways. For example, a straightforward implementation might use the return address to look up a table, via a simple linear or binary search, to determine the target address. The disadvantage with such schemes is that they are relatively straightforward to reverse engineer.

A more sophisticated implementation of branch functions has the caller pass, as an argument to the branch function, the offset from the instruction immediately after the call (whose address is passed to the branch function as the return address) to the target $b_i$. The branch function simply adds the value of its argument to the return address, so that the return address becomes the address of the original target $b_i$. The code for this, on the Intel IA-32 architecture, might be as follows (footnote 4):

    xchg %eax, 0(%esp)    # I1
    add %eax, 8(%eax)     # I2
    pop %eax              # I3
    ret                   # I4

Instruction I1 exchanges the contents of register %eax with the word at the top of the stack, effectively saving the contents of %eax and at the same time loading the displacement to the target (passed to the branch function as an argument on the stack) into %eax. Instruction I2 then has the effect of adding this displacement to the return address. I3 restores the previously saved value of %eax, and I4 then has the effect of branching to the address computed by the function.

Footnote 4: If any of the condition code flags is live at the call point, they have to be saved by the caller just before the call, and restored at the target.

Our current implementation uses a variation on this idea that uses perfect hashing [14, 17] and is, we believe, harder to reverse engineer. Once the final code layout has been determined and we know the mapping $\{a_1 \mapsto b_1, \ldots, a_n \mapsto b_n\}$ we want the branch function to implement, we create a perfect hash function h:

    $h : \{a_1, \ldots, a_n\} \to \{1, \ldots, n\}$

We then construct a table T in the data section of the binary, that lists offsets for each $(a_i, b_i)$ pair, as follows:

    $T[h(a_i)] = b_i - a_i$   [for each i].

Upon invocation the branch function proceeds as follows:

    (i) apply the perfect hash function h to its return address a to compute a perfect hash value $h(a)$;
    (ii) use the table T to obtain the offset $T[h(a)]$ to the target;
    (iii) add this offset to the return address;
    (iv) return.

Since the resulting code is quite a bit more expensive, in execution cost, than the single branch instruction in the original program, we use execution profile information to apply the transformation only to code that is not "hot," i.e., that is not frequently executed; the details are discussed in Section 4.
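
Steps (i) through (iv) above can be modeled outside of machine code to make the table construction concrete. The following Python sketch is only a model under stated assumptions: the addresses in PHI are made up, the functions find_perfect_hash and build_branch_function are hypothetical names, and the brute-force multiplier search stands in for the perfect hashing schemes of [14, 17] rather than reproducing them. The sketch also indexes the table from 0 to n-1 instead of 1 to n.

```python
def find_perfect_hash(keys, prime=2_147_483_647, max_tries=1_000_000):
    """Brute-force search for a multiplier m such that
    h(a) = ((a * m) % prime) % n maps the n keys onto 0..n-1 without collisions.
    (Illustrative stand-in for a real perfect hashing scheme.)"""
    n = len(keys)
    for m in range(1, max_tries):
        if len({(a * m) % prime % n for a in keys}) == n:
            return lambda a, m=m: (a * m) % prime % n
    raise ValueError("no perfect hash found within max_tries")


def build_branch_function(phi):
    """`phi` maps each return address a_i to its intended target b_i."""
    h = find_perfect_hash(list(phi))
    table = [0] * len(phi)
    for a, b in phi.items():
        table[h(a)] = b - a              # T[h(a_i)] = b_i - a_i

    def branch_function(return_addr):
        offset = table[h(return_addr)]   # steps (i) and (ii)
        return return_addr + offset      # step (iii); step (iv) would return here
    return branch_function


# Made-up addresses for illustration: return address -> original jump target.
PHI = {0x1005: 0x2000, 0x1012: 0x1F40, 0x1030: 0x1004, 0x104B: 0x3000}

f = build_branch_function(PHI)
for a, b in PHI.items():
    print(hex(a), "->", hex(f(a)))
```

Note that the model, like the scheme it sketches, is only meaningful for return addresses that are keys of the map: for any other address the hash still yields some table index, and the computed target is meaningless.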
a 1 : jmp b1 b1 a 1 : call f b1
... f
...
a2 : jmp b2 b2 a2 : call f b2
... ...

a n : jmp bn
a n : call f bn
bn

(a) Original code (b) Code using a branch function

Figure 5: Branch functions

The complexity of a branch functions implementation, and the 3.4.4 Jump Table Spoofing
way in which it is accessed, offer an interesting tradeoff between In addition to simply inserting junk bytes at the fake target of an
execution speed, on the one hand, and difficulty of reverse engi- opaquely directed conditional branch, we can also insert artificial
neering, on the other. For example, we can choose different branch jump tables to mislead recursive traversal disassembly. We refer to
function implementations for jump instructions depending on their this technique as jump table spoofing.
execution frequencies: frequently executed jump instructions might Recall that, as discussed in Section 2.2, recursive traversal dis-
be directed to a lightweight branch function, less frequently exe- assemblers may attempt to use the bounds check for a jump table
cuted ones to a more complex branch function, and so on. to identify its size, and thereby determine the set of possible targets
of an indirect jump through a jump table. We can exploit this to
3.4.2 Call Conversion mislead a disassembler by introducing a jump table that is unreach-
A variation of the branch function scheme described in section able at runtime. The jump table code can be made unreachable ei-
3.4.1 can be used to extend the candidates for junk insertions to ther by using a conditional branch that uses an opaque predicate to
include those basic blocks directly following call instructions as jump around it, or by using an opaque expressionwhich is very
well. Recall that the reason junk bytes can typically not be inserted similar to an opaque predicate, expect that the value need not be
after call instructions is that control returns to the address directly simply a truth valuewhose value is guaranteed to fail the bounds
after the last byte of a call instruction upon completion of an in- check. The code addresses in this jump table can now be set to
voked function. This being the case, if junk bytes were inserted after a well-behaved call instruction then it would be possible for control to reach the junk bytes and therefore violate the constraints on junk insertion described in Section 3.2.

One solution to this problem is to reroute call instructions through a specialized branch function that branches to the intended target function via perfect hashing, as in the standard branch function, but then returns to some predetermined offset from the original call instruction (i.e., the offset to the real successor instruction that lies beyond some number of junk bytes). Using this method we are able to obscure control flow information by making function entry points more difficult to decipher while also increasing the potential to mislead the disassembler.

3.4.3 Opaque Predicates

The assumption that a conditional branch has two possible targets can be exploited by disguising an unconditional branch as a conditional branch that happens, at runtime, to always go in one direction, i.e., either it is always taken and never falls through, or it is never taken and always falls through. This technique relies on using predicates that always evaluate to either the constant true or the constant false, regardless of the values of their inputs: such predicates are known as opaque predicates [12]. Other researchers have discussed techniques for synthesizing opaque predicates; their ideas translate in a straightforward way to our context, so we do not discuss this issue further.

Once an unconditional branch has been replaced by a conditional branch that uses an opaque predicate, we have a location (either the branch target or the fall through, depending on whether the opaque predicate is always false or always true) that appears to be a legitimate continuation for execution from the conditional branch but, in fact, is not. We can then insert junk bytes at this point, as discussed earlier, to mislead the disassembly.

junk addresses (text segment addresses that do not correspond to actual instructions) and thereby cause disassembly errors.

A variation on this idea is to take an unconditional jump to some address and convert it to an indirect jump through a jump table in which that address appears as the kth table entry. The table is indexed by the value of an opaque expression that always evaluates to k. However, the bounds check for the table uses a table size m ≥ k, leading the disassembler to believe that the jump table contains m entries. Only one of these m entries, namely the kth entry, contains a real code address: the other entries contain junk addresses.

3.5 Implementation Status

We have implemented our ideas using PLTO, a binary rewriting system developed for Intel IA-32 executables [21]. The system reads in statically linked executables (footnote 5), disassembles the input binary, and constructs a control flow graph. This control flow graph is then processed in one of two ways. If the user specifies that profiling is to be carried out, instrumentation code is inserted to generate an edge profile when the resulting binary is executed. If, on the other hand, the user requests obfuscating transformations to be carried out, the system reads in edge profile information if available, carries out branch flipping to increase the number of candidate blocks (Section 3.3), applies various obfuscating transformations, and writes out the resulting executable.

The transformations currently implemented in the system are junk insertion (Section 3.2) and transformation of unconditional jumps and call instructions to the respective branch function calls

Footnote 5: The requirement for statically linked executables is a result of the fact that PLTO relies on the presence of relocation information to distinguish addresses from data. The Unix linker ld refuses to retain relocation information for executables that are not statically linked.
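As a rough illustration of the opaque-predicate and jump-table spoofing ideas of Sections 3.4.3 and 3.4.4, the following Python sketch models them at the level of plain integers. It is not the PLTO implementation, which rewrites IA-32 machine code. The predicate x(x+1) mod 2 = 0, the 0-based table index, the example addresses, and all function names are assumptions introduced here for illustration.

```python
from typing import List, Set


def opaque_true(x: int) -> bool:
    """Opaque predicate (assumed example): x*(x+1) is a product of two
    consecutive integers, hence always even, so this is always True."""
    return (x * (x + 1)) % 2 == 0


def opaque_index(x: int, k: int) -> int:
    """Opaque expression that always evaluates to k, whatever x is,
    because the added term is always 0."""
    return k + (x * (x + 1)) % 2


def build_spoofed_table(real_target: int, k: int, junk: List[int]) -> List[int]:
    """Build a jump table with len(junk) + 1 entries whose k-th entry
    (0-based here) is the only real code address."""
    if not 0 <= k <= len(junk):
        raise ValueError("k must be a valid index into the finished table")
    return junk[:k] + [real_target] + junk[k:]


def indirect_jump(table: List[int], x: int, k: int) -> int:
    """Model of the obfuscated jump: a bounds check against the full table
    size, followed by an index that in fact is always k."""
    index = opaque_index(x, k)
    if index >= len(table):  # makes every entry look reachable
        raise IndexError("jump table index out of range")
    return table[index]


def perceived_targets(table: List[int]) -> Set[int]:
    """What a disassembler that trusts the bounds check would treat as code."""
    return set(table)


# Hypothetical addresses, chosen only for the example.
REAL = 0x8048123
JUNK = [0x8048001, 0x8048777, 0x8048ABD]
TABLE = build_spoofed_table(REAL, 2, JUNK)
```

At run time `indirect_jump(TABLE, x, 2)` returns `REAL` for every `x`, while `perceived_targets(TABLE)` also contains the three junk addresses, which is the mismatch the transformation relies on.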
[Figure 6 plots omitted. Both panels show the programs compress, gcc, go, ijpeg, li, m88ksim, perl, vortex, and the mean, with one series per threshold (0.70, 0.80, 0.90, 0.95, 1.00).]

(a) Fraction of jumps converted to branch function calls (y-axis: fraction of candidates converted)

(b) Slowdown in execution speed (y-axis: slowdown factor)

Figure 6: Effect of hot code threshold on branch function conversion and execution speed

(Sections 3.4.1 and 3.4.2). We expect to have additional transformations, such as jump table spoofing (Section 3.4.4), implemented in the near future.

4. EXPERIMENTAL EVALUATION

We evaluated the efficacy of our techniques using the SPECint-95 benchmark suite. Our experiments were run on an otherwise unloaded 2.4 GHz Pentium IV system with 1 GB of main memory running RedHat Linux 8.0. The programs were compiled with gcc version egcs-2.91.66 at optimization level -O3. The programs were profiled using the SPEC training inputs and these profiles were used to identify any hot spots during our transformations. The final performance of the transformed programs was then evaluated using the SPEC reference inputs. Each execution time reported was derived by running seven trials, removing the highest and lowest times from the sampling, and averaging the remaining five.

We experimented with three different attack disassemblers to evaluate our techniques. The first of these is the GNU objdump utility, which employs a straightforward linear sweep algorithm. The second, which we wrote ourselves, is a recursive disassembler that incorporates a variation of speculative disassembly (see Section 2). In addition we also provide the recursive disassembler with extra information about the address and size of each jump table in the program as well as the start and end address of each function. The results obtained from this disassembler therefore serve as a lower bound estimate of the extent of obfuscation achieved. Our third disassembler is IDA Pro [13], a commercially available disassembly tool that is generally regarded to be among the best disassemblers available.

For each of these, the efficacy of obfuscation was measured by computing confusion factors for the instructions, basic blocks, and functions. Intuitively, the confusion factor measures the fraction of program units (instructions, basic blocks, or functions) in the obfuscated code that were incorrectly identified by a disassembler. More formally, let A be the set of all actual instruction addresses, i.e., those that would be encountered when the program is executed, and P the set of all perceived instruction addresses, i.e., those addresses produced by a static disassembly. A − P is the set of addresses that are not correctly identified as instruction addresses by the disassembler. We define the confusion factor CF to be the fraction of instruction addresses that the disassembler fails to identify correctly (footnote 6):

$$CF = \frac{|A - P|}{|A|}.$$

Confusion factors for functions and basic blocks are calculated analogously: a basic block or function is counted as being incorrectly disassembled if any of the instructions in it is incorrectly disassembled. The reason for computing confusion factors for basic blocks and functions as well as for instructions is to determine whether the errors in disassembling instructions are clustered in a small region of the code, or whether they are distributed over significant portions of the program.

As mentioned in Section 3.4.1, we transform jumps to branch function calls only if the jump does not occur in a hot basic block. The first questions we have to address, therefore, are: how are hot basic blocks identified, and what is the effect of different choices of what constitutes a hot block on the extent of obfuscation achieved and the performance of the resulting code? To identify the hot, or

Footnote 6: We also considered taking into account the set P − A of addresses that are erroneously identified as instruction addresses by the disassembler, but rejected this approach because it double counts the effects of disassembly errors.
Confusion factor (%)

            Linear sweep (objdump)                 Recursive traversal                    Commercial (IDA Pro)
Program     Instructions  Basic blocks  Functions  Instructions  Basic blocks  Functions  Instructions  Basic blocks  Functions
compress95         43.93         63.68     100.00         30.04         40.42      75.98         75.81         91.53      87.37
gcc                34.46         53.34      99.53         17.82         26.73      72.80         54.91         68.78      82.87
go                 33.92         51.73      99.76         21.88         30.98      60.56         56.99         70.94      75.12
ijpeg              39.18         60.83      99.75         25.77         38.04      69.99         68.54         85.77      83.94
li                 43.35         63.69      99.88         27.22         38.23      76.77         70.93         87.88      84.91
m88ksim            41.58         62.87      99.73         24.34         35.72      77.16         70.44         87.16      87.16
perl               42.34         63.43      99.75         27.99         39.82      76.18         68.64         84.62      87.13
vortex             33.98         55.16      99.65         23.03         35.61      86.00         57.35         74.55      91.29
Geo. mean          39.09         59.34      99.75         24.76         35.69      74.43         65.45         81.40      84.97

Figure 7: Efficacy of obfuscation: confusion factors (hot code threshold θ = 1.0)
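The confusion factors reported in Figure 7 follow the set-based definition given in the text. The Python sketch below shows that computation on a toy example, under the assumption that instruction addresses are represented as integers and basic blocks as sets of such addresses. The function names and the sample data are hypothetical and are not taken from the authors' tooling.

```python
from typing import Dict, Set


def confusion_factor(actual: Set[int], perceived: Set[int]) -> float:
    """Instruction-level CF = |A - P| / |A|: the fraction of actual
    instruction addresses that the disassembler did not report."""
    if not actual:
        raise ValueError("the set of actual instruction addresses is empty")
    return len(actual - perceived) / len(actual)


def unit_confusion_factor(units: Dict[str, Set[int]], perceived: Set[int]) -> float:
    """CF for basic blocks or functions: a unit counts as incorrectly
    disassembled if any one of its instruction addresses was missed."""
    if not units:
        raise ValueError("no units given")
    wrong = sum(1 for addresses in units.values() if not addresses <= perceived)
    return wrong / len(units)


# Toy data (assumed): four real instructions, one of which (address 2)
# the disassembler misses; addresses 3 and 9 are spurious and are ignored,
# in line with the paper's decision not to count P - A.
actual_addresses = {0, 2, 5, 7}
perceived_addresses = {0, 3, 5, 7, 9}
basic_blocks = {"b0": {0, 2}, "b1": {5, 7}}

instruction_cf = confusion_factor(actual_addresses, perceived_addresses)
block_cf = unit_confusion_factor(basic_blocks, perceived_addresses)
```

In this example one missed instruction out of four gives an instruction-level factor of 0.25, but it spoils one of the two basic blocks, so the block-level factor is 0.5. This is why the block and function columns in Figure 7 are consistently higher than the instruction column.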
frequently executed, basic blocks, we start with a (user-defined) fraction θ (0.0 < θ ≤ 1.0) that specifies what fraction of the total number of instructions executed at runtime should be accounted for by hot basic blocks. For example, θ = 0.8 means that hot blocks should account for at least 80% of all the instructions executed by the program. More formally, let the weight of a basic block be the number of instructions in the block multiplied by its execution frequency, i.e., the block's contribution to the total number of instructions executed at runtime. Let tot_instr_ct be the total number of instructions executed by the program, as given by its execution profile. Given a value of θ, we consider the basic blocks b in the program in decreasing order of execution frequency, and determine the largest execution frequency N such that

$$\sum_{b \,:\, \mathit{freq}(b) \ge N} \mathit{weight}(b) \;\ge\; \theta \cdot \mathit{tot\_instr\_ct}.$$

Any basic block whose execution frequency is at least N is considered to be hot.

The effect of varying the hot code threshold on performance (both obfuscation and speed) is shown in Figure 6. Figure 6(a) shows the fraction of candidates that are converted to branch function calls at different thresholds; this closely tracks the overall confusion factors achieved. Figure 6(b) shows the concomitant degradation in performance. It can be seen, from Figure 6(a), that most programs have a small and well-defined hot spot, and as a result varying the threshold from a modest 0.70 to a value as high as 1.0 does not dramatically affect the number of candidates converted. The benchmark that is affected the most is gcc, and even here over 79% of the candidates are converted at θ = 1.0. On average, about 91% of the candidates are converted at θ = 1.0. However, as illustrated in Figure 6(b), varying the hot code threshold has a significant effect on execution speed. For example, at θ = 0.70 the programs slow down by a factor of 3.67 on average, with the li benchmark experiencing the largest slowdown, by a factor of 5.14. However, as θ is increased the slowdown drops off quickly, to a factor of 3.14 at θ = 0.9 and 1.62 at θ = 1.0. In summary, choosing a threshold of 1.0 still results in most of the candidate blocks in the program being converted to branch function calls without excessive performance penalty. For the purposes of this paper, therefore, we give measurements for θ = 1.0.

Figure 7 shows the efficacy of our obfuscation transformations for both of the disassembly methods discussed in Section 2. The confusion factors achieved for linear sweep disassembly are quite modest: on average, 39% of the instructions, 59% of the basic blocks, and nearly 100% of the functions are incorrectly disassembled. For recursive traversal, the confusion factors are somewhat lower because in this case the disassembler can understand and deal with control flow somewhat better than with linear sweep and as a last resort actually reverts to linear sweep for the speculative disassembly of undisassembled code. Nevertheless, we find that, on average, over 25% of the instructions in the program incur disassembly errors. As a result, over 35% of the basic blocks and close to 74% of the functions, on average, are incorrectly disassembled using this disassembly method. This is achieved at the cost of a 13% penalty in execution speed (see Figure 8).

Execution time (secs)

Program     Original (T0)  Obfuscated (T1)  Slowdown (T1/T0)
compress95          34.49            34.44              1.00
gcc                 23.27            23.23              1.00
go                  53.17            53.08              1.00
ijpeg               40.13            40.15              1.00
li                  26.50            42.91              1.62
m88ksim             28.18            30.02              1.07
perl                28.62            37.71              1.32
vortex              48.84            49.05              1.00
Geo. mean                                               1.13

Figure 8: Effect of obfuscation on execution speed (hot code threshold θ = 1.0)

The recursive traversal data reported in Figure 7 are actually quite conservative since these were gathered using our own recursive disassembler which, as mentioned before, is supplied with extra information to avoid unduly optimistic results. To evaluate the efficacy of our techniques in a more realistic situation, we used a commercial disassembly tool, IDA Pro version 4.3x [13], which is widely considered to be the most advanced disassembler available. The results of this experiment are reported in Figure 7. It can be seen that this tool fails on most of the program: close to 65% of the instructions, and about 85% of the functions in the program, are disassembled incorrectly. Part of the reason for this high degree of failure is that IDA Pro only disassembles addresses that (it believes) can be guaranteed to be instruction addresses. This has two effects: first, large portions of the code that are reached by branch function addresses are simply not disassembled, being presented instead to the user as a jumble of hex data; and second, the location immediately following a branch function call is treated as an address to which control returns, and this causes some junk bytes to be erroneously disassembled. Overall, this shows that our techniques are effective even against state-of-the-art disassembly tools.

Finally, Figure 9 shows the impact of obfuscation on code size, both in terms of the number of instructions (which increases, for example, due to branch flipping), as well as the number of bytes occupied by the text section. The latter includes the effects of the new instructions inserted as well as all junk bytes added to the program. Overall, it can be seen that there is a 20% increase in the
            No. of instructions                              Text section size (bytes)
Program     Original (I0)  Obfuscated (I1)  Change (I1/I0)  Original (S0)  Obfuscated (S1)  Change (S1/S0)
compress95          74787            92137           1.231         265985           311095           1.169
gcc                327133           387289           1.183        1128273          1290419           1.143
go                 124424           145953           1.173         468537           525232           1.121
ijpeg              105766           127012           1.200         363169           419535           1.155
li                  89309           109652           1.227         310301           363801           1.172
m88ksim            104211           127358           1.222         368798           430845           1.168
perl               137947           169054           1.225         484194           566935           1.170
vortex             174960           204230           1.167         592076           672795           1.136
Geo. mean                                            1.204                                           1.154

Figure 9: Effect of obfuscation on code size (hot code threshold θ = 1.0)
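The hot code threshold used for Figures 7 to 9 is defined in Section 4 through a selection rule over basic-block execution frequencies. The Python sketch below is one possible reading of that rule, with the threshold written as `theta`. It assumes the profile is available as a mapping from block name to an (instruction count, execution frequency) pair. The function name, the data layout, and the sample profile are hypothetical, not taken from PLTO.

```python
from typing import Dict, Set, Tuple


def select_hot_blocks(
    profile: Dict[str, Tuple[int, int]], theta: float
) -> Tuple[int, Set[str]]:
    """Return (N, hot), where N is the largest execution frequency such that
    blocks with frequency >= N account for at least theta of all executed
    instructions, and hot is the set of blocks with frequency >= N."""
    if not 0.0 < theta <= 1.0:
        raise ValueError("theta must satisfy 0.0 < theta <= 1.0")

    # weight(b) = number of instructions in b * execution frequency of b
    weight = {name: size * freq for name, (size, freq) in profile.items()}
    tot_instr_ct = sum(weight.values())

    covered = 0
    cutoff = 0
    # Visit the distinct frequencies from highest to lowest; the first one
    # at which the accumulated weight reaches the target is the largest N.
    for freq in sorted({f for _, f in profile.values()}, reverse=True):
        covered += sum(weight[n] for n, (_, f) in profile.items() if f == freq)
        if covered >= theta * tot_instr_ct:
            cutoff = freq
            break

    hot = {name for name, (_, f) in profile.items() if f >= cutoff}
    return cutoff, hot


# Assumed toy profile: block name -> (instructions in block, execution count).
sample_profile = {
    "loop": (10, 90),  # weight 900
    "mid": (5, 10),    # weight 50
    "cold": (50, 1),   # weight 50
    "dead": (20, 0),   # weight 0, never executed
}
```

On this profile a threshold of 0.8 marks only `loop` as hot, whereas a threshold of 1.0 marks every block that was executed at least once, so only the never-executed `dead` block would remain a candidate for branch function conversion. Raising the threshold therefore shrinks the set of candidates, which matches the trade-off shown in Figure 6.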
total number of instructions, and a 15% increase in the size of the text section of the resulting executables.

The techniques described here apply to a wide variety of architectures. The insertion of partial instructions to confuse disassembly, as discussed in Section 3.2, is applicable to variable-length instruction sets, such as those on the widely used Intel Pentium and Motorola 680x0, as well as the StrongArm (together with the Thumb 16-bit instruction encoding) and other mixed-mode architectures such as the MIPS32/MIPS16. Branch functions and jump table spoofing can be used on any architecture.

5. RELATED WORK

The only work we are aware of that addresses the problem of making executable programs harder to disassemble is by Cohen, who proposes overlapping adjacent instructions to fool a disassembler [7]. We are not aware of any actual implementations of this proposal. We implemented this idea as well as a number of variations on the basic scheme, but found the results disappointing: the resulting confusion factors were typically less than 1%. The reason for this is that in order to overlap two adjacent instructions I and J, we have to satisfy several conditions, among them:

(i) execution cannot fall through from I to J; and

(ii) the trailing k bytes of I must be identical with the leading k bytes of J for some k > 0.

There tend to be relatively few candidates satisfying these criteria (e.g., the largest number of overlaps we achieved was for the gcc benchmark, where we found only 27 overlaps out of 360,152 instructions; by contrast, our approach can use 11,205 candidates before branch flipping and call conversion on this program, and 56,925 candidates after branch flipping and call conversion). Variations on this theme, e.g., by judicious insertion, immediately before the instruction J, of no-ops or dead code that satisfy the second condition above, do not seem to help matters significantly either. This scarcity of candidates for overlapping, together with the self-repairing property of disassembly errors discussed in Section 3.1, results in poor confusion factor numbers using this approach.

There is a considerable body of work on code obfuscation that focuses on making it harder for an attacker to decompile a program and extract high level semantic information from it [3, 12, 20, 26, 27, 28]. Typically, these authors rely on the use of computationally difficult static analysis problems, e.g., involving complex Boolean expressions, pointers, or indirect control flow, to make it harder to construct a precise control flow graph for a program. Of the references cited, only Wroblewski focuses specifically on obfuscation of executable programs [28]. Our work is orthogonal to these proposals, and complementary to them. We aim to make a program harder to disassemble correctly, and to thereby sow uncertainty in an attacker's mind about which portions of a disassembled program have been correctly disassembled and which parts may contain disassembly errors. If the program has already been obfuscated using any of these higher-level obfuscation techniques, our techniques add an additional layer of protection that makes it even harder to decipher the actual structure of the program.

Even greater security may be obtained by maintaining the software in encrypted form and decrypting it as needed during execution, as suggested by Aucsmith [1]; or using specialized hardware, as discussed by Lie et al. [16]. Such approaches have the disadvantages of high performance overhead (in the case of runtime decryption in the absence of specialized hardware support) or a loss of flexibility because the software can no longer be run on stock hardware.

6. CONCLUSIONS

A great deal of software is distributed in the form of executable code. Such code is potentially vulnerable to reverse engineering, in the form of disassembly followed by decompilation. This can allow an attacker to discover vulnerabilities in the software, modify it in unauthorized ways, or steal intellectual property via software piracy. This paper describes and evaluates techniques to make executable programs harder to disassemble. Our techniques are seen to be quite effective: applied to the widely used SPECint-95 benchmark suite, they result in disassembly errors over much of the program; the best commercially available disassembly tool fails to correctly disassemble over 65% of the instructions, and 85% of the functions, in the obfuscated binaries.

Acknowledgements

We are grateful to Christian Collberg and Gregg Townsend for very helpful discussions and comments.

7. REFERENCES

[1] D. Aucsmith. Tamper-resistant software: An implementation. In Information Hiding: First International Workshop: Proceedings, volume 1174 of Lecture Notes in Computer Science, pages 317-333. Springer-Verlag, 1996.
[2] R. L. Bernstein. Producing good code for the case statement. Software: Practice and Experience, 15(10):1021-1024, October 1985.
[3] W. Cho, I. Lee, and S. Park. Against intelligent tampering: Software tamper resistance by extended control flow obfuscation. In Proc. World Multiconference on Systems, Cybernetics, and Informatics. International Institute of Informatics and Systematics, 2001.
[4] C. Cifuentes and K. J. Gough. Decompilation of binary programs. Software: Practice and Experience, 25(7):811-829, July 1995.
[5] C. Cifuentes and M. Van Emmerik. UQBT: Adaptable binary translation at low cost. IEEE Computer, 33(3):60-66, March 2000.
[6] C. Cifuentes and M. Van Emmerik. Recovery of jump table case statements from binary code. Science of Computer Programming, 40(2-3):171-188, July 2001.
[7] F. B. Cohen. Operating system protection through program evolution, 1992. http://all.net/books/IP/evolve.html.
[8] R. S. Cohn, D. W. Goodwin, and P. G. Lowney. Optimizing Alpha executables on Windows NT with Spike. Digital Technical Journal, 9(4):3-20, 1997.
[9] C. Collberg and C. Thomborson. Software watermarking: Models and dynamic embeddings. In Proc. 26th ACM Symposium on Principles of Programming Languages (POPL 1999), pages 311-324, January 1999.
[10] C. Collberg and C. Thomborson. Watermarking, tamper-proofing, and obfuscation tools for software protection. Technical Report TR00-03, The Department of Computer Science, University of Arizona, February 2000.
[11] C. Collberg, C. Thomborson, and D. Low. Breaking abstractions and unstructuring data structures. In Proc. 1998 IEEE International Conference on Computer Languages, pages 28-38.
[12] C. Collberg, C. Thomborson, and D. Low. Manufacturing cheap, resilient, and stealthy opaque constructs. In Proc. 25th ACM Symposium on Principles of Programming Languages (POPL 1998), pages 184-196, January 1998.
[13] DataRescue sa/nv, Liege, Belgium. IDA Pro. http://www.datarescue.com/idabase/.
[14] M. L. Fredman, J. Komlos, and E. Szemeredi. Storing a sparse table with O(1) worst case access time. Journal of the ACM, 31(3):538-544, July 1984.
[15] J. R. Levine. Linkers and Loaders. Morgan Kaufmann Publishers, San Francisco, CA, 2000.
[16] D. Lie, C. Thekkath, M. Mitchell, P. Lincoln, D. Boneh, J. Mitchell, and M. Horowitz. Architectural support for copy and tamper resistant software. In Proc. 9th International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS-IX), pages 168-177, November 2000.
[17] K. Mehlhorn and A. K. Tsakalidis. Data structures. In J. van Leeuwen, editor, Handbook of Theoretical Computer Science, Volume A: Algorithms and Complexity (A), pages 301-341. MIT Press, 1990.
[18] R. Muth, S. K. Debray, S. Watterson, and K. De Bosschere. alto: A link-time optimizer for the Compaq Alpha. Software: Practice and Experience, 31:67-101, January 2001.
[19] Objdump. GNU Manuals Online. GNU Project, Free Software Foundation. http://www.gnu.org/manual/binutils-2.10.1/html_chapter/binutils_4.html.
[20] T. Ogiso, Y. Sakabe, M. Soshi, and A. Miyaji. Software obfuscation on a theoretical basis and its implementation. IEEE Trans. Fundamentals, E86-A(1), January 2003.
[21] B. Schwarz, S. K. Debray, and G. R. Andrews. Plto: A link-time optimizer for the Intel IA-32 architecture. In Proc. 2001 Workshop on Binary Translation (WBT-2001), 2001.
[22] B. Schwarz, S. K. Debray, and G. R. Andrews. Disassembly of executable code revisited. In Proc. IEEE 2002 Working Conference on Reverse Engineering (WCRE), pages 45-54, October 2002.
[23] R. L. Sites, A. Chernoff, M. B. Kirk, M. P. Marks, and S. G. Robinson. Binary translation. Communications of the ACM, 36(2):69-81, February 1993.
[24] A. Srivastava and D. W. Wall. A practical system for intermodule code optimization at link-time. Journal of Programming Languages, 1(1):1-18, March 1993.
[25] H. Theiling. Extracting safe and precise control flow from binaries. In Proc. 7th Conference on Real-Time Computing Systems and Applications, December 2000.
[26] C. Wang, J. Davidson, J. Hill, and J. Knight. Protection of software-based survivability mechanisms. In Proc. International Conference of Dependable Systems and Networks, July 2001.
[27] C. Wang, J. Hill, J. Knight, and J. Davidson. Software tamper resistance: Obstructing static analysis of programs. Technical Report CS-2000-12, December 2000.
[28] G. Wroblewski. General Method of Program Code Obfuscation. PhD thesis, Wroclaw University of Technology, Institute of Engineering Cybernetics, 2002.
