CCS2003 PDF
ABSTRACT
A great deal of software is distributed in the form of executable code. The ability to reverse engineer such executables can create opportunities for theft of intellectual property via software piracy, as well as security breaches by allowing attackers to discover vul-

[Figure residue, labels only: Source Code, parsing, Syntax tree, intermediate code gen. and control flow analysis, Static Disassembly, decompilation]
Figure 3: An example of a C switch statement and its implementation using a jump table
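The jump-table dispatch that Figure 3 refers to can be modeled in a few lines. The following is a minimal sketch, not the code from the figure: the names `in_bounds_unsigned`, `dispatch`, and the case bodies are hypothetical, and a 32-bit register width is assumed. It shows why a single unsigned comparison is enough to check both table bounds.

```python
MASK32 = 0xFFFFFFFF  # assumption: 32-bit registers, as on IA-32


def in_bounds_unsigned(i: int, n: int) -> bool:
    """One unsigned compare: a negative i wraps to a huge unsigned value,
    so (unsigned) i < n holds exactly when 0 <= i < n."""
    return (i & MASK32) < n


def dispatch(i: int, jump_table: list, default):
    """Bounds check, then an indirect 'jump' through the i-th table entry."""
    if not in_bounds_unsigned(i, len(jump_table)):
        return default()
    return jump_table[i]()


# Hypothetical case bodies standing in for the code addresses in the table.
table = [lambda: "case 0", lambda: "case 1", lambda: "case 2"]
default = lambda: "default"
```

A disassembler that recovers the table base and the bound `len(jump_table)` from this pattern can enumerate all targets, which is the property that jump table recovery relies on.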
    startAddr := program entry point;
    endAddr := startAddr + text section size;
    DisasmRec(startAddr);
end

Variations on this basic approach to disassembly, which we term recursive traversal, are used by a number of binary translation and optimization systems [4, 23, 25].

A virtue of this algorithm is that, by following the control flow behavior of the program being processed, it is able to go around and thus avoid disassembly of data embedded in the text section. Its main weakness is that its key assumption, that we can precisely identify the set of control flow successors of each control transfer operation in the program, may not always hold in the case of indirect jumps. Imprecision in determining the set of possible targets of such a jump will result either in a failure to disassemble some reachable code (if the set of targets is underestimated) or erroneous disassembly of data (if the set of targets is overestimated).

Some researchers have proposed ad hoc extensions to the basic algorithm outlined above to handle common cases of indirect jumps. As an example, one of the most common uses of indirect jumps involves jump tables, a construct used by compilers to implement C-style switch statements [2]. This is illustrated in Figure 3. The jump table itself is a contiguous array of N code addresses, corresponding to the N cases in the switch statement. The code to access the jump table evaluates the index expression i; checks to see whether this expression falls within the bounds of the table, i.e., whether $0 \le i < N$; adds the scaled value of the index expression to the base address of the table to obtain the address of the ith entry in the table; then jumps indirectly through this location. The check of whether the index expression falls within the table bounds can be accomplished using a single unsigned comparison (denoted by the subscript u on the comparison in instruction (2) in Figure 3(b)) [2]. To determine the possible target addresses of an indirect jump through a jump table, a disassembler needs to know the base address of the table and its extent, i.e., the values of BaseAddr and N in Figure 3(b). This can be done by scanning back from the indirect jump instruction to find the instruction that adds the scaled index to the base address (instruction (4) in Figure 3(b)), whence the base address can be extracted; and the unsigned compare of the index (instruction (2) in Figure 3(b)), whence the table size can be determined. Once this has been done, disassembly can continue at each target identified from the N table entries starting at location BaseAddr [6].

Code that is reachable only through indirect control transfers may not be found using the basic algorithm above. To handle this problem, some systems, e.g., the UQBT binary translation system [5], resort to speculative disassembly. The idea is to process undisassembled portions of the text segment that appear to be code, in the expectation that they might be the targets of indirect function calls; a speculative bit is set when this is done, and speculative disassembly of a particular region of memory is abandoned if an invalid instruction is encountered.

3. THWARTING DISASSEMBLY
In order to thwart a disassembler, we have to somehow confuse, as much as possible, its notion of where the instruction boundaries in a program lie. This section discusses some ways in which this can be achieved. We first discuss a phenomenon that we had not expected: that of disassembly errors that repair themselves within a relatively short distance. This is followed by a discussion of a general technique we use to inject junk bytes into the instruction stream to introduce disassembly errors. After this we discuss specific details of the way in which this is done to confuse the two disassembly algorithms discussed in the previous section.

3.1 Self-Repairing Disassembly
On some instruction sets (most notably, that of the Intel IA-32 architecture) the instruction structure is such that, very often, the disassembly process is self-repairing: even when a disassembly error occurs (e.g., due to the disassembly of data), the disassembler eventually ends up re-synchronizing with the actual instruction stream. This is illustrated by the example in Figure 4, which shows a typical byte sequence in memory, together with the actual disassembly (on the left), and the disassemblies we obtain if the disassembler is off by 1, 2, or 3 bytes, on the right. When the disassembly is initially off by a single byte, the disassembler produces two erroneous instructions but is back in synchrony with the original disassembly by the second instruction in the actual disassembly sequence. A similar phenomenon occurs when the disassembler is initially off by two bytes: it resynchronizes with the second instruction in the actual disassembly after producing a single incorrectly disassembled instruction. If the disassembler is initially off by three bytes, it generates three incorrectly disassembled instructions but resynchronizes by the third instruction in the actual disassembly.

Obviously, the actual resynchronization behavior on a particular program will depend on its particular distribution of instructions. In practice, however, we have found that disassembly errors usually resynchronize quite quickly, often within just one or two instructions beyond the point at which the disassembly error occurred. Efforts to confuse disassembly have to take this self-repairing aspect of disassembly into account.

3.2 Junk Insertion
[Figure 4 (caption not recovered). Each instruction is listed on the row of its first byte.]

actual                 memory         1 byte off          2 bytes off         3 bytes off
disassembly            bytes (hex)    (synchronizes       (synchronizes       (synchronizes
                                      in 2 instrs.)       in 1 instr.)        in 3 instrs.)
mov 4(%esp), %eax      8b
                       44             inc %esp
                       24             and $4, %al         and $4, %al
                       04                                                     add $3, %al
add 12(%esp), %eax     03
                       44                                                     inc %esp
                       24                                                     and $12, %al
                       0c
                       83
                       06
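The resynchronization behavior in Figure 4 can be reproduced with a toy length decoder. This is a sketch under stated assumptions: `insn_length` covers only the handful of IA-32 opcodes that occur in the first eight bytes of the figure (it is not a general decoder), and the helper names `sweep` and `resync` are hypothetical.

```python
# First eight bytes of the Figure 4 example.
CODE = bytes([0x8B, 0x44, 0x24, 0x04, 0x03, 0x44, 0x24, 0x0C])


def insn_length(code: bytes, pos: int) -> int:
    """Length of the instruction starting at pos (toy subset of IA-32)."""
    op = code[pos]
    if 0x40 <= op <= 0x47:            # inc r32: one byte
        return 1
    if op in (0x04, 0x24):            # add/and $imm8, %al: opcode + imm8
        return 2
    if op in (0x03, 0x8B):            # add/mov r/m32, r32: opcode + ModRM ...
        modrm = code[pos + 1]
        mod, rm = modrm >> 6, modrm & 7
        length = 2
        if mod != 3 and rm == 4:      # ... + SIB byte
            length += 1
        if mod == 1:                  # ... + disp8
            length += 1
        elif mod == 2 or (mod == 0 and rm == 5):
            length += 4               # ... + disp32
        return length
    raise ValueError(f"opcode {op:#04x} not in toy table")


def sweep(code: bytes, start: int) -> list:
    """Linear sweep: instruction start offsets decoded from start."""
    offsets, pos = [], start
    while pos < len(code):
        offsets.append(pos)
        pos += insn_length(code, pos)
    return offsets


def resync(code: bytes, off: int) -> tuple:
    """(number of wrong instructions, offset where decoding rejoins the true stream)."""
    actual = set(sweep(code, 0)) | {len(code)}
    pos, wrong = off, 0
    while pos not in actual:
        pos += insn_length(code, pos)
        wrong += 1
    return wrong, pos
```

Starting one, two, or three bytes into the first instruction yields two, one, and three erroneous instructions respectively, matching the column headings of the figure.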
We can introduce disassembly errors by inserting junk bytes at selected locations in the instruction stream where the disassembler is likely to expect code. (An alternative approach involves partially or fully overlapping instructions, e.g., see [7]: this is discussed in Section 5.) It is not difficult to see that any such junk bytes must satisfy two properties. First, in order to actually confuse the disassembler, the junk bytes must be partial instructions, not complete instructions. Second, in order to preserve program semantics, such partial instructions must be inserted in such a way that they are unreachable at runtime. To this end, define a basic block as a candidate block if it can have such junk bytes inserted before it. In order to ensure that any junk so inserted is unreachable during execution, a candidate basic block cannot have execution fall through into it. In other words, the basic block immediately before a candidate block must end in an unconditional control transfer, e.g., an unconditional jump or a return from a function. Candidate blocks can be identified in a straightforward way by scanning the basic blocks of the program after their final memory layout has been determined.

As mentioned in Section 3.1, the static disassembly process very often manages to re-synchronize itself after a disassembly error. Once a candidate block B has been identified, we have to determine what junk bytes to insert before it so as to confuse the disassembler as much as possible, i.e., delay this re-synchronization for as long as possible. To do this, we take a particular n-byte instruction I (our current implementation considers a 6-byte bitwise-OR instruction, but it is easy to extend this to other instructions), and determine how far away this re-synchronization would occur if the first k bytes of I were to be inserted immediately before the candidate block B, for each k, $0 < k < n$. To determine the re-synchronization point, for each such k we simulate disassembly for the candidate block, assuming that the disassembler encounters the first k bytes of I at the beginning of B, then continuing with the byte sequence comprising the machine-level encodings of the instructions actually in B. Using this approach we determine the value $k_{max}$ of k for which the re-synchronization distance is maximized, and insert the first $k_{max}$ bytes of I immediately before block B.

3.3 Thwarting Linear Sweep
As observed in Section 2.1, linear sweep disassembly is generally unable to distinguish data embedded in the text section. We can exploit this weakness by inserting junk bytes at selected locations in the instruction stream, as discussed in Section 3.2. One point to note here is that since the simulation of disassembly scans forward from each candidate to determine the number of junk bytes to be inserted there, it is important to ensure that such decisions made for one candidate are not subsequently invalidated by the insertion of junk into subsequent candidates. To avoid such effects, we consider candidate blocks in reverse order when inserting junk.

With the approach described thus far, we find that we are typically able to attain a confusion factor of about 15% on average, i.e., 15% of the instructions in a program are incorrectly disassembled (confusion factors are discussed in more detail in Section 4). The reason that it is not higher is that candidates for the insertion of junk bytes cannot have execution fall through into them: the preceding block has to end in an unconditional control transfer. We have found that, in programs obtained from a typical optimizing compiler, candidate blocks tend to be around 30 instructions apart on average (see footnote 2). This distance, combined with the self-repairing nature of disassembly, means that when disassembly goes wrong after the insertion of junk before a candidate, it typically manages to re-synchronize before the next candidate is encountered. We increase the number of candidates by a transformation called branch flipping. The idea is to invert the sense of conditional jumps, by converting code of the form

        jcc Addr

where cc represents a condition, e.g., eq or ne, to

        jcc' L
        jmp Addr
    L:

where cc' is the complementary condition to cc, e.g., a beq ... is converted to a bne .... The basic block at L now becomes a candidate. With this transformation, the distance between candidate blocks drops to about 12 instructions on average, and the confusion factor rises to about 37%. Yet another measure that can be taken to increase candidates for junk insertions is call conversion, which raises instruction confusion to about 42%. This method is discussed in more detail in Section 3.4.2.

Footnote 2: These data reflect the SPECint-95 benchmark suite compiled using gcc at optimization level -O3. The averages given here were computed as geometric means.
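The junk selection procedure of Section 3.2 (simulate disassembly for each proper prefix of an instruction I and keep the prefix that delays re-synchronization the longest) can be sketched as follows. This is an illustration under loud assumptions, not the implementation used in the paper: it uses a hypothetical toy encoding in which the low two bits of an opcode byte give the instruction length, it measures the re-synchronization distance in bytes into the block, and all function names are invented for the example.

```python
def insn_length(opcode: int) -> int:
    """Toy ISA (assumption): low two bits of the first byte encode length 1..4."""
    return (opcode & 0x03) + 1


def boundaries(code: bytes) -> set:
    """True instruction start offsets of code, decoded from offset 0."""
    out, pos = set(), 0
    while pos < len(code):
        out.add(pos)
        pos += insn_length(code[pos])
    return out


def resync_distance(block: bytes, junk: bytes) -> int:
    """Byte offset into block at which a disassembler that starts decoding
    at the junk rejoins the block's true instruction stream."""
    actual = boundaries(block) | {len(block)}
    stream = junk + block
    pos = 0
    while True:
        rel = pos - len(junk)            # position relative to start of block
        if rel >= 0 and rel in actual:
            return rel
        if pos >= len(stream):           # decoded past the end without rejoining
            return len(block)
        pos += insn_length(stream[pos])


def best_junk_prefix(block: bytes, insn: bytes) -> tuple:
    """Try every proper prefix insn[:k], 0 < k < n; return (k_max, distance)."""
    best_k, best_d = 0, 0
    for k in range(1, len(insn)):        # proper prefixes only: partial instructions
        d = resync_distance(block, insn[:k])
        if d > best_d:
            best_k, best_d = k, d
    return best_k, best_d


# Hypothetical candidate block and 4-byte junk source instruction.
BLOCK = bytes([1, 0, 2, 1, 1, 0, 0, 0])
INSN = bytes([3, 2, 1, 0])
```

For this toy input, inserting only the first byte of `INSN` delays re-synchronization furthest. Because each decision depends on the bytes that follow the candidate, applying the procedure to candidates in reverse layout order, as the text describes, keeps earlier decisions valid.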
3.4 Thwarting Recursive Traversal
The main strength of the recursive disassembly algorithm, its ability to deal intelligently with control flow and thereby disassemble around data embedded in the text segment, also turns out to be a weakness that we can take advantage of to confuse the disassembly process. There are two (related) aspects of recursive traversal that we can exploit. The first is that when it encounters a control transfer, disassembly continues at those locations that are deemed to be the possible control transfer targets. In this context, disassemblers typically assume that commonly encountered control transfers, such as conditional branches and function calls, behave reasonably. For example, a conditional branch is assumed to have two possible targets: the branch target and the fall through to the next instruction. Similarly, a function call is assumed to return to the instruction immediately following the call instruction.

The second aspect of recursive traversal is that identifying the set of possible targets of indirect control transfers is difficult. Recursive traversal disassemblers therefore generally resort to ad hoc techniques, such as examining bounds checks associated with jump tables, or disassembling speculatively, to handle commonly encountered situations involving indirect jumps.

Below we discuss different ways in which these characteristics can be exploited to confuse recursive traversal disassembly.

3.4.1 Branch Functions
The assumption that a function returns to the instruction following the call instruction can be exploited using what we term branch functions. The idea is illustrated in Figure 5. Given a finite map over locations in a program

    $\varphi = \{a_1 \mapsto b_1, \ldots, a_n \mapsto b_n\}$

a branch function $f_\varphi$ is a function that, whenever it is called from one of the locations $a_i$, causes control to be transferred to the corresponding location $b_i$, $1 \le i \le n$. Given such a branch function $f_\varphi$, we can replace n unconditional branches in a program,

    a_1: jmp b_1
    ...
    a_2: jmp b_2
    ...
    a_n: jmp b_n

[Figure 5 (caption not recovered); residue shows a_n: jmp b_n, a_n: call f, and the target b_n]

[...]

the computation of the target address $b_i$ within the branch function, we can make it difficult for an attacker to reconstruct the original map $\varphi$ it realizes. The second is to create opportunities for misleading a disassembler: since a disassembler will typically continue disassembly at the instruction following the call instruction, we can introduce errors in the disassembly by inserting junk bytes at the point immediately after each call $f_\varphi$ instruction in a manner similar to that discussed in Section 3.3.

Branch functions can be implemented in a number of ways. For example, a straightforward implementation might use the return address to look up a table, via a simple linear or binary search, to determine the target address. The disadvantage with such schemes is that they are relatively straightforward to reverse engineer.

A more sophisticated implementation of branch functions has the caller pass, as an argument to the branch function, the offset from the instruction immediately after it (whose address is passed to the branch function as the return address) to the target $b_i$. The branch function simply adds the value of its argument to the return address, so that the return address becomes the address of the original target $b_i$. The code for this, on the Intel IA-32 architecture, might be as follows (footnote 4):

    xchg %eax, 0(%esp)    # I1
    add %eax, 8(%eax)     # I2
    pop %eax              # I3
    ret                   # I4

Instruction I1 exchanges the contents of register %eax with the word at the top of the stack, effectively saving the contents of %eax and at the same time loading the displacement to the target (passed to the branch function as an argument on the stack) into %eax. Instruction I2 then has the effect of adding this displacement to the return address. I3 restores the previously saved value of %eax, and I4 then has the effect of branching to the address computed by the function.

Our current implementation uses a variation on this idea that uses perfect hashing [14, 17] and is, we believe, harder to reverse engineer. Once the final code layout has been determined and we know the mapping $\varphi = \{a_1 \mapsto b_1, \ldots, a_n \mapsto b_n\}$ we want the branch function to implement, we create a perfect hash function

    $h_\varphi : \{a_1, \ldots, a_n\} \to \{1, \ldots, n\}$
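The two simpler branch function schemes described in Section 3.4.1 (table lookup keyed on the return address, and adding a caller-supplied displacement to the return address) can be modeled at the level of address arithmetic. The sketch below is illustrative only: the addresses are made up, the helper names are hypothetical, and it assumes that each location a_i holds a 5-byte direct call (opcode E8 plus a 32-bit displacement on IA-32). It does not model the stack manipulation of instructions I1 to I4 or the perfect-hash variant.

```python
import bisect

MASK32 = 0xFFFFFFFF
CALL_LEN = 5  # assumption: `call rel32` at each a_i occupies 5 bytes


def make_call_sites(mapping: dict) -> dict:
    """For each a_i -> b_i, compute (return address, displacement argument)."""
    sites = {}
    for a, b in mapping.items():
        ret = a + CALL_LEN
        sites[a] = (ret, b - ret)
    return sites


def offset_branch_function(ret_addr: int, displacement: int) -> int:
    """Offset scheme: the new return address is ret_addr + displacement."""
    return (ret_addr + displacement) & MASK32


def table_branch_function(ret_addr: int, rets: list, targets: list) -> int:
    """Table scheme: binary search on sorted return addresses."""
    i = bisect.bisect_left(rets, ret_addr)
    if i == len(rets) or rets[i] != ret_addr:
        raise KeyError(f"unknown call site {ret_addr:#x}")
    return targets[i]


# Hypothetical map {a_i -> b_i}; the targets may lie before or after the call sites.
PHI = {0x1000: 0x2040, 0x1010: 0x0F00, 0x1020: 0x1800}
SITES = make_call_sites(PHI)
RETS = sorted(ret for ret, _ in SITES.values())
TARGETS = [PHI[ret - CALL_LEN] for ret in RETS]
```

In both schemes the disassembler sees only `call f` at each a_i, so the bytes after the call look like a fall-through continuation even though control never returns there.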
The complexity of a branch function's implementation, and the way in which it is accessed, offer an interesting tradeoff between execution speed, on the one hand, and difficulty of reverse engineering, on the other. For example, we can choose different branch function implementations for jump instructions depending on their execution frequencies: frequently executed jump instructions might be directed to a lightweight branch function, less frequently executed ones to a more complex branch function, and so on.

3.4.2 Call Conversion
A variation of the branch function scheme described in Section 3.4.1 can be used to extend the candidates for junk insertions to include those basic blocks directly following call instructions as well. Recall that the reason junk bytes can typically not be inserted after call instructions is that control returns to the address directly after the last byte of a call instruction upon completion of an invoked function. This being the case, if junk bytes were inserted after a well-behaved call instruction then it would be possible for control to reach the junk bytes and therefore violate the constraints of junk insertions described in Section 3.2.

One solution to this problem is to reroute call instructions through a specialized branch function that branches to the intended target function via perfect hashing, as in the standard branch function, but then returns to some predetermined offset from the original call instruction (i.e., the offset to the real successor instruction that lies beyond some number of junk bytes). Using this method we are able to obscure control flow information by making function entry points more difficult to decipher while also increasing the potential to mislead the disassembler.

3.4.3 Opaque Predicates
The assumption that a conditional branch has two possible targets can be exploited by disguising an unconditional branch as a conditional branch that happens, at runtime, to always go in one direction, i.e., either it is always taken, and never falls through; or it is never taken, and always falls through. This technique relies on using predicates that always evaluate to either the constant true or the constant false, regardless of the values of their inputs: such predicates are known as opaque predicates [12]. Other researchers have discussed techniques for synthesizing opaque predicates; their ideas translate in a straightforward way to our context, so we do not discuss this issue further.

Once an unconditional branch has been replaced by a conditional branch that uses an opaque predicate, we have a location (either the branch target or the fall through, depending on whether the opaque predicate is always false or always true) that appears to be a legitimate continuation for execution from the conditional branch but, in fact, is not. We can then insert junk bytes at this point, as discussed earlier, to mislead the disassembly.

3.4.4 Jump Table Spoofing
In addition to simply inserting junk bytes at the fake target of an opaquely directed conditional branch, we can also insert artificial jump tables to mislead recursive traversal disassembly. We refer to this technique as jump table spoofing.

Recall that, as discussed in Section 2.2, recursive traversal disassemblers may attempt to use the bounds check for a jump table to identify its size, and thereby determine the set of possible targets of an indirect jump through a jump table. We can exploit this to mislead a disassembler by introducing a jump table that is unreachable at runtime. The jump table code can be made unreachable either by using a conditional branch that uses an opaque predicate to jump around it, or by using an opaque expression (which is very similar to an opaque predicate, except that the value need not be simply a truth value) whose value is guaranteed to fail the bounds check. The code addresses in this jump table can now be set to junk addresses (text segment addresses that do not correspond to actual instructions) and thereby cause disassembly errors.

A variation on this idea is to take an unconditional jump to an address and convert it to an indirect jump through a jump table where the address appears as the kth table entry. The table is indexed by the value of an opaque expression that always evaluates to k. However, the bounds check for the table uses a table size $m \ge k$, leading the disassembler to believe that the jump table contains m entries. Only one of these m entries, namely the kth entry, contains a real code address: the other entries contain junk addresses.

3.5 Implementation Status
We have implemented our ideas using PLTO, a binary rewriting system developed for Intel IA-32 executables [21]. The system reads in statically linked executables (see footnote 5), disassembles the input binary, and constructs a control flow graph. This control flow graph is then processed in one of two ways. If the user specifies that profiling is to be carried out, instrumentation code is inserted to generate an edge profile when the resulting binary is executed. If, on the other hand, the user requests obfuscating transformations to be carried out, the system reads in edge profile information if available, carries out branch flipping to increase the number of candidate blocks (Section 3.3), applies various obfuscating transformations, and writes out the resulting executable.

The transformations currently implemented in the system are junk insertion (Section 3.2) and transformation of unconditional jumps and call instructions to the respective branch function calls (Sections 3.4.1 and 3.4.2). We expect to have additional transformations, such as jump table spoofing (Section 3.4.4), implemented in the near future.

Footnote 5: The requirement for statically linked executables is a result of the fact that PLTO relies on the presence of relocation information to distinguish addresses from data. The Unix linker ld refuses to retain relocation information for executables that are not statically linked.

[Figure 6: Effect of hot code threshold on branch function conversion and execution speed. Two bar charts over the programs compress, gcc, go, ijpeg, li, m88ksim, perl, vortex, and Mean, with one bar per threshold (0.70, 0.80, 0.90, 0.95, 1.00). The first plots the fraction of candidates converted (axis 0.75 to 1.00); the second has axis ticks from 1.0 to 4.0.]

4. EXPERIMENTAL EVALUATION
We evaluated the efficacy of our techniques using the SPECint-95 benchmark suite. Our experiments were run on an otherwise unloaded 2.4 GHz Pentium IV system with 1 GB of main memory running RedHat Linux 8.0. The programs were compiled with gcc version egcs-2.91.66 at optimization level -O3. The programs were profiled using the SPEC training inputs and these profiles were used to identify any hot spots during our transformations. The final performance of the transformed programs was then evaluated using the SPEC reference inputs. Each execution time reported was derived by running seven trials, removing the highest and lowest times from the sampling, and averaging the remaining five.

We experimented with three different attack disassemblers to evaluate our techniques. The first of these is the GNU objdump utility, which employs a straightforward linear sweep algorithm. The second, which we wrote ourselves, is a recursive disassembler that incorporates a variation of speculative disassembly (see Section 2). In addition we also provide the recursive disassembler with extra information about the address and size of each jump table in the program as well as the start and end address of each function. The results obtained from this disassembler therefore serve as a lower bound estimate of the extent of obfuscation achieved. Our third disassembler is IDA Pro [13], a commercially available disassembly tool that is generally regarded to be among the best disassemblers available.

For each of these, the efficacy of obfuscation was measured by computing confusion factors for the instructions, basic blocks, and functions. Intuitively, the confusion factor measures the fraction of program units (instructions, basic blocks, or functions) in the obfuscated code that were incorrectly identified by a disassembler. More formally, let A be the set of all actual instruction addresses, i.e., those that would be encountered when the program is executed, and P the set of all perceived instruction addresses, i.e., those addresses produced by a static disassembly. $A - P$ is the set of addresses that are not correctly identified as instruction addresses by the disassembler. We define the confusion factor CF to be the fraction of instruction addresses that the disassembler fails to identify correctly (see footnote 6):

    $CF = |A - P| \, / \, |A|$.

Confusion factors for functions and basic blocks are calculated analogously: a basic block or function is counted as being incorrectly disassembled if any of the instructions in it is incorrectly disassembled. The reason for computing confusion factors for basic blocks and functions as well as for instructions is to determine whether the errors in disassembling instructions are clustered in a small region of the code, or whether they are distributed over significant portions of the program.

As mentioned in Section 3.4.1, we transform jumps to branch function calls only if the jump does not occur in a hot basic block. The first questions we have to address, therefore, are: how are hot basic blocks identified, and what is the effect of different choices of what constitutes a hot block on the extent of obfuscation achieved and the performance of the resulting code? To identify the hot, or

Footnote 6: We also considered taking into account the set $P - A$ of addresses that are erroneously identified as instruction addresses by the disassembler, but rejected this approach because it double counts the effects of disassembly errors.
Confusion factor (%)

              LINEAR SWEEP (OBJDUMP)                   RECURSIVE TRAVERSAL                      COMMERCIAL (IDA PRO)
PROGRAM       Instructions  Basic blocks  Functions    Instructions  Basic blocks  Functions    Instructions  Basic blocks  Functions
compress95    43.93         63.68         100.00       30.04         40.42         75.98        75.81         91.53         87.37
gcc           34.46         53.34         99.53        17.82         26.73         72.80        54.91         68.78         82.87
go            33.92         51.73         99.76        21.88         30.98         60.56        56.99         70.94         75.12
ijpeg         39.18         60.83         99.75        25.77         38.04         69.99        68.54         85.77         83.94
li            43.35         63.69         99.88        27.22         38.23         76.77        70.93         87.88         84.91
m88ksim       41.58         62.87         99.73        24.34         35.72         77.16        70.44         87.16         87.16
perl          42.34         63.43         99.75        27.99         39.82         76.18        68.64         84.62         87.13
vortex        33.98         55.16         99.65        23.03         35.61         86.00        57.35         74.55         91.29
Geo. mean     39.09         59.34         99.75        24.76         35.69         74.43        65.45         81.40         84.97
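The confusion factor definition from Section 4, CF = |A - P| / |A|, and its block-level analogue translate directly into set operations. The following is a minimal sketch with made-up addresses; the function names and the sample data are hypothetical and are not taken from the experiments.

```python
def confusion_factor(actual, perceived) -> float:
    """Fraction of actual instruction addresses missing from the disassembly."""
    actual = set(actual)
    return len(actual - set(perceived)) / len(actual)


def unit_confusion_factor(units, perceived) -> float:
    """Block- or function-level variant: a unit (a list of its instruction
    addresses) counts as wrong if any of its instructions is missed."""
    perceived = set(perceived)
    wrong = sum(1 for insns in units if not set(insns) <= perceived)
    return wrong / len(units)


# Hypothetical example: five real instructions grouped into three basic blocks.
ACTUAL = {0, 4, 8, 10, 13}
PERCEIVED = {0, 1, 2, 4, 8, 11, 13}   # address 10 was missed by the disassembler
BLOCKS = [[0, 4], [8, 10], [13]]
```

Note that the spurious perceived addresses 1, 2, and 11 do not affect the result, which matches the decision in footnote 6 not to count the set P - A.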
91% of the candidates are converted at $\theta = 1.0$. However, as illustrated in Figure 6(b), varying the hot code threshold has a significant effect on execution speed. For example, at $\theta = 0.70$ the programs slow down by a factor of 3.67 on average, with the li benchmark experiencing the largest slowdown, by a factor of 5.14. However, as $\theta$ is increased the slowdown drops off quickly, to a factor of 3.14 at $\theta = 0.9$ and 1.62 at $\theta = 1.0$. In summary, choosing a threshold of 1.0 still results in most of the candidate blocks in the program being converted to branch function calls without excessive performance penalty. For the purposes of this paper, therefore, we give measurements for $\theta = 1.0$.

Figure 7 shows the efficacy of our obfuscation transformations for both of the disassembly methods discussed in Section 2. The confusion factors achieved for linear sweep disassembly are quite modest: on average, 39% of the instructions, 59% of the basic blocks, and nearly 100% of the functions are incorrectly disassembled. For recursive traversal, the confusion factors are somewhat lower because in this case the disassembler can understand and deal with control flow somewhat better than with linear sweep and as a

[...]

The results of this experiment are reported in Figure 7. It can be seen that this tool fails on most of the program: close to 65% of the instructions, and about 85% of the functions in the program, are disassembled incorrectly. Part of the reason for this high degree of failure is that IDA Pro only disassembles addresses that (it believes) can be guaranteed to be instruction addresses. This has two effects: first, large portions of the code that are reached by branch function addresses are simply not disassembled, being presented instead to the user as a jumble of hex data; and second, the location immediately following a branch function call is treated as an address to which control returns, and this causes some junk bytes to be erroneously disassembled. Overall, this shows that our techniques are effective even against state-of-the-art disassembly tools.

Finally, Figure 9 shows the impact of obfuscation on code size, both in terms of the number of instructions (which increases, for example, due to branch flipping), as well as the number of bytes occupied by the text section. The latter includes the effects of the new instructions inserted as well as all junk bytes added to the program. Overall, it can be seen that there is a 20% increase in the
              NO. OF INSTRUCTIONS                                 TEXT SECTION SIZE (BYTES)
PROGRAM       Original (I0)  Obfuscated (I1)  Change (I1/I0)      Original (S0)  Obfuscated (S1)  Change (S1/S0)
compress95    74787          92137            1.231               265985         311095           1.169
gcc           327133         387289           1.183               1128273        1290419          1.143
go            124424         145953           1.173               468537         525232           1.121
ijpeg         105766         127012           1.200               363169         419535           1.155
li            89309          109652           1.227               310301         363801           1.172
m88ksim       104211         127358           1.222               368798         430845           1.168
perl          137947         169054           1.225               484194         566935           1.170
vortex        174960         204230           1.167               592076         672795           1.136
Geo. mean                                     1.204                                               1.154
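The "Geo. mean" row of the code size table can be recomputed from the per-program counts. The sketch below copies the original and obfuscated values from the table above; the helper name `geometric_mean` and the variable names are hypothetical.

```python
import math


def geometric_mean(values) -> float:
    """n-th root of the product, computed via logarithms."""
    values = list(values)
    return math.exp(sum(math.log(v) for v in values) / len(values))


# (original, obfuscated) pairs copied from the table above.
INSTRUCTIONS = [
    (74787, 92137), (327133, 387289), (124424, 145953), (105766, 127012),
    (89309, 109652), (104211, 127358), (137947, 169054), (174960, 204230),
]
TEXT_BYTES = [
    (265985, 311095), (1128273, 1290419), (468537, 525232), (363169, 419535),
    (310301, 363801), (368798, 430845), (484194, 566935), (592076, 672795),
]

instr_change = geometric_mean(obf / orig for orig, obf in INSTRUCTIONS)
text_change = geometric_mean(obf / orig for orig, obf in TEXT_BYTES)
```

Both values should agree with the reported means of 1.204 and 1.154 up to rounding.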