Stitching The Gadgets: On The Ineffectiveness of Coarse-Grained Control-Flow Integrity Protection
Abstract
Return-oriented programming (ROP) offers a robust attack technique that has, not surprisingly, been extensively used to exploit bugs in modern software programs (e.g., web browsers and PDF readers). ROP attacks require no code injection, and have already been shown to be powerful enough to bypass fine-grained memory randomization (ASLR) defenses. To counter this ingenious attack strategy, several proposals for enforcement of (coarse-grained) control-flow integrity (CFI) have emerged. The key argument put forth by these works is that coarse-grained CFI policies are sufficient to prevent ROP attacks. As this reasoning has gained traction, ideas put forth in these proposals have even been incorporated into coarse-grained CFI defenses in widely adopted tools (e.g., Microsoft’s EMET framework).
In this paper, we provide the first comprehensive security analysis of various CFI solutions (covering kBouncer, ROPecker, CFI for COTS binaries, ROPGuard, and Microsoft EMET 4.1). A key contribution is in demonstrating that these techniques can be effectively undermined, even under weak adversarial assumptions. More specifically, we show that with bare minimum assumptions, Turing-complete and real-world ROP attacks can still be launched even when the strictest of enforcement policies is in use. To do so, we introduce several new ROP attack primitives, and demonstrate the practicality of our approach by transforming existing real-world exploits into more stealthy attacks that bypass coarse-grained CFI defenses.
1 Introduction
Today, runtime attacks remain one of the most prevalent attack vectors against software programs. The continued success of these attacks can be attributed to the fact that large portions of software programs are implemented in type-unsafe languages (C, C++, or Objective-C) that do not enforce bounds checking on data inputs. Moreover, even type-safe languages (e.g., Java) rely on interpreters (e.g., the Java virtual machine) that are in turn implemented in type-unsafe languages.
Sadly, as modern compilers and applications become more and more complex, memory errors and vulnerabilities will likely continue to persist, with little end in sight [41]. The most prominent example of a memory error is the stack overflow vulnerability, where the adversary overflows a local buffer on the stack and overwrites a function’s return address [4]. While today’s defenses protect against this attack strategy (e.g., by using stack canaries [15]), other avenues for exploitation exist, including those that leverage heap [33], format string [21], or integer overflow [6] vulnerabilities.
Regardless of the attacker’s method of choice, exploiting a vulnerability and gaining control over an application’s control flow is only the first step of a runtime attack. The second step is to launch malicious program actions. Traditionally, this has been realized by injecting malicious code into the application’s address space, and later executing the injected code. However, with the widespread enforcement of the non-executable memory principle (called data execution prevention in Windows), such attacks are more difficult to do today [28]. Unfortunately, the long-held assumption that only newly injected code bore risks was shattered with the introduction of code reuse attacks, such as return-into-libc [30, 37] and return-oriented programming (ROP) [35]. As the name implies, code reuse attacks do not require any code injection and instead use code already resident in memory.
One of the most promising defense mechanisms against such runtime attacks is the enforcement of control-flow integrity (CFI) [1, 3]. The main idea of CFI is to derive an application’s control-flow graph (CFG) prior to execution, and then monitor its runtime behavior to ensure that the control flow follows a legitimate path of the CFG. Any deviation from the CFG leads to a CFI exception and subsequent termination of the application.
Although CFI requires no source code of an application, it suffers from practical limitations that impede
all recent coarse-grained CFI solutions we are aware of claim that their relaxed policies are sufficient to thwart ROP attacks.¹ In particular, they claim that the property of Turing-completeness is lost because the code base that an adversary can exploit is significantly reduced. Yet, to date, no evidence substantiating these assertions has been given, raising questions with regard to the true effectiveness of these solutions.
Contribution. We revisit the assumption that coarse-grained CFI offers an effective defense against ROP. For this, we conduct a security analysis of recently proposed CFI solutions, including kBouncer [31], ROPecker [13], CFI for COTS binaries [46], ROPGuard [20], and Microsoft’s EMET tool [29]. In particular, we derive a combined CFI policy that takes, for each indirect branch class (i.e., return, indirect jump, indirect call) and each behavior-based heuristic (e.g., the number of instructions executed between two indirect branches), the most restrictive setting among these policies. Afterwards, we use our combined CFI policy and a weak adversary having access to only a single, commonly used system library to realize a Turing-complete gadget set. The reduced code base mandated that we develop several new return-oriented programming attack gadgets to facilitate our attacks. To demonstrate the power of our attacks, we show how to harden existing real-world exploits against the Windows version of Adobe Reader [26] and mPlayer [10] so that they bypass coarse-grained CFI
Figure 1: Memory snapshot of a ROP Attack (memory layout for a ROP attack: stack pointer (SP), RET ADDR 1, ROP Sequence 1, ROP Sequence 2; each sequence consists of asm_ins instructions ending in RET)
An example ROP payload and a typical memory layout for a ROP attack are shown in Figure 1. Basically, the ROP payload consists of a number of return addresses, each pointing to a short code sequence. These sequences consist of a small number of assembler instructions (denoted in Figure 1 as asm_ins) and traditionally terminate in a return [35] instruction.² The indirect branches are responsible for chaining and executing one ROP sequence after the other.
In addition to return addresses, the adversary writes several data words in memory that are used by the invoked code sequences (usually via stack POP instructions, as shown in ROP Sequence 2). At the beginning of the attack, the stack pointer (SP) points to the first return address of the payload. Once the first sequence has been executed, its final return instruction (RET) advances the stack pointer by one memory word, loads the next return address from the stack, and transfers the control flow to the next code sequence.
The combination of the invoked ROP sequences induces the malicious operations. Typically, these sequences are identified within an (offline) static analysis
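The chaining mechanism described above can be illustrated with a small Python sketch that models the stack as a list of 32-bit words. Each hypothetical gadget is a Python function that may consume data words (as a POP would) and then hands control to whatever return address sits on top of the stack (as its final RET would). The gadget addresses (0x1000, 0x2000, 0x3000) and their behaviors are made up for illustration and do not correspond to any real library.

```python
from typing import Callable, Dict, List

WORD = 4  # bytes per stack word on 32-bit x86


class Machine:
    """Toy 32-bit machine: a word-addressed stack plus two registers."""

    def __init__(self, stack: List[int]):
        self.stack = stack          # stack[i] lives at byte offset i * WORD
        self.esp = 0                # byte offset of the current top of stack
        self.regs: Dict[str, int] = {"eax": 0, "ebx": 0}
        self.trace: List[int] = []  # addresses of the gadgets executed

    def pop(self) -> int:
        value = self.stack[self.esp // WORD]
        self.esp += WORD
        return value


def run_chain(m: Machine, gadgets: Dict[int, Callable[[Machine], None]]) -> None:
    # The hijacked return: load the first return address of the payload.
    target = m.pop()
    while target in gadgets:
        m.trace.append(target)
        gadgets[target](m)   # gadget body (the asm_ins part)
        target = m.pop()     # its final RET loads the next return address
    # A target that is not a known gadget ends the simulation.


# Hypothetical gadgets; the addresses are invented for this example.
def pop_eax(m: Machine) -> None:      # pop eax ; ret
    m.regs["eax"] = m.pop()


def pop_ebx(m: Machine) -> None:      # pop ebx ; ret
    m.regs["ebx"] = m.pop()


def add_eax_ebx(m: Machine) -> None:  # add eax, ebx ; ret
    m.regs["eax"] = (m.regs["eax"] + m.regs["ebx"]) & 0xFFFFFFFF


GADGETS = {0x1000: pop_eax, 0x2000: pop_ebx, 0x3000: add_eax_ebx}

payload = [0x1000, 5,    # RET ADDR 1, data word consumed by pop eax
           0x2000, 37,   # RET ADDR 2, data word consumed by pop ebx
           0x3000,       # RET ADDR 3
           0xDEAD]       # end marker: not a gadget address
m = Machine(payload)
run_chain(m, GADGETS)
print(hex(m.regs["eax"]), [hex(a) for a in m.trace])
```

Running it leaves 0x2a (42) in eax after three gadgets. The payload itself contains no instructions, only return addresses and data words, which is why non-executable memory alone does not stop this style of attack.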
Figure 2: The CFG shepherds control-flow transfers (labels fn1, ra1, ra2; basic blocks BBL 2 and BBL 3; function2(); CFI checks "target = fn1?" on CALL [REG] and "target = ra1?" / "target = ra2?" on RET; legend: intended control flow, non-intended (malicious) control flow)
allows it. Consequently, the quality of protection from control-flow attacks rests squarely on the level of CFG coverage. And that is exactly where recent CFI solutions have deviated (substantially) from the original work, primarily as a means to address performance issues.
Recall that in the original proposal, the CFG was obtained a priori using binary analysis techniques supported by a proprietary framework called Vulcan. Since the CFG is created ahead of time, it is not capable of capturing the dynamic nature of the call stack. That is, with only the CFG at hand, one cannot enforce that functions return to their most recent call site, but only that they re-
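The limitation described above can be seen in a few lines. The Python sketch below contrasts a label-based return check derived from a static CFG with a shadow call stack, a dynamic mechanism brought in here only for comparison. The names fn1, ra1 and ra2 echo Figure 2, but the CFG itself is a hypothetical example: fn1 is assumed to be called from two sites, so the static CFG lets its RET go to either return label.

```python
from typing import Dict, List, Set

# Hypothetical static CFG: fn1 is called from two call sites, so its RET
# may legally target either return label.
CFG_RET_TARGETS: Dict[str, Set[str]] = {"fn1": {"ra1", "ra2"}}


def static_cfi_ret_ok(function: str, target: str) -> bool:
    """Label-based check: is the target any return label the CFG allows?"""
    return target in CFG_RET_TARGETS.get(function, set())


class ShadowStack:
    """Dynamic check: a RET must go back to the most recent call site."""

    def __init__(self) -> None:
        self._stack: List[str] = []

    def on_call(self, return_label: str) -> None:
        self._stack.append(return_label)

    def on_ret(self, target: str) -> bool:
        return bool(self._stack) and self._stack.pop() == target


shadow = ShadowStack()
shadow.on_call("ra1")       # fn1 is called from the site labelled ra1 ...
hijacked_target = "ra2"     # ... but a corrupted return address points to ra2
print(static_cfi_ret_ok("fn1", hijacked_target))  # True: the CFG alone allows it
print(shadow.on_ret(hijacked_target))             # False: not the most recent call site
```

The static check accepts the hijacked return because ra2 is a legitimate return label for fn1 somewhere in the program; only the runtime call history reveals that the pending call was made at ra1.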
As noted above, a number of new control-flow integrity (CFI) solutions have recently been proposed to address the challenges of good runtime performance, high robustness, and ease of deployment. The most prominent examples include kBouncer [31], ROPecker [13], CFI for COTS binaries [46], and ROPGuard [20]. To aid in better understanding the strengths and limitations of these proposals, we first provide a taxonomy of the various CFI policies embodied in these works. Later, to strengthen our own analyses, we also derive a combined CFI policy that takes into account the most restrictive CFI policy.
3.1 CFI Policies
Table 1 summarizes the five CFI policies we use throughout this paper to analyze the effectiveness of coarse-grained CFI solutions. Specifically, we distinguish between three types of policies, namely policies used for indirect branch instructions, general CFI heuristics that do not provide well-founded control-flow checks but instead try to capture general machine state patterns of ROP attacks, and a policy class that covers the time CFI checks are enforced.
We believe this categorization covers the most important aspects of CFI-based defenses suggested to date. First, it covers policies for each indirect branch the processor supports, since all control-flow attacks (including ROP) require exploiting indirect branches. Second, heuristics are used by several coarse-grained CFI approaches (e.g., [20, 31]) to allow more relaxed CFI
2 – Behavior-Based Heuristics (HEU). Apart from enforcing specific policies on indirect branch instructions, CFI solutions can also validate other program behavior to detect ROP attacks. One prominent example is the number of instructions executed between two consecutive indirect branches. The expectation is that the number of such instructions will be low (compared to ordinary execution) because ROP attacks invoke a chain of short code sequences, each terminating in an indirect branch instruction.
3 – Time of CFI Check (TOC). Abadi et al. argued that a CFI validation routine should be invoked whenever the program issues an indirect branch instruction [3]. In practice, however, doing so induces significant performance overhead. For that reason, some of the more recent CFI approaches reduce the number of runtime checks, and only enforce CFI validation at critical program states, e.g., before a system or API call.
3.2 Instantiation in Recent Proposals
Next, we turn our attention to the specifics of how these policies are implemented in recent CFI mechanisms.
3.2.1 kBouncer
The approach of Pappas et al. [31], called kBouncer, deploys techniques that fall in each of the aforementioned categories. Under category 1, Pappas et al. [31] leverage the x86 model-specific register set called last branch record (LBR). The LBR provides a register set that holds the
Control-Flow Integrity (CFI) Policies | kBouncer [31] | ROPecker [13] | ROPGuard [20] | EMET 4.1 [29] | CFI for COTS binaries [46] | Combined Policy
CFIRET: destination has to be call-preceded
CFIRET: destination can be taken from a code pointer
CFIJMP: destination has to be call-preceded
CFIJMP: destination can be taken from a code pointer
CFICALL: destination can be taken from an exported symbol
CFICALL: destination can be taken from a code pointer
CFIHEU: allow only s consecutive short sequences | s <= 7 | s <= 10 | | | | s <= 7
CFIHEU: where short is defined as n instructions | n <= 20 | n <= 6 | | | | n <= 20
CFITOC: check at every indirect branch
CFITOC: check at critical API functions or system calls
CFITOC: check when leaving sliding code window
Table 2: Policy comparison of coarse-grained CFI solutions. For each solution, the table marks whether a CFI policy is applied and enforced, prohibited (corresponding execution flows would lead to an attack alarm), or not applied/enforced. The combined policy takes the most restrictive setting for each CFI policy.
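The behavior-based heuristic (CFIHEU) in Table 2 can be sketched as a counter over the lengths of executed sequences. The Python model below is a deliberate simplification under stated assumptions: it takes s and n from the CFIHEU rows, treats a sequence as short when it has at most n instructions, raises an alarm once more than s short sequences occur in a row, and ignores when the check is actually performed (CFITOC). The sequence lengths (3 and 25 instructions) are hypothetical.

```python
from typing import Iterable


def heu_alarm(seq_lengths: Iterable[int], s: int = 7, n: int = 20) -> bool:
    """Return True if more than s consecutive short sequences occur.

    A sequence counts as short when it has at most n instructions;
    any longer sequence resets the counter.
    """
    consecutive = 0
    for length in seq_lengths:
        if length <= n:
            consecutive += 1
            if consecutive > s:
                return True
        else:
            consecutive = 0
    return False


chain = [3] * 8                          # eight short gadgets in a row
stitched = [3] * 7 + [25] + [3] * 7      # a long sequence after every seventh gadget
print(heu_alarm(chain))                  # True: exceeds s <= 7
print(heu_alarm(stitched))               # False: the 25-instruction sequence resets the count
print(heu_alarm(stitched, n=40))         # True: with n <= 40, 25 instructions count as short
print(heu_alarm([3] * 11, s=10, n=6))    # True with the ROPecker-style thresholds
```

The last two calls mirror the trade-off discussed in the text: raising n forces the adversary to find even longer sequences, at the cost of more false positives on benign code.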
best range of thresholds for the recommended number of consecutive short sequences (s) with a given sequence length of n <= 20. Their analysis reveals that adjusting the thresholds for s beyond their recommended values is hardly realistic: when every function call was instrumented, 975 false positives were recorded for s <= 8. An alternative is to increase the sequence length n (e.g., setting it to n <= 40). Doing so would require an adversary to find a long sequence of 40 instructions after each seventh short sequence (for s <= 7). However, increasing the threshold for the sequence length will only exacerbate the false positive issue. For this reason, Pappas et al. [31] did not consider sequences consisting of more than 20 instructions as a gadget in their analyses. We provide our own assessment in §5.3.
The approach of Cheng et al. [13], on the other hand, uses different thresholds for s and n than kBouncer. Making the thresholds in ROPecker more conservative (e.g., reducing s and increasing n) will lead to the same false-positive problems as in kBouncer. Moreover, the problem would be worse, since ROPecker performs CFI validation more frequently than kBouncer. Nevertheless, we show that regardless of the specific parameters chosen within the recommended ranges, our attacks render these defenses ineffective in practice (see Section 5).
4 Turing-Complete ROP Gadget Set
We now explore whether or not it is possible to derive a Turing-complete gadget set even when all state-of-the-art coarse-grained CFI protections are enforced. In particular, we desire a gadget set that still allows an adversary to undermine the combined CFI policy (see Table 2).
Assumptions. To be as pragmatic as possible, we assume that the adversary can only leverage the presence of a single shared library to derive the gadget set. This is a very stringent requirement placed on ourselves, since modern programs typically link to dozens of libraries.
Note also that we are not concerned with circumventing other runtime protection mechanisms such as ASLR or stack canaries. The reasons are twofold: first, coarse-grained CFI protection approaches do not rely on the presence of other defenses to mitigate code reuse attacks. Second, in contrast to CFI, ASLR and protection mechanisms that defend against code pointer overwrites (e.g., stack canaries, bounds checkers, pointer encryption) do not offer a general defense and, moreover, are typically bypassed in practice. In particular, ASLR is vulnerable to memory disclosure attacks [36, 38]. That said, the attacks and return-oriented programming gadgets we present in the following can also be leveraged to mount memory disclosure attacks in the first stage.
Methodology and Outline. Our analysis is performed primarily on Windows, as it is the most widely deployed desktop operating system today. Specifically, we inspect kernel32.dll (on x86 Windows 7 SP1), an 848 KB system library that exposes Windows API functions and is by default linked to nearly every major Windows process (e.g., Adobe Reader, IE, Firefox, MS Office). It
Table 4: Selected Memory Load and Store Gadgets
We also identified a corresponding memory store gadget on eax. The shown gadget stores eax at the address provided by register esi, which needs to be initialized by a load register gadget beforehand. The gadget has no side effects, since it resets eax (which was stored earlier) and loads new values from the stack into esi (which held the target address) and ebp (our intermediate register).
Given a memory store gadget for eax and the fact that we have already identified register load gadgets for each register, it is sufficient to use the same memory load on eax to load any other register. This is possible because we use the eax load gadget to load the desired value from memory, store it afterwards on the stack, and finally use one of the register load gadgets to load the value into the desired register. Finally, we also identified some convenient memory store gadgets for esi and edi, which only require ebp to hold the target address of the store operation.
Arithmetic and Logical Gadgets. For arithmetic operations we utilize the sequence containing the x86 sub instruction shown in Table 5. This instruction takes the operands from eax and esi and stores the result of the subtraction into eax. Both operands can be loaded by using the register load gadgets (see Table 3). The same gadget can be used to perform an addition: one only needs to load the two’s complement into esi. Based on addition and subtraction, we can realize multiplication and division as well. Unfortunately, logical gadgets are not as commonplace. There is, however, an XOR gadget that takes its operands from eax and edi (see Table 5).
Type | Call-Preceded Sequence (ending in ret)
ADD/SUB | sub eax,esi; pop esi; pop ebp
XOR | xor eax,edi; pop edi; pop esi; pop ebp
Table 5: Arithmetic and Logical Gadgets
Branching Gadgets. We remind the reader that branching in ROP attacks is realized by modifying the stack pointer rather than the instruction pointer [35]. In general, we can distinguish two different types of branches: unconditional and conditional branches. kernel32.dll, for example, offers two variants for an unconditional branch gadget (see Table 6). The first uses the leave instruction to load the stack pointer (esp) with a new address that has been loaded before into our intermediate register ebp. The second variant realizes the unconditional branch by adding a constant offset to esp. Either one suffices for our purposes.
Table 6: Branching Gadgets
Conditional branch gadgets change the stack pointer if and only if a particular condition holds. Because load, store, and arithmetic/logic computation can be conveniently done for eax, we could place the conditional in this register. Unfortunately, because a direct load of esp (that depended on the value of eax) was not readily available, we realized the conditional branch in three steps requiring the invocation of only four ROP sequences. That said, our gadget is still within the constraints for the number of allowable consecutive sequences in the Combined CFI-enforcement Policy (see s <= 7 for CFIHEU in Table 2).
First, we use the conditional branch gadget (see Table 6) to either load 0 or a prepared value into eax. In this sequence, neg eax computes the two’s complement and, more importantly, sets the carry flag to zero if and only if eax was zero beforehand. This is nicely used by the subsequent sbb instruction, which subtracts the register from itself, always yielding zero, but additionally subtracts an extra one if the carry flag is set. Because subtracting one from zero gives 0xFFFFFFFF, the following and instruction masks out either all or none of the bits. Hence, the result in eax will be exactly the contents of [ebp-4] if eax was nonzero, or zero otherwise. One might think that it is very unlikely to find sequences that follow the pattern neg-sbb-and. However, we found 16 sequences in kernel32.dll that follow the same pattern and could have been leveraged for a conditional branch gadget.
We then use the ADD/SUB gadget (see Table 5) to subtract esi from eax so that the latter holds the branch offset for esp. Finally, we move eax into esp using the stack as temporary storage. The STORE(eax) gadget (see Table 4) will store the branch offset on the stack, where pop ebp followed by the unconditional branch 1 gadget loads it into esp.
4.2 Extended Gadget Set
For those readers who have either written or analyzed real-world ROP exploits before, it will be clear that several other gadgets are useful in practice. For example, modern exploits usually invoke several WinAPI functions to perform malicious actions, e.g., launching
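The arithmetic and conditional gadgets of Section 4 rely only on standard x86 flag semantics, which the following Python sketch reproduces on 32-bit values. sub_gadget models the sub eax,esi instruction of the ADD/SUB gadget; adding b then amounts to passing the two’s complement of b as esi. neg_sbb_and models the neg-sbb-and pattern, assuming, as the text implies, that the final and reads its second operand from [ebp-4]; the carry flag is tracked explicitly because NEG clears it only when its operand is zero. The trailing pop instructions of each gadget are omitted, and the input values are arbitrary examples.

```python
MASK = 0xFFFFFFFF  # 32-bit register width


def sub_gadget(eax: int, esi: int) -> int:
    """sub eax, esi (ADD/SUB gadget), computed modulo 2**32."""
    return (eax - esi) & MASK


def twos_complement(x: int) -> int:
    return (-x) & MASK


def neg_sbb_and(eax: int, mem_ebp_minus_4: int) -> int:
    """neg eax ; sbb eax, eax ; and eax, [ebp-4] with CF modelled explicitly."""
    carry = 0 if eax == 0 else 1        # NEG clears CF only if eax was 0
    eax = (-eax) & MASK                 # neg eax
    eax = (eax - eax - carry) & MASK    # sbb eax, eax -> 0 or 0xFFFFFFFF
    return eax & mem_ebp_minus_4        # and eax, [ebp-4]


# Addition through the subtraction gadget.
a, b = 1000, 234
print(sub_gadget(a, twos_complement(b)))   # 1234

# Conditional selection.
print(hex(neg_sbb_and(0, 0x40)))           # 0x0  (condition register was zero)
print(hex(neg_sbb_and(5, 0x40)))           # 0x40 (nonzero selects [ebp-4])
```

The two neg_sbb_and calls show both outcomes of the selection: zero when the condition register is zero, and the contents of [ebp-4] otherwise.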
To overcome this restriction, we make use of an old
[14] T. Chiueh and F.-H. Hsu. RAD: A compile-time solution to buffer overflow attacks. In International Conference on Distributed Computing Systems (ICDCS), 2001.
[15] C. Cowan, C. Pu, D. Maier, H. Hinton, J. Walpole, P. Bakke, S. Beattie, A. Grier, P. Wagle, and Q. Zhang. StackGuard: Automatic adaptive detection and prevention of buffer-overflow attacks. In USENIX Security Symposium, 1998.
[16] J. Criswell, N. Dautenhahn, and V. Adve. KCoFI: Complete control-flow integrity for commodity operating system kernels. In IEEE Symposium on Security and Privacy, Oakland ’14, 2014.
[17] D. Dai Zovi. Practical return-oriented programming. SOURCE Boston, 2010. Presentation. Slides: http://trailofbits.files.wordpress.com/2010/04/practical-rop.pdf.
[18] L. Davi, A. Dmitrienko, M. Egele, T. Fischer, T. Holz, R. Hund, S. Nürnberger, and A.-R. Sadeghi. MoCFI: A framework to mitigate control-flow attacks on smartphones. In Symposium on Network and Distributed System Security (NDSS), 2012.
[19] L. Davi, D. Lehmann, A.-R. Sadeghi, and F. Monrose. Stitching the gadgets: On the ineffectiveness of coarse-grained control-flow integrity protection. Technical Report TUD-CS-2014-0097, Technische Universität Darmstadt, 2014.
[20] I. Fratric. ROPGuard: Runtime prevention of return-oriented programming attacks. http://www.ieee.hr/_download/repository/Ivan_Fratric.pdf, 2012.
[21] gera. Advances in format string exploitation. Phrack Magazine, 59(12), 2002.
[22] E. Göktas, E. Athanasopoulos, H. Bos, and G. Portokalidis. Out of control: Overcoming control-flow integrity. In IEEE Symposium on Security and Privacy, Oakland ’14, 2014.
[23] E. Göktas, E. Athanasopoulos, M. Polychronakis, H. Bos, and G. Portokalidis. Size does matter: Why using gadget-chain length to prevent code-reuse attacks is hard. In USENIX Security Symposium, 2014.
[24] S. Jalayeri. Bypassing EMET 3.5’s ROP mitigations. https://repret.wordpress.com/2012/08/08/bypassing-emet-3-5s-rop-mitigations/, 2012.
[25] D. Jang, Z. Tatlock, and S. Lerner. SAFEDISPATCH: Securing C++ virtual calls from memory corruption attacks. In Symposium on Network and Distributed System Security (NDSS), 2014.
[26] jduck. The latest Adobe exploit and session upgrading. http://bugix-security.blogspot.de/2010/03/adobe-pdf-libtiff-working-exploitcve.html, 2010.
[27] T. Kornau. Return oriented programming for the ARM architecture. Master’s thesis, Ruhr-University Bochum, 2009.
[28] Microsoft. Data Execution Prevention (DEP). http://support.microsoft.com/kb/875352/EN-US/, 2006.
[32] J. Pewny and T. Holz. Compiler-based CFI for iOS. In Annual Computer Security Applications Conference (ACSAC), 2013.
[33] J. Pincus and B. Baker. Beyond stack smashing: Recent advances in exploiting buffer overruns. IEEE Security and Privacy, 2(4), 2004.
[34] F. Schuster, T. Tendyck, J. Pewny, A. Maaß, M. Steegmanns, M. Contag, and T. Holz. Evaluating the effectiveness of current anti-ROP defenses. In Symposium on Recent Advances in Intrusion Detection (RAID), 2014.
[35] H. Shacham. The geometry of innocent flesh on the bone: Return-into-libc without function calls (on the x86). In ACM Conference on Computer and Communications Security (CCS), 2007.
[36] K. Z. Snow, F. Monrose, L. Davi, A. Dmitrienko, C. Liebchen, and A.-R. Sadeghi. Just-in-time code reuse: On the effectiveness of fine-grained address space layout randomization. In IEEE Symposium on Security and Privacy, Oakland ’13, 2013.
[37] Solar Designer. ”return-to-libc” attack. Bugtraq, 1997.
[38] A. Sotirov and M. Dowd. Bypassing browser memory protections in Windows Vista. http://www.phreedom.org/research/bypassing-browser-memory-protections/, 2008.
[39] M. Thomlinson. Announcing the BlueHat Prize winners. https://blogs.technet.com/b/msrc/archive/2012/07/26/announcing-the-bluehat-prize-winners.aspx?Redirected=true, 2012.
[40] C. Tice, T. Roeder, P. Collingbourne, S. Checkoway, Ú. Erlingsson, L. Lozano, and G. Pike. Enforcing forward-edge control-flow integrity in GCC & LLVM. In USENIX Security Symposium, 2014.
[41] V. van der Veen, N. dutt-Sharma, L. Cavallaro, and H. Bos. Memory errors: The past, the present, and the future. In Symposium on Research in Attacks, Intrusions, and Defenses (RAID), 2012.
[42] Z. Wang and X. Jiang. HyperSafe: A lightweight approach to provide lifetime hypervisor control-flow integrity. In IEEE Symposium on Security and Privacy, 2010.
[43] B. Yee, D. Sehr, G. Dardyk, J. B. Chen, R. Muth, T. Ormandy, S. Okasaka, N. Narula, and N. Fullagar. Native Client: A sandbox for portable, untrusted x86 native code. In IEEE Symposium on Security and Privacy, 2009.
[44] B. Zeng, G. Tan, and G. Morrisett. Combining control-flow integrity and static analysis for efficient and validated data sandboxing. In ACM Conference on Computer and Communications Security (CCS), 2011.
[45] C. Zhang, T. Wei, Z. Chen, L. Duan, L. Szekeres, S. McCamant, D. Song, and W. Zou. Practical control flow integrity & randomization for binary executables. In IEEE Symposium on Security and Privacy, Oakland ’13, 2013.
[46] M. Zhang and R. Sekar. Control flow integrity for COTS binaries. In USENIX Security Symposium, 2013.
addresses are hard-coded in the code of an executable and cannot be changed by an adversary when W⊕X is enforced.
4 Specifically, kBouncer reports a ROP attack when a chain of 8 short sequences has been executed, where a sequence is referred to as “short” whenever the sequence length is less than 20 instructions.
5 The target address of an external function is dynamically allocated in the global offset table (GOT), which is loaded by an indirect memory jump in the procedure linkage table (PLT).
6 For the interested reader, we have placed the specific assembler implementation of the long-NOP sequence in Appendix C.
7 We also simulated the analysis performed in [31] by setting n = 20. However, we arrive at a significantly higher false positive rate than in [31]. This is likely because we perform our analysis on industry benchmark programs, while their analysis is based on opening web browsers or document readers. Furthermore, their focus is on WinAPI calls, whereas in Figure 7 we instrument every call.
B Stack Pivot Gadgets
We take advantage of two distinct stack pivot gadgets shown in Table 8. The first one is our unconditional branch gadget, which moves ebp via the leave instruction to esp. The other sequence takes the value of esi and loads it into esp. In both sequences, the adversary must control the source register (ebp and esi, respectively). This is achieved by invoking a load register gadget beforehand. Note also that several vulnerabilities allow an adversary to load these registers with the correct values at the time the buffer overflow occurs, which would make the ROP attack easier.
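The leave-based stack pivot can be modelled in a few lines: on x86, leave is equivalent to mov esp, ebp followed by pop ebp, so whoever controls ebp decides where esp points next. The Python sketch below applies that semantics to a dictionary-based memory; the addresses (0x0C0C0000 for an attacker-controlled payload, 0x0012FF00 for the original stack) and the stored values are hypothetical. The esi-based pivot is modelled as a plain register move, since the exact instruction sequence listed in Table 8 is not reproduced here.

```python
from typing import Dict

WORD = 4  # 32-bit x86


def leave(regs: Dict[str, int], memory: Dict[int, int]) -> None:
    """x86 leave: mov esp, ebp ; pop ebp."""
    regs["esp"] = regs["ebp"]
    regs["ebp"] = memory[regs["esp"]]
    regs["esp"] += WORD


def mov_esp_esi(regs: Dict[str, int]) -> None:
    """Second pivot variant, modelled simply as copying esi into esp."""
    regs["esp"] = regs["esi"]


# Hypothetical attacker-controlled region holding the new "stack".
fake_stack = 0x0C0C0000
memory = {
    fake_stack: 0x41414141,     # word consumed by pop ebp
    fake_stack + WORD: 0x1000,  # first return address of the pivoted chain
}
regs = {"esp": 0x0012FF00, "ebp": fake_stack, "esi": 0}
leave(regs, memory)             # ebp was set earlier by a load register gadget
print(hex(regs["esp"]), hex(memory[regs["esp"]]))  # 0xc0c0004 0x1000
```

After the pivot, the next RET fetches 0x1000 from the attacker-controlled region, so the chain continues there. Note that leave also consumes one word (loaded into ebp), which the payload has to account for.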