Stitching the Gadgets: On the Ineffectiveness of Coarse-Grained Control-Flow Integrity Protection

Lucas Davi and Ahmad-Reza Sadeghi, Intel CRI-SC at Technische Universität Darmstadt; Daniel Lehmann, Technische Universität Darmstadt; Fabian Monrose, The University of North Carolina at Chapel Hill

https://www.usenix.org/conference/usenixsecurity14/technical-sessions/presentation/davi

This paper is included in the Proceedings of the 23rd USENIX Security Symposium.
August 20–22, 2014 • San Diego, CA
ISBN 978-1-931971-15-7

Open access to the Proceedings of the 23rd USENIX Security Symposium is sponsored by USENIX

Abstract

Return-oriented programming (ROP) offers a robust attack technique that has, not surprisingly, been extensively used to exploit bugs in modern software programs (e.g., web browsers and PDF readers). ROP attacks require no code injection, and have already been shown to be powerful enough to bypass fine-grained memory randomization (ASLR) defenses. To counter this ingenious attack strategy, several proposals for enforcement of (coarse-grained) control-flow integrity (CFI) have emerged. The key argument put forth by these works is that coarse-grained CFI policies are sufficient to prevent ROP attacks. As this reasoning has gained traction, ideas put forth in these proposals have even been incorporated into coarse-grained CFI defenses in widely adopted tools (e.g., Microsoft’s EMET framework).

In this paper, we provide the first comprehensive security analysis of various CFI solutions (covering kBouncer, ROPecker, CFI for COTS binaries, ROPGuard, and Microsoft EMET 4.1). A key contribution is in demonstrating that these techniques can be effectively undermined, even under weak adversarial assumptions. More specifically, we show that with bare minimum assumptions, Turing-complete and real-world ROP attacks can still be launched even when the strictest of enforcement policies is in use. To do so, we introduce several new ROP attack primitives, and demonstrate the practicality of our approach by transforming existing real-world exploits into more stealthy attacks that bypass coarse-grained CFI defenses.

1 Introduction

Today, runtime attacks remain one of the most prevalent attack vectors against software programs. The continued success of these attacks can be attributed to the fact that large portions of software programs are implemented in type-unsafe languages (C, C++, or Objective-C) that do not enforce bounds checking on data inputs. Moreover, even type-safe languages (e.g., Java) rely on interpreters (e.g., the Java virtual machine) that are in turn implemented in type-unsafe languages.

Sadly, as modern compilers and applications become more and more complex, memory errors and vulnerabilities will likely continue to persist, with little end in sight [41]. The most prominent example of a memory error is the stack overflow vulnerability, where the adversary overflows a local buffer on the stack and overwrites a function’s return address [4]. While today’s defenses protect against this attack strategy (e.g., by using stack canaries [15]), other avenues for exploitation exist, including those that leverage heap [33], format string [21], or integer overflow [6] vulnerabilities.

Regardless of the attacker’s method of choice, exploiting a vulnerability and gaining control over an application’s control-flow is only the first step of a runtime attack. The second step is to launch malicious program actions. Traditionally, this has been realized by injecting malicious code into the application’s address space, and later executing the injected code. However, with the widespread enforcement of the non-executable memory principle (called data execution prevention in Windows) such attacks are more difficult to do today [28]. Unfortunately, the long-held assumption that only newly injected code bore risks was shattered with the introduction of code reuse attacks, such as return-into-libc [30, 37] and return-oriented programming (ROP) [35]. As the name implies, code reuse attacks do not require any code injection and instead use code already resident in memory.

One of the most promising defense mechanisms against such runtime attacks is the enforcement of control-flow integrity (CFI) [1, 3]. The main idea of CFI is to derive an application’s control-flow graph (CFG) prior to execution, and then monitor its runtime behavior to ensure that the control-flow follows a legitimate path of the CFG. Any deviation from the CFG leads to a CFI exception and subsequent termination of the application.

Although CFI requires no source code of an application, it suffers from practical limitations that impede
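The main idea of CFI (derive a CFG before execution, then reject any indirect transfer that leaves it) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the enforcement mechanism of any system cited in this paper: the CFG contents, the branch-site names, and the helpers `check_transfer` and `run` are all hypothetical.

```python
class CFIViolation(Exception):
    """Raised when an indirect branch leaves the precomputed CFG."""


# Hypothetical CFG: maps each indirect-branch site to its permitted targets.
CFG = {
    "main:call_reg": {"function1"},
    "function1:ret": {"main:after_call"},
}


def check_transfer(cfg, source, target):
    """Return the target if the edge source -> target is in the CFG, else raise."""
    if target not in cfg.get(source, set()):
        raise CFIViolation(f"illegal transfer {source} -> {target}")
    return target


def run(cfg, trace):
    """Replay a trace of (source, target) indirect branches under the monitor."""
    for source, target in trace:
        check_transfer(cfg, source, target)
    return True


if __name__ == "__main__":
    benign = [("main:call_reg", "function1"), ("function1:ret", "main:after_call")]
    print(run(CFG, benign))
    try:
        run(CFG, [("main:call_reg", "function2")])  # hijacked call target
    except CFIViolation as exc:
        print("terminated:", exc)
```

In this toy model, the precision of the protection depends entirely on how tightly the `CFG` dictionary lists the permitted targets; adding more targets per site corresponds to a coarser policy.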



its deployment in practice, including significant performance overhead of 21%, on average [3, Section 5.4], when function returns are validated based on a return address (shadow) stack. To date, several CFI frameworks have been proposed that tackle the practical shortcomings of the original CFI approach. ROPecker [13] and kBouncer [31], for example, leverage the branch history table of modern x86 processors to perform a CFI check on a short history of executed branches. More recently, Zhang and Sekar [46] demonstrate a new CFI binary instrumentation approach that can be applied to commercial off-the-shelf binaries.

However, the benefits of these state-of-the-art solutions come at the price of relaxing the original CFI policy. Abstractly speaking, coarse-grained CFI allows for CFG relaxations that contain dozens more legal execution paths than would be allowed under the approach first suggested by Abadi et al. [3]. The most notable difference is that the coarse-grained CFI policy for return instructions only validates if the return address points to an instruction that follows after a call instruction. In contrast, Abadi et al. [3]’s policy for fine-grained CFI ensures that the return address points to the original caller of a function (based on a shadow stack). That is, a function return is only allowed to return to its original caller.

Surprisingly, even given these relaxed assumptions, all recent coarse-grained CFI solutions we are aware of claim that their relaxed policies are sufficient to thwart ROP attacks¹. In particular, they claim that the property of Turing-completeness is lost due to the fact that the code base which an adversary can exploit is significantly reduced. Yet, to date, no evidence substantiating these assertions has been given, raising questions with regard to the true effectiveness of these solutions.

Contribution. We revisit the assumption that coarse-grained CFI offers an effective defense against ROP. For this, we conduct a security analysis of the recently proposed CFI solutions including kBouncer [31], ROPecker [13], CFI for COTS binaries [46], ROPGuard [20], and Microsoft’s EMET tool [29]. In particular, we derived a combined CFI policy that takes for each indirect branch class (i.e., return, indirect jump, indirect call) and behavioral-based heuristics (e.g., the number of instructions executed between two indirect branches), the most restrictive setting among these policies. Afterwards, we use our combined CFI policy and a weak adversary having access to only a single, commonly used system library to realize a Turing-complete gadget set. The reduced code base mandated that we develop several new return-oriented programming attack gadgets to facilitate our attacks. To demonstrate the power of our attacks, we show how to harden existing real-world exploits against the Windows version of Adobe Reader [26] and mPlayer [10] so that they bypass coarse-grained CFI protections. We also demonstrate a proof-of-concept attack against a Linux-based system.

2 Background

2.1 Return-Oriented Programming

Return-oriented programming (ROP) belongs to the class of runtime attacks that require no code injection. The basic idea is to combine short code sequences already residing in the address space of an application (e.g., shared libraries and the executable itself) to perform malicious actions. Like any other runtime attack, it first exploits a vulnerability in the software running on the targeted system. Relevant vulnerabilities are memory errors (e.g., stack, heap, or integer overflows [33]) which can be discovered by reverse-engineering the target program. Once a vulnerability has been discovered, the adversary needs to exploit it by providing a malicious input to the program, the so-called ROP payload. The applicability of ROP has been shown on many platforms including x86 [35], SPARC [7], and ARM [27].

[Figure 1: Memory snapshot of a ROP Attack. The memory layout for the ROP attack holds, starting at the stack pointer (SP): RET ADDR 1, RET ADDR 2, DATA WORD 1, DATA WORD 2, RET ADDR 3. The return addresses point to ROP Sequence 1 (asm_ins, asm_ins, RET), ROP Sequence 2 (POP REG1, POP REG2, RET), and ROP Sequence 3 (asm_ins, asm_ins, RET).]

An example ROP payload and a typical memory layout for a ROP attack are shown in Figure 1. Basically, the ROP payload consists of a number of return addresses each pointing to a short code sequence. These sequences consist of a small number of assembler instructions (denoted in Figure 1 as asm_ins), and traditionally terminate in a return [35] instruction². The indirect branches are responsible for chaining and executing one ROP sequence after the other.

In addition to return addresses, the adversary writes several data-words in memory that are used by the invoked code sequences (usually via stack POP instructions as shown in ROP Sequence 2). At the beginning of the attack, the stack pointer (SP) points to the first return address of the payload. Once the first sequence has been executed, its final return instruction (RET) advances the stack pointer by one memory word, loads the next return address from the stack, and transfers the control-flow to the next code sequence.

The combination of the invoked ROP sequences induces the malicious operations. Typically, these sequences are identified within an (offline) static analysis phase on the target program binary and its linked shared libraries. Furthermore, one or multiple ROP sequences can form a gadget, where a gadget accomplishes a specific task such as adding two values or storing a data word into memory. These gadgets typically form a Turing-complete language, meaning that an adversary can perform arbitrary (malicious) computation.

A well-known defense against ROP is address space layout randomization (ASLR) which randomizes the base address of libraries and executables, thereby randomizing the start addresses of code sequences needed by the adversary in her ROP attack. However, ASLR is vulnerable to memory disclosure attacks, which reveal runtime addresses to the adversary. Memory disclosure can even be exploited to circumvent fine-grained ASLR schemes, where the location of each code block is randomized in memory, by identifying ROP gadgets on-the-fly and generating a ROP payload at runtime [36].

2.2 Control-Flow Integrity

Although W⊕X, ASLR and other protection mechanisms have been widely adopted, their security benefits remain open to debate [1]. The main critique is the lack of a clear attack model and formal reasoning. To address this, Abadi et al. [3] proposed a new security property called control-flow integrity (CFI). A program maintains CFI if its path of execution adheres to a certain pre-defined control-flow graph (CFG). This CFG consists of basic blocks (BBLs) as nodes, where a BBL is a sequence of assembler instructions. Edges connect two nodes whenever the program may legally transfer control-flow from one BBL to the next. A control-flow transfer may be either a direct or indirect branch instruction (e.g., call, jump, or return). To ensure that a program follows a valid path in the CFG, CFI inserts labels at the beginning of basic blocks. Whenever there is a control-flow transfer at runtime, CFI validates whether the indirect branch targets a BBL with a valid label.

[Figure 2: The CFG shepherds control-flow transfers. main() consists of BBL 1 (ending in CALL printf), BBL 2 (label ra1; CALL [REG], checked with "target = fn1?"), and BBL 3 (label ra2; RET). printf() ends in a RET checked with "target = ra1?". function1() starts with label fn1 and ends in a RET checked with "target = ra2?". function2() contains unlabeled instructions (asm_instr, asm_instr, RET). The legend distinguishes intended control flow from non-intended (malicious) control flow.]

An example for CFI enforcement is shown in Figure 2. It shows a program consisting of a main function that directly invokes the library function printf(), and indirectly the local subroutine function1(). The indirect call to function1() in BBL 2 is critical, since an adversary may load an arbitrary address into the register by means of a buffer overflow attack. However, the CFG states that this indirect call is only allowed to target function1(). Hence, at runtime, CFI validates whether the indirect call in BBL 2 is targeting label fn1. If an adversary aims to redirect the call to a code sequence residing in function2(), CFI will prevent this malicious control-flow, because label fn1 is not defined for function2(). Similarly, CFI protects the return instructions of printf() and function1(), which an adversary could both exploit by overwriting a return address on the stack. The specific CFI checks in Figure 2 validate whether the returns target label ra1 or ra2, respectively.

It is also prudent to note that CFI has been studied in many domains. For instance, it has been used as an enabling technology for software fault isolation by Abadi et al. [2] and Yee et al. [43]. CFI enforcement has also been shown for hypervisors [42], commodity operating system kernels [16] and mobile devices [18]. In other communities, Zeng et al. [44] and Pewny and Holz [32], for example, have shown how to instrument a compiler to generate CFI-protected applications. Lastly, Budiu et al. [8] have explored architectural support to tackle the performance overheads of software-only based solutions.

2.3 Control-Flow Integrity Challenges

There are several factors that impede the deployment of control-flow integrity (CFI) in practice, including those related to control-flow graph (CFG) coverage, performance, robustness, and ease of deployment.

Before proceeding further, we note that besides presenting the design of CFI, Abadi et al. [3] also included a formal security proof for the soundness of their solution. A key observation noted in that work is that “despite attack steps, the program counter always follows the CFG.” [3, p. 4:34]. In other words, in Abadi et al. [3], every control-flow is permitted as long as the CFG allows it. Consequently, the quality of protection from control-flow attacks rests squarely on the level of CFG coverage. And that is exactly where recent CFI solutions have deviated (substantially) from the original work, primarily as a means to address performance issues.

Recall that in the original proposal, the CFG was obtained a priori using binary analysis techniques supported by a proprietary framework called Vulcan. Since the CFG is created ahead of time, it is not capable of capturing the dynamic nature of the call stack. That is, with only the CFG at hand, one cannot enforce that functions return to their most recent call site, but only that they return to any of the possible call sites. This limitation is tackled by adding a shadow stack to the statically created CFG. Intuitively, upon each call, the return address is placed in a safe location in memory, so that an instrumented return is able to compare the return address on the stack with one on a shadow stack, and the program is terminated if a deviation is detected [3, 14, 18]. In this way, many control-flow transfers are prohibited, largely reducing the gadget space available for a return-oriented programming attack.

Given the power of CFI, it is surprising that it has not yet received widespread adoption. The reason lies in the fact that extracting the CFG is not as simple as it may appear. To see why, notice that (1) source code is not readily available (thereby limiting compiler-based approaches), (2) binaries typically lack the necessary debug or relocation information, as was needed, for example, in the Vulcan framework, and (3) the approach induces high performance overhead due to dynamic rewriting and runtime checks. Much of the academic research on CFI in the last few years has focused on techniques for tackling these drawbacks.

3 Categorizing Coarse-Grained Control-Flow Integrity Approaches

As noted above, a number of new control-flow integrity (CFI) solutions have been recently proposed to address the challenges of good runtime performance, high robustness and ease of deployment. The most prominent examples include kBouncer [31], ROPecker [13], CFI for COTS binaries [46], and ROPGuard [20]. To aid in better understanding the strengths and limitations of these proposals, we first provide a taxonomy of the various CFI policies embodied in these works. Later, to strengthen our own analyses, we also derive a combined CFI policy that takes into account the most restrictive CFI policy.

3.1 CFI Policies

Table 1 summarizes the five CFI policies we use throughout this paper to analyze the effectiveness of coarse-grained CFI solutions. Specifically, we distinguish between three types of policies, namely ① policies used for indirect branch instructions, ② general CFI heuristics that do not provide well-founded control-flow checks but instead try to capture general machine state patterns of ROP attacks and ③ a policy class that covers the time at which CFI checks are enforced.

Category   Policy     x86 Example     Description
①          CFI_RET    ret             returns
①          CFI_JMP    jmp reg|mem     indirect jumps
①          CFI_CALL   call reg|mem    indirect calls
②          CFI_HEU                    heuristics
③          CFI_TOC                    time of CFI check

Table 1: Our CFI policies

We believe this categorization covers the most important aspects of CFI-based defenses suggested to date. In particular, it covers policies for each indirect branch the processor supports, since all control-flow attacks (including ROP) require exploiting indirect branches. Second, heuristics are used by several coarse-grained CFI approaches (e.g., [20, 31]) to allow more relaxed CFI policies for indirect branches. Finally, the time-of-check policy is an important aspect, because it states at which execution state ROP attacks can be detected. We elaborate further on each of these categories below.

1 – Indirect Branches. Recall that the goal of CFI is to validate the control-flow path taken at indirect branches, i.e., at those control-flow instructions that take the target address from either a processor register or from a data memory area³. The indirect branch instructions present on an Intel x86 platform are indirect calls, indirect jumps, and returns. Since CFI solutions apply different policies for each type of indirect branch, it is only natural that there are three CFI policies in this category, denoted as CFI_CALL (indirect function calls), CFI_JMP (indirect jumps), and CFI_RET (function returns).

2 – Behavior-Based Heuristics (HEU). Apart from enforcing specific policies on indirect branch instructions, CFI solutions can also validate other program behavior to detect ROP attacks. One prominent example is the number of instructions executed between two consecutive indirect branches. The expectation is that the number of such instructions will be low (compared to ordinary execution) because ROP attacks invoke a chain of short code sequences each terminating in an indirect branch instruction.

3 – Time of CFI Check (TOC). Abadi et al. argued that a CFI validation routine should be invoked whenever the program issues an indirect branch instruction [3]. In practice, however, doing so induces significant performance overhead. For that reason, some of the more recent CFI approaches reduce the number of runtime checks, and only enforce CFI validation at critical program states, e.g., before a system or API call.

3.2 Instantiation in Recent Proposals

Next, we turn our attention to the specifics of how these policies are implemented in recent CFI mechanisms.

3.2.1 kBouncer

The approach of Pappas et al. [31], called kBouncer, deploys techniques that fall in each of the aforementioned categories. Under category ①, Pappas et al. [31] leverage the x86-model register set called last branch record (LBR). The LBR provides a register set that holds the
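The behavior-based heuristic of category ② (few instructions between consecutive indirect branches, repeated over a chain of sequences) can be sketched over a small branch history modeled on the 16-entry LBR. This is a hedged illustration only: the thresholds `MAX_SEQ_LEN` and `MIN_CHAIN`, the `BranchRecord` type, and all helper names are assumptions made for the example, and the `seq_len` field is a modeling shortcut, since the hardware records only source and target addresses.

```python
from collections import deque
from typing import NamedTuple


class BranchRecord(NamedTuple):
    source: int   # address of the indirect branch
    target: int   # address control was transferred to
    seq_len: int  # instructions executed until the next indirect branch (modeling assumption)


LBR_SIZE = 16     # number of recorded branches, as stated in the text
MAX_SEQ_LEN = 20  # hypothetical: a sequence this short counts as gadget-like
MIN_CHAIN = 8     # hypothetical: this many gadget-like sequences in a row raise an alarm


def longest_short_chain(history, max_seq_len=MAX_SEQ_LEN):
    """Length of the longest run of consecutive gadget-like sequences."""
    longest = current = 0
    for record in history:
        current = current + 1 if record.seq_len <= max_seq_len else 0
        longest = max(longest, current)
    return longest


def looks_like_rop(history, max_seq_len=MAX_SEQ_LEN, min_chain=MIN_CHAIN):
    """Flag the history if it contains a long enough chain of short sequences."""
    return longest_short_chain(history, max_seq_len) >= min_chain


def make_history(seq_lens):
    """Build a bounded history; older entries fall out once 16 are stored."""
    lbr = deque(maxlen=LBR_SIZE)
    for i, n in enumerate(seq_lens):
        lbr.append(BranchRecord(source=0x1000 + 16 * i, target=0x2000 + 16 * i, seq_len=n))
    return lbr


if __name__ == "__main__":
    print(looks_like_rop(make_history([3] * 10)))
    print(looks_like_rop(make_history([3, 4, 50, 2, 3, 120, 5])))
```

In this sketch a single long sequence resets the run counter, so the outcome depends strongly on how the two thresholds are chosen.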



last 16 branches the processor has executed. Each branch is stored as a pair consisting of its source and target address. kBouncer performs CFI validation on the LBR entries whenever a Windows API call is invoked. Its promise resides in the fact that these checks induce almost no performance overhead, and can be directly applied to existing software programs.

With respect to its policy for returns, kBouncer identifies those LBR entries whose source address belongs to a return instruction. For these entries, kBouncer checks whether the target address (i.e., the return address) points to a call-preceded instruction. A call-preceded instruction is any instruction in the address space of the application that follows a call instruction. Internally, kBouncer disassembles a few bytes before the target address and terminates the process if it fails to find a call instruction.

While kBouncer does not enforce any CFI check on indirect calls and jumps, Pappas et al. [31] propose behavioral-based heuristics (category ②) to mitigate ROP attacks. In particular, the number of instructions executed between consecutive indirect branches (i.e., “the sequence length”) is checked, and a limit is placed on the number of sequences that can be executed in a row.⁴

A key observation by Pappas et al. [31] is that even though pure ROP payloads can perform Turing-complete computation, in actual exploits they will ultimately need to interact with the operating system to perform a meaningful task. Hence, as a time-of-CFI check policy (category ③) kBouncer instruments and places hooks at the entry of a WinAPI function. Additionally, it writes a checkpoint after CFI validation to prohibit an adversary from simply jumping over the hook in userspace.

3.2.2 ROPGuard and Microsoft EMET

Similar to Pappas et al. [31], the approach suggested by Fratric [20] (called ROPGuard) performs CFI validation when a critical Windows function is called. However, its policies differ from those of Pappas et al. [31].

First, with respect to policies under category ①, upon entering a critical function, ROPGuard validates whether the return address of that critical function points to a call-preceded instruction. Hence, it prevents an adversary from using a ROP sequence terminating in a return instruction to invoke the critical Windows function. In addition, ROPGuard checks if the memory word before the return address is the start address of the critical function. This would indicate that the function has been entered via a return instruction. ROPGuard also inspects the stack and predicts future execution to identify ROP gadgets. Specifically, it walks the stack to find return addresses. If any of these return addresses points to a non-call-preceded instruction, the program is terminated.

Interestingly, there is no CFI policy for indirect calls or indirect jumps. Furthermore, ROPGuard’s only heuristic under category ② is for validating that the stack pointer does not point to a memory location beyond the stack boundaries. While doing so prevents ROP payload execution on the heap, it does not prevent traditional stack-based ROP attacks; thus the adversary could easily reset the stack pointer before a critical function is called.

Remarks: ROPGuard and its implementation in Microsoft EMET [5] use CFI policies similar to those of kBouncer. One difference is that kBouncer checks the indirect branches executed in the past, while ROPGuard only checks the current return address of the critical function, and for future execution of ROP gadgets. ROPGuard is vulnerable to ROP attacks that are capable of jumping over the CFI policy hooks, and cannot prevent ROP attacks that do not attempt to call any critical Windows function. To tackle the former problem (i.e., bypassing the policy hook), EMET adds some randomness in the length and structure of the policy hook instructions. Hence, the adversary has to guess the right offset to successfully deploy her attack. However, recent memory disclosure attacks show that such randomization approaches can be easily circumvented [36].

3.2.3 ROPecker

ROPecker is a Linux-based approach suggested by Cheng et al. [13] that also leverages the last branch record register set to detect past execution of ROP gadgets. Moreover, it speculatively emulates the future program execution to detect ROP gadgets that will be invoked in the near future. To accomplish this, a static offline phase is required to generate a database of all possible ROP code sequences. To limit false positives, Cheng et al. [13] suggest that only code sequences that terminate after at most n instructions in an indirect branch should be recorded.

For its policies in category ①, ROPecker inspects each LBR entry to identify indirect branches that have redirected the control-flow to a ROP gadget. This decision is based on the gadget database that ROPecker derived in the static analysis phase. ROPecker also inspects the program stack to predict future execution of ROP gadgets. There is no direct policy check for indirect branches, but instead, possible gadgets are detected via a heuristic. More specifically, the robustness of its behavioral-based heuristic (category ②) completely hinges on the assumption that ROP code sequences will be short and that there will always be a chain of at least some threshold number of consecutive ROP sequences.

Lastly, its time of CFI check policy (category ③) is triggered whenever the program execution leaves a sliding window of two memory pages.

Remarks: Clearly, ROPecker performs CFI checks more frequently than both kBouncer and ROPGuard. Hence, it can detect ROP attacks that do not necessarily invoke critical functions. However, as we shall show later, the fact that there is no policy for the target of indirect branches is a significant limitation.

3.2.4 CFI for COTS Binaries

Most closely related to the original CFI work by Abadi et al. [3] is the proposal of Zhang and Sekar [46], who suggest an approach for commercial-off-the-shelf (COTS) binaries based on a static binary rewriting approach, but without requiring debug symbols or relocation information of the target application. In contrast to all the other approaches we are aware of, the CFI checks are directly incorporated into the application binary. To do so, the binary is disassembled using the Linux disassembler objdump. However, since that disassembler uses a simple linear sweep disassembly algorithm, Zhang and Sekar [46] suggest several error correction methods to ensure correct disassembly. Moreover, potential candidates of indirect control-flow target addresses are collected and recorded. These addresses comprise possible return addresses (i.e., call-preceded instructions), constant code pointers (including memory locations of pointers to external library calls), and computed code pointers (used for instance in switch-case statements). Afterwards, all indirect branch instructions are instrumented by means of a jump to a CFI validation routine.

Like the aforementioned works, the approach of Zhang and Sekar [46] checks whether a return or an indirect jump targets a call-preceded instruction. Furthermore, it also allows returns and indirect jumps to target any of the constant and computed code pointers, as well as exception handling addresses. Hence, the CFI policy for returns is not as strict as in kBouncer, where only call-preceded instructions are allowed. On the other hand, their approach deploys a CFI policy for indirect jumps, which is largely unmonitored in the other approaches. However, it does not deploy any behavioral-based heuristics (category ②).

Lastly, CFI validation (category ③) is performed whenever an indirect branch instruction is executed. Hence, it has the highest frequency of CFI validation invocation among all discussed CFI approaches.

Similar CFI policies are also enforced by CCFIR (compact CFI and randomization) [45]. In contrast to CFI for COTS binaries, all control-flow targets for indirect branches are collected and randomly allocated on a so-called springboard section. Indirect branches are only allowed to use control-flow targets contained in that springboard section. Specifically, CCFIR enforces that returns target a call-preceded instruction, and indirect calls and jumps target a previously collected function pointer. Although the randomization of control-flow targets in the springboard section adds an additional layer of security, it is not directly relevant for our analysis, since memory disclosure attacks can reveal the content of the entire springboard section [36]. The CFI policies enforced by CCFIR are in principle covered by CFI for COTS binaries. However, there is one noteworthy policy addition: CCFIR denies indirect calls and jumps to target pre-defined sensitive functions (e.g., VirtualProtect). We do not consider this policy for two reasons: first, this policy violates the default external library call dispatching mechanism in Linux systems. Any application linking to such a sensitive (external) function will use an indirect jump to invoke it.⁵ Second, as shown in detail by Göktas et al. [22], there are sufficient direct calls to sensitive functions in Windows libraries which an adversary can exploit to legitimately transfer control to a sensitive function.

Remarks: The approach of Zhang and Sekar [46] is most similar to Abadi et al. [3]’s original proposal in that it enforces CFI policies each time an indirect branch is invoked. However, to achieve better performance and to support COTS binaries, it deploys less fine-grained CFI policies. Alas, its coarse-grained policies allow one to bypass the restrictions for indirect call instructions (CFI_CALL). The main problem is caused by the fact that the integrity of indirect call pointers is not validated. Instead, it is only enforced that an indirect call takes a pointer from a memory location that is expected to hold indirect call targets. A typical example is the Linux global offset table (GOT) which holds the target addresses for library calls. This leaves the solution vulnerable to so-called GOT-overwrite attacks [9] that overwrite pointers (in the GOT) to external library calls. We return to this vulnerability in §5. Moreover, even if one would ensure the integrity of these pointers, we are still allowed to use a valid code pointer defined in the external symbols. Hence, the adversary can invoke dangerous functions such as VirtualAlloc() and memcpy() that are frequently used in applications and libraries.

3.3 Deriving a Combined CFI Policy

In our analysis that follows, we endeavor to have the best possible protections offered by the aforementioned CFI mechanisms in place at the time of our evaluation. Therefore, our combined CFI policy (see Table 2) selects the most restrictive setting for each policy. Nevertheless, despite this combined CFI policy, we then show that one can still circumvent these coarse-grained CFI solutions, construct Turing-complete ROP attacks (under realistic assumptions) and launch real-world exploits.

At this point, we believe it is prudent to comment on the parameter choices in these prior works, and those adopted in Table 2. In particular, one might argue that the prerequisite thresholds could be adjusted to make ROP attacks more difficult. To that end, we note that Pappas et al. [31] performed an extensive analysis to arrive at the

406 23rd USENIX Security Symposium USENIX Association


Control-Flow Integrity (CFI) Policies (column headers, rotated in the original): kBouncer [31], ROPecker [13], ROPGuard [20], EMET 4.1 [29], CFI for COTS [46], Combined Policy
CFIRET : destination has to be call-preceded      
CFIRET : destination can be taken from a code pointer      
CFIJMP : destination has to be call-preceded      
CFIJMP : destination can be taken from a code pointer      
CFICALL : destination can be taken from an exported symbol      
CFICALL : destination can be taken from a code pointer      
CFIHEU : allow only s consecutive short sequences,  s <= 7 s <= 10   s <= 7
CFIHEU : where short is defined as n instructions  n <= 20 n <= 6   n <= 20
CFIT OC : check at every indirect branch     
Always
CFIT OC : check at critical API functions or system calls     
observed
CFIT OC : check when leaving sliding code window     

Table 2: Policy comparison of coarse-grained CFI solutions:  indicates that the CFI policy is applied and enforced.  means that
the CFI policy is prohibited (corresponding execution flows would lead to an attack alarm) .  indicates that the CFI policy is not
applied/enforced. The combined policy takes the most restrictive setting for each CFI policy.
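The CFIHEU rows of Table 2 can be read as a simple run-length rule. The following is only a minimal sketch of that rule, not the implementation used by kBouncer or ROPecker: the function name violates_cfi_heu is hypothetical, and we assume that a sequence counts as short when it has at most n instructions and that any longer sequence resets the run.

```python
from typing import Iterable


def violates_cfi_heu(lengths: Iterable[int], s: int = 7, n: int = 20) -> bool:
    """Return True if more than s consecutive short sequences are observed.

    Assumption: a sequence is "short" when it has at most n instructions,
    and any longer sequence resets the run of short sequences.
    """
    run = 0
    for length in lengths:
        if length <= n:
            run += 1
            if run > s:
                return True
        else:
            run = 0
    return False


if __name__ == "__main__":
    # Eight short sequences in a row exceed the threshold s = 7.
    print(violates_cfi_heu([3] * 8))
    # One 21-instruction sequence after seven short ones resets the run.
    print(violates_cfi_heu([3] * 7 + [21] + [3] * 7))
```

Under these assumptions, a chain stays below the threshold as long as a sufficiently long sequence is interleaved after at most s short ones.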

best range of thresholds for the recommended number of consecutive short sequences (s) with a given sequence length of n <= 20. Their analysis reveals that adjusting the thresholds for s beyond their recommended values is hardly realistic: when every function call was instrumented, 975 false positives were recorded for s <= 8.
An alternative is to increase the sequence length n (e.g., setting it to n <= 40). Doing so would require an adversary to find a long sequence of 40 instructions after each seventh short sequence (for s <= 7). However, increasing the threshold for the sequence length will only exacerbate the false-positive issue. For this reason, Pappas et al. [31] did not consider sequences consisting of more than 20 instructions as a gadget in their analyses. We provide our own assessment in §5.3.
The approach of Cheng et al. [13], on the other hand, uses different thresholds for s and n than kBouncer. Making the thresholds in ROPecker more conservative (e.g., reducing s and increasing n) will lead to the same false-positive problems as in kBouncer. Moreover, the problem would be worse, since ROPecker performs CFI validation more frequently than kBouncer. Nevertheless, we show that regardless of the specific choice of parameters within the recommended ranges, our attacks render these defenses ineffective in practice (see Section 5).

4 Turing-Complete ROP Gadget Set
We now explore whether or not it is possible to derive a Turing-complete gadget set even when all state-of-the-art coarse-grained CFI protections are enforced. In particular, we desire a gadget set that still allows an adversary to undermine the combined CFI policy (see Table 2).
Assumptions. To be as pragmatic as possible, we assume that the adversary can only leverage the presence of a single shared library to derive the gadget set. This is a very stringent requirement placed on ourselves since modern programs typically link to dozens of libraries. Note also that we are not concerned with circumventing other runtime protection mechanisms such as ASLR or stack canaries. The reasons are twofold: first, coarse-grained CFI protection approaches do not rely on the presence of other defenses to mitigate code reuse attacks. Second, in contrast to CFI, ASLR and protection mechanisms that defend against code pointer overwrites (e.g., stack canaries, bounds checkers, pointer encryption) do not offer a general defense, and moreover, are typically bypassed in practice. In particular, ASLR is vulnerable to memory disclosure attacks [36, 38]. That said, the attacks and return-oriented programming gadgets we present in the following can also be leveraged to mount memory disclosure attacks in the first stage.
Methodology and Outline. Our analysis is performed primarily on Windows as it is the most widely deployed desktop operating system today. Specifically, we inspect kernel32.dll (on x86 Windows 7 SP1), an 848 KB system library that exposes Windows API functions and is by default linked to nearly every major Windows process (e.g., Adobe Reader, IE, Firefox, MS Office). It



is also noteworthy that our results do not only apply to Windows: although we did not perform a Turing-complete gadget analysis for Linux's default library (libc.so), we provide a shellcode exploit that uses gadgets from libc to demonstrate the generality of our approach (see §5.2). To facilitate the gadget finding process, we developed a static analysis Python module in IDA Pro that outputs all call-preceded sequences ending in an indirect branch. We also developed a sequence filter in the general-purpose D programming language that allows us to check for sequences containing a specific register, instruction, or memory operand. Note that in the subsequent discussions, we use the Intel assembler syntax, e.g., mov destination, source, and use a semicolon to separate two consecutive instructions.
We first review in §4.1 the basic gadgets that form a Turing-complete language [12, 35]. To achieve Turing-completeness, we require gadgets to realize memory load and store operations, as well as a gadget to realize a conditional branch. Afterwards, we present two new gadget types called the Call-Ret-Pair gadget (§4.2.1) and the Long-NOP gadget (§4.2.2). Constructing the latter was a non-trivial engineering task and the outcome played an important role in “stitching” gadgets together, thereby bypassing coarse-grained CFI defenses. It should also be noted that we only present a subset of the available sequences. Eliminating the specific few sequences presented here will not prevent our attack, since kernel32.dll (and many other libraries) provides a multitude of other sequences we could have leveraged.

4.1 Basic Gadget Arsenal
Loading Registers. Load gadgets are leveraged in nearly every ROP exploit to load a value from the stack into a CPU register. Recall that x86 provides six general registers (eax, ebx, ecx, edx, esi, edi), a base/frame pointer register (ebp), the stack pointer (esp), and the instruction pointer (eip). All registers can be directly accessed (read and write) by assembler instructions except eip, which is only indirectly influenced by dedicated branch instructions such as ret, call, and jmp.
Typically, stack loading is achieved on x86 via the POP instruction. The call-preceded load gadgets we identified in kernel32.dll are summarized in Table 3. Except for the ebp register, we are not able to load any other register without inducing a side-effect, i.e., without affecting other registers. That said, notice that the sequences for esi, edi, and ecx only modify the base pointer (ebp). Because traditionally ebp holds the base pointer and no data, and ordinary programs can be compiled without using a base pointer, we consider ebp as an intermediate register in our gadget set. The astute reader will have noticed that the sequences for edi and ecx modify the stack pointer as well through the leave instruction, where leave behaves as mov esp,ebp; pop ebp. However, we can handle this side-effect, since the stack pointer receives the value from our intermediate register ebp. Hence, we first invoke the load gadget for ebp and load the desired stack pointer value, and afterwards call the sequence for edi/ecx.
More challenges arise when loading the general-purpose registers eax, ebx, and edx. While ebx can be loaded with side-effects, we were not able to find any useful stack pop sequence for eax and edx. This is not surprising given the fact that we must use call-preceded sequences. Typically, these sequences are found in function epilogues, where a function epilogue is responsible for resetting the callee-saved registers (esi, edi, ebp). We alleviate the side-effects for ebx by loading all the callee-saved registers from the stack.

Register  Call-Preceded Sequence (ending in ret)
EBP       pop ebp
ESI       pop esi; pop ebp
EDI       pop edi; leave
ECX       pop ecx; leave
EBX       pop edi; pop esi; pop ebx; pop ebp
EAX       mov eax,edi; pop edi; leave
EDX       mov eax,[ebp-8]; mov edx,[ebp-4]; pop edi; leave
Table 3: Register Load Gadgets

For eax and edx, data movement gadgets can be used. As can be seen in Table 3, eax can be loaded using the edi load gadget in advance. The situation is more complicated for edx, especially given our choice to only use kernel32.dll. In particular, while there is a sequence that allows one to load edx by using the ebp load gadget beforehand, it is challenging to do so since the adversary would need to save the state of some registers. That said, other default Windows libraries (such as shell32.dll) offer several more convenient gadgets to load edx (e.g., a common sequence we observed was pop edx; pop ecx; jmp eax), and so this limitation should not be a major obstacle in practice.
Loading and Storing from Memory. In general, software programs can only accomplish their tasks if the underlying processor architecture provides instructions for loading from memory and storing values to memory. Similarly, ROP attacks require memory load and store gadgets. Although we have found several load and store gadgets, we focus on the gadgets listed in Table 4.
In particular, we discovered load gadgets that use eax as the destination register. The specific load gadget shown in Table 4 loads a value from memory pointed to by ebp+8. Hence, the adversary is required to correctly set the target address of the memory load operation in ebp via the register load gadget shown in Table 3.
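The interplay between the ebp load gadget and the leave side-effect of the edi and eax sequences can be traced on a toy model. The sketch below is purely illustrative and makes several assumptions: gadget addresses are replaced by symbolic names (LOAD_EBP, LOAD_EDI, LOAD_EAX, HALT are hypothetical labels), memory is a Python dictionary of 4-byte words, and only the three register load sequences from Table 3 that are needed to place a value in eax are modeled.

```python
WORD = 4


def pop(state):
    """Read the word at esp and advance esp, as pop and ret do."""
    value = state["mem"][state["regs"]["esp"]]
    state["regs"]["esp"] += WORD
    return value


def op_pop(reg):
    def step(state):
        state["regs"][reg] = pop(state)
    return step


def op_mov(dst, src):
    def step(state):
        state["regs"][dst] = state["regs"][src]
    return step


def op_leave(state):
    # leave behaves as: mov esp,ebp; pop ebp
    state["regs"]["esp"] = state["regs"]["ebp"]
    state["regs"]["ebp"] = pop(state)


# Symbolic stand-ins for three call-preceded sequences of Table 3.
GADGETS = {
    "LOAD_EBP": [op_pop("ebp")],                                  # pop ebp; ret
    "LOAD_EDI": [op_pop("edi"), op_leave],                        # pop edi; leave; ret
    "LOAD_EAX": [op_mov("eax", "edi"), op_pop("edi"), op_leave],  # mov eax,edi; pop edi; leave; ret
}


def run(words, base=0x1000):
    """Interpret a chain laid out as consecutive words starting at base."""
    mem = {base + WORD * i: w for i, w in enumerate(words)}
    state = {"mem": mem, "regs": {"esp": base, "ebp": 0, "edi": 0, "eax": 0}}
    while True:
        target = pop(state)  # models the ret that ends each sequence
        if target == "HALT":
            return state["regs"]
        for step in GADGETS[target]:
            step(state)


chain = [
    "LOAD_EBP", 0x1010,      # ebp := where esp must point after the next leave
    "LOAD_EDI", 0x41414141,  # edi := desired value; leave then sets esp := 0x1010
    0x1020,                  # at 0x1010: next ebp (target of the second leave)
    "LOAD_EAX", 0, 0,        # eax := edi; junk popped into edi; one word skipped
    0,                       # at 0x1020: junk popped into ebp by leave
    "HALT",
]
regs = run(chain)
print(hex(regs["eax"]))
```

In this model ebp is always loaded with the address at which the chain continues, so the stack pointer written by leave stays under the control of the chain.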



Type         Call-Preceded Sequence (ending in ret)
LOAD (eax)   mov eax,[ebp+8]; pop ebp
STORE (eax)  mov [esi],eax; xor eax,eax; pop esi; pop ebp
STORE (esi)  mov [ebp-20h],esi
STORE (edi)  mov [ebp-20h],edi
Table 4: Selected Memory Load and Store Gadgets

Type                    Call-Preceded Sequence (ending in ret)
unconditional branch 1  leave
unconditional branch 2  add esp,0Ch; pop ebp
conditional LOAD(eax)   neg eax; sbb eax,eax; and eax,[ebp-4]; leave
Table 6: Branching Gadgets
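The flag arithmetic behind the conditional LOAD(eax) sequence in Table 6 can be checked with a few lines of 32-bit integer arithmetic. This is a minimal sketch that models only the three arithmetic instructions (not leave or ret); the function name conditional_load and the parameter prepared, which stands for the word stored at [ebp-4], are our own illustrative choices.

```python
MASK = 0xFFFFFFFF


def conditional_load(eax: int, prepared: int) -> int:
    """Model `neg eax; sbb eax,eax; and eax,[ebp-4]` on 32-bit values."""
    eax &= MASK
    carry = 0 if eax == 0 else 1      # neg clears CF only if the operand was 0
    eax = (-eax) & MASK               # neg eax: two's complement
    eax = (eax - eax - carry) & MASK  # sbb eax,eax: 0 or 0xFFFFFFFF
    return eax & (prepared & MASK)    # and eax,[ebp-4]


print(hex(conditional_load(0, 0x1234)))  # eax was zero
print(hex(conditional_load(5, 0x1234)))  # eax was nonzero
```

A zero input therefore yields zero, whereas any nonzero input yields the prepared word, which the subsequent gadgets turn into a stack pointer offset.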

We also identified a corresponding memory store gadget on eax. The shown gadget stores eax at the address provided by register esi, which needs to be initialized by a load register gadget beforehand. The gadget has no side-effects, since it resets eax (which was stored earlier) and loads new values from the stack into esi (which held the target address) and ebp (our intermediate register).
Given a memory store gadget for eax and the fact that we have already identified register load gadgets for each register, it is sufficient to use the same memory load on eax to load any other register. This is possible because we use the eax load gadget to load the desired value from memory, store it afterwards on the stack, and finally use one of the register load gadgets to load the value into the desired register. Finally, we also identified some convenient memory store gadgets for esi and edi only requiring ebp to hold the target address of the store operation.
Arithmetic and Logical Gadgets. For arithmetic operations we utilize the sequence containing the x86 sub instruction shown in Table 5. This instruction takes the operands from eax and esi and stores the result of the subtraction into eax. Both operands can be loaded by using the register load gadgets (see Table 3). The same gadget can be used to perform an addition: one only needs to load the two’s complement into esi. Based on addition and subtraction, we can realize multiplication and division as well. Unfortunately, logical gadgets are not as commonplace. There is, however, an XOR gadget that takes its operands from eax and edi (see Table 5).

Type     Call-Preceded Sequence (ending in ret)
ADD/SUB  sub eax,esi; pop esi; pop ebp
XOR      xor eax,edi; pop edi; pop esi; pop ebp
Table 5: Arithmetic and Logical Gadgets

Branching Gadgets. We remind the reader that branching in ROP attacks is realized by modifying the stack pointer rather than the instruction pointer [35]. In general, we can distinguish two different types of branches: unconditional and conditional branches. kernel32.dll, for example, offers two variants for an unconditional branch gadget (see Table 6). The first uses the leave instruction to load the stack pointer (esp) with a new address that has been loaded before into our intermediate register ebp. The second variant realizes the unconditional branch by adding a constant offset to esp. Either one suffices for our purposes.
Conditional branch gadgets change the stack pointer if and only if a particular condition holds. Because load, store, and arithmetic/logic computation can be conveniently done for eax, we could place the conditional in this register. Unfortunately, because a direct load of esp (that depended on the value of eax) was not readily available, we realized the conditional branch in three steps requiring the invocation of only four ROP sequences. That said, our gadget is still within the constraints for the number of allowable consecutive sequences in the Combined CFI-enforcement Policy (see s <= 7 for CFIHEU in Table 2).
First, we use the conditional branch gadget (see Table 6) to either load 0 or a prepared value into eax. In this sequence neg eax computes the two’s complement and, more importantly, sets the carry flag to zero if and only if eax was zero beforehand. This is nicely used by the subsequent sbb instruction, which subtracts the register from itself, always yielding zero, but additionally subtracting an extra one if the carry flag is set. Because subtracting one from zero gives 0xFFFFFFFF, the next and masks either none or all the bits. Hence, the result in eax will be exactly the contents of [ebp-4] if eax was nonzero, or zero otherwise. One might think that it is very unlikely to find sequences that follow the pattern neg-sbb-and. However, we found 16 sequences in kernel32.dll that follow the same pattern and could have been leveraged for a conditional branch gadget.
We then use the ADD/SUB gadget (see Table 5) to subtract esi from eax so that the latter holds the branch offset for esp. Finally, we move eax into esp using the stack as temporary storage. The STORE(eax) gadget (see Table 4) will store the branch offset on the stack, where pop ebp followed by the unconditional branch 1 gadget loads it into esp.

4.2 Extended Gadget Set
For those readers who have either written or analyzed real-world ROP exploits before, it will be clear that several other gadgets are useful in practice. For example, modern exploits usually invoke several WinAPI functions to perform malicious actions, e.g., launching



a malicious executable by invoking WinExec(). Calling such functions within a ROP attack requires a function call gadget (§4.2.1). It is also useful to have gadgets that allow one to conveniently write a NULL word to memory (the Null-Byte gadget) or the Stack-pivot gadget [17], which is used by attacks exploiting heap overflows. Our instantiations of the Null-Byte and Stack-pivot gadgets are given in the Appendix as they are not vital to understanding the discussion that follows.
Additionally, to provide a generic method for circumventing the behavioral heuristics of the Combined CFI Policy, we present a new gadget type, coined Long-NOP, containing long sequences of instructions which do not break the semantics of an arbitrary ROP chain (§4.2.2).

4.2.1 Call-Ret-Pair Gadget
CFI policies raise several challenges with respect to calling WinAPI functions within a ROP attack. First, one cannot simply exploit a ret instruction because the CFIRET policy states that only a call-preceded sequence is allowed; clearly, the beginning of a function is not call-preceded. Second, the adversary must regain control when the function returns. Hence, the return address of the function to be called must point to a call-preceded sequence that allows the ROP attack to continue.
To overcome these restrictions, we utilize what we coined a Call-Ret-Pair gadget. The basic idea is to use a sequence that terminates in an indirect call but provides a short instruction sequence afterwards that terminates in a ret instruction. Among our possible choices, the Call 1 sequence shown in Table 7 was selected.

Type    Call-Preceded Sequence
Call 1  lea eax,[ebp-34h]; push eax; call esi; ret
Call 2  call eax
Call 3  push eax; call [ebp+0Ch]
Table 7: Function Call Gadgets

To better understand the intricacies of this gadget, we provide an example in Figure 3. This example depicts how we can leverage our gadget to call VirtualAlloc(). We start with a load register gadget which first loads the start address of VirtualAlloc() into esi. Further, it loads into ebp an address denoted as ADDR. At this address is stored RET 3, the pointer to the ROP sequence we desire to call after VirtualAlloc() has returned. The next ROP sequence is our Call-Ret-Pair gadget, where the first instruction effectively loads RET 3 pointed to by ebp-34h into eax. Next, RET 3 is stored at ADDR onto the stack using a push instruction before the function call occurs. The push instruction also decrements the stack pointer so that it points to RET 2. The subsequent indirect call invokes VirtualAlloc() and automatically pushes the return address onto the stack, i.e., it will overwrite RET 2 with the return address. This ensures that the control-flow will be redirected to the ret instruction in our Call-Ret-Pair gadget when VirtualAlloc() returns. Lastly, the return will use RET 3 to invoke the next ROP sequence.

[Figure 3 (diagram): memory layout for the Call-Ret-Pair gadget. ROP Sequence 1: pop esi; pop ebp; ret. ROP Sequence 2 (Call-Ret-Pair): lea eax,[ebp-34h]; push eax; call esi; ret.]
Figure 3: Example for Call-Ret-Pair Gadget

Note that this Call-Ret-Pair gadget works for subroutines following the stdcall calling convention. Such functions remove their arguments from the stack upon function return. For functions using cdecl, we use a Call-Ret-Pair gadget that pops the arguments of the subroutine from the stack after the function call. The details of the gadget we use for cdecl functions can be found in the Appendix of our technical report [19].
For ROP attacks that terminate in a function call, we leverage the Call 2 and Call 3 gadgets in Table 7. The difference resides in the fact that Call 2 requires the target address to be loaded into eax, whereas Call 3 loads the branch address from memory.
Recall that the CFI policy for indirect calls (CFICALL in Table 2) only permits the use of branch addresses taken from an exported symbol or a valid code pointer location. However, as we already described in §3.2.4, the integrity of code pointers is not guaranteed. Hence, we can leverage GOT overwrite-like attacks to change the address at a given code pointer location. Alternatively, since modern applications typically make use of many WinAPI functions by default, we can indirectly call one of these functions using the external symbols.

4.2.2 Long-NOP Gadget
Our final gadget is needed to thwart the restriction that after s = 7 short sequences in a row are used, another sequence of at least n = 20 instructions must follow (see CFIHEU in Table 2). For this task, we developed a new gadget type that we refer to as the long no-operation (long-NOP) gadget. Constructing long-NOP in a way that does not break the semantics of an arbitrary ROP chain was a non-trivial task that required painstaking analyses and a stroke of luck.
To identify possible sequences for this gadget type, we let our sequence finder filter those call-preceded sequences that contain more than n = 20 instructions. To ensure that the long sequence does not break the seman-



tics of the ROP chain, we further reduced the set of sequences to those that (i) contain many memory-write instructions, and (ii) make use of only a small set of registers. While the latter requirement is obvious, the former seems counter-intuitive as it can potentially change the memory state of the process. However, if we are able to control the destination address of these memory writes, we can write arbitrary values into the data area of the process outside the memory used by our ROP attack.

[Figure 4 (diagram): memory layout for the ROP attack, the 36-byte data area at DATA_ADDR, and the two mandatory sequences. ROP Sequence 7 (Pre-LNOP): pop esi; pop edi; pop ebp; ret 8. ROP Sequence 8 (LNOP): push 3; pop eax; 13 memory writes (esi, edi); xor eax,eax; mov eax,ebx; pop edi; pop esi; pop ebx; pop ebp; ret.]
Figure 4: Flow of Long-NOP gadget

Among the sequences that fulfill these requirements, we chose a sequence that is (abstractly)6 shown in Figure 4. It contains 13 memory write instructions using only the registers esi and edi. We stress that the entire gadget chain for long-NOP does not induce any side-effects, i.e., the content of all registers and memory area used by the ROP attack is preserved.
We distinguish between mandatory and optional sequences used for long-NOP. The latter sequences are only required if the content of all registers needs to be preserved. We classify them as optional, since it is very unlikely that ROP attacks need to operate on all registers during the entire ROP execution phase. If all registers need to be preserved (worst-case scenario), we require 6 ROP sequences before the long-NOP gadget sequence is invoked. Since all registers are preserved, we can issue in each round another ROP sequence until all desired ROP sequences have been executed.
Mandatory Sequences. The mandatory sequences are those labeled Sequence 7 and 8 (in Figure 4). Sequence 7 is used to set three registers: esi, edi, and ebp. We load in esi and edi the same address, namely DATA_ADDR, which points to an arbitrary data memory area in the address space of the application, e.g., stack, heap, or any other data segment of an executable module. Due to the ret 8 instruction, the stack pointer will be incremented by 8 more bytes leaving space for pattern values. Afterwards, our long-NOP sequence uses esi and edi to issue 13 memory writes in a small window of 36 bytes. In each round, we use the same address for DATA_ADDR, and hence, we always write the same arbitrary values in a 36-byte memory space not affecting memory used by our ROP attack. The long-NOP sequence also destroys the value of eax and loads new values via pop instructions in other registers. However, these register changes are resolved by our optional sequences discussed next.
Optional Sequences. ROP Sequences 2 to 6 are the optional sequences, and are responsible for preserving the state of all registers. The optional sequences shown in Figure 4 represent those already presented in our basic gadget arsenal in §4.1. Depending on the specific goals and gadgets of a ROP attack, the adversary can choose among the optional sequences as required.
ROP Sequences 2 and 3 store the value of esi on the stack in such a way that the pop esi instruction in long-NOP resets the value accordingly. ROP Sequences 4 to 6 store the content of eax and edi on the stack. Similar to the store for esi, the content is again re-loaded into these registers via pop instructions at the end of the long-NOP sequence. However, the content of registers eax and ebx is exchanged after the long-NOP sequence since mov eax,ebx stores ebx to eax, and the former value of eax is loaded via pop into ebx. However, we can compensate for this switch by invoking the Long-NOP gadget twice so that eax and ebx are exchanged again.

5 Hardening Real-World Exploits
We now elaborate on the hardening of two real-world exploits against 32-bit Windows 7 SP1 and a Linux proof-of-concept exploit. Specifically, we transform publicly available ROP attacks against Adobe PDF reader [26] and the GNU mediaplayer mPlayer [10]. We used the gadget set derived in §4 to perform the transformation. Furthermore, our attacks are executed with the Caller, SimExecFlow, StackPivot, LoadLib, and MemProt options for ROP detection in Microsoft EMET 4.1 enabled. The source code for both attacks is given in our technical report [19].

5.1 Windows Exploits
The Adobe PDF attack used in this paper exploits the integer vulnerability CVE-2010-0188 in the TIFF image processing library libtiff. The vulnerability originally targeted Adobe PDF versions 9.1-9.3 running on Windows XP SP2/SP3. Likewise, the mPlayer attack ex-



ploited a buffer overflow vulnerability that allows the adversary to overwrite an exception handler pointer. Since we perform our analyses on Windows 7, we ported both exploits from Windows XP to Windows 7.
Exploit Requirements: For both exploits, we need to (1) allocate a new read-write-execute (RWX) memory page with VirtualAlloc(), (2) copy malicious shellcode into the newly allocated page by using memcpy(), and (3) redirect the control-flow to the shellcode. Originally, the exploits made use of non-call-preceded gadgets, and used a long chain of short instruction sequences. For mPlayer 18 consecutive short sequences are executed, while for Adobe PDF 11 sequences are executed until the first system call is issued. Hence, both exploits clearly violate CFIRET and CFIHEU of the combined CFI policy. These exploits are prevented by Microsoft EMET because of CFIRET, and are detected by both kBouncer and ROPecker due to violation of the CFIHEU policy.
Replacing ROP Sequences: A simplified view of the gadget chain we use for our hardened exploits in the PDF exploit is shown in Figure 5. We first replaced all non-call-preceded sequences with one of our call-preceded sequences in our ROP gadget set identified in Section 4. Both exploits mainly use load register and memory gadgets to set the arguments for VirtualAlloc() and memcpy(), and function call gadgets to invoke both functions. By leveraging only call-preceded sequences, our attacks comply with the CFI policy for returns (CFIRET).

[Figure 5 (diagram): memory layout for the ROP exploit (Adobe PDF). ROP Gadgets 1 to 7 with their stack data: &VirtualAlloc; Arg1 = NULL, Arg2 = size, Arg3 = MEM_COMMIT, Arg4 = RWX; M_Args and DATA_ADDR; &memcpy; Arg1 = NEW_PAGE, Arg2 = &SHELLCODE, Arg3 = size.]
Figure 5: Simplified view of our hardened PDF exploit. See [19] for the full source code.

Since both exploits make use of WinAPI calls, we utilized our Call-Ret-Pair gadget to invoke VirtualAlloc() and memcpy(). As both functions are default routines used in a benign execution of Adobe PDF and mPlayer, we are allowed to leverage indirect calls to invoke these functions (addressing CFICALL). Note that even if this were not the case, we could still call these functions by overwriting valid code pointer locations. A demonstration of this weakness, particularly for the approach of Zhang and Sekar [46], is provided in Section 5.2. Lastly, we need to tackle the CFI policies for behavioral heuristics (addressing CFIHEU) by ensuring that we never execute more than 7 short sequences in a row before calling our long-NOP gadget.
Putting-It-All-Together: Gadget 1 in Figure 5 loads the target address of VirtualAlloc() into esi. The arguments to this function (Arg1-Arg4) are set on the stack. They are chosen in such a way that VirtualAlloc() allocates a new RWX memory page. Gadget 2 leverages our Call-Ret-Pair gadget to call VirtualAlloc(). The start address of the page is placed by VirtualAlloc() into eax. ROP Gadgets 3 and 4 facilitate two goals: first, they store the start address of the new RWX page on the stack. Second, they prepare the execution of the long-NOP gadget. In particular, they set esi and edi to DATA_ADDR. This address points to an arbitrary data section of one of the linked libraries. Our long-NOP sequence (ROP Gadget 5) will later perform 13 memory writes on this data region, thereafter setting esi to the start address of memcpy(). ROP Gadget 6 invokes memcpy() to copy the malicious shellcode onto the newly allocated RWX page. Lastly, our ROP chain transfers the control-flow to the copied shellcode via Gadget 7, which in both exploits opens the Windows calculator.
For the Adobe PDF attack, we used 7 ROP sequences with 52 instructions executed. In the hardened version of the mPlayer exploit, we used 49 ROP sequences with 380 instructions executed. Note that the 49 sequences include the interspersed long-NOP sequences to adhere to the CFI policy CFIHEU. We used a writable memory area of 36 bytes for the long-NOP gadget. The requirement of more sequences for the mPlayer attack can be attributed to the fact that this exploit did not allow for the use of any NULL bytes in the payload and so we needed to leverage a NULL-Byte gadget (Appendix A) in this exploit. The mPlayer exploit also required a stack pivot gadget (Appendix B), specifically one adding a large constant to esp. Unfortunately, our stack pivot sequences in kernel32.dll did not use large enough constants, and the original sequence exploited a non-call-preceded one in avformat-52.dll. However, we identified another useful call-preceded stack pivot sequence in the same library which allowed us to instantiate the exploit.
The above strategies can be used to easily transform other ROP attacks to bypass current coarse-grained CFI defenses. Furthermore, given our routines for finding and filtering useful call-preceded ROP sequences, the process

412 23rd USENIX Security Symposium USENIX Association


of transforming exploits could be fully automated. We leave that as an exercise for future work.

A final remark concerns the control transfer to the injected shellcode. In both exploits, we invoke a call-preceded sequence terminating in an indirect jump. While this approach works for kBouncer, ROPecker, and ROPGuard, it might raise an alarm for CFI for COTS binaries if the shellcode is placed at an address that is not within the set of valid function pointers (i.e., indirect jump targets). However, there are several ways to tackle this issue. A very effective approach has been shown by Göktas et al. [22], where the code section is simply set to be writable, the shellcode is copied to an address which resembles a valid function pointer, and the code section is then reset to be executable. Alternatively, one can overwrite the location of a valid function pointer with the start address of the shellcode. We provide a detailed example of how this can be realized in the next subsection.

5.2 Linux Shellcode Exploit

Since the approach of Zhang and Sekar [46] targets Linux specifically, we also developed a proof-of-concept exploit that shows how our attack bypasses the CFI policies for indirect calls. To do so, we use a sample program that suffers from a buffer overflow vulnerability allowing an adversary to overwrite a return address on the stack. The goal of our attack is to call execve(), which is a standard system function defined in libc.so to execute a new program. The challenge, however, is that the example program does not include execve() in its external symbols, and consequently, we are not allowed to redirect the control-flow to execve() using an indirect call.

[Figure 6 shows the stack layout for the GOT overwrite (ROP Gadgets 1 to 4) next to the code layout of the module: LOAD edx,eax; STORE eax at [edx]; LOAD esi; and a Call-Ret-Pair through printf@plt with Arg1 = /bin/sh.]
Figure 6: GOT overwrite attack

To overcome this restriction, we make use of an old (but seemingly forgotten) attack technique called global offset table (GOT) overwrite [9]. The basic idea is to write the address of execve() at a valid code pointer location. A well-known location for doing so is the GOT, which contains pointers to library calls such as printf(). We reiterate that the weakness here is that CFI for COTS binaries does not validate the integrity of these pointers, a very difficult, if not insurmountable, task in the current design of Linux since the GOT is initialized at runtime of an application. Hence, we can invoke gadgets to overwrite the pointers placed in the GOT. Specifically, we first find useful sequences from the Linux standard library libc.so and use gadgets that perform the GOT overwrite while using only call-preceded sequences.

Putting-It-All-Together: An example of how we bypass the CFI policy for indirect calls is shown in Figure 6. The approach is as follows: first, Gadget 1 loads the address of the GOT entry we want to modify into edx, and loads eax with the address of execve(). Next, Gadget 2 overwrites the address of printf() with the address of execve() in the GOT. Finally, Gadget 3 loads the address of the printf() stub into esi, and Gadget 4 uses a Call-Ret-Pair gadget to invoke execve(). At this point, the attack succeeds without violating any of the CFI policies.

5.3 On Parameter Adjustment

As alluded to in §3.3, adjusting the parameters for the CFIHEU policy beyond the recommended settings will negatively impact the false positive rate. To assess that, we extended the analysis beyond what Pappas et al. [31] originally performed in order to analyze the impact of increasing n to 30 or 40 instructions, which would render stitching with our Long-NOP gadget (which is only 23 instructions long) ineffective. Specifically, we performed an experiment using three benchmarks of the SPEC CPU 2006 benchmark suite: bzip2, perlbench, and xalancbmk. The first two are programmed in C, while the last is written in C++. We developed an Intel Pintool that counts the number of instructions issued between two indirect branches, and the number of consecutive short instruction sequences. Whenever a function call occurs, we check how many short sequences (s) have been executed since the last function call.

[Figure 7 is a bar chart. y-axis: number of potential false positives (0 to 200,000); x-axis: bzip2, perlbench, and xalancbmk, each for n=30 and n=40; series: s=7, s=8, s=9, s=10, s>10.]
Figure 7: Potential false positives when the parameters for the consecutive sequences (s) and sequence length (n) are adjusted.

As Figure 7 shows, increasing the threshold n induces many potential false positives (y-axis). In particular, for each benchmark (x-axis), observe that for s > 10 there are about 20,000 potential false positives, i.e., 20,000 times we detected a function call that was preceded by more than 10 short sequences (see Note 7).
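
The counting performed in Section 5.3 can be sketched in a few lines of Python. This is a minimal model and not the authors' Pintool: the trace format, the function name count_flagged_calls, and the sample trace are hypothetical. It assumes that a sequence is "short" when it has fewer than n instructions, that a long sequence resets the chain of consecutive short sequences, and that a call is flagged when at least s_threshold short sequences precede it.

```python
def count_flagged_calls(trace, n, s_threshold):
    """Count calls preceded by at least s_threshold consecutive short sequences.

    trace: list of events; an int is the instruction count of one sequence
    between two indirect branches, the string "call" marks a function call.
    """
    consecutive_short = 0
    flagged = 0
    for event in trace:
        if event == "call":
            if consecutive_short >= s_threshold:
                flagged += 1
            consecutive_short = 0      # restart counting after each call
        elif event < n:
            consecutive_short += 1     # short sequence extends the chain
        else:
            consecutive_short = 0      # long sequence breaks the chain
    return flagged


# Hypothetical trace: short sequences with one 23-instruction sequence
# (the length of the Long-NOP gadget) interspersed before a call.
trace = [5, 5, 5, 5, 23, 5, 5, 5, 5, "call"]

print(count_flagged_calls(trace, n=20, s_threshold=8))  # prints 0
print(count_flagged_calls(trace, n=30, s_threshold=8))  # prints 1
```

With n = 20 the 23-instruction sequence breaks the chain, so the call is not flagged; with n = 30 the same sequence counts as short and the call is flagged. The same effect on benign traces is what shows up as potential false positives in Figure 7.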



6 Related Work

Concurrently with and independently of our work, several research groups have investigated the security of coarse-grained CFI solutions [11, 22–24, 34]. However, our analysis differs from these works as we examine the security of a combination of coarse-grained CFI policies irrespective of when the CFI check occurs. For instance, the attacks shown in [11, 22, 34] are prevented by our combined CFI policy, which monitors the sequence length at any time in program execution. Furthermore, unlike these works, we systematically show the construction of a Turing-complete gadget set based on a weak adversary that has access to only one standard shared Windows library. On the other hand, concurrent work also investigates some other interesting attack aspects: Göktas et al. [22] demonstrate attacks against CCFIR [45] using call-preceded gadgets to invoke sensitive functions via direct calls; Carlini and Wagner [11] and Schuster et al. [34] show flushing attacks that eliminate return-oriented programming traces before a critical function is invoked.

Lastly, new CFI-based solutions have also been proposed. For instance, the approaches of Tice et al. [40] and Jang et al. [25] focus on protecting indirect calls to virtual methods in C++. Both approaches have been implemented as a compiler extension and ensure that an adversary cannot manipulate a virtual table (vtable) pointer so that it points to an adversary-controlled (malicious) vtable. Unfortunately, these schemes do not protect against classical ROP attacks that exploit return instructions and map malicious code to a memory area reserved for a valid virtual method.

7 Summary

Without question, control-flow integrity offers a strong defense against runtime attacks. Its promise lies in the fact that it provides a general defense mechanism to thwart such attacks. Rather than focusing on patching program vulnerabilities one by one, CFI’s power stems from focusing on the integrity of the program’s control flow regardless of how many bugs and errors it may suffer from. Unfortunately, several pragmatic issues (most notably, its relatively high performance overhead) have limited its widespread adoption.

To better tackle the trade-off between security and performance, several coarse-grained CFI solutions have been proposed to date [13, 20, 31, 45, 46]. Additionally, it has been recently shown that such coarse-grained CFI policies can be applied to operating system kernels [16]. These proposals all use relaxed policies, e.g., allowing returns to target any instruction following a call instruction.

While many advancements have been made along the way, all too often the relaxed enforcement policies significantly diminish the security afforded by the seminal work of Abadi et al. [3]. This realization is a bit troubling, and calls for a broader acceptance that we should not sacrifice security for small performance gains. Doing so simply does not raise the bar high enough to deter skillful adversaries. Indeed, our own work shows that even if coarse-grained CFI solutions are combined, there is still enough leeway to mount reasonable and Turing-complete ROP attacks. Our hope is that our findings will raise better awareness of some of the critical issues when designing robust CFI mechanisms, all the while re-energizing the community to explore more efficient solutions for empowering CFI.

8 Acknowledgments

We thank Kevin Z. Snow and Úlfar Erlingsson for their valuable feedback on earlier versions of this paper.

References

[1] M. Abadi, M. Budiu, Ú. Erlingsson, and J. Ligatti. Control-flow integrity: Principles, implementations, and applications. In ACM Conference on Computer and Communications Security (CCS), 2005.

[2] M. Abadi, M. Budiu, Ú. Erlingsson, G. C. Necula, and M. Vrable. XFI: Software guards for system address spaces. In USENIX Symposium on Operating Systems Design and Implementation (OSDI), 2006.

[3] M. Abadi, M. Budiu, Ú. Erlingsson, and J. Ligatti. Control-flow integrity: Principles, implementations, and applications. ACM Transactions on Information and System Security (TISSEC), 13(1), 2009.

[4] Aleph One. Smashing the stack for fun and profit. Phrack Magazine, 49(14), 2000.

[5] E. Bachaalany. Inside EMET 4.0. REcon Montreal, 2013. Presentation. Slides: http://recon.cx/2013/slides/Recon2013-Elias%20Bachaalany-Inside%20EMET%204.pdf.

[6] blexim. Basic integer overflows. Phrack Magazine, 60(10), 2002.

[7] E. Buchanan, R. Roemer, H. Shacham, and S. Savage. When good instructions go bad: Generalizing return-oriented programming to RISC. In ACM Conference on Computer and Communications Security (CCS), 2008.

[8] M. Budiu, Ú. Erlingsson, and M. Abadi. Architectural support for software-based protection. In Workshop on Architectural and System Support for Improving Software Dependability, ASID ’06, 2006.

[9] Bulba and Kil3r. Bypassing StackGuard and StackShield. Phrack Magazine, 56(5), 1996.

[10] C4SS!0 and h1ch4m. MPlayer Lite r33064 m3u Buffer Overflow Exploit (DEP Bypass). http://www.exploit-db.com/exploits/17565/, 2011.

[11] N. Carlini and D. Wagner. ROP is still dangerous: Breaking modern defenses. In USENIX Security Symposium, 2014.



[12] S. Checkoway, L. Davi, A. Dmitrienko, A.-R. Sadeghi, H. Shacham, and M. Winandy. Return-oriented programming without returns. In ACM Conference on Computer and Communications Security (CCS), 2010.

[13] Y. Cheng, Z. Zhou, Y. Miao, X. Ding, and R. H. Deng. ROPecker: A generic and practical approach for defending against ROP attacks. In Symposium on Network and Distributed System Security (NDSS), 2014.

[14] T. Chiueh and F.-H. Hsu. RAD: A compile-time solution to buffer overflow attacks. In International Conference on Distributed Computing Systems (ICDCS), 2001.

[15] C. Cowan, C. Pu, D. Maier, H. Hinton, J. Walpole, P. Bakke, S. Beattie, A. Grier, P. Wagle, and Q. Zhang. StackGuard: Automatic adaptive detection and prevention of buffer-overflow attacks. In USENIX Security Symposium, 1998.

[16] J. Criswell, N. Dautenhahn, and V. Adve. KCoFI: Complete control-flow integrity for commodity operating system kernels. In IEEE Symposium on Security and Privacy, Oakland ’14, 2014.

[17] D. Dai Zovi. Practical return-oriented programming. SOURCE Boston, 2010. Presentation. Slides: http://trailofbits.files.wordpress.com/2010/04/practical-rop.pdf.

[18] L. Davi, A. Dmitrienko, M. Egele, T. Fischer, T. Holz, R. Hund, S. Nürnberger, and A.-R. Sadeghi. MoCFI: A framework to mitigate control-flow attacks on smartphones. In Symposium on Network and Distributed System Security (NDSS), 2012.

[19] L. Davi, D. Lehmann, A.-R. Sadeghi, and F. Monrose. Stitching the gadgets: On the ineffectiveness of coarse-grained control-flow integrity protection. Technical Report TUD-CS-2014-0097, Technische Universität Darmstadt, 2014.

[20] I. Fratric. ROPGuard: Runtime prevention of return-oriented programming attacks. http://www.ieee.hr/_download/repository/Ivan_Fratric.pdf, 2012.

[21] gera. Advances in format string exploitation. Phrack Magazine, 59(12), 2002.

[22] E. Göktas, E. Athanasopoulos, H. Bos, and G. Portokalidis. Out of control: Overcoming control-flow integrity. In IEEE Symposium on Security and Privacy, Oakland ’14, 2014.

[23] E. Göktas, E. Athanasopoulos, M. Polychronakis, H. Bos, and G. Portokalidis. Size does matter: Why using gadget-chain length to prevent code-reuse attacks is hard. In USENIX Security Symposium, 2014.

[24] S. Jalayeri. Bypassing EMET 3.5’s ROP mitigations. https://repret.wordpress.com/2012/08/08/bypassing-emet-3-5s-rop-mitigations/, 2012.

[25] D. Jang, Z. Tatlock, and S. Lerner. SAFEDISPATCH: Securing C++ virtual calls from memory corruption attacks. In Symposium on Network and Distributed System Security (NDSS), 2014.

[26] jduck. The latest Adobe exploit and session upgrading. http://bugix-security.blogspot.de/2010/03/adobe-pdf-libtiff-working-exploitcve.html, 2010.

[27] T. Kornau. Return oriented programming for the ARM architecture. Master’s thesis, Ruhr-University Bochum, 2009.

[28] Microsoft. Data Execution Prevention (DEP). http://support.microsoft.com/kb/875352/EN-US/, 2006.

[29] Microsoft. Enhanced Mitigation Experience Toolkit. https://www.microsoft.com/emet, 2014.

[30] Nergal. The advanced return-into-lib(c) exploits: PaX case study. Phrack Magazine, 58(4), 2001.

[31] V. Pappas, M. Polychronakis, and A. D. Keromytis. Transparent ROP exploit mitigation using indirect branch tracing. In USENIX Security Symposium, 2013.

[32] J. Pewny and T. Holz. Compiler-based CFI for iOS. In Annual Computer Security Applications Conference (ACSAC), 2013.

[33] J. Pincus and B. Baker. Beyond stack smashing: Recent advances in exploiting buffer overruns. IEEE Security and Privacy, 2(4), 2004.

[34] F. Schuster, T. Tendyck, J. Pewny, A. Maaß, M. Steegmanns, M. Contag, and T. Holz. Evaluating the effectiveness of current anti-ROP defenses. In Symposium on Recent Advances in Intrusion Detection (RAID), 2014.

[35] H. Shacham. The geometry of innocent flesh on the bone: Return-into-libc without function calls (on the x86). In ACM Conference on Computer and Communications Security (CCS), 2007.

[36] K. Z. Snow, F. Monrose, L. Davi, A. Dmitrienko, C. Liebchen, and A.-R. Sadeghi. Just-in-time code reuse: On the effectiveness of fine-grained address space layout randomization. In IEEE Symposium on Security and Privacy, Oakland ’13, 2013.

[37] Solar Designer. "return-to-libc" attack. Bugtraq, 1997.

[38] A. Sotirov and M. Dowd. Bypassing browser memory protections in Windows Vista. http://www.phreedom.org/research/bypassing-browser-memory-protections/, 2008.

[39] M. Thomlinson. Announcing the BlueHat Prize winners. https://blogs.technet.com/b/msrc/archive/2012/07/26/announcing-the-bluehat-prize-winners.aspx?Redirected=true, 2012.

[40] C. Tice, T. Roeder, P. Collingbourne, S. Checkoway, Ú. Erlingsson, L. Lozano, and G. Pike. Enforcing forward-edge control-flow integrity in GCC & LLVM. In USENIX Security Symposium, 2014.

[41] V. van der Veen, N. dutt-Sharma, L. Cavallaro, and H. Bos. Memory errors: The past, the present, and the future. In Symposium on Research in Attacks, Intrusions, and Defenses (RAID), 2012.

[42] Z. Wang and X. Jiang. HyperSafe: A lightweight approach to provide lifetime hypervisor control-flow integrity. In IEEE Symposium on Security and Privacy, 2010.

[43] B. Yee, D. Sehr, G. Dardyk, J. B. Chen, R. Muth, T. Ormandy, S. Okasaka, N. Narula, and N. Fullagar. Native Client: A sandbox for portable, untrusted x86 native code. In IEEE Symposium on Security and Privacy, 2009.

[44] B. Zeng, G. Tan, and G. Morrisett. Combining control-flow integrity and static analysis for efficient and validated data sandboxing. In ACM Conference on Computer and Communications Security (CCS), 2011.

[45] C. Zhang, T. Wei, Z. Chen, L. Duan, L. Szekeres, S. McCamant, D. Song, and W. Zou. Practical control flow integrity & randomization for binary executables. In IEEE Symposium on Security and Privacy, Oakland ’13, 2013.

[46] M. Zhang and R. Sekar. Control flow integrity for COTS binaries. In USENIX Security Symposium, 2013.



A NULL-Byte Write Gadget

In real-world exploits it is useful to have gadgets that allow one to conveniently write a NULL word to memory. This is important as real-world vulnerabilities typically do not allow an adversary to write a NULL byte in the payload, but such functionality is indeed needed to write a 32-bit NULL word on the stack when required as a parameter to function calls.

A prominent example is the traditional strcpy(dest,src) vulnerability, which can be exploited to write data beyond the boundaries of the dest variable. However, strcpy() stops copying input data after encountering a NULL byte.

[Figure 8 shows the memory layout for the NULL gadget: ROP Gadget 1 (RET 1) places ADDR + 20h on the stack for ROP Sequence 1 (POP ebp; RET), and ROP Gadget 2 (RET 2) invokes ROP Sequence 2 (AND [ebp-20h],0; RET), which writes NULL at ADDR.]
Figure 8: Details of NULL Gadget

Our choice for such a gadget is shown in Figure 8. This gadget first loads the target address into ebp with the first ROP sequence. The next sequence exploits the and instruction to generate a NULL word at the memory location pointed to by ebp-20h.

B Stack Pivot Gadgets

We take advantage of two distinct stack pivot gadgets shown in Table 8. The first one is our unconditional branch gadget, which moves ebp via the leave instruction to esp. The other sequence takes the value of esi and loads it into esp. In both sequences, the adversary must control the source register, ebp and esi, respectively. This is achieved by invoking a load register gadget beforehand. Note also that several vulnerabilities allow an adversary to load these registers with the correct values at the time the buffer overflow occurs, which would make the ROP attack easier.

Type      Call-Preceded Sequence (ending in ret)
Pivot 1   leave
Pivot 2   mov esp, esi; pop ebx; pop edi; pop esi; pop ebp

Table 8: Stack Pivot Gadgets

C Details of Long-NOP Gadget

    pop esi                 ; ptr to writable mem for NOP
    pop edi                 ; ptr to writable mem for NOP
    pop ebp                 ; unused in NOP
    retn 8                  ; -> insert 8 bytes junk after next gadget

Listing 1: Pre-Sequence for LNOP

    movzx eax, ax
    mov [esi+4], eax        ; 5 writes to
    mov [esi+8], 1F4Bh      ; a 20 byte
    mov [esi+14h], 5        ; memory region
    mov [esi+10h], 1Fh
    mov [esi+0Ch], 0Ch
    push 3Bh
    pop eax
    mov [esi+1Ch], eax      ; 2 writes to
    mov [esi+20h], eax      ; 8 byte region
    xor eax, eax
    mov [esi+18h], 17h      ; another 8 bytes
    mov [esi+24h], 98967Fh
    mov [edi+18h], eax      ; if edi == esi
    mov [edi+1Ch], eax      ; these writes
    mov [edi+20h], eax      ; go to the same
    mov [edi+24h], eax      ; region as before
    pop edi                 ; (optional:) restore edi
    pop esi                 ; (optional:) restore esi
    mov eax, ebx
    pop ebx                 ; (optional:) load former eax
    pop ebp
    retn 0Ch

Listing 2: Long sequence used for LNOP gadget

Notes

1 Some of the mechanisms used in kBouncer and ROPGuard (both awarded Microsoft’s BlueHat Prize [39]) have already been integrated in Microsoft’s defense tool called EMET [29].
2 Sequences that end in indirect jumps or calls can also be used [12].
3 Typically, CFI does not validate direct branches because these addresses are hard-coded in the code of an executable and cannot be changed by an adversary when W⊕X is enforced.
4 Specifically, kBouncer reports a ROP attack when a chain of 8 short sequences has been executed, where a sequence is referred to as “short” whenever the sequence length is less than 20 instructions.
5 The target address of an external function is dynamically allocated in the global offset table (GOT) which is loaded by an indirect memory jump in the procedure linkage table (PLT).
6 For the interested reader, we have placed the specific assembler implementation of the long-NOP sequence in Appendix C.
7 We also simulated the analysis performed in [31] by setting n = 20. However, we arrive at a significantly higher false positive rate than in [31]. This is likely due to the fact that we perform our analysis on industry benchmark programs, while their analysis is based on opening web-browsers or document readers. Furthermore, their focus is on WinAPI calls, whereas in Figure 7 we instrument every call.
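
As a consistency check on Appendix C, the store offsets in Listing 2 can be tallied to see that they cover the 36-byte writable area quoted for the long-NOP gadget. The Python sketch below is illustrative only: it assumes that every memory store in Listing 2 is a 32-bit (4-byte) write and that edi equals esi, and the names STORE_OFFSETS and covered_bytes are ours rather than part of the original exploit code.

```python
# Offsets (relative to esi, or to edi when edi == esi) of the memory stores
# in Listing 2. Assumption: each store writes 4 bytes (a 32-bit mov).
STORE_OFFSETS = [
    0x04, 0x08, 0x14, 0x10, 0x0C,   # "5 writes to a 20 byte memory region"
    0x1C, 0x20,                     # "2 writes to 8 byte region"
    0x18, 0x24,                     # "another 8 bytes"
    0x18, 0x1C, 0x20, 0x24,         # edi-based stores to the same region
]
STORE_WIDTH = 4


def covered_bytes(offsets, width=STORE_WIDTH):
    """Return the set of byte offsets touched by the given stores."""
    covered = set()
    for off in offsets:
        covered.update(range(off, off + width))
    return covered


region = covered_bytes(STORE_OFFSETS)
lowest, highest = min(region), max(region)
is_contiguous = len(region) == highest - lowest + 1

print(len(region), hex(lowest), hex(highest), is_contiguous)
# prints: 36 0x4 0x27 True
```

Under these assumptions the stores touch one contiguous block from offset 4h to 27h, which is 36 bytes and matches the 20 + 8 + 8 bytes given in the listing comments.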
