function hooking
for osx and linux
joe damato
@joedamato
[Link]
slides are on
[Link]
(free jmpesp)
i’m not a security
researcher.
call me a script kiddie:
@joedamato
assembly is in AT&T syntax
[Link]
WTF is an ABI?
WTF is an Application Binary Interface?
alignment
[Link]
calling convention
[Link]
object file and
library formats
[Link]
hierarchy of specs
[Link]
System V ABI (271 pages)
System V ABI AMD64 Architecture Processor
Supplement (128 pages)
System V ABI Intel386 Architecture Processor
Supplement (377 pages)
MIPS, ARM, PPC, and IA-64 too!
mac osx x86-64 calling convention
based on
System V ABI AMD64 Architecture
Processor Supplement
[Link]
alignment
[Link]
end of argument area must be
aligned on a 16byte boundary.
and $0xfffffffffffffff0, %rsp
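The mask in that `and` just clears the low four bits of the stack pointer. A minimal sketch of the same arithmetic in C; the function names are made up for illustration and are not from the talk:

```c
#include <stdint.h>

/* Round an address down to the nearest 16-byte boundary. This is the
 * effect `and $0xfffffffffffffff0, %rsp` has on the stack pointer. */
uint64_t align_down_16(uint64_t addr)
{
    return addr & ~(uint64_t)0xf;
}

/* Returns 1 when addr already sits on a 16-byte boundary. */
int is_aligned_16(uint64_t addr)
{
    return (addr & 0xf) == 0;
}
```

Rounding down is the safe direction for a stack that grows toward lower addresses: it only ever reserves a few extra bytes.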
calling convention
[Link]
• function arguments from left to right live in:
%rdi, %rsi, %rdx, %rcx, %r8, %r9
• that’s for INTEGER class items.
• floating point (SSE class) items go in %xmm0 - %xmm7.
• Other stuff (MEMORY class) gets passed on the stack (like
on i386).
• registers are either caller or callee saved
object file and
library formats
[Link]
[Link]
ELF Objects
[Link]
ELF Objects
• ELF objects have headers
• elf header (describes the elf object)
• program headers (describe segments)
• section headers (describe sections)
• libelf is useful for walking the elf object and extracting
information.
• the executable and each .so has its own set of data
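To make "the elf header describes the elf object" concrete, here is a small sketch that looks only at the first 16 bytes of a file (the `e_ident` array). It does not use libelf; `struct elf_ident` and `parse_elf_ident` are hypothetical names, and the byte offsets are the standard `EI_CLASS` and `EI_DATA` positions:

```c
#include <stddef.h>
#include <stdint.h>

/* What we can learn from e_ident alone. */
struct elf_ident {
    int is_elf;         /* magic bytes matched */
    int bits;           /* 32 or 64, 0 if unknown */
    int little_endian;  /* 1 if the object is little endian */
};

struct elf_ident parse_elf_ident(const uint8_t *buf, size_t len)
{
    struct elf_ident id = {0, 0, 0};

    if (len < 16)
        return id;
    /* every ELF object starts with 0x7f 'E' 'L' 'F' */
    if (buf[0] != 0x7f || buf[1] != 'E' || buf[2] != 'L' || buf[3] != 'F')
        return id;

    id.is_elf = 1;
    id.bits = buf[4] == 2 ? 64 : buf[4] == 1 ? 32 : 0;  /* EI_CLASS */
    id.little_endian = buf[5] == 1;                     /* EI_DATA */
    return id;
}
```

Feeding it the first bytes of a Mach-O file returns `is_elf == 0`, since Mach-O uses a different magic number.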
ELF Object sections
• .text - code lives here
• .plt - stub code that helps to “resolve”
absolute function addresses.
• .got.plt - absolute function addresses; used
by .plt entries.
• .debug_info - debugging information
• .gnu_debuglink - checksum and filename for
debug info
ELF Object sections
• .dynsym - maps exported symbol names to
offsets
• .dynstr - stores exported symbol name
strings
• .symtab - maps symbol names to offsets
• .strtab - symbol name strings
• more sections for other stuff.
[Link]
Mach-O Objects
[Link]
Mach-O Objects
• Mach-O objects have load commands
• header (describes the mach-o object)
• load commands (describe layout and linkage info)
• segment commands (describes sections)
• dyld(3) describes some apis for touching mach-o
objects
• the executable and each dylib/bundle has its own set
of data
Mach-O sections
• __text - code lives here
• __symbol_stub1 - list of jmpq instructions
for runtime dynamic linking
• __stub_helper - stub code that helps to
“resolve” absolute function addresses.
• __la_symbol_ptr - absolute function
addresses; used by symbol stub
Mach-O sections
• symtabs do not live in a segment, they have
their own load commands.
• LC_SYMTAB - holds offsets for symbol
table and string table.
• LC_DYSYMTAB - a list of 32bit offsets into
LC_SYMTAB for dynamic symbols.
[Link]
nm
% nm /usr/bin/ruby
000000000048ac90 t Balloc
0000000000491270 T Init_Array
0000000000497520 T Init_Bignum
000000000041dc80 T Init_Binding
000000000049d9b0 T Init_Comparable
000000000049de30 T Init_Dir
00000000004a1080 T Init_Enumerable
00000000004a3720 T Init_Enumerator
00000000004a4f30 T Init_Exception
000000000042c2d0 T Init_File
0000000000434b90 T Init_GC
(first column: symbol “value”; last column: symbol names)
objdump
% objdump -D /usr/bin/ruby
(columns shown: offsets, opcodes, instructions, helpful metadata)
readelf
% readelf -a /usr/bin/ruby
This is a *tiny* subset of the data available
otool
% otool -l /usr/bin/ruby
This is a *tiny* subset of the data available
[Link]
strip
• You can strip out whatever sections you
want....
• but your binary may not run.
• you need to leave the dynamic symbol/
string tables intact or dynamic linking will
not work.
[Link]
Calling functions
callq *%rbx
callq 0xdeadbeef
other ways, too...
anatomy of a call
(objdump output)
412d16: e8 c1 36 02 00 callq 4363dc # <a_function>
412d1b: .....
• 412d16: address of this instruction
• e8: call opcode
• c1 36 02 00: 32bit displacement to the target function from the next
instruction.
anatomy of a call
(objdump output)
412d16: e8 c1 36 02 00 callq 4363dc # <a_function>
412d1b: .....
(x86 is little endian)
412d1b + 000236c1 = 4363dc
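The same arithmetic as a small C sketch: given the five bytes of an `e8` call and the address they live at, compute where the call lands. `call_target` is a hypothetical helper name, and the bytes are assembled by hand so the code does not depend on the host's endianness:

```c
#include <stdint.h>

/* Decode `e8 <rel32>` located at insn_addr and return the call target.
 * Returns 0 if the first byte is not the e8 opcode. */
uint64_t call_target(const uint8_t *insn, uint64_t insn_addr)
{
    uint32_t raw;

    if (insn[0] != 0xe8)
        return 0;

    /* the displacement is stored little endian */
    raw = (uint32_t)insn[1] | (uint32_t)insn[2] << 8 |
          (uint32_t)insn[3] << 16 | (uint32_t)insn[4] << 24;

    /* displacement is signed and relative to the NEXT instruction,
     * which starts 5 bytes after this one */
    return insn_addr + 5 + (uint64_t)(int64_t)(int32_t)raw;
}
```

With the bytes from the objdump line (`e8 c1 36 02 00` at `0x412d16`) this gives `0x4363dc`.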
Hook a_function
Overwrite the displacement so that all calls
to a_function actually call a different function
instead.
It may look like this:
int other_function()
{
    /* do something good/bad */
    /* be sure to call a_function! */
    return a_function();
}
codez are easy
/* CHILL, it’s fucking pseudo code */
while (are_moar_bytes()) {
    curr_ins = next_ins;
    next_ins = get_next_ins();
    if (curr_ins->type == INSN_CALL) {
        if ((hook_me - next_ins) == curr_ins->displacement) {
            /* found a call to hook_me! */
            rewrite(curr_ins->displacement, (replacement_fn - next_ins));
            return 0;
        }
    }
}
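A compilable sketch of that loop, under one loud assumption: instead of a real disassembler it scans byte by byte for `0xe8`, which can misfire when `e8` shows up inside some other instruction. `hook_calls` and its parameters are hypothetical names. It works on a plain buffer; patching a live process would also need the page to be writable.

```c
#include <stddef.h>
#include <stdint.h>

static uint32_t read_le32(const uint8_t *p)
{
    return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
           (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

static void write_le32(uint8_t *p, uint32_t v)
{
    p[0] = v & 0xff;
    p[1] = (v >> 8) & 0xff;
    p[2] = (v >> 16) & 0xff;
    p[3] = (v >> 24) & 0xff;
}

/* Rewrite every `call old_target` in code[] (mapped at address base)
 * into `call new_target`. Returns the number of calls patched.
 * ASSUMPTION: naive byte scan, not a real instruction decoder. */
size_t hook_calls(uint8_t *code, size_t len, uint64_t base,
                  uint64_t old_target, uint64_t new_target)
{
    size_t patched = 0;

    for (size_t i = 0; i + 5 <= len; i++) {
        uint64_t next;
        int64_t disp;

        if (code[i] != 0xe8)
            continue;

        next = base + i + 5;  /* address of the next instruction */
        if (next + (uint64_t)(int64_t)(int32_t)read_le32(code + i + 1)
                != old_target)
            continue;

        disp = (int64_t)(new_target - next);
        if (disp < INT32_MIN || disp > INT32_MAX)
            continue;         /* new target is out of rel32 range */

        write_le32(code + i + 1, (uint32_t)(int32_t)disp);
        patched++;
        i += 4;               /* skip the displacement just written */
    }
    return patched;
}
```

Unlike the pseudo code, this keeps going after the first match, so every call site in the buffer gets redirected.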
... right?.....
[Link]
32bit displacement
• overwriting an existing call with another call
• stack will be aligned
• args are good to go
• can’t redirect to code that is outside of:
• [rip + 32bit displacement]
• you can scan the address space looking for
an available page with mmap, though...
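The range limit is easy to check before patching. A minimal sketch; `rel32_reachable` is a hypothetical name:

```c
#include <stdint.h>

/* Returns 1 if a call or jmp whose next instruction is at next_insn
 * can reach target using a signed 32bit displacement. */
int rel32_reachable(uint64_t next_insn, uint64_t target)
{
    int64_t disp = (int64_t)(target - next_insn);

    return disp >= INT32_MIN && disp <= INT32_MAX;
}
```

Code in the main executable (low addresses) usually cannot reach a library mapped near `0x7fff...` this way, which is why scanning for a nearby free page with mmap comes up.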
Doesn’t work for all
calling a function that is exported by a
dynamic library works differently.
How runtime dynamic
linking works (elf)
• Initially, the .got.plt entry contains the address of the instruction
after the jmp (0x7ffff7afd6e6).
• An ID is stored and the rtld is invoked; the .got.plt entry is still
0x7ffff7afd6e6.
• rtld writes the address of rb_newobj to the .got.plt entry
(0x7ffff7b34ac0).
• calls to the PLT entry jump immediately to rb_newobj now
that .got.plt is filled in.
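A toy model of lazy binding, using a plain function pointer in place of the GOT entry. Nothing here touches the real dynamic linker: `got_slot`, `lazy_stub`, `real_function` and `call_via_plt` are all hypothetical stand-ins (with `real_function` playing the role of rb_newobj).

```c
typedef int (*fn_t)(void);

int resolver_calls;                 /* how many times the "rtld" ran */

int real_function(void) { return 42; }

static int lazy_stub(void);

/* Toy GOT entry: starts out pointing back at the lazy stub. */
static fn_t got_slot = lazy_stub;

/* Stand-in for the rtld: resolve the symbol, fill in the slot,
 * then transfer to the real function. */
static int lazy_stub(void)
{
    resolver_calls++;
    got_slot = real_function;
    return got_slot();
}

/* Toy PLT entry: always jumps through the slot. */
int call_via_plt(void)
{
    return got_slot();
}
```

The first `call_via_plt()` runs the resolver once; every later call goes straight to `real_function`.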
[Link]
Hook the GOT
Redirect execution by overwriting all
the .got.plt entries for rb_newobj in each
DSO with a handler function instead.
Hook the GOT
VALUE other_function()
{
    VALUE new_obj = rb_newobj();
    /* do something with new_obj */
    return new_obj;
}
.got.plt entry: 0xdeadbeef
WAIT... other_function() calls rb_newobj(), isn’t that an infinite loop?
NO, it isn’t. other_function() lives in its own DSO, so its
calls to rb_newobj() use the .plt/.got.plt in its own DSO.
As long as we leave other_function()’s DSO unmodified, we’ll
avoid an infinite loop.
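The "no infinite loop" argument can be modeled with one GOT slot per DSO. This is only a sketch with function pointers: `struct dso`, the three entries, `real_newobj` and `hook_got` are hypothetical, and a real hook would locate the entries by walking each loaded object's relocation tables.

```c
#include <stddef.h>

typedef int (*fn_t)(void);

struct dso {
    const char *name;
    fn_t got_rb_newobj;   /* this DSO's own GOT slot for rb_newobj */
};

enum { RUBY, EXT, HOOK, NDSO };

int allocations_seen;

int real_newobj(void) { return 7; }

/* Every DSO starts with its slot resolved to the real function. */
struct dso dsos[NDSO] = {
    [RUBY] = { "ruby",    real_newobj },
    [EXT]  = { "ext.so",  real_newobj },
    [HOOK] = { "hook.so", real_newobj },
};

/* Lives in the hook DSO, so it calls through the hook DSO's slot. */
int other_function(void)
{
    int obj = dsos[HOOK].got_rb_newobj();

    allocations_seen++;   /* do something with the new object */
    return obj;
}

/* Point every slot except the hook DSO's own at other_function. */
size_t hook_got(void)
{
    size_t patched = 0;

    for (size_t i = 0; i < NDSO; i++) {
        if (i == HOOK)
            continue;     /* leave our own DSO unmodified */
        dsos[i].got_rb_newobj = other_function;
        patched++;
    }
    return patched;
}
```

If the loop did not skip `HOOK`, `other_function` would call itself through its own slot and never return.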
what else is left?
inline functions.
add_freelist
• Can’t hook because add_freelist is inlined:
static inline void
add_freelist(p)
    RVALUE *p;
{
    p->as.free.flags = 0;
    p->as.free.next = freelist;
    freelist = p;
}
• The compiler has the option of
inserting the instructions of this
function directly into the callers.
• If this happens, you won’t see any calls.
So... what now?
• Look carefully at the code:
static inline void
add_freelist(p)
    RVALUE *p;
{
    p->as.free.flags = 0;
    p->as.free.next = freelist;
    freelist = p;
}
• Notice that freelist gets updated.
• freelist has file level scope.
• hmmmm......
A (stupid) crazy idea
• freelist has file level scope and lives at some
static address.
• add_freelist updates freelist, so...
• Why not search the binary for mov instructions
that have freelist as the target!
• Overwrite that mov instruction with a call to
our code!
• But... we have a problem.
• The system isn’t ready for a call instruction.
alignment
[Link]
calling convention
[Link]
Isn’t ready? What?
• The 64bit ABI says that the stack must be
aligned to a 16byte boundary after any/all
arguments have been arranged.
• Since the overwrite is just some random
mov, no way to guarantee that the stack is
aligned.
• If we just plop in a call instruction, we
won’t be able to arrange for arguments to
get put in the right registers.
• So now what?
jmp
• Can use a jmp instruction.
• Transfer execution to an assembly stub
generated at runtime.
• recreate the overwritten instruction
• set the system up to call a function
• do something good/bad
• jmp back when done to resume execution
checklist
• save and restore caller/callee saved
registers.
• align the stack.
• recreate what was overwritten.
• arrange for any arguments your
replacement function needs to end up in
registers.
• invoke your code.
• resume execution as if nothing happened.
this instruction updates the freelist and comes from
add_freelist:
Can’t overwrite it with a call instruction because the
state of the system is not ready for a function call.
The jmp instruction and its offset are 5 bytes wide.
Can’t grow or shrink the binary, so insert 2 one byte
NOPs.
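A sketch of the overwrite itself, assuming the mov being replaced is 7 bytes long (5 for the jmp plus the 2 NOPs). `write_jmp` is a hypothetical helper that works on a plain buffer; in a live process the page would first need to be made writable.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Overwrite an insn_len-byte instruction at site (mapped at site_addr)
 * with `jmp rel32` to stub_addr, padding what is left with one byte
 * NOPs so the binary neither grows nor shrinks.
 * Returns 0 on success, -1 if the jmp does not fit or cannot reach. */
int write_jmp(uint8_t *site, size_t insn_len,
              uint64_t site_addr, uint64_t stub_addr)
{
    int64_t disp;
    uint32_t raw;

    if (insn_len < 5)
        return -1;

    /* displacement is relative to the instruction after the jmp */
    disp = (int64_t)(stub_addr - (site_addr + 5));
    if (disp < INT32_MIN || disp > INT32_MAX)
        return -1;
    raw = (uint32_t)(int32_t)disp;

    site[0] = 0xe9;                    /* jmp rel32 opcode */
    site[1] = raw & 0xff;              /* little endian displacement */
    site[2] = (raw >> 8) & 0xff;
    site[3] = (raw >> 16) & 0xff;
    site[4] = (raw >> 24) & 0xff;
    memset(site + 5, 0x90, insn_len - 5);  /* 0x90 = nop */
    return 0;
}
```

The stub jumps back to `site_addr + insn_len`, so the NOPs are never actually executed; they only keep the following instructions at their original offsets.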
shortened assembly
stub
void handler(VALUE freed_object)
{
    mark_object_freed(freed_object);
    return;
}
and it actually works.
gem install memprof
[Link]
[Link]
Sample Output
require 'memprof'
[Link]
require "stringio"
[Link]
[Link]
108 /custom/ree/lib/ruby/1.8/x86_64-linux/[Link]:__node__
14 test2.r[Link]String
2 /custom/ree/lib/ruby/1.8/x86_64-linux/[Link]:Class
1 test2.r[Link]StringIO
1 test2.r[Link]String
1 test2.r[Link]rray
1 /custom/ree/lib/ruby/1.8/x86_64-linux/[Link]:Enumerable
[Link]
a web-based heap visualizer and leak analyzer
[Link]
[Link](Memprof::Tracer)
{
  "time": 4.3442,                 // total time for request
  "rails": {                      // rails controller/action
    "controller": "test",
    "action": "index"
  },
  "request": {                    // request env info
    "REQUEST_PATH": "/test",
    "REQUEST_METHOD": "GET"
  },
  "mysql": {
    "queries": 3,                 // 3 mysql queries
    "time": 0.00109302
  },
  "gc": {
    "calls": 8,                   // 8 calls to GC
    "time": 2.04925               // 2 secs spent in GC
  },
  "objects": {
    "created": 3911103,           // 3 million objs created
    "types": {
      "none": 1168831,            // 1 million method calls
      "object": 1127,             // object instances
      "float": 627,
      "string": 1334637,          // lots of strings
      "array": 609313,            // lots of arrays
      "hash": 3676,
      "match": 70211              // regexp matches
    }
  }
}
[Link]
[Link]
evil lives
[Link]
• makes ruby faster!11!!1
• hooks read syscall
• looks for magic cookie (JOE)
• turns off GC
• Ruby is fast.
it makes ruby faster!!1!
look a bullshit
benchmark!
it makes ruby faster!!1!
#NORMAL RUBY!!!!11!!
[joe@mawu:/Users/joe/code/defcon/memprof/ext]% ab -c 10 -n 200 [Link]
4567/hi/JOE
Benchmarking blah (be patient)
Completed 100 requests
Completed 200 requests
Finished 200 requests
Concurrency Level: 10
Time taken for tests: 7.462 seconds
Complete requests: 200
Failed requests: 0
Write errors: 0
Requests per second: 26.80 [#/sec] (mean)
Time per request: 373.108 [ms] (mean)
Time per request: 37.311 [ms] (mean, across all concurrent requests)
it makes ruby faster!!1!
# fast0r RUBY!!!11!111
[joe@mawu:/Users/joe/code/defcon]% ab -c 10 -n 200 [Link]
Benchmarking blah (be patient)
Completed 100 requests
Completed 200 requests
Finished 200 requests
Concurrency Level: 10
Time taken for tests: 6.594 seconds
Complete requests: 200
Failed requests: 0
Write errors: 0
Requests per second: 30.33 [#/sec] (mean)
Time per request: 329.708 [ms] (mean)
Time per request: 32.971 [ms] (mean, across all concurrent requests)
you can do anything
• this example is stupid, but you can do
anything.
• hook read/write and phone home with
data.
• fork a backdoor when a specific cookie is
seen
• whatever
[Link]
[Link]
injectso
• written by Shaun Clowes
• injects libraries into running processes
using ptrace(2).
• super clever hack!
[Link]
injecting live processes
• ptrace(2)
• allows you to view and modify the
register set and address space of another
process
• permissions on memory are ignored
fucking injectso, how
does it work?
• attach to target process using ptrace
• save a copy of a small piece of the program
stack.
• save a copy of the register set
• create a fake stack frame with a saved return
address of 0
fucking injectso, how
does it work?
• set register set to point at dlopen
• rip = &dlopen
• rdi = dso name
• rsi = mode
• let ’er rip, waitpid and it’ll segfault on return
to 0.
• restore stack, register set, resume as
normal.
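The register juggling can be sketched without touching a live process. `struct regs` below is a cut-down, hypothetical stand-in for the full register set that ptrace hands back, and `prepare_call` is an illustrative helper, not injectso's actual code. It builds the register values for a forced two-argument call and leaves room for the fake return address of 0.

```c
#include <stdint.h>

/* Only the registers this trick cares about. */
struct regs {
    uint64_t rip, rdi, rsi, rsp;
};

/* Given the saved registers of the stopped target, build the register
 * set that makes it "call" func(arg0, arg1). The caller is expected to
 * write a 0 at the returned rsp inside the target, so the function
 * returns to address 0 and faults. */
struct regs prepare_call(struct regs saved, uint64_t func,
                         uint64_t arg0, uint64_t arg1)
{
    struct regs r = saved;

    r.rsp = saved.rsp & ~(uint64_t)0xf;  /* 16-byte align the stack */
    r.rsp -= 8;                          /* slot for the fake return
                                          * address, as if a call had
                                          * just pushed it */
    r.rip = func;                        /* e.g. address of dlopen */
    r.rdi = arg0;                        /* dso name (pointer in target) */
    r.rsi = arg1;                        /* mode */
    return r;
}
```

Because `saved` is passed by value, the original register set stays intact and can be written back once the target has faulted on the return to 0.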
ptrace
• remote allocating memory is a pain in the ass.
• generating segfaults in running processes might be bad (core
dumps, etc).
• binary patching is hard, doing it with ptrace is harder.
evil dso
• getting the user to use your library might be hard.
• already running processes will need to be killed first.
• need to poison each time app is started.
• binary patching is hard.
[Link]
combine ‘em
• use injectso hack to load an evil dso
• evil dso will take it from there
64bit injectso port
• ported by Stealth
• [Link]
[Link]
• i did some trivial cleanup and put the codez
on github
• [Link]
• tested it on 64bit ubuntu VM, works.
injectso
+
evil-binary-patching-dso
[Link]
how to defend against it
• NX bit - call mprotect
• strip debug information - mostly prebuilt binaries
• statically link everything - extremely large binaries
• put all .text code in ROM - maybe?
• don’t load DSOs at runtime - no plugins, though
• disable ptrace - no gdb/strace.
• check /proc/<pid>/maps - word.
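For the last item, a rough sketch of what checking the maps file could look like: parse one line of `/proc/<pid>/maps` and report whether it is an executable mapping backed by a file whose path contains some string. `is_exec_mapping_of` is a hypothetical helper, and the field layout assumed is the usual `start-end perms offset dev inode pathname`.

```c
#include <stdio.h>
#include <string.h>

/* Returns 1 if line describes an executable, file-backed mapping whose
 * pathname contains needle; 0 otherwise (including anonymous mappings
 * and lines that do not parse). */
int is_exec_mapping_of(const char *line, const char *needle)
{
    unsigned long start, end;
    char perms[5];
    char path[256];

    if (sscanf(line, "%lx-%lx %4s %*s %*s %*s %255s",
               &start, &end, perms, path) != 4)
        return 0;

    /* perms looks like "r-xp": index 2 is the execute bit */
    return perms[2] == 'x' && strstr(path, needle) != NULL;
}
```

A process could read its own `/proc/self/maps` line by line with fgets and flag anything unexpected, though code that is already injected can of course patch the check too.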
[Link]
my future research:
exploring alternative
binary formats.
[Link]
[Link]
alignment
[Link]
calling convention
[Link]
object file and
library formats
[Link]
questions?
joe damato
@joedamato
[Link]
“Interesting Behavior of
OS X”
• Steven Edwards (winehacker@[Link])
• november 29 2007
• [Link]
devel/2007-November/[Link]
leopard has a pe
loader?
handle = dlopen("./[Link]", RTLD_NOW | RTLD_FIRST );
steven-edwardss-imac:temp sedwards$ ./[Link]
dlopen(./[Link], 258): Library not loaded: WS2_32.dll
Referenced from: /Users/sedwards/Library/Application
Support/CrossOver/Bottles/winetest/drive_c/windows/temp/
[Link]
Reason: image not found
[Link]