Sanitize Your C++ Code - Kostya Serebryany - CppCon 2014
SECURITY
Do you have enough feet to use C++?
Bullet proof boots for C++:
● Finds
○ buffer overflows (stack, heap, globals)
○ heap-use-after-free, stack-use-after-return
○ leaks, ODR violations, init-order fiasco, double-free, etc
● Run-time library
○ malloc replacement (redzones, quarantine)
○ Bookkeeping for error messages
ASan report example: global-buffer-overflow
int global_array[100] = {-1};
int main(int argc, char **argv) {
return global_array[argc + 100]; // BOOM
}
% clang++ -O1 -fsanitize=address a.cc ; ./a.out
[Figure: shadow byte encoding. Legend: good byte, bad byte, shadow value; shadow values shown include 5, 4, 3, 2 and -1.]
ASan virtual address space
Shadow = Addr / 8 + kOffset
0x10007fff8000 - 0x7fffffffffff  Application
0x02008fff7000 - 0x10007fff7fff  Shadow
0x00008fff7000 - 0x02008fff6fff  mprotect-ed
0x00007fff8000 - 0x00008fff6fff  Shadow
0x000000000000 - 0x00007fff7fff  Application
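The mapping formula can be checked against the boundary addresses listed above. This is a minimal sketch, assuming kOffset is 0x7fff8000 (the constant that appears in the x86_64 instrumentation example); MemToShadow is a name chosen here for illustration.

```cpp
#include <cstdint>

// Assumed shadow offset, taken from the x86_64 instrumentation example.
constexpr std::uint64_t kOffset = 0x7fff8000ULL;

// Shadow = Addr / 8 + kOffset: one shadow byte describes 8 application bytes.
constexpr std::uint64_t MemToShadow(std::uint64_t addr) {
  return (addr >> 3) + kOffset;
}

// The ends of the low application range map to the ends of the low shadow range.
static_assert(MemToShadow(0x000000000000ULL) == 0x00007fff8000ULL, "low start");
static_assert(MemToShadow(0x00007fff7fffULL) == 0x00008fff6fffULL, "low end");

// The ends of the high application range map to the ends of the high shadow range.
static_assert(MemToShadow(0x10007fff8000ULL) == 0x02008fff7000ULL, "high start");
static_assert(MemToShadow(0x7fffffffffffULL) == 0x10007fff7fffULL, "high end");
```

Applying the same function to a shadow address lands in the mprotect-ed gap, so an access to "the shadow of the shadow" faults instead of silently succeeding.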
ASan instrumentation: 8-byte access
Before:
  *a = ...
After:
  char *shadow = (a >> 3) + kOffset;
  if (*shadow)
    ReportError(a);
  *a = ...
ASan instrumentation: N-byte access (1, 2, 4)
Before:
  *a = ...
After:
  char *shadow = (a >> 3) + kOffset;
  if (*shadow &&
      *shadow <= ((a&7)+N-1))
    ReportError(a);
  *a = ...
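The N-byte check can be tried on plain values without touching real shadow memory. This is a sketch under an assumed encoding: shadow value 0 means all 8 bytes are addressable, k in 1..7 means only the first k bytes are, and a negative value means none are. IsPoisonedAccess is a hypothetical helper name, not part of the ASan run-time.

```cpp
#include <cstddef>
#include <cstdint>

// Returns true when an n-byte access at addr touches a byte that the
// (assumed) shadow encoding marks as not addressable.
bool IsPoisonedAccess(std::int8_t shadow_value, std::uintptr_t addr,
                      std::size_t n) {
  if (shadow_value == 0) return false;  // whole 8-byte word is addressable
  // Index, within the 8-byte word, of the last byte the access touches.
  const int last_byte = static_cast<int>(addr & 7) + static_cast<int>(n) - 1;
  // Same comparison as on the slide: *shadow <= ((a & 7) + N - 1).
  return shadow_value <= last_byte;
}
```

With shadow value 4, a 4-byte access at offset 0 passes (4 <= 3 is false), while a 4-byte access at offset 2 is reported (4 <= 5). A negative shadow value is reported for every access, because it compares below any byte index.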
Instrumentation example (x86_64)
mov %rdi,%rax
shr $0x3,%rax # shift by 3
cmpb $0x0,0x7fff8000(%rax) # load shadow
je 1f <foo+0x1f>
ud2a # generate SIGILL*
movq $0x1234,(%rdi) # original store
Instrumenting stack frames
Before:
void foo() {
  char a[328];
}
After:
void foo() {
  char rz1[32];  // 32-byte aligned
  char a[328];
  char rz2[24];
  char rz3[32];
  int *shadow = (int *)(((uintptr_t)rz1 >> 3) + kOffset);
  shadow[0] = 0xffffffff;  // poison rz1
  ...
}
Instrumenting globals
Before:
int a;
After:
struct {
  int original;
  char redzone[60];
} a;  // 32-aligned
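The layout of the padded global can be checked at compile time. This is a sketch, assuming a 4-byte int and using alignas(32) to stand in for the "32-aligned" note; PaddedGlobalInt is a name chosen for illustration, not something the instrumentation emits.

```cpp
#include <cstddef>

// Hypothetical stand-in for the struct that replaces the global `int a`.
struct alignas(32) PaddedGlobalInt {
  int original;      // the user's variable
  char redzone[60];  // poisoned padding that follows it
};

// 4 bytes of data + 60 bytes of redzone = 64 bytes, i.e. 8 shadow bytes.
static_assert(sizeof(PaddedGlobalInt) == 64, "assumes 4-byte int");
static_assert(alignof(PaddedGlobalInt) == 32, "32-byte aligned");
static_assert(offsetof(PaddedGlobalInt, redzone) == 4, "redzone follows int");
```

With this layout only the first 4 bytes of the first 8-byte word are addressable, so any access that runs past the int lands in the redzone.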
Malloc replacement
● Run-time library
○ Malloc replacement
○ Intercepts all synchronization
○ Handles reads/writes
TSan report example: data race
int X;
std::thread t([&]{X = 42;});
X = 43;
t.join();
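For contrast, here is one possible race-free variant of the snippet above. It is not from the talk: it assumes std::atomic is an acceptable fix and wraps the code in a hypothetical function, RunWithoutRace. Both writes still happen, but they are now atomic operations rather than unsynchronized plain stores.

```cpp
#include <atomic>
#include <thread>

// Same two writes as the racy example, made atomic.
int RunWithoutRace() {
  std::atomic<int> X{0};
  std::thread t([&] { X.store(42); });  // write from the second thread
  X.store(43);                          // concurrent write from this thread
  t.join();
  return X.load();  // 42 or 43, depending on which store came last
}
```

The final value is still nondeterministic; the difference is that the behavior is well defined.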
0x7f0000000000 - 0x7fffffffffff  Application
0x200000000000 - 0x7effffffffff  Protected
0x180000000000 - 0x1fffffffffff  Shadow
0x000000000000 - 0x17ffffffffff  Protected
Shadow cell
An 8-byte shadow cell represents one memory access:
○ ~16 bits: TID (thread ID)
○ ~42 bits: Epoch (scalar clock)
○ 5 bits: position/size in 8-byte word
○ 1 bit: IsWrite
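The bit budget above adds up to exactly 64 bits, which can be sketched as a pack/unpack pair. The field order, the split of the 5 position/size bits into a 3-bit offset and a 2-bit log2 size, and the names Access, Pack and Unpack are all assumptions made for illustration; the real run-time may lay the cell out differently.

```cpp
#include <cstdint>

// One recorded memory access (hypothetical representation).
struct Access {
  std::uint32_t tid;       // 16 bits used
  std::uint64_t epoch;     // 42 bits used
  std::uint8_t offset;     // 3 bits: first byte within the 8-byte word
  std::uint8_t size_log2;  // 2 bits: access size is 1 << size_log2 bytes
  bool is_write;           // 1 bit
};

constexpr std::uint64_t kEpochMask = (1ULL << 42) - 1;

// Assumed layout: [tid:16][epoch:42][offset:3][size_log2:2][is_write:1].
constexpr std::uint64_t Pack(const Access& a) {
  return ((std::uint64_t{a.tid} & 0xffff) << 48) |
         ((a.epoch & kEpochMask) << 6) |
         ((std::uint64_t{a.offset} & 7) << 3) |
         ((std::uint64_t{a.size_log2} & 3) << 1) |
         (a.is_write ? 1ULL : 0ULL);
}

constexpr Access Unpack(std::uint64_t cell) {
  return Access{static_cast<std::uint32_t>(cell >> 48),
                (cell >> 6) & kEpochMask,
                static_cast<std::uint8_t>((cell >> 3) & 7),
                static_cast<std::uint8_t>((cell >> 1) & 3),
                (cell & 1) != 0};
}
```

Keeping the whole access record in one 64-bit word means a cell can be read or replaced with a single load or store.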
Example: first access
Write in thread T1
Shadow cells: [T1, E1, 0:2, W]
Example: second access
Read in thread T2
Shadow cells: [T1, E1, 0:2, W] [T2, E2, 4:8, R]
Example: third access
Read in thread T3
Shadow cells (TID, Epoch): [T1, E1] [T2, E2] [T3, E3]
● Constant-time operation
○ Get TID and Epoch from the shadow cell
○ 1 load from thread-local storage
○ 1 comparison
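The check described above can be sketched as follows. It assumes each thread keeps a per-thread clock array in thread-local storage (modeled here as ThreadState::clock) and that byte ranges such as 0:2 are half-open intervals within the 8-byte word. ShadowAccess, ThreadState, kMaxThreads and IsRace are names invented for this sketch.

```cpp
#include <array>
#include <cstdint>

constexpr int kMaxThreads = 8;  // toy limit for the sketch

// A previous access, as recorded in a shadow cell.
struct ShadowAccess {
  int tid;
  std::uint64_t epoch;
  int begin, end;  // bytes [begin, end) within the 8-byte word
  bool is_write;
};

// Assumed per-thread state: clock[t] is the latest epoch of thread t
// known to happen-before the current point in this thread.
struct ThreadState {
  int tid;
  std::array<std::uint64_t, kMaxThreads> clock;
};

// True if the old access and the new one form a data race.
bool IsRace(const ShadowAccess& old_acc, const ShadowAccess& new_acc,
            const ThreadState& current) {
  if (old_acc.tid == new_acc.tid) return false;            // same thread
  if (!old_acc.is_write && !new_acc.is_write) return false;  // two reads
  if (old_acc.end <= new_acc.begin || new_acc.end <= old_acc.begin)
    return false;                                          // disjoint bytes
  // One load (the clock entry for the old thread) and one comparison.
  const bool happens_before = old_acc.epoch <= current.clock[old_acc.tid];
  return !happens_before;
}
```

In the second-access example the write covers bytes 0:2 and the read covers 4:8, so the ranges are disjoint and no race is reported regardless of the clocks.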
● CPU: 4x-10x
● RAM: 5x-8x
Trophies
● Speed
○ > 10x faster than other tools
● Uninitialized memory:
○ Returned by malloc
○ Local stack objects (poisoned at function entry)
0x400000000000 - 0x5fffffffffff  Origin
0x200000000000 - 0x3fffffffffff  Shadow
0x000000000000 - 0x1fffffffffff  Protected
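The ranges above are consistent with a fixed-offset mapping from application memory to shadow and origin memory. The sketch below assumes application memory occupies 0x600000000000 - 0x7fffffffffff (that range does not appear in this text) and that the offsets are 0x400000000000 and 0x200000000000. AppToShadow and AppToOrigin are illustrative names.

```cpp
#include <cstdint>

// Assumed application range (not listed in the text above).
constexpr std::uint64_t kAppBegin = 0x600000000000ULL;
constexpr std::uint64_t kAppEnd = 0x7fffffffffffULL;

// Assumed mapping: shadow and origin sit at fixed offsets below the app.
constexpr std::uint64_t AppToShadow(std::uint64_t addr) {
  return addr - 0x400000000000ULL;
}
constexpr std::uint64_t AppToOrigin(std::uint64_t addr) {
  return addr - 0x200000000000ULL;
}

// The application range lands exactly on the Shadow and Origin ranges.
static_assert(AppToShadow(kAppBegin) == 0x200000000000ULL, "shadow start");
static_assert(AppToShadow(kAppEnd) == 0x3fffffffffffULL, "shadow end");
static_assert(AppToOrigin(kAppBegin) == 0x400000000000ULL, "origin start");
static_assert(AppToOrigin(kAppEnd) == 0x5fffffffffffULL, "origin end");
```

Under these assumptions the shadow is the same size as the application range, one shadow bit per application bit.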
MSan overhead
● Without origins:
○ CPU: 2.5x
○ RAM: 2x
● With origins:
○ CPU: 5x
○ RAM: 3x
Tricky part :(
● Libc
○ Solution: function wrappers
● Inline assembly
○ OpenSSL, libjpeg-turbo, etc
● 20+ in LLVM
○ Regressions caught by regular LLVM bootstrap
● Memory overhead
○ limited RAM on a device or VM
● CPU overhead is a minor issue
○ but it has a cost in $$
http://code.google.com/p/address-sanitizer/
http://code.google.com/p/thread-sanitizer/
http://code.google.com/p/memory-sanitizer/
http://clang.llvm.org/docs/UsersManual.html
Quiz: find all bugs
#include <thread> // C++11
int main() {
int *a = new int[4];
int *b = new int[4];
std::thread t{[&](){b++;}};
delete a;
t.detach();
return *a + (*++b) + b[3];
}
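One possible cleaned-up version of the quiz program is sketched below. The talk's own answer is not in this text, so the list of defects is a reading of the code: new[] paired with plain delete, a read of *a after it is freed, unsynchronized updates of b from two threads, a detached thread that refers to a local variable, indexing that can run past the end of b, reads of uninitialized ints, and a leaked b. QuizFixed is a hypothetical name.

```cpp
#include <memory>
#include <thread>

// A variant of the quiz with each suspected bug removed.
int QuizFixed() {
  // make_unique<int[]> zero-initializes the elements and frees with delete[].
  auto a = std::make_unique<int[]>(4);
  auto b = std::make_unique<int[]>(4);

  int index = 0;
  std::thread t([&index] { ++index; });
  t.join();  // the thread is finished before index is touched again

  ++index;   // index == 2, still inside the 4-element array
  return a[0] + b[index] + b[3];  // all reads initialized and in bounds
}
```

Building the original quiz with -fsanitize=address, -fsanitize=thread and -fsanitize=memory in turn is a natural way to see which tool reports which of these defects.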
Dynamic vs static analysis
Static analysis:
+ Checks all code
+ Does not require tests
- Complex methods don’t scale
- False positives
Dynamic analysis:
- Requires very good test coverage
- Requires running tests, which adds slowdown
+ Finds bugs that static analysis cannot find, even in theory
+ No false positives
ASan/MSan vs Valgrind (Memcheck)
                      Valgrind   ASan      MSan
Heap out-of-bounds    YES        YES       -
Stack out-of-bounds   -          YES       -
Global out-of-bounds  -          YES       -
Use-after-free        YES        YES       -
Use-after-return      -          YES       -
Uninitialized reads   YES        -         YES
CPU Overhead          10x-300x   1.5x-3x   3x
Why not a single tool?