The Automated Exploitation Grand Challenge: Tales of Weird Machines
Julien Vanegue
What is Automated Exploitation?
• The ability to generate a successful computer
attack with reduced human interaction, or
none at all.
• It is important to understand the hardness of
AE to measure the risk on critical
infrastructure and online properties.
• There are many domains of attack: network,
web, kernel, system, hardware, applications.
Our focus today is on software security.
What are Weird Machines?
• Weird Machine (WM): “The underlying capacity
of a program to perform runtime computations
that escape the program specification”
Today’s exploit techniques
Modern history of exploit techniques:
• Meta-data corruption
– “Smashing C++ VPTRs” (Eric Landuiyt, 2000)
– “V00d00 malloc tricks” (Michel Kaempf, 2001)
– Many, many other papers.
Today’s exploit techniques (2)
• Information disclosure attacks
– Format bugs (tf8 wu-ftpd 2.6 site-exec exploit, ~ 2000)
– Weaknesses where content or address of target
variables/functions can be read (BIND TSIG exploit by LSD-PL,
openssl-too-open exploit by Sotirov, ~ 2001)
– “Return into printf/send” (“Bypassing PaX ASLR
protection”, Vanegue 2002)
• Heap chunks alignment techniques
– “Advanced DL malloc exploit” (JP @ core-st , 2003)
– “Heap Feng Shui” (Sotirov, 2007)
• JIT attacks: make target generate “chosen” new code
– “Pointer inference and JIT spraying” (Blazakis, 2010)
Exploit Mitigations
• Data Execution Prevention (DEP/Openwall/PaX/W^X/etc)
• Address Space Layout Randomization (ASLR)
• Control-Flow Integrity (CFI)
• Intra-modular displacement randomization (IDR)
• Heap randomization (non-deterministic fitness algorithms)
• Many other targeted protections (UDEREF, SEHOP, canary
insertion, meta-data encoding, etc)
Exploit primitives
• Two major families of exploit primitives are
write primitives (write address space) and
read primitives (read address space).
• Early classification done by Gerardo Richarte
at core-st : “About exploit writing”, 2002.
• Modern classification done by Matt Miller:
“Modeling the exploitation and mitigation of
memory safety vulnerabilities”, 2012.
Exploit write primitive
General form: *(basepointer + offset) = value
(1) Base, offset and values are attacker-controlled
Write controlled value at controlled location
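A minimal sketch of case (1), assuming a toy model in which a flat byte array stands in for the process address space. The names space, write_primitive and read_byte are hypothetical and exist only for illustration; a real primitive comes from a bug in the target, not from a helper function.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical toy "address space": a flat byte array standing in
 * for process memory. */
#define SPACE_SIZE 64
static uint8_t space[SPACE_SIZE];

/*
 * Models *(basepointer + offset) = value.
 * All three inputs are assumed attacker-controlled, as in case (1).
 * Returns 0 on success, -1 if the target lies outside the toy space
 * (a real process would fault or corrupt unrelated memory instead).
 */
int write_primitive(size_t base, size_t offset, uint8_t value)
{
    if (base >= SPACE_SIZE || offset >= SPACE_SIZE - base)
        return -1;
    space[base + offset] = value;
    return 0;
}

/* Reads one byte back so the effect of a write can be observed.
 * Returns -1 for addresses outside the toy space. */
int read_byte(size_t addr)
{
    return addr < SPACE_SIZE ? space[addr] : -1;
}
```

Calling write_primitive(16, 4, 0x42) places a chosen value at a chosen location; restricting which of the three arguments the attacker controls gives the weaker variants of the primitive.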
Rising exploit techniques
• Data-only attacks (DOA)
– Change internal program values to elevate privileges
without changing program control flow.
– Infer address of data in program without direct
memory read primitives.
• Program Likelihood Inference (PLI)
– Probabilistic attacks: discover the executions most likely
to lead to successful exploitation in a non-deterministic
environment.
– Timing attacks: discover internal program information
via run time execution measurements.
Tools Armory
Exploit Generation
• Automated Exploitation focuses on discovery and
combination of write primitives and read primitives.
• Automated Exploitation in Full Model is a very hard
problem. Anybody telling you otherwise is a fool or an
impostor.
• Existing AE work focused on Restricted Models:
– Sean Heelan’s “Automatic Generation of Control Flow
Hijacking Exploits for Software Vulnerabilities”:
http://www.cprover.org/dissertations/thesis-Heelan.pdf
– David Brumley et al. (AEG, MAYHEM, etc)
http://users.ece.cmu.edu/~dbrumley/pubs.html
Analysis and Exploit Automation
• Compilers (Program transformation)
• Fuzz testers (Input generation)
• SMT solvers (Symbolic reasoning)
• Model Checkers (State space exploration)
• Symbolic Execution Engines (Path generation)
• Emulators (Machine modeling)
• Abstract interpreters (Abstraction)
SMT solvers
SMT = Satisfiability Modulo Theories
Layout 1: Chunk 1 (size S1, addr A1) | Chunk 2 (size S2, addr A2 = A1 + S1) | Chunk 3 (size S3, addr A3 = A2 + S2)
Layout 2: Chunk 1 (size S1, addr A1) | Chunk 3 (size S3, addr A2 = A1 + S1) | Chunk 2 (size S2, addr A3 = A2 + S3)
Markov transition system
[Figure: transition graph over states S1 to S6, with edges labeled by the transition probabilities 0.9, 0.1, 0.6, 0.4, 0.95 and 0.05.]
The transition system models the set of all possible random walks.
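A small sketch of how such a system can be evaluated, assuming a hypothetical three-state chain. The matrix P below is made up for illustration and is not the graph from the slide; walk_probability and distribution_after are likewise illustrative names. The probability of one random walk is the product of its edge probabilities, and the distribution over states after n steps follows from repeated multiplication by the transition matrix.

```c
#include <stddef.h>

#define N_STATES 3

/* Hypothetical chain, NOT the slide's graph. Each row sums to 1.
 * State 2 is absorbing (think "exploit succeeded"). */
static const double P[N_STATES][N_STATES] = {
    {0.0, 0.9, 0.1},
    {0.6, 0.0, 0.4},
    {0.0, 0.0, 1.0},
};

/* Probability of one specific walk: the product of the
 * probabilities of the edges it takes. */
double walk_probability(const int *states, size_t len)
{
    double p = 1.0;
    for (size_t k = 0; k + 1 < len; k++)
        p *= P[states[k]][states[k + 1]];
    return p;
}

/* Distribution over states after `steps` transitions, starting
 * with all probability mass on `start`. Result is written to out. */
void distribution_after(int start, int steps, double out[N_STATES])
{
    double cur[N_STATES] = {0.0};
    cur[start] = 1.0;
    for (int s = 0; s < steps; s++) {
        double next[N_STATES] = {0.0};
        for (int i = 0; i < N_STATES; i++)
            for (int j = 0; j < N_STATES; j++)
                next[j] += cur[i] * P[i][j];
        for (int j = 0; j < N_STATES; j++)
            cur[j] = next[j];
    }
    for (int j = 0; j < N_STATES; j++)
        out[j] = cur[j];
}
```

In this toy chain the walk 0, 1, 2 has probability 0.9 * 0.4; ranking walks this way is one possible reading of "most likely executions" in a probabilistic attack.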
Markov transition system
Previous slide explained:
Hilbert’s program
http://en.wikipedia.org/wiki/Hilbert's_program
A Program for Automated Exploitation
• Inspired by David Hilbert and many others after
him, we define a list of problems whose solutions
pave the way for years to come in the realm of
automated low-level software analysis.
• The Grand Challenge consists of a set of 11
problems in the area of vulnerability discovery
and exploitation that vary in scope and
applicability.
• Most problems relate to discovering and
combining exploit primitives to achieve elevation
of privilege.
Exploit challenges are not new
• Gerardo Richarte’s insecure programming (from 10
years ago!) constitutes great training for manual exploit
writing:
http://community.coresecurity.com/~gera/InsecureProgramming/
Nature of Grand Challenge problems
• Exploit Specification problem (A, H)
• Input generation problems (B, C, D, E)
• Exploit Primitive composition problem (F)
• Environment determination (I, J, K)
• State space representation (G)
Exploit specification
Problem A: Given a program P, determine the
set of assertions S such that satisfying any a in S
is equivalent to corrupting the program.
In other words,
what is the anti-specification of program P?
Problem A code
int F(int x, int y)
{
  int loc[4];
  int idx = G(x, y);
  if (idx > 4)        // guard is off by one: idx == 4 passes
    return -1;
  assert(idx >= 4);   // assertion to infer
  loc[idx] = 0x00;
  return 0;
}
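One way to picture the anti-specification for this function is as a predicate over idx: the guard lets the value through, yet the write to loc[idx] falls outside the array. The sketch below enumerates that predicate over a small range. The helper names (guard_passes, in_bounds, corrupts, count_corrupting) are hypothetical; note that under this reading negative values of idx also qualify, in addition to the idx >= 4 case shown on the slide.

```c
#include <stdbool.h>

#define LOC_LEN 4  /* matches int loc[4] in F */

/* The guard as written in F: only idx > 4 is rejected. */
bool guard_passes(int idx)
{
    return !(idx > 4);
}

/* Memory-safety condition for the write loc[idx] = 0x00. */
bool in_bounds(int idx)
{
    return idx >= 0 && idx < LOC_LEN;
}

/* Candidate anti-specification: the guard lets idx through,
 * but the write lands outside loc. */
bool corrupts(int idx)
{
    return guard_passes(idx) && !in_bounds(idx);
}

/* Counts corrupting idx values in the inclusive range [lo, hi]. */
int count_corrupting(int lo, int hi)
{
    int n = 0;
    for (int idx = lo; idx <= hi; idx++)
        if (corrupts(idx))
            n++;
    return n;
}
```

Enumeration only works here because the state is a single small integer; Problem A asks for the same set of assertions to be derived automatically on real programs.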
Pre/post-conditions inference
Problem B: Given a program function and an
assertion in the function, determine the necessary
and sufficient pre/post conditions such that the
assertion is true if and only if the pre/post
conditions are true.
This is equivalent to the input generation problem
(we start with loop-free programs).
Note: May need to walk over call graph to resolve
problem transitively from entry point to assertion.
Problem B code
PRECOND (?)
int G(int x, int y)
{
  if (x < y) return x;
  else return 0;
}
POSTCOND (?)

PRECOND (?)
F(int x, int y)
{
  int array[4];
  int idx = G(x, y);
  assert(idx >= 4);
  array[idx] = 0;
}
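As a worked instance, assume F passes both of its arguments to G. Since G returns x only when x < y and 0 otherwise, a candidate postcondition for G is "result >= 4" and a candidate precondition for F is "4 <= x < y". The sketch below checks that candidate exhaustively over a small box of inputs; candidate_precond, assertion_holds and count_mismatches are hypothetical helper names, and a brute-force check over a finite box is evidence, not a proof.

```c
#include <stdbool.h>

/* G as on the slide. */
int G(int x, int y)
{
    if (x < y) return x;
    else return 0;
}

/* Candidate precondition of F (an assumption to be checked):
 * the assertion idx >= 4 should hold exactly when 4 <= x < y. */
bool candidate_precond(int x, int y)
{
    return x >= 4 && x < y;
}

/* The assertion inside F, evaluated without performing the write. */
bool assertion_holds(int x, int y)
{
    return G(x, y) >= 4;
}

/* Counts inputs in [lo, hi] x [lo, hi] where the candidate
 * precondition and the assertion disagree. Zero means the
 * "if and only if" holds on that box. */
int count_mismatches(int lo, int hi)
{
    int bad = 0;
    for (int x = lo; x <= hi; x++)
        for (int y = lo; y <= hi; y++)
            if (candidate_precond(x, y) != assertion_holds(x, y))
                bad++;
    return bad;
}
```

For loop-free code like this, the same equivalence could be handed to an SMT solver instead of being enumerated.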
Loop assertion inference
Problem C: Given a program loop and an assertion
A1 within or at the loop exit-node, determine loop-
assertion A2 such that A1 is true if and only if A2 is
true.
Problem C code
void F(char *buf, int bufsz)
{
  int limit = bufsz;
  int i = 0;
  loop_assertion(?)
  while (i < limit)
  {
    if (buf[i] == '{') limit++;
    else if (buf[i] == '}') limit--;
    i++;
  }
  assert(i >= bufsz); // sizeof(buf) would only give the pointer size
  buf[i] = 0;
}
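To get a feel for what the loop assertion has to capture, the loop can be replayed on concrete inputs without touching memory past the data. The sketch below is an assumption-laden model: final_index is a hypothetical helper, the input is a NUL-terminated string whose length caps the walk, and the exit assertion is read as a bounds check against bufsz. During the loop, limit always equals bufsz plus the number of '{' minus the number of '}' seen so far, which is the kind of fact a loop assertion would need to carry.

```c
#include <string.h>

/*
 * Replays the loop from F on `data`, treating `bufsz` as the
 * logical buffer size. Returns the value of i at loop exit, or -1
 * if the loop would read past the end of `data` (the replay stops
 * there instead of performing the out-of-bounds read).
 */
int final_index(const char *data, int bufsz)
{
    int cap = (int)strlen(data);
    int limit = bufsz;
    int i = 0;
    while (i < limit) {
        if (i >= cap)
            return -1;              /* next read would leave data */
        if (data[i] == '{') limit++;
        else if (data[i] == '}') limit--;
        i++;
    }
    return i;
}
```

With bufsz set to 4, the input "abcd" exits with i equal to 4, so the write buf[i] = 0 already lands one byte past the buffer; "ab}}" exits with i equal to 3; and "{{{{" keeps raising limit until the replay stops.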
Exploit input definability
Problem D: Given an initial state I of a program
P with functions and loops, exhibit an algorithm
converging to a desired sink state.
Problem D code
Precondition(?) // D = A + B + C
int F(int x, int y)
{
  int loc[4];
  int idx = G(x, y);
  if (idx > 4)
    return -1;
  while (x < y) idx++;
  assert(idx >= 4); // how to reach this?
  loc[idx] = 0x00;
  return 0;
}
Exploit derivability
Problem E: Given a concrete program input and
associated program crash/log, find the longest
crash trace prefix from which the desired
exploitable program state can be reached.
F() {
p1a = (struct s1*) calloc(sizeof(struct s1), 1);
p1b = (struct s1*) calloc(sizeof(struct s1), 1);
p1c = (struct s1*) calloc(sizeof(struct s1), 1);
}
G() { p2 = (struct s2*) calloc(sizeof(struct s2), 1); }
H() { free(p1b); }
I() { memset(p1a, 0x01, 32); }
J() { if (p2 && p2->authenticated) puts("you win"); } // Print this
K() { if (p1a && p1a->ptr) *(p1a->ptr) = 0x42; } // Avoid crash here
Generalized program timing attack
Problem J: Define the necessary and sufficient execution
time analysis conditions to infer value, size, or location of:
Indirect information disclosures
Problem K: Define the necessary and sufficient
conditions to infer the value or address of a
variable without a direct read primitive, such as:
Conclusion
• We decomposed the problem of Automated
Exploit Generation into a set of challenges with
clear intermediate assumptions.
• Resolving one such sub-problem is a step towards
automated end-to-end solutions of larger and
larger sub-classes of exploits.
• Even though Automated Exploitation is an
undecidable problem, observing that most
vulnerabilities are shallow allows the problem to
be approached.
Questions / Discussion
• Thanks for attending H2HC’s 10th anniversary