09 Finding Bugs
Further Reading
§ Shoshitaishvili et al.: “SoK: (State of) The Art of War: Offensive Techniques in Binary Analysis”, IEEE Symposium on Security and Privacy, 2016
§ Song et al.: “SoK: Sanitizing for Security”, IEEE Symposium on Security and Privacy, 2019
3761
This Lecture
§ Last lectures
- Memory tagging
- Address Space Layout Randomization (ASLR)
- Control-Flow Integrity (CFI)
§ This lecture
- Finding bugs
- Fuzzing
- Symbolic execution
Finding Bugs
General Thoughts
§ Human experts can find bugs by looking at source code
§ Static code analysis (SCA) methods find bugs by analyzing source code
§ Sanitizers instrument the program with bug-detection checks (e.g., on memory accesses) when it is compiled
§ git bisect can assist in finding bugs in source code (a binary search over the commit history finds the exact commit that introduced the bug)
§ Explaining source code helps in finding bugs
§ print statements in source code can assist in finding bugs
§ Stepping through source code with a debugger reveals bugs
§ …
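The git bisect idea is a binary search over the commit history. A minimal sketch, assuming a linear history in which the first commit is known good, the last is known bad, and the bug persists once introduced; `first_bad_commit` and the `is_bad` callback are hypothetical names, not part of git:

```python
from typing import Callable, Sequence


def first_bad_commit(commits: Sequence[str], is_bad: Callable[[str], bool]) -> str:
    """Binary search for the commit that introduced a bug (git-bisect style).

    Assumes commits[0] is good, commits[-1] is bad, and every commit after
    the first bad one is also bad (linear history, deterministic test).
    """
    good, bad = 0, len(commits) - 1
    while bad - good > 1:
        mid = (good + bad) // 2
        if is_bad(commits[mid]):  # e.g., build and run a test at this commit
            bad = mid
        else:
            good = mid
    return commits[bad]


# Hypothetical history of 100 commits; the bug appears in commit "c37".
history = [f"c{i}" for i in range(100)]
culprit = first_bad_commit(history, lambda c: int(c[1:]) >= 37)
```

Each iteration halves the remaining range, so roughly log2(n) builds and tests are needed instead of n.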
Program Analysis
§ Program analysis: process of analyzing a given program behavior to
determine correctness, robustness, liveness, security, or other properties
§ Two main approaches
- Static analysis
• Analyze source code to find faults or check their absence
• Consider all possible inputs (in theory)
• Typically requires a lot of effort and generates false positives
- Dynamic analysis
• Run the instrumented program to find problems (e.g., crashes)
• Need to select test inputs; coverage is limited (only executed code is tested)
• Can find vulnerabilities, but cannot prove their absence
Static Analysis
§ A static analysis tool S analyzes the source code of a program P to determine
whether it satisfies a property φ, such as:
- “P never dereferences a null pointer”
- “P does not leak file handles”
- “No cast in P will lead to a ClassCastException”
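As a small illustration of a property like “P does not leak file handles”, the sketch below uses Python's `ast` module to flag `open()` calls that are not the context expression of a `with` statement, without ever running the code. It is a deliberately naive checker (an assumption, not how production SCA tools work), and it shows the false-positive problem: a handle that is closed manually is still reported.

```python
import ast


def find_unmanaged_opens(source: str) -> list:
    """Return line numbers of open(...) calls not used directly in a `with`."""
    tree = ast.parse(source)
    # open() calls that appear as `with <expr> as ...` are closed automatically
    managed = {
        id(item.context_expr)
        for node in ast.walk(tree)
        if isinstance(node, (ast.With, ast.AsyncWith))
        for item in node.items
    }
    findings = []
    for node in ast.walk(tree):
        if (isinstance(node, ast.Call)
                and isinstance(node.func, ast.Name)
                and node.func.id == "open"
                and id(node) not in managed):
            findings.append(node.lineno)
    return sorted(findings)


src = """
with open("a.txt") as f:
    data = f.read()
g = open("b.txt")
g.close()
h = open("c.txt")
"""
findings = find_unmanaged_opens(src)
```

Here lines 4 and 6 are reported: line 6 is a real leak, while line 4 is a false positive because the handle is closed explicitly.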
Dynamic Analysis
§ Dynamic (program) analysis examines software while it is running (in contrast to static analysis, which only looks at the code)
§ Unit tests, integration tests, system tests, and acceptance tests are all forms of dynamic testing
§ However, the code typically needs to be instrumented to determine where a problem occurred and what kind of problem it is
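A minimal sketch of instrumentation for dynamic analysis, using Python's `sys.settrace` hook to record which lines of a function actually execute. The function `classify` is a hypothetical example; running it with a single input shows that the other branch is never executed, and therefore never tested.

```python
import sys


def collect_line_coverage(func, *args):
    """Run func(*args) under a trace hook and return (result, executed lines).

    Line numbers are relative to the function's `def` line.
    """
    executed = set()
    code = func.__code__

    def tracer(frame, event, arg):
        # 'line' events fire before each new source line in a traced frame
        if event == "line" and frame.f_code is code:
            executed.add(frame.f_lineno - code.co_firstlineno)
        return tracer  # keep tracing inside this frame

    sys.settrace(tracer)
    try:
        result = func(*args)
    finally:
        sys.settrace(None)  # always remove the instrumentation again
    return result, executed


def classify(x):
    if x > 0:
        return "positive"
    return "non-positive"


result, covered = collect_line_coverage(classify, 5)
```

With input 5, `covered` contains the relative lines of the `if` and the first `return`, but not the `return "non-positive"` line.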
Example: ASan
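ASan (enabled with `-fsanitize=address` in Clang and GCC) instruments every memory access with a check against shadow memory, and its allocator surrounds heap chunks with poisoned redzones. The toy Python model below sketches that idea for 1-byte accesses on a simple bump allocator; the redzone size, heap size, and class names are assumptions, and real ASan additionally covers stack and global objects, use-after-free, and wider accesses.

```python
SHADOW_SCALE = 3  # one shadow byte describes 8 application bytes (as in ASan)
REDZONE = 16      # toy redzone size in bytes (assumption)


class HeapBufferOverflow(Exception):
    """Raised by the toy check, standing in for an ASan error report."""


class ToyShadowMemory:
    """Shadow value 0: all 8 bytes addressable; 1..7: only the first k bytes
    are addressable; negative: poisoned (redzone or unallocated)."""

    def __init__(self, heap_size: int = 1024):
        self.shadow = [-1] * (heap_size >> SHADOW_SCALE)  # everything poisoned
        self.next_free = REDZONE  # poisoned left redzone before the first chunk

    def malloc(self, size: int) -> int:
        addr = self.next_free  # always 8-byte aligned
        base = addr >> SHADOW_SCALE
        full, rem = divmod(size, 8)
        for i in range(full):
            self.shadow[base + i] = 0  # fully addressable granule
        if rem:
            self.shadow[base + full] = rem  # partially addressable granule
        # round up to the granule size and leave a poisoned redzone behind
        self.next_free = addr + (size + 7) // 8 * 8 + REDZONE
        return addr

    def check(self, addr: int) -> None:
        """Check inserted before every 1-byte load/store."""
        k = self.shadow[addr >> SHADOW_SCALE]
        if k == 0 or (k > 0 and (addr & 7) < k):
            return
        raise HeapBufferOverflow(f"heap-buffer-overflow on address {addr}")


mem = ToyShadowMemory()
buf = mem.malloc(10)  # 10-byte heap buffer
mem.check(buf + 9)    # last valid byte: passes silently
```

An access to `buf + 10` (one past the end) or `buf - 1` (left redzone) raises `HeapBufferOverflow`, mirroring how ASan reports the faulting access at the moment it happens rather than when corrupted data is later used.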
Summary
Pros Cons
Fuzzing
Overview
§ Fuzzing (“fuzz testing”) is an automated software testing method
§ Provide invalid, unexpected, or random data as input
§ Closely monitor program behavior (e.g., crash, assertion, ...)
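A minimal sketch of the basic fuzzing loop: generate random inputs, run the target, and monitor for crashes (here: any uncaught exception). The target `parse_record` is a hypothetical buggy parser that trusts a length field; the parameter values are arbitrary choices.

```python
import random


def parse_record(data: bytes) -> bytes:
    """Hypothetical buggy parser: byte 0 is the payload length, the byte after
    the payload is a checksum. The length field is never validated."""
    if not data:
        return b""
    n = data[0]
    payload = data[1:1 + n]
    checksum = data[1 + n]  # IndexError if the length field lies
    return payload if checksum == sum(payload) % 256 else b""


def fuzz(target, iterations=1000, max_len=16, seed=0):
    """Feed random byte strings to target and collect inputs that crash it."""
    rng = random.Random(seed)
    crashes = []
    for _ in range(iterations):
        length = rng.randrange(max_len + 1)
        data = bytes(rng.randrange(256) for _ in range(length))
        try:
            target(data)
        except Exception as exc:  # monitor: any uncaught exception = crash
            crashes.append((data, repr(exc)))
    return crashes


crashes = fuzz(parse_record)
```

Purely random inputs find this shallow bug almost immediately, but they rarely get past checks such as magic bytes or checksums, which is what the smarter techniques below address.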
§ Fuzzing
- Invalid, unexpected, or random data as input
- Automatically generated testcases
- Goal: Find exploitable errors
§ Testing
- Normal, valid, well-formed data as input
- Manually generated testcases
- Goal: Normal users should not get errors
Fuzzing History
§ Very old technique, once considered the worst means of testing
§ Term coined in 1988 by Miller in a class assignment, inspired by noise over a fuzzy (dial-up) network connection
§ Google has run ClusterFuzz since 2012 to fuzz Chrome, and runs OSS-Fuzz to test open-source software
- As of August 2023, OSS-Fuzz has helped identify and fix over 10,000
vulnerabilities and 36,000 bugs across 1,000 projects
- Meta, Microsoft, Oracle, … have fuzzing teams
§ Most teams used fuzzing to automatically detect bugs in the DARPA Cyber
Grand Challenge 2016 (CGC)
- Likely also applies to AIxCC next year
§ american fuzzy lop (AFL) found many bugs; its successor AFL++ is among the most popular fuzzing tools
Types of Fuzzing
§ Different types of fuzzing
§ Fuzzers range from dumb (no knowledge of the input format or the program) to smart (aware of the input structure or program internals)
§ The smarter the fuzzing, the harder the setup (typically)
- However, smarter fuzzing can potentially find more bugs
- But it might also find fewer bugs because too much time is spent in the heuristics / optimizations
§ Typically, all fuzzers rely on some kind of mutation
- Mutations might be completely random or follow some pattern
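The two ends of the mutation spectrum can be sketched as follows: a dumb mutator that flips one random bit, and a pattern-based mutator that overwrites a byte with a boundary value that often triggers edge cases. The value list is an assumption in the spirit of AFL's “interesting values”, not AFL's exact table; the function names are hypothetical.

```python
import random

# Boundary values that frequently expose off-by-one and sign bugs (assumption)
INTERESTING_8 = [0x00, 0x01, 0x7F, 0x80, 0xFF]


def mutate_random(data: bytes, rng: random.Random) -> bytes:
    """Dumb mutation: flip one randomly chosen bit."""
    if not data:
        return bytes([rng.randrange(256)])
    buf = bytearray(data)
    pos = rng.randrange(len(buf))
    buf[pos] ^= 1 << rng.randrange(8)
    return bytes(buf)


def mutate_pattern(data: bytes, rng: random.Random) -> bytes:
    """Pattern-based mutation: overwrite one byte with an interesting value."""
    if not data:
        return bytes([rng.choice(INTERESTING_8)])
    buf = bytearray(data)
    buf[rng.randrange(len(buf))] = rng.choice(INTERESTING_8)
    return bytes(buf)


rng = random.Random(0)
seed_input = b"GET /index.html"
dumb = mutate_random(seed_input, rng)
smart = mutate_pattern(seed_input, rng)
```

Both mutators keep the input length and change exactly one position, so the mutated input stays close to the (presumably valid) seed.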
Evolutionary Fuzzing II
Example: Nyx
Code available at https://github.com/RUB-SysSec/
Papers published at USENIX Security’22 and USENIX Security’23, code available at https://github.com/fuzzware-fuzzer/
Symbolic Execution
Overview
§ Goal is to find the input required to reach a certain position in the program
§ Programs are interpreted, with the input modeled using symbolic values
- Variables can be expressed using the symbolic values
- Symbolic state maps variables to symbolic values
- Path condition is a formula over symbolic values that encodes all branch decisions taken so far (i.e., every conditional jump taken adds a constraint on the symbolic values)
- All paths in the program form an execution tree: some paths are feasible, while some are infeasible
§ Path conditions are solved using SAT/SMT solvers (Boolean satisfiability / satisfiability modulo theories) to obtain concrete inputs
Example
§ Illustration of how symbolic execution works
    x = read();
    y = x * 2;
    z = y + 4;
    if (z == 12) {
        bug();
    } else {
        …
    }
§ Symbolic execution of this snippet
- x = read() → x = λ
- y = x * 2 → y = 2 · λ
- z = y + 4 → z = 2 · λ + 4
- True branch (bug()): path condition 2 · λ + 4 = 12, solved by λ = 4
- False branch (…): path condition 2 · λ + 4 ≠ 12
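The example can be reproduced with a toy symbolic executor that supports only linear expressions a · λ + b over the single input symbol λ and solves path conditions exactly instead of calling a SAT/SMT solver. This is a simplification for illustration (class and function names are hypothetical); real engines such as angr or KLEE model fixed-width bit-vectors and hand path conditions to a solver.

```python
from fractions import Fraction


class Lin:
    """Symbolic value a * λ + b over the single input symbol λ."""

    def __init__(self, a, b):
        self.a, self.b = Fraction(a), Fraction(b)

    def __mul__(self, k):  # multiply by a concrete constant
        return Lin(self.a * k, self.b * k)

    def __add__(self, k):  # add a concrete constant
        return Lin(self.a, self.b + k)


def solve(expr, op, rhs):
    """Return a value of λ satisfying `expr op rhs`, or None if infeasible."""
    if expr.a == 0:  # condition does not depend on λ
        holds = (expr.b == rhs) if op == "==" else (expr.b != rhs)
        return Fraction(0) if holds else None
    root = (rhs - expr.b) / expr.a  # the only λ with expr == rhs
    return root if op == "==" else root + 1


def explore():
    x = Lin(1, 0)  # x = read();   x = λ
    y = x * 2      # y = x * 2;    y = 2·λ
    z = y + 4      # z = y + 4;    z = 2·λ + 4
    models = {}
    # fork at if (z == 12): each branch gets its own path condition
    for branch, op in (("bug()", "=="), ("else", "!=")):
        lam = solve(z, op, 12)
        if lam is not None:  # feasible path: keep a concrete input
            models[branch] = lam
    return models


inputs = explore()
```

`inputs["bug()"]` is 4, the value that drives execution into `bug()`; the else branch gets some other satisfying value.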
Symbolic Execution
§ Very powerful tool with several shortcomings
- Symbolic execution does not scale well to complex programs
• The number of possible paths grows exponentially with the number of branches
- Unbounded loops (i.e., iterations depend on user input) and recursion
• Only approximated
• Finitize paths by unrolling loops and recursion (bounded verification)
• Or finitize paths by limiting the size of path conditions (also bounded)
- Environment (e.g., syscalls, file system, ...) and heap are hard to model
- Possible solution: Concolic (concrete + symbolic) execution
• Run symbolic execution in parallel with real execution
• Take real values if symbolic expressions get too complicated
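A toy concolic loop in the spirit of DART-style testing: the program runs on a concrete input while every branch decision is recorded, then one decision at a time is negated and an input satisfying the modified path condition is searched. For brevity, constraints are kept as opaque Python predicates and the “solver” is a brute-force search over a small range; both are stand-ins (assumptions) for symbolic expressions and an SMT solver. `program` is a hypothetical target.

```python
from typing import Callable, List, Optional, Tuple

Branch = Tuple[Callable[[int], bool], bool]  # (predicate on input, outcome taken)


def program(x: int, trace: List[Branch]) -> str:
    """Hypothetical target: runs concretely on x and records every branch."""
    def branch(pred: Callable[[int], bool]) -> bool:
        taken = pred(x)              # concrete evaluation
        trace.append((pred, taken))  # recorded constraint on the input
        return taken

    if branch(lambda v: v > 10):
        if branch(lambda v: v * 3 == 51):
            return "bug"
    return "ok"


def solve(path: List[Branch], domain=range(-1000, 1001)) -> Optional[int]:
    """Brute-force stand-in for an SMT solver over a small input domain."""
    for x in domain:
        if all(pred(x) == taken for pred, taken in path):
            return x
    return None


def concolic(seed: int, max_runs: int = 10) -> Optional[int]:
    worklist, seen = [seed], set()
    for _ in range(max_runs):
        if not worklist:
            break
        x = worklist.pop()
        trace: List[Branch] = []
        if program(x, trace) == "bug":
            return x  # concrete input that reaches bug
        for i in range(len(trace)):  # negate each branch decision in turn
            outcomes = tuple(t for _, t in trace[:i]) + (not trace[i][1],)
            if outcomes in seen:
                continue
            seen.add(outcomes)
            new_x = solve(trace[:i] + [(trace[i][0], not trace[i][1])])
            if new_x is not None:
                worklist.append(new_x)
    return None


found = concolic(seed=0)
```

Starting from the uninteresting input 0, the loop first flips `v > 10`, then flips the inner comparison, and ends with an input that reaches `bug`.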
checkSerial (excerpt):
        if (parity == i % 2)
            digit *= 2;
        sum += (digit / 10) + (digit % 10);
    }
    return 0 == sum % 10;
}

Calling code (excerpt):
    input[strlen(input) - 1] = 0;
    if (checkSerial(input)) {
        printf("Number validated!\n");   // Go here
    } else {
        printf("Invalid number\n");      // Avoid this
    }
    return 0;
}
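The loop in `checkSerial` has the structure of the Luhn checksum: double every second digit, add up the digits of each product, and require the total to be divisible by 10. Below is a Python re-implementation for experimenting with candidate serials; the slide does not show how `parity` is initialized, so `parity = len(number) % 2` (the usual Luhn convention of doubling every second digit from the right) is an assumption, and `check_serial` is a hypothetical name.

```python
def check_serial(number: str) -> bool:
    """Luhn-style check mirroring the checkSerial loop (parity is assumed)."""
    total = 0
    parity = len(number) % 2  # assumption: double every second digit from the right
    for i, ch in enumerate(number):
        digit = int(ch)
        if parity == i % 2:
            digit *= 2
        total += digit // 10 + digit % 10  # sum of the digits of the product
    return total % 10 == 0


valid = check_serial("79927398713")  # a commonly used Luhn test number
```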
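import angr

good = 0x8048630           # target address (block marked "Go here")
avoid = (0x804862)         # address to avoid (block marked "Avoid this")
length = 12                # number of serial digits
project = angr.Project('main.elf')
state = project.factory.full_init_state()
simgr = project.factory.simgr(state)      # start exploring from the initial state
simgr.explore(find=good, avoid=avoid)
s = simgr.found[0]
for i in range(length):
    b = s.memory.load(0x0804a060 + i, 1)  # i-th byte of the input buffer
    s.add_constraints(b >= ord('0'), b <= ord('9'))
serial = s.memory.load(0x0804a060, length)
print(s.solver.eval(serial, cast_to=bytes))  # concrete serial reaching "Go here"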
Reverse Engineering
Reverse Engineering
§ Reverse engineering is the process of recovering a program's logic (assembly, approximated source code) from a binary
§ Identify bugs (or hidden features) if only the binary is available
§ Allows finding compiler-introduced bugs
§ Re-engineering allows building a new binary from the reverse-engineered one
§ Many tools available
- IDA Pro
- Ghidra
- radare2
- …
Methods
§ Disassemblers allow users to
- Disassemble code (get assembly code)
- Analyze binaries (dependencies, strings, control flow)
- Debug programs (see actual register values, step through code)
- However, disassemblers only return assembly code, which is often hard to understand
§ Decompilers convert code to a high-level language (e.g., C)
- Decompilation output is often a lot easier to read
- However, decompilation relies heavily on heuristics and does not always work
- Highly dependent on architecture, compiler, optimization level, obfuscation, …
- If it works, it gives a quick overview for further investigations
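To make “disassemble code” concrete, here is a linear-sweep disassembler (the strategy objdump uses: decode one instruction after another from the start) for a made-up five-instruction ISA. The opcodes and mnemonics are entirely hypothetical; real disassemblers decode x86 or ARM encodings, and tools like IDA Pro and Ghidra also follow control flow (recursive traversal) to separate code from data.

```python
# Hypothetical toy ISA: opcode -> (mnemonic, number of operand bytes)
OPCODES = {
    0x01: ("push", 1),
    0x02: ("add", 0),
    0x03: ("jmp", 1),
    0x04: ("jz", 1),
    0x05: ("ret", 0),
}


def linear_sweep(code: bytes):
    """Decode instructions back to back, starting at offset 0."""
    out = []
    pc = 0
    while pc < len(code):
        op = code[pc]
        if op not in OPCODES:
            out.append((pc, f"db 0x{op:02x}"))  # unknown byte: emit as data
            pc += 1
            continue
        mnemonic, n = OPCODES[op]
        operands = code[pc + 1:pc + 1 + n]
        out.append((pc, mnemonic + "".join(f" 0x{b:02x}" for b in operands)))
        pc += 1 + n
    return out


listing = linear_sweep(bytes([0x01, 0x02, 0x01, 0x03, 0x02, 0x05]))
```

A weakness of linear sweep is visible even here: data embedded between instructions is decoded as if it were code, which is one reason disassembly output can be hard to read.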
Binary Diffing
§ Diffing tools use different methods to find matching and non-matching blocks
- Same function name
- Same assembly, same decompiled code
- Equal number of calls to and from function
- Same referenced strings
- ...
§ Most diff tools rely on the control-flow graph from a disassembler
§ Basic blocks are then matched using some heuristic
§ Binary diffing is a way to reverse engineer patches
- If there are only a few changes, the patched vulnerability can be spotted quickly
§ Knowledge of the vulnerability allows attackers to craft exploits against unpatched systems
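The matching heuristics listed above can be sketched in a few lines. In this hypothetical model, each function is summarized by features a disassembler could extract (name, number of basic blocks, outgoing calls, referenced strings, and a hash of its normalized assembly); functions are matched by name first, then by identical structural features, and matched pairs whose code hash differs are reported as changed. The feature set and all names are assumptions; real tools such as BinDiff or Diaphora use much richer CFG-based heuristics.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Function:
    """Per-function features (hypothetical set)."""
    name: str
    num_blocks: int    # basic blocks in the function's CFG
    calls_out: int     # number of calls from the function
    strings: frozenset # referenced strings
    code_hash: str     # hash of the normalized assembly


def diff(old, new):
    """Return (matched pairs, changed functions, unmatched new functions)."""
    unmatched = list(new)
    matches, changed = [], []
    for f in old:
        match = next((g for g in unmatched if g.name == f.name), None)
        if match is None:  # e.g., stripped binary: fall back to structure
            key = (f.num_blocks, f.calls_out, f.strings)
            match = next((g for g in unmatched
                          if (g.num_blocks, g.calls_out, g.strings) == key), None)
        if match is None:
            continue  # no counterpart found by these heuristics
        unmatched.remove(match)
        matches.append((f.name, match.name))
        if match.code_hash != f.code_hash:
            changed.append(match.name)  # candidate location of the patch
    return matches, changed, [g.name for g in unmatched]


old = [Function("parse", 5, 2, frozenset({"bad header"}), "a1"),
       Function("sub_401000", 3, 1, frozenset({"usage"}), "b1")]
new = [Function("parse", 6, 2, frozenset({"bad header"}), "a2"),
       Function("sub_402000", 3, 1, frozenset({"usage"}), "b1")]
matches, changed, added = diff(old, new)
```

In this made-up patch, `parse` gained a basic block (for example, a new bounds check) and is reported as changed, which is exactly where an analyst would look for the fixed vulnerability.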
Conclusion
Sources / References