Week 05 Testing
The save button code is broken, so there is no transition to the “saving the file” state
Implementation bug: Code introduces new states not represented in the conceptual state machine
A cosmic ray causes a bit flip in a voting machine’s memory, causing a state where one candidate has an impossible number of votes
This list is not intended to be exhaustive; it merely illustrates the myriad ways that unintended states may enter a system. Deciding which ones to defend against is one step of proper threat modeling.
For any interesting program, it is essentially impossible to manually explore the full state space to find the unintended states
Fuzzing
Find bugs in a program by feeding it random, corrupted, or unexpected data
Idea: Random inputs will explore a large part of the state space
Some unintended states are observable as crashes (SIGSEGV, abort())
Works best on programs that parse files or process complex input data
Fuzzing example
Fuzzing can be as simple as:
cat /dev/random | head -c 512 > rand.jpeg; open rand.jpeg
Instrument the JPEG parser to measure how deep into its code the random input gets
Common fuzzing strategies
Mutation-based fuzzing
4. Go back to step 2
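The mutation loop that ends in “Go back to step 2” can be sketched in Python. Steps 1 to 3 are not reproduced here, so this sketch assumes the usual shape: start from a corpus of valid seed inputs, mutate one, run the target and watch for a crash. `parse_record` is a hypothetical toy parser with a planted bug, and a Python `IndexError` stands in for a real crash such as SIGSEGV.

```python
import random

def parse_record(data):
    """Hypothetical toy parser standing in for the real target program."""
    if len(data) < 4 or data[:2] != b"RC":
        raise ValueError("bad header")           # graceful rejection, not a bug
    length = data[2]
    if length > len(data) - 3:
        # Planted bug: a C parser trusting this length field would read out of
        # bounds; here an IndexError stands in for the resulting crash.
        raise IndexError("length field exceeds buffer")
    return data[3 : 3 + length]

def mutate(seed, rng, max_writes=4):
    """'Dumb' mutation: overwrite a few random bytes with random values."""
    buf = bytearray(seed)
    for _ in range(rng.randint(1, max_writes)):
        buf[rng.randrange(len(buf))] = rng.randrange(256)
    return bytes(buf)

def mutation_fuzz(corpus, iterations, seed=0):
    """Mutate corpus entries, run the target, and collect crashing inputs."""
    rng = random.Random(seed)
    crashes = []
    for _ in range(iterations):
        case = mutate(rng.choice(corpus), rng)   # mutate a seed input
        try:
            parse_record(case)                   # run the target on it
        except ValueError:
            pass                                 # rejected cleanly: not interesting
        except IndexError:
            crashes.append(case)                 # unintended state: keep the input
    return crashes                               # each iteration = "go back to step 2"
```

For example, `mutation_fuzz([b"RC\x02ab"], iterations=2000)` starts from one valid record and returns the mutated inputs whose length byte points past the end of the buffer.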
Can mutation-based “dumb” fuzzing be successful?
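In 2010, Charlie Miller fuzzed PDF viewers using the following mutation program (updated here to Python 3; the seed file name and the FuzzFactor value are assumptions):
import math
import random
FuzzFactor = 250  # assumed value: higher means fewer random byte writes per file
buf = bytearray(open("seed.pdf", "rb").read())  # hypothetical seed PDF
numwrites = random.randrange(math.ceil(len(buf) / FuzzFactor)) + 1
for j in range(numwrites):
    rbyte = random.randrange(256)  # random replacement byte
    rn = random.randrange(len(buf))  # random offset in the file
    buf[rn] = rbyte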
Dumb fuzzing is often way more successful than it has any right to be
Mutation-based fuzzing
Advantages
Can use off-the-shelf software (possibly with a harness) for many programs
Limitations
Results depend strongly on the quality of the initial corpus
Syzkaller
A kernel system call fuzzer that uses test case generation and coverage guidance
https://github.com/google/syzkaller/blob/master/docs/syscall_descriptions.md
Generation-based (smart) fuzzing
Advantages
Can get deeper coverage faster by leveraging knowledge of the input format
Limitations
https://googleprojectzero.blogspot.com/2020/07/mms-exploit-part-2-effective-fuzzing-qmage.html
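A small Python sketch of why knowledge of the input format helps. The format here is hypothetical: a `TLV1` magic, a list of type/length/value records, and a CRC32 trailer. The generator builds inputs from that description, so every one passes the checksum and reaches the record parser, while random bytes are rejected at the first check.

```python
import random
import struct
import zlib

MAGIC = b"TLV1"  # hypothetical format: magic, TLV records, big-endian CRC32 trailer

def parse_tlv(data):
    """Hypothetical target: rejects anything with a bad magic or checksum."""
    if len(data) < 8 or data[:4] != MAGIC:
        raise ValueError("bad magic")
    body = data[4:-4]
    (crc,) = struct.unpack(">I", data[-4:])
    if zlib.crc32(body) != crc:
        raise ValueError("bad checksum")
    records, i = [], 0
    while i < len(body):                       # the "deep" code we want to reach
        if i + 2 > len(body):
            raise ValueError("truncated record header")
        rtype, rlen = body[i], body[i + 1]
        value = body[i + 2 : i + 2 + rlen]
        if len(value) != rlen:
            raise ValueError("truncated record value")
        records.append((rtype, value))
        i += 2 + rlen
    return records

def generate_tlv(rng, max_records=5):
    """Generation-based fuzzing: build each input from the format description."""
    body = bytearray()
    for _ in range(rng.randint(0, max_records)):
        value = bytes(rng.randrange(256) for _ in range(rng.randint(0, 16)))
        body += bytes([rng.randrange(256), len(value)]) + value
    # Computing the checksum is what lets generated inputs get past the first check.
    return MAGIC + bytes(body) + struct.pack(">I", zlib.crc32(body))
```

A real generator would typically also emit slightly wrong fields (for example a length byte that overruns the record) while keeping the checksum valid, so that the parser's error paths are exercised too.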
american fuzzy lop (AFL)
1. Compile the program with instrumentation to measure coverage
2. Trim the test cases in the queue to the smallest size that doesn’t change the program behavior
3. Create new test cases by mutating the files in the queue using traditional fuzzing strategies
https://lcamtuf.coredump.cx/afl/README.txt
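A toy Python sketch of the coverage feedback loop described above, not AFL's implementation: the target is hypothetical and hand-instrumented (each branch adds an ID to a set), and trimming is left out. Any mutated input that reaches a new branch is kept in the queue, so the fuzzer can solve the four-byte check one byte at a time instead of needing to guess all four bytes at once.

```python
import random

def target(data, cov):
    """Hypothetical target, hand-instrumented: each branch records an ID in cov."""
    if len(data) >= 4 and data[0] == ord("F"):
        cov.add(1)
        if data[1] == ord("U"):
            cov.add(2)
            if data[2] == ord("Z"):
                cov.add(3)
                if data[3] == ord("Z"):
                    raise RuntimeError("crash")  # stands in for SIGSEGV/abort()

def coverage_guided_fuzz(seed, max_iters, rng_seed=0):
    """Keep mutated inputs that reach new branches; stop at the first crash."""
    rng = random.Random(rng_seed)
    queue = [seed]
    seen = set()
    target(seed, seen)                           # baseline coverage of the seed
    for i in range(max_iters):
        buf = bytearray(rng.choice(queue))
        buf[rng.randrange(len(buf))] = rng.randrange(256)  # one-byte mutation
        cov = set()
        try:
            target(bytes(buf), cov)
        except RuntimeError:
            return bytes(buf), i, queue          # crashing input found
        if not cov <= seen:                      # new coverage: add to the queue
            seen |= cov
            queue.append(bytes(buf))
    return None, max_iters, queue
```

Starting from `b"AAAA"`, the queue grows by one entry each time another byte of `FUZZ` is matched; a blind fuzzer would need to hit all four bytes in a single mutation.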
Coverage guided fuzzing
Advantages
Very good at finding new program states, even if the initial corpus is limited
Limitations
Not a panacea to bypass strong checksums or input validation
https://googleprojectzero.blogspot.com/2020/07/mms-exploit-part-2-effective-fuzzing-qmage.html
Fuzzing the Samsung Qmage image codec: harness
A fuzzing harness was written to call the interesting functions in the library and supply the test case input from the fuzzer
d2s:/data/local/tmp $ ./loader accessibility_light_easy_off.qmg
[+] Detected image characteristics:
[+] Dimensions: 344 x 344
[+] Color type: 4
[+] Alpha type: 3
[+] Bytes per pixel: 4
[+] codec->GetAndroidPixels() completed successfully
d2s:/data/local/tmp $
An emulator (qemu-aarch64) was used to run the harness and Qmage library on a
Linux machine
Easier to get 1000 Linux cores than 1000 Samsung Galaxy phones
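A simpler, out-of-process variant of a harness, sketched in Python under assumptions: each test case is written to a temporary file, a target command is run on it, and the outcome is classified. `DEMO_TARGET` is a hypothetical stand-in program that aborts on inputs starting with `X`; a negative return code from `subprocess.run` means the child was killed by a signal on POSIX systems.

```python
import os
import subprocess
import sys
import tempfile

# Hypothetical stand-in target: a Python one-liner that aborts on inputs starting with b"X".
DEMO_TARGET = [sys.executable, "-c",
               "import os, sys; os.abort() if open(sys.argv[1], 'rb').read().startswith(b'X') else None"]

def run_case(target_cmd, data, timeout=5.0):
    """Write one test case to a file, run the target on it, classify the result."""
    with tempfile.NamedTemporaryFile(delete=False, suffix=".bin") as f:
        f.write(data)
        path = f.name
    try:
        proc = subprocess.run(target_cmd + [path], capture_output=True, timeout=timeout)
    except subprocess.TimeoutExpired:
        return "hang"                            # possible infinite loop: also worth a look
    finally:
        os.unlink(path)                          # clean up the temporary test case
    # On POSIX, a negative return code means the child was killed by that signal
    # (e.g. SIGSEGV or SIGABRT), which is how native crashes usually show up.
    if proc.returncode < 0:
        return "crash (signal %d)" % -proc.returncode
    return "ok"
```

Spawning one process per test case is slow, which is one reason harnesses such as the Qmage one call the library functions directly instead.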
Fuzzing the Samsung Qmage image codec: coverage
Code coverage was collected by modifying qemu-aarch64 to trace executed PC addresses
Coverage feedback compensated for the small number of initial test cases
Fuzzing the Samsung Qmage image codec: results
4 weeks of fuzzing
87.3% coverage of the Qmage codec
https://github.com/googleprojectzero/fuzzilli
Fuzzing summary
Off-the-shelf fuzzers are excellent at finding bugs
Custom fuzzers are also excellent at finding bugs
Different fuzzers often find different bugs
[Figure: flowchart with the boxes “This code parses untrusted data”, “Should I write a fuzzer?” and “Yes”]
https://web.stanford.edu/class/cs107/resources/valgrind.html
AddressSanitizer (ASan)
Fast memory error detector for C/C++ using compiler instrumentation and a runtime library that replaces malloc() to surround allocations with redzones
Detects:
Out-of-bounds accesses
Use-after-free
Use-after-return
Use-after-scope
Double-free, invalid free
Memory leaks
Example report:
==9901==ERROR: AddressSanitizer: heap-use-after-free on address 0x60700000dfb5 at pc 0x45917b bp 0x7fff4490c700 sp 0x7fff4490c6f8
READ of size 1 at 0x60700000dfb5 thread T0
    #0 0x45917a in main use-after-free.c:5
    #1 0x7fce9f25e76c in __libc_start_main /build/buildd/eglibc-2.15/csu/libc-start.c:226
    #2 0x459074 in _start (a.out+0x459074)
0x60700000dfb5 is located 5 bytes inside of 80-byte region [0x60700000dfb0,0x60700000e000)
freed by thread T0 here:
    #0 0x4441ee in __interceptor_free projects/compiler-rt/lib/asan/asan_malloc_linux.cc:64
    #1 0x45914a in main use-after-free.c:4
    #2 0x7fce9f25e76c in __libc_start_main /build/buildd/eglibc-2.15/csu/libc-start.c:226
previously allocated by thread T0 here:
    #0 0x44436e in __interceptor_malloc projects/compiler-rt/lib/asan/asan_malloc_linux.cc:74
    #1 0x45913f in main use-after-free.c:3
    #2 0x7fce9f25e76c in __libc_start_main /build/buildd/eglibc-2.15/csu/libc-start.c:226
SUMMARY: AddressSanitizer: heap-use-after-free use-after-free.c:5 main
Typically 2x slowdown
https://github.com/google/sanitizers/wiki/AddressSanitizer
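The redzone idea can be modeled in a few lines of Python. This is a toy sketch, not how ASan is implemented: every byte gets a shadow state, allocations are padded with poisoned redzones, freed bytes are marked and never reused (a crude stand-in for ASan's quarantine), and every load or store checks the shadow first, the way compiler-inserted instrumentation would. `REDZONE` and the error strings are assumptions loosely borrowed from ASan's report categories.

```python
REDZONE = 16  # toy value: poisoned bytes placed on each side of an allocation

class AsanError(Exception):
    pass

class ToyShadowHeap:
    """Toy model: every heap byte has a shadow state that is checked on access."""

    def __init__(self):
        self.mem = bytearray()
        self.shadow = []        # "ok", "redzone" or "freed" for each byte of mem
        self.sizes = {}         # live allocations: address -> size

    def malloc(self, size):
        # Layout: [redzone][user bytes][redzone]; memory is never reused.
        self.mem += bytes(REDZONE + size + REDZONE)
        self.shadow += ["redzone"] * REDZONE + ["ok"] * size + ["redzone"] * REDZONE
        addr = len(self.mem) - REDZONE - size
        self.sizes[addr] = size
        return addr

    def free(self, addr):
        if addr not in self.sizes:
            raise AsanError("attempting double-free or invalid free")
        size = self.sizes.pop(addr)
        self.shadow[addr : addr + size] = ["freed"] * size

    def _check(self, addr):
        state = self.shadow[addr] if 0 <= addr < len(self.shadow) else "redzone"
        if state == "redzone":
            raise AsanError("heap-buffer-overflow")
        if state == "freed":
            raise AsanError("heap-use-after-free")

    def load(self, addr):
        self._check(addr)       # check inserted before every read
        return self.mem[addr]

    def store(self, addr, value):
        self._check(addr)       # check inserted before every write
        self.mem[addr] = value
```

Mirroring the report above, reading 5 bytes into a freed 80-byte allocation raises `heap-use-after-free`, and touching offset 80 of a live one hits the trailing redzone.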
ThreadSanitizer (TSan)
Data race detector for C/C++
Similar in principle to AddressSanitizer but for race conditions
High overhead (the Clang documentation cites a typical slowdown of about 5x-15x and memory overhead of about 5x-10x)
https://clang.llvm.org/docs/ThreadSanitizer.html
Frida
Dynamic instrumentation for closed-source binaries
https://frida.re/docs/hacking/
Static analysis
Using a tool to analyze a program’s behavior without actually running it
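Data flow analysis
Determine the possible values of variables at points in the control flow graph
Approximations are usually needed
[Figure: control flow graph with the nodes X=0, Y=A, X == Y, Z=1, Z == 2 and two “crash” nodes, annotated with the possible values of each variable (⊤ = any value), e.g. X: {0}; X: ⊤; Y: {A}; Z: ⊤; and, on the edges leaving X == Y, X: {A}; Y: {A}; Z: ⊤ versus X: ⊤; Y: {A}; Z: ⊤]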
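The annotations in the figure can be reproduced with a tiny abstract domain. This Python sketch assumes each fact is either a finite set of possible constants or ⊤ (`TOP`), and that merged sets larger than a hypothetical `limit` are widened to ⊤, one simple form of the approximation mentioned above. On the true edge of `X == Y` the analysis learns X: {A}; on the false edge nothing can be concluded about an unknown X, so it stays ⊤.

```python
TOP = None  # ⊤: "could be any value"

def join(a, b, limit=4):
    """Merge facts where two control-flow edges meet; give up (⊤) past `limit`."""
    if a is TOP or b is TOP:
        return TOP
    merged = a | b
    return TOP if len(merged) > limit else merged   # the approximation step

def refine_eq(env, x, y, taken):
    """Transfer function for the branch `x == y` on its true or false edge."""
    out = dict(env)
    vx, vy = env[x], env[y]
    if taken:
        # On the true edge both variables hold the same value.
        # (An empty result would mean the edge can never be taken.)
        if vx is TOP:
            common = vy
        elif vy is TOP:
            common = vx
        else:
            common = vx & vy
        out[x] = out[y] = common
    elif vx is not TOP and vy is not TOP and len(vy) == 1:
        out[x] = vx - vy    # x definitely differs from y's only possible value
    return out
```

With `env = {"X": TOP, "Y": frozenset({"A"}), "Z": TOP}`, `refine_eq(env, "X", "Y", True)` gives X: {A}; Y: {A}; Z: ⊤, and the false edge leaves the environment unchanged, matching the two branch annotations.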
Taint analysis
Identify sources of “tainted” data
User/attacker input
Reads from files/network
Check to see if tainted data flows into a “trusted sink”
memcpy()
free()
bzero()
static int vipx_ioctl_get_container(struct vs4l_container_list *karg,
        struct vs4l_container_list __user *uarg)
{
    ...
    ret = copy_from_user(karg, uarg, sizeof(*karg));
    ...
    ucon = karg->containers;
    size = karg->count * sizeof(*kcon);
    kcon = kzalloc(size, GFP_KERNEL);
    ...
    karg->containers = kcon;
    ret = copy_from_user(kcon, ucon, size);
    if (ret) {
        vipx_err("Copy failed [CONTAINER] (%d)\n", ret);
        goto p_err_free;
    }
    for (idx = 0; idx < karg->count; ++idx) {
        ubuf = kcon[idx].buffers;
        size = kcon[idx].count * sizeof(*kbuf);
        kbuf = kzalloc(size, GFP_KERNEL);
        ...
        kcon[idx].buffers = kbuf;
        ret = copy_from_user(kbuf, ubuf, size);
        if (ret) {
            vipx_err("Copy failed [CONTAINER] (%d)\n", ret);
            goto p_err_free;
        }
    }
    ...
    return 0;
p_err_free:
    for (idx = 0; idx < karg->count; ++idx)
        kfree(kcon[idx].buffers);
    kfree(kcon);
p_err:
    return ret;
}
https://bugs.chromium.org/p/project-zero/issues/detail?id=1978
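A toy Python sketch of the source-to-sink check, not a real analyzer: code is a straight-line list of `(destination, operation, arguments)` statements, taint propagates through ordinary operations, and `kzalloc` results count as fresh, untainted values. Variables stand for the data they hold, so pointers and contents are not distinguished. The statement list is a hypothetical, hand-simplified rendering of a few lines of the function above, where `kcon` is filled from user memory and `kfree(kcon[idx].buffers)` runs on the error path.

```python
SOURCES = {"copy_from_user"}                  # attacker-controlled data enters here
SINKS = {"memcpy", "free", "kfree", "bzero"}  # "trusted sinks"
FRESH = {"kzalloc"}                           # results are new kernel values, never tainted

def find_taint_flows(program):
    """Forward pass over straight-line code; returns (stmt index, sink, variable)."""
    tainted = set()
    flows = []
    for i, (dst, op, args) in enumerate(program):
        if op in SINKS:
            flows += [(i, op, a) for a in args if a in tainted]
            continue
        if dst is None:
            continue
        if op in SOURCES:
            tainted.add(dst)
        elif op in FRESH:
            tainted.discard(dst)
        elif any(a in tainted for a in args):
            tainted.add(dst)                  # taint propagates through the operation
        else:
            tainted.discard(dst)              # overwritten with untainted data
    return flows

# Hypothetical simplified statements for one path through vipx_ioctl_get_container.
error_path = [
    ("karg", "copy_from_user", ["uarg"]),
    ("ucon", "load", ["karg"]),               # ucon = karg->containers
    ("size", "mul", ["karg"]),                # size = karg->count * sizeof(*kcon)
    ("kcon", "kzalloc", ["size"]),
    ("kcon", "copy_from_user", ["ucon"]),     # kcon contents now come from user memory
    ("buffers", "load", ["kcon"]),            # kcon[idx].buffers, not yet replaced
    (None, "kfree", ["buffers"]),             # p_err_free: kfree(kcon[idx].buffers)
]
```

Here `find_taint_flows(error_path)` reports that a user-supplied value reaches `kfree` at statement 6; inserting `("buffers", "kzalloc", ["size"])` before the `kfree`, as the loop body does for entries it has already processed, removes the report.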
Clang static analyzer
Check for common security issues with a static analysis framework in the compiler
Built-in checkers:
https://clang-analyzer.llvm.org/images/analyzer_html.png
CodeQL (Semmle)
Query language for finding patterns in large codebases
https://msrc-blog.microsoft.com/2018/08/16/vulnerability-hunting-with-semmle-ql-part-1/
Manual analysis
Reverse engineering
Looking at a compiled program in order to figure out what it does and how it works
Usually assisted by tools
Disassembler
Decompiler
Strings
Often aided by dynamic analysis
Tracing
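The Strings step is the simplest of these tools to reproduce. A minimal Python sketch of the idea, assuming that only runs of printable ASCII (0x20 to 0x7e) at least four bytes long count; the Unix strings tool also defaults to a minimum length of 4, though its exact rules differ.

```python
import re

def strings(data, min_len=4):
    """Return runs of printable ASCII at least min_len bytes long."""
    pattern = rb"[\x20-\x7e]{%d,}" % min_len   # bytes %-formatting (Python 3.5+)
    return [m.group().decode("ascii") for m in re.finditer(pattern, data)]
```

Running it as `strings(open(path, "rb").read())` on a binary quickly surfaces file paths, library version tags, error messages and format strings, which often hint at what a function does before any disassembly.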
IDA Pro
Disassembly
Decompilation
Binary analysis
Scripting
Ghidra
Similar to IDA
Open source
Written by the NSA (no, really)
https://en.wikipedia.org/wiki/Ghidra#/media/File:Ghidra-disassembly,March_2019.png
Tips for writing (more) secure software
Software tests
One of the most effective ways to reduce bugs
Unit tests: Check that each piece of code behaves as expected in isolation
Goal: Unit tests should cover all code, including error handling
If you don’t run regression tests, attackers will run them for you!
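A short Python `unittest` sketch of unit tests that cover the error handling as well as the happy path. `parse_port` is a hypothetical function under test, and the whitespace case is an illustrative regression test that pins down behavior so a later change cannot silently alter it.

```python
import unittest

def parse_port(text):
    """Hypothetical function under test: parse a TCP port number."""
    value = int(text)                     # raises ValueError for non-numeric input
    if not 0 < value < 65536:
        raise ValueError("port out of range: %d" % value)
    return value

class ParsePortTests(unittest.TestCase):
    def test_valid_port(self):
        self.assertEqual(parse_port("443"), 443)

    def test_rejects_non_numeric(self):          # error path 1
        with self.assertRaises(ValueError):
            parse_port("https")

    def test_rejects_out_of_range(self):         # error path 2, including boundaries
        for bad in ("0", "65536", "-1"):
            with self.subTest(bad=bad), self.assertRaises(ValueError):
                parse_port(bad)

    def test_regression_surrounding_whitespace(self):
        # Hypothetical regression test: keep accepting padded input.
        self.assertEqual(parse_port(" 8080 "), 8080)
```

Each error branch gets its own test, so a coverage tool run over this suite would show both `raise` paths as executed.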
Understand and document your threat model early in the design process
Treat all input from outside your process adversarially, even if you trust the sender
Information density matters: complex structured binary formats (e.g. JPEG) are
generally more “fuzzable” than verbose or textual ones (e.g. Python source)
Even if the code being analyzed isn’t a good fit for fuzzing, it may be possible to
transform it into a fuzzable data-processing program
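One common way to do this transformation, sketched in Python under assumptions: treat the fuzzer's bytes as a script of API calls, run them against the code under test, and compare every result with a trivially correct reference model. `RingBuffer`, its planted off-by-one bug, and the byte encoding (odd byte = push, even byte = pop) are all hypothetical.

```python
class RingBuffer:
    """Hypothetical code under test: fixed-capacity FIFO queue."""

    def __init__(self, capacity):
        self.items = [0] * capacity
        self.head = self.count = 0

    def push(self, value):
        if self.count > len(self.items):         # planted bug: should be >=
            return False                          # full: reject
        self.items[(self.head + self.count) % len(self.items)] = value
        self.count += 1
        return True

    def pop(self):
        if self.count == 0:
            return None
        value = self.items[self.head]
        self.head = (self.head + 1) % len(self.items)
        self.count -= 1
        return value

def run_ops(data, capacity=4):
    """Fuzz driver: each input byte is one API call, checked against a list model."""
    ring, model = RingBuffer(capacity), []
    for byte in data:
        if byte & 1:                              # odd byte: push(byte >> 1)
            ok = ring.push(byte >> 1)
            expected_ok = len(model) < capacity
            if expected_ok:
                model.append(byte >> 1)
            if ok != expected_ok:
                raise AssertionError("push accepted or rejected wrongly")
        else:                                     # even byte: pop()
            expected = model.pop(0) if model else None
            if ring.pop() != expected:
                raise AssertionError("pop returned the wrong value")
```

Any byte-oriented fuzzer can now drive `run_ops`; five consecutive pushes are enough to expose the overflow, which the model check reports instead of letting it corrupt state silently.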
Developing exploits
“Exploits are the closest thing to ‘magic spells’ we experience in the real world: Construct the right incantation, gain remote control over device.”
— Thomas Dullien
https://docs.google.com/presentation/d/1YcBqgccBcdn5-v80OX8NTYdu_-qRmrwfejlEx6eq-4E/edit