0% found this document useful (0 votes)
217 views77 pages

Be A Binary Rockstar

This document discusses the design goals and features of the Binary Ninja Intermediate Language (BNIL) for program analysis. The key goals of BNIL are to be human-readable, computer-understandable using techniques like SSA form and 3-address code, easy for plugins to understand and use, and easy for lifting different processor architectures. BNIL aims to have multiple levels or stages of intermediate representation from a low-level form close to assembly to higher-level forms closer to decompiled source code. This is intended to make analysis more effective and tools like decompilers more achievable. Semantic flags and stack frame analysis are important techniques used to translate between the low-level and higher-level IL forms.
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
217 views77 pages

Be A Binary Rockstar

This document discusses the design goals and features of the Binary Ninja Intermediate Language (BNIL) for program analysis. The key goals of BNIL are to be human-readable, computer-understandable using techniques like SSA form and 3-address code, easy for plugins to understand and use, and easy for lifting different processor architectures. BNIL aims to have multiple levels or stages of intermediate representation from a low-level form close to assembly to higher-level forms closer to decompiled source code. This is intended to make analysis more effective and tools like decompilers more achievable. Semantic flags and stack frame analysis are important techniques used to translate between the low-level and higher-level IL forms.
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 77

Be a Binary Rockst r

An Introduction to Program Analysis with


Binary Ninja
Agenda
● Motivation
● Current State of Program Analysis
● Design Goals of Binja Program
Analysis
● Building Tools
2
Motivation
3
● Tooling - Concrete -> Symbolic
○ Increase speed & effectiveness of RE / VR
● Make Program Analysis more accessible &
useful

4
Foundations

• Need to understand code semantics


• Could be done directly on the assembly
• An Intermediate Language (IL) is needed

5
Why IL?
• Architecture Abstraction
• Smaller number of instructions

6
Easy to lift
• Simple flags calculation
• As close to native instructions as possible
• Typeless - types inferred later

7
Easy to read
• Intuitive to read
• Tree-based infix notation
• No register abstraction
• Flags calculation only when necessary
• Avoid excessive temporaries

8
IL Instruction Set
Size

ze
aly

Ea
an

sie
r to

r to
sie

lift
Ea

Instruction
Set Size

9
The Options
10
Existing Options for IL
• BAP
• VEX
• REIL
• LLVM
• IDA

11
BAP
• Tree-tree based :)
• Flags are explicit and inhibit readability :(
• Written in OCAML :(

12
addr 0x0 @asm ”add %eax,%ebx” t:u32 = REBX:u32
REBX:u32 = REBX:u32 + REAX:u32
RCF:bool = REBX:u32 < t:u32
addr 0x2 @asm ”shl %cl,%ebx”
add ebx, eax t1:u32 = REBX:u32 >> 0x20:u32 − (RECX:u32 &
shl ebx, cl 0x1f:u32)
RCF:bool = ((RECX:u32 & 0x1f:u32) = 0:u32) &
RCF:bool | ̃((RECX:u32 & 0x1f:u32) = 0:u32) &
low:bool(t1:u32)

13
VEX
• Register names are abstracted :(
• Single assignment :(
• Over 1000 instructions! :(
• Yet they call it “RISC-like”
• Even Angr is planning a move away from it

14
t0 = GET:I32(16)
t1 = 0x8:I32
subs R2, R2, #8 t3 = Sub32(t0,t1)
PUT(16) = t3
PUT(68) = 0x59FC8:I32

15
REIL
• Tiny instruction set
• Horrible readability
• Makes abstractions nearly impossible
• Flags are explicit and inhibit readability :(

16
00000000.00 STR R_EAX:32, , V_00:32
00000000.01 STR 0:1, , R_CF:1
00000000.02 AND V_00:32, ff:8, V_01:8
00000000.03 SHR V_01:8, 7:8, V_02:8
00000000.04 SHR V_01:8, 6:8, V_03:8
00000000.05 XOR V_02:8, V_03:8, V_04:8
00000000.06 SHR V_01:8, 5:8, V_05:8
00000000.07 SHR V_01:8, 4:8, V_06:8
00000000.08 XOR V_05:8, V_06:8, V_07:8
00000000.09 XOR V_04:8, V_07:8, V_08:8
00000000.0a SHR V_01:8, 3:8, V_09:8
00000000.0b SHR V_01:8, 2:8, V_10:8
test eax, eax 00000000.0c XOR V_09:8, V_10:8, V_11:8
00000000.0d SHR V_01:8, 1:8, V_12:8
00000000.0e XOR V_12:8, V_01:8, V_13:8
00000000.0f XOR V_11:8, V_13:8, V_14:8
00000000.10 XOR V_08:8, V_14:8, V_15:8
00000000.11 AND V_15:8, 1:1, V_16:1
00000000.12 NOT V_16:1, , R_PF:1
00000000.13 STR 0:1, , R_AF:1
00000000.14 EQ V_00:32, 0:32, R_ZF:1
00000000.15 SHR V_00:32, 1f:32, V_17:32
00000000.16 AND 1:32, V_17:32, V_18:32
00000000.17 EQ 1:32, V_18:32, R_SF:1
00000000.18 STR 0:1, , R_OF:1 17
LLVM
● Easy to analyze and has great tools already
available
● It’s a compiler!
○ Reversers want a decompiler.
○ Cannot be the only goal

18
LLVM Challenges

● Hard to lift well from compiled binaries


○ Designed for compiler output
● Expects type information in the instructions
● SSA form - assembly is not
● Stack in assembly looks like a structure, but
structures lose many advantages of SSA
19
IDA

?
20
Binary Ninja’s Answer
• Binary Ninja Intermediate Language (BNIL)

21
IL Goals & Design
22
Why Another IL?
● Popular existing ILs for compiled binaries are not very
human readable. They are extremely low level and
verbose.
● Existing ILs are single stage. Heavyweight analysis must
be performed to get anywhere close to decompiled output.
● Writing a lifter for a new architecture is usually very time
consuming.

23
Binary Ninja IL
● Create a family of ILs with multiple stages of analysis
● Lowest level is close to assembly
● After analysis and transformations, higher levels are
closer to decompiled output and would be much easier to
translate to good LLVM code
● Analysis involved in each transformation is easy to
understand, fast, and directly aids further analysis

24
IL Design Goals
● Human readable
● Computer understandable (SSA, 3AF, etc.)
● Plugin understandable
● Easy to lift native architectures
● Translation to other ILs such as LLVM

25
Human Readable

● Reads like pseudocode, even in lowest level


form
● Flags are resolved into readable expressions

26
Low Level IL Example
lea rax, [0x201047] rax = 0x201047
lea rdi, [0x201040] rdi = 0x201040
push rbp push(rbp)
sub rax, rdi rax = rax - rdi
mov rbp, rsp rbp = rsp
cmp rax, 0xe if (rax u> 0xe) then
ja 0x68d 6 @ 0x68d else 8 @ 0x68b

x86-64 Assembly Low Level IL

27
Low Level IL Example
addiu $sp, $sp, -0x18 $sp = $sp - 0x18
sw $ra, 0x14($sp) [$sp + 0x14].d = $ra
lw $a0, ($a1) $a0 = [$a1].d
jal atoi call(atoi)
nop $at = $v0 u< 0x20 ? 1 : 0
sltiu $at, $v0, 0x20 if ($at == 0) then
beqz $at, 0x4002d8 7 @ 0x4002d8 else
nop 12 @ 0x400290

MIPS Assembly Low Level IL


28
Computer Understandable

● Multiple IL forms
● Pick the right IL for the task at hand

29
IL Forms
Lifted IL Low Level IL SSA /
ASM -> IL Flags use resolved 3AF

High Level IL Medium Level IL


Calls in high level form Stack usage resolved SSA /
3AF
Expression folding Type propagation
Like decompiled output
30
Plugin Understandable

● All IL forms directly accessible from API


● Analysis performed on IL also accessible by
API

31
Easy to Lift

● Expression tree
● Designed for quick, modular lifter
implementations
● Semantic flags eases the burden of describing
flag effects during lifting

32
Semantic Flags
● Architecture plugins define the set of flags and
their semantic roles
● Instructions can define a set of flags they write
● Data flow analysis is performed to link flag
uses to flag writes

33
Semantic Flags
● In most compiled code, flags are resolved to
simple comparison expressions with no effort
from the architecture plugin
● Special cases fall back to emitting concrete
flag write expressions

34
Semantic Flags Example
“Writes to all ALU flags” Folded expression
describing use of flags
sub.q{*}(rax, 0xe)
if (rax u> 0xe) then
if (u>) then
… else …
… else …

“Flag state representing


unsigned greater than”
35
Translating Upwards

● Semantic flags analysis gives Low Level IL with flag


usage fully resolved
● Stack is represented as memory accesses, so data flow
can be difficult to compute on stack variables in Low Level
IL
● Need to analyze and translate to Medium Level IL

36
Low Level IL to Medium Level IL
● Low Level IL is translated to SSA form
● Use implicit data flow from SSA to resolve stack layout
● Data flow based stack layout resolution avoids problems
with nonstandard frame pointer behavior
● Translate loads and stores on stack to stack variable uses
and assignments

37
Medium Level IL Example
push(ebp)
ebp = esp
var_4 = ebp
esp = esp - 0x18
eax = arg_4
eax = [ebp + 8].d
var_1c = eax
[esp].d = eax
free(var_1c)
call(free)
ebp = var_4
esp = ebp
return
ebp = pop
<return> jump(pop)
Medium Level IL
38
Medium Level IL
● Registers and stack usage are now both treated as
variables
● Stack variables no longer use explicit memory access
● Translate to SSA form to obtain implicit data flow on both
registers and stack variables
● Type propagation is performed on SSA form

39
Using Medium Level IL - Jump Tables

40
Using Medium Level IL - Jump Tables
● Jump table resolution based on path-sensitive data flow
● SSA conversion process also tracks control flow
dependence for every block
● Data flow computations allow disjoint sets of possible
values
● Reads from memory are simulated
● At jump site, possible values are the possible jump targets

41
Jump Table Example
x8#1 = zx.q(x0#2.d)
if (x0#2.d u> 0x1f)
then … else …

… x8#2 = sx.q([table + (x8#1 << 2)].d)


x8#3 = x8#2 + table
jump(x8#3)
Medium Level IL
Solve for this to get jump targets SSA Form
42
Jump Table Example
x8#1 = zx.q(x0#2.d)
if (x0#2.d u> 0x1f)
then … else …

… x8#2 = sx.q([table + (x8#1 << 2)].d)


x8#3 = x8#2 + table
jump(x8#3)

Track flow backwards with SSA to find definitions


43
Jump Table Example
x8#1 = zx.q(x0#2.d)
if (x0#2.d u> 0x1f)
then … else …

… x8#2 = sx.q([table + (x8#1 << 2)].d)


x8#3 = x8#2 + table
jump(x8#3)

Memory read depends on value of x8#1


44
Jump Table Example
x8#1 = zx.q(x0#2.d) Value used in
if (x0#2.d u> 0x1f) branch
then … else … comparison

… x8#2 = sx.q([table + (x8#1 << 2)].d)


x8#3 = x8#2 + table
jump(x8#3)

45
Jump Table Example
x8#1 = zx.q(x0#2.d) Branch condition
if (x0#2.d u> 0x1f) must be false to
then … else … reach jump site

… x8#2 = sx.q([table + (x8#1 << 2)].d)


x8#3 = x8#2 + table
jump(x8#3)

46
Jump Table Example
When false, we
x8#1 = zx.q(x0#2.d) know that x0#2.d
if (x0#2.d u> 0x1f) is between 0 and
then … else … 0x1f inclusive

… x8#2 = sx.q([table + (x8#1 << 2)].d)


x8#3 = x8#2 + table
jump(x8#3)

47
Jump Table Example
Resolve forward
x8#1 = zx.q(x0#2.d) to obtain possible
if (x0#2.d u> 0x1f) jump targets
then … else …

… x8#2 = sx.q([table + (x8#1 << 2)].d)


x8#3 = x8#2 + table
jump(x8#3)

Set of possible values here are the jump targets 48


Using Medium Level IL - Jump Tables
● More complex idioms need to combine multiple sources of
information
● Value through SSA ϕ-functions is the set union of the
inputs
● Value of a specific SSA variable is the set intersection of
the information found in the definition and all uses of the
variable

49
Leveraging the Jump Table Algorithm
● Single jump table algorithm works on all architectures with
no additional effort from architecture plugin
● Control flow dependence information accessible from API
● Queries for set of possible values accessible from API

50
The Final Forms
● Medium Level IL has the type information, stack
knowledge, and SSA form to translate easily into LLVM IR
● Further analysis can be performed to translate to High
Level IL, the Binary Ninja IL that will be used to create its
decompiler
● All aspects of every IL form are plugin accessible, so
translating to other representations is straightforward

51
Binary Ninja for Profit
52
● Python, C and C++ API’s
Binja API
(headless)

● Branches: Basic block/ Function edges


(incoming & outgoing)

● Get the register states, some naive range


analysis

● api.binary.ninja/search.html
53
binja_memcpy.py: IL
/bin/bash

54
binja_memcpy.py: IL
/bin/bash

55
binja_memcpy.py: API

56
binja_memcpy.py: API

57
binja_memcpy.py: API

58
binja_memcpy.py: API

59
binja_memcpy.py: Output

60
SSA: Uninitialized variable
for func in bv.functions:
for block in func.medium_level_il.ssa_form:
for instr in block:
visit_instr(instr)

61
SSA: Uninitialized variable
def visit_instr(inst):
# Read of variable
if inst.operation == MLIL_VAR_SSA:
# Not written
if inst.index == 0:
if inst.src.type == StackVariableSourceType:
# Local variables
if inst.src.identifier < 0:
print
("Uninitialized stack variable reference at " + hex(inst.address))

62
SSA: Uninitialized variable
else:
for op in instr.operands:
if isinstance(op, MediumLevelILInstruction):
visit_instr(op)

63
Symbolic Execution
● Very accurate

● Takes time, data, and memory, often not feasible

● IDEA! Reasoning only about what we care about.

● Apply complex data to abstract domains !

● Domains: type, sign, range, color etc….


64
Abstract Interpretation ● Sets of
concrete values
are abstracted
imprecisely

● Galois
Connection
formalizes
Concrete
<-> Abstract

65
Abstract Interpretation
● X ‘s value is imprecise
int x; ● Compilers perform imprecise
int[] a = new abstraction
int[10]; 1. Add precision - i.e. declare
a[2 * x] = 3; 1. abstract value [0, 9]
2. Symbolically execute with
abstract domain/ values
● Requires control-flow analysis
66
Abstract Domains & Sign Analysis
int a,b,c; ● Map variables to an
a = 42; abstract value
b = 87;
if (input) {
c = a + b;
} else {
c = a - b;
} 67
Abstract Domains & Sign Analysis
● Binary Ninja plugin

● Under approximate
● Path sensitive - construct lattices of
abstract values
● One abstract state per CFG node
● Avoid loss in precision for fractions.
68
Demo!
● Analyze
example
program

● PHP
CVE-2016-6289

69
UAF Analysis:
PointsTo for Binja IL
● Before: Allocation -> Write
● UAF Analysis: Allocation -> Free -> Use
● Key Idea: Data flow graph, assignments, copies, dereferences,
and frees of pointers
● Context and path sensitive (path API == soon!).
blog.trailofbits.com/2016/03/09/the-problem-with-dynamic-program-analysis/
70
Devirtualizing C++
● VTable Function Call
● Example: mov eax, [ecx]; call [eax + 4]

https://blog.trailofbits.com/2017/02/13/devirtualizing-c-with-binary-ninja/ 71
Devirtualizing C++

72
Devirtualizing C++

73
Playing with Scripts!
● memcpy, headless python
API script
https://github.com/
● Depth-first-search, path
sensitive CFG template trailofbits/binjascripts
● Sign analysis, abstract domain
plugin, CFG traversal script
● And much much more ….
74
Conclusion: Resources
● binary.ninja/
● Abstract Interpretation talk:
santos.cs.ksu.edu/schmidt/Escuela03/WSSA/talk1p.pdf
● Static Program Analysis Book!
cs.au.dk/~amoeller/spa/spa.pdf

75
Conclusion: Binary Ninja

76
Contact Us
Sophia d’Antoine
- IRC/Slack: @quend
- [email protected]

Peter LaFosse
Rusty Wagner
- https://binaryninjaslack.herokuapp.com
- [email protected]
77

You might also like