CPP Dynamic Type Recovery

This document describes techniques for automating the reconstruction of dynamic structures and type information from C++ binaries through reverse engineering. It discusses existing DBI-based approaches, limitations, and the author's contributions, including tracking structure accesses and argument types at runtime to derive type-related metadata and apply it in disassemblers. The techniques allow automatic creation and application of type information to improve reverse engineering productivity.

Automation Techniques in C++ Reverse Engineering

Rolf Rolles, Möbius Strip Reverse Engineering

July 16, 2019


Introduction

Dynamic Structure Reconstruction


Existing DBI-Based Approaches
Limitations of DBI-Based Solutions
My Contributions to this Problem

Dynamic Resolution of Argument Types


Preprocessing
Run-Time Data Collection
Applying the Results

Further Extensions and Challenges


Extensions
Challenges

Conclusion
Introduction
Genesis of this Research

- While researching an upcoming C++ RE training class, I:
  - Practiced statically reverse engineering large C++ binaries.
  - Spent ~85%-95% of my time creating and setting types.
  - Experimented with automating type-related activities.
  - A few of my results are detailed in this presentation.
- Goal: derive type-related metadata from runtime allocation and structure access data, and apply it in IDA and Hex-Rays.
- The techniques are simple, but the results are very useful!
- Two primary analyses, both based on DLL injection:
  1. Track structure accesses
  2. Track data flow from allocation sites into function arguments
Introduction
Type Information

Type information is the difference between this: unreadable, borderline useless gibberish . . .
Introduction
Type Information

. . . and this: nearly perfect code versus the original source, minus names and comments.

However, it is tedious to create and apply type information, so let's automate it.
Introduction
Interesting Type-Related Information

Discover, through dynamically executing the program:

- All exercised allocation sites and their sizes
- Size and layout of structures; types for their fields
  - Also discover structures contained within other structures
- All locations accessing allocated structures of interest
- Type relationships between fields of different structures
- Function argument and local variable types

(And some more experimental stuff described later.)


Introduction
Numbers for my Current Target

These techniques allowed me to automatically (or semi-automatically) create and apply type information for my current target:

Item                          Count
Structures recovered          ~200
Structure references added    10,000+
Union selections applied      ~2,200
Variable types modified       ~6,000
Argument types modified       ~2,750
Dynamic Structure Reconstruction
Existing DBI-Based Approaches
Locate Memory Management Functions
Hook Memory Management Functions
Run the Program, Instrumented
Instrument Memory References
Detect and Record Structure Accesses
Post-Process Recorded Data
Limitations of DBI-Based Solutions
My Contributions to this Problem
Dynamic Structure Reconstruction
Inspiration

- Existing academic work on the subject inspired me:
  - Howard: A Dynamic Excavator for Reverse Engineering Data Structures by Slowinska et al.
  - dynStruct: An Automatic Reverse Engineering Tool for Structure Recovery and Memory Use Analysis by Mercier
    - The author has published the source code on GitHub.
- I adapted and modified their ideas for better performance and increased flexibility.
  - For example, I use DLL injection instead of DBI.
Dynamic Structure Reconstruction
Overview

The workflow of these tools is as follows:

1. Locate addresses of malloc, free, etc.
2. Hook these memory routines at runtime to record:
   - The allocation site (e.g., address of the call to malloc)
   - The size of the allocation
   - The pointer returned by malloc
   - Discard this information upon a call to free
3. Run the program under dynamic binary instrumentation (DBI).
4. Instrument every instruction that accesses memory.
5. Upon memory access, if the address is within an allocation, log:
   - Address of referencing instruction
   - Allocation details (allocation site and size)
   - Accessed offset within allocation
6. Post-process the data to build higher-level information.

Dynamic Structure Reconstruction
Step #1: Locate Memory Management Functions

.idata:61EB19C8    extrn __imp_free:dword
.idata:61EB19CC    extrn __imp_malloc:dword

free proc near
    push ebp
    mov  ebp, esp
    ...

malloc proc near
    push ebp
    mov  ebp, esp
    ...

Locate and record pointers to memory management functions.
(Of course, these may be contained in the binary and require direct hooks.)

This step is not specific to DBI.
Dynamic Structure Reconstruction
Step #2: Hook Memory Management Functions

.idata:61EB19C8    extrn __imp_free:dword      ; HOOK: now points to &freeHook
.idata:61EB19CC    extrn __imp_malloc:dword    ; HOOK: now points to &mallocHook

Hook the routines, pointing them to our wrappers around them (somewhere inside the same address space).

This step is not specific to DBI.


Dynamic Structure Reconstruction
Skeletons for the Memory Management Wrappers

The hooks save metadata upon malloc, and discard it upon free.

void *mallocHook(int size) {
  void *mem = pfnOriginalMalloc(size);      // 1. Invoke the original malloc
  remember(mem,size,_ReturnAddress());      // 2. Record the pointer / size / allocation site
  return mem;                               // 3. Return the allocated pointer
}

void freeHook(void *mem) {
  forget(mem);                              // 1. Remove metadata about that allocation
  pfnOriginalFree(mem);                     // 2. free it
}

This works transparently to unmodified applications.


Dynamic Structure Reconstruction
remember and Allocation Records

remember stores allocation records.

Allocation Record:  Allocated pointer | Size | RVA of return address from malloc

The allocation site is written as an RVA (offset into image).

.text:61EC53EE    call malloc
.text:61EC53F3    mov  rbx, rax      ; return address of the call

Base address is 61EB0000, so the RVA of the return address is 61EC53F3 - 61EB0000 = 153F3.
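
The RVA arithmetic above is a single subtraction. A minimal sketch, where the helper name toRva is an assumption made for illustration and not part of the original tooling:

```cpp
#include <cassert>
#include <cstdint>

// Convert an absolute address inside a loaded module into an RVA,
// i.e. its offset from the module's base address.
// toRva is a hypothetical helper name.
std::uint64_t toRva(std::uint64_t address, std::uint64_t imageBase) {
    assert(address >= imageBase);  // the address must lie inside the image
    return address - imageBase;
}
```

Storing RVAs rather than absolute addresses keeps the logged allocation sites valid across runs in which the module loads at a different base.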


Dynamic Structure Reconstruction
Implementation of remember and forget

remember stores pointers and metadata in a tree (map) structure.
forget removes items from the tree.

Each node holds (pointer, size, allocation RVA):

(0xe9d5a00, 0x138, 0x14f375)
  left:  (0xe176610, 0x18, 0x6f00b)
    left:  (0xe105e18, 0x50, 0x154671)
    right: (0xe874958, 0x18, 0xd356)
  right: (0xec7c0a0, 0x250, 0x17b532)
    left:  (0xeb0c038, 0x138, 0x154671)
    right: (0xf0a3888, 0x50, 0x14f40c)

Binary trees (AVL, red/black) are well-suited here. Hash tables are not.
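
One reason an ordered tree fits is that later lookups ask "which allocation contains this address?", which is a range query rather than an exact-key match. The following is a minimal sketch using std::map (typically a red/black tree). The signatures, the AllocRecord layout, and the global g_allocs are assumptions for illustration, and a real hook would also need locking.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>

struct AllocRecord {
    std::size_t   size;     // allocation size in bytes
    std::uint64_t siteRva;  // RVA of the return address from malloc
};

// Ordered map keyed by the start address of each live allocation.
std::map<std::uint64_t, AllocRecord> g_allocs;

void remember(std::uint64_t ptr, std::size_t size, std::uint64_t siteRva) {
    assert(size > 0);
    g_allocs[ptr] = AllocRecord{size, siteRva};
}

void forget(std::uint64_t ptr) { g_allocs.erase(ptr); }

// Return the record whose range [start, start + size) contains addr,
// or nullptr if there is none. On success, *offset receives addr - start.
const AllocRecord *lookup(std::uint64_t addr, std::uint64_t *offset) {
    auto it = g_allocs.upper_bound(addr);  // first start address > addr
    if (it == g_allocs.begin()) return nullptr;
    --it;                                  // greatest start address <= addr
    if (addr - it->first >= it->second.size) return nullptr;
    *offset = addr - it->first;
    return &it->second;
}
```

A hash table can only answer exact-key queries, so it cannot resolve an interior pointer such as allocation start + 0x28 without extra bookkeeping.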
Dynamic Structure Reconstruction
Step #3: Run Program under Instrumentation

[Diagram: Program Inputs feed the program running under DBI, which produces Logged Memory Access Data.]

Run the program under dynamic binary instrumentation. Provide inputs that exercise as much functionality as possible.
Dynamic Structure Reconstruction
Step #4: Instrument Memory References

.text:61F33E5A    mov  eax, [ebp+4]
.text:61F33E60    add  edx, 4
.text:61F33E63    mov  ebx, [eax+8]
.text:61F33E69    push ebx
.text:61F33E6A    call sub_61F33950

Insert a DBI memory access callback routine before every memory access.

Use DBI to instrument every memory reference.


Dynamic Structure Reconstruction
Step #5: Detect, Record Structure Accesses

void DBIMemAccessCallback(
      ADDR eaIns, ADDR eaMem,
      SIZE size, BOOL bRead)
{
  AllocRecord *ar = lookup(eaMem);         // look up the accessed address in the map
  if(ar != NULL)
    log(ar,eaIns,size,eaMem,bRead);        // the access was within an allocation
}

- Look up accessed addresses in the map.
- If the access was within an allocation, log the details.
- Next slide gives an example.
Dynamic Structure Reconstruction
Step #5: Detect, Record Structure Accesses

Suppose this instruction accesses address DC07928:

.text:61F33E63 mov ebx, [eax+8]

Suppose further that we have recorded this allocation record:

Allocated pointer DC07900 | Size 80 | Allocation RVA 6F00B

We found an access! Log the following data:

Allocation RVA     6F00B
Allocation Size    80
Instruction RVA    83E63
Access Size        dword
Access Offset      0x28
Access Type        READ
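
The logged values follow from two subtractions: the access offset is the accessed address minus the allocated pointer, and the instruction RVA is the instruction address minus the image base. A minimal sketch follows. The AccessRecord layout and the makeAccessRecord helper are assumptions for illustration, and the example assumes the image base 61EB0000 from the earlier slide.

```cpp
#include <cassert>
#include <cstdint>

struct AccessRecord {
    std::uint64_t allocRva;    // allocation site
    std::uint64_t allocSize;   // size of the allocation
    std::uint64_t insRva;      // RVA of the accessing instruction
    std::uint64_t offset;      // accessed offset within the allocation
    unsigned      accessSize;  // bytes accessed (4 for a dword)
    bool          isRead;
};

// Build the log entry for one access that falls inside a known allocation.
AccessRecord makeAccessRecord(std::uint64_t imageBase, std::uint64_t eaIns,
                              std::uint64_t eaMem, unsigned accessSize,
                              bool isRead, std::uint64_t allocPtr,
                              std::uint64_t allocSize, std::uint64_t allocRva) {
    assert(eaMem >= allocPtr && eaMem - allocPtr < allocSize);
    return AccessRecord{allocRva, allocSize, eaIns - imageBase,
                        eaMem - allocPtr, accessSize, isRead};
}
```

With the slide's numbers, DC07928 - DC07900 gives offset 0x28 and 61F33E63 - 61EB0000 gives instruction RVA 83E63.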
Dynamic Structure Reconstruction
Step #6: Post-Process Recorded Data

So far, we have logged access data to allocated objects, as in:

Allocation RVA   Allocation Size   Inst. RVA   Access Offset   Access Size   Access Type
6F00B            50                6D04A       0x48            4             WRITE
14F40C           50                6B84D       0x18            4             READ
14C213           50                6B859       0x4             4             READ
55816            50                1E0E4       0x44            4             READ
E941A            50                BD7DC       0x10            8             WRITE
6F00B            50                6D000       0x8             8             READ
55816            50                6D00B       0x0             4             WRITE
6F00B            50                149E8D      0x20            1             READ

Now we process this data to reconstruct useful information.


This step is not specific to DBI.
Dynamic Structure Reconstruction
Segregate Data by Allocation Site

Allocation RVA   Allocation Size   Inst. RVA   Access Offset   Access Size   Access Type
6F00B            50                6D04A       0x48            4             WRITE
14F40C           50                6B84D       0x18            4             READ
14C213           50                6B859       0x4             4             READ
55816            50                1E0E4       0x44            4             READ
E941A            50                BD7DC       0x10            8             WRITE
6F00B            50                6D000       0x8             8             READ
55816            50                6D00B       0x0             4             WRITE
6F00B            50                149E8D      0x20            1             READ

(The slide highlights the rows for allocation sites 6F00B and 55816.)

First, group accesses by their allocation site.

(If two sites are known to allocate the same type, we can merge their data.)
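
Grouping by allocation site amounts to bucketing the log by its first column. A minimal sketch, where the Access layout and the groupBySite name are assumptions for illustration:

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <vector>

struct Access {
    std::uint64_t allocRva;  // allocation site
    std::uint64_t insRva;    // accessing instruction
    std::uint64_t offset;    // offset within the allocation
    unsigned      size;      // access size in bytes
    bool          isRead;
};

// Bucket the raw access log by allocation site.
std::map<std::uint64_t, std::vector<Access>>
groupBySite(const std::vector<Access> &log) {
    std::map<std::uint64_t, std::vector<Access>> bySite;
    for (const Access &a : log) {
        assert(a.size > 0);
        bySite[a.allocRva].push_back(a);
    }
    return bySite;
}
```

Merging two sites known to allocate the same type would then be a matter of appending one bucket to the other before further processing.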
Dynamic Structure Reconstruction
Rebuild C-Level Structures

For a given allocation site:

Offset   Size    struct X {
0x0      4         int f0;
0x4      4         int f4;
0x8      8         __int64 f8;
0x10     8         __int64 f10;
0x18     4         int f18;
0x20     1         char f20;
0x21     1         char f21;
0x22     2         short f22;
                   ...
1. Sort accesses, remove duplicates.
2. Create properly sized and padded fields. (Easy, right? . . . )
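
The two steps above can be sketched as follows. This is an illustrative simplification, not the author's implementation: the buildFields name, the padN naming of padding members, and the policy of skipping overlapping observations are all assumptions.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdio>
#include <string>
#include <utility>
#include <vector>

// Each observation is (offset, size). Returns C-like member declarations.
std::vector<std::string>
buildFields(std::vector<std::pair<unsigned, unsigned>> obs) {
    std::sort(obs.begin(), obs.end());                          // 1. sort and
    obs.erase(std::unique(obs.begin(), obs.end()), obs.end());  //    deduplicate

    std::vector<std::string> fields;
    unsigned next = 0;  // first offset not yet covered by a field
    char buf[64];
    for (const auto &o : obs) {
        const unsigned off = o.first, size = o.second;
        assert(size == 1 || size == 2 || size == 4 || size == 8);
        if (off < next) continue;  // overlaps an earlier field: skipped here
        if (off > next) {          // 2. unobserved bytes become padding
            std::snprintf(buf, sizeof buf, "char pad%X[%u];", next, off - next);
            fields.push_back(buf);
        }
        const char *type = size == 1 ? "char" : size == 2 ? "short"
                         : size == 4 ? "int" : "__int64";
        std::snprintf(buf, sizeof buf, "%s f%X;", type, off);
        fields.push_back(buf);
        next = off + size;
    }
    return fields;
}
```

Conflicting sizes at the same offset are exactly where the "Easy, right?" caveat bites, which the ambiguity slides address.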
Dynamic Structure Reconstruction
Discover Nested Subobjects

Brief digression: discovery of nested structure locations.

mov ebx, [eax+8]

Allocation RVA     6F00B
Allocation Size    80
Access Offset      0x28

- Notice that the instruction's offset and the accessed offset are different:
  - The instruction uses offset 8; however:
  - The logged, raw offset into the allocation was 0x28.
  - Hence, when logged, eax pointed +0x20 into the allocation.
- Usually, this implies a structure is contained at offset +0x20.
  - The alternative is a pointer to a contained POD type field.
- We can use this information to recover nesting relationships.
  - Reconstruct not just a flat list of fields, but contained structs.
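
The nesting inference is a subtraction of the instruction's constant displacement from the logged raw offset. A minimal sketch, where the basePointerOffset helper is a hypothetical name and negative displacements are ignored for simplicity:

```cpp
#include <cassert>
#include <cstdint>

// rawOffset:    logged offset of the access from the start of the allocation.
// displacement: constant displacement in the instruction's memory operand.
// Returns how far into the allocation the base register pointed. A non-zero
// result suggests a nested structure (or a pointer to an interior field).
std::uint64_t basePointerOffset(std::uint64_t rawOffset,
                                std::uint64_t displacement) {
    assert(rawOffset >= displacement);  // sketch: no negative displacements
    return rawOffset - displacement;
}
```

For the example above, a raw offset of 0x28 with displacement 8 yields 0x20, the candidate location of the contained structure.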
Dynamic Structure Reconstruction
Sources of Ambiguity in the Data

Imperfect data (misleading, conflicting, or hard-to-analyze structure field accesses) arises from:

- Natural causes in the source code.
  - Casts between integer sizes
  - Use of unions
  - Arrays
- Compiler optimizations.
- Bulk data operations.

A summary of these problems and our solutions follows.


Dynamic Structure Reconstruction
Ambiguity #1: Casting

struct X {
  // ...
  int32 a;

int16 b = x->a;    . . . might compile to . . .    mov ax, [rsi+10h]
int8 b = x->a;     . . . might compile to . . .    mov al, [rsi+10h]

- Casts produce different access sizes to the same field.
- My solution: choose the most frequently-occurring size.
  - Works well in this case.
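
Choosing the most frequently-occurring size is a simple vote over the sizes observed at one offset. A minimal sketch follows. The mostFrequentSize name is an assumption, and so is the tie-breaking rule (the smaller size wins), since the slides do not say how ties are resolved.

```cpp
#include <cassert>
#include <map>
#include <vector>

// Return the access size observed most often at one field offset.
// Assumption: on a tie, the smaller size wins.
unsigned mostFrequentSize(const std::vector<unsigned> &sizes) {
    assert(!sizes.empty());
    std::map<unsigned, unsigned> counts;
    for (unsigned s : sizes) ++counts[s];
    unsigned best = 0, bestCount = 0;
    for (const auto &kv : counts) {  // iterates sizes in ascending order
        if (kv.second > bestCount) {
            best = kv.first;
            bestCount = kv.second;
        }
    }
    return best;
}
```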
Dynamic Structure Reconstruction
Ambiguity #2: Compiler Optimizations

if(x->flags32 & 0x40) might compile to:

F7 06 40 00 00 00    test dword ptr [esi], 40h

OR, EQUIVALENTLY:

F6 06 40             test byte ptr [esi], 40h

- Peephole optimizers may produce the smaller, second one.
- This can produce different access sizes to the same fields.
- My solution: same as before. Also works well.
Dynamic Structure Reconstruction
Ambiguity #3: Bulk Copies and Assignments

struct X {
  char a;
  char b;
  short c;
  int d;
  // ...

memset(x1,0,sizeof(X)) might compile to:

xor eax, eax
mov qword ptr [rcx], rax         ; one 8-byte write covers a, b, c, and d
mov qword ptr [rcx+8], rax
mov qword ptr [rcx+10h], rax
mov qword ptr [rcx+18h], rax
...

- Compiled memset and memcpy often use block operations, i.e., are oblivious to the sizes/configurations of structure fields.
- This can produce different access sizes to the same fields.
- My solution: same as before. Also works well.
Dynamic Structure Reconstruction
Ambiguity #4: unions

struct X {
  // ...
  union {
    int a;
    char *b;
  }
  // ...

int c = x->a;      . . . might compile to . . .    mov eax, [rsi+10h]
char *d = x->b;    . . . might compile to . . .    mov rax, [rsi+10h]

- With unions, multiple variables (of different types and sizes) occupy a single memory location.
  - Clearly there can be multiple sizes for one location.
- My solution: as before, choose the most frequent size.
  - That is, sidestep and ignore unions completely.
  - Not a real solution, and does not work very well!
Dynamic Structure Reconstruction
Ambiguity #5: Array Accesses

struct X {
  // ...
  int a[10];

int b = x->a[i];    . . . might compile to . . .    mov ebx, [rsi+rbx*4+10h]

- Arrays produce references to non-constant offsets.
- My solution: discard accesses with non-constant offsets.
  - Also not a real solution.
  - Does not recover arrays, period, let alone do it well.
Limitations of DBI-Based Solutions
Limitation #1: Overhead from Instrumentation

- The technique is comprehensive and fully automated, but . . .
  - Every memory access must be instrumented.
  - Every memory access incurs a map lookup.
- This is a relatively heavyweight application of DBI.
  - The overhead impedes interaction with the application;
  - Lower interactivity means lower code coverage; and
  - Limited code coverage means limited applicability.
- Furthermore, the overhead is fundamental to the approach.
  - No matter how we optimize it, fundamentally, the approach instruments all memory accesses.
Limitations of DBI-Based Solutions
Limitation #2: Overhead from Tracking Every Allocation

- Every instrumented memory access requires a map lookup.
  - Binary trees ensure slow O(log N) growth, but nevertheless . . .
  - . . . more allocations tracked means slower lookups.
- Can we reduce the overhead even further, and/or
- Is it useful to track only a subset of allocations?
Limitations of DBI-Based Solutions
Friendliness as an Interactive Tool

- The DBI solution is fire-and-forget, monolithic, and slow.
- Can we make a useful interactive reverse engineering tool?
- How best to offer the results within Hex-Rays and IDA?
  - GUIs for browsing the results
  - Full automation for common type annotation tasks
  - Library elements for custom tasks
Directions Explored in this Research

1. Track memory accesses via page-related chicanery.
2. Allow the user to specify particular allocation sites of interest, rather than targeting all at once.
3. Divert interesting allocations into custom allocators.
4. Performance-optimize everything.
5. Make good use of the resulting data.
Dynamic Structure Reconstruction
Existing DBI-Based Approaches
Limitations of DBI-Based Solutions
My Contributions to this Problem
Exploit X86 Demand-Based Paging
DLL Injection-Based Memory Tracking
Target Specific Allocation Sites
Exploit the Results within IDA/Hex-Rays
Target-Specific Example: unions
Focus on the Memory, not the Instructions
Idea #1: Debug Breakpoints

First idea for replacing DBI: X86 debug/memory breakpoints.

mov rcx, 20h
call malloc         ; Set 8-byte R/W breakpoints on:
                    ;   rax+0x00
                    ;   rax+0x08
                    ;   rax+0x10
                    ;   rax+0x18

- PRO: only incurs overhead when the memory is accessed
- CON: can only set 4 breakpoints, i.e., 0x20 bytes of memory
  - Would like to track megabytes/gigabytes, not "bytes".

Verdict: right direction, wrong scale.


Focus on the Memory, not the Instructions
Idea #2: Page Breakpoints

Second idea for replacing DBI: SoftICE's BPR feature.

mov rcx, 88h
call malloc         ; SoftICE command:
                    ;   :bpr rax rax+0x88 rw

- Unlimited memory breakpoints of any size!
- Can be implemented in kernel-mode or user-mode.
- Note, also implemented by other tools:
  - IDA's large memory breakpoints
  - OllyDbg's page breakpoints

Sounds promising! Let's review how it works.


Virtual Memory and Demand Paging
Virtual Memory Concept

[Diagram: page table entries (PTEs) map virtual pages #1-#3 to physical pages #1-#3; virtual page #4 has no mapping.]

The OS provides the illusion of a large virtual address space, but only mapped addresses are valid. (E.g., #4 is not.)
Virtual Memory and Demand Paging
Virtual-to-Physical Address Translation
[Diagram: page table entries (PTEs) map virtual pages #1-#3 to physical pages #1-#3.]

PTE #2 for VIRT = 0x76543000:

Bits 31..12               Bits 11..1     Bit 0
Physical Address          Misc. Flags    P=1
0x12345000

The page table maintains the virtual-to-physical mappings. E.g., if virtual 0x76543000 is mapped to physical 0x12345000, then 0x76543012 resolves to 0x12345012:

0x76543 (virtual page)     012 (offset)
0x12345 (physical page)    012 (offset)
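
The translation keeps the low 12 bits (the page offset) and swaps the page number for the physical page base stored in the PTE. A minimal sketch, assuming 32-bit addresses and 4 KiB pages as in the slide's PTE layout, with translate as a hypothetical helper name:

```cpp
#include <cassert>
#include <cstdint>

const std::uint32_t kPageMask = 0xFFF;  // low 12 bits: offset in a 4 KiB page

// Combine the physical page base taken from the PTE with the page offset
// of the virtual address. Assumes the mapping is present (P=1).
std::uint32_t translate(std::uint32_t virtAddr, std::uint32_t physPageBase) {
    assert((physPageBase & kPageMask) == 0);  // page bases are page-aligned
    return physPageBase | (virtAddr & kPageMask);
}
```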
Virtual Memory and Demand Paging
On-Demand Growth of Virtual Address Space

[Diagram: page table entries (PTEs) map virtual pages #1-#5 to physical pages #1-#5.]

The OS can grow the virtual address space on demand by allocating and mapping additional physical pages.
Virtual Memory and Demand Paging
Swapping Under High Memory Load

[Diagram: PTEs for virtual pages #1-#5; the entry for page #3 is non-present.]

PTE #3:  Bits 31..12 Physical Address | Bits 11..1 Misc. Flags | Bit 0 P=0

When memory is scarce, the OS reclaims physical pages by:

1. Writing their contents to disk.
2. Marking the corresponding PTE as non-present (P=0).

Later accesses to non-present pages generate page fault exceptions.
Virtual Memory and Demand Paging
Transparently Reloading Swapped Pages

[Diagram: PTEs for virtual pages #1-#5; the entry for page #3 is non-present.]

PTE #3:  Bits 31..12 Physical Address | Bits 11..1 Misc. Flags | Bit 0 P=0

In response to page faults in non-present pages, the OS:

1. Allocates a physical page.
2. Loads the page's previous contents from disk.
3. Updates the PTE with the physical address and P=1.
4. Resumes execution.
Dynamic Structure Reconstruction
Summary

SoftICE's BPR, IDA's large memory, and OllyDbg's page breakpoints co-opt X86's demand paging mechanism as follows:

- Mark pages of interest as non-present.
- Intercept page fault exceptions for those pages.
- Determine whether the region of interest was accessed;
- Break if so, continue execution if not.

We exploit this same mechanism to track structure accesses.
Memory Tracing via Presence
Mechanism in Detail

Running example:

61F33E5A    mov eax, [ebx+4]
61F33E60    add edx, 4

1. Before the first instruction executes, assume that the page at [ebx+4] has been marked as non-present.
2. The first instruction attempts to execute.
3. Since [ebx+4] is non-present, the CPU triggers a page fault exception.
4. The OS transfers control to our exception handler.
5. If the address was not within a tracked region, we pass:
     if(!ourFault())
       return;
6. Log the structure access (identically to what we did for DBI). Also, record the address of the faulting instruction (61F33E5A in this case):
     log(exnDetails);
     save(faultEa);
7. Mark the page at [ebx+4] as present again:
     makePresent(page);
8. Set the X86 trap flag (TF). This will allow one instruction to execute, after which a single-step exception will be raised. Continue execution of the monitored program:
     setTrapFlag();
     resumeExecution();
9. The first instruction executes again. Since the page at [ebx+4] is now present, execution succeeds this time.
10. The second instruction would execute, but since the TF was previously set, the CPU raises a single-step exception.
11. The OS invokes our single-step exception handler. Ensure that the faulting address is immediately after the previously-saved faulting address. If not, we pass:
     if(!ourFault())
       return;
12. Mark the page at [ebx+4] as non-present again, ensuring that we catch future accesses to the monitored page:
     makeNonPresent(page);
13. Resume execution of the program:
     resumeExecution();
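
The fault, single-step, and re-arm cycle can be modeled without any OS calls. The following is only a portable simulation of the state transitions, not the injected DLL's code: the Tracker type and all of its members are hypothetical, and the real handlers would change page protections and the TF bit in the thread context instead of updating a set and a bool.

```cpp
#include <cassert>
#include <cstdint>
#include <set>
#include <vector>

const std::uint64_t kPageSize = 0x1000;  // assumes 4 KiB pages

struct Tracker {
    std::set<std::uint64_t> nonPresent;  // page bases we marked non-present
    std::vector<std::uint64_t> log;      // data addresses we recorded
    std::uint64_t pendingPage = 0;       // page to re-protect after the step
    bool trapFlag = false;               // stands in for the X86 TF bit

    static std::uint64_t pageOf(std::uint64_t a) { return a & ~(kPageSize - 1); }

    void track(std::uint64_t addr) { nonPresent.insert(pageOf(addr)); }

    // Page-fault handler: returns false if the fault is not ours.
    bool onPageFault(std::uint64_t addr) {
        const std::uint64_t page = pageOf(addr);
        if (!nonPresent.count(page)) return false;
        log.push_back(addr);     // log the access
        nonPresent.erase(page);  // make the page present again
        pendingPage = page;
        trapFlag = true;         // request a single-step exception
        return true;
    }

    // Single-step handler: re-arm the page once the access has completed.
    void onSingleStep() {
        assert(trapFlag);
        nonPresent.insert(pendingPage);
        trapFlag = false;
    }

    // Simulate one instruction that touches addr.
    void access(std::uint64_t addr) {
        if (nonPresent.count(pageOf(addr)) && onPageFault(addr)) {
            // The instruction re-executes successfully here; then TF fires.
            onSingleStep();
        }
    }
};
```

Every access to a tracked page costs two exceptions in this scheme, which is what motivates the optimizations discussed afterwards.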


Memory Tracing via Presence
Multi-Threaded Architecture

Controller Thread: poll buffers; filter, log; sleep.

Thread #1, Thread #2, Thread #3 (each with its own ring buffer): catch faults; log data.

A dedicated thread retrieves events from the program's thread-local data, filters duplicates, and logs the results to disk via buffered I/O.
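
A per-thread ring buffer with exactly one producer (the faulting thread) and one consumer (the controller thread) can be lock-free. The sketch below is one common way to build such a buffer and is not taken from the presentation: the Event layout, the EventRing class, and its capacity are assumptions.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>

struct Event {
    std::uint64_t insRva;  // faulting instruction
    std::uint64_t offset;  // accessed offset within the allocation
};

// Single-producer / single-consumer ring buffer.
class EventRing {
    static const std::size_t kCapacity = 1024;  // must be a power of two
    Event slots_[kCapacity];
    std::atomic<std::size_t> head_{0};  // next slot to write (producer only)
    std::atomic<std::size_t> tail_{0};  // next slot to read (consumer only)
public:
    bool push(const Event &e) {
        const std::size_t h = head_.load(std::memory_order_relaxed);
        if (h - tail_.load(std::memory_order_acquire) == kCapacity)
            return false;  // full: caller decides whether to drop or retry
        slots_[h & (kCapacity - 1)] = e;
        head_.store(h + 1, std::memory_order_release);
        return true;
    }
    bool pop(Event *out) {
        assert(out != nullptr);
        const std::size_t t = tail_.load(std::memory_order_relaxed);
        if (t == head_.load(std::memory_order_acquire))
            return false;  // empty
        *out = slots_[t & (kCapacity - 1)];
        tail_.store(t + 1, std::memory_order_release);
        return true;
    }
};
```

Keeping the exception handler's work down to a single push is one way to move filtering and disk I/O onto the controller thread.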
Memory Tracing via Presence
Optimizations

Optimization ideas implemented:

1. Emulate common instructions instead of single-stepping [1]
2. Use guard pages instead of PAGE_NOACCESS [2]
3. Force consumer thread away from producer thread cores

Architectural Revision       # Accesses per Minute
Single-Threaded              3M
Multi-Threaded               6.4M
Mini X64 Emulator V1         9M
Guard Pages                  11M
SetThreadAffinityMask()      12.1M
Mini X64 Emulator V2         13.2M

[1] Suggested by Yaron Dinkin
[2] Suggested by Jason Geffner + RECON attendee whose name I forget (sorry)
Dynamic Structure Reconstruction
How to Apply Page-Based Tracking

We’ve shown how to track memory, but not how to apply it. We
explore our two possibilities, and strategies for those cases:

1. Track every allocation, a la DBI.


2. Only track certain allocations.
Dynamic Structure Reconstruction
Tracking Every Allocation

When tracking all allocations:

void *allocHook(int size) {
  // existing hook code
  addBpt(mem,size);
  // existing hook code
}

void freeHook(void *mem) {
  // existing hook code
  delBpt(mem,size);
  // existing hook code
}

Simply add breakpoints upon malloc, and remove upon free.


Target Specific Allocation Sites
Noisiness of Technique

When tracking only some allocations:

[Diagram: heap containing Allocation #1, Allocation #2, and Allocation #3, with a page boundary marked; the allocation of interest is shown in red, the others in blue.]

- Since our technique works at the level of pages, we incur page faults for any allocation on the same page.
  - Only interested in red region.
  - However, we take faults in blue regions on the same page.
  - More page faults = more overhead.
- Worse, we may monitor multiple pages per allocation.
- Best performance: only fault on interesting allocations.
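
How many pages an allocation drags into monitoring depends on where it starts as well as on its size. A minimal sketch, assuming 4 KiB pages, with pagesSpanned as a hypothetical helper name:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

const std::uint64_t kPageSize = 0x1000;  // assumes 4 KiB pages

// Number of pages that must be monitored to cover [addr, addr + size).
std::size_t pagesSpanned(std::uint64_t addr, std::uint64_t size) {
    assert(size > 0);
    const std::uint64_t first = addr / kPageSize;
    const std::uint64_t last  = (addr + size - 1) / kPageSize;
    return static_cast<std::size_t>(last - first + 1);
}
```

Even a 0x20-byte allocation needs two monitored pages if it straddles a page boundary, and every other allocation on those pages then faults as well.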
Target Specific Allocation Sites
Divert into Custom Allocator

When tracking only specific allocations, to improve performance:

void *mallocHook(int size) {
  void *mem;
  if(isInteresting(_RetAddr())) {
    mem = customAlloc(size);
    addBpt(mem,size);
  } else
    mem = originalMalloc(size);
  return mem;
}

void freeHook(void *mem) {
  if(isCustom(mem)) {
    delBpt(mem,size);      // size: as recorded when the allocation was made
    customFree(mem);
  } else
    originalFree(mem);
}

Divert interesting allocations into a custom allocator.

Benefit: no page faults for uninteresting allocations!


Target Specific Allocation Sites
Divert into General-Purpose, Off-the-Shelf Allocator

One strategy for custom allocation: use an existing allocator.

HANDLE hHeap = HeapCreate(...);
//...
void *customAlloc(int size) {
  return HeapAlloc(hHeap,0,size);
}

void customFree(void *mem) {
  HeapFree(hHeap,0,mem);
}

Pros:
- Easy to implement
- Usually thread-safe
- Naturally handles different-sized allocations
- Tuned for performance

Cons:
- Page faults for in-band metadata
- Slower than some alternatives
Target Specific Allocation Sites
Divert into Customized Slab Allocator

Another allocation strategy: fixed-size slab allocator.

Slab Allocator:  CHUNK #1 | CHUNK #2 | CHUNK #3 | CHUNK #4 | CHUNK #5 | CHUNK #6 | CHUNK #7 | CHUNK #8 | ...

Free List:  #3, #5, #7

Pros:
- Fast allocation and range checks
- No in-band metadata

Cons:
- Fixed-size
- Must be applied judiciously
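
A fixed-size slab allocator with an out-of-band free list can be sketched in a few lines. This is illustrative only: the SlabAllocator class and its method names are assumptions, and the std::vector backing store stands in for the page-aligned region that page-based tracking would actually need.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

class SlabAllocator {
    std::size_t chunkSize_;
    std::vector<unsigned char> storage_;  // one contiguous slab of chunks
    std::vector<std::size_t> freeList_;   // free chunk indices, out-of-band
public:
    SlabAllocator(std::size_t chunkSize, std::size_t chunkCount)
        : chunkSize_(chunkSize), storage_(chunkSize * chunkCount) {
        assert(chunkSize > 0 && chunkCount > 0);
        for (std::size_t i = chunkCount; i > 0; --i) freeList_.push_back(i - 1);
    }
    // Pop a free chunk; returns nullptr when the slab is exhausted.
    void *alloc() {
        if (freeList_.empty()) return nullptr;
        const std::size_t idx = freeList_.back();
        freeList_.pop_back();
        return storage_.data() + idx * chunkSize_;
    }
    // Range check: does p point into this slab?
    bool owns(const void *p) const {
        const std::uintptr_t a  = reinterpret_cast<std::uintptr_t>(p);
        const std::uintptr_t lo = reinterpret_cast<std::uintptr_t>(storage_.data());
        return a >= lo && a - lo < storage_.size();
    }
    // Return a chunk to the free list.
    void release(void *p) {
        assert(owns(p));
        const std::size_t off = static_cast<std::size_t>(
            static_cast<unsigned char *>(p) - storage_.data());
        freeList_.push_back(off / chunkSize_);
    }
};
```

Because the free list lives outside the slab, the allocator itself never touches the monitored pages, which avoids the in-band metadata faults listed as a drawback of the general-purpose heap.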
Dynamic Structure Reconstruction
Summary: DBI vs. DLL Injection

DBI:
1. Hook malloc/free
2. Record allocation details
3. Instrument memory references
4. Keep accesses to allocations
5. Log structure accesses
6. Post-process data

DLL Injection:
1. Hook malloc/free
2. Divert chosen allocations into custom allocator
3. Mark allocations non-present
4. Catch memory exceptions
5. Log structure accesses
6. Post-process data
Dynamic Structure Reconstruction
Loading the Data in IDA

Right-click in the main GUI window.

- Unknown structure: for structure recovery purposes.
- Known structure: when the structure type is already known.
  - This case is still very useful, as we will see.
Dynamic Structure Reconstruction
List of Raw Structure Accesses

After loading the access data for some allocation site(s).


Dynamic Structure Reconstruction
Raw Structure Access Operations

Right-clicking provides two primary operations:

1. Apply structure offsets in the disassembly.


2. Change the types of variables in Hex-Rays.
Dynamic Structure Reconstruction
Applying Structure Offsets in the Disassembly

Applying structure offsets is extremely fast.


Dynamic Structure Reconstruction
Structure Cross-References

You can browse cross-references in the structures window.


1243 free structure references to that field alone!
Dynamic Structure Reconstruction
Locating Hex-Rays Variables

1706AC8A movzx eax, byte ptr [rbx+40h]

For a given discovered structure access, locate the pointer dereference at that address in the Hex-Rays CTREE.

Does not always work. Hex-Rays may:

- Lose track of the address of the memory dereference;
- Not create a variable for the pointer dereference;
- Render the access via many patterns, some of which I miss.

Can be improved somewhat, but will never be perfect.


Dynamic Structure Reconstruction
Applying Hex-Rays Variable Types

Comparison before and after applying Hex-Rays types.


Dynamic Structure Reconstruction
unions

union U {
00int x;
00char *y;
00void *z;
};

- unions allow multiple interpretations of the same variable.
- U can hold either an int, x, OR a char *, y, OR a void *, z.
Dynamic Structure Reconstruction
Tagged unions

enum Uelt {
  Uint = 0,
  Ucptr = 1,
  Uvptr = 2
};

struct taggedU {
  Uelt t;
  U e;
};

- The tagged union pattern associates an enum with a union.
  - t is called the tag or the discriminant.
- This design pattern is especially common in programming language tools (compilers, interpreters, decompilers, etc).
Dynamic Structure Reconstruction
Tag Checking

void print(taggedU *u) {
  if(u->t == Uint)  printf("%d\n", u->e.x);
  if(u->t == Ucptr) printf("%s\n", u->e.y);
  if(u->t == Uvptr) printf("%p\n", u->e.z);
}

- The code must check the tag to know the union's held type.
- Code using unions is littered with these checks.
Dynamic Structure Reconstruction
unions in Decompilation: Improper Selection

Proper union field selection is critical to readable decompilation. Failure to do so leads to hideous code like this.
Dynamic Structure Reconstruction
unions in Decompilation: Proper Selection

Same code as the previous, with three union fields set properly.
unions are particularly tedious to apply manually, so let's automate.
Dynamic Structure Reconstruction
Upon Manually Discovering a union Somewhere . . .

class mop_t {
+0x00   mopt_t t;            // tag
+0x01   char oprops;
+0x02   short valnum;
+0x04   int size;
+0x08   union { ... };       // union
+0x10 };

- Suppose we discover a tagged union within an allocated type.
  - Suppose we discover the tag location at +0x00.
  - Suppose we discover the union location at +0x08.
- Run the structure access discovery again, but this time:
  - Only log accesses to the union region.
  - Log the value of the enum at every union access.
Dynamic Structure Reconstruction
Determine the Number of Union Variants

RVA       enum    Type
150F03    0x2     READ
151F8A    0xA     READ
1520BC    0xC     READ
14E9AE    0x8     READ
15EB78    0x4     READ
1531A8    0x4     READ
146F01    0xE     READ

This data helps determine the number of union variants. Create a union with this many variants (of the observed sizes).
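
Counting variants from such a log is a matter of collecting the distinct tag values. A minimal sketch, where the UnionAccess layout and the observedTags name are assumptions for illustration:

```cpp
#include <cassert>
#include <cstdint>
#include <set>
#include <vector>

struct UnionAccess {
    std::uint64_t insRva;  // instruction that touched the union
    unsigned      tag;     // value of the enum tag at that moment
};

// Distinct tag values seen at runtime. Each one corresponds to a union
// variant that the supplied program inputs actually exercised.
std::set<unsigned> observedTags(const std::vector<UnionAccess> &log) {
    assert(!log.empty());
    std::set<unsigned> tags;
    for (const UnionAccess &a : log) tags.insert(a.tag);
    return tags;
}
```

The result is only a lower bound on the true number of variants, since variants never reached during the run leave no trace in the log.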
Dynamic Structure Reconstruction
Mapping between enum Elements and union Fields

enum mopt_t {
  mop_z   = 0,
  mop_r   = 1,
  mop_n   = 2,
  mop_str = 3,
  mop_d   = 4,
  mop_S   = 5,
  mop_v   = 6,
  mop_b   = 7,
  mop_f   = 8,
  mop_l   = 9,
  mop_a   = 10,
  mop_h   = 11,
  mop_c   = 12,
  mop_fn  = 13,
  mop_p   = 14,
  mop_sc  = 15
};

union {                      // variant number
  mreg_t r;                  //  0
  mnumber_t *nnn;            //  1
  minsn_t *d;                //  2
  stkvar_ref_t *s;           //  3
  ea_t g;                    //  4
  int b;                     //  5
  mfuncinfo_t *f;            //  6
  lvar_ref_t *l;             //  7
  mop_addr_t *a;             //  8
  char *helper;              //  9
  char *cstr;                // 10
  mcases_t *c;               // 11
  fnumber_t *fpc;            // 12
  mop_pair_t *pair;          // 13
  scif_t *scif;              // 14
};

Through reverse engineering, manually establish a mapping between the enum elements and union variant numbers.
Dynamic Structure Reconstruction
Setting Hex-Rays Union Selections

17150D86 mov rbx, [rbx+28h]

1. First, set the type of the base variable, as before.
2. Next, locate the union reference.
3. Finally, apply the proper union element number, based on the
   tag value recorded at runtime.
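Step 3 can be sketched as a table lookup: given the manually established tag-to-variant mapping and the runtime log, choose a variant number per instruction. This is an illustrative sketch only. TaggedAccess and selectVariants are hypothetical names, and the policy of leaving conflicting instructions unresolved is an assumption, not something the slides specify.

```cpp
#include <cstdint>
#include <map>
#include <optional>
#include <vector>

struct TaggedAccess {
    std::uint32_t rva;   // RVA of the instruction referencing the union
    std::uint8_t  tag;   // tag value recorded at runtime
};

// For each instruction RVA, pick the union variant implied by the tag seen
// there. An RVA observed with an unmapped tag, or with tags that imply
// different variants, is left empty so it can be reviewed by hand.
std::map<std::uint32_t, std::optional<int>>
selectVariants(const std::vector<TaggedAccess> &entries,
               const std::map<std::uint8_t, int> &tagToVariant) {
    std::map<std::uint32_t, std::optional<int>> out;
    for (const TaggedAccess &a : entries) {
        std::optional<int> v;
        auto m = tagToVariant.find(a.tag);
        if (m != tagToVariant.end()) v = m->second;
        auto [it, inserted] = out.emplace(a.rva, v);
        if (!inserted && it->second != v)
            it->second = std::nullopt;   // conflicting observations
    }
    return out;
}
```

The resulting per-address variant numbers are what a decompiler plugin would then apply as union selections.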
Dynamic Resolution of Argument Types
Preprocessing
Run-Time Data Collection
Applying the Results
Dynamic Resolution of Argument Types
Overview: Big Picture

Via DLL injection, for every call to malloc, record the pointer.

loc_567:
v4 = malloc(0x138);

sub_123(a1,v9,0);
sub_456(a4,v7);
sub_234(v1+24,"a");
sub_345(a3,a2,v17+16);

Discover every function receiving an allocated pointer as an argument.
Record <function RVA, arg #, allocation RVA, size, pointer offset>.
E.g., record <0x345, #3, 0x567, 0x138, 16>.
Dynamic Resolution of Argument Types
Preprocessing #1: Locate Functions via x64 Exception Directory

RUNTIME_FUNCTION <rva sub_61EB80C0, rva algn_61EB80DB, rva stru_620C0390>

61EB80C0 sub_61EB80C0 proc near              <- function start
61EB80C0 var_18= qword ptr -18h
61EB80C0   sub rsp, 38h
61EB80C4   mov [rsp+38h+var_18], -2
61EB80CD   mov rcx, [rcx+60h]
61EB80D1   call free
61EB80D6   add rsp, 38h
61EB80DA   retn
61EB80DA sub_61EB80C0 endp
61EB80DB algn_61EB80DB: align 20h            <- function end

- The PE64 Exception Directory has RUNTIME_FUNCTION entries.
- These give the beginning of every non-leaf function,
- and its end (or the beginning of its first try block).
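A minimal sketch of reading these entries follows. On x64, each RUNTIME_FUNCTION entry consists of three 32-bit RVAs (begin, end, unwind info). The function name and the idea of passing the raw directory bytes in a buffer are assumptions for illustration, and the sketch assumes a little-endian host.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// One x64 exception directory entry: three 32-bit RVAs.
struct RuntimeFunction {
    std::uint32_t beginAddress;       // RVA of the function start
    std::uint32_t endAddress;         // RVA of the end of the covered range
    std::uint32_t unwindInfoAddress;  // RVA of the unwind information
};
static_assert(sizeof(RuntimeFunction) == 12, "entry must be 12 bytes");

// Decode as many whole entries as fit in the raw exception directory bytes.
std::vector<RuntimeFunction>
parseExceptionDirectory(const std::uint8_t *data, std::size_t size) {
    std::vector<RuntimeFunction> out;
    for (std::size_t off = 0; off + sizeof(RuntimeFunction) <= size;
         off += sizeof(RuntimeFunction)) {
        RuntimeFunction rf;
        std::memcpy(&rf, data + off, sizeof rf);  // little-endian host assumed
        out.push_back(rf);
    }
    return out;
}
```

Each decoded beginAddress is a candidate function to hook, and endAddress bounds how far the prolog scan may go.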
Dynamic Resolution of Argument Types
Preprocessing #2: Filter Unusable Functions

61EB80C0   sub rsp, 38h
61EB80C4   mov [rsp+38h+var_18], -2
61EB80CD   mov rcx, [rcx+60h]

Iterate through the function’s instructions. FAIL if instruction:

- Cannot be decoded
- Has control-flow
- Is not easily relocatable
- Has incoming cross-references
- Is after function end / beginning of next try block
  (per X64 exception metadata)

Succeed once the accepted instructions cover sizeof(call) bytes.
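The filter above can be sketched as a loop over already-decoded instructions. DecodedInsn and prologBytesToCopy are hypothetical names, the decoder itself is out of scope, and applying the cross-reference check only after the first instruction (whose incoming references are ordinary calls to the function) is an interpretation rather than something the slides state.

```cpp
#include <cstddef>
#include <optional>
#include <vector>

// Facts about one instruction, as a disassembler would report them.
struct DecodedInsn {
    std::size_t length;        // encoded length in bytes
    bool decoded;              // false if the bytes could not be decoded
    bool hasControlFlow;       // jmp, call, ret, ...
    bool relocatable;          // safe to copy to another address verbatim
    bool hasIncomingXref;      // some other instruction targets this one
};

constexpr std::size_t kCallSize = 5;  // E8 + rel32

// Returns the number of leading bytes to relocate, or nothing if the
// function must be skipped. `limit` is the distance to the function end.
std::optional<std::size_t>
prologBytesToCopy(const std::vector<DecodedInsn> &insns, std::size_t limit) {
    std::size_t covered = 0;
    for (const DecodedInsn &i : insns) {
        if (!i.decoded || i.hasControlFlow || !i.relocatable)
            return std::nullopt;
        if (covered > 0 && i.hasIncomingXref)
            return std::nullopt;      // a jump lands inside the patched area
        if (covered + i.length > limit)
            return std::nullopt;      // would run past the function end
        covered += i.length;
        if (covered >= kCallSize)
            return covered;           // enough room for the call
    }
    return std::nullopt;
}
```

For the example function, the first two instructions are 4 and 9 bytes long, so the scan stops after 13 bytes.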


Dynamic Resolution of Argument Types
Preprocessing #3: Force __fastcall Calling Convention

For each kept function, get the prototype from Hex-Rays.

Before:
signed __int64 __usercall@<rax>
  sub_17078400(
    __int64 a1@<rdx>,
    __int64 a2@<rcx>,
    signed int a3@<r15d>)

After:
signed __int64 __fastcall
  sub_17078400(
    __int64 a2,
    __int64 a1)

Remove non-__fastcall-compliant arguments; reorder remaining.
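One way to picture the rewrite is as a filter and sort over the argument locations. The sketch below uses hypothetical types and names, handles only the four integer argument registers of the Microsoft x64 convention (rcx, rdx, r8, r9), and ignores stack and floating-point arguments as well as gaps between used registers.

```cpp
#include <algorithm>
#include <map>
#include <string>
#include <utility>
#include <vector>

struct UserArg {
    std::string name;  // argument name from the decompiler, e.g. "a1"
    std::string reg;   // register holding it, e.g. "rdx"
};

// Position of a register in the x64 integer argument order, or -1.
int fastcallSlot(const std::string &reg) {
    static const std::map<std::string, int> slots = {
        {"rcx", 0}, {"ecx", 0}, {"rdx", 1}, {"edx", 1},
        {"r8", 2},  {"r8d", 2}, {"r9", 3},  {"r9d", 3}};
    auto it = slots.find(reg);
    return it == slots.end() ? -1 : it->second;
}

// Drop arguments held in non-argument registers and order the rest by slot.
std::vector<std::string> toFastcallOrder(const std::vector<UserArg> &args) {
    std::vector<std::pair<int, std::string>> kept;
    for (const UserArg &a : args) {
        int slot = fastcallSlot(a.reg);
        if (slot >= 0) kept.emplace_back(slot, a.name);
    }
    std::sort(kept.begin(), kept.end());
    std::vector<std::string> names;
    for (const auto &k : kept) names.push_back(k.second);
    return names;
}
```

Applied to the example prototype, a3 in r15d is dropped and the remaining arguments come out as a2 (rcx), then a1 (rdx).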


Dynamic Resolution of Argument Types
Preprocessing #4: Record Positions of Pointer-Sized Arguments

void __fastcall
  sub_61EB8AD0(
    void *rcx0,          <- #0
    unsigned int a2,
    void *a3)            <- #2

For each pointer-sized argument, record positions (#0, #2).
(Standardized X64 __fastcall makes this easier than on X86.)
Discard functions with no pointer-sized arguments.

Determining argument sizes isn’t perfect; Hex-Rays sometimes makes mistakes.


Dynamic Resolution of Argument Types
Preprocessing Summary

Record layout: Function RVA | # Prolog Bytes | # Args | # arg0 | # arg1 | ···

For each suitable function with pointer-size arguments, record:

- Function’s location
- Number of prolog bytes
- Number, positions of pointer-sized arguments

Function RVA   # Prolog Bytes   # Tracked Args   Arg positions
0xe760         5                4                0 1 3 4
0x114e50       8                3                0 1 2
0xf6f60        5                3                0 1 3
0x47c20        6                2                0 1
0x10c4d0       5                2                0 1
Dynamic Resolution of Argument Types
Preprocessing
Run-Time Data Collection
Applying the Results
Dynamic Resolution of Argument Types
Step #1: Hook Memory Management Functions

Hook allocators via DLL injection.

.idata:61EB19C8 extrn __imp_free:dword      <- HOOK: &freeHook
.idata:61EB19CC extrn __imp_malloc:dword    <- HOOK: &mallocHook

As before, keep an allocation record from each malloc until the matching free.

Allocation Record: Allocated pointer | Size | RVA of return address from malloc

Dynamic Resolution of Argument Types
Step #2: Hook Every Suitable Function
.text:61F6ABD0   mov r8, rcx
.text:61F6ABD3   push rbx
.text:61F6ABD4   sub rsp, 80h
.text:61F6ABDB   mov [r11-68h], -2

.text:6205CC10   push rdi
.text:6205CC12   sub rsp, 40h
.text:6205CC16   mov [rsp+48h+var_28], -2

.text:6205F91C   sub rsp, 18h
.text:6205F920   mov r8, rcx
.text:6205F923   mov eax, 5A4Dh

.text:61F79570   mov rax, rsp
.text:61F79573   push r14
.text:61F79575   sub rsp, 60h

.text:61F00AA0   sub rsp, 38h
.text:61F00AA4   mov [rsp+38h+var_18], -2
.text:61F00AAD   mov rcx, [rcx]

For each function to hook . . .

Dynamic Resolution of Argument Types
Step #2: Hook Every Suitable Function
Original function                            COPY ->   Re-entry thunk
.text:61F6ABD0   mov r8, rcx                           .text:12345600   mov r8, rcx
.text:61F6ABD3   push rbx                              .text:12345603   push rbx
.text:61F6ABD4   sub rsp, 80h                          .text:12345604   sub rsp, 80h
.text:61F6ABDB   mov [r11-68h], -2                     .text:1234560B   jmp 61F6ABDB

.text:6205CC10   push rdi                              .text:12345620   push rdi
.text:6205CC12   sub rsp, 40h                          .text:12345622   sub rsp, 40h
.text:6205CC16   mov [rsp+48h+var_28], -2              .text:12345626   jmp 6205CC16

.text:6205F91C   sub rsp, 18h                          .text:12345640   sub rsp, 18h
.text:6205F920   mov r8, rcx                           .text:12345644   mov r8, rcx
.text:6205F923   mov eax, 5A4Dh                        .text:12345647   jmp 6205F923

.text:61F79570   mov rax, rsp                          .text:12345660   mov rax, rsp
.text:61F79573   push r14                              .text:12345663   push r14
.text:61F79575   sub rsp, 60h                          .text:12345666   jmp 61F79575

.text:61F00AA0   sub rsp, 38h                          .text:12345680   sub rsp, 38h
.text:61F00AA4   mov [rsp+38h+var_18], -2              .text:12345684   mov [rsp+38h+var_18], -2
.text:61F00AAD   mov rcx, [rcx]                        .text:1234568D   jmp 61F00AAD

Allocate memory for re-entry thunks. Copy the leading
instructions, and insert a jump to after the copied instructions.
Record { Original RVA, Thunk VA } in a hash table.
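The thunk construction can be sketched as "copied prolog bytes plus a relative jump back". buildReentryThunk is a hypothetical helper, not the author's code. It assumes the copied bytes are position-independent (which is what the earlier relocatability filter is for) and that the thunk lies within reach of a 32-bit displacement from the original function.

```cpp
#include <cstdint>
#include <limits>
#include <optional>
#include <vector>

// Build a re-entry thunk: the function's leading bytes followed by
// `jmp rel32` (opcode E9) to the first instruction that was not copied.
// Returns nothing if the target is out of range for a 32-bit displacement.
std::optional<std::vector<std::uint8_t>>
buildReentryThunk(const std::vector<std::uint8_t> &prolog,
                  std::uint64_t thunkVA, std::uint64_t funcVA) {
    const std::uint64_t jmpVA    = thunkVA + prolog.size();
    const std::uint64_t resumeVA = funcVA + prolog.size();
    // rel32 is measured from the end of the 5-byte jmp instruction.
    const std::int64_t rel = static_cast<std::int64_t>(resumeVA) -
                             static_cast<std::int64_t>(jmpVA + 5);
    if (rel < std::numeric_limits<std::int32_t>::min() ||
        rel > std::numeric_limits<std::int32_t>::max())
        return std::nullopt;

    std::vector<std::uint8_t> thunk(prolog);
    thunk.push_back(0xE9);
    const std::uint32_t bits =
        static_cast<std::uint32_t>(static_cast<std::int32_t>(rel));
    for (int i = 0; i < 4; ++i)   // little-endian displacement
        thunk.push_back(static_cast<std::uint8_t>(bits >> (8 * i)));
    return thunk;
}
```

With the first example above (11 prolog bytes, function at 61F6ABD0, thunk at 12345600), the jump sits at 1234560B and targets 61F6ABDB.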
Dynamic Resolution of Argument Types
Step #2: Hook Every Suitable Function
.text:61F6ABD0   call commonLog
.text:61F6ABDB   mov [r11-68h], -2

.text:6205CC10   call commonLog
.text:6205CC16   mov [rsp+48h+var_28], -2

.text:6205F91C   call commonLog
.text:6205F923   mov eax, 5A4Dh

.text:61F79570   call commonLog
.text:61F79575   sub rsp, 60h

.text:61F00AA0   call commonLog
.text:61F00AAD   mov rcx, [rcx]

Divert every function into a common logging routine.


Dynamic Resolution of Argument Types
Redirect Functions into a Common Logging Stub

commonLog:
  ; Save flags
  ; Save registers
  mov rcx, rsp          ; Point arg #0 to stack data
  call commonLogC
  mov [rsp+78h], rax    ; Return to re-entry thunk
  ; Restore registers
  ; Restore flags
  retn

- commonLog just invokes its C counterpart.
- Overwrite the return address ([rsp+78h]) with the function’s re-entry thunk.
Dynamic Resolution of Argument Types
Argument Logging Details

uint64_t commonLogC(uint64_t *args) {
    funcRVA = _ReturnAddress();
    reEntry, argList = lookup(funcRVA);      // step 1
    for(argNo : argList)                     // step 2
        log(funcRVA,argNo,args[argNo]);      // step 2
    return reEntry;                          // step 3
}

1. Fetch re-entry address, list of interesting arguments.
2. Log each interesting argument.
   - This happens in another thread for efficiency.
3. Return to re-entry thunk.


Dynamic Resolution of Argument Types
Filter, Log Allocation Flow to Arguments

void log(uint64_t funcRVA, int argNo, uint64_t arg) {
    allocRec = allocMapLookup(arg);          // step 1
    if(!allocRec) return;                    // step 2
    write(funcRVA,argNo,allocRec);           // step 3
}

1. Look up function argument in allocation map.
2. Return if not part of an allocation.
3. Write log entry otherwise.

Logged Data
Function RVA # Arg Alloc RVA Alloc size Offset into alloc
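The allocation-map lookup is an interval query: does the argument fall inside any live block, and at what offset? A minimal sketch under assumed names (AllocMap, AllocRecord, FlowRecord are all hypothetical) uses an ordered map keyed by block base address. Thread safety, which a real hook would need, is left out.

```cpp
#include <cstdint>
#include <map>
#include <optional>

struct AllocRecord {
    std::uint64_t size;      // bytes requested from malloc
    std::uint64_t siteRVA;   // RVA of the return address from malloc
};

struct FlowRecord {
    std::uint64_t funcRVA;
    int           argNo;
    std::uint64_t allocRVA, allocSize, offset;
};

class AllocMap {
public:
    void onMalloc(std::uint64_t ptr, std::uint64_t size, std::uint64_t siteRVA) {
        live_[ptr] = {size, siteRVA};
    }
    void onFree(std::uint64_t ptr) { live_.erase(ptr); }

    // Returns a log entry if `arg` points into a live allocation.
    std::optional<FlowRecord>
    classify(std::uint64_t funcRVA, int argNo, std::uint64_t arg) const {
        auto it = live_.upper_bound(arg);   // first block starting after arg
        if (it == live_.begin()) return std::nullopt;
        --it;                               // last block starting at or before arg
        const std::uint64_t offset = arg - it->first;
        if (offset >= it->second.size) return std::nullopt;
        return FlowRecord{funcRVA, argNo, it->second.siteRVA,
                          it->second.size, offset};
    }

private:
    std::map<std::uint64_t, AllocRecord> live_;  // keyed by block base address
};
```

A pointer 0x20 bytes into a 0x50-byte block allocated at site 0xbe648 therefore yields an entry with exactly the five logged fields listed above.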
Dynamic Resolution of Argument Types
Summary: Logged Data

Logged Data
Function RVA   # Arg   Alloc RVA   Alloc size   Offset into alloc
0x15f520       1       0xbe648     0x50         0x20
0x153ce0       0       0xbe648     0x50         0x0
0x147520       0       0x143fd8    0x50         0x40
0x15f8d0       1       0x143fd8    0x50         0x40
0x11530        0       0x56c89     0x630        0x0
0x55a80        0       0x56149     0x120        0x0
0x57bd0        0       0x56149     0x120        0x18

This generates a lot of data (~60K entries for my target).


Dynamic Resolution of Argument Types
Preprocessing
Run-Time Data Collection
Applying the Results
Dynamic Resolution of Argument Types
Loading the Data in IDA

- First, create an allocator/free pair.
  - Can add multiple allocators if appropriate.
- Next, load the data for that allocator.
Dynamic Resolution of Argument Types
Displaying Allocation Sites and Types

- Double-click an allocator to see the list of data:
  1. The addresses of all observed allocation sites
  2. The size allocated by that site
     - If multiple sizes, show the GCD
  3. The user-supplied type for that site, if any
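Reducing several observed sizes to their GCD is a one-line fold. The sketch below uses std::gcd from C++17; the function name is hypothetical. The slides do not say why the GCD is shown. One plausible reading, offered here only as a guess, is that a site allocating arrays of varying length would expose a multiple of the element size this way.

```cpp
#include <cstdint>
#include <numeric>
#include <vector>

// Greatest common divisor of all sizes seen at one allocation site.
// Returns 0 for an empty list, since gcd(0, n) == n.
std::uint64_t summarizeSizes(const std::vector<std::uint64_t> &sizes) {
    std::uint64_t g = 0;
    for (std::uint64_t s : sizes)
        g = std::gcd(g, s);
    return g;
}
```

A site that always allocates 0x50 bytes reports 0x50, while one that was seen allocating 0x30, 0x48 and 0x60 bytes reports 0x18.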
Dynamic Resolution of Argument Types
Displaying Allocation Site Flow Data

- Double-click an allocation site to see the list of:
  1. Functions and arguments into which the allocations flowed
  2. Observed offsets from the base of the allocation
Dynamic Resolution of Argument Types
Displaying Allocation Flow in Hex-Rays

Hex-Rays listings will automatically display allocation flow data.


Dynamic Resolution of Argument Types
Setting Allocation Site Types

Once known, the user manually sets the allocation site type.
(Or, in IDA: Edit->Operand Type->Set Operand Type)

After refresh, the allocation sites window shows the type.


Dynamic Resolution of Argument Types
Applying Argument Types

For known allocation site types, the user can apply argument types.
Can select multiple allocation sites at once.
Dynamic Resolution of Argument Types
Applied Argument Types

Way better than doing it by hand, isn’t it?


Dynamic Resolution of Argument Types
Related Types

For a given allocation site, for each offset passed to a function
argument, display the types of other structure fields passed to the
same argument.
Further Extensions and Challenges
Extensions
Challenges
Further Extensions
Combination with Static Analysis

Access data only covers observed behaviors.
E.g., will not discover the accesses marked below.

if(v1 != 0) {          // ALWAYS OBSERVED TAKEN
    x->f0 = 1234;
    x->fC = 0;
}
else {                 // NEVER OBSERVED TAKEN
    x->f4 = 456;       // <- not discovered dynamically
    x->f8 = 789;       // <- not discovered dynamically
}

Can use Hex-Rays struct analysis to discover the other accesses.


Further Extensions
Maximum Size of Pointed-To Objects

Want to know: how big is the thing an argument points to?

Offset   Size   Max
0x00     0x30   0x30
0x40     0x50   0x10
0x2C     0x30   0x04

Example data reaching a function argument

- Maximum size is the distance from the offset to the end.
- Take the minimum across all data points.
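The two rules above amount to computing size minus offset for every observation and keeping the smallest value. A minimal sketch, with hypothetical names:

```cpp
#include <algorithm>
#include <cstdint>
#include <optional>
#include <vector>

struct Observation {
    std::uint64_t offset;     // offset of the pointer into its allocation
    std::uint64_t allocSize;  // size of that allocation
};

// Upper bound on the size of the object an argument points to, or nothing
// if there are no usable observations.
std::optional<std::uint64_t>
maxPointeeSize(const std::vector<Observation> &obs) {
    std::optional<std::uint64_t> best;
    for (const Observation &o : obs) {
        if (o.offset >= o.allocSize) continue;  // ignore malformed entries
        const std::uint64_t room = o.allocSize - o.offset;
        best = best ? std::min(*best, room) : room;
    }
    return best;
}
```

For the three rows in the table, the per-row maxima are 0x30, 0x10 and 0x04, so the pointed-to object can be at most 4 bytes.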
Further Extensions
Inheritance Discovery by Access Location

Derived classes must construct base classes first:

0042CFF0 GraceWireGeneric_Constructor proc near
···
0042D020   push 4
0042D025   call GraceObject_Constructor
Further Extensions
Inheritance Discovery by Access Location

00424096   mov [edi+8], eax
00424099   lea eax, [edi+0Ch]
0042409C   push eax
0042409D   mov [ebp+a5], eax
004240A0   mov dword ptr [edi], offset vtbl

- Hence, every class in a single-inheritance hierarchy should
  have the same address for its first access.
- CAVEAT: inlined constructors will break this.
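If the access log records, for each allocation site, the address of the first instruction that touched the fresh object, candidate hierarchies fall out of a simple grouping. The record type and function name below are hypothetical, and the grouping inherits the caveat above about inlined constructors.

```cpp
#include <cstdint>
#include <map>
#include <vector>

struct FirstAccess {
    std::uint32_t allocSiteRVA;    // where the object was allocated
    std::uint32_t firstAccessRVA;  // first instruction to access the object
};

// Group allocation sites by the address of their first access. Sites that
// share an address are candidates for belonging to the same hierarchy.
std::map<std::uint32_t, std::vector<std::uint32_t>>
groupByFirstAccess(const std::vector<FirstAccess> &sites) {
    std::map<std::uint32_t, std::vector<std::uint32_t>> families;
    for (const FirstAccess &s : sites)
        families[s.firstAccessRVA].push_back(s.allocSiteRVA);
    return families;
}
```

Groups containing a single allocation site are either standalone classes or cases where the heuristic did not apply.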
Further Extensions and Challenges
Extensions
Challenges
Challenges
Code Coverage

As with any dynamic analysis, results are limited to covered code.

if(v1 != 0) {          // ALWAYS OBSERVED TAKEN
    x->f0 = 1234;
    x->fC = 0;
} else {               // NEVER OBSERVED TAKEN
    neverExecutedFunc1(x);   // <- never observed
    neverExecutedFunc2(x);   // <- never observed
}

I offer no real contributions here, other than noting that the performance
optimizations described earlier increase the number of observations per unit of time.
Challenges
Type-Related Ambiguity

Suppose multiple allocation sites/sizes flow to an argument.

class x {
  int a;
  int b;
};

class y : public x {
  int c;
  int d;
};

Combined layout:
  +0   int a;
  +4   int b;
  +8   int c;
  +12  int d;

What “type” should we assign the argument?


Need inheritance/composition relationships to fully resolve.
The data is still useful without knowing that, though.
Challenges
Type- and Size-Related Ambiguity

Suppose multiple allocation sites/sizes flow to an argument.

class x {        class y :          class z :
  int a;           public x {         public x {
  int b;           char *c;           void *d;
};               };                 };

For derived objects, same size does not imply same type.
Challenges
Nested Structures

struct A {
  int a;
  struct B {
    struct C {
      struct D {
        int d;
      } D;
      int c;
    } C;
    int b;
  } B;
};

Layout of A:  | int a | int d | int c | int b |
  D spans: int d
  C spans: int d, int c
  B spans: int d, int c, int b

An example of nested structures.


Challenges
Access to Nested Structure Fields

Type     Expression
int *    *x
D *      x->d
C *      x->D.d
B *      x->C.D.d

- Suppose x points to int d within a struct A.
- What is the C-level type of x and the accessing expression?
- The four possibilities are shown above.
- Even if we had the structure types and nesting relationships
  from the source code, how would we know the type of x?
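The ambiguity can be made concrete by listing every member of A that starts at a given offset. The flattened layout below is a hand-written assumption (4-byte int, no padding) and the names are hypothetical. It is meant only to show that an offset alone does not determine a type.

```cpp
#include <cstddef>
#include <string>
#include <vector>

struct Member {
    std::string path;    // access path from A, e.g. "B.C.D.d"
    std::string type;    // type of the member at that path
    std::size_t offset;  // byte offset from the start of A
};

// Flattened view of struct A, assuming 4-byte int and no padding.
const std::vector<Member> &layoutOfA() {
    static const std::vector<Member> members = {
        {"",        "A",   0},  {"a",     "int", 0},
        {"B",       "B",   4},  {"B.C",   "C",   4},
        {"B.C.D",   "D",   4},  {"B.C.D.d", "int", 4},
        {"B.C.c",   "int", 8},  {"B.b",   "int", 12}};
    return members;
}

// All types that a pointer to `offset` within an A could legitimately have.
std::vector<std::string>
candidatePointeeTypes(const std::vector<Member> &layout, std::size_t offset) {
    std::vector<std::string> types;
    for (const Member &m : layout)
        if (m.offset == offset) types.push_back(m.type);
    return types;
}
```

A pointer to offset 4 has four candidate pointee types (B, C, D and int), matching the four rows of the table, whereas a pointer to offset 8 can only be an int pointer.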
Conclusion

- None of these techniques are particularly sophisticated.
- However, they are easy to use and produce very useful results.
- Despite challenges and open problems, the results are useful.
- Automation was a better use of my RE time than reading code.
- I’ll probably release the code in July.
  - Check Twitter, Reverse Engineering reddit, etc.

Any Questions?
