CPP Dynamic Type Recovery

This document describes techniques for automating the reconstruction of dynamic structures and type information from C++ binaries through reverse engineering. It discusses existing DBI-based approaches, limitations, and the author's contributions, including tracking structure accesses and argument types at runtime to derive type-related metadata and apply it in disassemblers. The techniques allow automatic creation and application of type information to improve reverse engineering productivity.


Automation Techniques in C++ Reverse Engineering

Rolf Rolles, Möbius Strip Reverse Engineering

July 16, 2019

Introduction
Dynamic Structure Reconstruction
    Existing DBI-Based Approaches
    Limitations of DBI-Based Solutions
    My Contributions to this Problem
Dynamic Resolution of Argument Types
    Preprocessing
    Run-Time Data Collection
    Applying the Results
Further Extensions and Challenges
    Extensions
    Challenges
Conclusion
Introduction
Genesis of this Research

- While researching an upcoming C++ RE training class, I:
  - Practiced statically reverse engineering large C++ binaries.
  - Spent ~85%-95% of my time creating and setting types.
  - Experimented with automating type-related activities.
- A few of my results are detailed in this presentation.
- Goal: derive type-related metadata from runtime allocation and structure access data, and apply it in IDA and Hex-Rays.
- The techniques are simple, but the results are very useful!
- Two primary analyses, both based on DLL injection:
  1. Track structure accesses
  2. Track data flow from allocation sites into function arguments
Introduction
Type Information

Type information is the difference between this: unreadable, borderline useless gibberish . . .

[Figure: decompiler output without type information applied]

. . . and this: nearly perfect code versus the original source, minus names and comments.

[Figure: the same decompiler output with type information applied]

However, it is tedious to create and apply type information, so let's automate it.
Introduction
Interesting Type-Related Information

Discover, through dynamically executing the program:

- All exercised allocation sites and their sizes
- Size and layout of structures; types for their fields
  - Also discover structures contained within other structures
- All locations accessing allocated structures of interest
- Type relationships between fields of different structures
- Function argument and local variable types

(And, some more experimental stuff described later.)

Introduction
Numbers for my Current Target

These techniques allowed me to automatically (or semi-automatically) create and apply type information for my current target:

Structures recovered          ~200
Structure references added    10,000+
Union selections applied      ~2,200
Variable types modified       ~6,000
Argument types modified       ~2,750
Dynamic Structure Reconstruction
Existing DBI-Based Approaches
Locate Memory Management Functions
Hook Memory Management Functions
Run the Program, Instrumented
Instrument Memory References
Detect and Record Structure Accesses
Post-Process Recorded Data
Limitations of DBI-Based Solutions
My Contributions to this Problem
Dynamic Structure Reconstruction
Inspiration

- Existing academic work on the subject inspired me:
  - "Howard: A Dynamic Excavator for Reverse Engineering Data Structures" by Slowinska et al.
  - "dynStruct: An Automatic Reverse Engineering Tool for Structure Recovery and Memory Use Analysis" by Mercier
    - The author has published the source code on GitHub.
- I adapted and modified their ideas for better performance and increased flexibility.
  - For example, I use DLL injection instead of DBI.
Dynamic Structure Reconstruction
Overview

The workflow of these tools is as follows:

1. Locate addresses of malloc, free, etc.
2. Hook these memory routines at runtime to record:
   - The allocation site (e.g., address of the call to malloc)
   - The size of the allocation
   - The pointer returned by malloc
   - Discard this information upon a call to free
3. Run the program under dynamic binary instrumentation (DBI).
4. Instrument every instruction that accesses memory.
5. Upon memory access, if the address is within an allocation, log:
   - Address of the referencing instruction
   - Allocation details (allocation site and size)
   - Accessed offset within the allocation
6. Post-process the data to build higher-level information.

Dynamic Structure Reconstruction
Step #1: Locate Memory Management Functions

.idata:61EB19C8    extrn __imp_free:dword      ; points to free
.idata:61EB19CC    extrn __imp_malloc:dword    ; points to malloc

free proc near
    push ebp
    mov  ebp, esp
    ...

malloc proc near
    push ebp
    mov  ebp, esp
    ...

Locate and record pointers to memory management functions.
(Of course, these may be contained in the binary and require direct hooks.)
This step is not specific to DBI.
Dynamic Structure Reconstruction
Step #2: Hook Memory Management Functions

.idata:61EB19C8    extrn __imp_free:dword      ; HOOK: now points to &freeHook
.idata:61EB19CC    extrn __imp_malloc:dword    ; HOOK: now points to &mallocHook

Hook the routines by pointing them to our wrappers around them (located somewhere inside the same address space).

This step is not specific to DBI.

Dynamic Structure Reconstruction
Skeletons for the Memory Management Wrappers

The hooks save metadata upon malloc, and discard it upon free.

void *mallocHook(size_t size) {
    void *mem = pfnOriginalMalloc(size);       // 1. Invoke the original malloc
    remember(mem, size, _ReturnAddress());     // 2. Record the pointer / size / allocation site
    return mem;                                // 3. Return the allocated pointer
}

void freeHook(void *mem) {
    forget(mem);                               // 1. Remove metadata about that allocation
    pfnOriginalFree(mem);                      // 2. free it
}

This works transparently to unmodified applications.
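A minimal sketch of the hooking idea, under stated assumptions: the ImportSlots struct is a hypothetical stand-in for the import table entries shown in Step #2 (a real tool would patch the target's import table from an injected DLL), and the allocation site is a placeholder because _ReturnAddress() is an MSVC intrinsic.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <map>

// Hypothetical stand-in for the two import slots the program calls through.
struct ImportSlots {
    void *(*pMalloc)(std::size_t) = std::malloc;
    void (*pFree)(void *) = std::free;
};

struct AllocInfo { std::size_t size; std::uintptr_t site; };

ImportSlots g_imports;
void *(*pfnOriginalMalloc)(std::size_t) = std::malloc;
void (*pfnOriginalFree)(void *) = std::free;
std::map<std::uintptr_t, AllocInfo> g_live;   // live pointer -> metadata

void *mallocHook(std::size_t size) {
    void *mem = pfnOriginalMalloc(size);
    // A real hook would store the caller's return address here; 0 is a placeholder.
    if (mem) g_live[reinterpret_cast<std::uintptr_t>(mem)] = {size, 0};
    return mem;
}

void freeHook(void *mem) {
    g_live.erase(reinterpret_cast<std::uintptr_t>(mem));
    pfnOriginalFree(mem);
}

// Save the original pointers, then redirect the slots to the wrappers.
// Intended to be called once.
void installHooks() {
    assert(g_imports.pMalloc && g_imports.pFree);
    pfnOriginalMalloc = g_imports.pMalloc;
    pfnOriginalFree = g_imports.pFree;
    g_imports.pMalloc = mallocHook;
    g_imports.pFree = freeHook;
}
```

Code that calls through g_imports is unaware of the redirection, which is the sense in which the hooks are transparent.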

Dynamic Structure Reconstruction
remember and Allocation Records

remember stores allocation records.

Allocation Record:  | Allocated pointer | Size | RVA of return address from malloc |

The allocation site is written as an RVA (offset into the image).

.text:61EC53EE    call malloc
.text:61EC53F3    mov  rbx, rax        ; <-- return address

The base address is 61EB0000, so the RVA of the return address is 61EC53F3 - 61EB0000 = 153F3.

Dynamic Structure Reconstruction
Implementation of remember and forget

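remember stores pointers and metadata in a tree (map) structure. forget removes items from the tree.

Example tree of (pointer, size, allocation RVA) records, keyed by pointer:

Root:      0xe9d5a00 0x138 0x14f375
Level 2:   0xe176610 0x18 0x6f00b    |  0xec7c0a0 0x250 0x17b532
Level 3:   0xe105e18 0x50 0x154671   |  0xe874958 0x18 0xd356  |  0xeb0c038 0x138 0x154671  |  0xf0a3888 0x50 0x14f40c

Binary trees (AVL, red/black) are well-suited here, because a lookup must find the allocation that contains an arbitrary interior address, which is an ordered range query. Hash tables are not, since they only answer exact-key lookups.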
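The remember / forget / lookup trio can be sketched with std::map, which is typically a red/black tree. This is an illustrative sketch, not the author's implementation; the class name AllocationMap and the imageBase parameter are assumptions.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <iterator>
#include <map>
#include <optional>

struct AllocRecord {
    std::uintptr_t ptr;      // allocated pointer
    std::size_t size;        // allocation size
    std::uintptr_t siteRva;  // RVA of the return address from malloc
};

class AllocationMap {
    std::map<std::uintptr_t, AllocRecord> tree_;  // ordered by start address
public:
    void remember(std::uintptr_t ptr, std::size_t size,
                  std::uintptr_t returnAddr, std::uintptr_t imageBase) {
        assert(returnAddr >= imageBase);
        tree_[ptr] = {ptr, size, returnAddr - imageBase};
    }

    void forget(std::uintptr_t ptr) { tree_.erase(ptr); }

    // Find the allocation containing addr; addr may point into the middle.
    std::optional<AllocRecord> lookup(std::uintptr_t addr) const {
        auto it = tree_.upper_bound(addr);       // first record starting after addr
        if (it == tree_.begin()) return std::nullopt;
        const AllocRecord &r = std::prev(it)->second;
        if (addr - r.ptr < r.size) return r;     // inside [ptr, ptr + size)
        return std::nullopt;
    }
};
```

The upper_bound call is the ordered query that a hash table cannot provide: it finds the nearest record at or below the accessed address in logarithmic time.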
Dynamic Structure Reconstruction
Step #3: Run Program under Instrumentation

[Figure: Program + Inputs -> DBI -> Logged Memory Access Data]

Run the program under dynamic binary instrumentation. Provide inputs that exercise as much functionality as possible.
Dynamic Structure Reconstruction
Step #4: Instrument Memory References

.text:61F33E5A    mov  eax, [ebp+4]
.text:61F33E60    add  edx, 4
.text:61F33E63    mov  ebx, [eax+8]
.text:61F33E69    push ebx
.text:61F33E6A    call sub_61F33950

Insert a DBI memory access callback routine before every memory access.

Use DBI to instrument every memory reference.

Dynamic Structure Reconstruction
Step #5: Detect, Record Structure Accesses

void DBIMemAccessCallback(
        ADDR eaIns, ADDR eaMem,
        SIZE size, BOOL bRead)
{
    AllocRecord *ar = lookup(eaMem);              // (1)
    if (ar != NULL)
        log(ar, eaIns, size, eaMem, bRead);       // (2)
}

- (1) Look up accessed addresses in the map.
- (2) If the access was within an allocation, log the details.
- The next slide gives an example.
Dynamic Structure Reconstruction
Step #5: Detect, Record Structure Accesses

Suppose this instruction accesses address DC07928:

.text:61F33E63    mov ebx, [eax+8]

Suppose further that we have recorded this allocation record:

Pointer DC07900 | Size 80 | Allocation RVA 6F00B

We found an access! Log the following data:

Allocation RVA     6F00B
Allocation Size    80
Instruction RVA    83E63
Access Size        dword
Access Offset      0x28
Access Type        READ
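The logged values follow from two subtractions, sketched below. The function name makeAccessLog and the field names are illustrative, and the image base 61EB0000 is assumed to be the one quoted on the earlier allocation-record slide.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

struct AllocRecord { std::uintptr_t ptr; std::size_t size; std::uintptr_t siteRva; };

struct AccessLog {
    std::uintptr_t allocRva;   // allocation site
    std::size_t allocSize;
    std::uintptr_t insnRva;    // referencing instruction, relative to the image base
    std::size_t accessSize;    // in bytes
    std::uintptr_t offset;     // accessed offset within the allocation
    bool isRead;
};

AccessLog makeAccessLog(const AllocRecord &ar, std::uintptr_t eaIns,
                        std::uintptr_t eaMem, std::size_t size, bool bRead,
                        std::uintptr_t imageBase) {
    assert(eaMem >= ar.ptr && eaMem - ar.ptr < ar.size);  // access lies inside the allocation
    assert(eaIns >= imageBase);
    return {ar.siteRva, ar.size, eaIns - imageBase, size, eaMem - ar.ptr, bRead};
}
```

With the slide's numbers, 61F33E63 - 61EB0000 gives instruction RVA 83E63, and DC07928 - DC07900 gives access offset 0x28.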
Dynamic Structure Reconstruction
Step #6: Post-Process Recorded Data

So far, we have logged access data to allocated objects, as in:

Allocation         Inst.      Access
RVA       Size     RVA        Offset   Size   Type
6F00B     50       6D04A      0x48     4      WRITE
14F40C    50       6B84D      0x18     4      READ
14C213    50       6B859      0x4      4      READ
55816     50       1E0E4      0x44     4      READ
E941A     50       BD7DC      0x10     8      WRITE
6F00B     50       6D000      0x8      8      READ
55816     50       6D00B      0x0      4      WRITE
6F00B     50       149E8D     0x20     1      READ

Now we process this data to reconstruct useful information.

This step is not specific to DBI.
Dynamic Structure Reconstruction
Segregate Data by Allocation Site

Rows marked with > are highlighted on the slide (allocation sites 6F00B and 55816):

   Allocation         Inst.      Access
   RVA       Size     RVA        Offset   Size   Type
>  6F00B     50       6D04A      0x48     4      WRITE
   14F40C    50       6B84D      0x18     4      READ
   14C213    50       6B859      0x4      4      READ
>  55816     50       1E0E4      0x44     4      READ
   E941A     50       BD7DC      0x10     8      WRITE
>  6F00B     50       6D000      0x8      8      READ
>  55816     50       6D00B      0x0      4      WRITE
>  6F00B     50       149E8D     0x20     1      READ

First, group accesses by their allocation site.
(If two sites are known to allocate the same type, we can merge their data.)
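Grouping, including the optional merge of sites known to allocate the same type, can be sketched as follows. The Access struct and the SiteAliases map are assumptions made for the example, not structures named in the talk.

```cpp
#include <cstdint>
#include <map>
#include <vector>

struct Access {
    std::uint32_t allocRva, allocSize, insnRva, offset, size;
    bool isWrite;
};

// Maps an allocation site to the canonical site it should be merged into.
using SiteAliases = std::map<std::uint32_t, std::uint32_t>;

std::map<std::uint32_t, std::vector<Access>>
groupBySite(const std::vector<Access> &log, const SiteAliases &aliases = {}) {
    std::map<std::uint32_t, std::vector<Access>> groups;
    for (const Access &a : log) {
        auto it = aliases.find(a.allocRva);
        std::uint32_t key = (it == aliases.end()) ? a.allocRva : it->second;
        groups[key].push_back(a);   // accesses stay in log order within a group
    }
    return groups;
}
```

Applied to the eight rows above, this yields five groups, with three accesses for site 6F00B and two for site 55816.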
Dynamic Structure Reconstruction
Rebuild C-Level Structures

For a given allocation site:

Offset   Size     struct X {
0x0      4            int f0;
0x4      4            int f4;
0x8      8            __int64 f8;
0x10     8            __int64 f10;
0x18     4            int f18;
0x20     1            char f20;
0x21     1            char f21;
0x22     2            short f22;
                      ...

1. Sort accesses, remove duplicates.
2. Create properly sized and padded fields. (Easy, right? . . . )
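The two steps above can be sketched as below, assuming one size has already been chosen per offset. The padding policy (unobserved bytes become explicit gap fields, overlapping observations are skipped) is an assumption of this sketch rather than something the slides specify.

```cpp
#include <cstdint>
#include <set>
#include <utility>
#include <vector>

struct Field {
    std::uint32_t offset;
    std::uint32_t size;
    bool padding;   // true for gaps that no observed access touched
};

std::vector<Field> rebuildLayout(
        const std::vector<std::pair<std::uint32_t, std::uint32_t>> &accesses,
        std::uint32_t allocSize) {
    // std::set both sorts by (offset, size) and removes duplicates.
    std::set<std::pair<std::uint32_t, std::uint32_t>> unique(accesses.begin(),
                                                             accesses.end());
    std::vector<Field> out;
    std::uint32_t cursor = 0;
    for (auto [off, size] : unique) {
        if (off < cursor || off + size > allocSize) continue;  // overlap or out of range
        if (off > cursor) out.push_back({cursor, off - cursor, true});
        out.push_back({off, size, false});
        cursor = off + size;
    }
    if (cursor < allocSize) out.push_back({cursor, allocSize - cursor, true});
    return out;
}
```

For the offsets in the table, this inserts a 4-byte gap at 0x1C, between f18 and f20, plus a trailing gap up to the allocation size.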
Dynamic Structure Reconstruction
Discover Nested Subobjects

Brief digression: discovery of nested structure locations.

mov ebx, [eax+8]

Allocation RVA     6F00B
Allocation Size    80
Access Offset      0x28

- Notice that the instruction's offset and the accessed offset are different:
  - The instruction uses offset 8; however:
  - The logged, raw offset into the allocation was 0x28.
- Hence, when logged, eax pointed +0x20 into the allocation.
- Usually, this implies a structure is contained at offset +0x20.
  - The alternative is a pointer to a contained POD type field.
- We can use this information to recover nesting relationships.
  - Reconstruct not just a flat list of fields, but contained structs.
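The arithmetic behind this inference is a single subtraction per logged access. A hedged sketch follows; the NestedHit struct and the idea of counting votes per candidate base are assumptions of the example.

```cpp
#include <cstdint>
#include <map>
#include <vector>

struct NestedHit {
    std::uint32_t rawOffset;  // logged offset into the allocation
    std::uint32_t insnDisp;   // constant displacement encoded in the instruction
};

// Returns candidate nested-structure offsets with the number of supporting accesses.
std::map<std::uint32_t, int> nestedBaseCandidates(const std::vector<NestedHit> &hits) {
    std::map<std::uint32_t, int> votes;
    for (const NestedHit &h : hits) {
        if (h.rawOffset < h.insnDisp) continue;        // e.g. a negative displacement
        std::uint32_t base = h.rawOffset - h.insnDisp; // where the base register pointed
        if (base != 0) ++votes[base];                  // base 0 is the outer structure
    }
    return votes;
}
```

For the slide's example, rawOffset 0x28 and displacement 8 give a candidate base of 0x20.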
Dynamic Structure Reconstruction
Sources of Ambiguity in the Data

Imperfect data (misleading, conflicting, or hard-to-analyze structure field accesses) arises from:

- Natural causes in the source code.
  - Casts between integer sizes
  - Use of unions
  - Arrays
- Compiler optimizations.
- Bulk data operations.

A summary of these problems and our solutions follows.

Dynamic Structure Reconstruction
Ambiguity #1: Casting

struct X {
    // ...
    int32 a;

int16 b = x->a;    . . . might compile to . . .    mov ax, [rsi+10h]
int8 b = x->a;     . . . might compile to . . .    mov al, [rsi+10h]

- Casts produce different access sizes to the same field.
- My solution: choose the most frequently-occurring size.
- Works well in this case.
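Choosing the most frequently-occurring size amounts to a vote per offset. The sketch below is one way to do it; the tie-breaking rule (the smaller size wins) is an assumption, since the slides do not say how ties are resolved.

```cpp
#include <cstdint>
#include <map>
#include <utility>
#include <vector>

// Input: (offset, access size) pairs for one allocation site.
// Output: the most frequently observed access size for each offset.
std::map<std::uint32_t, std::uint32_t> pickFieldSizes(
        const std::vector<std::pair<std::uint32_t, std::uint32_t>> &accesses) {
    std::map<std::uint32_t, std::map<std::uint32_t, int>> votes;
    for (const auto &[off, size] : accesses) ++votes[off][size];

    std::map<std::uint32_t, std::uint32_t> chosen;
    for (const auto &[off, bySize] : votes) {
        int best = 0;
        for (const auto &[size, count] : bySize)   // ascending size order
            if (count > best) { best = count; chosen[off] = size; }
    }
    return chosen;
}
```

A field at +0x10 that is read three times as a dword and once each as a word and a byte is therefore recorded as a dword.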
Dynamic Structure Reconstruction
Ambiguity #2: Compiler Optimizations

if(x->flags32 & 0x40) might compile to:

F7 06 40 00 00 00 test dword ptr [esi], 40h

OR, EQUIVALENTLY:
F6 06 40 test byte ptr [esi], 40h

- Peephole optimizers may produce the smaller, second one.
- This can produce different access sizes to the same fields.
- My solution: same as before. Also works well.
Dynamic Structure Reconstruction
Ambiguity #3: Bulk Copies and Assignments

struct X {
    char a;     // <--
    char b;     // <--
    short c;    // <--
    int d;      // <--
    // ...

memset(x1,0,sizeof(X)) might compile to:

    xor eax, eax
    mov qword ptr [rcx], rax        ; <-- one 8-byte write covers a, b, c, and d
    mov qword ptr [rcx+8], rax
    mov qword ptr [rcx+10h], rax
    mov qword ptr [rcx+18h], rax
    ...

- Compiled memset and memcpy often use block operations, i.e., are oblivious to the sizes/configurations of structure fields.
- This can produce different access sizes to the same fields.
- My solution: same as before. Also works well.
Dynamic Structure Reconstruction
Ambiguity #4: unions

struct X {
    // ...
    union {
        int a;
        char *b;
    }
    // ...

int c = x->a;      . . . might compile to . . .    mov eax, [rsi+10h]
char *d = x->b;    . . . might compile to . . .    mov rax, [rsi+10h]

- With unions, multiple variables (of different types and sizes) occupy a single memory location.
- Clearly there can be multiple sizes for one location.
- My solution: as before, choose the most frequent size.
  - That is, sidestep and ignore unions completely.
  - Not a real solution, and does not work very well!
Dynamic Structure Reconstruction
Ambiguity #5: Array Accesses

struct X {
    // ...
    int a[10];

int b = x->a[i];    . . . might compile to . . .    mov ebx, [rsi+rbx*4+10h]

- Arrays produce references to non-constant offsets.
- My solution: discard accesses with non-constant offsets.
  - Also not a real solution.
  - Does not recover arrays, period, let alone do it well.
Limitations of DBI-Based Solutions
Limitation #1: Overhead from Instrumentation

- The technique is comprehensive and fully automated, but . . .
  - Every memory access must be instrumented.
  - Every memory access incurs a map lookup.
- This is a relatively heavyweight application of DBI.
  - The overhead impedes interaction with the application;
  - Lower interactivity means lower code coverage; and
  - Limited code coverage means limited applicability.
- Furthermore, the overhead is fundamental to the approach.
  - No matter how we optimize it, fundamentally, the approach instruments all memory accesses.
Limitations of DBI-Based Solutions
Limitation #2: Overhead from Tracking Every Allocation

- Every instrumented memory access requires a map lookup.
  - Binary trees ensure slow, O(log N) growth, but nevertheless . . .
  - . . . more allocations tracked means slower lookups.
- Can we reduce the overhead even further? And/or:
- Is it useful to track only a subset of allocations?
Limitations of DBI-Based Solutions
Friendliness as an Interactive Tool

- The DBI solution is fire-and-forget, monolithic, and slow.
- Can we make a useful interactive reverse engineering tool?
- How best to offer the results within Hex-Rays and IDA?
  - GUIs for browsing the results
  - Full automation for common type annotation tasks
  - Library elements for custom tasks
Directions Explored in this Research

1. Track memory accesses via page-related chicanery.

2. Allow the user to specify particular allocation sites of interest,
rather than targeting all at once.
3. Divert interesting allocations into custom allocators.
4. Performance-optimize everything.
5. Make good use of the resulting data.
Dynamic Structure Reconstruction
Existing DBI-Based Approaches
Limitations of DBI-Based Solutions
My Contributions to this Problem
Exploit X86 Demand-Based Paging
DLL Injection-Based Memory Tracking
Target Specific Allocation Sites
Exploit the Results within IDA/Hex-Rays
Target-Specific Example: unions
Focus on the Memory, not the Instructions
Idea #1: Debug Breakpoints

First idea for replacing DBI: X86 debug/memory breakpoints.

mov rcx, 20h
call malloc        ; Set 8-byte R/W breakpoints on:
                   ;   rax+0x00
                   ;   rax+0x08
                   ;   rax+0x10
                   ;   rax+0x18

- PRO: only incurs overhead when the memory is accessed
- CON: can only set 4 breakpoints, i.e., 0x20 bytes of memory
  - Would like to track megabytes/gigabytes, not "bytes".

Verdict: right direction, wrong scale.

Focus on the Memory, not the Instructions
Idea #2: Page Breakpoints

Second idea for replacing DBI: SoftICE’s BPR feature.

mov rcx, 88h
call malloc

SoftICE command:
:bpr rax rax+0x88 rw

- Unlimited memory breakpoints of any size!
- Can be implemented in kernel-mode or user-mode.
- Note, also implemented by other tools:
  - IDA's large memory breakpoints
  - OllyDbg's page breakpoints

Sounds promising! Let's review how it works.

Virtual Memory and Demand Paging
Virtual Memory Concept

[Figure: page table entries (PTEs) mapping virtual pages #1-#3 to physical pages #1-#3; virtual page #4 has no mapping]

The OS provides the illusion of a large virtual address space, but only mapped addresses are valid. (E.g., #4 is not.)
Virtual Memory and Demand Paging
Virtual-to-Physical Address Translation
[Figure: page table entries (PTEs) mapping virtual pages #1-#3 to physical pages #1-#3]

PTE #2 for VIRT = 0x76543000:

Bits 31-12                        Bits 11-1      Bit 0
Physical Address (0x12345000)     Misc. Flags    P=1

The page table maintains the virtual-to-physical mappings. E.g., if virtual 0x76543000 is mapped to physical 0x12345000, then virtual address 0x76543012 resolves to physical address 0x12345012: the page part changes from 0x76543 to 0x12345, while the offset within the page (0x012) is the same in both.
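The translation in this example can be written out as a small function. This is a simplified single-level model with 4 KiB pages, meant only to illustrate the split into page number and offset; real x86 page tables are multi-level, and the PageTable type here is an assumption of the sketch.

```cpp
#include <cstdint>
#include <map>
#include <optional>

// Virtual page number -> physical page base, for present pages only.
using PageTable = std::map<std::uint32_t, std::uint32_t>;

std::optional<std::uint32_t> translate(const PageTable &pt, std::uint32_t virt) {
    std::uint32_t pageNumber = virt >> 12;    // upper 20 bits select the page
    std::uint32_t offset = virt & 0xFFFu;     // lower 12 bits pass through unchanged
    auto it = pt.find(pageNumber);
    if (it == pt.end()) return std::nullopt;  // unmapped: the CPU would raise a page fault
    return it->second | offset;
}
```

With virtual page 0x76543 mapped to physical base 0x12345000, translating 0x76543012 yields 0x12345012.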
Virtual Memory and Demand Paging
On-Demand Growth of Virtual Address Space

[Figure: page table entries (PTEs) mapping virtual pages #1-#5 to physical pages #1-#5]

The OS can grow the virtual address space on demand by allocating and mapping additional physical pages.
Virtual Memory and Demand Paging
Swapping Under High Memory Load
[Figure: page table entries (PTEs) for virtual pages #1-#5; PTE #3 is marked non-present]

PTE #3:

Bits 31-12          Bits 11-1      Bit 0
Physical Address    Misc. Flags    P=0

When memory is scarce, the OS reclaims physical pages by:

1. Writing their contents to disk.
2. Marking the corresponding PTE as non-present (P=0).

Later accesses to non-present pages generate page fault exceptions.
Virtual Memory and Demand Paging
Transparently Reloading Swapped Pages
[Figure: page table entries (PTEs) for virtual pages #1-#5; PTE #3 is marked non-present]

PTE #3:

Bits 31-12          Bits 11-1      Bit 0
Physical Address    Misc. Flags    P=0

In response to page faults in non-present pages, the OS:

1. Allocates a physical page.
2. Loads the page's previous contents from disk.
3. Updates the PTE with the physical address and P=1.
4. Resumes execution.
Dynamic Structure Reconstruction
Summary

SoftICE's BPR, IDA's large memory breakpoints, and OllyDbg's page breakpoints co-opt X86's demand paging mechanism as follows:

- Mark pages of interest as non-present.
- Intercept page fault exceptions for those pages.
- Determine whether the region of interest was accessed;
- Break if so, continue execution if not.

We exploit this same mechanism to track structure accesses.
Memory Tracing via Presence
Mechanism in Detail

61F33E5A    mov eax, [ebx+4]
61F33E60    add edx, 4

1. Before the first instruction executes, assume that the page at [ebx+4] has been marked as non-present.

2. The first instruction attempts to execute.

3. Since [ebx+4] is non-present, the CPU triggers a page fault exception.

4. The OS transfers control to our exception handler.

5. If the address was not within a tracked region, we pass:

       if(!ourFault())
           return;

6. Log the structure access (identically to what we did for DBI). Also, record the address of the faulting instruction (61F33E5A in this case):

       log(exnDetails);
       save(faultEa);

7. Mark the page at [ebx+4] as present again:

       makePresent(page);

8. Set the X86 trap flag (TF). This will allow one instruction to execute, after which a single-step exception will be raised. Continue execution of the monitored program:

       setTrapFlag();
       resumeExecution();

9. The first instruction executes again. Since the page at [ebx+4] is now present, execution succeeds this time.

10. The second instruction would execute, but since the TF was previously set, the CPU raises a single-step exception.

11. The OS invokes our single-step exception handler. Ensure that the address of the single-step exception (61F33E60) is immediately after the previously-saved faulting address (61F33E5A). If not, we pass:

       if(!ourFault())
           return;

12. Mark the page at [ebx+4] as non-present again, ensuring that we catch future accesses to the monitored page:

       makeNonPresent(page);

13. Resume execution of the program:

       resumeExecution();
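The handler logic can be modeled as a small state machine. The sketch below is a portable simulation only: it tracks page state in ordinary containers and never touches real page protections, exception handlers, or the trap flag, and it simplifies the single-step check to "a fault is pending". All names are illustrative.

```cpp
#include <cstdint>
#include <optional>
#include <set>
#include <utility>
#include <vector>

constexpr std::uintptr_t kPageMask = ~std::uintptr_t{0xFFF};

struct PresenceTracker {
    std::set<std::uintptr_t> tracked;           // page bases we monitor
    std::set<std::uintptr_t> nonPresent;        // pages currently marked non-present
    std::optional<std::uintptr_t> pendingPage;  // page to re-protect after the single step
    std::vector<std::pair<std::uintptr_t, std::uintptr_t>> log;  // (instruction, address)
    bool trapFlag = false;

    void track(std::uintptr_t addr) {
        tracked.insert(addr & kPageMask);
        nonPresent.insert(addr & kPageMask);
    }

    // Returns false when the fault is not ours and should be passed on.
    bool onPageFault(std::uintptr_t faultEa, std::uintptr_t accessedAddr) {
        std::uintptr_t page = accessedAddr & kPageMask;
        if (!tracked.count(page) || !nonPresent.count(page)) return false;
        log.emplace_back(faultEa, accessedAddr);  // log + save(faultEa)
        nonPresent.erase(page);                   // makePresent(page)
        pendingPage = page;
        trapFlag = true;                          // setTrapFlag()
        return true;                              // resumeExecution()
    }

    bool onSingleStep() {
        if (!trapFlag || !pendingPage) return false;  // not ours: pass
        nonPresent.insert(*pendingPage);              // makeNonPresent(page)
        pendingPage.reset();
        trapFlag = false;
        return true;                                  // resumeExecution()
    }
};
```

Each tracked access therefore costs two exceptions: one page fault and one single step.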

Memory Tracing via Presence
Multi-Threaded Architecture

[Figure: program threads #1, #2, and #3 each catch faults and log data into their own ring buffer; a controller thread polls the buffers, filters and logs the events, then sleeps]

A dedicated thread retrieves events from the program's thread-local data, filters duplicates, and logs the results to disk via buffered I/O.
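One common way to build such a per-thread buffer is a single-producer, single-consumer ring, sketched below with std::atomic. Whether the author's buffers work exactly this way is not stated in the slides; the capacity parameter and AccessEvent fields are assumptions.

```cpp
#include <array>
#include <atomic>
#include <cstddef>
#include <cstdint>
#include <optional>

struct AccessEvent { std::uintptr_t insn; std::uintptr_t addr; };

template <std::size_t N>
class SpscRing {
    static_assert(N > 0 && (N & (N - 1)) == 0, "N must be a power of two");
    std::array<AccessEvent, N> slots_{};
    std::atomic<std::size_t> head_{0};  // next slot to write (producer only)
    std::atomic<std::size_t> tail_{0};  // next slot to read (consumer only)
public:
    // Called by the faulting program thread.
    bool push(const AccessEvent &e) {
        std::size_t h = head_.load(std::memory_order_relaxed);
        if (h - tail_.load(std::memory_order_acquire) == N) return false;  // full
        slots_[h & (N - 1)] = e;
        head_.store(h + 1, std::memory_order_release);
        return true;
    }

    // Called by the controller thread while polling.
    std::optional<AccessEvent> pop() {
        std::size_t t = tail_.load(std::memory_order_relaxed);
        if (t == head_.load(std::memory_order_acquire)) return std::nullopt;  // empty
        AccessEvent e = slots_[t & (N - 1)];
        tail_.store(t + 1, std::memory_order_release);
        return e;
    }
};
```

Because each ring has exactly one writer and one reader, the exception handler never takes a lock.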
Memory Tracing via Presence
Optimizations

Optimization ideas implemented:

1. Emulate common instructions instead of single-stepping [1]
2. Use guard pages instead of PAGE_NOACCESS [2]
3. Force consumer thread away from producer thread cores

Architectural Revision        # Accesses per Minute
Single-Threaded               3M
Multi-Threaded                6.4M
Mini X64 Emulator V1          9M
Guard Pages                   11M
SetThreadAffinityMask()       12.1M
Mini X64 Emulator V2          13.2M

[1] Suggested by Yaron Dinkin
[2] Suggested by Jason Geffner + RECON attendee whose name I forget (sorry)
Dynamic Structure Reconstruction
How to Apply Page-Based Tracking

We’ve shown how to track memory, but not how to apply it. We
explore our two possibilities, and strategies for those cases:

1. Track every allocation, a la DBI.

2. Only track certain allocations.
Dynamic Structure Reconstruction
Tracking Every Allocation

When tracking all allocations:

void *allocHook(size_t size) {
    // existing hook code
    addBpt(mem, size);        // <-- new
    // existing hook code
}

void freeHook(void *mem) {
    // existing hook code
    delBpt(mem, size);        // <-- new
    // existing hook code
}

Simply add breakpoints upon malloc, and remove them upon free.

Target Specific Allocation Sites
Noisiness of Technique

When tracking only some allocations:

When tracking only some allocations:

[Figure: a heap containing ... Allocation #1, Allocation #2, Allocation #3 ..., with a page boundary marked; the allocation of interest is shown in red and the other allocations in blue]

- Since our technique works at the level of pages, we incur page faults for any allocation on the same page.
  - Only interested in the red region.
  - However, we take faults in blue regions on the same page.
  - More page faults = more overhead.
- Worse, we may monitor multiple pages per allocation.
- Best performance: only fault on interesting allocations.
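How many pages an allocation forces us to monitor depends only on its address and size. A small sketch, assuming the usual 4 KiB x86 page size (large pages would change the constant):

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

constexpr std::uintptr_t kPageSize = 0x1000;

// Number of pages touched by the byte range [addr, addr + size).
std::size_t pagesSpanned(std::uintptr_t addr, std::size_t size) {
    assert(size > 0);
    std::uintptr_t first = addr / kPageSize;
    std::uintptr_t last = (addr + size - 1) / kPageSize;
    return static_cast<std::size_t>(last - first + 1);
}
```

A 0x20-byte allocation at 0x1FF0 straddles a page boundary, so both pages must be marked non-present and every other allocation on either page generates noise.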
Target Specific Allocation Sites
Divert into Custom Allocator

When tracking only specific allocations, to improve performance:

void *mallocHook(size_t size) {
    void *mem;
    if(isInteresting(_RetAddr())) {      // <-- new
        mem = customAlloc(size);         // <-- new
        addBpt(mem, size);               // <-- new
    } else
        mem = originalMalloc(size);
    return mem;
}

void freeHook(void *mem) {
    if(isCustom(mem)) {                  // <-- new
        delBpt(mem, size);               // <-- new
        customFree(mem);                 // <-- new
    } else
        originalFree(mem);
}

Divert interesting allocations into a custom allocator.

Benefit: no page faults for uninteresting allocations!

Target Specific Allocation Sites
Divert into General-Purpose, Off-the-Shelf Allocator

One strategy for custom allocation: use an existing allocator.

HANDLE hHeap = HeapCreate(...);
//...
void *customAlloc(size_t size) {
    return HeapAlloc(hHeap, 0, size);
}

void customFree(void *mem) {
    HeapFree(hHeap, 0, mem);
}

Pros:
- Easy to implement
- Usually thread-safe
- Naturally handles different-sized allocations
- Tuned for performance

Cons:
- Page faults for in-band metadata
- Slower than some alternatives
Target Specific Allocation Sites
Divert into Customized Slab Allocator

Another allocation strategy: fixed-size slab allocator.

[Figure: a slab allocator holding fixed-size chunks #1-#8 ...; the free list links chunks #3, #5, and #7]

Pros:
- Fast allocation and range checks
- No in-band metadata

Cons:
- Fixed-size
- Must be applied judiciously
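A fixed-size slab allocator with an out-of-band free list might look like the sketch below. It backs the slab with a std::vector purely for portability; a real tracker would reserve page-aligned memory so that the slab's pages contain nothing but interesting allocations. The class and method names are illustrative.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

class SlabAllocator {
    std::vector<unsigned char> storage_;   // one contiguous slab
    std::vector<std::size_t> freeList_;    // free chunk indices, kept outside the slab
    std::size_t chunkSize_;
public:
    SlabAllocator(std::size_t chunkSize, std::size_t chunkCount)
        : storage_(chunkSize * chunkCount), chunkSize_(chunkSize) {
        assert(chunkSize > 0 && chunkCount > 0);
        for (std::size_t i = chunkCount; i-- > 0;) freeList_.push_back(i);
    }

    void *alloc() {
        if (freeList_.empty()) return nullptr;
        std::size_t idx = freeList_.back();
        freeList_.pop_back();
        return storage_.data() + idx * chunkSize_;
    }

    // Range check: two comparisons, no tree lookup.
    bool owns(const void *p) const {
        auto a = reinterpret_cast<std::uintptr_t>(p);
        auto b = reinterpret_cast<std::uintptr_t>(storage_.data());
        return a >= b && a < b + storage_.size();
    }

    void free(void *p) {
        assert(owns(p));
        auto off = static_cast<std::size_t>(static_cast<unsigned char *>(p) - storage_.data());
        freeList_.push_back(off / chunkSize_);
    }
};
```

Because the free list lives outside the slab, the allocator's own bookkeeping never touches the monitored pages, which is what "no in-band metadata" buys.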
Dynamic Structure Reconstruction
Summary: DBI vs. DLL Injection

      DBI                                DLL Injection
1.    Hook malloc/free                   Hook malloc/free
2.    Record allocation details          Divert chosen allocations into custom allocator
3.    Instrument memory references       Mark allocations non-present
4.    Keep accesses to allocations       Catch memory exceptions
5.    Log structure accesses             Log structure accesses
6.    Post-process data                  Post-process data
Dynamic Structure Reconstruction
Loading the Data in IDA

Right-click in the main GUI window.

- Unknown structure: for structure recovery purposes.
- Known structure: when the structure type is already known.
  - This case is still very useful, as we will see.
Dynamic Structure Reconstruction
List of Raw Structure Accesses

After loading the access data for some allocation site(s).

Dynamic Structure Reconstruction
Raw Structure Access Operations

Right-clicking provides two primary operations:

1. Apply structure offsets in the disassembly.

2. Change the types of variables in Hex-Rays.
Dynamic Structure Reconstruction
Applying Structure Offsets in the Disassembly

Applying structure offsets is extremely fast.

Dynamic Structure Reconstruction
Structure Cross-References

You can browse cross-references in the structures window.

1243 free structure references to that field alone!
Dynamic Structure Reconstruction
Locating Hex-Rays Variables

1706AC8A movzx eax, byte ptr [rbx+40h]

For a given discovered structure access, locate the pointer

dereference at that address in the Hex-Rays CTREE.

Does not always work. Hex-Rays may:

- Lose track of the address of the memory dereference;
- Not create a variable for the pointer dereference;
- Render the access via many patterns, some of which I miss.

Can be improved somewhat, but will never be perfect.

Dynamic Structure Reconstruction
Applying Hex-Rays Variable Types

Comparison before and after applying Hex-Rays types.

Dynamic Structure Reconstruction
unions

union U {
    int x;
    char *y;
    void *z;
};

- unions allow multiple interpretations of the same variable.
- U can hold either an int, x, OR a char *, y, OR a void *, z.
Dynamic Structure Reconstruction
Tagged unions

enum Uelt {
    Uint = 0,
    Ucptr = 1,
    Uvptr = 2
};

struct taggedU {
    Uelt t;
    U e;
};

- The tagged union pattern associates an enum with a union.
  - t is called the tag or the discriminant.
- This design pattern is especially common in programming language tools (compilers, interpreters, decompilers, etc.).
Dynamic Structure Reconstruction
Tag Checking

void print(taggedU *u) {
    if(u->t == Uint)  printf("%d\n", u->e.x);
    if(u->t == Ucptr) printf("%s\n", u->e.y);
    if(u->t == Uvptr) printf("%p\n", u->e.z);
}

- The code must check the tag to know the union's held type.
- Code using unions is littered with these checks.
Dynamic Structure Reconstruction
unions in Decompilation: Improper Selection

Proper union field selection is critical to readable decompilation.

Failure to do so leads to hideous code like this.
Dynamic Structure Reconstruction
unions in Decompilation: Proper Selection

Same code as the previous slide, with three union fields set properly.
unions are particularly tedious to apply manually, so let's automate.
Dynamic Structure Reconstruction
Upon Manually Discovering a union Somewhere . . .

         class mop_t {
+0x00        mopt_t t;             <-- tag
+0x01        char oprops;
+0x02        short valnum;
+0x04        int size;
+0x08        union { ... };        <-- union
+0x10    };

- Suppose we discover a tagged union within an allocated type.
  - Suppose we discover the tag location at +0x00.
  - Suppose we discover the union location at +0x08.
- Run the structure access discovery again, but this time:
  - Only log accesses to the union region.
  - Log the value of the enum at every union access.
Dynamic Structure Reconstruction
Determine the Number of Union Variants

RVA       enum    Type
150F03    0x2     READ
151F8A    0xA     READ
1520BC    0xC     READ
14E9AE    0x8     READ
15EB78    0x4     READ
1531A8    0x4     READ
146F01    0xE     READ

This data helps determine the number of union variants. Create a union with this many variants (of the observed sizes).
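Counting the variants actually exercised is a matter of collecting the distinct tag values. A sketch follows, with a hypothetical TagObservation record; note that variants which were never exercised at runtime will not appear in the result.

```cpp
#include <cstdint>
#include <map>
#include <vector>

struct TagObservation {
    std::uint32_t insnRva;  // instruction that accessed the union region
    std::uint32_t tag;      // value of the enum at the time of the access
};

// Tag value -> instructions observed accessing the union under that tag.
std::map<std::uint32_t, std::vector<std::uint32_t>>
accessesByTag(const std::vector<TagObservation> &log) {
    std::map<std::uint32_t, std::vector<std::uint32_t>> byTag;
    for (const TagObservation &o : log) byTag[o.tag].push_back(o.insnRva);
    return byTag;
}
```

For the seven rows above, this gives six distinct tag values, with two instructions observed under tag 0x4.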
Dynamic Structure Reconstruction
Mapping between enum Elements and union Fields

enum mop_t {
    mop_z   = 0,
    mop_r   = 1,
    mop_n   = 2,
    mop_str = 3,
    mop_d   = 4,
    mop_S   = 5,
    mop_v   = 6,
    mop_b   = 7,
    mop_f   = 8,
    mop_l   = 9,
    mop_a   = 10,
    mop_h   = 11,
    mop_c   = 12,
    mop_fn  = 13,
    mop_p   = 14,
    mop_sc  = 15
};

Union variant numbers are shown on the left:

      union {
 0        mreg_t r;
 1        mnumber_t *nnn;
 2        minsn_t *d;
 3        stkvar_ref_t *s;
 4        ea_t g;
 5        int b;
 6        mfuncinfo_t *f;
 7        lvar_ref_t *l;
 8        mop_addr_t *a;
 9        char *helper;
10        char *cstr;
11        mcases_t *c;
12        fnumber_t *fpc;
13        mop_pair_t *pair;
14        scif_t *scif;
      };

Through reverse engineering, manually establish a mapping

between the enum elements and union variant numbers.

Setting Hex-Rays Union Selections

17150D86   mov rbx, [rbx+28h]

1. First, set the type of the base variable, as before.
2. Next, locate the union reference.
3. Finally, apply the proper union element number, based on the tag value recorded at runtime.

Dynamic Resolution of Argument Types
Preprocessing
Run-Time Data Collection
Applying the Results

Overview: Big Picture

Via DLL injection, for every call to malloc, record the pointer.

loc_567:
    v4 = malloc(0x138);

    sub_123(a1,v9,0);
    sub_456(a4,v7);
    sub_234(v1+24,"a");
    sub_345(a3,a2,v17+16);

Discover every function receiving an allocated pointer as argument.

Record <function RVA, arg #, allocation RVA, size, pointer offset>.
E.g., record <0x345, #3, 0x567, 0x138, 16>.

Preprocessing #1: Locate Functions via x64 Exception Directory

RUNTIME_FUNCTION <rva sub_61EB80C0, rva algn_61EB80DB, rva stru_620C0390>

61EB80C0 sub_61EB80C0 proc near            <-- function begin
61EB80C0 var_18 = qword ptr -18h
61EB80C0     sub rsp, 38h
61EB80C4     mov [rsp+38h+var_18], -2
61EB80CD     mov rcx, [rcx+60h]
61EB80D1     call free
61EB80D6     add rsp, 38h
61EB80DA     retn
61EB80DA sub_61EB80C0 endp
61EB80DB algn_61EB80DB: align 20h          <-- function end

- The PE64 Exception Directory has RUNTIME_FUNCTION entries.
  - These give the beginning of every non-leaf function,
  - and its end (or the beginning of its first try block).
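
Reading those entries is straightforward because each x64 RUNTIME_FUNCTION is three consecutive 32-bit RVAs. A minimal sketch, assuming the raw bytes of the exception directory have already been located and read into memory; RuntimeFunction and parseExceptionDirectory are hypothetical names, and a little-endian host is assumed.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Mirrors the 12-byte x64 RUNTIME_FUNCTION layout: three 32-bit RVAs.
struct RuntimeFunction {
    std::uint32_t beginRVA;
    std::uint32_t endRVA;
    std::uint32_t unwindInfoRVA;
};

// Parse the raw bytes of the exception directory.
// Trailing bytes that do not form a whole entry are ignored.
std::vector<RuntimeFunction> parseExceptionDirectory(const std::uint8_t* data,
                                                     std::size_t size) {
    std::vector<RuntimeFunction> out;
    for (std::size_t off = 0; off + sizeof(RuntimeFunction) <= size;
         off += sizeof(RuntimeFunction)) {
        RuntimeFunction rf;
        std::memcpy(&rf, data + off, sizeof(rf));  // little-endian host assumed
        if (rf.beginRVA == 0 && rf.endRVA == 0) {
            continue;  // skip all-zero padding entries
        }
        out.push_back(rf);
    }
    return out;
}
```

With an image base of 61EB0000, the entry shown above would decode to begin 0x80C0, end 0x80DB, and unwind info 0x210390.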

Preprocessing #2: Filter Unusable Functions

61EB80C0   sub rsp, 38h
61EB80C4   mov [rsp+38h+var_18], -2
61EB80CD   mov rcx, [rcx+60h]

Iterate through the function's instructions. FAIL if instruction:

- Cannot be decoded
- Has control-flow
- Is not easily relocatable
- Has incoming cross-references
- Is after function end / beginning of next try block (per X64 exception metadata)

Succeed after sizeof(call) bytes.
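
The filter can be sketched as a single pass over decoded instructions. This is a minimal sketch under stated assumptions: InsnInfo stands in for whatever a disassembler reports, prologBytes is a hypothetical name, and treating the entry point as exempt from the cross-reference check is my reading, since the entry necessarily has callers.

```cpp
#include <cstddef>
#include <optional>
#include <vector>

// Hypothetical per-instruction facts, as produced by some disassembler.
struct InsnInfo {
    std::size_t length;    // 0 means "could not be decoded"
    bool hasControlFlow;   // jmp, call, ret, conditional branch, ...
    bool relocatable;      // e.g. no RIP-relative operand
    bool hasIncomingXref;  // something branches to this instruction
};

constexpr std::size_t kCallSize = 5;  // E8 + rel32

// Returns the number of prolog bytes to relocate, or nullopt if unusable.
// `functionSize` is end minus begin from the exception directory entry.
std::optional<std::size_t> prologBytes(const std::vector<InsnInfo>& insns,
                                       std::size_t functionSize) {
    std::size_t copied = 0;
    for (std::size_t i = 0; i < insns.size(); ++i) {
        const InsnInfo& in = insns[i];
        if (in.length == 0) return std::nullopt;
        if (in.hasControlFlow || !in.relocatable) return std::nullopt;
        // ASSUMPTION: only instructions after the entry point are rejected
        // for incoming references.
        if (i > 0 && in.hasIncomingXref) return std::nullopt;
        if (copied + in.length > functionSize) return std::nullopt;
        copied += in.length;
        if (copied >= kCallSize) return copied;  // enough room for the call
    }
    return std::nullopt;  // ran out of instructions before reaching 5 bytes
}
```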

Preprocessing #3: Force __fastcall Calling Convention

For each kept function, get the prototype from Hex-Rays.

Before:
signed __int64 __usercall@<rax> sub_17078400(
    __int64 a1@<rdx>,
    __int64 a2@<rcx>,
    signed int a3@<r15d>)

After:
signed __int64 __fastcall sub_17078400(
    __int64 a2,
    __int64 a1)

Remove non-__fastcall-compliant arguments; reorder remaining.
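
The remove-and-reorder step can be sketched independently of the decompiler API. This is a minimal sketch, assuming argument locations are available as register names; UserArg and toFastcall are hypothetical names. It only handles the four integer argument registers of the Microsoft x64 convention, ignores 32-bit sub-register names and stack arguments, and stops at the first gap, which is a design choice of this example.

```cpp
#include <array>
#include <string>
#include <vector>

// Hypothetical: one argument of a __usercall prototype.
struct UserArg {
    std::string name;
    std::string location;  // register name as printed by the decompiler
};

// Keep only arguments living in the Microsoft x64 integer argument registers
// and return them in positional order (rcx, rdx, r8, r9).
std::vector<UserArg> toFastcall(const std::vector<UserArg>& args) {
    static const std::array<std::string, 4> kOrder = {"rcx", "rdx", "r8", "r9"};
    std::vector<UserArg> out;
    for (const std::string& reg : kOrder) {
        bool found = false;
        for (const UserArg& a : args) {
            if (a.location == reg) {
                out.push_back(a);
                found = true;
                break;
            }
        }
        // A missing register would shift every later position, so this
        // sketch stops at the first gap instead of guessing.
        if (!found) break;
    }
    return out;
}
```

For the prototype above, the input (a1 in rdx, a2 in rcx, a3 in r15d) comes back as (a2, a1).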

Preprocessing #4: Record Positions of Pointer-Sized Arguments

void __fastcall sub_61EB8AD0(
    void *rcx0,            <-- #0
    unsigned int a2,
    void *a3)              <-- #2

For each pointer-sized argument, record positions (#0, #2).
(Standardized X64 __fastcall makes this easier than on X86.)
Discard functions with no pointer-sized arguments.

Determining argument sizes isn't perfect; Hex-Rays sometimes makes mistakes.

Preprocessing Summary

Record layout: Function RVA | # Prolog Bytes | # Args | # arg0 | # arg1 | ···

For each suitable function with pointer-size arguments, record:

- Function's location
- Number of prolog bytes
- Number, positions of pointer-sized arguments

Function RVA   # Prolog Bytes   # Tracked Args   Arg positions
0xe760         5                4                0 1 3 4
0x114e50       8                3                0 1 2
0xf6f60        5                3                0 1 3
0x47c20        6                2                0 1
0x10c4d0       5                2                0 1

Run-Time Data Collection

Step #1: Hook Memory Management Functions

Hook allocators via DLL injection.

.idata:61EB19C8 extrn __imp_free:dword      HOOK -> &freeHook
.idata:61EB19CC extrn __imp_malloc:dword    HOOK -> &mallocHook

As before, keep an allocation record for each block from malloc until free.

Allocation Record
Allocated pointer | Size | RVA of return address from malloc
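
The bookkeeping behind the two hooks can be sketched as follows. This is a minimal sketch, not the injected DLL's code: AllocTracker is a hypothetical name, std::malloc and std::free stand in for the original functions reached through the saved import pointers, and the call-site RVA is passed in explicitly rather than derived from the return address.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <map>
#include <mutex>

struct AllocRecord {
    std::size_t size;
    std::uint64_t siteRVA;  // RVA of the return address from malloc
};

// Live allocations keyed by base pointer. A real hook must also make sure the
// map's own allocations do not recurse into the hooked malloc; that concern
// is not handled in this sketch.
class AllocTracker {
public:
    void* mallocHook(std::size_t size, std::uint64_t siteRVA) {
        void* p = std::malloc(size);  // stands in for the original malloc
        if (p) {
            std::lock_guard<std::mutex> guard(lock_);
            live_[reinterpret_cast<std::uintptr_t>(p)] = {size, siteRVA};
        }
        return p;
    }

    void freeHook(void* p) {
        {
            std::lock_guard<std::mutex> guard(lock_);
            live_.erase(reinterpret_cast<std::uintptr_t>(p));
        }
        std::free(p);  // stands in for the original free
    }

    std::size_t liveCount() const {
        std::lock_guard<std::mutex> guard(lock_);
        return live_.size();
    }

private:
    mutable std::mutex lock_;
    std::map<std::uintptr_t, AllocRecord> live_;
};
```

An ordered map is used so that interior pointers can later be resolved to their containing allocation.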

Step #2: Hook Every Suitable Function

For each function to hook . . .

Original function prolog                        Re-entry thunk (COPY)
.text:61F6ABD0 mov r8, rcx                      .text:12345600 mov r8, rcx
.text:61F6ABD3 push rbx                         .text:12345603 push rbx
.text:61F6ABD4 sub rsp, 80h                     .text:12345604 sub rsp, 80h
.text:61F6ABDB mov [r11-68h], -2                .text:1234560B jmp 61F6ABDB

.text:6205CC10 push rdi                         .text:12345620 push rdi
.text:6205CC12 sub rsp, 40h                     .text:12345622 sub rsp, 40h
.text:6205CC16 mov [rsp+48h+var_28], -2         .text:12345626 jmp 6205CC16

.text:6205F91C sub rsp, 18h                     .text:12345640 sub rsp, 18h
.text:6205F920 mov r8, rcx                      .text:12345644 mov r8, rcx
.text:6205F923 mov eax, 5A4Dh                   .text:12345647 jmp 6205F923

.text:61F79570 mov rax, rsp                     .text:12345660 mov rax, rsp
.text:61F79573 push r14                         .text:12345663 push r14
.text:61F79575 sub rsp, 60h                     .text:12345665 jmp 61F79575

.text:61F00AA0 sub rsp, 38h                     .text:12345680 sub rsp, 38h
.text:61F00AA4 mov [rsp+38h+var_18], -2         .text:12345684 mov [rsp+38h+var_18], -2
.text:61F00AAD mov rcx, [rcx]                   .text:1234568D jmp 61F00AAD

Allocate memory for re-entry thunks. Copy the leading instructions, and insert a jump to after the copied instructions.
Record { Original RVA, Thunk VA } in a hash table.
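
Building one thunk's bytes can be sketched as below. This is a minimal sketch, assuming the prolog length has already been chosen by the filter step and that the thunk lies within +/-2GB of the original function so a rel32 jump can reach it; buildThunk is a hypothetical name, and a little-endian host is assumed.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Build the bytes of a re-entry thunk that will live at `thunkVA`:
// the copied prolog followed by `jmp rel32` to the first uncopied
// instruction of the original function.
std::vector<std::uint8_t> buildThunk(const std::uint8_t* prolog,
                                     std::size_t prologLen,
                                     std::uint64_t originalVA,
                                     std::uint64_t thunkVA) {
    std::vector<std::uint8_t> thunk(prolog, prolog + prologLen);
    const std::uint64_t resumeVA = originalVA + prologLen;
    const std::uint64_t jmpEndVA = thunkVA + prologLen + 5;  // end of the jmp
    // rel32 is relative to the address of the instruction after the jmp.
    const std::int32_t rel = static_cast<std::int32_t>(
        static_cast<std::int64_t>(resumeVA - jmpEndVA));
    thunk.push_back(0xE9);  // jmp rel32 opcode
    std::uint8_t disp[4];
    std::memcpy(disp, &rel, sizeof(disp));  // little-endian host assumed
    thunk.insert(thunk.end(), disp, disp + 4);
    return thunk;
}
```

For the first function above (11 prolog bytes copied from 61F6ABD0 to 12345600), the jump lands at 1234560B and targets 61F6ABDB.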

.text:61F6ABD0 call commonLog
.text:61F6ABDB mov [r11-68h], -2

.text:6205CC10 call commonLog
.text:6205CC16 mov [rsp+48h+var_28], -2

.text:6205F91C call commonLog
.text:6205F923 mov eax, 5A4Dh

.text:61F79570 call commonLog
.text:61F79575 sub rsp, 60h

.text:61F00AA0 call commonLog
.text:61F00AAD mov rcx, [rcx]

Divert every function into a common logging routine.

Redirect Functions into a Common Logging Stub

commonLog:
    ; Save flags
    ; Save registers
    mov rcx, rsp            ; Point arg #0 to stack data
    call commonLogC
    mov [rsp+78h], rax      ; Return to re-entry thunk
    ; Restore registers
    ; Restore flags
    retn

- commonLog just invokes its C counterpart.
- Overwrite the return address (at [rsp+78h]) with the function's re-entry thunk.

Argument Logging Details

uint64_t commonLogC(uint64_t *args) {
    funcRVA = _ReturnAddress();
    reEntry, argList = lookup(funcRVA);       // #1
    for(argNo : argList)                      // #2
        log(funcRVA,argNo,args[argNo]);       // #2
    return reEntry;                           // #3
}

1. Fetch re-entry address, list of interesting arguments.
2. Log each interesting argument.
   - This happens in another thread for efficiency.
3. Return to re-entry thunk.
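
The three steps can be written as compilable C++. This is a minimal sketch under assumptions: HookTable, HookInfo, and LoggedArg are hypothetical names, the return address is passed in explicitly, the saved arguments are assumed to sit in positional order, and a plain vector stands in for the queue handed to the logging thread. It also assumes the function RVA is recovered by subtracting the 5-byte call from the return address.

```cpp
#include <cstdint>
#include <unordered_map>
#include <vector>

struct HookInfo {
    std::uint64_t thunkVA;          // where execution resumes after logging
    std::vector<int> argPositions;  // pointer-sized arguments worth logging
};

struct LoggedArg {
    std::uint64_t funcRVA;
    int argNo;
    std::uint64_t value;
};

struct HookTable {
    std::uint64_t imageBase = 0;
    std::unordered_map<std::uint64_t, HookInfo> byRVA;  // key: function RVA
};

constexpr std::uint64_t kCallSize = 5;  // size of the patched-in call rel32

// `returnAddress` is the value pushed by the patched `call commonLog`, i.e.
// the hooked function's start plus 5. `args` holds the saved argument values
// in positional order. Returns the re-entry thunk, or 0 for unknown callers.
std::uint64_t commonLogC(const HookTable& table, std::uint64_t returnAddress,
                         const std::uint64_t* args,
                         std::vector<LoggedArg>& queue) {
    const std::uint64_t funcRVA = returnAddress - kCallSize - table.imageBase;
    const auto it = table.byRVA.find(funcRVA);
    if (it == table.byRVA.end()) return 0;
    for (int argNo : it->second.argPositions) {
        // Queue only; a separate consumer would do the allocation-map lookup.
        queue.push_back({funcRVA, argNo, args[argNo]});
    }
    return it->second.thunkVA;
}
```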

Filter, Log Allocation Flow to Arguments

void log(uint64_t funcRVA, int argNo, uint64_t arg) {
    allocRec = allocMapLookup(arg);           // #1
    if(!allocRec) return;                     // #2
    write(funcRVA,argNo,allocRec);            // #3
}

1. Look up function argument in allocation map.
2. Return if not part of an allocation.
3. Write log entry otherwise.

Logged Data
Function RVA | # Arg | Alloc RVA | Alloc size | Offset into alloc
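
The lookup has to handle interior pointers, since arguments often point into the middle of an allocation. A minimal sketch, assuming the allocation map is an ordered map keyed by base pointer; AllocMap, FlowEntry, and logArg are hypothetical names, and the function returns the log entry instead of writing it.

```cpp
#include <cstddef>
#include <cstdint>
#include <map>
#include <optional>

struct AllocRecord {
    std::size_t size;
    std::uint64_t siteRVA;  // allocation site
};

// One row of the logged data described above.
struct FlowEntry {
    std::uint64_t funcRVA;
    int argNo;
    std::uint64_t allocRVA;
    std::size_t allocSize;
    std::uint64_t offset;  // offset of the argument into the allocation
};

using AllocMap = std::map<std::uint64_t, AllocRecord>;  // key: base pointer

// Returns a log entry if `arg` points into a live allocation.
std::optional<FlowEntry> logArg(const AllocMap& allocs, std::uint64_t funcRVA,
                                int argNo, std::uint64_t arg) {
    auto it = allocs.upper_bound(arg);  // first allocation starting after arg
    if (it == allocs.begin()) return std::nullopt;
    --it;  // last allocation starting at or before arg
    const std::uint64_t offset = arg - it->first;
    if (offset >= it->second.size) return std::nullopt;  // past the end
    return FlowEntry{funcRVA, argNo, it->second.siteRVA, it->second.size,
                     offset};
}
```

A pointer one past the end of an allocation is treated as not belonging to it.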

Summary: Logged Data

Function RVA   # Arg   Alloc RVA   Alloc size   Offset into alloc
0x15f520       1       0xbe648     0x50         0x20
0x153ce0       0       0xbe648     0x50         0x0
0x147520       0       0x143fd8    0x50         0x40
0x15f8d0       1       0x143fd8    0x50         0x40
0x11530        0       0x56c89     0x630        0x0
0x55a80        0       0x56149     0x120        0x0
0x57bd0        0       0x56149     0x120        0x18

This generates a lot of data (~60K entries for my target).

Applying the Results

Loading the Data in IDA

- First, create an allocator/free pair.
  - Can add multiple allocators if appropriate.
- Next, load the data for that allocator.

Displaying Allocation Sites and Types

- Double-click an allocator to see the list of data:
  1. The address of all observed allocation sites
  2. The size allocated by that site
     - If multiple sizes, show the GCD
  3. The user-supplied type for that site, if any
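
Collapsing several observed sizes into one displayed value is a fold with the greatest common divisor. A minimal sketch; displaySize is a hypothetical name. One plausible reading of why the GCD is useful is that a site allocating arrays of varying length would show the element size, but the slides do not state this.

```cpp
#include <cstdint>
#include <numeric>
#include <vector>

// Returns the GCD of all sizes observed at one allocation site.
// An empty list yields 0; a single size yields that size unchanged.
std::uint64_t displaySize(const std::vector<std::uint64_t>& sizes) {
    std::uint64_t g = 0;  // gcd(0, x) == x, so 0 is the neutral start value
    for (std::uint64_t s : sizes) {
        g = std::gcd(g, s);
    }
    return g;
}
```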

Displaying Allocation Site Flow Data

- Double-click an allocation site to see the list of:
  1. Functions and arguments into which the allocations flowed
  2. Observed offsets from the base of the allocation

Displaying Allocation Flow in Hex-Rays

Hex-Rays listings will automatically display allocation flow data.

Setting Allocation Site Types

Once known, the user manually sets the allocation site type.
(Or, in IDA: Edit->Operand Type->Set Operand Type)

After refresh, the allocation sites window shows the type.

Applying Argument Types

For known allocation site types, the user can apply argument types.
Multiple allocation sites can be selected at once.

Applied Argument Types

Way better than doing it by hand, isn’t it?

Related Types

For a given allocation site, for each offset passed to a function argument, display the types of other structure fields passed to the same argument.

Further Extensions and Challenges
Extensions
Challenges
Further Extensions
Combination with Static Analysis

Access data only covers observed behaviors.
E.g., will not discover the accesses marked below.

if(v1 != 0) {              // ALWAYS OBSERVED TAKEN
    x->f0 = 1234;
    x->fC = 0;
}
else {                     // NEVER OBSERVED TAKEN
    x->f4 = 456;           // <-- not discovered
    x->f8 = 789;           // <-- not discovered
}

Can use Hex-Rays struct analysis to discover other accesses.

Maximum Size of Pointed-To Objects

Want to know: how big is the thing an argument points to?

Example data reaching a function argument:

Offset   Size   Max
0x00     0x30   0x30
0x40     0x50   0x10
0x2C     0x30   0x04

- Maximum size is the distance from the offset to the end.
- Take the minimum across all data points.
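
The two bullets translate directly into a minimum over (size - offset). A minimal sketch; Flow and maxPointeeSize are hypothetical names, and rows whose offset is not inside the allocation are skipped as malformed.

```cpp
#include <algorithm>
#include <cstdint>
#include <optional>
#include <vector>

// One data point reaching a given function argument.
struct Flow {
    std::uint64_t offset;     // offset of the pointer into its allocation
    std::uint64_t allocSize;  // size of that allocation
};

// Upper bound on the size of the object the argument points to:
// the smallest distance from offset to end of allocation over all rows.
std::optional<std::uint64_t> maxPointeeSize(const std::vector<Flow>& flows) {
    std::optional<std::uint64_t> best;
    for (const Flow& f : flows) {
        if (f.offset >= f.allocSize) continue;  // malformed data point
        const std::uint64_t room = f.allocSize - f.offset;
        best = best ? std::min(*best, room) : room;
    }
    return best;
}
```

For the three rows in the table, the candidates are 0x30, 0x10, and 0x04, so the bound is 0x04.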

Inheritance Discovery by Access Location

Derived classes must construct base classes first:

0042CFF0 GraceWireGeneric_Constructor proc near
···
0042D020 push 4
0042D025 call GraceObject_Constructor

00424096 mov [edi+8], eax
00424099 lea eax, [edi+0Ch]
0042409C push eax
0042409D mov [ebp+a5], eax
004240A0 mov dword ptr [edi], offset vtbl

- Hence, every class in a single-inheritance hierarchy should have the same address for its first access.
- CAVEAT: inlined constructors will break this.
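
Candidate hierarchies can be proposed by grouping allocation sites on the instruction that first touches their objects. A minimal sketch, assuming the access log has already been reduced to one first-access row per allocation site; FirstAccess and groupByFirstAccess are hypothetical names.

```cpp
#include <cstdint>
#include <map>
#include <vector>

// Hypothetical reduced log row: one per allocation site.
struct FirstAccess {
    std::uint64_t allocSiteRVA;  // where the object was allocated
    std::uint64_t firstInstRVA;  // instruction performing its first access
};

// Group allocation sites by the instruction that first accesses them.
// Sites sharing a key are candidates for the same single-inheritance
// hierarchy (subject to the inlining caveat).
std::map<std::uint64_t, std::vector<std::uint64_t>>
groupByFirstAccess(const std::vector<FirstAccess>& data) {
    std::map<std::uint64_t, std::vector<std::uint64_t>> groups;
    for (const FirstAccess& fa : data) {
        groups[fa.firstInstRVA].push_back(fa.allocSiteRVA);
    }
    return groups;
}
```

Groups containing more than one allocation site are the interesting ones; singleton groups say nothing about inheritance.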

Challenges
Code Coverage

As with any dynamic analysis, results are limited to covered code.

if(v1 != 0) {              // ALWAYS OBSERVED TAKEN
    x->f0 = 1234;
    x->fC = 0;
} else {                   // NEVER OBSERVED TAKEN
    neverExecutedFunc1(x); // <-- never observed
    neverExecutedFunc2(x); // <-- never observed
}

I offer no real contributions here, other than that the performance optimizations described earlier can increase the number of observations per unit of time.

Type-Related Ambiguity

Suppose multiple allocation sites/sizes flow to an argument.

class x {
    int a;
    int b;
};

class y : public x {
    int c;
    int d;
};

Layout:
+0    int a;
+4    int b;
+8    int c;
+12   int d;

What “type” should we assign the argument?

Need inheritance/composition relationships to fully resolve.
The data is still useful without knowing that, though.

Type- and Size-Related Ambiguity

Suppose multiple allocation sites/sizes flow to an argument.

class x {
    int a;
    int b;
};

class y : public x {
    char *c;
};

class z : public x {
    void *d;
};

For derived objects, same size does not imply same type.

Nested Structures

struct A {
    int a;
    struct B {
        struct C {
            struct D {
                int d;
            } D;
            int c;
        } C;
        int b;
    } B;
};

Layout of A: int a | int d | int c | int b
(D spans int d; C spans int d and int c; B spans int d, int c, and int b.)

An example of nested structures.

Access to Nested Structure Fields

Layout of A: int a | int d | int c | int b   (x points to int d)

Type     Expression
int *    *x
D *      x->d
C *      x->D.d
B *      x->C.D.d

- Suppose x points to int d within a struct A.
- What is the C-level type of x and the accessing expression?
  - The four possibilities are shown in the table.
- Even if we had the structure types and nesting relationships from the source code, how would we know the type of x?

Conclusion

- None of these techniques are particularly sophisticated.
- However, they are easy to use and produce very useful results.
- Despite challenges and open problems, the results are useful.
- Automation was a better use of my RE time than reading code.
- I'll probably release the code in July.
  - Check Twitter, Reverse Engineering reddit, etc.

Any Questions?
