x86 Disassembly
0.1 Preface . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
0.1.1 What is Wikibooks? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
0.1.2 What is this book? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
0.1.3 Who are the authors? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
0.1.4 Wikibooks in Class . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
0.1.5 Happy Reading! . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
0.2 Cover . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
0.3 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
0.3.1 What Is This Book About? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
0.3.2 What Will This Book Cover? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
0.3.3 Who Is This Book For? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
0.3.4 What Are The Prerequisites? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
0.3.5 What is Disassembly? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
1 Tools 4
1.1 Assemblers and Compilers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
1.1.1 Assemblers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
1.1.2 Assembler Concepts . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
1.1.3 Intel Syntax Assemblers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
1.1.4 (x86) AT&T Syntax Assemblers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
1.1.5 Other Assemblers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
1.1.6 Compilers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
1.1.7 Common C/C++ Compilers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
1.2 Disassemblers and Decompilers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
1.2.1 What is a Disassembler? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
1.2.2 x86 Disassemblers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
1.2.3 Disassembler Issues . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
1.2.4 Decompilers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
1.2.5 A General view of Disassembling . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
1.2.6 Further reading . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
1.3 Disassembly Examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
1.3.1 Example: Hello World Listing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
1.3.2 Example: Basic Disassembly . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
ii CONTENTS
2 Platforms 19
2.1 Microsoft Windows . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
2.1.1 Microsoft Windows . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
2.1.2 Windows Versions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
2.1.3 Virtual Memory . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
2.1.4 System Architecture . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
2.1.5 System calls and interrupts . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
2.1.6 Win32 API . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
2.1.7 Native API . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
2.1.8 ntoskrnl.exe . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21
2.1.9 Win32K.sys . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21
2.1.10 Win64 API . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21
2.1.11 Windows Vista . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21
2.1.12 Windows CE/Mobile, and other versions . . . . . . . . . . . . . . . . . . . . . . . . . . . 21
2.1.13 “Non-Executable Memory” . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21
2.1.14 COM and Related Technologies . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21
2.1.15 Remote Procedure Calls (RPC) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22
2.2 Windows Executable Files . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22
2.2.1 MS-DOS COM Files . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22
2.2.2 MS-DOS EXE Files . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22
2.2.3 PE Files . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23
2.2.4 Relative Virtual Addressing (RVA) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23
2.2.5 File Format . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23
2.2.6 Code Sections . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
2.2.7 Imports and Exports - Linking to other modules . . . . . . . . . . . . . . . . . . . . . . . 27
2.2.8 Exports . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27
2.2.9 Imports . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28
2.2.10 Resources . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29
2.2.11 Relocations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30
2.2.12 Alternate Bound Import Structure . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30
2.2.13 Windows DLL Files . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30
2.3 Linux . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
2.3.1 Linux . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
2.3.2 System Architecture . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
3 Code Patterns 33
3.1 The Stack . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33
3.1.1 The Stack . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33
3.1.2 Push and Pop . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33
3.1.3 ESP In Action . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33
3.1.4 Reading Without Popping . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34
3.1.5 Data Allocation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34
3.2 Functions and Stack Frames . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34
3.2.1 Functions and Stack Frames . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34
3.2.2 Standard Entry Sequence . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34
3.2.3 Standard Exit Sequence . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35
3.2.4 Non-Standard Stack Frames . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35
3.2.5 Local Static Variables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36
3.3 Functions and Stack Frame Examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36
3.3.1 Example: Number of Parameters . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36
3.3.2 Example: Standard Entry Sequences . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36
3.4 Calling Conventions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37
3.4.1 Calling Conventions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37
3.4.2 Notes on Terminology . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37
3.4.3 Standard C Calling Conventions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38
3.4.4 C++ Calling Convention . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39
3.4.5 Note on Name Decorations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39
3.4.6 Further reading . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 40
3.5 Calling Convention Examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 40
3.5.1 Microsoft C Compiler . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 40
3.5.2 GNU C Compiler . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41
3.5.3 Example: C Calling Conventions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43
3.5.4 Example: Named Assembly Function . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43
3.5.5 Example: Unnamed Assembly Function . . . . . . . . . . . . . . . . . . . . . . . . . . . 43
3.5.6 Example: Another Unnamed Assembly Function . . . . . . . . . . . . . . . . . . . . . . 43
3.5.7 Example: Name Mangling . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43
3.6 Branches . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43
3.6.1 Branching . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43
3.6.2 If-Then . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43
3.6.3 If-Then-Else . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44
3.6.4 Switch-Case . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44
3.6.5 Ternary Operator ?: . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46
3.7 Branch Examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46
3.7.1 Example: Number of Parameters . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46
3.7.2 Example: Identify Branch Structures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47
3.7.3 Example: Convert To C . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47
3.8 Loops . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47
3.8.1 Loops . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47
3.8.2 Do-While Loops . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47
3.8.3 While Loops . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 48
3.8.4 For Loops . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 48
3.8.5 Other Loop Types . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 48
3.9 Loop Examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 49
3.9.1 Example: Identify Purpose . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 49
3.9.2 Example: Complete C Prototype . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 49
3.9.3 Example: Decompile To C Code . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 49
4 Data Patterns 51
4.1 Variables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51
4.1.1 Variables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51
4.1.2 How to Spot a Variable . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51
4.1.3 .BSS and .DATA sections . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51
4.1.4 “Static” Local Variables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51
4.1.5 Signed and Unsigned Variables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 52
4.1.6 Floating-Point Values . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 52
4.1.7 Global Variables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 52
4.1.8 Constants . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 53
4.1.9 “Volatile” memory . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 53
4.1.10 Simple Accessor Methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 53
4.1.11 Simple Setter (Manipulator) Methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . 54
4.2 Variable Examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 54
4.2.1 Example: Identify C++ Code . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 54
4.2.2 Example: Identify C++ Code . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 54
4.3 Data Structures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 55
4.3.1 Data Structures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 55
4.3.2 Arrays . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 55
4.3.3 Structures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 56
4.3.4 Advanced Structures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 56
4.3.5 Identifying Structs and Arrays . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 56
5 Difficulties 62
5.1 Code Optimization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62
5.1.1 Code Optimization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62
5.1.2 Stages of Optimizations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62
5.1.3 Loop Unwinding . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 63
5.1.4 Inline Functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 63
5.2 Optimization Examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 63
5.2.1 Example: Optimized vs Non-Optimized Code . . . . . . . . . . . . . . . . . . . . . . . . 63
5.2.2 Example: Manual Optimization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 64
5.2.3 Example: Trace Variables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 64
5.2.4 Example: Decompile Optimized Code . . . . . . . . . . . . . . . . . . . . . . . . . . . . 65
5.2.5 Example: Instruction Pairings . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 65
5.2.6 Example: Avoiding Branches . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 65
5.2.7 Example: Duff’s Device . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 65
5.3 Code Obfuscation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 65
5.3.1 Code Obfuscation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 65
5.3.2 What is Code Obfuscation? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 66
5.3.3 Interleaving . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 66
5.3.4 Non-Intuitive Instructions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 66
5.3.5 Obfuscators . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 67
5.3.6 Code Transformations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 68
5.3.7 Opaque Predicates . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 68
5.3.8 Code Encryption . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 68
5.4 Debugger Detectors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 68
5.4.1 Detecting Debuggers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 68
5.4.2 IsDebuggerPresent API . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 68
5.4.3 PEB Debugger Check . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 69
5.4.4 Kernel Mode Debugger Check . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 69
5.4.5 Timeouts . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 69
0.1 Preface

This book was created by volunteers at Wikibooks (http://en.wikibooks.org).

0.1.1 What is Wikibooks?

Started in 2003 as an offshoot of the popular Wikipedia project, Wikibooks is a free, collaborative wiki website dedicated to creating high-quality textbooks and other educational books for students around the world. In addition to English, Wikibooks is available in over 130 languages, a complete listing of which can be found at http://www.wikibooks.org. Wikibooks is a “wiki”, which means anybody can edit the content there at any time. If you find an error or omission in this book, you can log on to Wikibooks to make corrections and additions as necessary. All of your changes go live on the website immediately, so your effort can be enjoyed and utilized by other readers and editors without delay.

Books at Wikibooks are written by volunteers, and can be accessed and printed for free from the website. Wikibooks is operated entirely by donations, and a certain portion of proceeds from sales is returned to the Wikimedia Foundation to help keep Wikibooks running smoothly. Because of the low overhead, we are able to produce and sell books for much less than proprietary textbook publishers can. This book can be edited by anybody at any time, including you. We don't make you wait two years to get a new edition, and we don't stop selling old versions when a new one comes out.

Note that Wikibooks is not a publisher of books, and is not responsible for the contributions of its volunteer editors. PediaPress.com is a print-on-demand publisher that is also not responsible for the content that it prints. Please see our disclaimer for more information: http://en.wikibooks.org/wiki/Wikibooks:General_disclaimer.

0.1.2 What is this book?

This book was generated by the volunteers at Wikibooks, a team of people from around the world with varying backgrounds. The people who wrote this book may not be experts in the field. Some may not even have a passing familiarity with it. The result of this is that some information in this book may be incorrect, out of place, or misleading. For this reason, you should never rely on a community-edited Wikibook when dealing in matters of medical, legal, financial, or other importance. Please see our disclaimer for more details on this.

Despite the warning of the last paragraph, however, books at Wikibooks are continuously edited and improved. If errors are found they can be corrected immediately. If you find a problem in one of our books, we ask that you be bold in fixing it. You don't need anybody’s permission to help or to make our books better.

Wikibooks runs on the assumption that many eyes can find many errors, and many able hands can fix them. Over time, with enough community involvement, the books at Wikibooks will become very high-quality indeed. You are invited to participate at Wikibooks to help make our books better. As you find problems in your book, don't just complain about them: log on and fix them! This is a kind of proactive and interactive reading experience that you probably aren't familiar with yet, so log on to http://en.wikibooks.org and take a look around at all the possibilities. We promise that we won't bite!

0.1.3 Who are the authors?

The volunteers at Wikibooks come from around the world and have a wide range of educational and professional backgrounds. They come to Wikibooks for different reasons, and perform different tasks. Some Wikibookians are prolific authors, some are perceptive editors, some fancy illustrators, others diligent organizers. Some Wikibookians find and remove spam, vandalism, and other nonsense as it appears. Most Wikibookians perform a combination of these jobs.

It’s difficult to say who the authors of any particular book are, because so many hands have touched it and so many changes have been made over time. It’s not unheard of for a book to have been edited thousands of times by hundreds of authors and editors. You could be one of them too, if you're interested in helping out.

0.1.4 Wikibooks in Class

Books at Wikibooks are free, and with the proper editing and preparation they can be used as cost-effective textbooks in the classroom or for independent learners. In addition to using a Wikibook as a traditional read-only learning aid, it can also become an interactive class project. Several classes have come to Wikibooks to write new books and improve old books as part of their normal course work. In some cases, the books written by students one year are used to teach students in the same class next year. Books written can also be used in classes around the world by students who might not be able to afford traditional textbooks.

0.1.5 Happy Reading!

We at Wikibooks have put a lot of effort into these books, and we hope that you enjoy reading and learning from them. We want you to keep in mind that what you are holding is not a finished product but instead a work in progress. These books are never “finished” in the traditional sense, but they are ever-changing and evolving to meet the needs of readers and learners everywhere. Despite this constant change, we feel our books can be reliable and high-quality learning tools at a great price, and we hope you agree. Never hesitate to stop in at Wikibooks and make some edits of your own. We hope to see you there one day. Happy reading!

0.2 Cover

The Wikibook of
x86 Disassembly
Using C and Assembly Language

0.3 Introduction

0.3.1 What Is This Book About?

[...] x86 assembly code into human-readable C or C++ source code. Some topics covered will be common to all computer architectures, not just x86-compatible machines.

0.3.2 What Will This Book Cover?

This book is going to look in-depth at the disassembly and decompilation of x86 machine code and assembly code. We are going to look at the way programs are made using assemblers and compilers, and examine the way that assembly code is made from C or C++ source code. Using this knowledge, we will try to reverse the process. By examining common structures, such as data and control structures, we can find patterns that enable us to disassemble and decompile programs quickly.

0.3.3 Who Is This Book For?

This book is for readers at the undergraduate level with experience programming in x86 Assembly and C or C++. This book is not designed to teach assembly language programming, C or C++ programming, or compiler/assembler theory.

0.3.4 What Are The Prerequisites?

The reader should have a thorough understanding of x86 Assembly, C Programming, and possibly C++ Programming. This book is intended to increase the reader’s understanding of the relationship between x86 machine code, x86 Assembly Language, and the C Programming Language. If you are not too familiar with these topics, you may want to reread some of the above-mentioned books before continuing.
0.3.5 What is Disassembly?
Tools
1.1 Assemblers and Compilers

1.1.1 Assemblers

Assemblers are significantly simpler than compilers, and are often implemented to simply translate the assembly code to binary machine code via one-to-one correspondence. Assemblers rarely optimize beyond choosing the shortest form of an instruction or filling delay slots.

Because assembly is such a simple process, disassembly can often be just as simple. Assembly instructions and machine code words have a one-to-one correspondence, so each machine code word will exactly map to one assembly instruction. However, disassembly has some other difficulties which cannot be accounted for using simple code-word lookups. We will introduce assemblers here, and talk about disassembly later.

1.1.2 Assembler Concepts

[...] code, the instructions will be the same, but all the other helpful information will be lost. The code will be accurate, but more difficult to read.

Compilers, as we will see later, cause even more information to be lost, and decompiling is often so difficult and convoluted as to become nearly impossible to do accurately.

1.1.3 Intel Syntax Assemblers

Because of the pervasiveness of Intel-based IA-32 microprocessors in the home PC market, the majority of assembly work done (and the majority of assembly work considered in this wikibook) is x86-based. Many of these assemblers (or new versions of them) can handle amd64/x86_64/EM64T code as well, although this wikibook will focus primarily on 32-bit (x86/IA-32) code examples.
MASM
• MASM currently supports all Intel instruction sets, including SSE2.

Many users love MASM, but many more still dislike the fact that it isn't portable to other systems.

TASM

TASM, Borland’s “Turbo Assembler,” is a functional assembler from Borland that integrates seamlessly with Borland’s other software development tools. Current release version is version 5.0. TASM syntax is very similar to MASM, although it has an “IDEAL” mode that many users prefer. TASM is not free.

NASM

NASM, the “Netwide Assembler,” is a free, portable, and retargetable assembler that works on both Windows and Linux. It supports a variety of Windows and Linux executable file formats, and even outputs pure binary. NASM is not as “mature” as either MASM or TASM, but it:

• is more portable than MASM
• is cheaper than TASM
• strives to be very user-friendly

NASM comes with its own disassembler, ndisasm, and supports 64-bit (x86-64/x64/AMD64/Intel 64) CPUs. NASM is released under the 2-clause BSD license (versions before 2.07 used the LGPL).

FASM

FASM, the “Flat Assembler,” is an open source assembler that supports the x86 and x86-64 architectures.

1.1.4 (x86) AT&T Syntax Assemblers

AT&T syntax for x86 microprocessor assembly code is not as common as Intel syntax, but the GNU Assembler (GAS) uses it, and it is the de facto assembly standard on Unix and Unix-like operating systems.

GAS

The GNU Assembler (GAS) is the default back-end to the GNU Compiler Collection (GCC) suite. As such, GAS is as portable and retargetable as GCC is. However, GAS uses the AT&T syntax for its instructions by default, which some users find to be less readable than Intel syntax. Newer versions of GAS can be switched to Intel syntax with the directive ".intel_syntax noprefix".

GAS is developed specifically to be used as the GCC backend. Because GCC always feeds it syntactically correct code, GAS often has minimal error checking.

GAS is available as a part of either the GCC package or the GNU binutils package.

1.1.5 Other Assemblers

HLA

HLA, short for “High Level Assembler,” is a project spearheaded by Randall Hyde to create an assembler with high-level syntax. HLA works as a front-end to other assemblers such as FASM (the default), MASM, NASM, and GAS. HLA supports “common” assembly language instructions, but also implements a series of higher-level constructs such as loops, if-then-else branching, and functions. HLA comes complete with a comprehensive standard library.

Since HLA works as a front-end to another assembler, the programmer must have another assembler installed to assemble programs with HLA. HLA's output is therefore only as good as the underlying assembler, but the code is much easier for the developer to write. The high-level components of HLA may make programs less efficient, but that cost is often far outweighed by the ease of writing the code. HLA high-level syntax is very similar in many respects to Pascal, which in turn is itself similar in many respects to C, so many high-level programmers will immediately pick up many of the aspects of HLA.

Here is an example of some HLA code:

    mov(src, dest); // C++ style comments
    pop(eax);
    push(ebp);
    for(mov(0, ecx); ecx < 10; inc(ecx)) do
        mul(ecx);
    endfor;

Some disassemblers and debuggers can disassemble binary code into HLA format, although none can faithfully recreate the HLA macros.

1.1.6 Compilers

A compiler is a program that converts instructions from one language into equivalent instructions in another language. There is a common misconception that a compiler always directly converts a high level language into machine language, but this isn't always the case. Many compilers convert code into assembly language, and a few even convert code from one high level language into another. Common examples of compiled languages are: C/C++, Fortran, Ada, and Visual Basic. The figure below shows the common compile-time steps to building a program using the C programming language. The compiler produces object files which are linked to form the final executable.
C++, and has the option to compile C++ code into MSIL
(the .NET bytecode).
Microsoft’s compiler only supports Windows systems, and Intel-compatible 16/32/64-bit architectures.
The Microsoft C compiler is cl.exe and the linker is link.exe.
[...] debugging symbols so you can see the line numbers in the listing. The -fno-asynchronous-unwind-tables flag can help eliminate some macros in the listing.

This compiler is commonly used for embedded systems. If you try to reverse-engineer a piece of consumer electronics, you may encounter code generated by Green Hills C/C++.

1.2 Disassemblers and Decompilers

[...] HLA syntax for code examples, but that may change in the future.

IDA Pro is a professional disassembler that is expensive, extremely powerful, and has a whole slew of features. The downside to IDA Pro is that it costs $515 US for the standard single-user edition. As such this wikibook will not consider IDA Pro specifically because the price tag is exclusionary. Freeware versions do exist; see below.

• (version 6.x) http://www.hex-rays.com/idapro/

http://www.caesum.com/
Commercial Freeware/Shareware Windows Disassemblers

OllyDbg OllyDbg has recently been one of the most popular disassemblers. It has a large community and a wide variety of plugins available. It emphasizes binary code analysis. Supports x86 instructions only (no x86_64 support for now, although it is on the way).

BugDbg is a 64-bit user-land debugger designed to debug native 64-bit applications on Windows.
http://www.pespin.com/

DSMHELP Disassemble Help Library is a disassembler library with single line Epimorphic assembler. Supported instruction sets - Basic,System,SSE,SSE2,SSE3,SSSE3,SSE4,SSE4A,MMX,FPU,3DNOW,VMX,SVM,AVX,AVX2,BMI1,BMI2,F16C,FMA3,FMA
http://dsmhelp.narod.ru/ (in Russian)

ArkDasm is a 64-bit interactive disassembler and debugger for Windows. Supported processor: x64 architecture (Intel x64 and AMD64)
http://www.arkdasm.com/

SharpDisasm is a C# port of the udis86 x86 / x86-64 disassembler
http://sharpdisasm.codeplex.com/

Unix Disassemblers

Many of the Unix disassemblers, especially the open source ones, have been ported to other platforms, like Windows (mostly using MinGW or Cygwin). Some disassemblers like otool (OS X) are distro-specific.

Capstone Capstone is an open source disassembly framework for multi-arch (including support for x86, x86_64) & multi-platform (including Mac OSX, Linux, *BSD, Android, iOS, Solaris) with advanced features.
http://www.capstone-engine.org/

HT Editor An analyzing disassembler for Intel x86 instructions. The latest version runs as a console GUI program on Windows, but there are versions compiled for Linux as well.
http://hte.sourceforge.net/

ciasdis The official name of ciasdis is computer_intelligence_assembler_disassembler. This Forth-based tool lets you incrementally and interactively build knowledge about a code body. It is unique in that all disassembled code can be re-assembled to the exact same code. Processors are 8080, 6809, 8086, 80386, Pentium I and DEC Alpha. A scripting facility aids in analyzing ELF and MS-DOS headers and makes this tool extendable. The Pentium I ciasdis is available as a binary image; others are in source form, loadable onto lina Forth, available from the same site.
http://home.hccnet.nl/a.w.m.van.der.horst/ciasdis.html

objdump comes standard, and is typically used for general inspection of binaries. Pay attention to the relocation option (-r) and the dynamic symbol table option (-T).

gdb comes standard, as a debugger, but is very often used for disassembly. If you have loose hex dump data that you wish to disassemble, simply enter it (interactively) over top of something else or compile it into a program as a string like so: char foo[] = {0x90, 0xcd, 0x80, 0x90, 0xcc, 0xf1, 0x90};

lida The Linux interactive disassembler is an interactive disassembler with some special functions such as a crypto analyzer. Displays string data references, does code flow analysis, and does not rely on objdump. Utilizes the Bastard disassembly library for decoding single opcodes. The project was started in 2004 and remains dormant to this day.
http://lida.sourceforge.net

dissy This program is an interactive disassembler that uses objdump.
http://code.google.com/p/dissy/
Bastard Disassembler The Bastard disassembler is a
powerful, scriptable disassembler for Linux and EmilPRO replacement for the deprecated dissy disas-
FreeBSD. sembler.
http://bastard.sourceforge.net/ http://github.com/SimonKagstrom/emilpro
ndisasm NASM’s disassembler for x86 and x86-64. x86dis This program can be used to display binary
Works on DOS, Windows, Linux, Mac OS X and streams such as the boot sector or other unstructured
various other systems. binary files.
udis86 Disassembler Library for x86 and x86-64 ldasm LDasm (Linux Disassembler) is a Perl/Tk-based
GUI for objdump/binutils that tries to imitate the
http://udis86.sourceforge.net/ 'look and feel' of W32Dasm. It searches for cross-
references (e.g. strings), converts the code from
ZyanDisassembler Engine (Zydis) Fast and GAS to a MASM-like style, traces programs and
lightweight x86/x86-64 disassembler library. much more. Comes along with PTrace, a process-
flow-logger. Last updated in 2002, available from
https://github.com/zyantific/zyan-disassembler-engine Tucows.
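As a rough illustration of what a disassembler does with loose bytes such as the `foo[]` array in the gdb entry, the sketch below decodes that exact byte sequence with a tiny hand-written table. It is a minimal sketch under stated assumptions, not how gdb or objdump work internally: the names `ONE_BYTE` and `decode` are hypothetical, and only the few opcodes that occur in the example are handled.

```python
# Minimal linear-sweep decoder for the byte string from the gdb example.
# Only the opcodes that occur in that example are handled; anything else
# is reported as raw data.

ONE_BYTE = {0x90: "nop", 0xCC: "int3", 0xF1: "int1"}  # opcodes without operands


def decode(code):
    """Return a list of (offset, text) pairs for the given bytes."""
    out = []
    i = 0
    while i < len(code):
        op = code[i]
        if op in ONE_BYTE:
            out.append((i, ONE_BYTE[op]))
            i += 1
        elif op == 0xCD and i + 1 < len(code):  # int imm8: opcode plus one operand byte
            out.append((i, "int 0x%02x" % code[i + 1]))
            i += 2
        else:                                   # not in our table: emit as data
            out.append((i, "db 0x%02x" % op))
            i += 1
    return out


foo = bytes([0x90, 0xCD, 0x80, 0x90, 0xCC, 0xF1, 0x90])
for offset, text in decode(foo):
    print("%04x  %s" % (offset, text))
```

Calling `decode(foo[2:])` instead starts in the middle of the `int 0x80` instruction, so the operand byte 0x80 is treated as an opcode. A real disassembler faces the same problem whenever it starts decoding at the wrong offset.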
10 CHAPTER 1. TOOLS
call and return boundaries. This loses valuable information about the way the program is structured.

1.2.4 Decompilers

In the face of optimizing compilers, it is not uncommon to be asked “Is decompilation even possible?” To some degree, it usually is. Make no mistake, however: an optimizing compiler results in the irretrievable loss of information. An example is in-lining, as explained above, where code called is combined with its surroundings, such that the places where the original subroutine is called cannot even be identified. An optimizer that reverses that process is comparable to an artificial intelligence program that recreates a poem in a different language. So perfectly operational decompilers are a long way off. At most, current decompilers can be used as simply an aid for the reverse engineering process, leaving lots of arduous work.

Common Decompilers

Hex-Rays Decompiler Hex-Rays is a commercial decompiler. It is made as an extension to the popular IDA-Pro disassembler. It is currently the only viable commercially available decompiler which produces usable results. It supports both x86 and ARM architecture.
http://www.hex-rays.com/products/decompiler/index.shtml

DCC DCC is likely one of the oldest decompilers in existence, dating back over 20 years. It serves as a good historical and theoretical frame of reference for the decompilation process in general. As of 2015, DCC is an active project. Some of the latest changes include fixes for longstanding memory leaks and a more modern Qt5-based front-end.

RetDec The Retargetable Decompiler is a freeware web decompiler that takes in ELF/PE/COFF binaries in Intel x86, ARM, MIPS, PIC32, and PowerPC architectures and outputs C or Python-like code, plus flow charts and control flow graphs. It puts a running time limit on each decompilation. It produces nice results in most cases.

C4Decompiler C4Decompiler is an interactive, static decompiler under development (Alpha in 2013). It performs global analysis of the binary and presents the resulting C source in a Windows GUI. Context menus support navigation, properties, cross references, C/Asm mixed view and manipulation of the decompile context (function ABI).
http://www.c4decompiler.com

Boomerang Decompiler Project Boomerang Decompiler is an attempt to make a powerful, retargetable decompiler. So far, it only decompiles into C with moderate success.
http://boomerang.sourceforge.net/

Reverse Engineering Compiler (REC) REC is a powerful “decompiler” that decompiles native assembly code into a C-like code representation. The code is half-way between assembly and C, but it is much more readable than the pure assembly is. Unfortunately the program appears to be rather unstable.
http://www.backerstreet.com/rec/rec.htm

ExeToC ExeToC decompiler is an interactive decompiler that boasted pretty good results in the past.
http://sourceforge.net/projects/exetoc

snowman Snowman is an open source native code to C/C++ decompiler. Supports ARM, x86, and x86-64 architectures. Reads ELF, Mach-O, and PE file formats. Reconstructs functions, their names and arguments, local and global variables, expressions, integer, pointer and structural types, all types of control-flow structures, including switch. Has a nice
graphical user interface with one-click navigation between the assembler code and the reconstructed program. Has a command-line interface for batch processing.
https://derevenets.com

1.2.5 A General view of Disassembling

8 bit CPU code

Most CPUs are 8-bit CPUs.[1]

Normally when a subroutine is finished, it returns to the address immediately following the call instruction.

However, assembly-language programmers occasionally use several different techniques that adjust the return address, making disassembly more difficult:

• jump tables,
• calculated jumps, and
• a parameter after the call instruction.

jump tables and other calculated jumps
On 8-bit CPUs, calculated jumps are often implemented by pushing a calculated “return” address to the stack, then jumping to that address using the “return” instruction. For example, the RTS Trick uses this technique to implement jump tables (branch tables).

parameters after the call instruction
Instead of picking up their parameters off the stack or out of some fixed global address, some subroutines provide parameters in the addresses of memory that follow the instruction that called that subroutine. Subroutines that use this technique adjust the return address to skip over all the constant parameter data, then return to an address many bytes after the “call” instruction. One of the more famous programs that used this technique is the “Sweet 16” virtual machine.

The technique may make disassembly more difficult. A simple example of this is the write() procedure implemented as follows:

    ; assume ds = cs, e.g like in boot sector code
    start:
            call write              ; push message's address on top of stack
            db "Hello, world",0dh,0ah,00h
            ; return point
            ret                     ; back to DOS
    write proc near
            pop si                  ; get string address
            mov ah,0eh              ; BIOS: write teletype
    w_loop:
            lodsb                   ; read char at [ds:si] and increment si
            or al,al                ; is it 00h?
            jz short w_exit
            int 10h                 ; write the character
            jmp w_loop              ; continue writing
    w_exit:
            jmp si
    write endp
    end start

A macro-assembler like TASM will then use a macro like this one:

    _write macro message
            call write
            db message
            db 0
    _write endm

From a human disassembler’s point of view, this is a nightmare, even though it is straightforward to read in the original assembly source code. From the binary form alone there is no way to decide whether the db bytes should be interpreted as code or not. The data may also decode as jumps into real executable code, triggering analysis of code that should never be analysed and interfering with the analysis of the real code (e.g. disassembling the above code from 0000h or 0001h won't give the same results at all).

However, a half-decent tool that lets the user specify rules, and that has heuristic means to identify text, will have little trouble.

32 bit CPU code

Most 32-bit CPUs use the ARM instruction set.[1][2][3]

Typical ARM assembly code is a series of subroutines, with literal constants scattered between subroutines. The standard prolog and epilog for subroutines is pretty easy to recognize.

A brief list of disassemblers

• ciasdis “an assembler where the elements opcode, operands and modifiers are all objects, that are reusable for disassembly.” For 8080 8086 80386 Alpha 6809 and should be usable for Pentium 68000 6502 8051.

• radare, the reverse engineering framework, includes open-source tools to disassemble code for many processors including x86, ARM, PowerPC, m68k, etc., several virtual machines including java, msil, etc., and for many platforms including Linux, BSD, OSX, Windows, iPhoneOS, etc.

• IDA, the Interactive Disassembler (IDA Pro), can disassemble code for a huge number of processors, including ARM Architecture (including Thumb and Thumb-2), ATMEL AVR, INTEL 8051, INTEL 80x86, MOS Technologies 6502, MC6809, MC6811, M68H12C, MSP430, PIC 12XX, PIC 14XX, PIC 18XX, PIC 16XXX, Zilog Z80, etc.

• objdump, part of the GNU binutils, can disassemble code for several processors and platforms. binutils is an important part of the toolchain as it provides the linker, assembler and other utilities (like objdump) to manipulate executables on the target platform, and is available for most popular platforms.
1.4. ANALYSIS TOOLS 13
  • For OS X/BSD systems, there is a rough equivalent called otool in the XCode kit.

• Disassemblers at DMOZ lists a huge number of disassemblers

• Program transformation wiki: disassembly lists many highly recommended disassemblers

• search for “disassemble” at SourceForge shows many disassemblers for a variety of CPUs.

• Hopper is a disassembler that runs on OS-X and disassembles 32/64-bit OS-X and windows binaries.

• The University of Queensland Binary Translator (UQBT) is a reusable, component-based binary-translation framework that supports CISC, RISC, and stack-based processors.

[3] Tom Krazit. “ARMed for the living room”. “ARM licensed 1.6 billion cores [in 2005]”. 2006.

• http://www.crackmes.de/ : reverse engineering challenges

• “A Challengers Handbook” by Caesum has some tips on reverse engineering programs in JavaScript, Flash Actionscript (SWF), Java, etc.

• the Open Source Institute occasionally has reverse engineering challenges among its other brainteasers.

• The Program Transformation wiki has a Reverse engineering and Re-engineering Roadmap, and discusses disassemblers, decompilers, and tools for translating programs from one high-level language to another high-level language.

• Other disassemblers with multi-platform support

1.3.1 Example: Hello World Listing

Write a simple “Hello World” program using C or C++ and your favorite compiler. Generate a listing file from the compiler. Does the code look the way you expect it to? Do you understand what the assembly code means?

Here are examples of C and C++ “Hello World!” programs.

    #include <stdio.h>
    int main() {
        printf("Hello World!\n");
        return 0;
    }

    #include <iostream>
    int main() {
        std::cout << "Hello World!\n";
        return 0;
    }

1.3.2 Example: Basic Disassembly

Write a basic “Hello World!” program (see the example above). Compile the program into an executable with your favorite compiler, then disassemble it. How big is the disassembled code file? How does it compare to the code from the listing file you generated? Can you explain why the file is this size?

Debuggers are programs that allow the user to execute a compiled program one step at a time. You can see what instructions are executed in which order, and which sections of the program are treated as code and which are treated as data. Debuggers allow you to analyze the program while it is running, to help you get a better picture of what it is doing.

Advanced debuggers often contain at least a rudimentary disassembler, often times hex editing and reassembly features. Debuggers often allow the user to set breakpoints on instructions, function calls, and even memory locations.

A breakpoint is an instruction to the debugger that allows program execution to be halted when a certain condition is met. For instance, when a program accesses a certain variable, or calls a certain API function, the debugger can pause program execution.

Windows Debuggers

WinDbg WinDbg is a free piece of software from Microsoft that can be used for local user-mode debugging, or even remote kernel-mode debugging. WinDbg is not the same as the better-known Visual Studio Debugger, but comes with a nifty GUI nonetheless. Available in 32 and 64-bit versions.
Many of the open source debuggers on Linux, again, are cross-platform. They may be available on some other Unix(-like) systems, or even Windows. Some of the debuggers may give you a better experience than the old and native ones on your system.

gdb The GNU debugger, comes with any normal Linux install. It is quite powerful and even somewhat programmable, though the raw user interface is harsh.

lldb LLVM’s debugger.

emacs The GNU editor, can be used as a front-end to gdb. This provides a powerful hex editor and allows full scripting in a LISP-like language.

ddd The Data Display Debugger. It’s another front-end to gdb. This provides graphical representations of data structures. For example, a linked list will look just like a textbook illustration.

strace, ltrace, and xtrace Let you run a program while watching the actions it performs. With strace, you get a log of all the system calls being made. With ltrace, you get a log of all the library calls being made. With xtrace, you get a log of some of the function calls being made.

valgrind Executes a program under emulation, performing analysis according to one of the many plug-in modules as desired. You can also write your own plug-in module. Newer versions of valgrind also support OS X.

ladebug An enhanced debugger on Tru64 Unix systems from HP (originally Digital Equipment Corporation) that handles advanced functionality like threads better than dbx.

DTrace An advanced tool on Solaris that provides functions like profiling and many others on the entire system, including the kernel.

mdb The Modular Debugger (MDB) is a new general purpose debugging tool for the Solaris Operating Environment. Its primary feature is its extensibility. The Solaris Modular Debugger Guide describes how to use MDB to debug complex software systems, with a particular emphasis on the facilities available for debugging the Solaris kernel and associated device drivers and modules. It also includes a complete reference for and discussion of the MDB language syntax, debugger features, and MDB Module Programming API.

Debugger Techniques

Setting Breakpoints
As previously mentioned in the section on disassemblers, a 6-line C program doing something as simple as outputting “Hello, World!” turns into massive amounts of assembly code. Most people don't want to sift through the entire mess to find the information they want, and reading through the code just to locate it can be time consuming. As an alternative, one can choose to set breakpoints to halt the program once it has reached a given point within the program’s code.
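On x86, debuggers commonly implement such a breakpoint by temporarily overwriting the first byte of the target instruction with the one-byte `int3` opcode (0xCC) and restoring the original byte later. The sketch below models this on a plain byte buffer rather than a live process. The class `BreakpointTable` and its methods are hypothetical names used only for illustration, not the API of any debugger mentioned here.

```python
INT3 = 0xCC  # one-byte x86 breakpoint opcode


class BreakpointTable:
    """Tracks software breakpoints patched into a mutable code buffer."""

    def __init__(self, code):
        self.code = code    # bytearray standing in for process memory
        self.saved = {}     # address -> original byte

    def set(self, addr):
        if addr not in self.saved:
            self.saved[addr] = self.code[addr]  # remember the original byte
            self.code[addr] = INT3              # patch in the trap

    def clear(self, addr):
        self.code[addr] = self.saved.pop(addr)  # restore the original byte


# push ebp; mov ebp, esp; pop ebp; ret
code = bytearray([0x55, 0x89, 0xE5, 0x5D, 0xC3])
bps = BreakpointTable(code)
bps.set(3)
print(code.hex())   # the byte at offset 3 has been replaced by cc
bps.clear(3)
print(code.hex())   # the original bytes are back
```

One consequence for reversers is that a memory dump taken while breakpoints are set contains 0xCC bytes that are not part of the original program.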
For instance, let’s say that in your program you consistently experience crashes after one particular event: immediately after closing a message box. You set breakpoints on all calls to MessageBoxA. You run your program with the breakpoints set, and it stops, ready to call MessageBoxA. Executing each line one-by-one thereafter (referred to as stepping) through the code, and watching the program stack, you see that a buffer overflow occurs soon after the call.

1.4.2 Hex Editors

Hex editors are able to directly view and edit the binary of a source file, and are very useful for investigating the structure of proprietary closed-format data files. There are many hex editors in existence. This section will attempt to list some of the best, some of the most popular, or some of the most powerful.

HxD (Freeware) For Windows. A fast and powerful free hex, disk and RAM editor

BreakPoint Hex Workshop For Windows. An excellent and powerful hex-editor, its usefulness is restricted by the fact that it is not free like some of the other options.
http://www.bpsoft.com/

Tiny Hexer Free and does statistics. For Windows.
http://www.mirkes.de/files/

frhed - free hex editor For Windows. Free and open-source.
http://www.kibria.de/frhed.html

Cygnus Hex Editor For Windows. A very fast and easy-to-use hex editor, available in a 'Free Edition'.
http://www.softcircuits.com/cygnus/fe/

Hexprobe Hex Editor For Windows. A professional hex editor designed to include all the power to deal with hex data, particularly helpful in the areas of hex-byte editing and byte-pattern analysis.
http://www.hexprobe.com/hexprobe/index.htm

1Fh For Windows. A free binary/hex editor which is very fast, even while working with large files. It’s the only Windows hex editor that allows you to view files in byte code (all 256-characters).
A view of a small binary file in a 1Fh hex editor.
http://www.4neurons.com/1Fh/

HexEdit For Windows (Open source) and shareware versions. Powerful and easy to use binary file and disk editor.
http://www.hexedit.com/

HexToolkit For Windows. A free hex viewer specifically designed for reverse engineering file formats. Allows data to be viewed in various formats and includes an expression evaluator as well as a binary file comparison tool.
http://www.binaryearth.net/HexToolkit

FlexHex For Windows. It provides full support for NTFS files, which are based on a more complex model than FAT32 files. Specifically, FlexHex supports Sparse files and Alternate data streams of files on any NTFS volume. Can be used to edit OLE compound files, flash cards, and other types of physical drives.
http://www.heaventools.com/flexhex-hex-editor.htm

HT Editor For Windows. A file editor/viewer/analyzer for executables. Its goal is to combine the low-level functionality of a debugger and the usability of IDEs.
http://hte.sourceforge.net/

HexEdit For MacOS. A simple but reliable hex editor that lets you change highlight colours. There is also a port for Apple Classic users.
http://hexedit.sourceforge.net/

Hex Fiend For MacOS. A very simple hex editor, but incredibly powerful nonetheless. It’s only 346 KB to download and takes files as big as 116 GB.
http://ridiculousfish.com/hexfiend/

xxd and any text editor Produce a hex dump with xxd, freely edit it in your favorite text editor, and then convert it back to a binary file with your changes included.

GHex Hex editor for GNOME.
http://directory.fsf.org/All_Packages_in_Directory/ghex.html

Okteta The well-integrated hex editor from KDE since 4.1. Offers the traditional two-column layout, one with numeric values (binary, octal, decimal, hexadecimal) and one with characters (lots of charsets supported). Editing can be done in both columns, with unlimited undo/redo. Small set of tools (searching/replacing, strings, binary filter, and more).
http://utils.kde.org/projects/okteta

BEYE A viewer of binary files with built-in editor in binary, hexadecimal and disassembler modes. It uses native Intel syntax for disassembly. Highlight AVR/Java/Athlon64/Pentium 4/K7-Athlon disassembler, Russian codepages converter, full preview of formats - MZ, NE, PE, NLM, coff32, elf partial - a.out, LE, LX, PharLap; code navigator and more.
http://beye.sourceforge.net/en/beye.html

BIEW A viewer of binary files with built-in editor in binary, hexadecimal and disassembler modes. It uses native Intel syntax for disassembly. Highlight AVR/Java/Athlon64/Pentium 4/K7-Athlon disassembler, Russian codepages converter, full preview of formats - MZ, NE, PE, NLM, coff32, elf partial - a.out, LE, LX, PharLap; code navigator and more. (PROJECT RENAMED, see BEYE)
http://biew.sourceforge.net/en/biew.html

hview A curses based hex editor designed to work with large (600+MB) files as quickly, and with as little overhead, as possible.
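The round trip described in the xxd entry (dump, edit as text, convert back) can be sketched in a few lines. This is a simplified stand-in and does not reproduce xxd's actual output format: `to_hex_lines` and `from_hex_lines` are hypothetical helpers that emit only an offset and the hex bytes on each line, without an ASCII column.

```python
def to_hex_lines(data, width=16):
    """Render bytes as text lines of the form 'offset: hh hh hh ...'."""
    lines = []
    for off in range(0, len(data), width):
        chunk = data[off:off + width]
        lines.append("%08x: %s" % (off, " ".join("%02x" % b for b in chunk)))
    return lines


def from_hex_lines(lines):
    """Rebuild bytes from lines produced (and possibly edited) above."""
    out = bytearray()
    for line in lines:
        _, _, hex_part = line.partition(":")  # drop the offset column
        out += bytes.fromhex(hex_part)        # fromhex ignores the spaces
    return bytes(out)


original = b"MZ\x90\x00hello"
dump = to_hex_lines(original, width=4)
dump[1] = "00000004: 48 65 6c 6c"             # text edit: 'h' (0x68) becomes 'H' (0x48)
patched = from_hex_lines(dump)
print(patched)
```

Because the offsets are discarded on the way back, this sketch only supports in-place edits and appended bytes. It cannot move data to a different offset.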
hexedit View and edit files in hexadecimal or in ASCII.
http://rigaux.org/hexedit.html

Data Workshop An editor to view and modify binary data; provides different views which can be used to edit, analyze and export the binary data.
http://www.dataworkshop.de/

VCHE A hex editor which lets you see all 256 characters as found in video ROM, even control and extended ASCII; it uses the /dev/vcsa* devices to do it. It can also edit non-regular files, like hard disks, floppies, CDROMs, ZIPs, RAM, and almost any device. It comes with an ncurses and a raw version for people who work under X or remotely.
http://www.grigna.com/diego/linux/vche/

1.4.3 Other Tools for Windows

Resource Monitors

SysInternals Freeware This page has a large number of excellent utilities, many of which are very useful to security experts, network administrators, and (most importantly to us) reversers. Specifically, check out Process Monitor, FileMon, RegMon, TCPView, and Process Explorer.
http://technet.microsoft.com/sysinternals/default.aspx

API Monitors

SpyStudio Freeware The Spy Studio software is a tool to hook into windows processes, log windows API calls to DLLs, insert breakpoints and change parameters.
http://www.nektra.com/products/spystudio/

PE File Header dumpers

Dumpbin Dumpbin is a program that previously used to be shipped with MS Visual Studio, but recently the functionality of Dumpbin has been incorporated into the Microsoft Linker, link.exe. To access dumpbin, pass /dump as the first parameter to link.exe:

    link.exe /dump [options]

It is frequently useful to simply create a batch file that handles this conversion:

    ::dumpbin.bat
    link.exe /dump %*

All examples in this wikibook that use dumpbin will call it in this manner.

Here is a list of useful features of dumpbin :
http://msdn.microsoft.com/library/default.asp?url=/library/en-us/vccore/html/_core_dumpbin_reference.asp

Depends Dependency Walker is a GUI tool which will allow you to see exports and imports of binaries. It ships with many Microsoft tools including MS Visual Studio.

1.4.4 GNU Tools

The GNU packages have been ported to many platforms including Windows.

GNU BinUtils The GNU BinUtils package contains several small utilities that are very useful in dealing with binary files. The most important programs in the list are the GNU objdump, readelf, GAS assembler, and the GNU linker, although the reverser might find more use in addr2line, c++filt, nm, and readelf.
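Header dumpers such as Dumpbin start by locating the PE header inside the file. The following is a minimal sketch of that first step, based on the published PE/COFF layout: the `MZ` magic, the `e_lfanew` field at offset 0x3C, the `PE\0\0` signature, then the COFF file header. The function `read_coff_header`, the small `MACHINES` table and the synthetic test image are illustrative assumptions, not part of any tool mentioned here.

```python
import struct

MACHINES = {0x014C: "x86", 0x8664: "x64"}  # two common Machine values


def read_coff_header(image):
    """Return (machine, number_of_sections) from a PE image held in bytes."""
    if image[:2] != b"MZ":
        raise ValueError("missing MZ signature")
    (e_lfanew,) = struct.unpack_from("<I", image, 0x3C)  # file offset of the PE header
    if image[e_lfanew:e_lfanew + 4] != b"PE\0\0":
        raise ValueError("missing PE signature")
    machine, nsections = struct.unpack_from("<HH", image, e_lfanew + 4)
    return MACHINES.get(machine, hex(machine)), nsections


# Build a tiny synthetic image: a 64-byte DOS header followed by the PE header.
image = bytearray(64)
image[:2] = b"MZ"
struct.pack_into("<I", image, 0x3C, 64)
image += b"PE\0\0" + struct.pack("<HH", 0x014C, 3)
print(read_coff_header(bytes(image)))
```

To try it on a real executable, read the file with `open(path, "rb").read()` and pass the result to `read_coff_header`. A full dumper would go on to parse the optional header and the section table.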
dprobes Lets you work with both kernel and user code.
Platforms
2.1 Microsoft Windows

2.1.1 Microsoft Windows

The Windows operating system is a popular reverse engineering target for one simple reason: the OS itself (market share, known weaknesses), and most applications for it, are not Open Source or free. Most software on a Windows machine doesn't come bundled with its source code, and most pieces have inadequate, or non-existent documentation. Occasionally, the only way to know precisely what a piece of software does (or for that matter, to determine whether a given piece of software is malicious or legitimate) is to reverse it, and examine the results.

2.1.2 Windows Versions

Windows operating systems can be easily divided into 2 categories: Win9x, and WinNT.

Windows 9x

The Win9x kernel was originally written to span the 16-bit/32-bit divide. Operating Systems based on the 9x kernel are: Windows 95, Windows 98, and Windows ME. Win9x Series operating systems are known to be prone to bugs and system instability. The actual OS itself was a 32 bit extension of MS-DOS, its predecessor. An important issue with the 9x line is that they were all based around using the ASCII format for storing strings, rather than Unicode.

Development on the Win9x kernel ended with the release of Windows ME.

Windows NT

The WinNT kernel series was originally written as enterprise-level server and network software. WinNT stresses stability and security far more than Win9x kernels did (although it can be debated whether that stress was good enough). It also handles all string operations internally in Unicode, giving more flexibility when using different languages. Operating Systems based on the WinNT kernel are: Windows NT (versions 3.1, 3.5, 3.51 and 4.0), Windows 2000 (NT 5.0), Windows XP (NT 5.1), Windows Server 2003 (NT 5.2), Windows Vista (NT 6.0), Windows 7 (NT 6.1), Windows 8 (NT 6.2), Windows 8.1 (NT 6.3), and Windows 10 (NT 10.0).

The Microsoft Xbox and Xbox 360 also run a variant of NT, forked from Windows 2000. Most future Microsoft operating system products are based on NT in some shape or form.

2.1.3 Virtual Memory

32 bit WinNT allows for a maximum of 4 GB of virtual memory space, divided into “pages” that are 4096 bytes by default. Pages not in current use by the system or any of the applications may be written to a special section on the hard disk known as the “paging file.” Use of the paging file may increase performance on some systems, although high latency for I/O to the HDD can actually reduce performance in some instances.

2.1.4 System Architecture

The Windows architecture is heavily layered. Function calls that a programmer makes may be redirected 3 times or more before any action is actually performed. There is a non-negligible penalty for calling Win32 functions from a user-mode application. However, the upside is equally significant: code written in higher levels of the Windows system is much easier to write. Complex operations that involve initializing multiple data structures and calling multiple sub-functions can be performed by calling only a single higher-level function.

The Win32 API comprises 3 modules: KERNEL32, USER32, and GDI32. KERNEL32 is layered on top of NTDLL, and most calls to KERNEL32 functions are simply redirected into NTDLL function calls. USER32 and GDI32 are both based on WIN32K (a kernel-mode module, responsible for the Windows “look and feel”), although USER32 also makes many calls to the
20 CHAPTER 2. PLATFORMS
more-primitive functions in GDI32. This and NTDLL both provide an interface to the Windows NT kernel, NTOSKRNL (see further below).

NTOSKRNL is also partially layered on HAL (Hardware Abstraction Layer), but this interaction will not be considered much in this book. The purpose of this layering is to allow processor variant issues (such as location of resources) to be made separate from the actual kernel itself. A slightly different system configuration thus requires just a different HAL module, rather than a completely different kernel module.

2.1.5 System calls and interrupts

After filtering through different layers of subroutines, most API calls require interaction with part of the operating system. Services are provided via 'software interrupts', traditionally through the “int 0x2e” instruction. This switches control of execution to the NT executive / kernel, where the request is handled. It should be pointed out here that the stack used in kernel mode is different from the user mode stack. This provides an added layer of protection between kernel and user. Once the function completes, control is returned back to the user application.

Both Intel and AMD provide an extra set of instructions to allow faster system calls, the “SYSENTER” instruction from Intel and the SYSCALL instruction from AMD.

GDI diverts most of its calls into WIN32K, but it does contain a manager for GDI objects, such as pens, brushes and device contexts. The GDI object manager and the KERNEL object manager are completely separate.

user32.dll

The USER subsystem is located in the user32.dll library file. This subsystem controls the creation and manipulation of USER objects, which are common screen items such as windows, menus, cursors, etc... USER will set up the objects to be drawn, but will perform the actual drawing by calling on GDI (which in turn will make many calls to WIN32K) or sometimes even calling WIN32K directly. USER utilizes the GDI Object Manager.

2.1.7 Native API

The native API, hereby referred to as the NTDLL subsystem, is a series of undocumented API function calls that handle most of the work performed by KERNEL32. Microsoft also does not guarantee that the native API will remain the same between different versions, as Windows developers modify the software. This gives the risk of native API calls being removed or changed without warning, breaking software that utilizes it.

ntdll.dll
• Dbg functions are present to enable debugging routines and operations

• Ldr provides the ability to load, manipulate and retrieve data from DLLs and other module resources

User Mode Versus Kernel Mode

This module is the Windows NT “Executive”, providing all the functionality required by the native API, as well as the kernel itself, which is responsible for maintaining the machine state. By default, all interrupts and kernel calls are channeled through ntoskrnl in some way, making it the single most important program in Windows itself. Many of its functions are exported (all with various prefixes, a la NTDLL) for use by device drivers.

2.1.9 Win32K.sys

This module is the “Win32 Kernel” that sits on top of the lower-level, more primitive NTOSKRNL. WIN32K is responsible for the “look and feel” of windows, and many portions of this code have remained largely unchanged since the Win9x versions. This module provides many of the specific instructions that cause USER and GDI to act the way they do. It’s responsible for translating the API calls from the USER and GDI libraries into the pictures you see on the monitor.

2.1.10 Win64 API

With the advent of 64-bit processors, 64-bit software is a necessity. As a result, the Win64 API was created to utilize the new hardware. It is important to note that the format of many of the function calls is identical in Win32 and Win64, except for the size of pointers, and other data types that are specific to 64-bit address space.

Differences

2.1.11 Windows Vista

Vista may be better known by its development code-name “Longhorn.” Microsoft claims that Vista has been written largely from the ground up, and therefore it can be assumed that there are fundamental differences between the Vista API and system architecture, and the APIs and architectures of previous Windows versions. Windows Vista was released January 30th, 2007.

Recent Windows service packs have attempted to implement a system known as “Non-executable memory” where certain pages can be marked as being “non-executable”. The purpose of this system is to prevent some of the most common security holes by not allowing control to pass to code inserted into a memory buffer by an attacker. For instance, a shellcode loaded into an overflowed text buffer cannot be executed, stopping the attack in its tracks. The effectiveness of this mechanism is yet to be seen, however.

2.1.14 COM and Related Technologies

COM, and a whole slew of technologies that are either related to COM or are actually COM with a fancy name, is another factor to consider when reversing Windows binaries. COM, DCOM, COM+, ActiveX, OLE, MTS, and Windows DNA are all names for the same subject, or subjects, so similar that they may all be considered under the same heading. In short, COM is a method to export Object-Oriented Classes in a uniform, cross-platform and cross-language manner. In essence, COM is .NET, version 0 beta. Using COM, components written in many languages can export, import, instantiate, modify, and destroy objects defined in another file, most often a DLL. Although COM provides cross-platform (to some extent) and cross-language facilities, each COM object is compiled to a native binary, rather than an intermediate format such as Java or .NET. As a result, COM does not require a virtual machine to execute such objects.

This book will attempt to show some examples of COM files, and the reversing challenges associated with them, although the subject is very broad, and may elude the scope of this book (or at least the early sections of it). The discussion may be part of an “Advanced Topic” found in
Microsoft has released a new version of its Windows the later sections of this book.
operation system, named “Windows Vista.” Windows Due to the way that COM works, a lot of the methods and
data structures exported by a COM component are difficult to perceive by simply inspecting the executable file. Matters are made worse if the creating programmer has used a library such as ATL to simplify their programming experience. Unfortunately for a reverse engineer, this reduces the contents of an executable into a “Sea of bits”, with pointers and data structures everywhere.

2.1.15 Remote Procedure Calls (RPC)

RPC is a generic term referring to techniques that allow a program running on one machine to make calls that actually execute on another machine. Typically, this is done by marshalling all of the data needed for the procedure including any state information stored on the first machine, and building it into a single data structure, which is then transmitted over some communications method to a second machine. This second machine then performs the requested action, and returns a data packet containing any results and potentially changed state information to the originating machine.

In Windows NT, RPC is typically handled by having two libraries that are similarly named, one which generates RPC requests and accepts RPC returns, as requested by a user-mode program, and one which responds to RPC requests and returns results via RPC. A classic example is the print spooler, which consists of two pieces: the RPC stub spoolss.dll, and the spooler proper and RPC service provider, spoolsv.exe. In most machines, which are stand-alone, it would seem that the use of two modules communicating by means of RPC is overkill; why not simply roll them into a single routine? In networked printing, though, this makes sense, as the RPC service provider can be resident physically on a distant machine, with the remote printer, and the local machine can control the printer on the remote machine in exactly the same way that it controls printers on the local machine.

[1] https://msdn.microsoft.com/en-us/library/windows/hardware/ff565646(v=vs.85).aspx

2.2 Windows Executable Files

“offset” specifying an offset into that window. The segment register would be set by DOS and the COM file would be expected to respect this setting and not ever change the segment registers. The offset registers, however, were fair game and served (for COM files) the same purpose as a modern 32-bit register. The downside was that the offset registers were only 16-bit and, therefore, since COM files could not change the segment registers, COM files were limited to using 64K of RAM. The good thing about this approach, however, was that no extra work was needed by DOS to load and run a COM file: just load the file, set the segment register, and jump to it. (The programs could perform 'near' jumps by just giving an offset to jump to.)

COM files are loaded into RAM at offset $100. The space before that would be used for passing data to and from DOS (for example, the contents of the command line used to invoke the program).

Note that COM files, by definition, cannot be 32-bit. Windows provides support for COM files via a special CPU mode.

2.2.2 MS-DOS EXE Files

One way MS-DOS compilers got around the 64K memory limitation was with the introduction of memory models. The basic concept is to cleverly set different segment registers in the x86 CPU (CS, DS, ES, SS) to point to the same or different segments, thus allowing varying degrees of access to memory. Typical memory models were:

tiny: All memory accesses are 16-bit (segment registers unchanged). Produces a .COM file instead of an .EXE file.

small: All memory accesses are 16-bit (segment registers unchanged).

compact: Data addresses include both segment and offset, reloading the DS or ES registers on access and allowing up to 1M of data. Code accesses don't change the CS register, allowing 64K of code.
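The segment-plus-offset addressing that these memory models rely on can be made concrete with a short C sketch. It assumes the usual real-mode rule that the physical address is the 16-bit segment shifted left by four bits plus the 16-bit offset, wrapping at 1M as on the original 8086. The function names real_mode_linear and window_size are hypothetical and exist only for this illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: the 20-bit physical address that a real-mode
 * segment:offset pair refers to (segment * 16 + offset), assuming the
 * address wraps at 1M as on the original 8086. */
static uint32_t real_mode_linear(uint16_t segment, uint16_t offset)
{
    uint32_t linear = ((uint32_t)segment << 4) + offset;
    return linear & 0xFFFFFu;   /* keep only 20 address bits */
}

/* With the segment register fixed, only the 16-bit offset varies, so the
 * reachable range is segment:0000 through segment:FFFF. */
static uint32_t window_size(uint16_t segment)
{
    uint32_t first = (uint32_t)segment << 4;
    uint32_t last  = ((uint32_t)segment << 4) + 0xFFFFu;
    assert(last >= first);
    return last - first + 1;    /* 0x10000 bytes, i.e. 64K */
}
```

For example, a COM file whose segment is 0x1234 begins executing at 1234:0100, which this model maps to physical address 0x12440. The fixed 64K window is why the tiny and small models are limited to 16-bit accesses, while the compact model gains reach by reloading DS or ES.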
2.2.3 PE Files

The DOS header is also known by some as the EXE header. Here is the DOS header presented as a C data structure:

    struct DOS_Header {
        // short is 2 bytes, long is 4 bytes
        char signature[2];   // always "MZ"
        short lastsize;
        short nblocks;
        short nreloc;
        short hdrsize;
        short minalloc;
        short maxalloc;
        short ss;            // 16-bit register images, not C pointers
        short sp;
        short checksum;
        short ip;
        short cs;
        short relocpos;
        short noverlay;
        short reserved1[4];
        short oem_id;
        short oem_info;
        short reserved2[10];
        long e_lfanew;
    };

After the DOS header there is a stub program mentioned in the paragraph above the DOS header structure. Listed below is a commented example of that program; it was taken from a program compiled with GCC.

    ;# Using NASM with Intel syntax
    push cs            ;# Push CS onto the stack
    pop ds             ;# Set DS to CS
    mov dx,message     ; point to our message "This program cannot be run in DOS mode.", 0x0d, 0x0d, 0x0a, '$'
    mov ah, 09
    int 0x21           ;# when AH = 9, DOS interrupt to write a string
    ;# terminate the program
    mov ax,0x4c01
    int 0x21
    message db "This program cannot be run in DOS mode.", 0x0d, 0x0d, 0x0a, '$'

The first big chunk of information lies in the COFF header, directly after the PE signature.

COFF Header

The COFF header is present in both COFF object files (before they are linked) and in PE files where it is known as the “File header”. The COFF header has some information that is useful to an executable, and some information that is more useful to an object file.

Here is the COFF header, presented as a C data structure:

    struct COFFHeader {
        short Machine;
        short NumberOfSections;
        long TimeDateStamp;
        long PointerToSymbolTable;
        long NumberOfSymbols;
        short SizeOfOptionalHeader;
        short Characteristics;
    };

Machine: This field determines what machine the file was compiled for. A hex value of 0x14C (332 in decimal) is the code for an Intel 80386.

Here’s a list of possible values it can have.

NumberOfSections: The number of sections that are described at the end of the PE headers.
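As an illustration of how these headers chain together, the following C sketch walks from the DOS header to the COFF header in a byte buffer. It assumes the layout described above: the “MZ” signature at offset 0, e_lfanew as the last four bytes of the 64-byte DOS header (offset 0x3C), and a four-byte “PE\0\0” signature immediately before the 20-byte COFF header. The names rd16, rd32, coff_summary and read_coff_summary are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Read little-endian values from a byte buffer. */
static uint16_t rd16(const uint8_t *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

static uint32_t rd32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Hypothetical result type for this sketch. */
struct coff_summary {
    uint16_t machine;             /* e.g. 0x14C for an Intel 80386 */
    uint16_t number_of_sections;
};

/* Returns 0 on success, -1 if the buffer does not look like a PE file. */
static int read_coff_summary(const uint8_t *buf, size_t len,
                             struct coff_summary *out)
{
    assert(buf != NULL && out != NULL);
    if (len < 0x40 || buf[0] != 'M' || buf[1] != 'Z')
        return -1;                        /* no DOS header */
    uint32_t pe = rd32(buf + 0x3C);       /* e_lfanew */
    if (pe > len || len - pe < 4 + 20)    /* signature + COFF header */
        return -1;
    if (memcmp(buf + pe, "PE\0\0", 4) != 0)
        return -1;                        /* PE signature missing */
    out->machine            = rd16(buf + pe + 4);
    out->number_of_sections = rd16(buf + pe + 6);
    return 0;
}
```

The bounds checks matter in a reversing context, since e_lfanew in a malformed or hostile file may point outside the buffer.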
MajorLinkerVersion: The major version number of the linker.

MinorLinkerVersion: The minor version number of the linker.

SizeOfCode: The size of the code section, in bytes, or the sum of all such sections if there are multiple code sections.

SizeOfInitializedData: The size of the initialized data section, in bytes, or the sum of all such sections if there are multiple initialized data sections.

MinorSubsystemVersion: The minor version number of the subsystem.

Win32VersionValue: This member is reserved and must be 0.

SizeOfImage: The size of the image, in bytes, including all headers. Must be a multiple of SectionAlignment.

SizeOfHeaders: The combined size of the following items, rounded to a multiple of the value specified in the FileAlignment member.
    #define IMAGE_SCN_MEM_PURGEABLE      0x00020000
    #define IMAGE_SCN_MEM_16BIT          0x00020000
    #define IMAGE_SCN_MEM_LOCKED         0x00040000
    #define IMAGE_SCN_MEM_PRELOAD        0x00080000
    #define IMAGE_SCN_ALIGN_1BYTES       0x00100000
    #define IMAGE_SCN_ALIGN_2BYTES       0x00200000
    #define IMAGE_SCN_ALIGN_4BYTES       0x00300000
    #define IMAGE_SCN_ALIGN_8BYTES       0x00400000
    #define IMAGE_SCN_ALIGN_16BYTES      0x00500000 // Default alignment if no others are specified.
    #define IMAGE_SCN_ALIGN_32BYTES      0x00600000
    #define IMAGE_SCN_ALIGN_64BYTES      0x00700000
    #define IMAGE_SCN_ALIGN_128BYTES     0x00800000
    #define IMAGE_SCN_ALIGN_256BYTES     0x00900000
    #define IMAGE_SCN_ALIGN_512BYTES     0x00A00000
    #define IMAGE_SCN_ALIGN_1024BYTES    0x00B00000
    #define IMAGE_SCN_ALIGN_2048BYTES    0x00C00000
    #define IMAGE_SCN_ALIGN_4096BYTES    0x00D00000
    #define IMAGE_SCN_ALIGN_8192BYTES    0x00E00000
    #define IMAGE_SCN_ALIGN_MASK         0x00F00000
    #define IMAGE_SCN_LNK_NRELOC_OVFL    0x01000000 // Section contains extended relocations.
    #define IMAGE_SCN_MEM_DISCARDABLE    0x02000000 // Section can be discarded.
    #define IMAGE_SCN_MEM_NOT_CACHED     0x04000000 // Section is not cachable.
    #define IMAGE_SCN_MEM_NOT_PAGED      0x08000000 // Section is not pageable.
    #define IMAGE_SCN_MEM_SHARED         0x10000000 // Section is shareable.
    #define IMAGE_SCN_MEM_EXECUTE        0x20000000 // Section is executable.
    #define IMAGE_SCN_MEM_READ           0x40000000 // Section is readable.
    #define IMAGE_SCN_MEM_WRITE          0x80000000 // Section is writeable.

2.2.7 Imports and Exports - Linking to other modules

What is linking?

Whenever a developer writes a program, there are a number of subroutines and functions which are expected to be implemented already, saving the writer the hassle of having to write out more code or work with complex data structures. Instead, the coder need only declare one call to the subroutine, and the linker will decide what happens next.

There are two types of linking that can be used: static and dynamic. Static uses a library of precompiled functions. This precompiled code can be inserted into the final executable to implement a function, saving the programmer a lot of time. In contrast, dynamic linking allows subroutine code to reside in a different file (or module), which is loaded at runtime by the operating system. This is also known as a “Dynamically linked library”, or DLL. A library is a module containing a series of functions or values that can be exported. This is different from the term executable, which imports things from libraries to do what it wants. From here on, “module” means any file of PE format, and a “Library” is any module which exports and imports functions and values.

Dynamically linking has the following benefits:

• It saves disk space, if more than one executable links to the library module

• Allows instant updating of routines, without providing new executables for all applications

• Can save space in memory by mapping the code of a library into more than one process

• Increases abstraction of implementation. The method by which an action is achieved can be modified without the need for reprogramming of applications. This is extremely useful for backward compatibility with operating systems.

This section discusses how this is achieved using the PE file format. An important point to note at this point is that anything can be imported or exported between modules, including variables as well as subroutines.

Loading

The downside of dynamically linking modules together is that, at runtime, the software which is initialising an executable must link these modules together. For various reasons, you cannot declare that “The function in this dynamic library will always exist in memory here”. If that memory address is unavailable or the library is updated, the function will no longer exist there, and the application trying to use it will break. Instead, each module (library or executable) must declare what functions or values it exports to other modules, and also what it wishes to import from other modules.

As said above, a module cannot declare where in memory it expects a function or value to be. Instead, it declares where in its own memory it expects to find a pointer to the value it wishes to import. This permits the module to address any imported value, wherever it turns up in memory.

2.2.8 Exports

Exports are functions and values in one module that have been declared to be shared with other modules. This is done through the use of the “Export Directory”, which is used to translate between the name of an export (or “Ordinal”, see below), and a location in
memory where the code or data can be found. The start of the export directory is identified by the IMAGE_DIRECTORY_ENTRY_EXPORT entry of the data directory. All export data must exist in the same section. The directory is headed by the following structure:

    struct IMAGE_EXPORT_DIRECTORY {
        long Characteristics;
        long TimeDateStamp;
        short MajorVersion;
        short MinorVersion;
        long Name;
        long Base;
        long NumberOfFunctions;
        long NumberOfNames;
        long *AddressOfFunctions;
        long *AddressOfNames;
        long *AddressOfNameOrdinals;
    };

The “Characteristics” value is generally unused, TimeDateStamp describes the time the export directory was generated, MajorVersion and MinorVersion should describe the version details of the directory, but their nature is undefined. These values have little or no impact on the actual exports themselves. The “Name” value is an RVA to a zero terminated ASCII string, the name of this library or module.

Names and Ordinals

Each exported value has both a name and an “ordinal” (a kind of index). The actual exports themselves are described through AddressOfFunctions, which is an RVA to an array of RVAs, each pointing to a different function or value to be exported. The size of this array is in the value NumberOfFunctions. Each of these functions has an ordinal. The “Base” value is used as the ordinal of the first export, and the next RVA in the array is Base+1, and so forth.

AddressOfFunctions array point into the section which contains the export directory, something that normal exports should not do. At that location, there should be a zero terminated ASCII string of format “LibraryName.ExportName” for the appropriate place to forward this export to.

2.2.9 Imports

The other half of dynamic linking is importing functions and values into an executable or other module. Before runtime, compilers and linkers do not know where in memory a value that needs to be imported could exist. The import table solves this by creating an array of pointers at runtime, each one pointing to the memory location of an imported value. This array of pointers exists inside of the module at a defined RVA location. In this way, the linker can use addresses inside of the module to access values outside of it.

The Import directory

The start of the import directory is pointed to by the IMAGE_DIRECTORY_ENTRY_IMPORT entry of the data directory; the separate IMAGE_DIRECTORY_ENTRY_IAT entry locates the import address table (the FirstThunk arrays) rather than the descriptors. At that location, there is an array of IMAGE_IMPORT_DESCRIPTOR structures. Each of these identifies a library or module that has a value we need to import. The array continues until an entry where all the values are zero. The structure is as follows:
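A sketch of the descriptor layout and of the all-zero termination rule, in C. The field names are those of IMAGE_IMPORT_DESCRIPTOR in the Windows SDK header winnt.h (which declares the first field as a union with Characteristics). The fixed-width types, the struct name import_descriptor and both helper functions are assumptions made for this illustration, not the book's own listing.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Field names follow IMAGE_IMPORT_DESCRIPTOR in winnt.h; the struct name
 * and fixed-width types are this sketch's own. */
struct import_descriptor {
    uint32_t OriginalFirstThunk; /* RVA of the lookup array (names/ordinals) */
    uint32_t TimeDateStamp;      /* zero when the imports are not bound */
    uint32_t ForwarderChain;     /* index used for forwarded, bound imports */
    uint32_t Name;               /* RVA of the library's ASCII name */
    uint32_t FirstThunk;         /* RVA of the array the loader fills in */
};

/* The array of descriptors ends at an entry whose fields are all zero. */
static int is_terminator(const struct import_descriptor *d)
{
    return d->OriginalFirstThunk == 0 && d->TimeDateStamp == 0 &&
           d->ForwarderChain == 0 && d->Name == 0 && d->FirstThunk == 0;
}

/* Hypothetical helper: count the libraries named by the table, never
 * reading more than max_entries descriptors. */
static size_t count_imported_libraries(const struct import_descriptor *table,
                                       size_t max_entries)
{
    size_t n = 0;
    assert(table != NULL);
    while (n < max_entries && !is_terminator(&table[n]))
        n++;
    return n;
}
```

The max_entries limit is a defensive choice: a damaged or deliberately malformed file may omit the terminating entry, and a tool should not walk past the end of the mapped section looking for it.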
the AddressOfNames array; to save searching for a string, the loader first checks the AddressOfNames entry corresponding to “Hint”.

To summarise: The import table consists of a large array of IMAGE_IMPORT_DESCRIPTORs, terminated by an all-zero entry. These descriptors identify a library to import things from. There are then two parallel RVA arrays, each pointing at IMAGE_IMPORT_BY_NAME structures, which identify a specific value to be imported.

Imports at runtime

Using the above import directory at runtime, the loader finds the appropriate modules, loads them into memory, and seeks the correct export. However, to be able to use the export, a pointer to it must be stored somewhere in the importing module’s memory. This is why there are two parallel arrays, OriginalFirstThunk and FirstThunk, identifying IMAGE_IMPORT_BY_NAME structures. Once an imported value has been resolved, the pointer to it is stored in the FirstThunk array. It can then be used at runtime to address imported values.

Bound imports

The PE file format also supports a peculiar feature known as “binding”. The process of loading and resolving import addresses can be time consuming, and in some situations this is to be avoided. If a developer is fairly certain that a library is not going to be updated or changed, then the addresses in memory of imported values will not change each time the application is loaded. So, the import address can be precomputed and stored in the FirstThunk array before runtime, allowing the loader to skip resolving the imports - the imports are “bound” to a particular memory location. However, if the version numbers between modules do not match, or the imported library needs to be relocated, the loader will assume the bound addresses are invalid, and resolve the imports anyway.

The “TimeDateStamp” member of the import directory entry for a module controls binding; if it is set to zero, then the import directory is not bound. If it is non-zero, then it is bound to another module. However, the TimeDateStamp in the import table must match the TimeDateStamp in the bound module’s FileHeader, otherwise the bound values will be discarded by the loader.

Forwarding and binding

Binding can of course be a problem if the bound library / module forwards its exports to another module. In these cases, the non-forwarded imports can be bound, but the values which get forwarded must be identified so the loader can resolve them. This is done through the ForwarderChain member of the import descriptor. The value of “ForwarderChain” is an index into the FirstThunk and OriginalFirstThunk arrays. The OriginalFirstThunk for that index identifies the IMAGE_IMPORT_BY_NAME structure for an import that needs to be resolved, and the FirstThunk for that index is the index of another entry that needs to be resolved. This continues until the FirstThunk value is −1, indicating no more forwarded values to import.

2.2.10 Resources

Resource structures

Resources are data items in modules which are difficult to store or describe using the chosen programming language. This requires a separate compiler or resource builder, allowing insertion of dialog boxes, icons, menus, images, and other types of resources, including arbitrary binary data. A number of API calls can then be used to retrieve resources from the module. The base of resource data is pointed to by the IMAGE_DIRECTORY_ENTRY_RESOURCE entry of the data directory, and at that location there is an IMAGE_RESOURCE_DIRECTORY structure:

    struct IMAGE_RESOURCE_DIRECTORY {
        long Characteristics;
        long TimeDateStamp;
        short MajorVersion;
        short MinorVersion;
        short NumberOfNamedEntries;
        short NumberOfIdEntries;
    };

Characteristics is unused, and TimeDateStamp is normally the time of creation, although it doesn't matter if it’s set or not. MajorVersion and MinorVersion relate to the versioning info of the resources: the fields have no defined values. Immediately following the IMAGE_RESOURCE_DIRECTORY structure is a series of IMAGE_RESOURCE_DIRECTORY_ENTRYs, the number of which is defined by the total of NumberOfNamedEntries and NumberOfIdEntries. The first portion of these entries is for named resources, the latter for ID resources, depending on the values in the IMAGE_RESOURCE_DIRECTORY struct. The actual shape of the resource entry structure is as follows:

    struct IMAGE_RESOURCE_DIRECTORY_ENTRY {
        long NameId;
        long *Data;
    };

The NameId value has a dual purpose: if the most significant bit (or sign bit) is clear, then the lower 16 bits are an ID number of the resource. Alternately, if the top bit is set, then the lower 31 bits make up an offset from the start of the resource data to the name string of this particular resource. The Data value also has a dual purpose: if the most significant bit is set, the remaining 31 bits form an offset from the start of the resource data to another IMAGE_RESOURCE_DIRECTORY (i.e. this entry is an interior node of the resource tree). Otherwise, this is a leaf node, and Data contains the offset from the start of the resource data to a structure which describes the specifics of the resource data itself (which can be consid-
• A .DLL file extension

• A DllMain() entry point, instead of a WinMain() or main().

• The DLL flag set in the PE header.

DLLs may be loaded in one of two ways, a) at load-time, or b) by calling the LoadLibrary() Win32 API function.

It is often useful to determine which functions are imported from external libraries when examining a program. To list imports to the console, use dumpbin in the following manner:

    dumpbin /IMPORTS <dll file>

You can also use depends.exe to list imported and exported functions. Depends is a GUI tool and comes with the Microsoft Platform SDK.
The ELF file format (short for Executable and Linking Format) was developed by Unix System Laboratories to be a successor to previous file formats such as COFF and a.out. In many respects, the ELF format is more powerful and versatile than previous formats, and has widely become the standard on Linux, Solaris, IRIX, and FreeBSD (although the FreeBSD-derived Mac OS X uses the Mach-O format instead). ELF has also been adopted by OpenVMS for Itanium and BeOS for x86.

Historically, Linux has not always used ELF; Red Hat Linux 4 was the first time that distribution used ELF; previous versions had used the a.out format.

[Figure: layout of an ELF file, showing .text, .data and the section header table. An ELF file has two views: the program header shows the segments used at run-time, while the section header lists the set of sections of the binary.]
Code Patterns

3.1 The Stack

3.1.1 The Stack

[Figure: the stack, showing Push and Pop operations]

because other functions may overwrite these values without your knowledge.

Users of Windows ME, 98, 95, 3.1 (and earlier) may fondly remember the infamous “Blue Screen of Death” that was sometimes caused by a stack overflow exception. This occurs when too much data is written to the stack, and the stack “grows” beyond its limits. Modern operating systems use better bounds-checking and error recovery to reduce the occurrence of stack overflows, and to maintain system stability after one has occurred.
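The overflow condition described above can be illustrated with a small C model of a stack that grows downward and refuses to grow past its limit. This is only a sketch under simplifying assumptions: the tiny_stack type, its four-slot size and the function names are hypothetical, and a real operating system detects overflow through memory protection rather than an explicit check on every push.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define STACK_SLOTS 4   /* deliberately tiny so the limit is easy to reach */

/* Hypothetical model: sp is an index that moves downward as items are
 * pushed, the way esp moves toward lower addresses on x86. */
struct tiny_stack {
    uint32_t slot[STACK_SLOTS];
    size_t sp;          /* STACK_SLOTS means "empty" */
};

static void stack_init(struct tiny_stack *s)
{
    assert(s != NULL);
    s->sp = STACK_SLOTS;
}

/* Returns 0 on success, -1 when the push would grow past the limit. */
static int stack_push(struct tiny_stack *s, uint32_t value)
{
    if (s->sp == 0)
        return -1;      /* overflow caught by the bounds check */
    s->slot[--s->sp] = value;
    return 0;
}

/* Returns 0 on success, -1 when there is nothing left to pop. */
static int stack_pop(struct tiny_stack *s, uint32_t *value)
{
    if (s->sp == STACK_SLOTS)
        return -1;
    *value = s->slot[s->sp++];
    return 0;
}
```

Without the check in stack_push, a fifth push would write below slot[0] and silently corrupt whatever memory lies there, which is the failure the text describes.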
to artificially move esp forward. We can then access our reserved memory directly as a memory pointer, or we can access it indirectly as an offset value from esp itself.

Say we wanted to create an array of byte values on the stack, 100 items long. We want to store the pointer to the base of this array in edi. How do we do it? Here is an example:

    sub esp, 100    ; num of bytes in our array
    mov edi, esp    ; copy address of 100 bytes area to edi

To destroy that array, we simply write the instruction

    add esp, 100

3.1.4 Reading Without Popping

To read values on the stack without popping them off the stack, esp can be used with an offset. For instance, to read the 3 DWORD values from the top of the stack into eax (but without using a pop instruction), we would use the instructions:

    mov eax, DWORD PTR SS:[esp]
    mov eax, DWORD PTR SS:[esp + 4]
    mov eax, DWORD PTR SS:[esp + 8]

Remember, since esp moves downward as the stack grows, data on the stack can be accessed with a positive offset. A negative offset should never be used because data “above” the stack cannot be counted on to stay the way you left it. The operation of reading from the stack without popping is often referred to as “peeking”, but since this isn't the official term for it this wikibook won't use it.

3.1.5 Data Allocation

3.2 Functions and Stack Frames

3.2.1 Functions and Stack Frames

To allow for many unknowns in the execution environment, functions are frequently set up with a “stack frame” to allow access to both function parameters, and automatic function variables. The idea behind a stack frame is that each subroutine can act independently of its location on the stack, and each subroutine can act as if it is the top of the stack.

When a function is called, a new stack frame is created at the current esp location. A stack frame acts like a partition on the stack. All items from previous functions are higher up on the stack, and should not be modified. Each current function has access to the remainder of the stack, from the stack frame until the end of the stack page. The current function always has access to the “top” of the stack, and so functions do not need to take account of the memory usage of other functions or programs.

3.2.2 Standard Entry Sequence

For many compilers, the standard function entry sequence is the following piece of code (X is the total size, in bytes, of all automatic variables used in the function):

    push ebp
    mov ebp, esp
    sub esp, X

For example, here is a C function code fragment and the resulting assembly instructions:

    void MyFunction() { int a, b, c; ...

    _MyFunction:
    push ebp        ; save the value of ebp
    mov ebp, esp    ; ebp now points to the top of the stack
    sub esp, 12     ; space allocated on the stack for the local variables
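The effect of the entry sequence on esp and ebp can be traced with a small C model. This is a sketch only: the cpu struct, its 64-word memory and the helper names are hypothetical, and real compilers may omit or reorder these instructions.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical register file: only the two registers the entry sequence
 * touches, plus a small stack memory addressed in DWORDs. */
struct cpu {
    uint32_t esp, ebp;
    uint32_t mem[64];   /* mem[addr / 4] holds the DWORD stored at addr */
};

static void push32(struct cpu *c, uint32_t value)
{
    c->esp -= 4;        /* the stack grows toward lower addresses */
    c->mem[c->esp / 4] = value;
}

/* Models: push ebp / mov ebp, esp / sub esp, X */
static void standard_entry(struct cpu *c, uint32_t locals_size)
{
    assert(locals_size % 4 == 0 && locals_size + 4 <= c->esp);
    push32(c, c->ebp);      /* save the caller's frame pointer */
    c->ebp = c->esp;        /* ebp now marks the base of the new frame */
    c->esp -= locals_size;  /* reserve X bytes for local variables */
}
```

Starting from esp = 0x80, calling standard_entry(&c, 12) leaves ebp = 0x7C and esp = 0x70, so three 4-byte locals occupy [ebp - 4], [ebp - 8] and [ebp - 12], and the saved ebp sits at [ebp].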
MyFunction2 produces the following assembly code:

    _MyFunction2:
    push ebp
    mov ebp, esp
    sub esp, 0      ; no local variables, most compilers will omit this line

Which is exactly as one would expect. So, what exactly does ebp do, and where are the function parameters stored? The answer is found when we call the function. Consider the following C function call:

    MyFunction2(10, 5, 2);

This will create the following assembly code (using a Right-to-Left calling convention called CDECL, explained later):

    push 2
    push 5
    push 10
    call _MyFunction2

Note: Remember that the call x86 instruction is basically equivalent to

    push eip + 2        ; return address is current address + size of two instructions
    jmp _MyFunction2

It turns out that the function arguments are all passed on the stack! Therefore, when we move the current value of the stack pointer (esp) into ebp, we are pointing ebp directly at the function arguments. As the function code pushes and pops values, ebp is not affected by esp. Remember that pushing basically does this:

    sub esp, 4      ; “allocate” space for the new stack item
    mov [esp], X    ; put new stack item value X in

This means that first the return address and then the old value of ebp are put on the stack. Therefore [ebp] points to the location of the old value of ebp, [ebp + 4] points to the return address, and [ebp + 8] points to the first function argument. Here is a (crude) representation of the stack at this point:

    :    :
    |  2 | [ebp + 16] (3rd function argument)
    |  5 | [ebp + 12] (2nd argument)
    | 10 | [ebp + 8]  (1st argument)
    | RA | [ebp + 4]  (return address)
    | FP | [ebp]      (old ebp value)
    |    | [ebp - 4]  (1st local variable)
    :    :
    :    :
    |    | [ebp - X]  (esp - the current stack pointer. The use of push / pop is valid now)

The stack pointer value may change during the execution of the current function. In particular this happens when:

• parameters are passed to another function;

• the pseudo-function “alloca()” is used.

[FIXME: When parameters are passed into another function the esp changing is not an issue. When that function returns the esp will be back to its old value. So why does ebp help there. This needs better explanation. (The real explanation is here, ESP is not really needed: http://blogs.msdn.com/larryosterman/archive/2007/03/12/fpo.aspx)]

This means that the value of esp cannot be reliably used to determine (using the appropriate offset) the memory location of a specific local variable. To solve this problem, many compilers access local variables using negative offsets from the ebp register. This allows us to assume that the same offset is always used to access the same variable (or parameter). For this reason, the ebp register is called the frame pointer, or FP.

3.2.3 Standard Exit Sequence

The Standard Exit Sequence must undo the things that the Standard Entry Sequence does. To this effect, the Standard Exit Sequence must perform the following tasks, in the following order:

1. Remove space for local variables, by reverting esp to its old value.

2. Restore ebp to its old value, which is on top of the stack.

3. Return to the calling function with a ret command.

As an example, the following C code:

    void MyFunction3(int x, int y, int z) { int a, b, c; ... return; }

Will create the following assembly code:

    _MyFunction3:
    push ebp
    mov ebp, esp
    sub esp, 12     ; sizeof(a) + sizeof(b) + sizeof(c)
    ;x = [ebp + 8], y = [ebp + 12], z = [ebp + 16]
    ;a = [ebp - 4] = [esp + 8], b = [ebp - 8] = [esp + 4], c = [ebp - 12] = [esp]
    mov esp, ebp
    pop ebp
    ret 12          ; sizeof(x) + sizeof(y) + sizeof(z)

3.2.4 Non-Standard Stack Frames

Frequently, reversers will come across a subroutine that doesn't set up a standard stack frame. Here are some things to consider when looking at a subroutine that does not start with a standard sequence:

Using Uninitialized Registers

When a subroutine starts using data in an uninitialized register, that means that the subroutine expects external functions to put data into that register before it gets called. Some calling conventions pass arguments in registers, but sometimes a compiler will not use a standard calling convention.
If such a function needs to be replaced without reloading the application (or restarting the machine in case of kernel patches) it can be achieved by inserting a jump to the replacement function. A short jump instruction (which can reach from -128 to +127 bytes) requires 2 bytes of storage space - just the amount that the “mov edi,edi” placeholder provides. A jump to any memory location, in this case to our replacement function, requires 5 bytes. These are provided by the 5 no-operation bytes just preceding the function. If a function thus patched gets called it will first jump back by 5 bytes and then do a long jump to the replacement function. After the patch the memory might look like this:

    LABEL:
    jmp REPLACEMENT_FUNCTION    ; <-- 5 NOPs replaced by jmp
    FUNCTION:
    jmp short LABEL             ; <-- mov edi has been replaced by short jump backwards
    push ebp
    mov ebp, esp                ; <-- regular stack frame setup as before

The reason for using a 2-byte mov instruction at the beginning instead of putting 5 nops there directly, is to prevent corruption during the patching process. There would be a risk with replacing 5 individual instructions if the in-

The function above takes 2 4-byte parameters, accessed by offsets +8 and +12 from ebp. The function also has 1 variable created on the stack, accessed by offset +0 from esp. The function is nearly identical to this C code:

    int Question1(int x, int y) { int z; z = x * 2; return y + z; }

3.3.2 Example: Standard Entry Sequences

Does the following function follow the Standard Entry and Exit Sequences? If not, where does it differ?

    _Question2:
    call _SubQuestion2
    mov ecx, 2
    mul ecx
    ret

The function does not follow the standard entry sequence, because it doesn't set up a proper stack frame with ebp and esp. The function basically performs the following C instructions:

    int Question2() { return SubQuestion2() * 2; }
3.4. CALLING CONVENTIONS 37
Although an optimizing compiler has chosen to take a few shortcuts.

3.4 Calling Conventions

3.4.1 Calling Conventions

Calling conventions are a standardized method for functions to be implemented and called by the machine. A calling convention specifies the method that a compiler sets up to access a subroutine. In theory, code from any compiler can be interfaced together, so long as the functions all have the same calling conventions. In practice, however, this is not always the case.

Calling conventions specify how arguments are passed to a function, how return values are passed back out of a function, how the function is called, and how the function manages the stack and its stack frame. In short, the calling convention specifies how a function call in C or C++ is converted into assembly language. Needless to say, there are many ways for this translation to occur, which is why it’s so important to specify certain standard methods. If these standard conventions did not exist, it would be nearly impossible for programs created using different compilers to communicate and interact with one another.

There are three major calling conventions that are used with the C language: STDCALL, CDECL, and FASTCALL. In addition, there is another calling convention typically used with C++: THISCALL. There are other calling conventions as well, including PASCAL and FORTRAN conventions, among others. We will not consider those conventions in this book.

3.4.2 Notes on Terminology

There are a few terms that we are going to be using in this chapter, which are mostly common sense, but which are worthy of stating directly:

Passing arguments: “Passing arguments” is a way of saying that the calling function is writing data in the place where the called function will look for them. Arguments are passed before the call instruction is executed.

Right-to-Left and Left-to-Right: These describe the manner that arguments are passed to the subroutine, in terms of the high-level code. For instance, the following C function call:

    MyFunction1(a, b);

will generate the following code if passed Left-to-Right:

    push a
    push b
    call _MyFunction1

and will generate the following code if passed Right-to-Left:

    push b
    push a
    call _MyFunction1

Return value: Some functions return a value, and that value must be received reliably by the function’s caller. The called function places its return value in a place where the calling function can get it when execution returns. The called function stores the return value before executing the ret instruction.

Cleaning the stack: When arguments are pushed onto the stack, eventually they must be popped back off again. Whichever function, the caller or the callee, is responsible for cleaning the stack must reset the stack pointer to eliminate the passed arguments.

Calling function (the caller): The “parent” function that calls the subroutine. Execution resumes in the calling function directly after the subroutine call, unless the program terminates inside the subroutine.

Called function (the callee): The “child” function that gets called by the “parent.”

Name Decoration: When C code is translated to assembly code, the compiler will often “decorate” the function name by adding extra information that the linker will use to find and link to the correct functions. For most calling conventions, the decoration is very simple (often only an extra symbol or two to denote the calling convention), but in some extreme cases (notably the C++ “thiscall” convention), the names are “mangled” severely.

Entry sequence (the function prologue): A few instructions at the beginning of a function, which prepare the stack and registers for use within the function.

Exit sequence (the function epilogue): A few instructions at the end of a function, which restore the stack and registers to the state expected by the caller, and return to the caller. Some calling conventions clean the stack in the exit sequence.

Call sequence: A few instructions in the middle of a function (the caller) which pass the arguments and call the called function. After the called function has returned, some calling conventions have one more instruction in the call sequence to clean the stack.
The C language, by default, uses the CDECL calling convention, but most compilers allow the programmer to specify another convention via a specifier keyword. These keywords are not part of the ISO/ANSI C standard, so you should always check your compiler documentation for implementation specifics.

If a calling convention other than CDECL is to be used, or if CDECL is not the default for your compiler and you want to use it explicitly, you must specify the calling convention keyword in the function declaration itself, and in any prototypes for the function. This is important because both the calling function and the called function need to know the calling convention.

Variadic functions usually have special entry code, generated by the va_start(), va_arg() C pseudo-functions.

Consider the following C instructions:

    int __cdecl MyFunction1(int a, int b) { return a + b; }

and the following function call:

    x = MyFunction1(2, 3);

These would produce the following assembly listings, respectively:

    _MyFunction1:
        push ebp
        mov ebp, esp
        mov eax, [ebp + 8]
        mov edx, [ebp + 12]
        add eax, edx
        pop ebp
        ret

and

    push 3
    push 2
    call _MyFunction1
    add esp, 8

When translated to assembly code, CDECL functions are almost always prepended with an underscore (that’s why all previous examples have used “_” in the assembly code).

STDCALL, also known as “WINAPI” (and a few other names, depending on where you are reading it), is used almost exclusively by Microsoft as the standard calling convention for the Win32 API. Since STDCALL is strictly defined by Microsoft, all compilers that implement it do it the same way.

• STDCALL passes arguments right-to-left, and returns the value in eax. (The Microsoft documentation erroneously claimed that arguments are passed left-to-right, but this is not the case.)

• The called function cleans the stack, unlike CDECL. This means that STDCALL doesn't allow variable-length argument lists.

There are a few important points to note here:

1. In the function body, the ret instruction has an (optional) argument that indicates how many bytes to pop off the stack when the function returns.

2. STDCALL functions are name-decorated with a leading underscore, followed by the function name, an @, and then the number (in bytes) of arguments passed on the stack. This number will always be a multiple of 4 on a 32-bit aligned machine.

FASTCALL

The FASTCALL calling convention is not completely standard across all compilers, so it should be used with caution. In FASTCALL, the first 2 or 3 32-bit (or smaller) arguments are passed in registers, with the most commonly used registers being edx, eax, and ecx. Additional arguments, or arguments larger than 4 bytes, are passed on the stack, often in right-to-left order (similar to CDECL). Which function cleans the stack depends on the compiler; in the cl.exe and GCC listings later in this chapter, the called function does so with a “ret n” instruction.
C++ requires that non-static methods of a class be called by an instance of the class. Therefore it uses its own standard calling convention to ensure that pointers to the object are passed to the function: THISCALL.

Would form the following asm code:

    mov ecx, MyObj
    push c
    push b
    push a
    call _MyMethod

At least, it would look like the assembly code above if it weren't for name mangling.

And here is the resultant mangled name:

    ?MyFunction@MyClass@@QAEHH@Z

Extern “C”

3.4.5 Note on Name Decorations

We've been discussing name decorations in this chapter, but the fact is that in pure disassembled code there typically are no names whatsoever, especially not names with fancy decorations. The assembly stage removes all these
readable identifiers, and replaces them with the binary locations instead. Function names really only appear in two places:

3.5.1 Microsoft C Compiler

Here is a simple function in C:

    int MyFunction(int x, int y) { return (x * 2) + (y * 3); }

Using cl.exe, we are going to generate 3 separate listings for MyFunction, one with CDECL, one with FASTCALL, and one with STDCALL calling conventions. On the commandline, there are several switches that you can use to force the compiler to change the default:

• /Gd : The default calling convention is CDECL

• /Gr : The default calling convention is FASTCALL

• /Gz : The default calling convention is STDCALL

Using these commandline options, here are the listings:

CDECL

FASTCALL

    int MyFunction(int x, int y) { return (x * 2) + (y * 3); }

becomes:

    PUBLIC @MyFunction@8
    _TEXT SEGMENT
    _y$ = -8 ; size = 4
    _x$ = -4 ; size = 4
    @MyFunction@8 PROC NEAR
    ; _x$ = ecx
    ; _y$ = edx
    ; Line 4
        push ebp
        mov ebp, esp
        sub esp, 8
        mov _y$[ebp], edx
        mov _x$[ebp], ecx
    ; Line 5
        mov eax, _y$[ebp]
        imul eax, 3
        mov ecx, _x$[ebp]
        lea eax, [eax+ecx*2]
    ; Line 6
        mov esp, ebp
        pop ebp
        ret 0
    @MyFunction@8 ENDP
    _TEXT ENDS
    END

This function was compiled with optimizations turned off. Here the arguments are first saved to the stack and then fetched back from it, rather than being used directly from the registers. The compiler does this so that every argument can be accessed the same way, through the stack; cl.exe is not the only compiler that behaves like this.

No argument is accessed at a positive offset from ebp, so the caller apparently pushed nothing onto the stack, and the function can return with ret 0. Let’s investigate further:
3.5. CALLING CONVENTION EXAMPLES 41
    int FastTest(int x, int y, int z, int a, int b, int c) { return x * y * z * a * b * c; }

and the corresponding listing:

    PUBLIC @FastTest@24
    _TEXT SEGMENT
    _y$ = -8 ; size = 4
    _x$ = -4 ; size = 4
    _z$ = 8 ; size = 4
    _a$ = 12 ; size = 4
    _b$ = 16 ; size = 4
    _c$ = 20 ; size = 4
    @FastTest@24 PROC NEAR
    ; _x$ = ecx
    ; _y$ = edx
    ; Line 2
        push ebp
        mov ebp, esp
        sub esp, 8
        mov _y$[ebp], edx
        mov _x$[ebp], ecx
    ; Line 3
        mov eax, _x$[ebp]
        imul eax, DWORD PTR _y$[ebp]
        imul eax, DWORD PTR _z$[ebp]
        imul eax, DWORD PTR _a$[ebp]
        imul eax, DWORD PTR _b$[ebp]
        imul eax, DWORD PTR _c$[ebp]
    ; Line 4
        mov esp, ebp
        pop ebp
        ret 16 ; 00000010H

Now we have 6 arguments: four are pushed by the caller from right to left, and the first two (x and y) are again passed in ecx and edx and processed the same way as in the previous example. Stack cleanup is done by ret 16, which corresponds to the 4 arguments pushed before the call was executed.

For FASTCALL, the compiler will try to pass arguments in registers; if there are not enough registers, the caller pushes the remaining arguments onto the stack, still in right-to-left order. Stack cleanup is done by the callee. It is called FASTCALL because if all arguments can be passed in registers (on 64-bit CPUs, up to 6 under the System V calling convention), no stack push or cleanup is needed.

The name-decoration scheme of the function is @MyFunction@n, where n is the total size in bytes of all arguments (24 for the six 4-byte arguments above, even though only 16 bytes are pushed).

    ; size = 4
    _STDCALLTest@24 PROC NEAR
    ; Line 2
        push ebp
        mov ebp, esp
    ; Line 3
        mov eax, _x$[ebp]
        imul eax, DWORD PTR _y$[ebp]
        imul eax, DWORD PTR _z$[ebp]
        imul eax, DWORD PTR _a$[ebp]
        imul eax, DWORD PTR _b$[ebp]
        imul eax, DWORD PTR _c$[ebp]
    ; Line 4
        pop ebp
        ret 24 ; 00000018H
    _STDCALLTest@24 ENDP
    _TEXT ENDS
    END

Yes, the only difference between STDCALL and CDECL is that the former cleans up the stack in the callee and the latter in the caller. On x86 this saves a little code, thanks to the “ret n” instruction.

3.5.2 GNU C Compiler

We will be using 2 example C functions to demonstrate how GCC implements calling conventions:

    int MyFunction1(int x, int y) { return (x * 2) + (y * 3); }

and

    int MyFunction2(int x, int y, int z, int a, int b, int c) { return x * y * (z + 1) * (a + 2) * (b + 3) * (c + 4); }

GCC does not have commandline arguments to force the default calling convention to change from CDECL (for C), so they will be manually defined in the text with the directives: __cdecl, __fastcall, and __stdcall.

CDECL
    lea ecx, [eax + eax]

Which is clearly the same as a multiplication by 2. The first value accessed must then have been the last value passed, which would seem to indicate that values are passed right-to-left here. To prove this, we will look at the next section of the listing:

    movl 12(%ebp), %edx
    movl %edx, %eax
    addl %eax, %eax
    addl %edx, %eax
    leal (%eax,%ecx), %eax

The value at offset +12 from ebp is moved into edx. edx is then moved into eax. eax is then added to itself (eax * 2), and then edx is added to it (eax + edx). Remember, though, that eax = 2 * edx, so the result is edx * 3. This then is clearly the y parameter, which is furthest on the stack, and was therefore the first pushed. CDECL on GCC, then, is implemented by passing arguments on the stack in right-to-left order, same as cl.exe.

FASTCALL

    .globl @MyFunction1@8
    .def @MyFunction1@8; .scl 2; .type 32; .endef
    @MyFunction1@8:
        pushl %ebp
        movl %esp, %ebp
        subl $8, %esp
        movl %ecx, -4(%ebp)
        movl %edx, -8(%ebp)
        movl -4(%ebp), %eax
        leal (%eax,%eax), %ecx
        movl -8(%ebp), %edx
        movl %edx, %eax
        addl %eax, %eax
        addl %edx, %eax
        leal (%eax,%ecx), %eax
        leave
        ret

Notice first that the same name decoration is used as in cl.exe. The astute observer will already have realized that GCC uses the same trick as cl.exe, of moving the fastcall arguments from their registers (ecx and edx again) onto a negative offset on the stack. Again, optimizations are turned off. ecx is moved into the first position (-4) and edx is moved into the second position (-8). Like the CDECL example above, the value at -4 is doubled, and the value at -8 is tripled. Therefore, -4 (ecx) is x, and -8 (edx) is y. It would seem from this listing then that values are passed left-to-right, although we will need to take a look at the larger MyFunction2 example:

    .globl @MyFunction2@24
    .def @MyFunction2@24; .scl 2; .type 32; .endef
    @MyFunction2@24:
        pushl %ebp
        movl %esp, %ebp
        subl $8, %esp
        movl %ecx, -4(%ebp)
        movl %edx, -8(%ebp)
        movl -4(%ebp), %eax
        imull -8(%ebp), %eax
        movl 8(%ebp), %edx
        incl %edx
        imull %edx, %eax
        movl 12(%ebp), %edx
        addl $2, %edx
        imull %edx, %eax
        movl 16(%ebp), %edx
        addl $3, %edx
        imull %edx, %eax
        movl 20(%ebp), %edx
        addl $4, %edx
        imull %edx, %eax
        leave
        ret $16

Because successive parameters in MyFunction2 are added to increasing constants, we can deduce the position of each parameter. -4 is still x, and -8 is still y. +8 gets incremented by 1 (z), +12 gets increased by 2 (a), +16 gets increased by 3 (b), and +20 gets increased by 4 (c). Let’s list these values then:

    z = [ebp + 8]
    a = [ebp + 12]
    b = [ebp + 16]
    c = [ebp + 20]

c is the furthest down, and therefore was the first pushed. z is the highest to the top, and was therefore the last pushed. Arguments are therefore pushed in right-to-left order, just like cl.exe.

STDCALL

Let’s compare then the implementation of MyFunction1 in GCC:

    .globl _MyFunction1@8
    .def _MyFunction1@8; .scl 2; .type 32; .endef
    _MyFunction1@8:
        pushl %ebp
        movl %esp, %ebp
        movl 8(%ebp), %eax
        leal (%eax,%eax), %ecx
        movl 12(%ebp), %edx
        movl %edx, %eax
        addl %eax, %eax
        addl %edx, %eax
        leal (%eax,%ecx), %eax
        popl %ebp
        ret $8

The name decoration is the same as in cl.exe, so STDCALL functions (and CDECL and FASTCALL for that matter) can be assembled with either compiler, and linked with either linker, it seems. The stack frame is set up, then the value at [ebp + 8] is doubled. After that, the value at [ebp + 12] is tripled. Therefore, +8 is x, and +12 is y. Again, these values are pushed in right-to-left order. This function also cleans its own stack with the “ret 8” instruction.

Looking at a bigger example:

    .globl _MyFunction2@24
    .def _MyFunction2@24; .scl 2; .type 32; .endef
    _MyFunction2@24:
        pushl %ebp
        movl %esp, %ebp
        movl 8(%ebp), %eax
        imull 12(%ebp), %eax
        movl 16(%ebp), %edx
        incl %edx
        imull %edx, %eax
        movl 20(%ebp), %edx
        addl $2, %edx
        imull %edx, %eax
        movl 24(%ebp), %edx
        addl $3, %edx
        imull %edx, %eax
        movl 28(%ebp), %edx
        addl $4, %edx
        imull %edx, %eax
        popl %ebp
        ret $24

We can see here that values at +8 and +12 from ebp are still x and y, respectively. The value at +16 is incremented by 1, the value at +20 is incremented by 2, and so on, all the way to the value at +28. We can therefore create the following table:

    x = [ebp + 8]
    y = [ebp + 12]
    z = [ebp + 16]
    a = [ebp + 20]
    b = [ebp + 24]
    c = [ebp + 28]

With c being pushed first, and x being pushed last. Therefore, these parameters are also pushed in right-to-left order. This function then also cleans 24 bytes off the stack with the “ret 24” instruction.
3.6. BRANCHES 43
3.5.3 Example: C Calling Conventions

Identify the calling convention of the following C function:

    int MyFunction(int a, int b) { return a + b; }

The function is written in C, and has no other specifiers, so it is CDECL by default.

3.5.4 Example: Named Assembly Function

Identify the calling convention of the function MyFunction:

    :_MyFunction@12
        push ebp
        mov ebp, esp
        ...
        pop ebp
        ret 12

The function includes the decorated name of an STDCALL function, and cleans up its own stack. It is therefore an STDCALL function.

3.5.5 Example: Unnamed Assembly Function

Two things should get our attention immediately. The first is that before the function call, a value is stored into ecx. Also, the function name itself is heavily mangled. This example must use the C++ THISCALL convention. Inside the mangled name of the function, we can pick out two English words, “Load” and “Container”. Without knowing the specifics of this name mangling scheme, it is not possible to determine which word is the function name, and which word is the class name.

We can pick out two 32-bit variables being passed to the function, and a single 8-bit variable. The first is located in eax, the second is originally located on the stack at offset -4 from ebp, and the third is located at ebp offset -3. In C++, these would likely correspond to two int variables, and a single char variable. Notice that at the end of the mangled function name are three lower-case characters, “cii”. We can't know for certain, but it appears these three letters correspond to the three parameters (char, int, int). We do not know from this whether the function returns a value or not, so we will assume the function returns void.

Assuming that “Load” is the function name and “Container” is the class name (it could just as easily be the other way around), here is our function definition:

    class Container { void Load(char, int, int); }
produces this assembler code:

Let us now look at a more complicated case: the If-Then-Else instruction.

Now, what happens here? Like before, the if statement only jumps to the else clause when the condition is false. However, we must also install an unconditional jump at the end of the “then” clause, so we don't perform the else clause directly afterwards.

Now, here is an example of a real C If-Then-Else:

    if(x == 10) { x = 0; } else { x++; }

Which gets translated into the following assembly code:

        mov eax, $x
        cmp eax, 0x0A ;0x0A = 10
        jne else
        mov eax, 0
        jmp end
    else:
        inc eax
    end:
        mov $x, eax

As you can see, the addition of a single unconditional jump can add an entire extra option to our conditional.

    switch (x)
    {
        //body of switch statement
    }
    end_of_switch:

And when we compile this with cl.exe, we can generate the following listing file:

    tv64 = -4 ; size = 4
    _argc$ = 8 ; size = 4
    _argv$ = 12 ; size = 4
    _main PROC NEAR
    ; Line 10
        push ebp
        mov ebp, esp
        push ecx
    ; Line 11
        mov eax, DWORD PTR _argc$[ebp]
        mov DWORD PTR tv64[ebp], eax
        mov ecx, DWORD PTR tv64[ebp]
        sub ecx, 1
        mov DWORD PTR tv64[ebp], ecx
        cmp DWORD PTR tv64[ebp], 3
        ja SHORT $L810
        mov edx, DWORD PTR tv64[ebp]
        jmp DWORD PTR $L818[edx*4]
    $L806:
    ; Line 14
        push 1
        call _MyFunction
        add esp, 4
    ; Line 15
        jmp SHORT $L803
    $L807:
    ; Line 17
        push 2
        call _MyFunction
        add esp, 4
    ; Line 18
        jmp SHORT $L803
    $L808:
    ; Line 19
        push 3
        call _MyFunction
        add esp, 4
    ; Line 20
        jmp SHORT $L803
    $L809:
    ; Line 22
        push 4
        call _MyFunction
        add esp, 4
    ; Line 23
        jmp SHORT $L803
    $L810:
    ; Line 25
        push 5
        call _MyFunction
        add esp, 4
    $L803:
    ; Line 27
        xor eax, eax
    ; Line 28
        mov esp, ebp
        pop ebp
        ret 0
    $L818:
        DD $L806
        DD $L807
        DD $L808
        DD $L809
    _main ENDP

Let's work our way through this. First, we see that line 10 sets up our standard stack frame, and it also saves ecx. Why does it save ecx? Scanning through the function, we never see a corresponding “pop ecx” instruction, so it seems that the value is never restored at all. In fact, the compiler isn't saving ecx at all, but is instead simply reserving space on the stack: it’s creating a local variable. The original C code didn't have any local variables, however, so perhaps the compiler just needed some extra scratch space to store intermediate values. Why doesn't the compiler execute the more familiar “sub esp, 4” command to create the local variable? push ecx is simply a shorter (one-byte) instruction that has the same effect on esp. This “scratch space” is being referenced by a negative offset from ebp. tv64 was defined in the beginning of the listing as having the value -4, so every reference to “tv64[ebp]” is a reference to this scratch space.

There are a few things that we need to notice about the function in general:

• Label $L803 is the end_of_switch label. Therefore, every “jmp SHORT $L803” statement is a break. This is verifiable by comparing with the C code line-by-line.

• Label $L818 contains a list of hard-coded memory addresses, which here are labels in the code section! Remember, labels resolve to the memory address of the instruction. This must be an important part of our puzzle.

To solve this puzzle, we will take an in-depth look at line 11:

    mov eax, DWORD PTR _argc$[ebp]
    mov DWORD PTR tv64[ebp], eax
    mov ecx, DWORD PTR tv64[ebp]
    sub ecx, 1
    mov DWORD PTR tv64[ebp], ecx
    cmp DWORD PTR tv64[ebp], 3
    ja SHORT $L810
    mov edx, DWORD PTR tv64[ebp]
    jmp DWORD PTR $L818[edx*4]

This sequence performs the following pseudo-C operation:

    if( argc - 1 >= 4 )
    {
        goto $L810; /* the default */
    }
    label *L818[] = { $L806, $L807, $L808, $L809 }; /* define a table of jumps, one per each case */
    goto L818[argc - 1]; /* use the address from the table to jump to the correct case */

Here’s why...

    sub ecx, 1
    mov DWORD PTR tv64[ebp], ecx

The value of argc is moved into eax. The value of eax is then immediately moved to the scratch space. The value of the scratch space is then moved into ecx. Sounds like an awfully convoluted way to get the same value into so many different locations, but remember: I turned off the optimizations. The value of ecx is then decremented by 1. Why didn't the compiler use a dec instruction instead? Perhaps the statement is a general statement, that in this case just happens to have an argument of 1. We don't know why exactly, all we know is this:

• eax = “scratch pad”

• ecx = eax - 1

Finally, the last line moves the new, decremented value of ecx back into the scratch pad. Very inefficient.

The Compare and Jumps

    cmp DWORD PTR tv64[ebp], 3
    ja SHORT $L810

The value of the scratch pad is compared with the value 3, and if the unsigned value is above 3 (4 or more), execution jumps to label $L810. How do I know the value is unsigned? I know because ja is an unsigned conditional jump. Let’s look back at the original C code switch:

    switch(argc)
    {
        case 1: MyFunction(1); break;
        case 2: MyFunction(2); break;
        case 3: MyFunction(3); break;
        case 4: MyFunction(4); break;
        default: MyFunction(5);
    }

Remember, the scratch pad contains the value (argc - 1), which means that this condition is only triggered when argc > 4 (or when argc is less than 1, because argc - 1 then wraps around to a very large unsigned value). What happens when argc is greater than 4? The function goes to the default condition. Now, let’s look at the next two lines:

    mov edx, DWORD PTR tv64[ebp]
    jmp DWORD PTR $L818[edx*4]

edx gets the value of the scratch pad (argc - 1), and then there is a very weird jump that takes place: execution jumps to a location pointed to by the value (edx * 4 + $L818). What is $L818? We will examine that right now.

The Switch Table

    $L818:
        DD $L806
        DD $L807
        DD $L808
        DD $L809
    jmp DWORD PTR $L818[edx*4]

In this jump, $L818 isn't the offset, it’s the base; edx*4 is the offset. As we said earlier, edx contains the value of (argc - 1). If argc == 1, we jump to [$L818 + 0], which is $L806. If argc == 2, we jump to [$L818 + 4], which is $L807. Get the picture? A quick look at labels $L806, $L807, $L808, and $L809 shows us exactly what we expect to see: the bodies of the case statements from the original C code, above. Each one of the case statements calls the function “MyFunction”, then breaks, and then jumps to the end of the switch block.

3.6.5 Ternary Operator ?:

Again, the best way to learn is by doing. Therefore we will go through a mini example to explain the ternary operator. Consider the following C code program:

    int main(int argc, char **argv) { return (argc > 1)?(5):(0); }

cl.exe produces the following assembly listing file:

    _argc$ = 8 ; size = 4
    _argv$ = 12 ; size = 4
    _main PROC NEAR
    ; File c:\documents and settings\andrew\desktop\test2.c
    ; Line 2
        push ebp
        mov ebp, esp
    ; Line 3
        xor eax, eax
        cmp DWORD PTR _argc$[ebp], 1
        setle al
        dec eax
        and eax, 5
    ; Line 4
        pop ebp
        ret 0
    _main ENDP

menting this value sets every bit in eax to a logical 1. Now, when we perform the logical and function we get:

      ...11111111
    & ...00000101 ;101 is 5 in binary
      -----------
      ...00000101

eax gets the value 5. In this case, it’s a roundabout method of doing it, but as a reverser, this is the stuff you need to worry about.

For reference, here is the GCC assembly output of the same ternary operator from above:

    _main:
        pushl %ebp
        movl %esp, %ebp
        subl $8, %esp
        xorl %eax, %eax
        andl $-16, %esp
        call __alloca
        call ___main
        xorl %edx, %edx
        cmpl $2, 8(%ebp)
        setge %dl
        leal (%edx,%edx,4), %eax
        leave
        ret

Notice that GCC produces slightly different code than cl.exe produces. However, the stack frame is set up the same way. Notice also that GCC doesn't give us line numbers, or other hints in the code. The ternary operator line occurs after the instruction “call ___main”. Let’s highlight that section here:

    xorl %edx, %edx
    cmpl $2, 8(%ebp)
    setge %dl
    leal (%edx,%edx,4), %eax

Again, xor is used to set edx to 0 quickly. argc is tested against 2 (instead of 1), and dl is set if argc is greater than or equal to 2. If dl gets set to 1, the leal instruction directly thereafter will move the value 5 into eax (because leal (%edx,%edx,4) means edx + edx * 4, i.e. edx * 5).
Consider the following generic Do-While loop:

    do { //loop body } while(x);

What does this loop do? The loop body simply executes, the condition is tested at the end of the loop, and the loop jumps back to the beginning of the loop if the condition is satisfied. Unlike if statements, Do-While conditions are not reversed.

Let us now take a look at the following C code:

    do { x++; } while(x != 10);

Which can be translated into assembly language as such:

        mov eax, $x
    beginning:
        inc eax
        cmp eax, 0x0A ;0x0A = 10
        jne beginning
        mov $x, eax

3.8.3 While Loops

While loops look almost as simple as a Do-While loop, but in reality they aren't as simple at all. Let’s examine a generic while-loop:

    while(x) { //loop body }

What does this loop do? First, the loop checks to make sure that x is true. If x is not true, the loop is skipped. The loop body is then executed, followed by another check: is x still true? If x is still true, execution jumps back to the top of the loop, and execution continues. Keep in mind that there needs to be a jump at the bottom of the loop (to get back up to the top), but it makes no sense to jump back to the top, retest the conditional, and then jump back to the bottom of the loop if the conditional is found to be false. The while-loop, then, performs the following steps:

1. check the condition. If it is false, go to the end

2. perform the loop body

3. check the condition. If it is true, jump back to step 2

4. if the condition is not true, fall through the end of the loop.

Here is a while-loop in C code:

    while(x <= 10) { x++; }

If we were to translate that assembly code back into C, we would get the following code:

    if(x <= 10) //remember: in If statements, we reverse the condition from the asm
    {
        do { x++; } while(x <= 10);
    }

See why we covered the Do-While loop first? Because the While-loop becomes a Do-While when it gets assembled.

So why can't the jump label occur before the test?

        mov eax, $x
    beginning:
        cmp eax, 0x0A
        jg end
        inc eax
        jmp beginning
    end:
        mov $x, eax

3.8.4 For Loops

What is a For-Loop? In essence, it’s a While-Loop with an initial state, a condition, and an iterative instruction. For instance, the following generic For-Loop:

    for(initialization; condition; increment) { action; }

gets translated into the following pseudocode while-loop:

    initialization; while(condition) { action; increment; }

Which in turn gets translated into the following Do-While Loop:

    initialization; if(condition) { do { action; increment; } while(condition); }

Note that often in for() loops you assign an initial constant value in A (for example x = 0), and then compare that value with another constant in B (for example x < 10). Most optimizing compilers will be able to notice that the first time, x IS less than 10, and therefore there is no need for the initial if(B) statement. In such cases, the compiler will simply generate the following sequence:

    initialization; do { action; increment; } while(condition);

rendering the code indistinguishable from a while() loop.

C only has Do-While, While, and For Loops, but some other languages may very well implement their own types. Also, a good C programmer could easily “home brew” a new type of loop using a series of good macros, so they bear some consideration:

    do { //loop body } until(x);

which essentially becomes the following Do-While loop:

    do { //loop body } while(!x);
3.9. LOOP EXAMPLES 49
Until Loop sum in eax. The only parameter (located in [ebp + 8]) is
a pointer to an array of integer values. The comparison
Like the Do-Until loop, the standard Until-Loop looks between ebx and 100 indicates that the input array has
like the following: 100 entries in it. The pointer offset [esi + ebx * 4] shows
    until(x) { //loop body }

which (likewise) gets translated to the following While-Loop:

    while(!x) { //loop body }

Do-Forever Loop

A Do-Forever loop is simply an unqualified loop with a condition that is always true. For instance, the following pseudo-code:

    doforever { //loop body }

will become the following while-loop:

    while(1) { //loop body }

Which can actually be reduced to a simple unconditional jump statement:

    beginning:
    ;loop body
    jmp beginning

Notice that some non-optimizing compilers will produce nonsensical code for this:

    mov ax, 1
    cmp ax, 1
    jne loopend
    beginning:
    ;loop body
    cmp ax, 1
    je beginning
    loopend:

3.9.1 Example: Identify Purpose

What does this function do? What kinds of parameters does it take, and what kind of results (if any) does it return?

    push ebp
    mov ebp, esp
    mov esi, [ebp + 8]
    mov ebx, 0
    mov eax, 0
    mov ecx, 0
    _Label_1:
    mov ecx, [esi + ebx * 4]
    add eax, ecx
    inc ebx
    cmp ebx, 100
    jne _Label_1
    mov esp, ebp
    pop ebp
    ret 4

This function loops through an array of 4 byte integer values, pointed to by esi, and adds each entry. It returns the

that each entry in the array is 4 bytes wide.

3.9.2 Example: Complete C Prototype

What is this function's C prototype? Make sure to include parameters, return values, and calling convention.

    push ebp
    mov ebp, esp
    mov esi, [ebp + 8]
    mov ebx, 0
    mov eax, 0
    mov ecx, 0
    _Label_1:
    mov ecx, [esi + ebx * 4]
    add eax, ecx
    inc ebx
    cmp ebx, 100
    jne _Label_1
    mov esp, ebp
    pop ebp
    ret 4

Notice how the ret instruction cleans its parameter off the stack? That means that this function is an STDCALL function. We know that the function takes, as its only parameter, a pointer to an array of integers. We do not know, however, whether the integers are signed or unsigned, because the jne instruction is used for both types of values. We can assume one or the other, and for simplicity, we can assume unsigned values (unsigned and signed values, in this function, will actually work the same way). We also know that the return value is a 4-byte integer value, of the same type as is found in the parameter array. Since the function doesn't have a name, we can just call it "MyFunction", and we can call the parameter "array" because it is an array. From this information, we can determine the following prototype in C:

    unsigned int STDCALL MyFunction(unsigned int *array);

Starting with the function prototype above, and the description of what this function does, we can start to write the C code for this function. We know that this function initializes eax, ebx, and ecx before the loop. However, we can see that ecx is being used as simply an intermediate storage location, receiving successive values from the array, and then being added to eax.

We will create two unsigned integer values, a (for eax) and b (for ebx). We will define both a and b with the register qualifier, so that we can ask the compiler not to create space for them on the stack. For each loop iteration, we are adding the value of the array, at location
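The reconstruction described above can be sketched in C as follows. This is only one plausible reading of the disassembly, not the original source: the STDCALL macro is an assumed portability shim (it expands to nothing outside 32-bit MSVC builds), and the do/while form was chosen because the listing tests its condition at the bottom of the loop.

```c
/* STDCALL is a stand-in macro (an assumption for portability): it expands to
   __stdcall on 32-bit MSVC builds and to nothing elsewhere. */
#if defined(_MSC_VER) && defined(_M_IX86)
#define STDCALL __stdcall
#else
#define STDCALL
#endif

/* Sums the first 100 entries of the array, mirroring the disassembly:
   a plays the role of eax (running total), b the role of ebx (index). */
unsigned int STDCALL MyFunction(unsigned int *array)
{
    register unsigned int a = 0;   /* mov eax, 0 */
    register unsigned int b = 0;   /* mov ebx, 0 */

    do {
        a += array[b];             /* mov ecx, [esi + ebx * 4] / add eax, ecx */
        b++;                       /* inc ebx */
    } while (b != 100);            /* cmp ebx, 100 / jne _Label_1 */

    return a;                      /* the result is left in eax */
}
```

The caller must supply at least 100 entries, since the count is hardcoded in the listing's cmp ebx, 100.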
Data Patterns
4.1 Variables

4.1.1 Variables

We've already seen some mechanisms to create local storage on the stack. This chapter will talk about some other variables, including global variables, static variables, variables labeled "const," "register," and "volatile." It will also consider some general techniques concerning variables, including accessor and setter methods (to borrow from OO terminology). This section may also talk about setting memory breakpoints in a debugger to track memory I/O on a variable.

4.1.2 How to Spot a Variable

Variables come in 2 distinct flavors: those that are created on the stack (local variables), and those that are accessed via a hardcoded memory address (global variables). Any memory that is accessed via a hard-coded address is usually a global variable. Variables that are accessed as an offset from esp, or ebp are frequently local variables.

Hardcoded address: Anything hardcoded is a value that is stored as-is in the binary, and is not changed at runtime. For instance, the value 0x2054 is hardcoded, whereas the current value of variable X is not hard-coded and may change at runtime.

Example of a hardcoded address:

    mov eax, [0x77651010]

OR:

    mov ecx, 0x77651010
    mov eax, [ecx]

Example of a non-hardcoded (softcoded?) address:

    mov ecx, [esp + 4]
    add ecx, ebx
    mov eax, [ecx]

In the last example, the value of ecx is calculated at run-time, whereas in the first 2 examples, the value is the same every time. RVAs are considered hard-coded addresses, even though the loader needs to "fix them up" to point to the correct locations.

4.1.3 .BSS and .DATA sections

Both .bss and .data sections contain values which can change at run-time (e.g. variables). Typically, variables that are initialized to a non-zero value in the source are allocated in the .data section (e.g. "int a = 10;"). Variables that are not initialized, or initialized with a zero value, can be allocated to the .bss section (e.g. "int arr[100];"). Because all values of .bss variables are guaranteed to be zero at the start of the program, there is no need for the linker to allocate space in the binary file. Therefore, .bss sections do not take space in the binary file, regardless of their size.

4.1.4 "Static" Local Variables

Local variables labeled static maintain their value across function calls, and therefore cannot be created on the stack like other local variables are. How are static variables created? Let's take a simple example C function:

    void MyFunction(int a)
    {
        static int x = 0;
        printf("my number: ");
        printf("%d, %d\n", a, x);
    }

Compiling to a listing file with cl.exe gives us the following code:

    _BSS SEGMENT
    ?x@?1??MyFunction@@9@9 DD 01H DUP (?) ; `MyFunction'::`2'::x
    _BSS ENDS
    _DATA SEGMENT
    $SG796 DB 'my number: ', 00H
    $SG797 DB '%d, %d', 0aH, 00H
    _DATA ENDS
    PUBLIC _MyFunction
    EXTRN _printf:NEAR
    ; Function compile flags: /Odt
    _TEXT SEGMENT
    _a$ = 8 ; size = 4
    _MyFunction PROC NEAR
    ; Line 4
        push ebp
        mov ebp, esp
    ; Line 6
        push OFFSET FLAT:$SG796
        call _printf
        add esp, 4
    ; Line 7
        mov eax, DWORD PTR ?x@?1??MyFunction@@9@9
        push eax
        mov ecx, DWORD PTR _a$[ebp]
        push ecx
        push OFFSET FLAT:$SG797
        call _printf
        add esp, 12 ; 0000000cH
    ; Line 8
        pop ebp
        ret 0
    _MyFunction ENDP
    _TEXT ENDS

Normally when assembly listings are posted in this wikibook, most of the code gibberish is discarded to aid readability, but in this instance, the "gibberish" contains the answer we are looking for. As can be clearly seen, this function creates a standard stack frame, and it doesn't create any local variables on the stack. In the interests of being complete, we will take baby-steps here, and work to the conclusion logically.

In the code for Line 7, there is a call to _printf with 3 arguments. Printf is a standard libc function, and it therefore can be assumed to use the cdecl calling convention. Arguments are pushed, therefore, from right to left. Three arguments are pushed onto the stack before _printf is called:

• DWORD PTR ?x@?1??MyFunction@@9@9

4.1.5 Signed and Unsigned Variables

Integer formatted variables, such as int, char, short and long may be declared signed or unsigned variables in the C source code. There are two differences in how these variables are treated:

1. Signed variables use signed arithmetic instructions such as imul and idiv. Unsigned variables use unsigned arithmetic instructions such as mul and div. Addition and subtraction use the same add and sub instructions for both.

2. Signed variables use signed branch instructions such as jge and jl. Unsigned variables use unsigned branch instructions such as jae, and jb.

The difference between signed and unsigned branch instructions is which flags they test: signed branches test the sign and overflow flags, while unsigned branches test the carry flag. For add and sub, the integer result values are exactly the same for both signed and unsigned data.
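The signed/unsigned distinction above can be illustrated with a minimal C sketch. The function names are made up for this example, and the instruction names in the comments describe what a compiler typically emits, not guaranteed output. The cast to int32_t relies on the two's-complement conversion that x86 compilers perform in practice.

```c
#include <stdint.h>

/* Returns 1 if a < b when both bit patterns are read as signed values.
   Usually compiled to cmp followed by a signed condition such as jl. */
int less_signed(uint32_t a, uint32_t b)
{
    return (int32_t)a < (int32_t)b;
}

/* Returns 1 if a < b when both are read as unsigned values.
   Usually compiled to cmp followed by an unsigned condition such as jb. */
int less_unsigned(uint32_t a, uint32_t b)
{
    return a < b;
}

/* Addition produces the same 32-bit pattern for signed and unsigned data,
   which is why a single add instruction serves both. */
uint32_t add_bits(uint32_t a, uint32_t b)
{
    return a + b;
}
```

With the pattern 0xFFFFFFFF, the signed comparison sees -1 (less than 1) while the unsigned comparison sees 4294967295 (not less than 1), yet adding 1 yields 0 in both interpretations.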
Because they are so simple, accessor methods are frequently heavily optimized (they generally don't need a stack frame), and are even occasionally inlined by the compiler.

4.2.1 Example: Identify C++ Code

Can you tell what the original C++ source code looks like, in general, for the following accessor method?

    push ebp
    mov ebp, esp
    mov eax, [ecx + 8] ;THISCALL function, passes "this" pointer in ecx
    mov esp, ebp
    pop ebp
    ret

We don't know the name of the class, so we will use a generic name MyClass (or whatever you would like to call it). We will lay out a simple class definition that contains a data value at offset +8. Offset +8 is the only data value accessed, so we don't know what the first 8 bytes of data look like, but we will just assume (for our purposes) that our class looks like this:

    class MyClass {
        int value1;
        int value2;
        int value3; //offset +8
        ...
    }

We will then create our function, which I will call "GetValue3()". We know that the data value being accessed is located at [ecx+8] (which we have defined above to be "value3"). Also, we know that the data is being read into a 4-byte register (eax), and is not truncated. We can assume, therefore, that value3 is a 4-byte data value. We can use the this pointer as the pointer value stored in ecx, and we can take the element that is at offset +8 from that pointer (value3):

    int MyClass::GetValue3() { return this->value3; }

    push ebp
    mov ebp, esp

The next two lines of code compare the value of [ebp + 8] (which we know to be the first parameter) to zero. If [ebp+8] is zero, the function jumps to the label "error". We see that the label "error" sets eax to 0, and returns. We haven't seen it before, but this looks conspicuously like an if statement. "If the parameter is zero, return zero".

If, on the other hand, the parameter is not zero, we move the value into eax, and then move the value into [ecx + 0], which we know as the first data field in MyClass. We also see, from this code, that this first data field must be 4 bytes long (because we are using eax). After we move eax into [ecx + 0], we set eax to 1 and jump to the end of the function.

If we use the same MyClass definition as in question 1, above, we can get the following code for our function, "SetValue1(int val)":

    int MyClass::SetValue1(int val)
    {
        if(val == 0) return 0;
        this->value1 = val;
        return 1;
    }

Notice that since we are returning a 0 on failure, and a 1 on success, the function looks like it has a bool return value. However, the return value is 4 bytes wide (eax is used), but the size of a bool is implementation-specific,
so we can't be sure. The bool is usually defined to have a size of 1 byte, but it is often stored the same way as an int.

4.3 Data Structures

Arrays are simply a storage scheme for multiple data objects of the same type. Data objects are stored sequentially, often as an offset from a pointer to the beginning of the array. Consider the following C code:

    x = array[25];

Which corresponds to the following asm pseudo-code (the index is used unscaled here, which is only exact for 1-byte elements):

    mov ebx, $array
    mov eax, [ebx + 25]
    mov $x, eax

Now, consider the following example:

    int MyFunction1() { int array[20]; ...

This (roughly) translates into the following asm pseudo-code:

    :_MyFunction1
    push ebp
    mov ebp, esp
    sub esp, 80 ;the whole array is created on the stack!!!
    lea $array, [esp + 0] ;a pointer to the array is saved in the array variable
    ...

The entire array is created on the stack, and the pointer to the bottom of the array is stored in the variable "array". An optimizing compiler could ignore the last instruction, and simply refer to the array via a +0 offset from esp (in this example), but we will do things verbosely.

Likewise, consider the following example:

    void MyFunction2() { char buffer[4]; ...

This will translate into the following asm pseudo-code:

    :_MyFunction2
    push ebp
    mov ebp, esp
    sub esp, 4
    lea $buffer, [esp + 0]
    ...

Which looks harmless enough. But, what if a program inadvertently accesses buffer[4]? What about buffer[5]? What about buffer[8]? These are the makings of a buffer overflow vulnerability, which might be discussed in a later section. However, this section won't talk about security issues, and instead will focus only on data structures.

Spotting an Array on the Stack

Spotting an Array in Memory

Arrays in memory, such as global arrays, or arrays which have initial data (remember, initialized data is created in the .data section in memory), will be accessed as offsets from a hardcoded address in memory:

    :_MyFunction4
    push ebp
    mov ebp, esp
    mov esi, 0x77651004
    mov ebx, 0x00000000
    mov [esi + ebx], 0x00

It needs to be kept in mind that structures and classes might be accessed in a similar manner, so the reverser needs to remember that all the data objects in an array are of the same type, that they are sequential, and they will often be handled in a loop of some sort. Also (and this might be the most important part), each element in an array may be accessed by a variable offset from the base.

Since most times an array is accessed through a computed index, not through a constant, the compiler will likely use the following to access an element of the array:

    mov [ebx + eax], 0x00

If the array holds elements larger than 1 byte (for char), the index will need to be multiplied by the size of the element, yielding code similar to the following:

    mov [ebx + eax * 4], 0x11223344 ; access to an array of DWORDs, e.g. arr[i] = 0x11223344
    ...
    imul eax, 20 ; access to an array of structs, each 20 bytes long
    lea edi, [ebx + eax] ; e.g. ptr = &arr[i]

This pattern can be used to distinguish between accesses
to arrays and accesses to structure data members.

4.3.3 Structures

All C programmers are going to be familiar with the following syntax:

    struct MyStruct {
        int FirstVar;
        double SecondVar;
        unsigned short int ThirdVar;
    }

It's called a structure (Pascal programmers may know a similar concept as a "record").

Structures may be very big or very small, and they may contain all sorts of different data. Structures may look very similar to arrays in memory, but a few key points need to be remembered: structures do not need to contain data fields of all the same type, structure fields are often 4-byte aligned (not sequential), and each element in a structure has its own offset. It therefore makes no sense to reference a structure element by a variable offset from the base.

Take a look at the following structure definition:

    struct MyStruct2 {
        long value1;
        short value2;
        long value3;
    }

Assuming the pointer to the base of this structure is loaded into ebx, we can access these members in one of two schemes:

The first arrangement is the most common, but it clearly leaves open an entire memory word (2 bytes) at offset +6, which is not used at all. Compilers occasionally allow the programmer to manually specify the offset of each data member, but this isn't always the case. The second example also has the benefit that the reverser can easily identify that each data member in the structure is a different size.

Consider now the following function:

    :_MyFunction
    push ebp
    mov ebp, esp
    mov ecx, SS:[ebp + 8]
    mov [ecx + 0], 0x0A
    mov [ecx + 4], ecx
    mov [ecx + 8], 0x0A
    mov esp, ebp
    pop ebp

The function clearly takes a pointer to a data structure as its first argument. Also, each data member is the same size (4 bytes), so how can we tell if this is an array or a structure? To answer that question, we need to remember one important distinction between structures and arrays: the elements in an array are all of the same type, the elements in a structure do not need to be the same type. Given that rule, it is clear that one of the elements in this structure is a pointer (it points to the base of the structure itself!) and the other two fields are loaded with the hex value 0x0A (10 in decimal), which is certainly not a valid pointer on any system I have ever used. We can then partially recreate the structure and the function code below:

    struct MyStruct3 {
        long value1;
        void *value2;
        long value3;
    };

    void MyFunction2(struct MyStruct3 *ptr)
    {
        ptr->value1 = 10;
        ptr->value2 = ptr;
        ptr->value3 = 10;
    }

As a quick aside note, notice that this function doesn't load anything into eax, and therefore it doesn't return a value.

4.3.4 Advanced Structures

Let's say we have the following situation in a function:

    :MyFunction1
    push ebp
    mov ebp, esp
    mov esi, [ebp + 8]
    lea ecx, SS:[esi + 8]
    ...

What is happening here? First, esi is loaded with the value of the function's first parameter (ebp + 8). Then, ecx is loaded with a pointer to the offset +8 from esi. It looks like we have 2 pointers accessing the same data structure!

The function in question could easily be one of the following 2 prototypes:

    struct MyStruct1 {
        DWORD value1;
        DWORD value2;
        struct MySubStruct1 {
        ...

    struct MyStruct2 {
        DWORD value1;
        DWORD value2;
        DWORD array[LENGTH];
        ...

One pointer offset from another pointer in a structure often means a complex data structure. There are far too many combinations of structures and arrays, however, so this wikibook will not spend too much time on this subject.

4.3.5 Identifying Structs and Arrays

Array elements and structure fields are both accessed as offsets from the array/structure pointer. When disassembling, how do we tell these data structures apart? Here are some pointers:

1. Array elements are not meant to be accessed individually. Array elements are typically accessed using a variable offset.

2. Arrays are frequently accessed in a loop. Because arrays typically hold a series of similar data items, the best way to access them all is usually a loop. Specifically, for(x = 0; x < length_of_array; x++) style loops are often used to access arrays, although there can be others.

3. All the elements in an array have the same data type.

4. Struct fields are typically accessed using constant offsets.

5. Struct fields are typically not accessed in order, and are also not accessed using loops.
6. Struct fields are not typically all the same data type, or the same data width.

4.3.6 Linked Lists and Binary Trees

Figure 3: This shows how to offset a pointer to retrieve variables. The first line places the address of variable 'a' into eax. The second line places the address of variable 'b' into ebx. And the last line places the variable 'c' into ecx.

Methods

At a low level, there is almost no difference between a function and a method. When decompiling, it can sometimes be hard to tell the difference between the two. They both reside in the text memory space, and both are called the same way. An example of how a method is called can be seen in Figure 4.

A:

    //method call
    abc123->foo(1, 2, 3);

B:

    push 3 ; int c
    push 2 ; int b
    push 1 ; int a
    push [ebp-4] ; the address of the object
    call 0x00434125 ; call to method

A notable characteristic in a method call is the address of the object being passed in as an argument. This, however, is not always a good indicator. Figure 5 shows a function with the first argument being an object passed in by reference. The result is a function that looks identical to a method call.

A:

    //function call
    foo(abc123, 1, 2, 3);

B:

    push 3 ; int c
    push 2 ; int b
    push 1 ; int a
    push [ebp+4] ; the address of the object
    call 0x00498372 ; call to function

When you start adding in inheritance and polymorphism, things get a little more complicated. For the purposes of simplicity, the structure of an object will be described in terms of having no inheritance. At the end, however, inheritance and polymorphism will be covered.

The abstract class A acts as a blueprint for the compiler, defining an expected structure for any class that inherits it. Every variable defined in class A and every virtual method defined in A will have the exact same offset for any of its children. Figure 7 declares a possible inheritance scheme as well as its structure in memory. Notice how the offset to C::one is the same as D::one, and the offset to C's copy of A::a is the same as D's copy. In this way, our polymorphic loop can just iterate through the array of pointers and know exactly where to find each method.

A:

    class A{ public: int a; virtual void one() = 0; };
    class B{ public: int b; int c; virtual void two() = 0; };
    class C: public A{ public: int d; void one(); };
    class D: public A, public B{ public: int e; void one(); void two(); };

B:

    ;Object C
    0x00200000 dd 0x00423848 ; address of C::one ;offset = 0*word_size
    0x00200004 dd 1 ; C's copy of A::a ;offset = 2*word_size
    0x00200008 dd 4 ; C::d ;offset = 4*word_size
    ;Object D
    0x00200100 dd 0x00412348 ; address of D::one ;offset = 0*word_size
    0x00200104 dd 1 ; D's copy of A::a ;offset = 2*word_size
    0x00200108 dd 0x00431255 ; address of D::two ;offset = 4*word_size
    0x0020010C dd 2 ; D's copy of B::b ;offset = 6*word_size
    0x00200110 dd 3 ; D's copy of B::c ;offset = 8*word_size
    0x00200114 dd 5 ; D::e ;offset = 10*word_size

4.4.3 Classes Vs. Structs

4.5 Floating Point Numbers
4.5.2 Calling Conventions

With the addition of the floating-point stack, there is an entirely new dimension for passing parameters and returning values. We will examine our calling conventions here, and see how they are affected by the presence of floating-point numbers. These are the functions that we will be assembling, using both GCC, and cl.exe:

    __cdecl double MyFunction1(double x, double y, float z)
    {
        return (x + 1.0) * (y + 2.0) * (z + 3.0);
    }

    __fastcall double MyFunction2(double x, double y, float z)
    {
        return (x + 1.0) * (y + 2.0) * (z + 3.0);
    }

    __stdcall double MyFunction3(double x, double y, float z)
    {
        return (x + 1.0) * (y + 2.0) * (z + 3.0);
    }

CDECL

Here is the cl.exe assembly listing for MyFunction1:

    PUBLIC _MyFunction1
    PUBLIC __real@3ff0000000000000
    PUBLIC __real@4000000000000000
    PUBLIC __real@4008000000000000
    EXTRN __fltused:NEAR
    ; COMDAT __real@3ff0000000000000
    CONST SEGMENT
    __real@3ff0000000000000 DQ 03ff0000000000000r ; 1
    CONST ENDS
    ; COMDAT __real@4000000000000000
    CONST SEGMENT
    __real@4000000000000000 DQ 04000000000000000r ; 2
    CONST ENDS
    ; COMDAT __real@4008000000000000
    CONST SEGMENT
    __real@4008000000000000 DQ 04008000000000000r ; 3
    CONST ENDS
    _TEXT SEGMENT
    _x$ = 8 ; size = 8
    _y$ = 16 ; size = 8
    _z$ = 24 ; size = 4
    _MyFunction1 PROC NEAR
    ; Line 2
        push ebp
        mov ebp, esp
    ; Line 3
        fld QWORD PTR _x$[ebp]
        fadd QWORD PTR __real@3ff0000000000000
        fld QWORD PTR _y$[ebp]
        fadd QWORD PTR __real@4000000000000000
        fmulp ST(1), ST(0)
        fld DWORD PTR _z$[ebp]
        fadd QWORD PTR __real@4008000000000000
        fmulp ST(1), ST(0)
    ; Line 4
        pop ebp
        ret 0
    _MyFunction1 ENDP
    _TEXT ENDS

Our first question is this: are the parameters passed on the stack, or on the floating-point register stack, or some place different entirely? Key to this question, and to this function is a knowledge of what fld and fstp do. fld (Floating-point Load) pushes a floating point value onto the FPU stack, while fstp (Floating-Point Store and Pop) moves a floating point value from ST0 to the specified location, and then pops the value from ST0 off the stack entirely. Remember that double values in cl.exe are treated as 8-byte storage locations (QWORD), while floats are only stored as 4-byte quantities (DWORD). It is also important to remember that floating point numbers are not stored in a human-readable form in memory, even if the reader has a solid knowledge of binary. Remember, these aren't integers. Unfortunately, the exact format of floating point numbers is well beyond the scope of this chapter.

x is offset +8, y is offset +16, and z is offset +24 from ebp. Therefore, z is pushed first, x is pushed last, and the parameters are passed right-to-left on the regular stack, not the floating point stack. To understand how a value is returned however, we need to understand what fmulp does. fmulp is the "Floating-Point Multiply and Pop" instruction. It performs the instructions:

    ST1 := ST1 * ST0
    FPU POP ST0

This multiplies ST(1) and ST(0) and stores the result in ST(1). Then, ST(0) is marked empty and stack pointer is incremented. Thus, contents of ST(1) are on the top of the stack. So the top 2 values are multiplied together, and the result is stored on the top of the stack. Therefore, in our instruction above, "fmulp ST(1), ST(0)", which is also the last instruction of the function, we can see that the last result is stored in ST0. Therefore, floating point parameters are passed on the regular stack, but floating point results are passed on the FPU stack.

One final note is that MyFunction2 cleans its own stack, as referenced by the ret 20 command at the end of the listing. Because none of the parameters were passed in registers, this function appears to be exactly what we would expect an STDCALL function would look like: parameters passed on the stack from right-to-left, and the function cleans its own stack. We will see below that this is actually a correct assumption.

For comparison, here is the GCC listing:

    LC1:
        .long 0
        .long 1073741824
        .align 8
    LC2:
        .long 0
        .long 1074266112
    .globl _MyFunction1
        .def _MyFunction1; .scl 2; .type 32; .endef
    _MyFunction1:
        pushl %ebp
        movl %esp, %ebp
        subl $16, %esp
        fldl 8(%ebp)
        fstpl -8(%ebp)
        fldl 16(%ebp)
        fstpl -16(%ebp)
        fldl -8(%ebp)
        fld1
        faddp %st, %st(1)
        fldl -16(%ebp)
        fldl LC1
        faddp %st, %st(1)
        fmulp %st, %st(1)
        flds 24(%ebp)
        fldl LC2
        faddp %st, %st(1)
        fmulp %st, %st(1)
        leave
        ret
        .align 8

This is a very difficult listing, so we will step through it (albeit quickly). 16 bytes of extra space is allocated on the stack. Then, using a combination of fldl and fstpl instructions, the first 2 parameters are moved from offsets +8 and +16, to offsets -8 and -16 from ebp. Seems like a waste of time, but remember, optimizations are off. fld1 loads the floating point value 1.0 onto the FPU stack. faddp then adds the top of the stack (1.0), to the value in ST1 ([ebp - 8], originally [ebp + 8]).

FASTCALL

Here is the cl.exe listing for MyFunction2:

    PUBLIC @MyFunction2@20 PUB-
    z) { return (x + 1.0) * (y + 2.0) * (z + 3.0); }

        .align 8
    LC5:
        .long 0
        .long 1073741824
        .align 8
    LC6:
        .long 0
        .long 1074266112
    .globl @MyFunction2@20
        .def @MyFunction2@20; .scl 2; .type 32; .endef
    @MyFunction2@20:
        pushl %ebp
        movl %esp, %ebp
        subl $16, %esp
        fldl 8(%ebp)
        fstpl -8(%ebp)
        fldl 16(%ebp)
        fstpl -16(%ebp)
        fldl -8(%ebp)
        fld1
        faddp %st, %st(1)
        fldl -16(%ebp)
        fldl LC5
        faddp %st, %st(1)
        fmulp %st, %st(1)
        flds 24(%ebp)
        fldl LC6
        faddp %st, %st(1)
        fmulp %st, %st(1)
        leave
        ret $20

For this, we don't even need a floating-point number calculator, although you are free to use one if you wish (and if you can find a good one). LC5 is added to [ebp - 16], which we know to be y, and LC6 is added to [ebp + 24], which we know to be z. Therefore, LC5 is the number "2.0", and LC6 is the number "3.0". Notice that the fld1 instruction automatically loads the top of the floating-point stack with the constant value "1.0".

is cleaning exactly 20 bytes off the stack which is, incidentally, the total amount that we passed to begin with. We can also notice that the implementation of this function looks exactly like the FASTCALL version of this function. This is true because FASTCALL only passes DWORD-sized parameters in registers, and floating point numbers do not qualify. This means that our assumption above was correct.

Here is the GCC listing for MyFunction3:

        .align 8
    LC9:
        .long 0
        .long 1073741824
        .align 8
    LC10:
        .long 0
        .long 1074266112
    .globl @MyFunction3@20
        .def @MyFunction3@20; .scl 2; .type 32; .endef
    @MyFunction3@20:
        pushl %ebp
        movl %esp, %ebp
        subl $16, %esp
        fldl 8(%ebp)
        fstpl -8(%ebp)
        fldl 16(%ebp)
        fstpl -16(%ebp)
        fldl -8(%ebp)
        fld1
        faddp %st, %st(1)
        fldl -16(%ebp)
        fldl LC9
        faddp %st, %st(1)
        fmulp %st, %st(1)
        flds 24(%ebp)
        fldl LC10
        faddp %st, %st(1)
        fmulp %st, %st(1)
        leave
        ret $20

Here we can also see, after all the opening nonsense, that [ebp - 8] (originally [ebp + 8]) is value x, and that [ebp + 24] is value z. These parameters are therefore passed right-to-left. Also, we can deduce from the final fmulp instruction that the result is passed in ST0. Again, the STDCALL function cleans its own stack, as we would expect.
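The ".long" pairs in the GCC listings (for example LC9 and LC10) can be checked without a floating-point calculator. The following is a minimal sketch: the helper name double_from_longs is invented for this example, and it assumes doubles are IEEE 754 binary64 and that the first .long is the low 32 bits, as on little-endian x86.

```c
#include <stdint.h>
#include <string.h>

/* Rebuilds a double from the two 32-bit words of a GCC ".long lo / .long hi"
   pair, assuming the x86 little-endian layout (low word stored first). */
double double_from_longs(uint32_t lo, uint32_t hi)
{
    uint64_t bits = ((uint64_t)hi << 32) | lo;
    double value;
    memcpy(&value, &bits, sizeof value);   /* reinterpret the bit pattern */
    return value;
}
```

Under those assumptions, the pair (0, 1073741824) is the bit pattern 0x4000000000000000, which is 2.0, and the pair (0, 1074266112) is 0x4008000000000000, which is 3.0. These are the same values that appear in the __real@ symbol names of the cl.exe listing.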
Conclusions
Difficulties
5.1.1 Code Optimization

An optimizing compiler is perhaps one of the most complicated, most powerful, and most interesting programs in existence. This chapter will talk about optimizations, although this chapter will not include a table of common optimizations.

Another set of optimizations which can be performed either at the intermediate or at the code generation level are control flow optimizations. Most of these optimizations deal with the elimination of useless branches. Consider the following code:

    if(A) {
        if(B) { C; } else { D; }
        end_B:
    } else { E; }
    end_A:
will probably never see one used in real code.

What about the instruction:

    mov eax, 0

The mov instruction is relatively quick, but a faster part of the processor is the arithmetic unit. Therefore, it makes more sense to use the following instruction:

    xor eax, eax

because xor operates in very few processor cycles (and saves three bytes at the same time), and is therefore faster than a "mov eax, 0". The only drawback of a xor instruction is that it changes the processor flags, so it cannot be used between a comparison instruction and the corresponding conditional jump.

5.1.3 Loop Unwinding

When a loop needs to run for a small, but definite number of iterations, it is often better to unwind the loop in order to reduce the number of jump instructions performed, and in many cases prevent the processor's branch predictor from failing. Consider the following C loop, which calls the function MyFunction() 5 times:

    for(x = 0; x < 5; x++) { MyFunction(); }

5.1.4 Inline Functions

The C and C++ languages allow the definition of an inline type of function. Inline functions are functions which are treated similarly to macros. During compilation, calls to an inline function are replaced with the body of that function, instead of performing a call instruction. In addition to using the inline keyword to declare an inline function, optimizing compilers may decide to make other functions inline as well.

Function inlining works similarly to loop unwinding for increasing code performance. A non-inline function requires a call instruction, several instructions to create a stack frame, and then several more instructions to destroy the stack frame and return from the function. By copying the body of the function instead of making a call, the size of the machine code increases, but the execution time decreases.

It is not necessarily possible to determine whether identical portions of code were created originally as macros, inline functions, or were simply copy and pasted. However, when disassembling it can make your work easier to separate these blocks out into separate inline functions, to help keep the code straight.

5.2 Optimization Examples
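As a small sketch of the loop unwinding described in section 5.1.3, the two functions below do the same work, first as the loop from the text and then unwound by hand. MyFunction() here is a hypothetical stand-in that only counts its calls, and run_rolled and run_unwound are names made up for this example. An optimizing compiler may or may not perform this transformation on its own.

```c
/* Hypothetical stand-in for the MyFunction() called in the text:
   it only counts how many times it has been invoked. */
static int call_count = 0;

void MyFunction(void)
{
    call_count++;
}

/* The loop as written in the source: each iteration pays for an
   increment, a compare, and a conditional jump. */
void run_rolled(void)
{
    int x;
    for (x = 0; x < 5; x++) {
        MyFunction();
    }
}

/* The same work after unwinding: no counter, no compare, no jump. */
void run_unwound(void)
{
    MyFunction();
    MyFunction();
    MyFunction();
    MyFunction();
    MyFunction();
}
```

In a disassembly, the unwound form shows up as five consecutive call instructions with no loop counter, which is a hint that the source may originally have contained a short fixed-count loop.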
_n$[ebp] mov DWORD PTR _r$[ebp], edx ; Line 8 cmp assigns storage in the function, and readily discards
DWORD PTR _r$[ebp], 0 jne SHORT $L479 ; Line 10 values that are not needed.
mov eax, DWORD PTR _n$[ebp] jmp SHORT $L473
$L479: ; Line 12 mov ecx, DWORD PTR _n$[ebp]
mov DWORD PTR _m$[ebp], ecx ; Line 13 mov edx, 5.2.2 Example: Manual Optimization
DWORD PTR _r$[ebp] mov DWORD PTR _n$[ebp],
edx ; Line 14 jmp SHORT $L477 $L473: ; Line 15 mov The following lines of assembly code are not optimized,
esp, ebp pop ebp ret 0 _EuclidsGCD ENDP _TEXT but they can be optimized very easily. Can you find a way
ENDS END to optimize these lines?
mov eax, 1 test eax, eax je SHORT $L473
Notice how there is a very clear correspondence between
the lines of C code, and the lines of the ASM code. the The code in this line is the code generated for the “while(
addition of the "; line x” directives is very helpful in that 1 )" C code, to be exact, it represents the loop break con-
respect. dition. Because this is an infinite loop, we can assume
Next, we compile the same function using a series of op- that these lines are unnecessary.
timizations to stress speed over size: “mov eax, 1” initializes eax.
cl.exe /Tceuclids.c /Fa /Ogt2

and we produce the following listing:

PUBLIC _EuclidsGCD
_TEXT SEGMENT
_m$ = 8 ; size = 4
_n$ = 12 ; size = 4
_EuclidsGCD PROC NEAR
; Line 7
    mov eax, DWORD PTR _m$[esp-4]
    push esi
    mov esi, DWORD PTR _n$[esp]
    cdq
    idiv esi
    mov ecx, edx
; Line 8
    test ecx, ecx
    je SHORT $L563
$L547:
; Line 12
    mov eax, esi
    cdq
    idiv ecx
; Line 13
    mov esi, ecx
    mov ecx, edx
    test ecx, ecx
    jne SHORT $L547
$L563:
; Line 10
    mov eax, esi
    pop esi
; Line 15
    ret 0
_EuclidsGCD ENDP
_TEXT ENDS
END

As you can see, the optimized version is significantly shorter than the non-optimized version. Some of the key differences include:

• The optimized version does not prepare a standard stack frame. This is important to note, because new reversers often assume that functions always start and end with proper stack frames, and this is clearly not the case. EBP isn't being used, ESP isn't being altered (because the local variables are kept in registers, and not put on the stack), and no subfunctions are called. Five instructions are cut by this.

• The “test EAX, EAX” series of instructions in the non-optimized output, under “; line 4”, is all unnecessary. The while-loop is defined by “while(1)” and therefore the loop always continues. This extra code is safely cut out. Notice also that there is no unconditional jump in the loop as would be expected: the “if(r == 0) return n;” statement has become the new loop condition.

• The structure of the function is altered greatly: the division of m and n to produce q and r is performed in this function twice: once at the beginning of the function to initialize, and once at the end of the loop. Also, the value of r is tested twice, in the same places. The compiler is very liberal with how it

the test immediately afterwards tests the value of eax to ensure that it is nonzero. Because eax will always be nonzero (eax = 1) at this point, the conditional jump can be removed along with the “mov” and the “test”.

The assembly is actually checking whether 1 equals 1. Another fact is that the C code for an infinite FOR loop:

for( ; ; ) { ... }

would not create such meaningless assembly code to begin with, and is logically the same as “while( 1 )”.

5.2.3 Example: Trace Variables

Here are the C code and the optimized assembly listing from the EuclidsGCD function, from the example above. Can you determine which registers contain the variables r and q?

/*line 1*/
int EuclidsGCD(int m, int n) /*we want to find the GCD of m and n*/
{
    int q, r; /*q is the quotient, r is the remainder*/
    while(1)
    {
        q = m / n; /*find q and r*/
        r = m % n;
        if(r == 0) /*if r is 0, return our n value*/
        {
            return n;
        }
        m = n; /*set m to the current n value*/
        n = r; /*set n to our current remainder value*/
    } /*repeat*/
}

PUBLIC _EuclidsGCD
_TEXT SEGMENT
_m$ = 8 ; size = 4
_n$ = 12 ; size = 4
_EuclidsGCD PROC NEAR
; Line 7
    mov eax, DWORD PTR _m$[esp-4]
    push esi
    mov esi, DWORD PTR _n$[esp]
    cdq
    idiv esi
    mov ecx, edx
; Line 8
    test ecx, ecx
    je SHORT $L563
$L547:
; Line 12
    mov eax, esi
    cdq
    idiv ecx
; Line 13
    mov esi, ecx
    mov ecx, edx
    test ecx, ecx
    jne SHORT $L547
$L563:
; Line 10
    mov eax, esi
    pop esi
; Line 15
    ret 0
_EuclidsGCD ENDP
_TEXT ENDS
END

At the beginning of the function, eax contains m, and esi contains n. When the instruction “idiv esi” is executed, eax contains the quotient (q), and edx contains the remainder (r). The instruction “mov ecx, edx” moves r into ecx, while q is not used for the rest of the loop, and is
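
One way to check a register trace of the optimized EuclidsGCD listing is to transcribe it into C, using one local variable per register. The following is only a rough sketch of that idea: the function name euclids_gcd_traced and the register-named variables are illustrative assumptions, not compiler output. It shows the division being performed once before the loop and once at the end of each iteration, with the test of r acting as the loop condition.

```c
/* Hand transcription of the optimized listing; names are hypothetical. */
int euclids_gcd_traced(int m, int n)
{
    int eax = m;   /* mov eax, _m$           : eax holds m            */
    int esi = n;   /* mov esi, _n$           : esi holds n            */
    int ecx, edx;

    edx = eax % esi;   /* cdq / idiv esi     : edx = r = m % n        */
    eax = eax / esi;   /*                      eax = q = m / n        */
    ecx = edx;         /* mov ecx, edx       : r is kept in ecx       */

    while (ecx != 0) { /* test ecx, ecx / je, jne : loop while r != 0 */
        eax = esi;         /* mov eax, esi   : dividend = current n   */
        edx = eax % ecx;   /* cdq / idiv ecx : new remainder          */
        eax = eax / ecx;   /*                  new quotient (unused)  */
        esi = ecx;         /* mov esi, ecx   : n = old r              */
        ecx = edx;         /* mov ecx, edx   : r = new remainder      */
    }

    eax = esi;     /* mov eax, esi           : return value is n      */
    return eax;
}
```

Reading the sketch next to the listing, r lives in edx immediately after each idiv and in ecx afterwards, while q appears in eax but is overwritten without ever being read.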
5.3 Code Obfuscation
of obfuscations. Notice that many code optimizations (discussed in the previous chapter) have the side-effect of making code more difficult to read, and therefore optimizations act as obfuscations.

• Code instructions that are put in a hard-to-read order.

• Code instructions which are used in a non-obvious way.

This chapter will try to examine some common methods of obfuscating code, but will not necessarily delve into methods to break the obfuscation.

Port0 Double-speed integer arithmetic, floating point load, memory store

Port1 Double-speed integer arithmetic, floating point arithmetic

Notice however that writing to memory is particularly slow (requiring the address to be sent by Port3, and the data itself to be written by Port0). Floating point numbers need to be loaded to the FPU before they can be operated on, so a floating point load and a floating point arithmetic instruction cannot operate on a single value in a single instruction cycle. Therefore, it is not uncommon to see floating point values loaded, integer values manipulated, and then the floating point values operated on.
Code transformations are a way of reordering code so that it performs exactly the same task but becomes more difficult to trace and disassemble. We can best demonstrate this technique by example. Let’s say that we have two functions, FunctionA and FunctionB. Each function consists of three separate parts, which are performed in order. We can break this down as such:

FunctionA()
{
    FuncAPart1();
    FuncAPart2();
    FuncAPart3();
}
FunctionB()
{
    FuncBPart1();
    FuncBPart2();
    FuncBPart3();
}

And we have our main program, which executes the two functions:

main()
{
    FunctionA();
    FunctionB();
}

Now, we can rearrange these snippets into a form that is much more complicated (in assembly):

main:
    jmp FAP1
FBP3:
    call FuncBPart3
    jmp end
FBP1:
    call FuncBPart1
    jmp FBP2
FAP2:
    call FuncAPart2
    jmp FAP3
FBP2:
    call FuncBPart2
    jmp FBP3
FAP1:
    call FuncAPart1
    jmp FAP2
FAP3:
    call FuncAPart3
    jmp FBP1
end:

To disassemble an encrypted executable, you must first determine how the code is being decrypted. Code can be decrypted in one of two primary ways:

1. All at once. The entire code portion is decrypted in a single pass, and left decrypted during execution. Using a debugger, allow the decryption routine to run completely, and then dump the decrypted code into a file for further analysis.

2. By Block. The code is encrypted in separate blocks, where each block may have a separate encryption key. Blocks may be decrypted before use, and re-encrypted after use. Using a debugger, you can attempt to capture all the decryption keys and then use those keys to decrypt the entire program at once later, or you can wait for the blocks to be decrypted, and then dump the blocks individually to a separate file for analysis.

5.4 Debugger Detectors
5.4.3 PEB Debugger Check

The Process Environment Block stores the value that IsDebuggerPresent queries to determine its return value. To avoid suspicion, some programmers access the value directly from the PEB instead of calling the API function. The following code snippet shows how to access the value:

    mov eax, [fs:0x30]
    mov al, [eax+2]
    test al, al
    jne @DebuggerDetected

To detect SoftICE, there are a number of techniques that can be used:

1. Search for the SoftICE install directory. If SoftICE is installed, the user is probably a hacker or a reverser.

2. Detect the presence of int 1. SoftICE uses interrupt 1 to debug, so if interrupt 1 is installed, SoftICE is running.
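
As a rough illustration of the PEB check from section 5.4.3, the same test can be written in C. This is a portable sketch, not Windows code: the struct peb_prefix type and the function peb_reports_debugger are hypothetical names, and the struct models only the first three bytes of the PEB, with the flag at offset 2 as in the “mov al, [eax+2]” instruction. On a real 32-bit Windows process the pointer would come from fs:[0x30]; here the caller passes it in, so the logic can be exercised anywhere.

```c
#include <stddef.h>

/* Illustrative model of the start of the PEB (assumption: only the
   first three bytes matter for this check). */
struct peb_prefix {
    unsigned char Reserved1[2];   /* offsets 0 and 1 */
    unsigned char BeingDebugged;  /* offset 2: flag read by the check */
};

/* Returns nonzero if the byte at offset 2 of the given block is set,
   mirroring: mov al, [eax+2] / test al, al / jne @DebuggerDetected */
int peb_reports_debugger(const void *peb)
{
    const struct peb_prefix *p = peb;
    return p->BeingDebugged != 0;
}
```

Keeping the pointer as a parameter separates how the PEB address is obtained, which is platform specific, from the byte test itself.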
5.4.5 Timeouts
Debuggers can put breakpoints in the code, and can therefore stop program execution. A program can detect this by monitoring the system clock. If too much time has elapsed between instructions, the program can infer that it is being stopped and analyzed (although this is not always the case). If execution is taking too much time, the program can terminate.

Notice that preemptive multithreading systems, such as modern Windows or Linux, will switch away from your program to run other programs. This is called thread switching. If the system has many threads to run, or if some threads are hogging processor time, your program may detect a long delay and falsely conclude that it is being debugged.
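
The timeout idea described in this section can be sketched in C as follows. This is a minimal sketch under stated assumptions: it uses the POSIX clock_gettime call with CLOCK_MONOTONIC rather than any Windows API, and the names now_ns, delay_is_suspicious, timing_check and the 50 ms threshold SUSPICIOUS_DELAY_NS are all hypothetical choices made for illustration.

```c
#define _POSIX_C_SOURCE 199309L
#include <stdint.h>
#include <time.h>

/* Hypothetical threshold: treat anything over 50 ms as suspicious. */
#define SUSPICIOUS_DELAY_NS 50000000LL

/* Current monotonic time in nanoseconds (POSIX assumption). */
static int64_t now_ns(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (int64_t)ts.tv_sec * 1000000000LL + ts.tv_nsec;
}

/* Pure comparison: nonzero if more than threshold_ns elapsed. */
int delay_is_suspicious(int64_t start_ns, int64_t end_ns,
                        int64_t threshold_ns)
{
    return (end_ns - start_ns) > threshold_ns;
}

/* Times a short piece of work and reports whether it took too long.
   A breakpoint inside work() would inflate the measured interval. */
int timing_check(void (*work)(void))
{
    int64_t start = now_ns();
    work();
    return delay_is_suspicious(start, now_ns(), SUSPICIOUS_DELAY_NS);
}
```

Because thread switching can also produce long gaps, a single positive result from a check like this is weak evidence; a larger threshold or several repeated measurements would reduce false alarms at the cost of sensitivity.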
http://www.opensolaris.org/os/community/dtrace/
http://www.opensolaris.org/os/community/mdb/
6.1.1 Wikimedia Resources
Wikibooks
• X86 Assembly
• Free Debugging Tools, Static Source Code Analysis Tools, Bug Trackers
• Yurichev, Dennis, “An Introduction To Reverse Engineering for Beginners”. Online book: http://yurichev.com/writings/RE_for_beginners-en.pdf
• Eilam, Eldad. “Reversing: Secrets of Reverse Engineering.” 2005. Wiley Publishing Inc. ISBN 0764574817
6.2 Licensing
6.2.1 Licensing
This book is released under the following license:
6.3 Manual of Style
This book has a global stylesheet that can be loaded for you. Go to the Gadgets tab at Special:Preferences, and activate the "Per-book Javascript and Stylesheets" gadget.
Chapter 7
7.1 Text
• Wikibooks:Collections Preface Source: https://en.wikibooks.org/wiki/Wikibooks%3ACollections_Preface?oldid=2842060 Contributors: RobinH, Whiteknight, Jomegat, Mike.lifeguard, Martin Kraus, Adrignola, Magesha and MadKaw
• X86 Disassembly/Cover Source: https://en.wikibooks.org/wiki/X86_Disassembly/Cover?oldid=2595883 Contributors: Whiteknight, Icktoofay and Anonymous: 2
• X86 Disassembly/Introduction Source: https://en.wikibooks.org/wiki/X86_Disassembly/Introduction?oldid=2370674 Contributors: DavidCary and Whiteknight
• X86 Disassembly/Assemblers and Compilers Source: https://en.wikibooks.org/wiki/X86_Disassembly/Assemblers_and_Compilers?oldid=3018566 Contributors: DavidCary, Panic2k4, AlbertCahalan, Whiteknight, Az1568, Gcaprino, Scientes, Sigma 7, Adrignola, Jfmantis, EleoTager, Artoria2e5, Syum90 and Anonymous: 22
• X86 Disassembly/Disassemblers and Decompilers Source: https://en.wikibooks.org/wiki/X86_Disassembly/Disassemblers_and_Decompilers?oldid=3170911 Contributors: DavidCary, Mshonle, Panic2k4, AlbertCahalan, Quoth-22, Whiteknight, Mike Van Emmerik, Koavf, Mdupont, 0xf001, Jkl, MichaelFrey, Svdb, Herbythyme, Macpunk, C1de0x, Ysangkok, Phatom87, Gannalech, SamB, Spongebob88, QuiteUnusual, Afog, Adrignola, Duplode, JamesCrook, Voomoo, M.boli, Jfmantis, EleoTager, Artoria2e5, Chip Wildon Forster, C4Decompiler, Aquynh, Andy80586, Xradonx, Sfrlz, Mrexodia and Anonymous: 101
• X86 Disassembly/Disassembly Examples Source: https://en.wikibooks.org/wiki/X86_Disassembly/Disassembly_Examples?oldid=1232569 Contributors: Whiteknight and Anonymous: 1
• X86 Disassembly/Analysis Tools Source: https://en.wikibooks.org/wiki/X86_Disassembly/Analysis_Tools?oldid=3094739 Contributors: Utcursch, Panic2k4, Marcika, AlbertCahalan, Quoth-22, Whiteknight, Jomegat, Kaosone, Perpetuum~enwikibooks, Hagindaz, Wikimoder~enwikibooks, Dr Dnar, Macpunk, Frozen dude, AnthonyD~enwikibooks, Spongebob88, MohammadEbrahim, QuiteUnusual, Jodell1, Adrignola, Jfmantis, KenMacD, Rohitab, Rotlink, Artoria2e5, IamMe3141 and Anonymous: 65
• X86 Disassembly/Microsoft Windows Source: https://en.wikibooks.org/wiki/X86_Disassembly/Microsoft_Windows?oldid=3137993 Contributors: Panic2k4, Quoth-22, Whiteknight, Hexed321, Chazz, Mantis~enwikibooks, Wj32, Gcaprino, Adrignola, Dennis714, Gary600playsmc, Luis150902 and Anonymous: 35
• X86 Disassembly/Windows Executable Files Source: https://en.wikibooks.org/wiki/X86_Disassembly/Windows_Executable_Files?oldid=3088866 Contributors: Quoth-22, Whiteknight, Shokuku, Barthax, Hexed321, Dr Dnar, Gcaprino, Chris.digiamo, Van der Hoorn, Adrignola, LaZ0r, EroCarrera, Ashpilkin, Self~enwikibooks, CallumPoole, Luis150902, Cwilson2016 and Anonymous: 31
• X86 Disassembly/Linux Source: https://en.wikibooks.org/wiki/X86_Disassembly/Linux?oldid=2027237 Contributors: Whiteknight, Dr Dnar, Gcaprino, Recent Runes, MohammadEbrahim, Adrignola, Swatnio~enwikibooks and Anonymous: 10
• X86 Disassembly/Linux Executable Files Source: https://en.wikibooks.org/wiki/X86_Disassembly/Linux_Executable_Files?oldid=2748762 Contributors: Orderud, Whiteknight, Ddouthitt, Gcaprino, ChrisR~enwikibooks, Ulf Abrahamsson~enwikibooks, Artoria2e5 and Anonymous: 2
• X86 Disassembly/The Stack Source: https://en.wikibooks.org/wiki/X86_Disassembly/The_Stack?oldid=2622875 Contributors: Whiteknight, Dr Dnar, Swift, Mantis~enwikibooks, Gcaprino, Gannalech, Jsvcycling, Jfmantis, X-Fi6 and Anonymous: 17
• X86 Disassembly/Functions and Stack Frames Source: https://en.wikibooks.org/wiki/X86_Disassembly/Functions_and_Stack_Frames?oldid=3064266 Contributors: Whiteknight, Hagindaz, Mantis~enwikibooks, Gcaprino, Gannalech, Svick, Jfmantis and Anonymous: 26
• X86 Disassembly/Functions and Stack Frame Examples Source: https://en.wikibooks.org/wiki/X86_Disassembly/Functions_and_Stack_Frame_Examples?oldid=2759822 Contributors: Whiteknight, NipplesMeCool, Jfmantis and Anonymous: 2
• X86 Disassembly/Calling Conventions Source: https://en.wikibooks.org/wiki/X86_Disassembly/Calling_Conventions?oldid=3118519 Contributors: DavidCary, Whiteknight, Mantis~enwikibooks, Gcaprino, Sigma 7, Timjr~enwikibooks, Crazy Ivan, Jfmantis and Anonymous: 22
7.2 Images
• File:1Fh_01.png Source: https://upload.wikimedia.org/wikibooks/en/a/af/1Fh_01.png License: Fair use Contributors: ? Original artist: ?
• File:C_language_building_steps.png Source: https://upload.wikimedia.org/wikipedia/commons/b/b3/C_language_building_steps.png License: CC-BY-SA-3.0 Contributors: ? Original artist: ?
• File:C_language_do_while.png Source: https://upload.wikimedia.org/wikipedia/commons/2/21/C_language_do_while.png License: CC BY 3.0 Contributors: Own work Original artist: Thedsadude
• File:C_language_for.png Source: https://upload.wikimedia.org/wikipedia/commons/5/51/C_language_for.png License: CC BY 3.0 Contributors: Own work Original artist: Thedsadude
• File:C_language_if.png Source: https://upload.wikimedia.org/wikipedia/commons/f/fb/C_language_if.png License: CC BY 3.0 Contributors: Own work Original artist: Thedsadude
• File:C_language_if_else.png Source: https://upload.wikimedia.org/wikipedia/commons/a/ac/C_language_if_else.png License: CC BY 3.0 Contributors: Own work Original artist: Thedsadude
• File:C_language_linked_list.png Source: https://upload.wikimedia.org/wikipedia/commons/1/1b/C_language_linked_list.png License: CC BY 3.0 Contributors: Own work Original artist: Thedsadude
• File:Data_stack.svg Source: https://upload.wikimedia.org/wikipedia/commons/2/29/Data_stack.svg License: Public domain Contributors: made in Inkscape, by myself User:Boivie. Based on Image:Stack-sv.png, originally uploaded to the Swedish Wikipedia in 2004 by sv:User:Shrimp Original artist: User:Boivie
• File:Elf-layout--en.svg Source: https://upload.wikimedia.org/wikipedia/commons/7/77/Elf-layout--en.svg License: CC BY-SA 3.0 Contributors: Own work Original artist: Surueña
• File:Heckert_GNU_white.svg Source: https://upload.wikimedia.org/wikipedia/commons/2/22/Heckert_GNU_white.svg License: CC BY-SA 2.0 Contributors: gnu.org Original artist: Aurelio A. Heckert
74 CHAPTER 7. TEXT AND IMAGE SOURCES, CONTRIBUTORS, AND LICENSES