
Static Single Assignment for Decompilation

by

Michael James Van Emmerik

B.E. (Hons), B.Sc (Comp. Sc.)

A thesis submitted for the degree of

Doctor of Philosophy

at

The University of Queensland

Submitted 3rd May 2007

© 2007 Mike Van Emmerik


School of Information Technology and Electrical Engineering


Quotations

"Life would be much easier if I had the source code." [Jar04]

"Use the source, Luke!" [Tho96]

"To extract the mythic essence, mere detail must become subservient to a deeper truth." Prospero, speaking to the Dive team in [Ega98].

The last quote could have been describing the process of decompilation: machine specific detail is discarded, leading to the essence of the program (source code), from which it is possible to divine a deeper truth (understanding of the program).

Produced with LyX and LaTeX.

Statement of Originality
I declare that the work presented in the thesis is, to the best of my knowledge and

belief, original and my own work, except as acknowledged in the text, and that the

material has not been submitted, either in whole or in part, for a degree at this or any

other university.

. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

(Mike Van Emmerik, candidate)

. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

(Professor Paul Bailes, principal advisor)



Acknowledgements
I would firstly like to thank my associate advisor, Cristina Cifuentes, for her inspiration, hard work finding grant money to pay me during the binary translation years (1997 to 2001), ideas, and suggestions. By extension, I'm indebted to Cristina's supervisors, Professors John Gough and Bill Caelli, who came up with the idea of using decompilation for security analysis, and as a PhD topic. My primary advisor, Prof. Paul Bailes, was also very helpful, particularly for showing me the value of writing thesis material as early as possible during candidature.

In the first years of the IBM PC and clones (early 1980s), Ron Harris and I would disassemble scores of Z80 and 8086 programs. Thanks, Ron; they were the best years! By the late 1980s, Glen McDiarmid was inspiring me (and the rest of the world) with a truly great interactive disassembler. Thanks also to Glenn Wallace, who in about 1992 started a pattern-based decompiler, which we then both worked on, called dc.

I am indebted to several people for their helpful discussions. Ilfak Guilfanov, author of IDA Pro, was one of these, and was kind enough to drive from a neighbouring country for one such discussion. Alan Mycroft's paper "Type-Based Decompilation" was an early inspiration for investigating type analysis [Myc99]. Two of Alan's students were also influential. Eben Upton pointed out to me that type analysis for aggregates is a different problem to analysis of the more elementary types. The other was Jeremy Singer, whose ideas on Static Single Information triggered many positive ideas.

The theory in this thesis has been tested on an open source decompiler platform called Boomerang. I am very grateful to Boomerang co-developer Trent Waddington, who wrote the pre-SSA data flow code, early preservation code (his "proof engine"), the idea of propagating %flags, the early Statement class hierarchy, and much else. His code helped make it possible for me to test the theory without having to write a complete decompiler from scratch. He demonstrated what could be achieved with ad hoc type analysis, and also helped by participating in long discussions. I'd like to thank also those open source developers who helped test and maintain Boomerang, particularly Gerard Krol, Emmanuel Fleury and his students [BDMP05], Mike Melanson, Mike "tamlin" Nordell, and Luke "infidel" Dunstan. Thanks, guys, it's much easier to maintain a big project with more people.



Boomerang is based in part on the University of Queensland Binary Translator (UQBT), which was funded in large part by Sun Microsystems. UQBT benefited from the work of many students. In particular, I'd like to single out Doug Simon, for the early parameter identification work, and Shane Sendall, for the Semantic Specification Language and supporting code.

Email correspondents were also helpful with more general discussions about decompilation, particularly Jeremy Smith and Raimar Falke. Ian Peake showed how to implement the one-sentence summaries, while Daniel Jarrott was very helpful with LaTeX and LyX problems, and generally with discussions.

Gary Holden and Paul Renner of LEA Detection System, Inc. were the first clients to make use of Boomerang for commercial purposes. Their decompilation problem provided valuable real-world experience, and Boomerang developed significantly during this period.

I am indebted to Prof. Simon Kaplan for supporting my scholarship, even though the combined teaching and research plan did not work out in the long term.

Thanks to Jens Tröger and Trent Waddington for reviewing drafts of this thesis; I'm sure the readers will appreciate the improvements you suggested.

In any long list of acknowledgements, some names are inevitably left out. If you belong on this list and have not been mentioned, please don't feel hurt; your contributions are appreciated.

Last but by no means least, thanks to my wife Margaret and daughter Felicity for putting up with the long hours.



List of Publications
M. Van Emmerik and T. Waddington. Using a Decompiler for Real-World Source Recovery. In the Working Conference on Reverse Engineering, Delft, Netherlands, 9th-12th November 2004.

C. Cifuentes, M. Van Emmerik, N. Ramsey and B. Lewis. Experience in the Design, Implementation and Use of a Retargetable Static Binary Translation Framework. Sun Microsystems Laboratories, Technical Report TR-2002-105, January 2002.

C. Cifuentes and M. Van Emmerik. Recovery of Jump Table Case Statements from Binary Code. In Science of Computer Programming, 40 (2001): 171-188.

C. Cifuentes, T. Waddington, and M. Van Emmerik. Computer Security Analysis through Decompilation and High-Level Debugging. Decompilation Techniques Workshop, Proceedings of the Eighth Working Conference on Reverse Engineering, Stuttgart, Germany, October 2001. IEEE-CS Press, pp 375-380.

C. Cifuentes and M. Van Emmerik. UQBT: Adaptable Binary Translation at Low Cost. In Computer 33(3), March 2000. IEEE Computer Society Press, pp 60-66.

C. Cifuentes, M. Van Emmerik, D. Ung, D. Simon and T. Waddington. Preliminary Experiences with the Use of the UQBT Binary Translation Framework. In Proceedings of the Workshop on Binary Translation, Newport Beach, Oct 16, 1999. Technical Committee on Computer Architecture Newsletter, IEEE-CS Press, Dec 1999, pp 12-22.

C. Cifuentes, M. Van Emmerik, and N. Ramsey. The Design of a Resourceable and Retargetable Binary Translator. In Proceedings of the Sixth Working Conference on Reverse Engineering, Atlanta, USA, October 1999. IEEE-CS Press, pp 280-291.

C. Cifuentes and M. Van Emmerik. Recovery of Jump Table Case Statements from Binary Code. Proceedings of the International Workshop on Program Comprehension, Pittsburgh, USA, May 1999, IEEE-CS Press, pp 192-199.

M. Van Emmerik. Identifying Library Functions in Executable Files Using Patterns. In Proceedings of the 1998 Australian Software Engineering Conference, Adelaide, 9th to 13th November, 1998. IEEE-CS Press, pp 90-97.



Abstract
Static Single Assignment enables the efficient implementation of many important decompiler components, including expression propagation, preservation analysis, type analysis, and the analysis of indirect jumps and calls.

Source code is an essential part of all software development. It is so valuable that when it is not available, it can be worthwhile deriving it from the executable form of computer programs through the process of decompilation. There are many applications for decompiled source code, including inspections for malware, bugs, and vulnerabilities; interoperability; and the maintenance of an application that has some or all source code missing. Existing machine code decompilers, in contrast to existing decompilers for Java and similar platforms, have significant deficiencies. These include poor recovery of parameters and returns, poor handling of indirect jumps and calls, and poor to nonexistent type analysis. It is shown that use of the Static Single Assignment form (SSA form) enables many of these deficiencies to be overcome. SSA enables or assists with

• data flow analysis, particularly expression propagation;

• the identification of parameters and returns, without assuming ABI compliance;

• preservation analysis (whether a location is preserved across a call), which is needed for analysing parameters and return locations;

• type analysis, implemented as a sparse data flow problem; and

• the analysis of indirect jumps and calls.

Expression propagation is a key element of a decompiler, since it allows long sequences of individual instruction semantics to be combined into more complex, high level statements. Parameters, returns, and types are features of high level languages that do not appear explicitly in machine code programs, hence their recovery is important for readability and the ability to recompile the generated code. In addition, type analysis is either absent from existing machine code decompilers, or is limited to a relatively simple propagation of types from library function calls. The analysis of indirect jumps and calls is important for finding all code in a machine code program, and enables the translation of important high level program elements such as switch statements, assigned gotos, virtual function calls, and calls through function pointers.

Because of these challenges, machine code decompilers are the most interesting case. Existing machine code decompilers are weak at identifying parameters and returns, particularly where parameters are passed in registers, or the calling convention is non-standard. A general analysis of parameters and returns is demonstrated, using new devices such as Collectors. These analyses become more complex in the presence of recursion. The elimination of redundant parameters and returns is shown to be a global analysis, implying that for a general decompiler, procedures can not be finalised until all other procedures are analysed.

Full type analysis is discussed, where the semantics of individual instructions, as well as information from library calls, contribute to the solution. A sparse, iterative, data flow based approach is compared with the more common constraint based approach. The former requires special functions to handle the multiple constraints that result from overloaded operators such as addition and subtraction. Special problems arise with aggregate types (arrays and structures), and the taking of addresses of variables.

Indirect branch instructions are often handled at instruction decode time. Delaying analysis until the program is represented in SSA form allows more powerful techniques such as expression propagation to be used. This results in a simpler, more general analysis, at the cost of having to throw away some results and restart some analyses. It is shown that this technique easily extends to handling Fortran assigned gotos, which can not be effectively analysed at decode time. The analysis of indirect call instructions has the potential for enabling the recovery of object oriented virtual function calls.

Many of the techniques presented in this thesis have been verified with the Boomerang open source decompiler. The goal of extending the state of the art of machine code decompilation has been achieved. There are of course still some areas left for future work. The most promising areas for future research have been identified as range analysis and alias analysis.


Contents

Quotations ii

Acknowledgements iv

List of Publications vi

Abstract vii

List of Figures xvii

List of Tables xxiv

List of Algorithms xxiv

List of Abbreviations xxv

Glossary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xxviii

Notation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xl

Summary xliii

1 Introduction 1

1.1 Source Code . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3

1.2 Forward and Reverse Engineering . . . . . . . . . . . . . . . . . . . . . 6

1.3 Applications . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8

1.3.1 Decompiler as Browser . . . . . . . . . . . . . . . . . . . . . . . 8

1.3.2 Automated Tools . . . . . . . . . . . . . . . . . . . . . . . . . . 10

1.3.3 Recompilable Code, Automatically Generated . . . . . . . . . . 11


1.3.4 Recompilable, Maintainable Code . . . . . . . . . . . . . . . . . 13

1.4 State of the Art . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15

1.5 Reverse Engineering Tool Problems . . . . . . . . . . . . . . . . . . . . 16

1.5.1 Separating Code from Data . . . . . . . . . . . . . . . . . . . . 18

1.5.2 Separating Pointers from Constants . . . . . . . . . . . . . . . . 19

1.5.3 Separating Original from Offset Pointers . . . . . . . . . . . . . 19

1.5.4 Tools Comparison . . . . . . . . . . . . . . . . . . . . . . . . . . 21

1.5.4.1 Disassemblers . . . . . . . . . . . . . . . . . . . . . . . 23

1.5.4.2 Assembly Decompilers . . . . . . . . . . . . . . . . . . 23

1.5.4.3 Object Code Decompilers . . . . . . . . . . . . . . . . 24

1.5.4.4 Virtual Machine Decompilers . . . . . . . . . . . . . . 25

1.5.4.5 Limitations of Existing Decompilers . . . . . . . . . . 26

1.5.5 Theoretical Limits and Approximation . . . . . . . . . . . . . . 27

1.6 Legal Issues . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29

1.7 Goals . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30

1.8 Thesis Structure . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31

2 Decompiler Review 33

2.1 Machine Code Decompilers . . . . . . . . . . . . . . . . . . . . . . . . . 33

2.2 Object Code Decompilers . . . . . . . . . . . . . . . . . . . . . . . . . 38

2.3 Assembly Decompilers . . . . . . . . . . . . . . . . . . . . . . . . . . . 39

2.4 Decompilers for Virtual Machines . . . . . . . . . . . . . . . . . . . . . 42

2.4.1 Java decompilers . . . . . . . . . . . . . . . . . . . . . . . . . . 42

2.4.2 CLI Decompilers . . . . . . . . . . . . . . . . . . . . . . . . . . 44

2.5 Decompilation Services . . . . . . . . . . . . . . . . . . . . . . . . . . . 44

2.6 Related Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46

2.6.1 Disassembly . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46

2.6.2 Decompilation of DSP Assembly Language . . . . . . . . . . . . 46

2.6.3 Link-Time Optimisers . . . . . . . . . . . . . . . . . . . . . . . 47

2.6.4 Synthesising to Hardware . . . . . . . . . . . . . . . . . . . . . 48



2.6.5 Binary Translation . . . . . . . . . . . . . . . . . . . . . . . . . 48

2.6.6 Instruction Set Simulation . . . . . . . . . . . . . . . . . . . . . 49

2.6.7 Abstract Interpretation . . . . . . . . . . . . . . . . . . . . . . . 50

2.6.8 Proof-Carrying Code . . . . . . . . . . . . . . . . . . . . . . . . 50

2.6.9 Safety Checking of Machine Code . . . . . . . . . . . . . . . . . 51

2.6.10 Traditional Reverse Engineering . . . . . . . . . . . . . . . . . . 52

2.6.11 Compiler Infrastructures . . . . . . . . . . . . . . . . . . . . . . 53

2.6.11.1 LLVM . . . . . . . . . . . . . . . . . . . . . . . . . . . 53

2.6.11.2 SUIF2 . . . . . . . . . . . . . . . . . . . . . . . . . . . 54

2.6.11.3 COINS . . . . . . . . . . . . . . . . . . . . . . . . . . 54

2.6.11.4 SCALE . . . . . . . . . . . . . . . . . . . . . . . . . . 56

2.6.11.5 GCC Tree SSA . . . . . . . . . . . . . . . . . . . . . . 56

2.6.11.6 Phoenix . . . . . . . . . . . . . . . . . . . . . . . . . . 57

2.6.11.7 Open64 . . . . . . . . . . . . . . . . . . . . . . . . . . 58

2.6.12 Simplification of Mathematical Formulae . . . . . . . . . . . . . 59

2.6.13 Obfuscation and Protection . . . . . . . . . . . . . . . . . . . . 59

3 Data Flow Analysis 61

3.1 Expression Propagation . . . . . . . . . . . . . . . . . . . . . . . . . . 65

3.2 Limiting Propagation . . . . . . . . . . . . . . . . . . . . . . . . . . . . 68

3.3 Dead Code Elimination . . . . . . . . . . . . . . . . . . . . . . . . . . . 69

3.3.1 Condition Code Combining . . . . . . . . . . . . . . . . . . . . 70

3.3.2 x86 Floating Point Compares . . . . . . . . . . . . . . . . . . . 75

3.4 Summarising Calls . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 76

3.4.1 Call Related Terminology . . . . . . . . . . . . . . . . . . . . . 77

3.4.2 Caller/Callee Context . . . . . . . . . . . . . . . . . . . . . . . 79

3.4.3 Globals, Parameters, and Returns . . . . . . . . . . . . . . . . . 81

3.4.4 Call Summary Equations . . . . . . . . . . . . . . . . . . . . . . 83

3.4.5 Stack Pointer as Parameter . . . . . . . . . . . . . . . . . . . . 86

3.5 Global Data Flow Analysis . . . . . . . . . . . . . . . . . . . . . . . . . 87



3.6 Safe Approximation of Data Flow Information . . . . . . . . . . . . . . 90

3.7 Overlapped Registers . . . . . . . . . . . . . . . . . . . . . . . . . . . . 92

3.7.1 Sub-fields . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 95

3.8 Related Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 95

4 SSA Form 97

4.1 Applying SSA to Registers . . . . . . . . . . . . . . . . . . . . . . . . . 97

4.1.1 Benets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 100

4.1.2 Translating out of SSA form . . . . . . . . . . . . . . . . . . . . 102

4.1.2.1 Unused Definitions with Side Effects . . . . . . . . . . 106

4.1.2.2 SSA Back Translation Algorithms . . . . . . . . . . . . 107

4.1.2.3 Allocating Variables versus Register Colouring . . . . . 108

4.1.3 Extraneous Local Variables . . . . . . . . . . . . . . . . . . . . 110

4.2 Applying SSA to Memory . . . . . . . . . . . . . . . . . . . . . . . . . 113

4.2.1 Problems Resulting from Aliasing . . . . . . . . . . . . . . . . . 114

4.2.2 Subscripting Memory Locations . . . . . . . . . . . . . . . . . . 116

4.2.3 Solution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 118

4.3 Preserved Locations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 119

4.3.1 Other Data Flow Anomalies . . . . . . . . . . . . . . . . . . . . 123

4.3.2 Final Parameters . . . . . . . . . . . . . . . . . . . . . . . . . . 124

4.3.3 Bypassing Calls . . . . . . . . . . . . . . . . . . . . . . . . . . . 124

4.4 Recursion Problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 126

4.4.1 Procedure Processing Order . . . . . . . . . . . . . . . . . . . . 128

4.4.2 Conditional Preservation . . . . . . . . . . . . . . . . . . . . . . 130

4.4.3 Redundant Parameters and Returns . . . . . . . . . . . . . . . . 134

4.5 Collectors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 136

4.5.1 Collector Applications . . . . . . . . . . . . . . . . . . . . . . . 139

4.6 Related Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 139

4.7 Other Representations . . . . . . . . . . . . . . . . . . . . . . . . . . . 141

4.7.1 Value Dependence Graph . . . . . . . . . . . . . . . . . . . . . . 142



4.7.2 Static Single Information (SSI) . . . . . . . . . . . . . . . . . . 142

4.7.3 Dependence Flow Graph (DFG) . . . . . . . . . . . . . . . . . . 145

4.8 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 147

5 Type Analysis for Decompilers 149

5.1 Previous Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 151

5.2 Type Analysis for Machine Code . . . . . . . . . . . . . . . . . . . . . 152

5.2.1 The Role of Types . . . . . . . . . . . . . . . . . . . . . . . . . 153

5.2.2 Types in High Level Languages . . . . . . . . . . . . . . . . . . 154

5.2.3 Elementary and Aggregate Types . . . . . . . . . . . . . . . . . 155

5.2.4 Running Pointers . . . . . . . . . . . . . . . . . . . . . . . . . . 156

5.3 Sources of Type Information . . . . . . . . . . . . . . . . . . . . . . . . 158

5.4 Typing Constants . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 161

5.5 Type Constraint Satisfaction . . . . . . . . . . . . . . . . . . . . . . . . 162

5.5.1 Arrays and Structures . . . . . . . . . . . . . . . . . . . . . . . 165

5.6 Addition and Subtraction . . . . . . . . . . . . . . . . . . . . . . . . . 166

5.7 Data Flow Based Type Analysis . . . . . . . . . . . . . . . . . . . . . . 168

5.7.1 Type Lattices . . . . . . . . . . . . . . . . . . . . . . . . . . . . 169

5.7.2 SSA-Based Type Analysis . . . . . . . . . . . . . . . . . . . . . 173

5.7.3 Typing Expressions . . . . . . . . . . . . . . . . . . . . . . . . . 174

5.7.4 Addition and Subtraction . . . . . . . . . . . . . . . . . . . . . 177

5.8 Type Patterns . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 178

5.9 Partitioning the Data Sections . . . . . . . . . . . . . . . . . . . . . . . 186

5.9.1 Colocated Variables . . . . . . . . . . . . . . . . . . . . . . . . . 187

5.10 Special Types . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 191

5.11 Related Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 191

5.12 Future Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 192

5.13 SSA Enablements . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 193



6 Indirect Jumps and Calls 195

6.1 Incomplete Control Flow Graphs . . . . . . . . . . . . . . . . . . . . . 196

6.2 Indirect Jump Instructions . . . . . . . . . . . . . . . . . . . . . . . . . 198

6.2.1 Switch Statements . . . . . . . . . . . . . . . . . . . . . . . . . 198

6.2.1.1 Minimum Case Value . . . . . . . . . . . . . . . . . . . 200

6.2.1.2 No Compare for Maximum Case Value . . . . . . . . . 201

6.2.2 Assigned Goto Statements . . . . . . . . . . . . . . . . . . . . . 201

6.2.2.1 Other Indirect Jumps . . . . . . . . . . . . . . . . . . 203

6.2.2.2 Other Branch Tree Cases . . . . . . . . . . . . . . . . 204

6.2.3 Sparse Switch Statements . . . . . . . . . . . . . . . . . . . . . 205

6.3 Indirect Calls . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 208

6.3.1 Virtual Function Calls . . . . . . . . . . . . . . . . . . . . . . . 209

6.3.1.1 Data Flow and Virtual Function Target Analyses . . . 211

6.3.1.2 Null-Preserved Pointers . . . . . . . . . . . . . . . . . 217

6.3.2 Recovering the Class Hierarchy . . . . . . . . . . . . . . . . . . 218

6.3.3 Function Pointers . . . . . . . . . . . . . . . . . . . . . . . . . . 219

6.3.3.1 Correctness . . . . . . . . . . . . . . . . . . . . . . . . 219

6.3.4 Splitting Functions . . . . . . . . . . . . . . . . . . . . . . . . . 220

6.4 Related Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 221

7 Results 223

7.1 Industry Case Study . . . . . . . . . . . . . . . . . . . . . . . . . . . . 224

7.2 Limiting Expression Propagation . . . . . . . . . . . . . . . . . . . . . 224

7.3 Preventing Extraneous Local Variables . . . . . . . . . . . . . . . . . . 230

7.4 Preserved Parameters . . . . . . . . . . . . . . . . . . . . . . . . . . . . 231

7.5 Preservation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 234

7.5.1 Conditional Preservation Analysis . . . . . . . . . . . . . . . . . 235

7.6 Redundant Parameters and Returns . . . . . . . . . . . . . . . . . . . . 240



8 Conclusion 243

8.1 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 243

8.2 Summary of Contributions . . . . . . . . . . . . . . . . . . . . . . . . . 245

8.3 Future Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 246

Bibliography 249
List of Figures

1.1 Symmetry between a compiler and a decompiler. . . . . . . . . . . . . . 2

1.2 An obfuscated C program; it prints the lyrics for The Twelve Days of

Christmas (all 64 lines of text). From [Ioc88]. . . . . . . . . . . . . . . 5

1.3 Part of a traditional reverse engineering (from source code) of the Twelve

Days of Christmas obfuscated program of Figure 1.2. From [Bal98]. . . 5

1.4 The machine code decompiler and its relationship to other tools and

processes. Parts adapted from Figure 2 of [Byr92]. . . . . . . . . . . . . 7

1.5 Applications of decompilation, which depend on how the output is to be used, and whether the user modifies the automatically generated output. . 9

1.6 A program illustrating the problem of separating original and offset pointers. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20

1.7 Information and separations lost at various stages in the compilation of

a machine code program. . . . . . . . . . . . . . . . . . . . . . . . . . . 22

1.8 Assembly language output for the underlined code of Figure 1.6(a), produced by the GCC compiler. . . . . . . . . . . . . . . . . . . . . . . . 24

1.9 Disassembly of the underlined code from Figure 1.6(a) starting with object code. Intel syntax. Compare with Figure 1.6(b), which started with machine code. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25

2.1 LLVM can be used to compile and optimise a program at various stages

of its life. From [Lat02]. . . . . . . . . . . . . . . . . . . . . . . . . . . 54

2.2 Overview of the SUIF2 compiler infrastructure. From [Lam00]. . . . . . 55

2.3 Overview of the COINS compiler infrastructure. From Fig. 1 of [SFF+05]. 55

2.4 Scale Data Flow Diagram. From [Sca01]. . . . . . . . . . . . . . . . . . 56

2.5 The various IRs used in the GCC compiler. From a presentation by D.

Novillo. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 57


2.6 Overview of the Phoenix compiler infrastructure and IR. From [MRU07]. 57

2.7 The various levels of the WHIRL IR used by the Open64 compiler infrastructure. From [SGI02]. . . . . . . . . . . . . . . . . . . . . . . . 58

3.1 Basic components of a machine code decompiler. . . . . . . . . . . . . . 61

3.2 Original source code for the combinations program. . . . . . . . . . . . 62

3.3 First part of the compiled machine code for procedure comb of Figure 3.2. 62

3.4 IR for the first seven instructions of the combinations example. . . . . 66

3.5 Two machine instructions referring to the same memory location using different registers. /f is the floating point division operator, and (double) is the integer to floating point conversion operator. . . . . . . 67

3.6 Excessive propagation. From [VEW04]. . . . . . . . . . . . . . . . . . . 68

3.7 The circumstance where limiting expression propagation results in more

readable decompiled code. . . . . . . . . . . . . . . . . . . . . . . . . . 68

3.8 The subtract immediate from stack pointer (register esp) instruction from Figure 3.4(a), including the side effect on the condition codes. . . . 71

3.9 Combining condition codes in a decrement and branch sequence. . . . . 72

3.10 Code from the running example where the carry flag is used explicitly. . 74

3.10 (continued). Code from the running example where the carry flag is used explicitly. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 75

3.11 80386 code for the floating point compare in the running example. . . . 76

3.12 Intermediate representation illustrating call related terminology. . . . . 78

3.13 Caller/callee context. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 79

3.14 Potential problems caused by locations undefined on some paths through a called procedure. . . . . . . . . . . . . . . . . . . . . . . . 81

3.15 Solution to the problem of Figure 3.14. . . . . . . . . . . . . . . . . . . 83

3.16 Interprocedural data ow analysis. Adapted from [SW93]. . . . . . . . 88

3.17 Transmission of liveness from use to definition through a call. . . . . . 91

3.18 Transmission of liveness from use to parameter through a call. . . . . . 92

3.19 Overlapped registers in the x86 architecture. . . . . . . . . . . . . . . . 93

4.1 The main loop of the running example and its equivalent SSA form. . . 99

4.2 A propagation not normally possible is enabled by the SSA form. . . . 101

4.3 Part of the main loop of the running example, after the propagation of

the previous section and dead code elimination. . . . . . . . . . . . . . 104

4.4 The IR of Figure 4.3 after transformation out of SSA form. . . . . . . . 104

4.5 The code of Figure 4.3, with the loop condition optimised to num>r. . . 105

4.6 Two possible transformations out of SSA form for the code of Figure 4.5. 105

4.7 A version of the running example where an unused definition has not been eliminated. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 106

4.8 Incorrect results from translating the code of Figure 4.7 out of SSA form,

considering only the liveness of variables. . . . . . . . . . . . . . . . . . 107

4.9 Sreedhar's coalescing algorithm. Adapted from [SDGS99]. . . . . . . . . 108

4.10 The code from Figure 4.1 after exhaustive expression propagation, showing the overlapping live ranges. . . . . . . . . . . . . . . . . . . . 110

4.11 Generated code from a real decompiler with extraneous variables for the IR of Figure 4.10. Copy statements inserted before the loop are not shown. 111

4.12 Live ranges for x2 and x3 when x3 := af(x2 ) is propagated inside a loop. 111

4.13 Live ranges for x2 and x3 when y1 := x2 - 1 is propagated across an overwriting statement inside a loop. . . . . . . . . . . . . . . . . . . . 112

4.14 A version of the running example using pairs of integers. . . . . . . . . 115

4.15 A recursive version of comb from the running example, where the frame pointer (ebp) has not yet been shown to be preserved because of a recursive call. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 116

4.16 The running example with a debug line added. . . . . . . . . . . . . . . 117

4.17 The example of Figure 4.14 with expression propagation before renaming

memory expressions. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 119

4.18 The effect of ignoring a restored location. The last example uses a call-by-reference parameter. . . . . . . . . . . . . . . . . . . . . . . . 120

4.19 Pseudo code for a procedure. It uses three xor instructions to swap registers a and b at the beginning and end of the procedure. Effectively, register a is saved in register b during the execution of the procedure. . 121

4.20 Pseudo code for the procedure of Figure 4.19 in SSA form. Here it

is obvious (after expression propagation) that a is preserved, but b is

overwritten. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 121

4.21 The procedure of Figure 4.19 with two extra statements. . . . . . . . . 122

4.22 A version of the running example with the push and pop of ebx removed,
illustrating how preservation analysis handles φ-functions. . . . . . . . 123

4.23 Analysing preserved parameters using propagation and dead code elimination. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 124

4.24 Part of the IR for the program of Figure 4.15. . . . . . . . . . . . . . . 125

4.25 A small part of the call graph for the 253.perlbmk SPEC CPU2000 benchmark program. . . . . . . . . . . . . . . . . . . . . . . . . . . . 127

4.26 A call graph illustrating the algorithm for finding the correct ordering for processing procedures. . . . . . . . . . . . . . . . . . . . . . . . . 129

4.27 A simplified control flow graph for the program of Figure 4.15. . . . . 132

4.28 Two possible lattices for preservation analysis. . . . . . . . . . . . . . . 134

4.29 Example program illustrating that not all parameters to recursive calls

can be ignored. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 135

4.30 Use of collectors for call bypassing, caller and callee contexts, arguments (only for childless calls), results, defines (also only for childless calls), and modifieds. . . . . . . . . . . . . . . . . . . . . . . . . . . . 137

4.31 The weak update problem for malloc blocks. From Fig. 1 of [BR06]. . . 140

4.32 The code of Figure 4.31(a) in SSA form. . . . . . . . . . . . . . . . . . 141

4.33 An example program from [Sin03]. . . . . . . . . . . . . . . . . . . . . 143

4.34 IR of the optimised machine code output from Figure 4.33. . . . . . . . 144

4.35 A comparison of IRs for the program of Figure 4.1. Only a few def-use

chains are labelled, for simplicity. After Figure 1 of [JP93]. . . . . . . . 146

5.1 Type analysis is a major component of a machine code decompiler. . . 150

5.2 Organisation of the CodeSurfer/x86 and companion tools. From [RBL06]. 152

5.3 Elementary and aggregate types at the machine code level. . . . . . . . 156

5.4 The decompilation of two machine code programs processing an array of

ten characters. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 157

5.5 A program referencing two different types from the same pointer. . . . 157

5.6 Pointer equality comparisons. . . . . . . . . . . . . . . . . . . . . . . . 160

5.7 A pointer ordering comparison. . . . . . . . . . . . . . . . . . . . . . . 161



5.8 A simple program fragment typed using constraints. From [Myc99]. . . 164

5.9 Constraints for the two instruction version of the above. Example from

[Myc99]. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 166

5.10 A program fragment illustrating how a pointer can initially appear not to be a structure pointer, but is later used as a structure pointer. . . . 167

5.11 Illustrating the greatest lower bound. . . . . . . . . . . . . . . . . . . . 171

5.12 A simplied lattice of types for decompilation. . . . . . . . . . . . . . . 172

5.13 Class pointers and references. . . . . . . . . . . . . . . . . . . . . . . . 172

5.14 Typing a simple expression. . . . . . . . . . . . . . . . . . . . . . . . . 174

5.15 Complexity of the ascend and descend type algorithms. . . . . . . . . . 176

5.16 Source code for accessing the first element of an array with a nonzero lower index bound. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 182

5.17 Equivalent programs which use the representation m[pl + K]. . . . . . 184

5.18 A type lattice fragment relating structures containing array elements,

array elements, structure members, and plain variables. . . . . . . . . . 185

5.19 Various configurations of live ranges for one variable. . . . . . . . . . 187

5.20 A program with colocated variables and taking the address. . . . . . . 189

5.21 Nested structures. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 191

6.1 Decoding of instructions in a machine code decompiler. . . . . . . . . . 195

6.2 Example program with switch statement. . . . . . . . . . . . . . . . . . 197

6.3 IR for the program of Figure 6.2. . . . . . . . . . . . . . . . . . . . . . 200

6.4 Output for the program of Figure 6.2 when the switch expression is not

checked for subtract-like expressions. . . . . . . . . . . . . . . . . . . . 200

6.5 A program using an assigned goto. . . . . . . . . . . . . . . . . . . . . 202

6.6 Tree of φ-statements and assignments to the goto variable from Figure 6.5. 203

6.7 Decompiled output for the program of Figure 6.5. Output has been

edited for clarity. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 203

6.8 Source code for a short switch statement with special case values. . . . 204

6.9 Direct decompiled output for the program of Figure 6.8. . . . . . . . . 205

6.10 Source code using a sparse switch statement. . . . . . . . . . . . . . . . 205



6.11 Control Flow Graph for the program of Figure 6.10 (part 1). . . . . . . 206

6.11 Control Flow Graph for the program of Figure 6.10 (part 2). . . . . . . 207

6.12 Typical data layout of an object ready to make a virtual call such as

p->draw(). . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 209

6.13 Implementation of a simple virtual function call (no adjustment to the this (object) pointer). . . . . . . . . . . . . . . . . . . . . . . . . . . 210

6.14 Implementation of a more complex function call, with an adjustment to

this, and using two virtual function tables (VTs). . . . . . . . . . . . 210

6.15 Source code for a simple program using shared multiple inheritance. . . 212

6.16 Machine code for the start of the main function of Figure 6.15. . . . . . 213

6.17 IR for the program of Figures 6.15 and 6.16. . . . . . . . . . . . . . . . 216

6.18 Splitting a function due to a newly discovered call target. . . . . . . . . 220

7.1 Output from [VEW04], Figure 8. . . . . . . . . . . . . . . . . . . . . . 225

7.2 Common subexpression elimination applied to the same code as Figure

7.1. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 226

7.3 Propagation limited to below complexity 2, applied to the same code as

Figure 7.1. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 226

7.4 Propagation limited to below complexity 3, applied to the same code as

Figure 7.1. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 227

7.5 Propagation limited to below complexity 4, applied to the same code as

Figure 7.1. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 227

7.6 Original C source code for function test in the Boomerang SPARC

minmax2 test program. The code was compiled with Sun's compiler using
-xO2 optimisation. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 229

7.7 Disassembled SPARC machine code for the program fragment of Figure 7.6. Note the subcc (subtract and set condition codes) and subx (subtract extended, i.e. with carry) instructions. . . . . . . . . . . . . 229

7.8 Decompiled output for the program fragment of Figure 7.6, without expression propagation limiting. . . . . . . . . . . . . . . . . . . . . . 229

7.9 Decompiled output for the program fragment of Figure 7.6, but with

expression propagation limiting. . . . . . . . . . . . . . . . . . . . . . . 229

7.10 A copy of the output of Figure 4.11 with local variables named after the

registers they originated from. . . . . . . . . . . . . . . . . . . . . . . . 230



7.11 The code of Figure 4.11 with limited propagation. . . . . . . . . . . . . 231

7.12 Assembly language source code for part of the Boomerang restoredparam
test. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 231

7.13 Intermediate representation for the code of Figure 7.12, just before dead

code elimination. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 232

7.14 Boomerang output for the code of Figure 7.12. The parameter and return are identified correctly. . . . . . . . . . . . . . . . . . . . . . . 232

7.15 Original source code for a Fibonacci function. From [Cif94]. . . . . . . 232

7.16 Disassembly of the modified Fibonacci program adapted from [Cif94]. . 233

7.17 Output from the dcc decompiler for the program of Figure 7.16. . . . . 234

7.18 Output from the REC and Boomerang decompilers for the program of

Figure 7.16 and its 32-bit equivalent respectively. . . . . . . . . . . . . 234

7.19 Source code for a slightly different Fibonacci function. . . . . . . . . . 235

7.20 IR for the Fibonacci function of Figure 7.19. . . . . . . . . . . . . . . . 236

7.21 Debug output from Boomerang while finding that esi (register esi) is preserved (saved and restored). . . . . . . . . . . . . . . . . . . . . . 237

7.22 Call graph for the Boomerang test program test/pentium/recursion2. 238

7.23 An outline of the source code for the program test/pentium/recursion2


from the Boomerang test suite. . . . . . . . . . . . . . . . . . . . . . . 238

7.24 The code generated for procedure b for the program test/pentium/recursion2. The Boomerang -X option was used to remove extraneous variables, as discussed in Section 7.3. The code has been modified by hand (underlined code) to return more than one location. . . . . . . . . . . . . . . 240

7.25 Assembler source code for a modification of the Fibonacci program shown in Figure 7.19. The underlined instructions assign values to registers ecx and edx. . . . . . . . . . . . . . . . . . . . . . . . . . . . 241

7.26 IR part way through the decompilation of the program of Figure 7.25. . 242

7.27 Generated code for the program of Figures 7.25 and 7.26. Redundant

parameters and returns have been removed. . . . . . . . . . . . . . . . 242



List of Tables

1.1 Contrasts between source code and machine code. . . . . . . . . . . . . . . . . . . . . . . . . . . . 2

1.2 Problems to be solved by various reverse engineering tools. . . . . . . . . . . . . . . . . . . . 18

1.3 Limitations for the two most capable preexisting machine code decompilers. . 27

3.1 x86 assembly language overview. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 63

5.1 Type relationships for expression operators and constants. . . . . . . . . . . . . . . . . . . . 176

5.2 Type patterns and propositions discussing them. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 180

6.1 High level expressions for switch expressions in the Boomerang decompiler. . 199

6.2 Analysis of the program of Figure 6.16. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 215

6.3 Analysis of the program of Figure 6.17. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 217

7.1 Complexity metrics for the code in Figures 7.1 - 7.5. . . . . . . . . . . . . . . . . . . . . . . . . 228

List of Algorithms

1 Preventing extraneous local variables in the decompiled output due to propagation of components of overwriting statements past their definitions. . . 113

2 General algorithm for finding strongly connected components in a graph. . . . . 130

3 Finding recursion groups from the call graph. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 131

4 Renaming variables, with updating of Collectors. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 138


List of Abbreviations

ABI Application Binary Interface

AMD Advanced Micro Devices

ARA Affine-Relation Analysis [BR04]

ARM Advanced RISC Machines

ASI Aggregate-Structure Identication

AST Abstract Syntax Tree

AT&T American Telephone and Telegraph Corporation (as: AT&T syntax vs Intel syntax)
AXP appellation for DEC Alpha architecture

BB Basic Block

BCPL Basic Combined Programming Language

BSD Berkeley Software Distribution

CCR Condition Code Register

CDC Class Device Context

CFG Control Flow Graph

CISC Complex Instruction Set Computer

CLI Common Language Infrastructure

CLR Common Language Runtime

CODASYL Conference on Data Systems Languages

COINS Compiler Infrastructure. www.coins-project.org


CPS Continuation Passing Style

CPU Central Processing Unit

CSE Common Subexpression Elimination

CSP Constraint Satisfaction Problem

dcc de-C compiler

DCE Dead Code Elimination

DEC Digital Equipment Corporation

DFA Data Flow Analysis

DFG Dependence Flow Graph

DLL Dynamically Linked Library


DOS Disk Operating System (elsewhere also as Denial Of Service)

DML Data Manipulation Language

DSP Digital Signal Processing

du definition-use (as: du-chain)

ELF Executable and Linkable Format

EM64T Extended Memory 64 Technology

EPIC Explicitly Parallel Instruction Computing

FFT Fast Fourier Transform

FLIRT Fast Library Recognition Technique

FPGA Field Programmable Gate Array

GCC GNU Compiler Collection

GLB Greatest Lower Bound

GNU Gnu is Not Unix

GOT Global Offset Table

GPL General Public License

GSA Gated Single Assignment form (rarely: GSSA)

GUI Graphical User Interface

HLL High Level Language

IA32 Intel Architecture, 32-bits. Also: x86

IBM International Business Machines

ICT Indirect Control Transfer

IDA Interactive Disassembler (usually as IDA Pro)

ILP Instruction-Level Parallelism

IR Intermediate Representation (sometimes Internal Representation)

ISS Instruction Set Simulation

JIT Just In Time

KB Kilo Byte (binary Kilo: 1024; sometimes called Kibi)

LHS Left Hand Side (of an assignment)

LLVM Low Level Virtual Machine. llvm.org


LNCS Lecture Notes in Computer Science (published by Springer)

MGM Metro-Goldwyn-Mayer

MFC Microsoft Foundation Classes

MS-DOS Microsoft Disk Operating System

MSIL Microsoft Intermediate Language

MVS Multiple Virtual Storage

NAN Not A Number (an infinity, the result of an overflow or underflow, etc.)

NOP No Operation (machine code instruction to do nothing)

OO Object Oriented

PC Personal Computer

PE Portable Executable (as: PE format)

PLM Programming Language for Microcomputers (also as PL/M)

PPC Power Performance Computing

PSW Program Status Word

REC Reverse Engineering Compiler

RET Reverse Engineering Tool; occasionally Return (statement or instruction)

RHS Right Hand Side (of an assignment)

RISC Reduced Instruction Set Computer

RTL Register Transfer Language (sometimes Register Transfer Lists)

RTTI Runtime Type Information (sometimes Runtime Type Identication)

SC Strong Component (also called a strongly connected component)

SCALE Scalable Compiler for Analytical Experiments. ali-www.cs.umass.edu/Scale
SMC Self Modifying Code

SPARC Scalable Processor Architecture

SR Status Register

SSA Static Single Assignment

SSI Static Single Information

SSL Semantic Specication Language

STL Standard Template Library

SUIF Stanford University Intermediate Format. Also SUIF1 and SUIF2. suif.stanford.edu
TA Type Analysis

ud use-definition (as: ud-chain)

VAX originally Virtual Address Extension

VDG Value Dependence Graph

VFC Virtual Function Call

VFT Virtual Function Table (also virtual table or VT or virtual method table)

VSA Value-Set Analysis

VT Virtual Table (also virtual function table or VFT or virtual method table)

WSL Wide Spectrum Language (here, a particular language by M. Ward and K. Bennett)

x64 64-bit versions of the x86 microprocessor family: AMD64 or EM64T

x86 8086 series microprocessor: 8086/80186/80286/80386/80486/Pentium/Core/AMD86. Also IA32



Glossary

There is currently vastly more literature devoted to compilers than to decompilers. As a result, certain commonly used terms such as "source code" carry a bias towards the forward engineering direction. A few terms will be used here with slightly special meanings.

• An affine relation is a relation where one location remains at a fixed offset or a fixed scaled offset from another location over an area of interest, usually a loop. For example, a loop index and a running array pointer are usually related this way.
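As a minimal illustrative sketch (hypothetical code, not from the thesis), the loop index i and the running pointer p below satisfy the affine relation p = a + i throughout the loop:

    /* i and p stay a fixed scaled offset apart: p == a + i */
    int sum(const int *a, int n) {
        int s = 0;
        const int *p = a;                /* running array pointer */
        for (int i = 0; i < n; i++, p++) /* i: loop index */
            s += *p;                     /* equivalent to a[i] */
        return s;
    }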

• An aggregate is a structure (record) or array, as in the "C Language Reference" [IBM04]. Unions are not considered aggregates.

• The term "argument" is used here specifically to represent an actual argument in call statements, and the term "parameter" is used for formal or dummy parameters in callees.

• An assembler translates assembly language into object code, suitable for linking. Occasionally, the term refers to the assembler proper and linker, i.e. a translator from assembly language to machine code.

• Assembly language is machine specific source code with instruction mnemonics, labels, and optional comments. It is sometimes misused to mean machine code.

• An assignment in a decompiler is similar to an assignment in a compiler: the value of a location (on the left of the assignment) is changed to become the result of evaluating an expression (on the right side of the assignment).

• A basic block is a block of statements or instructions that will execute sequentially, terminated by a branch, label, or a call.

• The term "binary" is used here to mean executable, as in Application Binary Interface (ABI). Elsewhere it is sometimes used to mean non-text. Operators that take exactly two operands are also known as binary operators, forming binary expressions.

• The Boomerang decompiler is a machine code decompiler co-authored by the author of this thesis [Boo02]. It was written partly to test and demonstrate some of the ideas developed here.



• The borrow flag (also called a condition code bit or status register bit) is the bit used to indicate a borrow from one subtract operation to the next. In most processors, it is the same register bit as the carry flag; in some processors, the logical negation of the carry flag is the effective borrow flag.

• A call statement is said to be bypassed when a use is found to be defined by the call, but the call is later found to preserve the used location. (In the special case of the stack pointer, the effect of the procedure could be to have a constant value added to it.) The use is modified to become defined by the definition of the location which reaches the call.

• Callers are procedures that call callees. Callees are sometimes known as called procedures.

• The carry flag (also called a condition code bit or status register bit) is the bit used to indicate a carry (in the ordinary arithmetic sense) from one addition, shift, or rotate operation to the next. For example, after adding 250 to 10 in an 8-bit add instruction, the result is 4 with a carry (representing one unit of 256 that has to be taken into account by adding one to the high order byte of the result, if there is one). Adding 240 and 10 results in 250 with the carry flag being cleared.
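The arithmetic above can be modelled with a small C sketch (illustrative only, not thesis code): the carry flag is set exactly when the true sum does not fit in 8 bits.

    #include <stdint.h>
    #include <stdio.h>

    int main(void) {
        uint8_t a = 250, b = 10;
        uint8_t sum = (uint8_t)(a + b);       /* 4: wraps modulo 256 */
        int carry = ((unsigned)a + b) > 0xFF; /* 1: carry flag set */
        printf("%d + %d = %d, carry = %d\n", a, b, sum, carry);
        return 0;
    }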

• A call graph is the directed graph of procedures with edges directed from callers to callees. Self recursion causes cycles of length 1 in the call graph; mutual recursion causes cycles of longer length. Often, a call graph is connected, i.e. the entry point of the program reaches all procedures. Sometimes, as in a library, there will be multiple entry points, and the possibility exists that the call graph will not be connected.

• The canonical form of an expression is the most favoured form. For example, m[esp0-8] is preferred to m[ebp+8-12], where esp0 is the value of the stack pointer register on entry to the procedure. With completely equivalent expressions such as a+8 and 8+a, one is arbitrarily chosen as canonical. The process of converting expressions to their canonical equivalents is canonicalisation. When expressions are canonicalised, they can be more readily compared for equality, and errors due to aliasing at the machine code level are reduced.

• A childless call is one where the callee does not yet have a data flow summary available, either because it is a recursive call, or because the call is indirect and not yet analysed. In the call graph, childless calls have no child nodes that have call summaries.

• A collector is an invention of this thesis, which collects either reaching definitions (a def collector), or live locations (a use collector).

• Colocated variables are variables that an optimising compiler has assigned the same address (memory location) or register. This is possible when the live ranges of the variables do not overlap. Such variables might have different types, necessitating their separation.
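A hypothetical fragment (not from the thesis) where a compiler may colocate i and f: i's live range ends before f's begins, so both can share one stack slot or register despite their different types, and a decompiler must split the shared location back into two variables.

    #include <stdio.h>

    void demo(int x) {
        int i = x * 2;        /* i is live only up to the next line */
        printf("%d\n", i);    /* last use of i */
        float f = x * 0.5f;   /* f may reuse i's location */
        printf("%f\n", f);
    }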

• CLI (Common Language Infrastructure) encompasses languages and tools that target the CIL (Common Intermediate Language). Microsoft's own implementation of this standard is the .NET framework, their CIL language is MSIL (Microsoft Intermediate Language), and their runtime engine is called the CLR (Common Language Runtime).

• A compilation is the result of running a compiler. Usually the output of a compiler is object code, however some compilers emit assembly code which has to be assembled to object code. Sometimes the term compiler includes the linking stage, in which case the output would be machine code. The term can therefore apply to the compiler proper, compiler and assembler, compiler and linker, or compiler, assembler and linker.

• Condition codes (also called flags or status bits) are a set of single bit registers that represent such conditions as the last arithmetic operation resulting in a zero result, negative result, overflow, and so on. These bits are usually collected into a condition code register or flags register. A status register often includes condition codes in its lower bits. Different input machines define different condition codes, though the "big four" of Zero, Sign, Overflow and Carry are usually present in some form.

• Conditional preservation is an analysis for determining whether a location is preserved in a procedure when that procedure is involved in recursion.

• A constant in the intermediate representation is a literal or immediate value. Constants have types; often the same bit pattern can represent several different things depending on the type. A constant could be any integer, floating point, string, or other fixed value, e.g. 99, 0x8048ca8, -5.3, or "hello, world". Pointers to procedures or global data objects are also constants with special types. There are also type constants, e.g. in the constraint T(x) = α | int; the int is a type constant.

• Expressions representing parameters, arguments, defineds, and results have a different form in the caller context as opposed to the callee context. For example, the first parameter of a procedure might take the form m[esp0+4], while in the caller context it might take the form m[esp0-20]. Both expressions refer to the same physical memory word, but esp0 (the stack pointer value on entry to the current procedure) depends on the context.

• Continuation Passing Style (CPS) is a style of programming with functional languages. It can also appear at the machine code level, which is more relevant here. In CPS machine code, the address of the machine code to continue with after a call is passed as a parameter to each procedure, and the procedure is jumped to rather than called. At the end of the procedure, the usual return instruction is replaced with an indirect jump to the address given by the continuation parameter.
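A hypothetical C-level sketch of the same style, with the continuation made explicit as a function pointer parameter; in CPS machine code the call of k would be an indirect jump rather than a return:

    #include <stdio.h>
    #include <stdlib.h>

    typedef void (*cont)(int result);   /* the continuation's type */

    void add(int a, int b, cont k) {
        k(a + b);        /* "return" by jumping to the continuation */
    }

    void print_result(int result) {
        printf("%d\n", result);
        exit(0);         /* the program ends inside a continuation */
    }

    int main(void) {
        add(2, 3, print_result);
        return 0;
    }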

• To contract a set of vertices from a graph, the set of vertices is replaced by a single vertex representing the contracted vertices. For example, nodes x, y, and z of a graph could be contracted by replacing them with a single vertex labelled {x, y, z}. The connectivity of the graph is maintained, apart from edges associated only with contracted vertices.

• A Cygwin program is one which calls Unix library functions, but using a special library, is able to run on a Windows operating system. A version of the GCC compiler is used to generate these executables.

• The process of decoding a procedure in executable, binary form is the transforming of the individual instructions to a raw intermediate representation, using the semantics of each individual instruction, instantiating operands if required.

• Dead code is code, usually an assignment, whose definition is unused. The elimination of dead code (dead code elimination, or DCE) does not affect the semantics of the output, and improves the readability.

• Decompiled output is the high level code generated by the decompiler. It is a form of source code, despite being an output.

• A decompiler could be designed to read either machine code, Java or other bytecode files, assembly language, or object code. Where not otherwise stated, a machine code decompiler is assumed. In other contexts, there are decompilers for Flash movies, database formats, and so on. Such usage is not considered here.

• The defines of a call statement are the locations it defines (modifies). Not necessarily all of these may end up being declared as results of the call.

• A definition is a statement that assigns a value to a location. Sometimes, as at a call, several locations can be assigned values at once. The term definition can also refer to a declaration of a variable in a program, e.g. in the decompiled output.

• A denition-use chain (also du-chain or dene-use chain) is a data structure


storing the uses for a particular denition of a location.
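One hypothetical way such a structure might be declared in C (the names are illustrative only, not from any particular decompiler):

    struct Statement;                  /* an IR statement */

    struct DefUseChain {
        struct Statement  *def;        /* the defining statement */
        struct Statement **uses;       /* statements that use this definition */
        int                numUses;
    };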

• A definition is said to dominate a use if for any path from the entry point to the use, the path includes the dominating definition.

• Downcasting is the process of converting an expression representing a pointer to an object of a base class to a pointer to an object of a derived class. The pointer thus moves down the class hierarchy. The reciprocal process is called upcasting. In some cases, downcasting or upcasting will result in the compiler emitting machine code, for example adding a constant to the pointer, which should be handled correctly by a decompiler.

• The endianness of an architecture is either little endian (little end first, at lowest memory address) or big endian (big end first). Data in a file will be represented in a particular endianness; if the file and architecture have opposite endianness, byte swapping will be necessary.
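For instance, a 32-bit word can be byte swapped with shifts and masks; this sketch shows one standard way to do it in C:

    #include <stdint.h>

    /* Reverse the byte order of a 32-bit word,
       e.g. 0x12345678 becomes 0x78563412. */
    uint32_t swap32(uint32_t x) {
        return (x >> 24)
             | ((x >> 8) & 0x0000FF00u)
             | ((x << 8) & 0x00FF0000u)
             | (x << 24);
    }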

• The address of a local variable is said to escape the current procedure if the address is passed as an argument to a procedure call, is the return expression of the current procedure, or is assigned to a global variable. When the address of a local variable escapes, it is difficult to determine when an aliased assignment is made to the local variable.
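The first and third kinds of escape can be sketched in C as follows (use and g are hypothetical names, for illustration only):

    int *g;                          /* a global pointer */
    void use(int *p) { *p += 1; }    /* some procedure taking an address */

    void f(void) {
        int local = 0;
        use(&local);   /* the address escapes as a call argument */
        g = &local;    /* the address escapes to a global variable */
    }                  /* (g is left dangling once f returns) */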

• An executable file is a general class of files that could be decompiled. Here, the term includes both machine code programs, and programs compiled to a virtual machine form. The term "native" distinguishes machine code executables from others, although "machine code" seems to be clearer than "native executable".

• An expression is as per a compiler: either a value, or a combination of an operator and (an) appropriate value or expression operand(s). Usually a function call will not be part of an expression in a decompiler, because of its initially unknown side effects.

• A filter is here used as an infinite set of locations to be intersected with potential parameters or returns. The effect is to remove certain types of location, such as global variables, from sets intended to be used as parameters or returns. The filtered locations are of interest, but not as potential parameters or returns.

• The type float is the C floating point type, equivalent to real or shortreal types in some other languages. It is often assumed that the size of a float and the size of an integer are the same, as is true for most compilers.

• An idiom or idiomatic instruction sequence is a sequence of instructions whose overall semantics is not obvious from the semantics of the individual instructions. For example, three xor (exclusive or) instructions can be used to swap the values of two registers without requiring a third register or memory location, as the sketch below shows.
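Expressed in C rather than assembly language (an illustration; it assumes the two locations are distinct, since the trick fails if both operands are the same location):

    void xor_swap(unsigned *a, unsigned *b) {
        *a ^= *b;
        *b ^= *a;   /* *b now holds the original value of *a */
        *a ^= *b;   /* *a now holds the original value of *b */
    }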

• An implicit definition may be inserted into the IR of a program to provide a definition for locations such as parameters which do not have an explicit definition in the current procedure. This may be done to provide a consistent place (a location's definition) where type or other information is stored.

• An implicit reference may be inserted into the IR of a program to provide a use or reference to a location which is not explicitly used. An example is where the address of a memory location is passed as a parameter; there is no explicit use of the memory location. Implicit references may be required to prevent the definition of some locations from being incorrectly eliminated as dead code.

• An induction variable is a variable whose value is increased or decreased by a constant value in a loop. Sometimes, such variables are induced by indexing; for example, incrementing an index by 1 in a loop may increase the value of a temporary pointer generated by a compiler by 4.
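In the following C sketch (illustrative only), i is an induction variable with step 1; a compiler may strength-reduce the indexing expression a[i] into a temporary pointer that is itself an induction variable with step sizeof(int), typically 4:

    int sum(const int *a, int n) {
        int s = 0;
        for (int i = 0; i < n; i++)   /* i increases by 1 per iteration */
            s += a[i];                /* an induced pointer would increase by 4 */
        return s;
    }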

• The input program is the machine code or assembly language program (etc.) that the decompiler reads. Terms such as "source program" create confusion.

• itof is the integer to float operator. For example, itof(-5) yields the floating point value -5.00. In practice, the operator may take a pair of precisions, e.g. itof(-5, 32, 64) to convert the 32-bit integer -5 to the 64-bit floating point value -5.00.

• A Just In Time (JIT) compiler is a piece of code that translates instructions designed for execution on a Virtual Machine (VM) to machine code on-the-fly as dictated by the current virtual program counter. Such a compiler achieves the same result as an interpreter for the VM, but usually has better performance.

• The term "lattice" usually refers to a lattice of types; see the Notation section below.

• A linker is a program that combines one or more object files into an executable, machine code file.

• A location is considered live at a program point if defining the location at that point would affect the semantics of the program. The liveness of a location is the question of whether the location is live, e.g. the liveness of x is killed by an unconditional assignment to x.

• The live range of a variable is the set of program points from one or more definitions to one or more uses, where the variable is live. If the program is represented in Static Single Assignment form, the live ranges associated with multiple definitions are united by φ-functions.

• A local variable is a variable located in the stack frame such that it is only visible (in scope) in the current procedure. Its address is usually of the form sp±K, where sp is the stack pointer register, and K is a constant. Stack parameters usually also have the same form, and are sometimes considered to be local variables.

• A location is a register or memory word which can be assigned to (it can appear on the left hand side of an assignment). It can be used as a value (it can appear on the right hand side of assignments, and in other kinds of statements). Temporaries, array elements, and structure elements are special cases of registers or variables, and are therefore also locations. A location in machine code corresponds to a variable in a high level language.

• Machine code (also called machine language) refers to native executable programs, i.e. instructions that could be executed by a real processor. Java bytecodes are executable but not machine code.

• When subtracting two numbers, the minuend is the number being subtracted from, i.e. difference = minuend - subtrahend.

• The modifieds of a procedure are the locations that are modified by that procedure. A subset of these become returns. A few special locations (e.g. the stack pointer register and program counter) which are modifieds are not considered returns.

• The term "name" is sometimes used in the aliasing sense, e.g. *p and q can be different names (or aliases) for the same location if p points to q. In the context of the SSA form, the term is used in a different sense. Locations (including those not subject to aliasing, such as registers) are subscripted, effectively renaming them or giving them different names, in order to create unique definitions for each location.

• Object code is the relocatable output of an assembler or compiler. It is used by some authors to mean machine code, but is here used to refer to the contents of object files (.o or .obj files on most machines). These files contain incomplete machine code, relocation information, and enough symbolic information to be able to link the object file with others to form an executable machine code file.

• An offset pointer is a pointer to other than the start of a data object, obtained by adding an offset or displacement to an original pointer. Offset pointers frequently arise as a result of one or more array indexes having a non zero lower bound.

• The original compiler is the one presumed to have created the input executable program.

• An original pointer is a pointer to the start of a data object, e.g. the first element of an array (even if that represents a non zero index), or the first element of a structure.
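The distinction between original and offset pointers can be illustrated with a small C sketch (hypothetical code):

    #include <stdio.h>

    int main(void) {
        int a[10];
        int *original = &a[0];   /* original pointer: the start of the object */
        int *offset   = &a[3];   /* offset pointer: original plus a displacement */
        printf("%td\n", offset - original);   /* prints 3 */
        return 0;
    }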

• Original source code is the original, usually high level source code that the program was written in.

• An overwriting statement is an assignment of the form x = f(x), e.g. x = x+2 or x = (z*x)-y.

• Parameters are formal or dummy parameters in callees, as opposed to arguments, which are actual arguments in calls.

• A parameter filter is a filter to remove locations, such as memory expressions, that cannot be used as a parameter to a procedure or argument to a call.

• Pentium refers to the Intel IA32 processor architecture. The term x86 is more appropriate, since Pentium is no longer used by Intel for their latest IA32 processors, and other manufacturers have compatible products with different names.

• A location is preserved by a procedure if its value at the end of the procedure is the same as the value at the start of the procedure, regardless of the path taken in the procedure. A preserved location may be simply unused (even by procedures called by the current procedure). Most commonly it is saved near the start of the procedure and restored near the end. In rare cases, a location may be considered preserved if it always has a constant added to its initial value through the procedure. For example, a CISC procedure where the arguments are removed by the caller might always increase the stack pointer register by 4, representing the return address popped by the return instruction (call instructions always subtract 4 from the stack pointer in those architectures).

• The term "procedure" is used interchangeably with the term "function"; whether a procedure returns a value or values or nothing at all is not known until quite late in the decompilation.

• "Propagation" is here used to mean expression propagation. Compilers often perform copy propagation, where only assignments of the form x := y (where x and y are simple variables, not expressions) are propagated. This is because compilers usually do not want to increase the complexity of expressions; they prefer simple expressions that match better with machine code instructions. Compilers sometimes propagate simple expressions such as x+K where K is a constant. This is sometimes called forward substitution or forward propagation. Decompilers by contrast prefer more complex expressions, as these are generally more readable, and therefore propagate complete expressions such as x := y+z*4.

• Range analysis is an analysis that attempts to find the possible runtime values of locations, usually as a range or a set of ranges. The ranges may or may not be strided, that is, the possible values may be a minimum value plus a multiple of a stride, particularly for pointers. For function pointers or assigned goto pointers, this tends to be called value analysis.

• Some applications of decompilation require a recompile of the decompiled output. While some programs will not have been compiled in the first place, the term "recompiled" will still be used to indicate compiling of the source code generated by the decompiler. The ability to compile the decompiler's output is called "recompilability".

• The term "record" is sometimes used synonymously with structure, i.e. an aggregate data structure with several members (possibly of different types).

• A procedure is said to be self recursive if it calls itself. It is said to be mutually recursive if it calls itself via other procedures. For example, a calls b, b calls c, and c calls a.

• Relocation information is information allowing the linker or loader to adjust various pointers inside object files and sometimes executable files so that they will correctly point to a data object in its final location after linking.

• In the SSA form, locations are renamed to effectively create unique definitions for each location. For example, two definitions of register r8 could be renamed to r8₁ and r8₂.
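A small illustration, written as C with underscored names standing in for the subscripted r8₁ and r8₂ (hypothetical code):

    int example(void) {
        /* original:          SSA form:          */
        /*   r8 = 7;            r8_1 = 7;        */
        /*   r8 = r8 + 1;       r8_2 = r8_1 + 1; */
        int r8_1 = 7;          /* first definition of r8 */
        int r8_2 = r8_1 + 1;   /* second definition; its use is linked to r8_1 */
        return r8_2;
    }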

• The results of a call statement are those defines that are used before being redefined after the call. In a decompiler, calls are treated as special assignment statements, where several locations can be the results of a call. For example, a is a result in a := mycall(x, y); a and b are both results in {a,b} := mycall(x, y, z).

• The returns from a procedure are the locations that are defined by that procedure and in addition are used by at least one caller before being redefined. Each return has a corresponding result in at least one call. In a high level program, usually only one value is returned, and the location (register or memory) where that value is returned is not important.

• A return filter is a filter to remove locations, such as global variables, that cannot be used as a return location for a procedure.

• Reverse compilation and decompilation are here used interchangeably. Johnstone [JS04] prefers to use "decompilation" for the uncompiling of code produced by a compiler, and uses "reverse compilation" for the more general problem of rewriting hand-written code as high level source. Whether the original program was machine compiled or not should make little difference to a general decompiler.

• Reverse engineering is "the process of identifying a system's components and their interrelationships, and creating representations of the system in another form or at a higher level of abstraction" [CC90]. Decompilation is therefore one form of reverse engineering; there are also source to source forms such as architecture extraction. It could be argued that binary to binary translations are also reverse engineering. Popular usage and some authors consider only the process of extraction of information from a binary program.

• Self Modifying Code is code that makes changes to executable instructions as the program is running. Such code can be difficult to write such that the processor's instruction cache does not produce unwanted effects. It also makes the code difficult to decompile sensibly.

• A variable is said to be shadowed by a local variable in a procedure or nested scope if the local variable has the same name as the other variable, and the other variable would have been in the scope of the procedure. For the lifetime of the local variable, the shadowed variable is not visible to the procedure.

• The term "signature" has an unfortunate pair of conflicting meanings, both of which are used in decompilation. Usually, as used here, the term is a synonym for prototype, in the sense of a declaration of a function with the names and types for its parameters, and also its return type. The other meaning is as a bit pattern used to recognise the binary form of the function as it appears statically linked into an application [VE98]. Library function recognition is not considered in detail here.

• The signedness of a variable or a type refers to whether the variable is signed or unsigned. This usually only applies to the integral types, sometimes including the character type. For example, the C type char is usually but not always regarded as signed.

• A sink in a directed graph is a vertex that has no outgoing edges.

• "Source code" is a term very firmly associated with a high level program representation, and is used here despite the fact that it is usually the output of a decompilation.

• The term "sparse" as applied to a data flow analysis implies a minimisation of the information stored about the program. Conventional flow-sensitive type analysis stores type information for each variable at each program point (each basic block or sometimes even each statement). By contrast, sparse type analysis stores type information once for each variable, and if the program is represented in SSA form, a degree of flow sensitivity is achieved if the variable can be assumed to keep the same type from each definition to the end of its live range.

• Static Single Assignment (SSA) form is a program representation where each use of a location is reached statically by only one definition. When a program is represented in the SSA form, many important transformations are simplified.

• A strongly connected component is a maximal subgraph such that each node reaches all other nodes in the component, i.e. there is a path from every node to every other node.

• A subarray of a multidimensional array is the array implied by removing some indexes. For example, consider a C array declared as int a[10][20]. It is effectively an array of 10 subarrays, each of which is an array of 20 integers. If sizeof(int) = 4, the expression a[x][y] is effectively a reference to the memory at address ((void*)&a) + x*80 + y*4. The number 80 in this expression is the size of the subarray a[x], which is an array of 20 integers.
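The address arithmetic above can be checked with a short, portable C program (using sizeof(int) in place of the literal 4):

    #include <assert.h>

    int a[10][20];   /* 10 subarrays of 20 ints each */

    int main(void) {
        int x = 3, y = 5;
        assert((char *)&a[x][y] ==
               (char *)a + x * 20 * sizeof(int) + y * sizeof(int));
        return 0;
    }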


• When subtracting two numbers, the subtrahend is the number being subtracted, i.e. difference = minuend - subtrahend.

• The type x is a subtype of type y (x < y) if for every place that type y is valid, substituting x for y does not change the validity of the program.

• Locations are here labelled suitable if they meet the criteria of a parameter filter or a return filter.

• The terms "target" and "retargeting" are firmly associated with machines, and are best avoided. The decompiled output is often referred to as "source code" despite the inherent contradiction, or as "generated code".

• Unreachable code is code which cannot be reached by any execution path of the program.

• A use-definition chain (also ud-chain or use-define chain) is a data structure which stores the definitions for a particular use of a location.

• A value is a location or a constant, i.e. an elementary item that can appear on the right hand side of an assignment, either by itself or combined with other values and operators into an expression. The leaves of expression trees are values. The term is sometimes also used to refer to one or more runtime values of a location.

• Value analysis is an analysis which attempts to find the set of possible runtime values of locations, e.g. function pointers. When run on ordinary variables, it tends to be called range analysis, since the values of ordinary variables tend to be a range or set of ranges.

• The terms "x64" (sometimes "x86_64", "x86-64", "AMD64", or "EM64T") refer to a modern version of the x86 processor family capable of running in 64-bit mode. Sometimes EM64T or AMD64 are used to imply the slightly different Intel and AMD versions of this processor, respectively. Note that IA64 refers to Intel's Itanium architecture, which has a completely different instruction set.

• "x86" is the term referring to the family of Intel and compatible processors which maintain binary instruction set compatibility to the original 8086 processor. The name came from the original numbering system of 8086 and 80186 through 80486. The term i686 is still used by some software to denote a modern member of this series, even though there has never been a processor numbered 80586 or 80686. Here, this term implies a 32-bit processor, which means at least an 80386. Also called IA32 (Intel Architecture, 32 bits).

Notation

• 0x123C represents the hexadecimal constant 123C₁₆, as per the C language.

• a[exp] represents the address of exp in memory.

• locₙ represents a version of the location loc. In the SSA form, all definitions of a location are considered distinct, so the subscript is used to differentiate them. The subscripts could be sequential integers, the statement number of the defining statement, or other schemes could be used.

• m[exp] represents the memory value at address exp.

• φ(a₁, a₂) represents a phi-function. See section 4.1 for details. It should not be confused with ∅, which represents the empty or null set.

• T(e) is used to represent "the type of e" (where e is some expression). In the functional programming world, it is common to use e :: τ to say "e has the type τ", but commonly needed constraints of the type "e₁ has the same type as e₂" cannot readily be expressed with this notation. No standard has emerged: Tip et al. use [e] for the type of e [TKB03], Palsberg uses [[e]] [PS91], while Hendren uses T(e) [GHM00].

• Type constants such as int are printed in sans serif font. Type variables (i.e. variables whose possible values are types) are written as lower case Greek letters such as α.

• ⊤ (top) represents the top element of a lattice of types or other data flow information, representing (for types) no type information. Analogously, ⊥ (bottom) represents overconstraint, or conflicting type information. If the lattice elements are thought of as a set of possible types for a location, ⊤ represents the set of all possible types and ⊥ the empty set. Type information progresses from the top of the lattice downwards. Authors in semantics and abstract interpretation use the opposite convention, i.e. types are only propagated up the lattice [App02].

• ⊓ represents the meet operator; a ⊓ b is the lattice node that is the greatest lower bound of a and b. Some authors use ∧ for greater generality. A type lattice contains elements that are not comparable, hence the ⊓ symbol is preferred because it implies a partial ordering.

• ⊔ represents the join operator; a ⊔ b is the lattice node that is the least upper bound of a and b. Some authors use ∨ for greater generality.

• < represents the subtype operator. It is written as x < y ⇒ X ⊂ Y ∧ x, y are comparable (where X and Y are the sets of values that variables of type x and y could hold, respectively). Similarly x ⊑ y ⇒ X ⊆ Y ∧ x, y are comparable.

• ξ(array(α)) indicates an element of an array whose elements are of type α.

• ξ(structure-containing-α) indicates a structure member whose type is α.
Summary

Static Single Assignment enables the efficient implementation of many important decompiler components, including expression propagation, preservation analysis, type analysis, and the analysis of indirect jumps and calls.

1 Introduction
    Machine code decompilers have the ability to provide the key to software evolution: source code. Before their considerable potential can be realised, however, several problems need to be solved.

1.1 Source Code
    Source code is so important to software development that at times it becomes worthwhile to derive it from the executable form of computer programs.

1.2 Forward and Reverse Engineering
    Decompilation is a form of reverse engineering, starting with an executable file, and ending where traditional reverse engineering starts (i.e. with source code that does not fully expose the design of a program).

1.3 Applications
    Decompilers have many useful applications, broadly divided into those based on browsing parts of a program, those providing the foundation for an automated tool, and those requiring the ability to compile the decompiler's output.

1.3.1 Decompiler as Browser
    When viewed as program browsers, decompilers are useful tools that focus on parts of the input program, rather than the program as a whole.

1.3.2 Automated Tools
    Decompilation technology can be used in automated tools for finding bugs, finding vulnerabilities, finding malware, verification, and program comparison.

1.3.3 Recompilable Code, Automatically Generated
    The user may choose to accept the default, automatically generated output, which even if difficult to read still has significant uses.

1.3.4 Recompilable, Maintainable Code
    If sufficient manual effort is put into the decompilation process, the generated code can be recompilable and maintainable.

1.4 State of the Art
    The state of the art as of 2002 needed improvement in many areas, including the recovery of parameters and returns, type analysis, and the handling of indirect jumps and calls.

1.5 Reverse Engineering Tool Problems
    Various reverse engineering tools are compared in terms of the basic problems that they need to solve; the machine code decompiler has the largest number of such problems.

1.5.1 Separating Code from Data
    Separating code from data is facilitated by data flow guided recursive traversal and the analysis of indirect jumps and calls, both of which are addressed in this thesis.

1.5.2 Separating Pointers from Constants
    The problem of separating pointers from constants is solved using type analysis.

1.5.3 Separating Original from Offset Pointers
    Separating original pointers from those with offsets added (offset pointers) requires range analysis.

1.5.4 Tools Comparison
    The problems faced by reverse engineering tools increase as the abstraction distance from source code to the input code increases; hence assembly language decompilers face relatively few problems, and machine code decompilers face the most.

    Disassemblers
        Machine code decompilers face four problems in addition to those of an ideal disassembler; all are addressed in this thesis, or are already solved.

    Assembly Decompilers
        Good results have been achieved for assembly language decompilers; however, they only face about half the problems of a machine code decompiler.

    Object Code Decompilers
        Object code decompilers are intermediate between assembly decompilers and machine code decompilers in the number of problems that they face, but the existence of relocation information removes two significant separation problems.

    Virtual Machine Decompilers
        Most virtual machine decompilers such as Java bytecode decompilers are very successful because their metadata-rich executable file formats ensure that they only face two problems, the solutions for which are well known.

    Limitations of Existing Decompilers
        The limitations of existing machine code decompilers include the size of the input program, identification of parameters and returns, handling of indirect jumps and calls, and type analysis.

1.5.5 Theoretical Limits and Approximation
    Compilers and decompilers face theoretical limits which can be avoided with conservative approximations, but while the result for a compiler is a correct but less optimised program, the result for a decompiler ranges from a correct but less readable program to one that is incorrect.

1.6 Legal Issues
    There are sufficient important and legal uses for decompilation to warrant this research, and decompilation may facilitate the transfer of facts and functional concepts to the public domain.

1.7 Goals
    The main goals are better identification of parameters and returns, reconstructing types, and correctly translating indirect jumps and calls; all are facilitated by the Static Single Assignment form.

1.8 Thesis Structure
    Following chapters review the limitations of existing machine code decompilers, and show how many of their limitations in the areas of data flow analysis, type analysis, and the translation of indirect jumps and calls are solved with the Static Single Assignment form.

2 Decompiler Review
    Existing decompilers have evaded many of the issues faced by machine code decompilers, or have been deficient in the areas detailed in Chapter 1. Related work has also not addressed these issues.

2.1 Machine Code Decompilers
    A surprisingly large number of machine code decompilers exist, but all suffer from the problems summarised in Chapter 1.

2.2 Object Code Decompilers
    Object code decompilers have advantages over machine code decompilers, but are less common, presumably because the availability of object code without source code is low.

2.3 Assembly Decompilers
    The relatively small number of problems faced by assembly decompilers is reflected in their relatively good performance.

2.4 Decompilers for Virtual Machines
    Virtual machine specifications (like Java bytecodes) are rich in information such as names and types, making decompilers for these platforms much easier; however, good type analysis is still necessary for recompilability.

2.4.1 Java decompilers
    Since Java decompilers are relatively easy to write, they first started appearing less than a year after the release of the Java language.

2.4.2 CLI Decompilers
    MSIL is slightly easier to decompile than Java bytecodes.

2.5 Decompilation Services
    A few commercial companies offer decompilation services instead of, or in addition to, selling a decompiler software license.

2.6 Related Work
    This related work faces a subset of the problems of decompilation, or features techniques or representations that appear to be useful for decompilation.

2.6.1 Disassembly
    Disassembly achieves similar results to decompilation, and encounters some of the same problems as decompilation, but the results are more verbose and machine specific. No existing disassemblers automatically generate reassemblable output.

2.6.2 Decompilation of DSP Assembly Language
    This is a specialised area, where assembly language is still in widespread but declining use, and has several unique problems that are not considered here.

2.6.3 Link-Time Optimisers
    Link-time optimisers share some of the problems of machine code decompilers, but have a decided advantage because of the presence of relocation information.

2.6.4 Synthesising to Hardware
    An additional potential application for decompilation is in improving the performance of hardware synthesised from executable programs.

2.6.5 Binary Translation
    Static binary translators that emit C produce source code from binary code, but since they do not understand the data, the output has very low readability.

2.6.6 Instruction Set Simulation
    This is an automatic technique for generating source code which is like a static interpretation of the input program inlined into a large source file, relying heavily on compiler optimisations for performance.

2.6.7 Abstract Interpretation
    Abstract interpretation research shows how desirable features such as correctness can be proved for decompilation analyses.

2.6.8 Proof-Carrying Code
    Proof-carrying code can be added to machine code to enable enough type analysis to prove various kinds of safety.

2.6.9 Safety Checking of Machine Code
    CodeSurfer/x86 uses a variety of proprietary tools to produce intermediate representations that are similar to those that can be created for programs written in a high-level language.

2.6.10 Traditional Reverse Engineering
    Traditional reverse engineering increases high level comprehension, starting with source code, while decompilation provides little high level comprehension, starting with machine code.

2.6.11 Compiler Infrastructures
    Several compiler infrastructures exist with mature tool sets which, while intended for forward engineering, might be adaptable for decompilation.

    LLVM
    SUIF2
    COINS
    SCALE
    GCC Tree SSA
    Phoenix
    Open64

2.6.12 Simplification of Mathematical Formulae
    Decompilers can make use of research on the simplification of mathematical formulae to improve the output.

2.6.13 Obfuscation and Protection
    Obfuscation and protection are designed to make the reverse engineering of code more difficult, including decompilation; in most cases, such protection prevents effective decompilation.

3 Data Flow Analysis
    Static Single Assignment form assists with most data flow components of decompilers, assisting with such fundamental tasks as expression propagation, identifying parameters and return values, deciding if locations are preserved, and eliminating dead code.

3.1 Expression Propagation
    Expression propagation is the most common transformation used by decompilers, and there are two simple rules, yet difficult to check, for when it can be applied.

3.2 Limiting Propagation
    Although expression propagation is usually beneficial in a decompiler, in some circumstances limiting propagation produces more readable output.

3.3 Dead Code Elimination
    Dead code elimination is facilitated by storing all uses for each definition (definition-use information).

3.3.1 Condition Code Combining
    Expression propagation can be used to implement machine independent combining of condition code definitions and uses.

3.3.2 x86 Floating Point Compares
    Expression propagation can also be used to transform away machine details such as those revealed by older x86 floating point compare instruction sequences.

3.4 Summarising Calls
    The effects of calls are best summarised by the locations modified by the callee, and the locations used before definition in the callee.

3.4.1 Call Related Terminology
    The semantics of call statements and their side effects necessitates terminology that extends terms used by the compiler community.

3.4.2 Caller/Callee Context
    Memory and register expressions are frequently communicated between callers and callee(s); the difference in context requires some substitutions to obtain expressions which are valid in the other context.

3.4.3 Globals, Parameters, and Returns
    Three propositions determine how registers and global variables that are assigned to along only some paths should be treated.

3.4.4 Call Summary Equations
    The calculation of call-related data flow elements such as parameters, defines, and results can be concisely expressed in a series of data flow equations.

3.4.5 Stack Pointer as Parameter
    The stack pointer, and occasionally other special pointers, can appear to be a parameter to every procedure, and is handled as a special case.

3.5 Global Data Flow Analysis
    Decompilers could treat the whole program as one large, global (whole-program) data flow problem, but the problems with such an approach may outweigh the benefits.

3.6 Safe Approximation of Data Flow Information
    Whether over- or under-estimation of data flow information is safe depends on the application; for decompilers, it is safe to overestimate both definitions and uses, with special considerations for calls.

3.7 Overlapped Registers
    Overlapped registers are difficult to handle effectively in the intermediate representation, but representing explicit side effects produces a correct program, and dead code elimination makes the result readable.

3.7.1 Sub-fields
    Sub-fields present similar problems to those of overlapped registers.

3.8 Related Work
    The related work confirms that the combination of expression propagation and dead code elimination is a key technique in decompilation.

4 SSA Form
    Static Single Assignment form assists with most data flow components of decompilers, including such fundamental tasks as expression propagation, identifying parameters and return values, deciding if locations are preserved, and eliminating dead code.

4.1 Applying SSA to Registers
    SSA form vastly simplifies expression propagation, provides economical data flow information, and is strong enough to solve problems that most other analysis techniques cannot solve.

4.1.1 Benefits
    The SSA form makes propagation very easy; initial parameters are readily identified, preservations are facilitated, and SSA's implicit use-definition information requires no maintenance.

4.1.2 Translating out of SSA form
    The conversion from SSA form requires the insertion of copy statements; many factors affect how many copies are needed.

    Unused Definitions with Side Effects
        Unused but not eliminated definitions with side effects can cause problems with the translation out of SSA form, but there is a simple solution.

    SSA Back Translation Algorithms
        Several algorithms for SSA back translation exist in the literature, but one due to Sreedhar et al. appears to be most suitable for decompilation.

    Allocating Variables versus Register Colouring
        Translation out of SSA form in a decompiler and register colouring in a compiler have some similarities, but there are enough significant differences that register colouring cannot be adapted for use in a decompiler.

4.1.3 Extraneous Local Variables
    While SSA and expression propagation are very helpful in a decompiler, certain patterns of code induce one or more extraneous local variables, which reduces readability.

4.2 Applying SSA to Memory
    As a result of alias issues, memory expressions must be divided into those which are safe to propagate, and those which must not be propagated at all.

4.2.1 Problems Resulting from Aliasing
    Alias-induced problems are more common at the machine code level, arising from internal pointers, a lack of preservation information, and other causes.

4.2.2 Subscripting Memory Locations
    If memory locations are propagated, care must be taken to avoid alias issues; however, not propagating memory locations is by itself not sufficient.

4.2.3 Solution
    A solution to the memory expression problem is given, which involves propagating only suitable local variables, and delaying their subscripting until after propagation of non-memory locations.

4.3 Preserved Locations
    The ability to determine whether a location is preserved by a procedure call is important, because preserved locations are an exception to the usual rule that definitions kill other definitions.

4.3.1 Other Data Flow Anomalies
    Decompilers need to be aware of several algebraic identities, both to make the decompiled output easier to read and to prevent data flow anomalies.

4.3.2 Final Parameters
    Final parameters are locations live on entry after preservation analysis, dead code elimination, and the application of identities.

4.3.3 Bypassing Calls
    The main reason for determining whether locations are preserved is to allow them to bypass call statements in caller procedures.

4.4 Recursion Problems
    Recursion presents particular problems when determining the preservation of locations, and when removing unused parameters and returns.

4.4.1 Procedure Processing Order
    An algorithm is given for determining the correct order to process procedures involved in mutual recursion, which maximises the information available at call sites.

4.4.2 Conditional Preservation
    A method for extending the propagation algorithm to mutually recursive procedures is described.

4.4.3 Redundant Parameters and Returns
    The rule that a location which is live at the start of a function is a parameter sometimes breaks down in the presence of recursion, necessitating an analysis to remove redundant parameters. An analogous situation exists for returns.

4.5 Collectors
    Collectors, a contribution of this thesis, extend the sparse data flow information provided by the Static Single Assignment form in ways that are useful for decompilers, by taking a snapshot of data flow information already computed.

4.5.1 Collector Applications
    Collectors find application in call bypassing, caller/callee context translation, computing returns, and initial arguments and defines at childless calls.

4.6 Related Work
    With suitable modifications to handle aggregates and aliases well, the SSA form obviates the need for the complexity of techniques such as recency abstraction.

4.7 Other Representations
    Many alternative intermediate representations exist, especially for optimising compilers, but few offer real advantages for decompilation.

4.7.1 Value Dependence Graph
    The VDG and other representations abstract away the control flow graph, but the results for decompilation are not compelling.

4.7.2 Static Single Information (SSI)
    Static Single Information, an extension of SSA, has been suggested as an improved intermediate representation for decompilation, but the benefits do not outweigh the costs.

4.7.3 Dependence Flow Graph (DFG)
    The Dependence Flow Graph shows promise as a possible augmentation for the Static Single Assignment form in machine code decompilers.

4.8 Summary
    The Static Single Assignment form has been found to be a good fit for the intermediate representation of a machine code decompiler.

5 Type Analysis for Decompilers
    The SSA form enables a sparse data flow based type analysis system, which is well suited to decompilation.

5.1 Previous Work
    The work of Mycroft and of Reps et al. has some limitations, but it laid the foundation for type analysis in machine code decompilation.

5.2 Type Analysis for Machine Code
    Type information encapsulates much that distinguishes low level machine code from high level source code.

5.2.1 The Role of Types
    Types are assertions; they partition the domain of program semantics, and partition the data into distinct objects.

5.2.2 Types in High Level Languages
    Types are essential to the expression of a program in high level terms: they contribute readability, encapsulate knowledge, separate pointers from numeric constants, and enable Object Orientation.

5.2.3 Elementary and Aggregate Types
    While elementary types emerge from the semantics of individual instructions, aggregate types must at times be discovered through stride analysis.

5.2.4 Running Pointers
    While running pointers and array indexing are equivalent in most cases, running pointers pose a problem in the case of an initialised array.

5.3 Sources of Type Information
    Type information arises from machine instruction opcodes, from the signatures of library functions, to a limited extent from the values of some constants, and occasionally from debugging information.

5.4 Typing Constants
    Constants have types just as locations do, and since constants with the same numeric value are not necessarily related, constants have to be typed independently.

5.5 Type Constraint Satisfaction
    Finding types for variables and constants in the decompiled output can be treated as a constraint satisfaction problem; its specific characteristics suggest an algorithm that makes strong use of constraint propagation.

5.5.1 Arrays and Structures
    Constraint-based type analysis requires extra rules to handle arrays correctly.

5.6 Addition and Subtraction
    Compilers implicitly use pointer-sized addition instructions for structure member access, leading to an exception to the general rule that adding an integer to a pointer of type α* yields another pointer of type α*.

5.7 Data Flow Based Type Analysis
    Type analysis for decompilers where the output language is statically type checked can be performed with a sparse data flow algorithm, enabled by the SSA form.

5.7.1 Type Lattices
    Since types are hierarchical and some type pairs are disjoint, the relationship between types forms a lattice.

5.7.2 SSA-Based Type Analysis
    The SSA form links uses to definitions, allowing a sparse representation of type information in assignment nodes.

5.7.3 Typing Expressions
    In a sparse intermediate representation, the types of subexpressions are not stored, so these have to be calculated on demand from the expression leaves.
    In general, type information has to be propagated up the expression tree, then down again, in two separate passes.

5.7.4 Addition and Subtraction
    The types inferred from pointer-sized addition and subtraction instructions require special type functions in a data flow based type analysis.

5.8 Type Patterns
    A small set of high level patterns can be used to represent global variables, local variables, and aggregate element access.

5.9 Partitioning the Data Sections
    Decompilers need a data structure comparable to the compiler's symbol table (which maps symbols to addresses and types) to map addresses to symbols and types.

5.9.1 Colocated Variables
    Escape analysis is needed to determine the validity of separating colocated variables.

5.10 Special Types
    A few special types are needed to cater for certain machine language details, e.g. upper(float64).

5.11 Related Work
    Most related work is oriented towards compilers, and hence does not address some of the issues raised by machine code decompilation.

5.12 Future Work
    While good progress has been made, much work remains before type analysis for machine code decompilers is mature.

5.13 SSA Enablements
    Expression propagation, enabled by the SSA form, combines with simplification to prepare memory expressions for high level pattern analysis, and the SSA form allows a sparse representation for type information.

6 Indirect Jumps and Calls 195

While indirect jumps and calls have long been the most problematic of instruc-

tions for reverse engineering of executable les, their analysis, facilitated

by SSA, yields high level constructs such as switch statements, function

pointers, and class types. . . . . . . . . . . . . . . . . . . . . . . . . . 195

6.1 Incomplete Control Flow Graphs . . . . . . . . . . . . . . . . . . . . . 196

Special processing is needed since the most powerful indirect jump and call

analyses rely on expression propagation, which in turn relies on a com-

plete control ow graph (CFG), but the CFG is not complete until the

indirect transfers are analysed. . . . . . . . . . . . . . . . . . . . . . . 196

6.2 Indirect Jump Instructions . . . . . . . . . . . . . . . . . . . . . . . . . 198

Indirect jump instructions are used in executable programs to implement switch

(case) statements and assigned goto statements, and tail-optimised calls. 198

6.2.1 Switch Statements . . . . . . . . . . . . . . . . . . . . . . . . . 198

Switch statements can conveniently be analysed by delaying the analysis

until after expression propagation. . . . . . . . . . . . . . . . . . . . . 198

Minimum Case Value . . . . . . . . . . . . . . . . . . . . . . . . 200

Where there is a minimum case value, expression propagation,

facilitated by the SSA form, enables a very simple way to improve the

readability of the generated switch statement. . . . . . . . . . . . . . . 200

No Compare for Maximum Case Value . . . . . . . . . . . . . . 201


SUMMARY lvii

There are three special cases where an optimising compiler does

not emit the compare and branch that usually sets the size of the jump

table. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 201

6.2.2 Assigned Goto Statements . . . 201
    Fortran assigned goto statements can be converted to switch statements, removing an unstructured statement, and permitting a representation in languages other than Fortran.
Other Indirect Jumps . . . 203
    Indirect jump instructions that do not match any known pattern have long been the most difficult to translate, but value analysis combined with the assigned goto switch statement variant can be used to represent them adequately.
Other Branch Tree Cases . . . 204
    Branch trees or chains are found in switch statements with a small number of cases, and subtract instructions may replace the usual compare instructions, necessitating some logic to extract the correct switch case values.
6.2.3 Sparse Switch Statements . . . 205
    Sparse switch statements usually compile to a branch tree which can be transformed into a switch-like statement for greater readability.
6.3 Indirect Calls . . . 208
    Indirect calls implement calls through function pointers and virtual function calls; the latter are a special case which should be handled specially for readability.
6.3.1 Virtual Function Calls . . . 209
    In most object oriented languages, virtual function calls are implemented with indirect call instructions, which present special challenges to decompilers.
Data Flow and Virtual Function Target Analyses . . . 211
    Use of the SSA form helps considerably with virtual function analysis, which is more complex than switch analysis, by mitigating alias problems, and because SSA relations apply everywhere.
Null-Preserved Pointers . . . 217
    Compilers sometimes emit code that preserves the nullness of pointers that are offsets from other pointers, necessitating some special simplification rules.

6.3.2 Recovering the Class Hierarchy . . . 218
    Value analysis on the VT pointer member, discussed in earlier sections, allows the comparison of VTs which may give clues about the original class hierarchy.
6.3.3 Function Pointers . . . 219
    Value analysis could be applied to function pointers, which should yield a subset of the set of possible targets, avoiding readability reductions that these pointers would otherwise cause.
Correctness . . . 219
    Imprecision in the list of possible targets for indirect calls leads to one of the few cases where correct output is not possible in general.
6.3.4 Splitting Functions . . . 220
    A newly discovered indirect call, or occasionally a branch used as a call, could end up pointing to the middle of an existing function; this situation can be handled by splitting the function.
6.4 Related Work . . . 221

7 Results 223
    Several techniques introduced in earlier chapters were verified with a real decompiler, and show that good results are possible with their use.
7.1 Industry Case Study . . . 224
    When a Windows program was decompiled for a client, the deficiencies of existing decompilers were confirmed, and the importance of recovering structures was highlighted.
7.2 Limiting Expression Propagation . . . 224
    Common Subexpression Elimination does not solve the problem of excessive expression propagation; the solution lies with limiting propagation of complex expressions to more than one destination.
7.3 Preventing Extraneous Local Variables . . . 230
    When the techniques of Section 4.1.3 are applied to the running example, the generated code is significantly more readable.



7.4 Preserved Parameters . . . 231
    Preserved locations appear to be parameters, when usually they are not, but sometimes a preserved location can be a parameter. Propagation and dead code elimination in combination solve this problem.
7.5 Preservation . . . 234
    Most components of the preservation process are facilitated by the SSA form.
7.5.1 Conditional Preservation Analysis . . . 235
    Preservation analysis in the presence of mutual recursion is complex, as this example shows.
7.6 Redundant Parameters and Returns . . . 240
    The techniques of Section 4.4.3 successfully remove redundant parameters and returns in a test program, improving the readability of the generated code.

8 Conclusion 243
8.1 Conclusion . . . 243
    The solutions to several problems with existing machine code decompilers are facilitated by the use of the SSA form.
8.2 Summary of Contributions . . . 245
    This thesis advances the state of the art of machine code decompilation through several key contributions.
8.3 Future Work . . . 246
    While the state of the art of decompilation has been extended by the techniques described in this thesis, work remains to optimise an SSA-based IR for decompilers that handles aggregates well, and supports alias and value analyses.
Chapter 1

Introduction

Machine code decompilers have the ability to provide the key to software evolution:

source code. Before their considerable potential can be realised, however, several problems need to be solved.

Computers are ubiquitous in modern life, and there is a considerable investment in the

software needed to control them. Software requires constant modification throughout

its life, to correct errors, improve security, and adapt to changing requirements. Source

code, usually written in a high level language, is the key to understanding and modifying

any program. Decompilers can generate source code where the original source code is

not available.

Since the first compilers in the 1950s, the prospect of converting machine code into

high level language has been intriguing. Machine code decompilers were first used in

the 1960s to assist porting programs from one computer model to another [Hal62]. In

essence, a decompiler is the inverse of a compiler, as shown in Figure 1.1.

In this figure, the compiler, assembler, and linker are lumped together for simplicity.

Compilers parse source code into an intermediate representation (IR), perform various

analyses, and generate machine code, as shown in the left half of Figure 1.1. Decompilers

decode binary instructions and data into an IR, perform various analyses, and generate

source code, as shown in the right half of the figure. Both compilers and decompilers

transform the program from one form into another. Both also use an IR to represent

the program; however, the IR is likely to be lower level for a compiler and higher level

for a decompiler, reflecting the different target languages. The overall direction is from

source to machine code (high level to low level) in a compiler, but from machine to

source code (low level to high level) in a decompiler. Table 1.1 compares the features

of source code and machine code.

Despite the complete reversal of direction, compilers and decompilers often employ


Figure 1.1: Symmetry between a compiler and a decompiler.

Table 1.1: Contrasts between source code and machine code.

    Source code            Machine code
    High level             Low level
    Low detail             High detail
    Complex expressions    Simple expressions
    Machine independent    Machine dependent
    More structure         Less structure
    Comprehension aids     No comprehension aids

similar techniques in the analysis phases, such as data flow analysis. As an example,

both compilers and decompilers need to apply algebraic identities such as x or 0 = x,

even though a compiler may eventually emit an instruction such as or r3,0,r5 to copy
register r3 to register r5 if the target instruction set lacks a move instruction. Similarly,

both compilers and decompilers need to know the definition(s) of a variable, whether a

variable is live at a certain program point, and so on.
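As an illustration, the sketch below shows how such an identity might be applied over a decompiler's IR. The IR shape and all names are invented for this example; applied to the decoded form of or r3,0,r5, the rule reduces the right hand side r3 | 0 to r3, exposing the instruction as a simple register copy.

#include <memory>

// Invented minimal IR for this illustration: an expression is a constant,
// a register reference, or a bitwise OR of two subexpressions.
enum class Op { Const, Reg, Or };

struct Exp {
    Op op;
    int value = 0;                   // constant value, or register number
    std::shared_ptr<Exp> lhs, rhs;   // operands when op == Op::Or
};

static bool isZero(const std::shared_ptr<Exp>& e) {
    return e && e->op == Op::Const && e->value == 0;
}

// Apply the identity x | 0 == x (and 0 | x == x). A real decompiler would
// keep a table of many such identities and apply them to a fixed point.
std::shared_ptr<Exp> simplify(const std::shared_ptr<Exp>& e) {
    if (e->op == Op::Or) {
        if (isZero(e->rhs)) return e->lhs;   // x | 0  ->  x
        if (isZero(e->lhs)) return e->rhs;   // 0 | x  ->  x
    }
    return e;
}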

The compiler process of parsing the source code to an intermediate representation (IR)

corresponds roughly with the decompiler process of decoding instructions into its intermediate representation. Similarly, low level code generation in a compiler corresponds

roughly with high level code generation in a decompiler.

Compilation adds machine details to a program, such as which registers to use and

exactly what primitive instructions to use to evaluate expressions, how to implement a

loop, what address to use for a variable, and so on. Compilation removes comprehension

aids such as comments and meaningful names for procedures and variables; these aids

will probably never be recovered automatically. Decompilation removes (abstracts)

the machine dependent detail, and recovers information such as the nesting of loops,

function parameters, and variable types from the instructions and data emitted by the

compiler.

Section 1.1 shows the importance of source code, while Section 1.2 relates decompilers

to other reverse engineering tools. Several of the applications for decompilation are

enumerated in Section 1.3, indicating that many benefits would result from solving some

of the current problems. The goal of this thesis is to solve several of these problems,

in large part by using the Static Single Assignment form, as used in many optimising

compilers. The current state of the art in machine code decompilation is poor, as will

be shown in Section 1.4.

Section 1.5 discusses the problems faced by a variety of binary analysis tools: various

existing and general classes of decompiler, and the ideal disassembler. It is found that

a good decompiler has to solve about twice the number of fundamental problems of any

existing binary analysis tool. Two existing machine code decompilers are included in

the comparison, to show in greater detail the problems that need to be solved. Legal

issues are briefly mentioned in Section 1.6, and Section 1.7 summarises the main goals

of the research.

1.1 Source Code

Source code is so important to software development that at times it becomes worthwhile

to derive it from the executable form of computer programs.

A program's source code specifies to a computer the precise steps required to achieve

the functionality of the program. When that program is compiled, the result is an

executable file containing the verbose, machine specific, minute detail of the steps

required to perform the program on the target machine. The executable

version has the same essential steps, except in much greater detail. For example, the

machine registers and/or memory addresses are given for each primitive step.

The source code and the machine code for the same program are equivalent, in the sense

that each is a specification of how to achieve the intended functionality of the program.

In other words, the program, in both source and machine code versions, conveys how to

perform the program's function. Comments and other design documents can optionally,

and to varying extents, convey what the program is doing, and why. The computer has

no need for the "what" or the "why", only the "how".



Computers with different instruction sets require different details about how to perform
the program; hence two executable files for the same program compiled for different
instruction sets will in general be completely different. However, the essential components
of how to perform the program are in all versions of the program: the original source
code, infinitely many variants of the original source code that also perform the same
functionality, and their infinitely many compilations. The task of a decompiler is
essentially to find one of the variants of the original source code that is semantically
equivalent to the machine code program, and hence to the original source code.

It could be argued that the output of a decompiler, typically C or C++ with generated

variable names and few if any comments, is not high level source code. Despite this,

decompiler output is easier to read than machine code, because

• source code is much more compact than any machine code representation;

• irrelevant machine details are not present, so the reader does not have to be

familiar with the input machine; and

• high level (more abstract) source code features such as loops and conditionals are

easier to grasp at a glance than code with compares, branches and other low level

instructions.

Since the output of a decompiler is much more readable and compact than the machine

code it started with, such output will be referred to as "high level language".
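As a contrived illustration (not drawn from any particular binary), both functions below compute the same sum; the first mirrors the compare-and-branch shape of machine code, while the second is the structured form that a decompiler aims to produce.

// Low level shape: explicit labels and a conditional branch, as the code
// appears in a disassembly or in an early decompiler IR.
int sumLow(const int* a, int n) {
    int sum = 0, i = 0;
    goto test;
body:
    sum += a[i];
    i++;
test:
    if (i < n) goto body;   // conditional branch back to the loop body
    return sum;
}

// Structured shape: the same semantics after structuring analysis.
int sumHigh(const int* a, int n) {
    int sum = 0;
    for (int i = 0; i < n; i++)
        sum += a[i];
    return sum;
}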

Source code does not necessarily imply readability, even though it is a large step in

the right direction. Figure 1.2 shows the C source code for a program which is almost

completely unreadable, even though it compiles on any C compiler. It is unlikely that

any decompiler could produce readable source code for this program, since by design its

operation is hidden. Source code comprehension (even with meaningful variable names

and comments) is an active area of Computer Science.

What is desired is fully reverse engineered source code, such as shown in Figure 1.3. This

was achieved using traditional reverse engineering techniques, starting with the source

code of Figure 1.2. Techniques used include pretty printing (replacing white space),

transforming ternary operator expressions (?:) into if-then-else statements, path profiling, and replacing most recursive function calls with conventional calls [Bal98]. From

this it can be seen that low readability source code has an intrinsic value: it can be

transformed to higher readability source code.

This example also illustrates the difference between decompilation (which might produce
source code similar to that of Figure 1.2) and traditional reverse engineering (which
might produce source code similar to that of Figure 1.3).



main(t,_,a)
char *a;
{return!0<t?t<3?main(-79,-13,a+main(-87,1-_,
main(-86, 0, a+1 )+a)):1,t<_?main(t+1, _, a ):3,main ( -94, -27+t, a
)&&t == 2 ?_<13 ?main ( 2, _+1, "%s %d %d\n" ):9:16:t<0?t<-72?main(_,
t,"@n'+,#'/*{}w+/w#cdnr/+,{}r/*de}+,/*{*+,/w{%+,/w#q#n+,/#{l,+,/n{n+\
,/+#n+,/#;#q#n+,/+k#;*+,/'r :'d*'3,}{w+K w'K:'+}e#';dq#'l q#'+d'K#!/\
+k#;q#'r}eKK#}w'r}eKK{nl]'/#;#q#n'){)#}w'){){nl]'/+#n';d}rw' i;# ){n\
l]!/n{n#'; r{#w'r nc{nl]'/#{l,+'K {rw' iK{;[{nl]'/w#q#\
n'wk nw' iwk{KK{nl]!/w{%'l##w#' i; :{nl]'/*{q#'ld;r'}{nlwb!/*de}'c \
;;{nl'-{}rw]'/+,}##'*}#nc,',#nw]'/+kd'+e}+;\
#'rdq#w! nr'/ ') }+}{rl#'{n' ')# }'+}##(!!/")
:t<-50?_==*a ?putchar(a[31]):main(-65,_,a+1):main((*a == '/')+t,_,a\
+1 ):0<t?main ( 2, 2 , "%s"):*a=='/'||main(0,main(-61,*a, "!ek;dc \
i@bK'(q)-[w]*%n+r3#l,{}:\nuwloca-O;m .vpbks,fxntdCeghiry"),a+1);}
Figure 1.2: An obfuscated C program; it prints the lyrics for "The Twelve Days of
Christmas" (all 64 lines of text). From [Ioc88].

void inner_loop(int count_day, int current_day) {


if (count_day == FIRST_DAY) {
print_string(ON_THE); /* "On the " */
/* twelve days, current_day ranges from -1 to -12 */
print_string(-current_day);
print_string(DAY_OF_CHRISTMAS); /* "day of Christmas ..." */
}
if (count_day < current_day) /* inner iteration */
inner_loop(count_day+1,current_day);
/* print the gift */
print_string(PARTRIDGE_IN_A_PEAR_TREE+(count_day-1));
}

Figure 1.3: Part of a traditional reverse engineering (from source code) of the
"Twelve Days of Christmas" obfuscated program of Figure 1.2. From [Bal98].

Source code with low readability is also useful simply because it can be compiled. It

could be linked with other code to keep a legacy application running, compiled for a

different target machine, optimised for a particular machine, and so on. Where maintainability is required, meaningful names and comments can be added via an interactive

decompiler, or in a post-decompilation process.

It could be said that source code exists at several levels:

• Well written, documented source code, written by the best human programmers.

It contains well written comments and carefully chosen names for functions, classes, variables, and so on.



• Human written source code that, while readable, is not as well structured as the

best source code. It is commented, but not as readable by programmers unfamiliar

with the program. Functions and variables have names, but are not as well chosen

as those in the best programs.

• Source code with few to no comments, mainly generic variable and function names,

but otherwise no strange constructs. This is probably the best source code that

a fully automatic decompiler could generate.

• Source code with few to no comments, mainly generic variable and function names,

and occasional strange constructs such as calling an expression where the original

binary program contained an indirect call instruction.

• Source code with no translation of the data section. Accesses to variables and

data structures are via complex expressions such as *(v3*4+0x8048C00); see the sketch after this list. Most

registers in the original program are represented by generic local variables such

as v3. This is the level of source code that might be emitted by a static binary

translator that emits C code, such as UQBT [CVE00].

• As above, but all original registers are visible.

• As above, but even the program counter is visible, and the program is in essence a

large switch statement with one arm for the various original instructions. This is

the level of source code emitted by Instruction Set Simulation techniques [MAF91].
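The contrast between the last few levels can be sketched in code. The address 0x8048C00 and all names below are invented for this illustration; the low level form only makes sense in the original program's address space.

// Level with no data translation: v3 is a generic local standing in for a
// register, and the global is reached by raw address arithmetic.
int loadLow(int v3) {
    return *(int*)(v3 * 4 + 0x8048C00);
}

// Level with the data section translated: the same access, once the data at
// 0x8048C00 has been recognised as a global array of int.
int a[8];
int loadHigh(int v3) {
    return a[v3];
}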

1.2 Forward and Reverse Engineering

Decompilation is a form of reverse engineering, starting with an executable file, and

ending where traditional reverse engineering starts (i.e. with source code that does not

fully expose the design of a program).

Figure 1.4 shows the relationship between various engineering and reverse engineering

tools and processes. Some compilers generate assembly language for an assembler, while

others generate object code directly. However, the latter can be thought of as generating assembly language internally, which is then assembled to object code. Compilers

for virtual machines (e.g. Java bytecode compilers) generate code that is roughly the

equivalent of object code. The main tools available for manipulating machine code are:

• Disassemblers: these produce a simplified assembly language version of the program; the output may not be able to be assembled without modification.

Figure 1.4: The machine code decompiler and its relationship to other tools and
processes. Parts adapted from Figure 2 of [Byr92].

The user requires a good knowledge of the particular machine's assembly language,

and the output is voluminous for non trivial programs.

• Debuggers: conventional debuggers also operate at the simplified assembly language
level, but have the advantage that the program is running, so that values
for registers and memory locations can be examined to help understand the
program's operation. Source level debuggers would offer greater advantages, but
would require all the problems of static decompilation to be solved first.

• Decompilers: these produce a high level source code version of the program. The

user does not have to understand assembly language, and the output is an order

of magnitude less voluminous than that of a disassembler. The output will in

general not be the same as the original source code, and may not even be in the

same language. Usually, there will be few, if any, comments or meaningful variable

names, except for library function names. Some decompilers read machine code,

some read object (pre-linker) code, and some read assembly language.

Clearly, a good decompiler has significant advantages over the other tools.

It is possible to imagine a tool that combines decompilation and traditional reverse

engineering. For example, Ward has transformed assembly language all the way to

formal specifications [War00]. However, reverse engineering from source code entails the

recognition of "plans" which must be known in advance [BM00], whereas decompiling

to source code from lower forms needs only to recognise such basic patterns as loops

and conditionals. It therefore seems reasonable to view decompilation and traditional

reverse engineering as separate problems.

1.3 Applications

Decompilers have many useful applications, broadly divided into those based on browsing

parts of a program, providing the foundation for an automated tool, and those requiring

the ability to compile the decompiler's output.

The first uses for decompilation were to aid migration from one machine to another, or to

recover source code. As decompiler abilities have increased, software has become more

complex, and the costs of software maintenance have increased, a wide range of potential

applications emerge. Whether these are practical or not hinges on several key questions.

The first of these is, "Is the output to be browsed, is an automated tool required, or
should the output be complete enough to recompile the program?" If the output is to
be recompiled, the next key question is "Is the user prepared to put significant effort
into assisting the decompilation process, or will the automatically generated code be
sufficient?" These questions lead to the graph of potential decompilation applications

shown in Figure 1.5.

1.3.1 Decompiler as Browser


When viewed as program browsers, decompilers are useful tools that focus on parts of

the input program, rather than the program as a whole.

For these applications, a decompiler is used to browse part of a program. Not all of

the output code needs to be read or even generated. Browsing high level source code

from a decompiler could be compared to browsing assembly language from a simple

disassembler such as objdump.



Figure 1.5: Applications of decompilation, which depend on how the output is to be
used, and whether the user modifies the automatically generated output.

Interoperability

Interoperability is the process of discovering the design principles governing a program

or system, for the purposes of enabling some other program or system to operate with

it. These design principles are ideas, not implementations, and are therefore not able to

be protected by patent or copyright. Reverse engineering techniques are lawful for such

uses, as shown by Sega v. Accolade [Seg92] and again by Atari v. Nintendo [Ata92].

See also the Australian Copyright Law Review Committee report [Com95], and the

IEEE position on reverse engineering [IU00].

Decompilation is a useful reverse engineering tool for interoperability where binary code

is involved, since the user sees a high level description of that code.

Learning Algorithms

The copyright laws in many countries permit a person in lawful possession of a program

to "observe, study, or analyse the program in order to understand the ideas and principles

underlying the program" (adapted from [CAI01]). The core algorithm of a program

occupies typically a very small fraction of the whole program. The lower volume of code

to read, and the greater ease of understanding of high level code, confer an advantage

to decompilation over its main alternative, disassembly.

Code Checking

Browsing the output of a decompiler could assist with the manual implementation of

various forms of code checking which would normally be performed by an automatic

tool, if available. This possibility is suggested by the left-most dotted line of Figure 1.5.

1.3.2 Automated Tools


Decompilation technology can be used in automated tools for finding bugs, finding vulnerabilities, finding malware, verification, and program comparison.

Tools that operate on native executable code are becoming more common, especially

as the need for checking the security of applications increases.

Finding Bugs

Sometimes software users have a program which they have the rights to use, but it

is not supported. This can happen if the vendor goes out of business, for example.

Another scenario is where the vendor cannot reproduce a bug, and travelling to the

user's premises is not practical. If a bug becomes evident under these conditions,

decompilation may be the best tool to help an expert user to fix that bug. The only

other alternative would be to use a combination of disassembly and a debugging tool.

It will be difficult to find the bug in either case, but with the considerably smaller

volume of high level code to work through, the expert with the decompiler should

in general take less time than the expert working with disassembly. A tool with a

decompiler integrated with a debugger would obviously be more useful than a stand

alone debugger and decompiler.

The bugs could be fixed by patching the binary program, or source code fragments

illustrating the bug could be sent to the vendor for maintenance. With more user

effort, the bug could be fixed in the saved source code and the program rebuilt, as

described in subsection 1.3.4.

An automated tool could also search for sets of commonly found bugs, such as dereferencing pointers that could be null.

Finding Vulnerabilities

Where a security vulnerability is suspected, a large amount of code has to be checked.

Again, the lower volume of output to check would in general make the decompiler a

far more effective tool. It should be noted that the actual vulnerability will sometimes

have to be checked at the disassembly level, since some vulnerabilities can be extremely

machine specific. Others, such as buffer overflow errors, may be effectively checked

using the high level language output of the decompiler. For the low level vulnerabilities,

decompilation may still save considerable time by allowing the expert user to navigate

quickly to those areas of the program that are most likely to harbour problems.

Finding Malware

Finding malware (behaviour not wanted by the user, e.g. viruses, keyboard logging,

etc.) is similar in approach to finding vulnerabilities. Hence, the same comments apply,

and the same advantages exist for decompilers.

Verification

Verification checks that the software build process has produced correct code, and that

no changes have been made to the machine code file since it was created. This is

important in safety critical applications [BS93].

Comparison

When enforcing software patents, or defending claims of patent infringements, it is

sometimes necessary to compare the high level semantics of one piece of binary code

with another piece of binary code, or its source code. If both are in binary form, these

pieces of code could be expressed in machine code for different machines. Decompilation

provides a way of abstracting away details of the machine language (e.g. which compiler,

optimisation level, etc), so that more meaningful comparisons can be made. Some

idiomatic patterns used by compilers could result in equivalent code that is very different

at the machine code level, and the differences may be visible in the decompiled output.

In these cases, decompilation may need to be accompanied by some source to source

transformations.

1.3.3 Recompilable Code, Automatically Generated


The user may choose to accept the default, automatically generated output, which even

if difficult to read still has significant uses.

Decompilers can ideally be used in round-trip reengineering: starting with a machine

code program, they should be able to produce source code which can be modified, and a

working program should result from the modified source code. This ideal can currently

only be approached for Java and CLI (.NET) bytecode programs, but applications can

be discussed ahead of the availability of machine code decompilers with recompilable

output.

Automatically generated output will be low quality in the sense that there will be few

meaningful identifier names, and few if any comments. This will not be easy code to

maintain; however, recompilable source code of any quality is useful for the following

applications.

Optimise for Platform

Many Personal Computer (PC) applications are compiled in such a way that they will

still run on an 80386 processor, even though only a few percent of programs would be

running on such old hardware. Widely distributed binary programs have to be built

for the expected worst case hardware. A similar situation exists on other platforms,

e.g. SPARC V7 versus V9. There is a special case with Single Instruction Multiple Data

(SIMD) instructions, where the advantage of utilising platform specific instructions is

decisive, and vendors find that it is worth rewriting very small pieces of code several

times for several different variations of instruction set. Apart from this exception,

the advantages of using the latest machine instruction set are often not realised. A

decompiler can realise those advantages by producing source code which is then fed

into a suitable compiler, optimising for the exact processor the user has. The resulting

executable will typically perform noticeably better than the distributed program.

Cross Platform

Programs written for one platform are commonly "ported" to other platforms. Obviously, source code is required, and decompilation can supply it. Two other issues

arise when porting to other platforms: the issues of libraries and system dependencies

in the programs being ported. The latter issue has to be handled by hand editing of

the source code. The libraries required by the source program may already exist on

the target platform, they may have to be rewritten, or a compatibility library such as

Winelib [Win01] could be used.

With the introduction of 64-bit operating systems, another opportunity for decompilation arises. 32-bit drivers will typically not work with 64-bit operating systems. In these

cases, either all drivers will have to be rewritten, or some form of compatibility layer

provided. The compatibility layer, if provided, typically provides lower performance

than native drivers, and of course will not provide the benefits of 64-bit addresses

(i.e. being able to address more than 4GB of memory). There will be some hardware

for which there is no vendor support, yet the hardware is still useful. In these cases,

a 64-bit driver could be rewritten from decompiled source code for the 32-bit driver.

This would allow maximum performance and provide full 64-bit addressing.

Machine code drivers are commonly the only place that low level hardware details

are visible. As a result, porting drivers across operating systems when the associated

hardware is not documented may even be feasible with the aid of decompilation to

expose the necessary interoperability information. Drivers also have the advantage

of being relatively small pieces of code, which typically call a relatively small set of

operating system functions.

Other Applications for Automatically Generated Code

Automatically generated code may be suitable for fixing bugs and adding features.

More commonly, these would be performed on manually enhanced, maintainable code,

as discussed below. The dotted lines near the right end of Figure 1.5 indicate this

possibility.

1.3.4 Recompilable, Maintainable Code


If sufficient manual effort is put into the decompilation process, the generated code can

be recompilable and maintainable.

In contrast with creating source code for the above cases, creating maintainable code

requires considerable user input. The process could be compared to using a disassembler

with commenting and variable renaming abilities. The user has to understand what the

program is doing, enter comments and change variable names in such a way that others

reading the resultant code can more easily understand the program. There is still a

significant advantage over rewriting the program from scratch, however: the program

is already debugged.

The result will probably not be bug free, since no significant program is. However the

effort of writing and debugging a new program to the same level of bugs as the input

program is often substantially larger than the effort of adding enough comments and

meaningful identifiers via decompilation. (This assumes that the decompiler does not

introduce more bugs than a manual rewrite.)

"It is easier to optimize correct code than to correct optimized code." - Bill
Harlan [Har97].

In decompilation terms, optimisation is actually the creation of the most readable
representation of the program; the optimum program is the most readable one. The above
quote, obviously aimed at the forward engineering process, applies equally to decompilation.

Decompilation for comprehension, including for maintenance, could be viewed as a

program understanding problem where the only program documents available are end-

user documents and the machine code program itself.



Fix Bugs

Once maintainable source code is available for an application, the user can fix bugs by

making changes to the decompiled high level code.

It could be argued that fixing bugs is possible with automatically generated code,
because the changes could be made in the absence of comments or meaningful names.
However, it will be easier to effect the change and to maintain it (bug fixes need
maintenance, like all code), if at least the code near the bug is well commented. This is the
reason for the dotted line between "auto generated" and "fix bugs" in Figure 1.5.

Many programs ship without debugging information, and often link with third party

code. Sometimes when the program fails in the field, a version of the failing program

with debugging support turned on will not exhibit the fault. In addition, developers

are reluctant to allow debug versions of their program into the field. Existing tools

for this situation are inadequate. A high level debugger, essentially a debugger with a

decompiler in place of the usual disassembler, could ease the support burden, even for

companies that have the source code (except perhaps to third party code) [CWVE01].

Add Features

Similarly to the above, once maintainable source code is available, the user can add

new features to the program. Also similarly to the above, it could be argued that this

facility is also available to automatically generated code.

Source Code Recovery

Legacy applications, where the source code (or at least one of the source files) is missing,

can be maintained by combining a decompiler with hand editing to produce maintainable

source code. Variations include applications with source code but where the compiler is

no longer available, or is not usable on modern machines. Another variation is where

there is source code, but it is missing some of the desirable features of modern languages.

An example of the latter is the systems language BCPL, which is essentially untyped.

Sometimes there is source code, but it is known to be out of date and missing some

features of a particular binary file. Finally, one or more source code files may be written

in assembly language, which is machine specific and difficult to maintain.



1.4 State of the Art

The state of the art as of 2002 needed improvement in many areas, including the recovery

of parameters and returns, type analysis, and the handling of indirect jumps and calls.

At the start of this research in early 2002, machine code decompilers could automatically

generate source code for simple machine code programs that could be recompiled with

some effort (e.g. adding appropriate #include statements). Statically compiled library

functions could be recognised to prevent decompiling them, give them correct names,

and (in principle) infer the types of parameters and the return value [VE98, Gui01]. The

problem of transforming unstructured code into high level language constructs such as

conditionals and loops is called "structuring". This problem is largely solved; see [Cif94]

Chapter 6 and [Sim97].

Unlike bytecode programs, machine code programs rarely contain the names of functions

or variables, with the exception of the names of library functions and user functions

dynamically invoked by name. As a result, converting decompiler output into maintainable

code involves manually changing the names of functions and variables, and adding

comments. At the time, few decompilers performed type recovery, so the types

of variables usually had to be manually entered as well. Enumerated types (e.g. enum
colour {red, green, blue};) that do not interact with library functions can never

be recovered automatically from machine code, since they are indistinguishable from

integers.

Many pre-existing decompilers have weak analyses for indirect jump instructions, and

even weaker analysis of indirect call instructions. When an indirect jump or call is not

analysed, it becomes likely that the decompilation will not only fail to translate that

instruction, but the whole separation of code and data could be incomplete, possibly

resulting in large parts of the output being either incorrect or missing altogether.

Most decompilers attempted to identify parameters and returns of functions. However,

there were often severe limitations on this identification (e.g. the decompiler might

assume that parameters are always or never located in registers).

Existing decompilers can be used for a variety of real-world applications, if the user is

prepared to put in significant effort to manually overcome the deficiencies of the decompiler.

Sometimes circumstances would warrant such effort; see for example [VEW04].

However, such situations are rare. There would be far more applications for decompilation

if the amount of effort needed to correct deficiencies in the output could be

reduced. In summary, the main limitations of existing decompilers were:



• Type recovery was poor to non existent.

• There was poor translation of indirect jump instructions to switch/case statements.

• There was even poorer handling of indirect call instructions, e.g. to indirect function calls in C, or virtual functions in C++.

• The identification of parameters and returns made assumptions that were sometimes invalid.

• Some decompilers had poor performance, or failed altogether, on programs more than a few tens of kilobytes in size.

• Some decompilers produced output that resembled high level source code, but which would not compile without manual editing.

• Some decompilers produced incorrect output, e.g. through incorrect assumptions about stack pointer preservations.

Finally, meaningful names for functions and variables are not generated. Certainly,

the original names are not recoverable, assuming that debug symbols are not present.

Similarly, enumerated types (not associated with library functions) are not generated

where a human programmer would use them. Unless some advanced artificial intelligence techniques become feasible, these aspects will never be satisfactorily generated

without manual intervention.

1.5 Reverse Engineering Tool Problems

Various reverse engineering tools are compared in terms of the basic problems that they

need to solve; the machine code decompiler has the largest number of such problems.

Reverse Engineering Tools (RETs) that analyse binary programs face a number of

common difficulties, which will be examined in this section. Most of the problems stem

from two characteristics of executable programs:

1. Some information is not present in some executable file formats, e.g. variable and

procedure names, comments, parameters and returns, and types. These are losses

of information, and require analyses to recover the lost information.



2. Some information is mixed together in some executable file formats, e.g. code and

data; integer and pointer calculations. Pointers and offsets can be added together

at compile time, resulting in a different kind of mixing which is more difficult

to separate. These are losses of separation, and require analyses to make the

separation, without which the representation of the input program is probably

not correct.

By contrast, forward engineering tools have available all the information needed. Programming
languages, including assembly languages, are designed with forward engineering
in mind, e.g. most require the declaration of the names and types of variables

before they are used.

There are some advantages to reverse engineering over forward engineering. For example, usually the reverse engineering tool will have the entire program available for

analysis, whereas compilers often only read one module (part of a program) in isolation.

The linker sees the whole program at once, but usually the linker will not perform any

significant analysis. This global visibility for reverse engineering tools can potentially

make data flow and alias analysis more precise and effective for reverse engineering tools

than for the original compiler. This advantage has a cost: keeping the IR for the whole

program consumes a lot of memory.

Another advantage of reverse engineering from an executable file is that the reverse

engineering tool sees exactly what the processor will execute. At times, the compiler

may make decisions that surprise the programmer [BRMT05]. As a result, security

analysis tools increasingly work directly from executable files, and hopefully decompilers

may be able to help in this area.

Table 1.2 shows the problems to be solved by various reverse engineering tools. The

tools considered are:

• decompilers for Java and CLI/CIL bytecodes

• decompilers that read assembly language and object (pre-linker) code

• the ideal disassembler: it transforms any machine code program to reassemblable

assembly language

• the ideal machine code decompiler: it transforms any machine code program to

recompilable high level language

• the problems that were solved by two specific machine code decompilers, dcc

[Cif96] and the Reverse Engineering Compiler (REC) [Cap98].



Table 1.2: Problems to be solved by various reverse engineering tools.
(Column key: MC = ideal machine code decompiler; dcc and REC = existing machine
code decompilers; Obj = object code decompiler; Asm = assembly decompiler;
Dis = ideal disassembler; Java and CLI = bytecode decompilers.)

Problem                                  MC   dcc   REC   Obj   Asm   Dis   Java   CLI
Separate code from data                  yes  some  some  some  no    yes   no     no
Separate pointers from constants         yes  no    no    easy  no    yes   no     no
Separate original from offset pointers   yes  no    no    easy  no    yes   no     no
Declare data                             yes  no    no    yes   easy  yes   easy   easy
Recover parameters and returns           yes  yes   most  yes   yes   no    no     no
Analyse indirect jumps and calls         yes  no    no    yes   yes   yes   no     no
Type analysis                            yes  no    no    yes   some  no    most*  no
Merge instructions                       yes  yes   yes   yes   yes   no    yes    yes
Structure loops and conditionals         yes  yes   yes   yes   yes   no    yes    yes
Total                                    9    3½    3¼    6½    4½    5     2½     2

* most local variables

1.5.1 Separating Code from Data


Separating code from data is facilitated by data flow guided recursive traversal and the

analysis of indirect jumps and calls, both of which are addressed in this thesis.

Recompilability requires the solution of several major problems. The first of these is

the separation of code from data. For assembly, Java bytecode, and CLI decompilers,

this problem does not exist. For the other decompilers, the problem stems from the

fact that in most machines, data and code can be stored in the same memory space.

The general solution to this separation has been proved to be equivalent to the halting

problem [HM79]. The fact that many native executable file formats have sections with

names such as .text and .data does not alter the fact that compilers and programmers

often put data constants such as text strings and switch jump tables into the same

section as code, and occasionally put executable code into data segments. As a result,

the separation of code and data remains a significant problem.

There are a number of ways to attack this separation problem; a good survey can be
found in [VWK+03]. The most powerful technique available to a static decompiler is
the data flow guided recursive traversal. With this technique, one or more entry points

are followed to discover all possible paths through the code. This technique relies on all

paths being valid, which is unlikely to be true for obfuscated code. It also relies on the

ability to find suitable entry points, which can be a problem in itself. Finally, it relies

on the ability to analyse indirect jump and call instructions using data flow analysis.
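The following C++ sketch illustrates the recursive traversal idea under simplifying assumptions: the successors callback stands in for an instruction decoder returning fall-through, branch, and call targets, and indirect transfers are assumed to have already been resolved to explicit target lists. None of these names come from an actual decompiler.

#include <cstdint>
#include <functional>
#include <set>
#include <vector>

// Follow all paths from the entry points, marking every address reached as
// code; whatever is never reached may be data.
std::set<uint32_t> findCode(
        const std::vector<uint32_t>& entryPoints,
        const std::function<std::vector<uint32_t>(uint32_t)>& successors) {
    std::set<uint32_t> code;          // addresses proven reachable as code
    std::vector<uint32_t> work(entryPoints.begin(), entryPoints.end());
    while (!work.empty()) {
        uint32_t addr = work.back();
        work.pop_back();
        if (!code.insert(addr).second)
            continue;                 // already visited
        for (uint32_t succ : successors(addr))
            work.push_back(succ);
    }
    return code;
}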

Part of the problem of separating code from data is identifying the boundaries of procedures. For programs that use standard call and return instructions this is usually

straightforward. Tail call optimisations replace call and return instruction pairs with

jump instructions. These can be detected easily where the jump is to the start of a

function that is also the destination of a call instruction. However, some compilers

such as MLton [MLt02] do not use conventional call and return instructions for the

vast majority of the program. MLton uses Continuation Passing Style (CPS), which

is readily converted to ordinary function calls. Decompilation of programs compiled

from functional programming languages is left for future work, but it does not appear

at present that finding procedure boundaries poses any particularly difficult additional

problems.
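The tail call case can be illustrated with a contrived C++ fragment: the call to g below is the last action of f, so an optimising compiler is free to emit a plain jump to g in place of a call and return pair, and a decompiler that sees only the jump must recognise it as a call.

// Compiled with optimisation, f may end with "jmp g" rather than
// "call g; ret", so control reaches the start of another function via a
// jump, and that jump should be decompiled as a call.
int g(int x) { return x * 2; }

int f(int x) {
    return g(x + 1);   // tail position: eligible for tail call optimisation
}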

1.5.2 Separating Pointers from Constants


The problem of separating pointers from constants is solved using type analysis.

The second major problem is the separation of pointers from constants. For each

pointer-sized immediate operand of an instruction, a disassembler or decompiler faces

the choice of representing the immediate value as a constant (integer, character, or other

type), or as a pointer to some data in memory (it could point to any type of data). Since

addresses are always changed between the original and the recompiled program, only

those immediate values identified as pointers should change value between the original

and recompiled programs.

This problem is solved using type analysis, as described in Chapter 5.
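As a small illustration of the ambiguity (the address 0x8049180 and all names are invented): compiled for a 32-bit machine, both of the initialisations below can put the same immediate value into the machine code, and only the way the value is subsequently used reveals which type it had.

int  table[4] = {1, 2, 3, 4};  // suppose the linker happens to place table
                               // at address 0x8049180
int  magic = 0x8049180;        // an integer that merely looks like an address
int* entry = &table[0];        // a genuine pointer: the same bit pattern
                               // 0x8049180 appears in the machine code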

1.5.3 Separating Original from Offset Pointers


Separating original pointers from those with offsets added (offset pointers) requires range

analysis.

Reverse engineering tools that read native executable files have to separate original

pointers (i.e. pointers to the start of a data object) from offset pointers (e.g. pointers

to the middle of arrays, or outside the array altogether). Note that if the symbol table

was present, it would not help with this problem. Figure 1.6 shows a simple program

illustrating the problem, in C source code and x86 machine code disassembled with IDA

Pro [Dat98]. There is no need to understand much about the machine code, except to

note that the machine code uses the same constant (identified in this disassembly as str)
to access the three arrays. Interested readers may refer to the x86 assembly language

overview in Table 3.1 on page 63, but note that this example is in Intel assembly

language style where the destination operand comes first.

(a) Source code:

#include <stdio.h>
int a[8] = {1, 2, 3, 4, 5, 6, 7, 8};
float b[8] =
    {1., 2., 3., 4., 5., 6., 7., 8.};
char* str = "collision!";
int main() {
    int i;
    for (i=-16; i < -8; i++)
        printf("%-3d ", (a+16)[i]);
    printf("\n");
    for (i=-8; i < 0; i++)
        printf("%.1f ", (b+8)[i]);
    printf("\n");
    printf("%s\n", str);
    return 0;
}

(b) Disassembly of the underlined source code lines (Intel syntax):

mov     eax, offset str
lea     eax, [edx+eax]
mov     eax, [eax]
push    eax
push    offset a3d       ; "%-3d "
call    sub_8048324      ; printf

mov     eax, offset str
lea     eax, [edx+eax]
fld     dword ptr [eax]
lea     esp, [esp-8]
fstp    [esp+24h+var_24]
push    offset a_1f      ; "%.1f "
call    sub_8048324      ; printf

mov     eax, str
push    eax
call    sub_8048304      ; puts

Figure 1.6: A program illustrating the problem of separating original and offset pointers.

In this program, there are two arrays which use negative indexes. Similar problems

result from programs written in languages such as Pascal which support arbitrary array

bounds. In this program, the symbol table was not stripped, allowing the disassembler

to look up the value of the constants involved in the array fetch instructions. However,

the disassembler assumes that the pointers are original, i.e. that they point to the start

of a data object, in this case the string pointer str. The only way to find out that the

first two constants are actually offset pointers is to analyse the range of possible values

for the index variable i (register edx in the machine code contains i*4).
Note that decompiling the first two array elements as

((int*)(str-16))[i] and

((float*)(str-8))[i] respectively

would in a sense be an accurate translation of how the original program worked. It

may under some circumstances even compile and run correctly, but it would be very

poor code quality. It is difficult to read, but most importantly the compiler of the

decompiled output is free to position the data objects of the program in any order and

with any alignment that may be required by the target architecture, probably resulting

in incorrect behaviour.

Without a separation of original and offset pointers, it is not possible to determine
which data object is accessed in memory expressions, so that not all definitions are
known. In addition, when the address of a data object is taken, it is not known which
object's address is taken, which could also lead to a definition not being identified. As
will be shown in Chapter 3, underestimating definitions is unsafe.

The problems of separating pointers from constants and original pointers from offset
pointers could be combined. The combined problem would be the problem of separating
constants, original pointers, and offset pointers. These problems are not combined here
because although they stem from the same cause (the linker adding together quantities
that it is able to resolve into constants), the problems are somewhat distinct. Also,
separating original from offset pointers requires an extra analysis that separating constants
from pointers does not.

1.5.4 Tools Comparison


The problems faced by reverse engineering tools increase as the abstraction distance

from source code to the input code increases; hence assembly language decompilers face

relatively few problems, and machine code decompilers face the most.

The information and separation losses discussed above occur at different stages of the

compilation process, as shown in Figure 1.7.

Figure 1.7 shows that no separations are lost in the actual compilation process (here
excluding assembly and linking). In principle, procedure level comments could be present
at the assembly language level (and even at the level of major loops, before and after
procedure calls, etc.). Hence, only statement level detailed comments are lost at this
stage. The entry "structured statements" indicates that several structured statements
are lost, such as conditionals, loops, and multiway branch (switch or case) statements.
However, the switch labels persist at the assembly language level. The original expressions
from the source code are not directly present in the assembly language, however
the elementary components (e.g. individual add and divide operations) are present in
individual instructions. Some higher level types may still be present, e.g. structure
member Customer.address may be accessed via the symbols Customer (address of
the structure) and address (offset to the address field). An array of double precision
floating point numbers starts with a label and has assembly language statements that
reserve space for the array. The elementary type of the array elements is absent, however
its name and size (in machine words or bytes, if not in elements) is given.

Compiler (source code to assembler code): loses detailed comments, structured
statements, expressions, elementary types, and local variable names¹; no
separations are lost.
Assembler (assembler code to object code): loses all comments, all types, data
declarations, switch labels, and parameters and returns, plus local variable
names²; code is no longer separated from data.
Linker (object code to machine code): loses relocation information and variable
and procedure names; pointers are no longer separated from constants, nor
original pointers from offset pointers.
¹ Some compilers. ² If not already lost.

Figure 1.7: Information and separations lost at various stages in the compilation
of a machine code program.

This shows that decompiling from assembly language to source code is not as difficult, in

terms of fundamental problems to be overcome, as decompilation or even disassembly

from object or machine code.

After assembly, all comments, types, and data declarations are lost, as are switch labels.

At this stage, the first separation is lost: code and data are mixed in the executable

sections of the object code file. Even at the object code level, however, there are

clues provided by the relocation information. For example, a table of pointers will have

pointer relocations and no gaps between the relocations; this can not be executable code.

However, a data structure with occasional pointers is still not readily distinguished

from code. For this reason, the entry for "separate code from data" in the object code

decompiler column is given as "some".

After linking, all variable and procedure names are lost, assuming the usual case that

symbols and debug information are stripped. The very useful relocation information is

also lost. It would be convenient for decompilation if processors had separate integer and

pointer processing units, just as they now have separate integer and floating point units.

Since they do not, an analysis is required to separate integer constants from pointer

constants. Object code provides ready clues via relocation information to make this

analysis easy. From machine code, the analysis has to determine whether the result of a

calculation is used as a memory address or not, and, if it is a memory pointer, whether

the pointer is original or offset. These are more difficult problems that cannot be solved

for all possible input programs.

1.5.4.1 Disassemblers

Machine code decompilers face four additional problems to those of an ideal disassembler; all are addressed in this thesis, or are already solved.

The problems encountered by an ideal decompiler as opposed to an ideal disassembler

are:

• identifying parameters and returns; this is covered in Chapter 3;

• declaring types using type analysis; this is covered in Chapter 5;

• merging instruction semantics to form complex expressions (a small example follows this list); this is already solved by Java and CLI decompilers, and is facilitated by the data flow techniques of

Chapter 3; and

• structuring loops and conditionals, which are already solved by Java and CLI

decompilers.
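As a contrived illustration of instruction merging (the register names are real x86, the rest is invented): the sequence mov eax, [ebp-8]; add eax, [ebp-12]; shl eax, 2 first decodes to three separate IR assignments, which expression propagation then merges into the single expression (m[ebp-8] + m[ebp-12]) << 2, decompiled as something like:

// The three instructions above, merged into one source level expression;
// x and y stand for the two stack locals m[ebp-8] and m[ebp-12].
int merged(int x, int y) {
    return (x + y) << 2;   // one complex expression instead of three steps
}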

In other words, if an ideal disassembler could be written, the techniques now exist to

extend it to a full decompiler. Unfortunately, present disassemblers suffer from similar

limitations to those of current decompilers.

1.5.4.2 Assembly Decompilers

Good results have been achieved for assembly language decompilers; however, they only face about half the problems of a machine code decompiler.

Some decompilers take as input assembly language programs rather than machine code;

such tools are called assembly decompilers. Assembly decompilers have an easier job in

several senses. An obvious advantage is the likely presence of comments and meaningful identifier names in the assembly language input. Other advantages are the nonexistence of the three separation problems of disassemblers (separating code from data, pointers from constants, and original from offset pointers), much less type analysis work, and easy data declaration. Assembly language decompilers have the names and

sizes of data objects, hence there is no problem separating original and offset pointers.

Figure 1.8 shows the assembly language code for the underlined code of Figure 1.6(a),

with the explicit labels a, b, and str.


    movl   $b+32, %eax               movl   $a+64, %eax
    leal   (%edx,%eax), %eax         leal   (%edx,%eax), %eax
    flds   (%eax)                    movl   (%eax), %eax
    leal   -8(%esp), %esp            pushl  %eax
    fstpl  (%esp)                    pushl  $.LC1   ; "%-3d "
    pushl  $.LC2   ; "%.1f "         call   printf
    call   printf
                                     movl   str, %eax
                                     pushl  %eax
                                     call   puts

Figure 1.8: Assembly language output for the underlined code of Figure 1.6(a), produced by the GCC compiler.

Assembly decompilers still have some of the major problems of decompilers as shown

in Table 1.2: identifying function parameters and return values, merging instruction

semantics into expressions, and structuring into high level language features such as

loops. The symbols in assembly language typically make compound types such as

arrays and structures explicit, but the types of the array or structure elements typically still require type analysis; hence the entry some under type analysis for assembly

decompilers in Table 1.2. Overall, assembly decompilers have slightly fewer problems

to solve than an ideal disassembler, if each problem is assigned an equal weight.

The above considerations apply to normal assembly language, with symbols for each

data element. For some generated assembly language, where the addresses of data items are forced to agree with the addresses of some other program, it would be much more difficult to generate recompilable high level source code. The assembly language

output of a binary translator would be an example of such difficult assembly language.

Applications for the decompilation of such programs would presumably be quite rare,

and will not be considered further here.

1.5.4.3 Object Code Decompilers

Object code decompilers are intermediate in the number of problems that they face be-

tween assembly decompilers and machine code decompilers, but the existence of reloca-

tion information removes two significant separation problems.

A few decompilers take as input linkable object code (.o or .obj files). Object files are interesting in that they contain all the information contained in machine code files,

plus more symbols and relocation information designed for communication with the

linker. The extra information makes the separation of pointers from constants easy, and of original pointers from offset pointers as well. Figure 1.9 shows the disassembly for

the underlined code of Figure 1.6(a), starting from the object file, again with explicit

references to the symbols a, b, and str. However, there are few circumstances under

which a user would have access to object files but not the source files.

    mov    eax, (offset b+20h)       mov    eax, (offset a+40h)
    lea    eax, [edx+eax]            lea    eax, [edx+eax]
    fld    dword ptr [eax]           mov    eax, [eax]
    lea    esp, [esp-8]              push   eax
    fstp   [esp+24h+var_24]          push   offset a3d   ; "%-3d "
    push   offset a_1f  ; "%.1f "    call   printf
    call   printf
                                     mov    eax, str
                                     push   eax
                                     call   puts

Figure 1.9: Disassembly of the underlined code from Figure 1.6(a) starting with object code. Intel syntax. Compare with Figure 1.6(b), which started with machine code.

1.5.4.4 Virtual Machine Decompilers

Most virtual machine decompilers such as Java bytecode decompilers are very successful because their metadata-rich executable file formats ensure that they only face two problems, the solutions for which are well known.

Some decompilers take as input virtual machine executable les, e.g. Java bytecodes.

Despite the fact that the Java bytecode file format was designed for programs written

in Java, compilers exist which compile several languages other than Java to bytecodes

(e.g. Component Pascal [QUT99] and Ada [Sof96]; for a list see [Tol96]). Like assem-

bly decompilers, most virtual machine decompilers have an easier job than machine

code decompilers, because of the extensive metadata present in the executable program

files. For example, Java bytecode decompilers have the following advantages over their

machine code counterparts:

• Data is already separated from code.

• Separating pointers (references) from constants is easy, since there are bytecode

opcodes that deal exclusively with object references (e.g. getfield, new).

• Parameters and return values are explicit in the bytecode file.

• There is no need to decode global data, since all data are contained in classes.

Class member variables can be read directly from the bytecode file.

• Type analysis is not needed for member variables and parameters, since the types are explicit in the bytecode file. However, type analysis is needed for most local variables. In the Java Virtual Machine, local variables are divided by opcode group into the broad types integer, reference, float, long, and double. There is no indication as to what subtype of integer is involved (boolean, short, or int), or what objects a reference refers to. The latter problem is the most difficult. It is also possible for a local variable slot to take on more than one type, even though this is not allowed at the Java source code level [GHM00]. This requires the different type usages of the local variable to be split into different local variables, as sketched below.
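The following minimal C sketch (all names hypothetical; the union merely models a single virtual machine slot) illustrates the splitting: merged_slot() shows one storage location used with two types, and split() shows the two distinct typed locals a decompiler must produce:

    #include <stdio.h>

    /* merged_slot() models one local-variable slot holding an int on
     * one path and a reference on the other, which is legal in bytecode
     * but not as a single typed source variable. split() is the
     * decompiled form, with the uses separated into two typed locals. */
    static void merged_slot(int flag) {
        union { int i; const char *s; } slot;   /* models one VM slot */
        if (flag) { slot.i = 42;   printf("%d\n", slot.i); }
        else      { slot.s = "hi"; printf("%s\n", slot.s); }
    }

    static void split(int flag) {
        if (flag) { int n = 42;           printf("%d\n", n); }
        else      { const char *s = "hi"; printf("%s\n", s); }
    }

    int main(void) { merged_slot(1); split(0); return 0; }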

The following problems are common with machine code decompilers:

• Typing of most local variables;

• Merging instruction effects into expressions; and

• Structuring the code (eliminating gotos, generating conditionals, loops, break statements, etc.).

Bytecode decompilers have one minor problem that most others do not: they need to

flatten the stack oriented instructions of the Java (or CIL) Virtual Machine into the

conventional instructions that real processors have. While this process is straightfor-

ward, it is sometimes necessary to create temporary variables representing the current

top of stack (sometimes called stack variables). A few bytecode decompilers assume

that the stack height is the same before and after any Java statement (i.e. values are

never left on the stack for later use). These fail with optimised bytecode.
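A hedged sketch of this flattening, in C rather than bytecode (the stack-machine sequence in the comment is hypothetical): a value that optimised bytecode leaves on the stack for reuse becomes an explicit stack variable in the output:

    /* The hypothetical stack-machine sequence
     *     push a; push b; add; dup; mul
     * leaves the sum on the stack for later use; flattening it to
     * conventional code requires the explicit stack variable s0. */
    int flatten_example(int a, int b) {
        int s0 = a + b;   /* push a; push b; add */
        return s0 * s0;   /* dup; mul            */
    }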

The advantages of virtual machine decompilers are due exclusively to the metadata

present in their executable file formats. A decompiler for a hypothetical virtual machine

whose executable files are not metadata rich would be faced with the same problems as

a machine code decompiler.

At present, bytecode to Java decompilers are in an interesting position. The number of

problems that they have to solve is small, so the barrier to writing them is comparatively

small, many are available, and some perform quite well. A few of them perform type

analysis of local variables. For this reason, several Java decompilers are reviewed in

Section 2.4 on page 42.

1.5.4.5 Limitations of Existing Decompilers

The limitations of existing machine code decompilers include the size of the input pro-

gram, identification of parameters and returns, handling of indirect jumps and calls,

and type analysis.



Of the handful of machine code decompilers in existence at the start of this research in

early 2002, only two had achieved any real success. These are the dcc decompiler [Cif96],

and the Reverse Engineering Compiler (REC) [Cap98], both summarised in Section 2.1

on page 35. Dcc is a research decompiler, written to validate the theory given in a PhD

thesis [Cif94]. It decompiles only 80286 DOS programs, and emits C. REC is a non-commercial decompiler recognising several machine code architectures and binary file formats, and produces C-like output. Considerable hand editing is required to convert

REC output to compilable C code. Table 1.2 showed the problems solved by these

decompilers, compared with the problems faced by various other tools, while Table 1.3

shows the limitations in more detail.

Table 1.3: Limitations for the two most capable preexisting machine code decompilers.

Problem area             dcc                                    REC
---------------------    ------------------------------------   ------------------------------------
Handle large programs    No                                     Yes, but does not appear to perform
                                                                much interprocedural analysis
Identification of        Yes, but only performed inter-         Inconsistent results
parameters and returns   procedural analysis on registers
                         and made assumptions about stack
                         parameters
Indirect jumps           IR level pattern matching;             Inconsistent results
                         mentions need for more idioms
Indirect calls           Assumes memory function pointer is     No recovery of higher level
                         constant; no recovery of higher        constructs
                         level constructs
Type analysis            Handles pairs of registers for long    No floats, arrays, structures, etc.
                         integers; no floats, arrays,
                         structures, etc.

1.5.5 Theoretical Limits and Approximation


Compilers and decompilers face theoretical limits which can be avoided with conserva-

tive approximations, but while the result for a compiler is a correct but less optimised

program, the result for a decompiler ranges from a correct but less readable program to

one that is incorrect.



Many operations fundamental to decompilation (such as separating code from data) are

equivalent to the halting problem [HM79], and are therefore undecidable. By Rice's the-

orem [Ric53], all non-trivial properties of computer programs are undecidable [Cou99],

hence compilers and other program-related tools are aected by theoretical limits as

well.

A compiler can always avoid the worst outcome of its theoretical limitations (incorrect

output) by choosing conservative behaviour. As an example, if a compiler cannot

prove that a variable is a constant at a particular point in a program when in reality

it is, there is a simple conservative option: the constant propagation is not applied

in that instance. The result is a program that is correct; the cost of the theoretical

limit is that the program may run more slowly or consume more memory than if the

optimisation had been applied. Note that any particular compiler theoretical limitation

can be overcome with a sufficiently powerful analysis; the theoretical limitation implies

that no compiler can ever produce optimal output for all possible programs.
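A minimal sketch of such a conservative choice follows (the function names are hypothetical): when the analysis cannot prove the constant, it simply declines to propagate it, and the output remains correct:

    /* opaque() stands for a call whose effect on *p the analysis cannot
     * determine; a trivial body keeps this sketch compilable. */
    void opaque(int *p) { (void)p; }

    int f(void) {
        int x = 5;
        opaque(&x);   /* analysis cannot prove x is unchanged */
        return x;     /* conservatively not folded to: return 5; */
    }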

This contrasts with a decompiler, which, due to similar theoretical limitations, cannot

prove that an immediate value is the address of a procedure. The conservative behaviour

in this case would be to treat the value as an integer constant, yet somehow ensure that

all procedures in the decompiled program start at the same address as in the input

program, or that there are jump instructions at the original addresses that redirect

control ow to the decompiled procedures. If in fact the immediate value is used as a

procedure pointer, correct behaviour will result, and obviously if the value was actually

an integer (or used sometimes as an integer and elsewhere as a procedure pointer), no

harm is done. Note that such a solution is quite drastic, compared to the small loss

of performance suered by the compiler in the previous example. To avoid this drastic

measure, the decompiler will have to choose between an integer and procedure pointer

type for the constant (or leave the choice to the user); if the incorrect choice is made,

the output will be incorrect.
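The decompiler's dilemma can be sketched similarly (the address and names below are hypothetical): the same immediate value admits two decompilations, and choosing the wrong one produces incorrect output:

    /* Two possible decompilations of the same immediate value: if
     * 0x80483f0 was really a procedure address, option_int() is wrong;
     * if it was really an integer, option_call() is wrong. A machine
     * code decompiler must choose, or leave the choice to the user. */
    typedef void (*fptr)(void);

    unsigned option_int(void) { return 0x80483f0u; }     /* as integer */
    void option_call(void)    { ((fptr)0x80483f0u)(); }  /* as procedure pointer */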

A similar situation will be discussed in Section 3.6 with respect to calls where the

parameters cannot be fully analysed. A correct but less readable program can be

generated by passing all live locations as parameters. In the worst case, a decompiler

could fall back to binary translation techniques, whose output (if expressed in a language

like C) is correct but very difficult to read. Since the original program is (presumed

to be) correct, the decompiled program can always mimic the operation of the original

program.

Where a decompiler chooses not to be completely conservative (e.g. to avoid highly

unreadable output), it is important that it issues a comprehensive list of warnings

about decisions that have been made which may prove to be incorrect.

1.6 Legal Issues

There are sufficient important and legal uses for decompilation to warrant this research,

and decompilation may facilitate the transfer of facts and functional concepts to the

public domain.

A decompiler is a powerful software analysis tool. Section 1.3 detailed many important

and legal uses for decompilation, but there are obviously illegal uses as well. It is

acknowledged that the legal implications of decompilation, and reverse engineering tools

in general, are important. However, these issues are not considered here. For an

Australian perspective on this subject, see [FC00] and [Cif01].

Noted copyright activist Lawrence Lessig has written:

`But just as gun owners who defend the legal use of guns are not endorsing

cop killers, or free speech activists who attack overly broad restrictions on

pornography are not thereby promoting the spread of child pornography,

so too is the defence of p2p technologies not an endorsement of piracy. '

[Les05]

The last part could be replaced with so too is the defence of decompilation research

not an endorsement of the violation of copyright or patent.

It is possible that decompilation may eventually help to attain the original goals of

copyright laws. Temporary monopolies are granted to authors, to eventually advance

the progress of science and art. From [Seg92]:

... the fact that computer programs are distributed for public use in object

code form often precludes public access to the ideas and functional concepts

contained in those programs, and thus confers on the copyright owner a

de facto monopoly over those ideas and functional concepts. That result

defeats the fundamental purpose of the Copyright Act - to encourage the

production of original works by protecting the expressive elements of those

works while leaving the ideas, facts, and functional concepts in the public

domain for others to build on.

Unfortunately, as of 2005, the legal position of all technologies capable of copyright

infringement is unclear. Prior to MGM v. Grokster, the landmark Sony Betamax case stated in effect that manufacturers of potentially infringing devices (in this case, Sony's Video Cassette Recorder) cannot be held responsible for infringing users, as long as there are substantial non-infringing uses [Son84]. However, the ruling in MGM v. Grokster was:

We hold that one who distributes a device with the object of promoting

its use to infringe copyright, as shown by clear expression or other affirmative steps taken to foster infringement, is liable for the resulting acts of

infringement by third parties using the device, regardless of the device's

lawful uses. [Gro05]

1.7 Goals

The main goals are better identification of parameters and returns, reconstruction of types, and correct translation of indirect jumps and calls; all are facilitated by the Static Single

Assignment form.

Many questions related to decompilation are undecidable. As a result, it is certainly

true that not all machine code programs can successfully be decompiled. The inter-

esting question is therefore how closely it is possible to approach an ideal machine

code decompiler for real-world programs. The objective is to overcome the following

limitations of existing machine code decompilers:

• Correctly and not excessively propagating expressions from individual instruction

semantics into appropriate complex expressions in the decompiled output.

• Correctly identifying parameters and returns, without assuming compliance with

the Application Binary Interface (ABI). This requires an analysis to determine

whether a location is preserved or not by a procedure call, which can be problem-

atic in the presence of recursion.

• Inferring types for variables, parameters, and function returns. If possible, the

more complex types such as arrays, structures, and unions should be correctly

inferred.

• Correctly analysing indirect jumps and calls, using the power of expression prop-

agation and type analysis. If possible, calls to object oriented virtual functions

should be recognised as such, and the output should make use of classes.

This thesis shows that the above items are all facilitated by one technique: the use of

the Static Single Assignment (SSA) form. SSA is a representation commonly used in

optimising compilers, but until now not in a machine code decompiler. SSA is discussed

in detail in Chapter 4.

It is expected that the use of SSA in a machine code decompiler would enable at least

simple programs with the following problems, not handled well by current decompilers,

to be handled correctly:

• register and stack-based parameters that do not comply with the Application Binary Interface (ABI) standard,

• parameters that are preserved,

• parameters and returns in the presence of recursion, recovered without having to assume ABI compliance,

• basic types, which should be recovered,

• switch statements, which should be recognised despite code motion and other optimisations, and

• at least the simpler indirect call instructions, which should be converted to high level equivalents.

1.8 Thesis Structure

Following chapters review the limitations of existing machine code decompilers, and

show how many of their limitations in the areas of data flow analysis, type analysis, and

the translation of indirect jumps and calls are solved with the Static Single Assignment

form.

This section gives an overview of the whole thesis. Throughout the thesis, one-sentence

summaries precede most sections. These can be used to obtain an overview of a part of

the thesis. The index of summaries on page xliii is therefore an overview of the whole

thesis.

Chapter 2 reviews the state of the art of various decompilers, from machine code through to virtual machines. It reviews what has been done to date, with an emphasis on their

shortcomings.

Data flow analysis is one of the most important components of a decompiler, as shown in Chapter 3. Data flow establishes the relationship between definitions and uses, which is important for expression propagation and eliminating condition codes from machine code. The data flow effects of call statements are particularly important; equations are defined to summarise the various elements. While traditional data flow techniques can solve the problems required by decompilers, they are complex.



Chapter 4 introduces the Static Single Assignment (SSA) representation; the use of
SSA for decompilation forms the core of this thesis. SSA is a well known technique in

compilation, used to perform optimisations that require data flow analysis. The advantages of SSA over other data flow implementations for the very important technique of

propagation are demonstrated. A more convenient way of identifying parameters and

returns is given. Also shown are the solutions to two other problems which are enabled

by the use of SSA: determining whether a procedure preserves a location, and handling

overlapped registers. Some problems caused by recursion are discussed. Methods for

safely dealing with the inevitable imprecision of data ow information are also given.

In Chapter 5, a solution to the important problem of type analysis for decompilers is


presented. While this problem has been implemented before as a constraint satisfaction

problem, it can also be cast as a data flow problem. The use of a representation based on

SSA is useful, since SSA provides a convenient pointer from every use of a location to its

definition; the definition becomes a convenient place to sparsely store type information.

After a brief comparison of the constraint-based and data flow based implementations,

details are presented for handling arrays and structures. Such compound types have

not been handled properly in existing decompilers.

Chapter 6 describes techniques for analysing indirect jumps and calls. These instruc-
tions are the most difficult to resolve, and are not well handled by current decompilers.

The solution is based on using the power of propagation and high level pattern match-

ing. However, there are problems caused by the fact that before these instructions

are analysed, the control flow graph is incomplete. When correctly analysed, indirect

jumps can be converted to switch-like statements. Fortran style assigned gotos can also

be converted to such statements. With the help of type analysis, indirect calls can be

converted to class member function invocations.

Decompilation is a subject that is difficult to theorise about without testing the theory on actual programs. Chapter 7 demonstrates results for various topics
discussed throughout the other chapters.

Chapter 8 contains the conclusion, where the main contributions and future work are
summarised.
Chapter 2

Decompiler Review

Existing decompilers have evaded many of the issues faced by machine code decompilers,

or have been deficient in the areas detailed in Chapter 1. Related work has also not

addressed these issues.

The history of decompilers stretches back more than 45 years. In the sections that

follow, salient examples of existing decompilers and related tools and services are re-

viewed. Peripherally related work, such as binary translation and obfuscation, is also

considered.

2.1 Machine Code Decompilers

A surprisingly large number of machine code decompilers exist, but all suffer from the

problems summarised in Chapter 1.

Machine code decompilation has a surprisingly long history. Halstead [Hal62] reports

that the Donnelly-Neliac (D-Neliac) decompiler was producing Neliac (an Algol-like

language) code from machine code in 1960. This is only a decade after the first com-

pilers. Cifuentes [Cif94] gives a very comprehensive history of decompilers from 1960

to 1994. The decompilation Wiki page [Dec01] reproduces this history, and includes

history to the present. A brief summary of the more relevant decompilers follows; most

of the early history is from Cifuentes [Cif94].

There was a difference of mindset in the early days of decompilation. For example, Stockton Gaines worries about word length (48 vs 36 bits was common), the representation of negative integers and floating point numbers, self modifying code, and side effects

such as setting the console lights. Most of these problems have vanished with the

exception of self modifying code, and even this is much less common now. Gaines also


has a good explanation of idioms [Gai65]. Early languages such as Fortran and Algol do

not have pointers, which also marks a change between early and modern decompilation.

• D-Neliac decompiler, 1960 [Hal62], and Lockheed Neliac decompiler 1963-7. These

produced Neliac code (an Algol-like language) from machine code programs. They

could even convert non-Neliac machine code to Neliac. Decompilers at this time

were pattern matching, and left more difficult cases to the programmer to perform

manually [Hal70].

• C. R. Hollander's PhD thesis 1973. Hollander used a formal methodology which, while novel for the time, is essentially still pattern matching. Hollander may have been the first to use a combination of data flow and control flow techniques to

improve the decompiler output [Hol73].

• The Piler System 1974. Barbe's Piler system was a first attempt to build a general decompiler. The system was able to read the machine code of several different machines, and generate code for several different high level languages. Only one

input phase was written (for the GE/Honeywell 600 machine) and only two output

phases were written (Fortran and COBOL) [Bar74].

• Hopwood's PhD thesis 1978. Hopwood describes a decompiler as a tool to aid

porting and documentation. His experimental decompiler translated assembly

language into an artificial language called MOL620, which features machine regis-

ters. This choice of target language made the decompiler easier to write, however

the result is not really high level code. He also chose one instruction per node

of his control ow graph, instead of the now standard basic block, so his decom-

piler wasted memory (relatively much more precious at that time). He was able

to successfully translate one large program with some manual intervention; the

resultant program was reportedly better documented than the original [Hop78].

• Exec-2-C, 1990. This was an experimental project by the company Austin Code

Works, which was not completed. Intel 80286/DOS executables were disassem-

bled, converted to an internal format, and finally converted to C. Machine features

such as registers and condition codes were visible in the output. Some recovery

of high level C (e.g. if-then, loops) was performed. Data flow analysis would be

needed to improve this decompiler. The output was approximately three times

the size of the assembly language file, when it should be more like three times

smaller. It can be downloaded from [Dec01], which also contains some tests of

the decompiler.

• 8086 C Decompiling System 1991-3. This work describes an Intel 8086/DOS to C

decompiler. It has library function recognition (manually entered and compiler

specific) to reduce unwanted output from the decompiler. It also has rule-based

recognition of data types such as arrays and pointers to structures, though the

papers give little detail on how this is done [FZL93, FZ91, HZY91].

• Cifuentes' PhD thesis Reverse Compilation Techniques 1994. Parameters and

returns were identified using data flow analysis, and control flow analysis was

used to structure the intermediate representation into compilable C code. Her

structuring algorithm produces quite natural conditionals and loops, although

in the rare case of an irreducible graph, a goto is generated. She demonstrated

her work with a research decompiler called dcc [Cif96], which reads small Intel

80286/DOS programs [Cif94, CG95].

The only types dcc could identify were 3 sizes of integers (8, 16, and 32 bit),

and string constants (only for arguments to recognised library functions). Arrays

were emitted as memory expressions (e.g. *(arg2 + (loc3 << 1))). dcc can be considered a landmark work, in that it was the first machine code decompiler

to have solved the basic decompiler problems excluding the recovery of complex

types.

The dcc decompiler was modified to read 32-bit Windows programs in 2002; see ndcc on page 36.

• Reverse Engineering Compiler (REC), 1998. This decompiler extends Cifuentes'

work in several small ways, but the output is less readable since it generates a

C-like output with registers (but not condition codes) still present. It is able to

decompile executable files for several processors (e.g. Intel 386, Motorola 68K), and handles multiple executable file formats (e.g. ELF, Windows PE, etc). It is able to use debugging information in the input file, if present, to name functions

and variables as per the original source code. Variable arguments to library func-

tions such as printf are handled well. Complex types such as array references

remain as memory expressions. Individual instruction semantics (occasionally in

the form of asm statements) are visible in the decompiled output.

REC is not open source software; however, several binary distributions are available. The decompiler engine does not appear to have been updated after 2001; however, a GUI frontend for Windows was released in 2005 [Cap98].

• The University of Queensland Binary Translator (UQBT), 1997-2001. This binary

translator uses a standard C compiler as the back end; in other words, it emits

C source code. The output is not intended to be readable, and in practice is very

difficult to read. However, the output is compilable, so UQBT could be used for

optimising programs for a particular platform, or cross platform porting (Section

1.3.3). Work on UQBT was not completed; however, it was capable of producing

low level source code for moderate sized programs, such as the smaller SPEC
[SPE95] benchmarks [CVE00, CVEU+99, UQB01].

• Computer Security Analysis through Decompilation and High-Level Debugging ,

2001. Cifuentes et al. suggested dynamic decompilation as a way to provide a

powerful tool for security work. The main idea is that the security analyst is only

interested in one small piece of code at one time, and so high level code could

be generated on the y. One problem with traditional (static) decompilation

is that it is dicult to determine the range of possible values of variables; by

contrast, a dynamic decompiler can provide at least one value (the current value)

with little eort [CWVE01].

• Type Propagation in IDA Pro Disassembler, 2001. Guilfanov describes the type

propagation system in the popular disassembler IDA Pro [Dat98]. The types of

parameters to library calls are captured from system header files. The parameter types for commonly used libraries are saved in files called type libraries. Assign-

ments to parameter locations are annotated with comments with the name and

type of the parameter. This type information is propagated to other parts of the

disassembly, including all known callers. At present, no attempt is made to find

the types for other variables not associated with the parameters of any library

calls [Gui01].

• DisC, by Satish Kumar, 2001. This decompiler is designed to read only programs

written in Turbo C version 2.0 or 2.01; it is an example of a compiler specific decompiler. With only one compiler generating the input programs, simpler pattern matching is more effective. DisC does not handle floating point instruc-

tions or perform type analysis, so strings are emitted as hexadecimal constants.

It handles conditionals, switch statements, and loops reasonably well, although

for loops are translated to while loops. It is an interesting observation that since

most aspects of decompilation are ultimately pattern matching in some sense, the

difference between simple pattern matching and general decompilers is essentially

the generality of the patterns [Kum01b].

• ndcc decompiler, 2002. André Janz modified the dcc decompiler to read 32-bit Windows Portable Executable (PE) files. The intent was to use the modified

decompiler to analyse malware. The author states that a rewrite would be needed

to fully implement the 80386 instruction set. Even so, reasonable results were

obtained, without overcoming dcc's limitations [Jan02].



• The Anatomizer decompiler, circa 2002. K. Morisada released a decompiler for

Windows 32-bit programs [Mor02]. On some Windows programs, it performs

fairly well, recovering parameters, arguments, and returns. Conditionals and

switch statements are handled well. When run on a Cygwin program, it failed to

find arguments for a call to printf, possibly because the compiler translated it to a call to Red_puts(). Anatomizer does not appear to handle any floating point

instructions or indirect calls. Arrays are left as memory expressions. Overlap-

ping registers (e.g. BL and EBX) are treated as unrelated. No attempt was made

to propagate constants (e.g. where EBX = 0 throughout a procedure). In some

procedures where registers are used before definition, they are not identified as parameters. It is difficult to evaluate this tool much further, since no source code

or English documentation is available, the program aborts on many inputs, and

activity on the web site (which is in Japanese) appears to have ceased in 2005.

• Analysis of Virtual Method Invocation for Binary Translation , 2002. Tröger

and Cifuentes show a method of analysing indirect call instructions. If such a call

implements a virtual method call and is correctly identified, various important

aspects of the call are extracted. The technique as presented is limited to one basic

block; as a result, it fails for some less common cases. The intended application

is binary translation, however it could be used in a decompiler. An extension of

the analysis is discussed in Section 6.3.1 [TC02].

• Boomerang, 2002-present. This is an open source decompiler, with several front

ends (two are well developed) and a C back end. Boomerang has been used to

demonstrate many of the techniques from this thesis. At the time of writing, it

was just becoming able to handle larger binary programs. Its many limitations

are mainly due to a lack of development resources. Chapter 7 contains results

using this decompiler.

• Desquirr, 2002. This is an IDA Pro plug-in, written by David Eriksson as part

of his Masters thesis. It decompiles one function at a time to the IDA output

window. While not intended to be a serious decompiler, it illustrates what can

be done with the help of a powerful disassembler and about 5000 lines of C++

code. Because a disassembler does not carry semantics for machine instructions,

each supported processor requires a module to decode instruction semantics and

addressing modes. The x86 and ARM processors are supported. Conditionals

and loops are emitted as gotos, there is some simple switch analysis, and some identification of parameters and returns is implemented [Eri02].

• Yadec decompiler, 2004. Raimar Falke submitted his Diploma thesis Entwicklung

eines Typanalysesystem für einen Decompiler (Development of a type analysis

system for a decompiler) to the Technical University of Dresden. He implemented

an i386 decompiler, based on an adaptation of Mycroft's theory [Myc99], and

extended it to handle arrays. A file representing user choices is read to resolve conflicts [Fal04].

• Andromeda decompiler, 2004-5. This decompiler written by Andrey Shulga was

never publicly released, but a GUI program for interacting with the IR generated

by the decompiler proper is available from the website [And04]. The author claims

that the decompiler is designed to be universal, although only an x86 front end

and C/C++ back end are written at present. The GUI program comes with one

demonstration IR le, and the output is extremely impressive. Unfortunately, it

is not possible to analyse arbitrary programs with the available GUI program, so

the correct types might for example be the result of extensive manual editing. As

of March 2007, the web page has been inactive since May 2005.

• Hex-Rays decompiler for 32-bit Windows, 2007. At the time of writing, author Il-

fak Guilfanov had just released a decompiler plugin for the IDA Pro Disassembler.

The decompiler view shows one function at a time in a format very close to the C

language. As with the assembler view, it is possible to click on a function name

to jump to the view for that function. Small functions are translated in a fraction

of a second. The sample code contained conditionals (including compound predi-

cates with operators such as ||) and loops (for and while loops, including break
statements). Parameters and returns were present in all functions. There is an

API to access the decompiler IR as a tree, allowing custom analyses to be added.

The author stressed that the results are for visualisation, not for recompilation

[Gui07a, Gui07b].

2.2 Object Code Decompilers

Object code decompilers have advantages over machine code decompilers, but are less

common, presumably because the availability of object code without source code is low.

As noted earlier, the term object code is here used strictly to mean linkable, not

executable, code. Object code contains relocation information and symbol names for

objects that can be referenced from other object les, including functions and global

variables.

• Schneider and Winger 1974. Here the contrived grammar for a compiler was

inverted to produce a matching decompiler [SW74]. This works only for a partic-

ular compiler, and only under certain circumstances; it was shown in the paper

that Algol 60 could not be deterministically decompiled. It generally fails in the

presence of optimisations, which are now commonplace. However, this technique

could be useful in safety critical applications [BS93].

• Decomp 1988. Reuter wrote a quick decompiler for a specific purpose (to port a game from one platform to another, without the original source code). Decomp was an object code decompiler, and produced files that needed significant hand editing before they could be recompiled. No data flow analysis was performed, so

every function call required inspection and modication [Reu88].

2.3 Assembly Decompilers

The relatively small number of problems faced by assembly decompilers is reflected in

their relatively good performance.

Some of the early decompilers (e.g. those by W. Sassaman [Sas66] and Ultrasystems

[Hop78]) read assembly language, because there was a pressing need to convert assembly

language programs (second generation languages) to high level languages (third gener-

ation languages). This is a somewhat easier task than machine code decompilation, as evidenced by the number of problems listed in Table 1.2 on page 18 (4½ versus 9).

• Housel's PhD thesis 1973. Housel describes a general decompiler as a series of

mappings (transformations) with a Final Abstract Representation as the main

intermediate representation. He uses concepts from compiler, graph, and op-

timisation theory. His experimental decompiler was written in 5 person-months,

and translated from a toy assembly language (MIXAL) to PL/1 code. Six small

programs were tested, all from Knuth's Art of Computer Programming Vol. 1

[Knu69]. Of these six, two were correct with no user intervention, and of all PL/1

statements generated, 12% needed manual intervention [Hou73, HH74].

• Friedman's PhD thesis 1974. Friedman describes a decompiler used to translate

operating system minicomputer assembly language code from one machine to

another. It highlighted the problems that such low level code can cause. Up to

33% of a module's assembly language required manual intervention on the first

test (dissimilar machines). On a second test (similar machines) only 2% required

intervention on average. The final program had almost three times the number

of instructions [Fri74].

• Zebra 1981. Zebra was a prototype decompiler developed at the Naval Underwater

Systems Center [Bri81]. It was another assembly decompiler, this time emitting

another assembly language. The report concluded that decompilation to capture

the semantics of a program was not economically practical, but that it was useful

as an aid to the porting of programs.

• Simon's honours thesis Structuring Assembly Programs, 1997. Simon proposed

parenthesis theory as a higher performance alternative to interval graphs for struc-

turing assembly language programs. He also found ways to improve the perfor-

mance of interval based algorithms as used in the dcc decompiler. His conclusion

was that with these improvements, there was no decisive advantage of one algo-

rithm over the other; intervals produced slightly better quality, while parentheses

gave slightly better performance [Sim97].

• Glasscock's diploma project An 80x86 to C reverse compiler, 1998. This project

decompiled 80x86 assembler to C. No mention is made of indirect jump or call

instructions. Test programs were very simple, consisting of only integer variables,

string constants used only for the printf library function, and no arrays or struc-

tures. The goal was only to decompile 5 programs, so even if-then-else conditionals
were not needed. Data flow analysis was used to merge instruction semantics and eliminate unwanted assignments to flags (condition codes) [Gla98].

• Mycroft's Type-Based Decompilation, 1999. Mycroft describes an algorithm where

machine instructions (in the form of Register Transfer Language, RTL) could be

decompiled to C. RTL is at a slightly higher level than machine code; the author

states that it is assumed that data and code are separated, and that procedure

boundaries are known. It is not stated how these requirements are fullled; per-

haps there is a manual component to preparation of the RTLs. He uses the SSA

form to undo register-colouring optimisations, whereby objects of various types,

but having disjoint lifetimes, are mapped onto a single register.

One of the drivers for Mycroft's work was a large quantity of BCPL source code

that needed to be translated to something modern; BCPL is an untyped prede-

cessor of C. He proposes the collection of constraints from the semantics of the

RTL instructions, and unication to distill type information. Only registers, not

memory locations, are typed. He considers code with pointers, structures, and

arrays. Sometimes there is more than one solution to the constraint equations,

and he suggests user intervention to choose the best solution. Mycroft describes

his unification as Herbrand-based, with additional rules to repair occurs-check

failures. No results are given in the paper [Myc99].



• Ward's FermaT transformation system, 1999. Ward has been working on program

transformations for well over a decade. His FermaT [War01] system is capable of

transforming from assembly language all the way up to specifications with some

human intervention [War00]. He uses the Wide Spectrum Language WSL as his

intermediate representation. Assembly language is translated to WSL; various

transformations are applied to the WSL program, and nally (if needed) the

WSL is translated to the output language. The FermaT transformation engine

is released under the GPL license and can be downloaded from the author's web

page [War01]. He also founded the company Software Migrations Ltd. In one

project undertaken by this company, half a million lines of 80186 assembler code

were migrated to a similar number of lines of maintainable C code. The migration

cost about 10% of the estimated 65 person-months needed to manually rewrite

the code [War04].

• University of London's asm21toc reverse compiler, 2000. This assembly lan-

guage decompiler for Digital Signal Processing (DSP) code was written in a

compiler-compiler called rdp [JSW00]. The authors note that DSP is one of

the last areas where assembly language is still commonly used. This decompiler

faces problems unique to DSP processors, as noted in Section 2.6.2; however, de-

compilation from assembly language is considerably easier than from executable

code. The authors doubt the usefulness of decompiling from binary les. See

also [JS04].

• Proof-Directed De-compilation of Low-Level Code , 2001. Katsumata and Ohori

published a paper [KO01] on decompilation based on proof theoretical methods.

The input is Jasmin, essentially Java assembly language. The output is an ML-

like simply typed functional language. Their example shows an iterative imple-

mentation of the factorial function transformed into two functions (an equivalent

recursive implementation). Their approach is to treat each instruction as a con-

structive proof representing its computation. While not immediately applicable

to a general decompiler, their work may have application where proof of correct-

ness (usually of a compilation) is required.

In [Myc01], Mycroft compares his type-based decompilation with this work. Struc-

turing to loops and conditionals is not attempted by either system. He concludes

that the two systems produce very similar results in the areas where they overlap,

but that they have dierent strengths and weaknesses.



2.4 Decompilers for Virtual Machines

Virtual machine specifications (like Java bytecodes) are rich in information such as names and types, making decompilers for these platforms much easier; however, good

type analysis is still necessary for recompilability.

As a group, the only truly successful decompilers to date have been those which are

specific to a particular virtual machine standard (e.g. Visual Basic or Java Bytecodes).

Executables designed to run on virtual machines typically are rich in information such

as method names and signatures, which greatly facilitate their decompilation.

Because of the large number of Virtual Machine decompilers, only the more interesting

ones will be described here. For more details, and a comparison of some decompilers,

see [Dec01].

2.4.1 Java Decompilers


Since Java decompilers are relatively easy to write, they first started appearing less than

a year after the release of the Java language.

As indicated in Table 1.2 on page 18, the types of most local variables can only be

found with type analysis. Some of the simpler Java decompilers, including some com-

mercial decompilers, do not perform type analysis, rendering the output unsuitable for

recompilation except for simple programs.

• McGill's Dava Decompiler. The Sable group at McGill University, Canada, have

been developing a framework for manipulating Java bytecodes called Soot. The

main purpose of Soot is optimisation of bytecodes, but they have also built a

decompiler called Dava [Dec01] on top of Soot. With Dava, they have been con-

centrating on the more difficult aspects of bytecode decompilation. In [MH02],

Miecznikowski and Hendren found that four commonly used Java decompilers

were confused by peephole optimisation of the bytecode. The goal of their Dava

decompiler is to be able to decompile arbitrary, verifiable bytecode to pure, com-

pilable Java source code. Three problems have been overcome to achieve that

goal: typing of local variables, generating stack variables, and structuring.

Finding types for all local variables is more difficult than might be imagined. The

process is largely a matter of solving constraints, but Gagnon et al. [GHM00]

found that a three stage algorithm was needed. The first stage solves constraints;

the second is needed when the types of objects created differ depending on the run-time control flow path. The third stage, not needed so far in their extensive

tests with actual bytecode, essentially types references as type Object, and intro-

duces casts where needed.

Stack variables, in the context of Java bytecodes, arise in optimised bytecode;

instead of saving such variables to locals, they are left on the stack to be used as

needed. The decompiler has to create local variables in the generated source code

for these stack variables, otherwise the code would not function. Optimisation

of bytecodes can also result in reusing a bytecode local variable for two or more

source code variables, and these may have different types. It is important for

the decompiler to separate these distinct uses of one bytecode local variable into

distinct local variables in the generated output.

Structuring is the process of transforming the unstructured control flow of bytecodes into readable, correct high level source code (Java in this case). For example, all gotos in the bytecode program must be converted into conditionals, loops, break or continue statements, or the like. Most decompilers do a reasonable job of this most of the time.

Miecznikowski and Hendren give examples where four other decompilers fail at

all three of the above problems, and their own Dava decompiler succeeds [MH02].

Van Emmerik [Dec01] shows that one of the more recent Java decompilers (JODE,

below) can also recover types well.

While Dava usually produced correct Java source code, it was often difficult to

read. Some high level patterns and flow analysis were used in later work to im-

prove the readability, even from obfuscated code [NH06]. Compound predicates

(using the && and || operators), rarely handled by decompilers, are included in

the patterns.

• JODE Java Decompiler. JODE (Java optimise and decompile environment) is an

open source decompiler and obfuscator/optimiser. JODE has a verifier, similar to the Java runtime verifier, that attempts to find type information from other class files. JODE is able to correctly infer types of local variables, and is able to

transform code into a more readable format, closer to the way Java is naturally

written, than early versions of Dava [Hoe00].

• JAD Java Decompiler. JAD is a freeware decompiler for non-commercial use,

but source code is not available. Since it is written in C++, it is relatively fast

[Kou99]. It is in the top three of nine decompilers tested in [Dec01], but is confused

by some optimised code.

• JReversePro Java Decompiler. JReversePro is an open source Java decompiler.

It seems immature compared with JODE, which is similarly open source. It fails

a number of tests that most of the other Java decompilers pass. However, it does

attempt to type local variables. For example, in the Sable test from [Dec01], it

correctly infers that the local variable should be declared with type Drawable,
when most other decompilers use the most generic reference type, Object (and
emit casts as needed) [Kum01a].

2.4.2 CLI Decompilers


MSIL is slightly easier to decompile than Java bytecodes.

The situation with Microsoft's CLI (.NET) is slightly different to that of Java. In CLI bytecode files, the types of all local variables are stored, as well as all the information that Java bytecode files contain. This implies that local variables can only be reused by other variables of the same type, so that no type problems arise from such sharing. No type analysis is needed. It is therefore possible to write a very good decompiler for

MSIL.

• Anakrino .NET to C# Decompiler. This decompiler is released under a BSD-like

license [Ana01]. As of this writing, the decompiler does not decompile classes,

only methods, necessitating some hand editing for recompilation. It exited with

a runtime fault for two of five tests from [Dec01].

• Reflector for .NET. Reflector is a class browser for CLI/.NET components and

assemblies, with a decompiler feature. It decompiles to C#, Delphi-like and Visual

Basic-like output. Full source code is available [Roe01].

2.5 Decompilation Services

A few commercial companies oer decompilation services instead of, or in addition to,

selling a decompiler software license.

Despite the lack of capable, general purpose decompilers, a few companies specialise

in providing decompilation services. It could be that they have a good decompiler

that they are not selling to the public. It is more likely, however, that they have an

imperfect decompiler that needs significant expert intervention, and they see more value

to their company in selling that expertise than selling the decompiler itself and providing

support for others to use it. This is similar to some other reverse engineering tools,

whose results cannot be guaranteed. Only a few commercial decompilation services are

summarised here; for more details, see [Dec01].



• Software Migrations Ltd [SML01]. Their services are based on ground breaking

research work rst undertaken during the 1980's at the Universities of Oxford

and Durham in England (from their web page). They specialise in mainframe

assembler comprehension and migration. See also the entry on the company's

founder, Martin Ward, in Section 2.3 on page 40.

• The Source Recovery Company and ESTC. The Source Recovery Company [SRC96,

FC99], and their marketing partner ESTC [EST01] offer a service of recovering

COBOL and Assembler source code from MVS load modules. They use a low

level pattern matching technique, but this seems to be suitable for the mainframe

COBOL market. The service has been available for many years, and appears

to be successful. It is possible that this and related services are relatively more

successful than others in part because COBOL does not manipulate pointers.

• JuggerSoft [Jug05], also known as SST Global [SST03] and Source Recovery

[SR02a] (not related to The Source Recovery Company above). This company

sells decompilers and decompilation services, as well as source to source trans-

lators. They have a good collection of decompilers for legacy platforms. Their

most recent decompiler for HP-UX produces C or C++ [SR02b]. Interestingly,

this company guarantees success (provided the customer has enough money, pre-

sumably), and will write a custom decompiler if necessary. They claim that even

this will be accomplished in a fraction of the time of rewriting.

• Dot4.com. This company offers the service of translating assembly language code

to maintainable C. Some 90-95% is performed automatically using proprietary in-

house tools. Comments and comment blocks are maintained from the assembly

language code, and instructions are merged to produce expressions [Dot04].

• MicroAPL assembler to assembler translators. MicroAPL offer both tools and

services for the translation of one assembly language to another. Several machines

are supported, including 68K, DSP, 80x86, Z8K, PowerPC, and ColdFire cores

[Mic97].

• Decompiler Technologies, formerly Visual Basic Right Back, offered a Visual Basic

decompilation service from 2004-2006 [DT04]. A Windows C/C++ decompilation

service was announced in mid 2005. Recompilability was not guaranteed; output

was generated automatically using proprietary, not for sale decompilers and other

tools. Manual enhancement of the output was offered as a separate service. In

mid 2007, the Visual Basic decompilation service was resumed.



2.6 Related Work

This related work faces a subset of the problems of decompilation, or features techniques

or representations that appear to be useful for decompilation.

Several existing tools, analyses, or representations overlap with decompilation research.

Each of these is considered below.

2.6.1 Disassembly
Disassembly achieves similar results to decompilation, and encounters some of the same

problems as decompilation, but the results are more verbose and machine specific. No

existing disassemblers automatically generate reassemblable output.

Disassemblers can be used to solve some of the same problems that decompilers solve.

They have three major drawbacks compared to decompilers: their output is machine

specific, their output is large compared to high level source code, and users require

knowledge of a particular machine language. Decompilers are preferred over disassem-

blers, if available at the same level of functionality, for the same reasons that high level

languages are preferred over assembly language. As an example, Java disassemblers are

rarely used (except perhaps to debug Java compilers), because good Java decompilers

are available.

The most popular disassembler is probably IDA Pro [Dat98]. It performs some auto-

matic analysis of the input program, but also offers interactive commands to override the

auto analysis results if necessary, to add comments, declare structures, etc. Separation

of pointers from constants and original from offset pointers is completely manual. IDA Pro has the ability to generate assembly language in various dialects; however, routine

generation of assembly language for reassembly is discouraged [Rob02, Van03]. High

level visualiser plugins are available, and a pseudocode view was being demonstrated

at the time of writing.

2.6.2 Decompilation of DSP Assembly Language


This is a specialised area, where assembly language is still in widespread but declining

use, and has several unique problems that are not considered here.

Current Digital Signal Processors (DSPs) are so complex that they rely on advanced

compilers for performance. Previous generations relied heavily on assembly language

for their performance. As a result, there is a need for conversion of existing assembly

language code to high level code, which can be achieved by assembly decompilation.

Decompilation of DSP assembly language has extra challenges over and above the usual

problems, including

• saturated arithmetic,

• sticky overflow,

• hardware support for buffers, including circular buffers, and

• more reliance on timing guarantees.

These issues are not considered here.
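To illustrate the first of these briefly: saturated arithmetic clamps results at the representable limits instead of wrapping around. The C sketch below (illustrative only, not taken from any decompiler) shows what a single DSP saturated add instruction computes, and hence the kind of multi-statement pattern a DSP decompiler would have to emit or recognise.

    #include <stdint.h>

    /* Saturated 16-bit addition: on overflow the result is clamped to the
     * most positive or most negative representable value, instead of
     * wrapping around as ordinary two's complement addition does. */
    int16_t sat_add16(int16_t a, int16_t b) {
        int32_t sum = (int32_t)a + (int32_t)b;   /* widen; cannot overflow  */
        if (sum > INT16_MAX) return INT16_MAX;   /* clamp positive overflow */
        if (sum < INT16_MIN) return INT16_MIN;   /* clamp negative overflow */
        return (int16_t)sum;
    }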

2.6.3 Link-Time Optimisers


Link-time Optimisers share some of the problems of machine code decompilers, but have

a decided advantage because of the presence of relocation information.

Link-time optimisers share several problems with decompilers. Muth et al. [MDW01]

found the following problems with the alto optimiser for the Compaq Alpha:

• It is not possible to assume Application Binary Interface (ABI, essentially calling convention) compliance, since the compiler or linker could perform optimisations that ignore the conventions, as could hand written assembly language code.

• Memory analysis is difficult.

• Preserved registers (registers whose value is unaffected by a call to a particular procedure) are needed. This is covered in Section 4.3.

• Constant propagation is very useful for optimisations and for analysing indirect control transfer instructions. The authors found that an average of about 18% of instructions had operands and results with known constant values.

Analysis is easier at the object code (link-time) level, because of the presence of relocation information. Optimisers have a very simple fallback for when a proposed optimisation cannot be proved to be safe: do not perform the optimisation. Alto is able to force conservative assumptions for indirect control transfers by using the special control flow graph nodes Bunknown (for indirect branches) and Funknown (for indirect calls), which have worst case behaviour (all registers are used and defined). This can be kept reasonably precise because of the presence of relocation information. For example, there are control flow edges from Bunknown to only those basic blocks in the procedure whose addresses have relocation entries associated with them (implying that there is at least one pointer in the program to that code). Without relocation information, Bunknown would have edges to the beginning of all basic blocks, or possibly to all instructions.

2.6.4 Synthesising to Hardware


An additional potential application for decompilation is in improving the performance

of hardware synthesised from executable programs.

Programs written for conventional computers can be partially or completely synthesised in hardware such as field programmable gate arrays (FPGAs). Hardware compilers can read source code such as C, or they can read executable programs. Better performance can be obtained if basic control and data flow information is provided, allowing the use of advanced memory structures such as smart buffers [SGVN05]. As noted in that paper, decompilation techniques can be used to automatically provide such information, or to provide source code that, even if not suitable for software maintenance, is adequate for synthesising to hardware.

The ability to synthesise to hardware is becoming more important with the availability of chips that incorporate a microprocessor and configurable logic on one chip. These chips allow portions of a program for embedded devices to be implemented in hardware, which can result in significant power savings [SGVV02].

2.6.5 Binary Translation


Static Binary Translators that emit C produce source code from binary code, but since

they do not understand the data, the output has very low readability.

The University of Queensland Binary Translator (UQBT) is an example of a binary translator that uses a C compiler as its back end. It is able to translate moderately sized programs (e.g. the SPEC CPU95 [SPE95] benchmark go) into very low level C code [UQB01, CVE00, CVEU+99, BT01]. The semantics of each instruction are visible, and all control flow is via goto statements and labels. Parameters and returns are identified, although actual parameters are always in the form of simple variables, not expressions.

There is no concept of the data item that is being manipulated, apart from its size. To ensure that the addresses computed by instructions read the intended data, binary translators have to force the data section of the input binary program to be copied exactly to the data section of the output program, at the same virtual address. In essence, the target program is replicating the bit manipulations of the source program. The data section is regarded as a group of addressable memory bytes, not as a collection of integers, floats, structures, and so on.

This is a subtle but important point. The compiler (if there was one) that created the input program inserted instructions to manipulate high level data objects defined in the source program code. The compiler has an intermediate representation of the data declared in the source program, as well as of the imperative statements of the program. By contrast, a binary translator makes no attempt to recreate the intermediate representation of the data. It blindly replicates the bit manipulations from the source program, without the benefit of this data representation. It ends up producing a program that works, but only because the input program worked. It makes no attempt to analyse what the various bit manipulations do. In effect, the binary translator is relying on the symbol table work of the original compiler to ensure that data items do not overlap. The source code emitted by such a translator is at the second or third lowest level, "no translation of the data section", as described in Section 1.1 on page 6.

This contrasts with the code quality that a decompiler should produce. A decompiler

needs to recover a representation of the data as well as the imperative statements.

Experienced programmers reading the decompiled output should be able to understand

most of the program's behaviour, just as they would if they were to read the original

source code with the comments blanked out. In other words, decompilers should strive

for source code that is two levels higher up the list of Section 1.1.

The above discussion concerns static binary translation, where no attempt is made to execute the input program. Dynamic binary translation does run the program, and achieves two of the goals of decompilation. The first is the ability to run a legacy application with no source code on a new platform. The second is the ability to optimise a binary program to run with close to optimal performance on a particular platform (e.g. Dynamo [BDB00]). However, dynamic binary translation in the usual configuration does not allow maintenance of the program except by patching the executable file, and most programs require modification over their lifetimes. A modified dynamic binary translator might, however, be able to find values for important pointers or other locations and emit information to aid a static decompilation.

2.6.6 Instruction Set Simulation


This is an automatic technique for generating source code which is like a static interpretation of the input program inlined into a large source file, relying heavily on compiler optimisations for performance.

Instruction Set Simulation is a technique where each instruction of the original machine

code program is represented by an inlined interpretation for that instruction. This is

a static translation (however there are also dynamic versions, e.g. Shade [CK94]), and

the result is source code, like the output of a binary translator, except that even the

original program's program counter is visible. This is the lowest level of source code

listed in Section 1.1. When macros are used, the source code resembles assembly

language implemented in a high level language. As with binary translation, no attempt

is made to decode the data section(s) [MAF91, Sta02].
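As a concrete (and purely illustrative) sketch of the macro style, each instruction of the input program might be rendered as one macro invocation over explicit machine state. The register variables, macro names, and instruction lengths below are assumptions, not taken from any actual simulator.

    #include <stdint.h>

    /* Simulated machine state. Note that even the original program
     * counter (pc) remains visible in the generated source. */
    static uint32_t eax, ebx, esp = 0x100000, pc;
    static uint8_t mem[0x100000];

    #define MOV_RR(dst, src) do { (dst) = (src); pc += 2; } while (0)
    #define ADD_RI(dst, imm) do { (dst) += (imm); pc += 3; } while (0)
    #define PUSH(r)          do { esp -= 4; *(uint32_t *)&mem[esp] = (r); pc += 1; } while (0)

    /* Each line below is the inlined interpretation of one instruction
     * of the original machine code program. */
    void simulated_fragment(void) {
        PUSH(ebx);
        MOV_RR(eax, ebx);
        ADD_RI(eax, 1);
    }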

2.6.7 Abstract Interpretation


Abstract interpretation research shows how desirable features such as correctness can be

proved for decompilation analyses.

Most analyses in a decompiler compute a result that is incomplete in some sense, e.g. a

subset of the values that a register could take. In other words, these analyses are ab-

stract evaluations of a program. Cousot and Cousot [CC76] show that provided certain

conditions are met, correctness and termination can be guaranteed. This result gives

confidence that the various problems facing decompilers can eventually be overcome.

2.6.8 Proof-Carrying Code


Proof-carrying code can be added to machine code to enable enough type analysis to

prove various kinds of safety.

Proof-carrying code is a framework for proving the safety of machine-language programs

with machine-checkable proofs [AF00]. The proof-carrying code can be at the machine

code level [SA01] or at the assembly language level [MWCG98]. While these systems

work with types for machine or assembly language, they do not derive the types from

the low level code. Rather, extra information in the form of checkable proofs and types

is included with the low level code.

Presumably, decompilation of proof-carrying code would be easier than that of normal

machine code, perhaps comparable to the decompilation of Java bytecode programs.

Verifiable Java programs provide a measure of safety through a different means. It is interesting to observe that most attempts to increase program safety seem to increase the ease of decompilation.



2.6.9 Safety Checking of Machine Code


CodeSurfer/x86 uses a variety of proprietary tools to produce intermediate representa-

tions that are similar to those that can be created for programs written in a high-level

language.

Some security checking tools read machine code directly, without relying on, or in some cases trusting, any annotations from the compiler.

In his University of Wisconsin-Madison thesis, Xu uses the concept of typestate checking, which encompasses the types and also the states (e.g. readable, executable) of machine code operands [Xu01]. He uses symbol table information (not usually available in real-world machine code programs) to identify the boundaries of procedures. Xu claims to be able to deal with indirect calls "using a simple intraprocedural constant propagation algorithm".

Also from the University of Wisconsin-Madison, Christodorescu and Jha discuss a Static Analyser for Executables (SAFE) [CJ03]. The binary is first disassembled with the IDA Pro disassembler. A plug-in for IDA Pro called the Connector, provided by GrammaTech Inc [Gra97], interfaces to the rest of the tool. Value-Set Analysis (VSA) is used to produce an intermediate representation, which is presented in a source code analysis tool called CodeSurfer (sold by GrammaTech Inc). CodeSurfer is a tool for program understanding and code inspection, including slicing and chopping. It appears that the executable loader used in this work was an early version that relied on IDA Pro to (possibly manually) analyse indirect jump and call instructions. They were able to detect obfuscated versions of four viruses with no false positives or negatives, where three commercial pattern-based virus detection tools all failed.

A later paper from the same university names the composite tool for analysis and inspection of executables as CodeSurfer/x86 [BR04]. The aim is to produce intermediate representations that are similar to those that can be created for a program written in a high-level language. Limitations imposed by IDA Pro (e.g. not analysing all indirect branches and calls) are mitigated by value-set analysis and also Affine Relations Analysis (ARA). This work appears to make the assumption that absolute addresses and offsets indicate the starting address of program variables. In other words, it does not perform separation of original and offset pointers (see Section 1.5.3). Various safety checks were demonstrated, and additional checks or visualisations can be implemented via an API. Memory consumption is high, e.g. 737MB for analysing winhlp32.exe, which is only 265KB in size, and this is while "a temporary expedient" for calls to library functions was used.

Starting in 2005, papers from the same university have been published, featuring a third major analysis [BR05, RBL06]. Aggregate Structure Identification (ASI) provides the ability to recover record and array structure. A technique the authors call "recency-abstraction" avoids problems of low precision [BR06]; see also Section 4.6 on page 139. Decompilation is mentioned as a potential application for their tool, but does not appear to have been attempted in the currently published papers.

2.6.10 Traditional Reverse Engineering


Traditional reverse engineering increases high level comprehension, starting with source

code, while decompilation provides little high level comprehension, starting with machine

code.

Decompilation is a reverse engineering process, as defined by the generally accepted taxonomy [CC90], since it creates representations of the system "in another form or at a higher level of abstraction". It would seem reasonable to assume that decompilation therefore has much in common with traditional reverse engineering, but this is not the case. Traditional reverse engineering is performed on source code, usually with the benefit of some design documentation. From [CC90]:

"The primary purpose of [traditional] reverse engineering a software system is to increase the overall comprehensibility of the system for both maintenance and new development."

Decompilation provides the basis for comprehension, maintenance and new development, i.e. source code, but any high-level comprehension is provided by the reader. Comprehension in decompilation is limited to such details as "this variable is incremented by one at this point in the program, and its type is unsigned integer". Traditional reverse engineering could add high-level comprehension such as "because the customer has ordered another book".

Baxter and Mehlich [BM00] make the following observation:

"A single implementation may in fact represent any of several possible plans. An INC Memory instruction could implement the 'add one' abstraction, the 'store a One' (knowing a value is already zero) abstraction, the 'Complement the LSB' abstraction, or several other possibilities."

A decompiler can readily deduce that a value is set to one by the combination of clearing and incrementing it; this is an example where decompilation provides a limited kind of comprehension. It is a mechanical deduction, but quite useful even so. Such mechanical deductions can be surprising at times (see for example the "triple xor" idiom in Section 4.3). If sufficient identities are preprogrammed into a decompiler, it seems plausible that a "complement the LSB" comment could be inserted under appropriate circumstances (e.g. the LSB is isolated with an appropriate AND instruction, and the result is tested for zero or one in a conditional statement predicate). In most other circumstances, emitting mem++ or the like seems appropriate, leaving the determination of the higher level meaning to the reader.

2.6.11 Compiler Infrastructures


Several compiler infrastructures exist with mature tool sets which, while intended for

forward engineering, might be adaptable for decompilation.

Compilers are becoming so complex, and the need for performance is so great, that several compiler infrastructures have appeared in the last decade, offering the ability to research a small part of the compiler chain without having to write the whole compiler, visualisation tools, testing frameworks, and so on. None of these are designed with decompilation in mind, but they are all flexible systems. It is worthwhile therefore to consider whether one of these relatively mature infrastructures, with documented interfaces and debugged transformations already in place, might be able to be extended to become the basis for a machine code decompiler. Presumably, much less work would be required to get to the point where some code could be generated, compared to starting from scratch. Parts that could be used with little modification include the IR design, the C generators, simplification passes, translation into and out of SSA form, constant and copy propagation passes (which could be extended to expression propagation), dead code elimination, and probably many data flow analyses such as reaching definitions. Even the front ends could be used as a quick way to generate IR for experimentation.

2.6.11.1 LLVM

The Low Level Virtual Machine is an infrastructure for binary manipulation tools such

as compilers and optimisers. It has also been used for binary to binary tools, such as

binary translators. The IR of LLVM is interesting in that it is based on the Static

Single Assignment form (SSA form). Memory objects are handled only through load

and store instructions, and are therefore not in SSA form.

The LLVM intermediate representation is essentially a list of RISC-like three address

instructions, with an encoding scheme that allows an infinite number of virtual registers while retaining good code density. There are a few concessions for type and exception handling information, but fundamentally this is not suitable for representing the complex expressions needed for decompilation. It is possible that complex expressions could be generated only in the back ends, but this would duplicate the propagation code in all back ends. The propagation could be applied immediately before language specific back ends are invoked, but this would mean that the back ends would require a different IR than the rest of the decompiler. Also, earlier parts of a decompiler (e.g. analysing indirect jump and call instructions) rely on the propagation having generated complex expressions. Figure 2.1 shows an overview of LLVM for multi-stage optimisation.

Figure 2.1: LLVM can be used to compile and optimise a program at various stages of its life. From [Lat02].

2.6.11.2 SUIF2

SUIF2 is version 2 of the Stanford University Intermediate Format, an infrastructure designed to support collaborative research and development of compilation techniques [ADH+01]. Figure 2.2 gives a brief overview. The SUIF2 IR supports high level program concepts such as expressions and statements, and it is extensible, so that it should be able to handle e.g. SSA phi nodes. However, the authors of a competing infrastructure state that "SUIF [17] has very little support of the SSA form" [SNK+03]. There are tools to convert the IR to C, front ends for various languages, and various optimisation passes. At any stage, the IR can be serialised to disk, or several passes can operate in memory. SUIF2 may be able to be extended to an IR suitable for decompilers.

Figure 2.2: Overview of the SUIF2 compiler infrastructure. From [Lam00].

2.6.11.3 COINS

COINS is a compiler infrastructure written in Java, designed for research on compiler optimisations [SNK+03, SFF+05]. Figure 2.3 shows an overview. It has a high level and a low level IR, and C code can be generated from either. The low level IR (LIR) appears to be suitable for most compiler work, with a tree-based expression representation. LIR is not SSA based, but there is support to translate into and out of SSA form. Operands are typed, but currently the supported types are basically only a size and a flag indicating whether the operand is an integer (sign is not specified) or floating point. There is some extensibility in the IR design, so COINS may be able to be expanded to become the basis of a decompiler IR.
Figure 2.3: Overview of the COINS compiler infrastructure. From Fig. 1 of [SFF+05].

2.6.11.4 SCALE

"Scale is a flexible, high performance research compiler for C and Fortran", and is written in Java [CDC+04, Sca01]. The data flow diagram of Figure 2.4 gives an overview. Again, there is a high level and a low level IR; the low level IR is in SSA form. Expressions in the SSA CFG are tree based, hence it may be possible to adapt this infrastructure for research in decompilation.

Figure 2.4: Scale Data Flow Diagram. From [Sca01].

2.6.11.5 GCC Tree SSA

As of GCC version 4.0, the GNU compiler collection includes the GENERIC and GIMPLE IRs, in addition to the RTL IR that has always been a feature of GCC [Nov03, Nov04]. RTL is a very low level IR, not suitable for decompilation. GENERIC is a tree IR where arbitrarily complex expressions are allowed. Compiler front ends can convert their language-dependent abstract syntax trees to GENERIC, and a common translator (the "gimplifier") converts this to GIMPLE, a lower version of GENERIC with three address representations for expressions. Alternatively, front ends can convert directly to GIMPLE. All optimisation is performed at the GIMPLE level, which is probably too low level for decompilation. GENERIC shows some promise, but few of the tools support it. Figure 2.5 shows the relationship of the various IRs.
Figure 2.5: The various IRs used in the GCC compiler. From a presentation by D. Novillo.

There is a new SSA representation for aggregates called Memory SSA [Nov06], which is planned for GCC version 4.3. While primarily aimed at saving memory in the GCC compiler, some ideas from this representation may be applicable to decompilation.

2.6.11.6 Phoenix

Figure 2.6: Overview of the Phoenix compiler infrastructure and IR (IR chaining example: *p += (a+1)). From [MRU07].

Phoenix is a compiler infrastructure from Microsoft Research, used by Microsoft as

the basis of future compiler products. Figure 2.6 gives an overview, and an example

of IR showing one line of source code split into two IR instructions. Phoenix has the

ability to read native executable programs, but does not (as of this writing) come with

a C generator. IR exists at three levels (high, medium, and low), plus an even lower

level for representing data. Unfortunately, even at the highest level (HIR), expression

operands cannot be other expressions, only variables, memory locations, constants,

and so on. The three main IR levels are kept as similar as possible to make it easier for

phases to operate at any level. As a result, Phoenix would probably not be a suitable

infrastructure for decompiler research [MS04].



2.6.11.7 Open64

Open64 is a compiler infrastructure originally developed for the IA64 (Itanium) archi-

tecture. It has since been broadened to target x86, x64, MIPS, and other architectures.

The IR is called WHIRL, and exists at five levels, as shown in Figure 2.7.

Figure 2.7: The various levels (Very High, High, Mid, Low, and Very Low) of the WHIRL IR used by the Open64 compiler infrastructure. From [SGI02].

At all IR levels, the IR is a tree, which is suitable for expressing high level language

constructs. The higher two levels can be translated directly to C or Fortran by provided

tools. Many of the compiler optimisations use the SSA form, hence there are facilities

for transforming into and out of SSA form. This infrastructure would therefore appear

to be suitable for decompilation research.


2.6 Related Work 59

Open64 is free software licensed under the GPL, and can be downloaded from SourceForge [SF06]. See also [UH02].

2.6.12 Simplication of Mathematical Formulae


Decompilers can make use of research on the simplification of mathematical formulae to improve the output.

One of the transformations needed by compilers, decompilers, and other tools is the simplification of expressions, e.g. x+0 → x. The general goal is to reduce terms to canonical or simpler forms. Dolzmann and Sturm [DS97] give ideas on what constitutes "simplest": few atomic formulae, small satisfaction sets, etc. They point out that some goals are contradictory, so at times a user may have to select from a set of options, or select general goals such as minimum generated output size or maximum readability. The field of decompilation probably has much to learn from such related work, to fine-tune the output presented to users.
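As a minimal sketch of what such rule-based simplification looks like in practice (the node layout and the tiny rule set are illustrative only), identities can be applied bottom-up over an expression tree:

    #include <stddef.h>

    typedef enum { CONST, VAR, ADD, MUL } ExprKind;

    typedef struct Expr {
        ExprKind kind;
        int value;                /* CONST only       */
        const char *name;         /* VAR only         */
        struct Expr *lhs, *rhs;   /* ADD and MUL only */
    } Expr;

    /* Simplify the children first, then apply identities at this node:
     * x + 0 -> x,  x * 1 -> x,  x * 0 -> 0. */
    Expr *simplify(Expr *e) {
        if (e->kind != ADD && e->kind != MUL)
            return e;                           /* leaves are already simple */
        e->lhs = simplify(e->lhs);
        e->rhs = simplify(e->rhs);
        if (e->rhs->kind == CONST) {
            int k = e->rhs->value;
            if (e->kind == ADD && k == 0) return e->lhs;  /* x + 0 -> x */
            if (e->kind == MUL && k == 1) return e->lhs;  /* x * 1 -> x */
            if (e->kind == MUL && k == 0) return e->rhs;  /* x * 0 -> 0 */
        }
        return e;
    }

Real simplifiers have many more rules, and, as Dolzmann and Sturm note, the rules can pull in different directions, so rule selection itself becomes a policy decision.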

2.6.13 Obfuscation and Protection


Obfuscation and protection are designed to make the reverse engineering of code, including decompilation, more difficult; in most cases, such protection prevents effective decompilation.

Obfuscation is the process, usually automatic, of transforming a program distributed in machine code form in such a way as to make understanding of the program more difficult. (There is also source code obfuscation [Ioc88], but at present this appears to be largely a recreational activity.)

A typical technique is to add code not associated with the original program, confusing the reverse engineering tool with invalid instructions, self modifying code, and the like. Branches controlling entry to the extra code are often controlled by "opaque predicates", which are always or never taken, without it being obvious that this is the case. Another technique is to modify the control flow in various ways, e.g. turning the program into a giant switch statement.
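The following minimal C fragment illustrates the idea of an opaque predicate; the decoy routine is hypothetical, and a real obfuscator would insert such predicates at the machine code level, where they are much harder to spot.

    void decoy(void);   /* hypothetical routine; never actually executed */

    void obfuscated(unsigned x) {
        /* Opaquely false predicate: a square is never congruent to 2
         * modulo 3 (squares mod 3 are only 0 or 1), so the branch is
         * never taken, but an analysis tool cannot easily know that,
         * and must treat the decoy code as reachable. */
        if ((x * x) % 3u == 2u)
            decoy();
        /* ... the real code continues here ... */
    }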

"Protection" of executable programs is a term that encompasses various techniques designed to hide the executable code of a program. Techniques include encrypting the instructions and/or data, using obscure instructions, self modifying code, and branching to the middle of instructions. Attempts to thwart dynamic debuggers are also common, but these do not affect static decompilation except to add unwanted code to the output.

The result of decompiling an obfuscated or protected program will depend on the nature of the modifications:

• If compiler specific patterns of machine code are merely replaced with different but equivalent instructions, the result will probably decompile perfectly well with a general decompiler.

• Where extraneous code is inserted without special care, the result could well

decompile perfectly with a general decompiler.

• Where extraneous code is inserted with special care to disguise its lack of reachability, the result will likely be a program that is correct and usable, but difficult to understand and costly to maintain unless most of the obfuscations are manually removed. Even though such removal could be performed at the source code level, any removal is likely to be both tedious and costly. Some obfuscations (e.g. creating branches to the middle of instructions) may cause a decompiler to fail to generate a valid representation of the program.

• Where self-modifying code is present, the output is likely to be incorrect.

• Where the majority of the program is no longer in normal machine language, but is instead encrypted, the result is likely to be valid only for the decrypting routine, and the rest will be invalid. Where several levels of encryption and/or protection are involved, the result is likely to be valid only for the first level of decryption.

• Where the majority of the program has been translated into instructions suitable

for execution by a Virtual Machine (VM), and a suitable interpreter or Just In

Time (JIT) compiler is part of the executable program, the result will likely be

valid only for the VM interpreter or JIT compiler.

As a research area, obfuscation has only received serious attention since the mid 1990s,

with the appearance of Java bytecodes and CLI.

Collberg et al. have published the seminal paper in this area, "A Taxonomy of Obfuscating Transformations" [CTL97]. Wroblewski's thesis [Wro02] presents a method for obfuscating binary programs.

The difficulties presented by most forms of executable program protection indicate that decompilation of such code is usually not feasible. Where decompilation of a heavily protected program is required (e.g. source code is lost, and the only available form of the program is a protected executable), the protection must first be removed by processes unrelated to decompilation.
Chapter 3

Data Flow Analysis

Static Single Assignment form assists with most data flow components of decompilers, supporting such fundamental tasks as expression propagation, identifying parameters and return values, deciding if locations are preserved, and eliminating dead code.

Data flow analysis and control flow analysis are two of the main classes of analyses for machine code decompilers, as shown in Figure 3.1. Control flow analysis, where the intermediate representation of a program is converted to high level statements such as conditionals and loops, is a largely solved problem. However, data flow analysis, where instruction semantics are transformed into more complex expressions, parameters and return values are identified, and types are recovered, still has significant potential for improvement.

Figure 3.1: Basic components of a machine code decompiler.

The intermediate representation (IR) for a decompiler resembles that of a compiler.

It consists of statements containing expressions. For example, there may be a branch


#include <stdio.h>

/*              n      n!        n(n-1)...
 * Calculate   C  = -------- = --------------- (n-r terms top and bottom)
 *              r   r!(n-r)!   (n-r)(n-r-1)... */
int comb(int n, int r) {
    if (r >= n)
        r = n;                   /* Make sure r <= n */
    double res = 1.0;
    int num = n;
    int denom = n-r;
    int c = n-r;
    while (c-- > 0) {
        res *= num--;
        res /= denom--;
    }
    int i = (int)res;            /* Integer result; truncates */
    if (res - i > 0.5)
        i++;                     /* Round up */
    return i;
}

int main() {
    int n, r;
    printf("Number in set, n: ");    scanf("%d", &n);
    printf("Number to choose, r: "); scanf("%d", &r);
    printf("Choose %d from %d: %d\n", r, n, comb(n, r));
    return 0;
}
Figure 3.2: Original source code for the combinations program.

80483b0: 55         push  %ebp            ; Standard function prologue
80483b1: 89 e5      mov   %esp,%ebp
80483b3: 56         push  %esi            ; Save three registers
80483b4: 53         push  %ebx
80483b5: 57         push  %ecx
80483b6: 83 ec 08   sub   $0x8,%esp       ; Space for 8 bytes of locals
80483b9: 8b 55 08   mov   0x8(%ebp),%edx  ; edx stores n and num
80483bc: 89 f9      mov   %ecx,%ebx       ; ebx = r
80483be: 2b 4d 08   sub   0x8(%ebp),%ebx  ; ebx = r-n, carry set if r<n
80483c1: 19 c0      sbb   %eax,%eax       ; eax = -1 if r<n else eax = 0
80483c3: 21 c8      and   %ebx,%eax       ; eax = r-n if r<n else eax = 0
80483c5: 03 45 08   add   0x8(%ebp),%eax  ; eax = r-n+n = r if r<n, else n
Figure 3.3: First part of the compiled machine code for procedure comb of Figure 3.2.

Table 3.1: x86 assembly language overview.

Assembly language       Explanation
%eax, ... %edx, %esi,   Registers. %esp is the stack pointer; %ebp is the base
%edi, %ebp, %esp        pointer, usually pointing to the top of the stack frame.
%st, %st(1)             Top of the floating point register stack (often implied),
                        and next of stack.
(r)                     Memory pointed to by register r.
disp(r)                 Memory whose address is disp + the value of register r.
push r                  Push the register r to the stack. Afterwards, the stack
                        pointer esp points to r.
pop r                   Pop the value currently pointed to by esp to register r.
                        Afterwards, esp points to the new top of stack.
mov sr,dest             Move (copy) the register or immediate value sr to register
                        or memory dest.
cmp sr,dr               Compare the register dr with the register or immediate
                        value sr. Condition codes (e.g. carry, zero) are affected.
jle dest                Jump if less or equal to address dest. Uses condition
                        codes set by an earlier instruction.
lea mem,rd              Load the effective address of mem to register rd.
jmp dest                Jump to address dest. Equivalent to loading the program
                        counter with dest.
call dest               Call the subroutine at address dest. Equivalent to pushing
                        the program counter and jumping to dest.
test rs,rd              And register or memory rd with register or immediate value
                        rs; the result is not stored. Condition codes are affected.
sub rs,rd               rd := rd - rs. Condition codes are affected.
sbb rs,rd               rd := rd - rs - carry/borrow. Condition codes are affected.
fidiv mem               The value at the top of the floating point stack (FPTOS)
                        is divided by the integer in memory location mem.
leave                   Equivalent to mov %ebp,%esp; pop %ebp.
ret                     Subroutine return. Equivalent to pop temp; jmp temp.
dec rd                  Decrement register rd, i.e. rd := rd - 1.
nop                     No operation (do nothing).

statement with the expression m[esp-8] > 0 representing the condition for which the

branch will be taken, and another expression (often a constant) which represents the

destination of the branch in the original program's address space. esp is the name of a
register (the x86 stack pointer), and m[esp-8] represents the memory location whose

address is the result of subtracting 8 from the value of the esp register. Register and

memory locations are generally the only entities that can appear on the left hand side

of assignments. When locations appear in the decompiled output, they are converted

to local or global variables, array elements, or structure members, depending on their

form and how they are used. Expressions consist of locations, constants, or operators
combining other expressions, locations, or constants. Operators consist of the usual high level language operators such as addition and bitwise or. There are also a few low level operators, such as those which convert various sizes of integers to and from floating point representations.
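As an illustration only (not the actual IR classes of any decompiler), an expression such as m[esp-8] > 0 might be held as a small tree of nodes:

    typedef enum { OP_REG, OP_CONST, OP_MEMOF, OP_MINUS, OP_GT } OpKind;

    typedef struct Node {
        OpKind kind;
        int reg_id;                /* OP_REG: which register, e.g. esp   */
        int value;                 /* OP_CONST: the constant value       */
        struct Node *sub1, *sub2;  /* subexpressions; sub2 is unused for */
                                   /* the unary OP_MEMOF operator        */
    } Node;

    /* m[esp - 8] > 0 as a tree:
     *
     *          OP_GT
     *         /     \
     *    OP_MEMOF    OP_CONST 0
     *        |
     *    OP_MINUS
     *    /       \
     * OP_REG esp  OP_CONST 8
     */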

Data flow analysis concerns the definitions and uses of locations. A definition of a location is where it is assigned to, usually on the left hand side of an assignment. One or more locations could be assigned to as a side effect of a call statement. A use of a location is where the value of the location affects the execution of a statement, e.g. in the right hand side of an assignment, or in the condition or destination of a branch. Just as compilers require data flow analysis to perform optimisation and good code generation, decompilers require data flow analysis for the various transformations between instruction decoding and control flow analysis.
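Reusing the illustrative Node type above, collecting the uses of an assignment lhs := rhs might look like the following sketch; note that a memory location on the left hand side defines the memory, but uses the registers in its address expression.

    /* Append every location whose value is read by expression e to the
     * uses array. Operand subtrees are visited too: m[esp-8] is itself
     * a use, and it also uses esp in its address computation. */
    void add_uses(const Node *e, const Node **uses, int *n) {
        if (e == NULL)
            return;
        if (e->kind == OP_REG || e->kind == OP_MEMOF)
            uses[(*n)++] = e;
        add_uses(e->sub1, uses, n);
        add_uses(e->sub2, uses, n);
    }

    /* For lhs := rhs: lhs is the definition; the uses are everything in
     * rhs, plus the address subexpression of lhs when lhs is a memory
     * location (e.g. esp in m[esp-8] := ebx). */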

To demonstrate the various uses of data flow analysis in decompilers, the running example of Figure 3.2 will be used for most of this chapter and the next. It is a simple program for calculating the number of combinations of r objects from a set of n objects. The combinations function is defined as follows:



\[
{}^{n}C_{r} = \binom{n}{r} = \frac{n!}{r!\,(n-r)!} =
\begin{cases}
\dfrac{n(n-1)\cdots}{(n-r)(n-r-1)\cdots} \quad \text{($n-r$ terms top and bottom)} & \text{if } n > r \\[1ex]
1 & \text{if } n = r \\
\text{undefined} & \text{if } n < r
\end{cases}
\]

Figure 3.3 shows the first part of a possible compilation of the original program for the x86 architecture. For readers not familiar with x86 assembly language, an overview is given in Table 3.1. The disassembly is shown in AT&T syntax, where the last operand is the destination, as used by the binutils tools such as objdump [FSF01].

Data flow analysis is involved in many aspects of decompilation, and is a well known technique. In compilers, it is frequently required for almost any form of optimisation [ASU86, App02]. Other tools such as link time optimisers also use data flow analysis [Fer95, SW93]. Early decompilers made use of data flow analysis (e.g. [Hol73, Fri74, Bar74]).

Section 3.1 shows how useful expression propagation is in decompilation. Uncontrolled propagation, however, can cause problems, as Section 3.2 shows. Dead code elimination, introduced in Section 3.3, reduces the bulk of the decompiled output, one of the advantages of high level source code. One set of the machine code details that are eliminated includes the setting and use of condition codes; Section 3.3.1 shows how these are combined. The x86 architecture has an interesting special case with floating point compares, discussed in Section 3.3.2. One of the most important aspects of a program's representation is the way that calls are summarised. Section 3.4 covers the special terminology related to calls and the necessity of changing from caller to callee contexts, and discusses how parameters, returns, and global variables interact when locations are defined along only some paths. The various elements of call summaries are enumerated in Section 3.4.4 in the form of equations. Section 3.5 considers whether data flow analysis could or should be performed over the whole program at once, while Section 3.6 discusses safety issues related to the summary of data flow information. Architectures that have overlapped registers present problems that can be overcome with data flow analysis, as shown in Section 3.7.

3.1 Expression Propagation

Expression propagation is the most common transformation used by decompilers; there are two simple rules, yet difficult to check, for when it can be applied.

The IR for the first seven instructions of Figure 3.3 is shown in Figure 3.4(a). Note how statement 2 uses (in m[esp]) the value of esp computed in statement 1.

Let esp0 be a special register for the purposes of the next several examples. An extra

statement, statement 0, has been inserted at the start of the procedure to capture the

initial value of the stack pointer register esp. This will turn out to be a useful thing

to do, and the next chapter will introduce an intermediate representation that will do

this automatically.

Statement 0 can be propagated into every other statement of the example; this will

not always be the case. For example, statement 1 becomes esp := esp0 - 4. Similarly,

m[esp] in statement 2 becomes m[esp0-4], as shown in Figure 3.4(b). Propagation is

also called forward propagation or forward substitution.

The rules for when propagation is allowed are well known. A propagation of x from a source statement s of the form x := exp to a destination statement u can be performed if the following conditions hold (adapted from [ASU86]):

1. Statement s must be the only definition of x that reaches u.

2. On every path from s to u, there are no assignments to any location used by s.

For example, if s is edx := m[ebp+8], then there must be no assignments to m[ebp+8] or to ebp along any path from s to u. With traditional data flow analysis, two separate analyses are required. The first pass uses use-definition chains (ud-chains), based on reaching definitions, a forward-flow, any-path data flow analysis. The second requires a special purpose analysis called rhs-clear, a forward-flow, all-paths analysis [Cif94]. The next chapter will introduce an intermediate representation which makes the checking of these rules much easier.
or to ebp along any path from s to u. With traditional data ow analysis, two separate

0 esp0 := esp ; Save esp; see text


80483b0 1 esp := esp - 4
2 m[esp] := ebp ; push ebp
80483b1 3 ebp := esp
80483b3 4 esp := esp - 4
5 m[esp] := esi ; push esi
80483b4 6 esp := esp - 4
7 m[esp] := ebx ; push ebx
80483b5 8 esp := esp - 4
9 m[esp] := ecx ; push ecx
80483b6 10 tmp1 := esp
11 esp := esp - 8 ; sub esp, 8
80483b9 13 edx := m[ebp+8] ; Load n to edx
(a) Before expression propagation

0 esp0 := esp
80483b0 1 esp := esp0 - 4
2 m[esp0-4] := ebp
80483b1 3 ebp := esp0-4
80483b3 4 esp := esp0 - 8
5 m[esp0-8] := esi
80483b4 6 esp := esp0 - 12
7 m[esp0-12] := ebx
80483b5 8 esp := esp0 - 16
9 m[esp0-16] := ecx
80483b6 10 tmp1 := esp0 - 16
11 esp := esp0 - 24
80483b9 13 edx := m[esp0+4] ; Load n to edx
(b) After expression propagation

80483b0 2 m[esp0-4] := ebp


80483b3 5 m[esp0-8] := esi
80483b4 7 m[esp0-12] := ebx
80483b5 9 m[esp0-16] := ecx
80483b9 13 edx := m[esp0+4] ; Load n to edx
(c) After dead code elimination

Figure 3.4: IR for the first seven instructions of the combinations example.
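In code, the two propagation checks might be sketched as follows; the two query functions are placeholders standing in for the ud-chain and rhs-clear analyses just described, not real APIs.

    typedef struct Stmt Stmt;   /* an IR statement; details omitted */

    /* Placeholders for the two data flow analyses named in the text. */
    extern int only_reaching_def(const Stmt *s, const Stmt *u);
    extern int rhs_clear_on_all_paths(const Stmt *s, const Stmt *u);

    /* May statement s: x := exp be propagated into statement u? */
    int can_propagate(const Stmt *s, const Stmt *u) {
        if (!only_reaching_def(s, u))        /* rule 1: s is the only      */
            return 0;                        /* definition of x reaching u */
        if (!rhs_clear_on_all_paths(s, u))   /* rule 2: nothing used by    */
            return 0;                        /* exp is redefined between   */
        return 1;                            /* s and u on any path        */
    }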


Statement 3 also uses the value calculated in statement 1, so statement 1 is also propagated to statement 3. Statement 13 uses the value calculated in statement 3, and the conditions are still met, so statement 3 can be propagated to statement 13. The result of applying all possible propagations is shown in Figure 3.4(b).

Note that the quantity being propagated in these cases is of the form esp0-K, where K is a constant; the quantity being propagated is an expression. This is expression propagation, and differs from the constant propagation and copy propagation performed in compilers. Compilers propagate statements of the form x := y, where x is a variable and y is either a constant or another variable, but not a more complex expression. Performing expression propagation tends to make the result more complex, copy propagation leaves the complexity the same, and constant propagation usually makes the result simpler. Compilers eventually translate potentially complex expressions to simple machine instructions. In other words, the overall progression is from the complex to the simple. Decompilers do the opposite; the progression is from the simple to the complex. As a result, decompilers need not restrict propagation to that of constants or simple variables (locations). Copy and constant propagation are also useful for decompilers, since propagating them tends to reduce the need for temporary variables.
since propagating them tends to reduce the need for temporary variables.

Expression propagation, when applied repeatedly, tends to result in expressions that are in terms of the input values of the procedure. This is already evident in Figure 3.4(b), where the uses of seven different values for esp are replaced by constant offsets from one value (esp0). This tendency is very useful in a decompiler, because memory locations end up in a canonical form, as shown in the following example.

80483d8 89 34 24 mov %esi,(%esp) ; Copy denom to integer stack


80483db da 75 e8 fidiv -24(%ebp) ; res /= denom
(a) Machine code

80483d8 55 m[esp] := esi


80483db 56 st := st /f (double)m[ebp-24]
(b) IR before propagation

80483d8 55 m[esp0-28] := esi


80483db 56 st := st /f (double)m[esp0-28]
(c) IR after propagation

Figure 3.5: Two machine instructions referring to the same memory location using different registers. /f is the floating point division operator, and (double) is the integer to floating point conversion operator.

Figure 3.5 shows two instructions from inside the while loop, which implement the

division. In Figure 3.5(a), m[esp] is an alias for m[ebp-24], however this fact is not

obvious. After repeated propagation, both memory expressions result in m[esp0-28], as


shown in Figure 3.5(c). As a result, both memory expressions are treated as representing

the same memory location, as they should. Treating them as separate locations could

result in errors in the data flow information. This form of aliasing is more prevalent at

the machine code level than at the source code level.

3.2 Limiting Propagation

Although expression propagation is usually beneficial in a decompiler, in some circumstances limiting propagation produces more readable output.

Figure 3.6 shows decompiler output where too much propagation has been performed.

24 if (local26 >= maxTickSizeX - (maxTickSizeX < 0 ? -1 : 0) >> 1) {
25     local26 = (maxTickSizeX - (maxTickSizeX < 0 ? -1 : 0) >> 1) - 1;
Figure 3.6: Excessive propagation. From [VEW04].

The result is needlessly difficult to read, because a large expression has been duplicated. The reader has to try to find a meaning for the complex expression twice, and it is not obvious from the code whether the condition is the same as the right hand side of the assignment on line 25. Figure 3.7 shows how the problem arises in general.

(a) Original intermediate code:

    x := large_expression
    if (x ...)
        ... x ...

(b) Undesirable code after propagation:

    x := large_expression        // Dead code
    if (large_expression ...)
        ... large_expression ...

Figure 3.7: The circumstance where limiting expression propagation results in more readable decompiled code.

Common Subexpression Elimination (CSE) is a compiler optimisation that would appear to solve this problem. However, implementing CSE everywhere possible effectively reverses all propagations, as demonstrated by the results in Section 7.2.

Excessive propagation can be limited by heuristics, as shown below. In some cases, it may be desirable to undo certain propagations, e.g. where specified manually. For those cases, CSE could be used to undo specific propagations. While compilers perform CSE for more compact and faster machine code, decompilers limit or undo expression propagation to reduce the bulk of the decompiled code and improve its readability.

The problem with Figure 3.6 is that a large expression has been propagated to more

than one use. Where this expression is small, e.g. max-1, this is no problem, and is

probably easier to read than a new variable such as max_less_one.



Similarly, when a complex expression has only one use, it is also not a problem. However,

where the result of a propagation would be an expression occupying several lines of

code, it may be worth preventing the propagation. The threshold for such propagation

limiting may vary with individual users' preferences.

The main problematic propagations are therefore those which propagate expressions of moderate complexity to more than one destination. Simple measures of expression complexity will probably suffice; e.g. the number of operators is easy to calculate and is effective.
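With the illustrative expression nodes sketched earlier in this chapter, such a heuristic might be coded as follows; the threshold value is an arbitrary example of a user-tunable preference.

    /* Count operator nodes in an expression tree; registers and
     * constants are leaves and contribute nothing. */
    int count_operators(const Node *e) {
        if (e == NULL || e->kind == OP_REG || e->kind == OP_CONST)
            return 0;
        return 1 + count_operators(e->sub1) + count_operators(e->sub2);
    }

    /* Allow a propagation if the expression is simple, or if it would
     * reach only one use. */
    int should_propagate(const Node *exp, int num_uses) {
        enum { MAX_OPERATORS = 6 };   /* arbitrary, user-tunable threshold */
        return num_uses <= 1 || count_operators(exp) <= MAX_OPERATORS;
    }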

Section 7.2 demonstrates that minimising such propagations is effective in improving the readability of the output in these cases.

3.3 Dead Code Elimination

Dead code elimination is facilitated by storing all uses for each definition (definition-use information).

Dead code consists of assignments for which the definition is no longer used; its elimination is called Dead Code Elimination (DCE). Dead code contrasts with unreachable code, which can be any kind of statement, and to which there is no valid path from the program's entry point(s).

Propagation often leads to dead code. In Figure 3.4(b), half the statements are calculations of intermediate results that will not be needed in the decompiled output, and are therefore eliminated as shown in Figure 3.4(c). In fact, all the statements in this short example will ultimately be eliminated as dead code. DCE is also performed in compilers, where it is considered an optimisation. The main process by which the machine specific details of the executable program are removed in a decompilation is a combination of expression propagation and dead code elimination. This removes considerable bulk and greatly increases the readability of the generated output. Hence, DCE is an important process in decompilation.

As the name implies, dead code elimination depends on the concept of liveness information. If a location is live at a program point, changing its value there (e.g. by overwriting it or eliminating its definition) will affect the execution of the program. Hence, a live variable cannot be eliminated. In traditional data flow analysis, every path from a definition to the end of the procedure has to be checked to ensure that on that path the location is redefined before it is used. The next chapter will introduce an intermediate representation with the property that any use anywhere in the program implies that the location is live.
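Given use counts for each definition, DCE itself is simple. The sketch below uses an illustrative statement type and helper routines, not the data structures of any particular decompiler; it iterates because removing one dead assignment can take away the last use that kept another alive.

    typedef struct Stmt {
        struct Stmt *next;
        int is_assignment;     /* only assignments are candidates        */
        int has_side_effects;  /* e.g. a call on the right hand side     */
        int num_uses;          /* remaining uses of the location defined */
    } Stmt;

    /* Illustrative helpers: unlink s, and decrement the use counts of
     * the definitions that s's right hand side was using. */
    extern void remove_stmt(Stmt **head, Stmt *s);
    extern void release_operand_uses(Stmt *s);

    void eliminate_dead_code(Stmt **head) {
        int changed;
        do {
            changed = 0;
            for (Stmt *s = *head; s != NULL; s = s->next) {
                if (s->is_assignment && !s->has_side_effects && s->num_uses == 0) {
                    release_operand_uses(s);  /* may create new dead code */
                    remove_stmt(head, s);
                    changed = 1;
                    break;                    /* list changed; rescan     */
                }
            }
        } while (changed);
    }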



Some instructions typically generate effects that are not used. For example, a divide instruction usually computes both the quotient and the remainder, in two machine registers. Most high level languages have operators for division and remainder, but typically only one or the other operation is specified. As a result, divide and similar instructions often generate dead code.

3.3.1 Condition Code Combining


Expression propagation can be used to implement machine independent combining of condition code definitions and uses.

Another machine specific detail which must usually be eliminated is the condition code register, also called the flags register or status register. Individual bits of this register are called condition codes, flags, or status bits. Condition codes are a feature of machine code but not of high level source code, so the actual assignments and uses of the condition codes should be eliminated. In some cases, it may be necessary for a flag, e.g. the carry flag as a local Boolean variable CF, to be visible. In general, such cases should be avoided if at all possible, since they represent very low level details that programmers normally don't think about, and which hence do not appear in normal source code.

In the example IR of Figure 3.4, the instruction at address 80483b6, a subtract immediate instruction, has the side effect of setting the condition codes. There is a third statement for this instruction (statement 12, omitted earlier for brevity) representing this effect, as shown in Figure 3.8(a).

As shown in Figure 3.8(b), the semantics implied by the SUBFLAGS macro are complex. Decompilers do not usually need the detail of the macro expansion. Other macro calls are used to represent the effect on condition codes after other classes of instructions, e.g. floating point instructions, or logical instructions such as a bitwise and. For example, add and subtract instructions affect the carry and overflow condition codes differently on most architectures.

In this case, none of the four assignments to condition codes is used. In a typical x86 program, there are many modifications of the condition codes that are not used; the semantics of these modifications are removed by dead code elimination. Naturally, some assignments to condition codes are not dead code, and are used by subsequent instructions.

Separate instructions set condition codes (e.g. compare instructions) and use them (e.g. conditional branch instructions). Specific instructions which use condition codes use different combinations of the condition code bits, e.g. one might branch if lower (carry

80483b6 10 tmp1 := esp


11 esp := esp - 8 ; sub $8,%esp
12 %flags := SUBFLAGS(tmp1, 8, esp)

(a) Complete semantics for the instruction using the SUBFLAGS macro call.

SUBFLAGS32(op1, op2, result) {


%CF := op1 <u op2
%OF := MSB(op1) ^ MSB(op2) ^ MSB(result) ^ %CF
%NF := result <s 0
%ZF := result == 0
};
(b) Definition for the SUBFLAGS macro. %CF, %OF, %NF and %ZF represent respectively the carry, overflow, negative, and zero flags; MSB(loc) represents the most significant bit of location loc; <u and <s are the unsigned and signed comparison operators.

Figure 3.8: The subtract immediate from register esp instruction of Figure 3.4(a), including the side effect on the condition codes.
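As a cross-check on the macro's meaning, the SUBFLAGS semantics of Figure 3.8(b) can be transliterated into plain C; this is a direct restatement of the figure, not code from a decompiler.

    #include <stdint.h>

    typedef struct { int CF, OF, NF, ZF; } Flags;

    /* op1 and op2 are the operands of the subtract (or compare);
     * result is op1 - op2. */
    Flags subflags32(uint32_t op1, uint32_t op2, uint32_t result) {
        Flags f;
        f.CF = op1 < op2;                      /* unsigned borrow (op1 <u op2) */
        f.OF = (int)(((op1 >> 31) ^ (op2 >> 31) ^ (result >> 31)
                      ^ (uint32_t)f.CF) & 1u); /* MSB parity, as in the macro  */
        f.NF = (int32_t)result < 0;            /* result <s 0                  */
        f.ZF = (result == 0);
        return f;
    }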

set) or equal (zero set), while another might set a register to one if the sign bit is set. Hence the overall effect is the result of combining the semantics of the instruction which sets the condition codes and the instruction which uses them. Often, these instructions are adjacent or near each other, but it is quite possible that they are separated by many instructions.

Data flow analysis makes it easy to find the condition code definition associated with a condition code use. It reveals which instructions combine at a particular condition code use, and hence the semantics of that use. This combination can be used to define the semantics of the instruction using the condition codes (e.g. the branch condition is a ≤ b). After this is done, the condition code assignments are no longer necessary, and can be eliminated as dead code.

In a decompiler, modifications to the condition code register can be modelled by assignments to the %flags abstract location, as shown earlier in Figure 3.8(a). Figure 3.9(b) shows an assignment to the condition codes by a decrement instruction that is used by the following branch instruction.

In general, after an arithmetic instruction (e.g. add, decrement, compare), there will be an assignment of the form

%flags := SUBFLAGS(op1, op2, res)

where op1 and op2 are the operands, and res is the result (sum or difference). Add-like instructions use the ADDFLAGS macro, which has slightly different semantics to the

80483e2 4B dec %ebx ; Decrement counter and set flags


80483e3 7d ee jge 80483d3 ; Continue while counter >= 0
(a) Machine code

80483e2 59 tmp1 := ebx


60 ebx := ebx - 1
61 %flags := SUBFLAGS(tmp1, 1, ebx)
80483e3 62 BRANCH 80483d3, conditions: ge, %flags
(b) IR before propagation

80483e2 59 tmp1 := ebx // Dead code


60 ebx := ebx - 1
61 %flags := SUBFLAGS(tmp1, 1, ebx) // Dead code
80483e3 62 BRANCH 80483d3 if ebx >= 1
(c) IR after propagation

Figure 3.9: Combining condition codes in a decrement and branch sequence.

A compare instruction is the same as a subtract instruction without the result being stored; however, it is convenient to compute a result in a temporary location to pass as the last operand of the SUBFLAGS macro. After a logical instruction (and, xor, test) there will be an assignment of the form

%flags := LOGICALFLAGS(res)

where res is the result of the logical operation.

Branch statements contain an expression representing the high-level condition, essentially the expression that would be emitted within parentheses of an if statement when decompiling to C. For integer branches, this expression is initially set to %flags, so that expression propagation will automatically propagate the correct setting of the integer condition codes to the branch statement. Floating point branches initially set the condition to %fflags (representing the floating point flags), so that interleaved integer and floating point condition code producers (e.g. compare instructions) and consumers (e.g. branch statements) will combine correctly.

At many stages of decompilation, expressions are simplified. For assignments, identities such as x + 0 = x are applied. Combining condition code definitions and uses can be implemented as a special case of expression simplification. For example, there can be special modifications that convert

BRANCH(SUBFLAGS(op1, op2, res), BRANCH_ULE) to BRANCH(op1 <u op2) and
BRANCH(LOGICALFLAGS(res), BRANCH_MINUS) to BRANCH(res < 0).



Here, BRANCH_ULE is an enumeration representing branches with the condition unsigned


less, and <u represents the unsigned less than operator.
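As an illustration of how such a rule might look in code, consider the following sketch, which assumes a simple expression-tree IR; all of the type and function names here are invented for the example and are not taken from any particular decompiler.

#include <stdlib.h>

typedef enum { OP_SUBFLAGS, OP_LESS_U } Op;            /* hypothetical */
typedef struct Exp { Op op; struct Exp *arg[3]; } Exp;
typedef enum { BRANCH_ULE } BranchCond;
typedef struct { BranchCond cond; Exp *condExp; } Branch;

static Exp *binary(Op op, Exp *l, Exp *r) {
    Exp *e = malloc(sizeof *e);
    e->op = op; e->arg[0] = l; e->arg[1] = r; e->arg[2] = NULL;
    return e;
}

/* Once %flags := SUBFLAGS(op1, op2, res) has been propagated into the
   branch, rewrite BRANCH(SUBFLAGS(op1, op2, res), BRANCH_ULE) to
   BRANCH(op1 <u op2), as described in the text. */
static void simplifyBranch(Branch *b) {
    if (b->cond == BRANCH_ULE && b->condExp && b->condExp->op == OP_SUBFLAGS)
        b->condExp = binary(OP_LESS_U, b->condExp->arg[0], b->condExp->arg[1]);
}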

The general method of using propagation to combine the setting and use of condition codes to form conditional expressions is due to Trent Waddington, who first implemented the method in the Boomerang decompiler [Boo02]. It is able to handle various control flow configurations, such as condition code producers and consumers appearing in different basic blocks.

There is a special case where the carry flag can be used explicitly, as part of a subtract

with borrow instruction. Figure 3.10 from the running example shows a subtract with

borrow instruction used in an idiomatic code sequence.

The essence of the sequence is to perform a compare or subtract which affects the condition codes, followed by a subtract with borrow of a register from itself. The carry flag (which also stores borrows from subtract instructions) is set if the first operand of the compare or subtract is unsigned less than the second operand. The result is therefore either zero (if no carry/borrow) or -1 (if carry/borrow was set). This value can be anded with an expression to compute conditional values without branch instructions, or, by adding one to the result, to compute C expressions such as a = (b < c) (a is set to 1 if b<c, else 0).

In the example of Figure 3.10, ebx is set to r-n, and the carry flag is set if r<n and

cleared otherwise. Register eax is subtracted from itself with borrow, so eax becomes
-1 if r<n, or 0 otherwise. After eax is anded with ebx, the result is r-n if r<n, or 0
otherwise. Finally, n is added to the result, giving r if r<n, or n otherwise.
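The net effect of the idiom can be written out in C. A minimal sketch (the function name is invented), mirroring the machine code of Figure 3.10(b):

/* Branchless equivalent of "if (r > n) r = n". */
unsigned clampToN(unsigned r, unsigned n) {
    unsigned diff = r - n;                /* sub: borrow iff r < n      */
    unsigned mask = (r < n) ? ~0u : 0u;   /* sbb %eax,%eax: -1 or 0     */
    return (mask & diff) + n;             /* and; add: r if r<n, else n */
}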
The carry flag can be treated as a use of the %flags abstract location, so that appropriate condition code operations will propagate their operands into the %CF location. During propagation, a special test is made for the combination of propagating %flags := SUBFLAGS(op1, op2, res) into %CF; the use of %CF is replaced by op1 <u op2.
The result is correct, compilable, but difficult to read code, as shown in Figure 3.10(e). Readability could be improved with an extra simplification rule of the form

(0 - (a <u b)) & (a - b) ⇒ (a <u b) ? (a - b) : 0

in conjunction with

(p ? x : y) + c ⇒ p ? x + c : y + c.
The above technique is not suitable for multi-word arithmetic, which also uses the carry flag. Multi-word arithmetic could be handled by high level patterns.

The reader may question why the semantics for flags are not substituted directly into expressions for the branch conditions to derive the high level branch condition from first principles each time. For example, the instruction commonly known as branch if greater or equals actually branches if the sign flag is equal to the overflow flag.

if (r > n)
r = n; /* Make sure r <= n */
(a) Source code

80483b9 8b 55 08 mov 0x8(%ebp),%edx ; edx stores n and num


80483bc 89 fb mov %ecx,%ebx ; ebx = r
80483be 2b 5d 08 sub 0x8(%ebp),%ebx ; ebx = r-n, carry set if r<n
80483c1 19 c0 sbb %eax,%eax ; eax = -1 if r<n else = 0
80483c3 21 d8 and %ebx,%eax ; eax = r-n if r<n else = 0
80483c5 03 45 08 add 0x8(%ebp),%eax ; eax = r-n+n = r if r<n, else = n
(b) Machine code

80483b9 13 edx := m[ebp + 8]


80483bc 14 ebx := ecx
80483be 15 tmp1 := ebx
16 ebx := ebx - m[ebp + 8]
17 %flags := SUBFLAGS(tmp1, m[ebp + 8], ebx)
80483c1 18 tmp1 := ...
19 eax := 0 - %CF
20 %flags := SUBFLAGS(...)
80483c3 21 eax := eax & ebx
22 %flags := LOGICALFLAGS(eax)
80483c5 23 tmp1 := eax
24 eax := eax + m[ebp + 8]
25 %flags := ADDFLAGS(...)

(c) IR before propagation

80483b9 13 edx := m[esp0 + 4]


80483bc 14 ebx := ecx
80483be 15 tmp1 := ecx
16 ebx := ecx - m[esp0+4]
17 %flags := SUBFLAGS32(ecx, m[esp0+4], ebx)
80483c1 18 tmp1 := eax
19 eax := 0 - (ecx <u m[esp0+4])
20 %flags := SUBFLAGS(...)
80483c3 21 eax := (0 - (ecx <u m[esp0+4])) & ebx
22 %flags := LOGICALFLAGS(...)
80483c5 23 tmp1 := ...
24 eax := ((0 - (ecx <u m[esp0+4])) & ebx) + m[esp0+4]
25 %flags := ADDFLAGS( ... )
(d) IR after propagation

Figure 3.10: Code from the running example where the carry flag is used explicitly.

edx = param1; // edx := n


ebx = param2 - param1; // ebx := r-n
eax = 0 - (param2 < param1); // eax := r<n ? -1 : 0
eax = (eax & ebx) + param1; // eax := r<n ? r : n
esi = param1 - eax; // esi := n-r if r<n, else 0
...
do{ ...
st = st * (float)edx_1 / (float)esi; // Use esi as denom

(e) Generated C code.

Figure 3.10: (continued). Code from the running example where the carry flag is used explicitly.

It is possible to substitute the expressions for %NF and %OF from Figure 3.8(b) to derive the condition op1 ≥s op2, but the derivation is difficult. The distilled knowledge that the combination of op1 - op2 (assuming flags are set) and JGE implies that the branch is taken when op1 ≥s op2 is essentially caching a difficult proof.

Once propagation of the %flags abstract location is performed, the assignment to %flags is unused, so that standard dead code elimination will remove it. In summary, expression propagation makes it easy to combine condition code setting and use. Special propagation or simplification rules extract the semantics of the condition code manipulations. Finally, dead code elimination removes machine dependent condition codes and the %flags abstract location.

3.3.2 x86 Floating Point Compares


Expression propagation can also be used to transform away machine details such as

those revealed by older x86 floating point compare instruction sequences.

Modern processors in the x86 series are instruction set compatible back to the 80386

and even the 80286 and 8086, where a separate, optional numeric coprocessor was used

(80387, 80287 or 8087 respectively). Because the CPU and coprocessor were in separate

physical packages, there was limited access from one chip to the other. In particular,

floating point branches were performed with the aid of the fnstsw (floating point store status word) instruction, which copied the floating point status word to the ax register of the CPU. Machine code for the floating point compare of Figure 3.2 is shown in

Figure 3.11.

The upper half of the 16-bit register ax is accessible as the single byte register, ah.
The floating point status bits are masked off with the test $0x45,%ah instruction.

int i = (int) res; /* Integer result; truncates */


if (res - i > 0.5)
i++; /* Round up */
(a) Source code

; res-i is on the top of the floating point stack


08048405 d9 05 60... flds 0x8048568 ; Load 0.5
0804840b d9 c9 fxch %st(1) ; Exchange: TOS=0.5
0804840d da e9 fucompp ; Floating compare 0.5 to res-i
0804840f df e0 fnstsw %ax ; Store FP condition codes in ax
08048411 f6 c4 45 test $0x45,%ah ; Test all FP flags
08048414 75 01 jne 8048417 ; Jump if integer nz ⇐⇒ if FP ≥
(b) Disassembly

Figure 3.11: 80386 code for the floating point compare in the running example.

The expression propagated into the branch is (SETFFLAGS(op1, op2, res) & 0x45) != 0. Similar logic to the integer propagation is used to modify this to op1 >=f op2, where >=f is the floating point greater-or-equals operator. The result is a quite general solution that eliminates most of the machine dependent details of the above floating point compare code. The test $0x45,%ah is often replaced with sequences such as and $0x45,%ah; xor $0x40,%ah or and $1,%ah. Some compilers emit a jp instruction (jump if parity of the result is even) after a test or and instruction. This can save a compare instruction. For example, after test $0x41,%ah and jp dest, the branch is not taken if the floating point result is less (result of the test instruction is 0x01) or equal (0x40), but not uncomparable (0x41). (Two floating point numbers are uncomparable if one of them is a special value called a NaN (Not a Number).) The exact sequence of instructions found depends on the compiler and the floating point comparison operator in the original source code.
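For reference, the masks involved can be spelled out in code. The sketch below assumes the standard 8087 status word layout, in which the condition codes C0, C1, C2 and C3 occupy bits 8, 9, 10 and 14 of the status word, so that after fnstsw %ax they appear in ah as 0x01, 0x02, 0x04 and 0x40 respectively; the constant 0x45 is thus C0|C2|C3 (the function name is invented).

#include <stdint.h>
#include <stdbool.h>

enum { FP_C0 = 0x01, FP_C1 = 0x02, FP_C2 = 0x04, FP_C3 = 0x40 };

/* fucompp outcomes as seen in ah: greater sets none of C0/C2/C3;
   less sets C0; equal sets C3; unordered (a NaN operand) sets all three.
   The jne after test $0x45,%ah is therefore taken iff not "greater". */
static bool jne_taken_after_test45(uint8_t ah) {
    return (ah & (FP_C0 | FP_C2 | FP_C3)) != 0;
}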

3.4 Summarising Calls

The effects of calls are best summarised by the locations modified by the callee, and the locations used before definition in the callee.

If data flow analysis is performed on the whole program at once, the effects of called procedures are automatically incorporated into their callers; Section 3.5 will consider this possibility. Assuming that the whole program is not analysed at once, a summary of some kind is required of the effects of calls.

Information has to be transmitted from a caller to and from a callee (e.g. the number

and types of parameters), and this information may change as the decompilation proceeds. This raises some issues with respect to the context of callers and callees, which will be considered after the terminology definitions.

Finally, the various call-related quantities of interest can be concisely defined by a set of data flow equations.

3.4.1 Call Related Terminology


The semantics of call statements and their side effects necessitates terminology that

extends terms used by the compiler community.

Call instructions, and the call statements that result from them in a decompiler's intermediate representation, are complex enough to warrant some special terminology. By comparing the source and machine code in Figures 3.2 and 3.3, the instructions at addresses 80483b9 and 80483bc are seen to load parameters from the stack and from a register respectively. To prevent confusion, a strict distinction is made between the formal parameters of a procedure and actual arguments of a call statement. Instructions not shown compute a value in register edx, copy it to register eax, and the caller uses the value in register eax.

Since both edx and eax are modified by the procedure, they are called modifieds.
The value or values returned by a procedure are called returns (used as a noun). It
is possible at the machine language level that more than one value is returned by a

procedure, hence the plural form. Unlike the return values of high level languages, at

the machine code level, returns are assignments. Thus, there are return locations
(usually registers) and return expressions (which compute the return values). In this
program, only the value in eax is used, so only eax is called a return of the procedure
comb and a result of the call to it. If and only if a call has a result will there be

either an assignment to the result in the decompiled output, or the call will be used as

a parameter to another call.

Figure 3.12 shows IR for part of the running example, illustrating several of the terms

defined above.

The concept of parameters and returns as special features of a procedure is almost

entirely nonexistent at the machine code level. The two loads from the parameters are

similar to a load from a local variable and a register to register copy. Assignments to

return locations are indistinguishable from ordinary assignments. The fact that register

eax is used by some callers, and register edx is not used by any callers, is not visible

at all by considering only the IR of the called procedure. The locations used to store

Figure 3.12: Intermediate representation illustrating call related terminology.

parameters and return locations are a matter of convention, called the Application

Binary Interface (ABI); there are many possible conventions. While most programs

will follow one of several ABI conventions, some will not follow any, so a decompiler

should not assume any particular ABI.

Most of the time, a location that is used before it is defined implies that it is a parameter.

However, in the example of Figure 3.3, registers esi and ebx are used by the push
instructions, yet they are not parameters of the procedure, since the behaviour of the

program does not depend on the value of those registers at the procedure's entry point.

The registers esi and ebx are said to be preserved by the procedure.

Register ecx is also preserved; however, the procedure does depend on its value; in other words, ecx is a preserved parameter. Care must be taken to separate those locations used before being defined which are preserved, parameters, or both. This can be achieved by a combination of expression propagation and dead code elimination. For example, consider statements 5, 7 and 9 of Figure 3.4. These save the preserved registers, and are ultimately dead code. They are needed to record the fact that comb preserves esi, ebx and ecx, but are not wanted in the decompiled output. When the preservation code is eliminated, the only remaining locations that are used before definition will be parameters.

Reference parameters are an unusual case. Consider for example a procedure that takes

only r as a reference parameter. At the machine code level, its address is passed in

a location, call it ar. In a sense, the machine code procedure has two parameters, ar
(since the address is used before definition) and r itself, or *ar (since the referenced object and its elements are used before being defined). However, the object is by definition in memory. There is no need to consider the data flow properties of memory objects; the original compiler (or programmer) has done that. Certainly it is important to know whether *ar is an array or structure, so that its elements can be referenced appropriately, but this can be done by defining ar with the appropriate type (e.g. as being of type Employee* rather than void*). It is quite valid for the IR to keep the

parameter as a pointer until very late in the decompilation. It could be left to the

back end or even to a post decompilation step to convert the parameter to an actual

reference, since not all languages support references (e.g. C).

3.4.2 Caller/Callee Context


Memory and register expressions are frequently communicated between callers and callee(s);

the difference in context requires some substitutions to obtain expressions which are valid

in the other context.

When the callee for a given caller is known, the parameters are initially the set of

variables live at the start of the callee. These locations are relative to the callee, and

not all can be used at the caller without translation to the context there. Figure 3.13

shows the call from main to comb in the running example.

804847e 47 ecx := m[esp0-12] ; copy r


8048481 48 esp := esp0 - 64
49 m[esp0-64] := m[esp0-8] ; copy n
8048489 51 esp := esp0 - 68
52 m[esp0-68] := %pc ; Save program counter
54 CALL comb
(a) Caller (main)

80483b9 13 edx := m[esp0+4] ; load n


80483bc 14 ebx := ecx ; load r
(b) Callee (comb)

Figure 3.13: Caller/callee context.

The first argument, n, is passed in memory location m[esp0-64], while r is passed in register ecx. In the callee, parameter n is read from m[esp0+4], while r is read from register ecx. (The memory location m[esp0] holds the return address in an x86 procedure.) Note that these memory expressions are in terms of esp0, the stack pointer on entry to the procedure, but the two procedures (main and comb) have different initial values for the stack pointer. Without expression propagation to canonicalise the memory expressions in the callee, the situation is even worse, since parameters could be accessed at varying offsets from an ever changing stack pointer register, or sometimes at offsets from the stack pointer register and at other times at offsets from the frame pointer register (if used).

Since it will be necessary to frequently exchange location expressions between callers and

callees, care needs to be taken that this is always possible. For example, if register ecx
in main is to be converted to a local variable, say local2, it should still be accessible as the

original register ecx, so that it can be matched with a parameter in callees such as comb.
Similarly, if ecx in comb is to be renamed to local0, it should still be accessible as ecx for
the purposes of context conversion. Similar remarks apply to memory parameters such

as m[esp0-64] in main and m[esp0+4] in comb. One way to accomplish this is to not

explicitly rename a location to an expression representing a local variable, but rather

maintain a mapping between the register or memory location and the local variable

name to be used at code generation time (very late in the decompilation). Most of

the time the IR is used directly, but at code generation time and possibly in some IR

display situations the mapped name is used instead.

Note that while arguments and the values returned by procedures can be arbitrary

expressions, the locations that they are passed in are usually quite standardised. For

example, parameters are usually passed in registers or on the stack; return values are

usually located in registers. Hence, as long as registers are not renamed, the main

problem is with the stack parameters, which are usually expressions involving the stack

pointer register.

Consider argument n in main in Figure 3.13. It has the memory expression m[esp0-64].
Suppose we know the value of the stack pointer at the call to comb; here it is espcall = esp0-68 (in the context of main; in other words, the value of the stack pointer at the call is 68 bytes less than the initial value at the start of main). This is reaching definitions information; it turns out to be useful to store this information at calls (for

this and several other purposes). It is assumed that if the caller pushes the return

address, this is part of the semantics of the caller, and has already happened by the

time of the call. The call is then purely a transfer of control, and control is expected

to return to the statement following the call.

Hence, esp0 (at the start of main) = espcall + 68. Substituting this equation into
the memory expression for n in main (m[esp0-64]) yields m[espcall + 68 - 64] =

m[espcall + 4]. Since the value for the stack pointer at the call is the same as the value

at the top of comb, this is the memory expression for n in terms of esp0 in comb, in

other words, in the context of comb this expression is m[esp0+4], just as the IR in

Figure 3.13(b) indicates. Put another way, it is possible to generate the expression

(here m[esp0+4]) that is equivalent to the expression valid in main (here m[esp0-64])
for a stack location, so that it is possible to search through the IR for a callee to find

important information. A similar substitution can be performed to map from the callee

context to the caller context. When the number of parameters in a procedure changes,

machine code decompilers need to perform this context translation to adjust the number

of actual arguments at each call.
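The substitution amounts to simple offset arithmetic on esp0-relative expressions. A minimal sketch, assuming the stack pointer's value at the call is known as a constant offset (spAtCall) from esp0 in the caller (both function names invented):

/* For Figure 3.13, spAtCall is -68, so callerToCallee(-64, -68) == 4:
   m[esp0-64] in main is m[esp0+4] in comb. */
static int callerToCallee(int callerOffset, int spAtCall) {
    /* esp0(callee) = esp(at the call) = esp0(caller) + spAtCall */
    return callerOffset - spAtCall;
}

static int calleeToCaller(int calleeOffset, int spAtCall) {
    return calleeOffset + spAtCall;
}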



3.4.3 Globals, Parameters, and Returns


Three propositions determine how registers and global variables that are assigned to

along only some paths should be treated.

Figure 3.14 shows parts of the running example extended with some assignments and

uses of global variables. It shows a potential problem that arises when locations are

assigned to on some but not all paths of a called procedure.

int g; register int edi; /* Global memory and register */


int comb(int n, int r) {
...
int c = n-r;
if (c == 0) {
g = 7; edi = 8; /* Possibly modify globals */
} ...
}
int main() {
g=2; edi=3; /* Assign to globals */
...comb(n, r);
printf("g = %d, edi = %d\n", g, edi);
}

Figure 3.14: Potential problems caused by locations undefined on some paths through a called procedure.
paths through a called procedure.

In this example, g and edi are global variables, with edi being a register reserved for this global variable. Both g and edi are found to be modified by comb. If the assumption is made that locations defined by a call kill reaching definitions and livenesses, the definitions of g and edi in main no longer reach the use in the printf call. As a result, the definitions are dead and would be removed as dead code. This is clearly wrong, since if n≠r (and hence c≠0), then the program should print g = 2, edi = 3 rather than undefined values.

One of these global variables is a register, represented in the decompiled output as a

local variable. It could also be represented by a global variable, but at the machine code

level, there are few clues that the register behaves like a global variable. The two ways

that the modification of a local variable can be communicated to its callers are through

the return value, or through a reference parameter. A reference parameter is logically

equivalent to the location being both a parameter and a return of the procedure. In

a decompiler, it is convenient to treat parameters as being passed by value, so that

arguments of procedure calls are pure uses, and only returns define locations at a call

statement.

Location edi therefore becomes a return of procedure comb. Since comb already returns
i, this will be the second return for it. Local variable edi is not defined before use on the path where n≠r; in other words the value returned in edi by comb depends on the

value before the call, hence edi also becomes a parameter of comb.

Conveniently, making edi an actual argument of comb means that the definition of edi is no longer unused (it is used by one of the arguments of the call to comb).

Unfortunately, this solution is not suitable for the global variable, g. In the example,

there would be code similar to g = comb(g, ...), and the parameter would shadow

the global inside the procedure comb. This has the following problems:

• It is inefficient allocating a parameter to shadow the global variable, copying the

global to the parameter, from the parameter to the return location, and from the

return location back to the global.

• Manipulating global variables in this fashion is unnatural programming; programmers do not write code this way, so the result is less readable.

• The global may be aliased with some other expression (e.g. another reference

parameter, or the dereference of a pointer). When the aliased location is assigned

to, the global will change, but not the parameter.

The problem can be solved by adopting a policy of never allowing globals to become

parameters or returns, and never eliminating definitions to global variables. The latter is

probably required for alias safety in any case. The above discussion can be summarised

in the following propositions:

Proposition 3.1: Locations assigned to on only some paths become parameters and
returns.

Proposition 3.2: Global variables are never parameters or returns.

Proposition 3.3: Assignments to global variables should never be eliminated.

Using these extra constraints causes the definitions for g and edi of Figure 3.14 to remain, although for different reasons. The program decompiles correctly, as shown in

Figure 3.15.

The decompiled program does not retain edi as a global register. It could be argued

that a decompiler should recognise that such variables are global variables. The only

clue at the machine code level that a register is global would be if it is not defined

before being passed to every procedure that takes it as a parameter, perhaps initialised

int g; /* Global memory */


int comb(int n, int r, int* edi) {
...
int c = n-r;
if (c == 0) {
g = 7; *edi = 8; /* Possibly modify globals */
} ...
}
int main() {
g=2; /* Initialise the global variable */
int edi=3; /* edi is a local variable now */
...comb(n, r, &edi); /* Pass edi by reference */
printf("g = %d, edi = %d\n", g, edi);
}
Figure 3.15: Solution to the problem of Figure 3.14.

in a compiler generated function not reachable from main. In the above example, this

does not apply since the parameter is defined before the only call to comb.

Decompilation can sometimes expose errors in the original program. For example, a

program could use an argument passed to a procedure call after the call, assuming that

the argument is unchanged by the call. This could arise through a compiler error, or

a portion of the program written in assembly language. The decompiled output will

make the parameter a reference parameter, or add an extra return, which is likely to

make the error more visible than in the original source code. In a sense, there is no such

thing as an error in the original program as far as a decompiler is concerned; its job is

to reproduce the original program, including all its faults, in a high level language.

3.4.4 Call Summary Equations


The calculation of call-related data flow elements such as parameters, defines, and results can be concisely expressed in a series of data flow equations.

Proposition 3.2 states that global memory variables are not suitable as parameters and

returns. This proposition motivates the definition of filters for parameters and returns:

param-filter = {non-memory expressions + local-variables} (3.4)

ret-filter = {non-memory expressions} (3.5)

These filters can be thought of as sets of expressions representing locations, which can

be intersected with other sets of locations to remove unsuitable locations. Locations



that satisfy the filter criteria (i.e. are not filtered out) are referred to as suitable.

Local variables are not filtered from potential parameters for the following reason.

Some parameters are passed on the stack; architectures that pass several parameters

in registers pass subsequent parameters, if any, on the stack. Stack locations used

for parameters are equivalent to local variables in the caller. They are both located

at offsets (usually negative) from the stack pointer value on entry to the procedure

(esp0 in the running example). A machine code decompiler will typically treat local

variables and stack locations used to pass procedure arguments the same, and eliminate

assignments to the latter as dead code. For simplicity, no distinction will be made

between the two types of variable here.

The main data flow sets of interest concerning procedures are as follows.

• Potential parameters of a procedure p are live variables at the procedure entry.

They are potential parameters, because preserved locations often appear to be parameters when they are not. Such locations will be removed from the parameters

with a later analysis, described in Section 4.3.

init-params(p) = live-on-entry(p) ∩ param-filter (3.6)

• Initial argument locations of a call c are the potential parameters of the callee,

translated to the caller context as discussed in Section 3.4.2:

argument-locations(c) = params(callee) (3.7)

Here params will be either init-params, or final-params if available. Note that

while an argument location may be a register or memory location, the actual

argument may be any expression such as a constant or a sum of two locations.

Arguments are best represented as a kind of assignment, with a left hand side

(the argument location) and a right hand side (the actual argument expression).

The argument location has to be retained for translating to and from the callee

context, as mentioned earlier. By contrast, parameters are represented by only

one location, the register or memory location that they are passed in.

• Modifieds of a procedure are the definitions reaching the exit, except for preserved locations. This exception is explained in detail in Section 4.3. Modifieds are a superset of returns: not all defined locations will be used before definition in any caller.

modifieds(p) = reach-exit(p) − preserveds(p) (3.8)



• Defines of a call are the definitions reaching the exit of the callee:

defines(c) = modifieds(callee) (3.9)

• Results of a call are the suitable defines that are also live at the call:

results(c) = defines(c) ∩ ret-filter ∩ live(c) (3.10)

• Returns of a procedure are the suitable locations modified by the procedure, which are live at some call or calls to that procedure:

return-locations(p) = modifieds(p) ∩ ret-filter ∩ ∪_{c calls p} live(c) (3.11)

Returns, like arguments, are best represented as assignments, with a left hand side

containing the return location, and the right hand side being the actual return

expression. Results, like parameters, are represented by only one expression, the

location that they are passed in.

In all but the first case, context translation from the callee to the caller or vice versa is

required.

Potential return locations are based on reaching definitions at the exit of the procedure. It might be thought that available definitions (statements whose left hand side is defined on all paths) should be used instead of reaching definitions, since it seems reasonable that a return value should be defined along all paths in a procedure. However, as shown in the example of Figures 3.14 and 3.15, suitable locations that are defined along only some paths become returns and parameters, or equivalently reference parameters.

The above equations assume that a callee procedure can be found for each call. In some

cases, this is not true, at least temporarily. For example, an indirect call instruction may

not yet be resolved, or recursion may prevent knowledge of every procedure's callees.

In these cases, the conservative approximation is made that the callee uses and defines all locations. For such childless calls, denoted as cc:

• Argument locations are all definitions reaching the call:

argument-locations(cc) = reach(cc) (3.12)

• Defines and results are all locations live at the call:

defines(cc) = results(cc) = live-at-call(cc) (3.13)

where live-at-call(cc) is the set of locations live at the childless call statement cc.
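The equations transcribe almost directly into code. A minimal sketch, assuming locations are drawn from a universe small enough that a set of locations fits in a machine-word bitset; all of the names are invented for the illustration.

#include <stdint.h>

typedef uint64_t LocSet;                 /* one bit per location */

LocSet paramFilter, retFilter;           /* Equations 3.4 and 3.5 */

LocSet initParams(LocSet liveOnEntry) {                  /* Equation 3.6 */
    return liveOnEntry & paramFilter;
}

LocSet modifieds(LocSet reachExit, LocSet preserveds) {  /* Equation 3.8 */
    return reachExit & ~preserveds;
}

LocSet results(LocSet defines, LocSet liveAtCall) {      /* Equation 3.10 */
    return defines & retFilter & liveAtCall;
}

LocSet returnLocations(LocSet mods,                      /* Equation 3.11 */
                       const LocSet *liveAtCalls, int nCalls) {
    LocSet liveUnion = 0;
    for (int i = 0; i < nCalls; i++)
        liveUnion |= liveAtCalls[i];     /* union over all calls to p */
    return mods & retFilter & liveUnion;
}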

3.4.5 Stack Pointer as Parameter


The stack pointer, and occasionally other special pointers, can appear to be a parameter

to every procedure, and is handled as a special case.

The stack pointer register is used before it is defined in all but the most trivial of procedures. For all other locations, use before definition of the location (after dead

code is eliminated) implies that the location is a parameter of the procedure. In the

decompiled output, the stack pointer must not appear as a parameter, since high level

languages do not mention the stack pointer. It is constructive to consider why the stack

pointer is an exception to the general parameter rule, and whether there may be other

exceptions.

The value of the stack pointer on entry to a procedure affects only one aspect of the program: which set of memory addresses are used for the stack frame (and the stack frames for all callees). By design, stack memory is not initialised, and after the procedure exits, the values left on the stack are never used again. The stack is therefore a variable length array of temporary values. The value of the stack pointer at any instant is not important; only the relative addresses (offsets) within the stack array are important, to keep the various temporary values separated from one another, preventing unintended overwriting.

The program will run correctly with any initial stack pointer value, as long as it points

to an appropriately sized block of available memory. In this sense, the program does

not depend on the value of the stack pointer. This is the essence of the exception to

the general rule.

Some programs reference initialised global memory through offsets from a register. One example of this is Motorola 68000 series programs, which use register A5 as the pointer to this data. Another example is x86 programs using the ELF binary file format. These programs use the ebx register to point to a data section known as the Global Offset Table (GOT). These special registers would also appear to be parameters of all procedures, yet the specific value of these registers is unimportant to the running of the program, and should not appear in the decompiled output. In these cases, however, it is probably better for the front end of the decompiler to assign an arbitrary value to these special registers. Doing this ensures that all accesses to scalar global variables are of the form m[K] where K is a constant. If this is done, the question of whether the special registers should be removed as parameters disappears: propagation and simplification will transform all the scalar memory addresses to the m[K] form.

It might be considered that giving the stack pointer an initial arbitrary value would

avoid the need for any exception. However, when a procedure is called from different

points in the program, it can be passed different values for that procedure's initial stack

pointer. Worse, in a program with recursion, the value of the stack pointer depends on

the dynamic behaviour of the program. Rather than causing the stack pointer to take

on simple constant values, such a scheme would, for many programs, generate extremely

complex expressions for each stack memory reference.

In summary, the stack pointer is the only likely exception to the general rule that a

location used before definition after dead code elimination in a procedure implies that

the location is a parameter.

3.5 Global Data Flow Analysis

Decompilers could treat the whole program as one large, global (whole-program) data

flow problem, but the problems with such an approach may outweigh the benefits.

Some analyses are approximate until global analyses can be performed over the IR for

all procedures. In the running example, the procedure comb modifies registers eax and edx. It is not possible to know whether these modified registers are returning a value until all callers are examined. Imagine that comb is used in a much larger program, and there are a hundred calls to it. No matter what order the callers are processed, it could be the case that the last caller processed will turn out to be the first one to use the result computed in (e.g.) register edx. Other analyses also require the IR for all procedures. For example, type analysis is global in the sense that a change to the type of a parameter or return can affect callers or callees respectively.

Since the IR for all procedures has to be available for a complete decompilation, one

approach would be to treat the whole program as a single whole-program data flow problem, with interprocedural edges from callers to callees, and from the exit of callees back to the basic block following the caller. The solution of data flow problems is usually iterative, so a global data flow analysis would automatically incorporate definitions and

uses from recursive callees.

As noted by Srivastava and Wall [SW93], a two phase algorithm is required, to prevent

impossible data flow (a kind of leakage of data flow information from one call to another). So-called context insensitive interprocedural data flow ignores this problem, and suffers a lack of precision as a result [YRL99]. Figure 3.16 shows a program with two procedures similar to main in the running example (main1 and main2) calling

comb, which is the same as in the running example. This program illustrates context

sensitivity.

Two callers are passing different arguments, say comb(6, 4) and comb(52, 5). The

Figure 3.16: Interprocedural data flow analysis. Adapted from [SW93].

second argument, passed in register ecx, is 4 for the first caller, and 5 for the second. Without the two stage approach, all interprocedural edges are active at once, and it is possible for data flow information to traverse from basic block b1 (which assigns 4 to ecx), through procedure comb, to basic block b4 (which prints, among other information, the value of ecx) in caller 2, which should see only the definition in b2 reaching that use (with the value 5). Similarly, data flow information from b2 can imprecisely flow to b3.

The solution to this lack of precision requires that the data flow be split into two phases. Before the start of the first phase, some of the interprocedural edges are disabled. Between the first and second phases, these edges are enabled and others are disabled. In [SW93], only backward-flow problems were considered, but the technique can be extended to forward-flow problems as well.

Factors not favouring global data flow analysis include:

• Resources: the interprocedural edges consume memory, and their activation and

deactivation consumes time. In addition, there are many more definitions for call parameters, since they are defined at every call. Every parameter will typically have from two to several or even hundreds of calls. Special locations such as the stack pointer will typically have thousands of definitions.



• There is no need for the two phase algorithm in the non-global analysis, with its

detail, and time and space costs.

• Since some programs to be decompiled require a large machine to execute them, and

decompilation requires of the order of 20 times as much memory as the original

executable, it would be useful to be able to split the problem of decompiling large

programs so that parts of it can be run in parallel with cluster or grid computing.

This is more difficult with the global approach.

• Procedures are inherently designed to be separate from their callers; the non-global approach seems more natural.

• The number of definitions for a parameter can become very large if a procedure

is called from many points in the program. So there can be problems scaling the

global approach to large programs.

Factors that do not significantly favour one approach over the other include:

• Potential parameters are readily identified in the non-global approach as those that have no definition. However, with the global approach, parameters could be identified as locations that have some or all definitions in procedures other than the one they are used in.

• The saving and restoration of preserved locations become dead code when considering procedures in isolation. This effectively solves the problem of preserved locations becoming extra parameters and returns, although recursion causes extra problems (Sections 4.3 and 4.4). With the global approach, definitions only used outside the current procedure (returns are used by the return statement in the current procedure) could be eliminated as dead code.

Factors favouring the global approach include:

• Simplicity: the data flow problems are largely solved by one, albeit large, analysis.

• There is no need to summarise the effects of calls, and any imprecision that may

be associated with the summaries.

• There are problems that result from recursion, which do not arise with the global

approach. However, these are solvable for the non-global approach, as will be

shown in Section 4.4.

• Information has to be transmitted from the callee context to the caller context,

and vice versa, in the non-global approach. Section 3.4.2 showed that this is easily

performed.

3.6 Safe Approximation of Data Flow Information

Whether over- or under-estimation of data flow information is safe depends on the application; for decompilers, it is safe to overestimate both definitions and uses, with

special considerations for calls.

Some aspects of static program analysis are undecidable or difficult to analyse, e.g.

whether one branch of an if statement will ever be taken, or whether one expression

aliases with another. As a result, there will always be a certain level of imprecision

in any data flow information. Where no information is available about a callee, the lack of precision of data flow information near the associated call instructions could

temporarily be extensive, e.g. every location could be assumed to have been assigned

to by the call.

Since approximation is unavoidable, it is important to consider safety, in the sense

of ensuring that approximations lead at worst to lower readability, and not to incorrect output (programs whose external behaviour differs from that of the input binary

program).

Less readable output is obviously to be preferred over incorrect output. In many cases,

if the analyses are sophisticated enough, low quality output will be avoided, as temporarily prevented transformations could later be found to be safe, excess parameters can be removed by a later analysis, and so on. Where information is incomplete, approximations are necessary to prevent errors. Errors usually cannot be corrected, but approximations can be refined.

The main question to be answered in this section is whether an over- or under-estimate of

elementary data flow operations (e.g. definitions, parameters, etc.) is safe. For example, definitions over which there is some uncertainty could be counted as actual definitions (over-estimating definitions), or not (under-estimating definitions). Definitions could also be divided into three categories: must-define, does-not-define, and may-define.

Using the three categories is more precise, but consumes more memory and complicates

the analyses.

An expression can only be safely propagated from a source statement to a destination statement if there are no definitions to locations that are components of the source statement (Section 3.1). If definitions are underestimated (the algorithm is not aware of some definitions), then a propagation could be performed which does not satisfy the second condition. Hence, underestimating definitions is unsafe for propagation. By contrast, overestimating definitions could prevent some propagations from occurring, or at least delay them until later changes are made to the IR of the program. This will potentially cause lower quality decompilations.



Definitions that reach the exit of a procedure are potential return locations. Underestimating definitions would cause some such definitions to not become returns, resulting in some semantics of the original program that are not contained in the decompiled output. Overestimation of definitions may cause extra returns, which is safe.

Locations that are live (used before being defined) are generally parameters of a procedure (an exception will be discussed later). Underestimating uses would result in too few parameters for procedures, hence underestimating uses is unsafe. Overestimating uses may result in overestimating livenesses, which may result in unnecessary parameters in the generated output; this is less readable but safe.

Dead code elimination depends on an accurate list of uses of a definition. If uses are overestimated, then some statements that could be deleted would be retained, resulting in lower quality but correct decompilations. If uses are underestimated, definitions could

be eliminated when they are used, resulting in incorrect output. Hence, underestimation

of uses is also unsafe for dead code removal.

Overall, an overestimate of definitions and uses is safe, and an underestimation of either is unsafe. However, overestimation of one quantity can lead to underestimation of another. In particular, definitions kill liveness (the state of being live), leading to potentially underestimated uses. Figure 3.17(a) shows a definition and use with a call between them; the call has an imprecise analysis which finds that x may be defined by the call when in reality it is not; it is approximated to be definitely defined (its definitions are overestimated). The arrows represent the liveness of x flowing from the use to the definition. This situation arises commonly when it is not known whether a location is preserved or not, either because the preservation analysis has failed, or because the analysis cannot be performed yet (e.g. as a result of recursion).

[Diagram: two panels. (a) Overestimating definitions only: between a definition of x and a real use of x lies the call x := func(); x may be used by the call, so its definitions are overestimated, and x appears to be unused above the call (underestimating the uses of x). (b) Overestimating definitions and uses: the call becomes x := func(x), so x is presumed both defined and used, and x is live from the use back to the definition.]

Figure 3.17: Transmission of liveness from use to definition through a call.

With the data flow as shown in Figure 3.17(a), the definition of x appears to be unused, hence it appears to be safe to eliminate the definition as dead code. In reality, x is used and the removal would be incorrect. The solution to this problem is as follows: inside

calls, for every non-global location whose definition is overestimated (such as x in the figure), its uses must also be overestimated. In Figure 3.17(b), x is an argument of the call as well as a return, hence the original definition of x is still used, and it will not be eliminated.

This is similar to the problem of Figure 3.14; in that example, the location was known to be defined but only along some paths. If x is a global variable, the above would not apply, and the uses of x would in fact be underestimated above the call. However, by Proposition 3.3, definitions of globals are never eliminated, hence the underestimation is safe. Global variables must not be propagated past calls to procedures that may define them, but this is guaranteed by the fact that such globals appear in the modifieds list for the call. However, these modifieds never become returns because of the return filter of Equation 3.5.

Parameters act as definitions when there is no explicit definition before a use in a procedure. As shown in Figure 3.18, a similar situation exists with procedure parameters, and the solution is the same: overestimate the uses as well as the definitions in the call by making x an argument as well as a return of the call to func.

[Diagram: two panels. (a) Overestimating definitions only: in proc outer(), the call x := func() precedes a real use of x; x may be used by the call, so its definitions are overestimated, and x appears to be dead above the call (underestimating the liveness of x). (b) Overestimating definitions and uses: in proc outer(x), the call becomes x := func(x), so x is presumed both defined and used, and x is live up to the parameter.]

Figure 3.18: Transmission of liveness from use to parameter through a call.

3.7 Overlapped Registers

Overlapped registers are difficult to handle effectively in the intermediate representation, but representing explicit side effects produces a correct program, and dead code elimination makes the result readable.

CISC architectures (Complex Instruction Set Computers) often provide the ability to

operate on operands of various sizes, e.g. 8-/16-/32- and even 64-bit values. Figure

3.19 shows the x86 register eax, whose lower half can be accessed as the register ax,
and both halves of ax can be accessed as al (low half) and ah (high half). In the

x64 architecture, all these registers overlap with the 64-bit rax register. Three other

registers (ebx, ecx, and edx) have similar overlaps, while other registers have overlaps

only between the 32-bit register and its lower half (e.g. ebp and bp).
[Diagram: bit layout of the overlapped registers. The 32-bit register eax (bits 31-0) holds 0x12345678; its low 16 bits form ax (bits 15-0, 0x5678); ax in turn splits into ah (bits 15-8, 0x56) and al (bits 7-0, 0x78).]

Figure 3.19: Overlapped registers in the x86 architecture.

The Motorola 68000 family of processors has a similar situation, although only the

lower parts (lower 16 bits, or lower 8 bits) can be accessed. In the 6809 and 68HC11

series, both halves of the D (double byte) register can be accessed as the A (upper) and

B (lower) accumulators.

Some RISC architectures have a similar problem. For example, SPARC double precision

registers use pairs of single precision registers, and typical 32-bit SPARC programs use

pairs of 32-bit loads and stores when loading or storing a 64-bit floating point register. However, this problem is different, in that the smaller register is the machine word size, and bit manipulation such as shifting and masking cannot be performed on floating

point registers.

Since there are distinct names for the overlapped registers at the assembly language level, it seems natural to represent these registers with separate names in the intermediate representation (IR). However, this leads to a kind of aliasing at the register level; treating the registers as totally independent leads to errors in the program semantics.

The alternative of representing all subregisters with the one register in the IR also has problems. For example, al and ah may be used to implement two distinct 8-bit variables in the original source program. When accessing the variable assigned to ah, awkward syntax will result, e.g. local3 >> 8 or local3 & 0xFF. Even worse, the data flow for the two variables will become conflated, so that assignment to one variable will appear to kill the other.

Another possibility is to declare registers in the decompiled output using overlapped

source code constructs such as C unions or Fortran common statements. However, such

output is quite unnatural, and represents low level information that does not belong

in the decompiled output. Such a scheme is more suitable for a binary translator that

uses a compiler as its back end (e.g. [CVE00]).



The best solution seems to be to treat the overlapped registers as separate, but to make the side effects explicit. This was first suggested for the Boomerang decompiler by Trent Waddington [Boo02]. Most such side effects can be added at the instruction decoding level, which is source machine dependent already. (An exception is results from call statements, which are not explicit at the assembly language level, and which may be added later in the analysis.) After any assignment to an overlapped register, the decoder can emit assignments representing the side effects. Considering the example register in Figure 3.19, there are four cases to consider:
register in Figure 3.19, there are four cases to consider:

• After an assignment to eax:

 ax := truncate(eax)
 al := truncate(eax)
 ah := eax@[8:15] /* Select bits 8-15 from eax */

• After an assignment to ax:

 eax := (eax & 0xFFFF0000) | ax


 al := truncate(ax)
 ah := ax@[8:15]

• After an assignment to al:

 eax := (eax & 0xFFFFFF00) | al


 ax := (ax & 0xFF00) | al

• After an assignment to ah:

 eax := (eax & 0xFFFF00FF) | ah << 8


 ax := (ax & 0x00FF) | ah << 8
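The four cases transcribe directly into C, which makes the intended semantics easy to check; the structure and function names below are invented for the sketch.

#include <stdint.h>

struct EaxViews { uint32_t eax; uint16_t ax; uint8_t al, ah; };

static void setEax(struct EaxViews *r, uint32_t v) {
    r->eax = v;
    r->ax  = (uint16_t)v;                /* ax := truncate(eax) */
    r->al  = (uint8_t)v;                 /* al := truncate(eax) */
    r->ah  = (uint8_t)(v >> 8);          /* ah := eax@[8:15]    */
}

static void setAx(struct EaxViews *r, uint16_t v) {
    r->ax  = v;
    r->eax = (r->eax & 0xFFFF0000u) | v;
    r->al  = (uint8_t)v;
    r->ah  = (uint8_t)(v >> 8);
}

static void setAl(struct EaxViews *r, uint8_t v) {
    r->al  = v;
    r->eax = (r->eax & 0xFFFFFF00u) | v;
    r->ax  = (uint16_t)((r->ax & 0xFF00u) | v);
}

static void setAh(struct EaxViews *r, uint8_t v) {
    r->ah  = v;
    r->eax = (r->eax & 0xFFFF00FFu) | ((uint32_t)v << 8);
    r->ax  = (uint16_t)((r->ax & 0x00FFu) | ((uint16_t)v << 8));
}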

Many of these have the potential of leading to complex generated source code, but

dead code elimination will remove almost all such code. For example, where al and

ah are used as independent variables, the larger registers ax and eax will not be used,

so dead code elimination will simply eliminate the complex assignments to ax and

eax. When operands are constants, the simplification process reduces something like 0x12345678@[8:15] to 0x56 or plain decimal 86. Where the complex manipulations are not eliminated, the original source program must in fact have been performing complex manipulations, so the complexity is justified.



3.7.1 Sub-fields

Sub-fields present similar problems to that of overlapped registers.

Some high level languages, such as C, allow the packing of several structure fields into one machine word. Despite the fact that the names of such fields are related, in data flow terms, the variables are completely independent. As mentioned above, current data flow analyses will not treat them as independent, leading to less than ideal generated code. There is no reason that compilers for languages such as Pascal which support range variables (e.g. 0..255 or 100..107) could not pack two or more such variables into a machine word. Such variables will also not be handled properly by a standard data flow analysis.

The solution to this problem is left for future work.

The PowerPC architecture has a 32-bit register containing 8 fields of four condition codes (8 condition code registers, named cr0 - cr7). Instructions using the condition code register specify which condition code set they will refer to. This poses a similar problem, except that the way that the condition code register is represented in a decompiler is a design decision. In this case, it would appear to be better to implement the 32-bit condition code register as eight separate registers.

3.8 Related Work

The related work confirms that the combination of expression propagation and dead code

elimination is a key technique in decompilation.

Johnstone and Scott describe a DSP assembly language to C reverse compiler called asm21toc [JS04]. They find that the combination of propagation and dead code elimination is also useful for their somewhat unusual application (DSP processors have some unusual features compared to ordinary processors). They do not appear to use any special IR; they parse the assembly language to produce low-level C, perform data flow analyses using standard techniques (presumably similar to those of [ASU86]), and use reductions to structure some high level constructs. Their output typically requires editing before it can be compiled. They found that some 85% of status (flags) assignments could be eliminated, and typically 40% of other assignments.


Chapter 4

SSA Form

Static Single Assignment form assists with most data flow components of decompilers,

including such fundamental tasks as expression propagation, identifying parameters and

return values, deciding if locations are preserved, and eliminating dead code.

Chapter 3 listed many applications for data flow analysis, several of which were somewhat difficult to apply. This chapter introduces the Static Single Assignment form (SSA form), which dramatically reduces these difficulties. The SSA form also forms the basis for type analysis in Chapter 5 and the analysis of indirect jumps and calls in Chapter 6.

The SSA form is introduced in Section 4.1, showing that the implementation of many of

the data flow operations in a machine code decompiler is simpler. Some problems that

arise with the propagation of memory expressions are discussed in Section 4.2, while

Section 4.3 introduces the important concept of preserved locations. The analysis of

these is complicated by recursion, hence Section 4.4 discusses the problems caused by

recursion, and presents solutions. At various points in the process of decompilation, it is

convenient to store a snapshot of definitions or uses, and these are provided by collectors, discussed in Section 4.5. Sections 4.6 and 4.7 discuss related work and representations,

while Section 4.8 summarises the contributions of this chapter.

4.1 Applying SSA to Registers

SSA form vastly simplifies expression propagation, provides economical data flow information, and is strong enough to solve problems that most other analysis techniques

cannot solve.

The Static Single Assignment (SSA) form is a form of intermediate representation

which maintains the property that each variable or location is defined only once in


the program. Maintaining the program in this form has several advantages: analyses

are simpler in this form, particularly expression propagation. For most programs, the

size of the SSA representation is roughly linear in the size of the program, whereas

traditional data flow information tends to be quadratic in memory consumption.

In order to make each definition of a location unique, the variables are renamed. For example, three definitions of register edx could be renamed to edx1, edx2, and edx3.
The subscripts are often assigned sequentially as shown, but any renaming scheme

that makes the names unique would work. In particular, the statement number or

memory address of the statement could be used. SSA form assumes that all locations

are initialised at the start of the program (or program fragment or procedure if analysing

in isolation). The initial values of locations are usually given the subscript 0, e.g. edx0
for the initial value of edx.
Since each use in a program has only one denition, a single pointer can be used from

each use to the denition, eectively providing reaching denitions information (the one

denition that reaches the use is available by following the pointer). Much of the time,

the original denition does not have to be literally renamed; it can be assumed that

each denition is renamed with the address of the statement as the subscript. (For

convenience, display of the location could use a statement number as the subscript).

Sassa et al. report that renaming the variables accounts for 60-70% of the total time
+
to translate into SSA form [SNK 03]. Implicit assignments are required for parameters

and other locations that have uses but no explicit assignments; these are comparatively

rare. This representation is used in the Boomerang decompiler [Boo02].
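As a minimal sketch of this idea (hypothetical names; Boomerang's actual classes differ in detail), a subscripted use can be modelled as a small wrapper holding a pointer to its defining statement, with a null pointer standing for the implicit definition with subscript 0:

    #include <string>

    struct Statement;                  // a definition site in the IR

    // A use in SSA form: the "subscript" is simply a pointer to the unique
    // defining statement. nullptr encodes subscript 0, i.e. the implicit
    // definition at the start of the procedure (a potential parameter).
    struct SubscriptedUse {
        std::string base;              // the location, e.g. "edx"
        Statement*  def = nullptr;     // the one reaching definition

        bool isInitialValue() const { return def == nullptr; }
    };

With this scheme, renaming a definition never requires touching its uses, and the reaching definition of any use is available in constant time by following the pointer.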

The algorithms for transforming ordinary code into and out of SSA form are somewhat complex, but well known and efficient [BCHS98, App02, Mor98, Wol96]. Figure 4.1 shows the main loop of the running example and its equivalent in SSA form. Statements irrelevant to the example have been removed for clarity (e.g. assignments to unused temporaries, unused flag macros, and unused assignments to the stack pointer). Note how the special variable esp0 introduced earlier is now automatically replaced by the initial value of the stack pointer, esp0, and there is no need to explicitly add an assignment from esp0 to esp. The initial value for other locations is also handled automatically, e.g. st0.


Assignments with φ-functions are inserted at the beginning of basic blocks where control flow merges, i.e. where a basic block has more than one in-edge. A φ-function is needed for location a only if more than one definition of a reaches the start of the basic block. The top of the loop in Figure 4.1 is an example; locations m[esp0-28], st, edx, esi, and ebx are all modified in the loop, and also have definitions prior to the loop, but esp0 itself requires no φ-function.

loop1:
80483d8 46 m[esp] := edx
80483d9 47 st := st *f (double)m[esp]
80483dc 49 edx := edx - 1 // Decrement numerator
80483dd 51 m[esp] := esi
80483e0 52 st := st /f (double)m[ebp-24] // m[ebp-24] = m[esp]
80483e6 57 esi := esi - 1 // Decrement denominator
80483e7 60 ebx := ebx - 1 // Decrement counter
80483e8 62 goto loop1 if ebx >= 0
80483f5 65 m[esp] := (int)st
80483fb 67 edx := m[esp]
(a) Original program. *f is the floating point multiply operator, and /f denotes floating point division.

loop1:
m[esp0 -28]1 := φ(m[esp0 -28]0 , m[esp0 -28]3 )
st1 := φ(st0 , st3 )
edx1 := φ(edx0 , edx2 )
esi1 := φ(esi0 , esi2 )
ebx1 := φ(ebx0 , ebx2 )
46 m[esp0 -28]2 := edx1
47 st2 := st1 *f (double)m[esp0 -28]2
49 edx2 := edx1 - 1 ; Decrement numerator
51 m[esp0 -28]3 := esi1
52 st3 := st2 /f (double)m[esp0 -28]3
57 esi2 := esi1 - 1 ; Decrement denominator
60 ebx2 := ebx1 - 1 ; Decrement counter
62 goto loop1 if ebx2 >= 0
65 m[esp0 -20]1 := (int)st3
67 edx3 := m[esp0 -20]1
(b) SSA form.

Figure 4.1: The main loop of the running example and its equivalent SSA form.

The assignment st1 := φ(st0, st3) means that the value of st1, treated as a separate variable from st0 and st3, takes the value st0 if control flow enters via the first basic block in-edge (falling through from before the loop), and the value st3 if control flow enters through the second in-edge (from the branch at the end of the loop). In general, a φ-function at the top of a basic block with n in-edges will have n parameters.

A relation called the dominance frontier efficiently determines where φ-functions are necessary: for every node n defining location a, every basic block in the dominance frontier of n requires a φ-function for a. The dominance frontier for a program can be calculated in practically linear time. If required, φ-functions can be implemented (made executable) by inserting copy statements. For example, st1 := φ(st0, st3) could be implemented by inserting st1 := st0 just before the loop (i.e. at the end of the basic block which is the first predecessor to the current basic block), and inserting st1 := st3 at the end of the loop (which is the end of the basic block which is the second predecessor to the current basic block). In many instances the copy statements will not be necessary, as will be shown in a later section.
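As a sketch of how the dominance frontier is used, the following is the classic worklist form of φ placement, assuming the frontiers and the definition sites of each location have already been computed (illustrative names only; this is the unpruned version, so it may place φ-functions that the liveness condition above would show to be unnecessary):

    #include <map>
    #include <set>
    #include <string>
    #include <vector>

    using Block = int;                 // basic block identifier
    using Location = std::string;      // e.g. "edx" or "m[esp0-28]"

    // For each location, mark the blocks that need a phi-function for it.
    void placePhis(const std::map<Block, std::set<Block>>& domFrontier,
                   const std::map<Location, std::set<Block>>& defSites,
                   std::map<Block, std::set<Location>>& phisNeeded) {
        for (const auto& [loc, sites] : defSites) {
            std::vector<Block> work(sites.begin(), sites.end());
            std::set<Block> hasPhi;    // blocks already given a phi for loc
            while (!work.empty()) {
                Block n = work.back();
                work.pop_back();
                auto df = domFrontier.find(n);
                if (df == domFrontier.end()) continue;
                for (Block d : df->second) {
                    if (hasPhi.insert(d).second) {
                        phisNeeded[d].insert(loc);
                        work.push_back(d);  // a phi is itself a definition
                    }
                }
            }
        }
    }

Each inserted φ-function is itself a new definition of the location, which is why the frontier block goes back onto the worklist.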

The main property of a program in SSA form is that each use has effectively only one definition. If in the pre-SSA program a use had multiple definitions, the definition will be a φ-function that has as operands the original definitions, or other φ-functions leading to them. A secondary property is that definitions dominate all uses. That is, any path that reaches a use of a location is guaranteed to pass through the single definition for that location. For example st is defined at statement 52 in Figure 4.1(a), after a use in statement 47. However, in the equivalent program in SSA form in Figure 4.1(b), there is a φ-function at the top of the loop which dominates the use in statement 47. One of the operands of that φ-function references the definition in statement 52 (i.e. st3).

4.1.1 Benefits

The SSA form makes propagation very easy; initial parameters are readily identified, preservations are facilitated, and SSA's implicit use-definition information requires no maintenance.

Both requirements for expression propagation (Section 3.1 on page 65) are automatically fulfilled by programs in SSA form. The first condition (that s must be the only definition of x to reach u) is satisfied because definitions are unique, and the second condition (that there be no definitions of locations used by s) is satisfied because definitions dominate uses. For example, consider propagating statement

s: st2 := st1 *f (double)m[esp0-28]2 into
u: st3 := st2 /f (double)m[esp0-28]3.

Since the program is in SSA form, any path from the entry point to statement s must pass through the unique definitions for st1, esp0, and m[esp0-28], the three subscripted locations on the right hand side. Because all definitions are unique, any path from s to u cannot contain another definition of st1, esp0, or m[esp0-28].


If the left hand side of the statement being propagated is a memory expression or an

array expression, there is a similar requirement for subscripted locations in the address

or index expressions on the left hand side. For example, if propagating

s2 : m[esp0 -28]3 := esi1 into

u: st3 := st2 /f (double)m[esp0 -28]3 ,


4.1 Applying SSA to Registers 101

there can be no assignment to esp0 or esi1 between s2 and u, and s2 must be the only

denition of m[esp0 -28] to reach u. These are again automatically satised, except that

the SSA form has no special way of knowing whether expressions other than precisely

m[esp0 -28] alias to the same memory location. For example, if m[ebp1 -4] aliases with
m[esp0 -28] and there is an assignment to m[ebp1 -4], the assumption of denitions
being unique is broken. This issue will be discussed in greater detail later.

After statement 46 is propagated into statement 47, the IR is as shown in Figure 4.2. Note that the propagation from statement 47 to statement 52 is one that would not normally be possible, since there is a definition of edx between them. In SSA terms, however, the definition is of edx2, whereas edx1, treated as a different variable, is used by statement 47. The cost of this flexibility is that when the program is translated back out of SSA form, an extra variable may be needed to store the old value of edx (edx1).

46 m[esp0 -28]2 := edx1


47 st2 := st1 *f (double)edx1
49 edx2 := edx1 - 1 ; Decrement numerator
51 ...
52 st3 := st2 /f (double)m[esp0 -28]3
Figure 4.2: A propagation not normally possible is enabled by the SSA form.

As a result, once a program or procedure is in SSA form, most propagations can be performed very easily. Most uses can be replaced by the right hand side of the expression indicated by the subscript. In practice, some statements are not propagated, e.g. φ-functions and call statements. φ-functions must remain in a special position in the control flow graph. Call statements may have side effects, but could be propagated under some circumstances. If the resultant statement after a propagation still has subscripted components, and those definitions are suitable for propagation, the process can be repeated. For example, the result of the substitution from s to u is

st3 := (st1 *f (double)m[esp0-28]2) /f (double)m[esp0-28]3,

which can have the two memory expressions propagated, resulting in

st3 := (st1 *f (double)edx1) /f (double)esi1.

Note that this propagation has removed the machine code detail that the floating point instructions used require the operands to be pushed to the stack; these instructions can not reference integer registers such as edx directly. All three uses in this expression refer to φ-functions, so no further propagation is possible.

Since expression propagation is simple in SSA form, analyses relying on it (e.g. condition code elimination) benefit indirectly.
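In code, the substitution step is little more than following the use's definition pointer; a toy sketch (hypothetical expression type, ignoring the memory and alias issues discussed in Section 4.2) might be:

    #include <memory>
    #include <vector>

    struct Stmt;
    struct Exp {
        const Stmt* def = nullptr;                // for a use: its definition
        std::vector<std::shared_ptr<Exp>> args;   // subexpressions
    };
    struct Stmt {
        bool isPhi = false, isCall = false;
        std::shared_ptr<Exp> rhs;                 // right hand side
    };

    // Replace a use by the RHS of its unique definition. In SSA form both
    // conditions of Section 3.1 hold automatically, so only phi-functions
    // and calls need to be excluded.
    std::shared_ptr<Exp> propagate(const std::shared_ptr<Exp>& use) {
        const Stmt* d = use->def;
        if (d == nullptr || d->isPhi || d->isCall)
            return use;            // parameter, phi or call: leave it alone
        return d->rhs;             // substitute the defining expression
    }

Repeating this until every remaining use is defined by a φ-function or a call yields the exhaustively propagated form used in the examples above.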


While SSA form inherently provides only use-definition information (the one definition for this use, and other definitions via the φ-functions), definition-use information (all uses for a given definition) can be calculated from this in a single pass. The IR can also be set up to record definition-use information as well as or instead of use-definition information upon translation into SSA form (so the arcs of the SSA graph point from definition to uses). Definition-use information facilitates transformations such as dead code elimination and translating out of SSA form.
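The single pass just mentioned simply inverts the use-definition pointers; a sketch with placeholder types:

    #include <map>
    #include <set>
    #include <vector>

    struct Stmt;
    // user statement -> the definitions of the locations it uses
    using UseDef = std::multimap<const Stmt*, const Stmt*>;

    // Build definition-use chains by scanning each statement once and
    // recording it against every definition it references.
    std::map<const Stmt*, std::set<const Stmt*>>
    buildDefUse(const std::vector<const Stmt*>& program, const UseDef& useDef) {
        std::map<const Stmt*, std::set<const Stmt*>> defUse;
        for (const Stmt* s : program) {
            auto range = useDef.equal_range(s);
            for (auto it = range.first; it != range.second; ++it)
                defUse[it->second].insert(s);  // s is a use of this definition
        }
        return defUse;
    }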

The algorithm for transforming programs into SSA form assumes that every variable is implicitly defined at the start of the program, at an imaginary statement zero. This means that when a procedure is transformed into SSA form, all its potential parameters (locations used before being defined) appear with a subscript of zero, and are therefore easily identified.

The implicit use-definition information provided by the SSA form requires no maintenance as statements are modified and moved. There is never an extra definition created for any use, and a definition should never be deleted while there is any use of that location. By contrast, definition-use information changes with propagation (some uses are added, others are removed), and when dead code is eliminated (uses are removed).

Deciding whether a location is preserved (saved and restored, or never modified) by a procedure is also facilitated by the way that the SSA form separates different versions of the same location. Firstly, it is easy to reason with equations such as esi99 = esi0, rather than having to distinguish the definition of esi that reaches the exit, the original value of esi, and all its intermediate definitions. The equation can be simplified to true or false by manipulation of the equation components. Secondly, starting with the definition reaching the exit, preceding definitions can be accessed directly without searching all statements in between.

With the aid of some simple identities, SSA form is able to convert some sequences of

complex code into a form which greatly assists its understanding. One example is the

triple xor idiom, shown in Section 4.3.

Finally, the SSA form is essential to the operation and use of collectors, discussed in

Section 4.5.

From the above, it is clear that the SSA form is very useful for decompilers.

4.1.2 Translating out of SSA form


The conversion from SSA form requires the insertion of copy statements; many factors affect how many copies are needed.


The Static Single Assignment form is not directly suitable for converting to high level source code, since φ-functions are not directly executable. φ-functions could effectively be executed by inserting one copy statement for each operand of the φ-function. Every definition of every original variable would then effectively be the definition of a separate variable. This would clutter the decompiled output with variable declarations and copy statements, making it very difficult to read. Hence, the intermediate representation has to be transformed out of SSA form in a different way.

Several mapping policies from SSA variables to generated code variables are possible. For example, the original names could be ignored, and new variables could be allocated as needed. However, this discards some information that was present in the input binary - the allocation of original program variables to stack locations and to registers. While optimising compilers do merge several variables into the one machine location, and sometimes allocate different registers to the same variable at different points in the program, they may not do this, especially if register pressure is low. Attempting to minimise the number of generated variables could lead to the sharing of more than one unrelated original variable in one output variable. While the output program would be more compact, it could become more confusing than other options.

The simplest mapping policy is to keep the original allocation of program variables to memory or register locations as far as possible. In most cases, the implementation of this policy is to simply remove the subscripts and φ-functions. However, there are two circumstances where this is not possible.

The first is where the binary program uses the location in such a way that different types will have to be declared in the output program. In the running example, register location edx is used to hold three original program variables: num, n, and i. All of these were declared to be the same type, int, but if one was of a different fundamental type (e.g. char*), then the location edx would have to be split into two output variables, each with a different type.

The second circumstance where subscripts cannot simply be removed is that there may be an overlap of live ranges between two or more versions of a location in the current (transformed) IR. When two names for the same original variable are live at the same point in the program, one of them needs to be copied to a new local variable. Figure 4.3 shows part of the main loop of the running example, after the propagations of Section 4.1.1. Note how the multiply operation, previously before the decrement of edx, now follows it, and the version of edx used (edx1) is the version before the decrement. The seven columns at the left of the diagram represent the live ranges of the SSA variables. None of the various versions of the variables overlap, except for edx1 and edx2, indicated by the shaded area. When transforming out of SSA form, a temporary variable is required to hold either edx1 or edx2.

loop1:
    edx1 := φ(edx0, edx2)
    st1 := φ(st0, st3)
    esi1 := φ(esi0, esi2)
    edx2 := edx1 - 1
    st3 := st1 *f (double)edx1 /f (double)esi1
    esi2 := esi1 - 1
    goto loop1 if ...
edx3 := (int)st3

Figure 4.3: Part of the main loop of the running example, after the propagation of the previous section and dead code elimination. (The original diagram also shows live range columns for edx2, edx1, edx0, esi2, esi1, st3 and st1 at the left, with the overlap of edx1 and edx2 shaded.)

Figure 4.4(a) shows the result of replacing edx1 with a new local variable, local1.


(a) Replacing edx1 with local1:
loop1:
    local1 := edx
    edx := edx - 1
    st := st *f (double)local1 /f (double)esi
    esi := esi - 1
    goto loop1 if ...
edx := (int) st

(b) Replacing edx2 with local2:
loop1:
    local2 := edx - 1
    st := st *f (double)edx /f (double)esi
    esi := esi - 1
    edx := local2
    goto loop1 if ...
edx := (int) st

Figure 4.4: The IR of Figure 4.3 after transformation out of SSA form.

It is also possible to replace edx2 with local2, as shown in Figure 4.4(b). In this case, the definition for edx1 became φ(edx0, local2), which is implemented by inserting the copy statement edx := local2 at the end of the loop.
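The underlying decision is a live-range overlap test between two versions of the same base location; a minimal sketch (hypothetical structures, with per-statement liveness assumed to be already computed) follows:

    #include <set>
    #include <string>

    struct SsaVersion {
        std::string base;       // e.g. "edx"
        int subscript;          // e.g. 1
        std::set<int> liveAt;   // statement indices where this version is live
    };

    // True when two versions of the same base location are live at the same
    // point, so one must be renamed to a fresh local in the output.
    bool needsFreshLocal(const SsaVersion& a, const SsaVersion& b) {
        if (a.base != b.base) return false;      // different variables anyway
        for (int p : a.liveAt)
            if (b.liveAt.count(p)) return true;  // overlapping live ranges
        return false;
    }

When the test succeeds, one of the two versions is renamed to a fresh local variable, as in Figure 4.4.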

Note that in this second case, it is now possible to propagate the assignment to local2 into its only use, thereby removing the need for a local variable at all. In addition, the combination of the SSA form and expression propagation has achieved code motion: the decrement of edx has moved past the multiply (as in the original source code) and past the divide (which it was not in the original source code). Unfortunately, this is not always possible, as shown in the next example. Figure 4.5 shows the code of Figure 4.3, with the loop condition now num > r instead of c >= 0, and unlimited propagation as before. Register ecx0 holds the value of parameter r.

Expression propagation has again been unhelpful, changing the loop condition from the original edx2 > r to edx1-1 > r. Now it is no longer possible to move the definition of edx2 to below the last use of edx1, since edx1 is used in the loop termination condition.

loop1:
    edx1 := φ(edx0, edx2)
    st1 := φ(st0, st3)
    esi1 := φ(esi0, esi2)
    edx2 := edx1 - 1
    st3 := st1 *f (double)edx1 /f (double)esi1
    esi2 := esi1 - 1
    goto loop1 if edx1-1 > ecx0
edx3 := (int)st3

Figure 4.5: The code of Figure 4.3, with the loop condition optimised to num>r. (The original diagram again shows the live range columns at the left.)

Figure 4.6 shows two possible transformations out of SSA form for the code of Figure 4.5.

(a) Replacing edx1 with local1:
loop1:
    local1 := edx
    edx := local1 - 1
    st := st *f (double)local1 /f (double)esi
    esi := esi - 1
    goto loop1 if local1-1 > ecx
edx := (int) st

(b) Replacing edx2 with local2, then of necessity edx1 with local1:
loop1:
    local2 := edx - 1
    st := st *f (double)edx /f (double)esi
    esi := esi - 1
    local1 := edx
    edx := local2
    goto loop1 if local1-1 > ecx
edx := (int) st

Figure 4.6: Two possible transformations out of SSA form for the code of Figure 4.5.

Note that the second case could again be improved if expression propagation is performed on local2. Unfortunately, at this point, the program is no longer in SSA form, so propagation is no longer as easy as it was. Such propagations within the transformation out of SSA form are only needed across short distances (within a basic block). Hence it may be practical to test the second condition for propagation by testing each statement between the source and destination. This condition (see Section 3.1) states that components of the expression being propagated should not be redefined on the path from the source to the destination.

In this case, not propagating into the loop expression would have avoided the problem. Techniques for minimising the number of extraneous variables and copy statements will be discussed in a following section. Optimising the decision of which of two versions of a variable to replace with a new variable, and effectively performing code motion, is equivalent to sequentialising data dependence graphs, and has been shown to be NP-complete [Upt03].

The above examples show that the number of copy statements and extra local variables is affected by many factors. In the worst case, it is possible for a φ-function with n operands to require n copy statements to implement it, if every operand has been assigned a different output variable. Section 4.1.3 will discuss how to minimise the copies.

4.1.2.1 Unused Definitions with Side Effects

Unused but not eliminated definitions with side effects can cause problems with the translation out of SSA form, but there is a simple solution.

Figure 4.7 shows a version of the main loop of the running example where an assignment to st2, even though unused, is not eliminated before transformation out of SSA form. This can occur with assignments to global variables, which should never be eliminated.

loop1:
st1 := φ(st0 , st3 )
edx1 := φ(edx0 , edx2 )
esi1 := φ(esi0 , esi2 )
st2 := st1 *f (double)edx1 ; Unused definition
edx2 := edx1 - 1 ; Decrement numerator
st3 := st1 *f (double)edx1 /f (double)esi1
esi2 := esi1 - 1 ; Decrement denominator
goto loop1 if ...
edx3 := (int) st3

Figure 4.7: A version of the running example where an unused definition has not been eliminated.

The definition of st2 no longer has any uses, so it has no live range, and hence does not interfere with the live ranges of other versions of st. As a result, considering only live ranges, translation to the incorrect code of Figure 4.8 is possible.

This program is clearly wrong; in terms of the original program variables, res *= num is being performed twice for each iteration of the loop, where the original program performs this step only once per iteration. The solution to this problem is to treat definitions of versions of variables as interfering with any existing liveness of the same variable, even if they cannot themselves contribute new livenesses. The result is to cause one or other assignment to st to assign to a new variable.



loop1:
st := st *f (double)edx ; Unused code with side effects
local1 := edx
edx := edx - 1
st := st *f (double)local1 /f (double)esi
esi := esi - 1
goto loop1 if ...
edx := (int) st

Figure 4.8: Incorrect results from translating the code of Figure 4.7 out of SSA
form, considering only the liveness of variables.
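A minimal sketch of the rule just described (hypothetical structures; the interference graph here is keyed by SSA version number, one graph per base variable):

    #include <map>
    #include <set>

    using Interference = std::map<int, std::set<int>>;  // version -> versions

    // The fix for unused definitions with side effects: a definition of
    // version v interferes with every version of the same base variable that
    // is live at the definition point, even though v itself has no uses and
    // therefore no live range of its own.
    void addDefInterference(Interference& graph, int v,
                            const std::set<int>& liveVersions) {
        for (int w : liveVersions) {
            if (w == v) continue;
            graph[v].insert(w);  // force v and w into distinct output variables
            graph[w].insert(v);
        }
    }

In the example of Figure 4.7, this makes st2 interfere with the live versions of st, so one of the assignments is renamed and the duplicated multiply cannot arise.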

4.1.2.2 SSA Back Translation Algorithms

Several algorithms for SSA back translation exist in the literature, but one due to Sreedhar et al. appears to be most suitable for decompilation.

Because of the usefulness of the SSA form, and the cost of transforming out of SSA form, decompilers benefit from keeping the intermediate representation in SSA form for as long as possible.

The process of translating out of SSA form is also known as SSA back translation in the literature. The original algorithm for translation out of SSA form by Cytron et al. [CFR+91] has been shown to have several critical problems, including the so-called lost copy problem. Solutions to the problem have been given by several authors; the salient papers are by Briggs et al. [BCHS98] and Sreedhar et al. [SDGS99]. Sassa et al. attempted to improve on these solutions, but concluded that the algorithm by Sreedhar et al. produces fewer copies except in rare cases [SKI04].

While the application of these algorithms to decompilation remains the subject of future work, it would appear that the algorithm by Sreedhar et al. offers the best opportunity for decompilation. Sreedhar uses both liveness information and the interference graph to choose where to insert copy statements to implement the semantics of φ-functions. (Two locations interfere with each other if their live ranges overlap.) He then uses a coalescing algorithm to coalesce pairs of variables (use the same name). Unique to Sreedhar's algorithm is the ability to safely coalesce the variables of copy statements such as x=y where the locations interfere with each other, provided that the coalesced variable does not cause any new interference.

Figure 4.9 illustrates the coalescing phase. Figure 4.9(a) shows a program in SSA form. Since none of the operands of the φ-functions interfere with each other, the subscripts can be dropped as shown in Figure 4.9(b). The standard coalescing algorithm, due to Chaitin, can not eliminate the copy x=y since x and y interfere with each other [Cha82].

(a) Translated SSA form:
    entry:  y1=30
    left:   y2=10              right:  x2=20
                                       x1=y1
    join:   y3=φ(y1,y2)
            x3=φ(x1,x2)
            foo(y3)
            foo(x3)

(b) Chaitin coalescing:
    entry:  y=30
    left:   y=10               right:  x=20
                                       x=y
    join:   foo(y)
            foo(x)

(c) Sreedhar coalescing:
    entry:  xy=30
    left:   xy=10              right:  xy=20
    join:   foo(xy)
            foo(xy)

Figure 4.9: Sreedhar's coalescing algorithm. Adapted from [SDGS99].

Although the variables x1 and y1 interfere with each other, Sreedhar's algorithm is able
to identify that if x1 and y1 are coalesced, the resulting variable does not cause any
new interferences. Figure 4.9(c) shows the example code after x1, x2, y1, and y2 are

coalesced into one variable xy, and the copy statement x1=y1 is removed.

4.1.2.3 Allocating Variables versus Register Colouring

Translation out of SSA form in a decompiler and register colouring in a compiler have some similarities, but there are enough significant differences that register colouring cannot be adapted for use in a decompiler.

The role of the register allocator in a compiler, usually implemented by a register colouring algorithm, is to allocate registers to locations in the IR. The number of registers is fixed, and depends on the target machine. It is important that where live ranges overlap, distinct registers are allocated.

One of the main tasks when translating out of SSA form in a decompiler is to allocate local variables to locations in the SSA-based IR. The number of local variables has no fixed limit, however for readability, it is important to use as few local variables as possible. It is important that where live ranges overlap, distinct local variables are allocated.

From the above two paragraphs, it can be seen that there are similarities and differences between register colouring and the translation out of SSA form. The similarities include:

• Resources are allocated to locations in the IR.

• These resources should be allocated sparingly.

• Where live ranges of locations in the IR overlap, an edge is inserted in an interference graph.

• For both compilers and decompilers, φ-functions indicate sets of locations that it would be beneficial to allocate the same resource to. These relations could be recorded either by a different kind of edge in the interference graph, or edges in a different graph (e.g. a unites graph).

The differences include:

• Compilers allocate machine registers, while decompilers allocate local variables.

• The number of registers is fixed, but the number of local variables has no fixed limit.

• Spilling to memory may be required in a compiler when a given graph cannot be coloured with K colours, where K is the number of available machine registers. There is no equivalent concept in a decompiler, although the necessity of allocating new local variables in some circumstances has a superficial similarity.

• For a compiler, a small set of interference graph nodes are pre-coloured, because some registers must be used for parameter passing or the return value, or because of peculiarities of the target machine architecture (e.g. the result of a multiplication appears in edx:eax). For a decompiler, assuming that the original mapping from locations to local variables is to be preserved as far as possible, the graph is initially completely pre-coloured.

• For a compiler, the interference graph is always consistent, but until finished it is not fully coloured. For a decompiler, the graph is always fully coloured but possibly inconsistent; the task is not to colour the graph, but to remove the inconsistencies caused by overlapping live ranges.

• For a compiler, it is simplest to implement the φ-functions with copy statements, and attempt to minimise the copy statements with the coalescing phase of the register colouring algorithm. The φ-functions or copy statements guide the allocation of registers to IR locations. For a decompiler, φ-functions are used to guide the allocation of new local variables if needed.

• While a compiler attempts to coalesce nodes in the interference graph (thereby allocating the same register to a group of assignments), a decompiler must at times split a group of nodes that currently uses the same local variable, so that some of the members of the group no longer use the same local variable as the others. The aim of coalescing is to reduce copy statements; the cost of splitting is to introduce more copy statements.


The differences are significant enough that only interference graphs and the broad concept of colouring can be adapted from register colouring to allocating local variables in a decompiler.

4.1.3 Extraneous Local Variables


While SSA and expression propagation are very helpful in a decompiler, certain patterns

of code induce one or more extraneous local variables, which reduces readability.

Consider the IR fragment of Figure 4.1 from the running example. After exhaustive

expression propagation, the result is as shown in Figure 4.10.


loop1:
    st1 := φ(st0, st3)
    edx1 := φ(edx0, edx2)
    esi1 := φ(esi0, esi2)
    ebx1 := φ(ebx0, ebx2)
    edx2 := edx1 - 1
    st3 := st1 *f (double)edx1 /f (double)esi1
    esi2 := esi1 - 1
    ebx2 := ebx1 - 1
    goto loop1 if ebx1-1 >= 0
edx3 := (int)(st1 *f (double)edx1 /f (double)esi1)

Figure 4.10: The code from Figure 4.1 after exhaustive expression propagation, showing the overlapping live ranges. (The original diagram shows live range columns for ebx2, ebx1, edx2, edx1, esi2, esi1, st3 and st1 at the left, with several overlaps shaded.)

Note the multiple shaded areas indicating an overlap of live ranges of different versions of the same variable. When propagating st2 of Figure 4.1, which uses edx1 (via m[esp0-28]2), past a definition of edx2, local8 is allocated, representing edx2. However, now the φ-function for edx1 requires a copy, since edx1 can now become edx0 or local8 depending on the path taken. This necessitates a copy of edx2 at the end of the loop into local12, and back to edx1 (allocated to local11) at the start of the loop. Similarly, the live range overlaps of ebx, esi, and st (st is the register representing the top of the floating point stack) necessitate copies to variables local10, local17+local9, and local7 respectively. The resultant generated code is obviously very hard to read, due to six extra assignment statements in the main loop, as shown in Figure 4.11.

As well as the extra local variables, the loop condition is changed by one (local10 >= 1 compared with the original c > 0). In other programs, this can be more obvious, e.g. i < 10 becomes old_i < 9.

do {
local11 = local12;
local10 = local15;
local17 = local13;
local7 = local18;
local8 = local11 - 1;
local18 = local7 * (float)local11 / (float)local17;
local9 = local17 - 1;
local15 = local10 - 1;
local12 = local8;
local13 = local9;
} while (local10 >= 1);
local14 = (int)(local7 * (float)local11 / (float)local17);

Figure 4.11: Generated code from a real decompiler with extraneous variables for the IR of Figure 4.10. Copy statements inserted before the loop are not shown.

The floating point multiply and divide operations are repeated after the loop, adding more unnecessary volume to the generated code. Finally, the last iteration of the loop, computing the result in local18, is unused. As a result of all these effects, the output program is potentially less efficient than the original. A very good optimising compiler used on the decompiled output might be able to attain the same efficiency as the original program.

The code where these extraneous local variables are being generated contains statements such as x3 := af(x2) inside a loop, where af is an arithmetic function, not a function call. Examples include x := x - 1 and y := (x*y) + z. Propagation of statements like this inside a loop will always cause problems, as shown in Figure 4.12(a).

(a) Before propagation:        (b) After propagation of x3:
loop:                          loop:
    x2 := φ(x1,x3)                 x2 := φ(x1,x3)
    ...                            ...
    x3 := af(x2)                   x3 := af(x2)
    ...                            ...
    print x3                       print af(x2)

Figure 4.12: Live ranges for x2 and x3 when x3 := af(x2) is propagated inside a loop.

Note that either x2 or x3 is live throughout the loop. If x3 is propagated anywhere in the loop, e.g. to the print statement, x2 will be live after the assignment to x3, as shown in Figure 4.12(b). The assignment to x3 cannot be eliminated, since it has side effects; in other words, x3 is used by the φ-function, hence it is never unused. Overlapping live ranges for the different versions of the same base variable will result in extra variables in the decompiled output, as noted above. Only assignments of this sort, where the same location appears on the left and right hand sides, have this property. These assignments are called overwriting statements.

Note that the opportunity for propagation in this situation is unique to renaming schemes such as the SSA form. In normal code, overwriting statements cannot be propagated, since a component of the right hand side (here x2) is modified along all paths from the source to the destination. Propagation of this sort in SSA form is possible only as a result of the assumption that different versions of the same variable are distinct.

Propagating overwriting statements where the definition is not in a loop is usually not a problem. The reason is that the propagated expression usually becomes eliminated as dead code, so that the new version of x does not interfere with other versions of x. When inside a loop, the new version of x is always used by the φ-function at the top of the loop.

The problem also occurs when propagating expressions that carry a component of the

overwriting statements across the overwriting statement. Figure 4.13 shows an example

where the non-overwriting statement y := x-1 is propagated across an overwriting

statement such as x := x+1.


(a) Before propagation:        (b) After propagation of y1:
loop:                          loop:
    x2 := φ(x1,x3)                 x2 := φ(x1,x3)
    y1 := x2 - 1
    x3 := af(x2)                   x3 := af(x2)
    ...                            ...
    print y1                       print x2 - 1

Figure 4.13: Live ranges for x2 and x3 when y1 := x2 - 1 is propagated across an overwriting statement inside a loop.

Whether an overwriting statement is inside a loop can be determined by whether there is a component of the right hand side that is defined by a φ-function one of whose operands is the location defined by the statement. φ-functions reference earlier values of a variable; for a location such as x3 in Figure 4.13 to appear as an operand of a φ-function that defines one of its components, it must be in a loop.

Proposition 4.1: Propagating components of overwriting statements past their definitions inside a loop leads to the generation of extraneous local variables when the code is transformed out of SSA form.

Algorithm 1 Preventing extraneous local variables in the decompiled output due to propagation of components of overwriting statements past their definitions.

/* Initial: r is a subscripted location that could be propagated into. */
/* Returns: true if this location should be propagated into */
function shouldPropagateInto(r)
    src := the assignment defining r
    for each subscripted component c of the RHS of src
        if the definition for c is a φ-function ph then
            for each operand op of ph
                opdef := the definition for op
                if opdef is an overwriting statement and either
                        (opdef = src) or
                        (there exists a CFG path from src to opdef and
                         from opdef to the statement containing r) then
                    return false
    return true


In Figure 4.13(a), x2 is a component of the assignment to y1. Because x2 appears in an overwriting statement between the definition of y1 and its use, and the definition of the overwriting statement (x3) is an operand of the φ-function defining x2, the overwriting statement is in a loop. Hence, propagating y1 past the overwriting statement (defining x3) would lead to an extraneous variable in the decompiled output.

Algorithm 1 gives the detail of how to prevent extraneous local variables as discussed

in this section. It has been implemented in the Boomerang decompiler; for results see

Section 7.3 on page 230.
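For concreteness, a C++ rendering of Algorithm 1 over a toy IR might look as follows (all types are hypothetical, and pathExists stands in for a CFG reachability query that is not implemented here):

    #include <vector>

    struct Stmt {
        bool overwriting = false;       // same base location on both sides
        bool isPhi = false;
        std::vector<const Stmt*> uses;  // definitions of the RHS components,
                                        // or, for a phi, of its operands
    };

    bool pathExists(const Stmt* from, const Stmt* to);  // CFG query (assumed)

    // 'src' defines the location r; 'user' is the statement containing r.
    bool shouldPropagateInto(const Stmt* src, const Stmt* user) {
        for (const Stmt* cdef : src->uses) {        // defs of RHS components
            if (cdef == nullptr || !cdef->isPhi) continue;
            for (const Stmt* opdef : cdef->uses) {  // defs of phi operands
                if (opdef == nullptr || !opdef->overwriting) continue;
                if (opdef == src ||
                    (pathExists(src, opdef) && pathExists(opdef, user)))
                    return false;  // would force an extraneous local variable
            }
        }
        return true;
    }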

4.2 Applying SSA to Memory

As a result of alias issues, memory expressions must be divided into those which are

safe to propagate, and those which must not be propagated at all.

Memory expressions are subject to aliasing; in other words, there can be more than one way to refer to the location, effectively creating more than one name for the same location. The meaning of the term name here is different to the one used in the context of SSA renaming or subscripting. Aliasing can occur both at the source code level and at the machine code level.

At the source code level, the names for a location might include g (a global variable),

*p (where p is a pointer to g ), and param1 (where param1 is a reference parameter

that is in at least some cases supplied with the actual argument g ).


At the machine code level, all these aliases persist. Reference parameters appear as call-by-value pointers at the machine code level, but this merely adds another * (dereference) to the aliasing name. In addition, a global variable might have the names m[10 000] (the memory at address 10 000), m[r1] where r1 is a machine register and r1 could take the value 10 000, m[r2+r3] where r2+r3 could sum to 10 000, m[m[10 008]] where m[10 008] has the value 10 000, and so on. There will also be several equivalents each for *p (e.g. *edx) and param1.

These problems exist regardless of the intermediate representation of the program. As a

result of these additional aliasing opportunities, aliasing is more common at the machine

code level than at the source code level.

The main causes of machine code aliasing are the manipulation of the stack and frame pointers, and the propensity for compilers to use registers as internal pointers to objects (pointers inside multi-word data items, not pointing to the first word of the item). Internal pointers are particularly common with object oriented code, since objects are frequently embedded inside other objects.

The following sections will detail the problems and offer some solutions.

4.2.1 Problems Resulting from Aliasing


Alias-induced problems are more common at the machine code level, arising from internal pointers, a lack of preservation information, and other causes.

Figure 4.14 illustrates the problem posed by internal pointers with a version of the

running example.

Statement 3, which causes one register to become a linear function of another register containing a pointer, provides an opportunity for memory aliasing. By inspection, the assignments of statements 14 and 15 are to the same memory location, but the memory expressions appear to be different in Figure 4.14(b). When the memory variables are placed into SSA form, as shown in Figure 4.14(c), they are treated as independent variables. As a result, the wrong value (from statement 14) is propagated, and the decompilation is incorrect.

In this example, statements 2 and 3 have not been propagated into statements 13-15 for some reason; perhaps they were propagated but too late to treat the memory expressions of statements 14 and 15 as the same location. If they had been propagated, the last two memory expressions would have been m[eax1 + 4], and the problem would not exist. This is an example where too little propagation causes problems.

int main() {
pair<int>* n_and_r = new pair<int>;
...
printf("Choose %d from %d: %d\n", n_and_r->first,
n_and_r->second, comb(n_and_r));
(a) Original source code

1 eax1 := malloc(8) // eax1 points to n_and_r and n_and_r.n


2 esi2 := eax1 // esi2 points to n_and_r and n_and_r.n
3 edi3 := eax1 + 4 // edi3 points to n_and_r.r
...
13 m[esi2 ] := exp1
14 m[esi2 +4] := exp2
15 m[edi3 ] := exp3
26 printf ..., m[esi2 ], m[esi2 +4], ...

(b) Original IR with the call to comb inlined

1 eax1 := malloc(8)
2 esi2 := eax1
3 edi3 := eax1 + 4
...
13 m[esi2 ]13 := exp1
14 m[esi2 +4]14 := exp2
15 m[edi3 ]15 := exp3
26 printf ..., m[esi2 ]13 →exp1 , m[esi2 +4]14 →exp2, ...

(c) Incorrect SSA form

Figure 4.14: A version of the running example using pairs of integers.

Alias-induced problems can also arise from a lack of preservation information. Preservation analysis will be discussed more fully in Section 4.3. Figure 4.15(a) shows code where a recursive call (statement 30) prevents knowledge of whether the stack frame register (ebp) is preserved by the call.

Preservation analysis requires complete data flow analysis, and complete data flow analysis requires preservation analysis for the callee (which, because of the recursion, is the current procedure, and has not yet had its data flow summarised). As a result of the lack of preservation knowledge, the frame pointer is conservatively treated as being defined by the call. Later analysis reveals that ebp is not affected by the call, i.e. ebp30 = ebp3 = esp0-4, and the memory expression in statement 79 is in fact the parameter n as read at statement 14 (i.e. m[esp0+4]0). In statement 79, the memory expression m[ebp30+8]30 therefore becomes m[esp0+4]0. The problem is to ensure that no attempt is made to propagate to or from the memory location m[ebp30+8]30 until its address expression is in some sense safe to use. Another issue is that the data flow analysis of the whole procedure is not complete until it is known that the memory locations of statements 14 and 79 are in fact the same location.

int rcomb(int n, int r) {


if (n <= r)
return 1;
double res = rcomb(n-1, r);
res *= n;
res /= (n-r);
return (int)res;
} (a) Source code.

14 eax14 := m[esp0 +4]0 ; n


23 m[esp0 -48]23 := m[esp0 +8] ; argument r
25 m[esp0 -52]25 := eax14 -1 ; argument n-1
30 eax30 , ebp30 , m[ebp30 +8]30 := CALL rcomb(...) ; Recurse
...
79 st79 := st76 *f (double) m[ebp30 +8]30 ; res *= n
(b) IR after expression propagation but before preservation analysis.

Figure 4.15: A recursive version of comb from the running example, where the frame pointer (ebp) has not yet been shown to be preserved because of a recursive call.


4.2.2 Subscripting Memory Locations


If memory locations are propagated, care must be taken to avoid alias issues; however, not propagating memory locations is by itself not sufficient.

The problems with correctly renaming memory expressions lead to the question of whether memory expressions should be renamed at all. For example, the Low Level Virtual Machine (LLVM, [LA04]) compiler optimisation framework uses SSA as the basis for its intermediate representation; however, memory locations in LLVM are not in SSA form (not subscripted). To address this question, consider Figure 4.16.

Figure 4.16(a) shows the main loop of the running example with one line added; this

time variables c and denom are allocated to local memory. Figure 4.16(b) shows the

program in SSA form, ignoring the alias implications of statement 82. The program is

correctly represented, except for one issue. If c and *p could be aliases, the program is
incorrect. Hence, care must be taken if the address of c is taken, and could be assigned

to p.

c = n-r;
*p = 7; /* Debug code */
while (c-- > 0) {
res *= num--;
res /= denom--;
}
(a) Source code.

(b) Subscripting and propagating memory expressions:
80  c80 := n0 - r0
81  esi81 := p0
82  m[esi81] := 7                 // *p = 7
84  dl84 := (n0-r0 > 0)
85  c85 := c80 - 1
86  goto after_loop if dl84 = 0   // i.e. n0-r0 <= 0
loop:

(c) Error: No propagation of memory expressions:
80  c := n - r
81  esi81 := p
82  m[esi81] := 7                 // *p = 7
84  dl84 := (c > 0)
85  c := c - 1
86  goto after_loop if dl84 = 0   // i.e. c <= 0
loop:

Figure 4.16: The running example with a debug line added.

Figure 4.16(c) shows the result of propagating only non-memory expressions; the while
loop now tests c after the decrement, which is incorrect.

Note that although no definition of a memory location was propagated, dl (an 8-bit register, the low byte of register edx) was propagated, and carried the old value of c (from statement 84) across a definition of c (statement 85). LLVM appears to avoid this problem by making memory locations accessible only through special load and store bytecode instructions; all expressions are in terms of virtual registers, never memory locations directly. Load and store instructions are carefully ordered so that the original program semantics are maintained. As a result, expressions such as c ≤ 0 (where c represents a memory location) are not even expressible. (They would load c into a virtual register and compare that against 0.) Decompilers cannot avoid converting some memory expressions (e.g. m[sp-K]) to variables, but the fact that a variable originated from memory expressions can be recorded.

One solution would be to not allow propagation of any statements that contain (or contained in the past) any memory expression components on the right hand side. However, this has the drawback of making programs difficult to read. For example, consider a program fragment with the following original source code:

a[i] = b[g] + c[j]; // g is a global variable

The IR and the generated code, if not propagating non-local memory expressions, would

be similar to
r1 := m[G]                       r1 = g;
r2 := m[B+r1]                    r2 = b[r1];
r3 := m[C+rj]   # rj holds j     r3 = c[j];
r4 := r2 + r3
m[A+ri] := r4   # ri holds i     a[i] = r2+r3;

Obviously, it is desirable to propagate memory expressions at least short distances, so

that these lines can be combined. Naturally, this must be done with alias safety.

4.2.3 Solution
A solution to the memory expression problem is given, which involves propagating only

suitable local variables, and delaying their subscripting until after propagation of non-

memory locations.

The solution to the problem of conservatively propagating memory locations has several

facets. Firstly, only stack local variables whose addresses do not escape the current

procedure are propagated at all. Such local variables are where virtual registers in a

compiler's IR are allocated, apart from those that are allocated to registers. All other

memory expressions, including global variables, array element accesses, and dereferences

of pointers, are not propagated using the usual method.

Naturally, with a suitable analysis of where the address of a local variable is used, the restriction on propagating local variables whose address escapes the local procedure could be relaxed. Also, where the address is used inside the current procedure, care must be taken not to propagate a local variable across an aliased definition. Escape analysis is much more difficult in a decompiler than in a compiler. For example, to take the address of local variable foo at the source code level, the expression &foo or equivalent is always involved. If the address of foo at the machine code level is sp-12 (where sp is the stack pointer register), then an expression such as sp-12 may be involved, or it may actually be used to find the address of the variable at sp-4, or the expression sp-24 might be used to forge the address of foo by adding 12 to that expression. The latter is particularly likely if foo happens to be a member of a structure whose address is sp-24. Escape analysis is the subject of future work.

For those memory locations that are identified as able to be propagated, their subscripting and hence propagation is delayed until the first round of propagations and preservation analysis is complete. This requires a Boolean variable, one for each procedure, which is set after these propagations and analyses. Before this variable is set, no attempt is made to subscript or create φ-functions for memory locations. After the Boolean is set, the subscripting and φ-function inserting routines are called again. Only local variable locations identified as suitable for propagation are subscripted.
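Structurally, the resulting two-phase ordering might look like the following sketch (a hypothetical Procedure interface; the method names are illustrative only):

    // Sketch of the two-phase renaming order described above.
    struct Procedure {
        bool memorySubscriptingAllowed = false;  // the per-procedure Boolean
        void subscriptRegisters();    // rename and insert phis, registers only
        void propagate();             // expression propagation
        void analysePreservations();  // Section 4.3
        void subscriptLocalMemory();  // rename safe stack locals only
    };

    void applyDataFlow(Procedure& p) {
        p.subscriptRegisters();       // memory locations left untouched, so...
        p.propagate();                // ...address expressions settle first
        p.analysePreservations();
        p.memorySubscriptingAllowed = true;
        p.subscriptLocalMemory();     // now phis and subscripts for locals
        p.propagate();                // including the newly subscripted locals
    }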

For propagation over short distances to avoid temporary variables as noted at the end of the last section, a crude propagation could be attempted one statement at a time, checking each component propagated for alias hazards associated with the locations defined in the statement being propagated across. This would allow propagation of array elements, as per the example at the end of the last section.
array elements, as per the example at the end of the last section.

Figure 4.17 shows the result of applying the solution of this section to the problem

shown in Figure 4.14.

1 eax1 := malloc(8)
2 esi2 := eax1
3 edi3 := eax1 + 4
...
13 m[eax1 ]13 := exp1
14 m[eax1 +4]14 := exp2
15 m[eax1 +4]15 := exp3
26 printf ..., m[eax1 ]13 →exp1 , m[eax1 +4]15 →exp3

Figure 4.17: The example of Figure 4.14 with expression propagation before
renaming memory expressions.

Now, expression propagation has replaced esi2, esi2+4, and edi3 in statements 13-15 with the more fundamental expressions involving eax1. As a result, when SSA subscripting is re-run, the two assignments in statements 14 and 15 are treated correctly as assignments to the same location, and the correct result is printed.

4.3 Preserved Locations

The ability to determine whether a location is preserved by a procedure call is important, because preserved locations are an exception to the usual rule that definitions kill other definitions.

In data flow analysis, a definition is considered to be killed by another definition of the same location. There is a change of the expression currently assigned to the location defined in the first definition, which normally means that this first value is lost. However, this is not always the case. A common example is in a procedure, where callee-save register locations are saved near the start of the procedure, and restored near the end. The register is said to be preserved. Between the save and the restore, the procedure uses the register for various purposes, usually unrelated to the use of that register in its callers.

Whether a location is preserved or not is often specified in a convention called the application binary interface (ABI). However, there are many possible ABIs, and the one in use can depend on the compiler and the calling convention of a particular procedure. In addition, a compiler may change the calling convention for calls that are not visible outside a compilation module, e.g. to use more registers to pass parameters. Assembler code or binary patches may also have violated the convention. This is particularly important if the reason for decompiling is security related. The safest policy is therefore not to make ABI assumptions, and so preservation analysis is required.

A pop instruction, or load from stack memory in a RISC-like architecture, which restores the register is certainly a definition of the register. The unrelated definitions of the register between the save and the restore are certainly killed by the restore. However, any definitions reaching the start of the procedure in reality reach the end of the procedure. Since the save and restore are on all possible paths through the procedure, definitions available at the start of the procedure are also available at the end.

Figure 4.18 shows the effect of ignoring the restoration of a variable.

(a) Input program:          (b) Ideal decompilation:
a := 5                      a := 5;
x := a                      x := 5;
call proc1                  proc1();
y := a                      y := 5;

(c) Decompilations resulting from ignoring the restoration of variable a in proc1:
x := 5;                     x := 5;
a := proc1(5);              proc1(a);   // ref param
y := a;                     y := a;

Figure 4.18: The effect of ignoring a restored location. The last example uses a call-by-reference parameter.

Either the decompiled output has extra returns, or extra assignments. More importantly, the propagation of constants and other transformations which improve the output (e.g. y := 5) are prevented.

It may be tempting to reason that the saving and restoring of registers at the beginning and end of a procedure is a special case, deserving special treatment, as in [SW93]. However, compilers may spill (save) registers at any time, and assembler code could save and restore arbitrary locations (such as local or global variables). Treating push and pop instructions as special cases in the intermediate representation is adequate for architectures featuring such instructions (e.g. the dcc decompiler [Cif94] does this), but does not solve the problem for RISC machine code.

Saving of locations could occur without writing and reading memory, as shown in Figure

4.19. In this contrived example, an idiomatic code sequence for swapping two registers

without using memory or a temporary register has been used.



b := xor a, b ; b = aold ^ bold (^ is xor op.)


a := xor a, b ; a = aold ^ (aold ^ bold ) = bold
b := xor a, b ; b = bold ^ (aold ^ bold ) = aold
a := 5 ; define a (new live range)
print(a) ; use a
b := xor a, b ; swap...
a := xor a, b ; ... a and b ...
b := xor a, b ; ... again
ret ; return to caller

Figure 4.19: Pseudo code for a procedure. It uses three xor instructions to swap registers a and b at the beginning and end of the procedure. Effectively, register a is saved in register b during the execution of the procedure.

By itself, expression propagation cannot show that the procedure of Figure 4.19 preserves variable a. The first statement could be substituted into the next two, yielding a=b and b=a, except that a is redefined on the path from the first to the third statement. A temporary location is required to represent a swap as a sequence of assignments. The combination of expression propagation and SSA form can be used to analyse this triplet of statements, as shown in Figure 4.20.

(a) Intermediate representation:
b1 = a0 ^ b0     ; b1 = a0 ^ b0
a1 = a0 ^ b1     ; a1 = a0 ^ (a0 ^ b0) = b0
b2 = a1 ^ b1     ; b2 = b0 ^ (a0 ^ b0) = a0
a2 = 5           ; define a (new live range)
print(a2)        ; use a
b3 = a2 ^ b2     ; b3 = 5 ^ a0
a3 = a2 ^ b3     ; a3 = 5 ^ (5 ^ a0) = a0 : preserved
b4 = a3 ^ b3     ; b4 = a0 ^ (5 ^ a0) = 5 : overwritten
ret              ; return to caller

(b) After propagation and dead code elimination:
print(5)
b4 = 5
return b4

Figure 4.20: Pseudo code for the procedure of Figure 4.19 in SSA form. Here it is obvious (after expression propagation) that a is preserved, but b is overwritten.

In this example, it is clear that a is preserved (the final value for a3 is a0), but b is overwritten (the final value for b4 is 5, independent of b0). Effectively, b1 is used to save and restore a0 (although combined in a reversible way with b0). b is a potential return, despite the fact that the assignment is to the variable a. If no callers use b, it can be removed from the set of returns, but not from the set of parameters. Figure 4.21 shows the procedure of Figure 4.19 with two extra statements.

With propagation and dead code elimination, it is obvious that a and b are now pa-

rameters of the procedure (since they are variables with zero subscripts).

(a) Original code:   (b) SSA form:      (c) After propagation and DCE:
b = xor a, b         b1 = a0 ^ b0
c = b                c1 = a0 ^ b0
a = xor a, b         a1 = b0
b = xor a, b         b2 = a0
a = 5                a2 = 5
print(a)             print(5)           print(5)
print(c)             print(a0 ^ b0)     print(a0 ^ b0)
b = xor a, b         b3 = 5 ^ a0
a = xor a, b         a3 = a0
b = xor a, b         b4 = 5             b4 = 5
ret                  ret                return b4

Figure 4.21: The procedure of Figure 4.19 with two extra statements.

The above example illustrates the power of the SSA form, and how the combination of SSA with propagation and dead code elimination resolves many issues.

All procedures can trivially be transformed to only have one exit, and a definition collector (see Section 4.5) can be used to find the version of all variables reaching the exit. This makes it easy to find the preservation equation for variable x: it is simply x0 = xe, where e is the definition reaching the exit, provided by the definition collector in the return statement. With some manipulation and a lot of simplification rules, this equation can be resolved to true or false. This algorithm for determining the preservation of locations using the SSA form was first implemented in the Boomerang decompiler by Trent Waddington [Boo02].
by Trent Waddington [Boo02].

In the absence of control flow joins, expression propagation will usually make the decision trivial, as in Figure 4.21, where the preservation equation for a is a3=a0, and the definition reaching the exit is the assignment a3 := a0. Substituting the assignment into the equation yields a0=a0, which simplifies to true. At the control flow joins, there will be φ-functions; the equation is checked for each operand of the φ-function.

Figure 4.22 shows an example with a φ-function.


The preservation equation is ebx0 =ebx2 . Substituting the assignment ebx2 := ebx1 -1
into the equation yields ebx0 =ebx1 -1 . It is not yet impossible that ebx is preserved;

for example, if the denition for ebx1 happened to be ebx0 +1, ebx would indeed be

preserved by procedure comb. However, the denition in the example is the φ-function

φ(ebx0 , ebx2 ). The current equation ebx0 =ebx1 -1 is therefore split into two equa-
tions, one for each operand of the φ-function, substituting into ebx1 of the current

equation. The resultant equations are ebx0 =ebx0 -1 and ebx0 =ebx2 -1, both of which

have to simplify to true for ebx to be analysed as preserved by comb. In this case, the

rst equation evaluates to false, so there is no need to evaluate the second. Note that

comb: ...
loop:
ebx1 := φ(ebx0 , ebx2 )
...
ebx2 := ebx1 - 1
goto loop if ebx2 >= 0
...
return {Reaching definitions: ebx2 , ...}

Figure 4.22: A version of the running example with the push and pop of ebx
removed, illustrating how preservation analysis handles φ-functions.

Note that if the second equation had to be evaluated, there is a risk of infinite recursion that has to be carefully avoided, since ebx2 is defined in terms of ebx1 and ebx1 is partly defined in terms of ebx2. Maintaining a set of locations already processed is sufficient.

It is possible to encounter "φ loops" that involve several φ-functions, and which are not obvious from examination of the program's IR [BDMP05].
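To make the equation splitting and the visited set concrete, the following is a minimal C++ sketch of such a solver, restricted to register names defined either by a φ-function or by another name plus a constant. All names here (Def, DefMap, prove) are hypothetical illustrations, not Boomerang's actual classes; the real solver (see Section 7.5) must also handle memory locations and the recursion issues of Section 4.4.

    #include <map>
    #include <set>
    #include <string>
    #include <vector>

    // Hypothetical, simplified definitions: each SSA name is defined either by
    // a phi-function (a list of operand names) or by "otherName + constant",
    // e.g. ebx2 := ebx1 + (-1). Subscript-0 names are the values on entry.
    struct Def {
        bool isPhi = false;
        std::vector<std::string> phiOperands;  // operands, if a phi-function
        std::string rhsName;                   // base name, if "name + k"
        int rhsOffset = 0;                     // the constant k
    };
    using DefMap = std::map<std::string, Def>;

    // Prove that 'name' equals 'entryName' + 'offset' along all paths. The
    // 'seen' set breaks the phi loops mentioned in the text: a location whose
    // proof is already in progress is assumed to satisfy its equation.
    bool prove(const std::string& name, int offset, const std::string& entryName,
               const DefMap& defs, std::set<std::string>& seen) {
        if (name == entryName)
            return offset == 0;                // equation has reduced to x0 = x0
        if (!seen.insert(name).second)
            return true;                       // already in progress: assume true
        auto it = defs.find(name);
        if (it == defs.end())
            return false;                      // unknown definition: cannot prove
        const Def& d = it->second;
        if (d.isPhi) {
            // Split the equation: it must hold for every phi operand
            for (const std::string& op : d.phiOperands)
                if (!prove(op, offset, entryName, defs, seen))
                    return false;
            return true;
        }
        // Substitute the definition into the equation and continue
        return prove(d.rhsName, offset - d.rhsOffset, entryName, defs, seen);
    }

    int main() {
        // The situation of Figure 4.22: ebx1 := phi(ebx0, ebx2); ebx2 := ebx1 - 1
        DefMap defs;
        defs["ebx1"] = Def{true, {"ebx0", "ebx2"}, "", 0};
        defs["ebx2"] = Def{false, {}, "ebx1", -1};
        std::set<std::string> seen;
        // Preservation equation ebx0 = ebx2: fails on the ebx0 operand of the phi
        return prove("ebx2", 0, "ebx0", defs, seen) ? 0 : 1;
    }

For Figure 4.22, the φ operand ebx0 yields the equation ebx0 = ebx0 - 1, which fails, so ebx is correctly reported as not preserved.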

Section 7.5 on page 234 describes the implementation of an equation solver in the

Boomerang decompiler which uses the above techniques. The presence of recursion

complicates preservation analysis, as will be shown in Section 4.4.2.

4.3.1 Other Data Flow Anomalies


Decompilers need to be aware of several algebraic identities both to make the decompiled

output easier to read, and also to prevent data flow anomalies.

Just as preserved locations appear to be used when for all practical purposes they are

not, there are several algebraic identities which appear to use their parameters, but

since the result is a constant, do not in reality use them. These include:

x - x = 0 x ^ x = 0 (^ = xor) x | ~x = -1

x & 0 = 0 x & ~x = 0 x | -1 = -1

To these could be added x ÷ x = 1 and x mod x = 0 if it is known that x ≠ 0. For each of these, a naive data flow analysis will erroneously conclude that x is used by the left hand side of each of these identities. The exclusive or and subtract versions are commonly used by compilers to set a register to zero. Decompilers need to be aware of such identities to prevent needless overestimation of uses. The constant result is shorter than the original expression, thereby making the decompiled output easier to read. Often the constant result can combine with other expressions to trigger more simplifications. This problem is readily solved by replacing appropriate patterns by their


constant values before data flow analysis is performed. These changes can be made as part of the typical set of simplifications that can be performed on expressions, such as x + 0 = x.
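As an illustration only, such pattern replacement could look like the following C++ sketch; the minimal Exp representation and the operator tags are hypothetical stand-ins for a real IR, not the Boomerang classes.

    #include <memory>

    // Hypothetical minimal expression IR (illustration only).
    enum Oper { opSub, opXor, opAnd, opOr, opNot, opIntConst, opVar };
    struct Exp {
        Oper op;
        std::shared_ptr<Exp> l, r;  // subexpressions; null if unused
        int val = 0;                // constant value or variable id
    };

    // Structural equality, sufficient for the patterns below
    bool sameExp(const std::shared_ptr<Exp>& a, const std::shared_ptr<Exp>& b) {
        if (!a || !b) return a == b;
        return a->op == b->op && a->val == b->val &&
               sameExp(a->l, b->l) && sameExp(a->r, b->r);
    }

    std::shared_ptr<Exp> intConst(int v) {
        auto e = std::make_shared<Exp>();
        e->op = opIntConst; e->val = v;
        return e;
    }

    // Apply the constant-result identities before data flow analysis, so the
    // apparent uses of x in x - x, x ^ x, x & ~x, x | ~x, x & 0 do not count.
    std::shared_ptr<Exp> applyIdentities(std::shared_ptr<Exp> e) {
        if (!e || !e->l || !e->r) return e;
        if (sameExp(e->l, e->r) && (e->op == opSub || e->op == opXor))
            return intConst(0);                              // x - x, x ^ x
        bool rIsNotL = e->r->op == opNot && sameExp(e->r->l, e->l);
        if (rIsNotL && e->op == opAnd) return intConst(0);   // x & ~x = 0
        if (rIsNotL && e->op == opOr)  return intConst(-1);  // x | ~x = -1
        if (e->op == opAnd && e->r->op == opIntConst && e->r->val == 0)
            return intConst(0);                              // x & 0 = 0
        return e;
    }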

4.3.2 Final Parameters


Final parameters are locations live on entry after preservation analysis, dead code elimination, and the application of identities.

Equation 3.6 on page 84 states that the initial parameters are the locations live on entry at the start of the procedure after intersecting with a parameter filter. Two exceptions have been seen. Firstly, preserved locations are often not parameters, although they could be. Figure 4.23 shows how propagation and dead code elimination combine to expose whether preserved location loc is used as a parameter or not.

(a) Initial:
    save1 := loc0
    ...
    use(loc0)    // If present
    ...
    loc99 := save1
    return

(b) After propagation:
    save1 := loc0
    ...
    use(loc0)    // If present
    ...
    loc99 := loc0
    return

(c) After DCE:
    ...
    use(loc0)    // If present
    ...
    return

Figure 4.23: Analysing preserved parameters using propagation and dead code
elimination.

Secondly, locations involved in one of the above identities are never parameters. For example, if the first use of register eax in a procedure is in an instruction which performs an exclusive-or of eax with itself, e.g. eax := eax ^ eax, then eax is not a parameter. Application of the identity x ^ x = 0 solves this problem.

Hence, the equation for final parameters is in fact the same as that for initial parameters (Equation 3.6), but parameters are only final after propagation, dead code analysis, and the application of identities.

4.3.3 Bypassing Calls


The main reason for determining whether locations are preserved is to allow them to

bypass call statements in caller procedures.

Figure 4.24 shows the IR for a fragment of the recursive version of the running example

in a little more detail than that of Figure 4.15.



int rcomb(int n, int r) {


if (n <= r)
return 1;
double res = rcomb(n-1, r);
res *= n;
res /= (n-r);
return (int)res;
}
(a) Source code.

3 ebp3 := esp0 - 4
11 goto L2 if eax7 > m[r280 + 8]
12 m[esp0 - 32] := 1
13 GOTO L2
L1:
14 eax14 := m[esp0 +4]0 ; n
23 m[esp0 -48]23 := m[esp0 +8] ; argument r
25 m[esp0 -52]25 := eax14 -1 ; argument n-1
30 eax30 , ebp30 , m[ebp30 +8]30 := CALL rcomb ; Recurse
Reaching definitions: ... esp := esp0 -56, ebp := esp0 -4, ...
79 st79 := st76 *f (double) m[ebp30 +8]30 ; res *= n
L2:
191 ebp191 := φ(ebp3 , ebp30 )
184 esp184 := ebp191 + 8
185 return
(b) Before bypassing. Many statements are removed for simplicity, including
those which preserve and restore ebp.
...
30 eax30 := CALL rcomb ; Recurse
79 st79 := st76 *f (double) m[esp0 +4]0 ; res *= n
L2:
191 ebp191 := esp0 - 4 ; φ-function now an assignment
184 esp184 := ebp191 + 8 → esp0 + 4 ; esp is preserved
185 return
(c) After bypassing.

Figure 4.24: Part of the IR for the program of Figure 4.15.

Note that the preservation of the stack pointer esp depends on the preservation of

another register, the frame pointer ebp. The fact that the call is recursive is ignored

for the present; the effect of recursion on preservation will be addressed soon. For

the purposes of this example, it will be assumed that after preservation analysis, it is

determined that ebp is unchanged and esp is incremented by 4. (The addition of 4

comes about from the fact that the stack is balanced throughout the procedure except

for the return statement at the end, which pops the 32-bit (4-byte) return address from

the stack. X86 call statements have a corresponding decrement of the stack pointer

by 4, where this return address is pushed. Hence, incrementing esp by 4 is the x86

equivalent of preservation.)

The φ-function now has two operands, ebp3 and ebp30. Ebp30 is the value of ebp after the call; since ebp was found to be preserved by the call, all references to ebp30 can be replaced by the value that reaches the call, which is esp0 - 4 (reaching definitions are stored in calls by a collector; see Section 4.5). Hence, both operands of the φ-function evaluate to esp0 - 4. There is no longer any need for the φ-function, so it is replaced by an assignment as shown. Effectively, the value of ebp has bypassed the call (it is as if the call did not exist) and also bypassed the φ-function caused by the control flow merge at L2.
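The bypassing step itself is small once preservation is known; the following is a hypothetical C++ sketch, with locations and expressions shown as plain strings rather than Boomerang's actual classes. The reaching definitions come from the call's definition collector (Section 4.5).

    #include <map>
    #include <string>

    // The reaching definitions collected at a call during SSA renaming,
    // e.g. "ebp" -> "esp0 - 4", "esp" -> "esp0 - 56" (illustrative only).
    using DefCollector = std::map<std::string, std::string>;

    // If the callee preserves 'loc', a use subscripted with this call (such
    // as ebp30) can bypass the call: it is replaced by the expression that
    // reached the call, exactly as if the call did not exist.
    bool bypass(const DefCollector& reachingDefs, const std::string& loc,
                bool calleePreservesLoc, std::string& replacement) {
        if (!calleePreservesLoc)
            return false;                 // the call really defines loc
        auto it = reachingDefs.find(loc);
        if (it == reachingDefs.end())
            return false;                 // nothing reaches the call
        replacement = it->second;         // e.g. ebp30 becomes esp0 - 4
        return true;
    }

In the example above, bypassing makes both φ operands evaluate to esp0 - 4, so the φ-function collapses to a plain assignment.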

4.4 Recursion Problems

Recursion presents particular problems when determining the preservation of locations, and the removal of unused parameters and returns.

The presence of recursion in a program complicates the ordering of many decompilation analyses. The most important analysis affected by recursion is preservation analysis. The following subsections discuss how to determine the best order in which to process procedures involved in recursion, and how to perform the preservation analysis. A final subsection discusses the problem of removing unused parameters and returns, which is also complicated by recursion.

Decompilers have the whole input binary program available at once, unlike compilers, which typically see source code for only one of potentially many source modules. A decompiler therefore potentially has access to the IR of the whole program at once. Definitions and uses in a procedure depend on those in all child procedures, if any, with the exception that a procedure's returns depend on the liveness of all callers. As has been mentioned earlier, this must be done after all procedures are analysed. It makes sense therefore to process procedures in a depth-first ordering of the call graph (the directed graph of procedures, where a callee is a child of a caller). In a depth-first ordering, child nodes are processed before parent nodes, so the callers have the data flow summary of callees (as per Section 3.4) available to them. The main summary information stored in callees that is of interest to callers is the modifieds set, possibly stored in the return statement of the callee, and the set of live locations at the callee entry point.

However, it is only possible to access the summary information of every callee if the

call graph is a tree, i.e. there are no cycles in the call graph introduced by recursive

procedures. Many programs have at least a few recursive procedures, and sometimes

there is mutual recursion, which causes larger cycles in the call graph. Figure 4.25

shows an example of a program call graph with many cycles; there is self recursion and

mutual recursion.

Figure 4.25: A small part of the call graph for the 253.perlbmk SPEC CPU2000
benchmark program.

From this program fragment, several cycles are evident, e.g.

3 → 3 (direct recursion), 7 → 8 → 7 (mutual recursion),

2 → 3 → 4 → 5 → 6 → 2 (a long mutual recursion cycle), and

2 → 3 → 4 → 6 → 2.

These cycles imply that during analysis of these procedures, an approximation has to be made of the definitions and uses made by the callees in recursive calls. Since unanalysed indirect calls also have no callee summary available, they are treated similarly. Both types of calls will be referred to as childless calls. Section 3.6 concluded that it is safe to overestimate the definitions and uses of a procedure when there is incomplete information about its actual definitions and uses. Therefore, it can be assumed that all definitions reaching a recursive call are used by the callee (all locations whose definitions reach the call are considered live), and all live locations at the call are defines (defined by the callee), according to Equations 3.12 and 3.13 respectively on page 85. In effect, all childless calls (calls for which the callee is not yet analysed, including recursive calls and unanalysed indirect calls) become of this form:

<all> := CALL childless_callee(<all>)



where <all> is a special location representing all live locations (the leftmost <all>) and all reaching definitions (the rightmost <all>).

This is a quite coarse approximation, but it is safe. Preservation analysis is the main analysis to be performed considering together all the procedures involved in mutual recursion. In effect, preservation and bypassing refine the initially coarse assumption that everything is defined in the callee. For example, in Figure 4.24, the initial assumption that register ebp is defined at the recursive call is removed, once it is determined by preservation analysis that ebp is preserved by the call, and is therefore effectively not defined by the call.

Similarly, before preservation analysis, it is assumed that register ebx is used by the procedure rcomb, i.e. it is live at the call to rcomb. In reality, ebx is pushed at the start of rcomb and popped at the end, and not used as a parameter. In other words, it is not actually used by the call, so that definitions of ebx before the call, which are not used except by the call, are in reality dead code. After preservation analysis and dead code elimination, the real parameters of rcomb can be found (recall that preserved locations appear to be parameters until after dead code elimination). Similarly to the situation with definitions, the coarse assumption of all locations being used by the recursive call can be refined after preservation analysis and call bypassing.

All the procedures involved in a recursion cycle can have the usual data flow analyses (expression propagation, preservation analysis, call bypassing, etc.) applied with the approximation of childless calls using and defining everything, which will result in a summary for the procedures (locations modified and used by the procedures).

4.4.1 Procedure Processing Order


An algorithm is given for determining the correct order to process procedures involved

in mutual recursion, which maximises the information available at call sites.

Figure 4.26 shows another call graph. In this example, i should be processed before h, and h before c. Procedures involved in independent cycles such as f-g-f can be processed independently of the other cycles. Both f and g are processed together using the approximations mentioned above, and the data flow analyses are repeated until there is no change.

However, procedures such as j and k, while part of only one cycle, must be processed with the larger group of nodes, including b, c, d, e, j, k, and l. The difference arises because f and g depend only on each other, and so once processed together, all information about f and g is known before c is processed. Nodes j and k, however, are not


Figure 4.26: A call graph illustrating the algorithm for finding the correct ordering for processing procedures.

complete until e is complete; e depends on c, which in turn depends on j and k, but

also l, which depends on b as well. The order of processing should be as follows: visit

a, b, c, d, e, f and g ; group process f and g ; visit h and i ; process i ; process h ; visit j,

k, l and b ; group process b, c, d, e, j, k, and l ; process a.

The procedures that have to be processed as a group are those involved in cycles in

the call graph, or sets of cycles with shared nodes. For these call graph nodes, there

is a path from one procedure in the group to all the others, and from all others in the

group to the first. In other words, each recursion group forms a strongly connected

component of the call graph. In the example graph, there is a path from c to f, but

none from f to c (the call graph is directed), so f is not part of the strongly connected

component associated with c. Nodes l and e are part of the same strongly connected

component, because there is a path from each to the other (via their cycles and the

shared node c ).

Algorithm 2 gives a general algorithm for finding strongly connected components in a graph. It has been shown that this algorithm is linear in the size of the graph [Gab00]. Decompilers typically do not store call graphs directly, but the algorithm can be adapted for decompilation by only notionally contracting vertices in the graph that are associated with cycles.

Algorithm 3 shows an algorithm for finding the recursion groups and processing them in the correct order.

Algorithm 2 General algorithm for finding strongly connected components in a graph. From Algorithm 10.1.5 of [GY04].

Input: directed graph G = (V, E).
Output: strongly connected components of G.
Repeat until G has no more vertices:
    grow a depth-first path P until a sink or a cycle is found.
    sink s: mark {s} as a strongly connected component and delete s from P and G.
    cycle C: contract the vertices of C.
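As a concrete point of comparison, the following is a minimal C++ sketch of Tarjan's SCC algorithm, another linear-time method that could be adapted instead of the contraction-based formulation above. The adjacency-list call graph and all names here are illustrative; as noted, decompilers typically do not store the call graph directly.

    #include <algorithm>
    #include <functional>
    #include <vector>

    // Procedures are numbered 0..n-1; calls[p] lists the callees of p (an
    // illustrative adjacency list; a real decompiler would walk its call
    // statements). Components are emitted callee-first.
    std::vector<std::vector<int>>
    sccGroups(const std::vector<std::vector<int>>& calls) {
        int n = (int)calls.size(), counter = 0;
        std::vector<int> index(n, -1), lowlink(n, 0), stack;
        std::vector<bool> onStack(n, false);
        std::vector<std::vector<int>> groups;
        std::function<void(int)> visit = [&](int p) {
            index[p] = lowlink[p] = counter++;
            stack.push_back(p); onStack[p] = true;
            for (int c : calls[p]) {
                if (index[c] < 0) {                    // unvisited callee
                    visit(c);
                    lowlink[p] = std::min(lowlink[p], lowlink[c]);
                } else if (onStack[c])                 // back edge: a cycle
                    lowlink[p] = std::min(lowlink[p], index[c]);
            }
            if (lowlink[p] == index[p]) {              // p roots a component
                groups.emplace_back();
                int q;
                do {
                    q = stack.back(); stack.pop_back(); onStack[q] = false;
                    groups.back().push_back(q);
                } while (q != p);
            }
        };
        for (int p = 0; p < n; ++p)
            if (index[p] < 0) visit(p);
        return groups;
    }

Tarjan's algorithm emits each component before any component that can reach it, i.e. callees before callers, which matches the processing order required above.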

When the child procedure does not cause a new cycle, the recursive call to decompile performs a depth-first search of the call graph. In the example call graph of Figure 4.26, procedure i is decompiled first, followed by h. When the algorithm recurses back to c, cycles are detected. For example, when child f of c is examined, no new cycles are found, so f is merely visited. When child g of f is examined, the first and only child f is found to have been already visited. At this point, path contains a-b-c-f-g. The fact that f is in path indicates a new cycle. The set child, initially empty, is unioned with the set of all procedures from f to the end of path (i.e. f and g). As a result, g is not decompiled yet, and decompile returns with the set {f, g}. At the point where the current procedure is f, the condition at Note 1 is true, so that f and g are analysed as a group. (The group analysis will be described more fully in Section 4.4.2 below.) Note that the cycle involving c and six other procedures is still being built. Assuming that c's children are visited in left to right order, a, b, c, d, and e have been visited but otherwise not processed.

When j's child e is considered, it has already been visited, but is not part of path (which then contains a-b-c-j-k). However, e has already been identified as part of a cycle c-d-e-c, so c->cycleGrp contains the set {c, d, e}. The first element of path that is in this set is c, so all procedures after c to the end of path (i.e. {j, k}) are added to child, which becomes {c, d, e, j, k}. Finally, when l's child b is found to be in path, the recursion group becomes {b, c, d, e, j, k, l}. Each of the procedures involved in this cycle has its cycleGrp member point to the same set.

4.4.2 Conditional Preservation


A method for extending the preservation analysis algorithm to mutually recursive procedures is described.

Consider first a procedure with only self recursion, i.e. a procedure which calls itself directly. When deciding whether a location is preserved in some self-recursive procedure p, there is the problem that at the recursive calls, preservation for location l will usually depend on whether l is preserved by the call. There may be copy or swap instructions such that l is preserved if and only if some other location m is preserved.

Algorithm 3 Finding recursion groups from the call graph.

/* Initial: cycleGrp is a member variable (an initially empty set of procedure pointers)
   path is an initially empty list of procedures, representing the call path from the
   current entry point to the current procedure, inclusive.
   Returns: a set of procedure pointers representing cycles associated with the current
   procedure and all its children */
function decompile(path) /* path initially empty */
    child := new set of proc pointers /* child is a local variable */
    append this proc to path
    for each child c called by this proc
        if c has already been visited but not finished then
            /* have new cycle */
            if c is in path then
                /* this is a completely new cycle */
                child := child ∪ {all procs from c to end of path inclusive}
            else
                /* this is a new branch of an existing cycle */
                child := c->cycleGrp
                find first element f of path that is in cycleGrp
                insert every proc after f to the end of path into child
            for each element e of child
                child := child ∪ e->cycleGrp
                e->cycleGrp := child
        else
            /* no new cycle */
            tmp := c->decompile(path) /* recurse */
            child := child ∪ tmp
            set return statement in call to that of c
    if child empty then
        child := earlyDecompile()
        removeUnusedStatements() /* Not involved in recursion */
    else
        /* is involved in recursion */
        find first element f in path that is also in cycleGrp
        if f = this proc then /* Note 1 */
            recursionGroupAnalysis(cycleGrp)
            /* Do not add these procs to the parent's cycles */
            child := new set of proc pointers
    remove last element (= this proc) from path
    return child

This will be considered below. The location l will be preserved by the call if the whole procedure preserves l, but the whole procedure's preservation depends on many parts of the procedure, including preservation of l at the call. This is a chicken-and-egg problem; the infinite recursion has to be broken by making some valid assumptions about preservations at recursive calls.



Preservation is an all-paths property; a location is preserved by a procedure if and only if it is preserved along all possible paths in the procedure. When the preservation depends on the preservation of l in p at a recursive call c, the preservation at c will have to succeed for the overall preservation to succeed. A failure along any path will result in the failure of the overall preservation. Figure 4.27 shows a simplified control flow graph for the recursive version of the running example program, which illustrates the situation.

[Diagram: entry block "proc rcomb", a test block "n <= r?", a block containing "call rcomb" marked with a question mark, and a "Return" block; the other blocks are ticked.]

Figure 4.27: A simplified control flow graph for the program of Figure 4.15.

The ticked basic blocks are those which are on a path for which a particular location is

known to be preserved. In some cases, several of the blocks have to combine, e.g. with

a save of the location in the entry block and a restoration in the return block. However,

the result is the same. The only block with a question mark is the recursive call. If

that recursive call preserves the location, then the location is preserved for the overall

procedure. However, the recursive call does not add any new control flow paths; it will

only lead to ticked blocks or the recursive call itself. The call itself does not prevent

any locations from being preserved, so the location truly is preserved.

Hence, for the purpose of preservation analysis, the original premise can safely be assumed to succeed until shown otherwise. In effect, the problem "is l preserved in recursive procedure p?" is answered by asking the question "assuming that l is preserved by calls to p in p, is the location l preserved in p?". This assumption breaks the infinite recursion, so that the problem can be solved.

Consider now the situation where the preservation of l in p depends on the preservation

of some other location m in p. This happens frequently, for example, when a stack

pointer register is saved in a frame pointer register, so that preservation of the stack

pointer in p depends on preservation of the frame pointer in p. It is not safe to assume

that given the premise that l is preserved in p, m must automatically be preserved in



p ; this has to be determined by a similar analysis.

The analysis recurses with the new premise, i.e. that both l and m are preserved. The difference now is that a failure of either part of the premise (i.e. if either l or m is found not to be preserved) causes the outer premise to fail (i.e. l is not preserved). There may be places where the preservation of m in p requires the preservation of l in p; if so, this can safely be assumed, as described above. There may also be places where the preservation of m in p depends on the preservation of m in p, which would lead to infinite recursion again. However, given that the current goal is to prove that m is preserved in p, by a similar argument to the above, it is safe to make that assumption, breaking the infinite recursion.

In order to keep track of the assumptions that may safely be made, the analysis maintains a set of required premises. This set could be thought of as a stack, with one premise pushed at the beginning of every analysis, and popped at the end. This stack is unique to the question of the preservation of l in p; other preservation questions require a separate stack. Each individual assumption (e.g. that m is preserved in p) is not valid until the outer preservation analysis is complete. In other words, each premise is a necessary but not sufficient condition for the preservation to succeed. Since the number of procedures and locations involved in mutual recursion is finite, this analysis will eventually terminate.

Consider now mutual recursion, where p calls q and possibly other procedures, one or

more of which eventually calls p. When the preservation analysis requires that m is

preserved by q, this premise is added to the stack of required premises, as above. The

difference now is that the elements of the stack have two components: the location that

is assumed preserved, and the procedure to which this assumption applies. In all other

respects, the algorithm is the same.
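The premise mechanism can be sketched in a few lines of C++. For illustration, each preservation question is abstracted into a table entry: the location is preserved iff it is preserved along all call-free paths and every (callee, location) pair it depends on at calls is preserved too. All names are hypothetical; a real analysis derives these facts from the IR instead of a table.

    #include <map>
    #include <set>
    #include <string>
    #include <utility>
    #include <vector>

    // A premise is a (procedure, location) pair, e.g. {"rcomb", "ebx"}.
    using Premise = std::pair<std::string, std::string>;
    struct Question {
        bool locallyOk;                   // preserved along call-free paths?
        std::vector<Premise> dependsOn;   // preservation required at calls
    };
    using Facts = std::map<Premise, Question>;

    bool preserved(const Premise& q, const Facts& facts,
                   std::set<Premise>& premises) {
        if (premises.count(q))
            return true;                  // already assumed: breaks the recursion
        auto it = facts.find(q);
        if (it == facts.end() || !it->second.locallyOk)
            return false;                 // any failure fails the outer premise
        premises.insert(q);               // push the premise for inner queries
        bool ok = true;
        for (const Premise& dep : it->second.dependsOn)
            ok = ok && preserved(dep, facts, premises);
        premises.erase(q);                // pop it again
        return ok;
    }

For the stack pointer example, the question {p, esp} would depend on {p, ebp}, and inside a self-recursive p the question {p, ebp} may depend on {p, ebp} again at the recursive call; the premise set answers the inner occurrence affirmatively, and the assumption only becomes a fact once the outermost query succeeds.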

Note how the preservation analysis requires the intermediate representation for each of the procedures involved in the mutual recursion to be available. In the example of Figure 4.26, when checking for a location in procedure f, there will at some point be a call to g, so the IR for g will need to be available. To prevent another infinite recursion, the analyses are arranged as follows. When Algorithm 3 finds a complete recursion group, early analyses are performed separately on each of the procedures in the recursion group. This decodes each procedure, inserting definitions and uses at childless calls as described earlier. Once each procedure has been analysed to this stage, middle analyses are performed on each procedure, with a repeat until no change over the whole group of middle analyses. Preservation analysis is the main analysis applied during the middle analyses. When this process is complete, each procedure in the recursion group is finished off with late analyses, involving dead code elimination and the removal of
unused parameters and returns.

The repeat until no change aspect of the middle analyses suggests that perhaps preservation analysis could be viewed as a fixed-point data flow algorithm. The fit is not a good one, but two possible lattices for this process are shown in Figure 4.28.

[Diagram: (a) a three-level lattice with top "Preservation unknown", middle "Location is preserved", and bottom "Location is not preserved". (b) a lattice with top "Preservation unknown", a middle row of elements "Pres. +0", "Pres. +4", "Pres. +8", ..., "Pres. +128", and bottom "Location is not preserved".]

Figure 4.28: Two possible lattices for preservation analysis.

Figure 4.28(b) is reminiscent of the textbook lattices for constant propagation as a fixed-point data flow transformation, or Fig. 1 of [Moh02]. Pres. +0 indicates that the location of interest has been found to be preserved along some paths with nothing added to it; Pres. +4 indicates that it is preserved but has 4 added to it, and so on. If along some path the location is found not to be preserved, or to be preserved by adding a different constant than the current state indicates, the location's state moves to the bottom state (not preserved).

Section 7.5.1 on page 235 demonstrates the overall preservation analysis algorithm (not

lattice based) in practice, using the Boomerang decompiler.

4.4.3 Redundant Parameters and Returns


The rule that a location which is live at the start of a function is a parameter sometimes

breaks down in the presence of recursion, necessitating an analysis to remove redundant

parameters. An analogous situation exists for returns.

The combination of propagation and dead code elimination will have removed uses and definitions of preserved locations that are not parameters. As a result, no locations live at the entry are falsely identified as parameters due to saves of preserved locations. However, there may still be some parameters whose only use in the whole program is to supply arguments to recursive calls. Consider the example program in Figure 4.29.

int, int a(p, q) {
    ...                      /* No use of p, q, r or s */
    r, s := b(p, q);         /* b is always called by a */
    ...                      /* No use of p, q, r or s */
    return r, s;
}
int, int b(p, q) {
    if (q > 1)               /* q is used other than in a call to a */
        r, s := a(p, q-1);
    else
        ...                  /* No use of p, q, r, or s */
    print(r);                /* Use r */
    return r, s;
}
Figure 4.29: Example program illustrating that not all parameters to recursive
calls can be ignored.

Procedures a and b are mutually recursive, and the recursion is controlled entirely in b. The question is what algorithm to use to decide whether any of the parameters of a or b are redundant and can therefore be removed. By inspection, p is a redundant parameter of both a and b, but q is not. Parameter p is used only to pass an argument to the call to b, and in b it is used only to pass an argument to the call to a. By removing the parameter p from the definitions of a and b and the calls to them, the program is simplified and becomes more readable. By contrast, q is used gainfully inside b, so it is also needed in a (to pass an argument to the call to b).

Note that by considering procedure a in isolation, there is no apparent difference in the redundancy or otherwise of p and q. Hence, for each procedure involved in recursion, each parameter has to be considered separately, recursive calls have to be followed, and only locations whose only uses in the whole program are as arguments to calls to procedures in the current recursion group can be considered redundant. For example, consider procedure a in Figure 4.29. It currently has two parameters, p and q. When a recursive call is encountered, the analysis needs to consider the callee (here b) and if necessary all its callees, for calls to a. During this analysis, parameter p is only used as an argument to a call to a, so it is redundant. However, q is used in the if statement, so it is not redundant.
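The core of the test can be sketched as follows in C++; the Use representation and all names are hypothetical, and the transitive following of uses through the group's parameters is abbreviated to a comment.

    #include <set>
    #include <string>
    #include <vector>

    // One use of a parameter. A parameter is a candidate for removal when
    // every use, followed through recursive calls, is only ever an argument
    // of a call to a procedure in the same recursion group.
    struct Use {
        bool isCallArgument;   // is this use an argument of a call?
        std::string callee;    // the callee, if so
    };

    bool redundantParameter(const std::vector<Use>& uses,
                            const std::set<std::string>& recursionGroup) {
        for (const Use& u : uses) {
            if (!u.isCallArgument)
                return false;  // gainful use, like q in the if statement
            if (!recursionGroup.count(u.callee))
                return false;  // the value escapes the recursion group
            // Otherwise the use only feeds a parameter of a procedure in the
            // group; that parameter's own uses are examined in the same way
            // (the transitive following is not shown here).
        }
        return true;           // like p in Figure 4.29
    }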

A symmetrical situation exists with redundant returns. Considering procedure a in isolation, there is no apparent difference in the redundancy or otherwise of r and s. However, while s is used only by return statements in recursive procedures (the return statements in a and b), r is used gainfully (in the print statement in b). Hence, s is a redundant return but r is not. For each procedure involved in recursion, each return
component has to be considered separately (e.g. in return c+d, consider components c and d separately), recursive calls have to be followed, and only return components defined by calls to other procedures in the current cycle group that are otherwise unused can be considered redundant.

Once the intermediate representation for all procedures is available (this is a late analysis, in the sense of Section 4.4.2), unused returns can be removed by searching through all callers for a liveness (use before definition) of each potential return location. Livenesses from arguments to recursive calls or return statements in recursive calls are ignored for this search. When none of the considered livenesses includes the location, the return can be removed from the callee. Removing the returns removes some uses in the program, so that some statements and some parameters may become unused as a result.

For example, when a newly removed return was the only use of a define in a call statement, it is possible that the last use of that define has now been removed; this will happen if no other call to that callee uses that location. Hence the unused return analysis has to be repeated for that callee. In this way, changes resulting from unused returns propagate down the call graph. When parameters are removed, the arguments of all callers are reduced. When arguments are reduced, uses are removed from the caller's data flow. As a result of this, the changes initiated from removing unused parameters propagate up the call graph. Hence, there is probably no best ordering in which to perform the removal of unused parameters and returns.

Section 7.6 on page 240 demonstrates this algorithm in practice, using the Boomerang

decompiler.
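The cascading effects described above suggest a simple fixed-point iteration. The following hypothetical C++ sketch abstracts the decompiler's real operations behind callbacks, since, as just noted, there is probably no single best ordering; affected neighbours (callers and callees) are simply revisited until nothing changes.

    #include <deque>
    #include <functional>
    #include <set>
    #include <vector>

    // Procedures are plain indices here. Removing an unused return may expose
    // unused statements and parameters in the callee, so callees are
    // revisited; removing a parameter shrinks argument lists, so callers are
    // revisited.
    void removeRedundantInterface(
            int procCount,
            std::function<bool(int)> removeUnusedReturnsAndParams, // changed?
            std::function<std::vector<int>(int)> neighbours) {     // callers+callees
        std::deque<int> work;
        for (int p = 0; p < procCount; ++p) work.push_back(p);
        std::set<int> queued(work.begin(), work.end());
        while (!work.empty()) {
            int p = work.front(); work.pop_front(); queued.erase(p);
            if (removeUnusedReturnsAndParams(p))
                for (int n : neighbours(p))
                    if (queued.insert(n).second) work.push_back(n);
        }
    }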

4.5 Collectors

Collectors, a contribution of this thesis, extend the sparse data flow information provided by the Static Single Assignment form in ways that are useful for decompilers, by taking a snapshot of data flow information already computed.

The SSA form provides use-def information (answering the question "what is the definition for this use?") and, if desired, also def-use information (answering the question "what are the uses of this definition?"). Sometimes reaching definitions are required, which answer "what locations reach this program point, and what are their definitions?". The SSA form by itself can answer the second part of the question, but not the first part. During the conversion of a program to SSA form, however, this information is available, but is discarded during conversion (the required information is popped off a stack).



Decompilers only need reaching definitions at two sets of program points: the end of procedures, and at call sites. Live variables are also needed at calls, and this information is also available during conversion. Despite this, SSA is generally considered in the literature to be unsuitable for backwards data flow information such as live variables [CCF91, JP93].

These observations are the motivation for definition collectors, which capture reaching definitions, and use collectors, which capture live variables. The collectors store information available during the standard variable renaming algorithm, when it iterates through all uses in a statement. In effect, collectors are exceptions to the usually sparse storage of data flow information in SSA form. The additional storage requirements are modest because they are only needed at specific program points: the end of procedures, and at call statements, as shown in Figure 4.30.
and at call statements, as shown in Figure 4.30.

[Diagram: the call statement in the caller holds a def collector (for bypass, contexts, and arguments at childless calls) and a use collector (for results, and defines at childless calls); the return statement at the end of the callee holds a def collector for the modifieds; the procedure object marks the start of the callee.]

Figure 4.30: Use of collectors for call bypassing, caller and callee contexts, arguments (only for childless calls), results, defines (also only for childless calls), and modifieds.
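To make the two kinds concrete, here is a hypothetical C++ sketch; locations and expressions are shown as plain strings for brevity, and the actual Boomerang classes differ in detail.

    #include <map>
    #include <set>
    #include <string>

    // A definition collector snapshots, for every location, the definition
    // reaching this program point; a use collector accumulates the locations
    // found to be live (used before definition) at this point.
    class DefCollector {
        std::map<std::string, std::string> reaching; // location -> definition
    public:
        // Called during renaming with the top of Stacks[loc] (see Algorithm 4)
        void collect(const std::string& loc, const std::string& def) {
            reaching[loc] = def;        // e.g. "esp" -> "esp0 - 56"
        }
        const std::string* lookup(const std::string& loc) const {
            auto it = reaching.find(loc);
            return it == reaching.end() ? nullptr : &it->second;
        }
    };

    class UseCollector {
        std::set<std::string> liveHere; // locations live at this call/return
    public:
        void collect(const std::string& loc) { liveHere.insert(loc); }
        bool isLive(const std::string& loc) const {
            return liveHere.count(loc) != 0;
        }
    };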

Algorithm 4 shows the algorithm for renaming variables, as per [App02], modified for updating collectors, and for subscripting with pointers to defining statements rather than consecutive integers.

Where the algorithm reads "if can rename a", it is in the sense given in Section 4.2.3, i.e. a is not a memory location, or it is a suitable memory location and memory locations are being propagated for the current procedure.

Recall that at childless calls (call statements whose destination is unknown, or which have not had complete data flow analysis performed to summarise the definitions and uses), all locations are assumed to be defined and used. To implement this successfully, Appel's algorithm is extended. When a childless call is encountered, a definition by the call is pushed to all elements of Stacks (note that Stacks is an array of stacks, one stack for each location seen so far). This slightly complicates removal of these elements, adding the requirement, absent in Appel's algorithm, that the statements be considered in reverse order at Note 1.



Algorithm 4 Renaming variables, with updating of collectors. Adapted from [App02].

Rename(n) =
    for each statement S in block n
        for each use of some location x in S
            if cannot rename x /* Can the memory expression be renamed yet? */
                continue;
            if x is subscripted
                if x refers to a call c
                    add x to use collector of c
            else /* x is not subscripted */
                if Stacks[x] is empty
                    if Stacks[all] is empty
                        def := null; /* No definition, i.e. x0 */
                    else
                        def := Stacks[all].top
                else
                    def := Stacks[x].top
                if def is a call statement c
                    add x to use collector in c
                replace the use of x with xdef in S
        if S is a call or return statement
            update definition collector in S
        for each definition of some location a in S
            push pointer to S onto Stacks[a]
        if S is a childless call
            for each location l in Stacks
                push pointer to S onto Stacks[l]
    for each successor Y of block n /* Successor in the CFG */
        suppose n is the j th predecessor of Y
        for each φ-function in Y
            suppose the j th operand of the φ-function is a
            if Stacks[a] is empty then def := null
            else def := Stacks[a].top
            replace the j th operand with adef
    for each child X of n /* child in the dominator tree */
        Rename(X)
    for each statement S in block n in reverse order /* Note 1 */
        for each definition of some location a in S
            if can rename a
                pop Stacks[a]
        if S is a childless call
            for each location l in Stacks /* Pop all definitions due to childless calls */
                if Stacks[l].top = S
                    pop Stacks[l]

Collectors complicate the handling of statements containing them slightly. For example, when applying a visitor pattern to a call statement, which contains two collectors, should the expressions in the collectors be visited? At times, this is wanted, but not at other times. As an example, it is desirable to propagate into the expressions on the right hand side of definition collectors, so in this case the right hand sides of definition collectors are treated as uses. However, when deciding if a definition is dead, these uses should be ignored.

4.5.1 Collector Applications


Collectors find application in call bypassing, caller/callee context translation, computing returns, and initial arguments and defines at childless calls.

As covered in Section 4.3.3, the definitions of locations reaching calls are needed for call bypassing. The definition collector in call statements provides this expression.

Recall from Section 3.4.2 that an expression in the caller context such as m[esp0 -64] is converted to the callee context (as m[esp0 +4]) using the value espcall of the stack pointer at the call statement. The definition collector in the call collects reaching definitions for every location, so it is simple to compute espcall.


At childless calls, the arguments are initially all definitions that reach the call. These are provided by the definition collector at the call. Similarly, the defines at a childless call are initially all locations live at the call. These are provided by the use collector at the call.

Call results are computed late in the decompilation process. For each call, the results are the intersection of the live variables just after the call (in the use collector of the call) and the defines of the call (which are the modifieds of the callee translated to the context of the caller). The final returns of the callee depend on the union of livenesses (obtained from the use collectors in the callers) over all calls. Thus, if a procedure defines a location that is never used by any caller, it is removed from the returns of the callee.
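The results computation reduces to a set intersection; a minimal sketch follows, with set contents purely illustrative.

    #include <set>
    #include <string>

    // The results of a call are the locations both defined by the call (the
    // callee's modifieds, translated to the caller's context) and live just
    // after the call (the call's use collector).
    std::set<std::string> callResults(const std::set<std::string>& defines,
                                      const std::set<std::string>& liveAfterCall) {
        std::set<std::string> results;
        for (const auto& loc : defines)
            if (liveAfterCall.count(loc))
                results.insert(loc);  // e.g. eax defined by the call and used
        return results;
    }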

4.6 Related Work

With suitable modifications to handle aggregates and aliases well, the SSA form obviates the need for the complexity of techniques such as recency abstraction.

Balakrishnan and Reps propose recency-abstraction as a way of analysing heap-allocated storage in executable programs [BR06]. Figure 4.31 reproduces an example



from that paper.

(a)
void foo() {
    int **pp, a;
    while(...) {
        pp = (int*)malloc(sizeof(int*));
        if(...)
            *pp = &a;
        else {
            // No initialization of *pp
        }
        **pp = 10;
    }
}

(b)
void foo() {
    int **pp, a, b;
    while(...) {
        pp = (int*)malloc(sizeof(int*));
        if(...)
            *pp = &a;
        else {
            *pp = &b;
        }
        **pp = 10;
    }
}

Figure 4.31: The weak update problem for malloc blocks. From Fig. 1 of [BR06].

The problem is that for representations without the single assignment property, information about assignments such as *pp = ... in the example is kept in summary nodes; they must summarise all assignments to *pp in the program. To ensure soundness, value analysis must always over-approximate the set of possible values, so the summary nodes are initialised for all variables to the special value ⊤, representing all possible values. All updates to summary nodes must be weak updates, so the values of the nodes never change. If the information for *pp was initialised to ∅ (the empty set), information such as "*pp points to a" can be established, but in the case of Figure 4.31(a) the fact that *pp is not initialised along some paths has been lost, and this is the very kind of situation that the authors want to capture.

The authors use the elaborate recency-abstraction to overcome this problem while maintaining soundness. However, ignoring alias issues for a moment, single assignment representations such as the SSA form only have one assignment to each variable (or virtual variable, such as *pp in the example). In such a system, all updates can be strong, since the one and only (static) assignment to each variable is guaranteed to change it to the value or expression in the assignment. For example, let pp1 be the variable assigned to at the malloc call. pp1 always points to the last heap object allocated at this site. Following (*pp1)1 = &a and (*pp1)2 = &b, the two versions of (*pp1) always point to a and b respectively.

Information about each variable can be initialised to ⊤ as required for soundness. Following a heap allocation, all elements of the structure (in this case there is only one, *pp) are assigned ⊤ (e.g. (*pp1)0 = ⊤). In Figure 4.31(a), the value for *pp that is
assigned the value 10 will be the result of a φ-function with (*pp1)1 and (*pp1)0 as operands; the value sets for these are &a and ⊤ respectively. The resultant variable will have a value set of ⊤ (i.e. possibly undefined). In Figure 4.31(b), the equivalent value for *pp will be the result of a φ-function with (*pp1)1 and (*pp1)2, which have value sets &a and &b respectively. The result will have the value set {&a, &b} (i.e. *pp could point to a or to b). Figure 4.32 shows the example of Figure 4.31(a) in SSA form.

void foo() {
    int **pp, a;
    while(...) {
        pp1 = (int*)malloc(sizeof(int*));
        (*pp1)0 = ⊤;                      // Value set = {⊤}
        if(...)
            (*pp1)1 = &a;                 // Value set = {&a}
        else {
            // No initialization of *pp
        }
        (*pp1)2 = φ((*pp1)0, (*pp1)1);    // Value set = {⊤}
        *(*pp1)2 = 10;
    }
}
Figure 4.32: The code of Figure 4.31(a) in SSA form.

This considerable advantage comes at a cost; information is stored about every version

of every variable. While the SSA form solves the simple example above, as usually

applied in compilers the SSA form needs to be extended to handle aggregates well.

More importantly, variables such as *pp1 are only uniquely assigned to if no other

expression both aliases to *pp1 and is assigned to. If the SSA form can be extended to

accommodate the complications of aliasing and aggregates, it shows great promise for

obviating the need for techniques such as recency-abstraction.

4.7 Other Representations

Many alternative intermediate representations exist, especially for optimising compilers,

but few offer real advantages for decompilation.

Many intermediate representations (IRs) for program transformations have been sug-

gested, mainly to facilitate optimisations in compilers. The Static Single Assignment

form (SSA form) is one such IR, and there have been several extensions to the basic

SSA form.

The Gated Single Assignment form (GSA form) [TP95] is an extension of the SSA form

designed to make possible the interpretation of programs in that form. This is not
required in a decompiler, since copy statements can be added to make the program

executable. However, it is possible that GSA or similar forms might reduce the number

of copy assignments, and hence increase readability.

4.7.1 Value Dependence Graph


The VDG and other representations abstract away the control flow graph, but the results for decompilation are not compelling.

There is a series of intermediate representations that are based on the idea of abstracting away the Control Flow Graph (CFG). A notable example is the Value Dependence Graph (VDG) [WCES94]. The key difference between the VDG and CFG-based IRs is that the VDG is a parallel representation that specifies a partial order on the operations in the computation, whereas the CFG imposes an arbitrary total order. The authors claim that the CFG, for example by naming all values, "gets in the way" of analysing the underlying computation.

The running example in their paper shows seven different types of optimisations, all facilitated by their representation, which is impressive. However, many optimisations such as loop invariant code motion do not apply to decompilation; the best position for an assignment in source code is the position that maximises readability. One optimisation is name insensitive global common subexpression elimination. Decompilers based on the IR described in this chapter are already inherently name insensitive, in the sense that variables such as acopy (a copy of variable a) in their example are automatically replaced with the original definition by the process of expression propagation. Similarly, code motion into the arm of an if statement is automatically performed by expression propagation.

The VDG does reveal opportunities for parallelisation that might be useful for future decompilers that emit source code in some language that expresses parallelism directly. However, it could be argued that parallelisation opportunities are the job of the compiler, not the programmer, and therefore do not belong in source code.

4.7.2 Static Single Information (SSI)


Static Single Information, an extension of SSA, has been suggested as an improved intermediate representation for decompilation, but the benefits do not outweigh the costs.

Using the SSI form, as opposed to the SSA form, improves a decompiler's type analysis in very rare circumstances. One such case involves a pair of load instructions which are
hoisted ahead of a compare and branch. Figure 4.33 shows the source and machine

code of an example.

#define TAG_INT 1
#define TAG_PTR 2
typedef struct {
    int tag;
    union {
        int a;
        int *p;
    } u;
} tagged_union;

int f(tagged_union *tup) {
    int x;
    if (tup->tag == TAG_INT)
        x = tup->u.a;
    else
        x = *(tup->u.p);
    return x;
}
(a) Source code

// Calling conventions: argument passed in register r0,
// result returned in register r0

// Expected output from a non-optimising compiler:
f:
    LDW r1 ← [r0]
    CMP r1, #1
    BNE label
    LDW r0 ← [r0,#4]
    JMP return
label:
    LDW r0 ← [r0,#4]
    LDW r0 ← [r0]
return:
    RET

// Possible output from an optimising compiler:
f:
    LDW r1 ← [r0]
    LDW r0 ← [r0,#4]    // note that the load has been hoisted
    CMP r1, #1          // above the compare and branch
    BEQ return
    LDW r0 ← [r0]
return:
    RET
(b) Machine code

Figure 4.33: An example program from [Sin03].

The problem is that the hoisted load instruction loads either a pointer or an integer, depending on the path taken in the if statement. In the non-hoisted version, while r0 is sometimes used as a pointer and sometimes as an integer, there are separate definitions for the two different types. With its concept of having only one definition of a variable, SSA is very good at separating the different uses of the register, and sensible output can be emitted, with no type conflicts. The decompiled program will require casts or unions, but so did the original program.

In the compilation with the hoisted load, the one hoisted load instruction (and therefore one definition) produces either an integer or a pointer, and there is a path in the program where that value is returned. In essence, the machine code conflates the two loads, and uses extra information (the tag field) to perform the right operations on the result. When the value loaded is a pointer, control always proceeds to the last
load instruction, which dereferences the pointer and loads an integer. It appears that

the program could take the other path, thereby returning the pointer, but this never

happens, due to the design of the program.

By itself, the SSA form combined with any type analysis is not powerful enough to infer that the type of the result is always an integer. This is indicated by the type equation for the return location r0d of Figure 4.34(a), which is T(r0d) = α ∨ α*. This equation could be read as "the type of variable r0d is either an alpha, or a pointer to an alpha". (α is a type variable, i.e. a variable whose values are types like float or int*. It appears here because until the return location is actually used, there is nothing to say what type is stored at offset 4 in the structure; as soon as the return value is used as an integer, α is assigned the value int.)


(a) SSA form:
    LDW r1a ← m[r0a]
    LDW r0b ← m[r0a+4]
    CMP r1a, #1
    LDW r0c ← m[r0b]
    r0d ← φ(r0b, r0c)
    T(r0d) = α ∨ α*

(b) SSI form:
    LDW r1a ← m[r0a]
    LDW r0b ← m[r0a+4]
    CMP r1a, #1
    r0x, r0y ← σ(r0b)
    LDW r0c ← m[r0y]
    r0d ← φ(r0x, r0c)
    T(r0d) = α

(c) SSA form with propagation:
    LDW r1a ← m[r0a]
    LDW r0b ← m[r0a+4]
    CMP m[r0a], #1
    LDW r0c ← m[m[r0a+4]]
    r0d ← φ(r0b, r0c)
    T(r0d) = α

Figure 4.34: IR of the optimised machine code output from Figure 4.33.

As with all three representations, T(r1a) = int, and r0a has the type struct{int, union{α, α*}}*; in other words, r0a points to a structure with an int followed by a union of an α and a pointer to an α.

When the program is converted to SSI form, as shown in Figure 4.34(b), the decompiler spends extra time and space splitting the live range of every live variable at every control flow split (branch or switch instruction). For example, r0b is split into r0x and r0y, gambling that something different will happen to the split variables to justify this extra effort. In this case the gamble succeeds, because r0y is used as a pointer while r0x remains used only as an integer. This allows each renamed variable to be typed separately, and the final type for r0d is α, as it should be. Hence this program is one case where the SSI form has an advantage over the SSA form.

However, this advantage disappears with propagation, as shown in Figure 4.34(c). While SSI splits all live variables at every control flow split, propagation removes the use of r0b as a temporary pointer value, placing the memory indirection directly where it is used. In other words, the single memory-of operator from the single load instruction is copied into two separate expressions, neatly undoing the hoisting, and neatly
avoiding the consequent typing problem. The variables r0b , r0c , and most importantly
r0d all have a consistent type, α, as expected.

Once again, the combination of the SSA form and expression propagation is found to

be quite useful for a decompiler. The extra overhead of the SSI form (the time and

space to create the σ -functions) has not been found to be useful for decompilation in

this case.

One area where the extra information could potentially be useful for a decompiler is in

the analysis of array sizes. For example, if an array index takes the values 0 through 9

in a loop, then the version of the index variable in the loop can be shown to take the

range 0-9, while the same variable outside the loop could take other values (typically

only one value, 10, after the loop exits). This information could potentially be used to

correctly emit the initialised values for an initialised array. However, decompilers need

to be able to do the same thing even if the index variable has been optimised away,

effectively deducing the existence of an induction variable. The SSA form appears to

be adequate for this purpose [Wol92].

4.7.3 Dependence Flow Graph (DFG)


The Dependence Flow Graph shows promise as a possible augmentation for the Static

Single Assignment form in machine code decompilers.

The Dependence Flow Graph (DFG) can be thought of as an extension of the Static

Single Assignment form [JP93]. In this form, the emphasis is on recording dependencies,

both from uses to definitions and vice versa. Figure 4.35 shows the main loop of the

running example in various intermediate representations.

Note that the arrows are reversed for the SSA form, since in that form, the most compact encoding of the dependencies is for each use to point to the unique definition. This encoding is used in [KCL+99]. However, the dependencies could be recorded as a list of uses for each definition, as per the earlier SSA papers (e.g. [CFR+91, WZ91]), or both. In the DFG literature, dependencies are described as pairs of nodes, so in the DFG form, uses could point to definitions (sometimes via merge operators), and/or definitions could point to their uses (sometimes via switch operators or multiedges). Switch operators differ from multiedges in the DFG form in that there is a second input (not shown in Figure 4.35) that carries information about the uses at the target of the switch operator. For example, at the true output of the switch operator for ebx, it is known that ebx > 0. For the other switch operators, the switch predicate does not give directly useful information, although it may be possible for example to deduce that esi is always greater than zero, so that a divide by zero error or exception will never occur in the loop.

[Diagram: the loop body shown in three renderings over the statements m[esp-28] := edx; st := st *f (double)m[esp-28]; edx := edx-1; m[esp-28] := esi; st := st /f (double)m[esp-28]; esi := esi-1; ebx := ebx-1; if ebx > 0; m[esp-20] := (int)st; edx := m[esp-20]. Legend: control flow; dependence (def-use); dependence (use-def); φ-function; merge operator; switch operator; multiedge. (a) CFG with def-use chains, (b) SSA form, (c) DFG form.]

Figure 4.35: A comparison of IRs for the program of Figure 4.1. Only a few def-use
chains are labelled, for simplicity. After Figure 1 of [JP93].


The presence of switch operators and multiedges makes the DFG more suitable for backwards data flow analyses than the SSA form.

The original control flow graph still exists in the DFG form, so no extra step is necessary to construct a CFG for high level code generation, which would normally require one. The DFG form can also readily be converted to a dataflow graph for execution on a so-called dataflow machine.

The availability of uses for each definition, albeit at some cost for maintaining this information, may make it suitable as a decompiler intermediate representation.



4.8 Summary

The Static Single Assignment form has been found to be a good fit for the intermediate representation of a machine code decompiler.

It has been demonstrated that data flow analysis is very important for decompilation, and that the Static Single Assignment form makes most of these analyses considerably easier. The propagation of memory expressions has been shown to be difficult, but adequate solutions have been given.

Preservation analysis was found to be quite important, and in the presence of recursion, surprisingly difficult. An algorithm has been given which visits the procedures involved in recursion in the correct order, and correctly analyses the preservations.

Definition and use collectors have been introduced to take a snapshot of already computed data flow information that is of particular importance in decompilation.

Finally, an overview of related intermediate representations has been presented.

Overall, the SSA form has been found to be a good fit for the intermediate representation of a machine code decompiler. The main reasons are the extreme ease of propagation, compact storage of use-definition information, ease of reasoning for preservation analysis, simplicity of dead code elimination, and a general suitability for identifying parameters and returns.

The SSA form is conventionally used only on scalar variables. Although numerous attempts have been made to extend it to handle arrays and structures, an extension that works well for machine code programs has yet to be found.
Chapter 5

Type Analysis for Decompilers

The SSA form enables a sparse data flow based type analysis system, which is well suited to decompilation.

Compilers perform type analysis to reject invalid programs early in the software development process, avoiding more costly correction later (e.g. in debugging, or after deployment). Some source languages such as C and Java require type definitions for all variables. For these languages, the compiler checks the mutual consistency of statements which have type implications at compile time, in a process called type checking. Since much type information is required by the compiler of the decompiled source code, and no explicit type information exists in a typical machine code program, a machine code decompiler has considerable work to do to recover the types of variables. This process is called type analysis for decompilers.

Other languages such as Self and Smalltalk require few type definitions; variables can hold different types throughout the program, and most type checking is performed at runtime. Such languages are dynamically type checked. Static (compile time) type analysis is often still performed for these languages, to discover what types a variable could take at runtime, for optimisation. This process, called type inferencing or type reconstruction, is harder for a compiler than type checking [KDM03]. Since C-like languages are more common, this more difficult case will not be considered further here.

The type analysis problem for decompilers is to associate each piece of data with a high-level type. The program can reasonably be expected to be free of type errors, even though some languages such as C allow casting from one type to almost any other. Various different pieces of data require typing: initialised and uninitialised global data, local variables, heap allocated variables, and constants.

Type analysis is conceptually performed after data flow analysis and before control flow


analysis, as shown in Figure 5.1.


[Diagram: input binary file → Loader → Decoder → Data Flow Analysis → Type Analysis → Control Flow Analysis (structuring) → Code generation → output source file. The middle phases operate on the Intermediate Representation (IR); the loader and decoder form the front end, and code generation the back end.]

Figure 5.1: Type analysis is a major component of a machine code decompiler.

Type analysis, even more so than data flow analysis, is far more heavily discussed in the literature from the point of view of compilers. Detailed consideration is given here to the nature of types in machine code programs, and to the requirements for types in the decompiler output.

The type information present in machine code programs can be regarded as a set of constraints to be solved, or as a set of operations to be processed by something similar to data flow analysis. This chapter presents a sparse data flow based type analysis based on the Static Single Assignment form (SSA form). Since type information can be stored with each definition (and constant), there is ready access to the current type from each use of a location. The memory savings from this sparse representation are considerable.

Addition and subtraction instructions, which generate three constraints each, require a different approach in a data flow based type analysis system. Similarly, arbitrary expressions deserve special attention in such a system, as there is nowhere to store types for intermediate expression results.

In any type analysis system for machine code decompilers, there is a surprisingly large variety of memory expressions that represent data objects, from simple variables to array and structure members. These will be enumerated in detail for the first time.

Section 5.1 lists previous work that forms the basis for this chapter. Section 5.2 introduces the nature of types from a machine code point of view, and why they are so important. The sources of type information are listed in Section 5.3. Constants require typing as well as locations, as shown in Section 5.4. Section 5.5 discusses the limitations of solving type analysis as a constraint satisfaction problem. The special requirements of addition and subtraction instructions are considered in Section 5.6, and revisited in Section 5.7.4. An iterative data flow based solution is proposed in Section 5.7, and the benefits of an SSA version are given in Section 5.7.2. The large number of memory location patterns is enumerated in Section 5.8. Decompilers need to process data as well as code, and this process goes hand in hand with type analysis, as discussed in Section 5.9. Section 5.10 mentions some special types useful in type analysis, Section 5.11 discusses related work, and Section 5.12 lists work that remains for the future. Finally, Section 5.13 summarises the contributions of the SSA form to type analysis.

5.1 Previous Work

The work of Mycroft and of Reps et al. has some limitations, but it laid the foundation for type analysis in machine code decompilation.

Mycroft's paper [Myc99] was the first to seriously consider type analysis for decompilation. He recognised the usefulness of the SSA form to begin to undo register-colouring optimisations, i.e. to separate colocated variables. He notes the problem caused by array indexing where the compiler generates more than one instruction for the array element access. Section 5.5.1 discusses this limitation.

Mycroft's work has a number of other limitations; e.g. global arrays do not appear to have been considered. However, the work in this chapter was largely inspired by Mycroft's paper.

Reps et al. describe an analysis framework for x86 executables [RBL06, BR05]. Their goal is to produce an IR that is similar to that which could be produced from source code, but with low-level elements important to security analysis. They ultimately use tools designed to read source code for browsing and safety queries. Figure 5.2 gives an overview.

One of the stated goals is type information, but the papers do not spell out where this information comes from, apart from a brief mention of propagating information from library functions. Their papers mention decompilation as a potential application, but it is not one of their primary areas of interest.

Three main analyses are used: value set analysis (VSA), a form of value analysis; affine relation analysis (ARA); and aggregate structure identification (ASI). VSA finds an overapproximation of the values that a location could take at a given program point. ARA is a source code analysis modified by the authors for use with executable programs. It finds relationships between locations, such as index variables and running pointers. ASI recovers the structure of aggregates such as structures and arrays, including arrays of structures.
[Figure: organisation of CodeSurfer/x86. IDA Pro disassembles the executable and builds CFGs, producing an initial estimate of code vs. data, procedures, call sites and malloc sites. A connector feeds CodeSurfer, which runs VSA and ARA, applies ASI, and builds the SDG, producing fleshed-out CFGs and call graph, used/killed/may-killed variables for CFG nodes, points-to sets, and reports of violations. Clients include user scripts, WPDS++, the Path Inspector, a browser, a decompiler and a code rewriter.]

Figure 5.2: Organisation of the CodeSurfer/x86 and companion tools. From [RBL06].

The authors report results that are reasonable, particularly compared to the total failure of most tools to analyse complex data structures in executable programs. However, only 55% of virtual functions were analysed successfully [BR06], and 60% of the programs tested have one or more virtual functions that could not be analysed [BR06, BR05]. 72% of heap allocated structures were analysed correctly (i.e. the calculated structure agreed with the debugging information from the compiler), but for 20% of the programs, 12% or less were correct [BR05]. As will be shown later, the form of memory expressions representing accesses to data items, particularly to aggregate elements, is surprisingly complex. In particular, it is not clear that their analyses can separate original from offset pointers (Section 1.5.3 on page 19). It is hoped that an analysis that takes into account all the various possible memory expression forms will be able to correctly analyse a larger proportion of data accesses, while also finding types for each data item.

5.2 Type Analysis for Machine Code

Type information encapsulates much that distinguishes low level machine code from high level source code.

The nature and uses of types will be considered in the next sections from a machine code point of view.

5.2.1 The Role of Types

Types are assertions; they partition the domain of program semantics, and partition the data into distinct objects.

When a programmer declares variables to be of type integer or class Customer in a statically type checked language, assertions are made about what values the variables can take, and what operations can usefully be performed on them. Part of the domain of program semantics applies to variables of type integer (e.g. a shift left by 2 bits), and other parts (e.g. a call to method getName()) apply to objects of type class Customer. When a program is found to be applying the wrong semantics to a variable, it can be shown that the program violates the typing rules of the language, so an error message can be issued at compile time.

Types include information about the size of a data object. Types partition the data sections from blocks of bytes into sets of distinct objects with known properties.

In a machine code program, type declarations have usually been removed by the compiler. (An exception is debugging information, inserted during software development, which may still be present in some cases.) When decompiling the program, therefore, the fact that a variable is at some point shifted left by two is a strong indication that it should be typed as integer and not as class Customer.

Similarly, when four bytes of data are used as an integer, those four bytes become distinct from other data objects in the data sections. It is unlikely that these four bytes are part of another data object, a string for example, that happens to precede them in the data section. In this way, the decompiler can effectively reconstruct the mapping from addresses to types present in the symbol table of the original compiler.

The view of types as assertions leads to the idea of generating constraints for each variable whenever it is used in a context that implies something about its type. Not all uses imply full type information. For example, a 32-bit copy (register to register move) instruction only constrains the size of the type, not whether it is integer, float, or pointer. The above assumes that pointers are 32 bits in size. Programs generally use one size for pointers, which can be found by examining the binary file format. The examples in this chapter will assume 32-bit pointers; obviously other sizes can be accommodated.

Some operations imply a basic type, such as integer, but no detail of the type, such as its signedness (signed versus unsigned). The shift left operator is an example of this. Operators such as shift right arithmetic imply both the basic type (integer) and the signedness (in this case, signed). The partial nature of some type information leads to the notion of a hierarchy or lattice of types. The type unsigned short is more precise than short integer (where the signedness is not known), which is more precise than size16, which in turn is more precise than ⊤ (no type information at all).

In object oriented programs, an important group of types are the class types. It is desirable to know, for example, whether some pointer is of type Customer* or type Employee*. Such type information does not come directly from the semantics of individual instructions, but from class hierarchy analysis.

A program expressed in a high level language must contain some type declarations to satisfy the rules of the language. However, it is possible to not use the type system except at the most trivial level. Consider for example a binary translator which makes no attempt to comprehend the program it is translating, and copies the data section from the original program as one monolithic block of unstructured bytes. Suppose also that it uses the C language as a target machine independent assembler; the UQBT binary translator operates this way [CVE00, CVEU+99, UQB01]. Such low level C emitted by the translator will contain type-related constructs, since the C language requires that each variable definition is declared with a type, but all variables are declared as integer, and casts are inserted as needed to reproduce the semantics of the original program. Memory is referenced by expressions of the form *(<type>*)(<address expression>).

Code such as this is essentially a low level disassembly expressed in a high level language. One of the main features absent from this kind of code is real type information for the data.

5.2.2 Types in High Level Languages

Types are essential to the expression of a program in high level terms: they contribute readability, encapsulate knowledge, separate pointers from numeric constants, and enable Object Orientation.

From the above discussion, it is clear that types are an essential feature of code written in high level languages, and without them, readability will be extremely poor.

The separation of pointers and numeric constants is implicit in deciding types for variables. An immediate value in an instruction (which in a disassembly could represent the address of some structure in memory, an integer number, or a floating point number) is clearly a pointer or a constant once assigned a type. Recall from Section 1.5 that separating pointers from numeric constants is one of the fundamental problems of reverse engineering from machine code.

One of the differences between machine code programs and their high level language equivalents is that features such as null pointer checks and array bounds checks are usually implicitly applied by the compiler. In other words, they are present at the machine code level, but a good decompiler should remove them. Type analysis is needed for this removal; for a null pointer check to be removed, the analysis must find that a variable is a pointer. Similarly, arrays and array indexes must be recognised before array bounds checks can be removed. It could be argued that if the null pointer or array bounds checks are removed by a decompiler but were present in the original source code, the decompiled program is still correct.

Recursive types are types where part of a type expression refers to itself. For example, a linked list may have an element named Next which is a pointer to another element of the same linked list. Care must be taken when dealing with such types. For example, a simple algorithm for printing a type as a string is as follows: when encountering a pointer, emit a star and recurse with the child of the pointer type. This works for non-recursive types, but fails for the linked list example (it will attempt to emit an infinite series of stars). This problem is of course not unique to decompilers.
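A cycle-safe version of the printing algorithm records the types already visited on the current path. The following C sketch illustrates the idea; the Type representation and all names here are invented for illustration, and are not taken from any particular decompiler:

    #include <stdio.h>

    typedef enum { T_INT, T_CHAR, T_STRUCT, T_POINTER } Kind;
    typedef struct Type {
        Kind kind;
        const struct Type *child;   /* pointee, for T_POINTER */
        const char *name;           /* tag, for T_STRUCT      */
    } Type;

    /* Print a type; 'seen' holds the types already on the current
     * path, 'depth' says how many are recorded.  A repeat visit
     * means the type is recursive, so recursion is cut short. */
    static void printType(const Type *t, const Type **seen, int depth) {
        for (int i = 0; i < depth; i++)
            if (seen[i] == t) {                 /* cycle detected */
                printf("%s", t->name ? t->name : "<recursive>");
                return;
            }
        seen[depth] = t;
        switch (t->kind) {
        case T_INT:    printf("int");  break;
        case T_CHAR:   printf("char"); break;
        case T_STRUCT: printf("struct %s", t->name); break;
        case T_POINTER:
            printType(t->child, seen, depth + 1);
            printf("*");
            break;
        }
    }

Without the seen check, a self-referential pointer type loops forever; with it, the printer emits a name (or a placeholder) and stops.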

5.2.3 Elementary and Aggregate Types

While elementary types emerge from the semantics of individual instructions, aggregate types must at times be discovered through stride analysis.

Most machine instructions deal with elementary (simple) types. Such types include integers and floating point numbers. Enumerations and function pointers are represented at the machine code level as integers. Aggregate types are combinations of elementary types: arrays, structures, and unions of elementary types.

The other main class of types (other than elementary and aggregate types) are the pointers. These will be considered in more detail in the sections that follow.

Aggregate types are usually handled at the machine code level one element at a time. The few machine instructions which deal with aggregate types, such as block move or block set instructions, can be broken down into more elementary instructions in a loop. While machine instruction semantics will often determine the type of an elementary data item, aggregate types can generally only be discovered as emerging from the context of elementary instructions, e.g. elementary operations performed in a loop, or some elementary operation performed at a fixed offset from a pointer.

Figure 5.3 shows source and machine code that uses elementary and aggregate types. Note how on a complex instruction set such as the Intel x86, objects of elementary types are accessed with simple addressing modes such as m[r1], while the aggregate types use more complex addressing modes such as m[r1+r2*S] (addressing mode (r1,r2,S)) and m[r1+K] (r1 and r2 are registers, and S and K are constants).
(a) Source code:

    void process(int& i, float& f, int* a, s* ps) {
        i += 1;
        f += 2.;
        a[i] += 3;
        ps->b += 4;
    }

(b) Machine code:

    mov 0x8(%ebp),%edx      ; pi
    mov (%edx),%eax         ; i
    inc %eax                ; i+1
    mov %eax,(%edx)         ; save, eax=i
    mov 0xc(%ebp),%ecx      ; pf
    fld 0x402000            ; 2.0
    fadd (%ecx)             ; f+2.0
    fstp (%ecx)             ; store
    mov 0x10(%ebp),%ebx     ; pa
    add $3,(%ebx,%eax,4)    ; a[i] += 3
    mov 0x14(%ebp),%esi     ; ps
    add $4,4(%esi)          ; ps->b += 4

Figure 5.3: Elementary and aggregate types at the machine code level.

5.2.4 Running Pointers

While running pointers and array indexing are equivalent in most cases, running pointers pose a problem in the case of an initialised array.

C programmers are aware that it is possible to access an array either by using indexing or by incrementing a pointer through the array. Generally, pointers are more efficient than indexing; as a result, a program written with indexing may be converted by an optimising compiler to equivalent code that manipulates pointers. This implies that a decompiler is free to represent array handling code either way. Users may prefer one representation over the other. It could be argued that the representation can safely be left to a runtime option or an interactive user choice. Alternatively, one representation could always be produced, and a post decompilation transformation phase used if the other representation is desired.

The use of running pointers on arrays has an important implication for the recovery of pre-initialised arrays. Simplistic type analysis expecting to see indexing semantics for arrays (e.g. [Myc99]) may correctly decompile the code section of a program, but fail to recover the initial values for the array. Figure 5.4 illustrates the problem.

In Figure 5.4(a), the type analysis discovered an array, and in order to declare the array properly, is prompted to analyse the size of the array. If it is in a read-only section of the binary file, its size and type can be used to declare initial values for the array (not shown in the example). In Figure 5.4(b), the type of p is char*, and the type of the constants is also char*. The fact that address 10 000 is used as a char may prompt one char to be declared as a character, and if it is in a read-only section of the binary file, it may be given an initial value. Note that there is no prompting to declare the other nine values as type char.
(a) Using an indexed array:

    char a[10];
    int i=0;
    while (i<10) {
        process(a[i]);
        ++i;
    }

(b) Using a running pointer:

    char* p;
    p = (char*) 10000;
    while (p < (char*) 10010) {
        process(*p);
        ++p;
    }

Figure 5.4: The decompilation of two machine code programs processing an array of ten characters.

Worse, the constant 10 010 is used as type char*, which may prompt the type analysis to declare the object at address 10 010 to be of type char, when in fact this is just the next data item in the data section after array a. That object could have any type, or be unused. This demonstrates that whenever a constant K is used with type Y*, it does not always follow that location m[K] is used with type Y.

Programs written in the style of Figure 5.4(a) can be transformed by compilers to machine code programs in the style of Figure 5.4(b).

The size of the array in Figure 5.4(a) is relatively easy to determine. In other cases, it may be difficult or impossible to determine the length of the array. For example, a special value may be used to terminate the loop. The loop termination condition could be arbitrarily complex, necessitating in effect that the program be executed to find the terminating condition. Even executing the program may not determine the full size of the array, since for each execution, only part of the array may be accessed. In these cases, type analysis may determine that there is an initialised array, but fail to compute the length of the array.

(a) Original program:

    typedef struct {
        int i;
        float f;
    } s;
    s as[10];
    s* ps = as;
    while (ps < as+10) {
        processInt(ps->i);
        processFloat(ps->f);
        ++ps;
    }

(b) Decompilation:

    void* p = (void*) 10000;
    while (p < (void*) 10080) {
        processInt(*(int*)p);
        p += 4;
        processFloat(*(float*)p);
        p += 4;
    }

Figure 5.5: A program referencing two different types from the same pointer.
Another case is where an array of structures is traversed with running pointers, as shown in Figure 5.5(a). Here the structure contains one integer and one float; inside the loop, the pointer references alternately an integer and a float. The original compiler has rearranged the code to increment the pointer by the size of the integer and the size of the float after each reference, to make the machine code more compact and efficient. As shown in Figure 5.5(b), the type of the pointer is now void*, since it is used as two incompatible pointer types.

Programs such as those shown in Figures 5.4(b) and 5.5(b) are less readable but correct (assuming suitable casts and allocating memory at the appropriate addresses) if the array is not initialised. However, the only way to make them correct if the array is initialised is to use binary translation techniques: force the data to the original addresses, and reverse the data sections if the endianness of source and target machines is different. Needless to say, the results are far from readable, and translation to machines with different pointer sizes (among other important characteristics) is simply not feasible. To avoid this extreme unreadability, it is necessary to analyse the loops containing the pointer based code, and find the range of the pointer(s).

Pointer range analysis for decompilers is not considered here; analyses such as the value set analysis of Reps et al. may be suitable [RBL06].

5.3 Sources of Type Information

Type information arises from machine instruction opcodes, from the signatures of library functions, to a limited extent from the values of some constants, and occasionally from debugging information.

One of the best sources of type information in a machine code program is the set of calls to library functions. Generally, the signature of a library function is known, since the purpose of a library function is to perform a known and documented function. A signature consists of the function name, the number and type of parameters, and the return type (if any) of the function. This information can be stored in a database of signatures derived from parsing header files, indexed by the function name (if dynamically linked) or by an identifier obtained by pattern matching statically linked code [FLI00, VE98].
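As a sketch of the shape such a database might take, consider the C fragment below. The entries and field names are invented for illustration; only the general shape (name plus parameter and return types) is implied by the text:

    #include <string.h>

    typedef struct {
        const char *name;           /* e.g. "strlen"  */
        const char *returnType;     /* e.g. "size_t"  */
        int numParams;
        const char *paramTypes[8];
    } Signature;

    static const Signature sigDb[] = {
        { "strlen", "size_t", 1, { "const char*" } },
        { "fgets",  "char*",  3, { "char*", "int", "FILE*" } },
    };

    /* Dynamically linked calls are looked up directly by name;
     * statically linked code is first matched against code patterns
     * to recover the name. */
    static const Signature *lookupSignature(const char *name) {
        for (size_t i = 0; i < sizeof sigDb / sizeof sigDb[0]; i++)
            if (strcmp(sigDb[i].name, name) == 0)
                return &sigDb[i];
        return NULL;
    }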

A limited form of type information comes from the value of some constants. In many architectures, certain values can be ruled out as pointers (e.g. values less than about 0x100). One of the major decisions of type analysis is whether a constant is a pointer or not, so this information can be valuable.


A third source of type information is the semantics of individual machine instructions. All machine instructions imply a size for each non-immediate operand. Registers have a definite size, and all memory operations have a size encoded into the opcode of the instruction. For instructions such as some load, store and move instructions, there is no information about the type other than the size, and that the types of the source and destination are related by T(dest) ≥ T(src). (T(x) denotes the type of x. The inequality only arises with class pointers; the destination could point to at least as many things as the source.) The same pointer-sized move instruction could be used to move a pointer, integer, floating point value, etc. As a result, there needs to be a representation for a type whose only information is the size, e.g. size16 for any 16-bit quantity.

In some machine code programs, there is runtime type information (sometimes called RTTI or runtime type identification). This is typically available where the original program used the C++ dynamic_cast operation or similar. Even if the original source code did not use such operations explicitly, the use of certain libraries such as the Microsoft Foundation Classes (MFC) may introduce RTTI implicitly [VEW04]. When RTTI is present in the input program, the names and possibly also the hierarchy of classes are available. Class hierarchy provides type information; for example if class A is a parent of class B, then T(A) > T(B).

Finally, it is possible that the input program contains debug information or symbols. Debugging information is usually switched on during program development to aid with debugging. It is possible that in some circumstances, debugging information will still be present in the machine code available to a decompiler. When debug information is available in the input program, the names of all procedures are usually available, along with the names and types of parameters. The types of function returns and local variables may also be available, depending on the details of the debug information present.

Most type information travels from callee to caller, which is convenient for decompilation because most of the other information (e.g. information about parameters and returns) also travels from callee to caller. However, some type information travels in the opposite direction. There is symmetry between arguments/parameters and returns/results, with values sent from argument to parameter and from return to result. The direction that type information from the sources discussed above travels depends on whether the library function, constant, machine instruction, etc. resides in the caller or the callee.
sized) parameters. The only type information known about the parameters of this
Consider the simple example of a function that returns the sum of its two 32-bit (pointer-sized) parameters. The only type information known about this function, taken in isolation, is that both parameters and the return are 32 bits in size, and not of any floating point type. In other words, several combinations of pointer and integer are possible. If types are found for either of the callers' actual arguments or results, type information will flow from caller to callee.

This example also highlights a problem with type analysis in general: there may be several possible solutions, all valid. The programs for adding a pointer and an integer, an integer and a pointer, and two integers, could all compile to identical machine code programs. This problem is most acute with small procedures, however, since the chance of encountering a use that implies a specific type increases as larger procedures are considered.

Type analysis is considered a bidirectional problem [KDM03]. In other words, if implemented as a data flow problem, neither forward nor reverse traversal of the flow graph will result in dramatically improved performance. The bidirectionality has its roots in the fact that most type defining operations affect a destination (a definition) and some operands (uses). The type effects of definitions flow forward to uses of that definition; the type effects of uses flow backwards to the definition(s) of those uses. Library function calls similarly have bidirectional effects. Call by value actual arguments to the library calls are uses of locations defined earlier (type information flows backwards). Return value types and call by reference arguments are definitions whose type effects flow forwards. As a result, type analysis is one of the few problems that can be expressed in data flow terms and is truly bidirectional.

Comparison operations result in type constraints that may not be intuitive. The equality relational operators (= and ≠) imply only that the types of the variables being compared are comparable. (If two types are comparable, it means that they are compatible in the C sense; int and char* are not compatible, so they can't be compared.) The type of either operand could be greater than or equal to the type of the other operand, since the equality relations are commutative, as shown in Figure 5.6.

    void AddEmployee(CEmployee* emp, CManager* mgr) {
        if (emp == mgr) ...
        ...
        if (mgr != emp) ...
    }

[Figure: the accompanying class hierarchy has CEmployee as the parent of CManager and CWorker.]

Figure 5.6: Pointer equality comparisons.

By contrast, the non-commutative relational operators such as ≤ imply an array of uniformly typed objects, since high level languages do not allow assumptions about the relative addresses of other objects. Figure 5.7(b) shows an example where the constant a+10 should be typed as int*, even though the same numeric constant may be used elsewhere as the address of the float array f. These operators therefore imply that the types of the operands are equal.

(a) Original C:

    int a[10] = {1, 2, ...};
    float f[10];
    int i=0;
    while (i < 10) {
        process(a[i]); ...

(b) Optimised effective code:

    int a[10] = {1, 2, ...};
    float f[10];
    int* p=a;
    while (p < a+10) {
        process(*p); ...

Figure 5.7: A pointer ordering comparison.

Some relational operators imply a signedness of the operands (e.g. signed less than or equal). Signedness mismatches of integral operands are often ignored by compilers, hence signed integers, unsigned integers, and integers of unknown sign should not be regarded as distinct types. Rather, signedness of an integer type is a property, and the final declared signedness of a variable can be decided by a heuristic such as the statically most commonly used variant.

Usually, a different comparison operator is used when comparing integers and floating point numbers, so the basic type is implied by these operators. Fixed point numbers will often be manipulated by integer operators (e.g. comparison operators, add and subtract), with only a few operations (e.g. fixed point multiply) identifying the operands as fixed point. It may therefore be possible to promote integers to fixed point numbers, i.e. fixed-point ⊏ integer.

5.4 Typing Constants

Constants have types just as locations do, and since constants with the same numeric value are not necessarily related, constants have to be typed independently.

In high level languages, constants have implied types. For example, the type of 2.0 is double; 'a' is char, 0x61 is integer, and so on. However, in decompilation, this does not apply. Constants require typing just as locations do. Depending on the type of the constant, the same immediate value (machine language constant) might end up emitted in the decompiled code as 1084227584, 5.000000, &foo, ENUM_40A00000, or possibly other representations. It is important to realise that every constant has to be typed independently. For example, in the following statement, each of the constants with value 1000 could be a different type:

    m[1000₁] := m[1000₂] + 1000₃

The three solutions are:

• T(1000₁) = int* and T(1000₂) = int* and T(1000₃) = int
• T(1000₁) = α** and T(1000₂) = α** and T(1000₃) = int
• T(1000₁) = α** and T(1000₂) = int* and T(1000₃) = α*

where each α represents an arbitrary type, but each appearance of α implies the same arbitrary type as the other αs. In the third solution, 1000₃ is a pointer to an α, which is added to an integer (found at memory address 1000), yielding a pointer to an α, which overwrites the integer that was temporarily at location 1000. Casts must have been used in the original program for the third solution to be valid, and will be needed in the decompiled output.
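For concreteness, the third solution could be rendered in C roughly as follows (assuming α = char; the fragment is purely illustrative, since dereferencing a literal address is only meaningful in the original program's address space):

    /* m[1000₁] := m[1000₂] + 1000₃ under the third typing:
     *   1000₁ : char**    1000₂ : int*    1000₃ : char*   */
    *(char **)1000 = (char *)1000 + *(int *)1000;

Note how the same numeric constant 1000 appears with three different types, each requiring its own cast.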

Note also that the value in a register can be an intermediate value that has no type, e.g. in the following SPARC code:

    sethi 0x10 000, %o0
    add   %o0, 0x500, %o1   // %o1 used later as char*, value 0x10 500
    add   %o0, 0x600, %o2   // %o2 used later as integer, value 0x10 600

The value 0x10 000 in register %o0 has no standard type, although it appears to be used as part of two different types, char* and integer. It must not appear in the decompiled output. The intermediate value is a result of a common feature of RISC machines: since RISC instructions are usually one word in length, two instructions are needed to produce most word-length constants. Constant propagation and dead code elimination, both facilitated by the SSA form, readily solve this problem by removing the intermediate constants.
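To illustrate, after propagation and dead code elimination the fragment above might decompile to something like the following (a sketch; the variable names are invented):

    char *o1 = (char *)0x10500;   /* sethi+add collapsed to one constant */
    int   o2 = 0x10600;           /* the intermediate 0x10000 is gone    */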

5.5 Type Constraint Satisfaction

Finding types for variables and constants in the decompiled output can be treated as a constraint satisfaction problem; its specific characteristics suggest an algorithm that makes strong use of constraint propagation.

Assigning types to decompiled variables can be considered to be a Constraint Satisfaction Problem (CSP) [VH89, Bar98]. The domain of type variables is relatively small:

• elementary types (enumerated in Section 5.2.3)
• pointer to α (where α is any type, including another pointer)
• an array of α (where α is any type, including another array)
• a structure or class
• an enumerated type

Figure 5.8 shows a program fragment in source code, machine code (with SSA transformation), the resulting constraints, and the solutions to the constraints.

In the second instruction, the register r1a (first SSA version of register r1) is set to 0. Since zero could be used as an integer but also as a NULL pointer, the constraints are that r1a could be an integer (t1a = int) or a pointer to something, call it α1 (t1a = ptr(α1)). The constraints generated by the add instruction are more complex, reflecting the fact that there are three possibilities: pointer + integer = pointer, integer + pointer = pointer, and integer + integer = integer respectively. The constraints are normally solved with a standard constraint solver algorithm, but parts of a simple problem such as that of Figure 5.8 can be solved by eye. For example, the constraint for the first load instruction has only one possibility, t0b = ptr(mem(0: t2a)), meaning that r0b points to a structure in memory with type t2a at offset 0 from where r0b is pointing. This can be substituted into the constraints for the instruction at the label L3F2:, so the types for t0a and t0c must be the same as this.

Continuing the constraint resolution, two solutions are found. In most cases, there would not be more than one solution.

Because constants are typed independently (previous section), and expressions (including constants) have to be expressed in terms of constraints, it is necessary to differentiate constants in some way. For example, constants could be subscripted much as SSA variables are renamed, e.g. 1000₃ as has already been seen. The type of each version of the constant 1000 needs to be treated as a separate variable.

Type constants (values for type variables which are constant) are common. For example, an xor instruction implies integer operands and result; a sqrt instruction implies floating point operands and result; an itof instruction implies an integer operand and a floating point result. A library function call results in type constants for each parameter (if any) and the result (if any).

Choosing a value for a type variable (in CSP terminology) is often implicit in the constraint. For example, the add instruction of Figure 5.8, add r2a,r1b,r1c, results in the constraints
in the constraints
(a) Original C program:

    int f(struct A *x)
    { int r=0;
      for (; x != 0; x = x->tl)
          r += x->hd;
      return r;
    }

(b) Legend:

    The destination is the last operand.
    r0a, r0b are SSA versions of register r0.
    t1a = the type of r1a.
    mem(n:t) means a pointer offset n bytes from pointer t.

(c) SSA machine code and (d) constraints:

    f:                              tf = t0 → t99
        mov r0,r0a                  t0 = t0a
        mov #0,r1a                  t1a = int ∨ t1a = ptr(α1)
        cmp #0,r0a                  t0a = int ∨ t0a = ptr(α2)
        beq L4F2
    L3F2: mov φ(r0a,r0c),r0b        t0b = t0a, t0b = t0c
        mov φ(r1a,r1c),r1b          t1b = t1a, t1b = t1c
        ld.w 0[r0b],r2a             t0b = ptr(mem(0: t2a))
        add r2a,r1b,r1c             t2a = ptr(α3), t1b = int, t1c = ptr(α3) ∨
                                    t2a = int, t1b = ptr(α4), t1c = ptr(α4) ∨
                                    t2a = int, t1b = int, t1c = int
        ld.w 4[r0b],r0c             t0b = ptr(mem(4: t0c))
        cmp #0,r0c                  t0c = int ∨ t0c = ptr(α5)
        bne L3F2
    L4F2: mov φ(r1a,r1c),r1d        t1d = t1a, t1d = t1c
        mov r1d,r0d                 t0d = t1d
        ret                         t99 = t0d

Solving the above constraints leads to the following failure:

    t0c = t0b = ptr(mem(4: t0c)) = ptr(mem(0: t2a))

This is repaired by using struct G { t2a m0; t0c m4; ... }, i.e.

    t0c = ptr(mem(0: t2a, 4: t0c)) = ptr(struct G)

Continuing the process yields the following two solutions:

    t0 = t0a = t0b = t0c = ptr(struct G)
    t99 = t1a = t1b = t1c = t1d = t2a = t0d = int
    tf = ptr(struct G) → int

and the parasitic solution:

    t0 = t0a = t0b = t0c = ptr(struct G)
    t2a = int
    t99 = t1a = t1b = t1c = t1d = t0d = ptr(α4)
    tf = ptr(struct G) → ptr(α4)

Parasitic solutions are unlikely with larger programs, and Mycroft suggests ways to avoid them in his paper.

Figure 5.8: A simple program fragment typed using constraints. From [Myc99].
    t2a = ptr(α3), t1b = int, t1c = ptr(α3) ∨
    t2a = int, t1b = ptr(α4), t1c = ptr(α4) ∨
    t2a = int, t1b = int, t1c = int

Here, the constraints are expressed as a disjunction (or-ing) of conjuncts (some terms and-ed together; the commas here imply and). Finding values for t2a, t1b, and t1c happens simultaneously, by choosing one of the three conjuncts. The choice is usually made by rejecting conjuncts that conflict with other constraints. Hopefully, at the end of the process, there is one set of constraints that represents the solution to the type analysis problem.

Some constraint solvers are based on constraint propagation (e.g. the forward checking technique). Others rely more on checking for conflicts (e.g. simple backtracking, or generate and test). Taken together, the factors noted above (small domain size, type constants and equates being common, and so on) indicate that constraint propagation will quickly prune branches of the search tree that would lead to failure. Therefore, it would appear that constraint propagation techniques suit the problem of type constraint satisfaction as it applies to decompilation better than the other group of techniques.
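A minimal sketch of this kind of pruning for the add instruction's disjunction follows, in C; the three-value domain and all names are assumptions of this sketch:

    typedef enum { TY_TOP, TY_INT, TY_PTR } Ty;   /* crude type domain */
    typedef struct { Ty a, b, c; } Conjunct;      /* one alternative
                                                     for c := a + b    */
    static const Conjunct addAlts[3] = {
        { TY_PTR, TY_INT, TY_PTR },   /* pointer + int = pointer */
        { TY_INT, TY_PTR, TY_PTR },   /* int + pointer = pointer */
        { TY_INT, TY_INT, TY_INT },   /* int + int     = int     */
    };

    /* A known type is consistent with a required one if it is still
     * TOP (unknown) or identical. */
    static int consistent(Ty known, Ty required) {
        return known == TY_TOP || known == required;
    }

    /* Constraint propagation: discard alternatives that conflict with
     * what is already known about a, b and c.  One survivor means the
     * types are decided; zero survivors means an inconsistency. */
    static int prune(Ty a, Ty b, Ty c, Conjunct out[3]) {
        int n = 0;
        for (int i = 0; i < 3; i++)
            if (consistent(a, addAlts[i].a) &&
                consistent(b, addAlts[i].b) &&
                consistent(c, addAlts[i].c))
                out[n++] = addAlts[i];
        return n;
    }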

A weakness of constraint-based type analysis algorithms is that some such algorithms are incomplete: a given algorithm may find one or more solutions, or prove that the constraints cannot be solved (prove they are inconsistent), or it may not be able to do either.

5.5.1 Arrays and Structures

Constraint-based type analysis requires extra rules to handle arrays correctly.

Mycroft proposed a constraint based type analysis system for decompilation of machine code programs (in Register Transfer Language, or RTL, form) [Myc99]. He generates constraints for individual instructions, and solves the constraints to type the variables of the program being decompiled. He assumes the availability and use of double register addressing mode instructions to signal the use of arrays. For example, from Section 4 of his paper:

    instruction          generated constraint
    ld.w (r5)[r0],r3     t0 = ptr(array(t3)), t5 = int ∨ t0 = int, t5 = ptr(array(t3))

However, some machine architectures do not support two-register indexing, and even if they did, a compiler may for various reasons decide to perform the addition separately from the load or store instruction. Hence, the above instruction may be emitted as two instructions, with constraints as shown in Figure 5.9.

    instruction       generated constraint
    add r5,r0,r1      t5 = ptr(α) ∧ t0 = int ∧ t1 = ptr(α) ∨
                      t5 = int ∧ t0 = ptr(α) ∧ t1 = ptr(α) ∨
                      t5 = int ∧ t0 = int ∧ t1 = int
    ld.w 0[r1],r3     t1 = ptr(mem(0: t3)) ∧
                      (t5 = int ∧ t0 = ptr(t3) ∨ t5 = ptr(t3) ∧ t0 = int)

Figure 5.9: Constraints for the two-instruction version of the above. Example from [Myc99].

The last conjunct for the first instruction (t5 = int ∧ t0 = int ∧ t1 = int) is immediately removed, since r1 is used as a pointer in the second instruction. The final constraints are now in terms of ptr(t3), rather than ptr(array(t3)). These are equivalent in the C sense, but the fact that an array is involved is not apparent. In other words, considering individual instructions by themselves is not enough (in at least some cases) to analyse aggregate types. Either some auxiliary rule has to be added outside of the constraint system, or expression propagation can be used in conjunction with a high level type pattern. (In Section 4.3 of his paper, Mycroft seems to suggest considering pairs of instructions to work around this problem.) This is another area where expression propagation and simplification as described in Chapters 3 and 4 could be used.

5.6 Addition and Subtraction

Compilers implicitly use pointer-sized addition instructions for structure member access, leading to an exception to the general rule that adding an integer to a pointer of type α* yields another pointer of type α*.

Pointer-sized addition and subtraction instructions are a special case for type analysis, since they can be used on pointers or integers. Mycroft [Myc99] states the type constraints for the instruction add a,b,c (where c is the destination, i.e. c := a + b) to be:

    T(a) = ptr(α), T(b) = int, T(c) = ptr(α) ∨
    T(a) = int, T(b) = ptr(α'), T(c) = ptr(α') ∨
    T(a) = int, T(b) = int, T(c) = int

where again T(x) represents the type of x (Mycroft uses tx) and ptr(α) represents a pointer to any type, with the type variable α representing that type. Mycroft uses the C pointer arithmetic rule, where adding an integer to a variable of type α* will always result in a pointer of type α*. However, the C definition does not always apply at the machine code level. Compilers emit add instructions to implement the addition operator in source programs, but also for two other purposes: array indexing and structure member access. For array indexing, the C pointer rules apply. The base of an array of elements of type α is of type α*, the (possibly scaled) index is of type integer, and the result is a pointer to the indexed element, which is of type α*.

However, structure member access does not follow the C rule. Consider a structure of type Σ with an element of type ε at offset K. The address of the structure could be of type Σ* (a pointer to the structure) or of type ε₀* (a pointer to the first element ε₀ of the structure), depending on how it is used. At the machine code level, Σ and Σ.ε₀ have the same address, and are not distinguishable except by the way that they are used. K is added to this pointer to yield a pointer to the element ε, which is of type ε*, in general a different type to Σ* or ε₀*.

Unfortunately, until type analysis is complete, it is not known whether any particular pointer will turn out to be a structure pointer or not. Figure 5.10 gives an example.
    void foo(void* p) {
        m[p] := ftoi(3.00);     // Use p as int*
        ...
        m[p+4] := itof(-5);     // Use p as pointer to struct with float at offset 4
    }

Figure 5.10: A program fragment illustrating how a pointer can initially appear not to be a structure pointer, but is later used as a structure pointer.

Initially, parameter p is known only to be a pointer. After processing the first statement of the procedure, p is used with type int*, and so T(p) is set to int*. In the second statement (in general, any arbitrary time later), it is found that p points to a struct with an int at offset 0 and a float at offset 4. If the C rule were used, that the sum of p (then of type int*) and 4 results in a pointer of the same type as p, then in the second statement p would effectively be used as both type int* and type float*. This could lead to p being declared as a union of int* and float*.

Note that the exception to the otherwise general C rule involves only the addition of a structure pointer and a constant integer; compilers know the offsets of structure members and keep this information in the symbol table for the structure. Hence if the integer is not a constant, the types of the result and the input pointer can be constrained to be equal, i.e. Mycroft's constraints apply.

To avoid this problem, Mycroft's constraints could be rewritten as follows:

    T(a) = ptr(α), T(b) = var-int, T(c) = ptr(α) ∨
    T(a) = void*, T(b) = const-int, T(c) = void* ∨
    T(a) = var-int, T(b) = ptr(α'), T(c) = ptr(α') ∨
    T(a) = const-int, T(b) = void*, T(c) = void* ∨
    T(a) = int, T(b) = int, T(c) = int

Here, void* represents a variable known to be a pointer, but the type pointed to is unknown (output) or is ignored (input). Where there is a conflict between void* and an α*, α* would be used by preference. Note that this is already suggesting a hierarchy of types, with void* being preferred to no type at all, but any α* preferred to void*.

Subtraction from or by pointers is not performed implicitly by compilers, so Mycroft-like constraints for the pointer-sized subtract operation can be derived. For c := a - b:

    T(a) = ptr(α), T(b) = int, T(c) = ptr(α) ∨
    T(a) = ptr(α), T(b) = ptr(α), T(c) = int ∨
    T(a) = int, T(b) = int, T(c) = int
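In a non-constraint setting, the same rules can be encoded directly when computing the result type of c := a + b. A C sketch follows; the Type representation and names are assumptions of this sketch:

    #include <stddef.h>

    typedef struct Type {
        enum { INT, PTR } kind;
        const struct Type *pointee;   /* for PTR; NULL means void*
                                         (pointee unknown)          */
    } Type;

    /* Result type of c := a + b.  aConst/bConst flag constant integer
     * operands: adding a *constant* integer to a pointer may be
     * structure member access, so the result is then only void*, to
     * be refined later; a variable integer means array indexing, and
     * the pointee type is kept. */
    static Type addResult(Type a, int aConst, Type b, int bConst) {
        Type voidPtr = { PTR, NULL };
        Type intTy   = { INT, NULL };
        if (a.kind == PTR && b.kind == INT)
            return bConst ? voidPtr : a;
        if (a.kind == INT && b.kind == PTR)
            return aConst ? voidPtr : b;
        return intTy;                     /* int + int = int */
    }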

5.7 Data Flow Based Type Analysis

Type analysis for decompilers where the output language is statically type checked can be performed with a sparse data flow algorithm, enabled by the SSA form.

Types can be thought of as sets of possible values for variables. The smaller the set of possible values, the more precise and useful the type information. These sets form a natural hierarchy: the set of all possible values, the set of all possible 32-bit values, the set of signed or unsigned 32-bit integers, the set of signed 32-bit integers, a subrange of the signed integers, and so on. The effect of machine instructions is often to restrict the range of possible values, e.g. from all 32-bit values to 32-bit integers, or from 32-bit integers to unsigned 32-bit integers.

This effect suggests that a natural way to reconcile the various restrictions and constraints on the types of locations is to use an iterative data flow framework [MR90, KU76]. The data that flow are the type restrictions and constraints, and the results are the most precise types for the locations in the program given these restrictions and constraints.

Data flow based type analysis has some advantages over the constraint-based type analysis outlined in Section 5.5. Since there are no explicit constraints to be solved outside the normal intermediate representation of the program, there is no need to distinguish constants that happen to have coinciding values. Constraints are difficult to solve; sometimes there is more than one solution, and at other times there is no solution at all. By contrast, the solution of data flow equations is generally quite simple. Two exceptions to this simplicity are the integer addition and subtraction instructions; as will be shown, these are more complex than other instructions, but the complexity is moderate.

5.7.1 Type Lattices

Since types are hierarchical and some type pairs are disjoint, the relationship between types forms a lattice.

Types can be thought of as sets of possible values; e.g. the type integer could be thought of as the infinite set of all possible integers. In this chapter, the two views will be used interchangeably. The set of integers is a superset of the set of unsigned integers (counting numbers). There are more elements in the set of integers than in the set of unsigned integers, or integer ⊃ unsigned integer. The set of integers is therefore, in a meaningful way, "greater than" the set of unsigned integers; hence there is an ordering of types.

The ordering is not complete, however, because some type pairs are disjoint, i.e. they cannot be compared sensibly. For example, the floating point numbers, integers, and pointers are not compatible with each other at the machine code level, and so are considered to be mutually incompatible (i.e. not comparable). None of these elementary types implies more information than the others. To indicate that the ordering is partial, the square versions of the usual set operator symbols are used: ⊂, ⊃, ∩, and ∪ become ⊏, ⊐, ⊓, and ⊔ respectively. Hence int ⊐ unsigned int, or "int is a supertype of unsigned int".
These types are incompatible at the machine code level despite the fact that while the mathematical number 1027 is an integer, it is also a real number, and it could be used as the address of some object in memory. Floating point numbers have a different bit representation to integers, and are therefore used differently. Pointer variables should only be used by load and store instructions (including complex instructions incorporating loads and/or stores), or compare/test instructions. Integer variables should similarly be used by integer instructions (integer add, shift, bitwise or), and floating point variables should only be used by floating point instructions (floating point add, square root, etc.).

Two exceptions to this neat segregation of types with classes of instructions are the integer add and subtract instructions. In most machines, these instructions can be used on integer variables and also pointer variables. Section 5.6 discussed the implications of this exception in more detail. It is the fact that objects of different types usually must be used with different classes of instructions that makes it so important that type errors are avoided.

In decompilation, types additional to those used in computer languages are considered, such as size32 (a 32-bit quantity whose basic type is not yet known, but whose size is known), or the type pointer-or-integer. These are temporary types that should be eliminated before the end of the decompilation. As a contrived example, consider a location referenced by three instructions: a 32-bit test of the sign bit, an integer add instruction, and an arithmetic shift right. Before the test instruction, nothing is known about the location at all. The type system can assign it a special value, called ⊤ (top). ⊤ represents all possible types, or the universal set U, or equivalently, no type information (since everything is a member of U). After the test instruction, all that is known is that it is a 32-bit signed quantity, so the type analysis can assign it the temporary type signed32. Following the add instruction, it is known to be either a pointer or an integer: pointer-or-int. An arithmetic shift right instruction applies only to integer variables, so the location can be assigned the type int.

The type hierarchy so far can be considered an ordered list, with ⊤ at one end, and int at the other end. It is desirable for the type of a location to move in one direction (with one exception, described below), towards the most constrained types (those with fewer possible values, away from ⊤). It is also possible that later information will conflict with the currently known type. For example, suppose that a fourth instruction used the location as a pointer. Logically, this could be represented by another special value, ⊥ (bottom), representing overconstraint or type conflict, or the empty set ∅ of possible values. This could occur as a result of a genuine inconsistency in the original program (e.g. due to a union of types, or a typecast), or from some limitation of the type analysis system.

In practice, for decompilation, it is preferable never to assign the value ⊥ to the types of variables; this is equivalent to giving up on typing the location altogether. Rather, decompilers would either retain the current type (int) or assign the new type (the pointer), and conflicting uses of the location would be marked in such a way that a cast or union would be emitted into the decompiled output. (If a cast or union is not allowed in the output language, a warning comment may be the best that can be done. Some languages are more suitable as the output language of a decompiler than others.)

Continuing the example of the location referenced by three instructions, the list of types can be represented vertically, with ⊤ at the top and the types size32, pointer-or-int, and int in that order extending downwards, as shown in Figure 5.11(a).
[Figure: (a) a vertical lattice from ⊤ ("no type information") down through size32, branching into pointer-or-int and float-or-int, then into pointer, int and float, with ⊥ ("type conflict") at the bottom. (b) an abstract lattice with types a, b and c above d, and d above e.]

Figure 5.11: Illustrating the greatest lower bound.

Since another instruction (e.g. a square root instruction) in place of the add instruction would have determined the location to be of type float instead, there is a path from size32 to float, and no path from int to float, since these are incompatible.

So far, when a location is used as two types a and b, the lower of the two types in the lattice becomes the new type for the location. However, consider if a location had been used with types pointer-or-int and float-or-int. (The latter could come about through being assigned a value known not to be valid for a pointer.) The resultant type cannot be either of the types it has been used with so far (pointer-or-int or float-or-int), but should in fact be the new type int, which is the greatest type less than both pointer-or-int and float-or-int. In other words, the general result of using a location with types a and b is the greatest lower bound of a and b, also known as the meet of a and b, written a ⊓ b.

Note the similarity of ⊓ with the set intersection symbol ∩. The result of meeting types a and b is basically a ∩ b, where a, b, and the result are thought of as sets of possible values. For example, if the current type is ?signed-int (integer of unknown sign), it could be thought of as the set {sint, uint}. If this type is met with unsigned-int ({uint}), the result is {sint, uint} ∩ {uint} = {uint}, or unsigned-int.
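Viewing types as sets suggests a direct encoding: represent a basic type as a bitset of remaining possibilities, with meet as bitwise intersection. A C sketch follows, using an invented encoding that ignores pointee and size detail:

    enum {                      /* basic possibilities */
        B_SINT  = 1 << 0,       /* signed integer      */
        B_UINT  = 1 << 1,       /* unsigned integer    */
        B_FLOAT = 1 << 2,       /* floating point      */
        B_PTR   = 1 << 3,       /* pointer             */
    };
    typedef unsigned TypeSet;

    #define TOP    (B_SINT | B_UINT | B_FLOAT | B_PTR)  /* no info  */
    #define BOTTOM 0u                                   /* conflict */

    static TypeSet meet(TypeSet a, TypeSet b) { return a & b; }

    /* Examples:
     *   pointer-or-int ⊓ float-or-int:
     *     meet(B_PTR|B_SINT|B_UINT, B_FLOAT|B_SINT|B_UINT)
     *       == (B_SINT|B_UINT)          i.e. int of unknown sign
     *   ?signed-int ⊓ unsigned-int:
     *     meet(B_SINT|B_UINT, B_UINT) == B_UINT                  */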

Figure 5.11(b) shows why it is the greatest lower bound that is required. When meeting types a and c, the result should be d or e, which are lower bounds of a and c. In fact, it should be d, the greatest lower bound, since there is no justification (considering only the meet operation of a and c) for selecting the more specialised type e. For example, if the result is later met with b, the result has to be d, since types b and e are not comparable in the lattice (meaning that they are not compatible types in high level languages).

Figure 5.12 shows a more practical type lattice, showing the relationship of the numeric types.

[Figure: a lattice with ⊤ at the top over the size types size64, size32, size16 and size8. These refine through intermediate types such as pointer-or-int (π), pointer to ?, and int/short/char of unknown sign, down to concrete types including long long, double, long double, pointer to class, pointer to other, signed and unsigned int, signed and unsigned short, signed and unsigned char, and float.]

Figure 5.12: A simplified lattice of types for decompilation.

Earlier it was mentioned that when considering the types of locations in a program, the types do not always move from top to bottom in the lattice. The exception concerns pointers to class objects in an object oriented language such as C++.

[Figure: (a) a class hierarchy in which Communicator is the parent of Sender and Receiver, Transceiver inherits from both Sender and Receiver, and Customer is the parent of GoldCustomer. (b) the corresponding class pointer lattice, with ⊤ = void* above Communicator* and Customer*; Communicator* above Sender* and Receiver*, which are above Transceiver*; Customer* above GoldCustomer*; and ⊥ at the bottom.]

Figure 5.13: Class pointers and references.

Consider the example class hierarchy and lattice shown in Figure 5.13. The similarity between the class hierarchy and the hierarchy of pointers to those classes is evident. A pointer could be assigned the type Sender* in one part of the program, and the type Receiver* in another part. If there is a control flow merge from these two parts, the pointer will have been used as both Sender* and Receiver*. The rules so far would result in the type Transceiver*, which is a pointer to a type that has multiple inheritance from classes Sender and Receiver. However, this may be overkill if the program is only referring to those methods and/or members which are common to Senders and Receivers, i.e. methods and members of the common ancestor Communicator. Also, multiple inheritance is relatively uncommon, so in many programs there is no class that inherits from more than one class, and to generate one in the decompiled output would be incorrect.

This example illustrates that sometimes the result of relating two types is higher up the lattice than the types involved. In these cases, relating types α* and β* results in the type (α⊔β)*, where ⊔ is the join operator, and α⊔β results in the least upper bound of α and β. Note the similarity of ⊔ to the set union operator symbol ∪.

This behaviour occurs only with pointers and references to class or structure objects. For pointers to other types of objects, the types of the pointer or reference and the variable so referenced have to correspond exactly in type.

There is an additional difference brought about by class and structure pointers and references. In a copy statement such as a := b, if a and b are not such pointers or references, then the types of a and b are both affected by each other: the type of both is the type of a met with the type of b. However, with p := q where p and q are both class or structure pointers or references, p may point to a larger set of objects than q, and this broadening of the type of p does not affect q. Hence after such an assignment, the type of p becomes a pointer to the join of p's base type and q's base type, but the type of q is not affected. Only assignments of this form have this exception, and only when the variables involved are pointers or references to classes or structures.
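For class pointers, the join of two base types is their least common ancestor in the class hierarchy. A C sketch follows, assuming single inheritance (multiple inheritance would need a set of parents); all names are invented:

    #include <stddef.h>

    typedef struct Class {
        const char *name;
        const struct Class *parent;    /* NULL at the root */
    } Class;

    /* Least upper bound of two class types: the lowest ancestor
     * (or self) of 'a' that is also an ancestor (or self) of 'b'. */
    static const Class *joinClasses(const Class *a, const Class *b) {
        for (const Class *x = a; x != NULL; x = x->parent)
            for (const Class *y = b; y != NULL; y = y->parent)
                if (x == y)
                    return x;
        return NULL;    /* unrelated hierarchies: join is void* */
    }

With the hierarchy of Figure 5.13, joining Sender and Receiver this way yields Communicator, so p := q merges Sender* and Receiver* into Communicator* rather than inventing a spurious Transceiver class.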

For the purposes of type analysis, procedure parameters are eectively assigned to by

all corresponding actual argument expressions at every call to the procedure. The

parameter could take on values from any of the actual argument expressions. For

parameters whose types are not class or structure pointers or references, the types of

all the arguments and that of the parameter have to be the same. However, if the type

of the parameter is such a pointer or reference, the type of the parameter has to be the

join of the types of all of the corresponding arguments.
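Continuing the hypothetical sketch above, the parameter rule is then a fold of class_join
over the argument types from all call sites:

    /* The type of a class-pointer parameter is the join of the classes
     * pointed to by the corresponding argument at every call site;
     * arg_classes[i] is the argument's class at the i-th call. */
    static int parameter_class(const int *arg_classes, int ncalls)
    {
        int t = arg_classes[0];     /* at least one call site is assumed */
        for (int i = 1; i < ncalls; i++)
            t = class_join(t, arg_classes[i]);
        return t;
    }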

For more detail on the subject of lattices, see [DP02].

5.7.2 SSA-Based Type Analysis


The SSA form links uses to definitions, allowing a sparse representation of type infor-

mation in assignment nodes.

In the beginning of Section 5.7, it was stated that the decompiler type recovery problem
can be solved as a data flow problem, just as compilers can implement type checking for
statically checked languages that way. Traditional data flow analysis, often implemented
using bit vectors, can be used [ASU86]. However, this involves storing information
about all live variables for each basic block of the program (or even each statement,
depending on the implementation). Assuming that the decompiler will generate code for
a statically typed language, each SSA variable will retain the same type, so that a more
sparse representation is possible. Each use of an SSA location is linked to its statically
unique definition, so the logical place to store the type information for variables is in
the assignment statement associated with that definition. For parameters and other
locations which have no explicit assignment, an implicit assignment can be inserted
into the intermediate representation.

This framework is sparse in the sense that type information is located largely only where
needed (one type variable per definition and constant, instead of one per variable and
constant per basic block).

The SSA form of data flow based type analysis is not flow sensitive, in the sense that
the computed type is summarised for the whole program. However, since the type for
the location is assumed to be the same throughout the program, this is not a limitation.
Put another way, using the SSA form allows the same precision of result with a less
expensive flow insensitive implementation.
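A minimal sketch of the sparse storage follows; all type and field names here are invented
for illustration, and do not reflect any particular decompiler's IR.

    /* One type per SSA definition; a use carries no type of its own,
     * and reaches one only through its use-def link. */
    typedef struct Type Type;   /* a recovered type, as in Figure 5.12 */
    typedef struct Exp  Exp;    /* an expression tree */

    typedef struct Assign {
        Exp  *lhs, *rhs;        /* destination location, source expression */
        Type *type;             /* the one type variable for this definition */
    } Assign;

    typedef struct Use {
        Assign *def;            /* the statically unique definition */
    } Use;

    /* Parameters and other locations with no explicit assignment are
     * given implicit Assign nodes, so this lookup always succeeds. */
    Type *type_of_use(const Use *u)
    {
        return u->def->type;
    }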

5.7.3 Typing Expressions


In a sparse intermediate representation, the types of subexpressions are not stored, so

these have to be calculated on demand from the expression leaves.

Early in the decompilation process, assignments are often in a simple form such as
c := a ⊕ b, where ⊕ is a binary operator, c is a location, and a and b are locations or
constants. However, some instructions are more complex than this, and after propagation,
expressions can become arbitrarily complex. Figure 5.14 shows a simple example.

The leaves of the expression tree are always locations or constants (except for the
destination of an assignment, which is always a location). Locations and constants
have type variables associated with them in a sparse type analysis system for a
decompiler. However, there is no place to store the type for a subexpression, such
as a[2000] or s.b, unless all subexpressions store a type and the sparseness is lost.

[Figure: the expression tree for a[2000] + s.b: the + operator at the root, the array-of
operator [] over leaves a and 2000 on the left, and the structure membership operator .
over leaves s and b on the right.]

Figure 5.14: Typing a simple expression.

In the example, the only type information known from other statements is that a is

an array of character pointers (i.e. a currently has type char*[]). Type analysis for the

expression starts at the bottom of the expression tree in a process which will be called

ascend-type. In this example the algorithm starts with subexpression a[2000]. The

array-of operator is one of only a few operators where the type of one operand (here

a, not 2000), if known, affects the result, and the type of the result, if known, affects

that operand. Since the type of a is known, the type of a[2000] can be calculated; in

this case it is char*. This subexpression type is not stored; it is calculated on demand.

The integer addition operator is a special case, where if one operand is known to be a

pointer, the result is a pointer type, because adding a pointer and an integer results in

a pointer. Hence the type of the overall expression is calculated to be void*. (Adding

an integer to a char* does not always result in another char*, hence the result has type

void*). Again, this type is not stored. No type information is gained from s, b, or s.b,

since the types of s and b are currently unknown.

Next, a second phase begins, which will be called descend-type. Now type information
flows down the expression tree, from the root to the leaves. In order to find the type
that is pushed down the expression tree to the right of the addition operator, ascend-
type is called on its left operand. This will result in the type char*, as before. This
type, in conjunction with the type for the result of the addition, is used to find the type
for the right subexpression of the add. Since pointer plus integer equals pointer, the

type found is integer. The structure membership operator, like the array-of operator,

can transmit type information up or down the expression tree. In this case, it causes

the type for b to be set to integer, and the type for s to be set to a structure with an

integer member.

When the process is repeated for the left subexpression of the add node, the result is

type void*, which implies a type void*[] for a. However, when this is met with the more
precise existing type char*[], the type of a remains as char*[]. The type for the constant

2000 is set to integer (array indexes are always integer).

In the example above, the initial type for location a came from type information else-

where in the program. In contrast to locations, constants have no connection to other

parts of the program, as shown in Section 5.4. As a result, constants are typed only by

the second phase, descend-type.

In general, type information has to be propagated up the expression tree, then down

again, in two separate passes.
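The two passes can be sketched as a pair of recursive tree walks. The fragment below is
a much simplified illustration, covering only leaves and the integer + operator with the
pointer rule described above; the representation and all names are invented, not taken
from any real decompiler.

    typedef enum { T_TOP, T_INT, T_PTR } SimpleType;  /* tiny type lattice */

    typedef struct Exp {
        enum { OP_LEAF, OP_PLUS } op;
        struct Exp *left, *right;   /* children; unused for leaves */
        SimpleType type;            /* leaves only: known type, or T_TOP */
    } Exp;

    /* ascend-type: compute the type of e on demand from its leaves,
     * without storing any subexpression types. */
    static SimpleType ascend_type(Exp *e)
    {
        if (e->op == OP_LEAF)
            return e->type;
        SimpleType l = ascend_type(e->left), r = ascend_type(e->right);
        if (l == T_PTR || r == T_PTR) return T_PTR;  /* ptr + int = ptr */
        if (l == T_INT && r == T_INT) return T_INT;
        return T_TOP;
    }

    /* descend-type: push the type t of e down towards the leaves,
     * re-deriving sibling types with ascend-type as needed. */
    static void descend_type(Exp *e, SimpleType t)
    {
        if (e->op == OP_LEAF) {
            if (e->type == T_TOP)
                e->type = t;        /* only leaves retain type information */
            return;
        }
        SimpleType l = ascend_type(e->left);   /* may re-walk the subtree */
        SimpleType r = ascend_type(e->right);
        if (t == T_PTR) {                      /* ptr + int = ptr */
            if (l == T_PTR) {
                descend_type(e->left, T_PTR);
                descend_type(e->right, T_INT);
            } else if (r == T_PTR) {
                descend_type(e->left, T_INT);
                descend_type(e->right, T_PTR);
            }
        } else if (t == T_INT) {               /* int + int = int */
            descend_type(e->left, T_INT);
            descend_type(e->right, T_INT);
        }
    }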

Table 5.1 shows the type relationships between operand(s) and results for various
operators and constants. Most operators are in the first group, where operand and
result types are fixed. For the other, less common operators and constants, the full

cycle of ascend-type and descend-type is required.

In the example of Figure 5.14, the type for a[2000] was calculated twice; once during
ascend-type, and again for descend-type.

Table 5.1: Type relationships for expression operators and constants.

    Operators                          Type relationships
    *, /, itof, < etc                  Fixed types for operands and results
    memory-of, address-of,             Type information flows up or down the tree
      array-of, structure membership
    ?: (ternary operator)              Type information flows up the tree (from the
                                       related operands) and down (from the boolean
                                       operand)
    integer + and -                    Types for any two operands imply the type for
                                       the other; if one operand is a pointer, it
                                       implies types for the others
    constants                          Types flow down to constants, but constraints
                                       such as cannot be float or cannot be pointer
                                       could flow up the tree

For more complex expressions, descend-type
may call ascend-type on a significant fraction of the expression tree many times. Figure
5.15(a) shows a worst-case example for an expression with four levels of binary operators.
At the leaves, checking the type of the location (if present) could be considered
one operation. This expression tree has 16 leaf nodes, for a total of 16 operations. One
level up the tree, the type of the parent nodes is checked, using information from the
two child nodes, for a total of three operations. There are eight such parent nodes, for
a total of 24 operations at this level. Similarly, at the top level, 31 operations (almost
2^(h+1), where h is the height of the expression tree) are performed by descend-type.

[Figure: (a) a full binary tree of height h=4, with per-level operation counts 1×31,
2×15, 4×7, 8×3, and 16×1; (b) a full ternary tree of height h=3, with per-level counts
1×40, 3×13, 9×4, and 27×1.]

Figure 5.15: Complexity of the ascend and descend type algorithms.

Figure 5.15(b) shows a tree with all ternary operators (e.g. the C ?: operator). Such
a tree would never be seen in a real-world example, but it illustrates the worst-case
complexity of the descend-type algorithm. Here, of the order of 3^(h+1) operations are
performed. This potential cost offsets the space savings of storing type information
sparsely (only at definitions, not uses).



5.7.4 Addition and Subtraction


The types inferred from pointer-sized addition and subtraction instructions require
special type functions in a data flow based type analysis.

Section 5.6 showed a modification of Mycroft's constraints that takes into account the
exception caused by adding constants to structure pointers. In order to represent
Mycroft's constraints in a data flow based analysis, some extra types and operators are
required. Let π be pointer-or-integer or higher in the lattice. It is hoped that all
occurrences of π will eventually be replaced with a lower (more precise) type such as
integer or a specific pointer type. The integer is understood to be the same size as a
pointer in the original program's architecture. In the lattice of Figure 5.12, π could
have the values pointer-or-integer, size32, or ⊤. The data flow equations associated
with add a, b, c (i.e. c=a+b), with the structure pointer exception, can be restated as:

T(a) = Σa(T(c), T(b))    (5.1)

T(b) = Σa(T(c), T(a))    (5.2)

T(c) = Σs(T(a), T(b))    (5.3)

where Σa and Σs (a stands for addend or augend, s for sum) are special functions
defined as follows:

                  T(c):
    Σa            β*        var-int   var-π     const-π
    T(other):
      α*          var-int   ⊥         var-int   var-int
      var-int     β*        int       var-π     var-π
      const-int   void*     int       var-π     -
      var-π       π         π         var-π     var-π
      const-π     π         π         var-π     -

                  T(a):
    Σs            α*        var-int   const-int var-π     const-π
    T(b):
      β*          ⊥         β*        void*     β*        void*
      var-int     α*        int       int       var-π     var-π
      const-int   void*     int       int       var-π     -
      var-π       α*        var-π     var-π     var-π     var-π
      const-π     void*     var-π     -         var-π     -

For brevity, ptr(α) is written as α*. The type variables T(a), T(b), and T(c) are
initialised to var-π or const-π if the operands are locations or constants respectively.



As an example, consider p = q+r, with q known to have type char*, and the type of r
wanted. Since the type of p is not known yet, the type of p remains at its initial
value of var-π. p, q, and r are substituted for c, a, and b respectively in equation 5.2.

This equation uses function Σa, defined above in the first table. Since T(c) = var-π,
the third column is used, and since T(other) = α* with α = char, the first row is used.
The intersection of the third column and first row contains var-int, so T(b) = T(r) =
int.
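A reduced executable rendering of the sum function Σs might look like the sketch below;
the var/const distinction of the full table is deliberately dropped, and all pointer types
α*, β* are collapsed into a single PTR value, so the structure pointer exception (pointer
plus constant giving void*) is not modelled.

    /* Given the operand types of c = a + b, compute the type of c.
     * PI is pointer-or-integer; BOT marks an impossible combination. */
    typedef enum { INT, PTR, PI, BOT } PiType;

    PiType sigma_s(PiType a, PiType b)
    {
        if (a == PTR && b == PTR) return BOT;  /* pointers are never added */
        if (a == PTR || b == PTR) return PTR;  /* pointer + int = pointer  */
        if (a == INT && b == INT) return INT;
        return PI;                             /* not enough information yet */
    }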

Similarly, the data flow equations associated with sub a, b, c (i.e. c=a-b) can be
restated as:

T(a) = ∆m(T(c), T(b))    (5.4)

T(b) = ∆s(T(c), T(a))    (5.5)

T(c) = ∆d(T(a), T(b))    (5.6)

where ∆m, ∆s and ∆d (m stands for minuend (the item being subtracted from), s for
subtrahend (the item being subtracted), and d for difference (the result)) are special
functions defined as follows:

            T(c):
    ∆m      β*    int   π
    T(b):
      α*    ⊥     α*    α*
      int   β*    int   π
      π     β*    π     π

            T(c):
    ∆s      β*    int   π
    T(a):
      α*    int   α*    π
      int   ⊥     int   int
      π     int   π     π

            T(a):
    ∆d      α*    int   π
    T(b):
      β*    int   ⊥     int
      int   α*    int   π
      π     π     int   π
As noted earlier, compilers do not use subtraction for referencing structure members,

so this time there is no need to distinguish between var-int and const-int, or var-π and

const-π .

5.8 Type Patterns

A small set of high level patterns can be used to represent global variables, local variables,

and aggregate element access.


5.8 Type Patterns 179

A sequence of machine instructions for accessing a global, local, or heap variable, array

element, or structure element will in general result in a memory expression which can

be expressed in the following normal form:

m[ [sp0 | pl] + Σ(j=1..n) Sj ∗ iej + K ]        (5.7)

where

• m[x ] represents memory at address x .


• All but one of the terms of the sum inside m[...] could be absent.

• [a | b] indicates optionally a or b but not both.

• sp0 represents the value of the stack pointer register at the start of the procedure.
• pl is a nonconstant location used as a pointer (although nonconstant, it could

take one of several constant values at runtime).

• The Sj are scaling constants, some of which could be unity.

• Here ∗ represents multiplication.

• The iej are nonconstant integer expressions that do not include the stack pointer

register, and are known not to be of the form x+C where x is a location and C is a
constant. Constants could appear elsewhere in an iej , e.g. it could be 4*r1*r2.

• K is a constant (an integer or pointer constant).

Where necessary, the distributive law is applied, e.g. m[4*(r1+1000)] is transformed

to m[r1*4 + 4000] before attempting to match to the above equation. Expression prop-

agation, simplification, and canonicalisation, as discussed in Chapter 3, can be used to

prepare the intermediate representation for high level pattern matching.
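As an illustration (a hypothetical fragment, not from any test program), a local
two-dimensional array access is the kind of source construct whose IR matches this
normal form:

    /* For a local 10x10 array of 4-byte ints, the access below would
     * typically appear in the IR as m[sp0 + 40*i + 4*j + K], with
     * S1 = 40 (subarray size), S2 = 4 (element size), and K the
     * frame offset of a relative to sp0. */
    int element(int i, int j)
    {
        int a[10][10] = { { 0 } };   /* stack allocated at offset K */
        a[i][j] = 1;
        return a[i][j];              /* m[sp0 + 40*i + 4*j + K] */
    }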

The sum of the terms inside the m[...] must be a pointer expression by definition.

Pointers cannot be added, adding two or more integers does not result in a pointer, and

the result of adding a pointer and an integer is a pointer. It follows that exactly one of

the terms is a pointer, and the rest must be integers. Since sp0 is always a pointer, sp0
and pl cannot appear together, and if sp0 or pl are present, K cannot be a pointer.

It could be argued that since pl or K could be negative, all three of sp0, pl, and K
could be present, with two pointers being subtracted from each other, resulting in a
constant. However, such combinations would require the negation of a pointer, which
is never seen in real programs.

Initially, it may not be possible to distinguish pl from an iej with Sj =1, so temporary
expressions such as m[l1 + K] or m[l1 + l2 + K] may be needed, until it becomes clear
which of l1 , l2 and K is the pointer.

When present, sp0 indicates that the memory expression represents a stack allocated

variable, array element, or structure element.

Table 5.2: Type patterns and propositions discussing them.

    Pattern        Proposition   Pattern                      Proposition
    iej present    5.8           m[l1+l2+K]                   5.18
    K present      5.9           m[ie+pl+K]                   5.19
    m[K]           5.10          m[ie1+ie2+K]                 5.20
    m[sp0+K]       5.12          m[ie+pl]                     5.21
    m[l+K]         5.13          m[S1*ie1+...+Sn*ien+K]       5.22
    m[ie+K]        5.14          m[S1*ie1+...+Sn*ien+pe+K]    5.23
    m[pl+K]        5.15          m[S1*l1+...+Sn*ln+K]         5.24
    m[sp0+ie+K]    5.17          other                        5.25

Table 5.2 shows a range of patterns that may be found in the intermediate representation

after propagation and dead code elimination, and the propositions which discuss them

over the next several pages. It is evident that there are a lot of possible patterns, and

distinguishing among them is not a trivial task. Most of this distinguishing is left as

future work.

Proposition 5.8: (iej present) Adding a non-constant integer expression to a pointer

implies an array access.

An array is an aggregation of uniform elements, and can be indexed. A structure

is an aggregation of any kind of elements, not necessarily identical in type, so that

indexing is not possible. Arrays can therefore be indexed with a constant or variable

index expression; index expressions are of integer type. Structure elements can only be

accessed with constant offsets.

The only place where a non-constant integer expression can appear is therefore as an

array index. Hence, when present, the iej indicate array indexing, and the overall

memory expression references an array element. For an array element access with m

dimensions n of which are non-constant, there will be at least n such terms. (One or

more index expressions could be of the form j+k where j and k are locations, hence n

is a minimum.)
5.8 Type Patterns 181

For single dimension arrays whose elements occupy more than one byte, there will be

a scale involved. The scaling may be implemented as a shift operation, an add to

self, or as a multiply by a constant, but canonicalisation of expressions will change all

these into multiply operations. Multidimensional arrays are usually implemented as

arrays of subarrays, so the higher order index expressions are scaled by the size of the

subarray. The sizes of arrays and subarrays are constant, so the Sj will be constant for

any particular array. Variations in Sj between two array accesses indicate either that

different arrays are being accessed, or that what appears to be scaling is in at least one

case part of the index expressions.

Proposition 5.9: (K present) When present, the constant K of Equation 5.7 could
represent the sum of the following:

(a) The address of a global array, global structure, or global variable (sp0 and pl
not present). This component of K may be bounded: the front end of the decom-

piler may be able to provide limits on addresses that fall within the read-only or

read/write data sections.

(b) The (possibly zero) offset from the initial stack pointer value to a local variable,

array, or structure (sp0 present).

(c) One or more possibly scaled constant array index(es).

(d) Constants arising from the constant term(s) in array index expressions. For

example, if in a 10×10 array of 4-byte integers, the index expressions are a*b+4
and c*d+8, the expression for the offset to the array element is (a*b+4)*40 +
(c*d+8)*4, which will canonicalise to 40*a*b + 4*c*d + 192. In the absence of
other constants, K will then be 192, which comes in part from the constants 4 and

8 in the index expressions, as well as the size of the elements (S2 =4) and the size

of the subarray (S1 =40).

(e) Offsets arising from the lower array bound not being zero (for example, a: array
[-20 .. -11] of real). Where more than one dimension of a multidimensional
array has a lower array bound that is non-zero, several such offsets will be lumped
together. In the C language, arrays always start at index zero, but it is possible
to construct pointers into the middle of arrays or outside the extent of arrays, to
achieve a similar effect, as shown in Figure 5.16. Note that in Figure 5.16(b), it is
possible to pass a0+100 to another procedure, which accesses the array a0 using
indexes ranging from -100 to -91.

(f) Structure member offset of a variable, array, or structure inside a parent structure.
Where nested structures exist, several structure member offsets could be lumped
together to produce the offset from the start of the provided object to the start of
the structure member involved. For example, in s.t.u[i], K would include the
offsets from the start of s to the start of u, or equivalently the sum of the offsets
from the start of s to the start of t and the start of t to the start of u.

    var a: array[-20 .. -11] of real;      double a0[10], *a = a0+20;
    a[-20] := 0;                           a[-20] = 0;
    process(a);                            process(a);

    (a) Pascal language.                   (b) C language.

Figure 5.16: Source code for accessing the first element of an array with a
nonzero lower index bound.

Many combinations are possible; e.g. options (d) and (f ) could be combined if an array

is accessed inside a structure with at least one constant index, e.g. s.a[5] or s.b[5, y]
where s is a structure, a is an array, and b is a two-dimensional array.

The iej terms represent the variable part of the index expressions, with the constant
part of index expressions split off as part of K, as shown above at option 5.9(d). To save
space, statements such as "ie represents the variable part of the index expression" will
be shortened to "ie represents the index expression", with the understanding that the
constant parts of the index expressions are actually lumped in with other terms into K.
It is apparent that many patterns could be encountered in the IR of a program to be

typed, and these make different assertions about the types of the various subexpressions

involved. The following propositions summarise the patterns that may be encountered.

Proposition 5.10: m[K] represents a global variable, a global structure member, or

a global array element with constant index(es), possibly inside a global structure.

K is the sum of options 5.9(a) and 5.9(c)-(f). As noted in section 3.4.4 on page 83,
it is assumed that for architectures where global variables are accessed as offsets from a
register reserved for this purpose, that register is initialised with a constant value by
the decompiler front end to ensure that all global variable accesses (following constant
propagation) are of this form. Since this pattern can represent either a global variable,
structure element, or an array element with fixed offset(s), elementary types can be
promoted to structure or array elements. To fit this into the lattice of types concept, this
observation could, by abusing terminology slightly, be expressed as ξ(array(int)) ⊑ int.


The notation ξ(array(α)) here denotes an element of an array whose elements are of
type α. This relation could be read as "the type 'element of an array of int' is a subtype
of or equal to the type 'variable of int'" (the former, occurring less frequently, is in a
sense more constrained than the latter). The same applies for structure elements; an
element of a structure σ whose type is β is written ξ(σ)(β), and where the structure
type is not known or not specified, as ξ(structure-containing-β). These thoughts lead
to the following proposition:

Proposition 5.11: ξ(array(α)) ⊑ α, and ξ(structure-containing-α) ⊑ α.

The above proposition will be improved below. It should be noted that the types α
and ξ(array(α)) are strictly speaking the same types, so the ⊑ relation is really =
(equality), but expressing the various forms of α in this way allows these forms to
become part of the lattice of types. When various triggers are found (e.g. K is used as a
pointer to an array elsewhere in the program), the type associated with a location can
be moved down the lattice (e.g. from α to ξ(array(α))). This makes the handling of
array and structure members more uniform with the overall process of type analysis.

Proposition 5.12: m[sp0 + K] represents a stack-based local variable, local structure

member, or local array element with constant index(es), possibly inside a local structure.

K represents the sum of options 5.9(b)-(f ).

Proposition 5.13: m[l + K] represents an aggregate (array or structure) element ac-

cess.

Here, l is a location, which could be an ie representing an unscaled array index, or a pl


representing a pointer to an aggregate. If l is used elsewhere as an integer or a pointer,

this expression can be refined to one of the following patterns. Since l+K is used as a

pointer, and adding two terms implies that one of the two terms is an integer and the

other a pointer, then if the value of K is such that it cannot be a pointer, K must be

an integer and l a pointer. As pointed out in Section 5.4, constants are independent,

so other uses of the same constant elsewhere do not affect whether l or K is a pointer.
Hence, the only factors affecting whether l or K is the pointer in m[l + K] are whether
l is used elsewhere as a pointer or integer, and whether K has a value that precludes it

from being a pointer.
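The decision rule just described can be summarised in a few lines of code; the names
below are invented, and the "used elsewhere" evidence is assumed to come from the data
flow analysis.

    /* Decide which of l and K in m[l + K] is the pointer. */
    typedef enum { UNKNOWN, IS_INT, IS_PTR } Evidence;

    /* l_elsewhere: how l is used elsewhere in the program.
     * k_could_be_pointer: whether K's value falls in a data section. */
    Evidence classify_l(Evidence l_elsewhere, int k_could_be_pointer)
    {
        if (l_elsewhere != UNKNOWN)
            return l_elsewhere;      /* direct evidence about l wins */
        if (!k_could_be_pointer)
            return IS_PTR;           /* K cannot be the pointer, so l is */
        return UNKNOWN;              /* keep the unrefined pattern m[l + K] */
    }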

Proposition 5.14: m[ie + K] represents a global array element access.

Here ie is known to be used as an integer. K represents the sum of options 5.9(a)

and 5.9(c)-(f ), and ie is the possibly scaled index expression. There could be

other index expressions for a multidimensional array, all constant.


184 Type Analysis for Decompilers

Proposition 5.15: m[pl + K] represents a structure element access or array ele-

ment access with constant index(es).

Here pl is a pointer to the array or structure (global, local, or heap allocated), and
K represents the sum of options 5.9(c)-(f ). For example, m[pl + K] could represent
s.m where s is a variable of type structure (represented by pl), and m is a member
of s (K represents the offset from the start of s to m). It could also represent a[C],
where a is a variable of type array (represented by pl), and C is a constant with
the value K/sizeof(a[0]). It is not possible to distinguish these two representations
unless a nonconstant array access is found elsewhere.

    struct {
        COLOUR c1;
        COLOUR c2;                       COLOUR colours[3];
        COLOUR c3;
    } colours;
    colours.c1 = Red;                    colours[0] = Red;
    colours.c2 = Green;                  colours[1] = Green;
    colours.c3 = Blue;                   colours[2] = Blue;
    process(colours.c2);                 process(colours[1]);

    (a) Structure.                       (b) Array.

Figure 5.17: Equivalent programs which use the representation m[pl + K].

The two representations for m[pl + K] are equivalent, as shown in Figure 5.17. Hence,

neither representation is incorrect. The structure representation seems more natural,

hence this pattern could be considered a structure reference unless and until an array

reference with a nonconstant index is found. In either case, since an array reference

with a nonconstant index could be found elsewhere at any time, the array element is in

a sense more constrained than the structure element. Again abusing terminology, this
can be expressed as ξ(array(α)) ⊑ ξ(structure-containing-α).


Combining this with Proposition 5.11 and noting that an array could be contained in

a structure results in the following proposition:

Proposition 5.16:
ξ(structure-containing-ξ(array(α))) ⊑ ξ(array(α)) ⊑ ξ(structure-containing-α) ⊑ α.

This leads to the lattice fragment shown in Figure 5.18.

The above example illustrates the point that the lattice of types for decompilation is

based at least in part not on the source language or how programs are written, but on

how information is uncovered during type analysis.


5.8 Type Patterns 185

[Figure: a lattice chain with α at the top; below it ξ(structure-containing-α); below
that ξ(array(α)); and ξ(structure-containing-ξ(array(α))) at the bottom, as ordered in
Proposition 5.16.]

Figure 5.18: A type lattice fragment relating structures containing array elements,
array elements, structure members, and plain variables.

Proposition 5.17: m[sp0 + ie + K] represents a local array element access.

ie is the index expression, and K lumps together options 5.9(b)-(f ).

There is no pattern m[sp0 + pl + K] since sp0 and pl are both pointers, and pointers are
assumed never to be added.

The pattern m[l1 + l2 + K] where l1 and l2 are locations is interesting because both

locations could match either ie or pl. Until other uses of l1 or l2 are found that

provide type information about the locations, it is not known which of the locations or

K is a pointer.

Proposition 5.18: m[l1 + l2 + K] represents an array element access. If either l1 or

l2 is used elsewhere as a pointer, or K has a value that cannot be a pointer and either

l1 or l2 is used elsewhere as a pointer or integer, this expression can be refined to the

following pattern:

Proposition 5.19: m[ie + pl + K] represents an array element access.


ie is an index expression, pl is a pointer to the array (global, local, or heap

allocated), and K represents the sum of options 5.9(c)-(f ).

If both l1 and l2 are used elsewhere as integers, then Proposition 5.18 can be refined

to the following pattern:

Proposition 5.20: m[ie1 + ie2 + K] represents a global array element access.

The non-constant index expression is ie1 + ie2 , and K represents the sum of op-

tions 5.9(a) and (c)-(f ).

Another special case of the pattern in Proposition 5.18 is when K=0. One of l1 and

l2 must be an integer, and the other a pointer. From proposition 5.8, this implies that

the integer expression represents array indexing. If one of l1 or l2 is used elsewhere
as a pointer or integer, this pattern can be refined to the following pattern.


186 Type Analysis for Decompilers

Proposition 5.21: m[ie + pl] represents an array element access.

ie is the possibly scaled index expression, and pl is a pointer to the array.

Proposition 5.22: m[S1*ie1 + S2*ie2 + ... + Sn*ien + K] represents an m-dimensional
global array element access. Here the iej are the index expressions, S1..Sn are
scaling constants, and K lumps together options 5.9(a) and 5.9(c)-(f). If there are
index expressions with more than one location, then m could be less than n. If there
are constant index expressions, then m could exceed n. These two factors could cancel
out or not be present, in which case m = n.

Proposition 5.23: m[pl + S1 *ie1 + S2 *ie2 + ... + Sn *ien + K] represents an m-dimen-

sional array element access.

Here pl points to the array or structure containing the array, the iej are the index
expressions, S1..Sn are scaling constants, and K lumps together options 5.9(c)-(f).

Proposition 5.24: m[sp0 + S1 *l1 + S2 *l2 + ... + Sn *ln + K] represents an m-dimensional

local array element access.

Here the li are the index expressions, S1 ..Sn are scaling constants, and K lumps together

options 5.9(b)-(f ).

Proposition 5.25: Other expressions are assumed to be indirections on pointer ex-


pressions.

Examples in the C language include (*p)[i] and *(ptrs[j]), where the above expres-
sions have been applied to part of the overall expression to yield the array indexes.

5.9 Partitioning the Data Sections

Decompilers need a data structure comparable to the compiler's symbol table (which

maps symbols to addresses and types) to map addresses to symbols and types.

Section 5.2.3 on page 155 noted that aggregate types are usually manipulated with a

series of machine code instructions, rather than individual instructions. For example, to

sum the contents of an array, a loop is usually employed. If a structure has five elements
of elementary types, there will usually be at least five separate pieces of code to access

all the elements. In other words, aspects of aggregate types such as the number of

elements and the total size emerge as the result of many instructions, not individual

instructions. This contrasts with the type and size of elementary types, which are

usually available from a single instruction.


5.9 Partitioning the Data Sections 187

Hence a data structure is required to build up the picture of how the data section is

composed. This process could be thought of as partitioning the data section into the

various variables of various types. This data structure is in a sense the equivalent of the

symbol table in the compiler or assembler which allocated addresses to data originally.

In a compiler or assembler, the symbol table is essentially a map from a symbolic name

to a type and a data address. In a decompiler, the appropriate data structure, which

could be called a data map, is a map from data address to a symbolic name and type.

At least two address spaces need to be considered: the global address space, containing

global variables, and the stack local address space, containing local variables. There

is also the heap address space, where variables, usually aggregates, are created with a

language keyword such as new or a call to a heap allocating library function such as

malloc. However, allocations for heap objects are usually for one object at a time, and

the addresses by design do not overlap. A separate data map would be used for each

address space.
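A data map could be as simple as a table of entries sorted by address, one table per
address space. The sketch below uses invented names and a linear search for brevity:

    #include <stddef.h>

    /* The inverse of a compiler symbol table: an address range maps to
     * an invented name and the type recovered so far. */
    typedef struct Type Type;

    typedef struct DataMapEntry {
        unsigned long addr;      /* start address in this address space */
        size_t size;             /* size in bytes, once known */
        const char *name;        /* generated name, e.g. "global1" */
        Type *type;              /* recovered type, refined over time */
    } DataMapEntry;

    /* Find the entry whose range contains addr, or NULL if the address
     * falls between the variables discovered so far. */
    DataMapEntry *data_map_lookup(DataMapEntry *map, int n, unsigned long addr)
    {
        for (int i = 0; i < n; i++)
            if (map[i].addr <= addr && addr < map[i].addr + map[i].size)
                return &map[i];
        return NULL;
    }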

5.9.1 Colocated Variables


Escape analysis is needed to determine the validity of separating colocated variables.

As a space saving optimisation, compilers may allocate more than one variable to the

same address. Once this has been determined to be safe, such colocation has little cost

for a compiler; it merely has some entries in the symbol table that have the same or

overlapping values for the data addresses. For a decompiler, however, the issue is more

complex. The data address does not uniquely identify a data map entry, as shown in

Figure 5.19.

[Figure: four configurations (a)-(d) of definitions (D), uses (U), and φ-functions (o)
for one variable. In (a) and (b), all definitions and uses are connected through
φ-functions into a single web; in (c) and (d), there are two or more live ranges with
breaks in the data flow between them.]

Figure 5.19: Various configurations of live ranges for one variable.

In Figure 5.19(a) and (b), although there are multiple definitions for the variable, the
definitions and uses of the variable are united by φ-functions. In cases (c) and (d), there
is more than one live range for the variable, made obvious by the breaks in data flow
from one live range to the next. Where there is more than one live range for a variable
and the types of the variable are different in the live ranges, two or more variables must
be emitted, as the compiler has clearly colocated unrelated variables, and each named
variable can have only one type.

Although more than one name must be generated, there is still the option of uniting the

addresses of those names (e.g. with a C union), or separating them out as independent

variables, which the compiler of the generated code could assign to different addresses.

In cases where the types for the various live ranges agree, however, it is not possible to

decide in general whether the compiler has colocated unrelated variables that happen

to have the same type, or if the original source code used one variable in several live

ranges. An example of where the latter might happen is where an array index is re-

used in a second loop in the program. Always separating the live ranges into multiple

variables could clutter the generated source code with excess variable declarations.

Never separating the live variables into separate variables could result in confusion

when the variable is given a meaningful name. The name that is meaningful for one

live range could be incorrect for other live ranges. This is a case where an expert user

may profitably override the default decompiler behaviour. Despite the potential for

lessened readability, there is no case here where incorrect output is unavoidable.

Sometimes the intermediate representation will effectively be taking the address of a
variable or aggregate element. At the high level, this may be explicit, as with the &
unary operator in C, or implicit, as in Java when an object is referenced (references

are pointers at the bytecode level). This taking of the address may be far away from the

eventual use of that variable or aggregate element. The type patterns for the address

of a variable or aggregate element are as per Section 5.8, but with the outer m[. . .]
operation removed. In some cases, these will be very common expressions such as K or

ie + K. Clearly, not all such patterns represent the address of a global variable, or the

address of an aggregate element.

These patterns are therefore employed after type analysis, and are only used when type

analysis shows the pattern (excluding the m[. . .]) is a pointer type. Obviously, the

patterns of Section 5.8 with the m[. . .] guarantee that the inner expression is used as a
pointer.

When the address of a variable is taken, the variable is referenced, but it is not directly
defined or used. Data flow information ultimately comes from definitions or uses, but
when the reference is passed to a library function, the usage information is usually
condensed to constant reference or non-constant reference of a specified type. A constant
reference implies that the location being referenced is used but not defined. A
non-constant reference could imply various definition and use scenarios, so the
conservative summary is "may define and may use". The purpose of the library function
may carry more data flow information than its prototype. For example, the this
parameter of CString::CString() (the constructor procedure for a string class) does
not use the referenced location (*this) before defining it. This extra information could
help separate the live ranges of variables whose storage is colocated with other objects.
While it may be tempting to use types to help separate colocated variables, the
possibility that the original program used casts makes types less reliable for this purpose.
In the case of the CString constructor, a new live range is always being started, but this
fact is not captured in the prototype for the constructor procedure.

int i;
void (*p)(void);
char c, *edi;
mov $5,-16(%esp) ; Define i=5;
print -16(%esp) ; Use as int print(i);
lea -16(%esp),%edi ; Take the address, → edi edi = &c;
call proc1 proc1();
... ...
mov $proc2,-16(%esp); Define p = proc2;
call -16(%esp) ; Use as proc* (*p)();
... ...
mov $'a',-16(%esp) ; Define c='a';
putchar -16(%esp) ; Use as char putchar(c);
... ...
process((%edi)) ; Use saved address (as char) process(*edi);

(a) Machine code (b) Decompiled code

Figure 5.20: A program with colocated variables and taking the address.

Taking the address of a variable that has other variables colocated can cause an ambi-

guity as to which object's address is being taken. For example, consider the machine

code program of Figure 5.20(a). The location m[esp-16] has three separate variables
sharing the location. Suppose that it is known that proc1 and proc2 preserve and do

not use edi. The definition of m[esp-16] with the address of proc2 completely kills
the live range of the integer variable, yet the address taken early in the procedure turns
out to be used later in the program, at which point only the char definition is live.

Definitions of m[esp-16] do not kill the reach of the separate variable edi, which
continues to hold the value esp-16. It is the interpretation of what esp-16 represents
that changes with definitions of m[esp-16], from &i to &proc2 to the address of a
character. Type information about edi is linked to the data flow information about
m[esp-16] by the address taking operation.

In this case, if there were no other uses of edi and no other instructions took the
address of m[esp-16], the three variables could be separated in the decompiled output,
as shown in Figure 5.20(b). However, if the address of m[esp-16] escaped the procedure
(e.g. edi is passed to proc1), this separation would not in general be safe. For example,
it may not be known whether the procedure the address escapes to (here proc1) defines
the original variable or not. If it only used the original variable, then it would be used as
an integer, and the reference is to i in Figure 5.20(b). However, it could define and use
the location with any type. Finally, it could copy the address to a global variable used
by some procedure called before the variable goes out of scope. In this case, the type
passed to proc1 depends on which of the colocated variables is live when the location
is actually used. (The compiler would have to be very smart to arrange this safely, but
it is possible.) Hence if the address escapes, the conservative approach is to declare
the three variables as a union, just as the machine code in effect does. Such escape
analysis is commonly performed by compilers to ensure the validity of optimisations;
decompilers require it to ensure the validity of separating colocated variables.

If proc1 in the example of Figure 5.20 was a call to a library function to which
m[esp-16] was a non-constant reference parameter, the same ambiguity arises. The
type of the parameter will be a big clue, but because of the possibility of casts, using
type information may lead to an incorrect decompilation. Hence, while library functions
are an excellent source of type information, they are not good sources of the data flow
information that can help separate colocated variables. This could be a case where it
is useful to at least partially decompile a library function.

Colocated variables are a situation where one original program address represents more
than one original variable. Figure 1.6 on page 20 showed a program where three arrays
are accessed at the machine code level using the same immediate values (one original
pointer and two offset pointers). This is a different situation, where although the three
variables are located at different addresses, the same immediate constant is used to refer
to these with indexes of different ranges. It illustrates the problem of how the constant
K of Equation 5.7 could have many components, in particular that of Proposition 5.9(e).

Figure 5.21 shows the declaration of three nested structures. The address of the first
element could refer to any of large, large.medium, large.medium.small, or
large.medium.small.a. The types of each of these is distinct, so that type analysis
can be used to decide which of the possible references is required.


5.10 Special Types 191

struct {
struct {
struct {
int a;
...
} small;
...
} medium;
...
} large;
Figure 5.21: Nested structures.

5.10 Special Types

A few special types are needed to cater for certain machine language details, e.g.
upper(float64).

In addition to the elementary types, aggregates, and pointers, a few special types are
useful in decompilation. There is an obvious need for no type information (⊤ in a type
lattice, or the C type void), and possibly overconstraint (⊥ in a type lattice). Many
machines implement double word types with pairs of locations (two registers, or two
word sized memory locations). It can therefore be useful to define upper(τ) and lower(τ),
where τ is a type variable, to refer to the upper or lower half respectively of τ. For
larger types, these could be combined, e.g. lower(upper(float128)) for bits 64 through
95 of a 128-bit floating point type. For example, size32 would be compatible with
upper(float64). Pairs of upper and lower types can be coalesced into the larger type
in appropriate circumstances, e.g. where a double word value is passed to a function
at the machine language level in two word sized locations, this can be replaced by one
variable (e.g. of type double).
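As an illustration (an invented example, with little-endian layout assumed), the
following fragment reassembles a double from two word-sized locations typed
lower(float64) and upper(float64):

    #include <stdio.h>
    #include <string.h>
    #include <stdint.h>

    /* Coalesce lower(float64) and upper(float64) words into one double. */
    double coalesce(uint32_t lower, uint32_t upper)
    {
        uint64_t bits = ((uint64_t)upper << 32) | lower;
        double d;
        memcpy(&d, &bits, sizeof d);  /* reinterpret the 64 bits */
        return d;
    }

    int main(void)
    {
        /* 1.5 has the IEEE-754 bit pattern 0x3FF8000000000000 */
        printf("%g\n", coalesce(0x00000000u, 0x3FF80000u));  /* prints 1.5 */
        return 0;
    }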

5.11 Related Work

Most related work is oriented towards compilers, and hence does not address some of

the issues raised by machine code decompilation.

Using iterative data flow equations to solve compiler problems has been discussed by

many authors, starting with Allen and Cocke [All70, AC72] and also Kildall [Kil73].

The theoretical properties of these systems were proved by Kam and Ullman [KU76].

Khedker, Dhamdhere and Mycroft argued for a more complex data flow analysis
framework, but they attempted to solve the more difficult problem of type inferencing for
dynamically typed languages [KDM03].


192 Type Analysis for Decompilers

Guilfanov [Gui01] discussed the problem of propagating types from library functions

through the IR for an executable program. However, he did not attempt to infer types

intrinsic in instructions.

Data flow problems can be solved in an even more sparse manner than that enabled
by the SSA form, by constructing a unique evaluation graph for each variable [CCF91].
However, this approach suffers from the space and time cost of generating the evaluation
graph for each variable, and some other tables required by this framework.

Guo et al. reported on a pointer analysis for assembly language, which they suggested
could be extended for use at run-time [GBT+05]. In their treatment of addition
instructions, they assumed that the address expressions for array and structure element
accesses will be of the form r2 + i × l + c, where i is a non-constant (index) value,
l is the size of the array elements, and c is a constant. This contrasts with the more
complex expression of Equation 5.7. The latter is more complex mainly to express
multidimensional arrays, e.g. representing a[x][y] as m[r1*40 + r2*4 + 8]. However,
the latter could be normalised slightly differently, to m[(r1*10 + r2)*4 + 8], representing
a[x*10+y], where 10 is the number of elements in the row. It seems possible that
machine code arrays could be analysed like this, and converted back to a[x][y] or
a[x, y] format at a later stage.

5.12 Future Work

While good progress has been made, much work remains before type analysis for machine

code decompilers is mature.

Most of the ideas presented in this chapter have been at least partially implemented in

the Boomerang decompiler [Boo02]. The basic principles such as the iterative data flow

based solution to the type analysis problem work well enough to type simple programs.

However, experimental validation is required for several of the more advanced aspects

of type analysis, including

• the more unusual cases involving pointer-sized add and subtract instructions (Sec-

tion 5.7.4 on page 177);

• splitting the value of K into its various possible origins (Proposition 5.9 on

page 181), which will probably require range analysis for pointers and index vari-

ables;

• separating colocated variables and escape analysis (Section 5.9.1 on page 187);

and

• partitioning the data sections (Section 5.9 on page 186).

Range analysis of pointers is also important for initialised aggregate data, as noted in

Section 5.2.4.

The current handling of arrays assumes a C-like array of arrays implementation of

multidimensional arrays. It should be possible to modify the high level patterns to

accommodate other implementations.

Object oriented languages such as C++ introduce more elements to be considered, such

as member pointers, class hierarchies, and so on. Some of these features are discussed

in the next chapter.

5.13 SSA Enablements

Expression propagation, enabled by the SSA form, combines with simplification to
prepare memory expressions for high level pattern analysis, and the SSA form allows a
sparse representation for type information.

The high level patterns of Section 5.8 on page 178 require the distinction of integer
constants from other integer expressions. This is readily achieved by the combination
of expression propagation and simplification that are enabled by the Static Single
Assignment form. These also eliminate partial constants generated by RISC compilers,
which would have been awkward to deal with in any type analysis system.

The SSA form also allows a sparse representation of type information at the definitions
of locations. One type storage per SSA definition contrasts with the requirement of one
type storage per live variable per basic block, as would be required by a traditional bit
vector based iterative framework.


Chapter 6

Indirect Jumps and Calls

While indirect jumps and calls have long been the most problematic of instructions for

reverse engineering of executable files, their analysis, facilitated by SSA, yields high level

constructs such as switch statements, function pointers, and class types.

When decoding an input executable program, indirect jump and call instructions are

the most problematic. If not for these instructions, a recursive traversal of the program

from all known entry points would in most cases visit every instruction of the program,
thereby separating code from data [VWK+03]. This assumes that all branch instruction

targets are valid, there is no self modifying code, and the program is well behaved in

the sense of call and return instructions doing what they are designed to do.

Indirect jumps and calls are decoded after loading the program file, and before data
flow analysis, as shown in Figure 6.1.

[Figure: the component graph of a machine code decompiler. The input binary file
enters the front end (Loader, then Decoder); the intermediate representation (IR) then
passes through Control Flow Analysis (structuring), Data Flow Analysis, Type Analysis,
and Code Generation (the back end), producing the output source file. An edge loops
back from Data Flow Analysis to the Decoder when an indirect jump or call is found.]

Figure 6.1: Decoding of instructions in a machine code decompiler.

It is interesting to note that it is indirect jump and call instructions that are most
problematic when reconstructing the control flow graph, and it is indirect memory
operations (e.g. m[m[x]]) which include the most problems (in the form of aliases) in
data flow analysis [CCL+96].

The following sections describe various analyses, most facilitated by the static single

assignment form (SSA form), which convert indirect jumps and calls to the appropriate

high level constructs.

6.1 Incomplete Control Flow Graphs

Special processing is needed since the most powerful indirect jump and call analyses rely

on expression propagation, which in turn relies on a complete control flow graph (CFG),

but the CFG is not complete until the indirect transfers are analysed.

The analysis of both indirect jumps and indirect calls shares a common problem. It
is necessary to find possible targets and other relevant information for the location(s)
involved in the jump or call, and the more powerful techniques such as expression
propagation, and value and/or range analysis, have the best chance of computing this
information. These techniques rely heavily on data flow analyses, which in turn rely

on having a complete CFG. Until the analysis of indirect jumps and calls is completed,

however, the CFG is not complete. This type of problem is often termed a phase

ordering problem. While initially it would appear that this chicken and egg problem

cannot be resolved, consider that each indirect jump or call can only depend on locations

before the jump or call is taken, and consequently only instructions from the start of

the procedure to that jump or call instruction need be considered.

Figure 6.2 illustrates the problem. It shows a simple program containing a switch
statement, with one print statement in each arm of the switch statement (including the
default arm, which is executed if none of the switch cases is selected). Part (b) of the
figure shows the control flow graph. Before the n-way branch is analysed, the greyed
basic blocks are not part of the graph (they are code that is not yet discovered). As a
result, the data flow logic is able to deduce that the definition of the print argument
in block 8 (the string "Other!") can be propagated into the print statement in block
9. In fact, basic blocks 8 and 9 are not yet separate at this stage. However, one of
the rules for safe propagation is that there are no other definitions of the components
of the right hand side of the assignment to be propagated which reach the destination
(Section 3.1 on page 65). Once the n-way branch is analysed, however, it is obvious
that this propagation is invalid.

One way to correct this problem would be to force conservative behaviour (in this case,

not propagating), by somehow anticipating the possibility of in-edges to block 8/9.



int main(int argc) {


switch(argc) {
case 2: printf("Two!\n"); break; case 3: printf("Three!\n"); break;
case 4: printf("Four!\n"); break; case 5: printf("Five!\n"); break;
case 6: printf("Six!\n"); break; case 7: printf("Seven!\n"); break;
default:printf("Other!\n");break;
}
return 0;
}
(a) C Source code for the example program.

[Figure: the control flow graph. Block 0 tests eax >u 5, branching when true to the
default (fall-through) block 8 and otherwise to the n-way block 1; block 1 fans out to
one-way blocks 2 to 7, one per case. Those blocks and block 8 all lead to the call block
9 (printf), which flows into the return block 10.]

(b) Control Flow Graph for the above program. Shaded basic blocks are not
discovered until the n-way (switch) block is analysed.

Figure 6.2: Example program with switch statement.

However, nothing in the decoded instructions indicates such a possibility. It should be

noted that restricting the propagation too much will result in the failure of the n-way

branch analysis.

Another way would be to store undo information for propagations, so that propagations

found to be invalid after an indirect branch could be restored. At minimum, the original

expression before propagation would need to be stored, possibly in the form of the

original SSA reference to its denition. Dead code elimination is not run until much
198 Indirect Jumps and Calls

later, so the original denition is guaranteed to still exist.

The simplest method, costly in space and time, is to make a copy of the procedure's IR
just after decoding; the data flow for the whole procedure is then restarted after every
indirect jump (of course, making use of the new jump targets). Where nested switch
statements occur, several iterations of this may be needed; in practice, more than one
or two such iterations would rarely be required. This is the approach taken by the Boomerang

decompiler [Boo02]. It causes a loop in the top level of the decompiler component graph

(as shown in Figure 6.1).

Simpler techniques not requiring propagation could be used for the simpler cases, to

reduce the number of times the expensive restarts are required.

Indirect calls that are not yet resolved have to assume the worst case preservation
information. It is best to recalculate the data flow information for these calls after
finding a new target, so that less pessimistic propagation information can be used.
Whether the analysis of the whole procedure needs to be restarted after only indirect
calls are analysed depends on the design of the decompiler.

6.2 Indirect Jump Instructions

Indirect jump instructions are used in executable programs to implement switch (case)
statements, assigned goto statements, and tail-optimised calls.

Compilers often employ indirect jump instructions (branches) to implement switch or

case statements, unless the number of cases is very small, or the case values are sparse.

They can also be used to implement assigned goto statements, and possibly exception
handling [SBB+00]. Finally, any call at the end of a procedure can be tail-call optimised

to a jump. It is assumed that jumps whose targets are the beginning of functions have

been converted to call statements followed by return statements.

6.2.1 Switch Statements


Switch statements can conveniently be analysed by delaying the analysis until after ex-

pression propagation.

The details of the implementations of switch statements are surprisingly varied. Most

implementations store direct code pointers in a jump table associated with the indirect

jump instruction. Some store offsets in the table, usually relative to the start of the
table, rather than the target addresses themselves. This is presumably done to minimise
the number of entries in the relocation tables in the object files; relocation table entries
are not visible in executable files.

One approach to recognising these branch table implementations is to slice backwards

from the indirect jump until one of a few normal forms is found [CVE01].

Table 6.1: High level expressions for switch expressions in Boomerang.

    Form   High level pattern                  Table entry
    A      m[<expr>*4 + T]                     Address
    O      m[<expr>*4 + T] + T                 Offset
    R      %pc + m[%pc + <expr>*4 + k]         Offset
    r      %pc + m[%pc + <expr>*4 - k] - k     Offset

Table 6.1 shows the high level expressions for various forms of switch statement for

the Boomerang decompiler. T is a constant representing the address of the table; k


is a small constant (in the range 8 to a few thousand). The expressions assume a

32-bit architecture, with a table entry size of 4 bytes. %pc represents the program counter

(Boomerang is not very precise about what point in the program %pc represents, but it
does not need to be).

These expressions are very similar to those from Figure 7 of [CVE01], which builds up

the expression for the target of the indirect branch by slicing backwards through the

program at decode time. This approach was used initially in the Boomerang decompiler;
however, the variations seen in real programs caused the code to become very
complex and difficult to maintain. Since a decompiler needs to perform constant and

other propagation, it seems natural to use this powerful technique instead of slicing, de-

spite the necessity of restarting analyses. By delaying the analysis of indirect branches

until after the IR is in SSA form and expression propagation has been performed, the

expression for the destination of the call appears in the IR for the indirect branch with

no need for any slicing or searching.

This delayed approach has the advantage that the analysis can bypass calls if necessary,

which is not practical when decoding. There is no need to follow control flow edges, and

the analysis automatically spans multiple basic blocks if necessary. Figure 6.3 shows

the IR for the program of Figure 6.2.

The expression in statement 11 (m[((eax4 - 2) * 4) + 0x8048934]) matches form


A, with expr = eax4 - 2 and T = 0x8048934. The number of cases is found by
searching the in-edge of the switch basic block for 2-way basic blocks. This is slicing in

a sense, but it is simpler because the branch expression has been propagated into the

condition of the branch instruction at the end of the 2-way basic block.

1 m[esp0 - 4] := ebp0
3 ebp3 := esp0 - 4
4 eax4 := m[esp0 + 4]0
6 eax6 := eax4 - 2
10 BRANCH to 0x804897c if (eax4 - 2) >u 5
11 CASE [m[((eax4 - 2) * 4) + 0x8048934]]
Figure 6.3: IR for the program of Figure 6.2.

In this case, the branch expression is (eax4 - 2) >u 5, where (eax4 - 2) matches expr. The number

of cases is readily established as six (branch to the default case if expr is greater than

5).

6.2.1.1 Minimum Case Value

Where there is a minimum case value, expression propagation, facilitated by the SSA

form, enables a very simple way to improve the readability of the generated switch

statement.

In Figure 6.2, there is a minimum switch case value of 2. Typical implementations

subtract 2 from the switch variable before comparing the result against the number of

case values, including any gaps. Hence the comparison in block 0 is against 5, not 7.

Without checking the case expression, this would result in the program of Figure 6.4.

int main(int argc) {


switch(argc-2) {
case 0: printf("Two!\n"); break; case 1: printf("Three!\n"); break;
case 2: printf("Four!\n"); break; case 3: printf("Five!\n"); break;
case 4: printf("Six!\n"); break; case 5: printf("Seven!\n"); break;
default:printf("Other!\n");break;
}
return 0;
}

Figure 6.4: Output for the program of Figure 6.2 when the switch expression is not
checked for subtract-like expressions.

This program, while correct, is less readable than the original. There is a simple check that can be performed to increase readability: if the switch expression is of the form ℓ-K where ℓ is a location and K is an integer constant, simply add K to all the switch case values, and emit the switch expression as ℓ instead of ℓ-K. This simple expedient shows the power of expression propagation, canonicalisation, and simplification, which are facilitated by the SSA form. It saves a lot of checking of special cases, following possibly multiple in-edges to find a subtract statement, and so on. This improvement can be trivially extended to handle the case of ℓ+K.
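A minimal sketch of this adjustment follows; the SwitchInfo record is a hypothetical structure, not Boomerang's real data layout:

#include <vector>

struct Expr;                      // opaque IR expression

// Hypothetical description of a recovered switch statement.
struct SwitchInfo {
    Expr *switchExpr;             // currently the expression l - K
    std::vector<int> caseValues;  // currently 0, 1, 2, ...
};

// If the switch expression has the form l - K, switch on l instead and
// add K to every case value (trivially extended to l + K by negating K).
void rebaseSwitch(SwitchInfo &sw, Expr *l, int K) {
    sw.switchExpr = l;            // emit switch (l) rather than switch (l - K)
    for (int &v : sw.caseValues)
        v += K;                   // e.g. case 0 becomes case 2 in Figure 6.2
}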



In the example of Figure 6.3, ℓ is eax4 and K is 2. The generated output is essentially as per the original source code.

6.2.1.2 No Compare for Maximum Case Value

There are three special cases where an optimising compiler does not emit the compare

and branch that usually sets the size of the jump table.

Occasionally, the maximum value of a switch value is evident from the switch expression. Examples include ℓ % K, ℓ & (N-1), and ℓ | (-N), where K is any nonzero integer constant (e.g. 5), and N is an integral power of 2 (e.g. 8). For example, ℓ | (-8) will always yield a value in the range -1 to -8. In these cases, an optimising compiler may omit the comparison against the maximum switch case value, if it is redundant (i.e. the highest valued switch case is present).
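For illustration, here is a sketch of deducing the jump table size directly from such an expression form; the ExprForm enumeration and helper are hypothetical, and assume the expression shape has already been matched elsewhere:

#include <cstdint>
#include <optional>

// Shapes of switch expression that bound their own result; names are
// illustrative only.
enum class ExprForm { ModK, AndNminus1 };   // l % K and l & (N-1)

// Deduce the number of jump table entries when the compiler omitted the
// usual compare-and-branch because the expression form made it redundant.
std::optional<uint32_t> tableSizeFromForm(ExprForm form, uint32_t c) {
    switch (form) {
    case ExprForm::ModK:       return c;       // l % K yields 0 .. K-1
    case ExprForm::AndNminus1: return c + 1;   // l & (N-1) yields 0 .. N-1
    }
    return std::nullopt;
}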

Rarely, a compiler may rely on a range restriction on a procedure parameter that is known to be met by all callers. In these cases, range analysis will be needed to find the number of entries in the jump table.

6.2.2 Assigned Goto Statements

Fortran assigned goto statements can be converted to switch statements, removing an unstructured statement, and permitting a representation in languages other than Fortran.

Figure 6.5 shows a simple program containing an assigned goto statement. Many modern languages do not have a direct equivalent for this relatively unstructured form of statement, including very expressive languages such as ANSI C. The similarity to the switch program in Figure 6.2 suggests that the switch statement could be used to express this program. For n assignments to the goto variable, there are n possible destinations of the indirect jump instruction, which can be expressed with a switch statement containing n cases. The case items are somewhat artificial, being native target addresses of the indirect jump instruction.

SSA form enables these targets to be found efficiently. Figure 6.6 shows the detail. In effect, the φ-statements form a tree, with assignments to constants (represented here by the letter L and their Fortran labels) at the leaves. It is therefore straightforward to find the list of targets for the indirect jump instruction.

(a) Fortran source code:

      program asgngoto
      integer num, dest
      print*, 'Input num:'
      read*, num
      assign 10 to dest
      if (num .eq. 2) assign 20 to dest
      if (num .eq. 3) assign 30 to dest
      if (num .eq. 4) assign 40 to dest
* The computed goto:
      goto dest, (20, 30, 40)
10    print*, 'Input out of range'
      return
20    print*, 'Two!'
      return
30    print*, 'Three!'
      return
40    print*, 'Four!'
      return
      end

(b) Intermediate representation:

70 local2 := 134514528
73 if (local1 ≠ 2) goto L1
74 local2 := 134514579
L1:
85 local2 := φ(local270, local274)
77 if (local1 ≠ 3) goto L2
78 local2 := 134514627
L2:
86 local2 := φ(local285, local278)
81 if (local1 ≠ 4) goto L3
82 local2 := 134514675
L3:
87 local2 := φ(local286, local282)
84 goto local287

(c) Control flow graph: after the print and read, p is set to L10, then conditionally reassigned to L20, L30 or L40 by the three tests num=2?, num=3? and num=4?; the indirect goto p then jumps to one of the four print blocks (L10: print other, L20: print two, L30: print three, L40: print four), all of which return.

Figure 6.5: A program using an assigned goto.


Statement 84 of Figure 6.5(b) shows the case statement location to be subscripted with 87. Statement 87 contains its definition, a φ-statement. Effectively, this line says that the location defined in statement 87 was defined at either statement 86 or 82. Statement 82 assigns the goto variable (local2) with a constant.

Figure 6.6: Tree of φ-statements and assignments to the goto variable from Figure 6.5. (Node 87 has children 86 and 82 (leaf L40); node 86 has children 85 and 78 (leaf L30); node 85 has children 70 (leaf L10) and 74 (leaf L20).)

Figure 6.7 shows the resultant decompiled program. The output is not ideal, in that

aspects of the original binary program (the values of the labels) are visible in the

decompiled output. However, the computed goto is represented in the output language,

and is correct.

local2 = 0x8048760; /* assign 10 to dest */


if (local1 == 2) { /* if (num .eq. 2) */
local2 = 0x8048793; /* assign 20 to dest */
}
if (local1 == 3) { /* if (num .eq. 3) */
local2 = 0x80487c3; /* assign 30 to dest */
}
if (local1 == 4) { /* if (num .eq. 4) */
local2 = 0x80487f3; /* assign 40 to dest */
}
switch(local2) { /* goto dest, (20, 30, 40) */
case 0x8048760: /* Label 10 */
... /* 'Out of range' */
case 0x8048793: /* Label 20 */
... /* 'Two' */

Figure 6.7: Decompiled output for the program of Figure 6.5. Output has been
edited for clarity.

6.2.2.1 Other Indirect Jumps

Indirect jump instructions that do not match any known pattern have long been the most difficult to translate, but value analysis combined with the assigned goto switch statement variant can be used to represent them adequately.

The above assigned goto analysis has generated code which would be suitable for an indirect jump instruction which does not match any of the high level patterns; all that is needed is the set of possible jump instruction targets. There may be cases where the jump instruction targets are not available via a tree of φ-statements as is the case in the program of Figure 6.5. An analysis that provides an approximation of the possible values

for a location (e.g. the value set analysis (VSA) of Reps et al. [BR04, RBL06]) would

enable such indirect jump instructions to be correctly represented in the decompiled

output. The imprecision of the value analysis may cause some emitting of irrelevant

output, or omission of required output, but this is still a considerable advance over not

being able to handle such jump instructions at all.

6.2.2.2 Other Branch Tree Cases

Branch trees or chains are found in switch statements with a small number of cases,

and subtract instructions may replace the usual compare instructions, necessitating some

logic to extract the correct switch case values.

Branch trees are also found in other situations, such as a switch statement with a small number of cases. Figure 6.8 shows a program fragment with such an example. In this case, with only three cases, there isn't a tree as such, but the implementation has an interesting difference, as shown in Figure 6.9.

LRESULT CALLBACK WndProc(HWND hWnd, UINT message, ...)


{ ...
switch (message) {
case WM_COMMAND: // WM_COMMAND = 273
wmId = LOWORD(wParam); ...
case WM_PAINT: // WM_PAINT = 15
hdc = BeginPaint(hWnd, &ps); ...
case WM_DESTROY: // WM_DESTROY = 2
PostQuitMessage(0); ...
default:
return DefWindowProc(hWnd, message, wParam, lParam);
}
}
Figure 6.8: Source code for a short switch statement with special case values.

Although the three switch values are 2, 15, and 273, the values compared to are 2, 13, and 258. This is because instead of three compare and branch instruction pairs, the compiler has chosen to emit three subtract and branch pairs. (Perhaps the reasoning is that the differences between cases are usually less than the case values themselves, and the x86 target has more compact instruction forms for smaller immediate values.) The decompiler has already transformed the relational expression param5-2 == 0 to param5 == 2 for the first comparison. Care needs to be taken to ensure that similar

if (param5 == 2) {                    // 2 = WM_DESTROY
    PostQuitMessage(0);
} else {
    if (param5 - 2 == 13) {           // 13 = WM_PAINT - WM_DESTROY
        BeginPaint(param4, &param2); ...
    } else {
        if (param5 - 15 == 258) {     // 258 = WM_COMMAND - WM_PAINT
            if ((param6 & 0xffff) == 104) {...
            }
        } else {
            DefWindowProcA(param4, param5, param6, param7);
        }
    }
}
Figure 6.9: Direct decompiled output for the program of Figure 6.8.

transformations are applied to the other two relational expressions, so that the true

switch values are obtained. Yet again, these manipulations are easier after expression

propagation, with no need to trace backwards through the intermediate representation.
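One way to recover the true case values, sketched below with assumed helper structures, is to accumulate the constants subtracted along the chain of subtract-and-branch pairs:

#include <vector>

// One test in a subtract-and-branch chain: the code has already removed
// 'delta' more from the switch variable, then compares with 'compared'.
struct ChainTest { int delta, compared; };

// Recover the true case values by accumulating the subtracted constants:
// for Figure 6.9 the deltas 0, 2, 13 accumulate to 0, 2, 15, giving the
// original case values 2, 15 and 273.
std::vector<int> trueCaseValues(const std::vector<ChainTest> &chain) {
    std::vector<int> values;
    int accumulated = 0;
    for (const ChainTest &t : chain) {
        accumulated += t.delta;
        values.push_back(accumulated + t.compared);
    }
    return values;
}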

6.2.3 Sparse Switch Statements


Sparse switch statements usually compile to a branch tree which can be transformed into

a switch-like statement for greater readability.

int main() {
int n; printf("Input a number, please: "); scanf("%d", &n);
switch(n) {
case 2: printf("Two!\n"); break;
case 20: printf("Twenty!\n"); break;
case 200: printf("Two Hundred!\n"); break;
case 2000: printf("Two thousand!\n"); break;
case 20000: printf("Twenty thousand!\n"); break;
case 200000: printf("Two hundred thousand!\n"); break;
case 2000000: printf("Two million!\n"); break;
case 20000000: printf("Twenty million!\n"); break;
case 200000000: printf("Two hundred million!\n"); break;
case 2000000000: printf("Two billion!\n"); break;
default: printf("Other!\n");
}
return 0;
}
Figure 6.10: Source code using a sparse switch statement.
Figure 6.11: Control Flow Graph for the program of Figure 6.10 (parts 1 and 2). After the calls to printf and scanf, the compiler emits a binary search tree of compare-and-branch basic blocks, testing local0 for equality and ordering against 2, 20, 200, 2000, 20000, 200000, 2 million, 20 million, 200 million and 2 billion; each leaf calls puts with the matching message, and all paths converge on the common return block.

While Figure 6.6 shows a tree of values leading to an indirect jump instruction, trees also feature in the control flow graph of sparse switch statements. Compilers usually emit a tree of branch instructions to implement sparse switch statements. Figure 6.10 shows C source code for such a program, and Figure 6.11 shows its control flow graph. Note that no indirect jump instruction is generated; a jump table would be highly inefficient. In this case, almost two billion entries would be needed, with only ten of these actually used. The compiler emits a series of branches such that a binary search of the sparse switch case space is performed. Without special transformations, the decompiled output would be a series of highly nested conditional statements. This would be correct, but less readable than the original code with a switch statement.

It should be possible to recognise this high level pattern in the CFG. There are no control flow edges from other parts of the program to the conditional tree, and no branches outside what will become the switch statement. The follow node (the first node that will follow the generated switch statement) is readily found as the post dominator of the nodes in the conditional tree. The switch cases are readily found by searching the conditional tree for instructions comparing with a constant value. Range analysis would be better, since it would allow for ranges of switch cases (e.g. case 20: case 21: ... case 29: print("Twenty something")).
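A sketch of gathering the cases from such a conditional tree follows; the BB structure is hypothetical, and the walk assumes the subgraph really is a tree whose follow node has already been found as the post dominator:

#include <map>

// Hypothetical view of one node of the conditional tree.
struct BB {
    bool comparesConst = false;  // block ends in a test like local0 == C
    int constant = 0;            // the C above
    BB *caseBody = nullptr;      // target when the equality test succeeds
    BB *trueSucc = nullptr, *falseSucc = nullptr;
};

// Gather switch cases from the conditional tree; traversal stops at the
// follow node. Each equality comparison against a constant becomes a case.
void gatherCases(BB *node, BB *follow, std::map<int, BB*> &cases) {
    if (node == nullptr || node == follow) return;
    if (node->comparesConst)
        cases[node->constant] = node->caseBody;  // e.g. 2, 20, 200, ...
    gatherCases(node->trueSucc, follow, cases);
    gatherCases(node->falseSucc, follow, cases);
}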
The lcc compiler can generate a combination of branch trees and possibly short jump

tables [FH91, FH95]. Again, it should be possible to recognise this high level pattern

in the CFG and recreate close to the original source code.

An older version of the Sun C compiler, for sparse switch statements, compiled in a simple hash function on the switch variable, and the jump table consisted of a pointer and the hash key (switch value). A loop was required to account for collisions. Several different simple hash functions have been observed; the compiler seemed to try several hash functions and to pick the one with the best performance for the given set of switch values. Suitable high level patterns can transform even highly unusual code like this into normal switch statements [CVE01].

6.3 Indirect Calls

Indirect calls implement calls through function pointers and virtual function calls; the

latter are a special case which should be handled specially for readability.

Compilers emit indirect call instructions to implement virtual function calls (VFCs), and calls through function pointers (e.g. (*fp)(x), where fp is a function pointer, and x is a passed argument). Similarly to the implementation of switch statements using

jump tables, virtual function calls are usually implemented using a few fixed patterns, with some variation amongst compilers. There is an extra level of indirection with VFCs, because all objects of the same class share the same virtual function table (VFT or VT). The VT is often stored in read-only memory, since it never changes; the objects themselves are usually allocated on the heap or on the stack.

Indirect function calls that do not match the patterns for VFCs can be handled as indirect function pointer calls. In fact, VFT calls could be handled as indirect function pointer calls, but there would be needless and complex detail in the generated code. For example, a call to method 3 of object *obj might be generated as (obj->VT[3])(x), where again x is the argument to the call, compared to the expected obj->method3(x).

6.3.1 Virtual Function Calls


In most object oriented languages, virtual function calls are implemented with indirect

call instructions, which present special challenges to decompilers.

Figure 6.12: Typical data layout of an object ready to make a virtual call such as p->draw(). (The object's hidden VT pointer member points at the class's virtual function table (VT); the VT entries point to function code such as draw(), and this-adjustment offsets may be stored at negative offsets from the VT pointer, shown dotted in the original figure.)

It is common in object oriented programs to find function calls whose destination depends on the run-time class of the object making the call. In languages like C++, these are called virtual function calls. The usual implementation includes a table of function pointers, called the virtual function table (VFT, VT, or virtual method table), as a hidden member variable. Figure 6.12 shows a typical object layout. The VT pointer is not necessarily the first member variable. Figure 6.13 shows a typical implementation of such a call in x86 assembly language.

Where multiple inheritance is involved, the VT may include information about how to cast a base class pointer to a derived class pointer (a process known as downcasting). Such casting usually involves adding an offset to the pointer, but the required constant

% esi has pointer to object
mov (%esi),%eax     % Load VT pointer to eax
mov %esi,(%esp)     % Copy this to TOS (first parameter)
call *0x4(%eax)     % Call member function at offset 4

Figure 6.13: Implementation of a simple virtual function call (no adjustment to the this (object) pointer).

often depends on the current (runtime) type of the original pointer. Sometimes this constant is stored at negative offsets from the VT pointer, as shown dotted in Figure 6.12. Figure 6.14 shows an implementation.

% esi has pointer to object
mov %esi,%ebx       % Copy object to ebx
mov (%esi),%eax     % Load VT pointer to eax
mov -12(%eax),%ecx  % Load offset to ecx
add %ecx,%ebx       % Add offset to object
mov (%ebx),%eax     % Load VT for ancestor object
mov %ebx,(%esp)     % Copy adjusted this as first parameter
call *(%eax)        % Call member function (offset 0)

Figure 6.14: Implementation of a more complex function call, with an adjustment to this, and using two virtual function tables (VTs).

Tröger and Cifuentes [TC02] report that static analysis can identify virtual function calls using high level patterns similar to those used for switch statements. It can determine the location the object pointer is read from, the offset to the VT, and the offset to the method pointer. To find the actual method being called, however, requires a run-time value of an object pointer. As an example, the analysis might reveal that the object pointer resides in register esi at a particular instruction, that the VT is at offset 0 in the object, and the function pointer is at offset 8 in the VT. To find one of the methods actually being called, the analysis needs to know that the object pointer could take on the value allocated at the call to malloc at address 0x80487ab. The authors imply that this analysis is possible only in a dynamic tool (such as a dynamic binary translator), since only in a dynamic tool would such run-time information be available in general.

However, once the object pointer and the offset of the VT pointer in the object are found from the above analysis, finding the VT associated with the object is essentially equivalent to performing value analysis (like range analysis but expecting sets of singleton values rather than ranges of values) on the VT pointer member. In the compiler world, this is called type determination. Pande and Ryder [PR94, PR96] prove that this is NP-hard, but provide an approximate polynomial solution. Readability can be improved if all

such object pointer values can be found. Finding all such objects allows examination of

all possible targets, eliminating arguments that are not used by any callees, and allowing

precise preservation analysis. The latter prevents all live locations from having to be

passed as reference parameters (or as parameters and returns) of the call, as discussed

in Section 3.4.3 on page 81. Value analysis may be able to take advantage of trees of

φ-statements in the SSA form, as discussed in Section 6.2.2 on assigned gotos.

Once the arguments and object pointer are determined, the indirect call instruction can

be replaced by a suitable high level IR construct. The rest of the code implementing

the indirect call will then be eliminated as dead code, leaving only the high level call

(e.g. employee->calculate_pay()) in the decompiled output, as required.

If the above could be achieved for most object values, then it would be possible to identify most potential targets of virtual calls. This would enable decoding of instructions reachable only via virtual functions, achieving more of the goal of separating code from data.

6.3.1.1 Data Flow and Virtual Function Target Analyses

Use of the SSA form helps considerably with virtual function analysis, which is more

complex than switch analysis, by mitigating alias problems, and because SSA relations

apply everywhere.

While the techniques of Tröger and Cifuentes [TC02] can be used to find the object pointer, VT pointer offset, and method offset, they do not find the actual VT associated with a given class object. Finding the VT associated with the object is equivalent to finding a class type for the object. In some cases, the VT will even have a pointer to the original class name in the executable [VEW04]. This section considers two analyses for finding the targets of virtual function calls, the first without using the SSA form, and the second using the SSA form.

Figure 6.15 shows a C++ program using shared multiple inheritance. It uses shared multiple inheritance in the sense that class A, which is multiply inherited, is shared by classes B and C (as opposed to being replicated inside both B and C). In the source code, the underlined virtual keywords make this choice. (The class hierarchy, shown as a marginal diagram in the original: X is the base class of A; B and C each virtually inherit from A; D inherits from both B and C.) Machine code for this program is shown in Figure 6.16. Unfortunately, simpler examples do not contain aliases, and hence do not illustrate the point that SSA form has considerable advantages in practical use.

Consider the call at address 804884c, implementing c->foo() in the source code. Suppose first that this call is being analysed instruction by instruction without the SSA form or

#include <iostream>
class X {
public:
    int x1, x2;
    X(void) { x1=100; x2=101; }
    virtual void foo(void) {
        cout << "X::foo(" << hex << this << ")" << endl; }
};
class A: public X {
public:
    int a1, a2;
    A(void) { a1=1; a2=2; }
    virtual void foo(void) {
        cout << "A::foo(" << hex << this << ")" << endl; }
};
class B: public virtual A {
public:
    int b1, b2;
    B(void) { b1=3; b2=4; }
    virtual void bar(void) {
        cout << "B::bar(" << hex << this << ")" << endl; }
};
class C: public virtual A {
public:
    int c1, c2;
    C(void) { c1=5; c2=6; }
    virtual void foo(void) {
        cout << "C::foo(" << hex << this << ")" << endl; }
};
class D: public B, public C {
public:
    int d1, d2;
    D(void) { d1=7; d2=8; }
    virtual void foo(void) {
        cout << "D::foo(" << hex << this << ")" << endl; }
    virtual void bar(void) {
        cout << "D::bar(" << hex << this << ")" << endl; }
};
int main(int argc, char *argv[]) {
    D* d = new D(); d->foo();
    B* b = (B*) d; b->bar();
    C* c = (C*) d; c->foo();
    A* a = (A*) b; a->foo();
    ....

Figure 6.15: Source code for a simple program using shared multiple inheritance.

80487a9: push $0x38


80487ab: call operator_new // Allocate 56 bytes
80487b0: mov %eax,%esi // esi points to base
80487b2: lea 0x24(%esi),%eax // eax is 36 bytes past base
80487b5: lea 0x10(%esi),%ebx // ebx is 16 bytes past base
80487b8: movl $0x64,0x24(%esi)
80487bf: movl $0x8049c40,0x8(%eax)
80487c6: movl $0x65,0x4(%eax)
80487cd: movl $0x1,0xc(%eax)
80487d4: movl $0x2,0x10(%eax)
80487db: movl $0x8049c30,0xc(%esi)
80487e2: movl $0x3,0x4(%esi)
80487e9: movl $0x8049c20,0x8(%eax)
80487f0: movl $0x4,0x8(%esi)
80487f7: mov %eax,0x10(%esi)
80487fa: movl $0x5,0x4(%ebx)
8048801: movl $0x6,0x8(%ebx)
8048808: mov %eax,(%esi)
804880a: movl $0x8049c00,0xc(%esi)
8048811: movl $0x7,0x1c(%esi)
8048818: movl $0x8049c10,0x8(%eax)
804881f: movl $0x8,0x20(%esi)
8048826: mov %eax,(%esp)
8048829: call *0x8049c18 // d->foo()
804882f: mov 0xc(%esi),%eax
8048832: mov %esi,(%esp)
8048835: call *0x8(%eax) // b->bar()
8048838: xor %eax,%eax
804883a: test %esi,%esi
804883c: sete %al
804883f: lea 0xffffffff(%eax),%edi
8048842: and %ebx,%edi
8048844: mov (%edi),%eax
8048846: mov 0x8(%eax),%edx
8048849: mov %eax,(%esp)
804884c: call *0x8(%edx) // c->foo()

804884f: add $0x10,%esp


8048852: xor %ebx,%ebx
8048854: test %esi,%esi
8048856: je 804885a
8048858: mov (%esi),%ebx
804885a: sub $0xc,%esp
804885d: mov 0x8(%ebx),%eax
8048860: push %ebx
8048861: call *0x8(%eax) // a->foo() (via B)
Figure 6.16: Machine code for the start of the main function of Figure 6.15.

the benefit of any data flow analysis. By examining the preceding three instructions, it is readily determined that the object pointer is eax at instruction 8048846, the VT pointer offset is 8, and the method offset is also 8.

A simplified algorithm for determining the value of the VT pointer associated with the call is as follows. Throughout the analysis, one or more expressions of interest are maintained. Analysis begins with the expression of interest set to the expression for the VT pointer, in this case m[eax+8] (with eax taking the value it has at instruction 8048846). First, analysis proceeds backwards through the program. For each assignment to a component of an expression of interest (e.g. eax is a component of m[eax+8]), the right hand side of the assignment is substituted into the component. For example, when the instruction mov (%edi),%eax is encountered, the value of eax is no longer of interest, since it is overwritten at this instruction, and cannot affect the indirect call from that point back. The new expression of interest is found by replacing the left hand side of the assignment (here eax) with the right hand side of the assignment (here m[edi]), resulting in a new expression of interest of m[m[edi]+8]. For simplicity, consider the and instruction at 8048842 to be a move instruction; it implements a null-preserved pointer, discussed below. This phase of the algorithm terminates when the expression of interest would contain the result of a call to the operator new library function, or the start of the procedure is reached.

Now a second phase begins where analysis proceeds forwards from this point. In this phase, when an assignment is reached that produces an affine relation for one of the components, the analysis continues with an extra expression of interest involving new locations. For example, if the expression of interest is ...m[eax+16]... and the assignment esi := eax is encountered, the new expressions of interest are ...m[eax+16]... and ...m[esi+16]... . If the assignment was instead ebx := eax+8, an expression for eax is derived (here eax := ebx-8), and this is substituted into the existing expression. In this case, the resulting expressions of interest would be ...m[eax+16]... and ...m[ebx+8]... . An assignment such as ebx := eax+8 is called an affine relation because it marries the locations eax and ebx to each other. Such affine related locations, if involved in memory expressions, produce aliases. This phase of the analysis terminates successfully when an expression of interest is a constant, or unsuccessfully if the starting instruction is reached.
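The backward phase of this simplified algorithm can be sketched as repeated substitution over an expression tree, mirroring the steps of Table 6.2. Everything below (the Expr shape, the substitute helper) is illustrative only, not Boomerang's implementation:

#include <vector>

// Minimal expression tree for the sketch: registers, constants, m[...]
// and sums suffice to mirror the expressions of interest in Table 6.2.
struct Expr {
    enum Kind { MemOf, Plus, Reg, IntConst, NewCall } kind;
    int reg = 0;                 // register number, for Reg
    int value = 0;               // for IntConst
    Expr *lhs = nullptr, *rhs = nullptr;
};

// Structural equality of two expression trees.
bool same(const Expr *a, const Expr *b) {
    if (a == nullptr || b == nullptr) return a == b;
    return a->kind == b->kind && a->reg == b->reg && a->value == b->value &&
           same(a->lhs, b->lhs) && same(a->rhs, b->rhs);
}

// Replace every occurrence of 'from' within 'e' by 'to' (subtrees shared).
Expr *substitute(Expr *e, Expr *from, Expr *to) {
    if (e == nullptr) return nullptr;
    if (same(e, from)) return to;
    Expr *l = substitute(e->lhs, from, to);
    Expr *r = substitute(e->rhs, from, to);
    if (l == e->lhs && r == e->rhs) return e;   // unchanged: share the node
    Expr *n = new Expr(*e);
    n->lhs = l; n->rhs = r;
    return n;
}

struct Assign { Expr *lhs, *rhs; };             // one instruction: lhs := rhs

// Phase one: walk backwards from the indirect call, substituting the right
// hand side of each relevant assignment into the expression of interest,
// stopping at the result of operator new (or the start of the procedure).
Expr *backwardPhase(const std::vector<Assign> &code, Expr *interest) {
    for (auto it = code.rbegin(); it != code.rend(); ++it) {
        Expr *next = substitute(interest, it->lhs, it->rhs);
        if (next == interest) continue;         // assignment not relevant
        interest = next;                        // e.g. m[eax+8] -> m[m[edi]+8]
        if (it->rhs->kind == Expr::NewCall) break;
    }
    return interest;
}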

Table 6.2 shows this process in detail for the program of Figures 6.15 and 6.16. When the first phase of the analysis winds back to the call to operator new, the expression of interest is equivalent to m[m[base+16]+8], where base is the start of the memory allocated for the object. However, the analysis ignored instruction 80487f7: mov %eax,16(%esi), because at the time it was not known that m[ebx] and m[esi+16] were aliases. This is the reason

Table 6.2: Analysis of the program of Figure 6.16.

Instruction considered               Expression(s) of interest
8048846 (start)                      m[eax+8]
8048844 mov (%edi),%eax              m[m[edi]+8]
8048842 mov %ebx,%edi (effectively)  m[m[ebx]+8]
8048835 call *0x8(%eax)              m[m[ebx]+8]
8048829 call *0x8(%eax)              m[m[ebx]+8]
80487b5 lea 16(%esi),%ebx            m[m[esi+16]+8]
80487b0 mov %eax,%esi                m[m[eax+16]+8]
80487ab call operator_new            m[m[eax+16]+8]
80487b0 mov %eax,%esi                m[m[eax+16]+8], m[m[esi+16]+8]
80487b2 lea 36(%esi),%eax            m[m[esi+16]+8], m[m[eax-20]+8]
80487b5 lea 16(%esi),%ebx            m[m[esi+16]+8], m[m[eax-20]+8], m[m[ebx]+8]
80487f7 mov %eax,16(%esi)            m[eax+8]
8048818 mov $8049c10,8(%eax)         0x8049c10

for the second phase of the algorithm. Further forward progress reaches instruction 8048818: movl $0x8049c10,8(%eax), which gives the address of the VT for this indirect call. However, the instructions at addresses 80487bf and 80487e9 also match the final expression of interest, and these can only be reached by changing analysis direction yet again. (In this case, they are not needed, as they are overwritten by the assignment at 8048818, but this may not always be the case.) Virtual calls with the same VT pointer call methods of the same class, thereby giving information about the classes that methods belong to. It is now a simple matter of looking up the VT at offset 8 (the method pointer offset, here coincidentally the same as the VT pointer offset) to find the target of the call.

Note that in this example, there is only one possible target for the call, so the call could have been optimised to a direct call if the compiler had optimised better. The more common case is that there will be several possible objects pointed to, and each of these could have different types, and hence different VT pointer values. The simplified algorithm needs to be extended to take this into account, and since object pointers are commonly passed as parameters to procedures, the analysis needs to be interprocedural as well. Since global objects are created in special initialisation functions that are specific to the executable file format, this code needs to be examined as well.

Note also that for this example, the standard preservations apply (e.g. esi and ebx
are preserved by the calls at 8048829 and 8048835). Alternatively, each call could be

processed in sequence, strictly checking each preservation as targets are found. However,

this will only strictly be correct if for every call, all possible targets are discovered, since

any undiscovered target might be an exception to the preservation information based

only on discovered targets.

This example provides a taste of the complexity caused by aliases, which are common

in machine code implementing virtual calls.

080487ab 18 eax18 := call operator_new()


080487b0 19 esi19 := eax18
080487b2 20 eax20 := eax18 + 36
080487b5 21 ebx21 := eax18 + 16
080487b8 22 m[eax18 + 36]22 := 100
080487bf 23 m[eax18 + 44]23 := 0x8049c40
080487c6 24 m[eax18 + 40]24 := 101
080487cd 25 m[eax18 + 48]25 := 1
080487d4 26 m[eax18 + 52]26 := 2
080487db 27 m[eax18 + 12]27 := 0x8049c30
080487e2 28 m[eax18 + 4]28 := 3
080487e9 29 m[eax18 + 44]29 := 0x8049c20
080487f0 30 m[eax18 + 8]30 := 4
080487f7 31 m[eax18 + 16]31 := eax18 + 36
080487fa 32 m[eax18 + 20]32 := 5
08048801 33 m[eax18 + 24]33 := 6
08048808 34 m[eax18 ]34 := eax18 + 36
0804880a 35 m[eax18 + 12]35 := 0x8049c00
08048811 36 m[eax18 + 28]36 := 7
08048818 37 m[eax18 + 44]37 := 0x8049c10
0804881f 38 m[eax18 + 32]38 := 8
08048826 39 m[esp0 - 44]39 := eax18 + 36
08048829 43 eax43 := call __thunk_36_foo__1D()
Reaching definitions: ... ebx43 =eax18 +16, esi43 =esi19 , ...
0804882f 44 eax44 := m[esi43 + 12]?
08048832 45 m[esp43 ]45 := esi43
08048835 49 eax49 := call m[eax44 + 8]()
Reaching definitions: ... ebx49 =ebx43 , esi49 =esi43 , ...
08048838 50 eax50 := 0
0804883c 54 eax54 := esi49 = 0
0804883f 56 edi56 := (0 | eax54 ) - 1
08048842 57 edi57 := ((0 | eax54 ) - 1) & ebx49
08048844 59 eax59 := m[edi57 ]?
08048846 60 edx60 := m[eax59 + 8]?
08048849 61 m[esp49 ]61 := eax59
0804884c 65 eax65 := call m[edx60 + 8]()
Figure 6.17: IR for the program of Figures 6.15 and 6.16.

Consider now the alternative of waiting until data flow analysis has propagated, canonicalised, and simplified expressions in the IR, and further that the IR is based on the SSA form, as shown in Figure 6.17. For reasons of alias safety, propagation of memory expressions has not been performed at this stage. (If expression propagation could have been performed with complete alias safety, which may be possible some day, the call at statement 65 would be eax := call m[0x8049c18], which means that no analysis is needed to find the target of the call.)

Table 6.3: Analysis of the program of Figure 6.17.

Statement considered               Expression(s) of interest
60 (start)                         m[eax59+8]
59 eax59 := m[edi57]?              m[m[edi57]+8]
57 edi57 := ebx49 (effectively)    m[m[ebx49]+8]
49 call ... ebx49=ebx43            m[m[ebx43]+8]
43 call ... ebx43=eax18+16         m[m[eax18+16]+8]
31 m[eax18+16]31 := eax18+36       m[eax18+44]
37 m[eax18+44]37 := 0x8049c10      0x8049c10

Table 6.3 shows the equivalent analysis with these assumptions. In the SSA form, assignments such as esi19 := eax18 are valid everywhere that they could appear. Unfortunately, although aliasing is mitigated (witness all the memory expressions that are expressed in terms of eax18 in Figure 6.17), aliasing can still exist because of statements such as eax59 := m[edi57]?, where the data flow analysis has not been able to determine where m[edi57] has been assigned to. Hence, with SSA form, there is no need for strict direction reversals as with the first algorithm, but sometimes more than one statement has to be considered. Even so, the advantage of the SSA form version is apparent by comparing Tables 6.2 and 6.3.

6.3.1.2 Null-Preserved Pointers

Compilers sometimes emit code that preserves the nullness of pointers that are offsets from other pointers, necessitating some special simplification rules.

In Figure 6.16, the instructions at addresses 8048838-8048842 guarantee that a pointer into a memory allocation that may have failed will be NULL if that allocation failed. For example, if the call to operator new at address 80487ab returns NULL, then the value of esi at address 804883a will be zero. If so, the sete (set if equal) instruction will set register al (and hence eax) to 1, which will set edi to 0 at 804883f. When this is anded with ebx at 8048842, the result will be 0 (NULL). Any nonzero value returned from operator new will cause the sete instruction to assign 0 to eax, -1 to edi, and after the and instruction, edi will have a copy of ebx (which has the result of the call to operator new plus 16).



Without special attention, these constructs result in expressions such as edi57 := ((0 | (esi49 = 0)) - 1) & ebx49. This could be overcome with a simplification such as ((x = 0) − 1) & y → y. This construct could occur in the original program's source code, so this simplification rule should be restricted to this sort of analysis.

The code sequence at 8048852-8048856 of Figure 6.16 implements a similar construct for memory references inside the potentially invalid memory block. (The idea seems to be to make sure that any fault should happen in user code, not in compiler generated code.) In this case, the simplification (x = 0) ? 0 : y → y could be used, again restricted to this analysis. Note that the branch in this sequence causes extra basic blocks that would not otherwise be present, and this simplification avoids the complexity of following more than one in-edge for the basic block currently being analysed.
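The two restricted simplification rules might be implemented as pattern matches over the expression tree, as in this sketch; the opcode and field names are hypothetical:

// Minimal expression tree for the two rewrite rules (names are assumed).
struct Expr {
    enum Kind { Equals, Minus, And, Ternary, IntConst, Other } kind;
    int value = 0;                               // for IntConst
    Expr *op1 = nullptr, *op2 = nullptr, *op3 = nullptr;
};

static bool isConst(const Expr *e, int v) {
    return e && e->kind == Expr::IntConst && e->value == v;
}
static bool isEqZero(const Expr *e) {            // matches (x = 0)
    return e && e->kind == Expr::Equals && isConst(e->op2, 0);
}

// Apply the two simplifications, returning the input unchanged when
// neither matches. Both rules are restricted to this analysis, since the
// same shapes could occur in the original program's source code.
Expr *simplifyNullPreserved(Expr *e) {
    // ((x = 0) - 1) & y  -->  y
    if (e->kind == Expr::And && e->op1 && e->op1->kind == Expr::Minus &&
        isEqZero(e->op1->op1) && isConst(e->op1->op2, 1))
        return e->op2;
    // (x = 0) ? 0 : y  -->  y
    if (e->kind == Expr::Ternary && isEqZero(e->op1) && isConst(e->op2, 0))
        return e->op3;
    return e;
}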

6.3.2 Recovering the Class Hierarchy

Value analysis on the VT pointer member, discussed in earlier sections, allows the

comparison of VTs which may give clues about the original class hierarchy.

When decompiling object oriented programs, it is desirable to be able to recreate as closely as possible the class hierarchy of the original source code in the decompiler output. If at a virtual function call the associated object could be of type A or B, then A and B must be related in the class hierarchy of the original program. This is valuable information that comes from performing value analysis on the VT pointer fields of objects associated with VFCs.
elds of objects associated with VFCs.

If class B is derived from class A, it generally follows that the VT for B has at least

the same number of methods as the VT for A. In other words, if the VT for B is a

superset of the VT for A, then B extends A, or B is a subtype of A. However, some

of the methods of B could override the methods of A, and in addition B may have no

new virtual methods that A does not. In fact, B could override all the methods of A, in

which case it is not possible to determine from the VTs alone whether B is derived from

A or A from B. In this case, the size of the object itself (which indicates the number

of data members) could be used to determine which is the superclass. In the case that

the objects are of equal size, then in reality it does not matter which is declared as the

parent, at least as far as compilable output is concerned.
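This heuristic comparison might be sketched as follows, assuming VT sizes and object sizes have already been recovered; the VTInfo summary is a hypothetical structure:

#include <cstdint>
#include <vector>

// Hypothetical summaries recovered from the executable image.
struct VTInfo {
    std::vector<uint32_t> methods;   // entries of the virtual function table
    size_t objectSize;               // size of objects that use this VT
};

// Heuristic from the text: if B's VT has more entries than A's, B probably
// extends A; when the VTs have the same size (B overrides every method),
// fall back to comparing object sizes. If both are equal, the choice of
// parent is arbitrary as far as compilable output is concerned.
bool probablyDerivedFrom(const VTInfo &b, const VTInfo &a) {
    if (b.methods.size() != a.methods.size())
        return b.methods.size() > a.methods.size();
    return b.objectSize >= a.objectSize;
}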

As mentioned in [VEW04], it is sometimes possible to obtain class hierarchy information by a compiler specific analysis of the executable program.



6.3.3 Function Pointers


Value analysis could be applied to function pointers, which should yield a subset of the

set of possible targets, avoiding readability reductions that these pointers would otherwise

cause.

Indirect call instructions that do not implement virtual function calls are most likely

the result of applying function pointers, e.g. (*fptr)(a, b) in the C language.

Value analysis applied to a function pointer would yield the set of possible targets. As

mentioned above in relation to virtual function calls, the precise solution to this is an

NP-hard problem. Hence, approximate solutions (a subset of all possible targets) are

all that can be expected.

As with virtual function pointers, if target information is not available, the possible

consequences are missing functions, excess arguments for the pointer indirect calls, and

a lack of dead code elimination (since there is no preservation information).

6.3.3.1 Correctness

Imprecision in the list of possible targets for indirect calls leads to one of the few cases

where correct output is not possible in general.

Since missing functions obviously imply an incorrect output, the inability to find the complete set of targets for indirect calls is one of the very few reasons why decompiled output cannot be correct in the general case. Recall that for most other limitations of decompilation, it is possible to emit correct but less readable code.

While the missing functions could be found manually or by using dynamic techniques,

there must still be at least one error in the code for each missing function, since the

original source code must have assigned the address of the function to some variable

or data structure. If the pointer analysis was able to identify this value as a function

pointer, then the function would not be missing. So somewhere in the decompiled code

is the use of a constant (function pointers are usually constants) in an expression, and

this constant will be expressed as an integer when it should be the name of a function.

When the decompiled output is recompiled, the integer will no longer point to a valid

function, and certainly not to the missing function, so the program will not have the

semantics of the original program.

In order for this incorrect output to occur, the necessary conditions are a function that is reachable only through one or more function pointers, and a value analysis that fails to find that value for the function pointer.



6.3.4 Splitting Functions


A newly discovered indirect call, or occasionally a branch used as a call, could end up

pointing to the middle of an existing function; this situation can be handled by splitting

the function.

When either virtual function analysis or function pointer analysis is successful, a set of possibly new function targets is available. An indirect branch could be used as a tail call optimised call to another function, so it is also possible for an indirect branch to lead to the discovery of a new function. The tail call optimisation replaces a call and return sequence with a branch; this saves time, uses fewer instructions, and most importantly saves stack space. In highly recursive programs, the tail call optimisation can be very significant.

Occasionally, the newly discovered function target could be in the middle of an existing function. Balakrishnan and Reps treat this as an error and merely report a message [BR04]. The function may however be split into two, with a tail call at the end of the first part to preserve the semantics. This will obviously not be practical if there are loops crossing the split point.

(a) Before: procedure A starts at address 1000 with Block 1, followed at address 1200 by Block 2 and a return; an indirect call (call esi) elsewhere has not yet been analysed. (b) After: a call target of 1200 has been discovered, so procedure A now consists of Block 1, a call to the new procedure B, and a return, while procedure B at 1200 consists of Block 2 and the return; the indirect call now targets B.

Figure 6.18: Splitting a function due to a newly discovered call target.

Figure 6.18 shows an example of a function being split. In part (a), the indirect call instruction has not yet been analysed, and procedure A consists of block 1 and block 2. In part (b), a new call target of 1200 has been found, which is in the middle of the existing procedure A, and only one control flow edge crosses the address 1200. If the address 1200 was in the middle of a basic block, it could be split, as routinely happens during decoding when new branch targets are discovered.

The instructions from address 1200 to the end of the function become a new procedure B as shown. In order that the semantics of procedure A are maintained, a call to B is inserted at the end of block 1, and a return statement is inserted after the call. Thus, after executing block 1 in A, block 2 is executed, control returns to the return statement in A, and A returns as normal to its caller. This is effectively the reverse of the tail call optimisation.
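A sketch of the splitting operation over a hypothetical procedure representation follows; appendCall and appendReturn stand in for real IR construction, which is elided:

#include <list>

struct Proc;

// Hypothetical procedure representation: an ordered list of basic blocks.
struct BasicBlock {
    unsigned startAddr;
    // instructions, out-edges, ... elided
};

struct Proc {
    std::list<BasicBlock*> blocks;
};

// Assumed IR builders: synthesise a call-to-callee block and a return
// block at the end of procedure p.
void appendCall(Proc *p, Proc *callee) { /* build CALL callee block */ }
void appendReturn(Proc *p)             { /* build RET block */ }

// Split 'proc' at address 'target': the blocks from that address onwards
// become new procedure B, and a call+return pair in A preserves the
// semantics (the reverse of the tail call optimisation). Assumes only the
// fall-through control flow edge crosses 'target'.
Proc *splitAt(Proc *proc, unsigned target) {
    Proc *newProc = new Proc;
    auto it = proc->blocks.begin();
    while (it != proc->blocks.end() && (*it)->startAddr != target)
        ++it;                                    // find the split point
    newProc->blocks.splice(newProc->blocks.end(), proc->blocks,
                           it, proc->blocks.end());
    appendCall(proc, newProc);                   // A now ends: call B
    appendReturn(proc);                          // then return as normal
    return newProc;
}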

6.4 Related Work

De Sutter et al. observe a similar problem to the phase ordering problem of Section 6.1 when constructing the ICFG (Interprocedural Control Flow Graph) from sets of object files [SBB+00]. They solve it by using a so-called "hell node" in the ICFG, which unknown control flows from indirect jumps or calls lead to. They also construct control flow edges from the hell node to all possible successors of indirect jumps or calls. For their domain, where relocation information is available, "all possible successors" is a reasonably bounded set. However, for decompilation, relocation information is generally not available, so the hell node approach is not suitable.

Tröger and Cifuentes implemented a way of analysing virtual function calls to find the object pointer, virtual table pointer offset, and method offset [TC02]. They relied on dynamic techniques to find actual values for the object pointer. They used a simple algorithm that was limited to the basic block that the virtual call is in, and did not have the advantages of expression propagation. Expression simplification was used, but only a few special cases were considered.

Vinciguerra et al. surveyed the various techniques available to disassemblers for finding the code in an executable program [VWK+03]. The same techniques are used in the decoder component of a machine code decompiler. The problem of indirect jumps and calls is cited as "the problem which has dominated much of the work" on C/C++ disassembly tools. Slicing and data flow guided techniques are mentioned, but not with the advantages of expression propagation. The authors also mentioned that the effect of limitations in this process produces, at best, a less readable but correct program.

Reps et al. perform security analysis on x86 executables [BR04]. To cope with indirect jumps and calls, they use a combination of the heuristics in IDA Pro and their Value Set Analysis (VSA). VSA has already been mentioned as being needed for the solution of several problems in type analysis and the analysis of indirect jumps and calls. They do not split functions when a new target is in the middle of an existing function, but presumably this could be done using IDA Pro's API.

Harris and Miller describe a set of analyses that split a machine code executable into functions [HM05]. They claim that compiler independent techniques are not necessary, because in practice simple machine and compiler dependent methods have proved effective at recovering jump table values. They do not appear to analyse indirect calls, relying instead on various heuristics to find procedures in the gaps between regions reachable with recursive traversal.


Chapter 7

Results

Several techniques introduced in earlier chapters were verified with a real decompiler, and show that good results are possible with their use.

Many of the analyses described in earlier chapters have been tested in the Boomerang decompiler [Boo02]. Simpler machine code decompilers, such as REC, can be relied on to produce passable output on programs of almost any size, but Boomerang specialises in correct, recompilable output for a wide range of small programs. Boomerang's test suite consists of some 75 small programs, of which about a dozen fail for various reasons.

The results in this chapter relate to material in earlier chapters. Section 7.1 describes an industry case study which confirms the limitations of current decompilers stated in Chapter 1. Chapter 3 showed how useful expression propagation is for decompilation, but how unlimited propagation can in some circumstances lead to poor readability. Section 7.2 shows that common subexpression elimination is not the answer, but a simple heuristic is quite effective. The usefulness of the SSA form is demonstrated in Chapter 4, although in certain circumstances involving overwriting statements in a loop, extraneous variables can be emitted. Section 7.3 gives the results of applying Algorithm 1 to an example where many extraneous variables were being generated. Section 4.3.2 showed how a preserved location could be a parameter; Section 7.4 presents an example and results. Preservation analysis is shown in detail in Section 7.5. In the presence of recursion, preservation analysis is much more complex, as discussed in Section 4.4.2. Section 7.5.1 gives detailed results of applying the theory of Section 4.4.2 to its example. Recursion also complicates the elimination of redundant parameters and returns, as Section 4.4.3 indicated. Section 7.6 presents an example which shows good results.


7.1 Industry Case Study

When a Windows program was decompiled for a client, the deficiencies of existing decompilers were confirmed, and the importance of recovering structures was highlighted.

Chapter 1 discussed the problems faced by machine code decompilers, and Chapter 2 reviewed the limitations of existing decompilers. This section confirms the limitations of existing decompilers in an industrial setting.

The author of this thesis and a colleague were asked to attempt a partial decompilation

of a 670KB Windows program [VEW04]. Source code was available, but it was for a

considerably earlier version of the program. The clients were aware of the limitations

of machine code decompilation.

While there were a few surprising results, in general the case study confirmed the findings of Chapters 1 and 2. At the beginning of the study, Boomerang was broadly comparable in capabilities with other existing machine code decompilers. Type analysis was ad hoc, and structures were not supported. The parameter recovery logic assumed that all parameters were passed on the stack, which was often not the case in the C++ Windows executable to be decompiled.

As a result of this study, the salient problems facing machine code decompilers were confirmed to be

• weak or non-existent type analysis;

• lack of support for structures and compound types in general;

• incomplete recovery of parameters and returns; and

• poor handling of indirect control transfers, particularly indirect call instructions.

One speculation from the paper [VEW04] has since been overturned. Section 5.2 of the paper, referring to the problem of excessive propagation, states that "Some form of common subexpression elimination could solve this problem". As shown below, this is not the case, and an alternative solution is given.

7.2 Limiting Expression Propagation

Common Subexpression Elimination does not solve the problem of excessive expression propagation; the solution lies with limiting propagation of complex expressions to more than one destination.

Section 3.2 on page 68 states that Common Subexpression Elimination (CSE) does

not solve the problem of excessive propagation, but preventing propagation of complex

expressions to more than one destination does. Results follow that verify this statement.

Figure 7.1 shows part of the sample output of Figure 8 in [VEW04]. It shows two cases

where excessive propagation has made the output less readable than it could be: the

if statement of lines 18-21, and the if statement of line 24.

13 if ((*(char*)(local11 + 68) & 1) != 0) {
14     local26 = (/* opTruncs/u */ (int) (sizePixelsPerTick_cx *
15         -2004318071 >> 32) + sizePixelsPerTick_cx >> 4) + (/* opTruncs/u */
16         (int) (sizePixelsPerTick_cx * -2004318071 >> 32) +
17         sizePixelsPerTick_cx >> 4) / -2147483648;
18     if ((/* opTruncs/u */ (int) (sizePixelsPerTick_cx *
19         -2004318071 >> 32) + sizePixelsPerTick_cx >> 4) + (/* opTruncs/u */
20         (int) (sizePixelsPerTick_cx * -2004318071 >> 32) +
21         sizePixelsPerTick_cx >> 4) / -2147483648 < 2) {
22         local26 = 2;
23     }
24     if (local26 >= maxTickSizeX - (maxTickSizeX < 0 ? -1 : 0) >> 1) {
25         local26 = (maxTickSizeX - (maxTickSizeX < 0 ? -1 : 0) >> 1) - 1;
26     }

Figure 7.1: Output from [VEW04], Figure 8.

Value numbering was used to generate a table of subexpressions whose values were available on the right hand sides of assignments. Because of the simple nature of machine instructions, almost all subexpressions are available at some point in the program. CSE was performed before any dead code was eliminated, so all propagated expressions still had their originating assignments available. A C++ map was used in place of the usual hash table, to avoid issues arising from collisions. Statements were processed in order from top to bottom of the procedure. Figure 7.2 shows the results of applying CSE to the code of Figure 7.1.

Line 10 is now simplified as desired (it corresponds to lines 18-21 of Figure 7.1). Similarly, lines 16 and 17 are simplified (corresponding to lines 24 and 25 respectively of Figure 7.1). However, there are now far too many local variables, and the semantics of individual instructions is again evident. What is desired for ease of reading is propagation where possible, except in cases where an already complex expression (as measured e.g. by the number of subexpressions) would be propagated to more than one destination. In other words, it is better to prevent excessive propagation, rather than trying to repair excessive propagation with CSE.



 1 tmp = *(int*)(local64 + 56) & 1;
 2 if (tmp != 0) {
 3     tmp = local84 * 0x88888889;
 4     local143 = (unsigned int) tmp >> 32;
 5     local143 += local84;
 6     local83 = local143 >> 4;
 7     local144 = local83 / 0x80000000;
 8     local143 = local83 + local144;
 9     local142 = local83 + local144;
10     if (local143 < 2) {
11         local142 = 2;
12     }
13     local143 = maxTickSizeX < 0 ? -1 : 0;
14     local144 = maxTickSizeX - local143;
15     local144 = local144 >> 1;
16     if (local142 >= local144) {
17         local142 = local144 - 1;
18     }

Figure 7.2: Common subexpression elimination applied to the same code as Figure 7.1.

The propagation algorithm of Boomerang has been updated as follows. Before each

propagation pass, a map is now created of expressions that could be propagated to. The

map records a count of each unique expression that could be propagated to (an SSA

subscripted location). In the main expression propagation pass, before each expression

is propagated from the right hand side of an assignment to a use, this map is consulted.

If the number recorded in the map is greater than 1, and if the complexity of the

expression to be propagated is more than a given threshold (set with a command line

parameter), the expression is not propagated. The complexity is estimated as the

 1 if ((*(int*)(local42 + 56) & 1) != 0) {
 2     local80 = (unsigned int) (local46 * 0x88888889 >> 32) + local46 >> 4;
 3     local80 += local80 / 0x80000000;
 4     local79 = local80;
 5     if (local80 < 2) {
 6         local79 = 2;
 7     }
 8     local81 = maxTickSizeX;
 9     local81 = local81 - (local81 < 0 ? -1 : 0) >> 1;
10     if (local79 >= local81) {
11         local79 = local81 - 1;
12     }

Figure 7.3: Propagation limited to below complexity 2, applied to the same code as Figure 7.1.

 1 if ((*(int*)(local45 + 56) & 1) != 0) {
 2     local83 = (unsigned int) local49 * 0x88888889 >> 32;
 3     local83 = (local83 + local49 >> 4) + (local83 + local49 >> 4)/0x80000000;
 4     local82 = local83;
 5     if (local83 < 2) {
 6         local82 = 2;
 7     }
 8     local84 = maxTickSizeX - (maxTickSizeX < 0 ? -1 : 0);
 9     if (local82 >= local84 >> 1) {
10         local82 = (local84 >> 1) - 1;
11     }

Figure 7.4: Propagation limited to below complexity 3, applied to the same code as Figure 7.1.

 1 if ((*(int*)(local44 + 56) & 1) != 0) {
 2     local82 = (unsigned int) (local48 * 0x88888889 >> 32) + local48;
 3     local82 = (local82 >> 4) + (local82 >> 4) / 0x80000000;
 4     local81 = local82;
 5     if (local82 < 2) {
 6         local81 = 2;
 7     }
 8     local83 = maxTickSizeX - (maxTickSizeX < 0 ? -1 : 0) >> 1;
 9     if (local81 >= local83) {
10         local81 = local83 - 1;
11     }

Figure 7.5: Propagation limited to below complexity 4, applied to the same code as Figure 7.1.

number of operators in the expression (binary and ternary operators, and the "memory of" operator).
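The heuristic might be sketched as follows; the Assign record and precomputed complexity field are hypothetical stand-ins for Boomerang's SSA statements:

#include <map>
#include <vector>

struct Expr;                        // opaque IR expression

// One SSA assignment together with precomputed data for the heuristic.
struct Assign {
    Expr *def;                      // the location defined (SSA subscripted)
    Expr *rhs;                      // the right hand side expression
    int complexity;                 // operator count of rhs (binary, ternary, m[])
    std::vector<Expr*> uses;        // the uses this rhs could be propagated to
};

// Sketch of the limiting pass: count how many destinations each definition
// could be propagated to; propagate unless a complex expression would be
// duplicated into more than one destination.
void propagationPass(std::vector<Assign> &stmts, int threshold) {
    std::map<Expr*, int> useCounts;
    for (const Assign &a : stmts)
        useCounts[a.def] = (int)a.uses.size();
    for (Assign &a : stmts) {
        if (useCounts[a.def] > 1 && a.complexity >= threshold)
            continue;               // keep the named local; do not propagate
        // ... propagate a.rhs into each use of a.def here ...
    }
}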

Performing this limited expression propagation on the same code from Figure 7.1 results in the output shown in Figures 7.3 - 7.5. The difference between these is that the first prevents propagation of expressions of complexity 2 or more, the second with expressions of complexity 3 or more, and the last with expressions of complexity 4 or more. The if statements are again simplified, but complex expressions are retained for the other statements, for maximum readability. Although the number of lines of code only reduces from 14 to 11 or 12, the readability of the code is considerably improved.

Readability was measured with three metrics: the character count (excluding multiple spaces, newlines, and comments), the Halstead difficulty metric, and the Halstead program length metric [Hal77]. The Halstead program length metric is the sum of the number of operators and operands in the program.

Metrics based on control flow complexity (e.g. McCabe Cyclomatic Complexity [McC76]) will not find any difference due to expression propagation or the lack thereof, since the control flow graph is not affected by propagation. Table 7.1 shows the results for the selected metrics. The Halstead metrics were calculated with a public domain program [Uti92].

Table 7.1: Complexity metrics for the code in Figures 7.1 - 7.5.

                                             Character  Halstead    Halstead
                                             count      Difficulty  Length
Unlimited propagation (Figure 7.1)           595        28.3        142
CSE (Figure 7.2)                             464        28.1        116
Limited propagation (limit = 2, Figure 7.3)  329        22.8         93
Limited propagation (limit = 3, Figure 7.4)  357        23.8        111
Limited propagation (limit = 4, Figure 7.5)  342        22.7        107

Varying the expression depth limit causes minor variations in the output, as can be seen by inspecting the decompiled output of Figures 7.3 - 7.5. The metrics vary in an unpredictable way with this limit, because of a kind of interference effect. For example, preventing a propagation at one statement will lead to an intermediate statement of low complexity, but this statement may now be able to be propagated to a later statement. If the early propagation had succeeded, the later propagation may have failed. Whether the overall metrics improve or not ends up depending on chance factors. While the effect of the propagation limit is not very large (the Halstead Difficulty varies from 22.7 to 28.3, a 25% difference), it does suggest that propagation could be one area where user input will be required for the most readable result (according to the tastes of the reader).

The low value for the Halstead length metric in the CSE case is because of the extreme re-use of each subexpression; nothing is calculated more than once. The fact that there are many local variables and assignments is not reflected in this metric; the left hand sides of each assignment are ignored. This metric is therefore not measuring the distraction and difficulty of dealing with a needlessly large number of local variables, and hence this metric does not show the dramatic difference that either the character count or Halstead Difficulty show for limited propagation.

As a further example of excessive propagation, consider the program fragment in Figure 7.6. As shown in Figure 7.7, the compiler uses an idiomatic sequence involving the subtract with carry instruction. Output from the Boomerang decompiler without expression limiting is shown in Figure 7.8. Boomerang has special code to recognise the use of the carry flag in this idiom, but not to emit the conditional assignments that the idiom represents. The result is completely incomprehensible code, but it is at least correct. Figure 7.9 shows the same code with expression propagation limiting in effect. While still difficult to comprehend, the result is much better laid out.

int test(int i) {
if (i < -2) i = -2;
if (i > 3) i = 3;
printf("MinMax result %d\n", i);
}
Figure 7.6: Original C source code for function test in the Boomerang SPARC
minmax2 test program. The code was compiled with Sun's compiler using -xO2
optimisation.

mov -0x2, %g1


sra %g1, 0x1f, %g2
mov -0x1, %g4
subcc %g1, 0x3, %g1
sra %o0, 0x1f, %o5
subx %g2, 0x0, %g2
subcc %g1, %o0, %g2
and %g1, %g2, %g1
subx %g4, %o5, %g4
add %g1, 0x3, %o1
and %g2, %g4, %g2
mov %o7, %g1
sethi %hi(0x10400), %g3
call printf
sub %g1, %g2, %g1
mov %g1, %o7
add %g3, 0x34c, %o0
Figure 7.7: Disassembled SPARC machine code for the program fragment of
Figure 7.6. Note the subcc (subtract and set condition codes) and subx (sub-
tract extended, i.e. with carry) instructions.

void test(int param1) {


printf("MinMax result %d\n",
(-5 - (-2 - param1 & -1 - (param1 > > 31) +
(0xfffffffe < (unsigned)param1)) &
(-2 - (-2 - param1 & -1 - (param1 > > 31) +
(0xfffffffe < (unsigned)param1)) > > 31) -
((unsigned)(-2 - (-2 - param1 & -1 - (param1 > > 31) +
(0xfffffffe < (unsigned)param1))) < 3)) + 3);
return;
}
Figure 7.8: Decompiled output for the program fragment of Figure 7.6, without
expression propagation limiting.

void test(int param1) {


int local0; // r4
int local1; // r1
int local2; // r2
local0 = -1 - (param1 >> 31) + (0xfffffffe < (unsigned)param1);
local1 = -2 - (-2 - param1 & local0);
local2 = (local1 >> 31) - ((unsigned)local1 < 3);
printf("MinMax result %d\n", (local1 - 3 & local2) + 3);
return;
}
Figure 7.9: Decompiled output for the program fragment of Figure 7.6, but
with expression propagation limiting.
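The idiom of Figure 7.7 is an instance of branchless clamping: sign bits and comparison results are turned into all-ones or all-zeros masks, which then select between two values without a conditional branch. The following C fragment sketches the general pattern only (ignoring overflow at the extremes); it is not the code that Sun's compiler emitted:

/* Clamp x to the range [lo, hi] without branches. */
int clamp(int x, int lo, int hi) {
    int m = -(x < lo);            /* all ones if x < lo, else all zeros */
    x = lo + ((x - lo) & ~m);     /* selects lo when x < lo, else x */
    m = -(x > hi);
    x = hi + ((x - hi) & ~m);     /* selects hi when x > hi, else x */
    return x;
}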

7.3 Preventing Extraneous Local Variables

When the techniques of Section 4.1.3 are applied to the running example, the generated code is significantly more readable.

Figure 4.11 on page 111 showed output from an early version of Boomerang which had no logic for limiting extraneous variables. It is reproduced in Figure 7.10 with the local variables renamed, as far as possible, to the names of the original registers. Variable st represents the stack top register of the x86 processor. This is done to facilitate comparison with later output, which does this automatically. There are 10 statements inside the loop.

do {
edx_1 = edx;
ebx_1 = ebx;
esi_1 = esi;
st_1 = st;
edx_2 = edx_1 - 1;
st = st_1 * (float)edx_1 / (float)esi_1;
esi_2 = esi_1 - 1;
ebx = ebx_1 - 1;
edx = edx_2;
esi = esi_2;
} while (ebx_1 >= 1);
st_2 = (int)(st_1 * (float)edx_1 / (float)esi_1);

Figure 7.10: A copy of the output of Figure 4.11 with local variables named
after the registers they originated from.

Figure 7.11(a) shows the output from a more recent version of Boomerang, which incorporates the expression limiting heuristic (with expression depth limited to 3), and makes better choices about which version of a variable to allocate to a new variable. Already, it is a vast improvement. In this version, there are 5 statements inside the loop.

Figure 7.11(b) shows output from Boomerang using the -X (experimental) flag. This flag uses Algorithm 1 in Section 4.1.3 to minimise extraneous variables by not propagating overwriting statements in a loop. A side effect is that the assignment to st is split onto two lines, so there are still five statements in the loop, but no extraneous variables. With support for the C post decrement operator (e.g. edx--), output similar to the original source code would be possible.

As of April 2007, the Boomerang source code implementing the algorithm of Section 4.1.3 is flawed, using the invalid concept of dominance numbers. It will fail in some cases, although the flaw does not affect the example.

(a) Standard switches:

do {
    edx_1 = edx;
    edx = edx_1 - 1;
    st = st * (float)edx_1 / (float)esi;
    esi = esi - 1;
    ebx = ebx - 1;
} while (ebx >= 0);
edx = (int)st;

(b) With dominator heuristics:

do {
    st = st * (float)edx;
    edx = edx - 1;
    st = st / (float)esi;
    esi = esi - 1;
    ebx = ebx - 1;
} while (ebx >= 0);
edx = (int)st;

Figure 7.11: The code of Figure 4.11 with limited propagation.

7.4 Preserved Parameters

Preserved locations usually appear to be parameters when they are not, but sometimes a preserved location really is a parameter. Propagation and dead code elimination in combination solve this problem.

Figure 7.12 shows the x86 assembly language for a simple function that takes one parameter and returns a result, where the parameter is a register that is saved, used as a parameter, overwritten, and restored.

# parameter is passed in register %ebx


twice:
push %ebx # save ebx
mov %ebx, %eax # copy parameter to return register, %eax
addl %eax, %eax # double eax
mov $55, %ebx # usually preserved registers are overwritten
pop %ebx # restore ebx
ret
Figure 7.12: Assembly language source code for part of the Boomerang
restoredparam test.

Figure 7.13 shows the intermediate representation for the program, after expression propagation, but before dead code elimination. Note how the save instruction becomes dead code, the restore instruction becomes a null statement, and the use of the parameter in statement 3 has no definition (hence the zero subscript).

After dead code elimination, the only statement remaining is the return statement. The

decompiled output is correct, as shown in Figure 7.14.



1 esp := esp0 - 4 // Decrement the stack pointer


2 m[esp0 -4] := ebx0 // Save ebx (result no longer used)
3 eax := ebx0 // Copy parameter to result register
4 tmp1 := ebx0
5 eax := ebx0 + ebx0 // Double result (also in return statement)
6 %flags := ADDFLAGS32( ... )
7 ebx := 55 // Overwrite ebx (not used)
8 ebx := ebx0 // Restore ebx (not used)
9 esp := esp0
10 %pc := m[esp0]0
11 esp := esp0 + 4
12 RET eax := ebx0 + ebx0 // Return result

Figure 7.13: Intermediate representation for the code of Figure 7.12, just before
dead code elimination.

int twice(int param1) {
    return param1 + param1;
}

Figure 7.14: Boomerang output for the code of Figure 7.12. The
parameter and return are identified correctly.

Figure 7.15 shows the source code for a Fibonacci function, and Figure 7.16 shows the disassembly of an MS-DOS compilation of it. It is a modified version of a test program for the dcc decompiler [Cif94]. Where the original program saved the SI register with a PUSH instruction and then loaded the parameter from the stack, this version passes the parameter in SI yet still PUSHes and POPs the register. The register SI is therefore both "preserved" and a parameter. The resultant program still runs on a 32-bit personal computer, despite the age of the instruction set (from 1978), and performs the same operation as the original program in fewer instructions (not counting NOPs). It is therefore a valid program that machine code decompilers should be able to generate valid source code for.

unsigned fib(x) /* compute fibonacci number recursively */


int x;
{
if (x > 2)
return (fib(x - 1) + fib(x - 2));
else
return (1);
}

Figure 7.15: Original source code for a Fibonacci function. From [Cif94].

035B 55 PUSH BP
035C 8BEC MOV BP,SP
035E 56 PUSH SI
035F 909090 NOP ; Was MOV SI,[BP+4]
0362 83FE02 CMP SI,+02
0365 7E1C JLE 0383
0367 4E DEC SI ; Was MOV AX,SI
0368 9090 NOP ; Was DEC AX
036A 90 NOP ; Was PUSH AX
036B E8EDFF CALL 035B
036E 5E POP SI ; Was POP CX; get original parameter
036F 56 PUSH SI
0370 50 PUSH AX
0371 83C6FE ADD SI,-02 ; Was MOV AX,SI
0374 90 NOP ; Was ADD AX,-2
0375 90 NOP ; Was PUSH AX
0376 E8E2FF CALL 035B
0379 90 NOP ; Was POP CX
037A 8BD0 MOV DX,AX
037C 58 POP AX
037D 03C2 ADD AX,DX
037F EB07 JMP 0388
0381 EB05 JMP 0388
0383 B80100 MOV AX,0001
0386 EB00 JMP 0388
0388 5E POP SI
0389 5D POP BP
038A C3 RET

Figure 7.16: Disassembly of the modified Fibonacci program adapted from [Cif94].

As shown in Figures 7.17 and 7.18(a), the dcc and REC decompilers do not produce valid source code. Neither decompiler identifies any parameters, although REC passes two arguments to one call and none to the other. Both emit invalid C code for POP instructions.

The Boomerang decompiler uses propagation and dead code elimination to identify the preserved parameter.1 The result of decompiling the 32-bit equivalent of the program of Figure 7.16 is shown in Figure 7.18(b). This program also demonstrates the removal of unused parameters and returns, in the presence of recursion.

1 The 32-bit version was not completely equivalent. The 32-bit version originated from the source code of Figure 7.19. The two minor differences cause fib(0) to return 0, as is usually considered correct.

int proc_1 ()
/* Takes no parameters.
* High-level language prologue code.
*/
{
int loc1;
int loc2; /* ax */
if (loc1 > 2) {
loc1 = (loc1 - 1);
POP loc1
loc1 = (loc1 + 0xFFFE); /* 0xFFFE = -2 */
loc2 = (proc_1 () + proc_1 ());
} else { ...
Figure 7.17: Output from the dcc decompiler for the program of Figure 7.16.

(a) REC:

L0000025b()
{
    /* unknown */ void si;
    if(si <= 2) {
        ax = 1;
    } else {
        si = si - 1;
        (restore)si;
        si = si - 2;
        dx = L0000025B(
            L0000025B(), si);
        (restore)ax;
        ax = ax + dx;
    }
}

(b) Boomerang (32-bit equivalent):

int fib(int param1) {
    __size32 eax;
    __size32 eax_1;    // eax30
    int local2;        // m[esp - 12]
    if (param1 <= 1) {
        local2 = param1;
    } else {
        eax = fib(param1 - 1);
        eax_1 = fib(param1 - 2);
        local2 = eax + eax_1;
    }
    return local2;
}

Figure 7.18: Output from the REC and Boomerang decompilers for the program of Figure 7.16 and its 32-bit equivalent respectively.

7.5 Preservation

Most components of the preservation process are facilitated by the SSA form.

The Boomerang machine code decompiler has an equation solver. It was added when it

was found necessary to determine whether registers are preserved or not in the presence

of recursion. Figure 7.21 shows the output of Boomerang's solver, nding that esi (an
x86 register) is preserved in the recursive Fibonacci function of Figures 7.19 (source

code) and 7.20 (IR).



int fib (int x)


{
if (x > 1)
return (fib(x - 1) + fib(x - 2));
else return (x);
}
Figure 7.19: Source code for a slightly different Fibonacci function.

It begins with the premise esi35 = esi0 (i.e. esi as last defined is the same as esi on entry; the return statement contains information about which locations reach the exit). Various rules are applied to the current equation: propagation into the left hand side, adding constants to both sides, using the commutation property of equality (swap left and right sides), and so on. For example on the second line, the LHS is esi35 (esi as defined at line 35). In Boomerang, the subscript on esi is actually a pointer to the statement at line 35, so again the propagation is very easy. The RHS has esi0, and the proof engine actually searches for statements (such as statement 5) which save the value of esi0. On the fifth line, m[] (the memory-of operator) is removed from both sides (i.e. to prove that m[a] = m[b], it is sufficient to show that a = b). When the left hand side could be propagated to by a φ-function, the current equation has to be proved for each operand. In the example, the current equation becomes ebp46 = (esp4 + 4). Statement 46 is a φ-function (46: ebp := φ(ebp26, ebp3)). In effect, this expression represents two separate equations to be solved: ebp26 = (esp4 + 4), and ebp3 = (esp4 + 4). (Control flow could be such that ebp could be defined at statement 26, or at statement 3, so it is necessary to prove both subproblems in order to prove the premise, i.e. that esi is preserved through the whole function.) Each of these equations is proved separately, thereby proving the premise.

Without the SSA form, it would be necessary to separate various versions of each location, such as "esi on entry", "esi at the procedure exit", and "esi after statement 26".
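The control structure of such a proof engine can be sketched as follows. This is a deliberate simplification with hypothetical names, not Boomerang's implementation: premises are modelled as small integers, the effect of one rewrite rule is precomputed in next[], and a premise whose left hand side is defined by a φ-function forks into one subproblem per operand:

enum { PROVED = -1, FAILED = -2 };

#define NPREM 16
int next[NPREM];                  /* premise after applying one rewrite rule;
                                     rewrite chains are assumed to terminate */
int isphi[NPREM];                 /* LHS of premise is defined by a phi-function */
int opnd1[NPREM], opnd2[NPREM];   /* the two phi operand subproblems */

int prove(int p) {
    while (p != PROVED && p != FAILED) {
        if (isphi[p])             /* prove the equation for each operand */
            return prove(opnd1[p]) && prove(opnd2[p]);
        p = next[p];              /* propagate into LHS, simplify, commute, ... */
    }
    return p == PROVED;
}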

7.5.1 Conditional Preservation Analysis


Preservation analysis in the presence of mutual recursion is complex, as this example

shows.

A test program was written to exercise the conditional preservation analysis of Sec-

tion 4.4.2 on page 130. Figure 7.22 shows the call graph for the program, deliberately

designed to copy Figure 4.26 on page 129. An outline of the test program's source code

is given in Figure 7.23.



08048798 1 m[esp0 - 4] := ebp0


2 esp := esp0 - 4
08048799 3 ebp := esp0 - 4
0804879b 4 m[esp0 - 8] := esi0
5 esp := esp0 - 8
0804879c 6 m[esp0 - 12] := ebx0
7 esp := esp0 - 12
0804879d 8 ebx := m[esp0 + 4]
080487a0 9 tmp1 := ebx8 - 1
080487a3 11 BRANCH 0x80487c0, ebx8 <= 1
080487a5 12 eax := ebx8 - 1
080487a8 13 m[esp0 - 16] := ebx8 - 1
14 esp := esp0 - 16
080487a9 15 esp := esp0 - 20
18 <all> := CALL fib(<all>)
080487ae 19 esi := eax18
080487b0 20 eax := ebx18 - 2
080487b3 21 m[esp0 - 20] := ebx18 - 2
22 esp := esp0 - 20
080487b4 23 esp := esp0 - 24
26 <all> := CALL fib(<all>)
080487b9 27 tmp1 := eax26
28 eax := eax26 + esi26
080487bb 30 GOTO 0x80487c2
080487c0 31 eax := ebx8
00000000 43 eax := φ(eax28 , eax31 )
44 ebx := φ(ebx26 , ebx8 )
45 esp := φ(esp26 , esp7 )
46 ebp := φ(ebp26 , ebp3 )
47 esi := φ(esi26 , esi0 )
48 tmp1 := φ(tmp127 , tmp19 )
080487c2 32 esp := ebp46 - 8
080487c5 33 ebx := m[ebp46 - 8]
34 esp := ebp46 - 4
080487c6 35 esi := m[ebp46 - 4]
36 esp := ebp46
080487c7 37 esp := ebp46
38 ebp := m[ebp46 ]
39 esp := ebp46 + 4
080487c8 41 esp := ebp46 + 8
42 RET eax := eax43

Figure 7.20: IR for the Fibonacci function of Figure 7.19.

The global variable res is used to ensure that each procedure is called the correct number of times. Each procedure increments res by a prime number, so that if the program outputs the correct total, the control flow is very likely correct. Global variables with

attempting to prove esi is preserved by fib        The problem
esi35 = esi0                                       The premise
m[esp34] = esi0                                    Propagate into the left hand side (LHS)
m[esp34] = m[esp4]                                 Use 5: m[esp4] := esi0 on RHS
esp34 = esp4                                       Remove m[] from both sides
(esp32 + 4) = esp4                                 Propagate into LHS
esp32 = (esp4 + -4)                                Subtract 4 from both sides
esp32 = (esp4 - 4)                                 a + -b → a - b
(ebp46 - 8) = (esp4 - 4)                           Propagate into LHS again
ebp46 = ((esp4 - 4) + 8)                           Add 8 to both sides
ebp46 = (esp4 + 4)                                 Simplify
found φ(ebp26, ebp3), prove for each               Note 46: ebp := φ(ebp26, ebp3)
proving for ebp26 = (esp4 + 4)                     First φ-function operand
ebp26 = (esp4 + 4)                                 New premise
using proven (or induction) for fib: ebp = ebp     We have earlier proved that ebp is preserved
ebp18 = (esp4 + 4)                                 Therefore can propagate across call
using proven (or induction) for fib: ebp = ebp     There is another recursive call
ebp3 = (esp4 + 4)                                  Propagate across this call
esp1 = (esp4 + 4)                                  Propagate into LHS
(esp0 - 4) = (esp4 + 4)                            Propagate into LHS
esp0 = ((esp4 + 4) + 4)                            Add 4 to both sides
esp0 = (esp4 + 8)                                  Simplify
(esp4 + 8) = esp0                                  Commute
esp4 = (esp0 + -8)                                 Subtract 8 from both sides
esp4 = (esp0 - 8)                                  a + -b → a - b
(esp1 - 4) = (esp0 - 8)                            Substitute LHS
esp1 = ((esp0 - 8) + 4)                            Add 4 to both sides
esp1 = (esp0 - 4)                                  Simplify
(esp0 - 4) = (esp0 - 4)                            Substitute LHS
true                                               LHS = RHS!
proving for ebp3 = (esp4 + 4)                      Second φ-function operand
ebp3 = (esp4 + 4)                                  New premise
esp1 = (esp4 + 4)                                  Substitute LHS
(esp0 - 4) = (esp4 + 4)                            Substitute LHS
esp0 = ((esp4 + 4) + 4)                            Add 4 to both sides
esp0 = (esp4 + 8)                                  Simplify
(esp4 + 8) = esp0                                  Commute
esp4 = (esp0 + -8)                                 Subtract 8 from both sides
esp4 = (esp0 - 8)                                  a + -b → a - b
(esp1 - 4) = (esp0 - 8)                            Substitute LHS
esp1 = ((esp0 - 8) + 4)                            Add 4 to both sides
esp1 = (esp0 - 4)                                  Simplify
(esp0 - 4) = (esp0 - 4)                            Substitute LHS
true                                               LHS = RHS
true                                               Now proved for each φ-function operand

Figure 7.21: Debug output from Boomerang while finding that esi (register esi) is
preserved (saved and restored).

[Call graph diagram: main calls b, and b calls c; c calls d, f, h, j, and l, forming recursion cycles including b-c-l-b, c-d-e-c, c-j-k-e-c, and f-g-f. Legend: registers ecx and edx are exchanged before and after the marked calls, and edx is decremented in k after the call to e.]

Figure 7.22: Call graph for the Boomerang test program test/pentium/recursion2.

/* Global affected as a side effect from all calls */
int res = 0;
/* Number of times to traverse b to c edge */
int b_c = 3;
int c_d = 3;
...
int l_b = 3;

int main(int argc) {
    b();
    printf("ecx is %d, edx is %d\n", 0, 0);
    printf("res is %d\n", res);
    return 0;
}

void b() {
    if (--b_c >= 0) c();
    res += 2;
}

void c() {
    if (--c_d >= 0) d();
    if (--c_f >= 0) f();
    if (--c_h >= 0) h();
    if (--c_j >= 0) j();
    if (--c_l >= 0) l();
    res += 3;
}

...

void k() {
    if (--k_e >= 0) e();
    res += 27;
}

void l() {
    if (--l_b >= 0) b();
    res += 29;
}

Figure 7.23: An outline of the source code for the program test/pentium/recursion2 from the Boomerang test suite.

names of the form x_y are used to control the recursion from procedure x to procedure y. All such globals are initialised to 3; the program outputs "res is 533". To make this test more difficult for the conditional preservation analysis, there are instructions to exchange the contents of machine registers ecx and edx at the points indicated on Figure 7.22, and a single instruction to decrement register edx is placed in procedure k after the call to e. This results not only in edx changing value, but also in register ecx, since there is a path in the call graph where there is an odd number of exchange instructions (c-d-e-c).

The first cycle detected is f-g-f, when g's child f is found in path. The first preservation to succeed is esp = esp+4. As with most x86 procedures, the stack pointer for g is preserved in the base pointer register (ebp). Because of the control flow join at the end of g, there is a φ-statement defining the final value of ebp, so the proof has to succeed for both paths. One path is through the call to f, so the preservation of ebp in g depends on the preservation of ebp in f. f has a similar structure to g, so again there is a φ-statement, the proof has to succeed for both paths, and now the preservation of ebp in f depends on the preservation of ebp in g. Note that the algorithm has now come almost full circle: esp in g depends on ebp in g, ebp in g depends on ebp in f, and ebp in f depends on ebp in g. Since preservation that depends only on recursive calls can be assumed to succeed, ebp is indeed assumed to be preserved in g (note: this is not yet proven; it is a conditional result). No other problems are found with this preservation, so finally esp = esp+4 is proven. Note that the intermediate conditional results (ebp=ebp in g and f) are not stored, since at the time when the proof function exits, it was not known whether the outer proof would succeed or fail. This is illustrated in the next example.

The final example is the preservation ecx=ecx in b. Recall that there are instructions to exchange the values of these registers before and after calls in the c-j-k-e-c, b-c-l, and c-d-e-c cycles. Due to these, preservation of ecx in b depends on edx in c, which depends on ecx in d, which depends on edx in e, which depends on ecx in c. Note that while the algorithm is considering procedure c again, it requires a different preservation this time, so the process continues. ecx in c depends on edx in d, which depends on ecx in e, which depends on edx in c. Finally, there is a required premise that is already being tested. As a result, the preservation of edx in c can be conditionally assumed. As will be shown soon, this does not turn out to be true. In a similar vein, the preservations ecx in e and edx in d are assumed to be conditionally proven. However, c has other calls, so ecx in c, which depended on edx in d and has conditionally passed, now depends on edx in j, which depends on ecx in k, which depends on edx in e. This is another premise, which is conditionally assumed to succeed, and similarly edx for j. There is still one more call in c, to l, so ecx in c also depends on edx in l. This depends on ecx in b, the original premise, and so conditionally succeeds. Finally, ecx in c succeeds conditionally, leading to conditional successes for edx in e and ecx in d. The algorithm is now considering edx in c, which depends on ecx in j, and in turn edx in k. This depends on ecx in e, which has been calculated and conditionally proven before, but is calculated anew. After the call to k is the decrement of edx, so one critical path of the whole proof finally fails, since edx0 = edx0 - 1 cannot be proven. As a result, ecx is assumed to be assigned to in the call to c, and ecx remains a parameter and return of b, c, and all the other procedures involved in the cycle. Similarly, edx is found not to be preserved, and so remains a parameter and return of those functions also.
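The structure of this conditional reasoning can be sketched as follows; the representation (premises as small integers, dependences as a matrix) is hypothetical and is not Boomerang's implementation. A premise already on the proof stack is conditionally assumed to hold, while a direct failure such as edx0 = edx0 - 1 propagates back through every premise that depends on it:

#define NPREM 16
int dep[NPREM][NPREM];   /* dep[i][j]: premise i depends on premise j */
int fails[NPREM];        /* premise i is directly disproved */
int onstack[NPREM];      /* premise i is currently being tested */

int prove(int i) {
    int j, ok = 1;
    if (onstack[i]) return 1;     /* already being tested: conditional assumption */
    if (fails[i])   return 0;     /* e.g. edx0 = edx0 - 1 cannot be proven */
    onstack[i] = 1;
    for (j = 0; j < NPREM && ok; j++)
        if (dep[i][j])
            ok = prove(j);
    onstack[i] = 0;
    return ok;
}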

//__size32 b(__size32 param1, __size32 param2) {


pair b(__size32 param1, __size32 param2) {
__size32 ecx;
__size32 edx_1; // edx22
pair pr;
b_c = b_c - 1;
if (b_c >= 0) {
//ecx = c(param2, param1); /* Warning: also results in edx_1 */
pr = c(param2, param1);
ecx = pr.first; edx_1 = pr.second;
param2 = ecx;
param1 = edx_1;
}
res += 2;
//return param1; /* WARNING: Also returning: edx := param2 */
pr.first = param1; pr.second = param2;
return pr;
}

Figure 7.24: The code generated for procedure b for the program
test/pentium/recursion2. The Boomerang -X option was used to remove
extraneous variables, as discussed in Section 7.3. The code has been modified by
hand (underlined code) to return more than one location.

Figure 7.24 shows an outline of the code generated for procedure b. Since Boomerang did not at the time handle more than one return from a procedure, some hand editing of the output was necessary. The final program compiled and ran identically to the original.

7.6 Redundant Parameters and Returns

The techniques of Section 4.4.3 successfully remove redundant parameters and returns

in a test program, improving the readability of the generated code.

Figure 7.25 shows assembly language code for the program of Figure 7.19. A few instructions were modified so that registers ecx and edx were used and defined in the function. Without the techniques of Section 4.4.3 on page 134, and assuming that

fib:
pushl %ebp
movl %esp, %ebp
pushl %ebx
subl $4, %esp
cmpl $1, 8(%ebp)
jle .L2
movl 8(%ebp), %eax
decl %eax
subl $12, %esp
pushl %eax
call fib
addl $16, %esp
push %edx # Align stack, and make a ...
push %edx # ... use of %edx (but it is dead code)
push %eax # Save intermediate result
movl 8(%ebp), %eax
subl $2, %eax
pushl %eax
call fib
pop %edx # Remove argument, assign to edx
pop %ecx # Get intermediate result, assign to ecx
addl $8, %esp
addl %eax, %ecx # Add the two intermediate results
movl %ecx, -8(%ebp)
jmp .L4
.L2:
movl 8(%ebp), %eax
movl %eax, -8(%ebp)
.L4:
movl -8(%ebp), %eax
movl -4(%ebp), %ebx
leave
ret
Figure 7.25: Assembler source code for a modification of the Fibonacci program
shown in Figure 7.19. The underlined instructions assign values to registers ecx
and edx.

push and pop instructions are not simply ignored, these two registers become extra

parameters and returns of the fib function, as shown by the IR in Figure 7.26.

Although the assignment to edx at statement 44 is never actually used, it appears to be used by the φ-function in statement 68, which is in turn used by the return statement. The edx return from fib is used to pass an argument to the call at statement 43, and the edx parameter is used by the φ-function in statement 68.



void fib(int param10 , int ecx0 , /*signed?*/int edx0 )


00000000 0 param10 := -
0 ecx0 := -
0 edx0 := -
080483bb 11 BRANCH 0x80483e8, condition param10 <= 1
080483c5 24 {eax24 , ecx24 , edx24 , esp24 } := CALL
fib(param10 -1, ecx0 , edx0 )
080483d7 43 {eax43 , esp43 } := CALL fib(param10 -2, ecx24 , edx24 )
080483dc 44 edx44 := param10 - 2
080483e1 52 ecx52 := eax24 + eax43
080483e3 54 m[esp0 -12] := eax24 + eax43
080483e6 55 GOTO 0x80483ee
080483eb 57 m[esp0 -12] := param10
00000000 67 ecx67 := φ(ecx52 , ecx0 )
68 edx68 := φ(edx44 , edx0 )
78 m[esp0 -12]78 := φ(m[esp0 -12]54 , m[esp0 -12]57 )
080483f5 65 RET eax := m[esp0 -12]78 , ecx := ecx67 , edx := edx68
Figure 7.26: IR part way through the decompilation of the program of Figure
7.25.

Similar comments apply to the assignment to ecx at statement 52, although in this case, that assignment is gainfully used. Along the path where param10 > 1, ecx is defined before use, and along the path where param10 ≤ 1, it is only used in the φ-function in statement 67, which is only ultimately used to pass a redundant parameter to fib.

The techniques of Section 4.4.3 were able to determine that ecx and edx were redundant
parameters and returns, resulting in the generated code of Figure 7.27.
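One way to picture this analysis is as a fixed point over the use-def web: a definition is retained only if a chain of uses from it reaches a "real" use, one that is not merely a call argument, φ-function operand, or return location. The sketch below uses a hypothetical matrix representation, not Boomerang's data structures:

#define NSTMT 16
int uses[NSTMT][NSTMT];  /* uses[i][j]: statement i uses the definition at j */
int real[NSTMT];         /* statement i is a "real" use */
int needed[NSTMT];       /* definition at statement i must be kept */

void markNeeded(void) {
    int changed = 1, i, j;
    while (changed) {
        changed = 0;
        for (i = 0; i < NSTMT; i++)
            for (j = 0; j < NSTMT; j++)
                if (uses[i][j] && (real[i] || needed[i]) && !needed[j]) {
                    needed[j] = 1;
                    changed = 1;
                }
    }
}

Definitions such as those of ecx and edx above, whose uses all lead back to redundant parameters and returns, are never marked, and so can be removed.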

int fib(int param1) {


int eax;
int eax_1; // eax30
int local2; // m[esp - 12]
if (param1 <= 1) {
local2 = param1;
} else {
eax = fib(param1 - 1);
eax_1 = fib(param1 - 2);
local2 = eax + eax_1;
}
return local2;
}
Figure 7.27: Generated code for the program of Figures 7.25 and 7.26. Redundant
parameters and returns have been removed.
Chapter 8

Conclusion

8.1 Conclusion

The solutions to several problems with existing machine code decompilers are facilitated

by the use of the SSA form.

The main problems for existing machine code decompilers were found to be the identification of parameters and returns, type analysis, and the handling of indirect jumps and calls.

The IR used by a decompiler has a profound influence on the ease with which certain

operations can be performed. In particular, the propagation of registers is greatly facil-

itated by the SSA form, and propagation is fundamental to transforming the semantics

of many individual instructions into the complex expressions typical of source code. In

fact, unlimited propagation leads to expressions that are too complex, but appropriate

heuristics can produce expressions of appropriate complexity.

Propagating assignments of condition codes into their uses (e.g. conditional branches) is

a special case, where the combination of condition code and branch (or other) condition

results in specific semantics.

Once expressions are propagated, the original definitions become dead code, and can

be eliminated. This leads to the second fundamental transformation of machine code

to high level language: reducing the size of the generated output. The established

techniques of control ow structuring introduce high level constructs such as conditional

statements and loops.

The next major decompilation step is the accurate recovery of call statements: argu-

ments and parameters, returns and results. One of the main inputs to this process

is preservation analysis: deciding whether a particular register or memory location is


modified by the target of a call, or is preserved. In the main, this is solved by standard data flow analysis, but the combination of indirect calls and recursion causes problems,

since the results of some calls will not be known when needed. An algorithm for solving

this problem is given.

While the SSA form makes the propagation of registers very easy, it is not safe to

propagate all memory expressions because of the possibility of aliasing. Compilers

can sidestep this issue by never moving load and store operations, although some will

attempt to move them when it can be proved to be safe. The cost of not moving

a memory operation is small for a compiler; a possible optimisation opportunity is

lost. For decompilers, the cost of not being able to propagate memory expressions is a

considerable loss of readability. In particular, all stack local variables, including arrays,

should be converted to variables in the decompiled output. Without this conversion, the

stack pointer will be visible in the decompiled output, and the output will be complex

and may not be portable. A solution to the problem is given, but the solution ignores

some aliasing possibilities.

Few if any existing decompilers attempt type recovery (type analysis) to any serious

degree. This thesis outlines a type analysis implemented with an iterative, data flow

based framework. The SSA form provides convenient, sparse type information stor-

age with each definition and constant. A problem strongly related to type analysis is

the partitioning of the data sections (global, local, and heap allocated) into objects of

distinct types. This problem is made more complex by the widespread practice of colo-

cating variables in the same register or memory location. Compilers use a surprisingly

large number of different memory expression patterns to access objects of elementary

types, and the elements of aggregate types. These patterns are explored in detail, but

the resolution of some of the more complex patterns remains for future work. In par-

ticular, the interpretation of the constant K associated with many memory expressions

is complicated by factors such as offset pointers and arrays with nonzero lower bounds.

The usual way to handle indirect jump and call instructions is to perform a pattern-

based local search near the instruction being analysed, usually within the basic block

that the indirect instruction terminates. This search is performed at decode time,

before any more powerful techniques are available. The logic is that without resolving

the indirect jump or call, the control flow graph is incomplete, and therefore the later

analyses will be incorrect. While this is true, it is possible to delay analysis of these

indirect instructions until the more powerful analyses are available. The analysis of

the whole procedure has to be restarted after the analysis of any indirect branches,

incorporating the new indirect branch targets. This step may be repeated several times

if the indirect branches are nested. The same may be required after indirect calls. The

power of expression propagation, facilitated by the SSA form, can be used to improve

the analysis, e.g. it may well succeed even when several basic blocks have to be analysed.

The resolution of VFT pointers is shown to be simpler using this technique also.

Many of the techniques introduced in this thesis have been verified by a working machine

code decompiler.

8.2 Summary of Contributions

This thesis advances the state of the art of machine code decompilation through several

key contributions.

The first two chapters established the limitations of current machine code decompilers. Table 1.2 compared the fundamental problems of various tools related to machine code decompilers. Figure 1.7 identified where the losses and separations that cause these problems arise.

problems arise.

Chapter 3 demonstrated significant advantages for the SSA form as a decompiler IR, by simplifying expression propagation, recovery of parameters and returns, the analysis of indirect branches and calls, and enabling a sparse data flow based type analysis.

The SSA implementation of a pointer from each use to its definition saves having to rename the original definition in many cases. Compilers as well as decompilers could

make use of this contribution.

The power of expression propagation, and its importance to the overall decompilation process, was shown, along with the necessity of limiting some propagations.

In a machine code decompiler, summarising the effects of calls is important. Section 3.4 defined appropriate terminology and equations for deriving the required summary information.

While compilers can combine translation out of SSA form with register allocation, in

decompilers, it is important to minimise the number of generated variables. Sections

4.1.2 and 4.1.3 showed how this can be done.

Although the SSA form makes it easy to propagate registers, the propagation of memory expressions is more difficult because of the possibility of aliasing. It is important to propagate as many classes of memory expressions as is safe, to improve the readability of the generated code. Section 4.2 identified the issues and gave a solution.

Unless it is assumed that all procedures follow the ABI specification, preservation analysis is important for correct data flow information. Section 4.3 defined the issues and showed how the SSA form simplifies the problem. Section 4.4 showed how recursion

causes extra problems, and gave a solution for preservation analysis in the presence of

recursion.

Chapter 5 gave the first application of an iterative, data flow based type analysis for

machine code decompilers. In such a system, addition and subtraction are not readily

handled, but Section 5.7.4 provided a solution.

The memory expression patterns representing accesses to aggregate elements are sur-

prisingly complex. These were explored in detail in Section 5.8.

Chapter 6 showed how the analysis of indirect jumps and calls can be delayed until more

powerful analyses are available, and the implications of working with an incomplete

control flow graph.

Section 6.2.2 showed how Fortran style assigned gotos can be analysed and represented

in a language such as C.

In Section 6.3.1, it was shown that less work is involved in the identification of VFT

pointers when using an IR such as the SSA form.

8.3 Future Work

While the state of the art of decompilation has been extended by the techniques described

in this thesis, work remains to optimise an SSA-based IR for decompilers that handles

aggregates well, and supports alias and value analyses.

At the end of Chapter 3 it was stated that an IR similar to the SSA form, but which

stores and factors def-use as well as use-def information, could be useful for decompila-

tion; the dependence flow graph (DFG) may be one such IR.

In several places, the need for value (range) analysis has been mentioned. This would

help with function pointers and their associated indirect call instructions, interpreting

the meaning of K in complex type patterns, and would allow the declaration of initial

values for arrays accessed with running pointers.

Alias analysis is relatively common in compilers, and would be very beneficial for de-

compilers. However, working from source code (including from assembly language) has

one huge advantage: when a memory location has its address taken, it is immediately

obvious from the source code. This makes escape analysis practical; it is possible to

say which variables have their addresses escape the current procedure, and it is safe to

assume that no other variables have their addresses escape. At the machine code level,

however, it is not obvious whether an expression represents the address of an object,

and even if this is deduced by analysis, it is not always clear which object's address is

taken. Hence, for alias and/or escape analysis to be practical for machine code decom-

pilation, it probably needs to be combined with value analysis. These analyses would

also help with the problem of separating colocated variables.

One area that emerged from the research that remains for future work is the possibility

of partitioning the decompilation problem so that parts can be run in parallel threads

or on multiple machines. Most of the work of decompiling procedures not involved in a

recursion group is independent of the decompilation of other procedures. It seems likely

that following this stage, interprocedural analyses may be more difficult to partition effectively.

For convenience, a standard stack pointer register is usually assumed. In many architec-

tures it is possible to use almost any general purpose register as a stack pointer, and a

few architectures do not specialise one register at all as the stack pointer. In these cases,

the ABI will specify a stack pointer, but it will be a convention not mandated by the

architecture. Programs compiled with a nonstandard stack pointer register are likely

to produce essentially meaningless output, since the stack pointer is so fundamental to

most machine code. The ability to be "stack pointer register agile" could therefore be a significant design challenge, left for future work.

With the beginnings of the ability to decompile C++ programs comes the question of

how to remove automatically generated code, such as constructors, destructors, and

their calls. Presumably, these functions could be recognised from the sorts of things

that they do, and their calls suppressed. The fallback is simply to declare them as

ordinary functions, and to not remove the calls. The user or a post decompilation

program could remove them if desired. Other compiler generated code such as null

pointer checking, debug code such as initialising stack variables to 0xCCCCCCCC and

the like, are potentially more difficult to remove automatically, since they could have

been part of the original source code.

There is also the question of what to do with exception handling code. Most of the time,

very little exception specific code is visible in the compiled code by following normal control flow; often there are only assignments to a special stack variable representing

the current exception zone of the procedure. These assignments appear to be dead

code. Finding the actual exception handling code is compiler specific, and hence will

require support from the front end.

Structuring of control flow has been considered to be a solved problem. However, it is possible that some unusual cases, such as compound conditionals (e.g. p && q) in loop predicates, may require some research to handle effectively.

In decompilation, the issue of phase ordering (what order to apply transformations) is

very significant; almost everything seems to be needed before everything else. There

may be some research opportunities in designing decompiler components in such a way

that these problems can be minimised.

As of 2007, there is an open source decompiler (Boomerang) that is capable of decom-

piling small machine code programs into readable, recompilable source code. More than

one source architecture is supported. The goal of a general machine code decompiler

has been shown to be within reach, although much work remains.


Bibliography

[AC72] F. Allen and J. Cocke. Graph theoretic constructs for program control flow analysis. Technical Report RC 3923 (17789), IBM T.J. Watson Research Center, 1972.

[ADH+01] G. Aigner, A. Diwan, D. Heine, M. Lam, D. Moore, B. Murphy, and C. Sapuntzakis. An overview of the SUIF2 compiler infrastructure. Technical report, Computer Systems Laboratory, Stanford University, 2001. Retrieved Apr 2007 from http://suif.stanford.edu/suif/suif2/doc-2.2.0-4.

[AF00] A. Appel and A. Felty. A semantic model of types and machine instructions

for proof-carrying code. In Proceedings of the SIGPLAN Conference on

Programming Language Design and Implementation. ACM Press, 2000.

[All70] F. Allen. Control flow analysis. SIGPLAN Notices, 5(7):1-19, July 1970.

[Ana01] Anakrino web page, 2001. Retrieved Mar 2007 from http://www.saurik.com/net/exemplar.

[And04] Andromeda decompiler web page, 2004. Retrieved Mar 2007 from http://shulgaaa.at.tut.by.

[App02] A. Appel. Modern compiler implementation in Java, chapter 19. Cam-

bridge University Press, 2002.

[ASU86] A. Aho, R. Sethi, and J. Ullman. Compilers: Principles, Techniques, and

Tools, chapter 10. Adison-Wesley, 1986.

[Ata92] Atari Games Corp. v. Nintendo, 1992. 975 F.2d 832 (Fd. Cir. 1992).

[Bal98] T. Ball. Reverse engineering the twelve days of christmas, 23rd December 1998. Retrieved June 2006 from http://research.microsoft.com/~tball/papers/XmasGift/final.html.


[Bar74] P. Barbe. The Piler system of computer program translation. Technical report, Probe Consultants Inc., September 1974. See also [VE03].

[Bar98] R. Barták. On-line guide to constraint programming, 1998. Retrieved Mar 2004 from http://kti.ms.mff.cuni.cz/~bartak/constraints/.

[BCHS98] P. Briggs, K. Cooper, T. Harvey, and L. Simpson. Practical improvements to the construction and destruction of static single assignment form. Software - Practice and Experience, 28(8):859-881, July 1998.

[BDB00] V. Bala, E. Duesterwald, and S. Banerjia. Dynamo: A transparent dynamic optimization system. In Proceedings of the SIGPLAN Conference on Programming Language Design and Implementation, pages 1-12, 2000.

[BDMP05] L. Bigum, H. Davidsen, L. Mikkelsen, and E. Pedersen. Detection and avoidance of φ-loops, 2005. Project report, Aalborg University. Retrieved Mar 2006 from http://www.cs.aau.dk/library/cgi-bin/detail.cgi?id=1117527263.

[BM00] I. Baxter and M. Mehlich. Reverse engineering is reverse forward engineering. Science of Computer Programming, 36:131-147, 2000.

[Boo02] Boomerang web page. BSD licensed software, 2002. Retrieved May 2005 from http://boomerang.sourceforge.net.

[BR04] G. Balakrishnan and T. Reps. Analyzing memory accesses in x86 executables. In Proceedings of the International Conference on Compiler Construction, pages 5-23. Springer, April 2004. LNCS 2985.

[BR05] G. Balakrishnan and T. Reps. Recovery of variables and heap structure in x86 executables. Technical Report #1533, University of Wisconsin Madison, July 2005.

[BR06] G. Balakrishnan and T. Reps. Recency-abstraction for heap-allocated storage. In 13th Annual Static Analysis Symposium, pages 221-239. Springer, 2006. LNCS 4136.

[Bri81] D. Brinkley. Intercomputer transportation of assembly language software through decompilation. Technical report, Naval Underwater Systems Center, October 1981.

[BRMT05] G. Balakrishnan, T. Reps, D. Melski, and T. Teitelbaum. WYSINWYX: What You See Is Not What You eXecute. In Proceedings of Verified Software: Theory, Tools and Experiments, 2005.



[BS93] J. Bowen and V. Stavridou. Safety-critical systems, formal methods and standards. IEE/BCS Software Engineering Journal, 8(4):176-187, July 1993.

[BT01] BinaryTranslation wiki page, 2001. Retrieved May 2005 from http://www.program-transformation.org/Transform/BinaryTranslation.

[Byr92] E. Byrne. A conceptual foundation for software re-engineering. In Proceedings of the International Conference on Software Maintenance, pages 226-235. IEEE-CS Press, 1992.

[CAI01] Canadian Association for Interoperable Systems submission on copyright issues, 2001. Retrieved Mar 2002 from http://strategis.ic.gc.ca/SSG/rp00326e.html.

[Cap98] G. Caprino. REC - Reverse Engineering Compiler. Binaries free for any use, 1998. Retrieved Jan 2003 from http://www.backerstreet.com/rec/rec.htm.

[CC76] P. Cousot and R. Cousot. Static determination of dynamic properties of programs. In Proceedings of the 2nd International Symposium on Programming, pages 106-130, Paris, France, 1976.

[CC90] E. Chikofsky and J. Cross. Reverse engineering and design recovery: A taxonomy. IEEE Software, 7:13-17, January 1990.

[CCF91] J. Choi, R. Cytron, and J. Ferrante. Automatic construction of sparse data flow evaluation graphs. In Conference Record of the 18th Annual ACM Symposium on Principles of Programming Languages, pages 55-66. ACM Press, 1991.

[CCL+96] F. Chow, S. Chan, S. Liu, R. Lo, and M. Streich. Effective representation of aliases and indirect memory operations in SSA form. In Proceedings of the International Conference on Compiler Construction, pages 253-267. Springer, April 1996. LNCS 1060.

[CDC+04] R. Chowdhury, P. Djeu, B. Cahoon, J. Burrill, and K. McKinley. The limits of alias analysis for scalar optimizations. In Compiler Construction, pages 24-38. Springer, 2004. LNCS 2985.

[CFR+91] R. Cytron, J. Ferrante, B. Rosen, M. Wegman, and F. Zadeck. Efficiently computing static single assignment form and the control dependence graph. ACM Transactions on Programming Languages and Systems, 13(4):451-490, October 1991.

[CG95] C. Cifuentes and K.J. Gough. Decompilation of binary programs. Software - Practice and Experience, 25(7):811-829, 1995.

[Cha82] G. Chaitin. Register allocation and spilling via graph coloring. SIGPLAN Notices, 17(6):98-105, June 1982. Proceedings of the ACM SIGPLAN Symposium on Compiler Construction.

[Cif94] C. Cifuentes. Reverse Compilation Techniques. PhD thesis, Queensland University of Technology, School of Computing Science, July 1994. Also available Apr 2007 from http://www.itee.uq.edu.au/~cristina/dcc/decompilation_thesis.ps.gz.

[Cif96] C. Cifuentes. The dcc decompiler. GPL licensed software, 1996. Retrieved Mar 2002 from http://www.itee.uq.edu.au/~cristina/dcc.html.

[Cif01] C. Cifuentes. Reverse engineering and the computing profession. Computer, 34(12):136-138, Dec 2001.

[CJ03] M. Christodorescu and S. Jha. Static analysis of executables to detect malicious patterns. In Proceedings of the 12th USENIX Security Symposium, pages 169-186. Published online at http://www.usenix.org, August 2003.

[CK94] B. Cmelik and D. Keppel. Shade: A fast instruction-set simulator for execution profiling. In ACM SIGMETRICS conference on Measurement and Modelling of Computer Systems, volume 22. ACM Press, May 1994.

[Com95] Australian Copyright Law Review Committee. Report on computer software protection, 1995. Retrieved Mar 2002 from http://www.law.gov.au/clrc, Chapter 10, Exceptions to Exclusive Right.

[Cou99] P. Cousot. Directions for research in approximate system analysis. ACM Computing Surveys, 31(3es), September 1999.

[CTL97] C. Collberg, C. Thomborson, and D. Low. A taxonomy of obfuscating transformations. Technical Report 148, Department of Computer Science, The University of Auckland, 1997.

[CVE00] C. Cifuentes and M. Van Emmerik. UQBT: Adaptable binary translation at low cost. Computer, 33(3):60-66, March 2000.



[CVE01] C. Cifuentes and M. Van Emmerik. Recovery of jump table case statements from binary code. Science of Computer Programming, 40:171-188, 2001.

[CVEU+99] C. Cifuentes, M. Van Emmerik, D. Ung, D. Simon, and T. Waddington. Preliminary experiences with the use of the UQBT binary translation framework. In Proceedings of the Workshop on Binary Translation, Newport Beach, 16th Oct 1999, pages 12-22. Technical Committee on Computer Architecture Newsletter, IEEE-CS Press, Dec, 1999.

[CWVE01] C. Cifuentes, T. Waddington, and M. Van Emmerik. Computer security analysis through decompilation and high-level debugging. In Proceedings of the Working Conference on Reverse Engineering, pages 375-380, Stuttgart, Germany, 2001. IEEE-CS Press.

[Dat98] DataRescue. IDA Pro, 1998. Retrieved Jan 2003 from http://www.datarescue.com/idabase.

[Dec01] DeCompilation wiki page, 2001. Retrieved May 2005 from http://www.program-transformation.org/Transform/DeCompilation.

[Dot04] Dot4, Inc. Assembler to "C" Software Migration Tool, 2004. Retrieved Jan 2004 from http://www.dot4.com/literature.

[DP02] B. Davey and H. Priestley. Introduction to Lattices and Order. Cambridge Press, 2002. ISBN 0521784514.

[DS97] A. Dolzmann and T. Sturm. Simplification of quantifier-free formulae over ordered fields. J. Symbolic Computation, 24:209-231, 1997.

[DT04] Decompiler Technologies web page, 2004. Retrieved Mar 2007 from http://www.decompiler.org.

[Ega98] G. Egan. Short story "The Planck Dive", February 1998. Retrieved Nov 2005 from http://gregegan.customer.netspace.net.au/PLANCK/Complete/Planck.html.

[Eri02] D. Eriksson. Desquirr web page, 2002. Retrieved Jul 2005 from http://desquirr.sourceforge.net/desquirr.

[EST01] Essential systems technical consulting, 2001. Retrieved Feb 2003 from http://www.essential-systems.com/resource/index.htm.

[Fal04] R. Falke. Entwicklung eines Typanalysesystem für einen Decompiler (Development of a type analysis system for a decompiler), 2004. Diploma thesis, in German; retrieved Jan 2005 from http://risimo.net/diplom.ps.

[FC99] L. Freeman and C. Cifuentes. An industry perspective on decompilation. In Proceedings of the International Conference on Software Maintenance. IEEE-CS Press, 1999. Only the abstract is in printed form; full text published at http://doi.ieeecomputersociety.org/10.1109/ICSM.1999.10009.

[FC00] A. Fitzgerald and C. Cifuentes, editors. Going Digital 2000, chapter 3, pages 37-70. Prospect Media Pty, Sydney, Australia, 2nd edition, February 2000. Pegging out the Boundaries of Computer Software Copyright: The Computer Programs Act and the Digital Agenda Bill.

[Fer95] M. Fernández. Simple and effective link-time optimization of Modula-3 programs. In Proceedings of the SIGPLAN Conference on Programming Language Design and Implementation, pages 103-115. ACM Press, 1995.

[FH91] C. Fraser and D. Hanson. A retargetable compiler for ANSI C. SIGPLAN Notices, 26(10):29-43, October 1991.

[FH95] C. Fraser and D. Hanson. A Retargetable C Compiler: Design and Implementation. Addison-Wesley, Boston, MA, USA, 1995.

[FLI00] IDA Pro FLIRT web page, 2000. Retrieved Mar 2007 from http://www.datarescue.com/idabase/flirt.htm.

[Fri74] F. Friedman. Decompilation and the Transfer of Mini-Computer Operating Systems. PhD dissertation, Purdue University, Computer Science, August 1974.

[FSF01] Free Software Foundation, Boston, USA. GNU Binutils, 2001. Retrieved Apr 2007 from http://www.gnu.org/software/binutils.

[FZ91] C. Fuan and L. Zongtian. C function recognition technique and its implementation in 8086 C decompiling system. Mini-Micro Systems, 12(11):33-40,47, 1991. Chinese language.

[FZL93] C. Fuan, L. Zongtian, and L. Li. Design and implementation techniques of the 8086 C decompiling system. Mini-Micro Systems, 14(4):10-18,31, 1993. Chinese language.

[Gab00] H. Gabow. Path-based depth-first search for strong and biconnected components. Inf. Process. Lett., 74(3-4):107-114, 2000.



[Gai65] R. Stockton Gaines. On the translation of machine language programs. Comm. ACM, 8(12):736-741, December 1965.

[GBT+05] B. Guo, M. Bridges, S. Triantafyllis, G. Ottoni, E. Raman, and D. August. Practical and accurate low-level pointer analysis. In Proceedings of the International Symposium on Code Generation and Optimization, pages 291-302. IEEE Press, 2005.

[GHM00] E. Gagnon, L. Hendren, and G. Marceau. Efficient inference of static types for Java bytecode. In Static Analysis Symposium, pages 199-219. Springer, June 2000. LNCS 1824.

[Gla98] P. Glasscock. An 80x86 to C reverse compiler. Technical report, University of Cambridge Computer Laboratory Library, 1998. DIP98/14.

[Gra97] GrammaTech Inc, 1997. Retrieved Nov 2004 from http://www.grammatech.com.

[Gro05] Metro-Goldwyn-Mayer Studios Inc. et al. v. Grokster, Ltd., et al., 2005. 545 U.S. 125 S.Ct. 2764, 2770 (2005), retrieved Nov 2005 from http://news.bbc.co.uk/1/shared/bsp/hi/pdfs/supreme_court_mgm_grokster_27_06_05.pdf.

[Gui01] I. Guilfanov. A simple type system for program reengineering. In Proceedings of the Working Conference on Reverse Engineering, pages 357-361, Stuttgart, Germany, 2001. IEEE-CS Press.

[Gui07a] I. Guilfanov. Blog: Decompilation gets real, April 2007. Retrieved Apr 2007 from http://hexblog.com/2007/04/decompilation_gets_real.html.

[Gui07b] I. Guilfanov. Hex-rays home page, 2007. Retrieved Aug 2007 from http://www.hex-rays.com.

[GY04] J. Gross and J. Yellen, editors. Handbook of Graph Theory, chapter 10. CRC Press, 2004.

[Hal62] M. Halstead. Machine-independent computer programming, chapter 11, pages 143-150. Spartan Books, 1962. See also http://www.program-transformation.org/Transform/NeliacDecompiler.

[Hal70] M. Halstead. Using the computer for program conversion. In Datamation, pages 125-129, May 1970.



[Hal77] M. Halstead. Elements of Software Science, Operating, and Programming Systems Series, volume 7. Elsevier, 1977.

[Har97] B. Harlan. A tirade against the cult of performance, 1997. Retrieved Aug 2007 from http://billharlan.com/pub/papers/A_Tirade_Against_the_Cult_of_Performance.html.

[HH74] B. Housel and M. Halstead. A methodology for machine code decompilation. In Proceedings of the 1974 annual conference (ACM'74), pages 254-260. ACM Press, 1974.

[HM79] R. Horspool and N. Marovac. An approach to the problem of detranslation of computer programs. The Computer Journal, 23(3):223-229, 1979.

[HM05] L. Harris and B. Miller. Practical analysis of stripped binary code. ACM SIGARCH Computer Architecture News, 33(5):63-68, December 2005.

[Hoe00] J. Hoenicke. Java Optimize and Decompile Environment, 2000. Retrieved Jan 2003 from http://sourceforge.net/projects/jode.

[Hol73] C. Hollander. Decompilation of Object Programs. PhD dissertation, Stanford University, Computer Science, January 1973.

[Hop78] G. Hopwood. Decompilation. PhD dissertation, University of California, Irvine, Computer Science, 1978.

[Hou73] B. Housel. A Study of Decompiling Machine Languages into High-Level Machine Independent Languages. PhD dissertation, Purdue University, Computer Science, August 1973.

[HZY91] L. Hungmong, L. Zongtian, and Z. Yifen. Design and implementation of the intermediate language in a PC decompiler system. Mini-Micro Systems, 12(2):23-28,46, 1991. Chinese language.

[IBM04] International Business Machines Corporation. XL C Language Reference V7.0, 2004. Retrieved Aug 2005 from scv.bu.edu/SCV/Archive/IBM/language7.pdf.

[Ioc88] The International Obfuscated C Code Contest web page, 1988. Retrieved Aug 2004 from http://www.ioccc.org, file phillipps.c.

[IU00] IEEE-USA. Opposing adoption of the Uniform Computer Information Transactions Act (UCITA) by the states, 2000. Retrieved Jan 2003 from http://www.ieeeusa.org/forum/POSITIONS/ucita.html.

[Jan02] André Janz. Experimente mit einem Decompiler im Hinblick auf die forensische Informatik (Experiments with a decompiler with regard to forensic computing), 2002. Retrieved Aug 2005 from http://agn-www.informatik.uni-hamburg.de/papers/doc/diparb_andre_janz.pdf.

[Jar04] J. Jarrett, 2004. tagline, Electric Vehicle Discussion List, retrieved June 2004 from http://autos.groups.yahoo.com/group/ev-list-archive/messages.

[JP93] R. Johnson and K. Pingali. Dependence-based program analysis. In Proceedings of the '93 SIGPLAN Conference on Programming Language Design and Implementation, pages 78-89. ACM Press, June 1993.

[JS04] A. Johnstone and E. Scott. Suppression of redundant operations in reverse compiled code using global dataflow analysis. In Software and Compilers for Embedded Systems: 8th International Workshop, SCOPES 2004, Amsterdam, Netherlands, September 2004. Springer. LNCS 3199.

[JSW00] A. Johnstone, E. Scott, and T. Womack. Reverse compilation for Digital Signal Processors: a working example. In Proceedings of the Hawaii International Conference on System Sciences. IEEE-CS Press, January 2000.

[Jug05] JuggerSoft, 2005. (Also known as SST Global and Source Recovery.) Retrieved Apr 2006 from http://juggersoft.com.

[KCL+99] R. Kennedy, S. Chan, S. Liu, R. Lo, P. Tu, and F. Chow. Partial redundancy elimination in SSA form. ACM Transactions on Programming Languages and Systems, 21(3):627-676, May 1999.

[KDM03] U. Khedker, D. Dhamdhere, and A. Mycroft. Bidirectional data flow analysis for type inferencing. Computer Languages, Systems & Structures, 29(1-2):15-44, 2003.

[Kil73] G. Kildall. A unified approach to global program optimization. In Conference Record of the 1st ACM Symposium on Principles of Programming Languages, pages 194-206. ACM Press, January 1973.

[Knu69] D. Knuth. The Art of Computer Programming, Vol. 1. Addison-Wesley, 1969.

[KO01] S. Katsumata and A. Ohori. Proof-directed de-compilation of low-level code. In European Symposium on Programming, pages 352-366. Springer, 2001. LNCS 2028.

[Kou99] P. Kouznetsov. JAD - the fast JAva Decompiler, 1999. Retrieved Jan 2003 from http://kpdus.tripod.com/jad.html.

[KU76] J. Kam and J. Ullman. Global data flow analysis and iterative algorithms. Journal of the ACM, 23(1):158-171, January 1976.

[Kum01a] K. Kumar. JReversePro - Java decompiler, 2001. Retrieved Jan 2003 from http://jrevpro.sourceforge.net.

[Kum01b] S. Kumar. DisC - decompiler for TurboC, 2001. Retrieved Feb 2003 from http://www.debugmode.com/dcompile/disc.htm.

[LA04] C. Lattner and V. Adve. LLVM: a compilation framework for lifelong

program analysis & transformation. In CGO '04: Proceedings of the Inter-

national Symposium on Code Generation and Optimization, page 75, Palo

Alto, California, 2004. IEEE CS Press.

[Lam00] M. Lam. Overview of the SUIF system, 2000. Presentation from PLDI

http://suif.stanford.edu/suif/suif2/
2000, retrieved Apr 2007 from

doc-2.2.0-4/tutorial/suif-intro.ps.

[Lat02] C. Lattner. LLVM: An infrastructure for muli-stage optimization. Mas-

ter's thesis, University of Illinois at Urbana-Champaign, 2002. Also avail-

able Apr 2007 from http://llvm.org/pubs/2002-12-LattnerMSThesis.


html.

[Les05] L. Lessig. Brief for Creative Commons as Amicus Curiae in support of

respondents (MGM v Grokster), 2005. Retrieved Mar 2005 from http:


//www.eff.org/IP/P2P/MGM_v_Grokster/20050301_cc.pdf.

[MAF91] C. Mills, S. Ahalt, and J. Fowler. Compiled instruction set simulation.

Software  Practice and Experience, 21(8):877889, August 1991.

[McC76] T. McCabe. A complexity measure. IEEE Transactions on Software En-

gineering, SE-2(4):308320, December 1976.

[MDW01] R. Muth, S. Debray, and S. Watterson. alto: A link-time optimizer for the

Compaq alpha. Software  Practice and Experience, 31(1):67101, 2001.

[MH02] J. Miecznikowski and L. Hendren. Decompiling Java bytecode: Problems,

traps and pitfalls. In Proceedings of the International Conference on Com-

piler Construction, pages 111127, Grenoble, France, April 2002. Springer.

LNCS 2304.

[Mic97] MicroAPL. MicroAPL: Porting tools and services, 1997. Retrieved Feb 2003 from http://www.microapl.co.uk.

[MLt02] MLton web page. BSD-style licensed software, 2002. Retrieved Jun 2006 from http://www.mlton.org.

[Moh02] M. Mohnen. A graph-free approach to data-flow analysis. In Compiler Construction, pages 185–213. Springer, 2002. LNCS 2304.

[Mor98] R. Morgan. Building an Optimizing Compiler. Digital Press, 1998. ISBN 155558179X.

[Mor02] K. Morisada. A decompilation introduction (in Japanese), 2002. Retrieved Oct 2005 from http://jdi.at.infoseek.co.jp (Japanese); see also http://www.program-transformation.org/Transform/AnatomizerDecompiler (English).

[MR90] T. Marlowe and B. Ryder. Properties of data flow frameworks. Acta Informatica, 28:121–163, 1990.

[MRU07] Microsoft Research and the University of Virginia. Phoenix Summer Workshop Presentation, August 2007. Retrieved Aug 2007 from https://connect.microsoft.com/Downloads/Downloads.aspx?SiteID=214.

[MS04] Microsoft Corporation. Phoenix Home Page, 2004. Retrieved Aug 2007 from https://connect.microsoft.com/Phoenix.

[MWCG98] G. Morrisett, D. Walker, K. Crary, and N. Glew. From System F to typed assembly language. In Proceedings of the SIGPLAN Conference on Programming Language Design and Implementation, pages 85–97. ACM Press, January 1998.

[Myc99] A. Mycroft. Type-based decompilation. In 8th European Symposium on Programming. Springer, March 1999. LNCS 1576.

[Myc01] A. Mycroft. Comparing type-based and proof-directed decompilation. In Proceedings of the Working Conference on Reverse Engineering, pages 362–367, Stuttgart, Germany, 2001. IEEE-CS Press.

[NH06] N. Naeem and L. Hendren. Programmer-friendly decompiled Java. Technical Report SABLE-TR-2006-2, Sable Research Group, McGill University, March 2006. Retrieved Jul 2006 from http://www.sable.mcgill.ca/publications/techreports/#report2006-2.

[Nov03] D. Novillo. Tree SSA - A new optimization infrastructure for GCC. In Proceedings of the GCC Developers Summit, pages 181–194, June 2003.

[Nov04] D. Novillo. Design and implementation of tree SSA. In Proceedings of the GCC Developers Summit, pages 119–130, June 2004.

[Nov06] D. Novillo. Memory SSA - A unified approach for sparsely representing memory operations, 2006. Preliminary draft. Retrieved Apr 2007 from http://people.redhat.com/dnovillo/pub/mem-ssa.pdf.

[PR94] H. Pande and B. Ryder. Static type determination and aliasing for C++. In Proceedings of the USENIX C++ Technical Conference, pages 85–97, 1994. A longer version is available as Rutgers University Technical Report lcsr-tr-236, and empirical results from lcsr-tr-250-a.

[PR96] H. Pande and B. Ryder. Data-flow-based virtual function resolution. In Static Analysis: Third International Symposium (SAS'96), pages 238–254. Springer, Sept 1996. LNCS 1145.

[PS91] J. Palsberg and M. Schwartzbach. Object-oriented type inference. SIGPLAN Notices, 26(11):146–161, 1991. Proceedings of the Conference on Object-Oriented Programming Systems, Languages, and Applications (OOPSLA).

[QUT99] Queensland University of Technology. Component Pascal on the JVM, 1999. Retrieved Feb 2003 from http://www.citi.qut.edu.au/research/plas/projects/cp_files/cpjvm.html.

[RBL06] T. Reps, G. Balakrishnan, and J. Lim. Intermediate-representation recovery from low-level code. In Proceedings of the ACM/SIGPLAN Symposium on Partial Evaluation and Semantics-based Program Manipulation (PEPM'06), pages 100–111. ACM Press, January 2006. Invited talk.

[Reu88] J. Reuter, 1988. Public domain software; retrieved Feb 2003 from ftp://ftp.cs.washington.edu/pub/decomp.tar.Z.

[Ric53] H. Rice. Classes of recursively enumerable sets and their decision problems. Transactions of the American Mathematical Society, 74(2):358–366, 1953.

[Rob02] J. Roberts. Answer to forum topic "How do I translate EXE -> ASM -> EXE with IDA Pro?", 2002. Retrieved Mar 2005 from http://www.datarescue.com/ubb/ultimatebb.php?ubb=get_topic;f=1;t=000429.

[Roe01] L. Roeder. Lutz Roeder's programming.net, 2001. Retrieved Sep 2004 from http://www.aisto.com/roeder/dotnet; offline as of April 2005.

[SA01] K. Swadi and A. Appel. Typed machine language and its semantics, July 2001. Retrieved Oct 2005 from http://www.cs.princeton.edu/~appel/papers/tml.pdf.

[Sas66] W. Sassaman. A computer program to translate machine language into Fortran. In Proceedings SJCC, pages 235–239, 1966.

[SBB+00] B. De Sutter, B. De Bus, K. De Bosschere, P. Keyngnaert, and B. Demoen. On the static analysis of indirect control transfers in binaries. In Proceedings of the International Conference on Parallel and Distributed Processing Techniques and Applications, Las Vegas, Nevada, USA, pages 1013–1019, June 2000.

[Sca01] Scale, a scalable compiler for analytical experiments, 2001. Retrieved Apr 2007 from http://ali-www.cs.umass.edu/Scale.

[SDGS99] V. Sreedhar, R. Dz-Ching Ju, D. Gilles, and V. Santhanam. Translating out of static single assignment form. In Proceedings of the 6th International Symposium on Static Analysis, pages 194–210. Springer, 1999. LNCS 1694.

[Seg92] Sega Enterprises Ltd. v. Accolade, Inc., 1992. 977 F.2d 1510 (9th Cir. 1992).

[SF06] Open64 | The Open Research Compiler, 2006. Retrieved Aug 2007 from http://www.open64.net.

[SFF+05] M. Suzuki, N. Fujinami, T. Fukuoka, T. Watanabe, and I. Nakata. SIMD optimization in COINS compiler infrastructure. In Proc. 8th International Workshop on Innovative Architecture for Future Generation High-Performance Processors and Systems (IWIA2005), pages 131–140. IEEE Press, January 2005.

[SGI02] Silicon Graphics, Inc. WHIRL Intermediate Language Specification, 2002. Retrieved Aug 2007 from http://www-rocq.inria.fr/~pop/doc/whirl.html.

[SGVN05] G. Stitt, Z. Guo, F. Vahid, and W. Najjar. Techniques for synthesizing binaries to an advanced register/memory structure. In Proceedings of the ACM/SIGDA symposium on Field-programmable gate arrays, pages 118–124. ACM Press, 2005.



[SGVV02] G. Stitt, B. Grattan, J. Villarreal, and F. Vahid. Using on-chip configurable logic to reduce embedded system software energy. In Proceedings of the IEEE Symposium on Field-Programmable Custom Computing Machines, pages 143–151. IEEE Press, 2002.

[Sim97] D. Simon. Structuring assembly programs. Honours thesis, The University of Queensland, School of ITEE, 1997. Also available Apr 2007 from http://www.itee.uq.edu.au/~cristina/students/doug/dougStructuringAsmThesis97.ps.

[Sin03] J. Singer. Static single information improves type-based decompilation, 2003. Unpublished. Was retrieved Sep 2004 from http://www.cl.cam.ac.uk/~jds31/research/ssidecomp.pdf.

[SKI04] M. Sassa, M. Kohama, and Y. Ito. Comparison and evaluation of back translation algorithms for Static Single Assignment form. In IPSI, Prague, Czech Republic, December 2004.

[SML01] Software Migrations, Ltd. The Assembler Comprehension and Migration Specialists, 2001. Retrieved Feb 2003 from http://www.smltd.com.

[SNK+03] M. Sassa, T. Nakaya, M. Kohama, T. Fukuoka, and M. Takahashi. Static Single Assignment form in the COINS compiler infrastructure. In SSGRR 2003w, number 54, January 2003.

[Sof96] SofCheck Inc. Applet Magic, 1996. Retrieved Feb 2003 from http://www.appletmagic.com.

[Son84] Sony Corp. of America v. Universal City Studios, 1984. 464 U.S. 417 (1984), retrieved Nov 2005 from http://www.eff.org/legal/cases/betamax/.

[SPE95] Standard Performance Evaluation Corporation. CPU95 benchmark, 1995. Retrieved Mar 2002 from http://www.spec.org/osg/cpu95/.

[SR02a] Source Recovery, also known as JuggerSoft and SST Global, 2002. Retrieved Apr 2006 from http://www.sourcecovery.com, formerly sourcerecovery.com.

[SR02b] Source Recovery's HP-UX C/C++ Decompiler, 2002. Retrieved Feb 2004 from http://www.sourcecovery.com/abstract.htm.

[SRC96] The Source Recovery Company, 1996. Retrieved July 2004 from http://www.source-recovery.com.

[SST03] SST Global, 2003. (Also known as JuggerSoft and Source Recovery.) Retrieved Apr 2006 from http://www.sstglobal.com.

[Sta02] Static recompilers. Yahoo Tech Group, 2002. Retrieved Apr 2007 from http://tech.groups.yahoo.com/group/staticrecompilers.

[SW74] V. Schneider and G. Winiger. Translation grammars for compilation and decompilation. BIT, 14:78–86, 1974.

[SW93] A. Srivastava and D. Wall. A practical system for intermodule code optimization at link-time. Journal of Programming Languages, 1(1):1–18, March 1993. Also available as WRL Research Report 92/06; retrieved Apr 2004 from citeseer.nj.nec.com/srivastava92practical.html.

[TC02] J. Tröger and C. Cifuentes. Analysis of virtual method invocation for binary translation. In Proceedings of the Working Conference on Reverse Engineering, pages 65–74, Richmond, Virginia, 2002. IEEE-CS Press.

[Tho96] D. Thorpe. Delphi Component Design. Addison Wesley Longman, 1996.

[TKB03] F. Tip, A. Kiezun, and D. Bäumer. Refactoring for generalization using type constraints. In Proceedings of the Conference on Object-Oriented Programming Systems, Languages, and Applications (OOPSLA), pages 13–26. ACM Press, October 2003.

[Tol96] R. Tolksdorf. Programming languages for the Java Virtual Machine, 1996. Retrieved Feb 2003 from http://flp.cs.tu-berlin.de/~tolk/vmlanguages.html.

[TP95] P. Tu and D. Padua. Gated SSA-based demand-driven symbolic analysis for parallelizing compilers. In Proceedings of the 9th ACM International Conference on Supercomputing, pages 414–423. ACM Press, 1995.

[UH02] Overview of the Open64 Compiler Infrastructure, November 2002. Retrieved Aug 2007 from http://www2.cs.uh.edu/~dragon/Documents/open64-doc.pdf.

[Upt03] E. Upton. Optimal sequentialization of gated data dependency graphs is NP-complete. In Proceedings of the International Conference on Parallel and Distributed Processing Techniques and Applications (PDPTA'03). CSREA Press, June 2003.

[UQB01] UQBT web page. BSD licensed software, 2001. Retrieved Apr 2002 from http://www.itee.uq.edu.au/~cristina/uqbt.html.

[Uti92] Unix-C.Utils, 1992. Retrieved Nov 2005 from http://ftp.unicamp.br/pub/unix-c/utils/metrics.tar.Z.

[Van03] P. Vandevenne. Answer to forum topic "Help with IdaPros generated ASM files required", 2003. Retrieved Apr 2006 from http://www.datarescue.com/cgi-local/ultimatebb.cgi?ubb=get_topic;f=1;t=000495;p=0#000001.

[VE98] M. Van Emmerik. Identifying library functions in executable files using patterns. In Proceedings of the Australian Software Engineering Conference, pages 90–97, Adelaide, Australia, Nov 1998. IEEE-CS Press.

[VE03] M. Van Emmerik. The PILER Decompilation System, 2003. Retrieved Jul 2003 from http://www.program-transformation.org/Transform/PilerSystem.

[VEW04] M. Van Emmerik and T. Waddington. Using a decompiler for real-world source recovery. In Proceedings of the Working Conference on Reverse Engineering, pages 27–36. IEEE-CS Press, November 2004. Extended version available from http://www.itee.uq.edu.au/~emmerik/experience_long.pdf.

[VH89] P. Van Hentenryck. Constraint Satisfaction in Logic Programming. MIT Press, 1989.

[VWK+03] L. Vinciguerra, L. Wills, N. Kejriwal, P. Martino, and R. Vinciguerra. An experimentation framework for evaluating disassembly and decompilation tools for C++ and Java. In Proceedings of the Working Conference on Reverse Engineering, pages 14–23, Victoria, Canada, Nov 2003. IEEE-CS Press.

[War00] M. Ward. Reverse engineering from assembler to formal specifications via program transformations. In Proceedings of the Working Conference on Reverse Engineering, Brisbane, Australia, 2000. IEEE-CS Press.

[War01] M. Ward. The FermaT transformation system, 2001. Retrieved Feb 2003 from http://www.dur.ac.uk/martin.ward/fermat.html.

[War04] M. Ward. Pigs from sausages? Reengineering from assembler to C via FermaT transformations. Science of Computer Programming Special Issue: Transformations Everywhere, 52(1–3):213–255, 2004.



[WCES94] D. Weise, R. Crew, M. Ernst, and B. Steensgaard. Value dependence graphs: Representation without taxation. In Conference Record of the 21st Annual ACM Symposium on Principles of Programming Languages, pages 297–310. ACM Press, January 1994.

[Win01] Winelib web page, 2001. Retrieved Mar 2005 from http://winehq.org/site/winelib.

[Wol92] M. Wolfe. Beyond induction variables. In SIGPLAN Conference on PLDI, pages 162–174. ACM Press, 1992.

[Wol96] M. Wolfe. High Performance Compilers for Parallel Computing. Addison-Wesley, 1996.

[Wro02] G. Wroblewski. General Method of Program Code Obfuscation. PhD dissertation, Wroclaw University of Technology, Institute of Engineering Cybernetics, 2002. Retrieved Dec 2002 from http://www.mysz.org/publications.html.

[WZ91] M. Wegman and F. Zadeck. Constant propagation with conditional branches. ACM Transactions on Programming Languages and Systems, 13(2):181–210, April 1991.

[Xu01] Z. Xu. Safety-Checking of Machine Code. PhD thesis, University of Wisconsin-Madison, 2001.

[YRL99] J. Yur, B. Ryder, and W. Landi. An incremental flow- and context-sensitive pointer aliasing analysis. In Proceedings of the International Conference on Software Engineering, pages 442–451, 1999.


Index
.NET, xxx, 44
0xhhhh notation, xl
80286/DOS, 34, 35
8086 C Decompiling System, 35
a[exp], xl
abbreviations, xxv
ABI, xxviii
acronyms, xxv
add features, 14
addition and subtraction, 166
address space, 187
affine relation, xxviii, 214
affine relations analysis, 51
aggregate, xxviii
aggregate structure identification, 52
Alan Mycroft, 165
alias, 67
alias analysis, 246
aliasing, 113
allocating variables, 108
allocator, register, 108
α*, 177
Anakrino, 44
Analysis of Virtual Method Invocation for Binary Translation, 37
analysis, preservation, 120
Andromeda decompiler, 38
Application Binary Interface, xxviii, 78
applications of decompilation, 8
ARA, 51
argument, xxviii
arguments, 77
arguments, type of, 173
Art of Computer Programming, the, 39
ASI, 52
asm21toc, 41
assembler, xxviii, 45
assembler comprehension, 45
assembly decompilers, 23
assembly language, xxviii, 39, 41, 46
assigned goto, 201
assigned goto statements, 201
AT&T syntax, 64
Atari vs Nintendo, 9
automated tools, 10
automatically generated code, 11
available definitions, 85
back translation, 107
Balakrishnan, Gogul, 139
Barbe, Penny, 34
basic block, xxviii, 37, 87
Baxter and Mehlich, 2000, 52
BCPL, 40
bidirectional data flow problem, 160
binary, xxviii
binary translation, xxxvii, 28, 48, 154, 158
binary translator, 6, 24
Boomerang decompiler, iv, xxviii, 37, 73, 94, 98, 122, 199, 223
borrow flag, xxix
bottom (lattice), 170
boundaries, finding, 19
branch statement, 61
browser, decompiler as, 8

bugs, finding, 10
bypassing (of calls), xxix, 124
bytecodes, optimisation of, 42
call by reference, 120
call graph, xxix
call graph, cycles in the, 126
call tail optimisation, 221
callees, xxix
callers, xxix
canonical form, xxix, 67, 179, 181
carry flag, xxix
carrying (in propagation), 117
Chikofsky and Cross 1990, 52
childless call, xxix, 85, 127
Cifuentes et al., 36
Cifuentes, Cristina, iv, 33, 35
CIL, xxx
class hierarchy, recovering, 218
class types, 154
CLI, xxx, 44
COBOL, 45
code checking, 9
code motion, 104
code recovery, 14
code, low level, 39
CodeSurfer, 51
CodeSurfer/x86, 51
COINS, 54
Collberg, 60
collector, xxx, 122, 136
colocated variables, xxx, 151, 187, 189, 247
commercial decompilation, 44
common subexpression elimination, 68
comparable types, 160
Comparing Type-Based and Proof-Directed Decompilation, 41
comparing types, 169
comparison (programs), 11
compatible types, 160
compilable code, producing, 11
compiler, xxx, 67, 149
comprehensibility, 52
Computer Security Analysis through Decompilation and High-Level Debugging, 36
condition code register, 70
condition codes, xxx
conditional preservation, xxx
Connector plugin, 51
constant, xxx
constant K, 181, 190
constant propagation, 67
constant reference, 188
constant, type, xl
constants, 63
constants, typing, 161
constraints, type, 42
constructors and destructors, 247
context, xxx
continuation passing style, xxxi, 19
contract, xxxi
control flow analysis, 61
control flow, incomplete, 196
coprocessor, 75
copy propagation, 67
copyright, 29
cross platform, 12
CSE, 68
cycle, 130
cycles (call graph), 126
Cygwin, xxxi
D-Neliac decompiler, 34
data flow analysis, 61, 64
data map, 187
data, partitioning, 186
dataflow guided recursive traversal, 18



dataflow machine, 146
Dava decompiler, 42
dcc decompiler, 17, 27, 35, 40
De Sutter et al., 221
dead code, xxxi
dead code elimination, 69
debug information, 159
decoding, xxxi
Decomp 1988, 39
decompiled output, xxxi
decompiler
    .NET, 44
    applications, 8
    assembly language, 39
    history, 33
    Java, 42
    machine code, 33
    object code, 38
    virtual machine, 42
Decompiler Technologies, 45
defines, xxxi
definition, xxxii
definition collector, 122
definition collectors, 137
definition-use, xxxii, 102
∆ functions, 178
Dependence Flow Graph (DFG), 145
depth first search, 126
Desquirr, 37
Digital Signal Processing, 41
Digital Signal Processors, 46
disassembler, 17
DisC, 36
dominance frontier, 99
dominate, xxxii
dominator, post, 208
downcasting, xxxii
drivers, 12
DSP (Digital Signal Processing), 41, 46
dynamically type checked, 149
early analysis, 133
elementary and aggregate types, 155
encryption of executable code, 60
endianness, xxxii, 158
equations, data flow, 177
Eriksson, David, 37
escape analysis, 190, 246
escape, of local variable address, xxxii, 118
Essential Systems Technical Consulting, 45
ESTC, 45
evaluation graph, 192
exception handling, 247
Exec-2-C, 34
executable file, xxxii
expression, xxxii
expression propagation, 65, 67, 90, 142, 179, 200, 205
expressions, 63
expressions, typing, 174
Falke, Raimar, v, 37
FermaT, 41
filter, xxxii
filters, 83
finding bugs, 10
finding malware, 11
finding vulnerabilities, 10
fix bugs, 14
flags, xxx, 95
flags register, 70
flags, floating point, 72
float (type), xxxiii
floating point flags, 72
flow sensitive, 174
forged address, 118
Fortran, 58, 202



forward propagation, 65
forward substitution, 65
Friedman, 39
Fuan, C., 35
function, xxxvi
function boundaries, 19
function pointers, 219
future work, 19, 95, 107
Gagnon et al., 42
Gated Single Assignment (GSA) form, 141
GCC, 56
GENERIC, 56
GIMPLE, 56
Glasscock, Pete, 40
global data flow, 87
global data flow analysis, 87
Global Offset Table, 86
goto statement, assigned, 201
grammar, inverting, 39
GrammaTech Inc, 51
greatest lower bound, 171
Guilfanov, Ilfak, iv, 36, 192
Halstead, Maurice, 33
halting problem, 18, 28
Harlan, Bill, 13
Harris, Ron, iv
hell node, 221
Hendren, Miecznikowski and, 42
Hex-Rays decompiler, 38
high level language, 154
history of decompilation, 33
Hollander, C. R., 34
Hopwood, G. L., 34
Housel, B. C., 39
Hungmong, L., 35
ICFG, 221
IDA Pro, 19, 36–38, 46, 51
ideal decompiler, 17
idiomatic, xxxiii, 11, 73, 120
Ilfak Guilfanov, iv
implicit definition, xxxiii
implicit reference, xxxiii
impossible dataflow, 87
incomplete control flow, 196
incomplete ordering, 169
indexing, 180
indirect branches, 198
indirect calls, 37, 208
indirect jumps and calls, 195
induction variable, xxxiii, 145
information loss, 16
infringements, patent, 11
initialisation functions, 215
initialised arrays, 156
Input program, xxxiii
instruction set simulation, 6
interference graph, 107, 108
intermediate representation, 2, 61, 141
internal pointers, 114
interoperability, 9, 13
Interprocedural Control Flow Graph, 221
interprocedural edges, 87
inverting compiler grammar, 39
itof operator, xxxiii
JAD, 43
Janz, André, 36
Java decompilers, 42
JIT compiler, 60
JODE, 43
join operator ⊔, 173
join operator ⊔, xl
JReversePro, 43
JuggerSoft, 45
jump instructions, indirect, 198
Just In Time compiler, xxxiii, 60



K in type pattern, 181, 190
Katsumata, 41
Knuth, Donald, 39
Kumar, Satish, 36
languages, high level, 39
lattice, xxxiii, 153, 184
lattices, 169
lcc compiler, 208
learning algorithms, 9
least upper bound, 173
legal issues, 29
levels of source code, 5
library function, 158, 189
limitations, 27
linker, xxxiv
live location, xxxiv, 69
live range, xxxiv, 188, 189
local variable, xxxiv
location, xxxiv
locations, 63
Lockheed Neliac decompiler, 34
loss of information, 16
loss of separation, 17
lost copy problem, 107
low level code, 39
Low Level Virtual Machine, 53, 116
m[exp], xl
machine code, xxxiv
machine code decompiler, xxxi
machine language, xxxiv, 4
maintainable code, 13
maintenance (data flow information), 102
malware, finding, 11
maximum case value, 201
McDiarmid, Glen, iv
McGill University, 42
meet operator ⊓, 171
meet operator ⊓, xl
MicroAPL, 45
Microsoft Research, 57
middle analysis, 133
Miecznikowski and Hendren, 42
migration, 45
minimum case value, 200
minuend, xxxiv
MIXAL, 39
MLton compiler, 19
modifieds, xxxiv, 77
MOL620, 34
MSIL, xxx
multidimensional array, 181
multiple inheritance, 209
mutual recursion, 127
Mycroft, Alan, iv, 38, 40, 151, 165, 166
name (alias), 113
Naval Underwater Systems Center, 40
ndcc, 36
nested aggregates, 190
.NET decompilers, 44
non-constant reference, 188
nonstandard stack pointer, 247
notation, xl
numeric coprocessor, 75
obfuscation, 18, 43, 59
object code, xxxv
object code decompilers, 24, 38
object files, 221
offset pointer, xxxv, 19
Ohori, 41
opaque predicates, 59
Open64, 58
operators, 63
optimisation, 64
optimisation of bytecodes, 42



optimise for platform, 12
ordering (of types), 169
ordering of types, 169
original compiler, xxxv, 153
original pointer, xxxv, 19
original source code, xxxv
overwriting statement, xxxv
overwriting statements, 112
Pande and Ryder, 210
parallel, 89, 247
parallel representation, 142
parameter, xxxv, 121
parameter filter, xxxv
parameters, 77
parameters, type of, 173
parenthesis theory, 40
partial ordering, 169
partitioning decompilation, 247
partitioning the data, 186
patterns, type, 178
pentium, xxxv
phase ordering, 247
phase ordering problem, 196
φ loops, 123
φ-function, 98, 235
Phoenix, 57
π (pointer or integer), 177
Piler System, 34
PL/1, 39
plan (reverse engineering), 8
platform, optimise for, 12
plug-in, 37, 51
Portable Executable (PE), 36
post dominator node, 208
PowerPC, 95
pre-initialised arrays, 156
preserved locations, 78, 91, 119
problems for reverse engineering tools, 16
procedure, xxxvi
procedure boundaries, 19
program comparison, 11
Proof-Directed De-compilation of Low-Level Code, 41
propagation, xxxvi, 65, 142, 179
propagation conditions, 65
propagation, expression, 67, 90, 200, 205
protection of executable code, 59
push and pop, 120
quotes, ii, 13, 29, 30, 52
range analysis, xxxvi, 201, 246
reaching definitions, 66
readability, 90, 188, 200
REC decompiler, 17, 27, 35, 223
recency-abstraction, 52, 139
recompilability, xxxvi
recompilable code, producing, 11
recompile, xxxvi
record, xxxvi
recover types, 149
recovery, source code, 14
recursion, xxxvi
recursion, problems, 126
recursive traversal, 18
recursive types, 155
reengineering, 11
reference, 188
reference parameter, 83
reference parameters, 78
Reflector, 44
register colouring, 108
relocation information, xxxvi
renaming variables, 98
Reps, Thomas, 151, 204, 221
restore problem, 119
results, xxxvii



returns, xxxvii, 77
reverse compilation, xxxvii
reverse engineering, xxxvii
Reverse Engineering Compiler (REC), 17, 27, 35
Reverse Engineering Tools, 16
rhs-clear, 66
Rice's theorem, 28
RISC, 120
round-trip, 11
RTL, 40
running pointers, 156
runtime type information, 159
Sable group, 42
safety critical applications, 11
Sassa, Masataka, 98
Sassaman, W., 39
the save/restore problem, 119
Scale (compiler infrastructure), 56
scaling, 179
Schneider and Winiger, 39
Sega vs Accolade, 9, 29
Self language, 149
Self Modifying Code, xxxvii
self-modifying code, 60
Sendall, Shane, v
separating code from data, 18
separating original and offset pointers, 19, 190
separating pointers from constants, 19
separation, loss of, 17
shadow, 82
shadowed variable, xxxvii
Shulga, Andrey, 38
Σ functions, 177
signature, xxxviii
signature (of procedure), 158
signedness, xxxviii, 153, 161
Simon, Doug, v, 40
simplification, 179
simplifying expressions, 72
Singer, Jeremy, iv
sink, xxxviii, 130
Smalltalk language, 149
SML, 45
software development, 149
Software Migrations Ltd, 45
software patents, 11
Soot, 42
soundness, 140
source, xxxviii
source code, 3
source code levels, 5
source code recovery, 14
Source Recovery, 45
Source Recovery Company, The, 45
sources of type information, 158
sparse, xxxviii
sparse switch statement, 208
sparse type information, 174
SPEC benchmarks, 48
specifications, 41
splitting functions, 220
square set symbols, 169
Sreedhar's algorithm, 107
Srivastava and Wall, 87
SSA, 40, 97, 121
SSA back translation, 102, 107
SSI, 142
SST Global, 45
stack pointer, 86
stack variables, 43
Statements, switch, 198
static binary translation, 48
Static Single Assignment, xxxviii, 40, 97, 121



static single information, 142
status register, 70
strong updates, 140
strongly connected component, xxxviii, 129
Structuring, 43
structuring, 15, 247
Structuring Assembly Programs, 40
subarray, xxxviii, 181
SUBFLAGS macro, 70
subtraction, 166
subtrahend, xxxix
subtype, xxxix
subtype operator, xli
SUIF2, 54
suitable, xxxix, 84
summary nodes, 140
Sun Microsystems, v
swapping two registers, 120
switch statements, 198
symbol table, 187
symbols, debug, 159
tail call optimisation, 19, 220
target, xxxix
T(e), xl
this parameter, 189
thread, 247
tools, automated, 10
top (lattice), 170
Tröger and Cifuentes, 37, 210, 221
Tröger, Jens, v
traditional reverse engineering, 52
transformations, 11, 41
translating out of SSA form, 102, 107
Tree SSA (GCC), 56
type analysis, 38, 149
type checking, 149
type constant, xl
type determination, 210
type hierarchy, 153
type inferencing, 149
    constraint based, 162
type information, sources of, 158
type lattice, 153
type lattices, 169
type notation, xl
type patterns, 178
type propagation, 36
Type Propagation in IDA Pro Disassembler, 36
type reconstruction, 149
type recovery, 149
Type-Based Decompilation, 165
typing constants, 161
typing expressions, 174
ud-chains, 66
Ultrasystems, 39
undecidable, 90
University of Durham, 45
University of London, 41
University of Oxford, 45
University of Wisconsin-Madison, 51
unreachable code, xxxix, 69
Upton, Eben, iv
UQBT, v, 35, 48, 154
use collectors, 137
use-definition, xxxix, 102
use-definition chains, 66
value, xxxix
value analysis, xxxix, 210, 219, 246, 247
value dependence graph, 142
value-set analysis, 51
variables, colocated, 187
verifiable bytecode, 42
verification, 11
virtual function call, 209, 218



virtual function table, 209
Virtual Machine, xxxiii, 60
virtual machine decompilers, 42
virtual method, 37
viruses, 11
Visual Basic Right Back, 45
VSA, 51
VT, 209
vulnerabilities, finding, 10
Waddington, Trent, iv, 73, 94, 122
Ward, Martin, 41
warnings, 28
weak update problem, 139
WHIRL, 58
whole-program analysis, 87
Wide Spectrum Language, 41
Winiger, Schneider and, 39
Wroblewski, 60
WSL, 41
x64, xxxix
x86, xxxix
YaDec, 37
Zebra, 40
Zongtian, L., 35
