Augmenting Decompiler Output With Learned Variable Names and Types
    typedef struct point {
        float x;
        float y;
    } pnt;

    void fun() {                    void fun() {
        pnt p1, p2;                     float v1[2], v2[2];
        p1.x = 1.5;                     v1[0] = 1.5;
        p1.y = 2.3;                     v1[1] = 2.3;
        // ...                          // ...
        use_pts(&p1, &p2);              use_pts(v1, v2);
    }                               }

    (a) Original code               (b) Decompiled fun

Figure 1: A function with a struct and its decompilation.

    void fun() {                    void fun() {
        // stack layout:                // stack layout:
        // [xxx][p][yyyy]               // [xxxx][yyyy]
        char x[3];                      char x[4];
        int y;                          int y;
        // ...                          // ...
    }                               }

    (a) Original code               (b) Decompiled fun

Figure 2: A function illustrating the data layout problem in decompilation. In the stack layout the characters x, y, and p represent a single byte assigned to the variables x and y, or padding data, respectively. The decompiler cannot recognize that the inserted padding data does not belong to the x array.
is not clear which array index refers to which coordinate, or even that the coordinates are Cartesian (instead of, e.g., polar).

Unlike names, types are constrained by memory layouts, and thus theoretically should be easier to recover (only types that fit that memory layout should be considered as candidates). In fact, decompilers already narrow down possible type choices using the fact that base types targeting a specific platform can only be assigned to variables with a specific memory layout (e.g., on most platforms an int variable can never be retyped to a char because they require different amounts of memory). This already makes it possible for decompilers to infer base types and a small set of commonly-used typedefs.

On the other hand, despite performing a battery of complex binary analyses, the data layout inferred by the decompiler is often incorrect, which makes the problem harder. For example, consider the program shown in Figure 2. Two top-level variables are declared: x, a three-byte char array, and y, a four-byte int. During compilation, the compiler inserts a single byte of padding after the x array for alignment. When this function is decompiled, the decompiler can tell where x and y begin, but it cannot tell whether x is a three-byte array followed by a single byte of padding or a four-byte array whose last element is never used.
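The padding behavior can be reproduced outside a compiler. The following minimal sketch uses Python's standard ctypes module (which lays structures out according to the platform C ABI) to mirror Figure 2; the class name Frame is ours, purely for illustration:

    import ctypes

    # Figure 2's layout: a 3-byte char array followed by a 4-byte int.
    class Frame(ctypes.Structure):
        _fields_ = [("x", ctypes.c_char * 3),
                    ("y", ctypes.c_int)]

    print(ctypes.sizeof(Frame))  # 8 on typical platforms, not 7
    print(Frame.y.offset)        # 4: one padding byte was inserted after x

From the raw bytes alone, nothing distinguishes this layout from char x[4] followed by int y, which is exactly the ambiguity the decompiler faces.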
Prior work on reconstructing types falls into two groups. The first, such as TIE [31], attempts to recover syntactic types, e.g., struct {float; float}, but not the names of the structure or its fields. The second, such as REWARDS [33], attempts to also recover the type name (referred to as semantic types). However, these systems typically only support a small set of manually-defined types and well-known library calls, and neither the first nor the second deals with the padding issue above.

In contrast, our system DIRTY (DecompIled variable ReTYper) recovers both semantic and syntactic types, handles padding, and is not limited to a small set of manually-defined types. Instead, DIRTY supports 48,888 possible types encountered "in the wild" in open-source C code (compared to the 150 different type names in 84 standard library calls supported by REWARDS). At a high level, DIRTY is a Transformer-based [50] neural network model that recommends types in a particular context and operates as a postprocessing step to decompilation. DIRTY takes a decompiled function as input and outputs probable names and types for all of its variables.

To build DIRTY, we start by mining open-source C code from GitHub, and then use a decompiler's typical ability to import variable names and types from DWARF debugging information to create a parallel corpus of decompiled functions with and without their corresponding original names and types. As a side effect of this large-scale mining effort, we also automatically compile a library of types encountered across our open-source corpus. We then train DIRTY on this data, introducing two task-specific innovations. First, we use a Data Layout Encoder to incorporate memory layout information into DIRTY's predictions and simultaneously address a fundamental limitation of decompilers caused by padding. Second, we address both the variable renaming and retyping tasks simultaneously with a joint Multi-Task architecture, enabling them to benefit from each other.

We show that DIRTY can assign variable types that agree with those written by developers up to 75.8% of the time, and DIRTY also outperforms prior work on variable names. Note that even though we implement DIRTY on top of the Hex-Rays¹ decompiler because of its positive reputation and its programmatic access to decompiler internals, our approach is not fundamentally specific to Hex-Rays, and should conceptually work with any decompiler that names variables using DWARF debug symbols.

In summary, we contribute:

• DIRT—the Dataset for Idiomatic ReTyping—a large-scale public dataset of C code for training models to retype or rename decompiled code, consisting of nearly 1 million unique functions and 368 million code tokens.

• DIRTY—the DecompIler variable ReTYper—an open-source Transformer-based neural network model to recover syntactic and semantic types of decompiled variables. DIRTY uses the data layout of variables to improve retyping accuracy, and is able to simultaneously retype and rename variables in decompiled code.

Example output from DIRTY is available online at https://dirtdirty.github.io/explorer.html.

¹ https://www.hex-rays.com/products/decompiler/
2 Model Design

In this section, we describe our machine learning model and the decisions that influenced its design, starting with some relevant background. Our model is a neural network with an encoder-decoder architecture.

2.1 The Encoder-Decoder Architecture

Our task consists of generating variable types (and names) as output given individual functions in decompiled code as input. This means that unlike a traditional classification problem with a fixed number of classes, both our input and output are sequences of variable length: input functions (e.g., fed into the network as a sequence of tokens) can have arbitrarily many variables, each requiring a type (and name) prediction.

Therefore, we adopt an encoder-decoder architecture [7], commonly used for sequence-to-sequence transformations, as opposed to the traditional feed-forward neural network architecture used in classification problems with a fixed-length input vector and prediction target. More specifically, the encoder takes the variable-length input and encodes it as a fixed-length vector. Then, this fixed-length encoding is passed to the decoder, which converts the fixed-length vector into a variable-length output sequence. This architecture, further enhanced through the attention mechanism [3], has been shown to be effective in many tasks such as machine translation, text summarization [36], and image captioning [53].
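As a concrete illustration of the pattern (not DIRTY's actual implementation), the following PyTorch sketch encodes a variable-length token sequence into a fixed-length state and then unrolls a decoder for as many steps as there are variables; all names and sizes are ours:

    import torch
    import torch.nn as nn

    class Seq2Seq(nn.Module):
        def __init__(self, vocab_size, num_types, d_model=256):
            super().__init__()
            self.embed = nn.Embedding(vocab_size, d_model)
            self.encoder = nn.GRU(d_model, d_model, batch_first=True)
            self.decoder = nn.GRUCell(d_model, d_model)
            self.out = nn.Linear(d_model, num_types)

        def forward(self, tokens, num_steps):
            # Encode the variable-length input into a fixed-length state.
            _, state = self.encoder(self.embed(tokens))
            h = state.squeeze(0)
            preds = []
            step_input = torch.zeros_like(h)  # start-of-sequence stand-in
            for _ in range(num_steps):        # one decoding step per variable
                h = self.decoder(step_input, h)
                preds.append(self.out(h).argmax(-1))
                step_input = h                # carry context to the next step
            return torch.stack(preds, dim=1)

    model = Seq2Seq(vocab_size=1000, num_types=50)
    print(model(torch.randint(0, 1000, (2, 17)), num_steps=3).shape)  # (2, 3)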
2.2 Transformers

There are several ways to implement an encoder-decoder. Until recently, the standard implementation used a particular type of recurrent neural network (RNN) with specialized neurons called long short-term memory units; these neurons and the networks constructed from them are commonly referred to as LSTMs [24]. More recently, Transformer-based models [4, 15, 42, 57], building on the original Transformer architecture [50], have been shown to outperform LSTMs and are considered the state of the art for a wide range of natural language processing tasks, including machine translation [4], question answering and abstractive summarization [10, 32], and dialog systems [1]. Transformer-based models have also been shown to outperform convolutional neural networks such as ResNet [19] on image recognition tasks [11].

Transformers have several properties that make them a particularly good fit for our type prediction task. First, they capture long-range dependencies, which commonly occur in program code, more effectively than RNNs. For example, a variable declared at the beginning of a function may not be used until much later; an ideal model captures information about all uses of a variable. Second, Transformers can perform more computations in parallel on typical GPUs than LSTMs. As a result, training is faster, and a Transformer can train on more data in the same amount of time. In our case, this enables us to train on our large-scale, real-world dataset, which consists of 368 million decompiled code tokens.

Although there have been a number of advances in neural machine translation since the original Transformer model [50], most recent advances focus on improvements in other factors, such as training data and objectives [4, 10, 32, 42], dealing with longer sequences [57], efficiency [6], and scaling [15], rather than changing the fundamental architecture. Moreover, most of these improvements are tailored to the natural language domain, making them less generalizable than the original model and inapplicable to our task. Instead, we keep our model simple, which allows different, better architectures or implementations to be used out-of-the-box in the future. For example, the recent Vision Transformer (ViT) [11] also intentionally follows the original Transformer architecture "as closely as possible" when adapting Transformers to computer vision tasks.

We omit the technical details of Transformers, including multi-headed self-attention, positional encoding, and the specifics of training, as they are beyond the scope of this paper.

2.3 DIRTY's Architecture

In DIRTY, we cast the retyping problem as a transformation from a sequence of tokens representing the decompiled code to a sequence of types, one for each variable in the original source code. This section describes DIRTY's architecture in detail. Figure 3 shows an overview of the architecture.
[Figure 3 diagram omitted. The pictured example feeds variable tokens (VAR1, VAR1, VAR2, ...) into the model together with a data layout record for VAR1 (ID: VAR1, Size: 12, Loc: Stack 0x1c, Offsets: [0, 8]) and predicts the type struct timeval {time_t tv_sec; suseconds_t tv_usec;}.]

Figure 3: Overview of DIRTY's neural model architecture for predicting types. Decompiled code is sequentially fed into the Code Encoder. When the input of the code encoder corresponds to a specific variable (e.g., VAR1), it is pooled with other instances of the same variable to generate a single encoding for that variable. Each pooled encoding is then passed into the Type Decoder, which outputs a vector of the log-odds (logits) for predicted types. This vector is masked with a vector generated by the Data Layout encoder and the most probable type is chosen from the masked logits.
Code Encoder. The encoder converts the sequence of code tokens of the decompiled function (lower-left of Figure 3), x = (x_1, x_2, ..., x_n), into a sequence of representations

    H = (h_1, h_2, ..., h_n),    (1)

where each continuous vector h_i ∈ R^{d_model} is the contextualized representation for the i-th token x_i. During training, the encoder learns to encode the information in the decompiled function x relevant to solving the task into H. For example, for a code token x_i = v1, useful information about v1 in the context of x (e.g., operations performed on v1) is automatically learned and stored in h_i.

Specifically, we denote the encoding procedure as

    H = f_en(x; θ_en),    (2)

where the input x = (x_1, x_2, ..., x_n) is the code token sequence of the decompiled function and the output H = (h_1, h_2, ..., h_n) is the sequence of deep contextualized representations. f_en denotes the encoder, implemented with neural networks, and θ_en denotes its learnable parameters.

The ultimate goal of DIRTY is to make type predictions about each variable that appears in the decompiled function. However, the encoder produces hidden representations for every code token (e.g., "v1", ":", "=", "v1", "+", "1" are all tokens). Because a variable can appear multiple times in the code tokens of a function, we need a way to summarize all appearances of a variable. We achieve this through pooling, where the representation for the t-th variable² is computed based on all of its appearances in the code tokens, A_t, using an average pooling operation [29]:

    v_t = AveragePool_{x_i ∈ A_t}(h_i),  t = 1, ..., m,    (3)

where m is the number of variables in the function. This solution removes from the model the burden of gathering all information about a variable throughout the function into a single token representation. The pooled representation for the first variable, VAR1, is shown in the upper-left of Figure 3.

Type Decoder. Given the encoding of the decompiled tokens, the decoder predicts the most probable (i.e., idiomatic) types for all variables in the function. The decoder takes the encoded representations of the code tokens (H) and identifiers (v_t) as input and predicts the original types ŷ = (ŷ_1, ŷ_2, ..., ŷ_m) for all m variables in the function. Unlike the encoder, the decoder predicts the output step by step, using former predictions as input for later ones.³

At each time step t, the decoder tries to predict the type for the t-th variable as follows:

1. The decoder takes the code representations H and the variable representation v_t from the encoder, and also its own previous predictions ŷ_1, ŷ_2, ..., ŷ_{t-1}, to compute a hidden representation z_t ∈ R^{d_model}:

    z_t = f_de(ŷ_1, ŷ_2, ..., ŷ_{t-1}, v_t, H; θ_de),    (4)

where f_de and θ_de denote the decoder and its parameters. The hidden representation z_t is then used for prediction.

2. The output layer of the decoder then uses its learnable weight matrix W and bias vector b to transform the hidden representation z_t into the logits for prediction:

    s_t = W z_t + b,    (5)

where s_t ∈ R^{|T|}, W ∈ R^{|T| × d_model}, b ∈ R^{|T|}, and |T| is the number of types in the type library. The logits s_t are the unnormalized scores the model assigns to all types.

3. Finally, the softmax function computes a probability distribution over all possible types from s_t:

    Pr(ŷ_t | ŷ_1, ŷ_2, ..., ŷ_{t-1}, x) = softmax(s_t).    (6)

Note that the type library T is fixed, meaning DIRTY can only predict types that it has seen during training. We discuss this limitation, its implications, and potential mitigations in Section 5. However, DIRTY can recover structure types as well as normal types, as both are simply entries in T.

The goal of the decoder is to find the optimal set of type predictions for all variables in a given function (i.e., the predictions with the highest combined probability): argmax_ŷ Pr(ŷ | x). This probability can be factorized as the product of probabilities at each step:

    Pr(ŷ | x) = ∏_{t=1}^{m} Pr(ŷ_t | ŷ_1, ŷ_2, ..., ŷ_{t-1}, x).    (7)

We have shown how to compute Pr(ŷ_t | ŷ_1, ŷ_2, ..., ŷ_{t-1}, x) with the decoder, but finding the optimal ŷ = (ŷ_1, ŷ_2, ..., ŷ_m) is not an easy task, because each variable can have |T| possible predictions, and each prediction affects subsequent predictions. The time complexity of exhaustive search is O(|T|^m). Therefore, finding the optimal prediction is often computationally intractable.

² t is commonly used in the RNN literature because it refers to a "timestep".
³ This is known as an autoregressive model.
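To make Equations 3, 5, and 6 concrete, the following PyTorch sketch implements one decoding step; the tensors stand in for the quantities in the text (z_t here is random rather than produced by a real decoder), and greedy selection is shown only as the simplest practical alternative to exhaustive search:

    import torch
    import torch.nn as nn

    d_model, num_types = 256, 48888        # |T|: the size of the type library
    H = torch.randn(12, d_model)           # h_1..h_n from the code encoder (Eq. 1)

    # Eq. 3: average-pool the representations of every occurrence of a variable.
    A_t = [2, 5, 9]                        # token positions where VAR1 appears
    v_t = H[A_t].mean(dim=0)               # pooled representation for VAR1

    # Eqs. 5-6: project a decoder hidden state to logits over the type
    # library, then normalize with softmax.
    output_layer = nn.Linear(d_model, num_types)  # learnable W and b
    z_t = torch.randn(d_model)             # stand-in for f_de's output (Eq. 4)
    s_t = output_layer(z_t)                # unnormalized scores over all types
    p_t = torch.softmax(s_t, dim=-1)       # Pr(y_t | y_1..y_{t-1}, x)
    print(p_t.argmax().item())             # greedy choice for step t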
The training and prediction procedures remain almost the same, with two notable exceptions. First, to improve performance, the Data Layout encoder is not activated when the decoder is predicting a variable's name. This is unnecessary because name prediction depends on the predicted type, which has already incorporated the data layout information. Preliminary experiments confirmed no improvement in accuracy when using the Data Layout encoder for name prediction.

Second, there are two ways to interleave the predictions of types and names: types first or names first. In theory, this does not matter, because they are equivalent if the learned model and the decoding algorithm are ideal. In practice, we chose to predict types first because we believe the type prediction task should be easier (since there is more information) and it

3 Evaluation

form training examples.

Since DIRE was only concerned with renaming, its dataset did not include variables which did not correspond to a named variable in the original source code. Many such variables are actually caused by mistakes made by the decompiler during type recovery, for instance decompiling a structure into multiple scalar variables. Since the goal of DIRT is to enable type recovery and fix such mistakes, we label these instances as <Component> to denote that they are components of a variable in the source code. This allows the model to combine them with other variables into an array or a struct.

⁴ https://ghtorrent.org

The final DIRT dataset consists of 75,656 binaries randomly sampled from the full set of 4,346,134 binaries to
yield a dataset that we could fully process based on the computational resources we had available. We split the dataset per-binary as opposed to per-function, which ensures that different functions from the same binary cannot be in both the test and training sets. The training dataset consists of 997,632 decompiled functions and a total of 48,888 different types. We also preprocess the decompiled code with byte-pair encoding (BPE) [45], a technique widely adopted in NLP tasks to represent rare words with a limited vocabulary by tokenizing them into subword units. After this step, the DIRT dataset consists of 368 million decompiled code tokens, with an average of 220.3 tokens per function. Detailed statistics about the DIRT dataset and the train/valid/test split can be found in Table 11 in Appendix A.
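The per-binary split can be sketched as follows; the record layout (a binary field per function) is an assumption for illustration, not DIRT's actual schema:

    import random

    def split_by_binary(functions, test_frac=0.1, seed=0):
        """Split so that no binary contributes functions to both sets."""
        binaries = sorted({f["binary"] for f in functions})
        random.Random(seed).shuffle(binaries)
        test_bins = set(binaries[:int(len(binaries) * test_frac)])
        train = [f for f in functions if f["binary"] not in test_bins]
        test = [f for f in functions if f["binary"] in test_bins]
        return train, test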
Metrics. We evaluate DIRTY using two metrics:

Name Match: Following DIRE [29], we consider a variable name prediction correct if it exactly string-matches the name assigned by the original developer. We compute the prediction accuracy as the average percentage of correct predictions across all functions in the test set.

Type Match: We consider a type prediction to be correct only if the predicted type fully matches the ground truth type, including data layout, and the type and name of any fields if applicable. We serialize types to strings and use string matching to determine type matching.

Note that both metrics are conservative. Predictions may still be meaningful even if not identical to the original names. A human study evaluating the quality of predicted types and names is beyond the scope of the current paper.
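A minimal sketch of the Type Match comparison, with a toy serializer standing in for our actual type-to-string serialization (the representation here is ours):

    def serialize(ty):
        """Canonical string for a type; a stand-in for the real serializer."""
        if isinstance(ty, str):       # scalar types are just their names
            return ty
        name, fields = ty             # ("struct point", [("float", "x"), ...])
        body = " ".join(f"{ftype} {fname};" for ftype, fname in fields)
        return f"{name} {{{body}}}"

    def type_match(pred, truth):
        # Correct only if the serialized forms match exactly, including
        # data layout, field types, and field names.
        return serialize(pred) == serialize(truth)

    assert type_match("int", "int")
    assert not type_match(("struct point", [("float", "x"), ("float", "y")]),
                          ("struct point", [("float", "x"), ("float", "z")]))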
Meaningful Subsets of the Test Data. We introduce several subsets of the DIRT test set to better interpret the results:

Function in training vs. Function not in training. Similarly to Lacomis et al. [29], Function in training consists of the functions in the test set that also appear in the training set, which are mainly library functions. Allowing this duplication simulates the realistic use case of analyzing a new binary that uses well-known libraries. We also separately measure the cases where the function is not known during training (i.e., Function not in training) to measure the model's generalizability.

Structure types. Only 1.8% of variables in DIRT have structure types. Because of this low percentage, examining overall accuracy may not reflect DIRTY's accuracy when predicting structure types, which we have found anecdotally to be more challenging. To mitigate this, we separately measure DIRTY's accuracy on structures in addition to its overall accuracy.
3.2 RQ1: Overall Effectiveness

We evaluate DIRTY on the idiomatic retyping task and report its accuracy compared to several baselines.

Baselines. We measure our accuracy with respect to two baseline methods for predicting variable types:

Frequency by Size. The number of bytes a variable occupies is the most basic information for a type. For this technique, we predict the most common developer-assigned type for a given size (as reported by the decompiler). E.g., int is the most common 4-byte type, and __int64 is the most common 8-byte type; this baseline simply assigns these types to variables of the respective size.

Hex-Rays [22]. During decompilation, Hex-Rays already predicts a type for each variable, so we can use these predictions as a baseline. However, Hex-Rays cannot predict developer-generated types without prior knowledge of them, e.g., Hex-Rays assigns unsigned __int16 instead of the more common uint16_t, which puts it at an unfair disadvantage. For this baseline, we reassign the type chosen by Hex-Rays to the most common developer-chosen name associated with it (e.g., we replace every unsigned __int16 with uint16_t).
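Both baselines reduce to frequency tables over the training data; a sketch, where the input tuple shapes are our assumptions for illustration:

    from collections import Counter

    def fit_frequency_by_size(train_vars):
        """Most common developer type per variable size, e.g., 4 -> int."""
        by_size = {}
        for size, dev_type in train_vars:          # e.g., (4, "int")
            by_size.setdefault(size, Counter())[dev_type] += 1
        return {s: c.most_common(1)[0][0] for s, c in by_size.items()}

    def fit_hexrays_remap(train_vars):
        """Most common developer type for each Hex-Rays type, e.g.,
        unsigned __int16 -> uint16_t."""
        remap = {}
        for hr_type, dev_type in train_vars:
            remap.setdefault(hr_type, Counter())[dev_type] += 1
        return {hr: c.most_common(1)[0][0] for hr, c in remap.items()}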
                Overall          In Train         Not in Train
Method          All     Struct   All     Struct   All     Struct
FSize           23.6    9.7      23.5    9.1      23.8    10.4
HR              37.9    28.7     39.0    28.7     36.4    28.7
DIRTY           75.8    68.6     89.9    79.2     56.4    54.6

Table 1: DIRTY has higher retyping accuracy than Frequency by Size (FSize) and Hex-Rays (HR) on the DIRT dataset, both for all types (All) and on structural types alone (Struct).

Results. As shown in Table 1, DIRTY can correctly recover 75.8% of the original (developer-written) types from the decompiled code. In contrast, Hex-Rays, the highest-scoring baseline, can only recover 37.9% of the original types.

As expected, DIRTY performs even better when it has seen a particular function before (In Train), generating the same type as the developer 89.9% of the time. This indicates that DIRTY works particularly well on common code such as libraries. Even when a function has never been seen (Not in Train), DIRTY predicts the correct type 56.4% of the time.

Table 1 also shows the performance of DIRTY on structure types alone. Correctly predicting structure types is more difficult than predicting scalar types, and all models show a drop in performance. Despite this drop, DIRTY still achieves 68.6% accuracy overall, and 54.6% accuracy in the Function not in training category. Frequency by Size struggles on structures, with only 9.7% accuracy; this is expected, since structures of a given size can have many possible types. Hex-Rays is slightly more accurate at 28.7%, as the decompiler is able to analyze the layout of structures.
[Table 2 contents omitted; the developer-assigned types shown include int, char *, and class std::string.]

Table 2: Example variable types from the Function not in training testing partition. The top rows are the developer-assigned types and the columns show DIRTY's top-5 most frequent predictions. <Component> represents a prediction that the variable in the decompiled code does not correspond to a variable in the source code (e.g., because it corresponds to a member of a struct).
Table 2 shows several examples of retyping predictions from the Function not in training partition. These examples show that accuracy is not the full story; even when DIRTY is unable to predict the correct type, the differences are often minor (e.g., unsigned int vs. int, and const char * vs. char *). The bottom half of Table 2 shows prediction examples for structure types.⁶ DIRTY is able to recover the actual structure much of the time. At other times, DIRTY also produces some semantically reasonable but syntactically unacceptable predictions, like char[32] for class std::string.

3.3 RQ2: Comparison with Prior Work

We further compare DIRTY with recent work on type recovery [58] and variable name recovery [29].

Type Recovery. While there is prior work on type recovery (see also Section 4), none of the existing approaches, TIE [31], Howard [47], Retypd [39], TypeMiner [34], and OSPREY [58], are publicly available. We are grateful to Zhang et al. [58], the authors of OSPREY, for kindly sharing their evaluation material so we could compare results.

OSPREY is a recently proposed probabilistic technique for variable and structure recovery that outperforms existing work including Howard [47], Angr [46], Hex-Rays [22], and Ghidra [58]. The OSPREY authors provided us with the GNU coreutils⁷ executables they used in their evaluation, which were compiled with -O0 to disable optimization. We ran DIRTY on these executables, but only evaluated on stack and heap variables, since OSPREY does not recover register variables. This benchmark consists of 101 binaries and 17,089 variables. We also define two subsets of the dataset:

Visited. A subset of 13,020 variables that are covered by BDA [59], a binary abstract interpretation tool that OSPREY relies on. OSPREY is expected to perform better on these covered functions than on uncovered functions, which we also report as Non-Visited.⁸ However, DIRTY is not subject to this limitation.

Struct. A subset of 3,061 variables related to structure types. Following OSPREY, we include structs allocated on the stack, pointers to structs on the heap, and arrays of structs. These variables do not have to be in the Visited subset.

Because DIRTY can predict up to 48,888 different types, each including the full syntactic and semantic information, we convert its predictions in a post-hoc manner to make them comparable with OSPREY.⁹

Table 3 compares the accuracies of both systems. On the overall coreutils benchmark, DIRTY slightly outperforms OSPREY (76.8% vs. 71.6%). OSPREY outperforms DIRTY on the Visited subset but, as expected, performs worse on the Non-Visited functions. Meanwhile, DIRTY is more consistent across Visited and Non-Visited. When looking only at structure types, OSPREY outperforms DIRTY (26.6% vs. 15.7%).

However, this comparison puts DIRTY at a disadvantage, since OSPREY was designed for this task of recovering syntactic types, while DIRTY was trained to recover variable and type/field names, and much of this information is thrown out for this evaluation. To address this, we trained a new model, DIRTYLight, on DIRT, but tailored the training to OSPREY's simplified task. The accuracy of this model is also reported in Table 3. As expected, the DIRTYLight model outperforms the off-the-shelf DIRTY model, since it is trained specifically for this task. DIRTYLight greatly improves prediction accuracy on the Struct subset, and even outperforms OSPREY.

To get a finer-grained comparison with OSPREY, we calculate accuracy on the 101 coreutils binaries individually, and show the prediction accuracies of DIRTY and OSPREY with respect to the number of variables in the programs in Figure 6. We observe that DIRTY is competitive with OSPREY. Interestingly, while the results on large binaries are close, DIRTY performs better on small binaries. This suggests that our learning-based method trained on GitHub data might generalize better to rare patterns than empirical methods developed from observations of a limited number of common and relatively larger programs.

In addition, DIRTY is much faster and more scalable. On average, OSPREY takes around 10 minutes to analyze one binary in coreutils, while it takes 75 seconds for DIRTYLight to finish inference on the whole coreutils benchmark.

⁶ We omit the full predicted contents of structs here for conciseness.
⁷ https://www.gnu.org/software/coreutils/
⁸ A majority of uncovered functions are unreachable from the entry point of the binary, and the others are indirect call targets which BDA fails to analyze.
⁹ Specifically, we discard type names and field names. For example, bool and char are both converted to Primitive_1, which stands for a primitive type occupying 1 byte of memory, const char * and char * are converted to Pointer<Primitive_1>, and struct ImVec2 {float x; float y;} is converted to Struct<Primitive_4, Primitive_4>.
                        Coreutils
Model          All     Visited   Non-Visited   Struct
OSPREY         71.6    83.8      32.4          26.6
DIRTY          76.8    79.1      69.6          15.7
DIRTYLight     80.1    80.1      80.1          27.7

Table 3: Accuracy comparison on the coreutils benchmark.

            Accuracy
Model       Overall   Struct
DIRTYS      74.5      65.4
DIRTY       75.8      68.6

Table 5: Effect of model size. The accuracy columns show the overall accuracy and the accuracy on struct types.
[Figure 6 line chart omitted. x-axis: Number of Variables in Binary (100-500); y-axis: Accuracy (0%-100%); one line each for DIRTY and OSPREY.]

Figure 6: Accuracy of DIRTY and OSPREY on 101 individual programs in the coreutils benchmark with different numbers of variables. The two methods are competitive on large binaries, while DIRTY performs much better on small binaries.

Overall, we believe both methods are valuable. Since at this point DIRTY is using the Hex-Rays-recovered data layout as input to its Data Layout Encoder, we believe a promising future direction is to combine these two methods—using OSPREY's results as the input to DIRTY—and the combined approach can potentially achieve even better results.
Name Recovery. The Decompiled Identifier Renaming Engine (DIRE) is a state-of-the-art neural approach for decompiled variable name recovery [29]. The DIRE model consists of both a lexical encoder and a structural encoder, utilizing both the tokenized decompiled code and the reconstructed abstract syntax tree (AST). In contrast, DIRTY's simpler encoder only uses the tokenized decompiled code.

The DIRE authors provide a public dataset for decompiled variable renaming compiled with -O0. To compare with DIRE, we train DIRTY on the DIRE dataset and also train DIRE on the DIRT dataset. Since DIRE is focused on variable renaming, and there is no type information collected in its dataset, we cannot use the Data Layout Encoder for these experiments. Instead, we only use our Code Encoder and Renaming Decoder. We report the accuracy of both systems in Table 4. DIRTY significantly outperforms DIRE in terms of overall accuracy on both the DIRE dataset (81.4% vs. 72.8%) and the DIRT dataset (66.4% vs. 57.5%). DIRTY also generalizes better than DIRE: when functions are not in the training set, DIRTY outperforms DIRE on both the DIRE dataset (42.8% vs. 33.5%) and the DIRT dataset (36.9% vs. 31.8%).

           DIRE Dataset            DIRT Dataset
Model      All     FIT     FNIT    All     FIT     FNIT
DIRE       72.8    84.1    33.5    57.5    75.6    31.8
DIRTY      81.4    92.6    42.8    66.4    87.1    36.9

Table 4: Accuracy comparison of DIRE and DIRTY on the DIRE and DIRT datasets. Accuracy is reported overall (All), when functions are in the training set (FIT), and when functions are not in the training set (FNIT).

DIRTY outperforms DIRE in spite of the fact that it only leverages the decompiled code, whereas DIRE leverages both the decompiled code and the reconstructed AST from Hex-Rays. Since the primary difference between DIRTY without type prediction and DIRE is that DIRTY uses a Transformer as its encoder and decoder network, we attribute this improvement to the power of Transformers, which allow modeling interactions between any pair of tokens, unrestricted by a sequential or tree structure as in DIRE.

Also notable is that DIRTY trains faster than DIRE. We found that DIRTY surpassed DIRE in accuracy after training for 30 GPU hours, compared to the 200 GPU hours required to train DIRE on the full DIRT dataset, which we again attribute to the efficiency of the Transformer architecture.

3.4 RQ3: Ablation Study

To understand how each component of DIRTY contributes to its overall performance, we perform an ablation study.

Model Size. Transformers have the merit of scaling easily to larger representational power by stacking more layers and increasing the number of hidden units and attention heads per layer [10, 50]. We compare DIRTY to a modified, smaller version, DIRTYS. DIRTY contains 167M parameters, while DIRTYS contains only 40M. Table 10 contains details of the hyperparameter differences between the two models.

Table 5 shows that overall DIRTY is 75.8% accurate vs. 74.5% for DIRTYS. This indicates that increasing the model size has a positive effect on retyping performance. The gain from increased model capacity is notably larger when comparing performance on structures. This improvement suggests that complex types are more challenging and require a model with larger representational capacity. We are not able to train a
larger model due to limits on computation power.

[Figure 7 line chart omitted. x-axis: Size of Training Set (20%-100%); y-axis: Accuracy (0%-100%); one line each for All, In train, and Not in train.]

Figure 7: Effect of training data size. With 100% of the data, the accuracies of All, In train, and Not in train are 75.8%, 89.9%, and 56.4%, respectively. With 20%, these drop to 67.9%, 82.3%, and 48.0%, respectively.

Model        Overall   In train   Not in train
DIRTYNDL     72.2      88.4       49.9
DIRTY        75.8      89.9       56.4

Table 6: Effect of the Data Layout encoder on the accuracy of DIRTY. Accuracy is reported for the model with (DIRTY) and without (DIRTYNDL) the encoder.

Dataset Size. We examine the impact of training data size on prediction accuracy. As a data-driven approach, DIRTY relies on a large-scale code dataset; studying the impact of data size gives us insight into the amount of data to collect. We trained DIRTY on 20%, 40%, 60%, 80%, and 100% portions of the full training partition and report the results in Figure 7.

Figure 7 shows the change in accuracy with respect to the percentage of training data. Increasing the size of the training data has a significant positive effect on accuracy. Between 20% and 100% of the full size, the accuracy increases from 67.9% to 75.8%, a relative gain of 11.6%.

Notably, accuracy on Function not in training has a relative gain of 17.5%, much larger than on the Function in training partition. This is likely because the Function in training partition contains common library functions shared by programs in both the training and test sets, and even a smaller dataset will have programs that use these functions. In contrast, the Function not in training part is open-ended and diverse.

It is also worth noting that the accuracy drops sharply when the training set size is decreased from 40% to 20%, justifying the necessity of using a large-scale dataset.

Data Layout Encoder. We explore the impact of the Data Layout encoder on DIRTY's performance. We experiment with a new model with no Data Layout encoder, DIRTYNDL. Table 6 shows the accuracy results overall and on the Function in training and Function not in training partitions. The inclusion of the Data Layout encoder improves overall accuracy from 72.2% to 75.8%, indicating that the Data Layout encoder is effective. The results are even more interesting when broken into the two partitions. The relative gain on the Function not in training partition is 13% (49.9% to 56.4%), compared to 1.7% on the Function in training partition (88.8% to 89.9%). This suggests the Data Layout encoder greatly improves DIRTY's generalization ability.

Table 7 compares example predictions from DIRTY and DIRTYNDL on the same types from the Function not in training partition. For the __int64 example, the type predictions from DIRTY mostly have the correct size of 8 bytes. DIRTYNDL, however, often incorrectly predicts int and unsigned int. This is understandable because in situations where the value does not exceed a 32-bit integer, __int64 can be safely interchanged with int, and these situations can be identified in some decompiled code. However, apart from the correctness of the retyped program, accuracy with respect to the original binary (i.e., allocating 8 bytes instead of 4) is also important. DIRTY achieves this better than DIRTYNDL.

In the second example, the struct __m128d type occupies 16 bytes, and has two members at offsets 0 and 8. DIRTYNDL mainly mistakes this structure for a double, which might make sense semantically but is unacceptable syntactically. With the Data Layout encoder, DIRTY effectively reduces these errors. This demonstrates that this component achieves the soft masking effect on type prediction as intended in Section 2.4.
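The soft masking referred to above can be pictured as adding a penalty to the logits of types whose layout disagrees with the variable's; in DIRTY the mask comes from the learned Data Layout encoder, whereas the fixed penalty below is only an illustration:

    import torch

    s_t = torch.randn(6)                            # type logits from the decoder
    type_sizes = torch.tensor([1, 2, 4, 4, 8, 16])  # layout of each candidate type
    var_size = 8                                    # the variable occupies 8 bytes

    # Soft mask: layout-incompatible types are penalized, not forbidden.
    mask = (type_sizes != var_size).float() * -5.0
    p_t = torch.softmax(s_t + mask, dim=-1)
    print(p_t.argmax().item())                      # typically 4: the only 8-byte candidate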
Multi-Task Decoder. In this section we study the effectiveness of the Multi-Task decoder compared to decoders designed for only retyping or only renaming. Inspecting the accuracy numbers reported in Table 8, the Multi-Task decoder has similar, but slightly lower, overall accuracy on both tasks than the two specialized models (-0.8% for retyping and -1.3% for renaming). One possible reason is that the Multi-Task model has twice the decoding length of a specialized model, which makes greedy decoding harder.

Despite the small decrease in performance, the unified model has advantages. These are illustrated in the XName and XType columns of Table 8. XName and XType stand for the subsets of the full dataset where the Multi-Task decoder makes correct renaming predictions and correct retyping predictions, and we evaluate the retyping and renaming performance on them, respectively.¹⁰ The Multi-Task decoder outperforms the specialized models by 1.9% and 2.4%, relatively, on these metrics, in spite of the longer decoding length. This means the type and name predictions from the Multi-Task decoder are more consistent with each other than those from specialized models. In other words, making a correct prediction on one task increases the probability of success on the other task.

In practice, this offers additional flexibility and opens the opportunity for more applications. For example, consider a

¹⁰ The probability of success on the other task also increases by chance, because success on one task implies it is easier than average. We have eliminated this influence by, e.g., comparing 92.3 to 90.6, instead of 74.9.
[Table 7 contents omitted; the columns group DIRTY's and DIRTYNDL's top predictions for the developer types __int64 and struct __m128d.]

Table 7: Comparative examples from DIRTY with and without the Data Layout encoder, from the Function not in training partition. Predictions inside a gray box have a different data layout than the ground truth type. DIRTY effectively suppresses these, which helps guide the model to a correct prediction. The structure's full type is struct __m128d {double[2] m128d_f64;}.

             Retyping            Renaming
Model        Overall   XName     Overall   XType
Retyping     75.8      90.6      -         -
Renaming     -         -         66.4      82.6
Multi-Task   74.9      92.3      65.1      84.6

Table 8: Performance comparison of the Retyping-only, Renaming-only, and Multi-Task decoders. Overall performance is shown, in addition to performance on retyping when the name is correct (XName) and performance on renaming when the type is correct (XType).

             GNU coreutils
Model        -O0      -O1      -O2      -O3
DIRTY        48.20    46.01    46.04    46.00

we believe -O0 code to be simpler. Going from -O0 to -O1, DIRTY's accuracy drops from 48.2% to 46.0%. However, there is little difference in performance between -O1, -O2, and -O3. This suggests that DIRTY does slightly better on the optimization level of the code it was trained on, but that the effect of optimizations is small. We believe this is because Hex-Rays recognizes and will "undo" some optimizations so that the decompiled code will be very similar. For example, unoptimized code will often reference stack variables using a frame pointer, but optimized code will reference such variables using the stack pointer, or even maintain them in a register. Both implementations will look similar in the decompiled code, since the mechanism used to reference the variable is not important at the C level. Since DIRTY operates on the decompiled code, the decompiler effectively insulates DIRTY from these optimizations.
    int find_unused_picture(int a1, int a2, int a3) {
        int i, j, v1;
        if (a3) {
            for (i = <Num>;; ++i) {
                if (i > <Num>)
                    goto LABEL_13;
                if (!*(*(<Num> * i + a2) + <Num>))
                    break;
            }
            v1 = i;
        } else {
            for (j = <Num>;; ++j) {
                if (j > <Num>) {
    LABEL_13:
                    av_log(a1, <Num>, <Str>);
                    abort();
                }
                if (pic_is_unused(<Num> * j + a2))
                    break;
            }
            v1 = j;
        }
        return v1;
    }

    ID   Developer                 DIRTY
    a1   AVCodecContext_0 *avctx   MpegEncContext_0 *s
    a2   Picture_0 *picture        Picture_0 *pic
    a3   int shared                int shared
    v1   int result                int result

Figure 8: Simplified Hex-Rays output. <Num> and <Str> are placeholder tokens for constant numbers and strings, respectively. The table summarizes the original developer names and types along with the names and types predicted by DIRTY.

4 Related Work

Other projects related to type recovery for decompilation are REWARDS [33], TIE [31], Retypd [39], and OSPREY [58]. Unlike our approach, they use program analyses to compute constraints on types. Additionally, they are limited either to only predicting the syntactic type (TIE, Retypd, OSPREY) or to only predicting one of a small set of hand-written types (150 for REWARDS). In comparison, DIRTY automatically generates a database of types by observing real-world code.

Other projects use machine learning to predict types, but target different languages than DIRTY. DeepTyper [20] learns type inference for JavaScript, and OptTyper [40], LambdaNet [52], and R-GNNNS-CTX [56] target TypeScript. Training a machine learning algorithm for the task of typing dynamic languages like these is a slightly easier task: generating a parallel corpus is simple, since the types can simply be removed without changing the semantics. The DIRT dataset is fundamentally different: including debug information often changes the layout of the code as the decompiler adds structures and syntax for accessing them.

To the best of our knowledge, the most directly related work to DIRTY is TypeMiner [34]. TypeMiner is a pioneering work, providing the proof of concept for recovering types from C binaries. However, it uses much simpler machine learning algorithms, and its dataset only consists of 23,482 variables and 17 primitive types. Escalada et al. [14] have provided similar insights. They adopt simple classification algorithms to predict function return types in C, but they consider only 10 different (syntactic) types, and their dataset is limited to 2,339 functions from real programs and 18,000 synthetic functions.

Several other projects target the improvement of decompiler output using neural models: DIRE [29], which predicts variable names; DIRECT [38], which extends DIRE using transformer-based models; and Nero [8], which generates procedure names. Other approaches work directly on assembly [16, 26, 27], and learn code structure generation instead of aiming to recover developer-specified variable types or names. Similarly, DEBIN [18] and CATI [5] use machine learning to respectively predict debug information and types directly from stripped binaries without a decompiler.

5 Discussion

In this paper we presented DIRTY, a novel deep-learning-based technique for predicting variable types and names in decompiled code. Still, DIRTY is limited in several ways that provide key opportunities for future improvements.

Alternative Decompilers to Hex-Rays. We implement DIRTY on top of the Hex-Rays decompiler because of its positive reputation and the programmatic access it affords to decompiler internals. However, DIRTY is not fundamentally specific to Hex-Rays, and the technique should conceptually work with any decompiler that names variables using DWARF debug symbols. Note that, due to its recent popularity and promise, we attempted to evaluate our techniques using the newer, open-source Ghidra decompiler. Unfortunately, this is currently infeasible, because Ghidra routinely failed to accurately name stack variables based on DWARF. This appears to be a combination of specific issues¹¹ and the general design of the decompiler. Ghidra's decompiler consists of many passes which modify and augment the current decompilation. Some of these passes combine variables, but in doing so may combine a DWARF-named variable with others. Since the combined variable no longer corresponds directly to the DWARF variable information, Ghidra discards the name. We are optimistic, however, that when the above-mentioned issues are addressed, Ghidra may again be a reasonable target for our approach.

¹¹ https://github.com/NationalSecurityAgency/ghidra/issues/2322

Generalizing to Unseen Types. A limitation of DIRTY's current decoder is that it can only predict types seen during training. Fortunately, there appears, empirically, to be sufficient redundancy across large corpora that DIRTY is still frequently able to successfully recover structural types. This
lends credence to the hypothesis that code is natural, an observation that has been explored in several domains [9, 23]. It moreover appears that data layout is of particular importance here: layout information recovered from the decompiler imposes key constraints on the overall prediction problem. Indeed, our results in Section 3.4 corroborate the intuition that the Data Layout Encoder is especially important for succeeding on previously unseen code.

We envision meaningful future opportunities to more directly expand DIRTY's capabilities to predict unseen structures. This problem is analogous to machine translation models that must deal with rare or compound words (e.g., xenophobia) that are not present in their dictionary. Byte Pair Encoding (BPE) [45] is the most frequently used technique to tackle this problem in the natural language domain. It automatically splits words into multiple tokens that are present in the dictionary (e.g., xeno and ##phobia). (The ## indicates the token is still part of the current word, instead of starting a new word.) This technique greatly increases the number of words a model can handle despite a limited dictionary size, and enables the composition of new words that were not seen during training. This suggests that we can similarly extend DIRTY's decoder to predict previously unseen types by decomposing structure types into multiple pieces with BPE. For example, the structure type struct timeval {time_t tv_sec; suseconds_t tv_usec;} is split into four separate tokens, which are 1) struct timeval, 2) time_t tv_sec;, 3) suseconds_t tv_usec;, and 4) <end_of_struct>.
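The decomposition of that example is mechanical; a minimal sketch, assuming a simple (struct name, field list) representation of our own:

    def decompose(struct_name, fields):
        """Split a structure type into a sequence of sub-type tokens,
        analogous to BPE subword units; <end_of_struct> closes the type."""
        tokens = [f"struct {struct_name}"]
        tokens += [f"{ftype} {fname};" for ftype, fname in fields]
        tokens.append("<end_of_struct>")
        return tokens

    print(decompose("timeval", [("time_t", "tv_sec"), ("suseconds_t", "tv_usec")]))
    # ['struct timeval', 'time_t tv_sec;', 'suseconds_t tv_usec;', '<end_of_struct>']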
However, unfortunately, our preliminary experiments suggested that this hurts overall prediction accuracy. It also significantly slows down prediction, since it drastically increases the number of decoding steps. It moreover requires finer-grained accuracy metrics, like tree distance, to allow us to measure and credit partially correct predictions. Based on these observations, we believe unseen structure types should be handled specially with a tailored model, a problem we leave to future work.

Supporting Non-C Languages. A benefit of decompiling to C is that, as a relatively low-level language, it can express the behavior of executables beyond those written in C. Although we designed DIRTY to be used with C programs and types, DIRTY can run on non-C programs, and will try to identify the C type that best captures the way in which a variable is being used. Thus, DIRTY provides value to analysts seeking to understand non-C programs, similar to how C decompilers such as Hex-Rays help analysts understand C++ programs.

However, many compiled programming languages have type systems far richer than C's, and expressing these types in terms of C types may be confusing. For example, in C++, virtual functions are often implemented by reading an address out of a virtual function table [13, 44]. Although techniques like DIRTY can recognize such tables as structs or arrays of code pointers, they do not reveal the connection to the higher-level C++ behavior of virtual functions.

Extending DIRTY to support higher-level languages such as C++ is an interesting open problem. To some degree, as long as the decompiler is able to import the higher-level type information from debug symbols into the decompiler output, it should be possible to train DIRTY to recognize non-C types. For instance, 6% of the programs in DIRT are written in C++, and our evaluation measures DIRTY's ability to predict common C++ class types such as std::string. But recovering higher-level properties of these types, especially for those never seen during training, is a challenging problem and is likely to require language-specific adaptations [13, 44].

Limited Input Length. As is common with Transformers, we truncate the decompiled function if its length n exceeds some upper limit max_seq_length, which makes training more efficient. In our experiments we set max_seq_length to 512 for two reasons. First, 512 is the default value for max_seq_length in many Transformer models [10, 50]. Second, in DIRT, the average number of tokens in a function is 220.3, and only 8.8% of the functions have more than 512 tokens, i.e., we exclude relatively few of the possible inputs encountered in the wild. Still, if enough computational resources are available, we recommend using efficient Transformer implementations such as Big Bird [57] instead. These can deal with a much larger max_seq_length and can be used out-of-the-box to replace our implementation.

6 Conclusion

The decompiler is an important tool used for reverse engineering. While decompilers attempt to reverse compilation by transforming binaries into high-level languages, generating the same code originally written by the developer is impossible. Many of the useful abstractions provided by high-level languages, such as loops, typed variables, and comments, are irreversibly destroyed by compilation. Decompilers are able to deterministically reconstruct some structural properties of code, but comments, variable names, and custom variable types are technically impossible to recover.

In this paper we address the problem of assigning decompiled variables meaningful names and types by statistically modeling how developers write code. We present DIRTY (DecompIled variable ReTYper), a novel technique for improving the quality of decompiler output that automatically generates meaningful variable names and types. Empirical evaluation of DIRTY on a novel dataset of C code mined from GitHub shows that DIRTY outperforms prior approaches by a sizable margin, recovering the original names written by developers 66.4% of the time and the original types 75.8% of the time.
References Thomas Unterthiner, Mostafa Dehghani, Matthias
Minderer, Georg Heigold, Sylvain Gelly, et al. An
[1] Daniel Adiwardana, Minh-Thang Luong, David R. So, image is worth 16x16 words: Transformers for image
Jamie Hall, Noah Fiedel, Romal Thoppilan, Zi Yang, recognition at scale. arXiv preprint arXiv:2010.11929,
Apoorv Kulshreshtha, Gaurav Nemade, Yifeng Lu, and 2020.
Quoc V. Le. Towards a human-like open-domain chat-
bot. arXiv preprint arXiv:2001.09977, 2020. [12] Lukas Durfina, Jakub Kroustek, and Petr Zemek. PsybOt
malware: A step-by-step decompilation case study. In
[2] Miltiadis Allamanis, Earl T. Barr, Premkumar Devanbu, Working Conference on Reverse Engineering, WCRE,
and Charles Sutton. A survey of machine learning for pages 449–456, 2013.
big code and naturalness. ACM Computing Surveys
(CSUR), 51(4):81, 2018. [13] Rukayat Ayomide Erinfolami and Aravind Prakash.
Devil is virtual: Reversing virtual inheritance in C++
[3] Dzmitry Bahdanau, Kyunghyun Cho, and Yoshua Ben- binaries. In Proceedings of the ACM Conference on
gio. Neural machine translation by jointly learning to Computer and Communications Security, CCS, pages
align and translate. In International Conference on 133–148, 2020.
Learning Representations, ICLR, 2015.
[14] Javier Escalada, Ted Scully, and Francisco Ortin. Im-
[4] Tom B. Brown, Benjamin Mann, Nick Ryder, Melanie proving type information inferred by decompilers
Subbiah, Jared Kaplan, Prafulla Dhariwal, Arvind Nee- with supervised machine learning. arXiv preprint
lakantan, Pranav Shyam, Girish Sastry, Amanda Askell, arXiv:2101.08116, 2021.
et al. Language models are few-shot learners. arXiv
preprint arXiv:2005.14165, 2020. [15] William Fedus, Barret Zoph, and Noam Shazeer. Switch
transformers: Scaling to trillion parameter models
[5] Ligeng Chen, Zhongling He, and Bing Mao. CATI: with simple and efficient sparsity. arXiv preprint
Context-assisted type inference from stripped binaries. arXiv:2101.03961, 2021.
In International Conference on Dependable Systems
and Networks, DSN, 2020. [16] Cheng Fu, Huili Chen, Haolan Liu, Xinyun Chen, Yuan-
dong Tian, Farinaz Koushanfar, and Jishen Zhao. Coda:
[6] Rewon Child, Scott Gray, Alec Radford, and Ilya An end-to-end neural program decompiler. In Con-
Sutskever. Generating long sequences with sparse trans- ference on Neural Information Processing Systems,
formers. 2019. NeurIPS, 2019.
[7] Kyunghyun Cho, Bart van Merriënboer, Caglar Gul- [17] Edward M. Gellenbeck and Curtis R. Cook. An inves-
cehre, Dzmitry Bahdanau, Fethi Bougares, Holger tigation of procedure and variable names as beacons
Schwenk, and Yoshua Bengio. Learning phrase rep- during program comprehension. Technical report, Ore-
resentations using RNN encoder-decoder for statistical gon State University, 1991.
machine translation. In Conference on Empirical Meth-
ods in Natural Language Processing, EMNLP, 2014. [18] Jingxuan He, Pesho Ivanov, Petar Tsankov, Veselin Ray-
chev, and Martin Vechev. D EBIN: Predicting debug
[8] Yaniv David, Uri Alon, and Eran Yahav. Neural re- information in stripped binaries. In Conference on Com-
verse engineering of stripped binaries using augmented puter and Communications Security, CCS, 2018.
control flow graphs. Proceedings of the ACM on Pro-
gramming Languages, 4(OOPSLA):1–28, 2020. [19] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian
Sun. Deep residual learning for image recognition.
[9] Premkumar Devanbu. New initiative: The naturalness In IEEE Conference on Computer Vision and Pattern
of software. In International Conference on Software Recognition, CVPR, pages 770–778, 2016.
Engineering, ICSE, pages 543–546, 2015.
[20] Vincent J. Hellendoorn, Christian Bird, Earl T. Barr, and
[10] Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Miltiadis Allamanis. Deep learning type inference. In
Kristina Toutanova. BERT: Pre-training of deep bidi- Joint Meeting of the European Software Engineering
rectional transformers for language understanding. In Conference and the Symposium on the Foundations of
Annual Conference of the North American Chapter of Software Engineering, ESEC/FSE, 2018.
the Association for Computational Linguistics, NAACL-
HLT, pages 4171–4186, 2019. [21] Dan Hendrycks and Kevin Gimpel. Gaussian error linear
units (GELUs). arXiv preprint arXiv:1606.08415, 2016.
[11] Alexey Dosovitskiy, Lucas Beyer, Alexander
Kolesnikov, Dirk Weissenborn, Xiaohua Zhai,
14
[22] Hex-Rays. The hex-rays decompiler, 2019. URL https://www.hex-rays.com/products/decompiler/.
[23] Abram Hindle, Earl T. Barr, Zhendong Su, Mark Gabel, and Premkumar Devanbu. On the naturalness of software. In International Conference on Software Engineering, ICSE, pages 837–847, 2012.
[24] Sepp Hochreiter and Jürgen Schmidhuber. Long short-term memory. Neural Computation, 9(8):1735–1780, 1997.
[25] Alan Jaffe, Jeremy Lacomis, Edward J. Schwartz, Claire Le Goues, and Bogdan Vasilescu. Meaningful variable names for decompiled code: A machine translation approach. In International Conference on Program Comprehension, ICPC, pages 20–30, 2018.
[26] Deborah S. Katz, Jason Ruchti, and Eric Schulte. Using recurrent neural networks for decompilation. In International Conference on Software Analysis, Evolution and Reengineering, SANER, pages 346–356, 2018.
[27] Omer Katz, Yuval Olshaker, Yoav Goldberg, and Eran Yahav. Towards neural decompilation. arXiv preprint arXiv:1905.08325, 2019.
[28] Diederik P. Kingma and Jimmy Ba. Adam: A method for stochastic optimization. In International Conference on Learning Representations, ICLR, 2015.
[29] Jeremy Lacomis, Pengcheng Yin, Edward J. Schwartz, Miltiadis Allamanis, Claire Le Goues, Graham Neubig, and Bogdan Vasilescu. DIRE: A neural approach to decompiled identifier naming. In International Conference on Automated Software Engineering, ASE, pages 628–639, 2019.
[30] Dawn Lawrie, Christopher Morrell, Henry Feild, and David Binkley. What's in a name? A study of identifiers. In International Conference on Program Comprehension, ICPC, pages 3–12, 2006.
[31] JongHyup Lee, Thanassis Avgerinos, and David Brumley. TIE: Principled reverse engineering of types in binary programs. In Network and Distributed System Security Symposium, NDSS, 2011.
[32] Mike Lewis, Yinhan Liu, Naman Goyal, Marjan Ghazvininejad, Abdelrahman Mohamed, Omer Levy, Veselin Stoyanov, and Luke Zettlemoyer. BART: Denoising sequence-to-sequence pre-training for natural language generation, translation, and comprehension. In Annual Meeting of the Association for Computational Linguistics, ACL, pages 7871–7880, 2020.
[33] Zhiqiang Lin, Xiangyu Zhang, and Dongyan Xu. Automatic reverse engineering of data structures from binary execution. In CERIAS Annual Security Symposium, CERIAS, 2010.
[34] Alwin Maier, Hugo Gascon, Christian Wressnegger, and Konrad Rieck. TypeMiner: Recovering types in binary programs using machine learning. In International Conference on Detection of Intrusions and Malware, and Vulnerability Assessment, DIMVA, pages 288–308, 2019.
[35] Paul Michel and Graham Neubig. Extreme adaptation for personalized neural machine translation. In Annual Meeting of the Association for Computational Linguistics (Short Papers), ACL, pages 312–318, 2018.
[36] Ramesh Nallapati, Bowen Zhou, Caglar Gulcehre, and Bing Xiang. Abstractive text summarization using sequence-to-sequence RNNs and beyond. In SIGNLL Conference on Computational Natural Language Learning, CoNLL, pages 280–290, 2016.
[37] Hermann Ney, Dieter Mergel, Andreas Noll, and Annedore Paeseler. A data-driven organization of the dynamic programming beam search for continuous speech recognition. In International Conference on Acoustics, Speech, and Signal Processing, ICASSP, pages 833–836, 1987.
[38] Vikram Nitin, Anthony Saieva, Baishakhi Ray, and Gail Kaiser. DIRECT: A transformer-based model for decompiled identifier renaming. In Workshop on Natural Language Processing for Programming, NLP4Prog, 2021.
[39] Matthew Noonan, Alexey Loginov, and David Cok. Polymorphic type inference for machine code. In Conference on Programming Language Design and Implementation, PLDI, pages 27–41, 2016.
[40] Irene Vlassi Pandi, Earl T. Barr, Andrew D. Gordon, and Charles Sutton. OptTyper: Probabilistic type inference by optimising logical and natural constraints. arXiv preprint arXiv:2004.00348, 2020.
[41] Adam Paszke, Sam Gross, Francisco Massa, Adam Lerer, James Bradbury, Gregory Chanan, Trevor Killeen, Zeming Lin, Natalia Gimelshein, Luca Antiga, et al. PyTorch: An imperative style, high-performance deep learning library. In Conference on Neural Information Processing Systems, NeurIPS, pages 8024–8035, 2019.
[42] Colin Raffel, Noam Shazeer, Adam Roberts, Katherine Lee, Sharan Narang, Michael Matena, Yanqi Zhou, Wei Li, and Peter J. Liu. Exploring the limits of transfer learning with a unified text-to-text transformer. Journal of Machine Learning Research, 21:1–67, 2020.
[43] Eric Schulte, Jason Ruchti, Matt Noonan, David Ciarletta, and Alexey Loginov. Evolving exact decompilation. In Workshop on Binary Analysis Research, BAR, 2018.
[44] Edward J. Schwartz, Cory F. Cohen, Michael Duggan, Jeffrey Gennari, Jeffrey S. Havrilla, and Charles Hines. Using logic programming to recover C++ classes and methods from compiled executables. In Conference on Computer and Communications Security, CCS, 2018.
[45] Rico Sennrich, Barry Haddow, and Alexandra Birch. Neural machine translation of rare words with subword units. arXiv preprint arXiv:1508.07909, 2015.
[46] Yan Shoshitaishvili, Ruoyu Wang, Christopher Salls, Nick Stephens, Mario Polino, Andrew Dutcher, John Grosen, Siji Feng, Christophe Hauser, Christopher Kruegel, and Giovanni Vigna. (State of) The art of war: Offensive techniques in binary analysis. In Symposium on Security and Privacy, SP, pages 138–157, 2016.
[47] Asia Slowinska, Traian Stancescu, and Herbert Bos. Howard: A dynamic excavator for reverse engineering data structures. In Network and Distributed System Security Symposium, NDSS, 2011.
[48] Katerina Troshina, Yegor Derevenets, and Alexander Chernov. Reconstruction of composite types for decompilation. In Working Conference on Source Code Analysis and Manipulation, SCAM, pages 179–188, 2010.
[49] Michael James van Emmerik. Static Single Assignment for Decompilation. PhD thesis, University of Queensland, 2007.
[50] Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, and Illia Polosukhin. Attention is all you need. In Conference on Neural Information Processing Systems, NeurIPS, pages 6000–6010, 2017.
[51] Nguyen Xuan Vinh, Julien Epps, and James Bailey. Information theoretic measures for clusterings comparison: Variants, properties, normalization and correction for chance. The Journal of Machine Learning Research, 11:2837–2854, 2010.
[52] Jiayi Wei, Maruth Goyal, Greg Durrett, and Isil Dillig. LambdaNet: Probabilistic type inference using graph neural networks. In International Conference on Learning Representations, ICLR, 2020.
[53] Kelvin Xu, Jimmy Ba, Ryan Kiros, Kyunghyun Cho, Aaron Courville, Ruslan Salakhudinov, Rich Zemel, and Yoshua Bengio. Show, attend and tell: Neural image caption generation with visual attention. In International Conference on Machine Learning, ICML, pages 2048–2057, 2015.
[54] Khaled Yakdan, Sebastian Eschweiler, Elmar Gerhards-Padilla, and Matthew Smith. No more gotos: Decompilation using pattern-independent control-flow structuring and semantics-preserving transformations. In Network and Distributed System Security Symposium, NDSS, 2015.
[55] Khaled Yakdan, Sergej Dechand, Elmar Gerhards-Padilla, and Matthew Smith. Helping Johnny to analyze malware: A usability-optimized decompiler and malware analysis user study. In Symposium on Security and Privacy, SP, pages 158–177, 2016.
[56] Fangke Ye, Jisheng Zhao, and Vivek Sarkar. Advanced graph-based deep learning for probabilistic type inference. arXiv preprint arXiv:2009.05949, 2020.
[57] Manzil Zaheer, Guru Guruganesh, Avinava Dubey, Joshua Ainslie, Chris Alberti, Santiago Ontanon, Philip Pham, Anirudh Ravula, Qifan Wang, Li Yang, et al. Big Bird: Transformers for longer sequences. arXiv preprint arXiv:2007.14062, 2020.
[58] Z. Zhang, Y. Ye, W. You, G. Tao, W. Lee, Y. Kwon, Y. Aafer, and X. Zhang. OSPREY: Recovery of variable and data structure via probabilistic analysis for stripped binary. In Symposium on Security and Privacy, SP, pages 872–891, 2021.
[59] Zhuo Zhang, Wei You, Guanhong Tao, Guannan Wei, Yonghwi Kwon, and Xiangyu Zhang. BDA: Practical dependence analysis for binary executables by unbiased whole-program path sampling and per-path abstract interpretation. Proceedings of the ACM on Programming Languages, 3(OOPSLA):1–31, 2019.
A Experimental Setup

Hyperparameter Configurations. Our detailed hyperparameters are shown in Table 10. We use a six-layer Transformer encoder for the code encoder, a three-layer Transformer encoder for the data layout encoder, and a six-layer Transformer decoder for the type decoder. We set the number of attention heads to 8. The input embedding dimension and hidden size $d_{model}$ are set to 512 for the code encoder and 256 for the data layout encoder. Following prior work [50], we empirically set the inner dimension $d_{ff}$ of the position-wise feed-forward layers to four times the hidden size $d_{model}$. We use the gelu activation function [21] rather than the standard relu, following BERT [10]. During training, we use a batch size of 64 and a learning rate of $1 \times 10^{-4}$, with the Adam optimizer [28] configured with $\beta_1 = 0.9$, $\beta_2 = 0.999$, and $\varepsilon = 1 \times 10^{-8}$. We clip gradients by value to the range $[-1, 1]$ and apply a dropout rate of 0.1 for regularization. We train the model for 15 epochs. At inference time, we use beam search with a beam size of 5 to predict the types for each function.

Hyperparameter               DIRTY    DIRTY_S
Max sequence length          512      512
Encoder/decoder layers       6/6      3/3
Hidden units per layer       512      256
Attention heads              8        4
Layout encoder layers        3        3
Layout encoder hidden units  256      128
Batch size                   64       64
Training epochs              15       30
Learning rate                10^-4    10^-4
Adam ε                       10^-8    10^-8
Adam β1                      0.9      0.9
Adam β2                      0.999    0.999
Gradient clipping            1.0      1.0
Dropout rate                 0.1      0.1
Number of parameters         167M     40M

Table 10: Summary of the hyperparameters of DIRTY and the smaller DIRTY_S.
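To make this configuration concrete, the sketch below shows how the DIRTY column of Table 10 could be assembled in PyTorch [41]. It is a minimal illustrative sketch, not DIRTY's actual implementation: nn.Transformer stands in for the model's code encoder, data layout encoder, and type decoder, and data_loader and compute_loss are hypothetical placeholders for the training data pipeline and objective.

    import torch
    import torch.nn as nn

    # Hyperparameter values mirror the DIRTY column of Table 10;
    # nn.Transformer is only a stand-in for the real architecture.
    model = nn.Transformer(
        d_model=512,              # code-encoder embedding/hidden size
        nhead=8,                  # attention heads
        num_encoder_layers=6,     # code encoder layers
        num_decoder_layers=6,     # type decoder layers
        dim_feedforward=4 * 512,  # d_ff = 4 * d_model, following [50]
        dropout=0.1,              # dropout regularization
        activation="gelu",        # gelu rather than relu, following BERT [10]
    )

    # Adam optimizer with the learning rate and moment settings above [28].
    optimizer = torch.optim.Adam(
        model.parameters(), lr=1e-4, betas=(0.9, 0.999), eps=1e-8
    )

    def train_one_epoch(data_loader, compute_loss):
        # One pass over the training data in batches of 64 examples;
        # compute_loss is a hypothetical placeholder for the objective.
        for batch in data_loader:
            optimizer.zero_grad()
            loss = compute_loss(model, batch)
            loss.backward()
            # Clip gradients by value to [-1, 1], as described above.
            nn.utils.clip_grad_value_(model.parameters(), clip_value=1.0)
            optimizer.step()

Note that this sketch omits the separate three-layer data layout encoder with hidden size 256 listed in Table 10, and that at inference time greedy decoding would be replaced by beam search with a beam size of 5, as described above.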