
Cyber Security and Applications 3 (2025) 100061

Contents lists available at ScienceDirect

Cyber Security and Applications


journal homepage: http://www.keaipublishing.com/en/journals/cyber-security-and-applications/

Survey of techniques to detect common weaknesses in program binaries


Ashish Adhikari∗, Prasad Kulkarni∗
University of Kansas, Department of EECS, Lawrence, KS 66045, USA

Keywords: CWE detection; Static analysis; Software binaries

Abstract: Software vulnerabilities resulting from coding weaknesses and poor development practices are common. Attackers can exploit these vulnerabilities and impact the security and privacy of end-users. Most end-user software is distributed as program binaries. Therefore, to increase trust in third-party software, researchers have built techniques and tools to detect and resolve different classes of coding weaknesses in binary software. Our work is motivated by the need to survey the state-of-the-art and understand the capabilities and challenges faced by binary-level techniques that were built to detect the most important coding weaknesses in software binaries. Therefore, in this paper, we first show the most critical coding weaknesses for compiled programming languages. We then survey, explore, and compare the static techniques that were developed to detect each such coding weakness in software binaries. Our other goal in this work is to discover and report the state of published open-source implementations of static binary-level security techniques. For the open-source frameworks that work as documented, we independently evaluate their effectiveness in detecting code vulnerabilities on a suite of program binaries. To our knowledge, this is the first work that surveys and independently evaluates the performance of state-of-the-art binary-level techniques to detect weaknesses in binary software.

1. Introduction

Technology and software have become integral to our daily lives. More software is now present in more systems, from embedded devices, like refrigerators and microwave ovens, to cars and planes. Additionally, new software features continue to be added as the hardware, including the processor, memory, and storage, becomes faster, larger, and/or more capable. Thus, software programs also continue to grow in size and, perhaps, complexity.

Given this growth in the amount of software in use, it is no surprise that the number of reported code vulnerabilities has been increasing in number and severity for many years [1]. At the same time, software vulnerabilities have been found to cause many disastrous real-world attacks [2,3].

Software vulnerabilities are caused by weaknesses or flaws in the program code. These weaknesses may then be exploited to compromise the security or integrity of the system. Code in any language can be insecure when it is not developed with due care. However, some programming languages are designed with features that make them immune or more resistant to certain types of weaknesses. Such safer languages typically provide built-in mechanisms for memory management, input validation, type safety, and other security-related features. Languages like Rust and Go belong to this category of safe programming languages.

Alternatively, unsafe programming languages, like C and C++, are low-level languages with poor built-in memory, type, and thread safety. Code bugs and missing safety oversight for vulnerable code constructs are widespread in programs written using these languages [4]. In spite of these safety concerns, and even though memory-safe language alternatives are available, C/C++ remains popular due to the large amount of existing legacy code and the low-level features of these languages that are desired by many performance- and memory-critical, embedded, and real-time systems. Consequently, C and C++ separately and consistently rank among the top five most popular programming languages according to the TIOBE index¹.

In spite of this existing state of affairs, where code bugs, vulnerabilities, and exploits are commonplace, most available software has not been independently and rigorously evaluated for its security properties. Most ordinary customers don't have the option of knowing the security, safety, and reliability properties of the software they buy and use. Such unevaluated or under-evaluated third-party software libraries may also

¹ https://www.tiobe.com/tiobe-index/.

Peer review under responsibility of KeAi Communications Co., Ltd.



Corresponding authors.
E-mail addresses: [email protected], [email protected] (A. Adhikari), [email protected] (P. Kulkarni).

https://doi.org/10.1016/j.csa.2024.100061
Received 31 October 2023; Received in revised form 9 May 2024; Accepted 22 May 2024
Available online 28 May 2024
2772-9184/© 2024 The Authors. Publishing Services by Elsevier B.V. on behalf of KeAi Communications Co., Ltd. This is an open access article under the CC
BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/)

be integrated into other products, that may then even be shipped by developers we trust.

Researchers have made important strides to resolve the issues regarding code bugs, vulnerabilities, and software exploits. Efforts have been made to understand and categorize the important software weaknesses. A curated, community-developed list of the top software and hardware weaknesses was created, called the Common Weakness Enumerations (CWE)². Each year, a new top CWE list is developed and released that lists the weaknesses that contributed most to the vulnerabilities discovered. This list can help developers and security practitioners address the top vulnerabilities by educating them and disseminating the knowledge of how these weaknesses can be eliminated.

When source code is available, manual code reviews to identify software defects are still a common practice. Researchers have also developed many automated techniques and tools to find weaknesses and vulnerabilities in high-level source code. Lint and PC-Lint may be some of the oldest automated tools developed to detect programming errors and stylistic defects in C and C++ programs [5]. Similar and more recent source-level tools, commonly called SAST or Static Application Security Testing tools, include Sonarqube³, CodeSonar⁴, Coverity⁵, Flawfinder⁶, Klocwork⁷, and many others⁸. Most of the available source-level SAST tools are commercial and intended to be used at the developer's end to improve software quality.

However, most software is distributed in its binary form and without access to the original source code. Binary-level software is typically harder to comprehend and evaluate compared to source-level code that is written in a higher-level programming language. Source-level SAST tools cannot analyze third-party binary executables and libraries, and are blind to security-reducing decisions made by the compiler, including removal of security checks and memory clearing operations⁹ as dead code¹⁰. Thus, these SAST tools cannot be used to independently confirm the security of the distributed software for its end-users.

A handful of binary-level SAST tools have also been developed to detect weaknesses and vulnerabilities in software binaries. Examples include Grammatech's CodeSonar for binaries¹¹ and Veracode SAST¹². However, these tools are commercial and also intended to be employed at the developer's end to analyze the interaction of any third-party binary libraries with the developer's code base. These tools are mostly inaccessible to independent evaluators, and the precision and coverage of these binary-level SAST tools have never been independently explored and evaluated.

Other researchers have also developed binary-level techniques to detect weaknesses and vulnerabilities in binary software, to understand the behavior of closed-source software and malware, to assess compliance with standards, and to enhance software security. These techniques are not only crucial for developers who frequently utilize third-party libraries in their software, but also useful for end-users of software when choosing between competing software options. Our goal in this work is to survey and evaluate the current state-of-the-art techniques designed to address the most important software weaknesses for program binaries.

Specifically, we make the following contributions in this work.

1. We present the top CWEs or weaknesses affecting compiled language binaries.
2. For each top CWE, we survey and describe the cutting-edge static approaches built to detect that weakness in program binaries.
3. We find the binary-level approaches that are available open-source, build and independently evaluate them when possible, or reveal the challenges when not.
4. We build working prototypes for two open-source binary-level tools, and compare their accuracy and shortcomings for several important CWEs using multiple standard benchmark programs.

The rest of this paper is organized as follows. We present related work in Section 2. We describe the methodology we used to conduct this survey in Section 3. We discuss the major static binary analysis techniques in Section 4. We survey the techniques that have been developed to detect each of the top 10 CWEs for compiled language binaries in Section 5. We search for open-source implementations of the techniques covered in Section 5, attempt to build and evaluate them, when available, and report our results and observations in Section 6. We discuss the issues and directions for future advancement of static analysis techniques for vulnerability detection in Section 7. We present the limitations of our current work in Section 8. Finally, we list avenues for future work and present our conclusions in Sections 9 and 10, respectively.

2. Related works

In this section, we review the existing literature on surveys and techniques for detecting Common Weakness Enumerations (CWEs) and software vulnerabilities for binaries, and compare them with our work in this paper. A summary of the most relevant related works is presented in Table 1.

We did not find any previous work that compiled and presented a comprehensive survey on techniques and tools for the detection of multiple top CWEs. However, there are several studies that have surveyed detection techniques for specific CWEs. A majority of these works only study and evaluate source-level techniques. For example, Byun et al. utilized the CBMC tool on the Juliet Test Suite to evaluate and detect CWEs [6]. Other research explored Natural Language Processing (NLP) techniques to generate source code embeddings that then aid in the automatic detection and classification of software vulnerabilities [7]. Cruzes et al. conducted a thorough survey investigating techniques for detecting only the Buffer Overflow (BO) vulnerabilities [8]. They organized the techniques into multiple categories and found common limitations of the techniques in each of their categories. While comprehensive, their study focused specifically on buffer overflow vulnerabilities and on source-code level techniques.

In this work, we primarily focus on comparing binary-level static analysis based techniques for CWE detection. We found several other works that similarly focused on studying static analysis based techniques for CWE detection. Lipp et al. conducted an empirical study on the effectiveness of static C code analyzers for real-world vulnerability detection [15]. They assessed the ability of several open-source tools and one commercial static C analyzer, and found that all current tools do a poor job at detecting real-world vulnerabilities, even when they performed well on artificial/smaller benchmarks. Katherina et al. investigated the strengths and weaknesses of static code analysis tools in detecting CWEs and other vulnerabilities [17]. Although the specific names of the tools were not mentioned, their evaluation revealed that the tested commercial tools did not exhibit statistically significant differences in their ability to detect security vulnerabilities. They underscored the need for further advancements in vulnerability detection techniques. In another study, Pereira et al. evaluated two static analysis tools for their applicability in large projects [11]. They found that these tools exhibited diverse performances, with Flawfinder having higher false alarms but fewer true negatives, while cppcheck showed high true negatives

² https://cwe.mitre.org/.
³ https://www.sonarqube.org/downloads/.
⁴ https://www.grammatech.com/codesonar-cc.
⁵ https://www.synopsys.com/software-integrity/security-testing/static-analysis-sast.html.
⁶ https://dwheeler.com/.
⁷ https://www.perforce.com/products/klocwork.
⁸ https://owasp.org/www-community/Source_Code_Analysis_Tools.
⁹ https://cwe.mitre.org/data/definitions/14.html.
¹⁰ https://cwe.mitre.org/data/definitions/561.html.
¹¹ https://www.grammatech.com/codesonar-sast-binary.
¹² https://www.veracode.com/products/binary-static-analysis-sast.


Table 1
Summary of related works. We categorize the Technique used in the research works as follows: SC – Source Code, BA – Binary Analysis, SA – Static Analysis, DA – Dynamic Analysis, SE – Symbolic Execution, and ML – Machine Learning.

Research Paper/ Tool Short Description Technique

Ahmed et al. (2022) [9] A survey about machine learning techniques and datasets being used for software vulnerability detection. ML
The studies surveyed were recent. CNN and RNN models can give better performance than others.
Yosifova et al. (2021) [10] Predicting vulnerability type in Common Vulnerabilities and Exposures (CVE) database with machine ML
learning classifiers. They gave their evaluation of different ML classifiers for the detection of CVE.
Pereira et al. (2020) [11] Uses open-source C/C++ static analysis tools on large projects. They use two static analysis tools and study SC, SA
their applicability to detect data protection vulnerabilities and coding practices vulnerabilities.
Zaharia et al. (2021) [12] CWE pattern Identification using Semantical Clustering of Programming Language Keywords. They use source SC, ML
code to detect the CWEs incorporating the programmers’ code behavior.
Tiantian et al. (2018) [13] A survey of automatic software vulnerability detection, exploitation, and patching techniques. Binary-based SA, DA, ML
techniques are also studied. Nice breakdown of techniques.
Alenezi et al. (2020) [14] Machine Learning approach to predict computer operating systems vulnerabilities. They use five ML methods ML
to predict the vulnerabilities based on CVSS and evaluate them. Random forest seems to be a good classifier.
No neural network models.
Byun et al. (2020) [6] Analysis of software weakness detection of CBMC based on CWE. Evaluate the ability of CBMC to detect the SA
CWEs. Found the tool to be effective on division by zero, conversion error, and buffer overflows.
Saletta et al. (2020) [7] Use NLP in source codes for identifying CWEs. The classification of 13 CWEs was done. Some strictly related SC, ML
CWEs are misclassified for a Java file.
Lipp et al. (2022) [15] Empirical study on the effectiveness of static C code analyzers. They evaluated five open-source and one SC, SA
commercial static C code analyzer. They found that 47%-80% of the real-world vulnerabilities are missed by
them. A combination of static analyzers delivered better performance.
Cruzes et al. (2018) [8] Survey on techniques for detecting Buffer Overflow (BO) vulnerabilities. A comprehensive survey on buffer SC, SA, BA, DA, ML, SE
overflow detection techniques where multiple techniques and tools are categorized and reviewed. The binary
analysis techniques are also discussed and according to their findings, few recommendations are made.
Lin et al. (2020) [16] Survey on deep learning-based approaches for software vulnerability detection. Recent studies adopting deep ML, SC, BA
learning techniques for software vulnerability detection are done with their challenges and weaknesses.
Katherina et al. (2015) [17] Evaluation of static code analysis tools in detecting CWEs and vulnerabilities. The study evaluated three tools
and found no statistically significant difference in their ability to detect security vulnerabilities for C/C++ and Java.
Shoshitaishvili et al. (2016) [18] Comparison of binary analysis techniques and introduction of the angr framework. Many different binary SA, BA, SE
analysis techniques are studied and implemented in their framework, angr. The effectiveness of the techniques was
evaluated. The difficulties of combining many techniques are discussed.
Xue et al. (2020) [19] Study on machine learning-based analysis of program binaries with taxonomy and associated challenges. The ML, BA
paper explores challenges in binary code analysis, discusses various machine learning techniques, and
presents a framework and its application.

but lower false positives. This work is relevant to our evaluation of tools for CWE detection, as both tools are CWE-compatible. However, all these earlier works only studied source code based tools. Instead, our goal in this work is to study and evaluate binary-level CWE detection tools.

Given the popularity of machine-learning techniques, researchers have also explored their use for software vulnerability detection [10,12–14]. These studies involved the classification and prediction of weaknesses and vulnerabilities using machine learning methods. Notably, Zaharia et al. employed the CWE classifier on the intermediate representation of source code to identify security patterns [12]. They demonstrated the effectiveness of machine learning techniques in aiding the detection and characterization of software vulnerabilities. Recent developments also highlight the use of deep learning techniques to understand vulnerable code patterns and semantics. Lin et al. conducted a survey that reviewed the literature on deep learning-based approaches for software vulnerability detection [16]. They examined both the techniques employed and the challenges faced by such techniques for accurately detecting vulnerabilities.

All of these previous works only studied source-code based detection techniques. Instead, our research aims to explore detection techniques, open-source tools, and associated challenges to detect CWEs in compiled language binary programs without access to the source code.

A limited number of previous studies have investigated binary analysis based techniques to uncover vulnerabilities and weaknesses in binary code. Shoshitaishvili et al. conducted a comprehensive comparison of various binary analysis techniques, such as symbolic execution, fuzzing, static analysis, exploit generation, and hardening [18]. They introduced angr, a powerful binary analysis framework capable of performing such analyses and aiding their comparison in a uniform framework. This work also implemented and compared diverse approaches, including fuzzing and static analysis, for identifying and mitigating vulnerabilities. Similar to angr, several other frameworks have been built that implement the building-block algorithms and provide an API to conduct binary-level analysis, including Ghidra [20], Radare [21], IDA Pro [22], BAP [23], and DynInst [24]. However, most of these tools are not implemented with built-in algorithms and techniques to directly perform CWE detection. A comprehensive study by Xue et al. examined machine learning-based analysis of program binaries, providing a taxonomy of techniques along with their associated challenges [19]. Although we did not find surveys that focus specifically on techniques to detect CWEs in program binaries, the aforementioned studies are relevant to our research. Our emphasis lies in studying, collecting, and assessing works and tools, especially those that are open-source, to detect the presence of the most important CWEs that occur in compiled binary code.

2.1. Comparison with existing literature

Our survey paper stands out in the existing literature due to its specific focus on the detection of Common Weakness Enumerations (CWEs) and software vulnerabilities in binaries, a topic that has received limited attention in prior research. While much of the existing literature centers on source-level analysis [7,11,12,15], our work fills a crucial gap by examining these issues specifically at the binary level.

In contrast to related surveys, such as the empirical studies conducted by Lipp et al. on the effectiveness of static C code analyzers [15], our paper is dedicated to the exploration of static techniques for detecting weaknesses and vulnerabilities in binaries. While some surveys cover a broad range of topics and discuss techniques implemented at the source level [25], our paper maintains a clear focus on binary-level techniques and tools for the detection of weaknesses, providing evaluations of the current state of the art.

Furthermore, while several other works focused on specific categories of weaknesses or techniques [8,16], our survey provides a com-


prehensive overview of static analysis techniques for detecting vulnerabilities across various categories in the binary context.

Additionally, while some studies focused solely on binary analysis without delving into weaknesses and vulnerabilities [18], our paper explores the technical aspects of binary analysis and also specifically addresses the detection of weaknesses and vulnerabilities, offering evaluations of current tools.

Our overarching goal is to contribute to the dissemination of knowledge about the current state of research and techniques in binary-level detection of weaknesses and vulnerabilities, filling a critical gap in the existing literature.

3. Methodology

The landscape of CWE detection techniques is diverse, encompassing both well-studied and lesser-known approaches. To explore the current state-of-the-art tools and techniques employed in CWE detection, we conducted an extensive review of research papers and evaluated open-source tools. We describe our research methodology in this section, and display a schematic of our methodology in Fig. 1.

Selection of CWEs. In our study, we primarily focus on compiled languages, i.e., those that generate executable binary files. To identify the most critical weaknesses that could result in significant software vulnerabilities, we refer to the top 25 CWEs, along with an additional set of 15 weaknesses, published by the MITRE Corporation. We then narrow down this set of weaknesses to the top 10 CWEs that are specifically applicable to compiled languages. The list of the top 10 CWEs relevant to compiled binaries is shown in Table 2. The first column shows their rank in the overall CWE database, while the remaining columns display their ID and a short description, respectively.

Selection of relevant research studies. We conducted the following steps to find related research works. First, we identified primary studies chosen based on their direct relevance to the subject matter. This step involved identifying studies that specifically addressed CWEs, their detection techniques, and related works. Next, we selected papers based on their references to related works. This step allows for the broader inclusion of studies that have been cited or referenced in the context of CWE detection.

We used a few other criteria to choose the research works considered in this study. In addition to works on CWE detection in binaries, we also included papers on CVEs (Common Vulnerabilities and Exposures) and vulnerability detection techniques.

Our selection process also prioritized newer techniques, including those incorporating machine learning (ML) and artificial intelligence (AI) approaches. We gave special preference to open-source projects and research studies to facilitate tool evaluation, repeatability, and comparison of results. The selected papers were further categorized into open-source research projects and closed-source projects.

Selection of research tools to study the state-of-the-art. One of our goals in this work is to conduct a fair comparison of the capabilities and performance of the available open-source implementations of techniques to detect CWEs in software binaries. Therefore, for each selected CWE, we attempted to find, build, and evaluate the implementations of proposed mitigation techniques on a set of common standard benchmarks.

Unfortunately, we observed that the number of open-source tools specifically designed for detecting CWEs in binaries was limited. Fur-

Fig. 1. A flow-graph schematic of our research methodology.

Table 2
TOP 10 CWEs for compiled language program binaries.

Rank ID CWE

1 CWE-787 Out-of-bounds Write
4 CWE-20 Improper Input Validation
5 CWE-125 Out-of-bounds Read
7 CWE-416 Use After Free
11 CWE-476 NULL Pointer Dereference
13 CWE-190 Integer Overflow or Wraparound
15 CWE-798 Use of Hard-coded Credentials
19 CWE-119 Improper Restriction of Operations within the Bounds of a Memory Buffer
31 CWE-843 Access of Resource Using Incompatible Type (’Type Confusion’)
36 CWE-401 Missing Release of Memory after Effective Lifetime


thermore, many of these tools were not in working condition due to detect program errors. The taint analysis method may find it hard to ex-
dependency issues, configuration problems, or a lack of maintenance. plore all paths, and some errors might go undetected. This technique de-
The dynamic nature of software, policy changes, and lack of developer livers few false positives but can have false negatives as all paths might
motivation are factors contributing to the usability issues of these tools. not be explored.
Some tools required fulfilling multiple dependencies, making their
installation and usage demanding. Several others made various assump- 4.4. Machine learning
tions about the working environment or about the format of the pro-
grams tested. Despite the challenges, the tools that worked and were Recently, machine learning based techniques have gained much pop-
relevant to the research were shortlisted, built, and used for benchmark- ularity for vulnerability detection and other security tasks [42,43]. Dif-
ing and evaluation. ferent ML models have been used in these studies, from classical ML
Benchmark selection Multiple benchmarks, such as SPEC2017 [26], models, like random forest and SVM, to deep learning models, including
SARD [27], and Juliet [28] test suite, were used to evaluate the per- RNN and transformers [44]. Usually, a vulnerability database is needed
formance of the selected CWE detection tools. All these programs were to train the dataset, which might not always be readily available. ML
compiled with the Clang/LLVM compiler with no optimizations. Some based techniques are often coupled with static analysis to gather the
of the benchmarks, like SARD and Juliet, provide a ground truth regard- datasets from the binary and then trained to get the classification re-
ing errors and vulnerabilities in each program. For others, like the SPEC sults. Training the ML model can also be expensive in terms of time and
benchmarks, we do not have information about any known software compute resources, though inference is typically much faster.
defects. For such cases, and to compare the performance and detection
accuracy of binary-level tools with a source-level approach, we also em- 5. Survey of techniques to detect coding weaknesses in software
ployed a popular and well-regarded source-level CWE detection tool, binaries
called Sonarqube [29].
In this section we present a comprehensive survey of static tech-
4. Analysis techniques niques and tools developed to detect the ten most important coding
weaknesses for binary software. We organize this section according to
Researchers have developed and used different techniques to detect the order of the top coding weaknesses as listed in Table 2.
software vulnerabilities. In this section, we present a brief introduction
to the main methods used by vulnerability detection techniques for soft- 5.1. Buffer overflow
ware binaries. We limit the scope of this work to static analysis based
techniques. We do not discuss dynamic analysis based software weak- Buffer overflow is a notorious memory error that has plagued soft-
ness detection techniques, such as fuzzing [30,31] in this work. Dynamic ware security for decades. Yet, even after much research and effort to
analysis based techniques can produce fewer false positives, but only an- address this issue, errors related to buffer overflows are still the domi-
alyze code traces that are reached during execution with the limited set of provided inputs.

4.1. Static analysis

Static program analysis consists of algorithms and heuristics to extract information about a program without executing it [32]. Static analysis is a popular technique for finding program vulnerabilities. The binary code is statically analyzed to create detailed models of the application's data and control flow. These models are then used for various tasks, from vulnerability detection and race condition detection to performance optimization. Static analysis enables inspection of the entire program code, but is known to produce many false positives and false negatives [33].

4.2. Symbolic execution

Symbolic execution was devised as a way of executing a program abstractly, by replacing normal program inputs with arbitrary symbols that characterize a set of classes of inputs [34]. Symbolic execution can find abstract inputs to explore different paths of the program, which makes this technique useful for vulnerability analysis. It is a popular security technique [35–37]. However, it suffers from a high run-time overhead and the path explosion problem, which restricts its applicability [8]. Research is ongoing to determine ways to avoid the path explosion issue, such as filtering out irrelevant dependencies in symbolic values.

4.3. Taint analysis

Taint analysis is a technique used to identify and monitor values that could be influenced by or derived from plausibly malicious program inputs. Taint analysis can be applied statically [38–40] or dynamically [41]. The tracking of tainted data and their propagation can be used to

nating cause of failures and attacks in binary software. The buffer overflow issue occurs when reading from or writing to memory that exceeds the buffer or memory allocated at that region. This flaw has numerous consequences, like the execution of unauthorized code or commands. Attacks such as denial of service, crashes, resource consumption, and remote code execution are also possible [45]. Buffer overflows can expose sensitive data and cost businesses huge monetary and reputation losses. These weaknesses have been responsible for vulnerabilities in several instances, such as CVE-2021-22991 and CVE-2020-29557.

Buffer overflow is a broad term that encompasses several distinct CWEs. Of these, CWE-787, CWE-125, and CWE-119 are in our list of top 10 CWEs plaguing binary software, as listed in Table 2. Others, like CWE-121 and CWE-122, are not in our top 10 list. Below, we provide further details about the more prominent CWEs related to buffer overflows.

• CWE-787 Out-of-bounds Write: This CWE tops the list of the 25 most dangerous software weaknesses. This error occurs when the software writes data past the end, or before the beginning, of the intended buffer. Usually, this unintended overwrite can result in the corruption of data, program crashes, or incorrect execution of code.
• CWE-125 Out-of-bounds Read: This weakness occurs when the program code reads data past the end, or before the beginning, of the intended buffer. Exploiting this weakness can allow attackers to read sensitive information from other memory locations or cause a program crash. When the excess (out-of-bounds) data is read, it can expose sensitive program data.
• CWE-119 Improper Restriction of Operations within the Bounds of a Memory Buffer: This is a more general weakness category that indicates a memory read or write that is outside the intended buffer boundary.

Static analysis techniques used to detect buffer overflows include program slicing, pointer analysis, and delta debugging. Additionally, some researchers
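All three weaknesses above reduce to a single bounds predicate that a detector must prove or refute for every memory access. The following minimal sketch (our own illustration, not taken from any surveyed tool) states that predicate explicitly:

```python
def access_in_bounds(buf_base: int, buf_size: int, addr: int, access_size: int) -> bool:
    """Check that the access [addr, addr + access_size) lies inside the
    buffer [buf_base, buf_base + buf_size). A violating write is CWE-787,
    a violating read is CWE-125, and CWE-119 covers both."""
    return buf_base <= addr and addr + access_size <= buf_base + buf_size

# A 16-byte buffer at address 0x1000:
assert access_in_bounds(0x1000, 16, 0x1000, 16)      # whole-buffer access is fine
assert not access_in_bounds(0x1000, 16, 0x1010, 1)   # one past the end: out-of-bounds
assert not access_in_bounds(0x1000, 16, 0x0FFF, 1)   # before the beginning: out-of-bounds
```

Static detectors differ mainly in how they derive the `buf_size` and `addr` values for each access, e.g., via value-set analysis or symbolic execution.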
A. Adhikari and P. Kulkarni Cyber Security and Applications 3 (2025) 100061
Table 3
Summary of binary-level studies to detect the Buffer Overflow Weakness.

Research | Open source? | Technique | Evaluation Results | Tools/Libraries Used | Benchmarks | Limitations
Wang et al. (2021) [35] | No | Static analysis, taint analysis, concolic execution | 88% accuracy | None | Juliet and SARD | Synthesized, small benchmarks; poor evaluation on real-world programs; closed-source
Padmanabhuni, Tan (2014) [46] | No | Static analysis, dynamic analysis, ML | 92% recall, 81% precision | Pin, WEKA, CodeSurfer | MIT Lincoln | Small benchmarks; low accuracy; closed-source
Padmanabhuni, Tan (2015) [48] | No | Static analysis, ML | 75% recall, 84% precision | ROSE, WEKA, IDA Pro | MIT Lincoln | Small benchmarks; low accuracy; closed-source
Gao et al. (2020) [50] | No | Taint analysis, static analysis | 94.3% precision, 86.2% recall | None | Self-generated | Small benchmarks; bigger binaries take longer to complete; many false positives; closed-source
Liang et al. (2017) [51] | No | Dynamic taint analysis, data recovery, dynamic instrumentation | Nine out of nine real-world heap overflow programs | QEMU, udis86 | 9 real-world programs | Small benchmarks; closed-source
Xiangkun et al. (2017) [36] | No | Dynamic analysis, taint analysis, symbolic execution, fuzzing | 47 new vulnerabilities | QEMU, Z3 solver | 17 real-world programs | Small benchmarks; misses true vulnerabilities; slow; huge trace size; closed-source
Dahl et al. (2020) [52] | Yes | RNN, static analysis | CCR of 99% | web scraper | Self-generated | Restricted to small benchmarks; cannot analyze bigger programs
Gotovchits et al. (2018) [53] | No | Static analysis, static and dynamic taint analysis | Five zero-day errors | BAP, μflux | COTS, Coreutils | Small benchmarks; no comparison with other similar tools
Xu et al. (2022) [54] | Yes | Symbolic execution, dynamic analysis | 22 out of 29 program errors | angr, radare2 | 24 CTF and 5 CVE programs | Tool installation issues; limited maintenance; path explosion
Valgrind [55] | Yes | Dynamic analysis and instrumentation, JIT | – | Memcheck | – | Limited support for stack and static array overflow; slow
CWE_checker [56] | Yes | Static analysis, symbolic execution | – | Ghidra, BAP | – | Many false positives and false negatives; slow
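Many of the studies in Table 3 pair taint analysis (Section 4.3) with other techniques. As a toy illustration of the core idea — not the implementation of any listed tool — taint can be propagated forward over a simplified instruction trace and flagged when a tainted value reaches a security-sensitive sink, such as the length argument of a copy operation:

```python
# Toy forward taint propagation over a simplified instruction trace.
# Each entry is (op, destination, sources); "input" marks attacker-controlled
# data and "memcpy_len" marks a sink where a tainted length is dangerous.
trace = [
    ("input",      "r1", []),             # r1 <- attacker-controlled input
    ("mov",        "r2", ["r1"]),         # taint propagates r1 -> r2
    ("add",        "r3", ["r2", "r4"]),   # taint propagates r2 -> r3
    ("memcpy_len", None, ["r3"]),         # r3 used as a copy length: sink
]

def find_tainted_sinks(trace):
    tainted, alerts = set(), []
    for i, (op, dst, srcs) in enumerate(trace):
        if op == "input":
            tainted.add(dst)                       # introduce taint
        elif op == "memcpy_len":
            if any(s in tainted for s in srcs):
                alerts.append(i)                   # tainted length reaches sink
        elif any(s in tainted for s in srcs):
            tainted.add(dst)                       # propagate taint
    return alerts

assert find_tainted_sinks(trace) == [3]
```

Real binary-level taint engines additionally handle memory aliasing, sanitization (which removes taint), and indirect control flow, which is where most of their imprecision arises.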
combine multiple techniques to increase their effectiveness. There has been less research on detecting buffer overflow issues for software binaries compared to source-level techniques. In this section, we provide a summary of the binary-level research efforts on this topic, which is also presented in Table 3.

Wang et al. (2021) [35] developed a new tool, named BOF-Sanitizer, to locate buffer overflows. They combined a metric and a ranking to find the locations where a potential buffer overflow could exist. Taint analysis is then used to find the vulnerable input parameters, which are symbolized by dynamic symbolic execution and sent to a detection engine and a custom memory model where the buffer overflow is detected. Their approach achieved 88% accuracy on small benchmarks in the SARD and Juliet test suites. They also claimed to find 91 out of 100 vulnerabilities in some other real-world programs.

Padmanabhuni and Tan [46] proposed a vulnerability prediction technique that identifies potentially vulnerable program constructs during program analysis and derives buffer usage patterns from code attributes extracted from those constructs. Then, machine learning methods were used to predict the buffer overflow. They performed their program analysis to accurately model an instruction's semantics using ROSE, a binary analysis framework, and off-the-shelf tools like WEKA [47] and IDA Pro [22]. The same authors also combined static and dynamic techniques to identify buffer overflows [48]. They automatically extracted code attributes from C/C++ programs and used the Pin tool [49] to perform the dynamic analysis and the WEKA data mining tool to train the vulnerability prediction models.

Gao et al. (2020) [50] detected buffer overflows based on abnormal program execution. They took instances of successful and abnormal executions where a group of input data is passed to a program. They treated the memory recovered from successful executions as a buffer boundary and judged whether the boundary overflowed in abnormal executions.

Some researchers focus on specific sub-categories of buffer overflow. Xiangkun et al. (2017) [36] proposed a heap overflow detection tool called HOTracer, which models heap overflows as spatial inconsistencies between heap allocation and heap access operations. They performed analysis on program traces, recognized the heap allocation and heap access operation pairs, and checked whether there are spatial inconsistencies that reveal a potential vulnerability. They tested their tool on software programs like KMPlayer, VLC, iTunes, etc.

Researchers developed a tool, named HCSIFTER (2017) [51], to detect heap overflows through dynamic execution without the need for source code. The tool detected five of the nine overflows in their tested programs. The tool can also assess programs for exploitability by executing the program binary and analyzing the crash points and exploit points.

Dahl et al. (2020) [52] proposed a stack-based buffer overflow detection method using recurrent neural networks. They treated the assembly code as a natural language and processed it using recurrent neural networks based on long short-term memory cells. The dataset/benchmark is self-generated and may not represent real-world data. The reported Correct Classification Rate (CCR) was high (99%), but the dataset comprised small functions that resembled the SARD benchmarks.

Baradan et al. (2022) [37] proposed a unit-based symbolic execution method for detecting four classes of memory corruption vulnerabilities in executable code. The units are small pieces of program code that might contain vulnerable statements, and they are statically identified. Each unit is then subjected to symbolic execution to calculate the path and vulnerability constraints of each statement. Solving these constraints reveals vulnerabilities, if any.

Gotovchits et al. (2018) [53] proposed a tool, named Saluki, for detecting taint-based security properties in binary code. The tool tries to find different CWE types, like missing sanitization checks, command injection, or missing checks on buffer lengths. It used μflux, a context- and path-sensitive analysis technique, to recover data dependence facts in binaries
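HOTracer's core idea — a heap access is an overflow if it is spatially inconsistent with its matching allocation — can be sketched as a check over (allocation, access) pairs recovered from a trace. This is our simplification of the paper's trace analysis, with hypothetical names:

```python
def spatially_inconsistent(alloc_size: int, access_offset: int, access_size: int) -> bool:
    """A heap access covering [access_offset, access_offset + access_size)
    is inconsistent with its chunk of alloc_size bytes if it starts before
    the chunk or extends past the chunk's end (a heap overflow)."""
    return access_offset < 0 or access_offset + access_size > alloc_size

# A 64-byte chunk:
assert not spatially_inconsistent(alloc_size=64, access_offset=0, access_size=64)
assert spatially_inconsistent(alloc_size=64, access_offset=56, access_size=16)  # 8 bytes past the end
```

The hard part in practice is the pairing itself: matching each heap access in a binary trace back to the allocation that created the object it touches.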
and applied a sound logic system for reasoning over these facts, i.e., checking whether they satisfy a security property. Some of the CWEs it tries to find include CWE-252, CWE-89, CWE-337/676, CWE-120, and CWE-78.

Xu et al. (2022) [54] attempted to find stack buffer overflow vulnerabilities and generate an exploit. Their BofAEG tool used symbolic execution and dynamic analysis to detect vulnerabilities; 22 out of 24 vulnerabilities were found in their self-collected programs.

BAP is a binary analysis platform developed at Carnegie Mellon University that enables the analysis of binary programs [23]. It includes various analyses, microexecution interpreters, standard interpreters, and a symbolic execution engine. There is a BAP toolkit repository with different tools that perform different checks, including buffer overflow detection. Although relatively easy to use, it only detects heap overflows.

CWE_checker is a large suite of binary-level tools that can detect multiple classes of errors in program binaries [56]. The CWE_checker uses Ghidra to disassemble binaries into one common intermediate representation (IR) and then implements its analyses on this IR. CWE_checker has implemented checks for many different bugs in program binaries, including those related to buffer overflows, such as CWE-119, CWE-125, and CWE-787.

Valgrind [55] is a popular run-time framework that provides several binary-level debugging and profiling tools. Memcheck is one Valgrind tool that can help find memory leaks in program binaries during execution.

5.2. CWE-20 – improper input validation

CWE-20 is caused when the program input is not validated or is incorrectly validated. Proper input validation requires the supplied input to be checked to determine if it is valid and conforms to the program's expectations. The absence of input validation can result in severe exploits, including buffer overflow and resource consumption attacks. This CWE has been linked to many CVEs, including CVE-2021-22205 and CVE-2008-3477. Researchers have claimed that developers often overlook this issue due to inadequate knowledge and training, even though it is relatively easy to detect and fix [57].

We found that the detection and fixing of this weakness is typically performed at the source code level. Unfortunately, we did not find any dedicated binary analysis tools designed for the detection of CWE-20.

5.3. CWE-416 – use after free

The CWE-416 error occurs when the program tries to reference memory that has already been freed. Thus, there are three things the program must do to trigger this error: (a) allocate heap memory, (b) free the memory, and (c) access the freed heap memory again. This error can have multiple consequences, like corruption of valid data, crashes, execution of arbitrary code, denial of service, execution of unauthorized code or commands, etc. This weakness is also called the dangling pointer error. Some CVEs associated with this CWE are CVE-2020-6819 and CVE-2021-0920.

Zhang et al. [58] presented a multi-level directed greybox fuzzing tool, called MDFuzz, to detect use-after-free errors by covering only specific heap operations. Although this is a fuzzing-based technique, it utilizes static analysis to automatically recognize three critical targets related to heap operations: allocating heap memory, freeing memory, and accessing the heap memory. It then improves the directed fuzzing process by using a novel seed selection strategy and a probability-based multi-level seed queue. The tool was evaluated on 7 real-world applications.

Zhu et al. [59] developed the UAFDetector tool that also combines static analysis techniques with dynamic mechanisms. This paper focused on improving the CFG construction with the help of dynamic binary instrumentation techniques to resolve indirect jumps. The technique performs alias analysis and pointer tracking. They use IDA Pro and BinNavi for building their tools. This tool is evaluated on the Juliet benchmarks and real-world programs with known vulnerabilities. The evaluation found a 2.39% false negative rate with the Juliet benchmarks, and 5 out of 6 real-world cases were detected.

GUEB [60] is a static analyzer to perform use-after-free detection on binaries. It uses value-set analysis and tracks pointers and states of the heap objects. The program sub-graph is extracted when GUEB detects the use of a freed pointer. This tool also uses IDA Pro and BinNavi to perform its analysis. Large binaries cannot be analyzed using this tool.

Yan et al. [61] introduced a static UAF detector called Tac that utilized machine learning to bridge the gap between typestate and pointer analyses. They utilized support vector machines to learn the correlation between program features and UAF-related aliases. They tried to find the true UAF bugs with reduced false positives by removing imprecise aliases using machine learning. They used program slicing and performed a path-sensitive typestate analysis in addition to machine learning to get the desired output.

In addition, the open-source tools we used earlier, cwe_checker [56] and BAP [23], also detect use-after-free errors in binary code. Both these tools have a high false positive rate.

Many other execution-time tools have also been built to detect use-after-free errors in binary code. Such tools include dynamic fuzzers, like UAFUZZ [62], and instrumentation frameworks, like Valgrind [55]. Memcheck is one of the tools that employs Valgrind to find several memory leaks in C and C++ program binaries. These are out-of-scope for this paper.

5.4. CWE-476 – null pointer dereference

A NULL pointer dereference occurs when the application dereferences a pointer that it expects to be valid, but is NULL, typically causing a crash or exit. When a program tries to dereference a null pointer, it accesses memory at an invalid address, which typically can lead to unexpected behavior, crashes, and security vulnerabilities. It can be difficult to detect and fix this error in large programs. This weakness can be exploited to cause serious attacks that include the ability to bypass security logic, make the program reveal debugging information, or cause abnormal program crashes and other DoS attacks [63]. Some vulnerabilities associated with this CWE include CVE-2020-6078 and CVE-2020-29652.

Static and dynamic taint analysis, symbolic execution, fuzzing, data flow and control flow analysis, and dynamic binary instrumentation techniques are some popular mechanisms to address CWE-476 in recent research. Again, we only focus on presenting binary-level static analysis techniques in this work.

The CWE_checker tool can find this error for cases where a pointer is explicitly set to NULL before the pointer is used in the function [56]. However, we found that this tool does not yet find the null pointer dereference if a NULL parameter sent to a function is dereferenced.

A tool called NPDHunter uses an intra-procedural pointer and taint analysis based approach to detect null pointer dereferences in binary code [64]. This work uses an improved pointer aliasing analysis to categorize and identify untrusted source cases, and then performs taint-style vulnerability testing to detect whether the data from an untrusted source propagates to a sensitive sink without proper sanitization. The authors note that static detection methods on binaries, including their own, are limited due to challenges caused by complex code structures including loops, indirect calls and jumps, etc. Even so, their technique achieved no false negatives for benchmarks in the Juliet suite. However, there were a large number of false positives. They also evaluated the cwe_checker tool and reported that cwe_checker had 70% false positives for this check with the same benchmarks. Unfortunately, this tool is not available as open-source, and our attempts to contact the authors were unsuccessful. They use BAP for IR generation.

Tobias et al. developed a tool, called TEEREX, that uses a combination of symbolic execution and static analysis to identify potential sources of
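The three-step use-after-free trigger (allocate, free, access) is exactly a typestate property, which is why typestate analysis, as used by Tac, fits this weakness. A minimal sketch of our own, checking the typestate of each pointer over an event trace:

```python
# Minimal use-after-free typestate check over a (event, pointer) trace.
# Each pointer moves through states: allocated -> freed; a "use" or a second
# "free" in the freed state is reported (use-after-free / double-free).
def check_uaf(events):
    state, reports = {}, []
    for i, (event, ptr) in enumerate(events):
        if event == "alloc":
            state[ptr] = "allocated"
        elif event == "free":
            if state.get(ptr) == "freed":
                reports.append(("double-free", i))
            state[ptr] = "freed"
        elif event == "use" and state.get(ptr) == "freed":
            reports.append(("use-after-free", i))
    return reports

trace = [("alloc", "p"), ("use", "p"), ("free", "p"), ("use", "p")]
assert check_uaf(trace) == [("use-after-free", 3)]
```

Static tools must approximate this automaton over all paths and all aliases of each pointer, which is the main source of the false positives reported above.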
null pointer dereference errors in SGX enclaves in binary code; later, run-time instrumentation is used to monitor the execution to detect any actual null pointer accesses [65]. The authors showed that null pointer dereferences can be used to cause memory corruption and compromise the security of the enclave.

Gotovchits et al. proposed a taint-style tool, Saluki, for statically checking security properties and detecting different vulnerabilities [53]. They combined static analysis with taint analysis to perform path-sensitive and context-sensitive recovery of the data dependence facts in binaries. They then checked if the data dependence facts adhere to their rules to report potential vulnerabilities. They applied their tool to five real-world applications and the ARM coreutils binaries. They can check for numerous potential errors, including "Unchecked Return Values". This error can be associated with the null pointer dereferencing error, as denoted by CWE-690. They also use BAP for IR generation. BAP and CWE_checker also provide tools to detect CWE-476.

We have found that standalone static analysis techniques are generally not used to detect this error. Static techniques are often coupled with other techniques such as taint analysis, dynamic analysis, or fuzzing. For instance, Vishnyakov et al. presented an approach that uses symbolic execution, dynamic analysis, and hybrid fuzzing to detect various real-world software flaws, including null pointer dereferences [66]. They implemented their hybrid fuzzing tool by combining Sydr [67] with libFuzzer [68] and AFL++ [69]. They used slicing to improve symbolic execution. This work used their self-created OSS-Sydr-Fuzz repository and showed that their approach achieved higher coverage than other related tools.

5.5. CWE-190 – integer overflow or wraparound

Integer overflow is one of the most common types of software vulnerabilities. It occurs when a calculation or operation results in a value that is outside the range of values that can be stored in an integer data type. When this occurs, the value can wrap around to become a very small or negative number. These integer overflows can cause the program to use incorrect numbers and respond in unintended ways. For instance, if the malformed value generated by an integer overflow is used to determine how much memory to allocate, it will cause a buffer overflow, which is known as the Integer Overflow to Buffer Overflow (IO2BO) vulnerability [70]. Other attacks that are possible by exploiting these weaknesses include denial of service, program crashes, resource consumption issues, and arbitrary code execution. CVE-2018-10887 and CVE-2019-1010006 are examples of actual vulnerabilities that were caused by CWE-190 Integer Overflow. Below, we review static approaches devised to detect this code weakness.

Wang et al. presented a tool, called IntScope, that can automatically detect integer overflow vulnerabilities in x86 binaries [71]. It lifts the disassembled code into its intermediate representation (IR), performs a path-sensitive data flow analysis, and identifies the vulnerable points for the integer overflow using symbolic execution and taint analysis. Their mechanism used various tools like IDA Pro [22], Bestar [72], GiNaC [73], and STP [74]. This tool was evaluated on two Microsoft programs and was successful in detecting all known vulnerabilities. Additionally, it found more than 20 zero-day integer overflows in these programs, along with several false positives.

Muntean et al. built a tool, named INTREPAIR, to detect and fix integer overflows in software binaries [75]. This technique employs symbolic execution. The tool only focuses on program paths that are fault-prone, like assignments or multiplications. They conducted their evaluation on the Juliet test suite and another synthesized benchmark set of 50 programs. This tool successfully detected all actual overflows, but also produced many false positives (the actual number is not reported).

Huang et al. proposed a hybrid method to detect integer overflow errors [76]. They used static taint analysis to find the program points to instrument. The instrumented test code at each use and def site checks for the overflow. The delayed runtime test minimizes the false positives.

Zhang et al. also proposed a hybrid method that combines symbolic execution, static analysis, and dynamic taint analysis to detect integer overflow to buffer overflow vulnerabilities in program binaries [77]. They used the Juliet test suite to evaluate their approach and found zero false positives and zero false negatives.

5.6. CWE-798 – use of hardcoded credentials

This weakness occurs when the developer uses hard-coded credentials, such as passwords or cryptographic keys, for sensitive purposes, for both inbound and outbound variants of authentication. This weakness can allow an attacker to bypass authentication. A simple string search can sometimes reveal the hard-coded credentials in the binary. This weakness can enable attacks such as gaining unintended privileges and execution of unauthorized code or commands. This CWE has been associated with real-world vulnerabilities, including CVE-2022-30314 and CVE-2010-2772.

This error can be mitigated by storing the passwords, keys, and other credentials outside of the code in a strongly protected and encrypted configuration file or a restricted database. Access control should be limited in the case of hard-coded credentials.

Binary analysis tools such as IDA Pro [22], Ghidra [20], Radare [21], and Angr [18] can be utilized to detect possible strings that reveal hard-coded credentials in the binary. Various debuggers can also be used to detect the hardcoded credentials. But both these approaches need some manual work.

There are not a lot of research papers or released tools that propose techniques to detect the presence of hard-coded credentials in binary software. The BAP binary-level tool claims to detect this weakness [23].

Source-code-level SAST tools, like Sonarqube [29], Veracode [78], and Checkmarx [79], use static analysis techniques to detect hard-coded values in the program. GitGuardian [80] is another source-level approach that detects hard-coded secrets in code repositories and performs real-time monitoring to detect secrets in every new commit.

5.7. CWE-843 – access of resource using incompatible type

CWE-843, also called Type Confusion, occurs when a program initializes a resource, such as a pointer, object, or variable, using one data type but later accesses it with an incompatible type. This issue can potentially cause other logical errors, as the resource may not have the expected properties, and can also result in out-of-bounds memory accesses. This bug has caused several real-world vulnerabilities, such as CVE-2010-4577 and CVE-2011-0611.

Accurate type detection for binaries, especially after stripping, is challenging due to factors such as vanished type casting operators, missing class information, and unknown runtime type information. Consequently, most previous research to detect this error assumes access to the program source code. For instance, Haller et al. proposed TypeSan, a type confusion detection tool that extends the LLVM compiler [81]. TypeSan identifies invalid casts by instrumenting the code to monitor object allocations and potentially unsafe casts. Similarly, tools like EffectiveSan [82], Htade [83], and HexType [84] also rely on compilers to detect this weakness.

Kim et al. introduced a hybrid tool called BinTyper, which combines static and dynamic analysis to detect type confusion in C++ binaries [85]. Through static analysis, BinTyper recovers the class hierarchy and layout of the binary. It then uses dynamic analysis to identify type confusion when an application accesses a member variable of a polymorphic object. BinTyper was evaluated with Google PDFium and successfully detected some type confusion bugs. However, this method is limited to detecting errors when objects access polymorphic objects, and the coverage is restricted to executed code. This is the only binary-level approach we found to detect CWE-843, and this tool is not open-source.
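The IO2BO pattern described in Section 5.5 can be reproduced with 32-bit modular arithmetic: a size computation wraps around to a small value, the allocation is undersized, and the subsequent copy overflows it. A sketch of our own, masking Python integers to mimic a C `uint32_t`:

```python
MASK32 = 0xFFFFFFFF

def u32(x: int) -> int:
    """Mimic C uint32_t wraparound (CWE-190) semantics."""
    return x & MASK32

# Attacker supplies a huge element count; size = count * 4 wraps around.
count = 0x40000001
alloc_size = u32(count * 4)        # 0x100000004 wraps to 4, not ~4 GiB
assert alloc_size == 4

# The copy loop still iterates 'count' times, writing 4 bytes per element,
# so it writes far past the tiny 4-byte allocation:
bytes_written = count * 4
assert bytes_written > alloc_size  # IO2BO: undersized buffer is overflowed
```

Static detectors such as IntScope look for exactly this flow: an arithmetic result that may wrap, reaching an allocation site without an intervening range check.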
5.8. CWE-401 – missing release of memory after effective lifetime

This weakness, also called a Memory Leak, occurs when allocated memory is not released after it has been used, which slowly consumes the remaining memory. This error is often caused by improper handling of malformed data or unexpectedly interrupted sessions. It can also be caused by confusion over which part of the program is responsible for freeing the memory. This error can cause denial of service and excessive resource consumption (both CPU and memory). Additionally, this issue can be hard to detect and fix quickly, since the effect can take some time to show itself. This CWE has been identified as the underlying cause of several vulnerabilities, like CVE-2005-3119 and CVE-2022-38177.

We found that most techniques to detect memory leaks operate on the source code, and many have a dynamic component. We did not find any approach that is both binary-level and uses only static analysis. Binary-level techniques to detect this error often conduct static analysis to identify and insert instrumentation points, which then monitor the code at run-time to detect or prevent leaks.

Andrzejak et al. introduced an intriguing machine learning approach to detect memory leaks [86]. This is a source-level technique where they instrument the malloc and free calls in C/C++ programs to gather data on allocated memory fragments, their lifetimes, and sizes to compute feature vectors. These properties were then used to train a machine learning classifier to detect memory leaks.

A number of approaches instrument program binaries to detect memory leaks during program execution. For example, Trishul et al. presented a tool named SWAT that instruments the program binary to trace memory allocation and free requests [87]. The profiling is used to construct a heap model and to monitor loads/stores to allocated objects with low overhead. They monitor the staleness of each object and check if relevant instructions have been executed to predict memory leaks.

In another work, Koizumi et al. presented the BIGLeak algorithm that performs dynamic binary analysis to group objects based on their allocation context and monitors each group's size using I/O-based snapshots [88]. Their detection algorithm incorporates intermittency analysis, enabling the rapid identification of both low- and high-risk leaks. When combined with dynamic binary analysis using context-aware execution sampling, they claim to achieve low run-time overheads. They demonstrated nearly 100% precision in detecting leaks on real-world software.

Popular binary instrumentation frameworks, like Valgrind and DynamoRIO, also provide tools, called Memcheck [55] and Dr. Memory [89], respectively, to detect memory leaks. Other tools to detect this error include Electric Fence [90], mtrace [91], PurifyPlus [92], and Deleaker [93].

6. Evaluation of open-source tools

In the previous section, we surveyed the static techniques and tools that were developed to detect the most common weaknesses in binary software. Along with publishing their work, it is now not uncommon for researchers to also release an open-source version of their implementation on platforms such as GitHub. To better understand and evaluate the performance of these techniques on a common set of benchmark programs, we attempted to find, build, and test each technique implementation, if available, on our benchmark set that was previously described in Section 3. We also contacted the authors if we encountered issues during this process.

Of the ten most common binary-level weaknesses we survey in this paper, we do not find any open-source implementation of techniques to detect the improper input validation (CWE-20) and access of resource using incompatible type (CWE-843) software errors. Therefore, we do not include them in this section. In the remainder of this section, we report our findings on the availability, status, and performance of the open-source implementations of the techniques for the other common software errors.

6.1. Buffer overflow detection

In this section, we report our findings on the availability and ability of open-source tools for buffer overflow detection for software binaries. As mentioned previously, the top CWEs that correspond to buffer overflow detection include CWE-787, CWE-125, and CWE-119.

We found that many of the released tools for buffer overflow detection did not work as documented or as expected. Dahl et al. (2020) published an open-source implementation of their published work [52]. The released software scripts were designed to compile datasets of functions with potential vulnerabilities, which were then fed into an RNN model for classification. We found that the underlying code and workflow were relatively easy to understand. Likewise, the provided datasets and results were readily accessible. However, when we attempted to utilize our independent benchmarks, we encountered many difficulties. We discovered that the programs required source-level modifications, which went against our primary focus on analyzing unmodified binaries. For instance, functions without arguments couldn't be used. Also, analyzing large programs was deemed impractical according to the authors.

Baradan et al. have released an open-source implementation of their symbolic execution based technique [37]. This tool required an installation of the angr framework, which we installed. However, this tool too had limitations when source code was unavailable. Most notably, functions with default constant data or a void datatype were not analyzed and needed to be modified for evaluation. This prevented us from using this tool to assess other benchmarks, as it stalled during the symbolic execution step.

Another open-source tool, released by Xu et al., appeared to have limited maintenance, as the associated GitHub page was inactive [54]. Not surprisingly, this software required non-trivial installation steps. We encountered challenges due to dependency issues, necessitating the downgrading of angr and other dependencies to specific library versions. Even after sustained effort, we were unable to successfully execute this tool.

BAP and CWE_checker are the only binary-level static buffer overflow detection open-source tools that worked for us. Additionally, for comparison of binary-level and source-level approaches to buffer overflow detection, we used a popular source-code analysis tool named Sonarqube [29]. Sonarqube is a commercial tool that also provides a free cloud-based service.

The results of our evaluation on the SARD (SARD-88 and SARD-89), Juliet, and SPEC benchmarks are shown in Table 4. The SARD and Juliet suites contain small programs with ground-truth results. Programs in the SARD benchmark are available in different categories based on their size and type (bad and benign; min, med, and large). The Juliet Test Suite has over 3000 programs. SPEC benchmarks are larger real-world programs, but do not provide a ground truth. Therefore, we use the results from the source-level Sonarqube as the baseline results for the SPEC benchmarks.

Thus, our results reveal that both binary-level tools offer poor buffer overflow detection accuracy, with many false positives and false negatives, even for the small SARD and Juliet benchmarks. Surprisingly, even the source code analyzer, Sonarqube, is not able to detect all of the weaknesses, performing just slightly better than the binary tools in some cases. BAP shows decent performance for heap overflow detection, but it does not support buffer overflow detection in other memory regions. Overall, we find that there is a scarcity of reliable and effective open-source buffer overflow detection tools for binary analysis. Our evaluation revealed significant limitations and challenges associated with the available open-source tools. The tools exhibit poor accuracy, limited applicability to real-world scenarios, and challenges in installation and maintenance.

9
A. Adhikari and P. Kulkarni Cyber Security and Applications 3 (2025) 100061

Table 4
Evaluation of open-source tools to detect Buffer Overflows.

                  Ground   CWE_checker         BAP                Sonarqube
Benchmarks        Truth    TP    FP    FN      TP    FP   FN      TP    FP   FN
SARD 88_bad       14       2     2     10      0     0    14      2     0    12
SARD 88_benign    14       12    2     2       0     0    14      12    2    0
SARD 89_min       291      1     0     290     0     0    291     183   0    108
SARD 89_med       291      1     0     290     0     0    291     183   0    108
SARD 89_large     291      9     0     282     0     0    291     182   0    109
SARD 89_benign    291      291   0     0       0     0    291     291   0    0
Juliet Stack      3198     331   573   2867    0     0    3198    561   0    2637
Juliet Heap       3870     1470  3349  2400    517   21   3353    535   0    3335
SPEC              399∗     1     160   398     7     0    392     399∗  0    0
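As a concrete illustration of the defect class measured in Table 4, the hypothetical C fragment below contains the kind of unchecked copy that both the binary-level tools and Sonarqube attempt to flag (all function and buffer names are our own, not taken from any benchmark):

```c
#include <stdio.h>
#include <string.h>

#define NAME_LEN 16

/* CWE-121 (stack-based buffer overflow): strcpy performs no bounds
 * check, so any input of NAME_LEN bytes or more overruns `buf`. */
void copy_name_unsafe(const char *input) {
    char buf[NAME_LEN];
    strcpy(buf, input);            /* a detector should flag this line */
    printf("%s\n", buf);
}

/* The patched form that analyzers treat as benign: the copy is
 * truncated to the destination size and explicitly NUL-terminated. */
void copy_name_safe(char *dst, size_t dst_len, const char *input) {
    strncpy(dst, input, dst_len - 1);
    dst[dst_len - 1] = '\0';
}
```

Statically proving that the `strcpy` call is unsafe requires the analyzer to bound the length of `input` at every call site, which is exactly the information that is hard to recover from a stripped binary.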

Table 5
Evaluation of open-source tools to detect Use After Free defect.

                  Ground   CWE_checker      BAP             Sonarqube
Benchmarks        Truth    TP   FP   FN     TP   FP   FN    TP    FP   FN
Juliet Test Suite 394      17   18   377    96   42   298   333   0    61
SPEC              10∗      0    2    10     1    306  9     10∗   0    0
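Table 5 summarizes detection of CWE-416. A minimal, hypothetical C instance of the defect follows; the dangling dereference is the pattern a static detector must connect back to the earlier free():

```c
#include <stdlib.h>
#include <string.h>

/* CWE-416 (use after free): `p` becomes dangling at free(), but the
 * code reads through it afterwards. Statically, a detector must prove
 * that the freed pointer and the later load alias the same allocation. */
int use_after_free_bug(void) {
    char *p = malloc(32);
    if (!p) return -1;
    strcpy(p, "session");
    free(p);
    return p[0];                  /* a detector should flag this read */
}

/* Benign variant: the pointer is nulled at free time and every later
 * use is guarded, which is the idiom that avoids the weakness. */
int use_after_free_fixed(void) {
    char *p = malloc(32);
    if (!p) return -1;
    strcpy(p, "session");
    free(p);
    p = NULL;
    return p ? p[0] : 0;
}
```

At the binary level the two variants differ only by a store of zero and a compare, which hints at why the evaluated tools report so many false positives and false negatives for this class.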

6.2. CWE-416 – use after free

In this section, we review and test open-source binary-level static analysis based tools to detect use-after-free errors.

The GUEB tool developed by Josselin Feist [94] is one such tool that is available on GitHub. However, it has not been actively maintained for a long time. We found that the tool's installation process involves numerous dependencies, making it challenging to use for evaluation purposes.

We again use BAP and cwe_checker to determine the accuracy of existing state-of-the-art tools to detect UAF errors. We did not find any other open-source binary-level tools to detect UAF that worked for us. The Juliet test-suite includes many UAF benchmarks. Additionally, we also use the bigger SPEC 2017 benchmarks to conduct our evaluation. We use the Sonarqube source-level SAST tool to compare the performance of the binary-level tools, and use the Sonarqube results as the baseline for the SPEC benchmarks.

Our evaluation results are displayed in Table 5. We can see that while Sonarqube, with access to the source code, performs well, it is not fully accurate in detecting use-after-free vulnerabilities. The BAP tool demonstrated better accuracy than cwe_checker in detecting UAF vulnerabilities in the Juliet benchmarks. The cwe_checker tool only detected 17 out of 394 total cases in the Juliet test suite. Additionally, the false positives were notably higher for the BAP tool, while the false negatives were significantly higher for cwe_checker.

In summary, our evaluation findings indicate that the accuracy of all the analyzed tools is limited when assessing use-after-free vulnerabilities in both small and large benchmarks. False positives and false negatives are prominent, highlighting the need for improved algorithms and tools to increase the accuracy and effectiveness of UAF detection.

6.3. CWE-476 – null pointer dereference

In this section, we review and test open-source binary-level static analysis based tools to detect null pointer dereferences. We found that most researchers did not make their work public or maintain their repositories. Therefore, again, the BAP toolkit and cwe_checker were the only tools available to evaluate NPD detection for binaries. We again use Sonarqube to compare the results from the binary-level tools, and use the Sonarqube results as the ground truth for the SPEC benchmarks.

Table 6 shows the results of our evaluation of the binary-level open-source tools for NPD detection on program binaries. For the Juliet test suite, cwe_checker detected 186 out of 306 NPD cases, while the BAP toolkit outperformed all other tools by detecting 240 cases. However, both tools had a significant number of false positives, with around half of the cases reported by cwe_checker being false positives. The BAP toolkit exhibited a lower false positive count of 80. Both cwe_checker and the BAP toolkit achieved the same detection rate of 13 out of 200 NPD cases for the larger SPEC benchmarks. However, cwe_checker had a significantly higher false positive rate.

Interestingly, the source-code analyzer, Sonarqube, did not perform well in NPD detection, with only about half of the cases correctly identified. Thus, for NPD detection, the binary analysis based tools outperformed our source-code based tool in terms of detection accuracy.

6.4. CWE-190 – integer overflow or wraparound

CWE-190, Integer Overflow, is a widely recognized issue with numerous detection techniques proposed. Yet, the error is challenging to identify statically, as it can be deeply embedded within a program and may only become evident under specific input conditions. We found CWE_checker to be the sole open-source tool that uses static techniques to detect integer overflows in software binaries.

Our evaluation results for CWE_checker are displayed in Table 7. The evaluation was conducted using the Juliet test suite benchmark. Almost 4000 cases were analyzed; of these, only 252 were successfully detected. Interestingly, Sonarqube only detected 100 of these instances of CWE-190. Thus, the prevalence of a large number of false negatives, even in the small Juliet programs, suggests that existing approaches may not be sufficiently accurate for effectively identifying this issue.

6.5. CWE-798 – use of hardcoded credentials

Although both Sonarqube and BAP claim to detect this error, both tools were unable to detect any cases in our set of Juliet and SPEC benchmarks. We found that BAP only checks for hard-coded socket addresses, and does not support the detection of hard-coded passwords. One likely reason for the inability of existing tools to detect this error is that hard-coded passwords are stored as ordinary character strings: an analysis that looks for such strings can flag every other string as a potential credential, resulting in a large number of false positives.
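The detection difficulty for CWE-798 discussed in Section 6.5 can be seen in the following hypothetical C fragment (all names and string values here are invented for illustration): once compiled, the embedded password is just another string constant in the binary's data section.

```c
#include <string.h>

/* CWE-798 (use of hard-coded credentials): the password is embedded
 * as a string literal, so it ships inside every copy of the binary.
 * To a binary-level detector this literal looks just like the benign
 * greeting below, which is why naive string matching floods reports
 * with false positives. */
static const char *GREETING = "welcome to the service";
static const char *ADMIN_PASSWORD = "s3cr3t!";    /* should be flagged */

const char *greeting(void) {
    return GREETING;                /* ordinary, harmless string use */
}

int check_password(const char *attempt) {
    return strcmp(attempt, ADMIN_PASSWORD) == 0;  /* hard-coded check */
}
```

A more precise detector would need to track how each string constant is used (for example, whether it flows into a comparison that gates authentication), rather than pattern-matching on the constants alone.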


Table 6
Evaluation of open-source tools to detect Null Pointer Dereference defect.

                  Ground   Cwe_checker      BAP             Sonarqube
Benchmarks        Truth    TP   FP   FN     TP   FP   FN    TP    FP   FN
Juliet Test Suite 306      186  147  120    240  80   66    180   0    126
SPEC              200∗     13   228  187    13   19   187   200∗  0    0
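The defect class behind Table 6 can be reduced to a few lines of C. In this hypothetical sketch, the detector must track the possibly-NULL allocation result along every path to the first dereference:

```c
#include <stdlib.h>

/* CWE-476 (null pointer dereference): malloc may return NULL, and the
 * store below happens before any check. A path-sensitive detector must
 * propagate the possibly-NULL value from the allocation to the write. */
int store_unchecked(int value) {
    int *slot = malloc(sizeof *slot);
    *slot = value;                /* flagged: `slot` may be NULL here */
    int out = *slot;
    free(slot);
    return out;
}

/* The checked variant that detectors treat as safe. */
int store_checked(int value) {
    int *slot = malloc(sizeof *slot);
    if (slot == NULL)
        return -1;                /* allocation failure handled */
    *slot = value;
    int out = *slot;
    free(slot);
    return out;
}
```

At the binary level the guard is just a compare-and-branch on the returned register, which binary analyzers can often recognize; this may help explain why the binary tools outperform Sonarqube on this weakness.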

Table 7
Evaluation of open-source tools to detect Integer Overflows.

                  Ground   Cwe_checker       Sonarqube
Benchmarks        Truth    TP    FP    FN    TP    FP   FN
Juliet Test Suite 3960     252   0     3708  100   0    2860
SPEC              0∗       0     53    0     0     0    0

Table 8
Evaluation of open-source tools to detect Memory Leaks.

                  Ground   Valgrind          Sonarqube
Benchmarks        Truth    TP    FP    FN    TP    FP   FN
Juliet Test Suite 565      98    0     467   357   83   208
SPEC              22∗      13    3     9     22∗   0    0

6.6. CWE-401 – missing release of memory after effective lifetime

We did not find a static analysis based binary-level approach or open-source tool to detect this weakness. Therefore, in this section, we compare the detection accuracy of one popular binary-level memory-leak detection tool (Valgrind's Memcheck) with that of the source-level SAST tool, Sonarqube. Our results with the Juliet test suite and SPEC benchmarks are displayed in Table 8. We find that Valgrind was only able to detect about 17% of the errors correctly for the small Juliet benchmarks. Sonarqube performs better, with an accuracy of around 63% for the same benchmarks. Thus, new techniques and tools may be necessary to improve the detection accuracy of this bug, especially for program binaries.

7. Discussion

Static analysis techniques are often used during code analysis and inspection. However, they often suffer from a high number of false positives and false negatives [95]. Lipp et al. reported that even state-of-the-art source-level static analysis tools that produce accurate results on small programs still miss a significant percentage of vulnerabilities in real-world benchmarks, ranging from 47% to 80% [15]. Most of these studies were conducted with source-level static code analysis techniques or tools.

Intuitively, this challenge is exacerbated when statically analyzing binary programs due to the loss of crucial program information, including types, names, and high-level code syntax and structure, in binary code. Yet, binary analysis is important in several contexts: when source code is unavailable (viruses and other malware, or third-party code) or lost (old/legacy binaries), or to examine the actual program that runs on a machine after compiler optimizations. Unfortunately, the prevalence of false positives and false negatives has led to an under-utilization of static analysis tools and techniques, due to the costs associated with further manual inspection [96]. Our evaluation in this work confirms that these issues regarding accuracy and scalability persist in current state-of-the-art tools. Resolving this challenge is one of the most critical future frontiers for binary-level static analysis research.

In this study, we find that several researchers are currently addressing this challenge by exploring combinations of static techniques with other methods. Many recent works show a shift towards integrating static analysis with other techniques. Some studies have combined static and dynamic analysis to mitigate weaknesses in both approaches [36,48,53], while others have leveraged machine learning and deep learning alongside static techniques, dynamic fuzzing, and symbolic execution [50–52,54,56]. This trend reflects the recognition that static techniques alone have limitations and can benefit from synergies with other complementary code analysis methods.

8. Limitations and threats to validity

Our current study has several limitations that should be considered when interpreting the results. Firstly, our evaluation of binary-level static analysis tools was constrained by the availability of open-source options. Several techniques have proprietary implementations whose code is not available online. Additionally, even some open-source projects have outdated dependencies and unmaintained repositories that make it challenging to evaluate those techniques. In Table 9, we report the other open-source tools that we attempted to employ, but failed to build and/or use in this study for different reasons. Thus, we were unable to assess many of the tools and techniques that have been reported in the literature.

Secondly, the benchmark suite used to evaluate the performance of binary-level static analysis tools has a few limitations. The SARD and Juliet test-suites provide a ground truth regarding errors and vulnerabilities in the codes. However, these are small programs that may not

Table 9
Summary of Vulnerabilities and Tools.

Vulnerability                   Tool                                   Issue
Buffer Overflow                 RNN for Vulnerability Detection [52]   Only works with the provided source code, not others.
                                UBSYM [37]                             Tool installs but does not run on our benchmarks; requires source code changes.
                                bofAEG [54]                            Tool installs, but none of our benchmarks finish execution due to a path explosion issue.
Use After Free                  GUEB [94]                              Code is not maintained; installation errors due to dependencies.
                                UBSYM [37]                             Installs but does not run on our benchmarks; requires source code changes.
Null Pointer Dereference        —                                      No other open-source tools found.
Integer Overflow or Wraparound  —                                      No other open-source tools found.
Use of Hardcoded Credentials    —                                      No other open-source tools found.
Memory Leak                     —                                      No static open-source tools found.


be representative of real-world programs. In contrast, the larger SPEC programs lack ground truth information for assessing the presence of vulnerabilities. We rely on the source-level analyzer, Sonarqube, to provide a ground truth for the SPEC benchmarks. However, as we have mentioned before, source-level analyzers are not completely accurate either.

Finally, our focus on binaries produced from the C/C++ languages limits the generalizability of our findings to other languages and to binaries compiled from multiple languages. Future research should aim to address these limitations by incorporating a wider range of tools, benchmarks, and language-specific analyses.

9. Future work

There are many avenues for future work. First, we plan to explore error categories beyond the top 10 CWEs for program binaries. We also plan to review dynamic and run-time error detection approaches to complement our static analysis focus from this work.

Second, a major finding from this work is that there is a lack of open-source research and state-of-the-art tools to accurately detect the important CWEs in program binaries. Our future goal is to learn from existing approaches to construct such an open-source tool to precisely detect errors in binary software. Likewise, we plan to develop tools to detect Common Weakness Enumeration (CWE) vulnerabilities that currently lack dedicated tools or research, such as CWE-843 (Access of Resource Using Incompatible Type) and CWE-798 (Use of Hard-coded Credentials).

Third, we plan to develop advanced deobfuscation and decompilation techniques that can handle obfuscated and optimized code more effectively, aiming to recover higher-level abstractions from low-level binary representations.

Fourth, developing static analysis techniques that can handle executables built using different languages and multiple architectures, such as ARM, x86, and others, is also crucial future work that would enable comprehensive analysis of modern software systems. We plan to undertake this research in the future.

Fifth, improving the accuracy of binary-level static analysis tools remains a critical challenge in the field. Increasing true positives while reducing false positives is essential for reliable vulnerability detection. An emerging approach involves combining the results of static analysis with other techniques, such as run-time monitoring or machine learning algorithms. This integration has shown promise in improving overall accuracy and reducing false positives by leveraging complementary strengths.

Finally, it is crucial to understand how inaccuracies in binary-level static analysis can impact dependent tasks like Control-Flow Integrity (CFI). Identifying and mitigating these impacts will be essential for ensuring the effectiveness of security mechanisms that rely on accurate static analysis results.

10. Conclusions

Our goal in this work was to comprehensively review and compare past research in static analysis based approaches to detect the most important CWE categories for program binaries. Another major goal was to evaluate the accuracy of open-source tools built to detect each of the studied weaknesses. We made many significant, interesting, and novel discoveries and observations in this work. First, we found that we currently lack tools and techniques to accurately detect many important classes of errors in binary software. Second, we found that much research is not available in the open-source domain, and even the tools that exist are often not maintained and lack critical support. Third, many research works only evaluate their techniques on small benchmarks, and their results may not adequately represent performance in real-world applications. Fourth, many CWE detection techniques suffer from a high incidence of false positives and false negatives, underscoring the need for refinement and enhancement of existing techniques and tools. Thus, this work distinguishes itself as the first survey of binary-level CWE detection techniques, and the first independent assessment of binary-level open-source tools for identifying software weaknesses, offering valuable insights and setting the stage for further advancements in this critical field.

Declaration of competing interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

CRediT authorship contribution statement

Ashish Adhikari: Writing – review & editing, Supervision, Software, Project administration, Methodology, Investigation. Prasad Kulkarni: Writing – original draft, Visualization, Validation, Software, Methodology, Investigation, Data curation, Conceptualization.

Acknowledgements

This work is sponsored in part by the National Security Agency (NSA) Science of Security Initiative.

References

[1] NIST National Vulnerability Database, CVSS severity distribution over time. https://nvd.nist.gov/general/visualizations/vulnerability-visualizations/cvss-severity-distribution-over-time. Retrieved June 9, 2023.
[2] CISA (Cybersecurity and Infrastructure Security Agency) Advisory, AA22-117A: Conti ransomware. https://www.cisa.gov/news-events/cybersecurity-advisories/aa22-117a. Retrieved May 17, 2023.
[3] CISA (Cybersecurity and Infrastructure Security Agency), AA22-216A: Conti ransomware. https://www.cisa.gov/news-events/cybersecurity-advisories/aa22-216a. Retrieved May 17, 2023.
[4] L. Szekeres, M. Payer, T. Wei, D. Song, SoK: eternal war in memory, in: 2013 IEEE Symposium on Security and Privacy, 2013, pp. 48–62, doi:10.1109/SP.2013.13.
[5] S.C. Johnson, M. Hill, Lint, a C program checker (1978). https://api.semanticscholar.org/CorpusID:59749883.
[6] M. Byun, Y. Lee, J.-Y. Choi, Analysis of software weakness detection of CBMC based on CWE, 2020, pp. 171–175, doi:10.23919/ICACT48636.2020.9061281.
[7] M. Saletta, C. Ferretti, A neural embedding for source code: security analysis and CWE lists, in: 2020 IEEE Intl Conf on Dependable, Autonomic and Secure Computing, Intl Conf on Pervasive Intelligence and Computing, Intl Conf on Cloud and Big Data Computing, Intl Conf on Cyber Science and Technology Congress (DASC/PiCom/CBDCom/CyberSciTech), 2020, pp. 523–530, doi:10.1109/DASC-PICom-CBDCom-CyberSciTech49142.2020.00095.
[8] D.S. Cruzes, M.L. Chaim, D.S. Santos, What do we know about buffer overflow detection? A survey on techniques to detect a persistent vulnerability, Int. J. Syst. Softw. Secur. Protect. 9 (3) (2018) 1–33, doi:10.4018/IJSSSP.2018070101.
[9] S.J. Ahmed, D.B. Taha, Machine learning for software vulnerability detection: a survey, in: 2022 8th International Conference on Contemporary Information Technology and Mathematics (ICCITM), 2022, pp. 66–72, doi:10.1109/ICCITM56309.2022.10031734.
[10] V. Yosifova, A. Tasheva, R. Trifonov, Predicting vulnerability type in common vulnerabilities and exposures (CVE) database with machine learning classifiers, 2021, doi:10.1109/ELECTRONICA52725.2021.9513723.
[11] J.D. Pereira, M.P.A. Vieira, On the use of open-source C/C++ static analysis tools in large projects, 2020 16th European Dependable Computing Conference (EDCC) (2020) 97–102.
[12] S. Zaharia, T. Rebedea, S. Trausan-Matu, CWE pattern identification using semantical clustering of programming language keywords, 2021, pp. 119–126, doi:10.1109/CSCS52396.2021.00027.
[13] T. Ji, Y. Wu, C. Wang, X. Zhang, Z. Wang, The coming era of AlphaHacking?: a survey of automatic software vulnerability detection, exploitation and patching techniques, in: 2018 IEEE Third International Conference on Data Science in Cyberspace (DSC), 2018, pp. 53–60, doi:10.1109/DSC.2018.00017.
[14] F. Alenezi, C. Tsokos, Machine learning approach to predict computer operating systems vulnerabilities, 2020, pp. 1–6, doi:10.1109/ICCAIS48893.2020.9096731.
[15] S. Lipp, S. Banescu, A. Pretschner, An empirical study on the effectiveness of static C code analyzers for vulnerability detection, Association for Computing Machinery, New York, NY, USA, 2022, doi:10.1145/3533767.3534380.
[16] G. Lin, S. Wen, Q.-L. Han, J. Zhang, Y. Xiang, Software vulnerability detection using deep neural networks: a survey, Proc. IEEE 108 (10) (2020) 1825–1848, doi:10.1109/JPROC.2020.2993293.
[17] K. Goseva-Popstojanova, A. Perhinschi, On the capability of static code analysis to detect security vulnerabilities, Inf. Softw. Technol. 68 (2015) 18–33,

doi:10.1016/j.infsof.2015.08.002. https://www.sciencedirect.com/science/article/pii/S0950584915001366.
[18] Y. Shoshitaishvili, R. Wang, C. Salls, N. Stephens, M. Polino, A. Dutcher, J. Grosen, S. Feng, C. Hauser, C. Kruegel, G. Vigna, SoK: (state of) the art of war: offensive techniques in binary analysis, in: 2016 IEEE Symposium on Security and Privacy (SP), 2016, pp. 138–157, doi:10.1109/SP.2016.17.
[19] H. Xue, S. Sun, G. Venkataramani, T. Lan, Machine learning-based analysis of program binaries: a comprehensive study, IEEE Access 7 (2019) 65889–65912, doi:10.1109/ACCESS.2019.2917668.
[20] National Security Agency, Ghidra, GitHub repository. https://github.com/NationalSecurityAgency/ghidra. Retrieved May 13, 2023.
[21] Radare2 Team, Radare2 GitHub repository. https://github.com/radare/radare2. Retrieved May 13, 2023.
[22] Hex-Rays, IDA Pro. https://hex-rays.com/ida-pro/. Retrieved May 3, 2023.
[23] C. Cifuentes, M. Levin, J. Ramos, et al., BAP (Binary Analysis Platform). https://github.com/BinaryAnalysisPlatform/bap. Retrieved May 13, 2023.
[24] Dyninst Development Team, Dyninst, GitHub repository. https://github.com/dyninst/dyninst. Retrieved May 13, 2023.
[25] A.C. Eberendu, V.I. Udegbe, E.O. Ezennorom, A.C. Ibegbulam, T.I. Chinebu, A systematic literature review of software vulnerability detection, Eur. J. Comput. Sci. Inf. Technol. 10 (1) (2022) 23–37.
[26] SPEC CPU2017, Standard Performance Evaluation Corporation. https://www.spec.org/cpu2017/. Retrieved March 2, 2023.
[27] National Institute of Standards and Technology (NIST), Software assurance reference dataset (SARD) benchmarks, NIST Software Assurance Metrics And Tool Evaluation (SAMATE). https://samate.nist.gov/SARD/test-suites/89.
[28] National Institute of Standards and Technology (NIST), Juliet Test Suite, NIST Software Assurance Metrics And Tool Evaluation (SAMATE). https://samate.nist.gov/SARD/test-suite/JULIET.html.
[29] SonarQube. https://www.sonarsource.com/products/sonarqube/. Retrieved January 4, 2023.
[30] X. Zhu, S. Wen, S. Camtepe, Y. Xiang, Fuzzing: a survey for roadmap, ACM Comput. Surv. 54 (11s) (2022), doi:10.1145/3512345.
[31] P. Godefroid, Fuzzing: hack, art, and science, Commun. ACM 63 (2) (2020) 70–76, doi:10.1145/3363824.
[32] P. Thomson, Static analysis, Commun. ACM 65 (1) (2021) 50–54, doi:10.1145/3486592.
[33] T. Muske, A. Serebrenik, Survey of approaches for postprocessing of static analysis alarms, ACM Comput. Surv. 55 (3) (2022), doi:10.1145/3494521.
[34] J.C. King, Symbolic execution and program testing, Commun. ACM 19 (7) (1976) 385–394, doi:10.1145/360248.360252.
[35] W. Wang, M. Fan, A. Yu, D. Meng, BofSanitizer: efficient locator and detector for buffer overflow vulnerability, in: 2021 IEEE 23rd Int Conf on High Performance Computing & Communications; 7th Int Conf on Data Science & Systems; 19th Int Conf on Smart City; 7th Int Conf on Dependability in Sensor, Cloud & Big Data Systems & Application (HPCC/DSS/SmartCity/DependSys), 2021, pp. 1075–1083, doi:10.1109/HPCC-DSS-SmartCity-DependSys53884.2021.00168.
[36] X. Jia, C. Zhang, P. Su, Y. Yang, H. Huang, D. Feng, Towards efficient heap overflow discovery, in: 26th USENIX Security Symposium (USENIX Security 17), USENIX Association, Vancouver, BC, 2017, pp. 989–1006. https://www.usenix.org/conference/usenixsecurity17/technical-sessions/presentation/jia.
[37] S. Baradaran, M. Heidari, A. Kamali, M. Mouzarani, A unit-based symbolic execution method for detecting memory corruption vulnerabilities in executable codes, Int. J. Inf. Secur. 22 (2023) 1–14, doi:10.1007/s10207-023-00691-1.
[38] T. Yavuz, C. Brant, Security analysis of IoT frameworks using static taint analysis, in: Proceedings of the Twelfth ACM Conference on Data and Application Security and Privacy, CODASPY '22, Association for Computing Machinery, New York, NY, USA, 2022, pp. 203–213, doi:10.1145/3508398.3511511.
[39] D. Boxler, K.R. Walcott, Static taint analysis tools to detect information flows, in: Proceedings of the Int'l Conf. Software Eng. Research and Practice (SERP'18), 2018.
[40] GrammaTech Inc., GrammaTech, official website. https://www.grammatech.com/. Retrieved May 1, 2023.
[41] E.J. Schwartz, T. Avgerinos, D. Brumley, All you ever wanted to know about dynamic taint analysis and forward symbolic execution (but might have been afraid to ask), in: 2010 IEEE Symposium on Security and Privacy, 2010, pp. 317–331, doi:10.1109/SP.2010.26.
[42] A. Aumpansub, Z. Huang, Learning-based vulnerability detection in binary code, in: 2022 14th International Conference on Machine Learning and Computing (ICMLC), ICMLC 2022, Association for Computing Machinery, New York, NY, USA, 2022, pp. 266–271, doi:10.1145/3529836.3529926.
[43] S.M. Ghaffarian, H.R. Shahriari, Software vulnerability analysis and discovery using machine-learning and data-mining techniques: a survey, ACM Comput. Surv. 50 (4) (2017), doi:10.1145/3092566.
[44] N.S. Harzevili, A.B. Belle, J. Wang, S. Wang, Z. Ming Jiang, N. Nagappan, A survey on automated software vulnerability detection using machine learning and deep learning, 2023, arXiv:2306.11673.
[45] Open Web Application Security Project (OWASP), OWASP Buffer Overflow. https://owasp.org/www-community/vulnerabilities/Buffer_Overflow. Retrieved May 13, 2023.
[46] B.M. Padmanabhuni, H.B.K. Tan, Buffer overflow vulnerability prediction from x86 executables using static analysis and machine learning, in: 2015 IEEE 39th Annual Computer Software and Applications Conference, volume 2, 2015, pp. 450–459, doi:10.1109/COMPSAC.2015.78.
[47] University of Waikato, Weka Data Mining Tool, Weka Wiki. https://waikato.github.io/weka-wiki/. Retrieved August 20, 2023.
[48] B.M. Padmanabhuni, H.B. Kuan Tan, Auditing buffer overflow vulnerabilities using hybrid static-dynamic analysis, in: 2014 IEEE 38th Annual Computer Software and Applications Conference, 2014, pp. 394–399, doi:10.1109/COMPSAC.2014.62.
[49] C.-K. Luk, R. Cohn, R. Muth, H. Patil, A. Klauser, G. Lowney, S. Wallace, V.J. Reddi, K. Hazelwood, Pin: building customized program analysis tools with dynamic instrumentation, in: PLDI '05, Association for Computing Machinery, New York, NY, USA, 2005, pp. 190–200, doi:10.1145/1065010.1065034.
[50] T. Gao, X. Guo, Buffer overflow vulnerability location in binaries based on abnormal execution, in: 2020 4th Annual International Conference on Data Science and Business Analytics (ICDSBA), 2020, pp. 29–31, doi:10.1109/ICDSBA51020.2020.00015.
[51] L. He, Y. Cai, H. Hu, P. Su, Z. Liang, Y. Yang, H. Huang, J. Yan, X. Jia, D. Feng, Automatically assessing crashes from heap overflows, in: 2017 32nd IEEE/ACM International Conference on Automated Software Engineering (ASE), 2017, pp. 274–279, doi:10.1109/ASE.2017.8115640.
[52] W.A. Dahl, L. Erdodi, F.M. Zennaro, Stack-based buffer overflow detection using recurrent neural networks, 2020, arXiv:2012.15116.
[53] I. Gotovchits, R. Van Tonder, D. Brumley, Saluki: finding taint-style vulnerabilities with static property checking, in: Proceedings of the NDSS Workshop on Binary Analysis Research, volume 2018, 2018.
[54] S. Xu, Y. Wang, L. Coppolino, BofAEG: automated stack buffer overflow vulnerability detection and exploit generation based on symbolic execution and dynamic analysis, 2022 (2022), doi:10.1155/2022/1251987.
[55] N. Nethercote, J. Seward, Valgrind: a framework for heavyweight dynamic binary instrumentation, SIGPLAN Not. 42 (6) (2007) 89–100, doi:10.1145/1273442.1250746.
[56] N.-E. Enkelmann, T. Barabosch, CWE Checker, GitHub repository. https://github.com/fkie-cad/cwe_checker. Retrieved May 1, 2023.
[57] L. Braz, E. Fregnan, G. Çalikli, A. Bacchelli, Why don't developers detect improper input validation? '; DROP TABLE papers; –, in: 2021 IEEE/ACM 43rd International Conference on Software Engineering (ICSE), 2021, pp. 499–511, doi:10.1109/ICSE43902.2021.00054.
[58] Y. Zhang, Z. Wang, W. Yu, B. Fang, Multi-level directed fuzzing for detecting use-after-free vulnerabilities, in: 2021 IEEE 20th International Conference on Trust, Security and Privacy in Computing and Communications (TrustCom), 2021, pp. 569–576, doi:10.1109/TrustCom53373.2021.00087.
[59] K. Zhu, Y. Lu, H. Huang, Scalable static detection of use-after-free vulnerabilities in binary code, IEEE Access PP (2020) 1–1, doi:10.1109/ACCESS.2020.2990197.
[60] J. Feist, Finding the Needle in the Heap: Combining Binary Analysis Techniques to Trigger Use-After-Free, Ph.D. thesis, Université Grenoble Alpes, 2017. https://theses.hal.science/tel-01681707v2/document.
[61] H. Yan, Y. Sui, S. Chen, J. Xue, Machine-learning-guided typestate analysis for static use-after-free detection, Association for Computing Machinery, New York, NY, USA, 2017, doi:10.1145/3134600.3134620.
[62] M.-D. Nguyen, S. Bardin, R. Bonichon, R. Groz, M. Lemerre, Binary-level directed fuzzing for use-after-free vulnerabilities, in: 23rd International Symposium on Research in Attacks, Intrusions and Defenses (RAID 2020), USENIX Association, San Sebastian, 2020, pp. 47–62. https://www.usenix.org/conference/raid2020/presentation/nguyen.
[63] OWASP, Null dereference. https://owasp.org/www-community/vulnerabilities/Null_Dereference. Retrieved May 10, 2023.
[64] W. Jin, S. Ullah, D. Yoo, H. Oh, NPDHunter: efficient null pointer dereference vulnerability detection in binary, IEEE Access 9 (2021) 90153–90169, doi:10.1109/ACCESS.2021.3091209.
[65] T. Cloosters, M. Rodler, L. Davi, TeeRex: discovery and exploitation of memory corruption vulnerabilities in SGX enclaves, in: 29th USENIX Security Symposium (USENIX Security 20), USENIX Association, 2020, pp. 841–858. https://www.usenix.org/conference/usenixsecurity20/presentation/cloosters.
[66] A. Vishnyakov, D. Kuts, V. Logunova, D. Parygina, E. Kobrin, G. Savidov, A. Fedotov, Sydr-Fuzz: continuous hybrid fuzzing and dynamic analysis for security development lifecycle, in: 2022 Ivannikov ISPRAS Open Conference (ISPRAS), IEEE, 2022, pp. 111–123, doi:10.1109/ISPRAS57371.2022.10076861.
[67] Google, Google OSS-Fuzz: continuous fuzzing for open source software. https://github.com/google/oss-fuzz. Retrieved July 12, 2023.
[68] LLVM Project, LibFuzzer – LLVM 13 documentation, 2023. https://llvm.org/docs/LibFuzzer.html. Retrieved July 12, 2023.
[69] A. Fioraldi, D. Maier, H. Eißfeldt, M. Heuse, AFL++: combining incremental steps of fuzzing research, in: 14th USENIX Workshop on Offensive Technologies (WOOT 20), USENIX Association, 2020. https://www.usenix.org/conference/woot20/presentation/fioraldi.
[70] The MITRE Corporation, CWE-680: Integer Overflow to Buffer Overflow (IO2BO), MITRE CWE. https://cwe.mitre.org/data/definitions/680.html. Retrieved July 20, 2023.
[71] T. Wang, T. Wei, Z. Lin, W. Zou, IntScope: automatically detecting integer overflow vulnerability in x86 binary using symbolic execution, 2009.
[72] T. Wei, J. Mao, W. Zou, Y. Chen, A new algorithm for identifying loops in decompilation, in: H.R. Nielson, G. Filé (Eds.), Static Analysis, Springer Berlin Heidelberg, Berlin, Heidelberg, 2007, pp. 170–183.
[73] C. Bauer, A. Frink, R. Kreckel, Introduction to the GiNaC framework for symbolic computation within the C++ programming language, ArXiv cs.SC/0004015 (2000).
[74] V. Ganesh, D.L. Dill, A decision procedure for bit-vectors and arrays, in: Proceedings of the 19th International Conference on Computer Aided Verification, CAV'07, Springer-Verlag, Berlin, Heidelberg, 2007, pp. 519–531.
[75] P. Muntean, M. Monperrus, H. Sun, J. Grossklags, C. Eckert, IntRepair: informed repairing of integer overflows, IEEE Trans. Softw. Eng. 47 (10) (2021) 2225–2241, doi:10.1109/TSE.2019.2946148.
A. Adhikari and P. Kulkarni Cyber Security and Applications 3 (2025) 100061
[76] Z. Huang, X. Yu, Integer overflow detection with delayed runtime test, 2021, pp. 1–6, doi:10.1145/3465481.3465771.
[77] B. Zhang, C. Feng, B. Wu, C. Tang, Detecting integer overflow in Windows binary executables based on symbolic execution, in: 2016 17th IEEE/ACIS International Conference on Software Engineering, Artificial Intelligence, Networking and Parallel/Distributed Computing (SNPD), 2016, pp. 385–390, doi:10.1109/SNPD.2016.7515929.
[78] Veracode, (https://www.veracode.com/). Retrieved May 1, 2023.
[79] Checkmarx, (https://checkmarx.com/). Retrieved April 4, 2023.
[80] GitGuardian, (https://www.gitguardian.com/). Retrieved May 10, 2023.
[81] I. Haller, Y. Jeon, H. Peng, M. Payer, C. Giuffrida, H. Bos, E. van der Kouwe, TypeSan: practical type confusion detection, in: CCS '16, Association for Computing Machinery, New York, NY, USA, 2016, pp. 517–528, doi:10.1145/2976749.2978405.
[82] G.J. Duck, R.H.C. Yap, EffectiveSan: type and memory error detection using dynamically typed C/C++, in: Proceedings of the 39th ACM SIGPLAN Conference on Programming Language Design and Implementation, in: PLDI 2018, Association for Computing Machinery, New York, NY, USA, 2018, pp. 181–195, doi:10.1145/3192366.3192388.
[83] X. Fan, S. Long, C. Huang, C. Yang, F. Li, Accelerating type confusion detection by identifying harmless type castings, in: Proceedings of the 20th ACM International Conference on Computing Frontiers, in: CF '23, Association for Computing Machinery, New York, NY, USA, 2023, pp. 91–100, doi:10.1145/3587135.3592205.
[84] Y. Jeon, P. Biswas, S. Carr, B. Lee, M. Payer, HexType: efficient detection of type confusion errors for C++, in: CCS '17, Association for Computing Machinery, New York, NY, USA, 2017, pp. 2373–2387, doi:10.1145/3133956.3134062.
[85] D. Kim, S. Kim, BinTyper: type confusion detection for C++ binaries, BlackHat Europe, 2020. https://www.blackhat.com/eu-20/briefings/schedule/bintyper-type-confusion-detection-for-c-binaries-21351
[86] A. Andrzejak, F. Eichler, M. Ghanavati, Detection of memory leaks in C/C++ code via machine learning, in: 2017 IEEE International Symposium on Software Reliability Engineering Workshops (ISSREW), 2017, pp. 252–258, doi:10.1109/ISSREW.2017.72.
[87] M. Hauswirth, T.M. Chilimbi, Low-overhead memory leak detection using adaptive statistical profiling, SIGPLAN Not. 39 (11) (2004) 156–164, doi:10.1145/1037187.1024412.
[88] Y. Koizumi, Y. Arahori, Risk-aware leak detection at binary level, in: 2020 IEEE 25th Pacific Rim International Symposium on Dependable Computing (PRDC), 2020, pp. 171–180, doi:10.1109/PRDC50213.2020.00028.
[89] D. Bruening, Q. Zhao, Practical memory checking with Dr. Memory, in: International Symposium on Code Generation and Optimization (CGO 2011), 2011, pp. 213–223, doi:10.1109/CGO.2011.5764689.
[90] ElectricFence, (https://github.com/kallisti5/ElectricFence). Retrieved March 4, 2023.
[91] mtrace, (https://man7.org/linux/man-pages/man3/mtrace.3.html). Retrieved May 3, 2023.
[92] PurifyPlus, (https://www.ibm.com/docs/en/announcements/archive/ENUS204-063). Retrieved May 3, 2023.
[93] Deleaker, (https://www.deleaker.com/). Retrieved May 3, 2023.
[94] J. Feist, GUEB: a static analyzer performing use-after-free detection on binary, 2018, (https://github.com/montyly/gueb). Retrieved July 12, 2023.
[95] I. Elkhalifa, B. Ilyas, Static code analysis: a systematic literature review and an industrial survey, 2016.
[96] T. Muske, A. Serebrenik, Survey of approaches for postprocessing of static analysis alarms, ACM Comput. Surv. 55 (3) (2022), doi:10.1145/3494521.