
2021 IEEE/ACM 18th International Conference on Mining Software Repositories (MSR)
DOI: 10.1109/MSR52588.2021.00068

On the Effectiveness of Deep Vulnerability Detectors to Simple Stupid Bug Detection

Jiayi Hua, Haoyu Wang
School of Computer Science, Beijing University of Posts and Telecommunications, Beijing, China

Abstract—Recent studies have shown the promise of deep learning based bug detection, which relieves human experts from the tedious and subjective task of manually summarizing features. Simple one-statement bugs (i.e., SStuBs), which occur relatively often in Java projects, cannot be well spotted by existing static analysis tools. In this paper, we empirically analyze whether deep learning based techniques can be used to detect SStuBs. We re-implemented two state-of-the-art techniques in approximately 3,000 lines of code and adapted them to detect Java SStuBs. Experiments on large-scale datasets suggest that although deep vulnerability detectors achieve much better results than existing static analyzers, SStuBs cannot be flagged as reliably as traditional complex vulnerabilities. We further look in detail at the per bug category basis, observing that deep learning based methods perform better when detecting some specific types of bugs (e.g., "Same Function Change Caller") that have strong data flow and control flow semantics. Our observations could offer implications for the automated detection and repair of SStuBs.

I. INTRODUCTION

Bug detection and program repair are indispensable in software maintenance. Detecting and fixing bugs at an early stage of the software development cycle reduces maintenance cost. To alleviate the manual effort of locating and repairing bugs, many tools and techniques have been proposed to detect bugs automatically, e.g., SpotBugs [1], PMD [2] and CheckStyle [3]. However, these traditional static analysis tools need experts to define specific detection rules for different types of bugs in advance, which is labour-intensive and time-consuming, and may incur high false positive/negative rates. To deal with this limitation, some researchers have turned their attention to deep learning based techniques. Recent studies have shown that deep learning can boost the performance of detecting data-flow-related vulnerabilities [4], control-flow-related vulnerabilities [5] and a wide range of other vulnerabilities [6, 7], compared to well-known conventional static detectors. For example, VulDeePecker [4] claims to achieve an F1-measure of over 90% when detecting buffer errors, which is much better than other static pattern based analyzers, including Flawfinder [8], RATS [9] and Checkmarx [10].

Simple one-statement bugs, or so-called simple stupid bugs (SStuBs), are bugs that appear on a single statement and whose fix is within that statement. Prior work [11] showed that SStuBs are relatively common in Java projects, with a frequency of about one bug per 1,600-2,500 lines of code. The authors also provided a dataset, ManySStuBs4J, which contains 153,652 SStuB fix changes mined from 1,000 popular open-source Java projects on GitHub, and classified the fixes into 16 bug templates, such as "change identifier used", "change caller in function call" and "overload method more args". Since deep learning based bug detection methods show high performance on traditional complex vulnerabilities, the question arises whether these methods can detect SStuBs with similarly promising results.

This Work. In this paper, we seek to empirically analyze the performance of deep learning based bug detection techniques on locating SStuBs. Specifically, we re-implement two state-of-the-art techniques, VulDeePecker [4] and SySeVR [6], for analyzing Java source code. Note that these two techniques were originally designed for C/C++ code, and VulDeePecker [4] is not open source. We implemented these two detectors in approximately 3,000 lines of code to assess their performance on Java SStuBs. VulDeePecker [4] considers the data dependencies of a program and performs program slicing based on key library/API function calls; it then assembles the obtained program slices into code gadgets for training and detecting. SySeVR [6] extracts both data dependency and control dependency information extending from vulnerability syntax characteristics for model training and detecting. These two deep learning based techniques are considered in this paper because they perform excellently in bug detection tasks and have a finer granularity (i.e., at the program slice level) when pinpointing bugs, which is more practical in real-world scenarios.

To compare the effectiveness of these two approaches on detecting both complex Java code vulnerabilities and SStuBs, we first built benchmarks for two representative vulnerabilities (i.e., CWE-22 and CWE-79), and then applied the two deep learning based methods to them. Both methods achieve a detection accuracy over 90%, which suggests that our re-implementation is correct and that traditional complex vulnerabilities like CWE-22 and CWE-79 can be well flagged by these two deep learning based methods. As a comparison, we next applied the two techniques to the ManySStuBs4J dataset, where the detection accuracy is only around 70% for the VulDeePecker detector and 66% for the SySeVR detector. Although these two approaches achieve much better results than existing static analysis tools, the results show that SStuBs cannot be flagged as well as traditional complex vulnerabilities. We further analyze the results for the 16 types of bugs, finding that the effectiveness of deep vulnerability detectors varies among different types of SStuBs; the detection accuracy for some types of SStuBs is markedly higher than for others. This is mainly because some kinds of bugs only involve minor changes that have little relationship with data flow and control flow information. Our findings suggest that different kinds of approaches should be combined for better detection of SStuBs. We have released our crafted benchmark and experiment results to the research community at: https://doi.org/10.5281/zenodo.4609689

[Fig. 1. Overview of the deep learning based bug detection. Training phase: sensitive points are extracted from labeled source code, assembled into code snippets, and turned into vector representations used to train the models. Detecting phase: the well-trained models are applied to the vector representations of target source code to produce detecting results.]

1  public void execute() throws MojoExecutionException
2  {
3      File jarFile = new File( basedir, jarName + ".jar" );
4      MavenArchiver archiver = new MavenArchiver();
5      archiver.setOutputFile( jarFile );
6      String ejbJarXmlFile = "META-INF/ejb-jar.xml";
7      try
8      {
9          archiver.getArchiver().addDirectory( outputDirectory, ejbJarXmlFile );
10         archiver.getArchiver().addFile( outputDirectory, ejbJarXmlFile );
11         archiver.createArchive( project, archive );
12         if ( generateClient.booleanValue() )
13         {
14             File clientJarFile = new File( basedir, jarName + "-client.jar" );
15             MavenArchiver clientArchiver = new MavenArchiver();
16             clientArchiver.setOutputFile( clientJarFile );
17             clientArchiver.getArchiver().addDirectory( outputDirectory );
18             archiver.createArchive( project, archive );
19         }
20     }
21     catch ( Exception e )
22     {
23         throw new MojoExecutionException( "Error assembling EJB", e );
24     }
25 }

Fig. 2. An example of locating sensitive points (the extracted sensitive points are highlighted by boxes in the original figure; the buggy statement is in line 18).

(a) VulDeePecker based method:

3      File jarFile = new File( basedir, jarName + ".jar" );
4      MavenArchiver archiver = new MavenArchiver();
5      archiver.setOutputFile( jarFile );
6      String ejbJarXmlFile = "META-INF/ejb-jar.xml";
9      archiver.getArchiver().addDirectory( outputDirectory, ejbJarXmlFile );
10     archiver.getArchiver().addFile( outputDirectory, ejbJarXmlFile );
11     archiver.createArchive( project, archive );
18     archiver.createArchive( project, archive );

(b) SySeVR based method (derived from the backward and forward slices over the program dependency graph, whose data and control dependency edges are depicted in the original figure):

1      public void execute() throws MojoExecutionException
4      MavenArchiver archiver = new MavenArchiver();
7      try
12     if ( generateClient.booleanValue() )
18     archiver.createArchive( project, archive );

Fig. 3. An example of code snippets generation for the sensitive point in line 18 of Fig. 2.

II. DEEP VULNERABILITY DETECTORS

We first introduce an overview of the methodology we implemented to detect SStuBs, then illustrate the process step by step with a real example.

A. Overview
As shown in Fig. 1, deep vulnerability detectors in general consist of two phases: a training phase and a detecting phase. The inputs to the training phase are source code with ground-truth labels (i.e., vulnerable or correct). After feature extraction, the source code is represented as vectors and used to train bug detection models. In the detecting phase, unknown source code is transformed into vectors through the same steps as in the training phase. The output of the detecting phase is "correct" or "vulnerable" for each code snippet.

B. Training phase

The training phase consists of four main steps: 1) locating sensitive points, 2) code snippet generation and labeling, 3) vector representation and 4) model training.

1) Locating Sensitive Points: A sensitive point is a syntax characteristic where most simple stupid bugs manifest, similar to the "key points" mentioned in VulDeePecker [4] and the "SyVCs" defined in SySeVR [6]. Here we choose the following syntax characteristics as sensitive points: object construction, method invocation, expression statement, conditional statement and loop statement. In this step, we first create the abstract syntax tree (AST) of each source code file and then extract all sensitive points. Fig. 2 shows an example of a simple SStuB: the developer intends to build the EJB client in line 18 but builds the EJB twice by mistake. The sensitive points we extracted from the code snippet are highlighted by boxes in the figure.
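To make this step concrete, the following minimal Python sketch (ours, not the authors' released code) extracts the five kinds of sensitive points from a Java source file. It assumes the third-party javalang parser; the mapping from AST node types to sensitive point categories is our own naming.

# Sketch of sensitive point extraction, assuming the javalang AST parser.
# The five categories mirror the paper: object construction, method
# invocation, expression statement, conditional statement, loop statement.
import javalang

SENSITIVE_NODE_TYPES = {
    javalang.tree.ClassCreator: "object construction",     # new Foo(...)
    javalang.tree.MethodInvocation: "method invocation",   # foo.bar(...)
    javalang.tree.StatementExpression: "expression statement",
    javalang.tree.IfStatement: "conditional statement",
    javalang.tree.WhileStatement: "loop statement",
    javalang.tree.ForStatement: "loop statement",
}

def locate_sensitive_points(java_source):
    # Parse one Java file into an AST and collect (line, category) pairs.
    tree = javalang.parse.parse(java_source)
    points = set()
    for node_type, category in SENSITIVE_NODE_TYPES.items():
        for _path, node in tree.filter(node_type):
            if node.position is not None:  # some synthetic nodes carry no position
                points.add((node.position.line, category))
    return sorted(points)

Applied to the code of Fig. 2, such a pass would report, among others, the method invocation in line 18 as a sensitive point.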
2) Code Snippet Generation and Labeling: A code snippet consists of a number of semantically related lines of code. As mentioned above, we implemented two different detectors to generate code snippets, the VulDeePecker detector and the SySeVR detector. (1) VulDeePecker detector. For each sensitive point, we trace the backward data flow of the corresponding identifiers (e.g., the operands, arguments and caller of a method) and assemble the obtained lines of code into code snippets. As shown in Fig. 3, the method invocation in line 18 is a sensitive point, and "archiver", "project" and "archive" are its three corresponding identifiers. We take the data flow slices of the three identifiers and assemble them according to the order in which the statements appear in the code; the resulting code snippet is shown in Fig. 3 (a). (2) SySeVR detector. For each sensitive point, we first generate the program dependency graph (PDG) of each method, then generate the inter-procedural backward and forward slices, and finally assemble the corresponding lines of code into code snippets. For example, in Fig. 3, the sensitive point in line 18 has a data dependency on line 4 and a control dependency on line 12; lines 4 and 12 in turn have control dependencies on lines 1 and 7, respectively. Therefore the code snippet consists of lines 1, 4, 7, 12 and 18, as shown in Fig. 3 (b).
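To make the SySeVR-style slicing concrete, the sketch below computes a slice over a dependency graph for the example above. In practice the graph would come from a real PDG construction; here we stub in by hand exactly the dependencies described in the text, model the graph with networkx, and the helper name is ours.

# Sketch of SySeVR-style snippet generation over a program dependency
# graph, modeled as a networkx digraph whose nodes are line numbers and
# whose edges run from a statement to the statements that depend on it.
import networkx as nx

# Hand-built fragment of the PDG of Fig. 2: line 18 depends on the object
# created in line 4 and the condition in line 12; lines 4 and 12 are in
# turn control-dependent on lines 1 and 7.
pdg = nx.DiGraph()
pdg.add_edges_from([(4, 18), (12, 18), (1, 4), (7, 12)])

def slice_lines(pdg, sensitive_line):
    # Backward slice: every line the sensitive line transitively depends on.
    # Forward slice: every line that depends on the sensitive line.
    backward = nx.ancestors(pdg, sensitive_line)
    forward = nx.descendants(pdg, sensitive_line)
    return sorted(backward | {sensitive_line} | forward)

print(slice_lines(pdg, 18))  # -> [1, 4, 7, 12, 18], matching Fig. 3 (b)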
3) Vector Representation: We use word2vec [12] to transform code tokens into vectors. A code snippet is divided into a sequence of tokens and then transformed into integers using a well-trained word2vec model. Since deep learning models usually take equal-length vectors as input, the vector representations of code snippets need some adjustment. Let l denote the length of the vectors that we input to the models. Vectors shorter than l are padded with zeros at the beginning; for vectors longer than l, we delete the beginning of the vector.
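A minimal sketch of this step, assuming a gensim Word2Vec vocabulary trained on tokenized snippets; the toy corpus and the offset-by-one indexing that reserves 0 for padding are our own choices.

# Sketch of the vector representation step. Token indices are shifted by
# one so that 0 stays reserved for the padding value.
from gensim.models import Word2Vec

corpus = [["archiver", ".", "createArchive", "(", "project", ",", "archive", ")", ";"]]
w2v = Word2Vec(sentences=corpus, vector_size=30, min_count=1)

def to_fixed_length(tokens, model, l=50):
    ids = [model.wv.key_to_index[t] + 1 for t in tokens if t in model.wv]
    if len(ids) < l:
        return [0] * (l - len(ids)) + ids  # pad zeros at the beginning
    return ids[len(ids) - l:]              # drop the beginning if too long

print(to_fixed_length(corpus[0], w2v))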
4) Model Training: The bug detection task can be formulated as a binary classification problem. We select a Bidirectional Long Short-Term Memory (BLSTM) network as the detecting model because it can capture information from both earlier and later statements in the program that may affect sensitive points [4]. The inputs of the training process are the vectors of length l and the ground truth. Our model consists of an embedding layer, a BLSTM layer, a dense layer and a softmax layer. For each bug type, we randomly choose 80% of the code snippets as the training dataset and the remaining 20% for evaluation.
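A minimal Keras sketch of this architecture (ours, not the released implementation), using the hyperparameters reported in Section III-A: input length l = 50, hidden size 64, dropout and recurrent dropout 0.5, Adamax optimizer, binary cross-entropy loss, batch size 64 and 50 epochs. The vocabulary size and embedding dimension are illustrative placeholders.

# Sketch of the BLSTM classifier: embedding layer, BLSTM layer, dense
# layer and softmax layer, as described above. Labels are one-hot pairs.
from tensorflow.keras import Sequential
from tensorflow.keras.layers import Activation, Bidirectional, Dense, Embedding, LSTM

VOCAB_SIZE, EMBED_DIM = 10000, 30  # illustrative placeholders

model = Sequential([
    Embedding(VOCAB_SIZE, EMBED_DIM),                             # embedding layer
    Bidirectional(LSTM(64, dropout=0.5, recurrent_dropout=0.5)),  # BLSTM layer
    Dense(2),                                                     # dense layer
    Activation("softmax"),                                        # softmax layer
])
model.compile(optimizer="adamax", loss="binary_crossentropy", metrics=["accuracy"])

# Training:  model.fit(x_train, y_train, batch_size=64, epochs=50)
# Detecting: model.predict(x_target).argmax(axis=1) yields 0 (no SStuB)
#            or 1 (has SStuB) per code snippet, as in Section II-C.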

C. Detecting phase

Similar to the training phase, given target source code, we first extract sensitive points and then generate a code snippet for each sensitive point. Next, we transform the code snippets into vectors and input them to the trained BLSTM model. The output of the model is "0" (the code snippet does not have an SStuB) or "1" (the code snippet has an SStuB).

TABLE I
EVALUATION RESULTS ON TRADITIONAL COMPLEX VULNERABILITIES VS. SSTUBS

Dataset        Metric   VulDeePecker based   SySeVR based
CWE-22         FPR      6%                   3%
               P        91%                  94%
               FNR      2%                   12%
               ACC      95%                  94%
CWE-79         FPR      4%                   3%
               P        90%                  94%
               FNR      7%                   1%
               ACC      95%                  98%
ManySStuBs4J   FPR      31%                  33%
               P        70%                  66%
               FNR      28%                  38%
               ACC      71%                  65%
III. EXPERIMENTS AND RESULTS

A. Study Design

1) Research Questions: Our evaluation is driven by the following research questions (RQs):
RQ1 What is the performance of deep learning based methods on detecting traditional complex Java vulnerabilities and SStuBs?
RQ2 Do the deep learning based methods achieve consistent performance on different types of SStuBs?

2) Datasets: To compare the effectiveness of the two approaches on detecting both complex vulnerabilities and SStuBs, we use three large-scale datasets: CWE-22, CWE-79 and ManySStuBs4J [11]. First, for the traditional complex vulnerabilities, we crafted a benchmark dataset of 3,776 samples for CWE-22 Path Traversal and a dataset of 4,827 samples for CWE-79 XSS (Cross-Site Scripting) from SARD [13] and OWASP [14]. We choose CWE-22 and CWE-79 mainly because they are common vulnerabilities with many available samples. Each item of these datasets provides the buggy or fixing line number, the bug type, and the path of the source code. Following previous work [4, 6], we choose invocations of file reading and writing methods as sensitive points for CWE-22, and invocations of methods that send information to the client as sensitive points for CWE-79. Furthermore, we evaluate the performance of detecting SStuBs on the ManySStuBs4J dataset [11]. Each item of this dataset provides the line in which the bug exists in the buggy version of the file, the hash of the commit fixing the bug and the hash of the last commit containing the bug. We harvest the related source code files according to the commit hashes.

For both benchmarks, we label the code snippets generated from the source code lines before fixing as "1" (has bug) and after fixing as "0" (no bug). To ensure the correctness of samples, we only keep code snippets generated from sensitive points whose line numbers match the recorded bug or fix line numbers. For CWE-22 and CWE-79, we obtained over 4,400 and 5,700 code snippets, respectively, with a proportion of positive to negative samples of around 1:2. For ManySStuBs4J, we finally obtained 61,667 code snippets for the VulDeePecker based method and 68,768 code snippets for the SySeVR based method, with a proportion of around 1:1 between positive and negative samples. The test sets (roughly 20% of the dataset) consist of 12,338 code snippets for the VulDeePecker based method and 13,761 code snippets for the SySeVR based method.

3) Model Training: Overall, we trained six different models on the three datasets: for each dataset, two models for the two deep learning based techniques. The length of input vectors l is set to 50. The hidden size, dropout and recurrent dropout of the BLSTM layer are set to 64, 0.5 and 0.5, respectively. The binary cross-entropy loss and the Adamax optimizer with default parameters are used for training. The batch size is 64 and the number of epochs is 50.

4) Metrics: We use four widely used metrics, namely accuracy (ACC), false positive rate (FPR), false negative rate (FNR) and precision (P), to evaluate the performance of bug detection. Let TP be the number of samples with bugs that are detected correctly, FP the number of samples without bugs that are wrongly detected as vulnerable, TN the number of clean samples that are detected correctly, and FN the number of samples with bugs that are wrongly detected as clean. ACC measures the correctness over all detected samples: ACC = (TP + TN) / (TP + FP + TN + FN). FPR is the proportion of false positives among all samples that are not vulnerable: FPR = FP / (FP + TN). FNR = FN / (TP + FN) is the proportion of false negatives among all samples that are vulnerable. P measures the correctness of the samples detected as vulnerable: P = TP / (TP + FP).
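These definitions translate directly into code; a small helper of our own for reference:

# The four evaluation metrics, computed from the confusion counts above.
def detection_metrics(tp, fp, tn, fn):
    return {
        "ACC": (tp + tn) / (tp + fp + tn + fn),  # overall correctness
        "FPR": fp / (fp + tn),                   # false positive rate
        "FNR": fn / (tp + fn),                   # false negative rate
        "P":   tp / (tp + fp),                   # precision
    }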

B. RQ1: Traditional complex vulnerabilities vs. SStuBs

Performance on Complex Vulnerabilities. Table I shows the overall results. Both methods achieve a detection accuracy of about 95% with relatively low FPR and FNR. This result is in line with previous studies on C/C++ vulnerabilities [4, 6], which suggests that our re-implementation of these approaches for Java is accurate, and that the two deep vulnerability detectors can reliably flag traditional complex vulnerabilities like CWE-22 and CWE-79.

Performance on SStuBs. The results are also shown in Table I. The VulDeePecker based method achieves a P of 70% and an ACC of 71%, obviously lower than its performance on traditional vulnerabilities. The SySeVR based method has 66% precision in detecting SStuBs, and incurs an FPR of 33% and an FNR of 38%. Previous work also measured the proportion of bugs in ManySStuBs4J that can be identified by popular static analysis tools such as SpotBugs [1], Error Prone [15] and Infer [16]: the overall bug detection rate of all three bug detectors together on the studied bugs is only 4.5% [17]. In another work, researchers found that SpotBugs could only locate about 12% of SStuBs while also reporting more than 200 million possible false positives. These results show that deep learning based methods may not perform as well on simple bugs like SStuBs as on traditional complex vulnerabilities, but they can outperform existing static analysis tools. Moreover, the VulDeePecker based method performs slightly better than the SySeVR based method, which suggests that data flow information is more discriminative than control flow information for detecting SStuBs.

TABLE II
PERFORMANCE ON DETECTING DIFFERENT TYPES OF SSTUBS

                                   VulDeePecker based        SySeVR based
Bug Type                           FPR   P    FNR  ACC    FPR   P    FNR  ACC
CHANGE OPERATOR                    67%  36%  66%  34%    55%  35%  74%  35%
CHANGE OPERAND                     14%  87%   9%  88%    20%  80%  24%  78%
CHANGE IDENTIFIER                  25%  75%  22%  77%    28%  72%  33%  70%
CHANGE MODIFIER                    68%  12%  89%  22%    61%  19%  78%  32%
CHANGE NUMERAL                     58%  43%  53%  44%    53%  46%  59%  44%
CHANGE CALLER IN FUNCTION CALL     10%  89%  17%  87%    18%  80%  19%  81%
CHANGE UNARY OPERATOR              53%  50%  43%  52%    51%  50%  51%  49%
OVERLOAD METHOD MORE ARGS          46%  58%  36%  59%    47%  50%  49%  52%
OVERLOAD METHOD DELETED ARGS       58%  52%  48%  48%    47%  51%  57%  48%
DIFFERENT METHOD SAME ARGS         19%  81%  15%  83%    20%  79%  23%  78%
MORE SPECIFIC IF                   20%  78%  25%  77%    24%  77%  32%  72%
LESS SPECIFIC IF                   14%  86%  19%  84%    28%  73%  25%  74%
SWAP ARGUMENTS                     54%  51%  38%  54%    36%  56%  56%  54%
SWAP BOOLEAN LITERAL               86%  24%  70%  22%    67%  27%  76%  29%

C. RQ2: Performance on different types of SStuBs

We further look in detail at a per-category basis, as shown in Table II. Note that the bug types "DELETE THROWS EXCEPTION" and "ADD THROWS EXCEPTION" are not shown in the table because their sample sizes are small and the sensitive points we defined cannot cover all syntax characteristics of these two bug types. Interestingly, we find that the effectiveness of our detecting methods varies among different types of SStuBs. Both methods are better at detecting "CHANGE OPERAND", "CHANGE IDENTIFIER", "CHANGE CALLER IN FUNCTION CALL", "DIFFERENT METHOD SAME ARGS", "MORE SPECIFIC IF" and "LESS SPECIFIC IF" bugs, reaching a precision of roughly 80% with an FNR below 20%. This is because the identifiers related to the sensitive point (e.g., the method caller) change between the buggy and fixed versions, leading to larger differences between the code snippets. We also notice that similar bugs may appear many times in one project, and that there are recurring patterns in bug repair (e.g., using LinkedHashMap instead of HashMap), which also leads to better detection performance. The performance on the types "OVERLOAD METHOD MORE ARGS" and "OVERLOAD METHOD DELETED ARGS" is not as good as we anticipated. Looking into the dataset, we find that many of the added or deleted arguments are strings, "null" values, Boolean values or numbers, which are hard for our methods to handle. Bug types such as "CHANGE OPERATOR" and "CHANGE NUMERAL" encounter a similar problem. The remaining kind of bug, "SWAP ARGUMENTS", cannot be well detected mainly because the code snippets before and after fixing are almost identical: the identifiers related to the sensitive point are unchanged, only their order of occurrence differs.

IV. RELATED WORK

A number of static analysis tools and research works have been proposed to detect software vulnerabilities. They can be divided into two main categories. The first category is traditional static analyzers that detect vulnerabilities based on predefined patterns, such as SpotBugs [1] and PMD [2]; these traditional tools often incur high false positive or false negative rates. The second category is machine learning based approaches. Some approaches detect vulnerabilities according to code similarity, obtaining abstract representations of code fragments and comparing the similarity between pairs of representations [18, 19, 20]. There are also many approaches that detect well-defined vulnerabilities using machine learning techniques. For example, Yan et al. [21] introduced a static use-after-free detector that bridges the gap between typestate and pointer analyses using a Support Vector Machine. Li et al. [4] designed VulDeePecker, which embeds code using data flow information to detect resource management errors and buffer overflows. Moreover, some research also pays attention to simple bugs like simple one-statement bugs. Pradel et al. [22] presented DeepBugs, a learning approach to name-based bug detection, focusing on three kinds of bugs: accidentally swapped function arguments, incorrect binary operators, and incorrect operands in binary operations.

V. CONCLUSION AND DISCUSSION

In this paper, we empirically analyzed the effectiveness of deep learning based bug detection techniques on locating SStuBs. The experimental results on ManySStuBs4J show that some types of SStuBs can be detected markedly better than others. One main reason why the other kinds of bugs cannot be well flagged is that they only involve tiny changes to an operator, Boolean value, string and so on, whose semantics are hard to trace with methods that rely mainly on control flow and data flow information. To better detect those types of SStuBs, it may help to infer code intent or functionality from source code and annotations. Moreover, at present we simply use all the aforementioned syntax characteristics as sensitive points, which probably introduces some useless information. Refining the sensitive points, e.g., identifying what kinds of functions are more likely to incur SStuBs, might be a useful direction for improvement. Our findings suggest that different kinds of approaches should be combined for better detection of SStuBs, and our study offers practical implications in this direction.

ACKNOWLEDGMENT

This work was supported by the National Natural Science Foundation of China (grant numbers 62072046 and 61702045). Haoyu Wang is the corresponding author.

REFERENCES

[1] "SpotBugs," https://spotbugs.github.io/.
[2] "PMD," https://pmd.github.io/.
[3] "Checkstyle," http://checkstyle.sourceforge.io/.
[4] Z. Li, D. Zou, S. Xu, X. Ou, H. Jin, S. Wang, Z. Deng, and Y. Zhong, "VulDeePecker: A deep learning-based system for vulnerability detection," in Proceedings of the 2018 Network and Distributed System Security Symposium (NDSS), 2018. [Online]. Available: http://dx.doi.org/10.14722/ndss.2018.23158
[5] X. Cheng, H. Wang, J. Hua, M. Zhang, G. Xu, L. Yi, and Y. Sui, "Static detection of control-flow-related vulnerabilities using graph embedding," in 2019 24th International Conference on Engineering of Complex Computer Systems (ICECCS), 2019, pp. 41–50.
[6] Z. Li, D. Zou, S. Xu, H. Jin, Y. Zhu, and Z. Chen, "SySeVR: A framework for using deep learning to detect software vulnerabilities," arXiv preprint arXiv:1807.06756, 2018.
[7] X. Cheng, H. Wang, J. Hua, G. Xu, and Y. Sui, "DeepWukong: Statically detecting software vulnerabilities using deep graph neural network," ACM Transactions on Software Engineering and Methodology (TOSEM), 2021.
[8] "Flawfinder," https://dwheeler.com/flawfinder/.
[9] "RATS," https://code.google.com/archive/p/rough-auditing-tool-for-security/.
[10] "Checkmarx," https://www.checkmarx.com/, 2017.
[11] R.-M. Karampatsis and C. Sutton, "How often do single-statement bugs occur? The ManySStuBs4J dataset," in Proceedings of the International Conference on Mining Software Repositories (MSR), 2020.
[12] "word2vec," http://radimrehurek.com/gensim/models/word2vec.html.
[13] "Software Assurance Reference Dataset," https://samate.nist.gov/SARD/index.php, 2017.
[14] "OWASP Benchmark," https://owasp.org/www-project-benchmark/.
[15] E. Aftandilian, R. Sauciuc, S. Priya, and S. Krishnan, "Building useful program analysis tools using an extensible Java compiler," in Proceedings of the 2012 IEEE 12th International Working Conference on Source Code Analysis and Manipulation (SCAM). USA: IEEE Computer Society, 2012, pp. 14–23. [Online]. Available: https://doi.org/10.1109/SCAM.2012.28
[16] C. Calcagno, D. Distefano, J. Dubreil, D. Gabi, P. Hooimeijer, M. Luca, P. O'Hearn, I. Papakonstantinou, J. Purbrick, and D. Rodriguez, "Moving fast with software verification," in NASA Formal Methods. Springer International Publishing, 2015, pp. 3–11.
[17] A. Habib and M. Pradel, "How many of all bugs do we find? A study of static bug detectors," in Proceedings of the 33rd ACM/IEEE International Conference on Automated Software Engineering (ASE). New York, NY, USA: Association for Computing Machinery, 2018, pp. 317–328. [Online]. Available: https://doi.org/10.1145/3238147.3238213
[18] J. Jang, D. Brumley, and A. Agrawal, "ReDeBug: Finding unpatched code clones in entire OS distributions," in 2012 IEEE Symposium on Security and Privacy (SP). IEEE Computer Society, May 2012, pp. 48–62. [Online]. Available: https://doi.ieeecomputersociety.org/10.1109/SP.2012.13
[19] S. Kim, S. Woo, H. Lee, and H. Oh, "VUDDY: A scalable approach for vulnerable code clone discovery," in 2017 IEEE Symposium on Security and Privacy (SP), 2017, pp. 595–614.
[20] H. Sajnani, V. Saini, J. Svajlenko, C. K. Roy, and C. V. Lopes, "SourcererCC: Scaling code clone detection to big-code," in Proceedings of the 38th International Conference on Software Engineering (ICSE). Association for Computing Machinery, 2016, pp. 1157–1168. [Online]. Available: https://doi.org/10.1145/2884781.2884877
[21] H. Yan, Y. Sui, S. Chen, and J. Xue, "Machine-learning-guided typestate analysis for static use-after-free detection," in Proceedings of the 33rd Annual Computer Security Applications Conference (ACSAC). Association for Computing Machinery, 2017, pp. 42–54. [Online]. Available: https://doi.org/10.1145/3134600.3134620
[22] M. Pradel and K. Sen, "DeepBugs: A learning approach to name-based bug detection," Proc. ACM Program. Lang., vol. 2, no. OOPSLA, Oct. 2018. [Online]. Available: https://doi.org/10.1145/3276517
