Systematic Analysis of Deep Learning Model For Vulnerable Code Detection
All content following this page was uploaded by Md Jobair Hossain Faruk on 15 May 2022.
The initial keyword search process had undergone a filtration procedure that only selected papers published in the last 5 years, from 2017 to 2022. Additional restraints were placed in each of the scientific databases to find relevant research material. IEEE Xplore included only Conferences and Journals, while ACM required filters specifying Journals and Research Articles. The arXiv e-Print Archive did not require any predefined filters, nor did Google Scholar. However, Google Scholar failed to provide any unique research papers related …

III. Source Code Representation

Source code representation is an essential step that decomposes the input sample of source code so that it contains only important syntactic and semantic information, removing unnecessary lines, comments, and spaces. While many representations exist, this study aims to present the state-of-the-art methods utilized to capture the structural and semantic information from the source code for feature extraction.
A. Abstract Syntax Tree (AST)

An Abstract Syntax Tree (AST) is the tree representation of source code that can capture the abstract syntactic structure and semantics of a code block, allowing the source code to be analyzed statically [17], [18]. This procedure allows the partition of the initial input source code into smaller parts and achieves greater granularity of the function-level source code [13]. An AST can be acquired by using a parser such as CodeSensor [18], [19] or Pycparser [7]. While these ASTs can be used directly, they lack granularity when the code is large or complex [16]. Thus, in [7] the AST, which can be considered an m-ary tree, is converted by enforcing rules into a complete binary AST tree that preserves the structural relations of the AST nodes. Similarly, an RNN model called Tree-LSTM has been proposed, which leverages bottom-up calculation to integrate the outputs of all AST child nodes to construct a binary AST tree [18], [20].

B. Code Gadgets

Code gadgets are a composition of multiple lines of program statements that have a semantic correlation in terms of data dependency and control dependency [3], obtained by implementing program slicing [22]. Program slicing can be categorized into two types of slices: forward and backward [3]. Forward slices are slices of code that receive an external input, such as a file or socket, whereas backward slices do not receive a direct external input from the environment in which the program is run [2], [21]. This decomposition of programs by analyzing their data flow and control flow [22] allows the reduction of the lines of code and focuses on the key points of library/API function calls, arrays, or pointers. VulDeePecker [3] accounts only for data dependency in its code gadgets [8], [21]: it uses the commercial tool Checkmarx [25] to perform forward and backward slicing for each argument, and the resulting slices are then assembled to form code gadgets. In Zagane's model [23], the dataset contains 420,627 code slices, of which 56,395 are vulnerable, and the code metrics of each slice are calculated as input for their deep learning model.

C. Code Property Graph (CPG)

A Code Property Graph (CPG) is an amalgamation of classical data structures representing a source program, i.e., the abstract syntax tree, control flow graph (CFG), and program dependence graph (PDG) [26]. The REVEAL [1] vulnerability prediction framework utilizes a modified CPG rather than the data structure presented by Yamaguchi et al. [26], adding a data-flow graph (DFG) in conjunction with the existing CPG to capture additional context about the semantics of the code. Devign [5] takes a similar CPG-based approach to source code representation by merging the concepts of AST, CFG, DFG, and Natural Code Sequence (NCS) into a joint graph.

D. Lexed Representation

Russell et al. [6] constructed a custom lexer for C/C++ code that formed a code representation with a vocabulary size of 156 tokens. This methodology included keywords, operators, and separators while excluding code irrelevant to compilation. The lexer converted the code to tokens of three different types: string, character, and float literals were lexed to type-specific placeholders, while integer literals were tokenized digit-by-digit due to their relevance to vulnerability detection. Types and function calls from common libraries are mapped to their generic versions. Zheng et al. [19] implemented this custom lexer for word-level tokenization on texts from the Draper VDISC dataset.

E. Semantics-based Vulnerability Candidates (SeVC)

Semantics-based Vulnerability Candidates (SeVCs) are the various statements that are semantically related to Syntax-based Vulnerability Candidates (SyVCs), obtained by extracting program slices [19]. In order to conceptualize SeVCs, the term SyVC needs to be explored. SyVC extraction requires converting the source code to an AST, which contains multiple consecutive tokens. The AST is traversed to locate code elements that match the defined vulnerability syntax; each match is labelled as a SyVC [4]. SyVCs are transformed into SeVCs by program slicing [22] to capture the semantic relations of statements based on data dependency and control dependency. The Joern tool [24] extracts the PDG of each SyVC; program slices are then generated from interprocedural forward and backward slices [27] and transformed into SeVCs [4].

IV. Deep Learning Models

Deep neural networks were inspired by biological aspects of the human brain and nervous system. They form networks similar to the human nervous system and mimic human thought mechanisms by training on the data provided. Deep learning has been successful in image classification and captures nonlinear effects of variables with high-level feature representations. Thus, deep learning techniques have been implemented to extract features from text (source code) and then train models to understand and detect vulnerabilities in software.

A. CNN (Convolutional Neural Network)

CNN is a deep learning model introduced mainly to analyze features in images, but it has also been implemented for feature extraction from source code to learn vector representations [12]. Furthermore, since code does not possess the extensive features present in images, a CNN can both extract features and understand the structure of a program by learning patterns and relationships between contexts in the source code [9]. In the study of Russell et al. [6], the implemented CNN had a filter size of 9 and 512 filters, paired with batch normalization and ReLU, to process the sequential tokens.

B. RNN (Recurrent Neural Network)

An RNN allows longer token dependencies to be extracted than a CNN [6], as its "memory" contains information from the previous and next tokens [9]. The RNN used by Russell et al. [6] had a hidden size of 256 and was max-pooled to generate fixed-size vector representations.
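The lexing scheme described in Section III-D can be sketched as follows. This is a loose approximation of the representation Russell et al. [6] describe, not their actual lexer; the regular expressions and placeholder names here are illustrative assumptions.

```python
import re

# Token patterns, ordered so that literals win over shorter matches.
TOKEN_RE = re.compile(r"""
    (?P<string>"(?:\\.|[^"\\])*")            # string literal -> placeholder
  | (?P<char>'(?:\\.|[^'\\])')               # character literal -> placeholder
  | (?P<float>\d+\.\d+(?:[eE][+-]?\d+)?)     # float literal -> placeholder
  | (?P<int>\d+)                             # integer literal -> per-digit tokens
  | (?P<ident>[A-Za-z_]\w*)                  # keyword / identifier
  | (?P<op>->|\+\+|--|[-+*/%=<>!&|^~(){}\[\];,.])  # operators and separators
""", re.VERBOSE)

def lex(code: str) -> list:
    """Lex C-like code: literals become placeholders, integers split per digit."""
    tokens = []
    for m in TOKEN_RE.finditer(code):
        kind = m.lastgroup
        if kind in ("string", "char", "float"):
            tokens.append("<%s>" % kind)      # type-specific placeholder
        elif kind == "int":
            tokens.extend(m.group())          # one token per digit
        else:
            tokens.append(m.group())
    return tokens
```

For example, `lex('x = 42;')` yields one token per digit of the integer literal, while string and float literals collapse to single placeholder tokens, keeping the vocabulary small.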
LSTM is an extended version of the RNN architecture that can learn long-term dependencies. LSTM can overcome the limitations of a conventional statistical model (such as ARIMA) by capturing the nonlinearity of sequential data while simultaneously generating more precise forecasts for time-series data [11]. LSTM also addresses the vanishing gradient problem of RNNs [50]. The building block of the LSTM architecture is a memory block, which consists of a memory cell that can preserve information from the preceding time step through self-recurrent connections. BLSTM is a variation of the LSTM deep learning model that builds upon the one-way model by pairing a forward LSTM with a backward LSTM, i.e., a bidirectional (two-way) LSTM. µVulDeePecker [8] implemented a multi-feature fusion method with BLSTM that extracts information from a global-feature learning model, a local-feature learning model, and a feature-fusion model.

GRU is an alternative version of LSTM that was introduced to avoid the vanishing gradient problem and boost the efficiency of LSTM [17]. GRU has a less complicated architecture than LSTM, with a reduced number of parameters to learn. It consists of two gates, an update gate and a reset gate (in comparison, LSTM includes three gates). GRU mitigates the vanishing gradient problem of RNNs by leveraging the two gates to control what information should be passed to future states [14]. The input and forget gates of LSTM are integrated into the GRU update gate, which determines how much information from previous steps should be passed to future time steps. Similarly to the output gate in LSTM, the reset gate in GRU incorporates the new input and the previous memory, and determines how much of the past information should be forgotten.

V. Datasets

The datasets observed in this field of research when implementing deep learning models are source code in the C/C++ language. These datasets are popular because in-depth reviews have quantified the amount of vulnerable and malicious code present in each dataset, as shown in Figure 2.

Fig. 2. Overview of Datasets

A. NVD and SARD

The Software Assurance Reference Dataset (SARD) [28] contains production, synthetic, and academic security flaws or vulnerabilities, and the National Vulnerability Database (NVD) [29] contains vulnerabilities in production software [3]. The dataset is made up of C/C++ programs and software products [4] and contains 61,638 code gadgets, of which 17,725 are vulnerable and 43,913 are not vulnerable [31], as shown in Figure 5. Among the 17,725 vulnerable code gadgets, 10,440 correspond to buffer error vulnerabilities (CWE-119) and the remaining 7,285 correspond to resource management error vulnerabilities (CWE-399) [3]. The dataset is named the Code Gadget Database (CGD) [31], and its extensions are the Semantics-based Vulnerability Candidate (SeVC) dataset [32] (used in SySeVR [4]) and the Multiclass Vulnerability Dataset (MVD) [33] (used in µVulDeePecker [8]).

B. Draper VDISC

The dataset consists of the source code of 1.27 million functions mined from open-source software, labelled by static analysis for potential vulnerabilities [30]. The Draper VDISC dataset has C/C++ code from open-source projects: the Debian Linux distribution [34], public Git repositories on GitHub [35], and the SATE IV Juliet Test Suite of the NIST SAMATE project [36], where the first two source projects are real and the last one is synthetic. The Debian package releases provide a selection of very well-managed and curated code, while the GitHub dataset provides a larger quantity and wider variety of (often lower-quality) code, and the SATE IV Juliet Test Suite contains synthetic code examples with vulnerabilities from 118 different Common Weakness Enumerations (CWEs) [6], [7].

C. REVEAL

ReVeal is a real-world source code dataset in which vulnerabilities are tracked from the Linux Debian kernel and Chromium open-source projects. The dataset contains C/C++ sources from well-maintained public projects with long evolutionary histories; both represent important program domains (operating systems and browsers) with plenty of publicly available vulnerability reports [1]. Fixed issues with publicly available patches can be collected using Bugzilla for Chromium and the Debian security tracker for the Linux Debian kernel. The ReVeal dataset contains a total of 22,734 samples, with 2,240 vulnerable and 20,494 non-vulnerable samples, as seen in Figure 3.

D. FFMPeg+Qemu

FFMPeg+Qemu is a balanced, real-world dataset collected from GitHub repositories consisting of 4 large C-language open-source projects that are popular among developers and diversified in functionality, i.e., the Linux kernel, QEMU, Wireshark, and FFmpeg [5]. The labelling of this dataset was done manually by domain experts based on commit messages [15]. Figure 5 presents the balanced nature of the dataset, with 23,355 non-vulnerable and 25,332 vulnerable commit messages.
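The class balance these counts imply can be computed directly. The snippet below is a small illustrative script using the figures quoted in this survey; for ReVeal, the 2,240-sample minority class is taken as the vulnerable one, in line with the severe imbalance reported by Chakraborty et al. [1].

```python
# (vulnerable, non-vulnerable) counts as quoted in this survey
datasets = {
    "CGD (SARD & NVD)": (17_725, 43_913),
    "ReVeal": (2_240, 20_494),
    "FFMPeg+Qemu": (25_332, 23_355),
}

for name, (vuln, non_vuln) in datasets.items():
    ratio = vuln / (vuln + non_vuln)
    print(f"{name}: {ratio:.1%} vulnerable")
```

Only FFMPeg+Qemu is close to balanced; the roughly 10% vulnerable fraction in ReVeal is the imbalance problem revisited in Section VI.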
TABLE III
Deep Learning Models for Vulnerability Analysis

Author | Dataset | Dataset Type | Feature Representation | Source Code Representation | Vector Representation | DL Models
Russell et al. [6] | Draper VDISC | Synthetic, semi-synthetic, real | Token | NLP approach (convolutional and recurrent feature extraction) | word2vec | CNN + RF, RNN + RF
S. Chakraborty et al. [1] | ReVeal, FFMPeg+Qemu | Real | Graph | CPG | word2vec | GGNN + MLP + triplet loss
G. Tang et al. [2] | SARD & NVD | Synthetic, semi-synthetic | Token | Code gadgets | doc2vec | KELM
Z. Bilgin et al. [7] | Draper VDISC | Synthetic, semi-synthetic, real | Token | Binary AST | Array representation | MLP, CNN
Z. Li et al. [3] | SARD & NVD | Synthetic, semi-synthetic | Token | Code gadgets | word2vec | BLSTM
D. Zou et al. [8] | SARD & NVD | Synthetic, semi-synthetic | Token | Code gadgets + code attention | word2vec | BLSTM
Z. Li et al. [4] | SARD & NVD | Synthetic, semi-synthetic | Token | SeVCs | word2vec | BGRU
Guo et al. [18] | SARD & NVD | Synthetic, semi-synthetic | Token | Code gadgets | word2vec | CNN + LSTM
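Several of the models in Table III are gated recurrent networks (BLSTM, BGRU). The update/reset gate mechanics described in Section IV can be sketched for a single scalar GRU cell as follows; the weights are arbitrary illustrative values, and this is one common formulation of the GRU equations, not the implementation of any surveyed model.

```python
import math

def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))

def gru_cell(x: float, h: float, w: dict) -> float:
    """One scalar GRU step: gates decide what to keep and what to overwrite."""
    z = sigmoid(w["Wz"] * x + w["Uz"] * h)               # update gate
    r = sigmoid(w["Wr"] * x + w["Ur"] * h)               # reset gate
    h_cand = math.tanh(w["Wh"] * x + w["Uh"] * (r * h))  # candidate state
    # Blend old state and candidate (one common convention).
    return (1.0 - z) * h + z * h_cand

# Illustrative weights, not learned values.
weights = {"Wz": 0.5, "Uz": 0.5, "Wr": 0.5, "Ur": 0.5, "Wh": 1.0, "Uh": 1.0}
h = 0.0
for x in [1.0, 1.0, 0.0]:  # a toy token-score sequence
    h = gru_cell(x, h, weights)
```

A BGRU, as used in SySeVR [4], runs two such recurrences over the token sequence, one forward and one backward, and concatenates their hidden states.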
VI. Challenges and Future Work

This review acknowledges the infancy of the research field of deep learning-based software vulnerability detection. Multiple problems remain unresolved, which indicates the need for further research to enable better discovery of vulnerable code. Thus, we have compiled a number of challenges and possible future research directions, assessed from the concluding remarks of previous works.

The dataset is an integral part of training a vulnerability prediction model. As surveyed in this paper, various studies utilize different datasets, such as SARD & NVD or the ReVeal dataset, to train their models, which indicates the lack of a standardized benchmarking dataset that covers most CWE vulnerabilities. As a result, a unified and standardized metric for evaluating deep learning-based models cannot be produced. Furthermore, deep learning models require huge amounts of training data to provide excellent performance, but the current datasets are insufficient in this regard. Thus, the development of vulnerability datasets with ground truth is an essential direction for research. The ratio of vulnerable to non-vulnerable code also tends to be staggeringly unbalanced, as seen in Figure 5. Non-vulnerable code is abundant while vulnerable code is a small minority, which leaves deep learning vulnerability prediction models with insufficient training data for detecting vulnerabilities and causes overfitting in the model. Class imbalance also exists among specific CWE vulnerabilities: for example, CWE-469 is present only in small numbers in the Draper VDISC dataset, leading to poor prediction, whereas CWE-119 exists in multitude, leading to the best prediction for this vulnerability during cross-validation [7].

Deep learning models possess nonlinearity and hidden layers that make it difficult to interpret the behaviour that leads to a vulnerability prediction, which begs the following questions: Is the model accurate? Is the prediction on vulnerability discovery reliable? What is the reason for classifying a specific piece of source code as vulnerable or non-vulnerable? Researchers have tackled this problem with two methods. The first is the use of LIME [10] to create linear models of the neural networks for simple interpretation. The second is the introduction of code attention into the source code representation [8], which enables researchers to obtain the attention vectors and quantitatively measure how much attention was given to a specific network among all the neural networks in the deep learning model. Further research can lead to better source code analysis that can explain deep learning-based models in vulnerability prediction.

VII. Conclusion

The emergence of vast software applications with increasing complexity, and the continued popularity of software in the information era, will require automatic deep learning-based models to learn and detect vulnerabilities in this software. In this survey, we review the studies conducted on the implementation of deep learning technology for source code vulnerability analysis. First, a general overview is given of the formation of deep learning models for source code vulnerability detection. Then, a detailed summary of state-of-the-art source code representations is explored; these representations decompose the text of source code while conserving its behaviour, so that it can be fed into deep learning models for training. Next, the deep learning models implemented for vulnerability detection are outlined with brief examples of the hidden layers used in the trained models. Lastly, based on the existing studies discussed in this paper, the challenges and directions for future work in this research field are stated.

Acknowledgment

The work is partially supported by U.S. National Science Foundation (NSF) Awards 2100134, 2100115, 1723578, and 1723586, and a SunTrust Fellowship Award. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the NSF and SunTrust.

References

[1] S. Chakraborty, R. Krishna, Y. Ding, and B. Ray, "Deep Learning Based Vulnerability Detection: Are We There Yet," IEEE Transactions on Software Engineering, doi: 10.1109/TSE.2021.3087402.
[2] T. Gaigai, Y. Lin, R. Shuangyin, M. Lianxiao, Y. Feng, and W. Huiqiang, "An Automatic Source Code Vulnerability Detection Approach Based on KELM," Security and Communication Networks, vol. 2021, pp. 1-12, 2021, doi: 10.1155/2021/5566423.
[3] Z. Li et al., "VulDeePecker: A Deep Learning-Based System for Vulnerability Detection," in Proceedings of the 2018 Network and Distributed System Security Symposium, 2018.
[4] Z. Li, D. Zou, S. Xu, H. Jin, Y. Zhu, Z. Chen, S. Wang, and J. Wang, "SySeVR: A Framework for Using Deep Learning to Detect Software Vulnerabilities," arXiv preprint arXiv:1807.06756, 2018.
[5] Y. Zhou, S. Liu, J. Siow, X. Du, and Y. Liu, "Devign: Effective vulnerability identification by learning comprehensive program semantics via graph neural networks," in Advances in Neural Information Processing Systems, 2019, pp. 10197-10207.
[6] R. Russell, L. Kim, L. Hamilton, T. Lazovich, J. Harer, O. Ozdemir, P. Ellingwood, and M. McConley, "Automated vulnerability detection in source code using deep representation learning," in Proceedings of the 17th IEEE International Conference on Machine Learning and Applications (ICMLA 2018), IEEE, 2018, pp. 757-762.
[7] Z. Bilgin, M. A. Ersoy, E. U. Soykan, E. Tomur, P. Comak, and L. Karacay, "Vulnerability Prediction From Source Code Using Machine Learning," IEEE Access, vol. 8, pp. 150672-150684, 2020, doi: 10.1109/ACCESS.2020.3016774.
[8] D. Zou, S. Wang, S. Xu, Z. Li, and H. Jin, "µVulDeePecker: A Deep Learning-Based System for Multiclass Vulnerability Detection," IEEE Transactions on Dependable and Secure Computing, vol. 18, no. 5, pp. 2224-2236, Sept.-Oct. 2021, doi: 10.1109/TDSC.2019.2942930.
[9] J. Wu, "Literature review on vulnerability detection using NLP technology," arXiv preprint arXiv:2104.11230, 2021.
[10] X. Rong, "word2vec parameter learning explained," arXiv preprint arXiv:1411.2738, 2014.
[11] N. V. Chawla, K. W. Bowyer, L. O. Hall, and W. P. Kegelmeyer, "SMOTE: Synthetic minority over-sampling technique," Journal of Artificial Intelligence Research, vol. 16, pp. 321-357, 2002.
[12] J. Wang, M. Huang, Y. Nie, and J. Li, "Static Analysis of Source Code Vulnerability Using Machine Learning Techniques: A Survey," in 2021 4th International Conference on Artificial Intelligence and Big Data (ICAIBD), 2021, pp. 76-86, doi: 10.1109/ICAIBD51990.2021.9459075.
[13] "Common Weakness Enumeration," CWE. [Online]. Available: https://cwe.mitre.org/top25/archive/2021/2021cwetop25.html. [Accessed: 23-Feb-2022].
[14] A. Assaraf, "This is what your developers are doing 75% of the time, and this is the cost you pay." [Online]. Available: https://coralogix.com/loganalytics-blog/this-is-what-your-developers-are-doing-75-of-the-timeand-this-is-the-cost-you-pay/
[15] Y. Zhuang et al., "Software Vulnerability Detection via Deep Learning over Disaggregated Code Graph Representation," arXiv preprint arXiv:2109.03341, 2021.
[16] M. T. Ribeiro, S. Singh, and C. Guestrin, "Why should I trust you?", presented at the Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2016.
[17] V. K. R. Chimmula and L. Zhang, "Time series forecasting of COVID-19 transmission in Canada using LSTM networks," Chaos, Solitons & Fractals, vol. 135, 2020, 109864.
[18] S. Liu, G. Lin, Q. L. Han, S. Wen, J. Zhang, and Y. Xiang, "DeepBalance: Deep-Learning and Fuzzy Oversampling for Vulnerability Detection," IEEE Transactions on Fuzzy Systems, vol. 28, no. 7, pp. 1329-1343, July 2020, doi: 10.1109/TFUZZ.2019.2958558.
[19] W. Zheng, A. O. A. Semasaba, X. Wu, S. A. Agyemang, T. Liu, and Y. Ge, "Representation vs. Model: What Matters Most for Source Code Vulnerability Detection," in 2021 IEEE SANER, 2021, pp. 647-653, doi: 10.1109/SANER50967.2021.00082.
[20] A. Zeroual, F. Harrou, A. Dairi, and Y. Sun, "Deep learning methods for forecasting COVID-19 time-series data: A comparative study," Chaos, Solitons & Fractals, vol. 140, 2020, 110121.
[21] Z. Li, D. Zou, J. Tang, Z. Zhang, M. Sun, and H. Jin, "A Comparative Study of Deep Learning-Based Vulnerability Detection System," IEEE Access, vol. 7, 2019, doi: 10.1109/ACCESS.2019.2930578.
[22] M. Weiser, "Program Slicing," IEEE Transactions on Software Engineering, vol. SE-10, no. 4, pp. 352-357, July 1984, doi: 10.1109/TSE.1984.5010248.
[23] M. Masum, M. A. Masud, M. I. Adnan, H. Shahriar, and S. Kim, "Comparative study of a mathematical epidemic model, statistical modeling, and deep learning for COVID-19 forecasting and management," Socio-Economic Planning Sciences, vol. 80, 2022, 101249, ISSN 0038-0121, doi: 10.1016/j.seps.2022.101249.
[24] "Joern," May 11, 2019. Accessed: May 4, 2022. [Online]. Available: https://github.com/ShiftLeftSecurity/joern
[25] Checkmarx. Accessed: May 4, 2022. [Online]. Available: https://www.checkmarx.com/
[26] F. Yamaguchi, N. Golde, D. Arp, and K. Rieck, "Modeling and Discovering Vulnerabilities with Code Property Graphs," in 2014 IEEE Symposium on Security and Privacy, 2014, pp. 590-604, doi: 10.1109/SP.2014.44.
[27] F. Tip, "A survey of program slicing techniques," Journal of Programming Languages, vol. 3, no. 3, 1995.
[28] NVD, National Vulnerability Database, 2022. Accessed: May 4, 2022. [Online]. Available: https://nvd.nist.gov/
[29] NIST Software Assurance Reference Dataset Project, 2022. Accessed: May 4, 2022. [Online]. Available: https://samate.nist.gov/SRD/index.php
[30] L. Kim and R. Russell, Draper VDISC Dataset - Vulnerability Detection in Source Code, 2021. Accessed: May 4, 2022. [Online]. Available: https://osf.io/d45bw/
[31] NDSS, Database of "VulDeePecker: A Deep Learning-Based System for Vulnerability Detection," 2018. Accessed: May 4, 2022. [Online]. Available: https://github.com/CGCL-codes/VulDeePecker
[32] Z. Li, D. Zou, S. Xu, H. Jin, Y. Zhu, and Z. Chen, "SySeVR: A Framework for Using Deep Learning to Detect Software Vulnerabilities," IEEE Transactions on Dependable and Secure Computing, 2021, doi: 10.1109/TDSC.2021.3051525.
[33] Multiclass Vulnerability Dataset (MVD), 2019. Accessed: May 4, 2022. [Online]. Available: https://github.com/muVulDeePecker/muVulDeePecker
[34] Debian - The Universal Operating System. Accessed: May 4, 2022. [Online]. Available: https://www.debian.org/
[35] GitHub - Distributed Version Control Software. Accessed: May 4, 2022. [Online]. Available: https://github.com/
[36] P. E. Black, Juliet 1.3 Test Suite: Changes From 1.2. Gaithersburg, MD, USA: US Department of Commerce, National Institute of Standards and Technology, 2018.
[37] "CVE website." Accessed: May 4, 2022. [Online]. Available: https://cve.mitre.org
[38] N. Anjum, Z. Latif, C. Lee, I. A. Shoukat, and U. Iqbal, "MIND: A Multi-Source Data Fusion Scheme for Intrusion Detection in Networks," Sensors, vol. 21, no. 14, 4941, 2021, doi: 10.3390/s21144941.
[39] https://cybersecurityworks.com/blog/cyber-risk/pegasus-spyware-snoops-on-political-figures-worldwide.html
[40] A. Ramadan, A. Bahaa, and O. Ghoneim, "A systematic review of the literature on software vulnerabilities detection using machine learning methods," Information Bulletin in Computers and Information, vol. 4, no. 1, pp. 1-9, 2022, doi: 10.21608/fcihib.2022.87660.1058.
[41] Y. Kageyama, "Toyota's Japan Production Halted Over Suspected Cyberattack," ABC News, 2022. Accessed: May 4, 2022. [Online]. Available: https://abcnews.go.com/Technology/wireStory/toyotas-japan-production-halted-suspected-cyberattack-83155113
[42] "CVE website," CVE vulnerability data, 2022. Accessed: May 4, 2022. [Online]. Available: https://www.cvedetails.com/browse-by-date.php
[43] M. J. H. Faruk, S. Santhiya, S. Hossain, V. Maria, and X. Li, "Software Engineering Process and Methodology in Blockchain-Oriented Software Development: A Systematic Study," in 20th IEEE/ACIS SERA, 2022.
[44] U. Paramita, M. J. H. Faruk, N. Mohammad, M. Mohammad, S. Hossain, U. Gias, B. Shabir, R. Akond, and A. Sheikh, "Evolution of Quantum Computing: A Systematic Survey on the Use of Quantum Computing Tools," in 1st International Conference on AI in Cybersecurity (ICAIC), 2022.
[45] D. Radjenovic, M. Hericko, R. Torkar, and A. Zivkovic, "Software fault prediction metrics: A systematic literature review," Information and Software Technology, vol. 55, no. 8, pp. 1397-1418, Aug. 2013.
[46] D. Votipka, R. Stevens, E. Redmiles, J. Hu, and M. Mazurek, "Hackers vs. Testers: A comparison of software vulnerability discovery processes," in Proc. IEEE Symposium on Security and Privacy (SP), May 2018, pp. 374-391.
[47] M. Mohammad, S. Hossain, H. Hisham, M. J. H. Faruk, V. Maria, K. Md, R. Mohammad, A. Muhaiminul, C. Alfredo, and W. Fan, "Bayesian Hyperparameter Optimization for Deep Neural Network-Based Network Intrusion Detection," 2021, doi: 10.1109/BigData52589.2021.9671576.
[48] M. Mohammad, M. J. H. Faruk, S. Hossain, Q. Kai, L. Dan, and A. Muhaiminul, "Ransomware Classification and Detection With Machine Learning Algorithms," 2022, doi: 10.1109/CCWC54503.2022.9720869.
[49] M. J. H. Faruk, S. Hossain, V. Maria, B. Farhat, S. Shahriar, K. Abdullah, W. Michael, C. Alfredo, L. Dan, R. Akond, and W. Fan, "Malware Detection and Prevention using Artificial Intelligence Techniques," 2021, doi: 10.1109/BigData52589.2021.9671434.
[50] K. Kim, D. K. Kim, J. Noh, and M. Kim, "Stable Forecasting of Environmental Time Series via Long Short Term Memory Recurrent Neural Network," IEEE Access, vol. 6, pp. 75216-75228, 2018, doi: 10.1109/ACCESS.2018.2884827.
[51] S. M. Ghaffarian and H. R. Shahriari, "Software vulnerability analysis and discovery using machine-learning and data-mining techniques: A survey," ACM Computing Surveys, vol. 50, no. 4, p. 56, Nov. 2017.
[52] R. Malhotra, "A systematic review of machine learning techniques for software fault prediction," Applied Soft Computing, vol. 27, pp. 504-518, Feb. 2015.