Systematic Analysis of Deep Learning Model For Vulnerable Code Detection
All content following this page was uploaded by Md Jobair Hossain Faruk on 15 May 2022.
The initial keyword search process had undergone a filtration procedure that only selected papers published in the last 5 years, from 2017 to 2022. Additional restraints were placed in each of the scientific databases to find relevant research material. IEEE Xplore included only Conferences and Journals, while ACM required filters specifying Journals and Research Articles. The arXiv e-Print Archive did not require any predefined filters, nor did Google Scholar. However, Google Scholar failed to provide any unique research papers related …

III. Source Code Representation

Source code representation is an essential step that decomposes the input sample of source code so that it contains only important syntactic and semantic information, removing unnecessary lines, comments, and spaces. While many representations exist, this study aims to present the state-of-the-art methods utilized to capture the structural and semantic information from the source code for feature extraction.
A. Abstract Syntax Tree (AST)

An Abstract Syntax Tree (AST) is the tree representation of source code that can capture the abstract syntactic structure and semantics of a code block, allowing the source code to be analyzed statically [17], [18]. This procedure allows the partition of the initial input source code into smaller parts and achieves greater granularity of the function-level source code [13]. An AST can be acquired by using a parser such as CodeSensor [18], [19] or Pycparser [7]. While these ASTs can be used directly, they lack granularity when the code is large or complex [16]. Thus, in [7] the AST, which can be considered an m-ary tree, is converted by enforcing rules into a complete binary AST tree that preserves the structural relations of the AST nodes. Similarly, an RNN model called Tree-LSTM has been proposed, which leverages bottom-up calculation to integrate the outputs of all AST child nodes to construct a binary AST tree [18], [20].

B. Code Gadgets

Code gadgets are a composition of multiple lines of program statements that have a semantic correlation in terms of data dependency and control dependency [3], obtained by implementing program slicing [22]. Program slicing can be categorized into two types of slices: forward and backward [3]. Forward slices are slices of code that receive an external input, such as a file or socket, whereas backward slices do not receive a direct external input from the environment in which the program is run [2], [21]. This decomposition of programs by analyzing their data flow and control flow [22] allows the reduction of the lines of code and focuses on the key points of library/API function calls, arrays, or pointers. VulDeePecker [3] accounts only for data dependency in its code gadgets [8], [21]: it uses the commercial tool Checkmarx [25] to perform forward and backward slicing for each argument, and the resulting slices are then assembled to form code gadgets. In Zagane's model [23], the dataset contains 420,627 code slices, of which 56,395 are vulnerable, and the code metrics of each slice are calculated as input for their deep learning model.

C. Code Property Graph (CPG)

A Code Property Graph (CPG) is an amalgamation of classical data structures representing a source program, i.e., the abstract syntax tree, control flow graph (CFG), and program dependence graph (PDG) [26]. The REVEAL [1] vulnerability prediction framework utilizes a modified CPG rather than the data structure presented by Yamaguchi et al. [26], adding a data-flow graph (DFG) in conjunction with the existing CPG to capture additional context about the semantics of the code. Devign [5] takes a similar CPG-based approach to source code representation by merging the concepts of AST, CFG, DFG, and Natural Code Sequence (NCS) into a joint graph.

D. Lexed Representation

Russell et al. [6] constructed a custom lexer for C/C++ code that formed a code representation with a vocabulary size of 156 tokens. This methodology included keywords, operators, and separators while excluding code irrelevant to compilation. The lexer converted the code to tokens of three different types: string, character, and float literals were lexed to type-specific placeholders, while integer literals were tokenized digit-by-digit due to their relevance to vulnerability detection. Types and function calls from common libraries are mapped to their generic versions. Zheng et al. [19] implemented this custom lexer for word-level tokenization on texts from the Draper VDISC dataset.

E. Semantics-based Vulnerability Candidates (SeVC)

Semantics-based Vulnerability Candidates (SeVCs) are the various statements that are semantically related to Syntax-based Vulnerability Candidates (SyVCs), obtained by extracting program slices [19]. In order to conceptualize SeVCs, the term SyVC needs to be explored. SyVC extraction requires converting the source code to an AST, which contains multiple consecutive tokens. The AST is traversed to locate code elements that match the defined vulnerability syntax; each match is labelled as a SyVC [4]. SyVCs are transformed into SeVCs by program slicing [22] to capture the semantic relations of statements based on data dependency and control dependency. The Joern tool [24] extracts the PDG of each SyVC; program slices are then generated from interprocedural forward and backward slices [27] and transformed into SeVCs [4].

IV. Deep Learning Models

Deep neural networks were inspired by biological aspects of the human brain and nervous system. They form networks similar to the human nervous system and mimic human thought mechanisms by training on the data provided. Deep learning has been successful in image classification and captures nonlinear effects of variables with high-level feature representations. Thus, deep learning techniques have been implemented to extract features from text (source code) and then train models to understand and detect vulnerabilities in software.

A. CNN (Convolutional Neural Network)

CNN is a deep learning model introduced mainly to analyze features in images, but it has also been implemented for feature extraction from source code to learn vector representations [12]. Furthermore, since code does not possess the extensive features present in images, a CNN can both extract features and understand the structure of a program by learning patterns and relationships between contexts in the source code [9]. In the study of Russell et al. [6], the implemented CNN had a filter size of 9 and 512 filters, paired with batch normalization and ReLU, to process the sequential tokens.

B. RNN (Recurrent Neural Network)

An RNN allows longer token dependencies to be extracted than a CNN [6], as its "memory" contains information from the previous and next tokens [9]. The RNN used by Russell et al. [6] had a hidden size of 256 and was max-pooled to generate fixed-size vector representations.
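The lexing scheme described in Section III-D can be sketched as follows. This is a loose approximation of the representation Russell et al. [6] describe, not their actual lexer; the regular expressions and placeholder names here are illustrative assumptions.

```python
import re

# Token patterns, ordered so that literals win over shorter matches.
TOKEN_RE = re.compile(r"""
    (?P<string>"(?:\\.|[^"\\])*")            # string literal -> placeholder
  | (?P<char>'(?:\\.|[^'\\])')               # character literal -> placeholder
  | (?P<float>\d+\.\d+(?:[eE][+-]?\d+)?)     # float literal -> placeholder
  | (?P<int>\d+)                             # integer literal -> per-digit tokens
  | (?P<ident>[A-Za-z_]\w*)                  # keyword / identifier
  | (?P<op>->|\+\+|--|[-+*/%=<>!&|^~(){}\[\];,.])  # operators and separators
""", re.VERBOSE)

def lex(code: str) -> list:
    """Lex C-like code: literals become placeholders, integers split per digit."""
    tokens = []
    for m in TOKEN_RE.finditer(code):
        kind = m.lastgroup
        if kind in ("string", "char", "float"):
            tokens.append("<%s>" % kind)      # type-specific placeholder
        elif kind == "int":
            tokens.extend(m.group())          # one token per digit
        else:
            tokens.append(m.group())
    return tokens
```

For example, `lex('x = 42;')` yields one token per digit of the integer literal, while string and float literals collapse to single placeholder tokens, keeping the vocabulary small.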
LSTM is an extended version of the RNN architecture that can learn long-term dependencies. LSTM can overcome the limitations of a conventional statistical model (such as ARIMA) by capturing the nonlinearity of sequential data while simultaneously generating more precise forecasts for time-series data [11]. LSTM also addresses the vanishing gradient problem of RNNs [50]. The building block of the LSTM architecture is a memory block, which consists of a memory cell that can preserve information from the preceding time step through self-recurrent connections. BLSTM is a variation of the LSTM deep learning model that builds upon the one-way model by pairing a forward LSTM with a backward LSTM, i.e., a bidirectional (two-way) LSTM. µVulDeePecker [8] implemented a multi-feature fusion method with BLSTM that extracts information from a global-feature learning model, a local-feature learning model, and a feature-fusion model.

GRU is an alternative version of LSTM that was introduced to avoid the vanishing gradient problem and boost the efficiency of LSTM [17]. GRU has a less complicated architecture than LSTM, with a reduced number of parameters to learn. It consists of two gates, an update gate and a reset gate (in comparison, LSTM includes three gates). GRU mitigates the vanishing gradient problem of RNNs by leveraging the two gates to control what information should be passed to future states [14]. The input and forget gates of LSTM are integrated into the GRU update gate, which determines how much information from previous steps should be passed to future time steps. Similarly to the output gate in LSTM, the reset gate in GRU incorporates the new input and the previous memory, and determines how much of the past information should be forgotten.

V. Datasets

The datasets observed in this field of research when implementing deep learning models are source code in the C/C++ language. These datasets are popular because in-depth reviews have quantified the amount of vulnerable and malicious code present in each dataset, as shown in Figure 2.

Fig. 2. Overview of Datasets

A. NVD and SARD

The Software Assurance Reference Dataset (SARD) [28] contains production, synthetic, and academic security flaws or vulnerabilities, and the National Vulnerability Database (NVD) [29] contains vulnerabilities in production software [3]. The dataset is made up of C/C++ programs and software products [4] and contains 61,638 code gadgets, of which 17,725 are vulnerable and 43,913 are not vulnerable [31], as shown in Figure 5. Among the 17,725 vulnerable code gadgets, 10,440 correspond to buffer error vulnerabilities (CWE-119) and the remaining 7,285 correspond to resource management error vulnerabilities (CWE-399) [3]. The dataset is named the Code Gadget Database (CGD) [31], and its extensions are the Semantics-based Vulnerability Candidate (SeVC) dataset [32] (used in SySeVR [4]) and the Multiclass Vulnerability Dataset (MVD) [33] (used in µVulDeePecker [8]).

B. Draper VDISC

The dataset consists of the source code of 1.27 million functions mined from open-source software, labelled by static analysis for potential vulnerabilities [30]. The Draper VDISC dataset has C/C++ code from open-source projects: the Debian Linux distribution [34], public Git repositories on GitHub [35], and the SATE IV Juliet Test Suite of the NIST SAMATE project [36], where the first two source projects are real and the last one is synthetic. The Debian package releases provide a selection of very well-managed and curated code, while the GitHub dataset provides a larger quantity and wider variety of (often lower-quality) code, and the SATE IV Juliet Test Suite contains synthetic code examples with vulnerabilities from 118 different Common Weakness Enumerations (CWEs) [6], [7].

C. REVEAL

ReVeal is a real-world source code dataset in which vulnerabilities are tracked from the Linux Debian kernel and Chromium open-source projects. The dataset contains C/C++ sources from well-maintained public projects with long evolutionary histories; both represent important program domains (operating systems and browsers) with plenty of publicly available vulnerability reports [1]. Fixed issues with publicly available patches can be collected using Bugzilla for Chromium and the Debian security tracker for the Linux Debian kernel. The ReVeal dataset contains a total of 22,734 samples, with 2,240 vulnerable and 20,494 non-vulnerable samples, as seen in Figure 3.

D. FFMPeg+Qemu

FFMPeg+Qemu is a balanced, real-world dataset collected from GitHub repositories consisting of 4 large C-language open-source projects that are popular among developers and diversified in functionality, i.e., the Linux kernel, QEMU, Wireshark, and FFmpeg [5]. The labelling of this dataset was done manually by domain experts based on commit messages [15]. Figure 5 presents the balanced nature of the dataset, with 23,355 non-vulnerable and 25,332 vulnerable commit messages.
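The class balance these counts imply can be computed directly. The snippet below is a small illustrative script using the figures quoted in this survey; for ReVeal, the 2,240-sample minority class is taken as the vulnerable one, in line with the severe imbalance reported by Chakraborty et al. [1].

```python
# (vulnerable, non-vulnerable) counts as quoted in this survey
datasets = {
    "CGD (SARD & NVD)": (17_725, 43_913),
    "ReVeal": (2_240, 20_494),
    "FFMPeg+Qemu": (25_332, 23_355),
}

for name, (vuln, non_vuln) in datasets.items():
    ratio = vuln / (vuln + non_vuln)
    print(f"{name}: {ratio:.1%} vulnerable")
```

Only FFMPeg+Qemu is close to balanced; the roughly 10% vulnerable fraction in ReVeal is the imbalance problem revisited in Section VI.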
TABLE III
Deep Learning Models for Vulnerability Analysis

Author | Dataset | Dataset Type | Feature Representation | Source Code Representation | Vector Representation | DL Models
Russell et al. [6] | Draper VDISC | Synthetic, semi-synthetic, real | Token | NLP approach (convolutional and recurrent feature extraction) | word2vec | CNN + RF, RNN + RF
S. Chakraborty et al. [1] | ReVeal, FFMPeg+Qemu | Real | Graph | CPG | word2vec | GGNN + MLP + triplet loss
G. Tang et al. [2] | SARD & NVD | Synthetic, semi-synthetic | Token | Code gadgets | doc2vec | KELM
Z. Bilgin et al. [7] | Draper VDISC | Synthetic, semi-synthetic, real | Token | Binary AST | Array representation | MLP, CNN
Z. Li et al. [3] | SARD & NVD | Synthetic, semi-synthetic | Token | Code gadgets | word2vec | BLSTM
D. Zou et al. [8] | SARD & NVD | Synthetic, semi-synthetic | Token | Code gadgets + code attention | word2vec | BLSTM
Z. Li et al. [4] | SARD & NVD | Synthetic, semi-synthetic | Token | SeVCs | word2vec | BGRU
Guo et al. [18] | SARD & NVD | Synthetic, semi-synthetic | Token | Code gadgets | word2vec | CNN + LSTM
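Several of the models in Table III are gated recurrent networks (BLSTM, BGRU). The update/reset gate mechanics described in Section IV can be sketched for a single scalar GRU cell as follows; the weights are arbitrary illustrative values, and this is one common formulation of the GRU equations, not the implementation of any surveyed model.

```python
import math

def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))

def gru_cell(x: float, h: float, w: dict) -> float:
    """One scalar GRU step: gates decide what to keep and what to overwrite."""
    z = sigmoid(w["Wz"] * x + w["Uz"] * h)               # update gate
    r = sigmoid(w["Wr"] * x + w["Ur"] * h)               # reset gate
    h_cand = math.tanh(w["Wh"] * x + w["Uh"] * (r * h))  # candidate state
    # Blend old state and candidate (one common convention).
    return (1.0 - z) * h + z * h_cand

# Illustrative weights, not learned values.
weights = {"Wz": 0.5, "Uz": 0.5, "Wr": 0.5, "Ur": 0.5, "Wh": 1.0, "Uh": 1.0}
h = 0.0
for x in [1.0, 1.0, 0.0]:  # a toy token-score sequence
    h = gru_cell(x, h, weights)
```

A BGRU, as used in SySeVR [4], runs two such recurrences over the token sequence, one forward and one backward, and concatenates their hidden states.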
VI. Challenges and Future Work

This review acknowledges the infancy of the research field of deep learning-based software vulnerability detection. Multiple problems remain unresolved, which indicates the need for further research to enable better discovery of vulnerable code. Thus, we have compiled a number of challenges and possible future research directions, assessed from the concluding remarks of previous works.

The dataset is an integral part of training a vulnerability prediction model. As surveyed in this paper, various studies utilize different datasets, such as SARD & NVD or the ReVeal dataset, to train their models, which indicates the lack of a standardized benchmarking dataset that covers most CWE vulnerabilities. As a result, a unified and standardized metric for evaluating deep learning-based models cannot be produced. Furthermore, deep learning models require huge amounts of training data to provide excellent performance, but the current datasets are insufficient in this regard. Thus, the development of vulnerability datasets with ground truth is an essential direction for research. The ratio of vulnerable to non-vulnerable code also tends to be staggeringly unbalanced, as seen in Figure 5. Non-vulnerable code is abundant while vulnerable code is a small minority, which leaves deep learning vulnerability prediction models with insufficient training data for detecting vulnerabilities and causes overfitting in the model. Class imbalance also exists among specific CWE vulnerabilities: for example, CWE-469 is present only in small numbers in the Draper VDISC dataset, leading to poor prediction, whereas CWE-119 exists in multitude, leading to the best prediction for this vulnerability during cross-validation [7].

Deep learning models possess nonlinearity and hidden layers that make it difficult to interpret the behaviour that leads to a vulnerability prediction, which begs the following questions: Is the model accurate? Is the prediction on vulnerability discovery reliable? What is the reason for classifying a specific piece of source code as vulnerable or non-vulnerable? Researchers have tackled this problem with two methods. The first is the use of LIME [10] to create linear models of the neural networks for simple interpretation. The second is the introduction of code attention into the source code representation [8], which enables researchers to obtain the attention vectors and quantitatively measure how much attention was given to a specific network among all the neural networks in the deep learning model. Further research can lead to better source code analysis that can explain deep learning-based models in vulnerability prediction.

VII. Conclusion

The emergence of vast software applications with increasing complexity, and the continued popularity of software in the information era, will require automatic deep learning-based models to learn and detect vulnerabilities in this software. In this survey, we review the studies conducted on the implementation of deep learning technology for source code vulnerability analysis. First, a general overview is given of the formation of deep learning models for source code vulnerability detection. Then, a detailed summary of state-of-the-art source code representations is explored; these representations decompose the text of source code while conserving its behaviour, so that it can be fed into deep learning models for training. Next, the deep learning models implemented for vulnerability detection are outlined with brief examples of the hidden layers used in the trained models. Lastly, based on the existing studies discussed in this paper, the challenges and directions for future work in this research field are stated.

Acknowledgment

The work is partially supported by U.S. National Science Foundation (NSF) Awards 2100134, 2100115, 1723578, and 1723586, and a SunTrust Fellowship Award. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the NSF and SunTrust.

References

[1] S. Chakraborty, R. Krishna, Y. Ding, and B. Ray, "Deep Learning Based Vulnerability Detection: Are We There Yet," IEEE Transactions on Software Engineering, doi: 10.1109/TSE.2021.3087402.
[2] T. Gaigai, Y. Lin, R. Shuangyin, M. Lianxiao, Y. Feng, and W. Huiqiang, "An Automatic Source Code Vulnerability Detection Approach Based on KELM," Security and Communication Networks, vol. 2021, pp. 1-12, 2021, doi: 10.1155/2021/5566423.
[3] Z. Li et al., "VulDeePecker: A Deep Learning-Based System for Vulnerability Detection," in Proceedings of the 2018 Network and Distributed System Security Symposium, 2018.
[4] Z. Li, D. Zou, S. Xu, H. Jin, Y. Zhu, Z. Chen, S. Wang, and J. Wang, "SySeVR: A Framework for Using Deep Learning to Detect Software Vulnerabilities," arXiv preprint arXiv:1807.06756, 2018.
[5] Y. Zhou, S. Liu, J. Siow, X. Du, and Y. Liu, "Devign: Effective vulnerability identification by learning comprehensive program semantics via graph neural networks," in Advances in Neural Information Processing Systems, 2019, pp. 10197-10207.
[6] R. Russell, L. Kim, L. Hamilton, T. Lazovich, J. Harer, O. Ozdemir, P. Ellingwood, and M. McConley, "Automated vulnerability detection in source code using deep representation learning," in Proceedings of the 17th IEEE International Conference on Machine Learning and Applications (ICMLA 2018), IEEE, 2018, pp. 757-762.
[7] Z. Bilgin, M. A. Ersoy, E. U. Soykan, E. Tomur, P. Comak, and L. Karacay, "Vulnerability Prediction From Source Code Using Machine Learning," IEEE Access, vol. 8, pp. 150672-150684, 2020, doi: 10.1109/ACCESS.2020.3016774.
[8] D. Zou, S. Wang, S. Xu, Z. Li, and H. Jin, "µVulDeePecker: A Deep Learning-Based System for Multiclass Vulnerability Detection," IEEE Transactions on Dependable and Secure Computing, vol. 18, no. 5, pp. 2224-2236, Sept.-Oct. 2021, doi: 10.1109/TDSC.2019.2942930.
[9] J. Wu, "Literature review on vulnerability detection using NLP technology," arXiv preprint arXiv:2104.11230, 2021.
[10] X. Rong, "word2vec parameter learning explained," arXiv preprint arXiv:1411.2738, 2014.
[11] N. V. Chawla, K. W. Bowyer, L. O. Hall, and W. P. Kegelmeyer, "SMOTE: Synthetic minority over-sampling technique," Journal of Artificial Intelligence Research, vol. 16, pp. 321-357, 2002.
[12] J. Wang, M. Huang, Y. Nie, and J. Li, "Static Analysis of Source Code Vulnerability Using Machine Learning Techniques: A Survey," in 2021 4th International Conference on Artificial Intelligence and Big Data (ICAIBD), 2021, pp. 76-86, doi: 10.1109/ICAIBD51990.2021.9459075.
[13] "Common Weakness Enumeration," CWE. [Online]. Available: https://cwe.mitre.org/top25/archive/2021/2021cwetop25.html. [Accessed: 23-Feb-2022].
[14] A. Assaraf, "This is what your developers are doing 75% of the time, and this is the cost you pay." [Online]. Available: https://coralogix.com/loganalytics-blog/this-is-what-your-developers-are-doing-75-of-the-timeand-this-is-the-cost-you-pay/
[15] Y. Zhuang et al., "Software Vulnerability Detection via Deep Learning over Disaggregated Code Graph Representation," arXiv preprint arXiv:2109.03341, 2021.
[16] M. T. Ribeiro, S. Singh, and C. Guestrin, "Why should I trust you?", presented at the Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2016.
[17] V. K. R. Chimmula and L. Zhang, "Time series forecasting of COVID-19 transmission in Canada using LSTM networks," Chaos, Solitons & Fractals, vol. 135, 2020, 109864.
[18] S. Liu, G. Lin, Q. L. Han, S. Wen, J. Zhang, and Y. Xiang, "DeepBalance: Deep-Learning and Fuzzy Oversampling for Vulnerability Detection," IEEE Transactions on Fuzzy Systems, vol. 28, no. 7, pp. 1329-1343, July 2020, doi: 10.1109/TFUZZ.2019.2958558.
[19] W. Zheng, A. O. A. Semasaba, X. Wu, S. A. Agyemang, T. Liu, and Y. Ge, "Representation vs. Model: What Matters Most for Source Code Vulnerability Detection," in 2021 IEEE SANER, 2021, pp. 647-653, doi: 10.1109/SANER50967.2021.00082.
[20] A. Zeroual, F. Harrou, A. Dairi, and Y. Sun, "Deep learning methods for forecasting COVID-19 time-series data: A comparative study," Chaos, Solitons & Fractals, vol. 140, 2020, 110121.
[21] Z. Li, D. Zou, J. Tang, Z. Zhang, M. Sun, and H. Jin, "A Comparative Study of Deep Learning-Based Vulnerability Detection System," IEEE Access, vol. 7, 2019, doi: 10.1109/ACCESS.2019.2930578.
[22] M. Weiser, "Program Slicing," IEEE Transactions on Software Engineering, vol. SE-10, no. 4, pp. 352-357, July 1984, doi: 10.1109/TSE.1984.5010248.
[23] M. Masum, M. A. Masud, M. I. Adnan, H. Shahriar, and S. Kim, "Comparative study of a mathematical epidemic model, statistical modeling, and deep learning for COVID-19 forecasting and management," Socio-Economic Planning Sciences, vol. 80, 2022, 101249, ISSN 0038-0121, doi: 10.1016/j.seps.2022.101249.
[24] "Joern," May 11, 2019. Accessed: May 4, 2022. [Online]. Available: https://github.com/ShiftLeftSecurity/joern
[25] Checkmarx. Accessed: May 4, 2022. [Online]. Available: https://www.checkmarx.com/
[26] F. Yamaguchi, N. Golde, D. Arp, and K. Rieck, "Modeling and Discovering Vulnerabilities with Code Property Graphs," in 2014 IEEE Symposium on Security and Privacy, 2014, pp. 590-604, doi: 10.1109/SP.2014.44.
[27] F. Tip, "A survey of program slicing techniques," Journal of Programming Languages, vol. 3, no. 3, 1995.
[28] NVD, National Vulnerability Database, 2022. Accessed: May 4, 2022. [Online]. Available: https://nvd.nist.gov/
[29] NIST Software Assurance Reference Dataset Project, 2022. Accessed: May 4, 2022. [Online]. Available: https://samate.nist.gov/SRD/index.php
[30] L. Kim and R. Russell, Draper VDISC Dataset - Vulnerability Detection in Source Code, 2021. Accessed: May 4, 2022. [Online]. Available: https://osf.io/d45bw/
[31] NDSS, Database of "VulDeePecker: A Deep Learning-Based System for Vulnerability Detection," 2018. Accessed: May 4, 2022. [Online]. Available: https://github.com/CGCL-codes/VulDeePecker
[32] Z. Li, D. Zou, S. Xu, H. Jin, Y. Zhu, and Z. Chen, "SySeVR: A Framework for Using Deep Learning to Detect Software Vulnerabilities," IEEE Transactions on Dependable and Secure Computing, 2021, doi: 10.1109/TDSC.2021.3051525.
[33] Multiclass Vulnerability Dataset (MVD), 2019. Accessed: May 4, 2022. [Online]. Available: https://github.com/muVulDeePecker/muVulDeePecker
[34] Debian - The Universal Operating System. Accessed: May 4, 2022. [Online]. Available: https://www.debian.org/
[35] GitHub - Distributed Version Control Software. Accessed: May 4, 2022. [Online]. Available: https://github.com/
[36] P. E. Black, Juliet 1.3 Test Suite: Changes From 1.2. Gaithersburg, MD, USA: US Department of Commerce, National Institute of Standards and Technology, 2018.
[37] "CVE website." Accessed: May 4, 2022. [Online]. Available: https://cve.mitre.org
[38] N. Anjum, Z. Latif, C. Lee, I. A. Shoukat, and U. Iqbal, "MIND: A Multi-Source Data Fusion Scheme for Intrusion Detection in Networks," Sensors, vol. 21, no. 14, 4941, 2021, doi: 10.3390/s21144941.
[39] https://cybersecurityworks.com/blog/cyber-risk/pegasus-spyware-snoops-on-political-figures-worldwide.html
[40] A. Ramadan, A. Bahaa, and O. Ghoneim, "A systematic review of the literature on software vulnerabilities detection using machine learning methods," Information Bulletin in Computers and Information, vol. 4, no. 1, pp. 1-9, 2022, doi: 10.21608/fcihib.2022.87660.1058.
[41] Y. Kageyama, "Toyota's Japan Production Halted Over Suspected Cyberattack," ABC News, 2022. Accessed: May 4, 2022. [Online]. Available: https://abcnews.go.com/Technology/wireStory/toyotas-japan-production-halted-suspected-cyberattack-83155113
[42] "CVE website," CVE vulnerability data, 2022. Accessed: May 4, 2022. [Online]. Available: https://www.cvedetails.com/browse-by-date.php
[43] M. J. H. Faruk, S. Santhiya, S. Hossain, V. Maria, and X. Li, "Software Engineering Process and Methodology in Blockchain-Oriented Software Development: A Systematic Study," in 20th IEEE/ACIS SERA, 2022.
[44] U. Paramita, M. J. H. Faruk, N. Mohammad, M. Mohammad, S. Hossain, U. Gias, B. Shabir, R. Akond, and A. Sheikh, "Evolution of Quantum Computing: A Systematic Survey on the Use of Quantum Computing Tools," in 1st International Conference on AI in Cybersecurity (ICAIC), 2022.
[45] D. Radjenovic, M. Hericko, R. Torkar, and A. Zivkovic, "Software fault prediction metrics: A systematic literature review," Information and Software Technology, vol. 55, no. 8, pp. 1397-1418, Aug. 2013.
[46] D. Votipka, R. Stevens, E. Redmiles, J. Hu, and M. Mazurek, "Hackers vs. Testers: A comparison of software vulnerability discovery processes," in Proc. IEEE Symposium on Security and Privacy (SP), May 2018, pp. 374-391.
[47] M. Mohammad, S. Hossain, H. Hisham, M. J. H. Faruk, V. Maria, K. Md, R. Mohammad, A. Muhaiminul, C. Alfredo, and W. Fan, "Bayesian Hyperparameter Optimization for Deep Neural Network-Based Network Intrusion Detection," 2021, doi: 10.1109/BigData52589.2021.9671576.
[48] M. Mohammad, M. J. H. Faruk, S. Hossain, Q. Kai, L. Dan, and A. Muhaiminul, "Ransomware Classification and Detection With Machine Learning Algorithms," 2022, doi: 10.1109/CCWC54503.2022.9720869.
[49] M. J. H. Faruk, S. Hossain, V. Maria, B. Farhat, S. Shahriar, K. Abdullah, W. Michael, C. Alfredo, L. Dan, R. Akond, and W. Fan, "Malware Detection and Prevention using Artificial Intelligence Techniques," 2021, doi: 10.1109/BigData52589.2021.9671434.
[50] K. Kim, D. K. Kim, J. Noh, and M. Kim, "Stable Forecasting of Environmental Time Series via Long Short Term Memory Recurrent Neural Network," IEEE Access, vol. 6, pp. 75216-75228, 2018, doi: 10.1109/ACCESS.2018.2884827.
[51] S. M. Ghaffarian and H. R. Shahriari, "Software vulnerability analysis and discovery using machine-learning and data-mining techniques: A survey," ACM Computing Surveys, vol. 50, no. 4, p. 56, Nov. 2017.
[52] R. Malhotra, "A systematic review of machine learning techniques for software fault prediction," Applied Soft Computing, vol. 27, pp. 504-518, Feb. 2015.