ISeCure: The Int'l Journal of Information Security
Manuscript template of ISeCure Journal (pp. 1–8)
http://www.isecure-journal.org
ARTICLE INFO

Keywords: Artificial Intelligence-based Code Review, ChatGPT Model, Common Weakness Enumeration, Static Application Security Testing, Vulnerability Detection

Abstract

In recent years, artificial intelligence has grown conspicuously in almost every aspect of life. One of the most applicable areas is security code review, for which many AI-based tools and approaches have been proposed. Recently, ChatGPT has attracted a great deal of attention with its remarkable performance in following instructions and providing detailed responses. Given the similarities between natural language and code, in this paper we study the feasibility of using ChatGPT for vulnerability detection in Python source code. Toward this goal, we feed an appropriate prompt along with vulnerable data to ChatGPT and compare its results on two datasets with the results of three widely used Static Application Security Testing (SAST) tools (Bandit, Semgrep, and SonarQube). We carry out different kinds of experiments with ChatGPT, and the results indicate that ChatGPT reduces the false positive and false negative rates and has the potential to be used for Python source code vulnerability detection.

© 2023 ISC. All rights reserved.
effort are required by a security expert to validate the findings of the SAST tool. Moreover, this increases the rate of human error, which may lead to some vulnerabilities being overlooked. On the other hand, a high false negative rate can lead to catastrophic events.

In recent years, Machine Learning (ML) and deep learning have made remarkable advances in various areas such as natural language processing [10, 11]. Therefore, considering the high similarity between code and natural languages, deep learning-based models are expected to be successful in code processing tasks. Likewise, studies in this area have shown the interest of researchers in applying deep learning techniques to vulnerability detection [12, 13]. Machine learning models can automatically learn the patterns of software vulnerabilities from datasets. Furthermore, research indicates that ML models produce fewer false positives than SAST tools [6, 14]. A recent study has shown the superior performance of deep learning-based models over three open-source tools for C/C++, reducing false positive and false negative rates at the same time [15].

Recently, ChatGPT, an AI-powered chatbot that uses Natural Language Processing (NLP) and machine learning algorithms to understand and respond to user inquiries, has drawn a lot of attention. It can save time and resources by automating tasks that would otherwise require human intervention. An important point to note is that ChatGPT has been trained on a huge amount of data up to 2021, so it can be a great help in finding known patterns across thousands of packages in an automated way. The model is also trained on a large amount of code and is thus able to recognize common patterns. In this paper, we evaluate the performance of ChatGPT in identifying security vulnerabilities in Python code and compare the results with three well-known SAST tools for Python vulnerability detection (Bandit, Semgrep, and SonarQube). We chose Python because in 2022 it was ranked among the most popular programming languages, along with Java, by the Popularity of Programming Language Index (PYPL) and IEEE reports; Stackscale ranked Python third [16]. Although Python is mainly used in machine learning and data science, its applications are not limited to these fields, and with its popular web frameworks such as Django and Flask, it is prone to vulnerabilities.

The rest of the paper is organized as follows: In Section 2, we provide a brief literature review of this area. Section 3 is dedicated to the datasets we used. Section 4 provides the details of the experiments we performed with ChatGPT. In Section 5, we present the evaluation and the analysis of the obtained results 2. In Section 6 we discuss some factors that may threaten the validity of the results. Finally, Section 7 concludes the paper.

2 https://github.com/abakhshandeh/ChatGPTasSAST.git

2 Related Work

In this section, we review some of the works that have used different kinds of AI models for vulnerability detection. Note that we do not cover works that propose models for repairing the identified vulnerabilities. In this area, the main idea is supervised learning. Accordingly, various machine learning models relied on feature engineering, using features such as the number of lines of code, code complexity, and the number of operations, and also utilized textual features [15, 17]. In general, research shows that text-based models perform better than feature-engineering approaches, and studies also report that machine learning models outperform existing SAST tools.

Recently, more research has been devoted to deep learning. In this scope, researchers have often used deep learning models such as Convolutional Neural Networks (CNN), Long Short-Term Memory (LSTM), and Multilayer Perceptrons (MLP) [13, 18–20]. Some of the models were based on different kinds of code property graphs and used Graph Neural Networks [13, 14], while others relied only on tokens [20]. A new study has investigated how deep learning models behave in vulnerability detection tasks. Its results reveal several points: first, the results of different models are not consistent with each other; second, fine-tuned models show better performance in this field; third, around 1000 samples per class are usually enough to train a neural network; and finally, models usually rely on the same features for prediction [21]. Although earlier studies supported the superiority of graph-based models, a newer study indicates the superior performance of transformer-based models over graph-based ones [22]. In 2022, Hanif and Maffeis proposed a model named VulBERTa [23]. This model is based on RoBERTa and is used for vulnerability detection in C/C++ code. Another recent study has used the BERT architecture and CodeBERT vectors for predicting code vulnerabilities; its results confirm the superiority of transformer-based models over both traditional deep learning models and graph-based models [24]. Overall, it seems that transformer-based models are effective in this area. Another recent work evaluated ChatGPT as a large language model for detecting vulnerabilities in Java source code, compared the results with a dummy classifier, and achieved no better results than it [25]. However, there is still no academic study comparing the results of the ChatGPT model with traditional SAST tools for Python, and this paper aims to answer the question of whether the ChatGPT model outperforms SAST tools.

3 Datasets Description

In this section, we provide the details of our dataset and the labels we used. Our dataset consists of 156 Python code files. Of these, 130 files come from the SecurityEval dataset proposed in [26]. As its authors mention, these 130 files cover 75 vulnerability types mapped to the Common Weakness Enumeration (CWE). The remaining 26 files belong to a project called PyT, in which the author developed a tool for Python code vulnerability detection and used these 26 vulnerable code files to evaluate it [27, 28]. Since the source datasets do not specify the vulnerable line, a security expert on our team rechecked the data and, for each file, identified the line of code corresponding to the assigned CWE labels. The datasets' information and the distribution of their corresponding labels are presented in Table A.1 in Appendix A.

4 Working with ChatGPT API

In this section, we describe the process of using the ChatGPT model API to identify vulnerabilities. In this study, we used the GPT-3.5-Turbo model. Unlike the previous version, which only allowed a single text prompt, GPT-3.5-Turbo accepts a series of messages as input. This capability provides some interesting features, such as the ability to store prior responses or to query with a predefined set of instructions and context, which is likely to improve the generated response. GPT-3.5-Turbo is a superior option compared to the GPT-3 model, as it offers better performance across all aspects while being 10 times more cost-effective per token. We performed four kinds of experiments with the GPT-3.5-Turbo model.

(1) In our first experiment, we give the model the vulnerable files and ask whether they contain any security vulnerabilities, without specifying the corresponding CWEs. We ask the model to return only the line number of the vulnerability, if any, and then compare these lines with the ground truth labels. In effect, this experiment is a binary classification.

(2) In our second experiment, we provide the list of the corresponding CWEs and ask the model to find the vulnerabilities from the labels' list in the vulnerable Python file. In this experiment, we ask the model to respond in JSON format, like [{"label": "CWE-X", "line of Code": "line no."}], so that we can compare our results with those of the SAST tools.

(3) In the third experiment, for each vulnerable file, we give the model all the labels returned by the Bandit, Semgrep and SonarQube tools for that Python code as the classes that ChatGPT should use, and then ask the model whether the file contains any of those vulnerabilities. The main difference from our second experiment is that we specify the classes per vulnerable file separately. In other words, we use the model as an assistant to the SAST tools, verifying the vulnerabilities they detect. In this experiment, we use the same JSON format as in the second experiment for the responses. Note that although we provide the labels' list beforehand for each vulnerable file, in some cases the model returned a new CWE that is not among its input labels. This is natural behavior for a language model, and to address it in our evaluation we consider two cases: in one case, we ignore the new labels and calculate the metrics without them, a policy which can reduce the number of false positives of the SAST tools; in the other case, we consider them as well, and this time the number of false negatives may decrease.

(4) In our fourth experiment, we do not provide any label list; we ask the model to detect the vulnerabilities in the files and determine their corresponding CWEs from its own trained knowledge. Here, the responses use the same JSON structure as in the previous experiments.

To use the model in our experiments, we put all the vulnerable Python files of our dataset in a directory and called the GPT-3.5 API with an optimized prompt for each file. The choice of prompt is the most challenging task in this process, as it directly affects the results the model provides. We optimized our prompts according to [29]. Table 2 lists the prompts we used for each experiment.

4.1 Parameters

The parameters of the experiment are the prompt, which contains the instructions the model will execute, and a parameter called temperature, which determines the randomness of the model's response. The temperature can take values between 0 and 6, with 6 giving the most random output. Because we want to assign labels to a piece of code, our task is deterministic in nature, so we set the temperature to zero to get the most stable behavior.
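The querying loop described above (one API call per vulnerable file, temperature fixed at zero) can be sketched as follows. This is a minimal illustration using only the Python standard library; the helper names are ours, not the authors' code, and the endpoint and payload shape follow the public Chat Completions API:

```python
import json
import os
import urllib.request

API_URL = "https://api.openai.com/v1/chat/completions"

def build_request(code, prompt_template):
    """Build the JSON body for one query: gpt-3.5-turbo, temperature 0."""
    return {
        "model": "gpt-3.5-turbo",
        "temperature": 0,  # deterministic labeling, as in Section 4.1
        "messages": [
            {"role": "user", "content": prompt_template.format(code=code)}
        ],
    }

def query(code, prompt_template, api_key):
    """Send one vulnerable file to the model and return its text reply."""
    data = json.dumps(build_request(code, prompt_template)).encode("utf-8")
    req = urllib.request.Request(
        API_URL,
        data=data,
        headers={
            "Authorization": "Bearer " + api_key,
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(req) as resp:
        reply = json.load(resp)
    return reply["choices"][0]["message"]["content"]

def scan_directory(path, prompt_template, api_key):
    """Query the model once per .py file in the dataset directory."""
    results = {}
    for name in sorted(os.listdir(path)):
        if name.endswith(".py"):
            with open(os.path.join(path, name), encoding="utf-8") as f:
                results[name] = query(f.read(), prompt_template, api_key)
    return results
```

The prompt template passed in would be one of the per-experiment prompts of Table 2, with `{code}` standing in for the vulnerable file's contents.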
Another important parameter is the prompt, which is very influential on the results; adjusting it to get the best results is a challenging task. We used the prompts given in Table 2, chosen based on [29]. According to [29], some key points for creating an optimized prompt are: using delimiters such as triple quotes or triple backticks to mark the piece of code; asking for structured output such as JSON or HTML; specifying the steps needed to complete a task clearly; and instructing the model to work out its own solution before rushing to a conclusion. To show the sensitivity of ChatGPT to its prompts, Table 1 gives an example in which a prompt with a minor modification is given to the model and the model responds with different answers, the second response containing one less vulnerability than the first.

Table 1. Comparison of the responses of two prompts with a minor change

Prompt 1: find all the vulnerabilities with the CWE standard in the python code which is delimited with triple backticks. also give the line of the vulnerability in the code. python code: '''{Vul_code}''' Format your response as a list of JSON objects with "label" and "line of Code" as the keys for each element. only answer with JSON.

Response 1: [ { "label": "CWE-78: Improper Neutralization of Special Elements used in an OS Command ('OS Command Injection')", "line of Code": "command = 'echo' + param + ' >> ' + 'menu.txt'" }, { "label": "CWE-78: Improper Neutralization of Special Elements used in an OS Command ('OS Command Injection')", "line of Code": "subprocess.call(command, shell=True)" }, { "label": "CWE-78: Improper Neutralization of Special Elements used in an OS Command ('OS Command Injection')", "line of Code": "subprocess.call('echo Menu: > menu.txt', shell=True)" } ]

Prompt 2: find all the vulnerabilities with the CWE standard in the python code. also give the line of the vulnerability in the code. python code: '''{Vul_code}''' Format your response as a list of JSON objects with "label" and "line of Code" as the keys for each element. only answer with JSON.

Response 2: [ { "label": "Command Injection", "line of Code": "subprocess.call(command, shell=True)" }, { "label": "Command Injection", "line of Code": "subprocess.call('echo Menu: > menu.txt', shell=True)" } ]

Table 2. GPT-3.5 prompts used. Vul_code refers to the vulnerable code; labels1 is the list of labels of all vulnerable files, and labels2 is the list of labels of each vulnerable file, iterated through a loop.

Experiment 1: You will be provided with a python code delimited by triple backticks. If it contains any security vulnerability, identify the lines of vulnerable code and only write the line in quotation. If the code does not contain a vulnerability, then simply write None. python code: '''{Vul_code}'''

Experiment 2: Which of the following vulnerabilities from list of vulnerabilities exist in the python code which is delimited with triple backticks. also give the line of the vulnerability in the code. python code: '''{Vul_code}''' list of vulnerabilities: {", ".join(labels1)} Format your response as a list of JSON objects with "label" and "line of Code" as the keys for each element. only answer with JSON.

Experiment 3: Which of the following vulnerabilities from list of vulnerabilities exist in the python code which is delimited with triple backticks. also give the line of the vulnerability in the code. python code: '''{Vul_code}''' list of vulnerabilities: {", ".join(labels2)} Format your response as a list of JSON objects with "label" and "line of Code" as the keys for each element. only answer with JSON.

Experiment 4: Your task is to determine whether the following python code, which is delimited with triple backticks, is vulnerable or not. Identify the following items: - CWE of its vulnerabilities. - lines of vulnerable code. Format your response as a list of JSON objects with "label" and "line of Code" as the keys for each vulnerability. If the information isn't present, use "unknown" as the value. Make your response as short as possible and only answer with JSON. python code: '''{Vul_code}'''

5 Results

In this section, we present the results of our experiments. First we explain the metrics we use for evaluation, and then we present the GPT-3.5 results and compare them with three popular SAST tools for Python vulnerability detection. To be precise, we perform the following actions: we give the dataset of 156 vulnerable Python files to the Bandit, Semgrep and SonarQube SAST tools, and we also query the ChatGPT model on the same dataset using the appropriate prompts. We then calculate the following metrics for each tool's results and for the model's results, based on our ground truth labels. Finally, we compare the tools' results with those of the GPT-3.5 model.

5.1 Evaluation Metrics

In classification, the condition positive is the number of real positive cases in the data; similarly, the condition negative is the number of real negative cases.
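As an illustration of how the model's JSON responses can be scored against the ground-truth labels, the following sketch (the helper names are ours; it assumes each reply is either the string None, as in Experiment 1, or the requested list of {"label", "line of Code"} objects, and that ground truth for a file is a set of (CWE, line) pairs) computes per-file precision, recall, and F1:

```python
import json

def parse_response(reply_text):
    """Parse a model reply into a set of (label, line-of-code) pairs.

    A reply of "None" (Experiment 1, no vulnerability found) yields the
    empty set; otherwise the reply is the JSON list asked for in Table 2.
    """
    reply_text = reply_text.strip()
    if reply_text == "None":
        return set()
    return {(item["label"], item["line of Code"])
            for item in json.loads(reply_text)}

def score(predicted, ground_truth):
    """Precision, recall and F1 over sets of (label, line) pairs."""
    tp = len(predicted & ground_truth)   # correctly flagged findings
    fp = len(predicted - ground_truth)   # spurious findings
    fn = len(ground_truth - predicted)   # missed findings
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1
```

For example, a reply that flags one of two true findings plus one spurious finding yields precision 0.5, recall 0.5, and F1 0.5.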
recall = TP / (TP + FN)    (2)

• F-measure: a measure which combines precision and recall and is defined as F1 = 2 * (precision * recall) / (precision + recall).

Precision  Recall  F1
Semgrep    0.6694  0.1504  0.2457
Bandit     0.7450  0.1447  0.2424
SonarQube  0.9104  0.1161  0.2060

Table 5. Results of Experiment 3 (SAST assistant)

                             Precision  Recall  F1
Semgrep                      0.4682     0.1123  0.1812
Bandit                       0.3168     0.0609  0.1022
SonarQube                    0.3283     0.0419  0.0743
Experiment3, GPT-3.5-Case 1  0.7807     0.2781  0.4101
Experiment3, GPT-3.5-Case 2  0.333      0.1542  0.2109

Table 6. Results of Experiment 4 (Free Classification)

           Precision  Recall  F1
Semgrep    0.4682     0.1123  0.1812
Bandit     0.3168     0.0609  0.1022
SonarQube  0.3283     0.0419  0.0743
GPT-3.5    0.3350     0.1238  0.1808

6 Threats to Validity

In this section, we discuss some factors in our experiments that could affect the correctness of the results. Our biggest challenge was the choice of ChatGPT prompts. There are metrics for measuring the effectiveness of a prompt for LLMs; in [31], naturalness and expressiveness are mentioned as two important factors. We tried to choose the most efficient prompts in terms of these metrics and based on the guidelines explained in Section 4.1 [29], but it is possible that a more careful selection of prompts could change the results. Another factor that may affect the results is the size of the dataset and its accessibility on the Internet. Furthermore, the distribution of the CWEs in the dataset is of great importance. To mitigate this threat, we chose three different datasets for better generalization over the vulnerabilities they cover, but the coverage of vulnerability types may still be limited. Moreover, we only compare the model with three SAST tools for the Python language; additional SAST tools might change the comparison. Finally, we only test the GPT-3.5 model of ChatGPT, and it is possible that the newer paid version (GPT-4) performs better.

7 Conclusion

In this paper, we performed four types of experiments with the ChatGPT model to detect security vulnerabilities in Python code. We compared the model with Bandit, Semgrep and SonarQube, which are popular SAST tools for Python. We conclude that using the GPT-3.5 model for code vulnerability detection in certain settings gives promising results. Specifically, when used as a SAST tool assistant, it produces results that can help improve the results returned by the SAST tools. Overall, we believe this model has the potential to be used in vulnerability detection tasks, bearing in mind the factors that may affect the correctness of the results, described in Section 6. However, we admit that this study is not general in all aspects and provides primary steps toward this path. In future studies, the behavior of the latest ChatGPT model (GPT-4), which is more powerful than GPT-3.5, can be examined for code vulnerability detection in the hope of obtaining better results. Moreover, the temperature parameter of the model can be set to values other than zero, and innovative rules can be devised to select the most efficient results. Another suggestion is to use one-shot learning in future work. Finally, a security caution applies to using ChatGPT as a SAST tool, because the source code must be uploaded to OpenAI's servers.

A Appendix

The distribution of labels of our dataset is provided in Table A.1.

References

[1] Wikipedia. https://en.wikipedia.org/wiki/GitHub, 2023. Accessed: 2023-03-27.
[2] CVE Details. https://www.cvedetails.com/browse-by-date.php, 2023. Accessed: 2015-08-23.
[3] Kumar V, Anjum M, Agarwal V, and Kapur PK. A hybrid approach for evaluation and prioritization of software vulnerabilities. In Predictive Analytics in System Reliability, pages 39–51. Cham: Springer International Publishing, 2023.
[4] Zhou Y and Sharma A. Automated identification of security issues from commit messages and bug reports. In Proceedings of the 2017 11th Joint Meeting on Foundations of Software Engineering, 2017.
[5] Zhou Y, Liu S, Siow J, Du X, and Liu Y. Devign: Effective vulnerability identification by learning comprehensive program semantics via graph neural networks. Advances in Neural Information Processing Systems, 32, 2019.
[6] Perl H, Dechand S, Smith M, Arp D, Yamaguchi F, Rieck K, et al. VCCFinder: Finding potential vulnerabilities in open-source projects to assist code audits. In Proceedings of the 22nd ACM SIGSAC Conference on Computer and Communications Security, 2015.
[7] Jabeen G, Rahim S, Afzal W, Khan D, Khan A, Hussain Z, et al. Machine learning techniques for software vulnerability prediction: a comparative study. Applied Intelligence, 52, 2022.
[8] Hanif H and Maffeis S. VulBERTa: Simplified source code pre-training for vulnerability detection. In 2022 International Joint Conference on Neural Networks (IJCNN), pages 1–8, 2022.
[9] Berabi B, He J, Raychev V, and Vechev M. TFix: Learning to fix coding errors with a text-to-text