


Received 4 November 2024, accepted 21 November 2024, date of publication 25 November 2024, date of current version 4 December 2024.
Digital Object Identifier 10.1109/ACCESS.2024.3505983

Application of Large Language Models in Cybersecurity: A Systematic Literature Review

ISMAYIL HASANOV 1,2, SEPPO VIRTANEN 1 (Senior Member, IEEE), ANTTI HAKKALA 1, AND JOUNI ISOAHO 1
1 Department of Computing, University of Turku, 20014 Turku, Finland
2 Openfactory Nordic oy, 20500 Turku, Finland
Corresponding author: Ismayil Hasanov ([email protected])
The associate editor coordinating the review of this manuscript and approving it for publication was Sedat Akleylek.

ABSTRACT The emergence of Large Language Models (LLMs) is currently creating a major paradigm shift
in societies and businesses in the way digital technologies are used. While the disruptive effect is especially
observable in the information and communication technology field, there is a clear lack of systematic
studies focusing on the application and impact of LLMs in cybersecurity holistically. This article presents an
exhaustive systematic literature review of 177 articles published in 2018-2024 on the application of LLMs
and the use of Artificial Intelligence (AI) as a defensive measure in cybersecurity. This article contributes
an analytical compendium of the recent research on the application of LLMs in offensive and defensive
cybersecurity as well as in research on cyberethics, current legal frameworks, and research regarding the use
of LLMs for cybersecurity governance. It also contributes a statistical summary of global research trends in
the field. Of the reviewed literature, 68% was published in 2023. Nearly 30% of the articles originate from the
USA and 11% from China, with other countries currently having significantly lower contributions to recent
research. Most attention in recent research has been given to AI as a defensive measure, accounting for
27% of the reviewed literature. It was observed that LLMs have proven highly effective in phishing attack
simulations and in managing cybersecurity administrative aspects, including defending against advanced
exploits. Furthermore, LLMs show significant potential in the development of security software, further
cementing their role as a powerful tool in cybersecurity innovation.

INDEX TERMS Cybersecurity, artificial intelligence, large language models, generative AI, penetration
testing, cyberethics, network security, natural language processing, systematic literature review, survey.

I. INTRODUCTION
Artificial intelligence (AI) has profoundly permeated various sectors, notably through the emergence of Large Language Models (LLMs). These technologies find applications across diverse industries, including healthcare, mechanical engineering, and information technologies, with an increasing presence also in cybersecurity and with both beneficial and detrimental effects on the development of the field. Notable applications in cybersecurity include Intrusion Detection Systems (IDS) [1] and Intrusion Prevention Systems (IPS) [2], Security Information and Event Management (SIEM) systems [3], and Next Generation Firewalls (NGFW) [4].

AI assists in identifying and tracking malicious patterns, thereby substantially reducing human workload, minimizing human error, and enhancing system efficiency. Nonetheless, it is acknowledged that AI systems are not infallible, evidenced by occurrences of false positives and negatives. However, the advantages of AI are significant, most notably the capacity for round-the-clock operation without fatigue and with sustained concentration. Moreover, the ability of AI to discern subtleties that may elude human observation makes it indispensable.

Being proficient in understanding and generating human-like text, LLMs are invaluable for numerous cybersecurity tasks such as log analysis and penetration testing. LLMs are particularly beneficial for Small and Medium-sized Enterprises (SMEs) by facilitating tasks like website development and security audits, thereby economizing the use of resources.

TABLE 1. Overview of related literature surveys on using LLMs in the cybersecurity context in comparison to the work presented in this article.

The primary motivation for conducting this study is the limited number of comprehensive analyses on the implementation of AI, particularly LLMs, in cybersecurity. While there are studies that explore the defensive and offensive capabilities or ethical concerns of LLMs, they tend to focus on a single aspect, preventing a broader understanding of this emerging technology. The authors believe it is crucial to provide a holistic perspective, enabling readers to fully grasp the future potential of LLMs in the cybersecurity domain. Consequently, this literature review aims to offer a more comprehensive analysis of these technologies in this field. In this article, a thorough Systematic Literature Review (SLR) is performed to discern current methods and research directions in the application of AI and LLMs in cybersecurity. A review of other state-of-the-art surveys has also been conducted to highlight the unique contributions of this research. The findings are summarized in Table 1. As can be observed from the table, the majority of the articles concentrate on one or a few specific aspects of LLM implementation in cybersecurity. Moreover, the bulk of these works are preprints and have yet to undergo peer review. Additionally, this work does not solely focus on LLMs; it also incorporates relevant AI-related articles to offer a more comprehensive perspective than the other works summarized in Table 1.

Unlike other studies on the implementation of generative AI in cybersecurity, this work not only explores the application of LLMs but also examines the broader use of AI in this field, enabling readers to compare the effectiveness of various AI technologies in security. Furthermore, this work addresses the use of LLMs in both defensive and offensive cybersecurity, in contrast to many studies that focus solely on one aspect. Additionally, the potential implementation of LLMs in cybersecurity administration, including their role in various frameworks and policies, is discussed. Another distinguishing feature of this work is the presented analysis of cyberethics with regard to LLMs and how this reflects on proper governance of LLMs in today's digital landscape. In summary, this work makes the following key contributions:
• A review and analysis of the application of AI in cybersecurity, including the use of LLMs for both offensive and defensive purposes, while other studies typically focus on some narrower aspects (as seen in Table 1).
• A review and analysis of recent research on cyberethics in the AI context, current legal frameworks surrounding AI, and the use of LLMs for cybersecurity governance.
• A summary of statistics on global research trends in the field.

The following three research questions, to which the literature review presented in this article will give elaborate answers, are defined:
1) How effective are Large Language Models and Artificial Intelligence in cybersecurity applications?
2) In what ways are Large Language Models applied in cybersecurity tactics?
3) What are the challenges and limitations of Large Language Models in the context of cybersecurity?

The rest of this article is organized as follows. Section II presents the methodology used for conducting an SLR and discusses some statistical findings regarding the analyzed literature. Section III discusses key concepts and terminology of AI, LLMs and cybersecurity to the extent needed for following the presentation of the literature analysis and the results. The analysis of the reviewed literature is presented in section IV. The order of topics presented in section IV follows the same sequence as the topical categories listed in Table 10. Following the literature analysis, the results are discussed in section V. Concluding remarks of the study are given in section VI.


II. MATERIALS AND METHODS
This section describes the methodology employed for conducting the SLR and also presents statistical findings regarding the analyzed literature. The SLR methodology applied in this study is illustrated in Figure 1, which also outlines the number of articles filtered out at each stage of the analysis. The methodology draws inspiration from, and is a simplified adaptation of, two well-established approaches in the existing literature: the Kitchenham Guidelines [12] and the PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) [13] framework.

FIGURE 1. Systematic literature review process used in this work.

The literature review entails an exhaustive analysis of several databases and research dissemination platforms, specifically: arXiv, IEEE Xplore, ResearchGate, Scopus, Google Scholar, ScienceDirect, SpringerLink and ACM Digital Library. Each database was queried using a carefully made set of keywords derived from the research questions. For every research question, seven distinct sets of keywords were developed. In addition, five sets of topic-specific keywords were crafted, resulting in a comprehensive total of 26 keyword sets used for querying the databases. The details of these keyword sets are presented in Tables 2, 3, 4 and 5.

TABLE 2. Queries for research question 1.
TABLE 3. Queries for research question 2.
TABLE 4. Queries for research question 3.
TABLE 5. Topic-specific queries.

For each database and corresponding keyword set, an initial screening was undertaken to compile a preliminary list of articles for subsequent analysis. Given the rapid evolution of AI and the relatively recent emergence of LLMs, the focus was on studies published in the last six years (2018-2024); older studies were filtered out. During preliminary screening, the title and abstract of each article were carefully examined. The inclusion of an article in the list was determined based on the following criteria:
• Population/Subject: Determine whether the article falls within the scope of the research.
• Outcomes of Interest: Evaluate whether the efficacy, accuracy, challenges, limitations, benefits, and applications of AI and LLMs in cybersecurity were analyzed in the article.
• Language: Only studies published in English were considered.


After the initial screening, 250 articles were collected and documented for further analysis. Various statistics, such as publication year, source, and keywords, were also gathered during the analysis.

The abstract, introduction, discussion, and conclusion sections of each article were analyzed in detail, while the main content was browsed cursorily. For the detailed analysis, the following criteria were applied to assess the quality of each article:
• Study Design and Methodology: This criterion assesses whether the paper aligns with the research scope of the SLR and evaluates the relevance and reliability of the methods employed. The authors examined whether the papers had a well-structured research design and methodology, such as empirical methods (e.g., experiments, surveys) or theoretical models (e.g., frameworks, simulations) that were relevant to the research questions. For example, in the article "GPT-Based Malware: Unveiling Vulnerabilities and Creating a Way Forward in Digital Space" [14], the research design is a literature review. The authors focus on exploring the threat of GPT-based malware. The study provides a detailed overview of the threats introduced by LLMs and ways to address them. In this case, a literature review and case studies are good ways to study possible threats arising from GPT-based malware; therefore, this article was accepted into the SLR, providing insights into how LLMs are used to create malware. The objective of the study is to examine how threat actors are using LLMs, which aligns with the work performed. Therefore, this article meets the required "Study Design and Methodology" criteria.
• Results: This criterion assesses whether the research outcomes are aligned with the research questions presented in the article. What is more, it analyzes whether the research presents any unique or valuable information within its scope. For example, in the previously mentioned study, the authors define the primary research question as identifying the vulnerabilities arising from LLMs and suggesting ways to mitigate them. The research outcomes align well with this goal since the article presents both the threats posed by LLMs and methods to address them. Moreover, the study demonstrates how GPTs could be used in the creation of polymorphic malware and includes an interesting jailbreak prompt that is particularly noteworthy.
• Quality of the Data: This criterion verifies whether the data employed in the training process are inherently reliable. This means the data used in the research were collected from reliable sources and are not biased or falsified. For example, in the article "Spear Phishing Emails Detection Based on Machine Learning" [15], the authors perform an experiment and employ data for a Machine Learning (ML) model, consisting of 417 spear phishing emails and 13,916 non-spear-phishing emails, including benign and phishing emails from the companies the authors were cooperating with. The data employed for the study are considered to be of good quality since companies are usually the target of phishing; therefore, such emails are ideal for training the model. Moreover, the size of the dataset is sufficient, and the authors additionally used resampling techniques. Furthermore, the authors are transparent that the spear phishing emails were classified by an expert group, which adds extra credibility to the quality of the data.

Based on this analysis and the collected data, the articles were classified into four distinct categories: not relevant articles (73 articles), peripheral articles (10 articles), relevant articles (123 articles), and key articles (44 articles). In percentages, 18% of the collected literature comprised the key articles, while 49% of the literature constituted the relevant articles and 4% the peripheral articles. As a result of this classification, the not relevant articles (73 articles, 29%) were excluded from the remainder of the literature review, either because they did not meet the selection criteria or because they were not accessible, leaving 177 articles (71%) for further analysis. The distribution of these articles in terms of the databases and research dissemination platforms in which they were discovered is as follows: arXiv 44, IEEE Xplore 38, ResearchGate 33, Scopus 19, Google Scholar 18, ScienceDirect 11, SpringerLink 9 and ACM Digital Library 5. A significant portion of the articles originates from arXiv and thus has preprint status. Since the cut-off date of the SLR, many of these articles have been published in peer-reviewed forums. However, as they were still in arXiv preprint status at the time of the SLR cut-off date and during the review of articles for this study, they have been categorized as arXiv articles here. Google Scholar, ResearchGate, and Scopus contain articles from various publishers; therefore, if an article was found on one of these three services as well as on one of the specific publisher services, the article was counted for the publisher's service.

A. STATISTICS
In the next stage of the literature review, the remaining 177 articles were classified into different topical categories based on the authors' assessment of their research or methodical focus. Each paper was categorized based on a detailed analysis of its title, abstract, and content. For example, the study titled "Getting pwn'd by AI: Penetration Testing with Large Language Models" [16] is classified under "LLM as an Offensive Tool" because its main focus is on the use of LLMs in offensive cybersecurity. Similarly, "Cyber Intrusion Detection using Natural Language Processing on Windows Event Logs" [17] is categorized as "NLP in Cybersecurity" due to its emphasis on applying Natural Language Processing (NLP) technologies for cyber defense. In the rest of this article, the authors use the terms category and topical category interchangeably when referring to how researchers classified the articles.


The authors defined seven topical categories to which the articles were sorted. The defined topical categories and the number of articles for each category are shown in Table 6. While the topical categories "AI as a Defensive Measure" and "LLM in Defensive Cybersecurity" are technically related (since LLMs are a subset of AI), they were separated for the purposes of a more detailed analysis. This way, a specific focus on LLMs could be provided, while the other topical category considers the broader implementation of AI in cybersecurity.

TABLE 6. Defined topical categories and article distribution by category.

The topical category with the majority of articles is "AI as a defensive measure," which includes 48 articles out of the 177 that qualified for the detailed analysis. This category is followed by "AI system vulnerabilities and mitigating them", which comprises 39 articles. The third and fourth categories pertain to the use of LLMs in defensive and offensive cybersecurity, respectively. The rest of the topical categories had significantly fewer article hits, as seen in Table 6. Figure 2 illustrates the percentage distribution of the 177 articles into the topical categories, showing that nearly one-third of the analyzed literature studies the usage of AI as a defensive measure.

FIGURE 2. Article distribution by topical category (numbers are percentages).

Table 7 shows the top five countries contributing to the analyzed literature. The country of origin for each article is determined based on the first author's affiliation as given in the article. The table also shows each country's topical focus of study based on the articles' classification. Out of the 177 articles, 52 are from the USA, followed by China with 20 articles, and India with 11 articles. These top article counts may be explained by significant national investments the countries make in AI research. Indeed, many of the top AI companies are based in the USA. Another noteworthy observation concerns the focal areas of study across different countries. As demonstrated in Table 7, the USA primarily concentrates on LLMs and defensive security. This is unsurprising, taking into account the substantial funding the country allocates to its defense industry. In contrast, China seems to direct its resources towards system vulnerabilities and the potential applications of LLMs and AI. Meanwhile, India's research efforts seem to be predominantly centred on NLP in cybersecurity.

TABLE 7. Article distribution by countries and the topical category most focused on in research output for each country.

Another interesting statistic is demonstrated in Table 8, which shows the distribution of articles by year and the dominant focus area studied most in the articles in each year. As can be seen, almost 75% of the articles are from 2023, while 20 articles are from the first three months of 2024 (the cut-off for the review presented in this study was March 2024). Overall, the application of AI for defensive cybersecurity has dominated research efforts over the years covered in this study.

TABLE 8. Number of articles by publication year and the topical category with most articles in each year. For 2024, only the period January-March is included in the study.

The top five keywords of the extracted articles are presented in Table 9. The most popular keyword used is "cybersecurity," followed by "chatgpt". The most commonly used keywords within the different article categories are given in Table 10.

TABLE 9. Top 5 most popular keywords in analyzed literature.


TABLE 10. Topical categories and the top 3 keywords used in articles for each category.

The general tendency is that the majority of the studies analyzed in this SLR feature either cybersecurity or AI/LLM keywords. Looking at the co-occurrence of keywords, it can be observed that the most frequently found keyword pairs are:
• "cybersecurity" and "machine learning" (25)
• "cybersecurity" and "artificial intelligence" (22)
• "chatgpt" and "artificial intelligence" (21)
• "machine learning" and "artificial intelligence" (20)
• "cybersecurity" and "chatgpt" (19)
The most popular keyword co-occurrence combinations are the ones where a pair is formed of a security term and an AI term, aligning well and as expected with the target domain of this study.

III. FOUNDATIONAL CONCEPTS IN AI, LLMS AND CYBERSECURITY
This section introduces foundational concepts and ideas behind AI, LLMs and cybersecurity that are needed for further understanding of the topic. This section is structured as follows: first, general information about AI technologies will be provided, followed by an introduction to LLM and NLP concepts and ideas. Subsequently, the concept of jailbreaking in the context of LLMs will be defined, along with some jailbreaking techniques. This concept is crucial for understanding the logic behind offensive security in LLMs. Finally, some definitions and theoretical background for cybersecurity are presented. It must be noted, however, that the field of cybersecurity is vast, and therefore only the concepts covered in the literature analysis part of this study will be elaborated here.

A. ARTIFICIAL INTELLIGENCE
As delineated by the European Parliamentary Research Service, AI can be characterized as a system that exhibits intelligent behavior by analyzing its environment and autonomously executing actions to accomplish specific tasks [18]. This section will briefly summarize some of the most used best practices and approaches. The quintessential approach to AI is rule-based AI [19], wherein human experts develop meticulous rule-based procedures. Conversely, in ML, computers learn from data without explicit rule sets. In ML, algorithms and statistical models are used to draw inferences from extensive datasets, a procedure known as model training [20].

There are two principal techniques in ML. In supervised learning, the available data adhere to a specific format: $\{(x_i, y_i)\}_{i=1}^{n}$, where $x_i$ denotes an input and $y_i$ denotes the corresponding output. The primary objective is to determine the relationship between these values in order to predict unknown outputs [21].

Unsupervised learning, as the name implies, does not depend on known outcomes or outputs. Rather, it exclusively employs input data $\{x_i\}_{i=1}^{n}$ without corresponding outputs. Unlike supervised learning, which utilizes a hypothesis set comprising functions from which the data engineer extracts the solution, the objective of unsupervised learning is to discern patterns and relationships within the dataset in an exploratory fashion, without explicit instruction. Notably, unsupervised learning can also be employed to examine the data the engineer was given, to glean insights on the data (for example, some tendencies), often serving as a preliminary stage in data preprocessing [21].

Deep learning (DL), a specialized subset of ML, involves artificial neural networks with multiple layers, commonly referred to as deep neural networks. These networks can tackle more intricate tasks by deconstructing them into smaller, more manageable problems across their layers [22].

Data play a pivotal role in AI, especially in ML. High-quality, expansive datasets are essential for training algorithms effectively. For instance, training an audio recognition system necessitates thousands, if not millions, of labeled samples. Sometimes there are insufficient data to equally represent each class, leading to class imbalance. In such cases, it is important to implement resampling techniques. Oversampling involves increasing the number of instances in the minority class to balance the dataset. Undersampling reduces the number of instances in the majority class to balance the dataset [23].

Another important topic is the evaluation of AI models. There are many evaluation metrics available, but the following metrics are most commonly used for classification tasks. Accuracy refers to the proportion of all classifications, both positive and negative, that were correctly predicted. Recall (also known as the true positive rate) is the proportion of actual positive cases that were correctly classified as positive by the model. Precision measures the proportion of all positive classifications made by the model that are actually positive. The F1 score is the harmonic mean of precision and recall and is often used as a descriptive metric [24]. The F1 score is especially important as it provides a more robust metric for evaluation, making it extremely useful when working with imbalanced datasets. The Area Under the Receiver Operating Characteristic (AUC-ROC) score indicates how well a model can distinguish between classes, with higher values signifying better performance. The ROC curve is a plot of the true positive rate against the false positive rate at different classification thresholds [25].

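In standard confusion-matrix notation (TP and TN for correctly classified positives and negatives, FP and FN for misclassified ones), the metrics above take the following well-known forms:

$$\mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}, \qquad \mathrm{Precision} = \frac{TP}{TP + FP},$$
$$\mathrm{Recall} = \frac{TP}{TP + FN}, \qquad F_1 = \frac{2 \cdot \mathrm{Precision} \cdot \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}.$$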

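The resampling and evaluation concepts described above can be tied together in a few lines of Python. The sketch below uses the scikit-learn and imbalanced-learn libraries, which are common (but by no means the only) implementations of these techniques; the dataset is synthetic and purely illustrative, not taken from any study reviewed here.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import precision_score, recall_score, f1_score, roc_auc_score
from imblearn.over_sampling import RandomOverSampler

# Synthetic, heavily imbalanced binary dataset (e.g., 5% "attack" class).
X, y = make_classification(n_samples=5000, weights=[0.95, 0.05], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, random_state=0)

# Oversampling: replicate minority-class instances until classes are balanced.
X_res, y_res = RandomOverSampler(random_state=0).fit_resample(X_train, y_train)

model = LogisticRegression(max_iter=1000).fit(X_res, y_res)
y_pred = model.predict(X_test)

# The metrics defined above; on imbalanced data, precision/recall/F1
# are far more informative than plain accuracy.
print("precision:", precision_score(y_test, y_pred))
print("recall:   ", recall_score(y_test, y_pred))
print("F1:       ", f1_score(y_test, y_pred))
print("AUC-ROC:  ", roc_auc_score(y_test, model.predict_proba(X_test)[:, 1]))
```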


In essence, AI systems analyze inputs from their environment, process this information using various algorithms, and produce outputs that range from simple decisions to complex predictions. The continuous enhancement of these systems is heavily reliant on data, computational power, and advancements in algorithm design.

B. LARGE LANGUAGE MODELS
Large Language Models are sophisticated language models encompassing billions or even trillions of parameters. These models are trained on vast datasets, often amounting to terabytes of data. Notable examples such as T5 [26], GPT, and LLaMA [27] have demonstrated remarkable proficiency in tasks such as text generation, chatbot interactions, and programming. Furthermore, it has been established that LLMs can effectively conduct penetration testing [16]. There are different types of LLMs, including general-use models and domain-specific models tailored to particular tasks. For instance, BloombergGPT [28] is designed to perform financial tasks.

The fundamental concept underpinning LLMs involves the tokenization of text, which breaks text down into smaller units known as tokens. The core principle of LLMs is to predict the next token (word) based on the context and input data. Utilizing NLP technology, LLMs can generate human-like language tailored to specific tasks, making the interpretation of their output exceedingly straightforward. One of the most prominent LLMs today is ChatGPT, which was trained using Reinforcement Learning from Human Feedback (RLHF) [29].

Traditional Reinforcement Learning (RL) involves a model learning to make decisions by interacting with its environment, receiving penalties and rewards based on its actions. These values are determined by a reward function created by engineers, making the design of this function critical. Poorly designed reward functions can lead to unintended consequences, such as agents exploiting loopholes to achieve high rewards without genuinely fulfilling the intended objectives. To address these issues, RLHF incorporates a human-in-the-loop approach. Rather than relying on a static reward function, the agent's learning objective is iteratively refined through human feedback [30].

Also, terms like fine-tuning, zero-shot and few-shot learning should be introduced. Fine-tuning is the process of adapting an LLM to perform better in certain domains by training it on task-oriented datasets. It helps the model to adjust its parameters, improving its ability to respond to domain-specific queries with higher accuracy and relevance. For this, techniques like Low-Rank Adaptation (LoRA) and Parameter-Efficient Fine-Tuning (PEFT) are used [31]. Zero-shot learning refers to a technique where a model scores the answers without having seen any prior examples, relying entirely on its pre-existing knowledge and the provided prompt. On the other hand, few-shot learning involves giving the model a small number of examples before asking it to generate new responses, generally improving performance, especially for simpler tasks.

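The next-token principle described above is commonly formalized as an autoregressive factorization: the probability of a token sequence $t_1, \dots, t_N$ decomposes into a product of conditional next-token probabilities,

$$P(t_1, \dots, t_N) = \prod_{i=1}^{N} P(t_i \mid t_1, \dots, t_{i-1}),$$

and generation proceeds by repeatedly sampling the next token from this conditional distribution. This is the standard textbook formulation rather than a detail taken from any of the reviewed studies.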

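Zero-shot and few-shot prompting differ only in the prompt sent to the model. The sketch below builds both prompt styles for a phishing-triage task; it is illustrative only, and the commented-out classify call stands in for whatever LLM API is being used (it is not a specific vendor's interface).

```python
def build_zero_shot(email_text: str) -> str:
    # Zero-shot: the model must rely entirely on its pre-existing knowledge.
    return (
        "Classify the following email as PHISHING or LEGITIMATE.\n\n"
        f"Email: {email_text}\nLabel:"
    )

def build_few_shot(email_text: str, examples: list) -> str:
    # Few-shot: a handful of labeled examples precede the actual query,
    # which generally improves performance on simpler tasks.
    shots = "\n\n".join(f"Email: {e}\nLabel: {label}" for e, label in examples)
    return (
        "Classify each email as PHISHING or LEGITIMATE.\n\n"
        f"{shots}\n\nEmail: {email_text}\nLabel:"
    )

examples = [
    ("Your invoice #4211 is attached. Thanks, accounting.", "LEGITIMATE"),
    ("URGENT: verify your password at hxxp://bank-login.example", "PHISHING"),
]
prompt = build_few_shot("Click here to claim your prize!", examples)
# response = classify(prompt)  # hypothetical LLM call
```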


LLMs have built-in mechanisms designed to prevent access to illegal or offensive data. However, it is possible to bypass these safeguards through techniques known as jailbreaking [32]. Recent studies have demonstrated that cleverly crafted natural language prompts can circumvent sophisticated safety measures, leading to Prompt Injection (PI) attacks.

Some well-known attacks have already been addressed, such as the Do Anything Now (DAN) attack, which compelled ChatGPT to provide any information requested by the user [33]. There are various types of attacks, including the Evil-Bot Prompt, ChatGPT Developer Mode v2, and Strive To Avoid Norms (STAN) Prompt. While these attacks are relatively easy to identify and rectify, more sophisticated methods involve creating adversarial prompts that exploit model weaknesses using advanced techniques like gradient-based search. This highlights a significant gap in current security measures, which are often reactive and narrowly focused [34].

C. CYBERSECURITY
In today's data-driven world, social networks, online transactions, and myriad other activities are powered by Information and Communication Technologies (ICTs). As modern technologies advance, the threat of cyberattacks escalates, making it crucial to prioritize information security and data privacy.

Cybersecurity is the practice of protecting resources, processes, and infrastructures from adversarial attacks, unauthorized access, or damage, both physical and digital. Additionally, it encompasses strategies and actions aimed at ensuring quick recovery in the event of disasters, as well as a set of policies and rules designed to safeguard these assets. This subsection will briefly discuss certain types of attacks on cybersecurity that have additional dimensions for AI and LLMs, both in attack and defense.

1) PHISHING AND SOCIAL ENGINEERING
Social engineering involves the psychological manipulation of users to obtain confidential information or prompt actions that provide access or leverage for unauthorized activities. Phishing, a form of social engineering, deceives users into disclosing sensitive information to an attacker. These attacks typically serve as an initial entry point into an infrastructure and are commonly executed via email or phone call [35].

It is believed that AI, particularly LLMs, will contribute to both attackers in crafting sophisticated phishing emails and defenders in identifying them. A more in-depth analysis of this will be conducted in section IV.

2) MALWARE
Malware encompasses all malicious software and can be defined as software whose operation has a detrimental effect on the target computer or device. Various types of malware include viruses, worms, trojan horses, spyware, and ransomware. Like any software, malware can be created using various programming languages. Typically, hackers either develop their malware or purchase it from the darknet. However, with the advent of generative AI, it is now possible to create malware without prior programming knowledge, making malware creation accessible to almost anyone [36]. It is quite common in cybersecurity to use honeypots, which are decoy systems designed to lure attackers. Honeypots provide cybersecurity professionals with an environment to analyze attack methodologies. The data collected provide valuable insights into the behavior, evolution, and evasion techniques utilized by malware.

3) BRUTEFORCING AND ENUMERATION
Bruteforce attacks involve an attacker leveraging computer processing power, for example, to find passwords by attempting numerous username-password combinations, often sourced from dictionaries, potentially reaching millions of combinations. Generative AI can enhance this process by making educated guesses, generating passwords, and conducting adaptive bruteforce attacks [37]. Enumeration refers to systematically extracting information about a target network, system, or application, such as open network ports, running services, and website directories. This information is typically used to plan subsequent cyberattacks [38].

IV. ANALYSIS OF THE REVIEWED LITERATURE
In this section, the authors present the primary findings of this study. All articles were classified into topical categories (as defined in section II) based on the central focus of the research presented in each article. In the following, the analyzed literature is discussed one topical category at a time in the following order: Cyberethics and Ethical Considerations, LLM and Law, AI as a Defensive Measure, AI System Vulnerabilities and Mitigating Them, NLP in Cybersecurity, LLM as an Offensive Tool, and LLM in Defensive Cybersecurity.

A. CYBERETHICS AND ETHICAL CONSIDERATIONS
Ethics, in its broadest sense, constitutes a branch of philosophy that addresses questions concerning what is morally right and wrong, good and bad, fair and unfair. It encompasses a system of moral principles that guide human behavior, aiding individuals and societies in determining how they ought to act in various situations. When discussing ethics in the context of AI, it pertains to the principles and guidelines that govern the development, deployment, and use of AI technologies. Given its significant role in AI, it is crucial to thoroughly analyze this topic. With the advent of ChatGPT and particularly the emergence of more powerful models, cyberethics has become a substantial concern.

1) LLMS AND PRIVACY
Wu et al. [39] assert that AI technology functions as a double-edged sword. While it is designed to enhance security, such AI systems also pose significant threats to organizations, as they can be exploited for malicious intentions. Additionally, the security of the AI system itself can lead to substantial damage if compromised. Consequently, it is important to fortify the security of these systems. Chugh [40] highlights the necessity of addressing privacy concerns and mentions the Probably Approximately Correct (PAC) learning technique as a method to safeguard sensitive corporate data when utilizing LLMs.

2) LLM AND BIAS
Wu et al. [39] emphasize that LLMs have the capacity to influence public opinion, making it crucial that the data used to train these models are free from bias and discrimination. If an LLM is trained on biased or inappropriate data, it may propagate unethical, harmful, or discriminatory practices to its vast user base, potentially reaching hundreds of millions. Another significant concern is the environmental impact of LLMs. These systems require substantial resources, including electricity, leading to increased carbon emissions and contributing to environmental pollution.

Chugh [40] underscores the importance of addressing these biases through meticulous data curation and the use of fairness-aware ML techniques. Such approaches help to mitigate biases and promote equity in AI-driven interactions. Additionally, Chugh highlights the ethical challenge of misinformation spread by LLMs. These models can inadvertently disseminate false information, amplifying societal biases and prejudices. Thus, implementing robust fact-checking mechanisms and maintaining a commitment to factual accuracy are essential. Chugh also advocates for the establishment of regulatory frameworks to safeguard against potential misuse or ethical breaches.

3) LLM AND AI GOVERNANCE
Wu et al. [39] highlight the necessity of regulating LLMs through legal frameworks, particularly concerning the copyrights of texts generated by these models. Given the popularity of LLMs, many students and researchers incorporate LLM-generated texts into their reports or homework. To address these issues, Chugh [40] recommends the use of regulatory frameworks. Another innovative approach is proposed by Gianni et al. [41], which involves a framework of democratic experimentation. This method emphasizes social inquiry and involves civil society in the governance of AI, ensuring that ethical guidelines reflect the values and concerns of the general public. It includes public engagement and stakeholder inclusion in decision-making processes, transitioning from traditional ethical guidelines to a more inclusive and participatory model of governance.


In addressing these issues, Flaih and Jasim [42] suggest embedding ethical guidelines into chatbot AI models. The authors advocate for the development, implementation, and frequent monitoring of cyberethical frameworks and rules. Furthermore, the responsibility for the ethical use of LLMs must be shared among all stakeholders. Beyond ethical concerns, some researchers argue that modern curricula should place greater emphasis on cyberethics. According to Matei and Bertino [43], cyberethics education is insufficiently covered in cybersecurity majors. Consequently, professors often overestimate the cyberethical preparedness of students, and Matei et al. propose that cyberethical training should be integrated into curricula.

Governmental entities, such as the G7, have expressed interest in regulating LLMs. It is also crucial to ensure the proper use of LLMs like ChatGPT. As Waghmare notes [44], users have a degree of control over the data shared with and used by ChatGPT for training. Therefore, when sharing sensitive information, it is important to ensure that the data are not used for training purposes. User concerns about privacy, data security, and transparency significantly affect their loyalty to ChatGPT, as observed by Niu and Mvondo [45]. Users with strong cyberethical beliefs demand higher standards of corporate responsibility and transparency, influencing their satisfaction and loyalty to AI technologies. Thus, users are likely to seek out the most transparent and secure bots.

It is clear that while users are gradually accepting LLMs, there remains significant concern regarding data privacy and transparency. This creates a dual responsibility: users must be vigilant about the data they input into public LLMs, while developers must uphold cyberethical standards and ensure responsible data processing practices. Furthermore, existing ethical frameworks need to be updated to address the challenges posed by recent advancements in AI technologies, ensuring that privacy, bias, and governance issues are properly managed.

B. LLM AND LAW
Another crucial concept is the regulation of AI by law. As LLMs become an integral part of human life, it is essential to define what is permissible by law and what is not. For example, it needs to be defined whether it is legal to use AI or LLMs to create software. One of the notable approaches in this respect is the method proposed by Shi [46]. In that research, it was observed that all risks associated with generative AI can be categorized into two main categories. The first category encompasses all risks related to intellectual property, including issues like the ownership of AI-generated content. The second category includes all data-related risks, such as the generation of poor-quality or illegal content and data leakage. Shi criticizes the currently suggested legislation and proposes new concepts intended to strengthen enforcement. The research promotes a balanced governance strategy that focuses on both security and development. Clear regulations on copyright ownership and the enhancement of AI self-checking functions are vital. Additionally, the study suggests that international cooperation is necessary to establish unified data protection rules to address the legal challenges posed by generative AI.

LLMs like ChatGPT are not entirely new or unique; the first modern LLMs were introduced back in 2017, although a significant "boom" for LLMs occurred with the launch of ChatGPT. Despite widespread enthusiasm, there were notable concerns, such as Italy becoming the first country to ban ChatGPT at the governmental level. As Gualdi and Cordella [47] mention, Italy's Data Protection Authority, known as Garante, imposed a temporary ban on ChatGPT in March 2023 due to violations of the General Data Protection Regulation (GDPR). Issues like lack of transparency, data accuracy, legal basis, and age verification led to the ban. The authors critique the assumption of technological neutrality in regulations like GDPR, arguing that regulatory frameworks must evolve to address the specific characteristics of generative AI rather than applying broad, technically neutral policies.

Another important regulatory issue is that, according to Kshetri [48], LLMs like ChatGPT lower the barriers to entry for malicious actors. Techniques such as jailbreaking can force LLMs to generate malicious code or plan a cyberattack without prior expertise required from the human user. Therefore, it is crucial to regulate the use of LLMs by users.

Jeong [49] offers a comprehensive taxonomy of AI-related crimes, categorizing them into AI as a tool crime and AI as a target crime. This classification emphasizes the multifaceted nature of AI-enhanced traditional crimes, such as advanced phishing and automated hacking, and highlights the unique challenges posed by adversarial attacks targeting AI systems.

ChatGPT is highly dynamic, with datasets continuously evolving as the AI processes data and generates new outputs. As this section suggests, applying traditional regulatory approaches is insufficient for managing such systems. It is important to shift towards regulatory frameworks that incorporate a thorough understanding of AI technologies, ensuring that regulations are not only legally robust but also technologically informed.

C. AI AS A DEFENSIVE MEASURE
The advent of AI has introduced novel attack vectors, techniques, and defensive measures, while also enhancing existing defensive mechanisms. In this section, the authors assess the literature on defensive AI applications in cybersecurity and identify specific application areas.

1) AI AND MACHINE LEARNING GENERAL APPLICATIONS
The authors start this section with a discussion on the general application of AI in cybersecurity, based on the following publications: [50], [51], [52], [53], [54]. The authors note that AI in cybersecurity plays a critical role by leveraging advanced techniques such as ML, DL, and RL to detect and respond to cyber threats. A recurring theme noted in the articles is the dual role of AI as both a defender and a potential tool for cybercriminals. This underscores the necessity for robust and adaptive defense mechanisms. AI's ability to analyze vast datasets in real time, adapt to new threats, and automate security routines enhances the speed and efficiency of defensive cybersecurity operations.

The articles also observe that AI technologies have their limitations and shortcomings. Issues such as data quality, algorithmic complexity, vulnerability to adversarial attacks, and ethical concerns related to privacy, bias, and accountability are serious concerns for AI in cybersecurity.

2) SPECIFIC AI TECHNIQUES AND APPLICATIONS
A potential strategy to apply AI techniques to cybersecurity is to embed the implementation of AI directly into the security framework. For example, Bagaa et al. [55] propose an ML-based security framework for Internet of Things (IoT) systems that leverages Software Defined Networking (SDN) and Network Function Virtualization (NFV) to provide dynamic and efficient threat detection and mitigation. AI can also be utilized in Cybersecurity Named Entity Recognition (Cs-NER). Chen et al. [56] propose a novel approach to Cs-NER models that combines the Bidirectional Encoder Representations from Transformers (BERT) language representation model [57] with Long Short-Term Memory (LSTM), Bidirectional LSTM (Bi-LSTM), Iterated Dilated Convolutional Neural Networks (ID-CNNs), and Conditional Random Field (CRF) layers. This approach uses the BERT model to obtain distributed representations of words, enhancing the performance of the NER system. The study demonstrates that joint BERT models significantly outperform state-of-the-art methods in Cs-NER tasks.

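As a rough illustration of the inference side of such a pipeline, the Hugging Face transformers library exposes token classification (NER) as a ready-made pipeline. The sketch below assumes a hypothetical fine-tuned checkpoint name and does not reproduce the joint BERT, Bi-LSTM, ID-CNN and CRF architecture of [56]; it shows only the BERT token-classification step.

```python
from transformers import pipeline

# "example-org/bert-cybersecurity-ner" is a hypothetical checkpoint name;
# a real Cs-NER model would be fine-tuned on labeled security text.
ner = pipeline(
    "token-classification",
    model="example-org/bert-cybersecurity-ner",
    aggregation_strategy="simple",  # merge word pieces into whole entities
)

text = "The Emotet campaign exploited CVE-2017-11882 via malicious RTF attachments."
for entity in ner(text):
    print(entity["entity_group"], entity["word"], round(entity["score"], 3))
```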



Another intriguing application of BERT is in malware detection, as explored by Rahali and Akhloufi [58]. Transformer-based malware detection utilizes static analysis to detect and classify malware without executing the code, making it resource-efficient and faster. Moreover, this method can not only classify software as malware or benign but also identify the type of malware. The results demonstrate satisfactory accuracy and F1 scores, with binary classification showing superior performance.

AI can also be applied to cryptography to enhance the security and efficiency of cryptographic algorithms such as the Advanced Encryption Standard (AES), Rivest-Shamir-Adleman (RSA), and Learning with Errors (LWE) based systems. According to Nitaj and Rachidi [59], AI can analyze and improve the security of S-boxes in symmetric ciphers and generate safe prime numbers and public/private keys in RSA.

Lastly, Hemberg and O'Reilly [60] discuss the use of a collated dataset named BRON to enhance AI applications in cybersecurity. BRON integrates multiple public threat and vulnerability sources into a graph database, supporting pattern inference, modeling, simulation, and AI planning, making it a powerful tool for cybersecurity professionals.

It is evident that AI can be applied in many specific areas of cybersecurity, making it an extremely useful tool for cybersecurity engineers. Therefore, it is important to ensure that this technology is integrated within organizations, as AI generally helps to improve security operations and strengthen overall security.

3) AI FOR INTRUSION DETECTION AND PREVENTION
Several studies have been conducted on the significant potential that AI technologies hold for intrusion detection and prevention. Park et al. [61] propose that DL models such as Convolutional Neural Networks (CNNs) and LSTM networks can be highly effective for intrusion detection. However, these models often face the challenge of insufficient data. To address this, Park et al. suggest using Generative Adversarial Networks (GANs) to generate synthetic network traffic data, thereby improving the performance of IDS by mitigating data imbalance issues.

The study by Marino et al. [62] introduces an approach to enhance the explainability of IDS models. This is achieved by identifying the minimum modifications needed to correct misclassified samples, thus providing insights into the decision boundaries of the model. The approach was tested using the NSL-KDD99 benchmark dataset, and the experiments demonstrated its effectiveness.

The integration of AI within the Open XDR framework shows promise. Pissanidis and Demertzis [63] highlight the enhanced capabilities in threat detection and response achieved by combining data from IDS, Endpoint Detection and Response (EDR), SIEM, Active Directory, and log forwarding systems. The incorporation of AI/ML enables real-time detection, reduces false positives, and enhances the overall efficiency of cybersecurity operations.

AI is also particularly useful in mitigating DDoS attacks. Ouhssini et al. [64] propose the DeepDefend framework for real-time detection and prevention of DDoS attacks in cloud computing environments. This framework leverages DL techniques and genetic algorithms and was tested on the CIDDS-001 dataset, demonstrating high accuracy in entropy forecasting and rapid, precise detection of DDoS attacks.

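While DeepDefend's exact pipeline is not reproduced here, the entropy signal that such systems forecast is simple to compute: under a volumetric DDoS attack, the Shannon entropy of header fields such as the source-IP distribution typically shifts abruptly within a time window. A minimal sketch, using toy traffic windows:

```python
import math
from collections import Counter

def shannon_entropy(values):
    """Shannon entropy (bits) of the empirical distribution of `values`,
    e.g., source IPs observed in one time window."""
    counts = Counter(values)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

# Toy windows: diverse sources vs. traffic dominated by one source.
normal_window = ["10.0.0.%d" % (i % 50) for i in range(1000)]
attack_window = ["203.0.113.7"] * 950 + ["10.0.0.%d" % i for i in range(50)]

print(shannon_entropy(normal_window))  # higher: many sources, even spread
print(shannon_entropy(attack_window))  # lower: one source dominates

# A detector could forecast this entropy series (e.g., with a DL model, as
# DL-based frameworks do) and flag windows that deviate sharply.
```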

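Looking back at the explainability approach of Marino et al. [62] above, the search for the minimum modification that corrects a misclassified sample can be stated generically as a counterfactual optimization,

$$\min_{\delta} \lVert \delta \rVert \quad \text{subject to} \quad f(x + \delta) = y_{\mathrm{true}},$$

where $f$ is the classifier, $x$ the misclassified input, and the minimal perturbation $\delta$ indicates which features push the sample across the decision boundary. This is the generic form of such counterfactual explanations, not necessarily the exact objective optimized in [62].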


4) AI IN SPECIFIC CYBERSECURITY DOMAINS effectively predicts phishing attacks, showing improvements


In this section, the potential of AI across diverse cybersecurity in detection accuracy and suggesting that the proposed
domains will be analyzed. A particularly intriguing exper- framework is a valuable tool against phishing attacks.
iment was conducted by Veprytska and Kharchenko [66], Another method to combat phishing emails is introduced
who provide a comprehensive classification and risk anal- by Ding et al. [15], who combined K-means clustering with
ysis framework for AI-powered cyberattacks and defenses. the traditional Synthetic Minority Over-sampling Technique
Authors underscore the complexity of AI-against-AI sce- (SMOTE), creating the KM-SMOTE algorithm. The final
narios, highlighting the necessity for robust AI-powered model achieved a recall of 95.56%, a precision of 98.85%,
countermeasures and ongoing research to stay ahead of and an F1 score of 97.16%, demonstrating excellent perfor-
evolving threats. Authors’ application of the IMECA method mance. An interesting approach to phishing defense is studied
offers a systematic approach to assess risks and evaluate the by Asfour and Murillo [71]. The work involved using LLMs
effectiveness of various countermeasures. A critical consid- to effectively simulate realistic human responses to social
eration is adversarial ML, where the dataset is manipulated engineering attacks. This method offers valuable insights
to mislead the ML algorithm’s decision-making process. into the susceptibility of different personality traits, enabling
Vaccari et al. [67] evaluate the effectiveness of traditional ML organizations to develop targeted cybersecurity training and
algorithms compared to new methods designed to ensure awareness programs.
reliability and transparency in detecting adversarial attacks. The method proposed by Dadvandipour and Ganie [72]
The study proposes a novel approach that utilizes explainable employs a multi-layer strategy to mitigate spear phishing
and reliable AI techniques to identify adversarial attacks, by analyzing both email content and attachments. By using
maximizing detection accuracy. Experimental results indicate Support Vector Machines (SVM) and RandomForest clas-
that the proposed Reliable AI methods outperform traditional sifiers, sentiment analysis is applied to both email content
ML algorithms. and attachments to determine their legitimacy. The study
AI has also found applications in 6G networks. The study demonstrates that this multi-layer approach, combining ML
by Ferrag et al. [68] outlines various use cases of generative algorithms with hashing algorithms and sentiment analysis,
AI in IoT applications, including visual, audio, text-based, is effective in detecting spear-phishing attacks. Generally,
code-based, and IoT security applications. This implemen- the SVM method is quite popular in phishing email
tation involves using generative AI for cyber threat-hunting detection [73].
in 6G-enabled IoT networks to enhance network security. A novel framework called ‘‘Cyber Protect’’ was proposed
The study proposes a hybrid model combining GANs and by Gawade et al. [74]. This comprehensive cybersecurity
transformers for cyber threat-hunting in 6G-enabled IoT system leverages ML to detect and prevent fraudulent scams
networks, with evaluations confirming the high accuracy of and phishing URLs. The system is trained on a large
the proposed model in detecting various types of IoT cyber dataset of emails and messages using supervised learning.
threats. Furthermore, with the integration of NLP, the ML model
Another application in 6G is proposed by Karaçay et achieved commendable precision, recall, and F1 scores,
al. [69], focusing on the security aspects of 6G use cases, as NLP techniques significantly improved phishing detection
particularly the All-Senses meeting, which leverages AI and by analyzing linguistic nuances in email content.
A novel framework called "Cyber Protect" was proposed by Gawade et al. [74]. This comprehensive cybersecurity system leverages ML to detect and prevent fraudulent scams and phishing URLs. The system is trained on a large dataset of emails and messages using supervised learning. Furthermore, with the integration of NLP, the ML model achieved commendable precision, recall, and F1 scores, as NLP techniques significantly improved phishing detection by analyzing linguistic nuances in email content.

AI has also found significant applications in threat detection. Anande and Leeson [75] proposed generating synthetic network traffic data using GANs and classifying Advanced Persistent Threat (APT) samples using Extreme Gradient Boosting (XGBoost). As mentioned earlier, the lack of data is one of the main obstacles to AI applications in some fields of cybersecurity, and the methodology proposed by this study successfully addresses this issue. XGBoost classified synthetic samples with 99.97% accuracy while maintaining a high ROC-AUC, indicating optimal detection performance.
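As a simple illustration of the classification stage (not the authors' code; the random features below merely stand in for GAN-synthesized traffic), an XGBoost classifier can be trained and scored as follows:

import numpy as np
from xgboost import XGBClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
# Stand-in for GAN-generated flow features: 1000 samples, 20 features.
X = rng.normal(size=(1000, 20))
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)  # 1 = APT-like, 0 = benign

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)
clf = XGBClassifier(n_estimators=200, max_depth=4)
clf.fit(X_tr, y_tr)

print("ROC-AUC:", roc_auc_score(y_te, clf.predict_proba(X_te)[:, 1]))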
The efficiency of boosting was also confirmed in a study by Hasan et al. [76], in which boosting and explainable AI were used to identify APTs.

Studies by Hlatshwayo [77] and Amarasinghe et al. [78] further confirm the efficiency of AI in threat detection and response. Many benefits have been mentioned, including accelerated response times due to AI's automation of the analysis and prioritization of security alerts, high detection accuracy, and prediction accuracy. The efficiency of GAN
models was also demonstrated in a study by Ferrag et al. [79], where a two-stage intrusion detection framework was proposed, specifically designed to secure IoT environments using GAN and DL models. GANs were used to generate adversarial examples that mimic real-world attack patterns, followed by DL models to detect and classify both normal and attack traffic. The proposed framework achieved high detection accuracy, with a weighted average precision, recall, and F1 score of 96%, 95%, and 95%, respectively. A study by Uwagboe and Aremora [80] proposes an AI-based security analytics framework to detect and mitigate APTs in cloud environments. The framework aims to provide real-time analysis and response capabilities to minimize the impact of APTs. The proposed framework demonstrated high accuracy in detecting APTs with low false-positive rates during extensive evaluations in simulated cloud environments. A similar result could also be achieved using a Multi-Layer Protection Approach aimed at detecting and mitigating APTs. Mohamed et al. [81] designed an approach based on the MITRE ATT&CK framework, which demonstrated effective detection of APTs through Central Processing Unit (CPU) utilization monitoring. Arshad and Menon [82] explore the application of AI and ML in enhancing honeypot solutions for cybersecurity. The use of adaptive honeypots, powered by LSTM models, allows for detailed behavioral analysis of SSH attacks, achieving high detection accuracy. The research underscores the significance of feature engineering in transforming heterogeneous data into a format suitable for ML model training. Despite challenges such as data quality and occasional misclassifications, the study highlights the potential of AI to revolutionize cybersecurity by predicting attacker behaviors and improving threat detection.

AI clearly has a transformative impact on threat detection and response. However, some issues still need to be addressed, particularly ethical considerations, including privacy and bias in AI algorithms, to ensure the responsible deployment of AI in cybersecurity.

6) AI FRAMEWORKS AND THEORETICAL APPROACHES
Another intriguing application of AI is its integration into modern cybersecurity frameworks. Chomiak-Orsa et al. [83] propose embedding AI into different stages of the cyber kill chain framework. The article highlights that AI applications are particularly promising in the reconnaissance, intrusion, privilege escalation, and data exfiltration stages of the cyber kill chain. Additionally, Molina et al. [84] emphasize the power of AI in both offensive and defensive cybersecurity. Given the broad scope AI can cover, it is crucial to consider the ethical implications and potential risks associated with its use in cybersecurity.

An innovative framework, AI4CYBER, proposed by Iturbe et al. [85], leverages AI to enhance cybersecurity for critical infrastructure. This framework provides a suite of novel AI-driven services designed to manage the entire incident response lifecycle and aligns with the NIST 800-61 guidelines. Another promising AI-based tool is "Gargoyle Guard" [86], which improves security through continuous user authentication using Real-Time User Activity Fingerprinting (RTAF). While this technology is highly effective, it does face limitations such as user behavior variability and privacy concerns. Macas et al. [87] explore the application of DL in various cybersecurity tasks, from network intrusion detection to malware analysis and spam filtering. The survey highlights the simplicity, scalability, and reusability of DL models, emphasizing their effectiveness in automating threat detection and response. The authors present a detailed framework for deploying DL in cybersecurity, underscoring successful applications across different domains. Another interesting approach for configuration verification is explored by He et al. [88], who investigate the use of association analysis for network configuration verification in large-scale telecom networks. The proposed system leverages weak association rules to identify infrequent item sets as configuration anomalies. The framework integrates data preprocessing, model training, anomaly detection, and manual annotation to create a robust closed-loop system. Experimental results demonstrate high precision and recall rates, with the system efficiently scanning large datasets within minutes. Despite challenges in detecting frequent misconfigurations, the system's continuous improvement approach through expert feedback and additional ML techniques shows significant promise. This study provides valuable insights into enhancing network management and maintenance through advanced data mining methods.
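A minimal sketch of the underlying idea, flagging configuration item combinations that occur unusually rarely across devices, might look as follows (the data and the support threshold are illustrative; the production system in [88] adds model training, annotation, and feedback loops):

from collections import Counter
from itertools import combinations

# Each record lists the configuration items observed on one network element.
configs = [
    {"vlan10", "ospf", "mtu1500"},
    {"vlan10", "ospf", "mtu1500"},
    {"vlan10", "ospf", "mtu1500"},
    {"vlan10", "ospf", "mtu9000"},  # rare pairing -> candidate anomaly
]

pair_counts = Counter()
for cfg in configs:
    for pair in combinations(sorted(cfg), 2):
        pair_counts[pair] += 1

# Item pairs with weak support are reported as configuration anomalies.
support = {pair: n / len(configs) for pair, n in pair_counts.items()}
print("Suspicious item pairs:", [p for p, s in support.items() if s < 0.3])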
To sum up, modern trends indicate a growing interest in the intersection of AI and cybersecurity. Increasingly, companies are incorporating AI into their Security Operations Center (SOC) operations, thereby strengthening their cybersecurity infrastructure. However, several challenges in implementation must be considered, such as the need for unified frameworks and collaborative environments [84]. The key takeaways are that numerous experts and research articles highlight the importance of AI in cybersecurity for countering contemporary threats and enhancing SOC operational workflows. Nevertheless, it is crucial to address ethical considerations and privacy concerns, ensuring that AI implementation complies with existing legislation [89], [90], [91], [92]. Even so, it is evident that AI can be regarded as an essential protection barrier for any organization due to its effectiveness in countering a wide range of potential attacks. Furthermore, in certain cases, AI may prove to be more efficient and cost-effective than relying solely on human resources.

D. AI SYSTEM VULNERABILITIES AND MITIGATING THEM
This subsection will delve into the potential systemic vulnerabilities inherent in AI, as well as the proposed ways of mitigating the effects of these vulnerabilities. The discussion will be organized into smaller sections: first, the vulnerabilities of AI and LLM systems are scrutinized,
followed by an exploration of potential mitigation proposals and strategies.

1) VULNERABILITIES IN AI SYSTEMS
Numerous studies address the vulnerabilities present in AI systems. Some focus on specific aspects of AI, while others adopt a broader perspective, discussing various approaches and frameworks to tackle these issues. A particularly intriguing idea was proposed by Spring et al. [93], who conducted a thought experiment on assigning Common Vulnerabilities and Exposures Identifiers (CVE-IDs) to flaws in ML systems. The article emphasizes the importance of adapting existing vulnerability management frameworks to the unique challenges posed by ML systems. By assigning CVE-IDs to ML algorithm vulnerabilities, better communication and understanding between researchers and practitioners can be fostered, ultimately leading to more secure AI/ML systems. The article also highlights the inadequacy of existing tools like the Common Vulnerability Scoring System (CVSS) in ML contexts, advocating for new frameworks that better capture the complexities of ML systems. Another noteworthy study is conducted by Grosse et al. [94]. The study examines AI vulnerabilities and the disparity between academic threat models and practical AI security by analyzing the six most common AI attacks: poisoning, backdoors, evasion, model stealing, membership inference, and property inference. The study argues that academia sometimes makes overly generous assumptions about attacker capabilities, whereas real-life attacks reveal a different scenario, characterized by more stringent access controls and limited data availability. The study underscores the need for threat models that align more closely with the day-to-day realities of AI deployment. Additionally, the frequent use of pre-trained models and the reliance on domain experts introduce unique vulnerabilities that require closer examination. Aligning research with these practical constraints can pave the way for more robust and realistic AI security measures.

Another noteworthy article addressing AI system vulnerabilities is authored by Scott-Hayward [95], who discusses the fundamental weaknesses of AI-based security systems. The study emphasizes adversarial training, a method recommended for enhancing the robustness of AI models by incorporating adversarial examples into the training dataset. However, this approach is currently implemented in an ad hoc manner. The article underscores the need for standardized adversarial robustness benchmarking, which includes agreed-upon datasets, threat models, evaluation techniques, and metrics. From data poisoning to sophisticated adversarial attacks, these threats can manipulate model outputs and compromise data integrity.
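As an illustration of the core mechanism (a minimal single-step PyTorch sketch, not a standardized benchmark), adversarial training augments each batch with perturbed inputs generated from the model's own gradients via the fast gradient sign method:

import torch

def fgsm_examples(model, loss_fn, x, y, eps=0.03):
    # Perturb inputs along the sign of the input gradient (FGSM).
    x_adv = x.clone().detach().requires_grad_(True)
    loss_fn(model(x_adv), y).backward()
    return (x_adv + eps * x_adv.grad.sign()).detach()

def training_step(model, loss_fn, optimizer, x, y):
    x_adv = fgsm_examples(model, loss_fn, x, y)
    optimizer.zero_grad()
    # Train on clean and adversarial inputs so the model learns to resist both.
    loss = loss_fn(model(x), y) + loss_fn(model(x_adv), y)
    loss.backward()
    optimizer.step()
    return loss.item()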
Traditional cyber risks also pose significant threats. For example, consider the potential impact of a botnet or ransomware attack on ML infrastructure. The complexity and stealth of these attacks make defending ML systems a herculean task. A robust defense strategy is essential, blending adversarial training with traditional cybersecurity practices and layered defenses [96].

The vulnerabilities within the AI/ML supply chain, particularly in commonly used libraries like TensorFlow and PyTorch, also pose serious risks [97]. Innovative approaches to counter adversarial attacks include layer-wise adversarial training and Mixed Adversarial Training (MAT), which combine multiple attack methods to improve robustness [98]. Eggers and Sample [99] explore vulnerabilities inherent in AI and ML applications within the nuclear security context. The authors illustrate how AI enhances security through applications like insider threat mitigation and autonomous perimeter defense, while simultaneously introducing new risks. The report highlights the importance of high-quality, unbiased data and robust security measures to protect AI systems from adversarial attacks. Adherence to established standards and best practices, along with continuous monitoring and human oversight, is crucial for mitigating these vulnerabilities.

An intriguing methodology proposed by Mauri and Damiani [100], known as STRIDE-AI, adapts Microsoft's STRIDE framework to address the unique security challenges of AI-ML systems. This framework, enhanced with the rigorous structure of Failure Mode and Effects Analysis (FMEA), offers a comprehensive approach for identifying and mitigating threats throughout the entire ML lifecycle. Its practical application in the TOREADOR H2020 project underscores its effectiveness in real-world scenarios, demonstrating its utility in identifying and mitigating threats in complex ML systems. As noted by Tao et al. [101], the STRIDE framework can also be applied to custom GPTs, providing a structured analysis of potential vulnerabilities and identifying 26 attack vectors with real-world implications. The study highlights the necessity of robust security measures and transparent data handling protocols.
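To illustrate the FMEA side of such a framework, each asset/failure-mode pair can be ranked by a Risk Priority Number (RPN = severity × occurrence × detectability); the entries below are invented for illustration and would normally come from a structured risk workshop:

# (asset, failure mode, severity, occurrence, detectability), each on a 1-10
# scale, where a higher detectability score means the failure is harder to spot.
failure_modes = [
    ("training data", "label poisoning", 9, 4, 7),
    ("model weights", "unauthorized tampering", 10, 2, 6),
    ("inference API", "evasion via adversarial inputs", 8, 5, 5),
]

ranked = sorted(failure_modes, key=lambda f: f[2] * f[3] * f[4], reverse=True)
for asset, mode, s, o, d in ranked:
    print(f"RPN={s * o * d:4d}  {asset}: {mode}")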
Another notable framework for risk management is the Three Lines of Defense (3LoD) model, studied by Schuett [102]. This model, widely used in various industries, emphasizes the need for clear role assignments and coordinated efforts in managing AI risks. Schuett provides practical suggestions for implementing the model in AI companies, particularly medium-sized research labs and big tech firms. By adapting the 3LoD framework, Schuett offers a structured risk management approach that ensures comprehensive risk coverage and enhances governance. However, potential bureaucratic inefficiencies and the risk of creating a false sense of security should be considered during implementation. This approach should also be applied to securing LLMs. The study by He et al. [103] provides a thorough examination of the ethical and security challenges in LLMs, proposing advanced strategies to fortify these boundaries. By integrating sensitive vocabulary filtering, role-playing detection, and custom rule engines, the authors present a robust framework that balances high performance with stringent ethical standards. The approach addresses the immediate risks of phishing attacks and privacy breaches
immediate risks of phishing attacks and privacy breaches limitation highlighted by various studies [110], [111], [112]
while contributing to broader social equity and data protec- is the difficulty in maintaining context during extended
tion. The necessity for security frameworks in the AI field conversations, which can lead to incoherent responses.
is confirmed by numerous studies [104], underscoring the To protect sensitive data, especially in healthcare, robust
importance of guaranteeing the security of AI systems. measures such as data anonymization, encryption, and
The increasing deployment of AI/ML hardware acceler- strict access controls are essential [113]. A case study
ators in critical sectors such as healthcare, aerospace, and by Elnawawy et al. [114] demonstrates an attack on an
defense has spotlighted the issue of Hardware Trojans (HTs). ML-enabled blood glucose monitoring system. By injecting
These covert, malicious modifications can cause significant adversarial data points through a known Bluetooth vulner-
damage, from leaking sensitive information to undermining ability [115], the ML model can be manipulated to make
the accuracy and reliability of ML models. The intricate and incorrect predictions, potentially leading to erroneous insulin
proprietary nature of these accelerators makes HT detection dosage recommendations.
an exceedingly challenging task. Effective mitigation neces- Another interesting study [116] reviews potential attacks
sitates a multi-layered strategy, including design-for-trust, against LLM models, including adversarial attacks, Struc-
ML-based anomaly detection, rigorous formal verification, tured Query Language (SQL) injection, DoS, and buffer
and side-channel analysis [105]. overflow. These attacks exploit vulnerabilities in AI systems,
Poisoning attacks on ML models, particularly during emphasizing the importance of robust security measures,
the training phase, have advanced considerably. These including regular vulnerability scanning, secure coding
attacks now utilize sophisticated techniques like bilevel practices, and continuous monitoring, to mitigate these risks.
optimization and GAN-based methods to generate highly Privacy and data security are additional concerns that must be
effective poisoned data. Of particular concern are the recent addressed with LLMs [117].
developments in clean-label attacks, which employ feature Weeks et al. [118] mention toxicity injection attacks, which
collision strategies to create visually indistinguishable but pose a serious threat to the integrity of open-domain chatbots.
malicious samples. These attacks not only degrade model These attacks exploit the chatbot’s Dialog-based Learning
performance but also evade traditional detection methods, (DBL) framework, where the model is periodically retrained
making them exceptionally insidious [106]. on user interactions, to inject harmful responses into the
AI has a profound impact on cybersecurity. Rayhan language model.
and Rayhan [107] provide a comprehensive analysis of Moreover, LLMs present significant socio-economic
AI’s transformative role in global security, presenting a implications, notably job displacement and widening inequal-
balanced view of its risks and opportunities. The study ities. The automation of tasks traditionally performed by
underscores AI’s dual function: enhancing cybersecurity humans can lead to job losses and increased stress among
while also introducing new cyber threats, like autonomous workers. The deployment of AI risks exacerbating socio-
weapons, a particularly contentious issue, which are shown economic divides, creating an ‘AI divide’ between those with
to offer precision benefits but also carry significant risks of access to AI technologies and those without [119].
unpredictability and misuse. The most substantial potential Diffusion models are also vulnerable to backdoor attacks.
of AI is realized when it works in collaboration with human A novel detection mechanism based on distribution dis-
operators, augmenting their capabilities and compensating crepancy [120] achieves a high detection rate for known
for their weaknesses. Training and skill development for triggers. Additionally, the proposed evasion strategy employs
personnel working with AI are essential. By emphasizing end-to-end learning to minimize distribution discrepancy,
human-AI collaboration, the authors highlight the importance maintaining high attack performance while evading detection
of training and oversight to fully harness AI’s potential. with nearly 100% pass rates.
Sarker et al. [108] also emphasize the power of AI in One of the emerging concerns in the realm of generative
cybersecurity, providing a valuable roadmap for integrating AI is the phenomenon of data feedback loops. These loops
data science and ML into cybersecurity strategies. The occur when AI-generated content is fed back into the training
work underscores the critical role of AI in identifying and datasets for future models, leading to risks such as the
mitigating cyber threats, ultimately strengthening the security amplification of biases, degradation of data quality, and
infrastructure. increased vulnerability to data poisoning attacks. As AI
LLMs are currently highly popular, and it is crucial to models learn from synthetic data, the authenticity and
discuss the potential vulnerabilities associated with them. reliability of their outputs diminish over time, a process
A significant study by Chowdhury and Rahman [109] known as ‘‘model collapse’’ [121].
discusses the limitations and vulnerabilities of ChatGPT. An interesting attack proposed by Xu et al. [122] involves
Although recent versions have addressed many of these multilingual cognitive overload. By presenting harmful
issues, biases in generated text and a lack of creativity remain prompts in various languages, particularly low-resource ones,
persistent challenges. However, these can be mitigated LLMs can be coerced into generating unsafe responses.
to some extent with carefully crafted prompts. Another Language-switching scenarios increase the effectiveness of

these attacks, with the success rate rising for languages that have greater word order distance from English. Paraphrasing harmful prompts to replace sensitive words with neutral or less common synonyms increases the likelihood of LLMs generating unsafe responses. Furthermore, LLMs can be prompted to reason backward from an effect to a cause, leading them to generate scenarios that describe how to engage in malicious behavior without facing legal consequences.

Moreover, psychological deception techniques based on persuasion principles (reciprocation, consistency, social proof, likeability, authority, and scarcity) can be adapted to manipulate LLMs effectively, as shown in a study by Singh et al. [123]. For instance, by using social proof and creating scenarios where the model perceives a consensus or common practice, attackers can influence LLM responses. Prompts suggesting widespread acceptance of a harmful action can lead the model to generate supportive responses.

2) MITIGATING THE AI SYSTEM VULNERABILITIES
One of the recommended ways to address risks associated with AI is leveraging frameworks like NIST's AI Risk Management Framework and fostering a culture of continuous learning [124]. The six-dimensional framework proposed by Hu and Chen [125] offers a robust approach to dissecting these double-edged swords, examining everything from offensive and defensive uses to the inherent vulnerabilities of the AI models themselves. Moving forward, continuous monitoring, stringent policies, and collaborative efforts are essential to harness the full potential of these systems while mitigating their risks.

It is also crucial to exercise caution with the information provided to GPT models, as highlighted in studies by Ananthachari and Singh [126] and Sieja and Wach [127]. Protecting personal information and adhering to data privacy measures is especially important when using publicly available LLMs, as the data input is often utilized for training.

A unique approach proposed by Huang et al. [128] aims to enhance AI's adaptability and robustness in the cognitive domain. Mimicry intelligence, inspired by biological systems, offers a promising method for enhancing the adaptability and resilience of AI in navigating and safeguarding the cognitive domain. By understanding and addressing the unique security challenges posed by the integration of cyber, physical, and cognitive realms, one can better prepare for the evolving landscape of AI-driven threats.

An interesting study has been carried out by Okey et al. [129], examining the cybersecurity implications of LLMs, revealing a dual narrative. On the one hand, sentiment analysis using the Valence Aware Dictionary and Sentiment Reasoner (VADER) [130] model reveals a significant proportion of positive sentiments (43.8%), suggesting an appreciation for ChatGPT's potential in enhancing threat detection and response mechanisms. Conversely, the roBERTa model highlights a considerable amount of negative sentiment (32.7%), reflecting concerns over its misuse in generating malicious code and facilitating cyber attacks.
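For reference, the VADER scoring used in such sentiment analyses is available off the shelf in the vaderSentiment package (the example text is ours):

from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer

analyzer = SentimentIntensityAnalyzer()
tweet = "ChatGPT helps our SOC triage alerts much faster"
# Returns negative/neutral/positive proportions plus a compound score in [-1, 1].
print(analyzer.polarity_scores(tweet))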
A pioneering study by Li et al. [131] introduced a semantic-preserving algorithm to generate multilingual datasets, revealing that LLMs exhibit varying degrees of vulnerability depending on the language. Notably, models like GPT-4 show enhanced defense mechanisms compared to their predecessors but still struggle with lower-resource languages. In the study, the researchers developed an advanced algorithm to create a multilingual jailbreak dataset, ensuring high semantic fidelity in translations. An empirical study was conducted on widely used open-source and commercial LLMs, including GPT-4 and LLaMa, across nine languages. The study found that certain languages make models more susceptible to jailbreak attacks. LLMs exhibit stronger defenses in English but are more vulnerable in other languages, especially those with fewer resources. Jailbreak templates significantly impacted the models' defense mechanisms, with higher success rates for attacks in low-resource languages like Swahili and Arabic. Fine-tuning techniques such as LoRA have proven effective in reducing the success rates of such attacks, underscoring the necessity for tailored, language-specific security strategies.

Another noteworthy contribution is by Esmradi et al. [132], who conducted a comprehensive survey identifying significant attack methods and mitigation strategies. The authors thoroughly examined direct and indirect prompt injections (PIs), highlighting the vulnerabilities LLMs face throughout their lifecycle. By reviewing over 100 recent studies, the authors provided insights into how attackers exploit these weaknesses to subvert AI functionalities. Additionally, the authors implemented and tested various attack methods, such as self-generated attacks using upgraded DAN prompts, combination attacks, and phishing schemes utilizing LLM-generated fake websites. Their work underscores the urgency of addressing these threats through proactive cybersecurity measures, including RLHF, data anonymization, encryption, and advanced filtering techniques.

Jailbreaking techniques present a substantial threat to both the security and ethical utilization of LLMs. By comprehending and addressing these vulnerabilities, developers can fortify the resilience of AI models, ensuring they function within ethical confines and contribute positively to digital interactions. Future research should concentrate on devising advanced defense mechanisms that anticipate and counteract the evolving strategies of prompt-based attacks. Another critical concern is data quality. As highlighted in this section, data play a central role in model training, and in line with the data science principle "garbage in, garbage out," it is essential to ensure that the data used are neither biased nor falsified. Furthermore, it is crucial to control the type of data inputted by users, especially if the data are used for model fine-tuning, in order to prevent potential attacks on the model.

E. NLP IN CYBERSECURITY
Natural Language Processing is an intriguing field that can significantly contribute to strengthening cybersecurity. NLP
is capable of performing numerous tasks within the security domain, and in this section the authors will discuss its potential applications based on recent research.

A novel application of NLP to cybersecurity is presented in the study by Li et al. [133]. The authors introduce NEDetector, an automated system designed to identify new cybersecurity terms, mainly hacking tools and groups, from hacker forums. This tool addresses the challenge of swiftly discovering and analyzing new terminology in the cybersecurity field. By employing a combination of Bi-LSTM and Random Forest algorithms, and leveraging comprehensive feature extraction from word, character, casing, and part-of-speech levels, NEDetector achieves a precision of 89.11% in detecting new cybersecurity terms. This innovative approach not only surpasses traditional neologism detection methods but also proves effective across various platforms, including Twitter, offering a robust tool for early warning and proactive defense against cyber threats.

In the ever-evolving landscape of cybersecurity, the ability to automatically analyze and understand vast amounts of related documents is invaluable. Georgescu's work [134] on an NLP model specifically designed for this purpose stands out. By developing a specialized ontology and leveraging the capabilities of IBM Watson Knowledge Studio, the model achieves impressive precision and recall rates, with an F1 score of 0.81 for NER and 0.58 for relation extraction.

Singh et al. [135] propose a cutting-edge solution utilizing NLP and DL models to automate software vulnerability detection. By treating source code as text and leveraging models like CodeBERT, which achieved an accuracy of 94%, the study demonstrates a significant leap in identifying and classifying vulnerabilities. This approach not only enhances detection precision but also streamlines the process through an intuitive dashboard, underscoring its potential to comprehensively fortify cybersecurity defenses.
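In deployment, such a detector reduces to a few lines once a fine-tuned checkpoint exists; the sketch below uses the Hugging Face pipeline API with a placeholder model name (not the artifact from [135]), which must be replaced by an actual vulnerability-detection checkpoint:

from transformers import pipeline

# Placeholder checkpoint: substitute a model fine-tuned for vulnerability detection.
detector = pipeline("text-classification", model="your-org/codebert-vuln-detector")

snippet = 'char buf[8]; strcpy(buf, user_input);  /* unbounded copy */'
print(detector(snippet))  # e.g. [{'label': 'VULNERABLE', 'score': 0.97}]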
Ukwen and Karabatak [136] prepared a comprehensive review illustrating the transformative role of NLP-based systems in digital forensics and cybersecurity. By leveraging ML and DL, these systems effectively process vast amounts of unstructured data, enhancing applications from disk and mobile forensics to intrusion detection and malware analysis. However, challenges persist in context comprehension and securing NLP systems against AI-driven attacks.

Another interesting approach is "MalVulDroid," a framework designed by Garg and Baliyan [137] to map malware to the vulnerabilities they exploit in Android systems. This mapping utilizes NLP techniques such as Bag-of-Words (BoW), n-gram probability generation, and TF-IDF, combined with ML classifiers like Multilayer Perceptron (MLP), SVM, RIDOR, and PART. This framework demonstrates exceptional accuracy, particularly with the MLP classifier achieving 98.04%. By providing a detailed many-to-many mapping matrix, MalVulDroid offers critical insights for developers and researchers to fortify applications against potential threats during the initial development phases.

Applying NLP to event logs is another promising approach. Steverson et al. [17] explore the application of transformer models and self-supervised learning for cyber intrusion detection using Windows Event Logs (WELs). Their study highlights the effectiveness of this approach in identifying and timing cyber attacks with near-perfect precision and recall. By leveraging WELs, which are widely available and capture diverse activities, this method facilitates decentralized, device-specific responses to intrusions. The high accuracy and promising results suggest a robust framework for autonomous endpoint defense systems, setting the stage for future advancements in multi-log analysis and context consideration.

Marinho and Holanda [138] present a framework for real-time identification and profiling of emerging cyber threats using NLP and ML. By harnessing dynamic data from Twitter and the comprehensive MITRE ATT&CK framework, their system achieves a 77% F1 score in threat profiling. This methodology exemplifies the potential of integrating open-source intelligence with structured knowledge bases to enhance situational awareness and response strategies in cybersecurity.

Andrew et al. [139] present an approach to mapping Linux shell commands to the MITRE ATT&CK framework using NLP techniques. By leveraging TF-IDF with unigram and bigram tokenization, their model achieves high recall scores, significantly aiding in the automatic identification and categorization of attacker behaviors.
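A minimal sketch of this kind of mapping (the technique descriptions are abbreviated and paraphrased here) matches a shell command to the closest technique by cosine similarity over unigram and bigram TF-IDF vectors:

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

techniques = {
    "T1003": "credential dumping reading /etc/shadow password hashes",
    "T1057": "process discovery listing running processes ps",
    "T1105": "ingress tool transfer downloading files wget curl remote",
}

vec = TfidfVectorizer(ngram_range=(1, 2))
matrix = vec.fit_transform(techniques.values())

command = "wget http://203.0.113.5/payload -O /tmp/payload"
scores = cosine_similarity(vec.transform([command]), matrix)[0]
# Report the ATT&CK technique whose description is most similar to the command.
print(max(zip(techniques, scores), key=lambda t: t[1]))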
Jha [140] delves into the transformative potential of integrating ML and NLP to bolster smart grid cybersecurity. By harnessing the capabilities of these technologies, smart grids can achieve enhanced anomaly detection, real-time threat response, and global threat intelligence. The fusion of ML and NLP provides a comprehensive approach to analyzing structured and unstructured data, enabling more effective incident response and adaptive security measures.

This section illustrates that NLP technologies offer significant benefits for cybersecurity in various applications. One of the most promising applications is the analysis of log messages to identify potential cyber attacks. Additionally, NLP models prove useful in analyzing textual data related to information security. In one case, NLP models were employed to identify key cybersecurity terms. In conclusion, it can be asserted that NLP technologies are highly valuable in enhancing cybersecurity efforts.

F. LLM AS AN OFFENSIVE TOOL
Generative AI also finds applications in offensive security, particularly in phishing. Schmitt and Flechais [141] present a robust framework for understanding the impact of generative AI on social engineering and phishing attacks. The authors demonstrate how AI's capabilities in realistic content creation, advanced targeting, and automated attack infrastructure significantly enhance the effectiveness of these attacks.
Hazell [142] empirically demonstrates the potential for LLMs like GPT-3.5 and GPT-4 to enhance and scale spear phishing campaigns. By creating spear phishing messages for over 600 British Members of Parliament using GPT-3.5 and GPT-4, Hazell shows that these models significantly lower the barrier to entry for cybercriminals. The approach by Seymour and Tully [143] involves training models using word vector representations of social media posts to generate spear phishing messages.

Research by Bethany et al. [144] revealed that AI-crafted emails, particularly those leveraging internal organizational information, had high success rates in eliciting responses from employees. Despite existing phishing training, many employees remained vulnerable to these sophisticated attacks, highlighting the need for ongoing and effective training programs. The study also demonstrated the effectiveness of advanced ML-based detection techniques, achieving an F1 score of 0.98 in identifying LLM-generated phishing emails. The study by Falade [145] demonstrates the use of generative AI models, such as FraudGPT and WormGPT, in social engineering attacks. These tools enhance the effectiveness of phishing campaigns and other malicious activities by generating highly convincing and personalized content. The study highlights the practical applications and threats posed by these technologies, including deepfake scams and voice cloning for vishing (voice phishing) attacks. FraudGPT, discovered on dark web channels, automates the creation of phishing emails, undetectable malware, and malicious websites, making sophisticated cyberattacks accessible even to less experienced attackers. WormGPT, a tool based on the GPT-J language model, is designed specifically for malicious activities, enhancing the success rate of Business Email Compromise (BEC) attacks through personalized, convincing emails.

Sharma et al. [146] investigate the comparative effectiveness of human-crafted versus GPT-3-crafted phishing emails, highlighting the significant role of cognitive biases such as authority bias in influencing susceptibility. The study reveals that while human-crafted emails are more effective, feedback mechanisms can significantly improve individuals' phishing detection skills, particularly for AI-generated emails. However, the same study still needs to be carried out with the current model, ChatGPT-4 Omni.

LLMs can be leveraged for a variety of cyberattacks beyond phishing. A study by Heim et al. [147] evaluates the potential of ChatGPT to assist in penetration testing, particularly within an educational setting using HackTheBox machines. The GPT model was employed in various stages of penetration testing, including pre-engagement interactions, intelligence gathering, threat modeling, vulnerability analysis, exploitation, and post-exploitation. The study concludes that while ChatGPT's recommendations are often valuable, they sometimes include incorrect or misleading information, underscoring the need for human oversight and verification. Furthermore, crafting effective prompts is crucial for obtaining relevant responses from ChatGPT, emphasizing the importance of prompt engineering to maximize the model's utility. A study by Feffer et al. [148] evaluated the varied practices of AI red-teaming, particularly in the context of generative AI and LLMs. The authors noted significant inconsistencies in definitions and methods, stressing the importance of iterative and inclusive approaches to effectively identify a wide range of AI vulnerabilities. The study highlighted the need for standardized reporting and transparency to enhance the utility of red-teaming results. By integrating red-teaming with other evaluation methods such as audits and model cards, a more robust framework for ensuring AI safety and trustworthiness can be established.

The study by Naito et al. [149] explored the use of ChatGPT to generate detailed attack scenarios by integrating IT asset management data and vulnerability information. This innovative approach automates the attack path mapping process, reducing reliance on traditional scanning tools. The study highlights the benefits of detailed attack scenarios that include specific CVEs, providing valuable insights for penetration testing and red-teaming. However, the practicality of these scenarios needs further validation in real-world settings.

The efficiency of generative AI in cyberattacks is also demonstrated in a study by Teichmann [150], where the author investigates how generative AI could be utilized to plan and implement ransomware attacks. The findings reveal that these AI tools significantly lower the entry barriers for non-technical criminals and enhance the sophistication of attacks by those with IT expertise. The broad availability of generative AI could lead to an increase in the number and quality of ransomware attacks. A similar conclusion can be drawn from an article by Renaud et al. [151].

The study by Yener and Gal [152] introduces the "smart adversary" model, where attackers employ sophisticated techniques to exploit AI/ML vulnerabilities. The study highlights the challenges of processing high-volume and high-velocity data and underscores the need for smarter, adaptive defenses. By incorporating explainable AI, smart data, and adversarial training, the research proposes a robust framework to counteract these advanced threats.

A novel study by Deng et al. [153] introduces PENTESTGPT, a framework leveraging LLMs to automate penetration testing tasks. The study showcases the proficiency of LLMs in specific sub-tasks but identifies challenges in maintaining context. PENTESTGPT's innovative architecture, featuring Reasoning, Generation, and Parsing modules, significantly enhances the efficiency and accuracy of penetration testing. The framework's success in real-world applications underscores its superiority to the default ChatGPT model and its practical value while highlighting the need for continued research to refine AI tools for cybersecurity.

A study by Happe and Cito [16] confirms the efficiency of LLMs (ChatGPT-3.5) in penetration testing. The authors demonstrate the potential of LLMs to assist in high-level task planning and low-level vulnerability hunting through
a closed-feedback loop with a vulnerable virtual machine. Additionally, ChatGPT can not only generate malware code and phishing emails but is also quite effective in SQL injection attacks, as demonstrated by Alawida et al. [154]. The authors also confirm AI's capability to generate polymorphic malware and craft convincing phishing emails. Tann et al. [155] explore the use of LLMs like ChatGPT, Google Bard, and Microsoft Bing in solving cybersecurity Capture The Flag (CTF) challenges and professional certification questions. The authors demonstrate that while LLMs can effectively solve factual questions and many CTF challenges, they struggle with conceptual questions, and ethical safeguards can be bypassed using jailbreak prompts. In the test cases, ChatGPT solved 6 out of 7 challenges, Bard solved 2, and Bing solved only 1.

Shandilya et al. [14] delved into the emergence of GPT-based malware, emphasizing the sophistication and evasiveness of such threats. The use of LLMs like ChatGPT to create polymorphic malware presents significant challenges for traditional detection methods. The study also explores the potential misuse of AI models through jailbreak prompts, which can trick the AI into generating malicious code. To address these threats, the researchers propose advanced detection methods, improved user authentication, and regular adversarial testing.

An interesting study by McKee and Noever [156] illustrates the powerful capabilities of LLMs in generating sophisticated and varied forms of malware. The ability to automate these processes and enhance the complexity of attacks poses significant challenges for traditional cybersecurity defenses. The research highlights the urgent need for advanced AI-driven defensive strategies and real-time anomaly detection systems to counteract these emerging threats.

Beckerich et al. [157] present a proof-of-concept called "RatGPT," demonstrating how generative AI models like ChatGPT can be exploited to deploy Remote Access Trojans (RATs). By leveraging vulnerable plugins and dynamic IP generation, the study shows how attackers can establish undetected communication with victims' systems. A seemingly innocent executable is delivered to the victim, which, upon execution, uses ChatGPT to generate and execute payloads. The payload connects to an attacker's command and control (C2) server, allowing remote control without direct interaction with the victim's system. The power of LLMs was also demonstrated in an article by Gupta et al. [158], where the authors engaged LLMs in the creation of malware such as WannaCry, NotPetya, Ryuk, REvil, and Locky. The article demonstrates that ChatGPT can generate code snippets for ransomware, including the encryption process and ransom note generation. The AI can provide detailed steps and code to execute such attacks. Further, the power of LLMs was confirmed by Pa et al. [159] in an article where the authors explore the capabilities of generative AI technologies like ChatGPT and Auto-GPT in developing malware. The authors demonstrate that these models can generate functional malicious code within minutes, highlighting significant gaps in safety controls. In his recent study [160], Botacin confirms the strength of the GPT-3 model in malware generation but also highlights that GPT-3 struggles to generate complex malware from simple prompts. A subsequent study by Happe et al. [161] introduces Wintermute, an LLM-guided privilege escalation tool designed to automate and prototype penetration testing tasks. Wintermute utilizes prompts to guide LLMs in discovering and exploiting vulnerabilities. The empirical analysis revealed that GPT-4 outperforms other models in detecting and exploiting file-based vulnerabilities, compared to GPT-3.5-turbo and Llama2. However, several challenges were faced, such as maintaining focus during testing, coping with errors, and handling multi-step exploitation paths. For instance, LLMs often repeated enumeration commands and failed to exploit found vulnerabilities effectively.

In general, it can be seen that LLMs like ChatGPT pose a significant risk to the CIA Triad. As highlighted by Chowdhury et al. [162], there are several concerns, including the storage of sensitive user data, the generation of fake information, and the facilitation of cyberattacks. The study underscores the need for enhanced security measures, ethical guidelines, and continuous monitoring to prevent the misuse of AI. These insights are crucial for understanding the challenges associated with AI-generated content and developing effective strategies to protect information confidentiality, integrity, and availability.

Other studies also confirm the dangerous impact of LLMs on cybersecurity due to their vast range of applications in cyberattacks [163], [164], [165]. The importance of enhanced security measures, user education, and regulatory frameworks to prevent the misuse of AI technologies was highlighted in a study by Iqbal et al. [166]. Derner and Batistič [167] emphasize the need for proper protection against LLM jailbreaking, demonstrating that the security measures of LLMs can be bypassed. Dash and Sharma [168] further highlight that generative AI can be exploited to create sophisticated fake content (deepfakes), posing significant threats to cybersecurity.

As demonstrated, LLMs are highly efficient in offensive cybersecurity and should be considered a valuable tool in the penetration tester's toolkit. As anticipated, LLMs excel in generating phishing messages and even producing code for malicious software. Surprisingly, they are also capable of solving various CTF challenges, further highlighting their proficiency across different attack vectors. In conclusion, while LLMs are powerful tools, their potential for harm necessitates careful and responsible usage.

G. LLM IN DEFENSIVE CYBERSECURITY
This section will examine the implementation of LLMs as a defensive tool within the cybersecurity domain. Rigaki
et al. [169] delve into the application of LLMs, such as GPT-3.5 and GPT-4, as agents in cybersecurity environments. The authors introduce NetSecGame, an innovative network security environment designed for realistic and modular testing. The study illustrates that LLM agents can surpass traditional reinforcement learning agents and human testers in planning and executing complex cybersecurity tasks. These findings underscore the potential of LLMs to revolutionize automated network security, notwithstanding challenges like cost and model instability.

An intriguing study conducted by Roy et al. [170] developed a highly accurate BERT-based detection tool designed to identify and block malicious prompts, demonstrating the tool's effectiveness across various platforms. This research underscores the critical role of proactive defense mechanisms and ethical considerations in safeguarding AI technologies.

In another study, Koide et al. [171] introduced ChatPhishDetector, a system leveraging LLMs (GPT-4V) to detect phishing sites with remarkable precision and recall. By combining textual and visual analysis, the system excels in identifying suspicious domains and social engineering techniques across multiple languages. This approach not only surpasses existing detection methods but also highlights the significance of advanced prompt engineering and contextual understanding in enhancing cybersecurity defenses.

Jiang [172] delves into the usage of LLMs such as GPT-3.5 and GPT-4 for scam detection. The author outlines comprehensive steps required to build an effective scam detector, from data collection to integration into target systems. The study's preliminary evaluation demonstrated the models' proficiency in identifying scam indicators like unusual sender addresses and suspicious links. This research highlights the potential of LLMs in bolstering scam detection mechanisms and underscores the importance of continuous refinement and collaboration with cybersecurity experts to combat evolving threats.

The study by Heiding et al. [173] compared the effectiveness of phishing emails generated by LLMs like GPT-4 with manually created emails using the V-Triad framework. The authors found that combining human expertise with AI significantly improved the success rates of phishing emails, while reducing the cost and effort for attackers. On the defensive side, LLMs, particularly Claude [174], exhibited strong capabilities in detecting phishing attempts, sometimes surpassing human detection rates. The economic analysis revealed that AI-enabled phishing significantly reduces the cost and effort involved in creating sophisticated phishing attacks, thereby increasing the incentives for attackers to employ AI in phishing campaigns. Notably, emails manually created using V-Triad proved to be the most effective. The study by Trad and Chehab [175] compared the effectiveness of prompt engineering versus fine-tuning LLMs for phishing detection. The authors discovered that fine-tuning models like GPT-2 specifically for phishing URL detection significantly outperformed prompt-engineered models. Fine-tuned models achieved higher accuracy and robustness, making them better suited for real-world applications where phishing URLs are less prevalent.
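The prompt-engineering baseline in such comparisons can be as simple as the sketch below, which uses the OpenAI Python client; the model name is illustrative (the study in [175] fine-tuned GPT-2 instead), and an API key is assumed to be configured:

from openai import OpenAI

client = OpenAI()  # expects OPENAI_API_KEY in the environment

url = "http://paypa1-secure-login.example/verify"
prompt = ("Classify the following URL as 'phishing' or 'legitimate'. "
          "Answer with one word only.\nURL: " + url)

resp = client.chat.completions.create(
    model="gpt-4o-mini",  # illustrative choice
    messages=[{"role": "user", "content": prompt}],
)
print(resp.choices[0].message.content)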
An interesting countermeasure to phishing was proposed by Cambiaso and Caviglione [176]. The authors engage ChatGPT to involve scammers in automated and pointless communications, with the aim of wasting scammers' time and resources. The study shows that AI successfully engaged scammers in extended email threads, with some interactions lasting up to 27 days. The AI-generated responses were effective in keeping scammers engaged and wasting their resources.

McHugh [177] proposed a method to address phishing mail generation, where an anti-expert policy model effectively reduced the generation of phishing content by GPT-3. This study demonstrates that LLMs can be controlled through policy interventions, highlighting that custom-trained policy models can significantly curb the production of phishing emails and enhance cybersecurity defenses. The research underscores the emerging threat of AI-Crime-as-a-Service and the importance of addressing ethical and legal considerations.

An intriguing invention by Kaheh et al. [178], Cyber Sentinel, is a cybersecurity dialogue system leveraging GPT-4 to streamline security tasks. This system excels in both explaining cyber threats (Explainable AI) and taking direct security actions (Actionable AI), thereby enhancing transparency and operational efficiency. Cyber Sentinel integrates multiple components, including an Indicators of Compromise (IoC) signature database, a SIEM system, and an LLM. The study acknowledges several limitations, such as the need for human oversight, potential privacy concerns, regulatory compliance challenges, and the resource-intensive nature of deploying and maintaining such systems, thus indicating a need for further research in the field.

Moreover, LLMs can be utilized to create Governance, Risk, and Compliance (GRC) policies aimed at mitigating ransomware attacks involving data exfiltration, as demonstrated in a study by McIntosh et al. [179]. The findings reveal that GPT-4-generated policies, when provided with tailored input prompts, can outperform traditional human-generated policies in terms of effectiveness, efficiency, and completeness. However, the study also emphasizes the critical role of human oversight to ensure accuracy and compliance with ethical and legal standards.

The approach by Lempinen et al. [180] integrates ChatGPT-3.5 and Wazuh. This chatbot analyzes security logs and performs actions such as blocking IP addresses and restarting agents. User feedback indicated that the chatbot is easy to use and effective in providing detailed security insights, particularly benefiting users with limited cybersecurity expertise. Nevertheless, technical issues and the need for further enhancements were identified. The study by Prasad et al. [181] confirms this theory and highlights the potential of ChatGPT in supporting Chief Information Security Officers (CISOs) and enhancing cybersecurity
management. ChatGPT demonstrated significant capabilities in defining the role of CISOs, creating cybersecurity frameworks, generating awareness content, and automating security operations.

The efficiency of LLMs in cybersecurity was also demonstrated in smart grid applications by Zaboli et al. [182], showcasing the broad range of applications where LLMs can be effectively deployed. LLM efficiency was also proven in a study by Ali and Kostakos [183], where researchers integrated ML-based anomaly detection with explainable AI and LLMs to enhance cybersecurity operations. The proposed system, named HuntGPT, was used to provide interpretable explanations for detected anomalies. Evaluation results indicate substantial proficiency in technical accuracy and response readability, underscoring the potential of such integrated systems to improve cybersecurity operations.
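A stripped-down version of that pattern, an anomaly detector plus an LLM asked to explain each flagged record, can be sketched as follows (the feature values are invented, and HuntGPT itself is considerably more elaborate):

import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(1)
flows = rng.normal(size=(500, 4))  # stand-in for extracted network flow features
flows[0] = [9.0, 9.0, 9.0, 9.0]    # an obvious outlier

detector = IsolationForest(random_state=1).fit(flows)
flagged = np.where(detector.predict(flows) == -1)[0]

for idx in flagged[:1]:
    # The explanation step hands the anomalous record to an LLM of choice.
    prompt = (f"Network flow {flows[idx].round(2).tolist()} was flagged as "
              "anomalous. Explain plausible causes for a SOC analyst.")
    print(prompt)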
ever, it’s not only ChatGPT that is used in defensive applica- overlapping threat descriptions, combined with an efficient
tions. Ferrag et al. [184] demonstrated that FalconLLM 40B method for selecting hard negatives, leads to notable
is also effective in automated software vulnerability detec- improvements in accuracy and robustness.
tion. Utilizing datasets such as FormAI and FalconVulnDB, Another interesting idea is a novel intrusion detection
SecureFalcon achieved remarkable accuracy in both binary framework that integrates BERT with Conditional Gen-
classification (94%) and multiclassification (92%). Another erative Adversarial Networks (CGAN). This approach by
study on software vulnerabilities explored the application Li et al. [194] addresses the challenges of class imbalance
of LLMs for detecting software vulnerabilities. Authors and limited feature extraction capabilities in traditional IDS
evaluated models like GPT-3.5-Turbo, Davinci, and CodeGen models. By augmenting minority attack samples and enhanc-
on datasets including Code Gadgets and CVEfixes. The ing feature extraction through BERT, the proposed model
findings indicate that while LLMs excel in recognizing subtle achieves significant improvements in detection accuracy
code patterns, they suffer from high false positive rates. Fine- across multiple datasets.
tuning significantly improves performance, underscoring the The efficiency of integrating LLMs in IDS was also
importance of tailored training [185]. shown in a study by Markevych and Dawson [195], which
The BERT model also depicted decent performance in highlights successful applications in sectors like banking and
another study by Ferrag et al. [186]. Utilizing the novel financial services, where AI-driven IDS have demonstrated
Privacy-Preserving Fixed-Length Encoding (PPFLE) tech- high efficacy. However, challenges related to computational
nique, SecurityBERT, a BERT-based architecture, achieved complexity, data privacy, and scalability remain.
remarkable accuracy (98.2%) and rapid inference times Guastalla et al. [196] investigated the application of
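The class-imbalance step can be approximated with off-the-shelf oversampling; the sketch below uses SMOTE from imbalanced-learn as a stand-in for the paper's CGAN-based augmentation, applied to synthetic data:

import numpy as np
from imblearn.over_sampling import SMOTE

rng = np.random.default_rng(2)
# 950 benign samples and only 50 attack samples: a typical IDS imbalance.
X = np.vstack([rng.normal(0, 1, (950, 8)), rng.normal(3, 1, (50, 8))])
y = np.array([0] * 950 + [1] * 50)

X_res, y_res = SMOTE(random_state=2).fit_resample(X, y)
print("Before:", np.bincount(y), "After:", np.bincount(y_res))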
The efficiency of integrating LLMs in IDS was also shown in a study by Markevych and Dawson [195], which highlights successful applications in sectors like banking and financial services, where AI-driven IDS have demonstrated high efficacy. However, challenges related to computational complexity, data privacy, and scalability remain.

Guastalla et al. [196] investigated the application of LLMs for detecting DDoS attacks in IoT networks. Utilizing datasets like CICIDS 2017 and the Urban IoT Dataset, the study demonstrated that LLMs, through few-shot learning and fine-tuning, can achieve high accuracy and provide insightful explanations for their predictions. Despite outperforming traditional neural networks, challenges such as hallucinations in fine-tuned models and the high cost of advanced models like GPT-4 were noted.

The study by Mikhalev et al. [197] reveals that GPT-4 excels in fundamental and intermediate cryptographic queries, achieving near-perfect scores. However, the model shows limitations in handling complex tasks, often propagating initial errors and making unwarranted assumptions.

Wang et al. [198] introduce SELF-GUARD, a novel methodology that equips LLMs with the capability to protect themselves against jailbreak attacks. By integrating the advantages of safety training and inherent safeguards, the SELF-GUARD method trains LLMs to scrutinize their responses and append suitable tags indicating harmful or harmless content. This two-stage training strategy not
176770 VOLUME 12, 2024


I. Hasanov et al.: Application of Large Language Models in Cybersecurity: A SLR

only bolsters the LLMs’ proficiency in identifying harmful As demonstrated in this section, LLMs hold significant
material but also ensures they maintain high performance potential in cyber defense. LLMs like ChatGPT have
across various tasks. This approach provides robust defense proven to be highly efficient across various industries,
against evolving attack techniques while preserving the with particularly exceptional performance in text-related
general functionalities of the LLMs. tasks such as phishing prevention and software vulnerability
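Once a model has been trained to tag its own output in this way, the deployment-side check reduces to a small filter. The literal tag strings and refusal message below are illustrative assumptions; the substance of SELF-GUARD lies in the training that makes the tags reliable:

```python
# Minimal sketch of tag-based output filtering in the style of SELF-GUARD [198].
# The tag strings and refusal message are illustrative assumptions; in the
# actual method the model is *trained* to append such tags to its responses.
HARMFUL_TAG, HARMLESS_TAG = "[harmful]", "[harmless]"

def filter_response(tagged_response: str) -> str:
    """Suppress responses the model itself has tagged as harmful."""
    body, _, tag = tagged_response.rpartition(" ")
    if tag == HARMFUL_TAG:
        return "Sorry, I cannot help with that request."
    if tag == HARMLESS_TAG:
        return body  # strip the tag before returning the answer to the user
    return tagged_response  # untagged output passes through unchanged

print(filter_response("Here is how to build a phishing kit ... [harmful]"))
```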
An intriguing application of ChatGPT is illustrated in a study by Shchavinsky et al. [199], where the authors demonstrated that the integration of AI facilitates the rapid creation of realistic and pertinent training scenarios. This significantly enhances the efficiency and effectiveness of the learning process. The study emphasizes the importance of cultivating technical and managerial competencies through practical applications and real-life situations. However, it also highlights the necessity for ongoing critical assessment and refinement of AI-generated content to ensure it aligns with legal, ethical, and contextual standards.

Marshall [200] delves into the profound influence of LLMs like ChatGPT on cybersecurity. His research underscores how LLMs can bolster cybersecurity through efficient code generation and swift threat detection. However, the study also raises pivotal concerns about the misuse of these models in generating phishing emails and malware, thereby lowering the threshold for cybercriminal activities. Real-world examples and experimental findings highlight the potential risks, emphasizing the necessity for robust safeguards and heightened awareness among cybersecurity professionals. Numerous other research articles echo these findings, demonstrating LLMs' efficacy in log analysis, threat detection, vulnerability assessment, and incident response, while also identifying issues such as job displacement and potential misuse of LLMs [201], [202], [203], [204]. Karlsen et al. [205] benchmark several LLMs for log analysis and security using the LLM4Sec pipeline. The authors' study evaluates models like BERT, RoBERTa, DistilRoBERTa, GPT-2, and GPT-Neo across six datasets, revealing that fine-tuning significantly enhances performance. Notably, DistilRoBERTa achieves near-perfect F1 scores, surpassing current state-of-the-art models.
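The fine-tuning behind such results follows the standard sequence-classification recipe. The sketch below does this for DistilRoBERTa; the dataset file, binary label scheme, and hyperparameters are illustrative assumptions rather than those of the LLM4Sec benchmark:

```python
# Sketch of fine-tuning DistilRoBERTa as a log-line classifier, in the spirit
# of the LLM4Sec experiments in [205]. Dataset, labels, and hyperparameters
# are illustrative assumptions, not those of the benchmark.
from datasets import load_dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          DataCollatorWithPadding, Trainer, TrainingArguments)

tokenizer = AutoTokenizer.from_pretrained("distilroberta-base")
model = AutoModelForSequenceClassification.from_pretrained(
    "distilroberta-base", num_labels=2)  # 0 = normal log line, 1 = anomalous

# Hypothetical CSV with columns "text" (the log line) and "label" (0 or 1).
data = load_dataset("csv", data_files="labeled_logs.csv")

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=128)

data = data.map(tokenize, batched=True)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="log-classifier", num_train_epochs=3,
                           per_device_train_batch_size=32),
    train_dataset=data["train"],
    data_collator=DataCollatorWithPadding(tokenizer),  # pads each batch
)
trainer.train()  # F1 on a held-out split would then be computed per dataset
```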
Shafee et al. [206] assess the performance of various LLM chatbots, including ChatGPT and GPT4all, for Open Source Intelligence (OSINT)-based Cyber Threat Intelligence (CTI). The authors' findings indicate that while these chatbots excel in binary classification tasks, achieving F1 scores of 0.94 and 0.90, respectively, they fall short compared to specialized models in named entity recognition (NER). This underscores the potential of LLM chatbots to enhance cyber threat awareness, but also highlights the need for targeted training and optimization to improve their NER capabilities. These findings suggest a pathway for integrating LLM chatbots into CTI tools, balancing their strengths in classification with ongoing improvements in entity recognition.

In general, researchers acknowledge the substantial potential of LLMs, yet they emphasize the need for robust regulatory frameworks, ethical guidelines, and continuous monitoring to ensure responsible use [207], [208].

As demonstrated in this section, LLMs hold significant potential in cyber defense. LLMs like ChatGPT have proven to be highly efficient across various industries, with particularly exceptional performance in text-related tasks such as phishing prevention and software vulnerability analysis. Notably, LLMs also exhibit strong capabilities in understanding cybersecurity frameworks and providing consultations on related matters.

V. DISCUSSION AND RESULTS
In this article, a thorough systematic literature review was conducted to discern current research directions and methods in the application of AI and LLMs in cybersecurity. The analysis of the literature was guided by the three research questions defined in the introduction of this article. In this section, the findings of the literature analysis of Section IV are discussed, and results are derived from them to answer the research questions.

AI has profoundly revolutionized the field of cybersecurity, for example with successful applications of ML and DL technologies to log analysis, intrusion detection and intrusion prevention. The recent emergence of LLMs has further influenced the cybersecurity landscape. AI, encompassing ML, DL, and LLMs, can be harnessed as both a defensive and an offensive instrument. The swift progression of LLMs has ushered in new possibilities for both attackers and defenders. In 2023 and 2024, several new LLMs were introduced, including ChatGPT-4, ChatGPT-4o, and LLaMA 2. By the end of 2025, OpenAI is poised to unveil a new iteration of ChatGPT, anticipated to be markedly more potent than the current LLMs. The next milestone in AI development is Artificial General Intelligence (AGI) [209], foreseen as an exceedingly powerful tool. As AI advances, the security landscape is expected to evolve in tandem. The authors consider that AGI, under specific circumstances, might be capable of compromising some of the contemporary cryptographic methods. Consequently, the advent of more advanced AI will necessitate the development of more sophisticated defense mechanisms and will introduce new threat landscapes.

With the first research question, the authors set out to determine how effective AI and LLMs are in cybersecurity applications. Based on the literature analysis, it is possible to seamlessly integrate AI into IDS and IPS systems, where a host of ML algorithms can be tailored to address specific cybersecurity challenges. Furthermore, AI proves to be invaluable in threat detection and phishing prevention, a testament to its efficacy seen in the success of numerous spam filters. The body of research in this domain is extensive, encompassing both practical and theoretical studies. During the course of the literature analysis, it was noted that there are no existing studies discussing the potential of LLMs in network traffic analysis. However, based on this extensive review of the literature, the authors hypothesize that it is feasible to train or develop an LLM specifically for network traffic analysis, which would enable the LLM to classify certain packet streams as either legitimate or malicious. Such a model could potentially offer explanations for specific packets or frames, providing detailed insights on whether the traffic is malicious or if there are peculiarities (e.g., the use of an outdated protocol that, while not malicious, should be avoided). It is believed that such a model would make a significant contribution to the cybersecurity industry and enhance security accessibility for SMEs and users who are not highly trained.
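A hypothetical interface for such a traffic-analysis assistant might look like the sketch below. Everything in it is an assumption used only to make the idea concrete: the packet-summary format, the prompt, and the generic llm callable standing in for a suitably trained model:

```python
# A minimal sketch of the traffic-analysis assistant hypothesized above.
# All names and formats here are assumptions for illustration: the
# packet-summary layout, the prompt, and the generic `llm` callable.
def summarize_stream(packets: list[dict]) -> str:
    """Condense a captured packet stream into a textual summary for the LLM."""
    lines = [
        f"{p['src']} -> {p['dst']} {p['proto']} len={p['length']}"
        for p in packets
    ]
    return "\n".join(lines)

def classify_stream(llm, packets: list[dict]) -> str:
    """Ask the model for a verdict plus any noteworthy peculiarities."""
    prompt = (
        "Label the following packet stream as LEGITIMATE or MALICIOUS, and "
        "note any peculiarities (e.g., outdated protocols) worth flagging:\n"
        + summarize_stream(packets)
    )
    return llm(prompt)  # e.g., a model fine-tuned on labeled traffic captures

example = [
    {"src": "10.0.0.5", "dst": "10.0.0.9", "proto": "TELNET", "length": 74},
]
# classify_stream(model, example) might answer: "LEGITIMATE, but Telnet is
# unencrypted and outdated; SSH should be used instead."
```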
Beyond LLMs, NLP technologies have also proven effective, notably in identifying emergent cybersecurity terminologies—such as new hacking tools and malware names from hacker forums—and in analyzing vast datasets, which is pivotal in digital forensics and cybersecurity investigations.

With the second research question, the authors studied the literature to find out in what ways Large Language Models are applied in cybersecurity tactics. Within the cybersecurity realm, LLMs can be deployed for both offensive and defensive purposes. The authors' literature analysis underscores the formidable power of LLMs in offensive operations. Certain studies highlight the efficiency of LLMs in Capture-the-Flag (CTF) challenges, social engineering attacks, and various other domains. Hence, the regulation of this power through safety mechanisms is of paramount importance. It is essential to recognize that the risks associated with LLMs extend beyond the field of cybersecurity. LLMs can disseminate sensitive information, such as manufacturing instructions for explosives and schematics for firearms. Controlling these information sources to avoid malicious purposes is critical. Although LLM safety mechanisms aim to curb the spread of such information, jailbreak techniques can still be employed to extract sensitive data or coerce the LLM into performing illicit or unethical actions. LLMs have also expedited the creation of phishing emails and have proven effective in streamlining penetration testing processes, assisting with each phase. Furthermore, LLMs are instrumental in devising attacks against other AI systems.

While LLMs exhibit remarkable efficacy in offensive operations, they are equally potent in defense. LLMs and NLP technologies are particularly effective in countering social engineering attacks due to their capacity to analyze written and spoken language, making them invaluable tools in such contexts. Furthermore, LLMs are proficient in policy generation, code analysis, IDS/IPS systems, and other areas. Another intriguing application of LLMs is in explainable AI, where cybersecurity engineers can receive explanations or human-readable analyses for specific alerts or logs. This is viewed as having considerable potential, particularly for educational purposes. Similarly, in log analysis, explainable AI can save engineers time by providing immediate explanations for specific logs without necessitating in-depth analysis.
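In practice, this explainable-AI usage amounts to pairing each alert or log line with a natural-language question, as in the following sketch (the log line, prompt, and llm callable are again illustrative assumptions):

```python
# Sketch of the explainable-AI usage described above: asking an LLM for a
# human-readable explanation of a single log line. The log line, prompt,
# and generic `llm` callable are illustrative assumptions.
ALERT = ("sshd[2154]: Failed password for invalid user admin "
         "from 203.0.113.7 port 52114 ssh2")

def explain_log_line(llm, line: str) -> str:
    """Return a short analyst-oriented explanation of one log line."""
    prompt = (
        "Explain in two sentences what this log line means for a SOC analyst "
        "and whether it warrants escalation:\n" + line
    )
    return llm(prompt)

# A typical answer would note repeated failures from one source address as a
# possible brute-force attempt, saving the engineer an in-depth lookup.
```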
The third research question was defined to deduce the challenges and limitations of Large Language Models in the context of cybersecurity. Based on the literature analysis, scholars posit that continuing research is imperative, as AI is expected to significantly transform the threat landscape in the imminent future. The majority of the analyzed literature emphasized the application of AI and LLMs in defensive security. The offensive capabilities of AI were predominantly explored at the LLM level, alongside the defensive functionalities of these models. The analysis results highlight the inherent vulnerabilities in AI and LLM technologies that must be addressed during development, such as susceptibility to poisoning, backdoors, evasion, model stealing, membership inference, and property inference attacks.

The occasional "hallucinations" of LLMs necessitate consideration to mitigate potential risks. The analyzed research works advise caution against an uncritical reliance on LLMs and AI. The authors' study revealed numerous articles reporting instances where LLMs produced delusional or inaccurate outputs. This introduces significant concerns about the reliability of LLMs in scenarios where human oversight is unfeasible or the stakes are exceedingly high. For instance, entrusting LLMs with calculating rocket trajectories or developing new pharmaceuticals poses substantial risks. This caution extends to cybersecurity, questioning the prudence of relying solely on LLMs for threat identification. While findings generally indicate a high accuracy rate in LLM outputs—trustworthy in nine out of ten cases—it is imperative that LLMs employed in specific fields be trained with domain-specific data. Hence, while general-purpose LLMs demonstrate utility, they exhibit a higher error rate and are less advisable for specialized cybersecurity applications.

Establishing guidelines or ethical frameworks governing the use of LLMs is vital, and many researchers are actively engaged in this effort. Additionally, some studies focus on the legal aspects of LLMs, emphasizing the importance of determining accountability in the global deployment of LLMs, especially within the governmental sector.

The authors believe that before LLMs can be implemented at the governmental or corporate level, it is essential to establish a legal and ethical framework for their use. A good example is the policies adopted by many universities, such as the University of Turku (Finland) [210], which state that AI can be used as a tool, provided its use is disclosed. This approach offers students numerous opportunities for research and study. Similarly, the authors believe that instead of banning AI, corporations and governments should implement guidelines and limitations that regulate its use, ensuring responsible and transparent practices.

VI. CONCLUSION
This systematic literature review and analysis contributed an exhaustive examination of the current deployments and use cases of LLMs and defensive AI techniques within cybersecurity, unveiling both potential pitfalls and advantages. Additionally, it addresses cyberethics and the legal foundations for the use of LLMs.

During the literature review, it became apparent that LLMs and AI have great potential for utilization in cybersecurity.
LLMs have shown remarkable efficacy in phishing attack simulations and in cybersecurity governance, even defending against sophisticated exploits. Additionally, LLMs hold the potential for developing security software, further cementing their role as a formidable tool in cybersecurity innovation. AI and LLMs are versatile, with applications ranging from secure coding to traffic analysis. AI and LLMs substantially reduce the entry barriers for hackers while proving immensely beneficial for penetration testers. For example, LLMs are able to perform penetration testing tasks, and they are also highly efficient at generating phishing messages.

Certain limitations of LLMs were also observed during this study. For instance, in specific fields, the performance of LLMs was found to be below an adequate level. Additionally, there were instances where these models generated hallucinations or inaccurate information, underscoring the importance of thoroughly cross-checking any output provided by LLMs to ensure its reliability. The results of the literature analysis underscore the utility and power of LLM tools in data analysis and text review within the field of cybersecurity, reinforcing the argument for their value in cybersecurity applications. The results also lead to the observation that LLMs can significantly enhance the efficiency of individual workers. They can boost productivity through explainable AI and provide valuable insights into different sub-fields of cybersecurity. The usage of LLMs in cybersecurity is already extensive, including both offensive and defensive applications, but much of their potential remains untapped, and significant vulnerabilities and ethical concerns need to be addressed in their deployment for cybersecurity applications. To conclude, the authors observe a lack of sufficient studies addressing the use of LLMs in network security, highlighting a potential research gap that should be explored.

ACKNOWLEDGMENT
During the preparation of this article, to enhance the clarity and coherence of their work, artificial intelligence (specifically the ChatGPT-4o LLM) was used for proofreading, text refinement, and statistical analysis. After application of the AI, all text was manually reviewed by the authors to ensure the accuracy of both the content and the data presented.

REFERENCES
[1] J. Heino, C. Jalio, A. Hakkala, and S. Virtanen, "JAPPI: An unsupervised endpoint application identification methodology for improved zero trust models, risk score calculations and threat detection," Comput. Netw., vol. 250, Aug. 2024, Art. no. 110606.
[2] T. Sowmya and E. A. M. Anita, "A comprehensive review of AI based intrusion detection system," Meas., Sensors, vol. 28, Aug. 2023, Art. no. 100827.
[3] G. Suarez-Tangil, E. Palomar, A. Ribagorda, and I. Sanz, "Providing SIEM systems with self-adaptation," Inf. Fusion, vol. 21, pp. 145–158, Jan. 2015.
[4] M. Patel, P. P. Amritha, V. B. Sudheer, and M. Sethumadhavan, "DDoS attack detection model using machine learning algorithm in next generation firewall," Proc. Comput. Sci., vol. 233, pp. 175–183, Jan. 2024.
[5] X. Hou, Y. Zhao, Y. Liu, Z. Yang, K. Wang, L. Li, X. Luo, D. Lo, J. Grundy, and H. Wang, "Large language models for software engineering: A systematic literature review," ACM Trans. Softw. Eng. Methodol., pp. 1–76, Sep. 2024.
[6] D. Mon Divakaran and S. T. Peddinti, "LLMs for cyber security: New opportunities," 2024, arXiv:2404.11338.
[7] Y. Yao, J. Duan, K. Xu, Y. Cai, Z. Sun, and Y. Zhang, "A survey on large language model (LLM) security and privacy: The good, the bad, and the ugly," High-Confidence Comput., vol. 4, no. 2, Jun. 2024, Art. no. 100211.
[8] Y. Yigit, W. J. Buchanan, M. G. Tehrani, and L. Maglaras, "Review of generative AI methods in cybersecurity," 2024, arXiv:2403.08701.
[9] G. de Jesus Coelho da Silva and C. B. Westphall, "A survey of large language models in cybersecurity," 2024, arXiv:2402.16968.
[10] H. Xu, S. Wang, N. Li, K. Wang, Y. Zhao, K. Chen, T. Yu, Y. Liu, and H. Wang, "Large language models for cyber security: A systematic literature review," 2024, arXiv:2405.04760.
[11] M. Guven, "A comprehensive review of large language models in cyber security," Int. J. Comput. Experim. Sci. Eng., vol. 10, no. 3, pp. 507–516, Sep. 2024.
[12] B. Kitchenham and S. M. Charters, "Guidelines for performing systematic literature reviews in software engineering," Keele Univ. and Durham Univ., Durham, U.K., Tech. Rep. EBSE-2007-01, Dec. 2006. [Online]. Available: https://www.researchgate.net/publication/302924724_Guidelines_for_performing_Systematic_Literature_Reviews_in_Software_Engineering
[13] M. J. Page, J. E. McKenzie, P. M. Bossuyt, I. Boutron, T. C. Hoffmann, C. D. Mulrow, L. Shamseer, J. M. Tetzlaff, E. A. Akl, S. E. Brennan, R. Chou, J. Glanville, J. M. Grimshaw, A. Hrobjartsson, M. M. Lalu, T. Li, E. W. Loder, E. Mayo-Wilson, S. McDonald, L. A. McGuinness, L. A. Stewart, J. Thomas, A. C. Tricco, V. A. Welch, P. Whiting, and D. Moher, "The PRISMA 2020 statement: An updated guideline for reporting systematic reviews," BMJ, vol. 372, pp. 1–9, Jan. 2021.
[14] S. K. Shandilya, G. Prharsha, A. Datta, G. Choudhary, H. Park, and I. You, "GPT based malware: Unveiling vulnerabilities and creating a way forward in digital space," in Proc. Int. Conf. Data Secur. Privacy Protection (DSPP), Oct. 2023, pp. 164–173.
[15] X. Ding, B. Liu, Z. Jiang, Q. Wang, and L. Xin, "Spear phishing emails detection based on machine learning," in Proc. IEEE 24th Int. Conf. Comput. Supported Cooperat. Work Design (CSCWD), Dalian, China, May 2021, pp. 354–359.
[16] A. Happe and J. Cito, "Getting pwn'd by AI: Penetration testing with large language models," in Proc. 31st ACM Joint Eur. Softw. Eng. Conf. Symp. Found. Softw. Eng., San Francisco, CA, USA, Nov. 2023, pp. 2082–2086.
[17] K. Steverson, C. Carlin, J. Mullin, and M. Ahiskali, "Cyber intrusion detection using natural language processing on Windows event logs," in Proc. Int. Conf. Mil. Commun. Inf. Syst. (ICMCIS), The Hague, The Netherlands, May 2021, pp. 1–7.
[18] European Parliament. (Mar. 2024). Artificial Intelligence Act: European Parliament Legislative Resolution. Strasbourg, France. Accessed: Feb. 26, 2024. [Online]. Available: https://www.europarl.europa.eu/doceo/document/TA-9-2024-0138_EN.pdf
[19] V. Moret-Bonillo, "Emerging technologies in artificial intelligence: Quantum rule-based systems," Prog. Artif. Intell., vol. 7, no. 2, pp. 155–166, Jan. 2018.
[20] I. E. Naqa and M. J. Murphy, "What is machine learning?" in Machine Learning in Radiation Oncology: Theory and Applications. Cham, Switzerland: Springer, 2015.
[21] Z. Zhao and H. Liu, "Spectral feature selection for supervised and unsupervised learning," in Proc. 24th Int. Conf. Mach. Learn., Jun. 2007, pp. 1151–1157.
[22] Y. LeCun, Y. Bengio, and G. Hinton, "Deep learning," Nature, vol. 521, no. 7553, pp. 436–444, May 2015.
[23] R. Mohammed, J. Rawashdeh, and M. Abdullah, "Machine learning with oversampling and undersampling techniques: Overview study and experimental results," in Proc. 11th Int. Conf. Inf. Commun. Syst. (ICICS), Irbid, Jordan, Apr. 2020, pp. 243–248.
[24] G. M. Foody, "Challenges in the real world use of classification accuracy metrics: From recall and precision to the Matthews correlation coefficient," PLoS ONE, vol. 18, no. 10, Oct. 2023, Art. no. e0291908.
[25] S. Narkhede, "Understanding AUC-ROC curve," Towards Data Sci., vol. 26, no. 1, pp. 220–227, 2018.
[26] C. Raffel, N. Shazeer, A. Roberts, K. Lee, S. Narang, M. Matena, Y. Zhou, W. Li, and P. J. Liu, "Exploring the limits of transfer learning with a unified text-to-text transformer," 2019, arXiv:1910.10683.
[27] H. Touvron, T. Lavril, G. Izacard, X. Martinet, M.-A. Lachaux, T. Lacroix, B. Rozière, N. Goyal, E. Hambro, F. Azhar, A. Rodriguez, A. Joulin, E. Grave, and G. Lample, "LLaMA: Open and efficient foundation language models," 2023, arXiv:2302.13971.
[28] S. Wu, O. Irsoy, S. Lu, V. Dabravolski, M. Dredze, S. Gehrmann, P. Kambadur, D. Rosenberg, and G. Mann, "BloombergGPT: A large language model for finance," 2023, arXiv:2303.17564.
[29] J. Zhou, J. Ji, J. Dai, and Y. Yang, "Sequence to sequence reward modeling: Improving RLHF by language feedback," 2024, arXiv:2409.00162.
[30] T. Kaufmann, P. Weng, V. Bengs, and E. Hüllermeier, "A survey of reinforcement learning from human feedback," 2023, arXiv:2312.14925.
[31] M. Raj J, K. VM, H. Warrier, and Y. Gupta, "Fine tuning LLM for enterprise: Practical guidelines and recommendations," 2024, arXiv:2404.10779.
[32] T. Mitsunaga, "Heuristic analysis for security, privacy and bias of text generative AI: ChatGPT-3.5 case as of June 2023," in Proc. IEEE Int. Conf. Comput. (ICOCO), Langkawi Island, Malaysia, Oct. 2023, pp. 301–305.
[33] ChatGPT_DAN. Accessed: Feb. 2, 2024. [Online]. Available: https://github.com/0xk1h0/ChatGPT_DAN
[34] A. Tsamados, L. Floridi, and M. Taddeo, "The cybersecurity crisis of artificial intelligence: Unrestrained adoption and natural language-based attacks," 2023, arXiv:2311.09224.
[35] J. A. Chaudhry, S. A. Chaudhry, and R. G. Rittenhouse, "Phishing attacks and defenses," Int. J. Secur. Appl., vol. 10, no. 1, pp. 247–256, Jan. 2016.
[36] Technical University of Denmark (DTU), Lyngby, Denmark. (2017). An Introduction to Malware. [Online]. Available: https://orbit.dtu.dk/en/publications/an-introduction-to-malware-2
[37] L. Bosnjak, J. Sres, and B. Brumen, "Brute-force and dictionary attack on hashed real-world passwords," in Proc. 41st Int. Conv. Inf. Commun. Technol., Electron. Microelectron. (MIPRO), Opatija, Croatia, May 2018, pp. 1161–1166.
[38] K. Scarfone, M. Souppaya, A. Cody, and A. Orebaugh, Technical Guide to Information Security Testing and Assessment, document NIST Publication SP 800-115, National Institute of Standards and Technology, U.S. Dept. Commerce, Gaithersburg, MD, USA, 2008. [Online]. Available: https://tsapps.nist.gov/publication/get_pdf.cfm?pub_id=152164
[39] X. Wu, R. Duan, and J. Ni, "Unveiling security, privacy, and ethical concerns of ChatGPT," 2023, arXiv:2307.14192.
[40] H. Chugh, "Cybersecurity in the age of generative AI: Usable security & ThreatGPT," Int. J. Res. Appl. Sci. Eng. Technol., vol. 12, pp. 1–11, Oct. 2023.
[41] R. Gianni, S. Lehtinen, and M. Nieminen, "Governance of responsible AI: From ethical guidelines to cooperative policies," Frontiers Comput. Sci., vol. 4, pp. 1–17, May 2022.
[42] T. Flaih and Y. Jasim, "The ethical implications of ChatGPT AI chatbot: A review," J. Modern Comput. Eng. Res., vol. 2023, pp. 49–57, Oct. 2023.
[43] S. A. Matei and E. Bertino, "Educating for AI cybersecurity work and research: Ethics, systems thinking, and communication requirements," 2023, arXiv:2311.04326.
[44] C. Waghmare, "Security and ethical considerations when using ChatGPT," in Unleashing the Power of ChatGPT: A Real World Business Applications. Berkeley, CA, USA: Apress, 2023, ch. 6, pp. 111–132.
[45] B. Niu and G. F. N. Mvondo, "I am ChatGPT, the ultimate AI chatbot! Investigating the determinants of users' loyalty and ethical usage concerns of ChatGPT," J. Retailing Consum. Services, vol. 76, Jan. 2024, Art. no. 103562.
[46] Y. Shi, "Study on security risks and legal regulations of generative AI," J. Law Sci., vol. 2, pp. 17–23, Nov. 2023.
[47] F. Gualdi and A. Cordella, "Theorizing the regulation of generative AI: Lessons learned from Italy's ban on ChatGPT," in Proc. Annu. Hawaii Int. Conf. Syst. Sci., Honolulu, HI, USA: IEEE Computer Society, 2024, pp. 2023–2032.
[48] N. Kshetri, "Cybercrime and privacy threats of large language models," IT Prof., vol. 25, no. 3, pp. 9–13, May 2023.
[49] D. Jeong, "Artificial intelligence security threat, crime, and forensics: Taxonomy and open issues," IEEE Access, vol. 8, pp. 184560–184574, 2020.
[50] M. Ozkan-Okay, E. Akin, Ö. Aslan, S. Kosunalp, T. Iliev, I. Stoyanov, and I. Beloev, "A comprehensive survey: Evaluating the efficiency of artificial intelligence and machine learning techniques on cyber security solutions," IEEE Access, vol. 12, pp. 12229–12256, 2024.
[51] F. Kamoun, F. Iqbal, M. A. Esseghir, and T. Baker, "AI and machine learning: A mixed blessing for cybersecurity," in Proc. Int. Symp. Netw., Comput. Commun. (ISNCC), Montreal, QC, Canada, Oct. 2020, pp. 1–7.
[52] M. Macas and C. Wu, "Review: Deep learning methods for cybersecurity and intrusion detection systems," in Proc. IEEE Latin-American Conf. Commun. (LATINCOM), Santo Domingo, Dominican Republic, Nov. 2020, pp. 1–6.
[53] N. Mohamed, "Current trends in AI and ML for cybersecurity: A state-of-the-art survey," Cogent Eng., vol. 10, no. 2, pp. 1–30, Oct. 2023.
[54] A. Wasif, M. Hamid, and A. Abbas, "AI and cybersecurity: An ever-evolving landscape," Int. J. Adv. Eng. Technol. Innov., vol. 1, no. 1, p. 5271, Jan. 2024.
[55] M. Bagaa, T. Taleb, J. B. Bernabe, and A. Skarmeta, "A machine learning security framework for IoT systems," IEEE Access, vol. 8, pp. 114066–114077, 2020.
[56] Y. Chen, J. Ding, D. Li, and Z. Chen, "Joint BERT model based cybersecurity named entity recognition," in Proc. 4th Int. Conf. Softw. Eng. Inf. Manage., New York, NY, USA, Jul. 2021, pp. 236–242.
[57] J. Devlin, M.-W. Chang, K. Lee, and K. Toutanova, "BERT: Pre-training of deep bidirectional transformers for language understanding," in Proc. NAACL, 2019, pp. 4171–4186.
[58] A. Rahali and M. A. Akhloufi, "MalBERT: Using transformers for cybersecurity and malicious software detection," 2021, arXiv:2103.03806.
[59] A. Nitaj and T. Rachidi, "Applications of neural network-based AI in cryptography," Cryptography, vol. 7, no. 3, p. 39, Aug. 2023.
[60] E. Hemberg and U.-M. O'Reilly, "Using a collated cybersecurity dataset for machine learning and artificial intelligence," 2021, arXiv:2108.02618.
[61] C. Park, J. Lee, Y. Kim, J.-G. Park, H. Kim, and D. Hong, "An enhanced AI-based network intrusion detection system using generative adversarial networks," IEEE Internet Things J., vol. 10, no. 3, pp. 2330–2345, Feb. 2023.
[62] D. L. Marino, C. S. Wickramasinghe, and M. Manic, "An adversarial approach for explainable AI in intrusion detection systems," 2018, arXiv:1811.11705.
[63] D. L. Pissanidis and K. Demertzis, "Integrating AI/ML in cybersecurity: An analysis of open XDR technology and its application in intrusion detection and system log management," Preprints, vol. 2023, pp. 1–24, Jan. 2024.
[64] M. Ouhssini, K. Afdel, E. Agherrabi, M. Akouhar, and A. Abarda, "DeepDefend: A comprehensive framework for DDoS attack detection and prevention in cloud computing," J. King Saud Univ. Comput. Inf. Sci., vol. 36, no. 2, Feb. 2024, Art. no. 101938.
[65] S. Latif, W. Boulila, A. Koubaa, Z. Zou, and J. Ahmad, "DTL-IDS: An optimized intrusion detection framework using deep transfer learning and genetic algorithm," J. Netw. Comput. Appl., vol. 221, Jan. 2024, Art. no. 103784.
[66] O. Veprytska and V. Kharchenko, "AI powered attacks against AI powered protection: Classification, scenarios and risk analysis," in Proc. 12th Int. Conf. Dependable Syst., Services Technol. (DESSERT), Wulumuqi, China, Dec. 2022, pp. 1–7.
[67] I. Vaccari, A. Carlevaro, S. Narteni, E. Cambiaso, and M. Mongelli, "On the detection of adversarial attacks through reliable AI," in Proc. IEEE INFOCOM Conf. Comput. Commun. Workshops (INFOCOM WKSHPS), New York, NY, USA, May 2022, pp. 1–6.
[68] M. A. Ferrag, M. Debbah, and M. Al-Hawawreh, "Generative AI for cyber threat-hunting in 6G-enabled IoT networks," 2023, arXiv:2303.11751.
[69] L. Karaçay, Z. Laaroussi, S. Ujjwal, and E. U. Soykan, "On the security of 6G use cases: AI/ML-specific threat modeling of all-senses meeting," in Proc. 2nd Int. Conf. 6G Netw. (6GNet), Oct. 2023, pp. 1–8.
[70] R. Ravi, "A performance analysis of software defined network based prevention on phishing attack in cyberspace using a deep machine learning with CANTINA approach (DMLCA)," Comput. Commun., vol. 153, pp. 375–381, Mar. 2020.
[71] M. Asfour and J. C. Murillo, "Harnessing large language models to simulate realistic human responses to social engineering attacks: A case study," Int. J. Cybersecurity Intell. Cybercrime, vol. 6, no. 2, pp. 21–49, Aug. 2023.
[72] S. Dadvandipour and A. G. Ganie, "Analyzing and predicting spear-phishing using machine learning methods," Multidiszciplináris Tudományok, vol. 10, no. 4, pp. 262–273, 2020.
[73] S. Salloum, T. Gaber, S. Vadera, and K. Shaalan, "A systematic literature review on phishing email detection using natural language processing techniques," IEEE Access, vol. 10, pp. 65703–65727, 2022.
[74] M. Gawade, "Cyber protect: A robust cybersecurity system for fraudulent scam and phishing detection using machine learning techniques," Int. J. Res. Appl. Sci. Eng. Technol., vol. 11, no. 11, pp. 2480–2487, Nov. 2023.
[75] T. Anande and M. Leeson, "Synthetic network traffic data generation and classification of advanced persistent threat samples: A case study with GANs and XGBoost," in Proc. Int. Conf. Deep Learn. Theory Appl., Rome, Italy, Jul. 2023, pp. 1–18.
[76] M. M. Hasan, M. U. Islam, and J. Uddin, "Advanced persistent threat identification with boosting and explainable AI," Social Netw. Comput. Sci., vol. 4, no. 3, pp. 1–9, Mar. 2023.
[77] M. Hlatshwayo, "Unleashing the power of AI: A deep dive into the integration of AI in cybersecurity for threat detection and response," J. IoT Intell. Solutions, vol. 1, pp. 1–25, Jan. 2024.
[78] A. M. S. N. Amarasinghe, W. A. C. H. Wijesinghe, D. L. A. Nirmana, A. Jayakody, and A. M. S. Priyankara, "AI based cyber threats and vulnerability detection, prevention and prediction system," in Proc. Int. Conf. Advancements Comput. (ICAC), Malabe, Sri Lanka, Dec. 2019, pp. 363–368.
[79] M. A. Ferrag, D. Hamouda, M. Debbah, L. Maglaras, and A. Lakas, "Generative adversarial networks-driven cyber threat intelligence detection framework for securing Internet of Things," 2023, arXiv:2304.05644.
[80] O. Uwagboe and S. Aremora. (2023). AI-Based Security Analytics for Cloud Infrastructure: Leveraging Machine Learning Algorithms to Detect and Mitigate Advanced Persistent Threats (APTs) in Cloud Environments. ResearchGate preprint. Accessed: Feb. 12, 2024. [Online]. Available: https://www.researchgate.net/publication/376168059_Title_AI-Based_Security_Analytics_for_Cloud_Infrastructure_Leveraging_Machine_Learning_Algorithms_to_Detect_and_Mitigate_Advanced_Persistent_Threats_APTs_in_Cloud_Environments
[81] N. Mohamed, E. Alam, and G. Stubbs, "Multi-layer protection approach (MLPA) for the detection of advanced persistent threats," J. Positive School Psychol., vol. 6, no. 5, pp. 1–23, Jun. 2022.
[82] T. Arshad and S. Menon, "AI-enabled honeypot," J. Netw. Inf. Secur., vol. 11, no. 2, pp. 16–26, Jun. 2023.
[83] I. Chomiak-Orsa, A. Rot, and B. Blaicke, "AI in cybersecurity: The use of AI along the cyber kill chain," in Proc. 11th Int. Conf. Comput. Collect. Intell., Hendaye, France, Aug. 2019, pp. 406–416.
[84] S. B. Molina, P. Nespoli, and F. G. Mármol, "Tackling cyberattacks through AI-based reactive systems: A holistic review and future vision," 2023, arXiv:2312.06229.
[85] E. Iturbe, E. Rios, A. Rego, and N. Toledo, "Artificial intelligence for next generation cybersecurity: The AI4CYBER framework," in Proc. 18th Int. Conf. Availability, Rel. Secur., New York, NY, USA, Aug. 2023, pp. 1–8.
[86] C. R. Barone IV, M. Mekni, and M. Nassar, "Gargoyle guard: Enhancing cybersecurity with AI techniques," in Proc. 3rd Intell. Cybersecur. Conf., San Antonio, TX, USA, Oct. 2023, pp. 127–132.
[87] M. Macas, C. Wu, and W. Fuertes, "A survey on deep learning for cybersecurity: Progress, challenges, and opportunities," Comput. Netw., vol. 212, Jul. 2022, Art. no. 109032.
[88] X. He, S. Li, Z. He, and X. Peng, "Research on network configuration verification based on association analysis," in Proc. 6th Int. Conf. Comput. Sci. Appl. Eng., Nanjing, China, Dec. 2022, pp. 1–6.
[89] G. Blanc, Y. Liu, R. Lu, T. Takahashi, and Z. Zhang, "Interactions between AI and cybersecurity to protect future networks," Ann. Telecommun., vol. 77, pp. 727–729, Nov. 2022.
[90] D. Samon. (Dec. 2023). Artificial Intelligence's Function in Cybersecurity. Accessed: Feb. 13, 2024. [Online]. Available: https://www.researchgate.net/publication/376784670_Artificial_Intelligence%27s_Function_in_Cybersecurity
[91] G. B. Mensah and L. Acquah, Generative AI, International Cyber-Security Infrastructure, and Geosynchronous Satellite Banking. Berlin, Germany: ResearchGate preprint, 2023. Accessed: Feb. 14, 2024, doi: 10.13140/RG.2.2.30417.30567.
[92] M. Ramzan and A. Abbas, "Mindful machines: Navigating the intersection of AI, ML, and cybersecurity," J. Environ. Sci. Technol., vol. 2, no. 2, pp. 1–12, Jan. 2024.
[93] J. M. Spring, A. Galyardt, A. D. Householder, and N. VanHoudnos, "On managing vulnerabilities in AI/ML systems," in Proc. New Secur. Paradigms Workshop, Oct. 2020, pp. 111–126.
[94] K. Grosse, L. Bieringer, T. R. Besold, and A. Alahi, "Towards more practical threat models in artificial intelligence security," 2023, arXiv:2311.09994.
[95] S. Scott-Hayward, "Securing AI-based security systems," Geneva Centre Secur. Policy, Strategic Secur. Anal., vol. 25, pp. 1–25, Jun. 2022.
[96] L. N. Tidjon and F. Khomh, "Threat assessment in machine learning based systems," 2022, arXiv:2207.00091.
[97] D. Williams, C. Clark, R. McGahan, B. Potteiger, D. Cohen, and P. Musau, "Discovery of AI/ML supply chain vulnerabilities within automotive cyber-physical systems," in Proc. IEEE Int. Conf. Assured Autonomy (ICAA), Fajardo, PR, USA, Mar. 2022, pp. 93–96.
[98] P. Bountakas, A. Zarras, A. Lekidis, and C. Xenakis, "Defense strategies for adversarial machine learning: A survey," Comput. Sci. Rev., vol. 49, Aug. 2023, Art. no. 100573.
[99] S. L. Eggers and C. Sample, "Vulnerabilities in artificial intelligence and machine learning applications and data," U.S. Dept. Energy, Idaho Nat. Lab. (INL), Idaho Falls, ID, USA, Tech. Rep. INL/RPT-22-66111-Rev000, Dec. 2020. [Online]. Available: https://www.osti.gov/biblio/1846969
[100] L. Mauri and E. Damiani, "Modeling threats to AI-ML systems using STRIDE," Sensors, vol. 22, no. 17, p. 6662, Sep. 2022.
[101] G. Tao, S. Cheng, Z. Zhang, J. Zhu, G. Shen, and X. Zhang, "Opening a Pandora's box: Things you should know in the era of custom GPTs," 2023, arXiv:2401.00905.
[102] J. Schuett, "Three lines of defense against risks from AI," AI Soc., vol. 2023, pp. 1–24, Nov. 2023.
[103] Y. He, J. Qiu, W. Zhang, and Z. Yuan, "Fortifying ethical boundaries in AI: Advanced strategies for enhancing security in large language models," 2024, arXiv:2402.01725.
[104] X. Zhang, F. T. Chan, C. Yan, and I. Bose, "Towards risk-aware AI and machine learning systems: An overview," Decis. Support Syst., vol. 159, pp. 1–13, Aug. 2022.
[105] K. I. Gubbi, I. Kaur, A. Hashem, S. Manoj P D, H. Homayoun, A. Sasan, and S. Salehi, "Securing AI hardware: Challenges in detecting and mitigating hardware trojans in ML accelerators," in Proc. IEEE 66th Int. Midwest Symp. Circuits Syst. (MWSCAS), Aug. 2023, pp. 821–825.
[106] Z. Wang, J. Ma, X. Wang, J. Hu, Z. Qin, and K. Ren, "Threats to training: A survey of poisoning attacks and defenses on machine learning systems," ACM Comput. Surveys, vol. 55, no. 7, pp. 1–36, Dec. 2022.
[107] A. Rayhan and S. Rayhan, AI and Global Security: Navigating the Risks and Opportunities. Berlin, Germany: ResearchGate preprint, 2023. Accessed: Feb. 12, 2024, doi: 10.13140/RG.2.2.28224.92160/1.
[108] I. H. Sarker, A. S. M. Kayes, S. Badsha, H. Alqahtani, P. Watters, and A. Ng, "Cybersecurity data science: An overview from machine learning perspective," J. Big Data, vol. 7, no. 1, pp. 1–29, Jul. 2020.
[109] N. Chowdhury and S. Rahman, "A brief review of ChatGPT: Limitations, challenges and ethical-social implications," Bachelor's thesis, Dept. Comput. Sci. Technol., Chongqing Univ. Posts Telecommun., Chongqing, China, Feb. 2023, doi: 10.5281/zenodo.7629888.
[110] P. P. Ray, "ChatGPT: A comprehensive review on background, applications, key challenges, bias, ethics, limitations and future scope," Internet Things Cyber-Phys. Syst., vol. 3, pp. 121–154, Apr. 2023.
[111] M. Alawida, S. Mejri, A. Mehmood, B. Chikhaoui, and O. I. Abiodun, "A comprehensive study of ChatGPT: Advancements, limitations, and ethical considerations in natural language processing and cybersecurity," Information, vol. 14, no. 8, p. 462, Aug. 2023.
[112] K. Y. Thakkar and N. Jagdishbhai, "Exploring the capabilities and limitations of GPT and Chat GPT in natural language processing," J. Manage. Res. Anal., vol. 10, no. 1, pp. 18–20, Apr. 2023.
[113] J. Li, "Security implications of AI chatbots in health care," J. Med. Internet Res., vol. 25, Nov. 2023, Art. no. e47551.
[114] M. Elnawawy, M. Hallajiyan, G. Mitra, S. Iqbal, and K. Pattabiraman, "Systematically assessing the security risks of AI/ML-enabled connected healthcare systems," 2024, arXiv:2401.17136.
[115] D. Antonioli, N. O. Tippenhauer, K. Rasmussen, and M. Payer, "BLURtooth: Exploiting cross-transport key derivation in Bluetooth classic and Bluetooth low energy," in Proc. ACM Asia Conf. Comput. Commun. Secur., Nagasaki, Japan, May 2022, pp. 196–207.
[116] A. Qammar, H. Wang, J. Ding, A. Naouri, M. Daneshmand, and H. Ning, "Chatbots to ChatGPT in a cybersecurity space: Evolution, vulnerabilities, attacks, challenges, and future recommendations," 2023, arXiv:2306.09255.
[117] R. Pasupuleti, R. Vadapalli, and C. Mader, "Cyber security issues and challenges related to generative AI and ChatGPT," in Proc. 10th Int. Conf. Social Netw. Anal., Manage. Secur. (SNAMS), Nov. 2023, pp. 1–5.
[118] C. Weeks, A. Cheruvu, S. M. Abdullah, S. Kanchi, D. Yao, and B. Viswanath, "A first look at toxicity injection attacks on open-domain chatbots," in Proc. Annu. Comput. Secur. Appl. Conf., New York, NY, USA, Dec. 2023, pp. 521–534.
[119] K. Wach, J. Ejdys, R. Kazlauskaite, P. Korzynski, G. Mazurek, J. Paliszkiewicz, E. Ziemba, and D. Duong, "The dark side of generative AI: A critical analysis of controversies and risks of ChatGPT," Entrepreneurial Bus. Econ. Rev., vol. 11, no. 2, pp. 7–24, Jun. 2023.
[120] Y. Sui, H. Phan, J. Xiao, T. Zhang, Z. Tang, C. Shi, Y. Wang, Y. Chen, and B. Yuan, "DisDet: Exploring detectability of backdoor attack on diffusion models," 2024, arXiv:2402.02739.
[121] C. Barrett, "Identifying and mitigating the security risks of generative AI," Found. Trends Privacy Secur., vol. 6, no. 1, pp. 1–52, 2023.
[122] N. Xu, F. Wang, B. Zhou, B. Zheng Li, C. Xiao, and M. Chen, "Cognitive overload: Jailbreaking large language models with overloaded logical thinking," 2023, arXiv:2311.09827.
[123] S. Singh, F. Abri, and A. S. Namin, "Exploiting large language models (LLMs) through deception techniques and persuasion principles," 2023, arXiv:2311.14876.
[124] D. R. Polaski and M. J. Brienza, "Managing AI: Risks and opportunities," PM World, vol. 12, no. 7, pp. 1–12, Jul. 2023.
[125] C. Hu and J. Chen, "A dimensional perspective analysis on the cybersecurity risks and opportunities of ChatGPT-like information systems," in Proc. Int. Conf. Netw. Netw. Appl. (NaNA), Aug. 2023, pp. 324–331.
[126] P. Ananthachari and G. Singh, "Repercussion of ChatGPT in cybersecurity," Int. J. Res. Publication Rev., vol. 4, no. 2, pp. 1429–1430, Feb. 2023.
[127] M. Sieja and K. Wach, "Revolutionary artificial intelligence or rogue technology? The promises and pitfalls of ChatGPT," Int. Entrepreneurship Rev., vol. 9, no. 4, pp. 101–115, 2023.
[128] R. Huang, X. Zheng, Y. Shang, and X. Xue, "On challenges of AI to cognitive security and safety," Secur. Saf., vol. 2, Jan. 2023, Art. no. 2023012.
[129] O. D. Okey, E. U. Udo, R. L. Rosa, D. Z. Rodríguez, and J. H. Kleinschmidt, "Investigating ChatGPT and cybersecurity: A perspective on topic modeling and sentiment analysis," Comput. Secur., vol. 135, Dec. 2023, Art. no. 103476.
[130] C. Hutto and E. Gilbert, "VADER: A parsimonious rule-based model for sentiment analysis of social media text," in Proc. Int. AAAI Conf. Web Social Media, vol. 8, May 2014, pp. 216–225.
[131] J. Li, Y. Liu, C. Liu, L. Shi, X. Ren, Y. Zheng, Y. Liu, and Y. Xue, "A cross-language investigation into jailbreak attacks in large language models," 2024, arXiv:2401.16765.
[132] A. Esmradi, D. W. Yip, and C. F. Chan, "A comprehensive survey of attack techniques, implementation, and mitigation strategies in large language models," in Proc. 3rd Int. Conf. (UbiSec), vol. 2034, Nov. 2024, pp. 76–95.
[133] Y. Li, J. Cheng, C. Huang, Z. Chen, and W. Niu, "NEDetector: Automatically extracting cybersecurity neologisms from hacker forums," J. Inf. Secur. Appl., vol. 58, May 2021, Art. no. 102784.
[134] T.-M. Georgescu, "Natural language processing model for automatic analysis of cybersecurity-related documents," Symmetry, vol. 12, no. 3, p. 354, Mar. 2020.
[135] K. Singh, S. S. Grover, and R. K. Kumar, "Cyber security vulnerability detection using natural language processing," in Proc. IEEE World AI IoT Congr. (AIIoT), Seattle, WA, USA, Jun. 2022, pp. 174–178.
[136] D. O. Ukwen and M. Karabatak, "Review of NLP-based systems in digital forensics and cybersecurity," in Proc. 9th Int. Symp. Digit. Forensics Secur. (ISDFS), Elazig, Turkey, Jun. 2021, pp. 1–9.
[137] S. Garg and N. Baliyan, "MalVulDroid: Tracing vulnerabilities from malware in Android using natural language processing," J. Web Eng., vol. 21, no. 8, pp. 2339–2361, Nov. 2022.
[138] R. Marinho and R. Holanda, "Automated emerging cyber threat identification and profiling based on natural language processing," IEEE Access, vol. 11, pp. 58915–58936, 2023.
[139] Y. Andrew, C. Lim, and E. Budiarto, "Mapping Linux shell commands to MITRE ATT&CK using NLP-based approach," in Proc. Int. Conf. Electr. Eng. Informat. (ICELTICs), Jakarta, Indonesia, Sep. 2022, pp. 37–42.
[140] R. K. Jha, "Strengthening smart grid cybersecurity: An in-depth investigation into the fusion of machine learning and natural language processing," J. Trends Comput. Sci. Smart Technol., vol. 5, no. 3, pp. 284–301, Sep. 2023.
[141] M. Schmitt and I. Flechais, "Digital deception: Generative artificial intelligence in social engineering and phishing," 2023, arXiv:2310.13715.
[142] J. Hazell, "Spear phishing with large language models," 2023, arXiv:2305.06972.
[143] J. Seymour and P. Tully, "Generative models for spear phishing posts on social media," 2018, arXiv:1802.05196.
[144] M. Bethany, A. Galiopoulos, E. Bethany, M. B. Karkevandi, N. Vishwamitra, and P. Najafirad, "Large language model lateral spear phishing: A comparative study in large-scale organizational settings," 2024, arXiv:2401.09727.
[145] P. V. Falade, "Decoding the threat landscape: ChatGPT, FraudGPT, and WormGPT in social engineering attacks," Int. J. Sci. Res. Comput. Sci., Eng. Inf. Technol., vol. 9, pp. 185–198, Oct. 2023.
[146] M. Sharma, K. Singh, P. Aggarwal, and V. Dutt, "How well does GPT phish people? An investigation involving cognitive biases and feedback," in Proc. IEEE Eur. Symp. Secur. Privacy Workshops (EuroS&PW), Delft, The Netherlands, Jul. 2023, pp. 451–457.
[147] M. Heim, N. Starckjohann, and M. Torgersen, "The convergence of AI and cybersecurity: An examination of ChatGPT's role in penetration testing and its ethical and legal implications," Bachelor's thesis, Dept. Comput. Technol. Inform., Norwegian Univ. Sci. Technol., Trondheim, Norway, May 2023. [Online]. Available: https://hdl.handle.net/11250/3076387
[148] M. Feffer, A. Sinha, W. H. Deng, Z. C. Lipton, and H. Heidari, "Red-teaming for generative AI: Silver bullet or security theater?" 2024, arXiv:2401.15897.
[149] T. Naito, R. Watanabe, and T. Mitsunaga, "LLM-based attack scenarios generator with IT asset management and vulnerability information," in Proc. 6th Int. Conf. Signal Process. Inf. Secur. (ICSPIS), Nov. 2023, pp. 99–103.
[150] F. Teichmann, "Ransomware attacks in the context of generative artificial intelligence—An experimental study," Int. Cybersecur. Law Rev., vol. 4, pp. 399–414, Aug. 2023.
[151] K. Renaud, M. Warkentin, and G. Westerman, "From ChatGPT to HackGPT: Meeting the cybersecurity threat of generative AI," MIT Sloan Manage. Rev., vol. 64, no. 3, pp. 1–4, Aug. 2023.
[152] B. Yener and T. Gal, "Cybersecurity in the era of data science: Examining new adversarial models," IEEE Secur. Privacy, vol. 17, no. 6, pp. 46–53, Nov. 2019.
[153] G. Deng, Y. Liu, V. Mayoral-Vilches, P. Liu, Y. Li, Y. Xu, T. Zhang, Y. Liu, M. Pinzger, and S. Rass, "PentestGPT: An LLM-empowered automatic penetration testing tool," 2023, arXiv:2308.06782.
[154] M. Alawida, B. A. Shawar, O. I. Abiodun, A. Mehmood, A. E. Omolara, and A. K. Al Hwaitat, "Unveiling the dark side of ChatGPT: Exploring cyberattacks and enhancing user awareness," Information, vol. 15, no. 1, p. 27, Jan. 2024.
[155] W. Tann, Y. Liu, J. H. Sim, C. M. Seah, and E.-C. Chang, "Using large language models for cybersecurity capture-the-flag challenges and certification questions," 2023, arXiv:2308.10443.
[156] F. McKee and D. Noever, "The evolving landscape of cybersecurity: Red teams, large language models, and the emergence of new AI attack surfaces," Int. J. Cryptography Inf. Secur., vol. 13, no. 1, pp. 1–34, Mar. 2023.
[157] M. Beckerich, L. Plein, and S. Coronado, "RatGPT: Turning online LLMs into proxies for malware attacks," 2023, arXiv:2308.09183.
[158] M. Gupta, C. Akiri, K. Aryal, E. Parker, and L. Praharaj, "From ChatGPT to ThreatGPT: Impact of generative AI in cybersecurity and privacy," IEEE Access, vol. 11, pp. 80218–80245, 2023.
[159] Y. M. P. Pa, S. Tanizaki, T. Kou, M. van Eeten, K. Yoshioka, and T. Matsumoto, "An attacker's dream? Exploring the capabilities of ChatGPT for developing malware," in Proc. 16th Cyber Secur. Experim. Test Workshop, Marina del Rey, CA, USA, Aug. 2023, pp. 10–18.
[160] M. Botacin, "GPThreats-3: Is automatic malware generation a threat?" in Proc. IEEE Secur. Privacy Workshops (SPW), May 2023, pp. 238–254.
[161] A. Happe, A. Kaplan, and J. Cito, "LLMs as hackers: Autonomous Linux privilege escalation attacks," 2023, arXiv:2310.11409.
[162] M. M. Chowdhury, N. Rifat, M. Ahsan, S. Latif, R. Gomes, and M. S. Rahman, "ChatGPT: A threat against the CIA triad of cyber security," in Proc. IEEE Int. Conf. Electro Inf. Technol. (eIT), May 2023, pp. 1–6.
[163] M. A. Elsadig, "ChatGPT and cybersecurity: Risk knocking the door," J. Internet Services Inf. Secur., vol. 14, no. 1, pp. 1–15, Dec. 2023.
[164] M. Al-Hawawreh, A. Aljuhani, and Y. Jararweh, "ChatGPT for cybersecurity: Practical applications, challenges, and future directions," Cluster Comput., vol. 26, no. 6, pp. 3421–3436, Aug. 2023.
[165] P. J. Caven, "A more insecure ecosystem? ChatGPT's influence on cybersecurity," in Proc. 51st Res. Conf. Commun., Inf., Internet Policy, Aug. 2023, pp. 1–15.
[166] F. Iqbal, F. Samsom, F. Kamoun, and Á. MacDermott, "When ChatGPT goes rogue: Exploring the potential cybersecurity threats of AI-powered conversational chatbots," Frontiers Commun. Netw., vol. 4, pp. 1–19, Sep. 2023.
[167] E. Derner and K. Batistič, "Beyond the safeguards: Exploring the security risks of ChatGPT," 2023, arXiv:2305.08005.
[168] B. Dash and P. Sharma, "Are ChatGPT and deepfake algorithms endangering the cybersecurity industry? A review," Int. J. Eng. Appl. Sci., vol. 10, no. 1, pp. 1–5, Jan. 2023.
[169] M. Rigaki, O. Lukáš, C. Catania, and S. Garcia, "Out of the cage: How stochastic parrots win in cyber security environments," in Proc. 16th Int. Conf. Agents Artif. Intell., Rome, Italy, 2024, pp. 774–781.
[170] S. S. Roy, P. Thota, K. V. Naragam, and S. Nilizadeh, "From chatbots to PhishBots?—Preventing phishing scams created using ChatGPT, Google Bard and Claude," 2023, arXiv:2310.19181.
[171] T. Koide, N. Fukushi, H. Nakano, and D. Chiba, "Detecting phishing sites using ChatGPT," 2023, arXiv:2306.05816.
[172] L. Jiang, "Detecting scams using large language models," 2024, arXiv:2402.03147.
[173] F. Heiding, B. Schneier, A. Vishwanath, J. Bernstein, and P. S. Park, "Devising and detecting phishing: Large language models vs. smaller human models," 2023, arXiv:2308.12287.
[174] M. Enis and M. Hopkins, "From LLM to NMT: Advancing low-resource machine translation with Claude," 2024, arXiv:2404.13813.
[175] F. Trad and A. Chehab, "Prompt engineering or fine-tuning? A case study on phishing detection with large language models," Mach. Learn. Knowl. Extraction, vol. 6, no. 1, pp. 367–384, Feb. 2024.
[176] E. Cambiaso and L. Caviglione, "Scamming the scammers: Using ChatGPT to reply mails for wasting time and resources," 2023, arXiv:2303.13521.
[177] J. McHugh, "Defensive AI: Experimental study," Ph.D. dissertation, Dept. Cybersecurity, Marymount Univ., Arlington, VA, USA, Apr. 2023.
[178] M. Kaheh, D. K. Kholgh, and P. Kostakos, "Cyber sentinel: Exploring conversational agents in streamlining security tasks with GPT-4," 2023, arXiv:2309.16422.
[179] T. McIntosh, T. Liu, T. Susnjak, H. Alavizadeh, A. Ng, R. Nowrozy, and P. Watters, "Harnessing GPT-4 for generation of cybersecurity GRC policies: A focus on ransomware attack mitigation," Comput. Secur., vol. 134, Nov. 2023, Art. no. 103424.
[180] M. Lempinen, A. Juntunen, and E. Pyyny, "Chatbot for assessing system security with OpenAI GPT-3.5," Bachelor's thesis, Dept. Comput. Sci. Eng., Univ. Oulu, Oulu, Finland, Jun. 2023. [Online]. Available: https://oulurepo.oulu.fi/handle/10024/42952
[181] S. G. Prasad, V. C. Sharmila, and M. K. Badrinarayanan, "Role of artificial intelligence based chat generative pre-trained transformer (ChatGPT) in cyber security," in Proc. 2nd Int. Conf. Appl. Artif. Intell. Comput. (ICAAIC), Salem, India, May 2023, pp. 107–114.
[182] A. Zaboli, S. L. Choi, T.-J. Song, and J. Hong, "ChatGPT and other large language models for cybersecurity of smart grid applications," 2023, arXiv:2311.05462.
[183] T. Ali and P. Kostakos, "HuntGPT: Integrating machine learning-based anomaly detection and explainable AI with large language models (LLMs)," 2023, arXiv:2309.16021.
[184] M. A. Ferrag, A. Battah, N. Tihanyi, R. Jain, D. Maimut, F. Alwahedi, T. Lestable, N. S. Thandi, A. Mechri, M. Debbah, and L. C. Cordeiro, "SecureFalcon: Are we there yet in automated software vulnerability detection with LLMs?" 2023, arXiv:2307.06616.
[185] M. Das Purba, A. Ghosh, B. J. Radford, and B. Chu, "Software vulnerability detection using large language models," in Proc. IEEE 34th Int. Symp. Softw. Rel. Eng. Workshops (ISSREW), Oct. 2023, pp. 112–119.
[186] M. A. Ferrag, M. Ndhlovu, N. Tihanyi, L. C. Cordeiro, M. Debbah, T. Lestable, and N. S. Thandi, "Revolutionizing cyber threat detection with large language models: A privacy-preserving BERT-based lightweight model for IoT/IIoT devices," IEEE Access, vol. 12, pp. 23733–23750, 2024.
[187] K. Ameri, M. Hempel, H. Sharif, J. Lopez Jr., and K. Perumalla, "CyBERT: Cybersecurity claim classification by fine-tuning the BERT language model," J. Cybersecurity Privacy, vol. 1, no. 4, pp. 615–637, Nov. 2021.
[188] Z. Wang, J. Li, S. Yang, X. Luo, D. Li, and S. Mahmoodi, "A lightweight IoT intrusion detection model based on improved BERT-of-Theseus," Expert Syst. Appl., vol. 238, Mar. 2024, Art. no. 122045.
[189] X. Li and H. Fu, "SecureBERT and LLAMA 2 empowered control area network intrusion detection and classification," 2023, arXiv:2311.12074.
[190] E. Garza, E. Hemberg, S. Moskal, and U. O'Reilly, "Assessing large language models' knowledge of threat behavior in MITRE ATT&CK," in Proc. 3rd Workshop Artif. Intell. Enabled Cybersecur. Anal. (KDD), Long Beach, CA, USA, Aug. 2023, pp. 1–7.
[191] F. Wang, "Using large language models to mitigate ransomware threats," Preprints, vol. 2023, pp. 1–12, Nov. 2023.
[192] H. Pearce, B. Tan, B. Ahmad, R. Karri, and B. Dolan-Gavitt, "Examining zero-shot vulnerability repair with large language models," in Proc. IEEE Symp. Secur. Privacy (SP), May 2023, pp. 2339–2356.
[193] O. Cherqi, Y. Moukafih, M. Ghogho, and H. Benbrahim, "Enhancing cyber threat identification in open-source intelligence feeds through an improved semi-supervised generative adversarial learning approach with contrastive learning," IEEE Access, vol. 11, pp. 84440–84452, 2023.
[194] F. Li, H. Shen, J. Mai, T. Wang, Y. Dai, and X. Miao, "Pre-trained language model-enhanced conditional generative adversarial networks for intrusion detection," Peer Peer Netw. Appl., vol. 17, no. 1, pp. 227–245, Nov. 2023.
[195] M. Markevych and M. Dawson, "A review of enhancing intrusion detection systems for cybersecurity using artificial intelligence (AI)," in Proc. Int. Conf. Knowl. Based Org., Sibiu, Romania, Jul. 2023, pp. 1–9.
[196] M. Guastalla, Y. Li, A. Hekmati, and B. Krishnamachari, "Application of large language models to DDoS attack detection," in Proc. Int. Conf. Secur. Privacy Cyber-Phys. Syst. Smart Vehicles, Oct. 2024, pp. 83–99.
[197] V. Mikhalev, N. Kopal, and B. Esslinger, "Evaluating GPT-4's proficiency in addressing cryptography examinations," Cryptologia, vol. 48, no. 1, pp. 1–10, Mar. 2024.
[198] Z. Wang, F. Yang, L. Wang, P. Zhao, H. Wang, L. Chen, Q. Lin, and K.-F. Wong, "Self-Guard: Empower the LLM to safeguard itself," 2023, arXiv:2310.15851.
[199] Y. V. Shchavinsky, T. M. Muzhanova, Y. M. Yakymenko, and M. M. Zaporozhchenko, "Application of artificial intelligence for improving situational training of cybersecurity specialists," Inf. Technol. Learn. Tools, vol. 97, no. 5, pp. 215–226, Oct. 2023.
[200] J. Marshall, "What effects do large language models have on cybersecurity," Old Dominion Univ., Norfolk, VA, USA, Tech. Rep., May 2023. [Online]. Available: https://digitalcommons.odu.edu/covacci-undergraduateresearch/2023spring/projects/15
[201] H. J. Kam, C. Zhong, H. Liu, and A. Johnston, "The blend of human cognition and AI automation: What will ChatGPT do to the cybersecurity landscape?" in Proc. Dewald Roode Workshop Inf. Syst. Secur. Res., Jun. 2023, pp. 1–22.
[202] M. A. Hadi, M. N. Abdulredha, and E. Hasan, "Introduction to ChatGPT: A new revolution of artificial intelligence with machine learning algorithms and cybersecurity," Sci. Arch., vol. 4, no. 4, pp. 276–285, 2023.
[203] S. Biswas, "Role of ChatGPT in cybersecurity," SSRN preprint, vol. 2023, pp. 1–3, Jan. 2023, doi: 10.2139/ssrn.4403584.
[204] M. Ayaim, "How ChatGPT can be used as a defense mechanism for cyber attacks," Old Dominion Univ., Norfolk, VA, USA, Tech. Rep., Dec. 2023. [Online]. Available: https://digitalcommons.odu.edu/covacci-undergraduateresearch/2023fall/projects/15
[205] E. Karlsen, X. Luo, N. Zincir-Heywood, and M. Heywood, "Benchmarking large language models for log analysis, security, and interpretation," 2023, arXiv:2311.14519.
[206] S. Shafee, A. Bessani, and P. M. Ferreira, "Evaluation of LLM chatbots for OSINT-based cyber threat awareness," 2024, arXiv:2401.15127.
[207] F. Okeke, "An assessment of the use of generative AI in cybersecurity: Challenges and opportunities," Bournemouth Univ., Bournemouth, U.K., Tech. Rep., Dec. 2023, doi: 10.13140/RG.2.2.20613.12001.
[208] S. Neupane, I. A. Fernandez, S. Mittal, and S. Rahimi, "Impacts and risk of generative AI technology on cyber defense," 2023, arXiv:2306.13033.
[209] P. Botu, "Consciousness for AGI," in Proc. 10th Annu. Int. Conf. Biologically Inspired Cogn. Archit. (BICA), Seattle, WA, USA, Aug. 2019, pp. 365–372.
[210] (2024). University of Turku Guideline on Artificial Intelligence in Teaching and Studying. Accessed: Mar. 31, 2024. [Online]. Available: https://utuguides.fi/artificialintelligence

ISMAYIL HASANOV received the M.Sc. degree in information and communication technology from the University of Turku, Finland, in 2023. He is currently a part-time Doctoral Researcher with the University of Turku and a NOC and SOC Engineer with Openfactory Nordic oy. He has over six years of experience in networking technologies. His current research interests include network security, cybersecurity policies, the security of LLMs, and the application of AI in cybersecurity.

SEPPO VIRTANEN (Senior Member, IEEE) received the D.Sc. (Tech.) degree in communication systems from the University of Turku, Finland, in 2004. He is currently a Professor of cyber security engineering with the Department of Computing, University of Turku. His current research interests include the application of artificial intelligence and large language models to network and cyber security, security of smart environments, and cyber security in digitalization and societal processes.

ANTTI HAKKALA received the D.Sc. (Tech.) degree in communication systems from the University of Turku, Finland, in 2017. He is currently a University Teacher of communication systems and cyber security with the Department of Computing, University of Turku. He has 15 years of experience in teaching engineering students on cyber security and communication systems engineering and has supervised over 100 bachelor's and master's theses on cyber security topics. His current research interests include application of AI and LLMs to cyber and network security, digital forensics, and security and privacy in the networked information society.

JOUNI ISOAHO received the M.Sc. (Tech.) degree in electrical engineering and the Lic.Tech. and Dr.Tech. degrees in information technology from Tampere University of Technology, Finland, in 1989, 1992, and 1995, respectively. Since 1999, he has been a Professor with the University of Turku, Finland. The core of his research is communication and cyber security technologies. His current research interests include security of autonomous systems and AI, human and societal cybersecurity, and smart technology and digitalization.