1 Engineering Sciences Laboratory, National School of Applied Sciences, Ibn Tofail University,
Kenitra 14000, Morocco; [email protected] (N.A.A.); [email protected] (Y.E.B.E.I.);
[email protected] (A.Z.F.)
2 Laboratory of Economic Analysis and Modelling, Faculty of Law, Economic and Social Sciences Souissi,
Mohammed V University, Rabat 12000, Morocco
3 ACG Cybersecurity Head Office, 3 Soufflot Street, Cabinet PCH, 75005 Paris, France;
[email protected] (B.S.); [email protected] (D.M.)
4 Laboratory of ACG Cybersecurity, Campus Cyber, 5-7 Bellini Street, Puteaux, 92800 Paris, France
* Correspondence: [email protected]
Abstract: The number of new vulnerabilities continues to rise significantly each year. Simultaneously,
vulnerability databases face challenges in promptly sharing new security events with enough
information to improve protections against emerging cyberattack vectors and possible exploits. In this
context, several organizations adopt strategies to protect their data, technologies, and infrastructures
from cyberattacks by implementing anticipatory and proactive approaches to their system security
activities. To this end, vulnerability management systems play a crucial role in mitigating the
impact of cyberattacks by identifying potential vulnerabilities within an organization and alerting
cyber teams. However, the effectiveness of these systems, which employ multiple methods and
techniques to identify weaknesses, relies heavily on the accuracy of published security events. For
this reason, we introduce a discussion concerning existing vulnerability detection methods through
an in-depth literature study of several research papers. Based on the results, this paper points
out some issues related to vulnerability database handling that impact the effectiveness of certain
vulnerability identification methods. Furthermore, after summarizing the existing methodologies,
this study classifies them into four approaches and discusses the challenges, findings, and potential research directions.
Keywords: vulnerability detection; CPE; CVE; CWE; AI model; graph representation; feature model; similarity matching algorithm; VMS; cybersecurity
Citation: Bennouk, K.; Ait Aali, N.; El Bouzekri El Idrissi, Y.; Sebai, B.; Faroukhi, A.Z.; Mahouachi, D. A Comprehensive Review and Assessment of Cybersecurity Vulnerability Detection Methodologies. J. Cybersecur. Priv. 2024, 4, 853–908. https://doi.org/10.3390/jcp4040040
In contrast, the adoption rate of solutions for anticipating and reducing cyber risks remains
insufficient. This is concerning as the frequency and complexity of cyberattacks increase
proportionally with the growth of digital transformation and Industry 4.0 in both IT and OT
ecosystems [4]. In this context, it is legitimate for cyber experts to define an accurate
context in terms of asset control; this step is crucial during the risk assessment process
and constitutes the cornerstone of cyberattack detection, prediction, and anticipation.
In general, to acquire a realistic picture of an organization's system configuration, a
vulnerability management system (VMS) can be implemented to supervise and monitor
the system state and consequently minimize potential damage from cyberattacks. These
systems are regarded as a strategy that complements human efforts to detect faults
or vulnerabilities in an organization's information system, internal controls, or system
operations. Based on the asset mapping process [5], the VMS discovers potential cyber risks
by detecting, assessing, and rating the magnitude of vulnerabilities that might impact
software and hardware products, Operating Systems (OS), and Operational Technologies (OT) [6].
More specifically, most VMSs pursue their aims through four broad phases:
inspection and scanning, vulnerability identification, analysis, and reporting. Furthermore,
the VMS has to be linked to Vulnerability Databases (VDBs) so that it may be fed with the
most recent vulnerabilities and complementary metadata. This step remains crucial for
determining patching priorities. In this field, the handling of vulnerability activities and
system configuration is a complex process that involves two essential features: CVE
(Common Vulnerabilities and Exposures) feeds and CPE (Common Platform Enumeration).
CVE is a part of the SCAP specification [7]; it represents a method for assigning identifiers
to publicly known vulnerabilities and providing information about them, whereas CPE
specifies a naming scheme for applications, hardware devices, and operating systems [8].
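To make the relationship between a CVE feed entry and CPE names concrete, the following minimal sketch (in Python; the record layout is simplified and the CVE identifier, inventory entry, and data are invented for illustration) parses a CPE 2.3 formatted string into its named attributes and checks an inventory product against a CVE's applicability list:

```python
# Minimal illustration of CVE/CPE matching (simplified; sample data is hypothetical).
CPE_FIELDS = ["part", "vendor", "product", "version", "update", "edition",
              "language", "sw_edition", "target_sw", "target_hw", "other"]

def parse_cpe23(cpe: str) -> dict:
    """Split a CPE 2.3 formatted string into its eleven named attributes."""
    values = cpe.split(":")[2:]            # drop the "cpe:2.3" prefix
    return dict(zip(CPE_FIELDS, values))

# A hypothetical CVE entry with its CPE applicability list (structure simplified).
cve_entry = {
    "id": "CVE-0000-0001",
    "cpes": ["cpe:2.3:a:microsoft:internet_explorer:8.0.6001:beta:*:*:*:*:*:*"],
}

# A product taken from an asset inventory, keyed the same way.
installed = {"vendor": "microsoft", "product": "internet_explorer", "version": "8.0.6001"}

def affects(cve: dict, product: dict) -> bool:
    """Exact vendor/product/version match; real matching must also handle ranges and wildcards."""
    for cpe in cve["cpes"]:
        attrs = parse_cpe23(cpe)
        if all(attrs[k] == product[k] for k in ("vendor", "product", "version")):
            return True
    return False

print(affects(cve_entry, installed))       # True for this toy example
```

Real implementations must additionally handle version ranges, wildcards, escaped characters, and the naming inconsistencies discussed below.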
In this context, fully automated vulnerability analysis refers to the capability of a VMS to
assign a CPE identifier to a configuration product and to extract information (CPE entries)
from multiple open VDBs (CVE feeds) in order to perform a series of scans related to
potential vulnerabilities without human interaction. Unfortunately, this operation follows a
complex procedure that generally yields a significant rate of false positives and negatives
and has been described as impractical and error-prone [9]. In practice, the wide range of
configuration systems increases the workload of security analysts, making the task both
time-consuming and error-prone when handled manually. The difficulties in this context
include CVE feeds without CPE entries, software products without an assigned CPE,
deprecation issues in the CPE dictionary, and synchronization gaps between the CPE
dictionary and CVE feeds across VDBs [5,10]. Another issue is the inconsistency of program
names across multiple VDBs [11]. It is worth noting that a fully automated CPE assignment
is prone to errors owing to CPE and CVE shortcomings (inconsistencies in VDBs and
software naming specification difficulties). As a result, such mismatches and inconsistencies
can have serious consequences, including the dissemination of inaccurate vulnerability information. In
this study, we attempt to highlight the existing methods incorporated by various VMSs
that enable the matching process between the asset mapping of an Information System (IS)
and multiple VDBs since 2016. We also examine the methodology of each approach and
provide suggestions for future work. The main contributions of this paper are summarized
as follows:
• Conduct a security vulnerability database study to assess data inconsistency and
identify issues;
• Classify and analyze vulnerability detection methods according to multiple approaches;
• Present and comprehensively analyze the drawbacks and limitations of existing
vulnerability detection methods in each approach;
• Categorize existing vulnerability detection methods by approaches based on
related papers.
2. Research Methodology
The adopted methodology followed the systematic literature review (SLR) guidelines
proposed in [12] to derive conclusions and reflections about the above research questions.
This academic approach helped us gather, examine, sort, and study the pertinent papers
within the topic frame. The recommended guidelines of this method consist of three
main stages:
• Planning the review, which focuses first on identifying the need for a review,
drafting the review proposal, and developing the review protocol;
• Conducting the review involves identifying the research using predefined keywords
and search strings, selecting the studies based on inclusion and exclusion criteria,
performing a study quality assessment using predefined criteria and checklists, ex-
tracting data, and monitoring progress before summarizing findings and providing
data synthesis;
• Reporting recommendations and disseminating evidence through a descriptive analy-
sis of findings and insights.
Consulting several reputable academic libraries helped us to gather pertinent articles
related to our subject and respond to the research questions. These libraries are as follows:
1. ACM (Association for Computing Machinery) digital library;
2. JSTOR;
3. IEEE Xplore digital library;
4. MDPI;
5. ScienceDirect;
6. Scopus;
7. Springer;
8. Web of Science.
The current study aims to collect pertinent papers published from 2016 to 2024. To
this end, many specific keywords are used in the research methodology during this period,
such as: “CPE and CVE”, “vulnerability detection”, “vulnerability assessment”, “CWE
and vulnerabilities”, “matching vulnerabilities”, “asset inventory and CPE”, “vulnerability
detection and AI”, “CVE and CPE by graph”, “CVE and CPE by FM” and “VMS and
vulnerability detection”.
As shown below in Figures 1 and 2, the research method consisted of four procedures
to gather the most significant papers related to our subject. The first stage involves gathering
and building a global overview of the scientific contributions found in the literature review.
Next, this study initially retrieved 846 papers from the academic libraries. By eliminating
duplicates and out-of-scope papers, and classifying the publications using the abstract and
title, the paper number was reduced to 487 papers. Then, 256 articles were selected by
using predetermined criteria relevant to our topic. The following criteria were adopted:
• Papers published within the last 8 years;
• Relevant papers according to the research question posed previously;
• Papers suggesting vulnerability detection methods;
• Methods leveraging the usage of basic security metadata or AI techniques;
• Papers offering well-documented research on the proposed methods.
To provide unbiased research, the analysis was limited to academic contributions
focusing on the described methods relative to our topic. Ultimately, the data analysis results
(125 articles) were separated into two studies: the main study, which conducts a thorough
and deep investigation of the articles' content, and the connected study, which is sufficiently
investigated to derive further insights and future contributions.
Figure 1. Process of the methodology used in the literature review.
Figure 2. Distribution by year of the analysis study.
Thus, the previously used methodology framed our study to find pertinent papers
according to our research topic. In the following section, we will present motivation, some
basic cybersecurity concepts, and an overview of security events published in the National
Vulnerability Database (NVD).
3.1. Motivation
Specifying a precise inventory is crucial for assessing vulnerabilities. In other
words, detecting vulnerabilities that may affect inventory products remains a complicated
task with a high incidence of false positives and negatives. Meanwhile, this operation requires
two vectors, notably the specification of the installed products and their associated vulnerabilities.
These relevant data are retrieved from the target system, the cybersecurity event
management databases, websites, and other sources. As a result, the mapping process
identifies the target products potentially affected by vulnerabilities. Unfortunately, automating
this process faces multiple challenges [5,10,11], as the sketch following this list also illustrates:
• Various configuration systems impact product inventories and technical content
of VDBs;
• Product properties, such as name, version, and edition, might change frequently
affecting mapping with VDBs and inventory systems;
• Vulnerability databases that list the same product under different properties have
inconsistent product names (character and semantics);
• Inconsistencies in vulnerability databases, including both structured and unstructured
product names.
• Relevant insights may reveal CVE feeds without CPE entries;
• Some product vulnerabilities, including software, hardware, and operating systems
are published without assigned CPE;
• Product identity is not unified across information systems and VDBs;
• Some CVE feeds contain CPE entries that are not in the CPE dictionary;
• The high rate of false positives and negatives in the vulnerability detection process.
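As a toy illustration of these naming issues (the product strings are invented; Python's difflib.SequenceMatcher computes a Ratcliff/Obershelp-style similarity, one of the measures revisited in Section 4):

```python
from difflib import SequenceMatcher

def normalize(name: str) -> str:
    """Crude normalization: lowercase, treat '_'/'-' as spaces, drop other punctuation."""
    name = name.lower().replace("_", " ").replace("-", " ")
    return " ".join("".join(ch for ch in name if ch.isalnum() or ch.isspace()).split())

def similarity(a: str, b: str) -> float:
    """Ratcliff/Obershelp-style ratio in [0, 1] via difflib."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()

inventory_name = "Internet Explorer"
vdb_names = ["internet_explorer", "MS Internet-Explorer", "microsoft internet explorer"]

for candidate in vdb_names:
    print(candidate,
          "raw-exact:", candidate == inventory_name,
          "normalized-exact:", normalize(candidate) == normalize(inventory_name),
          "similarity: %.2f" % similarity(inventory_name, candidate))
# Raw string comparison misses every variant; normalization and a similarity threshold
# recover most of them, at the price of possible false positives for short or generic names.
```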
review, and Act: Monitor and enhance the risk treatment plan. In contrast, NIST SP
800-30 aims to analyze risks using three major steps: S1, risk assessments that look at
risks across all organizational levels; S2, a focus on business processes, considering sales,
marketing, or HR (Human Resources) procedures; and S3, leveraging the technological
level by integrating applications, systems, and information flows [18].
Impact: Defines the magnitude of the harm that can be expected from unauthorized
disclosure, alteration, or destruction of information, and loss of information or system
availability [19]. These repercussions can affect confidentiality, integrity, availability, or
all three.
Security measures: They encompass any processes, policies, devices, practices, or
other activities that may be administrative, technological, managerial, or legal in nature
that are meant to change a risk state. Classified by their function, security measures can be
preventive, detective, or corrective [20,21].
Exploit: It refers to the frequency of attacks targeting assets, exploiting a specific
vulnerability, and the likelihood of a vulnerable system being attacked [22].
Assets: These include data, personnel, devices, systems, and facilities that enable the
organization to achieve business objectives [20]. Assets may be divided into two groups:
physical assets include money, equipment, stocks, and items, as well as network and server
infrastructure; virtual assets include accounts, data, business plans, and reputation.
CIA: Confidentiality (C) ensures information is not made available or disclosed to
unauthorized individuals, entities, or processes (authentication, authorization, and ac-
cess control). Integrity (I) protects the accuracy and completeness of assets (information
changed). Availability (A) ensures that assets are accessible and usable on demand by an
authorized entity [23].
Attack Vector (AV): Specific path or scenario used by a hacker or malicious actor to
exploit vulnerabilities and gain access to a target system [24].
Access Complexity (AC): A metric capturing the actions an attacker must take to evade
or bypass security measures to exploit a vulnerability [22,25].
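Both metrics appear as components of a CVSS vector string. The short sketch below (illustrative only; the severity bands follow the published CVSS v3.x qualitative scale, and the example vector is generic, not tied to any specific CVE) parses such a vector and maps a base score to its qualitative rating:

```python
# Parse a CVSS v3.1 vector string and map a base score to a qualitative rating.
def parse_cvss_vector(vector: str) -> dict:
    """Split e.g. 'CVSS:3.1/AV:N/AC:L/...' into {'AV': 'N', 'AC': 'L', ...}."""
    parts = vector.split("/")[1:]                      # drop the 'CVSS:3.1' prefix
    return dict(p.split(":", 1) for p in parts)

def qualitative_severity(base_score: float) -> str:
    """CVSS v3.x qualitative severity bands (0 = None, up to 10 = Critical)."""
    if base_score == 0.0:
        return "None"
    if base_score <= 3.9:
        return "Low"
    if base_score <= 6.9:
        return "Medium"
    if base_score <= 8.9:
        return "High"
    return "Critical"

metrics = parse_cvss_vector("CVSS:3.1/AV:N/AC:L/PR:N/UI:N/S:U/C:H/I:H/A:H")
print(metrics["AV"], metrics["AC"])                    # N L (network vector, low complexity)
print(qualitative_severity(9.8))                       # Critical
```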
In summary, cyberspace integrates software, internet services, information technolo-
gies (IT), telecommunications networks, and technology infrastructures. This virtual
environment links all previous cyber items directly or indirectly. As shown in Figure 3, any
organization, regardless of size, may possess one or more potential vulnerabilities in its
assets that might be exploited by a threat to launch an attack. The
exploitation of vulnerabilities may turn into a major risk assessed based on their impact
and occurrence. The organization rates the risks and vulnerabilities’ severity by using a
risk assessment in a context-aware manner. It then elaborates a mitigation plan to reduce
the risk impact regarding the CIA by implementing the necessary security measures. This
step follows a risk management process, as shown below in Figure 4. In addition, residual
risks may remain even after applying the necessary safety measures. This fact implies a
continuous process of control and supervision to prevent further impacts [26,27]. The result
should be communicated for tracking and making timely decisions.
Figure 3. Interaction between information security cyber items.
Figure 4. Risk management process [15].
3.2.2. Cyber Concepts
We have focused on the key components of the following cyber concepts to help readers
understand the content of this paper.
Cyberattack: Malicious activity aimed at collecting, disrupting, denying, degrading,
or destroying information system resources or the information itself [18].
Cyber resilience: The ability to continuously deliver the intended outcome despite
adverse cyber events, encompassing the identification, evaluation, treatment, and reporting
of system and software vulnerabilities [20].
Cyber threat: Any circumstance or event that has the potential to harm organizational
operations, assets, individuals, other organizations, or the nation by gaining unauthorized
access, causing destruction, disclosure, modification of information, and/or denial of service [28].
Vulnerability Management System (VMS): It represents a capability that identifies
CVEs present on devices that attackers may exploit to compromise them, thereby using
them as platforms to further compromise other segments of the network [29]. VMSs
incorporate multiple hybrid systems to detect potential cyber risk presence across diverse
ecosystems and assess their cyber state.
As shown in Figure 5, the implementation of VMS in different ecosystems (IoT, IT,
cloud-based systems, ICS, and others) depends on the quality and quantity of the data
gathered from multiple vulnerability databases (VDBs) and the capability to collect
information through scanning operations about products affected by known vulnerabilities.
In this context, the primary function of the VMS is to perform a logical mapping between
CPE/VDBs and product ID, ensuring accurate results while reducing the occurrence of
false positives and false negatives [6].
A CPE name can be bound and represented in three formats (Cpex = {⟨part, v1⟩, ⟨vendor, v2⟩, ⟨product, v3⟩, . . ., ⟨other, vn⟩}):
• Well-Formed Name (WFN): wfn:[part = "a", vendor = "microsoft", product = "internet_explorer", version = "8\.0\.6001", update = "beta"];
• Uniform Resource Identifier (URI): CPE = cpe:/{part}:{vendor}:{product}:{version}:{update}:{edition}:{language}, e.g., cpe:/a:microsoft:internet_explorer:8.0.6001:beta;
• Format String Binding (FSB): cpe:2.3:part:vendor:product:version:update:edition:language:sw_edition:target_sw:target_hw:other, e.g., cpe:2.3:a:microsoft:internet_explorer:8.0.6001:beta:*:*:*:*:*:*.
A well-formed CPE name (WFN), an abstract logical construction, refers to this CPE
naming method. The CPE naming specification defines procedures for binding WFNs
to machine-readable encodings and for reversing these encodings back to WFNs [31].
The CPE standard defines eleven attributes in WFN format. Part (1) may contain "a"
for applications, "o" for operating systems, or "h" for hardware devices. The vendor
(2) identifies an individual or an organization responsible for producing or developing
the item. The official product name is identified by product (3). Version (4), update (5),
and sw_edition (6) specify version and update details, with edition (7) typically set to
ANY unless backward compatibility requires a specific value related to the product. The
user interface language (8) tag follows the RFC 5646 definition [32], while target_sw (9)
denotes the product's operating environment. Target_hw (10) specifies the hardware
architecture. Finally, other (11) provides additional information supporting specifications
referenced in [8,33].
The public can gain access to CPE information using the NIST API, as shown below
in Figures 6–10, which ensures data accuracy, reliability, and accessibility. Figure 6 above
presents an example extract from a query. The NIST API is used to preserve and make the
CPE data available. CPE can be extracted from CVE/metadata and the NIST/dictionary, as
shown below in Figure 11. Figures 9–12 illustrate the annual collection of CPE via Python
scripts from 2016 to 2024, highlighting partitions for hardware (h), operating systems (o),
and applications (a).
Figure 9. Distribution of CPEs number extracted from NVD/CPE DICT by partition (o: Operating Systems (%); a: Applications (%); h: Hardware (%)).
Figure 12. Similarity rate of CPEs between NVD/dictionary and NVD/CVE (a: Applications (%); o: Operating Systems (%); h: Hardware (%)).
Common Vulnerabilities and Exposures (CVE): It is a program maintained by the MITRE
Corporation [34] and sponsored by the U.S. Department of Homeland Security (DHS)
and the Cybersecurity and Infrastructure Security Agency (CISA) [35]. It focuses on
representing a nomenclature and dictionary of security-related product flaws. Every
CVE ID is assigned to the respective product by authorized organizations known as CVE
Numbering Authorities (CNAs). The National Vulnerability Database (NVD) manages the
analysis process for each CVE ID, incorporating reference tags, the Common Vulnerability
Scoring System (CVSS), the Common Weakness Enumeration (CWE), and CPE Applicability
Statements [36]. It is worth noting that the number of CVEs published by the NVD increases
annually. Figures 9–12 provide statistics on supplemental CPE information between 2016
and 2024. This highlights a notable discrepancy between CPEs released with CVE/metadata
and those listed in the dictionary. Additionally, it is important to recognize that not all
CPEs affected by disclosed vulnerabilities are covered in every CVE entry.
Table 2. Security vulnerability databases (VDBs). The table compares CVE 1, NVD 2, Mitre 3, VulDB 4, Security DB 5, VulnDB 6, and ExploitDB 7 (operated, respectively, by the Mitre Corp, NIST, the Mitre Corp, Scip AG, varying operators, Risk Based Security, and Offensive Security) in terms of the vulnerability details they publish, free-access limitations, update frequency, API support, CVE list download, and supported CVSS versions.
1 https://www.cve.org/; 2 https://nvd.nist.gov/; 3 https://www.mitre.org/; 4 https://vuldb.com/; 5 https://www.security-database.com/; 6 https://vulndb.flashpoint.io/users/sign_in.
Common Weakness Enumeration (CWE): It can be understood as a state within
a hardware, software, firmware, or service component that, under specific conditions,
can lead to vulnerabilities. CWE incorporates a taxonomy to identify common sources
of weaknesses [37].
Scoring system: Each year sees an increase in the number of published vulnerabilities,
with a notable peak in 2021, as illustrated in Figure 7, while their severity remains influenced
by various factors. In this context, employing a scoring system becomes essential to classify
and prioritize assessment processes. In our literature review, we identified
four distinct scoring systems. The Common Vulnerability Scoring System (CVSS) is the
first method used to address the vulnerability impact using qualitative representation (low,
medium, high, and critical) and quantitative measures of severity (a scale from 0 to 10).
The last version, CVSS V4.0, was released in November 2023. It adds more information,
including significant changes from the previous versions of CVSS V3.x and V2.x, additional
scoring guidance, and scoring rubrics [38]. The second system is the Vulnerability
Rating and Scoring System (VRSS) [39], which bases its final score on CVSS V2, providing
both qualitative ratings and quantitative scores for vulnerabilities. The third system is
called the Weighted Impact Vulnerability Scoring System (WIVSS). Based on CVSS V2,
it assigns different weights to CIA impact metrics, in contrast to CVSS, which uses the
same weights for impact metrics [40]. Finally, the Variable Impact–Exploitability Weightage
Scoring System (VIEWSS) is a hybrid technique that combines the strengths of CVSS, VRSS,
and WIVSS [41].
Incident Response (IR): It is focused on identifying, analyzing, and mitigating damage,
as well as addressing the root cause to minimize incident impact. This can be viewed as the
mitigation process for security violations of policies and recommended practices. Incident
Response (IR) encompasses eight broad operations within any ecosystem: policies and
procedures (IR), training (IR), testing incidents, handling incidents, monitoring incidents,
reporting (IR), assistance, and an IR plan [42,43].
Indicator of Compromise (IoC): After an attack has been executed on a victim system,
some digital footprints can be left by hackers. This evidence of a possible attack represents
forensic artifacts from intrusions identified at the host or network level within organiza-
tional systems. IoCs provide valuable information about compromised systems and can
include the creation of registry key values. IoCs for network traffic include Universal
Resource Locators or protocol elements that indicate malicious code commands and control
servers. The rapid distribution and adoption of IoCs can enhance information security
by reducing the time systems and organizations remain vulnerable to the same exploit
or attack [43].
Thus, this section highlighted the most important cyber elements and concepts to
provide a foundation for understanding the content of this paper. Next, we will examine
additional findings regarding the VDBs used.
Thus, each of these databases presented above has its own specificities, performance,
and accuracy. The analysis of the data published by NVD confirms the existing VDBs’
issues and shortcomings, which will be discussed further.
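A rough indication of how such statistics can be collected is given below; this is a sketch under assumptions (it reads a hypothetical local export of CPE records with a creation date and a CPE 2.3 name, and the file and field names are illustrative rather than a reproduction of the scripts used for Figures 9–12):

```python
import json
from collections import Counter, defaultdict

# Assumed local export: a JSON list of records such as
# {"cpeName": "cpe:2.3:a:vendor:product:1.0:*:*:*:*:*:*:*", "created": "2021-05-04T10:00:00"}
with open("cpe_dictionary_export.json", encoding="utf-8") as fh:
    records = json.load(fh)

per_year = defaultdict(Counter)
for rec in records:
    year = rec["created"][:4]
    part = rec["cpeName"].split(":")[2]          # 'a' (application), 'o' (OS), or 'h' (hardware)
    per_year[year][part] += 1

for year in sorted(per_year):
    counts, total = per_year[year], sum(per_year[year].values())
    print(year, {p: round(100 * counts[p] / total, 1) for p in ("a", "o", "h")})
```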
To conclude this section, we presented several motivations for choosing this topic.
We focused on key cybersecurity elements, providing concise explanations to facilitate
understanding. We also summarized our findings on various VDBs, with an in-depth
focus on NVD. In the following section, we will examine our research and explore multiple
findings in vulnerability detection.
Figure 13. Taxonomy of vulnerability detection.
Figure 14. Features of similarity matching-based approach.
as handling multi-word keywords, capitalized terms, and words starting with “lib-.” Thus,
the evaluation shows promising results in general.
tf-idf(t, d, D) = tf(t, d) × idf(t, D), with idf(t, D) = log(|D| / |{d ∈ D : t ∈ d}|),
where "t" is a word and "d" is a document belonging to a corpus of documents "D".
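A compact, self-contained illustration of this weighting is given below (pure Python; the two vulnerability "descriptions" are invented, and the implementation is a generic sketch of tf-idf, not the pipeline evaluated in the reviewed work):

```python
import math
from collections import Counter

def tf_idf_scores(documents):
    """Return, per document, the tf-idf score of each term (standard log-idf weighting)."""
    tokenized = [doc.lower().split() for doc in documents]
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    n_docs = len(tokenized)
    scores = []
    for tokens in tokenized:
        counts, total = Counter(tokens), len(tokens)
        scores.append({term: (count / total) * math.log(n_docs / doc_freq[term])
                       for term, count in counts.items()})
    return scores

corpus = [
    "buffer overflow in libpng allows remote attackers to execute code",
    "cross site scripting in webmail allows remote attackers to inject script",
]
for doc_scores in tf_idf_scores(corpus):
    top = sorted(doc_scores, key=doc_scores.get, reverse=True)[:3]
    print(top)   # terms shared by every description (e.g., 'remote', 'attackers') score zero
```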
Figure 15. Overview of HermeScan. (Adapted from [54]).
4.1.2. Finding Analysis
Table 3 below provides additional information about the different methods used in
this category.
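Several of the methods compared in Table 3 reduce CPE/CVE matching to string-distance computations. The following minimal reference implementation of the Levenshtein edit distance (a generic sketch, not code from the cited works) shows the kind of comparison involved:

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insertions, deletions, substitutions)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                 # deletion
                            curr[j - 1] + 1,             # insertion
                            prev[j - 1] + (ca != cb)))   # substitution
        prev = curr
    return prev[-1]

# Small vendor-name example: a distance threshold (relative to the string length) can
# absorb minor spelling differences while still rejecting unrelated names.
print(levenshtein("microsoft", "micro-soft"))   # 1
print(levenshtein("microsoft", "mozilla"))      # much larger distance
```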
The method based on RE identifies vulnerabilities and required patch releases
without providing specific details on the accuracy rate. Challenges related to CPE and
4.1.2. Finding Analysis
CVE metadata impact the results. To address these issues, the Levenshtein technique uses
The method based on RE
a semi-manual approach in identifies vulnerabilities
the CPE matching and required
process, achievingpatch releases without
approximately 83%
providing specific details on the accuracy rate. Challenges related
accuracy, as 10 out of 12 products were correctly matched. Manual interventionto CPE and CVE metadata
was
impact
necessarythe results.
due to Totheaddress these issues,
error-prone naturethe ofLevenshtein technique
fully automated uses a semi-manual
approaches. However,
approach in the CPE
inconsistencies matching process,
and incomplete achievinginapproximately
data published CVEs still affect 83% theaccuracy, as 10 out of
overall accuracy of
12
theproducts
studied were correctly matched. Manual intervention was necessary due to the error-
methods.
proneBuilding
nature of fully
new CPEsautomated
from bannerapproaches. However,
texts achieved a highinconsistencies
accuracy of 98.9%. and Despite
incomplete
this
data published in CVEs still affect the overall accuracy of the studied methods.
success, issues such as overmatching, short product names, or common names led to false
Building
positives new CPEs
or missed from bannerIn
vulnerabilities. texts
theachieved a high accuracy
CVE matching process, theof 98.9%. Despite
TF-IDF-based
this success, issues such as overmatching, short product names, or common
keyword extraction pipeline was used to identify the most affected software, with 70% names led of
to
false positives or missed vulnerabilities. In the CVE matching process,
vulnerabilities (around 57,640 CVEs) accurately identifying full software names. the TF-IDF-based
keyword The extraction pipeline was
Ratcliff/Obershelp used tocontributed
algorithm identify thesimilarly
most affected
to thesoftware,
regular with 70% of
expression-
vulnerabilities
based method,(around
matching57,640 CVEs)
system logsaccurately
with CPE/NVDidentifying
data full softwareaffected
to identify names.software,
achieving an average accuracy of 79%. Additionally, several solutions focused on
resolving incomplete CPE listings. The modified Jaro–Winkler technique achieved 83.7%
accuracy in vendor matching surpassing the Levenshtein edit distance and
Ratcliff/Obershelp methods.
Recently, the method based on ChatGPT was evaluated for retrieving CVSS scores,
identifying affected CPEs, and offering mitigation strategies. However, these techniques
Limitations Human
Authors, Comparison Scope or Scanning
and Interaction Attributes Prioritization
Year Method Ecosystem Mode
Challenges (HI)
Incomplete information in log file;
Gawron No matching between CPE ID/products and CPE, log file,
Regular
et al., IT CPE/VDBs; No HPI-VDB, No Passive
Expression
2017 [46] Vulnerability without CPE; OSVDB, NVD.
Vulnerability zero-day.
Mismatch errors;
Similar semantic CPE with different syntax;
Sanguinoc Levenshtein Vendor, Product
Large and complex computation; Passive and
and Uetz, edit IT Yes and version. Yes
Human intervention is labor intense; active
2017 [9] distance CPE, CVE.
CVE description without
software product metadata.
Dependence on banner text quality, and
Building
complexity in managing vague or
CPE Banner text, CPE
Na et al., incomplete data. Passive and
for IoT No (Product and No
2018 [47] Deprecation in CPE active
connected vendor name).
dictionary;
devices
No CPE entries in the CPE dictionary.
Heavily dependent on the quality of text
description and in case of lack of relevant
keywords, the results may lead to false
positives or negatives. Free-form Yes,
Elbaz, Rilling, Analysis based on description only description, the result is the
and Morin, TF-IDF IT may output errors; No keywords most probable Passive
2020 [48] Incomplete metadata in VDBs extracted from CPE affected
represent a considerable URI, CPE, CVE. software.
issue;
Limited heuristics may cause
occasional inaccuracies.
Table 3. Cont.
Limitations Human
Authors, Comparison Scope or Scanning
and Interaction Attributes Prioritization
Year Method Ecosystem Mode
Challenges (HI)
BinXray relies on the accurate
No, but manual
function matching, as well as a
Basic bloc analysis is Vulnerable function
dependance on a binary compiled system;
mapping, IT required (VF); Patched
A challenge is raised when a function receives
Greedy and to analyze function of a
Xu et al., multiple changes at the same location in
Algorithm, software used potential program (PF) and No Passive
2020 [49] different versions;
Levenshtein in IoT vulnerable target
Complex and large functions may
distance devices functions and binary
increase the time
Algorithm. then, check program.
consumption for analysis;
ambiguous cases.
Remain noise to impact the accuracy.
Name inconsistency issues during the
collection of
software products; Vendor,
Ushakov Error-prone mapping due Yes, Product
Ratcliff/ Passive and
et al., IT to the obtained score; in some and No
Obershelp active
2021 [5] Manual verification is cases. version,
required in certain steps; CPE, CVE.
Common issues related to
the VDBs.
Extracting and analyzing
No, but in
Abstract Syntax Trees (ASTs) may increase the
case
Fuzzy computational cost in a
of false Target
matching; complex infrastructure;
positive function (TF), patch
Hash Patching methods differ and could generate
or function (PF) and
algorithms false positives;
ambiguous vulnerable
Zhao et al., (CTPH and VULDEFF focuses only on syntactic
IT results, function (VF), No Passive
2023 [50] CRC32); and structural
validation dataset of
Weighted edit features without handling
is vulnerable
distance and semantic aspects;
required to function
Cuckoo filter, The balance between the three
maintain the and patches.
and AST. thresholds (ξ1, ξ2, and ξ3) should be well set to
accuracy of
avoid impacting the
VULDEFF.
accuracy of VULDEFF.
Table 3. Cont.
Limitations Human
Authors, Comparison Scope or Scanning
and Interaction Attributes Prioritization
Year Method Ecosystem Mode
Challenges (HI)
The variability in vendor names impacts the
accuracy of the matching process;
Vulnerabilities published without software
Jaro–Winkler; description or no CPE at all;
Yes, Dataset of ICS
McClanahan NLTK snowball Handling abbreviations and acronyms
especially for advisories published
et al., stemmer; OT when building No Passive
building the before July 25, 2023;
2023 [52] Cleanco Python exact CPEs;
dataset. CPE, CVE.
library; Handling Jaro–Winkler errors during the
matching process;
Following versioning names over time;
Labor-intensive in building the dataset.
GPT-3 and GPT-3.5 are not accurate in finding Yes, to
McClanahan GPT-3; GPT-3.5; CVSS scores, vectors, and affected products; interact CVE, CPE, CVSS,
et al., GPT-4; LLM and Linux system GPT-4 and Bing chatbot still had issues with Exploits, Mitigation, No Passive
2024 [53] Bing chatbot retrieving correct and precise CVEs; user-prompted Google, and NVD.
LLM is prone to hallucinations. questions.
Build incomplete CFG for complex firmware
(obfuscated code or indirect calls);
IoT device firmware;
Many interdependencies between functions
Fuzzy Shared libraries;
Gao et al., and libraries may require more computations
matching, CFG; IoT Yes Binary files; 0-day No Passive
2024 [54] and resources;
RDA dataset; N-day
Dynamic;
dataset.
Over-tainting constitutes a challenge and leads
to incorrect vulnerability reports
4.2. Graph-Based Approach
Based on graph theory, DBs, and other AI techniques, the second basket is another
method for modeling and analyzing the relationships and interactions between elements
inside a target system. This approach plays a significant role in vulnerability detection by
utilizing complex relationship mapping. It includes multiple inputs, fast traversal of
linked security data stored, scalability, real-time analysis of security events, and
simulation and prediction of potential cyberattacks, as depicted in Figure 16. More details
about these methods are presented throughout the synthesis of several methods existing
in the literature review.
Figure 16. Features of graph-based approach.
4.2.1. Graph-Based Approach Methods Description
Method Based on GGNN
This study introduced a novel framework named FUNDED (Flow-sensitive
vUlNerability coDE Detection) [55]. It combines a graph-based learning concept
with automated data collection for code vulnerability detection. FUNDED leverages the
benefits of advanced graph neural networks (GNNs) [56] to represent the target program
as a graph by AST [57], capturing control, data, and call dependencies, using PCDG [58], to
enhance code vulnerability detection. The framework first converts the program source
code into a graph representation, where nodes represent statements and edges represent
various code dependencies (control, data, and call). In the first phases, FUNDED includes
Gated Graph Neural Networks (GGNNs) [59] to capture complex code structures and
relationships critical for identifying vulnerabilities. In the second step, related to data
collection, the framework gathers high-quality training samples from open-source projects
to identify vulnerable code and enrich the training dataset with real-life examples. A key
aspect of this phase is the combination of expert models (support vector machine (SVM),
random forests (RFs), k-nearest neighbor (KNN), logistic regression (LR), and gradient
boosting (GB) to identify vulnerability-relevant commits. Conformal Prediction (CP) mea-
sures the statistical confidence of each expert model’s predictions. The third step highlights
the multi-relational graph modeling technique, which helps create multiple relation graphs
for different types of edges (e.g., control flow, data flow, syntax). It aggregates information
across these relation graphs using a Gated Recurrent Unit (GRU) [60] to learn a comprehen-
sive representation of the program. Finally, the last step involves training the models on
real-life samples and applying transfer learning to adapt the model to different program-
ming languages, as shown in Figure 17. Through the utilization of the trained model and
the learned graph representations, the proposed solution assists in identifying patterns
indicative of vulnerabilities.
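The graph construction on which FUNDED-style approaches operate can be approximated in a few lines. The sketch below is only an approximation (it is not the authors' implementation): it uses Python's ast module to derive a node/edge representation with syntactic (AST) edges and a naive sequential "flow" edge between statements, the kind of multi-relational input a GGNN would then consume:

```python
import ast

def code_to_graph(source: str):
    """Return (nodes, edges): AST child edges plus naive sequential-flow edges between statements."""
    tree = ast.parse(source)
    nodes, edges = {}, []
    for node in ast.walk(tree):
        nodes[id(node)] = type(node).__name__
        for child in ast.iter_child_nodes(node):
            edges.append((id(node), id(child), "ast"))
    # Naive control-flow approximation: link consecutive statements in each body.
    for node in ast.walk(tree):
        body = getattr(node, "body", [])
        if isinstance(body, list):
            for a, b in zip(body, body[1:]):
                edges.append((id(a), id(b), "flow"))
    return nodes, edges

sample = "data = read_input()\nif len(data) > 64:\n    copy(buffer, data)\n"
nodes, edges = code_to_graph(sample)
print(len(nodes), "nodes,", len(edges), "edges")
print({kind for _, _, kind in edges})      # {'ast', 'flow'}
```

A full pipeline would add data- and call-dependency edges and feed the resulting multi-relational graph into the learned model.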
Figure 17. Workflow of FUNDED. (Adapted from [55]).
Method Based on SPG
Using static program analysis approaches, this contribution presented a method for
handling software vulnerability detection based on the Slice Property Graph (VulSPG) [61].
After parsing the target code using the open-source tool joern (https://joern.readthedocs.
io/en/latest/, accessed on 31 July 2024), the method matches some vulnerability candidate
syntax characteristics by applying the Abstract Syntax Tree (AST) [57], following six types
of Syntax-based Vulnerability Candidates (SyVCs) [62]. Then, the Program Dependency
Graph (PDG) [63] is traversed to obtain slice nodes. Additionally, the Code Property Graphs
(CPG) [64] generate data and control dependencies as well as function calls among slice
nodes to build the Slice Property Graph (SPG). The second step encodes the semantics in
the SPG nodes. This process involves lexical analysis via the Word2Vec model and semantic
feature vectors through a token-level attention mechanism [65]. This type of information
is then combined to enrich the node's feature representation and output an embedded
vector of the graph nodes. The current step divides the SPG into three types of subgraphs:
a Control Dependency Graph (CDG), a Data Dependency Graph (DPG), and a Function
Call Dependency Graph. The Graph Encoding Network phase then uses Relational Graph
Convolutional Networks (R-GCNs) [66]. This helps to learn the hidden state of each node in
each layer and concatenate them to capture the comprehensive graph features essential for
identifying potential vulnerabilities. In the last step, the method proposes a subgraph-level
attention mechanism to obtain the feature vector of the combined subgraphs (vectors).
Finally, the obtained subgraph and SPG are concatenated into a classifier network for
vulnerability detection.
Figure 18. Steps to build EDG for SUT. (Adapted from [68]).
Method Based on Analytic Graph
This contribution introduced a Graph-Based Analytic method to improve Cyber
Situational Awareness (CSA) across complex computer networks [75,76]. The CSA operates
on three levels: perception, comprehension, and projection of cyber situations in a cyber
environment. The authors introduce graph-based intelligence, which leverages the second
level of CSA. This method starts by identifying hosts near compromised devices using
Depth First Search (DFS). Next, this method discovers vulnerable assets using breadth-first
searching (BFS) to identify and manage network vulnerabilities. Likewise, community
detection and frequent subgraph mining (FSM) algorithms segment the network as part of
the proactive security measures in Incident Response (IR) [77].
Ultimately, graph centrality measures (Degree, Betweenness, PageRank, Closeness)
prioritize nodes based on their influence, impact, critical subnet, and other relevant
parameters to assess network security [78,79].
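A condensed sketch of these graph analytics on a toy network is shown below (it relies on the networkx library; the topology and the vulnerable/compromised labels are invented, and the code is illustrative, not the cited implementation):

```python
import networkx as nx

# Toy network: nodes are hosts; attributes mark known-vulnerable hosts.
G = nx.Graph()
G.add_edges_from([("fw", "web"), ("web", "app"), ("app", "db"),
                  ("web", "mail"), ("app", "hmi"), ("hmi", "plc")])
nx.set_node_attributes(G, {"web": True, "db": True, "plc": True}, "vulnerable")
compromised = "web"

# Perception/comprehension step: hosts reachable from the compromised device (BFS order),
# filtered down to those carrying known vulnerabilities.
reachable = [v for _, v in nx.bfs_edges(G, compromised)]
exposed = [h for h in reachable if G.nodes[h].get("vulnerable")]
print("vulnerable hosts reachable from", compromised, "->", exposed)

# Prioritization step: rank hosts by the centrality measures cited above.
rank = {
    "degree": nx.degree_centrality(G),
    "betweenness": nx.betweenness_centrality(G),
    "pagerank": nx.pagerank(G),
    "closeness": nx.closeness_centrality(G),
}
for name, scores in rank.items():
    print(name, max(scores, key=scores.get))   # most influential host per measure
```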
Limitations Human
Authors, Used Scope or Scanning
and Attributes Interaction Prioritization
Year Method Eco-System Mode
Challenges (HI)
Data: Many commits in open-source
Target: Program
projects includes benign code snippets in
Source Code;
the training samples;
GGNN: Data: CVE,
Data quality assessment: The check Yes, especially in data
(PCDG, NVD, SARD
process remains manual; gathering process, Initial
AST, and open-source No, a
ML models: Are dependent on sample labeling
GRU) projects hosted Binary
the quality of dataset which (inspecting and labeling)
Wang et al., Mixture of on GitHub; Decision is
IT needs continuous upgrade; and in continuous Passive
2020 [55] Expert Model Dataset: SAP [88] given by
Uncertain situations: the models learning where
(SVM, RF- for Java and ZVD function
are predefined to produce predictions are reviewed
KNN, LR [89] for C/C++; detection.
high-probability answers which may lead by developers to provide
and Expert Models;
to false positives; ground-truth labels.
GB-RE). Conformal
Resource-intensive: More time to perform
Prediction
training a huge volume of data and
(CP).
learning from these graphs.
SARD [90] and NVD [36] datasets:
present noise and irrelevant information, Yes, to
Source code;
inconsistencies in training data, handle complex
Outputs of PDG:
inaccurate synthetic samples, limited interpretation of
Data and flows
coverage of vulnerability results, to
of the program;
types for training; perform a
Semantic
SPG: SPG: complex to construct, semantic validation of
Zheng et al., Outputs of
R-GCN-AST- IT process is resource; vulnerability source Yes Passive
2021 [61] CPG by using
PDG-CPG. Intensive, reducing redundancy code, to adjust the model
AST and CFG;
can lead to omission of potentially parameters and refine
Syntactic
relevant information; the slicing criteria as well
features: slicing
Handling variability in code structure as alter dependencies in
criteria to generate
impacts effectiveness of SPG generation; SPG construction
(SPG).
VulSPG is focused only on vulnerability process.
detection in programs written in C/C++.
Table 4. Cont.
Limitations Human
Authors, Used Scope or Scanning
and Attributes Interaction Prioritization
Year Method Eco-System Mode
Challenges (HI)
Known CVE
Granular details of asset configurations Yes, for
vulnerabilities
increase the complexity of vulnerabilities
Graph-based (Json Format),
assets management; and device
methods and CPE
Frequent alteration in fingerprints, but HI is
Tovarnak et al., and applicability
IT configurations system and in VDBs; required again in No Passive
2021 [67] Gremlin graph statements
Intensive computation when applying updating CVE or
traversal (Version 2.3
to a large-scale ecosystem; asset data or
language reference
Complete dependence of the accuracy modifying the graph
implementation
of CVE and CPE published. structure.
[30,31]).
Global dependence of input data
All CPE under
EDG model accuracy (CVE and CWE); All the process included
the SUT;
(directed graphs Complexity in managing dynamic in this approach are
Public CVE,
and dynamic updates or upgrade (CVE, automatic;
CWE and CAPEC; Yes,
Longueira- tracking); patch or firmware); nevertheless,
OT Time-quantitative metrics especially
Romero et al., Quantitative Resource intensiveness: in periodic reviews may Passive
(IACS) based on CVSS: for for patching
2022 [68] Metrics (CVSS- maintaining require manual
vulnerabilities activities.
based Metrics EDG model; input to ensure
(M0 to M6)
and Continuous The used model loses effectiveness accuracy and
and for weaknesses (M7
Assessment) in front of the unknown or relevance.
and M8).
(zero-days) vulnerabilities.
Table 4. Cont.
Limitations Human
Authors, Used Scope or Scanning
and Attributes Interaction Prioritization
Year Method Eco-System Mode
Challenges (HI)
New paradigms (new query Network hosts,
languages and adaption users, services
Graph-based to data processing); information, IP
analytic-graph Lack of comprehensive datasets addresses,
traversal (DFS (high-quality datasets for training vulnerabilities of Yes, for data
and BFS), and validating graph-based CVE (CPE included), interpretation, incident
Husak et al., IT Passive and
Community cybersecurity systems); and security events; response, decision Yes
2023 [75] (Network) Active
detection, FSM, Need for unified ontology Nmap for scanning making and maintenance
and graph (The effectiveness can be limited); (CPE string) and the and updates.
centrality Explainability and complexity Neo4j Graph Data
measures. (difficulties for users to Platform for storing
understand and and visualizing the
interpret the results). data [87].
Shi et al., 2023 [80]. Used method: Threat knowledge graph (translating embeddings: ML model TransE). Scope or eco-system: IT. Limitations and challenges: Dependence on external cyber security events; Incomplete vulnerability information or delayed updates; Managing prediction errors and maintaining complexity; Manual analysis is required; The prediction of the association between entities is based on historical data; newly added entities may represent an issue. Attributes: CVE, CPE, and CWE from NVD. Human interaction (HI): No, but in the set-up and defining of the model parameters, human expertise is required to interpret the result. Prioritization: Yes. Scanning mode: Passive.
Table 4. Cont.
Lu et al., 2024 [84]. Used method: Graph structural information integration (AST-PDG and CFG); LLM (in-context learning); CodeT5 [91] to extract semantic features; T-SNE [92] to reduce feature dimensionality; SimSBT [93] to generate sequences during the traversal path. Scope or eco-system: IT. Limitations and challenges: Higher computational costs and resource demands for building a complex graph representation in a high-scale ecosystem; Dependence on quality during the in-context learning and domain-specific information; Effectiveness of GRACE with other programming languages; Certain nuanced or complex semantic information may impact the detection of some vulnerabilities; New vulnerable patterns not existing in the data source. Attributes: Three datasets are used to train models in detecting whether the code is vulnerable or not: FFmpeg [94], Qemu [95], and Big-Vul [96]. Human interaction (HI): Yes, the three integrated modules are fully automated. Prioritization: No. Scanning mode: Passive.
Salayma, 2024 [86]. Used method: Neo4j, Cypher queries. Scope or eco-system: IoT. Limitations and challenges: Issues within a large and complex IoT environment; The reachability and attack path computations can face limitations when firewall policies grow in complexity; Dependence on Neo4j and its Cypher query language may limit the portability of the solution to other graph databases. Attributes: CVE, attack paths. Human interaction (HI): Yes, to elaborate queries. Prioritization: No. Scanning mode: Active.
Figure 20. CyberSPL workflow. (Adapted from [99].)
Method Based on Attack Scenario
Another work presented a study focusing on the integration of feature modeling to support security assessments by virtualizing attack scenarios for software systems [104]. In this context, the methodology starts with extracting security events from VDBs, linking
them, and correlating dependencies between software systems. Next, a feature model is
built to capture vulnerabilities and the relationships between them. At this step, actions are
carried out manually by pulling from the Metasploit Framework (MSF) and vulnerability
databases to build records for each attack scenario into a vulnerability feature model [105].
For evaluating the presented work, the authors built the relationship between Firefox and
operating systems using leaf features and 24 cross-tree constraints. The next step involves
integrating the retrieved data to replicate vulnerable systems. The virtualized systems
are then attacked using MSF scripts before evaluating the scenarios’ effectiveness. This
capability allows all security stakeholders the opportunity to identify pertinent attack
scenarios and vulnerabilities for their purposes.
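To ground this idea, the sketch below (our illustration, not the tooling used in [104]) encodes a toy vulnerability feature model in which product, version, platform, and CVE features are linked by cross-tree constraints, and a candidate configuration corresponding to an attack scenario is checked for consistency. All feature names and the CVE identifier are hypothetical placeholders.

```python
# Toy vulnerability feature model with cross-tree constraints (illustrative only;
# feature names and the CVE identifier are hypothetical placeholders).

FEATURES = {"firefox", "firefox_91", "firefox_92", "windows", "linux", "CVE-XXXX-0001"}

# Cross-tree constraints expressed as predicates over a set of selected features.
CONSTRAINTS = [
    # if Firefox is selected, exactly one of its version features must be selected
    lambda s: ("firefox" not in s) or (len(s & {"firefox_91", "firefox_92"}) == 1),
    # the CVE feature is only reachable from the vulnerable version running on Windows
    lambda s: ("CVE-XXXX-0001" not in s) or ({"firefox_91", "windows"} <= s),
]

def is_valid(selection):
    """A configuration is valid if it uses known features and satisfies every constraint."""
    return selection <= FEATURES and all(check(selection) for check in CONSTRAINTS)

# An attack scenario corresponds to a valid configuration that includes the CVE feature.
print(is_valid({"firefox", "firefox_91", "windows", "CVE-XXXX-0001"}))  # True
print(is_valid({"firefox", "firefox_92", "linux", "CVE-XXXX-0001"}))    # False
```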
Varela-Vaca et al., 2019 [99]. Used method: FAMA framework-REST API; ChocoSolver-CSP. Scope or eco-system: IT. Limitations and challenges: High initial effort: assets cartography and security control identification; Dependency on accurate models: any error may lead to incorrect diagnosis; Manual updates of the FM are required. Attributes: Cybersecurity policy, assets-cybersecurity context. Human interaction (HI): Yes. Prioritization: Yes. Scanning mode: Passive.
Kenner et al., 2020 [104]. Used method: In this study, throughout the attack scenarios and penetration testing stage, only the specific MSF is defined. Scope or eco-system: IT. Limitations and challenges: Security events: lack of quality, difficulties in extracting relevant data, and inconsistency issues; Analysis, extraction, synthesis, and data are performed manually; Additional manual analysis is required to build the FM; During the evaluation, errors or technological issues relating to constraints on the environment occur; The suggested model must be heavily modified for many use cases with the goal to be reusable; Maintainability and real-time updates require additional effort to be accomplished in the event that a software system changes. Attributes: Vulnerability databases: NVD; Exploit databases; Attack scenario dataset and framework: MSF. Human interaction (HI): Yes. Prioritization: Yes. Scanning mode: Passive.
Table 5. Cont.
Varela-Vaca et al., 2020 [106]. Used method: FaMa; FM: fm.py; Tool: Nmap; Web scrapers: scraper.py. Scope or eco-system: IT. Limitations and challenges: Dependence: relevant keyword additions require to be manually included to enhance accuracy; Asset inventory depends only on NMAP scan results, which may contain inconsistencies or omissions; Difficulty to manage products whose CPE does not meet specifications and that NMAP is unable to identify; VDBs: inconsistencies and relevant data omission can affect the accuracy of the FMs; There are more cross-time limitations when a significant number of features (CVE and CPE) are included; The FM does not accurately represent the state of assets in terms of RC and CPE; System feature detection is still manual; It will be time-consuming as a result of the scraping mode carried out in a large complex environment. Attributes: Vulnerability databases: NVD; CPE; Running configuration RC (environments in which the vulnerability can be reproduced); Reports from infrastructure analysis (ports, services, etc.). Human interaction (HI): Yes. Prioritization: Yes. Scanning mode: Passive and active.
Varela-Vaca et al., 2023 [107]. Used method: FaMaPy; Tool: Nmap; Web scrapers: scraper.py and exploitdb scrapper.py; FM: fm.py. Scope or eco-system: IT. Limitations and challenges: AMADEUS-Exploit still has the same limitations as the AMADEUS framework; Exploit DB: incomplete, inconsistent, or erroneous data may affect the accuracy of FMs; Misinterpreting the automated analysis and FMs’ reasoning; Need for more external validation experts. Attributes: NVD, ExploitDB and VulDB; CPE, RC, and key terms. Human interaction (HI): Yes. Prioritization: Yes. Scanning mode: Passive and active.
Figure 21. Example of FM construction used by AMADEUS and AMADEUS-Exploit. (Adapted from [106,107].)
4.3.2. Findings Analysis
CyberSPL assists cyber professionals by automating the analysis of non-conformance with cybersecurity policies. This method combines feature models’ capacity with automated verification and diagnosis. The evaluation showcased performances in operational development (DevOps). Additionally, the method based on attack scenarios incorporates feature model variability to represent the vulnerability of the target system. It integrates attack scenarios to uncover insights about potential exploitation areas. The evaluation showed that 5 out of 18 attacks failed to exploit the identified vulnerabilities, with detailed reasons listed in Table 5. In addition, the AMADEUS method integrates the SPL techniques used in CyberSPL with feature models to automate the infrastructure inventory analysis, scraping vulnerability databases, extracting vulnerability configurations, and inferring possible attack vectors. Despite some limitations, the evaluation demonstrated high accuracy in generating and validating attack vectors. In the same field, AMADEUS-Exploit, an extension of AMADEUS, adds an exploit layer to feature models and improves reasoning capacities. This method was evaluated in a real scenario, identifying 4000 vulnerabilities and 700 exploits. Generally, AMADEUS-Exploit has demonstrated its scalability and efficiency in vulnerability detection and management.
In practical case studies, the previous methods can be employed within enterprises to identify, assess, and prioritize vulnerabilities in their infrastructure. For AMADEUS-Exploit and AMADEUS, the inventory assets are evaluated using web scrapers or by querying NVD. The outputs cover all potential vulnerabilities and exploits associated with the discovered system. If the system runs PostgreSQL version 16.4, the tool might find vulnerabilities like CVE-2020-0985 (REFRESH MATERIALIZED VIEW CONCURRENTLY executes arbitrary SQL). Then, AMADEUS-Exploit generates feature models (FMs) including all possible combinations of affected configurations, product versions (CPEs), and exploits. As a result, the cyber team can verify that PostgreSQL version 16.4 is vulnerable under specific configurations and that no exploit currently exists for CVE-2020-0985. The reasoning mechanism prioritized CVEs with exploits that can affect critical assets, such as Adobe Commerce versions 2.4.3-p1 and 2.3.7-p2 affected by CVE-2022-24086, with related exploits already available, focusing efforts on applying the necessary preventive measures.
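As a minimal sketch of the asset-to-CVE lookup step that such tooling automates, the snippet below queries the public NVD CVE API 2.0 for a given CPE name and lists the matching CVE identifiers. It is our illustration rather than the AMADEUS or AMADEUS-Exploit implementation; the CPE string and the response-field handling are assumptions based on the current NVD API documentation.

```python
# Sketch of the asset-to-CVE lookup step (illustrative; assumes the public NVD CVE
# API 2.0 and the requests library; the CPE string below is just an example).
import requests

NVD_API = "https://services.nvd.nist.gov/rest/json/cves/2.0"

def cves_for_cpe(cpe_name):
    """Return (CVE id, English description) pairs that NVD associates with a CPE name."""
    response = requests.get(NVD_API, params={"cpeName": cpe_name}, timeout=30)
    response.raise_for_status()
    findings = []
    for item in response.json().get("vulnerabilities", []):
        cve = item["cve"]
        summary = next((d["value"] for d in cve.get("descriptions", [])
                        if d.get("lang") == "en"), "")
        findings.append((cve["id"], summary))
    return findings

if __name__ == "__main__":
    cpe = "cpe:2.3:a:postgresql:postgresql:16.4:*:*:*:*:*:*:*"
    for cve_id, summary in cves_for_cpe(cpe):
        print(cve_id, "-", summary[:80])
```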
for vulnerability identification through capturing the global picture and the dependence
Thus, the earlier techniques provided specific examples of how to use feature models
between security information elements. We will then go over another basket of techniques
for vulnerability identification through capturing the global picture and the dependence
related to the AI-based approach.
between security information elements. We will then go over another basket of techniques
related to the AI-based approach.
4.4. AI-Based Approach
This category focuses on the use of artificial intelligence (AI) technologies in identifying, classifying, and prioritizing vulnerabilities in software systems. It combines the use of one or multiple AI models (machine learning, deep learning, and LLM), as shown in Figure 22, to provide advanced techniques for finding and fixing vulnerabilities faster and more accurately than traditional methods, thereby strengthening the overall security posture of software systems. Many contributions have been accomplished regarding this topic, as detailed in the following subsections:
Figure 22. Features of AI-based approach.
4.4.1. AI-Based Approach Methods Description
Method Based on BLSTM
In 2018, the authors published a new method called Vulnerability Deep Pecker (VulDeePecker). The purpose of the method is to integrate deep learning into the software vulnerability detection process [109]. Based on this work, VulDeePecker automates vulnerability discovery by lowering the reliance on human experts, reducing false positive and negative rates, and enhancing detection accuracy. As one of the first attempts to integrate deep learning into vulnerability detection, VulDeePecker operates in two phases: learning and detection. The learning phase uses a large number of code gadgets classified as vulnerable or not for training the Bidirectional Long Short-Term Memory (BLSTM) network using Theano [98] and Keras [110]. The detection phase then uses the trained BLSTM network to identify vulnerabilities in the program code. In addition, the target code is systematically converted to a vector using the word2vec tool [65], making it a suitable input for the BLSTM. Additionally, this model uses two datasets (NVD [36] and SARD [90]) to learn and detect vulnerability patterns from these vectorized code gadgets. Finally, preserving semantic relationships between programs, finer granularity representation of the code, and model suitability for the vulnerability detection context are VulDeePecker’s guiding principles for employing deep learning in vulnerability detection. According to the experimental results, VulDeePecker achieved much fewer false negatives than other methods.
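The sketch below illustrates the learning and detection phases with a small BLSTM classifier over tokenized code gadgets. It is a simplified stand-in for, not a reproduction of, the VulDeePecker pipeline: the vocabulary size, sequence length, and hyperparameters are placeholders, the word2vec vectorization is replaced by a trainable embedding layer, and random arrays stand in for labeled gadgets derived from NVD/SARD.

```python
# Simplified BLSTM over tokenized code gadgets (illustrative stand-in for the
# learning and detection phases; hyperparameters and data are placeholders, and a
# trainable Embedding layer replaces the word2vec vectorization of the original work).
import numpy as np
import tensorflow as tf

VOCAB_SIZE, MAX_LEN = 5000, 50          # token vocabulary and fixed gadget length

model = tf.keras.Sequential([
    tf.keras.layers.Embedding(VOCAB_SIZE, 64),
    tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(64)),
    tf.keras.layers.Dropout(0.3),
    tf.keras.layers.Dense(1, activation="sigmoid"),   # vulnerable vs. not vulnerable
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])

# Learning phase: labeled gadgets (random stand-ins for NVD/SARD-derived samples).
x_train = np.random.randint(0, VOCAB_SIZE, size=(256, MAX_LEN))
y_train = np.random.randint(0, 2, size=(256, 1))
model.fit(x_train, y_train, epochs=1, batch_size=32, verbose=0)

# Detection phase: score an unseen, tokenized gadget.
score = model.predict(np.random.randint(0, VOCAB_SIZE, size=(1, MAX_LEN)), verbose=0)
print(float(score[0, 0]))
```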
Method Based on NER
Another contribution proposed a new solution to address security issues during software development [10]. In this context, Dependency Vulnerability Management (DVM) technologies automate software composition analysis (SCA) to match known vulnerabilities (CVEs) with used software components. It was observed that there was a time lag between the first CVE disclosure and the addition of CPEs to the vulnerability (the median time is almost 35 days). Automated technologies cannot immediately alert developers and users
Method Based on ML
In the same field of research, this method suggested a recommender system for
tracking vulnerabilities that addresses the matching issue between public notifications from
VDBs and the potentially vulnerable products in an enterprise information system [114].
This method provides a shortlist of candidate matches for human verification. The pipeline
of this method comprises three steps: (S1) Based on the target system’s asset inventory
data, the method uses NLP with the SpaCy [115] library to extract word vectors. These are
converted to vectors by Word2Vec [65] to represent the most relevant semantic similarity
followed by normalization to discard unnecessary symbols. (S2) Fuzzy matching integrates
cosine similarity [116,117] to measure the similarity between vendor and product names of
inventory packages and NVD [36]. (S3) The final step is related to machine learning, and
uses a random forest classifier with a Gini impurity measure to classify the candidate CPEs by confidence level (highest, medium, lowest, reject) and classification level (vendor or product).
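The following sketch illustrates the three steps of such a pipeline on toy data. It is our illustration rather than the published recommender system: for self-containment, the spaCy/Word2Vec word vectors of step S1 are approximated with character n-gram TF-IDF, and the package names, similarity features, and confidence labels are made-up examples.

```python
# Toy version of the three-step pipeline (S1 vectorization, S2 fuzzy matching,
# S3 classification). Illustrative only: character n-gram TF-IDF stands in for the
# spaCy/Word2Vec vectors, and package names, features, and labels are made up.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# S1: normalized inventory entries and candidate NVD vendor:product names.
inventory = ["openssl 1.1.1", "apache httpd 2.4", "postgres server 16"]
nvd_cpes = ["openssl:openssl", "apache:http_server", "postgresql:postgresql"]
vectorizer = TfidfVectorizer(analyzer="char_wb", ngram_range=(2, 4))
matrix = vectorizer.fit_transform(inventory + nvd_cpes)

# S2: fuzzy matching via cosine similarity between inventory and CPE name vectors.
similarities = cosine_similarity(matrix[: len(inventory)], matrix[len(inventory):])

# S3: a random forest (Gini impurity) maps a similarity feature to a confidence level,
# trained here on four hand-made examples.
X_train = np.array([[0.9], [0.7], [0.4], [0.1]])
y_train = ["highest", "medium", "lowest", "reject"]
classifier = RandomForestClassifier(criterion="gini", n_estimators=50, random_state=0)
classifier.fit(X_train, y_train)

for i, package in enumerate(inventory):
    j = int(similarities[i].argmax())
    level = classifier.predict([[similarities[i, j]]])[0]
    print(f"{package!r} -> {nvd_cpes[j]!r} (similarity {similarities[i, j]:.2f}, {level})")
```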
English VDBs using a tailored Named Entity Recognition (NER) model, VERNIER measures
software name inconsistency from three perspectives: measurement level (character and
semantics), categories (mismatching, overclaiming, underclaiming, and overlapping), and
VDBs (across NVD and eight other DBs, and inside NVD). The findings reveal prevalent inconsistencies between multiple databases (matching level: 20.3% for character and 43.3%
for semantic) and within the same database, especially between structured and unstructured
software names.
To address these issues, VERNIER suggests a tool that identifies the wrong software
names using a reward-punishment matrix. The tool aggregates data from various VDBs,
performs pairwise comparison using a reward-punishment system for correct, incorrect,
or missing software names, constructs a reward-punishment matrix, applies a weighting
system to assign the importance to different databases, and generates alerts for evaluation
and correction.
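A toy sketch of this reward-punishment aggregation is shown below; it is our illustration of the idea rather than the VERNIER tool, and the database list, weights, and +1/-1/0 scoring are assumptions chosen for demonstration.

```python
# Toy reward-punishment aggregation across VDBs (illustrative; database names,
# weights, and the +1/-1/0 scoring scheme are assumptions made for this example).
import numpy as np

databases = ["NVD", "CNNVD", "ExploitDB"]
weights = np.array([0.5, 0.3, 0.2])          # assumed importance of each database

# For one CVE, two candidate software names and how each database reports them:
# +1 reward when the database agrees, -1 punishment when it conflicts, 0 when missing.
candidates = ["apache http_server", "apache httpd"]
matrix = np.array([
    [+1, +1, 0],      # agrees with NVD and CNNVD, missing in ExploitDB
    [-1, +1, -1],     # conflicts with NVD and ExploitDB
])

scores = matrix @ weights                     # weighted reward-punishment score per name
for name, score in zip(candidates, scores):
    status = "OK" if score > 0 else "ALERT: likely wrong or inconsistent name"
    print(f"{name:20s} score={score:+.2f}  {status}")
```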
Li et al., 2018 [109]. Used method: RNN (BLSTM); Word2vec; Theano; Keras. Scope or ecosystem: IT. Limitations and challenges: Dependence on source code to detect vulnerabilities, while compiled programs remain a challenge; Applicability only in C/C++ and for one vulnerability type (library/API function calls); VulDeePecker does not provide control flow analysis, it only supports data flow analysis; Dependence on the quality of datasets used in model training; Converting code gadget variable-length vector representations into fixed-length vectors; The vulnerability detection results depend only on one model; No features to identify the reason behind false positive and negative results. Attributes: Datasets: NVD and SARD; Target programs; Code gadgets (vector). Human interaction (HI): Yes, especially in the learning phase when labeling code gadgets. Prioritization: No. Scanning mode: Passive.
Wareus et al., 2020 [10]. Used method: NER; BLSTM; CRF; CNN. Scope or ecosystem: IT. Limitations and challenges: Intensive processing power and time are needed to train models; The F-measure, recall, and precision indicate signs of overfitting, which requires further training and hyperparameter tuning of the used model; When dealing with multi-word labels, the model performs less well; Lexicon limitations affect the performance of the proposed model; Complex sentences or unseen words in CVE affect the context understanding (BLSTM and CRF); Dependency on the quality and quantity of NVD data (inconsistencies, errors, data lack, rare labels, exposure delay, amount of training data); Multi-word labels present more issues than single-word ones and affect the performance of the proposed model; A significant number of errors are produced, leading to incorrect predictions (both over- and under-predicting of labels). Attributes: Data: NVD CVE IDs and CPEs; CoNLL-2003 dataset for NER; CVE summary. Human interaction (HI): Yes, to handle errors in labeling activities. Prioritization: No. Scanning mode: Passive.
Table 6. Cont.
Huff et al., 2021 [114]. Used method: NLP: SpaCy and Word2Vec; Fuzzy matching: cosine similarity; ML: RF. Scope or ecosystem: IT. Limitations and challenges: Software naming conventions influence matching accuracy; Inventory and NVD discrepancies can affect fuzzy matching and NLP processes; Human confirmation of outcomes influences process flexibility; Large dependency on the quality of the training dataset; The system generates results with false positives and negatives; The performance might be affected in organizations of vast size; CVE without CPE metadata remains a significant data constraint. Attributes: NVD (CVE and CPE); Names of software packages installed within an organization; Dataset (https://github.com/pdhuff/cpe_recommender). Human interaction (HI): Yes, for reviewing the shortlisted candidate CPEs and confirming matches. Prioritization: No. Scanning mode: Passive.
Mihoub et al., 2022 [118]. Used method: MLP, RNN, LSTM, KNN, DT, RF. Scope or ecosystem: IoT. Limitations and challenges: Lack of temporal relationships between DOS and DDOS attacks in the dataset used; Significant time is required for the training and testing phases, which impacts quick detection. Attributes: Bot-IoT dataset [133]. Human interaction (HI): No. Prioritization: No. Scanning mode: Passive.
Sun et al., 2023 [11]. Used method: NER, RNN, LSTM, neural network. Scope or ecosystem: IT. Limitations and challenges: The tool’s efficacy is based on the data quality in the nine VDBs; The reward-punishment matrix may provide inaccurate or misleading outcomes; Computational intensiveness may influence the tool performance in a large-scale context; The tool may struggle with unclear or ambiguous cases when software names are not clear or general; The manual verification method may be both time-consuming and labor-intensive. Attributes: CVE IDs from NVD, CVE, CNNVD, CNVD, ExploitDB (EDB), SecurityFocus, Openwall, and SecurityFocus Forum. Human interaction (HI): Yes, to validate the alerts, use descriptions and data from all vulnerability databases. Prioritization: No. Scanning mode: Passive.
Table 6. Cont.
Sun et al., 2023 [119]. Used method: Regular expressions; ABI to encode or decode; SMT checker; BERT model; Classifier model; KL divergence; Maximization function ELBO(.); Measuring uncertainty H(.). Scope or ecosystem: Blockchain. Limitations and challenges: Model dependence on the quality and quantity of labeled data used for training; Accumulation of training errors due to incorrect labels when using semi-supervised learning; More time-consuming and less efficient in the active learning module; In practical applications, labeling all code data for vulnerability detection remains a complex activity; Possible complexity and computational resources in a large-scale environment. Attributes: Labeled source code; Unlabeled source code; Datasets: Smartbugs [134], SoliAudit [135], and SolidiFi [136]. Human interaction (HI): Yes, for manual labeling activities. Prioritization: No. Scanning mode: Passive.
Wel et al., 2024 [127]. Used method: VulEval: CodeBERT; CodeT5; UniXcoder; LLaMA; CodeLlama; GPT-3.5-turbo; GPT-3.5-instruct. Scope or ecosystem: IT. Limitations and challenges: Focus solely on C/C++ and not generalizing well to other programming languages; Dependence on predefined rules and patterns (time-consuming and labor-intensive); Quality of the dataset used; The complex nature and scope of a code project might impact the accuracy of inter-procedural vulnerability detection; The semantic-based approach is not very effective; Evaluation in software development environments; Challenges with compiled code versions. Attributes: Dataset: PRIMEVUL; Source-related target, file dependency, and repository. Human interaction (HI): Yes, for the input and to assess the output of the second task. Prioritization: Yes, for vulnerability-related dependency prediction. Scanning mode: Passive.
Tariq, 2024 [132]. Used method: GBM, Lasso regression. Scope or ecosystem: IIoT/ZephyrOS. Limitations and challenges: Issues in detecting modern ransomware altering its signature dynamically; GBM and Lasso regression can encounter compatibility issues with legacy systems; Training steps require extensive time to handle large datasets; Improper tuning of hyperparameters (overfitting) can influence the model detection capacities; Imbalanced datasets can affect the performance of the used model. Attributes: Datasets: RanSAP and IoT-23. Human interaction (HI): No. Prioritization: No. Scanning mode: Active.
multiple issues related to scalability and performance (cloud and distributed systems),
diversity of components (not limited to legacy systems, IoT devices, edge computing),
resource constraints (computational, storage, and battery resources), real-time detection
and response (financial and industrial control systems (ICS)), lack of standardization (CPE,
inconsistency vendor names, varying protocols), limited visibility in multi-tenant (cloud
environments), among others [147,148].
Potential Solutions. To bypass the previous issues, it is recommended to adopt real-
time monitoring solutions (like cloud-native monitoring, lightweight detection systems,
behavioral anomaly analysis), lightweight models and offloading computational tasks for
IoT devices (edge computing, firmware updates and patch automation, and protocol-aware
detection), integrate AI models of vulnerability detection associated with the continuous
training process, and federated learning for IoT devices and edge computing, among others,
to ease the automation of vulnerability detection and reduce its impact [149–152].
including one or multiple models like Convolutional Neural Networks (CNNs), Recur-
rent Neural Networks (RNNs), Long Short-Term Memory (LSTM), Feedforward Neural
Networks (FNN), Gated Recurrent Units (GRU), Variational Autoencoders (VAE), Graph
Neural Networks (GNN), Autoencoders (AEs), Deep Belief Networks (DBNs), Generative
Adversarial Networks (GANs), Deep Reinforcement Learning (DRL), among others. DL models can improve their performance if they are associated with another layer of Metaheuristic Algorithms (MAs) [158]. These MAs contribute to optimizing and tuning DL models and improving their effectiveness. This combination enhances cyberattack detection and response capabilities [159]. Moreover, many studies explored the advantages of AI/XAI
in cyberattacks detection. Proposed studies including these features are referenced in
“connected papers”, the third column of Table 7. Finding the ideal AI model with accurate
hyperparameters is still challenging and requires further study and real-world evaluation
to reduce hallucination issues and errors.
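To illustrate what such a metaheuristic tuning layer can look like in practice, the sketch below wraps a small neural classifier with a tiny genetic-style loop (selection and mutation over two hyperparameters). It is a generic toy example written for this review, not an implementation from the cited studies; the synthetic dataset, parameter ranges, and population size are arbitrary.

```python
# Toy metaheuristic tuning layer: a tiny genetic-style loop (selection + mutation)
# searching over two hyperparameters of a small neural classifier. Generic example
# for illustration; dataset, ranges, and population size are arbitrary choices.
import random
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.neural_network import MLPClassifier

X, y = make_classification(n_samples=300, n_features=20, random_state=0)

def fitness(genome):
    hidden_units, learning_rate = genome
    model = MLPClassifier(hidden_layer_sizes=(hidden_units,),
                          learning_rate_init=learning_rate,
                          max_iter=300, random_state=0)
    return cross_val_score(model, X, y, cv=3).mean()   # higher accuracy = fitter genome

def mutate(genome):
    hidden_units, learning_rate = genome
    return (max(4, hidden_units + random.choice([-8, 8])),
            min(0.1, max(1e-4, learning_rate * random.choice([0.5, 2.0]))))

random.seed(0)
population = [(random.choice([16, 32, 64]), random.choice([1e-3, 1e-2])) for _ in range(4)]
for _ in range(3):                                     # a few generations show the loop
    ranked = sorted(population, key=fitness, reverse=True)
    parents = ranked[:2]                               # selection: keep the fittest genomes
    population = parents + [mutate(random.choice(parents)) for _ in range(2)]

best = max(population, key=fitness)
print("best (hidden units, learning rate):", best, "cv accuracy:", round(fitness(best), 3))
```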
Table 7. Connected papers, from 2016 to 2024, related to vulnerability detection per approach.
Category: AI-based approach. Domain: Vulnerability detection based on Deep and Machine Learning. Features trend: Deep and Machine Learning (CNN, DNN, RNN, LSTM, BLSTM, FNN, VAE, GNN, AEs, GANs, DRL, RF, LR, DT, ETC, VC, BC, AC, GB, XB, GRU, DBN, MLP, K-fold Stacking Model (RF, GNB, KNN, SVM, GB, AdaLR, ADA, SVC, RFC, XAI)). Connected papers: [62,141,143,160–185].
Category: AI-based approach. Domain: Vulnerability detection based on OpenAI-Metaheuristic algorithms. Features trend: Large Language Models (LLM, GPT-2, GPT-3, GPT-3.5, GPT-4, Llama, PaLM2)-Metaheuristic algorithms (Genetic Algorithm (GA), Genetic Programming (GP), Particle Swarm Optimization (PSO), Teaching–Learning-Based Optimization (TLBO), among others). Connected papers: [186–208].
Category: Feature model-based approach. Domain: Vulnerability feature model-mapping, dependencies, and correlations of system components. Features trend: Cybersecurity knowledge base, reverse engineering, metamodel, Algorithms FM (SubFM/Vendor, SubFM/RC and SubFM/Tree), FaMaPy. Connected papers: [209–223].
Category: Graph-based approach. Domain: Vulnerability detection based on graph structure information related to target input and strengthened by certain AI techniques. Features trend: AST-PDG-CPG-Gremlin graph-EDG-Graph-based analytic-Graph traversal-Threat knowledge graph-GNN-SPG-LLM and AI model. Connected papers: [58,95,156,224–245].
Category: Matching-based approach. Domain: Vulnerability detection based on string-matching algorithms and AI models. Features trend: RE–Levenshtein edit distance–TF-IDF–Ratcliff/Obershelp–fuzzy matching; AST–Hash algorithms–Jaro–Winkler–GPT models. Connected papers: [246–260].
In terms of the feature model-based approach, the main objective is to utilize this
method in order to detect vulnerabilities. This is performed by providing a comprehensive
overview of the system components, which is presented in graphical and textual notations.
Additionally, it allows for the current dependencies and potential correlations between all
sub-elements of FM (CVE, product, RC, CPE, etc.) to be identified. This capability helps
identify and analyze the target system in depth, diagnose possible security flaws, and
mitigate cyber risks. For a more accurate FM, it is recommended to check the relevance of
the security data source and the asset inventory. It remains crucial to note that large config-
uration variability increases FM complexity, requiring human interaction for maintenance,
which is time-consuming and prone to errors.
Vulnerability Detection Approaches: Summary of Limits and Drawbacks Related to the Four Aforementioned Approaches.
A matching-based approach:
- VDBs, which store and publish all multiform security events, contain multiple issues related to inconsistent software products, missing metadata, especially CPE, and a lack of synchronization between the CPE dictionary and CPE/CVE;
- String-similarity algorithms generate errors during the matching process, which increases false positives and negatives;
- Asset inventories do not incorporate a complete CPE product list;
- Configuration variability and unstable product naming over time impact the accuracy of results;
- Zero-day vulnerabilities are still a rising crucial issue;
- Difficulty to perform a similarity matching between products having the same semantics but different syntax;
- The string-matching process is labor-intensive and computationally demanding;
- Inaccurate results when using GPT models.
A graph-based approach:
- Building an accurate graph to represent all slices of source code represents a challenge, specifically in a large complex ecosystem;
- Both the quality of the data for training the model and of the constructed graph are very important to avoid errors and under-exploitation;
- Certain techniques center their study on identifying weaknesses in a particular programming language;
- The graph-based approach could include an excessive amount of duplicate data unrelated to the vulnerabilities;
- The graph-based approach, which seeks to improve the brute-force method, is resource-intensive and has to be optimized in order to perform better;
- Some techniques do not employ steps to isolate the compromised network segment in order to prevent the spread of threats;
- The white-box analysis makes it more difficult to produce a precise model representation, particularly in ICS.
A feature model-based approach:
- Mapping cartography or discovered assets as a feature model input is labor-intensive and prone to errors;
- Errors occur when generating the global FM while assessing huge system configurations, given the extreme complexity and variability of system components;
- The relevancy of security events released by VDBs is essential to the FM’s accuracy;
- The accuracy of the FMs may be impacted by discrepancies and a lack of pertinent data;
- To keep the FM up to date, maintainability and real-time upgrades are necessary;
- Human intervention is necessary for asset scanning, process analysis, FM updates, and results exploitation.
An AI-based approach:
- The quality of the datasets continues to be a factor in how well AI models operate;
- There are challenges with the training process, as it is time-consuming and labor-intensive;
- Choosing the appropriate model for the context’s method still poses a challenge in order to lower the rate of false positives and negatives;
- Research elucidating the cause of false positive and negative outcomes is lacking;
- Incorrect predictions occur when some models deal with lexical features;
- The discrepancies and inconsistencies of published VDBs have an effect on the dataset quality;
- Using Large Language Models (LLMs) to accurately identify software vulnerabilities without generating false positives is still challenging;
- Not all vulnerabilities can be discovered by employing the methods that have been suggested.
Author Contributions: Conceptualization, K.B. and N.A.A.; methodology, K.B., N.A.A. and A.Z.F.;
software, K.B. and D.M.; validation, K.B., N.A.A., Y.E.B.E.I. and A.Z.F.; formal analysis, K.B., N.A.A.
and B.S.; investigation, K.B., N.A.A. and D.M.; resources, K.B. and B.S.; data curation, K.B. and
D.M.; writing—original draft preparation, K.B. and N.A.A.; writing—review and editing, K.B.,
N.A.A., A.Z.F. and D.M.; visualization, K.B. and N.A.A.; supervision, N.A.A. and Y.E.B.E.I.; project
administration, Y.E.B.E.I.; funding acquisition, B.S. All authors have read and agreed to the published
version of the manuscript.
Funding: This research received no external funding.
Data Availability Statement: The data that support the findings of this study are provided by ACG
Cybersecurity (https://acgcybersecurity.fr/, accessed on 5 August 2024).
Acknowledgments: We acknowledge the collaborative efforts of the Laboratory of Engineering
Sciences, National School of Applied Sciences, Ibn Tofail University, Kenitra 14000, Morocco, and
ACG Cybersecurity, for their contributions to the research design and data analysis. Special thanks to
the R&D teams for their insightful discussions and feedback throughout this research study.
Conflicts of Interest: The authors declare that they have no financial or personal conflicts of interest
that could have influenced the work reported in this manuscript. All authors provided materials and
contributed effectively to support this research without influencing this study.
References
1. Top Cybersecurity Statistics for 2024. Available online: https://www.cobalt.io/blog/cybersecurity-statistics-2024 (accessed on
21 July 2024).
2. Gartner Identifies Three Factors Influencing Growth in Security Spending. Available online: https://www.gartner.com/en/
newsroom/press-releases/2022-10-13-gartner-identifies-three-factors-influencing-growth-i (accessed on 18 April 2024).
3. Rossella, M.; Apostolos, M.; ENISA. Foresight Cybersecurity Threats for 2030–Update. Creat. Commons Attrib. 40 Int. CC 40 2024,
7–12. Available online: https://data.europa.eu/doi/10.2824/349493 (accessed on 31 July 2024).
4. Pochmara, J.; Świetlicka, A. Cybersecurity of Industrial Systems—A 2023 Report. Electronics 2024, 13, 1191. [CrossRef]
5. Ushakov, R.; Doynikova, E.; Novikova, E.; Kotenko, I. CPE and CVE Based Technique for Software Security Risk Assessment. In
Proceedings of the 2021 11th IEEE International Conference on Intelligent Data Acquisition and Advanced Computing Systems:
Technology and Applications (IDAACS), Cracow, Poland, 22–25 September 2021; IEEE: Piscataway, NJ, USA, 2021; pp. 353–356.
6. Kharat, P.P.; Chawan, P.M. Vulnerability Management System. Int. Res. J. Eng. Technol. 2022, 9, 976–981.
7. Computer Security Division, I.T.L. Security Content Automation Protocol|CSRC|CSRC. Available online: https://csrc.nist.gov/
projects/security-content-automation-protocol (accessed on 18 April 2024).
8. Vladimir, D. CPE Ontology. 2021. Available online: https://ceur-ws.org/Vol-2933/paper30.pdf (accessed on 31 July 2024).
9. Sanguino, L.A.B.; Uetz, R. Software Vulnerability Analysis Using CPE and CVE. arXiv 2017, arXiv:1705.05347.
10. Wåreus, E.; Hell, M. Automated CPE Labeling of CVE Summaries with Machine Learning. In Detection of Intrusions and Malware,
and Vulnerability Assessment; Maurice, C., Bilge, L., Stringhini, G., Neves, N., Eds.; Lecture Notes in Computer Science; Springer
International Publishing: Cham, Switzerland, 2020; Volume 12223, pp. 3–22, ISBN 978-3-030-52682-5.
11. Sun, H.; Ou, G.; Zheng, Z.; Liao, L.; Wang, H.; Zhang, Y. Inconsistent Measurement and Incorrect Detection of Software Names in
Security Vulnerability Reports. Comput. Secur. 2023, 135, 103477. [CrossRef]
12. Tranfield, D.; Denyer, D.; Smart, P. Towards a Methodology for Developing Evidence-Informed Management Knowledge by
Means of Systematic Review. Br. J. Manag. 2003, 14, 207–222. [CrossRef]
13. Swanson, M.; Hash, J.; Bowen, P. Guide for Developing Security Plans for Federal Information Systems; National Institute of Standards
and Technology: Gaithersburg, MD, USA, 2006; p. 47.
14. Newhouse, W. Multifactor Authentication for E-Commerce; National Institute of Standards and Technology: Gaithersburg, MD,
USA, 2019; p. 24.
15. ISO/IEC 27005; Information Security, Cybersecurity and Privacy Protection—Recommendations for the Management of Risks
Related to Information Security. ISO: Geneva, Switzerland, 2022.
16. Joint Task Force Transformation Initiative. Risk Management Framework for Information Systems and Organizations: A System Life
Cycle Approach for Security and Privacy; National Institute of Standards and Technology: Gaithersburg, MD, USA, 2018; pp. 21–23.
17. Isniah, S.; Hardi Purba, H.; Debora, F. Plan Do Check Action (PDCA) Method: Literature Review and Research Issues. J. Sist. Dan
Manaj. Ind. 2020, 4, 72–81. [CrossRef]
18. Joint Task Force Transformation Initiative. Guide for Conducting Risk Assessments; Department of Commerce, National Institute of
Standards and Technology: Gaithersburg, MD, USA, 2012; p. 53.
19. Stine, K.; Kissel, R.; Barker, W.C.; Fahlsing, J.; Gulick, J. Volume I: Guide for Mapping Types of Information and Information
Systems to Security Categories. Spec. Publ. 800-60 Revis. 1 2008, 1, 53. [CrossRef]
20. Ross, R.; Pillitteri, V.; Graubart, R.; Bodeau, D.; McQuaid, R. Developing Cyber-Resilient Systems: A Systems Security Engineering
Approach; National Institute of Standards and Technology (U.S.): Gaithersburg, MD, USA, 2021; pp. 17–18+91–92.
21. National Institute of Standards and Technology. Framework for Improving Critical Infrastructure Cybersecurity, Version 1.1; National
Institute of Standards and Technology: Gaithersburg, MD, USA, 2018. [CrossRef]
22. LeMay, E.; Scarfone, K.; Mell, P. The Common Misuse Scoring System (CMSS): Metrics for Software Feature Misuse Vulnerabilities;
National Institute of Standards and Technology: Gaithersburg, MD, USA, 2012; pp. 16–17+20.
23. Nieles, M.; Dempsey, K.; Pillitteri, V.Y. An Introduction to Information Security; National Institute of Standards and Technology:
Gaithersburg, MD, USA, 2017; pp. 12–13.
24. Cichonski, P.; Millar, T.; Grance, T.; Scarfone, K. Computer Security Incident Handling Guide: Recommendations of the National Institute
of Standards and Technology; National Institute of Standards and Technology: Gaithersburg, MD, USA, 2012; pp. 34–35.
25. Franklin, J.; Wergin, C.; Booth, H. CVSS Implementation Guidance; National Institute of Standards and Technology: Gaithersburg,
MD, USA, 2014; p. 16.
26. ISO/IEC 27001; Information Security, Cybersecurity and Privacy Protection—Information Security Management Systems–Requirements. ISO: Geneva, Switzerland, 2022.
27. ISO/IEC 27032; Cybersecurity—Guidelines for Internet Security. ISO: Geneva, Switzerland, 2023.
28. Johnson, C.S.; Badger, M.L.; Waltermire, D.A.; Snyder, J.; Skorupka, C. Guide to Cyber Threat Information Sharing; National Institute
of Standards and Technology: Gaithersburg, MD, USA, 2016; p. 10.
29. Dempsey, K.; Eavy, P.; Moore, G. Automation Support for Security Control Assessments. Volume 1: Overview; National Institute of
Standards and Technology: Gaithersburg, MD, USA, 2017; p. NIST IR 8011-1. [CrossRef]
30. Cheikes, B.A.; Waltermire, D.; Scarfone, K. Common Platform Enumeration: Naming Specification Version 2.3; National Institute of
Standards and Technology: Gaithersburg, MD, USA, 2011; p. NIST IR 7695. [CrossRef]
31. Waltermire, D.; Cichonski, P.; Scarfone, K. Common Platform Enumeration: Applicability Language Specification Version 2.3; National
Institute of Standards and Technology: Gaithersburg, MD, USA, 2011; p. NIST IR 7698. [CrossRef]
32. Phillips, A.; Davis, M. Tags for Identifying Languages; Internet Engineering Task Force: Fremont, CA, USA, 2009. [CrossRef]
33. CPE—Common Platform Enumeration: CPE Specifications. Available online: https://cpe.mitre.org/specification/ (accessed on
21 April 2024).
34. Solving Problems for a Safer World|MITRE. Available online: https://www.mitre.org/ (accessed on 13 July 2024).
35. Home Page|CISA. Available online: https://www.cisa.gov/ (accessed on 13 July 2024).
36. NVD–Home. Available online: https://nvd.nist.gov/ (accessed on 22 April 2024).
37. CWE–About CWE. Available online: https://cwe.mitre.org/about/index.html (accessed on 22 April 2024).
38. CVSS v4.0 Specification Document. Available online: https://www.first.org/cvss/specification-document (accessed on 20 April 2024).
39. Liu, Q.; Zhang, Y. VRSS: A New System for Rating and Scoring Vulnerabilities. Comput. Commun. 2011, 34, 264–273. [CrossRef]
40. Spanos, G.; Sioziou, A.; Angelis, L. WIVSS: A New Methodology for Scoring Information Systems Vulnerabilities. In Proceedings
of the 17th Panhellenic Conference on Informatics, Thessaloniki, Greece, 19–21 September 2013; ACM: New York, NY, USA, 2013;
pp. 83–90. [CrossRef]
41. Sharma, A.; Sabharwal, S.; Nagpal, S. A Hybrid Scoring System for Prioritization of Software Vulnerabilities. Comput. Secur. 2023,
129, 103256. [CrossRef]
42. Swanson, M.; Bowen, P.; Phillips, A.W.; Gallup, D.; Lynes, D. Contingency Planning Guide for Federal Information Systems; National
Institute of Standards and Technology: Gaithersburg, MD, USA, 2010; p. 144.
43. NIST SP 800-53 Rev. 5; Joint Task Force Interagency Working Group Security and Privacy Controls for Information Systems and
Organizations Revision 5. National Institute of Standards and Technology: Gaithersburg, MD, USA, 2020; 176–188+370.
44. GitHub: Let’s Build from Here. Available online: https://github.com/ (accessed on 8 July 2024).
45. Liu, B.; Shi, L.; Cai, Z.; Li, M. Software Vulnerability Discovery Techniques: A Survey. In Proceedings of the 2012 Fourth
International Conference on Multimedia Information Networking and Security, Nanjing, China, 2–4 November 2012; IEEE:
Piscataway, NJ, USA, 2012; pp. 152–156.
46. Gawron, M.; Cheng, F.; Meinel, C. PVD: Passive Vulnerability Detection. In Proceedings of the 2017 8th International Conference
on Information and Communication Systems (ICICS), Irbid, Jordan, 4–6 April 2017; IEEE: Piscataway, NJ, USA, 2017; pp. 322–327.
47. Na, S.; Kim, T.; Kim, H. Service Identification of Internet-Connected Devices Based on Common Platform Enumeration. J. Inf.
Process. Syst. 2018, 14, 740–750. [CrossRef]
48. Elbaz, C.; Rilling, L.; Morin, C. Automated Keyword Extraction from “One-Day” Vulnerabilities at Disclosure. In Proceedings of
the NOMS 2020—2020 IEEE/IFIP Network Operations and Management Symposium, Budapest, Hungary, 20–24 April 2020;
IEEE: Piscataway, NJ, USA, 2020; pp. 1–9.
49. Xu, Y.; Xu, Z.; Chen, B.; Song, F.; Liu, Y.; Liu, T. Patch Based Vulnerability Matching for Binary Programs. In Proceedings of the
29th ACM SIGSOFT International Symposium on Software Testing and Analysis, Virtual, 18–22 July 2020; ACM: New York, NY,
USA, 2020; pp. 376–387.
50. Zhao, Q.; Huang, C.; Dai, L. VULDEFF: Vulnerability Detection Method Based on Function Fingerprints and Code Differences.
Knowl.-Based Syst. 2023, 260, 110139. [CrossRef]
51. Kornblum, J. Identifying Almost Identical Files Using Context Triggered Piecewise Hashing. Digit. Investig. 2006, 3, 91–97.
[CrossRef]
52. McClanahan, K.; Li, Q. Towards Automatically Matching Security Advisories to CPEs: String Similarity-Based Vendor Matching.
In Proceedings of the IEEE International Conference on Computing, Networking and Communications (ICNC)-Workshop on
Computing, Networking and Communications, Big Island, HI, USA, 19–22 February 2024. [CrossRef]
53. McClanahan, K.; Elder, S.; Uwibambe, M.L.; Liu, Y.; Heng, R.; Li, Q. When ChatGPT Meets Vulnerability Management: The Good,
the Bad, and the Ugly. In Proceedings of the IEEE International Conference on Computing, Networking and Communications
(ICNC)-Workshop on Computing, Networking and Communications, Big Island, HI, USA, 19–22 February 2024. [CrossRef]
54. Gao, Z.; Zhang, C.; Liu, H.; Sun, W.; Tang, Z.; Jiang, L.; Chen, J.; Xie, Y. Faster and Better: Detecting Vulnerabilities in Linux-Based
IoT Firmware with Optimized Reaching Definition Analysis. In Proceedings of the 2024 Network and Distributed System Security
Symposium, San Diego, CA, USA, 26 February–1 March 2024; Internet Society: Reston, VA, USA, 2024. [CrossRef]
55. Wang, H.; Ye, G.; Tang, Z.; Tan, S.H.; Huang, S.; Fang, D.; Feng, Y.; Bian, L.; Wang, Z. Combining Graph-Based Learning with
Automated Data Collection for Code Vulnerability Detection. IEEE Trans. Inf. Forensics Secur. 2021, 16, 1943–1958. [CrossRef]
56. Zhou, J.; Cui, G.; Hu, S.; Zhang, Z.; Yang, C.; Liu, Z.; Wang, L.; Li, C.; Sun, M. Graph Neural Networks: A Review of Methods and
Applications. AI Open 2020, 1, 57–81. [CrossRef]
57. Noonan, R.E. An Algorithm for Generating Abstract Syntax Trees. Comput. Lang. 1985, 10, 225–236. [CrossRef]
58. Wen, X.-C.; Chen, Y.; Gao, C.; Zhang, H.; Zhang, J.M.; Liao, Q. Vulnerability Detection with Graph Simplification and Enhanced
Graph Representation Learning. In Proceedings of the 2023 IEEE/ACM 45th International Conference on Software Engineering
(ICSE), Melbourne, Australia, 17–19 May 2023; IEEE: Piscataway, NJ, USA, 2023; pp. 2275–2286.
59. Wu, Z.; Pan, S.; Chen, F.; Long, G.; Zhang, C. A Comprehensive Survey on Graph Neural Networks. IEEE Trans. Neural Netw.
Learn. Syst. 2019, 32, 4–24. [CrossRef]
60. Cho, K.; van Merrienboer, B.; Gulcehre, C.; Bahdanau, D.; Bougares, F.; Schwenk, H.; Bengio, Y. Learning Phrase Representations
Using RNN Encoder-Decoder for Statistical Machine Translation 2014. arXiv 2014, arXiv:1406.1078.
61. Zheng, W.; Jiang, Y.; Su, X. Vu1SPG: Vulnerability Detection Based on Slice Property Graph Representation Learning. In
Proceedings of the 2021 IEEE 32nd International Symposium on Software Reliability Engineering (ISSRE), Wuhan, China,
25–28 October 2021; IEEE: Piscataway, NJ, USA, 2021; pp. 457–467.
62. Li, Z.; Zou, D.; Xu, S.; Jin, H.; Zhu, Y.; Chen, Z. SySeVR: A Framework for Using Deep Learning to Detect Software Vulnerabilities.
IEEE Trans. Dependable Secur. Comput. 2022, 19, 2244–2258. [CrossRef]
63. Ferrante, J. The Program Dependence Graph and Its Use in Optimization. ACM Trans. Program. Lang. Syst. 1987, 9, 319–349.
[CrossRef]
64. Yamaguchi, F.; Golde, N.; Arp, D.; Rieck, K. Modeling and Discovering Vulnerabilities with Code Property Graphs. In Proceedings
of the 2014 IEEE Symposium on Security and Privacy, San Jose, CA, USA, 18–21 May 2014; IEEE: Piscataway, NJ, USA, 2014;
pp. 590–604.
65. Gensim: Topic Modelling for Humans. Available online: https://radimrehurek.com/gensim/models/word2vec.html (accessed
on 1 June 2024).
66. Schlichtkrull, M.; Kipf, T.N.; Bloem, P.; van den Berg, R.; Titov, I.; Welling, M. Modeling Relational Data with Graph Convolutional
Networks. In Proceedings of the Semantic Web: 15th International Conference, ESWC 2018, Heraklion, Crete, Greece, 3–7 June 2018.
[CrossRef]
67. Tovarnak, D.; Sadlek, L.; Celeda, P. Graph-Based CPE Matching for Identification of Vulnerable Asset Configurations. In
Proceedings of the 2021 IFIP/IEEE International Symposium on Integrated Network Management (IM), Virtual, 17–21 May 2021;
pp. 986–991.
68. Longueira-Romero, Á.; Iglesias, R.; Flores, J.L.; Garitano, I. A Novel Model for Vulnerability Analysis through Enhanced Directed
Graphs and Quantitative Metrics. Sensors 2022, 22, 2126. [CrossRef]
69. CAPEC—Common Attack Pattern Enumeration and Classification (CAPECTM). Available online: https://capec.mitre.org/
(accessed on 4 May 2024).
70. ISA/IEC 62443; Industrial Communication Networks—Network and System Security Series of Standards. ISA: Durham, NC,
USA, 2017.
71. Autonomy–Open-Source PLC Software. Available online: https://autonomylogic.com/ (accessed on 7 June 2024).
72. Alves, T. Thiagoralves/OpenPLC. Available online: https://github.com/thiagoralves/OpenPLC (accessed on 7 June 2024).
73. Alves, T. Thiagoralves/OpenPLC_v2. Available online: https://github.com/thiagoralves/OpenPLC_v2 (accessed on 7 June 2024).
74. Alves, T. Thiagoralves/OpenPLC_v3. Available online: https://github.com/thiagoralves/OpenPLC_v3 (accessed on 7 June 2024).
75. Husák, M.; Khoury, J.; Klisura, Ð.; Bou-Harb, E. On the Provision of Network-Wide Cyber Situational Awareness via Graph-Based
Analytics. In Complex Computational Ecosystems; Collet, P., Gardashova, L., El Zant, S., Abdulkarimova, U., Eds.; Lecture Notes in
Computer Science; Springer Nature Switzerland: Cham, Switzerland, 2023; Volume 13927, pp. 167–179, ISBN 978-3-031-44354-1.
76. Jajodia, S.; Liu, P.; Swarup, V.; Wang, C. Cyber Situational Awareness: Issues and Research; Springer Science & Business Media:
Berlin/Heidelberg, Germany, 2009; ISBN 978-1-4419-0140-8.
77. Jiang, C.; Coenen, F.; Zito, M. A Survey of Frequent Subgraph Mining Algorithms. Knowl. Eng. Rev. 2013, 28, 75–105. [CrossRef]
78. Brandes, U. A Faster Algorithm for Betweenness Centrality*. J. Math. Sociol. 2001, 25, 163–177. [CrossRef]
79. De, S.; Sodhi, R. A PMU Assisted Cyber Attack Resilient Framework against Power Systems Structural Vulnerabilities. Electr.
Power Syst. Res. 2022, 206, 107805. [CrossRef]
80. Shi, Z.; Matyunin, N.; Graffi, K.; Starobinski, D. Uncovering CWE-CVE-CPE Relations with Threat Knowledge Graphs. ACM
Trans. Priv. Secur. 2024, 27, 1–26. [CrossRef]
81. Bordes, A.; Usunier, N.; Garcia-Duran, A.; Weston, J.; Yakhnenko, O. Translating Embeddings for Modeling Multi-Relational Data.
Proc. 26th Int. Conf. Neural Inf. Process. Syst. 2013, 2, 2787–2795.
82. Trouillon, T.; Welbl, J.; Riedel, S.; Gaussier, É.; Bouchard, G. Complex Embeddings for Simple Link Prediction. In Proceedings of
the International Conference on Machine Learning, New York, NY, USA, 19–24 June 2016.
83. Yang, B.; Yih, W.; He, X.; Gao, J.; Deng, L. Embedding Entities and Relations for Learning and Inference in Knowledge Bases.
arXiv 2014. [CrossRef]
84. Lu, G.; Ju, X.; Chen, X.; Pei, W.; Cai, Z. GRACE: Empowering LLM-Based Software Vulnerability Detection with Graph Structure
and in-Context Learning. J. Syst. Softw. 2024, 212, 112031. [CrossRef]
85. Wu, Y.; Zou, D.; Dou, S.; Yang, W.; Xu, D.; Jin, H. VulCNN: An Image-Inspired Scalable Vulnerability Detection System. In
Proceedings of the 44th International Conference on Software Engineering, Pittsburgh, PA, USA, 21 May 2022; ACM: New York,
NY, USA, 2022; pp. 2365–2376.
86. Salayma, M. Threat Modelling in Internet of Things (IoT) Environments Using Dynamic Attack Graphs. Front. Internet Things
2024, 3, 1306465. [CrossRef]
87. Neo4j–Plateforme de Données de Graphes. Available online: https://neo4j.com/fr/ (accessed on 2 May 2024).
88. Project-Kb/MSR2019 at Main · SAP/Project-Kb. Available online: https://github.com/SAP/project-kb/tree/main/MSR2019
(accessed on 17 May 2024).
89. SecretPatch SecretPatch/Dataset. Available online: https://github.com/SecretPatch/Dataset (accessed on 17 May 2024).
90. NIST Software Assurance Reference Dataset. Available online: https://samate.nist.gov/SARD (accessed on 14 May 2024).
91. Wang, Y.; Wang, W.; Joty, S.; Hoi, S.C.H. CodeT5: Identifier-Aware Unified Pre-Trained Encoder-Decoder Models for Code
Understanding and Generation. arXiv 2021, arXiv:2109.00859.
92. Belkina, A.C.; Ciccolella, C.O.; Anno, R.; Halpert, R.; Spidlen, J.; Snyder-Cappione, J.E. Automated Optimized Parameters for
T-Distributed Stochastic Neighbor Embedding Improve Visualization and Analysis of Large Datasets. Nat. Commun. 2019,
10, 5415. [CrossRef]
93. Yang, G.; Chen, X.; Cao, J.; Xu, S.; Cui, Z.; Yu, C.; Liu, K. ComFormer: Code Comment Generation via Transformer and Fusion
Method-Based Hybrid Code Representation. In Proceedings of the 2021 8th International Conference on Dependable Systems
and Their Applications (DSA), Yinchuan, China, 11–12 September 2021. [CrossRef]
94. Chakraborty, S.; Krishna, R.; Ding, Y.; Ray, B. Deep Learning Based Vulnerability Detection: Are We There Yet? IEEE Trans. Softw.
Eng. 2022, 48, 3280–3296. [CrossRef]
95. Zhou, Y.; Liu, S.; Siow, J.; Du, X.; Liu, Y. Devign: Effective Vulnerability Identification by Learning Comprehensive Program
Semantics via Graph Neural Networks. Conf. Neural Inf. Process. Syst. 2019. [CrossRef]
96. Fan, J.; Li, Y.; Wang, S.; Nguyen, T.N. A C/C++ Code Vulnerability Dataset with Code Changes and CVE Summaries. In
Proceedings of the 17th International Conference on Mining Software Repositories, Seoul, Republic of Korea, 29 June 2020; ACM:
New York, NY, USA, 2020; pp. 508–512.
97. Batory, D.; Benavides, D.; Ruiz-Cortes, A. Automated Analysis of Feature Models. Commun. ACM 2006, 49, 45–47. [CrossRef]
98. Batory, D. Feature Models, Grammars, and Propositional Formulas. In Software Product Lines; Obbink, H., Pohl, K., Eds.; Lecture
Notes in Computer Science; Springer: Berlin/Heidelberg, Germany, 2005; Volume 3714, pp. 7–20, ISBN 978-3-540-28936-4.
99. Varela-Vaca, Á.J.; Gasca, R.M.; Ceballos, R.; Gómez-López, M.T.; Torres, P.B. CyberSPL: A Framework for the Verification of
Cybersecurity Policy Compliance of System Configurations Using Software Product Lines. Appl. Sci. 2019, 9, 5364. [CrossRef]
100. Galindo, J.A.; Benavides, D.; Trinidad, P.; Gutiérrez-Fernández, A.-M.; Ruiz-Cortés, A. Automated Analysis of Feature Models:
Quo Vadis? Computing 2019, 101, 387–433. [CrossRef]
101. Brailsford, S.C.; Potts, C.N.; Smith, B.M. Constraint Satisfaction Problems: Algorithms and Applications. Eur. J. Oper. Res. 1999,
119, 557–581. [CrossRef]
102. Prud’homme, C.; Fages, J.-G.; Lorca, X. Choco-Solver. Available online: https://choco-solver.org/ (accessed on 5 June 2024).
103. Benavides, D.; Trinidad, P.; Ruiz-Cortés, A.; Segura, S. FaMa. In Systems and Software Variability Management: Concepts,
Tools and Experiences; Capilla, R., Bosch, J., Kang, K.-C., Eds.; Springer: Berlin/Heidelberg, Germany, 2013; pp. 163–171,
ISBN 978-3-642-36583-6.
104. Kenner, A.; Dassow, S.; Lausberger, C.; Krüger, J.; Leich, T. Using Variability Modeling to Support Security Evaluations:
Virtualizing the Right Attack Scenarios. In Proceedings of the 14th International Working Conference on Variability Modelling of
Software-Intensive Systems, Magdeburg, Germany, 5 February 2020; ACM: New York, NY, USA, 2020; pp. 1–9.
105. Maynor, D. Metasploit Toolkit for Penetration Testing, Exploit Development, and Vulnerability Research; Maynor, D., Mookhey, K.K.,
Eds.; Syngress: Burlington, MA, USA, 2007; pp. vii–ix, ISBN 978-1-59749-074-0.
106. Varela-Vaca, Á.J.; Gasca, R.M.; Carmona-Fombella, J.A.; Gómez-López, M.T. AMADEUS: Towards the AutoMAteD secUrity teSting. In Proceedings of the 24th ACM Conference on Systems and Software Product Line, Montreal, QC, Canada, 19 October 2020;
ACM: New York, NY, USA, 2020; Volume A, pp. 1–12.
107. Varela-Vaca, Á.J.; Borrego, D.; Gómez-López, M.T.; Gasca, R.M.; Márquez, A.G. Feature Models to Boost the Vulnerability
Management Process. J. Syst. Softw. 2023, 195, 111541. [CrossRef]
108. Galindo, J.A.; Benavides, D. A Python Framework for the Automated Analysis of Feature Models: A First Step to Integrate
Community Efforts. In Proceedings of the 24th ACM International Systems and Software Product Line Conference, Montreal, QC,
Canada, 19 October 2020; ACM: New York, NY, USA, 2020; Volume B, pp. 52–55.
109. Li, Z.; Zou, D.; Xu, S.; Ou, X.; Jin, H.; Wang, S.; Deng, Z.; Zhong, Y. VulDeePecker: A Deep Learning-Based System for
Vulnerability Detection. In Proceedings of the 2018 Network and Distributed System Security Symposium, San Diego, CA, USA,
18–21 February 2018; Internet Society: Reston, VA, USA, 2018. [CrossRef]
110. Keras-Team/Keras. Available online: https://github.com/keras-team/keras (accessed on 1 June 2024).
111. Chiu, J.P.C.; Nichols, E. Named Entity Recognition with Bidirectional LSTM-CNNs. Trans. Assoc. Comput. Linguist. 2016, 4,
357–370. [CrossRef]
112. Sun, P.; Yang, X.; Zhao, X.; Wang, Z. An Overview of Named Entity Recognition. In Proceedings of the 2018 International
Conference on Asian Language Processing (IALP), Bandung, Indonesia, 15–17 November 2018; IEEE: Piscataway, NJ, USA, 2018;
pp. 273–278.
113. Hochreiter, S.; Schmidhuber, J. Long Short-Term Memory. Neural Comput. 1997, 9, 1735–1780. [CrossRef]
114. Huff, P.; McClanahan, K.; Le, T.; Li, Q. A Recommender System for Tracking Vulnerabilities. In Proceedings of the 16th
International Conference on Availability, Reliability and Security, Vienna, Austria, 17 August 2021; ACM: New York, NY, USA,
2021; pp. 1–7.
115. spaCy · Industrial-Strength Natural Language Processing in Python. Available online: https://spacy.io/ (accessed on 25 May 2024).
116. Rahutomo, F.; Kitasuka, T.; Aritsugi, M. Semantic Cosine Similarity. In Proceedings of the 7th International Student Conference
on Advanced Science and Technology ICAST, Seoul, Republic of Korea, 29–30 October 2012.
117. Kwak, B.I.; Han, M.L.; Kim, H.K. Cosine Similarity Based Anomaly Detection Methodology for the CAN Bus. Expert Syst. Appl.
2021, 166, 114066. [CrossRef]
118. Mihoub, A.; Fredj, O.B.; Cheikhrouhou, O.; Derhab, A.; Krichen, M. Denial of Service Attack Detection and Mitigation for Internet
of Things Using Looking-Back-Enabled Machine Learning Techniques. Comput. Electr. Eng. 2022, 98, 107716. [CrossRef]
119. Qu, Y.; Uddin, M.P.; Gan, C.; Xiang, Y.; Gao, L.; Yearwood, J. Blockchain-Enabled Federated Learning: A Survey. ACM Comput.
Surv. 2023, 55, 1–35. [CrossRef]
120. Torres, C.F.; Iannillo, A.K.; Gervais, A.; State, R. The Eye of Horus: Spotting and Analyzing Attacks on Ethereum Smart Contracts.
In Proceedings of the International Conference on Financial Cryptography and Data Security, Virtual, 15 January 2021. [CrossRef]
121. Sun, X.; Tu, L.; Zhang, J.; Cai, J.; Li, B.; Wang, Y. ASSBert: Active and Semi-Supervised Bert for Smart Contract Vulnerability
Detection. J. Inf. Secur. Appl. 2023, 73, 103423. [CrossRef]
122. Huang, S.; Jin, R.; Zhou, Z. Active Learning by Querying Informative and Representative Examples. Adv. Neural Inf. Process. Syst.
2010, 23. [CrossRef] [PubMed]
123. Taherkhani, F.; Kazemi, H.; Nasrabadi, N.M. Matrix Completion for Graph-Based Deep Semi-Supervised Learning. In Proceedings
of the Thirty-Third AAAI Conference on Artificial Intelligence, Honolulu, HI, USA, 27 January–1 February 2019. [CrossRef]
124. Arazo, E.; Ortego, D.; Albert, P.; O’Connor, N.E.; McGuinness, K. Pseudo-Labeling and Confirmation Bias in Deep Semi-
Supervised Learning. In Proceedings of the 2020 International Joint Conference on Neural Networks (IJCNN), Glasgow, UK, 19–24 July 2020; IEEE: Piscataway, NJ, USA, 2020; pp. 1–8.
125. Yalniz, I.Z.; Jégou, H.; Chen, K.; Paluri, M.; Mahajan, D. Billion-Scale Semi-Supervised Learning for Image Classification. arXiv
2019, arXiv:1905.00546.
126. Devlin, J.; Chang, M.-W.; Lee, K.; Toutanova, K. BERT: Pre-Training of Deep Bidirectional Transformers for Language Understanding. arXiv 2018, arXiv:1810.04805.
127. Wen, X.-C.; Wang, X.; Chen, Y.; Hu, R.; Lo, D.; Gao, C. VulEval: Towards Repository-Level Evaluation of Software Vulnerability
Detection. arXiv 2024, arXiv:2404.15596.
128. Hou, X.; Zhao, Y.; Liu, Y.; Yang, Z.; Wang, K.; Li, L.; Luo, X.; Lo, D.; Grundy, J.; Wang, H. Large Language Models for Software
Engineering: A Systematic Literature Review. arXiv 2023, arXiv:2308.10620v6. [CrossRef]
129. Touvron, H.; Lavril, T.; Izacard, G.; Martinet, X.; Lachaux, M.-A.; Lacroix, T.; Rozière, B.; Goyal, N.; Hambro, E.; Azhar, F.; et al.
LLaMA: Open and Efficient Foundation Language Models. arXiv 2023, arXiv:2302.13971.
130. Rozière, B.; Gehring, J.; Gloeckle, F.; Sootla, S.; Gat, I.; Tan, X.E.; Adi, Y.; Liu, J.; Sauvestre, R.; Remez, T.; et al. Code Llama: Open
Foundation Models for Code. arXiv 2023, arXiv:2308.12950.
131. ChatGPT. Available online: https://chatgpt.com (accessed on 2 June 2024).
132. Tariq, U. Combatting Ransomware in ZephyrOS-Activated Industrial IoT Environments. Heliyon 2024, 10, e29917. [CrossRef]
[PubMed]
133. Koroniotis, N.; Moustafa, N.; Sitnikova, E.; Turnbull, B. Towards the Development of Realistic Botnet Dataset in the Internet of
Things for Network Forensic Analytics: Bot-IoT Dataset. Future Gener. Comput. Syst. 2019, 100, 779–796. [CrossRef]
134. Durieux, T.; Ferreira, J.F.; Abreu, R.; Cruz, P. Empirical Review of Automated Analysis Tools on 47,587 Ethereum Smart Contracts.
In Proceedings of the ACM/IEEE 42nd International Conference on Software Engineering, Seoul, Republic of Korea, 27 June 2020;
ACM: New York, NY, USA, 2020; pp. 530–541.
135. SoliAudit VA Dataset. Available online: https://docs.google.com/spreadsheets/u/1/d/17QxTGZA7xNifAV8bQ2A2dJWRRHcmPp3QgPNxwptT9Zw/edit?pli=1&usp=embed_facebook (accessed on 29 May 2024).
136. Ghaleb, A.; Pattabiraman, K. How Effective Are Smart Contract Analysis Tools? Evaluating Smart Contract Static Analysis Tools
Using Bug Injection. In Proceedings of the 29th ACM SIGSOFT International Symposium on Software Testing and Analysis,
Virtual, 18 July 2020; ACM: New York, NY, USA, 2020; pp. 415–427.
137. Abdullahi, M.; Baashar, Y.; Alhussian, H.; Alwadain, A.; Aziz, N.; Capretz, L.F.; Abdulkadir, S.J. Detecting Cybersecurity Attacks
in Internet of Things Using Artificial Intelligence Methods: A Systematic Literature Review. Electronics 2022, 11, 198. [CrossRef]
138. Amoo, O.O.; Osasona, F.; Atadoga, A.; Ayinla, B.S.; Farayola, O.A.; Abrahams, T.O. Cybersecurity Threats in the Age of IoT: A
Review of Protective Measures. Int. J. Sci. Res. Arch. 2024, 11, 1304–1310. [CrossRef]
139. Ahmad, W.; Rasool, A.; Javed, A.R.; Baker, T.; Jalil, Z. Cyber Security in IoT-Based Cloud Computing: A Comprehensive Survey.
Electronics 2021, 11, 16. [CrossRef]
140. Buda, M.; Maki, A.; Mazurowski, M.A. A Systematic Study of the Class Imbalance Problem in Convolutional Neural Networks.
Neural Netw. 2018, 106, 249–259. [CrossRef]
141. Senanayake, J.; Kalutarage, H.; Al-Kadri, M.O.; Piras, L.; Petrovski, A. Labelled Vulnerability Dataset on Android Source Code
(LVDAndro) to Develop AI-Based Code Vulnerability Detection Models. In Proceedings of the 20th International Conference on
Security and Cryptography, Rome, Italy, 10–12 July 2023; SCITEPRESS—Science and Technology Publications: Setúbal, Portugal,
2023; pp. 659–666.
142. Rezaeibagha, F.; Mu, Y.; Huang, K.; Chen, L. Secure and Efficient Data Aggregation for IoT Monitoring Systems. IEEE Internet
Things J. 2021, 8, 8056–8063. [CrossRef]
143. Pinconschi, E.; Reis, S.; Zhang, C.; Abreu, R.; Erdogmus, H.; Păsăreanu, C.S.; Jia, L. Tenet: A Flexible Framework for
Machine-Learning-Based Vulnerability Detection. In Proceedings of the 2023 IEEE/ACM 2nd International Conference on
AI Engineering–Software Engineering for AI (CAIN), Melbourne, Australia, 15–16 May 2023; IEEE: Piscataway, NJ, USA, 2023;
pp. 102–103.
144. Stellios, I.; Kotzanikolaou, P.; Psarakis, M. Advanced Persistent Threats and Zero-Day Exploits in Industrial Internet of Things. In
Security and Privacy Trends in the Industrial Internet of Things; Alcaraz, C., Ed.; Advanced Sciences and Technologies for Security
Applications; Springer International Publishing: Cham, Switzerland, 2019; pp. 47–68, ISBN 978-3-030-12329-1.
145. Singh, S.; Sharma, P.K.; Moon, S.Y.; Moon, D.; Park, J.H. A Comprehensive Study on APT Attacks and Countermeasures for
Future Networks and Communications: Challenges and Solutions. J. Supercomput. 2019, 75, 4543–4574. [CrossRef]
146. Admass, W.S.; Munaye, Y.Y.; Diro, A.A. Cyber Security: State of the Art, Challenges and Future Directions. Cyber Secur. Appl.
2024, 2, 100031. [CrossRef]
147. Maglaras, L.; Janicke, H.; Ferrag, M.A. Cybersecurity of Critical Infrastructures: Challenges and Solutions. Sensors 2022, 22, 5105.
[CrossRef]
148. Djenna, A.; Harous, S.; Saidouni, D.E. Internet of Things Meet Internet of Threats: New Concern Cyber Security Issues of Critical
Cyber Infrastructure. Appl. Sci. 2021, 11, 4580. [CrossRef]
149. Soe, Y.N.; Feng, Y.; Santosa, P.I.; Hartanto, R.; Sakurai, K. Towards a Lightweight Detection System for Cyber Attacks in the IoT
Environment Using Corresponding Features. Electronics 2020, 9, 144. [CrossRef]
150. Long, Z.; Yan, H.; Shen, G.; Zhang, X.; He, H.; Cheng, L. A Transformer-Based Network Intrusion Detection Approach for Cloud
Security. J. Cloud Comput. 2024, 13, 5. [CrossRef]
151. Jameil, A.K.; Al-Raweshidy, H. AI-Enabled Healthcare and Enhanced Computational Resource Management With Digital Twins
Into Task Offloading Strategies. IEEE Access 2024, 12, 90353–90370. [CrossRef]
152. Li, T.; Sahu, A.K.; Talwalkar, A.; Smith, V. Federated Learning: Challenges, Methods, and Future Directions. IEEE Signal Process.
Mag. 2020, 37, 50–60. [CrossRef]
153. Okoli, U.I.; Obi, O.C.; Adewusi, A.O.; Abrahams, T.O. Machine Learning in Cybersecurity: A Review of Threat Detection and
Defense Mechanisms. World J. Adv. Res. Rev. 2024, 21, 2286–2295. [CrossRef]
154. Salem, A.H.; Azzam, S.M.; Emam, O.E.; Abohany, A.A. Advancing Cybersecurity: A Comprehensive Review of AI-Driven
Detection Techniques. J. Big Data 2024, 11, 105. [CrossRef]
155. Denz, R.; Taylor, S. A Survey on Securing the Virtual Cloud. J. Cloud Comput. Adv. Syst. Appl. 2013, 2, 17. [CrossRef]
156. Guo, W.; Fang, Y.; Huang, C.; Ou, H.; Lin, C.; Guo, Y. HyVulDect: A Hybrid Semantic Vulnerability Mining System Based on
Graph Neural Network. Comput. Secur. 2022, 121, 102823. [CrossRef]
157. Taghavi, S.M.; Feyzi, F. Using Large Language Models to Better Detect and Handle Software Vulnerabilities and Cyber Security Threats, CC BY 4.0 License. 2024. Available online: https://www.researchgate.net/publication/380772943_Using_Large_Language_Models_to_Better_Detect_and_Handle_Software_Vulnerabilities_and_Cyber_Security_Threats (accessed on 31 July 2024). [CrossRef]
158. Dokeroglu, T.; Sevinc, E.; Kucukyilmaz, T.; Cosar, A. A Survey on New Generation Metaheuristic Algorithms. Comput. Ind. Eng.
2019, 137, 106040. [CrossRef]
159. Rajwar, K.; Deep, K.; Das, S. An Exhaustive Review of the Metaheuristic Algorithms for Search and Optimization: Taxonomy,
Applications, and Open Challenges. Artif. Intell. Rev. 2023, 56, 13187–13257. [CrossRef] [PubMed]
160. Nong, Y.; Sharma, R.; Hamou-Lhadj, A.; Luo, X.; Cai, H. Open Science in Software Engineering: A Study on Deep Learning-Based
Vulnerability Detection. IEEE Trans. Softw. Eng. 2023, 49, 1983–2005. [CrossRef]
161. Chen, Y.; Ding, Z.; Alowain, L.; Chen, X.; Wagner, D. DiverseVul: A New Vulnerable Source Code Dataset for Deep Learning
Based Vulnerability Detection. In Proceedings of the 26th International Symposium on Research in Attacks, Intrusions and
Defenses, Hong Kong, China, 16 October 2023; ACM: New York, NY, USA, 2023; pp. 654–668.
162. Yang, X.; Wang, S.; Li, Y.; Wang, S. Does Data Sampling Improve Deep Learning-Based Vulnerability Detection? Yeas! And Nays!
In Proceedings of the 2023 IEEE/ACM 45th International Conference on Software Engineering (ICSE), Melbourne, Australia,
14–20 May 2023; IEEE: Piscataway, NJ, USA, 2023; pp. 2287–2298.
163. Nie, X.; Li, N.; Wang, K.; Wang, S.; Luo, X.; Wang, H. Understanding and Tackling Label Errors in Deep Learning-Based
Vulnerability Detection (Experience Paper). In Proceedings of the 32nd ACM SIGSOFT International Symposium on Software
Testing and Analysis, Seattle, WA, USA, 12 July 2023; ACM: New York, NY, USA, 2023; pp. 52–63.
164. Tang, W.; Tang, M.; Ban, M.; Zhao, Z.; Feng, M. CSGVD: A Deep Learning Approach Combining Sequence and Graph Embedding
for Source Code Vulnerability Detection. J. Syst. Softw. 2023, 199, 111623. [CrossRef]
165. Liu, Z.; Jiang, M.; Zhang, S.; Zhang, J.; Liu, Y. A Smart Contract Vulnerability Detection Mechanism Based on Deep Learning and
Expert Rules. IEEE Access 2023, 11, 77990–77999. [CrossRef]
166. Yuan, B.; Lu, Y.; Fang, Y.; Wu, Y.; Zou, D.; Li, Z.; Li, Z.; Jin, H. Enhancing Deep Learning-Based Vulnerability Detection by
Building Behavior Graph Model. In Proceedings of the 2023 IEEE/ACM 45th International Conference on Software Engineering
(ICSE), Melbourne, Australia, 14–20 May 2023; IEEE: Piscataway, NJ, USA, 2023; pp. 2262–2274.
167. Harzevili, N.S.; Belle, A.B.; Wang, J.; Wang, S.; Ming, Z.; Nagappan, N. A Survey on Automated Software Vulnerability Detection
Using Machine Learning and Deep Learning. arXiv 2023. [CrossRef]
168. Steenhoek, B.; Rahman, M.M.; Jiles, R.; Le, W. An Empirical Study of Deep Learning Models for Vulnerability Detection. In
Proceedings of the 2023 IEEE/ACM 45th International Conference on Software Engineering (ICSE), Melbourne, Australia,
17–19 May 2023. [CrossRef]
169. Yuan, Y.; Xie, T. SVChecker: A Deep Learning-Based System for Smart Contract Vulnerability Detection. In Proceedings of the
International Conference on Computer Application and Information Security (ICCAIS 2021), Wuhan, China, 25 May 2022; Lu, Y.,
Cheng, C., Eds.; SPIE: Bellingham, WA, USA, 2022; p. 99.
170. Hussan, B.K.; Rashid, Z.N.; Zeebaree, S.R.M.; Zebari, R.R. Optimal Deep Belief Network Enabled Vulnerability Detection on
Smart Environment. J. Smart Internet Things 2022, 2022, 146–162. [CrossRef]
171. Russell, R.L.; Kim, L.; Hamilton, L.H.; Lazovich, T.; Harer, J.A.; Ozdemir, O.; Ellingwood, P.M.; McConley, M.W. Automated
Vulnerability Detection in Source Code Using Deep Representation Learning. In Proceedings of the 2018 17th IEEE International
Conference on Machine Learning and Applications (ICMLA), Orlando, FL, USA, 17–20 December 2018. [CrossRef]
172. Zhou, Y.; Sharma, A. Automated Identification of Security Issues from Commit Messages and Bug Reports. In Proceedings of the
2017 11th Joint Meeting on Foundations of Software Engineering, Paderborn, Germany, 21 August 2017; ACM: New York, NY,
USA, 2017; pp. 914–919.
173. Russo, E.R.; Di Sorbo, A.; Visaggio, C.A.; Canfora, G. Summarizing Vulnerabilities’ Descriptions to Support Experts during
Vulnerability Assessment Activities. J. Syst. Softw. 2019, 156, 84–99. [CrossRef]
174. Li, Y.; Wang, S.; Nguyen, T.N. Vulnerability Detection with Fine-Grained Interpretations. In Proceedings of the 29th ACM Joint
Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering, Athens,
Greece, 20 August 2021; ACM: New York, NY, USA, 2021; pp. 292–303.
175. Li, D.; Liu, Y.; Huang, J. Assessment of Software Vulnerability Contributing Factors by Model-Agnostic Explainable AI. Mach.
Learn. Knowl. Extr. 2024, 6, 1087–1113. [CrossRef]
176. Zhang, F.; Huff, P.; McClanahan, K.; Li, Q. A Machine Learning-Based Approach for Automated Vulnerability Remediation
Analysis. In Proceedings of the 2020 IEEE Conference on Communications and Network Security (CNS), Avignon, France,
29 June–1 July 2020; IEEE: Piscataway, NJ, USA, 2020; pp. 1–9.
177. Hassan, M.M.; Ahmad, R.B.; Ghosh, T. SQL Injection Vulnerability Detection Using Deep Learning: A Feature-Based Approach.
Indones. J. Electr. Eng. Inform. IJEEI 2021, 9, 702–718. [CrossRef]
178. Hu, L.; Chang, J.; Chen, Z.; Hou, B. Web Application Vulnerability Detection Method Based on Machine Learning. J. Phys. Conf.
Ser. 2021, 1827, 012061. [CrossRef]
179. Cao, Y.; Zhang, L.; Zhao, X.; Jin, K.; Chen, Z. An Intrusion Detection Method for Industrial Control System Based on Machine
Learning. Information 2022, 13, 322. [CrossRef]
180. Hulayyil, S.B.; Li, S.; Xu, L. Machine-Learning-Based Vulnerability Detection and Classification in Internet of Things Device
Security. Electronics 2023, 12, 3927. [CrossRef]
181. Shaukat, K.; Luo, S.; Chen, S.; Liu, D. Cyber Threat Detection Using Machine Learning Techniques: A Performance Evaluation
Perspective. In Proceedings of the 2020 International Conference on Cyber Warfare and Security (ICCWS), Islamabad, Pakistan,
20 October 2020; IEEE: Piscataway, NJ, USA, 2020; pp. 1–6.
182. Abdusalomov, A.; Kilichev, D.; Nasimov, R.; Rakhmatullayev, I.; Im Cho, Y. Optimizing Smart Home Intrusion Detection with
Harmony-Enhanced Extra Trees. IEEE Access 2024, 12, 117761–117786. [CrossRef]
183. Gawand, S.P.; Kumar, M.S. A Comparative Study of Cyber Attack Detection & Prediction Using Machine Learning Algorithms.
Preprint 2023. [CrossRef]
184. Azhagiri, M.; Rajesh, A.; Karthik, S.; Raja, K. An Intrusion Detection System Using Ranked Feature Bagging. Int. J. Inf. Technol.
2023, 16, 1213–1219. [CrossRef]
185. Rodriguez, E.; Otero, B.; Gutierrez, N.; Canal, R. A Survey of Deep Learning Techniques for Cybersecurity in Mobile Networks.
IEEE Commun. Surv. Tutor. 2021, 23, 1920–1955. [CrossRef]
186. Boi, B.; Esposito, C.; Lee, S. VulnHunt-GPT: A Smart Contract Vulnerabilities Detector Based on OpenAI chatGPT. In Proceedings
of the 39th ACM/SIGAPP Symposium on Applied Computing, Avila, Spain, 8 April 2024; ACM: New York, NY, USA, 2024;
pp. 1517–1524.
187. Ding, Y.; Fu, Y.; Ibrahim, O.; Sitawarin, C.; Chen, X.; Alomair, B.; Wagner, D.; Ray, B.; Chen, Y. Vulnerability Detection with Code
Language Models: How Far Are We? arXiv 2024. [CrossRef]
188. Zhou, X.; Cao, S.; Sun, X.; Lo, D. Large Language Model for Vulnerability Detection and Repair: Literature Review and the Road
Ahead. arXiv 2024, arXiv:2404.02525.
189. Xu, H.; Wang, S.; Li, N.; Wang, K.; Zhao, Y.; Chen, K.; Yu, T.; Liu, Y.; Wang, H. Large Language Models for Cyber Security: A
Systematic Literature Review. arXiv 2024, arXiv:2405.04760.
190. Yin, X.; Ni, C.; Wang, S. Multitask-Based Evaluation of Open-Source LLM on Software Vulnerability. arXiv 2024, arXiv:2404.02056.
191. Steenhoek, B.; Rahman, M.M.; Roy, M.K.; Alam, M.S.; Barr, E.T.; Le, W. A Comprehensive Study of the Capabilities of Large
Language Models for Vulnerability Detection. arXiv 2024, arXiv:2403.17218.
192. Li, Z.; Dutta, S.; Naik, M. LLM-Assisted Static Analysis for Detecting Security Vulnerabilities. arXiv 2024, arXiv:2405.17238.
193. Fang, R.; Bindu, R.; Gupta, A.; Kang, D. LLM Agents Can Autonomously Exploit One-Day Vulnerabilities. arXiv 2024,
arXiv:2404.08144.
194. Zhou, X.; Zhang, T.; Lo, D. Large Language Model for Vulnerability Detection: Emerging Results and Future Directions. In
Proceedings of the 2024 ACM/IEEE 44th International Conference on Software Engineering: New Ideas and Emerging Results,
Lisbon, Portugal, 14 April 2024; ACM: New York, NY, USA, 2024; pp. 47–51.
195. Sun, Y.; Wu, D.; Xue, Y.; Liu, H.; Ma, W.; Zhang, L.; Shi, M.; Liu, Y. LLM4Vuln: A Unified Evaluation Framework for Decoupling
and Enhancing LLMs’ Vulnerability Reasoning. arXiv 2024, arXiv:2401.16185.
196. Tóth, R.; Bisztray, T.; Erdodi, L. LLMs in Web Development: Evaluating LLM-Generated PHP Code Unveiling Vulnerabilities and
Limitations. In Proceedings of the International Conference on Computer Safety, Reliability, and Security, Florence, Italy, 17–20
September 2024. [CrossRef]
197. Ullah, S.; Han, M.; Pujar, S.; Pearce, H.; Coskun, A.; Stringhini, G. LLMs Cannot Reliably Identify and Reason About Security Vulnerabilities (Yet?): A Comprehensive Evaluation, Framework, and Benchmarks. In Proceedings of the IEEE Symposium on Security and Privacy, San Francisco, CA, USA, 20–22 May 2024. [CrossRef]
198. Yao, Y.; Duan, J.; Xu, K.; Cai, Y.; Sun, Z.; Zhang, Y. A Survey on Large Language Model (LLM) Security and Privacy: The Good,
The Bad, and The Ugly. High-Confid. Comput. 2024, 4, 100211. [CrossRef]
199. Mathews, N.S.; Brus, Y.; Aafer, Y.; Nagappan, M.; McIntosh, S. LLbezpeky: Leveraging Large Language Models for Vulnerability
Detection. arXiv 2024, arXiv:2401.01269.
200. Shestov, A.; Levichev, R.; Mussabayev, R.; Maslov, E.; Cheshkov, A.; Zadorozhny, P. Finetuning Large Language Models for
Vulnerability Detection. arXiv 2024, arXiv:2401.17010.
201. Sun, Y.; Wu, D.; Xue, Y.; Liu, H.; Wang, H.; Xu, Z.; Xie, X.; Liu, Y. GPTScan: Detecting Logic Vulnerabilities in Smart Contracts
by Combining GPT with Program Analysis. In Proceedings of the IEEE/ACM 46th International Conference on Software
Engineering, Lisbon, Portugal, 12 April 2024; ACM: New York, NY, USA, 2024; pp. 1–13.
202. Jones, A.; Omar, M. Codesentry: Revolutionizing Real-Time Software Vulnerability Detection With Optimized GPT Framework.
Land Forces Acad. Rev. 2024, 29, 98–107. [CrossRef]
203. Ferrag, M.A.; Alwahedi, F.; Battah, A.; Cherif, B.; Mechri, A.; Tihanyi, N. Generative AI and Large Language Models for Cyber
Security: All Insights You Need. arXiv 2024, arXiv:2405.12750.
204. Manjunatha, A.; Kota, K.; Babu, A.S. CVE Severity Prediction from Vulnerability Description—A Deep Learning Approach.
Procedia Comput. Sci. 2024, 235, 3105–3117. [CrossRef]
205. Rawte, V.; Tonmoy, S.M.T.I.; Rajbangshi, K.; Nag, S.; Chadha, A.; Sheth, A.P.; Das, A. FACTOID: FACtual enTailment fOr
hallucInation Detection. arXiv 2024, arXiv:2403.19113.
206. Agrawal, P.; Abutarboush, H.F.; Ganesh, T.; Mohamed, A.W. Metaheuristic Algorithms on Feature Selection: A Survey of One
Decade of Research (2009–2019). IEEE Access 2021, 9, 26766–26791. [CrossRef]
207. Zeinalpour, A.; McElroy, C.P. Comparing Metaheuristic Search Techniques in Addressing the Effectiveness of Clustering-Based
DDoS Attack Detection Methods. Electronics 2024, 13, 899. [CrossRef]
208. Thomas, M.; Meshram, B.B. DoS Attack Detection Using Aquila Deer Hunting Optimization Enabled Deep Belief Network. Int. J.
Web Inf. Syst. 2024, 20, 66–87. [CrossRef]
209. Syed, R. Cybersecurity Vulnerability Management: A Conceptual Ontology and Cyber Intelligence Alert System. Inf. Manag.
2020, 57, 103334. [CrossRef]
210. Jia, Y.; Qi, Y.; Shang, H.; Jiang, R.; Li, A. A Practical Approach to Constructing a Knowledge Graph for Cybersecurity. Engineering
2018, 4, 53–60. [CrossRef]
211. Martínez, S.; Cosentino, V.; Cabot, J. Model-Based Analysis of Java EE Web Security Misconfigurations. Comput. Lang. Syst. Struct.
2017, 49, 36–61. [CrossRef]
212. Seidl, C.; Winkelmann, T.; Schaefer, I. A Software Product Line of Feature Modeling Notations and Cross-Tree Constraint
Languages. 2016, pp. 157–172. Available online: https://dl.gi.de/items/758130c0-32b3-485e-8d9d-04e1e1f94a8f (accessed on
21 July 2024).
213. Sawyer, P.; Mazo, R.; Diaz, D.; Salinesi, C.; Hughes, D. Using Constraint Programming to Manage Configurations in Self-Adaptive
Systems. Computer 2012, 45, 56–63. [CrossRef]
214. Felfernig, A.; Walter, R.; Galindo, J.A.; Benavides, D.; Erdeniz, S.P.; Atas, M.; Reiterer, S. Anytime Diagnosis for Reconfiguration. J.
Intell. Inf. Syst. 2018, 51, 161–182. [CrossRef]
215. Varela-Vaca, Á.J.; Galindo, J.A.; Ramos-Gutiérrez, B.; Gómez-López, M.T.; Benavides, D. Process Mining to Unleash Variability
Management: Discovering Configuration Workflows Using Logs. In Proceedings of the 23rd International Systems and Software
Product Line Conference, Paris, France, 9 September 2019; ACM: New York, NY, USA, 2019; Volume A, pp. 265–276.
216. Costa, G.; Merlo, A.; Verderame, L.; Armando, A. Automatic Security Verification of Mobile App Configurations. Future Gener.
Comput. Syst. 2018, 80, 519–536. [CrossRef]
217. Murthy, P.V.R.; Shilpa, R.G. Vulnerability Coverage Criteria for Security Testing of Web Applications. In Proceedings of the
2018 International Conference on Advances in Computing, Communications and Informatics (ICACCI), Bangalore, India,
19–22 September 2018; IEEE: Piscataway, NJ, USA, 2018; pp. 489–494.
218. Xiong, W.; Lagerström, R. Threat Modeling—A Systematic Literature Review. Comput. Secur. 2019, 84, 53–69. [CrossRef]
219. Thüm, T.; Kästner, C.; Benduhn, F.; Meinicke, J.; Saake, G.; Leich, T. FeatureIDE: An Extensible Framework for Feature-Oriented
Software Development. Sci. Comput. Program. 2014, 79, 70–85. [CrossRef]
220. Blanco, C.; Rosado, D.G.; Varela-Vaca, Á.J.; Gómez-López, M.T.; Fernández-Medina, E. Onto-CARMEN: Ontology-Driven
Approach for Cyber–Physical System Security Requirements Meta-Modelling and Reasoning. Internet Things 2023, 24, 100989.
[CrossRef]
221. Hitesh; Kumari, A.C. Feature Selection Optimization in SPL Using Genetic Algorithm. Procedia Comput. Sci. 2018, 132, 1477–1486.
[CrossRef]
222. Zahoor Chohan, A.; Bibi, A.; Hafeez Motla, Y. Optimized Software Product Line Architecture and Feature Modeling in Improvement of SPL. In Proceedings of the 2017 International Conference on Frontiers of Information Technology (FIT), Islamabad,
Pakistan, 18–20 December 2017; IEEE: Piscataway, NJ, USA, 2017; pp. 167–172.
223. Zou, D.; Wang, S.; Xu, S.; Li, Z.; Jin, H. µVulDeePecker: A Deep Learning-Based System for Multiclass Vulnerability Detection.
IEEE Trans. Dependable Secur. Comput. 2019, 18, 2224–2236. [CrossRef]
224. Zhang, J.; Liu, Z.; Hu, X.; Xia, X.; Li, S. Vulnerability Detection by Learning From Syntax-Based Execution Paths of Code. IEEE
Trans. Softw. Eng. 2023, 49, 4196–4212. [CrossRef]
225. Kreyßig, B.; Bartel, A. Analyzing Prerequisites of Known Deserialization Vulnerabilities on Java Applications. In Proceedings
of the 28th International Conference on Evaluation and Assessment in Software Engineering, Salerno, Italy, 18–21 June 2024.
[CrossRef]
226. Aladics, T.; Hegedűs, P.; Ferenc, R. An AST-Based Code Change Representation and Its Performance in Just-in-Time Vulnerability
Prediction. In Proceedings of the International Conference on Software Technologies, Rome, Italy, 10–12 July 2023. [CrossRef]
227. Wan, T.; Lu, L.; Xu, H.; Zou, Q. Software Vulnerability Detection via Doc2vec via Path Representation. In Proceedings of the 2023
IEEE 23rd International Conference on Software Quality, Reliability, and Security Companion (QRS-C), Chiang Mai, Thailand,
22–26 October 2023; IEEE: Piscataway, NJ, USA, 2023. [CrossRef]
228. Liu, R.; Wang, Y.; Xu, H.; Liu, B.; Sun, J.; Guo, Z.; Ma, W. Source Code Vulnerability Detection: Combining Code Language
Models and Code Property Graphs. arXiv 2024, arXiv:2404.14719.
229. Zhao, C.; Tu, T.; Wang, C.; Qin, S. VulPathsFinder: A Static Method for Finding Vulnerable Paths in PHP Applications Based on
CPG. Appl. Sci. 2023, 13, 9240. [CrossRef]
230. Wu, P.; Yin, L.; Du, X.; Jia, L.; Dong, W. Graph-Based Vulnerability Detection via Extracting Features from Sliced Code. In
Proceedings of the 2020 IEEE 20th International Conference on Software Quality, Reliability and Security Companion (QRS-C),
Macau, China, 11–14 December 2020; IEEE: Piscataway, NJ, USA, 2020; pp. 38–45.
231. Wu, Y.; Lu, J.; Zhang, Y.; Jin, S. Vulnerability Detection in C/C++ Source Code with Graph Representation Learning. In Proceedings
of the 2021 IEEE 11th Annual Computing and Communication Workshop and Conference (CCWC), Virtual, 27–30 January 2021;
IEEE: Piscataway, NJ, USA, 2021; pp. 1519–1524.
232. Zhang, C.; Xin, Y. Static Vulnerability Detection Based on Class Separation. J. Syst. Softw. 2023, 206, 111832. [CrossRef]
233. Şahin, C.B. Semantic-Based Vulnerability Detection by Functional Connectivity of Gated Graph Sequence Neural Networks. Soft
Comput. 2023, 27, 5703–5719. [CrossRef]
234. Gong, K.; Song, X.; Wang, N.; Wang, C.; Zhu, H. SCGformer: Smart Contract Vulnerability Detection Based on Control Flow
Graph and Transformer. IET Blockchain 2023, 3, 213–221. [CrossRef]
235. Yuan, X.; Lin, G.; Mei, H.; Tai, Y.; Zhang, J. Software Vulnerable Functions Discovery Based on Code Composite Feature. J. Inf.
Secur. Appl. 2024, 81, 103718. [CrossRef]
236. Pradel, M.; Sen, K. DeepBugs: A Learning Approach to Name-Based Bug Detection. Proc. ACM Program. Lang. 2018, 2, 1–25.
[CrossRef]
237. Javorník, M.; Komárková, J.; Husák, M. Decision Support for Mission-Centric Cyber Defence. In Proceedings of the 14th
International Conference on Availability, Reliability and Security, Canterbury, UK, 26 August 2019; ACM: New York, NY, USA,
2019; pp. 1–8.
238. Husák, M.; Sadlek, L.; Špaček, S.; Laštovička, M.; Javorník, M.; Komárková, J. CRUSOE: A Toolset for Cyber Situational Awareness
and Decision Support in Incident Handling. Comput. Secur. 2022, 115, 102609. [CrossRef]
239. Wagner, N.; Sahin, C.S.; Winterrose, M.; Riordan, J.; Pena, J.; Hanson, D.; Streilein, W.W. Towards Automated Cyber Decision Support: A Case Study on Network Segmentation for Security. In Proceedings of the 2016 IEEE Symposium Series on Computational
Intelligence (SSCI), Athens, Greece, 6–9 December 2016; IEEE: Piscataway, NJ, USA, 2016; pp. 1–10.
240. Chen, X.; Jia, S.; Xiang, Y. A Review: Knowledge Reasoning over Knowledge Graph. Expert Syst. Appl. 2020, 141, 112948.
[CrossRef]
241. Li, X.; Chen, J.; Lin, Z.; Zhang, L.; Wang, Z.; Zhou, M.; Xie, W. A Mining Approach to Obtain the Software Vulnerability
Characteristics. In Proceedings of the 2017 Fifth International Conference on Advanced Cloud and Big Data (CBD), Shanghai,
China, 13–16 August 2017; IEEE: Piscataway, NJ, USA, 2017; pp. 296–301.
242. Shi, Z.; Matyunin, N.; Graffi, K.; Starobinski, D. Uncovering Product Vulnerabilities with Threat Knowledge Graphs. In
Proceedings of the 2022 IEEE Secure Development Conference (SecDev), Atlanta, GA, USA, 18–20 October 2022; IEEE: Piscataway,
NJ, USA, 2022; pp. 84–90.
243. Wang, X.; He, X.; Cao, Y.; Liu, M.; Chua, T.-S. KGAT: Knowledge Graph Attention Network for Recommendation. In Proceedings
of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, Anchorage, AK, USA, 25 July 2019;
pp. 950–958.
244. Allamanis, M.; Brockschmidt, M.; Khademi, M. Learning to Represent Programs with Graphs. arXiv 2017, arXiv:1711.00740.
245. Cheng, X.; Wang, H.; Hua, J.; Xu, G.; Sui, Y. DeepWukong: Statically Detecting Software Vulnerabilities Using Deep Graph Neural
Network. ACM Trans. Softw. Eng. Methodol. 2021, 30, 1–33. [CrossRef]
246. Kiran, S.R.A.; Rajper, S.; Shaikh, R.A.; Shah, I.A.; Danwar, S.H. Categorization of CVE Based on Vulnerability Software By Using
Machine Learning Techniques. Int. J. Adv. Trends Comput. Sci. Eng. 2021, 10, 2637–2644. [CrossRef]
247. Li, Y.; Zhang, B. Detection of SQL Injection Attacks Based on Improved TFIDF Algorithm. J. Phys. Conf. Ser. 2019, 1395, 012013.
[CrossRef]
248. Sun, H.; Cui, L.; Li, L.; Ding, Z.; Hao, Z.; Cui, J.; Liu, P. VDSimilar: Vulnerability Detection Based on Code Similarity of
Vulnerabilities and Patches. Comput. Secur. 2021, 110, 102417. [CrossRef]
249. Kim, S.; Woo, S.; Lee, H.; Oh, H. VUDDY: A Scalable Approach for Vulnerable Code Clone Discovery. In Proceedings of the
2017 IEEE Symposium on Security and Privacy (SP), San Jose, CA, USA, 22–24 May 2017; IEEE: Piscataway, NJ, USA, 2017;
pp. 595–614.
250. Hu, W.; Thing, V.L.L. CPE-Identifier: Automated CPE Identification and CVE Summaries Annotation with Deep Learning and
NLP. arXiv 2024, arXiv:2405.13568.
251. Kanakogi, K.; Washizaki, H.; Fukazawa, Y.; Ogata, S.; Okubo, T.; Kato, T.; Kanuka, H.; Hazeyama, A.; Yoshioka, N. Tracing CVE
Vulnerability Information to CAPEC Attack Patterns Using Natural Language Processing Techniques. Information 2021, 12, 298.
[CrossRef]
252. Reimers, N.; Gurevych, I. Sentence-BERT: Sentence Embeddings Using Siamese BERT-Networks. arXiv 2019, arXiv:1908.10084.
253. O’Hare, J.; Macfarlane, R.; Lo, O. Identifying Vulnerabilities Using Internet-Wide Scanning Data. In Proceedings of the 2019 IEEE
12th International Conference on Global Security, Safety and Sustainability (ICGS3), London, UK, 16–18 January 2019; IEEE:
Piscataway, NJ, USA, 2019; pp. 1–10.
254. Wang, X.; Sun, K.; Batcheller, A.; Jajodia, S. Detecting “0-Day” Vulnerability: An Empirical Study of Secret Security Patch in
OSS. In Proceedings of the 2019 49th Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN),
Portland, OR, USA, 24–27 June 2019; IEEE: Piscataway, NJ, USA, 2019; pp. 485–492.
255. Takahashi, T.; Inoue, D. Generating Software Identifier Dictionaries from Vulnerability Database. In Proceedings of the 2016 14th
Annual Conference on Privacy, Security and Trust (PST), Auckland, New Zealand, 12–14 December 2016; IEEE: Piscataway, NJ,
USA, 2016; pp. 417–420.
256. Alfasi, D.; Shapira, T.; Barr, A.B. Unveiling Hidden Links Between Unseen Security Entities. arXiv 2024, arXiv:2403.02014.
257. Chen, T.; Li, L.; Zhu, L.; Li, Z.; Liu, X.; Liang, G.; Wang, Q.; Xie, T. VulLibGen: Generating Names of Vulnerability-Affected
Packages via a Large Language Model. In Proceedings of the 62nd Annual Meeting of the Association for Computational
Linguistics, Bangkok, Thailand, 11–16 August 2024. [CrossRef]
258. Aghaei, E.; Al-Shaer, E.; Shadid, W.; Niu, X. Automated CVE Analysis for Threat Prioritization and Impact Prediction. arXiv 2023,
arXiv:2309.03040.
259. Blinowski, G.J.; Piotrowski, P. CVE Based Classification of Vulnerable IoT Systems. In Theory and Applications of Dependable
Computer Systems; Zamojski, W., Mazurkiewicz, J., Sugier, J., Walkowiak, T., Kacprzyk, J., Eds.; Advances in Intelligent Systems
and Computing; Springer International Publishing: Cham, Switzerland, 2020; Volume 1173, pp. 82–93, ISBN 978-3-030-48255-8.
260. Jiang, Y.; Atif, Y. Towards Automatic Discovery and Assessment of Vulnerability Severity in Cyber–Physical Systems. Array 2022,
15, 100209. [CrossRef]
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual
author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to
people or property resulting from any ideas, methods, instructions or products referred to in the content.