A Systematic Review On Research Utilising Artificial Intelligence For Open
https://doi.org/10.1007/s10207-024-00868-2
REGULAR CONTRIBUTION
Abstract
This paper presents a systematic review to identify research combining artificial intelligence (AI) algorithms with Open
source intelligence (OSINT) applications and practices. Currently, there is a lack of compilation of these approaches in
the research domain, and similar systematic reviews do not include research that post-dates the year 2019. This systematic
review attempts to fill this gap by identifying recent research. The review used the preferred reporting items for systematic
reviews and meta-analyses and identified 163 research articles focusing on OSINT applications leveraging AI algorithms.
This systematic review outlines several research questions concerning meta-analysis of the included research and seeks to
identify research limitations and future directions in this area. The review identifies that research gaps exist in the following
areas: Incorporation of pre-existing OSINT tools with AI, the creation of AI-based OSINT models that apply to penetration
testing, underutilisation of alternate data sources and the incorporation of dissemination functionality. The review additionally
identifies future research directions in AI-based OSINT research in the following areas: Multi-lingual support, incorporation
of additional data sources, improved model robustness against data poisoning, integration with live applications, real-world
use, the addition of alert generation for dissemination purposes and incorporation of algorithms for use in planning.
Keywords Open source intelligence · Artificial intelligence · Machine learning · Security operations · Cyber threat intelligence
Abbreviations
PRISMA  Preferred reporting items for systematic reviews and meta-analyses
OSINT   Open-source intelligence
AI      Artificial intelligence
RNN     Recurrent neural networks
LSTM    Long short-term memory
DNS     Domain name system
CNN     Convolutional neural network
NLP     Natural language processing
SQL     Structured query language
DGA     Domain name generation algorithm
AUC     Area under curve
APT     Annotated probabilistic temporal logic

1 Introduction

With the growth of the digital world the scale of data that is stored, transmitted and processed is increasing at an exponential rate. The International Data Corporation estimates that 64.2 zettabytes of data were created or copied in 2020 [87] and in 2019 just over 50 per cent of the world’s population used the internet [91]. This includes a large amount of information that is publicly available related to individuals, states and organisations. This open data can be utilised in intelligence operations for a wide range of applications and can be especially useful in security contexts such as national security, law enforcement, defence, cyber security and threat intelligence operations. Open-source intelligence (OSINT) is created when this publicly available and open data is processed into actionable intelligence and distributed to the relevant parties [273]. This process is known as the intelligence cycle and consists of planning, collection, processing, analysis, dissemination and evaluation [90]. The size and scope of online open data sources can cause problems for OSINT investigations in addition to the opportunities presented.

B Thomas Oakley Browne
[email protected]
Mohammad Abedin
[email protected]
Mohammad Jabed Morshed Chowdhury
[email protected]
1 Computer Science and IT, La Trobe University, Plenty Road and Kingsbury Dr, Bundoora, VIC 3086, Australia
2912 T. O. Browne et al.
A systematic review on research utilising artificial intelligence… 2913
a simplified four-step process designed to fit into the intelligence cycle [273]. The steps outlined by RAND include collection, processing, exploitation and production. Figure 2 outlines this high-level four-step process. This four-step process would be used within the intelligence cycle as defined by the US-based Office of the Director of National Intelligence [90] and combined with other intelligence disciplines. The intelligence cycle can be described with the chart in Fig. 3 and includes planning, collection, processing, analysis, dissemination and evaluation. AI-based OSINT systems could be categorised based on which step in this higher-level process they are assisting.

According to the RAND Corporation, a highly respected US-based research organisation that specialises in subjects concerning national security [217], metrics alone are not sufficient to gauge the effectiveness of an AI-based system employed for intelligence purposes. The RAND Corporation proposes that the performance of such systems must be evaluated based on the value they provide to the next stage of the intelligence cycle [90]. It is also important to note that intelligence operations are not reserved exclusively for intelligence agencies. This is particularly true in the area of OSINT. For example, cyber security professionals can leverage data collected from Twitter to improve their threat intelligence capabilities [220] and law enforcement can monitor online marketplaces to identify the sale of illegal and controlled narcotics [141]. Business intelligence is a growing area of interest and is driven by the growth of available digital data, usually referred to as big data. As such, businesses have had to create processes similar to the intelligence cycle for their intelligence operations [15].

2.3 OSINT techniques and tools

There is a wide variety of tools available to support OSINT operations throughout the intelligence cycle. These include comprehensive tools that can manage entire investigations, such as Spiderfoot [159] and Maltego [147], and tools that provide single functions to assist OSINT operations, for example TheHarvester, which provides email scraping functions [155]. Broadly speaking, OSINT tools and techniques can fit into the different stages of the intelligence cycle. The-
and accuracy than a human. One example of this is the use of convolutional neural networks (CNN) in computer vision tasks. CNN algorithms can analyse images piece by piece to identify features common to a particular set of images and be used to categorise images or identify objects in images [171]. To an extent, CNN algorithms can be used to mimic the human ability of sight and provide an example of how machine learning algorithms can be used to complete tasks usually performed by a human operator. A further example of this comes from the application of machine learning algorithms for natural language processing (NLP). NLP techniques can be employed for a large range of tasks that in the past would have been reserved for a human analyst. NLP can be used to derive meaning from human speech and text and can be used for sentiment analysis, to understand people’s feelings or opinions on a subject. NLP can also be used to efficiently extract knowledge and information from bodies of text [119]. NLP and CNN algorithms are only two examples of how AI and machine learning can perform tasks usually conducted by humans. The full range of available AI and machine learning techniques is extensive but these examples show the potential for machine learning algorithms to reduce the workload of human analysts with specific tasks.

2.5 Combining AI, machine learning and OSINT

The ability of AI to perform human tasks provides a significant opportunity to improve the speed and efficiency of systems utilised in OSINT operations. Recently, academic researchers have been utilising AI and machine learning in combination with OSINT sources and practices. Some use cases include the detection of malicious domain names [102], misinformation detection [94] and threat intelligence through monitoring online hacker forums [57].

These examples show that this combination of AI and OSINT provides a powerful new tool for OSINT operations. Incorporating AI and machine learning with OSINT is enabling the dissemination of an otherwise insurmountable amount of data and provides a means of accelerating investigations and bolstering intelligence gathering and analysis. In addition to this, open data available online can also be utilised as training sets for machine learning algorithms to assist in various security and intelligence applications and generate actionable OSINT. This is frequently the case in models used for the detection of maliciously generated domain names which are trained on data gathered from DNS [276].

Combining AI and machine learning with OSINT presents many challenges. The RAND Corporation has recently identified that judging the performance of AI use in the intelligence process does not rest solely on metrics such as how accurate the algorithm’s results are. The AI needs to be judged on the outcomes it provides to the intelligence process [90]. Put another way, even if an AI implemented in the intelligence process is correct 99 per cent of the time, it will still fail if the outcomes it provides to the intelligence process are insufficient, or worse still, result in catastrophic errors. The RAND Corporation believes this concept should be applied when evaluating any AI or machine learning algorithms that are being applied in combination with OSINT or other intelligence operations.

3 Methodology

This study makes use of the PRISMA guidelines for conducting systematic literature reviews. Two researchers conducted the review and evaluated each other’s results. The methodology utilised to identify, include, exclude, evaluate and review literature for this paper consists of the following steps:

1. Search academic databases using selected keyword set.
2. Initial evaluation of search results to exclude non-relevant articles by appraisal of the article keywords, abstract and title.
3. Remove all duplicate articles.
4. Include or exclude identified articles with the defined criteria through appraisal of the full text.
5. Use quality questions to identify high-quality articles and exclude low-quality articles.
6. Use research questions to collect the appropriate data from the included research.
7. Record collected data.
8. Synthesise collected data through graphical display and description of findings.

The number of articles was recorded at each stage of the process to show aggregated results throughout the review process. Articles in foreign languages were translated so that data from these articles could be included in the results.

3.1 Academic database search

To initially identify potential articles for review, several online academic databases were queried. The following databases and academic search engines were used:

1. IEEE [88]
2. Scopus [222]
3. ACM [26]
4. Science Direct [35]
5. Springer Link [238]
6. Google Scholar [66]

The keywords identified for use in this study include OSINT, machine learning, artificial intelligence and cyber reconnaissance. An initial search was run using the “OR”
operator to identify the total number of articles relating to these keywords available on the selected databases and academic search engines. This was refined by searching for exact matches of the specific keywords, to identify the total number of articles available for each term. Once this was completed the “AND” operator was used to identify articles that relate to either OSINT and machine learning/artificial intelligence, or cyber reconnaissance and machine learning/artificial intelligence. An example search would be “OSINT” AND “Machine Learning”. Total numbers and search dates were recorded for each search to construct flow charts that break down the total number of articles related to these keywords. All keyword searches are included below:

1. OSINT OR cyber reconnaissance OR artificial intelligence OR machine learning
2. OSINT OR artificial intelligence
3. OSINT OR machine learning
4. Cyber reconnaissance OR artificial intelligence
5. Cyber reconnaissance OR machine learning
6. OSINT
7. Cyber reconnaissance
8. Artificial intelligence
9. Machine learning
10. OSINT AND artificial intelligence
11. OSINT AND machine learning
12. Cyber reconnaissance AND artificial intelligence
13. Cyber reconnaissance AND machine learning

When each search was entered the number of articles was compared between the reviewers to assess if the search had been run correctly. Once this was done an initial evaluation was conducted to remove unrelated articles. This evaluation was completed by reviewing the keywords provided by the articles, the abstract and the title to assess if the article matches the required subject matter. If an article matched the required keywords of the search it proceeded to the next stage of the review. The review of the search results was terminated once the search results started returning significant amounts of non-relevant material.

Once articles had been identified through the initial evaluation, they were added to a spreadsheet, and any duplicate papers were identified and removed from the list. These articles then proceeded to be selected based on our quality questions and article criteria.

3.3 Article criteria

Articles that proceed from the initial evaluation will be assessed for inclusion using the following criteria:

1. Article must post-date the year 2011.
2. Article must be a journal article, conference proceeding, technical report or academic archive.
3. The article must include the subject matter of using machine learning or AI in combination with OSINT sources or processes in a security context. A security context could relate to either national security, cyber security, law enforcement or intelligence operations. Systems that are general but could be used in a security context will also be included in this study.
4. The article must cover the use of a machine learning or AI algorithm, proposed model or framework through:
   a. Real-world use.
   b. Experimental use.
   c. Proposed or theoretical use.

3.4 Quality questions

Once articles were identified to adhere to the selection criteria they were assessed on their level of academic quality. The following questions were considered when determining the quality of a paper:

3.5 Research questions

Once the articles were identified for inclusion in this study the following research questions were asked to extract data and information from the selected articles. If multiple categorisation occurs, both categories will be included in the total figures. For example, if researchers collaborate from multiple countries, both countries are recorded in the final results. The same process is followed for multiple data sources, professionals and organisations, algorithms, intelligence cycle phases, metrics, OSINT sources and tools:

1. What is the trend in AI and machine-learning-based OSINT?
2. What geographical regions are contributing the most to this area of study?
3. What professions and organisations could benefit from AI and machine-learning-based OSINT applications?
4. What machine learning algorithms, techniques or tools are being used in OSINT?
5. What phases of the intelligence cycle does the included research apply to?
6. What metrics and analysis are provided to evaluate system performance in the included research?
7. What are the sources of OSINT used in the included research?
8. What OSINT tools are applied in the included research?

3.5.1 What is the trend in AI and machine-learning-based OSINT?

This question is asked to identify growth patterns in AI-based OSINT research. Given the current need to analyse vast amounts of open data, it is important to identify if the research community is contributing sufficiently to this field of research and if any events have impacted research output. The research community must be sufficiently covering this topic to assist both research and industry in their OSINT efforts. To answer this question the publication date field from the article metadata will be utilised.

3.5.2 What geographical regions are contributing the most to this area of study?

This question will be answered based on each author’s affiliated institution. Multiple institutions could be recorded for each paper. The question seeks to address whether there is sufficient linguistic and cultural contextual spread in AI-based OSINT research. This is important to note because elements such as regional slang and dialect could influence results when AI models are employed in text-based tasks, for example in the previously mentioned monitoring of hacker forums. This could have implications for criminal, cyber threat intelligence and general intelligence investigations.

3.5.3 What professions and organisations could benefit from AI and machine-learning-based OSINT applications?

This question is qualitative as it diverges from the analysis of the article’s metadata. Answering this question will help identify the variance in industry end use for the proposed models. Ultimately it will identify if there are any limitations or gaps in the assessed research articles concerning the domain of final use. This can be used to identify future directions for which there is limited research material.

3.5.4 What machine learning algorithms, techniques or tools are being used in OSINT?

This question seeks to understand what algorithms are being applied to OSINT use cases. It also seeks to identify other tools and techniques that are being used by researchers. This could provide other researchers with potential starting points in terms of tools, techniques and algorithms for their research. The question will be answered by recording identified tools, algorithms and techniques used in models or experimentation within the included articles.

3.5.5 What phases of the intelligence cycle does the included research apply to?

Taking into consideration the advice provided by the RAND Corporation, it is important to understand which phase of the intelligence cycle AI models are being applied to. This provides context to establish their performance in terms of how they support the next phase of the intelligence cycle. Understanding this would be beneficial in appraising models for use in both industry and government settings. This question also supports academic research in identifying if there are research gaps in the context of the intelligence cycle. To answer this question each paper was analysed to identify which intelligence stage is most appropriate. If the research model included multiple stages, each stage was included in the results.

3.5.6 What metrics and analysis are provided to evaluate system performance in the included research?

This question seeks to understand how researchers are currently evaluating their proposed experimental models. Presentation of results that demonstrate how models fit into the intelligence cycle would be beneficial. Identifying any research gaps in this area can provide potential future directions in terms of model evaluation. The question is answered through analysis of metrics provided in the results section for papers and identifying any further evaluation analysis.

3.5.7 What OSINT tools are applied in the included research?

This question is of particular interest to industry and practitioners as it can be used to evaluate the extent to which AI models can be integrated with pre-existing non-AI OSINT tools. The question is answered by identifying OSINT tools described in the included papers.
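The counting rule stated in Sect. 3.5, where an article that falls into several categories is recorded once under each of them, can be illustrated with a minimal Python sketch. The article records, field names and the `tally` helper below are hypothetical and are not data or tooling from this review. The sketch also assumes that a category repeated within a single article (for example two authors from the same country) is counted only once for that article, which the text does not specify.

```python
from collections import Counter

# Hypothetical records: each included article lists every category it falls into.
# Identifiers and values are illustrative only, not data from the review.
articles = [
    {"id": "A1", "countries": ["United States", "India"], "phases": ["collection", "processing"]},
    {"id": "A2", "countries": ["India"], "phases": ["analysis"]},
    {"id": "A3", "countries": ["United Kingdom", "India"], "phases": ["processing", "analysis"]},
]


def tally(records, field):
    """Count each distinct category once per article for the given field."""
    counts = Counter()
    for record in records:
        # Assumption: duplicates inside one article are collapsed before counting.
        counts.update(set(record[field]))
    return counts


country_counts = tally(articles, "countries")
phase_counts = tally(articles, "phases")

print(country_counts.most_common())
print(phase_counts.most_common())
# Totals can exceed the number of articles because one article may be
# recorded under several categories.
print(sum(country_counts.values()), "category assignments across", len(articles), "articles")
```

Under this rule the per-category totals in a chart need not sum to the number of included articles, which is worth keeping in mind when reading the figures in the results.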
The initial database search returned a total of 54,138 articles. Screening of the title, keywords and abstract reduced this number to 415, or 213 with duplicates removed. 84 additional articles were identified through references, providing a total of 297 articles. 27 articles were removed after assessing article quality and 123 were removed after comparing criteria to the full text. A further 11 were removed after comparison to the quality questions. A total of 163 articles have been included in this review. Figure 4 shows the PRISMA flow diagram for the number of articles identified at each stage of the review and is divided into search, screening, exclusion and inclusion sections in line with the PRISMA framework.

4.2 Research questions

This section reports the analysis of the systematic literature review results framed along the eight research questions presented in Sect. 3.5.

4.2.1 What is the trend in AI and machine-learning-based OSINT?

There has been significant growth in the number of research articles that fit the criteria for this study. This positive trend can be seen in Fig. 5. The volume of research in this area peaked in 2019 at 33 articles but this was followed by a decline of about 21.2 per cent in 2020.

It is possible that the COVID-19 pandemic slowed the growth of research in this area post-2019. The initial stages of the pandemic correlate with a decline in time spent on scientific research and a reduction in the number of new scientific research projects [53]. Concerning the assessed articles of this study, there appears to be a more significant decrease in conference papers between 2019 and 2020, with conference publications decreasing by about 22 per cent, while journal articles remain relatively stable. This could be due to disruption in academic conferences during the initial pandemic stages.

Computer science conferences appear to have adapted well in this period but a study into this question showed that

4.2.2 What geographical regions are contributing the most to this area of study?

A substantial portion of the research that has been included in this paper has occurred in either the United States or India. The United States and India combined account for about 44 per cent of the research included in this study. The United Kingdom and China are also driving research in this area. The top four countries that are contributing to this area of research account for about 57 per cent of the included research papers. The heat map provided in Fig. 6 demonstrates the global distribution of the research included in this study.

Figure 6 additionally shows the total number of articles from the top fifteen countries producing research that fit the criteria of this study. These countries account for 150 of the total 163 included articles, or about 92 per cent of all the included research articles. Countries from the continents of North America, Asia and Europe are included on this list. The United States is the largest producer of included research in North America, with Canada being second in that region. India is the highest-ranked country in Asia for research combining AI with OSINT, with China being the second. In Europe, the United Kingdom is the highest-producing country for research in this field, with Italy being the second highest. Limited research that meets the criteria of this systematic literature review is being undertaken in the continents of Africa and South America. Only Argentina, Colombia, Egypt and South Africa contributed some papers from these regions.

Overall there seems to be a fair distribution of papers in terms of geographic location. However, there are concentrations of papers in the United States, India and the United Kingdom, two of which are predominantly English-speaking with cultural similarities. The reduced number of papers in areas such as Africa, Central Asia, Eastern Europe and South America could indicate limitations based on cultural context. This would be particularly important for OSINT applications leveraging natural language processing that are concerned with law enforcement or intelligence operations. Contributions from academics in these areas could help produce truly
[Figure: Number of Articles by publication year, 2011 onwards]
universal AI-based OSINT systems dedicated to these forms of investigation.

[Figure 6 chart, “Top 15 Countries”: United States, India, United Kingdom, China, Italy, Japan, South Korea, Canada, Portugal, Spain, Germany, France, Pakistan, Greece, Australia; axis: Number of Articles, 0 to 40]
Fig. 6 Geographic distribution and number of articles produced on a per-country basis for research that incorporates OSINT with AI, out of a total of 163 articles included in this review. Large amounts of AI-driven OSINT research have been undertaken in the United States and India, which together account for 44 % of the research included in this review

4.2.3 What professions and organisations could benefit from AI and machine-learning-based OSINT applications?

There is less variation in the domains that could apply the included research when compared to the specific applications. The majority of the included research can be applied to either security operations, law enforcement, intelligence services or cyber threat intelligence operations. Figure 7 breaks down the included research by the potential domain of use. There are few papers leveraging AI for OSINT purposes that apply to the domain of penetration testing. This presents a research opportunity as an important aspect of penetration testing is collecting publicly available data about the target so that the penetration tester can gain knowledge of the potential attack surface [210].

[Figure 7 legend: Security Operations, Law Enforcement, Intelligence Services, Cyber Threat Intelligence, General, Government Departments, Non-Government Organisations, Cyber Risk Management, Public Health, Defence, Emergency Services, Penetration Testing, Cyber Incident Response]
Fig. 7 Types and functions of organisations that could benefit from the included research that combines AI with OSINT applications. Some research papers could be used in multiple organisation types

One specific piece of research that could assist penetration testers in their projects is presented in the paper [187]. This conference paper utilises OSINT collected from the National Vulnerability Database to train several classifiers to identify Structured Query Language (SQL) injection vulnerabilities from online texts. The model can be used to scan OSINT sources such as tweets and websites to identify SQL injection-related information. There are also few papers that are specific to the defence domain, although this could be due to the classified and sensitive nature of such research, as some defence research may not be available through normal academic sources.

Much research that would be well suited for the defence sector is also well suited for other domains such as intelligence services and government departments. The journal article [58] provides a demonstration of this by proposing a tweet classification system based on various machine learning algorithms that identify the exposure of sensitive or classified information on social media. This system has obvious applications to government, defence and the intelligence sector but such a system could also be utilised by security operations conducted in the private sector. Ultimately the results of this systematic review show that many systems employing AI methods with OSINT can be utilised across multiple domains and are not confined solely to government, defence or intelligence service projects.

Figure 8 shows the main applications and functions of research that combines machine learning with OSINT. The uses for the systems being used in the research are varied but a substantial portion covers either sentiment analysis, cyber threat intelligence, domain name generation algorithm (DGA) detection or OSINT investigations. There are also several other uses such as cyber attack prediction, natural disaster management and human trafficking investigations.

The paper [13] provides an interesting example of how a particular application leveraging OSINT with AI can be applied to specific domains. This research utilises Twitter as an OSINT source and leverages machine learning algo-
[Figure 11 legend: Accuracy, Precision, F1-Score, Recall, Area Under Curve (AUC), True Positive Rate, False Positive Rate, True Negative Rate, Total Negatives, Positive, False Positives, False Negatives, Receiver Operating Characteristic (ROC), True Positives, others]
Fig. 11 Metrics used for evaluation of machine learning-based systems in OSINT applications research

[Figure 12 legend: Social Media, Online Dataset, Online News, Websites, Blogs, Online Forums, Domain Name System, Dark Web, Online Marketplaces, others]
Fig. 12 Types of OSINT utilised in the included research articles. Social media is the largest direct source of data used in the included research. Significant amounts of the included research make use of OSINT data sets
nance scheduling for offshore oil and gas platforms. Similar methods could be used for the planning and scheduling of OSINT investigations or operations. An interesting avenue of research could be provided by identifying how these tools and models could be incorporated into the planning phase of the intelligence cycle and integrated with classification models for processing, analysis and dissemination.

4.2.6 What metrics and analysis are provided to evaluate system performance in the included research?

Figure 11 shows the metrics used for evaluation in the included research. Nearly half the metrics identified for the evaluation of machine learning systems included in this study are accuracy, precision, F1-score, recall and area under curve (AUC). These five metrics are some of the most commonly used for the evaluation of machine learning systems [280]. There is also a smaller number of papers providing totals for metrics such as true positive rate, false positives and receiver operating characteristics. It should also be noted that some papers fail to provide a full accounting of the performance of the algorithm in terms of these metrics, which makes it harder to conclude how suitable the system is to complete its intended task. For example, in the paper [9] an analysis is given comparing the true positive rate to the true negative rate but no precise figures are given for precision, recall or F1-score. Including these results would make it easier to assess the suitability of their system for processing cyber threat intelligence. A further example of this is in the conference paper [57] which only provides figures for accuracy and precision but fails to account for recall and F1-score.

There are some examples in the included research that provide results related to the intended use of the system or system objectives. In [33] cyber threat intelligence is performed using data from Twitter and the researchers can provide information relating to vulnerabilities that were discovered before their inclusion in vulnerability databases. A further example of this is in which provides measurements on whether an investigated individual is engaging in cyberstalking. Providing measurements such as these can help potential users evaluate the usefulness of a system in terms of real-world outcomes and is especially useful information to non-technical stakeholders or team members. These are, however, limited cases. Researchers should endeavour to evaluate their system based on the objectives of a particular phase of the intelligence cycle. This would better place the systems for use in real-world OSINT operations.

4.2.7 What are the sources of OSINT used in the included research?

Figure 12 shows the types of OSINT used in machine learning and AI-based OSINT applications. A significant proportion of the papers included in this study obtain OSINT sources through either online data sets or social media. Online news, websites, blogs and the dark web are also being utilised to some degree in the included research. A model that makes use of a social media data source is the TwitterOSINT application proposed in [83]. TwitterOSINT uses Twitter data to create real-time visual representations. This feature supports intelligence analysts by easily presenting the data while it is current and relevant. The proposed system leverages NLP to annotate the collected data so it can be further processed for search and final visualisation. This tool is particularly well suited to provide real-time threat intelligence to cyber security operations. An example of research using online datasets as the OSINT data source is the bidirectional LSTM models for DGA classification proposed in [11]. The datasets used consisted of OSINT data collected from DNS that has been flagged as malicious; this data was collected from a data set
provided by netlab-360. This obtained data was used as the malicious domain component of the model's training set.

Fig. 13 The sources of OSINT utilised in the included research articles. Social media includes significant amounts of data gathered from Twitter. Online datasets include data retrieved from OpenDNS, OSINT feeds and the National Vulnerability Database. Sources labelled in the figure: Twitter, Alexa, Facebook, OSINT Feeds, OpenDNS, National Vulnerability Database (NVD), Reddit, Reuters, netlab-360, MalwareDomainList, YouTube, Amazon, Stack Exchange, Phishtank, Common Vulnerabilities and Exposures (CVE), Bambenek, Natural News and others.

Figure 13 breaks down the direct sources of the OSINT data from the included articles. For social media, Twitter is the most prominent source of OSINT, followed to a lesser extent by Facebook and Reddit. Video-based social media is also present, with YouTube being utilised in some research included in this study. Online data sets collected from services such as Alexa are also prominent, especially in DGA detection. Twitter is used predominantly for cyber threat intelligence and sentiment analysis. An example of Twitter being used for cyber threat intelligence is found in [220], which covers automating the collection and analysis of Twitter data for threat intelligence purposes. The system they propose is still in the concept phase. The researchers' proposal includes the collection, processing and analysis of Twitter data to identify cyber threats before they appear in common vulnerability and exposure databases. The conference paper [5] provides an example of Twitter data being used for sentiment analysis; this is also an example of a system that is general in nature. The model is directed to the analysis of people's opinions but could also be repurposed to identify posts that concern security incidents or users with malicious intent.

The results of this systematic review show that Twitter is over-represented in the collected data, and this is most likely due to the ease of access to posts on the platform. This information can easily be obtained by researchers using the Twitter API [255] or a service such as Tweepy that leverages the API provided by Twitter [10]. The downside of this ease of access is the potential for researchers to overlook other social media sources for OSINT data, which may cause the under-representation of services such as YouTube and Facebook in the included research. Online datasets feature prominently in systems being used for the detection of DGA. These systems use OSINT to train their machine learning systems to correctly identify malicious domain names. The systems will generally include separate data sets for benign domains and malicious domains, with benign domains being sourced from DNS and online data sets such as Alexa. Malicious domain data sets are then constructed from security databases such as 360netlab [208].

4.2.8 What OSINT tools are applied in the included research?

There were relatively few examples of pre-existing non-AI-based OSINT tools being incorporated into AI models identified in this study. Many of the researchers have elected to collect data manually, through application programming interfaces such as the Twitter API [138, 190, 196], or by downloading pre-existing data sets, such as UNSW-NB15 [103] or AmritaDGA [264]. There are some examples of OSINT tools being used in the included research in combination with AI, such as Tor Crawler [108], Shodan [267], ReKognition boto3 [251], SAIL LABS Media Mining System [12] and the Tweepy API [94]. In [136] Maltego is used in the prototype Twitter cyberbullying detection system to graph relationships between the various accounts in the collected data. This provides useful information on coordination between stalkers.

There are also a few OSINT tools being developed by researchers that combine AI directly into a proposed OSINT tool. For example, the previously mentioned TwitterOSINT [83, 220] provides a complete system that collects data from the Twitter API and then uses NLP to annotate the unstructured text. This then allows intelligence analysts to easily search and create a visualisation of the collected data. Additional research that either creates new OSINT tools utilising AI or that leverages the abilities of current OSINT tools would be beneficial to intelligence or security analysts. This is because these systems would link directly with their existing tool kits or add new applications that they can use in their existing processes.

5 Limitations, future directions and summary

The following sections outline identified limitations with current research utilising OSINT in combination with AI. First, current research limitations are presented, followed by future directions that specific researchers are undertaking to reduce the limitations of their research. A summary of findings will also be presented and, finally, the identified limitations of this study.
The following section outlines limitations that have been identified from the papers included in the systematic review.

5.1.1 OSINT tools

Very few studies use pre-existing OSINT tools in their machine learning research. While there are a few examples of OSINT tools being used in the included research, the significant majority did not include the use of these tools in their AI-based OSINT research. A possible reason for this is the large number of research papers that are dedicated to the processing and analysis phases of the intelligence cycle, rather than collection. Many OSINT tools, for example theHarvester, are used mainly for the collection phase, which could provide a reason for their exclusion from the research [155]. There are OSINT tools that include functionality for processing and analysis, such as Maltego [147], but due to the nature of the research in this paper, the processing and analysis phases are being undertaken by machine learning algorithms, which are essentially replacing the functions undertaken by non-machine learning based OSINT tools. However, it appears researchers are ignoring the opportunity to preprocess their data using these tools, which could refine data sets and enhance the capabilities of their final AI models. An opportunity to demonstrate how their systems can integrate with current tools is also being lost, which would be a great benefit to current users of OSINT tools.

Examples where researchers have created new AI-based OSINT applications that could be employed in research and industry are sparse, although a few exist. NoRegINT is a machine learning-based OSINT tool [105]; its authors place it alongside current OSINT tools that are used for cyber reconnaissance such as Spiderfoot [159] and Maltego [147]. Currently, the proposed framework for NoRegINT uses Twitter, Reddit and Tumblr as OSINT sources and can perform sentiment analysis on the collected data. The tool is, however, far from complete and does not compare to the scope of data sources and analysis that is currently available in Spiderfoot or Maltego. Another example is TwitterOSINT outlined in [83, 220], although the AI-based functionality is limited to annotations and does not provide further analysis. Future research could focus on the development of an AI-based OSINT suite that can perform multiple functions across a variety of OSINT sources. Another potential research direction is to focus on demonstrating how current OSINT tools can be used in tandem with proposed AI systems. This would be especially beneficial to individuals utilising these OSINT tools in industry or intelligence circles.

Only a small amount of the included research could be applied to the domain of penetration testing. Some examples of papers included in this study that could be applied to this domain include work by Layton, Perez, Birregah, Watters and Lemercier that links profile ownership between different social media accounts [121] and could provide useful information to test organisational resilience in the face of social engineering attacks. The mentioned research is of particular interest as it could circumvent an organisation's stakeholders' efforts to keep particular social media accounts anonymous. A further example is FastEmbed, which determines the possibility of the exploitation of vulnerabilities using the LightGBM algorithm [48]. The proposed model uses OSINT collected from an exploit database for its training set and can complete its predictions when considering real-world exploits. This was done with an accuracy of 83 per cent. These two papers provide examples of how AI-based OSINT applications could be beneficial to penetration testing.

Despite these examples, the volume of research is limited in comparison to other domains such as cyber threat intelligence. Another paper of interest for this domain is presented in [205]. The researchers propose a model that generates fake cyber threat intelligence reports to distribute to threat intelligence vendors. This is of interest as such an attack could cause data poisoning of defensive cyber AI systems, or cause confusion within a security operations team leading to sets of actions that benefit the attacker. This paper was ultimately not included in the aggregated results for penetration testing as such a system would impact multiple threat intelligence vendors and organisations. This indiscriminate nature would lead to issues concerning the scope of the penetration testing exercise. However, penetration testers may want to consider how to test organisational resilience against such an attack.

5.1.3 Underutilised data sources

Research that uses data from social media appears to be heavily reliant on Twitter. This seems to be due to the ease of access to the Twitter API [255]. This appears to be true for various forms of applications, including cyber threat intelligence. To demonstrate, there are multiple papers covering the collection of threat intelligence from Twitter, including [16, 209, 220, 233, 254], with a total of 80 papers obtaining data from Twitter. The number of papers using data obtained from the dark web is severely limited in comparison. It appears that the ease of access to data from Twitter may be driving an overreliance on this data source in the included research to the detriment of other potential sources.

In [57] a similar threat intelligence system is described that uses the dark web with similar functionality to the stated
Twitter-based threat intelligence systems. The system looks specifically at hacker forums and potentially could provide more valuable and current data than Twitter. The same is true when comparing the number of papers using data from other social media sources such as Facebook.

There also appears to be no research into using data from newer social platforms such as Discord [37]. Discord could provide an interesting source of OSINT data for AI-based applications due to its community-based nature. Cyber security-related servers on Discord can be found using services such as Disboard [36]. Researchers could also make attempts to join private Discord groups to gather data, although this might raise ethical concerns.

Essentially, the over-utilisation of a single data source due to its ease of access may skew research results and forfeit the ability to gain access to a wider variety and more complete set of data. In the case of OSINT for cyber threat intelligence, this would include building AI-based systems that work with the dark web, community forums and newer social platforms. The same reasoning could also be applied to other applications such as behaviour profiling and hate speech detection. Future AI-based OSINT research should endeavour to include these data sources.

From the included papers there are limited examples of systems that include the dissemination phase of the intelligence cycle. An example of research that includes the dissemination phase of the intelligence cycle in a machine learning-based system is a paper by Nnaemeka Ekwunife of Marymount University [44]. The paper proposes a model that gathers intelligence from social media and includes alert generation when intelligence relevant to national security is uncovered. This is the sole example in the included research that covers this aspect of the intelligence cycle. Further research could include this component as part of future AI-based models. Adding dissemination functionality to AI-based OSINT models would enable researchers to create fully automated systems that can work across multiple jurisdictions.

5.2 Future directions

The following sections detail future directions being undertaken by researchers in this area of study; some avenues present potential opportunities to overcome some of the previously identified limitations.

5.2.1 Multi-lingual capabilities

Some studies have identified the inclusion of multi-lingual functionality as an avenue for future research. AI-enabled OSINT applications with this ability could potentially operate at a global scale across multiple geographic regions. This potential future research is not isolated to a particular domain of research or application. For example, the proposed model in [14] conducts sentiment analysis on social media and the researchers plan to continue the research by training their model on larger datasets containing data from multiple languages. A further example of this can be found in the human trafficking detection model proposed in [22]. The authors of this paper also seek to add multi-lingual functionality to their system in the future. The avenue they propose for this future research is to add automatic translation to the model. Support for multiple languages is also a planned feature for future research for the natural disaster monitoring system proposed in [226], and in the area of cyber threat intelligence, the authors of [277] have included multilingual capabilities in their future research plans. Incorporation of multi-language support will be especially beneficial in domains that tackle problems at the global level, such as monitoring the cyber threat landscape and human trafficking.

Some papers have identified the limitations of relying on a single source of data and plan to incorporate multiple data sources in future research. In [233] the researchers plan to include data originating from Facebook to train their cyber threat intelligence model. The authors of [94] also plan to modify their misinformation detection model to include data sources in addition to Twitter, and a further example can be found in [237], where the authors plan to increase the diversity of their dataset to improve their fake news and misinformation detection model. In [209, 220], the researchers plan to include additional social media data sources in their cyber intelligence system, and the authors of [85] note that their use of data contained in the MITRE ATT&CK [27] framework required expansion due to the nature of the MITRE ATT&CK framework. The framework relies on expert contribution, so there is a time delay between when the threat materialises in the real world and when the threat is reported. The researchers plan to include additional sources of OSINT to circumvent this data source issue. This provides an example that each OSINT data source presents unique strengths and weaknesses, and incorporating multiple sources in future research will provide researchers with a means to mitigate any potential open data source shortcomings. Being able to include additional data sources will increase the monitoring surface of AI-based OSINT applications and assist in the creation of general OSINT suites that incorporate AI algorithms.
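As an illustration of what incorporating multiple data sources can involve in practice, the following minimal Python sketch maps records from two sources onto one common schema and drops duplicate posts before they would be passed to a model. It is a sketch under stated assumptions, not a method taken from any of the reviewed papers: the schema, the source names, the per-source field names in `FIELD_MAP` and the sample records are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Set


@dataclass(frozen=True)
class OsintRecord:
    """Common schema shared by every source (an assumption of this sketch)."""
    source: str
    text: str
    published: str


# Hypothetical mapping from source name to its (text field, date field).
FIELD_MAP = {
    "twitter": ("full_text", "created_at"),
    "forum": ("body", "posted"),
}


def normalise(source: str, raw: dict) -> OsintRecord:
    """Map one raw, source-specific record onto the common schema."""
    text_key, date_key = FIELD_MAP[source]
    # Collapse runs of whitespace and lower-case so near-identical posts compare equal.
    text = " ".join(raw[text_key].split()).lower()
    return OsintRecord(source=source, text=text, published=raw[date_key])


def merge_sources(batches: Dict[str, Iterable[dict]]) -> List[OsintRecord]:
    """Combine all sources, keeping the first record seen for each distinct text."""
    seen: Set[str] = set()
    merged: List[OsintRecord] = []
    for source, records in batches.items():
        for raw in records:
            record = normalise(source, raw)
            if record.text not in seen:
                seen.add(record.text)
                merged.append(record)
    return merged


if __name__ == "__main__":
    # Hypothetical sample data; the identifier below is a placeholder, not a real CVE.
    sample = {
        "twitter": [{"full_text": "New  exploit for CVE-0000-0000", "created_at": "2021-01-01"}],
        "forum": [
            {"body": "new exploit for cve-0000-0000", "posted": "2021-01-02"},
            {"body": "Selling access", "posted": "2021-01-03"},
        ],
    }
    for record in merge_sources(sample):
        print(record)
```

Keeping the source name on every record preserves provenance, so the strengths and weaknesses of each source noted above can still be weighed after the data has been merged.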
5.2.3 Robustness against data poisoning and misinformation

An interesting future direction of research is presented in [209] for their cyber event detection system. The avenue for future research is to check the ability of their system to perform robustly in the event of data poisoning attacks, or when dealing with fake accounts. This is vital for all areas of OSINT as the collected data is publicly available. Any AI system that relies on OSINT could be subject to incorrect or misleading information corrupting the normal functionality of the system. This could be unintentional exposure to misleading data or, in the case of a data poisoning attack, part of a purposeful and targeted cyber event that aims to corrupt the output of the AI system. As mentioned in the limitations section, the paper [205] provides an example of an AI-based system generating fake cyber threat intelligence reports. Such a system in the hands of a cyber adversary could manipulate multiple organisations into performing a set of desired actions. An interesting avenue for future AI-based OSINT research could focus on how penetration testers test robustness against such an attack. These tests could be based on current systems or systems that utilise AI for cyber threat intelligence or security operations.

Several research teams intend to take steps to apply their OSINT-based AI research to real-world applications or are taking steps to integrate with currently used cyber security applications. In the cyber threat intelligence domain, the authors of [116] plan to improve the integration of their AI model into currently used intrusion detection and prevention systems. A large-scale operations evaluation of the platform will also be performed to assess any additional requirements that may be required before real-world use. This is a common future research aspiration for many of the studies included in this paper. A further example in the threat intelligence domain can be found in [220], who plan to trial their TwitterOSINT system in a security operations environment. Some researchers do not have plans for an operational implementation but are making improvements to their system so it is suitable in a real-world context [97].

5.2.5 Alert generation and dissemination

As previously stated, a limited number of papers include the dissemination of intelligence generated by AI systems utilising OSINT. However, some researchers have included developing this capability in future research. This includes future directions outlined for the model proposed in [141], which uses Twitter data to identify the online sale of narcotics. The researchers wish to pursue script automation to generate reports that communicate the findings of their models to the Food and Drug Administration (FDA) and the Drug Enforcement Administration (DEA) using the required reporting templates of those organisations. Another example of this direction is the research team that created CyberDect. These researchers are seeking to improve their cyberbullying detection AI model by incorporating an alert system to notify the appropriate parties when bullying is occurring online [136].

5.2.6 The planning phase of the intelligence cycle

The planning phase of the intelligence cycle may also provide a future avenue for research. The majority of the models found in this study perform classification and are well suited to the processing and analysis phases of the intelligence cycle. With regard to planning, there are some examples of algorithms dedicated to this purpose, such as the aforementioned research conducted in [110]. A potential future research direction would be to use a similar system for OSINT planning and integrate it with subsequent classifiers for collection, processing, analysis, and perhaps even dissemination.

The results of the research questions presented in this systematic review show that there has been significant growth in research using AI for OSINT applications from 2011 to 2021. Despite a slowing of growth between the years 2020 and 2021 during the COVID-19 pandemic, the positive trend is maintained. Continued growth in this area of study is beneficial to business and industry as it becomes more vital to convert open data into actionable intelligence.

There is a fairly even geographic spread for the research covered by this systematic review, although, of the three highest-producing geographic areas, two have similar lingual and cultural backgrounds, and some areas are producing limited research, including South America, Central Asia, Africa and Eastern Europe. This is particularly important for systems employing natural language processing, as cultural and lingual context will be vital to ensure system effectiveness.

This is important as this systematic review identified a large number of papers being utilised for law enforcement and intelligence purposes. Universal OSINT applications in these areas will need to be employed in areas with varying cultural contexts. It should also be noted that there is a lack of papers focusing on the domains of penetration testing or offensive security, both of which may provide an interesting research opportunity for researchers investigating the use of AI with OSINT.
The study identified a wide range of different AI models and algorithms, which can be related to the various tasks needed for OSINT operations and investigations. However, most of these algorithms are used for classification tasks using binary or categorical cross-entropy. This can explain why the study also identified that the majority of the included research was focused on the processing or analysis phases of the intelligence cycle. These models are not well suited, or at the very least need to be adapted, if they are to be successfully implemented for planning and evaluation tasks. Researchers could also consider other models if they wish to investigate how these intelligence cycle stages can be served by AI models. Potentially, algorithms suited for planning tasks could be integrated with subsequent models employed in classification tasks.

This systematic review also identified that there was a limited degree of integration of pre-existing OSINT tools with AI models. Such integration could be beneficial to practitioners and industry by enabling them to integrate directly with their pre-existing tool set. There was also a small amount of research detailing work towards a fully AI-enabled OSINT tool. To create such a tool that is universal, or general, there will need to be contributions from various geographic regions, continued growth in this area of research, and direction taken to include the planning and evaluation stages of the intelligence cycle. This is most likely possible given the already large variety of OSINT tasks being performed by AI.

5.4 Limitations of this study

Several limitations can be outlined specifically concerning this systematic review. These can be highlighted by the domains of use and the scope of the study.

The domain research questions can be highlighted as containing some limitations. One such limitation is that the assessment of papers for their domain suitability can be subjective. This is particularly true when there is a degree of cross-over between domains. Some systems used for domain name generation algorithm detection have been designed for use within security operations centres, but the systems could also be well placed for cyber threat intelligence operations. There is no way to directly quantify this cross-over between domains of application, so the assessment is qualitative and could be open to debate in terms of the usefulness of the application to a particular domain. The same could be said for systems that are to be used in law enforcement investigations; intelligence services and government departments may also find the systems useful, but once again there is no direct way to quantify this and it is open to some discussion.

The scope of the study is also wide and, while this has uncovered a high number of interesting papers in this area, specific domains might be better served by a more focused review that focuses on a particular area. For example, a similar review could be conducted that focuses purely on law enforcement or the operations of intelligence agencies. The same could be done by focusing exclusively on AI-based OSINT applications used solely in a cyber threat intelligence domain of use.

6 Conclusion

This study completed a systematic review of current research combining AI algorithms for OSINT applications using the PRISMA framework. 163 articles were identified for inclusion in the study. It was found that research that combines the fields of OSINT and AI is a growing area of research and, currently, India and the United States are the main geographic centres engaging in this area of study. The identified systematic reviews in this area of study do not incorporate much of this growth as they do not include papers that post-date the year 2019.

OSINT combined with AI provides a method to reduce the workload of intelligence analysts across multiple domains working with large amounts of data. Various security-related domains could utilise this form of research, including security operations, law enforcement and intelligence services. This variety of potential use is due to the wide range of potential applications that can leverage benefits from AI-based OSINT systems. This is made possible by the number and variety of available AI algorithms that are each suited to specific and different tasks.

There is limited research that covers how AI can be used in conjunction with pre-existing OSINT tools, and it appears there has been little significant effort to produce a multi-faceted AI-based OSINT tool that could combine the various models available. The majority of the included research fits into either the collection, processing or analysis phases of the intelligence cycle. There has been limited research concerning the dissemination phase, so it would be beneficial to include this in the form of alerts to relevant parties. Future research could help with dissemination by categorising pre-analysed OSINT based on which persons need to be notified. This could be based on jurisdiction or security clearance.

Social media and online datasets form a significant portion of the types of OSINT that are used in the included research. Research that uses social media appears to be heavily focused on Twitter and this is especially true of threat intelligence systems. This could be due to the ease of access to Twitter as a data source. Future research should seek to engage with alternate data sources such as the dark web or hacker forums. Newer platforms such as Discord appear to be completely absent and could provide researchers with novel sources of data in the future.
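The dissemination suggestion made earlier in this section, categorising pre-analysed OSINT by which persons need to be notified, can be pictured with a short Python sketch. This is only one hypothetical reading of that idea and is not drawn from any of the reviewed systems: the `Finding` and `Recipient` types, the jurisdiction codes and the integer clearance scale are assumptions introduced for illustration.

```python
from dataclasses import dataclass
from typing import FrozenSet, List


@dataclass(frozen=True)
class Finding:
    """One piece of already-analysed OSINT (hypothetical structure)."""
    summary: str
    jurisdiction: str        # hypothetical region code, e.g. "AU"
    required_clearance: int  # hypothetical scale: higher means more restricted


@dataclass(frozen=True)
class Recipient:
    """A person or team that may be alerted (hypothetical structure)."""
    name: str
    jurisdictions: FrozenSet[str]
    clearance: int


def recipients_for(finding: Finding, recipients: List[Recipient]) -> List[str]:
    """Return the names of recipients who cover the finding's jurisdiction
    and hold at least the clearance the finding requires."""
    return [
        r.name
        for r in recipients
        if finding.jurisdiction in r.jurisdictions
        and r.clearance >= finding.required_clearance
    ]


if __name__ == "__main__":
    people = [
        Recipient("analyst_a", frozenset({"AU"}), 1),
        Recipient("analyst_b", frozenset({"AU", "US"}), 3),
        Recipient("analyst_c", frozenset({"US"}), 3),
    ]
    finding = Finding("suspicious forum post", "AU", 2)
    print(recipients_for(finding, people))
```

In a real deployment the routing rules would come from organisational policy rather than a two-field comparison; the sketch only shows that the categorisation step itself can be kept separate from the classifier that produced the finding.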
It was also found that there is a limited volume of research relating to combined OSINT AI applications that could be utilised for penetration testing purposes. OSINT is a valuable source of information for penetration testers and the domain could benefit from research that combines AI with OSINT in this context. Future research in this area is progressing in several directions that could either address some of these limitations or provide novel directions. These future research pathways include adding multi-lingual support to OSINT AI models, incorporating additional data sources, improving model robustness against misinformation and data poisoning, testing platforms in real-world situations and, finally, adding alert generation and dissemination functions.

Funding Open Access funding enabled and organized by CAUL and its Member Institutions.

Data availability Data in relation to this study is available on figshare

Declarations

Conflict of interest The authors have no conflict of interest to declare concerning this paper.

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

References

1. AI, O.: GPT-4 (2024). https://openai.com/gpt-4
2. Aboul-Ela, A.: Sublist3r. Github (2020). https://github.com/aboul3la/Sublist3r
3. Aggarwal, K.: DataSploit. https://github.com/DataSploit/datasploit/commits?author=KunalAggarwal (2023). Accessed 17 July 2023
4. Ahmed, H., Traore, I., Saad, S.: Detection of online fake news using n-gram analysis and machine learning techniques. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) 10618
media. Int. J. Cyber Warfare Terror. 9(1), 1–18 (2019). https://doi.org/10.4018/IJCWT.2019010101
7. Aliprandi, C., De Luca, A.E., Di Pietro, G., Raffaelli, M., Gazzè, D., La Polla, M.N., Marchetti, A., Tesconi, M.: Caper: Crawling and analysing facebook for intelligence purposes. In: ASONAM 2014 - Proceedings of the 2014 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining, pp. 665–669. IEEE (2014). https://doi.org/10.1109/ASONAM.2014.6921656
8. Alves, F., Ferreira, P.M., Bessani, A.: Design of a classification model for a twitter-based streaming threat monitor. In: Proceedings - 49th Annual IEEE/IFIP International Conference on Dependable Systems and Networks Workshop, DSN-W 2019, pp. 9–14 (2019). https://doi.org/10.1109/DSN-W.2019.00010
9. Alves, F., Bettini, A., Ferreira, P.M., Bessani, A.: Processing tweets for cybersecurity threat awareness. Inf. Syst. 95, 101586 (2021). https://doi.org/10.1016/j.is.2020.101586
10. API, T.: Tweepy API. https://github.com/tweepy/tweepy/blob/cc8dd493b1ed04b48a6dd4e47eeb9a9064f83024/docs/api.rst (2022). Accessed 05 December 2022
11. Attardi, G., Sartiano, D.: Bidirectional lstm models for dga classification. In: International Symposium on Security in Computing and Communication - SSCC 2018: Security in Computing and Communications, pp. 687–694. Springer (2019). https://doi.org/10.1007/978-981-13-5826-5_54
12. Backfried, G., Shalunts, G.: Sentiment analysis of media in German on the refugee crisis in Europe. In: Díaz, P., Bellamine Ben Saoud, N., Dugdale, J., Hanachi, C. (eds.) Third International Conference, ISCRAM-med 2016, Madrid, Spain, October 26-28, 2016, Proceedings. Lecture Notes in Business Information Processing, vol. 265, pp. 234–241. Springer, Cham (2016)
13. Badjatiya, P., Gupta, S., Gupta, M., Varma, V.: Deep learning for hate speech detection in tweets. In: Proceedings of the 26th International Conference on World Wide Web Companion - WWW '17 Companion, pp. 759–760. ACM Press, New York, New York (2017). https://doi.org/10.1145/3041021.3054223. http://dl.acm.org/citation.cfm?doid=3041021.3054223
14. Badr, E.M., Salam, M.A., Ali, M., Ahmed, H.: Social media sentiment analysis using machine learning and optimization techniques. Int. J. Comput. Appl. 178(41), 31–36 (2019). https://doi.org/10.5120/ijca2019919306
15. Batarseh, F.A.: In: Schintler, L.A., McNeely, C.L. (eds.) Business Intelligence Analytics, pp. 141–145. Springer, Cham (2022). https://doi.org/10.1007/978-3-319-32010-6_253
16. Behzadan, V., Aguirre, C., Bose, A., Hsu, W.: Corpus and deep learning classifier for collection of cyber threat indicators in twitter stream. In: 2018 IEEE International Conference on Big Data (Big Data), pp. 5002–5007. IEEE (2018). https://doi.org/10.1109/BigData.2018.8622506. https://ieeexplore.ieee.org/document/8622506/
17. Bijalwan, V., Kumar, V., Kumari, P., Pascual, J.: Knn based machine learning approach for text and document mining. Int. J. Datab. Theory Appl. 7(1), 61–70 (2014)
18. Bird, S.: Natural language toolkit. NLTK Team (2022). https://www.nltk.org/
19. Blindfuzzy: Low Hanging Fruit. https://github.com/blindfuzzy/
LNCS (December), pp. 127–138 (2017). https://doi.org/10.1007/ LHF (2023). Accessed 17 July 2023
978-3-319-69155-8_9 20. Blocklist: Block List. http://www.blocklist.de/en/index.html
5. Alamsyah, A., Rizkika, W., Nugroho, D.D.A., Renaldi, F., Saadah, (2023). Accessed 17 July 2023
S.: Dynamic large scale data on twitter using sentiment anal- 21. BuiltWith: BuiltWith. https://builtwith.com/ (2022). Accessed 05
ysis and topic modeling. In: 2018 6th International Confer- December 2022
ence on Information and Communication Technology (ICoICT), 22. Catania, C., García, S., Torres, P.: Deep convolutional neural net-
pp. 254–258. IEEE (2018). https://doi.org/10.1109/ICoICT.2018. works for dga detection. In: Argentine Congress of Computer
8528776. https://ieeexplore.ieee.org/document/8528776/ Science - CACIC 2018: Computer Science – CACIC 2018, pp.
6. Alguliyev, R.M., Aliguliyev, R.M., Abdullayeva, F.J.: The 327–340. Springer (2019). https://doi.org/10.1007/978-3-030-
improved lstm and cnn models for ddos attacks prediction in social 20787-8_23
2930 T. O. Browne et al.
23. Chauhan, S., Panda, N.K.: Osint tools and techniques. In: Hacking Web Intelligence, pp. 101–131. Elsevier, Amsterdam (2015). https://doi.org/10.1016/B978-0-12-801867-5.00006-9
24. Choudhary, C., Sivaguru, R., Pereira, M., Yu, B., Nascimento, A.C., De Cock, M.: Algorithmically generated domain detection and malware family classification. In: International Symposium on Security in Computing and Communication SSCC 2018: Security in Computing and Communications, pp. 640–655. Springer (2019). https://doi.org/10.1007/978-981-13-5826-5_50
25. CMS, W.: What CMS. https://whatcms.org/ (2022). Accessed 05 December 2022
26. Computing Machinary (ACM), A.: ACM Digital Library. https://dl.acm.org/ (2021). Accessed 08 November 2021
27. Corporation, M.: ATT&CK. https://attack.mitre.org/ (2023). Accessed 17 July 2023
28. Danda, M.: Open Source Intelligence and Cybersecurity. PhD thesis, Webster University (2019). https://mattdanda.com/wp-content/uploads/2019/05/Paper-OSINT.pdf
29. Das Bhattacharjee, S., Talukder, A., Balantrapu, B.V.: Active learning based news veracity detection with feature weighting and deep-shallow fusion. In: 2017 IEEE International Conference on Big Data (Big Data), vol. 2018-January, pp. 556–565. IEEE (2017). https://doi.org/10.1109/BigData.2017.8257971. http://ieeexplore.ieee.org/document/8257971/
30. De Smedt, T., De Pauw, G., Van Ostaeyen, P.: Automatic Detection of Online Jihadist Hate Speech. Technical Report February, University of Antwerp, Antwerp (Mar 2018). https://doi.org/10.13140/rg.2.2.28155.41767. arXiv:1803.04596
31. Di Pietro, G., Aliprandi, C., De Luca, A.E., Raffaelli, M., Soru, T.: Semantic crawling: an approach based on named entity recognition. In: ASONAM 2014 - Proceedings of the 2014 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining (2014). https://doi.org/10.1109/ASONAM.2014.6921661. https://ieeexplore.ieee.org/document/6921661
32. DiBona, P., Ho, S.-S.: Automated information foraging for sensemaking. In: Pham, T. (ed.) Artificial Intelligence and Machine Learning for Multi-Domain Operations Applications, p. 12. SPIE (2019). https://doi.org/10.1117/12.2518893. https://www.spiedigitallibrary.org/conference-proceedings-of-spie/11006/110060D/Automated-information-foraging-for-sensemaking/10.1117/12.2518893.short https://www.spiedigitallibrary.org/conference-proceedings-of-spie/11006/2518893/Automated-information-f
33. Dionisio, N., Alves, F., Ferreira, P.M., Bessani, A.: Cyberthreat detection from twitter using deep neural networks. In: 2019 International Joint Conference on Neural Networks (IJCNN), vol. 2019-July, pp. 1–8. IEEE (2019). https://doi.org/10.1109/IJCNN.2019.8852475. https://ieeexplore.ieee.org/document/8852475/
34. Dionisio, N., Alves, F., Ferreira, P.M., Bessani, A.: Towards end-to-end cyberthreat detection from twitter using multi-task learning. In: 2020 International Joint Conference on Neural Networks (IJCNN), pp. 1–8. IEEE (2020). https://doi.org/10.1109/IJCNN48605.2020.9207159. https://ieeexplore.ieee.org/document/9207159/
35. Direct, S.: Science Direct. https://www.sciencedirect.com/ (2021). Accessed 15 November 2021
36. Disboard: Disboard - Cyber Security. https://disboard.org/search?keyword=cyber+security (2023). Accessed 27 January 2023
37. Discord: Discord. https://discord.com/ (2023). Accessed 27 January 2023
38. Drichel, A., Drury, V., Brandt, J., Meyer, U.: Finding phish in a haystack: a pipeline for phishing classification on certificate transparency logs. In: The 16th International Conference on Availability, Reliability and Security, pp. 1–12. ACM, New York, NY (2021). https://doi.org/10.1145/3465481.3470111. https://dl.acm.org/doi/10.1145/3465481.3470111
39. Drus, Z., Khalid, H.: Sentiment analysis in social media and its application: systematic literature review. Procedia Comput. Sci. 161, 707–714 (2019). https://doi.org/10.1016/j.procs.2019.11.174
40. Dughyala, N., Potluri, S., Sumesh, K.J., Pavithran, V.: Automating the detection of cyberstalking. In: Proceedings of the 2nd International Conference on Electronics and Sustainable Communication Systems, ICESC 2021, pp. 887–892 (2021). https://doi.org/10.1109/ICESC51422.2021.9532858
41. Ebrahimi, M., Suen, C.Y., Ormandjieva, O.: Detecting predatory conversations in social media by deep convolutional neural networks. Digit. Investig. 18, 33–49 (2016). https://doi.org/10.1016/j.diin.2016.07.001
42. Edwards, M., Rashid, A., Rayson, P.: A systematic survey of online data mining technology intended for law enforcement. ACM Comput. Surv. (2015). https://doi.org/10.1145/2811403
43. Eiji Aramaki, Sachiko Maskawa, M.M.: Twitter catches the flu: detecting influenza epidemics using twitter. In: EMNLP 11: proceedings of the conference on empirical methods in natural language processing, pp. 1568–1576 (2011). https://doi.org/10.5555/2145432.2145600. https://dl.acm.org/doi/10.5555/2145432.2145600
44. Ekwunife, N., Ekwunife, N.: National security intelligence through social network data mining. In: Proceedings - 2020 IEEE International Conference on Big Data, Big Data 2020, pp. 2270–2273 (2020). https://doi.org/10.1109/BigData50022.2020.9377940
45. Eldridge, C., Hobbs, C., Moran, M.: Fusing algorithms and analysts: open-source intelligence in the age of ‘big data’. Intell. Natl. Secur. 33(3), 391–406 (2018). https://doi.org/10.1080/02684527.2017.1406677
46. Evangelista, J.R.G., Sassi, R.J., Romero, M., Napolitano, D.: Systematic literature review to investigate the application of open source intelligence (osint) with artificial intelligence. J. Appl. Secur. Res. 16(3), 345–369 (2021). https://doi.org/10.1080/19361610.2020.1761737
47. Exploit-db: Google Hacking Database. https://www.exploit-db.com/google-hacking-database (2022). Accessed 13 December 2022
48. Fang, Y., Liu, Y., Huang, C., Liu, L.: Fastembed: predicting vulnerability exploitation possibility based on ensemble machine learning algorithm. PLoS ONE 15(2), 0228439 (2020). https://doi.org/10.1371/journal.pone.0228439
49. Farina, A., Ortenzi, L., Ristic, B., Skvortsov, A.: Integrated sensor systems and data fusion for homeland protection. In: Academic Press Library in Signal Processing: Volume 2 Communications and Radar Signal Processing vol. 2, pp. 1245–1320. Elsevier Masson SAS (2014). https://doi.org/10.1016/B978-0-12-396500-4.00022-3. https://linkinghub.elsevier.com/retrieve/pii/B9780123965004000223
50. Fenza, G., Gallo, M., Loia, V., Volpe, A.: Cognitive name-face association through context-aware graph neural network. Neural Comput. Appl. (2021). https://doi.org/10.1007/s00521-021-06617-z
51. Framework, O.: OSINT Framework (2023)
52. Galán-García, P., Puerta, J.G.D.L., Gómez, C.L., Santos, I., Bringas, P.G.: Supervised machine learning for the detection of troll profiles in twitter social network: application to a real case of cyberbullying. Log. J. IGPL 24(1), 048 (2015). https://doi.org/10.1093/jigpal/jzv048
53. Gao, J., Yin, Y., Myers, K.R., Lakhani, K.R., Wang, D.: Potentially long-lasting effects of the pandemic on scientists. Nat. Commun. 12, 6188 (2021). https://doi.org/10.1038/s41467-021-26428-z
54. García Lozano, M., Schreiber, J., Brynielsson, J.: Tracking geographical locations using a geo-aware topic model for analyzing
social media data. Decis. Support Syst. 99, 18–29 (2017). https://doi.org/10.1016/j.dss.2017.05.006
55. García Lozano, M., Brynielsson, J., Franke, U., Rosell, M., Tjörnhammar, E., Varga, S., Vlassov, V.: Veracity assessment of online data. Decis. Support Syst. 129(July 2019), 113132 (2020). https://doi.org/10.1016/j.dss.2019.113132
56. Garzia, F., Cusani, R., Borghini, F., Saltini, B., Lombardi, M., Ramalingam, S.: Perceived risk assessment through open-source intelligent techniques for opinion mining and sentiment analysis: the case study of the papal basilica and sacred convent of saint francis in assisi, italy. In: 2018 International Carnahan Conference on Security Technology (ICCST), vol. 2018-Octob, pp. 1–5. IEEE (2018). https://doi.org/10.1109/CCST.2018.8585519. https://ieeexplore.ieee.org/document/8585519/
57. Gautam, A.S., Gahlot, Y., Kamat, P.: Hacker forum exploit and classification for proactive cyber threat intelligence. In: Lecture Notes in Networks and Systems, vol. 98, pp. 279–285. Springer (2020). https://doi.org/10.1007/978-3-030-33846-6_32
58. Geetha, R., Karthika, S., Kumaraguru, P.: Tweet-scan-post: a system for analysis of sensitive private data disclosure in online social media. Knowl. Inf. Syst. 63(9), 2365–2404 (2021). https://doi.org/10.1007/s10115-021-01592-2
59. Ghazi, Y., Anwar, Z., Mumtaz, R., Saleem, S., Tahir, A.: A supervised machine learning based approach for automatically extracting high-level threat intelligence from unstructured sources. In: 2018 International Conference on Frontiers of Information Technology, FIT 2018, pp. 129–134. IEEE (2018). https://doi.org/10.1109/FIT.2018.00030. https://ieeexplore.ieee.org/document/8616979
60. Giacalone, M., Buondonno, A., Romano, A., Santarcangelo, V.: Innovative methods for the development of a notoriety system. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) 10147 LNAI, 218–225 (2017). https://doi.org/10.1007/978-3-319-52962-2_19
61. Giachanou, A., Crestani, F.: Like it or not - a survey of twitter sentiment analysis methods. ACM Comput. Surv. 49(2), 1–41 (2016). https://doi.org/10.1145/2938640
62. Goel, S., Sachdeva, N., Kumaraguru, P., Subramanyam, A.V., Gupta, D.: Pichunt: Social Media Image Retrieval for Improved Law Enforcement. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) 10046 LNCS, pp. 206–223 (2016). https://doi.org/10.1007/978-3-319-47880-7_13. arXiv:1608.00905
63. Goldszmidt, M., Najork, M., Paparizos, S.: Boot-strapping language identifiers for short colloquial postings. In: Lecture Notes in Computer Science (including Subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) vol. 8189, LNAI, pp. 95–111 (2013). https://doi.org/10.1007/978-3-642-40991-2_7
64. Gong, S., Lee, C.: Efficient Data Noise-Reduction for Cyber Threat Intelligence System, vol. 715, pp. 591–597. Springer, New York (2021)
65. Google: Google Data Studio. Google. https://datastudio.google.com/ (2022)
66. Google: Google Scholar (2021)
67. Google: Google Sheets. Google (2022)
68. Grandi, R., Neri, F.: Sentiment analysis and city branding. In: Catania, B., Cerquitelli, T., Chiusano, S., Guerrini, G., Kämpf, M., Kemper, A., Novikov, B., Palpanas, T., Pokorný, J., Vakali, A. (eds.) Advances in Intelligent Systems and Computing. Advances in Intelligent Systems and Computing, vol. 241, pp. 339–349. Springer, Cham (2014). https://doi.org/10.1007/978-3-319-01863-8_36. http://link.springer.com/10.1007/978-3-319-01863-8
69. Grant, T.: Building an ontology for planning attacks that minimize collateral damage: literature survey. In: 14th International Conference on Cyber Warfare and Security, ICCWS 2019, pp. 78–86 (2019)
70. Grepp, A.: Grepp.app. https://grep.app/ (2022). Accessed 0 December 2022
71. Gupta, A., Pruthi, J., Sahu, N.: Sentiment analysis of tweets using machine learning approach. Int. J. Comput. Sci. Mob. Comput. 6(4), 444–458 (2017)
72. Haddi, E., Liu, X., Shi, Y.: The role of text pre-processing in sentiment analysis. Procedia Comput. Sci. 17, 26–32 (2013). https://doi.org/10.1016/j.procs.2013.05.005
73. Haoxiang, D.W.: Emotional analysis of bogus statistics in social media. J. Ubiquitous Comput. Commun. Technol. 2(3), 178–186 (2020). https://doi.org/10.36548/jucct.2020.3.006
74. Hasan, A., Moin, S., Karim, A., Shamshirband, S.: Machine learning-based sentiment analysis for twitter accounts. Math. Comput. Appl. 23(1), 11 (2018). https://doi.org/10.3390/mca23010011
75. Hassan, N.A.: Open Source Intelligence Methods and Tools: A Practical Guide to Online Intelligence. Apress, New York (2018)
76. Hassan, N.A., Hijazi, R.: Search engine techniques. In: Open Source Intelligence Methods and Tools, pp. 127–201. Apress, Berkeley, CA (2018)
77. Hernandez Mediná, M.J., Pinzón Hernández, C.C., Díaz López, D.O., Garcia Ruiz, J.C., Pinto Rico, R.A.: Open source intelligence (osint) in a colombian context and sentiment analysis. Revista vínculos 15(2), 195–214 (2018). https://doi.org/10.14483/2322939X.13504
78. Hernandez-Suarez, A., Sanchez-Perez, G., Toscano-Medina, K., Martinez-Hernandez, V., Perez-Meana, H., Olivares-Mercado, J., Sanchez, V.: Social sentiment sensor in twitter for predicting cyber-attacks using regularization. Sensors 18(5), 1380 (2018). https://doi.org/10.3390/s18051380
79. Herrera-Cubides, J.F., Gaona-García, P.A., Sánchez-Alonso, S.: Open-source intelligence educational resources: a visual perspective analysis. Appl. Sci. (Switzerland) 10(21), 1–25 (2020). https://doi.org/10.3390/app10217617
80. Hiransha, M.E.A.G., Gopalakrishnan, E.A., Menon, V.K., Soman, K.P.: Nse stock market prediction using deep-learning models. Procedia Comput. Sci. 132(Iccids), 1351–1362 (2018). https://doi.org/10.1016/j.procs.2018.05.050
81. Holder, E., Wang, N.: Correction to: Explainable artificial intelligence (xai) interactively working with humans as a junior cyber analyst. Hum. Intell. Syst. Integr. (2021). https://doi.org/10.1007/s42454-021-00039-x
82. Holt, T.J., Bossler, A.M.: The palgrave handbook of international cybercrime and cyberdeviance. In: The Palgrave Handbook of International Cybercrime and Cyberdeviance, pp. 1–1489 (2020). Chap. 7. https://doi.org/10.1007/978-3-319-78440-3
83. Hoppa, M.A., Debb, S.M., Hsieh, G., KC, B.: Twitterosint: automated open source intelligence collection, analysis & visualization tool. In: Annual Review of CyberTherapy and Telemedicine, pp. 121–128 (2019). https://www.proquest.com/docview/2153621548?pq-origsite=gscholar&fromopenview=true
84. Howells, K., Ertugan, A.: Applying fuzzy logic for sentiment analysis of social media network data in marketing. Procedia Comput. Sci. 120(January), 664–670 (2017). https://doi.org/10.1016/j.procs.2017.11.293
85. Huang, Y.T., Lin, C.Y., Guo, Y.R., Lo, K.C., Sun, Y.S., Chen, M.C.: Open source intelligence for malicious behavior discovery and interpretation. IEEE Trans. Dependable Secure Comput. 5971(c), 1–14 (2021). https://doi.org/10.1109/TDSC.2021.3119008
86. Hutto, C.J. and Gilbert, E.: Vader: A parsimonious rule-based model for sentiment analysis of social media text. In: Eighth International AAAI Conference on Weblogs and Social Media, p. 18 (2014)
87. IDC: Data Creation and Replication Will Grow at a Faster Rate than Installed Storage Capacity, According to the IDC Global DataSphere and StorageSphere Forecast (2021). https://www.idc.com/getdoc.jsp?containerId=prUS47560321 Accessed 02 November 2021
88. IEEE: IEE Explore. https://ieeexplore.ieee.org/ (2021). Accessed 07 November 2021
89. Iorga, D., Corlatescu, D., Grigorescu, O., Sandescu, C., Dascalu, M., Rughinis, R.: Early detection of vulnerabilities from news websites using machine learning models. In: Proceedings - RoEduNet IEEE International Conference, vol. 2020-December (2020). https://doi.org/10.1109/RoEduNet51892.2020.9324852
90. Ish, D., Ettinger, J., Ferris, C.: Evaluating the effectiveness of artificial intelligence systems in intelligence. Analysis (2021). https://doi.org/10.7249/rr-a464-1
91. ITU: Measuring Digital Development Facts And Figures 2020, pp. 1–15. ITU Publications (2020)
92. Iwona Chomiak-Orsa, Artur Rot, B.B.: Artificial intelligence in cybersecurity: The use of ai along the cyber kill chain. In: International Conference on Computational Collective Intelligence, pp. 406–416. Springer (2019). https://doi.org/10.1007/978-3-030-28374-2. https://doi.org/10.1007/978-3-030-28374-2_35
93. Jain, P., Bendapudi, H., Rao, S.: Eequest: an event extraction and query system. In: Proceedings of the 9th Annual ACM India Conference, pp. 59–66. ACM, New York, NY, USA (2016). https://doi.org/10.1145/2998476.2998482. https://dl.acm.org/doi/10.1145/2998476.2998482
94. Jain, S., Sharma, V., Kaushal, R.: Towards automated real-time detection of misinformation on twitter. In: 2016 International Conference on Advances in Computing, Communications and Informatics (ICACCI), pp. 2015–2020. IEEE (2016). https://doi.org/10.1109/ICACCI.2016.7732347. http://ieeexplore.ieee.org/document/7732347/
95. Jeon, S., Moon, J.: Malware-detection method with a convolutional recurrent neural network using opcode sequences. Inf. Sci. 535, 1–15 (2020). https://doi.org/10.1016/j.ins.2020.05.026
96. Jin, Z., Cao, J., Guo, H., Zhang, Y., Luo, J.: Multimodal fusion with recurrent neural networks for rumor detection on microblogs. In: Proceedings of the 25th ACM International Conference on Multimedia, pp. 795–816. ACM, New York, NY, USA (2017). https://doi.org/10.1145/3123266.3123454. https://dl.acm.org/doi/10.1145/3123266.3123454
97. Johnsen, J.W., Franke, K.: The impact of preprocessing in natural language for open source intelligence and criminal investigation. In: 2019 IEEE International Conference on Big Data (Big Data), pp. 4248–4254. IEEE (2019). https://doi.org/10.1109/BigData47090.2019.9006006. https://ieeexplore.ieee.org/document/9006006/
98. Johnson, D.O.: Overview of artificial intelligence. In: Medical Applications of Artificial Intelligence, pp. 27–46. CRC Press, Boca Raton (2013). https://doi.org/10.1201/b15618-6
99. Josan, G.S., Kaur, J.: Lstm network based malicious domain name detection. Int. J. Eng. Adv. Technol. 8(6), 3187–3191 (2019). https://doi.org/10.35940/ijeat.F8809.088619
100. Ju, Y., Li, Q., Liu, H.Y., Cui, X.M., Wang, Z.H.: Study on application of open source intelligence from social media in the military. J. Phys. Conf. Ser. (2020). https://doi.org/10.1088/1742-6596/1507/5/052017
101. Jung, D., Tuan, V.T., Tran, D.Q., Park, M., Park, S.: Conceptual framework of an intelligent decision support system for smart city disaster management. Appl. Sci. (Switzerland) (2020). https://doi.org/10.3390/app10020666
102. Jyothsna, P.V., Prabha, G., Shahina, K.K., Vazhayil, A.: Detecting dga using deep neural networks (dnns). In: International Symposium on Security in Computing and Communication - SSCC 2018: Security in Computing and Communications, pp. 695–706. Springer (2019). https://doi.org/10.1007/978-981-13-5826-5_55
103. Kaiser, S., Ferens, K.: Variance fractal dimension feature selection for detection of cyber security attacks. In: Transactions on Computational Science and Computational Intelligence, pp. 1029–1045 (2021). https://doi.org/10.1007/978-3-030-70296-0_82
104. Kallus, N.: On the predictive power of web intelligence and social media the best way to predict the future is to tweet it. In: Lecture Notes in Computer Science (including Subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), vol. 9546, pp. 26–45 (2016). https://doi.org/10.1007/978-3-319-29009-6_2
105. Karthika, S., Bhalaji, N., Chithra, S., Sri Harikarthick, N., Bhattacharya, D.: Noregint-a tool for performing osint and analysis from social media. In: Lecture Notes in Networks and Systems, vol. 173 LNNS, pp. 971–980. Springer (2021). https://doi.org/10.1007/978-981-33-4305-4_71
106. Kashyap, G.S., Malik, K., Wazir, S., Khan, R.: Using machine learning to quantify the multimedia risk due to fuzzing. Multimedia Tools and Applications (0123456789) (2021) https://doi.org/10.1007/s11042-021-11558-9
107. Katz, B.: The analytic edge: Leveraging Emerging Technologies to Transform Intelligence Analysis. Technical Report, Center for Strategic and International Studies (CSIS) (2020). https://www.jstor.org/stable/resrep26414?seq=1#metadata_info_tab_contents
108. Kawaguchi, Y., Yamada, A., Ozawa, S.: Ai web-contents analyzer for monitoring underground marketplace. In: Lecture Notes in Computer Science (including Subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), vol. 10638 LNCS, pp. 888–896 (2017). https://doi.org/10.1007/978-3-319-70139-4_90
109. Kelly, J., Delaus, M., Hemberg, E., Orreilly, U.M.: Adversarially adapting deceptive views and reconnaissance scans on a software defined network. In: 2019 IFIP/IEEE Symposium on Integrated Network and Service Management (IM), pp. 49–54 (2019)
110. Khalid, W., Soleymani, I., Mortensen, N.H., Sigsgaard, K.V.: Ai-Based Maintenance Scheduling for Offshore Oil and Gas Platforms, vol. 2021-May, pp. 1–6. IEEE (2021). https://doi.org/10.1109/RAMS48097.2021.9605794. https://ieeexplore.ieee.org/document/9605794/
111. Khan, M., Rehman, O., Rahman, I.M.H., Ali, S.: Lightweight testbed for cybersecurity experiments in scada-based systems. In: 2020 International Conference on Computing and Information Technology (ICCIT-1441), pp. 1–5. IEEE (2020). https://doi.org/10.1109/ICCIT-144147971.2020.9213791. https://ieeexplore.ieee.org/document/9213791/
112. Khan, J.Y., Khondaker, M.T.I., Afroz, S., Uddin, G., Iqbal, A.: A benchmark study of machine learning models for online fake news detection. Mach. Learn. Appl. 4(May), 100032 (2021). https://doi.org/10.1016/j.mlwa.2021.100032
113. Khurana, N., Mittal, S., Piplai, A., Joshi, A.: Preventing poisoning attacks on ai based threat intelligence systems. IEEE Int. Workshop Mach. Learn. Signal Process. MLSP (2019). https://doi.org/10.1109/MLSP.2019.8918803
114. Kleissner, P.: Intelligence X (2022). https://intelx.io/ Accessed 15 May 2022
115. Køien, G.M.: Initial reflections on the use of augmented cognition in derailing the kill chain. In: Lecture Notes in Computer Science (including Subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), vol. 12776 LNAI, pp. 433–451. Springer (2021). https://doi.org/10.1007/978-3-030-78114-9_30
116. Koloveas, P., Chantzios, T., Alevizopoulou, S., Skiadopoulos, S., T.P., Llinas, J. (eds.) SPIE Defense + Security, vol. 1020704,
Tryfonopoulos, C.: Intime: a machine learning-based framework p. 1020704 (2017). https://doi.org/10.1117/12.2263546. https://
for gathering and leveraging web data to cyber-threat intelli- www.spiedigitallibrary.org/conference-proceedings-of-spie/
gence. Electronics (Switzerland) (2021). https://doi.org/10.3390/ 10207/1020704/Using-soft-hard-fusion-for-misinformation-
electronics10070818 detection-and-pattern-of/10.1117/12.2263546.shorthttp://
117. Kotinas, I., Fakotakis, N.: Text analysis for decision making under proceedings.spiedigitallibrary.org/proceeding.aspx?doi=10.
adversarial environments. In: ACM International Conference Pro- 1117/12.2263546
ceeding Series (2018). https://doi.org/10.1145/3200947.3201018 128. Lewis, S.J.: OnionScan. https://github.com/s-rah/onionscan
118. Kotzé, E., Senekal, B.A., Daelemans, W.: Automatic classification (2017). Accessed 17 July 2023
of social media reports on violent incidents in South Africa using 129. Liao, X., Yuan, K., Wang, X., Li, Z., Xing, L., Beyah, R.: Acing the
machine learning. S. Afr. J. Sci. 116(3–4), 4–11 (2020). https:// ioc game. In: Proceedings of the 2016 ACM SIGSAC Conference
doi.org/10.17159/sajs.2020/6557 on Computer and Communications Security, pp. 755–766. ACM,
119. Kuiler, E.W.: Natural language processing (nlp). In: Encyclopedia New York, NY, USA (2016). https://doi.org/10.1145/2976749.
of Big Data, pp. 679–682. Springer, Cham (2022). https://doi.org/ 2978315. https://dl.acm.org/doi/10.1145/2976749.2978315
10.1007/978-3-319-32010-6_250 130. Lison, P., Mavroeidis, V.: Automatic Detection of Malware-
120. Kumar, M.S., Ben-Othman, J., Srinivasagan, K.G., Krishnan, Generated Domains with Recurrent Neural Models (2017).
G.U.: Artificial intelligence managed network defense sys- arXiv:1709.07102
tem against port scanning outbreaks. In: Proceedings - Inter- 131. Liu, X., Keliris, A., Konstantinou, C., Sazos, M., Maniatakos, M.:
national Conference on Vision Towards Emerging Trends in Assessment of low-budget targeted cyberattacks against power
Communication and Networking, ViTECoN 2019, pp. 1–5. systems. In: IFIP/IEEE International Conference on Very Large
IEEE (2019). https://doi.org/10.1109/ViTECoN.2019.8899380. Scale Integration - System on a Chip, pp. 232–256. Springer
https://ieeexplore.ieee.org/document/8899380 (2019). https://doi.org/10.1007/978-3-030-23425-6_12
121. Layton, R., Perez, C., Birregah, B., Watters, P., Lemercier, M.: Indirect Information Linkage for Osint Through Authorship Analysis of Aliases. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) 7867 LNAI, pp. 36–46 (2013). https://doi.org/10.1007/978-3-642-40319-4_4
122. Le, B.-D., Wang, G., Nasim, M., Babar, M.A.: Gathering cyber threat intelligence from twitter using novelty classification. In: 2019 International Conference on Cyberworlds (CW), pp. 316–323. IEEE (2019). https://doi.org/10.1109/CW.2019.00058. https://ieeexplore.ieee.org/document/8919107/
123. Lee, J., Moon, M., Shin, K., Kang, S.: Cyber threats prediction model based on artificial neural networks using quantification of open source intelligence (osint). J. Inf. Secur. 20(3), 115–123 (2020). https://doi.org/10.33778/kcsa.2020.20.3.115
124. Leibowicz, C.R., McGregor, S., Ovadya, A.: The deepfake detection dilemma: A multistakeholder exploration of adversarial dynamics in synthetic media. In: Proceedings of the 2021 AAAI/ACM Conference on AI, Ethics, and Society, vol. 1, pp. 736–744. ACM, New York, NY, USA (2021). https://doi.org/10.1145/3461702.3462584. https://dl.acm.org/doi/10.1145/3461702.3462584
125. Levchuk, G.M., Fouse, A., Pattipati, K., Serfaty, D., McCormack, R.: Active learning and structure adaptation in teams of heterogeneous agents: designing organizations of the future. In: Llinas, J., Hanratty, T.P. (eds.) Next-Generation Analyst VI, vol. 1065305, p. 4. SPIE, Orlando FL (2018). https://doi.org/10.1117/12.2305875. https://www.spiedigitallibrary.org/conference-proceedings-of-spie/10653/1065305/Active-learning-and-structure-adaptation-in-teams-of-heterogeneous-agents/10.1117/12.2305875.short. https://www.spiedigitallibrary.org/conference-proceedings-of-spie/10653/2305
126. Levchuk, G., Pattipati, K., Fouse, A., Serfaty, D.: Application of free energy minimization to the design of adaptive multi-agent teams. In: Hall, R.D., Blowers, M., Williams, J. (eds.) Disruptive Technologies in Sensors and Sensor Systems, vol. 10206, p. 102060 (2017). https://doi.org/10.1117/12.2263542. https://www.spiedigitallibrary.org/conference-proceedings-of-spie/10206/102060E/Application-of-free-energy-minimization-to-the-design-of-adaptive/10.1117/12.2263542.short http://proceedings.spiedigitallibrary.org/proceeding.aspx?doi=10.1117/12.2263542
127. Levchuk, G., Shabarekh, C.: Using soft-hard fusion for misinformation detection and pattern of life analysis in osint. In: Hanratty,
132. Liu, X., Nourbakhsh, A., Li, Q., Shah, S., Martin, R., Duprey, J.: Reuters tracer: Toward automated news production using large scale social media data. In: 2017 IEEE International Conference on Big Data (Big Data), vol. 2018-Janua, pp. 1483–1493. IEEE (2017). https://doi.org/10.1109/BigData.2017.8258082. http://ieeexplore.ieee.org/document/8258082/
133. Liu, S., Wang, Y., Zhang, J., Chen, C., Xiang, Y.: Addressing the class imbalance problem in twitter spam detection using ensemble learning. Comput. Secur. 69(September 2014), 35–49 (2017). https://doi.org/10.1016/j.cose.2016.12.004
134. Long Jiang, Mo Yu, Ming Zhou, Xiaohua Liu, T.Z.: Target-dependent twitter sentiment classification. In: HLT ’11: Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, pp. 151–160. ACM (2011). https://doi.org/10.5555/2002472.2002492
135. Longo, L., Goebel, R., Lecue, F., Kieseberg, P., Holzinger, A.: Explainable artificial intelligence: Concepts, applications, research challenges and visions. In: Machine Learning and Knowledge Extraction. CD-MAKE 2020. Lecture Notes in Computer Science, vol. 12279, pp. 1–16 (2020). https://doi.org/10.1007/978-3-030-57321-8_1
136. López-Martínez, A., García-Díaz, J.A., Valencia-García, R., Ruiz-Martínez, A.: Cyberdect. A novel approach for cyberbullying detection on twitter. In: International Conference on Technologies and Innovation, pp. 109–121 (2019). https://doi.org/10.1007/978-3-030-34989-9_9
137. Lozano, M.G., Franke, U., Rosell, M., Vlassov, V.: Towards automatic veracity assessment of open source information. In: 2015 IEEE International Congress on Big Data, pp. 199–206. IEEE (2015). https://doi.org/10.1109/BigDataCongress.2015.36. http://ieeexplore.ieee.org/document/7207220/
138. Luber, M., Weisser, C., Säfken, B., Silbersdorff, A., Kneib, T., Kis-Katos, K.: Identifying topical shifts in twitter streams: an integration of non-negative matrix factorisation, sentiment analysis and structural break models for large scale data. In: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) 12887 LNCS, pp. 33–49 (2021). https://doi.org/10.1007/978-3-030-87031-7_3
139. Luo, Y., Ao, S., Luo, N., Su, C., Yang, P., Jiang, Z.: Extracting threat intelligence relations using distant supervision and neural networks. In: Peterson, G., Shenoi, S. (eds.) IFIP Advances in Information and Communication Technology. IFIP Advances in Information and Communication Technology, vol. 306, pp.
193–211. Springer, Cham (2021). https://doi.org/10.1007/978-3-030-88381-2_10. https://link.springer.com/10.1007/978-3-030-88381-2
140. Maciolek, P., Dobrowolski, G.: Cluo: web-scale text mining system for open source intelligence purposes. Comput. Sci. 14(1), 45 (2013). https://doi.org/10.7494/csci.2013.14.1.45
141. Mackey, T., Kalyanam, J., Klugman, J., Kuzmenko, E., Gupta, R.: Solution to detect, classify, and report illicit online marketing and sales of controlled substances via twitter: using machine learning and web forensics to combat digital opioid access. J. Med. Internet Res. 20(4), 10029 (2018). https://doi.org/10.2196/10029
142. Madakam, S., Holmukhe, R.M., Kumar Jaiswal, D.: The future digital work force: robotic process automation (rpa). J. Inf. Syst. Technol. Manag. 16, 1–17 (2019). https://doi.org/10.4301/S1807-1775201916001
143. Mahaini, M.I., Li, S.: Detecting cyber security related twitter accounts and different sub-groups: a multi-classifier approach. In: ASONAM 2012 The 2021 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining, pp. 1–11. ACM, Netherlands (2021). https://doi.org/10.1145/3487351.3492716. https://kar.kent.ac.uk/90995/
144. Major, M., Fugate, S., Mauger, J., Ferguson-Walter, K.: Creating cyber deception games. In: Proceedings - 2019 IEEE 1st International Conference on Cognitive Machine Intelligence, CogMI, pp. 102–111 (2019). https://doi.org/10.1109/CogMI48466.2019.00023
145. Maksimova, E.A., Sadovnikova, N.P., Baranov, V.V., Gromov, Y.Y., Lauta, O.S., Tret’yakova, L.V.: Robot technological system of analysis of cybersecurity information systems and communication networks. In: Journal of Physics: Conference Series, vol. 1661 (2020). https://doi.org/10.1088/1742-6596/1661/1/012119
146. Malik, J., Akhunzada, A., Bibi, I., Imran, M., Musaddiq, A., Kim, S.W.: Hybrid deep learning: an efficient reconnaissance and surveillance detection mechanism in sdn. IEEE Access 8, 134695–134706 (2020). https://doi.org/10.1109/ACCESS.2020.3009849
147. Maltego: Maltego. Maltego (2022). https://www.maltego.com/
148. Mani, G.S.: Data Processing and Analytics for National Security Intelligence: An Overview, vol. 71, pp. 293–315. Springer, New York (2022). https://doi.org/10.1007/978-981-16-2937-2_20
149. Mantere, M., Sailio, M., Noponen, S.: Detecting anomalies in printed intelligence factory network. In: Lecture Notes in Computer Science (including Subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), vol. 8924, pp. 1–16 (2015). https://doi.org/10.1007/978-3-319-17127-2_1
150. Marco Pennacchiotti, A.-M.P.: A machine learning approach to twitter user classification. In: Proceedings of the International AAAI Conference on Web and Social Media, pp. 281–288 (2021). https://ojs.aaai.org/index.php/ICWSM/article/view/14139
151. Marin, E., Almukaynizi, M., Shakarian, P.: Inductive and deductive reasoning to assist in cyber-attack prediction. In: 2020 10th Annual Computing and Communication Workshop and Conference, CCWC 2020, pp. 262–268 (2020). https://doi.org/10.1109/CCWC47524.2020.9031154
152. Marlin, T.J.: Detecting Fake News by Combining Cybersecurity, Open-source Intelligence, and Data Science. PhD Thesis, Utica College (2019). https://search.proquest.com/docview/2346618330?accountid=14478
153. Marques, C., Malta, S., Magalhães, J.P.: Dns dataset for malicious domains detection. Data Brief 38, 107342 (2021). https://doi.org/10.1016/j.dib.2021.107342
154. Martinez Monterrubio, S.M., Noain-Sánchez, A., Verdú Pérez, E., González Crespo, R.: Coronavirus fake news detection via medosint check in health care official bulletins with cbr explanation: The way to find the real information source through osint, the verifier tool for official journals. Inf. Sci. 574, 210–237 (2021). https://doi.org/10.1016/j.ins.2021.05.074
155. Martorella, C.: theHarvester. Edge Security Research (2019). https://github.com/laramies/theharvester
156. Masombuka, M., Grobler, M., Watson, B.: Towards an Artificial Intelligence Framework to Actively Defend Cyberspace. PhD thesis, University of Stellenbosch (2018)
157. Medenou, R.D., Mayo, V.M.C., Balufo, M.G., Castrillo, M.P., Garrido, F.J.G., Martinez, A.L., Catalán, D.N., Hu, A., Rodríguez-Bermejo, D.S., Vidal, J.M., De Riquelme, G.R.P., Berardi, A., De Santis, P., Torelli, F., Sanchez, S.L.: Cysas-s3: A novel dataset for validating cyber situational awareness related tools for supporting military operations. In: Proceedings of the 15th International Conference on Availability, Reliability and Security (2020). https://doi.org/10.1145/3407023.3409222. https://dl.acm.org/doi/10.1145/3407023.3409222
158. Mensikova, A., Mattmann, C.A., Gov, C.A.N.: Ensemble sentiment analysis to identify human trafficking in web data. ACM 1(February), 5 (2018)
159. Micallef, S.: Spiderfoot. Spiderfoot (2021). https://github.com/smicallef/spiderfoot/releases
160. Microsoft: Microsoft Excel - Mac Edition. Microsoft (2022)
161. Miehling, E., Dong, R., Langbort, C., Basar, T.: Strategic inference with a single private sample. In: 2019 IEEE 58th Conference on Decision and Control (CDC), vol. December, pp. 2188–2193. IEEE (2019). https://doi.org/10.1109/CDC40024.2019.9029544. https://ieeexplore.ieee.org/document/9029544/
162. Mittal, S., Joshi, A., Finin, T.: Cyber-all-intel: an ai for security related threat intelligence. arXiv preprint, 1–13 (2019). arXiv:1905.02895
163. Mohan, V.S., R, V., KP, S., Poornachandran, P.: S.p.o.o.f net: syntactic patterns for identification of ominous online factors. In: 2018 IEEE Security and Privacy Workshops (SPW), pp. 258–263. IEEE (2018). https://doi.org/10.1109/SPW.2018.00041. https://ieeexplore.ieee.org/document/8424657/
164. Momtazi, S.: Fine-grained german sentiment analysis on social media. In: Proceedings of the 8th International Conference on Language Resources and Evaluation, LREC 2012, pp. 1215–1220 (2012)
165. Morel, B.: Artificial intelligence a key to the future of cybersecurity. In: Proceedings of the ACM Conference on Computer and Communications Security, pp. 93–97. ACM (2011). https://doi.org/10.1145/2046684.2046699. https://dl.acm.org/doi/10.1145/2046684.2046699
166. Morgenstern, M.: Search is back. https://searchisback.com/ (2023). Accessed 17 July 2023
167. Motion: Motion (2024). https://www.usemotion.com/
168. Mubarak, S., et al.: Industrial datasets with ICS testbed and attack detection using machine learning techniques. Intell. Autom. Soft Comput. 31(3), 1345–1360 (2022)
169. Mubin, O., Alnajjar, F., Shamail, A., Shahid, S., Simoff, S.: The new norm: computer science conferences respond to covid-19. Scientometrics 126, 1813–1827 (2021). https://doi.org/10.1007/s11192-020-03788-9
170. Nadine Wirkuttis, H.K.: Artificial intelligence in cybersecurity. Cyber Intell. Secur. 1(1), 103–118 (2017)
171. Nagapawan, Y.V.R., Prakash, K.B., Kanagachidambaresan, G.R.: Convolutional neural network. In: EAI/Springer Innovations in Communication and Computing, pp. 45–51 (2021). https://doi.org/10.1007/978-3-030-57077-4_6
172. Naiknaware, B., Kushwaha, B., Kawathekar, S.: Social media sentiment analysis using machine learning classifiers. Int. J. Comput. Sci. Mob. Comput. 6(6), 465–472 (2017)
173. Nakov, P., Ritter, A., Rosenthal, S., Sebastiani, F., Stoyanov, V.: Semeval-2016 task 4: Sentiment analysis in twitter. In: Proceedings of the 10th International Workshop on Semantic Evaluation (SemEval-2016), pp. 1–18. Association for Computational Linguistics, Stroudsburg, PA, USA (2016). https://doi.org/10.18653/v1/S16-1001. http://aclweb.org/anthology/S16-1001
174. Namihira, Y., Segawa, N., Ikegami, Y., Kawai, K., Kawabe, T., Tsuruta, S.: High precision credibility analysis of information on twitter. In: 2013 International Conference on Signal-Image Technology & Internet-Based Systems, pp. 909–915. IEEE (2013). https://doi.org/10.1109/SITIS.2013.148. http://ieeexplore.ieee.org/document/6727298/
175. Neri, F., Aliprandi, C., Capeci, F., Cuadros, M., By, T.: Sentiment analysis on social media. In: Proceedings of the 2012 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining, ASONAM 2012, pp. 919–926. IEEE (2012). https://doi.org/10.1109/ASONAM.2012.164
176. Neuman, Y., Lev-Ran, Y., Erez, E.S.: Screening for potential school shooters through the weight of evidence. Heliyon 6(10), 05066 (2020). https://doi.org/10.1016/j.heliyon.2020.e05066
177. Nicart, E., Zanuttini, B., Gilbert, H., Grilheres, B., Praca, F.: Building document treatment chains using reinforcement learning and intuitive feedback. In: 2016 IEEE 28th International Conference on Tools with Artificial Intelligence (ICTAI), pp. 635–639. IEEE (2016). https://doi.org/10.1109/ICTAI.2016.0102. http://ieeexplore.ieee.org/document/7814662/
178. Nicart, E., Zanuttini, B., Grilhères, B., Giroux, P., Saval, A.: Amélioration continue d’une chaîne de traitement de documents avec l’apprentissage par renforcement. Revue d’intelligence Artificielle 31(6), 619–648 (2017). https://doi.org/10.3166/ria.31.619-648
179. Nila, C., Preda, M., Apostol, I., Patriciu, V.V.: Reactive wifi honeypot. In: Proceedings of the 13th International Conference on Electronics, Computers and Artificial Intelligence, ECAI 2021 (2021). https://doi.org/10.1109/ECAI52376.2021.9515048
180. NIST: National Vulnerability Database. https://nvd.nist.gov/ (2022). Accessed 22 March 2022
181. Noel, L.: Redai: A Machine Learning Approach to Cyber Threat Intelligence. PhD thesis, James Madison University (2021)
182. Noubours, S., Pritzkau, A., Schade, U.: Nlp as an essential ingredient of effective osint frameworks. In: 2013 Military Communications and Information Systems Conference, MCC 2013. Military University of Technology (2013)
183. Pagolu, V.S., Reddy, K.N., Panda, G., Majhi, B.: Sentiment analysis of twitter data for predicting stock market movements. In: 2016 International Conference on Signal Processing, Communication, Power and Embedded System (SCOPES), pp. 1345–1350. IEEE (2016). https://doi.org/10.1109/SCOPES.2016.7955659. http://ieeexplore.ieee.org/document/7955659/
184. Pai, U., et al.: Open source intelligence and its applications in next generation cyber security - a literature review. Int. J. Appl. Eng. Manage. Lett. 5(2), 1–25 (2021). https://doi.org/10.47992/IJAEML.2581.7000.0100
185. Palmieri, R., Orabona, V., Cinque, N., Tangorra, S., Cappetta, D.: Reputation analysis towards discovery. In: Proceedings of the 6th International Conference on Data Science, Technology and Applications, pp. 321–330. SCITEPRESS - Science and Technology Publications (2017). https://doi.org/10.5220/0006487303210330. https://www.scitepress.org/papers/2017/64873/64873.pdf. http://www.scitepress.org/DigitalLibrary/Link.aspx?doi=10.5220/0006487303210330
186. Panagiotou, A., Ghita, B., Shiaeles, S., Bendiab, K.: FaceWallGraph: Using Machine Learning for Profiling User Behaviour from Facebook Wall, vol. 11660 LNCS, pp. 125–134. Springer (2019). https://doi.org/10.1007/978-3-030-30859-9_11
187. Parashar, D., Sanagavarapu, L.M., Reddy, Y.R.: Sql injection vulnerability identification from text. In: ACM International Conference Proceeding Series (2021). https://doi.org/10.1145/3452383.3452405
188. Pastor-Galindo, J., Nespoli, P., Gomez Marmol, F., Martinez Perez, G.: The not yet exploited goldmine of osint: opportunities, open challenges and future trends. IEEE Access 8, 10282–10304 (2020). https://doi.org/10.1109/ACCESS.2020.2965257
189. Patil, P.P., Phansalkar, S., Kryssanov, V.V.: Topic modelling for aspect-level sentiment analysis. In: Advances in Intelligent Systems and Computing, vol. 828, pp. 221–229. Springer (2019). https://doi.org/10.1007/978-981-13-1610-4_23
190. Pavan Kumar, C.S., Dhinesh Babu, L.D.: Novel text preprocessing framework for sentiment analysis. In: Smart Innovation, Systems and Technologies, vol. 105, pp. 309–317. Springer (2019). https://doi.org/10.1007/978-981-13-1927-3_33
191. Pellet, H., Shiaeles, S., Stavrou, S.: Localising social network users and profiling their movement. Comput. Secur. 81, 49–57 (2019). https://doi.org/10.1016/j.cose.2018.10.009
192. Pelzer, R.: Policing of terrorism using data from social media. Eur J Secur Res 3(2), 163–179 (2018). https://doi.org/10.1007/s41125-018-0029-9
193. Perera, I., Hwang, J., Bayas, K., Dorr, B., Wilks, Y.: Cyberattack prediction through public text analysis and mini-theories. In: 2018 IEEE International Conference on Big Data (Big Data), pp. 3001–3010. IEEE (2018). https://doi.org/10.1109/BigData.2018.8622106. https://ieeexplore.ieee.org/document/8622106/
194. Pingle, A., Piplai, A., Mittal, S., Joshi, A., Holt, J., Zak, R.: Relext: relation extraction using deep learning approaches for cybersecurity knowledge graph improvement. In: Proceedings of the 2019 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining, pp. 879–886. ACM, New York, NY, USA (2019). https://doi.org/10.1145/3341161.3343519. https://dl.acm.org/doi/10.1145/3341161.3343519
195. Pratama, M.O., Satyawan, W., Jannati, R., Pamungkas, B., Raspiani, Syahputra, M.E., Neforawati, I.: The sentiment analysis of indonesia commuter line using machine learning based on twitter data. In: Journal of Physics: Conference Series, vol. 1193, p. 012029 (2019). https://doi.org/10.1088/1742-6596/1193/1/012029. https://iopscience.iop.org/article/10.1088/1742-6596/1193/1/012029
196. Queiroz, A., Keegan, B., Mtenzi, F.: Predicting software vulnerability using security discussion in social media. In: European Conference on Information Warfare and Security, ECCWS, pp. 628–634 (2017). https://www.semanticscholar.org/paper/Predicting-Software-Vulnerability-Using-Security-in-Queiroz-Keegan/3bcb4df05336060443638e71e9ee99c190c9109f
197. Rachman, F.F., Nooraeni, R., Yuliana, L.: Public opinion of transportation integrated (jak lingko), in dki jakarta, indonesia. Procedia Comput. Sci. 179(2020), 696–703 (2021). https://doi.org/10.1016/j.procs.2021.01.057
198. Radanliev, P., De Roure, D., Maple, C., Ani, U.: Methodology for integrating artificial intelligence in healthcare systems: learning from covid-19 to prepare for disease x. AI and Ethics (0123456789) (2021). https://doi.org/10.1007/s43681-021-00111-x
199. Radanliev, P., Roure, D.C.D., Walton, R., Van Kleek, M., Montalvo, R.M., Santos, O., Maddox, L., Cannady, S.: Covid-19 what have we learned? The rise of social machines and connected devices in pandemic management following the concepts of predictive, preventive and personalised medicine. EPMA J 2020(11), 311–332 (2020). https://doi.org/10.2139/ssrn.3692585
200. Rahul, K., Jindal, B.R., Singh, K., Meel, P.: Analysing public sentiments regarding covid-19 vaccine on twitter. In: 2021 7th International Conference on Advanced Computing and Communication Systems (ICACCS), pp. 488–493. IEEE (2021). https://doi.org/10.1109/ICACCS51430.2021.9441693. https://ieeexplore.ieee.org/document/9441693/
201. Rahul, L., Meetei, L.S., Jayanna, H.S.: Statistical and Neural Machine Translation for Manipuri-English on Intelligence Domain, vol. 736 LNEE, pp. 249–257. Springer (2021). https://doi.org/10.1007/978-981-33-6987-0_21
202. Rajalakshmi, E., Asik Ibrahim, N., Subramaniyaswamy, V.: A survey of machine learning techniques used to combat against the advanced persistent threat. In: Communications in Computer and Information Science, pp. 159–172. Springer (2019). https://doi.org/10.1007/978-981-15-0871-4_12
203. Rajalakshmi, R., Ramraj, S., Ramesh Kannan, R.: Transfer learning approach for identification of malicious domain names. In: SSCC 2018: Security in Computing and Communications, pp. 656–666 (2019). https://doi.org/10.1007/978-981-13-5826-5_51
204. Ramraj, S., Sivakumar, V., Ramnath G., K.: Real-time resume classification system using linkedin profile descriptions. In: 2020 International Conference on Computational Intelligence for Smart Power System and Sustainable Energy (CISPSSE), pp. 1–4. IEEE (2020). https://doi.org/10.1109/CISPSSE49931.2020.9212209. https://ieeexplore.ieee.org/document/9212209/
205. Ranade, P., Piplai, A., Mittal, S., Joshi, A., Finin, T.: Generating fake cyber threat intelligence using transformer-based models. In: Proceedings of the International Joint Conference on Neural Networks 2021-July, 1–9 (2021). https://doi.org/10.1109/IJCNN52387.2021.9534192. arXiv:2102.04351
206. Recon-NG: Recon-NG. https://github.com/lanmaster53/recon-ng (2022). Accessed 05 December 2022
207. Reddy, D.M., Reddy, D.N.V.S., Reddy, D.N.V.S.: Twitter sentiment analysis using distributed word and sentence representation (2019). arXiv:1904.12580
208. Ren, F., Jiang, Z., Wang, X., Liu, J.: A dga domain names detection modeling method based on integrating an attention mechanism and deep neural network. Cybersecurity (2020). https://doi.org/10.1186/s42400-020-00046-6
209. Riebe, T., Wirth, T., Bayer, M., Kühn, P., Kaufhold, M.A., Knauthe, V., Guthe, S., Reuter, C.: Cysecalert: An alert generation system for cyber security events using open source intelligence data. In: Lecture Notes in Computer Science (including Subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), vol. 12918 LNCS, pp. 429–446 (2021). https://doi.org/10.1007/978-3-030-86890-1_24
210. Roberts, A.: The importance of osint. In: Cyber Threat Intelligence, pp. 131–152. Apress, Berkeley, CA (2021)
211. Rodriguez, A., Okamura, K.: Cybersecurity text data classification and optimization for CTI systems. Adv. Intell. Syst. Comput. (2020). https://doi.org/10.1007/978-3-030-44038-1_37
212. Rodriguez, A., Okamura, K.: Enhancing data quality in real-time threat intelligence systems using machine learning. Soc. Netw. Anal. Min. 10(1), 1–22 (2020). https://doi.org/10.1007/s13278-020-00707-x
213. Rushlene Kaur Bakshi, Navneet Kaur, Ravneet Kaur, G.K.: Opinion mining and sentiment analysis. In: 2016 3rd International Conference on Computing for Sustainable Global Development, pp. 452–455. IEEE (2016). https://ieeexplore.ieee.org/document/7724305/authors#authors
214. Sahoo, D., Liu, C., Hoi, S.C.H.: Malicious URL detection using machine learning: a survey (2017). arXiv:1701.07179
215. Sakiyama, K., de Souza Rodrigues, L., Matsubara, E.T.: Can Twitter Data Estimate Reality Show Outcomes? vol. 12319 LNAI, pp. 466–482. Springer (2020). https://doi.org/10.1007/978-3-030-61377-8_32
216. Sakiyama, K.M., Silva, A.Q.B., Matsubara, E.T.: Twitter breaking news detector in the 2018 Brazilian presidential election using word embeddings and convolutional neural networks. Proc. Int. Jt. Conf. Neural Netw. 2019(July), 1–8 (2019). https://doi.org/10.1109/IJCNN.2019.8852394
217. Samaan, J.-L.: The RAND Corporation (1989–2009). Palgrave Macmillan US, New York (2012). https://doi.org/10.1057/9781137057358
218. Sarker, I.H., Kayes, A.S.M., Badsha, S., Alqahtani, H., Watters, P., Ng, A.: Cybersecurity data science: an overview from machine learning perspective. J. Big Data (2020). https://doi.org/10.1186/s40537-020-00318-5
219. Sarma, N., Singh, S.R., Goswami, D.: Influence of social conversational features on language identification in highly multilingual online conversations. Inf. Process. Manag. 56(1), 151–166 (2019). https://doi.org/10.1016/j.ipm.2018.09.009
220. Satyanarayan Raju Vadapalli, George Hsieh, K.S.N.: Twitterosint: Automated cybersecurity threat intelligence collection and analysis using twitter data. In: Proceedings of the International Conference on Security and Management (SAM), p. 60132. The Steering Committee of The World Congress in Computer Science, Computer Engineering and Applied Computing (WorldComp), Athens (2018). https://www.proquest.com/docview/2153621548?pq-origsite=gscholar&fromopenview=true
221. Schaurer, F., Storger, J.: The evolution of open source intelligence (osint). J. U.S Intell. Stud. 19(3), 53–56 (2013)
222. Scopus: Scopus (2021). https://www.scopus.com/ Accessed 08 November 2021
223. Searchcode: Searchcode. https://searchcode.com/ (2022). Accessed 05 December 2022
224. Senekal, B., Kotzé, E.: Open source intelligence (osint) for conflict monitoring in contemporary South Africa: challenges and opportunities in a big data context. Afr. Secur. Rev. 28(1), 19–37 (2019). https://doi.org/10.1080/10246029.2019.1644357
225. Severyn, A., Moschitti, A.: Twitter sentiment analysis with deep convolutional neural networks, pp. 959–962. ACM (2015). https://doi.org/10.1145/2766462.2767830. https://dl.acm.org/doi/10.1145/2766462.2767830
226. Shalunts, G., Backfried, G., Prinz, K.: Sentiment analysis of german social media data for natural disasters. In: ISCRAM 2014 Conference Proceedings - 11th International Conference on Information Systems for Crisis Response and Management, pp. 752–756 (2014). http://idl.iscram.org/files/shalunts/2014/940_Shalunts_etal2014.pdf
227. Shen, A., Chow, K.P.: Time and location topic model for analyzing lihkg forum data. In: 2020 13th International Conference on Systematic Approaches to Digital Forensic Engineering (SADFE), pp. 32–37. IEEE (2020). https://doi.org/10.1109/SADFE51007.2020.00009. https://ieeexplore.ieee.org/document/9133703/
228. Shin, G., Yooun, H., Shin, D., Shin, D.: Incremental learning method for cyber intelligence, surveillance, and reconnaissance in closed military network using converged it techniques. Soft. Comput. 22(20), 6835–6844 (2018). https://doi.org/10.1007/s00500-018-3433-1
229. Shin, H.S., Kwon, H.Y., Ryu, S.J.: A new text classification model based on contrastive word embedding for detecting cybersecurity intelligence in twitter. Electronics (Switzerland) 9(9), 1–21 (2020). https://doi.org/10.3390/electronics9091527
230. Shire, R., Shiaeles, S., Bendiab, K., Ghita, B., Kolokotronis, N.: Machine learning in detecting user’s suspicious behaviour through facebook wall. In: arXiv Preprint, pp. 65–76 (2019). arXiv:1910.14417
231. Shodan: Shodan. https://www.shodan.io/ (2022). Accessed 05 December 2022
232. Simonov, M., Bertone, F., Goga, K., Terzo, O.: Cyber Kill Chain Defender for Smart Meters, vol. 772, pp. 386–397. Springer (2019). https://doi.org/10.1007/978-3-319-93659-8_34
233. Simran, K., Balakrishna, P., Vinayakumar, R., Soman, K.P.: Deep Learning Approach for Enhanced Cyber Threat Indicators in Twitter Stream, vol. 1208 CCIS, pp. 135–145. Springer (2020). https://doi.org/10.1007/978-981-15-4825-3_11
234. Singh, S., Fernandes, S.V., Padmanabha, V., Rubini, P.E.: Mcids- Syst. Appl. 168(August 2020), 114386 (2021). https://doi.org/
multi classifier intrusion detection system for iot cyber attack 10.1016/j.eswa.2020.114386
using deep learning algorithm. In: Proceedings of the 3rd Inter- 247. Tavarez, D.: PwnDB. https://github.com/davidtavarez/pwndb
national Conference on Intelligent Communication Technologies (2022). Accessed 05 December 2022
and Virtual Mobile Networks, ICICV 2021 (March), pp. 354–360 248. Terán, L., Mancera, J.: Dynamic profiles using sentiment analysis
(2021). https://doi.org/10.1109/ICICV50876.2021.9388579 and twitter data for voting advice applications. Gov. Inf. Q. 36(3),
235. Smadi, M., Qawasmeh, O.: A supervised machine learning 520–535 (2019). https://doi.org/10.1016/j.giq.2019.03.003
approach for events extraction out of arabic tweets. In: 2018 Fifth 249. Tewari, A.: Decoding the black box: interpretable methods for
International Conference on Social Networks Analysis, Manage- post-incident counter-terrorism investigations. In: ACM web
ment and Security (SNAMS), pp. 114–119. IEEE (2018). https:// science conference. Websci, Southampton (2020). https://www.
doi.org/10.1109/SNAMS.2018.8554560. https://ieeexplore.ieee. southampton.ac.uk/~sem03/STAIDCC20_tewari_paper_07_07_
org/document/8554560/ 2020.pdf
236. Smailović, J., Grčar, M., Lavrač, N., Žnidaršič, M.: Predictive sen- 250. Theron, P., Kott, A.: When autonomous intelligent goodware
timent analysis of tweets: A stock market application. In: Lecture will fight autonomous intelligent malware: A possible future of
Notes in Computer Science (including Subseries Lecture Notes in cyber defense. Proceedings - IEEE Military Communications
Artificial Intelligence and Lecture Notes in Bioinformatics), pp. Conference MILCOM 2019-Novem, 1–7 (2019) https://doi.org/
77–88 (2013). https://doi.org/10.1007/978-3-642-39146-0_8 10.1109/MILCOM47813.2019.9021038arXiv:1912.01959
237. Sotirakou, C., Karampela, A., Mourlas, C.: Evaluating the role of news content and social media interactions for fake news detection. In: Lecture Notes in Computer Science (including Subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), vol. 12887 LNCS, pp. 128–141. Springer (2021). https://doi.org/10.1007/978-3-030-87031-7_9
238. Springer: Springer Link (2021)
239. Spyse: Spyse. https://spyse-dev.readme.io/reference/quick-start (2022). Accessed 05 December 2022
240. Strohmeier, M., Smith, M., Lenders, V., Martinovic, I.: Classi-fly: inferring aircraft categories from open data using machine learning. arXiv preprint (2019) arXiv:1908.01061
241. Stumptner, M., Mayer, W., Grossmann, G., Liu, J., Li, W., Casanovas, P., De Koker, L., Mendelson, D., Watts, D., Bainbridge, B.: An architecture for establishing legal semantic workflows in the context of integrated law enforcement. In: Lecture Notes in Computer Science (including Subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), vol. 10791, pp. 124–139. Springer (2018). https://doi.org/10.1007/978-3-030-00178-0_8
242. Susnea, E.: A real-time social media monitoring system as an open source intelligence (osint) platform for early warning in crisis situations. In: International Conference KNOWLEDGE-BASED ORGANIZATION, vol. 24, pp. 427–431 (2018). https://doi.org/10.1515/kbo-2018-0127. https://sciendo.com/pdf/10.1515/kbo-2018-0127 https://www.sciendo.com/article/10.1515/kbo-2018-0127
243. Szakonyi, A., Chellasamy, H., Vassilakos, A., Dawson, M.: Using technologies to uncover patterns in human trafficking. Adv. Intell. Syst. Comput. (2021). https://doi.org/10.1007/978-3-030-70416-2_64
244. Tan, C., Lee, L., Tang, J., Jiang, L., Zhou, M., Li, P.: User-level sentiment analysis incorporating social networks. In: Proceedings of the 17th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining - KDD ’11, p. 1397. ACM Press, New York, New York, USA (2011). https://doi.org/10.1145/2020408.2020614. http://dl.acm.org/citation.cfm?doid=2020408.2020614
245. Tang, D., Wei, F., Yang, N., Zhou, M., Liu, T., Qin, B.: Learning sentiment-specific word embedding for twitter sentiment classification. In: Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), vol. 44, pp. 1555–1565. Association for Computational Linguistics, Stroudsburg, PA, USA (2014). https://doi.org/10.3115/v1/P14-1146
246. Tariq, I., Sindhu, M.A., Abbasi, R.A., Khattak, A.S., Maqbool, O., Siddiqui, G.F.: Resolving cross-site scripting attacks through genetic algorithm and reinforcement learning. Expert
251. Tiwari, S., Verma, R., Jaiswal, J., Rai, B.K.: Open Source Intelligence Initiating Efficient Investigation and Reliable Web Searching vol. 1244 CCIS, pp. 151–163. Springer (2020). https://doi.org/10.1007/978-981-15-6634-9_15
252. Translator, O.D.: DocTranslator (2021). https://www.onlinedoctranslator.com/en/ Accessed 6 December 2021
253. Truvé, S.: Threats of tomorrow: using artificial intelligence to predict malicious infrastructure activity. Record. Future 2016, 204–212 (2016)
254. Tundis, A., Ruppert, S., Mühlhäuser, M.: On the Automated Assessment of open-source cyber threat intelligence sources. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) 12138 LNCS, pp. 453–467 (2020) https://doi.org/10.1007/978-3-030-50417-5_34
255. Twitter: Twitter API. https://developer.twitter.com/en/docs/twitter-api (2022). Accessed 15 December 2022
256. Tyagi, H., Kumar, R.: Attack and anomaly detection in IoT networks using supervised machine learning approaches. Revue d’Intelligence Artificielle 35(1), 11–21 (2021). https://doi.org/10.18280/ria.350102
257. Uehara, K., Nishikawa, H., Yamamoto, T., Kawauchi, K., Nishigaki, M.: Analysis of the relationship between psychological manipulation techniques and personality factors in targeted emails. In: Advances in Intelligent Systems and Computing, vol. 1151 AISC, pp. 338–351. Springer (2020). https://doi.org/10.1007/978-3-030-33506-9_30
258. Upadhayay, B., Lodhia, Z.A.M., Behzadan, V., Haven, W.: Combating human trafficking via automatic osint collection, validation and fusion. In: 15th International AAAI Conference on Web and Social Media. Association for the Advancement of Artificial Intelligence, Connecticut (2020). http://workshop-proceedings.icwsm.org/pdf/2021_17.pdf
259. Verdejo, D.P., Mercier-Laurent, E.: Video intelligence as a component of a global security system. In: IFIP International Workshop on Artificial Intelligence for Knowledge Management, pp. 132–145. Springer (2019). https://doi.org/10.1007/978-3-030-29904-0_10
260. Vinayakumar, R., Soman, K.P., Poornachandran, P.: Detecting malicious domain names using deep learning approaches at scale. J. Intell. Fuzzy Syst. 34(3), 1355–1367 (2018). https://doi.org/10.3233/JIFS-169431
261. Vinayakumar, R., Soman, K.P., Poornachandran, P.: Evaluating deep learning approaches to characterize and classify malicious url’s. J. Intell. Fuzzy Syst. 34(3), 1333–1343 (2018). https://doi.org/10.3233/JIFS-169429
262. Vinayakumar, R., Soman, K.P., Poornachandran, P., Sachin Kumar, S.: Evaluating deep learning approaches to characterize
and classify the dgas at scale. J. Intell. Fuzzy Syst. 34(3), 1265–1276 (2018). https://doi.org/10.3233/JIFS-169423
263. Vinayakumar, R., Soman, K.P., Prabaharan Poornachandran, A.S., Elhoseny, M.: Improved dga domain names detection and categorization using deep learning architectures with classical machine learning algorithms. Adv. Sci. Technol. Secur. Appl. (2019). https://doi.org/10.1007/978-3-030-16837-7_8
264. Vinayakumar, R., Alazab, M., Srinivasan, S., Pham, Q.V., Padannayil, S.K., Simran, K.: A visualized botnet detection system based deep learning for the internet of things networks of smart cities. IEEE Trans. Ind. Appl. 56(4), 4436–4456 (2020). https://doi.org/10.1109/TIA.2020.2971952
265. WalletExplorer: WalletExplorer (2023)
266. Wan, Y., Gao, Q.: An ensemble sentiment classification system of twitter data for airline services analysis. In: 2015 IEEE International Conference on Data Mining Workshop (ICDMW), pp. 1318–1325. IEEE (2015). https://doi.org/10.1109/ICDMW.2015.7. http://ieeexplore.ieee.org/document/7395820/
267. Wang, M.-H., Tsai, M.-H., Yang, W.-C., Lei, C.-L.: Infection categorization using deep autoencoder. In: IEEE INFOCOM 2018 - IEEE conference on computer communications workshops (INFOCOM WKSHPS), pp. 1–2. IEEE (2018). https://doi.org/10.1109/INFCOMW.2018.8406878. https://ieeexplore.ieee.org/document/8406878/
268. Wang, T., Chen, L.-C., Genc, Y.: A dictionary-based method for detecting machine-generated domains. Inform. Secur. J. Glob. Perspect. 30(4), 205–218 (2021). https://doi.org/10.1080/19393555.2020.1834650
269. Warefare, G.: Greyhat Warefare. https://grayhatwarfare.com/ (2022). Accessed 27 May 2022
270. Wei, Y., Zou, F.: Automatic generation of malware threat intelligence from unstructured malware traces. In: Garcia-Alfaro, J., Li, S., Poovendran, R., Debar, H., Yung, M. (eds.) Lecture Notes of the Institute for Computer Sciences, Social-Informatics and Telecommunications Engineering, LNICST. Lecture Notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering, vol. 398 LNICST, pp. 44–61. Springer, Cham (2021). https://doi.org/10.1007/978-3-030-90019-9_3
271. Whois: Whois. https://www.whois.com/whois/ (2022). Accessed 05 December 2022
272. Wilkinson, G., Legg, P.: “what did you say?”: Extracting unintentional secrets from predictive text learning systems. In: 2020 International Conference on Cyber Security and Protection of Digital Services (Cyber Security), pp. 1–8. IEEE (2020). https://doi.org/10.1109/CyberSecurity49315.2020.9138882. https://ieeexplore.ieee.org/document/9138882/
273. Williams, H., Blum, I.: Defining Second Generation Open Source Intelligence (OSINT) for the Defense Enterprise. Rand Corporation, Santa Monica (2018). https://doi.org/10.7249/rr1964
274. Xu, H., Dong, M., Zhu, D., Kotov, A., Carcone, A.I., Naar-King, S.: Text classification with topic-based word embedding and convolutional neural networks. In: Proceedings of the 7th ACM International Conference on Bioinformatics, Computational Biology, and Health Informatics, pp. 88–97. ACM, New York, NY, USA (2016). https://doi.org/10.1145/2975167.2975176. https://dl.acm.org/doi/10.1145/2975167.2975176
275. Yaacoub, J.P.A., Noura, H.N., Salman, O., Chehab, A.: Robotics cyber security: vulnerabilities, attacks, countermeasures, and recommendations. Int. J. Inf. Secur. (2021). https://doi.org/10.1007/s10207-021-00545-8
276. Yadav, S., Reddy, A.K.K., Reddy, A.L.N., Ranjan, S.: Detecting algorithmically generated malicious domain names. In: Proceedings of the 10th Annual Conference on Internet Measurement - IMC ’10, p. 48. ACM Press, New York, New York, USA (2010). https://doi.org/10.1145/1879141.1879148. http://portal.acm.org/citation.cfm?doid=1879141.1879148
277. Yang, W., Lam, K.Y.: Automated Cyber Threat Intelligence Reports Classification for Early Warning of Cyber Attacks in Next Generation SOC, vol. 11999 LNCS, pp. 145–164. Springer (2020). https://doi.org/10.1007/978-3-030-41579-2_9
278. Yu, B., Pan, J., Gray, D., Hu, J., Choudhary, C., Nascimento, A.C.A., De Cock, M.: Weakly supervised deep learning for the detection of domain generation algorithms. IEEE Access 7, 51542–51556 (2019). https://doi.org/10.1109/ACCESS.2019.2911522
279. Yue, L., Chen, W., Li, X., Zuo, W., Yin, M.: A survey of sentiment analysis in social media. Knowl. Inf. Syst. 60(2), 617–663 (2019). https://doi.org/10.1007/s10115-018-1236-4
280. Zhou, Z.-H.: Machine Learning, pp. 181–182. Springer, Singapore (2021). https://doi.org/10.1007/978-981-15-1967-3
281. Zhuk, D., Tretiakov, A., Gordeichuk, A., Puchkovskaia, A.: Methods to identify fake news in social media using artificial intelligence technologies. In: International Conference on Digital Transformation and Global Society DTGS 2018: Digital Transformation and Global Society, pp. 446–454. Springer (2018). https://doi.org/10.1007/978-3-030-02843-5_36
282. Zhuk, D., Tretiakov, A., Gordeichuk, A.: Methods to identify fake news in social media using machine learning. In: Proceedings of the 22nd Conference of Open Innovations Association FRUCT, pp. 59–40159404 (2018). http://dl.acm.org/citation.cfm?id=3266365.3266424
283. Zimbra, D., Abbasi, A., Zeng, D., Chen, H.: The state-of-the-art in twitter sentiment analysis. ACM Trans. Manag. Inf. Syst. 9(2), 1–29 (2018). https://doi.org/10.1145/3185045
284. Zizzo, G., Hankin, C., Maffeis, S., Jones, K.: Adversarial machine learning beyond the image domain. In: Proceedings of the 56th Annual Design Automation Conference 2019, pp. 1–4. ACM, New York, NY, USA (2019). https://doi.org/10.1145/3316781.3323470. https://ieeexplore.ieee.org/document/8806924 https://dl.acm.org/doi/10.1145/3316781.3323470
285. Zunino, R., Bisio, F., Peretti, C., Surlinelli, R., Scillia, E., Ottaviano, A., Sangiacomo, F.: An analyst-adaptive approach to focused crawlers. In: Proceedings of the 2013 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining, ASONAM 2013, pp. 1073–1077. ACM and IEEE (2013). https://doi.org/10.1145/2492517.2500328. https://ieeexplore.ieee.org/document/6785835
Publisher’s Note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.