Survey of Attack Projection, Prediction, and Forecasting in Cyber Security

Article  in  IEEE Communications Surveys & Tutorials · September 2018


DOI: 10.1109/COMST.2018.2871866

Martin Husák, Jana Komárková, Elias Bou-Harb, and Pavel Čeleda

Martin Husák, Jana Komárková and Pavel Čeleda are with Masaryk University, Institute of Computer Science, Botanická 68a, 602 00 Brno, Czech Republic (email: [email protected], [email protected], [email protected]).
Elias Bou-Harb is with Florida Atlantic University, 777 Glades Road, Boca Raton, FL 33431, USA (email: [email protected]).

Abstract—This paper provides a survey of prediction and forecasting methods used in cyber security. Four main tasks are discussed first: attack projection and intention recognition, in which there is a need to predict the next move or the intentions of the attacker, intrusion prediction, in which there is a need to predict upcoming cyber attacks, and network security situation forecasting, in which we project the cyber security situation in the whole network. Methods and approaches for addressing these tasks often share the theoretical background and are often complementary. In this survey, both methods based on discrete models, such as attack graphs, Bayesian networks, and Markov models, and continuous models, such as time series and grey models, are surveyed, compared, and contrasted. We further discuss machine learning and data mining approaches, which have gained a lot of attention recently and appear promising for such a constantly changing environment as cyber security. The survey also focuses on the practical usability of the methods and problems related to their evaluation.

Index Terms—Cyber security, intrusion detection, situational awareness, prediction, forecasting, model checking.

I. INTRODUCTION

Cyber security is a broad field of research, and the detection of malicious activities on the network is among the oldest and most common problems [1]. However, intrusion detection is mostly reactive and responds to specific patterns or observed anomalies. The intuitive next step is taking a proactive approach, in which there is a need to preemptively infer the upcoming malicious activities so that we could react to such events before they cause any harm [2]. Research efforts and progress in predictions and forecasting in cyber security are not as prominent as in attack detection. However, the topic is gaining more attention, and a breakthrough in this field would benefit the whole discipline of cyber security [1].

Before we can start making predictions about cyber security, there is a need to examine what can actually be predicted and what obstacles make this problem hard. First, if there is an attack taking place, it is possible to predict its next steps. Such a task is called attack projection [3]. A similar task is intention recognition [4], in which we also estimate the ultimate goal of an adversary, which can also help us in predicting the adversary's next moves. Another task is predicting cyber attacks that are going to happen. In this case, we talk about intrusion prediction [5], although we can use similar approaches to predict vulnerabilities as well. Finally, we might be interested in overall statistics of attacks, the presence of threats, and other pieces of information that together form a network security situation. In this context, we talk about network security situation forecasting [6]. Numerous methods and systems were proposed to approach these problems, and as we point out in this survey, they often share a common theoretical background, which makes the particular tasks and use cases similar to each other.

To summarize the open problems, we emphasize the following research challenges of predictions and forecasting in cyber security:
• What can be predicted in the cyber security domain? Is it the next move of an adversary, the appearance of a new attacker, or the cyber security situation from a global perspective?
• How usable are the predictions in cyber security? Can they be used to effectively mitigate an attack or to get prepared for an upcoming security threat?
• How should predictions in cyber security be evaluated, and what metrics should be used? Is it sufficient to rely on evaluation using datasets and testbeds, or can the actual prediction accuracy be measured in a live network setting?

These research challenges impact both theoretical and practical perspectives. In this survey, we examine whether predictions and forecasts are possible, and we are also interested in the applicability and evaluation of the theoretical results.

A. Paper Organization

This paper is divided into nine sections. Section II introduces the main use cases of predictive and forecasting methods in cyber security. A taxonomy of attack prediction methods is presented in Section III. A literature review of methods of cyber attack prediction is presented in Sections IV–VII with a detailed explanation of the methods. Section VIII discusses evaluation of attack prediction and lessons learned. Finally, Section IX concludes the paper and provides an outlook on future research.

This paper is intended for an audience familiar with computer networks and cyber attacks. Nevertheless, the tasks and use cases of attack prediction, projection, and forecasting are defined in Section II, so the reader does not need to be an expert in the field. Probably the most interesting part of this survey can be found in Sections III–VII. A taxonomy in Section III provides a high-level view of the discussed methods. Sections IV–VII contain the theoretical background and a list of recent literature for each group of methods. There is

also a table included in each of the four sections (Tables II–V) that summarizes all the prediction methods. If a paper listed in the table is discussed in the text, it is distinguished by the author name(s) in italic. Selected papers are highlighted with a gray background in the table and announced in the text as recommended reading. Practitioners are advised to read Section VIII, which contains practical implications and open problems in the field.

B. Literature Search Methodology

A literature search for this survey covered many journals and conference proceedings. Although the discussed problems are studied in the field of cyber security, the topics are often addressed in journals and conferences on computer networks and communications. Due to the specific nature of this work, we also had to go through journals and conferences dedicated to formal methods in computer science, such as expert systems and their applications, which appeared to be an important source of papers for this survey.

First, we reviewed survey-oriented journals like IEEE Communications Surveys & Tutorials and ACM Computing Surveys, although no survey was found to discuss predictions in cyber security. Subsequently, we used Google Scholar, IEEE Xplore, and ACM Digital Library to search for related papers using the queries “cyber security” AND “prediction”, “cyber security” AND “attack projection”, “cyber security” AND “forecasting”. Further, we looked for publications citing or cited by already found works or having the same author. The publications are presented in chronological order from 2012 to 2018. Papers published prior to 2012 are not included in this survey unless they make a fundamental contribution or are still highly relevant. The numbers of citations assessed by Google Scholar and Scopus were used to identify the most influential research papers.

C. Existing Surveys

To the best of our knowledge, prediction and forecasting methods in cyber security were not surveyed in such scope yet, although several surveys of particular tasks and use cases were published in recent years. Wei and Jiang [7] in 2013 analyzed the problem of network security situation prediction and compared predictions of network security situational awareness (NSSA) using neural networks, time series, and support vector machines, although mostly to illustrate the limitations of the available methods. Yang et al. [3] formalized the task of attack projection and surveyed literature on the topic in 2014. Three categories are listed: prediction based on attack plans, estimates of attackers' capabilities and intentions, and predictions by learning attack patterns and attacker's behavior. Leau and Manickam [6] in 2015 surveyed several existing techniques of network security situation forecasting. They grouped them into three categories by their theoretical background: machine learning, Markov models, and Grey theory. In 2016, Gheyas and Abdallah [8] surveyed detection and prediction of insider threats. Although this topic is still of interest, the predictive approaches do not seem to be studied in recent years. Ramaki and Atani [2] surveyed early warning systems, which often use predictive analytics, although these are not discussed in much detail. A simple yet usable taxonomy of intrusion prediction methods can also be found in a paper by Abdlhamed et al. [9]. The authors first split related work into two groups, prediction methods and intrusion detection enhancement. The prediction methods are categorized into three groups: methods using hidden Markov models, methods based on Bayesian networks, and genetic algorithms. Subsequently, they classify artificial neural networks, data mining, and algorithmic methodologies as three enhancements for intrusion detection, which enhance the effectiveness of prediction systems. The same authors later published a survey of intrusion prediction [5], in which they categorize prediction methodologies and prediction systems. Prediction methodologies can be based on alert correlation, sequences of actions, statistical and probabilistic methods, and feature extraction. Prediction systems are then categorized as based on hidden Markov models, Bayesian networks, genetic algorithms, neural networks, data mining, and algorithmic methods. Recently, Ahmed and Zaman [4] surveyed methods of attack intention recognition, a field dominated by methods based on graphical models. The authors recognize four categories: causal networks, path analysis, graphical models, and dynamic Bayesian networks. Methods based on causal networks were evaluated as the most effective.

II. USE CASES OF PREDICTION AND FORECASTING IN CYBER SECURITY

From the surveyed research papers, we distilled several tasks that pose a use case of prediction or forecasting in cyber security. The tasks are summed up in Table I. Historically, the first such use cases are attack projection [3] and attack intention recognition [4], which are closely tied to intrusion detection. The task is to predict what an attacker (in an already observed attack) is going to do next and what the attacker's ultimate goal is [4]. In practice, these two tasks use very similar methods, which can often be used interchangeably. Later, the task of predicting attacks emerged [5]. This task is more general as it does not require observation of a preceding activity. The expected outcome is a prediction of an attack before it actually occurs, not a prediction of the continuation of an observed series of events. Finally, the task of forecasting a security situation [6] is a highly generic use case related to cyber situational awareness. The task is not to predict an attack, but rather to forecast the situation in the whole network [2]. The outcomes may be a forecast of an increase or decrease in the number of attacks or vulnerabilities in the network. The following subsections discuss the use cases in more detail.

A. Attack Projection and Intention Recognition

The initial idea of attack projection dates back to 2001 when Geib and Goldman [10] proposed attack projection as an extension of attack plan recognition and identified its prerequisites and possible problems, such as a need to work with unobserved actions, failure to observe, and consideration of multiple concurrent goals. The first methods started to appear around 2003 [11], [12] and the research in this field is still active, including literature reviews [3], [4].

TABLE I
USE CASE CHARACTERISTICS.

Use case                                Task description                                  Previous surveys
Attack projection                       What is an adversary going to do next?            Yang et al. [3]
Attack intention recognition            What is an ultimate goal of an adversary?         Ahmed and Zaman [4]
Attack / Intrusion prediction           What type of attack will occur, when, and where?  Abdlhamed et al. [5]
Network security situation forecasting  How is the overall situation going to evolve?     Leau and Manickam [6]

To project the continuation of an attack and predict the upcoming events, we typically need to document the behavior of the attackers and establish a description of an attack for later use. A sample anatomy of a cyber attack was given by Bou-Harb et al. [13]. The anatomy consists of the following steps:
i. Cyber scanning
ii. Enumeration
iii. Intrusion Attempt
iv. Elevation of Privilege
v. Perform Malicious Tasks
vi. Deploy Malware/Backdoor
vii. Delete Forensic Evidence and Exit

Many types of cyber attacks follow this simple sequence of events, which can be observed either in the network traffic or on the target system, where intrusion detection systems may be found. The projection of an ongoing attack is, in essence, very simple. If we see a sequence of events that fits an attack model, we may assume that the attack will continue according to the model. Thus, we predict the adversary's next step. Nevertheless, a vague description of an attack is not usable for algorithmic predictions and, thus, a more formal description of an attack is required, e.g., in the form of an attack graph [11]. Further, many different types of attacks exist, so there is a need to create a model for all the attacks that are going to be projected. Historically, the first methods depended on attack libraries [12] that had to be manually filled, which requires substantial effort and continuous updates [3]. Thus, modern methods more often rely on data mining to automatically generate attack patterns for attack projections [14], [15].

The basic idea behind attack intention recognition is similar to attack projection; the difference is in motivation. In attack projection, we are not that interested in an attacker's intentions. If the ultimate goal of an adversary is estimated, the predictions of future malicious events may be better suited to the particular attack. Attacker's intention recognition is studied in network forensics [4], where it was originally performed over historical data. However, novel approaches are focused on real-time intention recognition and are becoming more and more similar to attack projection.

B. Intrusion Prediction

A more general task is predicting cyber attacks, mostly intrusions [5]. Instead of projecting an already observed attack, we are interested in predicting novel attacks. Minor variations of the task also include predictions of vulnerabilities, attack propagation and multi-stage attacks, and other cyber security events. There is also a significant overlap with research on early warning systems [2], which pose a practical use case for prediction in cyber security in general.

Due to the task being too generic, there are not many common elements in the proposed approaches. While attack projection mostly relied on discrete models of cyber attacks, there is a plethora of methods and models used for attack prediction ranging from discrete models, e.g., attack graphs, to continuous models, e.g., time series. Thus, one may predict the attacks using the same discrete models that are used for attack projection, with only a small variation in where the prediction starts. For example, the prediction may not start with an already observed malicious event, but rather with a probability that a particular vulnerability in the network will be exploited. An example of an approach based on a continuous model is a time series representing the number of attacks on a certain system or network in time. The time series may then be used to predict whether an attack is going to happen or not. Advanced methods may take into account types of attacks and characteristics of attackers and victims, so that they may estimate what type of attack is going to happen, who is going to be the attacker, and who is going to be the victim. Recent approaches often include non-technical data sources in the predictions so that we may see methods based on sentiment analysis on social networks [16], [17] or changes in user behavior [18], thus overcoming the “unpredictability” of cyber attacks.

C. Network Security Situation Forecasting

The last main use case of predictions and forecasting in cyber security is the forecasting of a global security situation. Instead of focusing on an individual attacker or an ongoing attack, there is a need to know the holistic state of an information system or a network under our control. This use case of cyber security prediction was briefly surveyed by Leau and Manickam [6].

A key concept of a holistic view on cyber security is often referred to as cyber situational awareness (CSA) or network security situational awareness (NSSA). Both terms are derived from the general term situational awareness, which originates in military research. One of the most widely used definitions of situational awareness is the one by Endsley [19]: “Perception of the elements in the environment within a volume of time and space, the comprehension of their meaning and the projection of their status in near future.” The definition itself emphasizes three levels, perception, comprehension, and projection, as illustrated in Figure 1 [20]. When applied in the cyber security field, perception corresponds to monitoring of cyber systems as well as intrusion detection. Comprehension corresponds to the understanding of the cyber security situation, in our case represented by modeling of cyber threats or correlating security alerts. Finally, projection, as understood in the context of this paper, is an action of predicting the changes in a cyber security situation [3]. As we can see, the importance of projection is rooted deep in the theoretical background of situational awareness [21] and thus motivates the research on predictions in cyber security. The motivation is stronger than

in attack projections presented earlier, where the projection is seen only as an extension of intrusion detection.

Fig. 1. Levels of situational awareness [20]: perception of data and the elements of the environment (Level 1), comprehension of the meaning and significance of the situation (Level 2), and projection of future states and events (Level 3).

Most of the works use quantitative analysis to describe the network security situation at a point in time. The resulting values are then projected into the future. Such an approach does not provide any information about the exact nature of future attacks. However, it can supply warnings about a general increase or decline of network security in the future. The quantitative approach allows for efficient application of methods for analysis and projection that have been thoroughly researched in the context of other fields. The quantitative analysis requires a measure for evaluation of a network security situation. There is no established canonical measure for assessing network security situation. However, there are two prevalent approaches: the hierarchical method with additive weights and the attack intensity estimation method. The hierarchical method evaluates the network security situation bottom up. Initially, a security situation is measured for each host. Subsequently, the values for each host are multiplied by a weight of the host and summed up to compute the overall security of the network. The actual method for estimating host security varies by author. The weight usually expresses the importance of the host. The attack intensity approach fuses information about the ongoing attacks from diverse sources and estimates an overall attack intensity. The overall intensity is derived from the number and severity of attacks against the whole network. The prediction can then give a warning about an upcoming increase or decline in attacks. Note that since the input, as well as the predicted value, are numeric, most of the models used for prediction of network security situation fall into the category of continuous models.

III. TAXONOMY AND METRICS OF PREDICTION METHODS IN CYBER SECURITY

This section presents a taxonomy of attack prediction methods. There are several approaches for categorizing the methods, ranging from use cases to mathematical background. Related surveys were mostly focused on a single use case, such as attack projection or network security situation forecasting. We decided to categorize the methods not by their use case but by their theoretical background, thus highlighting the similarities between the methods solving different tasks. Nevertheless, the use cases of particular research works are explained in their descriptions. The resulting taxonomy of attack prediction methods is illustrated in Figure 2.

First, we categorize the methods by the theoretical background they use as a basis for prediction. Typically, a predictive method in cyber security uses a model to represent an attack or network security situation. Clear examples are graphical models of attack progression or game-theoretical representation of attacker-defender interaction. Approaches based on these discrete models form the first category of methods. In contrast, the network security situation might be represented via a continuous mathematical model, e.g., a time series or a grey model, which are excellent for forecasting. The second category of methods thus contains methods based on continuous models. Both categories contain several subcategories, each representing a particular model. The third category of predictive and forecasting methods contains the methods based on machine learning and data mining. A common characteristic of such methods is that they include a learning phase, i.e., creating the knowledge base for further predictions. It is worth noting that several model-based approaches used data mining to create a model before making predictions [14], [15]. However, data mining plays only a supporting role in such cases, so these methods do not qualify for the machine learning and data mining category. Finally, the fourth category contains methods that are either very specific or otherwise hard to categorize. For example, predictions of DDoS attack volume and predictions based on sentiment analysis on social media are very specific and use unique methods in the context of this work. The fourth category further includes a group of similarity-based approaches, which are unfortunately highly fragmented, and a group of methods based on evolutionary computing, which emerged very recently, so it is too soon to categorize them properly.

Apart from the theoretical background, we are interested in the input data that are used for predictions. There are multiple available data sources with different levels of abstraction. A method can work with raw data, such as network traffic and system logs, or with abstract data, such as alerts generated by intrusion detection systems or a numerical representation of the network security situation. Further, for the needs of evaluation of the methods, the data can be either available as a dataset or gathered from a live environment. Such information is not contained in the taxonomy but can be found in Tables II–V.

IV. METHODS BASED ON DISCRETE MODELS

The first group of cyber attack prediction methods uses discrete models. In this section, we discuss methods using graph models, such as attack graphs, Bayesian networks, and Markov models. An alternative approach is based on game theory. A summary of methods and research papers discussed in this section can be found in Table II.

A. Attack Graphs

An attack graph is a graphical representation of an attack scenario that was introduced in 1998 by Phillips and Swiler [46] and quickly became a popular method of formal representation of attacks. Thus, the first attack prediction methods were based upon attack graphs. The attack graphs

Prediction and Forecasting Methods in Cyber Security
  Discrete Models (Section IV)
    Graph Models
      Attack Graphs (Section IV-A)
      Bayesian Networks (Section IV-B)
      Markov Models (Section IV-C)
    Game Theoretical (Section IV-D)
  Continuous Models (Section V)
    Time series (Section V-A)
    Grey Models (Section V-B)
  Machine Learning and Data Mining (Section VI)
    Machine Learning (Neural Networks, SVM, ...)
    Data Mining
  Other Approaches (Section VII)
    Similarity-based approaches, evolutionary computing, prediction from unconventional data, DDoS volume forecasting, ...

Fig. 2. Taxonomy of attack prediction and forecasting methods.

TABLE II
COMPARISON OF PREDICTION METHODS, PART I – APPROACHES BASED ON DISCRETE MODELS.

Attack Graphs (Section IV-A)
Authors                              Year       Approach/Model               Evaluation                        Advantages and Limitations
Hughes and Sheyner [11]              2003       Attack graph                 Proof-of-concept                  The first proposed methods
Chung et al. [22] (NICE)             2013       Attack graph                 Testbed                           Part of countermeasure selection tool
Kotenko and Chechulin [23] (CAMIAC)  2013       Attack graph                 Proof-of-concept                  Part of impact assessment tool
Cao et al. [24], [25]                2014-2015  Attack graph                 Live                              75 % accuracy, factor graph
Ramaki et al. [26] (RTECA)           2014       Attack graph                 DARPA 2000                        95 % accuracy
GhasemiGol et al. [27], [28]         2016       Attack graph                 Proof-of-concept                  Scalable for large-scale networks
Polatidis et al. [29], [30]          2017-2018  Attack graph                 Proof-of-concept                  Recommender system

Bayesian Networks (Section IV-B)
Authors                              Year       Approach/Model               Evaluation                        Advantages and Limitations
Qin and Lee [12]                     2004       Causal network               DARPA GCP                         Fundamental work on attack projection
Wu et al. [31]                       2012       Bayesian network             -                                 Only model extensions
Ramaki et al. [32]                   2015       Bayesian attack graph        DARPA 2000                        92.3–99.2 % accuracy, real-time
Okutan et al. [33]                   2017       Bayesian network             Live                              63 %–99 % accuracy, non-conventional signals
Huang et al. [34]                    2018       Bayesian network             Testbed (cyber-physical)          Application in a larger framework

Markov Models (Section IV-C)
Authors                              Year       Approach/Model               Evaluation                        Advantages and Limitations
Farhadi et al. [15]                  2011       Hidden Markov model          DARPA 2000                        81.33 %–98.3 % accuracy, data mining, illustrative example of a real-time attack projection framework
Sendi et al. [35]                    2012       Hidden Markov model          DARPA 2000                        Prediction of next step in multi-step attack
Shin et al. [36] (APAN)              2013       Markov chain                 DARPA 2000                        Improving intrusion detection by predictions
Zhang et al. [37]                    2014       Hidden Markov model          DARPA 2000                        Improvements in theoretical background
Kholidy et al. [38], [39], [40]      2014       Hidden Markov model,         DARPA 2000                        Timing metric – predicts an attack coming in 39 minutes
                                                Variable-order Markov model
Abraham and Nair [41]                2015       Markov model                 Testbed                           Exploitability analysis, vulnerability life-cycle
Bar et al. [42], [43]                2016       Markov chain                 Live (honeypot)                   Large-scale attack propagation models

Game Theory (Section IV-D)
Authors                              Year       Approach/Model               Evaluation                        Advantages and Limitations
Lisý et al. [44]                     2012       Game theory                  Virtual attacks                   38.6 % accuracy
Píbil et al. [45]                    2012       Game theory                  Comparison with naive algorithms  Extensions of previous works
Abdlhamed et al. [9]                 2016       Game theory, time series     DARPA 1999                        Combined approach

also served as a basis for other model-checking approaches, e.g., methods using Bayesian networks and Markov models and game-theoretical methods.

1) Method Description: An attack graph (often abbreviated as AG) is a tuple G = (S, r, S0, Ss), where S is a set of states, r ⊆ S × S is a transition relation, S0 ⊆ S is a set of initial states, and Ss ⊆ S is a set of success states [47]. The initial state represents the state before the attack starts. Transition relations represent possible actions of an attacker. These are usually weighted, e.g., by the probability that the attacker will choose the action. If an attacker takes all the actions to transition from the initial state to any of the success states, the attack is successful, as the success state represents a system compromise.

As stated earlier, an attack graph is constructed either manually or automatically; a popular approach is using data mining to generate attack graphs [14]. An example of an attack graph is shown in Figure 3. In the nodes, we can see possible events that comprise an attack. Edge values represent the probability that the event associated with the end node will happen. The edge value is referred to as predictability.
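
The definition above can be illustrated with a minimal Python sketch. It is only an illustration under stated assumptions, not an implementation from any of the surveyed papers: the class name AttackGraph, its method names, the state names, and all predictability values are hypothetical. Transitions carry a weight, project() returns the most predictable successor of the last observed state, and is_compromised() checks whether a path leads from an initial state to a success state.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class AttackGraph:
    """Attack graph G = (S, r, S0, Ss) with a weight on every transition."""

    states: set[str]
    # (source, target) -> predictability; the keys form the relation r
    transitions: dict[tuple[str, str], float]
    initial: set[str]   # S0
    success: set[str]   # Ss

    def next_steps(self, state: str) -> list[tuple[str, float]]:
        """Return the successors of `state`, most predictable first."""
        out = [(dst, p) for (src, dst), p in self.transitions.items() if src == state]
        return sorted(out, key=lambda item: item[1], reverse=True)

    def project(self, state: str) -> str | None:
        """Project the most likely next event, or None if there is no successor."""
        steps = self.next_steps(state)
        return steps[0][0] if steps else None

    def is_compromised(self, path: list[str]) -> bool:
        """True if `path` starts in S0, follows r, and ends in Ss."""
        if not path or path[0] not in self.initial or path[-1] not in self.success:
            return False
        return all((a, b) in self.transitions for a, b in zip(path, path[1:]))


# Hypothetical toy graph; state names and weights are made up for illustration.
graph = AttackGraph(
    states={"scan", "enumerate", "intrusion", "compromise"},
    transitions={
        ("scan", "enumerate"): 0.6,
        ("scan", "intrusion"): 0.3,
        ("enumerate", "intrusion"): 0.8,
        ("intrusion", "compromise"): 0.5,
    },
    initial={"scan"},
    success={"compromise"},
)

print(graph.project("scan"))
print(graph.is_compromised(["scan", "enumerate", "intrusion", "compromise"]))
```

Storing the relation r as dictionary keys keeps the sketch close to the tuple definition; a real tool would more likely use an adjacency list and update the weights as new alerts arrive.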

both systems use attack projection as a part of a larger system, and the research works do not focus on it.
Another variant of attack graphs is a factor graph proposed by Cao et al. [24], [25] in 2014. A factor graph is a probabilistic graphical model consisting of random variables and factor functions. The authors compare it to Bayesian networks and Markov random fields and evaluate the use of factor graphs for predicting attacks over a large dataset of real security incidents (several years of reports) with a promising accuracy of 75 %.
RPC portmap sadmind
request UDP Ramaki et al. [26] in 2014 proposed RTECA (Real Time
Episode Correlation Algorithm) for multi-step attack scenarios
0.266 detection and prediction. The paper describes in details the
RCP sadmind query with theoretical and practical implications of designing such a tool.
root credentials attempt UDP
Although they propose leveraging attack graph, the authors
0.264 extensively use causal correlations in their approach. Thus, in
their later work, Ramaki et al. [32] dropped the attack graphs
RCP sadmind UDP
NETMGT PROC SERIVCE in favor of Bayesian networks (see Section IV-B for more
CLIENT DOMAIN overflow attempt

0.577 0.798 details).


0.538 GhasemiGol et al. [27] in 2016 introduced an uncertainty-
INFO TELNET access POLICY FTP ATTACK-RESPONSES aware attack graph to evaluate network security state and a
directory listing
anonymous login attempt
forecasting attack graph to estimate the risk of future attacks.
0.002 The forecasting attack graph is built using several other graphs
0.933 0.011
BAD-TRAFFIC
- uncertainty-aware attack graph, hyper-alerts graph (for alert
(portscan)
loopback traffic
INFO TELNET access UDP Portsweep correlation as in [48]), dependency graph, and response graph.
0.002 0.001 Although the attack graphs and probabilities have to be prede-
0.001 fined, they are continuously updated in reaction to incoming
WEB-MISC WEB-CGI
/ doc/ access
BAD-TRAFFIC
finger access alerts. The authors describe the process of graph generation
loopback traffic in details and provide an impressive amount of examples,
illustrations, and algorithms, which makes the paper very
Fig. 3. Example of exploit-oriented attack graph with predictability values interesting as an introductory paper to the field. The authors
(inspired by [14]). also used many tools proposed in earlier works to assess their
usability. The same authors also proposed attack graph-based
attack prediction as a part of their work on incident response
The predictions using attack graphs are based on traversing management [28].
the graph and searching for a successful attack path, or on Polatidis et al. [29], [30] proposed an approach to cyber
probability values of edges in the graph. Assuming a current attack prediction using attack graphs and recommender sys-
attack is in a certain state according to the model, the node is tems. First, an attack graph is built using the information about
marked as an initial state. From the initial state, all the possible infrastructure. Subsequently, a recommender system is used to
paths may be traversed, e.g., using breadth-first search, and the predict cyber attacks using a collaborative filtering approach
ones leading to successful system compromise are selected as that the authors proposed earlier [49]. The papers include a
possible attack paths. The weights might be used to predict case study of attack graph generation in critical infrastructure,
the most probable path. Alternatively, the most probable action specifically maritime supply chain.
of an attacker may be considered in each node, which might
predict the immediate action of the attacker, but there the
attack path may not lead to a successful compromise. B. Bayesian Networks
2) Literature Review: Attack graphs were the first method Another group of model-checking approach to attack pre-
proposed for predicting cyber attacks, dating back to an essay diction is using Bayesian networks. These methods are closely
by Hughes and Sheyner published in 2003 [11]. Many research related to model-checking approaches based on attack graphs
papers that propose using the attack graphs, mostly for attack because a Bayesian network is typically constructed from an
projection and intent recognition, were published in years attack graph. The distinct feature of Bayesian networks are the
2005-2008. Recent additions are listed in this section. conditional variables and probabilities that are reflected in the
In 2013, two alert correlation frameworks, in which predic- model. In some cases, further restrictions are set on Bayesian
tion is involved, were proposed. Chung et al. [22] presented networks. For example, the requirement on the causality of
NICE, a system for countermeasure selection in virtual net- events leads to using causal networks instead of generic
work systems, that uses attack graphs to model and project Bayesian networks.
the attacks. Kotenko and Chechulin [23] presented CAMIAC, 1) Method Description: A Bayesian network is a proba-
a system for cyber attack modeling and impact assessment, bilistic graphical model that represents the variables and the
where the attack graphs are used in a similar way. However, relationships between them. The network is a directed acyclic

graph with nodes as the discrete or continuous random variables and edges as the relationships between them. The nodes maintain the states of the random variables and the conditional probability form.

Fig. 4. Simple Bayesian Attack Graph illustrating probability computations (inspired by [50]). Nodes: A = root/FTP server (192.216.0.10), B = Matu FTP BOF (192.216.0.10), C = remote BOF on SSH daemon (192.216.0.10), D = remote attacker. Probability tables: Pr(D) = 0.70; Pr(B | D) = 0.85, Pr(B | ¬D) = 0.00; Pr(C | D) = 0.70, Pr(C | ¬D) = 0.00; Pr(A | B, C) = 1.00, Pr(A | B, ¬C) = 0.65, Pr(A | ¬B, C) = 1.00, Pr(A | ¬B, ¬C) = 0.00.

There are several equivalent definitions of a Bayesian network. A Bayesian network is usually represented as a directed acyclic graph (DAG). Each node represents a variable that has a certain set of states. The edges represent the causal relationships between the nodes. Formally, let $G = (V, E)$ be a DAG, and let $X = (X_v)_{v \in V}$ be a set of random variables indexed by $V$. A Bayesian network consists of a set of variables and a set of directed edges between variables. Each variable has a finite set of mutually exclusive states. The variables and directed edges form a DAG. To each variable $A$ with parents $B_1, B_2, \dots, B_n$, there is attached a conditional probability table $P(A \mid B_1, B_2, \dots, B_n)$.

An example of a Bayesian attack graph is shown in Figure 4 [50]. We can derive that the Bayesian network models an activity of an attacker (D), who is likely to use one of the buffer overflow exploits (B, C) to get access to a server (A). Probability tables are attached to each node, informing us about the probability that the attacker will use the given exploit and the probability that the exploit will succeed.

Further extensions or constraints are used for specific purposes, including cyber security. For example, a Bayesian attack graph is an attack graph in the form of a Bayesian network [32]. A causal network is a special case of a Bayesian network which explicitly requires the relationships in the network to be causal [12].

In order to create a Bayesian network or a Bayesian attack graph, the list of events, causal dependencies between events, and the probability of transitions between events are required. Building the model requires either expert knowledge, or it can be trained using data mining or machine learning. Typically, the probability tables are calculated from the training datasets or historical records. Structure learning, parameter learning, and unobserved variable inference are the main tasks of building the network.

Alert prediction using Bayesian networks or Bayesian attack graphs uses probabilities depicted in the model. The event with the highest posterior probability is the most probable to appear in the future. For practical purposes, a threshold is required to filter out predicted alerts with low probability. If the probability of the predicted event is higher than the threshold, the predicted event can be reported, and appropriate defense mechanisms can be set.

2) Literature Review: A fundamental contribution is the research work by Qin and Lee from 2004 [12], which remains a recommended reading even today. The authors presented an approach to attack plan recognition and prediction of upcoming attacks based on predefined attack plans. According to their proposal, a causal network is constructed from low-level alerts. Subsequently, probabilistic inference is conducted to evaluate the likelihood of the next attack step. Their approach was evaluated using DARPA's Grand Challenge Problem datasets. However, only limited results are presented. A drawback of their work is that it requires a library of attack plans, from which the causal network is derived. Thus, input from a human expert is needed. The authors acknowledge this as a challenge for future work. They also stated that there is a need to distinguish between the deceptive plan and the real goal of the attack, and also between attacks conducted by one attacker and by a group of collaborating attackers. These issues remain open research problems even today.

Similarly to the situation with attack graphs, methods based on Bayesian networks peaked in the late 2000s and are not getting that much attention lately. Wu et al. [31] in 2012 proposed minor updates to building Bayesian networks from attack graphs for attack predictions. The authors propose to include the presence of vulnerabilities and three environmental factors into the Bayesian networks to reflect the potential impact of predicted attacks. The environmental factors are the value of assets in the network, the utilization of the host in the network, and the attack history. However, the research work only outlines the approach and does not include any results.

Ramaki et al. [32] proposed a real-time alert correlation and prediction framework in 2015. The framework has two modes, online and offline. In the offline mode, a Bayesian attack graph (BAG) is constructed from low-level alerts. In the online mode, the most probable next step of the attacker according to the BAG is predicted. The authors evaluated their approach using the DARPA 2000 dataset. The accuracy of prediction was observed to increase with the length of the attack scenario: it ranged from 92.3 % when processing the first attack step to 99.2 % when processing the fifth attack step.

Recently, Okutan et al. [33] included signals unrelated to the target network into the attack prediction method based on the Bayesian network. The signals are mentions of attacks on Twitter or the current number of attacks from Hackmageddon [51]. The results show that the prediction accuracy ranges from 63 % to 99 %, which makes it a promising approach. Huang et al. [34] in 2018 involved attack prediction using the Bayesian network in their framework for assessing cyber attacks in cyber-physical systems. However, there are no improvements to the prediction method itself; it is more of an application.

C. Markov Models

Another common approach to predicting attacks based on model-checking prediction methods is using Markov models. Markov models form a popular category of models, including well-known examples of Markov chains and Hidden Markov Models (HMM). Markov models are often represented as a graph, which makes methods based on them similar to the methods based on attack graphs and Bayesian networks. Contrary to previously described approaches, Markov models operate well in the presence of unobservable states and transitions, which removes the dependency of intrusion detection and attack prediction methods on possessing complete information. This allows for successful intrusion detection and attack prediction even if some attack steps were undetected or cannot be completely inferred.

1) Method Description: There are several variants of Markov models used for attack prediction: Hidden Markov models (HMM), Variable-length Markov models (VLMM), and Variable-order Markov models (VOMM). In this section, we show how to construct the model and predict an attack using an HMM. VLMM and VOMM, however, share the same theoretical background and their utilization for attack prediction is very similar. HMM is a statistical model where the system being modeled is assumed to be a Markov process with unobserved (hidden) states. Hence, we cannot observe the state of a model directly, but only the outputs dependent on the current state.

Consider having attack sequences consisting of classes such as enumeration, host and service probing, exploitation, etc. These events may be detected by an IDS, and thus the alerts will be raised. From the perspective of HMMs, the alerts are observable outputs of attack classes. Keep in mind that not all the events can be detected by an IDS. In order to construct an HMM from the attack sequences, we need to determine the number of states in the model, the number of distinct observation symbols per state, the state transition probability distribution, and the initial state distribution [15]. The number of states is the number of attack classes. The observation symbols represent IDS alerts. State transition and observation probabilities are extracted from historical records or provided by an expert.

HMMs are often visualized as graphs. In cyber security, attack classes are the nodes, observation symbols are the edges, and the probabilities are weights of the edges. Figure 5 shows an example of an HMM used for attack prediction [35]. We can see four states representing the attacker's progress from a normal state (nothing is happening) to a successful compromise.

Fig. 5. Hidden Markov Model states for predicting cyber attacks (inspired by [35]). States: Normal, Attempt, Progress, Compromise.

When having a sequence of attack classes, there is a need to predict the next activity of an attacker, i.e., the next element in the sequence. Intuitively, there is a need to find the most likely path from the current state node. The most likely path provides a sequence of attack classes that are the predicted actions of the attacker. To eliminate false positives, it is recommended to set a probability threshold so that lower probabilities are discarded, and such paths are not considered for further actions [15].

2) Literature Review: The methods based on Markov models appeared along with the methods based on attack graphs and Bayesian networks in the late 2000s. Farhadi et al. [15] in 2011 proposed a complex framework for alert correlation and prediction. In this work, sequential pattern mining is used to extract attack scenarios, which are then represented using a Hidden Markov model that is used for attack plan recognition. The authors claim that their work is the first to use an unsupervised method of attack plan recognition. Research works like this one are part of a trend in research on predictions in cyber security that overcomes a major drawback of previous works. Instead of relying on a predefined model constructed or supervised by a human expert, it incorporates unsupervised methods of data mining or machine learning. Thus, we selected this work as a recommended reading to illustrate this transition.

Sendi et al. [35] in 2012 proposed a method of intrusion prediction in real time that uses HMMs. The multi-step attacks are the prime interest in this work. An experimental evaluation shows how their method can predict multi-step attacks, which is especially useful for preventing the attacker from gaining control over more and more hosts in the network.

Shin et al. [36] in 2013 proposed an advanced probabilistic approach for network-based IDS (APAN), which uses a Markov chain to model unusual events in the network traffic and to forecast intrusion. Contrary to other methods based on Markov models, this method processes network anomalies and, thus, is not aiming at predicting the next move of an attacker like other model-checking approaches.

Zhang et al. [37] in 2014 discussed differences between trained and untrained Markov models as applied to detection and prediction of multi-step attacks. The authors first train the HMM using the Baum-Welch algorithm. Subsequently, the attack scenario corresponding to an alert is found using the Forward algorithm. Finally, the next possible attack sequence is predicted using the Viterbi algorithm. The approach was evaluated using the DARPA 2000 dataset. Trained HMMs scored better than their untrained counterparts in both recognition and prediction.

Kholidy et al. published a series of three papers on attack predictions in cloud systems in 2014. First, attack prediction models for intrusion detection systems in the cloud are proposed [38]. Subsequently, the utilization of finite state HMMs for predicting multi-stage attacks in the cloud is discussed [39]. Finally, the intrusion prediction model with finite context with a probabilistic suffix tree is described [40].

Abraham and Nair [41] proposed a predictive cybersecurity framework based on Markov models for exploitability analysis. The authors use CVSS data to assess the life-cycle of vulnerabilities and predict their impact on the network.

Most recently, Bar et al. [42], [43] in 2016 used data from honeypots for complex modeling of attack propagation using Markov chains. Several frequent patterns of attack propagation were observed and described in detail. However, the prediction of the next attacked honeypot is only briefly mentioned and left for future work.

D. Game Theory

Game-theoretical approaches to attack prediction are similar to the graphical model-checking approaches discussed earlier. The game is used as a model of interaction between an attacker and a defender. Contrary to the graphical model-checking approaches, game-theoretical methods aim to find the best strategy for the players instead of the most frequent attack progression observed in historical data. Thus, game-theoretical approaches seem promising especially for prediction of advanced attacker's activity.

1) Method Description: Game theory is a mathematical tool designed for the analysis of an interaction between subjects with often conflicting objectives. The basic assumptions in game theory are that participants are rational (they pursue their objectives) and that they reason strategically (they take into account their knowledge or expectations of other participants).

A game is a model of strategic interaction. The game consists of 1) a finite set $N$ of players (usually attacker and defender/administrator in the context of network security), 2) a nonempty set of actions $A_i$ for each player $i \in N$, 3) a payoff function $u_i$ for each player $i \in N$ that assigns each outcome $a \in \times_{j \in N} A_j$ a utility of player $i$.

A strategy of a player is a function that provides a player's action for each situation in which the player should make a decision. We distinguish between two types of strategies. Pure strategies provide a single action for each situation. By contrast, a mixed strategy assigns each situation a probability distribution over the set of player's actions. The concept of a game solution in game theory is not explicit. The most commonly used solution concept is a Nash equilibrium [52]. In a Nash equilibrium, the players have chosen strategies such that neither of them would benefit by deviating from his strategy. Finding the Nash equilibria of a game is often computationally intractable [53]. However, algorithms with lesser computational complexity approximating the Nash equilibria are available for some types of games [54], [55].

There are various classes of game models that can be used for attack prediction. One such classification distinguishes extensive vs. strategic games. In a game in strategic form, each player chooses his action only once, and the actions of all players are made simultaneously. By contrast, in games in extensive form, the players make the choice of action multiple (possibly infinitely many) times simultaneously or in turns and the players may include all available information in their decision at the time the decision is made.

Alternatively, we distinguish games with imperfect vs. perfect information. In extensive games with perfect information, at any stage of the game, all players are informed about each other's moves in previous turns. Conversely, if not all information about past moves is available to all players, the extensive form game is said to have imperfect information.

2) Literature Review: Lisý et al. [44] used a zero-sum game in extensive form with imperfect information to infer the attacker's plan in situations when the attacker tries to actively mislead the defender about his goals. They assume the targets and their respective value for the attacker are known as well as the set of all attack scenarios. Every round of the game, the attacker chooses an action, and the defender chooses a sensor from a given set of sensors. Each sensor has a given capability of detecting various attacker's actions. The attacker tries to reach the most valuable target while avoiding detection and misleading the defender about the ultimate goal. The defender tries to guess as many of the attacker's moves as possible. They present an algorithm to compute an approximation of the Nash equilibria. Another presented algorithm identifies the most probable scenarios each turn, thus enabling the defender to guess not only the attacker's next action but also his ultimate goal.

Píbil et al. [45] focus on predicting the target of the attacker rather than his next move. They consider the zero-sum finite game in extensive form with imperfect information between the attacker and defender. The defender selects the deployment of honeypots, mainly how valuable they appear to the attacker. The attacker chooses which target to attack. They consider two scenarios; in the first scenario, the attacker has no information other than the perceived value of the target, while in the second scenario the attacker can probe a few targets and receive noisy information of their type. The Nash equilibria of this game help the defender to best disguise the honeypots and the attacker to select which targets he will attack.

Abdlhamed et al. [9] in 2016 proposed a system for intrusion prediction in a cloud computing environment. Their system is designed to address the problem that theoretic models such as game theory can be highly unreliable with insufficient or uncertain input data. Their system first tries to match the situation to build attack models and scenarios. If the match is sufficient, the system assumes the situation is covered by the theoretical game theory based model and applies the model's prediction. In case the input data are not sufficient, statistical methods are applied for prediction. Thus, this work serves as an example of using a combination of different approaches.

V. METHODS BASED ON CONTINUOUS MODELS

The second group of methods uses continuous models, namely time series and grey models, as discussed in the appropriate subsections. Such approaches are in most cases suitable for forecasting network security situation. Common results are forecasts of the numbers, volumes, and composition of attacks in the network and their distribution in time. Alternatively, spatiotemporal patterns in time series may be used to predict cyber attacks. A summary of methods and research papers can be found in Table III.

A. Time Series

Time series are a very interesting tool for predictive analysis that is used in various fields, including cyber security.

TABLE III
COMPARISON OF PREDICTION METHODS, PART II – APPROACHES BASED ON CONTINUOUS MODELS.

Time series (Section V-A)
Authors | Year | Approach/Model | Evaluation | Advantages and Limitations
Park et al. [56] (FORE) | 2012 | Time series and linear regression | Live | 1.8 times faster reaction to worms
Zhan et al. [57] | 2013 | Time series (FARIMA) | Live (honeypot) | Attack predictions up to 5 hours ahead
Silva et al. [58] | 2014 | Time series (PBRS/EWMA) | Live (honeynet) | Up to 57.8 % accuracy, limited to burst attacks (brute-forcing and DDoS)
Abdullah, Pillai et al. [59], [60] | 2015 | Time series (GARMA + ARMA) | Live data (honeynet) | Limited set of attack types considered
Freudiger et al. [61] | 2015 | Time series (EWMA) | Dshield | Collaborative blacklisting
Chen et al. [62] | 2015 | Spatiotemporal patterns | Live (honeynet) | Discussion of found attack patterns
Zhan et al. [63] | 2015 | Time series (FARIMA + GARCH) + Extreme values | Live (honeypot) | 70 %–87.9 % accuracy
Sokol et al. [64] | 2017 | Time series (AR(1)) | Live (honeynet) | 95 % certainty, finding simple models
Werner et al. [65] | 2017 | Time series (ARIMA) | Hackmageddon | 14.1 %–21.2 % accuracy
Dowling et al. [66] | 2017 | Temporal variances | Live (honeynet) | Attack type predictability
Okutan et al. [67] | 2017 | Time series (ARIMA) | Live data (anonymized) | Unconventional resources (Twitter, etc.)

Grey Models (Section V-B)
Authors | Year | Approach/Model | Evaluation | Advantages and Limitations
Lin et al. [68] | 2014 | Grey models | DARPA 1998 | Supported by immunity model
Leau and Manickam [69], [70] | 2016 | Grey models | DARPA 1999 & 2000 | More robust than standard grey models

It is worth mentioning that time series are commonly used in anomaly detection. A time series represents common network traffic patterns. Subsequently, the deviations that do not match the expected values of network traffic in a given moment are proclaimed anomalies. Although the terminology and methods of anomaly detection are similar to attack prediction, the two use cases are substantially different. Hence, research on anomaly detection is not presented here.

1) Method Description: A time series is a set of consecutive data points indexed in time order, often plotted in line charts. A time series is constructed from historical records of an observed phenomenon; in our case, it can be attacker's activity or a network security situation state represented in a numerical value. There is a plethora of methods for time series analysis that can be used to predict the values of a time series in the near future. A significant number of approaches employ moving average, a calculation to analyze the data by creating a series of averages of subsets of the time series. Variants of moving average analyses include simple moving averages (SMA) [9] or exponential weighted moving average (EWMA) [58], [61]. The weights and exponential smoothing allow a prediction method to better reflect the nature of the input time series, e.g., seasonality of network traffic (day-night differences, etc.). A recent trend is using autoregressive moving averages (ARMA, ARIMA) [65], [67]. See Figure 6 for an example of time series forecasting with moving average and forecasting confidence limits.

Fig. 6. Time series forecasting with moving average. Legend: historical values, forecast, 95 % limits; the time axis is divided into an estimation period, a validation period, and a forecast into the future.

2) Literature Review: Using time series for cyber attack prediction and forecasting is a somewhat recent idea, compared to other approaches. A precursor to time series methods appeared in 2012 when Park et al. [56] proposed FORE, a mechanism for predicting "cyber weather" using regression analysis. The task of FORE is to forecast unknown Internet worms by analyzing the randomness in the network traffic. The concept of the work is that the presence of a worm in the network traffic increases network traffic randomness. The forecasts are based on time series analysis and linear regression.

From there on, Zhan et al. [57] proposed a statistical framework based on time series analysis of honeypot data in 2013. In 2014, Silva et al. [58] created a model for predicting burst attacks, i.e., brute-forcing and DoS, that is based on time series. The authors compared pseudo-random binary sequences (PBRS) and exponential weighted moving average (EWMA) to predict the beginning of bursts. In an evaluation using a honeynet, it was shown that the attacks could be predicted with an accuracy ranging from 17.4 % to 57.8 % with a moving average of around 5-10 hours. Many research papers appeared in 2015. Abdullah, Pillai et al. [59], [60] proposed using GARMA and ARMA time series evaluated on live data from a honeynet. Freudiger et al. [61] worked on controlled data sharing that would lead to collaborative predictive blacklisting. Part of this contribution proposed the use of EWMA time series for predictions and evaluation on Dshield data. Chen et al. [62] relied on time series in their work on predicting cyber attacks using spatiotemporal patterns. Zhan et al. compared long-term and short-term predictions of cyber attacks using time series (FARIMA and GARCH) and extreme values with

interesting results, up to 87.9 % prediction accuracy was achieved 1 hour ahead of time of an attack. In 2017, Werner et al. [65] used ARIMA time series to predict the intensity of cyber attacks, i.e., the expected number of attacks in the next day. Sokol et al. [64] used an AR(1) model to predict attacks against a honeynet. A similar yet simplistic method using random sampling in temporal variance was proposed by Dowling et al. [66] to assess attack type predictability. Recent work by Okutan et al. [67] uses a broad range of unconventional signals, such as Twitter events, to improve forecasting of security incidents using a time series and ARIMA model.

Time series were already mentioned in Section IV-D, where a combined approach using game theory and supported by time series analysis was presented [9]. Machine learning methods (see Section VI) may also use time series to train classifiers [71].

B. Grey Models

The Grey Models are typically used for predicting cyber security situations and define yet another example of methodologies which employ a continuous mathematical model. The Grey Theory was first presented by Deng in 1982 [72]. In grey theory terminology, a situation with no information is defined as black and a situation with complete information as white. Since both options are idealized, the real world problems are somewhere in the middle, in a situation defined as grey. Thus, a grey situation can be modeled using a Grey Model (GM).

1) Method Description: The most widely used grey forecasting models are GM(1, 1) and its modification, the Grey-Verhulst model. The forecasting ability of these models is limited to predicting the next members of a time series. It is most suitable for short-term prediction based on a small sample of data. In network security, authors usually measure the network security situation and predict its next value.

Let $X^0 = \{x^0(1), \dots, x^0(n)\}$ be a sequence of length $n$ whose next value will be predicted, usually a time series. First, the Accumulating Generation Operation (1-AGO) is applied and a new sequence $X^1 = \{x^1(1), \dots, x^1(n)\}$ is created, where $x^1(k) = \sum_{i=1}^{k} x^0(i)$. By applying the accumulation operation, the influence of random fluctuations present in the original sequence is weakened. Moreover, the original sequence can be easily reconstructed as $x^0(k) = x^1(k) - x^1(k-1)$ for $k > 1$, $x^0(1) = x^1(1)$.

The model is created for the sequence $X^1$. Different modifications use different models. The original GM(1, 1) model assumes the data satisfy the differential equation

$$\frac{dx^1(k)}{dk} + a x^1(k) = b.$$

The model works best for data with exponential growth.

equation $\hat{x}^1(k)$ is computed and the future values of the sequence $X^0$ are predicted as $\hat{x}^0(k+1) = \hat{x}^1(k+1) - \hat{x}^1(k)$ for $k \geq n$. The various methods based on the Grey model usually use a modified model or extend the model with error prediction.

2) Literature Review: Preliminary work on network security situation forecasting using Grey models from 2006 to 2014 is covered in a survey by Leau and Manickam [6]. Thus, we only surveyed later research works.

In 2014, Lin et al. [68] introduced their definition of the network security situation. They claim the network defense is similar to an immune system; the severity of a situation is proportional to the strength of the response. The authors compute the network security situation based on the number of defensive measures currently in place. They improve the prediction by considering various factors that influence the network security situation. The most influential factors are selected using the method of grey entropy correlation analysis, and the Kalman filter is applied to improve the prediction.

In 2016, Leau and Manickam [69] endeavored to overcome the limitations of the GM(1, 1) and Grey-Verhulst models, namely that they are accurate only for specific input series. In their work, they introduce an adaptive Grey-Verhulst model that is robust when applied to wider types of time series. The modification consists of an extension of the underlying Grey-Verhulst model. While the original model from which the differential equation is derived assumes that $x^0(k) + a z^1(k) = b (z^1(k))^2$, where $z^1(k) = \frac{1}{2} x^1(k) + \frac{1}{2} x^1(k-1)$, the modified version assumes $z^1(k) = x^1(k-1) + \frac{1}{2} x^0(k) + \frac{1}{6} x^0(k-1) - \frac{1}{6} x^0(k-2)$. The value of $z^1(k)$ is derived so that the error due to different shapes of the original time series is reduced as much as possible. The same authors also introduce [70] an adaptive Grey-Verhulst-Kalman prediction model, which utilizes the adaptive Grey-Verhulst model from their previous work and improves it by applying the Kalman filter to predict the next residuum, thus increasing the prediction accuracy.

VI. MACHINE LEARNING AND DATA MINING METHODS

Machine learning (ML) is gaining popularity in the research community in wide areas of exploration, and cyber security is no exception [89]. It contains a vast landscape of approaches and methods, such as neural networks and support vector machines, which makes it difficult to properly categorize machine learning in terms of attack prediction methods. Machine learning is closely tied to data mining [89], which was already mentioned several times in this work. Typically, data mining was exploited to create a model used in attack prediction, e.g., an attack graph [14] and a Markov model [15]. The utilization of data mining in this context is intended to overcome a
The Grey-Verhulst model, which is more appropriate for data major drawback of model-based attack prediction models, i.e.,
following S-curve [73] assumes a differential equation the dependency on models provided by a security expert [3].
However, data mining does not directly influence the method
dx1 (k) itself. Thus, in this section, we only list approaches that
+ ax1 (k) = b[x1 (k)]2
dk make direct use of machine learning. Methods that are only
The model parameters a, b are estimated using least squares supported by machine learning or data mining are discussed
method from the sample data. The solution of the differential in other sections.
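The grey forecasting procedure of Section V-B1 (1-AGO accumulation, least-squares estimation of a and b, and reconstruction by differencing) can be illustrated with a minimal Python sketch of GM(1, 1). The discretization through the mean sequence z(k) and the closed-form time response used below are the standard textbook form of GM(1, 1) and are assumptions here, since they are not spelled out in the text above; the function name gm11_forecast and the minimum sample size are likewise hypothetical choices.

```python
import math


def gm11_forecast(x0, steps=1):
    """Forecast the next `steps` values of the series x0 with a basic GM(1, 1).

    Illustrative sketch only; assumes a short, positive-valued series.
    """
    n = len(x0)
    if n < 4:
        # Assumed rule of thumb: use at least four observations.
        raise ValueError("GM(1, 1) needs at least four observations")

    # 1-AGO: x1(k) = x0(1) + ... + x0(k)
    x1 = []
    total = 0.0
    for value in x0:
        total += value
        x1.append(total)

    # Mean sequence z(k) = 0.5 * (x1(k) + x1(k-1)) for k = 2..n
    z = [0.5 * (x1[k] + x1[k - 1]) for k in range(1, n)]
    y = [float(v) for v in x0[1:]]

    # Least squares fit of x0(k) = -a * z(k) + b (simple linear regression).
    m = len(z)
    sum_z, sum_y = sum(z), sum(y)
    sum_zz = sum(v * v for v in z)
    sum_zy = sum(zk * yk for zk, yk in zip(z, y))
    det = m * sum_zz - sum_z * sum_z
    if det == 0:
        raise ValueError("degenerate series: cannot estimate parameters")
    slope = (m * sum_zy - sum_z * sum_y) / det
    b = (sum_y - slope * sum_z) / m
    a = -slope

    if abs(a) < 1e-12:
        # Limit case a -> 0: the fitted series is constant and equal to b.
        return [b] * steps

    def x1_hat(k):
        # Time response of dx1/dk + a*x1 = b with x1_hat(0) = x0(1).
        return (x0[0] - b / a) * math.exp(-a * k) + b / a

    # Restore the original scale by differencing the accumulated forecast.
    return [x1_hat(k) - x1_hat(k - 1) for k in range(n, n + steps)]


if __name__ == "__main__":
    situation = [1.0, 1.1, 1.21, 1.331, 1.4641]  # hypothetical situation values
    print(gm11_forecast(situation, steps=2))
```

The sketch fits an exponential trend, which matches the remark that GM(1, 1) works best for data with exponential growth; a Grey-Verhulst variant would replace the fitted equation with its quadratic right-hand side.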

TABLE IV
COMPARISON OF PREDICTION METHODS, PART III – APPROACHES BASED ON MACHINE LEARNING AND DATA MINING.

Neural Networks (Section VI-B1)
Authors | Year | Approach/Model | Evaluation | Advantages and Limitations
Zheng et al. [74] | 2012 | BP neural network | KDD99 | Modular system, very brief discussion
Chen et al. [75] | 2013 | Recurrent neural network | Live (honeypot) | Old data (2000-2001)
Zhang et al. [76] | 2013 | BP and RBF neural networks | Custom dataset | 84.2-85.42 % accuracy, BP faster than RBF
Xing-zhu [77] | 2016 | RBF neural network | DARPA 1998 | Intrusion prediction
Zhang et al. [78] | 2016 | Wavelet neural network | Testbed | Optimized by genetic algorithms
He et al. [79] | 2017 | Wavelet neural network | DARPA (not specified) | Minor improvements, discusses drawbacks

Support Vector Machines (Section VI-B2)
Authors | Year | Approach/Model | Evaluation | Advantages and Limitations
Cheng and Lang [80] | 2012 | Support Vector Machine | Live | Alternative to NSSA forecasting based on neural networks
Jayasinghe et al. [81] | 2014 | Support Vector Machine | Live (webpages) | Limited to drive-by download attacks
Uwagbole et al. [82], [83] | 2017 | Support Vector Machine | Custom dataset | Limited to SQL injection attacks

Data Mining (Section VI-B3)
Authors | Year | Approach/Model | Evaluation | Advantages and Limitations
Fachkha et al. [84] | 2012 | Frequent pattern mining, association rule mining | CAIDA network telescope | Global scope given by CAIDA's network telescope size
Kim and Park [85] (CARMA) | 2014 | Sequence mining | Live | Thorough reasoning behind the results
Husák and Kašpar [86] | 2018 | Sequential rule mining | Live (alert sharing platform) | Collaborative environment, timing discussed

Other Machine Learning Methods (Section VI-B4)
Authors | Year | Approach/Model | Evaluation | Advantages and Limitations
Soska and Christin [87] | 2014 | Decision-tree classifiers | Live | Detects websites that will turn malicious, 66 % TP and 17 % FP rate
Liu et al. [71] | 2015 | Random forest classifier | VERIS database, Hackmageddon, Web Hacking Incident Database | Data breach forecasting, 258 features, 90 % TP and 10 % FP rate
Shao et al. [18] | 2016 | Rule mining, clustering | Proof-of-concept | User behavior analysis, identification of potentially problematic user groups
Veeramachaneni et al. [88] (AI²) | 2016 | Combination of supervised and unsupervised methods | Live | Improved detection rates compared to unsupervised methods alone

A. Method Description
There are a number of approaches and methods of machine learning that can be used to predict future events such as cyber attacks. Here, we describe the basics of neural networks, as they are the machine learning method most often used in the surveyed papers. Neural networks were prominent at the initial rise of machine learning but were later replaced by Support Vector Machines (SVM), which offered lower computational complexity and shorter learning times. However, with novel findings, neural networks are once again gaining popularity [89]. Readers interested in more details on machine learning applications in cyber security are kindly referred to the survey by Buczak and Guven [89].
There are common steps in applying machine learning methods. Usually, the process consists of two phases, training and testing. During the training phase, appropriate examples from the learning dataset are learned. Subsequently, in the testing phase, new data are processed by the model and the machine learning method produces results, such as predicted continuations of attack sequences. In practice, however, there is also a validation phase between training and testing. In the validation phase, another dataset is used to evaluate how well the model was trained or which of the models should be used for testing. For example, several neural networks may be constructed in the learning phase, each with a different number of layers and nodes, which differ in prediction accuracy and effectiveness. An important aspect of machine learning is supervision. Either a model is trained autonomously, and the learning is thus referred to as unsupervised, or the input data are fully or partially labeled by a human expert, and the learning is thus dubbed supervised or semi-supervised. The problem of identifying the classes and class attributes in the data, i.e., inputs of the machine learning methods, is known as feature extraction [89].
An artificial neural network (ANN) is a form of distributed computing inspired by biological neural networks, i.e., neurons in a brain. It is composed of simple processing units and synapses between them. It is common to visualize an ANN as a graph, as illustrated in Figure 7, where nodes are units and edges are synapses. A subset of units acts as input nodes and another subset as output nodes. The remaining nodes receive the signals transmitted from their input nodes, process the signals, and transmit them to their output nodes. The nodes can be weighted, and the whole network is typically structured in layers. Further, the nodes may have their own state or a threshold, which retransmits only the signals of a given level. The weights, thresholds, and synapses are established during the learning phase and may vary as the learning proceeds. The inputs are sent as signals to the input nodes, and the output nodes then provide the results.

B. Literature Review
The literature review of machine learning and data mining methods is structured as follows. Three subsections are dedicated to methods that were used in multiple research
works. Those are neural networks, support vector machines, and data mining. The remaining research works are discussed after that. It is hard to properly categorize this group of methods because of frequent combinations of approaches or approaches used only once.

Fig. 7. Artificial neural network for network security situation prediction (inspired by [90]). [Diagram: input layer (I1 ... In), hidden layer (H1 ... Hn), output layer (O1 ... On); side labels: Situation Sample, Prediction Output.]

1) Neural Networks: A number of papers deal with the application of machine learning to predict the network security situation for the needs of NSSA. These papers are rather short and focus on the theoretical background of NSSA modeling and forecasting, such as the mathematical formalization of the problem. However, the proposed approaches are rarely supported by experimental evaluation and, thus, provide limited value for security practitioners. Nevertheless, the common statement that NSSA is of vital interest is unquestionable. Various types of neural networks are discussed in these papers, and a summary is provided here for completeness. The first papers started to appear in 2008, and the work continues to date. In 2012, Zheng et al. [74] discussed using back-propagation neural networks. Zhang et al. [76] in 2013 compared back-propagation and radial basis function neural networks, and Chen et al. [75] proposed using a small-world echo state network, which is a kind of recurrent neural network. Zhang et al. [78] proposed using wavelet neural networks in 2016. Most recently, He et al. [79] proposed using a mixed wavelet-based neural network.
Neural networks were also used for intrusion prediction in 2016 by Xing-zhu [77]. The research work is, in essence, similar to the works on network security situation forecasting, only the motivation is focused more towards predicting a particular intrusion.
2) Support Vector Machine: Cheng and Lang [80] suggested using a support vector regression machine to forecast the network security situation, although this work mostly presents an alternative to the neural network-based methods. Apart from a different classifier, their work is, in essence, similar to research performed in this field using neural networks.
Support vector machines proved suitable for predicting very specific attacks. Jayasinghe et al. [81] in 2014 predicted drive-by downloads by monitoring and analyzing the bytecode stream produced by a web browser. Uwagbole et al. [82] in 2017 proposed a predictive system based on machine learning to predict SQL injection attacks. The system uses SVM to classify web requests so that the SQL injection can be predicted before the web page starts a malicious database query. The work is accompanied by another paper on generating a corpus for the learning phase [83].
3) Data Mining: Fachkha et al. [84] in 2012 investigated the data from a darknet, a large unassigned IP address space, to profile the darknet traffic and corresponding cyber threats. Frequent pattern mining and association rule mining were used to find hidden correlations between events in darknet traffic. The found patterns and rules are then proposed to be used for predicting events in the darknet traffic and cyber threats in general. Due to the nature of the darknet, in this case the CAIDA darknet that represents 1/256 of the IPv4 address space, the results of such threat prediction have global scope.
Kim and Park [85] in 2014 used data mining to build the attack graph for attack prediction. The authors used sequential association rule mining to reflect the order of events. Although the paper indicates that the mined sequences are used for constructing the attack graph, it does not specify how this is actually done but rather focuses on the sequence mining. Thus, it was not categorized under attack graph-based models in Section IV-A. Sequence mining was also used in recent work by Husák and Kašpar [86], in which the authors mined sequential rules from cyber security alerts contained in a large-scale alert sharing platform. Contrary to [85], the emphasis was put on analyzing live data from real networks and evaluating the suitability of such an approach in practice.
4) Other Machine Learning Methods: In 2014, Soska and Christin [87] used machine learning to automatically detect vulnerable websites before they turn malicious. Traffic statistics, filesystem structure, and website content were used to train an ensemble of decision-tree classifiers. The authors performed a year-long evaluation with promising results of 66% true positive rate and 17% false positive rate, which is a good result among methods evaluated in practice.
Liu et al. [71] in 2015 characterized the extent to which cyber security incidents can be predicted. The research work is focused on data breaches, which are predicted using a random forest classifier against more than 1,000 real data breaches. The number of features used for training the classifier is remarkable: 258 features were collected from organizations' networks. The features either describe mismanagement symptoms (misconfigured DNS, BGP, etc.) or malicious activity time series (spam, phishing, network scans, etc.). The resulting 90% true positive rate and 10% false positive rate only underline the extent of this work. Due to the significant extent of the work, we list it as recommended reading.
Veeramachaneni et al. [88] in 2016 presented AI², a machine learning system for attack prediction that includes human input. First, the authors use an ensemble of unsupervised outlier detection methods, including principal component analysis and autoencoders. Subsequently, feedback from an analyst is obtained and a supervised learning module
is used. The model is constantly refined as more feedback is gathered, which leads to promising results. AI² improves the detection rate more than three times on average while reducing the false positive rate fivefold, compared to unsupervised methods alone.
Shao et al. [18] in 2016 used user behavior analysis to predict cyber attacks with a motivation to include the reasoning behind the attacks. A user's security rating is derived from their consistency (usage patterns), accuracy (frequency of mistakes), and constancy (how long the user displays good online behavior). Rule mining is then used to find hidden relations in the behavior patterns. Finally, unsupervised clustering, such as k-means, and manual filtration of the results are used to identify groups of users that are prone to malicious operations.

VII. OTHER APPROACHES
In this section, we discuss the fourth group of prediction methods: methods that are hard to categorize properly or that are highly specialized in terms of the use case or the method used. The full list of approaches and papers is presented in Table V. There is no common background to these methods, so we only provide the literature review and briefly explain the background there.
1) Similarity-based Approaches: The first of the alternative approaches is based on similarity, mostly addressing the problem of attacker's intention recognition by calculating a similarity metric with a previously observed attack. In 2012, Jantan and Rasmi et al. [91], [92] proposed a model of attack strategy that allows comparisons of the attack strategies. The observed security alerts are expressed numerically, and cosine similarity is applied to infer a similarity between two attack strategies. It is worth mentioning that the same authors have previously developed models based on Bayesian networks [106].
In 2014, AlEroud and Karabatis [93] proposed an approach to detect cyber attacks using a semantic link network (SLN), which utilizes contextual information of network flows and alerts raised in response to them. Subsequently, the SLN is used to predict and detect malicious flows, focusing on multi-step attacks, using similarity measures. The same authors recently published a novel approach [94] based on contextual relationships between cyber attacks and calculating their similarity.
In 2016, Jiang et al. [95] proposed an intrusion prediction mechanism based on honeypot log similarity. System logs from honeypots were first analyzed using association rule mining to find useful implicit information and to select features. Subsequently, the flows are mapped into a metric space, and distance calculation is used to identify flows that are most similar to the known malicious flows, thus adding them to the prediction list. This approach aims at reducing false positive alarms and was evaluated in a live environment of a Taiwanese academic network.
Recently, AlEroud and Alsmadi [96] used similarity to predict and mitigate attacks in software-defined networks. The network traffic is aggregated to flows, and the flows' characteristics are subsequently compared to flow signatures of known attacks. If a similarity is found between known malicious flows and current flows, it is possible to predict a continuation of the traffic and mitigate the attack.
2) DDoS Volume Forecasting: DDoS attacks and predictions related to them are deeply studied topics. Prediction of DDoS attacks focuses mostly on identifying the initial phase of an attack, in which the volume of bogus network traffic rises, and on predicting the volume of the attack. The volume of a DDoS attack is the most important feature of such attacks. The metrics for DDoS volume are packet or byte rate per second and the estimated number of compromised machines involved in the attack. Knowing the attack volume in advance tells us whether the target system or the network can withstand the attack or if there is enough capacity for defense, e.g., in scrubbing centers.
Since 2012, several authors have proposed their methods of DDoS forecasting. Kwon et al. [97] used honeynets to capture the initial phases of the DDoS attack and predict its size. Later, they used statistical approaches to predict the DDoS volume [98]. Fachkha et al. [99] proposed an approach based on analysis of data from darknets. Olabelurin et al. [100] improved the forecasting techniques by including entropy in the calculations.
3) Evolutionary computing: A very recent approach to forecasting the network security situation is based on belief rule base (BRB) models and evolutionary algorithms, namely CMA-ES. This approach emerged in 2016 and has since been described and continuously improved by Hu et al. [101], [102] and Wei et al. [103], including improvements in network security situation assessment [107]. A BRB model includes a series of belief rules and can be built from expert knowledge as well as historical data. These might be subjective and inaccurate. Subsequently, the covariance matrix adaptation evolution strategy (CMA-ES) is used to optimize the parameters of the BRB model, which can then forecast the network security situation. This novel approach seems very promising and might be a good alternative to grey models, which were used for the same purpose, as discussed in Section V-B. Nevertheless, this method is too novel for its impact to be compared, e.g., by the number of citations.
4) Unconventional data sources: A novel trend in cyber security predictions is using unconventional data sources. For example, DNS logs are used for attack prediction in the work by Mahjoub and Mathew [104] from 2015, who proposed a principle called Spike Rank, or SPRank, that detects domains showing a sudden spike in DNS queries issued from millions of clients worldwide towards OpenDNS resolvers. Monitoring the spikes made it possible to detect several malware campaigns as well as phishing campaigns.
In addition, even non-technical data sources were considered for cyber attack prediction. Hernandez et al. [16] in 2016 performed sentiment analysis on Twitter to predict cyber attacks. Sentiment analysis of social networks was also a data source for Shu et al. [17] in 2018. Information foraging for improving cyber attack predictions was also discussed by Dalton et al. [105] in 2017. The authors, however, discuss various strategies for information foraging and only briefly mention the data sources with which they work.
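To illustrate the similarity-based approaches of Section VII-1, the following minimal Python sketch compares an observed alert vector with previously observed attack strategies using cosine similarity. The numeric encoding (alert counts per category), the strategy names, and the helper functions are hypothetical and are not taken from [91], [92]; the sketch only shows the similarity calculation itself.

```python
import math


def cosine_similarity(u, v):
    """Return the cosine of the angle between two equally long vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        # Convention chosen for this sketch: an empty vector matches nothing.
        return 0.0
    return dot / (norm_u * norm_v)


def most_similar(observed, known_strategies):
    """Return (name, score) of the known strategy closest to `observed`."""
    best_name = max(
        known_strategies,
        key=lambda name: cosine_similarity(observed, known_strategies[name]),
    )
    return best_name, cosine_similarity(observed, known_strategies[best_name])


if __name__ == "__main__":
    # Hypothetical alert counts per category: [scan, exploit, brute force].
    known = {
        "scan_then_exploit": [5, 1, 0],
        "brute_force": [0, 0, 9],
    }
    print(most_similar([4, 2, 0], known))
```

Because cosine similarity ignores vector magnitude, two strategies with the same mix of alert types score as similar even if one produced far more alerts; a distance in a metric space, as in [95], would behave differently.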

TABLE V
COMPARISON OF PREDICTION METHODS, PART IV – OTHER APPROACHES.

Similarity-based approaches (Section VII-1)
Authors | Year | Approach/Model | Evaluation | Advantages and Limitations
Jantan et al. [91], Rasmi et al. [92] | 2012, 2013 | Similarity | Proof-of-concept | Reduced time and cost of intention recognition in network forensics
AlEroud and Karabatis [93], [94] | 2014, 2017 | Semantic links and similarity, contextual relationships | Synthetic dataset (IP flows), DARPA (not specified) | Supported by machine learning, missing temporal aspects
Jiang et al. [95] | 2016 | Similarity | Live (honeynet) | Supported by data mining
AlEroud and Alsmadi [96] | 2017 | Similarity | Testbed (SDN) | Evaluation limited to DoS prediction

DDoS volume forecasting (Section VII-2)
Authors | Year | Approach/Model | Evaluation | Advantages and Limitations
Kwon et al. [97], [98] | 2012, 2017 | Regression analysis and other statistical methods | Live (honeypots) | Framework was proposed first, methods were added later
Fachkha et al. [99] | 2013 | Time series, linear regression | CAIDA network telescope | Backscatter analysis – global scope
Olabelurin et al. [100] | 2015 | Entropy forecasting | Testbed | Low false positive rate – 22.5%

Evolutionary computing (Section VII-3)
Authors | Year | Approach/Model | Evaluation | Advantages and Limitations
Hu et al. [101], [102], Wei et al. [103] | 2016-2017 | Belief rule base model, evolutionary computing | Proof-of-concept | Possible alternative to grey models for network security situation prediction

Predictions based on unconventional data sources (Section VII-4)
Authors | Year | Approach/Model | Evaluation | Advantages and Limitations
Mahjoub and Mathew [104] (SPRank) | 2015 | DNS anomalies | Live | Practical implementation, not a research paper
Hernandez et al. [16] | 2016 | Twitter sentiment analysis, linear regression | Live (Twitter) | Thorough evaluation on real-world events
Dalton et al. [105] | 2017 | Information foraging in publicly available data | Hackmageddon | Only suggests improvements to existing methods
Shu et al. [17] | 2018 | Twitter sentiment analysis, logistic regression | Live (Twitter) | Claims to predict attack, including its type, against a particular target

VIII. EVALUATION AND LESSONS LEARNED
In this section, we evaluate the findings from the literature review, and we answer the questions stated in the introduction. In the first question, we were interested in what can be predicted in the cyber security domain. Although many use cases were proposed, they can be reduced to several main use cases, namely, attack projection and intent recognition, attack or intrusion prediction, and security situation forecasting. These were already described in detail in Section II. The remaining questions are summed up and answered in the following subsections. First, we sum up the practical implications, i.e., how ready the attack prediction methods are to effectively mitigate the attacks. Further, we take a closer look at the evaluation of predictions and forecasts in cyber security. A separate subsection is dedicated to metrics, as there appear to be several approaches to choosing a set of evaluation metrics. Finally, we sum up open and resolved problems in the field.

A. Practical implications
Regarding the practical implications, the prime issues are the accuracy and efficiency of predictions, but it is hard to evaluate and compare the methods. Even setting the right metrics is a problem, as we discuss further in this section. However, high prediction accuracy is a good indicator of a method's usability in practice. As we inferred in the literature review, there are many approaches that achieved high accuracies of over 90 % [26], [32], [15]. However, such results were obtained when evaluating the approaches over datasets. When we take a look at methods evaluated on live network traffic, the prediction accuracies drop to around 60–70 % [25], [33], [58], [63]. Some works show even worse results, which indicates that the prediction accuracy in practice is at the lower bounds.
Other practical aspects of predictions are the time criteria, namely the time needed to predict future events and the time that remains to the predicted event. While older works focused on the computational complexity of the prediction algorithms, field reports are scarce. However, modern approaches are implemented to operate in real time with minimal time delay [32], which effectively solves the problem. Nevertheless, there is a need to find out how much time there is to react to a predicted attack. Kholidy et al. [38], [39], [40] claim that they can predict an attack forthcoming in 39 minutes, which is a promising result that leaves enough time even for manual inspection of the predicted event. However, there are no other works using the same metrics.
There are two other major issues common to many methods: populating the knowledge base of the attacks and placing attack prediction at the most suitable level of abstraction [3]. First, attack prediction methods require either a library of attack plans completed by experts or a dataset of historical records, from which the attack plans might be constructed. Although both approaches are prone to errors and missing attack descriptions, the use of machine learning and data mining for model construction or direct prediction has prevailed in recent years. However, if an automatically found attack plan is going to be used in practice, one has to be careful to manually inspect the results. Second, it is computationally demanding to implement attack prediction at the network level, e.g., as part of an IDS. Working with alerts from IDS is much more scalable and flexible than working with packets or network
flows. Additionally, it is convenient to combine alerts from multiple IDS, e.g., a network-based and a host-based one, to get the complete trace of the attack. However, correlating alerts from heterogeneous sources adds additional complexity and stands as a research problem of its own [108].
Suthaharan [109] states that network intrusion detection and prediction are time-sensitive applications requiring highly efficient Big Data techniques to tackle the problem on the fly. Thus, it is proven that the data fall into the category of big data. However, a new definition of big data is provided based on three new parameters, cardinality, continuity, and complexity, instead of the traditional volume, variety, and velocity. Further, the suitability of machine learning for big data is discussed. Although methods based on Support Vector Machines provide excellent accuracies, they are not suitable for big data due to their computational complexity. Representational learning might be suitable for big data classification, but Machine Lifelong Learning is recommended.

B. Datasets
During the literature search, we encountered several datasets that were often used for evaluation of the proposed methods. The most popular datasets were produced by MIT Lincoln Labs and are generally recognized as the DARPA datasets [110], [111]. There are three distinct datasets available: DARPA 1998, DARPA 1999, and DARPA 2000. DARPA 2000 further contains two attack scenarios, LLDOS 1.0 and LLDOS 2.0.2; often only LLDOS 1.0 was used in attack prediction method evaluations. Although the dataset is popular and well documented, its main problem is its age; a dataset almost 20 years old does not reflect current cyber security threats and network traffic patterns.
ACM SIGKDD announced KDD Cup 1999 [112], a contest on knowledge discovery from cyber security data. In this contest, the DARPA 1998 dataset was used, although many authors referenced the dataset as the KDD 1999 dataset. The KDD Cup 1999 gained a lot of attention from numerous researchers on the problem of intrusion detection as well as attack prediction, thus allowing further comparisons of various methods. However, substantial flaws in the dataset were revealed in a thorough evaluation [113]. Thus, the dataset is now considered unreliable and even harmful by the community, although attempts to improve the dataset quality were made [114]. Still, the dataset is used even in recent works [9], [77].
Other public datasets are used scarcely; the researchers often crafted their own datasets [76] and evaluated their proposed methods using these data. Some data are obtained from real network traffic, which provides fresh data; nevertheless, it is quite problematic to publish such data due to the need for data anonymization. Another common option is to design a testbed [22], [41], [100], which is often laborious to set up, even if a proper description is provided. Thus, custom datasets and testbeds seem suitable for evaluating the proposed methods, but the reproducibility of such research is often disputable.
There is one more common problem related to many datasets used for evaluating methods of attack prediction, and that is that the datasets are not designed for the purpose of evaluating attack prediction. As Fava et al. stated back in 2008 [115], commonly known datasets, including the DARPA datasets, are crafted for intrusion detection and, thus, do not have the notion of attack tracks, i.e., there is no information available on the attackers' intentions or correlation of attack steps. Thus, we can only confirm the accuracy of predicting the next attacker's move, but we cannot confirm or discard the predicted attack plan.

C. Evaluation in live network
Evaluation of attack prediction in real-life scenarios is challenging. It is hazardous to let the adversary execute an intrusion in a real network only to evaluate the predictions. In large networks, it is also problematic to get access to every host that could be compromised. Nevertheless, several live data sources were used, such as the data from DShield [116], a collaborative database of firewall logs, and Hackmageddon [51], a compilation of cyber attack timelines and statistics. Very often, researchers set up a honeypot to capture cyber security data and use them to evaluate predictions. The main advantage of honeypots is that they typically contain only malicious data. However, they are not useful for studying advanced attackers for the purpose of attack intention recognition. Finally, darknets, large unassigned IP blocks, such as the CAIDA network telescope, were used for prediction on a global scale [84], [99].
In addition, the research on attack projection is often accompanied by research on deception and network traffic manipulation. The aim of deception in cyber security is to guide the adversary to the target of the defender's choice, typically a honeypot. Several researchers [117] continued their work on attack prediction by proposing a deception system, which prepares an attractive target for an attacker. For example, if an adversary is supposed to exploit a certain service, a honeypot emulating such a service is set up in the target network, either as a new target or as a clone of a real system. If the predictions are correct and the honeypot setup is quick enough, the attacker would exploit a honeypot, and the attack can be studied. Manipulating the terrain for the attacker was problematic mostly due to the need for rapid deployment of honeypots and movement of targets, as traffic manipulation was too costly. However, recent development in networking, namely in Software Defined Networks (SDN), allowed easy traffic manipulation. The emerging field of SDN thus began producing security-related frameworks focusing on early-stage attack mitigation and traffic redirection, e.g., diverting the attack traffic to a honeypot instead of the original target. AVANT-GUARD [118] is one of the early examples. Combining such a framework with attack prediction has been proposed recently [96], and we expect more work on this topic in the near future.

D. Metrics
Setting the metrics to evaluate and compare attack prediction methods is a challenging task. Naturally, we are interested in the prediction accuracy as a prime indicator, but that may rely on a given context and specific use case. In practical
setups, we encountered the time criteria, such as prediction to be created and maintained. Similarly, if a security situation
efficiency and the time remaining to the predicted event. is formally represented, there is a need to consider all the
Specific tasks, such as predictions based on specific attack factors contributing to it, which is not always straightforward.
traits, require specific metrics. In this section, we summarize Here we recapitulate minor problems which were successfully
and evaluate the metrics that are typically used in the literature. approached and which remain open.
The most important metric for evaluating prediction meth- An example of a successfully resolved problem is the gener-
ods is their accuracy. As we have seen in many surveyed ation and maintenance of attack models or attack plan libraries.
papers, the authors often include the accuracy as the percent- The first attack prediction methods depended on attack plan
age of successfully predicted events or situations. However, libraries that had to be populated by human experts. It was
accuracy can be understood broadly and not all the papers use tedious to formally represent all the possible attack paths and
it in a formal sense. Often, we can see confusion matrix as a if so, the model parameters, such as transition probabilities
more descriptive metric of a prediction method. The confusion in graph models, were hard to accurately be obtained. Often,
matrix is used for the evaluation of intrusion detection. Hence a model library built upon historical records were proposed,
it is natural to use it in to evaluate prediction in cyber security which enabled realistic model parameters but still required
as well. However, there are several issues with the use of laborious manual work by experts. However, the introduction
confusion matrices. First, all the elements can be obtained of data mining into the cyber security domain created a
when evaluating a method over an annotated dataset, but breakthrough for attack predictions. Using data mining, an
we can never be sure about the results when evaluating the attack plan library can be constructed automatically and con-
methods over live network data. Second, different methods tinuously updated. Data mining became especially popular for
may use different criteria for true and false positives and constructing graph-based models, for example [14], [15], [32],
negatives. For example, if a certain exploit is predicted to [37]. Data mining closely relates to machine learning, which
happen at a certain time on a specific host, but the attacker became another popular method to attack prediction. Machine
exploits another target or the time of the attack is significantly learning-supported methods do not need an external model
different, it is quite unclear whether we should consider as they construct their own internal representation of cyber
such events as true positives. Finally, in predictive analytics security events and predictive rules during the learning phase.
and other fields of research, precision and recall values are However, human experts still play a vital role in constructing
often used instead of the full confusion matrix, but calculated attack models and consulting the results [88]. Further, a current
from it. Precision is defined as tp/(tp + f p), while recall as research trend is using deep machine learning, which has not
tp/(tp + f n). Precision and recall are favored to prevent the been observed in the surveyed literature yet. We expect to see
accuracy paradox, i.e., a situation in which a predictive model deep learning-based prediction methods in cyber security in
with a given level of accuracy may have greater predictive the near future.
power than models with high accuracy. These metrics were Although the problems outlined earlier in this section have
often used to evaluate statistical methods and methods based been resolved, many other issues remain. The major issue
on machine learning, that we surveyed in the literature review. is how can prediction methods react to new trends in cyber
To sum up, even though many surveyed papers use similar security, e.g., novel attack vectors and security paradigms.
metrics, they are hardly comparable due different works going Even though we cannot effectively predict 0-day attacks, its
into different levels of details or using less formal definitions attack progression is typically similar to some of the known
of prediction accuracy. attacks, thus making the actual attack predictable to some
Time criteria were used for evaluation of attack prediction extent. However, how can we react to paradigm shifts and
methods by Kholidy et al., who measured the time difference novel attack vectors that arose with the development of the
between the prediction and the predicted attack [38], [39], Internet of Things (IoT), cyber-physical systems, software-
[40]. Thus, it is possible to estimate when is the attack going defined networking (SDN), and other current trends? Indeed,
to appear and how much time there is to prepare an appropriate the first attempts to predict attacks in these novel paradigms
defense. On the one hand, the time delay between individual have already been proposed [34], [96]. Nevertheless, it is
attack steps can be inferred from the history of attacks in most definitely interesting to see how we can adapt the general
of the attack prediction methods. On the other hand, the time methods to work under emerging paradigms in networking
criterion may be used as an indicator of the practical usability and security.
of a prediction method. Thus, the time criterion should be
considered especially by practitioners who require some time IX. C ONCLUSION & O UTLOOK
to react to a prediction. In this paper, we presented a literature survey of attack
prediction methods. The problem was set in a context of re-
search on intrusion detection and cyber situational awareness.
E. Open and Resolved Problems A taxonomy of methods was provided, and each category
In the introduction and the literature survey in Sections IV– was described in detail and evaluated. The final evaluation
VII, we have mentioned a number of problems associated compared the methods and discussed related problems and
with attack prediction and forecasting. Many of these problems lessons learned. Herein, we conclude our findings on the
were common to multiple attack prediction methods. For ex- theory and practice of attack prediction and suggest future
ample, if a method depends on an attack model, the model has events in the field.
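To make the precision and recall definitions from the Metrics subsection concrete, and to show the accuracy paradox they guard against, the following minimal Python sketch may help. It is an illustration only: the class name `ConfusionMatrix`, the two example predictors, and all counts are hypothetical and are not taken from any surveyed paper. It assumes an annotated dataset of 1000 time windows, of which 10 contain an attack.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionMatrix:
    """Outcome counts of a predictor over an annotated dataset (hypothetical helper)."""
    tp: int  # attacks that were predicted and did occur
    fp: int  # attacks that were predicted but did not occur
    fn: int  # attacks that occurred without being predicted
    tn: int  # windows without an attack and without a prediction

    def accuracy(self) -> float:
        total = self.tp + self.fp + self.fn + self.tn
        return (self.tp + self.tn) / total if total else 0.0

    def precision(self) -> float:
        # tp / (tp + fp); reported as 0.0 when nothing was predicted
        denom = self.tp + self.fp
        return self.tp / denom if denom else 0.0

    def recall(self) -> float:
        # tp / (tp + fn); reported as 0.0 when no attack occurred
        denom = self.tp + self.fn
        return self.tp / denom if denom else 0.0


# Assumed data: 1000 windows, 10 of them contain an attack.
never_alerts = ConfusionMatrix(tp=0, fp=0, fn=10, tn=990)    # predicts nothing at all
real_predictor = ConfusionMatrix(tp=8, fp=30, fn=2, tn=960)  # predicts 8 of 10 attacks

for name, cm in (("never_alerts", never_alerts), ("real_predictor", real_predictor)):
    print(f"{name}: accuracy={cm.accuracy():.3f} "
          f"precision={cm.precision():.3f} recall={cm.recall():.3f}")
```

Under these assumed counts, the predictor that never raises an alert reaches an accuracy of 0.990 with zero precision and recall, while the predictor that anticipates 8 of the 10 attacks has a lower accuracy of 0.968 but a recall of 0.8. Accuracy alone would therefore rank the useless predictor higher, which is why precision and recall are usually reported alongside it.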

Three important findings emerged from the literature review. First, many of the prediction methods in cyber security use a model to represent and project the future state of an attack or a security situation. Although there is an apparent division of the models given by their use case (attack projection more often uses discrete models, while forecasting the network security situation predominantly uses continuous models), the two main use cases often complement each other and overlap in many cases. Second, we have seen many new approaches based on data mining and machine learning, which substantially change the state of the research in cyber security predictions. Data mining resolves the dependence on artificially provided prediction models, while machine learning challenges the model-based approaches in general. Finally, we have encountered many problems related to the evaluation of predictions in cyber security. In the context of empirical datasets, popular datasets are old, unreliable, and created for other purposes, while evaluations in live networks are not reproducible. We do not even have a common set of metrics to compare the methods.

In the future, we are likely to see further improvements of attack prediction and its utilization in practice. Keeping in mind that attack prediction is one step behind intrusion detection, we outline a few directions in which the research is likely to proceed. First, a transition in processing the network data and alerts from batches to stream data processing has already started, and we may expect further utilization of Big Data analytics [119], [109]. Second, in the near future, we are going to see research on attack prediction in a collaborative environment, such as collaborative intrusion detection systems or alert sharing platforms. Predicting attacks in such an environment is a natural next step of the research in this area [120], [86]. Finally, we are going to see more and more data mining and machine learning in cyber security [89], and attack prediction is no exception. Specifically, we will know better whether machine learning alone can be used to learn about the attacks and predict them at the same time, or whether data mining and machine learning will be used only to learn about the attacks while the prediction still uses pattern matching.

To conclude this paper, the issue of attack prediction is an interesting research problem that has been approached many times by a number of researchers. Although many solutions have been proposed, there is still no definite answer on how to effectively and precisely predict cyber attacks. Attack prediction is not yet used in practice and is sometimes seen as rather misleading [121], but it is still an open, imperative, and desirable research problem [1], [3], [120].

REFERENCES

[1] A. Kott, Towards Fundamental Science of Cyber Security. New York, NY: Springer New York, 2014, pp. 1–13.
[2] R. A. Ahmadian and A. R. Ebrahimi, “A survey of IT early warning systems: architectures, challenges, and solutions,” Security and Communication Networks, vol. 9, no. 17, pp. 4751–4776.
[3] S. J. Yang, H. Du, J. Holsopple, and M. Sudit, Attack Projection. Cham: Springer International Publishing, 2014, pp. 239–261.
[4] A. A. Ahmed and N. A. K. Zaman, “Attack intention recognition: A review,” IJ Network Security, vol. 19, no. 2, pp. 244–250, 2017.
[5] M. Abdlhamed, K. Kifayat, Q. Shi, and W. Hurst, Intrusion Prediction Systems. Cham: Springer International Publishing, 2017, pp. 155–174.
[6] Y.-B. Leau and S. Manickam, Network Security Situation Prediction: A Review and Discussion. Berlin, Heidelberg: Springer Berlin Heidelberg, 2015, pp. 424–435.
[7] X. Wei and X. Jiang, “Comprehensive analysis of network security situational awareness methods and models,” in Instrumentation and Measurement, Sensor Network and Automation (IMSNA), 2013 2nd International Symposium on. IEEE, 2013, pp. 176–179.
[8] I. A. Gheyas and A. E. Abdallah, “Detection and prediction of insider threats to cyber security: a systematic literature review and meta-analysis,” Big Data Analytics, vol. 1, no. 1, p. 6, Aug 2016.
[9] M. Abdlhamed, K. Kifayat, Q. Shi, and W. Hurst, “A system for intrusion prediction in cloud computing,” in Proceedings of the International Conference on Internet of Things and Cloud Computing, ser. ICC ’16. New York, NY, USA: ACM, 2016, pp. 35:1–35:9.
[10] C. W. Geib and R. P. Goldman, “Plan recognition in intrusion detection systems,” in DARPA Information Survivability Conference & Exposition II, 2001. DISCEX ’01. Proceedings, vol. 1, 2001, pp. 46–55.
[11] T. Hughes and O. Sheyner, “Attack scenario graphs for computer network threat analysis and prediction,” Complexity, vol. 9, no. 2, pp. 15–18, 2003.
[12] X. Qin and W. Lee, “Attack plan recognition and prediction using causal networks,” in Computer Security Applications Conference, 2004. 20th Annual, Dec 2004, pp. 370–379.
[13] E. Bou-Harb, M. Debbabi, and C. Assi, “Cyber Scanning: A Comprehensive Survey,” Communications Surveys & Tutorials, IEEE, vol. 16, no. 3, pp. 1496–1519, 2013.
[14] Z. t. Li, J. Lei, L. Wang, and D. Li, “A data mining approach to generating network attack graph for intrusion prediction,” in Fuzzy Systems and Knowledge Discovery, 2007. FSKD 2007. Fourth International Conference on, vol. 4, Aug 2007, pp. 307–311.
[15] H. Farhadi, M. AmirHaeri, and M. Khansari, “Alert Correlation and Prediction Using Data Mining and HMM,” ISeCure, vol. 3, no. 2, 2011.
[16] A. Hernández, V. Sanchez, G. Sánchez, H. Pérez, J. Olivares, K. Toscano, M. Nakano, and V. Martinez, “Security attack prediction based on user sentiment analysis of Twitter data,” in 2016 IEEE International Conference on Industrial Technology (ICIT), March 2016, pp. 610–617.
[17] K. Shu, A. Sliva, J. Sampson, and H. Liu, “Understanding cyber attack behaviors with sentiment information on social media,” in Social, Cultural, and Behavioral Modeling. Cham: Springer International Publishing, 2018, pp. 377–388.
[18] P. Shao, J. Lu, R. K. Wong, and W. Yang, “A transparent learning approach for attack prediction based on user behavior analysis,” in Information and Communications Security. Cham: Springer International Publishing, 2016, pp. 159–172.
[19] M. R. Endsley, “Situation awareness global assessment technique (SAGAT),” in Aerospace and Electronics Conference, 1988. NAECON 1988., Proceedings of the IEEE 1988 National. IEEE, 1988, pp. 789–795.
[20] M. R. Endsley, “Toward a Theory of Situation Awareness in Dynamic Systems,” Human Factors, vol. 37, no. 1, pp. 32–64, 1995.
[21] A. Kott, C. Wang, and R. F. Erbacher, Cyber defense and situational awareness. Springer, 2014, vol. 62.
[22] C. J. Chung, P. Khatkar, T. Xing, J. Lee, and D. Huang, “NICE: Network Intrusion Detection and Countermeasure Selection in Virtual Network Systems,” IEEE Transactions on Dependable and Secure Computing, vol. 10, no. 4, pp. 198–211, July 2013.
[23] I. Kotenko and A. Chechulin, “A cyber attack modeling and impact assessment framework,” in 2013 5th International Conference on Cyber Conflict (CYCON 2013), June 2013, pp. 1–24.
[24] P. Cao, K.-w. Chung, Z. Kalbarczyk, R. Iyer, and A. J. Slagell, “Preemptive intrusion detection,” in Proceedings of the 2014 Symposium and Bootcamp on the Science of Security, ser. HotSoS ’14. New York, NY, USA: ACM, 2014, pp. 21:1–21:2.
[25] P. Cao, E. Badger, Z. Kalbarczyk, R. Iyer, and A. Slagell, “Preemptive Intrusion Detection: Theoretical Framework and Real-world Measurements,” in Proceedings of the 2015 Symposium and Bootcamp on the Science of Security, ser. HotSoS ’15. New York, NY, USA: ACM, 2015, pp. 5:1–5:12.
[26] A. A. Ramaki, M. Amini, and R. E. Atani, “RTECA: Real time episode correlation algorithm for multi-step attack scenarios detection,” Computers & Security, vol. 49, no. Supplement C, pp. 206–219, 2015.
[27] M. GhasemiGol, A. Ghaemi-Bafghi, and H. Takabi, “A comprehensive approach for network attack forecasting,” Computers & Security, vol. 58, pp. 83–105, 2016.

[28] M. GhasemiGol, H. Takabi, and A. Ghaemi-Bafghi, “A foresight model for intrusion response management,” Computers & Security, vol. 62, pp. 73–94, 2016.
[29] N. Polatidis, E. Pimenidis, M. Pavlidis, and H. Mouratidis, “Recommender systems meeting security: From product recommendation to cyber-attack prediction,” in Engineering Applications of Neural Networks. Cham: Springer International Publishing, 2017, pp. 508–519.
[30] N. Polatidis, E. Pimenidis, M. Pavlidis, S. Papastergiou, and H. Mouratidis, “From product recommendation to cyber-attack prediction: generating attack graphs and predicting future attacks,” Evolving Systems, May 2018.
[31] J. Wu, L. Yin, and Y. Guo, “Cyber Attacks Prediction Model Based on Bayesian Network,” in Parallel and Distributed Systems (ICPADS), 2012 IEEE 18th International Conference on, Dec 2012, pp. 730–731.
[32] A. A. Ramaki, M. Khosravi-Farmad, and A. G. Bafghi, “Real time alert correlation and prediction using Bayesian networks,” in Information Security and Cryptology (ISCISC), 2015 12th International Iranian Society of Cryptology Conference on. IEEE, 2015, pp. 98–103.
[33] A. Okutan, S. J. Yang, and K. McConky, “Predicting Cyber Attacks with Bayesian Networks Using Unconventional Signals,” in Proceedings of the 12th Annual Conference on Cyber and Information Security Research, ser. CISRC ’17. ACM, 2017, pp. 13:1–13:4.
[34] K. Huang, C. Zhou, Y. C. Tian, S. Yang, and Y. Qin, “Assessing the physical impact of cyberattacks on industrial cyber-physical systems,” IEEE Transactions on Industrial Electronics, vol. 65, no. 10, pp. 8153–8162, Oct 2018.
[35] A. S. Sendi, M. Dagenais, and M. Jabbarifar, “Real Time Intrusion Prediction based on Optimized Alerts with Hidden Markov Model,” Journal of Networks, vol. 7, no. 2, 2012.
[36] S. Shin, S. Lee, H. Kim, and S. Kim, “Advanced probabilistic approach for network intrusion forecasting and detection,” Expert Systems with Applications, vol. 40, no. 1, pp. 315–322, 2013.
[37] Y. Zhang, D. Zhao, and J. Liu, “The Application of Baum-Welch Algorithm in Multistep Attack,” The Scientific World Journal, vol. 2014, 2014.
[38] H. A. Kholidy, A. Erradi, and S. Abdelwahed, “Attack Prediction Models for Cloud Intrusion Detection Systems,” in Artificial Intelligence, Modelling and Simulation (AIMS), 2014 2nd International Conference on, Nov 2014, pp. 270–275.
[39] H. A. Kholidy, A. Erradi, S. Abdelwahed, and A. Azab, “A Finite State Hidden Markov Model for Predicting Multistage Attacks in Cloud Systems,” in Dependable, Autonomic and Secure Computing (DASC), 2014 IEEE 12th International Conference on, Aug 2014, pp. 14–19.
[40] H. A. Kholidy, A. M. Yousof, A. Erradi, S. Abdelwahed, and H. A. Ali, “A Finite Context Intrusion Prediction Model for Cloud Systems with a Probabilistic Suffix Tree,” in Modelling Symposium (EMS), 2014 European, Oct 2014, pp. 526–531.
[41] S. Abraham and S. Nair, “Exploitability analysis using predictive cybersecurity framework,” in 2015 IEEE 2nd International Conference on Cybernetics (CYBCONF), June 2015, pp. 317–323.
[42] A. Bar, B. Shapira, L. Rokach, and M. Unger, “Identifying Attack Propagation Patterns in Honeypots Using Markov Chains Modeling and Complex Networks Analysis,” in Software Science, Technology and Engineering (SWSTE), 2016 IEEE International Conference on. IEEE, 2016, pp. 28–36.
[43] A. Bar, B. Shapira, L. Rokach, and M. Unger, “Scalable attack propagation model and algorithms for honeypot systems,” in 2016 IEEE International Conference on Big Data (Big Data), Dec 2016, pp. 1130–1135.
[44] V. Lisý, R. Píbil, J. Stiborek, B. Bošanský, and M. Pěchouček, “Game-theoretic Approach to Adversarial Plan Recognition,” in ECAI, 2012, pp. 546–551.
[45] R. Píbil, V. Lisý, C. Kiekintveld, B. Bošanský, and M. Pěchouček, “Game theoretic model of strategic honeypot selection in computer networks,” in Decision and Game Theory for Security. Springer, 2012, pp. 201–220.
[46] C. Phillips and L. P. Swiler, “A graph-based system for network-vulnerability analysis,” in Proceedings of the 1998 Workshop on New Security Paradigms, ser. NSPW ’98. New York, NY, USA: ACM, 1998, pp. 71–79.
[47] O. Sheyner, J. Haines, S. Jha, R. Lippmann, and J. M. Wing, “Automated generation and analysis of attack graphs,” in Security and privacy, 2002. Proceedings. 2002 IEEE Symposium on. IEEE, 2002, pp. 273–284.
[48] H. Debar and A. Wespi, “Aggregation and correlation of intrusion-detection alerts,” in International Workshop on Recent Advances in Intrusion Detection. Springer, 2001, pp. 85–103.
[49] N. Polatidis and C. K. Georgiadis, “A multi-level collaborative filtering method that improves recommendations,” Expert Systems with Applications, vol. 48, pp. 100–110, 2016.
[50] N. Poolsappasit, R. Dewri, and I. Ray, “Dynamic Security Risk Management Using Bayesian Attack Graphs,” IEEE Transactions on Dependable and Secure Computing, vol. 9, no. 1, pp. 61–74, Jan 2012.
[51] P. Passeri. (2017) Hackmageddon Information Security Timelines and Statistics. [Online]. Available: http://www.hackmageddon.com/
[52] J. Nash, “Non-cooperative games,” Annals of mathematics, pp. 286–295, 1951.
[53] V. Conitzer and T. Sandholm, “Complexity Results About Nash Equilibria,” in Proceedings of the 18th International Joint Conference on Artificial Intelligence, ser. IJCAI’03. San Francisco, CA, USA: Morgan Kaufmann Publishers Inc., 2003, pp. 765–771.
[54] S. C. Kontogiannis and P. G. Spirakis, “Well supported approximate equilibria in bimatrix games,” Algorithmica, vol. 57, no. 4, pp. 653–667, 2010.
[55] H. Tsaknakis and P. Spirakis, “An optimization approach for approximate nash equilibria,” Internet and Network Economics, pp. 42–56, 2007.
[56] H. Park, S.-O. D. Jung, H. Lee, and H. P. In, “Cyber Weather Forecasting: Forecasting Unknown Internet Worms Using Randomness Analysis,” in Information Security and Privacy Research. Berlin, Heidelberg: Springer Berlin Heidelberg, 2012, pp. 376–387.
[57] Z. Zhan, M. Xu, and S. Xu, “Characterizing Honeypot-Captured Cyber Attacks: Statistical Framework and Case Study,” IEEE Transactions on Information Forensics and Security, vol. 8, no. 11, pp. 1775–1789, Nov 2013.
[58] A. Silva, E. Pontes, F. Zhou, A. Guelf, and S. Kofuji, “PRBS/EWMA based model for predicting burst attacks (Brute Froce, DoS) in computer networks,” in Ninth International Conference on Digital Information Management (ICDIM 2014), Sept 2014, pp. 194–200.
[59] A. B. Abdullah, T. R. Pillai, and L. Z. Cai, “Intrusion detection forecasting using time series for improving cyber defence,” International Journal of Intelligent Systems and Applications in Engineering, vol. 3, no. 1, pp. 28–33, 2015.
[60] T. R. Pillai, S. Palaniappan, A. Abdullah, and H. M. Imran, “Predictive modeling for intrusions in communication systems using GARMA and ARMA models,” in 2015 5th National Symposium on Information Technology: Towards New Smart World (NSITNSW), Feb 2015.
[61] J. Freudiger, E. De Cristofaro, and A. E. Brito, Controlled Data Sharing for Collaborative Predictive Blacklisting. Cham: Springer International Publishing, 2015, pp. 327–349.
[62] Y.-Z. Chen, Z.-G. Huang, S. Xu, and Y.-C. Lai, “Spatiotemporal patterns and predictability of cyberattacks,” PLOS ONE, vol. 10, no. 5, pp. 1–19, May 2015.
[63] Z. Zhan, M. Xu, and S. Xu, “Predicting cyber attack rates with extreme values,” IEEE Transactions on Information Forensics and Security, vol. 10, no. 8, pp. 1666–1677, Aug 2015.
[64] P. Sokol and A. Gajdoš, Prediction of Attacks Against Honeynet Based on Time Series Modeling. Cham: Springer International Publishing, 2018, pp. 360–371.
[65] G. Werner, S. Yang, and K. McConky, “Time series forecasting of cyber attack intensity,” in Proceedings of the 12th Annual Conference on Cyber and Information Security Research, ser. CISRC ’17. New York, NY, USA: ACM, 2017, pp. 18:1–18:3.
[66] S. Dowling, M. Schukat, and H. Melvin, “Using analysis of temporal variances within a honeypot dataset to better predict attack type probability,” in 2017 12th International Conference for Internet Technology and Secured Transactions (ICITST), Dec 2017, pp. 349–354.
[67] A. Okutan, G. Werner, K. McConky, and S. J. Yang, “POSTER: Cyber Attack Prediction of Threats from Unconventional Resources (CAPTURE),” in Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security, ser. CCS ’17. New York, NY, USA: ACM, 2017, pp. 2563–2565.
[68] Z. Lin, L. Xiujie, M. Jing, S. Wenchang, and W. Xiufang, “The prediction algorithm of network security situation based on grey correlation entropy Kalman filtering,” in Information Technology and Artificial Intelligence Conference (ITAIC), 2014 IEEE 7th Joint International, Dec 2014, pp. 321–324.
[69] Y.-B. Leau and S. Manickam, “A Novel Adaptive Grey Verhulst Model for Network Security Situation Prediction,” International Journal of Advanced Computer Science & Applications, vol. 1, no. 7, pp. 90–95, 2016.
[70] Y.-B. Leau and S. Manickam, “An Enhanced Adaptive Grey Verhulst Prediction Model for Network Security Situation,” International Journal of Computer Science and Network Security (IJCSNS), vol. 16, no. 5, p. 13, 2016.

[71] Y. Liu, A. Sarabi, J. Zhang, P. Naghizadeh, M. Karir, M. Bailey, and M. Liu, “Cloudy with a Chance of Breach: Forecasting Cyber Security Incidents,” in USENIX Security Symposium, 2015, pp. 1009–1024.
[72] D. Ju-Long, “Control problems of grey systems,” Systems & Control Letters, vol. 1, no. 5, pp. 288–294, 1982.
[73] F.-s. Zhang, F. Liu, W.-b. Zhao, Z.-a. SUN, and G.-y. JIANG, “Application of grey verhulst model in middle and long term load forecasting,” Power System Technology, vol. 5, pp. 37–40, 2003.
[74] R. Zheng, D. Zhang, Q. Wu, M. Zhang, and C. Yang, “A strategy of network security situation autonomic awareness,” in Network Computing and Information Security. Springer, 2012, pp. 632–639.
[75] F. Chen, Y. Shen, G. Zhang, and X. Liu, “The network security situation predicting technology based on the small-world echo state network,” in Software Engineering and Service Science (ICSESS), 2013 4th IEEE International Conference on. IEEE, 2013, pp. 377–380.
[76] Y. Zhang, S. Jin, X. Cui, X. Yin, and Y. Pang, Network Security Situation Prediction Based on BP and RBF Neural Network. Berlin, Heidelberg: Springer Berlin Heidelberg, 2013, pp. 659–665.
[77] W. Xing-zhu, “Network Intrusion Prediction Model based on RBF Features Classification,” International Journal of Security and Its Applications, vol. 10, no. 4, pp. 241–248, 2016.
[78] H. Zhang, Q. Huang, F. Li, and J. Zhu, “A network security situation prediction model based on wavelet neural network with optimized parameters,” Digital Communications and Networks, vol. 2, no. 3, pp. 139–144, 2016, advances in Big Data.
[79] F. He, Y. Zhang, D. Liu, Y. Dong, C. Liu, and C. Wu, “Mixed Wavelet-Based Neural Network Model for Cyber Security Situation Prediction Using MODWT and Hurst Exponent Analysis,” in Network and System Security. Cham: Springer International Publishing, 2017, pp. 99–111.
[80] X. Cheng and S. Lang, “Research on network security situation assessment and prediction,” in Computational and Information Sciences (ICCIS), 2012 Fourth International Conference on. IEEE, 2012, pp. 864–867.
[81] G. K. Jayasinghe, J. S. Culpepper, and P. Bertok, “Efficient and effective realtime prediction of drive-by download attacks,” Journal of Network and Computer Applications, vol. 38, pp. 135–149, 2014.
[82] S. O. Uwagbole, W. J. Buchanan, and L. Fan, “Applied Machine Learning predictive analytics to SQL Injection Attack detection and prevention,” in 2017 IFIP/IEEE Symposium on Integrated Network and Service Management (IM), May 2017, pp. 1087–1090.
[83] S. O. Uwagbole, W. J. Buchanan, and L. Fan, “An applied pattern-driven corpus to predictive analytics in mitigating SQL injection attack,” in 2017 Seventh International Conference on Emerging Security Technologies (EST), Sept 2017, pp. 12–17.
[84] C. Fachkha, E. Bou-Harb, A. Boukhtouta, S. Dinh, F. Iqbal, and M. Debbabi, “Investigating the dark cyberspace: Profiling, threat-based analysis and correlation,” in 2012 7th International Conference on Risks and Security of Internet and Systems (CRiSIS), Oct 2012.
[85] Y.-H. Kim and W. H. Park, “A study on cyber threat prediction based on intrusion detection event for apt attack detection,” Multimedia Tools and Applications, vol. 71, no. 2, pp. 685–698, Jul 2014.
[86] M. Husák and J. Kašpar, “Towards Predicting Cyber Attacks Using Information Exchange and Data Mining,” in Proceedings of 2018 International Wireless Communications and Mobile Computing Conference (IWCMC), 2018, (to appear).
[87] K. Soska and N. Christin, “Automatically detecting vulnerable websites before they turn malicious,” in USENIX Security Symposium, 2014, pp. 625–640.
[88] K. Veeramachaneni, I. Arnaldo, V. Korrapati, C. Bassias, and K. Li, “AI2: Training a Big Data Machine to Defend,” in 2016 IEEE 2nd International Conference on Big Data Security on Cloud (BigDataSecurity), IEEE International Conference on High Performance and Smart Computing (HPSC), and IEEE International Conference on Intelligent Data and Security (IDS), April 2016, pp. 49–54.
[89] A. L. Buczak and E. Guven, “A survey of data mining and machine learning methods for cyber security intrusion detection,” IEEE Communications Surveys Tutorials, vol. 18, no. 2, pp. 1153–1176, Secondquarter 2016.
[90] J.-B. Lai, H.-Q. Wang, X.-W. Liu, Y. Liang, R.-J. Zheng, and G.-S. Zhao, “Wnn-based network security situation quantitative prediction method and its optimization,” Journal of computer science and technology, vol. 23, no. 2, pp. 222–230, 2008.
[91] A. Jantan, M. Rasmi, M. I. Ibrahim, and A. H. A. Rahman, A Similarity Model to Estimate Attack Strategy Based on Intentions Analysis for Network Forensics. Berlin, Heidelberg: Springer Berlin Heidelberg, 2012, pp. 336–346.
[92] M. Rasmi and A. Jantan, “A new algorithm to estimate the similarity between the intentions of the cyber crimes for network forensics,” Procedia Technology, vol. 11, no. Supplement C, pp. 540–547, 2013, 4th International Conference on Electrical Engineering and Informatics, ICEEI 2013.
[93] A. Aleroud and G. Karabatis, “Context Infusion in Semantic Link Networks to Detect Cyber-attacks: A Flow-Based Detection Approach,” in 2014 IEEE International Conference on Semantic Computing, June 2014, pp. 175–182.
[94] A. AlEroud and G. Karabatis, “Methods and techniques to identify security incidents using domain knowledge and contextual information,” in 2017 IFIP/IEEE Symposium on Integrated Network and Service Management (IM), May 2017, pp. 1040–1045.
[95] C.-B. Jiang, I. Liu, Y.-N. Chung, J.-S. Li et al., “Novel intrusion prediction mechanism based on honeypot log similarity,” International Journal of Network Management, 2016.
[96] A. AlEroud and I. Alsmadi, “Identifying cyber-attacks on software defined networks: An inference-based intrusion detection approach,” Journal of Network and Computer Applications, vol. 80, pp. 152–164, 2017.
[97] D. Kwon, J. W.-K. Hong, and H. Ju, “DDoS attack forecasting system architecture using Honeynet,” in Network Operations and Management Symposium (APNOMS), 2012 14th Asia-Pacific, Sept 2012, pp. 1–4.
[98] D. Kwon, H. Kim, D. An, and H. Ju, “DDoS Attack Volume Forecasting Using a Statistical Approach,” in TODO, 2017.
[99] C. Fachkha, E. Bou-Harb, and M. Debbabi, “Towards a Forecasting Model for Distributed Denial of Service Activities,” in Network Computing and Applications (NCA), 2013 12th IEEE International Symposium on, Aug 2013, pp. 110–117.
[100] A. Olabelurin, S. Veluru, A. Healing, and M. Rajarajan, “Entropy clustering approach for improving forecasting in DDoS attacks,” in Networking, Sensing and Control (ICNSC), 2015 IEEE 12th International Conference on, April 2015, pp. 315–320.
[101] G.-Y. Hu, Z.-J. Zhou, B.-C. Zhang, X.-J. Yin, Z. Gao, and Z.-G. Zhou, “A method for predicting the network security situation based on hidden BRB model and revised CMA-ES algorithm,” Applied Soft Computing, vol. 48, pp. 404–418, 2016.
[102] G. Y. Hu and P. L. Qiao, “Cloud Belief Rule Base Model for Network Security Situation Prediction,” IEEE Communications Letters, vol. 20, no. 5, pp. 914–917, May 2016.
[103] H. Wei, G. Hu, X. Han, P. Qiao, Z. Zhou, Z. Feng, and X. Yin, “A New BRB Model for Cloud Security-state Prediction based on the Large-scale Monitoring Data,” IEEE Access, 2017.
[104] D. Mahjoub and T. Mathew, “SPRank and IP Space Monitoring at BruCON & Hack.lu,” https://umbrella.cisco.com/blog/2015/11/19/sprank-and-ip-space-monitoring/, 2015.
[105] A. Dalton, B. Dorr, L. Liang, and K. Hollingshead, “Improving cyber-attack predictions through information foraging,” in 2017 IEEE International Conference on Big Data (Big Data), Dec 2017, pp. 4642–4647.
[106] M. Rasmi and A. Jantan, Attack Intention Analysis Model for Network Forensics. Berlin, Heidelberg: Springer Berlin Heidelberg, 2011, pp. 403–411.
[107] H. Wei, G.-Y. Hu, Z.-J. Zhou, P.-L. Qiao, Z.-G. Zhou, and Y.-M. Zhang, “A new BRB model for security-state assessment of cloud computing based on the impact of external and internal environments,” Computers & Security, vol. 73, pp. 207–218, 2018.
[108] H. T. Elshoush and I. M. Osman, “Alert correlation in collaborative intelligent intrusion detection systems – a survey,” Applied Soft Computing, vol. 11, no. 7, pp. 4349–4365, 2011, soft Computing for Information System Security.
[109] S. Suthaharan, “Big data classification: Problems and challenges in network intrusion prediction with machine learning,” SIGMETRICS Perform. Eval. Rev., vol. 41, no. 4, pp. 70–73, Apr. 2014.
[110] R. P. Lippmann, D. J. Fried, I. Graf, J. W. Haines, K. R. Kendall, D. McClung, D. Weber, S. E. Webster, D. Wyschogrod, R. K. Cunningham, and M. A. Zissman, “Evaluating intrusion detection systems: the 1998 DARPA off-line intrusion detection evaluation,” in DARPA Information Survivability Conference and Exposition, 2000. DISCEX ’00. Proceedings, vol. 2, 2000, pp. 12–26.
[111] MIT Lincoln Laboratory. DARPA Intrusion Detection Data Sets. [Online]. Available: https://www.ll.mit.edu/ideval/data/
[112] The UCI KDD Archive. (1999, Oct.) KDD Cup 1999 Data. [Online]. Available: http://kdd.ics.uci.edu/databases/kddcup99/kddcup99.html
[113] M. Mahoney and P. Chan, “An analysis of the 1999 DARPA/Lincoln Laboratory evaluation data for network anomaly detection,” in Recent advances in intrusion detection. Springer, 2003, pp. 220–237.
[114] M. Tavallaee, E. Bagheri, W. Lu, and A. A. Ghorbani, “A detailed analysis of the KDD CUP 99 data set,” in 2009 IEEE Symposium on Computational Intelligence for Security and Defense Applications, July 2009.
[115] D. S. Fava, S. R. Byers, and S. J. Yang, “Projecting Cyberattacks
Through Variable-Length Markov Models,” IEEE Transactions on
Information Forensics and Security, vol. 3, no. 3, pp. 359–369, Sept
2008.
[116] SANS. (2017) DShield: Internet Storm Center. [Online]. Available:
https://www.dshield.org/
[117] M. Albanese, E. Battista, S. Jajodia, and V. Casola, “Manipulating the
attacker’s view of a system’s attack surface,” in 2014 IEEE Conference
on Communications and Network Security, Oct 2014, pp. 472–480.
[118] S. Shin, V. Yegneswaran, P. Porras, and G. Gu, “AVANT-GUARD:
Scalable and Vigilant Switch Flow Management in Software-defined
Networks,” in Proceedings of the 2013 ACM SIGSAC Conference on
Computer & Communications Security, ser. CCS ’13. New York, NY,
USA: ACM, 2013, pp. 413–424.
[119] A. Kott, A. Swami, and P. McDaniel, “Security Outlook: Six Cyber
Game Changers for the Next 15 Years,” Computer, vol. 47, no. 12, pp.
104–106, Dec 2014.
[120] E. Vasilomanolakis, S. Karuppayah, M. Mühlhäuser, and M. Fischer,
“Taxonomy and survey of collaborative intrusion detection,” ACM
Comput. Surv., vol. 47, no. 4, pp. 55:1–55:33, May 2015.
[121] A. Chuvakin, “Sad hilarity of predictive analytics in security?”
http://blogs.gartner.com/anton-chuvakin/2016/03/31/sad-hilarity-of-predictive-analytics-in-security/,
March 2016, Published: 2016-03-31, Accessed 2017-07-15.

Martin Husák is a researcher at the Institute of Computer Science at
Masaryk University, a member of the university’s security team (CSIRT-MU),
and a contributor to The Honeynet Project. He is currently pursuing a Ph.D.
in Computer Systems and Technology, and his thesis addresses the problem of
early detection and mitigation of network attacks. His research interests
are related to honeypots, network monitoring, intrusion detection, and
cyber situational awareness.

Jana Komárková is a researcher at the Institute of Computer Science at
Masaryk University and a member of the university’s security team
(CSIRT-MU). Her main research interests are cyber defence, attack impact
assessment, attack mitigation, and cyber situational awareness. She is
currently pursuing a Ph.D. in Computer Systems and Technology; the topic of
her thesis is decision support in network defence.

Elias Bou-Harb is currently an Assistant Professor at the computer science
department at Florida Atlantic University, where he directs the Cyber
Threat Intelligence Laboratory. Previously, he was a visiting research
scientist at Carnegie Mellon University. Elias is also a research scientist
at the National Cyber Forensic and Training Alliance (NCFTA) of Canada.
Elias holds a Ph.D. degree in computer science from Concordia University,
Montreal, Canada. His research and development activities and interests
focus on the broad area of operational cyber security, including attack
detection and characterization, Internet measurements, cyber security for
critical infrastructure, and big data analytics.

Pavel Čeleda is an associate professor affiliated with the Institute of
Computer Science at Masaryk University in Brno. He received a Ph.D. degree
in Informatics from the University of Defence, Brno. His main research
interests include cyber security, flow monitoring, situational awareness,
and research and development of network security devices. He has
participated in a number of academic, industry, and defense projects. He is
the head of CSIRT-MU.
