Data Analytics for Cybersecurity
VANDANA P. JANEJA
University of Maryland, Baltimore County (UMBC)
www.cambridge.org
Information on this title: www.cambridge.org/9781108415279
DOI: 10.1017/9781108231954
© Vandana P. Janeja 2022
This publication is in copyright. Subject to statutory exception
and to the provisions of relevant collective licensing agreements,
no reproduction of any part may take place without the written
permission of Cambridge University Press.
First published 2022
A catalogue record for this publication is available from the British Library.
ISBN 978-1-108-41527-9 Hardback
1. Workforce Framework for Cybersecurity (NICE Framework), NIST Special Publication 800-181 Revision 1, https://nvlpubs.nist.gov/nistpubs/SpecialPublications/NIST.SP.800-181r1.pdf, last accessed May 2021.
2. Workforce Framework for Cybersecurity (NICE Framework), https://niccs.cisa.gov/workforce-development/cyber-security-workforce-framework, last accessed May 2021.
3. Information Assurance Workforce Improvement Program, DoD 8570.01-M, www.esd.whs.mil/Portals/54/Documents/DD/issuances/dodm/857001m.pdf; 8140.01 reissues and renumbers DoDD 8570.01. Last accessed May 2021.
The views expressed in this book are my own and do not reflect those of the
organizations I am affiliated with.
I want to thank Carole Sargent, who planted the seed for this book and
helped me learn the ropes of the process. Many thanks to my editor at
Cambridge University Press, Lauren Cowles, and the Cambridge University
Press team, Amy He, Adam Kratoska, Johnathan Fuentes, Rebecca Grainger,
Andy Saff, and Neena S. Maheen, who were patient with the many deadlines
and generous in their time reviewing the book and its many facets. Many
thanks to the reviewers of this book who helped improve it with their
detailed feedback.
The work in this book is built on the foundation of my work with many of
my students, and the many conversations with them have inspired the writing
of this book. I would like to thank my PhD advisees, Josephine Namayanja,
Chhaya Kulkarni, Sara Khanjani, Ira Winkler, Faisal Quader, Mohammad
Alodadi, Ali Azari, Yanan Sun, Lei Shi, and Mike McGuire; my master’s
advisees, Henanksha Sainani, Prathamesh Walkikar, Suraksha Shukla,
Vasundhara Misal, Anuja Kench, Akshay Grover, Jay Gholap, Javad Zabihi,
Kundala Das, Prerna Mohod, Abdulrahman Alothaim, and Revathi
Palanisamy; my graduate students who worked on research projects with me,
Yuanyuan Feng, Ruman Tambe, Song Chen, Sandipan Dey, Tania Lobo,
Monish Advani, Ahmed Aleroud, Abu Zaher Md Faridee, Sai Pallaprolu,
and Justin Stauffer; and my undergraduate research students, Gabrielle
Watson, Olisaemeka Okolo, Aminat Alabi, Mackenzie Harwood, Adrian
Reyes, Jarrett Early, Alfonso Delayney, David Lewis, and Brian Lewis.
My thought process and contributions are also framed by work with
advisors, collaborators, colleagues, and industry affiliates who have contrib-
uted to my research and thinking in the area of security and data analytics,
including Lucy Erickson, Carolyn Seaman, Aryya Gangopadhyay, Anupam
However, this also makes the device vulnerable since it has external connect-
ivity and access is fragmented through multiple external applications.
Now the obvious question is, does cybersecurity apply to connected devices
only? Although a big part of cybersecurity is a result of the high level of
connectivity, it also includes threats resulting from compromised physical
security. For instance, a highly secure desktop in a secure room, which is
not accessible via the traditional internet but only through a biometric authen-
tication or a card swipe, is also at risk. Unauthorized access to the secure
room poses a cybersecurity threat due to the electronic assets at risk and
potential risk to the systems providing access to the room. This is primarily
because some form of connectivity is always possible in this highly connected
world. Another example is that of an insider threat where an authorized user
accesses materials with a malicious intent.
2020), Digital Attack Map depicting top distributed denial of service (DDoS)
attacks worldwide (Netscout 2020), and Kaspersky Lab’s 3D interactive map
of cyberattacks (Kaspersky 2020). These are helpful in visualizing the spread
of types of attacks and level of activity in a region.
Risks:
• Internet protocol that is inherently not secure
• Applications with several dependencies
• Logical errors in software code (e.g., Heartbleed)
• Organizational risks (multiple partners, e.g., Target, PNNL)
• Lack of user awareness of cybersecurity risks (e.g., social engineering, phishing)
• Personality traits of individuals using the systems

Motivations:
• To steal intellectual property
• To damage reputation
• To gain access to data, which can then be sold
• To gain access to information, which is not generally available
• To make a political statement
• To impede access to critical data and applications
• To make a splash / for fun
Attaining security:
• Protecting resources
• Hardening defenses
• Capturing data logs
• Monitoring systems
• Tracing the attacks
• Predicting risks
• Predicting attacks
• Identifying vulnerabilities
becoming more and more sophisticated. Such large numbers are difficult to
estimate, and the report outlines a sound strategy for reaching the
estimated loss.
The loss due to cyberattacks is not simply based on direct financial loss but
also based on several indirect factors that may lead to a major financial impact.
As an example, let us consider the Target cyberattack. According to a Reuters
news article (Skariachan and Finkle 2014), Target reported $61 million in
expenses related to the cyberattack out of which $44 million was covered by
insurance. Thus, the direct financial impact to Target was $17 million. Now let
us consider some other factors that led to the indirect loss for Target: there was
a 46% drop in net profit in the holiday quarter and a 5.5% drop in transactions
during the quarter, share price fluctuations led to further losses, cards had to be
reissued to several customers, and Target had to offer identity protection to
affected customers. All these losses amount to much more than the total $61
million loss. In addition, the trust of the customers was lost, which is not a
quantifiable loss and has long-term impacts.
Network security: This deals with the challenges faced in securing
traditional computer networks and the security measures adopted to
prevent unauthorized access to and misuse of either public or
private networks.
Cyberphysical security: This focuses on the emerging challenges due to the
coupling of the cyber systems with the physical systems. For example,
the power plants being controlled by a cyber system present new security
challenges that can arise due to the risk of disruption of the cyber
component or risk of unauthorized control of the cyber system, thus
gaining control of the physical systems.
Data analytics: This crosscutting theme can apply to each of these areas to
learn from existing threats and develop solutions for novel and unknown
threats toward networks, infrastructure, data, and information. Threat
hunting (Sadhwani 2020), which proactively looks for malicious players
across the myriad data sources in an organization, is a direct application
of using data analytics on the various types of security data produced
from the various types of security streams. However, this does not
necessarily have to be a completely machine-driven process and should
account for user behaviors as well (Shashanka et al. 2016), looking at the
operational context. Data analytics can provide security analysts a much
more focused field of vision to zero in on solutions for potential threats.
Cybersecurity affects many different types of devices, networks, and organ-
izations. Each poses different types of challenges to secure. It is important to
understand and differentiate between these challenges due to the changing
hardware and software landscape across each type of cybersecurity domain.
While hardware configurations and types of connectivity are out of scope for
this book, it is important to understand some of the fundamental challenges to
study their impact and how they can be addressed using techniques such as
data analytics, which is crosscutting across the many different types of con-
nectivity since data are ubiquitous across all these domains. In the current
connected environment, multiple types of networks and devices are used, such
as computer networks, cyberphysical systems (CPS), Internet of Things (IoT),
sensor networks, smart grids, and wired or wireless networks. To some extent,
IoT and sensor networks can be seen as specialized cases of CPS. We
can consider these systems to study how data analytics can contribute to
cybersecurity since any of these types of systems will generate data, which
can be evaluated to understand their functioning and any potential benign or
nonbenign malfunctions.
Computer networks are the most traditional type of networks where groups
of computers are connected in prespecified configurations. These configur-
ations can be designed using security policies that decide who has access to
which areas of the network. Another way networks form is by determining patterns of
use over a period of time. In both cases, zones can be created for access and
connectivity where each computer in the network and subnetworks can
be monitored.
Cyberphysical systems are an amalgamation of two interacting subsystems,
cyber and physical, that are used to monitor a function. Cyberphysical systems
are used to monitor and perform the day-to-day functions of the many auto-
mated systems that we rely on, including power stations, chemical factories,
and nuclear power plants, to name a few.
With the ubiquitous advent of connected technology, many “smart” things
are being introduced into our connected environment. A connected network of
such smart things has led to the evolution of the Internet of Things. IoT has
become increasingly pervasive, from our smart homes to hospitals.
These new types of connected systems bring about new challenges in
securing them from attacks with malicious intent to disrupt their day-to-
day functioning.
Throughout this book, we will use examples from various types of such
connected systems to illustrate how data analytics can facilitate cybersecurity.
The large amount of data collected has led to the “big data” revolution. This
is also the case in the domain of cybersecurity. Big data (Manyika et al. 2011,
Chen et al. 2014) refers to not only massive datasets (volume) but also data that
are generated at a rapid rate (velocity) and have a heterogeneous nature
(variety), and that can provide valid findings or patterns in this complex
environment (veracity). These data can also change by location (venue).
Thus, big data encompasses the truly complex nature of data particularly in
the domain of cybersecurity.
Every device, action, transaction, and event generates data. Cyber threats
leave a series of such data pieces in different environments and domains.
Sifting through these data can lead to novel insight into why a certain event
occurred and potentially allow the identification of the responsible parties and
lead to knowledge for preventing such attacks in the future.
Let us next understand how data analytics plays a key role in
understanding cyberattacks.
in detecting unusual events of interest. The three aspects are temporal, spatial,
and data-driven understanding of human behavioral aspects (particularly of
attackers):
• Firstly, computer networks evolve over time, and communication patterns
change over time. Can we identify these key changes, which deviate from
the normal changes in a communication pattern, and associate them with
anomalies in the network traffic?
• Secondly, attacks may have a spatial pattern. Sources and destinations in
certain key geolocations are more important for monitoring and preventing
an attack. Can key geolocations, which are sources or destinations of attacks,
be identified?
• Thirdly, any type of an attack has common underpinnings of how it is carried
out; this has not changed from physical security breaches to computer
security breaches. Can this knowledge be leveraged to identify anomalies
in the data where we can see certain patterns of misuse?
Utilizing the temporal, spatial, and human behavioral aspects of learning new
knowledge from the vast amount of cyber data can lead to new insights of
understanding the challenges faced in this important domain of cybersecurity.
Thus, simply looking at one dimension of the data is not enough in such
prolonged attack scenarios. For such multipronged attacks, we need a
multilevel framework that brings together data from several different data-
bases. Events of interest can be identified using a combination of factors such
as proximity of events in time, in terms of series of communications and even
in terms of the geographic origin or destination of the communication, as
shown in Figure 1.5. Some example tasks that can be performed to glean
actionable information are the following:
(a) Clustering based on feature combinations: One important piece of data
collected in most organizations is intrusion detection system (IDS) logs,
such as Snort logs. These can be leveraged, and a keyword matrix and a
word frequency matrix can be extracted to use for various analytical tasks.
For example, the keyword matrix can be used to perform alarm clustering
and alarm data fusion to identify critical alerts that may have been missed.
Instead of clustering the entire set of features seen in a Snort alarm, we can
perform clustering based on a combination of features, as sketched below.
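As a minimal illustration of this idea, the sketch below clusters a handful of made-up alarm messages on a keyword (term-frequency) matrix using K-means; the messages, feature choices, and value of k are all hypothetical, not drawn from the studies cited here.

```python
# Minimal sketch: clustering IDS alarms on a keyword (term-frequency) matrix
# with K-means; the alarm messages and k value here are hypothetical.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.cluster import KMeans

alarms = [
    "ICMP ping sweep detected from external host",
    "ICMP ping sweep detected from internal host",
    "SQL injection attempt on web server",
    "SQL injection attempt blocked by filter",
    "TCP SYN portscan detected on gateway",
    "UDP portscan detected on gateway",
]

# Build the word frequency matrix from the alarm text.
vectorizer = CountVectorizer()
X = vectorizer.fit_transform(alarms)

# Cluster alarms on (a combination of) keyword features.
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
for label, alarm in sorted(zip(kmeans.labels_, alarms)):
    print(label, alarm)
```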
(b) Collusions and associations: Using the keyword matrix, we also extract
associations to identify potentially repeated or targeted communications. This
information in conjunction with network mapping can also be used to deter-
mine which attacks are consistently targeted to specific types of machines.
[Figure 1.5: events at nodes N1–N12 plotted along a timeline (0–16). Goal: to identify potential “collusions” among the entities responsible for related events.]
What this book is not about: This book does not address the traditional
views of security configurations and shoring up defenses, including setting up
computer networks, setting up firewalls, web server management, and patching
of vulnerabilities.
What this book is about: This book addresses the challenges in cybersecurity
that data analytics can help address, including analytics for threat hunting or
threat detection, discovering knowledge for attack prevention or mitigation,
discovering knowledge about vulnerabilities, and performing retrospective and
prospective analysis for understanding the mechanics of attacks to help prevent
them in the future.
This book will provide multiple examples across different types of networks
and connected systems where data can be captured and analyzed to derive
actionable knowledge fulfilling one of the several aims of cybersecurity,
namely prevention, detection, response, and recovery.
Cyber threats often lead to loss of assets. This chapter discusses the multitude
of datasets that can be harvested and used to track these losses and origins of
the attack. This chapter is not about the data lost during cyberattacks but the
data that organizations can scour from their networks to understand threats
better so that they can potentially prevent or even predict future attacks.
Figure 2.1 Logical and physical view of user request and response in a network-
based environment.
The request on the other side may again have to pass through the routers and
firewalls at multiple points in the system being accessed by the user. There
may be multiple intrusion detection systems (IDS) posted throughout the
systems to monitor the network flow for malicious activity. This is just one
example scenario; different network layouts will result in different types of
intermediate steps in this process of request and response, particularly based on
the type of response, the type of network being used, the organization of
business applications, and the cloud infrastructure being used, to name a few
factors. However, certain key components are always present that provide
multiple opportunities to glean data related to potential cyber
threats.
There can be several opportunities to collect data to understand potential
threats. Data collection can begin at a user access point, system functionality
level, and commodity level (particularly if the data is being delivered). For
example, at the user level, we can utilize data such as the following: (a) Who is
the user? The psychology of the user, personality types, etc., can influence
whether a user will click on a link or give access to information to others. (b)
What type of interface is being used by the user? Is there clear information about
what is acceptable or not acceptable in the interface? (c) What type of access
system is being used? Is there access control for users? (d) What data are
available about the access pipeline, such as the type of network or cloud being used?
Several common types of datasets can be collected and evaluated, as shown
in Figure 2.2, including various types of log data such as key stroke logs, web
server logs, and intrusion detection logs, to name a few. We next discuss
several types of such datasets.
1. www.wireshark.org.
2. www.smrfoundation.org/nodexl/.
IP3 across all days of the week. We can also see that the degrees of IP9 and IP7
seem to be higher on some days but lower on other days. This is further
clarified by the plot for IP9, which shows Wednesday as a day where IP9 has
inconsistent behavior.
Thus, through such exploratory analysis it is not only possible to identify
nodes that are inconsistent but also time points where the behavior is inconsist-
ent. Alternatively, this method can also be used to identify highly connected
nodes (such as nodes receiving higher than normal connections during a
breach) or least connected nodes (perhaps nodes that are impacted by a breach
and lose connectivity). This type of consistency and inconsistency can be
identified at the node level and at the graph level as discussed in Namayanja
and Janeja (2015 and 2017)
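A minimal sketch of this per-bin degree analysis follows; it assumes an edge list of (day, source, destination) records, and both the records and the inconsistency rule (absence in a bin, or degree variation exceeding the mean) are illustrative choices, not the criteria of the cited papers.

```python
# Sketch: flag IPs whose degree is inconsistent across time bins (days).
# The records and the inconsistency rule are illustrative.
from collections import defaultdict
import statistics

edges = [
    ("Mon", "IP1", "IP3"), ("Mon", "IP2", "IP3"), ("Mon", "IP9", "IP1"),
    ("Tue", "IP1", "IP3"), ("Tue", "IP9", "IP2"), ("Tue", "IP9", "IP3"),
    ("Wed", "IP1", "IP3"), ("Wed", "IP2", "IP3"),
]

# Degree of each node per time bin (undirected view of the communications).
degree = defaultdict(lambda: defaultdict(int))
for day, src, dst in edges:
    degree[src][day] += 1
    degree[dst][day] += 1

days = {d for d, _, _ in edges}
for node, per_day in degree.items():
    counts = [per_day.get(d, 0) for d in days]
    # Flag nodes absent in some bin or with degree spread above the mean.
    if 0 in counts or statistics.pstdev(counts) > statistics.mean(counts):
        print(f"{node} is inconsistent: {dict(per_day)}")
```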
Another study (Massicotte et al. 2003) introduces a prototype network
mapping framework that uses freely available network scanners (nmap,
Xprobe) on built-in network protocols (ICMP, ARP, NetBIOS, DNS,
SNMP, etc.) to create a real-time network topology mapping with the help of
intelligence databases. It must be used in tandem with an intrusion detection
system. Studies discussed earlier for graph metrics can be applied to such
works as well after the topology is discovered.
Darwish and Bataineh 2012) to evaluate browser security indicators. The data
collected through the eye tracker can be mined for patterns such as associations
between security cue locations on the screen and number of views or clicks.
Clustering can be performed on eye gaze data to identify presence or absence
of clusters around security cues. Associations can be analyzed between users’
perception of security, backgrounds, and demographics and different zones of
eye gaze foci in a stratified manner. If users perceive disclosing important
information through emails as a low-risk activity, they are less likely to see the
security cues. Similarly, if they see the security cues, their perceived risk of
responding will be high. Studies have hypothesized that user education can
change users’ perception of security and help them to better see these security
cues, increasing the likelihood of threat detection or identifying threats through
visual cues such as in the case of phishing.
Vulnerability data: Software vulnerability is a defect in the system (such as
a software bug) that allows an attacker to exploit the system and potentially
pose a security threat. Vulnerabilities can be investigated, and trends can be
discovered in various operating systems to determine levels of strength or
defense against cyberattacks (Frei et al. 2006). Using the National
Vulnerability Database from the National Institute of Standards and
Technology (NIST) (NIST 2017), trends can be analyzed for several years
and across major releases for operating systems to reinforce knowledge of
choices for critical infrastructural or network projects.
NVD is built on the concept of Common Vulnerabilities and Exposures
(CVE),3 which is a dictionary of publicly known vulnerabilities and exposures.
CVEs allow the standardization of vulnerabilities across products around the
world. NVD scores every vulnerability using the Common Vulnerability
Scoring System (CVSS).4 CVSS comprises several submetrics, including
(a) base, (b) temporal, and (c) environmental metrics. Each of these metrics
quantifies some type of feature of a vulnerability. For example, base metrics
capture characteristics of a vulnerability constant across time and user environ-
ments, such as complexity, privilege required, etc. The environmental metrics,
on the other hand, are the modified base metrics reevaluated based on organ-
ization infrastructure. NVD allows searches based on subcomponents of these
metrics and also based on the basic security policies of confidentiality, integ-
rity, and availability. These searches can provide data for analysis to identify
trends and behaviors of vulnerabilities across operating systems or other
software for different types of industries.
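As a rough illustration of such a trend analysis, the sketch below tallies vulnerabilities per year from a locally downloaded NVD JSON data feed. The file name and field names (CVE_Items, publishedDate) assume the legacy NVD 1.1 feed layout; the current NVD REST API uses a different schema, so adjust accordingly.

```python
# Rough sketch: tally CVEs per year from a locally downloaded NVD JSON data
# feed. The file name and fields (CVE_Items, publishedDate) assume the legacy
# NVD 1.1 feed layout; the current NVD REST API uses a different schema.
import json
from collections import Counter

with open("nvdcve-1.1-2012.json") as f:   # hypothetical local feed file
    feed = json.load(f)

per_year = Counter()
for item in feed.get("CVE_Items", []):
    year = item["publishedDate"][:4]      # "2012-06-01T..." -> "2012"
    per_year[year] += 1

for year, count in sorted(per_year.items()):
    print(year, count)
```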
3. http://cve.mitre.org/.
4. https://nvd.nist.gov/cvss.cfm; www.first.org/cvss.
[Figure: number of buffer overflow and cross-site scripting (XSS) vulnerabilities per year, 2002–2012.]
5. https://tools.cisco.com/security/center/viewAlert.x?alertId=35601, Adobe Flash Player Cross-Site Scripting Vulnerability.
Sources of cybersecurity data, example literature studies, and the types of detection they can be used for:
• Keystroke logging: Heron 2007, Cai and Hao 2011, Gupta et al. 2016, Hussain et al. 2016; user behavior, malicious use to detect user credentials
• IDS log data: Abad et al. 2003, Koike and Ohno 2004, Vaarandi and Podiņš 2010, Deokar and Hazarnis 2012, Chen et al. 2014, Janeja et al. 2014, Quader and Janeja 2014, 2015; association rule mining, human behavior modeling, log visualization, temporal analysis, anomaly detection
• Router connectivity and log data: Tsuchiya 1988, Sklower 1991, Qiu 2007, Geocoding Infosec 2013, Kim Zetter Security 2013; suspicious rerouting, traffic hijacking, bogus routes
• Firewall log data: Golnabi et al. 2006, Abedin et al. 2010; generating efficient rule sets, anomaly detection in policy rules
• Raw payload data: Wang and Stolfo 2004, Parekh et al. 2006, Limmer and Dressler 2010, Kim et al. 2014, Roy 2014; malware detection, embedded malware, user behavior
• Network topology: Massicotte et al. 2003, Nicosia 2013, Namayanja and Janeja 2015, 2017; consistent and inconsistent nodes, time points corresponding to anomalous activity
• User system data: Stephens and Maloof 2014, Van Meigham 2016; user profiles, user behavior data, insider threats
• Access control data: Vaidya et al. 2007, Mitra et al. 2016; generating efficient access control roles
• Eye tracker data: Darwish and Bataineh 2012; browser security indicators, security cues, user behavior
• Vulnerability data: Frei et al. 2006; vulnerability trend discovery
(c) A search component. This involves a parameter search and model search.
This component allows for finding the optimal fit of both the model and
parameter to use based on maximizing the evaluation criteria.
Some example methods and their evaluation criteria are shown in Figure 3.2.
These will be discussed further in the following sections.
Figure 3.3 summarizes the steps for a standard knowledge discovery pro-
cess. The process starts with large and potentially heterogeneous data sources.
Through user input, data need to be identified and selected based on the task
being addressed. As discussed in some models (such as Cross Industry
Standard Process for DM [CRISP-DM]; see Chapman et al. 2000, Shearer
2000), it is important to have user input at this stage to clearly understand the
business or user requirements that can facilitate better and more targeted data
selection. Once the right data sources are identified, integrated data
[Figure 3.3: the knowledge discovery process, from heterogeneous data sources to actionable knowledge.]
In this method, a new range, for example 0–1, is created for the data such
that the old min becomes 0 and the old max becomes 1 so that all the data are
within this range. This allows for a rescaling of the data to a new well-defined
range. Other measures such as z-score normalization work well when the data
are normally distributed and the mean is well defined. A comparison of
normalization methods, including min–max, z-score, and decimal scaling, is
provided by al Shalabi et al. (2006).
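Stated as formulas (standard definitions, restated here for reference), min–max normalization maps a value v of attribute A to a new range, and z-score normalization standardizes it by the mean and standard deviation:

```latex
v' = \frac{v - \min_A}{\max_A - \min_A}\,(\mathit{new\_max} - \mathit{new\_min}) + \mathit{new\_min}
\qquad\qquad
v'' = \frac{v - \mu_A}{\sigma_A}
```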
Sahin 2014). Feature selection methods can be ranking-based methods that use
entropy measures to quantify how much data are useful in each of the
attributes. Methods such as singular-value decomposition and principal com-
ponent analysis (Fodor 2002, Wall et al. 2003) work on the common
principle of transforming the data such that a combination of the features can be
used as transformed features.
Manhattan distance (MD) is the “city block” distance or the distance measured
on each axis. MD is computed as shown in (3.3):
MD(X, Y) = |x_1 - y_1| + |x_2 - y_2| + \cdots + |x_n - y_n| \quad (3.3)
The Minkowski distance (MnD) is the generalizable distance from which
both Euclidean and Manhattan distance are derived. MnD is computed as
shown in (3.4):
MnD(X, Y) = \left( |x_1 - y_1|^p + |x_2 - y_2|^p + \cdots + |x_n - y_n|^p \right)^{1/p} \quad (3.4)
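For concreteness, a small sketch computing these distances with NumPy on two made-up feature vectors:

```python
# Sketch: Manhattan, Euclidean, and Minkowski distances per (3.3)-(3.4),
# computed with NumPy on two made-up feature vectors.
import numpy as np

x = np.array([1.0, 4.0, 2.0])
y = np.array([3.0, 1.0, 2.0])

manhattan = np.sum(np.abs(x - y))                   # Minkowski with p = 1
euclidean = np.sqrt(np.sum((x - y) ** 2))           # Minkowski with p = 2
p = 3
minkowski = np.sum(np.abs(x - y) ** p) ** (1 / p)   # general p

print(manhattan, euclidean, minkowski)              # 5.0, ~3.61, ~3.27
```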
has a value of 1, and d equals the number of negative matches such that oi
and oj both have a value of 0. Once these values are computed, they can be
used to compute the similarity using different types of similarity coefficients,
such as the Jaccard coefficient, a/(a + b + c). The output is an n × n triangular
matrix with the computed similarity between each object pair.
The Jaccard coefficient is an asymmetric coefficient that ignores negative
matches. Asymmetric coefficients should be applied to data where absences
(0s) are thought to carry no information. For example, if we would like to
study attacks, then a system that did not have a set of attacks is not
relevant, so features with a 0–0 match (i.e., neither system has had
the attack) are not relevant. Therefore, an asymmetric coefficient is useful to
employ. Symmetric coefficients acknowledge negative matches and should
therefore be applied to data where absences are thought to carry information.
An example is the simple matching coefficient (SMC). For example, if we
are studying the impact of a vulnerability patch on software, and in two
types of applications there was no impact of the patch, then this is pertinent
information we want to capture. Thus, in this scenario we would use a
symmetric coefficient. There are some other types of coefficients, called
hybrids, that include the 0–0 match in either the numerator or denominator
but not both. Several different types of similarity coefficients are proposed
and evaluated (Lewis and Janeja 2011), and some examples are shown in
Figure 3.9.
Example coefficients and their expressions (Figure 3.9):
• Baulieu (range −1 to 1): 4(ad − bc) / (a + b + c + d)^2
• Simple matching (range 0–1): (a + d) / (a + b + c + d)
• Russell–Rao (hybrid, range 0–1): a / (a + b + c + d)
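A minimal sketch computing both kinds of coefficients for two binary objects, with a, b, c, and d as defined above; the example attack-indicator vectors are hypothetical.

```python
# Sketch: Jaccard (asymmetric) and simple matching (symmetric) coefficients
# for two binary objects, with a, b, c, d as defined in the text.
def binary_similarity(oi, oj):
    a = sum(x == 1 and y == 1 for x, y in zip(oi, oj))  # positive (1-1) matches
    b = sum(x == 1 and y == 0 for x, y in zip(oi, oj))
    c = sum(x == 0 and y == 1 for x, y in zip(oi, oj))
    d = sum(x == 0 and y == 0 for x, y in zip(oi, oj))  # negative (0-0) matches
    jaccard = a / (a + b + c)          # ignores 0-0 matches
    smc = (a + d) / (a + b + c + d)    # counts 0-0 matches
    return jaccard, smc

# Hypothetical attack-indicator vectors for two systems.
print(binary_similarity([1, 0, 1, 0, 0], [1, 1, 0, 0, 0]))  # (0.333..., 0.6)
```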
Example methods and evaluation criteria (Figure 3.10):
• Clustering: sum of squared error, SSE = \sum_{i=1}^{K} \sum_{x \in C_i} dist^2(m_i, x); silhouette coefficient, S(o) = (b(o) − a(o)) / max{a(o), b(o)}
• Hierarchical: BIRCH, balanced iterative reducing and clustering using hierarchies (Zhang et al. 1996)
• Density based: DBSCAN, density-based spatial clustering of applications with noise (Ester et al. 1996)
smallest. At the end of the first round, we have K clusters partitioned around
the seed centroids. Now the means or centroids of the newly formed clusters are
computed and the process is repeated to align the points to the newly computed
centroids. This process is iterated until there is no more reassignment of the
points moving from one cluster to the other. Thus, K-means works on a
heuristic-based partitioning rather than an optimal partitioning. The quality of
clusters can be evaluated using the sum of squared errors (Figure 3.10), which
computes the distance of every point in the cluster to its centroid. Intuitively, the
bigger the error, the more spread out the points are around the mean. However, if
we plot the SSE for K = 1 to K = n (n being the number of points), then SSE ranges from
a very large value (all points in one cluster) to 0 (every point in its own cluster).
The elbow of this plot provides an ideal value of K.
K-means has been a well-used and well-accepted clustering algorithm due to
its intuitive approach and interpretable output. However, K-means does not
work very well in the presence of outliers and does not form nonspherical or
free-form clusters.
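A short sketch of the elbow heuristic on synthetic 2-D data follows; it uses scikit-learn's inertia_ attribute, which is the SSE.

```python
# Sketch: the SSE "elbow" for choosing K; scikit-learn's inertia_ is the SSE.
# The three-cluster 2-D data are synthetic.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
X = np.vstack([
    rng.normal((0, 0), 0.5, (50, 2)),
    rng.normal((5, 0), 0.5, (50, 2)),
    rng.normal((0, 5), 0.5, (50, 2)),
])

for k in range(1, 8):
    sse = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X).inertia_
    print(k, round(sse, 1))  # SSE drops sharply up to the true K, then flattens
```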
Given object o in a cluster A, then a(o) is the average distance between the
object o and the objects in cluster A. b(o) is the average distance between the
object o and the objects in the second-closest cluster B. The silhouette of o is
then defined as
S(o) = (b(o) − a(o)) / max{a(o), b(o)}
where
S(o) = −1: not well assigned, on average closer to members of B
S(o) = 0: in between A and B
S(o) = 1: good assignment of “o” to its cluster A
The silhouette coefficient of a clustering is thus the average silhouette of all the
objects in the cluster.
CLARANS provides an efficient way to deal with the issues related to PAM
(K-Medoids) that involves a large number of swapping operations, leading to
very high complexity. It also improves upon CLARA. CLARANS is similar to
CLARA in terms of sampling, but here the sampling is done at every step.
However, objects are stored in main memory and the focus on representative
objects is missing. This issue has been addressed using spatial data structures
such as R* trees and focusing techniques. Although CLARANS enables outlier
detection, it is not primarily geared toward the process of detecting anomalies.
for all the dimensions. It could be possible that objects in some dimensions are
similar, but very dissimilar in some other dimensions. Several variations of
DBSCAN, as shown in Figure 3.11, have been proposed addressing these
limitations.
than the core distance of ‘o’. The reachability distance of p depends on the core
object with respect to which it is calculated.
A low reachability indicates an object within a cluster; high reachability
indicates a noise object or a jump from one cluster to another. Ankerst et al.
(1999) propose a technique to view the cluster structure graphically using the
reachability plot. This is a graph of the reachability distances for each object in
the cluster ordering. This cluster ordering is facilitated by a data structure,
namely an ordered seed list. An object p1 is selected and the core distance is
computed around p1. This leads to the identification of objects in this neighbor-
hood, and their reachability distances are measured and stored in the seed
list. This process is performed iteratively in the neighboring objects as
well. Subsequently, the distance is plotted on the reachability plot.
OPTICS addresses the issue of not only traditional clustering but also
intrinsic clustering structure, which other clustering techniques ignore
(single-link effect). Many clustering techniques have global parameters produ-
cing uniform structures, and many times smaller clusters, connected to
nearby clusters by a few points, are merged into one big cluster. Thus, OPTICS
addresses this issue of intrinsic clusters that helps in identifying dispersion of
data and correlation of the data. However, this approach is still somewhat
sensitive to input parameters such as minpts in a neighborhood K value for the
KNN query. There could be a scenario when a small cluster itself could be
outlying with respect to a global structure. This book does not address such
a scenario.
3.3.4 Classification
Given a collection of records, the goal of classification is to derive a model that
can assign a record to a class as accurately as possible. A class is often referred
to as a label; for example, Anomaly and Not an anomaly can be two classes
indicating whether a data point is anomalous. Objects can also be
classified as belonging to multiple classes. Classification is a supervised
method that is based on a well-formed training set, i.e., prelabeled data with
samples of both classes, to create a classifier model that identifies previously
unseen observations in the test set for labeling each instance with a label based
on the model’s prediction. The overall process of classification is shown in
Figure 3.16. In general, the labeled data, which is a set of database tuples with
their corresponding class labels, can be divided into training and test data. In
the training phase, a classification algorithm builds a classifier (set of rules) for
each class using features or attributes in the data. In the testing phase, a set of
data tuples that are not overlapping with the training tuples is selected. Each
test tuple is compared with the classification rules to determine its class. The
labels of the test tuples are reported along with percentage of correctly
classified labels to evaluate the accuracy of the model in previously unseen
(labels in the) data. As the model accuracy is evaluated and rules are perfected
for labeling previously unseen instances, these rules can be used for future
predictions. One way to do that is to maintain a knowledge base or expert
system with these discovered rules, and as incoming data are observed to
match these rules, the labels for them are predicted.
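The sketch below walks through this train/test workflow with a decision tree on synthetic labeled data; the generated features are stand-ins for, e.g., attributes derived from IDS logs.

```python
# Sketch of the train/test workflow of Figure 3.16 with a decision tree; the
# synthetic features stand in for, e.g., attributes derived from IDS logs.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

X, y = make_classification(n_samples=500, n_features=8, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)

model = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)  # training
y_pred = model.predict(X_test)                                        # testing
print("accuracy:", accuracy_score(y_test, y_pred))
```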
Several classification algorithms have been proposed that approach the
classification modeling using different mechanisms (see Figure 3.17 and
Kotsiantis et al. 2006). For example, the decision tree algorithms provide a
set of rules in the form of a decision tree to provide labels based on conditions
in the tree branches. Bayesian models provide a probability value of an
instance belonging to a class. Function-based methods provide functions for
demarcations in the data such that the data are clearly divided between classes.
In addition to some of the basic methods there are also classifier combinations,
namely ensemble methods, which combine the classifiers across multiple
samples of the training data. These methods are designed to increase the
accuracy of the classification task by training several different classifiers and
combining their results to output a class label. A good analogy is when humans
seek the input of several experts before making an important decision. There
are several studies suggesting that diversity among classifier models is a
required condition for having a good ensemble classifier (Li 2008, Garcia-
Teodoro et al. 2009). These studies also suggest that the base classifier in the
ensemble should be a weak learner in order to get the best results out of an
ensemble. A classification algorithm is called a weak learner if a small change
in the training data produces a big difference in the induced classifier mapping.
We next discuss a few relevant classification methods.
label is unknown. These rules form the branches in the decision tree. The
purity of the attributes to make a decision split in the tree is computed using
measures such as entropy. The entropy of a particular node, InfoA (correspond-
ing to an attribute), in a tree is computed over all classes represented in the node
from the proportions of records belonging to each class. The entropy
computation of the data, InfoD, and attribute, InfoA (e.g., Info(packets)), is
shown in Figure 3.19. When entropy reduction is chosen as a splitting criter-
ion, the algorithm searches for the split that reduces entropy or equivalently the
split that increases information, learned from that attribute, by the greatest
amount. If a leaf in a decision tree is entirely pure, then classes in the leaf can
be clearly described, that is, they all fall in the same class. If a leaf is highly
impure, then describing it is much more complex. Entropy helps us quantify
the concept of purity of the node. The best split is one that does the best job of
separating the records into groups where a single class predominates the group
of records in that branch.
The attribute that reduces the entropy or maximizes the gain (GainA =
InfoD – InfoA) is selected as a splitting attribute. Once the first splitting
attribute is decided, then recursively the process is repeated in all the attributes
of the subset of the data remaining, resulting in a decision tree such as is shown
in Figure 3.18d. Several other measures for attribute selection measures can be
used (such as evaluated by Dash and Liu 1997).
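A small worked sketch of entropy and gain for one candidate splitting attribute, on a made-up six-record dataset:

```python
# Sketch: entropy and information gain for one candidate splitting attribute,
# following Info_D and Info_A as described above; the records are made up.
from math import log2
from collections import Counter

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

labels = ["attack", "attack", "normal", "normal", "normal", "attack"]
attribute = ["high", "high", "low", "low", "high", "low"]  # e.g., packet rate

info_d = entropy(labels)
info_a = sum(
    (sum(1 for a in attribute if a == v) / len(labels))
    * entropy([l for a, l in zip(attribute, labels) if a == v])
    for v in set(attribute)
)
print("Gain =", info_d - info_a)  # largest-gain attribute becomes the split
```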
learning sets from different regions in the feature space. This is particularly
important if the data are imbalanced and have distribution of multiple classes
with some majority and minority classes in the feature space. Learning sets that
are dissimilar can produce more diverse results. MILES (multiclass imbalanced learning in ensembles through selective sampling; Azari et al. 2017) adopts clustering as a
stratified sampling approach to form a set of learning sets that are different
from each other, and at the same time each one of them is representative of the
original learning set D. It uses K cluster sets to selectively choose the training
samples. It utilizes the concept of split ratio (SR) to split the examples in
clusters into three strata (near, far, and mid) around the cluster centroids as
shown in Figure 3.20.
First, a base classifier H is selected. Here H can be a decision tree classifier
such as C4.5, described earlier. Next, the base classifier is trained with training
sets N, F, and M in order to form a set of learned classifiers
π = {H_N, H_M, H_F}. Then the maximum vote rule is used to combine the
votes of the classifiers and build the ensemble H̄. H̄ is the original ensemble
MILES, which is created by combining the votes that come from
{H_N, H_M, H_F}. H̄_Under is the variation of MILES combined with random
undersampling, where {N_Under, M_Under, F_Under} are utilized, and H̄_Over is the
variation of MILES combined with random oversampling, where
{N_Over, M_Over, F_Over} are utilized. Undersampling is useful when there are too
many instances of a class, and oversampling is useful when there are too few
samples of the class.
The training examples could potentially be selected into more than three
training sets, however: (1) If the number of splits is increased to very highly
granular demarcations, then there will be splitting of already identified similar
groups so the diversity of the training set will go down. (2) An ensemble
for mining events that occur together and are mutually dependent. Thus,
occurrence of one event is also an indication of the occurrence of the other
event by a certain probability. The detection of M-patterns can be illustrated
using the example shown in Figure 3.23. Let us say we have event dataset
(transactional data) from IDS logs indicating IP address and alarm text (high,
med, and low).
The algorithm IsMPattern is initiated with the value for minimum depend-
ence threshold, such that 0 ≤ minp ≤ 1. An expression E (such as IP1, high) is
evaluated in terms of individual supports of both directional rules
(IP1 → high and high → IP1). M-pattern relies on a strong bidirectional
dependence; thus it enables the algorithm to discover items even if the supports
are low. However, when generalized for larger minp, the algorithm also
discovers frequent patterns similar to association rule mining. The itemsets
considered for mining for such event sets or patterns do not eliminate low-
support items (nonzero). However, they do eliminate itemsets that may not be
m-patterns, as shown in Figure 3.23. Now, given the itemsets, let us consider
E = (IP1, high); then P(IP1 → high) = 1 and P(high → IP1) = 0.6. If the minp
threshold is 0.5, this set is an m-pattern. Similar to the example shown in
Figure 3.22 (min support threshold = 0.25, and min confidence = 0.6), this is
also a strong confidence rule. However, when we look at E = (IP6, med), then
this is an m-pattern but not a strong confidence rule.
The process of identification of rare patterns is based on a probability level.
Thus, even if the supports are low it can identify the rarity of the pattern. It also
discovers rare patterns such that if an expression is an m-pattern, then the
subsets of the expressions are also m-patterns.
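A minimal sketch of the bidirectional-dependence check on transactions mirroring this example; the conf helper and minp value are illustrative and reproduce the numbers above (P(IP1 → high) = 1, P(high → IP1) = 0.6).

```python
# Sketch: checking whether an expression such as (IP1, high) is an m-pattern,
# i.e., both P(IP1 -> high) and P(high -> IP1) meet the minp threshold.
transactions = [
    {"IP1", "high"}, {"IP1", "high"}, {"IP1", "high"},
    {"IP2", "high"}, {"IP2", "high"},
    {"IP6", "med"},
]

def conf(antecedent, consequent):
    having_a = [t for t in transactions if antecedent in t]
    return sum(1 for t in having_a if consequent in t) / len(having_a)

minp = 0.5
is_m_pattern = conf("IP1", "high") >= minp and conf("high", "IP1") >= minp
print(conf("IP1", "high"), conf("high", "IP1"), is_m_pattern)  # 1.0 0.6 True
```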
as variability, variety, and value, may also be considered big data. The term “big”
may refer to the big aspect of the overall complexity of the data. If a dataset is
highly unusual in terms of its complexity, it may qualify as big data under one of
the big data qualities.
In the cybersecurity domain, there are several applications that qualify as big
data. If we consider network data traffic, it is massive; for example, in a midsize
organization, it can range into petabytes per second. If we consider Internet of
Things (IoT), the complexity of data is based on the velocity and variety and in
some cases also the volume. A wireless sensor network or a network of IoT
monitored through distributed IDS (Otoum et al. 2019) can generate a constant
stream of data originating from multiple sensors over time, producing high
variability in the data. This provides a complex and rich dataset.
The value from these big datasets can be derived by sifting through these
complex datasets to derive insightful patterns.
[Figure 4.2: internet users by country (0–800,000,000), for countries including the United States, China, India, Brazil, Japan, Russia, Indonesia, Germany, Mexico, and Nigeria.]
Interestingly, the number of attacks (for example, the web attacks discussed
in Akamai 2016) sourced worldwide paints a slightly different picture, as
shown in Figure 4.3.
While the majority of attackers may come from countries with more internet
users, they may also come from countries with fewer internet
users. For example, the number of internet users in Lithuania in 2018 was a
fraction of that of the United States, and Lithuania is 116th in the list of
countries ranked by the number of internet users. Despite this, Lithuania was placed eighth in
terms of the countries sourcing web application attacks. Similarly, more recent
reports (Akamai 2019) have listed other countries among the top sources of web
attacks even though they are not among the countries with the highest numbers
of internet users.
The amount of internet traffic generated may not be an indicator of attack
traffic. There are other factors at play in identifying cyberattacks on the larger
world stage taking into account not only the number of internet users but
sociopolitical factors, crime rates, and other cyberattacks. For example, a
simple search of Lithuania and cyberattacks reveals that Lithuania is also at
the receiving end of many major cyberattacks. Lithuania led a cyber shield
[Figure 4.3: Sources of web attacks from countries (2016); bar chart of attack counts for countries including the US and UK.]
Factors affecting big data technology selection include the following:
• Latency: real time, near real time, batch processing
• Structure: structured, semistructured, unstructured
• Compute infrastructure: batch, streaming
• Storage infrastructure: relational, NoSQL, NewSQL, spatial
• Visualization: interactive, abstract/summary
• Security and privacy: data privacy, security infrastructure, data management, integrity and reactive security, driven by the privacy needs of the business
Several big data technologies can be selected
based on these criteria. A pictorial representation of some of these technologies
is shown in Figure 4.5 (Grover et al. 2015 and Roman 2017). For example, if
the need of the business is low latency and SQL-like processing, then the
business can select tools that have a quick turnaround time for queries in
massive datasets. Of the several big data analytics frameworks present in the
market, the business can select the tools that provide massively parallel
processing (MPP), such as engines on top of Hadoop that have high SQL-
like functionality and portability, such as Apache Hive, Facebook Presto,
Apache Drill, and Apache Spark. Out of these, Presto and Spark have been
shown (Grover et al. 2015) to produce better outcomes for SQL-
like processing.
1. https://mahout.apache.org/general/powered-by-mahout.html.
or sensor network data, even small differences in speedup can mean a
difference of millions of dollars. Thus, in selecting a big data technology, it
is also essential to study the scale-ups with the type of technology used.
Let us consider these two big data technologies, which are used for machine
learning and data mining. Apache Mahout2 works on top of the Hadoop map
reduce framework to provide a data analytics library. Spark provides its own
map reduce framework and provides an integrated Spark machine learning
library (MLLib) for machine learning and analytics tasks.
Hadoop is an open-source mechanism for parallel computation in massive
datasets. It provides the Hadoop File System (HDFS) and the map reduce
framework, which provides the mechanism for parallel processing, as will be
discussed in an example case study for change detection. The key steps require
breaking down the data and processing the data in parallel through mappers
that work on key and value pairs. The pairs are then reduced and the
2. http://mahout.apache.org/.
Reported speedups of Spark over Mahout:
• 62 MB–1,240 MB of data, 1–2 nodes, K-means (Gopalani and Arora 2015): 2 times
• 200–1,000 million data points, 16 nodes, logistic regression, ridge regression, decision tree, ALS, K-means (Meng et al. 2016): 3 times
• 50K–50 million points, 10 nodes, C4.5, K-means (Wang et al. 2014): 2–5 times
• Up to 7.5 GB, 1 node, word count: 3–5 times
• 0.5–4 GB, 4 nodes, K-means (Misal et al. 2016): 2 times
infinitely large database to capture the real world precisely. Spatial data pose
more of a challenge due to the type of data, volume of data, and the fact that
there is a correlation between the objects and their neighbor or neighborhood.
There can be relationships between spatial objects such as topological rela-
tionships, direction relationships, metric relationships, or complex relation-
ships, which are based on these basic relationships. Moreover, the data could
depict temporal changes, which is inherent in the data themselves. The same
spatial data can also be represented in different ways, i.e., raster and vector
format. For example, georeferenced data include spatial location in terms of
latitude, longitude and other information about spatial and nonspatial attri-
butes. This essentially means that there is some preprocessing on the raw data
such as satellite imagery or TIGER files, to extract the georeferenced attributes
and some other nonspatial attributes such as vegetation type, land-use
type, etc.
Another view where spatial data can benefit cybersecurity applications is
in characterization of the region with socioeconomic data and cybercrime
data. The approach proposed by Keivan and Koochakzadeh (2012) first
creates a characterization to determine which cities, areas, or countries may
be more likely to send a malicious request. This is based on the number of
complaints received at each location for different types of cybercrimes.
Locations with a number of complaints above the median are classified as
“critical,” while locations with a number of complaints below the median
are classified as “safe.” In addition, demographic data such as housing
(occupied, vacant, owned/rented, etc.), population (gender, age, etc.) are
also used in the characterization. Based on the complaints by location and
the demographic information, they predict the potential of a location as a
cybercrime origin.
Ferebee et al. (2012) utilize spatial data to produce a cybersecurity storm
map that combines several data streams and provides a geospatial interface to
visualize the location of cyber activity. This concept has been studied at length
in other types of disaster recovery domains but not so much in cybersecurity.
Banerjee et al. (2012) highlight the importance of looking at
cyberattacks and their impact and interaction with the physical environment.
Since CPS is not limited to sophisticated users, usability is also a big consider-
ation, and this can impact the value of the system and eventually the veracity of
the findings if the usability and interpretability of the systems is low. This work
builds on the cyberphysical security (CypSec) concepts that discuss the map-
ping of the cyber and physical components to provide an environmentally
coupled solution (Venkatasubramanian et al. 2011). Additional examples of
using spatial data are described in Chapter 11.
3. https://nvd.nist.gov/.
Other types of complex datasets include audio, video, and image datasets,
which are increasingly becoming a key data source of anomaly detection due
to propagation of deep fakes (Agarwal et al. 2019).
If we see these central nodes are consistent through time, represented by bins or
intervals of time, then we can say that these nodes are consistent nodes. On the
other hand, if these nodes are not present in all time periods, then they are
inconsistent nodes.
If there are time periods where several of the central nodes become incon-
sistent, then this time period is a potential time point where an event affected
these nodes and may require further investigation. The subgraphs for the time
intervals can also be studied to evaluate graph-level properties. Thus, if the
densification or diameter does not follow the network properties during time
periods, it can again be an indicator of a network-level event at the time points.
Cybersecurity through network and graph data is further discussed in
Chapter 9.
Example
Let us consider the graphs across three time periods where the degrees of the
nodes are shown in Figure 4.7. Each graph represents a set of communi-
cation lines between the IP nodes. Note that here the degree is based on the
Figure 4.7 Example walkthrough for evaluating central nodes over time.
done by time period. Figure 4.8 shows the process where within a time period
we split the data, and the degree count is done within those chunks where the
key is the node name and the value is the degree, thus forming the key–value
pair. The data are computed for each node in the shuffling and finally the
values are tallied for each node in the reduce phase, producing the final result.
This mechanism of map reduce is common across several big data tools. The
internal workings of how the data are split and the map reduce is performed
may vary.
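A toy simulation of this map-shuffle-reduce flow in plain Python follows; real frameworks such as Hadoop distribute the same three phases across nodes, but the logic is the same.

```python
# Toy simulation of the map-shuffle-reduce flow in Figure 4.8: mappers emit a
# (node, 1) pair per edge endpoint, pairs are grouped by key, reducers sum.
from collections import defaultdict

edges = [("N1", "N2"), ("N1", "N3"), ("N2", "N3")]  # one time period's chunk

# Map: each edge contributes a key-value pair for both endpoints.
mapped = []
for src, dst in edges:
    mapped.append((src, 1))
    mapped.append((dst, 1))

# Shuffle: group values by key (the node name).
grouped = defaultdict(list)
for key, value in mapped:
    grouped[key].append(value)

# Reduce: tally values per node to obtain its degree in this time period.
degrees = {node: sum(values) for node, values in grouped.items()}
print(degrees)  # {'N1': 2, 'N2': 2, 'N3': 2}
```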
alarms that may look benign individually but may indicate a critical alert in
combination with other alerts.
In this distributed environment, within each subnet a dIDS agent performs
local intrusion detection and generates IDS log data, as shown in the figure as
IDS1. . . IDSn. Log data from each dIDS agent are sent to a control center,
employing big data tools, where it is stored for aggregated analysis. Each
signature-based agent generates a priority level associated with the alarm when
an attack against a known threat is detected, and generates high, medium, and
low priority alarms for “anomalous” behavior. High-priority alarms can be
clearly labeled; however, the low and medium priority alarm data are very
large, making it difficult for an administrator to perform manual analysis. In
such a scenario, several alarms that are part of a coordinated attack will be
missed. However, if we can show that the high-level alarms have similarities
with low-level alarms, we can propagate the labels to the low-level alarms.
Once we label them as similar to high-level alarms, we can try and study them
carefully for possible breaches that are part of a coordinated attack.
There may be several alarms generated at multiple sources from multiple
sensors (IDS monitors) across networks. These alarms may indicate abnor-
mal traffic, unusual port access, or unusual sources. The key idea here is to
connect the anomalies using co-occurrence of alarms at specific time periods,
specific location (node/computer), or similar abnormal behavior. This essen-
tially identifies which have overlaps or similarities in terms of some of the
features as shown in Figure 4.10.
[Figure 4.10: overlapping alarm sources, including alarms from IDS sensors, business firewall alarms, and alarms originating at business partners.]
Now, given that a lot of data are combined in a big data solution, this can
lead to novel insights, but on the other hand it can also lead to loss of privacy. In
today’s day and age, when massive amounts of data are being collected,
privacy guarantees are not possible; moreover, utilizing multiple datasets
further decreases privacy. So in essence we can say that there is an inverse
relationship between the utilization of big data technologies and users’ auton-
omy over personal data.
Big data technologies are highly scalable and efficient as the data storage
and processing is distributed across several nodes in a cluster computing
environment. As a result, such an infrastructure potentially opens up multiple
access points to the data, be it in the network or the cloud environment. This can
directly impact the security of the data.
While several big data technologies are available, there is a lack of stand-
ardization in technologies, with new technologies evolving every day. As a
result, portability, standardization, and adoption can be impacted.
While it is clear that big data technologies can benefit business applications,
if not used appropriately they can also negatively impact the end users’
experience. So is a frictionless environment, where there are no restrictions
on data integration and knowledge discovery, good? Probably not. Some
friction is good for getting the right information to the right people at the right
time. Big data helps breach information silos; however, some balance is
required to prevent privacy and security breaches. Privacy has a new challenge
in big data: we might share something without realizing how it will be linked
with other data, leading to privacy and security breaches somewhere in the
future. The privacy concerns are also geospatial to some extent and should be
evaluated as such. For example, in Norway the tax return of every individual is
public (The Guardian 2016); however, the same situation would not be
acceptable in the United States, where individual tax returns are private except
when shared with authorization by the individual.
There is also a recent push for open data. Big data in the public domain can
only be of value if more and more datasets are openly available. Thus, open
data initiatives may lead to big data. However, again security and privacy must
be balanced in this environment. There is indeed a high risk if open data is
merged with some restrictive closed data, leading to security breaches. For
example, if the social background of an individual is posted on networking
sites (for example, a picture that is geotagged, or the geolocation history on an
iPhone), this, when combined with other knowledge from LinkedIn and other
social media sites, can lead to social engineering attacks.
In this model, the behavior of the destination IP at time Tx‒1 when this IP has
normal behavior is compared to the behavior of the same IP as a source at time
Tx when the IP becomes a victim of password compromise. ρ is the threshold,
below which a particular destination is identified as having a nonprobing
scenario. δ is the threshold, below which the source IP depicts normal activities
while the value above the threshold would identify the source to be sending out
a large amount of communication to other destinations (for example, data or
attacks).
Thus, given a set of IP addresses N = {n1, . . ., nn}, a compromised IP n1 is
the source accessing destinations ni more than a certain threshold δ.
Computing Thresholds: The data from log files can be divided into several bins based on the time stamp; for each bin, the count for each destination IP can be computed and sorted in ascending order. The mean and standard deviation of the counts can then be computed. The mean can be considered as the threshold value for ρ and δ; alternatively, mean + standard deviation can be used as a slightly more restrictive threshold. For example, Figure 5.4 shows example data for both time bins. Thresholds of µ + 2σ are computed for each time bin to indicate an anomalous communication at time Tx for IP n5. In general, if subsequent bins show low values of the counts compared to the threshold, followed by high values compared to the threshold, then this may indicate a period worth investigating. While this is not a predictive method, it may provide an administrator with a way to zoom in on unusual communications.
Notice the total count for source IP n1 before the password theft on Tx‒1 is 2,
which is less than our threshold of 2.28 (µ + 2σ). The total count of the source
n1 goes up significantly after the password is potentially compromised on Tx,
which is 5. This count is greater than the threshold of 4.9 (µ + 2σ).
It is important to be able to identify the time periods as well where such
investigations can be initiated. As a preprocessing task, thresholds can be
identified from historic datasets. Overall interesting conditions are when
Count (destination (n1), Tx‒1) < ρ and Count (source (n1), Tx) > δ. Thus,
before the password theft occurs, the count (activity) for that IP (as a destination) is going to be a lot less at time Tx‒1, and the count (activity) for that same IP (as a source) after the password theft is going to be significantly high at time Tx.
Discovering Frequent Rules: Association rule mining (discussed in
Chapter 3) can be used to validate the rules where the activities of the
compromised IP increase significantly with a high lift value. Association
rule mining provides the rules where the compromised source IP targets many
destination IPs once the intruder gains access to the source machine via
password theft. This source IP from ARM may also show a correspondence
with results from the computational data model and show this IP reaching out
to many targets (n2, n3, n4), and as a result it should show up in the association
rules. The lift value, along with the confidence, gives us performance measures for the ARM rules gathered from the algorithm.
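As a hedged illustration of how such rules might be mined in practice, the sketch below uses the mlxtend library's apriori and association_rules functions over a few hypothetical log-window transactions; the item encoding (src=/dst= tags), the support and lift thresholds, and the suspect IP are all assumptions made for illustration.

```python
import pandas as pd
from mlxtend.preprocessing import TransactionEncoder
from mlxtend.frequent_patterns import apriori, association_rules

# Hypothetical transactions: items observed together in one log window,
# e.g. a source IP and the destinations it contacted
transactions = [
    ["src=n1", "dst=n2", "dst=n3", "dst=n4"],
    ["src=n1", "dst=n2", "dst=n3", "dst=n4"],
    ["src=n1", "dst=n2", "dst=n4"],
    ["src=n2", "dst=n3"],
    ["src=n5", "dst=n2"],
]

te = TransactionEncoder()
onehot = pd.DataFrame(te.fit(transactions).transform(transactions),
                      columns=te.columns_)

# Frequent itemsets, then rules ranked by lift
freq = apriori(onehot, min_support=0.4, use_colnames=True)
rules = association_rules(freq, metric="lift", min_threshold=1.0)

# Rules whose antecedent is the suspected compromised source
suspect = rules[rules["antecedents"] == frozenset({"src=n1"})]
print(suspect[["antecedents", "consequents", "support", "confidence", "lift"]])
```

Rules such as src=n1 → dst=n4 with high confidence and lift above 1 would corroborate that the compromised source is reaching out to many targets.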
malicious site hosting the APT‘s malware (O’Gorman and McDonald 2012).
SQL injection/other is probably the least utilized vector by APTs. Attackers
may use SQL injection to exploit a company’s web servers and then proceed
with the attack.
5.3.1.2 Exploits
Before an attacker can surreptitiously install their chosen backdoor on a victim machine, a vulnerability needs to be available to exploit. Vulnerabilities can be found in many software applications.
Zero day: A zero-day vulnerability is not known to the software developer or security community; it is typically known only to the attacker or a small group of individuals. A zero day is valuable because any victim machine running that software will be vulnerable to the exploit. Thus, zero-day vulnerabilities are difficult to discover (O'Gorman and McDonald 2012).
Known, but unpatched vulnerabilities: A small window of time exists between the discovery of a vulnerability and the time it takes to develop and
release a patch. An attacker may take advantage of this time to develop an
exploit and use it maliciously.
Known, patched vulnerabilities: Since software developers have to release
updates to patch any vulnerability, an opportunity exists for attackers to take
advantage of users who do not regularly update their software.
5.3.1.3 Tools
Many attackers use similar, publicly available tools to navigate a victim
network once they have established backdoors on victim systems. The differ-
ence between APTs lies in the tools they use to maintain access to victims.
Custom tools: Some APTs use custom tools that are not publicly available.
Since a considerable amount of time, expertise, and money is needed to
develop a family of malware, custom tools will highlight the potential expert-
ise and backing of a group.
Publicly available tools: Many APTs use tools that are publicly available
and easily downloadable from many websites dedicated to hacking.
5.3.1.4 Targets
APT targets can be government/military, nongovernmental organizations,
commercial enterprises, or the defense industrial base.
5.3.1.6 Persistence
Persistence is characterized by the length of time a particular APT has been
conducting operations, such as more than five years, between two and five
years, or less than two years.
Figure 5.6 presents an overview of the characteristics of some well-known
APTs (SecDev Group 2009, Alperovitch 2011, McAfee Labs 2010, 2011,
Chien and O’Gorman 2011, Thakur 2011, Villeneuve and Sancho 2011,
Blasco 2013, Mandiant 2013). Taking into account the tools, exploits, and
| Name | Attack vector | Exploit | Tools | Targets | Command and control | Persistence |
| --- | --- | --- | --- | --- | --- | --- |
| APT1 | Spear phishing | | Custom | Commercial enterprises, defense industrial base | Trojan-initiated callbacks, third-party applications | More than five years |
| Shady Rat | Spear phishing | Known patched vulnerabilities | | Government/military, nongovernmental organizations, commercial enterprises, defense industrial base | | Less than two years |
| Aurora/Elderwood | Spear phishing, watering hole | Zero day | Custom | Nongovernmental organizations, commercial enterprises, defense industrial base | Trojan-initiated callbacks, encryption | Between two and five years |
| RSA Hack | Spear phishing | Zero day | Publicly available | Commercial enterprises | Trojan-initiated callbacks, standard/default malware connections | Less than two years |
| Lurid | Spear phishing | Known but unpatched vulnerabilities | Publicly available | Government/military, commercial enterprises, defense industrial base | Trojan-initiated callbacks | |
| Night Dragon | Spear phishing, SQL injection/other | Known patched vulnerabilities | Publicly available | Commercial enterprises | Trojan-initiated callbacks | Between two and five years |
| Ghost Net | Spear phishing | Known patched vulnerabilities | Publicly available | Government/military, nongovernmental organizations | | Less than two years |
| Sykipot | Spear phishing, watering hole | Known but unpatched vulnerabilities | Custom | Government/military, defense industrial base | Trojan-initiated callbacks, encryption | More than five years |
| Nitro | Spear phishing, watering hole | | Publicly available | Nongovernmental organizations, commercial enterprises | Trojan-initiated callbacks, encryption | Less than two years |
Figure 5.7 Example overlapping and nonoverlapping rules from ARM for Bin1,
2, and 3.
In this chapter, we focus on what anomalies are, their types, and challenges in
detecting anomalies. In Chapter 7, we focus on specific methods for
detecting anomalies.
Figure 6.2 Illustrating the types of anomalies in clusters of network traffic data.
To explain the types of anomalies, let us consider Figure 6.2. The figure
shows clusters of network traffic data that comprise features such as time to
live, datagram length, and total size. Once clustering is performed, we see that
certain points do not fall into any of the clusters. We can evaluate this
clustering as containing potential anomalies.
Figure 6.2 depicts different types of anomalies. A point anomaly is a single data point that is anomalous with respect to a group of neighboring points. There can be different points of reference, such that an anomaly can be identified with respect to all clusters, with respect to points in a cluster, or in some cases with respect to the entire dataset. For example, in the figure we can see that there is a single point that is a global anomaly, which is anomalous with respect to all three clusters 1, 2, and 3. There are also some local point anomalies, which are anomalous with respect to specific clusters, such as clusters 2 and 3.
6.2 Context
Context can be in the form of attributes within the data that can be used to
create demarcations in the data so that data are separated into groups for further
analysis. Context can help define homogeneous groups for anomaly detection.
Context discovery is a preprocessing step where the groups are identified using
contextual attributes (such as shown in Figure 6.3), and then anomaly detection is performed in the contextually relevant groups of data points using the behavioral attributes that measure the phenomenon of interest (such as network traffic). For example, consider BGP routing data with behavioral attributes including the number of BGP updates, the number of new paths announced, and the number of withdrawals after announcing the same path (AW). The two-dimensional plot of the data considering all the attributes together (Figure 6.4a) depicts a clear anomaly that stands out from the other
data points. This anomaly corresponds to observation 4, which shows high
values for all attributes with respect to all the other observations. Figure 6.4b–e
shows the plot for each attribute. It is clear that observation 4 dominates plots
shown in Figure 6.4b–d. However, in Figure 6.4c we see that observation 3,
which has a high number of BGP updates, is also deviant with respect to the
rest of the data points. Finally, in Figure 6.4e, which plots the attribute AW,
which is the number of withdrawals after announcing the paths, it is evident
that observation 5 is highly deviant compared to the rest of the data points.
This anomaly did not show up in Figure 6.4a and was masked by the other
highly deviant values. This clearly shows the challenge of high dimensionality
when the patterns are hidden in the subspaces.
Thus, anomaly detection is focused not only on the mechanism of labeling an outlier or anomaly but also on carefully identifying the frame of reference, in terms of both the objects to compare against and the subset of attributes or subspaces to consider.
Once the frame of reference or context in terms of the data and set of
dimensions to be used is determined, then the data mining and analysis could
be used to identify the deviation of the route from the normal.
This type of analysis based on a clearly defined context could be useful in selecting suspicious routes for further investigation.
6.5 Interpretation
Anomalies are in the eye of the beholder. Anomalies can be a nuisance for an
operator or a user who does not want too many alerts to slow them down. For
example, in some domains such as the emergency operating centers where
complex and multiple systems are used, too many alerts can lead to alert
fatigue, desensitizing the end users to the numerous pings from their systems.
Indeed, in healthcare environments, this type of alert fatigue has been studied,1 and it can open the door to threats such as ransomware. Alert fatigue can affect the overall performance of a system, but more importantly it can desensitize users to click when they should not be clicking. Alternatively, for a security officer, each anomaly may be a critical alert portending the arrival of a major attack or a major misconfiguration leaving a gaping hole in a system.
Anomaly interpretation may also change from one domain to another. What
appears as a blip in an instrument reading in one domain (such as sensors monitoring water quality in a stream) may appear to be a major alert in another domain (such as sensors monitoring water in a nuclear power plant). Another example is password attempts. How many failed password attempts are allowed will depend on the function of the system being accessed. A bank may allow only three failed attempts before it locks out the user, whereas an email system may allow more attempts. In each of these systems, predicting whether there is a password-guessing attack will be based on different criteria depending on the domain context. Figure 6.5 depicts
some of the priorities and discussions at various levels that can impact the
discovery and interpretation of an anomaly. The organization’s priorities may
be connected to the business bottom line, such as reputation, when an attack
happens that can translate to millions of dollars lost (Drinkwater 2016).
Similarly, an end user is highly driven by the day-to-day functions and how
1
https://psnet.ahrq.gov/primers/primer/28/alert-fatigue Last accessed June 2017.
fast the functionality can return to normal. The system and security priorities
are more directly related to the anomaly detection and scoring. However, these
must work in conjunction with the other players' priorities.
Thus, interpretation of anomalies will vary from one type of user to another,
one type of system to another, and one domain to another based on the context
defined and agreed upon by each.
Many times, two users in the same organization may also not agree on the
definition of an anomaly. It is critical to have a well-crafted definition of an
anomaly to (a) facilitate the discovery of an anomaly, (b) reduce the false
positives, and (c) associate importance to the level of the anomaly for faster
recovery and remediation.
Figure 6.6 Treatment of outliers (adapted from Barnett and Lewis 1994).
[Figure: taxonomy of anomaly detection methods – statistical (discordancy tests for distributions, interquartile range based, standard deviation based); clustering based (outliers as a by-product of clustering methods); nearest neighbor based (density based, distance based); classification based (neural networks, classifiers, ensembles); high-dimensional outlier detection (subspace clustering, discovery in large and high-dimensional datasets).]
Figure 7.2 Types of anomalies (point, contextual, collective or group) and statistical methods of detection (block and sequential procedures).
[Figure: interquartile range on sorted data – the median is the 50th percentile; outliers lie 1.5 × IQR above Q3 or below Q1.]
third quartile, respectively. The data are now divided into four chunks ordered by the quartiles. The difference between the third quartile and the first quartile is the IQR. IQR-based outlier detection assumes that the IQR captures the majority of the data distribution, measuring the spread of the data. This IQR range is then stretched beyond the first and third quartiles: the lower threshold is the first quartile minus 1.5 times the IQR, and the upper threshold is the third quartile plus 1.5 times the IQR. This stretches the range or spread of the data. Points that fall beyond these thresholds at the lower extreme or the upper extreme are the outliers.
IQR is primarily a univariate outlier detection method. An example of IQR-
based outlier detection is shown in Figure 7.4, in the network traffic data
sample for the attribute “packet size.”
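A minimal Python sketch of IQR-based outlier detection on a hypothetical packet-size sample (the values are invented for illustration):

```python
import numpy as np

# Hypothetical packet sizes (bytes) from a network traffic sample
packet_size = np.array([60, 64, 70, 72, 80, 84, 90, 96, 100, 110, 1400])

q1, q3 = np.percentile(packet_size, [25, 75])
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr

# Points beyond the stretched thresholds are flagged as outliers
outliers = packet_size[(packet_size < lower) | (packet_size > upper)]
print(f"Q1={q1}, Q3={q3}, IQR={iqr}, bounds=({lower}, {upper})")
print("Outliers:", outliers)  # the 1400-byte packet stands out
```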
In general, statistical outlier detection is a highly developed field and has been well established for a long time. However, for outlier detection using statistical methods, the distribution of the dataset needs to be known in advance; knowledge of the distribution is the underlying assumption. The properties of the distribution, such as the mean and standard deviation, also need to be known in advance. Outlier detection further has to take into consideration the masking and swamping effects for sequential and block procedures. In general, the modeling process is more complicated, especially in the case of multidimensional datasets; statistical outlier detection works well mainly for univariate data. Moreover, these techniques do not work well with incomplete or inconclusive data, and computation of the results is expensive.
Let us consider the example shown in Figure 7.5. Here minpts is set to 3; although there are more than 3 points in the neighborhood, only the reachability distances to the 3 nearest points need to be measured. Thus, lrd(p) = 1/((1 + 1 + 2)/3), which is 0.75.
The outlier factor of p is measured in terms of the uniformity in the density of p's neighborhood. Thus, it takes into consideration the local reachability density of all the minpts nearest neighbors of p. In the preceding example, these are M1, M2, and M3.
these are M1, M2, and M3. The outlier factor of p is as follows:
$$OF(p) = \frac{\sum_{o \in N(p)} \frac{lrd(o)}{lrd(p)}}{N}$$
So in the example from Figure 7.5, the outlier factor is as follows: ((0.75 + 1 + 0.5)/0.75)/3 = 1.
An OF of 1 represents a cluster of uniform density. The lower p's lrd is, and the higher the lrd of p's minpts nearest neighbors is, the higher p's outlier factor will be.
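The computation in this worked example can be reproduced in a few lines of Python; the neighbor lrd values (0.75, 1, and 0.5 for M1, M2, and M3) are taken from the example above:

```python
def outlier_factor(lrd_p, neighbor_lrds):
    """OF(p): average ratio of each neighbor's lrd to p's lrd."""
    return sum(lrd_o / lrd_p for lrd_o in neighbor_lrds) / len(neighbor_lrds)

lrd_p = 1 / ((1 + 1 + 2) / 3)                   # 0.75, from the example
print(outlier_factor(lrd_p, [0.75, 1.0, 0.5]))  # approximately 1.0
```

In practice, a library implementation such as scikit-learn's LocalOutlierFactor (in sklearn.neighbors) computes this style of score over a full dataset.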
This approach is one of the first and few approaches that quantifies the
outlierness of an object. In the case of multidimensional datasets, however, the
degree of the outlierness also depends on dimensions that are contributing to
the outlierness and possibly the cause of the outlierness. This approach is able
to identify the local or global outliers with respect to a set of points. For some
applications, both global and local outliers must be pointed out. Moreover, if only the distance from the neighbors or the minpts in the neighborhood is considered, then we overlook the situation where several points are collectively an anomaly compared to others. If several points deviate from the standard,
detected. In the other two cases, postprocessing of the analysis will need to be performed to identify anomalies (a code sketch follows the list below):
In the case of small clusters: if the data are clustered into groups of different density and there is a cluster with fewer points than the other clusters, this cluster can be a candidate for a collective anomaly. In this case, the density of the candidate cluster deviates markedly from that of the other clusters. Other measures can also be used, such as the distance between the candidate cluster and the noncandidate clusters; if the candidate cluster is far from all noncandidate points, its members are potential outliers. In Figure 7.7, we can see a density distribution for each cluster, and cluster 5 appears to have very low density compared to the other clusters. When we consider the three points in this cluster, they represent a very high number of targets, sources, and records, indicating a high level of attack activity.
Each of the observations has a time stamp, i.e., the date. If we further investigate the date with a simple Google search for the time period and the keyword "cyberattack," we see reports of relevant attacks occurring around that time period, indicating unusual activity during the period. This type of evaluation helps with exploratory analysis of the dataset.
In the case of larger clusters with scattered points, there may be bigger
clusters where the points in the cluster are not tightly bound around the
centroid. In this case, the distance of every point can be measured to the
centroid, and the points farthest from the centroid can be identified. This
is similar to a distance-based or a K-nearest neighbors–based outlier
detection inside such clusters. These clusters can be identified by looking
at the sum of squared error (SSE) of the clusters, and the clusters with a
high SSE can be candidates for postprocessing for identifying distance-
based outliers. For example, in Figure 7.7, clusters 5 and 6 are potential
clusters for postprocessing where the circled points could be evaluated as
potential outliers.
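The following sketch illustrates the small-cluster case with invented traffic features; the cluster count, the 5% size cutoff, and the synthetic data are assumptions made only for illustration:

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
# Hypothetical traffic features (targets, sources, records): mostly normal
normal = rng.normal(loc=50, scale=5, size=(60, 3))
attack = rng.normal(loc=500, scale=50, size=(3, 3))  # a small, deviant group
X = np.vstack([normal, attack])

km = KMeans(n_clusters=4, n_init=10, random_state=0).fit(X)

for c in range(km.n_clusters):
    members = X[km.labels_ == c]
    # SSE measures how tightly members sit around the centroid; high-SSE
    # clusters are candidates for distance-to-centroid postprocessing
    sse = ((members - km.cluster_centers_[c]) ** 2).sum()
    print(f"cluster {c}: size={len(members)}, SSE={sse:.1f}")
    if len(members) < 0.05 * len(X):
        print("  -> candidate collective anomaly (small cluster)")
```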
rectangles, polygons, cubes, and other geometric objects. Spatial data have
large volumes, so much so that they may require an infinitely large database to
capture the real world precisely. Spatial data pose challenges not only due to the type of data and volume but also due to the correlation between objects and their spatial neighbors within a neighborhood. There can be relationships
between spatial objects such as topological relationships (e.g., disjoint, meet,
overlap, etc.), direction relationships (e.g., north, south, above, below, etc.),
metric relationships (e.g., distance greater than), or complex relationships (e.g.,
overlap with a certain distance threshold in the north), which are based on
these basic relationships. Moreover, the data can depict temporal changes,
which are inherent in the data.
For the purposes of cybersecurity, we focus on georeferenced data, which
include spatial location in terms of latitude, longitude, and other information
about spatial and nonspatial attributes. This essentially means that there is some
preprocessing on the raw data to extract the georeferenced attributes and some
other nonspatial attributes such as geopolitical attributes, other regional attributes
affecting the network trends such as network bandwidths, etc. Chapter 11 dis-
cusses one such specific geospatial example. Figure 8.1 shows a list of various
cyberattack maps that capture this information in a georeferenced manner.
number of attacks on the East Coast is much larger than in the Midwest due to the richness of targets on the East Coast. Thus, even though there is a bigger class of locations, for a more accurate analysis the locations will need to be subdivided to better evaluate and better allocate resources.
Since these instances occurred one after the other, they are possibly similar or
are related in terms of their values (quantified as temporal autocorrelation). The
time difference between the values of the attribute is referred to as the lag. For example, for the variable "packet size" for an IP address, the lag is 1 millisecond. An autocorrelation coefficient measures the correlation (computed, for example, using Pearson's correlation) between temporal values a certain temporal distance apart. Here autocorrelation measures the correlation (positive or negative) of the data variable with itself at a certain time lag. A few approaches (Zhang
et al. 2003) have accounted for this aspect in temporal data mining.
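For instance, with pandas one can compute the lag-1 autocorrelation of a hypothetical packet-size series (the values below are invented; the one spike shows how a deviant value depresses the coefficient):

```python
import pandas as pd

# Hypothetical per-millisecond packet sizes for one IP address
packet_size = pd.Series([512, 520, 515, 530, 9000, 525, 518, 522, 516, 519])

# Pearson autocorrelation of the series with itself at a lag of 1 observation
print(packet_size.autocorr(lag=1))
```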
spatial and temporal data are discussed with an emphasis on identifying the
correct partitioning, referred to as neighborhood discovery, in spatial and
temporal data. The neighborhoods can also be used as a characterization such
that objects that deviate substantially from this characterization can be identi-
fied as suspicious. This process is similar to anomaly detection using clustering
as discussed in Chapter 7; the key difference is that the preprocessing to form
the similar objects into a neighborhood is different for spatial and temporal
data due to the properties of the data outlined previously. So essentially the
anomaly detection process is similar for all data types, and the key difference is
the preprocessing based on different types of data. The next sections discuss
this process of preprocessing to form the neighborhoods.
[Figure: micro neighborhoods m1, m2, . . ., mi around IP nodes IP1 and IP2 within a macro neighborhood.]
1
www.maxmind.com/en/home.
time series data. Yankov et al. (2007) discover unusual time series or discords
with a disk-aware algorithm.
First, most of these approaches consider autocorrelated segments as true segments for pattern discovery. This may ignore heterogeneity in the discovery process, possibly in the form of unequal-width segments. Second, most of these approaches do not quantify the unusualness of the patterns discovered in terms of how distinct a pattern is from the rest of the data. As with spatial data, temporal autocorrelation and heterogeneity are a challenge.
Finally, in the area of spatiotemporal mining, several approaches have
explored the spatiotemporal nature of the data (such as Abraham and
Roddick 1999, Roddick and Spiliopoulou 1999, and Roddick et al. 2001).
As such, there does not exist a generalizable framework that facilitates the
accommodation of the properties of space and time in an effective and efficient
manner. Additionally, there is no consensus on spatial-dominant or temporal-
dominant analysis of spatiotemporal data. Thus, there is a need for a shift that
facilitates the analysis based on the properties of space and time (spatial and temporal autocorrelation, and heterogeneity) in a unified framework.
Kaspersky reported that Iran is the most common place where the Flame
malware was discovered. Cases were also reported in Israel, Palestine,
Sudan, Syria, Lebanon, Saudi Arabia, and Egypt. Out of all computer systems
that were affected worldwide by Stuxnet, 60% of them belonged to Iran
(Halliday 2010, Burns 2012). Also, a recent analysis indicates that most online
schemes occur in Eastern Europe, East Asia, and Africa. These schemes and
phishing attacks are shifting to other countries such as Canada, the United States,
France, Israel, Iran, and Afghanistan (Bronskill 2012). Thus, the spatial elem-
ent plays a key role in strengthening the cyber infrastructure, not only in computer networks but also in power grids and industrial control systems.
Therefore, while identifying locations of cyberattack victims is important in
a power grid or computer network, it is also crucial to determine the location of
those responsible for these attacks, as it may have serious national security
implications. Cyberattacks may be targeted toward specific countries due to
sociopolitical factors. It is therefore important to identify the association
between a cyberattack, its target, and the origin location.
A spatial aspect to cyber data analysis can be brought through (a) identifi-
cation of spatial neighborhoods in the network structure to discover the impact
of an event in space, (b) studying influence of events in neighborhoods, and (c)
studying correlation and associations of events over time.
[Figure: (a) servers (ISP switch, Server 2, SharePoint server, database server) plotted by relative distance as bandwidth usage, and (b) a cluster of network traffic points that are anomalous as a temporal cluster with respect to other neighboring servers.]
2
www.spamhaus.org/statistics/countries/. The world’s worst spam enabling countries, last
accessed March 2021.
3
https://news.netcraft.com/archives/2012/12/13/world-map-of-phishing-attacks.html. Phishiest
countries, last accessed March 2021.
4
http://geoiplookup.net/, last accessed August 2017.
unusual locations and routes. This can help users identify potential threat
locations, particularly when users are in a sensitive job or secure location.
distance between two vertices is the shortest path between the two vertices, also referred to as the geodesic path. A graph is disconnected when some parts of the graph are not connected by a path; if a single node is disconnected, there is no edge connecting it to the rest of the graph. For example, in the graph in Figure 9.2b, there are three separate (disconnected) components.
Centrality: Centrality in a graph structure indicates prominence of a node in
a graph. A node with high centrality will potentially be more influential.
A simple type of centrality is the degree centrality (Freeman 1978), which
indicates the total number of edges incident on a node in an undirected graph.
This can be divided into in-degree and out-degree for directed graphs.
Betweenness centrality (Freeman 1978) of a vertex v measures the number
of geodesic paths connecting vertices ij that are passing through v, divided by
the total number of shortest paths connecting ij. This is summed across all ij to
get the overall betweenness centrality of v. Akoglu et al. (2015) provides a
detailed survey of several different types of centrality, such as betweenness,
closeness, and eigenvector centrality, among others.
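As a hedged sketch, both measures can be computed with the networkx library on a small hypothetical communication graph:

```python
import networkx as nx

# Hypothetical communication graph between hosts
G = nx.Graph([("a", "b"), ("a", "c"), ("b", "c"), ("c", "d"), ("d", "e")])

print(nx.degree_centrality(G))       # normalized count of incident edges
print(nx.betweenness_centrality(G))  # share of geodesic paths through a node
```

Here nodes c and d sit on most shortest paths, so they score highest on betweenness even though their degrees are modest.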
Cliques: Traditionally, a clique is defined as any complete subgraph such that each pair of vertices in the subgraph is connected by an edge, for example nodes b, n, and g in Figure 9.3. Cliques may be mapped to coteries
(such as high-risk groups in the context of spreading a virus or DoS attack).
More importantly, the nodes among different cliques may overlap, which
indicates a higher relevance of those nodes in high-risk groups.
Figure 9.4 Using graph properties to evaluate consistency of nodes across time.
Given the adjacency matrix for the graphs in Figure 9.3, the visualization is
generated using NodeXL.1 Figure 9.3a presents a directed graph view, and
Figure 9.3b shows an undirected view of the graph. In Figure 9.3a, it is evident
that node k has a high out-degree, and nodes h, n, r, and t have a high in-
degree. Overall, vertex n has the highest degree. In terms of centrality in an undirected graph, vertex n will be considered the most central node.
Another example of using graph properties is to evaluate consistency of
nodes over time. For example, Figure 9.4 depicts three time periods, T1, T2,
and T3, with their corresponding graphs. Each time period has a set of nodes
a–f that communicate with each other as depicted by the edges. The corres-
ponding graphs and degrees are shown in the figure. A degree threshold can be used such that any node with degree below the threshold is not considered central. For example, if the threshold in this dataset is two, then the nodes with degree equal to or greater than the threshold are shown as central. Node c, which is central in time
Chapter 3, association rules and their related metrics of confidence and sup-
port, showing the likelihood and strength of associations or frequent co-
occurrences, can be utilized here. Association rules for each time period are
shown that provide a complementary view of the communication in this
example. For instance, f, which is not a central node, is also not frequent, as its support is low. However, in all time periods, the rule f → e is always present.
1
https://nodexl.com/.
Similarly, the nodes a and d are consistently central, and the rule a → d is consistently a strong rule in terms of support and confidence. On the other hand, a → e has high confidence, and both nodes are central; however, the support is low compared to other rules with central nodes. Thus, multiple types
of analysis can be used to analyze the network traffic data to provide
complementary knowledge.
example, its unexpected absence from the network – can identify whether this
affects the densification of the network. If densification is tracked over a period
of time, it is possible to identify time points where the densification is
anomalous compared to the overall trend in the network. Such time periods
may indicate unusual changes in the network. Now if such changes are
consistent over multiple, large time periods (depicting periodicity, for example
every year in December), then this can be related to a trend; however, if this
happens in only one sample year, then it may indicate an anomalous
time period.
Diameter over time: Densification draws nodes closer to each other such
that the distance between those nodes that are far apart becomes smaller. As a
result, the diameter of the network reduces, a concept commonly described as
the shrinking diameter (Leskovec et al. 2005). Similar to densification,
changes in the diameter of the network over time can provide insights about
the health of the network traffic. This analysis can be performed for the central
nodes, which is particularly helpful for massive datasets. The change in the
behavior of a critical node, for example its unexpected absence from the
network, can impact the diameter of the network. Thus, drastic changes in
the diameter, such as expanding diameter, can indicate malicious effects on the
central nodes.
A redundant router, on the other hand, can be highly similar to other routers
in terms of its connectivity. Taking a redundant router offline would probably
have little effect on the network connectivity. Such a redundant router can act
as a fail-safe if it is similar to some critical network routers. Alternatively, this redundant router can be used for load balancing or reallocated to other useful roles.
Router connectivity can be considered as a graph, and similarity coefficients
can provide an intuitive method of determining whether a router is similar or
unique in relation to the other routers connected on the graph representing the
network. As discussed in Chapter 3, several similarity coefficients have been
studied (Lewis and Janeja 2011). An example use is shown in Figure 9.5 using
the Jaccard coefficient (JC). An example graph with various routers connected
is shown in Figure 9.5a. The corresponding adjacency matrix is also depicted
in Figure 9.5b. Pairwise similarity between each pair of routers is computed using the JC, which is given as the number of positive matches (1–1 matches) divided by the sum of positive matches and mismatches (0–1 or 1–0), as shown in Figure 9.5c. Thus, looking at the two rows of connectivity in the adjacency matrix for routers R4 and R3, we can see that they have a perfect set of matches (1–1) and no mismatches (0–1 or 1–0). Thus, the JC for similarity between R3 and R4 is 1, and they can be seen as redundant routers that are perfectly similar and could replace each other's role in terms of connectivity. On the other hand, when we look at the total similarity of each node and its degree in Figure 9.5d, e, respectively, we can see that nodes R5 and R6 have the lowest degree and lowest similarity; thus, they are the most vulnerable nodes in terms of connectivity. Similarly, nodes R1 and R2 are highly central but have only some similarity. Thus, they are possible bottleneck nodes due to their unique connectivity and highly central nature.
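The R3/R4 computation can be mirrored in a short sketch; the adjacency rows below are hypothetical stand-ins for Figure 9.5b:

```python
import numpy as np

# Hypothetical adjacency rows for routers R3 and R4 (1 = connected)
r3 = np.array([1, 1, 0, 0, 1, 0])
r4 = np.array([1, 1, 0, 0, 1, 0])

def jaccard(a, b):
    m11 = np.sum((a == 1) & (b == 1))  # positive (1-1) matches
    mismatch = np.sum(a != b)          # 0-1 or 1-0 mismatches
    return m11 / (m11 + mismatch)

print(jaccard(r3, r4))  # 1.0 -> perfectly similar, candidate redundant pair
```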
Such preemptive exploratory analysis using basic graph properties can help
facilitate the network management for better response to threats and recovery
from network impact. For instance, in the preceding example, if there is better
planning in terms of identifying bottlenecks and redundant nodes, the reme-
diation and recovery from a network-based cyberattack can be facilitated.
Humans are a key part of the security infrastructure (Wybourne et al. 2009).
Several studies have looked at the human aspect of cyber threats from the user
or attacker perspective. Given that more and more users use home computers, systems are becoming vulnerable and susceptible to more cyberattacks. Anderson et al. (2010) hypothesized that psychological ownership of one's computer is positively related to behavioral intentions to protect one's own computer. A growing body of research suggests that social, political, economic, and cultural (SPEC) conflicts are a leading cause of cyberattacks
(Gandhi et al. 2011). This chapter looks at the different key players’ view-
points in the cyberattack and how they can be strengthened to prevent and
mitigate the risks of the attack.
In general, there are two key parties at play in a cyberattack: a user who is a
victim and the attacker. In a cyberattack against a business, there are three key
parties in play: the business, a user or employee, and an attacker. In addition,
there may also be other types of employees involved, such as network adminis-
trators or system administrators who are configuring the systems, and pro-
grammers who are programming functionalities.
While this is a compartmentalized view of the real-world fluid scenarios, it
helps us understand the dynamics of an attack and study ways to prevent it.
Each of these parties has a set of constraints and factors that help facilitate or
restrict the perpetration of the attack.
A user or employee in a business comes with a certain educational back-
ground, experiences, and any lessons learned in the past. In addition, a user
also has certain psychological traits that impact how they respond to certain
types of requests, from clicking links and downloading content to providing their credentials.
Businesses have to follow their business model, which could be highly
secure or more open. They have a mission and a set of principles they abide by.
137
Moreover, a business may or may not have a well-defined set of security policies.
Lastly, the types of business applications being run in a business may have their
own set of complexities and challenges.
In addition to having technological know-how, an attacker has certain
motivations and an end goal to achieve through perpetrating the attack. An
attacker may have certain psychological traits that lead them to instigate these
attacks, not unlike other types of crimes. Greitzer et al. (2011) implemented a psychological reasoning model for human behavior in cyberattacks and characterized attackers by disgruntlement, difficulty accepting feedback, anger management issues, disengagement, and disregard for authority.
Additionally, depending on the amount of resources an attacker may have,
the extent of the attack may vary from a single computer to a network or entire
business being targeted. An attacker targets a user of the business systems, be
it a technology worker or an end user, utilizing the business systems. Some of
these aspects of the key players in a cyberattack are depicted in Figure 10.1.
This chapter addresses certain types of cyberattacks that have a major influence
of the human perspective, such as phishing and insider threats. It outlines data
analytics methods to study human behaviors through models extracted from data.
together multiple facets from data and behavioral perspectives for discovering
insider threats. Insider threats remain one of the hardest types of threats to detect.
Studying how network traffic changes over time, which locations are the sources, where it is headed, how people generate this traffic, and how people respond physiologically (such as through stress indicators) when involved in these events – all these aspects become critical in distinguishing
shifting gears to view cybersecurity as a people problem rather than a purely
technological problem.
As discussed in Chapter 2, several features can be utilized from disparate domains, as shown in Figure 10.2: computer usage features, including CPU, memory, and kernel modules; network traffic features, including source and destination IP, port, protocol, and derived geolocation; other location-related features, such as geopolitical event information; and physiological sensors providing knowledge of affective behaviors, including features such as emotion and stress variations. Each of these domains provides insights into the
workings of a networked information system over a period of time. Each
domain individually is not sufficient to indicate an insider threat. When
combined, these disparate data streams can facilitate detection of potentially
anomalous user traffic for deeper inspection. These features can be evaluated
individually and in conjunction to provide knowledge of potential insider
threats.
Figure 10.2 Integrating multiple domain datasets for insider threat detection.
Figure 10.2 indicates relating multiple factors for insider threat evaluation:
• A user’s systems usage changes over time; similarly, network traffic evolves
over time, and communication patterns change over time. These key
changes, which are deviant from the normal changes, can be associated with
anomalies in the network traffic and system usage.
• Any type of attack has common underpinnings of how it is carried out; this
has not changed from physical security breaches to computer security
breaches. Thus, data representing the user’s behavior from both the usage
of the systems and affective behaviors (such as stress indicators) provide
important knowledge. This knowledge can be leveraged to identify behav-
ioral models of anomalies where patterns of misuse can be identified.
• Studying multiple types of datasets and monitoring users through the appli-
cation of affective computing techniques (Picard 2003) can have ethical and
privacy implications. Effective adversarial monitoring techniques need to be
developed such that they are ethical and respect user privacy.
By utilizing data-based and human behavioral aspects of learning, new knowledge from this vast array of processes can lead to new insights into the challenges faced in this important domain of cybersecurity.
As such, very few studies establish the relationships between personality traits and IT security incidents where an insider or an unknowing user was responsible for the perpetration of the attack.
Certain key aspects of human behavior and personality traits can manifest in
technological viewpoints, as follows:
• User interaction: How a user interacts with a system is telling of their
personalities and their preferences. One common example is setting pass-
words and saving passwords, which is considered to be one of the biggest
points of risk for systems (Yildirim and Mackie 2019). Setting the same
passwords for multiple applications is a big security threat and in some
regards a single point of failure. Saving passwords on the system, or on a
piece of paper, while convenient, may be a security threat, especially if the
device is located in a somewhat nonsecure location. Similarly, use of
mechanisms such as two-factor authentication have to be carefully evaluated
for the types of proposed users of the systems (Das et al. 2020).
• Interface design: A very busy interface can leave average nontechnical
users confused. This is true for new users who may be starting to use an
interface in a new job or for experienced users after a change from the
systems they are used to. In some cases, it is also possible that highly
technical systems with several parameters may leave users vulnerable to
Let us consider all the visitors V of a network N. For each visitor of the network, V in N, a source IP address, IPsource, and a target IP address, IPtarget, are displayed in the IDS log. The alert information indicates the actions from the source IP addresses. The IDS logs do not always categorize the attacks as such, since they observe each alert in isolation. In combination, however, the alerts may reveal the collective behavior of attackers. Let us consider three user patterns and build them into three models of attacker behavior (Chen and Janeja 2014), as shown in Figure 10.4.
• Model 1: When a cyberattack occurs, the attacker usually will not be successful the first time. The attacker will attempt different methods in order to gain access to the target. As discussed in Liu et al. (2008), when an attack happens, it takes multiple steps: probe, scan, intrusion, and goal. Let us say each attacker is represented by a unique IPsource, and each attempt is differentiated by the alert messages. If the time when the attack will happen is not known, then the data can be divided into temporal neighborhoods, as discussed in Chapter 3. This reduces the number of instances analyzed each time and also helps to surface the attacks in a smaller subset of the data. If the count of unique attempts between IP1 and IP2 in a given Interval(i) is greater than a threshold C, then this action can be considered anomalous and this source IP address is a potential attacker. Here C represents an empirical criterion to differentiate attacks and nonattacks, based on the context of a network: on a commercial network, C will be larger than on a private network. Heuristically, we can consider C as three times the average attempts of regular users (see the sketch after this list). If an IPsource is identified as an attacker IP address, its activities before or after the flagged actions will be considered part of the attack, because its other incidents are likely to be in the probe or scanning stage in preparation for the following attacks.
• Model 2: When an attack happens, the target address usually is a single address or relatively few addresses. The attackers will target these unique targets persistently until success or failure. If a target IP address is accessed by a much higher number of unique IP addresses than usual within a short period of time, this target IP address is potentially being accessed anomalously.
• Model 3: A common attack method involves a massive number of attempts in a short time, for example, to obtain password information. Hence, if an IPtarget is experiencing much higher than normal traffic from a single IPsource or a few of them, this IPtarget is under attack. This can be measured as the percentage of total traffic between two IP addresses in a given Interval(i). If this percentage exceeds a threshold C, an alert can be raised. C can be an empirical criterion to differentiate attacks and nonattacks, based on experimental assessments and evaluation of historic traffic patterns.
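A minimal sketch of Model 1 (referenced in the first bullet above) follows; the alert tuples, interval labels, and baseline counts are invented for illustration, and C is set heuristically to three times the average unique attempts of regular users:

```python
from collections import defaultdict
from statistics import mean

# Hypothetical IDS alert tuples: (interval, source_ip, alert_message)
alerts = [
    (1, "10.0.0.5", "ssh brute force"), (1, "10.0.0.5", "port scan"),
    (1, "10.0.0.5", "sql injection"), (1, "10.0.0.5", "dir traversal"),
    (1, "10.0.0.5", "xss probe"),
    (1, "10.0.0.7", "ssh brute force"),
    (2, "10.0.0.9", "port scan"),
]

# Unique attempt types per source within each temporal neighborhood
attempts = defaultdict(set)
for interval, src, msg in alerts:
    attempts[(interval, src)].add(msg)

# Heuristic criterion C: three times the average number of unique
# attempts made by regular users (here taken from a historical baseline)
baseline = [1, 1, 2, 1, 1]
C = 3 * mean(baseline)

for (interval, src), msgs in attempts.items():
    if len(msgs) > C:
        print(f"interval {interval}: {src} made {len(msgs)} unique attempts "
              f"(> C={C:.1f}) - potential attacker")
```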
All these models are based on intuitions on potential attacker behavior,
represented by data on a network. Other models can be developed based on
psychological behavioral models such as those discussed in Chapter 5 for
social engineering threats.
This chapter discusses some key future directions and new and emerging
threats in a complex landscape of machine learning and AI models. This
chapter outlines cyberphysical systems, with a subcategory of the Internet of
Things (IoT), multidomain mining, and how outputs from disparate systems
can be fused for an integrative anomaly detection. This chapter also outlines
deep learning and generative adversarial networks and discusses emerging
challenges in model reuse. These areas are selected due to the emerging threats
and technology landscapes. At the end of this chapter, we also present an
important topic of ethical thinking in the data analytics process.
147
systems will also have a higher emphasis on reliability and accurate performance. System integrity and availability receive much more emphasis, as these systems cannot be offline for long periods. In addition, there are redundancy requirements in the event of failure. All of these concerns are brought into focus by the interactions with the physical domains (Stouffer et al. 2009).
One of the key vulnerabilities identified in the Guide to Industrial Control Systems (ICS) Security (Stouffer et al. 2009) is a lack of redundancy in critical components, which could lead to a single point of failure. The signals sent between the various sensors and the control units can be mapped similarly to network communication flows between IP addresses. Let us consider the scenario in Figure 11.2, where the various signals or signal pathways are shown. ICS systems may comprise sensors measuring a signal and actuators or movers acting on the signal. Anomaly detection is an important
security application to evaluate risks in ICS, such as water monitoring sensor
networks (Lin et al. 2018).
Sensor networking (such as in spatial data, discussed in Chapter 8) and its
applications can be adapted to industrial control systems (ICS) and computer
networks for supporting cybersecurity. Let us consider a part of the sensor
network where the nodes A, B, C, and D (comprising of sensors and actuators,
with the lines indicating the potential connections and communication between
them) are shown in Figure 11.2a, b. We evaluate the nodes based on connec-
tions or number of links in Figure 11.2a. Using concepts of graph-based
metrics, as discussed in Chapter 9, we can evaluate these communication
patterns. Here nodes A and C can be considered important as they are
connecting hubs, where a large number of edges are incident, indicating a
high degree of communication. Similarly, node B can be considered important
IoT as a class of CPS (Nunes et al. 2015) or seen it as intersecting with CPS
(Calvaresi et al. 2017).
IoT has indeed become a big share of the CPS space due to the explosion of
the number of devices and advancements of smart devices and sensors. Even
though estimates vary, they are projected to grow up to 31 billion devices worldwide by 2025 (Statista 2021). This is a very large number of devices by any estimate and creates more security challenges, given the highly unregulated and nonstandard market for IoT devices. One such space where smart
devices have created this interesting intersection between cyberphysical and
Internet of Things is a smart car. We are making our vehicles smart: fully
connected with the internet to view real-time traffic and weather, talk on the
phone with Bluetooth, listen to the radio, watch video, as well as get real-time
status of automobile’s mechanical functions. However, this smart interface
comes at a price, which is the vulnerability to threats as well as malfunctions to
mechanical parts of the vehicle.
Other areas where IoT has taken a big role include home security systems with fire alarms, CO monitors, door and garage sensors, alarms, and temperature
control sensors. Figure 11.3 depicts the complex landscape of IoT.
Let us consider a smart home with a series of devices measuring various
phenomena around them. In such a setting, depicted in Figure 11.3, several
smart devices collect the information from a location with different levels of
the data would be limited. In such a scenario, we can augment the header information with other types of data. One such viewpoint is that of geospatial data, which can enhance the knowledge of the IP session or even the IP reputation score itself.
Current reputation systems pursue classification into a white and black list,
i.e., binary categorization. Separate lists for URLs and IP addresses are
maintained. Some tools that provide rudimentary reputation services include
Cisco SenderBase (www.senderbase.org/), VirusTotal IP reputation (www
.virustotal.com/), and Spam and Open Relay Blocking System (SORBS)
(www.sorbs.net/). Most of these tools and lists are based on smaller feature
sets from one source, with no correlation across other data sources. Such a
shortcoming degrades a system’s effectiveness for detecting sophisticated
attacks and terminating malicious activities, such as zero-day attacks.
The set of attributes that reputation scoring considers can be enriched, providing an expressive scoring system that enables an administrator to understand what is at stake, and increasing robustness by correlating the various pieces of information while factoring in the trustworthiness of their sources. An IP reputation scoring model can be enriched using network session features and geocontextual features such that an incoming session IP is labeled based on the most similar IP addresses, in terms of both network features and geocontextual features (Sainani et al. 2020). This can provide better threat assessment by considering not only the network features but also additional context, such as geospatial context information collected from external sources.
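A hedged sketch of the idea follows, labeling an incoming session IP by its most similar historical IPs with a k-nearest-neighbor classifier over combined network and geocontextual features. The feature choices, values, and labels are illustrative assumptions, not the model of Sainani et al. (2020):

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Hypothetical combined features per historical IP:
# [session duration, bytes sent, geospatial risk score]
# (in practice, features should be standardized before distance matching)
X = np.array([
    [30, 1.2e4, 0.1], [45, 2.0e4, 0.2], [20, 0.8e4, 0.1],   # benign
    [300, 9.5e5, 0.9], [280, 8.0e5, 0.8],                   # malicious
])
y = np.array([0, 0, 0, 1, 1])  # 0 = good reputation, 1 = bad reputation

# Label an incoming session IP by its most similar historical IPs
knn = KNeighborsClassifier(n_neighbors=3).fit(X, y)
incoming = np.array([[290, 8.7e5, 0.85]])
print(knn.predict(incoming), knn.predict_proba(incoming))
```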
Indeed, in some countries, networks may encounter or even host large
quantities of attacks as compared to others. This may be due to shortage of
cybersecurity expertise, level of development, the abundance of resources,
corruption levels, or the computing culture in these countries (Mezzour
2015). Identifying these factors and quantifying them can provide insights into
security policies and have a positive impact on the attack incidents. These
scenarios not only impact the countries facing such cybersecurity crises but
also impact other countries and end users due to the level of connectivity in
today’s day and age. In addition, studies have also identified regions across the
world that are prone to hosting certain types of attacks. For example, studies
have indicated that Trojans, worms, and viruses are most prevalent in Sub-
Saharan Africa (Mezzour 2015), and some families of malware preferentially target Europe and the US (Caballero et al. 2011). Yet other studies (Wang and Kim
2009) have explained broad categories of worldwide systemic risks and
country-specific risks where country-specific risks include aspects of econ-
omy, technology, industry, and international cooperation in enforcement of
laws and policies.
Thus, geospatial data not only provide additional context but offer a frame-
work to accommodate additional geopolitical information that often plays a big
role in hacktivism or politically inspired attacks. Figure 11.4 provides a set of
rich sources to access geospatial data for countries and in some cases even at a
granular level of cities. Some of these sources, such as Ip2location, provide a
way to identify a user location based on IP address in a nonintrusive manner.
Several other data sources such as World Bank Open Data (https://data
.worldbank.org/), PRIO-GRID (http://grid.prio.org/#/), and the Armed Conflict
Location and Event Data Project (ACLED) (www.acleddata.com/) data provide
sociopolitical and geopolitical conflict data. Such data can be used to create
geospatial characterization of regions (for example, using methods proposed by
Janeja et al. 2010, as discussed in Chapter 8). When an IP address is encountered, it can be geolocated using IP location databases such as Ip2location or MaxMind. As shown in Figure 11.5, based on its geolocation, the location score from the characterization can be attributed to it. In addition, the geospatial attributes for the IP's region can be appended to its network attributes (Sainani 2018, 2020). Any additional security intelligence can be appended to provide an aggregate reputation score for this IP.
The data heterogeneity in terms of types of attributes, namely categorical
versus continuous, can be addressed using methods that are capable of hand-
ling mixed attribute datasets (such as Misal et al. 2016).
[Figure: geospatial reputation scoring pipeline – geospatial sources are evaluated and preprocessed into geospatial features; feature selection and geospatial characterization produce a geospatial reputation score; an incoming IP address is geolocated (MaxMind etc.) and assigned the geospatial score, which is combined with security intelligence through multi-attribute reputation scoring into an aggregate reputation score.]
[Figure: iterative alarm fusion across agencies – alarms originating from software modules at agencies A, B, and C are fused across levels (Level 1, Level 2) to generate alerts.]
We can perform clustering of the IDS log data after preprocessing. The data
are in the form of parsed IDS alarms, where each alarm is a data point with a
priority level: high, medium, or low. These data are collected from multiple
IDS sources. In this case, the premise is that a low-priority alarm may not
really be a low-priority alarm when seen in conjunction with some of the other
alarms from different IDS systems. Since we are looking at all alarms from
multiple IDS systems, we have the opportunity to study the similarity between
alarms and then judge whether an alarm is truly a low priority or could be
potentially high priority. The end goal is to recognize when multiple similar low-priority alarms collectively indicate a cyberattack.
To distinguish whether these alarms are true alerts, we would need to examine whether the sources have similar characteristics. A source characterization
based on the features associated with sources can be performed. Here “source”
refers to either IDS systems associated with various parts of an organization or
even multiple organizations; similarly, “source” could also refer to an applica-
tion source or even an agency in a multiagency scenario. Sources could also
refer to sensors in an ICS example, as discussed earlier in this chapter. This
approach is adaptable if the alarm can be sufficiently preprocessed in a uniform
manner across different types of alarm data.
Clusters of alarms based on attributes can be generated. For each cluster, the
set of sources associated with each cluster and the alarm cardinality of each
cluster, which is the number of alarms in a cluster, are identified:
• If only one source is generating alarms in a cluster, then the source is flagged
for investigation. If alarm cardinality of a cluster is greater than a preset
threshold and if all of the alarms in the cluster are from one source, then this
can lead to raising an alert. Let us consider a sensor generating alarms in an
ICS. If a single sensor is generating alarms, then this sensor should be
investigated. Moreover, if data from the sensors are clustered and readings from one sensor dominate a cluster, this could further raise the alert.
• Next, consider every source in a particular characterization (based on similarly behaving sources) such that this source is not equal to the source in the cluster: if such a source s is not in any other cluster, it implies that potentially the other sources in the characterization are also not generating alarms. The cluster can then be flagged as a possible false positive for further investigation. In other words, if a source is part of a cluster but the other sources from its characterization are not in any of the clusters, then this source may be a false positive.
After removing the clusters representing the false positives, among the
remaining clusters of alarms, two cases may occur: (a) a cluster comprises a
significant number of alarms, but these alarms do not belong to one single
source in a characterization; and (b) no single cluster has a significant number
of alarms:
• In the first case, an aggregated alert can be generated for any cluster whose alarm count exceeds a preset threshold.
• In the second case, we can identify the overlap between the clusters, using measures such as the silhouette coefficient and entropy, to find mutually related alarms and generate an aggregated alert. The greater the overlap, the more strongly related the alarms (see the sketch after this list).
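One way to make the overlap test concrete is with per-alarm silhouette values: alarms with low or negative silhouette scores sit near cluster boundaries and are therefore related to neighboring clusters. The sketch below assumes numeric alarm feature vectors and at least two clusters; the cut-off value is an illustrative assumption.

    import numpy as np
    from sklearn.metrics import silhouette_samples

    def overlapping_alarms(X, labels, cutoff=0.1):
        """X: alarm feature vectors; labels: cluster per alarm (>= 2 clusters).
        Returns indices of boundary alarms that overlap neighboring clusters."""
        scores = silhouette_samples(X, labels)
        return np.where(scores < cutoff)[0]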
Once an alert is generated, sources can be associated with it: the set of sources of the alert is the set of sources of the alarms in the cluster or, in the overlap scenario, of the alarms in the overlap. These sources can be aggregated by combining the feature vectors of all of the sources, producing a composite feature vector for the source of the alert.
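A minimal sketch of this aggregation, assuming each source is already summarized by a numeric feature vector, is an element-wise mean; other aggregations (sum, maximum) are equally plausible choices.

    import numpy as np

    def composite_source_vector(feature_vectors):
        """Element-wise mean of the feature vectors of an alert's sources."""
        return np.mean(np.asarray(feature_vectors, dtype=float), axis=0)

    # Two sources contributing to one alert (illustrative vectors):
    print(composite_source_vector([[12, 40, 300], [10, 45, 280]]))
    # -> [ 11.   42.5 290. ]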
emerged where models are checked in by researchers and reused by others. Examples include the Open Neural Network Exchange (ONNX) format,1 the Azure AI gallery,2 and the Caffe Model Zoo.3 It is only a matter of time before models are as easily available and searchable as public datasets.
However, there is a clear danger in reusing machine learning models without
understanding the provenance of the models and the model pipelines. This has
been a well-studied problem in software reuse (Paul and Taylor 2002, Kath
et al. 2009). Vulnerability databases have been established to study known
software defects to prevent propagating them to other users and applications. If
reuse of vulnerable software and code in general is not prevented, it can lead to massive disruptions and attacks, as in the case of the Heartbleed bug (Carvalho et al. 2014).
This level of understanding has not yet been established for machine learning models. Recently, some studies have shown that manipulating even simple building blocks of a machine learning model can lead to model reuse attacks (Ji et al. 2018), in which corrupted models cause the host systems to behave anomalously when certain types of input triggers are introduced into the system. These threats of model reuse and training data manipulation (Ji et al. 2018) can be summarized as shown in Figure 11.11.
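A minimal precaution when reusing a published model, sketched below, is to verify the artifact's cryptographic digest against a value obtained through a trusted channel before deserializing it; the file name and the expected digest here are placeholders.

    import hashlib

    def sha256_of(path, chunk_size=1 << 20):
        """Compute the SHA-256 digest of a (possibly large) model file."""
        h = hashlib.sha256()
        with open(path, "rb") as f:
            for chunk in iter(lambda: f.read(chunk_size), b""):
                h.update(chunk)
        return h.hexdigest()

    # Placeholder digest: obtain the expected value from a trusted source,
    # not from the same page that hosts the model file itself.
    EXPECTED = "<published sha-256 digest>"

    if sha256_of("model.onnx") != EXPECTED:
        raise ValueError("Model artifact does not match its published digest")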
To understand the impacts of such an attack, consider an autonomous
vehicle that has to evaluate several images from multiple camera inputs in an
ensemble learning system. In such a complex system, if a trigger results in an
anomalous decision, this can have far-reaching impacts for the immediate
1 http://onnx.ai/
2 https://gallery.azure.ai/
3 http://caffe.berkeleyvision.org/model_zoo.html
vicinity where the vehicle is driving and also in the long term for the company
that has developed the autonomous vehicle and the systems processing the
images. Indeed, studies have shown how researchers have hacked into driver-
less cars (Versprille 2015) and studied potential risks in smart car functional-
ities (Weimerskirch and Dominic 2018). Model reuse attacks can bring in a
new wave of cyberattacks where complex systems that rely on machine
learning behave erratically when certain trigger inputs are introduced.
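To make this failure mode concrete, the toy sketch below shows how a reused perception model with a planted trigger can flip a downstream driving decision; the model, the trigger, and the decision logic are contrived stand-ins, not an actual attack.

    TRIGGER = "sticker_pattern_42"  # contrived stand-in for a visual trigger

    def reused_perception_model(scene):
        """A reused model that behaves normally except on trigger inputs."""
        if TRIGGER in scene:          # planted behavior in the reused model
            return "clear_road"       # anomalous output on trigger inputs
        return "obstacle" if "obstacle" in scene else "clear_road"

    def driving_decision(perception):
        return "brake" if perception == "obstacle" else "continue"

    print(driving_decision(reused_perception_model("obstacle ahead")))
    # -> "brake" (normal behavior)
    print(driving_decision(reused_perception_model("obstacle ahead " + TRIGGER)))
    # -> "continue" (the trigger masks the obstacle: an anomalous decision)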
traditional privacy protections. There may be yet other types of data that may
have impacts for the society as a whole. Consider data such as location data
collected on a phone, driving data collected through on board diagnostics
(OBD), or atmospheric data collected through sensors in a community. Now
let us consider the types of patterns one may discover with access to
such data. For example, the GPS data on the phone may pinpoint the geospatial
coordinates of secure locations where an individual is carrying the phone;
OBD driving data may provide insights into driving behavior; and systems may falsely label legitimate behavior as anomalous. Data collection is another very important element where ethical considerations (in addition to legal considerations) are important.
As we think through the examples, we can also think of the uses of the data
while integrating them with data from other sources. For example, if the data of
IP addresses are combined with geographical distribution of demographics,
does that identify vulnerable populations, especially if the system being used is
not robust to false positives? Can the driver behavior, discovered through OBD
or the information on the neighborhoods that the driver passes through, lead to
additional security risks? Even in these limited examples, we can see the beginnings of questions about bias and risk to individuals as more and more systems relying on data analytics come into use. Indeed, such systems are beginning to be used in augmented decision making that can impact lives and livelihoods.
We all have come to accept recommendations made by data-driven analytics
algorithms in our day-to-day lives, such as in spam labels for emails and fraud
alerts from our credit card companies. Things start getting tricky when these data analytics systems are used to make decisions that would generally take a great deal of thought and deliberation: for example, who should receive bail, who could be a criminal, or whether a facial recognition system should be used at all.
The vast complexity of data, the data's easy availability, and the power of algorithmic tools require vigilance around access, privacy, provenance, curation, interoperability, and issues of fairness, accountability, and transparency. In studying data analytics algorithms and systems, such as the security alert mechanisms discussed in this book, it is clear that there are several checkpoints where a system follows a pathway based on choices made by the creator or the user of that system. It is important to know what those choices are and how they can impact lives and livelihoods. It is also important to explain the outcomes of such systems and to ensure they do not disproportionately impact certain groups or individuals. Just as we would carefully hire an employee, supervise them, and advise them in making critical decisions, we need sensible extensions that do the same for data analytics algorithms and systems; one concrete checkpoint of this kind is sketched below.
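The sketch below computes the per-group selection rates of an alert system and their ratio, a simple disparate-impact style check; the data, the groups, and the rule-of-thumb threshold are illustrative conventions, not a complete fairness audit.

    from collections import defaultdict

    def selection_rates(decisions):
        """decisions: iterable of (group, flagged) pairs, flagged in {0, 1}."""
        totals, flagged = defaultdict(int), defaultdict(int)
        for group, hit in decisions:
            totals[group] += 1
            flagged[group] += hit
        return {g: flagged[g] / totals[g] for g in totals}

    rates = selection_rates([("A", 1), ("A", 0), ("A", 0),
                             ("B", 1), ("B", 1), ("B", 0)])
    ratio = min(rates.values()) / max(rates.values())
    # A ratio well below ~0.8 suggests the system disproportionately
    # flags one group and warrants a closer look.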
Abad, C., Taylor, J., Sengul, C., et al. (2003). “Log correlation for intrusion detection: a
proof of concept.” 19th Annual Computer Security Applications Conference, 2003.
Proceedings., 2003, pp. 255–264, doi: 10.1109/CSAC.2003.1254330.
Abdellahi, S., Lipford, H. R., Gates, C., and Fellows, J. (2020). “Developing a User
Interface Security Assessment method.” USENIX Symposium on Usable Privacy
and Security (SOUPS) 2020. August 9–11, 2020, Boston.
Abedin, M., Nessa, S., Khan, L., Al-Shaer, E., and Awad, M. (2010). “Analysis of
firewall policy rules using traffic mining techniques.” International Journal of
Internet Protocol Technology, 5(1–2), 3–22.
Abraham, T. and Roddick, J. F. (1999). “Survey of spatio-temporal databases.”
GeoInformatica, 3(1), 61–99.
Aggarwal, C. C. (2017). “Spatial outlier detection.” Outlier Analysis. Springer
International Publishing, 345–368.
Aggarwal, C. C. and Yu, P. S. (2001, May). “Outlier detection for high dimensional
data.” ACM Sigmod Record, 30(2) 37–46.
Agrawal, R., Gehrke, J., Gunopulos, D., and Raghavan, P. (1998). “Automatic sub-
space clustering of high dimensional data for data mining applications.”
Proceedings of the 1998 ACM SIGMOD International Conference on
Management of Data. ACM Press, 94–105.
Agrawal, R. and Srikant, R. (1994). “Fast algorithms for mining association rules.”
Proceedings of the 20th International Conference on Very Large Data Bases,
VLDB, J. B. Bocca, M. Jarke, and C. Zaniolo (Eds.). Morgan Kaufmann, 487–499.
Agarwal, S., Farid, H., Gu, Y., et al. (2019, June). “Protecting world leaders against
deep fakes.” Proceedings of the IEEE/CVF Conference on Computer Vision and
Pattern Recognition (CVPR) Workshops, 38–45.
Ahmed, M., Mahmood, A. N., and Hu, J. (2016). “A survey of network
anomaly detection techniques.” Journal of Network and Computer Applications,
60, 19–31.
Akamai. (2016). “Q4 2016 state of the internet/security report.” www.akamai.com/
newsroom/press-release/akamai-releases-fourth-quarter-2016-state-of-the-inter
net-connectivity-report. Last accessed November 2021.
Breunig, M. M., Kriegel, H. P., Ng, R. T., and Sander, J. (1999, September).
“Optics-of: identifying local outliers.” Principles of Data Mining and
Knowledge Discovery. PKDD 1999, J. M. Żytkow and J. Rauch (Eds.)
Lecture Notes in Computer Science, vol. 1704. Springer. doi: 10.1007/978-3-
540-48247-5_28.
Brigham, E. O. (2002). The Fast Fourier Transform. Prentice Hall.
Bright, A. (2007). “Estonia accuses Russia of ‘cyberattack’.” www.csmonitor.com/
2007/0517/p99s01-duts.html. Last accessed March 2020.
Bronskill, J. (2012, November 9). “Govt fears Canada becoming host country for cyber-
attacker.” https://winnipeg.ctvnews.ca/canada-becoming-host-country-for-cyber-
attackers-government-fears-.1032064. Last accessed November 2021.
Burns, C. (2012, June 1). “Stuxnet virus origin confirmed: USA and Isreali govern-
ments.” www.slashgear.com/stuxnet-virus-origin-confirmed-usa-and-isreali-gov
ernments-01231244/. Last accessed November 2021.
Caballero, J., Grier, C., Kreibich, C., and Paxson, V. (2011, August). “Measuring pay-
per-install: the commoditization of malware distribution.” Proceedings of the 20th
USENIX Conference on Security (SEC'11), 13.
Cai, L. and Hao, C. (2011). “TouchLogger: inferring keystrokes on touch screen from
smartphone motion.” HotSec, 11(2011), 9–9.
Caldwell, D., Gilbert, A., Gottlieb, J., et al. (2004). “The cutting EDGE of IP router
configuration.” ACM SIGCOMM Computer Communication Review, 34(1),
21–26.
Calvaresi, D., Marinoni, M., Sturm, A., Schumacher, M., and Buttazzo, G. (2017,
August). “The challenge of real-time multi-agent systems for enabling IoT and
CPS.” Proceedings of the International Conference on Web Intelligence. ACM,
356–364.
Cao, L., Yang, D., Wang, Q., Yu, Y., Wang, J., and Rundensteiner, E. A. (2014,
March). “Scalable distance-based outlier detection over high-volume data
streams.” 2014 IEEE 30th International Conference on Data Engineering
(ICDE). IEEE, 76–87.
Carvalho, M., DeMott, J., Ford, R., and Wheeler, D. A. (2014). “Heartbleed 101.” IEEE
Security & Privacy, 12(4), 63–67.
Chandola, V., Banerjee, A., and Kumar, V. (2009). “Anomaly detection: a survey.”
ACM Computing Surveys (CSUR), 41(3), 15.
Chandrashekar, G. and Sahin, F. (2014). “A survey on feature selection methods.”
Computers & Electrical Engineering, 40(1), 16–28.
Chapman, P., Clinton, J., Kerber, R., et al. (2000). CRISP-DM 1.0, Step-by-Step Data
Mining Guide. CRISP-DM Consortium; SPSS: Chicago, IL, USA.
Check Point. (2020). “Threat map.” https://threatmap.checkpoint.com/. Last accessed
March 2020.
Chen, M., Mao, S., and Liu, Y. (2014). “Big data: a survey.” Mobile Networks and
Applications 19(2), 171–209.
Chen, S. and Janeja, V. P. (2014). “Human perspective to anomaly detection for
cybersecurity.” Journal of Intelligent Information Systems, 42(1), 133–153.
Cheok, R. (2014). “Wire shark: a guide to color my packets detecting network recon-
naissance to host exploitation.” GIAC Certification Paper. SANS Institute Reading
Room.
Cheswick, B. (1992, January). “An evening with Berferd in which a cracker is lured,
endured, and studied.” Proceedings of the Winter USENIX Conference, San
Francisco, 20–24.
Chi, M. (2014). “Cyberspace: America’s new battleground.” www.sans.org/reading-
room/whitepapers/warfare/cyberspace-americas-battleground-35612. Last
accessed November 2021.
Chien, E. and O’Gorman, G. (2011). “The nitro attacks, stealing secrets from the
chemical industry.” Symantec Security Response (2011), 1–8.
CIA. (2021). “World Factbook.” Last accessed November 2021.
Cleary, J. G. and Trigg, L. E. (1995). “K*: an instance-based learner using an entropic
distance measure.” Machine Learning Proceedings 1995, A. Prieditis and S.
Russell (Eds.). Morgan Kaufmann, 108–114.
Cloud Security Alliance. (2014). Big data taxonomy. https://downloads
.cloudsecurityalliance.org/initiatives/bdwg/Big_Data_Taxonomy.pdf. Last accessed
April 13, 2017.
Cohen, E., Datar, M., Fujiwara, S., et al. (2001). “Finding interesting associations
without support pruning.” IEEE Transactions on Knowledge and Data
Engineering, 13(1), 64–78.
Cohen, W. W. (1995, July). “Fast effective rule induction.” Proceedings of the Twelfth
International Conference on Machine Learning. Morgan Kaufmann, 115–123.
Cooper, G. F. and Herskovits, E. (1992). “A Bayesian method for the induction of
probabilistic networks from data.” Machine Learning, 9, 309–347.
Cortes, C. and Vapnik, V. (1995). “Support-vector networks.” Machine Learning, 20
(3), 273–297.
Cover, T. and Hart, P. (1967). “Nearest neighbor pattern classification.” IEEE
Transactions on Information Theory, 13(1), 21–27.
Dark Reading. (2011). “PNNL attack: 7 lessons: surviving a zero-day attack.” www
.darkreading.com/attacks-and-breaches/7-lessons-surviving-a-zero-day-attack/d/d-
id/1100226. Last accessed March 2020.
Darwish, A. and Bataineh, E. (2012, December 18–20). “Eye tracking analysis of
browser security indicators.” 2012 International Conference on Computer
Systems and Industrial Informatics (ICCSII), 1, 6. doi: 10.1109/
ICCSII.2012.6454330. Last accessed November 2021.
Das, A., Ng, W.-K., and Woon, Y.-K. (2001). “Rapid association rule mining.”
Proceedings of the Tenth International Conference on Information and
Knowledge Management. ACM Press, 474–481.
Das, S., Kim, A., Jelen, B., Huber, L., and Camp, L. J. (2020). “Non-inclusive online
security: older adults’ experience with two-factor authentication.” Proceedings of
the 54th Hawaii International Conference on System Sciences, 6472.
Dash, M. and Liu, H. (1997). “Feature selection for classification.” Intelligent Data
Analysis, 1(1–4), 131–156.
Davies, C. (2012, May 28). “Flame cyber-espionage discovered in vast infection net.”
www.slashgear.com/flame-cyber-espionage-discovered-in-vast-infection-net-
28230470/. Last accessed November 2021.
Davinson, N. and Sillence, E. (2010). “It won’t happen to me: Promoting secure
behaviour among internet users.” Computers in Human Behavior, 26(6),
1739–1747.
Davinson, N. and Sillence, E. (2014). “Using the health belief model to explore users’
perceptions of ‘being safe and secure’ in the world of technology mediated
financial transactions.” International Journal of Human–Computer Studies, 72
(2), 154–168.
Dempster, A. P., Laird, N. M., and Rubin, D. B. (1977). “Maximum likelihood from
incomplete data via the EM algorithm.” Journal of the Royal Statistical Society.
Series B (Methodological), 39(1), 1–22.
Deokar, B. and Hazarnis, A. (2012). “Intrusion detection system using log files and reinforce-
ment learning.” International Journal of Computer Applications 45(19), 28–35.
Dey, S., Janeja, V. P., and Gangopadhyay, A. (2009). “Temporal neighborhood dis-
covery through unequal depth binning.” IEEE International Conference on Data
Mining (ICDM’09), 110–119.
Dey, S., Janeja, V. P., and Gangopadhyay, A. (2014). “Discovery of temporal neigh-
borhoods through discretization methods.” Intelligent Data Analysis, 18(4),
609–636.
Ding, Y., Yan, E., Frazho, A., and Caverlee, J. (2009, November). “Pagerank for
ranking authors in co-citation networks.” Journal of the American Society for
Information Science and Technology, 60(11), 2229–2243.
Domo. (2017). “Data never sleeps.” www.domo.com/learn/data-never-sleeps-5?aid=
ogsm072517_1&sf100871281=1. Last accessed November 2021.
Drinkwater, D. (2016). “Does a data breach really affect your firm’s reputation?” www
.csoonline.com/article/3019283/data-breach/does-a-data-breach-really-affect-
your-firm-s-reputation.html. Last accessed June 2017.
Duchene, F., Garbayl, C., and Rialle, V. (2004). “Mining heterogeneous multivariate
time-series for learning meaningful patterns: application to home health telecare.”
arXiv preprint cs/0412003.
Duczmal, L. and Renato, A. (2004). “A simulated annealing strategy for the detection
of arbitrarily shaped spatial clusters.” Computational Statistics and Data Analysis,
45(2), 269–286.
Eberle, W., Graves, J., and Holder, L. (2010). “Insider threat detection using a graph-
based approach.” Journal of Applied Security Research, 6(1), 32–81.
ENISA (European Union Agency for Network And Information Security). (2016, Jan).
“Threat Taxonomy: a tool for structuring threat information.” https://library
.cyentia.com/report/report_001462.html. Last accessed November 2021.
Ester, M., Frommelt, A., Kriegel, H.-P., and Sander, J. (1998). “Algorithms for
characterization and trend detection in spatial databases.” Proceedings of the
Fourth International Conference on Knowledge Discovery and Data Mining
(KDD'98), 44–50.
Ester, M., Kriegel, H., and Sander, J. (1997). “Spatial data mining: a database
approach.” The 5th International Symposium on Advances in Spatial Databases,
Springer-Verlag, 47–66.
Ester, M., Kriegel, H. P., Sander, J., and Xu, X. (1996, August). “A density-based
algorithm for discovering clusters in large spatial databases with noise.” KDD, 96
(34), 226–231.
Estevez-Tapiador, J. M., Garcia-Teodoro, P., and Diaz-Verdejo, J. E. (2004). “Anomaly
detection methods in wired networks: a survey and taxonomy.” Computer
Communications, 27(16), 1569–1584.
Guha, S., Rastogi, R., and Shim, K. (2000). “ROCK: a robust clustering algorithm for
categorical attributes.” Information Systems, 25(5), 345–366.
Gupta, H., Sural, S., Atluri, V., and Vaidya, J. (2016). “Deciphering text from touchsc-
reen key taps.” IFIP Annual Conference on Data and Applications Security and
Privacy. Springer International Publishing, 3–18.
Guralnik, V. and Srivastava, J. (1999). “Event detection from time series data.”
KDD’99: Proceedings of the Fifth ACM SIGKDD International Conference on
Knowledge Discovery and Data Mining. ACM, 33–42.
Haining, R. (2003). Spatial Data Analysis: Theory and Practice. Cambridge University Press.
Halevi, T., Lewis, J., and Memon, N. (2013, May). “A pilot study of cyber security and
privacy related behavior and personality traits.” Proceedings of the 22nd
International Conference on World Wide Web Companion. International World
Wide Web Conferences Steering Committee, 737–744.
Halliday, J. (2010, September 24). “Stuxnet worm is the ‘work of a national govern-
ment agency’.” www.guardian.co.uk/technology/2010/sep/24/stuxnet-worm-
national-agency. Last accessed November 2021.
Han, J., and Fu, Y. (1995, September). “Discovery of multiple-level association rules
from large databases.” VLDB’95, Proceedings of 21th International Conference
on Very Large Data Bases, U. Dayal, P. Gray, and S. Nishio (Eds.). Morgan
Kaufmann, 420–431.
Han, J., Pei, J., and Yin, Y. (2000, May). “Mining frequent patterns without candidate
generation.” ACM Sigmod Record, 29(2), 1–12.
Hellerstein, J. L., Ma, S., and Perng, C.-S. (2002). “Discovering actionable patterns in
event data.” IBM Systems Journal, 41(3), 475–493.
Heron, S. (2007). “The rise and rise of the keyloggers.” Network Security 2007(6), 4–6.
Hinneburg, A., and Keim, D. A. (1998, August). “An efficient approach to clustering in
large multimedia databases with noise.” KDD, 98, 58–65.
Hoffman, S. (2011). “Cyber attack forces internet shut down for DOE lab, on July 8.”
www.crn.com/news/security/231001261/cyber-attack-forces-internet-shut-down-
for-doe-lab.htm. Last accessed February 23, 2014.
Hong, J., Liu, C. C., and Govindarasu, M. (2014). “Integrated anomaly detection for cyber
security of the substations.” IEEE Transactions on Smart Grid, 5(4), 1643–1653.
Hu, M. and Liu, B. (2004, August). “Mining and summarizing customer reviews.”
Proceedings of the Tenth ACM SIGKDD International Conference on Knowledge
Discovery and Data Mining. ACM, 168–177.
Hu, Z., Baynard, C. W., Hu, H., and Fazio, M. (2015, June). “GIS mapping and spatial
analysis of cybersecurity attacks on a Florida university.” 2015 23rd International
Conference on Geoinformatics. IEEE, 1–5.
Hu, Z., Wang, H., Zhu, J., et al. (2014). “Discovery of rare sequential topic patterns in
document stream.” Proceedings of the 2014 SIAM International Conference on
Data Mining, 533–541.
Huang, Y., Pei, J., and Xiong, H. (2006). “Mining co-location patterns with rare events
from spatial data sets.” GeoInformatica, 10(3), 239–260.
Hussain, M., Al-Haiqi, A., Zaidan, A. A., Zaidan, B. B., Kiah, M. M., Anuar, N. B., and
Abdulnabi, M. (2016). “The rise of keyloggers on smartphones: A survey and
insight into motion-based tap inference attacks.” Pervasive and Mobile
Computing, 25, 1–25.
Ingols, K., Lippmann, R., & Piwowarski, K. (2006, December). “Practical attack graph
generation for network defense.” Computer Security Applications Conference,
2006. ACSAC’06. 22nd Annual. IEEE, 121–130.
Iyengar, V. S. (2004). “On detecting space-time clusters.” KDD’04: Proceedings of the
Tenth ACM SIGKDD International Conference on Knowledge Discovery and
Data Mining. ACM Press, 587–592.
Jain, A. K. (2010). “Data clustering: 50 years beyond K-means.” Pattern Recognition
Letters, 31(8), 651–666.
Jakobsson, M. (2007). “The human factor in phishing.” Privacy & Security of
Consumer Information, 7(1), 1–19.
Janeja, V. P. (2019). Do No Harm: An Ethical Data Life Cycle, Sci on the Fly. AAAS.
Janeja, V. P., Adam, N. R., Atluri, V., and Vaidya, J. (2010). “Spatial neighborhood
based anomaly detection in sensor datasets.” Data Mining and Knowledge
Discovery, 20(2), 221–258.
Janeja, V. P. and Atluri, V. (2005). “LS3: A linear semantic scan statistic technique for
detecting anomalous windows. Proceedings of the 2005 ACM Symposium on
Applied Computing, 493–497.
Janeja, V. P. and Atluri, V. (2005). “FS3: A random walk based free-form spatial scan
statistic for anomalous window detection.” Fifth IEEE International Conference
on Data Mining (ICDM’05). IEEE Computer Society, 661–664.
Janeja, V. P, and Atluri, V. (2008). “Random walks to identify anomalous free-form
spatial scan windows.” IEEE Transactions on Knowledge and Data Engineering,
20(10), 1378–1392.
Janeja, V. P. and Atluri, V. (2009). “Spatial outlier detection in heterogeneous neigh-
borhoods.” Intelligent Data Analysis, 13(1), 85–107.
Janeja, V. P., Azari, A., Namayanja, J. M., and Heilig, B. (2014, October).
“B-dids: Mining anomalies in a Big-distributed Intrusion Detection System.”
In 2014 IEEE International Conference on Big Data (Big Data) (pp. 32–34).
IEEE.
Jarvis, R. A. and Patrick, E. A. (1973). “Clustering using a similarity measure based
on shared near neighbors.” IEEE Transactions on Computers, 100(11),
1025–1034.
Jha, S., Sheyner, O., and Wing, J. (2002). “Two formal analyses of attack graphs.” Computer
Security Foundations Workshop, 2002. Proceedings. 15th IEEE. IEEE, 49–63.
Ji, Y., Zhang, X., Ji, S., Luo, X., and Wang, T. (2018, October). “Model-reuse attacks
on deep learning systems.” Proceedings of the 2018 ACM SIGSAC Conference on
Computer and Communications Security. ACM, 349–363.
Joachims, T. (2002, July). “Optimizing search engines using clickthrough data.”
Proceedings of the Eighth ACM SIGKDD International Conference on
Knowledge Discovery and Data Mining. ACM, 133–142.
John, O. P. and Srivastava, S. (1999). “The big-five trait taxonomy: history,
measurement, and theoretical perspectives.” Handbook of Personality:
Theory and Research, vol. 2., L. A. Pervin and O. P. John (Eds.). Guilford
Press, 102–138.
Kang, I., Kim, T., and Li, K. (1997). “A spatial data mining method by Delaunay
triangulation.” Proceedings of the 5th ACM International Workshop on Advances
in Geographic Information Systems. ACM, 35–39.
Kang, U., Tsourakakis, C., Appel, A., Faloutsos, C., and Leskovec, J. (2010). “Radius
plots for mining tera-byte scale graphs: algorithms, patterns, and observations.”
Proceedings of the 2010 SIAM International Conference on Data Mining (SDM),
548–558.
Kang, U., Tsourakakis, C., and Faloutsos, C. (2009). “Pegasus: a peta-scale graph
mining system – implementation and observations.” 2009 Ninth IEEE
International Conference on Data Mining, 229–238.
Karypis, G., Han, E. H., and Kumar, V. (1999). “Chameleon: hierarchical clustering
using dynamic modeling.” Computer, 32(8), 68–75.
Kaspersky. (2020). “Cyberthreat real-time map.” https://cybermap.kaspersky.com/.
Last accessed November 2020.
Kath, O., Schreiner, R., and Favaro, J. (2009, September). “Safety, security, and
software reuse: a model-based approach.” Proceedings of the Fourth
International Workshop in Software Reuse and Safety. www.researchgate.net/
publication/228709911_Safety_Security_and_Software_Reuse_A_Model-Based_
Approach. Last accessed November 2021.
Kato, K. and Klyuev, V. (2017, August). “Development of a network intrusion detec-
tion system using Apache Hadoop and Spark.” 2017 IEEE Conference on
Dependable and Secure Computing. IEEE, 416–423.
Katsini, C., Abdrabou, Y., Raptis, G. E., Khamis, M., and Alt, F. (2020, April). “The
role of eye gaze in security and privacy applications: survey and future HCI
research directions.” Proceedings of the 2020 CHI Conference on Human
Factors in Computing Systems, 1–21.
Kaufman, L. and Rousseeuw, P. (1987). Clustering by Means of Medoids. North-
Holland.
Kaufman, L. and Rousseeuw, P. J. (1990). Finding Groups in Data: An Introduction to
Cluster Analysis. John Wiley & Sons.
Keim, D. A., Mansmann, F., Panse, C., Schneidewind, J., and Sips, M. (2005).
“Mail explorer – spatial and temporal exploration of electronic mail.”
Proceedings of the Seventh Joint Eurographics/IEEE VGTC Conference on
Visualization, 247–254.
Keim, D. A., Mansmann, F., and Schreck, T. (2005). “Analyzing electronic mail using
temporal, spatial, and content-based visualization techniques.” Informatik 2005–
Informatik Live!, vol. 67, 434–438.
Keogh. E., Lin. J., and Fu. A. (2005). “Hot sax: efficiently finding the most unusual
time series subsequence.” Fifth IEEE International Conference on Data Mining
(ICDM'05), doi: 10.1109/ICDM.2005.79.
Kianmehr, K. and Koochakzadeh, N. (2012). “Learning from socio-economic charac-
teristics of IP geo-locations for cybercrime prediction.” International Journal of
Business Intelligence and Data Mining, 7(1/2), 21–39
Kim, S., Edmonds, W., and Nwanze, N. (2014). “On GPU accelerated tuning for a
payload anomaly-based network intrusion detection scheme.” Proceedings of the
9th Annual Cyber and Information Security Research Conference (CISR ‘14).
ACM, 1–4. doi: 10.1145/2602087.2602093.
Kim Zetter Security. (2013). “Someone’s been siphoning data through a huge security
hole in the internet.” www.wired.com/2013/12/bgp-hijacking-belarus-iceland/.
Last accessed December 2016.
Knorr, E. M., Ng, R. T., and Tucakov, V. (2000). “Distance-based outliers: algorithms
and applications.” VLDB Journal – The International Journal on Very Large Data
Bases, 8(3–4), 237–253.
Koh, Y. S. and Ravana, S. D. (2016). “Unsupervised rare pattern mining: a survey.”
ACM Transactions on Knowledge Discovery from Data (TKDD), 10(4), 45.
Koh, Y. S. and Rountree, N. (2005). “Finding sporadic rules using apriori-inverse.”
PAKDD (Lecture Notes in Computer Science), vol. 3518. T. B. Ho, D. Cheung,
and H. Liu (Eds.). Springer, 97–106.
Koike, H. and Ohno, K. (2004). “SnortView: visualization system of snort logs.”
Proceedings of the 2004 ACM Workshop on Visualization and Data Mining for
Computer Security (VizSEC/DMSEC '04). ACM, 143–147. doi: 10.1145/
1029208.1029232.
Kosner, A. W. (2012). “Cyber security fails as 3.6 million social security numbers
breached in South Carolina.” www.forbes.com/sites/anthonykosner/2012/10/27/
cyber-security-fails-as-3-6-million-social-security-numbers-breached-in-south-car
olina/?sh=5f3637784e9e. Last accessed March 2021.
Kotsiantis, S. and Pintelas, P. (2004). “Recent advances in clustering: a brief
survey.” WSEAS Transactions on Information Science and Applications, 1(1),
73–81.
Kotsiantis, S. B., Zaharakis, I. D., and Pintelas, P. E. (2006). “Machine learning: a
review of classification and combining techniques.” Artificial Intelligence Review,
26(3), 159–190.
Kulldorff, M. (1997). “A spatial scan statistic.” Communications of Statistics – Theory
Meth., 26(6), 1481–1496.
Kulldorff, M., Athas, W., Feuer, E., Miller, B., and Key, C. (1998). “Evaluating cluster
alarms: a space-time scan statistic and brain cancer in Los Alamos.” American
Journal of Public Health, 88(9), 1377–1380.
Kurgan, L. A. and Musilek, P. (2006, March). “A survey of knowledge discovery and
data mining process models.” Knowledge Engineering Review, 21(1) 1–24. doi: 10
.1017/S0269888906000737.
L24. (2016). “First national cyber security exercise Cyber Shield.” http://l24.lt/en/
society/item/150489-first-national-cyber-security-exercise-cyber-shield-2016-will-
be-held. Last accessed April 12, 2017.
LeCun, Y., Bengio, Y., and Hinton, G. (2015). “Deep learning.” Nature, 521(7553),
436.
Lee, G., Yun, U., Ryang, H., and Kim, D. (2015). “Multiple minimum support-based
rare graph pattern mining considering symmetry feature-based growth technique
and the differing importance of graph elements.” Symmetry, 7(3), 1151.
Leskovec, J. (2008). “Dynamics of large networks.” Dissertation. ProQuest
Dissertations Publishing.
Leskovec, J., Chakrabarti, D., Kleinberg, J., and Faloutsos, C. (2005). “Realistic,
mathematically tractable graph generation and evolution, using Kronecker multi-
plication.” European Conference on Principles and Practice of Knowledge
Discovery in Databases: PKDD 2005, A. M. Jorge, L. Torgo, P. Brazdil, R.
Camacho, and J. Gama (Eds.). Lecture Notes in Computer Science, vol. 3721.
Springer. doi: 10.1007/11564126_17.
Luengo, J., García, S., and Herrera, F. (2012). “On the choice of the best imputation
methods for missing values considering three groups of classification methods.”
Knowledge and Information Systems, 32(1), 77–108.
Ma, S. and Hellerstein, J. L. (2001a). “Mining mutually dependent patterns.”
Proceedings of the 2001 International Conference on Data Mining (ICDM’01),
San Jose, CA, November 2001. IEEE, 409–416.
Ma, S. and Hellerstein, J. L. (2001b). “Mining partially periodic event patterns
with unknown periods.” Proceedings of the 2001 International Conference
on Data Engineering (ICDE’01), Heidelberg, Germany, April 2001. IEEE,
205–214.
MacQueen, J. (1967, June). “Some methods for classification and analysis of
multivariate observations.” Proceedings of the Fifth Berkeley Symposium on
Mathematical Statistics and Probability. University of California Press, vol. 1,
no. 14, 281–297.
Mandiant. (2013). “APT1: exposing one of China’s cyber espionage units.” www
.mandiant.com/resources/apt1-exposing-one-of-chinas-cyber-espionage-units.
Last accessed November 2021.
Manyika, J., Chui, M., Brown, B., et al. (2011). Big data: The next frontier for
innovation, competition, and productivity. McKinsey Global Institute.
Maron, M. and Kuhns, J. (1960). “On relevance, probabilistic indexing, and infor-
mation retrieval.” Journal of the Association for Computing Machinery 7,
216–244.
Massicotte, F., Whalen, T., and Bilodeau, C. (2003). “Network mapping tool for real-
time security analysis.” RTO IST Symposium on Real Time Intrusion Detection,
12-1–12-10.
McAfee. (2010). “Protecting your critical assets.” www.wired.com/images_blogs/
threatlevel/2010/03/operationaurora_wp_0310_fnl.pdf. Last accessed November
2021.
McAfee. (2011). “Global energy cyberattacks: “Night Dragon.” www.heartland.org/
publications-resources/publications/global-energy-cyberattacks-night-dragon. Last
accessed November 2021.
McAfee. (2018). “The economic impact of cybercrime – no slowing down.” McAfee,
Center for Strategic and International Studies (CSIS). www.mcafee.com/enter
prise/en-us/solutions/lp/economics-cybercrime.html.
McBride, M., Carter, L., and Warkentin, M. (2012). “Exploring the role of
individual employee characteristics and personality on employee compliance with
cybersecurity policies.” RTI International Institute for Homeland Security
Solutions, 5(1), 1.
McGuire, M. P., Janeja, V.P., and Gangopadhyay, A. (2008, August). “Spatiotemporal
neighborhood discovery for sensor data.” International Workshop on Knowledge
Discovery from Sensor Data. Springer, 203–225.
McGuire, M. P., Janeja, V. P., and Gangopadhyay, A. (2012). “Mining sensor datasets
with spatio-temporal neighborhoods.” Journal of Spatial Information Science
(JOSIS), 2013(6), 1–42.
Meng, X., Bradley, J., Yavuz, B., et al. (2016). “Mllib: machine learning in Apache
Spark.” Journal of Machine Learning Research, 17(1), 1235–1241.
Mezzour, G. (2015). “Assessing the global cyber and biological threat.” Thesis,
Carnegie Mellon University. doi: 10.1184/R1/6714857.v1.
Miller, H. J. (2004). “Tobler’s first law and spatial analysis.” Annals of the Association
of American Geographers, 94(2), 284–289.
Miller, W. B. (2014). “Classifying and cataloging cyber-security incidents within
cyber-physical systems.” Doctoral dissertation, Brigham Young University.
Misal, V., Janeja, V. P., Pallaprolu, S. C., Yesha, Y., and Chintalapati, R. (2016,
December). “Iterative unified clustering in big data.” 2016 IEEE International
Conference on Big Data (Big Data). IEEE, 3412–3421.
Mitra, B., Sural, S., Vaidya, J., and Atluri, V. (2016). “A survey of role mining.” ACM
Computing Surveys (CSUR), 48(4), 1–37.
MITRE ATT&CK. (2020). ATT&CK Matrix for Enterprise. https://attack.mitre.org/.
Last accessed November 2021.
Molina, L. C., Belanche, L., and Nebot, À. (2002). “Feature selection algorithms: a
survey and experimental evaluation.” 2002 IEEE International Conference on
Data Mining, 2002. ICDM 2003. Proceedings. IEEE, 306–313.
Namayanja, J. M. and Janeja, V. P. (2014, October). “Change detection in temporally
evolving computer networks: a big data framework.” 2014 IEEE International
Conference on Big Data. IEEE, 54–61.
Namayanja, J. M. and Janeja, V. P. (2015, May). “Change detection in evolving computer
networks: changes in densification and diameter over time.” 2015 IEEE International
Conference on Intelligence and Security Informatics (ISI). IEEE, 185–187.
Namayanja, J. M. and Janeja, V. P. (2017). “Characterization of evolving networks for
cybersecurity.” Information Fusion for Cyber-Security Analytics, I. Alsmadi, G.
Karabatis, and A. Aleroud (Eds.). Studies in Computational Intelligence, vol. 691.
Springer International Publishing, 111–127. doi: 10.1007/978-3-319-44257-0_5.
Namayanja, J. M. and Janeja, V. P. (2019). “Change detection in large evolving
networks.” International Journal of Data Warehousing and Mining, 15(2), 62–79.
Naseer, S., Saleem, Y., Khalid, S., et al. (2018). “Enhanced network anomaly detection
based on deep neural networks.” IEEE Access, 6, 48231–48246.
Naus, J. (1965). “The distribution of the size of the maximum cluster of points on the
line.” Journal of the American Statistical Association, 60, 532–538.
Neill, D., Moore, A., Pereira, F., and Mitchell, T. (2005). “Detecting significant
multidimensional spatial clusters.” Advances in Neural Information Processing
Systems 17, MIT Press, 969–976.
Netscout. (2020). “A global threat visualization.” www.netscout.com/global-threat-
intelligence. Last accessed November 2020.
Ng, R. T. and Han, J. (1994, September). “Efficient and effective clustering methods for
spatial data mining.” Proceedings of the 20th International Conference on Very
Large Data Bases (VLDB ’94). Morgan Kaufmann, 144–155.
Ng, R. T., Lakshmanan, L. V. S., Han, J., and Pang, A. (1998). “Exploratory mining and
pruning optimizations of constrained associations rules.” Proceedings of the
1998 ACM SIGMOD International Conference on Management of Data
(SIGMOD ’98). ACM, 13–24.
Nguyen, T. T. and Janapa Reddi, V. (2019). “Deep reinforcement learning for cyber
security.” arXiv:1906.05799.
Nicosia, V., Tang, J., Mascolo, C., et al. (2013). “Graph metrics for temporal net-
works.” Temporal Networks: Understanding Complex Systems, P. Holme and J.
Saramäki (Eds.). Springer, 15–40. doi: 10.1007/978-3-642-36461-7_2.
NIST (National Institute of Standards and Technology). (2015). NIST Big Data
Interoperability Framework (NBDIF), V1.0. https://bigdatawg.nist.gov/V1_
output_docs.php. Last accessed April 12, 2017.
NIST. (2017). National vulnerability database. http://nvd.nist.gov/. Last accessed
September 2017.
Nunes, D. S., Zhang, P., and Silva, J. S. (2015). “A survey on human-in-the-loop
applications towards an internet of all.” IEEE Communications Surveys and
Tutorials, 17(2), 944–965.
O’Gorman, G. and McDonald, G. (2012). “The Elderwood Project.” www.infopoint-
security.de/medien/the-elderwood-project.pdf. Last accessed November 2021.
Ohm, M., Sykosch, A., and Meier, M. (2020, August). “Towards detection of software
supply chain attacks by forensic artifacts.” Proceedings of the 15th International
Conference on Availability, Reliability and Security (ARES ’20). ACM, 1–6. doi:
10.1145/3407023.3409183.
Openshaw, S. (1987). “A mark 1 geographical analysis machine for the
automated analysis of point data sets.” International Journal of GIS, 1(4),
335–358.
OSQuery. (2016). https://osquery.io/. Last accessed March 2020.
Otoum, S., Kantarci, B., and Mouftah, H. (2019, May). “Empowering reinforcement
learning on big sensed data for intrusion detection.” 2019 IEEE
International Conference on Communications (ICC). IEEE, 1–7.
Paganini, P. (2014). “Turkish government is hijacking the IP for popular DNS pro-
viders.” http://securityaffairs.co/wordpress/23565/intelligence/turkish-govern
ment-hijacking-dns.html. Last accessed June 2017.
Parekh, J. J., Wang, K., and Stolfo, S. J. (2006). “Privacy-preserving payload-based
correlation for accurate malicious traffic detection.” Proceedings of the
2006 SIGCOMM Workshop on Large-Scale Attack Defense (LSAD ‘06). ACM,
99–106. doi: 10.1145/1162666.1162667.
Patcha, A. and Park, J. M. (2007). “An overview of anomaly detection techniques:
existing solutions and latest technological trends.” Computer Networks, 51(12),
3448–3470.
Paul, R. J. and Taylor, S. J. E. (2002). “Improving the model development process:
what use is model reuse: is there a crook at the end of the rainbow?” Proceedings
of the 34th Conference on Winter Simulation: Exploring New Frontiers (WSC
‘02). Winter Simulation Conference, 648–652.
Pei, J. and Han, J. (2000). “Can we push more constraints into frequent pattern
mining?” Proceedings of the Sixth ACM SIGKDD International Conference on
Knowledge Discovery and Data Mining. ACM Press, 350–354.
Peña, J. M., Lozano, J. A., and Larrañaga, P. (2002). “Learning recursive Bayesian
multinets for data clustering by means of constructive induction.” Machine
Learning, 47(1), 63–89.
Pfeiffer, T., Theuerling, H., and Kauer, M. (2013). “Click me if you can! How do users
decide whether to follow a call to action in an online message?” Human Aspects of
Information Security, Privacy, and Trust. HAS 2013, L. Marinos and I. Askoxylakis (Eds.). Springer.
Shi, L. and Janeja, V. P. (2009, June). “Anomalous window discovery through scan
statistics for linear intersecting paths (SSLIP).” Proceedings of the 15th ACM
SIGKDD International Conference on Knowledge Discovery and Data Mining.
ACM, 767–776.
Shropshire, J., Warkentin, M., Johnston, A., and Schmidt, M. (2006). “Personality and
IT security: an application of the five-factor model.” AMCIS 2006 Proceedings.
Association for Information Systems AIS Electronic Library (AISeL), 415.
Simmon, E., Sowe, S. K., and Zettsu, K. (2015). “Designing a cyber-physical cloud
computing architecture.” IT Professional, (3), 40–45.
Simmons, C., Ellis, C., Shiva, S., Dasgupta, D., and Wu, Q. (2009). AVOIDIT: A cyber
attack taxonomy. 9th Annual Symposium on Information Assurance, 2–12. https://
nsarchive.gwu.edu/sites/default/files/documents/4530310/Chris-Simmons-Charles-
Ellis-Sajjan-Shiva.pdf. Last accessed November 2021.
Skariachan, D. and Finkle, J. (2014). “Target shares recover after reassurance on
data breach impact.” www.reuters.com/article/us-target-results/target-shares-
recover-after-reassurance-on-data-breach-impact-idUSBREA1P0WC20140226. Last
accessed March 2020.
Sklower, K. (1991, Winter). A Tree-Based Packet Routing Table for Berkeley UNIX.
USENIX.
SMR Foundation. (2021). NodeXL. www.smrfoundation.org/nodexl/. Last accessed
November 2021.
Snare. (2020). Snare. www.snaresolutions.com/central-83/. Last accessed March 2020.
Snort. (2020). Snort Rules Infographic. https://snort-org-site.s3.amazonaws.com/pro
duction/document_files/files/000/000/116/original/Snort_rule_infographic.pdf?X-
Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=AKIAIXACIED2
SPMSC7GA%2F20210316%2Fus-east-1%2Fs3%2Faws4_request&X-Amz-Date
=20210316T191343Z&X-Amz-Expires=172800&X-Amz-SignedHeaders=host&
X-Amz-Signature=bcfc7d75d223ab40badd8bd9e89ded29cc98ed3896f0140f5532
0ee9bcdf1383. Last accessed March 2020.
Spitzner, L. (2003). Honeypots: Tracking Hackers, vol. 1. Addison-Wesley.
Srikant, R. and Agrawal, R. (1996). “Mining quantitative association rules in large
relational tables.” Proceedings of the 1996 ACM SIGMOD international
Conference on Management of Data. ACM Press, 1–12.
Statista. (2020). eCommerce report 2020 Statista Digital Market Outlook. www.statista
.com/study/42335/ecommerce-report/. Last accessed November 2021.
Statista. (2021, March). Digital population worldwide. www.statista.com/statistics/
617136/digital-population-worldwide/. Last accessed November 2021.
Statista. “IoT market – forecasts.” www.statista.com/statistics/1101442/iot-number-of-
connected-devices-worldwide/. Last accessed November 2021.
Stauffer, J. and Janeja, V. (2017). “A survey of advanced persistent threats and the characteristics.” Technical report.
Stephens, G. D. and Maloof, M. A. (2014, April 22). “Insider threat detection.” U.S.
Patent No. 8,707,431.
Stouffer, K., Falco, J., and Scarfone, K. (2009). “Guide to industrial control systems
(ICS) security.” Technical report, National Institute of Standards and Technology.
Stubbs, J., Satter, R., and Menn, J. (2020). “U.S. Homeland Security, thousands of
businesses scramble after suspected Russian hack.” www.reuters.com/article/
global-cyber/global-security-teams-assess-impact-of-suspected-russian-cyber-
attack-idUKKBN28O1KN. Last accessed November 2021.
Stutz, J. and Cheeseman, P. (1996). “AutoClass – a Bayesian approach to classifica-
tion.” Maximum Entropy and Bayesian Methods. Springer, 117–126.
Sugiura, O. and Ogden, R. T. (1994). “Testing change-points with linear trend.”
Communications in Statistics – Simulation and Computation, 23(2), 287–322.
doi: 10.1080/03610919408813172.
Sugiyama, M. and Borgwardt, K. (2013). “Rapid distance-based outlier detection via
sampling.” Proceedings of the 26th International Conference on Neural
Information Processing Systems – Volume 1 (NIPS'13). Curran Associates,
467–475.
Tan, Y., Vuran, M. C., and Goddard, S. (2009, June). “Spatio-temporal event model for
cyber-physical systems.” 29th IEEE International Conference on Distributed
Computing Systems Workshops, 2009. ICDCS Workshops’ 09. IEEE, 44–50.
Tang, X., Eftelioglu, E., Oliver, D., and Shekhar, S. (2017). “Significant linear hotspot
discovery.” IEEE Transactions on Big Data, 3(2), 140–153.
Tango, T. and Takahashi, K. (2005). “A flexibly shaped spatial scan statistic for
detecting clusters.” International Journal of Health Geographics, 4(11). doi:
10.1186/1476-072X-4-11.
Tao, F., Murtagh, F., and Farid, M. (2003). “Weighted association rule mining using
weighted support and significance framework.” Proceedings of the Ninth ACM
SIGKDD International Conference on Knowledge Discovery and Data Mining,
KDD’03. ACM Press, 661–666.
Tartakovsky, A. G., Polunchenko, A. S., and Sokolov, G. (2013). “Efficient computer
network anomaly detection by changepoint detection methods.” IEEE Journal of
Selected Topics in Signal Processing, 7(1), 4–11.
Ten, C. W., Hong, J., and Liu, C. C. (2011). “Anomaly detection for cybersecurity of
the substations.” IEEE Transactions on Smart Grid, 2(4), 865–873.
Thakur, V. (2011, December 8). “The Sykipot Attacks.” www.symantec.com/connect/
blogs/sykipot-attacks. Last accessed November 2021.
Tim, O., Firoiu, L., and Cohen, P. (1999). “Clustering time series with hidden markov
models and dynamic time warping.” Presented at IJCAI-99 Workshop on
Sequence Learning.
Tobler, W. R. (1970). “A computer model simulation of urban growth in the Detroit
region.” Economic Geography, 46(2), 234–240.
Townsend, M., Rupp, L., and Green, J. (2014). “Target CEO ouster shows new board
focus on cyber attacks.” www.bloomberg.com/news/2014-05-05/target-ceo-ouster-
shows-new-board-focus-on-cyber-attacks.html. Last accessed November 2021.
Trend Micro Incorporated. (2012a). “Detecting APT activity with network
traffic analysis.” www.trendmicro.com/cloud-content/us/pdfs/security-intelli
gence/white-papers/wp-detecting-apt-activity-with-network-traffic-analysis.pdf.
Last accessed November 2021.
Trend Micro Incorporated. (2012b), “Spear-phishing email: most favored APT attack
bait.”
Trinius, P., Holz, T., Göbel, J., and Freiling, F. C. (2009, October). “Visual analysis of
malware behavior using treemaps and thread graphs.” 6th International Workshop
on Visualization for Cyber Security, 2009. VizSec 2009. IEEE, 33–38.
Tsuchiya, P. F. (1988). “The landmark hierarchy: a new hierarchy for routing in very
large networks.” Symposium Proceedings on Communications Architectures and
Protocols (SIGCOMM '88). ACM, 35–42.
Vaarandi, R. and Podiņš, K. (2010). “Network IDS alert classification with frequent
itemset mining and data clustering.” 2010 International Conference on Network
and Service Management. IEEE, 451–456.
Vaidya, J., Atluri, V., and Guo, Qi. (2007). “The role mining problem: finding a
minimal descriptive set of roles.” Proceedings of the 12th ACM Symposium on
Access Control Models and Technologies. ACM, 175–184.
Van Mieghem, V. (2016). “Detecting malicious behaviour using system calls.”
Master’s thesis, Delft University.
Venkatasubramanian, K., Nabar, S., Gupta, S. K. S., and Poovendran, R. (2011).
“Cyber physical security solutions for pervasive health monitoring systems.”
E-Healthcare Systems and Wireless Communications: Current and Future
Challenges, M. Watfa (Ed.). IGI Global, 143–162.
Verizon Wireless. (2017). “Data breach digest.” https://enterprise.verizon.com/
resources/articles/2017-data-breach-digest-half-year-anniversary/. Last accessed
November 2021.
Verma, J. P. and Patel, A. (2016, March–September). “Comparison of MapReduce and
Spark programming frameworks for big data analytics on HDFS.” International
Journal of Computer Science and Communication, 7(2), 80–84.
Versprille, A. (2015). “Researchers hack into driverless car system, take control of
vehicle.” www.nationaldefensemagazine.org/articles/2015/5/1/2015may-research
ers-hack-into-driverless-car-system-take-control-of-vehicle. Last accessed November 2021.
Villeneuve, N. and Sancho, D. (2011). “The ‘lurid’ downloader.” www.trendmicro.com/
cloud-content/us/pdfs/security-intelligence/white-papers/wp_dissecting-lurid-apt.pdf.
Last accessed November 2021.
Vishwanath, A., Herath, T., Chen, R., Wang, J., and Rao, H. R. (2011). “Why do
people get phished? Testing individual differences in phishing vulnerability within
an integrated, information processing model.” Decision Support Systems, 51(3),
576–586.
Wall, M. E., Rechtsteiner, A., and Rocha, L. M. (2003). “Singular value decomposition
and principal component analysis.” A Practical Approach to Microarray Data
Analysis, D. P. Berrar, W. Dubitzky, and M. Granzow (Eds.). Springer, 91–109.
doi: 10.1007/0-306-47815-3_5.
Wang, H., Wu, B., Yang, S., Wang, B., and Liu, Y. (2014). “Research of decision tree
on YARN using MapReduce and Spark.” World Congress in Computer Science,
Computer Engineering, and Applied Computing. American Council on Science
and Education, 21–24.
Wang, K., He, Y., and Han, J. (2003). “Pushing support constraints into association
rules mining.” IEEE Transactions Knowledge Data Engineering, 15(3),
642–658.
Wang, K. and Stolfo, S. J. (2004). “Anomalous payload-based network intrusion detec-
tion.” Recent Advances in Intrusion Detection. RAID 2004, E. Jonsson, A. Valdes,
and M. Almgren (Eds.). Lecture Notes in Computer Science, vol. 3224. Springer,
89–96.
Wang, K., Yu, H., and Cheung, D. W. (2001). “Mining confident rules without support
requirement.” Proceedings of the Tenth International Conference on Information
and Knowledge Management. ACM Press, 89–96.
Wang, L., Singhal, A., and Jajodia, S. (2007a, July). “Measuring the overall
security of network configurations using attack graphs.” IFIP Annual
Conference on Data and Applications Security and Privacy. Springer, 98–112.
Wang, L., Singhal, A., and Jajodia, S. (2007b, October). “Toward measuring network
security using attack graphs.” Proceedings of the 2007 ACM Workshop on Quality
of Protection. ACM, 49–54.
Wang, P. A. (2011). “Online phishing in the eyes of online shoppers.” IAENG
International Journal of Computer Science, 38(4), 378–383.
Wang, Q. H. and Kim, S. H. (2009). Cyber Attacks: Cross-Country Interdependence
and Enforcement. WEIS.
Wang, W., Yang, J., and Muntz, R. (1997, August). “STING: A statistical information
grid approach to spatial data mining.” VLDB, 97, 186–195.
Ward, J. S. and Barker, A. (2013). “Undefined by data: a survey of big data definitions.”
arXiv:1309.5821.
Washington Post. (2013). “Target data flood stolen-card market.” www.washingtonpost
.com/business/economy/target-cyberattack-by-overseas-hackers-may-have-comprom
ised-up-to-40-million-cards/2013/12/20/2c2943cc-69b5-11e3-a0b9-249bbb34602c_
story.html?utm_term=.42a8cd8b6c0e. Last accessed November 2016.
Websense. (2011). “Advanced persistent threat and advanced attacks: threat analysis
and defense strategies for SMB, mid-size, and enterprise organizations Rev 2.”
Technical report.
Wei, L., Keogh, E., and Xi, X. (2006). “SAXually explicit images: finding unusual
shapes.” ICDM’06: Proceedings of the Sixth International Conference on Data
Mining. IEEE Computer Society, 711–720.
Weimerskirch, A. and Dominic, D. (2018). Assessing Risk: Identifying and Analyzing Cybersecurity Threats to Automated Vehicles. University of Michigan.
Wilson, R. J. (1986). Introduction to Graph Theory. John Wiley & Sons.
Wireshark. (2021). www.wireshark.org. Last accessed November 2021.
Wright, R., Chakraborty, S., Basoglu, A., and Marett, K. (2010). “Where did they go
right? Understanding the deception in phishing communications.” Group Decision
and Negotiation, 19(4), 391–416.
Wright, R. T. and Marett, K. (2010). The influence of experiential and dispositional
factors in phishing: an empirical investigation of the deceived. Journal of
Management Information Systems, 27(1), 27.
Wu, M., Miller, R. C., and Garfinkel, S. L. (2006). “Do security toolbars actually
prevent phishing attacks?” Proceedings of the SIGCHI Conference on Human
Factors in Computing Systems (CHI ‘06). ACM, 601–610.
Wu, X., Kumar, V., Quinlan, J. R., et al. (2008). “Top 10 algorithms in data mining.”
Knowledge and Information Systems, 14(1), 1–37.
Wübbeling, M., Elsner, T., and Meier, M. (2014, June). “Inter-AS routing anomalies:
improved detection and classification.” 6th International Conference on Cyber
Conflict (CyCon 2014), 2014. IEEE, 223–238.
Wybourne, M. N., Austin, M. F., and Palmer, C. C. (2009). National Cyber Security Research and Development Challenges Related to Economics, Physical Infrastructure and Human Behavior.
Zhang, T., Ramakrishnan, R., and Livny, M. (1996, June). “BIRCH: an efficient data
clustering method for very large databases.” ACM Sigmod Record, 25(2),
103–114.
Zhao, Q., and Bhowmick, S. S. (2003). “Sequential pattern mining: a survey.”
Technical report, CAIS, Nanyang Technological University, Singapore, 1–26.
Zhou, B., Cheung, D. W., and Kao, B. (1999, April). “A fast algorithm for density-
based clustering in large database.” Methodologies for Knowledge Discovery and
Data Mining. PAKDD 1999, N. Zhong and L. Zhou (Eds.). Lecture Notes in
Computer Science, Vol. 1574. Springer Berlin Heidelberg. doi: 10.1007/3-540-
48912-6_45.
Zimmermann, A., Lorenz, A., and Oppermann, R. (2007, August). “An operational
definition of context.” Modeling and Using Context. CONTEXT 2007, B.
Kokinov, D. C. Richardson, T. R. Roth-Berghofer, and L. Vieu (Eds.). Lecture
Notes in Computer Science, Vol. 4635. Springer Berlin Heidelberg. doi: 10.1007/
978-3-540-74255-5_42.
Zimmermann, V. and Renaud, K. (2021). “The nudge puzzle: matching nudge inter-
ventions to cybersecurity decisions.” ACM Transactions on Computer–Human
Interaction (TOCHI), 28(1), 1–45.
access control, 12, 16, 24, 28, 49
accommodation, 99
accuracy, 29, 40–41, 50
adjacency matrix, 130, 135
advanced persistent threat, 5, 69, 84–85, 139
anomalous window, 118
anomaly, 18, 27–28, 34, 40, 42, 49, 81, 91–93, 95–101, 104, 107, 110–112, 116–118, 120–123, 129, 147, 158
anomaly detection, 27, 34, 36, 91, 93, 100, 104, 116–117, 140
application security, 6
Apriori algorithm, 57–58
APT, 5, 79, 84–89, 139
assets, 1–3, 14, 16, 24, 79
association rule mining, 18–19, 36, 41, 59
asymmetric coefficient, 39
attack maps, 3, 113–114
attack vector, 85
autocorrelation, 67, 113–115, 117–118, 120, 122
autonomous systems, 95
Bayesian, 24
behavioral, 11
behavioral data, 23
betweenness, 131
BGP anomalies, 95
big data, 9, 20, 60–61, 63, 65–66, 69–70, 72, 74–76
blackhole attack, 79
Border Gateway Protocol, 95
categorical data, 32
central nodes, 70, 72, 132–133
centrality, 21, 131, 133
centroid, 34, 43, 110–111
CLARANS, 44–45
classification, 29–30, 40–41, 49–51, 55, 75, 90, 101, 152, 157
client server, 14
closeness, 131
cloud, 2–3, 6, 15–16, 76
clustering, 11, 25, 34, 37, 40, 42–45, 48–49, 53, 75, 92, 101, 105, 108–110, 117–118, 155
common vulnerabilities and exposures, 25
common vulnerability scoring system, 25. See CVSS
communication, 21
computational data model, 81–82, 84
computer networks, 3, 7–8, 11, 13, 16, 18, 27, 67, 124, 148, 154
conditional probability, 42, 55, 57
confidence, 30, 41, 57
consistent, 22, 28, 71–72, 89, 134
consistent nodes, 71–72
context, 93
contextual anomalies, 91, 94
contingency matrix, 38
CPS, 8, 67–68, 147, 149
critical infrastructure, 3, 80
curse of dimensionality, 96–97, 109
cyberattacks, 5, 24–26, 62, 68–70, 138–139, 144, 156
cyber espionage, 5
cyberphysical, 7, 16, 147
cyberphysical systems, 8, 16, 67, 147
cybersecurity, 1–2, 6, 14–15, 27, 60–61, 112, 123, 128–129, 133–134, 147
cybersecurity data, 28
cyber threats, 1
data analytics, 7–9, 12, 60, 70, 137, 147
data cleaning, 33
data mining, 8, 29, 36
data preprocessing, 31–32
data reduction, 35
data storage, 3, 6, 76
data transformation, 34
decision tree, 50–52, 54
deep learning, 157–158, 161
degrees, 22, 71, 132
denial of service, 4, 79, 125
densification, 133
density based, 42, 46–48
detection, 1, 13, 16–18, 20, 23, 25, 27, 34, 42, 46, 59, 65, 67, 70–71, 73–74, 81, 84, 88, 91, 93–97, 99–101, 103–105, 108–109, 111–112, 116–118, 120–123, 129, 139, 141, 147, 154, 158
diameter, 21, 69, 71, 127–128, 134
discord, 91, 121
discordancy, 34, 102–104, 108–109
dissimilarity, 37, 119
distance, 34, 37–38, 42–43, 46–49, 93, 101, 105, 107–109, 111, 113, 116, 122, 124, 131, 134
distance based, 105, 108
distance based outlier detection, 34
distributed control systems, 147
distributed denial of service (DDoS), 4, 125
distributed intrusion detection system, 73
domain name system, 14
e-commerce, 16
edges, 21, 45, 72, 122, 124, 128, 130–131, 133, 148, 151
eigenvector, 131
electronic assets, 1–2
ensemble, 50, 53–54, 75, 162
entropy, 52–53
equal frequency binning, 34
Euclidean distance, 37, 109, 118
evaluation, 27, 29–30, 37, 40, 72, 111, 139, 142, 149
exception, 91
explainability, 159
exploits, 86
eye tracker, 24
false negative, 40
false negative rate, 41
false positive, 40
firewall, 14, 19
first law of geography, 113
F-measure, 40–41
FP tree, 58
frequent itemset, 58
frequent pattern, 55–56, 89
Geary’s, 113
generative adversarial network, 160–161
geocontextual features, 152
geographic, 11
geolocations, 19, 27, 94, 115, 126, 141
geological, 3
georeferenced, 68, 113
geospatial data, 152–153
graph data, 21
graphs, 21, 69, 129, 133–134
ground truth, 40
hacktivism, 5
Hausdorff distance, 38
header, 20
heartbleed, 5
heterogeneous, 9, 30–31, 37, 60, 67, 73, 120, 151
hierarchical, 42
high dimensional, 36, 47, 67, 96, 109
honeypots, 100
human behavior, 18
IDS, 17
IDS logs, 11
imbalanced learning, 53
inconsistent nodes, 71
industrial control systems, 3, 147–148
insider threat, 2, 24, 138, 140–142
intellectual property, 4
intercluster, 42
interface design, 143
Internet of Things. See IoT
interquartile range, 34, 104
intracluster, 42
intrusion detection systems. See IDS
IoT, 8, 149–150
IP reputation scoring, 152
IQR, 34, 104
Jaccard coefficient, 39, 135
joint count, 113
K nearest neighbor. See K-NN
key logging, 17
K-means, 43–44, 66
K-NN, 47, 109
lift, 30, 41, 84
linear semantic-based scan, 118
links, 16, 47, 69, 124, 128, 137, 139, 142, 148–149
logical errors, 5
macroneighborhoods, 120
malfunctions, 8, 150
malicious intent, 2, 81, 95, 142
malware, 19–20, 24, 28, 78, 81, 86, 124, 152
malware detection, 24
Manhattan distance, 37
Markov model, 122
Markov stationary distribution, 123
masking, 104
measures of similarity, 37
medoids, 43–45
microneighborhood, 119
MILES, 53–54
Minkowski distance, 37
missing data, 33
missing data values, 33
misuse, 6, 11, 27, 142
mobile ad hoc, 79
mobile devices, 3, 61
Moran’s, 113
multilevel mining, 73
multipronged attack, 9
multivariate, 34
naive Bayes, 55
National Vulnerability Database, 25
neighborhood, 46, 119–120, 124
network security, 7
network topology, 23, 134
network traffic, 21
nodes, 22–23, 28, 69–71, 76, 122–123, 125, 127–128, 131–133, 135, 148, 159
NodeXL, 21
noisy data, 34
normal distribution, 101–102
normalization, 35
numeric data attributes, 32
NVD, 25
OPTICS, 48–49, 105, 118
ordinal, 32
outlier detection, 34, 67, 101, 105, 108–109, 117
outliers, 34, 37, 46, 67, 91, 96–97, 99–105, 107–109, 111, 117, 121
page rank, 21
parametric, 101–102
partitioning, 42–44
password attack, 79
password theft, 81–83
payload, 19
peculiarity, 91
personal assets, 2
phishing, 5, 10, 25, 78, 80, 84–85, 124, 126, 138–139, 144, 154
physical security, 2, 11, 27, 68, 142
power grid, 2, 124
precision, 40–41, 151
predict, 14
prevention, 1, 13, 17, 129
probabilistic label, 55
probe, 81, 145
psychological profiling, 140
psychology, 15
public assets, 2
rare patterns, 56, 58–59
recall, 40
recover, 1
rejection, 100
replay attack, 79
respond, 1, 6, 137, 141
router, 14
scan statistic, 118
security indicators, 25
security perimeter, 9
security policies, 12, 19, 25, 138, 142, 152
semantic relationships, 120
semisupervised, 91
sensor networks, 3, 8, 16, 67, 124, 154
sequential, 103
silhouette coefficient, 42, 45
similarity, 32, 34, 38–40, 43, 104, 119–120, 122, 129, 135, 156, 161
simple matching coefficient, 39
social engineering, 5, 78, 80
social media, 16
software vulnerability, 25
spam, 126, 152
spatial, 11, 27, 32, 46, 67–68, 94, 108–109, 112–114, 116–118, 120–122, 124–125, 151