Network Anomaly Detection: Methods, Systems and Tools
Abstract—Network anomaly detection is an important and dynamic research area. Many network intrusion detection methods and systems (NIDS) have been proposed in the literature. In this paper, we provide a structured and comprehensive overview of various facets of network anomaly detection so that a researcher can become quickly familiar with every aspect of network anomaly detection. We present attacks normally encountered by network intrusion detection systems. We categorize existing network anomaly detection methods and systems based on the underlying computational techniques used. Within this framework, we briefly describe and compare a large number of network anomaly detection methods and systems. In addition, we also discuss tools that can be used by network defenders and datasets that researchers in network anomaly detection can use. We also highlight research directions in network anomaly detection.
Index Terms—Anomaly detection, NIDS, attack, dataset, intrusion detection, classifier, tools
IEEE COMMUNICATIONS SURVEYS & TUTORIALS, VOL. 16, NO. 1, FIRST QUARTER 2014
Manuscript received March 7, 2012; revised August 28, 2012 and February 27, 2013.
M. H. Bhuyan is with the Department of Computer Science and Engineering, Tezpur University, Napaam, Tezpur-784028, Assam, India (e-mail: [email protected]).
D. K. Bhattacharyya is with the Dept. of Computer Science and Engineering, Tezpur University, Napaam, Tezpur-784028, Assam, India (e-mail: [email protected]).
J. K. Kalita is with the Department of Computer Science, University of Colorado, Colorado Springs, CO 80918, USA (e-mail: [email protected]).
Digital Object Identifier 10.1109/SURV.2013.052213.00046
1553-877X/14/$31.00 © 2014 IEEE
I. INTRODUCTION
Due to advancements in Internet technologies and the concomitant rise in the number of network attacks, network intrusion detection has become a significant research issue. In spite of remarkable progress and a large body of work, there are still many opportunities to advance the state-of-the-art in detecting and thwarting network-based attacks [1].
According to Anderson [2], an intrusion attempt or a threat is a deliberate and unauthorized attempt to (i) access information, (ii) manipulate information, or (iii) render a system unreliable or unusable. For example, (a) a Denial of Service (DoS) attack attempts to starve a host of its resources, which are needed to function correctly during processing; (b) worms and viruses exploit other hosts through the network; and (c) compromises obtain privileged access to a host by taking advantage of known vulnerabilities.
The term anomaly-based intrusion detection in networks refers to the problem of finding exceptional patterns in network traffic that do not conform to the expected normal behavior. These nonconforming patterns are often referred to as anomalies, outliers, exceptions, aberrations, surprises, peculiarities or discordant observations in various application domains [3], [4]. Out of these, anomalies and outliers are two of the most commonly used terms in the context of anomaly-based intrusion detection in networks.
Anomaly detection has extensive applications in areas such as fraud detection for credit cards, intrusion detection for cyber security, and military surveillance for enemy activities. For example, an anomalous traffic pattern in a computer network may mean that a hacked computer is sending out sensitive data to an unauthorized host.
The statistics community has been studying the problem of detection of anomalies or outliers from as early as the 19th century [5]. In recent decades, machine learning has started to play a significant role in anomaly detection. A good number of anomaly-based intrusion detection techniques in networks have been developed by researchers. Many techniques work in specific domains, although others are more generic.
Even though there are several surveys available in the literature on network anomaly detection [3], [6], [7], surveys such as [6], [7] discuss far fewer detection methods than we do. In [3], the authors discuss anomaly detection in general and cover the network intrusion detection domain only briefly. None of the surveys [3], [6], [7] include common tools used during execution of various steps in network anomaly detection. They also do not discuss approaches that combine several individual methods to achieve better performance. In this paper, we present a structured and comprehensive survey on anomaly-based network intrusion detection in terms of general overview, techniques, systems, tools and datasets with a discussion of challenges and recommendations. Our presentation is detailed with ample comparisons where necessary and is intended for readers who wish to begin research in this field.
A. Prior Surveys on Network Anomaly Detection
Network anomaly detection is a broad research area, which already boasts a number of surveys, review articles, as well as books. An extensive survey of anomaly detection techniques developed in machine learning and statistics has been provided in [8], [9]. Agyemang et al. [10] present a broad review of anomaly detection techniques for numeric as well as symbolic data. An extensive overview of neural networks and statistics-based novelty detection techniques is found in [11]. Patcha and Park [6] and Snyder [12] present surveys of anomaly detection techniques used specifically for cyber intrusion detection. A good amount of research on outlier detection in statistics is found in several books [13]–[15] as well as survey articles [16]–[18]. Exhaustive surveys of anomaly detection in several
domains have been presented in [3], [7]. Callado et al. [19] report major techniques and problems identified in IP traffic analysis, with an emphasis on application detection. Zhang et al. [20] present a survey on anomaly detection methods in networks. A review of flow-based intrusion detection is presented by Sperotto et al. [21], who explain the concepts of flow and classified attacks, and provide a detailed discussion of detection techniques for scans, worms, Botnets and DoS attacks.
Some work [22]–[25] has been reported in the context of wireless networks. Sun et al. [23] present a survey of intrusion detection techniques for mobile ad-hoc networks (MANET) and wireless sensor networks (WSN). They also present several important research issues and challenges in the context of building IDSs by integrating aspects of mobility. Sun et al. [22] discuss two domain independent online anomaly detection schemes (Lempel-Ziv based and Markov-based) using the location history obtained from traversal of a mobile user. Sun et al. [25] also introduce two distinct approaches to build IDSs for MANET, viz., Markov-chain based and Hotelling's T² test-based. They also propose an adaptive scheme for dynamic selection of normal profiles and corresponding thresholds. Sun et al. [24] construct a feature vector based on several parameters such as call duration, call inactivity period, and call destination to identify users' calling activities. They use classification techniques to detect anomalies.
An extensive survey of DoS and distributed DoS attack detection techniques is presented in [26]. Discussion of network coordinate systems, design and security is found in [27], [28]. Wu and Banzhaf [29] present an overview of applications of computational intelligence methods to the problem of intrusion detection. They include various methods such as artificial neural networks, fuzzy systems, evolutionary computation, artificial immune systems, swarm intelligence, and soft computing.
Dong et al. [30] introduce an Application Layer IDS based on sequence learning to detect anomalies. The authors demonstrate that their IDS is more effective compared to approaches using Markov models and k-means algorithms. A general comparison of various survey papers available in the literature with our work is shown in Table I. The survey contemplated in this paper covers most well-cited approaches and systems reported in the literature so far.
Our survey differs from the existing surveys in the following ways.
• Like [35], we discuss sources, causes and aspects of network anomalies, and also include a detailed discussion of sources of packet and flow level feature datasets. In addition, we include a large collection of up-to-date anomaly detection methods under the categories of statistical, classification-based, knowledge-based, soft computing, clustering-based and combination learners, rather than restricting ourselves to only statistical approaches. We also include several important research issues, open challenges and some recommendations.
• Like [36], we attempt to provide a classification of various anomaly detection methods, systems and tools introduced till date in addition to a classification of attacks and their characteristics. In addition, we perform detailed comparisons among these methods. Furthermore, like [36], we provide practical recommendations and a list of research issues and open challenges.
• Unlike [9], [19], our survey is not restricted to only IP traffic classification and analysis. It includes a large number of up-to-date methods, systems and tools and analysis. Like [19], we also include a detailed discussion on flow and packet level capturing and preprocessing. However, unlike [9], [19], we include ideas for developing better IDSs, in addition to providing a list of practical research issues and open challenges.
• Unlike [37], our survey is not restricted to those solutions introduced for a particular network technology, like CRN (Cognitive Radio Network). Also unlike [37], we include a discussion of a wide variety of attacks, instead of only CRN specific attacks.
• Unlike [27], our survey is focused on network anomalies, their sources and characteristics; and detection approaches, methods and systems, and comparisons among them. Like [27], we include performance metrics, in addition to a discussion of the datasets used for evaluation of any IDS.
B. The Problem of Anomaly Detection
To provide an appropriate solution in network anomaly detection, we need the concept of normality. The idea of normal is usually introduced by a formal model that expresses relations among the fundamental variables involved in system dynamics. Consequently, an event or an object is detected as anomalous if its degree of deviation with respect to the profile or behavior of the system, specified by the normality model, is high enough.
For example, let us take an anomaly detection system S that uses a supervised approach. It can be thought of as a pair S = (M, D), where M is the model of normal behavior of the system and D is a proximity measure that allows one to compute, given an activity record, the degree of deviation that such activities have with regard to the model M. Thus, each system has mainly two modules: (i) a modeling module and (ii) a detection module. One trains the systems to get the normality model M. The obtained model is subsequently used by the detection module to evaluate new events or objects or traffic as anomalous or outliers. It is the measurement of deviation that allows classification of events or objects as anomalous or outliers. In particular, the modeling module needs to be adaptive to cope with dynamic scenarios.
C. Our Contributions
This paper provides a structured and broad overview of the extensive research on network anomaly detection methods and NIDSs. The major contributions of this survey are the following.
(a) Like the categorization of the network anomaly detection research suggested in ([8], [10]), we classify detection methods and NIDSs into a number of categories. In addition, we also provide an analysis of many methods in terms of their capability and performance, datasets
BHUYAN et al.: NETWORK ANOMALY DETECTION: METHODS, SYSTEMS AND TOOLS 305
TABLE I
A COMPARISON OF OUR SURVEY WITH EXISTING SURVEY ARTICLES
Methods /NIDSs Topics covered [8] [10] [11] [6] [16] [17] [3] [7] [21] [26] [29] [31] [32] [33] [34] Our
/Tools √ √ √ √ √ √ √ √ √ √ √ √ survey
√
Statistical √ √ √ √ √ √ √ √ √ √ √
Classification-based √ √ √ √ √ √ √
Knowledge-based √ √ √
Soft computing √ √ √ √ √ √ √ √
Clustering-based √
Ensemble-based
Methods √
Fusion-based √
Hybrid √ √ √
Statistical √
Classification-based √ √
Soft computing √ √ √
NIDSs Knowledge-based √ √ √ √
Data Mining √
Ensemble-based √
Hybrid √
Tools Capturing,
Preprocessing,
Attack launching
used, matching mechanisms, number of parameters, and detection mechanisms.
(b) Most existing surveys do not cover ensemble approaches or data fusion for network anomaly detection, but we do.
(c) Most existing surveys avoid feature selection methods, which are crucial in the network anomaly detection task. We present several techniques to determine feature relevance in intrusion datasets and compare them.
(d) In addition to discussing detection methods, we present several NIDSs with architecture diagrams with components and functions, and also present a comparison among the NIDSs.
(e) We summarize tools used in various steps for network traffic anomaly detection.
(f) We also provide a description of the datasets used for evaluation.
(g) We discuss performance criteria used for evaluating methods and systems for network anomaly detection.
(h) We also provide recommendations or a wish list to the developers of ideal network anomaly detection methods and systems.
(i) Finally, we highlight several important research issues and challenges from both theoretical and practical viewpoints.
D. Organization
In this paper, we provide a comprehensive and exhaustive survey of anomaly-based network intrusion detection: fundamentals, detection methods, systems, tools and research issues as well as challenges. Section II discusses the basics of intrusion detection in networks while Section III presents network anomaly detection and its various aspects. Section IV discusses and compares various methods and systems for network anomaly detection. Section V reports criteria for performance evaluation of network anomaly detection methods and systems. Section VI presents recommendations to developers of network anomaly detection methods and systems. Section VII is devoted to research issues and challenges faced by anomaly-based network intrusion detection researchers. Opportunities for future research and concluding remarks are presented in Section VIII.
II. INTRUSION DETECTION
Intrusion is a set of actions aimed at compromising the security of computer and network components in terms of confidentiality, integrity and availability [38]. This can be done by an inside or outside agent to gain unauthorized entry and control of the security mechanism. To protect infrastructure of network systems, intrusion detection systems (IDSs) provide well-established mechanisms, which gather and analyze information from various areas within a host or a network to identify possible security breaches.
Intrusion detection functions include (i) monitoring and analyzing user, system, and network activities, (ii) configuring systems for generation of reports of possible vulnerabilities, (iii) assessing system and file integrity, (iv) recognizing patterns of typical attacks, (v) analyzing abnormal activity, and (vi) tracking user policy violations. An IDS uses vulnerability assessment to assess the security of a host or a network. Intrusion detection works on the assumption that intrusion activities are noticeably different from normal system activities and thus detectable.
A. Different Classes of Attacks
Anderson [2] classifies intruders into two types: external and internal. External intruders are unauthorized users of the machines they attack, whereas internal intruders have permission to access the system, but do not have privileges for the root or superuser mode. A masquerade internal intruder logs in as other users with legitimate access to sensitive data whereas a clandestine internal intruder, the most dangerous, has the power to turn off audit control for themselves.
There are various classes of intrusions or attacks [39], [40] in computer systems. A summary is reported in Table II.
TABLE II
CLASSES OF COMPUTER ATTACKS: CHARACTERISTICS AND EXAMPLE
B. Classification of Intrusion Detection and Intrusion Detection Systems
Network intrusion detection has been studied for almost 20 years. Generally, an intruder's behavior is noticeably different from that of a legitimate user and hence can be detected [41]. IDSs can also be classified based on their deployment in real time.
1) Host-based IDS (HIDS): A HIDS monitors and analyzes the internals of a computing system rather than its external interfaces [42]. A HIDS might detect internal activity such as which program accesses what resources and attempts illegitimate access. An example is a word processor that suddenly and inexplicably starts modifying the system password database. Similarly, a HIDS might look at the state of a system and its stored information whether it is in RAM or in the file system or in log files or elsewhere. One can think of a HIDS as an agent that monitors whether anything or anyone internal or external has circumvented the security policy that the operating system tries to enforce.
2) Network-based IDS (NIDS): A NIDS deals with detecting intrusions in network data. Intrusions typically occur as anomalous patterns though certain techniques model the data in a sequential fashion and detect anomalous subsequences [42]. The primary reason for these anomalies is attacks launched by outside attackers who want to gain unauthorized access to the network to steal information or to disrupt the network.
In a typical setting, a network is connected to the rest of the world through the Internet. The NIDS reads all incoming packets or flows, trying to find suspicious patterns. For example, if a large number of TCP connection requests to a very large number of different ports are observed within a short time, one could assume that someone is committing a 'port scan' at some of the computer(s) in the network. Various kinds of port scans, and tools to launch them are discussed in detail in [43]. Port scans mostly try to detect incoming shell codes in the same manner that an ordinary intrusion detection system does. Apart from inspecting the incoming traffic, a NIDS also provides valuable information about intrusion from outgoing or local traffic. Some attacks might even be staged from the inside of a monitored network or network segment, and therefore, not regarded as incoming traffic at all. The data available for intrusion detection systems can be at different levels of granularity, e.g., packet level traces and IPFIX records. The data is high dimensional, typically, with a mix of categorical as well as continuous attributes.
Misuse-based intrusion detection normally searches for known intrusive patterns but anomaly-based intrusion detection tries to identify unusual patterns. Intrusion detection techniques can be classified into three types based on the detection mechanism [1], [3], [44]. This includes (i) misuse-based, (ii) anomaly-based, and (iii) hybrid, as described in Table III. Today, researchers mostly concentrate on anomaly-based network intrusion detection because it can detect known as well as unknown attacks.
There are several reasons that make intrusion detection a necessary part of the entire defense system. First, many traditional systems and applications were developed without security in mind. Such systems and applications were targeted to work in an environment, where security was never a major issue. However, the same systems and applications when deployed in the current network scenario, become major security headaches. For example, a system may be perfectly secure when it is isolated but becomes vulnerable when it is
TABLE III
CHARACTERISTICS AND TYPES OF INTRUSION DETECTION TECHNIQUES
Technique: Characteristics
Misuse-based: (i) Detection is based on a set of rules or signatures for known attacks. (ii) Can detect all known attack patterns based on the reference data. (iii) How to write a signature that encompasses all possible variations of the pertinent attack is a challenging task.
Anomaly-based: (i) Principal assumption: All intrusive activities are necessarily anomalous. (ii) Such a method builds a normal activity profile and checks whether the system state varies from the established profile by a statistically significant amount to report intrusion attempts. (iii) Anomalous activities that are not intrusive may be flagged as intrusive. These are false positives. (iv) One should select threshold levels so that neither of the above two problems is unreasonably magnified nor the selection of features to monitor is optimized. (v) Computationally expensive because of overhead and possibly updating several system profile matrices.
Hybrid: (i) Exploits benefits of both misuse and anomaly-based detection techniques. (ii) Attempts to detect known as well as unknown attacks.
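As a rough illustration of the anomaly-based row of Table III, the sketch below builds a normal activity profile from attack-free observations and flags values that deviate from it by more than a threshold. The monitored feature (connection requests per second), the z-score used as the deviation measure, the threshold of 3 and all function names are assumptions made for this example; they are not taken from any of the surveyed systems.

```python
from statistics import mean, stdev


def fit_profile(normal_values):
    """Modeling step: summarize attack-free observations as (mean, std)."""
    return mean(normal_values), stdev(normal_values)


def deviation(profile, value):
    """Detection step: distance of one observation from the profile,
    measured in standard deviations (a z-score)."""
    mu, sigma = profile
    return abs(value - mu) / sigma


def is_anomalous(profile, value, threshold=3.0):
    """Flag the observation when its deviation exceeds the threshold."""
    return deviation(profile, value) > threshold


# Hypothetical training data: connection requests per second on a quiet link.
normal = [10, 12, 11, 9, 10, 13, 11, 10, 12, 11]
profile = fit_profile(normal)

print(is_anomalous(profile, 12))  # close to the profile
print(is_anomalous(profile, 60))  # far from the profile
print(is_anomalous(profile, 16))  # a modest burst, still flagged
```

With this toy profile, a value of 16 is flagged even though it could be a harmless burst, which is the false-positive problem noted in item (iii) of the anomaly-based row. Raising the threshold reduces such alarms at the cost of missing weaker attacks.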
TABLE IV
ANOMALY: TYPES, CHARACTERISTICS AND EXAMPLES
information as a post-processing activity to support reference or profile updation with the help of security manager.
6) Post-processing: This is an important module in a NIDS for post-processing of the generated alarms for diagnosis of actual attacks.
7) Capturing traffic: Traffic capturing is an important module in a NIDS. The raw traffic data is captured at both packet and flow levels. Packet level traffic can be captured using a common tool, e.g., Wireshark¹, and then preprocessed before sending to the detection engine. Flow level data in high speed networks is comprised of information summarized from one or more packets. Some common tools to capture flow level network traffic include Nfdump², NfSen³, and Cisco Netflow V.9⁴.
¹ http://www.wireshark.org/
² http://nfdump.sourceforge.net/
³ http://nfsen.sourceforge.net/
⁴ http://www.cisco.com
8) Security manager: Stored intrusion signatures are updated by the Security Manager (SM) as and when new intrusions become known. The analysis of novel intrusions is a highly complex task.
B. Aspects of Network Anomaly Detection
In this section, we present some important aspects of anomaly-based network intrusion detection. The network intrusion detection problem is a classification or clustering problem formulated with the following components [3]: (i) types of input data, (ii) appropriateness of proximity measures, (iii) labelling of data, (iv) classification of methods based on the use of labelled data, (v) relevant feature identification and (vi) reporting anomalies. We discuss each of these topics in brief.
1) Types of input data: A key aspect of any anomaly-based network intrusion detection technique is the nature of the input data used for analysis. Input is generally a collection of data instances (also referred to as objects, records, points, vectors, patterns, events, cases, samples, observations, entities) [46]. Each data instance can be described using a set of attributes of binary, categorical or numeric type. Each data instance may consist of only one attribute (univariate) or multiple attributes (multivariate). In the case of multivariate data instances, all attributes may be of the same type or may be a mixture of data types. The nature of attributes determines the applicability of anomaly detection techniques.
2) Appropriateness of proximity measures: Proximity (similarity or dissimilarity) measures are necessary to solve many pattern recognition problems in classification and clustering. Distance is a quantitative degree of how far apart two objects are. Distance measures that satisfy metric properties [46] are simply called metric while other non-metric distance measures are occasionally called divergence. The choice of a proximity measure depends on the measurement type or representation of objects.
Generally, proximity measures are functions that take arguments as object pairs and return numerical values that become higher as the objects become more alike. A proximity measure is usually defined as follows.
Definition 3.1: A proximity measure $S$ is a function $X \times X \rightarrow \mathbb{R}$ that has the following properties [47].
– Positivity: $\forall x, y \in X$, $S(x, y) \geq 0$
– Symmetry: $\forall x, y \in X$, $S(x, y) = S(y, x)$
– Maximality: $\forall x, y \in X$, $S(x, x) \geq S(x, y)$
where $X$ is the data space (also called the universe) and $x$, $y$ are the pair of $k$-dimensional objects.
The most common proximity measures for numeric [48]–[50], categorical [51] and mixed type [52] data are listed in Table V. For numeric data, it is assumed that the data is represented as real vectors. The attributes take their values from a continuous domain. In Table V, we assume that there are two objects, $x = x_1, x_2, x_3 \cdots x_d$ and $y = y_1, y_2, y_3 \cdots y_d$, and that $\Sigma^{-1}$ is the inverse of the data covariance matrix, with $d$ the number of attributes, i.e., dimensions.
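The three properties of Definition 3.1 can be spot-checked numerically for a candidate measure. The sketch below does this for cosine similarity over a small set of hypothetical feature vectors; the helper name `satisfies_definition`, the sample points and the tolerance are assumptions for illustration. A check over a finite sample is only evidence, not a proof that the properties hold on the whole data space.

```python
import math
from itertools import product


def cosine_similarity(x, y):
    """Cosine of the angle between two non-zero real vectors."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    return dot / (norm_x * norm_y)


def satisfies_definition(S, points, tol=1e-12):
    """Check positivity, symmetry and maximality of S on every pair of points."""
    for x, y in product(points, repeat=2):
        if S(x, y) < -tol:                # positivity
            return False
        if abs(S(x, y) - S(y, x)) > tol:  # symmetry
            return False
        if S(x, x) < S(x, y) - tol:       # maximality
            return False
    return True


# Hypothetical non-negative feature vectors (e.g., counts or rates).
points = [(1.0, 0.0, 2.0), (0.5, 3.0, 1.0), (2.0, 2.0, 2.0), (4.0, 0.1, 0.0)]
print(satisfies_definition(cosine_similarity, points))

# Adding a vector with negative components breaks positivity for cosine.
mixed = points + [(-1.0, 0.0, -2.0)]
print(satisfies_definition(cosine_similarity, mixed))
```

On the non-negative sample all three properties hold, while the vector with negative components yields a negative cosine and so violates positivity. This suggests that whether a measure fits the definition can depend on the data space it is applied to.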
TABLE V
PROXIMITY MEASURES FOR NUMERIC, CATEGORICAL AND MIXED TYPE DATA
Numeric [48]
Name: Measure, $S_i(x_i, y_i)$
Euclidean: $\sqrt{\sum_{i=1}^{d} |x_i - y_i|^2}$
Weighted Euclidean: $\sqrt{\sum_{i=1}^{d} \alpha_i |x_i - y_i|^2}$
Squared Euclidean: $\sum_{i=1}^{d} |x_i - y_i|^2$
Squared-chord: $\sum_{i=1}^{d} (\sqrt{x_i} - \sqrt{y_i})^2$
Squared $\chi^2$: $\sum_{i=1}^{d} \frac{(x_i - y_i)^2}{x_i + y_i}$
City block: $\sum_{i=1}^{d} |x_i - y_i|$
Minkowski: $\sqrt[p]{\sum_{i=1}^{d} |x_i - y_i|^p}$
Chebyshev: $\max_i |x_i - y_i|$
Canberra: $\sum_{i=1}^{d} \frac{|x_i - y_i|}{x_i + y_i}$
Cosine: $\frac{\sum_{i=1}^{d} x_i y_i}{\sqrt{\sum_{i=1}^{d} x_i^2} \sqrt{\sum_{i=1}^{d} y_i^2}}$
Jaccard: $\frac{\sum_{i=1}^{d} x_i y_i}{\sum_{i=1}^{d} x_i^2 + \sum_{i=1}^{d} y_i^2 - \sum_{i=1}^{d} x_i y_i}$
Bhattacharyya: $-\ln \sum_{i=1}^{d} \sqrt{x_i y_i}$
Pearson: $\sum_{i=1}^{d} (x_i - y_i)^2$
Divergence: $2 \sum_{i=1}^{d} \frac{(x_i - y_i)^2}{(x_i + y_i)^2}$
Mahalanobis: $\sqrt{(x - y)^t \Sigma^{-1} (x - y)}$
Categorical [51]
Name: weight $w_k$, $k = 1 \ldots d$; Measure, $S_k(x_k, y_k)$
Overlap: $w_k = \frac{1}{d}$; $S_k(x_k, y_k) = \begin{cases} 1 & \text{if } x_k = y_k \\ 0 & \text{otherwise} \end{cases}$
Eskin: $w_k = \frac{1}{d}$; $S_k(x_k, y_k) = \begin{cases} 1 & \text{if } x_k = y_k \\ \frac{n_k^2}{n_k^2 + 2} & \text{otherwise} \end{cases}$
IOF: $w_k = \frac{1}{d}$; $S_k(x_k, y_k) = \begin{cases} 1 & \text{if } x_k = y_k \\ \frac{1}{1 + \log f_k(x_k) \times \log f_k(y_k)} & \text{otherwise} \end{cases}$
OF: $w_k = \frac{1}{d}$; $S_k(x_k, y_k) = \begin{cases} 1 & \text{if } x_k = y_k \\ \frac{1}{1 + \log \frac{N}{f_k(x_k)} \times \log \frac{N}{f_k(y_k)}} & \text{otherwise} \end{cases}$
Mixed [52]
Name: Measure
General Similarity Coefficient: $s_{gsc}(x, y) = \frac{1}{\sum_{k=1}^{d} w(x_k, y_k)} \sum_{k=1}^{d} w(x_k, y_k)\, s(x_k, y_k)$
• For numeric attributes, $s(x_k, y_k) = 1 - \frac{|x_k - y_k|}{R_k}$, where $R_k$ is the range of the $k$th attribute; $w(x_k, y_k) = 0$ if $x$ or $y$ has a missing value for the $k$th attribute; otherwise $w(x_k, y_k) = 1$.
• For categorical attributes, $s(x_k, y_k) = 1$ if $x_k = y_k$; otherwise $s(x_k, y_k) = 0$; $w(x_k, y_k) = 0$ if data point $x$ or $y$ has a missing value at the $k$th attribute; otherwise $w(x_k, y_k) = 1$.
General Distance Coefficient: $d_{gdc}(x, y) = \left( \frac{1}{\sum_{k=1}^{d} w(x_k, y_k)} \sum_{k=1}^{d} w(x_k, y_k)\, d^2(x_k, y_k) \right)^{1/2}$, where $d^2(x_k, y_k)$ is the squared distance for the $k$th attribute; $w(x_k, y_k)$ is the same as in the General Similarity Coefficient.
• For numeric attributes, $d(x_k, y_k) = \frac{|x_k - y_k|}{R_k}$, where $R_k$ is the range of the $k$th attribute.
• For categorical attributes, $d(x_k, y_k) = 0$ if $x_k = y_k$; otherwise $d(x_k, y_k) = 1$.
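To make a few rows of Table V concrete, the following sketch implements some of the numeric measures, the categorical overlap measure and the General Similarity Coefficient for mixed records. The function names, the sample records and the convention of using `None` for a missing value are assumptions for this example. Canberra is written exactly as in the table and therefore assumes $x_i + y_i > 0$ for every attribute.

```python
import math


def euclidean(x, y):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def city_block(x, y):
    return sum(abs(a - b) for a, b in zip(x, y))


def minkowski(x, y, p):
    return sum(abs(a - b) ** p for a, b in zip(x, y)) ** (1.0 / p)


def chebyshev(x, y):
    return max(abs(a - b) for a, b in zip(x, y))


def canberra(x, y):
    # As in Table V; assumes a + b > 0 for every attribute.
    return sum(abs(a - b) / (a + b) for a, b in zip(x, y))


def overlap(x, y):
    """Categorical overlap: fraction of attributes with equal values (w_k = 1/d)."""
    return sum(1 for a, b in zip(x, y) if a == b) / len(x)


def general_similarity(x, y, ranges):
    """General Similarity Coefficient for mixed records.

    ranges[k] is R_k for a numeric attribute and None for a categorical one.
    A value of None in x or y marks a missing value, so its weight is 0.
    """
    total, weight = 0.0, 0
    for a, b, r in zip(x, y, ranges):
        if a is None or b is None:
            continue  # w(x_k, y_k) = 0
        if r is not None:
            total += 1.0 - abs(a - b) / r      # numeric attribute
        else:
            total += 1.0 if a == b else 0.0    # categorical attribute
        weight += 1
    if weight == 0:
        raise ValueError("no attribute is present in both records")
    return total / weight


x, y = (1.0, 2.0, 3.0), (4.0, 6.0, 3.0)
print(euclidean(x, y))    # 5.0
print(city_block(x, y))   # 7.0
print(chebyshev(x, y))    # 4.0
print(minkowski(x, y, 2), canberra(x, y))

# Hypothetical flow records: (duration, protocol, packets).
a = (10.0, "tcp", None)
b = (30.0, "udp", 5.0)
print(overlap(("tcp", "http", "SF"), ("tcp", "ftp", "SF")))
print(general_similarity(a, b, ranges=(100.0, None, 10.0)))
```

Minkowski with $p = 1$ and $p = 2$ reduces to the city block and Euclidean distances, which gives a quick sanity check of the implementations. Note that these numeric functions are distances, so they grow as objects become less alike.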
rely on the general characteristics of the training data to select features that are independent of each other and are highly dependent on the output. The hybrid feature selection method attempts to exploit the salient features of both wrapper and filter methods [60].
An example of a wrapper-based feature selection method is [61], where the authors propose an algorithm to build a lightweight IDS by using modified Random Mutation Hill Climbing (RMHC) as a search strategy to specify a candidate subset for evaluation, and using a modified linear Support Vector Machines (SVMs) based iterative procedure as a wrapper approach to obtain an optimum feature subset. The authors establish the effectiveness of their method in terms of efficiency in intrusion detection without compromising the detection rate. An example filter model for feature selection is [62], where the authors fuse correlation-based and minimal redundancy-maximal-relevance measures. They evaluate their method on benchmark intrusion datasets for classification accuracy. Several other methods for feature selection are [39], [63]–[65].
6) Reporting anomalies: An important aspect of any anomaly detection technique is the manner in which anomalies are reported [3]. Typically, the outputs produced by anomaly detection techniques are of two types: (a) a score, which is a value that combines (i) distance or deviation with reference to a set of profiles or signatures, (ii) influence of the majority in its neighborhood, and (iii) distinct dominance of the relevant subspace (as discussed in Section III-B5); and (b) a label, which is a value (normal or anomalous) given to each test instance. Usually the labelling of an instance depends on (i) the size of groups generated by an unsupervised technique, (ii) the compactness of the group(s), (iii) majority voting based on the outputs given by multiple indices (several example indices are given in Table VI), or (iv) distinct dominance of the subset of features.
IV. METHODS AND SYSTEMS FOR NETWORK ANOMALY DETECTION
The classification of network anomaly detection methods and systems that we adopt is shown in Figure 4. This scheme is based on the nature of algorithms used. It is not straightforward to come up with a classification scheme for network anomaly detection methods and systems, primarily because there is substantial overlap among the methods used
TABLE VI
CLUSTER VALIDITY MEASURES
Fig. 5. Statistics of the surveyed papers during the years 2000 to 2012
assume knowledge of the underlying distribution and estimate the parameters from the given data [79], non-parametric techniques do not generally assume knowledge of the underlying distribution [80].
An example of a statistical IDS is HIDE [33]. HIDE is an anomaly-based network intrusion detection system, that uses statistical models and neural network classifiers to detect intrusions. HIDE is a distributed system, which consists of several tiers with each tier containing several Intrusion Detection Agents (IDAs). IDAs are IDS components that monitor the activities of a host or a network. The probe layer (i.e., top layer as shown in Figure 6) collects network traffic at a host or in a network, abstracts the traffic into a set of statistical variables to reflect network status, and periodically generates reports to the event preprocessor. The event preprocessor layer receives reports from both the probe and IDAs of lower tiers, and converts the information into the format required by the statistical model. The statistical processor maintains a reference model of typical network activities, compares reports from the event preprocessor with the reference models, and forms a stimulus vector to feed into the neural network classifier. The neural network classifier analyzes the stimulus vector from the statistical model to decide whether the network traffic is normal. The post-processor generates reports for the agents at higher tiers. A major attraction of HIDE is its ability to detect UDP flooding attacks even with attack intensity as low as 10% of background traffic.
Of the many statistical methods and NIDSs [79], [81]–[89] only a few are described below in brief.
Bayesian networks [90] are capable of detecting anomalies in a multi-class setting. Several variants of the basic technique have been proposed for network intrusion detection and for anomaly detection in text data [3]. The basic technique assumes independence among different attributes. Several variations of the basic technique that capture the conditional
Manikopoulos and Papavassiliou [81] introduce a hierarchical multi-tier multi-window statistical anomaly detection system to operate automatically, adaptively, and proactively. It applies to both wired and wireless ad-hoc networks. This system uses statistical modeling and neural network classification to detect network anomalies and faults. The system achieves high detection rate along with low misclassification rate when the anomaly traffic intensity is at 5% of the background traffic but the detection rate is lower at lower attack intensity levels such as 1% and 2%.
Association rule mining [92], conceptually a simple method based on counting of co-occurrences of items in transactions databases, has been used for one-class anomaly detection by generating rules from the data in an unsupervised fashion. The most difficult and dominating part of an association rules discovery algorithm is to find the itemsets that have strong support. Mahoney and Chan [83] present an algorithm known as LERAD that learns rules for finding rare events in time-series data with long range dependencies and finds anomalies in network packets over TCP sessions. LERAD uses an Apriori-like algorithm [92] that finds conditional rules over nominal attributes in a time series, e.g., a sequence of inbound client packets. The antecedent of a created rule is a conjunction of equalities, and the consequent is a set of allowed values, e.g., if port=80 and word3=HTTP/1.0 then word1=GET or POST. A value is allowed if it is observed in at least one training instance satisfying the antecedent. The idea is to identify rare anomalous events: those which have not occurred for a long time and which have high anomaly score. LERAD is a two-pass algorithm. In the first pass, a candidate rule set is generated from a random sample of training data comprised of attack-free network traffic. In the second pass, rules are trained by obtaining the set of allowed values for each antecedent.
A payload-based anomaly detector for intrusion detection known as PAYL is proposed in [84]. PAYL attempts to detect the first occurrence of a worm either at a network system
dependencies among different attributes using more complex gateway or within an internal network from a rogue device and
Bayesian networks have also been proposed. For example, the to prevent its propagation. It employs a language-independent
authors of [91] introduce an event classification-based intru- n-gram based statistical model of sampled data streams. In
sion detection scheme using Bayesian networks. The Bayesian fact, PAYL uses only a 1-gram model (i.e., it looks at the
decision process improves detection decision to significantly distribution of values contained within a single byte) which
reduce false alarms. requires a linear scan of the data stream and a small 256-
BHUYAN et al.: NETWORK ANOMALY DETECTION: METHODS, SYSTEMS AND TOOLS 313
element histogram. In other words, for each byte value in the range 0-255, it computes its mean frequency as well as the variance and standard deviation. Since payloads (i.e., arriving or departing contents) at different ports differ in length, PAYL computes these statistics for each specific observed payload length for each port open in the system. It first observes many exemplar payloads during the training phase and computes the payload profiles for each port for each payload length. During detection, each incoming payload is scanned and statistics are computed. The new payload distribution is compared against the model created during training. If there is a significant difference, PAYL concludes that the packet is anomalous and generates an alert. The authors found that this simple approach works surprisingly well.
Song et al. [85] propose a conditional anomaly detection method for computing differences among attributes and present three different expectation-maximization algorithms for learning the model. They assume that the data attributes are partitioned into indicator attributes and environmental attributes based on the decision taken by the user regarding which attributes indicate an anomaly. The method learns the typical indicator attribute values and observes subsequent data points, and labels them as anomalous or not, based on the degree to which the indicator attribute values differ from the usual indicator attribute values. However, if the indicator attribute values are not conditioned on environmental attribute values, the indicator attributes are effectively ignored. The precision/recall of this method is greater than 90 percent.
Lu and Ghorbani [87] present a network signal modeling technique for anomaly detection by combining wavelet approximation and system identification theory. They define and generate fifteen relevant traffic features as input signals to the system and model daily traffic based on these features. The output of the system is the deviation of the current input signal from the normal or regular signal behavior. Residuals are passed to the IDS engine for decision making, and the authors obtain 95% accuracy on the daily traffic.
Wattenberg et al. [88] propose a method to detect anomalies in network traffic, based on a nonrestricted α-stable first-order model and statistical hypothesis testing. The α-stable function is used to model the marginal distribution of real traffic, which is then classified using the Generalized Likelihood Ratio Test. They detect two types of anomalies, floods and flash crowds, with promising accuracy. In addition, a nonparametric adaptive CUSUM (Cumulative Sum) method for detecting network intrusions is discussed in [89].
In addition to the detection methods, there are several statistical NIDSs. As mentioned earlier, a NIDS includes one or more intrusion detection methods that are integrated with other required sub-systems necessary to create a practically suitable system. We discuss a few below.
N@G (Network at Guard) [93] is a hybrid IDS that exploits both misuse and anomaly approaches. N@G has both network and host sensors. Anomaly-based intrusion detection is pursued using the chi-square technique on various network protocol parameters. It has four detection methodologies, viz., data collection, signature-based detection, network access policy violation and protocol anomaly detection, as a part of its network sensor. It includes audit trails, log analysis, statistical analysis and host access policies as components of the host sensor. The system has a separate IDS server, i.e., a management console to aggregate alerts from the various sensors, with a user interface, a middle-tier and a data management component. It provides real time protection against malicious changes to network settings on client computers, which includes unsolicited changes to the Windows Hosts file and Windows Messenger service.
FSAS (Flow-based Statistical Aggregation Scheme) [94] is a flow-based statistical IDS. It comprises two modules: feature generator and flow-based detector. In the feature generator, the event preprocessor module collects the network traffic of a host or a network. The event handlers generate reports to the flow management module. The flow management module efficiently determines if a packet is part of an existing flow or if it should generate a new flow key. By inspecting flow keys, this module aggregates flows together, and dynamically updates per-flow accounting measurements. The event time module periodically calls the feature extraction module to convert the statistics regarding flows into the format required by the statistical model. The neural network classifier classifies the score vectors to prioritize flows by their amount of maliciousness. The higher the maliciousness of a flow, the higher is the possibility of the flow being an attack.
In addition to their inherent ability to detect network anomalies, statistical approaches have a number of additional distinct advantages as well.
• They do not require prior knowledge of normal activities of the target system. Instead, they have the ability to learn the expected behavior of the system from observations.
• Statistical methods can provide accurate notification or alarm generation of malicious activities occurring over long periods of time, subject to setting of appropriate thresholding or parameter tuning.
• They analyze the traffic based on the theory of abrupt changes, i.e., they monitor the traffic for a long time and report an alarm if any abrupt change (i.e., significant deviation) occurs.
Drawbacks of the statistical model for network anomaly detection include the following.
• They are susceptible to being trained by an attacker in such a way that the network traffic generated during the attack is considered normal.
• Setting the values of the different parameters or metrics is a difficult task, especially because the balance between false positives and false negatives is an issue. Moreover, a statistical distribution per variable is assumed, but not all behaviors can be modeled using stochastic methods. Furthermore, most schemes rely on the assumption of a quasi-stationary process [6], which is not always realistic.
• It takes a long time to report an anomaly for the first time because the building of the models requires extended time.
• Several hypothesis testing statistics can be applied to detect anomalies. Choosing the best statistic is often not straightforward. In particular, as stated in [88], constructing hypothesis tests for complex distributions that are required to fit high dimensional datasets is nontrivial.
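The 1-gram payload model described above for PAYL can be illustrated with a minimal sketch. It keeps one profile per (port, payload length) pair, holding the mean and standard deviation of the relative frequency of each of the 256 byte values, as the text describes. The class name `PaylSketch`, the `smoothing` constant, and the distance used in `score` (a sum of absolute deviations scaled by the standard deviation) are assumptions made for illustration; the text above only says that the new distribution is "compared against the model", so this is not presented as the authors' exact comparison.

```python
from collections import defaultdict
from statistics import mean, pstdev


def byte_frequencies(payload):
    """Relative frequency of each of the 256 possible byte values."""
    counts = [0] * 256
    for b in payload:
        counts[b] += 1
    n = max(len(payload), 1)
    return [c / n for c in counts]


class PaylSketch:
    """Hypothetical 1-gram payload profile, keyed by (port, payload length)."""

    def __init__(self, smoothing=0.001):
        # smoothing avoids division by zero for bytes with zero variance
        # (an assumed constant, not taken from the paper)
        self.smoothing = smoothing
        self._samples = defaultdict(list)  # (port, length) -> histograms
        self.profiles = {}                 # (port, length) -> (means, stds)

    def observe(self, port, payload):
        """Training phase: record the histogram of one exemplar payload."""
        self._samples[(port, len(payload))].append(byte_frequencies(payload))

    def fit(self):
        """Compute per-byte mean and standard deviation for each profile."""
        for key, histograms in self._samples.items():
            columns = list(zip(*histograms))
            means = [mean(col) for col in columns]
            stds = [pstdev(col) for col in columns]
            self.profiles[key] = (means, stds)

    def score(self, port, payload):
        """Deviation of a payload from its profile; larger is more anomalous."""
        key = (port, len(payload))
        if key not in self.profiles:
            return float("inf")  # no profile for this port/length
        means, stds = self.profiles[key]
        freqs = byte_frequencies(payload)
        return sum(abs(f - m) / (s + self.smoothing)
                   for f, m, s in zip(freqs, means, stds))


model = PaylSketch()
model.observe(80, b"GET /index.html ")
model.observe(80, b"GET /about.html ")
model.fit()
normal_score = model.score(80, b"GET /index.html ")
suspect_score = model.score(80, b"\x90" * 16)
print(normal_score < suspect_score)
```

In a deployment along the lines described above, an alert would be raised when the score exceeds a tuned threshold; the threshold itself is left out here because the text does not specify one.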
314 IEEE COMMUNICATIONS SURVEYS & TUTORIALS, VOL. 16, NO. 1, FIRST QUARTER 2014
TABLE VII
COMPARISON OF STATISTICAL NETWORK ANOMALY DETECTION METHODS
Author(s) | Year of publication | No. of parameters used | w | x | y | Data types | Dataset | z | Detection method
Eskin [79] | 2000 | 2 | O | N | P | Numeric | DARPA99 | C4 | Probability Model
Manikopoulos and Papavassiliou [81] | 2002 | 3 | D | N | P | Numeric | Real-life | C2, C5 | Statistical model with neural network
Mahoney and Chan [83] | 2003 | 2 | C | N | P | - | DARPA99 | C1 | LERAD algorithm
Chan et al. [82] | 2003 | 2 | C | N | P | Numeric | DARPA99 | C1 | Learning Rules
Wang and Stolfo [84] | 2004 | 3 | C | N | P | Numeric | DARPA99 | C1 | Payload-based algorithm
Song et al. [85] | 2007 | 3 | C | N | P | Numeric | KDDcup99 | Intrusive pattern | Gaussian Mixture Model
Chhabra et al. [86] | 2008 | 2 | D | N | P | Numeric | Real time | C6 | FDR method
Lu and Ghorbani [87] | 2009 | 3 | C | N | P, F | Numeric | DARPA99 | C1 | Wavelet Analysis
Wattenberg et al. [88] | 2011 | 4 | C | N | P | Numeric | Real-time | C2 | GLRT Model
Yu [89] | 2012 | 1 | C | N | P | Numeric | Real-time | C2 | Adaptive CUSUM
w: centralized (C), distributed (D) or others (O)
x: nature of detection, real time (R) or non-real time (N)
y: packet-based (P), flow-based (F), hybrid (H) or others (O)
z: list of attacks handled: C1 - all attacks, C2 - denial of service, C3 - probe, C4 - user to root, C5 - remote to local, and C6 - anomalous
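The LERAD rule form described in the statistical methods discussion (an antecedent that is a conjunction of equalities, and a consequent that is a set of allowed values) can be sketched as follows. This covers only the second pass, where allowed values are collected from attack-free training records, and a simple violation check at detection time. The names `Rule`, `train` and `violations` are hypothetical, candidate rule generation (the first pass) is assumed to have happened already, and the anomaly scoring of violated rules is omitted because the text does not give its formula.

```python
class Rule:
    """Hypothetical LERAD-style rule: if antecedent holds, target is in allowed."""

    def __init__(self, antecedent, target):
        self.antecedent = dict(antecedent)  # attribute -> required value
        self.target = target                # attribute constrained by the rule
        self.allowed = set()                # values seen in training

    def matches(self, record):
        """True if the record satisfies every equality in the antecedent."""
        return all(record.get(k) == v for k, v in self.antecedent.items())


def train(rules, records):
    """Second pass: a value is allowed if it is observed in at least one
    training record that satisfies the rule's antecedent."""
    for record in records:
        for rule in rules:
            if rule.matches(record) and rule.target in record:
                rule.allowed.add(record[rule.target])


def violations(rules, record):
    """Rules whose antecedent holds but whose target value was never seen."""
    return [r for r in rules
            if r.matches(record) and record.get(r.target) not in r.allowed]


rules = [Rule({"port": 80, "word3": "HTTP/1.0"}, "word1")]
train(rules, [
    {"port": 80, "word3": "HTTP/1.0", "word1": "GET"},
    {"port": 80, "word3": "HTTP/1.0", "word1": "POST"},
])
suspicious = {"port": 80, "word3": "HTTP/1.0", "word1": "QUIT"}
print(len(violations(rules, suspicious)))  # prints 1
```

The example reproduces the rule quoted in the text: after training, GET and POST are the allowed values of word1 for port 80 with HTTP/1.0, so a record carrying any other first word violates the rule.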
into two classes: benign and anomalies. The anomalies include a large variety of types such as DoS, scans, and botnets. Thus, multi-class classifiers are a natural choice, but like any classifier they require expensive hand-labeled datasets and are also not able to identify unknown attacks.
Wagner et al. [103] use one-class classifiers that can detect new anomalies, i.e., data points that do not belong to the learned class. In particular, they use a one-class SVM classifier proposed by Schölkopf et al. [104]. In such a classifier, the training data is presumed to belong to only one class, and the learning goal during training is to determine a function which is positive when applied to points on or within the circumscribed boundary around the training points and negative outside. This is also called semi-supervised classification. Such an SVM classifier can be used to identify outliers and anomalies. The authors develop a special kernel function that projects data points to a higher dimension before classification. Their kernel function takes into consideration properties of Netflow data and enables determination of similarity between two windows of IP flow records. They obtain 92% accuracy on average for all attack classes.
Classification-based anomaly detection methods can usually give better results than unsupervised methods (e.g., clustering-based) because of the use of labeled training examples. In traditional classification, new information can be incorporated by re-training with the entire dataset. However, this is time-consuming. Incremental classification algorithms [105] make such training more efficient. Although classification-based methods are popular, they cannot detect or predict an unknown attack or event until relevant training information is fed in for retraining.
For a comparison of several classification-based network anomaly detection methods, see Table VIII.
Several authors have used a combination of classifiers and clustering for network intrusion detection, leveraging the advantages of the two methods. For example, Muda et al. [107] present a two stage model for network intrusion detection. Initially, k-means clustering is used to group the samples into three clusters: C1 to group attack data such as Probe, U2R and R2L; C2 to group DoS attack data, and C3 for normal non-attack data. The authors achieve this by initializing the cluster centers with the mean values obtained from known data points of appropriate groups. Since the initial centroids are obtained from known labeled data, the authors find that k-means clustering is very good at clustering the data into three classes. Next, the authors use a Naive Bayes classifier to classify the data in the final stage into five finer-grained classes: Normal, DoS, Probe, R2L and U2R.
Gaddam et al. [96] present a method to detect anomalous activities based on a combined approach that uses the k-means clustering algorithm and the ID3 algorithm for decision tree learning [108]. In addition to descriptive features, each data instance includes a label saying whether the instance is normal or anomalous. The first stage of the algorithm partitions the training data into k clusters using Euclidean distance similarity. The clustering algorithm does not consider the labels on instances. The second stage of the algorithm builds a decision tree on the instances in a cluster. It does so for each cluster so that k separate decision trees are built. The purpose of building decision trees is to overcome two problems that k-means faces: a) forced assignment: if the value of k is lower than the number of natural groups, dissimilar instances are forced into the same cluster, and b) class dominance, which arises when a cluster contains a large number of instances from one class, and fewer instances from other classes. The hypothesis is that a decision tree trained on each cluster learns the subgroupings (if any) present within each cluster by partitioning the instances over the feature space. To obtain a final decision on classification of a test instance, the decisions of the k-means and ID3 algorithms are combined using two rules: (a) the nearest-neighbor rule and (b) the nearest-consensus rule. The authors claim that the detection accuracy of the k-means+ID3 method is very high with an extremely low false positive rate on network anomaly data.
Support Vector Machines (SVMs) are very successful maximum margin linear classifiers [109]. However, SVMs take a long time for training when the dataset is very large. Khan et al. [106] reduce the training time for SVMs when classifying large intrusion datasets by using a hierarchical clustering method called Dynamically Growing Self-Organizing Tree (DGSOT) intertwined with the SVMs. DGSOT, which is based on artificial neural networks, is used to find the boundary points between two classes. The boundary points are the most qualified points to train SVMs. An SVM computes the maximal margins separating the two classes of data points. Only points closest to the margins, called support vectors, affect the computation of these margins. Other points can be discarded without affecting the final results. Khan et al. approximate support vectors by using DGSOT. They use clustering in parallel with the training of SVMs, without waiting till the end of the building of the tree to start training the SVM. The authors find that their approach significantly improves training time for the SVMs without sacrificing generalization accuracy, in the context of network anomaly detection.
In addition to the detection methods noted above, we also discuss a classification-based IDS known as DNIDS (Dependable Network Intrusion Detection System) [110]. This IDS is developed based on the Combined Strangeness and Isolation measure of the k-Nearest Neighbor (CSI-KNN) algorithm. DNIDS can effectively detect network intrusion while providing continued service under attack. The intrusion detection algorithm analyzes characteristics of the network data by employing two measures: strangeness and isolation. These measures are used by a correlation unit to raise an intrusion alert along with the confidence information. For faster detection, DNIDS exploits multiple CSI-KNN classifiers in parallel. It also includes an intrusion-tolerant mechanism to monitor the hosts and the classifiers running on them, so that failure of any component can be handled carefully. Sensors capture network packets from a network segment and transform them into connection-based vectors. The Detector is a collection of CSI-KNN classifiers that analyze the vectors supplied by the sensors. The Manager, Alert Agents, and Maintenance Agents are designed for intrusion tolerance and are installed on a secure administrative server called Station. The Manager executes the tasks of generating mobile agents and dispatching them for task execution.
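The first stage of the two-stage model of Muda et al. described above, k-means with cluster centers initialized from the means of known labeled points, can be sketched as follows. This is a minimal pure-Python illustration under stated assumptions: the function names, the fixed iteration count, and the toy two-dimensional data are hypothetical, and the Naive Bayes second stage is not shown.

```python
import math


def centroid(points):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    dim = len(points[0])
    return [sum(p[i] for p in points) / len(points) for i in range(dim)]


def nearest(point, centers):
    """Index of the center closest to point in Euclidean distance."""
    return min(range(len(centers)), key=lambda j: math.dist(point, centers[j]))


def seeded_kmeans(points, labeled_seeds, iterations=10):
    """k-means whose initial centers are the means of labeled seed points.

    labeled_seeds maps a group name (e.g. "attack", "dos", "normal")
    to a list of known points from that group.
    """
    names = list(labeled_seeds)
    centers = [centroid(labeled_seeds[name]) for name in names]
    for _ in range(iterations):
        groups = [[] for _ in centers]
        for p in points:
            groups[nearest(p, centers)].append(p)
        # keep the old center if a cluster received no points
        centers = [centroid(g) if g else c for g, c in zip(groups, centers)]
    assignment = [names[nearest(p, centers)] for p in points]
    return centers, assignment


seeds = {
    "attack": [[9.0, 9.0], [10.0, 10.0]],
    "dos": [[0.0, 10.0]],
    "normal": [[0.0, 0.0], [1.0, 1.0]],
}
data = [[0.5, 0.5], [9.5, 9.0], [0.2, 9.8]]
_, labels = seeded_kmeans(data, seeds)
print(labels)  # ['normal', 'attack', 'dos']
```

Because each cluster starts at the mean of a known group, the cluster index carries a meaningful name from the first iteration, which is the property the authors rely on when they map clusters to C1, C2 and C3.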
TABLE VIII
COMPARISON OF CLASSIFICATION-BASED NETWORK ANOMALY DETECTION METHODS
Author(s) | Year of publication | No. of parameters | w | x | y | Data types | Dataset used | z | Detection method
Tong et al. [95] | 2005 | 4 | O | N | P | Numeric | DARPA99, TCPSTAT | C1 | KPCC model
Gaddam et al. [96] | 2007 | 3 | C | N | P | Numeric | NAD, DED, MSD | C1 | k-means+ID3
Khan et al. [106] | 2007 | 3 | C | N | P | Numeric | DARPA98 | C1 | DGSOT + SVM
Das et al. [97] | 2008 | 3 | O | N | P | Categorical | KDDcup99 | C1 | APD Algorithm
Lu and Tong [98] | 2009 | 2 | O | N | P | Numeric | DARPA99 | C1 | CUSUM-EM
Qadeer et al. [99] | 2010 | - | C | R | P | - | Real time | C2 | Traffic statistics
Wagner et al. [103] | 2011 | 2 | C | R | F | Numeric | Flow Traces | C2 | Kernel OCSVM
Muda et al. [107] | 2011 | 2 | O | N | O | Numeric | KDDcup99 | C1 | KMNB algorithm
Kang et al. [100] | 2012 | 2 | O | N | P | Numeric | DARPA98 | C1 | Differentiated SVDD
w: centralized (C), distributed (D) or others (O)
x: nature of detection, real time (R) or non-real time (N)
y: packet-based (P), flow-based (F), hybrid (H) or others (O)
z: list of attacks handled: C1 - all attacks, C2 - denial of service, C3 - probe, C4 - user to root, and C5 - remote to local
Fig. 9. Clustering and outliers in 2-D, where the C_i are clusters in (a) and the O_i are outliers in (b)
Fig. 10. Architecture of MINDS system
analysis module of this system is dedicated to summarizing the network connections as per the assigned anomaly rank. The analyst provides feedback after analyzing the summaries created and decides whether these summaries are helpful in creating new rules that may be used in known attack detection.
Clustering techniques are frequently used in anomaly detection. These include single-link clustering algorithms, k-means (squared error clustering), and hierarchical clustering algorithms, to mention a few [113]–[118].
Sequeira and Zaki [119] present an anomaly-based intrusion detection system known as ADMIT that detects intruders by creating user profiles. It keeps track of the sequence of commands a user uses as he/she uses a computer. A user profile is represented by clustering the sequences of the user’s commands. The data collection and processing are thus host-based. The system clusters a user’s command sequence using LCS (Longest Common Subsequence) as the similarity metric. It uses a dynamic clustering algorithm that creates an initial set of clusters and then refines them by splitting and merging as necessary. When a new user types a sequence of commands, it compares the sequence to profiles of users it already has. If it is a long sequence, it is broken up into a number of smaller sequences. A sequence that is not similar to a normal user’s profile is considered anomalous. One anomalous sequence is tolerated as noise, but a sequence of anomalous sequences typed by one single user causes the user to be marked as a masquerader or as exhibiting concept drift. The system can also use incremental clustering to detect masqueraders.
Zhang et al. [115] report a distributed intrusion detection algorithm that clusters the data twice. The first clustering chooses candidate anomalies at Agent IDSs, which are placed in a distributed manner in a network, and a second clustering computation attempts to identify true attacks at the central IDS. The first clustering algorithm is essentially the same as the one proposed by [120]. At each agent IDS, small clusters are assumed to contain anomalies and all small clusters are merged to form a single candidate cluster containing all anomalies. The candidate anomalies from various Agent IDSs are sent to the central IDS, which clusters again using a simple single-link hierarchical clustering algorithm. It chooses the smallest k clusters as containing true anomalies. They obtain a 90% attack detection rate on test intrusion data.
Worms are often intelligent enough to hide their activities and evade detection by IDSs. Zhuang et al. [121] propose a method called PAIDS (Proximity-Assisted IDS) to identify new worms as they begin to spread. PAIDS works differently from other IDSs and has been designed to work collaboratively with existing IDSs such as an anomaly-based IDS for enhanced performance. The goal of the designers of PAIDS is to identify new and intelligent fast-propagating worms and thwart their spread, particularly as the worm is just beginning to spread. Neither signature-based nor anomaly-based techniques can achieve such capabilities. Zhuang et al.’s approach is based mainly on the observation that during the starting phase of a new worm, the infected hosts are clustered in terms of geography, IP address and maybe even the DNS servers used.
Bhuyan et al. [122] present an unsupervised network anomaly detection method for large intrusion datasets. It exploits tree-based subspace clustering and an ensemble-based cluster labelling technique to achieve a better detection rate over real life network traffic data for the detection of known as well as unknown attacks. They obtain a 98% detection rate on average in detecting network anomalies.
Some advantages of using clustering are given below.
• For a partitioning approach, if k can be provided accurately then the task is easy.
• Incremental clustering (in supervised mode) techniques are effective for fast response generation.
• It is advantageous in case of large datasets to group into similar number of classes for detecting network anomalies, because it reduces the computational complexity during intrusion detection.
• It provides a stable performance in comparison to classifiers or statistical methods.
Drawbacks of clustering-based methods include the following.
• Most techniques have been proposed to handle continuous attributes only.
• In clustering-based intrusion detection techniques, an assumption is that the larger clusters are normal and smaller clusters are attack or intrusion [57]. Without this assumption, it is difficult to evaluate the technique.
• Use of an inappropriate proximity measure affects the detection rate negatively.
• Dynamic updating of profiles is time consuming.
Several outlier-based network anomaly identification techniques are available in [18]. When we use outlier-based algorithms, the assumption is that anomalies are uncommon events in a network. Intrusion datasets usually contain mixed, numeric and categorical attributes. Many early outlier detection algorithms worked with continuous attributes only; they ignored categorical attributes or modeled them in manners that caused considerable loss of information.
To overcome this problem, Otey et al. [123] develop a distance measure for data containing a mix of categorical and continuous attributes and use it for outlier-based anomaly detection. They define an anomaly score which can be used to identify outliers in mixed attribute space by considering dependencies among attributes of different types. Their anomaly score function is based on a global model of the data that can be easily constructed by combining local models built independently at each node. They develop an efficient one-pass approximation algorithm for anomaly detection that works efficiently in distributed detection environments with very little loss of detection accuracy. Each node computes its own outliers and the inter-node communication needed to compute global outliers is not significant. In addition, the authors show that their approach works well in dynamic network traffic situations where data, in addition to being streaming, also changes in nature as time progresses, leading to concept drift.
Bhuyan et al. [124] introduce an outlier score function to rank each candidate object w.r.t. the reference points for network anomaly detection. The reference points are computed from the clusters obtained from variants of the k-means clustering technique. The method is effective on real life intrusion datasets.
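The LCS-based similarity that ADMIT is described above as using to compare command sequences can be sketched with the standard dynamic-programming computation of the longest common subsequence. The normalization by the longer sequence length, the `threshold` value, and the function names are assumptions made for this illustration; the text only states that LCS serves as the similarity metric and that a sequence dissimilar to the user's profile is considered anomalous.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two sequences."""
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                cur.append(prev[j - 1] + 1)
            else:
                cur.append(max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


def lcs_similarity(a, b):
    """LCS length scaled to [0, 1] by the longer sequence (assumed scaling)."""
    if not a or not b:
        return 0.0
    return lcs_length(a, b) / max(len(a), len(b))


def is_anomalous(sequence, profile, threshold=0.5):
    """Flag a command sequence that resembles nothing in the user's profile.

    profile is a list of command sequences taken as typical for the user;
    threshold is a hypothetical tuning parameter.
    """
    best = max((lcs_similarity(sequence, p) for p in profile), default=0.0)
    return best < threshold


profile = [["ls", "cd", "vi", "make"], ["cd", "ls", "vi", "gcc"]]
print(lcs_similarity(["ls", "vi", "gcc", "make"], profile[0]))  # 0.75
print(is_anomalous(["nc", "wget", "chmod", "rm"], profile))     # True
```

A fuller treatment along the lines of the text would cluster the profile sequences and tolerate a single anomalous sequence as noise before flagging a user; both steps are left out of this sketch.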
TABLE IX
COMPARISON OF CLUSTERING AND OUTLIER-BASED NETWORK ANOMALY DETECTION METHODS
Author(s) | Year of publication | No. of parameters | w | x | y | Data types | Dataset used | z | Detection method
Sequeira and Zaki [119] | 2002 | 4 | C | R | P | Numeric, Categorical | Real life | Synthetic intrusions | ADMIT
Zhang et al. [115] | 2005 | 2 | D | N | P | Numeric | KDDcup99 | C1 | Cluster-based DIDS
Leung and Leckie [116] | 2005 | 3 | C | N | P | Numeric | KDDcup99 | C1 | fpMAFIA algorithm
Otey et al. [123] | 2006 | 5 | C | N | P | Mixed | KDDcup99 | C1 | FDOD algorithm
Jiang et al. [125] | 2006 | 3 | C | N | P | Mixed | KDDcup99 | C1 | CBUID algorithm
Chen and Chen [126] | 2008 | - | O | N | - | - | - | C3 | AAWP model
Zhang et al. [117] | 2009 | 2 | O | N | P | Mixed | KDDcup99 | C1 | KD algorithm
Zhuang et al. [121] | 2010 | 2 | R | C | P | - | Real time | C6 | PAIDS model
Bhuyan et al. [124] | 2011 | 2 | N | C | P, F | Numeric | KDDcup99 | C1 | NADO algorithm
Casas et al. [118] | 2012 | 2 | N | C | F | Numeric | KDDcup99, Real time | C1 | UNIDS method
w: centralized (C), distributed (D) or others (O)
x: nature of detection, real time (R) or non-real time (N)
y: packet-based (P), flow-based (F), hybrid (H) or others (O)
z: list of attacks handled: C1 - all attacks, C2 - denial of service, C3 - probe, C4 - user to root, C5 - remote to local, and C6 - worms
time traffic, it uses a structured SOM. It continuously collects network data from a network port, preprocesses that data and selects the features necessary for classification. Then it starts the classification process, a chunk of packets at a time, and then sends the resulting classification to a graphical tool that portrays the activities that are taking place on the network port dynamically as it receives more packets. The hypothesis is that routine traffic that represents normal behavior would be clustered around one or more cluster centers and any irregular traffic representing abnormal and possibly suspicious behavior would be clustered separately from the normal traffic clusters. The system is capable of classifying regular vs. irregular and possibly intrusive network traffic for a given host.
POSEIDON (Payl Over Som for Intrusion DetectiON) [140] is a two-tier network intrusion detection system. The first tier consists of a self-organizing map (SOM), and is used exclusively to classify payload data. The second tier consists of a light modification of the PAYL system [84]. Tests using the DARPA99 dataset show a higher detection rate and lower number of false positives than PAYL and PHAD [141].
3) Fuzzy set theoretic approaches: Fuzzy network intrusion detection systems exploit fuzzy rules to determine the likelihood of specific or general network attacks [142], [143]. A fuzzy input set can be defined for traffic in a specific network.
Tajbakhsh et al. [144] describe a novel method for building classifiers using fuzzy association rules and use it for network intrusion detection. The fuzzy association rule sets are used to describe different classes: normal and anomalous. Such fuzzy association rules are class association rules where the consequents are specified classes. Whether a training instance belongs to a specific class is determined by using matching metrics proposed by the authors. The fuzzy association rules are induced using normal training samples. A test sample is classified as normal if the compatibility of the rule set generated is above a certain threshold; those with lower compatibility are considered anomalous. The authors also propose a new method to speed up the rule induction algorithm by reducing items from extracted rules.
Mabu et al. report a novel fuzzy class-association-rule mining method based on genetic network programming (GNP) for detecting network intrusions [145]. GNP is an evolutionary optimization technique, which uses directed graph structures instead of strings in standard genetic algorithms, leading to enhanced representation ability with compact descriptions derived from possible node reusability in a graph.
Xian et al. [146] present a novel unsupervised fuzzy clustering method based on clonal selection for anomaly detection. The method is able to obtain global optimal clusters more quickly than competing algorithms with greater accuracy.
In addition to the fuzzy set theoretic detection methods, we discuss two IDSs, viz., NFIDS and FIRE below.
NFIDS [147] is a neuro-fuzzy anomaly-based network intrusion detection system. It comprises three tiers. Tier-I contains several Intrusion Detection Agents (IDAs). IDAs are IDS components that monitor the activities of a host or a network and report the abnormal behavior to Tier-II. Tier-II agents detect the network status of a LAN based on the network traffic that they observe as well as the reports from the Tier-I agents within the LAN. Tier-III combines higher-level reports, correlates data, and sends alarms to the user interface. There are four main types of agents in this system: TCPAgent, which monitors TCP connections between hosts and on the network, UDPAgent, which looks for unusual traffic involving UDP data, ICMPAgent, which monitors ICMP traffic and PortAgent, which looks for unusual services in the network.
FIRE (Fuzzy Intrusion Recognition Engine) [142] is an anomaly-based intrusion detection system that uses fuzzy logic to assess whether malicious activity is taking place on a network. The system combines simple network traffic metrics with fuzzy rules to determine the likelihood of specific or general network attacks. Once the metrics are available, they are evaluated using a fuzzy set theoretic approach. The system takes fuzzy network traffic profiles as inputs to its rule set and reports maliciousness.
4) Rough Set approaches: A rough set is an approximation of a crisp set (i.e., a regular set) in terms of a pair of sets that are its lower and upper approximations. In the standard and original version of rough set theory [148], the two approximations are crisp sets, but in other variations the approximating sets may be fuzzy sets. The mathematical framework of rough set theory enables modeling of relationships with a minimum number of rules.
Rough sets have two useful features [149]: (i) enabling learning with small size training datasets and (ii) overall
TABLE X
COMPARISON OF SOFT COMPUTING-BASED NETWORK ANOMALY DETECTION METHODS
Author(s) | Year of publication | No. of parameters | w | x | y | Data types | Dataset used | z | Detection method
Cannady [133] | 2000 | 2 | O | N | P | Numeric | Real-life | C2 | CMAC-based model
Balajinath and Raghavan [127] | 2001 | 3 | O | N | O | Categorical | User command | C4 | Behavior Model
Lee and Heinbuch [134] | 2001 | 3 | C | N | P | - | Simulated data | C2 | TNNID model
Xian et al. [146] | 2005 | 3 | C | N | P | Numeric | KDDcup99 | C1 | Fuzzy k-means
Amini et al. [130] | 2006 | 2 | C | R | P | Categorical | KDDcup99, Real-life | C1 | RT-UNNID system
Chimphlee et al. [150] | 2006 | 3 | C | N | P | Numeric | KDDcup99 | C1 | Fuzzy Rough C-means
Liu et al. [135] | 2007 | 2 | C | N | P | Numeric | KDDcup99 | C1 | HPCANN Model
Adetunmbi et al. [151] | 2008 | 2 | C | N | P | Numeric | KDDcup99 | C1 | LEM2 and K-NN
Chen et al. [152] | 2009 | 3 | C | N | P | Numeric | DARPA98 | C2 | RST-SVM technique
Mabu et al. [145] | 2011 | 3 | C | N | P | Numeric | KDDcup99 | C1 | Fuzzy ARM-based on GNP
Visconti and Tahayori [155] | 2011 | 2 | O | N | P | Numeric | Real-life | C2 | Interval type-2 fuzzy set
Geramiraz et al. [143] | 2012 | 2 | O | N | P | Numeric | KDDcup99 | C1 | Fuzzy rule-based model
w - indicates centralized (C) or distributed (D) or others (O)
x - the nature of detection as real time (R) or non-real time (N)
y - characterizes packet-based (P) or flow-based (F) or hybrid (H) or others (O)
z - represents the list of attacks handled: C1 - all attacks, C2 - denial of service, C3 - probe, C4 - user to root, and C5 - remote to local
Snort [113] is a quintessential, popular rule-based IDS. This open-source IDS matches each packet it observes against a set of rules. The antecedent of a Snort rule is a boolean formula composed of predicates that look for specific values of fields present in IP headers, transport headers and in the payload. Thus, Snort rules identify attack packets based on IP addresses, TCP or UDP port numbers, ICMP codes or types, and contents of strings in the packet payload. Snort’s rules are arranged into priority classes based on potential impact of alerts that match the rules. Snort’s rules have evolved over its history of 15 years. Each Snort rule has associated documentation describing the potential for false positives and negatives, together with corrective actions to be taken when the rule raises an alert. Snort rules are simple and easily understandable. Users can contribute rules when they observe new types of anomalous or malicious traffic. Currently, Snort has over 20,000 rules, inclusive of those submitted by users.
An intrusion detection system like Snort can run on a general purpose computer and can try to inspect all packets that go through the network. However, monitoring packets comprehensively in a large network is obviously an expensive task since it requires fast inspection on a large number of network interfaces. Many hundreds of rules may have to be matched concurrently, making scaling almost impossible.
To scale to large networks that collect flow statistics ubiquitously, Duffield et al. [163] use the machine learning algorithm called Adaboost [164] to translate packet level signatures to work with flow level statistics. The algorithm is used to correlate the packet and flow information. In particular, the authors associate packet level network alarms with a feature vector they create from flow records on the same traffic. They create a set of rules using flow information with features similar to those used in Snort rules. They also add numerical features such as the number of packets of a specific kind flowing within a certain time period. Duffield et al. train Adaboost on concurrent flow and packet traces. They evaluate the system using real time network traffic data with more than a billion flows over 29 days, and show that their performance is comparable to Snort’s with flow data.
Prayote and Compton [165] present an approach to anomaly detection that attempts to address the brittleness problem in which an expert system makes a decision that human common sense would recognize as impossible. They use a technique called prudence [166], in which for every rule, the upper and lower bounds of each numerical variable in the data seen by the rule are recorded, as well as a list of values seen for enumerated variables. The expert system raises a warning when a new value or a value outside the range is seen in a data instance. They improve the approach by using a simple probabilistic technique to decide if a value is an outlier. When working with network anomaly data, the authors partition the problem space into smaller subspaces of homogeneous traffic, each of which is represented with a separate model in terms of rules. The authors find that this approach works reasonably well for new subspaces when little data has been observed. They claim 0% false negative rate in addition to very low false positive rate.
Scheirer and Chuah [167] report a syntax-based scheme that uses variable-length partition with multiple break marks to detect many polymorphic worms. The prototype is the first NIDS that provides semantics-aware capability, and can capture polymorphic shell codes with additional stack sequences and mathematical operations.
2) Ontology and logic-based approaches: It is possible to model attack signatures using expressive logic structure in real time by incorporating constraints and statistical properties. Naldurg et al. [168] present a framework for intrusion detection based on temporal logic specification. Intrusion patterns are specified as formulae in an expressively rich and efficiently monitorable logic called EAGLE and evaluated using DARPA log files.
Estevez-Tapiador et al. [169] describe a finite state machine (FSM) methodology, where a sequence of states and transitions among them seems appropriate to model network protocols. If the specifications are complete enough, the model is able to detect illegitimate behavioral patterns effectively.
Shabtai et al. [170] describe an approach for detecting previously unencountered malware targeting mobile devices. Time-stamped security data is continuously monitored within the target mobile devices like smart phones and PDAs. Then it is processed by the knowledge-based temporal abstraction (KBTA) methodology. The authors evaluate the KBTA model
TABLE XI
COMPARISON OF KNOWLEDGE-BASED NETWORK ANOMALY DETECTION METHODS
Author(s) | Year of publication | No. of parameters | w | x | y | Data types | Dataset used | z | Detection method
Noel et al. [156] | 2002 | - | O | N | O | - | - | - | Attack Guilt Model
Sekar et al. [157] | 2002 | 3 | O | N | P | Numeric | DARPA99 | C1 | Specification-Based Model
Tapiador et al. [169] | 2003 | 3 | C | N | P | Numeric | Real-life | C2 | Markov Chain Model
Hung and Liu [171] | 2008 | - | O | N | P | Numeric | KDDcup99 | C1 | Ontology-based
Shabtai et al. [170] | 2010 | 2 | O | N | O | - | Real-life | C2 | Incremental KBTA
w - indicates centralized (C) or distributed (D) or others (O)
x - the nature of detection as real time (R) or non-real time (N)
y - characterizes packet-based (P) or flow-based (F) or hybrid (H) or others (O)
z - represents the list of attacks handled: C1 - all attacks, C2 - denial of service, C3 - probe, C4 - user to root, and C5 - remote to local
TABLE XII
COMPARISON OF ENSEMBLE-BASED NETWORK ANOMALY DETECTION METHODS
Author(s) | Year of publication | Combination strategy | w | x | y | Data types | Dataset used | z | Detection method
Chebrolu et al. [178] | 2005 | Weighted voting | O | N | P | Numeric | KDDcup99 | C1 | Class specific ensemble model
Perdisci et al. [180] | 2006 | Majority voting | O | N | Pay | - | Operational points | Synthetic intrusions | One-class classifier model
Borji [173] | 2007 | Majority voting | O | N | P | Numeric | DARPA98 | C1 | Heterogeneous classifiers model
Perdisci et al. [183] | 2009 | Min and Max probability | O | R | Pay | - | DARPA98 | C1 | McPAD model
Folino et al. [181] | 2010 | Weighted majority voting | O | N | P | Numeric | KDDcup99 | C1 | GEdIDS model
Noto et al. [176] | 2010 | Information theoretic | O | N | - | Numeric | UCI | None | FRaC model
Nguyen et al. [58] | 2011 | Majority voting | O | N | P | Numeric | KDDcup99 | C1 | Cluster ensemble
Khreich et al. [184] | 2012 | Learn and combine | O | N | pay | Numeric | UNM | C4 | EoHMMs model
w - indicates centralized (C) or distributed (D) or others (O)
x - the nature of detection as real time (R) or non-real time (N)
y - characterizes packet-based (P) or flow-based (F) or payload-based (pay) or hybrid (H) or others (O)
z - represents the list of attacks handled: C1 - all attacks, C2 - denial of service, C3 - probe, C4 - user to root, and C5 - remote to local
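Two of the combination strategies listed in Table XII, majority voting and weighted voting, can be sketched in a few lines. This is only an illustrative sketch, not the implementation of any of the surveyed systems: the function names `majority_vote` and `weighted_vote`, the label strings, the example weights, and the alphabetical tie-break are all assumptions made here.

```python
from collections import defaultdict


def weighted_vote(predictions, weights):
    """Return the label with the largest total weight.

    predictions: one label per base classifier (hypothetical strings).
    weights: one non-negative weight per base classifier.
    Ties are broken alphabetically; this tie-break is an assumption.
    """
    if not predictions:
        raise ValueError("at least one prediction is required")
    if len(predictions) != len(weights):
        raise ValueError("predictions and weights must have equal length")
    totals = defaultdict(float)
    for label, weight in zip(predictions, weights):
        totals[label] += weight
    # Highest total first; alphabetical order decides ties.
    return min(totals, key=lambda label: (-totals[label], label))


def majority_vote(predictions):
    """Majority voting is weighted voting with equal weights."""
    return weighted_vote(predictions, [1.0] * len(predictions))


# Three hypothetical base classifiers disagree on one test instance.
votes = ["normal", "anomalous", "anomalous"]
print(majority_vote(votes))
print(weighted_vote(votes, [0.9, 0.3, 0.4]))
```

With equal weights the two "anomalous" votes win, while the weighted variant lets a single highly trusted classifier override two weaker ones. How the weights are obtained (for example, from validation accuracy) is left open here.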
system that consists of an ensemble of one-class classifiers. It is very accurate in detecting network attacks that bear some form of shell-code in the malicious payload. This detector performs well even in the case of polymorphic attacks. Furthermore, the authors tested their IDS with advanced polymorphic blending attacks and showed that even in the presence of such sophisticated attacks, it is able to obtain a low false positive rate.
An ensemble method is advantageous because it obtains higher accuracy than the individual techniques. The following are the major advantages.
• Even if the individual classifiers are weak, the ensemble methods perform well by combining multiple classifiers.
• Ensemble methods can scale for large datasets.
• Ensemble classifiers need a set of controlling parameters that are comprehensive and can be easily tuned.
• Among existing approaches, Adaboost and Stack generalization are more effective because they can exploit the diversity in predictions by multiple base level classifiers.
Here are some disadvantages of ensemble-based methods.
• Selecting a subset of consistently performing and unbiased classifiers from a pool of classifiers is difficult.
• The greedy approach for selecting sample datasets is slow for large datasets.
• It is difficult to obtain real time performance.
A comparison of ensemble-based network anomaly detection methods is given in Table XII.
2) Fusion-based methods and system: With an evolving need of automated decision making, it is important to improve classification accuracy compared to the stand-alone general decision-based techniques even though such a system may have several disparate data sources. So, a suitable combination of these is the focus of the fusion approach. Several fusion-based techniques have been applied to network anomaly detection [185]–[189]. A classification of such techniques is as follows: (i) data level, (ii) feature level, and (iii) decision level. Some methods only address the issue of operating in a space of high dimensionality with features divided into semantic groups. Others attempt to combine classifiers trained on different features divided based on hierarchical abstraction levels or the type of information contained.
Giacinto et al. [185] provide a pattern recognition approach to network intrusion detection employing a fusion of multiple classifiers. Five different decision fusion methods are assessed by experiments and their performances compared. Shifflet [186] discusses a platform that enables a multitude of techniques to work together towards creating a more realistic fusion model of the state of a network, able to detect malicious activity effectively. A heterogeneous data level fusion for network anomaly detection is added by Chatzigiannakis et al. [190]. They use the Dempster-Shafer Theory of Evidence and Principal Components Analysis for developing the technique.
dLEARNIN [187] is an ensemble of classifiers that combines information from multiple sources. It is explicitly tuned to minimize the cost of errors. dLEARNIN is shown to achieve state-of-the-art performance, better than competing algorithms. The cost minimization strategy, dCMS, attempts to minimize the cost to a significant level. Gong et al. [191] contribute a neural network-based data fusion method for intrusion data
analysis and pruning to filter information from multi-sensors to get high detection accuracy.
HMMPayl [192] is an example of fusion-based IDS, where the payload is represented as a sequence of bytes, and the analysis is performed using Hidden Markov Models (HMM). The algorithm extracts features and uses HMM to guarantee the same expressive power as that of n-gram analysis, while overcoming its computational complexity. HMMPayl follows the Multiple Classifiers System paradigm to provide better classification accuracy, to increase the difficulty of evading the IDS, and to mitigate the weaknesses due to a non-optimal choice of HMM parameters.
Some advantages of fusion methods are given below.
• Data fusion is effective in increasing timeliness of attack identification and in reducing false alarm rates.
• Decision level fusion with appropriate training data usually yields high detection rate.
Some of the drawbacks are given below.
• The computational cost is high for rigorous training on the samples.
• Feature level fusion is a time consuming task. Also, the biases of the base classifiers affect the fusion process.
• Building hypotheses for different classifiers is a difficult task.
A comparison of fusion-based network anomaly detection methods is given in Table XIII.
3) Hybrid methods and system: Most current network intrusion detection systems employ either misuse detection or anomaly detection. However, misuse detection cannot detect unknown intrusions, and anomaly detection usually has high false positive rate [193]. To overcome the limitations of the techniques, hybrid methods are developed by exploiting features from several network anomaly detection approaches [194]–[196]. Hybridization of several methods increases performance of IDSs.
For example, RT-MOVICAB-IDS, a hybrid intelligent IDS is introduced in [197]. It combines ANN and CBR (case-based reasoning) within a Multi-Agent System (MAS) to detect intrusion in dynamic computer networks. The dynamic real time multi-agent architecture allows the addition of prediction agents (both reactive and deliberative). In particular, two of the deliberative agents deployed in the system incorporate temporal-bounded CBR. This upgraded CBR is based on an anytime approximation, which allows the adaptation of this paradigm to real time requirements.
A hybrid approach to host security that prevents binary code injection attacks, known as the FLIPS (Feedback Learning IPS) model, is proposed by Locasto et al. [198]. It incorporates three major components: an anomaly-based classifier, a signature-based filtering scheme, and a supervision framework that employs Instruction Set Randomization (ISR). Capturing the injected code allows FLIPS to construct signatures for zero-day exploits. Peddabachigari et al. [199] present a hybrid approach that combines Decision trees (DT) and SVMs as a hierarchical hybrid intelligent system model (DTSVM) for intrusion detection. It maximizes detection accuracy and minimizes computational complexity.
Zhang et al. [200] propose a systematic framework that applies a data mining algorithm called random forests in building a misuse, anomaly, and hybrid network-based IDS. The hybrid detection system improves detection performance by combining the advantages of both misuse and anomaly detection. Tong et al. [201] discuss a hybrid RBF/Elman neural network model that can be employed for both anomaly detection and misuse detection. It can detect temporally dispersed and collaborative attacks effectively because of its memory of past events.
An intelligent hybrid IDS model based on neural networks is introduced by Yu [202]. The model is flexible, can be extended to meet different network environments, and improves detection performance and accuracy. Selim et al. [203] report a hybrid intelligent IDS to improve the detection rate for known and unknown attacks. It consists of multiple levels: hybrid neural networks and decision trees. The technique is evaluated using the NSL-KDD dataset, and the results were promising.
Advantages of hybrid methods include the following.
• Such a method exploits major features from both signature and anomaly-based network anomaly detection.
• Such methods can handle both known and unknown attacks.
Drawbacks include the following.
• Lack of appropriate hybridization may lead to high computational cost.
• Dynamic updating of rules, profiles, or signatures still remains difficult.
Table XIV presents a comparison of a few hybrid network anomaly detection methods.

G. Discussion
After a long and elaborate discussion of many intrusion detection methods and anomaly-based network intrusion detection systems under several categories, we make a few observations.
(i) Each class of anomaly-based network intrusion detection methods and systems has unique strengths and weaknesses. The suitability of an anomaly detection technique depends on the nature of the problem it attempts to address. Hence, providing a single integrated solution to every anomaly detection problem may not be feasible.
(ii) Various methods face various challenges when complex datasets are used. Nearest neighbor and clustering techniques suffer when the number of dimensions is high because the distance measures in high dimensions are not able to differentiate well between normal and anomalous instances.
Spectral techniques explicitly address the high dimensionality problem by mapping data to a lower dimensional projection. But their performance is highly dependent on the assumption that normal instances and anomalies are distinguishable in the projected space. A classification technique often performs better in such a scenario. However, it requires labeled training data for both normal and attack classes. The improper distribution of these training data often makes the task of learning more challenging.
BHUYAN et al.: NETWORK ANOMALY DETECTION: METHODS, SYSTEMS AND TOOLS 325
TABLE XIII
COMPARISON OF FUSION-BASED NETWORK ANOMALY DETECTION METHODS
Author(s) | Year of publication | Fusion level | w | x | y | Data types | Dataset used | z | Detection method
Giacinto et al. [185] | 2003 | Decision | O | N | P | Numeric | KDDcup99 | C1 | MCS Model
Shifflet [186] | 2005 | Data | O | N | O | - | - | None | HSPT algorithm
Chatzigiannakis et al. [190] | 2007 | Data | C | N | P | - | NTUA, GRNET | C2 | D-S algorithm
Parikh and Chen [187] | 2008 | Data | C | N | P | Numeric | KDDcup99 | C1 | dLEARNIN system
Gong et al. [191] | 2010 | Data | C | N | P | Numeric | KDDcup99 | C1 | IDEA model
Ariu et al. [192] | 2011 | Decision | C | R | Pay | - | DARPA98, real-life | C1 | HMMPayl model
Yan and Shao [189] | 2012 | Decision | O | N | F | Numeric | Real time | C2, C3 | EWMA model
w - indicates centralized (C) or distributed (D) or others (O)
x - the nature of detection as real time (R) or non-real time (N)
y - characterizes packet-based (P) or flow-based (F) or payload-based (pay) or hybrid (H) or others (O)
z - represents the list of attacks handled: C1 - all attacks, C2 - denial of service, C3 - probe, C4 - user to root, and C5 - remote to local
TABLE XIV
COMPARISON OF HYBRID NETWORK ANOMALY DETECTION METHODS
Author(s) | Year of publication | No. of parameters | w | x | y | Data types | Dataset used | z | Detection method
Locasto et al. [198] | 2005 | 2 | C | R | P | - | Real-life | C2 | FLIPS model
Zhang and Zulkernine [194] | 2006 | 2 | C | N | P | Numeric | KDDcup99 | C1 | Random forest-based hybrid algorithm
Peddabachigari et al. [199] | 2007 | 2 | C | N | P | Numeric | KDDcup99 | C1 | DT-SVM hybrid model
Zhang et al. [200] | 2008 | 2 | C | N | P | Numeric | KDDcup99 | C1 | RFIDS model
Aydin et al. [195] | 2009 | 3 | C | N | P | - | DARPA98, IDEVAL | C1 | Hybrid signature-based IDS
Tong et al. [201] | 2009 | 1 | C | N | P | Numeric | DARPA-BSM | C1 | Hybrid RBF/Elman NN
Yu [202] | 2010 | 1 | C | N | - | - | - | - | Hybrid NIDS
Arumugam et al. [193] | 2010 | - | C | N | P | Numeric | KDDcup99 | C1 | Multi-stage hybrid IDS
Selim et al. [203] | 2011 | - | C | N | P | Numeric | KDDcup99 | C1 | Hybrid multi-level IDS
Panda et al. [196] | 2012 | 2 | C | N | P | Numeric | NSL-KDD, KDDcup99 | C1 | DTFF and FFNN
w - indicates centralized (C) or distributed (D) or others (O)
x - the nature of detection as real time (R) or non-real time (N)
y - characterizes packet-based (P) or flow-based (F) or hybrid (H) or others (O)
z - represents the list of attacks handled: C1 - all attacks, C2 - denial of service, C3 - probe, C4 - user to root, and C5 - remote to local
Semi-supervised nearest neighbor and clustering techniques that only use normal labels can often be more effective than classification-based techniques. In situations where identifying a good distance measure is difficult, classification or statistical techniques may be a better choice. However, the success of the statistical techniques is largely influenced by the applicability of the statistical assumptions in the specific real life scenarios.
(iii) For real time intrusion detection, the complexity of the anomaly detection process plays a vital role. In case of classification, clustering, and statistical methods, although training is expensive, they are still acceptable because testing is fast and training is offline. In contrast, techniques such as nearest neighbor and spectral techniques, which do not have a training phase, have an expensive testing phase which can be a limitation in a real setting.
(iv) Anomaly detection techniques typically assume that anomalies in data are rare when compared to normal instances. Generally, such assumptions are valid, but not always. Often unsupervised techniques suffer from large false alarm rates when anomalies are in bulk amounts. Techniques operating in supervised or semi-supervised modes [204] can be applied to detect bulk anomalies.

We perform a comparison of the anomaly-based network intrusion detection systems that we have discussed throughout this paper based on parameters such as mode of detection (host-based, network-based or both), detection approach (misuse, anomaly or both), nature of detection (online or offline), nature of processing (centralized or distributed), data gathering mechanism (centralized or distributed) and approach of analysis. A comparison chart is given in Table XV.

V. EVALUATION CRITERIA
To evaluate performance, it is important that the system identifies the attack and normal data correctly. There are several datasets and evaluation measures available for evaluating network anomaly detection methods and systems. The most commonly used datasets and evaluation measures are given below.

A. Datasets
Capturing and preprocessing high speed network traffic is essential prior to detection of network anomalies. Different tools are used for capture and analysis of network traffic data. We list a few commonly used tools and their features in Table XVI. These are commonly used by both the network defender and the attacker at different time points.
The following are various datasets that have been used for evaluating network anomaly detection methods and systems. A taxonomy of different datasets is given in Figure 14.
1) Synthetic datasets: Synthetic datasets are generated to meet specific needs or conditions or tests that real data satisfy. This can be useful when designing any type of system for theoretical analysis so that the design can be refined. This allows for finding a basic solution or remedy, if the results
TABLE XV
COMPARISON OF EXISTING NIDSs
TABLE XVI
TOOLS USED IN DIFFERENT STEPS IN NETWORK TRAFFIC ANOMALY DETECTION AND THEIR DESCRIPTION
prove to be satisfactory. Synthetic data is used in testing and creating many different types of test scenarios. It enables designers to build realistic behavior profiles for normal users and attackers based on the generated dataset to test a proposed system.
2) Benchmark datasets: In this subsection, we present six publicly available benchmark datasets generated using simulated environments that include a number of networks and by executing different attack scenarios.
(a) KDDcup99 dataset: Since 1999, the KDDcup99 dataset [205] has been the most widely used dataset for the evaluation of network-based anomaly detection methods and systems. This dataset was prepared by Stolfo et al. [206] and is built on the data captured in the DARPA98 IDS evaluation program. The KDD training dataset consists of approximately 4,900,000 single connection vectors, each of which contains 41 features and is labeled as either normal or attack with a specific attack type. The test dataset contains about 300,000 samples with 24 training attack types, with an additional 14 attack types in the test set only. The names and descriptions of the attack types are available in [205].
(b) NSL-KDD dataset: Analysis of the KDD dataset showed that there were two important issues in the dataset, which highly affect the performance of evaluated systems result-
data/index.html 9 http://agnigarh.tezu.ernet.in/~dkb/resources/
Fig. 15. Confusion matrix and related evaluation measures
Fig. 16. Illustration of confusion matrix in terms of related evaluation measures
fact that they are not good representatives of real world traffic. For example, the DARPA dataset has been questioned about the realism of the background traffic [219], [220] because it is synthetically generated. In addition to the difficulty of simulating real time network traffic, there are some other challenges in IDS evaluation [221]. A comparison of datasets is shown in Table XVII.

B. Evaluation Measures
An evaluation of a method or a system in terms of accuracy or quality is a snapshot in time. As time passes, new vulnerabilities may evolve, and current evaluations may become irrelevant. In this section, we discuss various measures used to evaluate network intrusion detection methods and systems.
1) Accuracy: Accuracy is a metric that measures how correctly an IDS works, measuring the percentage of detection and failure as well as the number of false alarms that the system produces [223], [224]. If a system has 80% accuracy, it means that it correctly classifies 80 instances out of 100 to their actual classes. While there is a big diversity of attacks in intrusion detection, the main focus is that the system be able to detect an attack correctly. From real life experience, one can easily conclude that the actual percentage of abnormal data is much smaller than that of the normal [57], [225], [226]. Consequently, intrusions are harder to detect than normal traffic, resulting in excessive false alarms as the biggest problem facing IDSs. The following are some accuracy measures.
(a) Sensitivity and Specificity: These two measures [227] attempt to measure the accuracy of classification for a 2-class problem. When an IDS classifies data, its decision can be either right or wrong. It assumes true for right and false for wrong, respectively.
If S is a detector and D_t is the set of test instances, there are four possible outcomes described using the confusion matrix given in Figure 15. When an anomalous test instance (p) is predicted as anomalous (Y) by the detector S, it is counted as true positive (TP); if it is predicted as normal (N), it is counted as false negative (FN). On the other hand, if a normal (n) test instance is predicted as normal (N) it is known as true negative (TN), while it is a false positive (FP) if it is predicted as anomalous (Y) [40], [227], [228].
The true positive rate (TPR) is the proportion of anomalous instances classified correctly over the total number of anomalous instances present in the test data. TPR is also known as sensitivity. The false positive rate (FPR) is the proportion of normal instances incorrectly classified as anomalous over the total number of normal instances contained in the test data. The true negative rate (TNR) is also called specificity. TPR, FPR, TNR, and the false negative rate (FNR) can be defined for the normal class. We illustrate all measures related to the confusion matrix in Figure 16.
Sensitivity is also known as the hit rate. Between sensitivity and specificity, sensitivity is set at high priority when the system is to be protected at all costs, and specificity gets more priority when efficiency is of major concern [227]. Consequently, the aim of an IDS is to produce as many TPs and TNs as possible while trying to reduce numbers of both FPs and FNs. The majority of evaluation criteria use these variables and the relations among them to model the accuracy of the IDSs.
(b) ROC Curves: The Receiver Operating Characteristics (ROC) analysis originates from signal processing theory. Its applicability is not limited only to intrusion detection, but extends to a large number of practical fields such as medical diagnosis, radiology, bioinformatics as well as in artificial intelligence and data mining. In intrusion detection, ROC curves are used on the one hand to visualize the relation between TP and FP rates of a classifier while tuning it and also to compare the accuracy of two or more classifiers. The ROC space [229], [230] uses the orthogonal coordinate system to visualize the classifier accuracy. Figure 17 illustrates the ROC approach normally used for network anomaly detection methods and systems evaluation.
(c) Misclassification rate: This measure attempts to estimate the probability of disagreement between the true and predicted cases by dividing the sum of FN and FP by the total number of pairs observed, i.e., (TP+FP+FN+TN). In other words, misclassification rate is defined as (FN+FP)/(TP+FP+FN+TN).
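The rates defined above follow directly from the four confusion-matrix counts. The sketch below is a minimal illustration under stated assumptions: the class name `ConfusionCounts`, the function name `evaluation_rates`, and the example counts are hypothetical and not taken from the paper, and FNR and TNR use their standard definitions FN/(TP+FN) and TN/(FP+TN).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Counts for a 2-class detector; 'positive' means anomalous."""
    tp: int  # anomalous instances predicted as anomalous
    fn: int  # anomalous instances predicted as normal
    fp: int  # normal instances predicted as anomalous
    tn: int  # normal instances predicted as normal


def evaluation_rates(c: ConfusionCounts) -> dict:
    """Compute TPR, FNR, FPR, TNR and the misclassification rate."""
    anomalous = c.tp + c.fn  # all truly anomalous test instances
    normal = c.fp + c.tn     # all truly normal test instances
    if anomalous == 0 or normal == 0:
        raise ValueError("test data must contain both classes")
    total = anomalous + normal
    return {
        "TPR": c.tp / anomalous,  # sensitivity (hit rate)
        "FNR": c.fn / anomalous,
        "FPR": c.fp / normal,
        "TNR": c.tn / normal,     # specificity
        "misclassification_rate": (c.fn + c.fp) / total,
    }


# Hypothetical counts, chosen only for illustration.
example = ConfusionCounts(tp=80, fn=20, fp=50, tn=850)
rates = evaluation_rates(example)
for name, value in rates.items():
    print(f"{name}: {value:.4f}")
```

Because TPR and FNR share a denominator, they sum to 1, as do FPR and TNR, so reporting one rate from each pair is sufficient.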
TABLE XVII
LIST OF DATASETS AVAILABLE AND THEIR DESCRIPTIONS
Fig. 17. Illustration of ROC measure where A, B, C represent the accuracy of a detection method or a system in ascending order.
(d) Confusion Matrix: The confusion matrix is a ranking method that can be applied to any kind of classification problem. The size of this matrix depends on the number of distinct classes to be detected. The aim is to compare the actual class labels against the predicted ones as shown in Figure 15. The diagonal represents correct classification. The confusion matrix for intrusion detection is defined as a 2-by-2 matrix, since there are only two classes known as intrusion and normal [40], [226], [228]. Thus, the TNs and TPs that represent the correctly predicted cases lie on the matrix diagonal while the FNs and FPs are on the right and left sides. As a side effect of creating the confusion matrix, all four values are displayed in a way that the relation between them can be easily understood.
(e) Precision, Recall and F-measure: Precision is a measure of how a system identifies attacks or normals. A flagging is accurate if the identified instance indeed comes from a malicious user, which is referred to as true positive. The final quantity of interest is recall, a measure of how many instances are identified correctly (see Figure 15). Precision and recall are often inversely related and there is normally a trade-off between these two ratios. An algorithm that produces low precision and low recall is most likely defective with conceptual errors in the underlying theory. The types of attacks that are not identified can indicate which areas of the algorithm need more attention. Exposing these flaws and establishing the causes assist future improvement.
The F-measure mixes the properties of the previous two measures as the harmonic mean of precision and recall [40], [228]. If we want to use only one accuracy metric as an evaluation criterion, the F-measure is preferable. Note that when precision and recall both reach 100%, the F-measure is the maximum, i.e., 1, meaning that the classifier has 0% false alarms and detects 100% of the attacks. Thus, a good classifier is expected to obtain F-measure as high as possible.
2) Performance: The evaluation of IDS performance is an important task. It involves many issues that go beyond the IDS itself. Such issues include the hardware platform, the operating system or even the deployment of the IDS. For a NIDS, the most important evaluation criterion for its performance is the system’s ability to process traffic on a high speed network with minimum packet loss when working real time. In real network traffic, the packets can be of various sizes, and the effectiveness of a NIDS depends on its ability to handle packets of any size. In addition to the processing speed, the CPU and memory usage can also serve as measurements of NIDS performance [231]. These are usually used as indirect measures that take into account the time and space complexities of intrusion detection algorithms. Finally, the performance of any NIDS is highly dependent upon (i) its individual configuration, (ii) the network it is monitoring, and (iii) its position in that network.
3) Completeness: The completeness criterion represents the space of the vulnerabilities and attacks that can be covered by an IDS. This criterion is very hard to assess because having omniscient knowledge about attacks or abuses of privilege is impossible. The completeness of an IDS is judged against a complete set of known attacks. The ability of an IDS is considered complete if it covers all the known vulnerabilities and attacks.
4) Timeliness: An IDS that performs its analysis as quickly as possible enables the human analyst or the response engine to promptly react before much damage is done within a specific time period. This prevents the attacker from subverting the audit source or the IDS itself. The response generated by the system while combating an attack is very important. Since the data must be processed to discover intrusions, there is
330 IEEE COMMUNICATIONS SURVEYS & TUTORIALS, VOL. 16, NO. 1, FIRST QUARTER 2014
always a delay between the actual moment of the attack and • Network anomalies may originate from various sources as
the response of the system. This is called total delay. Thus, the discussed in Section III. So, a better IDS should be able
total delay is the difference between tattack and tresponse . The to recognize origins of the anomalies before initiating the
smaller the total delay, the better an IDS is with respect to its detection process.
response. No matter if an IDS is anomaly-based or signature- • An IDS, to be capable of identifying both known as
based, there is always a gap between the starting time of an well as unknown attacks, should exploit both supervised
attack and its detection. (rule or signature-based learning) as well as unsupervised
5) Data Quality: Evaluating the quality of data is another (clustering or outlier-based) at multiple levels for real
important task in NIDS evaluation. Quality of data is influ- time performance with low false alarm rates.
enced by several factors, such as (i) source of data (should be • The IDS developer should choose the basic components,
from reliable and appropriate sources), (ii) selection of sample method(s), techniques or rule/signature/profile base to
(should be unbiased), (iii) sample size (neither over nor under- overcome four important limitations: subjective effective-
sampling), (iv) time of data (should be frequently updated ness, limited scalability, scenario dependent efficiency
real time data), (v) complexity of data (data should be simple and restricted security.
enough to be handled easily by the detection mechanism), and • The performance of a better IDS needs to be established
so on. both qualitatively and quantitatively.
6) Unknown attack detection: New vulnerabilities are • A better anomaly classification or identification method
evolving almost every day. An anomaly-based network in- enables us to tune it (the corresponding normal profiles,
trusion detection system should be capable of identifying thresholds, etc.) depending on the network scenario.
unknown attacks, in addition to known attacks. The IDS
should show consistent abilities of detecting unknown or even VII. O PEN I SSUES AND CHALLENGES
modified intrusions.
7) Profile Update: Once new vulnerabilities or exploits Although, many methods and systems have been developed
are discovered, signatures or profiles must be updated for by the research community, there are still a number of open
future detection. However, writing new or modified profiles research issues and challenges. The suitability of performance
or signatures without conflict is a challenge, considering the metrics is a commonly identified drawback in intrusion de-
current high-speed network scenario. tection. In evaluating IDSs, the three most important quali-
8) Stability: Any anomaly detection system should perform ties that need to be measured are completeness, correctness,
consistently in different network scenarios and in different and performance. The current state-of-the-art in intrusion
circumstances. It should consistently report identical events detection restricts evaluation of new systems to tests over
in a similar manner. Allowing the users to configure differ- incomplete datasets and micro-benchmarks that test narrowly
ent alerts to provide different messages in different network defined components of the system. A number of anomaly-
environments may lead to an unstable system. based systems have been tested using contrived datasets. Such
9) Information provided to Analyst: Alerts generated by evaluation is limited by the quality of the dataset that the
an IDS should be meaningful enough to clearly identify system is evaluated against. Construction of a dataset which is
the reasons behind the event to be raised, and the reasons unbiased, realistic and comprehensive is an extremely difficult
this event is of interest. It should also assist the analyst task.
in determining the relevance and appropriate reaction to a A formal proof of correctness [6] in the intrusion detection
particular alert. The alert should also specify the source of domain is exceptionally challenging and expensive. Therefore,
the alert and the target system. “pretty good assurance” presents a way in which systems can
10) Interoperability: An effective intrusion detection mech- be measured allowing fuzzy decisions, trade-offs, and priori-
anism is supposed to be capable of correlating information ties. Such a measure must take into consideration the amount
from multiple sources, such as system logs, other HIDSs, of work required to discover a vulnerability or weakness to
NIDSs, firewall logs and any other sources of information exploit for an attack and execute an attack on the system.
available. This helps in maintaining interoperability, while After a study of existing NIDSs, we find that it is still
installing a range of HIDSs or NIDSs from various vendors. extremely difficult to design a new NIDS to ensure robustness,
scalability and high performance. In particular, practitioners
VI. R ECOMMENDATIONS find it difficult to decide where to place the NIDS and how to
The following are some recommendations one needs to be best configure it for use within an environment with multiple
mindful of when developing a network anomaly detection stakeholders. We sort out some of the important issues as
method or a system. challenges and enumerate them below.
• Most existing IDSs for the wired environment work in (i) Runtime limitation presents an important challenge for
three ways: flow level traffic or packet level feature data a NIDS. Without losing any packets, a real time IDS
analysis, protocol analysis or payload inspection. Each of should be ideally able to capture and inspect each packet.
these categories has its own advantages and limitations. (ii) Most NIDSs and network intrusion detection methods
So, a hybridization of these (e.g., protocol level analysis depend on the environment. Ideally, a system or method
followed by flow level traffic analysis) may give better should be independent of the environment.
performance in terms of known (with high detection rate) (iii) The nature of anomalies keeps changing over time as
as well as unknown attack detection. intruders adapt their network attacks to evade existing
BHUYAN et al.: NETWORK ANOMALY DETECTION: METHODS, SYSTEMS AND TOOLS 331
[27] M. Al-Kuwaiti, N. Kyriakopoulos, and S. Hussein, “A comparative analysis of network dependability, fault-tolerance, reliability, security, and survivability,” IEEE Commun. Surveys Tutorials, vol. 11, no. 2, pp. 106–124, April 2009.
[28] B. Donnet, B. Gueye, and M. A. Kaafar, “A Survey on Network Coordinates Systems, Design, and Security,” IEEE Commun. Surveys Tutorials, vol. 12, no. 4, pp. 488–503, October 2010.
[29] S. X. Wu and W. Banzhaf, “The use of computational intelligence in intrusion detection systems: A review,” Applied Soft Computing, vol. 10, no. 1, pp. 1–35, January 2010.
[30] Y. Dong, S. Hsu, S. Rajput, and B. Wu, “Experimental Analysis of Application Level Intrusion Detection Algorithms,” International J. Security and Networks, vol. 5, no. 2/3, pp. 198–205, 2010.
[31] M. Tavallaee, N. Stakhanova, and A. A. Ghorbani, “Toward credible evaluation of anomaly-based intrusion-detection methods,” IEEE Trans. Syst. Man Cybern. C Appl. Rev., vol. 40, no. 5, pp. 516–524, September 2010.
[32] B. Daniel, C. Julia, J. Sushil, and W. Ningning, “ADAM: a testbed for exploring the use of data mining in intrusion detection,” ACM SIGMOD Record, vol. 30, no. 4, pp. 15–24, 2001.
[33] Z. Zhang, J. Li, C. N. Manikopoulos, J. Jorgenson, and J. Ucles, “HIDE: a Hierarchical Network Intrusion Detection System Using Statistical Preprocessing and Neural Network Classification,” in Proc. IEEE Man Systems and Cybernetics Information Assurance Workshop, 2001.
[34] L. Ertoz, E. Eilertson, A. Lazarevic, P. Tan, V. Kumar, and J. Srivastava, Data Mining - Next Generation Challenges and Future Directions. MIT Press, 2004, ch. MINDS - Minnesota Intrusion Detection System.
[35] M. Thottan and C. Ji, “Anomaly detection in IP networks,” IEEE Trans. Signal Process., vol. 51, no. 8, pp. 2191–2204, 2003.
[36] J. M. Estevez-Tapiador, P. Garcia-Teodoro, and J. E. Diaz-Verdejo, “Anomaly detection methods in wired networks: a survey and taxonomy,” Computer Communication, vol. 27, no. 16, pp. 1569–1584, October 2004.
[37] A. Fragkiadakis, E. Tragos, and I. Askoxylakis, “A Survey on Security Threats and Detection Techniques in Cognitive Radio Networks,” IEEE Commun. Surveys Tutorials, vol. PP, no. 99, pp. 1–18, January 2012.
[38] R. Heady, G. Luger, A. Maccabe, and M. Servilla, “The Architecture of a Network Level Intrusion Detection System,” Computer Science Department, University of New Mexico, Tech. Rep. TR-90, 1990.
[39] H. G. Kayacik, A. N. Zincir-Heywood, and M. I. Heywood, “Selecting Features for Intrusion Detection: A Feature Relevance Analysis on KDD 99 Intrusion Detection Datasets,” in Proc. 3rd Annual Conference on Privacy, Security and Trust, October 2005.
[40] A. A. Ghorbani, W. Lu, and M. Tavallaee, Network Intrusion Detection and Prevention: Concepts and Techniques, ser. Advances in Information Security. Springer-Verlag, October 28 2009.
[41] P. Ning and S. Jajodia, Intrusion Detection Techniques. H Bidgoli (Ed.), The Internet Encyclopedia, 2003.
[42] F. Wikimedia, “Intrusion detection system,” http://en.wikipedia.org/wiki/Intrusion-detection_system, Feb 2009.
[43] M. H. Bhuyan, D. K. Bhattacharyya, and J. K. Kalita, “Surveying Port Scans and Their Detection Methodologies,” The Computer Journal, vol. 54, no. 10, pp. 1565–1581, October 2011.
[44] B. C. Park, Y. J. Won, M. S. Kim, and J. W. Hong, “Towards automated application signature generation for traffic identification,” in Proc. IEEE/IFIP Network Operations and Management Symposium: Pervasive Management for Ubiquitous Networks and Services, 2008, pp. 160–167.
[45] V. Kumar, “Parallel and distributed computing for cybersecurity,” IEEE Distributed Systems Online, vol. 6, no. 10, 2005.
[46] P. N. Tan, M. Steinbach, and V. Kumar, Introduction to Data Mining. Addison-Wesley, 2005.
[47] M. J. Lesot and M. Rifqi, “Anomaly-based network intrusion detection: Techniques, systems and challenges,” International J. Knowledge Engineering and Soft Data Paradigms, vol. 1, no. 1, pp. 63–84, 2009.
[48] S. H. Cha, “Comprehensive Survey on Distance/Similarity Measures between Probability Density Functions,” International J. Mathematical Models and Methods in Applied Science, vol. 1, no. 4, pp. 300–307, November 2007.
[49] S. Choi, S. Cha, and C. C. Tappert, “A Survey of Binary Similarity and Distance Measures,” J. Systemics, Cybernetics and Informatics, vol. 8, no. 1, pp. 43–48, 2010.
[50] M. J. Lesot, M. Rifqi, and H. Benhadda, “Similarity measures for binary and numerical data: a survey,” International J. Knowledge Engineering and Soft Data Paradigms, vol. 1, no. 1, pp. 63–84, December 2009.
[51] S. Boriah, V. Chandola, and V. Kumar, “Similarity measures for categorical data: A comparative evaluation,” in Proc. 8th SIAM International Conference on Data Mining, 2008, pp. 243–254.
[52] G. Gan, C. Ma, and J. Wu, Data Clustering Theory, Algorithms and Applications. SIAM, 2007.
[53] C. C. Hsu and S. H. Wang, “An integrated framework for visualized and exploratory pattern discovery in mixed data,” IEEE Trans. Knowl. Data Eng., vol. 18, no. 2, pp. 161–173, 2005.
[54] M. V. Joshi, R. C. Agarwal, and V. Kumar, “Mining needle in a haystack: classifying rare classes via two-phase rule induction,” in Proc. 7th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 2001, pp. 293–298.
[55] J. Theiler and D. M. Cai, “Resampling approach for anomaly detection in multispectral images,” in Proc. SPIE, vol. 5093. SPIE, 2003, pp. 230–240.
[56] R. Fujimaki, T. Yairi, and K. Machida, “An approach to spacecraft anomaly detection problem using kernel feature space,” in Proc. 11th ACM SIGKDD International Conference on Knowledge Discovery in Data Mining. USA: ACM, 2005, pp. 401–410.
[57] L. Portnoy, E. Eskin, and S. J. Stolfo, “Intrusion detection with unlabeled data using clustering,” in Proc. ACM Workshop on Data Mining Applied to Security, 2001.
[58] H. H. Nguyen, N. Harbi, and J. Darmont, “An efficient local region and clustering-based ensemble system for intrusion detection,” in Proc. 15th Symposium on International Database Engineering & Applications. USA: ACM, 2011, pp. 185–191.
[59] M. Dash and H. Liu, “Feature Selection for Classification,” Intelligent Data Analysis, vol. 1, pp. 131–156, 1997.
[60] Y. Chen, Y. Li, X. Q. Cheng, and L. Guo, “Survey and taxonomy of feature selection algorithms in intrusion detection system,” in Proc. 2nd SKLOIS conference on Information Security and Cryptology. Berlin, Heidelberg: Springer-Verlag, 2006, pp. 153–167.
[61] Y. Li, J. L. Wang, Z. Tian, T. Lu, and C. Young, “Building lightweight intrusion detection system using wrapper-based feature selection mechanisms,” Computers & Security, vol. 28, no. 6, pp. 466–475, 2009.
[62] H. T. Nguyen, K. Franke, and S. Petrovic, “Towards a Generic Feature-Selection Measure for Intrusion Detection,” in Proc. 20th International Conference on Pattern Recognition, August 2010, pp. 1529–1532.
[63] A. H. Sung and S. Mukkamala, “Identifying Important Features for Intrusion Detection Using Support Vector Machines and Neural Networks,” in Proc. Symposium on Applications and the Internet. USA: IEEE CS, 2003, pp. 209–217.
[64] H. Peng, F. Long, and C. Ding, “Feature Selection Based on Mutual Information: Criteria of Max-Dependency, Max-Relevance, and Min-Redundancy,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 27, no. 8, pp. 1226–1238, August 2005.
[65] F. Amiri, M. M. R. Yousefi, C. Lucas, A. Shakery, and N. Yazdani, “Mutual information-based feature selection for intrusion detection systems,” J. Network and Computer Applications, vol. 34, no. 4, pp. 1184–1199, 2011.
[66] J. Dunn, “Well separated clusters and optimal fuzzy partitions,” J. Cybernetics, vol. 4, pp. 95–104, 1974.
[67] D. L. Davies and D. W. Bouldin, “A Cluster Separation Measure,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 1, no. 2, pp. 224–227, 1979.
[68] L. Hubert and J. Schultz, “Quadratic assignment as a general data analysis strategy,” British J. Mathematical and Statistical Psychology, vol. 29, no. 2, pp. 190–241, 1976.
[69] F. B. Baker and L. J. Hubert, “Measuring the power of hierarchical cluster analysis,” J. American Statistics Association, vol. 70, no. 349, pp. 31–38, 1975.
[70] F. J. Rohlf, “Methods of Comparing Classifications,” Annual Review of Ecology and Systematics, vol. 5, no. 1, pp. 101–113, 1974.
[71] P. J. Rousseeuw, “Silhouettes: a graphical aid to the interpretation and validation of cluster analysis,” J. Computational and Applied Mathematics, vol. 20, no. 1, pp. 53–65, 1987.
[72] L. Goodman and W. Kruskal, “Measures of associations for cross-validations,” J. American Statistics Association, vol. 49, pp. 732–764, 1954.
[73] P. Jaccard, “The distribution of flora in the alpine zone,” New Phytologist, vol. 11, no. 2, pp. 37–50, 1912.
[74] W. M. Rand, “Objective criteria for the evaluation of clustering methods,” J. American Statistical Association, vol. 66, no. 336, pp. 846–850, 1971.
[75] J. C. Bezdek, “Numerical taxonomy with fuzzy sets,” J. Mathematical Biology, vol. 1, no. 1, pp. 57–71, 1974.
[76] J. C. Bezdek, “Cluster Validity with fuzzy sets,” J. Cybernetics, vol. 3, no. 3, pp. 58–78, 1974.
[77] X. L. Xie and G. Beni, “A Validity measure for Fuzzy Clustering,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 13, no. 4, pp. 841–847, 1991.
[78] F. J. Anscombe and I. Guttman, “Rejection of outliers,” Technometrics, vol. 2, no. 2, pp. 123–147, 1960.
[79] E. Eskin, “Anomaly detection over noisy data using learned probability distributions,” in Proc. 7th International Conference on Machine Learning. Morgan Kaufmann, 2000, pp. 255–262.
[80] M. Desforges, P. Jacob, and J. Cooper, “Applications of probability density estimation to the detection of abnormal conditions in engineering,” in Proc. Institute of Mechanical Engineers, vol. 212, 1998, pp. 687–703.
[81] C. Manikopoulos and S. Papavassiliou, “Network Intrusion and Fault Detection: A Statistical Anomaly Approach,” IEEE Commun. Mag., vol. 40, no. 10, pp. 76–82, October 2002.
[82] P. K. Chan, M. V. Mahoney, and M. H. Arshad, “A machine learning approach to anomaly detection,” Department of Computer Science, Florida Institute of Technology, Tech. Rep. CS-2003-06, 2003.
[83] M. V. Mahoney and P. K. Chan, “Learning rules for anomaly detection of hostile network traffic,” in Proc. 3rd IEEE International Conference on Data Mining. Washington: IEEE CS, 2003.
[84] K. Wang and S. J. Stolfo, “Anomalous Payload-Based Network Intrusion Detection,” in Proc. Recent Advances in Intrusion Detection. Springer, 2004, pp. 203–222.
[85] X. Song, M. Wu, C. Jermaine, and S. Ranka, “Conditional Anomaly Detection,” IEEE Trans. Knowl. Data Eng., vol. 19, pp. 631–645, 2007.
[86] P. Chhabra, C. Scott, E. D. Kolaczyk, and M. Crovella, “Distributed Spatial Anomaly Detection,” in Proc. 27th IEEE International Conference on Computer Communications, 2008, pp. 1705–1713.
[87] W. Lu and A. A. Ghorbani, “Network Anomaly Detection Based on Wavelet Analysis,” EURASIP J. Advances in Signal Processing, vol. 2009, no. 837601, January 2009.
[88] F. S. Wattenberg, J. I. A. Perez, P. C. Higuera, M. M. Fernandez, and I. A. Dimitriadis, “Anomaly Detection in Network Traffic Based on Statistical Inference and α-Stable Modeling,” IEEE Trans. Dependable Secure Computing, vol. 8, no. 4, pp. 494–509, July/August 2011.
[89] M. Yu, “A Nonparametric Adaptive CUSUM Method And Its Application In Network Anomaly Detection,” International J. Advancements in Computing Technology, vol. 4, no. 1, pp. 280–288, 2012.
[90] N. Friedman, D. Geiger, and M. Goldszmidt, “Bayesian Network Classifiers,” Machine Learning, vol. 29, no. 2-3, pp. 131–163, November 1997.
[91] C. Kruegel, D. Mutz, W. Robertson, and F. Valeur, “Bayesian event classification for intrusion detection,” in Proc. 19th Annual Computer Security Applications Conference, 2003.
[92] R. Agrawal and R. Srikant, “Fast Algorithms for Mining Association Rules in Large Databases,” in Proc. 20th International Conference on Very Large Data Bases. San Francisco, CA, USA: Morgan Kaufmann, 1994, pp. 487–499.
[93] N. Subramoniam, P. S. Pawar, M. Bhatnagar, N. S. Khedekar, S. Guntupalli, N. Satyanarayana, V. A. Vijayakumar, P. K. Ampatt, R. Ranjan, and P. S. Pandit, “Development of a Comprehensive Intrusion Detection System - Challenges and Approaches,” in Proc. 1st International Conference on Information Systems Security, Kolkata, India, 2005, pp. 332–335.
[94] S. Song, L. Ling, and C. N. Manikopoulo, “Flow-based Statistical Aggregation Schemes for Network Anomaly Detection,” in Proc. IEEE International Conference on Networking, Sensing, 2006.
[95] H. Tong, C. Li, J. He, J. Chen, Q. A. Tran, H. X. Duan, and X. Li, “Anomaly Internet Network Traffic Detection by Kernel Principle Component Classifier,” in Proc. 2nd International Symposium on Neural Networks, vol. LNCS. 3498, 2005, pp. 476–481.
[96] S. R. Gaddam, V. V. Phoha, and K. S. Balagani, “K-Means+ID3: A Novel Method for Supervised Anomaly Detection by Cascading K-Means Clustering and ID3 Decision Tree Learning Methods,” IEEE Trans. Knowl. Data Eng., vol. 19, no. 3, pp. 345–354, Mar 2007.
[97] K. Das, J. Schneider, and D. B. Neill, “Anomaly pattern detection in categorical datasets,” in Proc. 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. USA: ACM, 2008, pp. 169–176.
[98] W. Lu and H. Tong, “Detecting Network Anomalies Using CUSUM and EM Clustering,” in Proc. 4th International Symposium on Advances in Computation and Intelligence. Springer-Verlag, 2009, pp. 297–308.
[99] M. A. Qadeer, A. Iqbal, M. Zahid, and M. R. Siddiqui, “Network Traffic Analysis and Intrusion Detection Using Packet Sniffer,” in Proc. 2nd International Conference on Communication Software and Networks. Washington, DC, USA: IEEE Computer Society, 2010, pp. 313–317.
[100] I. Kang, M. K. Jeong, and D. Kong, “A differentiated one-class classification method with applications to intrusion detection,” Expert Systems with Applications, vol. 39, no. 4, pp. 3899–3905, March 2012.
[101] C. F. Tsai, Y. F. Hsu, C. Y. Lin, and W. Y. Lin, “Intrusion detection by machine learning: A review,” Expert Systems with Applications, vol. 36, no. 10, pp. 11994–12000, December 2009.
[102] T. Abbes, A. Bouhoula, and M. Rusinowitch, “Efficient decision tree for protocol analysis in intrusion detection,” International J. Security and Networks, vol. 5, no. 4, pp. 220–235, December 2010.
[103] C. Wagner, J. François, R. State, and T. Engel, “Machine Learning Approach for IP-Flow Record Anomaly Detection,” in Proc. 10th International IFIP TC 6 conference on Networking - Volume Part I, 2011, pp. 28–39.
[104] B. Schölkopf, J. C. Platt, J. C. Shawe-Taylor, A. J. Smola, and R. C. Williamson, “Estimating the Support of a High-Dimensional Distribution,” Neural Computation, vol. 13, no. 7, pp. 1443–1471, July 2001.
[105] M. Y. Su, G. J. Yu, and C. Y. Lin, “A real-time network intrusion detection system for large-scale attacks based on an incremental mining approach,” Computers & Security, vol. 28, no. 5, pp. 301–309, 2009.
[106] L. Khan, M. Awad, and B. Thuraisingham, “A New Intrusion Detection System Using Support Vector Machines and Hierarchical Clustering,” The VLDB Journal, vol. 16, no. 4, pp. 507–521, October 2007.
[107] Z. Muda, W. Yassin, M. N. Sulaiman, and N. I. Udzir, “A K-means and naive bayes learning approach for better intrusion detection,” Information Technology J., vol. 10, no. 3, pp. 648–655, 2011.
[108] J. R. Quinlan, “Induction of Decision Trees,” Machine Learning, vol. 1, no. 1, pp. 81–106, March 1986.
[109] H. Yu and S. Kim, Handbook of Natural Computing. Springer, 2003, ch. SVM Tutorial - Classification, Regression and Ranking.
[110] L. V. Kuang, “DNIDS: A Dependable Network Intrusion Detection System Using the CSI-KNN Algorithm,” Master’s thesis, Queen’s University, Kingston, Ontario, Canada, Sep 2007.
[111] M. H. Bhuyan, D. K. Bhattacharyya, and J. K. Kalita, “RODD: An Effective Reference-Based Outlier Detection Technique for Large Datasets,” in Advanced Computing. Springer, 2011, vol. 133, pp. 76–84.
[112] W. Lee, S. J. Stolfo, and K. W. Mok, “Adaptive Intrusion Detection: A Data Mining Approach,” Artificial Intelligence Review, vol. 14, no. 6, pp. 533–567, 2000.
[113] M. Roesch, “Snort - Lightweight Intrusion Detection for Networks,” in Proc. 13th USENIX Conference on System Administration, Washington, 1999, pp. 229–238.
[114] B. Neumann, “Knowledge Management and Assistance Systems,” http://kogs-www.informatik.uni-hamburg.de/~neumann/, 2007.
[115] Y. F. Zhang, Z. Y. Xiong, and X. Q. Wang, “Distributed intrusion detection based on clustering,” in Proc. International Conference on Machine Learning and Cybernetics, vol. 4, August 2005, pp. 2379–2383.
[116] K. Leung and C. Leckie, “Unsupervised anomaly detection in network intrusion detection using clusters,” in Proc. 28th Australasian conference on Computer Science - Volume 38. Darlinghurst, Australia: Australian Computer Society, Inc., 2005, pp. 333–342.
[117] C. Zhang, G. Zhang, and S. Sun, “A Mixed Unsupervised Clustering-Based Intrusion Detection Model,” in Proc. 3rd International Conference on Genetic and Evolutionary Computing. USA: IEEE CS, 2009, pp. 426–428.
[118] P. Casas, J. Mazel, and P. Owezarski, “Unsupervised Network Intrusion Detection Systems: Detecting the Unknown without Knowledge,” Computer Communications, vol. 35, no. 7, pp. 772–783, April 2012.
[119] K. Sequeira and M. Zaki, “ADMIT: anomaly-based data mining for intrusions,” in Proc. eighth ACM SIGKDD international conference on Knowledge discovery and data mining. New York, NY, USA: ACM, 2002, pp. 386–395.
[120] E. Eskin, A. Arnold, M. Prerau, L. Portnoy, and S. Stolfo, Applications of Data Mining in Computer Security. Kluwer Academic, 2002, ch. A geometric framework for unsupervised anomaly detection: Detecting intrusions in unlabeled data.
[121] Z. Zhuang, Y. Li, and Z. Chen, “Enhancing Intrusion Detection System with proximity information,” International J. Security and Networks, vol. 5, no. 4, pp. 207–219, December 2010.
[122] M. H. Bhuyan, D. K. Bhattacharyya, and J. K. Kalita, “An effective unsupervised network anomaly detection method,” in Proc. International Conference on Advances in Computing, Communications and Informatics. New York, NY, USA: ACM, 2012, pp. 533–539.
334 IEEE COMMUNICATIONS SURVEYS & TUTORIALS, VOL. 16, NO. 1, FIRST QUARTER 2014
[123] M. E. Otey, A. Ghoting, and S. Parthasarathy, “Fast distributed outlier detection in mixed-attribute data sets,” Data Mining and Knowledge Discovery, vol. 12, no. 2-3, pp. 203–228, 2006.
[124] M. H. Bhuyan, D. K. Bhattacharyya, and J. K. Kalita, “NADO: network anomaly detection using outlier approach,” in Proc. ACM International Conference on Communication, Computing & Security. USA: ACM, 2011, pp. 531–536.
[125] S. Jiang, X. Song, H. Wang, J.-J. Han, and Q.-H. Li, “A clustering-based method for unsupervised intrusion detections,” Pattern Recognition Letters, vol. 27, no. 7, pp. 802–810, May 2006.
[126] Z. Chen and C. Chen, “A Closed-Form Expression for Static Worm-Scanning Strategies,” in Proc. IEEE International Conference on Communications. Beijing, China: IEEE CS, May 2008, pp. 1573–1577.
[127] B. Balajinath and S. V. Raghavan, “Intrusion detection through learning behavior model,” Computer Communications, vol. 24, no. 12, pp. 1202–1212, July 2001.
[128] M. S. A. Khan, “Rule based Network Intrusion Detection using Genetic Algorithm,” International J. Computer Applications, vol. 18, no. 8, pp. 26–29, March 2011.
[129] S. Haykin, Neural Networks. New Jersey: Prentice Hall, 1999.
[130] M. Amini, R. Jalili, and H. R. Shahriari, “RT-UNNID: A practical solution to real-time network-based intrusion detection using unsupervised neural networks,” Computers & Security, vol. 25, no. 6, pp. 459–468, 2006.
[131] G. Carpenter and S. Grossberg, “Adaptive resonance theory,” CAS/CNS Technical Report Series, no. 008, 2010.
[132] T. Kohonen, “The self-organizing map,” Proc. IEEE, vol. 78, no. 9, pp. 1464–1480, 1990.
[133] J. Cannady, “Applying CMAC-Based On-Line Learning to Intrusion Detection,” in Proc. IEEE-INNS-ENNS International Joint Conference on Neural Networks, vol. 5, 2000, pp. 405–410.
[134] S. C. Lee and D. V. Heinbuch, “Training a neural-network based intrusion detector to recognize novel attacks,” IEEE Trans. Syst. Man Cybern. A, vol. 31, no. 4, pp. 294–299, 2001.
[135] G. Liu, Z. Yi, and S. Yang, “A hierarchical intrusion detection model based on the PCA neural networks,” Neurocomputing, vol. 70, no. 7-9, pp. 1561–1568, 2007.
[136] J. Sun, H. Yang, J. Tian, and F. Wu, “Intrusion Detection Method Based on Wavelet Neural Network,” in Proc. 2nd International Workshop on Knowledge Discovery and Data Mining. USA: IEEE CS, 2009, pp. 851–854.
[137] H. Yong and Z. X. Feng, “Expert System Based Intrusion Detection System,” in Proc. International Conference on Information Management, Innovation Management and Industrial Engineering, vol. 4, November 2010, pp. 404–407.
[138] A. Parlos, K. Chong, and A. Atiya, “Application of the recurrent multilayer perceptron in modeling complex process dynamics,” IEEE Trans. Neural Netw., vol. 5, no. 2, pp. 255–266, 1994.
[139] K. Labib and R. Vemuri, “NSOM: A Tool To Detect Denial Of Service Attacks Using Self-Organizing Maps,” Department of Applied Science, University of California, Davis, California, U.S.A., Tech. Rep., 2002.
[140] D. Bolzoni, S. Etalle, P. H. Hartel, and E. Zambon, “POSEIDON: a 2-tier Anomaly-based Network Intrusion Detection System,” in Proc. 4th IEEE International Workshop on Information Assurance, 2006, pp. 144–156.
[141] M. V. Mahoney and P. K. Chan, “PHAD: Packet Header Anomaly Detection for Identifying Hostile Network Traffic,” Dept. of Computer Science, Florida Tech, Tech. Rep. cs-2001-04, 2001.
[142] J. E. Dickerson, “Fuzzy network profiling for intrusion detection,” in Proc. 19th International Conference of the North American Fuzzy Information Processing Society, Atlanta, July 2000, pp. 301–306.
[147] M. Mohajerani, A. Moeini, and M. Kianie, “NFIDS: A Neuro-Fuzzy Intrusion Detection System,” in Proc. 10th IEEE International Conference on Electronics, Circuits and Systems, vol. 1, December 2003, pp. 348–351.
[148] Z. Pawlak, “Rough sets,” International J. Parallel Programming, vol. 11, no. 5, pp. 341–356, 1982.
[149] Z. Cai, X. Guan, P. Shao, Q. Peng, and G. Sun, “A rough set theory based method for anomaly intrusion detection in computer network systems,” Expert Systems, vol. 20, no. 5, pp. 251–259, November 2003.
[150] W. Chimphlee, A. H. Abdullah, M. S. M. Noor, S. Srinoy, and S. Chimphlee, “Anomaly-Based Intrusion Detection using Fuzzy Rough Clustering,” in Proc. International Conference on Hybrid Information Technology, vol. 01. Washington, DC, USA: IEEE Computer Society, 2006, pp. 329–334.
[151] A. O. Adetunmbi, S. O. Falaki, O. S. Adewale, and B. K. Alese, “Network Intrusion Detection based on Rough Set and k-Nearest Neighbour,” International J. Computing and ICT Research, vol. 2, no. 1, pp. 60–66, 2008.
[152] R. C. Chen, K. F. Cheng, Y. H. Chen, and C. F. Hsieh, “Using Rough Set and Support Vector Machine for Network Intrusion Detection System,” in Proc. First Asian Conference on Intelligent Information and Database Systems. Washington, DC, USA: IEEE Computer Society, 2009, pp. 465–470.
[153] M. Dorigo, V. Maniezzo, and A. Colorni, “Ant system: optimization by a colony of cooperating agents,” IEEE Trans. Syst. Man Cybern. B, Cybern., vol. 26, no. 1, pp. 29–41, 1996.
[154] H. H. Gao, H. H. Yang, and X. Y. Wang, “Ant colony optimization based network intrusion feature selection and detection,” in Proc. International Conference on Machine Learning and Cybernetics, vol. 6, August 2005, pp. 3871–3875.
[155] A. Visconti and H. Tahayori, “Artificial immune system based on interval type-2 fuzzy set paradigm,” Applied Soft Computing, vol. 11, no. 6, pp. 4055–4063, September 2011.
[156] S. Noel, D. Wijesekera, and C. Youman, “Modern Intrusion Detection, Data Mining, and Degrees of Attack Guilt,” in Proc. International Conference on Applications of Data Mining in Computer Security. Springer, 2002.
[157] R. Sekar, A. Gupta, J. Frullo, T. Shanbhag, A. Tiwari, H. Yang, et al., “Specification-based anomaly detection: a new approach for detecting network intrusions,” in Proc. 9th ACM Conference on Computer and Communications Security, 2002, pp. 265–274.
[158] X. Xu, “Sequential anomaly detection based on temporal-difference learning: Principles, models and case studies,” Applied Soft Computing, vol. 10, no. 3, pp. 859–867, 2010.
[159] A. Prayote, “Knowledge Based Anomaly Detection,” Ph.D. dissertation, School of Computer Science and Engineering, The University of New South Wales, November 2007.
[160] K. Ilgun, R. A. Kemmerer, and P. A. Porras, “State transition analysis: A rule-based intrusion detection approach,” IEEE Trans. Software Eng., vol. 21, no. 3, pp. 181–199, 1995.
[161] D. E. Denning and P. G. Neumann, “Requirements and model for IDES a real-time intrusion detection system,” Computer Science Laboratory, SRI International, USA, Tech. Rep. 83F83-01-00, 1985.
[162] D. Anderson, T. F. Lunt, H. Javitz, A. Tamaru, and A. Valdes, “Detecting unusual program behaviour using the statistical component of the next-generation intrusion detection expert system (NIDES),” Computer Science Laboratory, SRI International, USA, Tech. Rep. SRIO-CSL-95-06, 1995.
[163] N. G. Duffield, P. Haffner, B. Krishnamurthy, and H. Ringberg, “Rule-Based Anomaly Detection on IP Flows,” in Proc. 28th IEEE International Conference on Computer Communications, Joint Conference of the IEEE Computer and Communications Societies. Rio de Janeiro, Brazil: IEEE press, 2009, pp. 424–432.
[143] F. Geramiraz, A. S. Memaripour, and M. Abbaspour, “Adaptive [164] R. E. Schapire, “A brief introduction to boosting,” in Proc. 16th Inter-
Anomaly-Based Intrusion Detection System Using Fuzzy Controller,” national Joint Conference on Artificial Intelligence, Morgan Kaufmann,
International Journal of Network Security, vol. 14, no. 6, pp. 352–361, 1999, pp. 1401–1406.
2012. [165] A. Prayote and P. Compton, “Detecting anomalies and intruders,” AI
[144] A. Tajbakhsh, M. Rahmati, and A. Mirzaei, “Intrusion detection using 2006: Advances in Artificial Intelligence, pp. 1084–1088, 2006.
fuzzy association rules,” Applied Soft Computing, vol. 9, no. 2, pp. [166] G. Edwards, B. Kang, P. Preston, and P. Compton, “Prudent expert
462–469, March 2009. systems with credentials: Managing the expertise of decision support
[145] S. Mabu, C. Chen, N. Lu, K. Shimada, and K. Hirasawa, “An systems,” International journal of biomedical computing, vol. 40, no. 2,
Intrusion-Detection Model Based on Fuzzy Class-Association-Rule pp. 125–132, 1995.
Mining Using Genetic Network Programming,” IEEE Trans. Syst. Man [167] W. Scheirer and M. C. Chuah, “Syntax vs. semantics : competing
Cybern. Part C Appl. Rev., vol. 41, no. 1, pp. 130–139, 2011. approaches to dynamic network intrusion detection,” International
[146] J. Q. Xian, F. H. Lang, and X. L. Tang, “A novel intrusion detection Journal Securrity and Networks, vol. 3, no. 1, pp. 24–35, December
method based on clonal selection clustering algorithm,” in Proc. Inter- 2008.
national Conference on Machine Learning and Cybernetics. USA: [168] P. Naldurg, K. Sen, and P. Thati, “A Temporal Logic Based Framework
IEEE Press, 2005, vol. 6. for Intrusion Detection,” in Proc. 24th IFIP WG 6.1 International
Conference on Formal Techniques for Networked and Distributed Systems, 2004, pp. 359–376.
[169] J. M. Estevez-Tapiador, P. García-Teodoro, and J. E. Díaz-Verdejo, “Stochastic protocol modeling for anomaly based network intrusion detection,” in Proc. 1st International Workshop on Information Assurance. IEEE CS, 2003, pp. 3–12.
[170] A. Shabtai, U. Kanonov, and Y. Elovici, “Intrusion detection for mobile devices using the knowledge-based, temporal abstraction method,” J. System Software, vol. 83, no. 8, pp. 1524–1537, August 2010.
[171] S. S. Hung and D. S. M. Liu, “A user-oriented ontology-based approach for network intrusion detection,” Computer Standards & Interfaces, vol. 30, no. 1-2, pp. 78–88, January 2008.
[172] R. Polikar, “Ensemble based systems in decision making,” IEEE Circuits Syst. Mag., vol. 6, no. 3, pp. 21–45, 2006.
[173] A. Borji, “Combining heterogeneous classifiers for network intrusion detection,” in Proc. 12th Asian Computing Science Conference on Advances in Computer Science: Computer and Network Security. Springer, 2007, pp. 254–260.
[174] G. Giacinto, R. Perdisci, M. D. Rio, and F. Roli, “Intrusion detection in computer networks by a modular ensemble of one-class classifiers,” Information Fusion, vol. 9, no. 1, pp. 69–82, January 2008.
[175] L. Rokach, “Ensemble-based classifiers,” Artificial Intelligence Review, vol. 33, no. 1-2, pp. 1–39, February 2010.
[176] K. Noto, C. Brodley, and D. Slonim, “Anomaly Detection Using an Ensemble of Feature Models,” in Proc. IEEE International Conference on Data Mining. USA: IEEE CS, 2010, pp. 953–958.
[177] P. M. Mafra, V. Moll, J. D. S. Fraga, and A. O. Santin, “Octopus-IIDS: An Anomaly Based Intelligent Intrusion Detection System,” in Proc. IEEE Symposium on Computers and Communications. USA: IEEE CS, 2010, pp. 405–410.
[178] S. Chebrolu, A. Abraham, and J. P. Thomas, “Feature deduction and ensemble design of intrusion detection systems,” Computers & Security, vol. 24, no. 4, pp. 295–307, 2005.
[179] L. Breiman, J. Friedman, R. Olshen, and C. Stone, Classification and Regression Trees. Monterey, CA: Wadsworth and Brooks, 1984.
[180] R. Perdisci, G. Gu, and W. Lee, “Using an Ensemble of One-Class SVM Classifiers to Harden Payload-based Anomaly Detection Systems,” in Proc. 6th International Conference on Data Mining. USA: IEEE CS, 2006, pp. 488–498.
[181] G. Folino, C. Pizzuti, and G. Spezzano, “An ensemble-based evolutionary framework for coping with distributed intrusion detection,” Genetic Programming and Evolvable Machines, vol. 11, no. 2, pp. 131–146, June 2010.
[182] M. Rehak, M. Pechoucek, P. Celeda, J. Novotny, and P. Minarik, “CAMNEP: Agent-based Network Intrusion Detection System,” in Proc. 7th International Joint Conference on Autonomous Agents and Multiagent Systems: Industrial Track. Richland, SC: IFAAMAS, 2008, pp. 133–136.
[183] R. Perdisci, D. Ariu, P. Fogla, G. Giacinto, and W. Lee, “McPAD: A multiple classifier system for accurate payload-based anomaly detection,” Computer Networks, vol. 53, no. 6, pp. 864–881, April 2009.
[184] W. Khreich, E. Granger, A. Miri, and R. Sabourin, “Adaptive ROC-based ensembles of HMMs applied to anomaly detection,” Pattern Recognition, vol. 45, no. 1, pp. 208–230, January 2012.
[185] G. Giacinto, F. Roli, and L. Didaci, “Fusion of multiple classifiers for intrusion detection in computer networks,” Pattern Recognition Letters, vol. 24, no. 12, pp. 1795–1803, August 2003.
[186] J. Shifflet, “A Technique Independent Fusion Model For Network Intrusion Detection,” in Proc. Midstates Conference on Undergraduate Research in Computer Science and Mathematics, vol. 3, 2005, pp. 13–19.
[187] D. Parikh and T. Chen, “Data Fusion and Cost Minimization for Intrusion Detection,” IEEE Trans. Inf. For. Security, vol. 3, no. 3, pp. 381–389, 2008.
[188] L. Zhi-dong, Y. Wu, W. Wei, and M. Da-peng, “Decision-level fusion model of multi-source intrusion detection alerts,” J. Communications, vol. 32, no. 5, pp. 121–128, 2011.
[189] R. Yan and C. Shao, “Hierarchical Method for Anomaly Detection and Attack Identification in High-speed Network,” Information Technology J., vol. 11, no. 9, pp. 1243–1250, 2012.
[190] V. Chatzigiannakis, G. Androulidakis, K. Pelechrinis, S. Papavassiliou, and V. Maglaris, “Data fusion algorithms for network anomaly detection: classification and evaluation,” in Proc. 3rd International Conference on Networking and Services. Greece: IEEE CS, 2007, pp. 50–57.
[191] W. Gong, W. Fu, and L. Cai, “A Neural Network Based Intrusion Detection Data Fusion Model,” in Proc. 3rd International Joint Conference on Computational Science and Optimization - Volume 02. USA: IEEE CS, 2010, pp. 410–414.
[192] D. Ariu, R. Tronci, and G. Giacinto, “HMMPayl: An intrusion detection system based on Hidden Markov Models,” Computers & Security, vol. 30, no. 4, pp. 221–241, 2011.
[193] M. Arumugam, P. Thangaraj, P. Sivakumar, and P. Pradeepkumar, “Implementation of two class classifiers for hybrid intrusion detection,” in Proc. International Conference on Communication and Computational Intelligence, December 2010, pp. 486–490.
[194] J. Zhang and M. Zulkernine, “A Hybrid Network Intrusion Detection Technique Using Random Forests,” in Proc. 1st International Conference on Availability, Reliability and Security. USA: IEEE CS, 2006, pp. 262–269.
[195] M. A. Aydin, A. H. Zaim, and K. G. Ceylan, “A hybrid intrusion detection system design for computer network security,” Computers & Electrical Engineering, vol. 35, no. 3, pp. 517–526, May 2009.
[196] M. Panda, A. Abraham, and M. R. Patra, “Hybrid intelligent systems for detecting network intrusions,” Computer Physics Communications, vol. Early, 2012.
[197] A. Herrero, M. Navarro, E. Corchado, and V. Julian, “RT-MOVICAB-IDS: Addressing real-time intrusion detection,” Future Generation Computer Systems, vol. 29, no. 1, pp. 250–261, 2011.
[198] M. E. Locasto, K. Wang, A. D. Keromytis, and S. J. Stolfo, “FLIPS: Hybrid Adaptive Intrusion Prevention,” in Recent Advances in Intrusion Detection, 2005, pp. 82–101.
[199] S. Peddabachigari, A. Abraham, C. Grosan, and J. Thomas, “Modeling intrusion detection system using hybrid intelligent systems,” J. Network and Computer Applications, vol. 30, no. 1, pp. 114–132, January 2007.
[200] J. Zhang, M. Zulkernine, and A. Haque, “Random-Forests-Based Network Intrusion Detection Systems,” IEEE Trans. Syst. Man Cybern. C, vol. 38, no. 5, pp. 649–659, 2008.
[201] X. Tong, Z. Wang, and H. Yu, “A research using hybrid RBF/Elman neural networks for intrusion detection system secure model,” Computer Physics Communications, vol. 180, no. 10, pp. 1795–1801, 2009.
[202] X. Yu, “A New Model of Intelligent Hybrid Network Intrusion Detection System,” in Proc. International Conference on Bioinformatics and Biomedical Technology. IEEE CS, 2010, pp. 386–389.
[203] S. Selim, M. Hashem, and T. M. Nazmy, “Hybrid Multi-level Intrusion Detection System,” International J. Computer Science and Information Security, vol. 9, no. 5, pp. 23–29, 2011.
[204] A. Soule, K. Salamatian, and N. Taft, “Combining filtering and statistical methods for anomaly detection,” in Proc. 5th ACM SIGCOMM conference on Internet Measurement. USA: ACM, 2005, pp. 1–14.
[205] KDDcup99, “Knowledge discovery in databases DARPA archive,” http://www.kdd.ics.uci.edu/databases/kddcup99/task.html, 1999.
[206] S. J. Stolfo, W. Fan, W. Lee, A. Prodromidis, and P. K. Chan, “Cost-Based Modeling for Fraud and Intrusion Detection: Results from the JAM Project,” in Proc. DARPA Information Survivability Conference and Exposition, vol. 2. USA: IEEE CS, 2000, pp. 130–144.
[207] M. Tavallaee, E. Bagheri, W. Lu, and A. A. Ghorbani, “A detailed analysis of the KDD CUP 99 data set,” in Proc. 2nd IEEE International Conference on Computational Intelligence for Security and Defense Applications. USA: IEEE Press, 2009, pp. 53–58.
[208] NSL-KDD, “NSL-KDD data set for network-based intrusion detection systems,” http://iscx.cs.unb.ca/NSL-KDD/, March 2009.
[209] I. S. T. G. MIT Lincoln Lab, “DARPA Intrusion Detection Data Sets,” http://www.ll.mit.edu/mission/communications/ist/corpora/ideval/data/2000data.html, March 2000.
[210] Defcon, “The Shmoo Group,” http://cctf.shmoo.com/, 2011.
[211] CAIDA, “The Cooperative Association for Internet Data Analysis,” http://www.caida.org, 2011.
[212] LBNL, “Lawrence Berkeley National Laboratory and ICSI, LBNL/ICSI Enterprise Tracing Project,” http://www.icir.org/enterprise-tracing/, 2005.
[213] UNIBS, “University of Brescia dataset,” http://www.ing.unibs.it/ntw/tools/traces/, 2009.
[214] A. Shiravi, H. Shiravi, M. Tavallaee, and A. A. Ghorbani, “Towards developing a systematic approach to generate benchmark datasets for intrusion detection,” Computers & Security, vol. 31, no. 3, pp. 357–374, 2012.
[215] P. Gogoi, M. H. Bhuyan, D. K. Bhattacharyya, and J. K. Kalita, “Packet and Flow Based Network Intrusion Datasets,” in Proc. 5th International Conference on Contemporary Computing, vol. LNCS-CCIS 306. Springer, August 6-8 2012, pp. 322–334.
[216] M. H. Bhuyan, D. K. Bhattacharyya, and J. K. Kalita, “AOCD: An Adaptive Outlier Based Coordinated Scan Detection Approach,” International J. Network Security, vol. 14, no. 6, pp. 339–351, 2012.
336 IEEE COMMUNICATIONS SURVEYS & TUTORIALS, VOL. 16, NO. 1, FIRST QUARTER 2014
[217] C. Satten, “Lossless Gigabit Remote Packet Capture With Linux,” http://staff.washington.edu/corey/gulp/, 2007.
[218] NFDUMP, “NFDUMP Tool,” http://nfdump.sourceforge.net/, 2011.
[219] M. V. Mahoney and P. K. Chan, “An Analysis of the 1999 DARPA/Lincoln Laboratory Evaluation Data for Network Anomaly Detection,” in Proc. 6th International Symposium on Recent Advances in Intrusion Detection. Springer, 2003, pp. 220–237.
[220] J. McHugh, “Testing Intrusion detection systems: a critique of the 1998 and 1999 DARPA intrusion detection system evaluations as performed by Lincoln Laboratory,” ACM Trans. Inf. System Security, vol. 3, no. 4, pp. 262–294, November 2000.
[221] P. Mell, V. Hu, R. Lippmann, J. Haines, and M. Zissman, “An Overview of Issues in Testing Intrusion Detection Systems,” http://citeseer.ist.psu.edu/621355.html, 2003.
[222] J. Xu and C. R. Shelton, “Intrusion Detection using Continuous Time Bayesian Networks,” J. Artificial Intelligence Research, vol. 39, pp. 745–774, 2010.
[223] S. Axelsson, “The base-rate fallacy and the difficulty of intrusion detection,” ACM Trans. Inf. System Security, vol. 3, no. 3, pp. 186–205, August 2000.
[224] R. P. Lippmann, D. J. Fried, I. Graf, J. Haines, K. Kendall, D. McClung, D. Weber, S. W. D. Wyschogord, R. K. Cunningham, and M. A. Zissman, “Evaluating Intrusion Detection Systems: The 1998 DARPA Offline Intrusion Detection Evaluation,” in Proc. DARPA Information Survivability Conference and Exposition, January 2000, pp. 12–26.
[225] M. V. Joshi, R. C. Agarwal, and V. Kumar, “Predicting rare classes: can boosting make any weak learner strong?” in Proc. 8th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. USA: ACM, 2002, pp. 297–306.
[226] P. Dokas, L. Ertoz, A. Lazarevic, J. Srivastava, and P. N. Tan, “Data Mining for Network Intrusion Detection,” in Proc. NSF Workshop on Next Generation Data Mining, November 2002.
[227] Y. Wang, Statistical Techniques for Network Security: Modern Statistically-Based Intrusion Detection and Protection. Hershey, PA: Information Science Reference, IGI Publishing, 2008.
[228] S. M. Weiss and T. Zhang, The handbook of data mining. Lawrence Erlbaum Assoc Inc, 2003, ch. Performance Analysis and Evaluation, pp. 426–439.
[229] F. J. Provost and T. Fawcett, “Robust Classification for Imprecise Environments,” Machine Learning, vol. 42, no. 3, pp. 203–231, 2001.
[230] R. A. Maxion and R. R. Roberts, “Proper Use of ROC Curves in Intrusion/Anomaly Detection,” School of Computing Science, University of Newcastle upon Tyne, Tech. Rep. CS-TR-871, November 2004.
[231] R. Sekar, Y. Guang, S. Verma, and T. Shanbhag, “A high-performance network intrusion detection system,” in Proc. 6th ACM Conference on Computer and Communications Security. USA: ACM, 1999, pp. 8–17.

Monowar Hussain Bhuyan received his M.Tech. in Information Technology from the Department of Computer Science and Engineering, Tezpur University, Assam, India in 2009. Currently, he is pursuing his Ph.D. in Computer Science and Engineering from the same university. He is a life member of IETE, India. His research areas include machine learning, computer and network security, and pattern recognition. He has published 15 papers in international journals and refereed conference proceedings.

Dhruba K. Bhattacharyya received his Ph.D. in Computer Science from Tezpur University in 1999. He is a Professor in the Computer Science & Engineering Department at Tezpur University. His research areas include Data Mining, Network Security and Content-based Image Retrieval. Prof. Bhattacharyya has published 150+ research papers in the leading international journals and conference proceedings. In addition, Dr Bhattacharyya has written/edited 8 books. He is a Programme Committee/Advisory Body member of several international conferences/workshops.

Jugal K. Kalita is a professor of Computer Science at the University of Colorado at Colorado Springs. He received his Ph.D. from the University of Pennsylvania. His research interests are in natural language processing, machine learning, artificial intelligence and bioinformatics. He has published 120 papers in international journals and refereed conference proceedings and has written two books.