NetFlow Anomaly Detection Through Parallel Cluster Density ...
1 Introduction
In recent years, research into new methods of anomaly detection within a network has
increased in prominence. The need for a fast, reliable method to identify possible
malicious activity has grown in response to emerging threats. Protecting confidential and
proprietary data is of paramount importance to any organization, both to meet legal and contractual obligations and because the stored data may not be the property of the company storing and handling it. Malicious activity such as botnets and port scans is increasing in frequency. These attacks, while simple, have the potential to allow unauthorized access to the network.
Within any organization, it is commonplace to use network monitoring and analysis tools to help detect anomalous behaviour on the network. Tools such as McAfee ePO and TippingPoint, for example, are signature based: they require a known example of a threat to be catalogued and a signature generated from it. The signature-based model, while highly precise, fails when a novel attack occurs (e.g. a zero-day vulnerability), since no previous signature exists. This limitation gave rise to anomaly-based detection mechanisms. These methods require no signature database, but instead model the “normal” traffic on a network and alert on any activity that falls outside these bounds of normality.
While much research has been conducted on various methods for anomaly based
systems using a variety of approaches [1–5], key limitations apply when attempting to
adapt these approaches to a real-time system. These include, most notably, computational
cost. In commonly used distance-based outlier detection mechanisms, the need for distance calculations for each new sample can be overwhelming for high-volume data. This gave rise to optimized algorithms designed to mitigate this limitation. Algorithms such as Fast Local Outlier Factor (FastLOF) [6] and Micro-Cluster based Outlier Detection (MCOD) [7] reduce the overall cost of the range queries with varying degrees of success.
It has been shown that time series analysis can be beneficial in the detection of network anomalies. Applying it over time windows of increasing size can normalize regular behaviour over time. However, at smaller time intervals it is possible for abnormal behaviour to straddle the boundary between time windows, so that it never breaks the detection threshold in any single window and becomes a false negative. This is particularly prevalent among anomalies that generate low numbers of NetFlow records; such behaviour may become hidden within other network traffic as time progresses, making detection increasingly unlikely.
Moreover, while a large increase in traffic may occur, small deviations from established traffic behaviour may also be an indication of unauthorized activity. For example, an increase in failed login attempts may produce little difference with respect to NetFlow volume, yet it may indicate someone trying to guess a password. Detecting such an instance would be of paramount importance, particularly if followed by a successful login attempt [8]. This paper proposes a solution to this problem. By running clustering algorithms in parallel, it is possible to gain a higher level of granularity while maintaining the normalization gained from an incrementally increasing time window. Concurrent instances can detect minor deviations from established behaviour, even when those deviations span the boundary between time windows.
In Section 2, an overview of related work is given. In Section 3, a brief overview of the technology used is presented. In Section 4, we propose a framework for the identification of anomalies within NetFlow data; the architecture is presented along with an overview of the algorithms created and used. In Section 5, the testing methodology is presented. In Section 6, results obtained from live data are presented and analysed. In Section 7, conclusions and future work are presented.
2 Related Works
Recent research into anomaly detection has largely focused on applying anomaly detection mechanisms to network data in order to identify anomalous behaviour. Many problems still exist, however. Performance is a key factor when trying to utilize anomaly detection techniques, and there are many examples where this is apparent. Methods such as principal component analysis [3], k-nearest neighbour [9] and ensemble techniques [10, 11] have been applied to this task with varying degrees of success.
However, their comparatively expensive operations have led to a rise in clustering techniques to reduce the calculations needed when applying anomaly detection to big data [12, 13]. Aggregated data such as NetFlow can be used to reduce the calculations further [14]. These techniques have limitations, however. Kumari et al. [15] examine a clustering technique for anomaly detection over a network, setting a distance-based threshold at the 100th-farthest data point from the obtained cluster centroids. This threshold is a common theme across multiple anomaly detection solutions [16–18]. However, it can be argued that using a common threshold over all clusters within real data is not representative of the various forms of traffic created; traffic from different applications does not behave in a similar manner.
This brings forward an interesting problem. While distance-based outliers have been shown to be of significance in a plethora of works [19–21], we propose another indicator of possible anomalous behaviour. By monitoring cluster density over a time series, changes in underlying behaviour can be detected. Rather than focusing only on samples that are anomalous via distance-based calculations, changes in the density of activity over time are also monitored.
Asmuss et al. [18] demonstrate the use and effectiveness of a time series based approach over live data. The aggregation of traffic is highly beneficial in this case, as it reduces the computational resources needed. Furthermore, it also provides a tangible benefit when comparing results across clusters. This idea of time-aware analysis has been used elsewhere as well [22], and has been shown to be a valuable tool in mapping continuous behaviour.
Practically speaking, however, this ability to generate a normalized view of traffic over time has some limitations. The potential for an anomaly to traverse through sequential time windows can lead to the anomaly threshold never being broken, thus producing a false negative. Presented in Section 4 is an architecture that utilizes concurrent time windows in order to mitigate the risk of this happening. Anomalies can be gathered from individual instances of the clustering algorithm, while the correlation of clustering behaviour across instances can also indicate anomalous activity.
3 Technology Overview
Cisco NetFlow is a system that amalgamates network traffic information into a format that describes the communications occurring on a network. Through a NetFlow-enabled device (Fig. 1), packet traces are identified and stored as a single flow representation of a specific set of communications between two devices. Flows are used for multiple tasks, such as network performance monitoring, and serve as a means of security evaluation when an incident has been detected. Visualization of NetFlow has also proven to be of tangible benefit [23]. Using this aggregated data for anomaly detection has numerous benefits, such as the reduced data size for processing and storage. NetFlow was used for the analysis in this paper because its use for network monitoring is highly common in the field [24] and, in this case, the existing infrastructure of NPD allowed NetFlow to be collected with relative ease.
Fig. 1. NetFlow Example
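As an illustrative sketch only (not part of the collection pipeline used in this work), a single flow of the kind shown in Fig. 1 could be represented with the attributes used later in this paper; the field names and example values are assumptions.

```python
from dataclasses import dataclass

@dataclass
class FlowRecord:
    """One aggregated flow between two endpoints, as exported by a
    NetFlow-enabled device. Field names are illustrative; exporters
    differ in the exact attribute set they emit."""
    src_ip: str
    dst_ip: str
    src_port: int
    dst_port: int
    protocol_id: int   # e.g. 6 = TCP, 17 = UDP
    src_bytes: int     # bytes sent by the source
    dst_bytes: int     # bytes sent by the destination

# A single DNS query from an internal host to an external resolver
# could be summarised as one flow record like this:
example = FlowRecord("10.0.0.12", "8.8.8.8", 53122, 53, 17, 74, 318)
```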
3.2 MCOD
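MCOD (Micro-Cluster based Outlier Detection) [7] is a distance-based outlier detection algorithm that reduces the cost of repeated range queries by grouping densely packed points into micro-clusters: a point that accumulates at least k neighbours within a distance threshold is treated as an inlier, while a point that never does is reported as an outlier. The minimal sketch below illustrates only this neighbour-count principle with a brute-force scan; the actual algorithm's micro-cluster maintenance and event queues are not reproduced, and the parameter names are assumptions.

```python
import math

def count_based_outliers(points, R, k):
    """Brute-force illustration of the principle behind MCOD: a point
    with at least k neighbours within distance R is an inlier (it can
    take part in a dense micro-cluster), while a point that never
    reaches k neighbours is an outlier. The real algorithm avoids this
    O(n^2) scan by maintaining micro-clusters and event queues."""
    outliers = []
    for i, p in enumerate(points):
        neighbours = sum(1 for j, q in enumerate(points)
                         if i != j and math.dist(p, q) <= R)
        if neighbours < k:
            outliers.append(i)
    return outliers

# Two dense groups plus one isolated point; only the isolated point is flagged.
pts = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1),
       (5.0, 5.0), (5.1, 5.0), (5.0, 5.1),
       (9.0, 9.0)]
print(count_based_outliers(pts, R=0.5, k=2))   # -> [6]
```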
4 Proposed Framework
When using algorithms such as k-nearest neighbour and MCOD, a common variable that must be tuned to the dataset is k. While its exact use may change, the principle remains consistent: k is used to describe the limit of normality within anomaly detection frameworks. It is a single variable used to tune the classification or detection rate within an algorithm. Within MCOD, it is the value at which a micro-cluster is generated and an outlier is reclassified as an inlier. Within the context of NetFlow anomaly detection, this is a severe limitation, as it assumes that all network traffic activity has an equal distribution across the network, which is an extremely unlikely assumption (discussed further in Section 5).
To mitigate this limitation, all cluster densities are monitored irrespective of the value of k. This allows abnormal increases and decreases in cluster density to be detected with respect to the time window being analysed.
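A minimal sketch of this kind of density bookkeeping is given below; it assumes each persisted cluster carries an identifier and a history of per-window member counts, and the z-score rule with a cut-off of 3.0 is an illustrative assumption rather than the criterion used in this work.

```python
from collections import defaultdict
from statistics import mean, pstdev

class ClusterDensityMonitor:
    """Tracks how many samples each persisted cluster absorbs per time
    window and flags windows whose density deviates strongly from that
    cluster's own history. The z-score rule and the 3.0 cut-off are
    illustrative assumptions, not values taken from the paper."""

    def __init__(self, min_history=5, z_threshold=3.0):
        self.history = defaultdict(list)   # cluster_id -> densities per window
        self.min_history = min_history
        self.z_threshold = z_threshold

    def observe(self, cluster_id, density):
        hist = self.history[cluster_id]
        anomalous = False
        if len(hist) >= self.min_history:
            mu, sigma = mean(hist), pstdev(hist)
            if sigma > 0 and abs(density - mu) / sigma > self.z_threshold:
                anomalous = True
        hist.append(density)
        return anomalous

monitor = ClusterDensityMonitor()
densities = [40, 42, 39, 41, 40, 43, 120]   # last window shows a sudden surge
print([monitor.observe("cluster-7", d) for d in densities])
# -> [False, False, False, False, False, False, True]
```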
Within the proposed architecture, MCODT is initialized with a time window of size t. MCODT clusters the data within this window, identifying anomalies using distance-based calculations. The clusters generated within this time window contain samples that correspond to a type of network behaviour (Table 5). In parallel, further windows are initialized at time steps of St using the same configuration as the initial algorithm.
Fig. 3. Commonly Used Notation
Once the clusters have been correlated and shown to be representative of the same class of network traffic, the cluster is persisted and given an ID. At the end of each MCODT time window, each cluster's density is measured and compared to its own historical activity and to its activity in the other instances.
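The correlation step could, for illustration, be sketched as a nearest-centroid matching between the clusters produced by one instance and those already persisted; the matching radius, helper names and data layout below are assumptions, not the implementation used here.

```python
import math

def correlate_clusters(persisted, new_clusters, match_radius):
    """Match clusters from one MCODT instance against already persisted
    clusters (from any instance) by centroid distance. Matched clusters
    keep their existing ID so densities can be compared across instances;
    unmatched ones receive a new ID.
    persisted: dict cluster_id -> centroid tuple
    new_clusters: list of (centroid, density) pairs"""
    def dist(a, b):
        return math.dist(a, b)

    next_id = max(persisted, default=-1) + 1
    assignments = []                      # (cluster_id, density)
    for centroid, density in new_clusters:
        best = min(persisted.items(),
                   key=lambda kv: dist(kv[1], centroid),
                   default=None)
        if best is not None and dist(best[1], centroid) <= match_radius:
            assignments.append((best[0], density))
        else:
            persisted[next_id] = centroid
            assignments.append((next_id, density))
            next_id += 1
    return assignments

persisted = {0: (0.10, 0.20), 1: (0.70, 0.75)}
new = [((0.11, 0.21), 38),    # close to cluster 0 -> same ID
       ((0.40, 0.40), 12)]    # no existing cluster nearby -> new ID
print(correlate_clusters(persisted, new, match_radius=0.05))
# -> [(0, 38), (2, 12)]
```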
5 Testing Methodology
To test the proposed method, NetFlow was collected over a 24-hour period from within NPD. This was live data, and no previous insight about this 24-hour period was held; it was unknown whether it contained anomalies or not, simulating real-world conditions. The NetFlow contained all communications, both internal and external, during this period: a total of 151,995,634 NetFlow samples. From these samples, 8 attributes were extracted (Table 1), and from these, 6 attributes were selected for use in the anomaly detection calculations:
• Source/Destination IP
• Destination Port
• Source Port
• Destination Bytes
• Source Bytes
• Protocol ID
These attributes were normalized using theoretical maximums as well as observed
maximums over a 3-month period (Table 2).
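A minimal sketch of such max-scaling is shown below, using theoretical maximums where a hard bound exists (ports, protocol ID) and observed maximums for unbounded attributes (byte counts); the observed values are placeholders rather than the figures in Table 2, and the encoding used for IP addresses is not shown.

```python
# Max-scaling of the selected flow attributes into [0, 1].
# Theoretical maximums are used where the attribute has a hard bound;
# observed maximums stand in for unbounded attributes. The observed
# values below are placeholders, not the figures from Table 2.
MAXIMUMS = {
    "src_port": 65535,           # theoretical
    "dst_port": 65535,           # theoretical
    "protocol_id": 255,          # theoretical (8-bit protocol number)
    "src_bytes": 1_500_000_000,  # observed over the collection period (placeholder)
    "dst_bytes": 1_500_000_000,  # observed over the collection period (placeholder)
}

def normalise(flow: dict) -> dict:
    """Scale each attribute by its maximum, clamping to 1.0 in case a
    live value exceeds the observed maximum."""
    return {key: min(flow[key] / MAXIMUMS[key], 1.0) for key in MAXIMUMS}

flow = {"src_port": 53122, "dst_port": 53, "protocol_id": 17,
        "src_bytes": 74, "dst_bytes": 318}
print(normalise(flow))
```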
For testing, a 24-hour example was chosen with no specific preference, and no previous assumptions existed about this data before testing. The architecture was configured with an initial time window size of 5 minutes, and parallel instances were configured to start at one-minute intervals after this, leading to a total of 5 MCODT instances processing the data in parallel. The configured variables are outlined in Table 3.
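For illustration, the staggering of the parallel instances described above can be sketched as follows; the start time is a placeholder for 00:00 on the day analysed, and the assumption that each instance processes its own consecutive windows is made only for the example.

```python
from datetime import datetime, timedelta

WINDOW = timedelta(minutes=5)   # initial time window size t
STEP = timedelta(minutes=1)     # offset St between parallel instances
NUM_INSTANCES = 5               # MCODT instances running concurrently

start = datetime(2017, 1, 1, 0, 0)   # placeholder for 00:00 on the day analysed

# Print the first two windows that each staggered instance covers.
for i in range(NUM_INSTANCES):
    first = start + i * STEP
    windows = [(first + n * WINDOW, first + (n + 1) * WINDOW) for n in range(2)]
    label = ", ".join(f"{a:%H:%M}-{b:%H:%M}" for a, b in windows)
    print(f"instance {i}: {label}")
# instance 0: 00:00-00:05, 00:05-00:10
# instance 1: 00:01-00:06, 00:06-00:11
# ...
```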
6 Results
In this section, we discuss the results of the proposed method of anomaly detection.
Anomalous samples that were identified at both stages of the proposed architecture are
outlined and analysed. Examples of normal activity of various types will also be
presented.
Due to the two-stage architecture of the proposed method, anomalies may be detected in two different manners. Firstly, samples that are anomalous by the distance-based calculations are outlined at the end of every time window. An anomaly outlined in this manner is a point that never meets the required density, irrespective of time. These are regarded as Point Anomalies. Due to the relatively small time window of the MCODT instances, the number of point anomalies detected within the first hour of processing was vast, and the number of additional anomalies fell rapidly over the course of the analysis. Because of this, focus was placed on point anomalies that were detected after the initial 12 hours of analysis. Table 4 outlines two such samples that were correctly identified as anomalies.
Sample A represents a simple DNS request, which at first seemed like a false positive.
However, upon investigation, it was shown that this DNS request was from an internal
asset to an external DNS server. This incident was of interest to security technicians
within the NPD Group. Sample B was an unauthorized UPnP (Universal Plug and Play)
device connected to the network. It has been well documented how network security can be affected by having a UPnP device hosted on a network [25]. Upon detection, the device was disconnected from the network.
Cluster densities were measured at regular intervals. Five instances of MCODT were run in parallel, each with the same configuration settings (Table 3). The initial instance, MCODTN0, was initialized at 00:00 on the day in question. One minute after this, MCODTN1 was initialized, followed by MCODTN2 and so on. This low level of analysis allows for the detection of possible malicious activity in as little as one minute after an incident. Table 5 outlines sample clusters, selected on the basis of being classified as an anomaly, and their activity over the course of one hour within the scope of all the independent instances. This amounts to 1440 densities measured for 1532 clusters generated and persisted over the course of testing.
Anomalies were detected in all instances of MCODT, and anomalies were shown in both the independent analysis and the correlative analysis. Of particular interest is one detected anomaly (Table 5, Sample 9). This anomaly appears in all instances of the algorithm and shows significant divergence from observed normal behaviour. The rapid increase, once investigated, was attributed to a single asset. It was shown to be connected to an external IP and proceeded to attempt to open a connection to the external asset, but never received any connection. This was of interest to the security team within NPD and was swiftly resolved. Sample 4 was also anomalous within the test. The cluster only appeared in one window, showing the highly transient nature of the event; the other instances did not detect sufficient activity to generate a cluster. It represents an extremely short burst of activity to an external device.
7 Conclusions and Future Work
This paper proposed an architecture designed to detect anomalies within NetFlow data. To achieve this at a micro level, a clustering algorithm was run in parallel instances to determine anomalies in cluster activity over a time series. The architecture was shown to be able to detect anomalies in live data without any previous knowledge of the data, and these anomalies were investigated and shown to be of security interest. The result is notable given that the testing was conducted on real-world, live data. However, compiling a complete effectiveness rating for the algorithm would require a considerable amount of resources if it were to be used on representative real-world data.
Future work would include refining the extensibility of the algorithm. Due to the abstraction of the density monitoring, it is possible to add attributes to MCODT's feature space in order to monitor changes not only in network traffic but also in other metrics that could contribute to the risk of a malicious attack, such as the vulnerability of an asset as determined by an external program.
References
1. Chen Z, Yeo CK, Francis BSL, Lau CT (2016) Combining MIC feature selection and
feature-based MSPCA for network traffic anomaly detection. In: 2016 Third Int. Conf.
Digit. Inf. Process. Data Min. Wirel. Commun. DIPDMWC. pp 176–181
2. Lin W-C, Ke S-W, Tsai C-F (2015) CANN: An intrusion detection system based on
combining cluster centers and nearest neighbors. Knowl-Based Syst 78:13–21. doi:
10.1016/j.knosys.2015.01.009
3. Fernandes Jr. G, Carvalho LF, Rodrigues JJPC, Proença Jr. ML (2016) Network anomaly
detection using IP flows with Principal Component Analysis and Ant Colony Optimization.
J Netw Comput Appl 64:1–11. doi: 10.1016/j.jnca.2015.11.024
4. Ciplinskas R, Paulauskas N (2016) Outlier Detection Method Use for the Network Flow
Anomaly Detection. Moksl - Liet Ateitis 8:327–333. doi: 10.3846/mla.2016.928
5. Wankhede R, Chole V (2016) Intrusion Detection System using Classification Technique.
Int J Comput Appl 139:25–28. doi: 10.5120/ijca2016909397
6. Goldstein M (2012) FastLOF: An Expectation-Maximization based Local Outlier detection
algorithm. In: 2012 21st Int. Conf. Pattern Recognit. ICPR. pp 2282–2285