Research
Clustering Techniques
Rawat [45] and others found that clustering techniques work by grouping the
observed data into clusters according to a given similarity or distance measure.
There are at least two approaches to clustering-based anomaly detection. In the
first approach, the anomaly detection model is trained on unlabeled data that
contain both normal and attack traffic. In the second approach, the model is
trained only on normal data, and a profile of normal activity is created. The
idea behind the first approach is that anomalous or attack data form a small
percentage of the total data. If this assumption holds, anomalies and attacks
can be detected from cluster sizes: large clusters correspond to normal data,
and the remaining data points, which are outliers, correspond to attacks.
The two typical unsupervised neural networks are self-organizing maps and
adaptive resonance theory. Both use similarity to group objects. They are well
suited to intrusion detection tasks in which normal behavior is densely
concentrated around one or two centers, while anomalous behavior and intrusions
spread through the space outside the normal clusters.

The self-organizing map (SOM) is trained by an unsupervised competitive
learning algorithm [26]. The aim of the SOM is to reduce the dimensionality of
the data for visualization: SOM outputs are clustered in a low-dimensional
(usually 2D or 3D) grid. A SOM usually consists of an input layer and a Kohonen
layer, a two-dimensional arrangement of neurons that maps n-dimensional input
onto two dimensions. Kohonen's SOM associates each input vector with a
representative output. During training, the network finds the node nearest to
each training case and moves that winning node (the neuron with minimum
distance) toward the input. In this way, the SOM maps similar input vectors
onto the same or nearby output units of the two-dimensional map, so the output
units self-organize into an ordered map in which units with similar weights end
up close together after training.

SOMs are the most popular neural networks trained for anomaly detection tasks.
For example, Kayacik et al. [28] created a three-layer hierarchy: in the first
layer, an individual SOM is associated with each basic TCP feature; the second
layer integrates the views provided by the first-level SOMs into a single view
of the problem; and the final layer is built for those neurons that win for
both attack and normal behaviors. Oh and Chae [39] proposed a real-time
intrusion detection system based on a SOM that groups similar data and
visualizes their clusters; the system labels the map produced by the SOM using
correlations between features. Jun et al. [24] introduced a methodology to
analyze the feature attributes of network traffic flow with several new
techniques, including a novel quantization model of TCP states. Combined with
data preprocessing, the authors constructed an anomaly detection algorithm
based on the self-organizing feature map (SOFM) and applied the detection
framework to the DARPA Intrusion Detection Evaluation Data.

Adaptive resonance theory (ART) embraces a series of neural network models
that perform unsupervised or supervised learning, pattern recognition, and
prediction. Unsupervised learning models include ART-1, ART-2, ART-3, and
Fuzzy ART. The supervised variants are named with the suffix "MAP", such as
ARTMAP, Fuzzy ARTMAP, and Gaussian ARTMAP. Amini et al. [1] compared the
performance of ART-1 (accepting binary inputs) and ART-2 (accepting continuous
inputs) on the KDD99 data. Liao et al. [29] deployed Fuzzy ART in an adaptive
learning framework suited to dynamically changing environments: changes in
normal behavior are efficiently accommodated while anomalous activities can
still be identified.
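As a concrete illustration of Kohonen's competitive-learning rule described above (find the winning node with the minimum distance to the input, then pull it and its grid neighbours toward that input), here is a minimal sketch in plain Python. The grid size, learning-rate decay, and Gaussian neighbourhood function are common textbook choices, not values taken from the cited papers.

```python
import math
import random

def train_som(data, rows=2, cols=2, epochs=50, lr=0.5, seed=0):
    """Minimal SOM: a rows x cols grid of weight vectors trained by
    competitive learning with a shrinking neighbourhood radius."""
    rng = random.Random(seed)
    dim = len(data[0])
    # One weight vector per grid node, randomly initialised in [0, 1).
    nodes = {(r, c): [rng.random() for _ in range(dim)]
             for r in range(rows) for c in range(cols)}
    for epoch in range(epochs):
        radius = max(rows, cols) * (1 - epoch / epochs)  # neighbourhood shrinks
        rate = lr * (1 - epoch / epochs)                 # learning rate decays
        for x in data:
            # Winner: the node whose weights are closest to the input.
            win = min(nodes, key=lambda n: sum((w - v) ** 2
                                               for w, v in zip(nodes[n], x)))
            for n, w in nodes.items():
                # Distance on the 2D grid decides how strongly a node is pulled.
                d = math.dist(n, win)
                if d <= radius:
                    influence = math.exp(-d * d / (2 * radius * radius + 1e-9))
                    nodes[n] = [wi + rate * influence * (xi - wi)
                                for wi, xi in zip(w, x)]
    return nodes

def best_match(nodes, x):
    """Map an input vector to its best-matching unit on the grid."""
    return min(nodes, key=lambda n: sum((w - v) ** 2
                                        for w, v in zip(nodes[n], x)))
```

After training, inputs that are similar land on the same or neighbouring grid units, which is the property the cited intrusion detection systems exploit when labelling map regions as normal or anomalous.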
K-Means
The K-means algorithm is a traditional clustering algorithm. It divides the
data into k clusters and guarantees that data within the same cluster are
similar, while data in different clusters have low similarity. The algorithm
first selects k data points at random as the initial cluster centers; each
remaining point is then added to the cluster with the highest similarity
according to its distance to the cluster center, and the center of each cluster
is recalculated. This process repeats until the cluster centers no longer
change, at which point the data are divided into k clusters. Unfortunately,
K-means clustering is sensitive to outliers, and the set of objects closest to
a centroid may be empty, in which case that centroid cannot be updated [16].
[30] proposed K-means algorithms for anomaly detection. First, a method to
reduce noise and isolated points in the data set was advanced. By dividing and
merging clusters and using the density radius of a super-sphere, an algorithm
to calculate the number of cluster centroids was given. With this more accurate
method of finding the k cluster centers, an anomaly detection model with better
detection performance was presented. Cuixiao et al. [7] proposed a mixed
intrusion detection system (IDS) model in which data are first examined by a
misuse detection module and abnormal data are then detected by an anomaly
detection module. In this model, an unsupervised clustering method is used to
build the anomaly detection module. The algorithm is an improved K-means
clustering algorithm, and it is demonstrated to achieve a high detection rate
in the anomaly detection module.
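The iterative procedure just described (choose k initial centers at random, assign each point to its nearest center, recompute the centers, and repeat until they stop changing) can be sketched in a few lines of plain Python; the helper names and the empty-cluster fallback are illustrative choices, not part of any cited system.

```python
import random

def kmeans(points, k, seed=0):
    """Plain K-means: random initial centers, then alternate the
    assignment and center-recomputation steps until convergence."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    while True:
        # Assignment step: each point joins the cluster of its nearest center.
        clusters = [[] for _ in range(k)]
        for p in points:
            i = min(range(k), key=lambda j: dist2(p, centers[j]))
            clusters[i].append(p)
        # Update step: each center moves to the mean of its cluster.
        # An empty cluster keeps its old center (it cannot be updated).
        new_centers = [mean(c) if c else centers[i]
                       for i, c in enumerate(clusters)]
        if new_centers == centers:  # centers stopped changing: done
            return new_centers, clusters
        centers = new_centers

def dist2(a, b):
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def mean(pts):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(pts)
    return tuple(sum(c) / n for c in zip(*pts))
```

In the anomaly detection setting described earlier, the returned cluster sizes would then be inspected: large clusters are treated as normal traffic, and points falling in small clusters are flagged as outliers.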
Fuzzy C-Means
Fuzzy C-means is a clustering method that allows one piece of data to belong to
two or more clusters. It was developed by Dunn [9] and later improved by Bezdek
[3], and it is used in applications where a hard classification of the data is
not meaningful or is difficult to achieve (e.g., pattern recognition). The
C-means algorithm is similar to K-means, except that the membership of each
point is defined by a fuzzy function, and all points contribute to the
relocation of a cluster centroid according to their fuzzy membership in that
cluster. Shingo et al. [52] proposed a new approach called FC-ANN, based on
ANNs and fuzzy clustering, to help IDSs achieve a higher detection rate, a
lower false-positive rate, and stronger stability. Yu and Jian [58] proposed an
approach integrating several soft-computing techniques to build a hierarchical
neuro-fuzzy inference intrusion detection system. In this approach, a principal
component analysis neural network is used to reduce the dimensionality of the
feature space. The preprocessed data were then clustered with an enhanced fuzzy
C-means clustering algorithm to extract and manage fuzzy rules. Another
approach that uses fuzzy unsupervised clustering is presented by Shah et al.
[50], who employed Fuzzy C-Medoids (FCMdd) to cluster streams of system calls,
low-level kernel data, and network data.
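The difference from K-means described above (each point belongs to every cluster with a fuzzy membership, and every point contributes to every centroid update) can be sketched as follows. The fuzziness exponent m = 2 and the fixed iteration count are common textbook defaults, not values from the cited systems.

```python
import random

def fuzzy_cmeans(points, c, m=2.0, iters=100, seed=0):
    """Minimal fuzzy C-means: alternate the membership update and the
    membership-weighted centroid update for a fixed number of iterations."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, c)]
    u = [[0.0] * len(points) for _ in range(c)]  # u[i][j]: membership of point j in cluster i
    for _ in range(iters):
        # Membership update: closer centers receive higher (soft) membership.
        for j, p in enumerate(points):
            d = [max(dist(p, ct), 1e-12) for ct in centers]  # guard div-by-zero
            for i in range(c):
                u[i][j] = 1.0 / sum((d[i] / dk) ** (2.0 / (m - 1.0)) for dk in d)
        # Centroid update: every point pulls every center, weighted by u^m.
        for i in range(c):
            w = [u[i][j] ** m for j in range(len(points))]
            tot = sum(w)
            centers[i] = [sum(wj * p[k] for wj, p in zip(w, points)) / tot
                          for k in range(len(points[0]))]
    return centers, u

def dist(a, b):
    """Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5
```

The returned membership matrix is what distinguishes this from K-means: each column sums to one across clusters, giving a soft assignment instead of a hard label.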
One-Class Support Vector Machine
The one-class support vector machine is a specialized variant of the support
vector machine geared toward anomaly detection. The one-class SVM differs from
the generic SVM in that the resulting quadratic optimization problem allows for
a small, predefined percentage of outliers, making it suitable for anomaly
detection. These outliers lie between the origin and the optimal separating
hyperplane. All the remaining data fall on the opposite side of the optimal
separating hyperplane and belong to a single nominal class, hence the
terminology "one-class" SVM. The SVM outputs a score that represents the
distance from the data point being tested to the optimal hyperplane. Positive
one-class SVM outputs represent normal behavior (with higher values
representing greater normality) and negative values represent abnormal
behavior (with lower values representing greater abnormality) [42]. Eskin et
al. [11] and Honig et al. [19] used an SVM in addition to their clustering
methods for unsupervised learning. The SVM algorithm had to be modified
slightly to work in the unsupervised learning domain; once modified, it
performed better than both of their clustering methods. Shon and Moon [53]
suggested a new SVM approach, named Enhanced SVM, which merges the soft-margin
SVM and one-class SVM methods to provide unsupervised learning with a low
false-alarm rate, similar to that of a supervised SVM approach. Rui et al. [46]
proposed a method for network anomaly detection based on the one-class support
vector machine (OCSVM). The method has two main steps: first, detector
training, in which the training data set is used to generate the OCSVM
detector that learns the nominal profile of the data; and second, detection of
anomalies in the performance data with the trained detector.
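As a rough sketch of the geometry described above (normal data separated from the origin, with a small fraction nu of points allowed on the origin side, and a score that is positive for normal points), the following applies subgradient descent to the linear one-class SVM primal in plain Python. This assumes nonnegative feature vectors lying away from the origin; the learning rate and epoch count are arbitrary illustrative values, and a practical detector such as those cited would use a kernelized library solver instead.

```python
def train_ocsvm(data, nu=0.1, lr=0.01, epochs=500):
    """Subgradient descent on the linear one-class SVM primal:
        minimize 0.5*||w||^2 + (1/(nu*n)) * sum(max(0, rho - w.x)) - rho
    The score w.x - rho is positive for normal points, negative for outliers."""
    n = len(data)
    w = [0.0] * len(data[0])
    rho = 0.0
    for _ in range(epochs):
        # Violators: points on the wrong (origin) side of the hyperplane.
        viol = [x for x in data if dot(w, x) < rho]
        grad_w = [wi - sum(x[k] for x in viol) / (nu * n)
                  for k, wi in enumerate(w)]
        grad_rho = len(viol) / (nu * n) - 1.0
        w = [wi - lr * g for wi, g in zip(w, grad_w)]
        rho -= lr * grad_rho
    return w, rho

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def score(w, rho, x):
    # Signed distance surrogate: positive means normal, negative means anomalous.
    return dot(w, x) - rho
```

The `nu` parameter plays the role of the predefined outlier percentage mentioned above: it bounds the fraction of training points allowed to fall between the origin and the separating hyperplane.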
Advantages and disadvantages of the discussed techniques:

K-Nearest Neighbor
Advantages:
● Simplicity: KNN is a simple algorithm and easy to understand and implement.
● No training phase.
● Non-parametric.
● Adaptability.
● No model building.
Disadvantages:
● High computational cost with large datasets and numerous features.
● Significant memory usage, as it memorizes the entire training dataset.
● Sensitivity to noise and irrelevant features, requiring preprocessing steps.
● Dependency on selecting an optimal K value, which affects performance.
● Tendency to favor majority classes in imbalanced datasets, leading to biased predictions.

Neural Network
Advantages:
● Powerful for complex relationships.
● Adaptable to various tasks.
● Automatic feature extraction.
● Parallel processing capability.
● Robust to noise.
Disadvantages:
● Computationally intensive.
● Black-box nature.
● Data dependency.
● Prone to overfitting.
● Requires hyperparameter tuning.

Decision Tree
Advantages:
● Interpretable.
● No data preprocessing needed.
● Efficient.
● Can handle nonlinear relationships.
● Provides feature importance.
Disadvantages:
● Prone to overfitting.
● Instability.
● Limited expressiveness.
● Bias towards features with many levels.
● Difficulty with continuous variables.

Support Vector Machine
Advantages:
● Effective in high-dimensional spaces.
● Versatile with various kernel functions.
● Robust to overfitting due to margin maximization.
● Works well with small to medium-sized datasets.
● Effective in cases where the number of features exceeds the number of samples.
Disadvantages:
● Computationally intensive, especially for large datasets.
● Requires proper selection of the kernel and tuning of hyperparameters.
● Doesn't provide probability estimates directly.
● Sensitive to noise and outliers.

Self-Organizing Map
Advantages:
● Unsupervised learning with preservation of topological properties.
● Effective for dimensionality reduction and visualization.
● Can handle non-linear relationships in data.
● Robust to noise.
● Can reveal hidden structures in data.
Disadvantages:
● Initialization sensitivity.
● Tendency to converge to local minima.
● Need for tuning parameters such as the learning rate and neighborhood size.
● Requires careful interpretation of results.
● Computationally intensive for large datasets.

K-Means
Advantages:
● Simple and easy to implement.
● Scalable to large datasets.
● Efficient computational complexity.
● Can handle large feature spaces.
● Clusters can be easily interpreted.
Disadvantages:
● Requires the number of clusters (K) to be specified in advance.
● Sensitive to the initial cluster centroids.
● May converge to local optima.
● Assumes spherical clusters of similar sizes.
● Doesn't work well with non-linear data distributions.

Fuzzy C-Means
Advantages:
● Provides soft clustering, assigning membership probabilities to clusters.
● More robust to noise and outliers compared to K-means.
● Can handle overlapping clusters.
● Allows gradual transition between clusters.
● No need to specify the number of clusters precisely.
Disadvantages:
● Sensitive to the choice of initial cluster centers.
● Computationally intensive, especially for large datasets.
● Interpretation of cluster membership is more complex than in K-means.
● Requires tuning parameters such as the fuzziness coefficient.
● May not perform well with non-convex clusters.

Expectation Maximization
Advantages:
● General framework for unsupervised learning, applicable to various probabilistic models.
● Handles missing data well.
● Provides soft clustering with probability distributions.
● Can model complex data distributions.
● Guarantees convergence to a local optimum.
Disadvantages:
● Sensitive to the initialization of parameters.
● Computationally intensive, especially for large datasets.
● May converge to local optima.
● Requires assumptions about the data distribution (e.g., Gaussian).
● Interpretation of results can be complex, especially with high-dimensional data.
Conclusion
Machine learning techniques have received considerable attention among
intrusion detection researchers as a way to address the weaknesses of
knowledge-based detection techniques. These techniques differ, however, in
their performance and in their capability to detect all attack classes
efficiently.
References