Research

Unsupervised machine learning techniques like clustering algorithms can be used for anomaly detection in network traffic without labeled data. K-means clustering partitions traffic data into clusters of normal behavior and flags outliers as potential anomalies. DBSCAN also identifies dense clusters of normal traffic and outliers that may represent security threats. These techniques analyze features of network traffic to group typical patterns and detect abnormal deviations without needing examples of attacks or anomalies.

Uploaded by

neerajkumawat

Abstract

The advent of IoT technology and the growing number of wireless
networking devices have led to an enormous increase in network attacks
from different sources. To keep networks safe and secure, Intrusion
Detection Systems (IDS) have become critical. An IDS is designed to
protect the network by identifying anomalous behavior or improper use.
It provides finer-grained security functionality than access control
barriers by detecting attempted and successful attacks at the end-point
or within the network. Intrusion prevention systems are the next logical
step in this approach, as they can take real-time action against
breaches. An accurate IDS requires detailed visibility into the network
traffic, and it should be able to detect threats inside the network as
well as access control breaches. IDS have been around for a very long
time.
As a branch of cybersecurity, intrusion detection can use unsupervised
machine learning as part of its methodology. Unsupervised machine
learning is often employed in IDS for anomaly detection, where the
system learns normal behavior patterns and raises alerts when it detects
deviations that may indicate potential security threats. This allows the
IDS to adapt to evolving attack patterns without relying on predefined
signatures or labeled training data. Unsupervised methods, such as
clustering algorithms like k-means, density-based techniques like
DBSCAN, and Isolation Forest, are commonly used to identify patterns,
detect intrusions, or flag unusual behavior in network data. These
systems analyze various features of network traffic, such as packet
headers, traffic volume, and protocol behavior, to identify suspicious
activity. However, they may also generate false positives and require
careful tuning to balance detection accuracy and performance.
Keywords: Internet of Things, wireless networking, network attacks and
traffic, unsupervised machine learning, intrusion detection systems
Introduction
Unsupervised machine learning plays a crucial role in anomaly
detection in network traffic. Algorithms like k-means clustering,
DBSCAN, and isolation forests can identify unusual patterns or outliers
in network data without the need for labeled examples. By analyzing
various features such as packet size, protocol type, and timing, these
algorithms can flag potentially malicious activity or unusual behavior,
helping to enhance network security.
K-means clustering serves as a powerful unsupervised learning
technique for identifying anomalies in network traffic by partitioning the
data into clusters and flagging data points that deviate significantly from
the norm.
Grouping Normal Behavior: Initially, K-means clustering can be applied
to partition network traffic data into clusters representing typical or
normal behavior. The centroids of these clusters represent the normal
patterns of network traffic.
Detecting Anomalies: K-means assigns every data point to its nearest
centroid, so once the clusters are established, a data point that lies
significantly farther from its centroid than the typical members of that
cluster can be considered an anomaly. These outliers could indicate
potential security threats, network intrusions, or abnormal behavior.
Thresholding: By setting thresholds based on distances from cluster
centroids or other statistical measures, anomalies can be detected in
real-time as new data comes in. If a data point falls outside these
thresholds, it is flagged as an anomaly, triggering alerts or further
investigation.
Dynamic Updates: Periodically, the clustering algorithm can be retrained
on the latest data to adapt to changes in network behavior. This ensures
that the model remains effective in detecting both known and emerging
anomalies.
Feature Engineering: Feature selection and engineering play a crucial
role in the effectiveness of K-means clustering for anomaly detection.
Choosing relevant network traffic attributes (features) such as packet
size, protocol type, source and destination IP addresses, etc., can
enhance the clustering performance and anomaly detection accuracy.
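
The thresholding step above can be sketched in a few lines. This is a minimal illustration, not a reference implementation: it assumes the centroids have already been fitted, and the rule "mean plus three standard deviations of the training distances", the feature choice and all sample values are hypothetical.

```python
import math

def nearest_centroid_distance(point, centroids):
    """Return the Euclidean distance from `point` to its closest centroid."""
    return min(math.dist(point, c) for c in centroids)

def fit_threshold(train_points, centroids, k_sigma=3.0):
    """Alert threshold = mean + k_sigma * std of the training distances (assumed rule)."""
    d = [nearest_centroid_distance(p, centroids) for p in train_points]
    mean = sum(d) / len(d)
    std = math.sqrt(sum((x - mean) ** 2 for x in d) / len(d))
    return mean + k_sigma * std

def is_anomaly(point, centroids, threshold):
    """Flag a point whose distance to every centroid exceeds the threshold."""
    return nearest_centroid_distance(point, centroids) > threshold

# Hypothetical, already scaled two-feature flows (e.g. packet size, inter-arrival time).
centroids = [(1.0, 1.0), (8.0, 8.0)]
train = [(0.9, 1.1), (1.2, 0.8), (1.0, 1.3), (8.1, 7.9), (7.8, 8.2), (8.3, 8.0)]
threshold = fit_threshold(train, centroids)

print(is_anomaly((1.1, 0.9), centroids, threshold))  # False: close to the first centroid
print(is_anomaly((4.5, 4.5), centroids, threshold))  # True: far from both centroids
```

Scoring a new flow only needs the centroids and one number, which is what makes this kind of check cheap enough to run as data comes in.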
DBSCAN is effective for anomaly detection in network traffic by
identifying dense clusters of normal behavior and isolating outliers that
represent potential anomalies or malicious activities. It offers advantages
such as noise handling, adaptability to varying cluster shapes and sizes,
and scalability, making it a valuable tool for network security
applications.
DBSCAN (Density-Based Spatial Clustering of Applications with
Noise) is another clustering algorithm that can be used for anomaly
detection in network traffic. Here's how it works in this context:
Density-Based Clustering: DBSCAN works by grouping together closely
packed points in high-density regions. In the context of network
traffic, this means that it can identify clusters of data points that
represent normal behavior or typical patterns of communication.
Identification of Outliers: DBSCAN also identifies points that lie in
low-density regions, which are typically considered outliers or
anomalies. These outliers can represent unusual or potentially
malicious activity in the network.
Parameter Tuning: DBSCAN requires two parameters: epsilon (ε), which
defines the radius within which to search for neighboring points, and
minPts, which specifies the minimum number of points required to form a
dense region. Tuning these parameters is crucial for effective anomaly
detection.
Noise Handling: DBSCAN inherently handles noise in the data by
classifying points that do not belong to any cluster as noise. These
noisy points are potential anomalies in the network traffic.
Adaptability to Varying Cluster Shapes and Sizes: Unlike K-means, which
assumes spherical clusters of similar sizes, DBSCAN can detect clusters
of arbitrary shapes and sizes, making it more suitable for detecting
anomalies in complex network traffic patterns.
Scalability: DBSCAN is relatively scalable and efficient when its
neighborhood queries are supported by a spatial index, making it
suitable for analyzing large volumes of network traffic data in
real-time.
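
To make the roles of ε and minPts concrete, here is a small pure-Python sketch of DBSCAN. It uses a brute-force neighborhood search, so it only illustrates the logic and is not suitable for large traffic volumes; the sample points and parameter values are hypothetical.

```python
import math

NOISE = -1

def region_query(points, i, eps):
    """Indices of all points within distance eps of points[i] (including i itself)."""
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

def dbscan(points, eps, min_pts):
    """Return one label per point: 0, 1, ... for clusters and -1 for noise."""
    labels = [None] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = NOISE              # may later be relabeled as a border point
            continue
        cluster += 1                       # points[i] is a core point: start a cluster
        labels[i] = cluster
        queue = [j for j in neighbors if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster        # border point reached from a core point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:    # j is also a core point: keep expanding
                queue.extend(j_neighbors)
    return labels

# Two dense groups of hypothetical flows plus one isolated point.
points = [(1.0, 1.0), (1.1, 1.0), (1.0, 1.1), (1.1, 1.1),
          (5.0, 5.0), (5.1, 5.0), (5.0, 5.1), (5.1, 5.1),
          (9.0, 1.0)]
labels = dbscan(points, eps=0.2, min_pts=3)
print(labels)  # [0, 0, 0, 0, 1, 1, 1, 1, -1]
```

The isolated point receives the noise label -1, which is the value an IDS would treat as a candidate anomaly.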
Anomaly detection in network traffic using unsupervised machine
learning involves identifying unusual patterns that deviate from normal
behavior. Techniques like clustering or autoencoders can help detect
anomalies by learning the typical patterns in network data.
1) Baseline Establishment: Create a baseline of normal network behavior
by analyzing historical data to understand regular patterns and
characteristics.
2) Unsupervised Learning: Utilize unsupervised machine learning
techniques such as clustering or autoencoders to detect anomalies
without relying on predefined labels or signatures.
3) Feature Extraction: Identify relevant features from network traffic
data, considering factors like packet size, frequency, protocols, and
communication patterns.
4) Real-time Monitoring: Continuously monitor network traffic in
real-time to promptly identify deviations or anomalies as they occur.
5) Thresholds and Alarms: Set thresholds based on normal behavior and
trigger alarms or alerts when network activities surpass these
thresholds, indicating potential anomalies.
6) Behavioral Analysis: Use behavioral analysis to understand the typical
interactions between devices, services, or users, making it easier to
identify deviations.
7) Adaptive Models: Implement adaptive models that can adjust to
changes in network patterns and adapt to evolving threats over time.
8) Integration with Security Information and Event Management (SIEM):
Integrate anomaly detection with SIEM solutions for comprehensive
security monitoring, analysis, and response.
9) Feedback Loop: Establish a feedback loop to continuously improve
the anomaly detection model based on the evolving nature of network
traffic and potential new threats.
10) Incident Response: Develop a robust incident response plan to
address and mitigate security incidents identified through anomaly
detection.
The effectiveness of anomaly detection relies on a combination of these
techniques and continuous refinement based on the evolving nature of
network behavior and potential threats.
Anomaly detection techniques
The most common unsupervised algorithms are K-Means, Self-organizing maps (SOM),
C-means, the Expectation-Maximization meta-algorithm (EM), Adaptive resonance theory
(ART), Unsupervised Niche Clustering (UNC) and the One-Class Support Vector Machine.

Clustering Techniques

Rawat [45] and many others found that clustering techniques work by grouping the
observed data into clusters according to a given similarity or distance measure. There
exist at least two approaches to clustering-based anomaly detection. In the first
approach, the anomaly detection model is trained using unlabeled data that consist of
both normal and attack traffic. In the second approach, the model is trained using
only normal data, and a profile of normal activity is created. The idea behind the first
approach is that anomalous or attack data form a small percentage of the total data. If
this assumption holds, anomalies and attacks can be detected based on cluster sizes:
large clusters correspond to normal data, and the rest of the data points, which are
outliers, correspond to attacks.
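
The cluster-size rule of the first approach can be illustrated as follows. The sketch assumes that cluster labels have already been produced by some clustering algorithm; the 10% cut-off (`min_fraction`) and the example labels are hypothetical choices, not values from the cited work.

```python
from collections import Counter

def flag_small_clusters(labels, min_fraction=0.1):
    """Mark every point whose cluster holds less than min_fraction of all points."""
    counts = Counter(labels)
    n = len(labels)
    small = {c for c, size in counts.items() if size / n < min_fraction}
    return [label in small for label in labels]

# Hypothetical labels: two large clusters and one cluster with a single member.
labels = [0] * 12 + [1] * 7 + [2]
flags = flag_small_clusters(labels)
print(flags.count(True))  # 1
```

The rule is only as good as the assumption behind it: if attack traffic is frequent enough to form a large cluster of its own, it will be treated as normal.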

Unsupervised Neural Network

The two typical unsupervised neural networks are self-organizing maps and adaptive
resonance theory. Both use similarity to group objects. They are adequate for intrusion
detection tasks where normal behavior is densely concentrated around one or two
centers, while anomalous behavior and intrusions spread in space outside of the normal
clusters.
The self-organizing map (SOM) is trained by an unsupervised competitive learning
algorithm [26]. The aim of the SOM is to reduce the dimensionality of the data for
visualization. That is, SOM outputs are clustered in a low-dimensional (usually 2D or
3D) grid. It usually consists of an input layer and the Kohonen layer, which is designed
as a two-dimensional arrangement of neurons that maps n-dimensional input to two
dimensions. Kohonen's SOM associates each of the input vectors with a representative
output. For each training case, the network finds the nearest node, that is, the winning
neuron with the minimum distance to the input, and moves it toward that training case
during training. As a result, SOM maps similar input vectors onto the same or similar
output units on the two-dimensional map, so that the output units self-organize into
an ordered map in which units with similar weights are placed near each other after
training. SOMs are the most popular neural networks to be trained for anomaly
detection tasks. For example, Kayacik et al. [28] built a three-layer hierarchy: in the
first layer, an individual SOM is associated with each basic TCP feature; the second
layer integrates the views provided by the first-level SOMs into a single view of the
problem; and the final layer is built for those neurons that win for both attack and
normal behaviors. Oh and Chae [39] proposed a real-time intrusion detection system
based on SOM that groups similar data and visualizes their clusters. The system labels
the map produced by SOM using correlations between features. Jun et al. [24]
introduced a novel methodology to analyze the feature attributes of network traffic
flow with some new techniques, including a novel quantization model of TCP states.
Integrating this with data preprocessing, the authors constructed an anomaly detection
algorithm with SOFM and applied the detection framework to the DARPA Intrusion
Detection Evaluation Data.
Adaptive Resonance Theory (ART). The adaptive resonance theory embraces a series of
neural network models that perform unsupervised or supervised learning, pattern
recognition, and prediction. Unsupervised learning models include ART-1, ART-2,
ART-3, and Fuzzy ART. Various supervised networks are named with the suffix "MAP",
such as ARTMAP, Fuzzy ARTMAP, and Gaussian ARTMAP. Amini et al. [1] compared the
performance of ART-1 (accepting binary inputs) and ART-2 (accepting continuous inputs)
on KDD99 data. Liao et al. [29] deployed Fuzzy ART in an adaptive learning framework
which is suitable for dynamically changing environments. Changes in normal behavior
are efficiently accommodated while anomalous activities can still be identified.
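
The SOM training step described earlier in this section (find the winning node, then move it and its grid neighbors toward the input) can be sketched as below. This is a simplified illustration and not the method of any cited paper: it uses a one-dimensional chain of four nodes instead of a 2D Kohonen layer, a linearly decaying learning rate and Gaussian neighborhood, and hypothetical data. Using the distance to the best-matching node as an anomaly score is likewise an assumption made for the example.

```python
import math

def best_matching_unit(weights, x):
    """Index of the node whose weight vector is closest to input x."""
    return min(range(len(weights)), key=lambda i: math.dist(weights[i], x))

def train_som(data, weights, epochs=20, lr=0.5, radius=1.0):
    """Competitive learning on a 1-D chain of nodes; returns the trained weights."""
    weights = [list(w) for w in weights]
    for epoch in range(epochs):
        decay = 1.0 - epoch / epochs       # shrink step size and neighborhood over time
        sigma = radius * decay
        for x in data:
            bmu = best_matching_unit(weights, x)
            for i, w in enumerate(weights):
                # Nodes close to the winner on the chain are pulled along with it.
                h = math.exp(-((i - bmu) ** 2) / (2 * sigma ** 2))
                for d in range(len(w)):
                    w[d] += lr * decay * h * (x[d] - w[d])
    return weights

def quantization_error(weights, x):
    """Distance from x to its best-matching node; large values suggest an anomaly."""
    return math.dist(weights[best_matching_unit(weights, x)], x)

# Hypothetical normal traffic concentrated around two centers.
normal = [(0.0, 0.2), (0.2, 0.0), (0.1, 0.1), (0.3, 0.2),
          (9.8, 10.0), (10.0, 9.9), (10.1, 10.2), (9.9, 10.1)]
som = train_som(normal, [(2.0, 2.0), (4.0, 4.0), (6.0, 6.0), (8.0, 8.0)])

print(quantization_error(som, (0.2, 0.1)))   # small: resembles the training data
print(quantization_error(som, (5.0, -5.0)))  # large: unlike anything seen in training
```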

K-Means

The K-means algorithm is a traditional clustering algorithm. It divides the data into k
clusters so that the data within the same cluster are similar, while the data in
different clusters have low similarity. K-means first selects K data points at random
as the initial cluster centers. Each remaining data point is added to the cluster with
the highest similarity according to its distance to the cluster center, and the center
of each cluster is then recalculated. This process is repeated until the cluster centers
no longer change, at which point the data are divided into K clusters. Unfortunately,
K-means clustering is sensitive to outliers, and the set of objects closest to a centroid
may be empty, in which case that centroid cannot be updated [16]. The authors of [30]
proposed a K-means algorithm for anomaly detection. First, a method to reduce the
noise and isolated points in the data set was put forward. By dividing and merging
clusters and using the density radius of a super sphere, an algorithm to calculate the
number of cluster centroids was given. With this more accurate method of finding the k
cluster centers, an anomaly detection model was presented that achieves a better
detection effect. Cuixiao et al. [7] proposed a mixed intrusion detection system (IDS)
model. Data are examined by the misuse detection module, and then the detection of
abnormal data is performed by the anomaly detection module. In this model, an
unsupervised clustering method is used to build the anomaly detection module. The
algorithm used is an improved version of the K-means clustering algorithm, and it is
demonstrated to have a high detection rate in the anomaly detection module.
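
The basic K-means procedure described at the start of this section can be sketched in pure Python as follows. It is the plain textbook iteration, not the improved variants of [30] or [7]. Keeping the old center when a cluster becomes empty is one possible way to handle the empty-cluster problem mentioned above and is an assumption of this sketch, as are the seed and the sample data.

```python
import math
import random

def kmeans(points, k, max_iter=100, seed=0):
    """Plain K-means; returns (centroids, clusters) from the last assignment step."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)          # K data points chosen at random
    clusters = [[] for _ in range(k)]
    for _ in range(max_iter):
        # Assignment step: each point joins the cluster of its nearest center.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: each center becomes the mean of its members.
        new_centroids = []
        for i, members in enumerate(clusters):
            if members:
                new_centroids.append(
                    tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                new_centroids.append(centroids[i])   # empty cluster: keep old center
        if new_centroids == centroids:               # centers stopped changing
            break
        centroids = new_centroids
    return centroids, clusters

# Hypothetical two-feature flows forming two groups.
points = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (1.1, 1.2),
          (8.0, 8.0), (8.2, 7.9), (7.9, 8.1), (8.1, 8.2)]
centroids, clusters = kmeans(points, k=2)
print(centroids)
print([len(c) for c in clusters])
```

Because the result depends on the random initial centers, it is common to run the algorithm several times with different seeds and keep the best partition.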

Fuzzy C-Means (FCM)

Fuzzy C-means is a clustering method that allows one piece of data to belong to two
or more clusters. It was developed by Dunn [9] and later improved by Bezdek [3]. It is
used in applications for which hard classification of data is not meaningful or is
difficult to achieve (e.g., pattern recognition). The C-means algorithm is similar to
K-means except that the membership of each point is defined by a fuzzy function, and
all the points contribute to the relocation of a cluster centroid according to their
fuzzy membership in that cluster. Shingo et al. [52] proposed a new approach called
FC-ANN, based on ANN and fuzzy clustering, to help IDS achieve a higher detection
rate, a lower false positive rate and stronger stability. Yu and Jian [58] proposed an
approach integrating several soft computing techniques to build a hierarchical
neuro-fuzzy inference intrusion detection system. In this approach, a principal
component analysis neural network is used to reduce the dimensionality of the feature
space. The preprocessed data were clustered by applying an enhanced fuzzy C-means
clustering algorithm to extract and manage fuzzy rules. Another fuzzy approach to
unsupervised clustering is presented by Shah et al. [50]. They employed Fuzzy
C-Medoids (FCMdd) in order to index cluster streams of system calls, low-level kernel
data and network data.
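
One iteration of the standard fuzzy C-means updates is sketched below to show how every point contributes to every centroid. The fuzziness coefficient m = 2, the starting centers and the data are hypothetical; this is the textbook update, not the enhanced variants used in the cited systems.

```python
import math

def fcm_memberships(points, centers, m=2.0):
    """u[j][i]: degree to which point j belongs to cluster i (each row sums to 1)."""
    u = []
    for p in points:
        d = [max(math.dist(p, c), 1e-12) for c in centers]   # avoid division by zero
        row = [1.0 / sum((d[i] / d[k]) ** (2.0 / (m - 1.0)) for k in range(len(centers)))
               for i in range(len(centers))]
        u.append(row)
    return u

def fcm_centers(points, u, m=2.0):
    """Each center is the mean of all points weighted by membership ** m."""
    n_clusters, dim = len(u[0]), len(points[0])
    centers = []
    for i in range(n_clusters):
        w = [u[j][i] ** m for j in range(len(points))]
        centers.append(tuple(
            sum(w[j] * points[j][d] for j in range(len(points))) / sum(w)
            for d in range(dim)))
    return centers

# Hypothetical points on a line; the middle one sits exactly between both centers.
points = [(0.0, 0.0), (1.0, 0.0), (5.0, 0.0), (9.0, 0.0), (10.0, 0.0)]
centers = [(0.0, 0.0), (10.0, 0.0)]
u = fcm_memberships(points, centers)
new_centers = fcm_centers(points, u)
print(u[2])          # [0.5, 0.5]: the middle point belongs equally to both clusters
print(new_centers)
```

In a full run the two functions are alternated until the memberships stop changing; a point with no dominant membership, like the middle one here, is a natural candidate for closer inspection.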

Unsupervised Niche Clustering (UNC)

UNC is a robust clustering algorithm that uses an evolutionary algorithm with a
niching strategy (Nasraoui et al. [38]). The evolutionary algorithm helps to find
clusters using a robust density fitness function, while the niching technique allows it
to create and maintain the niches (candidate clusters). Since UNC is based on genetic
optimization, it is much less susceptible to suboptimal solutions than traditional
techniques. The main advantages of the algorithm are its ability to handle noise and to
determine the number of clusters automatically. Elizabeth et al. [10] combined UNC with
fuzzy set theory for anomaly detection and applied it to network intrusion detection.
With each cluster generated by UNC they associated a membership function that follows
a Gaussian shape, using the evolved cluster center and radius. These cluster membership
functions define the normalcy level of a data sample.
Expectation-Maximization Meta Algorithm (EM)

EM is another soft clustering method, based on the Expectation-Maximization
meta-algorithm of Dempster et al. [8]. Expectation-Maximization is an algorithm for
finding maximum likelihood estimates of parameters in probabilistic models. The EM
clustering algorithm alternates between an expectation (E) step, which computes the
expected likelihood using the current model parameters (as if they were known), and a
maximization (M) step, which computes the parameter estimates that maximize this
expected likelihood. The new estimates of the model parameters are then used in the
expectation step of the next iteration. Hajji [15] used Gaussian mixture models to
characterize utilization measurements. Model parameters are estimated using the
Expectation-Maximization (EM) algorithm, and anomalies are detected corresponding to
network failure events. Animesh and Jung [2] proposed an anomaly detection scheme,
called SCAN, to address the threats posed by network-based denial of service attacks in
high speed networks. The noteworthy features of SCAN include: (a) it rationally samples
the incoming network traffic to reduce the amount of audit data being sampled while
retaining the intrinsic characteristics of the network traffic itself; (b) it computes
the missing elements of the sampled audit data by using an enhanced
Expectation-Maximization (EM) algorithm-based clustering algorithm; and (c) it enhances
the convergence speed of the clustering process by employing Bloom filters and data
summaries.
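
The alternation between the E step and the M step can be sketched for a one-dimensional Gaussian mixture. This is a generic textbook illustration under assumed starting values, not the model of Hajji [15] or the SCAN scheme; the utilization-like measurements are hypothetical, and treating a low mixture density as a sign of anomaly is an assumption of the example.

```python
import math

def gaussian_pdf(x, mean, var):
    """Density of a normal distribution with the given mean and variance."""
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def em_gmm_1d(data, means, variances, weights, iterations=50):
    """Fit a 1-D Gaussian mixture with EM, starting from the given parameters."""
    means, variances, weights = list(means), list(variances), list(weights)
    k = len(means)
    for _ in range(iterations):
        # E step: responsibility of each component for each point.
        resp = []
        for x in data:
            p = [weights[i] * gaussian_pdf(x, means[i], variances[i]) for i in range(k)]
            total = sum(p)
            resp.append([pi / total for pi in p])
        # M step: re-estimate the parameters from the responsibilities.
        for i in range(k):
            n_i = sum(r[i] for r in resp)
            means[i] = sum(r[i] * x for r, x in zip(resp, data)) / n_i
            var = sum(r[i] * (x - means[i]) ** 2 for r, x in zip(resp, data)) / n_i
            variances[i] = max(var, 1e-6)      # guard against a collapsing component
            weights[i] = n_i / len(data)
    return means, variances, weights

def mixture_density(x, means, variances, weights):
    """Density of x under the fitted mixture; low values suggest an anomaly."""
    return sum(w * gaussian_pdf(x, m, v) for m, v, w in zip(means, variances, weights))

# Hypothetical link-utilization measurements (percent) with two operating regimes.
data = [10.0, 11.0, 9.0, 10.0, 12.0, 11.0, 50.0, 52.0, 49.0, 51.0]
means, variances, weights = em_gmm_1d(data, [0.0, 60.0], [100.0, 100.0], [0.5, 0.5])
print(means, weights)
print(mixture_density(10.0, means, variances, weights))  # typical measurement
print(mixture_density(30.0, means, variances, weights))  # far from both regimes
```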

One-Class Support Vector Machine (OCSVM)

The one-class support vector machine is a specialized variant of the support vector
machine that is geared toward anomaly detection. It differs from the generic SVM in
that the resulting quadratic optimization problem includes an allowance for a small,
predefined percentage of outliers, which makes it suitable for anomaly detection.
These outliers lie between the origin and the optimal separating hyperplane. All the
remaining data fall on the opposite side of the optimal separating hyperplane and
belong to a single nominal class, hence the term "one-class" SVM. The SVM outputs a
score that represents the distance from the data point being tested to the optimal
hyperplane. Positive values of the one-class SVM output represent normal behavior
(with higher values representing greater normality) and negative values represent
abnormal behavior (with lower values representing greater abnormality) [42]. Eskin et
al. [11] and Honig et al. [19] used an SVM in addition to their clustering methods for
unsupervised learning. The SVM algorithm had to be modified slightly to work in the
unsupervised learning domain; once modified, it performed better than both of their
clustering methods. Shon and Moon [53] suggested a new SVM approach, named Enhanced
SVM, which merges the soft-margin SVM and the one-class SVM in order to provide
unsupervised learning with a low false alarm rate, similar to that of a supervised SVM
approach. Rui et al. [46] proposed a method for network anomaly detection based on the
one-class support vector machine (OCSVM). The method contains two main steps: first,
detector training, in which the training data set is used to generate the OCSVM
detector, which is able to learn the nominal profile of the data; and second,
detection of anomalies in the performance data with the trained detector.
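
The sign convention of the one-class SVM score can be illustrated with the usual kernel form of the decision function, f(x) = Σ αᵢ K(xᵢ, x) − ρ. The sketch below covers only the scoring step: the support vectors, coefficients αᵢ, offset ρ and RBF parameter γ are hypothetical values standing in for the output of training, which is not shown.

```python
import math

def rbf_kernel(a, b, gamma):
    """Gaussian (RBF) kernel between two points."""
    return math.exp(-gamma * math.dist(a, b) ** 2)

def ocsvm_score(x, support_vectors, alphas, rho, gamma):
    """Signed score f(x) = sum_i alpha_i * K(sv_i, x) - rho."""
    return sum(a * rbf_kernel(sv, x, gamma)
               for sv, a in zip(support_vectors, alphas)) - rho

# Hypothetical model parameters, standing in for the result of training.
support_vectors = [(1.0, 1.0), (1.5, 1.2), (0.8, 1.4)]
alphas = [0.4, 0.3, 0.3]
rho, gamma = 0.5, 0.5

print(ocsvm_score((1.1, 1.1), support_vectors, alphas, rho, gamma))  # positive: normal
print(ocsvm_score((6.0, 6.0), support_vectors, alphas, rho, gamma))  # negative: abnormal
```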

Pros and cons of techniques for anomaly detection

K-Nearest Neighbor
Pros:
● Simplicity: KNN is a simple algorithm and easy to understand.
● No Training Phase.
● Non-parametric
● No Adaptability
● No Model Building
Cons:
● High computational cost with large datasets and numerous features.
● Significant memory usage as it memorizes the entire training dataset.
● Sensitivity to noise and irrelevant features, requiring preprocessing steps.
● Dependency on selecting an optimal K value, affecting performance.
● Tendency to favor majority classes in imbalanced datasets, leading to biased
predictions.

Neural network
Pros:
● Powerful for complex relationships
● Adaptable to various tasks
● Automatic feature extraction
● Parallel processing capability
● Robust to noise
Cons:
● Computationally intensive
● Black-box nature
● Data dependency
● Prone to overfitting
● Requires hyperparameter tuning

Decision tree
Pros:
● Interpretable
● No data preprocessing needed
● Efficient
● Can handle nonlinear relationships
● Provides feature importance
Cons:
● Prone to overfitting
● Instability
● Limited expressiveness
● Bias towards features with many levels
● Difficulty with continuous variables

Support vector machine
Pros:
● Effective in high-dimensional spaces
● Versatile with various kernel functions
● Robust to overfitting due to margin maximization
● Works well with small to medium-sized datasets
● Effective in cases where the number of features exceeds the number of samples
Cons:
● Computationally intensive, especially for large datasets
● Requires proper selection of kernel and tuning of hyperparameters
● Doesn't provide probability estimates directly
● Sensitive to noise and outliers

Self-Organizing map
Pros:
● Unsupervised learning that preserves topological properties
● Effective for dimensionality reduction and visualization
● Can handle non-linear relationships in data
● Robust to noise
● Can reveal hidden structures in data
Cons:
● Initialization sensitivity
● Tendency to converge to local minima
● Need for tuning parameters such as learning rate and neighborhood size
● Requires careful interpretation of results
● Computationally intensive for large datasets

K-means
Pros:
● Simple and easy to implement
● Scalable to large datasets
● Efficient computational complexity
● Can handle large feature spaces
● Clusters can be easily interpreted
Cons:
● Requires the number of clusters (K) to be specified in advance
● Sensitive to initial cluster centroids
● May converge to local optima
● Assumes spherical clusters of similar sizes
● Doesn't work well with non-linear data distributions

Fuzzy C-means
Pros:
● Provides soft clustering, assigning membership probabilities to clusters
● More robust to noise and outliers compared to K-means
● Can handle overlapping clusters
● Allows gradual transition between clusters
● No need to specify the number of clusters precisely
Cons:
● Sensitive to the choice of initial cluster centers
● Computationally intensive, especially for large datasets
● Interpretation of cluster membership is more complex than in K-means
● Requires tuning parameters such as the fuzziness coefficient
● May not perform well with non-convex clusters

Expectation Maximization
Pros:
● General framework for unsupervised learning, applicable to various probabilistic
models
● Handles missing data well
● Provides soft clustering with probability distributions
● Can model complex data distributions
● Guarantees convergence to a local optimum
Cons:
● Sensitive to initialization of parameters
● Computationally intensive, especially for large datasets
● May converge to local optima
● Requires assumptions about data distribution (e.g., Gaussian)
● Interpretation of results can be complex, especially with high-dimensional data

Conclusion

Machine learning techniques have received considerable attention among intrusion
detection researchers as a way to address the weaknesses of knowledge-based detection
techniques. This paper has considered anomaly detection by unsupervised techniques,
for which many algorithms have been used to achieve good results. The purpose of this
paper is to give an overview of unsupervised machine learning techniques for anomaly
detection. Unsupervised techniques such as K-Means, SOM, and one-class SVM achieved
better performance than the other techniques, although they differ in their ability to
detect all attack classes efficiently.

References

[1] Amini and Jalili. 2004. Network-based intrusion detection using unsupervised
adaptive resonance theory. In Proceedings of the 4th Conference on Engineering of
Intelligent Systems (EIS'04).
[2] Animesh, P. and Jung, M. 2007. "Network Anomaly Detection with Incomplete Audit
Data". Elsevier Science, 12 February 2007, pp. 5-35.
[3] Bezdek, J. 1981. "Pattern recognition with fuzzy objective function algorithms".
Kluwer Academic Publishers, Norwell, MA, USA.
[4] Bishop, C. 1995. Neural networks for pattern recognition. England, Oxford
University.
[5] Bouzida, F., Cuppens, B. and Gombault, S. 2004. Efficient intrusion detection
using principal component analysis. In Proceedings of the 3ème Conférence sur la
Sécurité et Architectures Réseaux (SAR).
[6] Chan, F., Yeung, S. and Tsang, S. 2005. Comparison of different fusion approaches
for network intrusion detection using an ensemble of RBFNN. In Proceedings of 2005
International Conference on Machine Learning and Cybernetics.
[7] Guobing, Z., Cuixia, Z. and Shanshan, S. 2009. A Mixed Unsupervised
Clustering-based Intrusion Detection Model. Third International Conference on Genetic
and Evolutionary Computing.
[8] Dempster, A., Laird, N. and Rubin, D. 1977. "Maximum likelihood from incomplete
data via the EM algorithm". J. Royal Stat. Soc., Vol. 39, pp. 1–38.
[9] Dunn, J. 1973. "A fuzzy relative of the ISODATA process and its use in detecting
compact well-separated clusters". Journal of Cybernetics, Vol. 3(3), pp. 32–57.
[10] Elizabeth, L., Olfa, N. and Jonatan, G. 2007. Anomaly detection based on
unsupervised niche clustering with application to network intrusion detection.
Proceedings of the IEEE Conference on Evolutionary Computation.
[11] Eskin, E., Arnold, A., Prerau, M., Portnoy, L. and Stolfo, S. "A geometric
framework for unsupervised anomaly detection: Detecting intrusions in unlabeled data".
In D. Barber and S. Jajodia (Eds.), Data Mining for Security Applications. Boston:
Kluwer Academic Publishers.
