Autoencoder-Based Anomaly Detection in Network Traffic

2024 25th International Conference on Computational Problems of Electrical Engineering (CPEE) | 979-8-3315-0664-3/24/$31.00 ©2024 IEEE | DOI: 10.1109/CPEE64152.2024.10720411

Krzysztof Korniszuk
Faculty of Electrical Engineering
Warsaw University of Technology
Warsaw, Poland
[email protected]

Bartosz Sawicki
Faculty of Electrical Engineering
Warsaw University of Technology
Warsaw, Poland
[email protected]

Abstract—Due to the continuously increasing number of resources and the growing availability of data in the cloud, threats to the security of computer networks and IT systems have become critical. Threat detection systems based on deep neural networks and anomaly detection are trained on data describing normal activity, so that the network can recognize unusual patterns and behaviours during an attack or an attempt to infiltrate a given IT infrastructure. This paper presents the results of developing an autoencoder-based neural network for anomaly detection in network packet data. The network was trained on data from the HIKARI-2021 dataset. The autoencoder aims to learn representations of normal network traffic and to associate this type of traffic with a minimal reconstruction error. The obtained results were compared with those achieved by the authors of other works. High accuracy and sensitivity were achieved at the cost of rather low precision, resulting in many false-positive results. A simple algorithm based on a single threshold value proved efficient but limited in effectiveness. This problem can be addressed by changing the method of calculating the individual components of the threshold vector, using only a subset of features, and deriving multiple vectors, one for each class, as described and analyzed in more detail below.

Index Terms—anomaly detection, deep learning, cybersecurity, autoencoder, threat detection

I. INTRODUCTION

The security of IT systems has been a critical concern since their initial commercial deployment. With constantly evolving environments and the emergence of new types of cyberattacks, relying solely on security systems based on predefined behavioural patterns is no longer feasible. One promising solution to this challenge is the application of machine learning (ML) and artificial intelligence (AI) techniques in cybersecurity. Security vulnerabilities can lead to the loss of data, financial assets, and reputational damage, which can be difficult to recover from. To mitigate these risks, machine learning approaches have been increasingly employed, as they can process large datasets using sophisticated algorithms. Machine learning techniques can be categorized into three primary types: supervised, unsupervised, and reinforcement. One of the significant advantages of ML is its ability to rapidly and effectively analyze vast quantities of data, a necessity in cybersecurity, where IT systems generate massive volumes of logs, events, and entries.

A key area within machine learning is deep learning, a subfield that mimics the human learning process. Deep learning models, particularly neural networks, can process large datasets, which is essential for cybersecurity applications. Examples of AI-based systems for threat detection in IT networks include security information and event management (SIEM) platforms and intrusion detection systems (IDS). By leveraging deep learning, these systems can uncover complex patterns in network traffic, such as port activity or packet header details, which may be imperceptible to human analysts. This capability to detect subtle correlations in network data significantly enhances the effectiveness of anomaly detection in computer network security.

One technique worth mentioning in the context of security is anomaly detection [1]. Machine learning algorithms, both traditional and deep learning, can be trained to recognize normal activity in a given system; when later given a sample of data that stands out, the algorithm should be able to tell the difference. These techniques enable the detection of subtle deviations from normal network behaviour, which may indicate a range of security threats, including Distributed Denial of Service (DDoS) attacks, unauthorized access, or data exfiltration.

In this work, the publicly available HIKARI-2021 dataset is briefly analyzed. Then, a deep learning-based autoencoder algorithm is presented, followed by a comparison with other approaches applied to the same dataset and their respective results. Finally, potential improvements and directions for future research are proposed.

II. DATASET

The OSI (Open Systems Interconnection) model plays a critical role in IT security. It defines communication between the various layers of an IT system, with data being transmitted from the application layer, closest to the user, through the transport layers, down to the physical layer. This model is crucial for security because threats can be detected at different stages of communication. In the application layer, for example, threats such as malicious queries to a server or attempts to steal user credentials can be identified. The datasets analyzed in this work primarily focus on the network layer.

Several publicly available datasets were examined for this study. Despite the rapid evolution of cybersecurity, many studies still rely on outdated datasets, such as the KDD99 dataset, which is 25 years old at the time of this writing. Given the constant changes in the cybersecurity landscape, utilizing more up-to-date datasets that reflect current standards and attack patterns would likely yield more accurate and meaningful research outcomes.

The HIKARI dataset was selected for further analysis due to its size, number of features, and focus on encrypted data. It is also the most recent dataset among those analyzed in this paper, released in August 2021. The dataset was generated in a controlled laboratory environment, simulating real-world user behaviour and incorporating synthetic attack scenarios [5]. It contains over 80 features and approximately 250,000 records, each labelled to indicate whether it represents an attack. The HIKARI dataset includes three types of attacks: XMRIGCC CryptoMiner, Bruteforce-XML, and Bruteforce.

Most of the HIKARI features come from the similar CICIDS-2017 dataset. They include information on basic flow features (duration, IP addresses), packet lengths, and statistics such as the mean and standard deviation of packet and header lengths. Additionally, source and destination IP addresses and ports are provided [5].

III. IMPLEMENTATION OF ANOMALY DETECTION ALGORITHM

One type of network that can be used for anomaly detection tasks is an autoencoder. This is a specific type of neural network whose input and output sizes are the same. The main task of an autoencoder is to reconstruct the input data. It consists of two parts: an encoder and a decoder.

Fig. 1. The structure of an autoencoder. The input and output layers have more neurons than the hidden layers.

The learning process assumes a reduced number of neurons in the hidden layers compared to the input layer, which allows the input feature pattern to be reconstructed without distortions, as shown in Figure 1. The hidden layers are tasked with collecting the most important features of the set in a reduced form, presented using a smaller number of dimensions. The encoder learns this feature representation by using narrowed deep layers: according to the manifold hypothesis, high-dimensional data can be represented with fewer dimensions [6]. The decoder is responsible for reconstructing the input features from the encoded information in the hidden layers. Figure 1 presents an example of an autoencoder with three hidden layers. For simplicity of the mathematical description, we can reduce the problem to a single layer. The encoder can then be defined by the following expression:

H = σ(W_x X + b_x),  (1)

where H is the representation of the data in the hidden layer, σ is the activation function, W_x is the weight matrix, X is the input vector, and b_x is the bias vector [8]. The encoder transforms the input vector X into the hidden representation H using the activation function σ. The decoder then aims to reconstruct the input from the hidden representation H:

X̂ = σ(W_x̂ H + b_x̂),  (2)

which yields X̂, a reconstruction of X. The difference between the reconstructed vector X̂ and X gives the reconstruction error:

r = ∥X − X̂∥.  (3)

The use of autoencoders to reduce data complexity is useful in the context of cybersecurity. The encoder reduces the number of dimensions in the input data. In anomaly detection, the data used to train the network must represent only normal activity in the system; normal and anomalous data can then both appear in the test set. The second part of the network, the decoder, aims to reconstruct the input data. Anomaly detection uses the previously mentioned concept of reconstruction error: it can be assumed that a high value of the reconstruction error indicates anomalous data, and a low value indicates normal data.
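
For illustration, equations (1)-(3) can be evaluated directly in NumPy. The sketch below uses randomly initialized weights as stand-ins for the parameters a trained network would learn, and a sigmoid as one possible choice of the activation function σ:

```python
import numpy as np

def sigmoid(z):
    # One possible choice of the activation function sigma.
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
n_features, n_hidden = 82, 16  # input and first hidden layer sizes, as in Table I

W_x = rng.normal(size=(n_hidden, n_features))     # encoder weight matrix
b_x = np.zeros(n_hidden)                          # encoder bias vector
W_xhat = rng.normal(size=(n_features, n_hidden))  # decoder weight matrix
b_xhat = np.zeros(n_features)                     # decoder bias vector

X = rng.random(n_features)            # an example input vector

H = sigmoid(W_x @ X + b_x)            # equation (1): hidden representation
X_hat = sigmoid(W_xhat @ H + b_xhat)  # equation (2): reconstruction
r = np.linalg.norm(X - X_hat)         # equation (3): reconstruction error
print(r)
```

In a trained autoencoder, the weights minimize r on normal traffic, so anomalous inputs tend to produce larger errors.
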
A separate issue is determining the exact value of the reconstruction error threshold above which data will be treated as anomalous. The Receiver Operating Characteristic (ROC) curve was the method used to develop the anomaly detection mechanism. It is a graph showing the performance of a binary classifier, in this case one indicating whether a given sample is an attack (see Fig. 3). The ROC curve is plotted on a unit square, where the points correspond to the sensitivity and false-positive rate at different cut-off thresholds. The x-axis shows the percentage of false positives among all actual negative cases. The y-axis shows the sensitivity, i.e., the percentage of true positives among all actual positive cases. The better the model, the larger the area under the curve; the aim is a compromise between high sensitivity and a low number of false positives [2]. The goal is to select a reconstruction error threshold that achieves the lowest possible number of false positives while maintaining the highest possible number of true positives.
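
One common way to choose the operating point is to take the candidate threshold closest to the top-left corner (FPR = 0, TPR = 1) of the unit square. The exact selection criterion is not fixed by the ROC method itself, so the following sketch is illustrative; `errors` holds per-sample reconstruction errors on the labelled test set, and `y_true` marks attacks as 1:

```python
import numpy as np
from sklearn.metrics import roc_curve

def pick_threshold(y_true, errors):
    """Select a reconstruction error threshold from the ROC curve."""
    fpr, tpr, thresholds = roc_curve(y_true, errors)
    # Distance of each candidate threshold to the ideal corner (FPR=0, TPR=1).
    distance = np.sqrt(fpr ** 2 + (1.0 - tpr) ** 2)
    return thresholds[np.argmin(distance)]
```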

As part of the experiment, a neural network based on the architecture in Table I was implemented. This configuration comes from the work of Catillo et al. [3]. The experiment also tested other configurations from that work, but this one showed the best results.

TABLE I
USED NEURAL NETWORK ARCHITECTURE

Layer type   Activation function   N. of dimensions
Input        -                     82
Hidden 1     tanh                  16
Hidden 2     sigmoid               4
Hidden 3     sigmoid               16
Output       tanh                  82
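
The Table I architecture translates directly into Keras. The mean-squared-error loss and Adam optimizer below are illustrative assumptions, as are all training details not listed in Table I:

```python
from tensorflow import keras
from tensorflow.keras import layers

# Autoencoder following Table I: 82 -> 16 -> 4 -> 16 -> 82.
autoencoder = keras.Sequential([
    keras.Input(shape=(82,)),
    layers.Dense(16, activation="tanh"),     # Hidden 1
    layers.Dense(4, activation="sigmoid"),   # Hidden 2 (bottleneck)
    layers.Dense(16, activation="sigmoid"),  # Hidden 3
    layers.Dense(82, activation="tanh"),     # Output
])
autoencoder.compile(optimizer="adam", loss="mse")  # assumed loss and optimizer
```

Note that the tanh output layer constrains reconstructions to [-1, 1], so input features should be scaled to that range before training.
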
Fig. 2. Division of the HIKARI dataset into training and test data.

The dataset was divided into normal and attack data. Figure 2 shows the division of the HIKARI dataset for training. Redundant features, including IP addresses and possible identifiers, were removed. 80% of the normal data was used for training, whereas the attack data and the remaining 20% of the normal data were used for testing, as shown in Figure 2. The model training was performed on the resulting data set: 10% of the data was used for validation, the batch size was 256, and the number of epochs was 30. An attempt was also made to select other values for these hyperparameters, but the defaults turned out to be the best; the results of this search are shown in Table II.
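
A sketch of the split described above, assuming the public HIKARI-2021 CSV with a binary `Label` column (1 = attack); the file and column names are illustrative and should be adapted to the actual schema:

```python
import pandas as pd
from sklearn.model_selection import train_test_split

df = pd.read_csv("ALLFLOWMETER_HIKARI2021.csv")  # illustrative file name

# Drop redundant features such as IP addresses and identifiers.
df = df.drop(columns=["uid", "originh", "responh", "traffic_category"],
             errors="ignore")

normal = df[df["Label"] == 0].drop(columns=["Label"])
attacks = df[df["Label"] == 1].drop(columns=["Label"])

# 80% of the normal data for training; attacks plus the remaining 20% for testing.
train_normal, test_normal = train_test_split(normal, test_size=0.2, random_state=42)
test_set = pd.concat([test_normal, attacks])
```

With the model from the previous sketch, training then amounts to autoencoder.fit(train_normal, train_normal, epochs=30, batch_size=256, validation_split=0.1), assuming the features have been scaled beforehand.
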
TABLE II
THE IMPACT OF HYPERPARAMETER CHANGES ON THE RESULTS OF THE NEURAL NETWORK

Param       Default   Batch 128   Batch 512   Epochs 20   Epochs 40
Accuracy    94%       94%         94%         94%         84%
Precision   81%       81%         81%         81%         60%
Recall      99%       99%         99%         99%         99%
F1          89%       89%         89%         89%         74%

The default configuration refers to 30 epochs and a batch size of 256. Standard metrics were used for evaluation. The Keras, NumPy and Pandas libraries were used to implement the algorithm, and tools for calculating model statistics were taken from the Scikit-learn machine learning package. The code was implemented in the Jupyter Notebook environment. The model was trained on a MacBook Pro with a six-core Intel Core i9 processor and 32 GB of RAM.

IV. OBTAINED RESULTS

For the trained model, it is crucial to determine the reconstruction error threshold. For this purpose, the ROC method described in the previous section was used. The curve is shown in Figure 3; the best configuration obtained with the trained model corresponds to a threshold of exactly 0.042.

Fig. 3. ROC curve: the line at the top of the graph represents the model, and the black point is the best possible configuration.

The reconstruction error threshold is shown in Figure 4. Points representing the average reconstruction error of individual samples are plotted as brighter points for normal data and darker points for anomalies. Anything above the reconstruction error threshold, shown as a dark dashed line, should be an anomaly, and anything below it should be normal activity. As can be seen, not all points were correctly reconstructed by the decoder.

Table III shows the results obtained with our method and compares them with other authors' works. It contains comparisons with algorithms from the works of Fernandes et al. [4] and Vitorino et al. [9], covering implementations of random forest, extreme gradient boosting, KNN, and MLP. It can be seen that our algorithm achieved high recall and accuracy. A problem was the low precision, which was about 80%. Figure 4 shows a single value of the reconstruction error threshold, taken as the average value of the individual data components. In this figure, we can see that most of the anomalous points are above the line representing the reconstruction error threshold.
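
Once the threshold is fixed, classification reduces to comparing per-sample reconstruction errors against it. The sketch below assumes the `autoencoder` and `test_set` from the earlier snippets and the threshold of 0.042 selected from the ROC curve:

```python
import numpy as np

X_test = test_set.to_numpy(dtype="float32")
X_rec = autoencoder.predict(X_test)

# Equation (3) evaluated for each test sample.
errors = np.linalg.norm(X_test - X_rec, axis=1)

THRESHOLD = 0.042  # value selected from the ROC curve (Fig. 3)
is_anomaly = errors > THRESHOLD
print(f"flagged {is_anomaly.sum()} of {len(errors)} samples as anomalous")
```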

On the other hand, a large number of points with normal data also lie above it, which means false-positive alerts. An attempt was also made to use the method from the work of Torabi et al. [8], but it gave poor results, below 80% accuracy. This results from imperfections in the training data: small deviations in the values of individual features are enough to generate significantly overstated thresholds, which samples are then required to exceed to be classified as anomalies. It is worth mentioning that the authors of that work developed a method based on many classes; our algorithm would need to be refined for this purpose.

Fig. 4. Bright dots represent normal data, while dark dots represent anomalies. The dashed line represents the reconstruction error threshold.

TABLE III
COMPARISON WITH THE RESULTS FROM RELATED WORKS

Parameter   Our   Fernandes RF   Fernandes XGB   Vitorino KNN   Vitorino MLP
Accuracy    94%   98%            96%             98%            90%
Precision   81%   99%            99%             98%            90%
Recall      99%   69%            44%             98%            90%
F1          89%   81%            61%             98%            89%

V. CONCLUSIONS, FURTHER WORK AND SUMMARY

The developed algorithm did well at detecting threats and was able to detect most attacks in the test set. The problem is the precision parameter, which at a value of about 80% means that the solution produces a high number of false positives. In cybersecurity, this is a problem because of the cost and time required to analyze such reports. The cause is identified as the use of a single reconstruction error threshold value. Further research is planned to improve the algorithm's performance. The following methods of performance improvement are proposed:

• Using multiple reconstruction error threshold vectors instead of just one. A reconstruction error threshold vector can be determined for each individual class in the set, which should improve the classification of individual threats. With a single error threshold value, a large number of false-positive events occur. Our experiments showed that a single reconstruction error threshold vector deviates significantly from the test data: because its individual components take on high values, practically all samples are classified as normal activity, since a single discrepancy in the data is enough to drive the vector's values very high relative to the set.

• Classification based on only a subset of the features. Not all features have the same impact on whether a given sample is an anomaly. Using the chi-square method to score individual features and using only the most relevant ones in the classification process should yield better results in threat detection and reduce the number of false positives.

• Changing the method of calculating the components of the error threshold vector. In this paper, each component was chosen as the maximum value over all input records for the corresponding feature. A solution may be to calculate the mean over all data and then add the standard deviation, as sketched below.
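
As a sketch of the first and third proposals, a per-feature threshold vector can be derived from component-wise reconstruction errors on normal training data, replacing the per-feature maximum with the mean plus a multiple of the standard deviation; the factor k and the per-class grouping are illustrative assumptions:

```python
import numpy as np

def threshold_vector(X, X_rec, k=3.0):
    """Per-feature threshold: mean component-wise error plus k standard deviations."""
    comp_err = np.abs(X - X_rec)  # component-wise reconstruction errors
    return comp_err.mean(axis=0) + k * comp_err.std(axis=0)

def is_anomalous(x, x_rec, thresholds):
    # Flag a sample when any feature's error exceeds its per-feature threshold.
    return bool(np.any(np.abs(x - x_rec) > thresholds))

# One threshold vector per traffic class instead of a single global vector:
# thresholds_per_class = {c: threshold_vector(X[y == c], X_rec[y == c]) for c in classes}
```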

REFERENCES

[1] Ahmed M., Mahmood A. N., Hu J., "A survey of network anomaly detection techniques," Journal of Network and Computer Applications, vol. 60, pp. 19-31, 2016.
[2] Bradley A. P., "The use of the area under the ROC curve in the evaluation of machine learning algorithms," Pattern Recognition, 1997.
[3] Catillo M., Pecchia A., Villano U., "AutoLog: Anomaly Detection by Deep Autoencoding of System Logs," Expert Systems with Applications, 2021.
[4] Fernandes R., Silva J., Ribeiro Ó., Portela I., Lopes N., "The impact of identifiable features in ML Classification algorithms with the HIKARI-2021 Dataset," IEEE, 2023.
[5] Ferriyan A., Thamrin A. H., Takeda K., Murai J., "Generating network intrusion detection dataset based on real and encrypted synthetic attack traffic," IEEE, 2021.
[6] Goodfellow I., Bengio Y., Courville A., Deep Learning, The MIT Press, 2016.
[7] Holly S., "Autoencoder based Anomaly Detection and Explained Fault Localization in Industrial Cooling Systems," Cornell University, 2022.
[8] Torabi H., Mirtaheri S. L., Greco S., "Practical autoencoder based anomaly detection by using vector reconstruction error," SpringerOpen, 2023.
[9] Vitorino J., Silva M., Maia E., Praça I., "Reliable Feature Selection for Adversarially Robust Cyber-Attack Detection," Cryptography and Security, 2024.
