Semi-Supervised Learning for Anomaly Traffic Detection via Bidirectional Normalizing Flows

Zhangxuan Dang1, Yu Zheng1, Xinglin Lin1, Chunlei Peng1, Qiuyu Chen2, Xinbo Gao3
1 Xidian University
2 Amazon
3 Chongqing University of Posts and Telecommunications

arXiv:2403.10550v1 [cs.LG] 13 Mar 2024

Abstract: With the rapid development of the Internet, various types of anomaly traffic are threatening
network security. We consider the problem of anomaly network traffic detection and propose a three-stage
anomaly detection framework that uses only normal traffic. Our framework can generate pseudo anomaly
samples without prior knowledge of anomalies to achieve the detection of anomaly data. Firstly, we employ
a reconstruction method to learn the deep representation of normal samples. Secondly, these
representations are normalized to a standard normal distribution using a bidirectional flow module. To
simulate anomaly samples, we add noise to the normalized representations, which are then passed through
the generation direction of the bidirectional flow module. Finally, a simple classifier is trained to
differentiate the normal samples and pseudo anomaly samples in the latent space. During inference, our
framework requires only two modules to detect anomalous samples, leading to a considerable reduction in
model size. According to the experiments, our method achieves state-of-the-art results on the common
benchmarking datasets of anomaly network traffic detection. The code is available at
https://github.com/ZxuanDang/ATD-via-Flows.git

1 Introduction
With the development of the Internet, the proliferation of devices has led to explosive growth in
Internet traffic, which poses significant challenges to the management of network resources and the
assurance of network security. In particular, the increasing complexity and diversity of network attacks
require systems to enhance their ability to detect anomaly traffic. Anomaly network traffic detection is
a vital component in ensuring network security by detecting anomaly traffic passing through computer
network nodes. Such network traffic may include malicious activity that is not in alignment with normal
behavior. Detecting it is critical to maintaining the security of the network infrastructure and reduces
the likelihood of network intrusions.

[Figure 1 panels: (a) images: normal, introduce "noise", pseudo anomalies; (b) packets: normal, introduce
"noise", "Can we get pseudo anomalies?"]

Figure 1: (a) Anomalies in images comprise both colour and shape. Based on prior knowledge of anomaly
patterns, images can simulate anomalies by introducing "noise" [11, 12]. (b) Network traffic anomaly
patterns are difficult to generalise. Simulating abnormal network traffic packets by directly introducing
"noise" may destroy the semantic information of the data packets and produce meaningless pseudo
anomalies, as shown in Section 4.5. Our framework is able to simulate anomaly samples without prior
knowledge of anomaly patterns.

Supervised methods are used to detect anomaly traffic [1–6]. For example, a machine learning
classification model, trained on appropriately labelled manual features, will declare anomaly traffic
when the data does not follow the normal distribution. However, the main drawbacks of supervised anomaly
detection are [7–10]: (1) Collecting anomaly traffic would be a time-consuming and labor-intensive task
due to the nature of the anomaly traffic; (2) It can be challenging to obtain accurate and representative
labels for normal and abnormal traffic. Due to limited access to a large amount of anomaly data,
semi-supervised methods are often adopted for detecting anomalies by training only on normal traffic.

be challenging to obtain accurate and representative Alternative methods generate network traffic to
address data labeling and scaling issues. For exam- anomaly pattern is unknown. By conducting a proxy
ple, Ring et al. propose three different preprocess- classification task between the normal samples and
ing methods for GANs that generate flow-based data the synthetic ones, we facilitate the model in accu-
and evaluate the quality of the generated flows [13]. rately identifying normal samples. As shown in Sec-
In data imputation, SS-GACN and GACN allow tion 4, pseudo anomaly samples help the model to
for missing values in data labels and features [14]. better detect normal representations, even if they are
The methods impute the missing data features based almost not overlapped with real anomaly traffic. To
on classification accuracy. Both of these methods the best of our knowledge, this is the first time that
demonstrate that GANs can successfully generate real a normalizing flow module has been used to generate
network traffic. However, the generation of network anomaly traffic network samples with no prior knowl-
traffic by GANs necessitates large-scale data. More- edge of anomaly patterns and has led to good results
over, such techniques can only generate network traf- in anomaly detection.
fic with a distribution similar to the collected data, In summary, the main contributions of this paper
making it challenging to simulate anomaly traffic are in three folds:
from diverse distributions [15].
• This paper introduces the normalizing flows to
In this work, we propose a novel method for sim- formulate a three-stage framework for anomaly
ulating anomalies that uses only normal traffic dur- traffic detection by using only normal traffic data.
ing training. Our method can generate anomaly sam- The normalizing flows are utilized to process the
ples of network traffic without any prior knowledge of packets obtained from the feature extractor. The
the anomalies, thereby improving anomaly detection. exceptional performance observed in downstream
Anomaly simulation-based methods are often used anomaly detection demonstrates the potential us-
for anomaly detection in computer vision [16–18]. age for adopting normalizing flows in anomaly
As shown in Figure 1, they generate new data out- traffic detection.
side the distribution of normal data by applying trans-
• This paper embeds the normalizing flows into
formations such as rotation, CutPaste [11], flipping,
the process of generating anomaly traffic sam-
and Cutout [19] to the original normal images, and
ples with no prior knowledge of anomaly pat-
then classify the data using a classifier. It has been
terns. The proposed bidirectional flow module
proven that this approach can successfully distinguish
effectively utilizes both normalization and gen-
between normal and anomaly samples [11].
eration directions to simulate anomaly samples
Using the prior knowledge of anomaly patterns, by manipulating the normalized vector without
geometric transformation enables the generation of prior knowledge of the anomaly patterns. The de-
anomaly samples by altering the colour and shape of tection results demonstrate a significant improve-
normal images. For network traffic packets, it is dif- ment in performance achieved through the simu-
ficult to simulate anomaly patterns with transforma- lated anomaly samples.
tions. We cannot directly use geometric transforma-
• Our method outperforms other popular anomaly
tions such as Cutout to obtain anomaly samples, as
detection methods on three common benchmark-
the packets are one-dimensional data structures with
ing datasets for anomaly network traffic detection
precise semantics and no spatial semantics, produc-
and is efficient in computation.
ing meaningless pseudo anomalies, as shown in Sec-
tion 4.5. Therefore, a bidirectional flow module is
proposed in our method. This module can normalize
the normal packet feature to a specific tractable dis- 2 Related Work
tribution. The unknown anomaly samples will be out-
side this distribution after normalization, as the exper- Anomaly Network Traffic Detection Methods In
iments show. By manipulating the vectors in the dis- the current research on anomaly network traffic detec-
tribution, we are able to change the properties of the tion, deep learning-based methods are widely used.
samples, enabling the simulation of anomalies [20]. Some researchers combine traditional feature extrac-
By introducing noise to the normalized features of tion with the classification ability of neural network
normal samples, we can make them deviate from the models. Cao et al. [21] utilize the RFP algorithm
distribution of normal samples. Then, through the for traffic feature extraction and incorporate convo-
direction of the flow generation, we can map them lutional neural networks (CNNs) and Gated Recur-
back to the original space to generate anomaly sam- rent Units (GRU) for classification. Saba et al. [22]
ples. Our framework introduces random noise to utilize a CNN-based classification approach to pre-
achieve simulation of anomaly samples, provided the dict anomaly traffic based on the features of traffic
datasets. Liu et al. [23] utilize a BP neural network model to detect manually extracted flow features.

Other researchers leverage the powerful feature extraction capability of neural network models to
extract features from traffic data, thereby enhancing the performance of classifiers. Shone et al. [24]
utilize a non-symmetric autoencoder for feature extraction and subsequently integrate it with random
forests for intrusion detection. Javaid et al. [25] propose a sparse autoencoder to learn feature
extraction from unlabeled data, effectively leveraging the available data to enhance the feature
extraction capability. Subsequently, they apply a classification task for detection purposes. Wang et
al. [26] employ CNNs and long short-term memory (LSTM) networks to effectively learn spatial and temporal
features for classification. These methods are all fully supervised learning methods, which require
collecting a large amount of labeled anomaly traffic. In contrast, our method achieves effective anomaly
detection by leveraging easily collectible normal data without the need for labeled anomaly traffic.

Anomaly Detection Methods. The research on anomaly detection encompasses various methods, including
reconstruction-based, feature matching-based, and anomaly simulation-based approaches.
Reconstruction-based methods aim to detect anomaly samples through the analysis of reconstruction errors.
Akcay et al. [27] detect anomaly images by reconstructing the latent vectors using encoders. Feature
matching-based methods calculate the difference between test samples and stored embeddings to detect
anomaly samples. Roth et al. [28] obtain anomaly scores by calculating the distance between test samples
and normal embeddings stored in a memory bank. Anomaly simulation-based methods utilize synthetic anomaly
samples to enhance the feature extraction or to make the models clearly distinguish between normal and
anomaly samples. Some studies use geometric transformations to simulate anomaly samples [11, 17, 18, 29];
others simulate anomaly samples by adding noise [12, 30, 31]. The experimental results in these papers
show that simulated anomaly samples enhance the overall detection performance. However, these methods
directly process images to obtain synthetic anomaly samples, which cannot be applied to network traffic.
Our method can simulate network traffic anomaly samples and demonstrates excellent performance in anomaly
traffic detection.

Traffic Generation Methods. In the current work on network traffic analysis, there are many studies that
use traffic generation to improve the performance of detection systems.

On the one hand, in research on preventing adversarial sample attacks, traffic generation techniques are
used to generate adversarial samples to improve the robustness and accuracy of the model. Elie et al.
[32] focus on the attack perspective and investigate techniques to generate adversarial examples that
can evade machine learning models. They specifically explore the use of evolutionary computation and deep
learning as tools for adversarial example generation. Ye et al. [33] propose a defense algorithm using a
bidirectional generative adversarial network (GAN) to improve the robustness and accuracy of NIDS in the
adversarial environment. The algorithm involves training the generator to learn the data distribution of
normal samples and using the discriminator to detect adversarial samples based on reconstruction and
matching errors. The Def-IDS mechanism proposed by Wang et al. [34] is a two-module training framework
that integrates multi-class generative adversarial networks and multi-source adversarial retraining to
improve model robustness while maintaining detection accuracy on unperturbed samples. Zolbayar et al.
[35] develop a GAN-based attack algorithm called NIDSGAN to generate realistic adversarial network
traffic flows that can evade ML-based NIDS. The main contributions of the paper [36] are the proposal of
GADoT, an adversarial training framework that leverages GANs to generate fake-benign samples for
perturbing DDoS samples, and the evaluation of GADoT using network traffic traces capturing adversarially
perturbed SYN and HTTP DDoS flood attacks.

On the other hand, in the work of network traffic classification, traffic generation techniques are used
to perform data augmentation on the traffic data to synthesise network packets that are as realistic as
possible. Shahid et al. [37] propose combining an autoencoder with a GAN to generate sequences of packet
sizes that mimic the behavior of real bidirectional flows. The autoencoder is trained to learn a latent
representation of real packet size sequences, and the GAN is trained on the latent space to generate
realistic sequences. Nukavarapu et al. [38] introduce MirageNet, a GAN-based framework for synthetic
network traffic generation. The first component of MirageNet, MiragePkt, validates the performance of
their framework using synthesized DNS packets. Yin et al. [39] propose an end-to-end framework called
NetShare to explore the feasibility of using GANs to generate synthetic packet and flow header traces for
networking tasks. Hui et al. [40] propose a
knowledge-enhanced generative adversarial network (GAN) framework to generate realistic IoT traffic. The
framework incorporates semantic knowledge and network structure knowledge of various IoT devices through
a knowledge graph.

GANs are frequently utilised in all of these approaches to generate network traffic. However, generating
network traffic through GANs demands a vast amount of training samples, and the procurement of malicious
traffic can be challenging. In addition, GANs can only generate samples that are similar to the training
set, making it difficult to generate data outside of the training distribution [15].

3 Proposed Method

As shown in Figure 2, our framework is developed in three stages: feature extractor, bidirectional flow
module, and classifier. In the following, we introduce each stage.

3.1 Feature Extractor

A traffic packet is a one-dimensional structure with different protocol layers. By following the
preprocessing steps shown in Section 4.2, the headers of the Network Layer and Transport Layer as well as
the payloads form the one-dimensional vector input for the model. Intuitively speaking, an extracted
feature vector that effectively represents the traffic packet is crucial for the development of
downstream tasks, because the original packet often contains a significant amount of redundant
information that can confuse the model. Related experiments will be described in Section 4.5. In the
field of computer vision, pre-trained models such as ResNet18 and ResNet50 are often used when feature
extraction is required. To perform feature extraction on original traffic packets, we pre-train the
feature extractor using our datasets of normal samples.

As we only have normal samples, we use a reconstruction method to extract features. Our feature extractor
is composed of a generator G and a discriminator D, as illustrated in Figure 2. The generator G is
further divided into an encoder GE and a decoder GD. Similar to [27], we adopt both reconstruction loss
and adversarial loss for better reconstruction training. The training objectives of the model are as
follows:

$$\mathcal{L}_G = \omega_{adv}\,\mathbb{E}_{x \sim p_X}\,\|D(x) - D(G(x))\|_2^2 + \omega_{rec}\,\mathbb{E}_{x \sim p_X}\,\|x - G(x)\|_2^2 \tag{1}$$

$$\mathcal{L}_D = \mathbb{E}_{x \sim p_X}[1 - D(x)] + \mathbb{E}_{x \sim p_X}[D(G(x))] \tag{2}$$

where x̂ = G(x).

We use one-dimensional traffic packets x, x ∈ R^n, as the input for our model. The encoder GE compresses
the input x into a hidden vector z, while the decoder GD reconstructs the hidden vector z back to the
input packet and outputs x̂. We believe that if the x̂ reconstructed by the model is very close to the
input x, then the hidden vector z can effectively represent the features of the network packet. After
pre-training, we retain only the encoder GE of the feature extractor for extracting features from the
traffic packets, resulting in a significant reduction in the model size.

3.2 Bidirectional Flow Module

The normalizing flows contain both normalization and generation directions. Generative flow models
leverage a sequence of invertible and differentiable operations to transform a simple and tractable
distribution into a complex distribution [41]. Generally, it can be described by the following formulas:

$$c \sim P_\theta(c) \tag{3}$$

$$z = g_\theta(c) \tag{4}$$

where c is a random vector that follows a distribution Pθ; typically Pθ is a simple distribution such as
a standard normal distribution. z is an extracted feature vector that follows the unknown true data
distribution P∗(z). The function gθ is a reversible and differentiable function that can generate real
samples from the complex distribution by utilizing samples from a simple distribution. This direction is
often called the generation direction. The function fθ is the inverse function of gθ,
$f_\theta = g_\theta^{-1}$, which normalizes the real samples from the complex distribution into the
space of a simple distribution. This direction is often called the normalization direction.

According to the change of variables, the log probability density of the vector z can be written in the
following form:

$$\log p_\theta(z) = \log p_\theta(c) + \log \left|\det(dc/dz)\right| \tag{5}$$

where log |det(dc/dz)| refers to the logarithm of the absolute value of the determinant of the Jacobian
matrix (dc/dz), which can be easily calculated using matrix transformations [42], such as triangular
matrix transformations. In fact, it is difficult to construct a single function that is powerful,
reversible, differentiable and has an easy-to-calculate Jacobian [43]. So in the normalizing flows, it
is common to combine a sequence of reversible and differentiable functions to achieve the desired
transformation. This is the reason why this approach is referred to as a "flow". Therefore, the transform
between c and z can be written as:

$$z \xleftarrow{g_1} h_1 \xleftarrow{g_2} h_2 \cdots \xleftarrow{g_K} c \tag{6}$$

$$z \xrightarrow{f_1} h_1 \xrightarrow{f_2} h_2 \cdots \xrightarrow{f_K} c \tag{7}$$

We follow the work in [41] and employ the affine coupling layers in each block of the bidirectional flow
module, as shown in Figure 2. Our bidirectional flow module takes
[Figure 2 diagram. Labels: Data Pre-Processing (Raw Packet, Packet Parser); Feature Extractor (GE, z, GD,
D); Bidirectional Flow Module (c, η); Classifier (anomaly/normal); Inference. Layer legend: Conv1d,
Linear, Tanh, BatchNorm1d, LeakyReLU/ReLU, Sigmoid, Flow Block.]

Figure 2: An overview of our framework for anomaly detection. c corresponds to the representation of normal packets in
the standard normal space, and η corresponds to the noise vector sampled from a Gaussian Distribution. Feature Extractor
is trained to perform deep feature extraction on one-dimensional normal packets. Bidirectional Flow Module is trained
to normalize the representation of normal packets to a standard normal distribution. During the training of Classifier, the
representation of normal packets is normalized to a standard normal distribution. In the standard normal space, we introduce
noise sampled from a Gaussian Distribution to the normalized representation, and then simulate the representation of anomaly
traffic through the generation direction. Classifier is trained to distinguish the representation of normal packets z and the
simulated representation of anomaly packets ẑ, enabling efficient anomaly detection. In the inference phase, our method can
achieve good anomaly detection by maintaining only two modules, which greatly reduces the size of the model.
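
The pipeline summarized in the Figure 2 caption (normalize a feature, add Gaussian noise, map the result back through the generation direction) can be illustrated with a small sketch. This is not the released implementation: the coupling blocks below use untrained random weights, the linear scale and shift maps are a simplification, and all function names (`make_flow`, `normalize`, `generate`, `simulate_pseudo_anomaly`) are hypothetical. It only shows that an affine coupling stack is exactly invertible, that its log-determinant is a simple sum, and how noise of the form η = µ + σ ⊙ ε enters.

```python
import math
import random


def make_flow(dim, n_blocks, rng):
    """Build untrained affine coupling blocks with small random weights."""
    half, rest = dim // 2, dim - dim // 2

    def rand_matrix():
        return [[rng.gauss(0.0, 0.1) for _ in range(half)] for _ in range(rest)]

    return [{"half": half, "Ws": rand_matrix(), "Wt": rand_matrix()}
            for _ in range(n_blocks)]


def _scale_shift(x1, blk):
    """Scale s and shift t depend only on the half that is left unchanged."""
    s = [math.tanh(sum(w * v for w, v in zip(row, x1))) for row in blk["Ws"]]
    t = [sum(w * v for w, v in zip(row, x1)) for row in blk["Wt"]]
    return s, t


def normalize(z, flow):
    """Normalization direction z -> c; also returns log|det(dc/dz)|."""
    x, logdet = list(z), 0.0
    for blk in flow:
        x1, x2 = x[:blk["half"]], x[blk["half"]:]
        s, t = _scale_shift(x1, blk)
        y2 = [v * math.exp(si) + ti for v, si, ti in zip(x2, s, t)]
        x = (x1 + y2)[::-1]   # reverse so the halves swap roles in the next block
        logdet += sum(s)      # triangular Jacobian: log-det is the sum of s
    return x, logdet


def generate(c, flow):
    """Generation direction c -> z, the exact inverse of normalize."""
    y = list(c)
    for blk in reversed(flow):
        y = y[::-1]
        y1, y2 = y[:blk["half"]], y[blk["half"]:]
        s, t = _scale_shift(y1, blk)
        y = y1 + [(v - ti) * math.exp(-si) for v, si, ti in zip(y2, s, t)]
    return y


def log_likelihood(z, flow):
    """Eq. 5 with a standard normal base density (the training objective)."""
    c, logdet = normalize(z, flow)
    log_pc = -0.5 * sum(v * v for v in c) - 0.5 * len(c) * math.log(2 * math.pi)
    return log_pc + logdet


def simulate_pseudo_anomaly(z, flow, mu, sigma, rng):
    """Normalize z, add noise eta = mu + sigma * eps (Eq. 8), map back."""
    c, _ = normalize(z, flow)
    c_noisy = [v + mu + sigma * rng.gauss(0.0, 1.0) for v in c]
    return generate(c_noisy, flow)


rng = random.Random(0)
flow = make_flow(dim=6, n_blocks=3, rng=rng)
z = [rng.gauss(0.0, 1.0) for _ in range(6)]   # stand-in for an encoder feature
z_hat = simulate_pseudo_anomaly(z, flow, mu=1.0, sigma=0.5, rng=rng)
```

In a trained model the weights would be learned by maximizing `log_likelihood` over normal features, and the scalar `mu` and `sigma` used here for brevity would be the vectors µ and σ.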

the feature extracted by the feature extractor as input, and we maximize Eq. 5 to train this module
through the normalization direction. After training, the bidirectional flow module can map the features
of normal samples to the standard normal distribution space V. For normal samples, the flow module maps
their features to the standard normal distribution. However, the situation is different for anomaly
samples. During the feature extraction phase, the pre-trained feature extractor has only been trained on
normal samples. As a result, the features extracted from anomaly samples are likely to deviate from the
distribution of normal features. The bidirectional flow module is also trained on normal samples, which
means that the anomaly features, deviating from the normal sample distribution, may fall outside the
standard normal distribution in the space V, as shown in Figure 4. To simulate the distribution of
anomaly samples, we introduce noise into normal samples in the standard normal space V. For the sake of
simplicity, we choose to randomly sample from Gaussian distributions to generate noise. We employ a
reparameterization trick to represent noise:

$$\eta = \mu + \sigma \odot \varepsilon \tag{8}$$

where µ represents the mean vector, σ represents the standard deviation vector, and ε is the random noise
sampled from the standard normal distribution. By adjusting µ and σ, we can generate noise samples from
different Gaussian distributions.

Then we use the generation direction of the bidirectional flow module to transform the simulated anomaly
samples in the standard normal space V, resulting in vectors in the latent space. In this way, we have
the representation vectors of normal samples and anomaly samples in the latent space. In the field of
computer vision, operations such as CutPaste [11] are applied to images of normal samples to obtain
synthetic anomaly samples. However, when it comes to traffic packets, it is challenging to obtain anomaly
samples that closely resemble the real network environment through image processing techniques. This is
because traffic packets lack spatial semantic information. Therefore, it is necessary to use the
bidirectional flow module to map the feature vectors to the standard normal space V to construct anomaly
samples. In the standard normal space V, we can manipulate the attributes of the vectors to bring them
closer to real anomaly traffic samples in the latent space. This capability allows for the generation of
synthetic anomaly samples that exhibit similarity to real-world anomaly traffic patterns.

3.3 Classifier

After obtaining normal and pseudo anomaly sample features, it is natural and straightforward to consider
using a classifier for anomaly detection. We employ only a simple classifier to classify the obtained
normal and anomaly feature vectors. We aim to improve the detection performance on real anomaly samples
by encouraging the classifier to focus more on real normal samples. To achieve this, we reduce the number
of anomaly samples to half that of the normal samples. This prevents the model from overemphasising the
features of the pseudo anomaly samples. In the testing phase, we only need to combine the classifier
Methods                     VPN      TOR      DataCon
Reverse Distillation [44]   0.6116   0.7450   0.6762
DFKDE [45]                  0.5907   0.7356   0.3969
DFM [46]                    0.7156   0.7514   0.6744
DRAEM [12]                  0.5698   0.7028   0.6479
FastFlow [47]               0.6195   0.6689   0.6571
PADIM [48]                  0.6726   0.7516   0.6768
PatchCore [28]              0.7058   0.7434   0.4605
STFPM [49]                  0.5657   0.7371   0.6292
CFlow [50]                  0.5433   0.7025   0.5850
GANomaly [27]               0.6239   0.7823   0.6871
GANomaly 1d                 0.5913   0.7166   0.6884
Ours                        0.8658   0.8458   0.7292

Table 1: Anomaly detection AUROC of state-of-the-art methods on DataCon2020, ISCX VPN and non-VPN, UNB-CIC
Tor and non-Tor datasets. We set up different random seeds for three experiments to obtain the average
results. Our method achieves the best detection performance on each dataset.

Model                       Params (M)   FLOPs (G)
PADIM [48]                  2.78         0.05
GANomaly [27]               10.73        0.65
GANomaly 1d                 45.89        11.94
FastFlow [47]               7.46         0.13
DRAEM [12]                  97.43        3.11
Reverse Distillation [44]   80.61        0.61
CFlow [50]                  6.45         0.15
STFPM [49]                  5.57         0.10
Ours                        3.91         0.02

Table 2: Comparison of model parameters and FLOPs in the inference phase. Our approach has the lowest
FLOPs and the best detection performance, while also having small model parameter sizes. The
effectiveness of the PADIM method depends on the performance of the pre-trained model employed. The size
of the model will increase as the capability of the pre-trained model increases.

and encoder GE for anomaly detection, which significantly reduces the number of model parameters and
enhances the model's deployment flexibility. We detect anomaly traffic based on the classification
results.

4 Experimental Results

4.1 Datasets

We have selected three widely used network traffic datasets for our experiments.

The "UNB-CIC Tor and non-Tor" dataset, captured by Arash et al. [51], is collected using Wireshark and
tcpdump. The dataset includes both regular and Tor traffic captured from the workstation and gateway,
encompassing 14 categories of traffic such as Chat, Streaming, Email, and others.

The "ISCX VPN and non-VPN" dataset, captured by Gerard et al. [52], is collected using Wireshark and
tcpdump. During the capturing process, only the packets with the target IP were captured. The dataset
comprises a total of 14 categories for regular and VPN traffic, including File Transfer, P2P, and more.

The "DataCon2020" dataset [53] is derived from malicious and benign software collected between February
and June 2020. The traffic was generated by sandbox collection from Qi An Xin. The dataset defines
malicious traffic as encrypted traffic generated by malware, while the benign traffic represents
encrypted traffic generated by benign software.

Our datasets include encrypted and unencrypted traffic datasets, and benign and malicious traffic
datasets. In some scenarios, encrypted traffic is not allowed to be used, so naturally we define
encrypted traffic and malicious traffic as anomaly traffic, and non-encrypted traffic and benign traffic
as normal traffic. Our training set consists only of normal samples. We randomly select 10,000 normal
samples from one dataset to form our training set. Subsequently, we obtain our testing set by randomly
selecting 5,000 samples from the remaining pool of normal samples and another 5,000 samples randomly
drawn from the abnormal samples. This process is replicated for the training and testing sets of the
other two datasets.

4.2 Pre-processing

Traffic cleaning. We first remove packets in the datasets that may introduce confusion in anomaly
detection. The DNS protocol is used to map domain names and IP addresses to each other. Both normal and
anomaly traffic obtain corresponding IP addresses through the DNS protocol prior to conducting
activities. These traffic packets are not directly associated with normal and anomaly characteristics and
do not contribute to anomaly detection. The TCP protocol has a series of stages related to connections,
including connection establishment, termination, and confirmation. These packets often do not contain any
actual payload; they are only associated with the connection and have no direct relevance to specific
activities. So we remove both types of packets [54]. In addition, there is also the ARP protocol, which
is responsible for the mapping between IP addresses and MAC addresses, but it is often not directly
associated with the activities of upper-layer users. Therefore, we also remove packets related to the ARP
protocol.

Then we process the structure information in the packet. The packet header of the data link layer is
often responsible for managing the transmission of specific physical links and cannot provide sufficient
information for anomaly detection. Therefore, we remove the data link layer header. In
[Figure 3 panels: (a) Tor and non-Tor dataset; (b) VPN and non-VPN dataset; (c) DataCon2020 dataset]
Figure 3: Histogram with density curve. We plot the detection result histogram of the samples in the testing sets of the three
datasets. The curve represents the kernel density estimation of the results. Our method is more effective in distinguishing
between anomaly and normal traffic on the ”UNB-CIC Tor and non-Tor” and the ”ISCX VPN and non-VPN” datasets.
Although the results of distinguishing normal and anomaly traffic on the ”DataCon2020” dataset are not satisfactory, they are
still better than those achieved by other methods.
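
The density curves mentioned in the Figure 3 caption are kernel density estimates of the detection scores. As a rough illustration, assuming a Gaussian kernel with a hand-picked bandwidth (the paper does not state which kernel or bandwidth was used), such a curve could be computed as follows. The function name `gaussian_kde` and the toy scores are hypothetical.

```python
import math


def gaussian_kde(samples, bandwidth):
    """Return a function x -> estimated density using Gaussian kernels."""
    if bandwidth <= 0 or not samples:
        raise ValueError("need samples and a positive bandwidth")
    norm = 1.0 / (len(samples) * bandwidth * math.sqrt(2.0 * math.pi))

    def density(x):
        return norm * sum(
            math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples
        )

    return density


# Toy classifier scores; one curve per class, as in a per-class histogram.
normal_scores = [0.05, 0.10, 0.12, 0.20, 0.25]
anomaly_scores = [0.70, 0.78, 0.85, 0.90, 0.95]
normal_curve = gaussian_kde(normal_scores, bandwidth=0.05)
anomaly_curve = gaussian_kde(anomaly_scores, bandwidth=0.05)
grid = [i / 100 for i in range(101)]
normal_density = [normal_curve(x) for x in grid]
anomaly_density = [anomaly_curve(x) for x in grid]
```

The less the two curves overlap along the score axis, the easier it is to separate normal from anomaly traffic with a single threshold.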

the header of the IP protocol, there are two fields, the destination IP address and the source IP
address. These two fields can summarize a series of data communications between hosts. Detecting anomaly
traffic only based on IP addresses is considered a shortcut rather than true learning. Therefore, we
anonymize the IP addresses of all packets.

Traffic encoding. We use byte encoding to process the packets by converting the individual bytes in the
packets into corresponding decimal numbers. This allows us to obtain a one-dimensional array for each
packet. To ensure uniformity and facilitate model comprehension, we fill the optional fields of the IP
and TCP protocols. In addition, there are two protocols, TCP and UDP, in the transport layer. We also
pad the header of the UDP protocol to match the header length of TCP, ensuring consistency between the
two protocols. The neural network model requires us to unify the length of all packets. Taking into
account our statistical analysis of packet lengths and the preprocessing requirements for comparative
experiments, we set the length of the one-dimensional packet array to 1600 bytes. Subsequently, we
normalize the packets to a range of 0 to 1. Finally, we combine the processed packets with the
corresponding labels and store them in CSV format.

4.3 Implementation Details

Our model is trained for 100 epochs with early stopping. For the generator G and discriminator D in the
feature extractor, we use two Adam optimizers, each with a learning rate of 0.001 and betas of 0.5 and
0.999. The weight ωadv in the feature extractor loss is set to 1, while ωrec is set to 50. Additionally,
we set the size of the hidden vector extracted by the feature extractor to 70. The bidirectional flow
module consists of 8 coupling blocks. For training, we employ the Adam optimizer with a learning rate of
0.001 and betas of 0.5 and 0.999. The two parameters, µ and σ, used for generating noise are determined
through experimental analysis. When

4.4 Comparison with Existing Methods

We analyze anomaly network traffic through our framework of anomaly detection. To demonstrate the
superiority of our framework, we have extensively compared it with state-of-the-art methods in the field
of anomaly detection, including knowledge distillation based [44, 49], reconstruction based [12, 27],
normalizing flow based [47, 50], memory matching based [28], and distribution learning based methods
[12, 46, 48]. The methods used for comparison can be found in [45], including the code and settings
employed. For the sake of fairness, we evaluate some methods on the three datasets by transforming the
preprocessed packets from a one-dimensional structure into a two-dimensional grayscale image, since
these methods employ pre-trained feature extractors specifically designed for two-dimensional images. In
addition, we have also modified the model in [27] and obtained a new method specifically for processing
one-dimensional structured packets, which we refer to as GANomaly 1d and provide with the code. We run
experiments 5 times with different random seeds and report the mean AUC.

As shown in Table 1, on the "UNB-CIC Tor and non-Tor" and "ISCX VPN and non-VPN" datasets, our method
achieves the best results and leads the second-ranked methods by roughly 6 and 15 percentage points of
AUROC, respectively. In the knowledge distillation based anomaly detection methods [44, 49], the
difference between the representations of the anomaly samples by the teacher model and the student model
is used to detect the anomaly. However, the detection is not effective because the teacher model
pre-trained on ImageNet [55] has not seen the network traffic samples. Additionally, the conversion of
the packets from their original one-dimensional semantic structure to a two-dimensional format disrupts
their inherent structure. The added dimension does not add any additional information. Feature
extraction is commonly employed in anomaly detection [56]. Our method employs the feature extractor that
is pre-trained specifically on normal one-dimensional network traffic packets. We design the
training the classifier, we utilize the Adam optimizer with a unsupervised feature extractor to represent network traffic
learning rate of 0.001 and betas of 0.5 and 0.999. These pa- packets effectively. This approach avoids directly generat-
rameters have been determined through an process of grid ing raw packets, which would destroy the data structure and
search and experiments. produce meaningless samples, as demonstrated in the Sec-
Noise Distribution Dataset Noise Distribution Dataset
µ σ TOR VPN DataCon µ σ TOR VPN DataCon
0.1 0.6344 0.4598 0.6850 0.1 0.7695 0.7910 0.7058
0.5 0.5888 0.5771 0.6906 0.5 0.5749 0.8082 0.6985
-100 5
5 0.6766 0.7582 0.7292 5 0.7004 0.7707 0.6828
15 0.6051 0.7221 0.7137 15 0.7418 0.8076 0.6926
0.1 0.6482 0.7609 0.4488 0.1 0.6364 0.6703 0.6740
0.5 0.7189 0.8191 0.5752 0.5 0.6331 0.8160 0.7020
-25 9
5 0.8458 0.7732 0.7146 5 0.6545 0.7989 0.6924
15 0.8161 0.7891 0.7092 15 0.6471 0.7427 0.6697
0.1 0.6553 0.8201 0.6893 0.1 0.4361 0.8370 0.7207
0.5 0.6138 0.8414 0.6023 0.5 0.6484 0.4467 0.7026
-10 10
5 0.5583 0.8602 0.6494 5 0.7855 0.7529 0.6966
15 0.7696 0.7892 0.5721 15 0.6094 0.8002 0.6836
0.1 0.6526 0.8118 0.6971 0.1 0.5958 0.6327 0.6486
0.5 0.7909 0.8155 0.6710 0.5 0.4326 0.7602 0.7101
-9 25
5 0.5763 0.8658 0.3903 5 0.8331 0.5499 0.6550
15 0.6485 0.7780 0.7062 15 0.5875 0.8542 0.6755
0.1 0.6523 0.7979 0.6531 0.1 0.7961 0.1944 0.6966
0.5 0.5938 0.7042 0.5486 0.5 0.6590 0.7584 0.6836
-5 100
5 0.6146 0.4137 0.6834 5 0.5877 0.8143 0.6998
15 0.7199 0.7368 0.6655 15 0.5606 0.8347 0.6805
Table 3: Detection performance on different noise distributions. The noise distribution determines the generated pseudo
anomaly samples. By the trick of reparameterization, we can easily get different noise distributions. We explore the effects of
different noise distributions on the three datasets.

tion 4.5.3. The extracted features are more beneficial for performance of all methods is not particularly outstanding.
enhancing the performance of downstream anomaly detec- We analyze that the diversity of categories and the insuf-
tion tasks. ficient number of samples in each category in this dataset
have proposed challenges for each detection method. In ad-
A traffic packet is a one-dimensional data structure con- dition, as shown in Figure 4, the similarity between normal
taining specific semantic fields but lacks spatial seman- packet samples and abnormal packet samples in the Dat-
tics [6]. For methods that employ generated anomaly sam- aCon dataset is relatively high, posing a challenge for the
ples for anomaly detection, such as DREAM [12], intro- model to detect abnormal samples.
ducing noise into normal packet images is meaningless as it
destroys the semantic fields of the original packet and does As shown in Figure 3, we draw the histogram of the
not effectively generate anomaly samples, resulting in poor detection results on the test sets of the three datasets to
results. Some of the problems that exist in current network visualize the detection effect of our model. Our model
traffic generation research [39, 57], such as the difficulty demonstrates excellent performance in distinguishing be-
of training GANs and the need to collect a large number tween normal and anomaly traffic on both the ”UNB-CIC
of samples. Our method is not affected by these issues. Tor and non-Tor” and the ”ISCX VPN and non-VPN”
Our method maps the normal samples to the space of stan- datasets. Additionally, on the ”DataCon2020 dataset”, our
dard normal distribution, and simulates the anomaly sam- model currently exhibits the best detection performance, al-
ples through manipulating them in this space. This method though there is still potential for further improvement.
avoids disrupting the specific semantic fields of the origi- We also compare the model sizes and the FLOPs of
nal data packet and allows for a closer semantic alignment the different methods during the inference, as shown in Ta-
with real anomaly samples. At the same time, we do not ble 2. Our approach is effective for detecting network traffic
need to collect anomaly samples or a priori knowledge of anomalies in computer power limited deployment environ-
anomaly patterns, but only simulate anomaly samples by ments. After training our model, anomaly detection can be
normal samples and randomly sampled noise. By adjusting achieved by retaining only the encoder part of the feature
the distribution of random noise, we are able to simulate extractor and the classifier in the inference process, and
different anomaly samples. On the ”DataCon2020” dataset, the rest of the modules are only used to support the train-
our method also achieves the best results, but the overall ing. Unlike other normalizing flow based methods [47, 50],
our approach does not require the flow module to compute the samples, but we have difficulty in achieving the seman-
anomaly scores, and serves as a module for synthesizing tic conversion from normal to abnormal samples by random
anomaly samples during the training process. noise, which may require careful design of the vectors. Ma-
We employ t-SNE to project the features of the normal, nipulating vectors in the standard normal space proves to be
anomaly, and synthetic anomaly samples into a 2D space, more effective to altering the properties of network traffic.
as showed in Figure 4. Ideally, we hope that the simulated
anomaly samples overlap well with the real anomaly sam- 4.5.3 Ablating Feature Extractor
ples. However, this requires us to spend more time carefully
designing the distribution of the noise. For simplicity, we
have tried sampling the noise from a random Gaussian dis- We conducted training using normal samples to acquire a
tribution and changing the normal sample properties with feature extractor capable of effective feature extraction on
it. While the synthetic anomaly features may not perfectly packets of network traffic. Autoencoders are often used for
simulate real anomaly samples, the model improves the dis- extracting features from network traffic [13, 58]. However,
crimination of normal samples by distinguishing between these features are typically deep representations of manu-
normal and synthetic anomaly samples. ally extracted features. This approach may overlook poten-
tial data connections within the packets. Our feature extrac-
tor introduces a discriminator to improve the capability of
4.5 Ablation Study the autoencoder, which is more helpful in generating dense
sample features.
4.5.1 Adopting Different Noise Distributions We remove the feature extractor in our feature extrac-
tor ablation experiments and directly feed the preprocessed
For our method, the noise distribution plays a particularly one-dimensional packets into the bidirectional flow mod-
critical role, as it directly determines the quality of the gen- ule. The bidirectional flow module is trained to map the
erated samples. In [20], the authors determine the direction one-dimensional packet vectors to the standard normal dis-
of attribute change by the difference between two sample tribution space. In the standard normal distribution space,
vectors with specific attributes. And since our model is only noise is introduced to the vectors, and through the gener-
trained on normal samples, we can only guide the genera- ation direction, simulated anomaly samples are obtained.
tion of simulated abnormal samples by trying random noise. Both normal samples and simulated anomaly samples are
This aspect is crucial for training classifiers effectively. then fed into the classifier for detection. The results ob-
We have tried different combinations of µ and σ, and the tained by retraining the parameters are shown in Table 4.
experimental results are shown in Table 3. This shows that The detection performance directly using one-dimensional
our model is sensitive to the parameters of the noise distri- packet vectors is poor on these three datasets. We believe
bution, which is in line with our expectations. The distribu- that the packet vectors without feature extraction may con-
tion of the simulated anomaly samples in the standard nor- tain a significant amount of redundant information, which
mal distribution space is difficult to determine, depending hampers model learning by lacking concentrated informa-
on the noise distribution from which the noise is sampled. tion. Consequently, this leads to poor performance.
We simulate the anomaly samples by manipulating the nor-
mal samples, thereby enabling the classifier to show excel- 4.5.4 Adjusting the Ratio of Samples
lent recognition capability for normal samples. In addition,
we can easily generate various types of anomaly samples For our model, the aim is for the classifier to learn how to
only by changing noise distributions. distinguish normal samples with a learning focus on such
samples. In addition, we think that if the number of syn-
4.5.2 Ablating Bidirectional Flow Module thetic abnormal samples is increased, it is possible that the
model will shift its learning focus.
Generating anomalous samples is crucial to our approach, We adjust the number of generated anomaly samples to
and the Bidirectional Flow Module is able to fit more com- demonstrate the impact of different ratios of normal sam-
plex distributions by combining multiple invertible trans- ples to anomaly samples on the model. In our method, the
formations, resulting in high quality synthetic samples. In number of synthetic anomaly samples is half the number of
addition, by training this module we are able to map normal normal samples. The other two comparative methods in-
samples to a specified distribution, while anomaly samples volve maintaining an equal number of normal and anomaly
not seen by the module will be mapped outside of the distri- samples, and having twice the number of anomaly samples
bution, thus simulating abnormal samples by deviating from compared to normal samples. The experimental results are
the normal samples. shown in Table 5. Compared to our method, when the num-
We remove the bidirectional flow module and directly ber of anomaly samples increases to be equal to the number
introduce noise to the latent vectors extracted by the fea- of normal samples, there is a slight decrease in performance
ture extractor to simulate anomaly samples. Then, the sim- on all three datasets. As the number of anomaly samples
ulated samples are fed into the classifier for detection. We continues to increase to twice the number of normal sam-
explore different combinations of µ and σ, and the exper- ples, the performance still decreases. Changing the ratio of
imental results are shown in Table 4. It can be observed samples will have an impact on the classifier. We expect
that directly introducing noise to the latent vectors to sim- the classifier to distinguish between normal and anomaly
ulate anomaly samples does not achieve satisfactory detec- samples. However, when the number of anomaly samples
tion results. The latent vectors can affect the properties of increases, the classifier tends to pay excessive attention to
Figure 4: t-SNE visualization of representations in the latent space for (a) the Tor and non-Tor dataset, (b) the VPN and non-VPN dataset, and (c) the DataCon2020 dataset. We plot the features of the normal, anomaly, and synthetic anomaly samples. Our synthetic anomaly samples do not overlap well with the real anomaly samples, but they differ significantly from the normal samples. The model learns to accurately identify normal traffic by distinguishing between them.
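The last sentence of the caption describes the classifier stage: a simple classifier is trained to separate normal features from synthetic anomaly features. The following is a minimal sketch of that idea, not the authors' implementation. It assumes a logistic-regression classifier, two-dimensional toy features, and Gaussian noise with mean 5 and standard deviation 1; all function names and values are hypothetical and chosen only for illustration.

```python
import math
import random

def sigmoid(u):
    """Numerically stable logistic function."""
    if u >= 0:
        return 1.0 / (1.0 + math.exp(-u))
    e = math.exp(u)
    return e / (1.0 + e)

def score(x, w, b):
    """Predicted probability that feature x is a (pseudo) anomaly."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)

def train_classifier(feats, labels, lr=0.1, epochs=300):
    """Full-batch gradient descent on the binary cross-entropy loss."""
    dim, n = len(feats[0]), len(feats)
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * dim, 0.0
        for x, y in zip(feats, labels):
            err = score(x, w, b) - y
            grad_b += err
            for i, xi in enumerate(x):
                grad_w[i] += err * xi
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

rng = random.Random(0)
# Toy stand-ins for normal latent features (assumption: 2-D standard normal).
normal = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(200)]
# Synthetic anomalies: normal features shifted by Gaussian noise (assumed mu=5, sigma=1).
synthetic = [[v + rng.gauss(5, 1) for v in x] for x in normal]
train_labels = [0] * len(normal) + [1] * len(synthetic)
w, b = train_classifier(normal + synthetic, train_labels)

# Unseen normal features should rarely be flagged as anomalous.
held_out = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(200)]
false_alarm_rate = sum(score(x, w, b) > 0.5 for x in held_out) / len(held_out)
```

In this toy setting the classifier only ever sees normal and synthetic features, which mirrors the training regime described in the caption; how well it flags real anomalies depends on the noise distribution, as the ablation in Table 3 suggests.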

Noise distribution      w/o Bidirectional Flow Module   w/o Feature Extractor
          µ       σ     VPN      TOR      DataCon       VPN      TOR      DataCon
       -100       5     0.7557   0.5573   0.6768        0.7325   0.5711   0.7169
        -25       5     0.5324   0.6042   0.6705        0.5696   0.4260   0.3805
        -20      10     0.7736   0.4758   0.3844        0.6315   0.5567   0.6728
        -10       5     0.3972   0.5509   0.6723        0.3749   0.4729   0.6181
        -10       1     0.3746   0.4381   0.3440        0.6870   0.5819   0.5431
         -9       5     0.6804   0.6120   0.3334        0.6680   0.4549   0.6620
         -9       1     0.4185   0.6638   0.4544        0.6383   0.4909   0.5789
         -5       5     0.5244   0.7459   0.4250        0.6515   0.4481   0.6768
          5     1.5     0.3090   0.5880   0.6427        0.6810   0.5010   0.6783
          5      15     0.5873   0.6091   0.6594        0.7976   0.4838   0.4110
          9     0.1     0.7900   0.6344   0.6705        0.2565   0.5016   0.6624
          9       1     0.7767   0.5591   0.6760        0.3284   0.5169   0.6785
         10       5     0.4010   0.6177   0.3828        0.3127   0.4883   0.6603
         20       1     0.2760   0.5943   0.6770        0.5016   0.4624   0.6800
         25       5     0.6552   0.4819   0.6297        0.5243   0.6912   0.6745
Ours
-9/-25/-100   5/5/5     0.8658   0.8458   0.7292        0.8658   0.8458   0.7292
Table 4: Detection performance when ablating different modules of our method. We conduct ablation experiments on the bidirectional flow module and the feature extractor to demonstrate the effectiveness of our proposed modules. The bottom row gives the results of our method with both modules, where the listed µ and σ values are those used for the VPN, TOR, and DataCon datasets, respectively.
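The two synthesis paths compared in Table 4 can be sketched as follows: with the flow, a feature is mapped through the normalizing direction, shifted by noise, and mapped back through the generation direction; without it, the noise is added to the feature directly. This is an illustrative sketch under stated assumptions, not the authors' code. The coupling blocks use fixed toy scale and shift functions in place of learned subnetworks, the vector is reversed between blocks so that both halves get transformed, and every function name is hypothetical. Only the block count (8), the hidden size (70), and the example noise parameters come from the text.

```python
import math
import random

def coupling_forward(x, scale, shift):
    """One affine coupling block, normalizing direction (assumes even length)."""
    half = len(x) // 2
    x1, x2 = x[:half], x[half:]
    # Scale s and translation t depend only on x1, which keeps the block invertible.
    s = [math.tanh(scale * v) for v in x1]
    t = [shift * v for v in x1]
    return x1 + [b * math.exp(si) + ti for b, si, ti in zip(x2, s, t)]

def coupling_inverse(z, scale, shift):
    """Exact inverse of coupling_forward (generation direction)."""
    half = len(z) // 2
    z1, z2 = z[:half], z[half:]
    s = [math.tanh(scale * v) for v in z1]
    t = [shift * v for v in z1]
    return z1 + [(b - ti) * math.exp(-si) for b, si, ti in zip(z2, s, t)]

def flow_forward(x, blocks):
    z = list(x)
    for scale, shift in blocks:
        # Reverse after each block so the untouched half is transformed next.
        z = coupling_forward(z, scale, shift)[::-1]
    return z

def flow_inverse(z, blocks):
    x = list(z)
    for scale, shift in reversed(blocks):
        x = coupling_inverse(x[::-1], scale, shift)
    return x

def sample_noise(mu, sigma, dim, rng):
    """Reparameterization: noise = mu + sigma * eps with eps ~ N(0, 1)."""
    return [mu + sigma * rng.gauss(0.0, 1.0) for _ in range(dim)]

def synthesize(feature, mu, sigma, rng, blocks=None):
    """Pseudo anomaly from a normal feature; blocks=None mimics the 'w/o flow' ablation."""
    noise = sample_noise(mu, sigma, len(feature), rng)
    if blocks is None:
        return [f + n for f, n in zip(feature, noise)]
    z = flow_forward(feature, blocks)
    return flow_inverse([a + n for a, n in zip(z, noise)], blocks)

rng = random.Random(0)
blocks = [(0.5, 0.1)] * 8  # 8 coupling blocks; the parameter values are placeholders
feature = [rng.uniform(0.0, 1.0) for _ in range(70)]  # hidden vector of size 70
with_flow = synthesize(feature, mu=-9.0, sigma=5.0, rng=rng, blocks=blocks)
without_flow = synthesize(feature, mu=-9.0, sigma=5.0, rng=rng)
```

Because each block is exactly invertible, setting µ = 0 and σ = 0 returns the original feature, so any change in the output is attributable to the noise added in the normalized space.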

the anomaly samples, resulting in a decrease in detection performance.

4.5.5 Generalizing across the Datasets

In a real network traffic detection scenario, the trained model faces a large number of unknown network traffic packets, both normal and anomalous. We therefore explore the detection ability of the model on test samples from an unknown distribution by investigating its generalization ability across datasets.

While the anomaly samples in the three datasets are different, the normal samples are all captured from normal activities and share similarities. Both the “UNB-CIC Tor and non-Tor” and “ISCX VPN and non-VPN” datasets consist of encrypted and unencrypted traffic. We extend our experiments on one of these datasets and the “DataCon2020” dataset. We train our model on one dataset and test on another to assess its ability to generalize to unseen anomaly samples. As shown in Table 6, when training on one dataset and testing on another, the results show a slight drop but remain within acceptable limits. This shows that our method can achieve relatively good detection results even in the presence of unknown anomaly traffic in real detection environments. Our approach improves the model’s ability to identify normal network traffic by classifying pseudo anomaly traffic, and it is effective across different data distributions.

5 Conclusion

In this paper, we propose a three-stage framework for anomaly traffic detection that involves generating simulated
Noise distribution      anomaly:normal = 1:1            anomaly:normal = 2:1
          µ       σ     VPN      TOR      DataCon       VPN      TOR      DataCon
       -100       5     0.3333   0.6416   0.6771        0.2876   0.3372   0.6693
        -25       5     0.8419   0.5984   0.6985        0.6676   0.6493   0.6827
        -20      10     0.7559   0.6317   0.6945        0.7154   0.4850   0.7059
        -10       5     0.7898   0.6274   0.6837        0.5435   0.5279   0.6670
        -10       1     0.6979   0.6394   0.7031        0.3853   0.7602   0.6894
         -9       5     0.8019   0.6781   0.6895        0.8445   0.5985   0.6731
         -9       1     0.7897   0.7820   0.6981        0.8083   0.6944   0.6915
         -5       5     0.7233   0.4607   0.6775        0.7969   0.6685   0.6545
          5     1.5     0.7899   0.7173   0.7137        0.8115   0.5829   0.6892
          5      15     0.5487   0.5469   0.6912        0.6553   0.5507   0.5893
          9     0.1     0.7639   0.5671   0.7001        0.7880   0.6879   0.6967
          9       1     0.3962   0.7811   0.6576        0.8460   0.6487   0.6999
         10       5     0.8308   0.6906   0.6855        0.7889   0.5456   0.6831
         20       1     0.7555   0.7493   0.6429        0.6377   0.7505   0.7047
         25       5     0.8197   0.6163   0.5783        0.7645   0.6256   0.6793
Ours
-9/-25/-100   5/5/5     0.8658   0.8458   0.7292        0.8658   0.8458   0.7292
Table 5: Detection performance for different ratios of pseudo anomaly samples to normal samples. We obtain the different ratios by altering the quantity of pseudo anomaly samples during training. The bottom row shows our method, in which the number of pseudo anomaly samples is half the number of normal samples; the listed µ and σ values are those used for the VPN, TOR, and DataCon datasets, respectively.
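The ratio settings in Table 5 amount to choosing how many pseudo anomaly samples are generated per normal sample before the classifier is trained. A minimal sketch of that bookkeeping is given below. It assumes that pseudo anomalies are generated from normal features drawn with replacement, which the text does not specify, and it uses a direct-noise generator as a stand-in for the flow-based synthesis. The names `build_training_set` and `add_noise` are hypothetical.

```python
import random

def build_training_set(normal_feats, make_pseudo, ratio, rng):
    """Mix normal features (label 0) with ratio * len(normal_feats) pseudo anomalies (label 1)."""
    n_pseudo = round(ratio * len(normal_feats))
    # Draw source features with replacement so that ratios above 1 are possible.
    pseudo = [make_pseudo(rng.choice(normal_feats)) for _ in range(n_pseudo)]
    feats = [list(f) for f in normal_feats] + pseudo
    labels = [0] * len(normal_feats) + [1] * n_pseudo
    return feats, labels

rng = random.Random(0)
normal_feats = [[rng.gauss(0, 1) for _ in range(4)] for _ in range(100)]

def add_noise(feature):
    # Stand-in generator (assumption): shift each component by N(-9, 5) noise.
    return [v + rng.gauss(-9, 5) for v in feature]

# 0.5 corresponds to the setting described for our method; 1.0 and 2.0 to the comparisons.
for ratio in (0.5, 1.0, 2.0):
    feats, labels = build_training_set(normal_feats, add_noise, ratio, rng)
    print(ratio, labels.count(0), labels.count(1))
```

Keeping the number of normal samples fixed and varying only the pseudo anomaly count matches the description in the caption, where the ratios are obtained by altering the quantity of pseudo anomaly samples.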

anomaly samples. Our approach is able to generate anomaly samples with unknown patterns, without prior knowledge of the anomalies, and use them to improve anomaly detection. The key lies in the feature extractor and bidirectional flow module designed specifically for traffic. These modules enable us to transform the packets into the standard normal distribution space, where we manipulate the vectors to alter the properties of the traffic packets. This technique allows us to simulate anomaly traffic. Our method demonstrates excellent performance in anomaly detection across three real network traffic datasets. We envision that our method of constructing anomaly samples can be widely applied in many fields, serving as a reliable technique for generating simulated anomaly samples.

Train      Test       Result
DataCon    DataCon    0.7292
TOR        DataCon    0.7058
TOR        TOR        0.8458
DataCon    TOR        0.8060
Table 6: Detection performance across the datasets. “UNB-CIC Tor and non-Tor” is the encrypted and unencrypted traffic dataset, and “DataCon2020” is the benign and malicious traffic dataset. Compared with training and testing on the same dataset, the detection performance across datasets decreases slightly.

REFERENCES

[1] O. Salman, I. H. Elhajj, A. Chehab, and A. Kayssi, “A machine learning based framework for iot device identification and abnormal traffic detection,” Transactions on Emerging Telecommunications Technologies, vol. 33, no. 3, p. e3743, 2022.
[2] J. Niu, Y. Zhang, D. Liu, D. Guo, and Y. Teng, “Abnormal network traffic detection based on transfer component analysis,” in IEEE International Conference on Communications Workshops, pp. 1–6, 2019.
[3] M. Gao, L. Ma, H. Liu, Z. Zhang, Z. Ning, and J. Xu, “Malicious network traffic detection based on deep neural networks and association analysis,” Sensors, vol. 20, no. 5, p. 1452, 2020.
[4] Z. Li, Z. Qin, K. Huang, X. Yang, and S. Ye, “Intrusion detection using convolutional neural networks for representation learning,” in International Conference on Neural Information Processing, pp. 858–866, 2017.
[5] L. Yang, Y. Song, S. Gao, B. Xiao, and A. Hu, “Griffin: an ensemble of autoencoders for anomaly traffic detection in sdn,” in IEEE Global Communications Conference, pp. 1–6, 2020.
[6] Y. Zheng, Z. Dang, C. Peng, C. Yang, and X. Gao, “Multi-view multi-label anomaly network traffic classification based on mlp-mixer neural network,” arXiv preprint arXiv:2210.16719, 2022.
[7] V. Chandola, A. Banerjee, and V. Kumar, “Anomaly detection: A survey,” ACM computing surveys (CSUR), vol. 41, no. 3, pp. 1–58, 2009.
[8] Z. Jadidi, V. Muthukkumarasamy, E. Sithirasenan, and K. Singh, “Flow-based anomaly detection using
semisupervised learning,” in 2015 9th International Conference on Signal Processing and Communication Systems (ICSPCS), pp. 1–5, IEEE, 2015.
[9] M. H. Bhuyan, D. K. Bhattacharyya, and J. K. Kalita, “Towards an unsupervised method for network anomaly detection in large datasets,” Computing and informatics, vol. 33, no. 1, pp. 1–34, 2014.
[10] Y. Shi and H. Shen, “Unsupervised anomaly detection for network traffic using artificial immune network,” Neural Computing and Applications, vol. 34, no. 15, pp. 13007–13027, 2022.
[11] C.-L. Li, K. Sohn, J. Yoon, and T. Pfister, “Cutpaste: Self-supervised learning for anomaly detection and localization,” in IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 9664–9674, 2021.
[12] V. Zavrtanik, M. Kristan, and D. Skočaj, “Draem-a discriminatively trained reconstruction embedding for surface anomaly detection,” in IEEE/CVF International Conference on Computer Vision, pp. 8330–8339, 2021.
[13] M. Ring, D. Schlör, D. Landes, and A. Hotho, “Flow-based network traffic generation using generative adversarial networks,” Computers & Security, vol. 82, pp. 156–172, 2019.
[14] R. Ghanavi, B. Liang, and A. Tizghadam, “Generative adversarial classification network with application to network traffic classification,” in 2021 IEEE Global Communications Conference (GLOBECOM), pp. 1–6, IEEE, 2021.
[15] A. Cheng, “Pac-gan: Packet generation of network traffic using generative adversarial networks,” in 2019 IEEE 10th Annual Information Technology, Electronics and Mobile Communication Conference (IEMCON), pp. 0728–0734, IEEE, 2019.
[16] L. Bergman and Y. Hoshen, “Classification-based anomaly detection for general data,” arXiv preprint arXiv:2005.02359, 2020.
[17] I. Golan and R. El-Yaniv, “Deep anomaly detection using geometric transformations,” Advances in Neural Information Processing Systems, vol. 31, 2018.
[18] D. Hendrycks, M. Mazeika, S. Kadavath, and D. Song, “Using self-supervised learning can improve model robustness and uncertainty,” Advances in Neural Information Processing Systems, vol. 32, 2019.
[19] T. DeVries and G. W. Taylor, “Improved regularization of convolutional neural networks with cutout,” arXiv preprint arXiv:1708.04552, 2017.
[20] D. P. Kingma and P. Dhariwal, “Glow: Generative flow with invertible 1x1 convolutions,” Advances in Neural Information Processing Systems, vol. 31, 2018.
[21] B. Cao, C. Li, Y. Song, Y. Qin, and C. Chen, “Network intrusion detection model based on cnn and gru,” Applied Sciences, vol. 12, no. 9, p. 4184, 2022.
[22] T. Saba, A. Rehman, T. Sadad, H. Kolivand, and S. A. Bahaj, “Anomaly-based intrusion detection system for iot networks through deep learning model,” Computers and Electrical Engineering, vol. 99, p. 107810, 2022.
[23] Z. Liu, Y. He, W. Wang, and B. Zhang, “Ddos attack detection scheme based on entropy and pso-bp neural network in sdn,” China Communications, vol. 16, no. 7, pp. 144–155, 2019.
[24] N. Shone, T. N. Ngoc, V. D. Phai, and Q. Shi, “A deep learning approach to network intrusion detection,” IEEE Transactions on Emerging Topics in Computational Intelligence, vol. 2, no. 1, pp. 41–50, 2018.
[25] A. Javaid, Q. Niyaz, W. Sun, and M. Alam, “A deep learning approach for network intrusion detection system,” in International Conference on Bio-inspired Information and Communications Technologies, pp. 21–26, 2016.
[26] W. Wang, Y. Sheng, J. Wang, X. Zeng, X. Ye, Y. Huang, and M. Zhu, “Hast-ids: Learning hierarchical spatial-temporal features using deep neural networks to improve intrusion detection,” IEEE Access, vol. 6, pp. 1792–1806, 2017.
[27] S. Akcay, A. Atapour-Abarghouei, and T. P. Breckon, “Ganomaly: Semi-supervised anomaly detection via adversarial training,” in Asian Conference on Computer Vision, pp. 622–637, 2019.
[28] K. Roth, L. Pemula, J. Zepeda, B. Schölkopf, T. Brox, and P. Gehler, “Towards total recall in industrial anomaly detection,” in IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 14318–14328, 2022.
[29] J. Song, K. Kong, Y.-I. Park, S.-G. Kim, and S.-J. Kang, “Anoseg: anomaly segmentation network using self-supervised learning,” arXiv preprint arXiv:2110.03396, 2021.
[30] M. Yang, P. Wu, and H. Feng, “Memseg: A semi-supervised method for image surface defect detection using differences and commonalities,” Engineering Applications of Artificial Intelligence, vol. 119, p. 105835, 2023.
[31] A.-S. Collin and C. De Vleeschouwer, “Improved anomaly detection by training an autoencoder with skip connections on images corrupted with stain-shaped noise,” in International Conference on Pattern Recognition, pp. 7915–7922, 2021.
[32] E. Alhajjar, P. Maxwell, and N. Bastian, “Adversarial machine learning in network intrusion detection systems,” Expert Systems with Applications, vol. 186, p. 115782, 2021.
[33] Y. Peng, G. Fu, Y. Luo, J. Hu, B. Li, and Q. Yan, “Detecting adversarial examples for network intrusion detection system with gan,” in 2020 IEEE 11th International Conference on Software Engineering and Service Science (ICSESS), pp. 6–10, IEEE, 2020.
[34] J. Wang, J. Pan, I. AlQerm, and Y. Liu, “Def-ids: An ensemble defense mechanism against adversarial attacks for deep learning-based network intrusion detection,” in 2021 International Conference on Computer Communications and Networks (ICCCN), pp. 1–9, IEEE, 2021.
[35] B.-E. Zolbayar, R. Sheatsley, P. McDaniel, M. J. Weisman, S. Zhu, S. Zhu, and S. Krishnamurthy, “Generating practical adversarial network traffic flows using nidsgan,” arXiv preprint arXiv:2203.06694, 2022.
[36] M. Abdelaty, S. Scott-Hayward, R. Doriguzzi-Corin, and D. Siracusa, “Gadot: Gan-based adversarial training for robust ddos attack detection,” in 2021 IEEE Conference on Communications and Network Security (CNS), pp. 119–127, IEEE, 2021.
[37] M. R. Shahid, G. Blanc, H. Jmila, Z. Zhang, and H. Debar, “Generative deep learning for internet of things network traffic generation,” in 2020 IEEE 25th Pacific Rim International Symposium on Dependable Computing (PRDC), pp. 70–79, IEEE, 2020.
[38] S. K. Nukavarapu, M. Ayyat, and T. Nadeem, “Miragenet-towards a gan-based framework for synthetic network traffic generation,” in GLOBECOM 2022-2022 IEEE Global Communications Conference, pp. 3089–3095, IEEE, 2022.
[39] Y. Yin, Z. Lin, M. Jin, G. Fanti, and V. Sekar, “Practical gan-based synthetic ip header trace generation using netshare,” in Proceedings of the ACM SIGCOMM 2022 Conference, pp. 458–472, 2022.
[40] S. Hui, H. Wang, Z. Wang, X. Yang, Z. Liu, D. Jin, and Y. Li, “Knowledge enhanced gan for iot traffic generation,” in Proceedings of the ACM Web Conference 2022, pp. 3336–3346, 2022.
[41] L. Dinh, J. Sohl-Dickstein, and S. Bengio, “Density estimation using real nvp,” arXiv preprint arXiv:1605.08803, 2016.
[42] D. P. Kingma, T. Salimans, R. Jozefowicz, X. Chen, I. Sutskever, and M. Welling, “Improved variational inference with inverse autoregressive flow,” Advances in Neural Information Processing Systems, vol. 29, 2016.
[43] I. Kobyzev, S. J. Prince, and M. A. Brubaker, “Normalizing flows: An introduction and review of current methods,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 43, no. 11, pp. 3964–3979, 2020.
[44] H. Deng and X. Li, “Anomaly detection via reverse distillation from one-class embedding,” in IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 9737–9746, 2022.
[45] S. Akcay, D. Ameln, A. Vaidya, B. Lakshmanan, N. Ahuja, and U. Genc, “Anomalib: A deep learning library for anomaly detection,” 2022.
[46] N. A. Ahuja, I. Ndiour, T. Kalyanpur, and O. Tickoo, “Probabilistic modeling of deep features for out-of-distribution and adversarial detection,” arXiv preprint arXiv:1909.11786, 2019.
[47] J. Yu, Y. Zheng, X. Wang, W. Li, Y. Wu, R. Zhao, and L. Wu, “Fastflow: Unsupervised anomaly detection and localization via 2d normalizing flows,” arXiv preprint arXiv:2111.07677, 2021.
[48] T. Defard, A. Setkov, A. Loesch, and R. Audigier, “Padim: a patch distribution modeling framework for anomaly detection and localization,” in International Conference on Pattern Recognition Workshops, pp. 475–489, 2021.
[49] G. Wang, S. Han, E. Ding, and D. Huang, “Student-teacher feature pyramid matching for unsupervised anomaly detection,” arXiv preprint arXiv:2103.04257, 2021.
[50] D. Gudovskiy, S. Ishizaka, and K. Kozuka, “Cflow-ad: Real-time unsupervised anomaly detection with localization via conditional normalizing flows,” in IEEE/CVF Winter Conference on Applications of Computer Vision, pp. 98–107, 2022.
[51] A. H. Lashkari, G. Draper-Gil, M. S. I. Mamun, and A. A. Ghorbani, “Characterization of tor traffic using time based features,” in International Conference on Information Systems Security and Privacy, pp. 253–262, 2017.
[52] G. Draper-Gil, A. H. Lashkari, M. S. I. Mamun, and A. A. Ghorbani, “Characterization of encrypted and vpn traffic using time-related features,” in International Conference on Information Systems Security and Privacy, pp. 407–414, 2016.
[53] D. Community, “Datacon open dataset-datacon2020-encrypted malicious traffic dataset direction open dataset.” https://datacon.qianxin.com/opendata/openpage?resourcesId=6 2021-11-11.
[54] M. Lotfollahi, M. Jafari Siavoshani, R. Shirali Hossein Zade, and M. Saberian, “Deep packet: A novel approach for encrypted traffic classification using deep learning,” Soft Computing, vol. 24, no. 3, pp. 1999–2012, 2020.
[55] J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li, and L. Fei-Fei, “Imagenet: A large-scale hierarchical image database,” in 2009 IEEE conference on computer vision and pattern recognition, pp. 248–255, IEEE, 2009.
[56] G. Pang, C. Shen, L. Cao, and A. V. D. Hengel, “Deep learning for anomaly detection: A review,” ACM computing surveys (CSUR), vol. 54, no. 2, pp. 1–38, 2021.
[57] S. Xu, M. Marwah, M. Arlitt, and N. Ramakrishnan, “Stan: Synthetic network traffic generation with generative neural models,” in Deployable Machine Learning for Security Defense: Second International Workshop, MLHat 2021, Virtual Event, August 15, 2021, Proceedings 2, pp. 3–29, Springer, 2021.
[58] G. Aceto, D. Ciuonzo, A. Montieri, and A. Pescapé, “Mobile encrypted traffic classification using deep learning: Experimental evaluation, lessons learned, and challenges,” IEEE Transactions on Network and Service Management, vol. 16, no. 2, pp. 445–458, 2019.