Semi-Supervised Learning For Anomaly Traffic Detection Via Bidirectional Normalizing Flows
Zhangxuan Dang1 , Yu Zheng1 ,Xinglin Lin1 ,Chunlei Peng1, Qiuyu Chen2 , Xinbo Gao3
1 Xidian University
2 Amazon
3 Chongqing University of Posts and Telecommunications
arXiv:2403.10550v1 [cs.LG] 13 Mar 2024

Abstract: With the rapid development of the Internet, various types of anomaly traffic are threatening network security. We consider the problem of anomaly network traffic detection and propose a three-stage anomaly detection framework using only normal traffic. Our framework can generate pseudo
anomaly samples without prior knowledge of anomalies to achieve the detection of anomaly data.
Firstly, we employ a reconstruction method to learn the deep representation of normal samples.
Secondly, these representations are normalized to a standard normal distribution using a bidirec-
tional flow module. To simulate anomaly samples, we add noises to the normalized representations
which are then passed through the generation direction of the bidirectional flow module. Finally, a
simple classifier is trained to differentiate the normal samples and pseudo anomaly samples in the
latent space. During inference, our framework requires only two modules to detect anomalous sam-
ples, leading to a considerable reduction in model size. According to the experiments, our method
achieves state-of-the-art results on common benchmarking datasets for anomaly network traffic detection. The code is available at https://github.com/ZxuanDang/ATD-via-Flows.git
1 Introduction
With the development of the Internet, the proliferation of devices has led to explosive growth in Internet traffic, which poses significant challenges to the management of network resources and the assurance of network security. In particular, the increasing complexity and diversity of network attacks require
Figure 2: An overview of our framework for anomaly detection. c corresponds to the representation of normal packets in
the standard normal space, and η corresponds to the noise vector sampled from a Gaussian Distribution. Feature Extractor
is trained to perform deep feature extraction on one-dimensional normal packets. Bidirectional Flow Module is trained
to normalize the representation of normal packets to a standard normal distribution. During the training of Classifier, the
representation of normal packets is normalized to a standard normal distribution. In the standard normal space, we introduce
noise sampled from a Gaussian Distribution to the normalized representation, and then simulate the representation of anomaly
traffic through the generation direction. Classifier is trained to distinguish the representation of normal packets z and the
simulated representation of anomaly packets ẑ, enabling efficient anomaly detection. In the inference phase, our method can
achieve good anomaly detection by maintaining only two modules, which greatly reduces the size of the model.
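As a minimal, runnable sketch of the data flow in Figure 2 (assuming NumPy; the orthogonal matrix below is a toy stand-in for the real coupling-block flow module, and the mu = -9, sigma = 5 noise pair is just one of the settings explored later in Table 3):

```python
import numpy as np

rng = np.random.default_rng(0)
d = 70  # hidden vector size, matching our implementation details

# Toy stand-in for the bidirectional flow module: an orthogonal linear map
# is trivially invertible, unlike the real module's 8 coupling blocks.
W, _ = np.linalg.qr(rng.standard_normal((d, d)))

def normalize(h):       # normalization direction: features -> standard normal space
    return h @ W

def generate(c):        # generation direction: the exact inverse map
    return c @ W.T      # the inverse of an orthogonal matrix is its transpose

h = rng.standard_normal((8, d))                  # normal packet features (placeholder)
c = normalize(h)                                 # representations in the space V
eta = -9.0 + 5.0 * rng.standard_normal(c.shape)  # Gaussian noise, as in Eq. 8
z, z_hat = h, generate(c + eta)                  # normal vs. pseudo anomaly vectors
```

A classifier would then be trained on z (label 0) against z_hat (label 1), which is the final stage of the framework.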
the feature extracted by the feature extractor as input, and we maximize Eq. 5 to train this module through the normalization direction. After training, the bidirectional flow module can map the features of normal samples to the standard normal distribution space V. For normal samples, the flow module maps their features to the standard normal distribution. However, the situation is different for anomaly samples. During the feature extraction phase, the pre-trained feature extractor has only been trained on normal samples. As a result, the features extracted from anomaly samples are likely to deviate from the distribution of normal features. The bidirectional flow module is also trained on normal samples, which means that the anomaly features, deviating from the normal sample distribution, may fall outside the standard normal distribution in the space V, as shown in Figure 4. To simulate the distribution of anomaly samples, we introduce noise into normal samples in the standard normal space V. For simplicity, we randomly sample from Gaussian distributions to generate noise, using the reparameterization trick:

η = µ + σ ⊙ ε    (8)

where µ is the mean vector, σ is the standard deviation vector, and ε is random noise sampled from the standard normal distribution. By adjusting µ and σ, we can generate noise samples from different Gaussian distributions.

Then we use the generation direction of the bidirectional flow module to transform the simulated anomaly samples in the normal distribution space V, resulting in vectors in the latent space. In this way, we have the representation vectors of both normal and anomaly samples in the latent space. In the field of computer vision, operations such as CutPaste [11] are applied to images of normal samples to obtain synthetic anomaly samples. However, when it comes to traffic packets, it is challenging to obtain anomaly samples that closely resemble the real network environment through image processing techniques, because traffic packets lack spatial semantic information. Therefore, it is necessary to use the bidirectional flow module to map the feature vectors to the standard normal space V to construct anomaly samples. In the standard normal space V, we can manipulate the attributes of the vectors to bring them closer to real anomaly traffic samples in the latent space. This allows for the generation of synthetic anomaly samples that resemble real-world anomaly traffic patterns.

3.3 Classifier

After obtaining normal and pseudo anomaly sample features, it is natural to use a classifier for anomaly detection. We employ only a simple classifier to separate the obtained normal and anomaly feature vectors. We aim to improve detection performance on real anomaly samples by encouraging the classifier to focus more on real normal samples. To achieve this, we reduce the number of anomaly samples to half that of the normal samples, which prevents the model from overemphasizing the features of the pseudo anomaly samples. In the testing phase, we only need to combine the classifier
Methods                     VPN     TOR     DataCon
Reverse Distillation [44]   0.6116  0.7450  0.6762
DFKDE [45]                  0.5907  0.7356  0.3969
DFM [46]                    0.7156  0.7514  0.6744
DRAEM [12]                  0.5698  0.7028  0.6479
FastFlow [47]               0.6195  0.6689  0.6571
PADIM [48]                  0.6726  0.7516  0.6768
PatchCore [28]              0.7058  0.7434  0.4605
STFPM [49]                  0.5657  0.7371  0.6292
CFlow [50]                  0.5433  0.7025  0.5850
GANomaly [27]               0.6239  0.7823  0.6871
GANomaly 1d                 0.5913  0.7166  0.6884
Ours                        0.8658  0.8458  0.7292

Table 1: Anomaly detection AUROC of state-of-the-art methods on the DataCon2020, ISCX VPN and non-VPN, and UNB-CIC Tor and non-Tor datasets. We set up different random seeds for three experiments and report the average results. Our method achieves the best detection performance on each dataset.

Model                       Params (M)  FLOPs (G)
PADIM [48]                  2.78        0.05
GANomaly [27]               10.73       0.65
GANomaly 1d                 45.89       11.94
FastFlow [47]               7.46        0.13
DRAEM [12]                  97.43       3.11
Reverse Distillation [44]   80.61       0.61
CFlow [50]                  6.45        0.15
STFPM [49]                  5.57        0.10
Ours                        3.91        0.02

Table 2: Comparison of model parameters and FLOPs in the inference phase. Our approach has the lowest FLOPs and the best detection performance, while also having a small number of model parameters. The effectiveness of the PADIM method depends on the pre-trained model employed; its size increases with the capability of that pre-trained model.
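The "Params (M)" column in Table 2 is just a sum of tensor sizes; a framework-agnostic sketch (with toy layer shapes, not our actual architecture) is:

```python
import numpy as np

def param_count_m(weights) -> float:
    """Total number of parameters across weight arrays, in millions."""
    return sum(w.size for w in weights) / 1e6

# Toy stand-in: a 1600 -> 70 linear encoder plus a 70 -> 2 linear classifier,
# mirroring the two modules kept at inference time in our framework.
weights = [
    np.zeros((1600, 70)), np.zeros(70),  # encoder weight and bias
    np.zeros((70, 2)), np.zeros(2),      # classifier weight and bias
]
print(f"{param_count_m(weights):.4f} M")  # 0.1122 M for this toy pair
```

FLOPs, by contrast, are usually measured with a profiling tool rather than computed by hand.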
and encoder GE for anomaly detection, significantly reducing the model's parameters and enhancing its deployment flexibility. We detect anomaly traffic based on the classification results.

4 Experimental Results

4.1 Datasets

We have selected three widely used network traffic datasets for our experiments.

The "UNB-CIC Tor and non-Tor" dataset, captured by Arash et al. [51], is collected using Wireshark and Tcpdump. The dataset includes both regular and Tor traffic captured from the workstation and gateway, encompassing 14 categories of traffic such as Chat, Streaming, Email, and others.

The "ISCX VPN and non-VPN" dataset, captured by Gerard et al. [52], is collected using Wireshark and tcpdump. During the capturing process, only the packets with the target IP were captured. The dataset comprises a total of 14 categories for regular and VPN traffic, including File Transfer, P2P, and more.

The "DataCon2020" dataset [53] is derived from malicious and benign software collected between February and June 2020. The traffic was generated by sandbox collection from Qi An Xin. The dataset defines malicious traffic as encrypted traffic generated by malware, while the benign traffic represents encrypted traffic generated by benign software.

Our datasets include encrypted and unencrypted traffic datasets, as well as benign and malicious traffic datasets. In some scenarios, encrypted traffic is not allowed to be used, so we naturally define encrypted traffic and malicious traffic as anomaly traffic, and non-encrypted traffic and benign traffic as normal traffic. Our training set consists only of normal samples. We randomly select 10,000 normal samples from one dataset to form our training set. Subsequently, we obtain our testing set by randomly selecting 5,000 samples from the remaining pool of normal samples and another 5,000 samples randomly drawn from the abnormal samples. This process is replicated for the training and testing sets of the other two datasets.

4.2 Pre-processing

Traffic cleaning  We first remove packets in the datasets that may introduce confusion in anomaly detection. The DNS protocol is used to map domain names and IP addresses to each other. Both normal and anomaly traffic obtain corresponding IP addresses through the DNS protocol prior to conducting activities. These traffic packets are not directly associated with normal and anomaly characteristics and do not contribute to anomaly detection. The TCP protocol has a series of stages related to connections, including connection establishment, termination, and confirmation. These packets often do not contain any actual payload; they are only associated with the connection and have no direct relevance to specific activities. So we remove both types of packets [54]. In addition, the ARP protocol is responsible for the mapping between IP addresses and MAC addresses, but it is often not directly associated with the activities of upper-layer users. Therefore, we also remove packets related to the ARP protocol.

Then we process the structure information in the packet. The packet header of the data link layer is often responsible for managing the transmission of specific physical links and cannot provide sufficient information for anomaly detection. Therefore, we remove the data link layer header. In
(a) Tor and non-Tor dataset (b) VPN and non-VPN dataset (c) DataCon2020 dataset
Figure 3: Histogram with density curve. We plot the detection result histogram of the samples in the testing sets of the three
datasets. The curve represents the kernel density estimation of the results. Our method is more effective in distinguishing
between anomaly and normal traffic on the ”UNB-CIC Tor and non-Tor” and the ”ISCX VPN and non-VPN” datasets.
Although the results of distinguishing normal and anomaly traffic on the ”DataCon2020” dataset are not satisfactory, they are
still better than those achieved by other methods.
the header of the IP protocol, there are two fields, the destination IP address and the source IP address. These two fields can summarize a series of data communications between hosts. Detecting anomaly traffic based only on IP addresses is a shortcut rather than true learning. Therefore, we anonymize the IP addresses of all packets.

Traffic encoding  We use byte encoding to process the packets by converting the individual bytes in the packets into corresponding decimal numbers. This allows us to obtain a one-dimensional array for each packet. To ensure uniformity and facilitate model comprehension, we fill the optional fields of the IP and TCP protocols. In addition, there are two protocols, TCP and UDP, in the transport layer. We have also padded the header of the UDP protocol to match the header length of TCP, ensuring consistency between the two protocols. The neural network model requires us to unify the length of all packets. Taking into account our statistical analysis of packet lengths and the preprocessing requirements for comparative experiments, we set the length of the one-dimensional packet array to 1600 bytes. Subsequently, we normalize the packets to a range of 0 to 1. Finally, we combine the processed packets with the corresponding labels and store them in CSV format.

4.3 Implementation Details

Our model is trained for 100 epochs with early stopping. For the generator G and discriminator D in the feature extractor, we use two Adam optimizers, each with a learning rate of 0.001 and betas of 0.5 and 0.999. The weight w_adv in the feature extractor loss is set to 1, while w_rec is set to 50. Additionally, we set the size of the hidden vector extracted by the feature extractor to 70. The bidirectional flow module consists of 8 coupling blocks; to train it, we employ the Adam optimizer with a learning rate of 0.001 and betas of 0.5 and 0.999. The two parameters, µ and σ, used for generating noise are determined through experimental analysis. When training the classifier, we utilize the Adam optimizer with a learning rate of 0.001 and betas of 0.5 and 0.999. These parameters have been determined through a process of grid search and experiments.

4.4 Comparison with Existing Methods

We analyze anomaly network traffic through our anomaly detection framework. To demonstrate the superiority of our framework, we have compared it extensively with state-of-the-art methods in the field of anomaly detection, including knowledge distillation based [44, 49], reconstruction based [12, 27], normalizing flow based [47, 50], memory matching based [28], and distribution learning based methods [12, 46, 48]. The methods used for comparison can be found in [45], including the code and settings employed. For the sake of fairness, we evaluate some methods on the three datasets by transforming the preprocessed packets from a one-dimensional structure into two-dimensional grayscale images, since these methods employ pre-trained feature extractors specifically designed for two-dimensional images. In addition, we have modified the model in [27] to obtain a new method specifically for processing one-dimensional structured packets, which we refer to as GANomaly 1d and provide within the code. We run experiments 5 times with different random seeds and report the mean AUC.

As shown in Table 1, we can clearly see that on the "UNB-CIC Tor and non-Tor" and "ISCX VPN and non-VPN" datasets, our method achieves the best results and leads the second-ranked methods by a significant margin of 6% and 15%, respectively. In knowledge distillation based anomaly detection methods [44, 49], the difference between the representations of the anomaly samples by the teacher model and the student model is used to detect the anomaly. However, the detection is not effective, because the teacher model, pre-trained on ImageNet [55], has never seen network traffic samples. Additionally, the conversion of the packets from their original one-dimensional semantic structure to a two-dimensional format disrupts their inherent structure; the added dimension contributes no additional information. Feature extraction is commonly employed in anomaly detection [56]. Our method employs a feature extractor that is pre-trained specifically on normal one-dimensional network traffic packets. We design the unsupervised feature extractor to represent network traffic packets effectively. This approach avoids directly generating raw packets, which would destroy the data structure and produce meaningless samples, as demonstrated in Sec-
µ      σ     TOR     VPN     DataCon
-100   0.1   0.6344  0.4598  0.6850
-100   0.5   0.5888  0.5771  0.6906
-100   5     0.6766  0.7582  0.7292
-100   15    0.6051  0.7221  0.7137
-25    0.1   0.6482  0.7609  0.4488
-25    0.5   0.7189  0.8191  0.5752
-25    5     0.8458  0.7732  0.7146
-25    15    0.8161  0.7891  0.7092
-10    0.1   0.6553  0.8201  0.6893
-10    0.5   0.6138  0.8414  0.6023
-10    5     0.5583  0.8602  0.6494
-10    15    0.7696  0.7892  0.5721
-9     0.1   0.6526  0.8118  0.6971
-9     0.5   0.7909  0.8155  0.6710
-9     5     0.5763  0.8658  0.3903
-9     15    0.6485  0.7780  0.7062
-5     0.1   0.6523  0.7979  0.6531
-5     0.5   0.5938  0.7042  0.5486
-5     5     0.6146  0.4137  0.6834
-5     15    0.7199  0.7368  0.6655
5      0.1   0.7695  0.7910  0.7058
5      0.5   0.5749  0.8082  0.6985
5      5     0.7004  0.7707  0.6828
5      15    0.7418  0.8076  0.6926
9      0.1   0.6364  0.6703  0.6740
9      0.5   0.6331  0.8160  0.7020
9      5     0.6545  0.7989  0.6924
9      15    0.6471  0.7427  0.6697
10     0.1   0.4361  0.8370  0.7207
10     0.5   0.6484  0.4467  0.7026
10     5     0.7855  0.7529  0.6966
10     15    0.6094  0.8002  0.6836
25     0.1   0.5958  0.6327  0.6486
25     0.5   0.4326  0.7602  0.7101
25     5     0.8331  0.5499  0.6550
25     15    0.5875  0.8542  0.6755
100    0.1   0.7961  0.1944  0.6966
100    0.5   0.6590  0.7584  0.6836
100    5     0.5877  0.8143  0.6998
100    15    0.5606  0.8347  0.6805

Table 3: Detection performance with different noise distributions. The noise distribution determines the generated pseudo anomaly samples. Through the reparameterization trick, we can easily obtain different noise distributions. We explore the effects of different noise distributions on the three datasets.
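Each (µ, σ) cell in Table 3 corresponds to noise drawn via Eq. 8; a minimal NumPy version (illustrative only, shown with the TOR-best setting µ = -25, σ = 5) is:

```python
import numpy as np

def sample_noise(mu, sigma, shape, seed=0):
    """Reparameterization trick (Eq. 8): eta = mu + sigma * eps, eps ~ N(0, I)."""
    eps = np.random.default_rng(seed).standard_normal(shape)
    return mu + sigma * eps

eta = sample_noise(mu=-25.0, sigma=5.0, shape=(10000, 70))
print(round(eta.mean(), 1), round(eta.std(), 1))  # approximately -25.0 and 5.0
```

Sweeping the table simply means calling this with each (mu, sigma) pair and retraining the classifier.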
tion 4.5.3. The extracted features are more beneficial for enhancing the performance of downstream anomaly detection tasks.

A traffic packet is a one-dimensional data structure containing specific semantic fields but lacking spatial semantics [6]. For methods that employ generated anomaly samples for anomaly detection, such as DRAEM [12], introducing noise into normal packet images is meaningless, as it destroys the semantic fields of the original packet and does not effectively generate anomaly samples, resulting in poor results. Current network traffic generation research [39, 57] also suffers from problems such as the difficulty of training GANs and the need to collect a large number of samples; our method is not affected by these issues. Our method maps the normal samples to the space of the standard normal distribution and simulates the anomaly samples by manipulating them in this space. This avoids disrupting the specific semantic fields of the original data packet and allows a closer semantic alignment with real anomaly samples. At the same time, we do not need to collect anomaly samples or a priori knowledge of anomaly patterns, but only simulate anomaly samples from normal samples and randomly sampled noise. By adjusting the distribution of the random noise, we are able to simulate different anomaly samples. On the "DataCon2020" dataset, our method also achieves the best results, but the overall performance of all methods is not particularly outstanding. We believe that the diversity of categories and the insufficient number of samples in each category in this dataset pose challenges for every detection method. In addition, as shown in Figure 4, the similarity between normal packet samples and abnormal packet samples in the DataCon dataset is relatively high, posing a challenge for the model to detect abnormal samples.

As shown in Figure 3, we draw the histogram of the detection results on the test sets of the three datasets to visualize the detection effect of our model. Our model demonstrates excellent performance in distinguishing between normal and anomaly traffic on both the "UNB-CIC Tor and non-Tor" and the "ISCX VPN and non-VPN" datasets. Additionally, on the "DataCon2020" dataset, our model currently exhibits the best detection performance, although there is still potential for further improvement.

We also compare the model sizes and the FLOPs of the different methods during inference, as shown in Table 2. Our approach is effective for detecting network traffic anomalies in deployment environments with limited computing power. After training our model, anomaly detection can be achieved by retaining only the encoder part of the feature extractor and the classifier in the inference process; the rest of the modules are only used to support training. Unlike other normalizing flow based methods [47, 50],
our approach does not require the flow module to compute anomaly scores; it serves only as a module for synthesizing anomaly samples during the training process.

We employ t-SNE to project the features of the normal, anomaly, and synthetic anomaly samples into a 2D space, as shown in Figure 4. Ideally, we hope that the simulated anomaly samples overlap well with the real anomaly samples. However, this would require spending more time carefully designing the distribution of the noise. For simplicity, we have tried sampling the noise from a random Gaussian distribution and changing the normal sample properties with it. While the synthetic anomaly features may not perfectly simulate real anomaly samples, the model improves its discrimination of normal samples by distinguishing between normal and synthetic anomaly samples.

4.5 Ablation Study

4.5.1 Adopting Different Noise Distributions

For our method, the noise distribution plays a particularly critical role, as it directly determines the quality of the generated samples. In [20], the authors determine the direction of attribute change by the difference between two sample vectors with specific attributes. Since our model is trained only on normal samples, we can only guide the generation of simulated abnormal samples by trying random noise. This aspect is crucial for training classifiers effectively.

We have tried different combinations of µ and σ, and the experimental results are shown in Table 3. They show that our model is sensitive to the parameters of the noise distribution, which is in line with our expectations. The distribution of the simulated anomaly samples in the standard normal distribution space is difficult to determine, as it depends on the noise distribution from which the noise is sampled. We simulate the anomaly samples by manipulating the normal samples, thereby enabling the classifier to show excellent recognition capability for normal samples. In addition, we can easily generate various types of anomaly samples simply by changing the noise distribution.

4.5.2 Ablating Bidirectional Flow Module

Generating anomalous samples is crucial to our approach, and the bidirectional flow module is able to fit more complex distributions by combining multiple invertible transformations, resulting in high-quality synthetic samples. In addition, by training this module we are able to map normal samples to a specified distribution, while anomaly samples not seen by the module will be mapped outside of the distribution, thus simulating abnormal samples that deviate from the normal samples.

We remove the bidirectional flow module and directly introduce noise to the latent vectors extracted by the feature extractor to simulate anomaly samples. Then, the simulated samples are fed into the classifier for detection. We explore different combinations of µ and σ, and the experimental results are shown in Table 4. It can be observed that directly introducing noise to the latent vectors to simulate anomaly samples does not achieve satisfactory detection results. The latent vectors can affect the properties of the samples, but we have difficulty in achieving the semantic conversion from normal to abnormal samples through random noise, which may require careful design of the vectors. Manipulating vectors in the standard normal space proves to be more effective for altering the properties of network traffic.

4.5.3 Ablating Feature Extractor

We conduct training using normal samples to acquire a feature extractor capable of effective feature extraction on packets of network traffic. Autoencoders are often used for extracting features from network traffic [13, 58]. However, these features are typically deep representations of manually extracted features. This approach may overlook potential data connections within the packets. Our feature extractor introduces a discriminator to improve the capability of the autoencoder, which is more helpful in generating dense sample features.

In our feature extractor ablation experiments, we remove the feature extractor and directly feed the preprocessed one-dimensional packets into the bidirectional flow module. The bidirectional flow module is trained to map the one-dimensional packet vectors to the standard normal distribution space. In that space, noise is introduced to the vectors, and through the generation direction, simulated anomaly samples are obtained. Both normal samples and simulated anomaly samples are then fed into the classifier for detection. The results obtained by retraining the parameters are shown in Table 4. The detection performance when directly using one-dimensional packet vectors is poor on all three datasets. We believe that the packet vectors without feature extraction may contain a significant amount of redundant information, which hampers model learning by lacking concentrated information. Consequently, this leads to poor performance.

4.5.4 Adjusting the Ratio of Samples

For our model, the aim is for the classifier to learn how to distinguish normal samples, with a learning focus on such samples. In addition, we think that if the number of synthetic abnormal samples is increased, the model may shift its learning focus.

We adjust the number of generated anomaly samples to demonstrate the impact of different ratios of normal samples to anomaly samples on the model. In our method, the number of synthetic anomaly samples is half the number of normal samples. The other two comparative settings involve maintaining an equal number of normal and anomaly samples, and having twice the number of anomaly samples compared to normal samples. The experimental results are shown in Table 5. Compared to our method, when the number of anomaly samples increases to equal the number of normal samples, there is a slight decrease in performance on all three datasets. As the number of anomaly samples continues to increase to twice the number of normal samples, the performance decreases further. Changing the ratio of samples has an impact on the classifier. We expect the classifier to distinguish between normal and anomaly samples. However, when the number of anomaly samples increases, the classifier tends to pay excessive attention to
(a) Tor and non-Tor dataset (b) VPN and non-VPN dataset (c) DataCon2020 dataset
Figure 4: T-SNE visualization of representations in latent space. We plot the features of the normal, anomaly, and synthetic
anomaly samples. It can be seen that our synthetic anomaly samples do not overlap well with real anomaly samples, but they
are significantly different from normal samples. The model learns how to accurately identify normal traffic by distinguishing
between them.
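Figure-4-style projections can be reproduced with scikit-learn's t-SNE; the latent vectors below are Gaussian placeholders, not our model's actual features:

```python
import numpy as np
from sklearn.manifold import TSNE

rng = np.random.default_rng(0)
feats = np.concatenate([
    rng.normal(0.0, 1.0, (100, 70)),  # normal latent vectors (placeholder)
    rng.normal(4.0, 1.0, (100, 70)),  # real anomaly latent vectors (placeholder)
    rng.normal(3.0, 1.0, (100, 70)),  # synthetic anomaly latent vectors (placeholder)
])
emb = TSNE(n_components=2, perplexity=30, random_state=0).fit_transform(feats)
print(emb.shape)  # (300, 2)
```

The three groups can then be scattered in 2D with one color per group.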
the anomaly samples, resulting in a decrease in detection performance.

4.5.5 Generalizing across the Datasets

In a real network traffic detection scenario, the trained model will be faced with a large number of unknown network traffic packets, both normal and anomaly traffic. We explore the detection ability of the model when faced with test samples from an unknown distribution, and so further investigate the generalization ability across datasets. While the anomaly samples in the three datasets are different, the normal samples are all captured from normal activities and have similarities. We further study the generalization of our method across different datasets. Both the "UNB-CIC Tor and non-Tor" and "ISCX VPN and non-VPN" datasets consist of encrypted and unencrypted traffic. We extend our experiments on one of these datasets and the "DataCon2020" dataset. We train our model on one dataset and test on another to assess its ability to generalize to unseen anomaly samples. As shown in Table 6, when training our model on one dataset and testing it on another, the results show a slight drop, but remain within acceptable limits. This shows that our method can achieve relatively good detection results even in the presence of unknown anomaly traffic in real detection environments. Our approach improves the model's ability to identify normal network traffic by classifying pseudo anomaly traffic, and it is effective across different data distributions.

5 Conclusion

In this paper, we propose a three-stage framework for anomaly traffic detection that involves generating simulated
anomaly:normal = 1:1
µ      σ     VPN     TOR     DataCon
-100   5     0.3333  0.6416  0.6771
-25    5     0.8419  0.5984  0.6985
-20    10    0.7559  0.6317  0.6945
-10    5     0.7898  0.6274  0.6837
-10    1     0.6979  0.6394  0.7031
-9     5     0.8019  0.6781  0.6895
-9     1     0.7897  0.7820  0.6981
-5     5     0.7233  0.4607  0.6775
5      1.5   0.7899  0.7173  0.7137
5      15    0.5487  0.5469  0.6912
9      0.1   0.7639  0.5671  0.7001
9      1     0.3962  0.7811  0.6576
10     5     0.8308  0.6906  0.6855
20     1     0.7555  0.7493  0.6429
25     5     0.8197  0.6163  0.5783
Ours   -9/-25/-100, 5/5/5   0.8658  0.8458  0.7292

anomaly:normal = 2:1
µ      σ     VPN     TOR     DataCon
-100   5     0.2876  0.3372  0.6693
-25    5     0.6676  0.6493  0.6827
-20    10    0.7154  0.4850  0.7059
-10    5     0.5435  0.5279  0.6670
-10    1     0.3853  0.7602  0.6894
-9     5     0.8445  0.5985  0.6731
-9     1     0.8083  0.6944  0.6915
-5     5     0.7969  0.6685  0.6545
5      1.5   0.8115  0.5829  0.6892
5      15    0.6553  0.5507  0.5893
9      0.1   0.7880  0.6879  0.6967
9      1     0.8460  0.6487  0.6999
10     5     0.7889  0.5456  0.6831
20     1     0.6377  0.7505  0.7047
25     5     0.7645  0.6256  0.6793
Ours   -9/-25/-100, 5/5/5   0.8658  0.8458  0.7292

Table 5: Detection performance for different ratios of normal samples to abnormal samples. We achieve different ratios by altering the quantity of pseudo anomaly samples during training. The bottom row of each block shows our method, in which pseudo abnormal samples account for half of the normal samples; its µ values (-9/-25/-100) and σ values (5/5/5) are the settings used for the VPN, TOR, and DataCon datasets, respectively.
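The ratios compared in Table 5 amount to subsampling the pseudo anomaly pool before training the classifier; a sketch of our 1:2 setting, with placeholder latent vectors, is:

```python
import numpy as np

rng = np.random.default_rng(0)
normal = rng.standard_normal((10000, 70))  # normal latent vectors (placeholder)
pseudo = rng.standard_normal((10000, 70))  # pseudo anomaly latent vectors (placeholder)

ratio = 0.5  # anomaly:normal = 1:2, the setting used by our method
k = int(len(normal) * ratio)
idx = rng.choice(len(pseudo), size=k, replace=False)
X = np.concatenate([normal, pseudo[idx]])
y = np.concatenate([np.zeros(len(normal)), np.ones(k)])
print(X.shape, int(y.sum()))  # (15000, 70) 5000
```

Setting ratio to 1.0 or 2.0 reproduces the other two columns of the table.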