Final Report
DATA5703
Group Members
Our group, undertaking project CS40-1, with group members Wenbo Yan, Ling Nga Meric Tong, Xia Wei, Ruijue Zou and Sophie Zou, would like to state the contributions each group member made to this project during Semester 1, 2021:
• Wenbo Yan:
– Coding:
Prepared the datasets for Devices #5, #8 and #9;
Worked on EDA for some of the devices;
Worked with Jane on building the DAE for some of the devices;
Tuned different hyper-parameters for the local model on Device #9;
Ran different device combinations of the training model for experiment comparison;
Initial implementation of the FedAvgM aggregation
– Researched and summarised related studies of DAE and FL models at the starting stage
– Proposal Report: Introduction section, Methodology - GAN section
– Progress Report: Obstacles section
– Final Report: Introduction section, Methodology - DAE model architecture, FedAvg with momentum, client selection and retraining process
– Client meetings: took minutes in week 6; presented the architectures and training process of the models to the clients
• Xia Wei:
– Researched multi-classification
– Coding: worked on the PCA distribution of the camera devices; worked on the initial implementation of FedAvg with Wenbo; worked on the initial implementation of the FedAvgM aggregation with Meric; ran different FL method combinations to get results; ran different training-device combinations to get results
– Proposal Report: Methodology - CNN/Multi-class; Implications
– Progress Report: Deviation to Timeline; part of the milestone table
– Final Report: Abstract; Results
• Ruijue Zou:
– Different combinations of techniques
– Organised experiment records
• Sophie Zou:
All group members agreed on the contributions listed in this statement by each group member.
Signatures:
ABSTRACT
With the increase in mobile devices and the development of the IoT in recent years, more attention has been paid to network traffic attacks. In traditional anomaly detection methods, all the data need to be collected by a single agency to train the model, which is likely to lead to data leakage. Federated learning is a good solution to this problem. The focus of this study is to improve the security and generalisability of a traditional anomaly detection model through Federated Learning, and thereby address the problems of data leakage, non-IID data, and model training efficiency and performance. The techniques implemented in this project range from a Deep Auto-encoder model to different Federated Learning methods. Because each client has different characteristics, different equipment can affect overall model performance; Federated Learning is therefore important, as it can improve overall efficiency. In addition, Federated Learning protects client privacy: clients only need to share the parameters of the trained model and do not need to share their actual data. To deal with the issues caused by the non-IID property of the data, a retraining method is used to produce a more stable federated anomaly detection model; this study also implements different model methods and training devices and performs different experimental combinations. Across all experiment combinations, it is found that FedAvgM performs better than the other FL aggregation methods, which supports the claim that FedAvgM is more stable on non-IID datasets. Among the different method combinations, MNP (Federated Deep Auto-encoder model with FedAvgM, No Retraining and Partial Selection) has the best performance. Moreover, compared to the non-FL model, the FL model performs better, with a higher average TPR, a higher F1 score, a shorter running time and better security.
TABLE OF CONTENTS
1 Introduction
2 Related Literature
  2.1 Literature Review
    2.1.1 Traditional ways of anomaly detection
    2.1.2 Anomaly Detection using Deep Auto-encoders
    2.1.3 Anomaly Detection with Federated Learning
3 Research Problems
  3.1 Research Aims and Objectives
  3.2 Research Questions
  3.3 Research Scope
4 Methodologies
  4.1 Data Collection
  4.2 Data Analysis
  4.3 Data Pre-Processing
  4.4 Method for Anomaly Detection Locally
    4.4.1 Architecture of Deep Auto-encoder Model
    4.4.2 Detection Threshold Calculation
  4.5 Method for Anomaly Detection with Federated Learning
    4.5.1 Federated Learning with Client Selection
    4.5.2 Federated Learning with Retraining Process
    4.5.3 Aggregation in FL Model - Federated Averaging Algorithm
    4.5.4 Aggregation in FL Model - Federated Averaging Algorithm with Momentum
  4.6 Evaluation Methods
  4.7 Other Related Techniques
5 Resources
  5.1 Hardware & Software
  5.2 Materials
  5.3 Roles & Responsibilities
6 Milestones / Schedule
7 Results
  7.1 FedAvg & Retraining VS FedAvgM & Retraining
  7.2 FedAvgM & Retraining VS FedAvgM & No Retraining
  7.3 FedAvgM & No Retraining & No Partial Select Clients VS FedAvgM & No Retraining & Partial Select Clients
  7.4 Different Methods Combination in Nine Devices Results
  7.5 Devices Combinations with MNP
  7.6 Deep Auto-encoder Model VS Federated Deep Auto-encoder Model (best one)
8 Discussion
  8.1 FL Methods Combination Discussion
  8.2 Different Devices Combination Discussion
  8.3 FL vs Non-FL Comparison Discussion
9 Limitation and Future Work
References
Appendices
1 Introduction
Due to the growing usage of Internet of Things (IoT) devices, intrusion events in network traffic have also risen dramatically. Network traffic attacks are not only threats in cyber-security and fraud detection but also occur in every business that holds essential information. Nowadays, detecting abnormal events or behaviours has become the main challenge and focus for preventing and protecting against the risks of anomalous activities. To detect anomalous activities, the most common security solution is a network intrusion detection system (NIDS) that monitors all activities [1]. In general, a NIDS is a tool or piece of software placed at the network gateways and designed to detect whether traffic is likely to be an attack, in order to protect the information systems of the servers. There are different approaches combined with NIDS, such as anomaly-based systems and signature-based systems.
Signature-based and anomaly-based systems differ in one main conceptual operation: the former aims to find specific, known attacks, but it is not suitable for detecting unknown, unfamiliar malicious activities [2]. For instance, applying a multi-classification model to labelled traffic data implies that an understanding of the attack is already available; such a model will not easily recognise new anomalous events, which can lead to data leakage. On the other hand, the latter system has a great capacity for detecting unfamiliar attacks and achieves higher performance in intrusion detection analysis [2]. This paper focuses on anomaly-based detection approaches. By classifying system behaviour as either normal or anomalous, anomaly detection is based on finding unusual points or patterns in the given dataset.
In recent decades, many advanced anomaly-based approaches have been offered to enhance the efficiency of anomaly detection. Principally, all anomaly-based approaches contain three stages: parameterisation, the training stage and the detection stage, which respectively establish the observed instances, characterise the baseline behaviour and apply the trained model to analyse the test values [3]. For instance, two well-known algorithms are the Deep Auto-encoder and the KitNET algorithm. Both extract important features within a well-structured architecture, as described in Meidan et al. [4] and Mirsky et al. [1], where the auto-encoder techniques show their merits on non-linear and complex problems, and both provide advantages in classifying a given observation as an anomaly through the way the auto-encoder is set up and trained.
In this study, the main aim is to design and implement deep learning models for detecting anomalous events in the network traffic of IoT devices such as webcams, doorbells and baby monitors. Using the provided datasets of interest, our models are trained and evaluated with the confusion matrix and F1 score for detecting anomalous activities. We are motivated to detect the anomalous data by extending the existing implementation with a decentralised approach, in which detection of anomalous activities among benign instances combines the deep auto-encoder model with Federated Learning (FL). In general, federated learning is a type of privacy-preserving method which aims to train a deep learning model on multiple devices (also called clients) that process personal data, without explicitly exchanging or storing data samples. To find the best approach for anomaly detection, the federated deep auto-encoder model is trained with two different aggregations: federated averaging and federated averaging with momentum. Moreover, our proposed approach improves both the efficiency and robustness of IoT device anomaly detection compared with the non-FL model.
2 Related Literature
Many studies in past literature have attempted to solve the anomaly detection task. Anomaly detection is an important issue that has attracted much attention from experts in application domains such as computer intrusion attacks and credit card fraud, and detecting such anomalous behaviour is urgently required. A survey conducted by [5] in 2015 covered several techniques for network anomaly detection, including statistical anomaly detection, clustering-based anomaly detection, classification-based anomaly detection and information theory. This literature review will first introduce the common classification-based methods.
One widespread solution for Network Intrusion Detection Systems (NIDS) is Neural Networks (NN), since they can learn non-linear, complicated patterns and events. A Replicator Neural Network (RNN) was proposed by [9] for anomaly detection. However, NNs have some disadvantages, such as large storage requirements, long training times, expense, the complexity of handling high-velocity traffic, and the need for supervised learning and offline processing.
Although classification algorithms can handle datasets with a fixed number of supervised samples, detecting anomalous events in IoT devices using classification models might not perform well because of the discontinuous and unbalanced datasets. Several studies have proposed approaches to handle anomaly detection for devices in the IoT environment.
2.1.2 Anomaly Detection using Deep Auto-encoders:

A deep auto-encoder was first applied by [4] to detect IoT botnet attacks on real traffic data; the N-BaIoT dataset from that work is also used in this project. The deep auto-encoder model is trained on benign data only; when a significant reconstruction loss is detected, the given sample can be classified as an anomaly. In terms of anomaly detection, the false alarm rate of the deep auto-encoder is remarkably lower than other commonly used algorithms such as Isolation Forest, PCA and SVM [10]. Since the devices in the IoT environment have various functions, the researchers deploy a separate auto-encoder model for each device. The advantages of this approach to detecting IoT attacks are the ability to detect unseen attacks, efficient semi-online operation and heterogeneity tolerance.
Additionally, Kitsune [1] proposed a novel ANN-based NIDS approach that applies a group of small deep auto-encoder models instead of one large deep auto-encoder model, so that processing is unsupervised, efficient and online. The authors state that it is more efficient to train multiple small layers than a single, huge transformation with dimensionality reduction. The dataset they used is the N-BaIoT dataset. The model raises the processing rate by five times and performs better than other models such as Isolation Forest. In this study, we prioritise applying multiple deep auto-encoder models across devices to detect anomalous data, due to their outstanding performance on complicated, non-linear problems.
Similarly, [11] uses bagging of deep auto-encoders with a dynamic threshold for semi-supervised anomaly detection. The dynamic threshold is found by randomly setting a confidence level and checking which value yields the highest accuracy during validation. Compared to methods that use only one deep auto-encoder, combining multiple deep auto-encoders enhances the model's robustness. The authors extract samples from the original training set to obtain multiple sub-training sets, each learned by a separate deep auto-encoder, and the learners' predictions are combined to form the final prediction.
2.1.3 Anomaly Detection with Federated Learning:
Due to information security concerns, training on decentralised data from IoT devices has become an important research direction. Academics first proposed the concept of Federated Learning in 2016 [12]. The main purpose of federated learning is to prevent leakage of private information. Federated Learning allows each device to perform training locally and send the results to a central server, which ensures privacy. The paper introduces the Federated Averaging algorithm: a fraction of clients is selected in each round, the average gradient on each device's local data is calculated, the local update is iterated multiple times before averaging, and the weighted average of the model results from these clients is then sent to the server. This algorithm is computationally efficient but requires an immense number of training rounds to produce good results. In their experiments, the FedAvg model reaches an accuracy of 99.44% after 300 rounds of training. In addition, FedAvg is effective at optimising the training loss and is thus robust to unbalanced and non-IID data distributions.
A recent study of federated learning for anomaly detection was carried out by [13]. This paper used federated learning with an LSTM model to learn time-series IoT data. The federated learning approach is demonstrated on simulated datasets in which real data from a General Electric Current smart building are distributed evenly across clients. This paper also uses the federated averaging algorithm to train the model, repeating iteratively until the model converges or the maximum number of training rounds is reached. The training process involved inputting data from multiple heterogeneous data sources across several sensors to address the common heterogeneity challenges of IoT devices.
[14] integrated federated learning with blockchains to create a distributed and immutable audit trail. A deep auto-encoder with three hidden layers was chosen as the anomaly detection method in this paper, with a total of 3000 weights. The paper also uses the FedAvg algorithm for aggregation. Based on the performance of the deep auto-encoder, our project also chooses the deep auto-encoder model as the local model and follows similar data pre-processing and set-up of the loss function and metrics. A trade-off is identified between the local rate of convergence on the clients and the audit frequency of the global model updates.
From all the studies above, many challenges of federated learning in the IoT environment remain. The main challenges are heterogeneity of devices (high communication cost), statistical heterogeneity (non-IID data results in weight divergence) and model heterogeneity (different types of devices may require different models). The concept of Federated Transfer Learning (FTL) was introduced in [15] in an attempt to solve these issues. FTL takes inspiration from transfer learning, so that further personalisation can be used to transfer the globally shared model to distributed IoT devices and resolve statistical heterogeneity. The performance of transfer learning is strongly dependent on the relation between the different domains. As devices grouped in Federated Learning typically come from the same type of industry, it is suitable to apply transfer learning in the federated learning framework [16][17].
[18] delivered some other strategies to deal with non-IID data, such as data augmentation, creating a small dataset that can be shared globally, tuning the capability to run training on the local data of each device, and adding an assumption on data dissimilarities. [19] created an imbalanced dataset to simulate a real dataset, which is used to compare the local model with the FL model and to find out whether the FL model can improve performance when data are insufficient.
3 Research Problems
With the development of information technology and the progress of the digital age, we are generating huge amounts of data every day. The foundation of artificial intelligence, deep learning and other advanced data-driven technologies lies in data. In many industries, organisations are unable to share data due to data privacy, trade secrets and government regulations. This leads to data island issues and hinders research institutions and companies from integrating data to drive industry development. Traditionally, businesses and government agencies use only their own data to conduct research. For example, hospitals use only their own patient data for medical research in order to prevent leakage of patient privacy, which limits their ability to build smart-medicine businesses. However, when all the data are collected by a single agency for data mining and analysis, it can easily lead to data leakage and security risks. Federated learning can make full use of the data held by different clients and build models such as artificial intelligence systems by uploading the parameters of local models without sharing data. Thus, the data island problem can be solved while ensuring data security. This study is divided into two main parts. The first part is to find the optimal federated learning model by analysing and comparing different federated learning models and combinations of training devices. The second part illustrates the usability and advantages of federated learning by comparing the performance, security and stability of federated and non-federated learning models.
Compared with traditional machine learning approaches, the main advantage of federated learning is that it can ensure data security while maintaining the performance of the model. The fundamental question of this research is whether an anomaly detection model based on federated learning is superior to one based on non-federated learning, and what the specific advantages of federated learning are. To answer this overall question, we divided our research into the following smaller questions:
There are different aggregation algorithms and some advanced training methods in federated learning. What impact will these different algorithms have on our dataset and local model? Which method is best? We will try different federated learning algorithms to obtain the best performance and security for our model.
In this experiment, due to the properties of the data, different training devices will give different results, and we want to find an optimal device combination to train our federated learning anomaly detection model.

In this experiment, we will also compare the evaluation metrics and running time of the two approaches to illustrate the advantages of the FL model beyond data safety.
• Federated learning datasets typically involve a large number of training clients, rather than the nine in the current study. Therefore, we do not emphasise the influence of the number of training clients in each round on model performance; more emphasis is placed on the influence of the combination of specific devices on the model results.
• The local model adopted in this study is the state-of-the-art Deep Auto-encoder, so even a few rounds of training give very high performance. Therefore, this study does not emphasise the tuning of epochs, batch size and learning rate, but rather the influence of different federated learning methods on model performance.
4 Methodologies
In this section, the two proposed approaches are discussed in detail, covering data pre-processing, the non-FL and FL model architectures, and their implementation. Following our central idea, we experiment to find the best practice for anomaly detection with a privacy-preserving approach. The deep auto-encoder model is used not only for the non-FL and FL model comparison but also as the local model within the FL model. The key idea is that the DAE model enhances the capacity to deal with non-linear representations by projecting them into compressed layers for model learning. Eventually, the reconstruction error is used to detect whether an observation is an anomalous or a benign event. Moreover, there are three major achievements after implementing the local model within the federated learning model: privacy preservation, efficiency and robustness.
4.1 Data Collection

In this project, the N-BaIoT dataset is obtained from the Kaggle website [20] and can also be collected from the UCI website [21]. This dataset contains 89 CSV files holding the traffic data of nine devices. There are 7,062,606 traffic records in total, and each record has 115 statistical features. The following screenshot shows a preview of one of the datasets. According to the explanation of the data provider, the feature names stand for specific statistics of the traffic data during a time frame. Each feature name consists of three parts connected by underscores. The first part is an abbreviation such as "MI_dir", "HH" or "HpHp", representing a stream aggregation. The second part, such as "L5" or "L3", denotes a time frame, indicating for example that the stream captured covers the most recent 500 ms or 1.5 sec. The last part gives the specific statistic, such as "weight", "mean" or "variance". A more detailed explanation of the feature names can be found on the UCI website.
4.2 Data Analysis
There are nine devices in the dataset, belonging to five types of IoT devices: thermostat, doorbell, baby monitor, webcam and security camera. Normally, when a device is functioning it generates normal (benign) traffic data. However, the nine devices were each attacked by ten types of botnet attacks, which led to each device generating a large amount of anomalous traffic data. The following table shows the number of benign and attacked traffic records for each IoT device; it can be seen that there are ten types of botnet attack. From the pie chart (Figure 3), benign traffic accounts for only 8% of the data while 92% is anomalous. Also, according to the bar chart (Figure 4) below, it is clear that, for each device, the amount of anomalous data is much greater than the benign traffic data. Therefore, the data is highly unbalanced.
Figure 4: Distribution of Benign Data and Anomalous Data among Nine Devices
Additionally, to better understand the dataset, the statistical information of each feature has been generated. The table (Figure 5) shows part of the output. It is noticed that different features have quite different ranges. As it is difficult to tell which features are more important than others, all 115 features are used in this project.
Besides, we have double-checked whether there are any missing values in each CSV file, and the results show that the dataset is complete.
4.3 Data Pre-Processing

There are 89 CSV files in total, with six or more separate CSV files for each IoT device. To carry out the following methodologies for anomaly detection, all CSV files would need to be loaded at the beginning, which is time-consuming and prone to mistakes. Therefore, to prevent possible risks and make the later detection process easier, the separate CSV files belonging to the same IoT device have been combined into a single CSV file per device. A new feature called "type" is added at the end to indicate whether the traffic data is benign or, if attacked, which botnet attack it belongs to. For instance, the type "benign" means the data is benign, whereas "m_scan" indicates the data is attacked by the "Scan" attack of the Mirai family and "g_combo" represents the "Combo" attack of the Gafgyt family. Before the combination, the feature names and their order were checked to ensure that every CSV file is consistent. After this processing, each of the nine devices has a single CSV file with 116 features.
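As a concrete illustration, the following is a minimal sketch of how the per-device CSV files could be merged and labelled with a "type" column; the directory layout and the file-name-to-label mapping are assumptions for illustration, not the exact scripts used in this project.

```python
import glob
import os

import pandas as pd


def combine_device_files(device_dir: str, out_csv: str) -> pd.DataFrame:
    """Merge all CSV files of one device into a single labelled file.

    The 'type' label is taken from each file name, e.g. 'benign.csv',
    'm_scan.csv' or 'g_combo.csv' (illustrative naming).
    """
    frames = []
    for path in sorted(glob.glob(os.path.join(device_dir, "*.csv"))):
        df = pd.read_csv(path)
        # label every row with the traffic type encoded in the file name
        df["type"] = os.path.splitext(os.path.basename(path))[0]
        frames.append(df)
    combined = pd.concat(frames, ignore_index=True)
    combined.to_csv(out_csv, index=False)  # one file per device: 115 features + 'type'
    return combined
```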
4.4 Method for Anomaly Detection Locally

As mentioned in the related literature, there are many different kinds of neural networks to choose from for anomaly detection. In this project, we propose the deep auto-encoder model for our anomaly detection application, both as the non-FL model and as the local model for Federated Learning. This is because the deep auto-encoder is capable of learning the features and representation of the observations through many layers and neurons. In the following, the architecture of the deep auto-encoder model is discussed in detail, from data pre-processing to anomaly detection.

Figure 6 provides an overview of the whole process of anomaly detection using the deep auto-encoder model locally, starting from data pre-processing. After the datasets of the different kinds of attacks and the benign traffic are combined and labelled, the dataset is split into two kinds: benign and anomalous data. For data pre-processing, the benign data is split evenly into three parts: benign_train, benign_tr and benign_test.
Firstly, the benign_train set is used only for training the DAE model. Since a DAE is a special type of NN model that makes the target values the same as the input values, forcing the auto-encoder to learn the input representation, the proposed DAE model uses only benign_train to train and extract the essential information of benign data. Through the DAE model, the benign_train data are treated as input values which are encoded and decoded from the input layer through the hidden layers to the output layer. During this training process, the hidden layers learn the main patterns of the benign data and ignore the "noise". The loss function used for this DAE model is the Mean Squared Error Loss (MSELoss). Once the model is trained, the benign_tr dataset is fed into the trained model, and the output MSELoss is used to calculate the threshold on the reconstruction error for detecting anomalies. It is important to compute the reconstruction error from the loss between output and input: if the loss value is low, it can be assumed that the model recognises the instance as lying inside the known distribution.
Eventually, after the model is trained and the reconstruction-error threshold is defined, benign_test is combined with the anomalous data as mix_data. The mix_data values are fed into the trained model for evaluation, and the returned loss values are compared with the threshold: if a value is higher than the threshold, the input is recognised as anomalous data; otherwise, it is recognised as benign data. In this practice, all nine devices use the same local model structure for the non-FL model results.
4.4.1 Architecture of Deep Auto-encoder Model

Figure 7: Deep Auto-encoder Model Architecture

The proposed network structure of the Deep Auto-encoder (DAE) is shown in Figure 7. It contains an encoder, a hidden (bottleneck) layer and a decoder. In the DAE model, the size of the target values is the same as that of the input values, as this forces the DAE to learn the representation of the input data. The deep auto-encoder model is trained only with benign traffic data to learn its essential characteristics, and then uses the reconstruction error proposed by Meidan et al. [4] to recognise whether a new observation is anomalous or not.
Theoretically, the output of the encoder is the input of the hidden layer, and the output of the hidden layer is the input of the decoder. During the training process of the DAE model, the benign traffic data is first encoded through four successive layers whose sizes are 75%, 50%, 33% and 25% of the input dimension, following the work of Meidan et al. [4]. As the DAE is a symmetrical network structure, once the encoding is fully compressed, the encoded vector is then decoded layer by layer at 25%, 33%, 50% and 75% of the input dimension respectively. The decompression in this model proceeds through linear layers from the lowest encoding layer to reconstruct the data.
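The following is a minimal PyTorch sketch of a symmetric deep auto-encoder with the 75%/50%/33%/25% compression ratios described above, assuming 115 input features. The use of linear layers, the Tanh activations, the Adam optimiser and the learning rate are assumptions for illustration rather than the exact configuration used in this project.

```python
import torch
import torch.nn as nn


class DeepAutoEncoder(nn.Module):
    """Symmetric deep auto-encoder; layer sizes follow the 75/50/33/25% ratios."""

    def __init__(self, n_features: int = 115):
        super().__init__()
        sizes = [int(n_features * r) for r in (0.75, 0.50, 0.33, 0.25)]
        self.encoder = nn.Sequential(
            nn.Linear(n_features, sizes[0]), nn.Tanh(),
            nn.Linear(sizes[0], sizes[1]), nn.Tanh(),
            nn.Linear(sizes[1], sizes[2]), nn.Tanh(),
            nn.Linear(sizes[2], sizes[3]), nn.Tanh(),
        )
        self.decoder = nn.Sequential(
            nn.Linear(sizes[3], sizes[2]), nn.Tanh(),
            nn.Linear(sizes[2], sizes[1]), nn.Tanh(),
            nn.Linear(sizes[1], sizes[0]), nn.Tanh(),
            nn.Linear(sizes[0], n_features),  # reconstruct the 115 input features
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.decoder(self.encoder(x))


# training uses benign data only, with MSE reconstruction loss
model = DeepAutoEncoder()
criterion = nn.MSELoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
```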
4.4.2 Detection Threshold Calculation

After the model is trained and its mean squared error (MSE) is extracted, as demonstrated by Meidan et al. [4], the threshold (tr) given in equation 1 is used to distinguish normal from abnormal observations: the threshold is the sum of the mean and the standard deviation of the samples' MSE over the benign data. When the MSE of an instance is above the threshold, it is considered anomalous data; otherwise, it is considered benign data.
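A short sketch of this threshold rule, assuming the PyTorch model sketched above and the benign_tr split held as a tensor; the function names are illustrative.

```python
import torch


@torch.no_grad()
def reconstruction_mse(model, x: torch.Tensor) -> torch.Tensor:
    """Per-sample mean squared reconstruction error."""
    model.eval()
    recon = model(x)
    return ((x - recon) ** 2).mean(dim=1)


@torch.no_grad()
def detection_threshold(model, benign_tr: torch.Tensor) -> float:
    """tr = mean(MSE) + std(MSE) over the benign_tr split (equation 1)."""
    mse = reconstruction_mse(model, benign_tr)
    return (mse.mean() + mse.std()).item()


@torch.no_grad()
def predict(model, x: torch.Tensor, tr: float) -> torch.Tensor:
    """1 = anomalous (MSE above threshold), 0 = benign."""
    return (reconstruction_mse(model, x) > tr).long()
```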
4.5 Method for Anomaly Detection with Federated Learning
To achieve the project goal, the Federated Learning technique is applied to the deep auto-encoder model to detect anomalous data. According to the literature review, Federated Learning can help train a deep learning model on various devices while preserving privacy. One of the advantages of applying Federated Learning is that it does not require storing or exchanging any personal data, which addresses the privacy problem.
Figure 8 shows the model architecture of anomaly detection using the federated deep auto-encoder model. For each device involved in the training process (also called a client), the data pre-processing is the same as for the local detection method. When the federated detection process starts, training happens simultaneously on all participating devices. The weights of each device's deep auto-encoder model are sent to the global server, where an aggregation function such as FedAvg or FedAvgM processes the weights. Each device's deep auto-encoder model then receives the aggregated weights and continues the training process.
The main steps of the entire process comprise the forward pass, sending the weights to the global server for the update, and the back-propagation process. In federated learning, this whole process is called one communication round. A number of communication rounds are applied to update the weights, and more rounds tend to improve the performance of the federated model.

Furthermore, a retraining process has been added to the federated model architecture to make the model handle non-IID datasets, which also makes the model more suitable for the real world. The retraining process means that each client's model is trained additionally before the weights are aggregated on the global server. The data for this process is randomly selected from the corresponding training data.
Moreover, to improve the federated model's efficiency and robustness, a partial selection mechanism has been included in the federated model architecture. This means that a subset of the client devices is randomly chosen for the actual training in each communication round instead of using all of the client devices.

After the federated model is trained, anomaly detection can be applied directly to both the client devices and any new devices.
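A minimal sketch of one communication round as described above. The client interface (a `.loader` of benign training batches and an `.n_samples` count), the `train_locally` helper and the `aggregate` callable are assumptions for illustration, not the project's exact implementation.

```python
import copy
import random

import torch


def train_locally(model, loader, epochs, lr=1e-3):
    """Local DAE training on benign traffic (the target equals the input)."""
    criterion = torch.nn.MSELoss()
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    model.train()
    for _ in range(epochs):
        for x in loader:
            optimizer.zero_grad()
            loss = criterion(model(x), x)
            loss.backward()
            optimizer.step()


def communication_round(global_model, clients, aggregate, num_selected=None, local_epochs=1):
    """One round: broadcast global weights, train locally on the (possibly
    partially selected) clients, then aggregate the weights on the server.

    `aggregate` is an aggregation function such as FedAvg; for FedAvgM a
    server momentum buffer would also be carried across rounds.
    """
    selected = clients if num_selected is None else random.sample(clients, num_selected)

    local_states, sample_counts = [], []
    for client in selected:
        local_model = copy.deepcopy(global_model)  # start from the global weights
        train_locally(local_model, client.loader, local_epochs)
        local_states.append(local_model.state_dict())
        sample_counts.append(client.n_samples)

    # server-side aggregation of the locally updated weights
    global_model.load_state_dict(
        aggregate(global_model.state_dict(), local_states, sample_counts))
    return global_model
```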
The datasets used in this study are heterogeneous and non-Identically and Independently Distributed (non-IID). In the FL model, the whole configuration is sensitive to the distribution of each client's data classes and features, which may affect the overall training time and accuracy. Driven by this observation, we propose two approaches within the FL-with-DAE model, client selection and a retraining process, to improve the performance of the model. As shown in the Results section, using or not using the client selection and retraining approaches has different effects on the FL model's efficiency and robustness.
4.5.1 Federated Learning with Client Selection

In addition to using the communication rounds to update the weights and improve the performance of the whole FL model, client selection is used to minimise the non-IID dataset's impact on the model. During the client selection process, a random subset of the training devices is chosen to train the model in each round. For instance, if five devices participate as clients, only two clients might be chosen to train the global model in each communication round. The selected clients are updated with the global weights in each communication round.
4.5.2 Federated Learning with Retraining Process

In addition, combined with the client selection process, a retraining process can optionally be included for further improvement of the FL model. It also aims to mitigate the issues of using non-IID datasets and make the model more suitable for the real world. In each round, 1000 instances are randomly selected from the benign_train data for the retraining process. Each client's model is retrained before the aggregation of weights on the global server, so that the weights from both the communication round and the retraining round contribute to the aggregation. The local models on each client are then updated with the global weights before the next round starts for further training. When the global model has finished updating and training, it is used for anomaly detection at test time.
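A sketch of this optional retraining step, assuming each client holds its benign_train data as a tensor; the optimiser settings are illustrative.

```python
import torch


def retrain_before_aggregation(local_model, benign_train, n_samples=1000, lr=1e-3):
    """Optional retraining pass on 1000 randomly selected benign_train instances,
    run on the client just before its weights are sent for aggregation."""
    idx = torch.randperm(benign_train.size(0))[:n_samples]
    subset = benign_train[idx]

    criterion = torch.nn.MSELoss()
    optimizer = torch.optim.Adam(local_model.parameters(), lr=lr)
    local_model.train()
    optimizer.zero_grad()
    loss = criterion(local_model(subset), subset)  # auto-encoder target equals input
    loss.backward()
    optimizer.step()
    return local_model.state_dict()
```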
4.5.3 Aggregation in FL Model - Federated Averaging Algorithm

Federated optimisation is built on stochastic gradient descent (SGD) and has the following properties:
• applied naively to the federated optimisation problem, a single batch gradient calculation is done per round of communication;
• it is computationally efficient;
• it requires a very large number of rounds of training to produce good models (even with batch normalisation).
Each client k computes the average gradient g_k = ∇F_k(w_t) on its local data at the current model w_t, and the central server aggregates these gradients and applies the update

w_{t+1} ← w_t − η ∑_{k=1}^{K} (n_k / n) g_k        (2)

Since ∑_{k=1}^{K} (n_k / n) g_k = ∇f(w_t), an equivalent update is given by ∀k, w_{t+1}^k ← w_t − η g_k, followed by

w_{t+1} ← ∑_{k=1}^{K} (n_k / n) w_{t+1}^k        (3)
Each client locally takes one step of gradient descent on the current model using its local data, and the server then takes a weighted average of the resulting models. More computation can be added to each client by iterating the local update

w^k ← w^k − η ∇F_k(w^k)        (4)

several times, controlled by two parameters:
• E, the number of training passes each client makes over its local dataset in each round;
• B, the local minibatch size used for the client updates.
From a statistical perspective, FedAvg has been shown to diverge empirically in settings where the data is not identically distributed across devices.
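A minimal sketch of the FedAvg aggregation in equation 3, operating on PyTorch state dicts; the interface (lists of client state dicts and sample counts) is an assumption for illustration.

```python
import copy


def fed_avg(global_state, local_states, sample_counts):
    """FedAvg (equation 3): w_{t+1} = sum_k (n_k / n) * w_{t+1}^k,
    a weighted average of the client weights, applied per parameter tensor."""
    total = float(sum(sample_counts))
    new_state = copy.deepcopy(global_state)
    for key in new_state:
        new_state[key] = sum((n_k / total) * state[key]
                             for state, n_k in zip(local_states, sample_counts))
    return new_state
```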
4.5.4 Aggregation in FL Model - Federated Averaging Algorithm with Momentum

Instead of directly applying the averaged update w ← w − ∆w, where ∆w = w_t − ∑_{k=1}^{K} (n_k / n) w_{t+1}^k is the change produced by federated averaging in a round, FedAvgM maintains a server-side momentum buffer v ← βv + ∆w and applies

w ← w − v
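A sketch of this server-side momentum update, reusing the fed_avg sketch from Section 4.5.3; the momentum factor β = 0.9 and the interface are assumptions for illustration.

```python
def fed_avg_momentum(global_state, local_states, sample_counts, velocity, beta=0.9):
    """FedAvgM: delta_w = w_t - FedAvg(client weights); v <- beta*v + delta_w; w <- w - v.

    `velocity` is a dict of per-parameter buffers initialised to zeros and
    carried over between communication rounds; beta = 0.9 is assumed here.
    """
    avg_state = fed_avg(global_state, local_states, sample_counts)  # sketch from Section 4.5.3
    new_state = {}
    for key in global_state:
        delta = global_state[key] - avg_state[key]     # pseudo-gradient of this round
        velocity[key] = beta * velocity[key] + delta   # server-side momentum accumulation
        new_state[key] = global_state[key] - velocity[key]
    return new_state, velocity
```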
4.6 Evaluation Methods

Confusion Matrix:

                              True Class
                              Abnormal                 Benign
Predicted Class   Abnormal    True Positive (TP)       False Positive (FP)
                  Benign      False Negative (FN)      True Negative (TN)
The confusion matrix for anomaly detection plots the output of the predicted class
against the actual class, where positive represents abnormal data and negative repre-
sents benign data. The following terms can be extracted from the confusion matrix:
• True Positive (TP) is when the model correctly predicts the positive class.
• True Negative (TN) is when the model correctly predicts the negative class.
• False Positive (FP) is when the model incorrectly predicts the positive class.
• False Negative (FN) is when the model incorrectly predicts the negative class.
The evaluation metrics used to compare the performance of the different models are given by the equations below:

Overall Accuracy: Acc = (TP + TN) / (TP + FP + FN + TN)

MSE Loss measures the average squared difference between the estimated values and the actual values, as given by:

MSE = (1/n) ∑_{i=1}^{n} (Y_i − Ŷ_i)^2        (6)
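A short sketch of how these metrics, together with the TPR, FPR and F1 score reported in Section 7, could be computed with scikit-learn, treating abnormal traffic as the positive class; the function names are illustrative.

```python
import numpy as np
from sklearn.metrics import confusion_matrix, f1_score


def evaluation_metrics(y_true, y_pred):
    """Confusion-matrix metrics with 'abnormal' as the positive class (label 1)."""
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred, labels=[0, 1]).ravel()
    return {
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
        "tpr": tp / (tp + fn),   # true positive rate (recall on attacks)
        "fpr": fp / (fp + tn),   # false positive rate on benign traffic
        "f1": f1_score(y_true, y_pred, pos_label=1),
    }


def mse(y, y_hat):
    """Mean squared error between estimated and actual values (equation 6)."""
    y, y_hat = np.asarray(y), np.asarray(y_hat)
    return float(np.mean((y - y_hat) ** 2))
```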
5 Resources
5.1 Hardware & Software

All the experiments were run in the Google Colab virtual cloud environment, with the following configuration:

Hardware      Configuration
CPU           Intel(R) Xeon(R) CPU @ 2.30GHz
Hard Disk     69 GB
Memory        12.7 GB
GPU           Tesla K80
In this project, the NumPy and Pandas libraries were used throughout for data manipulation. We mainly used the scikit-learn library for data pre-processing and to build the evaluation metrics. PyTorch was used to build the local models and the Federated Learning models.
Software Version
Python 3.7.10
Pytorch 1.8.1
Numpy 1.19.5
Pandas 1.1.5
Scikit-learn 0.22.2.post1
5.2 Materials
Description                                  Link
Kaggle dataset                               https://www.kaggle.com/mkashifn/nbaiot-dataset
Federated Learning tutorial course           https://classroom.udacity.com/courses/ud185
Pytorch tutorial                             https://pytorch.org/tutorials/
Federated Learning implementation tutorial   https://towardsdatascience.com/preserving-data-privacy-in-deep-learning-part-3-ae2103c40c22
GitHub                                       https://github.com
5.3 Roles & Responsibilities
6 Milestones / Schedule
7 Results
We set up different experiment combinations with the methods mentioned in Section 4 to test the FL model's performance. The components of the experiment combinations are FedAvgM, FedAvg, Retraining, No Retraining, Partial Selection and No Partial Selection. Since there are many experiment combinations in our study, we use short codes to distinguish them; the abbreviation of each combination is formed from the first letter of each component. Take the method MRP as an example: it is the combination of FedAvgM, Retraining and Partial Selection. In more detail, the first letter, "F" or "M", stands for FedAvg or FedAvgM; the second letter, "R" or "N", indicates the retraining or no-retraining process; and the third letter, "P" or "N", stands for partial selection or no partial selection. Figure 9 shows a summary of all the combinations. To find the best experimental combination, we compare the performance of every combination using the average TPR, average FPR, average F1 score and running time.
7.1 FedAvg & Retraining VS FedAvgM & Retraining
Intuitively, FL with partially selected clients should take less time, so the performance of FL with no partial selection of clients is further investigated. From the previous experiments, it is known that FedAvgM with No Retraining performs better. Therefore, we compare MNN and MNP to evaluate the importance of the Partial Selection component. For the parameters of these two experiments, the retrain_epochs parameter is not needed. The other parameters for MNN are the same as in Table 1. For MNP, num_selected is 3, which means that in each round the global model randomly selects 3 devices from all devices to train; the other parameters are kept the same as in Table 1. From the table below, there is almost no difference between MNN and MNP, but the running time of MNP is shorter, at about 13.1 minutes. Therefore, according to the previous experiment combinations, MNP is the best method, where the Federated Deep Auto-encoder model is combined with FedAvgM, No Retraining and Partial Selection.
7.4 Different Methods Combination in Nine Devices Results

To show the results of the different federated learning methods in more detail, the average TPR, FPR, F1 score and training time of the eight methods are shown in Table 5. It can be seen that the MNP method has the best performance in FPR, F1 score and training time. The MRP method's training time is only about 18 seconds slower than MNP, which indicates that the FedAvgM, No Retraining and Partial Selection methods spend less time and have better performance. Tables 6, 7 and 8 respectively show the detailed results of FPR, TPR and F1 score on the nine devices for the different methods. It can be seen from the three tables that although the TPR and F1 score of devices 3 and 9 are relatively low, these devices still have relatively low FPR compared with the other devices.
Table 6: TPR for different methods in nine devices
Method Device Number
#1 #2 #3 #4 #5 #6 #7 #8 #9
FNN 0.99999 0.99999 0.35083 0.99987 0.99992 0.99988 0.99961 0.99991 0.75814
FNP 0.99999 1 0.35083 0.99986 0.99992 0.99988 0.9996 0.99991 0.75814
FRN 0.99999 0.99999 0.35083 0.99987 0.99992 0.99988 0.99961 0.99991 0.75814
FRP 0.99999 1 0.35083 0.99986 0.99992 0.99988 0.9996 0.99991 0.75814
MNN 0.99998 1 0.35081 0.99986 0.99994 0.99989 0.9996 0.99991 0.75813
MNP 0.99998 1 0.35081 0.99986 0.99994 0.99989 0.9996 0.99991 0.75813
MRN 0.99998 1 0.35081 0.99986 0.99994 0.99989 0.9996 0.99991 0.75813
MRP 0.99998 1 0.35081 0.99986 0.99994 0.99989 0.9996 0.99991 0.75813
7.5 Devices Combinations with MNP
From the previous experiments, the performance of devices 3 and 9 is much lower than that of the other devices. The following two figures show the performance comparison for the nine devices: the TPR and F1 score of devices 3 and 9 are lower than those of the others under all methods.
shown to be better, as it has a higher TPR, lower FPR, higher F1 score and a relatively efficient running time. From these results, the best model is MNP (Federated Deep Auto-encoder model with FedAvgM, No Retraining and Partial Selection) trained on the device combination #1, #2, #4 and #7.
7.6 Deep Auto-encoder Model VS Federated Deep Auto-encoder Model (best one)

After selecting the best federated-learning-based deep auto-encoder anomaly detection model, we compared the performance of the federated learning model and the non-federated learning model. Figure 12 depicts a comparison of TPR between FL and Non-FL across the nine devices, where the TPR of Device 3 and Device 9 under the Non-FL model is extremely poor compared to the FL model. Besides, as can be seen from Figures 13 and 14, the FL model achieves more stable and higher performance in average F1 score and TPR. Moreover, Table 10 shows that the FL model's training process is much faster than the Non-FL model's, which is a huge advantage for building an instant anomaly detection model. However, the Non-FL model has a lower mean FPR, about 0.006 less than the FL model.
Figure 12: FL vs Non-FL TPR comparison for nine devices
Figure 14: FL vs Non-FL Average TPR,FPR comparison
8 Discussion
8.1 FL Methods Combination Discussion

In this project, we tested different federated aggregation algorithms, training settings and client selection schemes. By comparing different evaluation metrics, we conclude that the combination of FedAvgM, No Retraining and Partial Selection is the best Federated Learning model. In addition, the dataset used in this study is non-IID, and the categories of data are very uneven. According to Section 7.1, the FedAvgM aggregation algorithm has better performance than FedAvg on our data. This further verifies that the FedAvgM algorithm proposed by T.-M. H. Hsu et al. has better performance and higher stability on non-IID datasets [24]. Surprisingly, retraining and no retraining have the same evaluation scores, but the training time with retraining is longer because of the extra training session. The retraining process itself is meant to deal with the instability in model training caused by non-IID data, and the likely reason for the result in Section 7.2 is that the datasets used here are all IoT traffic data: although there are different types of devices, these devices are likely to share broadly similar features. As a result, the retraining method may not contribute much to generalisation. Partial selection can speed up the training of the model because it decreases the number of clients in each communication round, while the FL model can still learn the characteristics of each device through multiple rounds of training.
Moreover, in traditional federated learning studies, the number of clients is very large. For example, in the vanilla FedAvg experiments by Google, the number of clients is 100 and each client has only 600 samples [12]. In such settings, the parameter-tuning process of a federated learning technique has a significant impact on the results. As shown in Tables 11 and 12, adjusting parameters such as batch size or number of epochs has no effect on the results for the dataset in this study, because there are few training devices and the data volume of each device is very large. Thus, by combining different federated learning techniques, we can carry out technique tuning for datasets with a small number of clients to select the best FL methods.
8.2 Different Devices Combination Discussion

Through the previous study, we found that when we trained with the data of devices #3 and #9, the models performed very poorly on the test datasets of both devices, which is probably caused by their different data characteristics. As can be seen from Figure 4, the amount of data for device #3 is small compared with the other devices, and the proportion of normal data is large, which may lead to over-fitting and thus a poor test result. Devices #5, #6, #8 and #9 are all security camera devices, but the combination of devices #5, #6 and #8 performs well, so we suspect that device #9 may have an unusual distribution of data features. We performed a PCA decomposition of the 115 features of these four devices and plotted the first two principal components, as shown in Figure 15. Devices #5 and #6 have similar pattern distributions, and devices #8 and #9 are broadly similar, but the abnormal data pattern in device #9 is not the same as in device #8. This probably explains why device #9 does not perform as well as device #8.
Figure 15: PCA plots of abnormal and benign data in Devices #5, #6, #8 and #9 (0: Abnormal, 1: Benign)
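A minimal sketch of the PCA projection used to produce Figure 15, assuming scikit-learn and matplotlib; standardising the features before PCA is an assumption.

```python
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler


def plot_first_two_components(features, labels, title):
    """Project one device's 115 traffic features onto the first two principal
    components and colour the points by class (0 = abnormal, 1 = benign)."""
    scaled = StandardScaler().fit_transform(features)
    components = PCA(n_components=2).fit_transform(scaled)
    plt.scatter(components[:, 0], components[:, 1], c=labels, s=2, cmap="coolwarm")
    plt.title(title)
    plt.xlabel("PC1")
    plt.ylabel("PC2")
    plt.show()
```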
Devices #5, #6 and #8 form a combination of the same device type (all security cameras), while devices #1, #2, #4 and #7 form a combination of different types of devices. The combination of devices #1, #2, #4 and #7 has the better performance, with an FPR around 0.005 smaller than that of devices #5, #6 and #8. It is likely that this combination learns from the data characteristics of different devices, thus improving the generalisation ability and stability of the model. Finally, we obtain the optimal device combination for Federated Learning to achieve the best model performance. We also found that when a single device, such as device #3, crashes, we can discard it, and federated learning can still train well-performing models on a smaller number of devices without putting all devices into the training process.
8.3 FL vs Non-FL Comparison Discussion

To compare the Federated Deep Auto-encoder model and the Deep Auto-encoder model fairly and credibly, we adopted the same model architecture and training data partitioning as in [4]. However, that study did not give the specific way in which the test data were partitioned, so we cannot directly compare our results with theirs and instead built our own test set. In Section 7.6, it can be seen that the performance of the FL model is better than the Non-FL model, especially on devices #3 and #9, where the FL model achieves a higher F1 score and TPR. This shows that the FL model has very good generalisation ability: the characteristics of different devices can be learned by aggregating the weights and other parameters of the different local models. In addition, compared with the Non-FL model, the FL model trains extremely fast, because it requires fewer training devices in each round and therefore less training data. In industrial applications, it is important to build a fast network anomaly detection system, because the earlier an anomaly is detected, the lower the loss to business and personal property.
Therefore, the FL model is superior to the non-FL model, especially in terms of data safety, model generalisation and model training speed. Another advantage of the FL model is that it can be trained on a small number of devices to obtain a global model applicable to all devices. This also means that, in practical applications, IoT security companies can obtain a general model from only a small number of devices, which is a huge advance for IoT intrusion detection.
9 Limitation and Future Work
9.1 Limitations
There exist some limitations in our study which we intend to address in the future; most of these limitations were due to the time constraints of our capstone project.
• Quality of Dataset – The distribution of the benign and malicious data was ex-
tremely uneven. Most devices had a vast number of anomalous traffic data and
a lack of benign data. This uneven distribution may have resulted in the low
accuracy of device #3 and #9 when using them as the training dataset.
• Use of previous literature for local Auto-encoder model – We used the data
selection procedure and threshold calculation methods from previous literature
as their results showed the best performance. The use of methods from previous
literature may limit us as our dataset may require different percentages of training
and testing data to calculate its best threshold for the auto-encoder model.
• Hyper-parameter Tuning – Hyper-parameter tuning was performed on the batch size; however, the differences in accuracy and F1 score across settings were not large enough for meaningful interpretation.
• Time Constraints – This project was completed as a capstone project for Master
of Data Science which had a time frame of one semester only. Due to such time
constraints, we could not experiment with more advanced methodologies and
algorithms.
• Platform Constraints – We mainly used Google Colab to run our code. FedAvg required many training rounds (at least 300) to generate a good model. Unfortunately, as Google Colab limits each user to only 12 GB of RAM, the Federated Learning approach using PySyft caused Colab to crash. This limitation forced us to abandon PySyft in our code.
9.2 Future Work

As mentioned previously, the small sample size of only nine devices is a limitation of our study. We can introduce more devices by splitting the data of each device into 100 sets, each representing one device. This would generate a total of 900 devices for evaluating our federated model, allowing us to simulate real-world federated learning scenarios and gain a better understanding.
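A minimal sketch of such a split, assuming each device's combined CSV file is loaded as a pandas DataFrame; the shuffling seed and the shard count are illustrative.

```python
import numpy as np
import pandas as pd


def split_into_virtual_clients(device_df: pd.DataFrame, n_shards: int = 100):
    """Shuffle one device's combined traffic data and split it into `n_shards`
    roughly equal parts, each later treated as an independent client."""
    shuffled = device_df.sample(frac=1.0, random_state=0).reset_index(drop=True)
    return np.array_split(shuffled, n_shards)
```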
Federated learning can be integrated with blockchains to support the auditing of machine learning models without the necessity of centralising the training data. The basic methodology of setting up the auto-encoder model for anomaly detection would be the same as our proposed method, with a metric used to identify whether a test sample is benign or not. The weight updates and models are then stored on the blockchain at each epoch. To leverage the blockchain, a parameter server then processes the weight updates that have been stored and confirmed on the blockchain. As Federated Learning often involves numerous weight updates, public blockchains such as Bitcoin would be too slow to generate the new blocks on the ledger. An open-source private blockchain called MultiChain supports multiple assets and mining without proof-of-work. However, the blockchain can only verify the data stored inside it and cannot assure overall accuracy. In the paper [14], the block-chained federated learning approach increased performance by around 5~15%. By applying this approach to our data, we hope to improve the overall performance of federated learning when dealing with many devices.
A major source of our limitations was the non-IID characteristic of the N-BaIoT dataset. Traditional Federated Learning assumes that the training data from different devices share the same feature space, which is often not applicable in real-world circumstances. Our dataset consists of devices ranging from security cameras and doorbells to thermostats. Transfer Learning allows a model trained on a large dataset for one domain to be applied to a different but related domain. This technique can be combined with Federated Learning to mitigate statistical heterogeneity. Federated Transfer Learning (FTL) transfers the globally shared model to distributed IoT devices. A typical architecture of federated transfer learning is shown in Figure 16. Consider two datasets A and B, where there exists only a small overlap in feature space and sample space. The model learned from B is transferred to A by leveraging the small set of overlapping data features. FTL transfers knowledge from the non-overlapping features of the source domain to the new samples in the target domain, the region in the upper-right corner of Figure 16.
In this project, we used only two simple aggregation techniques for Federated Learning, namely FedAvg and FedAvgM. Further improvement can be achieved by changing the local model and the aggregation algorithm. The Federated Matched Averaging (FedMA) algorithm is a layer-wise federated learning algorithm designed for CNN and LSTM architectures. This aggregation algorithm accounts for permutation invariance of the neurons and permits global model size adaptation. The pseudocode for Federated Matched Averaging is given in Figure 17.
Figure 17: Federated Matched Averaging
The paper [27], which proposed the FedMA algorithm, showed that FedMA outperforms prior federated learning algorithms and can efficiently utilise well-trained local models. The number of local training epochs E can affect the performance of FedAvg and sometimes lead to divergence, whereas longer training benefits FedMA, which means that FedMA performs best on local models of higher quality. In contrast, for the FedAvg algorithm, longer local training leads to worse overall accuracy. Thus, FedMA allows local clients to train their models for as long as required. In addition, FedMA can be extended to further improve federated learning with additional deep learning building blocks, such as residual connections and batch normalisation layers.
References
[1] Y. Mirsky, T. Doitshman, Y. Elovici, and A. Shabtai, “Kitsune: An ensemble of
autoencoders for online network intrusion detection,” 2018.
[7] K. Heller, K. Svore, A. Keromytis, and S. Stolfo, “One class support vector ma-
chines for detecting anomalous windows registry accesses,” 12 2003.
[9] S. Hawkins, H. He, G. Williams, and R. Baxter, “Outlier detection using replica-
tor neural networks,” in Data Warehousing and Knowledge Discovery, Y. Kam-
bayashi, W. Winiwarter, and M. Arikawa, Eds. Berlin, Heidelberg: Springer
Berlin Heidelberg, 2002, pp. 170–180.
[11] B. Guo, L. Song, T. Zheng, H. Liang, and H. Wang, “Bagging deep autoencoders
with dynamic threshold for semi-supervised anomaly detection,” in 2019 Inter-
national Conference on Image and Video Processing, and Artificial Intelligence,
vol. 11321. International Society for Optics and Photonics, 2019, p. 113211Z.
[12] H. B. McMahan, E. Moore, D. Ramage, S. Hampson, and B. A. y Arcas,
“Communication-efficient learning of deep networks from decentralized data,”
2017.
[15] Q. Wu, K. He, and X. Chen, “Personalized federated learning for intelligent iot
applications: A cloud-edge based framework,” IEEE Computer Graphics and Ap-
plications, vol. PP, pp. 1–1, 05 2020.
[16] S. Saha and T. Ahmad, “Federated transfer learning: concept and applications,”
2021.
[17] Q. Yang, Y. Liu, T. Chen, and Y. Tong, “Federated machine learning: Concept and
applications,” ACM Transactions on Intelligent Systems and Technology (TIST),
vol. 10, no. 2, pp. 1–19, 2019.
[19] Y. Chen, J. Zhang, and C. K. Yeo, “Network anomaly detection using federated
deep autoencoding gaussian mixture model,” in International Conference on Ma-
chine Learning for Networking. Springer, 2019, pp. 1–14.
[20] K. Naveed, “N-baiot dataset to detect iot botnet attacks,” Jan 2020. [Online].
Available: https://www.kaggle.com/mkashifn/nbaiot-dataset
[22] G. E. Hinton and R. S. Zemel, “Autoencoders, minimum description length,
and helmholtz free energy,” Advances in neural information processing systems,
vol. 6, pp. 3–10, 1994.
[23] Y. Ke and R. Sukthankar, “Pca-sift: A more distinctive representation for local im-
age descriptors,” in Proceedings of the 2004 IEEE Computer Society Conference
on Computer Vision and Pattern Recognition, 2004. CVPR 2004., vol. 2. IEEE,
2004, pp. II–II.
[24] T.-M. H. Hsu, H. Qi, and M. Brown, “Measuring the effects of non-identical data
distribution for federated visual classification,” arXiv preprint arXiv:1909.06335,
2019.
[25] W. Liu, L. Chen, Y. Chen, and W. Zhang, “Accelerating federated learning via
momentum gradient descent,” IEEE Transactions on Parallel and Distributed Sys-
tems, vol. 31, no. 8, pp. 1754–1766, 2020.
[26] I. Sutskever, J. Martens, G. Dahl, and G. Hinton, “On the importance of initial-
ization and momentum in deep learning,” in International conference on machine
learning. PMLR, 2013, pp. 1139–1147.
A Federated Learning Hyperparameter Tuning

Table 11: Hyperparameter tuning for the MNP method, Epochs = 1 (batch sizes 32, 64 and 128).
Table 12: Hyperparameter tuning for the MNP method, Epochs = 5 (batch sizes 16, 32, 64 and 128).

Every tested Epochs/Batchsize combination produced identical per-device results:

Metric     #1        #2        #3        #4        #5        #6        #7        #8        #9
TPR        0.99998   1         0.99984   0.99986   0.99995   0.99988   0.9996    0.9999    0.99994
FPR        0.00702   0.04118   0.04463   0.02282   0.02582   0.01056   0.01547   0.04315   0.04455
F1 score   99.993    99.989    99.937    99.921    99.976    99.98     99.939    99.97     99.987