An Ensemble Multi-View Federated Learning Intrusio
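This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI 10.1109/ACCESS.2021.3107337, IEEE Access
This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/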
Date of publication xxxx 00, 0000, date of current version xxxx 00, 0000.
Digital Object Identifier 10.1109/ACCESS.2021.DOI
ABSTRACT
The rise in popularity of Internet of Things (IoT) devices has attracted hackers to develop IoT-specific attacks. The microservice architecture of IoT devices relies on the internet to provide their intended services, so an unguarded IoT network leaves its inter-connected devices vulnerable to attacks. Detecting these attacks manually is tedious and ineffective because attackers frequently upgrade their strategies. Machine learning (ML)-assisted approaches have been proposed to automate intrusion detection in IoT networks. However, most of these approaches train an ML model on a single view of the dataset, which often fails to build insightful knowledge or to reveal each feature's impact on the model's decisions. As a result, single-view training may produce an incomplete understanding of the patterns in datasets with large feature sets. Moreover, current approaches are mainly centralized: raw data is transferred from the edge devices to a central server for training, which may expose the data to many kinds of attacks and fails to preserve its privacy. Multi-view learning has gained popularity for its ability to learn from different views of the data and to deliver efficient performance with more discriminative predictions. This paper proposes a federated learning-based intrusion detection approach, called MV-FLID, that trains on multiple views of IoT network data in a decentralized manner to detect, classify, and defend against attacks. The multi-view ensemble learning aspect helps maximize learning efficiency across different classes of attacks. The Federated Learning (FL) aspect, in which a device's data is never shared with the server, aggregates device profiles efficiently while benefiting from peer learning. Our evaluation results show that the proposed approach achieves higher accuracy than the traditional non-FL centralized approach.
INDEX TERMS Internet of Things, IoT Security, Federated Learning, Neural Networks, Multi-view
Classification, Intrusion Detection System.
I. INTRODUCTION
There has been a sharp growth in the usage of Internet of Things (IoT) devices in recent years. They interconnect with other digital and physical devices, which enables information exchange and service delivery [1]. Flexible IoT devices are currently used to interconnect knowledge across cyber-physical systems in healthcare [2] [3], transportation [4], smart homes [5], and smart cities [6], to name a few. IoT-aided devices are ubiquitous nowadays due to their massive adoption across sectors. Even though these devices play a prominent role in our everyday lives, many security complications are involved in using them. Security measures [7] play a key role in the trustworthiness of these IoT devices and their services. Poor security measures make these devices vulnerable to a variety of cyber-attacks, such as data leakage and Denial of Service (DoS) [8], which may disrupt the normal functionality of the end device. Several lightweight protocols have been introduced [9] for effective communication within networks; one such protocol is the Message Queuing Telemetry Transport (MQTT) protocol [10]. MQTT is mainly designed for communication between devices with low bandwidth, making it an ideal solution for communication between IoT devices. The publish and subscribe communication style of
the MQTT protocol helps exchange information in client-server communication through messaging. The network data collected from IoT devices during MQTT information exchange can be used to detect intruders in the network.
Intrusion detection is crucial in an IoT network because intruders can attack and take over not just the IoT devices but also the other devices connected to the IoT network. Intrusion Detection System (IDS) techniques can be classified into two main categories: signature-based and anomaly-based intrusion detection. In signature-based intrusion detection, there is a set of pre-defined malicious attack patterns, and the device detects an attack when it encounters any of these known patterns, whereas an anomaly-based IDS relies on deviations from normal behavior to detect malicious activity. Signature-based detection is more efficient than anomaly-based detection because information about known attacks is available, but it fails to detect new attacks. Anomaly-based detection, in contrast, is more capable of detecting new attacks based on traffic deviations from normal behavior. Adapting machine learning algorithms to anomaly-based detection techniques strengthens the self-learning process and helps develop more intelligent systems for detecting attacks in IoT environments.
Recent advancements in network intrusion detection have shown that knowledge of an attack's behavior across multiple views achieves better performance than a single-view feature set [11]. Multi-view learning is a paradigm in which a distinct function models each particular view, and all functions are combined to exploit redundant views of the input data. In multi-view learning, the data is trained alternately to increase mutual agreement between two distinct views of the input data, so learning tasks draw on abundant information, and what is learned from the multiple views can be combined to produce an efficient outcome. Just as network data can be cast into a multi-view form, attack information can also be analyzed as multiple views, since the behavioral information of an attack can vary from one view to another.
In the conventional ML-based approach, the entire data is considered as a single feature set, and more training time is required to find deflections in the few features of the input data that indicate an attack. In the multi-view approach, by contrast, we organize the features into multiple views rather than a single feature set. Because each view has a reduced feature set, training can be more efficient, which leads to more accurate attack detection. Such view-level intelligence improves the learning of attacks and helps detect abnormal behavior in any view of the network data. Most existing works use centralized ML techniques for intrusion detection [12] [13] [14]. Typically, IoT devices in the real world are placed far away from the central server, and the traffic logs they generate are tremendous. Sharing the network logs of IoT devices with the central server for intrusion detection at every point in time is a cumbersome process and can be subjected to various attacks. Although the centralized approach can detect attacks with good accuracy, its main overhead is the cost of transmitting IoT network data to the server. The process takes more time because the IoT devices and the central server (the intrusion detection system) are geographically isolated. Moreover, the data may contain sensitive information that needs to be secured; sharing such data over the network makes it prone to various attacks, with critical consequences. The centralized methodology does not ensure user data privacy, and its latency cost is high. The edge computing paradigm reduces latency by bringing data and computational resources close to the end device, but it provides no privacy-preserving methods for the data. Knowledge sharing is also not supported in such edge computing settings, i.e., information about new attacks, or about changes in the behavior of existing attacks, that one device encounters is not shared with other devices.
Federated Learning (FL) [15] is an ideal solution for performing on-device training on decentralized data while preserving privacy. In recent years, FL has become a widely adopted solution for ensuring the privacy of end-user data [16, 17, 18, 19] and for low-latency updates. This technology addresses the limitations of the centralized and edge-computing paradigms, making it an outstanding ML technique for maintaining data privacy while enabling knowledge sharing among peers. FL uses a strategy [20] in which a trained ML model is shared among multiple devices in the network, and each device that downloads the shared base ML model trains it with its own local data and computational resources. After training, the devices send their updated local model parameters back to the server, which aggregates these locally computed parameters. As a privacy measure, the aggregation process is designed so that the server has no access to the devices' training data. Using this paradigm, we propose an intrusion detection technique called Multi-view Federated Learning based Intrusion Detection (MV-FLID) that uses decentralized data to perform training and inference at the device's end. The device's data is not shared with the central server or any other external device, thereby maintaining the security and privacy of the device's data. FL aggregation [21] [22] is performed at the server by collecting the trained models of all devices that participate in the FL process. The contributions of this research are as follows:
1) Proposing an intrusion detection approach with multi-view information of IoT network data using federated learning methodology.
2) Integrating a Grey Wolves Optimization mechanism for extracting optimized feature sets in the proposed approach.
3) Devising an ensemble-based method for detecting attacks from multiple views in the proposed approach.
4) Obtaining higher accuracy in detecting attacks than traditional machine learning approaches that use a single feature set of attack information in a centralized (non-decentralized) manner.
The remainder of the paper is structured as follows. Section II reviews related work. Section III presents the proposed approach and illustrates the underlying architecture with implementation details. Section IV presents the dataset, metrics, and evaluation results and summarizes our findings. Finally, Section V concludes the paper.
II. RELATED WORK
Intrusion detection has become a foremost consideration when utilizing IoT devices [13]. Many research works have addressed intrusion detection using ML methodologies. In this section, we discuss some recent advancements in intrusion detection techniques.
The researchers in [23] used a supervised artificial neural network based on a multi-layer perceptron for training and testing threat patterns in an IoT network. They achieved 99.4% accuracy in detecting Denial of Service (DoS) / Distributed Denial of Service (DDoS) attacks.
Similarly, the authors in [24] proposed an intrusion detection approach using feed-forward neural networks and multi-class classification for detecting attacks such as DoS/DDoS, reconnaissance, and information theft in IoT networks. This method collects data from raw network packets using an analyzer tool, which captures all the generic features of the raw network traffic. The pre-processed data is then used to train a deep neural network, which serves as a classifier for new incoming packet data. The classifier then labels each malicious packet with a specific attack category.
The authors in [22] presented a comprehensive design of an FL system. Unlike traditional ML architectures, FL utilizes a decentralized approach. A variety of FL techniques have been proposed in the field of cybersecurity, including intrusion detection in IoT networks.
For example, the authors in [25] proposed a self-learning anomaly detection system using an FL approach for detecting compromised IoT devices in the network. It utilizes Long Short-Term Memory (LSTM) and Gated Recurrent Units (GRUs) to build the proposed model. It constantly monitors the devices' network traffic and detects anomalous deviations from the devices' communication profiles without human intervention. It achieved an accuracy of 98.2% and detects 95.6% of attacks in 257 ms with a low false alarm rate.
Likewise, the work in [26] presents a deep learning model that combines NIDS and HIDS to detect cyberattacks. The authors assessed multiple stages of attacks and evaluated machine and deep learning algorithms on various NIDS and HIDS datasets. The best-performing algorithm was selected and further evaluated on multiple other datasets for efficiency.
Researchers in [27] presented intelligent intrusion detection using an FL approach and a Long Short-Term Memory (LSTM) recurrent neural network. LSTM networks maintain a cell state and a memory state to hold the required information over long input sequences. This work compares the efficiency of the proposed algorithm with conventional neural networks trained with the Adam optimizer and achieves an accuracy of 99.21% and an F1 score of 99.21. It used a supervised learning mechanism during training.
On the other hand, the authors in [28] put forward a deep anomaly detection framework for time-series sensing data that uses the FL approach on distributed edge devices for the Industrial IoT. It incorporates a CNN-LSTM model to extract fine-grained features from historical observations of the sensing time-series data and uses LSTM modules for time-series prediction, thereby preventing memory loss and gradient-dispersion problems.
Most of the aforementioned approaches use a single view of network data for detecting attacks. There have, however, been advancements in intrusion detection techniques that use multi-view information about attacks. The authors in [11] proposed a semi-supervised co-training approach that exploits the multi-view nature of attacks. In this approach, attack behavior is maintained in multiple views, and attacks are detected from the predictions made by ML models on the multiple views of an attack. They used a centralized implementation with an active labeling procedure in which experts label unknown attacks. The researchers in [10] introduced multi-view features of MQTT data and evaluated these features using centralized ML algorithms. Both works follow a centralized approach. In our work, we propose a Federated Learning methodology (MV-FLID) to train on network data segmented into three views in IoT devices, and we use an ensembler at the end of the deep learners to detect attacks based on how often each attack class is predicted across the views.
To summarize, most research works operate on a single feature set of data using a centralized mechanism and lag in using decentralized approaches for effective communication and intrusion detection in IoT environments. Our approach addresses these limitations of existing works and proposes a multi-view decentralized approach for intrusion detection.
III. PROPOSED APPROACH
This section presents our proposed approach, called MV-FLID, which integrates base classifiers trained on multiple views of network data to detect various attacks in IoT devices. The list of acronyms used is given in Table 1.
A. ARCHITECTURE
Our proposed approach is depicted in Figure 1, in which multiple virtual IoT instances are connected to Security Gateways distributed at different sites. The MQTT protocol in the session layer of these IoT devices will share
Directional Features (Uniflow view), and Packet Features (Packet view). The multi-view features of the packets are listed in Table 2.
2) Grey Wolves Optimization
Feature selection using the Grey Wolves Optimization (GWO) [29] technique is a hunting-inspired mechanism that can be used to generate an optimal feature set from the list of features in the training data. This heuristic approach obtains the best feature set based on the accuracy achieved by candidate feature subsets over multiple iterations, so the final feature set returned by GWO is the one that yielded the highest accuracy during the search. The GWO mechanism is based on four categories of wolves: alpha, beta, delta, and omega. The wolves follow a strict dominance hierarchy in which the alpha (α) is the leader of the group, conscientious and responsible for making decisions. The next wolf in the hierarchy is the beta (β), which is subordinate to the alpha (α); beta (β) wolves sometimes find the best fitness and inform α about the positions. The lower-ranking wolves are called delta (δ) and omega (ω). α is considered the best-fitness solution, while β and δ are the next-best (sub-optimal) solutions. All these categories of wolves are scattered across an n-dimensional search space and collectively try to find the best solution for the given features over multiple iterations.
The wolves' hunting process at a given time can be described by the following equations.
$$G = \left| C \cdot F_{prey}(t) - F_{wolf}(t) \right| \tag{1}$$
$$F_{wolf}(t+1) = F_{prey}(t) - A \cdot G \tag{2}$$
$$A = 2a\,r_1 - a \tag{3}$$
$$C = 2\,r_2 \tag{4}$$
In Equations 1 and 2, G is the distance between the wolf and the prey at the current iteration t, F_{wolf} is the position vector of a wolf, and F_{prey} is the current position vector of the prey (target). A and C are coefficient vectors calculated using Equations 3 and 4. r_1 and r_2 are random vectors that lie in the interval [-1.28, 1.28], and a is a variable that decreases linearly from 2 to 0 over the iterations.
3) View Specific ML Models
Referring to the architecture, in this module a feed-forward Artificial Neural Network trained with the back-propagation algorithm is used. The proposed neural network consists of an input layer, hidden layers, and an output layer, and the number of inputs is determined by the input feature set. As we are dealing with multiple views of the network data, in this module we design neural network ML models for the three views, i.e., for the Uniflow features, Biflow features, and Packet features of the data shared by the MQTT protocol. The ML models are trained on the patterns of different kinds of attacks, and a softmax function is employed at the end of each network to classify network traffic into attack categories. The feature sets extracted in the data pre-processing phase are given as input to the classification models, in which the ML models learn the temporal information of these feature sets and are trained on the given knowledge.
We consider this set of ML models as the base ML models of the corresponding views, and the trained ML models are shared with IoT devices for inference of attacks on the real-time network data of each device.
4) FL Training Process
The FL process is implemented with the available number of virtual instances. Each instance has its own training data, with which the shared view-specific ML models are trained, and shares the updated parameters with the central server for aggregation. The information present in the network data of IoT devices is segmented into three views, and the segmented data is used for training. In our work, we use three deep learning ML models for the data in the three views. The number of communication rounds in the FL process is the number of times the aggregation procedure is performed on the locally trained ML models of the three views in the IoT devices. The steps of this process are as follows:
Step 1: The initial ML models of the three views are shared with the virtual IoT devices.
Step 2: After receiving the base ML models of the three views from the server, the ML models undergo a re-training process using the local network data of the device.
Step 3: View-specific feature information of the network data is given as input to the Biflow, Uniflow, and Packet view ML models to predict the class of an attack. All three classification ML models predict the attack class based on the training parameters set in the initial model.
Step 4: From the probabilities of the attack classes produced by the output layer, each ML model outputs the attack class with the highest predicted probability.
Step 5: After the ML models are trained locally on the device data for several epochs, the view-classification models of all IoT devices are sent to the server for FL aggregation of the corresponding view-classification ML models. All the parameters of the view-classification ML models of multiple IoT devices are aggregated, and a global ML model is created for each view-classification model.
Step 6: After FL aggregation is complete, the global models are shared back with the IoT devices, the current classification models in the IoT devices are replaced with the shared global models, and the training and inference procedure continues on local data.
Step 7: An ensembler is engaged, and the attack predictions of all view-classification models are sent to it; the ensembler selects the class predicted by the majority of the trained classifiers.
5) FL Aggregation
Federated aggregation is a computation algorithm in which a group of devices connected to a network hold private network data and collaborate to compute an aggregated model without revealing any sensitive information on the device; only the locally computed parameters of the device models are shared for aggregation. Each device trains its models of the three views locally for a selected number of epochs before sharing its updated parameters with the central server. This limits the communication overhead of the device because only the trained models of the views are shared, rather than view-specific information from the local data.
$$F_k(w) = \frac{1}{n_k} \sum_{i \in \mathcal{P}_k} f_i(w) \tag{5}$$
$$f(w) = \sum_{k=1}^{K} \frac{n_k}{n} F_k(w) \tag{6}$$
Here, as in the standard FedAvg formulation, K is the number of participating devices, \mathcal{P}_k is the set of indices of the data points on device k, n_k = |\mathcal{P}_k|, n = \sum_{k} n_k, and f_i(w) is the loss on data point i under model parameters w.
Next, the computed model parameters are averaged using the FLaverage function. Finally, the EnsemblerLogic function contains the logic for predicting the attack from the test data of each view, which is passed to the corresponding trained view model.
Algorithm 1: FL Process
Input: Train and Test Data of MQTT Dataset
Output: Intrusion Detection in Multi-view Network Data
Data: mqtt_dataset
m_b, m_u, m_p  /* ML models for three views */
Reading input data of three views
Views = D_b, D_u, D_p  /* data for three views */
d = d_1, d_2, ..., d_10  /* initiating virtual devices */
Function featureEngineering_GWO(Views):
    for view in Views do
        while max_iterations do
            /* set random positions of wolves */
            α, β, δ, ω = random_positions
            update(current position of wolves)
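Algorithm 1 calls featureEngineering_GWO, but its body is cut off in this copy. As a rough illustration of Equations (1)-(4), the following is a minimal Python sketch of GWO-based feature selection, not the authors' implementation. It assumes the canonical GWO update, in which Equations (1) and (2) are applied with each of the α, β, and δ wolves standing in for the unknown prey position and the three resulting moves are averaged; it draws r1 and r2 from [0, 1] as the canonical formulation does (the text above states [-1.28, 1.28]); and it turns a continuous position into a feature mask by thresholding at 0.5. The function name gwo_feature_selection and the toy mismatch fitness are hypothetical; in MV-FLID the fitness would presumably be derived from the accuracy of a view model trained on the selected features.

```python
import random
from typing import Callable, List, Sequence, Tuple


def gwo_feature_selection(
    n_features: int,
    fitness: Callable[[List[bool]], float],
    n_wolves: int = 8,
    max_iter: int = 30,
    seed: int = 0,
) -> Tuple[List[bool], float]:
    """Minimise fitness(mask) over boolean feature masks with a continuous GWO.

    Each wolf is a position in [0, 1]^n_features; feature j is selected when
    coordinate j exceeds 0.5 (a simple threshold, one of several binarisation rules).
    """
    if n_wolves < 3:
        raise ValueError("GWO needs at least three wolves (alpha, beta, delta)")
    rng = random.Random(seed)

    def to_mask(pos: Sequence[float]) -> List[bool]:
        return [x > 0.5 for x in pos]

    def score(pos: Sequence[float]) -> float:
        return fitness(to_mask(pos))

    wolves = [[rng.random() for _ in range(n_features)] for _ in range(n_wolves)]
    best = min(wolves, key=score)  # best position seen so far
    for t in range(max_iter):
        # Alpha, beta and delta are the three fittest wolves of the current pack.
        leaders = sorted(wolves, key=score)[:3]
        a = 2.0 - 2.0 * t / max_iter  # decreases linearly from 2 toward 0
        new_wolves = []
        for wolf in wolves:
            new_pos = []
            for j in range(n_features):
                guided = []
                for leader in leaders:  # each leader stands in for the prey
                    r1, r2 = rng.random(), rng.random()
                    A = 2.0 * a * r1 - a                # Eq. (3)
                    C = 2.0 * r2                        # Eq. (4)
                    G = abs(C * leader[j] - wolf[j])    # Eq. (1)
                    guided.append(leader[j] - A * G)    # Eq. (2)
                # Average the three guided moves and clip back into [0, 1].
                new_pos.append(min(1.0, max(0.0, sum(guided) / 3.0)))
            new_wolves.append(new_pos)
        wolves = new_wolves
        candidate = min(wolves, key=score)
        if score(candidate) < score(best):
            best = candidate
    return to_mask(best), score(best)


# Hypothetical fitness: number of disagreements with a known "useful" subset,
# a stand-in for (1 - validation accuracy) of a view model.
target = [True, False, True, False, True, False]


def mismatch(mask: List[bool]) -> float:
    return float(sum(m != t for m, t in zip(mask, target)))


mask, err = gwo_feature_selection(len(target), mismatch)
```

Because the best mask seen so far is tracked across iterations, the returned fitness never exceeds that of the best initial wolf. The 0.5 threshold is only one way to binarise positions; sigmoid transfer functions are another common choice in binary GWO variants.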
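The FLaverage step and Equations (5) and (6) can be illustrated with a small FedAvg-style sketch. It assumes each device's parameters are weighted by its share of training records, n_k / n, mirroring the weights in Equation (6); the text does not say whether MV-FLID weights by sample count or takes a plain mean (the two coincide when all shards are the same size). Parameters are represented as plain dictionaries of float lists rather than PyTorch state_dicts so that the sketch runs without third-party packages; fl_average and aggregate_views are hypothetical names.

```python
from typing import Dict, List, Sequence

# A model's parameters as {layer name: flat list of floats}; a stand-in for a
# PyTorch state_dict so the sketch runs with the standard library only.
Params = Dict[str, List[float]]


def fl_average(client_params: Sequence[Params], client_sizes: Sequence[int]) -> Params:
    """FedAvg-style aggregation: w = sum_k (n_k / n) * w_k, weighted as in Eq. (6)."""
    if not client_params or len(client_params) != len(client_sizes):
        raise ValueError("need one sample count per client model")
    n = sum(client_sizes)
    averaged: Params = {}
    for name, first in client_params[0].items():
        acc = [0.0] * len(first)
        for params, n_k in zip(client_params, client_sizes):
            weight = n_k / n  # this device's share of all training records
            for j, value in enumerate(params[name]):
                acc[j] += weight * value
        averaged[name] = acc
    return averaged


def aggregate_views(device_models: Sequence[Dict[str, Params]],
                    device_sizes: Sequence[int]) -> Dict[str, Params]:
    """Aggregate each view (e.g. 'biflow', 'uniflow', 'packet') independently."""
    views = device_models[0].keys()
    return {v: fl_average([m[v] for m in device_models], device_sizes) for v in views}


# Two devices holding 1 and 3 records respectively.
dev1 = {"packet": {"fc.weight": [0.0, 4.0]}}
dev2 = {"packet": {"fc.weight": [4.0, 0.0]}}
global_models = aggregate_views([dev1, dev2], device_sizes=[1, 3])
# global_models["packet"]["fc.weight"] -> [3.0, 1.0]
```

Each view is aggregated on its own, matching Step 5, where a separate global model is created for each view-classification model.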
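Steps 4 and 7 and the EnsemblerLogic function can be sketched as follows: each view model's output layer is turned into probabilities with softmax, the highest-probability class becomes that view's prediction, and the ensembler returns the class predicted by the majority of views. The paper does not say how a three-way disagreement is resolved, so the tie-break below (prefer the most confident view) is an assumption, as are the function names softmax, view_prediction, and ensemble_predict.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence, Tuple


def softmax(logits: Sequence[float]) -> List[float]:
    m = max(logits)  # shift by the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def view_prediction(logits: Sequence[float]) -> Tuple[int, float]:
    """Step 4: the class with the highest softmax probability, and that probability."""
    probs = softmax(logits)
    label = max(range(len(probs)), key=probs.__getitem__)
    return label, probs[label]


def ensemble_predict(view_logits: Dict[str, Sequence[float]]) -> int:
    """Step 7: majority vote over the per-view predicted classes."""
    preds = [view_prediction(z) for z in view_logits.values()]
    votes = Counter(label for label, _ in preds)
    top = max(votes.values())
    tied = {label for label, count in votes.items() if count == top}
    # Tie-break (assumption, not stated in the paper): among the tied classes,
    # return the one predicted with the highest confidence by any view.
    return max((conf, label) for label, conf in preds if label in tied)[1]


logits = {
    "biflow": [0.1, 2.0, 0.3],
    "uniflow": [0.0, 1.5, 0.2],
    "packet": [3.0, 0.0, 0.0],
}
print(ensemble_predict(logits))  # -> 1 (two of the three views vote for class 1)
```

With three views, a strict majority exists whenever at least two views agree; the tie-break only matters when all three predict different classes.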
1) Feature extraction using the Grey Wolves Optimization (GWO) [31] method extracts the optimal feature set from the list of available features for the three views, i.e., Uniflow, Biflow, and Packet features.
2) The server initializes the base Artificial Neural Network models for these three views using the optimal feature sets obtained for the corresponding views in the feature extraction process. The structure of these ML models and their hyper-parameters are defined at this stage. The defined neural network models perform an initial training round on the available training data, which covers different categories of attacks.
3) The ML models produced in step (2) are shared with the security gateways participating in the FL process.
4) Once the security gateways download the server's intrusion detection model for each view, the learning parameters of the ML models are refined based on the IoT devices' local network data.
5) As the ML models were trained with a certain number of attacks at the server, they detect and identify abnormal behavior in the network traffic of IoT devices, which assists in detecting attacks.
6) Only the parameters of the updated models are shared with the server for the aggregation process, instead of sharing sensitive network traffic data and creating scope for data theft.
7) The server aggregates the weights obtained from the different gateway models and creates new, updated models for the corresponding views, which are communicated back to the security gateways after successful aggregation.
8) Each security gateway uses the updated model on newly arriving traffic data.
Steps (5), (6), and (7) are repeated to enhance the learning process, improve the models' accuracy, and keep the global ML models up to date with the latest data.
IV. EVALUATION RESULTS
In this section, we discuss the environment setup, the dataset, and the view segmentation used to implement our proposed approach. We then walk through the evaluation metrics used to analyze the performance of our approach. Finally, we present our experimental results and discussion.
A. EXPERIMENTAL SETUP
To evaluate our proposed approach, MV-FLID, we used a Tesla V100-SXM2-16GB GPU (Graphics Processing Unit) server hosted as the backend of the Google Colab Pro environment. We implemented the FL approach using the PySyft [32] library, which is built on the PyTorch deep learning framework.
Table 2: Dataset Features
View | Description | Total
Bi-Flow | 'ip_src', 'ip_dst', 'prt_src', 'prt_dst', 'proto', 'fwd_num_pkts', 'bwd_num_pkts', 'fwd_mean_iat', 'bwd_mean_iat', 'fwd_std_iat', 'bwd_std_iat', 'fwd_min_iat', 'num_psh_flags', and so on | 27
Packet | 'timestamp', 'src_ip', 'dst_ip', 'protocol', 'ttl', 'ip_len', 'ip_flag_df', 'ip_flag_mf', 'ip_flag_rb', 'src_port', 'dst_port', 'tcp_flag_res', 'tcp_flag_ack', 'tcp_flag_push', 'tcp_flag_reset', 'tcp_flag_syn', 'tcp_flag_fin' | 17
Uni-Flow | 'ip_src', 'ip_dst', 'prt_src', 'prt_dst', 'proto', 'num_pkts', 'mean_iat', 'std_iat', 'min_iat', 'max_iat', 'mean_pkt_len', 'num_bytes', 'num_psh_flags', 'num_rst_flags', 'num_urg_flags', 'std_pkt_len', 'min_pkt_len', 'max_pkt_len' | 18
B. DATASET
To evaluate our approach, we used a lightweight MQTT protocol dataset [10], which simulates realistic IoT device communication. The MQTT dataset consists of five recorded scenarios: one normal operation scenario and four attack scenarios, covering both common network scanning attacks and brute-force attacks. MQTT protocol communication datasets are widely adopted [33] [34] for building effective intrusion detection models for IoT devices.
The processed features of the MQTT dataset can be categorized into unidirectional, bidirectional, and packet-based features. The features of each group are listed in Table 2. The distribution of attacks in each view is illustrated in Figure 3.
C. EVALUATION METRICS
In our approach, we use Accuracy, Precision, Recall, and F1-score as the metrics for measuring the performance of the classification models.
True Positive (TP): The total number of attack records that were correctly classified as an attack.
True Negative (TN): The total number of normal records that were correctly classified as normal.
False Positive (FP): The total number of normal records that were incorrectly classified as an attack.
False Negative (FN): The total number of attack records that were incorrectly classified as normal.
Performance Metrics:
Accuracy: the ratio of correctly classified records to the total number of records.
$$Accuracy = \frac{TN + TP}{TN + FP + TP + FN}$$
F1-Score: the harmonic mean of Precision and Recall.
$$F1\text{-}score = 2 \cdot \frac{Recall \cdot Precision}{Recall + Precision}$$
Precision: the ratio of true positives to the total number of records predicted positive.
$$Precision = \frac{TP}{FP + TP}$$
$$Recall = \frac{TP}{FN + TP}$$
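The four metrics defined above can be computed directly from the confusion counts, treating attack records as the positive class. The following is a minimal sketch; the function name classification_metrics and the example counts are illustrative only and are not results from this paper. For multiple attack categories the same formulas are usually applied per class (one-vs-rest) and then averaged, but the text does not state which averaging was used.

```python
from typing import Dict


def classification_metrics(tp: int, tn: int, fp: int, fn: int) -> Dict[str, float]:
    """Accuracy, Precision, Recall and F1 from confusion counts (attack = positive)."""
    total = tp + tn + fp + fn
    accuracy = (tn + tp) / total if total else 0.0
    precision = tp / (fp + tp) if (fp + tp) else 0.0
    recall = tp / (fn + tp) if (fn + tp) else 0.0
    # Harmonic mean of precision and recall; defined as 0 when both are 0.
    f1 = (2 * recall * precision / (recall + precision)
          if (recall + precision) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Illustrative counts only (not results from the paper).
m = classification_metrics(tp=90, tn=95, fp=5, fn=10)
# accuracy 0.925, precision ~0.947, recall 0.9, f1 ~0.923
```

The zero-denominator guards matter in practice: a view model that never predicts "attack" on a small local shard would otherwise raise a division error instead of reporting a precision of 0.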
D. RESULTS
We conducted our experiments using the PySyft framework with 10 virtual IoT devices in a federated setting to evaluate our approach. We implemented ten rounds of aggregation of models trained on the local data of the corresponding devices. Figure 4 shows each device's accuracy trend after every round of federated aggregation. We distributed the training and test data among the devices to enable a self-learning process. We achieved high accuracy (as shown in Figure 4) after each round of communication while preserving the privacy of the data.
Figure 6: Evaluation metrics comparison of our proposed approach and the non-FL approach.
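The results paragraph states that training and test data were distributed among the ten virtual devices but does not specify how. Assuming an IID random split (non-IID splits are also common in FL experiments), the sketch below shuffles the records and deals them round-robin to the devices; partition_iid is a hypothetical helper, and each resulting shard size is the n_k that a FedAvg-style average would weight by.

```python
import random
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def partition_iid(records: Sequence[T], n_devices: int = 10, seed: int = 0) -> List[List[T]]:
    """Shuffle the records and deal them round-robin into n_devices shards."""
    order = list(range(len(records)))
    random.Random(seed).shuffle(order)  # seeded so the split is reproducible
    shards: List[List[T]] = [[] for _ in range(n_devices)]
    for position, idx in enumerate(order):
        shards[position % n_devices].append(records[idx])
    return shards


# Each device's shard size n_k is what a FedAvg-style average weights by.
shards = partition_iid(list(range(103)), n_devices=10)
sizes = [len(s) for s in shards]  # three shards of 11 records and seven of 10
```

Round-robin dealing keeps shard sizes within one record of each other, so a sample-weighted average and a plain mean of the device models differ only slightly under this split.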
2) We have defined attack classification models for three
Federated Learning methodology has spurred the efficiency
views i.e., Biflow View, Packet View, and Uniflow
of the knowledge-sharing process regards to diverse behavior
View.
of attacks among devices. The deep learning models in these
3) Next, we carried out training procedures for three-view
devices get trained with much information of attacks in
for a given number of epochs.
every round of communication, leading to an increase in our
4) As this procedure comes under a centralized approach,
accuracy.
there is a single round of training for view classification
models.
1) Comparison with Non-FL Approach
The results of the training round are shown in Figure 5. As
We implemented the non-FL version of our proposed ap- can be seen from the figure, the Biflow and Uniflow view
proach using the Pytorch deep learning framework to com- deep learners yielded more accuracy than that of the packet
pare its evaluated results with the FL version. flow view. The same tendency continues with the loss values
In the FL method, there are ten rounds of communication of models. The packet view deep learner has taken more steps
has been carried out. In every round of communication, for reaching an optimal minimum. The training accuracy in
any change in behavior of attack information encountered packet view is less in the Non-FL approach and required
by any device during training is being shared with latter more information to identify abnormal behavior in packet
devices at the time of aggregation. Whereas in the Non- view. Improving parameters of packet view deep learners
FL implementation, knowledge sharing is not happening. by knowledge sharing technique enhances its efficiency in
Also, the amount of data used for training in a centralized detecting abnormal behaviors in packet view.
approach is huge compared to the amount of data used in the In a side-by-side fashion, we compared the evaluation
decentralized methodology. The procedure of implementing metrics of trained models of multiple views from the non-
Non-FL version are as follows: FL setup with the metrics of trained view-model instances
1) We performed Data Pre-Processing techniques on view after completing the 10th round of FL averaging process of
datasets and extraction of valuable features set. the proposed approach. Figure 6 illustrates the comparison
of the evaluation metrics for the FL and non-FL implementations. Compared with the non-FL approach, our proposed FL-based approach achieved higher accuracy in detecting attack behavior in network data. With the federated learning methodology, models are trained on the network data of the IoT devices in every round, and the trained models are then aggregated. Because the aggregation merges the deep learning models of multiple views, knowledge of attack behavior is shared across multiple devices, which increases the efficiency of attack detection. In the non-FL methodology, by contrast, training is performed at the server end, and the variety of attack behaviors seen by the IoT devices is not shared for a global perception. Knowledge sharing, which the FL method supports, plays the most significant role in detecting attacks more efficiently.

2) Security Analysis
The previous results (presented in Figure 4) were measured under the premise that the devices and their traffic data are legitimate. To further evaluate the security of our proposed approach in the presence of adversarial (malicious) traffic in the network, we considered three devices with malicious traffic for this analysis: devices 8, 9, and 10. We implemented a poisoning attack by flipping the training data labels on these infected nodes to analyze the efficiency of our approach. Figure 7 shows the accuracy trends of the devices with legitimate and with malicious traffic. As shown in Figure 7, the accuracy of the infected devices is low compared with that of the devices with legitimate traffic. The models trained on the infected devices were also included in the aggregation process. In the federated learning mechanism, the parameters of the models trained on multiple devices are combined during aggregation. Averaging the parameters of the trained models from all participating devices promotes knowledge sharing, reduces the negative impact of adversarial samples, and produces an optimized global model. Although the accuracy of the infected devices is lower, the global model generated by the federated aggregation process maintained optimal performance. Figure 8 shows the evaluation metrics of our proposed approach with malicious traffic data. Our results show that, by using federated learning, the global model obtained after the 10th round of aggregation maintained optimal accuracy in detecting attacks. To strengthen our approach, we are focusing on a mechanism that can forestall poisoning attacks by implementing an outlier detection filtering technique.

Figure 8: Evaluation metrics results of the proposed approach with adversary nodes
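The federated procedure described in the results and the security analysis (local training on each of the 10 devices, ten rounds of parameter averaging, and label flipping on devices 8, 9, and 10) can be sketched in plain Python. This is a minimal, framework-free sketch under stated assumptions, not the authors' PySyft implementation: it weights each device by its local sample count in the style of FedAvg (the text only says that parameters are averaged; with equal local dataset sizes the two coincide), it represents a model as a dictionary of float lists instead of PyTorch tensors, and it flips a label y to (y + 1) mod K, which is only one possible flipping rule. `fedavg`, `flip_labels`, `poison_devices`, `run_rounds`, and the caller-supplied `local_train` are hypothetical names.

```python
import copy
from typing import Callable, Dict, List, Sequence, Set, Tuple

# Hypothetical model layout: parameter name -> flat list of float values.
Params = Dict[str, List[float]]
# Hypothetical local dataset: list of (feature vector, integer class label) pairs.
Dataset = List[Tuple[List[float], int]]


def fedavg(local_params: Sequence[Params], num_samples: Sequence[int]) -> Params:
    """Average client parameters, weighting each client by its local sample count."""
    if not local_params or len(local_params) != len(num_samples):
        raise ValueError("need one sample count per client model")
    total = sum(num_samples)
    if total <= 0:
        raise ValueError("total sample count must be positive")
    averaged: Params = {}
    for name, values in local_params[0].items():
        acc = [0.0] * len(values)
        for params, n in zip(local_params, num_samples):
            weight = n / total
            for i, value in enumerate(params[name]):
                acc[i] += weight * value
        averaged[name] = acc
    return averaged


def flip_labels(data: Dataset, num_classes: int) -> Dataset:
    """Label-flipping poisoning (assumed rule): replace each label y with (y + 1) mod num_classes."""
    return [(features, (label + 1) % num_classes) for features, label in data]


def poison_devices(devices: List[Dataset], poisoned_ids: Set[int],
                   num_classes: int) -> List[Dataset]:
    """Return new device datasets with labels flipped on the devices whose 1-based id is listed."""
    return [flip_labels(data, num_classes) if dev_id in poisoned_ids else data
            for dev_id, data in enumerate(devices, start=1)]


def run_rounds(global_params: Params, devices: List[Dataset],
               local_train: Callable[[Params, Dataset], Params],
               rounds: int = 10) -> Params:
    """Each round: every device trains a copy of the global model, then the copies are averaged."""
    for _ in range(rounds):
        updates = [local_train(copy.deepcopy(global_params), data) for data in devices]
        sizes = [len(data) for data in devices]
        global_params = fedavg(updates, sizes)
    return global_params
```

With `devices` holding the ten local datasets, `poison_devices(devices, {8, 9, 10}, num_classes)` mirrors the poisoned setup and `run_rounds(initial_params, poisoned, local_train, rounds=10)` runs ten aggregation rounds, where `local_train` would wrap a per-view training loop. Because every client's parameters enter the average, each poisoned client shifts the global parameters only in proportion to its share of the samples; an outlier-detection filter of the kind the authors plan would instead drop such clients before `fedavg` is called.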
[12] Bruno Bogaz Zarpelão, Rodrigo Sanches Miani, Cláudio Toshio Kawakani, and Sean Carlisto de Alvarenga. A survey of intrusion detection in Internet of Things. Journal of Network and Computer Applications, 84:25–37, 2017.
[13] N. Chaabouni, M. Mosbah, A. Zemmari, C. Sauvignac, and P. Faruki. Network intrusion detection for IoT security based on learning techniques. IEEE Communications Surveys & Tutorials, 21(3):2671–2701, 2019.
[14] Shahid Raza, Linus Wallgren, and Thiemo Voigt. SVELTE: Real-time intrusion detection in the Internet of Things. Ad Hoc Networks, 11(8):2661–2674, 2013.
[15] Mohammed Aledhari, Rehma Razzak, Reza M. Parizi, and Fahad Saeed. Federated learning: A survey on enabling technologies, protocols, and applications. IEEE Access, 8:140699–140725, 2020.
[16] Zhuo Chen, Na Lv, Pengfei Liu, Yu Fang, Kun Chen, and Wu Pan. Intrusion detection for wireless edge networks based on federated learning. IEEE Access, 8:217463–217472, 2020.
[17] W. Y. B. Lim, N. C. Luong, D. T. Hoang, Y. Jiao, Y. C. Liang, Q. Yang, D. Niyato, and C. Miao. Federated learning in mobile edge networks: A comprehensive survey. IEEE Communications Surveys & Tutorials, 22(3):2031–2063, 2020.
[18] Viraaji Mothukuri, Prachi Khare, Reza M. Parizi, Seyedamin Pouriyeh, Ali Dehghantanha, and Gautam Srivastava. Federated learning-based anomaly detection for IoT security attacks. IEEE Internet of Things Journal, pages 1–1, 2021.
[19] Viraaji Mothukuri, Reza M. Parizi, Seyedamin Pouriyeh, Yan Huang, Ali Dehghantanha, and Gautam Srivastava. A survey on security and privacy of federated learning. Future Generation Computer Systems, 115:619–640, 2021.
[20] Qiang Yang, Yang Liu, Tianjian Chen, and Yongxin Tong. Federated machine learning: Concept and applications. ACM Transactions on Intelligent Systems and Technology (TIST), 10(2):1–19, 2019.
[21] Keith Bonawitz, Vladimir Ivanov, Ben Kreuter, Antonio Marcedone, H. Brendan McMahan, Sarvar Patel, Daniel Ramage, Aaron Segal, and Karn Seth. Practical secure aggregation for federated learning on user-held data, 2016.
[22] Keith Bonawitz, Hubert Eichner, Wolfgang Grieskamp, Dzmitry Huba, Alex Ingerman, Vladimir Ivanov, Chloe Kiddon, Jakub Konecny, Stefano Mazzocchi, H. Brendan McMahan, Timon Van Overveldt, David Petrou, Daniel Ramage, and Jason Roselander. Towards federated learning at scale: System design, 2019.
[23] E. Hodo, X. Bellekens, A. Hamilton, P. Dubouilh, E. Iorkyase, C. Tachtatzis, and R. Atkinson. Threat analysis of IoT networks using artificial neural network intrusion detection system. In 2016 International Symposium on Networks, Computers and Communications (ISNCC), pages 1–6, 2016.
[24] M. Ge, X. Fu, N. Syed, Z. Baig, G. Teo, and A. Robles-Kelly. Deep learning-based intrusion detection for IoT networks. In 2019 IEEE 24th Pacific Rim International Symposium on Dependable Computing (PRDC), pages 256–25609, 2019.
[25] T. D. Nguyen, S. Marchal, M. Miettinen, H. Fereidooni, N. Asokan, and A. Sadeghi. DÏoT: A federated self-learning anomaly detection system for IoT. In 2019 IEEE 39th International Conference on Distributed Computing Systems (ICDCS), pages 756–767, 2019.
[26] R. Vinayakumar, M. Alazab, K. P. Soman, P. Poornachandran, A. Al-Nemrat, and S. Venkatraman. Deep learning approach for intelligent intrusion detection system. IEEE Access, 7:41525–41550, 2019.
[27] Ruijie Zhao, Yue Yin, Yong Shi, and Zhi Xue. Intelligent intrusion detection based on federated learning aided long short-term memory. Physical Communication, 42:101157, 2020.
[28] Yi Liu, Sahil Garg, Jiangtian Nie, Yang Zhang, Zehui Xiong, Jiawen Kang, and M Shamim Hossain. Deep anomaly detection for time-series data in industrial IoT: A communication-efficient on-device federated learning approach. IEEE Internet of Things Journal, 2020.
[29] Seyedali Mirjalili, Seyed Mohammad Mirjalili, and Andrew Lewis. Grey wolf optimizer. Advances in Engineering Software, 69:46–61, 2014.
[30] Cha Zhang and Yunqian Ma. Ensemble machine learning: Methods and applications. Springer, 2012.
[31] H. Haddadpajouh, A. Mohtadi, A. Dehghantanha, H. Karimipour, X. Lin, and K. K. R. Choo. A multi-kernel and meta-heuristic feature selection approach for IoT malware threat hunting in the edge layer. IEEE Internet of Things Journal, pages 1–1, 2020.
[32] Alexander Ziller, Andrew Trask, Antonio Lopardo, Benjamin Szymkow, Bobby Wagner, Emma Bluemke, Jean-Mickael Nounahon, Jonathan Passerat-Palmbach, Kritika Prakash, Nick Rose, Théo Ryffel, Zarreen Naowal Reza, and Georgios Kaissis. PySyft: A Library for Easy Federated Learning, pages 111–139. Springer International Publishing, Cham, 2021.
[33] Hector Alaiz-Moreton, Jose Aveleira-Mata, Jorge Ondicol-Garcia, Angel Luis Muñoz-Castañeda, Isaías García, and Carmen Benavides. Multiclass classification procedure for detecting attacks on MQTT-IoT protocol. Complexity, 2019, 2019.
[34] Ivan Vaccari, Giovanni Chiola, Maurizio Aiello, Maurizio Mongelli, and Enrico Cambiaso. MQTTset, a new dataset for machine learning techniques on MQTT. Sensors, 20(22):6578, 2020.