

Intrusion Detection Based on Privacy-Preserving Federated Learning for the Industrial IoT
Pedro Ruzafa-Alcázar, Pablo Fernández-Saura, Enrique Mármol-Campos, Aurora González-Vidal, José L. Hernández-Ramos, Jorge Bernal-Bernabe, and Antonio F. Skarmeta, Member, IEEE

Abstract—Federated learning (FL) has attracted significant interest given its prominent advantages and applicability in many scenarios. However, it has been demonstrated that sharing updated gradients/weights during the training process can lead to privacy concerns. In the context of the Internet of Things (IoT), this can be exacerbated in the case of intrusion detection systems (IDSs), which are intended to detect security attacks by analyzing the devices' network traffic. Our work provides a comprehensive evaluation of differential privacy techniques, which are applied during the training of an FL-enabled IDS for the industrial IoT. Unlike previous approaches, we deal with nonindependent and identically distributed data over the recent ToN_IoT dataset, and compare the accuracy obtained considering different privacy requirements and aggregation functions, namely FedAvg and the recently proposed Fed+. According to our evaluation, the use of Fed+ in our setting provides similar results even when noise is included in the federated training process.

Index Terms—Differential privacy (DP), federated learning (FL), Internet of Things (IoT), intrusion detection systems (IDSs), machine learning.

Manuscript received 30 July 2021; revised 25 October 2021; accepted 31 October 2021. Date of publication 9 November 2021; date of current version 13 December 2022. This work was supported in part by the UMU-Campus Living Lab under Grant EQC2019-006176-P funded by ERDF funds, by Grant PID2020-112675RB-C44 funded by MCIN/AEI/10.13039/5011000011033 (ONOFRE Project), by the European Commission through the EU Projects PHOENIX under Grant 893079 and DEMETER under Grant 857202; in part by the European Social Fund (ESF); and in part by the Youth European Initiative (YEI) under the Spanish Seneca Foundation (CARM). Paper no. TII-21-3297. (Corresponding author: Aurora González-Vidal.)

Pedro Ruzafa-Alcázar, Pablo Fernández-Saura, Enrique Mármol-Campos, Aurora González-Vidal, Jorge Bernal-Bernabe, and Antonio F. Skarmeta are with the Department of Information and Communication Engineering, University of Murcia, 30100 Murcia, Spain.

José L. Hernández-Ramos is with the European Commission, Joint Research Centre, 21027 Ispra, Italy.

Color versions of one or more figures in this article are available at https://doi.org/10.1109/TII.2021.3126728.

Digital Object Identifier 10.1109/TII.2021.3126728

This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see http://creativecommons.org/licenses/by/4.0/

I. INTRODUCTION

As the Internet of Things (IoT) expands, there is a significant increase in the number and impact of security vulnerabilities and threats associated with IoT devices and systems. To cope with such concerns, intrusion detection systems (IDSs) represent a well-known approach for the early detection of IoT attacks and cyber-threats [1]. In recent years, IDS mechanisms are usually based on artificial intelligence (AI) techniques, so that the system is trained with devices' network traffic to accurately detect any anomalous behavior, which could represent a certain type of attack [2]. Indeed, AI-based IDSs are trained considering monitored network traffic and behavioral data from heterogeneous IoT devices deployed in remote, possibly untrusted, and distributed domains and systems, to increase the overall accuracy of attack detection. However, this approach sparks privacy issues, as different domains might need to share their private data [3].

As an alternative to typical centralized learning approaches, federated learning (FL) was proposed in 2016 [4] as a collaborative learning approach, in which an AI algorithm is trained locally across multiple decentralized edge devices, called clients or parties, and the information is continuously updated onto a global model through several training rounds. Instead of sharing their data, parties share their models with an aggregator, which computes a global model. Nonetheless, FL suffers from privacy issues, as the global model's updates provided by parties could be used to launch several attacks to infer private information of the training data [5]. To mitigate such privacy concerns, differential privacy (DP) [3] can be employed to obfuscate either the training data or the model updates, giving statistical privacy guarantees over the data against an adversary. DP is usually considered in the scope of FL settings due to the stringent communication requirements of other privacy-preserving approaches, such as secure multiparty computation (SMC) [6].

While the use of DP techniques has been considered in FL [7], [8], existing works do not analyze the impact of such techniques in the scope of IDS approaches, and do not address the impact of different aggregation methods considering nonindependent and identically distributed (non-i.i.d.) data distributions, which are common in real-world scenarios. In this direction, our work provides a comprehensive evaluation of DP approaches through several additive noise techniques based on Gaussian and Laplacian distributions, which are applied during the training of an FL-enabled IDS for the industrial IoT (IIoT). Unlike previous approaches, our evaluation is based on an instance selection process to deal with non-i.i.d. data distributions over the recent ToN_IoT dataset [9], which contains recent attacks related to such scenarios. Furthermore, unlike the current state of the art, our evaluation compares the accuracy obtained by using different DP techniques and a recently proposed aggregation function


called Fed+ [10], which provides significantly better accuracy results compared to the traditional FedAvg function [4]. To the best of our knowledge, this is the first effort to analyze the impact of non-i.i.d. data and aggregation functions for the implementation of a privacy-preserving FL-enabled IDS in IoT/IIoT scenarios.

Based on this, the main contributions and novelties of this article are as follows.
1) A thorough evaluation of the feasibility and performance of applying DP-enabled FL to detect attacks in IoT scenarios, adapting and partitioning the ToN_IoT dataset considering non-i.i.d. data distributions.
2) An empirical analysis of using different FL aggregation methods with DP techniques, and of their impact on the effectiveness of intrusion detection in IoT.
3) The first complete quantitative and computational performance analysis of diverse DP perturbation mechanisms applied to FL for intrusion detection in IoT, using different privacy-factor values and FL settings.

The rest of this article is organized as follows. Section II provides an overview of FL, highlighting the main privacy issues and potential mitigation approaches. Section III describes the DP-enabled FL architecture, including the proposed training algorithm based on several DP techniques. Section IV describes our methodology for the proposed privacy-preserving FL-enabled IDS, including the aspects of the used dataset, classification model, and aggregation functions. Section V describes the evaluation results. Section VI analyzes the current state of the art. Finally, Section VII concludes this article.

II. PRIVACY-PRESERVING FL

The training process in FL is based on a set of rounds, in which the coordinator selects a subset of clients depending on the problem context and sends the parameters of a global model to them. Then, those parameters are updated by each client using their own collected data and sent back to the coordinator. The entities make use of a certain aggregation method to fuse the parameters received from the clients. While FedAvg [4] represents the most common approach, in which a simple average is applied over such parameters, other methods have recently been proposed to increase the accuracy of the trained model, particularly in settings with non-i.i.d. data distributions. Indeed, as described in Section V, our work evaluates the recent Fed+ algorithm [10], which clearly improves FedAvg for our particular scenario.

The main advantage of FL is that parties do not need to share their data for training a certain model. However, recent works highlight the need to apply privacy-preserving mechanisms to address potential attacks derived from the sharing of parameters/weights throughout the training rounds [5]. In particular, a malicious aggregator could modify the received parameters to fool the model being trained. Even an honest-but-curious aggregator might perform a reconstruction attack to infer training data from these parameters using several techniques, such as generative adversarial networks (GANs). Indeed, the use of GANs can also be considered to launch membership inference attacks, in which an attacker could infer whether local data of a certain party were used for the training process [6]. These attacks can also be carried out by entities external to the training process, as well as by compromised FL clients. In the context of IoT/IIoT, the impact of these attacks can be significant due to the potential sensitivity of the network data required for the implementation of an IDS in such scenarios.

To deal with the aforementioned privacy issues, different techniques have been postulated, including SMC and DP [5]. SMC uses different cryptographic protocols to jointly compute a function over a set of input values, which are kept private from the parties. In the case of an FL environment, the parameters/weights produced by FL clients are kept private when they are fused by the aggregator. However, recent works [8] highlight the high computational and communication requirements associated with the use of SMC, which can make these techniques unfeasible for IoT environments. Moreover, DP is based on injecting random noise into a dataset so that, looking at the output of a certain function over the dataset, it is not possible to discern whether a certain data point was used in such a dataset.

According to [3], the formal definition of DP is based on two concepts.

Definition 1 (Mechanism): A mechanism κ is a random function that takes a dataset D and outputs a random variable κ(D). For example, if the input is an IoT attacks dataset, then the output can be the flow duration plus noise from the standard normal distribution. In our case, the inputs will be the weights of a model.

Definition 2 (Distance): The distance d(D, D′) of two datasets D and D′ denotes the minimum number of sample changes that are required to change D into D′. For example, if D and D′ differ in at most one individual, then d(D, D′) = 1. We also call such a pair of datasets neighbors.

Definition 3 (Differential Privacy): A mechanism κ satisfies (ε, δ)-DP if and only if, for all neighbor datasets D and D′ and ∀S ⊆ Range(κ), as long as the following probabilities are well defined, it holds that

    Pr(κ(D) ∈ S) ≤ e^ε Pr(κ(D′) ∈ S) + δ,  where δ, ε > 0.

δ represents the probability that a κ output varies by more than a factor of e^ε when applied to a dataset and any one of its neighbors. This definition captures the intuition that a computation on private data will not reveal sensitive information about individuals in a dataset if removing or replacing an individual in the dataset has a negligible effect on the output distribution. A lower value of δ implies greater confidence, and a smaller value of ε tightens the privacy protection. This can be seen because the lower δ and ε are, the closer Pr(κ(D) ∈ S) and Pr(κ(D′) ∈ S) are, and therefore, the stronger the protection. As described in [3], when δ = 0, (ε, 0)-DP is simplified to ε-DP. If δ > 0, there is still a small chance that some information is leaked, whereas in the case of δ = 0 the guarantee against information leakage is not probabilistic. Therefore, ε-DP provides stronger privacy guarantees than (ε, δ)-DP. For this reason, we have chosen δ = 0 for our approach.
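To make the ε-DP guarantee tangible, the following minimal sketch (our illustration, not code from the article) applies the classical Laplace mechanism, whose noise scale Δ/ε satisfies pure ε-DP for a query of L1 sensitivity Δ; the weight vector and the sensitivity value are assumed for the example.

```python
import numpy as np

def laplace_mechanism(weights, sensitivity, epsilon, rng=None):
    """Add Laplace noise with scale sensitivity/epsilon to every weight.

    For a query with L1 sensitivity `sensitivity`, this satisfies
    (epsilon, 0)-DP, i.e., the pure epsilon-DP chosen in this work.
    """
    rng = rng or np.random.default_rng()
    scale = sensitivity / epsilon
    return weights + rng.laplace(0.0, scale, size=weights.shape)

# Hypothetical usage: a smaller epsilon forces a larger noise scale.
w = np.array([0.42, -1.30, 0.07])
for eps in (0.1, 1.0, 10.0):
    print(eps, laplace_mechanism(w, sensitivity=1.0, epsilon=eps))
```

A smaller ε yields a larger noise scale, matching the intuition above that a lower ε tightens the privacy protection.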

As described in Section V, our work evaluates different DP techniques based on Gaussian and Laplacian distributions, which are further detailed in Section III.

III. PROPOSED PRIVACY-PRESERVING FL-ENABLED IDS FOR IOT/IIOT

Based on the aforementioned aspects of privacy-preserving approaches for FL, in the following we describe the proposed architecture and the algorithm enabling the FL training process. Furthermore, we provide a formal description of the specific DP techniques considered in our work. For the sake of clarity, Table I provides the meaning of the main variables and terms used in this description.

TABLE I: DIFFERENT VARIABLES

A. DP-Enabled FL

The overall architecture of our DP-enabled FL approach for intrusion detection in IoT is shown in Fig. 1. A client represents the end device where the local training is performed. Each client is in charge of training the global model sent by the aggregator with its local data and generating a model update. Furthermore, clients are endowed with the logic needed to apply the corresponding DP algorithm. Moreover, the aggregator is the central service that receives the model updates coming from the clients and generates an aggregated model, which is sent back to the clients for each training round.

Fig. 1. Proposed FL with DP architecture.

In particular, considering the use of DP in the federated training, the process in each training round is as follows (an executable sketch of this loop is given after the list).
1) The aggregator selects a set of clients to participate in the training process by considering a certain client selection approach. In the context of IoT, operational conditions of the device (e.g., battery consumption) could be considered.
2) In the case of the initial training round, the aggregator creates a new general model W^G, whose weights (w_i^G) are sent to the selected clients. Otherwise, the aggregator creates an aggregated model by using a certain aggregation function based on the updates provided by the clients. For this work, we compare the use of the FedAvg [4] and Fed+ [10] aggregation methods, which are further explained in Section IV. The resulting aggregated model is sent to the clients.
3) Each client takes the shared model and trains it with its local data, generating a particular model update with the weights W_i^k that result from local training. It should be noted that the number of epochs executed by a client is determined by the parameter E.
4) The clients anonymize the calculated weights by leveraging a DP mechanism. The particular DP perturbation mechanism applied in this step could be one of the approaches defined in the following section. Then, the resulting anonymized weights W_i′ are sent back to the aggregator entity.
5) The aggregator combines all the received model updates using an aggregation algorithm, such as FedAvg, generating a new global model.
These steps are repeated until a certain number of rounds R is reached, or another condition is met, such as achieving a certain target accuracy.
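A minimal sketch of one such round is shown below. It is a simplification under our own assumptions (Laplace noise for step 4, plain FedAvg for step 5, and a placeholder local_train standing in for the local optimization of step 3); it is not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(42)

def local_train(global_weights, local_data, epochs):
    # Placeholder for step 3: each client would train the shared model on its
    # local data for E epochs; here we only perturb the weights slightly so
    # the round structure can be executed end to end.
    return global_weights + 0.01 * rng.standard_normal(global_weights.shape)

def dp_round(global_weights, client_datasets, epsilon=1.0, sensitivity=1.0):
    updates = []
    for data in client_datasets:                      # steps 1-2: selected clients get W^G
        w = local_train(global_weights, data, epochs=1)
        noise = rng.laplace(0.0, sensitivity / epsilon, size=w.shape)
        updates.append(w + noise)                     # step 4: DP-perturbed update
    return np.mean(updates, axis=0)                   # step 5: FedAvg aggregation

W = np.zeros(10)                                      # initial global model W^G
for r in range(5):                                    # repeated for R rounds
    W = dp_round(W, client_datasets=[None, None, None, None])
```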
Based on the previous description, Algorithm 1 provides a detailed description of the required steps for the DP-enabled federated training process, showing the integration and relationship between the FL aggregation method and the DP mechanism applied in each round. It should be noted that our work is focused on evaluating several DP techniques under different privacy requirements to come up with a potential tradeoff between privacy and accuracy. These techniques are further described in the following.

B. Perturbation DP Mechanisms

As depicted in Definition 1, an output perturbation mechanism takes an input D and returns a random variable κ(D). Such a random variable is computed by adding random noise rn, drawn from a certain distribution, to a transformation of the input data by means of a function f : X → R^d; therefore, we can express the (ε, δ)-DP mechanism as κ(D) = f(D) + rn. The perturbation methods analyzed in this article are summarized as follows (a numerical sketch of the corresponding noise scales follows the list).
1) Truncated Laplacian mechanism [11]: Given the privacy parameters 0 < δ < 1/2 and ε > 0 and query sensitivity Δ > 0, the probability density of the truncated Laplacian

distribution, ρ_TLap, is defined as

    f_TLap(x) = B e^{−|x|/λ} for x ∈ [−A, A], and 0 otherwise

where

    λ = Δ/ε    (1)

    A = (Δ/ε) log(1 + (e^ε − 1)/(2δ))

    B = 1/(2λ(1 − e^{−A/λ})) = 1/(2(Δ/ε)(1 − 1/(1 + (e^ε − 1)/(2δ)))).

2) Uniform: A special case of the truncated Laplacian mechanism [11] when ε = 0.
3) Gaussian [12]: For any ε, δ ∈ (0, 1), the Gaussian output perturbation mechanism uses noise with standard deviation

    σ = Δ √(2 log(1.25/δ)) / ε.    (2)

4) Analytic Gaussian mechanism [12]: Let f : X → R^d be a function with global L2 sensitivity Δ. For any ε ≥ 0 and δ ∈ [0, 1], the Gaussian output perturbation mechanism M(x) = f(x) + Z with Z ∼ N(0, σ²I) is (ε, δ)-DP if and only if

    Φ(Δ/(2σ) − εσ/Δ) − e^ε Φ(−Δ/(2σ) − εσ/Δ) ≤ δ    (3)

where Φ denotes the standard normal cumulative distribution function.
5) Bounded domain Laplace mechanism [13]: Given b > 0 and D ⊂ R, the bounded Laplace mechanism W_q : Ω → D, for each q ∈ D, is given by its probability density function

    f_{W_q}(x) = (1/C_q)(1/(2b)) e^{−|x−q|/b} if x ∈ D, and 0 otherwise    (4)

where C_q = ∫_D (1/(2b)) e^{−|x−q|/b} dx is a normalization constant. Let ε ≥ 0, δ ∈ [0, 1], and query sensitivity Δ > 0; then the mechanism W_q is (ε, δ)-DP whenever

    b ≥ Δ / (ε − log ΔC(b) − log(1 − δ)).    (5)

It should be noted that C_q is a function of b and that the domain D = [l, u] satisfies l < u. Then, the following holds:

    max_{q,q′ ∈ D, |q′−q| ≤ Δ} (C_q/C_{q′}) e^{|q−q′|/b} = ΔC(b) e^{Δ/b}

when we define ΔC(b) as

    ΔC(b) = C_{l+Δ}(b) / C_l(b).    (6)

6) Bounded Laplace noise mechanism [14]: This algorithm adds independent noise in each of the k coordinates, drawn from a distribution μ_{DE,R} that is supported on (−R, R):

    M_{DE,R}(x^(1), ..., x^(n)) = (1/n) Σ_{i=1}^n x^(i) + η    (7)

where each coordinate η_i of η is drawn independently from the distribution μ_{DE,R}. Let ε ∈ (0, 1), k ∈ ℕ, and δ ≥ e^{−k} log^8 k̃; define R = (n/C) √(log(1/δ)), where C > 0 is a universal constant; then M_{DE,R} is (ε, δ)-DP.
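To give a feel for the magnitudes involved, the following sketch (ours, with assumed values of Δ, ε, and δ) evaluates the truncated Laplacian parameters of (1) and the classical Gaussian scale of (2).

```python
import math

def truncated_laplace_params(sens, eps, delta):
    """lambda, A and B of the truncated Laplacian mechanism, see (1)."""
    lam = sens / eps
    A = lam * math.log(1.0 + (math.exp(eps) - 1.0) / (2.0 * delta))
    B = 1.0 / (2.0 * lam * (1.0 - math.exp(-A / lam)))
    return lam, A, B

def gaussian_sigma(sens, eps, delta):
    """Noise standard deviation of the classical Gaussian mechanism, see (2)."""
    return sens * math.sqrt(2.0 * math.log(1.25 / delta)) / eps

# Assumed example values: sensitivity 1, epsilon 0.5, delta 1e-3.
print(truncated_laplace_params(1.0, 0.5, 1e-3))
print(gaussian_sigma(1.0, 0.5, 1e-3))
```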

As we can understand from the definitions of the DP methods, the level of perturbation highly depends on the chosen parameter ε. In that sense, we wanted to analyze how the similarity between the perturbed weights and the original ones is affected by ε, and we used the Pearson correlation to do so. This coefficient is a measure of the linear association of two variables, and it ranges from −1 to +1. A value of +1 indicates that the data objects are perfectly correlated, a value of −1 indicates a perfect negative correlation, and a value of 0 means that the data objects are not correlated.

Essentially, the Pearson correlation coefficient (PCC) is the ratio between the covariance of two variables and the product of their standard deviations. In mathematical form, the coefficient can be described as

    r = Σ(x_i − x̄)(y_i − ȳ) / √(Σ(x_i − x̄)² Σ(y_i − ȳ)²)    (8)

where r is the correlation coefficient, x_i represents the values of the x-variable in a sample, x̄ is the mean of the values of the x-variable, y_i represents the values of the y-variable in a sample, and ȳ is the mean of the values of the y-variable.

In our approach, the PCC, as defined in (8), is calculated over the model updates for every training round, before and after applying the DP mechanisms defined in (1)–(7). This means that this metric indicates the similarity between the original weights and the modified ones for every client.
Cl (b)
IV. METHODOLOGY

Before describing our evaluation results, this section provides an overview of different aspects of our approach, including the dataset, the machine learning classification technique, and the aggregation functions being considered.

A. Dataset Description

The dataset used in this article is based on the CIC-ToN_IoT dataset,¹ which was generated through the CICFlowMeter tool from the original PCAP files of the ToN_IoT dataset [9] to extract 83 features. As a first step, we remove the nonnumeric features (e.g., flow ID). Then, we separate the samples of the whole dataset based on the destination IP addresses, i.e., the victims' addresses, and remove the samples that do not correspond to the top ten IP addresses, sorted by number of samples. The resulting dataset contains 4,404,084 samples, which correspond to 82.29% of the original CIC-ToN_IoT.²

¹[Online]. Available: https://staff.itee.uq.edu.au/marius/NIDS_datasets/
²The used dataset is available and can be downloaded from the following link: https://mega.nz/folder/FIoClAKT#ueQbjk5iU7J8XW12lLuvaw

Afterward, as the resulting datasets are highly imbalanced, we use Shannon entropy to measure the imbalance of each one of them. The main reason to do so is that the use of imbalanced and non-i.i.d. data has a significant impact on FL scenarios [15]. In particular, given a dataset of length n and k classes of size c_i, the balance between the classes is given by the formula

    Entropy = (−Σ_{i=1}^k (c_i/n) log(c_i/n)) / log k    (9)

where the function is equal to 0 if all classes are empty except one, and is equal to 1 if all c_i = n/k.

Furthermore, it should be noted that we consider that one FL client is represented by a single victim IP address. In this context, given that each instance represents a network flow, n is the number of network flows, k is the number of attack classes, and c_i is the number of instances of class i.

Table II describes the data distribution of the different parties, including the distribution of classes and the entropy values for each party. Based on such values, we select the parties with an entropy value higher than 0.2, so parties 0, 2, 4, and 5 are selected as the FL clients for our scenario.

Then, after this initial step, due to the fact that the classes of each local dataset are not well balanced, we use a simple instance selection mechanism based on undersampling, which consists in removing random samples from the predominant classes until we reach an entropy level higher than 0.6. First, from these classes, we select the instances that satisfy the entropy requirement, and then we randomly remove these selected instances. Table III summarizes the resulting data distribution of the parties selected for the DP-enabled federated training process.
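The normalized entropy of (9), used both for the 0.2 party-selection threshold and the 0.6 undersampling target, can be computed as in this sketch (ours; the class counts are illustrative).

```python
import numpy as np

def balance_entropy(class_counts):
    """Normalized Shannon entropy of (9): 0 if one class dominates, 1 if balanced."""
    c = np.asarray(class_counts, dtype=float)
    p = c[c > 0] / c.sum()                  # empty classes contribute 0 to the sum
    return float(-(p * np.log(p)).sum() / np.log(len(class_counts)))

print(balance_entropy([100000, 50, 20]))    # highly skewed party, entropy close to 0
print(balance_entropy([400, 380, 420]))     # balanced party, entropy close to 1
```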
Algorithm 1: Algorithm of Our DP Framework
Input: Number of rounds R, number of parties P, number of epochs E, and learning rate α
Output: Global model W^G

[party i]
LocalUpdate(i, W_r^G):
    Let x be the inputs and y the labels of the local data
    Normalize the local inputs
    Replace the parameters of the local model in order to get W_i^r
    Initialize w = 0^{m−1}, b = 0
    for each epoch from 1 to E do
        Compute the prediction ŷ_k = h(x_k)
        Compute the loss L_k = L(ŷ_k, y_k)
        Compute the gradients Δw = −∇_w L_k, Δb = −∂L/∂b
        Update the parameters w = w + Δw, b = b + Δb
    end for
    Apply the (ε, δ)-DP mechanism to the weights w to get κ(w)
    return κ(w) to the server

[server]
Initialize W_0^G
for each round r from 0 to R do
    S_r = choose p parties out of P
    for each party i ∈ S_r do
        w_r^i = LocalUpdate(i, W_r^G)
    end for
    Calculate the new weights using Fed+:
        w_{r+1}^k = w_r^k − α_k [f_k(w_r^k) + γ_k B(w_r^k, C(w))]
end for

B. Multiclass Classification

In our approach, we use supervised learning considering a multiclass approach to classify the dataset instances as benign or as a specific attack, namely, DoS, DDoS, Backdoor, Injection, MITM, Scanning, Password, and XSS. Specifically, we apply multinomial logistic regression [16], also called softmax regression, due to its training efficiency. The algorithm was provided by the sklearn library.³ It can also interpret model coefficients as indicators of feature importance. As with most classifiers, the input variables need to be independent for the correct use of the algorithm. Given the input x, the objective is to know the probability of y (the label) for each potential class, p(y = c|x). The softmax function takes a vector z of k arbitrary values and maps them to a probability distribution as follows:

    softmax(z_i) = exp(z_i) / Σ_{j=1}^k exp(z_j).

In our case, the input to the softmax will be the dot product between a weight vector w and the input vector x plus a bias, for each of the k classes:

    p(y = c|x) = exp(w_c · x + b_c) / Σ_{j=1}^k exp(w_j · x + b_j).

The loss function for multinomial logistic regression generalizes the loss function for binary logistic regression and is known as the cross-entropy loss or log loss. While other supervised techniques could be employed, it should be noted that our work is focused on analyzing the impact of DP techniques considering different privacy requirements and aggregation functions in an FL setting.

³[Online]. Available: https://scikit-learn.org/
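A minimal scikit-learn version of this classifier is sketched below; the synthetic matrix stands in for a party's normalized flow features, and the nine classes represent benign traffic plus the eight attack types.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(1)
X = rng.normal(size=(500, 20))              # placeholder flow features
y = rng.integers(0, 9, size=500)            # benign + 8 attack classes

X = StandardScaler().fit_transform(X)       # normalize the local inputs
clf = LogisticRegression(max_iter=200).fit(X, y)  # lbfgs fits a multinomial (softmax) model

proba = clf.predict_proba(X[:1])            # softmax probabilities p(y = c | x)
w, b = clf.coef_, clf.intercept_            # the weights and biases exchanged in FL rounds
```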
C. Aggregation Functions

As described in Section II, the local updates generated by each client in FL are combined through an aggregation function in each training round. The most basic aggregation function is represented by FedAvg [4], which generates the global model based on the average of the weights generated by the FL clients. In particular, let W^G = (w_i^G) be the weights of the general model and W^k = (w_i^k) the weights of the party k; then

    w_i^G = Σ_k (D_k / D) w_i^k

where D and D_k are the total data size and the data size of each party, respectively.
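FedAvg therefore reduces to a size-weighted mean of the party weights; a short sketch with assumed party sizes:

```python
import numpy as np

def fedavg(party_weights, party_sizes):
    """w_G = sum_k (D_k / D) * w_k, the weighted average used by FedAvg."""
    sizes = np.asarray(party_sizes, dtype=float)
    return sum((s / sizes.sum()) * w for s, w in zip(sizes, party_weights))

# Hypothetical updates from three parties with different data sizes.
updates = [np.array([0.1, 0.2]), np.array([0.3, 0.1]), np.array([0.2, 0.4])]
print(fedavg(updates, party_sizes=[1000, 2500, 500]))
```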
However, the performance of FedAvg may be degraded in scenarios with non-i.i.d. and highly skewed data, as in this case.

TABLE II: DESCRIPTION OF THE PARTIES CONSIDERED IN OUR DATASET

TABLE III: DESCRIPTION OF THE PARTIES PARTICIPATING IN THE PROPOSED DP-ENABLED FL SCENARIO

Fig. 2. FedAvg accuracy results for all perturbation mechanisms.

In this work, we also consider a recent approach called Fed+ [10], which unifies several functions to cope with scenarios composed of heterogeneous data distributions. For this purpose, Fed+ relaxes the requirement of forcing all parties to converge on a single model. In particular, let the main objective in FedAvg be

    min F(W) = (1/D) Σ_k f_k(W^k)

where f_k is the local loss function of the party k. In the case of Fed+, the main objective is

    min F(W) = (1/D) Σ_k [f_k(W^k) + α_k B(W^k, C(W))]

where W are the global model weights, W^k are the weights of the party k model, B(·, ·) is a distance function, α_k > 0 are penalty constants, and C is an aggregate function that computes a central point of W.

Then, to calculate the weights in each round, the parties generate their respective W^k and send them to the aggregator. Afterward, the aggregator calculates the value of C(W) and sends it to the parties. Finally, the parties calculate the new model weights through

    W_{r+1}^k = W_r^k − γ^k [f_k(W_r^k) + α_k B(W_r^k, C(W))]

where γ^k are the learning rates, and W_r^k represents the weights of party k at round r.
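The per-party Fed+ update can be sketched as follows. This is our own reading of the update rule under explicit assumptions: B is taken as the squared Euclidean distance (so its gradient is 2(W − C)), C(W) as the coordinate-wise mean, and f_k as the local gradient supplied by the caller.

```python
import numpy as np

def fedplus_step(w_party, local_grad, center, lr, alpha):
    """One Fed+ step: follow the local gradient while being pulled toward C(W)."""
    pull = 2.0 * (w_party - center)          # gradient of B(w, c) = ||w - c||^2
    return w_party - lr * (local_grad + alpha * pull)

# Hypothetical round with three parties; the aggregator broadcasts the mean.
parties = [np.array([0.9, 0.1]), np.array([0.4, 0.5]), np.array([0.2, 0.8])]
center = np.mean(parties, axis=0)
parties = [fedplus_step(w, local_grad=np.zeros_like(w), center=center,
                        lr=0.1, alpha=0.5) for w in parties]
```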
V. EVALUATION RESULTS

This section describes the evaluation results achieved by applying each of the previously described DP mechanisms to the FL training scenario with a logistic regression classifier over the ToN_IoT dataset. The main performance evaluation parameters are ε, the perturbation DP mechanism, and the aggregation algorithm to be applied. The evolution of the accuracy for each mechanism and different ε values throughout the rounds is shown in Figs. 2 and 3. For the first figure, FedAvg is configured as the aggregation algorithm used in every round, while Fed+ is used in the second one. As a reminder, a smaller ε provides a better privacy-preserving scenario. The evaluation also compares the achieved accuracy when using FL without applying DP techniques.

Fig. 3. Fed+ accuracy results for all perturbation mechanisms.

For the Gaussian analytic mechanism, ε can take values higher than 1. Fig. 2 shows that the accuracy levels for all ε values are higher than 0.8, and that we achieve results similar to the configuration without DP, meaning that our framework barely impacts the accuracy while providing more privacy than classical FL approaches. Moreover, the Gaussian mechanism, as shown in Fig. 2, reaches close accuracy values for every ε value, even compared with the non-DP configuration. In general, an increase is noticeable in the first ten rounds in all cases, after which the accuracy stabilizes.

Furthermore, Fig. 2 shows the same information for the Laplace bounded domain and Laplace bounded noise mechanisms. In this case, as with the previous mechanism, the accuracy values for different ε values are very close, even if the accuracy level is reduced at round 30 for Laplace bounded noise and at round 20 for Laplace bounded domain, for every ε and the non-DP configuration. In the case of the uniform mechanism, it should be noted that the accuracy value is clearly reduced throughout the rounds.

Moreover, Fig. 3 shows the results obtained by using Fed+ as the aggregation algorithm. In this case, the evolution of the accuracy value is clearly ascending until round ten for every ε in all mechanisms. Compared with FedAvg, the final accuracy levels at the last round are slightly better, except for the uniform mechanism, which has a clearly higher accuracy level compared with FedAvg, since Fed+ behaves better with non-i.i.d. data, as in our case. According to Fig. 3, in most cases, the accuracy using our DP strategy outperforms the accuracy of the scenario without any perturbation. This could be explained by the fact that running stochastic gradient descent with noisy gradient estimates can help the performance of the model [17], as long as the noisy data are above some threshold. This phenomenon has also been observed in other state-of-the-art studies, which apply DP for FL in a more limited way than we do but obtain better accuracy when obfuscating data in certain cases [18].

Theoretically, the lower the ε value, the lower the final accuracy the model should reach, since the privacy factor increases and the weights are more obfuscated. However, as can be seen in Figs. 2 and 3, for almost all perturbation mechanisms and both aggregation algorithms, there is no big difference between the accuracy reached by a mechanism using the most restrictive (lower) ε value and the most relaxed (higher) one. Nevertheless, using FedAvg and some mechanisms, such as the truncated Laplace or Gaussian, it is clear that the more restrictive the ε, the lower the accuracy throughout the rounds.

Moreover, it should also be noted that the privacy enhancement achieved by each mechanism is given by the distance to a PCC value of 1, which would represent the classical FL scenario where no perturbation mechanism is applied to the data and, therefore, there is no obfuscation at all. Table IV gives the average PCC values for each perturbation mechanism and different ε values. It should be noted that, for all mechanisms, the lower the ε value, the lower the PCC. This indicates that lower ε values indeed achieve a more obfuscated set of weights and, therefore, a higher privacy factor. As can be seen, uniform is the mechanism that provides the most obfuscated set of weights and, consequently, the lowest PCC. Therefore, this mechanism provides the highest privacy factor compared with the other DP techniques analyzed in this article. Furthermore, the uniform mechanism also provides a similar accuracy level compared with the other DP techniques when using Fed+, as shown in Fig. 3. Consequently, it provides the best results considering the values of PCC and accuracy for the proposed scenario.

Lastly, Fig. 4 shows a box plot of the execution times for every mechanism. This plot is the result of ten different executions per mechanism for the same epsilon (ε = 1), except for the uniform mechanism, since it only accepts ε = 0. For each execution, the time spent in the perturbation process is measured, and the maximum, minimum, and average times are shown in each box of the plot.

TABLE IV: PEARSON CORRELATION COEFFICIENT

Fig. 4. DP execution times per mechanism.

VI. RELATED WORK

As discussed previously, despite the privacy advantages provided by FL, recent works have demonstrated that different attacks (e.g., inference) are still possible during the federated training process through access to the gradients/weights, which are uploaded by the FL clients to the coordinator [5]. Therefore, as mentioned in Section II, recent works have proposed the application of different privacy-preserving techniques, such as SMC and DP, to FL scenarios. In particular, a partial evaluation of DP techniques was carried out in [7], which implements the PrivacyFL simulator. Furthermore, Truex et al. [19] integrated SMC and DP to tackle inference attacks while maintaining an acceptable level of accuracy. The resulting approach is applied to several ML models, namely, decision trees, convolutional neural networks (CNNs), and support vector machines. However, these works do not compare the impact of different DP techniques considering their application to IDS approaches for IoT scenarios.

In the context of IoT, Briggs et al. [8] described the use of privacy-preserving techniques for FL and also provide a set of challenges for resource-constrained scenarios. Hu et al. [20] addressed the application of DP for FL in IoT, using activity recognition data from smartphones and personalized models on each device. Also based on the use of DP, Lu et al. [21] made use of blockchain so that the computation required for the consensus mechanism is also used for the federated training process. However, they did not provide evaluation results related to the DP technique being employed. Furthermore, Zhao et al. [22] assessed the impact on the accuracy of applying Gaussian noise as a DP technique in an FL environment. However, the evaluation is based on the well-known MNIST dataset, and they did not compare these results with other DP techniques. In addition, Hu et al. [23] considered an IoT scenario with resource constraints, where a relaxed version of DP is applied and evaluated considering different datasets.

While the previous approaches demonstrate the interest in the application of privacy-preserving techniques for FL, these aspects are not typically considered in the context of anomaly/intrusion detection. For example, Li et al. [24] proposed a fog architecture for DDoS attack detection and mitigation using FL. However, only DoS attacks are considered, and privacy techniques are not integrated. Furthermore, Liu et al. [25] used a CNN for anomaly detection in IIoT based on a gradient compression mechanism to reduce the communication overhead in FL. In addition, the use of FL was also considered by Khoa et al. [26] to implement an intrusion detection approach in Industry 4.0 scenarios. They compared their solution using different ML techniques and datasets, but privacy aspects were not considered. Furthermore, Attota et al. [27] used an ensemble approach and an optimization algorithm for feature extraction to come up with an IDS approach for IoT. For validation purposes, they used a dataset containing traffic of a single IoT protocol. Moreover, a federated version of CNN was recently used by Man et al. [28] for intrusion detection in IoT, intended to reduce communication overhead. Other works, such as [29], are based on old industrial datasets that do not consider recent attacks from IIoT scenarios. However, as in the previous cases, privacy-preserving techniques are not integrated. More closely related to our proposal, Chathoth et al. [30] recently proposed two DP-based continuous learning methods that consider heterogeneous privacy requirements for different clients in an IDS system. However, the approach is based on a non-IoT-specific dataset (CSE-CIC-IDS2018), and different DP techniques are not compared. In contrast to this approach, our work evaluates different DP techniques applied to the recent ToN_IoT dataset, which contains different types of attacks from IIoT environments, including network and sensor data manipulation attacks. To the best of our knowledge, this is the first effort that provides a comprehensive evaluation on the application of DP techniques for FL considering different aggregation techniques, in order to foster the development of a privacy-preserving FL-enabled IDS for IIoT.

VII. CONCLUSION

The development of ML-enabled IDS approaches has been based on the processing of devices' network traffic to detect potential attacks and threats. While FL was coined to avoid parties having to share their data, it still suffers from privacy issues associated with the communication of gradients/weights in each training round. To address this issue, this work provided an exhaustive evaluation of the use of DP techniques based on additive noise mechanisms, which are applied during the federated training process over the ToN_IoT dataset to come up with a privacy-preserving IDS for IIoT scenarios. We compared different noise addition techniques based on Gaussian and Laplacian distributions, and assessed the accuracy obtained using Fed+, an alternative aggregation function to FedAvg that has recently been proposed to deal with non-i.i.d. data distributions, which are prevalent in the real world. According to our evaluation results, the use of such DP techniques maintains an acceptable level of accuracy, which is even close to a non-DP scenario in the case of low privacy requirements (i.e., with a high ε value). In the case of Fed+, the impact of DP techniques on the accuracy is not perceptible. To the best of our knowledge, this work represents the first effort to provide a comprehensive evaluation of an FL-enabled IDS in IIoT considering different aggregation functions. As future work, we will analyze the development of a personalized FL approach where each device has different privacy requirements, as well as the use of gradient compression techniques for IIoT scenarios with network constraints.

REFERENCES

[1] A. Khraisat, I. Gondal, P. Vamplew, and J. Kamruzzaman, "Survey of intrusion detection systems: Techniques, datasets and challenges," Cybersecurity, vol. 2, no. 1, pp. 1–22, 2019.
[2] N. Garcia, T. Alcaniz, A. González-Vidal, J. B. Bernabe, D. Rivera, and A. Skarmeta, "Distributed real-time SlowDoS attacks detection over encrypted traffic using artificial intelligence," J. Netw. Comput. Appl., vol. 173, 2021, Art. no. 102871.
[3] Z. Ji, Z. C. Lipton, and C. Elkan, "Differential privacy and machine learning: A survey and review," 2014. [Online]. Available: https://arxiv.org/abs/1412.7584
[4] B. McMahan, E. Moore, D. Ramage, S. Hampson, and B. A. y Arcas, "Communication-efficient learning of deep networks from decentralized data," in Proc. 20th Int. Conf. Artif. Intell. Statist., 2017, pp. 1273–1282.
[5] V. Mothukuri, R. M. Parizi, S. Pouriyeh, Y. Huang, A. Dehghantanha, and G. Srivastava, "A survey on security and privacy of federated learning," Future Gener. Comput. Syst., vol. 115, pp. 619–640, 2021.
[6] O. Choudhury et al., "Anonymizing data for privacy-preserving federated learning," 2020. [Online]. Available: https://arxiv.org/abs/2002.09096
[7] V. Mugunthan, A. Peraire-Bueno, and L. Kagal, "PrivacyFL: A simulator for privacy-preserving and secure federated learning," in Proc. 29th ACM Int. Conf. Inf. Knowl. Manage., 2020, pp. 3085–3092.
[8] C. Briggs, Z. Fan, and P. Andras, "A review of privacy-preserving federated learning for the Internet-of-Things," in Federated Learning Systems. Cham, Switzerland: Springer, 2021, pp. 21–50.
[9] T. M. Booij, I. Chiscop, E. Meeuwissen, N. Moustafa, and F. T. den Hartog, "ToN_IoT: The role of heterogeneity and the need for standardization of features and attack types in IoT network intrusion datasets," IEEE Internet Things J., to be published, doi: 10.1109/JIOT.2021.3085194.
[10] P. Yu, L. Wynter, and S. H. Lim, "Fed+: A family of fusion algorithms for federated learning," 2020. [Online]. Available: https://arxiv.org/abs/2009.06303
[11] Q. Geng, W. Ding, R. Guo, and S. Kumar, "Privacy and utility tradeoff in approximate differential privacy," 2018. [Online]. Available: https://arxiv.org/abs/1810.00877
[12] B. Balle and Y.-X. Wang, "Improving the Gaussian mechanism for differential privacy: Analytical calibration and optimal denoising," in Proc. Int. Conf. Mach. Learn., 2018, pp. 394–403.
[13] N. Holohan, S. Antonatos, S. Braghin, and P. M. Aonghusa, "The bounded Laplace mechanism in differential privacy," 2018. [Online]. Available: https://arxiv.org/abs/1808.10410
[14] Y. Dagan and G. Kur, "A bounded-noise mechanism for differential privacy," 2020. [Online]. Available: https://arxiv.org/abs/2012.03817
[15] X. Li, K. Huang, W. Yang, S. Wang, and Z. Zhang, "On the convergence of FedAvg on non-IID data," 2019. [Online]. Available: https://arxiv.org/abs/1907.02189
[16] D. Böhning, "Multinomial logistic regression algorithm," Ann. Inst. Stat. Math., vol. 44, no. 1, pp. 197–200, 1992.
[17] S. Song, K. Chaudhuri, and A. Sarwate, "Learning from data with heterogeneous noise using SGD," in Proc. 18th Int. Conf. Artif. Intell. Statist., 2015, pp. 894–902.
[18] K. Wei et al., "Federated learning with differential privacy: Algorithms and performance analysis," IEEE Trans. Inf. Forensics Secur., vol. 15, pp. 3454–3469, Apr. 2020.
[19] S. Truex et al., "A hybrid approach to privacy-preserving federated learning," in Proc. 12th ACM Workshop Artif. Intell. Secur., 2019, pp. 1–11.
[20] R. Hu, Y. Guo, H. Li, Q. Pei, and Y. Gong, "Personalized federated learning with differential privacy," IEEE Internet Things J., vol. 7, no. 10, pp. 9530–9539, Oct. 2020.
[21] Y. Lu, X. Huang, Y. Dai, S. Maharjan, and Y. Zhang, "Blockchain and federated learning for privacy-preserved data sharing in industrial IoT," IEEE Trans. Ind. Informat., vol. 16, no. 6, pp. 4177–4186, Jun. 2020.
[22] B. Zhao, K. Fan, K. Yang, Z. Wang, H. Li, and Y. Yang, "Anonymous and privacy-preserving federated learning with industrial Big Data," IEEE Trans. Ind. Informat., vol. 17, no. 9, pp. 6314–6323, Sep. 2021.
[23] R. Hu, Y. Guo, E. P. Ratazzi, and Y. Gong, "Differentially private federated learning for resource-constrained Internet of Things," 2020. [Online]. Available: https://arxiv.org/abs/2003.12705
[24] J. Li, L. Lyu, X. Liu, X. Zhang, and X. Lv, "FLEAM: A federated learning empowered architecture to mitigate DDoS in industrial IoT," IEEE Trans. Ind. Informat., to be published, doi: 10.1109/TII.2021.3088938.
[25] Y. Liu et al., "Deep anomaly detection for time-series data in industrial IoT: A communication-efficient on-device federated learning approach," IEEE Internet Things J., vol. 8, no. 8, pp. 6348–6358, Apr. 2021.
[26] T. V. Khoa et al., "Collaborative learning model for cyberattack detection systems in IoT Industry 4.0," in Proc. IEEE Wireless Commun. Netw. Conf., 2020, pp. 1–6.
[27] D. C. Attota, V. Mothukuri, R. M. Parizi, and S. Pouriyeh, "An ensemble multi-view federated learning intrusion detection for IoT," IEEE Access, vol. 9, pp. 117734–117745, 2021.
[28] D. Man, F. Zeng, W. Yang, M. Yu, J. Lv, and Y. Wang, "Intelligent intrusion detection based on federated learning for edge-assisted Internet of Things," Secur. Commun. Netw., vol. 2021, 2021, Art. no. 9361348.
[29] V. Mothukuri, P. Khare, R. M. Parizi, S. Pouriyeh, A. Dehghantanha, and G. Srivastava, "Federated learning-based anomaly detection for IoT security attacks," IEEE Internet Things J., to be published, doi: 10.1109/JIOT.2021.3077803.
[30] A. K. Chathoth, A. Jagannatha, and S. Lee, "Federated intrusion detection for IoT with heterogeneous cohort privacy," 2021. [Online]. Available: https://arxiv.org/abs/2101.09878

Pedro Ruzafa-Alcázar received the degree in computer science from the University of Murcia, Murcia, Spain, in 2021. He became a scholarship Researcher in 2020. His research interests include cybersecurity, network architecture, and telematics services deployment.

Pablo Fernández-Saura received the B.Eng. and M.Sc. degrees in computer science from the University of Murcia, Murcia, Spain, in 2020 and 2021, respectively. He was a Researcher in several European projects, such as H2020 CyberSec4Europe and H2020 Inspire-5Gplus, with the University of Murcia. His research interests include cybersecurity, artificial intelligence, and 5G networks.

Enrique Mármol-Campos received the graduate degree in mathematics and the M.S. degree in advanced mathematics, with a specialty in operations research and statistics, from the University of Murcia, Murcia, Spain, in 2018 and 2019, respectively, where he is currently working toward the Ph.D. degree in federated learning, cybersecurity, IoT, and deep learning. His research interests include the application of federated learning in the cybersecurity of Internet of Things and Internet of Vehicles devices, finding new ways to optimize resources, securing the privacy of all parties involved, and the search for its use in real-time scenarios where these devices present many restrictions.

Aurora González-Vidal received the graduate degree in mathematics and the Ph.D. degree in computer science from the University of Murcia, Murcia, Spain, in 2014 and 2019, respectively. In 2015, she received a fellowship to work with the Statistical Division of the Research Support Service, where she specialized in statistics and data analysis. Afterward, she studied a Big Data Master. She is currently a Postdoctoral Researcher with the University of Murcia. She has collaborated in several national and European projects, such as ENTROPY, IoTCrawler, and DEMETER. Her research interests include machine learning in IoT-based environments, missing values imputation, and time-series segmentation. Dr. Vidal is the President of the R Users Association UMUR.

José L. Hernández-Ramos received the Ph.D. degree in computer science from the University of Murcia, Murcia, Spain, in 2016. He is currently a Scientific Project Officer with the Joint Research Centre, European Commission, Ispra, Italy. He has participated in different European research projects, such as SocIoTal, SMARTIE, and SerIoT. He was a Scientific Expert for the Agence Nationale de la Recherche, a TPC Member/Co-Chair of several conferences, and a Guest Editor and Reviewer of different journals. He has authored or coauthored more than 60 peer-reviewed papers. His research interests include the application of security and privacy mechanisms in the Internet of Things and transport systems scenarios, including blockchain and machine learning.

Jorge Bernal-Bernabe received the B.S. degree in computer science in 2007, the M.S. degree in telematics in 2008, the M.B.A. degree in 2009, and the Ph.D. degree in computer science in 2015, all from the University of Murcia, Murcia, Spain. He was a Visiting Researcher with Hewlett-Packard Laboratories, Palo Alto, CA, USA, and the University of the West of Scotland, Glasgow, U.K. He has authored more than 40 high-impact indexed journal papers and numerous conference papers. During the last years, he has been working in several European research projects, such as SocIoTal, ARIES, OLYMPUS, ANASTACIA, INSPIRE-5G, and CyberSec4EU. He is currently an Associate Professor with the University of Murcia. His research interests include security, trust, and privacy management in distributed systems and IoT.

Antonio F. Skarmeta (Member, IEEE) received the Ph.D. degree in computer science from the University of Murcia, Murcia, Spain, in 1995. He is currently a Full Professor with the Department of Information and Communications Engineering, University of Murcia. He has authored or coauthored more than 200 international papers. He has been a member of several program committees. His research interests include the integration of security services, identity, the Internet of Things, and smart cities.
