Shrink AutoEncoder For Federated Learning-Based IoT Anomaly Detection
2022 9th NAFOSTED Conference on Information and Computer Science (NICS) | 978-1-6654-5422-3/22/$31.00 ©2022 IEEE | DOI: 10.1109/NICS56915.2022.10013475

Abstract—Federated Learning (FL)-based anomaly detection is a promising framework for Internet of Things (IoT) security. Abnormal data is scarce, so unsupervised deep neural network models, such as variations of the AutoEncoder (AE), are considered effective solutions for anomaly detection in IoT devices. These models construct low-dimensional representations of the input data, and these representations are then used for classification. Nevertheless, the number of IoT devices is enormous, the devices are intrinsically heterogeneous, and FL training is distributed. As a result, the latent representations of the local data are scattered randomly, and a global anomaly score can no longer be determined accurately. To address this issue, this work provides an effective FL-based IoT anomaly detection framework with novel AutoEncoder models, namely the Federated Shrink AutoEncoder (FedSAE). The proposed model forces the normal data of IoT devices to lie near the origin. Thus, a universal (global) anomaly score can be determined accurately for all IoT devices. Extensive experiments on the N-BaIoT dataset indicate that FedSAE may reduce the false detection rate by 1.84% compared with AE-based FL frameworks for IoT anomaly detection.

Index Terms—Shrink AutoEncoder, federated learning, anomaly detection, IoT.

I. INTRODUCTION

The Internet of Things (IoT) is a network of billions of devices that execute independent tasks with minimal human intervention. These devices often have limited battery and computing power, which makes them more vulnerable to malicious attacks. A number of deep neural network (DNN) models have recently been developed to detect anomalous behaviors in IoT networks, e.g., [1]–[4]. Building a DNN model requires a huge amount of data generated by IoT devices, including both normal and abnormal data. However, abnormal data is usually hard to collect. Thus, unsupervised DNN models trained only on normal data are preferred for IoT anomaly detection. Among the various DNN models, AutoEncoder (AE)-based models are considered the most effective solutions for IoT anomaly detection [5]–[10].

As mentioned above, using DNN models for IoT anomaly detection usually requires massive training data at a central server [5]–[10]. However, collecting training data from IoT devices and sending it to a central server can compromise the privacy of the devices. This may discourage IoT devices from sharing their raw data for model training. To address this issue, Federated Learning (FL) frameworks have recently been introduced as a highly effective way to train DNN models on distributed training data [11]–[15]. Among these works, the FedDetect model proposed in [15] can be considered one of the most effective solutions for IoT anomaly detection. Specifically, FedDetect estimates the global anomaly score¹ as the median of the mean squared errors (MSEs) received from all training clients (IoT devices). However, the normal data collected by different IoT devices differ, and these differences produce significantly different MSEs at the training clients. As shown in Fig. 1(a), the representations of the normal data of different IoT devices differ, which leads to different anomaly scores. Thus, the global anomaly score estimated by FedDetect is less effective.

¹ The score is used to separate the normal and abnormal data [15].

Given the above, this work proposes an effective FL-based IoT anomaly detection framework with novel AE models, namely the Federated Shrink AutoEncoder (FedSAE). Specifically, we design the FL framework around the Shrink AutoEncoder (SAE) [5]. SAE forces all normal IoT data into a region near the origin, so a universal (global) anomaly score can be defined for all IoT devices. Fig. 1(b) shows that when the normal data from different IoT devices are forced closer to the origin, the global anomaly score (the green circle) can be estimated more effectively. Thus, FL based on SAE, i.e., Federated SAE (FedSAE), can handle the problem of the global anomaly score in FL frameworks. The major contributions of this paper are as follows:

• We propose using SAE for FL (i.e., FedSAE) to handle the problem of the global anomaly score for FL-based IoT anomaly detection.
• We conduct extensive experiments showing that FedSAE can enhance the accuracy of IoT anomaly detection. Specifically, FedSAE can reduce the false detection rate by 1.84% compared with conventional AE-based FL frameworks.

The rest of the paper is organized as follows. Section II reviews DNN-based anomaly detection techniques and FL architectures. Section III presents the fundamental background of our proposed method. The proposed model is presented in detail in Section IV. Section V describes the dataset and experimental scenarios. The results and discussion of the experiments are presented in Section VI. Finally, Section VII concludes the paper and outlines future work.

II. RELATED WORK

Due to the speed and volume of data generated by IoT devices, DNNs have recently been widely adopted for anomaly detection [16]–[18].
Authorized licensed use limited to: East China Univ of Science and Tech. Downloaded on March 26,2023 at 13:10:46 UTC from IEEE Xplore. Restrictions apply.
Fig. 1. Two-dimensional representations of randomly selected normal and abnormal samples from the N-BaIoT dataset [7], produced by (a) FedDetect [15] and (b) FedSAE.
AE-based DNN models are widely used in anomaly detection due to their simplicity, flexibility across multiple data formats, and many variants [5], [6], [8], [9]. The work [8] proposed a novel AE-based architecture for time-series anomaly detection that combines a Convolutional Neural Network (CNN) and Long Short-Term Memory (LSTM). Its two-stage sliding-window preprocessing can be regarded as an extraction of low-level temporal features. However, these techniques are only suitable for univariate time-series data. Moreover, [5] examined AE-based representation learning models for anomaly detection. The results showed that AE-based models can improve detection accuracy compared with other methods. However, the performance of these techniques depends heavily on anomaly scores. These scores are selected manually from the loss value of the AE, i.e., the Mean Squared Error (MSE). In this paper, we overcome this issue by calculating the anomaly score from multiple MSE values.

FL has recently attracted considerable interest as a way to preserve the data privacy of DNN models. It can address drawbacks of centralized training systems such as latency, communication bandwidth, and data privacy [11]–[15]. The work [11] proposed sparse ternary compression (STC), a communication protocol that uses sparsification, ternarization, error accumulation, and optimal Golomb encoding to compress both upstream and downstream data. However, STC compresses models after the local training phase, so the quantization process is not optimized during training. The work [12] introduced an asynchronous client-side learning technique and a server-side temporally weighted aggregation of the local models. Furthermore, DeepFed [13] is a federated scheme that uses a novel gated recurrent unit (GRU)-CNN model to detect threats against industrial cyber-physical systems. In [14], the authors introduced an on-device, communication-efficient FL-based architecture for deep anomaly detection. In that architecture, an Attention Mechanism-based Convolutional Neural Network-Long Short-Term Memory (AMCNN-LSTM) model detects anomalies. In practice, these FL architectures usually need to collect anomaly data for learning. However, anomaly data is not always available, especially for data collected from IoT devices. To handle this, an AE-based FL architecture, namely FedDetect, was shown to be an effective method for IoT anomaly detection [15]. This architecture leverages an AE to learn the representation of normal IoT data. The anomaly score is then set globally based on the MSEs of the AE models from all clients. However, each client trains on different data collected from different IoT devices, so setting a single global anomaly score for all clients is difficult. We address this problem by proposing a model that forces latent representations around the origin, thus facilitating a more accurate calculation of the global anomaly score.

III. BACKGROUND

In this section, we present the fundamental background of AE and FL, which are the key components of our proposed framework.

A. AutoEncoder

An AutoEncoder (AE) [19] is a type of artificial neural network designed for unsupervised learning. It is composed of two major components: an encoder En and a decoder De. The encoder generates a low-dimensional representation of an input sample $x^i$ (i.e., a latent representation $z^i$). The decoder reconstructs the data $\hat{x}^i$ from the latent representation space so as to minimize the loss function in Eq. (3). We use $\phi = (W_e, b_e)$ to denote the parameter set of the encoder, where $W_e$ and $b_e$ are the weight matrix and the bias vector, respectively. Assume the latent representation $z^i$ of the input sample $x^i$ has the general form

$$ z^i = q_\phi(x^i) = \sigma_e(W_e x^i + b_e), \tag{1} $$

where $q_\phi$ is the encoder and $\sigma_e$ is the activation function of the encoder. Similarly, $\theta = (W_d, b_d)$, $p_\theta$, and $\sigma_d$ represent the parameter set, the decoder, and the activation function of the decoder, respectively. The decoder $p_\theta$ maps the latent representation $z^i$ to the reconstruction $\hat{x}^i$:

$$ \hat{x}^i = p_\theta(z^i) = \sigma_d(W_d z^i + b_d). \tag{2} $$
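The following pure-Python sketch runs Eqs. (1) and (2) as a single encoder-decoder pass on one sample. It also computes the per-sample squared reconstruction error. The choice of tanh for $\sigma_e$, the identity for $\sigma_d$, the helper names, and the toy weights are illustrative assumptions and are not taken from this paper.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def affine(W: Matrix, x: Vector, b: Vector) -> Vector:
    """Compute W x + b for a row-major weight matrix W."""
    return [sum(w_ij * x_j for w_ij, x_j in zip(row, x)) + b_i
            for row, b_i in zip(W, b)]


def encode(x: Vector, W_e: Matrix, b_e: Vector) -> Vector:
    """Eq. (1): z = sigma_e(W_e x + b_e); tanh is an assumed choice of sigma_e."""
    return [math.tanh(v) for v in affine(W_e, x, b_e)]


def decode(z: Vector, W_d: Matrix, b_d: Vector) -> Vector:
    """Eq. (2): x_hat = sigma_d(W_d z + b_d); sigma_d is assumed to be the identity."""
    return affine(W_d, z, b_d)


def squared_error(x: Vector, x_hat: Vector) -> float:
    """Per-sample ||x - x_hat||^2, the quantity averaged over samples in an MSE loss."""
    return sum((a - b) ** 2 for a, b in zip(x, x_hat))


# Hypothetical toy weights: a 3-dimensional input compressed to a 2-dimensional code.
W_e = [[0.5, -0.2, 0.1], [0.0, 0.3, -0.4]]
b_e = [0.0, 0.0]
W_d = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
b_d = [0.0, 0.0, 0.0]

x = [1.0, 0.0, 0.0]
z = encode(x, W_e, b_e)          # latent representation z^i
x_hat = decode(z, W_d, b_d)      # reconstruction x_hat^i
err = squared_error(x, x_hat)    # reconstruction error for this sample
```

A trained AE would learn $W_e, b_e, W_d, b_d$ by minimizing the averaged error. The sketch only shows the forward computation.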
The AE is trained to achieve a minimal reconstruction error. The reconstruction error of an AE is often estimated as the mean squared error (MSE) averaged across all data samples, as follows:

$$ \mathcal{L}_{AE}(x, \phi, \theta) = \frac{1}{n}\sum_{i=1}^{n} \lVert x^i - p_\theta(q_\phi(x^i)) \rVert^2, \tag{3} $$

where $n$ is the number of data samples in the dataset.

In anomaly detection, AEs are commonly trained on normal data samples only [5]. The training objective of AEs is to minimize the loss function in Eq. (3). Moreover, the MSE value is usually used to set the anomaly score for the anomaly detection problem [5], [15].

B. AutoEncoder Variations

AEs with too many hidden layers can learn to simply copy their inputs without extracting essential information. The Denoising AE (DAE) [5], [20] is a regularized AE designed to learn more robust features and generalize more effectively. DAEs are trained to reconstruct the original input from a stochastically corrupted (noise-added) input. The loss function of DAEs is reformulated as follows:

$$ \mathcal{L}_{DAE}(x, \phi, \theta) = \sum_{i=1}^{n} \mathbb{E}_{p(\tilde{x} \mid x^i)} \lVert x^i - p_\theta(q_\phi(\tilde{x})) \rVert^2, \tag{4} $$

where $\tilde{x}$ is the corrupted version of $x^i$ drawn from $p(\tilde{x} \mid x^i)$. The term $\mathbb{E}_{p(\tilde{x} \mid x^i)}$ is the expectation of the reconstruction loss at $x^i$ over a number of corrupted samples $\tilde{x}$. There are several techniques to corrupt the input, such as Gaussian noise and salt-and-pepper noise, but the most popular is randomly masking input features to zero.

A Variational AE (VAE) [5], [21] is an AE whose distribution of encodings is regularized during training. This regularization guarantees that the latent space has good properties that enable the generation of new data. Unlike a conventional AE, a VAE encodes each input as a distribution over the latent space rather than as a single point, which introduces some regularization into the latent space. The loss function of the VAE consists of two terms. The first is the reconstruction term on the final layer, which makes the encoding-decoding scheme as efficient as possible. The second is the regularization term on the latent layer. It organizes the latent space by pushing the encoder's output distribution closer to the standard normal distribution. This term is expressed as the Kullback-Leibler divergence [22] between the distribution returned by the encoder and a Gaussian distribution.

The Shrink AutoEncoder (SAE) was introduced by Loi et al. [5]. It is a variant of the AE that adds a new regularizer to the AE loss function. This regularizer forces the latent representation of an AE toward the origin. During training, only normal data is used, and it is mapped close to the origin of the latent representation space. After training, normal data lies around the origin, while abnormal data is mapped to regions far from the origin. Because the normal data is forced toward the origin, the normal and abnormal data of multiple data types can easily be distinguished in the latent representation by the same anomaly score. This property suits FL, where a central AE-based model is aggregated from multiple AE-based models. Thus, we have developed a new FL framework based on SAE for the anomaly detection problem.

C. Federated Learning

Federated Learning (FL) is a promising distributed machine learning paradigm for privacy preservation [23]. A shared global model is trained by a federation of participating devices under the control of a central server. FL enables mobile devices to collaborate on training a shared model while keeping all training data on the device. This protects personal data and minimizes system latency.

The FL process consists of three phases: initialization, local model training and updating, and global model aggregation and updating [24]. During the initialization phase, the server broadcasts an initialized global model to all clients. Next, each client trains the model and updates the model parameters locally using its private dataset. At the end of this phase, the clients send the updated model parameters to the server. During the aggregation phase, the server aggregates the local models and then broadcasts the updated global model back to the clients. There are several ways to aggregate client models; however, averaging has been widely used in the literature [12], [14].

IV. PROPOSED METHOD

This section presents our proposed FL architecture, FedSAE, for IoT anomaly detection.

Algorithm 1 Training FedSAE.
1: Server: Initialize $w_s$,
2: Clients: Initialize $w_k$ ($k = 1, \dots, K$),
3: for round $< T$ do
4:     for each client $k = 1$ to $K$ do
5:         Train the SAE model,
6:         Update the weight $w_k$ at client $k$,
7:         Send $w_k$ and $MSE_k$ to the Server,
8:     end for
9: end for
10: Server: Calculate $w_s = \frac{1}{K}\sum_{k=1}^{K} w_k$,
11: Server: Collect $MSE_s = \{MSE_1, MSE_2, \dots, MSE_K\}$,
12: Server: Calculate $th_{global} = \overline{MSE_s} + \alpha \cdot \sigma(MSE_s)$,
13: Server: Send the new model weight $w_s$ and the anomaly score $th_{global}$ to all the Clients.

The Shrink AutoEncoder (SAE) introduced in [5] has been shown to be an effective unsupervised AE-based model for anomaly detection. SAE is trained on normal network traffic data and uses an anomaly score to distinguish between normal and abnormal network traffic. The loss function of SAE aims to map normal data samples close to the origin and can be expressed as follows:
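The sketch below gives a rough illustration of this objective. It assumes that the shrink regularizer of [5] is the mean squared norm of the latent codes, added to the reconstruction error of Eq. (3). The weight `lam` is a hypothetical coefficient. This is an assumed form for illustration, not necessarily the exact loss used in FedSAE.

```python
from typing import List, Sequence

Vector = Sequence[float]


def sq_norm(v: Vector) -> float:
    """Squared Euclidean norm ||v||^2."""
    return sum(c * c for c in v)


def sae_loss(xs: List[Vector], x_hats: List[Vector], zs: List[Vector],
             lam: float = 1.0) -> float:
    """Assumed SAE-style objective.

    reconstruction: (1/n) * sum_i ||x^i - x_hat^i||^2   (as in Eq. (3))
    shrink term:    (1/n) * sum_i ||z^i||^2              (assumed regularizer)
    `lam` is a hypothetical weight balancing the two terms.
    """
    n = len(xs)
    if n == 0 or not (len(x_hats) == len(zs) == n):
        raise ValueError("xs, x_hats and zs must be non-empty and equally long")
    recon = sum(sq_norm([a - b for a, b in zip(x, xh)])
                for x, xh in zip(xs, x_hats)) / n
    shrink = sum(sq_norm(z) for z in zs) / n
    return recon + lam * shrink


# Same inputs and reconstructions, but latent codes near vs. far from the origin.
xs = [[1.0, 0.0], [0.0, 1.0]]
x_hats = [[0.9, 0.1], [0.0, 0.8]]
zs_near = [[0.1, 0.0], [0.0, 0.1]]
zs_far = [[2.0, 0.0], [0.0, 2.0]]
loss_near = sae_loss(xs, x_hats, zs_near)
loss_far = sae_loss(xs, x_hats, zs_far)
```

The reconstructions are identical in both cases, so the codes far from the origin incur the larger loss. Under this assumed form, that extra cost is what pulls normal training data toward the origin.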
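The server-side steps 10 to 12 of Algorithm 1 can be sketched as shown below. The sketch makes four assumptions:

- Each client's weights $w_k$ are a flat list of floats.
- The bar in step 12 denotes the mean of the client MSEs.
- $\sigma(\cdot)$ denotes their population standard deviation.
- `alpha` takes a hypothetical value.

For contrast, the sketch also computes the median of the client MSEs, which is how the Introduction describes FedDetect's global score [15].

```python
import statistics
from typing import List, Sequence


def aggregate_weights(client_weights: Sequence[Sequence[float]]) -> List[float]:
    """Step 10: w_s = (1/K) * sum_k w_k, element-wise over flattened weights."""
    K = len(client_weights)
    if K == 0:
        raise ValueError("need at least one client")
    dim = len(client_weights[0])
    if any(len(w) != dim for w in client_weights):
        raise ValueError("all clients must send weight vectors of equal length")
    return [sum(col) / K for col in zip(*client_weights)]


def fedsae_threshold(client_mses: Sequence[float], alpha: float) -> float:
    """Step 12 (assumed reading): th_global = mean(MSEs) + alpha * std(MSEs)."""
    return statistics.fmean(client_mses) + alpha * statistics.pstdev(client_mses)


def feddetect_threshold(client_mses: Sequence[float]) -> float:
    """Global score as described for FedDetect [15]: the median of client MSEs."""
    return statistics.median(client_mses)


def is_anomalous(sample_mse: float, th_global: float) -> bool:
    """Flag a sample whose reconstruction error exceeds the global score."""
    return sample_mse > th_global


# Hypothetical values from K = 3 clients (weights) and K = 4 clients (MSEs).
w_s = aggregate_weights([[0.2, 0.4], [0.4, 0.0], [0.6, 0.2]])
client_mses = [0.01, 0.02, 0.03, 0.10]
th_sae = fedsae_threshold(client_mses, alpha=1.0)
th_median = feddetect_threshold(client_mses)
flags = (is_anomalous(0.05, th_sae), is_anomalous(0.05, th_median))
```

With these toy values, the two rules disagree about a sample whose MSE is 0.05. The choice of rule for the global score therefore matters when client MSEs are spread out.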
[Figure: FedSAE architecture. The server (Global Model) calculates the weight ws and the global anomaly score thglobal; clients 1 to K send (wk, MSEk), and the server returns (ws, thglobal) to all clients.]

TABLE I
PERFORMANCE COMPARISON BETWEEN FL FRAMEWORKS ON THE N-BAIOT DATASET.

Model | FPR (%) | TNR (%) | Precision (%)
FedVAE | 100 | 0 | 50
FedDAE | 5.81 | 94.19 | 94.19
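The sketch below computes the metrics in Table I, assuming the standard confusion-matrix definitions with anomalous traffic as the positive class. Under these definitions TNR = 1 − FPR, which is consistent with FPR and TNR summing to 100% in each row of Table I. The `Confusion` class and the counts are hypothetical illustrations, not results from the paper.

```python
from dataclasses import dataclass


@dataclass
class Confusion:
    """Confusion-matrix counts; anomalous samples are the positive class (assumption)."""
    tp: int
    fp: int
    tn: int
    fn: int

    def fpr(self) -> float:
        """False positive rate: share of normal samples flagged as anomalous."""
        return self.fp / (self.fp + self.tn)

    def tnr(self) -> float:
        """True negative rate (specificity); equals 1 - FPR."""
        return self.tn / (self.tn + self.fp)

    def precision(self) -> float:
        """Share of flagged samples that are truly anomalous."""
        return self.tp / (self.tp + self.fp)


# Hypothetical detector that flags every sample as anomalous
# on a balanced test set of 100 normal and 100 anomalous samples.
flag_all = Confusion(tp=100, fp=100, tn=0, fn=0)
```

This degenerate detector yields FPR = 1, TNR = 0, and precision = 0.5. That is the same (100, 0, 50) pattern as the FedVAE row when expressed in percent.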
[Figure: Accuracy (%) comparison; bar labels 97.09, 98.88, 99.07]

[14] Y. Liu, S. Garg, J. Nie, Y. Zhang, Z. Xiong, J. Kang, and M. S. Hossain, "Deep anomaly detection for time-series data in industrial IoT: A communication-efficient on-device federated learning approach," IEEE Internet of Things Journal, vol. 8, no. 8, pp. 6348–6358, 2021.