FSL: federated sequential learning-based cyberattack detection for Industrial Internet of Things

Li, Fangyu; Lin, Junnuo; Han, Honggui

doi:10.1007/s44244-023-00006-2

Research
Open access
Published: 17 March 2023

Volume 1, article number 4, (2023)

Industrial Artificial Intelligence

Abstract

Industrial Internet of Things (IIoT) brings revolutionary technical support to modern industries. However, today’s IIoT still faces the challenge of modeling varying time-series under data isolation while preserving data security. To accurately characterize industrial dynamics, we propose a possible solution based on federated sequential learning (FSL) with cyber attack detection capabilities. Under a federated framework, FSL constructs a collaborative global model without violating local data integrity. Taking advantage of local sequential modeling, FSL captures the intrinsic industrial time-series responses. Furthermore, data heterogeneity among distributed clients is also considered, which is important for maintaining robust but sensitive attack detection. Experiments on classic distributed datasets demonstrate that FSL is capable of accurately modeling the data heterogeneity caused by data isolation and the dynamics of time-series. Real IIoT attack detection experiments using a distributed testbed show that FSL provides better detection performance for industrial time-series sensory data than existing methods. Therefore, the proposed attack detection approach FSL is promising in real IIoT scenarios in terms of feasibility, robustness and accuracy.

1 Introduction

The world is making a major push towards a new industrial revolution aimed at effectively automating production systems, such as Industry 4.0 [1]. Industrial Internet of Things (IIoT) application environments typically integrate a variety of technologies, such as embedded devices, cloud computing and machine-to-machine communication, in a closed industrial environment [2], as shown in Fig. 1. IoT-connected devices are now expanding quickly in quantity, with recent estimates suggesting that there will be around 70 billion internet-connected devices by 2025. A network of linked intelligent devices is known as the Internet of Things (IoT), a sub-type of which is IIoT. IoT links all ‘Things’ to the internet, enabling them to gather data through sensors and to assign tasks to the devices to which they are linked [3]. IoT devices mostly consist of processors, sensors, actuators and communication hardware, and can process and send the data they obtain [4]. Today, smart factories are combining industrial artificial intelligence (IAI) with IIoT for purposes such as data-driven predictive analytics and collaborative modeling in industrial production conditions [5], but challenges still exist.

There are issues in the combination of IAI and IIoT. For example, in real industrial environments, data are not centrally distributed. Gathering large amounts of data may cause serious information privacy threats, such as malicious eavesdropping or even malicious attacks, resulting in insufficient security in the process of training IAI models [6, 7]. Furthermore, real industrial systems contain many nodes and datasets at once, which makes them challenging for traditional deployments to handle effectively [8, 9]. Therefore, to achieve and support an effective combination of IAI and IIoT, a suitable learning framework is needed.

Different edge devices in industrial applications generate skewed local data distributions, also known as non-independent and identically distributed (non-i.i.d.) data. Non-i.i.d. characteristics may severely impair the performance and convergence rate of trained neural networks, as the data collected at each endpoint do not follow a constant distribution. To achieve a performance comparable to centralized solutions, the traditional approach updates the model frequently by increasing the number of communication rounds, resulting in degraded overall performance of the global model and lower accuracy [10]. In particular, there are complex temporal dependencies between different events and terminals, leading to heterogeneity in the continuous IIoT data collection [11]. Furthermore, statistical heterogeneity exists not only in the distribution of the collected data, but also in the specific circumstances of different nodes, which results in node heterogeneity. In real IIoT applications, different nodes are subject to different types of cyber attacks, which also induces node heterogeneity and makes the global model less accurate. Therefore, IIoT security is important, as IIoT is vulnerable to attacks at all layers of the architecture [12]. However, IIoT devices often have limitations (e.g., power, storage capacity and computing power) that make IIoT security difficult to ensure [13]. At the same time, the relatively independent and widely heterogeneous nature of IIoT devices makes it difficult to construct security guarantees [14]. The security of data is therefore one of the core concerns of IIoT [15].

Recently, federated learning (FL), which integrates IAI with IIoT, has been a promising approach for the development of smart factories (e.g., IIoT data offloading and caching, IIoT mobile crowd sensing, etc.) [16]. Rather than requiring all endpoints to upload raw data, FL constructs a global model by aggregating sub-models trained on each endpoint with its own dataset [17]. FL aims to involve each endpoint in the construction of the model while preventing private data leakage [18], thus contributing to IIoT in terms of data privacy, cost of communication between devices, etc. [19]. FL was originally designed for edge devices: the endpoints collaborate to train models without exchanging local raw data, and the sub-models trained on the endpoints are then combined into a global model. McMahan et al. [20] proposed federated averaging (FedAvg), a standard algorithm that obtains local models by performing stochastic gradient descent (SGD) at each terminal. The global model is then obtained as a weighted average of the local weight parameters, with the weight of each terminal proportional to the size of its local dataset. Wang et al. [21] proposed the FedMA algorithm, which matches hidden units with similar feature extraction signatures to each other and averages them to create a shared global model. Li et al. [22] proposed the FedProx algorithm, which is a generalization and re-parameterization of FedAvg. FedProx introduces a proximal term, calculated from the $\ell_2$-norm distance between the weight parameters of the current global model and those of the local model. It therefore works better with non-i.i.d. data, as the proximal term limits how far a local update can move away from the global model. Karimireddy et al. [23] proposed the SCAFFOLD algorithm, which is an improvement on FedAvg. The SCAFFOLD algorithm uses control variates to correct the ‘client-drift’ problem in its local updates. It has also been shown that SCAFFOLD requires significantly less communication and is not affected by data heterogeneity or client sampling. Chen et al. [24] proposed an asynchronous federated learning model and designed a lightweight node selection algorithm. The method is demonstrated empirically to perform well in various independent and identically distributed (i.i.d.) and non-i.i.d. cases. Ouyang et al. [25] proposed ClusterFL, a novel multitasking federated learning framework for clusters, which automatically captures the intrinsic clustering relationships between data from different nodes and improves convergence speed and model accuracy by eliminating the slower converging nodes in each cluster.

Researchers have proposed algorithms to detect cyber attacks on IoT systems [7], and deep learning frameworks have been used to improve intrusion detection. Ioannou et al. [26] proposed to use SVM models to identify cyber attacks and to perform multi-class classification of them. The proposed C-SVM detection model showed good performance with a high classification accuracy. Zhang et al. [27] proposed an intrusion detection algorithm based on a deep belief network (DBN) and a genetic algorithm (GA), which improves the detection accuracy of DBN intrusion detection models by adaptively generating the number of hidden layers and neurons over several iterations. Li et al. [28] proposed a fused multi-convolutional neural network approach and used it to detect anomalies in the IoT. It was evaluated on an IoT dataset, and the results show that the algorithm classifies attack types with high accuracy. Hassan et al. [29] proposed an algorithm that uses a weight-dropped long short-term memory (WDLSTM) network to maintain the long-term dependency of extracted features. They developed a deep learning model using convolutional neural networks (CNNs) to extract useful features from time-series data of intrusion detection systems (IDSs). The superiority of this algorithm has also been experimentally validated. Xin et al. [30] proposed S-TCN, an improved multi-classification network based on the temporal convolutional network (TCN), and experimentally demonstrated its effectiveness in handling temporally heterogeneous data in IoT scenarios. However, all of the above algorithms lack effective methods for handling the time-series data with node heterogeneity present in real IIoT, apply poorly to high-dimensional time-series data, and have not been experimentally validated on time-series data with node heterogeneity in real IIoT.

To improve the security of IIoT, many researchers have carried out relevant research. One of the most classical approaches is the detection of attacks using classification. Ge et al. [31] proposed an IoT cyber attack detection model based on deep learning, which uses feed-forward networks to classify various cyber attacks. According to their experiments, the proposed method is more accurate than traditional support vector machine (SVM) based intrusion detection networks. Aamir et al. [32] proposed a semi-supervised intrusion detection model, which is based on principal component analysis and random forest clustering techniques and is mainly used to identify DDoS attacks. Hara et al. [33] proposed an autoencoder-based intrusion detection model using semi-supervised learning. Simulation experiments on an IIoT dataset found the model to be more accurate than an intrusion detection model using deep neural networks. McDermott et al. [34] proposed a bi-directional network framework based on long short-term memory (LSTM) to identify botnets in data. The framework was also experimentally verified to have high accuracy. Pacheco et al. [35] proposed a machine learning (ML) based intrusion detection system with high accuracy for cyber attack intrusion detection in IIoT. The above studies do not take into account the different attacks and different levels of attacks suffered by different nodes. These may lead to differences in the distributions of the nodes’ local data, causing heterogeneity in the nodes’ data and lowering the accuracy of the trained model.

In our study, we propose federated sequential learning (FSL) based attack detection for IIoT networks in the FL framework. FSL is a TCN-based attack detection algorithm that detects and classifies attacks in time-series industrial datasets with node heterogeneity, effectively extracting temporal features from time-series IIoT data. The following are our main contributions:

We propose the FSL framework for the detection of cyber attacks in real-world IIoT, where FSL extracts time-series features from the sensory signals of real industrial applications while reducing the degradation of model accuracy due to node heterogeneity.
We propose a TCN-based local model to characterize a single client’s time-series dataset and extract time-series features from the one-dimensional signals to improve the model’s classification accuracy against cyber attacks.
We propose a FedProx-based federated strategy to handle the low global model accuracy problem due to node heterogeneity. FSL trains sub-models on each endpoint and performs model aggregation with the uploaded sub-models to generate a global model.
We built a real testbed to simulate an industrial environment in our experiments. Raspberry Pi is used as an endpoint to process both traditional and IIoT datasets for our experimental tests with good results, demonstrating the feasibility of our proposed approach.

The remainder of the paper is organized as follows. We present the problem to address in Sect. 2: IIoT is vulnerable to cyberattacks, which can cause anomalies in the data and affect the efficiency of industrial production. Then, in Sect. 3, we describe the main principles of FSL. Afterwards, we conduct experiments in Sect. 4 to show that the proposed framework is effective for IIoT security. We discuss the experiments and the test platform in Sect. 5, leaving some clues for our future work. We conclude our study in Sect. 6 with a summary and comments on potential research areas.

2 Problem formulation

In this section, we discuss the problem of cyber attack detection in today’s IIoT. When the FL framework is applied at the IIoT level, IoT applications, communication layers and cloud servers are vulnerable to attacks. A real-world IIoT scenario may suffer from a lack of data centralization, poor device scalability and cyber attacks. In particular, IIoT is vulnerable to cyber attacks, resulting in reduced productivity, as shown in Fig. 2. In a realistic IIoT scenario, various nodes may be subject to different types and degrees of cyber attacks. These cyber attacks may result in different distributions of the transmitted data and heterogeneity of the data distribution across nodes, as shown in Fig. 3. We need to process the time-series data collected by the individual terminals and build a cyber attack detection model. The detection model determines whether the collected industrial data have been subjected to a cyber attack. If a terminal is under cyber attack, the detection model also needs to classify the type of attack.

Suppose there are I devices, and each device i holds a private dataset $\mathcal {D}_i=\left\{ \mathcal {X}_i, \mathcal {Y}_i\right\}$ that will not be uploaded, where $\mathcal {X}_i$ is a one-dimensional time-series dataset and $\mathcal {Y}_i$ is the corresponding set of labels. Each $\mathcal {D}_i$ is used for local supervised model training. Training minimizes the loss function, so the local objective is:

$$\begin{aligned} \min _{W_i} L_s\left( \mathcal {D}_i, \mathcal {M}_i, W_i\right) , \end{aligned}$$

(1)

where $L_s$ is the loss function for the model task, whose arguments are the labeled training data ${\mathcal{D}}_i$, the model structure ${\mathcal{M}}_i$ and the weight parameters $W_i$.

The FL framework aims to minimize the loss of the cyber attack classification model over all I devices. The model can discriminate whether the collected data are anomalous and can classify the type of attack on the anomalous data. Thus, the overall optimization objective, which weights the local loss of the ith device by its share of the data, can be written as

$$\begin{aligned} \min _{W_1,\ldots ,W_I} \sum _{i=1}^I \frac{|\mathcal {D}_i|}{|\cup _{j}\mathcal {D}_j|} L_s\left( \mathcal {D}_i, \mathcal {M}_i, W_i\right) . \end{aligned}$$

(2)
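
As a small worked illustration of the weighting in Eq. (2), the sketch below combines per-device losses using weights proportional to local dataset size. The dataset sizes and loss values are hypothetical, and the sketch assumes the local datasets are disjoint, so that $|\cup _j \mathcal {D}_j|$ equals the sum of the local sizes.

```python
from typing import List


def weighted_global_loss(local_losses: List[float], sizes: List[int]) -> float:
    """Size-weighted sum of local losses, in the spirit of Eq. (2).

    Assumes disjoint local datasets, so the union size is sum(sizes).
    """
    total = sum(sizes)
    return sum((n / total) * loss for loss, n in zip(local_losses, sizes))


# Hypothetical values: three devices with different amounts of data.
sizes = [1000, 3000, 6000]
local_losses = [0.9, 0.5, 0.2]
loss = weighted_global_loss(local_losses, sizes)
print(loss)
```

Under this weighting, a device holding more data contributes proportionally more to the objective, which is why heterogeneous nodes with large local datasets can dominate the global model.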

Cyber attack detection in IIoT can be implemented as a multi-class classification task on the collected data, which requires the construction of a classification model. In this task, the type of cyber attack is decided by probability. The output of the final layer of the model is processed with a softmax function, which normalizes the output values into probabilities, and the category with the highest probability is taken as the type of cyber attack detected by the model. For the ith device, the classification result can be described as follows:

$$\begin{aligned} \begin{aligned} \text{IIoT}\;\text{Status}&=\text {softmax}\left( P\left( Y=c_k \mid X=x\right) \right) \\&=\text {softmax}\left( \frac{p\left( X=x \mid Y=c_k\right) P\left( Y=c_k\right) }{\sum _{k'} p\left( X=x \mid Y=c_{k'}\right) P\left( Y=c_{k'}\right) }\right) , \end{aligned} \end{aligned}$$

(3)

where $c_k$ denotes the different types of cyber attacks ($c_0$ indicates the normal state) and k indexes the attack types.
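
The decision rule described above can be sketched as follows. This is a minimal illustration rather than the authors' implementation: the class order (with index 0 as the normal state), the helper names and the example scores are all assumptions.

```python
import math
from typing import List, Tuple

# Assumed class order: c_0 is the normal state, followed by the six attack
# types used later in the paper. The ordering itself is hypothetical.
CLASS_NAMES = ["Normal", "DDoS", "VSA", "SI", "UA", "BA", "PBA"]


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax: subtract the maximum before exponentiating."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def classify(scores: List[float]) -> Tuple[str, float]:
    """Return the most probable class name and its probability."""
    probs = softmax(scores)
    k = max(range(len(probs)), key=probs.__getitem__)
    return CLASS_NAMES[k], probs[k]


# Hypothetical final-layer outputs for one time-series sample.
example_scores = [0.2, 3.1, 0.5, -1.0, 0.0, 0.3, -0.4]
label, prob = classify(example_scores)
print(label, round(prob, 3))
```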

3 Methodology

In this section, we propose the FSL framework to address two IIoT challenges: extracting time-series features and handling data heterogeneity across nodes. A generic IIoT architecture based on federated learning is shown in Fig. 4. Different types of edge industrial devices train a local model using local datasets and communicate with the cloud server for model gradient upload to obtain a global model. This global model is transferred to each edge industrial device for local model updates.

3.1 Local time-series modeling

To construct distributed local models, we use a TCN, which efficiently performs convolutional operations in its temporal convolutional layers to extract features of the time-series signal across time steps. In the one-dimensional TCN, the output of one layer feeds into the input of the following layer. To enable the TCN to better extract the temporal features of a one-dimensional time-series signal, we use dilated convolution. Dilated convolution captures global information about the whole sequence, so that each output point is constructed from most of the points of the whole time-series. The dilation factor d is the time gap skipped between the inputs selected at each layer. The gap varies from layer to layer, increasing with the depth of the network, and depends on the frequency at which the sensor collects the signal. At the same time, larger dilation factors enlarge the receptive field.

The TCN used in our study has four layers with dilation factors $d=[1,2,4]$. The output $X_l$ of each layer is expressed in the following equation:

$$\begin{aligned} X_l= \,& {} \text {ReLU} \left( \sum _{j=0}^{k-1} w^l_i \otimes x_{s-d \cdot j} \right) , \end{aligned}$$

(4)

$$\begin{aligned} w_{i}= \,& {} w_{i-1} - \eta \sum \frac{\partial L}{\partial w_{i-1}}, \end{aligned}$$

(5)

where d denotes the dilation factor, l denotes the layer index, $w^l_i$ denotes the filters of the lth layer in the ith device, $\otimes$ denotes the one-dimensional temporal convolution, $s - d\cdot j$ points in the past direction, $x_{s-d \cdot j}$ denotes the time-series signal of the previous layer and k denotes the filter size. L denotes the loss function and $\eta$ denotes the learning rate. Equation (5) is the gradient descent update of the filter weights, so its subscript indexes the update step. The weights $w_i$ of the ith device consist of all $w^l_i$, and each device uploads the $w_i$ of its local model whenever communication takes place.
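
To make the dilated convolution in Eq. (4) concrete, the following pure-Python sketch stacks three dilated causal convolutions with $d=[1,2,4]$. It is an illustration under stated assumptions, not the paper's network: the filter size of 2, the filter values, the zero-padding of past samples and the use of one convolution per dilation level are all hypothetical choices, and each filter tap is given its own weight, as is usual for a TCN.

```python
from typing import List


def dilated_causal_conv1d(x: List[float], w: List[float], d: int) -> List[float]:
    """y[s] = sum_{j=0}^{k-1} w[j] * x[s - d*j], treating x[t] as 0 for t < 0."""
    k = len(w)
    y = []
    for s in range(len(x)):
        acc = 0.0
        for j in range(k):
            t = s - d * j  # look d*j steps into the past
            if t >= 0:
                acc += w[j] * x[t]
        y.append(acc)
    return y


def relu(v: List[float]) -> List[float]:
    return [max(0.0, a) for a in v]


def receptive_field(kernel_size: int, dilations: List[int]) -> int:
    """Number of input steps seen by one output of a stack of dilated convs."""
    return 1 + (kernel_size - 1) * sum(dilations)


# Hypothetical one-dimensional sensor signal and filter.
signal = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
h = signal
for d in [1, 2, 4]:
    h = relu(dilated_causal_conv1d(h, [0.5, 0.5], d))

print(h)
print(receptive_field(2, [1, 2, 4]))
```

With a filter size of 2, the three dilation levels already give each output a receptive field of 8 time steps, which illustrates why the dilation factor grows with depth.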

3.2 Heterogeneous node-federated modeling

When the local model training is completed, each client uploads the weight parameters of its locally trained model to the cloud, where the server aggregates the model parameters. Because different clients experience different types and levels of cyber attacks, the distributions of the collected data differ, creating node heterogeneity across clients.

To cope with the node heterogeneity, we leverage FedProx for model aggregation, which addresses the differences in communication and computing power between devices, as well as the non-i.i.d. data across devices. FedProx is a generalization of FedAvg that improves the local update by adding a proximal term. This term penalizes the distance between the local weight parameters and the global model weights of the previous round, so that the local update does not deviate too much from the global model. The improved objective function $h_i$ to be optimized is as follows:

$$\begin{aligned} \min _{w_i} h_i\left( w _\text{global}; w_i\right) =L_i(w_i)+\frac{\mu }{2}\left\| w_\text{global}-w_i\right\| ^2, \end{aligned}$$

(6)

where $w_\text{global}$ and $w_i$ are the weights of the global model and the local model, respectively, $L_i(\cdot )$ is the loss function for the ith device, and $\mu$ is the coefficient that weights the proximal term.
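
The effect of the proximal term in Eq. (6) can be seen on a toy problem. The sketch below is only an illustration: it replaces the TCN loss with a hypothetical quadratic loss $L_i(w)=\frac{1}{2}\Vert w-\text{target}\Vert ^2$, and the function names, learning rate and step count are assumptions. The gradient of the proximal term is $\mu (w_i-w_\text{global})$, which is simply added to the loss gradient.

```python
from typing import Callable, List


def fedprox_local_update(
    w_global: List[float],
    grad_loss: Callable[[List[float]], List[float]],
    mu: float,
    lr: float,
    steps: int,
) -> List[float]:
    """Gradient descent on h_i(w) = L_i(w) + (mu / 2) * ||w_global - w||^2."""
    w = list(w_global)  # local training starts from the global model
    for _ in range(steps):
        g = grad_loss(w)
        w = [
            wj - lr * (gj + mu * (wj - wgj))  # loss gradient + proximal gradient
            for wj, gj, wgj in zip(w, g, w_global)
        ]
    return w


# Hypothetical local loss with its minimiser at `target`.
target = [4.0, -2.0]


def grad(w: List[float]) -> List[float]:
    return [wj - tj for wj, tj in zip(w, target)]


w_global = [0.0, 0.0]
w_plain = fedprox_local_update(w_global, grad, mu=0.0, lr=0.1, steps=200)
w_prox = fedprox_local_update(w_global, grad, mu=1.0, lr=0.1, steps=200)
print(w_plain, w_prox)
```

With $\mu =0$ the update reduces to plain local gradient descent and reaches the local optimum, whereas with $\mu =1$ the local model settles halfway between the local optimum and the global model, showing how the term limits client drift.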

3.3 Federated sequential learning

To perform attack detection on time-series datasets from real industrial data, we use a TCN model as the local model in the FL framework. Each device trains the local TCN weight parameters $w_i$ on its local dataset $D_i$ under the FedProx objective with the proximal term, and the uploaded local models are then aggregated to update the global model $M_\text{global}$. The process is repeated until convergence as shown in Algorithm 1. The whole FSL approach is demonstrated in Fig. 5.
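
The overall training loop can be sketched as follows. This is not a reproduction of Algorithm 1: to stay short and runnable, the TCN is replaced by a hypothetical scalar model with a squared-error loss, aggregation is assumed to be a dataset-size-weighted average, and the values of $\mu$ and the learning rate are arbitrary. The round and epoch counts follow the settings reported in Sect. 4.3.

```python
from typing import List


def local_train(w_global: float, data: List[float], mu: float, lr: float, epochs: int) -> float:
    """Local SGD on 0.5 * (w - x)^2 plus the proximal term (mu / 2) * (w - w_global)^2."""
    w = w_global
    for _ in range(epochs):
        for x in data:
            grad = (w - x) + mu * (w - w_global)
            w -= lr * grad
    return w


def fsl_rounds(
    clients: List[List[float]],
    rounds: int = 50,
    epochs: int = 2,
    mu: float = 0.1,
    lr: float = 0.05,
) -> float:
    """Alternate local training on every client with server-side aggregation."""
    w_global = 0.0
    total = sum(len(d) for d in clients)
    for _ in range(rounds):
        # Each client trains locally, starting from the current global model.
        local_models = [local_train(w_global, d, mu, lr, epochs) for d in clients]
        # The server aggregates the uploaded models, weighted by local data size.
        w_global = sum(len(d) / total * w for d, w in zip(clients, local_models))
    return w_global


# Hypothetical heterogeneous clients: each sees a different data distribution.
clients = [[1.0, 1.2, 0.8], [3.0, 3.1, 2.9, 3.0], [5.0, 4.8, 5.2]]
w_final = fsl_rounds(clients)
print(w_final)
```

Although each client alone would converge to its own local mean, the aggregated model settles near the mean of the pooled data, which is the behaviour the federated strategy is meant to provide under node heterogeneity.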

4 Experiments

In this section, we conduct a number of experiments on our self-built testbed to evaluate the performance of the proposed FSL algorithm. The datasets include MNIST, Bearing Defect Detection (BDD) and Edge-IIoTset. We concentrate on evaluating the effectiveness of the FSL algorithm after training on these datasets, which shows whether FSL is more advantageous when training on time-series data with node heterogeneity. To recreate a real industrial environment, we built a real testbed to conduct the experiments.

4.1 Testbed setups

Our testbed consists of a server and four clients, as shown in Fig. 6. Figure 7 is a diagram of the testbed, where the server uses a 12th generation Intel Core i7-1260P with MX550 2GB GDDR6 discrete graphics and the clients are four Raspberry Pi 4B boards with an ARM Cortex-A72 (quad-core, 1.5 GHz) and a 500 MHz VideoCore VI. The server and clients communicate over a wireless local area network (WLAN) built in the lab.

4.2 Dataset

MNIST [36], Bearings Defect Detection [37] are typical datasets for validating the accuracy of deep learning networks, while Edge-IIoTset is an IIoT dataset subject to cyber attacks [38]. Detailed information on the datasets can be found in Table 1.

Table 1 Detailed information about the data sets

MNIST

MNIST is a grey-scale image dataset of handwritten digits 0–9 from 250 writers, with every image 28$\times$28 in size. It includes 70,000 data samples, divided into training and test sets in a 6:1 ratio. The dataset is processed so that each sample is a one-dimensional sequence of length 28$\times$28 = 784.

Bearings defect detection (BDD)

Each sample in the bearing defect detection dataset contains 6000 time points. The bearing condition is represented by the numbers 0–9, where 0 indicates normal. There are three types of bearing faults, including two faults in the bearing rings and one fault in the bearing balls. There are three different diameters of bearings and therefore nine types of fault conditions.

Edge-IIoTset

Edge-IIoTset, generated by Ferrag et al. [38], is a real-world cybersecurity dataset for IoT and IIoT applications. The dataset is created using a dedicated IIoT testbed with a large number of representative devices, sensors, protocols, and cloud/edge setups, and cyber attacks are added to the generated data. To adequately assess the viability of our proposed strategy, we selected 40,000 normal and 33,000 anomalous samples and divided them into training, testing and validation sets in a 7:2:1 ratio. The Edge-IIoTset data types include:

Normal: This type of data is not subject to a cyber attack and therefore does not contain any outliers.
DDoS: This type of attack causes data anomalies due to the number of requests from the attacker, which prevents the target server or network resources from working properly.
Vulnerability scanning attack (VSA): This type of attack is typically conducted by an attacker looking for potential network entry points.
SQL injection (SI): This type of attack exploits a security flaw in the application communicating with the database, thereby corrupting the dataset.
Uploading attack (UA): This type of attack involves uploading malicious program files to a web server, gaining administrative privileges and thus conducting a cyber attack on the data.
Backdoor attack (BA): This type of attack allows an attacker to conduct a cyber attack by exploiting a vulnerability in a system, thereby providing unauthorized remote access to an infected IoT device.
Password breaching attack (PBA): This type of attack involves breaking passwords by trying successive combinations to attack the dataset.
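
As a sketch of how the 7:2:1 partition of the 73,000 selected samples could be produced, the snippet below shuffles sample indices and slices them into training, testing and validation subsets. The function name and the fixed seed are hypothetical, and whether the original split was stratified by attack type is not stated in the text, so a plain random split is assumed here.

```python
import random
from typing import List, Sequence, Tuple


def split_indices(
    n: int, ratios: Sequence[int] = (7, 2, 1), seed: int = 0
) -> Tuple[List[int], List[int], List[int]]:
    """Shuffle indices 0..n-1 and split them into train/test/validation parts."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)  # seeded for reproducibility
    total = sum(ratios)
    n_train = n * ratios[0] // total
    n_test = n * ratios[1] // total
    train = indices[:n_train]
    test = indices[n_train:n_train + n_test]
    valid = indices[n_train + n_test:]  # remainder goes to validation
    return train, test, valid


# 40,000 normal + 33,000 anomalous samples, as selected in the experiments.
train, test, valid = split_indices(40_000 + 33_000)
print(len(train), len(test), len(valid))
```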

4.3 Comparative models

To objectively evaluate the local model performance, we compare our FSL model with classic local models. We train a CNN model and an LSTM model and compare them with the proposed FSL. The CNN model is a 4-layer 1D convolutional network with two fully connected layers, two pooling layers and a softmax output layer; the LSTM model has 4 hidden layers and 2 fully connected layers.

In the FL framework, we set the number of local training epochs to 2. After each client completes its local training, it communicates with the server to upload the weight parameters, which are aggregated into the global model. We set the number of communication rounds to 50.

4.4 Evaluation metrics

We employ accuracy, precision, recall, and F1 score to evaluate the performance of the CNN, LSTM and FSL models when processing test data from MNIST, BDD and Edge-IIoTset. The definitions are as follows:

$$\begin{aligned} \text {Accuracy }= \,& {} \frac{\text{TP}+\text{TN}}{\text{TP}+\text{FP}+\text{TN}+\text{FN}}, \end{aligned}$$

(7)

$$\begin{aligned} \text {Precision }= \,& {} \frac{\text{TP}}{\text{TP}+\text{FP}}, \end{aligned}$$

(8)

$$\begin{aligned} \text { Recall }= \,& {} \frac{\text{TP}}{\text{TP}+\text{FN}}, \end{aligned}$$

(9)

$$\begin{aligned} \text {F1}\, \text {score}\,= \,& {} 2 \times \frac{\text{Precision} \times \text{Recall}}{\text{Precision}+\text{Recall}}, \end{aligned}$$

(10)

where true positives (TP) are the positive samples that the model correctly identifies as positive; true negatives (TN) are the negative samples that the model correctly identifies as negative; false positives (FP) are the negative samples that the model incorrectly identifies as positive; and false negatives (FN) are the positive samples that the model misclassifies as negative. TP + FP is the total number of samples predicted as positive, and TP + FN is the total number of positive samples in the original dataset.
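
For a multi-class task such as attack-type classification, Eqs. (7)–(10) are typically evaluated per class by treating one class as positive and all others as negative. The sketch below assumes this one-vs-rest convention, which the text does not state explicitly; the labels and predictions are hypothetical.

```python
from typing import Dict, List


def one_vs_rest_metrics(y_true: List[str], y_pred: List[str], positive: str) -> Dict[str, float]:
    """Accuracy, precision, recall and F1 with `positive` as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Hypothetical ground truth and model predictions for eight samples.
y_true = ["Normal", "Normal", "DDoS", "DDoS", "DDoS", "VSA", "VSA", "Normal"]
y_pred = ["Normal", "DDoS", "DDoS", "DDoS", "Normal", "Normal", "VSA", "Normal"]
metrics = one_vs_rest_metrics(y_true, y_pred, positive="DDoS")
print(metrics)
```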

4.5 Results analysis

We trained CNN, LSTM and FSL on our testbed using MNIST, BDD, and Edge-IIoTset, as shown in Figs. 8, 9, 10. The model accuracies gradually improved and the losses gradually decreased over the training rounds. We applied the trained CNN, LSTM, and FSL models to MNIST, the industrial dataset BDD and Edge-IIoTset, and obtained the results shown in Tables 2, 3, 4. Next, we discuss the experimental results in detail.

Table 2 Classification performance of models based on MNIST

On the MNIST dataset, both the FSL-trained model and the CNN-trained model achieved accuracy above 0.8 on the test set, with the CNN model reaching a maximum of 0.91 and the FSL model a maximum of 0.82, while the LSTM-trained model had an accuracy of up to 0.78. As can be seen from the trend plots of the loss functions in Fig. 8, the loss functions of the FSL and CNN-trained models converged approximately equally well, whereas the convergence of the LSTM model was relatively poor and slow. According to Table 2, since the LSTM and FSL models are mainly designed to process data with temporal correlation, they do not handle non-time-series datasets such as MNIST as well as a CNN model of similar structure. The CNN model is considered a better choice for this classification task, and the results confirm that it performs much better. The outputs of the LSTM and FSL models depend on the entire input sequence, as both have memory capabilities, and their sensitivity to temporal structure leads to poor classification results on MNIST data, which have no time-series properties.

Table 3 Classification performance of models based on BDD

On the BDD dataset, both the FSL-trained model and the LSTM-trained model achieved an accuracy of 0.9 or higher on the test set. In addition, the loss function of the FSL-trained model converged faster than that of the LSTM-trained model. The CNN-trained model had better classification results on the MNIST dataset, but performed poorly on the BDD dataset. Meanwhile, the loss function of the CNN-trained model converged poorly, fluctuating around 0.5, as shown in Fig. 9 and Table 3. The BDD dataset is a bearing vibration dataset, i.e., an industrial time-series dataset, so its samples have strong temporal correlation and the LSTM and FSL trained models achieve high classification accuracy. Moreover, since the LSTM and FSL models can remember temporal features, they greatly improve the classification accuracy, and their classification results are better than those of the CNN-trained models.

Table 4 Classification performance of models based on Edge-IIoTset

On the Edge-IIoTset dataset, the FSL-trained model had the highest test accuracy of 0.95, while the LSTM model reached an accuracy of 0.81 and the CNN model a highest accuracy of 0.78. Meanwhile, the FSL loss function converged faster and gave the best results, while the loss function convergence of the LSTM and CNN models was basically the same, as shown in Fig. 10. According to Table 4, the CNN, LSTM and FSL models generally have higher classification accuracy for normal signals. If we simply solve the binary classification problem of determining the presence or absence of a cyber attack, all three models have good recognition results. However, when classifying the specific type of cyber attack, the classification accuracy differs between attack types. In particular, Table 4 shows that for VSA, UA and BA, detection and classification are poor, and the models tend to confuse VSA and UA with normal signals. This is because these two types of attacks have less impact on the time-series signal, which makes them harder for the models to detect. Nevertheless, the overall results show that the models trained with FSL are better at detecting cyber attacks than those trained with LSTM and CNN.

5 Discussion

In the experiments on MNIST, the classification accuracy of the models trained with CNN, LSTM and FSL was relatively low, especially for the LSTM model with its memory module and the FSL model, neither of which performed as well as the traditional CNN model. This is due to the data pre-processing applied to MNIST, which converts every multi-layer image sample into one dimension and thereby loses structural information. Moreover, MNIST is an image dataset, so the data itself is not a time-series. By contrast, the BDD and Edge-IIoTset datasets are industrial datasets that contain temporal information, which allows the LSTM and FSL models, and the FSL model in particular, to perform better. FSL achieves an accuracy of 0.92 on BDD and 0.95 on Edge-IIoTset, which demonstrates the efficacy of our proposed technique in identifying samples of cyber attacks.
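The loss of structural information caused by flattening can be made concrete with a short Python sketch. It assumes row-major flattening of a 28 × 28 MNIST image; the text does not state the exact pre-processing order, and `flat_index` is a hypothetical helper introduced only for illustration.

```python
ROWS, COLS = 28, 28  # MNIST image size


def flat_index(row, col, cols=COLS):
    """Position of pixel (row, col) after row-major flattening (assumed order)."""
    return row * cols + col


# Horizontally adjacent pixels stay neighbours in the flattened sequence...
horizontal_gap = flat_index(10, 11) - flat_index(10, 10)

# ...but vertically adjacent pixels end up a full image row apart.
vertical_gap = flat_index(11, 10) - flat_index(10, 10)

print(horizontal_gap, vertical_gap)
```

Under this assumption, a sequential model sees two pixels that touch vertically in the image as elements separated by an entire row of the sequence, so the two-dimensional neighbourhood that a CNN exploits directly is no longer local.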

Real industrial environments face many types of cyber attacks, and we considered only six of the more common ones. Data on more attack types can be collected in the future to extend the experiments. In addition, only four Raspberry Pi 4B devices were used as clients, so the heterogeneity of the nodes in the experiments was not pronounced; the experimental platform can be improved in the future.

6 Conclusions

IIoT brings revolutionary technical support to modern industries. In this work, we proposed a possible solution based on federated sequence learning (FSL) that constructs global models of varying time-series under data isolation and is capable of detecting cyber attacks. Experiments on the MNIST, BDD and Edge-IIoTset datasets in a distributed testbed verify that FSL provides better detection performance for industrial time-series sensory data than traditional methods. Thus, our proposed attack detection method FSL is promising in terms of feasibility, robustness, and accuracy in practical IIoT scenarios. FSL can be applied to the detection of cyber attacks on one-dimensional signals in the real IIoT. In the future, we can study the detection of unknown attacks and broaden the scope of application of attack detection systems.

Availability of data and materials

The data that support the findings of this study are available from the corresponding author, F.L., upon reasonable request.

Funding

This work was supported by National Key Research and Development Project under Grants 2022YFB3305800-5 and 2018YFC1900800-5, National Science Foundation of China under Grants 62125301, 61890930-5, 61903010, and 62021003, Beijing Outstanding Young Scientist Program under Grant BJJWZYJH01201910005020, Beijing Natural Science Foundation under Grant KZ202110005009, and Beijing Youth Scholar under Grant No. 037.

Author information

Authors and Affiliations

Faculty of Information Technology, Beijing University of Technology, Pingleyuan No. 100, Beijing, 100124, Beijing, China
Fangyu Li, Junnuo Lin & Honggui Han

Contributions

F.L. and J.L. wrote the main manuscript text and prepared figures. All authors reviewed the manuscript. All authors read and approved the final manuscript.

Corresponding author

Correspondence to Fangyu Li.

Ethics declarations

Competing interests

The authors declare that the authors have no competing interests as defined by Springer, or other interests that might be perceived to influence the results and/or discussion reported in this paper.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

About this article

Cite this article

Li, F., Lin, J. & Han, H. FSL: federated sequential learning-based cyberattack detection for Industrial Internet of Things. Industrial Artificial Intelligence 1, 4 (2023). https://doi.org/10.1007/s44244-023-00006-2

Received: 31 December 2022
Accepted: 29 January 2023
Published: 17 March 2023
DOI: https://doi.org/10.1007/s44244-023-00006-2
