1030 IEEE TRANSACTIONS ON INDUSTRIAL INFORMATICS, VOL. 19, NO. 1, JANUARY 2023

Trustworthy and Reliable Deep-Learning-Based Cyberattack Detection in Industrial IoT
Fazlullah Khan , Senior Member, IEEE, Ryan Alturki , Senior Member, IEEE,
Md Arafatur Rahman , Senior Member, IEEE, Spyridon Mastorakis , Member, IEEE,
Imran Razzak , Senior Member, IEEE, and Syed Tauhidullah Shah

Abstract—A fundamental expectation of the stakeholders from the Industrial Internet of Things (IIoT) is its trustworthiness and sustainability to avoid the loss of human lives in performing a critical task. A trustworthy IIoT-enabled network encompasses fundamental security characteristics, such as trust, privacy, security, reliability, resilience, and safety. The traditional security mechanisms and procedures are insufficient to protect these networks owing to protocol differences, limited update options, and older adaptations of the security mechanisms. As a result, these networks require novel approaches to increase trust-level and enhance security and privacy mechanisms. Therefore, in this article, we propose a novel approach to improve the trustworthiness of IIoT-enabled networks. We propose an accurate and reliable supervisory control and data acquisition (SCADA) network-based cyberattack detection in these networks. The proposed scheme combines the deep-learning-based pyramidal recurrent units (PRU) and decision tree (DT) with SCADA-based IIoT networks. We also use an ensemble-learning method to detect cyberattacks in SCADA-based IIoT networks. The nonlinear learning ability of PRU and the ensemble DT address the sensitivity of irrelevant features, allowing high detection rates. The proposed scheme is evaluated on 15 datasets generated from SCADA-based networks. The experimental results show that the proposed scheme outperforms traditional methods and machine learning-based detection approaches. The proposed scheme improves the security and associated measure of trustworthiness in IIoT-enabled networks.

Index Terms—Cybersecurity, data acquisition networks, deep learning, Industrial Internet of Things (IIoT), supervisory control, trustworthiness.

Manuscript received 7 December 2021; revised 17 March 2022, 19 May 2022, and 27 June 2022; accepted 2 July 2022. Date of publication 13 July 2022; date of current version 8 November 2022. This work was supported in part by the National Science Foundation under Awards CNS-2104700, CNS-2016714, and CBET-2124918, in part by the National Institutes of Health under Award NIGMS/P20GM109090, the University of Nebraska Collaboration Initiative, and in part by the Nebraska Tobacco Settlement Biomedical Research Development Funds. Paper no. TII-21-5431. (Corresponding authors: Fazlullah Khan; Ryan Alturki; Md Arafatur Rahman.)
Fazlullah Khan is with the Department of Computer Science, Abdul Wali Khan University Mardan, Mardan 23200, Pakistan (e-mail: fazlullah@awkum.edu.pk).
Ryan Alturki is with the Department of Information Science, College of Computer and Information Systems, Umm Al-Qura University, Makkah 24382, Saudi Arabia (e-mail: [email protected]).
Md Arafatur Rahman is with the School of Mathematics and Computer Science, University of Wolverhampton, WV1 1LY Wolverhampton, U.K. (e-mail: [email protected]).
Spyridon Mastorakis is with the Department of Computer Science, University of Nebraska, Omaha, NE 68182 USA (e-mail: smas-[email protected]).
Imran Razzak is with the School of Computer Science and Engineering, Faculty of Engineering, University of New South Wales Sydney, Sydney, NSW 2052, Australia (e-mail: [email protected]).
Syed Tauhidullah Shah is with the Department of Software Engineering, University of Calgary, Calgary, AB T2N 1N4, Canada (e-mail: [email protected]).
Color versions of one or more figures in this article are available at https://doi.org/10.1109/TII.2022.3190352.
Digital Object Identifier 10.1109/TII.2022.3190352
1551-3203 © 2022 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See https://www.ieee.org/publications/rights/index.html for more information.

I. INTRODUCTION
THE Industrial Internet of Things (IIoT) is a pervasive network that connects a diverse set of smart appliances in the industrial environment to deliver various intelligent services. In IIoT networks, a significant amount of industrial control systems (ICSs) premised on supervisory control and data acquisition (SCADA) are linked to the corporate network through the Internet [1]. Typically, these SCADA-based IIoT networks consist of a large number of field devices [2], for instance, intelligent electronic devices, sensors, and actuators, connected to an enterprise network via heterogeneous communications [3]. This integration provides the industrial networks and systems with supervision and a lot of flexibility and agility [2]–[4], resulting in greater production and resource efficiency. On the other hand, this integration exposes SCADA-based IIoT networks to serious security threats and vulnerabilities, posing a significant danger to these networks and the trustworthiness of the systems [5]. The trustworthiness of an IIoT-enabled system ensures that it performs as expected while meeting a variety of security requirements, including trust, security, safety, reliability, resilience, and privacy [6]–[8]. Fig. 1 depicts the fundamental aspects of trustworthiness in an IIoT-enabled network. The basic goal of the IIoT-enabled system is to increase trustworthiness by safeguarding identities, data, and services, and therefore to secure SCADA-based IIoT networks from cybercriminals [8], [9].
Several protocol updates have been proposed to meet this purpose, including the distributed network protocol (DNP 3.0) [10]. However, it covers authentication and data integrity aspects only, leaving numerous holes for attackers to use known flaws like hash collision to carry out serious attacks [11]. Information Technology and Industrial Operational technology bodies build a typical risk management plan utilizing ISO 27005:2018 [10]
to recognize, rank, and implement alleviation techniques in automated or semiautomated enterprises. A comprehensive risk management plan and adequate preventive measures may not ensure absolute security against growing risks and attacks. This consequently offers a difficult research challenge for industrial and cybersecurity control researchers to 1) obtain the maximum degree of attack detection, 2) report malicious behavior as soon as it appears, and 3) isolate the afflicted subsystems as soon as possible. In recent years, there has been a surge toward the utility of artificial intelligence (AI) methods in evolving cybersecurity approaches, including attack prediction [12], privacy preservation [13], forensic exploration [14], and malware disclosure [15]. Deep learning (DL) is an AI approach that incorporates better learning models with considerable success in various disciplines [16]. However, designing a reliable and trustworthy AI, particularly a DL-based cyberattack detection model for the IIoT platforms, remains a research problem.

Fig. 1. Security and trustworthiness goals and CIA triad.

By considering the limitations of previous techniques, we employ network attributes of industrial protocols and propose a pyramidal recurrent unit (PRUs)- and decision tree (DT)-based ensemble detection mechanism. The proposed mechanism has the potential to detect cyberattacks in any extensive industrial network. The interoperability with other detection engines and expandability for a wider industrial network with multiple areas distinguishes the proposed mechanism from previous studies. The proposed detection method is disseminable across many IIoT domains. Furthermore, our model is straightforward to implement and deploy and can improve efficiency and accuracy while overcoming the shortcomings of previous efforts. The following capabilities can characterize the novelty and contribution of our article.
1) We propose a scalable and efficient DL- and DT-based ensemble cyber-attack detection framework to resolve trustworthiness issues in the SCADA-based IIoT networks.
2) We present an efficient probing approach by the SCADA-based network data to solve the protocol mismatch limitations of traditional security solutions for the IIoT platform.
3) A statistical analytic approach for ensuring the trustworthiness and reliability of the proposed model for SCADA-based IIoT networks.
The rest of the article is organized as follows. In Section II, we have discussed the basics of problem formulation. In Section III, we have given details of our proposed work, followed by the results and discussion in Section IV. Finally, Section V concludes this article.

II. PRELIMINARIES AND METHODS
In this article, we follow the real-world settings [17] of cyberattacks on an ICS. Through these settings, we leverage the datasets from the power control system [18] for detecting industrial cyberattacks. Fig. 2 illustrates the overall architecture of a SCADA-based industrial control network. It is made up of various layers, including a processing and central master control layer, a physical layer, and a corporate layer, all of which are formed in a hierarchical order.

Fig. 2. SCADA-based industrial IoT network.

¹ https://sites.google.com/a/uah.edu/tommy-morris-uah/ics-data-sets

A. Datasets
The physical layer, as indicated in Fig. 2, contains various equipment such as breakers (BR1−BR4), intelligent electronic devices, power generators (G1, G2), and programmable logic controllers. The lowest physical layer collects sensor-based data and is used by the local control logic to make control decisions before transmitting it to the devices. They also get instructions from the top or master control/process layers, which also are responsible for managing and keeping track of the remote physical devices and local control layer devices. They are also equipped with intrusion detection systems (IDS). The corporate layer aids business operations and launches management declarations to the master control layer. In this article, we adopt the 15 benchmark datasets obtained from the SCADA power system¹ to identify and detect different kinds of attacks. The intrusion attacks on the SCADA system are detected using two separate classification events. The binary classification events, comprising 37 events, are divided into 28 attacks and 9 normal events. The other is the multiclass classification events, encompassing


37 different events, such as natural events, regular events, and attack events, each with its own set of class labels.
Each of the 15 datasets has thousands of distinct attacks. The datasets are randomly sampled at 1% to decrease the influence of a small sample size. Accordingly, there are 3711 attack-event samples, 1221 natural-event samples, and 294 no-event samples.

B. Problem Formulation
Assume a dataset $D = \{(x_1, y_1), \dots, (x_n, y_n)\}$ with training examples, where $x_i$ indicates a vector of real or discrete values. Further, these values represent the features of vector $x_i$, expressed as $x_{i1}, x_{i2}, x_{i3}, \dots, x_{im}$, where $x_{ij}$ represents the $j$th feature of any given vector $x_i$. In contrast, the values of $y_i$ are of dual nature. One type indicates binary classification, while the other consists of classes $\{1, \dots, K\}$, representing multiclassification. Different from that, the second type includes real values, representing regression. In a nutshell, given a training dataset $D$ with $E$ examples, the goal is to train a learning algorithm, which can produce a classifier output $T$. The classifier $T$ indicates a hypothesis in the means of a true function, expressed as $f(x_i) = y_i$, that predicts new values for $y_i$ for every given value of $x_i$.

III. PROPOSED MODEL

A. PRU Models
Deep PRUs [19] are deep learning models used to manipulate sequential data. Fig. 3 provides an overview of the structure of a PRU cell. The PRU comprises several cells, each with three major layers: 1) the forget gate, 2) input gate, and 3) output gate. Also, PRU applies the pyramidal transformation to the input vector and uses a grouped linear transformation (GLT) to the context vector. Then, they combine them under the umbrella of PRU and feed it as input to the LSTM cell.

Fig. 3. PRU network.

1) Pyramidal Transformation (PR) for Input: Instead of linearly transforming a given input vector $x$ to an output vector $y$ as $y = F_L(x) = W \cdot x$, where $W \in \mathbb{R}^{N \times M}$ is the weight matrix ($x \in \mathbb{R}^N$ to $y \in \mathbb{R}^M$), PR subsamples it into $K$ pyramidal levels to obtain various representations with different scales. The subsampling propagates $K$ vectors as

$$ x_k \in \mathbb{R}^{\frac{N}{2^{k-1}}} \tag{1} $$

where $2^{k-1}$ denotes the sampling rate and $k = \{1, \dots, K\}$. For each $k = \{1, \dots, K\}$, the PR learns a scale-specific transformation as

$$ W_k \in \mathbb{R}^{\frac{N}{2^{k-1}} \times \frac{M}{K}}. \tag{2} $$

Then, PR concatenates the transformed subsamples to get the pyramidal output $y \in \mathbb{R}^M$ as

$$ y = F_P(x) = \left[ W_1 \cdot x_1, \dots, W_K \cdot x_K \right] \tag{3} $$

where $[\cdot, \cdot]$ denotes the concatenation operation. Given an input vector $x$, PR subsamples it using a kernel $\kappa$ with $2e + 1$ elements as

$$ x_k = \sum_{i=1}^{N/s} \sum_{j=-e}^{e} x_{k-1}[si] \, \kappa[j] \tag{4} $$

where $s$ denotes the stride operation while $k = \{2, \dots, K\}$.
2) Grouped Linear Transformation: GLT breaks down the traditional linear transformation through factoring in two parts. First, given the input vector $h \in \mathbb{R}^N$, GLT splits it into $g$ smaller groups as

$$ h = \left[ h_1, \dots, h_g \right] \quad \forall h_i \in \mathbb{R}^{\frac{N}{g}}. \tag{5} $$

Then, through a linear transformation $F_L : \mathbb{R}^{\frac{N}{g}} \to \mathbb{R}^{\frac{M}{g}}$, GLT transforms $h_i$ into $z_i \in \mathbb{R}^{\frac{M}{g}}$ for each $i = \{1, \dots, g\}$. The final output vector is then formed by concatenating the resulting $g$ output vectors $z_i$ as

$$ z = F_G(h) = \left[ W_1 \cdot h_1, \dots, W_g \cdot h_g \right]. \tag{6} $$

3) Pyramidal Recurrent Unit: PRU is created by extending the vanilla LSTM architecture using the pyramidal and grouped linear transformations described above. At a given time $t$, PRU combines both input and context vectors through a transformation function using

$$ \hat{G}_v(x_t, h_{t-1}) = \hat{F}_P(x_t) + F_G(h_{t-1}) \tag{7} $$

where $v \in \{f, i, c, o\}$ indicates the forget, input, content, and output gates of the vanilla LSTM. $\hat{F}_P(\cdot)$ denotes the pyramidal transformation, whereas $F_G(\cdot)$ represents the GLT. The resultant $\hat{G}_v$ is then fed to the vanilla LSTM architecture to model PRU. Specifically, a PRU cell takes $x_t \in \mathbb{R}^N$, $h_{t-1} \in \mathbb{R}^M$, and $c_{t-1} \in \mathbb{R}^M$ at a given time $t$ as input and generates the forget gate signal as

$$ f_t = \sigma\left( \hat{G}_f(x_t, h_{t-1}) \right). \tag{8} $$

The forget gate is in charge of removing each cell's prior information. The input and content gates, which update cell information, are then calculated as

$$ i_t = \sigma\left( \hat{G}_i(x_t, h_{t-1}) \right) $$


$$ \hat{c}_t = \tanh\left( \hat{G}_c(x_t, h_{t-1}) \right). \tag{9} $$

Similarly, the output gate is calculated as

$$ o_t = \sigma\left( \hat{G}_o(x_t, h_{t-1}) \right). \tag{10} $$

Context vector and cell state are then generated by combining the inputs with these gate signals as

$$ c_t = f_t \otimes c_{t-1} + i_t \otimes \hat{c}_t, \qquad h_t = o_t \otimes \tanh(c_t) \tag{11} $$

where $\otimes$ is the elementwise multiplication, $\sigma$ represents the sigmoid, and $\tanh$ denotes the hyperbolic tangent activation function. In general, PRU cells are composed of only one layer. However, increasing the network depth enhances its efficiency and effectiveness when it comes to learning and recognizing complex sequential patterns [19]. Thus, we use a stack of PRUs with different configurations to better classify normal and attack events. The network size and number of layers are two of the most significant characteristics to consider while designing our PRUs. Table I lists the PRUs used in our method.

TABLE I
PRUS SETTINGS FOR THE PROPOSED METHOD

B. Ensemble of PRUs
To produce an aggregated outcome on the result of PRUs, we employ a DT unit. Suppose DT combines a set of different PRUs (denoted by $L$) over a subspace $S$ for features $F_i \subseteq F$, indicated as $\{F_i(\cdot)\}_{i=1,\dots,S}$. $\{y_i\}_{i=1,\dots,S}$ denotes the class label, which is acquired through distinct PRUs $L$. Each $L$ can be independently classified for any given example $x \in F_i$ through its feature subspace $F_i$. The DT considers a set of confidence rates for each class in the dataset before deciding on the result. The DT module receives the input from $L$ as

$$ \text{Input of DT} = \{ L_{i,c} \ \text{where} \ i \in \{\text{Number of } L\} \ \text{AND} \ c \in \{\text{Number of Classes}\} \} \tag{12} $$

where $L_{i,c}$ indicates the confidence rate of the $i$th trained model for class $c$. As an input, the DT takes these confidence rates and determines the association among the true label of network data and the $L$ confidence rate in a hierarchical manner. Fig. 4 shows the schematic structure of the DT and its functions in our proposed scheme. Suppose a training set $D_M^{F_i}$ of $M$ samples and $F$ features, for each $i \in S$. In the same fashion, $D_N^{F_i}$ represents the test set with $N$ samples and $F$ features. DT determines the PRU output space manifold and provides a model for classifying the output class label $y_i$. The proposed approach is efficient in training and testing, requires little memory, and is appropriate and scalable for intrusion detection in SCADA IIoT because of its ability to eliminate irrelevant features.

Fig. 4. Flowchart of the proposed method.

Theorem 1: Our method is trusted to detect SCADA-based IIoT cyberattacks through the ensemble of PRUs and DT.
Proof: Suppose $S$ represents a group of training instances and a deep-learning model $D$ can build a learner $L$. $L$ can be considered a hypothesis around a true function $f$, which accepts an instance $x$ and assigns a label $y$ to it. The proposed model produces a collection of learners/hypotheses ($L$) and explores a space $H$ for optimal hypotheses. The proposed learning process can discover various distinct hypotheses in $H$, where each provides identical or varying accuracy outcomes on training examples of distinct random feature sets. The proposed approach reduces the likelihood of selecting incorrect learners by generating a collection of accurate learners and combining them through a DT. Combining precise hypotheses can better statistically approximate the function $f$. Hence, the proposed model is trusted to identify intrusion attacks in SCADA-based IIoT networks.

IV. EVALUATIONS AND FINDINGS
We conducted a wide range of experiments with the benchmark datasets discussed in Section II-A. We implemented our proposed model using Python 3.7 and the popular deep learning


framework PyTorch.² We ran all experiments on the NVIDIA GeForce GTX 1080 GPU for our proposed models and alternative baselines. We trained six distinct PRUs, each with a varied structure. We employ Adam [20], which delivers faster convergence than the SGD and avoids the challenge of adjusting the learning rate [16]. We selected 256 as the batch size, 200 as the number of epochs, and 0.001 as the learning rate; these hyperparameters were determined through experiments. We also employed a 10-fold cross-validation approach [21] for both training and testing, which breaks a dataset randomly into ten segments and takes one segment for testing and the remaining nine for training. However, we divided the ten segments into three parts at random and utilized eight segments for training, one segment for testing, and one segment for validation.

Fig. 5. Performance analysis of our proposed scheme for binary classification in terms of accuracy.
Fig. 6. Performance analysis of the proposed model for multiclassification in terms of accuracy.
Fig. 7. Performance analysis of the proposed model for binary classification in terms of false-positive rates.

We use the following metrics and detection time to measure the effectiveness of our model:

$$ \text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \tag{13} $$

$$ \text{False positive rate} = \frac{FP}{FP + TN} \tag{14} $$

where FP, FN, TP, and TN represent the false positive, false negative, true positive, and true negative, respectively. Accuracy measures the samples accurately detected by a classifier divided by total samples.
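As an illustration of (13) and (14), the following minimal Python sketch computes both metrics from predicted and true labels. It is not the authors' evaluation code: the function names, the label encoding (1 = attack, 0 = normal), and the sample labels are assumptions made only for this example.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count TP, TN, FP, FN for a binary task (assumed: `positive` marks an attack)."""
    tp = tn = fp = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            if truth == positive:
                tp += 1
            else:
                fp += 1
        else:
            if truth == positive:
                fn += 1
            else:
                tn += 1
    return tp, tn, fp, fn


def accuracy(tp, tn, fp, fn):
    """Accuracy as in (13): correctly classified samples over all samples."""
    total = tp + tn + fp + fn
    if total == 0:
        raise ValueError("confusion counts must not all be zero")
    return (tp + tn) / total


def false_positive_rate(fp, tn):
    """False-positive rate as in (14): normal samples wrongly flagged as attacks."""
    negatives = fp + tn
    if negatives == 0:
        raise ValueError("no negative (normal) samples to evaluate")
    return fp / negatives


# Hypothetical labels, invented for illustration only (1 = attack, 0 = normal).
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 1, 0, 0, 1, 0, 0, 1]

tp, tn, fp, fn = confusion_counts(y_true, y_pred)
print("accuracy:", accuracy(tp, tn, fp, fn))
print("false-positive rate:", false_positive_rate(fp, tn))
```

Reporting the false-positive rate alongside accuracy matters here because the class distribution described in Section II-A is dominated by attack samples, so accuracy alone could hide a classifier that mislabels many normal events.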

A. Results
Figs. 5–8 demonstrate the experimental outcomes of the baselines and our proposed model. Fig. 5 shows the accuracy, whereas Fig. 7 describes the false-positive rate for detecting both normal and abnormal events (binary classification). In the same fashion, Fig. 6 shows the accuracy, whereas Fig. 8 illustrates the false-positive rate for classifying the normal and various attacks in traffic events (multiclassification).

Fig. 8. Performance analysis of the proposed model for multiclassification in terms of false-positive rates.

² https://pytorch.org/

B. Comparison With Benchmark Methods
We compare our method with RKNN [10] and RSRT [14] models in terms of accuracy and computational time to illustrate its superior performance. We follow the same structure as reported in their work for a fair comparison. We compare the accuracy results for all of the 15 datasets, and in terms of computational time costs, we only consider dataset 9. In addition, we also use a statistical analysis test to assess the statistical variations in accuracy results.
1) Comparison of Accuracy Results: We conducted experiments with each model on all 15 datasets. Figs. 5 and 6 and Table II illustrate the accuracy results. As can be seen in both Figs. 5 and 6, PRU model 4 is the best model. Thus, for clarity, we only showcase the results of PRU 4. Table II shows how well our model detects both normal and abnormal events when compared to other baselines. Similarly, our model also outperforms the baseline models in the multiclassification attack settings. Also,


it can be seen that our proposed model outperforms both RSRT and RSKNN for detecting both normal and abnormal events for binary and multiclass classification.

TABLE II
COMPARISON RESULT OF OUR METHOD AND OTHER BASELINE METHODS IN TERMS OF ACCURACY FOR BINARY AND MULTICLASSIFICATION

Fig. 9. Standard deviation of the proposed method for binary classification in terms of accuracy.
Fig. 10. Standard deviation of the proposed method for multiclassification in terms of accuracy.

2) Statistical Analysis of Accuracy Results: We used the nonparametric Mann–Whitney U test for a statistical analysis and looked at the implications of the accuracy results for RSRT and our proposed method. The nonparametric Mann–Whitney U test is considered resilient against outliers, better for small sample sizes, and is independent of distributional assumptions [22]. The test compares the observations of two groups and uses their size for ranking them, and its statistic, denoted $T$ here, is computed as

$$ T = R_1 - \frac{n_1 (n_1 + 1)}{2} + R_2 - \frac{n_2 (n_2 + 1)}{2} \tag{15} $$

where $R_1$ and $R_2$ imply the sum of ranks in groups 1 and 2, respectively, and $n_1$ and $n_2$ represent the sample sizes of groups 1 and 2, respectively, by utilizing the sum of ranks and mean rank for every single group. The best group is ranked first, whereas the second-best is ranked second in this situation.

TABLE III
DESCRIPTIVE STATISTICS OF OUR METHOD FOR BINARY AND MULTICLASSIFICATION IN TERMS OF ACCURACY RESULTS

TABLE IV
COMPARISON BETWEEN RSRT AND OUR PROPOSED METHOD FOR BINARY AND MULTICLASSIFICATION IN TERMS OF RANKS

TABLE V
TEST STATISTICS OF OUR METHOD FOR BINARY AND MULTICLASSIFICATION IN TERMS OF ACCURACY RESULTS

The statistical analysis's testing question can be stated as follows: "Is there a statistically significant difference between the accuracy results obtained by RSRT and the proposed models?" We begin by presenting the hypotheses in the following manner.
1) Alternate Hypothesis: There are statistical variations for classifying normal and abnormal events (binary classification) or various kinds of attacks in traffic events (multiclassification) in the accuracy outcomes of the two models.
2) Null Hypothesis: There are no statistical variations for classifying normal and abnormal events (binary classification) or various kinds of attacks in traffic events (multiclassification) in the accuracy outcomes of the two models.
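To make the rank-based quantities in (15) concrete, the sketch below computes the rank sums $R_1$, $R_2$ and the two bracketed terms of (15) in plain Python. It is only an illustration under stated assumptions: the paper ran the test in SPSS, the helper names (`average_ranks`, `rank_sum_statistics`) and the accuracy values are invented for this example, and ranks are assigned in ascending order with ties averaged. A common convention for the Mann–Whitney U test reports the smaller of the two per-group terms, which the sketch returns as `U`.

```python
def average_ranks(values):
    """Return 1-based ascending ranks, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda idx: values[idx])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # average of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def rank_sum_statistics(group1, group2):
    """Compute rank sums and the per-group terms R - n(n + 1)/2 that appear in (15)."""
    n1, n2 = len(group1), len(group2)
    ranks = average_ranks(list(group1) + list(group2))
    r1 = sum(ranks[:n1])
    r2 = sum(ranks[n1:])
    u1 = r1 - n1 * (n1 + 1) / 2
    u2 = r2 - n2 * (n2 + 1) / 2
    return {"R1": r1, "R2": r2, "U1": u1, "U2": u2, "U": min(u1, u2)}


# Hypothetical accuracy values, invented for illustration only.
baseline = [0.90, 0.91, 0.92, 0.93]
proposed = [0.94, 0.95, 0.96, 0.97]

stats = rank_sum_statistics(baseline, proposed)
print(stats)
```

In this made-up example every accuracy of the second group exceeds every accuracy of the first, so one per-group term is zero and the other equals $n_1 n_2$; a p-value would still have to come from a statistics package or reference tables.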
Fig. 9 depicts the standard error of standard deviation for classifying normal and abnormal events, whereas Fig. 10 illustrates the standard error of standard deviation in the multiclassification settings. We used the SPSS statistical tool to conduct the test. For binary classification, Tables III–V summarize the rank, test


statistics, and descriptive statistics in terms of accuracy results. The two-tailed p-value, as indicated in Table V, is below 0.05. Thus, with a confidence level of 95%, we reject the null hypothesis and adopt the alternative hypothesis. Consequently, we infer that the accuracy outcomes of the two models differ statistically. From Table IV, we may further deduce that these variations favor our proposed method, indicating its superiority over the RSRT, based on the sum of ranks and mean rank results. Likewise, we establish the following hypothesis for classifying normal and various other attacks in traffic events.

TABLE VI
COMPARATIVE RESULTS OF PROPOSED MODEL WITH RSRT IN TERMS OF AVERAGE TIME (SECONDS) AND TRAINING AND TESTING COST

Fig. 11. Error probability for various learners.
Fig. 12. Accuracy results of the proposed method for various learners.

C. Comparison of Computational Time Costs
We used dataset 9, which comprises 5340 different instances, to compare the time costs for both our proposed and RSRT methods. After preprocessing, the dataset contains 3738 and 1602 instances for training and testing, respectively. We examine both binary and multiclass configurations to determine the time cost. On the specified dataset, for both training and testing, Table VI shows the average time cost. We can see that our proposed method requires somewhat more time to train than the RSRT. This is due to the fact that the RSRT model does not utilize a deep learning method. On the other hand, our proposed method takes substantially less time for testing than the RSRT model, making it more effective in real-world scenarios.
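The instance counts above are consistent with a 70/30 split of the 5340 instances, although the paper does not state the ratio; that ratio is an inference from the reported numbers. The short Python sketch below checks this arithmetic and shows one plausible way to average a time cost over repeated runs. The helper names and the repeat count are hypothetical and are not taken from the authors' setup.

```python
import time


def split_sizes(total, train_fraction):
    """Return (train, test) sizes for a given training fraction."""
    n_train = round(total * train_fraction)
    return n_train, total - n_train


def average_seconds(fn, repeats=5):
    """Average wall-clock time of calling `fn` over `repeats` runs (hypothetical helper)."""
    start = time.perf_counter()
    for _ in range(repeats):
        fn()
    return (time.perf_counter() - start) / repeats


# Assumed 70/30 split; it reproduces the counts reported for dataset 9.
n_train, n_test = split_sizes(5340, 0.7)
print(n_train, n_test)

# Stand-in workload; a real measurement would time model training or inference instead.
elapsed = average_seconds(lambda: sum(range(10_000)))
print(f"average seconds per run: {elapsed:.6f}")
```

Averaging over several runs, as Table VI's "average time" suggests, reduces the influence of one-off delays such as GPU warm-up or disk caching on the reported cost.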

D. Reliability and Trustworthiness
To examine the reliability, assume that our method comprises 10 PRUs or learners. Because of the heterogeneous nature of ensemble learning, the errors that occur in these PRUs are uncorrelated. If some learners are inaccurate, the remaining learners may be accurate, enabling our method to properly categorize intrusion attacks in SCADA-based IIoT networks. Fig. 11 shows a simulated probability of error for 10 different learners. We can see that each learner has an error of less than or equal to 0.14, and 7 of them have an error of less than 0.09, making our method good enough to detect attacks in SCADA-based IIoT networks. We carry out experiments with dataset 1 to verify the trustworthiness of our proposed model by classifying attacks with various numbers of learners. Fig. 12 shows the accuracy results of the proposed model utilizing an ensemble of 10 base learners compared with a single learner. Also, we can observe how the accuracy of our proposed method increases by combining multiple learners.

Fig. 13. TCB security paradigm employing a defense-in-depth method to ensure trustworthiness.

We can also reveal the trustworthiness of our method by offering a mapping of the trusted computing base (TCB) model to the defense-in-depth model. This mapping can help explain how confidentiality–integrity–availability (CIA) is sustained. Fig. 13 illustrates the TCB security paradigm, which is embedded in our proposed SCADA-based model. The elements of the trusted zone inside the security outline include security control, hardware and software, and policies, which are coupled to guarantee the maintenance of the CIA triad, and the total security system adds to trustworthiness. The TCB/SCADA reference monitor/physical security control paradigm prevents and detects unwanted illegal actions to resources within the trusted zone's boundary. This layer often includes automated physical access control systems (PACS), for instance, mantraps, CCTV cameras, and motion detectors. On the other hand, SCADA systems


and associated subsystems are typically positioned in remote locations, where PACS deployment is challenging. Hence, in this case, a defense-in-depth approach must be supplemented with extra measures, for example, establishing antimalware resources or IDSs in the logical control.
They are incompatible with the SCADA settings since they are dependent on application program interfaces or protocols. As a result, these classical detective or preventative security controls fail against blocking unauthorized access. Hence, accurate and reliable security control must be established to ensure a defense-in-depth approach and improve the trustworthiness of the SCADA system. We solved these shortcomings in our proposed model, formed a reliable cyber-attack detection method, and verified it with massive SCADA network traffic with various attacks targeting several vulnerabilities of SCADA components and the overall system.

V. CONCLUSION
The ability to protect SCADA-based IIoT networks against cyberattacks increases their trustworthiness. The existing security methods along with machine learning algorithms were inefficient and inaccurate for protecting IIoT networks. In this article, we proposed a cyberattacks detection mechanism using enhanced deep and ensemble learning in a SCADA-based IIoT network. The proposed mechanism is reliable and accurate because an ensemble detection model was built using a combination of the PRU and the DT. The proposed method was evaluated across 15 datasets generated from a SCADA-based network, and a considerable increase in terms of classification accuracy was obtained. Compared to state-of-the-art techniques, the obtained outcomes of our method exhibited a good balance between reliability, trustworthiness, classification accuracy, and

[7] M. A. Shahriar et al., “Modelling attacks in blockchain systems using petri nets,” in Proc. IEEE 19th Int. Conf. Trust Secur. Privacy Comput. Commun., 2020, pp. 1069–1078.
[8] M. Abdel-Basset, V. Chang, H. Hawash, R. K. Chakrabortty, and M. Ryan, “Deep-IFS: Intrusion detection approach for IIoT traffic in fog environment,” IEEE Trans. Ind. Informat., vol. 17, no. 11, pp. 7704–7715, Nov. 2021.
[9] S. Huda, J. Abawajy, B. Al-Rubaie, L. Pan, and M. M. Hassan, “Automatic extraction and integration of behavioural indicators of malware for protection of cyber–physical networks,” Future Gener. Comput. Syst., vol. 101, pp. 1247–1258, 2019.
[10] Information Technology-Security Techniques-Information Security Risk Management, ISO/IEC 27005:2018, 2018.
[11] X. Yan, Y. Xu, X. Xing, B. Cui, Z. Guo, and T. Guo, “Trustworthy network anomaly detection based on an adaptive learning rate and momentum in IIoT,” IEEE Trans. Ind. Informat., vol. 16, no. 9, pp. 6182–6192, Sep. 2020.
[12] D. Wu, Z. Jiang, X. Xie, X. Wei, W. Yu, and R. Li, “LSTM learning with Bayesian and Gaussian processing for anomaly detection in industrial IoT,” IEEE Trans. Ind. Informat., vol. 16, no. 8, pp. 5244–5253, Aug. 2020.
[13] N. Moustafa and J. Slay, “UNSW-NB15: A comprehensive data set for network intrusion detection systems (UNSW-NB15 network data set),” in Proc. Mil. Commun. Inf. Syst. Conf., 2015, pp. 1–6.
[14] M. M. Hassan, A. Gumaei, S. Huda, and A. Almogren, “Increasing the trustworthiness in the industrial IoT networks through a reliable cyberattack detection model,” IEEE Trans. Ind. Informat., vol. 16, no. 9, pp. 6154–6162, Sep. 2020.
[15] A. N. Jahromi et al., “An improved two-hidden-layer extreme learning machine for malware hunting,” Comput. Secur., vol. 89, 2020, Art. no. 101655.
[16] S. T. U. Shah, J. Li, Z. Guo, G. Li, and Q. Zhou, “DDFL: A deep dual function learning-based model for recommender systems,” in Proc. Int. Conf. Database Syst. Adv. Appl., 2020, pp. 590–606.
[17] R. C. B. Hink, J. M. Beaver, M. A. Buckner, T. Morris, U. Adhikari, and S. Pan, “Machine learning for power system disturbance and cyber-attack discrimination,” in Proc. 7th Int. Symp. Resilient Control Syst., 2014, pp. 1–8.
[18] A. Derhab et al., “Blockchain and random subspace learning-based IDS for SDN-enabled industrial IoT security,” Sensors, vol. 19, no. 14, 2019, Art. no. 3119.
[19] S. Mehta, R. Koncel-Kedziorski, M. Rastegari, and H. Hajishirzi, “Pyramidal recurrent unit for language modeling,” in Proc. Conf. Empirical Methods Natural Lang. Process., 2018, pp. 4620–4630.
[20] D. P. Kingma and J. Ba, “Adam: A method for stochastic optimization,”
model complexity, resulting in improved performance. 2014, arXiv:1412.6980.
[21] P. Refaeilzadeh, L. Tang, and H. Liu, “Cross-validation,” Encyclopedia
In the future, we will employ more powerful deep learning Database Syst., vol. 5, pp. 532–538, 2009.
models to further improve trustworthiness by detecting cyberat- [22] G. W. Zeoli and T. S. Fong, “Performance of a two-sample Mann-Whitney
tacks accurately. In addition, we will try to formulate and assess nonparametric detector in a radar application,” IEEE Trans. Aerosp. Elec-
tron. Syst., vol. AES-7, no. 5, pp. 951–959, Sep. 1971.
its performance in real-world scenarios. Also, we will work on
the selection of optimal features in scenarios when the features
are not sufficient. Fazlullah Khan (Senior Member, IEEE) re-
ceived the Ph.D. degree in computer science
from Abdul Wali Khan University Mardan, Mar-
REFERENCES dan, Pakistan, in 2020.
[1] Y. Luo, Y. Duan, W. Li, P. Pace, and G. Fortino, “A novel mobile and His research has been published in theIEEE
hierarchical data transmission architecture for smart factories,” IEEE TRANSACTIONS ON INDUSTRIAL INFORMATICS,
Trans. Ind. Informat., vol. 14, no. 8, pp. 3534–3546, Aug. 2018. IEEE TRANSACTIONS ON INTELLIGENT TRANS-
[2] C. Gavriluta, C. Boudinet, F. Kupzog, A. Gomez-Exposito, and R. PORTATION SYSTEMS,IEEE TRANSACTIONS ON
Caire, “Cyber-physical framework for emulating distributed control sys- GREEN COMMUNICATIONS AND NETWORKING,
tems in smart grids,” Int. J. Elect. Power Energy Syst., vol. 114, 2020, IEEE INTERNET OF THINGS JOURNAL, IEEE AC-
Art. no. 105375. CESS, Elsevier Computer Networks, Elsevier Fu-
[3] M. S. Mahmoud, M. M. Hamdan, and U. A. Baroudi, “Modeling and ture Generation Computer Systems, Elsevier Journal of Network and
control of cyber-physical systems subject to cyber attacks: A survey of Computer Applications, Elsevier Computers & Electrical Engineering,
recent advances and challenges,” Neurocomputing, vol. 338, pp. 101–115, Springer Mobile Networks & Applications (MoNET), and Springer Neural
2019. Computing and Applications (NCAA). His research interests include
[4] T. Wang, G. Zhang, M. Z. A. Bhuiyan, A. Liu, W. Jia, and M. Xie, “A security and privacy, Internet of Things, machine learning, artificial in-
novel trust mechanism based on fog computing in sensor–cloud system,” telligence, security and privacy issues in the Internet of Vehicles, SDN,
Future Gener. Comput. Syst., vol. 109, pp. 573–582, 2020. fog/cloud computing, and big data analytics.
[5] K. Guo et al., “MDMaaS: Medical-assisted diagnosis model as a service Dr. Khan was the Guest Editor of the IEEE JOURNAL OF BIOMED-
with artificial intelligence and trust,” IEEE Trans. Ind. Informat., vol. 16, ICAL AND HEALTH INFORMATICS, Elsevier Digital Communications and
no. 3, pp. 2102–2114, Mar. 2020. Networks, Springer Multimedia Technology and Applications, Springer
[6] M. Al-Hawawreh and E. Sitnikova, “Developing a security testbed for MoNET, and Springer NCAA. He has served more than 10 conferences
industrial Internet of Things,” IEEE Internet of Things J., vol. 8, no. 7, in leadership capacities including General Chair, General Co-Chair, Pro-
pp. 5558–5573, Apr. 2021. gram Co-Chair, Track Chair, and Session Chair.


Ryan Alturki (Senior Member, IEEE) received the Ph.D. degree in computer systems from the University of Technology Sydney, Ultimo, NSW, Australia.
He is currently an Assistant Professor with the Department of Information Science, College of Computers and Information Systems, Umm Al-Qura University, Makkah, Saudi Arabia. He authored or coauthored several publications in high-ranked international journals, conferences, and chapters of books. His research interests include eHealth, mobile technologies, the Internet of Things, artificial intelligence, cloud computing, and cybersecurity.

Md Arafatur Rahman (Senior Member, IEEE) received the Ph.D. degree in ETE from the University of Naples Federico II, Naples, Italy, in 2013.
He is currently a Senior Lecturer with the School of Engineering, Computing & Mathematical Sciences, University of Wolverhampton, Wolverhampton, U.K. His research interests include IoT, wireless communication networks, cognitive radio networks, 5G, vehicular communication, big data, cloud-fog-edge computing, machine learning, and security.

Spyridon Mastorakis (Member, IEEE) received the five-year diploma (equivalent to M.Eng.) in electrical and computer engineering from the National Technical University of Athens (NTUA), Athens, Greece, in 2014, and the M.S. and the Ph.D. degrees in computer science from the University of California, Los Angeles, Los Angeles, CA, USA, in 2017 and 2019, respectively.
He is currently an Assistant Professor in computer science with the University of Nebraska Omaha, Omaha, Nebraska. His research interests include network systems and protocols, Internet architectures, IoT and edge computing, and security.

Imran Razzak (Senior Member, IEEE) received the Ph.D. degree from the University of Technology Sydney, Australia, in 2019. He is currently a Senior Lecturer in human-centered AI and machine learning with the School of Computer Science and Engineering, University of New South Wales, Sydney, NSW, Australia. He is also an Associate Editor/Guest Editor of several journals such as IEEE TRANSACTIONS ON COMPUTATIONAL SOCIAL SYSTEMS, IEEE JOURNAL OF BIOMEDICAL AND HEALTH INFORMATICS, IEEE TRANSACTIONS ON INDUSTRIAL INFORMATICS, etc. His research interests include machine learning and NLP with its application to a broad range of topics, particularly deep learning, big data analytics, healthcare, and cyber security, mainly focusing on the healthcare sector, and he is passionate about making the healthcare industry a better place through emerging technologies.

Syed Tauhidullah Shah received the B.S. degree in computer science from Abdul Wali Khan University Mardan, Mardan, Pakistan, and the M.S. degree from the School of Computer Science and Technology, Huazhong University of Science and Technology, Wuhan, China, in 2017 and 2020, respectively. He is currently working toward the Ph.D. degree in machine learning and natural language processing for requirement elicitation with the Department of Software Engineering, University of Calgary, Calgary, AB, Canada. His research interests include deep learning, recommender systems, Internet of Things, and natural language processing.
