Toward Detection and Attribution of Cyber-Attacks in IoT-Enabled Cyber-Physical Systems
This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI 10.1109/JIOT.2021.3067667, IEEE Internet of Things Journal
Amir Namavar Jahromi and Hadis Karimipour are with the School of Engineering, University of Guelph, Ontario, Canada (email: ana-[email protected] and [email protected]).
Ali Dehghantanha is with the School of Computer Science, University of Guelph, Ontario, Canada (email: [email protected])
Kim-Kwang Raymond Choo is with the Department of Information Systems and Cyber Security and the Department of Electrical and Computer Engineering, University of Texas at San Antonio (UTSA), San Antonio, TX 78249, USA. He also has courtesy appointments with UTSA's Department of Electrical and Computer Engineering and Department of Computer Science, and UniSA STEM at the University of South Australia, Adelaide, SA 5095, Australia. (email: [email protected])

Abstract: Securing Internet of Things (IoT)-enabled cyber-physical systems (CPS) can be challenging, as security solutions developed for general information / operational technology (IT / OT) systems may not be as effective in a CPS setting. Thus, this paper presents a two-level ensemble attack detection and attribution framework designed for CPS, and more specifically for an industrial control system (ICS). At the first level, a decision tree combined with a novel ensemble deep representation-learning model is developed for detecting attacks in imbalanced ICS environments. At the second level, an ensemble deep neural network is designed for attack attribution. The proposed model is evaluated using real-world datasets from a gas pipeline and a water treatment system. Findings demonstrate that the proposed model outperforms other competing approaches with similar computational complexity.

Index Terms: Cyber-attacks, Deep representation learning, Cyber threat detection, Cyber threat attribution, Industrial Control System, ICS, Cyber-physical systems, Industrial Internet of Things (IIoT)

I. INTRODUCTION
Internet of Things (IoT) devices are increasingly integrated in cyber-physical systems (CPS), including in critical infrastructure sectors such as dams and utility plants. In these settings, IoT devices (also referred to as Industrial IoT or IIoT) are often part of an Industrial Control System (ICS), tasked with the reliable operation of the infrastructure. ICS can be broadly defined to include supervisory control and data acquisition (SCADA) systems, distributed control systems (DCS), and systems that comprise programmable logic controllers (PLC) and Modbus protocols.

The connection between ICS or IIoT-based systems and public networks, however, increases their attack surfaces and their risk of being targeted by cyber criminals. One high-profile example is the Stuxnet campaign, which reportedly targeted Iranian centrifuges for nuclear enrichment in 2010, causing severe damage to the equipment [1], [2]. Another example is the incident targeting a pump that resulted in the failure of an Illinois water plant in 2011 [3]. BlackEnergy3 was another campaign that targeted Ukraine power grids in 2015, resulting in a power outage that affected approximately 230,000 people [4]. In April 2018, there were also reports of successful cyber-attacks affecting three U.S. gas pipeline firms, which resulted in the shutdown of electronic customer communication systems for several days [1]. Although security solutions developed for information technology (IT) and operational technology (OT) systems are relatively mature, they may not be directly applicable to ICSs. For example, this could be the case due to the tight integration between the controlled physical environment and the cyber systems.

Therefore, system-level security methods are necessary to analyze physical behaviour and maintain system operation availability [1]. ICS security goals are prioritized in the order of availability, integrity, and confidentiality, unlike most IT/OT systems (generally prioritized in the order of confidentiality, integrity, and availability) [5]. Due to the close coupling between variables of the feedback control loop and physical processes, (successful) cyber-attacks on ICS can result in severe and potentially fatal consequences for society and our environment. This reinforces the importance of designing extremely robust safety and security measures to detect and prevent intrusions targeting ICS [1].

Popular attack detection and attribution approaches include those based on signatures and anomalies. To mitigate the known limitations in both signature-based and anomaly-based detection and attribution approaches, there have been attempts to introduce hybrid-based approaches [6]. Although hybrid-based approaches are effective at detecting unusual activities, they are not reliable due to frequent network upgrades, resulting in different Intrusion Detection System (IDS) typologies [7]. Beyond this, conventional attack detection and attribution techniques mainly rely on network metadata analysis (e.g. IP addresses, transmission ports, traffic duration, and packet intervals). Therefore, there has been renewed interest in utilizing attack detection and attribution solutions based on Machine Learning (ML) or Deep Neural Networks (DNN) in recent times.

In addition, attack detection approaches can be categorized into network-based or host-based approaches. Supervised clustering, single-class or multi-class Support Vector Machine (SVM), fuzzy logic, Artificial Neural Network (ANN), and DNN are commonly used techniques for attack detection in network traffic. These techniques analyze real-time traffic data to detect malicious attacks in a timely manner. However, attack detection that considers only network and host data may fail to detect sophisticated attacks or insider attacks.
2327-4662 (c) 2022IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.
Authorized licensed use limited to: University of Guelph. Downloaded on April 07,2022at 22:22:55 UTC from IEEE Xplore. Restrictions apply.
Unsupervised models that incorporate process/physical data can complement a system's monitoring since they do not rely on detailed knowledge of the cyber-threats. In general, a sophisticated attacker with sufficient knowledge and time, such as a nation state advanced persistent threat actor, can potentially circumvent robust security solutions. Furthermore, most of the existing approaches ignore the imbalanced property of ICS data by modeling only a system's normal behavior and reporting deviations from normal behavior as anomalies. This is, perhaps, due to limited attack samples in existing datasets and real-world scenarios. Although using only majority class samples is a good solution to avoid issues due to imbalanced datasets, the trained model will have no view of the attack samples' patterns. In other words, such an approach fails to detect unseen attacks and suffers from a high false-positive rate [8]. Thus, there have been attempts to utilize DL approaches, for example, to facilitate automated feature (representation) learning to model complex concepts from simpler ones [9] without depending on human-crafted features [10].

Motivated by the above observations, this paper presents our proposed novel two-stage ensemble deep learning-based attack detection and attack attribution framework for imbalanced ICS datasets. In the first stage, an ensemble representation learning model combined with a Decision Tree (DT) is designed to detect attacks in an imbalanced environment. Once an attack is detected, several one-vs-all classifiers are ensembled together to form a larger DNN that classifies the attack attributes with a confidence interval during the second stage. Moreover, the proposed framework is capable of detecting unseen attack samples. A summary of our approach in this study is as follows:

1) We develop a novel two-phase ensemble ICS attack detection method capable of detecting both previously seen and unseen attacks. We will also demonstrate that the proposed method outperforms other competing approaches in terms of accuracy and f-measure. The proposed deep representation learning results in this method being robust to imbalanced data.
2) We propose a novel self-tuning two-phase attack attribution method that ensembles several deep one-vs-all classifiers using a DNN architecture for reducing false alarm rates. The proposed method can accurately attribute attacks with high similarity. This is the first ML-based attack attribution method in ICS/IIoT at the time of this research.
3) We analyze the computational complexity of the proposed attack detection and attack attribution framework, demonstrating that despite its superior performance, its computational complexity is similar to that of other DNN-based methods in the literature.

The rest of the paper is organized as follows. Section II introduces the relevant background and related literature. Section III describes the proposed framework, followed by the experimental setup in Section IV. In Section V, the evaluation findings based on two real-world ICS datasets demonstrate that the proposed framework outperforms several other systems. Finally, Section VI concludes this paper.

II. RELATED WORK
ML-based attack detection techniques are generally designed to detect moving targets that constantly evolve by learning new vulnerabilities and not relying on known attack signatures or normal network patterns [6]. We will now discuss the related literature as follows.

A. Conventional Machine Learning
In [11], ML algorithms, such as K-Nearest Neighbor (KNN), Random Forest (RF), DT, Logistic Regression (LR), ANN, Naïve Bayes (NB), and SVM were compared in terms of their effectiveness in detecting backdoor, command, and SQL injection attacks in water storage systems. The comparative summary suggested that the RF algorithm has the best attack detection, with a recall of 0.9744; the ANN is the fifth-best algorithm, with a recall of 0.8718; and the LR is the worst-performing algorithm, with a recall of 0.4744. The authors also reported that the ANN could not detect 12.82% of the attacks and considered 0.03% of the normal samples to be attacks. In addition, LR, SVM, and KNN considered many attack samples as normal samples, and these ML algorithms are sensitive to imbalanced data. In other words, they are not suitable for attack detection in ICS. In [12], the authors presented a KNN algorithm to detect cyber-attacks on gas pipelines. To minimize the effect of using an imbalanced dataset in the algorithm, they performed oversampling on the dataset to achieve balance. Using the KNN on the balanced dataset, they reported an accuracy of 97%, a precision of 0.98, a recall of 0.92, and an f-measure of 0.95. In [13], the authors presented a Logical Analysis of Data (LAD) method to extract patterns/rules from the sensor data and use these patterns/rules to design a two-step anomaly detection system. In the first step, a system is classified as stable or unstable, and in the second one, the presence of an attack is determined. They compared the performance of the proposed LAD method with the DNN, SVM, and CNN methods. Based on these experiments, the DNN outperformed the LAD method in the precision metric; however, the LAD performed better in recall and f-measure.

B. Deep Learning
In [14], the authors used the DNN algorithm to detect false data injection attacks in power systems. Findings of their evaluation using two datasets suggested 91.80% accuracy. In [15], the authors proposed an autoencoder-based method to detect false data injection attacks and clean them using denoising autoencoders. Their experiments showed that these methods outperformed the SVM-based method. To handle the effect of imbalanced data on the algorithm, they ignored attack data in training the autoencoder. In [16], the authors presented a technique based on Extreme Learning Machine (ELM) for attack detection in CPS. To address the imbalanced challenge of neural networks, training was conducted using only normal data. Based on these experiments, the proposed ELM-based method outperformed the SVM attack detection method.
Despite promising results in both conventional ML and deep learning-based techniques, most existing ML algorithms suffer from the curse of dimensionality due to the large data volume generated in real-world ICS. Therefore, feature engineering must reduce the number of features or generate a new representation of the features to reduce computational overhead. Moreover, the imbalanced nature of ICS datasets is another challenge that should be considered. Researchers have attempted to resolve this issue using oversampling/undersampling, as well as ignoring attack samples and building algorithms using normal samples.

Attack attribution seeks to answer the question of "What kind of attack was it?" and this is generally more challenging to answer in ICS than in typical IT/OT systems due to the different network structures, industry-specific protocols, and so forth [17], [18]. While there have been a small number of ML-based malware attack attributions [19], [20], designing robust and effective ML-based attack attribution for ICS and IIoT systems appears to be understudied. Thus, this paper proposes a two-stage ensemble deep learning-based attack detection and attack attribution framework for ICS. Our approach incorporates both process and physical data to solve the imbalanced data problem without subsampling or oversampling. The proposed framework utilizes an unsupervised ensemble of learned representations from normal and attack instances for attack detection. Next, using an ensemble of several one-vs-all classifiers trained on each attack attribute, it forms a two-part DNN to attribute the samples into their corresponding attack attributes.

III. THE PROPOSED FRAMEWORK
Figure 1 shows the architecture of the proposed framework. In this framework, the attack detection method detects the attacks by analyzing the ICS input features using the combination of ensembled unsupervised DNNs and a decision tree. If an attack is detected, the sample is passed to several DNNs for detailed analysis. If the attack was previously unseen/unknown, the unseen attack detection module would detect it and label it as an unseen attack. This will be passed on for detailed security analysis. Otherwise, the attack attribution method detects the attribute of the attack.

A. Proposed Ensemble Attack Detection Method
The proposed attack detection consists of two phases, namely a representation learning phase and a detection phase. Using a conventional unsupervised DNN on an imbalanced dataset yielded a DNN model that mainly learned majority class patterns and missed minority class characteristics. Most researchers have tried to address this challenge by generating new samples or removing certain samples to make the dataset balanced and then passing the data to a DNN. However, in ICS/IIoT security applications, generating or removing samples is not a reasonable solution. Due to the ICS/IIoT systems' sensitivity, generated samples should be validated in a real network, which is impossible since the generated attack samples may be harmful to the network and cause severe impacts on the environment or human life. In addition, validation of the generated samples is time-consuming. Moreover, removing the normal data from a dataset is not the right solution since the number of attack samples in ICS/IIoT datasets is usually less than 10% of the dataset, and most of the dataset knowledge is discarded by removing 80% of the dataset.

To avoid the above-mentioned problems in handling imbalanced datasets, this study proposed a new deep representation learning method to make the DNN able to handle imbalanced datasets without changing, generating, or removing samples. This model consisted of two unsupervised stacked autoencoders, each responsible for finding patterns from one class. Since each model tries to extract abstract patterns of one class without considering another, the output of that model represented its inputs well. The stacked autoencoders had three encoders and three decoders with input and final representation layers. The encoder layers mapped the input representation to a higher, 800-dimensional space, a 400-dimensional space, and the final 16-dimensional space. Equation 1 shows the encoder function of an autoencoder. The decoder layers did the opposite and tried to reconstruct the input representation by starting from the 16-dimensional new representation and mapping it to the 400-dimensional, 800-dimensional, and input representations. Equation 2 shows the decoder function of an autoencoder. These hyperparameters were selected using trial-and-error to have the best performance in f-measure with the lowest architectural complexity.

\[ h_i = \sigma(w_i x_i + b_i) \tag{1} \]

In the above equation, $\sigma$ denotes an activation function, $w$ is the weight matrix of the encoder, $x$ is a vector of sample features, $b$ is the encoder's bias, $h$ is the encoded representation, and $i \in \{\text{Normal}, \text{Attack}\}$.

\[ \hat{x}_i = \sigma'(w'_i h_i + b'_i) \tag{2} \]

In the above equation, $\sigma'$ is the decoder's activation function, $w'$ is the weight matrix of the decoder, $h$ is the encoded representation, $b'$ is the decoder's bias, $\hat{x}$ is the reconstruction of input $x$, and $i \in \{\text{Normal}, \text{Attack}\}$.

Each autoencoder was trained individually using the loss function indicated in Equation 3.

\[ L(x, \hat{x}) = \|x - \hat{x}\|^2 = \|x - \sigma'(w' \sigma(w x + b) + b')\|^2 \tag{3} \]

In the above equation, $L(x, \hat{x})$ denotes the loss between the input $x$ and its reconstruction $\hat{x}$.

After training the autoencoders, all observations were passed through both autoencoders, and the final representations were fused to form a super-vector for each instance to build a new dataset.

\[ X_{new} = [H_{normal}, H_{attack}] \tag{4} \]

In the above equation, $X_{new}$ is the new dataset consisting of a super-vector of the learned representations from the normal and attack autoencoder models for each sample. The $H_{normal}$ is a
[Figure 1: Architecture of the proposed framework. Block labels recovered from the figure: attack samples; representation learning (autoencoder); PCA; attack detection (decision tree); one-class SVM unseen attack detection module (labeled as unseen); attack data; ensemble model; candidate attributes; decision tree of candidate attributes; attack attribute.]
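As a rough illustration of Equations 1 and 4, the following Python sketch stacks three encoder layers (800, 400, and 16 units) and concatenates the two 16-dimensional codes into one 32-dimensional super-vector. It is only a sketch under stated assumptions: the weights are random rather than trained, the sigmoid is an assumed choice for the activation σ (the text does not name one), and the helper names make_encoder, encode, and fuse are hypothetical and are not taken from the original implementation.

```python
import math
import random


def make_encoder(input_dim, layer_sizes=(800, 400, 16), seed=0):
    """Build UNTRAINED encoder weights: a list of (W, b) pairs, one per layer."""
    rng = random.Random(seed)
    layers, fan_in = [], input_dim
    for fan_out in layer_sizes:
        scale = 1.0 / math.sqrt(fan_in)
        W = [[rng.uniform(-scale, scale) for _ in range(fan_in)]
             for _ in range(fan_out)]
        b = [0.0] * fan_out
        layers.append((W, b))
        fan_in = fan_out
    return layers


def sigmoid(v):
    # Assumed activation; the paper only says "an activation function".
    return 1.0 / (1.0 + math.exp(-v))


def encode(layers, x):
    """Apply h = sigma(W x + b) layer by layer (Equation 1, stacked)."""
    h = x
    for W, b in layers:
        h = [sigmoid(sum(w * v for w, v in zip(row, h)) + bias)
             for row, bias in zip(W, b)]
    return h


def fuse(enc_normal, enc_attack, x):
    """Concatenate both representations into one super-vector (Equation 4)."""
    return encode(enc_normal, x) + encode(enc_attack, x)


# Hypothetical 17-feature sample, already min-max scaled to [0, 1].
sample = [i / 16 for i in range(17)]
enc_normal = make_encoder(17, seed=1)  # stand-in for the encoder trained on normal data
enc_attack = make_encoder(17, seed=2)  # stand-in for the encoder trained on attack data
super_vector = fuse(enc_normal, enc_attack, sample)
print(len(super_vector))
```

In a real pipeline each encoder would first be trained as part of an autoencoder by minimizing the reconstruction loss of Equation 3 on its own class; only the fusion step is shown here.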
B. Proposed Self-Tuning Attack Attribution Method
The proposed self-tuning attack attribution method consists of two phases. In the first phase, a one-vs-all classifier is trained for each attribute. To train these classifiers, a dataset's attack samples are split into several subsets based on their attributes, and one DNN model is trained for every set. The Rectified Linear Unit (ReLU) function is used as the activation function for the hidden layers, and the Sigmoid function is used as the output-layer activation function. Next, the outputs of all of the first-phase DNNs are passed to the second phase to attribute the instances based on the one-vs-all DNNs.

In the second phase, the one-vs-all classifiers and a DNN ensemble model are combined to compose a more complex DNN. This DNN is constructed from two components: a partially-connected element consisting of several one-vs-all classifiers and a fully-connected element that fuses the first part's results and attributes the samples to different classes. The ReLU activation function is used for the hidden layers of the ensemble DNN, and the softmax function is used as its output activation function (Equation 8). The Categorical Cross-Entropy (CE) is used as the loss function of the final DNN (Equation 9). In addition, the outputs of this DNN are the two most probable attributions for the given sample. This model is called the primary attack attribution method. A DT classifier is trained for each pair of attack attributes and is used for the final attack attribution from the two candidates; this is referred to as the secondary attack attribution method.

\[ \sigma(s)_i = \frac{e^{s_i}}{\sum_{j=1}^{K} e^{s_j}} \tag{8} \]

where $K$ is the number of classes, and $s = (s_1, \ldots, s_K) \in \mathbb{R}^K$ is the input vector of the softmax function.

\[ CE = -\sum_{i=1}^{K} y_i \log(\sigma(s)_i) \tag{9} \]

where $y_i$ is the label of the $i$-th class, and $\sigma(s)_i$ is the output of the softmax function for the $i$-th class.

This method is self-tuning since it can tune itself to changing attack patterns without needing pre-processing. This results from using the gradient descent technique to simultaneously update the weights of all one-vs-all classifiers and the ensemble model. This feature is useful when a new attack attribute is discovered and needs to be added to the attack attribution method. This is done by passing the new dataset, including the new attack attribute, through the proposed attack attribution method. Algorithm 2 shows the proposed attack attribution component.

Algorithm 2: The proposed two-phase attack attribution component
  Data: Dataset including attack samples from various families (X) and the labels (y ∈ [1, c])
  Training Phase:
    X = z(X): z = (x − min(x)) / (max(x) − min(x));
    foreach attack type i do
      foreach sample x ∈ X do
        if y[x] = i then
          y_i = 1
        else
          y_i = 0
        end
      end
    end
    ▷ Training the binary DTs:
    foreach pair of attack classes do
      Train a DT
    end
    ▷ Training one-vs-all classifiers:
    foreach attack type i do
      for number of epochs do
        for number of batches in the attack type i do
          Train the one-vs-all classifier (classifier_i): min L(y_i, ŷ_i);
        end
      end
    end
    ▷ Ensemble model:
    DNN = new neural network;
    foreach classifier i do
      DNN.add(classifier_i);
    end
    DNN.add(fully-connected neural network);
    for number of epochs do
      for number of batches in training data do
        Train the whole network: min L(y, ŷ);
      end
    end
  Testing Phase:
    x_test = z(x_test);
    DNN.predict2bests(x_test);
    Pass x_test to the DT;
  Output: Attack type (ŷ)
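The building blocks named in this subsection (min-max scaling, one-vs-all relabeling, the softmax of Equation 8, the cross-entropy of Equation 9, and the selection of the two most probable attributes) can be sketched in plain Python as follows. This is a minimal illustration, not the authors' code: all function names and the toy labels and scores are hypothetical, and the DNN training and the pairwise decision trees are deliberately left out.

```python
import math


def min_max_scale(column):
    """z = (x - min) / (max - min); a constant column maps to 0.0."""
    lo, hi = min(column), max(column)
    span = hi - lo
    return [(v - lo) / span if span else 0.0 for v in column]


def one_vs_all_labels(y, attack_type):
    """Binary targets for the classifier dedicated to one attack type."""
    return [1 if label == attack_type else 0 for label in y]


def softmax(scores):
    """Equation 8, shifted by max(scores) for numerical stability."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def categorical_cross_entropy(one_hot, probs):
    """Equation 9 for a single sample with a one-hot label vector."""
    return -sum(t * math.log(p) for t, p in zip(one_hot, probs) if t)


def two_best(probs):
    """Indices of the two most probable attributes (primary attribution)."""
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    return ranked[:2]


# Toy data: five attack samples with hypothetical attribute labels in [1, 3].
labels = [1, 3, 2, 3, 1]
targets_for_type_3 = one_vs_all_labels(labels, 3)

# Hypothetical output-layer scores of the ensemble DNN for one sample.
probs = softmax([2.0, 1.0, 0.1])
candidates = two_best(probs)
loss = categorical_cross_entropy([1, 0, 0], probs)
print(targets_for_type_3, candidates, round(loss, 3))
```

In the full method the two candidate indices would then be handed to the decision tree trained for that specific pair of attributes, which makes the final choice.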
IV. EXPERIMENTAL SETUP

A. Dataset
As previously discussed, we evaluated the proposed framework using two real-world ICS datasets. The first dataset was collected at the Mississippi State University [23] from a gas pipeline system consisting of sensors and actuators, a communication network, and supervisory control. This dataset consists of normal samples and seven attack types, including Naïve Malicious Response Injection (NMRI), Complex Malicious Response Injection (CMRI), Malicious State Command Injection (MSCI), Malicious Parameter Command Injection (MPCI), Malicious Function Code Injection (MFCI), Denial of Service (DoS), and Reconnaissance (Recon) attacks. It reportedly contained 274,628 observations, in which 214,580 (78.14%) were normal samples, and the remaining 60,048 (21.86%) samples were attack samples. This dataset also consisted of 17 features of network and field states. The second dataset was the Secure Water Treatment (SWaT)
dataset [24], collected at Singapore University of Technology from a water treatment system, consisting of 449,920 samples. In this dataset, 87.9% and 12.1% were normal and attack samples, respectively. Each dataset sample was formed by 51 features that were the physical measurements of the systems. In addition, this dataset consisted of 31 different attack scenarios that could be used for attack attribution.

B. Pre-Processing
As shown in Figure 1, the proposed framework consists of several DNNs that accept the raw features as input and map them to new representations for attack detection and attack attribution. Similar to some other approaches [25], [26], [27], the data was normalized using the min-max technique before passing it through the methods to make them unbiased against the features. This was the only pre-processing for the proposed framework. Moreover, 10-fold cross-validation was performed to obtain the results.

C. Evaluation Metrics
To ensure fairness in comparison, this study evaluated the performance of the proposed attack attribution method using the DT classifier on the original representation and approaches that used the same dataset(s) in their original articles. However, for the proposed self-tuning attack attribution method, we were not able to find similar approaches. A comparison with the Fuzzy C-Mean (FCM) clustering [25] verified that FCM could detect only four out of eight classes in the gas pipeline dataset (while our model attributes all eight classes). This suggested that the attacks were very similar and hard to classify.

Similar to other approaches, this study used standard metrics to evaluate the performance of machine learning algorithms. Specifically, it used True Positive (TP), True Negative (TN), False Positive (FP), and False Negative (FN) to represent the number of samples correctly classified as attacks, correctly classified as normal, wrongly classified as attacks, and wrongly classified as normal, respectively. Using these metrics, it is possible to define Accuracy (ACC), Precision (Pre), Recall (Rec), F-measure, Receiver Operating Characteristics (ROC) curve, and Area Under Curve (AUC) to quantify the performance of ML algorithms in performing malware detection.

\[ ACC = \frac{TP + TN}{TP + TN + FP + FN} \tag{10} \]

\[ Pre = \frac{TP}{TP + FP} \tag{11} \]

\[ Rec = \frac{TP}{TP + FN} \tag{12} \]

\[ f\text{-}measure = \frac{2 \times Pre \times Rec}{Pre + Rec} \tag{13} \]

• Accuracy indicates the number of samples that are correctly classified over the entire dataset. Since ICS datasets are imbalanced, this metric is not a good one for evaluation (see Equation 10).
• Precision indicates the number of samples that are detected correctly as attack over the total samples detected as an attack (see Equation 11).
• Recall indicates the number of samples that are detected as attack correctly over the total samples of the attack in the dataset (see Equation 12).
• F-measure is the harmonic mean of precision and recall (see Equation 13).

In the detection task, the desired class is the attack one. The attack class is considered as the positive class for the precision, recall, and f-measure metrics.

D. Feature Extraction
PCA was chosen for dimensionality reduction and also to extract the best features from super-vectors. It also improves the performance of the DT classifier by extracting uncorrelated features in an unsupervised manner.

To extract the best features using the PCA, 10-fold cross-validation was performed for each possible number of features of the dataset. The dataset's principal components were extracted in each run, and the model was trained and tested using the principal components. To keep the PCA unbiased with respect to the test data, it was fitted on the training data only. The number of principal components with the best f-measure over ten runs was then selected as the number of PCA components.

V. DISCUSSIONS
The proposed attack detection and attack attribution methods form a framework that can keep ICS/IIoT systems secure. This framework is proposed to address the challenge of ICS imbalanced data without ignoring the minority class or balancing the dataset. The proposed framework should be deployed on the physical layer to passively monitor the sensor data and give an alert when an attack happens. In such a case, the data is sent to the attribution model to detect the attack attribute. Finally, security experts and incident response teams can handle attacks and prevent potential damages using the proposed framework's efficient, accurate information.

A. The Proposed Attack Detection Method
The proposed attack detection method consists of a deep representation learning model with two unsupervised stacked autoencoders, feature extraction using the PCA, and a DT classification.

Due to the consideration of both attack and normal data in the training step, the proposed attack detection method can detect previously seen attacks with better f-measures than the other methods, as can be seen in Table I. To enhance the method's ability to face previously unseen attacks, an anomaly detection module was added to the system and trained on the normal data to capture the normal data structure and detect anomalies. The OCSVM model was used in this module.

The proposed attack detection component is scalable to larger ICS with more features and larger data sets. The only part of the system that depends on the ICS architecture is the representation-learning step, which needs more training time as the size of the system and/or the data increases. However, it will not affect the performance of the proposed framework in real implementation.
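To make the evaluation metrics of Equations 10 to 13 concrete, the short Python sketch below computes them from confusion-matrix counts, with the attack class as the positive class. The first call uses the gas pipeline class counts reported in Section IV-A to model a trivial detector that labels every sample as normal; the second call uses made-up counts purely for contrast. The function name detection_metrics and the second set of counts are hypothetical.

```python
def detection_metrics(tp, tn, fp, fn):
    """Accuracy, precision, recall, and f-measure (Equations 10 to 13).

    Ratios with an empty denominator are reported as 0.0 by convention.
    """
    total = tp + tn + fp + fn
    acc = (tp + tn) / total
    pre = tp / (tp + fp) if (tp + fp) else 0.0
    rec = tp / (tp + fn) if (tp + fn) else 0.0
    f = 2 * pre * rec / (pre + rec) if (pre + rec) else 0.0
    return {"acc": acc, "pre": pre, "rec": rec, "f": f}


# Majority-class baseline on the gas pipeline dataset: every sample is
# labeled normal, so all 60,048 attack samples become false negatives.
baseline = detection_metrics(tp=0, tn=214_580, fp=0, fn=60_048)

# Hypothetical detector, counts invented for illustration only.
detector = detection_metrics(tp=54_000, tn=210_000, fp=4_580, fn=6_048)

for name, m in (("baseline", baseline), ("detector", detector)):
    print(name, {k: f"{v:.2f}" for k, v in m.items()})
```

The baseline reaches an accuracy of roughly 78% while its recall and f-measure are zero, which is why accuracy alone is a poor indicator on imbalanced ICS data.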
1) General Performance: As observed in Table I, the proposed method outperformed the base DT model on the original representation in all metrics. Moreover, it outperformed other techniques in the f-measure metric (i.e., the harmonic mean of precision and recall, and an important metric for evaluating imbalanced datasets). In addition, the proposed attack detection method outperformed all other techniques on the SWaT dataset. In other words, the proposed attack detection method achieved good precision without affecting the recall metric on the data. As discussed earlier, accuracy is not a useful metric by which to evaluate models’ performance on imbalanced datasets; in this case, simply labeling all of the samples with the majority class label yields high accuracy (78.14% in the gas pipeline dataset and 87.9% in the SWaT dataset).
Moreover, as shown in Table II, the proposed attack detection method has a higher recall (true-positive rate) than other techniques for each attack attribute. In other words, the proposed method detects more attacks than the others when trained on only one attack type.
Table I reinforces the importance of representation learning models for ICS datasets. The proposed deep representation learning step enables the method to develop new features separately for normal and attack data in an unsupervised manner based on their patterns. In turn, these new features allow the DT to perform a more effective classification than was possible using the original features.
2) Imbalanced Testing: The higher f-measure reported in Table I shows that the proposed attack detection method achieved better performance on the imbalanced datasets. To evaluate the robustness of the proposed ensemble two-phase attack detection method for imbalanced ICS data, this study generated different sets of data with different imbalance ratios by varying the number of attack samples in the original dataset. These sets were drawn randomly from the original datasets. Next, the new datasets were fed into the proposed attack detection method and compared with several base classifiers, including DT, Logistic Regression (LR), Gradient Boosting (GB), AdaBoost M1 (AB), and Random Forest (RF). The new imbalanced sets were used for training to ensure a fair comparison, and the evaluation was performed using a predefined test set. In addition to achieving better performance in all metrics, the proposed model delivered robust, consistent performance in all metrics for both datasets (see Figure 2). Robustness refers to the low variance of the changes in the performance of the model. It indicates that the proposed attack detection method achieves high accuracy, low false positives, and high f-measures simultaneously, thereby outperforming the competing approaches. More specifically, the high f-measure of the proposed method is significant in performance evaluation for imbalanced datasets.
Beyond this, the findings suggest that the proposed method mitigates the class imbalance problem in DNNs by separating the attack and normal samples and running separate, unsupervised stacked autoencoders on each of them. Using this technique, the majority class samples’ effects on the gradient descent algorithm are avoided, enabling the autoencoders to extract more useful features from the minority set. Furthermore, the fusion layer consists of useful representations from both majority (normal) and minority (attack) data.
3) Previously Unseen Attack Detection: To detect previously unseen attacks, the OCSVM model was added to the proposed framework. OCSVM, a type of SVM, attempts to maximize the decision boundary’s margin to yield better generalization. Based on the evaluations, we observed that this method correctly detected 86.14% of previously unseen attacks in the gas pipeline dataset. Moreover, 94.53% of the previously unseen attacks were detected correctly in the SWaT dataset.
4) Execution Time Comparison: Table III compares the proposed attack detection component’s execution time with other methods proposed in the literature. As illustrated in Table III, it takes 1200 seconds to train the whole model on the SWaT dataset, while applying the trained model to the testing samples takes 2.98 seconds, or around 0.03 milliseconds per sample. Moreover, training the proposed method on the Gas Pipeline dataset takes 1115 seconds, while testing takes around 1.1 seconds, or around 0.02 milliseconds per sample. As can be seen from Table III, the proposed model is faster than most DNN-based techniques due to its simpler architecture combined with the PCA method, which makes the DT faster. Besides, the proposed attack detection component’s execution time illustrates that it can detect attack samples in almost real-time (0.02 milliseconds per sample for the Gas Pipeline dataset and 0.03 milliseconds for the SWaT dataset).

B. The Proposed Attack Attribution Method
In the proposed attack attribution method, a one-vs-all DNN classifier was responsible for extracting each attribute’s pattern and assigning a confidence of belonging to each observation. These confidences from all DNNs were passed to another DNN, which was responsible for attack attribution. Due to the close patterns of the attacks [25], this DNN did not perform well. However, it can detect attributes better than FCM. To improve the attack attribution method’s performance, this study defined a two-step method. In the first step, the aforementioned DNN determined the two best attribute candidates for the observed sample. In the second step, the observed sample was sent to a DT pre-trained on the samples of the two candidate attributes to detect the best attribute.
Using one-vs-all classifiers for each attack attribute ensures that each classifier passes its best result to the ensemble DNN model, which yields better performance, as shown here. These classifiers were connected to a DNN fusion model, which fused their extracted features to attribute the samples. Each one-vs-all classifier was a supervised DNN that encoded the input features within an 8-dimensional space and then into a 128-dimensional space using the ReLU activation function. Based on the final representation, the output layer classified it. The fusion model is another DNN; its inputs were the outputs of the one-vs-all classifiers. This fusion model decoded the input features in the 128-dimensional space, followed by a
TABLE I
COMPARISON OF THE PROPOSED ATTACK DETECTION METHOD WITH OTHER TECHNIQUES ON THE GAS PIPELINE AND SWAT DATASETS
TABLE II
COMPARISON BETWEEN THE RECALL OF THE PROPOSED ATTACK DETECTION METHOD AND OTHER TECHNIQUES ON THE GAS PIPELINE DATASET
ATTACK ATTRIBUTES
Fig. 2. Comparison of accuracy, AUC, and f-measure of the proposed attack detection method and other basic classifiers on the original representation for different attack imbalance ratios (IR): (A), (B), and (C) on the gas pipeline dataset and (D), (E), and (F) on the SWaT dataset. In the figures, PM is the proposed attack detection method, DT is the Decision Tree, LR is the Logistic Regression, GB is the Gradient Boosting, AB is the AdaBoost M1, and RF is the Random Forest.
64-dimensional space using the ReLU activation function. The output layer used the softmax activation function to attribute the observation to the given attributes (31 for the SWaT dataset and seven for the gas pipeline dataset).
As discussed in [25], running the FCM algorithm on the gas pipeline dataset with eight clusters resulted in four clusters. This implies that the attacks are so similar and share so many common features that the FCM algorithm considers them one group. To overcome this problem, this study detected the two most probable attack attributes for each sample using the ensemble model. These samples were fed into the DT classifier, which was trained on the two most probable attributes to obtain the final attack attribute. This was labelled the secondary attack attribution method. As observed in Table IV, all of the metrics improved significantly by using the final DT model (secondary attack attribution) compared with the primary attack attribution method (using the output of the DNN model) on both datasets. Thus, the attack attribution method can attribute all attacks with reasonable confidence (as a best or second-best result). Figure 3 compares the confusion matrices for the performance of the proposed primary and secondary attack attribution methods for the gas pipeline dataset. The confusion matrix for the SWaT dataset is not reported due to page limitations since it includes 36 different
TABLE III
COMPARISON OF THE TRAIN AND TEST EXECUTION TIME OF THE PROPOSED ATTACK DETECTION METHOD WITH OTHER TECHNIQUES ON THE GAS
PIPELINE AND SWAT DATASETS. IN THIS TABLE, S STANDS FOR SECONDS AND W STANDS FOR WEEKS.
attack attributes. Despite the strong evaluation results of the secondary attack attribution method, it cannot discriminate between DoS and MPCI samples due to the similar impacts of these attacks on its features.
The proposed attack attribution component is scalable to larger ICS with more features and larger data sets. However, its execution time depends on the number of attack classes and is almost independent of the system’s size (features).
1) Execution Time: Training of the proposed attack attribution component on the Gas Pipeline dataset took 1155 seconds, while the attribution over test data took 0.65 seconds, which means around 0.05 milliseconds for each sample. Moreover, training of the proposed attack attribution component on the SWaT dataset took 3452 seconds, and it classified the test data in 2.87 seconds, which means around 0.27 milliseconds for each sample. The proposed model’s training and testing execution times depend on the number of attribute classes (seven classes for the Gas Pipeline dataset vs. 31 classes for the SWaT dataset).

C. Computational Complexity
In this section, the computational complexity of the proposed attack detection and attribution methods is analyzed.
The computational complexities of training and testing the algorithms used are shown in Table V [34], [35]. In this table, n is the number of training samples, and the computational complexities were calculated for the worst-case scenario, in which the number of input features, number of neurons in each layer, number of selected support vectors, and depth of the DT are considered to be n.
1) The Proposed Attack Detection Method: As mentioned before, the proposed attack detection method consists of a novel form of deep representation learning, PCA feature extraction, and a DT classification. Each deep representation learning model has three encoding and three decoding layers. Based on Table V, the computational complexity of training the proposed deep representation learning in the worst-case scenario is $O(n^4)$, where n is the number of training samples. The other parts of this method are the PCA and DT algorithms. As mentioned in Table V, in the worst-case scenario, the PCA and DT algorithms’ computational complexity is equal to $O(n^3)$. Equation 14 shows the computational complexity of training of the proposed attack detection method.

$$O(n^4) + O(n^3) + O(n^3) = O(n^4) \tag{14}$$

which is similar to the other DNN-based detection methods in the literature.
Moreover, the testing computational complexity of the proposed attack detection method is shown in Equation 15.

$$O(n^2) + O(n) + O(1) = O(n^2) \tag{15}$$

which is similar to all other DNN-based methods (except the recurrent neural network-based methods) in the literature.
Adding the previously unseen attack detection module did not change the computational complexity of training and testing the proposed attack detection technique since the OCSVM’s training computational complexity is $O(n^3)$. In addition, its testing computational complexity is $O(n^2)$, which cannot affect the proposed attack detection method’s computational complexity.
2) The Proposed Attack Attribution Method: The proposed attack attribution method includes several one-vs-all DNNs connected using another DNN to make a deeper DNN model. The best two attribution candidates were selected using this DNN model, and a pre-trained DT on the candidate attributes was used to detect the final attributes. As a DT should be trained for every two attributes, $\frac{c \times (c - 1)}{2}$ DTs should be trained, where c is the number of attributes; each has a computational complexity of $O(n^3)$. Thus, the computational complexity of training all of the DTs is $O(c^2 \times n^3)$, where c is the number of attributes, and n is the number of training samples.
In addition to the DTs, the proposed attack attribution method used DNNs with the training computational complexity of $O(n^4)$. Combining the DTs’ and the DNN model’s training, the computational complexity of training the proposed attack attribution model is shown in Equation 16.

$$O(c^2 \times n^3) + O(n^4) = O(n^4) \tag{16}$$

where c is the number of attributes, and n is the number of training samples. Since the number of training samples is significantly larger than the number of attributes, the number of attributes is ignored in the computational complexity analysis. As seen in Equation 16, the computational complexity of training the proposed attack attribution method is similar to that of the other DNN methods.
The proposed attack attribution’s testing computational complexity is $O(n^2)$, similar to the computational complexities of the other DNN-based techniques in the literature.
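To make the number of pairwise DTs in the attribution analysis concrete, the following minimal sketch evaluates c × (c − 1) / 2 for the two class counts reported in this section (seven attributes for the gas pipeline dataset and 31 for the SWaT dataset). The helper names `pairwise_dt_count` and `candidate_pairs` are hypothetical and introduced only for this illustration.

```python
from itertools import combinations


def pairwise_dt_count(c):
    """Number of DTs needed when one DT is trained per unordered attribute pair."""
    return c * (c - 1) // 2


def candidate_pairs(attributes):
    """Enumerate the unordered attribute pairs, one per pre-trained DT."""
    return list(combinations(attributes, 2))


gas_count = pairwise_dt_count(7)     # seven attributes (gas pipeline dataset)
swat_count = pairwise_dt_count(31)   # 31 attributes (SWaT dataset)

print(gas_count)                       # 21
print(swat_count)                      # 465
print(len(candidate_pairs(range(7))))  # 21, matching the closed form
```

The count grows quadratically in c, which is consistent with the $O(c^2 \times n^3)$ term for training all of the DTs.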
TABLE IV
RESULTS OF THE PROPOSED SELF-TUNING TWO-PHASE ATTACK ATTRIBUTION METHOD ON BOTH GAS PIPELINE AND SWAT DATASETS
Fig. 3. Confusion matrices of the proposed attack attribution method on the gas pipeline dataset for (A) the proposed primary attack attribution method and
(B) the proposed secondary attack attribution method
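The two-step attribution procedure (primary attribution by the ensemble DNN, then secondary attribution by a pairwise DT) can be sketched as follows. This is only an illustration under stated assumptions: the confidence vector stands in for the softmax output of the fusion DNN, and `pairwise_classifiers` is a hypothetical lookup whose entries here are plain threshold functions rather than trained decision trees.

```python
def top_two(confidences):
    """Return the indices of the two highest-confidence attributes (primary step)."""
    ranked = sorted(range(len(confidences)),
                    key=lambda i: confidences[i], reverse=True)
    return ranked[0], ranked[1]


def attribute(sample, confidences, pairwise_classifiers):
    """Secondary step: pass the sample to the classifier for the two candidates."""
    a, b = top_two(confidences)
    key = (min(a, b), max(a, b))        # unordered pair as a sorted tuple
    classifier = pairwise_classifiers[key]
    return classifier(sample)


# Hypothetical stand-in for a DT pre-trained on attributes 1 and 3.
pairwise_classifiers = {
    (1, 3): lambda x: 3 if x[0] > 0.5 else 1,
}

# Hypothetical fusion-DNN confidences: attributes 1 and 3 are nearly tied.
confidences = [0.10, 0.45, 0.05, 0.40]

print(attribute([0.9], confidences, pairwise_classifiers))  # 3
print(attribute([0.2], confidences, pairwise_classifiers))  # 1
```

The design point illustrated is that the second stage only has to separate two candidates, so a near-tie in the primary confidences can still be resolved by a classifier specialised on that pair.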
[7] J. F. Clemente, “No cyber security for critical energy infrastructure,” Ph.D. dissertation, Naval Postgraduate School, 2018.
[8] C. Bellinger, S. Sharma, and N. Japkowicz, “One-class versus binary classification: Which and when?” in 2012 11th International Conference on Machine Learning and Applications, vol. 2, 2012, pp. 102–106.
[9] I. Goodfellow, Y. Bengio, and A. Courville, Deep learning. MIT Press, 2016. [Online]. Available: http://www.deeplearningbook.org
[10] Y. Bengio, A. Courville, and P. Vincent, “Representation learning: A review and new perspectives,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 35, no. 8, pp. 1798–1828, 2013.
[11] M. Zolanvari, M. A. Teixeira, L. Gupta, K. M. Khan, and R. Jain, “Machine Learning-Based Network Vulnerability Analysis of Industrial Internet of Things,” IEEE Internet of Things Journal, vol. 6, no. 4, pp. 6822–6834, 2019.
[12] I. A. Khan, D. Pi, Z. U. Khan, Y. Hussain, and A. Nawaz, “HML-IDS: A hybrid-multilevel anomaly prediction approach for intrusion detection in SCADA systems,” IEEE Access, vol. 7, pp. 89 507–89 521, 2019.
[13] T. K. Das, S. Adepu, and J. Zhou, “Anomaly detection in industrial control systems using logical analysis of data,” Computers & Security, vol. 96, p. 101935, 2020.
[14] J. J. Q. Yu, Y. Hou, and V. O. K. Li, “Online False Data Injection Attack Detection With Wavelet Transform and Deep Neural Networks,” IEEE Transactions on Industrial Informatics, vol. 14, no. 7, pp. 3271–3280, 2018.
[15] M. M. N. Aboelwafa, K. G. Seddik, M. H. Eldefrawy, Y. Gadallah, and M. Gidlund, “A machine-learning-based technique for false data injection attacks detection in industrial iot,” IEEE Internet of Things Journal, vol. 7, no. 9, pp. 8462–8471, 2020.
[16] W. Yan, L. K. Mestha, and M. Abbaszadeh, “Attack detection for securing cyber physical systems,” IEEE Internet of Things Journal, vol. 6, no. 5, pp. 8471–8481, 2019.
[17] A. Cook, A. Nicholson, H. Janicke, L. Maglaras, and R. Smith, “Attribution of Cyber Attacks on Industrial Control Systems,” EAI Endorsed Transactions on Industrial Networks and Intelligent Systems, vol. 3, no. 7, p. 151158, 2016.
[18] L. Maglaras, M. Ferrag, A. Derhab, M. Mukherjee, H. Janicke, and S. Rallis, “Threats, Countermeasures and Attribution of Cyber Attacks on Critical Infrastructures,” ICST Transactions on Security and Safety, vol. 5, no. 16, p. 155856, 2018.
[19] M. Alaeiyan, A. Dehghantanha, T. Dargahi, M. Conti, and S. Parsa, “A Multilabel Fuzzy Relevance Clustering System for Malware Attack Attribution in the Edge Layer of Cyber-Physical Networks,” ACM Transactions on Cyber-Physical Systems, vol. 4, no. 3, pp. 1–22, 2020.
[20] U. Noor, Z. Anwar, T. Amjad, and K.-K. R. Choo, “A machine learning-based FinTech cyber threat attribution framework using high-level indicators of compromise,” Future Generation Computer Systems, vol. 96, pp. 227–242, 2019.
[21] S. Wold, K. Esbensen, and P. Geladi, “Principal component analysis,” Chemometrics and Intelligent Laboratory Systems, vol. 2, no. 1, pp. 37–52, 1987, proceedings of the Multivariate Statistical Workshop for Geologists and Geochemists.
[22] A. N. Jahromi, J. Sakhnini, H. Karimpour, and A. Dehghantanha, “A deep unsupervised representation learning approach for effective cyber-physical attack detection and identification on highly imbalanced data,” in Proceedings of the 29th Annual International Conference on Computer Science and Software Engineering, ser. CASCON ’19. USA: IBM Corp., 2019, p. 14–23.
[23] T. Morris, Z. Thornton, and I. Tunipseed, “Industrial control system simulation and data logging for intrusion detection system research,” in 7th Annual Southeastern Cyber Security Summit, 2015.
[24] J. Goh, S. Adepu, K. N. Junejo, and A. Mathur, “A dataset to support research in the design of secure water treatment systems,” in Critical Information Infrastructures Security, G. Havarneanu, R. Setola, H. Nassopoulos, and S. Wolthusen, Eds. Cham: Springer International Publishing, 2017, pp. 88–99.
[25] S. N. Shirazi, A. Gouglidis, K. N. Syeda, S. Simpson, A. Mauthe, I. M. Stephanakis, and D. Hutchison, “Evaluation of anomaly detection techniques for scada communication resilience,” in 2016 Resilience Week (RWS), 2016, pp. 140–145.
[26] J. Inoue, Y. Yamagata, Y. Chen, C. M. Poskitt, and J. Sun, “Anomaly detection for a water treatment system using unsupervised machine learning,” IEEE International Conference on Data Mining Workshops, ICDMW, vol. 2017-November, pp. 1058–1065, 2017.
[27] M. Kravchik and A. Shabtai, “Detecting cyber attacks in industrial control systems using convolutional neural networks,” Proceedings of the ACM Conference on Computer and Communications Security, no. 1, pp. 72–83, 2018.
[28] S. D. Anton, A. Hafner, S. Sinha, and H. Schotten, “Anomaly-based intrusion detection in industrial data with SVM and random forests,” in the 27th International Conference on Software, Telecommunications and Computer Networks (SoftCOM). IEEE, 2019.
[29] M. Kravchik and A. Shabtai, “Efficient cyber attack detection in industrial control systems using lightweight neural networks and pca,” IEEE transactions on dependable and secure computing, pp. 1–1, 2021.
[30] D. Li, D. Chen, B. Jin, L. Shi, J. Goh, and S. K. Ng, “MAD-GAN: Multivariate anomaly detection for time series data with generative adversarial networks,” Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), vol. 11730 LNCS, pp. 703–716, 2019.
[31] Q. Lin, S. Verwer, S. Adepu, and A. Mathur, “TABOR: A graphical model-based approach for anomaly detection in industrial control systems,” ASIACCS 2018 - Proceedings of the 2018 ACM Asia Conference on Computer and Communications Security, pp. 525–536, 2018.
[32] C. Feng, T. Li, and D. Chana, “Multi-level anomaly detection in industrial control systems via package signatures and lstm networks,” in 2017 47th Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN), 2017, pp. 261–272.
[33] M. Macas and W. Chunming, “Enhanced cyber-physical security through deep learning techniques,” CEUR Workshop Proceedings, vol. 2457, no. 38, 2019.
[34] C.-t. Chu, S. Kim, Y.-a. Lin, Y. Yu, G. Bradski, K. Olukotun, and A. Ng, “Map-reduce for machine learning on multicore,” in Advances in Neural Information Processing Systems, B. Schölkopf, J. Platt, and T. Hoffman, Eds., vol. 19. MIT Press, 2007, pp. 281–288.
[35] J. Su and H. Zhang, “A fast decision tree learning algorithm,” in Proceedings of the 21st National Conference on Artificial Intelligence - Volume 1, ser. AAAI’06. AAAI Press, 2006, p. 500–505.