2020 IEEE-HYDCON

A Semi-Supervised Approach for Detection of SCADA Attacks in Gas Pipeline Control Systems
Chaitali Joshi, Janavi Khochare, Jash Rathod, Faruk Kazi
Veermata Jijabai Technological Institute
Mumbai, India 400019
cpjoshi [email protected], jskhochare [email protected], jsrathod [email protected], [email protected]

Abstract: The imperative role played by Supervisory Control And Data Acquisition (SCADA) systems in providing centralized control for modern infrastructure, together with their rapid evolution and widespread adoption, has made them one of the most desired targets for malicious attackers. To counter these attacks, more robust approaches must be adopted. Machine Learning has shown great potential for use alongside existing Intrusion Detection Systems (IDS). This paper presents a novel approach to detect malicious behaviour in SCADA data used to control a gas pipeline system. As most of the data available in this industry is unlabeled, this paper uses a Semi-Supervised Deep Learning architecture, the Autoencoder, which is believed to be well suited to this type of task. The effectiveness of this deep learning network is due to the fact that it reconstructs the input as the output and, in the training process, learns only the most important features of normal observations that are representative of the input data; thus malicious data is easily detected due to a high reconstruction error. The proposed algorithm is validated on a gas pipeline control system dataset and found to give excellent results in detection.

Index Terms: SCADA, Machine Learning, Autoencoder, Intrusion Detection, Gas Pipelines

978-1-7281-4994-3/20/$31.00 ©2020 IEEE
Authorized licensed use limited to: Auckland University of Technology. Downloaded on December 24,2020 at 15:24:05 UTC from IEEE Xplore. Restrictions apply.

I. INTRODUCTION

Various industrial processes such as oil and gas pipelines, water distribution, electrical power grids, manufacturing, wastewater, recycling, etc., are managed and supervised by Supervisory Control and Data Acquisition (SCADA) systems, which are Industrial Control Systems (ICS). SCADA serves as the medium to monitor and uphold efficiency and enable smarter decisions. SCADA systems facilitate access to real-time data at a particular plant from anywhere in the world. Businesses, organizations, governments, and individuals can make data-driven decisions by effectively accessing and using this real-time information. A basic SCADA system comprises a Human-Machine Interface, Supervisory System, Remote Terminal Units, Programmable Logic Controllers (PLCs), Communication Infrastructure, and SCADA Programming [1].

SCADA systems that link facilities like gas pipeline systems might have been constructed keeping in view the above-mentioned advantages, but these systems are not always shielded from attacks. SCADA systems play a valuable role in industries and hence are vulnerable to cyber-attacks, as discussed in [2], [3]. In the absence of necessary security measures, cyber-attacks on SCADA systems can cause severe damage to critical systems such as oil and gas pipelines, power grids, etc. Another key concern is the capital expenditure, as SCADA systems are worth tens of thousands to millions of dollars, and an attack can prove costly for the organization and for national security. Considering the critical role that SCADA systems play in various industries, it is of utmost importance to implement security measures that prevent attacks on these systems, which may affect the various entities dependent on them. Hence, to protect these systems and all the entities dependent upon them, it is essential that powerful security measures are incorporated to protect SCADA systems from the disturbances caused by external or internal errors and attacks.

A solution to the presented scenario is provided by Intrusion Detection Systems (IDS), as in [4], [5], which prevent the above-mentioned issues from affecting the organization. The task of intrusion detection involves monitoring the systems and interpreting the results obtained to determine their security status. A robust IDS should be able to detect as many attacks as possible and distinguish between them correctly in order to safeguard SCADA. Another important metric that needs to be considered is the number of false alarms that the IDS generates: the fewer such alarms, the more reliable the system. The extent of damage caused is a major concern; hence detecting attacks within a specific time period is equally important.

Prevalent IDS deployments rely on humans who analyze the alerts and then decide on the further course of action. This involves critical decision making, as the number of alerts generated can be quite large and the conditions under which the data is collected are dynamic. A possible solution to the above-listed problems is presented by machine learning [6] and neural networks [7], as they can make predictions about new data based on the knowledge gained from previous data. This reduces the time and energy invested in intrusion detection. Machine learning techniques prove to be more versatile for intrusion detection than human analysts, as they single-handedly perform the various security tasks, which involve detection as well as monitoring, more efficiently.

Artificial Intelligence (AI) and machine learning (ML) have shown potential for use in a wide variety of domains like share price prediction [8] and disease detection in crops [9]. These approaches have also been found to offer an edge in numerous security applications over the prevalent machine-implemented variants. In [4], a non-parametric, semi-supervised learning algorithm has been implemented for the problem of network intrusion. It claims to provide significant improvements when compared to anomaly detection and supervised learning approaches. Another work proposed the development of machine learning algorithms like KNN, Naive Bayes, Logistic Regression, Decision Tree, and Random Forest [10]. These models were tested on online as well as offline networks for the detection of cyber-attacks on a SCADA system testbed.

The task of finding attacks in SCADA system data is quite different from other applications of machine learning, making it difficult to employ machine learning approaches for the detection of malicious activity [11]. With advancements in research and technology, modern systems have crossed this hurdle. The versatility of these modern systems is due to the fact that they are able to learn from previously fed data and use this information in the future when confronted by a similar condition. Thus, these systems can be implemented in the various domains of security in order to prevent cyber-attacks.

Attempts have been made to implement machine learning techniques for fault detection in SCADA systems. This work intends to advance the prior work by using more robust algorithms and more novel techniques to catch malicious SCADA activities. There are several approaches to training a machine learning model that is efficient in detecting faults. Prior work has implemented Supervised Learning, Unsupervised Learning, and Semi-Supervised Learning approaches.

As in [10], [12], [13], various machine learning algorithms like Naive Bayes, Support Vector Machines and Random Forest have been implemented to achieve this goal. These works make use of only Supervised Learning algorithms. Supervised Learning methods can be much more accurate than Unsupervised Learning methods, owing to the fact that they require prior knowledge of what the outputs (or results) should be. In this approach, the algorithm takes the input and predicts the output, which is then compared with the true output. Hence, this approach generally produces good results but can also produce undesirable results if tested on a category of data that it never encountered during training. This scenario is very common in Intrusion Detection Systems (IDS). Also, because Supervised Learning cannot be used for tasks that involve real-time data analysis such as ours, it is better to adopt an Unsupervised or Semi-Supervised Learning approach.

For the past few years, work has been done using semi-supervised learning to provide benefits and results in this area. As discussed in [4], especially for IDS, semi-supervised approaches have shown signs of being able to achieve exceptional results. A semi-supervised approach involves the usage of both labeled and unlabeled data. This approach proves to be a better choice as it can actually be put to use in industry and can augment the existing approaches there, where real-time data can be fed to obtain real-time results, unlike supervised techniques. There have been attempts using various statistical and basic machine learning algorithms [14] and also complex machine learning and deep learning approaches [15] for the task of fault detection. As concluded in [15], the use of Deep Learning algorithms for this purpose proves to be the best fit for real-time fault diagnostics. Hence, our work aims at using a semi-supervised learning technique by implementing a deep neural network architecture that best suits the type of task at hand.

The main contributions of this paper are:
• developing an Autoencoder network for providing a Deep Learning approach to an anomaly-based Intrusion Detection System
• providing a solution for identification of a malicious SCADA attack in a gas pipeline control system
• extending the approach for determining the class of attack for greater insights

The remainder of the paper is structured as follows. Section II explains various terminologies related to the experiments in this paper. Section III gives an overview of the experimental data used, the research problem and the approach to solve it. Section IV describes the results obtained on applying the techniques described in this paper. Section V compares the approach taken in this paper with the existing work. Finally, Section VI concludes the paper and suggests directions for future research.

II. PRELIMINARIES

A. Autoencoders

Autoencoders (AE) are a family of neural networks for which the output is similar to the input, as described in [16]. The input is first compressed into a latent-space representation and this representation is decompressed to obtain an output [17]. Autoencoders are based on the idea of data compression and work on data which is similar to the data on which the network has been previously trained. An autoencoder has to be trained for specialized tasks only; that is, an AE trained on Dataset A cannot be used for classification tasks on Dataset B, where Dataset A and Dataset B are not of the same kind. The functions are also lossy, i.e., the outputs from the decoder will be degraded compared to the data which is input to the encoder.

An autoencoder learns the pattern of the input data and, based on the trends observed, is able to distinguish between the data it was trained on and the test data. The key feature of autoencoders is that the hidden layer has fewer nodes than the input and output layers, which are the same size. The hidden layer is a compressed representation, and involves two sets of weights (and biases) that encode input data into the compressed representation and


decode the compressed representation into a form which is similar to the input data. The Euclidean distance loss is called the reconstruction error. This error is an indicator of how close the output is to the input: the smaller the error, the greater the similarity of the output to the input. A perfect reconstruction is not obtained, as the number of neurons in the hidden layer is less than the number of neurons in the input layer, but the parameters are expected to yield the best possible reconstruction [18].

Fig. 1: Autoencoder architecture

As in [19], [20], let $\{x_1, x_2, x_3, \ldots, x_n\}$ be our training set, each being a $d$-dimensional vector ($x_i \in \mathbb{R}^d$), and let $\{x'_1, x'_2, x'_3, \ldots, x'_n\}$ be the reconstructed outputs. The reconstruction error is defined as:

$$\varepsilon(x_i, x'_i) = \sum_{j=1}^{d} (x_{ij} - x'_{ij})^2 \tag{1}$$

where $x_{ij}$ denotes the $j$-th component of $x_i$.

The input vectors ($x_i \in \mathbb{R}^d$) are compressed into $p$ ($p < d$) neurons which form the hidden layer of the encoder. The activation of neuron $i$ in the hidden layer is given by:

$$h_i = f_\theta(x) = \sigma\Big(\sum_{j=1}^{d} W^{\mathrm{input}}_{ij} x_j + b^{\mathrm{input}}_i\Big) \tag{2}$$

where $x$ is the input vector, $\theta$ denotes the parameters $\{W^{\mathrm{input}}, b^{\mathrm{input}}\}$, $W^{\mathrm{input}}$ is the encoder weight matrix of size $p \times d$ and $b^{\mathrm{input}}$ is a bias vector of size $p$. Thus, a smaller-dimensional vector is obtained by encoding the input vector.

The obtained hidden representation $h_i$ is then decoded back to the initial input space $\mathbb{R}^d$. The function for this decoder neural network is as follows:

$$x'_i = g_{\theta'}(h) = \sigma\Big(\sum_{j=1}^{p} W^{\mathrm{hidden}}_{ij} h_j + b^{\mathrm{hidden}}_i\Big) \tag{3}$$

The parameters of the decoder are $\theta' = \{W^{\mathrm{hidden}}, b^{\mathrm{hidden}}\}$. The function $\sigma$, used in Eq. 2 and Eq. 3, can be a linear or nonlinear activation function. In our model, the tanh and ReLU functions are used and computed as follows:

$$\mathrm{Tanh}: \sigma(x) = \frac{e^{x} - e^{-x}}{e^{x} + e^{-x}} \tag{4}$$

$$\mathrm{ReLU}: \sigma(x) = \max(0, x) \tag{5}$$

When we train our autoencoder, we try to minimize the reconstruction error function with respect to $\theta$ and $\theta'$:

$$\theta^*, {\theta'}^* = \arg\min_{\theta, \theta'} \frac{1}{n} \sum_{i=1}^{n} \varepsilon(x_i, x'_i) = \arg\min_{\theta, \theta'} \frac{1}{n} \sum_{i=1}^{n} \varepsilon\big(x_i, g_{\theta'}(f_\theta(x_i))\big) \tag{6}$$

After training with the above equations, test data is fed to the network to obtain the reconstruction error for each sample in the test dataset. The following rule can be used to distinguish between an attack and a normal case:

$$c(x_i) = \begin{cases} \text{normal} & \text{if } \varepsilon_i < \lambda \\ \text{malicious} & \text{if } \varepsilon_i > \lambda \end{cases} \tag{7}$$

where $\varepsilon_i = \varepsilon(x_i, x'_i)$ and $\lambda$ is a threshold on the reconstruction error.

B. Softmax Classifier

The intention behind using the softmax classifier is to implement the task of multi-class classification. Just as logistic regression generates outputs as probabilities in the case of binary classification, the softmax classifier assigns decimal probabilities to each of the classes in a multiclass classification problem. The softmax classifier can be seen as a generalization of the binary form of logistic regression. The probabilities generated by the exponentiation and normalization lie in the range of 0 to 1 and must add up to 1. The implementation of a softmax classifier involves using a neural network layer just before the output layer. The key feature of this classifier is that its number of nodes is the same as that of the output layer. For k classes and i = 0, 1, 2, ...k, the mathematical representation of Softmax is given as Eq. 8.

$$\mathrm{Softmax}(x_i) = \frac{e^{x_i}}{\sum_{j=0}^{k} e^{x_j}} \tag{8}$$

C. Fine Tuning

The purpose of fine-tuning is to generate better performing models. For the model presented in this paper, fine-tuning has been performed using supervised learning. The reason for employing supervised learning is that the target classes are known. The network is trained using training data with a method similar to the training of neural networks. The main point to note is that only the encoder part of the autoencoders has been used to carry out the training process.
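As a rough illustration of how Eq. 1, Eq. 7 and Eq. 8 fit together, the following NumPy sketch computes per-sample reconstruction errors, applies a threshold, and evaluates a softmax. It is a minimal sketch under stated assumptions, not the implementation used in this paper: the function names, the toy data and the threshold value `lam=0.5` are hypothetical, and the reconstructions are supplied directly instead of coming from a trained autoencoder.

```python
import numpy as np

def reconstruction_error(x, x_rec):
    """Eq. 1: squared Euclidean distance between each sample and its reconstruction."""
    x = np.asarray(x, dtype=float)
    x_rec = np.asarray(x_rec, dtype=float)
    return np.sum((x - x_rec) ** 2, axis=1)

def classify(errors, lam):
    """Eq. 7: 'malicious' when the error exceeds the threshold lam, otherwise 'normal'.

    Eq. 7 does not say what happens when the error equals lam;
    treating that case as 'normal' is an assumption of this sketch.
    """
    return np.where(np.asarray(errors) > lam, "malicious", "normal")

def softmax(z):
    """Eq. 8: exponentiate and normalise so that the outputs sum to 1."""
    z = np.asarray(z, dtype=float)
    e = np.exp(z - np.max(z))  # subtracting the max only improves numerical stability
    return e / e.sum()

# Toy data (made up): two samples with d = 3 features and stand-in reconstructions.
x = np.array([[0.2, 0.5, 0.1],
              [0.9, 0.1, 0.8]])
x_rec = np.array([[0.2, 0.4, 0.1],
                  [0.1, 0.9, 0.2]])

errors = reconstruction_error(x, x_rec)   # one error value per sample
labels = classify(errors, lam=0.5)        # lam is an illustrative threshold
probs = softmax([2.0, 1.0, 0.1])          # class probabilities for three scores

print(errors, labels, probs)
```

In this toy case the first sample is reconstructed closely and falls below the threshold, while the second is reconstructed poorly and is flagged. How the threshold is chosen in practice is not specified above and would need to be set from the data.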


D. Confusion Matrix TABLE I: Two-Class Confusion Matrix


A Confusion Matrix is a table in which the predictions Predicted
made by a machine learning model are tabulated. Table I is a Malicious Normal
confusion matrix for Binary classification. Depending on the Malicious True Positive (TP) False Negative (FN)
Actual
Normal False Positive (FP) True Negative (TN)
predicted value and the actual value, the observation is put into
the respective cell. The performance of the machine learning
model is commonly evaluated using data in the matrix. TABLE II: Dataset feature description [12]
Entries in the matrix have the following meaning in the
context of study at hand: Feature Name Description
Command/Response Specifies data is of type “command” or
• TP is the number of observations that were predicted to
“response”
be an attack and were actually an attack. Control Mode Gives auto control mode status of RTU unit
• FN is the number of observations that were predicted to Control Scheme Indicates if the control is achieved by the use
of solenoid
be normal but were actually an attack. Data Length Length of the data of type “command” or
• FP is the number of observations that were predicted to “response”
be an attack but were actually normal. Invalid Data Length Unacceptable size of data element
• TN is the number of observations that were predicted to Invalid Function Value giving indication of validity of function
Code code
be normal and were actually normal. Pipeline Pressure Denotes pipeline pressure of the gas pipeline
These four parameters can be used to obtain a set of per- (PSI) (measured in pounds per square inch)
Pump State Variable indicating state of pump
formance metrics. Of all the performance metrics, Precision,
Setpoint Descriptive of the setpoint value
Recall, and F1 Score will be used in this paper for evaluation Solenoid State Variable indicating state of solenoid
of our results and comparison with the existing work. Each of
these metrics can be evaluated using the formulae:
TP • Burst Values - Multiple values of pipeline pressure are
P recision = (9)
TP + FP sent at a high frequency.
TP • Fast Change - The parameters sent change very fast as
Recall = (10) compared to the rate at which they change during a
TP + FN
normal instance.
2 × P recision × Recall
F 1 Score = (11) • Single Data Injection - An artificial response in which the
P recision + Recall value is changed to cheat the control loop is sent after an
III. M ETHODOLOGY actual response
This section highlights our way of addressing the machine • Slow Change - The parameters sent change slowly as
learning algorithms for classifying the disturbances in the compared to the rate at which they change during a
system. Here, we describe the deep learning method used and normal instance.
its advantages over the other traditional methods. • Value Wave Injection - Multiple values of oscillating
pipeline pressure are sent.
A. Experimental Data
• Setpoint Value Injection – The false pipeline pressure
The data for the experiment was collected from a Gas values are sent which are equal to setpoint.
Pipeline System, Mississippi State University’s Critical Infras-
During a command injection attack, commands issued are
tructure Protection Center [21]. The dataset comprises samples
changed in order to influence the gas pipeline control. The
of Command Injection attacks, Data Injection attacks, and
four types of command injection attacks are:
normal instances i.e. cases that do not represent an attack. The
data generated for the experiments as mentioned in [12] was • Address Scan – Data with a variety of addresses is sent
used as the dataset for testing the algorithm mentioned in the to scan and understand the system.
Methodology. The various features as mentioned in [12] which • Function Scan – Used to obtain the function codes.
have been considered for drawing inferences are included in • Illegal Setpoint – The values of PSI and setpoint are
Table II. modified and usually set at a very high value.
The data used in the experiments as mentioned in [12] • Illegal PID Command – The functioning of the control
involves the Data/Response Injection attacks with the intent of loop is modified by changing its parameters.
manipulating PLC responses to cheat an analyst or a security The aim of this work is to emphasize the versatility of
system and keep them from knowing the original state of the semi-supervised learning algorithms, especially autoencoders
system. The data injection attacks have been categorized into in the classification tasks in this domain. The metrics used to
7 types according to how each influences the PLC response compare the performance of autoencoders as opposed to that
[12]: of traditional supervised learning algorithms are precision and
• Negative Values - Since pipeline pressure is non-negative, recall. The proposed solution targets at correctly classifying
the negative values injected are invalid. an observation as normal or malicious as well as obtaining

Authorized licensed use limited to: Auckland University of Technology. Downloaded on December 24,2020 at 15:24:05 UTC from IEEE Xplore. Restrictions apply.
2020 IEEE-HYDCON

the type of attack that has occurred, if malicious. Thus, the that the algorithm may consider instances of attack as being
problem has been treated as being both a binary as well as noise and ignore such examples. As a result, we may end up
a multiclass classification problem. The greatest merit of this having very less or no case of intrusion, with the examples
approach is that a malicious attack type that is identified by being discarded as noise or being misclassified as being
multiclass methodology can contribute towards taking coun- instances of normal behavior. Either way may prove to be
termeasures quicker or troubleshooting issues in the system if very harmful for infrastructures as important as SCADA.
any. In semi-supervised techniques this may not be the case as
The two methods of detection in an IDS include misuse the anomalous cases may not be straight away discarded as
detection and anomaly detection. Anomaly detection entails being noise but may be considered as deviations from normal
detecting the patterns of normal cases first and then differen- behavior and such activity may be effectively detected. This
tiating the anomalous or cases of attack from the normal ones serves as the principal advantage of semi-supervised learning
when a deviation from normal behavior is observed. Since the techniques over supervised learning techniques.
approach adopted in this paper involves using Autoencoders
for detecting anomalous cases using the properties of normal B. Data Segregation
cases, it can be classified as being an anomaly detection The dataset consists of column ‘Label’ which defines the
problem. different types of data injection attacks. The rows labeled
Machine learning can be subdivided into three main tech- ‘normal’ are treated as an instance of the normal operating
niques: Supervised, Semi-supervised, and Unsupervised. The conditions while others are treated as attacks in the case of
difference between these techniques arises on the basis of binary classification. For multiclass classification, each type
whether the target variable which is also called a label, is of attack and normal data are treated as a separate class.
provided or is supposed to be determined. In Supervised
learning, entire data is labeled so the main aim in Supervised C. Classification Technique
learning is to develop a model that best approximates how Fig. 2 shows the proposed methodology.
the input and the output are related to each other. Whereas in 1) Binary Classification: The data obtained by binary data
Semi-supervised learning [22], [23], only limited amount of segregation was fed as an input to the autoencoder as men-
data is labeled and the majority of data is unlabeled data, this tioned in Fig. 3. The Autoencoder will ignore the instances
labeled data is used as the training data; based on this training labeled as malicious, and train only on the normal instances.
data the model learns to analyze the unlabelled data and then The Autoencoder has now learned the features of the normal
unlabelled data is predicted (labeled) in the testing period. instances. As the Autoencoder is trained, now it will be able to
Lastly in Unsupervised learning the entire data is unlabelled, predict any new data that is a part of normal instances as they
so the main task in Unsupervised learning is to understand all have the same type of distribution and so the reconstruction
the pattern of the data [24]. The machine learns to group the error will be negligible. If we try to reconstruct from the
unsorted data on the basis of the resemblance, patterns, and malicious instances, the Autoencoder will struggle. This will
variations with the prior training data. make the reconstruction error high during the process. We
A majority of real-world datasets available in this domain can catch such high reconstruction errors and label them as
are unsupervised [24] or semi-supervised [22], [23] and hence, an instance of malicious data [26], [27]. This procedure is
using autoencoders which can be treated as a semi-supervised similar and contributes towards anomaly detection methods as
learning algorithm is a suitable approach to work on such in [28]–[31].
datasets. As mentioned above in semi-supervised learning, 2) Multiclass Classification: As described in [18], [32],
very limited labeled data and the majority of unlabeled data classification is done using an autoencoder along with a Deep
is available, the model is trained on labeled data such that Neural Network (DNN) (Fig. 6). The DNN is so constructed as
the trained model is used to label the unlabelled data. It to contain a Stacked Autoencoder (SAE) [33] and a Softmax
is much more convenient to obtain unlabeled data from a layer (Fig. 5). The training data is input to the first autoencoder
reliable source rather than labeled data which involves human and it tries to reconstruct the input as described above. The
interference, this can be noted as a crucial difference between output of this autoencoder is given as an input to the second
supervised and unsupervised techniques. In the world of autoencoder (Fig. 4) and similarly tries to reconstruct the input.
growing data, it is very difficult to manually label each and The output of this layer is then fed to a softmax classifier layer
every data for the algorithm to train and so shifting towards for training with original labels being the target vector. To
semi-supervised and unsupervised can be beneficial for the improve the performance of DNN, fine-tuning is implemented,
future. which is carried out in a supervised manner by retraining the
It is also necessary that all the results are obtained in network with training data.
real-time. Supervised and unsupervised learning algorithms
have been found to be less effective in intrusion detection IV. R ESULTS
scenarios [25]. Also a case of attack is a rare event i.e. In this section, we have presented the results obtained after
the number of such instances in the dataset is less. When applying autoencoders to classify the given instances as normal
a supervised learning algorithm is used, it may be possible or attack. The results have been obtained in the form of

Authorized licensed use limited to: Auckland University of Technology. Downloaded on December 24,2020 at 15:24:05 UTC from IEEE Xplore. Restrictions apply.
2020 IEEE-HYDCON

Fig. 3: Autoencoder 1

Fig. 4: Autoencoder 2

Fig. 2: Attack detection methodology


Fig. 5: Softmax

TABLE III: Distribution of Data

Command Injection Response Injection


Category Dataset Instances Dataset Instances
Normal Normal 28086 Normal 16510

Malicious Address Scan 2 Burst 217


Function Scan 9 Fast 225
Illegal Setpoint 197 Negative 119
Illegal PID 49 Setpoint 171
Single 33
Slow 306
Wave 253

Fig. 6: DNN

Authorized licensed use limited to: Auckland University of Technology. Downloaded on December 24,2020 at 15:24:05 UTC from IEEE Xplore. Restrictions apply.
2020 IEEE-HYDCON

Precision, Recall, and F1 score on applying autoencoders for V. C OMPARISION W ITH E XISTING W ORK
both binary and multi-class classification. In this section, a comparison between the results obtained
A. Command Injection Results by the method suggested in this paper and those obtained
Autoencoder achieved a Recall, Precision, and hence F1 in [12] has been presented. The results obtained for binary
Score value of 1.00 for all the classes of multiclass dataset. classification as well as multiclass classification in Command
This implies that all the attacks along with their types were Injection instances are unity Precision and Recall which are
correctly captured. Table IV shows the results for the com- more desirable than those obtained by approach suggested by
mand injection binary classification. Again, the value of 1.00 [12]. The result obtained for binary classification in Response
was achieved for Recall, Precision, and hence F1 Score, indicating that every observation fed to the autoencoder was correctly classified as normal or malicious. From these results, we conclude that autoencoders can detect Command Injection attacks exceptionally well.

TABLE IV: Command Injection Binary Classification Results

Measures     Results
Precision    1.00
Recall       1.00
F1 Score     1.00

B. Response Injection Results
Fig. 7 shows the Response Injection results for multiclass classification. The autoencoder achieved a Recall, Precision, and F1 Score of 1.00 for most classes; for the Setpoint and Single classes the values were around 0.98, with no metric falling below 0.96.
Table V shows the results for Response Injection binary classification. Precision reached 1.00, whereas Recall degraded to 0.11 and, consequently, F1 Score to 0.20.

TABLE V: Response Injection Binary Classification Results

Measures     Results
Precision    1.00
Recall       0.11
F1 Score     0.20

... Injection instances is on par with the results obtained by prior work. The metric values obtained for multiclass classification of Response Injection instances are above 0.97, which outperforms the results achieved by the existing work. Thus, comparing the results obtained by the methods in [12] with those achieved by the approach proposed in this paper, it is evident that an autoencoder network surpasses the performance of traditional methods and basic machine learning algorithms. By extension, this serves as evidence that semi-supervised and unsupervised approaches can outperform supervised approaches, given that robust data preprocessing and feature engineering have been done.

VI. CONCLUSION
SCADA systems play a valuable role, which makes them targets for cyber-attacks. To protect these systems, Intrusion Detection Systems (IDS) are deployed to detect attacks and prevent them from affecting the organization. This paper presented a machine learning approach that can augment the existing techniques applied in these systems: a Semi-Supervised Autoencoder network that specializes in detecting the anomalous behavior of malicious observations.
From the results mentioned above, the autoencoder model surpasses the machine learning algorithms implemented in prior work [12]. In a world of growing unstructured and unlabeled data, supervised machine learning algorithms have been shown to perform poorly, so it has become necessary to move towards Semi-Supervised and Unsupervised approaches, which keep improving as the size of the data increases, as it does in real-world scenarios.
To better maintain and improve Intrusion Detection Systems that employ machine learning approaches, it is crucial to answer the question of why the model predicted an observation as normal or malicious. Hence, future directions of this work aim at interpreting the predictions made by the Autoencoder model using interpretable machine learning techniques. This will help us gain better insights into our model as well as the nature of the attacks.
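
As a quick consistency check on Tables IV and V, the F1 Score can be recomputed from the reported Precision and Recall, since it is their harmonic mean. The following is a minimal Python sketch; the function name f1_score is hypothetical and not taken from the authors' implementation, and the inputs are simply the values reported in the tables.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (hypothetical helper).

    Returns 0.0 when both inputs are zero to avoid division by zero.
    """
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and Recall as reported in Table IV (Command Injection)
# and Table V (Response Injection).
command_injection_f1 = f1_score(1.00, 1.00)
response_injection_f1 = f1_score(1.00, 0.11)

print(f"Command Injection F1:  {command_injection_f1:.2f}")
print(f"Response Injection F1: {response_injection_f1:.2f}")
```

For Response Injection the harmonic mean is pulled towards the low Recall, which is why a Precision of 1.00 still yields an F1 Score of only about 0.20.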

Fig. 7: Response Injection Multiclass Classification Results

REFERENCES
[1] Y. Zhang, L. Wang, Y. Xiang, and C. Ten, “Power System Reliability Evaluation With SCADA Cybersecurity Considerations,” IEEE Transactions on Smart Grid, vol. 6, no. 4, pp. 1707–1721, Jul. 2015, doi: 10.1109/TSG.2015.2396994.
[2] Y. Zhang, Y. Xiang, and L. Wang, “Power System Reliability Assessment Incorporating Cyber Attacks Against Wind Farm Energy Management Systems,” IEEE Transactions on Smart Grid, vol. 8, no. 5, pp. 2343–2357, Sep. 2017, doi: 10.1109/TSG.2016.2523515.


[3] M. Zolanvari, M. A. Teixeira, L. Gupta, K. M. Khan, and R. Jain, “Machine Learning-Based Network Vulnerability Analysis of Industrial Internet of Things,” IEEE Internet Things J., vol. 6, no. 4, pp. 6822–6834, Aug. 2019, doi: 10.1109/JIOT.2019.2912022.
[4] C. T. Symons and J. M. Beaver, “Nonparametric semi-supervised learning for network intrusion detection: combining performance improvements with realistic in-situ training,” in Proceedings of the 5th ACM workshop on Security and artificial intelligence - AISec ’12, Raleigh, North Carolina, USA, 2012, p. 49, doi: 10.1145/2381896.2381905.
[5] S. L. P. Yasakethu and J. Jiang, “Intrusion Detection via Machine Learning for SCADA System Protection,” presented at the 1st International Symposium for ICS SCADA Cyber Security Research 2013 (ICS-CSR 2013), Sep. 2013, doi: 10.14236/ewic/ICSCSR2013.12.
[6] A. F. S. Prisco and M. J. Freddy Duitama, “Intrusion detection system for SCADA platforms through machine learning algorithms,” in 2017 IEEE Colombian Conference on Communications and Computing (COLCOM), Cartagena, Colombia, Aug. 2017, pp. 1–6, doi: 10.1109/ColComCon.2017.8088210.
[7] H. Yang, L. Cheng, and M. C. Chuah, “Deep-Learning-Based Network Intrusion Detection for SCADA Systems,” in 2019 IEEE Conference on Communications and Network Security (CNS), Jun. 2019, pp. 1–7, doi: 10.1109/CNS.2019.8802785.
[8] A. K. Goel, R. Chakraborty, M. Agarwal, Md. D. Ansari, S. K. Gupta, and D. Garg, “Profit or Loss: A Long Short Term Memory based model for the Prediction of share price of DLF group in India,” in 2019 IEEE 9th International Conference on Advanced Computing (IACC), Tiruchirappalli, India, Dec. 2019, pp. 120–124, doi: 10.1109/IACC48062.2019.8971601.
[9] M. Agarwal, V. K. Bohat, Mohd. D. Ansari, A. Sinha, S. Kr. Gupta, and D. Garg, “A Convolution Neural Network based approach to detect the disease in Corn Crop,” in 2019 IEEE 9th International Conference on Advanced Computing (IACC), Tiruchirappalli, India, Dec. 2019, pp. 176–181, doi: 10.1109/IACC48062.2019.8971602.
[10] M. Teixeira, T. Salman, M. Zolanvari, R. Jain, N. Meskin, and M. Samaka, “SCADA System Testbed for Cybersecurity Research Using Machine Learning Approach,” Future Internet, vol. 10, no. 8, p. 76, Aug. 2018, doi: 10.3390/fi10080076.
[11] R. Sommer and V. Paxson, “Outside the Closed World: On Using Machine Learning for Network Intrusion Detection,” in 2010 IEEE Symposium on Security and Privacy, Oakland, CA, USA, 2010, pp. 305–316, doi: 10.1109/SP.2010.25.
[12] J. M. Beaver, R. C. Borges-Hink, and M. A. Buckner, “An Evaluation of Machine Learning Methods to Detect Malicious SCADA Communications,” in 2013 12th International Conference on Machine Learning and Applications, Miami, FL, USA, Dec. 2013, pp. 54–59, doi: 10.1109/ICMLA.2013.105.
[13] R. C. Borges Hink, J. M. Beaver, M. A. Buckner, T. Morris, U. Adhikari, and S. Pan, “Machine learning for power system disturbance and cyber-attack discrimination,” in 2014 7th International Symposium on Resilient Control Systems (ISRCS), Denver, CO, USA, Aug. 2014, pp. 1–8, doi: 10.1109/ISRCS.2014.6900095.
[14] V. L. Do, “Statistical detection and isolation of cyber-physical attacks on SCADA systems,” in IECON 2017 - 43rd Annual Conference of the IEEE Industrial Electronics Society, 2017, pp. 3524–3529.
[15] S. Zhang, S. Zhang, B. Wang, and T. G. Habetler, “Deep Learning Algorithms for Bearing Fault Diagnostics—A Comprehensive Review,” IEEE Access, vol. 8, pp. 29857–29881, 2020, doi: 10.1109/ACCESS.2020.2972859.
[16] I. Goodfellow, Y. Bengio, and A. Courville, Deep Learning. MIT Press, 2016.
[17] Y. Lin and J. Wang, “Probabilistic Deep Autoencoder for Power System Measurement Outlier Detection and Reconstruction,” IEEE Transactions on Smart Grid, vol. 11, no. 2, pp. 1796–1798, Mar. 2020, doi: 10.1109/TSG.2019.2937043.
[18] H. I. Choi, “Lectures on Machine Learning (Fall 2017), Lecture 16: Autoencoders,” Seoul National University.
[19] Z. Chen, C. K. Yeo, B. S. Lee, and C. T. Lau, “Autoencoder-based network anomaly detection,” in 2018 Wireless Telecommunications Symposium (WTS), Phoenix, AZ, Apr. 2018, pp. 1–5, doi: 10.1109/WTS.2018.8363930.
[20] S. Deng, L. Du, C. Li, J. Ding, and H. Liu, “SAR Automatic Target Recognition Based on Euclidean Distance Restricted Autoencoder,” IEEE J. Sel. Top. Appl. Earth Observations Remote Sensing, vol. 10, no. 7, pp. 3323–3333, Jul. 2017, doi: 10.1109/JSTARS.2017.2670083.
[21] T. Morris, R. Vaughn, and Y. S. Dandass, “A testbed for SCADA control system cybersecurity research and pedagogy,” Proceedings of the Seventh Annual Workshop on Cyber Security and Information Intelligence Research, 2011.
[22] Y. C A Padmanabha Reddy, P. Viswanath, and B. Eswara Reddy, “Semi-supervised learning: a brief review,” IJET, vol. 7, no. 1.8, p. 81, Feb. 2018, doi: 10.14419/ijet.v7i1.8.9977.
[23] J. E. van Engelen and H. H. Hoos, “A survey on semi-supervised learning,” Machine Learning, vol. 109, no. 2, pp. 373–440, Feb. 2020, doi: 10.1007/s10994-019-05855-6.
[24] M. Usama et al., “Unsupervised Machine Learning for Networking: Techniques, Applications and Research Challenges,” IEEE Access, vol. 7, pp. 65579–65615, 2019, doi: 10.1109/ACCESS.2019.2916648.
[25] P. Laskov, P. Düssel, C. Schäfer, and K. Rieck, “Learning Intrusion Detection: Supervised or Unsupervised?,” in Image Analysis and Processing – ICIAP 2005, vol. 3617, F. Roli and S. Vitulano, Eds. Berlin, Heidelberg: Springer Berlin Heidelberg, 2005, pp. 50–57, doi: 10.1007/11553595_6.
[26] D. Oh and I. Yun, “Residual Error Based Anomaly Detection Using Auto-Encoder in SMD Machine Sound,” Sensors, vol. 18, no. 5, p. 1308, Apr. 2018, doi: 10.3390/s18051308.
[27] A. Borghesi, A. Bartolini, M. Lombardi, M. Milano, and L. Benini, “Anomaly Detection Using Autoencoders in High Performance Computing Systems,” AAAI, vol. 33, pp. 9428–9433, Jul. 2019, doi: 10.1609/aaai.v33i01.33019428.
[28] I. A. Khan, D. Pi, Z. U. Khan, Y. Hussain, and A. Nawaz, “HML-IDS: A Hybrid-Multilevel Anomaly Prediction Approach for Intrusion Detection in SCADA Systems,” IEEE Access, vol. 7, pp. 89507–89521, 2019, doi: 10.1109/ACCESS.2019.2925838.
[29] A. Almalawi, A. Fahad, Z. Tari, A. Alamri, R. AlGhamdi, and A. Y. Zomaya, “An Efficient Data-Driven Clustering Technique to Detect Attacks in SCADA Systems,” IEEE Trans. Inform. Forensic Secur., vol. 11, no. 5, pp. 893–906, May 2016, doi: 10.1109/TIFS.2015.2512522.
[30] P. Nader, P. Honeine, and P. Beauseroy, “lp-norms in One-Class Classification for Intrusion Detection in SCADA Systems,” IEEE Trans. Ind. Inf., vol. 10, no. 4, pp. 2308–2317, Nov. 2014, doi: 10.1109/TII.2014.2330796.
[31] Y. He, G. J. Mendis, and J. Wei, “Real-Time Detection of False Data Injection Attacks in Smart Grid: A Deep Learning-Based Intelligent Mechanism,” IEEE Trans. Smart Grid, vol. 8, no. 5, pp. 2505–2516, Sep. 2017, doi: 10.1109/TSG.2017.2703842.
[32] K. Kannadasan, D. R. Edla, and V. Kuppili, “Type 2 diabetes data classification using stacked autoencoders in deep neural networks,” Clinical Epidemiology and Global Health, vol. 7, no. 4, pp. 530–535, Dec. 2019, doi: 10.1016/j.cegh.2018.12.004.
[33] G. Liu, H. Bao, and B. Han, “A Stacked Autoencoder-Based Deep Neural Network for Achieving Gearbox Fault Diagnosis,” Mathematical Problems in Engineering, vol. 2018, pp. 1–10, Jul. 2018, doi: 10.1155/2018/5105709.
