0% found this document useful (0 votes)

58 views15 pages

SCADA System Testbed For Cybersecurity R

The document describes the development of a SCADA system testbed used for cybersecurity research. Sophisticated cyber-attacks were conducted against the testbed and network traffic was captured. Features were extracted from the traffic to build a dataset for training machine learning algorithms to detect attacks. Five machine learning algorithms were trained and tested on the dataset, then deployed in the network to detect attacks in real time.

Uploaded by

abdel taib

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

0% found this document useful (0 votes)

58 views15 pages

SCADA System Testbed For Cybersecurity R

Uploaded by

abdel taib

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

You are on page 1/ 15

future internet

Article
SCADA System Testbed for Cybersecurity Research
Using Machine Learning Approach
Marcio Andrey Teixeira 1,2, *, Tara Salman 2 , Maede Zolanvari 2 , Raj Jain 2 ID
, Nader Meskin 3
and Mohammed Samaka 4
1 Department of Informatics, Federal Institute of Education, Science, and Technology of Sao Paulo,
Catanduva 15808-305, SP, Brazil
2 Department of Computer Science and Engineering, Washington University in Saint Louis,
Saint Louis, MO 63130, USA; [email protected] (T.S.); [email protected] (M.Z.);
[email protected] (R.J.)
3 Department of Electrical Engineering, Qatar University, Doha 2713, Qatar; [email protected]
4 Department of Computer Science and Engineering, Qatar University, Doha 2713, Qatar;
[email protected]
* Correspondence: [email protected]; Tel.: +55-17-98118-7649

Received: 17 July 2018; Accepted: 8 August 2018; Published: 9 August 2018

Abstract: This paper presents the development of a Supervisory Control and Data Acquisition
(SCADA) system testbed used for cybersecurity research. The testbed consists of a water storage tank’s
control system, which is a stage in the process of water treatment and distribution. Sophisticated
cyber-attacks were conducted against the testbed. During the attacks, the network traffic was
captured, and features were extracted from the traffic to build a dataset for training and testing
different machine learning algorithms. Five traditional machine learning algorithms were trained to
detect the attacks: Random Forest, Decision Tree, Logistic Regression, Naïve Bayes and KNN. Then,
the trained machine learning models were built and deployed in the network, where new tests were
made using online network traffic. The performance obtained during the training and testing of the
machine learning models was compared to the performance obtained during the online deployment
of these models in the network. The results show the efficiency of the machine learning models in
detecting the attacks in real time. The testbed provides a good understanding of the effects and
consequences of attacks on real SCADA environments.

Keywords: cybersecurity; machine learning; SCADA system; network security

1. Introduction
Supervisory Control and Data Acquisition (SCADA) systems are Industrial Control Systems (ICS)
widely used by industries to monitor and control different processes such as oil and gas pipelines,
water distribution, electrical power grids, etc. These systems provide automated control and remote
monitoring of services being used in daily life. For example, states and municipalities use SCADA
systems to monitor and regulate water levels in reservoirs, pipe pressure, and water distribution.
A typical SCADA system includes components like computer workstations, Human Machine
Interface (HMI), Programmable Logic Controllers (PLCs), sensors, and actuators [1]. Historically,
these systems had private and dedicated networks. However, due to the wide-range deployment
of remote management, open IP networks (e.g., Internet) are now used for SCADA systems
communication [2]. This exposes SCADA systems to the cyberspace and makes them vulnerable
to cyber-attacks using the Internet.
Machine learning (ML) and artificial intelligence techniques have been widely used to constitute
an intelligent and efficient Intrusion Detection System (IDS) dedicated to ICS. However, researchers

Future Internet 2018, 10, 76; doi:10.3390/fi10080076 www.mdpi.com/journal/futureinternet

Future Internet 2018, 10, 76 2 of 15

generally develop and train their ML-based security system using network traces obtained from
publicly available datasets. Due to malware evolution and changes in the attack strategies,
these datasets fail to protect the system from new types of attacks, and consequently, the benchmark
datasets should be updated periodically.
This paper presents the deployment of a SCADA system testbed for cybersecurity research and
investigates the feasibility of using ML algorithms to detect cyber-attacks in real time. The testbed was
built using equipment deployed in real industrial settings. Sophisticated attacks were conducted on
the testbed to develop a better understanding of the attacks and their consequences in SCADA
environments. The network traffic was captured, including both abnormal and normal traffic.
The behavior of both types of traffic (abnormal and normal) was analyzed, and features were extracted
to build a new SCADA-IDS dataset. This dataset was then used for training and testing ML models
which were further deployed in the network. The performance of the ML model depends highly on
the available datasets. One of the main contributions of this paper is building a new dataset updated
with recent and more sophisticated attacks. We argue that IDS using ML models trained with a dataset
generated at the process control level could be more efficient, less complicated, and more cost-effective
as compared to traditional protection techniques. Five traditional machine learning algorithms were
trained to detect the attacks: Random Forest, Decision Tree, Logistic Regression, Naïve Bayes and KNN.
Once trained and tested, the ML models were deployed in the network, where real network traffic
was used to analyze the effectiveness and efficiency of the ML models in a real-time environment.
We compared the performance obtained during the training and test phase of the ML models with
the performance obtained during the online deployment of these models in the network. The online
deployment is another contribution of this paper since most of the published papers present the
performance of the ML models obtained during the training and test phases. We conducted this
research to build an IDS software based on ML models to be deployed in the ICS/SCADA systems.
The remainder of the paper is organized as follows. Section 2 presents a brief background of the
ICS-SCADA system reference model and the related works. Section 3 describes the developed SCADA
system testbed. Section 4 describes the ML algorithms and the performance measurements used in this
work. Section 5 shows the scenario of the conducted attacks and the main features of the dataset used
to train the algorithms. Section 6 discusses our results and the interoperations behind them. Finally,
Section 7 concludes the paper with a summary of the main points and outcomes.

2. Background
In this section, we briefly present a description of the ICS-SCADA reference model and some
related works in the domain of ML algorithms for SCADA system security.

2.1. ICS Reference Model

The ICS is a general term that covers numerous control systems, including SCADA systems,
distributed control systems, and other control system configurations [3]. An ICS consists of
combinations of control components (e.g., electrical, mechanical, hydraulic, pneumatic) that are
used to achieve various industrial objectives (e.g., manufacturing, transportation of matter or energy).
Figure 1 shows an example of an ICS reference model [4].
As can be seen from Figure 1, the ICS model is divided into four levels, 3 to 0. Level 3
(corporate network) consists of traditional information technology, including the general deployment
of services and systems, such as file transfer, websites, mail servers, resource planning, and office
automation systems. Level 2 (supervisory control local area network) includes the functions involved
in monitoring and controlling the physical processes and the general deployment of systems such as
HMIs, engineering workstations, and history logs. Level 1 (control network) includes the functions
involved in sensing and manipulating physical processes. For example, receiving the information,
processing the data, and triggering outputs, which are all done in PLCs. Level 0 (I/O network) consists
of devices (sensors/actuators) that are directly connected to the physical process.
Future Internet 2018, 10, x FOR PEER REVIEW 3 of 15

processing the data, and triggering outputs, which are all done in PLCs. Level 0 (I/O network) consists
Future Internet 2018, 10, 76 3 of 15
of devices (sensors/actuators) that are directly connected to the physical process.

Figure 1. Industrial control systems (ICS) reference model [4].
Figure 1. Industrial control systems (ICS) reference model [4].

As shown in Figure 1, Level 3 is composed of the traditional IT infrastructure system (Internet
As shown in Figure 1, Level 3 is composed of the traditional IT infrastructure system (Internet
access service, file transfer protocol server, Virtual Private Network (VPN) remote access, etc.). Levels
access service, file transfer protocol server, Virtual Private Network (VPN) remote access, etc.). Levels 2,
2, 1, and 0, represent a typical SCADA system which is composed of the following components:
1, and 0, represent a typical SCADA system which is composed of the following components:
 HMI: Used to observe the status of the system or to adjust the system parameters for processes
• control and management purposes.
HMI: Used to observe the status of the system or to adjust the system parameters for processes
 control and management purposes.
Engineering workstation: Used by engineers for programming the control functions of the HMI.
•
 Engineering workstation: Used by engineers for programming the control functions of the HMI.
History logs: Used to collect the data in real‐time from the automation processes for current or
• later analysis.
History logs: Used to collect the data in real-time from the automation processes for current or
 later analysis.
PLCs: Act as slave stations in the SCADA architecture. They are connected to sensors or
• actuators.
PLCs: Act as slave stations in the SCADA architecture. They are connected to sensors or actuators.

2.2. The SCADA Communication Protocol

2.2. The SCADA Communication Protocol
There are
There are several communication
several communication protocols developed
protocols for use
developed in use
for SCADA systems. systems.
in SCADA These protocols
These
define the standard message format for all inter-device communications in the network.
protocols define the standard message format for all inter‐device communications in the network. One popular
protocol, which is widely used in SCADA system environments, is Modbus protocol [5]. Modbus is an
One popular protocol, which is widely used in SCADA system environments, is Modbus protocol
application-layer
[5]. Modbus is messaging protocol that provides
an application‐layer the client/server
messaging protocol that communications between
provides the devices
client/server
connected to an Ethernet network and offers services specified by function codes. The function codes
communications between devices connected to an Ethernet network and offers services specified by
tell the server what action to take. For example, a client can read the status of the discrete outputs or
function codes. The function codes tell the server what action to take. For example, a client can read
the values of digital inputs from the PLC; or it can read/write the data contents of a group of registers
the status of the discrete outputs or the values of digital inputs from the PLC; or it can read/write the
insidecontents
data the PLC.of
Figure 2 illustrates
a group an example
of registers of Modbus
inside the client/server
PLC. Figure communication.
2 illustrates an example of Modbus
The Modbus register address type consists of four data reference types [5,6] which are summarized
client/server communication.
in Table
The 1.Modbus
The “xxxx” following
register a leading
address digit represents
type consists of four adata
four-digit address
reference location
types in the user
[5,6] which are
data memory.
summarized in Table 1. The “xxxx” following a leading digit represents a four‐digit address location
in the user data memory.
Future Internet 2018, 10, 76 4 of 15
Future Internet 2018, 10, x FOR PEER REVIEW 4 of 15

Figure 2. Modbus client/server communication example.
Figure 2. Modbus client/server communication example.

Table 1. Data reference types [6,7].
Table 1. Data reference types [6,7].
Reference Range Description
Reference Range Description
0xxxx 00001–09999 Read/Write Discrete Outputs or Coils.
0xxxx
1xxxx 00001–09999
10001–19999 Read/Write Discrete Outputs or Coils.
Read Discrete Inputs.
1xxxx 10001–19999 Read Discrete Inputs.
3xxxx
3xxxx
30001–39999
30001–39999
Read Input Registers.
Read Input Registers.
4xxxx
4xxxx 40001–49999
40001–49999Read/Write‐Output or Holding Registers.
Read/Write-Output or Holding Registers.

2.3. Related Works
2.3. Related Works
Cyber‐attacks are continuously evolving and changing behavior to bypass security mechanisms.
Cyber-attacks are continuously evolving and changing behavior to bypass security mechanisms.
Thus, the
Thus, the utilization
utilization ofof advanced
advanced security
security mechanisms
mechanisms is essential
is essential to identify
to identify and prevent
and prevent new
new attacks.
attacks. In this sense, the development of real testbeds advances the research in this area.
In this sense, the development of real testbeds advances the research in this area.
Morris et al. [7] describe four datasets to be used for cybersecurity research. The datasets include
Morris et al. [7] describe four datasets to be used for cybersecurity research. The datasets include
network traffic,
network traffic, process
process control,
control, and
and process
process measurement
measurement features
features from
from aa set
set of
of attacks
attacks against
against
testbeds which use Modbus application layer protocol. The authors argue
testbeds which use Modbus application layer protocol. The authors argue that there are several that there are several
datasets developed
datasets developed to to train
train and
and validate
validate IDS
IDS associated
associated withwith traditional
traditional information
information technology
technology
systems, but in the SCADA security area there is a lack of availability and access to SCADA network
systems, but in the SCADA security area there is a lack of availability and access to SCADA network
traffic. In
traffic. In our work, aa new
our work, new dataset
dataset with
with new
new types
types ofof attacks
attacks was
was created.
created. So,
So, once
once our
our dataset
dataset is
is
available, we
available, we are
are providing
providing a resource
a resource thatthat
couldcould
be usedbe by
used by researchers
researchers to train,to train, validate,
validate, and
and compare
compare their results with other datasets.
their results with other datasets.
In order to investigate the security of the Modbus/TCP protocol, Miciolino et al. [8] explored a
In order to investigate the security of the Modbus/TCP protocol, Miciolino et al. [8] explored
acomplex
complex cyber‐physical
cyber-physical testbed,
testbed,conceived
conceivedfor for
the the
control and and
control monitoring
monitoringof a of
water system.
a water The
system.
analysis of the experimental results highlights the critical characteristics of the
The analysis of the experimental results highlights the critical characteristics of the Modbus/TCP as a
Modbus/TCP as a
popular communication protocol in ICS environments. They concluded that by obtaining sufficient
popular communication protocol in ICS environments. They concluded that by obtaining sufficient
knowledge of the system, an attacker is able to change the commands of the actuators or the sensor
knowledge of the system, an attacker is able to change the commands of the actuators or the sensor
readings in order to achieve its malicious objectives. Obtaining knowledge of the system is the first
readings in order to achieve its malicious objectives. Obtaining knowledge of the system is the first
step in attacking a system. This attack is also known as a reconnaissance attack. Hence, in our work,
step in attacking a system. This attack is also known as a reconnaissance attack. Hence, in our work,
our ML models are trained to recognize this kind of attack.
our ML models are trained to recognize this kind of attack.
In Ref. [9], Rosa et al. describes some practical cyber‐attacks using an electricity grid testbed.
In Ref. [9], Rosa et al. describes some practical cyber-attacks using an electricity grid testbed.
This testbed consists of a hybrid environment of SCADA assets (e.g., PLCs, HMIs, process control
This testbed consists of a hybrid environment of SCADA assets (e.g., PLCs, HMIs, process control
servers) controlling an emulated power grid. The work explains their attacks and discusses some of
servers) controlling an emulated power grid. The work explains their attacks and discusses some of
the challenges faced by an attacker in implementing them. One of the attacks is the reconnaissance
the challenges faced by an attacker in implementing them. One of the attacks is the reconnaissance
network attack. The authors argue that this kind of attack can be used not only to discover devices
network attack. The authors argue that this kind of attack can be used not only to discover devices and
and types of services but also to perform fingerprinting and discover PLCs behind the gateways.
types of services but also to perform fingerprinting and discover PLCs behind the gateways. Hence,
Hence, in our work, advanced reconnaissance attacks were carried out, and ML algorithms were used
in our work, advanced reconnaissance attacks were carried out, and ML algorithms were used to
to detect them.
detect them.
In Ref. [10], Keliris et al. developed a process‐aware supervised learning defense strategy that
considers the operational behavior of an ICS to detect attacks in real‐time. They used a benchmark
Future Internet 2018, 10, 76 5 of 15

In Ref. [10], Keliris et al. developed a process-aware supervised learning defense strategy that
considers the operational behavior of an ICS to detect attacks in real-time. They used a benchmark
chemical process and considered several categories of attack vectors on their hardware controllers.
They Future Internet 2018, 10, x FOR PEER REVIEW
used their trained SVM model to detect abnormalities in real-time and to distinguish5 of 15 between
disturbances and malicious behavior as well. In our work, we used five ML algorithms to identify the
chemical process and considered several categories of attack vectors on their hardware controllers.
abnormal behavior in real-time and evaluated their detection performance.
They used their trained SVM model to detect abnormalities in real‐time and to distinguish between
Tomin et al. [11] presented a semi-automated method for online security assessment using
disturbances and malicious behavior as well. In our work, we used five ML algorithms to identify
ML techniques. They outline their experience obtained at the Melentiev Energy Systems Institute,
the abnormal behavior in real‐time and evaluated their detection performance.
Russia inTomin et al. [11] presented a semi‐automated method for online security assessment using ML
developing ML-based approaches for detecting potentially dangerous states in power
systems. Multiple ML algorithms were trained offline using resampling cross-validation method.
techniques. They outline their experience obtained at the Melentiev Energy Systems Institute, Russia
Then,in
the best model
developing amongapproaches
ML‐based the ML algorithms was
for detecting selected dangerous
potentially based on performance and
states in power was used
systems.
online. They argue that the use of ML techniques provides reliable and robust solutions that can
Multiple ML algorithms were trained offline using resampling cross‐validation method. Then, the
best model among the ML algorithms was selected based on performance and was used online. They
resolve the challenges in planning and operating future industrial systems with an acceptable level
of security. that the use of ML techniques provides reliable and robust solutions that can resolve the
argue
challenges in planning and operating future industrial systems with an acceptable level of security.
Cherdantseva et al. [12] reviewed the state of the art in cybersecurity risk assessment of SCADA
Cherdantseva et al. [12] reviewed the state of the art in cybersecurity risk assessment of SCADA
systems. This review indicates that despite the popularity of the machine learning techniques,
systems. This review indicates that despite the popularity of the machine learning techniques,
research groups in ICS security have reported a lack of standard datasets for training and testing
research groups in ICS security have reported a lack of standard datasets for training and testing
machine learning algorithms. The lack of standard datasets has resulted in an inability to develop
machine learning algorithms. The lack of standard datasets has resulted in an inability to develop
robustrobust ML models to detect the anomalies in ICS. Using the testbed proposed in this paper, we built
ML models to detect the anomalies in ICS. Using the testbed proposed in this paper, we built a
new dataset for training and testing ML algorithms.
a new dataset for training and testing ML algorithms.

3. The3. The SCADA System Testbed
SCADA System Testbed
In this section, we describe the configuration of our SCADA system testbed for
In this section, we describe the configuration of our SCADA system testbed for cybersecurity
research. research.
cybersecurity

The Testbed Framework
The Testbed Framework
The purpose
The purpose of our
of our testbed
testbed is is
toto emulate real-world
emulate real‐world industrial
industrialsystems
systemsas as
closely as possible
closely as possible
without replicating an entire plant or assembly system [13]. The utilization of a testbed allows us to
without replicating an entire plant or assembly system [13]. The utilization of a testbed allows us to
carry out real cyber‐attacks. Our testbed is dedicated to controlling a water storage tank, which is a
carry out real cyber-attacks. Our testbed is dedicated to controlling a water storage tank, which is
part of the process of water treatment and distribution. The components used in our testbed are
a partcommonly used in real SCADA environments. Figure 3 shows the SCADA testbed framework for
of the process of water treatment and distribution. The components used in our testbed are
commonly used in real SCADA environments. Figure 3 shows the SCADA testbed framework for our
our targeted application and Table 2 shows a brief description of the equipment used to build the
targeted application and Table 2 shows a brief description of the equipment used to build the testbed.
testbed.

Figure 3. The testbed framework.
Figure 3. The testbed framework.

Future Internet 2018, 10, 76 6 of 15

Table 2. Description of the devices used in the testbed.

Devices Descriptions
On Button Turns on the level control process of the water storage tank.
Off Button Turns off the level control process of the water storage tank.
Light Indicator Indicates whether the system is on or off.
Monitors the maximum water level in the tank. When the water reaches the
Level Sensor 1 (LS1)
maximum level, the sensor sends a signal to PLC.
Monitors the minimum water level in the tank. When the water reaches the
Level Sensor 2 (LS2)
minimum level, the sensor sends a signal to PLC.
Controls the water level in the tank. When the water reaches the maximum level,
Valve the valve opens, and when the water reaches the minimal level, the valve closes.
This logic is implemented in PLC using the ladder language.
Water Pump 1 Fills up the water tank.
Water Pump 2 Draws water from the tank when the valve is open.
Controls the physical process. The logic of the water control system is in PLC,
PLC which receives signals from the input devices (buttons, sensors), executes the
program, and sends signals to the output devices (water pumps and valve).
Used by the administrator to monitor and control the water storage system in
HMI real-time. The administrator can also display the devices’ state and interact with
the system through this interface.
Data History Used to store logs and events of the SCADA system.

As shown in Figure 3, the storage tank has two level sensors: Level Sensor 1 (LS1) and Level
Sensor 2 (LS2) that monitor the water level in the tank. When the water reaches the maximum level
defined in the system, the LS1 sends a signal to the PLC. The PLC turns off Water Pump 1 used to fill
up the tank, opens the valve, and turns on Water Pump 2 to draw the water from the tank. When the
water reaches the minimal level defined in the system, LS2 sends a signal to the PLC, which closes
the valve, turns off Water Pump 2, and turns on Water Pump 1 to fill up the tank. This process starts
over when the water level reaches LS1. The SCADA system gets data from the PLC using the Modbus
communication protocol and displays them to the system operator through the HMI interface.
There are other ICS protocols which could be used instead of Modbus in our testbed. For example,
DNP3 is an ICS protocol that provides some security mechanisms [14,15]. However, in a recent
research, Li et al. [16] reported that they found 17,546 devices connected to the Internet using the
Modbus protocol spread all over the world. They did not count the amount of equipment not directly
connected to the internet. Although there are other ICS protocols, many industries still use SCADA
systems with Modbus protocol because their equipment does not support other protocols. In this case,
solutions to detect attacks can be cheaper than other solutions, for example, changing the devices.
PLC Schneider model M241CE40 is used in our testbed to control the process of the water
storage tank. The logic programming of the PLC is done using the LADDER programming language.
The LADDER language is not covered in this paper; however, more information can be found in [17,18].
The sensors described in Table 2 are connected to the digital inputs of the PLC. The pumps and valves
are connected to the output of the PLC.

4. Machine Learning Algorithms and Performance Measurements

In this section, we describe the ML algorithms used in our work as well as the measurements
used to evaluate their performances.
Future Internet 2018, 10, 76 7 of 15

4.1. Machine Learning Algorithms

The ML algorithms can be classified as supervised, unsupervised, and semi-supervised. Each class
has its own characteristics and applicability. The discussion of all algorithms is beyond the scope
of this paper. However, we refer the reader to [19,20] for detailed technical discussions of these
algorithms. In this paper, we use traditional ML algorithms to detect the attacks. Our target is to build
supervised machine learning models, and we chose the followings algorithms for attack detection and
classification:

1. Logistic Regression [20].

2. Random Forest [21].
3. Naïve Bayes [22].
4. Support Vector Machine (SVM) [23].
5. KNN [24].

The performance of these algorithms is discussed in Section 6.

4.2. Performance Measurements

Traditionally, the performance of ML algorithms is measured by metrics which are derived from
the confusion matrix [25]. Table 3 shows the confusion matrix in the IDS context.

Table 3. Confusion matrix in the intrusion detection system (IDS) context.

Data Class Classified as Normal Classified as Abnormal

Normal True Negative (TN) False Negative (FN)
Abnormal False Positive (FP) True Positive (TP)

In the IDS context, the following parameters are used to create the confusion matrix:

• TN: Represents the number of normal flows correctly classified as normal (e.g., normal traffic);
• TP: Represents the number of abnormal flows (attacks) correctly classified as abnormal (e.g.,
attack traffic);
• FP: Represent the number of normal flows incorrectly classified as abnormal;
• FN: Represents the number of abnormal flows incorrectly classified as normal;

Next, we present several evaluation metrics and their respective formulas which are derived from
the confusion matrix parameters:

• Accuracy: The percentage of correctly predicted flows considering the total number of predictions:

TP + TN
Accuracy % = × 100 (1)
TP + TN + FP + FN

• False Alarm Rate (FAR): This represents the percentage of the normal flows misclassified as
abnormal flows (attack) by the model:

FP
FAR % = × 100 (2)
FP + TN

• Un-Detection Rate (UND): The fraction of the abnormal flows (attack) which are misclassified as
normal flows by the model:
FN
UND % = × 100 (3)
FN + TP
Future Internet 2018, 10, 76 8 of 15

Accuracy (Equation (1)) is the most frequently used metric for evaluating the performance of
learning models in classification problems. However, this metric is not very reliable for evaluating
the ML performance in scenarios with imbalanced classes [26]. In this case, one class is dominant in
number, and it has more samples relatively compared to another class. For example, in IDS scenarios,
the proportion of normal flows to attack flows is very high in any realistic dataset. That is, the number
of samples in the dataset which represent the normal flows is enormous compared to the number
of samples which represent the attack flows. This problem is prevalent in scenarios where anomaly
detection is crucial, such as fraudulent transactions in banks, identification of rare diseases, and in the
Future Internet 2018, 10, x FOR PEER REVIEW
identification infrastructure. New metrics have been developed to avoid
of cyber-attacks in critical 8 of 15
a
biased analysis [27]. So, in addition to the accuracy, we also used the FAR and UND metrics.
5. Attack Scenarios, Features Selection, and Evaluation Scenarios
5. Attack Scenarios, Features Selection, and Evaluation Scenarios
In this section, we describe the attacks carried out in our testbed and the features used to build
In this section, we describe the attacks carried out in our testbed and the features used to build our
our dataset. This dataset was used for training and testing the ML algorithms, as described in Section
dataset.
6. This dataset was used for training and testing the ML algorithms, as described in Section 6.

5.1. Attack Scenarios

5.1. Attack Scenarios
Network
Network attacks
attacks on
on SCADA
SCADA systems
systems can
can be
be divided
divided into
into three
three categories:
categories: reconnaissance,
reconnaissance,
command injection, and denial of service (DoS) [7]. Our focus in this paper is on the reconnaissance
command injection, and denial of service (DoS) [7]. Our focus in this paper is on the reconnaissance
attacks
attacks where the network
where the network is is scanned
scannedfor forpossible
possiblevulnerabilities
vulnerabilitiesto tobe beused
usedfor for later
later attacks.
attacks. A
A reconnaissance attack is the first stage of any attack on a networking system. In this
reconnaissance attack is the first stage of any attack on a networking system. In this stage, hackers stage, hackers
use scan tools to inspect the topology of the victim network and identify the devices in the network as
use scan tools to inspect the topology of the victim network and identify the devices in the network
well as their
as well vulnerabilities.
as their FigureFigure
vulnerabilities. 4 shows
4 our testbed
shows our attack
testbed scenario
attack where thewhere
scenario dashedthe rectangles
dashed
highlight the vulnerable spots and possible attack targets in the system.
rectangles highlight the vulnerable spots and possible attack targets in the system.

Figure 4. Attack Scenario.
Figure 4. Attack Scenario.

Some reconnaissance attacks can be easily detected. For example, there are scanning tools which
Some reconnaissance attacks can be easily detected. For example, there are scanning tools which
send a large number of packets per second under Modbus/TCP to the targeted device and wait for
send a large number of packets per second under Modbus/TCP to the targeted device and wait for
acknowledgment of the packets from them. If a response is received, the host (i.e., the device) is active.
acknowledgment of the packets from them. If a response is received, the host (i.e., the device) is active.
This attack generates a considerable variation in the traffic behavior which can be easily detected by
This attack generates a considerable variation in the traffic behavior which can be easily detected
the traditional IDS or even the traditional firewall or rule‐based mechanisms. Figure 5 shows an
by the traditional IDS or even the traditional firewall or rule-based mechanisms. Figure 5 shows an
example of the traffic behavior when a scanning tool was used in our testbed.
example of the traffic behavior when a scanning tool was used in our testbed.
Some reconnaissance attacks can be easily detected. For example, there are scanning tools which
send a large number of packets per second under Modbus/TCP to the targeted device and wait for
acknowledgment of the packets from them. If a response is received, the host (i.e., the device) is active.
This attack generates a considerable variation in the traffic behavior which can be easily detected by
the traditional
Future Internet 2018,IDS or even the traditional firewall or rule‐based mechanisms. Figure 5 shows
10, 76 9 ofan
15
example of the traffic behavior when a scanning tool was used in our testbed.

Figure 5. Network traffic behavior under easy to detect attacks.
Figure 5. Network traffic behavior under easy to detect attacks.
Future Internet 2018, 10, x FOR PEER REVIEW 9 of 15

On the other hand, there are some sophisticated reconnaissance attacks which are more difficult
On the other hand, there are some sophisticated reconnaissance attacks which are more difficult
to detect. For example, some exploits can be used to map the network, which results in an attack
to detect. For example, some exploits can be used to map the network, which results in an attack
behavior very similar to normal traffic. Figure 6 illustrates the network traffic behavior during such
behavior very similar to normal traffic. Figure 6 illustrates the network traffic behavior during such
exploit attacks. As can be seen, the change in the traffic behavior is negligible under the attack. Thus,
exploit attacks. As can be seen, the change in the traffic behavior is negligible under the attack. Thus,
it is difficult to detect the attack. The use of rule-based mechanisms would fail because the signature of
it is difficult to detect the attack. The use of rule‐based mechanisms would fail because the signature
the Modbus and TCP traffic do not change, and the language used to express the detection rules may
of the Modbus and TCP traffic do not change, and the language used to express the detection rules
not be expressive enough. On the other hand, the use of ML can improve the detection rate as ML
may not be expressive enough. On the other hand, the use of ML can improve the detection rate as
algorithms can be trained to detect these attack scenarios.
ML algorithms can be trained to detect these attack scenarios.

Figure 6. Network traffic behavior under difficult to detect attacks.
Figure 6. Network traffic behavior under difficult to detect attacks.

We conducted the following reconnaissance and exploit attacks specific to the ICS environment
We conducted the following reconnaissance and exploit attacks specific to the ICS environment
described in Table 4. Details of the commands used to perform the attacks can be found in [28,29].
described in Table 4. Details of the commands used to perform the attacks can be found in [28,29].
During the attacks, the network traffic was captured to be analyzed. We used the following tools to
During the
analyze the attacks, the network
captured traffic was [30],
traffic: Wireshark captured
and toArgus
be analyzed. We used
[31]. The the following
captured tools to
traffic included
analyze the captured traffic: Wireshark [30], and Argus [31]. The captured traffic included unencrypted
unencrypted control information of the devices (valve, pumps, sensors) as well as information
control information of the devices (valve, pumps, sensors) as well as information regarding their type
regarding their type (function codes, type of data). Table 5 presents statistical information about the
(function codes, type of data). Table 5 presents statistical information about the captured traffic.
captured traffic.

Table 4. Reconnaissance attacks carried out against our testbed [29,30].

Attack Name Attack Description
This attack is used to identify common SCADA protocols on the network. Using
Nmap tool, packets are sent to the target at intervals which vary from 1 to 3 s. The
Port Scanner [29]
TCP connection is not fully established so that the attack is difficult to detect by
rules.
This attack is used to scan network addresses and identify the Modbus server
Address Scan address. Each system has only one Modbus server and disabling this device would
Future Internet 2018, 10, 76 10 of 15

Table 4. Reconnaissance attacks carried out against our testbed [29,30].

Attack Name Attack Description

This attack is used to identify common SCADA protocols on the network. Using
Nmap tool, packets are sent to the target at intervals which vary from 1 to 3 s.
Port Scanner [29]
The TCP connection is not fully established so that the attack is difficult to detect
by rules.
This attack is used to scan network addresses and identify the Modbus server
address. Each system has only one Modbus server and disabling this device
Address Scan Attack [29]
would collapse the whole SCADA system. Thus, this attack tries to find the
unique address of the Modbus server so that it can be used for further attacks.
This attack is used to enumerate the SCADA Modbus slave IDs on the network
Device Identification
and to collect additional information such as vendor and firmware from the first
Attack [29]
slave ID found.
Device Identification This attack is similar to the previous attack. However, the scanning uses an
Attack (Aggressive aggressive mode, which means that the additional information about all slave IDs
Mode) [29] found in the system is collected.
Exploit is used to read the coil values of the SCADA devices. The coils represent
Exploit [30] the ON/OFF status of the devices controlled by the PLC, such as motors, valves,
and sensors [29].

Table 5. Statistical information on the captured traffic.

Measurement Value
Duration of capture (h) 25
Dataset length (GB) 1.27
Number of observations 7,049,989
Average data rate (kbit/s) 419
Average packet size (bytes) 76.75
Percentage of scanner attack 3 × 10−4
Percentage of address scan attack 75 × 10−4
Percentage of device identification attack 1 × 10−4
Percentage of device identification attack (aggressive mode) 4.93
Percentage of exploit attack 1.13
Percentage of all attacks (total) 6.07
Percentage of normal traffic 93.93

5.2. Features Selection

Once the network traffic is captured, the next step is to select potential features which can
distinguish the anomalous traffic from the normal traffic. The authors in [19] selected 12 useful features
for ML-based network security monitoring in the ICS networks. In Ref. [32], the authors study the
potential features presented in [19]. In our work, we analyzed the variation of the features during
the normal and attack traffic, and we analyzed those features that did not vary during the normal
and attack traffic. Based on these prior works and our studies, Table 6 shows the features selected for
our dataset.

Table 6. Features selected to create the dataset.

Features Descriptions
Total Packets (TotPkts) Total transaction packet count
Total Bytes (TotBytes) Total transaction bytes
Source packets (SrcPkts) Source/Destination packet count
Destination Packets (DstPkts) Destination/Source packet count
Source Bytes (SrcBytes) Source/Destination transaction bytes
Source Port (Sport) Port number of the source
Future Internet 2018, 10, 76 11 of 15

5.3. Evaluation Scenario

After defining the dataset, the features were extracted as discussed in the previous subsection.
Then, the data was labeled either as normal traffic or attack traffic. Following that, the dataset was
split into training and test datasets. The
Future Internet 2018, 10, x FOR PEER REVIEW training dataset was composed of 80% of the total data, 11 of 15
and it
was used to train our ML model. The test dataset consists of the remaining 20% of the data, and it
was used to evaluate the performance of our trained ML model. We call this training and test phase
as “offline evaluation”, because the ML models were trained and tested offline. Figure 7 shows our
evaluation scenario.
After training and testing, the trained ML models were created and deployed in the network.
Then, their performance was analyzed using real network traffic. This phase was called “online
evaluation”. We compared the results obtained from the two phases (offline and online). This is
described next.
Future Internet 2018, 10, x FOR PEER REVIEW 11 of 15

Figure 7. Model evaluation.

6. Numerical Results
In this section, we present the numerical results of the attacks described in Section 5.1. Figure 8
shows the results for the accuracy of the ML algorithms that were used.
The accuracy represents the total number of correct predictions divided by the total number of
samples (Equation (1)). As shown in Figure 8, considering the offline evaluations, Decision Tree and

KNN have the best accuracy (100%) compared to other ML models. However, the difference in the
Figure 7. Model evaluation.
accuracy is small among all trained models. In other words, all chosen ML algorithms performed well
Figure 7. Model evaluation.
in terms of accuracy during the offline phase. During the online phase, Decision Tree, Random Forest,
6. Numerical Results
Naïve Bayes Results
and Logistic Regression show a small difference, hence, the performance of these
6. Numerical
algorithms in both phases (offline and online), are similar. The same does not apply to the KNN
In this section, we present the numerical results of the attacks described in Section 5.1. Figure 8
In this section, we present the numerical results of the attacks described in Section 5.1. Figure 8
model. There was a significant difference between the online and offline phase which indicates that
shows the results for the accuracy of the ML algorithms that were used.
shows the results for the accuracy of the ML algorithms that were used.
in practice, the KNN does not provide good accuracy.
The accuracy represents the total number of correct predictions divided by the total number of
samples (Equation (1)). As shown in Figure 8, considering the offline evaluations, Decision Tree and
KNN have the best accuracy (100%) compared to other ML models. However, the difference in the
accuracy is small among all trained models. In other words, all chosen ML algorithms performed well
in terms of accuracy during the offline phase. During the online phase, Decision Tree, Random Forest,
Naïve Bayes and Logistic Regression show a small difference, hence, the performance of these
algorithms in both phases (offline and online), are similar. The same does not apply to the KNN
model. There was a significant difference between the online and offline phase which indicates that
in practice, the KNN does not provide good accuracy.

Figure 8. Accuracy results.
Figure 8. Accuracy results.

As shown in Table 5, our dataset is unbalanced. Therefore, accuracy is not the ideal measure to
evaluate accuracy
The represents
performance the total
[33]. Other number
metrics ofneeded
are correct predictions
to compare divided by the total of
the performance number of
the ML
samples (Equation (1)). As shown in Figure 8, considering the offline evaluations, Decision Tree and
algorithms. Figure 9 shows the false alarm rate (FAR) results. The FAR metric is the percentage of the
regular traffic which has been misclassified as anomalous by the model (Equation (2)).
Regarding the offline and online evaluations, as shown in Figure 9, the Random Forest and
Decision Tree models performed best, followed by the KNN model. These three models
had the
lowest false alarm percentages followed by the Logistical Regression and Naïve Bayes. These lowest
Figure 8. Accuracy results.
percentages mean that Random Forest, Decision Tree, and KNN perform better in detecting normal
Future Internet 2018, 10, 76 12 of 15

KNN have the best accuracy (100%) compared to other ML models. However, the difference in the
accuracy is small among all trained models. In other words, all chosen ML algorithms performed
well in terms of accuracy during the offline phase. During the online phase, Decision Tree, Random
Forest, Naïve Bayes and Logistic Regression show a small difference, hence, the performance of these
algorithms in both phases (offline and online), are similar. The same does not apply to the KNN model.
There was a significant difference between the online and offline phase which indicates that in practice,
the KNN does not provide good accuracy.
As shown in Table 5, our dataset is unbalanced. Therefore, accuracy is not the ideal measure to
evaluate performance [33]. Other metrics are needed to compare the performance of the ML algorithms.
Figure 9 shows the false alarm rate (FAR) results. The FAR metric is the percentage of the regular
traffic which has been misclassified as anomalous by the model (Equation (2)).
Regarding the offline and online evaluations, as shown in Figure 9, the Random Forest and
Decision Tree models performed best, followed by the KNN model. These three models had the
lowest false alarm percentages followed by the Logistical Regression and Naïve Bayes. These lowest
percentages mean that Random Forest, Decision Tree, and KNN perform better in detecting normal
traffic. In our dataset, normal traffic is the dominant traffic; therefore, it is expected to have 12 of 15
Future Internet 2018, 10, x FOR PEER REVIEW a low
FAR value. This low FAR value could be due to the model’s bias toward estimating the normal traffic
perfectly,
perfectly, which
which isis
common
common in in
unbalanced datasets.
unbalanced Further,
datasets. the clustering
Further, done in
the clustering the in
done Random Forest,
the Random
Decision Tree and KNN models can be helpful, especially when dealing with two types of data having
Forest, Decision Tree and KNN models can be helpful, especially when dealing with two types of
different network features.
data having different network features.

Figure 9. False alarm rate results.
Figure 9. False alarm rate results.

Figure 10 shows the results of the un‐detection rate metric. The UND (Equation (3)) represents
Figure 10 shows the results of the un-detection rate metric. The UND (Equation (3)) represents the
the percentage of the traffic which is an anomaly but is misclassified as normal (the opposite of the
percentage of the traffic which is an anomaly but is misclassified as normal (the opposite of the FAR).
FAR). The traffic represented by this metric is more critical than the traffic represented by the FAR
The traffic represented by this metric is more critical than the traffic represented by the FAR metric
metric because, in this case, an attack can happen without being detected. Further, in our unbalanced
because, in this case, an attack
dataset, the models are can
biased happen
toward without
normal beingand
traffic detected. Further,
this metric in our
would unbalanced
show dataset,
how biased the
the models are biased toward normal traffic and this metric would show how biased the models are.
models are.
As shown in Figure 10, considering the offline performance results, the percentage of the UND
is small for the Naïve Bayes, Logistic Regression, and KNN models, and zero for the Decision Tree
and Random Forest models. That is, all algorithm shows excellent performance on this critical metric.
However, considering the online performances, KNN model had the worst performance, which was
very different to the offline evaluation. The same did not happen to the other models, and their
online performances are very close to their offline performance. This excellent performance shows
that the features selected in this work are also very good as they were able to detect attacks even in an
unbalanced dataset.
Figure 10 shows the results of the un‐detection rate metric. The UND (Equation (3)) represents
the percentage of the traffic which is an anomaly but is misclassified as normal (the opposite of the
FAR). The traffic represented by this metric is more critical than the traffic represented by the FAR
metric because, in this case, an attack can happen without being detected. Further, in our unbalanced
dataset,
Future the models are
Internet 2018, 10, 76 biased toward normal traffic and this metric would show how biased
13 ofthe
15
models are.

Figure 10. Un‐detected rate results.
Figure 10. Un-detected rate results.

As shown in Figure 10, considering the offline performance results, the percentage of the UND
7. Conclusions
is small for the Naïve Bayes, Logistic Regression, and KNN models, and zero for the Decision Tree
and Random Forest models. That is, all algorithm shows excellent performance on this critical metric.
This paper presents the development of a SCADA system testbed to be used in cybersecurity
However, considering the online performances, KNN model had the worst performance, which was
research. The testbed was dedicated to controlling a water storage tank which is one of several stages
very different to the offline evaluation. The same did not happen to the other models, and their online
in the process of water treatment and distribution. The testbed was used to analyze the effects of
performances are very close to their offline performance. This excellent performance shows that the
the attacks on SCADA systems. Using the network traffic, a new dataset was developed for use by
features selected
researchers to trainin machine
this work are also
learning very good
algorithms as they
as well as towere able
validate andto compare
detect attacks even in
their results an
with
unbalanced dataset.
other available datasets.
Five reconnaissance attacks specific to the ICS environment were conducted against the testbed.
During the attacks, the network traffic with information about the devices (valves, pumps, sensors)
was captured. Using Argus and Wireshark network tools, features were extracted to build a dataset for
training and testing machine learning algorithms.
Once the dataset was generated, five traditional machine learning algorithms were used to
detect the attacks: Random Forest, Decision Tree, Logistic Regression, Naïve Bayes and KNN.
These algorithms were evaluated in two phases: during the training and testing of the machine learning
models (offline), and during the deployment of these models in the network (online). The performance
obtained during the online phase was compared to the performance obtained during the offline phase.
Three metrics were used to evaluate the performance of the used algorithms: accuracy, FAR,
and UND. Regarding the accuracy metric, in the offline phase all ML algorithms showed an excellent
performance. In the online phase, almost all the algorithms performed very close to the offline
results. The KNN algorithm was the only one which did not perform well. Moreover, considering an
unbalanced dataset and analyzing the FAR and UND metrics, we concluded that Random Forest and
Decision Tree models performed best in both phases compared to the other models.
The results show the feasibility of detecting reconnaissance attacks in ICS environments.
Our future plans include generating more attacks and checking the models’ feasibility and performance
in different environments. Moreover, experiments using unsupervised algorithms will be done.

Author Contributions: M.A.T. built the testbed and performed the experiments. T.S. and M.Z. assisted with
revisions and improvements. The work was done under the supervision and guidance of R.J., N.M. and M.S.,
who also formulated the problem.
Funding: This work has been supported under the grant ID NPRP 10-901-2-370 funded by the Qatar National
Research Fund (QNRF) and grant #2017/01055-4 São Paulo Research Foundation (FAPESP).
Acknowledgments: The statements made herein are solely the responsibility of the authors. The authors would
like to thank the Instituto Federal de Educação, Ciência e Tecnologia de São Paulo (IFSP), Washington University
in Saint Louis, and Qatar University.
Conflicts of Interest: The authors declare no conflicts of interest.
Future Internet 2018, 10, 76 14 of 15

References
1. Aragó, A.S.; Martínez, E.R.; Clares, S.S. SCADA laboratory and test-bed as a service for critical infrastructure
protection. In Proceedings of the 2nd International Symposium on ICS & SCADA Cyber Security Research,
St Pölten, Austria, 11–12 September 2014.
2. National Communications Systems (NCS). Supervisory Control and Data Acquisition (SCADA) Systems,
Technical Information Bulletin 04-1. 2004. Available online: https://www.cedengineering.com/userfiles/
SCADA%20Systems.pdf (accessed on 8 August 2018).
3. Filkins, B. IT Security Spending Trends. Sans Institute, Tech. Rep. 2016. Available online: https://www.sans.
org/reading-room/whitepapers/analyst/security-spending-trends-36697 (accessed on 5 June 2018).
4. NIST Special Publication 800-82, Revision 2. Guide to Industrial Control Systems (ICS) Security; May 2015.
Available online: http://nvlpubs.nist.gov/nistpubs/SpecialPublications/NIST.SP.800-82r2.pdf (accessed on
5 June 2018).
5. Modbus TCP/IP. Available online: http://www.modbus.org/tech.php (accessed on 5 December 2017).
6. Modbus Application Protocol Specification V1.1b3. Available online: http://www.modbus.org/docs/
Modbus_Application_Protocol_V1_1b3.pdf (accessed on 8 August 2018).
7. Morris, T.; Gao, W. Industrial control system traffic data sets for intrusion detection research. In Critical
Infrastructure Protection VIII. ICCIP 2014. IFIP Advances in Information and Communication Technology; Butts, J.,
Shenoi, S., Eds.; Springer: Berlin/Heidelberg, Germany, 2014.
8. Miciolino, E.E.; Bernieri, G.; Pascucci, F.; Setola, R. Communications network analysis in a SCADA system
testbed under cyber-attacks. In Proceedings of the 23rd Telecommunications Forum, Belgrade, Serbia,
24–26 November 2015.
9. Rosa, L.; Cruz, T.; Simões, P.; Monteiro, E.; Lev, L. Attacking SCADA systems: A practical perspective.
In Proceedings of the IFIP/IEEE Symposium on Integrated Network and Service Management (IM), Lisbon,
Portugal, 8–12 May 2017.
10. Keliris, A.; Salehghaffari, H.; Cairl, B. Machine learning-based defense against process-aware attacks on
industrial control systems. In Proceedings of the IEEE International Test Conference (ITC), Fort Worth, TX,
USA, 15–17 November 2016.
11. Tomin, N.V.; Kurbatsky, V.G.; Sidorov, D.N.; Zhukov, A.V. Machine learning techniques for power system
security assessment. In Proceedings of the IFAC Workshop on Control of Transmission and Distribution
Smart Grids (CTDSG), Prague, Czech Republic, 11–13 October 2016.
12. Cherdantseva, Y.; Burnap, P.; Blyth, A.; Eden, P.; Jones, K.; Soulsby, H.; Stoddart, K. A review of cyber
security risk assessment methods for SCADA systems. Comput. Secur. 2016, 56, 1–27. [CrossRef]
13. An Industrial Control System Cybersecurity Performance Testbed. 2015. Available online: http://nvlpubs.
nist.gov/nistpubs/ir/2015/NIST.IR.8089.pdf (accessed on 3 June 2018).
14. DNP3. 2018. Available online: https://www.dnp.org/Pages/AboutDefault.aspx (accessed on 3 June 2018).
15. Darwish, I.; Igbe, O.; Saadawi, T. Experimental and theoretical modeling of DNP3 attacks in smart grids.
In Proceedings of the 36th IEEE Sarnoff Symposium, Newark, NJ, USA, 20–22 September 2016.
16. Li, Q.; Feng, X.; Wang, H.; Sun, L. Understanding the usage of industrial control system devices on the
internet. IEEE Internet Things J. 2018, 5, 2178–2189. [CrossRef]
17. Schneider PLC M241CE40. Available online: https://www.schneider-electric.us/en/product/
TM241CE40R/controller-m241-40-io-relay-ethernet/ (accessed on 8 August 2018).
18. Erickson, K.T. Programmable Logic Controllers: An Emphasis on Design and Application; Dogwood Valley Press,
LLC: Rolla, MO, USA, 2011.
19. Mantere, M.; Uusitalo, I.; Sailio, M.; Noponen, S. Challenges of machine learning based monitoring for
industrial control system networks. In Proceedings of the 26th International Conference on Advanced
Information Networking and Applications Workshops, Fukuoka, Japan, 26–29 March 2012.
20. Jordan, M.I.; Ng, A.Y. On discriminative vs. generative classifiers: A comparison of logistic regression and
naive bayes. In Proceedings of the 14th International Conference on Neural Information Processing Systems:
Natural and Synthetic, Vancouver, BC, Canada, 3–8 December 2001.
21. Zhang, J.; Zulkernine, M.; Haque, A. Random-forests-based network intrusion detection systems. IEEE Trans.
Syst. Man Cybern. Part C 2008, 38, 649–659. [CrossRef]
Future Internet 2018, 10, 76 15 of 15

22. Amor, N.B.; Benferhat, S.; Elouedi, Z. Naive bayes vs. decision trees in intrusion detection systems. In Proceedings
of the 2004 ACM Symposium on Applied Computing, Nicosia, Cyprus, 14–17 March 2004.
23. Chen, W.; Hsu, S.; Shen, H. Application of SVM and ANN for intrusion detection. Comput. Oper. Res. 2005,
32, 2617–2634. [CrossRef]
24. Zhang, H.; Berg, A.C.; Maire, M.; Malik, J. SVM-KNN: Discriminative nearest neighbor classification for
visual category recognition. In Proceedings of the IEEE Computer Society Conference on Computer Vision
and Pattern Recognition, New York, NY, USA, 17–22 June 2006.
25. Sokolova, M.; Lapalme, G. A systematic analysis of performance measures for classification tasks. J. Inf.
Process. Manag. 2009, 45, 427–437. [CrossRef]
26. Buda, M.; Maki, A.; Mazurowski, M.A. A Systematic Study of the Class Imbalance Problem in Convolutional
Neural Networks. Available online: https://arxiv.org/pdf/1710.05381.pdf (accessed on 20 November 2017).
27. He, H.; Eduardo, A. Learning from imbalanced data. IEEE Trans. Knowl. Data Eng. 2009, 21, 1263–1284.
28. Calderon, P. Nmap: Network Exploration and Security Auditing Cookbook, 2nd ed.; Packet Publishing:
Birmingham, UK, 2017.
29. Vulnerability & Exploit Database, Modbus Client Utility. Available online: https://www.rapid7.com/db/
modules/auxiliary/scanner/scada/modbusclient (accessed on 30 January 2017).
30. Wireshark. Available online: https://www.wireshark.org/ (accessed on 20 October 2017).
31. ARGUS. Available online: https://qosient.com/argus/ (accessed on 10 November 2017).
32. Mantere, M.; Sailio, M.; Noponen, S. Network traffic features for anomaly detection in specific industrial
control system network. Futur. Internet 2013, 5, 460–473. [CrossRef]
33. Salman, T.; Bhamare, D.; Erbad, A.; Jain, R.; Samaka, M. Machine learning for anomaly detection and
categorization in multi-cloud environments. In Proceedings of the 4th IEEE International Conference on
Cyber Security and Cloud Computing, New York, NY, USA, 26–28 June 2017.

© 2018 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access
article distributed under the terms and conditions of the Creative Commons Attribution
(CC BY) license (http://creativecommons.org/licenses/by/4.0/).

Automated Scada Systems
No ratings yet
Automated Scada Systems
158 pages
Securing Industrial Control Systems
100% (1)
Securing Industrial Control Systems
42 pages
Critical State Based Intrusion Detection
No ratings yet
Critical State Based Intrusion Detection
145 pages
New Project Page 3
No ratings yet
New Project Page 3
53 pages
Covidien RFA Vein Operator Manual
No ratings yet
Covidien RFA Vein Operator Manual
99 pages
Falih Badran 2020 IOP Conf. Ser. - Mater. Sci. Eng. 917 012059
No ratings yet
Falih Badran 2020 IOP Conf. Ser. - Mater. Sci. Eng. 917 012059
13 pages
Machine Learning-Based Intrusion Detection For Smart Grid Computing: A Survey
No ratings yet
Machine Learning-Based Intrusion Detection For Smart Grid Computing: A Survey
29 pages
SNCS D 23 05223 Reviewer
No ratings yet
SNCS D 23 05223 Reviewer
21 pages
Electronics 3103323 Peer Review v1
No ratings yet
Electronics 3103323 Peer Review v1
18 pages
SCADA Securing System Using Deep Learning To Prevent Cyber - 2023 - Neural Netwo
No ratings yet
SCADA Securing System Using Deep Learning To Prevent Cyber - 2023 - Neural Netwo
12 pages
Elastic Introduction To Application Performance Monitoring
No ratings yet
Elastic Introduction To Application Performance Monitoring
16 pages
ICC2021 FS IDS FinalCamera-Ready
No ratings yet
ICC2021 FS IDS FinalCamera-Ready
7 pages
Effective Vulnerability Management: Managing Risk in the Vulnerable Digital Ecosystem
From Everand
Effective Vulnerability Management: Managing Risk in the Vulnerable Digital Ecosystem
Chris Hughes
5/5 (1)
Deep Learning For Smart Grid Intrusion Detection: A Hybrid CNN-LSTM-Based Model
No ratings yet
Deep Learning For Smart Grid Intrusion Detection: A Hybrid CNN-LSTM-Based Model
16 pages
Improving SIEM For Critical SCADA Water
No ratings yet
Improving SIEM For Critical SCADA Water
17 pages
Design and Implementation of PV Emulator Based On Synchronous Buck Converter Using Arduino Nano Microcontroller
No ratings yet
Design and Implementation of PV Emulator Based On Synchronous Buck Converter Using Arduino Nano Microcontroller
9 pages
Configurable Mark VIe Control UKCA PDF
No ratings yet
Configurable Mark VIe Control UKCA PDF
1 page
Nigerian Air Force
No ratings yet
Nigerian Air Force
1 page
A Review of Research Works On Supervised Learning Algorithms For SCADA
No ratings yet
A Review of Research Works On Supervised Learning Algorithms For SCADA
19 pages
Cyber Security in Power System Report Final
No ratings yet
Cyber Security in Power System Report Final
28 pages
SSRN 4749820
No ratings yet
SSRN 4749820
8 pages
Conference Major CSE B14 PDF
No ratings yet
Conference Major CSE B14 PDF
15 pages
Power System Reliability Evaluation With SCADA Cybersecurity Considerations
No ratings yet
Power System Reliability Evaluation With SCADA Cybersecurity Considerations
15 pages
Construction and Evaluation of Defense-in-Depth Ar
No ratings yet
Construction and Evaluation of Defense-in-Depth Ar
5 pages
Processes 10 02504
No ratings yet
Processes 10 02504
20 pages
WUSTL-IIOT-2018 Dataset For ICS (SCADA) Cybersecurity Research
No ratings yet
WUSTL-IIOT-2018 Dataset For ICS (SCADA) Cybersecurity Research
3 pages
Ricardo Vargas Pmbok Flow 6ed Color En-A0
No ratings yet
Ricardo Vargas Pmbok Flow 6ed Color En-A0
121 pages
Anomaly Events Classification and Detection System in Critical Industrial Internet of Things Infrastructure Using Machine Learning Algorithms
No ratings yet
Anomaly Events Classification and Detection System in Critical Industrial Internet of Things Infrastructure Using Machine Learning Algorithms
22 pages
Machine Learning and Cyber Security: December 2017
No ratings yet
Machine Learning and Cyber Security: December 2017
8 pages
Industrial Control System-Anomaly Detection Dataset ICS-ADD For Cyber-Physical Security Monitoring in Smart Industry Environments
No ratings yet
Industrial Control System-Anomaly Detection Dataset ICS-ADD For Cyber-Physical Security Monitoring in Smart Industry Environments
10 pages
ICC2021 FS IDS FinalCamera-Ready
No ratings yet
ICC2021 FS IDS FinalCamera-Ready
7 pages
Paper 3 - Securing The Operations in SCADA-IoT Platform
No ratings yet
Paper 3 - Securing The Operations in SCADA-IoT Platform
12 pages
Splnproc1703 D
No ratings yet
Splnproc1703 D
12 pages
Sensors 23 02415
No ratings yet
Sensors 23 02415
18 pages
Electronics
No ratings yet
Electronics
28 pages
Dhs WP BP Machine Learning-Based Network Vulnerability Analysis of Industrial Internet of Things
No ratings yet
Dhs WP BP Machine Learning-Based Network Vulnerability Analysis of Industrial Internet of Things
1 page
A Semi-Supervised Approach For Detection of SCADA Attacks in Gas Pipeline Control Systems
No ratings yet
A Semi-Supervised Approach For Detection of SCADA Attacks in Gas Pipeline Control Systems
8 pages
Ics Survey
No ratings yet
Ics Survey
7 pages
A Survey On Industrial Control System Testbeds
No ratings yet
A Survey On Industrial Control System Testbeds
46 pages
Thesis
No ratings yet
Thesis
76 pages
Handa Sharma Shukla 2019 Machine Learning in Cybersecurity A Review
No ratings yet
Handa Sharma Shukla 2019 Machine Learning in Cybersecurity A Review
7 pages
February
No ratings yet
February
2 pages
MachineLearningNotes PDF
100% (1)
MachineLearningNotes PDF
299 pages
Nse4 7.2 April 2023 0021655255099
No ratings yet
Nse4 7.2 April 2023 0021655255099
193 pages
Dynamic Rule Generation For SCADA Intrusion Detect
No ratings yet
Dynamic Rule Generation For SCADA Intrusion Detect
6 pages
Classifier
No ratings yet
Classifier
9 pages
CSSE - Anomaly Detection in ICS Datasets With Machine Learning Algorithms
No ratings yet
CSSE - Anomaly Detection in ICS Datasets With Machine Learning Algorithms
14 pages
Paper 113
No ratings yet
Paper 113
10 pages
Splnproc1703 C
No ratings yet
Splnproc1703 C
12 pages
1 s2.0 S1874548222000348 Main
No ratings yet
1 s2.0 S1874548222000348 Main
15 pages
Machine Learning For Cybersecurity in Smart Grids
No ratings yet
Machine Learning For Cybersecurity in Smart Grids
24 pages
Journal Pre-Proof
No ratings yet
Journal Pre-Proof
29 pages
b0193bc N
No ratings yet
b0193bc N
148 pages
Lecture 09 - Sequential Quadratic Programming
No ratings yet
Lecture 09 - Sequential Quadratic Programming
4 pages
Review On PLC SCADA Based Automated System
No ratings yet
Review On PLC SCADA Based Automated System
6 pages
An Efficient Security Model For Industrial Internet of Things
No ratings yet
An Efficient Security Model For Industrial Internet of Things
12 pages
IEEE Xplore Conference-Template-A4
No ratings yet
IEEE Xplore Conference-Template-A4
4 pages
Texas Homework and Practice Workbook Holt Mathematics Course 2 Answers
100% (1)
Texas Homework and Practice Workbook Holt Mathematics Course 2 Answers
6 pages
Main
No ratings yet
Main
9 pages
Evaluation of Machine Learning Algorithms Used On Attacks Detection in Industrial Control Systems
No ratings yet
Evaluation of Machine Learning Algorithms Used On Attacks Detection in Industrial Control Systems
12 pages
Ayu Shahirah Salem: Objective
No ratings yet
Ayu Shahirah Salem: Objective
2 pages
Behavior Based Anomaly Detection Model in SCADA
No ratings yet
Behavior Based Anomaly Detection Model in SCADA
5 pages
I/A Series System V8.2 Release Notes and Installation Procedures
No ratings yet
I/A Series System V8.2 Release Notes and Installation Procedures
182 pages
Foxboro Control Software: ® Product Specifications
100% (1)
Foxboro Control Software: ® Product Specifications
8 pages
Indin08 Scada
No ratings yet
Indin08 Scada
7 pages
Fulltext
No ratings yet
Fulltext
4 pages
Sonica Eswar Resume
No ratings yet
Sonica Eswar Resume
1 page
A Case Study Applying The Cyber Security Modeling Language: 21, Rue D'artois, F-75008 PARIS
No ratings yet
A Case Study Applying The Cyber Security Modeling Language: 21, Rue D'artois, F-75008 PARIS
8 pages
Accurate Modeling of Modbus/TCP For Intrusion Detection in SCADA Systems
No ratings yet
Accurate Modeling of Modbus/TCP For Intrusion Detection in SCADA Systems
14 pages
Limooezekii Report 7
No ratings yet
Limooezekii Report 7
17 pages
IOT Based Ids System Using ANN
No ratings yet
IOT Based Ids System Using ANN
8 pages
Installation Guide On Windows
No ratings yet
Installation Guide On Windows
92 pages
Secure Ifac11
No ratings yet
Secure Ifac11
7 pages
b0700sn G
No ratings yet
b0700sn G
62 pages
Discontinuation of AxioVision Correlative Particle Analyzer (CAPA) and AxioVision For LCM Systems
No ratings yet
Discontinuation of AxioVision Correlative Particle Analyzer (CAPA) and AxioVision For LCM Systems
4 pages
Foxboro DCS: System Definition V3.6
100% (1)
Foxboro DCS: System Definition V3.6
20 pages
OPPIOT Catalogue
No ratings yet
OPPIOT Catalogue
39 pages
An Evaluation of Machine Learning Methods To Detect Malicious SCADA Communications PDF
No ratings yet
An Evaluation of Machine Learning Methods To Detect Malicious SCADA Communications PDF
6 pages
SY-5005005 HMI EngineersManual
No ratings yet
SY-5005005 HMI EngineersManual
136 pages
QP14 15 Informatics Practice XI Paper
No ratings yet
QP14 15 Informatics Practice XI Paper
12 pages
6 Best Practices To Optimize It Investment
No ratings yet
6 Best Practices To Optimize It Investment
6 pages
Mantra MFS 110
No ratings yet
Mantra MFS 110
8 pages
Multi IDS IEC61850 SCADA Networks
No ratings yet
Multi IDS IEC61850 SCADA Networks
11 pages
ITN260
No ratings yet
ITN260
7 pages
Installation Procedure
No ratings yet
Installation Procedure
9 pages
b0860rg B
No ratings yet
b0860rg B
34 pages
Remotewatch V3.1 Release Notes: I/A Series System
No ratings yet
Remotewatch V3.1 Release Notes: I/A Series System
36 pages
EJX430A General Spec
No ratings yet
EJX430A General Spec
11 pages
Complaint Type:Cyber Crime / Report & Track: Complainant Details
No ratings yet
Complaint Type:Cyber Crime / Report & Track: Complainant Details
2 pages
Remotewatch I/A Series Data Acquisition System (Das) V3.2 Installation and Configuration
No ratings yet
Remotewatch I/A Series Data Acquisition System (Das) V3.2 Installation and Configuration
36 pages
CCS0007 - Laboratory Exercise 3
No ratings yet
CCS0007 - Laboratory Exercise 3
17 pages
Privilege 12 Eylul 2022-2023 Answer Key PDF 10
No ratings yet
Privilege 12 Eylul 2022-2023 Answer Key PDF 10
1 page
Utrasonic Cleaning Generator - 2
No ratings yet
Utrasonic Cleaning Generator - 2
17 pages
ICS Frequently Asked Questions 21 July 2017
No ratings yet
ICS Frequently Asked Questions 21 July 2017
9 pages
Asa 5512x
No ratings yet
Asa 5512x
12 pages
Schematic - Zigbee Stick 4.0 CH340C
No ratings yet
Schematic - Zigbee Stick 4.0 CH340C
1 page
Cisco Asa 5525 X
No ratings yet
Cisco Asa 5525 X
16 pages
Foxboro Evo™ Process Automation System: Product Specifications
No ratings yet
Foxboro Evo™ Process Automation System: Product Specifications
8 pages
H92 Archival Cabling
No ratings yet
H92 Archival Cabling
5 pages
Dunbar Syndrome A Rare Case Report of Ce
No ratings yet
Dunbar Syndrome A Rare Case Report of Ce
3 pages
Cisco AppDynamics Associate Performance Analyst (500-420 CAAPA) – Study Guide
From Everand
Cisco AppDynamics Associate Performance Analyst (500-420 CAAPA) – Study Guide
Anand Vemula
No ratings yet
Ds Epolicy Orchestrator
No ratings yet
Ds Epolicy Orchestrator
7 pages
Exoplayer Dev Playlists HTML
No ratings yet
Exoplayer Dev Playlists HTML
1 page
Cisco ASA ASDM Configuration: Search
No ratings yet
Cisco ASA ASDM Configuration: Search
4 pages
PUBLISHED A Survey of SCADA Testbed Implementation Approaches
No ratings yet
PUBLISHED A Survey of SCADA Testbed Implementation Approaches
8 pages
TS 883xu RP E2124 8G
No ratings yet
TS 883xu RP E2124 8G
2 pages
Foxboro: System Auditor
No ratings yet
Foxboro: System Auditor
15 pages
QNAP NAS Vs HVE NAS
No ratings yet
QNAP NAS Vs HVE NAS
1 page
DLP 9.3P5 On ePO 5.3.0 Installation Document
No ratings yet
DLP 9.3P5 On ePO 5.3.0 Installation Document
3 pages
Foxboro Evo™ Process Automation System: Product Specifications
No ratings yet
Foxboro Evo™ Process Automation System: Product Specifications
12 pages
Ds Cyber Physical Threat Intelligence
No ratings yet
Ds Cyber Physical Threat Intelligence
2 pages
Lead Instrument System Analyst
No ratings yet
Lead Instrument System Analyst
1 page
Emmanuel NIYOMUGABO'S C.V
No ratings yet
Emmanuel NIYOMUGABO'S C.V
5 pages
CSMS. Cyber Security Management System. Conformity Assessment Scheme
No ratings yet
CSMS. Cyber Security Management System. Conformity Assessment Scheme
8 pages
Cirrus: SR22 / SR22T WM Temporary Revision 24-50-02 Electrical Power
No ratings yet
Cirrus: SR22 / SR22T WM Temporary Revision 24-50-02 Electrical Power
4 pages
Framework for SCADA Cybersecurity
From Everand
Framework for SCADA Cybersecurity
Richard Clark
5/5 (1)
VIH Series60
100% (2)
VIH Series60
1 page
Cyber Security Implications of SIS Integration With Control Networks
100% (2)
Cyber Security Implications of SIS Integration With Control Networks
34 pages
b0193df M
No ratings yet
b0193df M
182 pages

SCADA System Testbed For Cybersecurity R

Uploaded by

SCADA System Testbed For Cybersecurity R

Uploaded by

future internet

Keywords: cybersecurity; machine learning; SCADA system; network security

Future Internet 2018, 10, 76; doi:10.3390/fi10080076 www.mdpi.com/journal/futureinternet

2.1. ICS Reference Model

2.2. The SCADA Communication Protocol

Table 2. Description of the devices used in the testbed.

4. Machine Learning Algorithms and Performance Measurements

4.1. Machine Learning Algorithms

1. Logistic Regression [20].

The performance of these algorithms is discussed in Section 6.

4.2. Performance Measurements

Table 3. Confusion matrix in the intrusion detection system (IDS) context.

Data Class Classified as Normal Classified as Abnormal

5.1. Attack Scenarios

Table 4. Reconnaissance attacks carried out against our testbed [29,30].

Attack Name Attack Description

Table 5. Statistical information on the captured traffic.

5.2. Features Selection

Table 6. Features selected to create the dataset.

5.3. Evaluation Scenario

You might also like