
Limiting the Impact of Stealthy Attacks on Industrial Control Systems

David I. Urbina¹, Jairo Giraldo¹, Alvaro A. Cardenas¹, Nils Ole Tippenhauer², Junia Valente¹, Mustafa Faisal¹, Justin Ruths¹, Richard Candell³, and Henrik Sandberg⁴

¹University of Texas at Dallas, ²Singapore University of Technology and Design, ³National Institute of Standards and Technology, ⁴KTH Royal Institute of Technology

{david.urbina, jairo.giraldo, alvaro.cardenas, juniavalente, mustafa.faisal, jruths}@utdallas.edu, [email protected], [email protected], and [email protected]

CCS'16, October 24–28, 2016, Vienna, Austria. © 2016 Copyright held by the owner/author(s). Publication rights licensed to ACM. ACM ISBN 978-1-4503-4139-4/16/10. DOI: http://dx.doi.org/10.1145/2976749.2978388

ABSTRACT

While attacks on information systems have, for most practical purposes, binary outcomes (information was manipulated/eavesdropped, or not), attacks manipulating the sensor or control signals of Industrial Control Systems (ICS) can be tuned by the attacker to cause a continuous spectrum in damages. Attackers that want to remain undetected can attempt to hide their manipulation of the system by following closely the expected behavior of the system, while injecting just enough false information at each time step to achieve their goals.

In this work, we study whether physics-based attack detection can limit the impact of such stealthy attacks. We start with a comprehensive review of related work on attack detection schemes in the security and control systems communities. We then show that many of these works use detection schemes that do not limit the impact of stealthy attacks. We propose a new metric to measure the impact of stealthy attacks and how it relates to the selection of an upper bound on false alarms. We finally show that the impact of such attacks can be mitigated in several cases by the proper combination and configuration of detection schemes. We demonstrate the effectiveness of our algorithms through simulations and experiments using real ICS testbeds and real ICS systems.

Keywords

Industrial Control Systems; Intrusion Detection; Security Metrics; Stealthy Attacks; Physics-Based Detection; Cyber-Physical Systems

1. INTRODUCTION

One of the fundamentally unique and intrinsic properties of Industrial Control Systems (ICS)—when compared to general Information Technology (IT) systems—is that changes in the system's state must follow immutable laws of physics. For example, the physical properties of water systems (fluid dynamics) or the power grid (electromagnetics) can be used to create prediction models that we can then use to confirm that the control commands sent to the field were executed correctly and that the information coming from sensors is consistent with the expected behavior of the system: if we opened an intake valve, we would expect the water tank level to rise; otherwise we may have a problem with the control, the actuator, or the sensor.

The idea of using physics-based models of the normal operation of control systems to detect attacks has been used in an increasing number of publications in security conferences in the last couple of years. Applications include water control systems [21], state estimation in the power grid [35, 36], boilers in power plants [67], chemical process control [10], electricity consumption data from smart meters [40], and a variety of industrial control systems [42].

The growing number of publications shows the importance of leveraging the physical properties of control systems for security; however, a missing element in this growing body of work is a unified adversary model and security metric to help us compare the effectiveness of previous proposals. In particular, the problem we consider is one where the attacker knows the attack-detection system is in place and bypasses it by launching attacks imitating the expected behavior of the system, but different enough that over long periods of time it can drive the system to an unsafe operating state. This attacker is quite powerful and can provide an upper bound on the worst performance of our attack-detection tools.

Contributions. (i) We propose a strong adversary model that will always be able to bypass attack-detection mechanisms, and we propose a new evaluation metric for attack-detection algorithms that quantifies the negative impact of these stealthy attacks and the inherent trade-off with false alarms. Our new metric helps us compare in a fair way previously proposed attack-detection mechanisms.

(ii) We compare previous attack-detection proposals across three different experimental settings: a) a testbed operating real-world systems, b) network data we collected from an operational large-scale Supervisory Control and Data Acquisition (SCADA) system that manages more than 100 Programmable Logic Controllers (PLCs), and c) simulations.

(iii) Using these three scenarios, we find the following results: (a) while the vast majority of previous work uses stateless tests on residuals, stateful tests are better at limiting the impact of stealthy attackers (for the same levels of false alarms); (b) limiting the impact of a stealthy attacker can also depend on the specific control algorithm used, and not only on the attack-detection algorithm; (c) linear state-space models outperform output-only autoregressive models; (d) time- and space-correlated models outperform models that do not exploit these correlations; and (e) from the point of view of an attacker, launching undetected actuator attacks is more difficult than launching undetected false-data injection for sensor values.

The remainder of this paper is organized as follows: In § 2, we provide the scope of the paper and the background to analyze previous proposals. We introduce our attacker model and the need for new metrics in § 3. We introduce a way to evaluate the impact of undetected attacks and attack-detection systems in § 4, and then we use this adversary model and metric to evaluate the performance of these systems in physical testbeds, real-world systems, and simulations in § 5.

[Figure 1: Different attack points in a control system: (1) Attack on the actuators (blue): $v_k \neq u_k$, (2) Attack on the sensors (purple): $y_k \neq z_k$, (3) Attack on the controller (red): $u_k \neq K(y_k)$.]
2. BACKGROUND AND TAXONOMY

Scope of Our Study. We focus on using real-time measurements of the physical world to build indicators of attacks. In particular, we look at the physics of the process under control, but our approach can be extended to the physics of devices as well [18]. Our work is motivated by false sensor measurements [35, 58] or false control signals like manipulating vehicle platoons [19], manipulating demand-response systems [58], and the sabotage Stuxnet created by manipulating the rotation frequency of centrifuges [17, 32]. The question we are trying to address is how to detect these false sensor or false control attacks in real time.

2.1 Background

A general feedback control system has four components: (1) the physical phenomenon of interest (sometimes called the process or plant), (2) sensors that send a time series $y_k$, denoting the value of the physical measurement $z_k$ at time $k$ (e.g., the voltage at 3am is 120kV), to a controller, (3) a controller $K(y_k)$ that, based on the sensor measurements $y_k$ received, sends control commands $u_k$ (e.g., open a valve by 10%) to actuators, and (4) actuators that produce a physical change $v_k$ in response to the control command (the actuator is the device that opens the valve).

A general security monitoring architecture for control systems that looks into the "physics" of the system needs an anomaly detection system that receives as inputs the sensor measurements $y_k$ from the physical system and the control commands $u_k$ sent to the physical system, and then uses them to identify any suspicious sensor or control commands, as shown in Fig. 1.

2.2 Taxonomy

Anomaly detection is usually performed in two steps. First we need a model of the physical system that predicts the output of the system, $\hat{y}_k$. The second step compares that prediction $\hat{y}_k$ to the observations $y_k$ and then performs a statistical test on the difference. The difference between prediction and observation is usually called the residual, $r_k$. We now present our new taxonomy for related work, based on four aspects: (1) physical model, (2) detection statistic, (3) metrics, and (4) validation.

Physical Model. The model of how a physical system behaves can be developed from physical equations (Newton's laws, fluid dynamics, or electromagnetic laws), or it can be learned from observations through a technique called system identification [4, 38]. In system identification one often has to use either Auto-Regressive Moving Average with eXogenous inputs (ARMAX) or linear state-space models. Two popular models used by the papers we survey are Auto-Regressive (AR) models and Linear Dynamical State-space (LDS) models.

An AR model for a time series $y_k$ is given by

$$\hat{y}_{k+1} = \sum_{i=k-N}^{k} \alpha_i y_i + \alpha_0 \qquad (1)$$

where the coefficients $\alpha_i$ are obtained through system identification and the $y_i$ are the last $N$ sensor measurements. The coefficients $\alpha_i$ can be obtained by solving an optimization problem that minimizes the residual error (e.g., least squares) [37].
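To make Eq. (1) concrete, the following sketch fits the AR coefficients by least squares with numpy; the synthetic signal and the order N = 4 are illustrative placeholders, not the paper's data.

import numpy as np

def fit_ar(y, N):
    # Stack rows of the N previous samples plus a constant 1 for the
    # intercept; solve min ||X c - y_next|| by least squares.
    X = np.array([np.r_[y[t - N:t], 1.0] for t in range(N, len(y))])
    target = y[N:]
    coef, *_ = np.linalg.lstsq(X, target, rcond=None)
    return coef[:-1], coef[-1]            # alpha_1..alpha_N, alpha_0

def predict_next(y_recent, alphas, alpha0):
    # One-step prediction y_hat_{k+1} from the last N measurements.
    return float(np.dot(alphas, y_recent[-len(alphas):]) + alpha0)

rng = np.random.default_rng(0)
y = np.sin(np.linspace(0.0, 20.0, 500)) + 0.01 * rng.standard_normal(500)
alphas, alpha0 = fit_ar(y, N=4)
print(predict_next(y, alphas, alpha0))    # prediction for the next sample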
If we have inputs (control commands $u_k$) and outputs (sensor measurements $y_k$) available, we can use subspace model identification methods, producing LDS models:

$$x_{k+1} = A x_k + B u_k + \epsilon_k, \qquad y_k = C x_k + D u_k + e_k \qquad (2)$$

where $A$, $B$, $C$, and $D$ are matrices modeling the dynamics of the physical system. Most physical systems are strictly causal and therefore $D = 0$ in general. The control commands $u_k \in \mathbb{R}^p$ affect the next time step of the state of the system $x_k \in \mathbb{R}^n$, and sensor measurements $y_k \in \mathbb{R}^q$ are modeled as a linear combination of these hidden states. $e_k$ and $\epsilon_k$ are sensor and perturbation noise, and are assumed to be random processes with zero mean. To make a prediction, we i) first need $y_k$ and $u_k$ to obtain a state estimate $\hat{x}_{k+1}$, and ii) use the estimate to predict $\hat{y}_{k+1} = C \hat{x}_{k+1}$. A large body of work on power systems employs the second equation from Eq. (2) without the dynamic state equation. We refer to this special case of LDS used in power systems as Static Linear State-space (SLS) models.
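The prediction steps i)–ii) can be sketched as follows, assuming the model matrices have already been identified. A fixed observer gain L stands in for the Kalman filter typically used in this literature, and all matrices here are toy values, not an identified plant.

import numpy as np

A = np.array([[0.9, 0.1],
              [0.0, 0.8]])     # toy dynamics
B = np.array([[0.0],
              [0.5]])
C = np.array([[1.0, 0.0]])
L = np.array([[0.4],
              [0.1]])          # fixed observer gain (a Kalman gain in practice)

rng = np.random.default_rng(0)
x_true = np.zeros((2, 1))
x_hat = np.zeros((2, 1))
y_hat = C @ x_hat              # prediction for the first measurement
for k in range(100):
    u = np.array([[1.0]])                                  # command u_k
    y = C @ x_true + 0.01 * rng.standard_normal((1, 1))    # measurement y_k
    r = float(abs(y - y_hat))        # residual r_k fed to the detector
    # i) state estimate from y_k and u_k, ii) predict y_hat_{k+1} = C x_hat
    x_hat = A @ x_hat + B @ u + L @ (y - C @ x_hat)
    y_hat = C @ x_hat
    x_true = A @ x_true + B @ u + 0.001 * rng.standard_normal((2, 1))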
Detection Statistic. If the observations we get from sensors $y_k$ are significantly different from the ones we expect (i.e., if the residual is large), we generate an alert. A Stateless test raises an alarm for every deviation at time $k$, i.e., if $|y_k - \hat{y}_k| = r_k \geq \tau$, where $\tau$ is a threshold.

In a Stateful test we compute an additional statistic $S_k$ that keeps track of the historical changes of $r_k$ (no matter how small) and generate an alert if $S_k \geq \tau$, i.e., if there is a persistent deviation across multiple time steps. There are many tests that can keep track of the historical behavior of the residual $r_k$, such as taking an average over a time window, an exponentially weighted moving average (EWMA), or change detection statistics such as the non-parametric CUmulative SUM (CUSUM) statistic. The non-parametric CUSUM statistic is defined recursively as $S_0 = 0$ and $S_{k+1} = (S_k + |r_k| - \delta)^+$, where $(x)^+$ represents $\max(0, x)$ and $\delta$ is selected so that the expected value of $|r_k| - \delta$ is negative under hypothesis $H_0$ (i.e., $\delta$ prevents $S_k$ from increasing consistently under normal operation). An alert is generated whenever the statistic is greater than a previously defined threshold, $S_k > \tau$, and the test is then restarted with $S_{k+1} = 0$.
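Both tests reduce to a few lines. The following sketch implements the stateless check and the non-parametric CUSUM recursion exactly as defined above; the residual sequence, δ, and τ are made-up numbers for illustration.

def stateless_alarm(residual, tau):
    # Alarm on any single residual that reaches the threshold.
    return abs(residual) >= tau

def cusum_step(S, residual, delta, tau):
    # S_{k+1} = max(0, S_k + |r_k| - delta); alarm and restart when S > tau.
    S = max(0.0, S + abs(residual) - delta)
    if S > tau:
        return 0.0, True
    return S, False

# Usage on a made-up residual sequence: each residual is below the stateless
# threshold, but the persistent deviation drives the CUSUM statistic to alarm.
S = 0.0
for r in [0.1, 0.2, 0.15, 0.9, 0.8, 0.7]:
    S, stateful_alarm = cusum_step(S, r, delta=0.3, tau=1.0)
    print(stateless_alarm(r, tau=1.0), stateful_alarm, round(S, 2))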

The summary of our taxonomy for modeling the system and for detecting an anomaly in the residuals is given in Fig. 2.

[Figure 2: The detection block from Fig. 1, focusing on our taxonomy: a physical model (LDS or AR) produces the prediction $\hat{y}_k$ from $y_{k-1}$ and $u_k$; the residual $r_k = y_k - \hat{y}_k$ feeds a stateless or stateful anomaly detection test that raises alerts.]

Metrics. An evaluation metric is used to determine the effectiveness of the physics-based attack detection algorithm. Popular evaluation metrics are the True Positive Rate (TPR) and the False Positive Rate (FPR)—the trade-off between these two numbers is called the Receiver Operating Characteristic (ROC) curve. Some papers just plot the residuals (without quantifying the TPR or FPR values), and other papers just measure the impact of attacks.

Validation. The experimental setting used to validate proposals can involve simulations, data from real-world operating systems, or testbeds. Testbeds can be classified as testbeds controlling a real system or testbeds with Hardware-in-the-Loop (HIL), where part of the physical system is simulated in a computer. For our purposes a HIL testbed is similar to having pure simulations, because the model of the physical system is given by the algorithm running on a computer.

2.3 Limitations of Previous Work

There is a large variety of previous work, but because of the diversity of domains (e.g., power systems, industrial control, and theoretical studies) and academic venues (e.g., security, control theory, and power systems conferences), the field has not been presented in a unified way with a common language that can be used to identify trends, alternatives, and limitations. Using our previously defined taxonomy, in this section we discuss previous work and summarize our results in Table 1.

The columns in Table 1 are arranged by conference venue (we assigned workshops to the venue that the main conference is associated with); we also assigned conferences associated with CPSWeek to control conferences because of the overlap of attendees between both venues. We make the following observations: (1) the vast majority of prior work uses stateless tests; (2) most control and power grid venues use LDS (or their static counterpart SLS) to model the physical system, while computer security venues tend to use a variety of models, several of which are non-standard and difficult to replicate by other researchers; (3) there is no consistent metric or adversary model used to evaluate proposed attack-detection algorithms; and (4) no previous work has validated their work with all three options: simulations, testbeds, and real-world data.

The first three observations (1–3) are related: while previous work has used different statistical tests (stateless vs. stateful) and different models of the physical system to predict its expected behavior, so far they have not been compared against each other, and this makes it difficult to build upon previous work (it is impossible to identify best practices without a way to compare different proposals). To address this problem we propose a general-purpose evaluation metric in § 4 that leverages our stealthy adversary model, and we then compare previously proposed methods. Our results show that while stateless tests are more popular in the literature, stateful tests are better at limiting the impact of stealthy attackers. In addition, we show that LDS models are better than AR models, that AR models proposed in previous work can be improved by leveraging correlation among different signals, and that having an integral controller can limit the impact of stealthy actuation attacks.

To address point (4), we conduct experiments using all three options: a testbed with a real physical process under control (§ 5.1), real-world data (§ 5.2), and simulations (§ 5.3). We show the advantages and disadvantages of each experimental setup, and the insights each of these experiments provides.

3. MOTIVATING EXAMPLE

The testbed we use for our experiments is a room-size water treatment plant consisting of 6 stages to purify raw water. The testbed has a total of 12 PLCs (6 main PLCs and 6 in backup configuration to take over if the main PLC fails). The general description of each stage is as follows: Raw water storage is the part of the process where raw water is stored; it acts as the main water buffer supplying water to the water treatment system. It consists of one tank, an on/off valve that controls the inlet water, and a pump that transfers the water to the ultra filtration (UF) tank. In Pre-treatment, the Conductivity, pH, and Oxidation-Reduction Potential (ORP) are measured to determine the activation of chemical dosing to maintain the quality of the water within some desirable limits. This stage is illustrated in Fig. 3 and will be used in our motivating example. Ultra Filtration is used to remove the bulk of the feed water solids and colloidal material by using fine filtration membranes that only allow the flow of small molecules. After the small residuals are removed by the UF system, the remaining chlorines are destroyed in the Dechlorinization stage, using an ultraviolet chlorine destruction unit and by dosing a solution of sodium bisulphite. The Reverse Osmosis (RO) system is designed to reduce inorganic impurities by pumping the filtered and dechlorinated water at high pressure. Finally, the RO final product stage stores the RO product (clean water).

[Figure 3: Stage controlling the pH level.]

Attacking the pH level. In this process, the water's pH level is controlled by dosing the water with Hydrochloric Acid (HCl). Fig. 4 illustrates the normal operation of the plant: if the pH sensor reports a level above 7.05, the PLC sends a signal to turn On the HCl pump, and if the sensor reports a level below 6.95, it sends a signal to turn it Off.
[Table 1: Taxonomy of related work. Columns are organized by publication venue (Control, Smart/Power Grid, Security, Misc.), with one column per surveyed work: [21] Hadziosmanovic et al., [57] Sridhar, Govindarasu, [29] Koutsandria et al., [65] Valente, Cardenas, [50] Pasqualetti et al., [54] Sandberg et al., [10] Cardenas et al., [40] Mashima et al., [13] Dan, Sandberg, [49] Parvania et al., [55] Shoukry et al., [59, 60, 61] Teixeira et al., [47] Morrow et al., [30] Krotofil et al., [66] Vukovic, Dan, [9] Carcano et al., [44] Mo, Sinopoli, [35, 36] Liu et al., [1, 2] Amin et al., [53] Sajjad et al., [42] McLaughlin, [28] Kosut et al., [25] Kerns et al., [31] Kwon et al., [67] Wang et al., [33] Liang et al., [14] Davis et al., [20] Giani et al., [8] Bobba et al., [43] Miao et al., [16] Eyisi et al., [26] Kim, Poor, [27] Kiss et al., [6] Bai, Gupta, [23] Hou et al., [12] Cui et al., [22] Hei et al., [34] Lin et al., [46] Mo et al., [45] Mo et al., [15] Do et al., [7] Bai et al., [56] Smith. The rows mark, for each work: the Detection Statistic used (stateless, stateful); the Physical Model used (AR, SLS, LDS, other); the Metrics reported (attack impact, statistic visualization, True Positive Rate, False Positive Rate); and the Validation method (simulation, real data, testbed). Legend: filled circle = feature considered by the authors; half circle = feature assumed implicitly but exhibits ambiguity; marked circle = a windowed stateful detection method is used.]

The wide oscillations of the pH levels occur because there is a delay between the control actions of the HCl pump and the water pH responding to them.

[Figure 4: During normal operation, the water pH is kept at safe levels.]

To detect attacks on the PLC, the pump, or the sensor, we need to create a model of the physical system. While the system is nonlinear, let us first attempt to model it as a time-delayed LDS of order 2. The model is described by $pH_{k+1} = pH_k + u_{k-T_{delay}}$, where we estimate (by observing the process behavior) $u_{k-T_{delay}} = -0.1$ after a delay of 35 time steps after the pump is turned On, and $+0.1$ after a delay of 20 time steps after it is turned Off. We then compare the predicted and observed behavior, compute the residual, and apply a stateless and a stateful test to the residual. If either of these statistics goes above a defined threshold, we raise an alarm.
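A minimal sketch of this delayed on/off prediction model is shown below; the pump schedule is hypothetical, while the 0.1 pH-per-step slope and the 35/20-step delays are the values estimated above.

def predict_ph_next(ph_k, pump_on, k):
    # pH_{k+1} = pH_k + u_{k - T_delay}: the delayed pump state decides u.
    if k >= 35 and pump_on[k - 35]:
        return ph_k - 0.1     # acid dosing (pump On) eventually lowers the pH
    if k >= 20 and not pump_on[k - 20]:
        return ph_k + 0.1     # with the pump Off, the pH drifts back up
    return ph_k

pump_on = [i % 70 < 35 for i in range(200)]   # hypothetical On/Off schedule
ph = [7.0]
for k in range(199):
    ph.append(predict_ph_next(ph[-1], pump_on, k))
# Comparing ph[k] against the reported measurement yields the residual r_k.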
We note that high or low pH levels can be dangerous. In particular, if the attacker can drive the pH below 5, the acidity of the water will damage the membranes of the Ultra Filtration and Reverse Osmosis stages, the pipes, and even the sensor probes.

We launch a wired Man-In-The-Middle (MitM) attack between the field devices (sensors and actuators) and the PLC by injecting a malicious device into the EtherNet/IP ring of the testbed, given that the implementation of this protocol is unauthenticated. A detailed implementation of our attack is given in our previous work [64]. In particular, our MitM intercepts sensor values coming from the HCl pump and the pH sensor, and intercepts actuator commands going to the HCl pump, in order to inject false sensor readings and false commands sent to the PLC and the HCl pump.

[Figure 5: Attack on the pH sensor. Both the stateful and the stateless detection metrics cross the alarm threshold.]

Our attack sends false sensor data to the PLC, faking a high pH level so the pump keeps running and thus driving the acidity of the water to unsafe levels, as illustrated in Fig. 5. Notice that both the stateless and the stateful tests detect this attack (each test has a different threshold, set to maintain a probability of false alarm of 0.01).

[Figure 6: Attack on the pump actuator. Only the stateful detection metric crosses the alarm threshold.]

We also launched an attack on the pump (actuator). Here the pump ignores Off control commands from the PLC and sends back messages stating that it is indeed Off, while in reality it is On. As illustrated in Fig. 6, only the stateful test detects this attack. We also launched several random attacks that were easily detected by the stateful statistic, and if we were to plot the ROC curve of these attacks, we would get a 100% detection rate.

Observations. As we can see, it is very easy to create attacks that can be detected. From these experiments we could initially conclude that our LDS model combined with stateful anomaly detection is good enough; after all, they detected all the attacks we launched. However, are these attacks enough to conclude that our LDS model is good enough? And if these attacks are not enough, then which types of attacks should we launch?

Notice that for any physical system, a sophisticated attacker can spoof deviations that follow relatively closely the "physics" of the system while still driving the system to a different state. How can we measure the performance of our anomaly detection algorithm against these attacks? How can we measure the effectiveness of our anomaly detection tool if we assume that the attacker will always adapt to our algorithms and launch an undetected attack? And if our algorithms are not good enough, how can we design better algorithms? If by definition the attack is undetected, then we will always have a 0% true positive rate; therefore we need to devise new metrics to evaluate our systems.

[Figure 7: Our attacker adapts to different detection thresholds: if we select τ₂, the adversary launches an attack such that the detection statistic (dotted blue) remains below τ₂. If we lower our threshold to τ₁, the adversary selects a new attack such that the detection statistic (solid red) remains below τ₁.]

4. A STRONGER ADVERSARY MODEL

We assume an attacker that has compromised a sensor (e.g., the pH level in our motivating example) or an actuator (e.g., the pump in our motivating example) in our system. We also assume that the adversary has complete knowledge of our system, i.e., she knows the physical model we use, the statistical test we use, and the thresholds we select to raise alerts. Given this knowledge, she generates a stealthy attack where the detection statistic will always remain below the selected threshold.

While similar stealthy attacks have been previously proposed [13, 35, 36], in this paper we extend them to generic control systems including process perturbations and measurement noise, we force the attacks to remain stealthy against stateful tests, and we also force the adversary to optimize the negative impact of the attack. In addition, we assume our adversary is adaptive, so if we lower the threshold to fire an alert, the attacker will also change the attack so that the anomaly detection statistic remains below the threshold. This last property is illustrated in Fig. 7.

Notice that this type of adaptive behavior is different from how traditional metrics such as ROC curves work, because they use the same attacks for different thresholds of the anomaly detector. Our adversary model, on the other hand, requires a new and unique (undetected) attack specifically tailored to every anomaly detection threshold. If we try to compute an ROC curve under our adversary model, we would get a 0% detection rate, because the attacker would generate a new undetected attack for every anomaly detection threshold.

This problem is not unique to ROC curves: most popular metrics for evaluating the classification accuracy of intrusion detection systems (like the intrusion detection capability, the Bayesian detection rate, accuracy, expected cost, etc.) are known to be a multi-criteria optimization problem between two fundamental trade-off properties: the false alarm rate and the true positive rate [11]. As we have argued, using any metric that requires a true positive rate will be ineffective against our adversary model launching undetected attacks.

Observation. Most intrusion detection metrics are variations of the fundamental trade-off between false alarms and true positive rates [11]; however, our adversary by definition will never be detected, so we cannot use true positive rates (or variations thereof). Notice, however, that by forcing our adversary to remain undetected, we are effectively forcing her to launch attacks that follow closely the physical behavior of the system (more precisely, we are forcing our attacker to follow more closely our Physical Model), and by following the behavior of the system more closely, the attack impact is reduced: the attack needs to appear to be plausible physical system behavior. So the trade-off we are looking for with this new adversary model is not one of false positives vs. true positives, but one between false positives and the impact of undetected attacks.

New Metric. To define precisely what we mean by the impact of an undetected attack, we select one (or more) variables of interest (usually a variable whose compromise can affect the safety of the system) in the process we want to control—e.g., the pH level in our motivating example. The impact of the undetected attack will then be how much the attacker can drive that value towards its intended goal (e.g., how much the attacker can lower the pH level while remaining undetected) per unit of time.

Therefore we propose a new metric consisting of the trade-off between the maximum deviation per time unit imposed by undetected attacks (y-axis) and the expected time between false alarms (x-axis). Our proposed trade-off metric is illustrated in Fig. 8, and its comparison to the performance of Receiver Operating Characteristic (ROC) curves against our proposed adversary model is illustrated in Fig. 9.

[Figure 8: Illustration of our proposed tradeoff metric. The y-axis is a measure of the maximum deviation imposed by undetected attacks per time unit, ΔX/T_U, and the x-axis represents the expected time between false alarms, E[T_fa]. Anomaly detection algorithms are then evaluated at different points in this space: less deviation = more secure; longer time between false alarms = more usable. Detector 2 is better than Detector 1 if, for the same level of false alarms, undetected attackers can cause less damage to the system.]

[Figure 9: Comparison of ROC curves with our proposed metric: ROC curves are not a useful metric against a stealthy and adaptive adversary.]

Notice that while the y-axis of our proposed metric is completely different from ROC curves, the x-axis is similar, with one key difference: instead of using the probability of false alarms, we use the expected time between false alarms, $E[T_{fa}]$. This quantity has a couple of advantages over the false alarm rate: (1) it addresses the deceptive nature of low false alarm rates due to the base-rate fallacy [5], and (2) it addresses the problem that several anomaly detection statistics make a decision ("alarm" or "normal behavior") at non-constant time intervals.

We now describe how to compute the y-axis and the x-axis of our proposed metric.

4.1 Computing the X and Y Axes of Fig. 8

Computing Attacks Designed for the Y-axis of our Metric. The adversary wants to maximize the deviation of a variable of interest $y_k$ (per time unit) without being detected. The true value of this variable is $y_\kappa, y_{\kappa+1}, \ldots, y_N$, and the attack starts at time $\kappa$, resulting in a new observed time series $y^a_\kappa, y^a_{\kappa+1}, \ldots, y^a_N$. The goal of the attacker is to maximize the distance $\max_i \|y_i - y^a_i\|$. Recall that in general $y_k$ can be a vector of $n$ sensor measurements, and that the attack $y^a_k$ is a new vector where some (or all) of the sensor measurements are compromised.

An optimal greedy attack ($y^{a*}$) at time $k \in [\kappa, \kappa_f]$ (where $\kappa$ and $\kappa_f$ are the initial and final attack times, respectively) satisfies the equation $y^{a*}_{k+1} = \arg\max_{y^a_{k+1}} f(y^a_{k+1})$, subject to not raising an alert, where $f(y^a_{k+1})$ is defined by the designer of the detection method to quantify the attack impact (instead of max it can be min). For instance, if $f(y^a_{k+1}) = \|y_{k+1} - y^a_{k+1}\|$, the greedy attack for a stateless test is $y^{a*}_{k+1} = \hat{y}_{k+1} \pm \tau$. The greedy optimization problem for an attacker facing a stateful CUSUM test becomes $y^{a*}_{k+1} = \max\{y^a_{k+1} : S_{k+1} \leq \tau\}$. Because $S_{k+1} = (S_k + |r_k| - \delta)^+$, the optimal attack is given when $S_k = \tau$, which results in $y^a_{k+1} = \hat{y}_{k+1} \pm (\tau + \delta - S_k)$. For all attack times $k$ greater than the initial attack time $\kappa$, $S_k = \tau$ and $y^a_{k+1} = \hat{y}_{k+1} \pm \delta$.
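For a scalar sensor, these greedy attacks can be sketched directly from their closed forms; the numbers below are illustrative. Note how, once the CUSUM statistic saturates at τ, the attacker's per-step gain collapses from τ + δ to δ.

def greedy_stateless(y_hat_next, tau):
    # Largest spoofed value with |y_a - y_hat| <= tau (no alarm raised).
    return y_hat_next + tau

def greedy_cusum(y_hat_next, S, tau, delta):
    # Push the CUSUM statistic exactly to tau without crossing it.
    return y_hat_next + (tau + delta - S)

S, tau, delta = 0.0, 1.0, 0.05
for step in range(3):
    y_a = greedy_cusum(10.0, S, tau, delta)        # with y_hat_{k+1} = 10.0
    S = max(0.0, S + abs(y_a - 10.0) - delta)      # detector update
    print(step, round(y_a, 2), round(S, 2))        # S stays pinned at tau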
Generating undetectable actuator attacks is more difficult than sensor attacks, because in several practical cases it is impossible to predict the outcome $y_{k+1}$ with 100% accuracy given the actuation attack signal $v_k$ in Fig. 1. For our experiments where the control signal is compromised, in § 5.3, we use the linear state-space model from Eq. (2) to do a reverse prediction from the intended $y_{k+1}$ to obtain the control signal $v_k$ that will generate that next sensor observation.

Computing the X-axis of our Metric. Most of the literature that reports false alarms uses the false alarm rate metric. This value obscures the practical interpretation of false alarms: for example, a 0.1% false alarm rate depends on the number of times an anomaly decision was made and on the time-duration of the experiment, and these are variables that can be selected. For example, a stateful anomaly detection algorithm that monitors the difference between expected $\hat{y}_k$ and observed $y_k$ behavior has three options with every new observation $k$: (1) it can declare the behavior normal, (2) it can generate an alert, or (3) it can decide that the current evidence is inconclusive and take one more measurement, $y_{k+1}$.

Because the amount of time $T$ that we have to observe the process before making a decision is not fixed, but rather is a variable that can be selected, using the false alarm rate is misleading, and therefore we have to use ideas from sequential detection theory [24]. In particular, we use the average time between false alarms $T_{FA}$, or more precisely, the expected time between false alarms $E[T_{FA}]$. We argue that telling security analysts that, e.g., they should expect a false alarm every hour is a more direct and intuitive metric than giving them a probability of false alarm over a decision period that will be variable if we use stateful anomaly detection tests. This way of measuring alarms also deals with the base rate fallacy, which is the problem where low false alarm rates such as 0.1% do not have any meaning unless we understand the likelihood of attacks in the dataset (the base rate of attacks). If the likelihood of attack is low, then low false alarm rates can be deceptive [5].

In all the experiments, the usability metric for each evaluated detection mechanism is obtained by counting the number of false alarms $n_{FA}$ for an experiment of duration $T_E$ under normal operation (without attacks), so for each threshold $\tau$ we calculate the estimated time to a false alarm by $E[T_{fa}] \approx T_E / n_{FA}$. Computing the average time between false alarms for the CUSUM test is more complicated than for the stateless test: in the CUSUM case, we need to compute the evolution of the statistic $S_k$ for every threshold we test, because once $S_k$ hits the threshold we have to reset it to zero.

Notice that while we have defined a specific impact of undetected attacks for our y-axis for clarity, we believe that designers who want to evaluate their system using our metric should define an appropriate worst-case undetected-attack optimization problem specifically for their system. In particular, the y-axis can be a representation of a cost function $f$ of interest to the designer. There are a variety of metrics (optimization objectives) that can be measured, such as the product degradation from undetected attacks, the historical deviation of the system under attack $\sum_i |y^a_i - \hat{y}_i|$, or the deviation at the end of the attack $|y^a_N - \hat{y}_N|$, etc. A summary of how to compute the y-axis and the x-axis of our metric is given in Algorithms 1 and 2.

Algorithm 1: Computing the Y axis
1: Define $f(y^a_{k+1})$
2: Select $\tau_{set} = \{\tau_1, \tau_2, \ldots\}$, $\kappa$, $\kappa_f$, and $K_{set} = \{\kappa, \ldots, \kappa_f - 1\}$
3: $\forall (\tau, k) \in \tau_{set} \times K_{set}$, find
4:   $y^{a*}_{k+1}(\tau) = \arg\max_{y^a_{k+1}} f(y^a_{k+1})$  s.t.  Detection Statistic $\leq \tau$
5: $\forall \tau \in \tau_{set}$, calculate  y-axis $= \max_{k \in K_{set}} f(y^{a*}_{k+1}(\tau))$

Algorithm 2: Computing the X axis
1: Collect observations $Y^{na}$ with no attacks, of time-duration $T_E$
2: $\forall \tau \in \tau_{set}$, compute the Detection Statistic $DS(Y^{na})$, the number of false alarms $n_{FA}(DS, \tau)$, and  x-axis $= E[T_{fa}(\tau)] = T_E / n_{FA}$
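A compact sketch of both algorithms for the scalar CUSUM case follows; the attack-free residual trace is synthetic, and the impact function f is taken to be the per-step deviation of the greedy attack from § 4.1, so the y-axis has a closed form.

import numpy as np

def x_axis(r_normal, tau, delta, T_E):
    # Algorithm 2: run the CUSUM on attack-free residuals, count alarms
    # (resetting S after each one), and estimate E[T_fa] ~ T_E / n_FA.
    S, n_fa = 0.0, 0
    for r in r_normal:
        S = max(0.0, S + abs(r) - delta)
        if S > tau:
            n_fa += 1
            S = 0.0
    return T_E / max(n_fa, 1)

def y_axis(n_steps, tau, delta):
    # Algorithm 1 for the greedy CUSUM attack with f = per-step deviation:
    # the attacker gains (tau + delta) on the first step and delta afterwards,
    # reported here as deviation imposed per time unit.
    return (tau + delta + (n_steps - 1) * delta) / n_steps

rng = np.random.default_rng(1)
r_normal = 0.05 * np.abs(rng.standard_normal(8 * 3600))  # 8 h at 1 sample/s
for tau in (0.5, 1.0, 2.0):
    print(tau, x_axis(r_normal, tau, delta=0.1, T_E=8 * 3600.0),
          y_axis(n_steps=600, tau=tau, delta=0.1))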

5. EXPERIMENTAL RESULTS

Table 2: Advantages and disadvantages of different evaluation setups.

Reliability of:   X-Axis               Y-Axis
Real Data         well suited          least suitable
Testbed           partially suitable   partially suitable
Simulation        least suitable       well suited

We evaluate anomaly detection systems in light of our Stronger Adversary Model (see § 4), using our new metrics in a range of test environments with individual strengths and weaknesses (see Table 2). As shown in the table, real-world data allows us to analyze operational large-scale scenarios, and therefore it is the best way to test the x-axis metric $E[T_{fa}]$. Unfortunately, real-world data does not give researchers the flexibility to launch attacks and measure their impact on all parts of the system. Such interactive testing requires the use of a dedicated physical testbed.

A physical testbed typically has a smaller scale than a real-world operational system, so the fidelity in false alarms might not be as good as with real data; on the other hand, we can launch attacks. The attacks we can launch are, however, constrained, because physical components and devices may suffer damage from attacks that violate the safety requirements and conditions for which they were designed. Moreover, attacks could also drive the testbed to states that endanger the operator's and the environment's safety. Therefore, while a testbed provides more experimental interaction than real data, it introduces safety constraints for launching attacks.

Simulations, on the other hand, do not have these constraints, and a wide variety of attacks can be launched. So our simulations will focus on attacks on actuators and will demonstrate settings that cannot be achieved while operating a real-world system because of safety constraints. Simulations also allow us to easily change the control algorithms, and to our surprise, we found that control algorithms have a big impact on the ability of our attacker to achieve good results on the y-axis of our metric. However, while simulations allow us to test a wide variety of attacks, the false alarms measured with a simulation are not going to be as representative as those obtained from real data or from a testbed.

5.1 Physical Testbed (EtherNet/IP packets)

In this section, we focus on testbeds that control a real physical process, as opposed to testbeds that use a Hardware-In-the-Loop (HIL) simulation of the physical process. A HIL testbed is similar to the experiments we describe in § 5.3.

We developed an attacker who has complete knowledge of the physical behavior of the system and can manipulate EtherNet/IP packets and inject attacks. We now apply our metric to the experiments we started in § 3.

Attacking the pH Level. Because this system is highly nonlinear, apart from the simple physical model (LDS) of order 2 we presented in § 3, we also applied system identification to calculate higher-order system models: an LDS model of order 20 and two nonlinear models (orders 50 and 100) based on wavelet networks [52]. Fig. 10 shows the minimum pH achieved by the attacker after 4 minutes against the three different models. Notice that the nonlinear models limited the impact of the stealthy attack by not allowing deviations below a pH of 5, while our linear model (which was successful in detecting attacks in our motivating example) was not able to prevent the attacker from taking the pH below 5.

[Figure 10: pH deviation imposed by greedy attacks while using stateful detection (τ = 0.05) with both LDS and nonlinear models.]

[Figure 11: Comparison of LDS and nonlinear models in limiting attack impact using our metric (Δ pH/min vs. E[T_fa] in minutes). Higher-order nonlinear models perform better.]

Fig. 11 illustrates the application of our proposed metric over 10 different undetected greedy attacks, each averaging 4 minutes, to evaluate the three system models used for detection. Given enough time, it is not possible to restrict a deviation of pH below 5. Nevertheless, for all $E[T_{fa}]$ (min), the nonlinear model of order 100 performs better than the nonlinear model of order 50 and the LDS of order 20, limiting the impact of the attack per minute, Δ pH/min. It would take over 5 minutes for the attacker to deviate the pH below 5 without being detected when using the nonlinear model of order 100, whereas it would take less than 3 minutes with the nonlinear model of order 50 and the LDS of order 20.
5.1.1 Attacking the Water Level

Now we turn to another stage in our testbed. The goal of the attacker this time is to deviate the water level in a tank as much as possible until the tank overflows.

While in the pH example we had to use system identification to learn LDS and nonlinear models, the evolution of the water level in a tank is a well-known LDS system that can be derived from first principles. In particular, we use a mass balance equation that relates the change in the water level $h$ to the inlet $Q^{in}$ and outlet $Q^{out}$ volumes of water, given by $Area \frac{dh}{dt} = Q^{in} - Q^{out}$, where $Area$ is the cross-sectional area of the base of the tank. Note that in this process the control actions for the valve and pump are On/Off. Hence, $Q^{in}$ and $Q^{out}$ remain constant if they are open, and zero otherwise. Using a time-discretization of 1 s, we obtain an LDS model of the form

$$h_{k+1} = h_k + \frac{Q^{in}_k - Q^{out}_k}{Area}.$$

Note that while this equation might look like an AR model, it is in fact an LDS model, because the input $Q^{in}_k - Q^{out}_k$ changes over time depending on the control actions of the PLC (open/close inlet or start/stop pump). In particular, it is an LDS model with $x_k = h_k$, $u_k = [Q^{in}_k, Q^{out}_k]^T$, $B = [\frac{1}{Area}, -\frac{1}{Area}]$, $A = 1$, and $C = 1$.
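The same mass-balance model doubles as simulator and one-step predictor, as in the following sketch; the tank area, flow rates, disturbance, and on/off control logic are hypothetical values, not the testbed's.

import random

AREA = 1.5                    # m^2, cross-section of the tank
Q_IN, Q_OUT = 0.02, 0.015     # m^3/s when the valve/pump are active

def level_next(h, valve_open, pump_on):
    # h_{k+1} = h_k + (Q_in - Q_out) / Area with a 1 s step.
    q_in = Q_IN if valve_open else 0.0
    q_out = Q_OUT if pump_on else 0.0
    return h + (q_in - q_out) / AREA

h, h_hat = 0.5, 0.5
for k in range(600):
    valve_open, pump_on = h < 0.8, h > 0.6    # hypothetical on/off control
    h_next = level_next(h, valve_open, pump_on) + random.gauss(0.0, 0.001)
    h_hat_next = level_next(h_hat, valve_open, pump_on)   # model prediction
    residual = abs(h_next - h_hat_next)   # feeds the stateless/CUSUM tests
    h, h_hat = h_next, h_hat_next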
Recall that the goal of the attacker is to deviate the water level in a tank as much as possible until the tank overflows. In particular, the attacker increases the water level sensor signal at a lower rate than the real level of water (Fig. 12), with the goal of overflowing the tank. A successful attack occurs if the PLC receives from the sensor a High water-level message (the point when the PLC sends a command to close the inlet) and, at that point, the deviation (Δ) between the real level of water and the "fake" level (which just reached the High warning) is Δ ≥ Overflow − High. Fig. 12 shows three water level attacks with different increment rates, starting from the Low level setting and stopping at the High level setting, and their induced maximum Δ over the real level. Only attacks a1 and a2 achieve a successful overflow (only a2 achieves a water spill), while a3 deviates the water level without overflow. In our experiment, High corresponds to a water level of 0.8 m and Low to 0.5 m. Overflow occurs at 1.1 m. The testbed has a drainage system to allow attacks that overflow the tank.

[Figure 12: Impact of different increment rates on the overflow attack. The attacker has to select the rate of increase with the lowest slope while remaining undetected.]

Because it was derived from "first principles," our LDS model is a highly accurate physical model of the system, so there is no need to test alternative physical models. However, we can combine our LDS model with a stateless test and with a stateful test, and see which of these detection tests can limit the impact of stealthy attacks.

In particular, to compute our metric we need to test stateless and stateful mechanisms and obtain the security metric that quantifies the impact of undetected attacks for several thresholds $\tau$. We selected the parameter $\delta = 0.002$ for the stateful (CUSUM) algorithm, such that the detection metric $S_k$ remains close to zero when there is no attack. The usability metric is calculated for $T_E = 8$ h, which is the duration of the experiment without attacks.

[Figure 13: Comparison of stateful and stateless detection (Δ m/sec vs. E[T_fa] in minutes). At Δ = 0.3 m the tank overflows, so stateless tests are not good for this use case; the labeled thresholds correspond to the thresholds associated with particular values of E[T_fa].]

Fig. 13 illustrates the maximum impact caused by 20 different undetected attacks, each of them averaging 40 minutes. Even though the attacks remained undetected, the impact when using stateless detection is such that a large amount of water can be spilled. Only for very small thresholds is it possible to avoid overflow, but this causes a large number of false alarms. On the other hand, stateful detection limits the impact of the adversary. Note that to start spilling water (i.e., Δ > 0.3 m) a large threshold is required. Clearly, selecting a threshold such that $E[T_{fa}] = 170$ min can avoid the spilling of water with a considerable but tolerable number of false alarms.

In addition to attacking sensor values, we would like to analyze undetected actuation attacks. To launch attacks on the actuators (pumps) of this testbed, we would need to turn them On and Off in rapid succession in order to try to keep the residuals of the system low enough to avoid being detected. We cannot do this on real equipment because the pumps would get damaged. Therefore, we analyze undetected actuator attacks with simulations (where equipment cannot be damaged) in § 5.3.

5.2 Large-Scale Operational Systems (Modbus packets)

We were allowed to place a network sniffer on a real-world operational large-scale water facility in the U.S. We collected more than 200GB of network packet captures of a system using the Modbus/TCP [63] industrial protocol. Our goal is to extract the sensor and control commands from this trace and to evaluate and compare the alternatives presented in the survey.
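As an illustration of this kind of extraction pipeline, the sketch below assumes the register values have already been exported from the network analysis to a CSV file with columns time, plc, register, and value (a hypothetical file name and layout, not Bro's actual output format), and uses pandas to build one time series per register; the constant/discrete/continuous split is a simple value-diversity heuristic standing in for the characterization of [21].

import pandas as pd

log = pd.read_csv("modbus_registers.csv")   # hypothetical export of the trace
# One time series y_k per (PLC, register) pair:
series = {
    (plc, reg): grp.sort_values("time")["value"].to_numpy()
    for (plc, reg), grp in log.groupby(["plc", "register"])
}

def classify(values, discrete_max=10):
    # Simple value-diversity heuristic: constant / discrete / continuous.
    distinct = len(set(values.tolist()))
    if distinct == 1:
        return "constant"
    return "discrete" if distinct <= discrete_max else "continuous"

labels = {key: classify(vals) for key, vals in series.items()}
continuous = [key for key, lab in labels.items() if lab == "continuous"]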
The network has more than 100 controllers, some of them with more than a thousand registers. In particular: 1) 95% of transmissions are Modbus packets and the remaining 5% correspond to general Internet protocols; 2) the trace captured 108 Modbus devices, of which one acts as central master, one as external network gateway, and 106 are slave PLCs; 3) of the commands sent from the master to the PLCs, 74% are Read/Write Multiple Registers (0x17) commands, 20% are Read Coils (0x01) commands, and 6% are Read Discrete Inputs (0x02) commands; and 4) 78% of PLCs have 200 to 600 registers, 15% between 600 and 1000, and 7% more than 1000.

We replay the traffic traces in packet capture (pcap) format and use Bro [51] to track the memory map of holding (read/write) registers from PLCs. We then use Pandas [68], a Python Data Analysis Library, to parse the log generated by Bro and to extract, per PLC, the time series corresponding to each of the registers. Each time series corresponds to a signal ($y_k$) in our experiments. We classify the signals as 91.5% constant, 5.3% discrete, and 3.2% continuous, based on the data characterization approach proposed to analyze Modbus traces [21], which uses AR models (as in Eq. (1)). We follow that approach by modeling the continuous time series in our dataset with AR models. The order of each AR model is selected using the Best Fit criterion from the Matlab system identification toolbox [39], which uses the unexplained output variance, i.e., the portion of the output not explained by the AR model, for various orders [41].

Using the AR model, our first experiment centers on deciding which statistical detection test is better: a stateless test or the stateful CUSUM change detection test. Fig. 14 shows the comparison of stateless vs. stateful tests with our proposed metrics (where the duration of an undetected attack is 10 minutes). As expected, once the CUSUM statistic reaches the threshold $S_k = \tau$, the attack no longer has enough room to continue deviating the signal without being detected, and larger thresholds $\tau$ do not make a difference once the attacker reaches the threshold, whereas for the stateless test, the attacker has the ability to change the measurement by $\pm\tau$ units at every time step.

[Figure 14: Stateful performs better than stateless detection: the attacker can send larger undetected false measurements for the same expected time to false alarms.]

Having shown that a CUSUM (stateful) test reduces the impact of a stealthy attack when compared to the stateless test, we now show how to improve the AR physical model previously used by Hadziosmanovic et al. [21]. In particular, we notice that Hadziosmanovic et al. use an AR model per signal; this misses the opportunity of creating models of how multiple signals are correlated, and creating correlated physical models will limit the impact of undetected attacks.

[Figure 15: Three example signals with significant correlations. Signal s16 is more correlated with s19 than it is with s8.]

Spatial and Temporal Correlation. In an ideal situation, the water utility operators could help us identify all control loops and spatial correlations among all variables (the water pump that controls the level of water in a tank, etc.); however, this process becomes difficult to perform in a large-scale system with thousands of control and sensor signals exchanged every second; therefore, we now attempt to find correlations empirically from our data. We correlate signals by computing the correlation coefficients of different signals $s_1, s_2, \ldots, s_N$. The correlation coefficient is a normalized variant of the mathematical covariance function: $\mathrm{corr}(s_i, s_j) = \mathrm{cov}(s_i, s_j) / \sqrt{\mathrm{cov}(s_i, s_i)\,\mathrm{cov}(s_j, s_j)}$, where $\mathrm{cov}(s_i, s_j)$ denotes the covariance between $s_i$ and $s_j$, and the correlation ranges over $-1 \leq \mathrm{corr}(s_i, s_j) \leq 1$. We then calculate the p-value of the test to measure the significance of the correlation between signals. The p-value is the probability of observing a correlation as large (or as negative) as the observed value when the true correlation is zero (i.e., testing the null hypothesis of no correlation, so lower values of $p$ indicate stronger evidence of correlation). We were able to find 8,620 correlations to be highly significant, with $p = 0$.
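This screening step can be sketched with scipy, whose Pearson test returns both the coefficient and the p-value of the zero-correlation null; the three signals below are synthetic stand-ins for the real register traces, driven by a shared base process.

import numpy as np
from scipy.stats import pearsonr

rng = np.random.default_rng(2)
base = rng.standard_normal(1000)        # shared process driving the signals
signals = {
    "s8": base + 0.30 * rng.standard_normal(1000),
    "s16": base + 0.05 * rng.standard_normal(1000),
    "s19": base + 0.05 * rng.standard_normal(1000),
}
names = sorted(signals)
for i in range(len(names)):
    for j in range(i + 1, len(names)):
        r, p = pearsonr(signals[names[i]], signals[names[j]])
        # Keep only strongly and significantly correlated pairs.
        if p < 0.01 and r > 0.96:
            print(names[i], names[j], round(r, 4))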
Because $\mathrm{corr}(s_i, s_j) = \mathrm{corr}(s_j, s_i)$, there are 4,310 unique significantly correlated pairs. We narrow down our attention to $\mathrm{corr}(s_i, s_j) > .96$. Fig. 15 illustrates three of the correlated signals we found. Signals $s_{16}$ and $s_{19}$ are highly correlated, with $\mathrm{corr}(s_{16}, s_{19}) = .9924$, while $s_8$ and $s_{19}$ are correlated but with a lower correlation coefficient of $\mathrm{corr}(s_8, s_{19}) = .9657$. For our study we selected signal $s_8$ and its most correlated signal $s_{17}$, which are among the most correlated signal pairs we found, with $\mathrm{corr}(s_8, s_{17}) = .9996$.

[Figure 16: Using the defined metrics, we show how our new correlated AR models perform better (with stateless or stateful tests) than the AR models of independent signals.]

Our experiments show that an AR model trained with correlated signals (see Fig. 16) is more effective in limiting the maximum deviation the attacker can achieve (assuming the attacker only compromises one of the signals). For that reason, we encourage future work to use correlated AR models rather than AR models of single signals.
We now evaluate all possible combinations of the pop-
5.3 Simulations of the Physical World ular physical models and detection statistics illustrated in
With simulations we can launch actuator attacks without Table 1. In particular we want to test AR models vs. LDS
the safety risk of damaging physical equipment. In partic- models estimated via system identification (SLS models do
ular, in this section we launch actuation attacks and show not make sense here because our system is dynamic) and
how the control algorithm used can significantly limit the stateless detection tests vs. stateful detection tests.
impact of stealthy attackers. In particular we show that the We launch an undetected actuator attack after 50 seconds
Integrative part of a Proportional Integral Derivative (PID) using stateless and stateful detection tests for both: AR and
control algorithm (or a PI or I control algorithm) can correct LDS physical models. Our experiments show that LDS mod-
the deviation injected by the malicious actuator, and force els outperform AR models, and that stateful models (again)
the system to return to the correct operating state. outperform stateless models, as illustrated in Fig 17. These
We use simulations of primary frequency control in the wide variations in frequency would not be tolerated in a real
power grid as this is the scenario used by the Aurora at- system, but we let the simulations continue for large fre-
tack [69]. Our goal is to maintain the frequency of the power quency deviations to illustrate the order of magnitude ability
grid as close as possible to 60Hz, subject to perturbations— from LDS models to limit the impact of stealthy attackers
i.e., changes in the Mega Watt (MW) demand by consumers— when compared to AR models.
and attacks. Having settled for LDS physical models with CUSUM as
We assume that the attacker takes control of the actua- the optimal combination of physical models with detection
tors. When we consider attacks on a control signal, we need tests, we now evaluate the performance of di↵erent control
to be careful to specify whether or not the anomaly detection algorithms, a property that has rarely been explored in our
system can observe the false control signal. In this section, survey of related work. In particular, we show how Integra-
we assume the worst case: our anomaly detection algorithm tive control is able to correct undetected actuation attacks.
cannot see the manipulated signal and indirectly observes In particular we compare one of the most popular control
the attack e↵ects from sensors (e.g., vk is controlled by the algorithms: P control, and then we compare it to PI control.
attacker, while the detection algorithm observes the valid uk If the system operator has a P control of the form uk = Kyk ,
control signal, see Fig. 1). the attacker can a↵ect the system significantly, as illustrated
Attacking a sensor is easier for our stealthy adversary be- in Fig. 18. However, if the system operator uses a PI control,
cause she knows the exact false sensor value ŷ that will al- the e↵ects of the attacker are limited: The actuator attack
low her to remain undetected while causing maximum dam- will tend to deviate the frequency signal, but this deviation
age. On the other hand, by attacking the actuator the at- will cause the controller to generate a cumulative compensa-
Having settled on LDS physical models with CUSUM as the best combination of physical model and detection test, we now evaluate the performance of different control algorithms, a property that has rarely been explored in the work we surveyed. In particular, we show how integral control is able to correct undetected actuation attacks. We start with one of the most popular control algorithms, P control, and then compare it to PI control. If the system operator uses a P control of the form u_k = K y_k, the attacker can affect the system significantly, as illustrated in Fig. 18. If the system operator uses a PI control, however, the effects of the attacker are limited: the actuator attack will tend to deviate the frequency signal, but this deviation causes the controller to generate a cumulative compensation (due to the integral term), and because the LDS model knows the effect of this cumulative compensation, it expects the corresponding change in the sensor measurement.
Figure 18: Left: The real (and trusted) frequency signal is increased to a level higher than the one expected (red) by our model of the physical system given the control commands. Right: If the defender uses a P control algorithm, the attacker is able to maintain a large deviation of the frequency from its desired 60 Hz set point.
As a consequence, to maintain the distance between the estimated and the real frequency below the detection threshold, the attack has to decrease its action over time. In the end, the only way for the attack to remain undetected is for it to become non-existent, u_k^a = 0, as shown in Fig. 19.

Figure 19: Same setup as in Fig. 18, but this time the defender uses a PI control algorithm: this results in the controller being able to drive the system back to the desired 60 Hz operating point.
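The following self-contained sketch reproduces this qualitative behavior on the hypothetical model used in the earlier sketches: under a constant undetected actuator offset, the P controller settles at a persistent frequency error, while the PI controller accumulates the error and drives the frequency back toward 60 Hz. The gains and the attack magnitude are illustrative values only.

```python
# Sketch: effect of P vs. PI control under a constant actuator offset.
# Plant, gains, and attack size are hypothetical illustrative values.
a, b = 0.9, 0.05            # x_{k+1} = a*x_k + b*(u_k + attack)
attack = 0.5                # constant malicious offset at the actuator
Kp, Ki = 2.0, 0.5           # controller gains

def final_deviation(use_integral, steps=500):
    x, integ = 0.0, 0.0
    for _ in range(steps):
        e = -x                                   # error w.r.t. 60 Hz
        integ += e
        u = Kp * e + (Ki * integ if use_integral else 0.0)
        x = a * x + b * (u + attack)             # attacker biases actuation
    return x

print("P  control:", round(final_deviation(False), 4))   # settles near 0.125
print("PI control:", round(final_deviation(True), 4))    # returns to ~0
```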
Figure 20: Differences between attacking sensors and actuators, and effects when the controller runs a P control algorithm vs. a PI control algorithm.

In all our previous examples with attacked sensors (except for the pH case), the worst possible deviation was achieved at the end of the attack. For actuation attacks (and PI control), in contrast, the controller compensates for the attack in order to correct the observed frequency deviation, and thus the final deviation is zero: the asymptotic deviation is zero, while the transient impact of the attacker can be high. Fig. 20 illustrates the difference between measuring the maximum final deviation of the state of the system achieved by the attacker and measuring the maximum temporary deviation of the state of the system achieved by the attacker.
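As a small worked example of how these two measures are read off a trace, the snippet below computes both from a hypothetical simulated frequency trajectory around the 60 Hz set point.

```python
import numpy as np

# Sketch: the two impact measures illustrated in Fig. 20, computed from
# a hypothetical frequency trajectory (Hz) around the 60 Hz set point.
freq = np.array([60.0, 60.4, 61.2, 62.0, 61.1, 60.3, 60.05, 60.0])
deviation = np.abs(freq - 60.0)

max_transient_deviation = deviation.max()  # worst excursion during the attack
final_deviation = deviation[-1]            # deviation left when the attack ends
print(max_transient_deviation, final_deviation)   # 2.0 and 0.0
```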
As we can see, the control algorithm plays a fundamental role in how effective an actuation attack can be. When we use PI control, an attacker that can manipulate the actuators at will can cause a larger frequency error, but only for a short time; if we use P control, the attacker can launch more powerful attacks with long-term effects. Attacks on sensors, on the other hand, have the same long-term negative effects independent of the type of control we use (P or PI). Depending on the type of system, short-term effects may be more harmful than long-term errors. In our power plant example, a sudden frequency deviation larger than 0.5 Hz can cause irreparable damage to the generators and to equipment in transmission lines (and will trigger protection mechanisms that disconnect parts of the grid), while small long-term deviations may cause cascading effects that propagate and damage the whole grid.

While it seems that the best option to protect against actuator attacks is to deploy PI control in all generators, several PI controllers operating in parallel in the grid can lead to other stability problems. Therefore, often only the central Automatic Generation Control (AGC) implements a PI controller, although distributed PI control schemes have been proposed recently [3].

Recall that we assumed the actuation attack was launched by an omniscient attacker that knows even the specific load the system is going to be subjected to (i.e., it knows exactly how much electricity consumers will demand at every time step, something not even the controller knows). For many practical applications it will be impossible for the attacker to predict exactly the consequences of its actuation attack, due to model uncertainties (consumer behavior) and random perturbations. As such, the attacker runs a non-negligible risk of being detected when launching actuation attacks, in contrast to the 100% certainty of remaining undetected when launching sensor attacks. In practice, we expect that an attacker who wants to remain undetected using actuation attacks will behave conservatively to accommodate the uncertainties of the model, and thus we expect the maximum transient deviation from actuation attacks to be lower.

6. CONCLUSIONS

6.1 Findings

We introduced theoretical and practical contributions to the growing literature of physics-based attack detection in control systems. Our literature review across different domains of expertise unifies disparate terminology and notation. We hope our efforts can help other researchers refine and improve a common language to talk about physics-based attack detection across computer security, control theory, and power system venues.
In particular, in our survey we identified a lack of unified metrics and adversary models. We explained in this paper the limitations of previous metrics and adversary models, and we proposed a novel stealthy and adaptive adversary model, together with an intrusion detection metric derived from it, that can be used to study the effectiveness of physics-based attack-detection algorithms in a systematic way.

We validated our approaches in multiple setups, including a room-size water treatment testbed, a real large-scale operational system managing more than 100 PLCs, and simulations of primary frequency control in the power grid. We showed in Table 2 how each of these validation setups has advantages and disadvantages when evaluating the x-axis and y-axis of our proposed metric.
One result we obtained across our testbed, real operational systems, and simulations is that stateful tests perform better than stateless tests. This is in stark contrast to the popularity of stateless detection statistics, as summarized in Table 1. We hope our paper motivates more implementations of stateful instead of stateless tests in future work.

We also showed that, for a stealthy actuator attack, PI control plays an important role in limiting the impact of the attack. In particular, we showed that the integral term of the controller corrects the system deviation and forces the attacker to have a negligible effective impact asymptotically.

Finally, we provided the following novel observations. (1) Finding spatio-temporal correlations of Modbus signals had not been proposed before, and we showed that such correlated models are better than models of single signals. (2) While input/output models like LDS are popular in control theory, they are rarely used in papers published at security conferences; we should start using them because they perform better than the alternatives, unless the system is highly non-linear, in which case the only way to limit the impact of stealthy attacks is to estimate non-linear physical models of the system. (3) We showed why launching undetected attacks on actuators is more difficult than on sensors.

6.2 Discussion and Future Work

While physics-based attack detection can improve the security of control systems, there are some limitations. For example, in all our experiments the attacks affected the residuals and anomaly detection statistics while keeping them below the thresholds; however, there are special cases where, depending on the power of the attacker or the characteristics of the plant, the residuals can remain zero (ignoring the noise) while the attacker drives the system to an arbitrary state. For example, if the attacker has control of all sensors and actuators, then it can falsify the sensor readings so that our detector believes the sensors are reporting the expected state given the control signal, while in the meantime the actuators drive the system to an arbitrary unsafe condition.

Similarly, some properties of physical systems can also prevent us from detecting attacks, for example in systems vulnerable to zero-dynamics attacks [61], unbounded systems [62], and highly non-linear or chaotic systems [48].

Finally, one of the biggest challenges for future work is the problem of how to respond to alerts. While in some control systems simply reporting the alert to operators can be considered enough, we need to consider automated response mechanisms in order to guarantee the safety of the system. The ideas behind our metric can be extended to this case: instead of measuring the impact of false alarms, we measure the impact of a false response. For example, our previous work [10] considered switching a control system to open-loop control whenever an attack on the sensors was detected (meaning that the control algorithm ignores sensor measurements and attempts to estimate the state of the system based only on the expected consequences of its control commands). As a result, instead of measuring the false alarm rate, we focused on making sure that a reconfiguration triggered by a false alarm would never drive the system to an unsafe state. Therefore, maintaining safety under both attacks and false alarms will need to take priority in the study of any automatic response to alerts.

Acknowledgments

The work at UT Dallas was supported by NIST under award 70NANB14H236 from the U.S. Department of Commerce and by NSF CNS-1553683. The work of Justin Ruths at SUTD was supported by grant NRF2014NCR-NCR001-40 from NRF Singapore. H. Sandberg was supported in part by the Swedish Research Council (grant 2013-5523) and the Swedish Civil Contingencies Agency through the CERCES project. We thank the iTrust center at SUTD for enabling the experiments on the SWaT testbed.

Disclaimer

Certain commercial equipment, instruments, or materials are identified in this paper in order to specify the experimental procedure adequately. Such identification is not intended to imply recommendation or endorsement by the National Institute of Standards and Technology, nor is it intended to imply that the materials or equipment identified are necessarily the best available for the purpose.

7. REFERENCES

[1] S. Amin, X. Litrico, S. Sastry, and A. Bayen. Cyber security of water SCADA systems; Part I: Analysis and experimentation of stealthy deception attacks. IEEE Transactions on Control Systems Technology, 21(5):1963–1970, 2013.
[2] S. Amin, X. Litrico, S. Sastry, and A. Bayen. Cyber security of water SCADA systems; Part II: Attack detection using enhanced hydrodynamic models. IEEE Transactions on Control Systems Technology, 21(5):1679–1693, 2013.
[3] M. Andreasson, D. V. Dimarogonas, H. Sandberg, and K. H. Johansson. Distributed PI-control with applications to power systems frequency control. In Proceedings of the American Control Conference (ACC), pages 3183–3188. IEEE, 2014.
[4] K. J. Åström and P. Eykhoff. System identification—a survey. Automatica, 7(2):123–162, 1971.
[5] S. Axelsson. The base-rate fallacy and the difficulty of intrusion detection. ACM Transactions on Information and System Security (TISSEC), 3(3):186–205, 2000.
[6] C.-z. Bai and V. Gupta. On Kalman filtering in the presence of a compromised sensor: Fundamental performance bounds. In Proceedings of the American Control Conference, pages 3029–3034, 2014.
[7] C.-z. Bai, F. Pasqualetti, and V. Gupta. Security in stochastic control systems: Fundamental limitations and performance bounds. In Proceedings of the American Control Conference, 2015.
[8] R. B. Bobba, K. M. Rogers, Q. Wang, H. Khurana, K. Nahrstedt, and T. J. Overbye. Detecting false data injection attacks on DC state estimation. In Proceedings of the Workshop on Secure Control Systems, volume 2010, 2010.
[9] A. Carcano, A. Coletta, M. Guglielmi, M. Masera, I. N. Fovino, and A. Trombetta. A multidimensional critical state analysis for detecting intrusions in SCADA systems. IEEE Transactions on Industrial Informatics, 7(2):179–186, 2011.
[10] A. A. Cardenas, S. Amin, Z.-S. Lin, Y.-L. Huang, C.-Y. Huang, and S. Sastry. Attacks against process control systems: risk assessment, detection, and response. In Proceedings of the ACM Symposium on Information, Computer and Communications Security, pages 355–366, 2011.
[11] A. A. Cárdenas, J. S. Baras, and K. Seamon. A framework for the evaluation of intrusion detection systems. In Proceedings of the IEEE Symposium on Security and Privacy, pages 77–91. IEEE, 2006.
[12] S. Cui, Z. Han, S. Kar, T. T. Kim, H. V. Poor, and A. Tajer. Coordinated data-injection attack and detection in the smart grid: A detailed look at enriching detection solutions. IEEE Signal Processing Magazine, 29(5):106–115, 2012.
[13] G. Dán and H. Sandberg. Stealth attacks and protection schemes for state estimators in power systems. In Proceedings of the Smart Grid Communications Conference (SmartGridComm), October 2010.
[14] K. R. Davis, K. L. Morrow, R. Bobba, and E. Heine. Power flow cyber attacks and perturbation-based defense. In Proceedings of the Conference on Smart Grid Communications (SmartGridComm), pages 342–347. IEEE, 2012.
[15] V. L. Do, L. Fillatre, and I. Nikiforov. A statistical method for detecting cyber/physical attacks on SCADA systems. In Proceedings of the Conference on Control Applications (CCA), pages 364–369. IEEE, 2014.
[16] E. Eyisi and X. Koutsoukos. Energy-based attack detection in networked control systems. In Proceedings of the Conference on High Confidence Networked Systems (HiCoNs), pages 115–124, New York, NY, USA, 2014. ACM.
[17] N. Falliere, L. O. Murchu, and E. Chien. W32.Stuxnet dossier. White paper, Symantec Corp., Security Response, 2011.
[18] D. Formby, P. Srinivasan, A. Leonard, J. Rogers, and R. Beyah. Who's in control of your control system? Device fingerprinting for cyber-physical systems. In Network and Distributed System Security Symposium (NDSS), February 2016.
[19] R. M. Gerdes, C. Winstead, and K. Heaslip. CPS: an efficiency-motivated attack against autonomous vehicular transportation. In Proceedings of the Annual Computer Security Applications Conference (ACSAC), pages 99–108. ACM, 2013.
[20] A. Giani, E. Bitar, M. Garcia, M. McQueen, P. Khargonekar, and K. Poolla. Smart grid data integrity attacks: characterizations and countermeasures. In Proceedings of the Smart Grid Communications Conference (SmartGridComm), pages 232–237. IEEE, 2011.
[21] D. Hadžiosmanović, R. Sommer, E. Zambon, and P. H. Hartel. Through the eye of the PLC: semantic security monitoring for industrial processes. In Proceedings of the Annual Computer Security Applications Conference (ACSAC), pages 126–135. ACM, 2014.
[22] X. Hei, X. Du, S. Lin, and I. Lee. PIPAC: patient infusion pattern based access control scheme for wireless insulin pump system. In Proceedings of INFOCOM, pages 3030–3038. IEEE, 2013.
[23] F. Hou, Z. Pang, Y. Zhou, and D. Sun. False data injection attacks for a class of output tracking control systems. In Proceedings of the Chinese Control and Decision Conference, pages 3319–3323, 2015.
[24] T. Kailath and H. V. Poor. Detection of stochastic processes. IEEE Transactions on Information Theory, 44(6):2230–2231, 1998.
[25] A. J. Kerns, D. P. Shepard, J. A. Bhatti, and T. E. Humphreys. Unmanned aircraft capture and control via GPS spoofing. Journal of Field Robotics, 31(4):617–636, 2014.
[26] T. T. Kim and H. V. Poor. Strategic protection against data injection attacks on power grids. IEEE Transactions on Smart Grid, 2(2):326–333, 2011.
[27] I. Kiss, B. Genge, and P. Haller. A clustering-based approach to detect cyber attacks in process control systems. In Proceedings of the Conference on Industrial Informatics (INDIN), pages 142–148. IEEE, 2015.
[28] O. Kosut, L. Jia, R. Thomas, and L. Tong. Malicious data attacks on smart grid state estimation: Attack strategies and countermeasures. In Proceedings of the Smart Grid Communications Conference (SmartGridComm), October 2010.
[29] G. Koutsandria, V. Muthukumar, M. Parvania, S. Peisert, C. McParland, and A. Scaglione. A hybrid network IDS for protective digital relays in the power transmission grid. In Proceedings of Smart Grid Communications (SmartGridComm), 2014.
[30] M. Krotofil, J. Larsen, and D. Gollmann. The process matters: Ensuring data veracity in cyber-physical systems. In Proceedings of the Symposium on Information, Computer and Communications Security (ASIACCS), pages 133–144. ACM, 2015.
[31] C. Kwon, W. Liu, and I. Hwang. Security analysis for cyber-physical systems against stealthy deception attacks. In Proceedings of the American Control Conference, pages 3344–3349, 2013.
[32] R. Langner. Stuxnet: Dissecting a cyberwarfare weapon. IEEE Security & Privacy, 9(3):49–51, 2011.
[33] J. Liang, O. Kosut, and L. Sankar. Cyber attacks on AC state estimation: Unobservability and physical consequences. In Proceedings of the PES General Meeting, pages 1–5, July 2014.
[34] H. Lin, A. Slagell, Z. Kalbarczyk, P. W. Sauer, and R. K. Iyer. Semantic security analysis of SCADA networks to detect malicious control commands in power grids. In Proceedings of the Workshop on Smart Energy Grid Security, pages 29–34. ACM, 2013.
[35] Y. Liu, P. Ning, and M. K. Reiter. False data injection attacks against state estimation in electric power grids. In Proceedings of the ACM Conference on Computer and Communications Security (CCS), pages 21–32. ACM, 2009.
[36] Y. Liu, P. Ning, and M. K. Reiter. False data injection attacks against state estimation in electric power grids. ACM Transactions on Information and System Security (TISSEC), 14(1):13, 2011.
[37] L. Ljung. The Control Handbook, chapter System Identification, pages 1033–1054. CRC Press, 1996.
[38] L. Ljung. System Identification: Theory for the User. Prentice Hall PTR, Upper Saddle River, NJ, USA, 2nd edition, 1999.
[39] L. Ljung. System Identification Toolbox for Use with MATLAB. The MathWorks, Inc., 2007.
[40] D. Mashima and A. A. Cárdenas. Evaluating electricity theft detectors in smart grid networks. In Research in Attacks, Intrusions, and Defenses, pages 210–229. Springer, 2012.
[41] The MathWorks, Inc. Identifying input-output polynomial models. www.mathworks.com/help/ident/ug/identifying-input-output-polynomial-models.html, October 2014.
[42] S. McLaughlin. CPS: Stateful policy enforcement for control system device usage. In Proceedings of the Annual Computer Security Applications Conference (ACSAC), pages 109–118, New York, NY, USA, 2013. ACM.
[43] F. Miao, Q. Zhu, M. Pajic, and G. J. Pappas. Coding sensor outputs for injection attacks detection. In Proceedings of the Conference on Decision and Control, pages 5776–5781, 2014.
[44] Y. Mo and B. Sinopoli. Secure control against replay attacks. In Proceedings of the Allerton Conference on Communication, Control, and Computing (Allerton), pages 911–918. IEEE, 2009.
[45] Y. Mo, S. Weerakkody, and B. Sinopoli. Physical authentication of control systems: designing watermarked control inputs to detect counterfeit sensor outputs. IEEE Control Systems, 35(1):93–109, 2015.
[46] Y. L. Mo, R. Chabukswar, and B. Sinopoli. Detecting integrity attacks on SCADA systems. IEEE Transactions on Control Systems Technology, 22(4):1396–1407, 2014.
[47] K. L. Morrow, E. Heine, K. M. Rogers, R. B. Bobba, and T. J. Overbye. Topology perturbation for detecting malicious data injection. In Proceedings of the Hawaii International Conference on System Science (HICSS), pages 2104–2113. IEEE, 2012.
[48] E. Ott, C. Grebogi, and J. A. Yorke. Controlling chaos. Physical Review Letters, 64(11):1196, 1990.
[49] M. Parvania, G. Koutsandria, V. Muthukumar, S. Peisert, C. McParland, and A. Scaglione. Hybrid control network intrusion detection systems for automated power distribution systems. In Proceedings of the Conference on Dependable Systems and Networks (DSN), pages 774–779, June 2014.
[50] F. Pasqualetti, F. Dorfler, and F. Bullo. Attack detection and identification in cyber-physical systems. IEEE Transactions on Automatic Control, 58(11):2715–2729, November 2013.
[51] V. Paxson. Bro: a system for detecting network intruders in real-time. Computer Networks, 31(23):2435–2463, 1999.
[52] S. Postalcioglu and Y. Becerikli. Wavelet networks for nonlinear system modeling. Neural Computing and Applications, 16(4-5):433–441, 2007.
[53] I. Sajjad, D. D. Dunn, R. Sharma, and R. Gerdes. Attack mitigation in adversarial platooning using detection-based sliding mode control. In Proceedings of the ACM Workshop on Cyber-Physical Systems Security and/or PrivaCy (CPS-SPC), pages 43–53, New York, NY, USA, 2015. ACM. http://doi.acm.org/10.1145/2808705.2808713.
[54] H. Sandberg, A. Teixeira, and K. H. Johansson. On security indices for state estimators in power networks. In Proceedings of the Workshop on Secure Control Systems, 2010.
[55] Y. Shoukry, P. Martin, Y. Yona, S. Diggavi, and M. Srivastava. PyCRA: Physical challenge-response authentication for active sensors under spoofing attacks. In Proceedings of the ACM SIGSAC Conference on Computer and Communications Security (CCS), pages 1004–1015, New York, NY, USA, 2015. ACM.
[56] R. Smith. A decoupled feedback structure for covertly appropriating networked control systems. In Proceedings of the IFAC World Congress, volume 18, pages 90–95, 2011.
[57] S. Sridhar and M. Govindarasu. Model-based attack detection and mitigation for automatic generation control. IEEE Transactions on Smart Grid, 5(2):580–591, 2014.
[58] R. Tan, V. Badrinath Krishna, D. K. Yau, and Z. Kalbarczyk. Impact of integrity attacks on real-time pricing in smart grids. In Proceedings of the SIGSAC Conference on Computer & Communications Security (CCS), pages 439–450. ACM, 2013.
[59] A. Teixeira, S. Amin, H. Sandberg, K. H. Johansson, and S. S. Sastry. Cyber security analysis of state estimators in electric power systems. In Proceedings of the Conference on Decision and Control (CDC), pages 5991–5998. IEEE, 2010.
[60] A. Teixeira, D. Pérez, H. Sandberg, and K. H. Johansson. Attack models and scenarios for networked control systems. In Proceedings of the Conference on High Confidence Networked Systems (HiCoNs), pages 55–64. ACM, 2012.
[61] A. Teixeira, I. Shames, H. Sandberg, and K. H. Johansson. Revealing stealthy attacks in control systems. In Proceedings of the Allerton Conference on Communication, Control, and Computing (Allerton), pages 1806–1813. IEEE, 2012.
[62] A. Teixeira, I. Shames, H. Sandberg, and K. H. Johansson. A secure control framework for resource-limited adversaries. Automatica, 51:135–148, 2015.
[63] The Modbus Organization. Modbus application protocol specification, 2012. Version 1.1v3.
[64] D. Urbina, J. Giraldo, N. Tippenhauer, and A. Cárdenas. Attacking fieldbus communications in ICS: Applications to the SWaT testbed. In Proceedings of the Singapore Cyber-Security Conference (SG-CRC), volume 14, pages 75–89, 2016.
[65] J. Valente and A. A. Cardenas. Using visual challenges to verify the integrity of security cameras. In Proceedings of the Annual Computer Security Applications Conference (ACSAC). ACM, 2015.
[66] O. Vuković and G. Dán. On the security of distributed power system state estimation under targeted attacks. In Proceedings of the Symposium on Applied Computing, pages 666–672. ACM, 2013.
[67] Y. Wang, Z. Xu, J. Zhang, L. Xu, H. Wang, and G. Gu. SRID: State relation based intrusion detection for false data injection attacks in SCADA. In Proceedings of the European Symposium on Research in Computer Security (ESORICS), pages 401–418. Springer, 2014.
[68] Pandas: Python data analysis library. http://pandas.pydata.org, November 2015.
[69] M. Zeller. Myth or reality—does the Aurora vulnerability pose a risk to my generator? In Proceedings of the Conference for Protective Relay Engineers, pages 130–136. IEEE, 2011.