
IEEE TRANSACTIONS ON SMART GRID, VOL. 13, NO. 1, JANUARY 2022 807

Joint Detection and Localization of Stealth False Data Injection Attacks in Smart Grids Using Graph Neural Networks

Osman Boyaci, Graduate Student Member, IEEE, Mohammad Rasoul Narimani, Member, IEEE, Katherine R. Davis, Senior Member, IEEE, Muhammad Ismail, Senior Member, IEEE, Thomas J. Overbye, Fellow, IEEE, and Erchin Serpedin, Fellow, IEEE

Abstract—False data injection attacks (FDIA) are a main category of cyber-attacks threatening the security of power systems. Contrary to the detection of these attacks, less attention has been paid to identifying the attacked units of the grid. To this end, this work jointly studies detecting and localizing the stealth FDIA in power grids. Exploiting the inherent graph topology of power systems as well as the spatial correlations of measurement data, this paper proposes an approach based on the graph neural network (GNN) to identify the presence and location of the FDIA. The proposed approach leverages the auto-regressive moving average (ARMA) type graph filters (GFs) which can better adapt to sharp changes in the spectral domain due to their rational type filter composition compared to the polynomial type GFs such as Chebyshev. To the best of our knowledge, this is the first work based on GNN that automatically detects and localizes FDIA in power systems. Extensive simulations and visualizations show that the proposed approach outperforms the available methods in both detection and localization of FDIA for different IEEE test systems. Thus, the targeted areas can be identified and preventive actions can be taken before the attack impacts the grid.

Index Terms—False data injection attacks, graph neural networks, machine learning, smart grid, power system security.

NOMENCLATURE
$P_i + jQ_i$    Complex power injection at bus $i$.
$P_{ij} + jQ_{ij}$    Complex power flow between bus $i$ and $j$.
$V_i, \theta_i$    Voltage magnitude and phase angle of bus $i$.
$n, m$    Number of buses, number of measurements.
$\mathcal{X} \in \mathbb{R}^n$    State space.
$\mathcal{Z} \in \mathbb{R}^m$    Measurement space.
$x \in \mathcal{X}$    A state vector.
$\hat{x} \in \mathcal{X}$    Original state vector without an attack.
$\check{x} \in \mathcal{X}$    False data injected state vector.
$z \in \mathcal{Z}$    A measurement vector.
$z_o \in \mathcal{Z}$    Original measurement vector.
$z_a \in \mathcal{Z}$    Attacked measurement vector.
$a \in \mathcal{Z}$    Attack vector.
$h(x)$    Nonlinear measurement function at $x$.
$\mathcal{T}$    Attacker's target area to perform FDI attack.
$W \in \mathbb{R}^{n \times n}$    Weighted adjacency matrix.
$D \in \mathbb{R}^{n \times n}$, $D_{ii} = \sum_j W_{ij}$    Diagonal degree matrix.
$\Lambda = \mathrm{diag}[\lambda_1, \ldots, \lambda_n] \in \mathbb{R}^{n \times n}$    Graph Fourier frequencies.
$U = [u_1, \ldots, u_n] \in \mathbb{R}^{n \times n}$    Graph Fourier basis.
$L = U \Lambda U^T \in \mathbb{R}^{n \times n}$    Normalized graph Laplacian.

Manuscript received May 23, 2021; revised August 27, 2021 and September 30, 2021; accepted October 2, 2021. Date of publication October 5, 2021; date of current version December 23, 2021. This work was supported by NSF under Award 1808064. Paper no. TSG-00795-2021. (Corresponding author: Osman Boyaci.)
Osman Boyaci, Katherine R. Davis, Thomas J. Overbye, and Erchin Serpedin are with the Department of Electrical and Computer Engineering, Texas A&M University, College Station, TX 77843 USA (e-mail: [email protected]; [email protected]; [email protected]; [email protected]).
Mohammad Rasoul Narimani is with the College of Engineering, Arkansas State University, Jonesboro, AR 72404 USA (e-mail: [email protected]).
Muhammad Ismail is with the Department of Computer Science, Tennessee Technological University, Cookeville, TN 38505 USA (e-mail: [email protected]).
Color versions of one or more figures in this article are available at https://doi.org/10.1109/TSG.2021.3117977.
Digital Object Identifier 10.1109/TSG.2021.3117977
1949-3053 © 2021 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See https://www.ieee.org/publications/rights/index.html for more information.

I. INTRODUCTION
SMART grids integrate Information and Communication Technologies (ICT) into large-scale power networks to generate, transmit, and distribute electricity more efficiently [1]. Remote Terminal Units (RTUs) and Phasor Measurement Units (PMUs) are utilized to acquire the physical measurements and deliver them to the Supervisory Control and Data Acquisition Systems (SCADAs). Then, the ICT network transfers these measurements to the application level where the power system operators process them and take the necessary actions [2]. As a direct consequence, power system reliability is determined by the accuracy of the steps along this cyber-physical pipeline [3]. Power system state estimation (PSSE) modules employ these measurements to estimate the current operating point of the grid [4] and thus the integrity and trustworthiness of the measurements are crucial for proper operation of power systems. In addition, the accuracy of power system analysis tools such as energy management, contingency and reliability analysis, load and price forecasting, and economic dispatch depends on these measurements [5]. Thus, power system operation strongly depends on the accuracy of the measurements and the integrity of their flow through the system. Therefore, metering devices represent highly attractive


targets for adversaries that try to obstruct the grid operation by corrupting the measurements.

By disrupting the integrity of measurement data, false data injection attacks (FDIAs) constitute a considerable cyber-physical threat. More specifically, an adversary injects some false data to the measurements in order to mislead the PSSE and force it to converge to another operating point. Since the state of the power system is miscalculated by using these false data, any action taken by the grid operator based on the false operating point can lead to serious physical consequences including systematic problems and failures [6]. In traditional power grids, the largest normalized residual test (LNRT) is employed within the bad data detection (BDD) module along with PSSE to detect the "bad" measurement data [4]. Nevertheless, a designed false data injected measurement can bypass the BDD. In particular, [2], [7] show that by satisfying the power flow equations, an intruder can create an unobservable (stealth) FDIA and bypass the BDD if s/he has sufficient information about the grid. Various methods have been proposed to alert the grid operator about the presence of the FDIA without providing any information about the attack location [8], [9]. Localizing the attack is crucial for power system operators since they can take preventive action such as isolating the under-attack buses and re-dispatching the system accordingly. Therefore, this paper focuses jointly on detection and localization of the FDIA in power systems.

A. Related Works
In general, there are two main approaches to detect and localize the FDIAs: model-based and data-driven approaches [8]. In the model-based methods [10]–[14], a model for the system is built and its parameters are estimated to detect the FDIAs. Since there is no training, these methods do not require the historical data. However, the detection delays, scalability issues and threshold tuning steps can limit the performance and usability of the model-based approaches [9]. Conversely, the data-driven methods [15] are system independent and require historical data and a training procedure. However, they provide scalability and real-time compatibility due to the extensive training. Data-driven methods, machine learning (ML) [16] in particular, offer superior performance in detecting FDIAs in power systems as the historical datasets are growing [8], [9]. Therefore, we employ a data-driven approach in this work for detecting and localizing FDIAs in power systems.

While there has been a great deal of research on detection of FDIAs, only a few attempts have been made to localize these attacks [10]–[15]. Since localization of FDIAs is a relatively newer research subject compared to detection of these attacks, the current approaches proposed in the literature suffer from some limitations. A multistage localization algorithm based on graph theory results is proposed in [10] to localize the attack at cluster level. Nevertheless, the low resolution hinders the benefits of localization in cluster level algorithms. In [11], a model-driven analytical redundancy approach utilizing Kalman filters is presented for joint detection and mitigation of FDIA in AGC systems. In their model, the authors of [11] first determine a threshold using the Mahalanobis norm of the residuals of the non-attacked situation. Any residual larger than the threshold is regarded as an attacked sample. Apart from the manual threshold optimization steps, detection times are in the range of seconds in their estimation-based models. A generalized modulation operator that is applied on the states of the system is presented as an ongoing work in a brief announcement in [12] to localize the FDIAs in power systems. Yet, the results are not published as of today. Authors in [13] present an interval observer-based detection and localization method for FDIA in power systems. They create and assign an interval observer to each measurement device and construct a customized logic localization judgment matrix to detect and localize the FDIA. Nevertheless, their average detection delay is more than 1.1 seconds, which can highly limit their usability in a real-life scenario. Lack of scalability and the need for a custom solution requiring manual labor represent additional limitations of this method. A Graph Signal Processing (GSP) based approach is developed in [14] to detect and localize FDIAs using the Graph Fourier Transform (GFT), local smoothness, and vertex-frequency energy distribution methods. However, the random and easily detectable attacks employed to test their models do not comprehensively assess the actual performance of the models. Besides, manual threshold tuning of graph filters (GFs) brings extra effort for their proposed methods. Authors in [15] propose physics- and learning-based approaches to detect and localize the FDIAs in automatic generation control (AGC) of power systems. While the physics-based method relies on interaction variables, the learning-based approach exploits the historical Area Control Error (ACE) data, and utilizes a Long Short Term Memory (LSTM) Neural Network (NN) to generate a model for learning the data pattern. Nevertheless, [15] reports results limited to a 5-bus system and assumes training an LSTM model for each measurement. Thus, the limited number of components deeply confines the large scale attributes of the proposed method. Furthermore, training a separate detector for each bus extremely increases the overall model complexity for large systems and reduces its suitability for real-world applications.

B. Motivation
Due to their graph-based topology, graph structural data such as social networks, traffic networks, and electric grid networks cannot be modeled efficiently in the Euclidean space and require graph-type architectures [17]. Processing (filtering) an image having 30 pixels and a power grid having 30 buses is demonstrated in Fig. 1. Since nodes are ordered and have the same number of neighbors for image data, it can be processed in a 2D Euclidean space. For example, a sliding kernel can easily capture the spatial correlations of pixels in this Euclidean space. Conversely, neighborhood relationships are unordered and vary from node to node in a graph signal [17]. Therefore, graph signals need to be processed in non-Euclidean spaces determined by the topology of the graph. In fact, as highly complex graph structural data, smart grid signals require graph type architectures such as GSP or GNN to exploit the spatial correlations of the grid.
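
The contrast between image and graph neighborhoods described above can be illustrated with a minimal Python sketch. The 5x6 pixel layout, the 3x3 sliding window, and the six-node adjacency list are assumptions chosen only for illustration; they are not the IEEE 30-bus topology or the exact layout of Fig. 1.

```python
# Illustrative sketch: ordered, fixed-size neighborhoods on a pixel grid versus
# unordered, variable-size neighborhoods on a graph. All sizes are hypothetical.

def pixel_neighbors(row, col, n_rows=5, n_cols=6):
    """Return the 3x3 sliding-window neighbors of a pixel in a fixed order."""
    offsets = [(-1, -1), (-1, 0), (-1, 1),
               (0, -1),           (0, 1),
               (1, -1),  (1, 0),  (1, 1)]
    return [(row + dr, col + dc) for dr, dc in offsets
            if 0 <= row + dr < n_rows and 0 <= col + dc < n_cols]

# Adjacency list of a small made-up 6-bus graph (not an IEEE test case).
graph = {1: [2, 3], 2: [1, 3, 4, 5], 3: [1, 2], 4: [2, 5, 6], 5: [2, 4], 6: [4]}

def degree_profile(adjacency):
    """Number of neighbors of every node."""
    return {node: len(neighbors) for node, neighbors in adjacency.items()}

# Every interior pixel sees the same number of neighbors, so one kernel fits all.
interior_sizes = {len(pixel_neighbors(r, c)) for r in range(1, 4) for c in range(1, 5)}
print("interior pixel neighborhood sizes:", interior_sizes)
# Graph nodes have different degrees, so a fixed-shape kernel cannot be slid over them.
print("graph node degrees:", degree_profile(graph))
```

Because the node degrees differ, a filter defined on the graph has to be expressed through the adjacency (or Laplacian) structure rather than through a fixed window.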


Fig. 1. Demonstration of signal processing in Euclidean (a) and non-Euclidean (b) spaces with an image and a power grid signal [17]. Neighbor nodes (blue) of a node (red) are ordered and constant in size for the image having 30 pixels in 2D Euclidean space. In contrast, they are unordered and variable in size for the IEEE 30-bus system in non-Euclidean space. Therefore, in order to efficiently model the spatial correlation of the power grid, graph type approaches that consider the topology of the underlying systems such as GSP and GNN are necessary.

GSP has emerged in the past few years to deal with the data in non-Euclidean spaces [18]. A few researchers designed GFs to detect and localize the FDIA [12], [14] using GSP. However, manual tailoring of the filters and detection thresholds substantially limits the applicability and efficiency of GSP. Conversely, GNN, as a data-driven counterpart of the GSP, eliminates the custom design steps and provides an end-to-end design that exploits the spatial locality dictated by the historical data. Similar to the classical signal processing, a graph signal is first converted into the spectral domain by GFT, then its Fourier coefficients are multiplied with the filter weights, and finally the signal is transformed back into the vertex domain by the inverse GFT [19]. To circumvent this spectral decomposition and domain transformation, polynomial GFs are proposed in [20] in which localized filters are learned directly in the vertex domain [21]. For a polynomial GF, the output of each vertex $v$ is only dependent on the $K$-hop neighborhood of $v$ and its spectral response is a $K$-order polynomial. Polynomial GFs, which are also referred to as finite impulse response (FIR) GFs due to the local information sharing, perform a weighted moving average (MA) filtering [22], [23]. However, FIR GFs may require a high degree polynomial to capture the global structure of the graph. In fact, interpolation and extrapolation performance of high degree polynomials are unsatisfactory [22], and they are not "flexible" enough to adapt to sudden changes in the spectral domain [21]. To overcome this limitation, infinite impulse response (IIR) type GFs performing Auto-Regressive Moving Average (ARMA) are proposed in [22]. Contrary to FIR GFs, IIR GFs have rational type spectral responses. Therefore, IIR GFs can implement more complex responses with a low degree of polynomials both in the numerator and denominator since rational functions have better performance compared to polynomial ones in terms of interpolation and extrapolation capabilities [21], [22].

Detection and localization of FDIA can be a challenging task if an intruder has 'enough' information about the grid to create a stealth attack [7]. S/he can hide an attack vector into an honest sample if the topology of the grid is ignored. Moreover, s/he can design an attack vector so that a malicious sample can be indistinguishable from an honest one if the spatial correlations of grid data are not well captured or the designed GFs do not satisfy the required spectral response. Thus, we design a GNN based model by utilizing ARMA GFs to be able to fit sharp changes in the spectral domain of the grid. Filter weights are learned automatically during training by an end-to-end data-driven approach. To compare our results with the existing data-driven techniques, we utilized several models to jointly detect and localize the FDIA. Moreover, for a fair comparison, the Bayesian hyper-parameter optimization technique is applied to all models for tuning the models' hyperparameters such as number of layers, neurons, etc.

C. Contributions and Paper Organization
The contributions of this work are outlined as follows.
• To properly capture the spatial correlations of the smart grid data in a non-Euclidean space, we utilize IIR type ARMA GFs which provide more flexible frequency responses compared to FIR type Chebyshev GFs. It is demonstrated on IEEE 118- and 300-bus test systems that ARMA GFs better approximate the desired filter response compared to CHEB GFs for the same filter order by comparing their empirical frequency responses when approximating an ideal band pass filter.
• To precisely test our proposed method, we generate a dataset for each test system with 1-minute intervals using several FDIA generation algorithms in the literature as well as our optimization-based FDIA method developed in our previous paper [24].
• To automatically determine the unknown filter weights by an end-to-end data-driven approach, we propose a scalable, ARMA GF-based GNN model that jointly detects and localizes the FDIAs in a few milliseconds. The proposed architecture efficiently predicts the presence of the attack for the whole grid and for each bus separately.
• To fairly compare the proposed method with the currently available approaches, we implement the other data-driven models in the literature and compare our detection and localization results with them. Hyperparameters of the models are tuned systematically using the Bayesian hyper-parameter optimization technique.
• To adequately assess the localization performance, we evaluate the localization results using both sample-wise and node-wise comparisons. For instance, although sample-wise localization could yield fairly high accuracy for the entire system, the same set of nodes could be missed or falsely alarmed at each sample. If revealed, these nodes could be easily targeted by the intruders.
• To better analyze and visualize the multidimensional data processed by the implemented models, we embed them into a two dimensional (2D) space using the t-SNE algorithm [25]. By visually inspecting the output of models' intermediate layers in 2D, it is verified that the ARMA GNN based model preserves the structure of the data, and hence gives better detection performance.
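
The difference between sample-wise and node-wise evaluation mentioned in the contributions above can be made concrete with a short Python sketch. The two metric definitions (per-sample label accuracy and per-bus miss rate), the function names, and the toy labels are assumptions made for illustration; they are not the evaluation code or data of this work.

```python
# Illustrative sketch: a localizer can look accurate per sample while
# systematically missing the same bus. Labels are 1 (attacked) or 0 (not attacked).

def sample_wise_accuracy(y_true, y_pred):
    """Fraction of correctly labeled buses, computed separately for each sample."""
    return [sum(t == p for t, p in zip(ts, ps)) / len(ts)
            for ts, ps in zip(y_true, y_pred)]

def node_wise_miss_rate(y_true, y_pred):
    """For each bus: fraction of its attacked samples that were predicted as clean."""
    rates = []
    for j in range(len(y_true[0])):
        attacked = sum(row[j] for row in y_true)
        missed = sum(1 for t, p in zip(y_true, y_pred) if t[j] == 1 and p[j] == 0)
        rates.append(missed / attacked if attacked else 0.0)
    return rates

# Toy data: 4 samples x 10 buses; the last bus is attacked in every sample.
y_true = [[0] * 10 for _ in range(4)]
for row in y_true:
    row[9] = 1
y_true[0][2] = 1
y_true[1][5] = 1

# Hypothetical predictor that localizes every attack except those on the last bus.
y_pred = [row[:] for row in y_true]
for row in y_pred:
    row[9] = 0

acc = sample_wise_accuracy(y_true, y_pred)
miss = node_wise_miss_rate(y_true, y_pred)
print("sample-wise accuracy:", acc)
print("node-wise miss rate: ", miss)
```

In this toy case each sample is 90% correct, yet the last bus is missed every time, which is exactly the kind of blind spot a node-wise comparison exposes.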


The rest of this paper is organized as follows. Section II presents the problem formulation. Section III proposes the approach for the joint detection and localization of FDIA. Numerical results are presented in Section IV. Section V finally concludes the paper.

II. PROBLEM FORMULATION

The system state $x$ ($V_i$ and $\theta_i$ at each bus $i$) is estimated using the PSSE module. The PSSE iteratively solves the optimization problem in (1) phrased as a weighted least squares estimation (WLSE) using the complex power measurements $z$ collected in noisy conditions by RTUs and PMUs:

$$\hat{x} = \arg\min_{x} \, (z - h(x))^T R^{-1} (z - h(x)), \tag{1}$$

where $R$ represents measurements' error covariance matrix and $z$ includes $P_i$, $Q_i$, $P_{ij}$, and $Q_{ij}$.

FDIA aim to deceive the PSSE by deliberately injecting false data $a$ into some of the original measurements $z_o$ in such a way that the state vector $x$ converges to another point in the state space of the system. Formally,

$$z_o = h(\hat{x}), \qquad z_a = z_o + a = h(\check{x}) \tag{2}$$

which means if an adversary can design $a = h(\check{x}) - h(\hat{x})$, s/he can change the system state from $\hat{x}$ to $\check{x}$ without being detected by the LNRT based traditional BDD systems.

In general, an adversary tries to change specific measurement(s) in the power system in order to maximize the damage to the grid and at the same time minimize the probability of being detected. To this end, s/he alters some other measurement(s) connected to the targeted meter(s) since each $x$ relates to multiple $z$ through $z = h(x)$. In order to reflect this constraint and to be realistic, we assume that an adversary targets a specific area of grid represented by $\mathcal{T}$ and crafts the attack vector $a$ by changing the measurements denoted by $\mathcal{T}_z$ to spoil the state variables represented by $\mathcal{T}_x$ in this area.

The grid operator, in contrast, aims to detect those attacks and localize the attacked buses if there are any. Therefore, we formulate the FDIA detection and localization problem as a multi-label classification task where each bus has a binary label indicating the presence of attack with true label 1. We also reserve an extra binary label for the whole grid to denote the attack presence with true label 1. Fig. 2 clarifies the proposed multi-label classification approach by depicting the actual and predicted under attack buses for an exemplary attack on the IEEE 14-bus test system.

Fig. 2. Visualization of an attack and its prediction on the example IEEE 14 bus system where the actual $\mathcal{T} = \{4, 7, 9, 10, 14\}$ and predicted $\hat{\mathcal{T}} = \{4, 7, 9, 13, 14\}$ areas are enclosed with the solid red and dashed green surfaces, respectively. True positives $\mathcal{T} \cap \hat{\mathcal{T}} = \{4, 7, 9, 14\}$, false positives $\mathcal{T}' \cap \hat{\mathcal{T}} = \{13\}$, false negatives $\mathcal{T} \cap \hat{\mathcal{T}}' = \{10\}$, and true negatives $\mathcal{T}' \cap \hat{\mathcal{T}}' = \{1, 2, 3, 5, 8, 6, 11, 12\}$ are represented by yellow circles, green triangles, red squares, and black circles, respectively. In this example, the presence of the attack is correctly predicted. Nevertheless, the attack on bus 10 is missed and bus 13 is falsely alarmed even though it is not under attack.

III. JOINT DETECTION AND LOCALIZATION OF FDIA

A connected, undirected and weighted graph $\mathcal{G} = (\mathcal{V}, \mathcal{E}, W)$ having a finite set of vertices $\mathcal{V}$ with $|\mathcal{V}| = n$, a finite set of edges $\mathcal{E}$, and a weighted adjacency matrix $W \in \mathbb{R}^{n \times n}$ can be used to represent the topology of a smart power grid [18]. In this representation, buses correspond to vertices $\mathcal{V}$, branches and transformers correspond to edges $\mathcal{E}$, and line admittances correspond to $W$. Similarly, a signal or a function $f : \mathcal{V} \to \mathbb{R}$ in $\mathcal{G}$ is represented by a vector $f \in \mathbb{R}^n$, where the element $i$ of the vector corresponds to a scalar at the vertex $i \in \mathcal{V}$.

A. Spectral Graph Filters
In spectral graph theory, the normalized Laplacian operator $L = I_n - D^{-1/2} W D^{-1/2} = U \Lambda U^T \in \mathbb{R}^{n \times n}$ plays an important role for graph $\mathcal{G}$, where $D$ and $I_n \in \mathbb{R}^{n \times n}$ represent the degree and identity matrices, respectively. The columns $u_i \in \mathbb{R}^{n \times 1}$ of matrix $U = [u_1, \ldots, u_n] \in \mathbb{R}^{n \times n}$ store the $n$ orthonormal eigenvectors and constitute the graph Fourier basis. Diagonal matrix $\Lambda = \mathrm{diag}([\lambda_1, \ldots, \lambda_n]) \in \mathbb{R}^{n \times n}$ captures the $n$ eigenvalues representing the graph Fourier frequencies [18]. Analogously to the classical Fourier Transform, the Graph Fourier Transform (GFT) transforms a vertex domain signal into the spectral domain: the forward and inverse GFT are defined by $\tilde{X} = U^T X$ and $X = U \tilde{X}$, where $X$ and $\tilde{X} \in \mathbb{R}^{n \times f}$ denote the vertex and spectral domain signals with $f$ features at each node, respectively [18]. In fact, $X$ is filtered by a GF $h$:

$$Y = h * X = h(L) X = U h(\Lambda) U^T X \tag{3}$$

by first converting the vertex domain signal $X$ into the spectral domain using the forward GFT, then scaling the Fourier components by $h(\Lambda) = \mathrm{diag}[h(\lambda_1), \ldots, h(\lambda_n)]$, and finally reverting it back to the vertex domain by the inverse GFT [18]. For example, $X$, $h$, and $Y$ may correspond to bus injection values with high frequency noise, a low pass GF, and filtered bus injection values, respectively, in eq. (3). Nonetheless, this spectral filtering is not spatially localized since each $\lambda_i$ is processed for each node. Besides, its computational complexity is high due to the eigenvalue decomposition (EVD) of $L$ and the matrix multiplications with $U$ and $U^T$.

B. Polynomial Graph Filters
To localize spectral filters and reduce their complexity, polynomial spatial filters $h_{\mathrm{POLY}}(\lambda) = \sum_{k=0}^{K-1} a_k \lambda^k$ are proposed to approximate the required filter response [20]. Since only $K$-hop neighbors of $v$ are considered to calculate the filter response at each $v \in \mathcal{V}$, they are $K$-localized. In fact, they implement the weighted MA filtering in the form of FIR [23].
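
The two views of a polynomial GF, scaling of Fourier components as in eq. (3) and localized averaging in the vertex domain, can be checked numerically with a minimal Python sketch. It assumes an unweighted 12-node ring graph, for which the normalized Laplacian has the known eigenvalues $1 - \cos(2\pi m / n)$ with cosine eigenvectors; the ring, the coefficient values, and the function names are hypothetical and are not one of the IEEE test systems.

```python
import math

def ring_laplacian(n):
    """Normalized Laplacian L = I - D^{-1/2} W D^{-1/2} of an unweighted ring (all degrees are 2)."""
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        L[i][i] = 1.0
        L[i][(i - 1) % n] -= 0.5
        L[i][(i + 1) % n] -= 0.5
    return L

def matvec(M, x):
    """Plain matrix-vector product."""
    return [sum(m_ij * x_j for m_ij, x_j in zip(row, x)) for row in M]

def poly_filter(L, x, coeffs):
    """Apply h(L) x = sum_k a_k L^k x with repeated products by L only (no EVD)."""
    y = [0.0] * len(x)
    power = list(x)                      # holds L^k x, starting with k = 0
    for a_k in coeffs:
        y = [y_i + a_k * p_i for y_i, p_i in zip(y, power)]
        power = matvec(L, power)
    return y

n = 12
L = ring_laplacian(n)
coeffs = [0.5, -0.3, 0.2]                # hypothetical a_0, a_1, a_2, i.e. K = 3

# Spectral view: a graph Fourier mode u is only scaled by h(lambda).
m = 2
u = [math.cos(2 * math.pi * m * i / n) for i in range(n)]
lam = 1 - math.cos(2 * math.pi * m / n)
h_lam = sum(a_k * lam ** k for k, a_k in enumerate(coeffs))
y_mode = poly_filter(L, u, coeffs)
max_err = max(abs(y_i - h_lam * u_i) for y_i, u_i in zip(y_mode, u))

# Vertex view: an impulse at node 0 reaches at most K - 1 = 2 hops.
impulse = [1.0] + [0.0] * (n - 1)
y_imp = poly_filter(L, impulse, coeffs)
support = [i for i, v in enumerate(y_imp) if abs(v) > 1e-12]

print("max deviation from h(lambda) * u:", max_err)
print("nodes reached by the impulse:", support)
```

The filtered Fourier mode matches $h(\lambda) u$ up to rounding, and the impulse response is confined to nodes within two hops of node 0, which is the localization property that makes polynomial GFs cheaper than the full spectral filtering of eq. (3).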
Chebyshev polynomial approximation [26] is one of the preferred methods in signal processing due to its fast computation, since the polynomials are generated via a recursion and not a convolution [27]. The Chebyshev polynomial of the first kind $T_k(x)$ can be computed recursively as $T_k(x) = 2x T_{k-1}(x) - T_{k-2}(x)$, where $T_0(x) = 1$ and $T_1(x) = x$ [26]. Thus, a filter $h$ can be approximated by a truncated expansion of Chebyshev polynomials $T_k$, up to order $K - 1$. So, $X$ can be filtered:

$$Y = h * X = h(L) X = \sum_{k=0}^{K-1} a_k T_k(\tilde{L}) X \tag{4}$$

where $T_k(\tilde{L}) \in \mathbb{R}^{n \times n}$ is the Chebyshev polynomial of order $k$ evaluated at the scaled Laplacian $\tilde{L} = 2L/\lambda_{\max} - I_n$ [20] and $a \in \mathbb{R}^K$ is a vector of Chebyshev coefficients. Full EVD can be omitted since this operation only requires the largest eigenvalue $\lambda_{\max}$, which can be efficiently approximated by the power method [28]. Although the MA type Chebyshev (CHEB) GFs are fast and localized, they often require high-degree polynomials to capture the graph's global structure. In fact, this restricts their ability to adapt to sharp transitions in the frequency response due to the poor interpolation and extrapolation capabilities of high degree polynomials [29].

C. Rational Graph Filters
To circumvent these problems, distributed IIR type ARMA GFs are proposed in [22], [29]. They better approximate the sudden changes in the frequency response in comparison with the FIR type MA GFs due to their rational filter composition. A potential building block of $K$-order ARMA GFs may start with a first order recursive ARMA$_1$ filter:

$$Y^{t+1} = a \tilde{L} Y^t + b X, \tag{5}$$

where $Y^t$ is the filter output at iteration $t$, $X$ is the filter input, $a$ and $b$ are arbitrary coefficients, and the modified Laplacian $\tilde{L} = \frac{\lambda_{\max} - \lambda_{\min}}{2} I_n - L$ is a linear translation of $L$ with the same eigenvectors as $L$ and shifted eigenvalues $\tilde{\lambda}_n = \frac{\lambda_{\max} - \lambda_{\min}}{2} - \lambda_n$ relative to those of $L$. According to [30, Th. 1], eq. (5) converges regardless of $Y^0$ and $L$ values and its frequency response is given by $h_{\mathrm{ARMA}_1}(\tilde{\lambda}_n) = \frac{b}{1 - a \tilde{\lambda}_n}$. In fact, eq. (5) provides a useful distributed filter realization [22]. At each iteration $t$, each node $i$ revises its output $Y_i^t \in \mathbb{R}^{n \times c_{out}}$ with a linear combination of its input $X_i \in \mathbb{R}^{n \times c_{in}}$ and its adjacent nodes' outputs $Y_j^{t-1}$, where $c_{in}$ and $c_{out}$ denote the number of channels in the input and the output tensors, respectively. It can be implemented as a NN layer if we unroll the recursion into $T$ fixed iterations:

$$Y^{t+1} = \tilde{L} Y^t \alpha + X \beta + \theta, \tag{6}$$

where $\alpha \in \mathbb{R}^{c_{out} \times c_{out}}$, $\beta \in \mathbb{R}^{c_{in} \times c_{out}}$, and $\theta \in \mathbb{R}^{c_{out}}$ are trainable weights. Besides, since $0 \le \lambda_{\min} \le \lambda_{\max} \le 2$, the modified Laplacian can be simplified to $\tilde{L} = I_n - L$ for $\lambda_{\min} = 0$ and $\lambda_{\max} = 2$ [21]. In Fig. 3, the NN implementation of the ARMA$_1$ block which implements eq. (6) in $T$ fixed iterations is depicted. ARMA$_1$'s $K$-order version, the ARMA$_K$ filter, can be realized by averaging $K$ parallel ARMA$_1$ filters with $Y = \frac{1}{K} \sum_{k=1}^{K} Y_k^{T}$, which leads to an ARMA GF with a rational frequency response $h_{\mathrm{ARMA}_K}(\tilde{\lambda}_n) = \sum_{k=1}^{K} \frac{b_k}{1 - a_k \tilde{\lambda}_n}$ with $K - 1$ and $K$ order polynomials in its numerator and denominator, respectively. For detailed analysis and justifications, please refer to [21], [22], [29]–[31].

Fig. 3. NN implementation of ARMA$_1$ filter as a building block of ARMA$_K$ layer. In $T$ fixed iterations, an ARMA$_1$ block realizes eq. (6).

D. Frequency Response of Polynomial and Rational GFs
To demonstrate that the ARMA GFs better fit sharp changes in the frequency response compared to the CHEB GFs, we design two ideal GFs in equations (7) and (8) for IEEE 118- and 300-bus test cases, respectively.

$$h^{\dagger}(\lambda) = \begin{cases} 1, & \frac{\lambda_{\max}}{3} < \lambda < \frac{2\lambda_{\max}}{3} \\ 0, & \text{otherwise} \end{cases} \tag{7}$$

$$h^{\ddagger}(\lambda) = \begin{cases} 1, & \lambda < \frac{\lambda_{\max}}{2} \\ 0, & \text{otherwise} \end{cases} \tag{8}$$

Then, we investigate the approximating capability of the ARMA and the CHEB GFs by numerically analyzing their frequency responses. Note that similar results can be obtained by any other filters or test cases [22]. Let $x, y \in \mathbb{R}^{n \times 1}$ denote the input and output of a GF $h(\lambda)$, respectively. Then, according to eq. (3), the empirical frequency response $\tilde{h}$ can be calculated by $\tilde{h}(\lambda_i) = \frac{u_i^T y}{u_i^T x}$ [29]. Namely, each $\tilde{h}(\lambda_i)$ represents how $u_i$, corresponding to $\lambda_i$, "scales" $x$ to obtain $y$.

In order to obtain $\tilde{h}(\lambda_i)$ values, we first randomly generated $2^{16}$ $x$s for the aforementioned systems from the normal distribution and filtered them by $h^{\dagger}$ and $h^{\ddagger}$ using eq. (3) to obtain $y$s. Then, a layer of CHEB and ARMA models with no activation function is trained in batches having $2^{6}$ samples of $x$ and $y$ values until there is no improvement. Next, $\tilde{h}(\lambda_i)$ values are calculated for each $x, y$ tuple, averaged for smooth transitions, and plotted. As seen from Fig. 4, due to their rational type frequency responses, ARMA GFs are more flexible to fit sudden changes for a fixed $K$ when compared to CHEB GFs having polynomial type frequency responses. This constitutes the main motivation for selecting ARMA GFs for jointly detecting and localizing the FDIA in power grids.

E. Architecture of the Proposed Joint Detector & Localizer
The proposed joint detector and localizer consists of one input layer to represent complex bus power injections, $L - 1$ hidden ARMA$_K$ layers to extract spatial features, one dense layer to predict the probability of attack at each node, and one output layer to return an extra bit to indicate the probability


of attack at the graph level. Its architecture is demonstrated in Fig. 5 for L = 3 with a small graph having n = 5.

In this multi-layer GNN model, the input tensor [P_i, Q_i] is given by X_0 ∈ R^{n×2}, the output tensor of hidden layer l is denoted by X_l ∈ R^{n×c_l}, and model outputs are denoted by Y ∈ R^n and S ∈ R to indicate the location and the presence of the attack, respectively, where c_l represents the number of channels in layer l for 1 ≤ l ≤ L. In particular, while an ARMA_K layer takes X_{l−1} ∈ R^{n×c_{l−1}} as input and produces X_l ∈ R^{n×c_l} as output in layer l, the dense layer propagates the information to the whole graph and outputs the probability of the attack at the node level with Y ∈ R^n for localization. Finally, the output layer detects the attack at the graph level by S = max(Y) ∈ R and outputs it with Y. Note that the last ARMA_K layer's output channel is selected as one in order to have one feature for each v ∈ V. In addition, ReLU activation is used at the end of each ARMA_K layer to increase the model's nonlinear modeling ability, whereas sigmoid is employed to transform the outputs to probabilities.

Fig. 4. Empirical frequency responses of CHEB and ARMA GFs when approximating ideal filters h† and h‡ applied on IEEE 118- and 300-bus test systems, respectively. Compared to CHEB, ARMA better approximates the desired filter for the same K (e.g., h̃_cheb3 vs. h̃_arma3) and it requires a lower K for the same level of approximation (e.g., h̃_cheb11 vs. h̃_arma5).

Fig. 5. Architecture of the proposed ARMA GNN based detector and localizer with three hidden layers where each ARMA_K layer consists of K parallel ARMA_1 blocks. Each one of the K dashed blocks in an ARMA_K layer corresponds to an ARMA_1 block depicted with a dashed block in Fig. 3. While complex power injections P, Q and predicted attack probabilities Y, S at the node and graph level are visualized with thick bars at each node, activation and mean value functions are represented with σ and μ, respectively.

TABLE I
IMPLEMENTED FDIAS

IV. EXPERIMENTAL RESULTS

A. Data Generation

Due to privacy concerns, there is no preexisting publicly available dataset to train and evaluate the proposed models against FDIA. Thus, researchers use historical load profiles to mimic the temporal deviations of the grids they simulate [14], [15], [32]–[35]. We take the same approach based on the historical load profile of NYISO [36] to generate our dataset. As a first step, we download 5-minute intervals of the actual load profile of NYISO for July 2021 and interpolate them to increase the resolution to 1-minute. Next, we generate a realistic dataset following Algorithm 1 in our previous work [24] for the IEEE 57-, 118-, and 300-bus standard test cases using the 1-minute interval load profile. Namely, for each timestamp, load values are distributed and scaled to buses proportional to their initial values, AC power flow algorithms are executed, and 1% noisy power measurements are saved.

To simulate the FDIA, we implement some of the frequently used FDIA generation algorithms in the literature such as data replay attacks (A_r) [32], [33], data scale attacks (A_s) [14], [15], and distribution-based (A_d) attacks [37], [38] as well as our constrained optimization based FDIA (A_o) method explained thoroughly in [24]. While A_r simply replaces a measurement z_i^o with one of its previous values at τ back in time, A_s multiplies it with a number sampled from a uniform distribution (U) between 0.9 and 1.1. In contrast, A_d mimics the mean (μ) and variance (σ^2) of the measurement by sampling from a normal distribution (N), and A_o solves a constrained optimization problem to maximize the changes in state variables while minimizing the changes in measurements. The implemented attack types, their formulations, and some works that have utilized them are given in Table I.

We shuffled the whole data to eliminate the seasonality, standardized it with a zero mean and a standard deviation of one to have a faster and smoother learning process, and split it into three sections: 2/3 for training, 1/6 for validating and hyper-parameter tuning, and 1/6 for testing the proposed models. In order to evaluate the performance of our method under unseen attack types, we arbitrarily selected A_o and A_d and included them in the training and validation splits. The test split, in contrast, includes all of the four FDIA methods given in Table I. The number of honest samples is equalized with

the number of malicious samples in each split to have a balanced classification problem as can be seen from Table II. The final dataset comprises 60 samples/hour × 24 hour/day × 24 day = 34560 samples which consist of complex power measurements, complex bus voltages, and n + 1 binary labels to indicate the true labels for each bus and the whole grid at each timestamp.

TABLE II
NUMBER OF SAMPLES IN EACH SPLIT

TABLE III
OPTIMIZED MODEL HYPER-PARAMETERS
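The attack generation and data preparation steps of Section IV-A can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the array layout (rows are timestamps, columns are measurements), the function names, and the per-column standardization are assumptions made here, and the constrained optimization attack A_o of [24] is not reproduced.

```python
import numpy as np

def replay_attack(z, t, i, tau):
    """A_r: overwrite measurement i at time t with its value tau steps earlier."""
    if t < tau:
        raise ValueError("need at least tau earlier samples")
    za = z[t].copy()
    za[i] = z[t - tau, i]
    return za

def scale_attack(z, t, i, rng):
    """A_s: multiply measurement i by a factor drawn from U(0.9, 1.1)."""
    za = z[t].copy()
    za[i] *= rng.uniform(0.9, 1.1)
    return za

def distribution_attack(z, t, i, rng):
    """A_d: redraw measurement i from a normal fitted to its own history."""
    za = z[t].copy()
    za[i] = rng.normal(z[:, i].mean(), z[:, i].std())
    return za

def shuffle_standardize_split(x, rng):
    """Shuffle rows, standardize each column, then split 2/3, 1/6, 1/6."""
    x = x[rng.permutation(len(x))]
    x = (x - x.mean(axis=0)) / x.std(axis=0)
    n_train, n_val = 2 * len(x) // 3, len(x) // 6
    return x[:n_train], x[n_train:n_train + n_val], x[n_train + n_val:]

# samples/hour x hours/day x days, as stated in the text
n_samples = 60 * 24 * 24
```

With 34560 samples, the 2/3, 1/6, 1/6 split gives 23040, 5760, and 5760 samples, which agrees with the 5,760 test samples used for the t-SNE analysis in Section IV-E.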

B. Feature Selection, Performance Metrics, and Training

To be able to rapidly detect and localize the attacks instead of waiting for V_i and θ_i values at the output of the PSSE process, we employ power measurements as input features in our detectors. From the power measurements, only P_i and Q_i values are fed to the models, as seen from the input layer of Fig. 5. Since P_i + jQ_i = Σ_{k∈Ω_i} (P_ik + jQ_ik), node features can represent branch features as a summation over the corresponding set of buses Ω_i connected to bus i. Besides, it is experimentally verified that utilizing V_i and θ_i values along with P_i and Q_i does not increase the model performance due to the tuples' high correlation. PSSE and BDD modules, on the contrary, continue to receive every available measurement to operate. As for the weighted adjacency matrix, we select W = |Y_bus| to calculate L̃ and feed the ARMA_K layers, where Y_bus ∈ R^{n×n} denotes the nodal admittance matrix.

TABLE IV
DETECTION RESULTS IN DR, FA, AND F1 PERCENTAGES
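As a small illustration of the feature choice above, the following sketch builds a toy nodal admittance matrix, takes W = |Y_bus| as the weighted adjacency, and checks numerically that the node injections equal the sums of the branch flows. The 3-bus network, the branch impedances, and the voltages are made-up values, shunt elements are ignored, and all function names are hypothetical.

```python
import numpy as np

def build_ybus(n, branches):
    """Nodal admittance matrix from (i, k, series admittance) tuples; no shunts."""
    ybus = np.zeros((n, n), dtype=complex)
    for i, k, y in branches:
        ybus[i, i] += y
        ybus[k, k] += y
        ybus[i, k] -= y
        ybus[k, i] -= y
    return ybus

def node_injections(ybus, v):
    """Complex power injections S_i = V_i * conj(sum_k Y_ik V_k)."""
    return v * np.conj(ybus @ v)

def branch_flow_sums(n, branches, v):
    """Sum of complex branch flows S_ik leaving each bus."""
    s = np.zeros(n, dtype=complex)
    for i, k, y in branches:
        s[i] += v[i] * np.conj(y * (v[i] - v[k]))
        s[k] += v[k] * np.conj(y * (v[k] - v[i]))
    return s

# Hypothetical 3-bus example (per-unit values chosen only for illustration)
branches = [(0, 1, 1 / (0.01 + 0.05j)),
            (1, 2, 1 / (0.02 + 0.08j)),
            (0, 2, 1 / (0.015 + 0.06j))]
v = np.array([1.0, 0.98 * np.exp(-0.03j), 0.97 * np.exp(-0.05j)])

ybus = build_ybus(3, branches)
W = np.abs(ybus)                       # weighted adjacency W = |Y_bus|
s = node_injections(ybus, v)
X0 = np.column_stack([s.real, s.imag])  # input features [P_i, Q_i], shape (n, 2)
```

Because the injections already aggregate the branch flows at each bus, feeding only [P_i, Q_i] keeps the input at two features per node.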
For performance evaluation we use detection rate DR = TP/(TP+FN), false alarm rate FA = FP/(FP+TN), and F1 score F1 = 2TP/(2TP+FP+FN), where TP, FP, TN, and FN represent true positives, false positives, true negatives, and false negatives, respectively [16]. In addition, to overcome the division by zero problem when there is no attack at all, we assumed DR = 1, FA = 0, and F1 = 1 if all the labels are correctly predicted as not attacked. Otherwise, even if there is one mismatch, we assign DR = 0, FA = 1, and F1 = 0.

All free unknown parameters defined in the model are computed by a multi-label supervised training using the binary cross-entropy loss. Training samples are fed into the model as mini batches of 256 samples with a maximum of 256 epochs. In addition, we employ an early stopping criterion: training stops after 16 consecutive epochs without an improvement of at least e−4 in the validation set's cross-entropy loss. All the implementations were carried out in Python 3.8 using the Pandapower [39], Sklearn [40], t-SNE [25], and Tensorflow [41] libraries on Intel i9-8950HK CPU 2.90GHz with NVIDIA GeForce RTX 2070 GPU.

C. Joint Detection and Localization Results

Since we take a data-driven approach in this work, we implement other existing data-driven approaches from the literature to compare with our method. To the best of our knowledge, [15] is the only data-driven approach in the literature in which the authors employ an LSTM architecture to localize the FDIA. Thus, we trained an LSTM model with our dataset to compare the performances. In addition, although they are proposed for detection, we implement other available methods in the literature suitable for the multi-label classification task such as Decision Tree (DT) [42], K-Nearest Neighbor (KNN) [43], Multi Layer Perceptron (MLP) [44], Convolutional Neural Network (CNN) [43], and Chebyshev GNN (CHEB) [24]. We train, validate, and test these models similarly to the proposed model using our dataset, as we do not have access to the datasets of the corresponding works.

Model hyper-parameters are tuned with the Sklearn [40] and Keras-tuner [45] Python libraries by using Bayesian optimization techniques. Models are trained on the training set and their hyper-parameters are optimized on the validation set for each IEEE test system for 250 trials. Finally, models with optimal parameters in terms of the validation set performance are saved and their results are presented for detection and localization. Table III shows the hyper-parameter set and the optimal hyper-parameters for each model and test system.

In Table IV, detection performance of the optimized models for each test system is tabulated as percentages. For all test


systems, although KNN yields the best FA rate, its F1 scores
are not satisfactory since it gives the lowest DR. ARMA, in
contrast, reaches the best F1 scores with 99.81%, 99.44%, and
99.91%, due to its high DR with 99.90%, 99.13%, and 99.97%
and low FA rate with 0.28%, 0.24%, and 0.14% for 57-, 118-,
and 300-bus systems, respectively. Although detection results
are close to each other in terms of F1 scores for some models
such as CHEB and ARMA, CHEB yields almost two and five
times the FA rate of ARMA for IEEE 118- and 300-bus systems, respectively.
Nevertheless, detection considers the attacks at the grid
level and any intrusion to a bus in the grid is regarded as an
attack. Thus, bus level localization is required to determine
the exact place of the attack.
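The detection metrics defined in Section IV-B, including the convention adopted there for samples without any attack, can be sketched as below. The function name is hypothetical, and returning FA = 0 when a sample contains no negative labels is an extra assumption made here, since that case is not discussed in the text.

```python
import numpy as np

def detection_metrics(y_true, y_pred):
    """Return (DR, FA, F1) for binary label vectors (1 = attacked)."""
    y_true = np.asarray(y_true, dtype=bool)
    y_pred = np.asarray(y_pred, dtype=bool)

    # Convention from the text: with no attack at all, a perfect prediction
    # scores (1, 0, 1) and any mismatch scores (0, 1, 0).
    if not y_true.any():
        return (1.0, 0.0, 1.0) if not y_pred.any() else (0.0, 1.0, 0.0)

    tp = int(np.sum(y_true & y_pred))
    fp = int(np.sum(~y_true & y_pred))
    fn = int(np.sum(y_true & ~y_pred))
    tn = int(np.sum(~y_true & ~y_pred))

    dr = tp / (tp + fn)                       # detection rate
    fa = fp / (fp + tn) if (fp + tn) else 0.0  # assumption: no negatives -> 0
    f1 = 2 * tp / (2 * tp + fp + fn)
    return dr, fa, f1
```

For grid-level detection, `y_true` and `y_pred` would hold one label per sample; for localization, one label per bus.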
Since localization is a multi-label classification problem, we
evaluate it in both possible ways: (i) sample-wise (SW) evalu-
ation yields b metrics where each one of b samples at a fixed
time-step is treated individually along the buses, and (ii) node-
wise (NW) evaluation yields n metrics where each one of n
buses is evaluated separately along the samples. Therefore, in
order to better assess the models, we visualize and analyze the
distributions of SW and NW localization results in F1 percent-
ages by box plots and ratio of items satisfying some specified
thresholds. Box plots help us visualize the distribution by
drawing first (Q1 , 25th percentile), second (Q2 , 50th percentile
or median), and third (Q3 , 75th percentile) quartiles, lower
(LW = Q1 − 1.5 × (Q3 − Q1 )) and upper (UW = Q3 + 1.5 ×
(Q3 − Q1 )) whiskers and outliers [46]. In addition, the ratio of
the samples or buses satisfying some thresholds provides quan-
tifiable metrics to assess model performances. For instance, the
percentage of samples (buses) having F1 ≤ 5% or F1 ≥ 95%
in SW (NW) evaluation can be used to measure the ratio of
“unacceptable” and “acceptable” samples in the distributions,
respectively.
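The two evaluation modes and the box plot statistics described above can be sketched as follows, assuming the labels are stored as a b × n array with samples in rows and buses in columns. The function names are hypothetical, and the handling of rows or columns without any positive label follows the zero-division convention of Section IV-B.

```python
import numpy as np

def f1_along(y_true, y_pred, axis):
    """F1 computed along one axis of a (samples x buses) label array."""
    y_true = np.asarray(y_true, dtype=bool)
    y_pred = np.asarray(y_pred, dtype=bool)
    tp = np.sum(y_true & y_pred, axis=axis)
    fp = np.sum(~y_true & y_pred, axis=axis)
    fn = np.sum(y_true & ~y_pred, axis=axis)
    denom = 2 * tp + fp + fn
    # denom == 0: nothing attacked and nothing flagged, counted as F1 = 1
    return np.where(denom == 0, 1.0, 2 * tp / np.maximum(denom, 1))

def box_stats(f1):
    """Quartiles and whiskers as defined in the text."""
    q1, q2, q3 = np.percentile(f1, [25, 50, 75])
    iqr = q3 - q1
    return {"Q1": q1, "Q2": q2, "Q3": q3,
            "LW": q1 - 1.5 * iqr, "UW": q3 + 1.5 * iqr}

def threshold_ratios(f1, low=0.05, high=0.95):
    """Fractions of 'unacceptable' (F1 <= low) and 'acceptable' (F1 >= high) items."""
    f1 = np.asarray(f1)
    return float(np.mean(f1 <= low)), float(np.mean(f1 >= high))

# Toy labels: 3 samples (rows) x 3 buses (columns)
y_true = np.array([[1, 0, 0], [0, 0, 0], [1, 1, 0]])
y_pred = np.array([[1, 0, 0], [0, 0, 0], [1, 0, 1]])
f1_sw = f1_along(y_true, y_pred, axis=1)  # b metrics, one per sample
f1_nw = f1_along(y_true, y_pred, axis=0)  # n metrics, one per bus
```

In this toy case the third sample has one TP, one FP, and one FN, so its SW score is 0.5, while the second and third buses each score 0 in NW evaluation because of a missed attack and a false alarm, respectively.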
SW localization results are given in Fig. 6. Since the dis-
tributions are highly left skewed, the median (Q2 ) values can
overlap with Q3. Specifically, Q2 = Q3 = UW = 100% except for
the MLP for IEEE-300 and DT for all systems. This is not
surprising because 50% of the samples are not attacked and
it is relatively easy to predict them as not attacked for each
bus. Q1 and LW, in contrast, vary for each model and test system. For example, LW = Q1 = 0 in all test systems for the KNN model, which shows that F1 = 0% for at least 1/4 of the samples for that model. Although DT yields better results compared to KNN, its results are unsatisfactory: its Q1 values are 69.39%, 63.16%, and 63.33% for 57-, 118-, and 300-bus test systems, respectively. Models from the NN family better localize the attacks compared to the classical ML approaches. For example, their Q1 values are greater than 79.74% for all test systems. Namely, in at least 3/4 of the samples, attacked buses are correctly labeled with F1 score ranging between 79.74% and 100%. To better compare the model performances, the percentages of the samples having F1 ≤ 5% and F1 ≥ 95% are given in Fig. 6(d) for each model and test system. ARMA gives the best results in all test cases: while the sample percentages for F1 ≤ 5% are calculated as 0.21%, 0.56%, and 0.10%, sample percentages with F1 ≥ 95% are measured as 79.53%, 83.00%, and 79.03% for IEEE 57-, 118-, and 300-bus test systems, respectively. Its "acceptable" (F1 ≥ 95%) percentages are 5.64%, 8.56%, and 10.07% greater than the second best model CHEB in SW localization, for IEEE 57-, 118-, and 300-bus test cases, respectively.

Fig. 6. Distribution of F1 scores for sample wise evaluation of localization.

In Fig. 7, the distribution of F1 scores for NW evaluation is depicted. Due to the largely left skewed distributions, the median (Q2) values may overlap with Q3. Specifically, Q3 = 100% for all the models and systems, and Q2 = 100% for all the models in IEEE-118. Similar to the SW evaluation, performance of DT and KNN is poor: their Q1 values range between 24.19% and 64.86%. MLP, LSTM, and CNN provide better results compared to DT and KNN. Nevertheless, they are subject to some outliers at 0%, which means there are some buses that are always mislabeled at each timestamp. The only model that can localize the FDIA for each bus with at least 80% F1 score is ARMA. Namely, for all the test systems, the ARMA model can determine the location of an FDIA attack for all buses with F1 score greater than 80%. Fig. 7(d) presents the percentages of buses satisfying the F1 ≤ 5% and F1 ≥ 95% levels. For all test systems, only the ARMA model has 0% of buses at the F1 ≤ 5% level, which means only the ARMA model does not yield any "unacceptable" bus localization performance. In comparison, one bus in the IEEE 118-bus system and 5 buses in the IEEE 300-bus system always have an F1 score less than 5% in all timestamps for the second best CHEB model. For the F1 ≥ 95% threshold, only the ARMA model can surpass the 70% level for each test system, and it outperforms the second best model CHEB by 8.78%, 11.87%, and 14.67% for the 95% F1 threshold level in NW localization for IEEE 57-, 118-, and 300-bus systems, respectively.

Fig. 7. Distribution of F1 scores for node wise evaluation of localization.

D. Joint Detection and Localization Times

We measure the elapsed time during the model's joint detection and localization process for each sample in the test set, calculate the mean values, and tabulate them in Table V. Clearly, detection times of KNN are not satisfactory: it can take more than 0.8 second to respond. It is due to the fact that in KNN each new sample has to be compared with others for proximity calculation. LSTM, in contrast, provides better results compared to the KNN. Nevertheless, its highly complex recurrent architecture can delay its output by almost 0.1 second for IEEE-300, which may limit its application in a real time scenario. All the other models including DT, MLP, CNN, CHEB, and ARMA provide reasonable detection times for a real time application: for all test systems their response time is less than 3 milliseconds. Among them DT provides the best detection times with values under 0.7 milliseconds; yet, its poor detection and localization performance hinders its suitability as a reliable method.

TABLE V
JOINT DETECTION AND LOCALIZATION TIMES IN MILLISECONDS

E. Visualization of Intermediate Layers With t-SNE

To compare the proposed model with the existing approaches, we analyze and visualize the multidimensional data processed by the intermediate layers of the proposed models. Nevertheless, the high dimensionality of the data severely limits examining them. Besides, examining a specific feature of a bus does not provide enough information to fully comprehend how the model processes the grid. Thus, we transform the layer outputs by using the t-distributed stochastic neighbor embedding (t-SNE), which is a nonlinear dimensionality reduction technique to visualize the high dimensional data in two or three dimensional spaces [25]. By iteratively minimizing the Kullback-Leibler divergence between the probability distributions representing the sample similarities in the original and mapped spaces, it projects samples into the low dimensional space. Thus, it preserves the structure of the data and enables visualization of the data in a lower dimension [25].

Due to space limitations, only models trained for the IEEE-300 bus system are analyzed in two dimensions (2D) with test data having 5,760 samples. Embedding of input data [P, Q] = X ∈ R^{300×2} is plotted in Fig. 8(a), where a dominating daily profile can be seen from the smooth transition from the lower left to the upper right samples depicted with green stars (attacked) and black circles (non-attacked). Moreover, the close proximity between attacked and non-attacked samples indicates that the attacked samples preserve similarity to their non-attacked samples. Fig. 8(b) shows the embedding of true output Y ∈ R^{300} where non-attacked samples are clustered in the middle and attacked samples are scattered around them. This is not surprising since non-attacked samples are


all formed from 0s and attacked samples include 1s in their corresponding labels to indicate the attacked bus.

Fig. 8. Embedded input and output data for IEEE-300 bus system where attacked and non-attacked samples are depicted with green stars and black circles, respectively.

Fig. 9 demonstrates the embedding of layer l's output where each model takes X (Fig. 8(a)) as input, transforms it in the hidden layers, and tries to predict Y (Fig. 8(b)) as output. The number of TP (green stars), FP (blue diamonds), FN (red squares), and TN (black circles) samples are given under the model name for easy reference. MLP clearly falls behind the other approaches due to the FNs scattered all around. For instance, unlike other approaches, MLP misses easily detectable attacked samples in l2 very close to the TP cluster and it maps many FNs near the TNs placed at the lower part of its last layer. LSTM, in contrast to the MLP, reduces FN and FP samples. However, in l2, it falsely maps many attacked samples adjacent to the non-attacked samples, which yields a high number of FNs. In addition, like the MLP, it falsely predicts multiple non-attacked clusters in its final layer.

Contrarily, CNN is one of the best models in terms of FP number. Yet, it "destroys" the structure of data in l1, which brings a significant number of FNs. We believe it is due to the fact that CNN tries to capture the correlations of non-Euclidean data in a Euclidean space and samples from different classes may look the same in that space. Due to their inherent graph architecture, CHEB and ARMA yield better results since they both consider the "structure" of the data within X in their graph convolutional layers. However, CHEB misses 5 more attacks and yields more than 5 times the FP samples compared to ARMA. For instance, many non-attacked samples in l4 are falsely regarded as an attack due to close mapping to an attacked cluster. Conversely, our proposed model gives only 4 FP and 1 FN due to its rational graph convolutional filters that provide more flexible frequency responses. Note that no sample is mapped in the vicinity of attacked samples unlike the other models. Besides, only ARMA outputs a highly similar pattern to Y: a non-attacked sample cluster in the center and attacked samples distributed around it.

F. Discussions & Theoretical Comparisons

As indicated earlier, two main approaches have been proposed for detecting and localizing the FDIAs: model-based and data driven approaches [8]. Model-based approaches such as those in [10]–[14] do not require any historical datasets. Nevertheless, scalability, manual threshold optimization process, detection lags, model complexity, and localization resolution could hinder their usability for real applications. For instance, results are not published in [12] and localization could only be done at the cluster level in [10]. Detection times are larger than a second in [11] and [13] for small test systems having 12 and 36 buses, respectively. In their model-based detectors, authors of [14] utilize GSP techniques such as Local Smoothness (LS) and Vertex-Frequency Energy Distribution (VFED). Nonetheless, they evaluate their method with an attack that is easily detectable by the classical LNRT based BDD methods, which can conceal the actual performance of the model. Specifically, they simulate the FDIAs using z_i^a(t) = z_i^o(t) + (−1)^d · a · u · range, where d ∼ {0, 1} is a binary random variable (r.v.), u ∼ U[0, 1] is a uniform r.v., range = max(z_i^o) − min(z_i^o), and a is a scalar for the attack. For instance, if z_i^o ∼ N(μ_o, σ_o^2), expected values of the attacked data distribution become E[μ_a] = μ_o and E[σ_a] = 6aσ_o due to the product properties of uniform and normal distributions, where the μ_o, σ_o and μ_a, σ_a tuples represent the mean and standard deviation of original and attacked data, respectively. The accuracy of localization for the IEEE 118-bus test system when a = 4, which makes E[σ_a] = 24σ_o, are 85% and 91% for the LS and VFED techniques, respectively. These high accuracies are not realistic since the scalar a plays a significant role in the simulation process.

The data-driven methods, in contrast, present a better performance since historical datasets are growing and the modeling capability of these algorithms is being increased [8], [9]. For example, in their data-driven method in [15], researchers employ an LSTM model for each measurement in a 5-bus test system in which only one bus is under attack at a time to detect and localize the point-wise FDIAs. They report greater than 90% accuracy for detection and localization of random, ramp, and scale attacks for low, medium, and high attack levels. However, the capability of this method for detection and localization of different FDIAs in large test cases has not been investigated. Besides, assigning an independent model to each measurement has two major drawbacks: (i) overall model complexity increases severely, and (ii) spatial correlations of the measurements are ignored totally.

In data-driven approaches, compatibility between the structure of collected data and architecture of the data-driven model is the primary factor in the performance of the model. For instance, DT, KNN, or MLP architectures could be better suited for a dataset having uncorrelated features from different spaces. Similarly, an RNN architecture might be more applicable to model the recurrent relations in natural language data. A CNN architecture, in contrast, could be more favorable than GNN for image data where pixel locality is well modeled in 2D Euclidean space. However, as demonstrated with Fig. 1(a), spatial correlations in power measurements data can only be captured in a non-Euclidean space dictated by the topology of the grid. For instance, if we had a hypothetical power grid like in Fig. 1(a), a CNN architecture could have comparable performances with ARMA. Nonetheless, for a power grid data collected from graph type structure, a GNN


is more advantageous than other architectures as can be seen from the detection results in Table IV and the localization distributions in Figs. 6 and 7. As for the GNN family, ARMA outperforms CHEB due to the fact that rational GFs implemented using the ARMA architecture provide more flexible frequency responses compared to the polynomial filters such as CHEB [29].

Fig. 9. t-SNE embedding of model's layers to visualize the attack detection where true input and output data are given in Fig. 8. For each model and each layer l, output of the model is embedded in 2D using t-SNE. Since t-SNE preserves the structure of the high dimensional data, models' transformation can be visualized and compared in a lower dimension, such as 2D. Note that due to its topology aware ARMA graph filtering, the proposed model better classifies samples by converging to the true output depicted in Fig. 8(b). As a consequence, it yields the minimum number of FP and FN compared to other models.

It is observed from our extensive experiments that the proposed ARMA based model performs better compared to other models for larger test cases. As an illustration, for the 95% F1 threshold level, it outperforms the second best model CHEB by 5.64%, 8.56%, and 10.07% in SW localization and by 8.78%, 11.87%, and 14.67% in NW localization for IEEE 57-, 118-, and 300-bus systems, respectively. This difference is due to the fact that in larger and denser graphs, (i) the spatial correlation between adjacent measurements becomes more dominant compared to the global correlations and (ii) ARMA GFs better adapt to abrupt changes in the spectral domain compared to the polynomial ones.

As stated before, the output of each vertex v only depends on its K-hop neighbors for a K-order polynomial GF. In other words, the output of v is independent of the vertices beyond the K-hop neighbors for a K-order FIR GF [22]. Thus, to capture the global characteristics of the graph, an FIR GF requires a "high" order spectral response as can be seen from Fig. 4. Nevertheless, due to the poor interpolation and extrapolation capabilities of the high order polynomials, it becomes sensitive to variations and may overfit to the training data [21]. To verify this characteristic, we fix the other parameters of CHEB GF at their optimal values tabulated in Table III and train a CHEB model for the IEEE 300-bus test system for each K ∈ {5, 7, 9, 11}. FDIA detection results in terms of F1% are depicted in Fig. 10. Clearly, increasing K beyond a certain point makes the model susceptible to variations such as noise, and thus, it can degrade the test set performance. Note that a similar conclusion can also be corroborated for the localization results.

Bus level localization is a multi-label classification task and should be evaluated accordingly. Besides, performance metrics can cause inaccurate or misleading outcomes when they are not interpreted correctly. Namely, missing an attack (FN) could be much more severe than a false alarm (FP) when dealing with FDIAs due to their consequences. An example of localization results for a hypothetical model is given in Table VI with 4 samples in rows and 5 buses in columns where TP, FP, FN, and TN samples are highlighted with green, blue, red, and


black colors, respectively. In addition, SW and NW localization results are given at the end of each row and column in terms of accuracy ACC = (TP+TN)/(TP+TN+FP+FN) and F1 percentages. The ACCsw is not a reliable metric since it cannot properly take into account the distribution of errors. For instance, although ACCsw shows high accuracy for all the samples, it does not have any mechanism to reflect the faults at n2, which can have serious consequences for the power system. Comparing F1 with ACC reveals that F1 has a better mechanism to evaluate the accuracy of the model. For instance, ACCsw = 60% for s3 and s4 since they have the same number of falsely predicted samples. The F1sw metric, in contrast, yields 50% for s3 and 75% for s4 since s4 includes 1 more TP compared to s3. Since our focus is to determine the localization of FDIAs, F1 is the proper candidate to evaluate the accuracy of the model. The results and discussion reveal the superiority of our model compared to DT, MLP, RNN, CNN, and CHEB models in terms of detection and localization of FDIAs.

V. CONCLUSION

This work proposed a GNN based model by integrating the underlying graph topology of the grid and spatial correlations of its measurement data to jointly detect and localize the FDIAs in power systems while the full AC power flow equations are employed to address the physics of the network. Adopting IIR type ARMA GFs in its hidden layers, the proposed model is more flexible in frequency response compared to FIR type polynomial GFs, e.g., CHEB, thanks to their rational type filter composition. Although our algorithm has better detection and localization performance compared to the state of the art CHEB model [24] in the literature, the improvement rate for localization is much higher than for detection. Simulations performed on various standard test systems confirm that the performance of the proposed model in detecting FDIA exceeds the performance of the CHEB model by 0.12%, 0.68%, and 0.38% for IEEE 57-, 118-, and 300-bus systems, respectively. The proposed model also outperforms the CHEB model in localizing the attacks (i.e., 95% F1 threshold level) by 5.64%, 8.56%, and 10.07% in SW localization and by 8.78%, 11.87%, and 14.67% in NW localization for the same above-mentioned test systems, respectively. Furthermore, visualizing the intermediate layers for different approaches including those in the literature corroborates the superiority of the proposed model in detecting FDIA.

Fig. 10. Detection performance vs. filter order of CHEB models for IEEE 300-bus system. Optimal results are obtained at K = 4 as given in Table III.

TABLE VI
SW AND NW LOCALIZATION EXAMPLE IN ACC% AND F1%

REFERENCES
[1] X. Yu and Y. Xue, “Smart grids: A cyber–physical systems perspective,” Proc. IEEE, vol. 104, no. 5, pp. 1058–1070, May 2016.
[2] K. R. Davis, K. L. Morrow, R. Bobba, and E. Heine, “Power flow cyber attacks and perturbation-based defense,” in Proc. IEEE 3rd Int. Conf. Smart Grid Commun. (SmartGridComm), Tainan, Taiwan, 2012, pp. 342–347.
[3] S. Sridhar, A. Hahn, and M. Govindarasu, “Cyber–physical system security for the electric power grid,” Proc. IEEE, vol. 100, no. 1, pp. 210–224, Jan. 2012.
[4] A. Abur and A. G. Expósito, Power System State Estimation: Theory and Implementation (Power Engineering (Willis)). New York, NY, USA: Marcel Dekker, 2004. [Online]. Available: https://books.google.com/books?id=NQhbtFC6_40C
[5] G. B. Giannakis, V. Kekatos, N. Gatsis, S.-J. Kim, H. Zhu, and B. F. Wollenberg, “Monitoring and optimization for power grids: A signal processing perspective,” IEEE Signal Process. Mag., vol. 30, no. 5, pp. 107–128, Sep. 2013.
[6] G. Liang, J. Zhao, F. Luo, S. R. Weller, and Z. Y. Dong, “A review of false data injection attacks against modern power systems,” IEEE Trans. Smart Grid, vol. 8, no. 4, pp. 1630–1638, Jul. 2017.
[7] Y. Liu, P. Ning, and M. K. Reiter, “False data injection attacks against state estimation in electric power grids,” ACM Trans. Inf. Syst. Security, vol. 14, no. 1, pp. 1–33, 2011.
[8] A. S. Musleh, G. Chen, and Z. Y. Dong, “A survey on the detection algorithms for false data injection attacks in smart grids,” IEEE Trans. Smart Grid, vol. 11, no. 3, pp. 2218–2234, May 2020.
[9] A. Sayghe et al., “Survey of machine learning methods for detecting false data injection attacks in power systems,” IET Smart Grid, vol. 3, no. 5, pp. 581–595, 2020.
[10] T. R. Nudell, S. Nabavi, and A. Chakrabortty, “A real-time attack localization algorithm for large power system networks using graph-theoretic techniques,” IEEE Trans. Smart Grid, vol. 6, no. 5, pp. 2551–2559, Sep. 2015.
[11] M. Khalaf, A. Youssef, and E. El-Saadany, “Joint detection and mitigation of false data injection attacks in AGC systems,” IEEE Trans. Smart Grid, vol. 10, no. 5, pp. 4985–4995, Sep. 2019.
[12] E. Drayer and T. Routtenberg, “Cyber attack localization in smart grids by graph modulation (brief announcement),” in Proc. Int. Symp. Cyber Security Cryptogr. Mach. Learn., 2019, pp. 97–100.
[13] X. Luo, Y. Li, X. Wang, and X. Guan, “Interval observer-based detection and localization against false data injection attack in smart grids,” IEEE Internet Things J., vol. 8, no. 2, pp. 657–671, Jan. 2021.
[14] M. A. Hasnat and M. Rahnamay-Naeini, “Detection and locating cyber and physical stresses in smart grids using graph signal processing,” 2020. [Online]. Available: arXiv:2006.06095.
[15] A. Jevtic, F. Zhang, Q. Li, and M. Ilic, “Physics- and learning-based detection and localization of false data injections in automatic generation control,” IFAC-PapersOnLine, vol. 51, no. 28, pp. 702–707, 2018.
[16] C. Bishop, Pattern Recognition and Machine Learning (Information Science and Statistics). New York, NY, USA: Springer, Jan. 2006.
[17] Z. Wu, S. Pan, F. Chen, G. Long, C. Zhang, and S. Y. Philip, “A comprehensive survey on graph neural networks,” IEEE Trans. Neural Netw. Learn. Syst., vol. 32, no. 1, pp. 4–24, Jan. 2021.
[18] A. Ortega, P. Frossard, J. Kovačević, J. M. Moura, and P. Vandergheynst, “Graph signal processing: Overview, challenges, and applications,” Proc. IEEE, vol. 106, no. 5, pp. 808–828, May 2018.
[19] D. I. Shuman, S. K. Narang, P. Frossard, A. Ortega, and P. Vandergheynst, “The emerging field of signal processing on graphs: Extending high-dimensional data analysis to networks and other irregular domains,” IEEE Signal Process. Mag., vol. 30, no. 3, pp. 83–98, May 2013.


[20] M. Defferrard, X. Bresson, and P. Vandergheynst, “Convolutional neural networks on graphs with fast localized spectral filtering,” in Advances in Neural Information Processing Systems (NIPS). Red Hook, NY, USA: Curran, 2016.
[21] F. M. Bianchi, D. Grattarola, L. Livi, and C. Alippi, “Graph neural networks with convolutional ARMA filters,” IEEE Trans. Pattern Anal. Mach. Intell., early access, Jan. 26, 2021, doi: 10.1109/TPAMI.2021.3054830.
[22] X. Shi, H. Feng, M. Zhai, T. Yang, and B. Hu, “Infinite impulse response graph filters in wireless sensor networks,” IEEE Signal Process. Lett., vol. 22, no. 8, pp. 1113–1117, Aug. 2015.
[23] N. Tremblay, P. Gonçalves, and P. Borgnat, “Design of graph filters and filterbanks,” in Cooperative and Graph Signal Processing. London, U.K.: Elsevier, 2018, pp. 299–324.
[24] O. Boyaci et al., “Graph neural networks based detection of stealth false data injection attacks in smart grids,” 2021. [Online]. Available: arXiv:2104.02012.
[25] L. V. D. Maaten and G. Hinton, “Visualizing data using t-SNE,” J. Mach. Learn. Res., vol. 9, pp. 2579–2605, Nov. 2008.
[26] J. C. Mason and D. C. Handscomb, Chebyshev Polynomials. Boca Raton, FL, USA: CRC Press, 2002.
[27] S. W. Smith, The Scientist and Engineer’s Guide to Digital Signal Processing, vol. 14. San Diego, CA, USA: California Techn. Publ., 1997.
[28] H. Föllmer and U. Küchler, “Richard von Mises,” in Mathematics in Berlin. Basel, Switzerland: Springer, 1998, pp. 111–116.
[29] A. Loukas, A. Simonetto, and G. Leus, “Distributed autoregressive moving average graph filters,” IEEE Signal Process. Lett., vol. 22, no. 11, pp. 1931–1935, Nov. 2015.
[30] A. Loukas, M. Zuniga, M. Woehrle, M. Cattani, and K. Langendoen, “Think globally, act locally: On the reshaping of information landscapes,” in Proc. 12th Int. Conf. Inf. Process. Sens. Netw., Philadelphia, PA, USA, 2013, pp. 265–276.
[31] E. Isufi, A. Loukas, A. Simonetto, and G. Leus, “Autoregressive moving average graph filtering,” IEEE Trans. Signal Process., vol. 65, no. 2, pp. 274–288, Jan. 2017.
[32] J. Zhao, J. Wang, and L. Yin, “Detection and control against replay attacks in smart grid,” in Proc. 12th Int. Conf. Comput. Intell. Security (CIS), Wuxi, China, 2016, pp. 624–627.
[33] G. Chaojun, P. Jirutitijaroen, and M. Motani, “Detecting false data injection attacks in AC state estimation,” IEEE Trans. Smart Grid, vol. 6, no. 5, pp. 2476–2483, Sep. 2015.
[34] S. Tan, W.-Z. Song, M. Stewart, J. Yang, and L. Tong, “Online data integrity attacks against real-time electrical market in smart grid,” IEEE Trans. Smart Grid, vol. 9, no. 1, pp. 313–322, Jan. 2018.
[35] J. Giraldo, A. Cárdenas, and N. Quijano, “Integrity attacks on real-time pricing in smart grids: Impact and countermeasures,” IEEE Trans. Smart Grid, vol. 8, no. 5, pp. 2249–2257, Sep. 2017.
[36] New York Independent System Operator (NYISO). (Jul. 2021). Actual—Historical. [Online]. Available: https://www.nyiso.com/load-data
[37] M. Ozay, I. Esnaola, F. T. Y. Vural, S. R. Kulkarni, and H. V. Poor, “Machine learning methods for attack detection in the smart grid,” IEEE Trans. Neural Netw. Learn. Syst., vol. 27, no. 8, pp. 1773–1786, Aug. 2016.
[38] J. Yan, B. Tang, and H. He, “Detection of false data attacks in smart grid with supervised learning,” in Proc. Int. Joint Conf. Neural Netw. (IJCNN), Vancouver, BC, Canada, 2016, pp. 1395–1402.
[39] L. Thurner et al., “Pandapower—An open-source python tool for convenient modeling, analysis, and optimization of electric power systems,” IEEE Trans. Power Syst., vol. 33, no. 6, pp. 6510–6521, Nov. 2018.
[40] F. Pedregosa et al., “Scikit-learn: Machine learning in python,” J. Mach. Learn. Res., vol. 12, pp. 2825–2830, Nov. 2011.
[41] M. Abadi et al., “TensorFlow: A system for large-scale machine learning,” in Proc. 12th USENIX Symp. Oper. Syst. Design Implement. (OSDI), 2016, pp. 265–283.
[42] A. Jindal, A. Dua, K. Kaur, M. Singh, N. Kumar, and S. Mishra, “Decision tree and svm-based data analytics for theft detection in smart grid,” IEEE Trans. Ind. Informat., vol. 12, no. 3, pp. 1005–1016, Jun. 2016.
[43] D. Wang, X. Wang, Y. Zhang, and L. Jin, “Detection of power grid disturbances and cyber-attacks based on machine learning,” J. Inf. Security Appl., vol. 46, pp. 42–52, Jun. 2019.
[44] Y. Wang, M. M. Amin, J. Fu, and H. B. Moussa, “A novel data analytical approach for false data injection cyber-physical attack mitigation in smart grids,” IEEE Access, vol. 5, pp. 26022–26033, 2017.
[45] T. O’Malley et al. (2019). Keras Tuner. [Online]. Available: https://github.com/keras-team/keras-tuner
[46] J. W. Tukey, Exploratory Data Analysis, vol. 2. Reading, MA, USA: Addison-Wesley, 1977.

Osman Boyaci (Graduate Student Member, IEEE) received the B.Sc. degree (Hons.) in electronics engineering in 2013, the B.Sc. degree (Hons.) in computer engineering in 2013, and the M.Sc. degree in computer engineering in 2017 from Istanbul Technical University, Istanbul, Turkey. He is currently pursuing the Ph.D. degree with Texas A&M University, working on graph neural network-based cybersecurity in smart grids. His research interests include machine learning, artificial intelligence, and cybersecurity.

Mohammad Rasoul Narimani (Member, IEEE) received the B.S. degree in electrical engineering from Razi University, the M.S. degree in electrical engineering from the Shiraz University of Technology, and the Ph.D. degree in electrical engineering from the Missouri University of Science & Technology. He is an Assistant Professor with the College of Engineering, Arkansas State University. Before joining Arkansas State University, he was a Postdoctoral Researcher with Texas A&M University, College Station. His research interests are in the application of optimization techniques to electric power systems.

Katherine R. Davis (Senior Member, IEEE) received the B.S. degree in electrical engineering from the University of Texas at Austin, Austin, TX, USA, in 2007, and the M.S. and Ph.D. degrees in electrical engineering from the University of Illinois at Urbana–Champaign, Champaign, IL, USA, in 2009 and 2011, respectively. She is currently an Assistant Professor of Electrical and Computer Engineering with Texas A&M University.

Muhammad Ismail (Senior Member, IEEE) received the B.Sc. (Hons.) and M.Sc. degrees in electrical engineering (electronics and communications) from Ain Shams University, Cairo, Egypt, in 2007 and 2009, respectively, and the Ph.D. degree in electrical and computer engineering from the University of Waterloo, Waterloo, ON, Canada, in 2013. He is currently an Assistant Professor with the Department of Computer Science, Tennessee Technological University, Cookeville, TN, USA. He was a co-recipient of the Best Paper Awards in the IEEE ICC 2014, the IEEE Globecom 2014, the SGRE 2015, the Green 2016, the Best Conference Paper Award from the IEEE TCGCN at the IEEE ICC 2019, and IEEE IS 2020.

Thomas J. Overbye (Fellow, IEEE) received the B.S., M.S., and Ph.D. degrees in electrical engineering from the University of Wisconsin Madison, Madison, WI, USA. He is currently with Texas A&M University, where he is a Professor and holder of the O’Donnell Foundation Chair III.

Erchin Serpedin (Fellow, IEEE) is a Professor with the Electrical and Computer Engineering Department, Texas A&M University, College Station. He has authored four research monographs, one textbook, 17 book chapters, 170 journal papers, and 270 conference papers. His current research interests include signal processing, machine learning, artificial intelligence, cyber security, smart grids, and wireless communications. He served as an Associate Editor for more than 12 journals, including journals such as the IEEE TRANSACTIONS ON INFORMATION THEORY, IEEE TRANSACTIONS ON SIGNAL PROCESSING, IEEE TRANSACTIONS ON COMMUNICATIONS, IEEE SIGNAL PROCESSING LETTERS, IEEE COMMUNICATIONS LETTERS, IEEE TRANSACTIONS ON WIRELESS COMMUNICATIONS, IEEE Signal Processing Magazine, and Signal Processing (Elsevier), and as a technical chair for six major conferences.
