Sensors 23 01683
Article
Detection of False Data Injection Attacks in Smart Grids Based
on Expectation Maximization
Pengfei Hu 1,2 , Wengen Gao 1,2, * , Yunfei Li 1,2 , Minghui Wu 1,2 , Feng Hua 1,2 and Lina Qiao 1,2
Abstract: The secure operation of smart grids is closely linked to state estimates that accurately reflect
the physical characteristics of the grid. However, well-designed false data injection attacks (FDIAs)
can manipulate the process of state estimation by injecting malicious data into the measurement data
while bypassing the detection of the security system, ultimately causing the results of state estimation
to deviate from secure values. Since FDIAs tampering with the measurement data of some buses will
lead to error offset, this paper proposes an attack-detection algorithm based on statistical learning
according to the different characteristic parameters of measurement error before and after tampering.
In order to detect and classify false data from the measurement data, in this paper, we report the
model establishment and estimation of error parameters for the tampered measurement data by
combining the k-means++ algorithm with the expectation maximization (EM) algorithm. At the
same time, we located and recorded the bus that the attacker attempted to tamper with. In order to
verify the feasibility of the algorithm proposed in this paper, the IEEE 5-bus standard test system
and the IEEE 14-bus standard test system were used for simulation analysis. Numerical examples
demonstrate that the combined use of the two algorithms can decrease the detection time to less than
0.011883 s and correctly locate the false data with a probability of more than 95%.
Keywords: false data injection attacks; statistical learning methods; attack detection; attack location;
smart grid
Citation: Hu, P.; Gao, W.; Li, Y.; Wu, M.; Hua, F.; Qiao, L. Detection of False Data Injection Attacks in Smart
Grids Based on Expectation Maximization. Sensors 2023, 23, 1683. https://doi.org/10.3390/s23031683
Academic Editors: Naveen Chilamkurti and Jong-Hyouk Lee
Received: 28 November 2022
Revised: 5 January 2023
Accepted: 29 January 2023
Published: 3 February 2023
Copyright: © 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article
distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license
(https://creativecommons.org/licenses/by/4.0/).
1. Introduction
The current power system is continuously monitored by an energy management
system (EMS), and a supervisory control and data acquisition (SCADA) system is used
to maintain normal and secure operating conditions [1]. In particular, the SCADA system
in the control center uses state estimators to process the received measurements. The
estimator obtains the best estimate of the system’s state by filtering incorrect data. These
state estimates are then transmitted to all EMS to control the proper functioning of the
physical aspects of the grid, such as the power flow calculation.
The measurements collected by the SCADA system include not only measurement
noise due to the limited precision of sensors and communication medium, but also errors
due to various problems, such as connecting and calibrating a failed meter. To decrease the
effects of noise and error, power system researchers have developed many methods to deal
with the measurements during state estimation [2,3]. The basic principle of these methods
is to use the redundancy of multiple measurements to identify and eliminate anomalies.
Most of the technologies used to protect grid systems are designed to ensure system
reliability, such as preventing random failures. However, more and more attention has been
paid to preventing malicious network attacks in the recent proposals for smart grids [4]. The
operation and control of smart grids depend on the complex network space of computer,
software and communication technology [5]. Since measurement components supported by
smart devices, such as smart instruments and sensors, play important roles in confirming
the real-time physical states of power systems, they are likely to be targets of attack. These
measuring devices widely use Internet-based protocols in communication systems, which
are open to external networks and lack hardware to prevent tampering. In order to
promote data sharing, enterprise networks, and even individual users, are allowed to
connect to the infrastructure of power grid information [6]. Potential complex malicious
attacks increase after these network interfaces are introduced into power systems [7–10].
Liu et al. [11] showed in 2009 that a new type of attack, the FDIA, could bypass bad data
detection (BDD) in current SCADA systems and introduce arbitrary errors into state estimation
without being detected. Malicious covert data injection at network buses will inevitably have a
negative impact on power-system state estimation [12,13]. Injected data that drive state
estimates away from secure values can directly result in serious social and economic losses.
An attacker can use an FDIA to manipulate the electricity price in the electricity
market [14–16], and such an attack can even result in regional power shortages [17].
Du et al. [18] proposed a method to extract network parameters from the limited
data obtained by phasor measurement units (PMUs) when the network parameters are
unknown and then use these parameters to build an AC attack model, finally making the
state estimation deviate from the secure value. Most of the classical methods used to
construct the attack model focus on tampering with measurements, such as the power injected
into the bus and the power flow between buses. Liu et al. [19] proposed a method to attack
network parameters which reduces the number of attacked measurements by coordinating
the modifications of parameters and other measurements in the power system. The attack
method is still applicable in cases where the topology and line impedance of the network
are incomplete. Since it is unrealistic for an attacker to modify network parameters directly,
Liu et al. [20] proposed a more universally applicable attack model. The concrete approach
is to tamper with network parameters indirectly by exploiting the vulnerabilities that exist
when the network parameters are incorrectly handled.
Several directions have been taken in the research of detecting FDIAs in smart grids.
Although these detection methods differ to varying degrees, they can be broadly classified
into two categories: model-based detection algorithms and data-driven detection algorithms.
In response to the situation in which network parameters are attacked, the authors of [21]
proposed a way to detect network parameter attacks based on the inconsistency of historical
data and specified network parameters. However, such methods are no longer applicable in
detecting combinatorial attacks. Methods that detect FDIAs using differences in the
probability distributions of historical and current measurement data may also fail in some
cases, for example when the attack vector is a trapezoidal attack or when the injected
spurious data do not significantly deviate from the historical trend [22–24]. In addition,
such a detection method will easily cause false detection when encountering actual events,
such as sudden changes in the load or in generation.
To deal with this situation, a method was proposed in [25] to detect FDIAs using the
difference in the residual probability distribution between historical measurement data and
current measurement data. This method still maintains good detection performance
when facing trapezoidal attacks and real events. Chen et al. [26] proposed a scheme to
detect false data before state estimation by using a vector autoregressive model. This scheme
uses the vector autoregressive model for prediction and classifiers for detection, which
improves the detection rate based on the autoregressive model. Saleh et al. [27] proposed a
method to detect FDIAs that corrupt the state estimation of PMUs. The phase lock value
(PLV) is used to judge whether the phase changes between buses are consistent. If the
phase change is no longer constant, the PMU data are considered to have been
manipulated; otherwise, the PMU data are considered secure. The above are several
model-based detection methods.
Unlike model-based detection algorithms for FDIAs, machine learning, as a data-
driven technique, implies a huge dependence on historical data of the system under
test. Yu et al. [28] proposed a false data injection attack detection method for AC state
estimation. When FDIAs exist, their spatial and temporal data correlations may deviate
from the correlations under normal conditions. By using wavelet transforms and deep
neural networks to analyze the estimated states in continuous time, the proposed method
can effectively detect this inconsistency. Xun et al. [29] proposed an extreme learning
machine (ELM)-based one-class and one-network (OCON) framework for detecting FDIAs.
In this framework, the subnetwork of the state identification layer in OCON uses the
ELM algorithm to accurately classify false data and normal data. Almasabi et al. [30]
proposed a new method to detect FDIAs using moving average, correlation and machine
learning algorithms. The experiments showed that the proposed method is able to detect
the attacked PMUs and their timing issues with a high detection rate. Most existing
machine-learning-based detection methods assume that the labels of the training data are
known, which may not hold in practice. Since real-life FDIAs are generally
considered rare events, it may be challenging to obtain the identity of the compromised
data. An et al. [31] proposed the use of unsupervised integrated autoencoders connected
to a Gaussian mixture model (GMM) to accommodate multiple domains. Attention-based
potential representation and minimum error reconstruction features are utilized in the
hidden space of the integrated autoencoder. The expectation maximization (EM) algorithm
is used to estimate the sample density in the GMM. When the estimated sample density
exceeds the learning threshold obtained in the training phase, the sample is identified
as an outlier. Since the EM algorithm has the disadvantage of being sensitive to initial
values, excellent initialization parameters are required for the next iterative step of the
calculation. To deal with this challenge, we are required to develop an unsupervised
detection approach.
This paper proposes a detection and location method for false data injection attacks
in smart grids. FDIAs threaten the management and control of grids by tampering with
the measurement data of the smart grid systems. In fact, the attacker adds an unknown
deviation to the measurement data of a system to launch an FDIA. Since the presence of
unknown attacks generates error bias, there are different characteristic parameters for the
measurement error contained by false data and that of normal data. Therefore, we used the
k-means++ algorithm and the expectation maximization (EM) algorithm to estimate the
corresponding parameters of the measured data to eliminate the data affected by the FDIA,
and finally achieved the purpose of attack detection. The main contributions of this paper
can be summarized as follows:
• Since the error models of both measurement vectors and state variables with false data
have the characteristics of the Gaussian mixture model (GMM), a false data injection
attack detection method based on the k-means++ and expectation maximization (EM)
algorithms is proposed.
• To address the fact that the k-means algorithm is sensitive to the initial clustering
centers, which affects the convergence efficiency, the k-means++ algorithm is used to
determine the initial estimated parameters of the GMM in a faster iterative manner.
• The k-means++ algorithm is used to preprocess the data to solve the problem of the EM
algorithm being sensitive to initial values. It also decreases the computational complexity
of the EM algorithm, and finally detects and locates false data rapidly according to the
classification results.
2. System Model
For complex information processing in a smart grid, it is necessary to generate a
corresponding mathematical model according to the network topology and data of the distribution
network [32]. The general linear state equation of voltage and current phasors in the smart
grid distribution system is as follows [33]:
$$y = \underbrace{Hx}_{=z} + e \tag{1}$$
where $y \in \mathbb{C}^{m}$ is the original measurement vector of voltage and current phasors; $z$ is the
noiseless measurement vector; $x \in \mathbb{C}^{n}$ is the vector describing the system state variables;
$H \in \mathbb{C}^{m \times n}$ is the network topology matrix describing the vicinity of a given working
point; and $e \in \mathbb{C}^{m}$ is the measurement error produced by the sensors, where each component
is modeled as an independent and identically distributed complex Gaussian random
variable with zero mean and variance $\sigma^{2}$.
Attackers use FDIAs to add attack vectors to the measurement vectors to corrupt the
measurements available to the operator. The actual measurements after being attacked are
$$y_a = \underbrace{Hx}_{=z} + e + a \tag{2}$$
where $a \in \mathbb{C}^{m}$ is the attack vector and $y_a \in \mathbb{C}^{m}$ represents the measurement after being
attacked by false data injection.
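As a minimal sketch of Equations (1) and (2), the snippet below builds a small real-valued example. The matrix `H`, the state `x`, the noise level and the attacked index are hypothetical values chosen only for illustration; the paper itself works with complex phasors.

```python
import random

def measure(H, x, sigma, rng):
    """Return y = Hx + e with i.i.d. zero-mean Gaussian noise e, as in Eq. (1)."""
    z = [sum(h * xj for h, xj in zip(row, x)) for row in H]  # noiseless z = Hx
    return [zi + rng.gauss(0.0, sigma) for zi in z]

def inject(y, a):
    """Return the attacked measurement y_a = y + a, as in Eq. (2)."""
    return [yi + ai for yi, ai in zip(y, a)]

rng = random.Random(0)
H = [[1.0, 0.0], [0.0, 1.0], [1.0, -1.0]]  # hypothetical 3x2 topology matrix
x = [1.02, 0.98]                           # hypothetical state vector
y = measure(H, x, sigma=0.01, rng=rng)
a = [0.0, 0.0, 0.03]                       # only the third measurement is attacked
y_a = inject(y, a)
```

Only the attacked component of `y_a` differs from `y`, which is the offset that the detection method later tries to identify in the measurement errors.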
With the rapid development of synchronous phasor measurement units (PMUs), a
smart grid can obtain highly accurate phasor measurement values by arranging PMUs on
the terminal buses [34]. Using these measurements, the system state variable x can be
accurately estimated. However, due to the cost of PMUs, the devices cannot be
installed on all transmission buses of the power system, and can only cooperate with other
sensors to obtain system measurements. One of the attacks considered under this condition
is that during the stable operation of the power system, one of the N phasor measurements
in the measurement vector y is continuously attacked; that is, a component in the attack
vector $a$ is not zero. In the subsequent measurement acquisition process, we use
$K$ ($K \ge 1$) measurement vectors to determine whether the phasor measurements have been
replaced with false data. To facilitate the calculation, the obtained measurement samples are
converted from complex representation to real coordinate representation, and then the actual
obtained component of the $i$th phase measurement of the $k$th measurement vector
$y_k \in \mathbb{R}^{N \times 2}$ is represented as
$$Y = Z + E \tag{4}$$
where $Y \in \mathbb{R}^{NK \times 2}$, $Z \in \mathbb{R}^{NK \times 2}$ and $E \in \mathbb{R}^{NK \times 2}$ represent the original measurement,
actual measurement and measurement error obtained from $K$ measurements for $N$ phase
measurement units, respectively.
study, we used the method of processing the results of multiple measurements as a set of
data. Since interrelated measurement data are linked, the probability of false alarms can
be decreased by mathematically determining the relationship between the data. However,
the difficulty of this method is also in which calculation method should be used to quickly
determine the relationship between the data in the group. An inappropriate method is
likely to increase the workload of the detection system and decrease the detection efficiency.
3. Attack Detection
3.1. Maximum Likelihood Estimation
When all measurements $Y$ are considered as a whole, the corresponding measurement
error samples $E$ can be seen as coming from two clusters: one with $MK$ correct
phasor measurement samples and the other with $(N - M)K$ tampered phasor
measurement samples. Without testing, it is impossible to determine which samples of
measurements have been tampered with by FDIAs. The probability distribution of the
measurement error e for each measurement y according to the assumed statistics can be
represented by a Gaussian mixture model (GMM):
$$p(e; \theta) = \sum_{l=1}^{2} \alpha_l \, p_e^{(l)}(e; \mu_l, \Sigma_l) \tag{8}$$
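The component density is not written out in this part of the text. A standard form, assumed here for illustration and consistent with the Gaussian error model above, is the bivariate normal density for a sample $e \in \mathbb{R}^{1 \times 2}$ treated as a row vector, together with the usual constraints on the mixing weights:

```latex
p_e^{(l)}(e; \mu_l, \Sigma_l)
  = \frac{1}{2\pi \sqrt{\det \Sigma_l}}
    \exp\!\left( -\tfrac{1}{2} (e - \mu_l)\, \Sigma_l^{-1} (e - \mu_l)^{T} \right),
\qquad
\alpha_1 + \alpha_2 = 1, \quad \alpha_l \ge 0 .
```

Under this assumption, $\alpha_1$ and $\alpha_2$ can be read as the fractions of normal and tampered samples, respectively.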
With unobserved data $\gamma_{i,k,l}$, the complete data are $(e_{i,k}, \gamma_{i,k,1}, \gamma_{i,k,2})$. More specifically,
if $e_{i,k}$ is the measurement error of the normal (secure) data, then $e_{i,k}$ belongs to the first mixture
component $p_e^{(1)}(e; \mu_1, \Sigma_1)$ of the Gaussian mixture model, and its complete data are $(e_{i,k}, 1, 0)$. If
$e_{i,k}$ is the measurement error of the false data, then $e_{i,k}$ belongs to the other component of
the Gaussian mixture model, and its complete data are $(e_{i,k}, 0, 1)$. The log-likelihood function for
complete data is
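$$
\begin{aligned}
L_C(\theta; E, \gamma) &= \ln[p(E, \gamma; \theta)] \\
&= \ln\left( \prod_{i=1}^{N} \prod_{k=1}^{K} p(e_{i,k}, \gamma_{i,k,1}, \gamma_{i,k,2}; \theta) \right) \\
&= \ln\left( \prod_{i=1}^{N} \prod_{k=1}^{K} \prod_{l=1}^{2} \left[ \alpha_l \, p_e^{(l)}(e_{i,k}; \mu_l, \Sigma_l) \right]^{\gamma_{i,k,l}} \right) \\
&= \sum_{i=1}^{N} \sum_{k=1}^{K} \sum_{l=1}^{2} \gamma_{i,k,l} \ln\left[ \alpha_l \, p_e^{(l)}(y_{i,k} - z_{i,k}; \mu_l, \Sigma_l) \right]
\end{aligned}
\tag{13}
$$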
$$p_c(e_{i,k}) = \frac{D(e_{i,k})^{2}}{\sum_{i=1}^{N} \sum_{k=1}^{K} D(e_{i,k})^{2}} \tag{14}$$
Finally, the second initial cluster center $c_2^{(0)}$ is selected according to the roulette wheel selection.
The second step is to assign the dataset. Assign each sample of the dataset to the
appropriate cluster center according to the principle of minimum Euclidean distance.
$$\gamma_{i,k,l}^{(n)} = \begin{cases} 1, & l = \arg\min_{l} \left\| e_{i,k} - c_l^{(n)} \right\| \\ 0, & \text{otherwise} \end{cases} \tag{15}$$
where (15) indicates that $e_{i,k}$ belongs to the $c_l^{(n)}$-centered clustering domain.
The third step is to update the clustering centers. At the $(n + 1)$th iteration, the cluster
centers of the dataset are recalculated based on the hidden variable $\gamma^{(n+1)}$. The center
of mass of the samples belonging to each category is then used as the new cluster center of
that category.
$$c_l^{(n+1)} = \frac{\sum_{\gamma_{i,k,l}^{(n)} = 1} e_{i,k}}{\sum_{i=1}^{N} \sum_{k=1}^{K} \gamma_{i,k,l}^{(n)}} \tag{16}$$
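The three steps above can be sketched in plain Python for the two-cluster case. This is an illustrative sketch rather than the authors' implementation: the function names are hypothetical, the samples are synthetic, and drawing the first center uniformly at random is an assumption (the usual k-means++ convention), since the first step is only partly visible in this text.

```python
import random

def sqdist(p, q):
    """Squared Euclidean distance between two 2-D points."""
    return (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2

def kmeanspp_two_clusters(samples, rng, max_iter=100, tol=1e-6):
    """Two-cluster k-means with k-means++ seeding; returns centers, labels, SSE history."""
    # Step 1: first center uniformly at random (assumption), second by
    # roulette-wheel selection with probability proportional to D(e)^2, Eq. (14).
    c1 = rng.choice(samples)
    weights = [sqdist(e, c1) for e in samples]
    c2 = rng.choices(samples, weights=weights, k=1)[0]
    centers = [c1, c2]
    labels, sse_history = [], []
    for _ in range(max_iter):
        # Step 2: assign each sample to the nearest center, Eq. (15).
        labels = [0 if sqdist(e, centers[0]) <= sqdist(e, centers[1]) else 1
                  for e in samples]
        # Step 3: move each center to the mean of its members, Eq. (16).
        new_centers = []
        for l in (0, 1):
            members = [e for e, g in zip(samples, labels) if g == l]
            if members:
                new_centers.append((sum(m[0] for m in members) / len(members),
                                    sum(m[1] for m in members) / len(members)))
            else:
                new_centers.append(centers[l])  # keep the center of an empty cluster
        sse_history.append(sum(sqdist(e, new_centers[g])
                               for e, g in zip(samples, labels)))
        shift = max(sqdist(a, b) for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift <= tol * tol:
            break
    return centers, labels, sse_history

# Synthetic error samples: 500 normal ones around (0, 0), 100 shifted ones around (0.03, 0.03).
rng = random.Random(1)
data = [(rng.gauss(0.0, 0.01), rng.gauss(0.0, 0.01)) for _ in range(500)]
data += [(rng.gauss(0.03, 0.01), rng.gauss(0.03, 0.01)) for _ in range(100)]
centers, labels, sse_history = kmeanspp_two_clusters(data, rng)
```

The returned hard labels and centers can then serve as the initial estimate for the EM iterations described next.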
3.3. EM Algorithm
The idea of the EM algorithm is to estimate unknown parameters by alternating between two
steps: an expectation (E) step and a maximization (M) step. In the first step (E-step), the
conditional expectation of the log-likelihood function for complete data is calculated based
on the conditional probability of the hidden variable. In the second step (M-step), the
conditional expectation obtained in the E-step is maximized with respect to the desired parameters.
Using the estimated parameter $\theta$ obtained with the k-means++ algorithm as the initial value, the
workflow of the EM algorithm for the $(\eta + 1)$th iteration is as follows.
Step 1 (E-step): The conditional expectation for defining the log-likelihood function of
complete data is as follows:
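$$
\begin{aligned}
Q\left(\theta, \theta^{(\eta)}\right) &= E\left\{ \ln[p(E, \gamma; \theta)]; E, \theta^{(\eta)} \right\} \\
&= \sum_{\gamma} \ln[p(E, \gamma; \theta)] \Pr\left(\gamma \mid E; \theta^{(\eta)}\right) \\
&= \sum_{\gamma} \ln[p(E, \gamma; \theta)] \, \hat{\gamma}_{i,k,l}^{(\eta)}
\end{aligned}
\tag{17}
$$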
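where $\hat{\gamma}_{i,k,l}^{(\eta)}$ is a shorthand form of the conditional probability $\Pr\left(\gamma_{i,k,l}^{(\eta)} = 1 \mid E; \theta^{(\eta)}\right)$. $\hat{\gamma}_{i,k,l}^{(\eta)}$
denotes the probability that observed data $e_{i,k}$ come from the $l$th Gaussian sub-model under
the current model parameters, called the responsibility of sub-model $l$ for observed data
$e_{i,k}$. $\hat{\gamma}_{i,k,l}^{(\eta)}$ can be calculated from Bayes' rule as in Equation (18).
$$\hat{\gamma}_{i,k,l}^{(\eta)} = \Pr\left(\gamma_{i,k,l}^{(\eta)} = 1 \mid E; \theta^{(\eta)}\right) = \frac{\alpha_l^{(\eta)} \, p_e^{(l)}\left(e_{i,k}; \mu_l^{(\eta)}, \Sigma_l^{(\eta)}\right)}{\sum_{l=1}^{2} \alpha_l^{(\eta)} \, p_e^{(l)}\left(e_{i,k}; \mu_l^{(\eta)}, \Sigma_l^{(\eta)}\right)} \tag{18}$$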
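The E-step of Equation (18) can be illustrated with a short Python sketch. It assumes bivariate normal component densities; the helper names `gauss2d` and `e_step` are hypothetical, the means and standard deviation follow the simulation parameters listed later in the paper, and the mixing weights 5/6 and 1/6 are assumed for illustration.

```python
import math

def gauss2d(e, mu, cov):
    """Bivariate normal density with covariance ((sxx, sxy), (sxy, syy))."""
    dx, dy = e[0] - mu[0], e[1] - mu[1]
    (sxx, sxy), (_, syy) = cov
    det = sxx * syy - sxy * sxy
    # Quadratic form (e - mu) Sigma^{-1} (e - mu)^T using the 2x2 inverse.
    quad = (syy * dx * dx - 2.0 * sxy * dx * dy + sxx * dy * dy) / det
    return math.exp(-0.5 * quad) / (2.0 * math.pi * math.sqrt(det))

def e_step(samples, alphas, mus, covs):
    """Responsibilities of Eq. (18): one (gamma_1, gamma_2) pair per sample."""
    resp = []
    for e in samples:
        w = [a * gauss2d(e, m, c) for a, m, c in zip(alphas, mus, covs)]
        total = sum(w)  # mixture density of Eq. (8) at this sample
        resp.append(tuple(wi / total for wi in w))
    return resp

alphas = (5 / 6, 1 / 6)                        # assumed mixing weights
mus = ((0.0, 0.0), (0.03, 0.03))               # means of normal and false data
covs = (((1e-4, 0.0), (0.0, 1e-4)),) * 2       # sigma = 0.01 in both coordinates
resp = e_step([(0.0, 0.0), (0.03, 0.03)], alphas, mus, covs)
```

Each pair in `resp` sums to one, and a sample lying at a component mean is assigned almost entirely to that component. For samples far from both means the densities can underflow, so a production implementation would normally work in log space.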
Step 2 (M-step): The maximum of function $Q\left(\theta, \theta^{(\eta)}\right)$ is obtained from Equation (18)
with $\theta$ as the vector parameter. The result of the $(\eta + 1)$th iteration is
$$\theta^{(\eta+1)} = \arg\max_{\theta} Q\left(\theta, \theta^{(\eta)}\right) \tag{19}$$
4. Algorithm Implementation
The probability density function (PDF) of random variables in measurement error E is
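In order to maximize the GMM with parameter $\Lambda^{(\eta)}(\theta)$, we can solve
$$\frac{\partial}{\partial \alpha_l} \left[ \Lambda^{(\eta)}(\theta) + \lambda \left( \sum_{l=1}^{2} \alpha_l - 1 \right) \right] = 0 \tag{22}$$
$$\frac{\partial}{\partial \mu_l} \left[ \Lambda^{(\eta)}(\theta) \right] = 0 \tag{23}$$
$$\frac{\partial}{\partial \Sigma_l} \left[ \Lambda^{(\eta)}(\theta) \right] = 0 \tag{24}$$
where $\lambda$ in (22) is a Lagrange multiplier. In (24), $\theta = \left[ \alpha_1^{(\eta)}, \alpha_2^{(\eta)}, \mu_1^{(\eta+1)}, \Sigma_1^{(\eta)}, \mu_2^{(\eta+1)}, \Sigma_2^{(\eta)} \right]^{T}$.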
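Meanwhile, the solutions of the equations are all in closed form, and the result is
$$\alpha_l^{(\eta+1)} = \frac{\sum_{i=1}^{N} \sum_{k=1}^{K} \hat{\gamma}_{i,k,l}^{(\eta)}}{NK} \tag{25}$$
$$\mu_l^{(\eta+1)} = \frac{\sum_{i=1}^{N} \sum_{k=1}^{K} e_{i,k} \, \hat{\gamma}_{i,k,l}^{(\eta)}}{\sum_{i=1}^{N} \sum_{k=1}^{K} \hat{\gamma}_{i,k,l}^{(\eta)}} \tag{26}$$
$$\Sigma_l^{(\eta+1)} = \frac{\sum_{i=1}^{N} \sum_{k=1}^{K} \left( e_{i,k} - \mu_l^{(\eta+1)} \right)^{T} \left( e_{i,k} - \mu_l^{(\eta+1)} \right) \hat{\gamma}_{i,k,l}^{(\eta)}}{\sum_{i=1}^{N} \sum_{k=1}^{K} \hat{\gamma}_{i,k,l}^{(\eta)}} \tag{27}$$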
The above calculations are repeated until the log-likelihood function value no longer
changes significantly. By rounding the final data $\hat{\gamma}_{i,k,l}^{(\eta+1)}$ of the hidden variable, we obtain
the complete data set $\{E, \gamma\}$ and the vector parameter $\theta$ of the GMM.
The joint use of the k-means++ algorithm and the EM algorithm for parameter estimation
of the GMM is summarized as pseudocode in Algorithm 1.
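As a rough illustration of the joint procedure (a sketch, not the authors' Algorithm 1 itself), the Python code below starts from hard cluster labels, applies the closed-form updates of Equations (25)–(27), and iterates until the log-likelihood stops changing. The function names, the synthetic data and the crude threshold that stands in for the k-means++ labels are all assumptions made for this example.

```python
import math
import random

def gauss2d(e, mu, cov):
    """Bivariate normal density with covariance ((sxx, sxy), (sxy, syy))."""
    dx, dy = e[0] - mu[0], e[1] - mu[1]
    (sxx, sxy), (_, syy) = cov
    det = sxx * syy - sxy * sxy
    quad = (syy * dx * dx - 2.0 * sxy * dx * dy + sxx * dy * dy) / det
    return math.exp(-0.5 * quad) / (2.0 * math.pi * math.sqrt(det))

def m_step(samples, resp):
    """Closed-form updates of Eqs. (25)-(27) for both components."""
    alphas, mus, covs = [], [], []
    for l in (0, 1):
        g = [r[l] for r in resp]
        gs = sum(g)  # assumed > 0, i.e. neither component is empty
        mx = sum(gi * e[0] for gi, e in zip(g, samples)) / gs
        my = sum(gi * e[1] for gi, e in zip(g, samples)) / gs
        sxx = sum(gi * (e[0] - mx) ** 2 for gi, e in zip(g, samples)) / gs
        sxy = sum(gi * (e[0] - mx) * (e[1] - my) for gi, e in zip(g, samples)) / gs
        syy = sum(gi * (e[1] - my) ** 2 for gi, e in zip(g, samples)) / gs
        alphas.append(gs / len(samples))
        mus.append((mx, my))
        covs.append(((sxx, sxy), (sxy, syy)))
    return alphas, mus, covs

def fit_gmm(samples, labels0, tol=1e-6, max_iter=100):
    """EM for a two-component GMM, initialised from hard labels (0 or 1)."""
    resp = [(1.0, 0.0) if g == 0 else (0.0, 1.0) for g in labels0]
    params = m_step(samples, resp)  # initial parameters from the clustering
    history = []
    for _ in range(max_iter):
        alphas, mus, covs = params
        resp, loglik = [], 0.0
        for e in samples:  # E-step, Eq. (18), plus the incomplete-data log-likelihood
            w = [a * gauss2d(e, m, c) for a, m, c in zip(alphas, mus, covs)]
            total = sum(w)
            resp.append((w[0] / total, w[1] / total))
            loglik += math.log(total)
        history.append(loglik)
        if len(history) > 1 and abs(history[-1] - history[-2]) < tol:
            break
        params = m_step(samples, resp)  # M-step
    labels = [0 if r[0] >= r[1] else 1 for r in resp]  # rounding the responsibilities
    return params, labels, history

# Synthetic example: label 0 = normal error, label 1 = error shifted by the attack.
rng = random.Random(0)
truth = [0] * 500 + [1] * 100
data = [(rng.gauss(0.03 * t, 0.01), rng.gauss(0.03 * t, 0.01)) for t in truth]
labels0 = [1 if e[0] + e[1] > 0.03 else 0 for e in data]  # crude stand-in for k-means++
(alphas, mus, covs), labels, history = fit_gmm(data, labels0)
accuracy = sum(1 for t, g in zip(truth, labels) if t == g) / len(truth)
```

Samples whose rounded label is 1 would be flagged as false data, and the indices of those samples indicate which measurement was tampered with. The list `history` holds the log-likelihood after each E-step, which should not decrease from one iteration to the next.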
5. Algorithm Analysis
5.1. Convergence Analysis
The essence of using the k-means++ algorithm to calculate new clustering centers is to
minimize the sum of squared error (SSE) function:
$$J\left(c_l^{(n+1)}\right) = \sum_{\gamma_{i,k,l}^{(n+1)} = 1} \left\| e_{i,k} - c_l^{(n+1)} \right\|^{2} \tag{28}$$
As can be seen from the algorithm, minimizing the SSE is a coordinate descent procedure.
Selecting the mean of the current cluster as the new clustering center ensures that the SSE
does not increase at any iteration:
$$J\left(c_l^{(n+1)}\right) \le J\left(c_l^{(n)}\right) \tag{29}$$
Since the SSE is monotonically non-increasing and has a lower bound, the iteration finally
converges to a solution $c_l$ at which the SSE reaches a (local) minimum.
For any Gaussian distribution parameter vector $\theta^{(\eta)}$ in the EM algorithm's parameter
space, the updated parameters $\alpha_1^{(\eta+1)}, \alpha_2^{(\eta+1)}, \mu_1^{(\eta+1)}, \Sigma_1^{(\eta+1)}, \mu_2^{(\eta+1)}, \Sigma_2^{(\eta+1)}$ are easily verified to
satisfy the following relationship [37,38]:
$$Q\left(\theta^{(\eta+1)}, \theta^{(\eta)}\right) \ge Q\left(\theta^{(\eta)}, \theta^{(\eta)}\right) \tag{30}$$
Based on the monotonicity of the log-likelihood function $Q\left(\theta, \theta^{(\eta)}\right)$ for complete data
and the boundedness of $p(E; \theta)$ in the EM algorithm, it can be proved that the proposed
EM algorithm converges to a stationary point $L_I^{*}$ of the log-likelihood function $L_I(\theta; E)$ for
incomplete data.
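requires $2\left((NK + 4)\varepsilon_{\mathrm{mul}} + (2NK + 1)\varepsilon_{\mathrm{sub}} + (NK + 1)\varepsilon_{\mathrm{div}} + (NK + 1)\varepsilon_{\mathrm{add}} + 2NK\varepsilon_{\mathrm{pow}} + NK\varepsilon_{\mathrm{exp}} + 1\varepsilon_{\mathrm{sqrt}}\right)$
FLOPs. Equation (18) requires $NK\left(1\varepsilon_{\mathrm{add}} + 1\varepsilon_{\mathrm{div}} + 1\varepsilon_{\mathrm{sub}}\right)$ FLOPs. With $\hat{\gamma}_{i,k,l}^{(\eta)}$,
we can calculate Equations (25)–(27), which require $(NK - 1)\varepsilon_{\mathrm{add}} + 1\varepsilon_{\mathrm{div}} + 1\varepsilon_{\mathrm{sub}}$ FLOPs,
$2\left(2(NK - 1)\varepsilon_{\mathrm{add}} + 2NK\varepsilon_{\mathrm{mul}} + 2\varepsilon_{\mathrm{div}}\right)$ FLOPs and
$2\left(2(NK - 1)\varepsilon_{\mathrm{add}} + 2NK\varepsilon_{\mathrm{sub}} + 2NK\varepsilon_{\mathrm{pow}} + 2NK\varepsilon_{\mathrm{mul}} + 2\varepsilon_{\mathrm{div}}\right)$ FLOPs, respectively.
We define $FL(\theta)$ as the FLOPs required to estimate $\theta$ during each EM algorithm iteration.
$$FL \approx N_{itr}^{k} \left[ FL(c) \right] + N_{itr}^{EM} \left[ FL(\theta) \right] \tag{34}$$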
6. Simulation Analysis
To verify the feasibility of the proposed algorithm, the simulation in this paper was
performed with the IEEE 5-bus standard test system and the IEEE 14-bus standard test system.
MATLAB R2018b was used for simulation, and the related data in the MATPOWER 7.1
power simulation package were used for routine power flow calculation. The
final operating data were used as the measurement data for the power system. The attack
vector was injected into the system first, and then the k-means++ algorithm and EM
algorithm were jointly used to verify the feasibility of this detection method.
Parameter              Value
$N$                    6
$K$                    100
$\mu_1$                [0 0]
$\mu_2$                [0.03 0.03]
$\sigma$               0.01
$\Delta$               $10^{-6}$
$N_{itr}^{\max}$       100
Figure 1. The actual distribution of phase measurement errors after injecting false data.
Figure 5 shows that, when the k-means++ algorithm is used for simulation, the sum of squared
errors of the model changes monotonically and gradually flattens out as the number of
iterations increases. Figure 6 shows that, with the EM algorithm, the log-likelihood function
values of the model likewise change monotonically and gradually flatten out as the number of
iterations increases. The simulation results show that both algorithms converge within a few
iterations.
[Plot: sum of squared error (y-axis, 0.115 to 0.145) versus number of iterations (x-axis, 1 to 8).]
Figure 5. The change in the sum of the squared errors under the k-means++ algorithm.
[Plot: value of log-likelihood (y-axis, 3560 to 3561.5) versus number of iterations (x-axis, 1 to 8).]
Figure 6. The change in the log-likelihood function value under the EM algorithm.
The simulation result in Figure 7 shows that the detected false data come from
branch $I_{1-2}$ between measurement buses 1 and 2. There was one misdetected
measurement datum each in branch $I_{1-5}$ and branch $I_{4-5}$.
[Bar chart: one bar per branch (x-axis: $I_{1-2}$, $I_{1-4}$, $I_{1-5}$, $I_{2-3}$, $I_{3-4}$, $I_{4-5}$); y-axis ticks from 0 to 120.]
Figure 7. Localization of false data.
For a changing number of measurement buses injected with false data, the average
error change of vector parameter θ=[α1 , α2 , µ1 , Σ1 , µ2 , Σ2 ]T in GMM obtained by the detec-
tion method in this paper is shown in Figures 8–10. It can be seen that as the false data
[Plot: error (y-axis) versus number of attacked buses (x-axis, 1 to 5).]
Figure 8. The error variation of the parameter α while the number of attacked buses varies.
[Plot: error (y-axis, scale ×10^−3) versus number of attacked buses (x-axis, 1 to 5).]
Figure 9. The error variation of the parameter µ while the number of attacked buses varies.
[Plot: error (y-axis, scale ×10^−4) versus number of attacked buses (x-axis, 1 to 5).]
Figure 10. The error variation of the parameter Σ while the number of attacked buses varies.
As the proportion of false data in the overall data increases, the probabilities of false
data detection, missed detection and false detection by this algorithm change, as shown in
Figure 11. It can be seen that the detection rate of the algorithm for false data is basically
above 95%, and the detection probability can be further improved to above 99% as the
amount of false data increases; thus, the probabilities of false detection and missed detection
are normally below 1%.
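The three probabilities discussed above can be computed from ground-truth and predicted labels. The sketch below is illustrative only: the exact definitions are assumptions (detection and missed detection are taken relative to the truly false data, false detection relative to the normal data), and the label vectors are made up.

```python
def detection_rates(truth, predicted):
    """Return (detection, missed detection, false detection) as fractions.

    truth and predicted are sequences of 0 (normal data) and 1 (false data).
    """
    tp = sum(1 for t, p in zip(truth, predicted) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(truth, predicted) if t == 1 and p == 0)
    fp = sum(1 for t, p in zip(truth, predicted) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(truth, predicted) if t == 0 and p == 0)
    detection = tp / (tp + fn) if tp + fn else 0.0
    missed = fn / (tp + fn) if tp + fn else 0.0
    false_detection = fp / (fp + tn) if fp + tn else 0.0
    return detection, missed, false_detection

# Made-up example: 100 false samples (97 found) and 500 normal samples (2 wrongly flagged).
truth = [1] * 100 + [0] * 500
predicted = [1] * 97 + [0] * 3 + [1] * 2 + [0] * 498
rates = detection_rates(truth, predicted)
```

With these definitions the detection and missed-detection rates always sum to one, so only the false-detection rate carries independent information about the normal data.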
[Plot: probability in % (y-axis, 0 to 120) versus number of attacked buses (x-axis, 0 to 6); legend: Detection, Miss detection, False detection.]
Figure 11. Probability of false data detection.
In order to further verify the rapidity of the algorithm proposed in this paper for detecting
false data injection attacks, we have conducted 1000 repeated experiments. The simulated time
statistic histogram and normal distribution curve obtained after 1000 repetitions of simulation
experiments are shown in Figure 12. From the normal distribution curve in the graph, it can
be seen that the algorithm can basically detect false data in 0.011883 s.
Figure 12. The simulation time statistics of 1000 repeated experiments and their normal distribution.
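A repeated-timing experiment of this kind can be sketched in Python as follows. The paper's measurements were made in MATLAB, so this is only an analogous illustration; `time_runs` is a hypothetical helper and the timed workload is a placeholder for one run of the detection algorithm.

```python
import statistics
import time

def time_runs(func, repeats=1000):
    """Run func repeatedly and return the mean and standard deviation of its run time in seconds."""
    durations = []
    for _ in range(repeats):
        start = time.perf_counter()
        func()
        durations.append(time.perf_counter() - start)
    return statistics.mean(durations), statistics.stdev(durations)

# Placeholder workload standing in for one detection run.
mean_s, std_s = time_runs(lambda: sum(i * i for i in range(1000)), repeats=200)
```

The mean and standard deviation are the two parameters needed to draw a fitted normal curve over the histogram of run times.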
To verify the feasibility of the proposed algorithm, it was further tested in the IEEE
14-bus standard test system. The measurement errors of active and reactive power of the bus
and transmission lines and the errors after being attacked by false data injection are shown in
Table 3. The validity of the method was verified by injecting false data into arbitrarily selected
measurement units. One thousand sets of quantitative measurement vectors with false data
were generated as experimental data according to the Monte Carlo method.
Table 3. The measurement error before and after the power system was attacked.
The attack vector injected in this paper against the IEEE 14-bus system was
$$a = \left[ \Delta P_3, \Delta Q_2, \Delta Q_3, \Delta P_{1-2}, \Delta P_{2-3}, \Delta P_{4-2}, \Delta Q_{1-2}, \Delta Q_{2-3}, \Delta Q_{4-2} \right]^{T} \tag{35}$$
Firstly, the measurement errors were used to detect FDIAs. The measurement errors
obtained by Monte Carlo method for 1000 instances of normal data were transformed into
samples that conformed to the standard normal distribution model, and the measurement
error data obtained are shown in Figure 13. All the data conform to the model of standard
normal distribution, and the measurement errors of the sample data are not shifted.
The results of the measurement error after injecting false data are shown in Figure 14.
It can be seen in the figure that the FDIAs with Equation (35) as the attack vector made the
degree of offset of the measurement error more significant. The results of clustering the
measurement errors after the false data injection attack by the k-means++ algorithm are
shown in Figure 15.
The data preprocessed using the k-means++ algorithm were further iteratively cal-
culated using the EM algorithm. The final PDF image of the GMM of the measurement
error was obtained as shown in Figure 16. The results of classifying the sample data of
1000 measurement vectors according to the fitted GMM are shown in Figure 17. From the
figure, it can be seen that there is no influence of bias in the normal measurement data, so
their error distribution is centered around zero. Classifying the sample data separated out
the data with error deviations. It is known that the power measurement
data of $P_3$, $Q_2$, $Q_3$, $P_{1-2}$, $P_{2-3}$, $Q_{2-3}$, $Q_{1-2}$ and $Q_{4-2}$ in the power system were tampered
with by the attacker through FDIAs. The detection of false data in the measurement data
using the algorithm of this paper is shown in Figure 18. A small number of data were
identified as normal data because the data in measurement units $P_3$, $Q_3$, $P_{1-2}$ and $P_{2-3}$ are
more similar to the normal data.
Secondly, we detected FDIAs from the perspective of the results of state estimation.
When not under attack, 100 sets were randomly selected from the 1000 sets of measurement
data for state estimation. The errors of their state estimation results were transformed into
samples that conformed to the model of standard normal distribution, and the obtained
estimation errors are shown in Figure 19. All data conform to the model with a standard
normal distribution, and none of the sample data are biased by the measurement errors.
The results of its measurement error after injecting false data are shown in Figure 20.
From the figure, it can be seen that the voltage amplitude and phase angle of the state
estimate of some buses are significantly shifted.
The data preprocessed by the k-means++ algorithm were further iteratively calculated
using the EM algorithm, and the final PDF image of the state estimation error conforming
to the GMM is shown in Figure 21. The results of classifying the sample data of 100 state
variables according to the fitted GMM are shown in Figure 22. From the figure, it can be
seen that classifying the sample data separated out the data with error deviations.
The errors of voltage magnitude and phase angle of bus 1 and buses 4–14
are around zero, and their deviations are very small, so they basically have no impact
on the power system. The results of the state estimation of bus 3 are mainly the offset of
voltage amplitude, which has a mild impact on the power system. The results of the state
estimation of bus 2 show large shifts in voltage magnitude and phase angle, indicating that
bus 2 was the main target of the FDIAs. The detection of false data in the measurement
vector using the algorithm proposed in this paper is shown in Figure 23.
7. Conclusions
Considering that false data injection attacks can disrupt the secure operation of smart
grids, we proposed a method to detect and locate false data injection attacks in power systems
using statistical learning. By combining the k-means++ algorithm with the EM algorithm, it is
possible to accurately model the smart grid bus measurement data within 0.011883 s. At the
same time, the GMM containing the characteristic parameters of data measurement errors can
be obtained. Numerical examples showed that the mathematical model obtained by this joint
algorithm provides a detection probability of more than 95% for false data, and can accurately
locate the measured buses that are tampered with by FDIAs.
Subsequent research can select the best GMM among candidate models by
combining the Akaike Information Criterion (AIC), Bayesian Information Criterion (BIC),
Silhouette Coefficient (SC), Calinski–Harabasz (CH) score and other methods, so as to build
a more complete model and improve the algorithm in this paper.
Author Contributions: Conceptualization, P.H. and M.W.; methodology, Y.L. and W.G.; software, P.H.
and F.H.; validation, W.G.; formal analysis, L.Q.; resources, W.G.; data curation, L.Q.; writing—original
draft preparation, P.H.; writing—review and editing, W.G. and F.H.; visualization, P.H. and M.W.;
supervision, W.G.; project administration, Y.L.; funding acquisition, W.G. All authors have read and
agreed to the published version of the manuscript.
Funding: This research was funded by the National Natural Science Foundation of China (U21A20146),
the Natural Science Foundation of Anhui Province (1908085MF215) and the Key Research and Development
Project of Anhui Province (201904a05020007).
Institutional Review Board Statement: Not applicable.
Informed Consent Statement: Not applicable.
Data Availability Statement: Not applicable.
Acknowledgments: We thank the anonymous reviewers for their valuable comments.
Conflicts of Interest: The authors declare no conflict of interest.
References
1. Abur, A.; Exposito, A.G. Power System State Estimation: Theory and Implementation; CRC Press: Boca Raton, FL, USA, 2004.
2. Monticelli, A.; Wu,F.F.; Yen,M. Mutiple bad data identwication for state estimation by combinatorial oftimization. IEEE Trans.
Power Deliv. 1986, 1, 361–369. [CrossRef]
3. Granelli, G.P.; Montagna, M. Identification of interacting bad data in the framework of the weighted least square method. Electr.
Power Syst. Res. 2008, 78, 806–814. [CrossRef]
4. Harvey, M.; Long, D.; Reinhard, K. Visualizing nistir 7628, guidelines for smart grid cyber security. In Proceedings of the 2014
Power and Energy Conference at Illinois (PECI), Champaign, IL, USA, 28 February–1 March 2014; pp. 1–8. [CrossRef]
5. Zanero, S. When cyber got real: Challenges in securing cyber-physical systems. In Proceedings of the 2018 IEEE Sensors, New
Delhi, India, 28–31 October 2018; pp. 1–4. [CrossRef]
6. Ten, C.W.; Liu, C.C.; Manimaran, G. Vulnerability assessment of cybersecurity for SCADA systems. IEEE Trans. Power Syst. 2008,
23, 1836–1846. https://ieeexplore.ieee.org/document/4652578. [CrossRef]
7. Khurana, H.; Hadley, M.; Lu, N.; Frincke, D.A. Smart-grid security issues. IEEE Secur. Priv. 2010, 8, 81–85.
MSP.2010.49. [CrossRef]
8. Mo, Y.; Kim, H. J.; Brancik, K.; Dickinson, D.; Lee, H.; Perrig, A.; Sinopoli, B. Cyber–physical security of a smart grid infrastructure.
Proc. IEEE 2012, 100, 195–209. [CrossRef]
9. Teixeira, A.; Amin, S.; Sandberg, H.; Johansson, K.H.; Sastry, S.S. Cyber security analysis of state estimators in electric power
systems. In Proceedings of the 49th IEEE Conference on Decision and Control (CDC), Atlanta, GA, USA, 15–17 December 2010;
pp. 5991–5998. [CrossRef]
10. Metke, A.R.; Ekl, R.L. Smart grid security technology. In Proceedings of the 2010 Innovative Smart Grid Technologies (ISGT),
Gaithersburg, MD, USA, 19–21 January 2010; pp. 1–7. [CrossRef]
11. Liu, Y.; Reiter, M.K.; Ning, P. False data injection attacks against state estimation in electric power grids. In Proceedings of the
2009 ACM Conference on Computer and Communications Security (CCS), Chicago, IL, USA, 9–13 November 2009; pp. 1–33.
[CrossRef]
12. Xie, B.; Peng, C.; Zhang, H.; Yang, M. Power system state estimation based on network attack node credibility. Chin. J. Sci. Instrum.
2018, 39, 157–166. [CrossRef]
13. Ahmadi, N.; Chakhchoukh, Y.; Ishii, H. Power systems decomposition for robustifying state estimation under cyber attacks. IEEE
Trans. Power Syst. 2021, 36, 1922–1933. [CrossRef]
14. Jia, L.; Thomas, R.J.; Tong, L. Impacts of malicious data on real-time price of electricity market operations. In Proceedings of the
Hawaii International Conference on System Sciences, Maui, HI, USA, 4–7 January 2012; pp. 1907–1914. [CrossRef]
15. Xie, L.; Mo, Y.; Sinopoli, B. Integrity data attacks in power market operations. IEEE Trans. Smart Grid 2011, 2, 659–666. [CrossRef]
16. Choi, D.H.; Xie, L. Malicious ramp-induced temporal data attack in power market with look-ahead dispatch. In Proceedings of
the 2012 IEEE Third International Conference on Smart Grid Communications (SmartGridComm), Tainan, Taiwan, 5–8 November
2012; pp. 330–335. [CrossRef]
17. Yuan, Y.; Li, Z.; Ren, K. Modeling load redistribution attacks in power systems. IEEE Trans. Smart Grid 2011, 2, 382–390. [CrossRef]
18. Du, M.; Pierrou, G.; Wang, X.; Kassouf, M. Targeted false data injection attacks against AC state estimation without network
parameters. IEEE Trans. Smart Grid 2021, 12, 5349–5361. [CrossRef]
19. Liu, C.; Liang, H.; Chen, T. Network parameter coordinated false data injection attacks against power system AC state estimation.
IEEE Trans. Smart Grid 2021, 12, 1626–1639. [CrossRef]
20. Liu, C.; He, W.; Deng, R.; Tian, Y.C.; Du, W. False data injection enabled network parameter modifications in power systems:
Attack and detection. IEEE Trans. Ind. Inform. 2022, 19, 177–188. [CrossRef]
Sensors 2023, 23, 1683 21 of 21
21. Molzahn, D.K.; Wang, J. Detection and characterization of intrusions to network parameter data in electric power systems. IEEE
Trans. Smart Grid 2019, 10, 3919–3928. [CrossRef]
22. Chaojun, G.; Jirutitijaroen, P.; Motani, M. Detecting false data injection attacks in AC state estimation. IEEE Trans. Smart Grid 2015,
6, 2476–2483. [CrossRef]
23. Singh, S.K.; Khanna, K.; Bose, R.; Panigrahi, B.K.; Joshi, A. Joint-transformation-based detection of false data injection attacks in
smart grid. IEEE Trans. Ind. Inform. 2018, 14, 89–97. [CrossRef]
24. Li, B.; Ding, T.; Huang, C.; Zhao, J.; Yang, Y.; Chen, Y. Detecting false data injection attacks against power system state estimation
with fast go-decomposition approach. IEEE Trans. Ind. Inform. 2019, 15, 2892–2904. [CrossRef]
25. Cheng, G.; Lin, Y.; Zhao, J.; Yan, J. A highly discriminative detector against false data injection attacks in AC state estimation.
IEEE Trans. Smart Grid 2022, 13, 2318–2330. [CrossRef]
26. Chen, Y.; Hayawi, K.; Zhao, Q.; Mou, J.; Yang, L.; Tang, J.; Li, Q.; Wen, H. Vector auto-regression-based false data injection attack
detection method in edge computing environment. Sensors 2022, 22, 6789. [CrossRef]
27. Almasabi, S.; Alsuwian, T.; Javed, E.; Irfan, M.; Jalalah, M.; Aljafari, B.; Harraz, F.A. A novel technique to detect false data injection
attacks on phasor measurement units. Sensors 2021, 21, 5791. [CrossRef]
28. Yu, J.Q.; Hou, Y.; Li, V. Online False Data Injection Attack Detection with Wavelet Transform and Deep Neural Networks. IEEE
Trans. Ind. Inform. 2018, 14, 3271–3280.. [CrossRef]
29. Xue, D.; Jing, X.; Liu, H. Detection of False Data Injection Attacks in Smart Grid Utilizing ELM-Based OCON Framework. IEEE
Access 2019, 7, 31762–31773.. [CrossRef]
30. Almasabi, S.; Alsuwian, T.; Awais, M.; Irfan, M.; Jalalah, M.; Aljafari, B.; Harraz, F.A. False Data Injection Detection for Phasor
Measurement Units. Sensors 2022, 22, 3146. [CrossRef] [PubMed]
31. An, P.; Wang Z.; Zhang, C. Ensemble unsupervised autoencoders and Gaussian mixture model for cyberattack detection. Inf.
Process. Manag. Libr. Inf. Retr. Syst. Commun. Netw. Int. J. 2022, 59, 102844.. [CrossRef]
32. Sheng, T.; Wu, W.; Sun, H.; Wang, Z.; Sun, Q.; Ma, J. A fully distributed topology identification approach for active distribution
network based on multi-agent framework. In Proceedings of the 2018 IEEE Innovative Smart Grid Technologies-Asia (ISGT Asia),
Singapore, 22–25 May 2018; pp. 435–440. [CrossRef]
33. Chen, J.C.; Chung, H.M.; Wen, C.K.; Li, W.T.; Teng, J.H. State estimation in smart distribution system with low-precision
measurements. IEEE Access 2017, 5, 22713–22723. [CrossRef]
34. Jiang, J.; Qian, Y. Defense mechanisms against data injection attacks in smart grid networks. IEEE Commun. Mag. 2017, 55, 76–82.
[CrossRef]
35. Sheng, J.; Liu, D. An improved maximum likelihood approach to image reconstruction using ordered subsets and data subdivi-
sions. IEEE Trans. Nucl. Sci. 2004, 51, 130–135.. [CrossRef]
36. Duan, X.; Sun, G.; Tao, Y. Moving target detection based on genetic k-means algorithm. In Proceedings of the 2011 IEEE 13th
International Conference on Communication Technology, Jinan, China, 25–28 September 2011; pp. 819–822. [CrossRef]
37. Watanabe, M.; Yamaguchi, K. The EM Algorithm and Related Statistical Models; CRC Press: Boca Raton, FL, USA, 2003. [CrossRef]
38. Boyd, S.; Vandenberghe, L. Convex Optimization; Cambridge University Press: Cambridge, UK, 2004.