BEI
November
Iran University of Science and Technology, Tehran, Iran
Bad data detection and identification in power system state estimation with network parameters uncertainty
Gabriele D'Antona and Luca Perfetto
Department of Energy
Politecnico di Milano
Milan, Italy
{gabriele.dantona, luca.perfetto}@polimi.it
Abstract—The State Estimation (SE) problem in electric power systems consists of three main functions: estimation, bad data detection and identification. D'Antona formalized the estimation procedure considering the contribution of both the measurements and the network parameters to uncertainty, in the so-called extended SE. This paper presents an investigation of the effectiveness of bad data detection and identification in the extended SE. Some results on a simple three-bus network are given as a test case of the proposed approach.

Keywords—State estimation, network parameters uncertainty, bad data detection and identification.

I. INTRODUCTION

Nowadays state estimation represents one of the most important applications for any Energy Management System (EMS), whose performance depends on the accuracy of the available data. The traditional SE solution is computed assuming a perfectly known network topology (breaker and tap changer positions, etc.) and perfectly known network parameters (transmission line resistances, inductances and shunt capacitances). Other important functions of an SE are the detection and identification analyses: exploiting some statistical properties of the measurement residuals ([1], [2]), the first evaluates whether bad data are present in the measurements; the second directly identifies which data are bad. In the literature, the traditional SE solution is obtained using Weighted Least Squares (WLS). In addition, assuming the measurement residuals to be Gaussian distributed, detection and identification are performed through tests on the loss function and on the measurement residual values respectively (Hypothesis Testing Identification, HTI).

All the above functions are performed assuming measurement accuracy as the only source of uncertainty, but actually this is not always true. In fact, due to different causes, such as loading and environmental conditions or inaccurate manufacturing data, it is important to consider also network parameter accuracy as a further, and larger, source of uncertainty. Therefore, due to the inconsistency of the adopted measurement model and to the strong interactions among all errors, SE results can be corrupted, and a redefinition is needed in order to provide a new mathematical approach to SE.

The most direct way to find the solution consists of projecting the network parameter errors onto the measurement errors, which increases the standard uncertainty of the data. This means evaluating the "new" measurement variance-covariance matrix and solving the WLS problem. In addition, because of the network parameter uncertainty, detection and identification as briefly explained above for the traditional SE are not generally effective, due to the strong interactions among measurement and parameter errors. For identification in particular, different methods have been proposed: identifying a group of doubtful measurements through an elimination process ([3], [4]); or estimating the network parameters, based on measurement residual sensitivity analysis or on an augmented state vector using normal equations or Kalman filter theory [5].

In this paper the idea is quite different: based on [6], a new estimator proposed in [7] is used which, through a re-formalisation of the WLS solution, makes it possible to correctly estimate the state vector and, in a certain sense, to split measurement and parameter errors, also providing an estimate of them from the measurement residuals. Therefore, after a review of the useful mathematics on measurement model definition and regression method in section II, an identification approach using Principal Component Analysis (PCA) is proposed in section III, useful to reduce problem
Authorized licensed use limited to: NATIONAL INSTITUTE OF TECHNOLOGY WARANGAL. Downloaded on October 18,2023 at 17:48:36 UTC from IEEE Xplore. Restrictions apply.
dimensions and to cluster available data (measurements and parameters). Some numerical tests are performed and results are shown in section V. Conclusions are given in section VI.

II. MATHEMATICAL REVIEW

A. WLS

Assuming a perfectly known network topology, the most common true measurement model is:

$z_T = h(x_T, \pi)$    (1)

where

• $z_T$ is the m × 1 true measurement vector;
• $h$ is the m × 1 nonlinear function vector of $x_T$ determined by nodal or branch parameters and Kirchhoff's laws;
• $x_T$ is the n × 1 true state vector;
• $\pi$ is the p × 1 nominal parameter vector.

Therefore, it is possible to formalize the SE problem:

$J(r) = r^T R_r^{-1} r$    (6)

This means that the state vector x is estimated by solving the minimisation problem that imposes the minimum norm on the term r.

B. Extension of WLS

In [7] a new mathematical approach for the extended SE is proposed. It is based on the formalization of the SE procedure as the following constrained optimization problem:

$J(\Delta d) = \Delta d^T R_d^{-1} \Delta d$
$\text{s.t.: } A \Delta d = z - h(x_T, \pi)$    (7)

with A defined as the m × (m + p) matrix:

$A = \left[\, I \quad -\dfrac{\partial h}{\partial \pi^T} \,\right]$    (8)
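As an illustration of the constrained problem (7) with the matrix A of (8), the following minimal sketch computes the minimum-norm data errors for a fixed right-hand side. It assumes a diagonal $R_d$ and uses the standard Lagrange-multiplier solution $\Delta d = R_d A^T (A R_d A^T)^{-1} r$, which is a textbook result and not necessarily the procedure of [7]. The function names, the tiny Jacobian `dh_dpi` and all numerical values are hypothetical.

```python
def solve_linear(M, b):
    """Solve M y = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [b[i]] for i, row in enumerate(M)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            f = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= f * aug[col][c]
    y = [0.0] * n
    for i in reversed(range(n)):
        s = sum(aug[i][j] * y[j] for j in range(i + 1, n))
        y[i] = (aug[i][n] - s) / aug[i][i]
    return y


def build_A(dh_dpi):
    """A = [I, -dh/dpi], an m x (m + p) matrix as in (8)."""
    m, p = len(dh_dpi), len(dh_dpi[0])
    return [[1.0 if i == j else 0.0 for j in range(m)]
            + [-dh_dpi[i][j] for j in range(p)] for i in range(m)]


def min_norm_data_errors(dh_dpi, var_d, rhs):
    """Minimise dd^T R_d^-1 dd subject to A dd = rhs.

    var_d is the diagonal of R_d (assumed diagonal here):
    m measurement variances followed by p parameter variances.
    """
    A = build_A(dh_dpi)
    m, total = len(A), len(var_d)
    # G = A R_d A^T (m x m)
    G = [[sum(A[i][k] * var_d[k] * A[j][k] for k in range(total))
          for j in range(m)] for i in range(m)]
    lam = solve_linear(G, rhs)
    # dd = R_d A^T lam
    return [var_d[k] * sum(A[i][k] * lam[i] for i in range(m))
            for k in range(total)]


# Hypothetical example: m = 2 measurements, p = 1 parameter.
dh_dpi = [[1.0], [2.0]]
var_d = [0.01 ** 2, 0.01 ** 2, 0.05 ** 2]
rhs = [0.1, 0.2]  # stands for z - h(x, pi) at a given state
dd = min_norm_data_errors(dh_dpi, var_d, rhs)
print(dd)
```

Because the parameter variance is the largest entry of `var_d` in this example, most of the mismatch is attributed to the parameter error (the last entry of `dd`) rather than to the two measurement errors.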
III. BAD DATA DETECTION AND IDENTIFICATION

A. Detection

Due to the problem definition, assuming the measurement residuals to be approximately Gaussian distributed and in the absence of bad data, the loss function is χ² distributed with k = m − n degrees of freedom, where m and n are the numbers of measurements and states respectively. The presence of bad data can be suspected when its value exceeds a fixed threshold C related to a confidence level [1].

Hence, considering these two hypotheses:

• H0 (null hypothesis): there are no bad data;
• H1 (alternative hypothesis): there are bad data.

A way to perform the test can be:

• if $J(\tilde r) > C$, reject H0;
• if $J(\tilde r) \le C$, accept H0.

B. Identification

The nature of the problem makes the direct visualisation of bad data more difficult. It is therefore necessary to exploit an algebraic and statistical tool, Principal Component Analysis (PCA). It is a quantitatively rigorous method to approach the bad data analysis in a multivariate structure: based on a Singular Value Decomposition (SVD), it generates a new set of variables, called Principal Components (PCs), that are linear combinations of the original variables and are uncorrelated with each other, each having a variance that corresponds to the size of an eigenvalue of the analysed matrix ([9], [11]).

Following the mathematical definitions of the above section, let us apply PCA to the data residuals variance-covariance matrix and decompose this matrix as:

$R_{\Delta\tilde d} = U S V^T$    (15)

where

• U, V: (m + p) × (m + p) matrices, containing in their columns, respectively, the left and right eigenvectors of the analysed matrix, which represent the PC loading scores;
• S: (m + p) × (m + p) matrix, containing only diagonal elements, which represent the eigenvalues of the analysed matrix.

Let us define the k most important eigenvalues and then the (m + p) × k matrix D as a sub-matrix of U. Therefore, the k × 1 cluster variable vector is:

$c = D^T \Delta\tilde d$    (16)

which has the k × k cluster variance-covariance matrix $R_c$, which actually corresponds to a k × k sub-matrix of S.

Each normalized cluster absolute value is:

$c_{N,i} = \dfrac{|c_i|}{\sqrt{R_{c,ii}}}$    (17)

where $R_{c,ii}$ is the generic diagonal element of the cluster variance-covariance matrix.

After all these definitions, the analysis consists of identifying the largest normalized cluster absolute value, i.e. the one that no longer conforms to its standard deviation. In this way, it is possible to identify a doubtful group of variables (data residuals that refer to measurements or parameters) that can contain bad data.

IV. DATA RESIDUALS OBSERVABILITY

Traditionally PCA provides the most meaningful basis to filter out the noise and reveal the main patterns in a data set, in order to re-express it. In the case of SE, the application of PCA yields the PCs that contain all the explained variability of the variables: in fact, due to the low rank of the analysed matrix, only the first k diagonal elements of the matrix S are non-zero, and they contain all the needed information. This implies that the maximum number of definable clusters is equal to the measurement redundancy, which becomes the key factor for the observability of data residuals, intended as detectability and identifiability. In other words, it is very difficult to correctly perform the bad data analysis in the presence of (m + p) variables with only k degrees of freedom.

In addition, for each variable the loading score is important: it represents the correlation coefficient between the variable and each PC; in practice, the square of this score is the percentage of the variable's variance explained by each PC. Therefore, scores equal (or approximately equal) to zero are not able to show the latent nature of multivariate data. In particular, when all the extracted scores related to a single variable are zero (an entire row of matrix D), bad data in that variable are neither detectable nor identifiable.

V. CASE STUDIES

In order to analyse the theoretical background explained in the previous sections, the following simple three-bus power network and its related parameter values are considered (shunt parameters are not considered for the sake of simplicity):
• series network parameter uncertainties:
$\sigma_{\Delta z} = 10^{-2} \, [1, 1, 1, 1, 1, 1, 1, 1, 1]^T$

$\Delta\pi = [\Delta g_{12}, \Delta g_{13}, \Delta g_{23}, \Delta b_{12}, \Delta b_{13}, \Delta b_{23}]^T$
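The detection threshold C for this test case can be sketched as follows. The sketch assumes m = 9 measurements (the nine entries of $\sigma_{\Delta z}$), n = 5 state variables (three voltage magnitudes and two phase angles, which is an assumption not stated in the surviving text) and a 95% confidence level (also an assumption). With k = m − n = 4 degrees of freedom the χ² distribution function has a closed form, so only the standard library is needed; the function names are hypothetical.

```python
import math


def chi2_cdf_even(x, k):
    """CDF of a chi-square variable with an even number k of degrees of freedom."""
    if k <= 0 or k % 2:
        raise ValueError("this closed form needs an even k > 0")
    half = x / 2.0
    series = sum(half ** j / math.factorial(j) for j in range(k // 2))
    return 1.0 - math.exp(-half) * series


def chi2_threshold(confidence, k, hi=1e3, tol=1e-10):
    """Invert the CDF by bisection to get the threshold C."""
    lo = 0.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if chi2_cdf_even(mid, k) < confidence:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def bad_data_suspected(loss, threshold):
    """Reject H0 (no bad data) when the loss function exceeds C."""
    return loss > threshold


m, n = 9, 5  # n = 5 is an assumption for a three-bus network
C = chi2_threshold(0.95, m - n)
print(round(C, 2))  # 9.49
```

Under these assumptions the computed value agrees with the threshold 9.49 quoted in the case studies.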
Note that the semi-logarithmic scale is used to better visualise the results. Values in the horizontal part are not equal to zero because of the numerical simulations in Matlab.

Extracted PCs:

TABLE I
EXTRACTED PCS

Δd        PC1       PC2       PC3       PC4
ΔP1     -0.0000    0.0002   -0.0000    0.0000
ΔP2     -0.0000   -0.0012   -0.0222    0.5810
ΔP12     0.0001   -0.0014    0.0425    0.5680
ΔP23     0.0001   -0.0007    0.0216   -0.5809
ΔQ1      0.0000    0.0027   -0.0001    0.0000
ΔQ2      0.0004   -0.0013   -0.2224   -0.0021
ΔQ12     0.0003   -0.0203   -0.2204   -0.0021
ΔQ23    -0.0005   -0.0063    0.2182    0.0030
ΔV3      0.0002   -0.0013   -0.0774    0.0155
Δg12    -0.0004    0.0010   -0.1346   -0.0270
Δg13     0.2230    0.9078   -0.0333    0.0136
Δg23     0.0013   -0.0801   -0.0377    0.0070
Δb12     0.0029   -0.0385    0.9082    0.0039
Δb13    -0.9748    0.2056   -0.0060    0.0033
Δb23    -0.0051    0.3538    0.1709   -0.0316

Looking at Table I, note that all values are in pu and that the most important loading scores are in bold type (assuming 0.2 as a fixed threshold). In addition, it is possible to identify the following set of clusters:

• cluster 1 (PC1): Δg13, Δb13;
• cluster 2 (PC2): Δg13, Δb13, Δb23;
• cluster 3 (PC3): ΔQ2, ΔQ12, ΔQ23, Δb12;
• cluster 4 (PC4): ΔP2, ΔP12, ΔP23.

This means that each variable belongs to a specified cluster. The presence of a gross error in one of the variables, represented by its data residual, implies the identification of its cluster as corrupted. In particular, the data residuals not indicated in the above clusters (ΔP1, ΔQ1, ΔV3, Δg12 and Δg23) are related to variables that are neither detectable nor identifiable, due to their loading scores indicated in Table I.

A. Bad data in measurement P2

Input: in this case the measurement P2 is affected by a gross error equal to 20·σ_P2.

Estimated data residuals: the following vectors contain the estimates of the measurement and network parameter errors. All the terms in bold type represent values that are larger than their theoretical uncertainties:

$\Delta\tilde z = [0.0000, -0.0715, -0.0689, 0.0712, 0.0005, -0.0022, -0.0055, 0.0007, -0.0034]^T$

$\Delta\tilde\pi = [0.0013, 0.4616, -0.0161, 0.0100, -1.1584, 0.0703]^T$

Identification: it is possible to see that there is something wrong in the three marked measurement residuals, which are related to P2, P12 and P23. In fact, the loss function is $J(\tilde r) = 154.86$ (> 9.49) and this suggests the presence of bad data. Then, evaluating the variance-covariance matrix $R_{\Delta\tilde d}$, it is possible to extract the four PCs, in order to identify which suspected cluster can contain the bad data. Therefore, computing the normalized cluster absolute value vector:

$c_N = [1.1008, 1.5652, 0.6509, 12.2429]^T$

it is possible to identify cluster 4 as corrupted. The identification is consistent with the input bad data in P2.

B. Bad data in parameter g13

Input: in this case the parameter g13 is affected by a gross error equal to 20·σ_g13.

Estimated data residuals: the following vectors contain the estimates of the measurement and network parameter errors. All the terms in bold type represent values that are larger than their theoretical uncertainties:

$\Delta\tilde z = [0.0083, 0.0197, -0.0086, -0.0195, -0.0192, -0.0194, 0.0753, 0.0191, 0.0014]^T$

$\Delta\tilde\pi = [0.0057, -5.4156, -0.0175, 0.0784, 0.6745, 0.0999]^T$
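The marking rule stated above (a residual is highlighted when its magnitude exceeds its theoretical uncertainty) can be sketched as a simple element-wise comparison. This is only an illustration: it assumes the measurement standard uncertainty of 10^-2 given for the test case, it uses the nine measurement residuals of case B in the order in which they are listed, and the function name is hypothetical. No assumption is made here about which measurement each position refers to.

```python
def flag_large_residuals(residuals, sigmas):
    """Return the positions whose |residual| exceeds its standard uncertainty."""
    if len(residuals) != len(sigmas):
        raise ValueError("residuals and sigmas must have the same length")
    return [i for i, (r, s) in enumerate(zip(residuals, sigmas)) if abs(r) > s]


# Estimated measurement residuals of case B, in listed order (0-based positions).
dz = [0.0083, 0.0197, -0.0086, -0.0195, -0.0192, -0.0194, 0.0753, 0.0191, 0.0014]
sigma_z = [1e-2] * len(dz)  # sigma of 10^-2 for every measurement

flagged = flag_large_residuals(dz, sigma_z)
print(flagged)
```

The same comparison could be applied to the parameter residuals once their theoretical uncertainties are available.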
Note that in this case some measurement residuals are also marked, due to the influence of the network parameter gross error.

Detection: the loss function is $J(\tilde r) = 13104.3$ (> 9.49); this suggests the presence of bad data.

Identification: it is possible to see that there is something wrong in four marked measurement residuals, which are related to Q1, Q2, Q12 and Q23, and in two marked network parameter residuals, which are g13 and b13. In fact, the loss function is $J(\tilde r) = 3.44 \cdot 10^{3}$ (> 9.49) and this suggests the presence of bad data. Then, evaluating the variance-covariance matrix $R_{\Delta\tilde d}$, it is possible to extract the four PCs, in order to identify which suspected cluster can contain the bad data. Therefore, computing the normalized cluster absolute value vector:

$c_N = [2.7646, 36.9796, 28.6977, 5.1959]^T$

it is possible to identify clusters 2 and 3 as corrupted, the first more strongly than the second. The identification is consistent with the input bad data in g13.

VI. CONCLUSIONS

This paper shows the results of a new approach to detect and identify bad data in measurements and also in network parameters, based on an extended version of the WLS method for SE. Because of the complexity of the problem, which is formulated as an underdetermined one and involves a multivariate analysis, the direct observability of all data residuals is practically impossible (excluding some particular cases). Nevertheless, the proposed method makes it possible to determine a group of variables (called a cluster) that is suspected to contain the bad data.

REFERENCES

[1] Monticelli, A. "Electric power system state estimation." Proceedings of the IEEE 88.2 (2000): 262-282.
[2] Handschin, E., et al. "Bad data analysis for power system state estimation." IEEE Transactions on Power Apparatus and Systems 94.2 (1975): 329-337.
[3] Van Cutsem, Th., and M. Ribbens-Pavella. "Bad data identification methods in power system state estimation - a comparative study." IEEE Transactions on Power Apparatus and Systems 104.11 (1985).
[4] Nian-De, Xiang, Wang Shi-Ying, and Yu Er-Keng. "A new approach for detection and identification of multiple bad data in power system state estimation." IEEE Transactions on Power Apparatus and Systems 2 (1982): 454-462.
[5] Zarco, Pedro, and Antonio Gomez Exposito. "Power system parameter estimation: a survey." IEEE Transactions on Power Systems 15.1 (2000): 216-222.
[6] D'Antona, Gabriele. "The full least-squares method." IEEE Transactions on Instrumentation and Measurement 52.1 (2003): 189-196.
[7] D'Antona, Gabriele. "Power System State Estimation with Uncertain measurements and network parameters." Unpublished.
[8] D'Antona, Gabriele. "Uncertainty of power system state estimates due to measurements and network parameter uncertainty." 2010 IEEE International Workshop on Applied Measurements for Power Systems. 2010.
[9] Fabbris, Luigi. Statistica multivariata: analisi esplorativa dei dati. McGraw-Hill Libri Italia, 1997.
[10] Ertel, Suitbert. Factor analysis: Healing an ailing model. Universitätsverlag Göttingen, 2013.
[11] Jolliffe, Ian. Principal component analysis. John Wiley & Sons, Ltd, 2002.
[12] Johnson, Richard Arnold, and Dean W. Wichern. Applied multivariate statistical analysis. Vol. 4. Englewood Cliffs, NJ: Prentice Hall, 1992.