Recurrent Graph Convolutional Network-Based Multi-Task Transient Stability Assessment Framework in Power System
June 1, 2020.
Digital Object Identifier 10.1109/ACCESS.2020.2991263
ABSTRACT Reliable online transient stability assessment (TSA) is fundamentally required for power
system operation security. Compared with time-consuming classical digital simulation methods, data-driven deep
learning (DL) methods provide a promising technique to build a TSA model. However, general DL models
adapt poorly to variations of the power system topology. In this paper, we propose a new graph-based
framework, termed recurrent graph convolutional network based multi-task TSA (RGCN-MT-TSA).
The graph convolutional network (GCN) and the long short-term memory (LSTM) unit are
aggregated to form the recurrent graph convolutional network (RGCN), where the GCN explicitly integrates
the bus (node) states with the topological characteristics while the LSTM subsequently captures the temporal
features. We also propose a multi-task learning (MTL) scheme, which provides joint training of stability
classification (Task-1) and critical generator identification (Task-2) in the framework, and accelerates
the process with parallel computing. Test results on the IEEE 39 Bus system and the IEEE 300 Bus system indicate
the superiority of the proposed scheme over existing models, as well as its robustness under various scenarios.
INDEX TERMS Deep graph-based learning, transient stability assessment (TSA), graph convolutional
network (GCN), recurrent graph convolutional network (RGCN), multi-task learning (MTL).
This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/
VOLUME 8, 2020 93283
J. Huang et al.: Recurrent Graph Convolutional Network-Based Multi-Task TSA Framework in Power System
inputs to the stability labels [15], [16]. Gupta et al. [17] consider a new description of measured generator data as an image, with each value in the data matrix represented as a color intensity. The visual dissimilarity between images of stable and unstable cases is then distinguished by a CNN, which is trained simultaneously for both stability classification and identification of critical generators (i.e., the generators most affected by the disturbance). In [18], the authors adopt the discrete Fourier transform to obtain spectra from the fault-on generator trajectories and arrange them into 2D images, such that a CNN can achieve good performance in refined CCT regression. Shi et al. [19] construct larger images with variables of all buses and verify the effectiveness of CNN in predicting the instability mode (e.g., caused by insufficient synchronizing or damping torque). Aiming at large-scale contingency screening, Yan et al. [20] introduce cascaded CNNs for stability probability prediction, which allow early TDS termination without loss of accuracy and continuously refresh themselves as the number of labeled TDS outputs increases. Nonetheless, none of the above DL models is specialized in exploring observations with explicit topological graph correlation, and the power system is exactly such an interconnected network of generators and loads [21]. A series of studies [22]–[24] establish that there exists a close relationship between topology and transient stability. As a result, changes in the power system topology, which are frequently triggered by maintenance or faults, may deteriorate the performance of TSA models based on SAE, CNN or recurrent methods.
Correspondingly, the graph convolutional network (GCN) provides an explicit way of integrating topological structure into the convolution algorithm [25]. GCN has proved extremely useful for graph analysis tasks in a wide variety of application areas, such as knowledge graph learning [26], text classification [27] and recommender system prediction [28]. The basic idea behind GCN is to distill the high-dimensional information about a node's graph neighborhood into a vector representation with dimension reduction. With this in mind, GCNs have recently also been employed in the field of power systems to deal with fault location and load shedding [29], [30]. In particular, in the context of TSA, Yu et al. [31] design a GCN model for the recovery of missing PMU data and report lower errors than the existing implementation [14].
In this paper, we propose a new recurrent graph convolutional network (RGCN) for spatio-temporal feature integration. RGCN adopts a cascading architecture in which the improved GCN modules first process the measurements at the nodes while considering the power system graph structure, and the LSTM modules then accomplish the temporal fusion. Based on the RGCN, we further design a multi-task TSA framework, named RGCN-MT-TSA. Multi-task learning (MTL) is exploited for joint training of two subtasks, i.e., stability classification (Task-1) and critical generator identification (Task-2). The proposed framework provides early warning based on the results of both tasks, such that they can verify each other. Comprehensive tests are carried out on the IEEE 39 Bus system and the IEEE 300 Bus system to validate the generalization and robustness of the proposed scheme.
In general, the contributions of this paper are as follows:
1) The adjacency matrix of GCN is designed to representatively describe the graph topology of the power system and effectively reflect its inherent physical characteristics.
2) A block-diagonal sparse matrix is constructed, with each block corresponding to the adjacency matrix of a graph. This supports batch-wise processing of graph data and fully utilizes parallel computing.
3) A cost-sensitive cross-entropy function is designed to deal with the category-imbalance problem in critical generator identification.
4) A soft sharing scheme is proposed to accelerate the multi-task training.
The rest of this paper is organized as follows. Section II introduces the design of RGCN. Section III presents the data preprocessing and the application of the RGCN-MT-TSA framework with offline training tricks. Section IV presents case studies on two different benchmark systems and various scenarios. The conclusion is given in Section V.

II. RECURRENT GRAPH CONVOLUTIONAL NETWORK
In this paper, we propose a novel aggregating network structure, named the recurrent graph convolutional network (RGCN) and shown in Fig. 1. RGCN consists of four cascaded modules, where GCN and LSTM are hierarchical modules while the time pooling and the classifier are single ones. GCN and LSTM play a critical role in graphical and temporal feature extraction, respectively. The time pooling module then aggregates features over all time steps, and the classifier provides the final discrimination.
Both the GCN and the LSTM adopt a hierarchical stacked structure that also contains normalization layers and fully connected (FC) layers. We select two types of normalization layers, i.e., batch normalization (BN) and layer normalization (LN). Details of these layers and modules are introduced as follows.
FIGURE 1. Cascade architecture of RGCN.
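To make the role of the GCN module more concrete, the following minimal sketch applies a single graph-convolution step to bus features. The layer definition used in the paper is not part of this excerpt, so the sketch assumes the widely used propagation rule of Kipf and Welling [32], $H' = \mathrm{ReLU}(D^{-1/2}(A + I)D^{-1/2} H W)$; the names `matmul` and `gcn_layer`, the 3-bus toy graph and the weights are hypothetical.

```python
import math


def matmul(a, b):
    """Plain-Python product of two dense matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def gcn_layer(adj, feats, weight):
    """One graph-convolution step: ReLU(D^-1/2 (A + I) D^-1/2 H W).

    Assumed rule (Kipf and Welling); the paper's improved GCN may differ.
    adj: N x N 0/1 adjacency, feats: N x F features, weight: F x F_out.
    """
    n = len(adj)
    # Add self-loops so that every bus keeps its own state.
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
             for i in range(n)]
    # Symmetric normalization by the node degree.
    d_inv_sqrt = [1.0 / math.sqrt(sum(row)) for row in a_hat]
    a_norm = [[d_inv_sqrt[i] * a_hat[i][j] * d_inv_sqrt[j] for j in range(n)]
              for i in range(n)]
    # Aggregate neighbor features, apply the linear map, then ReLU.
    out = matmul(matmul(a_norm, feats), weight)
    return [[max(0.0, v) for v in row] for row in out]


# Hypothetical 3-bus path graph 0 - 1 - 2 with one feature per bus.
A = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
H0 = [[1.0], [0.0], [0.0]]
W = [[1.0]]
H1 = gcn_layer(A, H0, W)
```

In this toy case the state of bus 0 reaches its neighbor bus 1 after one layer, while bus 2, two hops away, still receives nothing; stacking layers widens the neighborhood that each node can see.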
where $\tilde{c}_1 + \tilde{c}_2 = 1$. The system is predicted as unstable when $\tilde{c}_1 > 0.5$, which is labeled as $[1, 0]^T$. Otherwise, the system is stable and labeled as $[0, 1]^T$.
• Multi-label classifier
For multi-label classification, each sample is simultaneously associated with a set of labels. We assign a 0/1 binary code to each label to represent False/True. The problem can then be decomposed into multiple related binary classification problems. We adopt the sigmoid function, which maps $z$ to $\tilde{c}$ ($\tilde{c}_i \in (0, 1)$), where $\tilde{c}$ denotes the vector of confidences of all labels. Given a threshold $\delta$, label $i$ is predicted to be true when its confidence $\tilde{c}_i \in (0, \delta]$, and the final output is a binary vector.
Going back to the critical generator identification problem, each label corresponds to a generator, and the set of true labels in the output is the set of predicted critical generators $\tilde{G}_c$. Furthermore, if there is a set of labels whose confidence belongs to $(\delta, \delta_a)$ with $\delta < \delta_a < 1$, we say that this set is the set of predicted significant generators $\tilde{G}_s$.

III. THE RGCN-MT-TSA FRAMEWORK
A. GENERAL INTRODUCTION OF THE FRAMEWORK
In this paper, we propose a multi-task TSA solution based on RGCN. The framework is shown in Fig. 4.
FIGURE 4. The flowchart of the proposed RGCN-MT-TSA framework.

1) MULTI-TASK DESIGN
Our TSA task is composed of two subtasks, i.e., stability classification (Task-1) and critical generator identification (Task-2). Conventionally, different tasks may have distinct parameters or even architectures. However, such individual design blocks the sharing of knowledge in the training process. In fact, for the transient stability problem, the judgment of instability has strong, or even causal, links to the behavior of the critical generators. Hence, we follow the concept of multi-task learning (MTL) in our design, such that the model shares the representation of the related tasks and performs better on the target tasks.
MTL is essentially a multi-objective optimization, i.e., multiple loss functions are simultaneously minimized based on gradient descent. In the context of DL, hard and soft parameter sharing [38], [39] are the most commonly used settings for MTL. The former requires all the tasks to share the same subset of hidden layers and thus effectively reduces the chance of overfitting. However, its drawback is that the multi-objective program may have to be solved directly to obtain the common representation that captures multiple tasks. Another practical way is to merge the weighted loss functions and thus optimize a single-objective problem. The assignment of the weights among the tasks has a direct effect on the generalization of all the tasks. Here, we adopt the latter setting, i.e., soft parameter sharing.
As shown in Fig. 4, Task-1 and Task-2 have their own models and parameters, but we regularize the distance between the parameters of the two models to encourage them to be similar. Considering that the computational complexity of Task-2 is significantly larger than that of Task-1 when there are tens of labels, Task-1 is trained first, and its spatial feature extractor, i.e., the GCN modules, is then transferred to Task-2 as the initial setting. A regularization term is added to the loss function of Task-2 to minimize the distance between its parameters and the trained parameters of Task-1. With the benefit of such a generalization design, we use the implicit experience in Task-1 as guidance for the parameter optimization of Task-2, and the multi-objective problem is simplified to a two-stage single-objective optimization.

2) INPUT VECTOR
We choose three physical variables of each bus (node) to form the input space of GCN. They are the bus voltage magnitude, the bus relative phase and the rotor speeds of the generators connected to the power plant bus, i.e., the derivative of the rotor angles with respect to time. For the load buses, the third variable (rotor speed) is uniformly set to zero. The observation time window of the model inputs starts from the moment of fault occurrence $t_0$ and ends at the fault clearance time $t_c$ (including $t_0^-$ and $t_c^+$). Denoting the length of the observation window as $T$ and the sampling frequency as $f_s$, the number of snapshots of the above variables is $M = T f_s + 1$. Therefore, for a power system with $N$ buses, we need $M$ RGCNs and each RGCN has an input feature matrix of size $N \times 3$.
In the online TSA, the input data can be obtained from either PMUs or TDS.

3) ADJACENCY MATRIX
For any graph $G = (\mathcal{V}, \mathcal{E})$ describing the structure of the power system, the nodes refer to the buses, while the edges refer to the transmission lines. Typically, the element at $(i, j)$ of the adjacency matrix $A$ is defined as follows:
$$ a_{i,j} = \begin{cases} 0, & V_i, V_j \in \mathcal{V},\ (V_i, V_j) \notin \mathcal{E} \\ 1, & V_i, V_j \in \mathcal{V},\ (V_i, V_j) \in \mathcal{E} \end{cases} \tag{5} $$
where $(V_i, V_j)$ denotes the edge from $i$ to $j$.
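As a minimal sketch of the input construction described above, the snippet below builds the $N \times 3$ feature matrix of one snapshot, the adjacency matrix of (5) and the number of snapshots $M = T f_s + 1$. The function names, the symmetric (undirected) treatment of transmission lines and the 4-bus toy data are illustrative assumptions and are not taken from the paper.

```python
def num_snapshots(window_s, fs_hz):
    """M = T * fs + 1 snapshots for a window of T seconds sampled at fs Hz."""
    return round(window_s * fs_hz) + 1


def adjacency_matrix(n_buses, lines):
    """Eq. (5): a[i][j] = 1 if a line connects buses i and j, else 0.

    Lines are assumed to be undirected, so the matrix is filled symmetrically.
    """
    a = [[0] * n_buses for _ in range(n_buses)]
    for i, j in lines:
        a[i][j] = 1
        a[j][i] = 1
    return a


def feature_matrix(v_mag, rel_phase, rotor_speed):
    """N x 3 input of one snapshot: voltage magnitude, relative phase, rotor speed.

    rotor_speed maps a generator-bus index to its speed; load buses get 0.
    """
    return [[v_mag[i], rel_phase[i], rotor_speed.get(i, 0.0)]
            for i in range(len(v_mag))]


# Hypothetical 4-bus system: bus 0 is a generator bus, buses 1-3 are load buses.
A = adjacency_matrix(4, [(0, 1), (1, 2), (1, 3)])
X = feature_matrix([1.02, 0.98, 0.97, 0.99],
                   [0.0, -0.05, -0.08, -0.06],
                   {0: 0.01})
M = num_snapshots(0.1, 100)  # 0.1 s window sampled at 100 Hz
```

One such pair of `A` and `X` would be produced for each of the `M` snapshots in the observation window; the topology, and hence `A`, may differ between snapshots when a fault or its clearance changes the network.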
them, cannot gain "message" from each other. Therefore, we consider batches of graphs as subgraphs of one or more large graphs, with the characteristic that any two nodes belonging to two different subgraphs are still separated from each other in the synthetic graph. The parallel computing process for $n$ subgraphs is illustrated in Fig. 6.
When the process is GPU accelerated and the block matrix does not suffer from memory leak, a single sparse operation with complexity $O(nL)$ in a convolution operation as in (2) is converted to $n$ parallel operations with complexity $O(L_i)$. Hence, the maximum complexity of the sparse operation drops to $O(L_{max})$. In our model, we regard a sample as a graph-based series of size $T$, and the block-diagonal matrix for a batch with $m$ samples is
$$ \tilde{A}'' = \mathrm{diag}([\tilde{A}'_{1,1}, \tilde{A}'_{1,2}, \ldots, \tilde{A}'_{1,T}, \ldots, \tilde{A}'_{2,T}, \ldots, \tilde{A}'_{m,T}]) \tag{10} $$
corresponding to a block matrix of node features $X'$ and an output convolved sparse matrix $O'$.

2) COST-SENSITIVE CROSS-ENTROPY FUNCTION
• Task-1
Cross-entropy (CE) is widely adopted as the cost function for classification tasks. However, for problems with imbalanced samples, the stable (negative) samples attract too much attention and, as a result, the unstable (positive) samples suffer a loss of fit and generalization. Here, we adopt the cost-sensitive cross-entropy (CSCE) function with an L2 regularization term:
$$ Loss_1 = -\sum_i \alpha_i \Big( \sum_j c_{i,j} \log \tilde{c}_{i,j} \Big) + Loss_{L2} \tag{11} $$
where $\alpha_i$ is the balance factor. Normally, $\alpha_i$ of the unstable samples takes a larger value to encourage higher accuracy (ACC) for them. $[c_{i,1}, c_{i,2}]$ is the annotated category, while $[\tilde{c}_{i,1}, \tilde{c}_{i,2}]$ denotes the softmax function outputs of the $i$th sample. It follows that
$$ Loss_{L2} = \frac{1}{2} \beta_1 (\|w\|^2 + \|b\|^2) \tag{12} $$
with $w$ and $b$ as learnable network parameters. $\beta_1$ is the regularization weight.
• Task-2
Assume $[c_{i,1}, c_{i,2}, \ldots, c_{i,L}]$ to be the annotated labels and $[\tilde{c}_{i,1}, \tilde{c}_{i,2}, \ldots, \tilde{c}_{i,L}]$ to be the sigmoid function outputs,
where $w_0$ and $b_0$ are the trained parameters of the shared hidden layers of Task-1, while $\beta_2$ is another regularization weight.
All the loss functions mentioned above are optimized with the Adam algorithm [40], which is one of the most commonly used optimization algorithms for DL.

C. PERFORMANCE METRICS
Taking the difference between the tasks into account, we design two categories of metrics to measure the performance of the model.

1) CONFUSION MATRIX BASED METRICS
Based on the confusion matrix in Tab. 1, (15) to (18) define the specific metrics for our model evaluation, including ACC, miss alarm (MA) rate, false alarm (FA) rate and G-mean.
$$ ACC = \frac{TP + TN}{TP + FP + FN + TN} \tag{15} $$
$$ MA = \frac{FP}{TN + FP} \tag{16} $$
$$ FA = \frac{FN}{FN + TP} \tag{17} $$
$$ G\text{-}mean = \sqrt{(1 - MA)(1 - FA)} \tag{18} $$
where ACC denotes the proportion of correctly predicted samples. MA represents the proportion of incorrect results among all unstable samples, which reflects the reliability of the assessment and has a higher risk priority than FA. FA represents the proportion of incorrect results among the stable samples, which is used to monitor excessive alarms. Furthermore, G-mean is a comprehensive index for the classification of imbalanced samples.
TABLE 1. Confusion matrix for Task-1.

2) SET SIMILARITY BASED METRICS
Distinguished from Task-1 with its scalar-based evaluation, Task-2 predicts the set of critical generators. Here, we introduce the Jaccard similarity to evaluate the distance between
sets of integers. Given any two sets $s_i, s_j \subset \mathbb{N}$, the Jaccard similarity is defined as:
$$ J(s_i, s_j) = \frac{|s_i \cap s_j|}{|s_i \cup s_j|} = \frac{|s_i \cap s_j|}{|s_i| + |s_j| - |s_i \cap s_j|} \tag{19} $$
where $J \in [0, 1]$ and $J(s_i, s_j) = 1$ if and only if $s_i = s_j$. Here, we consider a sample correct only when $J(\tilde{G}_c, G_c) = 1$. Similar to ACC and MA, we define the Jaccard accuracy (JACC) of all samples as well as the Jaccard accuracy of unstable (JACCU) samples.
TABLE 2. RGCN construction in IEEE 39 Bus system.
In terms of the parameter similarity between Task-1 and Task-2, we prefer the extended Jaccard similarity (EJS) to the Jaccard similarity, since EJS considers the difference in value and direction of an ordered set, e.g., a vector. Given real-valued vectors $v_i, v_j$, EJS is calculated by:
$$ EJS(v_i, v_j) = \frac{v_i \cdot v_j}{\|v_i\|^2 + \|v_j\|^2 - v_i \cdot v_j} \tag{20} $$
Here we take account of two parameter sets $p_0 = [w_0, b_0]$ for Task-1 and $p = [w, b]$ for Task-2, and the similarity of the shared layers can be expressed as:
$$ EJS(p, p_0) = \frac{1}{2} \big( EJS(w, w_0) + EJS(b, b_0) \big) \tag{21} $$
where $EJS \in [0, 1]$. The closer the similarity is to 1, the better the two sets of parameters satisfy the similarity constraint.
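The metrics of this subsection can be sketched in a few lines of Python. The confusion-matrix metrics are written exactly as in (15) to (18), with the cell definitions left to Table 1, which is not reproduced here. EJS is implemented in its usual form with squared norms in the denominator. The function names, the threshold value used in the test data and the convention that two empty sets are identical are assumptions made for illustration.

```python
import math


def confusion_metrics(tp, fp, fn, tn):
    """ACC, MA, FA and G-mean written as in (15)-(18).

    Which cell counts as TP/FP/FN/TN follows the paper's Table 1.
    """
    acc = (tp + tn) / (tp + fp + fn + tn)
    ma = fp / (tn + fp)
    fa = fn / (fn + tp)
    g_mean = math.sqrt((1 - ma) * (1 - fa))
    return acc, ma, fa, g_mean


def critical_set(confidence, delta):
    """Generators whose confidence lies in (0, delta] are predicted critical."""
    return {g for g, c in enumerate(confidence) if 0 < c <= delta}


def jaccard(s_i, s_j):
    """Eq. (19); two empty sets are treated as identical (assumption)."""
    union = len(s_i | s_j)
    return 1.0 if union == 0 else len(s_i & s_j) / union


def jacc(predicted_sets, true_sets):
    """Share of samples whose predicted set matches the true set exactly."""
    hits = sum(jaccard(p, t) == 1.0 for p, t in zip(predicted_sets, true_sets))
    return hits / len(true_sets)


def ejs(v_i, v_j):
    """Eq. (20): extended Jaccard similarity of two non-zero real vectors."""
    dot = sum(a * b for a, b in zip(v_i, v_j))
    n_i = sum(a * a for a in v_i)
    n_j = sum(b * b for b in v_j)
    return dot / (n_i + n_j - dot)


def parameter_similarity(w, b, w0, b0):
    """Eq. (21): mean EJS over the weights and biases of the shared layers."""
    return 0.5 * (ejs(w, w0) + ejs(b, b0))
```

For example, `critical_set([0.95, 0.2, 0.6, 0.1], 0.5)` selects generators 1 and 3, and a sample counts toward JACC only if this set equals the annotated one. JACCU would be obtained by calling `jacc` on the unstable samples only.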
TABLE 3. Metrics comparison of methods in IEEE 39 Bus system.
TABLE 4. Metrics comparison of composite methods in IEEE 39 Bus system under "N−3" cases.
FIGURE 8. Hidden layer activations visualization of the spatial extractors. (a) RSAE. (b) RCNN. (c) RGCN.
3) ROBUSTNESS ANALYSIS
Once the offline model is applied online, we should consider the problem of corrupted inputs caused by load fluctuations or poor measurements. On the one hand, existing research tends to simplify the distribution of errors in the sampling and calculation stages as ideal Gaussian white noise $\varepsilon \sim N(0, 1)$. However, white noise might be converted to colored noise [43] by the low-pass filtering of PMUs. The impulse response of a low-pass filter is defined as:
$$ h(t) = \frac{1}{15} \sum_{i=0}^{15} \delta(t - i) \tag{22} $$
where $\delta$ denotes the unit impulse function. A series of Gaussian colored noise $\varepsilon'$ is generated as:
$$ \varepsilon' = h * \varepsilon \tag{23} $$
Generally, the signal-to-noise ratio (SNR) is used to measure the relative strength of the signal and the noise:
$$ SNR = 20 \lg \frac{\|x\|}{\|\varepsilon'\|} \tag{24} $$
A small SNR value corresponds to high signal distortion. On the other hand, communication errors, signal interference, etc., usually result in missing data or abnormal values in the sampling stage. We simulate this scenario by assuming that data values drop to zero or soar to twice their original values with an assumed probability.
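The two disturbance scenarios above can be sketched as follows. The filter follows (22) literally (16 unit impulses, each weighted by 1/15), and the SNR follows (24). How the noise is scaled to reach a given SNR and how the two corruption modes are split are not stated in this excerpt, so the rescaling step in `add_noise_at_snr` and the equal odds in `corrupt` are assumptions, as are all function names and the toy signal.

```python
import math
import random


def colored_noise(n, rng, taps=16, scale=1.0 / 15.0):
    """Filter white noise eps ~ N(0, 1) with h of (22): eps' = h * eps.

    Each output sample is the weighted sum of 16 consecutive white-noise
    samples, i.e. the valid part of the convolution with the filter.
    """
    white = [rng.gauss(0.0, 1.0) for _ in range(n + taps - 1)]
    return [scale * sum(white[k + i] for i in range(taps)) for k in range(n)]


def norm(v):
    """Euclidean norm of a list of numbers."""
    return math.sqrt(sum(a * a for a in v))


def snr_db(x, noise):
    """Eq. (24): SNR = 20 * log10(||x|| / ||noise||)."""
    return 20.0 * math.log10(norm(x) / norm(noise))


def add_noise_at_snr(x, noise, target_db):
    """Rescale the noise to the target SNR (assumed procedure) and add it."""
    gain = norm(x) / (10.0 ** (target_db / 20.0)) / norm(noise)
    scaled = [gain * e for e in noise]
    return [a + e for a, e in zip(x, scaled)], scaled


def corrupt(x, prob, rng):
    """With probability prob a value drops to zero or doubles (equal odds assumed)."""
    out = []
    for a in x:
        if rng.random() < prob:
            out.append(0.0 if rng.random() < 0.5 else 2.0 * a)
        else:
            out.append(a)
    return out


rng = random.Random(0)                       # fixed seed for repeatability
x = [1.0 + 0.01 * k for k in range(200)]     # hypothetical measurement series
eps = colored_noise(len(x), rng)
noisy, scaled = add_noise_at_snr(x, eps, 20.0)
damaged = corrupt(x, 0.03, rng)              # 3% abnormal or missing values
```

At 20 dB the norm of the scaled noise is one tenth of the norm of the signal, which matches the 10% figure quoted for the extreme condition in the results.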
The performance of the models for both tasks in the above scenarios is listed in Tab. 5. Under the ideal scenario, the changes in the loss functions make no significant difference to the ACC of Task-1, while we find an improvement of 0.68% in MA with CSCE. JACCU of Task-2 rises by over 5% with a parameter similarity (EJS) of more than 0.99. All the metrics remain practically unchanged under colored noise with large SNR values. Assuming an extreme condition of SNR = 20 dB, where the noise reaches 10% of the original input, MA and FA slightly increase to almost 1%. Due to distinctly more prediction objectives than in Task-1, JACCU suffers a drop of around 2%. Nonetheless, our model covers 97% of the unstable samples under large noise interference.
Considering wide-area abnormal values of 1% to 3%, the proposed method with message-passing-based GCN modules is only mildly affected by individual abnormal buses. The maximum loss of metrics in both tasks stays below 2% compared with the ideal scenario. On the whole, the proposed method delivers desirable performance under the designed scenarios and fulfills the requirements of adaptability and robustness.
FIGURE 9. Convergence metrics comparison of the composite methods. (a) Training curves. (b) Response time.
B. IEEE 300 BUS SYSTEM
2) TEST RESULTS
In contrast with the IEEE 39 Bus system, the input scale grows geometrically and, as a result, the ACC of the shallow networks declines by 8.93% to 13.49%. Deep networks have ACC over 90%, and LSTM performs worst among them. It is inferred that a purely temporal method generalizes poorly to data with abundant spatial and topological characteristics. In terms of the composite methods, our method is the most reliable one regardless of the system scale. Here ACC and G-mean both remain at about 99%. We then transfer the pretrained GCN modules of Task-1 to the more fine-grained Task-2, where the number of generators to be predicted is almost 7 times that in the previous system. JACCU and JACC of the proposed method are 97.32% and 97.98%, respectively; meanwhile, the similarity regularization is satisfied with an EJS of 0.992.
TABLE 6. Metrics comparison of methods in IEEE 300 Bus system.

C. VISUALIZATION VERIFICATION OF RGCN-MT-TSA FRAMEWORK
Assuming $\delta_a$ equal to 0.9, we apply the proposed framework online based on parallel computing of Task-1 and Task-2. The average time of a batch assessment in the IEEE 39 Bus system and the IEEE 300 Bus system is 16 ms and 62 ms, respectively. It follows that ADM generates three signals, i.e., "Secure", "Uncertain" and "Critical". The following typical examples described in Fig. 10 are selected to verify the effectiveness of the designed signals, where Fig. 10 a and Fig. 10 d refer to true stable samples while the others refer to true unstable samples. Hidden activations of generator nodes are similarly compressed into 2D vectors with t-SNE. Here a circle represents a generator and its color intensity is related to the confidence. The set of predicted critical generators is highlighted by a solid oval, and that of predicted significant generators is circled by a dashed one.
From the details in Fig. 10 a and Fig. 10 d, the systems are predicted to be unstable in Task-1, while all the generators are considered to be stable in Task-2, with all of their confidences over 0.9. Therefore, ADM determines the system status as "Secure" and avoids false alarms. In the "Uncertain" cases in Fig. 10 b and Fig. 10 e, there is still a conflict between the predictions of the two tasks, where the model of Task-2 detects a set of critical or significant generators. More attention should be paid to these generators, while TDS is utilized to further reduce harmful MA phenomena. When both tasks predict the systems and the generators to be unstable, as in Fig. 10 c and Fig. 10 f, ADM indicates the state of emergency, and prompt critical control can be implemented based on the set of critical generators. In general, thanks to the visualization, dispatchers can efficiently recognize the set of generators to be controlled based on the color intensity and the clusters of circles. It is convenient to sort the importance of generators and then develop more precise control strategies.
FIGURE 10. Typical visualization results of RGCN-MT-TSA framework. (a)(d) Secure. (b)(e) Uncertain. (c)(f) Critical.

V. CONCLUSION
In this paper, a multi-task transient stability assessment framework is proposed to address early warning of stability classification and critical generator identification according to PMU data. We design a cascade neural network architecture named RGCN to capture the transient characteristics graphically and temporally, where a state-of-the-art network, GCN, is creatively used to explicitly extract physical topological information of the power system. The offline models of different tasks are trained in a parallel way, with a new cost-sensitive cross-entropy function to handle the imbalance problem. A similarity regularization term is designed such that the model of Task-1 can be transferred to that of Task-2 and the training difficulty is alleviated. To evaluate the effectiveness and robustness of the proposed method, a series of case studies as well as comparisons with six different single or aggregating models are comprehensively conducted on two benchmark systems of different scales. Test results indicate the desirable performance and reliability of the proposed method. Furthermore, our framework provides comprehensive signals and 2D visualization of the generators, which helps to reduce the false alarm rate as well as implement more accurate and timely control.
In future work, we will pay attention to periodic model update in our framework when facing more complex changes in the topology. This adaptive framework is expected to be extended to a large practical system with thousands of buses.

REFERENCES
[1] P. Kundur, J. Paserba, V. Ajjarapu, G. Andersson, A. Bose, C. A. Canizares, N. D. Hatziargyriou, D. J. Hill, A. M. Stankovic, C. Taylor, T. Van Cutsem, and V. Vittal, "Definition and classification of power system stability IEEE/CIGRE joint task force on stability terms and definitions," IEEE Trans. Power Syst., vol. 19, no. 3, pp. 1387–1401, Aug. 2004.
[2] S. Obuz, M. Ayar, R. D. Trevizan, C. Ruben, and A. S. Bretas, "Renewable and energy storage resources for enhancing transient stability margins: A PDE-based nonlinear control strategy," Int. J. Elect. Power Energy Syst., vol. 116, Mar. 2020, Art. no. 105510.
[3] P. Kundur, N. J. Balu, and M. G. Lauby, Power System Stability and Control, vol. 7. New York, NY, USA: McGraw-Hill, 1994.
[4] R. Diao, S. Jin, F. Howell, Z. Huang, L. Wang, D. Wu, and Y. Chen, "On parallelizing single dynamic simulation using HPC techniques and APIs of commercial software," IEEE Trans. Power Syst., vol. 32, no. 3, pp. 2225–2233, May 2017.
[5] L. M. Skvortsov, "A fifth order implicit method for the numerical solution of differential-algebraic equations," Comput. Math. Math. Phys., vol. 55, no. 6, pp. 962–968, Jun. 2015.
[6] T. Athay, R. Podmore, and S. Virmani, "A practical method for the direct analysis of transient stability," IEEE Trans. Power App. Syst., vol. PAS-98, no. 2, pp. 573–584, Mar. 1979.
[7] Y. Xu, Z. Y. Dong, R. Zhang, Y. Xue, and D. J. Hill, "A decomposition-based practical approach to transient stability-constrained unit commitment," IEEE Trans. Power Syst., vol. 30, no. 3, pp. 1455–1464, May 2015.
[8] M. He, J. Zhang, and V. Vittal, "Robust online dynamic security assessment using adaptive ensemble decision-tree learning," IEEE Trans. Power Syst., vol. 28, no. 4, pp. 4089–4098, Nov. 2013.
[9] F. R. Gomez, A. D. Rajapakse, U. D. Annakkage, and I. T. Fernando, "Support vector machine-based algorithm for post-fault transient stability status prediction using synchronized measurements," IEEE Trans. Power Syst., vol. 26, no. 3, pp. 1474–1483, Aug. 2011.
[10] S. A. Siddiqui, K. Verma, K. R. Niazi, and M. Fozdar, "Real-time monitoring of post-fault scenario for determining generator coherency and transient stability through ANN," IEEE Trans. Ind. Appl., vol. 54, no. 1, pp. 685–692, Jan. 2018.
[11] I. B. Sulistiawati, A. Priyadi, O. A. Qudsi, A. Soeprijanto, and N. Yorino, "Critical clearing time prediction within various loads for transient stability assessment by means of the extreme learning machine method," Int. J. Elect. Power Energy Syst., vol. 77, pp. 345–352, May 2016.
[12] Q. Zhu, J. Chen, L. Zhu, D. Shi, X. Bai, X. Duan, and Y. Liu, "A deep end-to-end model for transient stability assessment with PMU data," IEEE Access, vol. 6, pp. 65474–65487, 2018.
[13] J. J. Q. Yu, D. J. Hill, A. Y. S. Lam, J. Gu, and V. O. K. Li, "Intelligent time-adaptive transient stability assessment system," IEEE Trans. Power Syst., vol. 33, no. 1, pp. 1049–1058, Jan. 2018.
[14] J. J. Q. Yu, D. J. Hill, and A. Y. S. Lam, "Delay aware transient stability assessment with synchrophasor recovery and prediction framework," Neurocomputing, vol. 322, pp. 187–194, Dec. 2018.
[15] Y. Zhou, Q. Guo, H. Sun, Z. Yu, J. Wu, and L. Hao, "A novel data-driven approach for transient stability prediction of power systems considering the operational variability," Int. J. Elect. Power Energy Syst., vol. 107, pp. 379–394, May 2019.
[16] R. Zhang, J. Wu, Y. Xu, B. Li, and M. Shao, "A hierarchical self-adaptive method for post-disturbance transient stability assessment of power systems using an integrated CNN-based ensemble classifier," Energies, vol. 12, no. 17, p. 3217, 2019.
[17] A. Gupta, G. Gurrala, and P. S. Sastry, "An online power system stability monitoring system using convolutional neural networks," IEEE Trans. Power Syst., vol. 34, no. 2, pp. 864–872, Mar. 2019.
[18] L. Zhu, D. J. Hill, and C. Lu, "Hierarchical deep learning machine for power system online transient stability prediction," IEEE Trans. Power Syst., vol. 35, no. 3, pp. 2399–2411, May 2020.
[19] Z. Shi, W. Yao, L. Zeng, J. Wen, J. Fang, X. Ai, and J. Wen, "Convolutional neural network-based power system transient stability assessment and instability mode prediction," Appl. Energy, vol. 263, Apr. 2020, Art. no. 114586.
[20] R. Yan, G. Geng, Q. Jiang, and Y. Li, "Fast transient stability batch assessment using cascaded convolutional neural networks," IEEE Trans. Power Syst., vol. 34, no. 4, pp. 2802–2813, Jul. 2019.
[21] T. Ishizaki, A. Chakrabortty, and J.-I. Imura, "Graph-theoretic analysis of power systems," Proc. IEEE, vol. 106, no. 5, pp. 931–952, May 2018.
[22] F. Ebrahimzadeh, M. Adeen, and F. Milano, "On the impact of topology on power system transient and frequency stability," in Proc. IEEE Int. Conf. Environ. Electr. Eng., IEEE Ind. Commercial Power Syst. Eur. (EEEIC/I&CPS Europe), Jun. 2019, pp. 1–5.
[23] Y. Song, D. J. Hill, and T. Liu, "Characterization of cutsets in networks with application to transient stability analysis of power systems," IEEE Trans. Control Netw. Syst., vol. 5, no. 3, pp. 1261–1274, Sep. 2018.
[24] T. Weckesser, H. Jóhannsson, M. Glavic, and J. Østergaard, "An improved on-line contingency screening for power system transient stability assessment," Electr. Power Compon. Syst., vol. 45, no. 8, pp. 852–863, May 2017.
[25] A. Krizhevsky, I. Sutskever, and G. E. Hinton, "ImageNet classification with deep convolutional neural networks," in Proc. Adv. Neural Inf. Process. Syst., 2012, pp. 1097–1105.
[26] R. Ye, X. Li, Y. Fang, H. Zang, and M. Wang, "A vectorized relational graph convolutional network for multi-relational network alignment," in Proc. 28th Int. Joint Conf. Artif. Intell., Aug. 2019, pp. 4135–4141.
[27] L. Yao, C. Mao, and Y. Luo, "Graph convolutional networks for text classification," in Proc. AAAI Conf. Artif. Intell., vol. 33, Jul. 2019, pp. 7370–7377.
[28] R. Ying, R. He, K. Chen, P. Eksombatchai, W. L. Hamilton, and J. Leskovec, "Graph convolutional neural networks for Web-scale recommender systems," in Proc. 24th ACM SIGKDD Int. Conf. Knowl. Discovery Data Mining, Jul. 2018, pp. 974–983.
[29] K. Chen, J. Hu, Y. Zhang, Z. Yu, and J. He, "Fault location in power distribution systems via deep graph convolutional networks," IEEE J. Sel. Areas Commun., vol. 38, no. 1, pp. 119–131, Jan. 2020.
[30] C. Kim, K. Kim, P. Balaprakash, and M. Anitescu, "Graph convolutional neural networks for optimal load shedding under line contingency," in Proc. IEEE Power Energy Soc. Gen. Meeting (PESGM), Aug. 2019, pp. 1–5.
[31] J. J. Q. Yu, D. J. Hill, V. O. K. Li, and Y. Hou, "Synchrophasor recovery and prediction: A graph-based deep learning approach," IEEE Internet Things J., vol. 6, no. 5, pp. 7348–7359, Oct. 2019.
[32] T. N. Kipf and M. Welling, "Semi-supervised classification with graph convolutional networks," in Proc. Int. Conf. Learn. Represent., 2017, pp. 1–14.
[33] J. Gilmer, S. S. Schoenholz, P. F. Riley, O. Vinyals, and G. E. Dahl, "Neural message passing for quantum chemistry," in Proc. Int. Conf. Mach. Learn., vol. 70, 2017, pp. 1263–1272.
[34] S. Hochreiter and J. Schmidhuber, "Long short-term memory," Neural Comput., vol. 9, no. 8, pp. 1735–1780, 1997.
[35] S. Ioffe and C. Szegedy, "Batch normalization: Accelerating deep network training by reducing internal covariate shift," in Proc. Int. Conf. Mach. Learn., 2015, pp. 448–456.
[36] T. Kim, I. Song, and Y. Bengio, "Dynamic layer normalization for adaptive neural acoustic modeling in speech recognition," in Proc. Interspeech, Aug. 2017, pp. 2411–2415.
[37] N. Srivastava, G. Hinton, A. Krizhevsky, I. Sutskever, and R. Salakhutdinov, "Dropout: A simple way to prevent neural networks from overfitting," J. Mach. Learn. Res., vol. 15, no. 1, pp. 1929–1958, 2014.
[38] R. Caruana, "Multitask learning: A knowledge-based source of inductive bias," in Proc. Int. Conf. Mach. Learn., 1993, pp. 41–48.
[39] L. Duong, T. Cohn, S. Bird, and P. Cook, "Low resource dependency parsing: Cross-lingual parameter sharing in a neural network parser," in Proc. 53rd Annu. Meeting Assoc. Comput. Linguistics, 7th Int. Joint Conf. Natural Lang. Process., 2015, pp. 845–850.
[40] D. P. Kingma and J. Lei Ba, "Adam: A method for stochastic optimization," in Proc. Int. Conf. Learn. Represent., 2015, pp. 1–15.
[41] A. Paszke et al., "PyTorch: An imperative style, high-performance deep learning library," in Proc. Adv. Neural Inf. Process. Syst., 2019, pp. 8024–8035.
[42] L. van der Maaten and G. Hinton, "Visualizing data using t-SNE," J. Mach. Learn. Res., vol. 9, pp. 2579–2605, Nov. 2008.
[43] C. K. Papadopoulos and C. L. Nikias, "Parameter estimation of exponentially damped sinusoids using higher order statistics," IEEE Trans. Acoust., Speech, Signal Process., vol. 38, no. 8, pp. 1424–1436, Aug. 1990.

JIYU HUANG received the B.S. degree in electrical engineering from the South China University of Technology (SCUT), Guangzhou, China, in 2019, where he is currently pursuing the Ph.D. degree. His research interests include deep learning in power system security and transient stability assessment.

LIN GUAN (Member, IEEE) received the B.S. and Ph.D. degrees in electric power engineering from the Huazhong University of Science and Technology, Wuhan, China, in 1990 and 1995, respectively.
She is currently a Professor with the Electric Power College, South China University of Technology, Guangzhou, China. From 2014 to 2015, she was a Visiting Scholar with Stanford University. She is the author of more than 120 articles and a Principal Investigator of more than 50 projects. Her research interests include application of artificial intelligence technology in electrical engineering, power system security and control, and power system planning and reliability.

YINSHENG SU received the B.S. and M.S. degrees in electrical engineering from Shanghai Jiaotong University, Shanghai, China, in 1999 and 2002, respectively. He is currently a Senior Specialist with China Southern Grid Co. Ltd. His research interest includes electric power system operation and control.

HAICHENG YAO received the B.S. degree in water conservancy and hydropower engineering from the Huazhong University of Science, Wuhan, China, in 2004, and the M.S. degree from the State Grid Electric Research Institute, Shanghai, China, in 2007. He is currently a Senior Engineer with China Southern Grid Co. Ltd. His research interest includes electric power system operation and control.

MENGXUAN GUO was born in Hunan, China, in 1997. She received the B.S. degree in electrical engineering from the South China University of Technology (SCUT), Guangzhou, China, in 2019. Her research interests include deep learning in power system security and small-signal stability analysis.

ZHI ZHONG received the B.S. degree in electrical engineering from Southeast University, Nanjing, China, in 2019. He is currently pursuing the M.S. degree in electrical engineering with the South China University of Technology (SCUT), Guangzhou, China. His research interests include machine learning, transient stability assessment of power system, and the applications of big data in smart grids.