Keywords: Faulty feeder detection; Feeder to feeder; Hybrid CNN-LSTM model; Patch to patch; Spatial–temporal features

Faulty feeder detection has always been recognized as an essential issue in guaranteeing the stable and reliable operation of distribution networks. Existing methods fail to fully exploit the spatial correlation and temporal dependencies in the zero-sequence currents, which are exhibited as waveforms on the spatial domain and are displayed as time-series data on the time domain. In this paper, a novel detection method based on a hybrid model of convolutional neural network (CNN) and long short-term memory (LSTM) neural network is proposed, and the spatial–temporal information can be comprehensively explored. Deep spatial features are extracted by using CNN, and they are further processed by using LSTM network. Besides, to maintain the temporal dependencies of the extracted spatial features, the current waveforms are recognized on the patch scale by the proposed patch-to-patch CNN (PToP CNN). Moreover, a feeder-to-feeder LSTM (FToF LSTM) network is established to learn and compare spatial–temporal correlations between feeders. The joint PToP-CNN and FToF-LSTM can achieve collaborative superiority on mining the fault features in the zero-sequence currents. To verify the prospects in real installations, the hybrid model is implemented in an embedded device, called NVIDIA Jetson AGX Xavier, and the real-time detection demonstrates the efficiency of the proposed method.
vector machine (SVM) [15]. In [16], a fuzzy measure fusion criterion is utilized to calculate the similarity between the fault data in the real-time detection and historical sample set, and the feeder can be classified as faulty feeder and healthy feeder accordingly. Because the waveforms have complete features, the time-series data can be converted into waveform images and recognized by using convolutional neural network (CNN). In [17], CNN is directly used to adaptively extract and combine the fault features in zero-sequence currents, and the faulty feeder can be identified based on its output results. In [18], the sampled data of zero-sequence currents are added with each other to achieve the fused waveforms, and CNN is applied to find the fused waveforms that do not contain the faulty feeder. Although the zero-sequence currents can be recognized on the waveform scale, the CNN fails to capture the temporal features in the currents, which results in insensitivity to changes in the temporal domain.

In fact, raw fault signals are displayed in the numerical form on the temporal domain and exhibited in waveforms on the spatial domain, which implies that they have complete spatial–temporal features. That is, raw signals need to be fully exploited on the comprehensive aspect, including the spatial domain and temporal domain, thus improving the adaptability to variable fault conditions. Because the CNN is skilled at extracting spatial features but poor in capturing temporal dependencies, while the LSTM network shows the opposite features [19], the spatial–temporal features in fault signals can be obtained when the CNN and LSTM are combined in the learning process.

To improve the detection performance, a hybrid model of patch-to-patch (PToP) CNN and feeder-to-feeder (FToF) LSTM, namely PToP-CNN-FToF-LSTM, is proposed in this paper, and the joint learning of spatial–temporal correlations can be conducted. The main contributions can be summarized as follows.

(1) The spatial features in the zero-sequence currents can be extracted by CNN, and the temporal features can be captured by LSTM. By the cooperation of CNN and LSTM, the spatial–temporal features in the currents can be fully exploited without information loss, and the superiority on feature extraction guarantees the good detection performance under variable fault conditions.
(2) The hybrid CNN-LSTM model can not only extract spatial–temporal features in the currents, but also compare the spatial–temporal correlations between feeders. The final detection results depend on the correlation comparison between feeders, rather than the fault features of a single feeder. In fact, the correlation comparison enables the proposed method to be highly robust to the complex fault scenarios and large localized distortions in signals.
(3) The PToP CNN and FToF LSTM are trained jointly, which implies that both the spatial and temporal features can be learned simultaneously. Furthermore, to verify the real-time performance, the established PToP-CNN-FToF-LSTM is implemented in the embedded device NVIDIA Jetson AGX Xavier. The experimental results indicate that the proposed method has good detection performance under various fault scenarios, thus demonstrating its considerable prospects in practical distribution networks.

The rest of this article is organized as follows. Section 2 describes the proposed hybrid CNN-LSTM model. Section 3 presents the conducted verification cases and discussions. Section 4 compares the performance of the proposed method against other existing methods. Section 5 concludes this article.

2. CNN-LSTM hybrid classification model

2.1. Framework overview

Since CNNs are expert in mining the hidden information in images and LSTMs are good at processing the time-series data [20], we propose a faulty-feeder detection method based on the hybrid CNN-LSTM model. Moreover, it is significant for the faulty-feeder detection to capture the spatial–temporal characteristics and conduct correlation comparisons between different feeders [11]. Thus, we propose a PToP CNN to extract the spatial features from zero-sequence currents, where the temporal dependencies of the extracted features at time scales are maintained, and an FToF LSTM network to learn and compare the spatial–temporal features between different feeders. The framework and workflow of the proposed faulty-feeder detection method based on the PToP-CNN and FToF-LSTM are presented in Fig. 1.

Firstly, the initial data of zero-sequence currents in the numerical form are obtained and preprocessed to generate the waveform images. Secondly, each image is split into patches and is recognized on the patch scale by using CNN, where the learned high-level features based on these patches can ensure the temporal invariance. Subsequently, the extracted features of each patch are concatenated, called PToP features, and the PToP features of each feeder can be obtained. Finally, the established FToF LSTM is used to compare the extracted PToP features between different feeders, and the faulty feeder can be detected based on the output result of the LSTM. For the practical application, the proposed CNN-LSTM model can identify the faulty feeder in an accurate and prompt manner, thereby facilitating further fast postfault actions.

2.2. Image creation

When an SPG fault occurs [21], the first half-cycle zero-sequence currents of feeders are sampled, and they are processed by using (1).

$$ i^{\mathrm{nor}}_{0k}(d) = \frac{i_{0k}(d)}{\max_{d=1}^{D}\{|i_{01}(d)|, \cdots, |i_{0N}(d)|\}}, \quad k \in [1, N] \tag{1} $$

where $N$ is the number of feeders, and $D$ is the total sampling points in the first half cycle. $i_{0k}$ is the sampled current of the k-th feeder, and $i^{\mathrm{nor}}_{0k}$ is the normalized current.

After data processing, the zero-sequence current of each feeder is normalized into [-1, 1]. To generate the images converted from the numerical data quickly, the binary images can be obtained by using the waveform encoding method [22], where the value is 1 if the waveform crosses the corresponding section. Finally, the normalized data of each feeder is converted into a binary image with size of 128 × 256.

2.3. Patch-to-patch CNN

With the booming development of CNNs in computer vision, CNNs have been widely used in feature extraction, pattern recognition, and image and video recognition. CNNs mainly consist of the convolution layer, pooling layer, and fully-connected (FC) layer, where the convolution layer is the core of CNNs. The convolution layer possesses many kernels with certain receptive fields. As the number of kernels increases and convolution layers are stacked, CNNs can extract and learn spatial information from the input images.

CNNs are commonly assumed to be invariant to small image transformations [23], such as image translations, scaling, and deformations. That is to say, the recognition results are unchanged when the characters are slightly translated, rescaled, or deformed. However, the invariance of CNNs to image transformations would weaken the sensitivity to the changes in the images. In particular, for the CNN-LSTM-based detection scheme, the extracted high-level features of CNN lose the temporal dependencies in raw images, which may easily lead to the misalignments of the extracted features at time scales. Thus, we propose a
can use CNN with kernels that have few parameters to extract the fault features, thus significantly reducing the computational cost. Furthermore, the FC layer is discarded. The network structure is shown in Fig. 3.
... The CNN module incorporates six layers, including three convolution layers, one max pooling layer, one global average pooling (GAP) layer, and one flatten layer. Among them, the first and second convolution layers only have 16 kernels, and the third convolution layer has 32 kernels. To downsample the feature maps, the max pooling layer is added after the first convolution layer, and the second and third convolution layers have the kernels with stride 2, where both the spatial dimensions of feature maps are halved accordingly. After that, a GAP layer is used to obtain the feature maps with size of 1 × 1, and a flatten layer is applied to flatten all the features into a single vector. In this paper, the output of each patch is the vector with size of 1 × 32, and the extracted features of each image can be represented with a 1024 (32 × 32) feature vector, as shown in (5).

Fig. 2. Process of obtaining patches from raw image.
Fig. 3. Structure of CNN module.

$$ V_k = \operatorname*{concat}_{p=1}^{32} Z(x_p) \tag{5} $$

where $Z(\cdot)$ denotes the process of the feature extraction by using CNN, and concat denotes the concatenation operation. $V_k$ is the extracted feature vector of the k-th feeder.

Therefore, the extracted features of an input image by using the PToP CNN will be transformed into a feature vector containing 1024 elements. Besides, the feature vector maintains the temporal dependencies, which can be further learned by using LSTM for final detection of the faulty feeder.

2.4. Feeder-to-feeder LSTM

As a special kind of recurrent neural network (RNN), the LSTM network is suitable to process time-series data due to the powerful learning capability for long-term dependencies [24]. Owing to the recurrent architecture, the calculation of the hidden layer nodes can not only learn the input of the current layer, but can also retain the knowledge of the nodes in the previous layers. Moreover, the LSTM can overcome the drawbacks of RNN on long sequences by using forget gate, input gate, and output gate. The forget gate is applied to discard redundant information, the input gate is used to select the important information to be stored, and the output gate is utilized to identify the output information. A typical LSTM network structure is unrolled in Fig. 4.

The specific calculation formulas for each gate function of the LSTM cell are shown as follows:

$$
\begin{cases}
f^t = \sigma(W_f [h^{t-1}, x^t] + b_f) \\
i^t = \sigma(W_i [h^{t-1}, x^t] + b_i) \\
g^t = \tanh(W_g [h^{t-1}, x^t] + b_g) \\
o^t = \sigma(W_o [h^{t-1}, x^t] + b_o) \\
C^t = f^t \otimes C^{t-1} + i^t \otimes g^t \\
h^t = o^t \otimes \tanh(C^t)
\end{cases} \tag{6}
$$

where $h^t$ denotes the hidden layer output at time t, and $x^t$ denotes the input at time t. $f^t$, $i^t$, and $o^t$ are the outputs of the forget gate, input gate, and output gate, respectively. $W_f$ and $b_f$ denote the weight and bias of the forget gate, respectively. $W_i$ and $b_i$ denote the weight and bias of the input gate, respectively. $W_o$ and $b_o$ denote the weight and bias of the output gate, respectively. $g^t$ is the intermediate variable, and $W_g$ and $b_g$ denote its learned weight and bias, respectively. $\sigma$ denotes the sigmoid function, and tanh denotes the hyperbolic tangent function. $\otimes$ means the element-wise multiplication.

Since the extracted features by using the PToP CNN retain the dependencies at time scales, the LSTM module is used to handle the temporal correlation of the features. Besides, it is necessary for faulty-feeder detection to conduct correlation comparison between different feeders [11]. To conduct correlation comparison between feeders, we take the feature vector $V_k$ as the input $x^t$ of the LSTM module. That is to say, the LSTM module receives the feature vectors $\{V_1, \ldots, V_k, \ldots\}$ from the preceding PToP CNN as the input $\{x^1, \ldots, x^t, \ldots\}$. This type of LSTM can be called the FToF LSTM.

In this paper, the established FToF LSTM comprises four layers, including two LSTM layers and two dropout layers, where the LSTM layer and dropout layer are alternately stacked. Furthermore, 256 neurons are adopted in the LSTM layer, and a dropout rate of 50 % is applied in the dropout layer to reduce the overfitting. Subsequently, the second dropout layer is followed by an FC layer with softmax activation, and the faulty feeder can be identified according to the classification results. Since a distribution network has many feeders, the identification of the faulty feeder can be considered as a multiclass classification task. The loss function can be defined as follows:

$$ L = -\frac{1}{Q} \sum_{q=1}^{Q} \sum_{n=1}^{N} P^{*}_{n,q} \log(P_{n,q}) \tag{7} $$

where $P_{n,q}$ is the probability that the n-th feeder in the q-th sample is classified as the faulty feeder by the CNN-LSTM model, $P^{*}_{n,q}$ is the actual probability of the corresponding faulty-feeder classification. $Q$ is the number of samples, and $N$ is the total feeder number.

2.5. Training of CNN-LSTM model

In this study, we train the CNN-LSTM model with fault data obtained from PSCAD simulation as it is difficult to collect extensive practical data. Three distribution network models with four feeders are established by using PSCAD, as shown in Fig. 5. The models consider different neutral grounding modes and feeder types, and the detailed parameters are shown in Table 1. Among them, the parameters of the overhead lines are as follows: R1 = 0.17 Ω/km, L1 = 1.21 mH/km, C1 = 0.0097 μF/km, R0 = 0.23 Ω/km, L0 = 5.48 mH/km, and C0 = 0.006 μF/km, and the parameters of the cable lines are as follows: R1 = 0.098 Ω/km, L1 = 0.274 mH/km, C1 = 0.351 μF/km, R0 = 0.246 Ω/km, L0 = 0.955 mH/
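
As a worked illustration of two formulas from Section 2, the joint normalization in (1) and the cross-entropy loss in (7) can be sketched in plain Python. This is a minimal sketch rather than the authors' implementation: the function names `normalize_currents` and `cross_entropy_loss`, the toy input values, and the small `eps` guard inside the logarithm are all assumptions introduced here for illustration.

```python
import math


def normalize_currents(currents):
    """Eq. (1): divide every sample by the largest absolute value found
    over all N feeders and all D sampling points (one shared peak)."""
    peak = max(abs(sample) for feeder in currents for sample in feeder)
    if peak == 0.0:
        raise ValueError("all samples are zero, so Eq. (1) is undefined")
    return [[sample / peak for sample in feeder] for feeder in currents]


def cross_entropy_loss(predicted, actual, eps=1e-12):
    """Eq. (7): L = -(1/Q) * sum_q sum_n P*_{n,q} * log(P_{n,q}).

    predicted[q][n] is P_{n,q}; actual[q][n] is P*_{n,q} (one-hot here).
    eps is an assumed numerical guard against log(0), not part of Eq. (7).
    """
    total = 0.0
    for p_row, t_row in zip(predicted, actual):
        for p, t in zip(p_row, t_row):
            total += t * math.log(max(p, eps))
    return -total / len(predicted)


# Hypothetical toy data: N = 3 feeders, D = 3 samples each.
currents = [[0.5, -2.0, 1.0], [0.1, 0.4, -0.2], [-0.3, 0.8, 0.0]]
normalized = normalize_currents(currents)

# Hypothetical softmax outputs for Q = 2 samples and their one-hot labels.
predicted = [[0.7, 0.2, 0.1], [0.1, 0.8, 0.1]]
actual = [[1, 0, 0], [0, 1, 0]]
loss = cross_entropy_loss(predicted, actual)

print(normalized[0])
print(round(loss, 4))
```

Because the peak in (1) is taken over all feeders jointly, the relative amplitudes between feeders are preserved after normalization, which is what the later feeder-to-feeder comparison relies on.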
3. Case study
be accurately identified as the faulty feeder. Finally, the bus fault and
line fault can be distinguished, and the faulty feeder can be correctly
shown in Table 4. It is obvious that the proposed method achieves good
Fig. 8. Zero-sequence currents and their corresponding Grad-CAM visualization results in PSCAD simulation (a) lI2, Df = 0.09 km, θf = 21.6°, Rf = 2 kΩ, ungrounded network; (b) lI3, Df = 2.19 km, θf = 334.8°, Rf = 35 Ω, ungrounded network; (c) Bus I, θf = 205.2°, Rf = 1238 Ω, ungrounded network; (d) lII1, Df = 0.25 km, θf = 72°, Rf = 1 Ω, resonant grounded network; (e) lII6, Df = 1.5 km, θf = 315°, Rf = 80 Ω, resonant grounded network; (f) Bus II, θf = 135°, Rf = 2 kΩ, resonant grounded network.

Table 4
Detection performance of the proposed method in PSCAD simulation.
Models                      Accuracy   Precision   Recall    F1
Ungrounded network          99.96 %    99.96 %     99.94 %   99.95 %
Resonant grounded network   100 %      100 %       100 %     100 %

4. Comparison

Currently, the existing faulty-feeder detection methods can be
divided into characteristic-analysis methods and deep-learning methods. To verify the superiority of the proposed method over the existing methods, one characteristic-analysis method and two deep-learning methods are selected for comparison:

(1) Method in [8]: The extracted fault features of transient energy, kurtosis, and cross-correlation distance are combined by using multiple evidence estimation method, and the fault trust degree of each feeder can be calculated based on the fusion results. The feeder with maximum fault degree is selected as the faulty feeder. Evidently, the method in [8] belongs to the characteristic-analysis methods.
(2) Method in [26]: By using continuous wavelet transform (CWT), the time–frequency gray scale images of zero-sequence currents are obtained and input to the CNN for classification, which corresponds to the deep-learning methods. The identification result of the faulty feeder is '1', and it is '0' for the healthy feeder.
(3) Method in [27]: In [27], the authors proposed a deep-learning method based on the cooperation of variational mode decomposition (VMD) and LSTM network. Three intrinsic mode functions (IMFs) are extracted from zero-sequence currents by using VMD and learned by LSTM network accordingly. Similarly, the faulty feeder can be detected based on its output results.

It is noteworthy that the method in [8] directly selects the feeder with the maximum calculated fault degree as the faulty feeder, which implies that the bus fault cannot be identified. The methods in [26] and [27] fail to compare the correlations between currents in the detection process, so they are also not suitable for distinguishing the bus faults and line faults. Furthermore, these methods mainly focus on the faulty-feeder detection in resonant grounding distribution systems. Therefore, the identification results of faulty-feeder detection by using the proposed method and other three methods are compared under line faults in the distribution networks with resonant grounding.

To evaluate the robustness to noise interference, we compare the detection accuracy of the proposed method and other three methods under noise with different SNRs ranging from 1 dB to 30 dB in PSCAD simulation. The detection results are shown in Fig. 11. It can be clearly seen that the proposed method outperforms all the compared methods under different SNR levels. In particular, the proposed method still has high detection accuracy under strong noise interference, such as an SNR of 1 dB. Therefore, the experimental results demonstrate that the proposed method has superior robustness against noise interference.

In addition to detection accuracy over the five datasets in PSCAD simulation, we also evaluate the classification metrics, such as precision, recall, and F1 score, as shown in Table 6, and the evaluation metrics in the field test are also provided. It is obvious that the proposed method has superior detection performance in terms of all the evaluation metrics on the six datasets. There is little detection performance deterioration of the proposed method when the noise increases, while the other three methods exhibit the opposite performance. Therefore, the proposed method has remarkable improvements in the detection performance of faulty-feeder identification, which can be attributed to the overall exploitation of the spatial–temporal features in the zero-sequence currents and comprehensive correlation comparison between feeders.

5. Conclusion

To fully exploit and compare the spatial–temporal features in the
Fig. 11. Comparison of the detection accuracy under different SNRs.
(1) The PToP CNN can learn the spatial features of zero-sequence currents in waveforms, and the FToF LSTM can mine the temporal dependencies in time-series data. By using the hybrid CNN-
Table 6
Detection performance of the compared methods on the six datasets.
Datasets Models SNR/dB Metrics Method [8] Method [26] Method [27] Proposed method
