
Transfer Learning for Fault Diagnosis of Transmission Lines

Fatemeh Mohammadi Shakiba, Milad Shojaee, S. Mohsen Azizi and Mengchu Zhou
New Jersey Institute of Technology, Newark, NJ 07102, USA
arXiv:2201.08018v1 [cs.LG] 20 Jan 2022

ABSTRACT
Recent artificial intelligence-based methods have shown great promise in the use of neural networks
for real-time sensing and detection of transmission line faults and estimation of their locations. The
expansion of power systems, including transmission lines of various lengths, has made the fault
detection, classification, and location estimation process more challenging. Transmission line datasets
are stream data that are continuously collected by various sensors and hence require generalized and
fast fault diagnosis approaches. Newly collected datasets of voltages and currents might not have
enough accurate labels (fault and no fault) to train neural networks. In this paper, a novel transfer
learning framework based on a pre-trained LeNet-5 convolutional neural network is proposed. This
method is able to diagnose faults for different transmission line lengths and impedances by transferring
the knowledge from a source convolutional neural network to predict a dissimilar target dataset. By
transferring this knowledge, faults from various transmission lines, without enough labels, can be
diagnosed faster and more efficiently compared to the existing methods. To prove the feasibility and
effectiveness of this methodology, seven different datasets that include various lengths of transmission
lines are used. The robustness of the proposed methodology against generator voltage fluctuation and
variations in fault distance, fault inception angle, fault resistance, and phase difference between the
two generators is shown, thus proving its practical value in the fault diagnosis of transmission lines.

1. Introduction

Transmission lines (TLs) are exposed to severe atmospheric and environmental conditions such as lightning strikes, icing, tree interference, bird nesting, breakage, hurricanes, and aging, as well as human activities and lack of preservation and care. Faults interrupt the power flow and deteriorate the efficiency of power systems. Therefore, minimizing the effect of faults is a major goal in transmission lines. To provide end-users with uninterrupted electrical power, a highly reliable, fast, and accurate fault detection/classification and location estimation scheme is needed [1, 2, 3].

Several sensed physical parameters in a TL are not constant, including the generator voltages, the phase difference between the two generators, the fault resistance, the fault inception angle, the fault location, and the length of the TL. To account for these variations, artificial intelligence methods need to be developed to deal with TL datasets that include all of them [4].

Different lengths of TLs result in different impedances and hence different features for performance analysis. In general, TLs are divided into three categories based on their length: short TLs with at most 50 km length, medium TLs with 50-150 km length, and long TLs with at least 150 km length. In this study, to detect, classify, and locate the faults in these three categories, a general solution is proposed for the first time that is compatible with different transmission lines and avoids time-consuming training and high computational expenses. This work aims to show how a model's training time can be greatly reduced via transfer learning.

Neural networks (NNs) have been extensively used for TL fault diagnosis. A powerful technique to handle predictive modeling for different but related problems is transfer learning, in which partial or complete knowledge of one convolutional neural network (CNN) is transferred and reused to increase the speed of the training process and improve the performance of another CNN. In this study, a transfer learning-based CNN is used for the first time to detect, identify, and locate faults for various lengths of TLs. It will be shown that this methodology is reusable, fast, and accurate.

There are several studies that use CNNs for fault diagnosis of TLs. In [5], a self-attention CNN framework and a time-series image-based feature extraction model are presented for fault detection and classification of TLs with a length of 100 km, using a discrete wavelet transform (DWT) for denoising the faulty voltage and current signals. In [6], the authors present a customized CNN for fault detection and classification of 50 km TLs integrated with distributed generators. The work in [7] proposes a machine learning-based CNN for TLs with a length of 280 km to perform fault detection and classification, using a DWT for feature extraction. Shiddieqy et al. [8] present another methodology that considers all features of the TL faults to generate various models for robust fault detection; they then take advantage of various artificial intelligence methods, including CNNs, to achieve a 100% detection accuracy. The length of the TLs in their study is 300 km. The study in [9] presents a scheme to detect and categorize faults in power TLs with a length of 200 km using convolutional sparse autoencoders. This approach has the capability to learn extracted features from the dataset of voltage and current signals automatically for fault detection and classification. To generate feature vectors, convolutional feature mapping and mean pooling methods are applied to multi-channel signal segments.

[email protected] (F. M. Shakiba); [email protected] (M. Shojaee); [email protected] (S. M. Azizi); [email protected] (M. Zhou)
ORCID(s): 0000-0002-5465-3213 (M. Shojaee); 0000-0002-8178-2520 (S. M. Azizi); 0000-0002-5408-8752 (M. Zhou)


There are also several studies that apply transfer learning methods to time series datasets in different applications. Fawaz et al. [10] show how to transfer deep CNN knowledge for time series dataset classification. In [11], an intelligent method is proposed as a deep convolutional transfer learning network which diagnoses dynamic system faults using an unlabeled dataset. In [12], a deep learning framework is presented to achieve accurate fault diagnosis using transfer learning, which speeds up the training of the deep CNN. Shao et al. [13] propose an intelligent fault diagnosis method for a rotor-bearing system based on a modified CNN with transfer learning. The study in [14] presents a two-phase digital-twin-assisted fault diagnosis method which takes advantage of deep transfer learning and performs fault diagnosis in the development and maintenance phases. Li et al. [15] present a deep adversarial transfer learning network to diagnose new and unlabeled emerging faults in rotary machines. The contributions of this work are as follows:

1. Proposing a generalized approach for fault detection and location estimation of various TLs using the transfer learning technique for the first time; and

2. Comparing the proposed methodology with three benchmarks and showing its effectiveness in prediction performance.

The rest of the paper is organized as follows. Section 2 presents preliminary concepts used in this study, including time series classification, CNNs, and the transfer learning technique. In Section 3, the TL model used in this study as well as the feature generation approach are discussed. Section 4 describes the dataset generation procedure and the proposed transfer learning-based method. Section 5 discusses the results and compares the proposed method with three benchmarks: K-means clustering, a CNN without transfer learning, and a CNN with transfer learning but without the fine tuning process. Finally, conclusions are drawn in Section 6.

2. Preliminaries

In this section, an overview of time series data classification, CNNs, and the transfer learning procedure is provided.

2.1. Time Series Classification

In general, time series data can be defined in two different ways as described below [10].

• Definition 1. A time series is an ordered (time-dependent) set of real values such as X = [x₁, x₂, ..., xₙ], where n is the number of real values and the length of X [10].

• Definition 2. A time series dataset D is defined as D = {(x₁, y₁), ..., (xₙ, yₙ)} with length n, which consists of a collection of pairs (xᵢ, yᵢ), where xᵢ is a time series data point with its corresponding label (class) yᵢ [10].

Time series classification is defined as classifying the dataset D by considering every input xᵢ to train a classifier which maps the given inputs to the given labels based on every class variable yᵢ [10]. Sometimes D consists of a set of pairs in which the inputs and labels are vectors. In this study, the system utilizes voltage and current waveforms recorded from one end of a two-bus TL as inputs, and considers 10 types of faults as well as the non-faulty case as the labels (classes). The fast Fourier transform (FFT) is applied to both current and voltage waveforms to generate the amplitude of the main frequency component of the signals. Therefore, the dataset used to train the CNN includes seven different features: the first three are associated with voltages, the next three are associated with currents, and the last one is the zero-sequence signal, which is the average of the phase currents. The reason for including the last feature is discussed in Section 3.

2.2. Convolutional Neural Network

Having the capability of learning hierarchical features independently from the inputs, CNNs have been widely used for image datasets. CNNs have a minimal need for preprocessed data and can handle high-dimensional datasets faster and in more detail than most artificial NNs [16, 17]. In general, CNNs include three types of layer: convolution layers, pooling layers, and fully connected layers. Convolutional and pooling layers form convolution blocks which are stacked for feature extraction purposes. Fully connected layers are used as classifiers, and the output layer, which is a fully connected one, performs the classification or regression task.

The main advantages of the CNN architecture are local receptive fields, shared weights, and the pooling operation. CNNs take advantage of the concept of a local receptive field, which means that only a small focused area of the input data is connected to each node in a convolution layer. Because of this trait, the number of parameters in a CNN is reduced considerably, which in turn decreases the training computational expenses of the NN [12].

Numerous studies have applied CNNs to TL fault detection, classification, and localization problems to achieve higher accuracies. These studies are divided into two categories: methodologies that focus on image-based datasets recorded from outdoor TLs [18, 19, 20], and those that consider time-series voltage and current waveforms recorded from the generators and fed to CNNs as blocks of data points. The methodology proposed in this study belongs to the second category.

2.3. Transfer Learning

Transfer learning is proposed to solve learning problems between two or multiple domains. The combination of deep learning and transfer learning shows notable improvements in the accuracy and speed of fault diagnosis approaches.

Transfer learning consists of two steps: first, training a source neural network on a source dataset and task; and second, transferring the learned features and knowledge to a new network to help the training process on a new, related (target) dataset. Two main concepts are used in transfer learning, namely domain and task, which are defined below.

• Definition 3 (Domain [21, 22]). A domain 𝒟 = {𝒳, P(X)} consists of two elements, namely a feature space 𝒳 and a marginal probability distribution P(X), where X = {xᵢ}ᵢ₌₁ⁿ ∈ 𝒳 is a dataset in which every xᵢ ∈ ℝᴰ is sampled from this domain.

• Definition 4 (Task [21, 22]). A task 𝒯 = {𝒴, P(Y|X)} consists of two elements given a domain 𝒟 = {𝒳, P(X)}, where 𝒴 stands for the label space and P(Y|X) is the conditional probability distribution, in which Y = {yᵢ}ᵢ₌₁ⁿ is the label vector of X with yᵢ ∈ 𝒴 as the label of xᵢ.

There are two domains in transfer learning, namely a source domain 𝒟ₛ = {𝒳ₛ, Pₛ(Xₛ)} and a target domain 𝒟ₜ = {𝒳ₜ, Pₜ(Xₜ)}, where 𝒳ₛ and 𝒳ₜ are the feature spaces of the source and target domains, respectively, and Pₛ(Xₛ) and Pₜ(Xₜ) are their marginal probability distributions. Based on Definitions 3 and 4, the definition of transfer learning is presented below.

• Definition 5 (Transfer Learning [22]). Considering the source domain 𝒟ₛ, learning task 𝒯ₛ, target domain 𝒟ₜ, and learning task 𝒯ₜ, the goal of transfer learning is to promote the performance of the target predictive function fₜ(·) in 𝒟ₜ by inducing the knowledge of 𝒟ₛ and 𝒯ₛ, while 𝒟ₛ ≠ 𝒟ₜ or 𝒯ₛ ≠ 𝒯ₜ.

In the case of 𝒳ₛ = 𝒳ₜ, a common subproblem of transfer learning called domain adaptation takes place. In this study, the relationship between the source and target data is domain adaptation, because the seven aforementioned features (currents, voltages, and zero-sequence current) repeat in both the source and target TLs, which only differ in length [23].

Transfer learning provides the ability to distribute learned features across different learning applications. Its focus is on the deep learning training step, to enhance common feature extraction and adaptation among multiple datasets of a similar problem. Transfer learning is based on using layers pretrained on a source task to solve a target task. For this purpose, a pretrained model whose fully connected layers are cut off is used, and its convolutional and pooling layers are frozen (their weights are not updated) to play the role of feature extractors. To adapt the given pretrained network to the target dataset and achieve efficient target classification, the fully connected layers (the classifier section) need to update their weights.

There are two different approaches in transfer learning: feature extraction and fine tuning. In the first approach, the fully connected layers are completely removed and, based on the target dataset, new fully connected layers are integrated with the frozen pretrained layers. In the fine tuning process, the structure of the fully connected layers from the pretrained model is kept and only their weights are updated. Fig. 1 shows the basic concept of transfer learning: two different datasets with similarities are fed into source and target models, and the knowledge is transferred to the target model to make the training of the target task faster. In this study, the fine tuning approach is used to achieve satisfactory results. The transfer learning method is used here for TL fault diagnosis problems to detect the rare occurrence of failures which are difficult or impossible to label. It should be noted that a transfer learning-based CNN has not been used before for TL fault diagnosis problems.

[Figure 1 shows a typical transfer learning model: a source model (convolutional feature extractor plus fully connected classifier) trained on the source domain and source task, whose layer weights are transferred as knowledge to a target model of the same structure trained on the target domain and target task.]

Figure 1: A typical model of transfer learning.
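
To make the two approaches concrete, the following is a minimal Keras sketch of feature extraction versus fine tuning; the layer sizes and names are illustrative placeholders rather than the exact networks used later in this paper.

from tensorflow import keras

# Illustrative source model: a small convolutional feature extractor
# followed by a fully connected classifier.
source = keras.Sequential([
    keras.layers.Conv2D(32, 3, padding="same", activation="relu",
                        input_shape=(7, 7, 1), name="conv"),
    keras.layers.AveragePooling2D(2, name="pool"),
    keras.layers.Flatten(name="flat"),
    keras.layers.Dense(84, activation="relu", name="fc"),
    keras.layers.Dense(11, activation="softmax", name="out"),
])
# ... train `source` on the source-domain dataset ...

# Approach 1 -- feature extraction: cut off the fully connected layers,
# freeze the pretrained feature extractor, and attach a brand-new classifier.
extractor = keras.Sequential(source.layers[:3])  # conv + pool + flatten, shared weights
extractor.trainable = False
feature_extraction_model = keras.Sequential([
    extractor,
    keras.layers.Dense(84, activation="relu"),   # new head, random initialization
    keras.layers.Dense(11, activation="softmax"),
])

# Approach 2 -- fine tuning: keep the classifier's structure and start from
# the transferred weights; only the fully connected layers are updated.
fine_tuning_model = keras.models.clone_model(source)
fine_tuning_model.set_weights(source.get_weights())
for layer in fine_tuning_model.layers[:3]:       # feature extractor stays frozen
    layer.trainable = False

Either model is then compiled and fit on the target dataset as usual; the difference is only in how much of the classifier is inherited from the source network.
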
3. Transmission Line Model and Feature Generation

In this study, a power system with two generators connected through a 100 km three-phase TL is used. Based on some recent studies [24, 25, 26, 5], a common length is chosen which stands in the medium range of transmission line lengths and whose multiples can reside in the short or long ranges. The voltage of both generators is 240 kV and their frequency is 60 Hz, as shown in Fig. 2. This model is simulated in MATLAB Simulink's Simscape Power Systems, and all ten short-circuit faults, i.e., single line-to-ground (LG), line-to-line (LL), double line-to-ground (LLG), and all-lines-connected(-to-ground) (LLL/LLLG), as well as the no-fault state, are taken into account.

[Figure 2 shows the two-generator system: generators G1 and G2 with loads at Bus1 and Bus2, a three-phase TL of length L (phases A, B, and C) with a fault on the line, and a measurement device at the second bus.]

Figure 2: A three-phase and two-generator power system.

[Figure 3 shows the LeNet-5 structure: a 7×7 input layer; convolutional layers acting as the feature extractor (32 feature maps with 5×5 convolutions and 2×2 subsampling, then 48 feature maps with 2×2 convolutions and 1×1 subsampling); and fully connected layers of 256 and 84 neurons acting as the classifier, ending in an output layer with 11 neurons for classification or 1 neuron for location estimation.]

Figure 3: The structure of LeNet-5.

Table 1
TL nominal parameters.

Parameter   | Zero Sequence | Positive Sequence
R (Ω/km)    | 0.3864        | 0.01273
L (mH/km)   | 4.1264        | 0.9337
C (µF/km)   | 7.751 × 10⁻³  | 12.74 × 10⁻³

Table 2
Source and load nominal parameters.

Nominal Parameter                | Source 1 | Source 2 | Load
Phase-to-phase voltage (kV)      | 240      | 240      | 240
Frequency (Hz)                   | 60       | 60       | 60
Resistance (Ω)                   | 0.08929  | 0.08929  | —
Inductance (mH)                  | 16.58    | 16.58    | —
Active power (kW)                | —        | —        | 100
Inductive reactive power (kVAR)  | —        | —        | < 100
Capacitive reactive power (kVAR) | —        | —        | < 100

Table 3
Parameter values for the generation of the training dataset.

Parameter                               | Variations
Fault distance (km)                     | 1.2, 10, 24, 40, 60, 95
Fault inception angle (°)               | 1, 20, 50, 100, 150
Fault resistance R_f (Ω)                | 0.1, 1, 10, 20, 30, 40, 50, 60
Phase difference Δφ (°)                 | −30, 0, 30
Voltage fluctuations ΔV (kV) = V₁ − V₂  | −40, 0, 40

The parameters of this TL model are shown in Table 1, and Table 2 presents the features of the generators. The model studied in this paper is based on the IEEE 39-bus system, which includes 10 generators and 46 lines.

The general datasets used in this study consist of the amplitudes of the main harmonics of the voltage and current waveforms, calculated by the FFT, which computes the frequency components faster and more efficiently than a direct evaluation of the discrete Fourier transform. For this purpose, after each fault incident, 1.5 cycles of the voltage and current waveforms are selected. Then, their main harmonics are computed, normalized, and fed into the fault diagnosis module. To assess the robustness of the proposed methodology, variations in the TL model are considered, such as the fault type, location, inception angle, and resistance, the voltage amplitudes of the generators, and the phase difference between them. Such variations help generate datasets that are large enough to produce reliable results. In general, the dataset used in this work includes 7 features: 3 phase voltages, 3 phase currents, and a zero-sequence current, which serves as a detector for ground faults. In other words, a fault diagnosis model cannot distinguish between LL and LLG faults by considering only the phase voltage and current signal values; in such cases, the zero-sequence current, which is the average of the phase currents, is used to detect ground faults.

Table 3 shows the parameters with the variations used to generate robust results. It should be noted that, because the length of the TL is the critical parameter in this study, the fault-location variation is defined as a parameter dependent on the TL length (L = 100 km); for different lengths, the initial fault distances for the reference dataset differ accordingly.

4. Proposed Fault Diagnosis Approach

In this section, the proposed method based on a transferred CNN is described. Acquiring data from one end of a TL is the initial step in fault detection/classification and location estimation. As shown in Fig. 2, the sensing/measurement device is placed at the second bus to record the current and voltage of the three-phase TL. This process is performed via a 30-sample time window with a sampling frequency of 1.2 kHz, and the FFT is applied to each window at every step to extract the features.

After normalizing the extracted features, the data points are fed to the fault diagnosis and location estimation networks. In the fault diagnosis process, 4 outputs are generated for each data point, associated with phases A, B, and C and the ground G. When there is no fault, all 4 outputs are zero, and an output switching to one indicates the faulty state of that phase. If the outputs are not all zero, then at least two of them are one, which shows the connection of those two outputs together (or to the ground). Therefore, in the output layer of the CNN, eleven states can occur: 0000 (or 1111), 0011, 0110, 1100, 1001, 0101, 1010, 0111, 1011, 1101, and 1110.
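
As an illustration of this feature-generation step, the sketch below builds one 7-feature data point from a 30-sample window of simulated three-phase signals. The placeholder waveforms and the nearest-bin choice are assumptions for illustration, not the paper's exact simulation code.

import numpy as np

FS = 1200.0   # sampling frequency: 1.2 kHz
F0 = 60.0     # fundamental frequency: 60 Hz
WINDOW = 30   # 30-sample time window (1.5 cycles of 60 Hz)

def main_harmonic_amplitude(window):
    """Amplitude of the main-frequency component of one window via the FFT.

    With a 30-sample window, the fundamental does not fall exactly on an
    FFT bin, so the bin nearest to 60 Hz is used here as the feature value.
    """
    spectrum = np.fft.rfft(window)
    freqs = np.fft.rfftfreq(len(window), d=1.0 / FS)
    k = int(np.argmin(np.abs(freqs - F0)))
    return 2.0 * np.abs(spectrum[k]) / len(window)

def feature_vector(va, vb, vc, ia, ib, ic):
    """7 features: 3 voltage amplitudes, 3 current amplitudes, zero-sequence current."""
    i0 = (ia + ib + ic) / 3.0   # zero-sequence current: the ground-fault detector
    return np.array([main_harmonic_amplitude(s)
                     for s in (va, vb, vc, ia, ib, ic, i0)])

# Placeholder 30-sample windows standing in for the recorded phase signals.
t = np.arange(WINDOW) / FS
va = 240e3 * np.sin(2 * np.pi * F0 * t)
vb, vc = np.roll(va, 10), np.roll(va, 20)
ia = ib = ic = 1e2 * np.sin(2 * np.pi * F0 * t)
print(feature_vector(va, vb, vc, ia, ib, ic).shape)   # -> (7,)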


Table 4
The average accuracy results of K-means clustering for 11 types of faults based on statistical testing.

Length (km)  | 12.5         | 25           | 50           | 100          | 200          | 400          | 800          | Average
Accuracy (%) | 82.42 ± 0.11 | 82.47 ± 0.07 | 83.47 ± 0.15 | 85.09 ± 0.16 | 87.12 ± 0.10 | 87.67 ± 0.09 | 83.87 ± 0.17 | 84.58 ± 0.12

A parallel process using another CNN is performed for the location estimation of the faults. In this system, the input dataset is the same as that of the classification procedure; however, the outputs are the locations of the faults, which are real numbers, not binary values.

The detection, classification, and location estimation of TL faults are done by considering the 7-dimensional input dataset divided into 7 × 7 small blocks. The features are extracted by striding over these blocks and performing the convolution computation for each window. For this purpose, one of the earliest pretrained CNNs, called LeNet-5 [27], is designed using the Keras library [28] in Python. LeNet-5 is chosen for this study because of its simple and straightforward architecture: 2 sets of convolutional and average-pooling layers followed by a flatten layer, 2 fully connected layers, and ultimately a softmax classifier. To obtain high accuracy, the ReLU activation function is used for all the layers except the output, which uses the softmax function for classification and a linear regression function for location estimation. This architecture is depicted in Fig. 3. As shown in this figure, the input data points are given as 7 × 7 matrices to the network in order to emulate image behavior, and the output layer consists of 11 neurons (equal to the number of classes) for classification or one neuron for the location estimation task.

To apply the transfer learning method to the TL fault diagnosis problem, the LeNet-5 network is trained with the dataset of a TL with length L, which is initially equal to 100 km. Then, the weights of the convolutional and pooling layers of this network are saved as ".npy" files, and a new LeNet-5 with the same architecture uses these files to apply the saved weights to the corresponding layers. Therefore, the weights of the convolutional and pooling layers in the new LeNet-5 are frozen and equal to those of the first four trained layers of the initial LeNet-5. These 4 layers perform the feature extraction, and the remaining layers, which are free to be updated based on the new datasets (for TLs with the different lengths L/8 = 12.5, L/4 = 25, L/2 = 50, 2L = 200, 4L = 400, and 8L = 800 km), perform the classification and adaptation tasks. As shown in the next section, this process reduces the training time considerably compared to the case where a new CNN is trained individually for each specific TL length.
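
A minimal Keras sketch of this transfer step is given below. The layer sizes follow Fig. 3, but the file names, optimizer settings, and the commented-out training calls are illustrative assumptions; pooling layers carry no trainable weights, so only the convolutional weights need to be saved.

import numpy as np
from tensorflow import keras

def build_lenet5(n_outputs=11, final_activation="softmax"):
    """LeNet-5-style network for the 7x7 single-channel input blocks (Fig. 3)."""
    return keras.Sequential([
        keras.layers.Conv2D(32, 5, padding="same", activation="relu",
                            input_shape=(7, 7, 1), name="conv1"),
        keras.layers.AveragePooling2D(2, name="pool1"),
        keras.layers.Conv2D(48, 2, padding="same", activation="relu", name="conv2"),
        keras.layers.AveragePooling2D(2, name="pool2"),
        keras.layers.Flatten(),
        keras.layers.Dense(256, activation="relu"),
        keras.layers.Dense(84, activation="relu"),
        keras.layers.Dense(n_outputs, activation=final_activation),
    ])

# 1) Train the source network on the reference L = 100 km dataset.
source = build_lenet5()
source.compile(optimizer=keras.optimizers.Adam(3e-4),
               loss="categorical_crossentropy", metrics=["accuracy"])
# source.fit(x_source, y_source, epochs=64)

# 2) Save the feature extractor's weights as ".npy" files.
for name in ("conv1", "conv2"):
    for i, w in enumerate(source.get_layer(name).get_weights()):
        np.save(f"{name}_{i}.npy", w)

# 3) Build the target network for a new TL length, load the saved weights
#    into the corresponding layers, and freeze them; only the fully
#    connected layers remain trainable (the fine tuning approach).
target = build_lenet5()
for name in ("conv1", "conv2"):
    layer = target.get_layer(name)
    layer.set_weights([np.load(f"{name}_{i}.npy")
                       for i in range(len(layer.get_weights()))])
    layer.trainable = False
target.compile(optimizer=keras.optimizers.Adam(3e-4),
               loss="categorical_crossentropy", metrics=["accuracy"])
# target.fit(x_target, y_target, epochs=64)
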
5. Results and Discussions

This section presents the results of the proposed transfer learning-based CNN method. Simulations are run on a PC with Microsoft Windows 10, an Intel Core i7-4710MQ processor @ 2.50 GHz, and 8 GB of RAM. The Keras 2.3 library [28] with the TensorFlow 2.0 backend [29] is used to design LeNet-5, and the Scikit-learn library [30] is used for the classification modules. A LeNet-5 with the architecture depicted in Fig. 3 is used to classify the faults and estimate their locations.

To achieve reliable results, a statistical testing method is implemented and each experiment is performed 30 times. The values in the tables and figures in this section are all based on these thirty iterations of each analysis [31]. Where the values vary across runs, the variations are shown in the corresponding tables. This section is divided into two main subsections: classification and location estimation of TL faults.

5.1. Fault Classification

In this subsection, four different simulations are performed to prove the reliability and accuracy of the proposed approach. Different classification and clustering approaches are evaluated: (a) the K-means algorithm for clustering without knowing the labels, (b) a dedicated CNN (a NN that is independently and specifically trained for one TL) for each specific TL length, (c) the transfer learning method without fine tuning, and (d) the transfer learning method with fine tuning, which is the main focus of this paper.

First, a K-means clustering algorithm is implemented to categorize the faults without knowing their true labels. Then, the results are compared with the true labels to calculate the accuracy, which is reported in Table 4. The number of iterations needed for convergence across the TL length variants is between 15 and 23 in this algorithm [32].

In the second step, a dedicated CNN is used for each TL length variant. The chosen CNN is LeNet-5, which produces acceptable results and is used for the comparison with the proposed transfer learning-based method. For this purpose, 70% and 30% of the data are used for training and testing, respectively. Table 5 reports the accuracy, precision, recall, F1 score, and training time for 7 different lengths of TLs without using the transfer learning technique. The performance evaluation metrics are defined in (1)-(4):

Accuracy = (TP + TN) / (TP + TN + FP + FN)    (1)

Precision = TP / (TP + FP)    (2)

Recall = TP / (TP + FN)    (3)

F1 = 2 × (Recall × Precision) / (Recall + Precision)    (4)
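
For the 11-class problem, these counts are taken per class from a confusion matrix. The small sketch below implements (1)-(4) with macro-averaging over classes, which is an assumption about the averaging convention, since the paper does not state it.

import numpy as np

def classification_metrics(y_true, y_pred, n_classes=11):
    """Accuracy and macro-averaged precision, recall, and F1 per (1)-(4)."""
    cm = np.zeros((n_classes, n_classes), dtype=np.int64)
    for t, p in zip(y_true, y_pred):
        cm[t, p] += 1
    tp = np.diag(cm).astype(float)
    fp = cm.sum(axis=0) - tp                   # predicted as class c, but wrong
    fn = cm.sum(axis=1) - tp                   # true class c, but missed
    precision = np.mean(tp / np.maximum(tp + fp, 1.0))   # guard empty classes
    recall = np.mean(tp / np.maximum(tp + fn, 1.0))
    accuracy = tp.sum() / cm.sum()
    f1 = 2.0 * recall * precision / (recall + precision)
    return accuracy, precision, recall, f1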

Table 5
Classification results for various lengths of TLs using a dedicated LeNet-5 NN.

Length (km) | Accuracy (%) | Precision (%) | Recall (%) | F1 Score (%) | Training Time (sec)
12.5        | 99.7         | 97.61         | 97.54      | 97.57        | 3169.12
25          | 99.69        | 97.57         | 97.45      | 97.50        | 3493.55
50          | 99.46        | 95.84         | 95.58      | 95.70        | 3441.43
100         | 99.42        | 95.5          | 95.23      | 95.36        | 3167.27
200         | 99.15        | 93.38         | 93.08      | 93.22        | 2783.07
400         | 99.1         | 93.08         | 92.56      | 92.81        | 3217.381
800         | 98.62        | 89.85         | 87.83      | 88.82        | 3168.63

Table 6
Classification results for various lengths of TLs using the LeNet-5-based transfer learning method without the fine tuning process.

Length (km) | Accuracy (%) | Precision (%) | Recall (%) | F1 Score (%) | Training Time (sec)
12.5        | 90.80        | 74.06         | 50.59      | 60.11        | 1101.04
25          | 90.96        | 75.3          | 52.62      | 61.94        | 1111.66
50          | 90.89        | 72.49         | 55.37      | 62.78        | 1103.21
200         | 91.19        | 74.25         | 59.7       | 66.18        | 1123.83
400         | 90.90        | 71.99         | 56.18      | 63.10        | 1195.01
800         | 90.11        | 65.07         | 42.35      | 51.30        | 1193.73

Table 7
Classification results for various lengths of TLs using the LeNet-5-based transfer learning method with the fine tuning process.

Length (km) | Accuracy (%) | Precision (%) | Recall (%) | F1 Score (%) | Training Time (sec)
12.5        | 99.48 ± 0.04 | 95.9 ± 0.3    | 96.0 ± 0.3 | 96.0 ± 0.3   | 1307 ± 46
25          | 99.36 ± 0.04 | 95.5 ± 0.3    | 95.1 ± 0.3 | 95.3 ± 0.3   | 1414 ± 46
50          | 99.26 ± 0.03 | 94.3 ± 0.2    | 93.8 ± 0.3 | 94.0 ± 0.2   | 1328 ± 49
200         | 99.01 ± 0.03 | 92.2 ± 0.2    | 91.6 ± 0.2 | 91.9 ± 0.2   | 1406 ± 43
400         | 98.82 ± 0.04 | 91.2 ± 0.2    | 89.8 ± 0.3 | 90.5 ± 0.2   | 1371 ± 39
800         | 98.29 ± 0.05 | 87.4 ± 0.3    | 80.9 ± 0.4 | 84.1 ± 0.3   | 1483 ± 47

The provided results are obtained after 64 epochs with a 3 × 10⁻⁴ learning rate for the "adam" optimizer.

In the third step, a transfer learning-based method without the fine tuning process is used, again with 64 epochs. In other words, all layers of LeNet-5 hold on to the pretrained weights and are frozen. The purpose of this step is to show the effect of the fine tuning process on the TL fault classification results. Table 6 shows the results of this step.

In the fourth step, the proposed method is implemented using transfer learning and LeNet-5 with the fine tuning approach. Table 7 reports the fault classification results of the transfer learning method for various lengths of TLs. In order to perform a fair comparison between the transfer learning-based and non-transfer learning-based methods, the number of epochs (64) and the learning rate (3 × 10⁻⁴ for the "adam" optimizer) remain the same in all of them.

According to the results given in Tables 5 and 7, the training time of the transfer learning-based method is almost half the training time of the dedicated CNN methodology (a LeNet-5 NN without transfer learning and fine tuning), while the accuracy values are almost identical; the result is achieved with negligible loss in accuracy. As shown in Fig. 4, the difference in accuracy between the proposed transfer learning-based method and the dedicated CNN method is less than 0.5% for all the various lengths of TLs. Fig. 5 compares the training times of the transfer learning-based and non-transfer learning-based approaches: transfer learning decreases the training time of the classification to less than half of that of the dedicated CNN approach. The K-means clustering technique generates much lower accuracy (on average 85%), which is further proof of the significance and reliability of the transfer learning method when labels are missing or inadequate.

Figure 4: Accuracy comparison between three methods: the transfer learning method with and without the fine tuning process, and the dedicated CNN, for classification of TL faults.

Figure 5: Training time comparison between three methods: the transfer learning method with and without the fine tuning process, and the dedicated CNN, for classification of TL faults.
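
The ± intervals reported in Tables 4 and 7 come from repeating each experiment 30 times. The following sketch shows that statistical-testing loop, with run_experiment standing in as a placeholder for one full training and evaluation pass:

import numpy as np

N_RUNS = 30   # each experiment is repeated 30 times (Section 5)

def repeated_evaluation(run_experiment, n_runs=N_RUNS):
    """Report a metric as mean +/- standard deviation over repeated runs."""
    scores = np.array([run_experiment() for _ in range(n_runs)])
    return scores.mean(), scores.std()

# Example usage with a hypothetical per-run experiment:
# mean_acc, std_acc = repeated_evaluation(lambda: train_and_test(length_km=200))
# print(f"accuracy = {mean_acc:.2f} ± {std_acc:.2f} %")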


5.2. Fault Location Estimation

To evaluate the accuracy of the location estimation methodologies, the mean square error (MSE), defined in (5), is calculated for each length of TL individually in the three considered approaches: a dedicated CNN for each length variant of TLs, and the transfer learning-based technique with and without the fine tuning process. The number of epochs is 32 for all of these approaches.

MSE = (1/n) ∑ᵢ₌₁ⁿ (xᵢ − yᵢ)²    (5)

Table 8
Location estimation results for TL length variants using a dedicated LeNet-5 NN.

Length (km) | # Epochs | MSE (m²)    | Training Time (sec)
12.5        | 32       | 1.55 × 10⁻¹ | 2687.36
25          | 32       | 1.03 × 10⁻¹ | 2645.50
50          | 32       | 1.13 × 10⁻¹ | 2948.40
100         | 32       | 8.73 × 10⁻² | 2842.66
200         | 32       | 8.03 × 10⁻² | 2835.40
400         | 32       | 5.15 × 10⁻² | 2668.42
800         | 32       | 5.52 × 10⁻² | 2558.84

Table 9
Location estimation results for TL length variants using LeNet-5-based transfer learning without fine tuning.

Length (km) | # Epochs | MSE (m²)    | Training Time (sec)
12.5        | 32       | 3.21 × 10⁻¹ | 550.86
25          | 32       | 3.01 × 10⁻¹ | 562.62
50          | 32       | 2.98 × 10⁻¹ | 561.61
200         | 32       | 2.48 × 10⁻¹ | 555.35
400         | 32       | 2.59 × 10⁻¹ | 550.65
800         | 32       | 2.80 × 10⁻¹ | 537.17

Table 10
Location estimation results for TL length variants using the LeNet-5-based transfer learning method with fine tuning.

Length (km) | # Epochs | MSE (m²)    | Training Time (sec)
12.5        | 32       | 2.26 × 10⁻¹ | 654.99
25          | 32       | 2.00 × 10⁻¹ | 654.45
50          | 32       | 1.69 × 10⁻¹ | 643.90
200         | 32       | 1.10 × 10⁻¹ | 649.84
400         | 32       | 6.79 × 10⁻² | 632.82
800         | 32       | 9.88 × 10⁻² | 658.35
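
For the location-estimation networks behind these tables, the same transferred feature extractor is reused with a single linear output trained on the MSE objective of (5). The sketch below is a hedged illustration that reuses the build_lenet5 helper and the saved ".npy" weights from the Section 4 snippet; the learning rate is an assumption carried over from the classification runs.

import numpy as np
from tensorflow import keras
# build_lenet5 is the helper defined in the Section 4 sketch.

locator = build_lenet5(n_outputs=1, final_activation="linear")
for name in ("conv1", "conv2"):
    layer = locator.get_layer(name)
    layer.set_weights([np.load(f"{name}_{i}.npy")           # transferred weights
                       for i in range(len(layer.get_weights()))])
    layer.trainable = False                                  # frozen feature extractor

locator.compile(optimizer=keras.optimizers.Adam(3e-4), loss="mse")
# locator.fit(x_target, fault_distance, epochs=32)           # 32 epochs, as in Tables 8-10
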
Tables 8, 9, and 10 list the MSE values and the training times of these three approaches in locating the TL faults. Based on these tables, it can be concluded that the proposed transfer learning method decreases the training time of fault location estimation to about one-fourth of the training time of the dedicated CNN approach (without transfer learning). Furthermore, although the fine tuning process takes a negligible amount of time, it reduces the overall error considerably. Based on Fig. 6, the MSE difference between the two methodologies (dedicated CNN and transfer learning with fine tuning) for each length variant is less than 0.1 m², which proves the reliability of the proposed method. One general conclusion that can be drawn from Figs. 4 and 6 is that, as the difference between the TL lengths of the source and target datasets increases, the transfer learning-based accuracy decreases and the validation loss for location estimation increases, which is expected considering the feature differences. Fig. 7 shows that transfer learning with fine tuning decreases the training time of TL fault location estimation to less than one-fourth of that of the case where a dedicated CNN is applied.

Figure 6: MSE comparison between the two states of using the transfer learning method and not using it.

Figure 7: Training time comparison between the two states of using the transfer learning method and not using it for location estimation of TL faults.

Considering the test time of detection, classification, and location estimation of TL faults, the cumulative test time does not exceed 6 µsec, which satisfies the IEEE standard specifications [33].

Fig. 8 shows the accuracy variations for 6 different TLs using the transfer learning methodology and compares the training and validation accuracy values for the detection and classification steps. In Fig. 9, the training and validation loss variations are shown for all 6 TL variants. These figures prove the reliability and efficiency of the proposed approach in the location estimation of TL faults.


[Figure 8 contains six panels (a)-(f) of training and validation accuracy curves.]

Figure 8: Training and validation accuracy of TLs with lengths (a) L/8 = 12.5 km, (b) L/4 = 25 km, (c) L/2 = 50 km, (d) 2L = 200 km, (e) 4L = 400 km, and (f) 8L = 800 km, based on LeNet-5 using the transfer learning method with fine tuning, for classification of TL faults.

[Figure 9 contains six panels (a)-(f) of training and validation loss curves versus epochs.]

Figure 9: Training and validation loss of TLs with lengths (a) L/8 = 12.5 km, (b) L/4 = 25 km, (c) L/2 = 50 km, (d) 2L = 200 km, (e) 4L = 400 km, and (f) 8L = 800 km, based on LeNet-5 using the transfer learning method with fine tuning, for location estimation of TL faults.


6. Conclusion

In this paper, a generalized, high-speed, and accurate approach based on the transfer learning methodology is proposed for the first time to diagnose TL faults and estimate their locations. This approach makes use of a pretrained CNN, called LeNet-5, which is trained on the dataset of a TL with a length of L = 100 km; the weights of its feature-extractor layers are then frozen and transferred to a new, similar CNN. In the second CNN, the fully connected layers update their weights based on the new datasets specific to each length variant of TLs to produce more accurate results. The simulation results show that the training time of the transfer learning-based approach is half of the required training time of the dedicated CNN approach (without transfer learning) for the classification of TL faults, and one-fourth of it for the location estimation of TL faults. Such a remarkable reduction in training time proves the reduction in computational load and therefore in memory usage, which have always been significant issues for the associated online datasets. Moreover, the accuracy of the fault classification and the MSE of the fault location estimation are nearly identical to those given by a specifically trained (dedicated) CNN for each length of TL. These results prove the generality and reliability of the proposed transfer learning methodology for TL fault diagnosis problems.

References

[1] J. Hare et al., Fault diagnostics in smart micro-grids: A survey, Renewable & Sustainable Energy Reviews 60 (2016) 1114–1124.
[2] M. Bjelić, B. Brković, M. Žarković, T. Miljković, Fault detection in a power transformer based on reverberation time, International Journal of Electrical Power & Energy Systems 137 (2022) 107825.
[3] M. Khoshbouy, A. Yazdaninejadi, T. G. Bolandi, Transmission line adaptive protection scheme: A new fault detection approach based on pilot superimposed impedance, International Journal of Electrical Power & Energy Systems 137 (2022) 107826.
[4] J. Zheng, P. Li, K. Xu, X. Kong, C. Wang, J. Lin, C. Zhang, A distance protection scheme for HVDC transmission lines based on the steady-state parameter model, International Journal of Electrical Power & Energy Systems 136 (2022) 107658.
[5] S. R. Fahim et al., Self attention convolutional neural network with time series imaging based feature extraction for transmission line fault detection & classification, Electric Power Systems Research 187 (2020) 106437.
[6] P. Rai et al., Fault classification in power system distribution network integrated with distributed generators using CNN, Electric Power Systems Research (2020) 106914.
[7] S. Fuada et al., A high-accuracy of transmission line faults (TLFs) classification based on convolutional neural network, International Journal of Electronics & Telecommunications 66 (2020) 655–664.
[8] H. Shiddieqy et al., Power line transmission fault modeling & dataset generation for AI based automatic detection, International Symposium on Electronics & Smart Devices (ISESD) (2019) 1–5.
[9] K. Chen et al., Detection & classification of transmission line faults based on unsupervised feature learning & convolutional sparse autoencoder, IEEE Transactions on Smart Grid 9 (2016) 1748–1758.
[10] H. I. Fawaz, G. Forestier, J. Weber, L. Idoumghar, P.-A. Muller, Transfer learning for time series classification, IEEE International Conference on Big Data (Big Data) (2018) 1367–1376.
[11] L. Guo et al., Deep convolutional transfer learning network: A new method for intelligent fault diagnosis of machines with unlabeled data, IEEE Transactions on Industrial Electronics 66 (2018) 7316–7325.
[12] S. Shao et al., Highly accurate machine fault diagnosis using deep transfer learning, IEEE Transactions on Industrial Informatics 15 (2018) 2446–2455.
[13] H. Shao et al., Intelligent fault diagnosis of rotor-bearing system under varying working conditions with modified transfer convolutional neural network and thermal images, IEEE Transactions on Industrial Informatics 17 (2020) 3488–3496.
[14] Y. Xu et al., A digital-twin-assisted fault diagnosis using deep transfer learning, IEEE Access 7 (2019) 19990–19999.
[15] J. Li, R. Huang, G. He, S. Wang, G. Li, W. Li, A deep adversarial transfer learning network for machinery emerging fault detection, IEEE Sensors Journal 20 (2020) 8413–8422.
[16] D. H. Hubel, T. N. Wiesel, Receptive fields, binocular interaction & functional architecture in the cat's visual cortex, The Journal of Physiology 160 (1962) 106–154.
[17] M. Kiruthika, S. Bindu, Classification of electrical power system conditions with convolutional neural networks, Engineering, Technology & Applied Science Research 10 (2020) 5759–5768.
[18] X. Lei, Z. Sui, Intelligent fault detection of high voltage line based on the Faster R-CNN, Measurement 138 (2019) 379–385.
[19] Y. Wang et al., Image classification towards transmission line fault detection via learning deep quality-aware fine-grained categorization, Journal of Visual Communication & Image Representation 64 (2019) 102647.
[20] Z. Dai et al., Fast & accurate cable detection using CNN, Applied Intelligence 50 (2020) 4688–4707.
[21] S. J. Pan, Q. Yang, A survey on transfer learning, IEEE Transactions on Knowledge and Data Engineering 22 (2009) 1345–1359.
[22] H. Zheng et al., Cross-domain fault diagnosis using knowledge transfer strategy: a review, IEEE Access 7 (2019) 129260–129290.
[23] G. Csurka, Domain adaptation for visual applications: A comprehensive survey, arXiv preprint arXiv:1702.05374 (2017).
[24] K. M. Udofia et al., Fault detection, classification & location on 132 kV transmission line based on DWT & ANFIS, surge (VS) 7 (2020).
[25] D. AbdelAziz, Hasaneen, Detection & classification of one conductor open faults in parallel transmission line using artificial neural network, International Journal of Scientific Research & Engineering Trends 2 (2016) 139–146.
[26] M. N. Mahmud et al., A robust transmission line fault classification scheme using class-dependent feature & 2-tier multilayer perceptron network, Electrical Engineering 100 (2018) 607–623.
[27] Y. LeCun et al., LeNet-5, convolutional neural networks, URL: http://yann.lecun.com/exdb/lenet 20 (2015) 14.
[28] F. Chollet et al., Keras, 2015. URL: https://github.com/fchollet/keras.
[29] M. Abadi et al., TensorFlow: Large-scale machine learning on heterogeneous systems, 2015. URL: https://www.tensorflow.org/, software available from tensorflow.org.
[30] F. Pedregosa et al., Scikit-learn: Machine learning in Python, Journal of Machine Learning Research 12 (2011) 2825–2830.
[31] M. Haroush et al., Statistical testing for efficient out of distribution detection in deep neural networks, arXiv preprint arXiv:2102.12967 (2021).
[32] B. Zhao, Y. Ge, H. Chen, Landslide susceptibility assessment for a transmission line in Gansu Province, China by using a hybrid approach of fractal theory, information value, and random forest models, Environmental Earth Sciences 80 (2021) 1–23.
[33] A. Raza et al., A review of fault diagnosing methods in power transmission systems, Applied Sciences 10 (2020) 1312.