4) If $d(x, y) < T_1$, return the corresponding data point to the current Canopy.
5) If $d(x, y) < T_2$, move the corresponding point out of $S_{p1c}$.
6) Repeat steps 2)-5) until $S_{p1c}$ is empty, forming K cluster centers $m_1$-$m_k$.

[Fig. 1. Flowchart of the four-stage data preprocessing: Canopy clustering yields the cluster number K and centers $m_1$-$m_k$; K-means clustering forms clusters $C_1$-$C_k$; LOF scores are compared against a threshold $LOF_{th}$; SMOTE augmentation replenishes the data, producing data sets $S_{p1}$, $S_{p3}$ and $S_{p4}$.]
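As a concrete illustration of the Canopy pass sketched in the steps above, the following is a minimal Python sketch, assuming Euclidean distance and thresholds $T_1 > T_2$; the function and its defaults are illustrative, not the paper's implementation.

```python
import numpy as np

def canopy(points, t1, t2):
    """One pass of Canopy clustering over an (n, d) array, assuming t1 > t2."""
    candidates = list(range(len(points)))   # Sp1c: indices of unassigned data
    canopies = []                           # (center, member indices) pairs
    while candidates:
        center = points[candidates[0]]      # take a point as a new Canopy center
        d = np.linalg.norm(points[candidates] - center, axis=1)
        # step 4): points with d < T1 belong to the current Canopy
        canopies.append((center, [i for i, di in zip(candidates, d) if di < t1]))
        # step 5): points with d < T2 leave the candidate set Sp1c
        candidates = [i for i, di in zip(candidates, d) if di >= t2]
    return canopies
```

The resulting number of canopies K and their centers can then seed the K-means stage shown in Fig. 1.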
Replenish a total number of $N_{di}$ generated data points for cluster $S_i$ using SMOTE, where i is the cluster index, $i \in (1, k)$. $S_{p4}$ is the final data set, made up of all the remaining clusters.
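As an illustration of this replenishment step, the sketch below uses the SMOTE implementation from the imbalanced-learn package; the function name and the way $N_{di}$ is supplied are assumptions for the example, not the paper's code.

```python
import numpy as np
from imblearn.over_sampling import SMOTE

def replenish(X, y, cluster_label, n_di, k_neighbors=5):
    """Generate n_di synthetic samples for cluster S_i (label cluster_label)."""
    target = int(np.sum(y == cluster_label)) + int(n_di)  # final size of S_i
    smote = SMOTE(sampling_strategy={cluster_label: target},
                  k_neighbors=k_neighbors)
    return smote.fit_resample(X, y)  # X, y with the cluster replenished
```

Here sampling_strategy pins the final size of the cluster's class; the per-cluster $N_{di}$ policy from the preprocessing stage would be computed upstream.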
III. GRADIENT BOOSTING DECISION TREE

Traditional fault diagnosis focuses on tuning a single model to get better classification performance, which faces two challenges: preferences and overfitting. GBDT is based on the Gradient Boosting ensemble learning method, which improves classification accuracy and generalization ability. The ensemble learning method and training process of GBDT are discussed in this section.
A. Ensemble learning method

Ensemble learning integrates several homogeneous weak learners. Bagging and Boosting are two typical ensemble learning algorithms [13]. Bagging randomly generates n training sets to train n weak learners in parallel; random forests are a typical application of the bagging method. Boosting generates different weak learners by changing the weight vector D of data set S: data with larger error in the previous learner's training gets higher weight, and is therefore more likely to be used in the next learner's training process.

The final prediction combines the predictions of all the weak learners, which reflects the idea of model integration.
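As a concrete contrast of the two schemes, the snippet below builds one ensemble of each kind with scikit-learn; the choice of decision trees as weak learners and the ensemble sizes are illustrative assumptions, not prescribed by the paper.

```python
from sklearn.ensemble import AdaBoostClassifier, BaggingClassifier
from sklearn.tree import DecisionTreeClassifier

# Bagging: n weak learners trained in parallel on n random training sets
# (random forests follow the same idea with extra feature randomness).
bagging = BaggingClassifier(DecisionTreeClassifier(max_depth=3), n_estimators=50)

# Boosting: learners are trained sequentially; samples misclassified by the
# previous learner get higher weight in the next round (AdaBoost is the
# classic weight-vector scheme described above).
boosting = AdaBoostClassifier(DecisionTreeClassifier(max_depth=1), n_estimators=50)
```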
B. Gradient boosting

Gradient Boosting is based on the Boosting idea. The new learner is the combination of the current learner and a weak learner, as shown in equation (8):

$$F_{m+1}(x) = F_m(x) + \eta_m h(x), \quad 1 \le m \le M \qquad (8)$$

where $F_m(x)$ is the current learner, $F_{m+1}(x)$ denotes the new learner, $\eta_m$ is the learning rate, and $h(x)$ is the weak learner that fits the negative gradient of the current loss function.

The specific process of Gradient Boosting is shown in equations (9) to (12):

1) First set the iteration times M, and initialize the learner, as shown in equation (9):

$$F_0(x) = \arg\min_{\gamma} \sum_{i=1}^{N} L(y_i, \gamma) \qquad (9)$$

2) Calculate the negative gradient value $\tilde{y}_i$ of the current loss function, which is the fitting target of the regression tree.
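The procedure then fits a regression tree to $\tilde{y}_i$ and updates the learner as in equation (8). Below is a minimal sketch of one full cycle, assuming squared loss (an assumption for illustration), where the negative gradient is simply the residual $y - F(x)$:

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def gradient_boost(X, y, M=100, eta=0.1):
    """Generic gradient boosting sketch for squared loss."""
    F = np.full(len(y), y.mean())            # eq (9): constant minimizing the loss
    trees = []
    for _ in range(M):                       # M boosting iterations
        residual = y - F                     # negative gradient of squared loss
        h = DecisionTreeRegressor(max_depth=3).fit(X, residual)
        F = F + eta * h.predict(X)           # eq (8): F_{m+1} = F_m + eta * h(x)
        trees.append(h)
    return trees
```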
When dealing with a binary classification problem, the GBDT is generated as follows:

1) Set the iteration times M, set the current iteration number m to 0, and initialize the first classifier, as shown in equation (13):

$$F_0(x) = 0.5 \log \frac{\sum_{i=1}^{N} y_i}{\sum_{i=1}^{N} (1 - y_i)} \qquad (13)$$

2) Calculate the negative gradient value $\tilde{y}_i$ of the current loss function, as shown in equation (14):

$$\tilde{y}_i = -\left[ \frac{\partial L(y_i, F(x_i))}{\partial F(x_i)} \right]_{F(x) = F_{m-1}(x)} \qquad (14)$$

3) Fit $\tilde{y}_i$ and calculate the parameter $\gamma_{jm}$ of each leaf node:

$$\gamma_{jm} = \frac{\sum_{x_i \in R_{jm}} \tilde{y}_i}{\sum_{x_i \in R_{jm}} (y_i - \tilde{y}_i)(1 - y_i + \tilde{y}_i)} \qquad (15)$$

where $R_{jm}$ is the j-th leaf region of the J-leaf classification tree at the m-th iteration.

4) Update the classifier model according to the leaf node parameters and the attenuation coefficient $\eta$:

$$F_m(x) = F_{m-1}(x) + \eta \sum_{j=1}^{J} \gamma_{jm} I(x \in R_{jm}) \qquad (16)$$

5) Repeat steps 2)-4) M times, incrementing the current iteration m each time; the final binary-class GBDT model is generated at the end of the iterations.
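A compact sketch of this binary loop follows, written for labels in {0, 1} (equation (13)'s 0.5 factor comes from the {-1, +1} label convention; the {0, 1} form below uses the plain log-odds). The denominator of equation (15), $(y_i - \tilde{y}_i)(1 - y_i + \tilde{y}_i)$, equals $p(1-p)$ for the logistic loss. Tree depth and the small stabilizer are assumptions for illustration.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def train_binary_gbdt(X, y, M=100, eta=0.1, max_depth=3):
    """Binary GBDT training sketch following steps 1)-5); y holds 0/1 labels."""
    f0 = np.log(y.sum() / (len(y) - y.sum()))  # step 1): log-odds init, cf. eq (13)
    F = np.full(len(y), f0)
    trees = []
    for _ in range(M):                         # step 5): M boosting rounds
        p = 1.0 / (1.0 + np.exp(-F))           # current probability estimate
        residual = y - p                       # step 2) / eq (14): negative gradient
        tree = DecisionTreeRegressor(max_depth=max_depth).fit(X, residual)
        leaf = tree.apply(X)                   # leaf region R_jm of every sample
        gamma = {}                             # step 3) / eq (15): per-leaf Newton step
        for j in np.unique(leaf):
            idx = leaf == j
            # denominator (y - y~)(1 - y + y~) equals p(1 - p) since y~ = y - p
            gamma[j] = residual[idx].sum() / ((p[idx] * (1 - p[idx])).sum() + 1e-12)
        # step 4) / eq (16): F_m = F_{m-1} + eta * sum_j gamma_jm * I(x in R_jm)
        update = {j: eta * g for j, g in gamma.items()}
        F = F + np.array([update[j] for j in leaf])
        trees.append((tree, update))
    return f0, trees
```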
In order to build a classifier that can solve multi-classification problems, the binary classifiers need to be properly combined. In the GBDT training process, the combination of "one positive class, multiple negative classes" can achieve this effect, as shown in Fig. 2.

[Fig. 2. One-vs-rest training: each binary classifier $C_1$-$C_n$ takes one class as positive and the remaining classes as negative; on test data, only the classifier forecasting "positive" gives the classification result (class i).]
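A sketch of this one-vs-rest combination, reusing the binary training loop above; the scoring helper and the argmax decision rule are illustrative assumptions consistent with Fig. 2.

```python
import numpy as np

def gbdt_score(model, X):
    """Raw additive score F(x) of one binary GBDT from the sketch above."""
    f0, trees = model
    F = np.full(len(X), f0)
    for tree, update in trees:
        F += np.array([update[j] for j in tree.apply(X)])
    return F

def train_one_vs_rest(X, y, classes, **kwargs):
    # One binary GBDT per class: class c positive, every other class negative.
    return {c: train_binary_gbdt(X, (y == c).astype(float), **kwargs)
            for c in classes}

def predict_one_vs_rest(models, X):
    # The class whose classifier gives the highest score wins (cf. Fig. 2).
    labels = np.array(list(models))
    scores = np.column_stack([gbdt_score(m, X) for m in models.values()])
    return labels[np.argmax(scores, axis=1)]
```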
After the processed features are input into the model, the device state will be predicted, which can be used to guide device operation and maintenance.

A. Feature extraction

The dissolved gas data includes the concentrations of H2, CH4, C2H6, C2H4, and C2H2. These concentrations differ greatly among gases, so the dissolved gas data should first be normalized as a feature:

$$C_{nx}(K) = \frac{C_x(K)}{\sum_{i \in S_o} C_i(K) / N} \qquad (17)$$

where $C_{nx}(K)$ is the normalized concentration of gas type K for data x, $C_x(K)$ is the un-normalized gas concentration, $S_o$ represents the data set, and N is the number of samples in $S_o$. After normalization, the features Cn(H2), Cn(CH4), Cn(C2H2), Cn(C2H4), and Cn(C2H6) are obtained.
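A minimal sketch of equation (17) with NumPy, assuming the samples of $S_o$ are stacked row-wise and the five gas columns are ordered as listed above:

```python
import numpy as np

def normalize_gas(C):
    """Eq. (17): divide each concentration by that gas's mean over the data set.

    C is an (N, 5) array whose columns are H2, CH4, C2H6, C2H4, C2H2.
    """
    return C / C.mean(axis=0)   # sum_i C_i(K) / N is the per-gas mean
```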
It has been pointed out that traditional feature extraction methods such as the IEC, Rogers, and Dornenburg ratio methods cannot reflect enough of the associations under the limit of feature dimensions [11]. The non-coding method can be used to extract an additional 9 dimensions of ratio features [14].

The 14 dimensions of data features are denoted as S14: 5 dimensions are normalized gas concentrations, reflecting the direct impact of dissolved gases, and 9 dimensions are concentration ratios, reflecting the correlations between different gases.

B. Data preprocessing and state encoding

Four-stage data preprocessing is performed on data set S using the method described in Section II.
According to IEC standards, oil-immersed transformer faults can be divided into 6 categories, which are encoded together with the normal state as shown in Table I.

TABLE I. DEVICE STATE CODE

Device State                     State Code
Normal                           0
Low temperature overheating      1
Medium temperature overheating   2
High temperature overheating     3
Partial discharge                4
Low energy discharge             5
High energy discharge            6
C. Training parameter optimization

GBDT fits the negative gradient of the loss function through multiple iterations. Finding appropriate values for the iteration times M and the learning rate η is essential for model training. PSO is a heuristic intelligent algorithm that can quickly optimize nonlinear functions [15], and it can be applied to optimizing the training parameters of GBDT. Classification accuracy on the test set is used as the evaluation function in training parameter optimization.
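A compact PSO sketch for tuning (M, η) is given below; the swarm size, inertia weight and acceleration coefficients are illustrative assumptions, as is the use of scikit-learn's GradientBoostingClassifier as the GBDT, with test-set accuracy as the fitness function.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import accuracy_score

def pso_tune(X_tr, y_tr, X_te, y_te, n_particles=10, n_iter=20):
    """Tune (M, eta) with a basic global-best PSO; bounds are assumptions."""
    rng = np.random.default_rng(0)
    lo, hi = np.array([100.0, 0.005]), np.array([5000.0, 0.5])  # (M, eta) bounds
    pos = rng.uniform(lo, hi, size=(n_particles, 2))
    vel = np.zeros_like(pos)

    def fitness(p):
        model = GradientBoostingClassifier(n_estimators=int(p[0]),
                                           learning_rate=p[1])
        return accuracy_score(y_te, model.fit(X_tr, y_tr).predict(X_te))

    pbest = pos.copy()
    pbest_val = np.array([fitness(p) for p in pos])
    g = pbest[pbest_val.argmax()].copy()        # global best (M, eta)
    w, c1, c2 = 0.7, 1.5, 1.5                   # inertia / acceleration (assumed)
    for _ in range(n_iter):
        r1, r2 = rng.random((n_particles, 2)), rng.random((n_particles, 2))
        vel = w * vel + c1 * r1 * (pbest - pos) + c2 * r2 * (g - pos)
        pos = np.clip(pos + vel, lo, hi)
        for i, p in enumerate(pos):
            v = fitness(p)
            if v > pbest_val[i]:
                pbest[i], pbest_val[i] = p.copy(), v
        g = pbest[pbest_val.argmax()].copy()
    return g
```

Each fitness call retrains a GBDT, so the swarm size and iteration count trade search quality against training time.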
V. CASE STUDY

334 cases of oil-immersed transformer historical fault data provided by a regional power grid are used for case analysis. The sample distribution is shown in Table II.

TABLE II. SAMPLE DISTRIBUTION

State Code   Total Samples   Training Samples   Test Samples
0            49              41                 8
1            47              36                 11
2            43              34                 9
3            51              41                 10
4            47              38                 9
5            49              37                 12
6            48              40                 8
Total        334             267                67

A. Data preprocessing effect testing

In order to evaluate the effect of the four-stage data preprocessing, 63 randomly selected samples in the training data are used to simulate noise under actual conditions: for each sample, at least one feature dimension among C(H2), C(CH4), C(C2H2), C(C2H4), and C(C2H6) is randomly selected, and the selected feature is replaced by zero, the maximum value, or a random value between zero and the maximum value of this feature dimension over all the training data.
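A sketch of this noise simulation, assuming the training features are an (N, 5) array of raw gas concentrations; the random seed and helper name are illustrative:

```python
import numpy as np

def inject_noise(X, n_noisy=63, seed=0):
    """Corrupt n_noisy random samples as described above and return the rows."""
    rng = np.random.default_rng(seed)
    X = X.copy()
    col_max = X.max(axis=0)                       # per-feature max over training data
    rows = rng.choice(len(X), size=n_noisy, replace=False)
    for r in rows:
        n_dims = rng.integers(1, X.shape[1] + 1)  # at least one feature dimension
        cols = rng.choice(X.shape[1], size=n_dims, replace=False)
        for c in cols:
            # zero, the maximum, or a random value between zero and the maximum
            X[r, c] = rng.choice([0.0, col_max[c], rng.uniform(0.0, col_max[c])])
    return X, rows
```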
TABLE III. OUTLIER IDENTIFICATION OF DIFFERENT PREPROCESSING

Preprocessing   Correct Identification   Wrong Identification
LOF             34                       20
Four-stage      50                       11

The outlier identification result shows that the four-stage preprocessing has better identification accuracy. The number of data preprocessing stages is then varied, and three sets of parameters are selected to train the GBDT model.

TABLE IV. DIAGNOSTIC ACCURACY OF DIFFERENT PREPROCESSING

Preprocessing       M=1000, η=0.005   M=1000, η=0.05   M=1000, η=0.5
No preprocessing    0.791             0.851            0.821
LOF+SMOTE           0.821             0.866            0.851
Four-stage          0.836             0.910            0.881

Table IV shows that the four-stage preprocessing leads to higher diagnostic accuracy under all the training parameters.

B. Model comparison

Linear Discriminant Analysis (LDA), Back-Propagation Neural Network (BPNN), K-Nearest Neighbor (KNN), Support Vector Machine (SVM) and Random Forest (RF) have been used in related research. The fault diagnosis accuracies of the above models and of GBDT (after data preprocessing, under empirical parameters) are shown in Table V.

TABLE V. DIAGNOSTIC ACCURACY OF DIFFERENT MODELS

Label   LDA     BPNN    KNN     SVM     RF      GBDT
0       0.625   1       0.625   0.875   0.875   0.875
1       0.818   0.818   0.727   0.909   0.909   0.909
2       0.889   0.889   0.889   0.889   0.889   0.889
3       0.6     0.8     1       0.7     0.8     0.8
4       0.778   0.889   0.778   0.667   0.778   1
5       0.75    0.917   0.667   0.917   0.833   0.917
6       0.625   1       0.5     0.25    0.875   1
Total   0.731   0.896   0.746   0.761   0.851   0.910
TABLE VI. COMPUTATIONAL TIME OF DIFFERENT MODELS

            LDA     BPNN      KNN    SVM    RF        GBDT
Train (ms)  1.498   34505.4   0.99   3.49   1469.76   2124.53
Test (ms)   0.001   0.49      1.49   0.49   73.86     2.99
The diagnostic accuracy from high to low is GBDT, BPNN, RF, SVM, KNN, LDA. The comparison of the six models demonstrates the effectiveness of GBDT. The computational time of GBDT in testing is higher than that of LDA, BPNN, KNN and SVM, but still at the millisecond level. Hence, GBDT can meet the timeliness requirements of engineering practice.

C. Training parameter comparison

In order to analyze the influence of the iteration times M and the learning rate η on the accuracy of the GBDT model, the comparison of diagnostic accuracy under different training parameters is shown in Table VII and Fig. 3.
TABLE VII. DIAGNOSTIC ACCURACY UNDER DIFFERENT TRAINING PARAMETERS

          η=0.005   η=0.05              η=0.27        η=0.5
M=100     0.791     0.836               0.851         0.881
M=840     0.836     0.910               0.955 (PSO)   0.896
M=1000    0.836     0.910 (Empirical)   0.925         0.881
M=5000    0.851     0.881               0.910         0.866

Fig. 3. Diagnostic accuracy (ACC) under different training parameters M and η.
The fault diagnosis accuracy optimized by PSO (M = 840, η = 0.27, ACC = 0.955) is a local optimum and is better than that under the empirical parameters. Compared with the empirical parameters (M = 1000, η = 0.05, ACC = 0.910), the model performance after optimization increased by 4.95%.

VI. CONCLUSION

An oil-immersed transformer fault diagnosis method based on four-stage data preprocessing and GBDT is proposed in this paper. The four-stage preprocessing method reduces the impact of noise on diagnostic results, takes data imbalance into account, and can effectively find and replace outliers. The concept and implementation of the gradient boosting decision tree model are then presented, which achieves better accuracy than traditional fault diagnosis models.

The whole process of the proposed oil-immersed transformer fault diagnosis method is summarized, consisting of feature extraction, data preprocessing, state encoding and training optimization. Case studies demonstrate the effectiveness of the four-stage data preprocessing, the GBDT model and PSO optimization. Compared with traditional fault diagnosis, the proposed method makes improvements in anti-noise capability, diagnostic accuracy and model parameter determination.

REFERENCES

[1] Y. Zhang, S. L. Ho, and W. Fu, "Applying response surface method to oil-immersed transformer cooling system for design optimization," IEEE Transactions on Magnetics, vol. 54, no. 11, pp. 1-5, 2018.
[2] Z. Liu, B. Song, E. Li, et al., "Study of 'code absence' in the IEC three-ratio method of dissolved gas analysis," IEEE Electrical Insulation Magazine, vol. 31, no. 6, pp. 6-12, 2015.
[3] I. B. M. Taha, H. G. Zaini, and S. S. M. Ghoneim, "Comparative study between Dornenburg and Rogers methods for transformer fault diagnosis based on dissolved gas analysis using Matlab Simulink tools," IEEE Conference on Energy Conversion (CENCON), 2015, pp. 363-367.
[4] J. Lin, G. Sheng, Y. Yan, et al., "Online monitoring data cleaning of transformer considering time series correlation," IEEE/PES Transmission and Distribution Conference and Exposition (T&D), 2018, pp. 1-9.
[5] X. Liang, Y. Wang, H. Li, et al., "Power transformer abnormal state recognition model based on improved K-means clustering," IEEE Electrical Insulation Conference (EIC), 2018, pp. 327-330.
[6] D. Tanır and F. Nuriyeva, "An effective method determining the initial cluster centers for K-means for clustering gene expression data," International Conference on Computer Science and Engineering (UBMK), 2017, pp. 751-754.
[7] A. McCallum, K. Nigam, and L. H. Ungar, "Efficient clustering of high-dimensional data sets with application to reference matching," International Conference on Knowledge Discovery and Data Mining (KDD), 2000, pp. 169-178.
[8] M. Bartłomiejczyk, M. Gutten, and Š. Hamacek, "Analysis of transformer state by fuzzy TOPSIS and AHP method," Proceedings of the International Scientific Conference on Electric Power Engineering (EPE), 2014, pp. 451-456.
[9] S. Tang, G. Peng, and Z. Zhong, "An improved fuzzy C-means clustering algorithm for transformer fault," China International Conference on Electricity Distribution (CICED), 2016, pp. 1-5.
[10] Q. Xie, H. Zeng, L. Ruan, et al., "Transformer fault diagnosis based on Bayesian network and rough set reduction theory," IEEE TENCON Spring, 2013, pp. 262-266.
[11] J. Dai, H. Song, G. Sheng, et al., "Dissolved gas analysis of insulating oil for power transformer fault diagnosis with deep belief network," IEEE Transactions on Dielectrics and Electrical Insulation, vol. 24, no. 5, pp. 2828-2835, 2017.
[12] X. Chen, H. Cui, and L. Luo, "Fault diagnosis of transformer based on random forest," International Conference on Intelligent Computation Technology and Automation, 2011, pp. 132-134.
[13] G. Ditzler, J. LaBarck, J. Ritchie, et al., "Extensions to online feature selection using bagging and boosting," IEEE Transactions on Neural Networks and Learning Systems, vol. 29, no. 9, pp. 4504-4509, 2018.
[14] C. X. Sun, Online Monitoring and Fault Diagnosis of Oil in Electrical Equipment. Science Publishing Company, 2003.
[15] L. Tong, X. Li, J. Hu, et al., "A PSO optimization scale-transformation stochastic-resonance algorithm with stability mutation operator," IEEE Access, vol. 6, pp. 1167-1176, 2018.