4) If $d(x, y) < T_1$, return the corresponding data point to the current Canopy.
5) If $d(x, y) < T_2$, move the corresponding point out of $S_{p1c}$.
6) Repeat steps 2)-5) until $S_{p1c}$ is empty, forming K cluster centers $m_1$-$m_k$.

[Fig. 1. Flowchart of the four-stage data preprocessing: Canopy clustering yields the cluster number K and centers $m_1$-$m_k$; K-means clustering forms clusters $C_1$-$C_k$; LOF scores are compared against a threshold $LOF_{th}$; SMOTE augmentation replenishes the data, producing data sets $S_{p1}$, $S_{p3}$ and $S_{p4}$.]
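As a concrete illustration of the Canopy pass sketched in the steps above, the following is a minimal Python sketch, assuming Euclidean distance and thresholds $T_1 > T_2$; the function and its defaults are illustrative, not the paper's implementation.

```python
import numpy as np

def canopy(points, t1, t2):
    """One pass of Canopy clustering over an (n, d) array, assuming t1 > t2."""
    candidates = list(range(len(points)))   # Sp1c: indices of unassigned data
    canopies = []                           # (center, member indices) pairs
    while candidates:
        center = points[candidates[0]]      # take a point as a new Canopy center
        d = np.linalg.norm(points[candidates] - center, axis=1)
        # step 4): points with d < T1 belong to the current Canopy
        canopies.append((center, [i for i, di in zip(candidates, d) if di < t1]))
        # step 5): points with d < T2 leave the candidate set Sp1c
        candidates = [i for i, di in zip(candidates, d) if di >= t2]
    return canopies
```

The resulting number of canopies K and their centers can then seed the K-means stage shown in Fig. 1.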
Replenish a total number of $N_{di}$ generated data points for cluster $S_i$ using SMOTE, where i is the cluster index, $i \in (1, k)$. $S_{p4}$ is the final data set, made up of all the remaining clusters.
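As an illustration of this replenishment step, the sketch below uses the SMOTE implementation from the imbalanced-learn package; the function name and the way $N_{di}$ is supplied are assumptions for the example, not the paper's code.

```python
import numpy as np
from imblearn.over_sampling import SMOTE

def replenish(X, y, cluster_label, n_di, k_neighbors=5):
    """Generate n_di synthetic samples for cluster S_i (label cluster_label)."""
    target = int(np.sum(y == cluster_label)) + int(n_di)  # final size of S_i
    smote = SMOTE(sampling_strategy={cluster_label: target},
                  k_neighbors=k_neighbors)
    return smote.fit_resample(X, y)  # X, y with the cluster replenished
```

Here sampling_strategy pins the final size of the cluster's class; the per-cluster $N_{di}$ policy from the preprocessing stage would be computed upstream.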
III. GRADIENT BOOSTING DECISION TREE

Traditional fault diagnosis focuses on tuning a single model to get better classification performance, which faces two challenges: preferences and overfitting. GBDT is based on the Gradient Boosting ensemble learning method, which improves classification accuracy and generalization ability. The ensemble learning method and training process of GBDT are discussed in this section.
A. Ensemble learning method

Ensemble learning integrates several homogeneous weak learners. Bagging and Boosting are two typical ensemble learning algorithms [13]. Bagging randomly generates n training sets to train n weak learners in parallel; random forests are a typical application of the bagging method. Boosting generates different weak learners by changing the weight vector D of data set S: data with larger error in the previous learner's training gets higher weight, and is therefore more likely to be used in the next learner's training process.

The final prediction combines the predictions of all the weak learners, which reflects the idea of model integration.
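As a concrete contrast of the two schemes, the snippet below builds one ensemble of each kind with scikit-learn; the choice of decision trees as weak learners and the ensemble sizes are illustrative assumptions, not prescribed by the paper.

```python
from sklearn.ensemble import AdaBoostClassifier, BaggingClassifier
from sklearn.tree import DecisionTreeClassifier

# Bagging: n weak learners trained in parallel on n random training sets
# (random forests follow the same idea with extra feature randomness).
bagging = BaggingClassifier(DecisionTreeClassifier(max_depth=3), n_estimators=50)

# Boosting: learners are trained sequentially; samples misclassified by the
# previous learner get higher weight in the next round (AdaBoost is the
# classic weight-vector scheme described above).
boosting = AdaBoostClassifier(DecisionTreeClassifier(max_depth=1), n_estimators=50)
```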
B. Gradient boosting

Gradient Boosting is based on the Boosting idea. The new learner is the combination of the current learner and a weak learner, as shown in equation (8):

$$F_{m+1}(x) = F_m(x) + \eta_m h(x), \quad 1 \le m \le M \qquad (8)$$

where $F_m(x)$ is the current learner, $F_{m+1}(x)$ denotes the new learner, $\eta_m$ is the learning rate, and $h(x)$ is the weak learner that fits the negative gradient of the current loss function.

The specific process of Gradient Boosting is shown in equations (9) to (12):

1) First set the iteration times M, and initialize the learner, as shown in equation (9):

$$F_0(x) = \arg\min_{\gamma} \sum_{i=1}^{N} L(y_i, \gamma) \qquad (9)$$

2) Calculate the negative gradient value $\tilde{y}_i$ of the current loss function, which is the fitting target of the regression tree.
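The procedure then fits a regression tree to $\tilde{y}_i$ and updates the learner as in equation (8). Below is a minimal sketch of one full cycle, assuming squared loss (an assumption for illustration), where the negative gradient is simply the residual $y - F(x)$:

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def gradient_boost(X, y, M=100, eta=0.1):
    """Generic gradient boosting sketch for squared loss."""
    F = np.full(len(y), y.mean())            # eq (9): constant minimizing the loss
    trees = []
    for _ in range(M):                       # M boosting iterations
        residual = y - F                     # negative gradient of squared loss
        h = DecisionTreeRegressor(max_depth=3).fit(X, residual)
        F = F + eta * h.predict(X)           # eq (8): F_{m+1} = F_m + eta * h(x)
        trees.append(h)
    return trees
```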
When dealing with a binary classification problem, the GBDT is generated as follows:

1) Set the iteration times M, set the current iteration number m to 0, and initialize the first classifier, as shown in equation (13):

$$F_0(x) = 0.5 \log \frac{\sum_{i=1}^{N} y_i}{\sum_{i=1}^{N} (1 - y_i)} \qquad (13)$$

2) Calculate the negative gradient value $\tilde{y}_i$ of the current loss function, as shown in equation (14):

$$\tilde{y}_i = -\left[ \frac{\partial L(y_i, F(x_i))}{\partial F(x_i)} \right]_{F(x) = F_{m-1}(x)} \qquad (14)$$

3) Fit $\tilde{y}_i$ and calculate the parameter $\gamma_{jm}$ of each leaf node:

$$\gamma_{jm} = \frac{\sum_{x_i \in R_{jm}} \tilde{y}_i}{\sum_{x_i \in R_{jm}} (y_i - \tilde{y}_i)(1 - y_i + \tilde{y}_i)} \qquad (15)$$

where $R_{jm}$ is the j-th leaf region of the J-leaf classification tree at the m-th iteration.

4) Update the classifier model according to the leaf node parameters and the attenuation coefficient $\eta$:

$$F_m(x) = F_{m-1}(x) + \eta \sum_{j=1}^{J} \gamma_{jm} I(x \in R_{jm}) \qquad (16)$$

5) Repeat steps 2)-4) M times, incrementing the current iteration m each time; the final binary-class GBDT model is generated at the end of the iterations.
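A compact sketch of this binary loop follows, written for labels in {0, 1} (equation (13)'s 0.5 factor comes from the {-1, +1} label convention; the {0, 1} form below uses the plain log-odds). The denominator of equation (15), $(y_i - \tilde{y}_i)(1 - y_i + \tilde{y}_i)$, equals $p(1-p)$ for the logistic loss. Tree depth and the small stabilizer are assumptions for illustration.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def train_binary_gbdt(X, y, M=100, eta=0.1, max_depth=3):
    """Binary GBDT training sketch following steps 1)-5); y holds 0/1 labels."""
    f0 = np.log(y.sum() / (len(y) - y.sum()))  # step 1): log-odds init, cf. eq (13)
    F = np.full(len(y), f0)
    trees = []
    for _ in range(M):                         # step 5): M boosting rounds
        p = 1.0 / (1.0 + np.exp(-F))           # current probability estimate
        residual = y - p                       # step 2) / eq (14): negative gradient
        tree = DecisionTreeRegressor(max_depth=max_depth).fit(X, residual)
        leaf = tree.apply(X)                   # leaf region R_jm of every sample
        gamma = {}                             # step 3) / eq (15): per-leaf Newton step
        for j in np.unique(leaf):
            idx = leaf == j
            # denominator (y - y~)(1 - y + y~) equals p(1 - p) since y~ = y - p
            gamma[j] = residual[idx].sum() / ((p[idx] * (1 - p[idx])).sum() + 1e-12)
        # step 4) / eq (16): F_m = F_{m-1} + eta * sum_j gamma_jm * I(x in R_jm)
        update = {j: eta * g for j, g in gamma.items()}
        F = F + np.array([update[j] for j in leaf])
        trees.append((tree, update))
    return f0, trees
```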
In order to build a classifier that can solve multi-classification problems, the binary classifiers need to be properly combined. In the GBDT training process, the combination of "one positive class, multiple negative classes" can achieve this effect, as shown in Fig. 2.

[Fig. 2. One-vs-rest training: each binary classifier $C_1$-$C_n$ takes one class as positive and the remaining classes as negative; on test data, only the classifier forecasting "positive" gives the classification result (class i).]
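A sketch of this one-vs-rest combination, reusing the binary training loop above; the scoring helper and the argmax decision rule are illustrative assumptions consistent with Fig. 2.

```python
import numpy as np

def gbdt_score(model, X):
    """Raw additive score F(x) of one binary GBDT from the sketch above."""
    f0, trees = model
    F = np.full(len(X), f0)
    for tree, update in trees:
        F += np.array([update[j] for j in tree.apply(X)])
    return F

def train_one_vs_rest(X, y, classes, **kwargs):
    # One binary GBDT per class: class c positive, every other class negative.
    return {c: train_binary_gbdt(X, (y == c).astype(float), **kwargs)
            for c in classes}

def predict_one_vs_rest(models, X):
    # The class whose classifier gives the highest score wins (cf. Fig. 2).
    labels = np.array(list(models))
    scores = np.column_stack([gbdt_score(m, X) for m in models.values()])
    return labels[np.argmax(scores, axis=1)]
```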
After the processed features are input into the model, the device state will be predicted, which can be used to guide device operation and maintenance.

A. Feature extraction

The dissolved gas data includes the concentrations of H2, CH4, C2H6, C2H4, and C2H2. These concentrations differ greatly among gases, so the dissolved gas data should first be normalized as a feature:

$$C_{nx}(K) = \frac{C_x(K)}{\sum_{i \in S_o} C_i(K) / N} \qquad (17)$$

where $C_{nx}(K)$ is the normalized concentration of gas type K for data x, $C_x(K)$ is the un-normalized gas concentration, $S_o$ represents the data set, and N is the number of samples in $S_o$. After normalization, the features Cn(H2), Cn(CH4), Cn(C2H2), Cn(C2H4), and Cn(C2H6) are obtained.
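A minimal sketch of equation (17) with NumPy, assuming the samples of $S_o$ are stacked row-wise and the five gas columns are ordered as listed above:

```python
import numpy as np

def normalize_gas(C):
    """Eq. (17): divide each concentration by that gas's mean over the data set.

    C is an (N, 5) array whose columns are H2, CH4, C2H6, C2H4, C2H2.
    """
    return C / C.mean(axis=0)   # sum_i C_i(K) / N is the per-gas mean
```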
It has been pointed out that traditional feature extraction methods such as the IEC, Rogers, and Dornenburg ratio methods cannot reflect enough of the associations under the limit of feature dimensions [11]. The non-coding method can be used to extract an additional 9 dimensions of ratio features [14].

The 14 dimensions of data features are denoted as S14: 5 dimensions are normalized gas concentrations, reflecting the direct impact of dissolved gases, and 9 dimensions are concentration ratios, reflecting the correlations between different gases.

B. Data preprocessing and state encoding

Four-stage data preprocessing is performed on data set S using the method described in Section II.
According to IEC standards, oil-immersed transformer faults can be divided into 6 categories, which are encoded together with the normal state as shown in Table I.

TABLE I. DEVICE STATE CODE

Device State                     State Code
Normal                           0
Low temperature overheating      1
Medium temperature overheating   2
High temperature overheating     3
Partial discharge                4
Low energy discharge             5
High energy discharge            6
C. Training parameter optimization

GBDT fits the negative gradient of the loss function through multiple iterations. Finding appropriate values for the iteration times M and the learning rate η is essential for model training. PSO is a heuristic intelligent algorithm that can quickly optimize nonlinear functions [15], and it can be applied to optimizing the training parameters of GBDT. Classification accuracy on the test set is used as the evaluation function in training parameter optimization.
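A compact PSO sketch for tuning (M, η) is given below; the swarm size, inertia weight and acceleration coefficients are illustrative assumptions, as is the use of scikit-learn's GradientBoostingClassifier as the GBDT, with test-set accuracy as the fitness function.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import accuracy_score

def pso_tune(X_tr, y_tr, X_te, y_te, n_particles=10, n_iter=20):
    """Tune (M, eta) with a basic global-best PSO; bounds are assumptions."""
    rng = np.random.default_rng(0)
    lo, hi = np.array([100.0, 0.005]), np.array([5000.0, 0.5])  # (M, eta) bounds
    pos = rng.uniform(lo, hi, size=(n_particles, 2))
    vel = np.zeros_like(pos)

    def fitness(p):
        model = GradientBoostingClassifier(n_estimators=int(p[0]),
                                           learning_rate=p[1])
        return accuracy_score(y_te, model.fit(X_tr, y_tr).predict(X_te))

    pbest = pos.copy()
    pbest_val = np.array([fitness(p) for p in pos])
    g = pbest[pbest_val.argmax()].copy()        # global best (M, eta)
    w, c1, c2 = 0.7, 1.5, 1.5                   # inertia / acceleration (assumed)
    for _ in range(n_iter):
        r1, r2 = rng.random((n_particles, 2)), rng.random((n_particles, 2))
        vel = w * vel + c1 * r1 * (pbest - pos) + c2 * r2 * (g - pos)
        pos = np.clip(pos + vel, lo, hi)
        for i, p in enumerate(pos):
            v = fitness(p)
            if v > pbest_val[i]:
                pbest[i], pbest_val[i] = p.copy(), v
        g = pbest[pbest_val.argmax()].copy()
    return g
```

Each fitness call retrains a GBDT, so the swarm size and iteration count trade search quality against training time.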
V. CASE STUDY

334 cases of oil-immersed transformer historical fault data provided by a regional power grid are used for case analysis. The sample distribution is shown in Table II.

TABLE II. SAMPLE DISTRIBUTION

State Code   Total Samples   Training Samples   Test Samples
0            49              41                 8
1            47              36                 11
2            43              34                 9
3            51              41                 10
4            47              38                 9
5            49              37                 12
6            48              40                 8
Total        334             267                67

A. Data preprocessing effect testing

In order to evaluate the effect of the four-stage data preprocessing, 63 randomly selected samples in the training data are used to simulate noise under actual conditions: for each sample, at least one feature dimension among C(H2), C(CH4), C(C2H2), C(C2H4), and C(C2H6) is randomly selected, and the selected feature is replaced by zero, the maximum value, or a random value between zero and the maximum value of this feature dimension over all the training data.
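A sketch of this noise simulation, assuming the training features are an (N, 5) array of raw gas concentrations; the random seed and helper name are illustrative:

```python
import numpy as np

def inject_noise(X, n_noisy=63, seed=0):
    """Corrupt n_noisy random samples as described above and return the rows."""
    rng = np.random.default_rng(seed)
    X = X.copy()
    col_max = X.max(axis=0)                       # per-feature max over training data
    rows = rng.choice(len(X), size=n_noisy, replace=False)
    for r in rows:
        n_dims = rng.integers(1, X.shape[1] + 1)  # at least one feature dimension
        cols = rng.choice(X.shape[1], size=n_dims, replace=False)
        for c in cols:
            # zero, the maximum, or a random value between zero and the maximum
            X[r, c] = rng.choice([0.0, col_max[c], rng.uniform(0.0, col_max[c])])
    return X, rows
```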
TABLE III. OUTLIER IDENTIFICATION OF DIFFERENT PREPROCESSING

Preprocessing   Correct Identification   Wrong Identification
LOF             34                       20
Four-stage      50                       11

The outlier identification result shows that the four-stage preprocessing has better identification accuracy. The number of data preprocessing stages is then varied, and three sets of parameters are selected to train the GBDT model.

TABLE IV. DIAGNOSTIC ACCURACY OF DIFFERENT PREPROCESSING

Preprocessing       M=1000, η=0.005   M=1000, η=0.05   M=1000, η=0.5
No preprocessing    0.791             0.851            0.821
LOF+SMOTE           0.821             0.866            0.851
Four-stage          0.836             0.910            0.881

Table IV shows that the four-stage preprocessing leads to higher diagnostic accuracy under all the training parameters.

B. Model comparison

Linear Discriminant Analysis (LDA), Back-Propagation Neural Network (BPNN), K-Nearest Neighbor (KNN), Support Vector Machine (SVM) and Random Forest (RF) have been used in related research. The fault diagnosis accuracies of the above models and of GBDT (after data preprocessing, under empirical parameters) are shown in Table V.

TABLE V. DIAGNOSTIC ACCURACY OF DIFFERENT MODELS

Label   LDA     BPNN    KNN     SVM     RF      GBDT
0       0.625   1       0.625   0.875   0.875   0.875
1       0.818   0.818   0.727   0.909   0.909   0.909
2       0.889   0.889   0.889   0.889   0.889   0.889
3       0.6     0.8     1       0.7     0.8     0.8
4       0.778   0.889   0.778   0.667   0.778   1
5       0.75    0.917   0.667   0.917   0.833   0.917
6       0.625   1       0.5     0.25    0.875   1
Total   0.731   0.896   0.746   0.761   0.851   0.910
TABLE VI. COMPUTATIONAL TIME OF DIFFERENT MODELS

            LDA     BPNN      KNN    SVM    RF        GBDT
Train (ms)  1.498   34505.4   0.99   3.49   1469.76   2124.53
Test (ms)   0.001   0.49      1.49   0.49   73.86     2.99
The diagnostic accuracy from high to low is GBDT, BPNN, RF, SVM, KNN, LDA. The comparison of the six models demonstrates the effectiveness of GBDT. The computational time of GBDT in testing is higher than that of LDA, BPNN, KNN and SVM, but still at the millisecond level. Hence, GBDT can meet the timeliness requirements of engineering practice.

C. Training parameter comparison

In order to analyze the influence of the iteration times M and the learning rate η on the accuracy of the GBDT model, the comparison of diagnostic accuracy under different training parameters is shown in Table VII and Fig. 3.
TABLE VII. DIAGNOSTIC ACCURACY UNDER DIFFERENT TRAINING PARAMETERS

          η=0.005   η=0.05              η=0.27        η=0.5
M=100     0.791     0.836               0.851         0.881
M=840     0.836     0.910               0.955 (PSO)   0.896
M=1000    0.836     0.910 (Empirical)   0.925         0.881
M=5000    0.851     0.881               0.910         0.866

Fig. 3. Diagnostic accuracy (ACC) under different training parameters M and η.
The fault diagnosis accuracy optimized by PSO (M = 840, η = 0.27, ACC = 0.955) is a local optimum and is better than that under the empirical parameters. Compared with the empirical parameters (M = 1000, η = 0.05, ACC = 0.910), the model performance after optimization increased by 4.95%.

VI. CONCLUSION

An oil-immersed transformer fault diagnosis method based on four-stage data preprocessing and GBDT is proposed in this paper. The four-stage preprocessing method reduces the impact of noise on diagnostic results, takes data imbalance into account, and can effectively find and replace outliers. The concept and implementation of the gradient boosting decision tree model are then presented, which achieves better accuracy than traditional fault diagnosis models.

The whole process of the proposed oil-immersed transformer fault diagnosis method is summarized, consisting of feature extraction, data preprocessing, state encoding and training optimization. Case studies demonstrate the effectiveness of the four-stage data preprocessing, the GBDT model and PSO optimization. Compared with traditional fault diagnosis, the proposed method makes improvements in anti-noise capability, diagnostic accuracy and model parameter determination.

REFERENCES

[1] Y. Zhang, S. L. Ho, and W. Fu, "Applying response surface method to oil-immersed transformer cooling system for design optimization," IEEE Transactions on Magnetics, vol. 54, no. 11, pp. 1-5, 2018.
[2] Z. Liu, B. Song, E. Li, et al., "Study of 'code absence' in the IEC three-ratio method of dissolved gas analysis," IEEE Electrical Insulation Magazine, vol. 31, no. 6, pp. 6-12, 2015.
[3] I. B. M. Taha, H. G. Zaini, and S. S. M. Ghoneim, "Comparative study between Dornenburg and Rogers methods for transformer fault diagnosis based on dissolved gas analysis using Matlab Simulink tools," IEEE Conference on Energy Conversion (CENCON), 2015, pp. 363-367.
[4] J. Lin, G. Sheng, Y. Yan, et al., "Online monitoring data cleaning of transformer considering time series correlation," IEEE/PES Transmission and Distribution Conference and Exposition (T&D), 2018, pp. 1-9.
[5] X. Liang, Y. Wang, H. Li, et al., "Power transformer abnormal state recognition model based on improved K-means clustering," IEEE Electrical Insulation Conference (EIC), 2018, pp. 327-330.
[6] D. Tanır and F. Nuriyeva, "An effective method determining the initial cluster centers for K-means for clustering gene expression data," International Conference on Computer Science and Engineering (UBMK), 2017, pp. 751-754.
[7] A. McCallum, K. Nigam, and L. H. Ungar, "Efficient clustering of high-dimensional data sets with application to reference matching," International Conference on Knowledge Discovery and Data Mining (KDD), 2000, pp. 169-178.
[8] M. Bartłomiejczyk, M. Gutten, and Š. Hamacek, "Analysis of transformer state by fuzzy TOPSIS and AHP method," Proceedings of the International Scientific Conference on Electric Power Engineering (EPE), 2014, pp. 451-456.
[9] S. Tang, G. Peng, and Z. Zhong, "An improved fuzzy C-means clustering algorithm for transformer fault," China International Conference on Electricity Distribution (CICED), 2016, pp. 1-5.
[10] Q. Xie, H. Zeng, L. Ruan, et al., "Transformer fault diagnosis based on Bayesian network and rough set reduction theory," IEEE TENCON Spring, 2013, pp. 262-266.
[11] J. Dai, H. Song, G. Sheng, et al., "Dissolved gas analysis of insulating oil for power transformer fault diagnosis with deep belief network," IEEE Transactions on Dielectrics and Electrical Insulation, vol. 24, no. 5, pp. 2828-2835, 2017.
[12] X. Chen, H. Cui, and L. Luo, "Fault diagnosis of transformer based on random forest," International Conference on Intelligent Computation Technology and Automation, 2011, pp. 132-134.
[13] G. Ditzler, J. LaBarck, J. Ritchie, et al., "Extensions to online feature selection using bagging and boosting," IEEE Transactions on Neural Networks and Learning Systems, vol. 29, no. 9, pp. 4504-4509, 2018.
[14] C. X. Sun, Online Monitoring and Fault Diagnosis of Oil in Electrical Equipment. Science Publishing Company, 2003.
[15] L. Tong, X. Li, J. Hu, et al., "A PSO optimization scale-transformation stochastic-resonance algorithm with stability mutation operator," IEEE Access, vol. 6, pp. 1167-1176, 2018.