Li Et Al. - 2020 - Bearing Fault Feature Selection Method Based On We
ABSTRACT The rolling bearing is one of the most critical components in rotating machinery. To select features
efficiently, reduce feature dimensions and improve the accuracy of fault diagnosis, a feature
selection and fusion method based on weighted multi-dimensional feature fusion is proposed. Firstly, features
are extracted from different domains to constitute the original high-dimensional feature set. Because this
original feature set contains a large number of invalid and redundant features, a feature selection
process that combines support vector machine (SVM) single-feature evaluation, correlation analysis and
principal component analysis-weighted load evaluation (PCA-WLE) is put forward for selecting
sensitive features. The selected features are weighted and fused according to their sensitivity so as to further
weaken the interference of less important features. Finally, this process is applied to the data provided by the
Case Western Reserve University Bearing Data Center and the Xi’an Jiaotong University School of Mechanical
Engineering, respectively, and the fault is diagnosed using particle swarm optimization-support vector
machine (PSO-SVM). The results show that this method can accurately identify different bearing fault categories and
severities, and that it is superior to and more practical than single-domain fault diagnosis, with higher recognition
INDEX TERMS Features selection, feature weighting, sensitive features, fault diagnosis.
This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see
19008 VOLUME 8, 2020
Y. Li et al.: Bearing Fault Feature Selection Method Based on Weighted Multidimensional Feature Fusion
The qualitative empirical-knowledge-based method mainly depends on the accumulated experience gained during the operation of the system. According to incomplete prior experience, the operating state of the equipment is described and a qualitative model is established. The next state of the equipment is predicted by reasoning. This kind of fault diagnosis method includes the signed directed graph [12], fault tree [13], expert system [14] and so on. However, the diagnostic ability of knowledge-based fault diagnosis methods depends only on the historical experience of experts or field workers. With the acceleration of industrial upgrading and the deepening of relevant professional knowledge, the empirical knowledge required often exceeds the range that can be grasped by ordinary workers, making the method difficult to carry out in operation. This method is especially unsuitable for large industrial systems. Moreover, the above two methods are more suitable for systems with fewer input, output and state variables, and are less practical for multi-sensor and mass-acquisition data systems.

Data-driven fault diagnosis methods include: (1) statistical-based methods; (2) signal-based methods; and (3) artificial intelligence-based methods. With the rapid development of data mining, computer technology and artificial intelligence [15], data-driven fault diagnosis methods have increasingly shown their strong applicability, and often use a combination of the three methods. Based on the redundant second generation wavelet packet transform (RSGWPT), Liu et al. [16] extracted 56 features of the vibration signal and input them to a support vector machine (SVM) for fault identification; Tian et al. [17] selected permutation entropy (PE) as the fault feature, and proposed a manifold-based dynamic time warping method for fault diagnosis; Li et al. [18] selected 1634 features and classified the bearing faults using the method of fuzzy C-means with a variable focal point (FCMFP); in [19], composite multiscale fuzzy entropy (CMFE) was selected as the feature to train the ensemble support vector machine (ESVM) for fault diagnosis of rolling element bearings; in [20], the energy entropy of the intrinsic mode functions (IMFs) of the bearing vibration signal was extracted and combined with a probabilistic neural network (PNN) and a simplified fuzzy adaptive resonance theory map (SFAM) for online bearing fault diagnosis; in [21], the hierarchical symbol dynamic entropy (HSDE) was used as a sensitive feature and input to a binary tree support vector machine (BT-SVM) to effectively identify the fault of the bearing. Most of these works use statistical and signal analysis to extract the features of vibration signals and diagnose faults based on artificial intelligence. Some studies also use deep learning methods to automatically extract fault features for diagnosis. All of them have greatly advanced the fault diagnosis research of bearings.

The data-driven fault diagnosis process can be divided into four steps: signal processing, feature extraction, feature reduction, and pattern recognition [22], [23], and the first three are the foundation of the fourth step. Feature reduction includes feature selection and feature dimension reduction. Compared with feature extraction and pattern recognition, there are relatively few studies on feature reduction. On the one hand, the growing number of feature extraction methods leads to an increase in the feature vector dimension, but not all fault features have an effect on bearing fault diagnosis. The increase of invalid features is likely to make the diagnosis process more complicated and to reduce the accuracy of the diagnosis results [24]. On the other hand, different types of features have different applicability to different types of bearing failures or different stages of bearing operation [25]. Therefore, features should be simplified after feature extraction is completed, and the optimal features that maintain the intrinsic information about the faults should be retained while reducing the number of features as much as possible, so as to diagnose bearing faults effectively and efficiently. Liao et al. [26] selected two different clustering analysis methods to classify the bearing data, and used the correlation analysis method to reduce the dimensionality of the data; Yang et al. [27] extracted the fault features in the vibration signal by means of ensemble empirical mode decomposition (EEMD), and reduced them by using principal component analysis (PCA); in [28], correlation, monotonicity and robustness were selected as the evaluation indicators of the features. Using these indicators, the residual life trend of the bearing was well displayed, and the remaining service life of the bearing was effectively predicted; in [29], an adaptive feature selection technique was proposed. This technique can be used to remove redundant features and reduce the amount of computation for pattern recognition; in [3], the Hilbert time-time (HTT) transform was combined with principal component analysis to extract and reduce the bearing fault features. At present, some researchers have studied the selection and dimension reduction of bearing fault features, but there are still some deficiencies in this research. For one thing, many articles only consider a single type of fault feature, such as time-domain statistics or frequency-domain statistics, which cannot reflect fault information comprehensively. For another, the existing methods of dimensionality reduction mostly use a single method such as Linear Discriminant Analysis (LDA) or PCA, which cannot reflect the difference between samples. Moreover, these methods use mathematical means to reprocess the data. The new features are obtained by combining a plurality of original features, so the physical information cannot be directly represented to guide the subsequent equipment processes. Therefore, the selection and dimension reduction of fault features should be further explored in order to adaptively select the optimal sensitive features.

The rest of this paper is organized as follows. In Section II, the basic theories of the SVM, PCA, correlation analysis and multi-dimensional feature extraction techniques are outlined. In Section III, the specific steps of the proposed feature selection method are described in detail, and the system framework of the method is given. Experimental verifications on actual data are conducted in Section IV, and conclusions and recommendations for future work are summarized in Section V.

bearing has poor lubrication. The stability of such features is poor, and sometimes the value decreases as the degree of failure increases. In general, time-domain based fault feature extraction is still in a relatively early stage.
The information expressed by this method is not comprehensive and requires a joint distribution of time and frequency to characterize the signal.

3) ENERGY FEATURE
The measured vibration signal contains not only the operating condition information related to the bearing itself, but also a large amount of information about other rotating parts and structures in the unit equipment; relative to the former, the latter constitutes background noise [40]. Background noise is usually so large that slight bearing fault information is submerged and difficult to extract. Thus, it is hard to accurately assess the working condition of the bearing through the conventional time-domain and frequency-domain methods [41], [42]. Therefore, time-frequency analysis methods based on the Wigner-Ville Distribution (WVD) [43], Wavelet Transform (WT) [44] and Empirical Mode Decomposition (EMD) [45] have been widely used in recent years.

Time-frequency analysis can characterize the variation of the signal spectral components over time, and thus the distribution of signal strength or energy simultaneously in time and frequency. As soon as the rolling bearing breaks down, the energy of the fault feature band corresponding to the vibration signal increases significantly, so that the fault type and the fault location can be determined by identifying the characteristic frequency band in the wavelet decomposition result that includes fault information. Therefore, the deep information of the fault type can be reflected by decomposing the signal via the time-frequency method and extracting the energy characteristics in different frequency bands.

4) ENTROPY FEATURE
Entropy is a measure of information uncertainty [46]. The entropy of different frequency bands can be used to measure the uncertainty of the signal distribution state and the signal complexity, so it can quantitatively describe the information contained in the signal. According to the overall average characteristics of the signal source, entropy can manifest the complexity of the system's internal information, so the essential information of the bearing fault can be extracted based on the effective entropy value. Commonly used entropy features are Shannon entropy, index entropy [47], and permutation entropy [48]. For a discrete random variable $X$ with a sample space of $[x_1, x_2, \ldots, x_n]$, the Shannon entropy is:

$$H(X) = E\left[\log_2 \frac{1}{p(x_i)}\right] = -\sum_{i=1}^{n} p(x_i) \log_2 p(x_i) \tag{1}$$

where $p(x_i)$ represents the probability of the sample. Index entropy can avoid the case where the logarithm in the Shannon entropy is prone to undefined and zero values. Its definition is as follows:

$$H_{EXP} = -\sum_{i=1}^{n} p_i e^{(1 - p_i)} \tag{2}$$

Feature extraction according to entropy theory is applicable to environments with a high signal-to-noise ratio, but when the effective signal is completely submerged by noise, a large overlap arises between different signal entropies, making it difficult to accurately distinguish features.

One of the cores of comprehensive diagnosis and prediction of the developing fault state of a bearing is the extraction of signal fault features. It is particularly crucial to select features that can accurately represent the fault category in order to improve the accuracy of the diagnosis results. From the above analysis, the fault information displayed by different categories of features is not the same, so it is necessary to establish a high-dimensional feature set that can represent the fault state of the bearing to a large extent. In this paper, based on the different characteristics of the system, a feature set composed of time-domain and frequency-domain statistical parameters, wavelet packet decomposition energy and entropy is extracted for subsequent operations.

B. SUPPORT VECTOR MACHINE
The support vector machine (SVM) [49], [50] is a classification method based on the principle of structural risk minimization proposed by Vapnik et al. The main purpose of the SVM is not only to classify the various sample points correctly, but also to maximize the spacing between them, that is, to maximize the minimum distance between the optimal separating hyperplane and all training sample points. The principle can be described as follows:

Given an original data sample set:

$$\left\{ (x_i, y_i) \mid x_i \in R^d,\ y_i \in \{-1, +1\} \right\}, \quad i = 1, 2, \ldots, n. \tag{3}$$

where $n$ is the number of training data samples, and $x_i$ is the input of the model; $d$ represents the dimension of the training sample; $y_i$ is the sample category; $-1$ and $+1$ are category labels.

For the linearly separable case, the separation plane equation is $w \cdot x + b = 0$. The sample $(x_i, y_i)$ needs to satisfy:

$$y_i \left[ (w \cdot x_i) + b \right] \geq 1, \quad i = 1, 2, \ldots, n \tag{4}$$

where $w$ is the plane normal vector and $b$ is the constant term.

The distance between the nearest sampling point and the separation plane is $1/\|w\|$. Therefore, maximizing the spacing $1/\|w\|$ is equivalent to minimizing $\|w\|^2$. The separation line determined by $w$ is the optimal separation line, and the sample points on the lines $w \cdot x + b = \pm 1$ are called support vectors.

The Lagrange optimization method is adopted to convert it into its dual problem, namely, the maximization function:

$$\max W(\alpha) = \sum_{i=1}^{n} \alpha_i - \frac{1}{2} \sum_{i,j=1}^{n} \alpha_i \alpha_j y_i y_j \left( x_i \cdot x_j \right) \tag{5}$$

where $\alpha_i$ is the Lagrange multiplier, and $\alpha_i \geq 0$, $i = 1, \ldots, n$.

It actually aims to find the optimal solution of a quadratic function with constraints, and the sample that corresponds to the
non-zero $\alpha_i$ in the solution is a support vector, so that the optimal classification function can be obtained in this way:

$$f(x) = \operatorname{sgn}\left\{ (w^* \cdot x) + b^* \right\} = \operatorname{sgn}\left\{ \sum_{i=1}^{n} \alpha_i^* y_i (x_i \cdot x) + b^* \right\} \tag{6}$$

where $\alpha_i^*$ is the optimal Lagrange factor and $b^*$ is the classification threshold, which are the parameters for determining the optimal hyper-plane partition. The sign of the function indicates the class attribute.

Regarding the linearly inseparable case, the slack variable $\xi_i$ is introduced, so as to convert the problem of looking for the hyper-plane into a quadratic programming problem:

$$\begin{aligned} & \varphi(w) = \min \frac{1}{2} \|w\|^2 + C \sum_{i=1}^{n} \xi_i \\ & \text{s.t. } y_i \left[ (w \cdot x_i) + b \right] \geq 1 - \xi_i, \quad \xi_i \geq 0, \quad i = 1, \ldots, n \end{aligned} \tag{7}$$

where $\xi_i$ is the positive slack variable that allows misclassification, representing the deviation of the corresponding data point $x_i$ from the hyper-plane. $C$ is the penalty factor, indicating the degree of punishment for misclassification. It controls the trade-off between finding the hyper-plane with maximal spacing in the objective function and guaranteeing the minimum deviation at the data points.

For the nonlinearly separable case, the low-dimensional input space can be mapped into the high-dimensional feature space by introducing the kernel function, so as to realize linear classification after a nonlinear transformation. In this case, the classification function becomes:

$$f(x) = \operatorname{sgn}\left\{ \sum_{i=1}^{n} \alpha_i^* y_i K(x_i, x) + b^* \right\} \tag{8}$$

where $K(x_i, x)$ is the kernel function.

Replacing the inner product of the original space with a kernel function is the key to SVM. Common kernel functions [51] are as follows: 1) Linear kernel $K(x, y) = x \cdot y$; 2) Polynomial kernel $K(x, y) = [(x \cdot y) + 1]^q$; 3) Radial basis function (RBF) kernel $K(x, y) = \exp\left( -\|x - y\|^2 / 2\sigma^2 \right)$; 4) Sigmoid kernel $K(x, y) = \tanh(\alpha (x \cdot y) + b)$.

The fault diagnosis of rolling bearings is usually a multi-class identification task. In view of the better classification ability of SVM for nonlinear and small training samples, it is still widely used in the field of machine fault diagnosis. Liu et al. [52] used the SVM to verify the superiority of the method obtained by merging Minimum Entropy Deconvolution (MED) with the hierarchical fuzzy entropy; Wan et al. [53] combined the empirical wavelet transform (EWT) with multi-scale entropy to obtain new features, and input them to the SVM to improve the fault diagnosis efficiency of the bearing; in [54], a novel rolling bearing fault diagnosis strategy was proposed based on improved multiscale permutation entropy (IMPE), Laplacian score (LS) and the least squares support vector machine-quantum behaved particle swarm optimization (QPSO-LSSVM); in [55], Zhu et al. proposed a multi-scale global fuzzy entropy (MGFE) feature extraction method, introduced the multiple class feature selection (MCFS) method to filter features, and finally input them to the SVM for fault diagnosis.

However, most literature on bearing diagnostics uses SVM for pattern recognition only in the final step of its algorithm. In this paper, the SVM is directly introduced into the feature selection part to diagnose with a single feature. According to its diagnostic rate, it is judged whether the feature has strong correlation with the bearing fault information, and the invalid information is eliminated. By this method, features capable of expressing fault information in the feature set can be extracted to a greater extent. The method thus performs a first screening of the original feature set.

In the field of statistical signal processing research, correlation analysis has been the focus of scholars. The study of correlation is a method that uses two related sets of variables to reflect their overall relevance. The Pearson correlation coefficient represents the degree of linear correlation between the two sets of variables [56]. The Pearson correlation coefficient $R$ can be expressed as a formula:

$$R = \frac{\operatorname{cov}(A, B)}{\sigma_A \sigma_B} = \frac{\sum_{i=1}^{N} \left( A_i - \bar{A} \right) \left( B_i - \bar{B} \right)}{\sqrt{\sum_{i=1}^{N} \left( A_i - \bar{A} \right)^2 \sum_{i=1}^{N} \left( B_i - \bar{B} \right)^2}} \tag{9}$$

where $A$ and $B$ represent two sets of features of equal length. $N$ is the number of samples in the variable; $A_i$ and $B_i$ are the $i$-th measurements of variables $A$ and $B$; $\bar{A}$ and $\bar{B}$ are the averages of variables $A$ and $B$, respectively.

The correlation coefficient $R$ ranges from $-1$ to $+1$. When the value is 0, there is no linear correlation between the two features. If the value is in $[-1, 0)$, the two features are negatively correlated; if the value is in $(0, +1]$, the two features are positively correlated. The closer the absolute value of the correlation coefficient $R$ is to 1, the higher the degree of correlation between the two features, indicating that the two features share more duplicate information. When the absolute value of $R$ is 1, the information represented by the two features can be replaced with each other. Therefore, the larger the absolute value of $R$, the lower the significance of the corresponding feature [57]: 1) $|R| \geq 0.8$, highly correlated; 2) $0.5 \leq |R| < 0.8$, moderate correlation; 3) $0.3 \leq |R| < 0.5$, low correlation; 4) $|R| < 0.3$, weak correlation, which can be regarded as no linear correlation.

In this part, the Pearson correlation coefficient method is used to screen the selected features again. The highly correlated features within each type of feature are identified, and only one of the main features is taken as a sensitive feature. It is considered that the physical information expressed by the remaining features is basically the same as that of this sensitive feature, so they are redundant features and are excluded to achieve the purpose of the second screening.
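As a minimal sketch of this second screening, the snippet below computes the Pearson coefficient of (9) and drops any feature that is highly correlated ($|R| \geq 0.8$) with a feature already kept. The function names `pearson_r` and `drop_redundant`, the feature names, and the rule of keeping the first feature of each correlated group are illustrative assumptions, not the authors' implementation; the paper applies the screening within each feature type, which this sketch does not model.

```python
import math

def pearson_r(a, b):
    """Pearson correlation coefficient R of two equal-length sequences, as in (9)."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    num = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    den = math.sqrt(sum((x - mean_a) ** 2 for x in a) *
                    sum((y - mean_b) ** 2 for y in b))
    # A constant feature gives den == 0; this sketch does not handle that case.
    return num / den

def drop_redundant(features, threshold=0.8):
    """Keep a feature only if |R| < threshold against every feature kept so far.

    `features` maps a (hypothetical) feature name to its list of values.
    Iteration order decides which member of a correlated group survives,
    which is an assumption of this sketch.
    """
    kept = []
    for name, values in features.items():
        if all(abs(pearson_r(values, features[k])) < threshold for k in kept):
            kept.append(name)
    return kept

# Hypothetical feature values over four samples.
features = {
    "rms": [1.0, 2.0, 3.0, 4.0],
    "peak": [2.0, 4.0, 6.0, 8.1],   # nearly proportional to "rms"
    "kurtosis": [3.0, 1.0, 4.0, 1.0],
}
selected = drop_redundant(features)
```

Here `peak` is discarded because it is almost a scaled copy of `rms`, while `kurtosis` is kept because its correlation with `rms` is weak.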
D. PRINCIPAL COMPONENT ANALYSIS AND WEIGHTED LOAD EVALUATION
Principal component analysis (PCA) is a commonly used data processing and analysis method. Its purpose is to reduce the data so as to eliminate overlapping information where many pieces of information coexist [3], [58], [59].

PCA maps a high-dimensional data space to a low-dimensional space by orthogonal transform, which recombines many original features with certain correlation into a group of relatively uncorrelated integrated features. This not only retains the main information of the original variables, but also makes the new features unrelated to each other. From the mathematical point of view, the $m$-dimensional feature is mapped to $k$ ($k < m$) dimensions, and the obtained $k$-dimensional feature is the principal component feature extracted from the original data feature. This $k$-dimensional principal component feature already contains most of the information. These new variables are uncorrelated and are arranged in descending order of variance [60]. The specific analysis steps are as follows:

(1) Original data standardization: For a group of $n$ evaluation objects, there are $m$ features $X_1, X_2, \ldots, X_m$; the $j$-th indicator value of the $i$-th evaluation object is denoted $x_{ij}$, so the sample space matrix of the evaluation objects can be obtained:

$$X = (X_1, X_2, \ldots, X_m) = \begin{bmatrix} x_{11} & \cdots & x_{1m} \\ \vdots & \ddots & \vdots \\ x_{n1} & \cdots & x_{nm} \end{bmatrix} \tag{10}$$

where $X_i = (x_{1i}, x_{2i}, \ldots, x_{ni})^T$, $i = 1, 2, \ldots, m$. The various indicators $x_{ij}$ are standardized:

$$\tilde{x}_{ij} = \frac{x_{ij} - \mu_j}{S_j}, \quad i = 1, 2, \ldots, n; \ j = 1, 2, \ldots, m. \tag{11}$$

where $\mu_j = \frac{1}{n} \sum_{i=1}^{n} x_{ij}$ and $S_j = \sqrt{\frac{1}{n-1} \sum_{i=1}^{n} \left( x_{ij} - \mu_j \right)^2}$; $\mu_j$ and $S_j$ are the sample mean value and standard deviation of the $j$-th feature, respectively; then the corresponding $\tilde{X}_j = \frac{X_j - \mu_j}{S_j}$ is the standardized characteristic variable.

(2) Correlation coefficient matrix: the correlation coefficient of the standardized features is $r_{ij} = \frac{\sum_{k=1}^{n} \tilde{x}_{ki} \cdot \tilde{x}_{kj}}{n-1}$, $i, j = 1, 2, \ldots, m$. The correlation coefficient matrix is composed as $R = \left( r_{ij} \right)_{m \times m}$, with $r_{ii} = 1$ and $r_{ij} = r_{ji}$.

(3) Computation of eigenvalues and eigenvectors: according to the correlation coefficient matrix, the eigenvalues $\lambda_1 \geq \lambda_2 \geq \ldots \geq \lambda_m \geq 0$ can be obtained in descending order. $\alpha_j = \left( \alpha_{1j}, \alpha_{2j}, \ldots, \alpha_{mj} \right)$ represents the eigenvector corresponding to the $j$-th eigenvalue $\lambda_j$. The eigenvectors $\beta_1, \beta_2, \ldots, \beta_m$ can be obtained after orthogonalization and unitization on this basis, where $\beta_j = \left( \beta_{1j}, \beta_{2j}, \ldots, \beta_{mj} \right)^T$.

(4) Selection of important principal components: Let the principal components be $F_1, F_2, \ldots, F_m$. The contribution rate and cumulative contribution rate of the principal components are calculated from the previously computed eigenvalues. The contribution of each principal component corresponds to the eigenvalue $\lambda_i$ of the original features, and the contribution rate of the $j$-th principal component is:

$$C_j = \frac{\lambda_j}{\sum_{i=1}^{m} \lambda_i} \times 100\% \tag{12}$$

Since the variances of successive principal components decrease, the amount of information they contain also decreases. Therefore, in actual analysis, one generally does not select all $m$ principal components, but selects the first $k$ principal components (such that $A_k$ reaches 85%-90%) according to the cumulative contribution rate of the principal components. The contribution rate here refers to the proportion of the variance of a principal component to the total variance, that is, the proportion of a certain eigenvalue to the sum of all eigenvalues:

$$A_k = \frac{\sum_{i=1}^{k} \lambda_i}{\sum_{i=1}^{m} \lambda_i} \times 100\% \tag{13}$$

The greater the variance contribution rate, the stronger the ability of the selected principal components to reflect comprehensive information.

2) WEIGHTED LOAD EVALUATION
In practical applications, after selecting the important principal components, we must also pay attention to the interpretation of the actual meaning of the principal components. Since each new principal component is obtained by orthogonal transformation of the original features, each principal component reflects the comprehensive information of multiple original variables. Therefore, it is difficult to obtain the original fault information directly from the principal components. To this end, the article uses the load analysis method to obtain the load factor matrix of the $k$ principal components and score the original features.

(1) Load factor matrix:
It can be learned from the principle of the principal component analysis method that each principal component can be obtained as a linear combination of $X_1, X_2, \ldots, X_m$:

$$\begin{aligned} F_1 &= \alpha_{11} \times X_1 + \alpha_{21} \times X_2 + \cdots + \alpha_{m1} \times X_m \\ F_2 &= \alpha_{12} \times X_1 + \alpha_{22} \times X_2 + \cdots + \alpha_{m2} \times X_m \\ &\ \ \vdots \\ F_m &= \alpha_{1m} \times X_1 + \alpha_{2m} \times X_2 + \cdots + \alpha_{mm} \times X_m \end{aligned} \tag{14}$$

Each principal component $F_i$ in (14) corresponds to the $i$-th eigenvalue $\lambda_i$:

$$F_i = X \alpha_i \quad (i = 1, 2, \ldots, m). \tag{15}$$

According to the correlation matrix theorem, $\alpha_i$ satisfies Equation (16):

$$\sum_{i=1}^{m} \alpha_i \alpha_i^T = 1 \tag{16}$$

Combining (15) and (16), we get:

$$X = \sum_{i=1}^{m} F_i \alpha_i^T \tag{17}$$
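The selection rule of (12) and (13) can be sketched in a few lines: compute the contribution rate of each eigenvalue, then take the smallest $k$ whose cumulative rate reaches the target. The function names `contribution_rates` and `select_k`, the 85% default (the lower end of the 85%-90% range quoted above) and the eigenvalues in the example are assumptions made for illustration.

```python
def contribution_rates(eigenvalues):
    """Contribution rate C_j (in percent) of each eigenvalue, as in (12)."""
    total = sum(eigenvalues)
    return [100.0 * lam / total for lam in eigenvalues]

def select_k(eigenvalues, target=85.0):
    """Smallest k whose cumulative contribution rate A_k, as in (13), reaches `target` percent."""
    ordered = sorted(eigenvalues, reverse=True)  # lambda_1 >= lambda_2 >= ...
    total = sum(ordered)
    cumulative = 0.0
    for k, lam in enumerate(ordered, start=1):
        cumulative += lam
        if 100.0 * cumulative / total >= target:
            return k
    return len(ordered)

# Hypothetical eigenvalues of a 4-feature correlation matrix.
eigenvalues = [5.0, 3.0, 1.5, 0.5]
rates = contribution_rates(eigenvalues)  # 50%, 30%, 15%, 5%
k = select_k(eigenvalues)                # cumulative 50%, 80%, 95% -> k = 3
```

With these illustrative eigenvalues, two components explain only 80% of the variance, so three are retained.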
As mentioned above, the $k$ principal components $F_1, F_2, \ldots, F_k$ ($k < m$) are obtained from the original features $X_1, X_2, \ldots, X_m$, and they satisfy $\operatorname{COV}\left( F_i, F_j \right) = 0$ for $i \neq j$, namely, $F_i$ and $F_j$ are not correlated. Because their variances $D(F_i)$ are the largest, the first $k$ principal components can stand for the majority of the information in the original features at a lower dimensionality. The linear equations of the first $k$ principal components can be derived:

$$F_i = \alpha_{1i} \times X_1 + \alpha_{2i} \times X_2 + \cdots + \alpha_{ki} \times X_k, \quad i = 1, 2, \ldots, k. \tag{18}$$

Then the original features can be expressed:

$$\hat{X} = \sum_{i=1}^{k} F_i \alpha_i^T \tag{19}$$

where the combination coefficient $\alpha_i = \left( \alpha_{1i}, \alpha_{2i}, \ldots, \alpha_{ki} \right)^T$ of each principal component is the load factor matrix corresponding to the eigenvalue of the original feature.

(2) Original feature evaluation
The feature vector coefficients of the principal components can be calculated based on the obtained principal component feature values, contribution rates, and load factors, as (20) displays [61].

$$f_{ij} = \frac{c_{ij}}{\sqrt{\lambda_i}}, \quad i = 1, 2, \ldots, k, \ j = 1, 2, \ldots, m. \tag{20}$$

where $f_{ij}$ is the eigenvector coefficient of the principal component, $c_{ij}$ is the component load of each feature under the principal component, and $\lambda_i$ is the eigenvalue of the corresponding principal component.

The weight $\omega_i$ corresponding to each principal component can be obtained from the corresponding variance contribution rate. A mathematical model of the principal component composite score can be obtained by linearly summing all the principal components.

$$F = \sum_{i=1}^{k} \omega_i \times F_i, \quad i = 1, \ldots, k. \tag{21}$$

The weighted load score $\nu$ of the original feature can be expressed:

$$\nu = \sum_{i=1}^{k} \omega_i \times \alpha_{ij}, \quad i = 1, \ldots, k, \ j = 1, \ldots, m. \tag{22}$$

The information represented by a single feature is limited, and does not fully reflect the fault information of the bearing signal. Extracting multiple features can determine the fault category more accurately. Therefore, it is necessary to construct multiple features of different dimensions, such as statistical parameters, energy and various entropies, and to use the complementarity between different features to construct a more comprehensive high-dimensional feature set that expresses fault type information. However, if the dimension of the feature set is too high, it will inevitably be doped with some invalid or redundant features, and may lead to the "curse of dimensionality", which will increase the amount of calculation and reduce the prediction efficiency. Therefore, it is necessary to reduce the dimension of the feature set as much as possible while ensuring the integrity of the information, so as to obtain the best feature vector for the processing background. Based on the best sensitive features, bearing faults can be diagnosed and the operation of the equipment can be further guided.

A. CONSTRUCTION OF FAULT FEATURE SET
According to the multi-dimensional feature extraction method mentioned in section 2.1, the feature set $Q_1$ of the bearing under a certain machining condition can be constructed. The feature set contains four types of features, namely four sub-feature sets $\{T_1, F_1, E_1, S_1\}$, which represent time-domain features, frequency-domain features, energy features and information entropy features, respectively, where $Q_1 = T_1 + F_1 + E_1 + S_1$.

12 commonly used time-domain parameters $t_1 \sim t_{12}$ are selected to form a time-domain feature set $T_1$, including: mean, RMS, absolute mean, amplitude of RMS, peak-to-peak value, peak factor, standard deviation, kurtosis factor, form factor, pulse factor, margin factor and skewness factor. For the frequency-domain parameters, due to the working principle of the bearing, the corresponding fault frequency component is generated when the bearing fails. The change of each frequency component in the signal causes corresponding changes in the power spectrum. By describing the variation of the main frequency band in the power spectrum, the frequency-domain feature variation of the bearing signal can be well described. The frequency-domain feature set consists of CF, RMSF, and STDF: $P_1 = \{p_1, p_2, p_3\}$. The relevant calculation equations are shown in Table 1.

The energy of each frequency component in the signal contains a wealth of fault information. The article decomposes the original signal by means of wavelet packet decomposition. By conducting an $i$-layer wavelet packet decomposition of the original signal $X$, a wavelet packet decomposition sequence $S_{i,j}$ ($j = 1, 2, \ldots, 2^i$) can be obtained. The quadratic energy is used to characterize the reconstructed signal corresponding to each frequency band; then the energy spectrum [62] of the $j$-th frequency band of the $i$-th layer of the wavelet packet decomposition is:

$$E_{i,j} = \sum_{l=1}^{n} \left| x_{i,j}(l) \right|^2 \tag{23}$$

where $x_{i,j}(l)$ is the discrete point amplitude of the reconstructed signal, $j$ is the frequency band serial number of the $i$-th layer after decomposition, $l$ is the sampling point serial number ($l = 1, 2, \ldots, n$), and $n$ is the total number of signal sampling points. Then the wavelet packet energy spectrum of each frequency band can be obtained: $E_i = \left[ E_{i,1}, E_{i,2}, \cdots, E_{i,2^i} \right]^T$. The total signal energy $E_T$ in a certain time window is equal to the sum of the energy of each component. This constitutes an energy feature set: $E_1 = \left\{ e_1, e_2, \ldots, e_{2^i}, E_T \right\}$.
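The energy feature set can be illustrated with a short sketch that starts from band signals that have already been reconstructed. It assumes the band energy is the sum of squared amplitudes over the sampling points (the usual reading of "quadratic energy"), and it leaves the wavelet packet decomposition itself out of scope. The names `band_energy` and `energy_feature_set` and the toy band signals are hypothetical.

```python
def band_energy(band):
    """Energy of one reconstructed band: sum of squared amplitudes (assumed form of (23))."""
    return sum(abs(x) ** 2 for x in band)

def energy_feature_set(bands):
    """Return [e_1, ..., e_{2^i}, E_T], where E_T is the sum of the band energies."""
    energies = [band_energy(b) for b in bands]
    return energies + [sum(energies)]

# Toy example: 2^2 = 4 hypothetical band signals from a 2-layer decomposition.
bands = [
    [1.0, -1.0],
    [2.0, 0.0],
    [0.5, 0.5],
    [0.0, 0.0],
]
e1 = energy_feature_set(bands)  # four band energies followed by the total energy
```

A band that carries a fault component would show up here as a band energy that is large relative to the total.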
classifier for diagnosis; the weight $W_i$ of the sub-feature set is obtained according to $R_i$, which, to some extent, can represent the ability of the fault information in diagnosing the bearing [63]. The corresponding calculation formula is:

$$W_i = \frac{R_i}{\sum_{i=1}^{M} R_i}, \quad i = 1, 2, \ldots, M. \tag{25}$$

where $M$ is the number of features.

Before the weighted fusion, the features must be standardized so that features with smaller data values are not swamped by those with greater data values and the calculation results are not distorted by differing dimensions. The feature value $q_i$ of the i-th feature is normalized according to (26).

$$q_i' = \frac{q_i - \min(q_i)}{\max(q_i) - \min(q_i)}, \quad i = 1, 2, \ldots, M. \tag{26}$$

A feature that describes the fault information to the greatest extent can be obtained by multiplying each feature $q_i$ in feature set $Q_4$ by its corresponding weight $W_i$ and then summing the products. The sum of the weights is 1. The new fusion feature obtained from this linear weighted combination is calculated as follows:

$$M = q_1 W_1 + q_2 W_2 + \cdots + q_i W_i, \qquad \begin{cases} W_i > 0 \\ \sum W_i = 1 \end{cases} \quad i = 1, 2, \ldots, M. \tag{27}$$

The weight $W_i$ indicates the sensitivity of the selected feature to the fault. Combined with SVM, the fault type and severity of the experimental data are identified to verify the effectiveness of the method.

C. SYSTEM BLOCK DIAGRAM
The implementation flow of the feature screening model proposed in this paper is shown in Fig. 1. Taking the fault diagnosis of the bearing as the goal, the process is divided into five steps: signal processing, feature extraction, feature selection, feature weighting fusion and pattern recognition.
Step 1: The vibration signal is collected during bearing operation and decomposed into different frequency bands.
Step 2: Feature extraction is performed on the original signal and on each frequency band signal; the time-domain, frequency-domain, energy and entropy features obtained constitute the original high-dimensional feature set.
Step 3: The original features are sequentially subjected to three feature selection processes: SVM single feature selection, correlation analysis and PCA-WLE. The invalid and redundant features in the feature set are eliminated to obtain the low-dimensional sensitive feature set.
Step 4: The features in the low-dimensional feature set are input into the SVM one at a time to obtain the corresponding diagnosis rates, and the corresponding weights are obtained from these rates. A fusion feature that describes the fault information to the greatest extent is obtained by multiplying each standardized feature by its corresponding weight and then summing the products.
Step 5: The fusion feature is used as an input for pattern recognition so as to train the fault classifier, and the obtained weights can be further reflected in the feature extraction process to guide the model to perform feature extraction according to a certain weight.

IV. EXPERIMENTS AND ANALYSIS RESULTS
A. CASE1
1) DATA DESCRIPTION
In order to verify the feature selection method proposed in this paper, the rolling bearing fault signal provided by the laboratory of Case Western Reserve University (CWRU) [64] is taken as an example for testing. The bearing parameters used in the test are shown in Table 2. The entire test stand consists of a three-phase asynchronous motor (left), a torque encoder (center), a dynamometer (right) and associated vibration acceleration sensors, as shown in Fig. 2.

Single-point faults were introduced to the test bearings using electro-discharge machining, with fault positions on the inner raceway, outer raceway and rolling element. Fault diameters include 0.1778 mm, 0.3556 mm and 0.5334 mm (fault severity: mild, moderate and severe). Together with the normal bearing data, the bearing data can be divided into 10 types for each condition. The motor no-load speed is 1797 r/min.

The vibration signals at the driving end are recorded by the acceleration vibration sensor with a sampling frequency of 12 kHz under different motor loads of 0-3 horsepower (motor speeds of 1730 to 1797 rpm). The data under the three load conditions form three data sets A, B and C respectively. As can be learned from the motor speed and the sensor sampling frequency, about 400 data points are collected in one rotation of the bearing. Therefore, to ensure that a single sample is long enough to reflect the data distribution of the bearing vibration signals in this state completely and accurately, the first 120000 points of the raw data in each record are taken, and every 1200 data points are regarded as one sample, so that each raw record produces 100 samples. The first 70 groups are used for establishing the sample knowledge base and the last 30 groups serve as validation samples to test the validity of the method. Detailed information on the bearing vibration data set is shown in Table 3.

For a fault diameter of 0.5334 mm, the samples under the 1 hp condition in the normal state and in the different fault states are extracted, and the respective vibration signals are shown in Fig. 3.

2) ANALYSIS RESULTS
According to the system flow shown in Fig. 1, the extracted original signal $X = \{x_1, x_2, \ldots, x_n\}$ is processed first. The "db5" wavelet is selected to decompose the vibration signal into four layers, and the characteristic signals 16 frequency
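The sample construction described in the data description above (12 kHz sampling, roughly 400 points per revolution at 1797 r/min, the first 120000 points cut into 1200-point samples, 70 samples for the knowledge base and 30 for validation) can be sketched in Python as follows. This is only an illustrative sketch: the function names and the zero-filled placeholder record are assumptions and are not taken from the paper or from the CWRU data files.

```python
def points_per_revolution(sampling_hz, speed_rpm):
    """Number of samples recorded during one shaft revolution."""
    return sampling_hz / (speed_rpm / 60.0)


def segment(signal, n_points=120_000, sample_len=1200):
    """Keep the first n_points and cut them into non-overlapping samples."""
    head = signal[:n_points]
    n_samples = len(head) // sample_len
    return [head[i * sample_len:(i + 1) * sample_len] for i in range(n_samples)]


def split(samples, n_train=70):
    """First n_train samples build the knowledge base, the rest validate."""
    return samples[:n_train], samples[n_train:]


# Placeholder record (assumption): a real record would hold accelerometer readings.
signal = [0.0] * 121_000

samples = segment(signal)
train, valid = split(samples)

ppr = points_per_revolution(12_000, 1797)
print(f"points per revolution: about {ppr:.0f}")
print(len(samples), len(train), len(valid))
```

With these settings each 1200-point sample spans roughly three shaft revolutions, which is consistent with the stated aim of capturing the data distribution of the signal in one sample.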
FIGURE 6. The principal component and comprehensive score.
FIGURE 7. The feature weights of sensitive features.
TABLE 7. Bearing fault diagnosis results of different types of feature sets.
TABLE 8. Tested bearing parameters.
TABLE 10. Single fault feature correlation rate.
TABLE 11. Features score and rank.
the redundant features through correlation analysis, one time domain feature (RMS), one frequency domain feature (CF), six energy features, and seven entropy features can be obtained. Therefore, the PCA-WLE method is adopted to reduce the dimensionality of the energy features and the entropy features separately.

The principal component analysis results of the two types of features are shown separately in Fig. 11. Taking the energy features as an example, the first three principal components, whose cumulative contribution rate is greater than 90%, are retained, and their load factor matrix as well as principal component eigenvector coefficients (Table 11) are calculated. Taking the principal component $F_1$ as an example, its linear expression is:

$$F_1 = 0.3661\, e_1 + 0.4601\, e_6 + 0.4102\, e_8 + 0.2890\, e_{12} + 0.4170\, e_{14} + 0.4781\, E_z \tag{30}$$

The weight of each principal component can be obtained via the corresponding variance contribution rate, which is 0.7341, 0.2121, and 0.0532, respectively. A weighted load score of the original features can be obtained by linearly summing all the principal components (Table 11). Fig. 12 presents

$$F = 0.7341\, F_1 + 0.2121\, F_2 + 0.0532\, F_3 \tag{31}$$

By processing the entropy features in the same way, the feature set needed by the bearing under this operating condition can finally be obtained:

$$A = \{T_2, P_3, e_1, e_6, e_{14}, S_1, S_8, S_{16}\}$$

The fault diagnosis rates corresponding to the above 8 sensitive features are 0.7673, 0.8994, 0.5220, 0.5975, 0.5975, 0.7799, 0.5346 and 0.5975. Substituting these into formula (25) gives the corresponding weights, which are 0.1449, 0.1698, 0.0986, 0.1128, 0.1128, 0.1473, 0.1009 and 0.1128. Fig. 13 shows the weights of the 8 features after screening.

After the normalized test set is multiplied by the corresponding weights and input to the trained PSO-SVM model, the type of failure can be identified; the results are shown in Fig. 14. According to Table 12, which compares the diagnosis results of different features, unweighted fusion features and the weighted fusion features proposed in this article, the sensitive features based on weighted fusion have a higher diagnosis rate than the single-domain features and unweighted fusion features.
of the bearing fault at one time instead of considering that different fault levels of various fault types may correspond to the same feature value. As shown in Fig. 15, the form factor values for a severe fault in the inner ring overlap heavily with those for a minor fault in the rolling element, which can cause misjudgment in the fault diagnosis and affect the final diagnosis result. Therefore, the respective wear degree should be further differentiated after the fault type has been identified. The proposed feature selection method is nevertheless still applicable to this identification plan.

V. CONCLUSION
In this paper, a bearing fault feature selection method based on weighted multidimensional fusion is proposed. In addition to taking into account the importance of different features during fault diagnosis, this method makes up for the inability of the principal components to explain physical meaning when the dimensions are reduced with the traditional PCA algorithm. Meanwhile, an original high-dimensional feature set is developed by extracting feature parameters from different domains, a feature selection process that combines SVM single feature evaluation, correlation analysis and PCA-WLE is put forward for selecting sensitive features, and the weighted fusion is conducted according to the correlation between the selected features and the fault information. In the end, the fusion features are input into the PSO-SVM classifier for fault diagnosis, which shows that this method has strong identification ability.

The results of experiments on test data from the Case Western Reserve University Bearing Data Center and the Xi'an Jiaotong University School of Mechanical Engineering indicate that the features selected by the proposed method can accurately identify and classify the different fault categories and fault severities of bearings. Therefore, it is applicable to the feature selection of various bearings and rotating machinery, with great application potential.

AUTHOR CONTRIBUTIONS
Wei Dai and Weifang Zhang conceived the work and contributed their experience of the assembly process; Yazhou Li built the model and did the simulations; Wei Dai and Yazhou Li contributed to the writing and editing of the manuscript. Wei Dai checked the manuscript and provided some suggestions for revision.

REFERENCES
[1] Y. Lei, J. Lin, Z. He, and M. J. Zuo, "A review on empirical mode decomposition in fault diagnosis of rotating machinery," Mech. Syst. Signal Process., vol. 35, nos. 1–2, pp. 108–126, Feb. 2013, doi: 10.1016/j.ymssp.2012.09.015.
[2] S. Adamczak, K. Stępień, and M. Wrzochal, "Comparative study of measurement systems used to evaluate vibrations of rolling bearings," Procedia Eng., vol. 192, pp. 971–975, 2017, doi: 10.1016/j.proeng.2017.06.167.
[3] B. Pang, G. Tang, T. Tian, and C. Zhou, "Rolling bearing fault diagnosis based on an improved HTT transform," Sensors, vol. 18, no. 4, p. 1203, Apr. 2018, doi: 10.3390/s18041203.
[4] X. Xue and J. Zhou, "A hybrid fault diagnosis approach based on mixed-domain state features for rotating machinery," ISA Trans., vol. 66, pp. 284–295, Jan. 2017, doi: 10.1016/j.isatra.2016.10.014.
[5] Z. Wang, Q. Zhang, J. Xiong, M. Xiao, G. Sun, and J. He, "Fault diagnosis of a rolling bearing using wavelet packet denoising and random forests," IEEE Sensors J., vol. 17, no. 17, pp. 5581–5588, Sep. 2017, doi: 10.
[6] Y. Wei, Y. Li, M. Xu, and W. Huang, "A review of early fault diagnosis approaches and their applications in rotating machinery," Entropy, vol. 21, no. 4, p. 409, Apr. 2019, doi: 10.3390/e21040409.
[7] J. Peng, L. Fan, W. Xiao, and J. Tang, "Anomaly monitoring method for key components of satellite," Sci. World J., vol. 2014, pp. 1–14, Jan. 2014, doi: 10.1155/2014/104052.
[8] J. Xiong, C. Li, J. Cen, Q. Liang, and Y. Cai, "Fault diagnosis method based on improved evidence reasoning," Math. Problems Eng., vol. 2019, pp. 1–9, Mar. 2019, doi: 10.1155/2019/7491605.
[9] H. Li and D. Xiao, "Fault diagnosis of Tennessee Eastman process using signal geometry matching technique," EURASIP J. Adv. Signal Process., vol. 2011, no. 1, p. 83, Oct. 2011, doi: 10.1186/1687-6180-2011-83.
[10] F. Bagheri, H. Khaloozaded, and K. Abbaszadeh, "Stator fault detection in induction machines by parameter estimation, using adaptive Kalman filter," in Proc. Medit. Conf. Control Autom., Jun. 2007, pp. 1–6, doi: 10.
[11] H. Li and D. Y. Xiao, "Survey on data driven fault diagnosis methods," Control Decis., vol. 26, no. 1, pp. 1–9 and 16, 2011.
[12] X. Ma and D. Li, "A hybrid fault diagnosis method based on fuzzy signed directed graph and neighborhood rough set," in Proc. 6th Data Driven Control Learn. Syst. (DDCLS), May 2017, pp. 253–258, doi: 10.1109/ddcls.2017.8068078.
[13] Y. Wang, X. Li, J. Ma, and S. Li, "Fault diagnosis of power transformer based on fault-tree analysis (FTA)," IOP Conf. Ser., Earth Environ. Sci., vol. 64, May 2017, Art. no. 012099, doi: 10.1088/1755-1315/64/1/012099.
[14] Q. Yao, J. Wang, and G. Zhang, "A fault diagnosis expert system based on aircraft parameters," in Proc. 12th Web Inf. Syst. Appl. Conf. (WISA), Sep. 2015, pp. 314–317, doi: 10.1109/wisa.2015.21.
[15] S. Yuanyuan, G. Lili, and W. Yongming, "Artificial intelligence and learning techniques in intelligent fault diagnosis," in Proc. 4th Int. Conf. Comput. Sci. Netw. Technol. (ICCSNT), Dec. 2015.
[16] Z. Liu, W. Guo, J. Hu, and W. Ma, "A hybrid intelligent multi-fault detection method for rotating machinery based on RSGWPT, KPCA and Twin SVM," ISA Trans., vol. 66, pp. 249–261, Jan. 2017.
[17] Y. Tian, Z. Wang, and C. Lu, "Self-adaptive bearing fault diagnosis based on permutation entropy and manifold-based dynamic time warping," Mech. Syst. Signal Process., vol. 114, pp. 658–673, Jan. 2019.
[18] C. Li, J. V. De Oliveira, M. Cerrada, F. Pacheco, D. Cabrera, V. Sanchez, and G. Zurita, "Observer-biased bearing condition monitoring: From fault detection to multi-fault classification," Eng. Appl. Artif. Intell., vol. 50, pp. 287–301, Apr. 2016.
[19] J. Zheng, H. Pan, and J. Cheng, "Rolling bearing fault detection and diagnosis based on composite multiscale fuzzy entropy and ensemble support vector machines," Mech. Syst. Signal Process., vol. 85, pp. 746–759, Feb. 2017.
[20] J. Ben Ali, L. Saidi, A. Mouelhi, B. Chebel-Morello, and F. Fnaiech, "Linear feature selection and classification using PNN and SFAM neural networks for a nearly online diagnosis of bearing naturally progressing degradations," Eng. Appl. Artif. Intell., vol. 42, pp. 67–81, Jun. 2015.
[21] Y. Li, Y. Yang, X. Wang, B. Liu, and X. Liang, "Early fault diagnosis of rolling bearings based on hierarchical symbol dynamic entropy and binary tree support vector machine," J. Sound Vibrat., vol. 428, pp. 72–86, Aug. 2018.
[22] Z. Gao, C. Cecati, and S. X. Ding, "A survey of fault diagnosis and fault-tolerant techniques - Part I: Fault diagnosis with model-based and signal-based approaches," IEEE Trans. Ind. Electron., vol. 62, no. 6, pp. 3757–3767, Jun. 2015.
[23] Z. Gao, C. Cecati, and S. Ding, "A survey of fault diagnosis and fault-tolerant techniques - Part II: Fault diagnosis with knowledge-based and hybrid/active approaches," IEEE Trans. Ind. Electron., vol. 62, no. 6, pp. 3768–3774, Jun. 2015.
[24] M. Kang, J. Kim, J.-M. Kim, A. C. C. Tan, E. Y. Kim, and B.-K. Choi, "Reliable fault diagnosis for low-speed bearings using individually trained support vector machines with kernel discriminative feature analysis," IEEE Trans. Power Electron., vol. 30, no. 5, pp. 2786–2797, May 2015.
[25] R. Li, P. Sopon, and D. He, "Fault features extraction for bearing prognostics," J. Intell. Manuf., vol. 23, no. 2, pp. 313–321, Apr. 2012.
[26] Y. Yang, Y. Liao, G. Meng, and J. Lee, "A hybrid feature selection scheme for unsupervised learning and its application in bearing fault diagnosis," Expert Syst. Appl., vol. 38, no. 9, pp. 11311–11320, Sep. 2011.
[27] C. Yang and T. Wu, "Diagnostics of gear deterioration using EEMD approach and PCA process," Measurement, vol. 61, pp. 75–87, Feb. 2015.
[28] B. Zhang, L. Zhang, and J. Xu, "Degradation feature selection for remaining useful life prediction of rolling element bearings," Qual. Rel. Engng. Int., vol. 32, no. 2, pp. 547–554, Mar. 2016.
[29] Z. Wei, Y. Wang, S. He, and J. Bao, "A novel intelligent method for bearing fault diagnosis based on affinity propagation clustering and adaptive feature selection," Knowl.-Based Syst., vol. 116, pp. 1–12, Jan. 2017.
[30] B. Samanta and K. Al-Balushi, "Artificial neural network based fault diagnostics of rolling element bearings using time-domain features," Mech. Syst. Signal Process., vol. 17, no. 2, pp. 317–328, Mar. 2003.
[31] V. Sugumaran, V. Muralidharan, and K. Ramachandran, "Feature selection using decision tree and classification through proximal support vector machine for fault diagnostics of roller bearing," Mech. Syst. Signal Process., vol. 21, no. 2, pp. 930–942, Feb. 2007.
[32] L. Yuan, Y. He, J. Huang, and Y. Sun, "A new neural-network-based fault diagnosis approach for analog circuits by using kurtosis and entropy as a preprocessor," IEEE Trans. Instrum. Meas., vol. 59, no. 3, pp. 586–595, Mar. 2010, doi: 10.1109/tim.2009.2025068.
[33] M. S. Ballal, Z. J. Khan, H. M. Suryawanshi, and R. L. Sonolikar, "Adaptive neural fuzzy inference system for the detection of inter-turn insulation and bearing wear faults in induction motor," IEEE Trans. Ind. Electron., vol. 54, no. 1, pp. 250–258, Feb. 2007.
[34] L. Zhen, H. Zhengjia, Z. Yanyang, and C. Xuefeng, "Bearing condition monitoring based on shock pulse method and improved redundant lifting scheme," Math. Comput. Simul., vol. 79, no. 3, pp. 318–338, Dec. 2008, doi: 10.1016/j.matcom.2007.12.004.
[35] A. Bellini, A. Yazidi, F. Filippetti, C. Rossi, and G.-A. Capolino, "High frequency resolution techniques for rotor fault detection of induction machines," IEEE Trans. Ind. Electron., vol. 55, no. 12, pp. 4200–4209, Dec. 2008.
[36] P. M. Baggenstoss and F. Kurth, "Comparing shift-autocorrelation with cepstrum for detection of burst pulses in impulsive noise," J. Acoust. Soc. Amer., vol. 136, no. 4, pp. 1574–1582, Oct. 2014.
[37] Y. Guo and K. K. Tan, "Order-crossing removal in Gabor order tracking by independent component analysis," J. Sound Vibrat., vol. 325, nos. 1–2, pp. 471–488, Aug. 2009.
[38] Y. Lei, Z. He, Y. Zi, and X. Chen, "New clustering algorithm-based fault diagnosis using compensation distance evaluation technique," Mech. Syst. Signal Process., vol. 22, no. 2, pp. 419–435, Feb. 2008.
[39] J. Pei, S. Zhang, M. Qi, and G. Wan, "A new method for fault diagnosis of fluid end in drilling pump," Acta Petrolei Sinica, vol. 30, no. 4, pp. 617–620, 2009.
[40] R. Tiwari, V. K. Gupta, and P. Kankar, "Bearing fault diagnosis based on multi-scale permutation entropy and adaptive neuro fuzzy classifier," J. Vibrat. Control, vol. 21, no. 3, pp. 461–467, Feb. 2015.
[41] W. Sun, G. An Yang, Q. Chen, A. Palazoglu, and K. Feng, "Fault diagnosis of rolling bearing based on wavelet transform and envelope spectrum correlation," J. Vibrat. Control, vol. 19, no. 6, pp. 924–941, Apr. 2013.
[42] Y. Ying, J. Li, P. Chai, Y. Chen, and J. Pang, "Study on rolling bearing fault diagnosis based on multi-dimensional feature extraction," J. Shanghai Univ. Electr. Power, vol. 34, no. 5, pp. 413–421, 2018.
[43] J. Rosero, L. Romeral, J. Ortega, and E. Rosero, "Short-circuit detection by means of empirical mode decomposition and Wigner–Ville distribution for PMSM running under dynamic condition," IEEE Trans. Ind. Electron., vol. 56, no. 11, pp. 4534–4547, Nov. 2009.
[44] R. Kumar and M. Singh, "Outer race defect width measurement in taper roller bearing using discrete wavelet transform of vibration signal," Measurement, vol. 46, no. 1, pp. 537–545, Jan. 2013.
[45] S. Mohanty, K. K. Gupta, and K. S. Raju, "Hurst based vibro-acoustic feature extraction of bearing using EMD and VMD," Measurement, vol. 117, pp. 200–220, Mar. 2018.
[46] J. Li and J. Guo, "A new feature extraction algorithm based on entropy cloud characteristics of communication signals," Math. Problems Eng., vol. 2015, pp. 1–8, Jun. 2015.
[47] L. Huang, M. Wang, and L. Wu, "Research on change detection approach using PSO algorithm and multiple thresholds exponential entropy in remote sensing images," Eng. Surveying Mapping, vol. 27, no. 7, pp. 1–5, 2018.
[48] J. Zheng, H. Pan, S. Yang, and J. Cheng, "Generalized composite multiscale permutation entropy and Laplacian score based rolling bearing fault diagnosis," Mech. Syst. Signal Process., vol. 99, pp. 229–243, Jan. 2018.
[49] V. N. Vapnik, The Nature of Statistical Learning Theory. New York, NY, USA: Springer, 1995.
[50] Y. Li, W. Dai, X. Wu, and Y. Kan, "Surface quality evaluation based on roughness prediction model," in Proc. Int. Conf. Inf. Technol. Electr. Eng. (ICITEE), 2018, doi: 10.1145/3148453.3306271.
[51] Z. Zhang, Y. Zhang, and Q. Liu, "Fault diagnosis on bearing by support vector machine and wavelet analysis," Machinery Des. Manuf., vol. 313, no. 3, pp. 204–207, 2017.
[52] Y. Liu and S. Liu, "Application of MED and hierarchical fuzzy entropy to rolling bearing fault diagnosis," Machinery Des. Manuf., no. 11, pp. 49–52 and 56, 2018.
[53] S. Wan, L. Dou, R. Liu, and X. Zhang, "Fault diagnosis for high voltage circuit breakers based on EWT and multi-scale entropy," J. Vib., Meas. Diagnosis, vol. 38, no. 4, pp. 672–678 and 867, 2018.
[54] Y. Li, W. Zhang, Q. Xiong, D. Luo, G. Mei, and T. Zhang, "A rolling bearing fault diagnosis strategy based on improved multiscale permutation entropy and least squares SVM," J. Mech. Sci. Technol., vol. 31, no. 6, pp. 2711–2722, Jun. 2017, doi: 10.1007/
[55] K. Zhu, L. Chen, and X. Hu, "Rolling element bearing fault diagnosis based on multi-scale global fuzzy entropy, multiple class feature selection and support vector machine," Trans. Inst. Meas. Control, vol. 41, no. 14, pp. 4013–4022, Oct. 2019, doi: 10.1177/
[56] X. Xiao, Q. He, Z. Li, A. O. Antoce, and X. Zhang, "Improving traceability and transparency of table grapes cold chain logistics by integrating WSN and correlation analysis," Food Control, vol. 73, pp. 1556–1563, Mar. 2017.
[57] H. Zhao, D. Zhang, S. Huang, S. Mo, and H. Wei, "Analysis on the relation between cloud-to-ground lightning density and lightning trip rate in Hainan province based on pearson correlation coefficient," High Voltage App., vol. 55, no. 8, pp. 186–192, 2019.
[58] G. Chen, J. Chen, Y. Zi, J. Pan, and W. Han, "An unsupervised feature extraction method for nonlinear deterioration process of complex equipment under multi dimensional no-label signals," Sens. Actuators A, Phys., vol. 269, pp. 464–473, Jan. 2018.
[59] C. Wang and J. Cai, "Research on fault diagnosis of rolling bearing based on empirical mode decomposition and principal component analysis," Acta Metrologica Sinica, vol. 40, no. 6, pp. 1077–1082, 2019.
[60] X. Meng, C. Feng, and S. Gao, "Research on consistency of turbine blade temperature distribution based on principal component analysis," J. Harbin Univ. Commerce (Natural Sci. Ed.), vol. 35, no. 4, pp. 451–457, 2019.
[61] Y. Ma, "Principal component analysis of quality indexes of different varieties of actinidia arguta," Sci. Technol. Food Ind., vol. 40, no. 5, pp. 233–238, 2019.
[62] X. J. Zeng, X. L. Zhang, M. A. Hong-Jiang, and L. Li, "Traveling wave fault location method for power grids based on wavelet packet energy spectra," High Voltage Eng., vol. 34, no. 11, pp. 2311–2316, 2008.
[63] L. Zhai, "Research on image classification based on weighted multi-feature fusion and SVM," M.S. thesis, School Comput., Central China Normal Univ., Wuhan, China, 2016.
[64] Bearing Data Center. Accessed: Oct. 18, 2019. [Online]. Available:
[65] B. Wang, Y. Lei, N. Li, and N. Li, "A hybrid prognostics approach for estimating remaining useful life of rolling element bearings," IEEE Trans. Rel., to be published.
YAZHOU LI was born in Hebi, Henan, China, in 1996. He received the B.E. degree in energy and power engineering from the Beijing University of Civil Engineering and Architecture, in 2018. He is currently pursuing the M.S. degree with the School of Energy and Power Engineering, Beihang University.
His research interests include condition monitoring, fault diagnosis, reliability assessment, the evaluation and analysis based on big data.

WEIFANG ZHANG received the B.S. and Ph.D. degrees from the School of Materials Science and Engineering, Harbin Institute of Technology, Harbin, China, in 1996 and 1999, respectively. He is currently a Professor with the School of Reliability and Systems Engineering, Beihang University. His main research interests include material and structural damage, structural safety and health monitoring, and life prediction.