
Received December 27, 2019, accepted January 11, 2020, date of publication January 17, 2020, date of current
version January 30, 2020.

Digital Object Identifier 10.1109/ACCESS.2020.2967537
Bearing Fault Feature Selection Method Based on Weighted Multidimensional Feature Fusion
YAZHOU LI 1, WEI DAI 2, AND WEIFANG ZHANG 2
1 School of Energy and Power Engineering, Beihang University, Beijing 100191, China
2 School of Reliability and Systems Engineering, Beihang University, Beijing 100191, China
Corresponding author: Wei Dai ([email protected])

This work was supported by the National Natural Science Foundation of China (No. 51705015), and the Technical foundation program
(No. JSZL2017601C002 and JCKY2018203C005) from the Ministry of Industry and Information Technology of China.
ABSTRACT The rolling bearing is one of the most critical components in rotating machinery. To select features efficiently, reduce feature dimensions and improve the correctness of fault diagnosis, a feature selection and fusion method based on weighted multi-dimensional feature fusion is proposed. Firstly, features are extracted from different domains to constitute the original high-dimensional feature set. Considering the large number of invalid and redundant features contained in such an original feature set, a feature selection process that combines support vector machine (SVM) single feature evaluation, correlation analysis and principal component analysis-weighted load evaluation (PCA-WLE) is put forward in this paper for selecting sensitive features. The selected features are weighted and fused according to their sensitivity so as to further weaken the interference of less important features. Finally, this process is applied to the data provided by the Case Western Reserve University Bearing Data Center and the Xi’an Jiaotong University School of Mechanical Engineering, respectively, and the fault is diagnosed by using the particle swarm optimization-support vector machine (PSO-SVM). The results show that this method can accurately identify different fault categories and degrees of the bearing, and that it is more practical and has higher recognition ability than single-domain fault diagnosis.
INDEX TERMS Feature selection, feature weighting, sensitive features, fault diagnosis.
I. INTRODUCTION

Rotating machinery is an essential power unit for industrial applications and is widely used in various production and processing fields [1]. As a key component for transmitting power in a rotating machine, the rolling bearing has a running state that is directly related to the performance state of the mechanical equipment [2], [3]. Because rolling bearings work in harsh environments and often at full load, they wear easily, and the accumulated wear develops into faults. Once a fault occurs, it may cause a series of impacts on the enterprise, such as production equipment shutdown, economic loss and casualties [4]. According to statistics, bearing damage accounts for about 40% of the cases in which rotating machinery cannot operate normally [5]. Therefore, monitoring the bearing status, discovering and eliminating potential faults in time, and maintaining the safe operation of the equipment are of great significance [6].

Generally, fault diagnosis methods can be divided into three types [7]–[9]: analytical model-based methods, qualitative empirical knowledge-based methods, and data-driven methods. The analytical model-based method relies on a mathematical model of the known diagnostic object, and the information of the measured object is processed according to a certain mathematical method. Using this method requires enough sensors, an understanding of the process mechanism and structure, and a reasonably accurate quantitative mathematical model. At present, it mainly includes three approaches: methods based on parameter estimation [10], methods based on state estimation and methods based on equivalent space, all of which have been studied in depth. However, because it is difficult to obtain an accurate mathematical model of the research object in practice, the scope and effect of the method are greatly limited [11].

The associate editor coordinating the review of this manuscript and approving it for publication was Yu Wang.
This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see http://creativecommons.org/licenses/by/4.0/
19008 VOLUME 8, 2020
Y. Li et al.: Bearing Fault Feature Selection Method Based on Weighted Multidimensional Feature Fusion
The qualitative empirical knowledge-based method mainly depends on the accumulated experience gained during the operation of the system. According to incomplete prior experience, the operating state of the equipment is described and a qualitative model is established. The next state of the equipment is predicted by reasoning. This kind of fault diagnosis method includes the signed directed graph [12], fault tree [13], expert system [14] and so on. However, the diagnostic ability of knowledge-based fault diagnosis methods depends only on the historical experience of experts or field workers. With the acceleration of industrial upgrading and the deepening of relevant professional knowledge, the required empirical knowledge often exceeds what ordinary workers can grasp, making the method difficult to apply in operation. This method is especially unsuitable for large industrial systems. Moreover, the above two methods are more suitable for systems with fewer input, output and state variables, and are less practical for multi-sensor systems that acquire massive amounts of data.

Data-driven fault diagnosis methods include: (1) statistical-based methods; (2) signal-based methods; and (3) artificial intelligence-based methods. With the rapid development of data mining, computer technology and artificial intelligence [15], data-driven fault diagnosis methods have increasingly shown their strong applicability, and the three kinds of methods are often used in combination. Based on the redundant second generation wavelet packet transform (RSGWPT), Liu et al. [16] extracted 56 features of the vibration signal and fed them into a support vector machine (SVM) for fault identification; Tian et al. [17] selected permutation entropy (PE) as the fault feature, and proposed a manifold-based dynamic time warping method for fault diagnosis; Li et al. [18] selected 1634 features and classified the bearing faults using the method of fuzzy C-means with a variable focal point (FCMFP); in [19], composite multiscale fuzzy entropy (CMFE) was selected as the feature to train the ensemble support vector machine (ESVM) for fault diagnosis of rolling element bearings; in [20], the energy entropy of the intrinsic mode functions (IMFs) of the bearing vibration signal is extracted and combined with a probabilistic neural network (PNN) and a simplified fuzzy adaptive resonance theory map (SFAM) for online bearing fault diagnosis; in [21], the hierarchical symbol dynamic entropy (HSDE) is used as a sensitive feature and fed into a binary tree support vector machine (BT-SVM) to effectively identify the fault of the bearing. Most of these works use statistical and signal analysis to extract the features of vibration signals and then diagnose faults based on artificial intelligence. Some works also use deep learning methods to automatically extract fault features for diagnosis. All of them have greatly advanced the fault diagnosis research of bearings.

The data-driven fault diagnosis process can be divided into four steps: signal processing, feature extraction, feature reduction, and pattern recognition [22], [23], and the first three are the foundation of the fourth step. Feature reduction includes feature selection and feature dimension reduction. Compared with feature extraction and pattern recognition, there are relatively few studies on feature reduction. On the one hand, the growing number of feature extraction methods leads to an increase in the feature vector dimension, but not all fault features have an effect on bearing fault diagnosis. The increase of invalid features is likely to make the diagnosis process more complicated and to reduce the accuracy of the diagnosis results [24]. On the other hand, different types of features have different applicability to different types of bearing failures or different stages of bearing operation [25]. Therefore, features should be simplified after feature extraction is completed, and the optimal features that maintain the intrinsic information about the faults should be retained while reducing the number of features as much as possible, so as to diagnose the faults of bearings effectively and efficiently. Liao et al. [26] selected two different clustering analysis methods to classify the bearing data, and used the correlation analysis method to reduce the dimensionality of the data; Yang et al. [27] extracted the fault features in the vibration signal by means of ensemble empirical mode decomposition (EEMD), and reduced them by using principal component analysis (PCA); in [28], correlation, monotonicity and robustness were selected as the evaluation indicators of the features. Using these indicators, the residual life trend of the bearing was well displayed, and the remaining service life of the bearing was effectively predicted; in [29], an adaptive feature selection technique was proposed. This technique can be used to remove redundant features and reduce the amount of computation for pattern recognition; in [3], the Hilbert time-time (HTT) transform was combined with principal component analysis to extract and reduce the bearing fault features. At present, some researchers have studied the selection and dimension reduction of bearing fault features, but there are still some deficiencies in this research. For one thing, many articles only consider features from a single domain, such as time domain statistics or frequency domain statistics, which cannot reflect fault information comprehensively; for another, the existing methods of dimensionality reduction mostly use a single method such as Linear Discriminant Analysis (LDA) or PCA, which cannot reflect the difference between samples. Moreover, these methods use mathematical means to reprocess the data. The new features are obtained by combining a plurality of original features, so the physical information cannot be directly represented to guide the subsequent equipment processes. Therefore, the selection and dimension reduction of fault features should be further explored in order to adaptively select the optimal sensitive features.

The rest of this paper is organized as follows. In Section II, the basic theories of the SVM, PCA, correlation analysis and multi-dimensional feature extraction techniques are outlined. In Section III, the specific steps of the proposed feature selection method are described in detail, and the system framework

of the method is given. Experimental verifications on actual data are conducted in Section IV, and the conclusion and recommendations for future work are summarized in Section V.

II. BASIC THEORIES

A. MULTIDIMENSIONAL FEATURE EXTRACTION

1) TIME-DOMAIN FEATURE

As the simplest and most direct signal analysis method, time-domain analysis is currently applied in most rolling bearing online monitoring systems. Generally, it performs signal analysis by calculating simple statistical characteristic quantities of the signal, and then selects appropriate feature parameters to accurately classify different types of faults. The statistical parameters are mainly divided into two categories according to the presence or absence of a dimension [30]. The first category is the dimensional statistical parameters, including maximum value, minimum value, mean value, root mean square (RMS) value, peak-to-peak value, and standard deviation. The other category is the dimensionless statistical parameters, including skewness [31], kurtosis factor [32], peak factor, form factor, pulse factor, and margin factor.

Using dimensional statistical parameters to describe the bearing state can reflect part of the fault information. For example, the RMS of the vibration signal directly reflects the vibration intensity of the bearing and is an important evaluation index. However, the dimensional statistical characteristic values are related not only to the type, size and state of the bearing, but also to changes in external motion parameters (e.g. speed, pressure, load, etc.). For different working conditions, there will be large variations in feature values, so that no unified conclusion can be drawn. The dimensionless statistical parameters are insensitive to changes in external parameters, i.e., independent of the bearing’s motion conditions, so they are ideal parameters for machine condition monitoring. For example, the kurtosis, peak factor, and pulse factor can be adopted for detecting the impact component in the vibration signal. Before the dimensionless characteristic parameters are computed, the data shall be zero-meaned, that is, the mean is removed from the original data, leaving only the dynamic part.

It can be seen that the information reflected by the time domain parameters is limited, and the information that different feature values can display is also different. Since the wear-type failure of the bearing is usually reflected in the high amplitude level of the vibration signal, RMS and the peak value can be used to determine the degree of wear [33]. The RMS of the vibration signal increases with the wear of the bearing. However, although RMS can reflect the surface roughness caused by the manufacturing quality or wear of the bearing working surface, it has certain limitations for failures such as partial peeling, scratches, indentations and pits on the bearing components. The pulse shape of these discrete faults has a high peak, and for such an impact fault [34], the peak factor is more representative than RMS. When the peak factor is relatively small, it can reflect that the bearing has poor lubrication. The stability of such features is poor, and sometimes the value decreases as the degree of failure increases. In general, time-domain based fault feature extraction is still in a relatively early stage.

2) FREQUENCY-DOMAIN FEATURE

The working principle of the bearing determines that the corresponding fault frequency component will inevitably be generated in the frequency domain when the bearing breaks down. Therefore, from the perspective of the frequency domain, fault extraction of bearings is theoretically feasible.

The Fast Fourier Transform (FFT) can obtain the distribution of the frequency components of the signal in the spectrogram, and can provide more intuitive information than the time domain waveform. Spectrum analysis of the rolling bearing state can help monitor its operating state or find the location of the vibration source. For example, the bearing rotational frequency and the amplitude and phase of the main frequency components, such as higher harmonics, can be obtained through spectrum analysis, which provides an effective analysis method for judging the location, type and severity of the bearing fault; in the process of inspecting a bearing turntable, the operating condition and deterioration degree of the bearing can be judged by comparing the amplitude variation under the same frequency component and the presence of a new frequency. The frequency domain-based analysis method can smooth the non-stationary components in the signal and reflect the frequency information in the signal, but it is unable to reflect the change of the signal frequency with time, so it is not suitable for analyzing non-stationary signals.

Commonly used spectral methods include the Fourier transform, cepstrum analysis, refinement spectrum analysis [35], order tracking spectrum analysis, etc. Different analysis methods have different emphases. For example, in addition to identifying and separating the periodic components in the signal, cepstrum analysis can also effectively extract the fault information in the signal when there is an unrecognizable multi-cluster modulation sideband in the bearing fault signal [36]. Order tracking spectrum analysis can be used to extract bearing fault features under variable speed. By establishing the relationship between the rotation speed and frequency in the speed-up and speed-down stages, it converts the time-domain signal into an angle-domain signal for analysis [37]. In addition, some statistical indicators [38], such as center frequency (CF), root mean square frequency (RMSF) and standard deviation frequency (STDF), also have good discrimination ability for bearing faults. CF and RMSF can describe the position change of the main band of the power spectrum, and STDF can describe the degree of dispersion of the spectral energy [39].

In summary, both time-domain features and frequency-domain features are a representation of the overall signal. Whether a feature lies entirely in the time domain or entirely in the frequency domain, it cannot characterize when and how the signal will change at a certain frequency component.
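The time-domain and frequency-domain statistics named above can be sketched in a few lines of Python. This is an illustrative sketch only: the paper does not print its formulas for these quantities, so the definitions below are the common textbook ones (population moments for skewness and kurtosis; CF, RMSF and STDF as power-weighted moments of the one-sided spectrum), and the function names, the 50 Hz test tone and the 1 kHz sampling rate are assumptions made for the demonstration.

```python
import cmath
import math


def time_domain_features(x):
    """A few dimensional and dimensionless statistics of the signal x."""
    n = len(x)
    mean = sum(x) / n
    rms = math.sqrt(sum(v * v for v in x) / n)
    peak = max(abs(v) for v in x)
    abs_mean = sum(abs(v) for v in x) / n
    sqrt_abs_mean = sum(math.sqrt(abs(v)) for v in x) / n
    var = sum((v - mean) ** 2 for v in x) / n  # population variance
    return {
        "rms": rms,
        "peak_to_peak": max(x) - min(x),
        "std": math.sqrt(var),
        "skewness": sum((v - mean) ** 3 for v in x) / n / var ** 1.5,
        "kurtosis": sum((v - mean) ** 4 for v in x) / n / var ** 2,
        "peak_factor": peak / rms,            # also called crest factor
        "form_factor": rms / abs_mean,
        "pulse_factor": peak / abs_mean,
        "margin_factor": peak / sqrt_abs_mean ** 2,
    }


def power_spectrum(x, fs):
    """One-sided power spectrum from a naive O(n^2) DFT (short demo signals only)."""
    n = len(x)
    freqs, power = [], []
    for k in range(n // 2 + 1):
        xk = sum(v * cmath.exp(-2j * math.pi * k * i / n) for i, v in enumerate(x))
        freqs.append(k * fs / n)
        power.append(abs(xk) ** 2)
    return freqs, power


def frequency_domain_features(x, fs):
    """CF, RMSF and STDF as power-weighted moments of the spectrum (assumed definitions)."""
    freqs, power = power_spectrum(x, fs)
    total = sum(power)
    cf = sum(f * p for f, p in zip(freqs, power)) / total
    rmsf = math.sqrt(sum(f * f * p for f, p in zip(freqs, power)) / total)
    stdf = math.sqrt(sum((f - cf) ** 2 * p for f, p in zip(freqs, power)) / total)
    return {"cf": cf, "rmsf": rmsf, "stdf": stdf}


# Hypothetical test signal: a pure 50 Hz tone sampled at 1 kHz for 10 full periods.
fs = 1000.0
signal = [math.sin(2 * math.pi * 50.0 * i / fs) for i in range(200)]
td = time_domain_features(signal)
fd = frequency_domain_features(signal, fs)
```

For a pure tone the spectral energy sits in a single bin, so CF coincides with the tone frequency and STDF is close to zero; under these definitions RMSF² = CF² + STDF² always holds, which is a convenient sanity check. A real implementation would use an FFT rather than the quadratic-time DFT shown here.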

The information expressed by such a single-domain representation is therefore not comprehensive, and a joint distribution of time and frequency is required to characterize the signal.

3) ENERGY FEATURE

The measured vibration signal contains not only the operating condition information related to the bearing itself, but also a large amount of information about other rotating parts and structures in the unit equipment; compared with the former, the latter is background noise [40]. Background noise is usually so large that slight bearing fault information is submerged and difficult to extract. Thus, it is hard to accurately assess the working condition of the bearing through the conventional time-domain and frequency-domain methods [41], [42]. Therefore, time-frequency analysis methods based on the Wigner-Ville Distribution (WVD) [43], Wavelet Transform (WT) [44] and Empirical Mode Decomposition (EMD) [45] have been widely used in recent years.

Time-frequency analysis can characterize the variation of the signal spectral components over time, and thus the distribution of signal strength or energy simultaneously in time and frequency. As soon as the rolling bearing breaks down, the energy of the fault feature band corresponding to the vibration signal will be significantly increased, so that the fault type and the fault location can be determined by judging the characteristic frequency band in the wavelet decomposition result that includes fault information. Therefore, the deep information of the fault type can be reflected by decomposing the signal via the time-frequency method and extracting the energy characteristics in different frequency bands.

4) ENTROPY FEATURE

Entropy is a measure of information uncertainty [46]. The entropy of different frequency bands can be used to measure the uncertainty of the signal distribution state and the signal complexity, so it can quantitatively describe the information contained in the signal. According to the overall average characteristics of the signal source, entropy can manifest the complexity of system internal information, so the essential information of the bearing fault can be extracted based on the effective entropy value. Commonly used entropy features are Shannon entropy, index entropy [47], and permutation entropy [48]. For a discrete random variable $X$ with a sample space of $[x_1, x_2, \ldots, x_n]$, the Shannon entropy is:

$$H(X) = E\left[\log_2 \frac{1}{p(x_i)}\right] = -\sum_{i=1}^{n} p(x_i) \log_2 p(x_i) \tag{1}$$

where $p(x_i)$ represents the probability of the sample. Index entropy can avoid the case where the logarithm in the Shannon entropy is prone to undefined and zero values. Its definition is as follows:

$$H_{EXP} = -\sum_{i=1}^{n} p_i e^{(1 - p_i)} \tag{2}$$

Feature extraction according to entropy theory is applicable to environments with a high signal-to-noise ratio, but when the effective signal is completely submerged by noise, a large overlap will arise between different signal entropies, making it difficult to accurately distinguish features.

One of the cores of comprehensive diagnosis and prediction of the developing fault state of a bearing is the extraction of signal fault features. It is particularly crucial to select features that can accurately represent the fault category in order to improve the accuracy of the diagnosis results. From the above analysis, the fault information displayed by different categories of features is not the same, so it is necessary to establish a high-dimensional feature set that can represent the fault state of the bearing to a large extent. In this paper, based on the different characteristics of the system, the time domain and frequency domain statistical parameters, the wavelet packet decomposition energy and the entropy are extracted to compose the feature set for subsequent operations.

B. SUPPORT VECTOR MACHINE

The Support Vector Machine (SVM) [49], [50] is a classification method based on the principle of structural risk minimization proposed by Vapnik et al. The main purpose of the SVM is not only to correctly classify the various sample points, but also to maximize the spacing between them, that is, to maximize the minimum distance between the optimal separating hyperplane and all training sample points. The principle can be described as follows:

Given an original data sample set:

$$\left\{ (x_i, y_i) \mid x_i \in R^d,\; y_i \in \{-1, +1\},\; i = 1, 2, \ldots, n \right\} \tag{3}$$

where $n$ is the number of training data samples, and $x_i$ is the input of the model; $d$ represents the dimension of the training sample; $y_i$ is the sample category; $-1$ and $+1$ are category labels.

For the linearly separable case, the separation plane equation is $w \cdot x + b = 0$. The sample $(x_i, y_i)$ needs to satisfy:

$$y_i \left[ (w \cdot x_i) + b \right] \geq 1, \quad i = 1, 2, \ldots, n \tag{4}$$

where $w$ is the plane normal vector and $b$ is the constant term. The distance between the nearest sampling point and the separation plane is $1/\|w\|$. Therefore, maximizing the spacing $1/\|w\|$ is equivalent to minimizing $\|w\|^2$. The separation plane determined by $w$ is the optimal separation plane, and the sample points on the planes $w \cdot x + b = \pm 1$ are called support vectors.

The Lagrange optimization method is adopted to convert the problem into its dual problem, namely, the maximization of the function:

$$\max W(\alpha) = \sum_{i=1}^{n} \alpha_i - \frac{1}{2} \sum_{i,j=1}^{n} \alpha_i \alpha_j y_i y_j \left( x_i \cdot x_j \right) \tag{5}$$

where $\alpha_i$ is the Lagrange multiplier, and $\alpha_i \geq 0$, $i = 1, \ldots, n$. It actually aims to find the optimal solution of a quadratic function with constraints, and the sample that corresponds to the

non-zero $\alpha_i$ in the solution is a support vector, so that the optimal classification function can be obtained in this way:

$$f(x) = \operatorname{sgn}\left\{ (w^* \cdot x) + b^* \right\} = \operatorname{sgn}\left\{ \sum_{i=1}^{n} \alpha_i^* y_i (x_i \cdot x) + b^* \right\} \tag{6}$$

where $\alpha_i^*$ is the optimal Lagrange multiplier and $b^*$ is the classification threshold, which are the parameters that determine the optimal separating hyperplane. The sign of the function indicates the class.

For the linearly inseparable case, the slack variable $\xi_i$ is introduced, so as to convert the problem of finding the hyperplane into a quadratic programming problem:

$$\begin{cases} \phi(w) = \min \dfrac{1}{2} \|w\|^2 + C \sum_{i=1}^{n} \xi_i \\ \text{s.t. } y_i \left[ (w \cdot x_i) + b \right] \geq 1 - \xi_i,\; \xi_i \geq 0,\; i = 1, \ldots, n \end{cases} \tag{7}$$

where $\xi_i$ is the positive slack variable that allows misclassification, representing the deviation of the corresponding data point $x_i$ from the hyperplane. $C$ is the penalty factor, indicating the degree of punishment for misclassification. It controls the trade-off in the objective function between finding the hyperplane with maximal spacing and keeping the deviation of the data points to a minimum.

For the nonlinearly separable case, the low-dimensional input space can be mapped into a high-dimensional feature space by introducing the kernel function, so that linear classification is realized after the nonlinear transformation. In this case, the classification function becomes:

$$f(x) = \operatorname{sgn}\left\{ \sum_{i=1}^{n} \alpha_i^* y_i K(x_i, x) + b^* \right\} \tag{8}$$

where $K(x_i, x)$ is the kernel function.

Replacing the inner product of the original space with a kernel function is the key to SVM. Common kernel functions [51] are as follows: 1) Linear kernel $K(x, y) = x \cdot y$; 2) Polynomial kernel $K(x, y) = [(x \cdot y) + 1]^q$; 3) Radial basis function (RBF) kernel $K(x, y) = \exp\left( -\|x - y\|^2 / 2\sigma^2 \right)$; 4) Sigmoid kernel $K(x, y) = \tanh\left( \alpha (x \cdot y) + b \right)$.

The fault diagnosis of rolling bearings is usually a multi-class identification task. In view of the good classification ability of SVM for nonlinear problems and small training sets, it is still widely used in the field of machine fault diagnosis. Liu et al. [52] used the SVM to verify the superiority of a method that merges Minimum Entropy Deconvolution (MED) with the hierarchical fuzzy entropy; Wan et al. [53] combined the empirical wavelet transform (EWT) with multi-scale entropy to obtain new features, and fed them into an SVM to improve the fault diagnosis efficiency of the bearing; in [54], a novel rolling bearing fault diagnosis strategy was proposed based on improved multi-scale permutation entropy (IMPE), Laplacian score (LS) and least squares support vector machine-quantum behaved particle swarm optimization (QPSO-LSSVM); in [55], Zhu et al. proposed a multi-scale global fuzzy entropy (MGFE) feature extraction method, introduced the multiple class feature selection (MCFS) method to filter features, and finally fed the features into an SVM for fault diagnosis.

However, most literature on bearing diagnostics uses SVM for pattern recognition only in the final step of its algorithm. In this paper, the SVM is directly introduced into the feature selection part to diagnose with a single feature at a time. According to the diagnostic rate of each feature, it is judged whether the feature has a strong correlation with the bearing fault information, and the invalid information is eliminated. By this method, features capable of expressing fault information in the feature set can be extracted to a greater extent. The method thus performs a first screening of the original feature set.

C. CORRELATION ANALYSIS

In the field of statistical signal processing research, correlation analysis has long been a focus of scholars. Correlation analysis uses two related sets of variables to reflect their overall relevance. The Pearson correlation coefficient represents the degree of linear correlation between the two sets of variables [56]. The Pearson correlation coefficient $R$ can be expressed as:

$$R = \frac{\operatorname{cov}(A, B)}{\sigma_A \sigma_B} = \frac{\sum_{i=1}^{N} \left( A_i - \bar{A} \right) \left( B_i - \bar{B} \right)}{\sqrt{\sum_{i=1}^{N} \left( A_i - \bar{A} \right)^2 \sum_{i=1}^{N} \left( B_i - \bar{B} \right)^2}} \tag{9}$$

where $A$ and $B$ represent two sets of features of equal length. $N$ is the number of samples in the variable; $A_i$ and $B_i$ are the $i$-th measurements of variables $A$ and $B$; $\bar{A}$ and $\bar{B}$ are the averages of variables $A$ and $B$, respectively.

The correlation coefficient $R$ ranges from $-1$ to $+1$. When the value is 0, there is no linear correlation between the two features. If the value is in $[-1, 0)$, the two features are negatively correlated; if the value is in $(0, +1]$, the two features are positively correlated. The closer the absolute value of the correlation coefficient $R$ is to 1, the higher the degree of correlation between the two features, indicating that more of the information in the two features is duplicated; when the absolute value of $R$ is 1, the information represented by the two features can be replaced with each other. Therefore, the larger the absolute value of $R$, the lower the significance of the corresponding feature [57]: 1) $|R| \geq 0.8$, highly correlated; 2) $0.5 \leq |R| < 0.8$, moderate correlation; 3) $0.3 \leq |R| < 0.5$, low correlation; 4) $|R| < 0.3$, weak correlation, which can be regarded as no linear correlation.

In this part, the Pearson correlation coefficient method is used to screen the previously selected features again. Within each type of feature, the highly correlated features are grouped, and only one main feature of the group is taken as a sensitive feature. The physical information expressed by the remaining features is considered basically the same as that of this sensitive feature, so they are redundant features and are excluded, which achieves the purpose of the second screening.
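The second screening described above can be illustrated with a small sketch. The `pearson` function follows Eq. (9) directly. The greedy rule in `drop_redundant` (visit features from best to worst score and keep one only if its |R| with every feature already kept is below 0.8) is an assumption about how "only one of the main features is taken" might be realized, not the authors' stated procedure. The feature values and the `scores` dictionary, standing in for single-feature SVM diagnostic rates from the first screening, are made-up illustration data.

```python
import math


def pearson(a, b):
    """Pearson correlation coefficient R of two equal-length sequences, Eq. (9)."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    norm_a = math.sqrt(sum((x - mean_a) ** 2 for x in a))
    norm_b = math.sqrt(sum((y - mean_b) ** 2 for y in b))
    return cov / (norm_a * norm_b)


def drop_redundant(features, scores, threshold=0.8):
    """Greedy redundancy filter (assumed rule, see lead-in).

    features: dict mapping feature name -> list of values over the samples
    scores:   dict mapping feature name -> single-feature evaluation score
    """
    kept = []
    for name in sorted(features, key=lambda k: scores[k], reverse=True):
        if all(abs(pearson(features[name], features[k])) < threshold for k in kept):
            kept.append(name)
    return kept


# Hypothetical feature values for five samples and hypothetical first-screening scores.
features = {
    "rms": [1.0, 2.0, 3.0, 4.0, 5.0],
    "std": [1.1, 2.1, 2.9, 4.2, 5.0],
    "kurtosis": [3.0, 1.0, 4.0, 1.5, 3.5],
}
scores = {"rms": 0.95, "std": 0.90, "kurtosis": 0.80}
selected = drop_redundant(features, scores)
```

In this toy data, `std` is almost a linear copy of `rms` (|R| above 0.99), so it is treated as redundant, while `kurtosis` is only weakly correlated with `rms` and is retained.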

D. PRINCIPAL COMPONENT ANALYSIS AND corresponds to the eigenvalues λi of the original feature, and
WEIGHTED LOAD EVALUATION then the contribution rate of the j-th principal component is:
1) PRINCIPAL COMPONENT ANALYSIS λj
Principal component analysis (PCA) is a commonly used data Cj = Pm ∗ 100% (12)
i=1 λj
processing and analysis method. Its purpose is to reduce the
Since the variance of each principal component is decreas-
data to eliminate overlapping information in the coexistence
ing, the amount of information contained is also decreasing.
of many information [3], [58], [59].
Therefore, in the actual analysis, it is generally not to select
PCA maps high-dimensional data space to low-dimensional
m principal components, but to select the first k principal
space by orthogonal transform, which recombines many
components (Ak reaches 85%-90%) according to the cumu-
original features with certain correlation into a group of rel-
lative contribution rate of each principal component. The
atively uncorrelated integrated features. This not only retains
contribution rate here refers to the proportion of the variance
the main information of the original variables, but the new
of a principal component to the total variance, that is, the pro-
features are not related to each other. From the mathematical point of view, the m-dimensional feature is mapped to k (k < m) dimensions, and the obtained k-dimensional feature is the principal component feature extracted from the original data feature. This k-dimensional principal component feature already contains most of the information. These new variables are uncorrelated and are arranged in descending order of variance [60]. The specific analysis steps are as follows:

(1) Original data standardization: For a group of n evaluation objects, there are m features $X_1, X_2, \ldots, X_m$; the j-th indicator value of the i-th evaluation object is denoted $x_{ij}$, so the sample space matrix of the evaluation objects can be obtained:

$$X = (X_1, X_2, \ldots, X_m) = \begin{pmatrix} x_{11} & \cdots & x_{1m} \\ \vdots & \ddots & \vdots \\ x_{n1} & \cdots & x_{nm} \end{pmatrix} \tag{10}$$

where $X_i = (x_{1i}, x_{2i}, \ldots, x_{ni})^T$, $i = 1, 2, \ldots, m$. The indicators $x_{ij}$ are standardized:

$$\tilde{x}_{ij} = \frac{x_{ij} - \mu_j}{S_j}, \quad i = 1, 2, \ldots, n; \; j = 1, 2, \ldots, m. \tag{11}$$

where $\mu_j = \frac{1}{n}\sum_{i=1}^{n} x_{ij}$ and $S_j = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}\left(x_{ij} - \mu_j\right)^2}$ are the sample mean and standard deviation of the j-th feature, respectively; the corresponding $\tilde{X}_j = (X_j - \mu_j)/S_j$ is the standardized feature variable.

(2) Correlation coefficient matrix: the correlation coefficient of the standardized features is $r_{ij} = \frac{\sum_{k=1}^{n} \tilde{x}_{ki}\,\tilde{x}_{kj}}{n-1}$, $i, j = 1, 2, \ldots, m$. The correlation coefficient matrix is $R = (r_{ij})_{m \times m}$, with $r_{ii} = 1$ and $r_{ij} = r_{ji}$.

(3) Computation of eigenvalues and eigenvectors: from the correlation coefficient matrix, the eigenvalues $\lambda_1 \geq \lambda_2 \geq \ldots \geq \lambda_m \geq 0$ are obtained in descending order. $\alpha_j = (\alpha_{1j}, \alpha_{2j}, \ldots, \alpha_{mj})^T$ is the eigenvector corresponding to the j-th eigenvalue $\lambda_j$. The eigenvectors $\beta_1, \beta_2, \ldots, \beta_m$ are obtained after orthogonalization and normalization on this basis, where $\beta_j = (\beta_{1j}, \beta_{2j}, \ldots, \beta_{mj})^T$.

(4) Selection of important principal components: Let the principal components be $F_1, F_2, \ldots, F_m$. The contribution rate and cumulative contribution rate of the principal components are calculated from the eigenvalues computed above. The contribution of each principal component

portion of a certain eigenvalue to the total eigenvalues:

$$A_k = \frac{\sum_{i=1}^{k} \lambda_i}{\sum_{i=1}^{m} \lambda_i} \times 100\% \tag{13}$$

The greater the variance contribution rate, the stronger the ability of the selected principal components to reflect comprehensive information.

2) WEIGHTED LOAD EVALUATION
In practical applications, after selecting the important principal components, we must also pay attention to the interpretation of their actual meaning. Since each new principal component is obtained by an orthogonal transformation of the original features, it reflects the comprehensive information of multiple original variables. Therefore, it is difficult to obtain the original fault information directly from the principal components. To this end, this article uses the load analysis method to obtain the load factor matrix of the k principal components and to score the original features.

(1) Load factor matrix:
It can be learned from the principle of principal component analysis that each principal component can be obtained as a linear combination of $X_1, X_2, \ldots, X_m$:

$$\begin{aligned}
F_1 &= \alpha_{11} \times X_1 + \alpha_{21} \times X_2 + \cdots + \alpha_{m1} \times X_m \\
F_2 &= \alpha_{12} \times X_1 + \alpha_{22} \times X_2 + \cdots + \alpha_{m2} \times X_m \\
&\;\;\vdots \\
F_m &= \alpha_{1m} \times X_1 + \alpha_{2m} \times X_2 + \cdots + \alpha_{mm} \times X_m
\end{aligned} \tag{14}$$

Each principal component $F_i$ in (14) corresponds to the i-th eigenvalue $\lambda_i$:

$$F_i = X \alpha_i \quad (i = 1, 2, \ldots, m). \tag{15}$$

According to the correlation matrix theorem, $\alpha_i$ satisfies Equation (16):

$$\sum_{i=1}^{m} \alpha_i \alpha_i^T = 1 \tag{16}$$

Combining (15) and (16), we get:

$$X = \sum_{i=1}^{m} F_i \alpha_i^T \tag{17}$$
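As an illustration of the cumulative contribution rate in (13), the following minimal sketch computes the contribution rates from a list of eigenvalues and picks the smallest k whose cumulative rate reaches a threshold. The eigenvalues are hypothetical, and the function names and the 90% default threshold are assumptions made for this example rather than part of the original implementation.

```python
def contribution_rates(eigenvalues):
    """Per-component and cumulative contribution rates, in percent (cf. Eq. 13)."""
    ordered = sorted(eigenvalues, reverse=True)  # lambda_1 >= lambda_2 >= ...
    total = sum(ordered)
    rates = [100.0 * lam / total for lam in ordered]
    cumulative = []
    running = 0.0
    for rate in rates:
        running += rate
        cumulative.append(running)
    return rates, cumulative


def select_k(eigenvalues, threshold=90.0):
    """Smallest k whose cumulative contribution rate A_k reaches the threshold."""
    _, cumulative = contribution_rates(eigenvalues)
    for k, a_k in enumerate(cumulative, start=1):
        if a_k >= threshold:
            return k
    return len(cumulative)


# Hypothetical eigenvalues of a correlation matrix (illustrative only).
eigenvalues = [3.5, 2.0, 1.0, 0.3, 0.15, 0.05]
rates, cumulative = contribution_rates(eigenvalues)
k = select_k(eigenvalues)
print("contribution rates (%):", [round(r, 2) for r in rates])
print("cumulative rates (%):", [round(c, 2) for c in cumulative])
print("components retained:", k)
```

With these illustrative values, the first three components are retained because their cumulative contribution rate is the first to exceed the threshold.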

As mentioned above, the k principal components $F_1, F_2, \ldots, F_k$ (k < m) are obtained from the original features $X_1, X_2, \ldots, X_m$, and they satisfy $\mathrm{COV}(F_i, F_j) = 0$, namely, $F_i$ and $F_j$ are not correlated. Because their variances $D(F_i)$ are the largest, the first k principal components can represent the majority of the information in the original features at a lower dimensionality. The linear equations of the first k principal components can be derived:

$$F_i = \alpha_{1i} \times X_1 + \alpha_{2i} \times X_2 + \cdots + \alpha_{ki} \times X_k, \quad i = 1, 2, \ldots, k. \tag{18}$$

Then the original features can be expressed:

$$\hat{X} = \sum_{i=1}^{m} F_i \alpha_i^T \tag{19}$$

where the combination coefficient $\alpha_i = (\alpha_{1i}, \alpha_{2i}, \ldots, \alpha_{ki})^T$ of each principal component is the load factor matrix corresponding to the eigenvalue of the original feature.

(2) Original feature evaluation
The eigenvector coefficients of the principal components can be calculated from the obtained principal component eigenvalues, contribution rates, and load factors, as shown in (20) [61].

$$f_{ij} = \frac{\alpha_{ij}}{\sqrt{\lambda_i}}, \quad i = 1, 2, \ldots, k, \; j = 1, 2, \ldots, m. \tag{20}$$

where $f_{ij}$ is the eigenvector coefficient of the principal component, $\alpha_{ij}$ is the component load of each feature under the principal component, and $\lambda_i$ is the eigenvalue of the corresponding principal component.

The weight $\omega_i$ corresponding to each principal component can be obtained from the corresponding variance contribution rate. A mathematical model of the principal component composite score can be obtained by linearly summing all the principal components.

$$F = \sum_{i=1}^{k} \omega_i \times F_i, \quad i = 1, \ldots, k. \tag{21}$$

The weighted load score $\nu$ of the original feature can be expressed:

$$\nu = \sum_{i=1}^{k} \omega_i \times \alpha_{ij}, \quad i = 1, \ldots, k, \; j = 1, \ldots, m. \tag{22}$$

III. METHOD AND SYSTEM FRAMEWORK
The information represented by a single feature is limited and does not fully reflect the fault information of the bearing signal. Extracting multiple features can determine the fault category more accurately. Therefore, it is necessary to construct multiple features with different dimensions, such as statistical parameters, energy and various entropies, and to use the differences and complementarity between features to construct a more comprehensive high-dimensional feature set that expresses fault type information. However, if the dimension of the feature set is too high, it will inevitably be contaminated by invalid or redundant features and may lead to the "curse of dimensionality", which increases the amount of calculation and reduces the prediction efficiency. Therefore, it is necessary to reduce the dimension of the feature set as much as possible while preserving the integrity of the information, so as to obtain the best feature vector for the processing background. Based on the most sensitive features, bearing faults can be diagnosed and the operation of the equipment can be further guided.

A. CONSTRUCTION OF FAULT FEATURE SET
According to the multi-dimensional feature extraction method mentioned in section 2.1, the feature set $Q_1$ of the bearing under a certain machining condition can be constructed. The feature set contains four types of features, namely four sub-feature sets $\{T_1, P_1, E_1, S_1\}$, which represent time-domain features, frequency-domain features, energy features and information entropy features, respectively, where $Q_1 = T_1 + P_1 + E_1 + S_1$.

Twelve commonly used time-domain parameters $t_1 \sim t_{12}$ are selected to form the time-domain feature set $T_1$, including: mean, RMS, absolute mean, amplitude of RMS, peak-to-peak value, peak factor, standard deviation, kurtosis factor, form factor, pulse factor, margin factor and skewness factor. For the frequency-domain parameters, due to the working principle of the bearing, the corresponding fault frequency component is generated when the bearing fails. The change of each frequency component in the signal causes corresponding changes in the power spectrum. By describing the variation of the main frequency band in the power spectrum, the frequency-domain feature variation of the bearing signal can be well described. The frequency-domain feature set consists of CF, RMSF, and STDF: $P_1 = \{p_1, p_2, p_3\}$. The relevant calculation equations are shown in Table 1.

The energy of each frequency component in the signal contains a wealth of fault information. This article decomposes the original signal by means of wavelet packet decomposition. By conducting an i-layer wavelet packet decomposition on the original signal X, a wavelet packet decomposition sequence $S_{i,j}$ $(j = 1, 2, \ldots, 2^i)$ can be obtained. The energy of the reconstructed signal corresponding to each frequency band is expressed in quadratic (squared-amplitude) form; then the energy spectrum [62] of the j-th frequency band of the i-th layer of the wavelet packet decomposition is:

$$E_{i,j}(l) = \left[x_{i,j}(l)\right]^2 \tag{23}$$

where $x_{i,j}(l)$ is the discrete point amplitude of the reconstructed signal, j is the frequency band serial number of the i-th layer after decomposition, l is the sampling point serial number $(l = 1, 2, \ldots, n)$, and n is the total number of signal sampling points. Then the wavelet packet energy spectrum of each frequency band can be obtained: $E_i = \left(E_{i,1}, E_{i,2}, \cdots, E_{i,2^i}\right)^T$. The total signal energy $E_T$ in a certain time window is equal to the sum of the energy of each component. This constitutes an energy feature set: $E_1 = \{e_1, e_2, \ldots, e_{2^i}, E_T\}$.
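The construction of the energy feature set can be sketched as follows. This is a minimal example under stated assumptions: the reconstructed band signals are taken as given (the wavelet packet decomposition itself is not shown), each band energy $e_j$ is assumed to be the sum of the squared amplitudes in (23) over all sampling points, and the function names and toy data are hypothetical.

```python
def band_energy(band_signal):
    """Energy of one reconstructed band: sum over l of x_{i,j}(l)^2 (cf. Eq. 23)."""
    return sum(x * x for x in band_signal)


def energy_feature_set(band_signals):
    """Return [e_1, ..., e_{2^i}, E_T], with E_T the sum of all band energies."""
    energies = [band_energy(signal) for signal in band_signals]
    total_energy = sum(energies)
    return energies + [total_energy]


# Toy reconstructed signals for a 2-layer decomposition (2^2 = 4 bands).
# These values are illustrative only and do not come from the bearing data.
bands = [
    [1.0, -2.0, 0.5],
    [0.1, 0.2, -0.1],
    [0.0, 3.0, 0.0],
    [0.5, 0.5, 0.5],
]
features = energy_feature_set(bands)
print("band energies:", [round(e, 4) for e in features[:-1]])
print("total energy E_T:", round(features[-1], 4))
```

In practice the band signals would come from reconstructing the wavelet packet coefficients of each node, for example with a wavelet library; that step is left out here to keep the sketch self-contained.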

TABLE 1. Time-domain and frequency-domain feature parameters.
Let $p_j = E_{i,j}/E$ and $\sum p_j = 1$; then the corresponding wavelet energy spectrum entropy can be given according to the measurement of information entropy, that is:

$$S = -\sum_{j=1}^{2^i} p_j \log_2 p_j \tag{24}$$

This constitutes an entropy feature set: $S_1 = \{s_1, s_2, \ldots, s_{2^i}, S_T\}$.

B. FEATURE SELECTION AND WEIGHTING FUSION
1) FEATURE SELECTION
First, single-variable feature selection is conducted through SVM: each feature in the feature set is taken in turn as the input of the SVM classifier, which yields its diagnostic rate ϕ. The features with ϕ ≥ 50% are considered to be closely correlated with the bearing fault information, so they are retained; the features with ϕ < 50% are regarded as invalid features and eliminated. A screened feature set $Q_2$ is thereby obtained.

Then, the similarity γ is obtained from the correlation analysis of the four sub-feature sets in $Q_2$. A group of features with γ ≥ 85% is considered to contain highly similar bearing fault information, so only the feature with the highest diagnostic rate from the previous round of screening is retained as the main feature, and the remaining features are regarded as redundant and removed. In this way, the second round of screening is completed and the feature set $Q_3$ is obtained.

Any sub-feature set with more than 5 features is further screened by means of the PCA-WLE method. Firstly, PCA is used, and only the k principal components with a contribution degree greater than 90% and in close correlation with bearing fault information are selected. Secondly, the original features are scored according to the principal components. The load matrix of these principal components is calculated, and the principal components are weighted and fused in accordance with their contribution rates so as to obtain the score ranking of the original features. The three features with the highest total score in the principal components are selected through the weighted calculation method of load evaluation, so that the four new sub-feature sets ($T_4$, $P_4$, $E_4$ and $S_4$) are formed, which constitute the total feature set $Q_4$. This completes the feature selection process.

2) FEATURE WEIGHTING FUSION
The weight of a feature is regarded as an evaluation of the feature importance. The sensitivity of the evaluated features to failure is expressed by assigning them values in [0, 1]. According to the feature selection process, a set of features that are sensitive to fault information has been obtained. However, the correlation analysis shows that the fault information reflected by different sensitive features differs, so weighted fusion of the selected sensitive features is necessary to obtain more accurate and reliable data analysis results.

The corresponding diagnosis success rate $R_i$ is obtained via putting the obtained new sub-feature set $Q_4$ into the SVM

classifier for diagnosis; the weight $W_i$ of the sub-feature set is obtained according to $R_i$, which, to some extent, can represent the ability of the fault information in diagnosing the bearing [63]. The corresponding calculation formula is:

$$W_i = \frac{R_i}{\sum_{i=1}^{M} R_i}, \quad i = 1, 2, \ldots, M. \tag{25}$$

where M is the number of features.

Before the weighted fusion of features, it is necessary to standardize the features so that features with larger data values do not swamp those with smaller data values, and so that different dimensions do not affect the calculation results. The feature value $q_i$ of the i-th feature is normalized according to (26).

$$q'_i = \frac{q_i - \min(q_i)}{\max(q_i) - \min(q_i)}, \quad i = 1, 2, \ldots, M. \tag{26}$$

A feature that can describe the fault information to the greatest extent can be obtained through multiplying the feature $q_i$ in feature set $Q_4$ by its corresponding weight $W_i$ and then summing them up. The sum of the weights is 1. The new fusion feature obtained from this linear weighted combination is calculated as follows:

$$M = q_1 \cdot W_1 + q_2 \cdot W_2 + \ldots + q_i \cdot W_i, \qquad \begin{cases} W_i > 0 \\ \sum W_i = 1 \end{cases} \quad i = 1, 2, \ldots, M. \tag{27}$$

The weight $W_i$ indicates the sensitivity of the selected feature to the fault. Combined with SVM, the fault type and severity of the experimental data are identified to verify the effectiveness of the method.

C. SYSTEM BLOCK DIAGRAM
The implementation flow of the feature screening model proposed in this paper is shown in Fig. 1. Taking the fault diagnosis of the bearing as the goal, the process is divided into five steps: signal processing, feature extraction, feature selection, feature weighting fusion and pattern recognition.
Step 1: The vibration signal is collected during bearing operation, and the collected signal is decomposed into different frequency bands.
Step 2: Feature extraction is performed on the original signal and each frequency band signal, and its time-domain, frequency-domain, energy and entropy features are obtained and constitute the original high-dimensional feature set.
Step 3: The original features are sequentially subjected to three feature selection processes: SVM single feature selection, correlation analysis and PCA-WLE. The invalid and redundant features in the feature set are eliminated to obtain the low-dimensional sensitive feature set.
Step 4: The corresponding diagnosis rates are obtained by inputting the features in the low-dimensional feature set into the SVM one at a time, and the corresponding weights are obtained based on the diagnosis rates. A fusion feature that can describe the fault information to the greatest extent can be obtained through multiplying each standardized feature by its corresponding weight and then summing them up.
Step 5: The fusion feature is used as an input for pattern recognition so as to train the fault classifier, and the obtained weights can be further reflected in the feature extraction process to guide the model to perform feature extraction according to a certain weight.

IV. EXPERIMENTS AND ANALYSIS RESULTS
A. CASE1
1) DATA DESCRIPTION
In order to verify the feature selection method proposed in this paper, the rolling bearing fault signal provided by the laboratory of Case Western Reserve University (CWRU) [64] is taken as an example for testing. The bearing parameters used in the test are shown in Table 2. The entire test stand consists of a three-phase asynchronous motor (left), a torque encoder (center), a dynamometer (right) and associated vibration acceleration sensors, as shown in Fig. 2.

Single-point faults were introduced to the test bearings using electro-discharge machining, with fault positions on the inner raceway, outer raceway and rolling element. Fault diameters include 0.1778mm, 0.3556mm and 0.5334mm (fault severity: mild fault, moderate fault and severe fault). Together with the normal bearing data, the bearing data can be divided into 10 types for each condition. The motor no-load speed is 1797r/min.

The vibration signals at the driving end are recorded by the acceleration vibration sensor with a sampling frequency of 12 kHz under different motor loads of 0-3 horsepower (motor speeds of 1730 to 1797 rpm). The data under the three load conditions form three data sets A, B and C, respectively. As can be learned from the motor speed and the sensor sampling frequency, about 400 data points are collected in one rotation of the bearing. Therefore, in order to ensure that the length of a single sample can completely and accurately reflect the data distribution of the bearing vibration signals in this state, the first 120000 points of the raw data in each sample are taken, and every 1200 data points are regarded as one small sample, so that each raw data record produces 100 samples. The first 70 groups are used for establishing the sample knowledge base and the last 30 groups are the validation samples to test the validity of the method. Detailed information of the bearing vibration data set is shown in Table 3.

When the fault diameter is 0.5334mm, the samples with the bearing condition of 1hp in the normal state and different fault states are extracted, and the respective vibration signals are shown in Fig. 3.

2) ANALYSIS RESULTS
According to the system flow shown in Fig. 1, the extracted original signal $X\{x_1, x_2, \ldots, x_n\}$ is processed first. The "db5" wavelet is selected to decompose the vibration signal into four layers, and the characteristic signals of 16 frequency

FIGURE 1. Implementation of feature selection model.
TABLE 3. Detailed information of the bearing vibration data set.
FIGURE 2. Experimental test stand [3], [64].
TABLE 2. Tested bearing parameters.
bands at the fourth layer from the low frequency to the high frequency are obtained. The wavelet packet coefficients are reconstructed to obtain the reconstructed signal. By calculating the total band energy $e_i$ $(i = 1, 2, \ldots, 16)$ of each node in the fourth layer and the total energy $E_Z$ ($E_Z = \sum_{i=1}^{16} e_i$) of the fourth layer, the energy feature set $E_1 = \{e_1, e_2, \ldots, e_{16}, E_Z\}$ can be formed. According to the energy ratio $p_i = e_i/E_Z$ of different frequency bands, the information entropy characteristic of the signal can be obtained, and the information entropy feature set $S = \{s_1, s_2, \ldots, s_{16}, S_Z\}$ can be formed. The time domain feature set and the frequency domain feature set

TABLE 4. Single fault feature correlation rate.
FIGURE 3. Vibration signal of bearing in different states (fault diameter is 0.5334mm).
TABLE 5. PCA analysis results of remaining entropy features.
FIGURE 4. Trends in the characteristics of training samples.
FIGURE 5. The SFDR of 49 features of training samples.

are $T = \{t_1, t_2, \ldots, t_{12}\}$ and $P = \{p_1, p_2, p_3\}$, respectively.

The fault feature set $Q_1$ contains 49 fault features, each of which has different distinguishing characteristics and variation degrees. In this paper, the features of mean value, RMS, skewness, $e_1$, $e_9$ and $s_9$ are used as examples, as shown in Fig. 4. The degree of correlation between features and fault information therefore differs, which requires a further screening to extract the sensitive features that are more suitable for fault diagnosis.

The obtained original feature set is input into the SVM for single feature selection, and the single feature diagnostic rates (SFDR) of the different features are obtained, as shown in Fig. 5. Among them, features 1-12 (red), 13-15 (blue), 16-32 (yellow) and 33-49 (green) represent time-domain, frequency-domain, wavelet energy and wavelet entropy features, respectively, and the horizontal axis represents the feature number. Based on the screening threshold of 50%, a feature with a diagnosis rate lower than 50% is considered invalid because it is only weakly correlated with the intrinsic information of the bearing fault. According to Fig. 5, only the features with a diagnosis rate greater than 50% are retained, which are 25 features in total.

By conducting correlation analysis on the remaining parts of the four types of features and introducing (9), the linear correlation degree between any two features can be obtained. The results are shown in Table 4.

When multiple feature correlations are greater than 85%, only the one feature with the highest diagnostic rate is retained, which corresponds to the gray portion in the table. It can be observed from the table that there are still 7 remaining features based on entropy and only 4 remaining features of the other three types. In order to further refine the remaining effective features and reduce the amount of computation, the remaining information entropy features are processed by the PCA-WLE method. The seven principal components and the corresponding cumulative contribution rates are obtained, as shown in Table 5.

The (first four) principal components with a cumulative contribution rate of about 90% are retained, and their load

FIGURE 6. The principal component and comprehensive score.
FIGURE 7. The Feature weights of sensitive features.
FIGURE 8. Bearing fault diagnosis result based on weighted fusion feature.

factor matrixes (TABLE 5) are calculated. Taking principal component $F_1$ as an example, its linear expression is:

$$F_1 = 0.4895 \times S_1 + 0.7061 \times S_2 + 0.3975 \times S_3 + 0.7866 \times S_6 - 0.2930 \times S_7 + 0.9036 \times S_8 + 0.7944 \times S_{13}. \tag{28}$$

By introducing the obtained principal component eigenvalues and load factors into (20), the eigenvector coefficients of the principal components can be obtained.

The weight $\omega_i$ for each principal component can be obtained from the corresponding variance contribution rate; the weights are 0.4620, 0.2931, 0.1604 and 0.0844, respectively. By linearly summing all the principal components, the weighted load scores of the original features can be obtained (Table 5). Fig. 6 is a graph of each principal component and the comprehensive score. It can be seen that the comprehensive score considers the difference of the four principal components and eliminates the problem of large fluctuation between different principal components.

$$F = 0.4620 \times F_1 + 0.2931 \times F_2 + 0.1604 \times F_3 + 0.0844 \times F_4 \tag{29}$$

According to the score, the top three features are: $S_2$, $S_1$ and $S_{13}$. By this means, the specific fault feature set of this case can be formed:

$$A = \{T_2, T_9, P_3, E_Z, S_1, S_2, S_{13}\}$$

By inputting the remaining features into the SVM again, the diagnostic result rate $R_i$ can be obtained. According to (27), the corresponding weights are obtained as 0.1633, 0.1179, 0.1693, 0.1361, 0.1451, 0.1391 and 0.1293, which respectively represent the proportion of the features in the fault feature set. Fig. 7 shows the weights of the 7 features after screening.

The training set is input into the SVM classifier to train the classification model, and the model is optimized by Particle Swarm Optimization (PSO) to obtain the optimal parameters c and g of the SVM. The normalized test set features, after multiplying by the corresponding weights, are input into the SVM classifier to identify the fault category, and the result is shown in Fig. 8.

With the purpose of verifying the validity and applicability of the proposed method of selecting fault features, the four different types of screened features are compared with the weighted fusion features obtained by this method. Table 7 presents the comparison results.

The selected features with weighted fusion in this paper have a higher diagnostic accuracy than any single feature type. Moreover, the diagnostic rate of the selected features with weighted fusion is higher than that of the selected features without weighted fusion (from 95.67% to 99.61%).

B. CASE2
1) DATA DESCRIPTION
In this case, the accelerated bearing lifetime test data provided by Xi'an Jiaotong University School of Mechanical Engineering is taken as the input of the model. The accelerated life testing stand consists of an alternating current (AC) induction motor, a motor speed controller, a support shaft, two support bearings (heavy duty roller bearings) and a hydraulic loading system [65], as shown in Fig. 9. The bearing parameters used in the test are shown in Table 8. Each set of data contains the whole process data of the rolling bearing ranging from normal operation to severe failure (10 times the maximum amplitude in normal operation) under this operating condition.

Based on the 25.6kHz sampling frequency of the acceleration sensor and the 1min sampling interval, in this test, about 700 data points are collected during a complete rotation

TABLE 6. Features score and rank.
TABLE 7. Bearing fault diagnosis results of different types of feature sets.
TABLE 8. Tested bearing parameters.
TABLE 9. Detailed information of the bearing vibration data set.
FIGURE 9. Experimental test stand [65].
FIGURE 10. The SFDR of 49 features of training samples.

of the bearing. In order to ensure that the length of a single sample can completely express the signal data distribution, the first 32200 points in each small sample are taken. With a sample length of 700 points, each original sample can generate 46 small sample sets. The first 30 groups are used for establishing the sample knowledge base and the last 16 groups are the validation samples to test the validity of the method.

The data under the operation condition at 2250r/min of rotation rate and 11kN of radial load is selected for verification. Detailed information of the bearing vibration data set is shown in Table 9. The vibration signals, according to their amplitude changes, can be divided into four types: normal data, mild damage, moderate damage and severe damage.

2) ANALYSIS RESULTS
Similarly, the "db5" wavelet is applied to decompose the vibration signal into 4 layers and extract the features from the original vibration signal. In this way, 49 fault features can also be obtained. By inputting the obtained original feature set into SVM for single feature screening, 26 fault features with a diagnosis rate greater than 50% are finally obtained, as shown in Fig. 10.

As can be learned from Table 10, the summary of correlation analysis on the 26 remaining features, by removing

TABLE 10. Single fault feature correlation rate.
TABLE 11. Features score and rank.
FIGURE 11. PCA analysis results of remaining entropy features.
FIGURE 12. The principal component and comprehensive score.

the redundant features through correlation analysis, one time domain feature (RMS), one frequency domain feature (CF), six energy features, and seven entropy features can be obtained. Therefore, the PCA-WLE method shall be adopted to separately reduce the dimensionality of the energy features and entropy features.

The principal component analysis results of the two types of features are separately shown in Fig. 11. Taking energy features for example, the principal components (first three) with cumulative contribution rate greater than 90% are retained, and their load factor matrix as well as principal component eigenvector coefficients (Table 11) are calculated. Taking the principal component $F_1$ as an example, its linear expression is:

$$F_1 = 0.3661 \times e_1 + 0.4601 \times e_6 + 0.4102 \times e_8 + 0.2890 \times e_{12} + 0.4170 \times e_{14} + 0.4781 \times E_z \tag{30}$$

The weight of each principal component can be obtained via the corresponding variance contribution rate; the weights are 0.7341, 0.2121, and 0.0532, respectively. A weighted load score of the original features can be obtained by linearly summing all the principal components (Table 11). Fig. 12 presents each principal component and the integrated score.

$$F = 0.7341 \times F_1 + 0.2121 \times F_2 + 0.0532 \times F_3 \tag{31}$$

By processing the entropy features in the same way, the feature set needed by the bearing under this operation condition can finally be obtained:

$$A = \{T_2, P_3, e_1, e_6, e_{14}, S_1, S_8, S_{16}\}$$

The fault diagnosis rates corresponding to the above 8 sensitive features are: 0.7673, 0.8994, 0.5220, 0.5975, 0.5975, 0.7799, 0.5346 and 0.5975. By introducing these into formula (27), the corresponding weights can be obtained, which are 0.1449, 0.1698, 0.0986, 0.1128, 0.1128, 0.1473, 0.1009 and 0.1128. Fig. 13 shows the weights of the 8 features after screening.

By inputting the normalized test set to the trained PSO-SVM model after multiplying by the corresponding weights, the type of failure can be identified, and the results are shown in Fig. 14. According to Table 12, which shows the diagnosis results of different features, unweighted fusion features and the weighted fusion features proposed in this article, the sensitive features based on weighted fusion have a higher diagnosis rate than the single-domain features and unweighted fusion features.

FIGURE 13. The Feature weights of sensitive features.
FIGURE 14. Bearing fault diagnosis result based on weighted fusion feature.
TABLE 12. Bearing fault diagnosis results of different types of feature sets.
FIGURE 15. Nassi–Shneiderman diagram(N-S) of some features.

C. DISCUSSION
As can be observed from Table 7, the diagnosis results of features in different domains vary: the diagnosis results of time-domain features and entropy features are significantly better than those of frequency-domain features and energy features. This is because different features have their own degree of importance in fault diagnosis, and sensitive features closely related to fault information have a higher fault identification ability. Moreover, the number of features differs between domains; although some feature sets contain many features, most of them are either unrelated to the fault or highly overlapping, and they generate strong interference in the later fault diagnosis.

Compared to feature extraction in a single domain, the features extracted from multiple domains have a higher diagnosis rate. This shows that the multi-domain features can effectively utilize the difference and complementarity between the features, so as to discover the fault information hidden in the signal more accurately and comprehensively. Compared to the features without weighted fusion, the diagnostic rate after weighted fusion of multi-domain features is also improved. This means that weighting can further highlight the importance of sensitive features and expand their role in the classification process. Meanwhile, feature weighting also makes the clustering center closer to the dense region of the corresponding cluster, thus improving the clustering performance [38].

Although PCA is applied to process the features in this paper, the resulting principal components are not adopted directly, since a principal component cannot directly represent the physical meaning of the fault. Instead, the principal components are further processed to find the sensitive features with the highest scores. At the same time, SVM is also used in the process of feature selection, which can provide indicators that reflect feature differentiation through single feature extraction.

In Case1, the RMS, form factor, STDF, total energy of wavelet packet, and wavelet packet entropies $s_1$, $s_2$ and $s_{13}$ are eventually extracted as the final sensitive features to constitute the feature set, but this only shows that these features are closely related to this particular bearing failure. Different rotating machines and operating environments have different effects on vibration signals, so although the method proposed in this paper has so far been applied to bearings, it is still available for feature extraction and selection in other machines.

The method proposed in this paper still has room for improvement. First of all, considering the weak physical information contained in some of the features, the features that can be applied to fault diagnosis are not enumerated or applied in detail, which makes the fault feature set unable to fully describe the bearing fault information. Secondly, the SVM classifier used in this paper gives good diagnosis results and can demonstrate the superiority of this feature selection method. However, the diagnostic results can be further improved by optimizing the parameters of the SVM or by selecting a more advanced diagnostic algorithm. Finally, the proposed method classifies the categories and degrees

of the bearing fault at one time instead of considering that different fault levels of various fault types may correspond to the same feature value. As shown in Fig. 15, the value of form factor features of severe fault in inner ring highly overlaps that of minor fault in rolling element, which can cause misjudgment in the fault diagnosis and affect the final diagnosis result. Therefore, the respective wear degree shall be further differentiated after identifying the fault type. But the proposed feature selection method is still applicable to this identification plan.

[4] X. Xue and J. Zhou, "A hybrid fault diagnosis approach based on mixed-domain state features for rotating machinery," ISA Trans., vol. 66, pp. 284–295, Jan. 2017, doi: 10.1016/j.isatra.2016.10.014.
[5] Z. Wang, Q. Zhang, J. Xiong, M. Xiao, G. Sun, and J. He, "Fault diagnosis of a rolling bearing using wavelet packet denoising and random forests," IEEE Sensors J., vol. 17, no. 17, pp. 5581–5588, Sep. 2017, doi: 10.1109/jsen.2017.2726011.
[6] Y. Wei, Y. Li, M. Xu, and W. Huang, "A review of early fault diagnosis approaches and their applications in rotating machinery," Entropy, vol. 21, no. 4, p. 409, Apr. 2019, doi: 10.3390/e21040409.
[7] J. Peng, L. Fan, W. Xiao, and J. Tang, "Anomaly monitoring method for key components of satellite," Sci. World J., vol. 2014, pp. 1–14, Jan. 2014, doi: 10.1155/2014/104052.
[8] J. Xiong, C. Li, J. Cen, Q. Liang, and Y. Cai, "Fault diagnosis method based on improved evidence reasoning," Math. Problems Eng., vol. 2019, pp. 1–9, Mar. 2019, doi: 10.1155/2019/7491605.
[9] H. Li and D. Xiao, "Fault diagnosis of Tennessee Eastman process using signal geometry matching technique," EURASIP J. Adv. Signal Process., vol. 2011, no. 1, p. 83, Oct. 2011, doi: 10.1186/1687-6180-2011-83.
[10] F. Bagheri, H. Khaloozaded, and K. Abbaszadeh, "Stator fault detection in induction machines by parameter estimation, using adaptive Kalman filter," in Proc. Medit. Conf. Control Autom., Jun. 2007, pp. 1–6, doi: 10.1109/med.2007.4433953.
[11] H. Li and D. Y. Xiao, "Survey on data driven fault diagnosis methods," Control Decis., vol. 26, no. 1, pp. 1–9 and 16, 2011.
[12] X. Ma and D. Li, "A hybrid fault diagnosis method based on fuzzy signed directed graph and neighborhood rough set," in Proc. 6th Data Driven Control Learn. Syst. (DDCLS), May 2017, pp. 253–258, doi: 10.1109/ddcls.2017.8068078.

V. CONCLUSION
In this paper, a bearing fault feature selection method based on weighted multidimensional fusion is proposed. In addition to taking into account the importance of different features during fault diagnosis, this method makes up for the inability of the principal component in explaining the physical meaning when reducing the dimensions based on traditional PCA algorithm. Meanwhile, by means of developing an original high-dimensional feature set via extracting feature parameters from different domains, a feature selection process algorithm that combines with SVM single feature evaluation, correlation analysis and PCA-WLE is put forward in this
paper for selecting sensitive features, and the weighted fusion [13] Y. Wang, X. Li, J. Ma, and S. Li, ‘‘Fault diagnosis of power trans-
former based on fault-tree analysis (FTA),’’ IOP Conf. Ser., Earth Environ.
is conducted according to the correlation between selected Sci., vol. 64, May 2017, Art. no. 012099, doi: 10.1088/1755-1315/64/1/
features and fault information. In the end, the fusion fea- 012099.
tures are input into the PSO-SVM classifier for fault diag- [14] Q. Yao, J. Wang, and G. Zhang, ‘‘A fault diagnosis expert system based
on aircraft parameters,’’ in Proc. 12th Web Inf. Syst. Appl. Conf. (WISA),
nosis, which proves that this method has strong identification
Sep. 2015, pp. 314–317, doi: 10.1109/wisa.2015.21.
ability. [15] S. Yuanyuan, G. Lili, and W. Yongming, ‘‘Artificial intelligence and
The results of experiment on test data from Case Western learning techniques in intelligent fault diagnosis,’’ in Proc. 4th Int. Conf.
Reserve University bearing data center and Xi’an Jiaotong Comput. Sci. Netw. Technol. (ICCSNT), Dec. 2015.
[16] Z. Liu, W. Guo, J. Hu, and W. Ma, ‘‘A hybrid intelligent multi-
University School of Mechanical Engineering indicate that fault detection method for rotating machinery based on RSGWPT,
the features selected by this proposed method can accurately KPCA and Twin SVM,’’ ISA Trans., vol. 66, pp. 249–261,
identify and classify the different fault categories and fault Jan. 2017.
[17] Y. Tian, Z. Wang, and C. Lu, ‘‘Self-adaptive bearing fault diag-
severity of bearing. Therefore, it is applicable to the feature nosis based on permutation entropy and manifold-based dynamic
selection of various bearings and rotating machineries with time warping,’’ Mech. Syst. Signal Process., vol. 114, pp. 658–673,
great application potential. Jan. 2019.
[18] C. Li, J. V. De Oliveira, M. Cerrada, F. Pacheco, D. Cabrera, V. Sanchez,
and G. Zurita, ‘‘Observer-biased bearing condition monitoring: From fault
AUTHOR CONTRIBUTIONS detection to multi-fault classification,’’ Eng. Appl. Artif. Intell., vol. 50,
Wei Dai and Weifang Zhang conceived and they were expe- pp. 287–301, Apr. 2016.
[19] J. Zheng, H. Pan, and J. Cheng, ‘‘Rolling bearing fault detection and
rience at assembly process; Yazhou Li built up the model diagnosis based on composite multiscale fuzzy entropy and ensemble sup-
and did the simulations; Wei Dai and Yazhou Li con- port vector machines,’’ Mech. Syst. Signal Process., vol. 85, pp. 746–759,
tributed to the writing and editing of the manuscript. Wei Feb. 2017.
[20] J. Ben Ali, L. Saidi, A. Mouelhi, B. Chebel-Morello, and F. Fnaiech,
Dai checked manuscript and provided some suggestions for ‘‘Linear feature selection and classification using PNN and SFAM neu-
revision. ral networks for a nearly online diagnosis of bearing naturally pro-
gressing degradations,’’ Eng. Appl. Artif. Intell., vol. 42, pp. 67–81,
Jun. 2015.
REFERENCES [21] Y. Li, Y. Yang, X. Wang, B. Liu, and X. Liang, ‘‘Early fault diagnosis
[1] Y. Lei, J. Lin, Z. He, and M. J. Zuo, ‘‘A review on empirical mode of rolling bearings based on hierarchical symbol dynamic entropy and
decomposition in fault diagnosis of rotating machinery,’’ Mech. Syst. Sig- binary tree support vector machine,’’ J. Sound Vibrat., vol. 428, pp. 72–86,
nal Process., vol. 35, nos. 1–2, pp. 108–126, Feb. 2013, doi: 10.1016/j. Aug. 2018.
ymssp.2012.09.015. [22] Z. Gao, C. Cecati, and S. X. Ding, ‘‘A survey of fault diagnosis and
[2] S. Adamczak, K. Stępień, and M. Wrzochal, ‘‘Comparative study of mea- fault-tolerant techniques—Part I: Fault diagnosis with model-based and
surement systems used to evaluate vibrations of rolling bearings,’’ Pro- signal-based approaches,’’ IEEE Trans. Ind. Electron., vol. 62, no. 6,
cedia Eng., vol. 192, pp. 971–975, 2017, doi: 10.1016/j.proeng.2017.06. pp. 3757–3767, Jun. 2015.
167. [23] Z. Gao, C. Cecati, and S. Ding, ‘‘A survey of fault diagnosis and fault-
[3] B. Pang, G. Tang, T. Tian, and C. Zhou, ‘‘Rolling bearing fault diagnosis tolerant techniques—Part II: Fault diagnosis with knowledge-based and
based on an improved HTT transform,’’ Sensors, vol. 18, no. 4, p. 1203, hybrid/active approaches,’’ IEEE Trans. Ind. Electron., vol. 62, no. 6,
Apr. 2018, doi: 10.3390/s18041203. pp. 3768–3774, Jun. 2015.
[24] M. Kang, J. Kim, J.-M. Kim, A. C. C. Tan, E. Y. Kim, and B.-K. Choi, “Reliable fault diagnosis for low-speed bearings using individually trained support vector machines with kernel discriminative feature analysis,” IEEE Trans. Power Electron., vol. 30, no. 5, pp. 2786–2797, May 2015.
[25] R. Li, P. Sopon, and D. He, “Fault features extraction for bearing prognostics,” J. Intell. Manuf., vol. 23, no. 2, pp. 313–321, Apr. 2012.
[26] Y. Yang, Y. Liao, G. Meng, and J. Lee, “A hybrid feature selection scheme for unsupervised learning and its application in bearing fault diagnosis,” Expert Syst. Appl., vol. 38, no. 9, pp. 11311–11320, Sep. 2011.
[27] C. Yang and T. Wu, “Diagnostics of gear deterioration using EEMD approach and PCA process,” Measurement, vol. 61, pp. 75–87, Feb. 2015.
[28] B. Zhang, L. Zhang, and J. Xu, “Degradation feature selection for remaining useful life prediction of rolling element bearings,” Qual. Rel. Engng. Int., vol. 32, no. 2, pp. 547–554, Mar. 2016.
[29] Z. Wei, Y. Wang, S. He, and J. Bao, “A novel intelligent method for bearing fault diagnosis based on affinity propagation clustering and adaptive feature selection,” Knowl.-Based Syst., vol. 116, pp. 1–12, Jan. 2017.
[30] B. Samanta and K. Al-Balushi, “Artificial neural network based fault diagnostics of rolling element bearings using time-domain features,” Mech. Syst. Signal Process., vol. 17, no. 2, pp. 317–328, Mar. 2003.
[31] V. Sugumaran, V. Muralidharan, and K. Ramachandran, “Feature selection using decision tree and classification through proximal support vector machine for fault diagnostics of roller bearing,” Mech. Syst. Signal Process., vol. 21, no. 2, pp. 930–942, Feb. 2007.
[32] L. Yuan, Y. He, J. Huang, and Y. Sun, “A new neural-network-based fault diagnosis approach for analog circuits by using kurtosis and entropy as a preprocessor,” IEEE Trans. Instrum. Meas., vol. 59, no. 3, pp. 586–595, Mar. 2010, doi: 10.1109/tim.2009.2025068.
[33] M. S. Ballal, Z. J. Khan, H. M. Suryawanshi, and R. L. Sonolikar, “Adaptive neural fuzzy inference system for the detection of inter-turn insulation and bearing wear faults in induction motor,” IEEE Trans. Ind. Electron., vol. 54, no. 1, pp. 250–258, Feb. 2007.
[34] L. Zhen, H. Zhengjia, Z. Yanyang, and C. Xuefeng, “Bearing condition monitoring based on shock pulse method and improved redundant lifting scheme,” Math. Comput. Simul., vol. 79, no. 3, pp. 318–338, Dec. 2008, doi: 10.1016/j.matcom.2007.12.004.
[35] A. Bellini, A. Yazidi, F. Filippetti, C. Rossi, and G.-A. Capolino, “High frequency resolution techniques for rotor fault detection of induction machines,” IEEE Trans. Ind. Electron., vol. 55, no. 12, pp. 4200–4209, Dec. 2008.
[36] P. M. Baggenstoss and F. Kurth, “Comparing shift-autocorrelation with cepstrum for detection of burst pulses in impulsive noise,” J. Acoust. Soc. Amer., vol. 136, no. 4, pp. 1574–1582, Oct. 2014.
[37] Y. Guo and K. K. Tan, “Order-crossing removal in Gabor order tracking by independent component analysis,” J. Sound Vibrat., vol. 325, nos. 1–2, pp. 471–488, Aug. 2009.
[38] Y. Lei, Z. He, Y. Zi, and X. Chen, “New clustering algorithm-based fault diagnosis using compensation distance evaluation technique,” Mech. Syst. Signal Process., vol. 22, no. 2, pp. 419–435, Feb. 2008.
[39] J. Pei, S. Zhang, M. Qi, and G. Wan, “A new method for fault diagnosis of fluid end in drilling pump,” Acta Petrolei Sinica, vol. 30, no. 4, pp. 617–620, 2009.
[40] R. Tiwari, V. K. Gupta, and P. Kankar, “Bearing fault diagnosis based on multi-scale permutation entropy and adaptive neuro fuzzy classifier,” J. Vibrat. Control, vol. 21, no. 3, pp. 461–467, Feb. 2015.
[41] W. Sun, G. An Yang, Q. Chen, A. Palazoglu, and K. Feng, “Fault diagnosis of rolling bearing based on wavelet transform and envelope spectrum correlation,” J. Vibrat. Control, vol. 19, no. 6, pp. 924–941, Apr. 2013.
[42] Y. Ying, J. Li, P. Chai, Y. Chen, and J. Pang, “Study on rolling bearing fault diagnosis based on multi-dimensional feature extraction,” J. Shanghai Univ. Electr. Power, vol. 34, no. 5, pp. 413–421, 2018.
[43] J. Rosero, L. Romeral, J. Ortega, and E. Rosero, “Short-circuit detection by means of empirical mode decomposition and Wigner–Ville distribution for PMSM running under dynamic condition,” IEEE Trans. Ind. Electron., vol. 56, no. 11, pp. 4534–4547, Nov. 2009.
[44] R. Kumar and M. Singh, “Outer race defect width measurement in taper roller bearing using discrete wavelet transform of vibration signal,” Measurement, vol. 46, no. 1, pp. 537–545, Jan. 2013.
[45] S. Mohanty, K. K. Gupta, and K. S. Raju, “Hurst based vibro-acoustic feature extraction of bearing using EMD and VMD,” Measurement, vol. 117, pp. 200–220, Mar. 2018.
[46] J. Li and J. Guo, “A new feature extraction algorithm based on entropy cloud characteristics of communication signals,” Math. Problems Eng., vol. 2015, pp. 1–8, Jun. 2015.
[47] L. Huang, M. Wang, and L. Wu, “Research on change detection approach using PSO algorithm and multiple thresholds exponential entropy in remote sensing images,” Eng. Surveying Mapping, vol. 27, no. 7, pp. 1–5, 2018.
[48] J. Zheng, H. Pan, S. Yang, and J. Cheng, “Generalized composite multiscale permutation entropy and Laplacian score based rolling bearing fault diagnosis,” Mech. Syst. Signal Process., vol. 99, pp. 229–243, Jan. 2018.
[49] V. N. Vapnik, The Nature of Statistical Learning Theory. New York, NY, USA: Springer, 1995.
[50] Y. Li, W. Dai, X. Wu, and Y. Kan, “Surface quality evaluation based on roughness prediction model,” in Proc. Int. Conf. Inf. Technol. Electr. Eng. (ICITEE), 2018, doi: 10.1145/3148453.3306271.
[51] Z. Zhang, Y. Zhang, and Q. Liu, “Fault diagnosis on bearing by support vector machine and wavelet analysis,” Machinery Des. Manuf., vol. 313, no. 3, pp. 204–207, 2017.
[52] Y. Liu and S. Liu, “Application of MED and hierarchical fuzzy entropy to rolling bearing fault diagnosis,” Machinery Des. Manuf., no. 11, pp. 49–52 and 56, 2018.
[53] S. Wan, L. Dou, R. Liu, and X. Zhang, “Fault diagnosis for high voltage circuit breakers based on EWT and multi-scale entropy,” J. Vib., Meas. Diagnosis, vol. 38, no. 4, pp. 672–678 and 867, 2018.
[54] Y. Li, W. Zhang, Q. Xiong, D. Luo, G. Mei, and T. Zhang, “A rolling bearing fault diagnosis strategy based on improved multiscale permutation entropy and least squares SVM,” J. Mech. Sci. Technol., vol. 31, no. 6, pp. 2711–2722, Jun. 2017, doi: 10.1007/s12206-017-0514-5.
[55] K. Zhu, L. Chen, and X. Hu, “Rolling element bearing fault diagnosis based on multi-scale global fuzzy entropy, multiple class feature selection and support vector machine,” Trans. Inst. Meas. Control, vol. 41, no. 14, pp. 4013–4022, Oct. 2019, doi: 10.1177/0142331219844555.
[56] X. Xiao, Q. He, Z. Li, A. O. Antoce, and X. Zhang, “Improving traceability and transparency of table grapes cold chain logistics by integrating WSN and correlation analysis,” Food Control, vol. 73, pp. 1556–1563, Mar. 2017.
[57] H. Zhao, D. Zhang, S. Huang, S. Mo, and H. Wei, “Analysis on the relation between cloud-to-ground lightning density and lightning trip rate in Hainan province based on pearson correlation coefficient,” High Voltage App., vol. 55, no. 8, pp. 186–192, 2019.
[58] G. Chen, J. Chen, Y. Zi, J. Pan, and W. Han, “An unsupervised feature extraction method for nonlinear deterioration process of complex equipment under multi dimensional no-label signals,” Sens. Actuators A, Phys., vol. 269, pp. 464–473, Jan. 2018.
[59] C. Wang and J. Cai, “Research on fault diagnosis of rolling bearing based on empirical mode decomposition and principal component analysis,” Acta Metrologica Sinica, vol. 40, no. 6, pp. 1077–1082, 2019.
[60] X. Meng, C. Feng, and S. Gao, “Research on consistency of turbine blade temperature distribution based on principal component analysis,” J. Harbin Univ. Commerce (Natural Sci. Ed.), vol. 35, no. 4, pp. 451–457, 2019.
[61] Y. Ma, “Principal component analysis of quality indexes of different varieties of actinidia arguta,” Sci. Technol. Food Ind., vol. 40, no. 5, pp. 233–238, 2019.
[62] X. J. Zeng, X. L. Zhang, M. A. Hong-Jiang, and L. Li, “Traveling wave fault location method for power grids based on wavelet packet energy spectra,” High Voltage Eng., vol. 34, no. 11, pp. 2311–2316, 2008.
[63] L. Zhai, “Research on image classification based on weighted multi-feature fusion and SVM,” M.S. thesis, School Comput., Central China Normal Univ., Wuhan, China, 2016.
[64] Bearing Data Center. Accessed: Oct. 18, 2019. [Online]. Available: https://csegroups.case.edu/bearingdatacenter/home
[65] B. Wang, Y. Lei, N. Li, and N. Li, “A hybrid prognostics approach for estimating remaining useful life of rolling element bearings,” IEEE Trans. Rel., to be published.

YAZHOU LI was born in Hebi, Henan, China, in 1996. He received the B.E. degree in energy and power engineering from the Beijing University of Civil Engineering and Architecture, in 2018. He is currently pursuing the M.S. degree with the School of Energy and Power Engineering, Beihang University.
His research interests include condition monitoring, fault diagnosis, reliability assessment, and the evaluation and analysis based on big data.

WEI DAI was born in Datong, Shanxi, China. He received the B.S. degree from the School of Mechanical Engineering and Automation, Beihang University, and the Ph.D. degree in mechanical engineering from Beihang University and the University of Bath, in 2011.
He is currently an Associate Professor with Beihang University. His research interests include reliability manufacturing theory and technology, and the evaluation and analysis based on big data.

WEIFANG ZHANG received the B.S. and Ph.D. degrees from the School of Materials Science and Engineering, Harbin Institute of Technology, Harbin, China, in 1996 and 1999, respectively.
He is currently a Professor with the School of Reliability and Systems Engineering, Beihang University. His main research interests include material and structural damage, structural safety and health monitoring, and life prediction.