1 Introduction

Fault detection and classification play an essential role in process control, monitoring, health management and maintenance because they build a bridge between system monitoring data and system health status [1]. In industry, many complex process systems or plants comprise a large number of components; many key components (e.g., rolling bearings commonly used in machines) are expensive, vulnerable to damage, and prone to fault. Therefore, carrying out condition monitoring and fault detection of complex processes and machines is paramount for both safety and economic purposes.

Traditional fault detection and diagnosis methods can be roughly categorized into two groups: model-based and data-driven methods. The former relies on explicit mathematical models of the plant, while the latter uses historical data of the plant to determine its health status [2]. Traditional methods work under the following assumption: the distribution of the training data is similar to that of the test data (e.g., the fault samples), implying that the training data should contain a good number of fault samples so that the models used for fault detection are well trained. However, such an assumption may be violated in many real applications for the following reasons. Firstly, it is usually assumed that the data are well collected and sufficiently represent both the healthy status and all potential fault statuses of the system (machine, plant, equipment, etc.) of interest. In practice, however, no or only a few samples of the target faults may be available during system operation [3]. Secondly, few plants or systems would be allowed to operate until a major fault or a number of minor faults occur; because faults are destructive and can result in enormous losses [4], it is normally very difficult (if not impossible) to collect a sufficiently large number of samples to train a fault detection model well. Thirdly, systems (e.g., machines) typically decline gradually from health to failure, implying that obtaining adequate fault samples for data-driven approaches is time-consuming and costly.

A way to effectively solve fault detection tasks with small and imbalanced data is to develop intelligent fault detection (IFD) algorithms using, e.g., data augmentation strategies and transfer learning [1]. Such an algorithm first augments the data by data generation or data resampling, then uses the augmented data to extract features with machine learning models (e.g., neural networks), together with a feature adaptation process where necessary, and finally builds a suitable fault classifier to identify the types of faults. In practice, however, a certain type of fault may not be observed or recorded at all for various reasons.

In certain extreme circumstances, signals for specific fault types or working conditions are unobtainable, implying that diagnosis models cannot be well trained due to the lack of training samples for unseen fault classes. Moreover, in data-driven fault detection, identifying unseen fault classes is a challenging task for traditional IFD methods [1]. Still, in practice there is a strong need to tackle the following realistic and highly challenging scenario: samples of one or more fault types are not available at all, or only a very limited number of samples are available. In recent years, a learning method called zero-shot learning (ZSL) has been widely used in image classification due to its power to recognize new objects (not seen in the model training stage) based on information inferred from seen classes [5]. ZSL provides a powerful tool for solving the unseen fault detection problem concerned in this work, and a reasonable solution is to combine fault detection with zero-shot learning to classify certain (unseen) types of faults without using samples of these fault types. Inspired by the idea of zero-shot learning [6], this paper proposes a new zero-shot fault detection method based on a semantic space embedded model for industrial systems and devices. The proposed method is implemented as follows. Step 1: build convolutional neural networks for feature extraction from raw data; Step 2: define and specify a semantic space for faults by creating a shared attribute form (matrix) for each type of fault; Step 3: adopt and define a bilinear compatibility function to learn the relationship between the extracted features and fault attributes, based on which the highest-ranking unseen fault class is determined.

The performance of the proposed method is tested and assessed on real datasets, collected from a bearing and a complex chemical system, respectively, and two case studies are presented accordingly. The first case concerns bearings, widely used as crucial rotary components in many applications; we detect ‘large-diameter faults’ by training classifiers only on samples of ‘small-diameter faults’. The second case considers an entire industrial process control system and aims to detect different unseen types of faults (i.e., types not used in the training stage).

The main contributions of this paper are summarized as follows:

1) We design a new zero-shot learning scheme for unseen fault detection that requires no samples of the unseen faults in the training stage. Specifically, for the bearing case, the proposed approach can detect and identify large-diameter faults in bearings by training a model on data containing small-diameter fault samples but no large-diameter fault samples.

2) To improve the adaptation of zero-shot learning to 1D time series (the format of most industrial fault data), we comprehensively analyze various feature extraction methods, including traditional methods and deep learning-based methods.

3) The proposed semantic space, which treats fault attributes as side information, builds a bridge from seen faults to unseen faults for zero-shot fault detection.

The remainder of the paper is organized as follows. Section 2 provides a literature review on traditional, intelligent, and zero-shot fault detection. Section 3 presents the details of the proposed method. Section 4 conducts two case studies to demonstrate and verify the effectiveness of the proposed method: Case 1 is concerned with detecting unseen faults of larger diameters in the Case Western Reserve University (CWRU) experimental dataset, and Case 2 conducts experiments on the Tennessee-Eastman process (TEP) dataset, aiming to detect unseen types of faults. Finally, Section 5 concludes the paper.

2 Relevant literature review

2.1 Traditional fault detection and intelligent fault detection

The procedure of traditional classification-based, data-driven fault detection contains three main steps: data acquisition, feature extraction, and fault detection and classification [7]. In practice, data are collected via different means, including numerous sensors. Feature extraction is usually implemented through linear or nonlinear transformation and data decomposition. Commonly used linear methods include principal component analysis (PCA) [8] and independent component analysis (ICA) [9]. Nonlinear data processing approaches, such as kernel-based methods, are usually more powerful for characterizing nonlinear relationships; for example, it has been shown that kernel principal component analysis (KPCA) works better than its linear counterpart PCA in many applications [10]. It is usually important and useful to reduce the dimensionality of features or variables in the training space for several reasons, e.g., to make the classification tasks easier to implement, improve classification accuracy, or make the models and results easier to explain. PCA performs poorly when extracting features from a set of signals that are nonlinearly associated with or dependent on each other; ICA works better for extracting non-Gaussian features from multivariate signals [11].

Recently, 1D convolutional neural networks (CNNs) were introduced to automatically extract damage-sensitive features for vibration-based fault detection. For example, in [12], raw signals were transformed into two-dimensional grayscale images via the wavelet transform, and a deep CNN was used to extract robust features. In [13], a fault detection and classification method was proposed using discrete wavelet transform (DWT) and continuous wavelet transform (CWT) filter banks.

After feature extraction, the resulting features are fed into a fault classification model to determine the system’s health status. Many machine learning models have been developed for fault detection and classification (see e.g., [14]). In [15], a bearing fault diagnosis method was proposed based on deep CNN and random forest (RF) ensemble learning.

As mentioned in Section 1, traditional fault detection methods may not work well with small or imbalanced data; therefore, intelligent fault detection methods are needed to guarantee detection performance. One way to build intelligent fault detectors is to use deep learning. Deep transfer learning (DTL) methods have been introduced to the field of fault detection to overcome the difficulty in data collection (e.g., when samples of certain faults are insufficient or unavailable). DTL methods treat insufficient samples as a cross-domain learning task and aim to find a solution by performing domain adaptation [16] to handle the different distributions of the source domain data and the target domain data. Zhang et al. [17] investigated an end-to-end method based on a deep convolutional neural network to achieve high accuracy when the working load changes. Wen et al. [18] used a sparse auto-encoder and the maximum mean discrepancy term to transfer training features to testing features; in this way, no target fault samples were needed for fault detection. In [19], an optimal transport-based deep domain adaptation method was presented for rotating machine fault diagnosis. In [20], a deep adversarial domain adaptation (DADA) method was proposed for rolling bearing fault diagnosis; the method builds a DADA network to better address a challenge commonly encountered in real-world applications: the distribution of the target domain data differs from that of the source domain data. Note that typical deep transfer learning [21] assumes that the same faults appear in both the training and test stages.

2.2 Zero-shot learning

Recently, Lampert et al. [22] proposed a zero-shot learning (ZSL) scheme, which has received significant attention in the field of image recognition. Instead of using trained objects, it uses a high-level description provided by field experts to detect target items. The description comprises semantic attributes, e.g., colour, shape and even habits, which can be pre-learned without samples of unseen classes. Roughly speaking, ZSL is a method for training models to recognize unseen types of images based on side information learned from seen classes with relevant descriptions [23]. ZSL has two learning schemes: inductive and transductive models [24]. In the inductive model, only data from seen classes are available during the training stage. In the transductive model, it is assumed that data of both the unseen classes (i.e., unlabeled classes) and the seen classes are available for model training; hence it is a type of semi-supervised learning. This study is mainly concerned with the inductive scheme.

Zero-shot classification approaches under an inductive setting could be broadly categorized into four groups, namely, direct-attribute prediction based, semantic space embedded based, non-linear multi-modal embedded based, and common space embedded based [25].

For the direct-attribute prediction based methods, the most representative model is the direct attribute prediction (DAP) method presented by Lampert et al. [26], which directly builds the relationship between visual features and attributes, and then uses the learned model to predict the attributes of unseen samples. Lampert et al. [26] also presented an indirect attribute prediction (IAP) method, in which unseen samples are first assigned to seen classes and then predicted using the semantic attribute relationship between seen and unseen types. Note that direct-attribute prediction methods suffer from several drawbacks. Firstly, the two-step prediction is an indirect approach that solves intermediate problems: the solution might be optimal for predicting attributes with attribute classifiers, but it is not necessarily optimal for predicting classes. Secondly, it is difficult to extend DAP to incremental learning scenarios. These drawbacks can be overcome by the semantic space embedding methods discussed below.

The method based on semantic space embedding learns a mapping from features to a semantic space [27, 28]. Frome et al. [29] constructed a deep visual-semantic model by learning a linear mapping from image features to the joint embedding space based on an online learning-to-rank algorithm. Akata et al. [6] presented a label-embedding method that learns a bilinear compatibility function between an image embedding and a label embedding so that matching embeddings are assigned higher scores than mismatching ones. Akata et al. [30] also proposed a label embedding model for fine-grained classification by combining supervised attributes and unsupervised output embeddings from hierarchies or text corpora. Romera-Paredes et al. [5] used a squared loss as the compatibility function, with a regularizer, to optimize classification accuracy. Kodirov et al. [31] presented a semantic autoencoder to handle the projection domain shift problem by reconstructing features after projecting them to the semantic space. Approaches based on semantic space embedding enable visual samples to be represented in the semantic space and recognized in that space.

A non-linear multi-modal embedded model is usually capable of learning non-linear compatibility relations to optimize projection accuracy or embeddings. Xian et al. [32] extended the bilinear compatibility function to multiple linear (piecewise linear) compatibility functions, making a collection of maps highly interpretable.

Embedding both features and semantic descriptions into a common space is referred to as the common space embedded based method. Changpinyo et al. [33] proposed a synthesized-classifier approach for zero-shot classification, which uses linear combinations of base classifiers to build classifiers for unseen classes. Hayashi et al. [34] proposed a cluster-based method for multivariate binary classification in the ZSL setting, where classifiers were first trained on seen classes and then used to separate future (test) data into two classes: the seen class and the unseen (unknown) class. In [35], a one-class classification (OCC) method was proposed and applied to image classification; the approach can effectively determine whether the input data of interest come from a seen class or an unseen class, which is potentially very useful for developing and adapting ZSL methods and algorithms.

2.3 Zero-shot learning for fault detection

Zero-shot learning might bring breakthroughs in intelligent fault detection, especially for classifying unseen fault types when samples of these faults are unavailable for some reason. Preliminary results on zero-shot fault detection have recently been reported in the literature. Lv et al. [36] used a hybrid attribute conditional adversarial denoising autoencoder to tackle the zero-shot fault diagnosis problem. Gao et al. [37] proposed a ZSL method based on contractive stacked autoencoders for bearing fault diagnosis under different working loads. Feng et al. [3] introduced a fault description model based on an attribute transfer strategy to classify zero-shot faults in complex mechanical systems. Xing et al. [38] proposed a label description space embedding model for detecting unseen compound faults of machines. Xu et al. [39] presented a zero-shot intelligent diagnosis method for unseen compound faults of devices using a visual space-based model.

It is worth highlighting that the visual attributes (e.g., colour and shape) used for zero-shot image recognition are unsuitable for sensor signal processing (e.g., vibration signals) [3]. When a new type of fault occurs in a system or machine, we usually notice its semantic attributes and description before individual samples. For example, from the description "an equipment that converts gas or vapour into liquid and transfers heat from the tube to the air near the tube," professional workers can identify the object "condenser" without seeing it at all. Similarly, if "high condensing temperature" is a pre-defined fault type, then it is straightforward to infer that such a fault occurs in the condenser when we are told the high-level attribute information that "high temperature gas from the compressor does not exchange heat well". Furthermore, it is redundant to design separate attributes for each type of fault, because human-defined fault attributes transcend class boundaries [22]; hence, attributes should be shared across different classes of seen and unseen faults. For example, both "reactor cooling water inlet temperature change" and "reactor cooling water valve change" [40] occur at the "reactor", so the attribute "related to reactor" can be shared across these two seen faults and then transferred to unseen faults in the testing stage. In conclusion, fault attributes can cover many aspects, such as the position of the fault, the related process variables, and the size of the fault. Fault attributes provide side information for unseen fault classes, which enables the model to detect unseen faults and directly solve the zero-shot fault detection problem.

Feature extraction from raw signals is another crucial step for zero-shot fault detection. Feng et al. [3] used supervised principal component analysis [41] to extract features, under the assumptions that the process control system is linear and the data follow a Gaussian distribution. Such assumptions are strong, since most data generated by complex industrial processes are nonlinear. In [39], 1D vibration signals were transformed into time-frequency images and then fed into a convolutional neural network (CNN) to extract features. It is worth mentioning that converting 1D vibration signals into 2D representations is an additional procedure of high computational complexity that needs application-specific adaptation.

3 Proposed method

3.1 Problem formulation

Following [6], we assume that there is a training (seen) dataset \(S={\left\{\left({x}_{i}^{s},{y}_{i}^{s}\right)\right\}}_{i=1}^{{N}_{s}}\) with \({x}_{i}^{s}\in {X}^{s}\), \({y}_{i}^{s}\in {Y}^{s}\), which consists of \({N}_{s}\) fault data samples and \(s\) classes of seen faults. Each sample \({x}_{i}^{s}\) corresponds to a label \({y}_{i}^{s}\). Likewise, given a testing (unseen) dataset \(U={\left\{\left({x}_{i}^{u},{y}_{i}^{u}\right)\right\}}_{i=1}^{{N}_{u}}\) with \({x}_{i}^{u}\in {X}^{u}\), \({y}_{i}^{u}\in {Y}^{u}\), the dataset consists of \({N}_{u}\) fault data samples and \(u\) classes of unseen faults. Each sample \({x}_{i}^{u}\) corresponds to a label \({y}_{i}^{u}\). The attributes of a fault are denoted as \(A=\left[{A}^{s},{A}^{u}\right]\in {R}^{L\times C}\), where \(L=s+u\), and \(C\) is the number of fault attributes. It is important to point out that both \({A}^{s}\) and \({A}^{u}\) are available in the training stage because the fault attributes are class-level common knowledge rather than expert knowledge; we can obtain the fault attributes in advance. The samples and classes need to meet the following conditions in zero-shot learning settings: \({Y}^{s}\cup {Y}^{u}=Y\), \({Y}^{s}\cap {Y}^{u}=\emptyset\).
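For concreteness, this setting can be sketched in a few lines of Python; the class indices and the attribute dimension below are illustrative stand-ins, not values taken from the datasets:

```python
import numpy as np

# Sketch of the zero-shot setting: the seen and unseen label sets are disjoint,
# while the attribute matrix A = [A^s; A^u] covers all L = s + u classes and is
# available at training time. All concrete values here are made up.
Y_seen, Y_unseen = {0, 1, 2, 3, 4, 5}, {6, 7, 8}
assert Y_seen.isdisjoint(Y_unseen)            # Y^s ∩ Y^u = ∅
L, C = len(Y_seen | Y_unseen), 7              # L classes, C attributes
A = np.random.rand(L, C)                      # placeholder for the real attributes
```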

3.2 Model structure

The proposed method, SSB-ZSL-1DCNN, motivated by and adapted from the idea of semantic space embedding, comprises three steps: feature extraction, human-defined label embeddings, and a feature embedding model. The overall structure of the method is presented in Fig. 1.

Fig. 1 The structure of the proposed method

1) Feature extraction: A 1D CNN is preferable for handling 1D industrial fault signals, since a 1D CNN is easier to train and has lower computational complexity than 2D convolutions [42]. We therefore use a 1D CNN as the feature extractor. The architecture of the designed 1D CNN is shown in Table 1. It contains two convolution layers, two max-pooling layers, one flatten layer and one fully-connected layer. The inputs of the 1D CNN are the 1D time-series signals, and the outputs of the fully-connected layer are the extracted features.

    Table 1 The architecture of the designed 1D CNN for feature extraction
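As a concrete illustration, below is a minimal PyTorch sketch of such an extractor. Since Table 1's exact hyperparameters are not reproduced here, the kernel sizes, channel counts and strides are assumptions; only the layer sequence and the 1 × 1024 input / 1 × 64 output sizes follow the text.

```python
import torch
import torch.nn as nn

class FeatureExtractor1D(nn.Module):
    """Two Conv1D layers, two max-pooling layers, a flatten layer and one FC
    layer; kernel sizes, channels and strides are illustrative assumptions."""
    def __init__(self, feature_dim: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv1d(1, 16, kernel_size=64, stride=8, padding=28),  # assumed
            nn.ReLU(),
            nn.MaxPool1d(2),
            nn.Conv1d(16, 32, kernel_size=3, padding=1),             # assumed
            nn.ReLU(),
            nn.MaxPool1d(2),
            nn.Flatten(),
        )
        self.fc = nn.LazyLinear(feature_dim)   # infers the flattened size

    def forward(self, x):                      # x: (batch, 1, 1024)
        return self.fc(self.net(x))            # (batch, 64) extracted features

features = FeatureExtractor1D()(torch.randn(8, 1, 1024))
print(features.shape)                          # torch.Size([8, 64])
```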
2) Human-defined label embeddings: In practical fault diagnosis, tagging each fault sample is complex and time-consuming. Fault attributes (represented by a matrix A in Section 4) provide side information that can be used to establish the relationships between seen faults and unseen faults. Fault attributes allow characteristics of faults, such as fault position and fault effect, to be shared; they are easily annotated by experts and transformed into computer-readable vector forms [30]. The description of each attribute can be a binary value \({\varphi }^{0,1}\in \left\{0,1\right\}\) or a continuous value \({\varphi }^{C}\in \left[0,1\right]\) for each class. The attributes for each fault class can be written as:

    $${\Phi }\left(y\right)={\left[{\varphi }_{y,1},\dots ,{\varphi }_{y,E} \right]}^{T}$$
    (1)

where each \({\varphi }_{y,e}\) is either a binary value in \(\left\{0,1\right\}\) or a real number between 0 and 1; y denotes the fault class, and E denotes the dimension of the attributes for a fault class. Note that continuous attributes \({\varphi }^{C}\) carry more information than binary attributes \({\varphi }^{0,1}\). For illustration purposes, we describe the attribute matrix in binary form here, but in the subsequent experiments we use random continuous attributes rather than binary ones: if a fault does not have an attribute, the entry is set to 0; if it does, the entry is set to a random number in the range (0,1).
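The construction of such a continuous attribute matrix can be sketched as follows; the 9 × 7 shape mirrors the CWRU case in Section 4, while the binary mask itself is randomly generated here rather than taken from Fig. 5.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative binary mask (1 = the class has the attribute); the real mask
# comes from Table 3 / Fig. 5, not from random generation.
mask = rng.integers(0, 2, size=(9, 7))        # 9 fault classes x 7 attributes

# Entries stay 0 where an attribute is absent; each present attribute is
# replaced by a random continuous value in (0, 1), as described above.
A = np.where(mask == 1, rng.uniform(size=mask.shape), 0.0)
Phi_y = A[0]                                  # Phi(y) for fault class y = 0 (Eq. 1)
```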

3) Feature embedding model: We define a prediction function f by maximizing the bilinear compatibility function F as follows:

$$f\left(x;w\right)=\underset{y\in Y}{\arg\max}\,F\left(x,y;w\right)$$
    (2)

where w denotes the parameter vector of F and can be written as a \(D\times E\) matrix W, where D is the dimension of the extracted features and E is the dimension of the attributes. The bilinear compatibility function \(F:X\times Y \to R\) between a raw fault data space X and a fault label space Y is defined as follows:

    $$F\left(x,y;W\right)={\theta \left(x\right)}^{T}W{\Phi }\left(y\right)$$
    (3)

where \(\theta \left(x\right)\) denotes the extracted features and \({\Phi }\left(y\right)\) denotes the fault label embedding. \(F\left(x,y;W\right)\) is a ranking-based compatibility function: by learning W, the correct label is ranked higher than any other label. This idea is closely related to the web scale annotation by image embedding (WSABIE) algorithm [43], which learns a low-dimensional joint embedding space for both images and annotations to classify annotations from a ranked list. The significant difference between our method and WSABIE is that the latter learns both \({\Phi }\left(y\right)\) and W, whereas the former learns only W and uses fault attributes as the side information \({\Phi }\left(y\right)\).
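A minimal NumPy sketch of Eqs. (2) and (3), assuming the features and attribute matrix are already available:

```python
import numpy as np

def compatibility(theta_x: np.ndarray, W: np.ndarray, Phi: np.ndarray) -> np.ndarray:
    """F(x, y; W) = theta(x)^T W Phi(y) (Eq. 3), evaluated for all classes at once.
    theta_x: (D,) extracted features; W: (D, E); Phi: (L, E) class attributes."""
    return Phi @ (W.T @ theta_x)              # (L,) compatibility scores

def predict(theta_x: np.ndarray, W: np.ndarray, Phi: np.ndarray) -> int:
    """f(x; w) = argmax_y F(x, y; w) (Eq. 2)."""
    return int(np.argmax(compatibility(theta_x, W, Phi)))
```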

4) Parameter estimation: Similar to the formulation of the unregularized structured SVM [43], the weighted approximate ranking objective to be minimized is:

$$\sum\nolimits_{n=1}^{{N}_{s}}\frac{{\beta }_{{r}_{\Delta ({x}_{n},{y}_{n})}}}{{r}_{\Delta ({x}_{n},{y}_{n})}}\sum\nolimits_{y\in Y}\text{max}\left\{0,s\left({x}_{n},{y}_{n},y\right)\right\}$$
    (4)

    where

    $$s\left({x}_{n},{y}_{n},y\right)=\Delta \left({y}_{n},y\right)+F\left({x}_{n},y;W\right)-F\left({x}_{n},{y}_{n};W\right)$$
    (5)
    $${\beta }_{k}={\sum }_{i=1}^{k} {\alpha }_{i}$$
    (6)
    $${r}_{\Delta ({x}_{n},{y}_{n})}=\sum\nolimits_{y\in Y}1(\text{s}\left({x}_{n},{y}_{n},y\right)>0)$$
    (7)

Here, \(s\left({x}_{n},{y}_{n},y\right)\) is the misclassification loss function with margin \(\Delta \left({y}_{n},y\right)\), where \(\Delta \left({y}_{n},y\right)=1\) if \(y\ne {y}_{n}\) and 0 otherwise. As suggested in the WSABIE algorithm, we choose \({\alpha }_{i}=1/i\). \({r}_{\Delta ({x}_{n},{y}_{n})}\) is the upper bound on the rank of fault label \({y}_{n}\) for fault sample \({x}_{n}\). The ranking-based objective (4) seeks higher compatibility between the extracted features and the fault label embedding of the target label than between the extracted features and the fault label embeddings of the wrong labels.

In the training stage, we use the extracted features \(\theta \left(x\right)\) and fault attributes \({\Phi }\left(y\right)\), which come only from seen fault classes, to learn W. We apply stochastic gradient descent (SGD) to optimize W: at each step we find the highest-scoring class y and, if \(\underset{y\in Y}{\arg\max}\;s(x_n,y_n,y)\neq y_n\), update:

$${W}^{\left(t\right)}={W}^{(t-1)}+{\eta }_{t}{\beta }_{\lfloor\frac{N-1}{k}\rfloor}\theta \left({x}_{n}\right){\left[{\Phi }\left({y}_{n}\right)-{\Phi }\left(y\right)\right]}^{T}$$
(8)

where \({\eta }_{t}\) is the learning rate at iteration t; in this study, a constant step size \({\eta }_{t}=\eta\) is used. Following WSABIE, \({r}_{\Delta ({x}_{n},{y}_{n})}\) is approximated as \({r}_{\Delta ({x}_{n},{y}_{n})}\approx \lfloor\frac{N-1}{k}\rfloor\), where N is the number of fault labels and k is the number of wrong fault labels sampled until a violation is found. After the training stage completes, the best W is obtained.
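One SGD step of this update can be sketched as follows; the margin Δ = 1 and the WSABIE-style sampling of violating labels follow the text, while the learning rate is an arbitrary placeholder.

```python
import numpy as np

def wsabie_step(W, theta_x, y_true, Phi, eta=0.01):
    """One step of the weighted approximate ranking update (Eq. 8), a sketch.
    W: (D, E); theta_x: (D,); Phi: (L, E) seen-class attributes; y_true: label index."""
    L = Phi.shape[0]
    scores = Phi @ (W.T @ theta_x)                 # F(x, y; W) for all y
    wrong = [y for y in range(L) if y != y_true]
    np.random.shuffle(wrong)
    for k, y in enumerate(wrong, start=1):         # k = sampling trials so far
        if 1.0 + scores[y] - scores[y_true] > 0:   # s(x_n, y_n, y) > 0
            # beta_{floor((N-1)/k)} with alpha_i = 1/i
            beta = (1.0 / np.arange(1, (L - 1) // k + 1)).sum()
            # W <- W + eta * beta * theta(x_n) [Phi(y_n) - Phi(y)]^T  (Eq. 8)
            W += eta * beta * np.outer(theta_x, Phi[y_true] - Phi[y])
            break
    return W
```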

In the testing stage, we project the extracted feature using the learned W and use the cosine similarity measure to search for the nearest fault attribute vector, which belongs to one of the unseen fault classes.
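This inference step can be sketched as follows, with the cosine similarity taken between the projected feature and the unseen-class attribute vectors:

```python
import numpy as np

def detect_unseen(theta_x: np.ndarray, W: np.ndarray, Phi_unseen: np.ndarray) -> int:
    """Project the feature via the learned W into attribute space and return the
    index of the unseen fault class with the most similar attribute vector."""
    z = W.T @ theta_x                              # (E,) projected feature
    sims = Phi_unseen @ z / (
        np.linalg.norm(Phi_unseen, axis=1) * np.linalg.norm(z) + 1e-12)
    return int(np.argmax(sims))                    # nearest by cosine similarity
```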

4 Experiments

This section presents two case studies on two real datasets: the rolling bearing fault dataset created by Case Western Reserve University (CWRU) and the chemical process fault dataset known as the Tennessee-Eastman process (TEP). The two case studies were carried out from different perspectives to comprehensively evaluate the performance of the proposed method.

4.1 The Case Western Reserve University (CWRU) dataset

4.1.1 Introduction to the CWRU dataset

The CWRU dataset, consisting of vibration-based rolling bearing fault data, is from the Case Western Reserve University Bearing Data Center [44]. The test bench, shown in Fig. 2, contains a 2 hp Reliance Electric motor, a torque transducer/encoder and a dynamometer. In addition, acceleration sensors are installed above the bearing housings at the fan end and the drive end to collect the vibration acceleration when the fault data are collected.

Fig. 2 The CWRU test bench

The faults are located on the drive-end bearing and the fan-end bearing, respectively, and include inner race faults, rolling element faults and outer race faults under four working loads (0, 1, 2 and 3 hp). The fault diameter for each type of fault ranges from 0.007 to 0.028 inch, seeded on the bearings using electro-discharge machining (EDM). The variables in each fault class are the drive-end acceleration data (DE), fan-end acceleration data (FE), base plate acceleration data (BA) and motor speed (RPM). The sampling rate of the signals is 12 kHz.

We chose the 12 kHz drive-end bearing fault data as our experimental dataset. Overall, there are four groups of experiments, namely 0 hp, 1 hp, 2 hp and 3 hp. For each group, there are nine kinds of faults in total, and only DE is selected as the variable because the vibration signal collected at the drive end is more comprehensive and less affected by other components and environmental noise. For graphical illustration, the vibration signal of the rolling element fault with 3 hp load and 0.021 inch fault diameter is shown in Fig. 3, where the top panel shows the waveform of the signal in the time domain and the bottom panel shows the corresponding spectrum. For each type of fault, the first 102,400 DE data points are considered; the data were then pre-processed and rearranged with an overlap sampling approach with an overlap ratio of 50%, resulting in a total of 200 samples, each consisting of 1,024 data points. The details of the faults in the dataset are given in Table 2.
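The overlap sampling step can be sketched as follows; note that under this particular boundary convention, 102,400 points yield 199 windows, so the paper's count of 200 samples implies a slightly different convention (e.g., padding the final window).

```python
import numpy as np

def overlap_segments(signal: np.ndarray, length: int = 1024, overlap: float = 0.5):
    """Slice a 1D record into fixed-length samples with the given overlap ratio."""
    step = int(length * (1 - overlap))             # 512 for 50% overlap
    n = (len(signal) - length) // step + 1
    return np.stack([signal[i * step:i * step + length] for i in range(n)])

segments = overlap_segments(np.arange(102_400, dtype=float))
print(segments.shape)                              # (199, 1024)
```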

Fig. 3 The 3 hp 0.021’’ rolling element fault signal (time domain) and its spectrum (frequency domain)

Table 2 Fault labels (9 fault types in each working load)

Note that failures of different sizes (diameters) damage the equipment to different degrees, some operating conditions do not allow larger-size failures, and few factories would be allowed to run until large-size failures occur so that samples can be collected for training. Therefore, the largest-size failures might have zero samples for model training, making it extremely difficult (if not impossible) for traditional multi-classification methods to detect such unseen faults. The proposed zero-shot fault detection method is therefore meaningful and realistic.

4.1.2 Model implementation

The first stage of this method is feature extraction. The designed 1D CNN was applied to extract fault features from raw data. As shown in Table 1, the architecture of the 1D-CNN model for vibration signals is constructed with two Conv layers and one FC layer. The input signal of the 1D-CNN is of size 1 × 1024 and the output of the FC layer is of size 1 × 64.

To illustrate the feature extraction results intuitively, the t-SNE (t-distributed stochastic neighbor embedding) algorithm was employed to provide a 2D representation of the features output by the FC layer. Taking the 1 hp working load as an example, the distributions and clusters of the nine types of faults after t-SNE are presented in Fig. 4, where the horizontal and vertical coordinates represent the two t-SNE embedding dimensions.

Fig. 4 The t-SNE results of the output features of the FC layer for the 1 hp working load; the horizontal and vertical coordinates represent the two t-SNE embedding dimensions
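A minimal sketch of this visualization, with random stand-ins for the (N, 64) FC-layer features and the fault labels that in the experiment come from the trained 1D CNN:

```python
import numpy as np
from sklearn.manifold import TSNE
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
features = rng.normal(size=(1800, 64))     # stand-in for the FC-layer features
labels = rng.integers(0, 9, size=1800)     # stand-in for the nine fault labels

emb = TSNE(n_components=2, random_state=0).fit_transform(features)
plt.scatter(emb[:, 0], emb[:, 1], c=labels, cmap="tab10", s=5)
plt.xlabel("t-SNE dimension 1")
plt.ylabel("t-SNE dimension 2")
plt.show()
```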

The second stage is to build human-defined fault label embeddings for the preprocessed experimental dataset, i.e., the fault attribute matrix A, shown in Fig. 5; the meaning of each attribute is given in Table 3. For each bearing fault, seven fine-grained fault attributes were specified using the statements given in Table 2. Note that the attribute matrix does not distinguish the same type of fault under different working loads; for example, the 1 hp 0.007’’ inner race fault and the 2 hp 0.007’’ inner race fault have the same attributes. Once the fault attributes are obtained in the preparation and training stage, the resulting classifier can detect unseen faults without using samples of the unseen faults in the training stage.

Fig. 5 Fault attribute matrix A

Table 3 Side information of the attributes for the CWRU dataset

Finally, we feed the extracted features into the feature embedding model to match the most similar fault attributes and thus find the corresponding unseen fault categories. The ordinary cosine distance was used to measure the similarity between attributes.

There are 36 types of bearing faults in total, divided into four groups of experiments; each group has nine types of faults. The dataset was split as follows: four types of 0.007’’ (inch) and 0.014’’ faults were used for training the models, two types of 0.007’’ and 0.014’’ faults were used for validation, and the remaining three types of 0.021’’ faults form the test set. The data split for the train/validation/test sets is displayed in Table 4. The numbers of training, validation and test samples are 4 × 200 = 800, 2 × 200 = 400, and 3 × 200 = 600, respectively.

Table 4 The four groups of data splits for the CWRU dataset

As for performance evaluation, we are interested in the accuracy for each type of unseen fault, so the average per-class top-1 accuracy, defined below, is used [28]:

$$acc=\frac{1}{\left|{Y}^{u}\right|}\sum\nolimits_{{y}^{u}\in {Y}^{u}}\frac{\#\,\text{correct detections in}\;{y}^{u}}{\#\,\text{samples in}\;{y}^{u}}$$
(9)

where \({y}^{u}\) denotes an unseen large-size fault class and \(\left|{Y}^{u}\right|\) is the number of unseen fault classes.
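Equation (9) amounts to the following computation, sketched here for label arrays of true and predicted unseen-fault classes:

```python
import numpy as np

def per_class_top1(y_true: np.ndarray, y_pred: np.ndarray) -> float:
    """Average per-class top-1 accuracy (Eq. 9): the detection rate is computed
    separately for each class and then averaged over the unseen classes."""
    classes = np.unique(y_true)
    return float(np.mean([np.mean(y_pred[y_true == c] == c) for c in classes]))
```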

4.1.3 Accuracy of zero-shot fault detection

The results of zero-shot fault detection are presented in Table 5. We used six types of 0.007’’ and 0.014’’ bearing faults to train the models, and then used the resulting classification models to detect three types of 0.021’’ bearing faults. The accuracies vary from 66.67 to 87.67% under the different working loads from 0 hp to 3 hp. Clearly, the performance of the proposed method is significantly better than the chance level of 33.33%, which indicates that it is possible to detect unseen large-size faults without using their samples in the training stage; this is achieved by sharing fault attributes between the seen small-size faults and the unseen large-size faults.

Table 5 The zero-shot fault detection results for the CWRU dataset (%)

Regarding the individual results of each group, the proposed method performs better for the 0 hp and 2 hp working loads than for the 1 hp and 3 hp working loads. We analyzed the confusion matrices for the latter two groups, which show that fault 9 (0.021’’ outer race fault) has a chance of being misclassified as fault 7 (0.021’’ inner race fault). To investigate the reason for the misclassification, we further analyzed the time-domain signals of fault 7 and fault 9 at 3 hp and their frequency-domain properties (spectra), shown in Fig. 6, from which it can be observed that the two signals have nearly the same resonance band at around 3,000 Hz. The inner race fault and the outer race fault are therefore similar in the frequency domain, implying that these two types of faults are more difficult to distinguish than the rolling element faults. Nevertheless, the following experiments demonstrate that our results are highly competitive compared with many state-of-the-art methods.

Fig. 6 Time-domain and frequency-domain signals of fault 7 and fault 9 at 3 hp

4.1.4 Comparison with other ZSL methods

We compare five state-of-the-art zero-shot learning methods reviewed in Section 2, namely SJE [30], DEVISE [29], SAE [31], ESZSL [5], and the zero-shot fault detection method of [3], under the same setting. Note that the first four methods were designed for image classification, where image attributes were defined and used for discrimination purposes. However, the visual attributes proposed in these methods are not applicable to time-series signal-based fault detection tasks. Hence, in this study, we incorporate the newly designed fault attribute matrix (shown in Fig. 5) into our proposed SSB-ZSL-1DCNN model, and use it to replace the image attributes and label information employed in the compared methods. For a fair comparison, we train the SJE, DEVISE, SAE and ESZSL models on the same raw data. The results produced by the trained models are presented in Table 6.

Table 6 Performance comparison of different zero-shot learning methods for fault detection on the CWRU dataset (%)

From Table 6, it is clear that our method significantly outperforms the five compared methods: the first four are semantic space embedded ZSL methods and the fifth is a fault description-based attribute transfer approach for ZSL. The very low classification accuracies of the compared methods may be explained as follows. These methods, initially designed for 2D data (images), work well for image recognition and classification tasks based on extracted image features. For the 1D data considered in this study, however, they degrade considerably, since they cannot find useful image-like features in the given 1D vibration signals, and the required visual attributes are not available either. All this suggests that good feature extraction methods, capable of effectively finding the most useful and representative features of 1D time-series fault data, are highly needed. Hence, in the following, we apply two groups of feature extraction methods to the experimental dataset and evaluate their accuracies: deep learning-based methods (VGG16, VGG19 and ResNet50) and traditional methods (PCA, ICA and KPCA).

4.1.5 Performance comparison of feature extraction methods

In this section, we evaluate the feature extraction performance of the proposed method and compare it with two other groups of feature extraction methods: three deep learning-based methods and three traditional feature extraction methods.

Most CNNs are designed for learning from 2D data (e.g., images). However, in the field of fault detection and diagnosis of industrial systems, signals are in most cases represented as 1D time series, so typical CNN models cannot be applied directly. One way of using 2D CNNs to handle 1D signals is to convert the 1D signals into 2D data. In this study, we use the continuous wavelet transform to obtain time-frequency images; each output image is a 3-channel RGB image of size 236 × 236 × 3. To remove the effect of the colour map on the networks, each RGB image is transformed into a 224 × 224 time-frequency grayscale image. Each type of fault image is then fed into one of the following three 2D deep neural networks: VGG16, VGG19 [45] and ResNet50 [46], all of which were pre-trained on the 1,000 ImageNet classes to initialize the parameters. Detailed information about these three models is shown in Table 7, and the implementation procedure of the three methods is depicted in Fig. 7.
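The 1D-to-2D conversion can be sketched as follows with PyWavelets; the Morlet wavelet and the 1-64 scale range are assumptions, as the text only fixes the 224 × 224 grayscale output fed to the pre-trained networks.

```python
import numpy as np
import pywt                                   # PyWavelets
from PIL import Image

def cwt_grayscale(signal: np.ndarray, size: int = 224) -> np.ndarray:
    """Continuous wavelet transform of a 1D signal, with the |coefficient|
    scalogram rescaled to a size x size grayscale image."""
    coeffs, _ = pywt.cwt(signal, np.arange(1, 65), "morl")   # wavelet assumed
    mag = np.abs(coeffs)
    mag = 255.0 * (mag - mag.min()) / (np.ptp(mag) + 1e-12)  # normalize to 0..255
    img = Image.fromarray(mag.astype(np.uint8)).resize((size, size))
    return np.asarray(img)

print(cwt_grayscale(np.sin(np.linspace(0, 100, 1024))).shape)  # (224, 224)
```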

Table 7 Experimental settings for deep learning-based feature extraction methods
Fig. 7 Procedure of 2D feature extraction

For the three traditional feature extraction methods, we first apply an overlap sampling procedure to the raw signals to generate the input data for PCA, ICA and KPCA. The output of each feature extraction method is a 64 × 1 vector, matching the feature length of the 1D CNN proposed in this study.
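A sketch of the three baselines with scikit-learn, using random stand-ins for the overlap-sampled (N, 1024) segment matrices:

```python
import numpy as np
from sklearn.decomposition import PCA, FastICA, KernelPCA

rng = np.random.default_rng(0)
X_train = rng.normal(size=(800, 1024))     # stand-in for training segments
X_test = rng.normal(size=(600, 1024))      # stand-in for test segments

# Each baseline reduces a 1024-point segment to a 64-dim feature vector,
# matching the output length of the proposed 1D CNN.
extractors = {
    "PCA": PCA(n_components=64),
    "ICA": FastICA(n_components=64, random_state=0, max_iter=500),
    "KPCA": KernelPCA(n_components=64, kernel="rbf"),
}
features = {name: m.fit(X_train).transform(X_test) for name, m in extractors.items()}
print({name: f.shape for name, f in features.items()})
```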

The performances of the six compared methods and the proposed method are shown in Table 8, from which it can be seen that the 1D CNN model designed in this study performs far better than the three deep CNN methods. These results strongly confirm that 1D CNNs are efficient and promising for fault detection in industrial systems, with the following advantages: (1) a 1D CNN has much lower computational complexity than a 2D CNN; (2) a 1D CNN has fewer hidden layers and a simpler architecture than a 2D CNN, so less time is needed for training and implementation; (3) unlike 2D CNNs, which need GPUs or high-performance computational resources, a 1D CNN can feasibly be implemented and run on an ordinary CPU, which significantly reduces cost in many real applications.

Table 8 Performance comparison of different fault feature extraction methods for the CWRU dataset (%)

Meanwhile, compared with the three traditional feature extraction methods, the proposed SSB-ZSL-1DCNN model also shows significantly better performance, especially against the two linear feature extraction methods, PCA and ICA. For the ‘3 hp’ case, our proposed SSB-ZSL-1DCNN achieves an accuracy of 66.78%, slightly better than that of KPCA. For the other cases, however, SSB-ZSL-1DCNN performs far better than the best results of the three baseline methods, with increases in accuracy of 18.67%, 11.00% and 27.17%, respectively.

In addition, compared with the attribute transfer method [3], our model performs better in all cases, especially for the two groups of 0 hp and 2 hp. Several reasons may explain the better performance of the proposed method. Firstly, SSB-ZSL-1DCNN is an end-to-end model, whereas the baseline method is not: it trains an attribute learner for each attribute and classifies faults based on the outputs of the attribute learners, so the classification results rely heavily on the accuracy of those learners. Secondly, the classifiers used in the compared method are LSVM (linear support vector machine), RF (random forest) and NB (naïve Bayes), of which LSVM is a linear approach whose results are much worse than those of the other two; in the comparisons, we used the average accuracy of the three classifiers. Thirdly, the feature extraction method used in the baseline is supervised PCA, a linear feature reduction method requiring the training samples to follow a Gaussian distribution; such a requirement, however, may not be met by the CWRU dataset.

Regarding computational cost, the two groups of feature extraction methods need completely different running times, as shown in Table 9. The average running time of the three traditional methods is 1.83 s, whereas the average running time of VGG16, VGG19 and ResNet50 is 2,485.06 s. The overall time used by the proposed SSB-ZSL-1DCNN method is 71.98 s for 20 epochs. Clearly, the deep learning-based methods need far more training time, and usually the deeper the CNN, the more time it needs. Compared with the other convolutional networks and the traditional methods, the proposed 1D CNN model needs only a relatively small amount of time while showing significantly better results. Considering running time and classification accuracy together, SSB-ZSL-1DCNN adapts very well to 1D time-series signals and is highly efficient.

Table 9 Computational time of different feature extraction methods for the CWRU dataset

4.2 The Tennessee-Eastman Process (TEP) dataset

4.2.1 Introduction to the TEP dataset

The Tennessee-Eastman dataset [39] was collected from a comprehensive industrial chemical process by the Eastman Chemical Company and is widely studied in the field of fault detection. The process contains five major operation units: a reactor, a product condenser, a vapor-liquid separator, a recycle compressor and a product stripper. Since this plant-wide industrial process has many kinds of faults, it could be challenging to collect enough samples to build a fault detection system, and faults might bring considerable losses to the factory; a zero-shot fault detection method therefore becomes necessary to detect specific faults of such a system without their samples in the training stage.

A total of 15 types of faults are used in this case study; the details are given in Table 10. Each type of fault contains 480 samples, and each sample involves 52 variables (features) in total.

Table 10 Fault description for the TEP dataset

4.2.2 Model implementation

The structure of the 1D CNN is the same as that for CWRU described in Section 4.1.2. For the fault attribute matrix in the second step, we use the fault attribute matrix A proposed in [3], shown in Table 11 and Fig. 8. There are a total of 20 human-defined fault attributes shared between the seen faults and the unseen faults.

Table 11 Fault attributes for the TEP dataset
Fig. 8 Fault attribute matrix A [3]

For the experimental setting, we split the 15 types of faults into three parts: eight for training (seen), four for validation and three for testing (unseen). To comprehensively evaluate the proposed method, we follow the practice in [3] and divide the TEP data into five groups; the details are shown in Table 12. The number of training samples is 5,760, and the number of test samples is 1,440. The evaluation criteria are the same as those for the CWRU dataset.

Table 12 Five groups of sub-datasets for the TEP dataset [3]

4.2.3 Fault detection accuracy with ZSL

The fault detection results of ZSL for the five groups of the TEP data are shown in Table 13. The detection accuracies vary from 59.72 to 96.67% for different types of unseen faults. The accuracy details of each unseen fault are presented in the confusion matrices in Fig. 9. For some specific classes, such as fault 8 in Group C and fault 2 in Group D, the accuracies are 39% and 33%, respectively, close to the chance level of 33.33%, but none falls below it. From the fault descriptions in Table 10, we can see that fault 2 and fault 8 are two complicated faults, involving three-quarters of the components in the chemical reactions. Hence, given the nature and mechanism of the ZSL method, it is more difficult to train an accurate model based only on the seen fault data, which have few fault attributes, to detect unseen faults that involve many more fault attributes.

Table 13 Fault detection results for the TEP dataset with the proposed ZSL method (%)
Fig. 9 Confusion matrices of the results of unseen faults

From Table 13, the proposed method shows much better performance in Groups A, B, C and E, although its accuracy is slightly below that of [3] in Group D. Note that for Group E, the accuracy of our method is far higher than that reported in [3]. To explain such a large difference in accuracy, we checked the confusion matrices presented in [3]: the accuracy was 26% for fault 5 and 21% for fault 9, both below the random chance level (33.33%). The accuracies of our method for the three types of faults, however, reach 98%, 93% and 99%, respectively.

4.2.4 Comparison with traditional feature extraction methods

In the previous section, we compared different deep learning-based feature extraction methods on the CWRU dataset. We now focus on three commonly used traditional feature extraction methods, namely PCA, ICA and KPCA, to further evaluate and compare the performance of the proposed method.

For the three feature extraction methods, PCA, ICA and KPCA, we implemented the experiments using the scikit-learn package [47]. As for the parameter settings, each method extracts 20 features from the 52 raw variables. The accuracies of the different feature extraction methods are presented in Table 14. For all groups except Group D, we achieve impressive results. For Group D, the proposed SSB-ZSL-1DCNN model achieved a slightly lower accuracy than ICA. However, we obtained significant improvements for Groups A-C: 15.84%, 20.06% and 16.25% higher than the best results of the other three methods, respectively. In Table 15, we also compare the computational time used by PCA, ICA, KPCA and SSB-ZSL-1DCNN. The time used by KPCA is more than twice that used by the 1D CNN. Although the running times of PCA and ICA are shorter than that of SSB-ZSL-1DCNN, their overall accuracies are obviously much lower than those of our method.

Table 14 Performance comparison of different feature extraction methods for the TEP dataset (%)
Table 15 Computational time of different feature extraction methods for the TEP dataset

From the above comparisons, it can be concluded that the proposed method performs better than the traditional feature extraction methods. The SSB-ZSL-1DCNN model works well for 1D time-series fault signals and achieves higher accuracy for fault detection.

4.2.5 Comparison with traditional multi-classification methods

In this section, we further compare our method with three multi-classification methods, namely naïve Bayes (NB), random forest (RF) and support vector machine (SVM). We conducted experiments using the five groups (A-E) of data described in Table 12. Taking Group A as an example, the experimental settings are as follows.

1) Faults 2–5, 7–13 and 15 are treated as known (seen) types, and faults 1, 6 and 14 are treated as unknown (unseen) types. We denote the former by G1 and the latter by G2.

2) For the three traditional methods, samples from both G1 and G2 were used to train the fault classification models. Specifically, a total of 480 × 12 = 5,760 samples from G1 (480 samples per fault type) were used for model training. In addition to the 5,760 samples, two sets of samples from G2, one consisting of 10 × 3 = 30 samples (10 per fault type) and another consisting of 50 × 3 = 150 samples (50 per fault type), were also added to the training data.

3) For the proposed method, only the 5,760 samples from G1 were used for SSB-ZSL-1DCNN model training; no sample from G2 was used.

4) A total of 480 × 3 = 1,440 samples from G2 (480 samples per fault type), the same as those used in the previous experiments, were used for the model performance test.

5) The experiments for the three methods, NB, RF and SVM, were implemented using the scikit-learn package [47].

The results of the traditional multi-classification methods under the few-shot learning setting are shown in Table 16.

Table 16 Performance comparison of different multi-class classification methods for the TEP dataset (%)

From the results, all three traditional multi-classification methods show poor performance, even though a number of unseen fault samples were included in the training dataset. Further calculations show that the overall accuracies lie between 20% and 40%, even as the number of unseen fault samples increases from 10 to 50. None of the three traditional methods produced a result comparable to that of our method. The highest accuracy among the three methods is 46.64%, achieved by RF for Group E with 10 unseen fault samples included in the training dataset; our method achieves 96.67% for Group E without using any unseen fault samples at all.

It should be stressed that a direct comparison of our method with these three traditional multi-classification methods is unfair and puts our method at a disadvantage, because the zero-shot learning setting is entirely different from the few-shot learning setting, let alone the settings of traditional multi-class classification methods. It should also be noted that the fault descriptions proposed in this study may not be applicable to the multi-classification methods. In conclusion, it is important to introduce fault attributes as side information for fault detection under the zero-shot learning setting.

4.2.6 Comparison with other ZSL methods

The experimental implementation in this section is similar to that in Section 4.1.4 and considers four classic semantic-based ZSL methods (i.e., SJE, DEVISE, ESZSL and SAE). We replace the visual attributes used in these four methods with the fault attributes proposed in this study. The experimental results are shown in Table 17, from which it can be seen that our method outperforms the other four ZSL methods in all groups except Group D. SAE is a deep learning-based method that uses auto-encoders with several hidden layers and has better nonlinear representation ability than the other three shallow models. For Group D, the accuracy of SSB-ZSL-1DCNN is slightly lower than that of SAE (by 5.84%), but for the other groups, our method performs much better than SAE: 38.61% higher for Group A, 2.64% higher for Group B, 28.54% higher for Group C and 10.16% higher for Group E. Overall, SSB-ZSL-1DCNN is more suitable for 1D time-series data thanks to the introduction of the fault attribute matrix. This, in turn, shows the importance and usability of feature extraction and fault descriptions for solving the zero-shot fault detection problem.

Table 17 Performance comparison of different zero-shot learning methods for fault detection on the TEP dataset (%)

5 Conclusion

In the field of fault detection, we often encounter the following situation: samples of certain types of faults are unavailable or extremely difficult to obtain for various reasons. Bearing this in mind, a new semantic space based zero-shot learning model with a 1D CNN (SSB-ZSL-1DCNN) is proposed in this work for fault detection. The proposed method has the following feature: an SSB-ZSL-1DCNN model can detect new (unseen) faults even though the dataset used for training the model does not include any samples of those faults. This is important and useful for solving fault detection tasks in which fault types that have never been seen before may occur.

The applicability and effectiveness of the proposed SSB-ZSL-1DCNN have been demonstrated on two well-known benchmarks: the CWRU bearing dataset for rotary machines and the TEP dataset for an entire chemical process control system. In the first case, the experiments focus on training the model using only samples of small-size faults and detecting large-size faults with the trained model. In the second case, the focus is on solving many-fault detection and many-class classification problems, which are more comprehensive and challenging tasks. In both case studies, the proposed method shows excellent performance, far better than the compared methods.

In the future, we will improve the overall performance of the proposed SSB-ZSL-1DCNN model in two respects. Firstly, the source of side information could be extended to unsupervised fault label embeddings rather than manually defined fault attributes; this can help increase the efficiency of the method. Secondly, the proposed method can be extended to generalized zero-shot learning (GZSL), which addresses a more practical scenario where both seen (known) faults and unseen (unknown) faults must be effectively detected by models trained using only samples of seen faults.