The CNN-LSTM neural network hybrid setup is selected for the proposed model.
3.1. Proposed Hall Sensor Fault-Diagnosis System and Fault-Recovering System
In this research, several machine learning models were implemented to create a fault-detection system and a signal-recovery system for a brushless DC (BLDC) motor with three internal Hall sensors spaced 120 degrees apart. Deep neural networks (DNNs) are known to perform well in pattern-recognition tasks, but a sufficient and diverse dataset is also required to train a high-performing fault-detection model. The training data must include all fault types so that the model can learn to handle the various scenarios that may occur in the system. Training the fault-recovery system requires the Hall sensors' signals together with the fault-detection system's output, and different fault-detection outputs correspond to different sequences of Hall sensor signals. Training a good fault-recovery system therefore requires an adaptive model that can extract more information from this complex data.
The proposed model was developed using the TensorFlow and Keras libraries. The inputs are first processed by a convolutional (CNN) layer, which applies a convolution operation to obtain feature maps: the convolutional kernel slides over the input and computes the sum of the point-wise products within each window. The output of the convolutional layer is then used as the input to the LSTM layer, with a dense layer added between the CNN output and the LSTM layer to extract information across the time steps and the input size. Finally, the LSTM's output is processed by an output (dense) layer whose activation function, such as the SoftMax function, computes a probability distribution over the classes, and the class with the highest probability is chosen.
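A minimal sketch of this pipeline in Keras is given below for illustration. The window length, number of input channels, layer widths, kernel size, and number of output classes are assumed placeholder values, not the hyperparameters used in this work.

```python
# Sketch of the CNN -> dense -> LSTM -> softmax pipeline described above (Keras).
# All sizes below are illustrative assumptions, not the authors' exact settings.
from tensorflow.keras import layers, models

TIME_STEPS = 20   # assumed length of the Hall-signal window fed to the model
N_FEATURES = 4    # e.g., three Hall-sensor signals plus the fault-detector output
N_CLASSES = 8     # assumed number of commutation / fault-state classes

model = models.Sequential([
    layers.Input(shape=(TIME_STEPS, N_FEATURES)),
    # 1D convolution extracts local feature maps from the raw sequence
    layers.Conv1D(filters=32, kernel_size=3, padding="same", activation="relu"),
    layers.MaxPooling1D(pool_size=2),
    # Dense layer between the CNN output and the LSTM, applied per time step
    layers.TimeDistributed(layers.Dense(32, activation="relu")),
    # LSTM captures the sequential dependencies across time steps
    layers.LSTM(64),
    # Output layer produces a probability distribution over the classes
    layers.Dense(N_CLASSES, activation="softmax"),
])

model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.summary()
```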
The convolutional neural network (CNN) model, as depicted in Figure 5, is a type of neural network that is particularly well suited to tasks involving the analysis of two-dimensional data such as images. One key aspect of CNNs is the use of convolutional layers, which apply a mathematical operation called convolution to the input data. Convolution involves sliding a kernel, or small matrix, over the input data and calculating the sum of the element-wise products between the kernel and the input. This process allows the CNN to identify patterns and features in the data that are relevant to the task at hand. The output of the convolutional layer is a set of feature maps that capture different aspects of the input data. The equation for the convolutional layer can be expressed as:

A_{i,j,m} = f\Big( \sum_{u}\sum_{v} X_{i+u,\, j+v}\, W_{u,v,m} + b_m \Big)

The feature map A is produced by applying a convolution operation to the input (X) and weight (W), and adding the bias constant (b). The row, column, and layer of the feature map are denoted by the variables i, j, and m, respectively. The output of this operation is then passed through a non-linear function (f), as described in references [26,27]. After the feature map is extracted, it is passed through a pooling layer, which shrinks the input and reduces the computational load and memory usage; the pooling layer also helps to prevent overfitting. Finally, the input is classified using a fully connected layer and an activation function (e.g., SoftMax, ReLU, etc.). The SoftMax layer calculates the probability of the input data belonging to each labeled class [26,28]. The SoftMax function is frequently employed as the activation function in a multi-class classifier's output layer, with K denoting the number of classes. The function is defined as follows:

\sigma(z)_i = \frac{e^{z_i}}{\sum_{j=1}^{K} e^{z_j}}, \quad i = 1, \ldots, K
where the SoftMax function (σ) is utilized to determine the probability that the input data (z) corresponds to each class. This is achieved through the exponential function (e), which divides the exponential of the input (e^{z_i}) by the sum of the exponentials of the outputs (e^{z_j}) over the index j up to the upper limit K.
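The following short NumPy snippet illustrates the SoftMax computation defined above; the logit values are arbitrary examples.

```python
# Numerical check of the SoftMax formula above (NumPy, illustrative values).
import numpy as np

def softmax(z):
    """Return exp(z_i) / sum_j exp(z_j) for each class i (K = len(z))."""
    z = np.asarray(z, dtype=float)
    e = np.exp(z - z.max())          # subtract the max for numerical stability
    return e / e.sum()

logits = np.array([2.0, 1.0, 0.1])   # hypothetical output-layer activations
probs = softmax(logits)
print(probs, probs.sum())            # probabilities sum to 1; largest logit wins
```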
Recurrent neural networks (RNNs) are designed to process sequential data such as time series or natural language. They have a memory input called the hidden state (represented by h_t in Figure 6a), which allows them to incorporate information from past time steps in the input sequence. This is in contrast to traditional feedforward neural networks, which process one input at a time and do not incorporate past information. RNNs are trained using backpropagation, a process in which the network receives an input, produces an output, and adjusts its weights and biases to reduce the error between the output and the desired output. The equation of the RNN can be expressed as follows:

y_t = W_{hy}\, h_t
The RNN output, denoted by y_t, is associated with the weight W_{hy}, which serves to incorporate the RNN's memory, represented by h_t, into the output. The memory formula is defined as follows:

h_t = f\big( W_{hh}\, h_{t-1} + W_{xh}\, x_t + b_h \big)
The memory of the RNN at time step t is calculated using the non-linear activation function (f), the weights W_{hh} and W_{xh}, the memory from the previous time step (h_{t-1}), the input at the current time step (x_t), and the bias constant (b_h).
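A single RNN time step corresponding to the two equations above can be sketched as follows; the dimensions and random weights are illustrative assumptions only.

```python
# One RNN time step implementing h_t = f(W_hh h_{t-1} + W_xh x_t + b_h) and
# y_t = W_hy h_t, as in the equations above (NumPy, illustrative sizes).
import numpy as np

rng = np.random.default_rng(0)
input_size, hidden_size, output_size = 3, 5, 2   # assumed dimensions

W_xh = rng.standard_normal((hidden_size, input_size))
W_hh = rng.standard_normal((hidden_size, hidden_size))
W_hy = rng.standard_normal((output_size, hidden_size))
b_h = np.zeros(hidden_size)

def rnn_step(x_t, h_prev):
    """Update the hidden state (memory) and produce the output for one step."""
    h_t = np.tanh(W_hh @ h_prev + W_xh @ x_t + b_h)   # non-linear function f = tanh
    y_t = W_hy @ h_t
    return h_t, y_t

h = np.zeros(hidden_size)
for x_t in rng.standard_normal((4, input_size)):      # a short input sequence
    h, y = rnn_step(x_t, h)
```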
One limitation of the recurrent neural network (RNN) model is its difficulty in training on long sequences of data [29]. This can result in vanishing gradients, where the gradients of the weights in the network become very small and thus have a minimal impact on the network's output, or exploding gradients, where the gradients become very large, leading to unstable training. To address these issues, the Long Short-Term Memory (LSTM) model was introduced as an improvement on the RNN model. The layout of an LSTM is illustrated in Figure 6b. In contrast to the RNN, which has only one type of memory, the LSTM has both short-term and long-term memory, allowing it to capture both short-term and long-term dependencies in the data. This enables the LSTM to effectively handle long sequences and maintain information from earlier time steps, as it can selectively store and forget information in its short-term and long-term memory cells. The LSTM is composed of four functions: (1) Forget: the equation for the forget function is expressed as

F_t = \sigma\big( W_F\, [h_{t-1}, x_t] + b_F \big)

in which F_t is the output of the forget function and σ is the sigmoid function. (2) Store: the equation of the store function is defined as

S_t = \sigma\big( W_S\, [h_{t-1}, x_t] + b_S \big) * \tanh\big( W_C\, [h_{t-1}, x_t] + b_C \big)

in which S_t is the output of the store function, * is element-wise multiplication, and tanh is the hyperbolic tangent function. (3) Update: the equation of the update function is defined as

U_t = F_t * U_{t-1} + S_t

in which U_t is the output of the update function (the cell state) and U_{t-1} is the output of the update function from the previous time step. (4) Output: the equation of the LSTM output function is defined as

Y_t = \sigma\big( W_Y\, [h_{t-1}, x_t] + b_Y \big) * \tanh(U_t)

in which the output (Y_t) of the LSTM is produced using the four functions: forget, store, update, and output [27,30]. The forget function discards irrelevant information from the previous time step, while the store function saves pertinent new information in the cell state. The update function then selectively updates the information in the cell state, and the output function controls the information that is passed on to the next time step.
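The forget/store/update/output decomposition can be illustrated with one time step of a standard LSTM cell, as sketched below; the weight matrices, dimensions, and gate parameterization are illustrative assumptions rather than the exact formulation used in this work.

```python
# One LSTM time step written with the forget / store / update / output
# decomposition used above; a standard-LSTM sketch with illustrative sizes.
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

rng = np.random.default_rng(1)
input_size, hidden_size = 3, 4
concat = input_size + hidden_size

# One weight matrix per gate, each acting on the concatenation [h_{t-1}, x_t]
W_F, W_S, W_C, W_Y = (rng.standard_normal((hidden_size, concat)) for _ in range(4))
b_F = b_S = b_C = b_Y = np.zeros(hidden_size)

def lstm_step(x_t, h_prev, U_prev):
    v = np.concatenate([h_prev, x_t])
    F_t = sigmoid(W_F @ v + b_F)                            # forget: drop stale memory
    S_t = sigmoid(W_S @ v + b_S) * np.tanh(W_C @ v + b_C)   # store: new information
    U_t = F_t * U_prev + S_t                                # update: new cell state
    Y_t = sigmoid(W_Y @ v + b_Y) * np.tanh(U_t)             # output: next hidden state
    return Y_t, U_t

h = np.zeros(hidden_size)   # short-term memory (hidden state)
c = np.zeros(hidden_size)   # long-term memory (cell state)
for x_t in rng.standard_normal((5, input_size)):
    h, c = lstm_step(x_t, h, c)
```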
There are several different machine learning algorithms; the CNN-LSTM neural network hybrid setup was chosen for the proposed model for the following reasons. A 1D-CNN layer setup is selected for the CNN part because the CNN can learn the characteristics of the raw data through its convolutional and pooling layers. The LSTM then serves to capture the structure of the sequential data, as it is specifically designed to learn to recognize crucial inputs and store them in its long-term state. The combination of the CNN and LSTM in a hybrid model offers better feature-extraction ability and improves the robustness of the model [31,32]. The CNN-LSTM hybrid setup, as shown in Figure 7, mainly consists of an input layer, a 1D-CNN layer, an LSTM layer, and an output layer.