Denoising With MFIF
Table of Contents
Denoising Using Multivariate Fast Iterative Filtering (MFIF)
Concept of MFIF
Key Steps in MFIF
Algorithm for MFIF
Algorithm 1: Multivariate Fast Iterative Filtering
Advantages of MFIF
1. Multivariate Capability
2. Adaptive Filtering
3. Efficiency in Processing
4. Improved Signal Characterization
5. Robustness Against Non-linearities
6. Energy Efficiency
7. Anomaly Detection Enhancements
Other Noise Removal Algorithms
Data Outlier Detection Using Deep Isolation Forest (DIF)
Overview of Deep Isolation Forest (DIF)
Advantages of DIF
Potential Questions Related to DIF
Causes of Outliers in IIoT Sensor Data
Implications of Outliers
Example
Deep Multi-Scale Fusion Neural Network-Based Data Fusion and Fault Classification
2. Feature Extraction
Visual Aids
Conclusion
Why Deep Isolation Forest (DIF) is Used?
How DIF is Different from Other Outlier Detection Algorithms?
Concept of MFIF
MFIF is used to process multivariate time-series signals, where:
By applying Fast Iterative Filtering (FIF) to each channel, noise is separated from useful data.
This filter length helps in determining which parts of the signal contain useful information
versus noise.
Defined as:

θ(u) = arccos( (w(u) · w(u − 1)) / (‖w(u)‖ ‖w(u − 1)‖) )
4. Repeat until all noise is removed and final denoised signal (S) is obtained.
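The steps above can be sketched as a toy single-channel fast-iterative-filtering pass with a thin multivariate wrapper. This is only a sketch: the triangular smoothing kernel (chosen because its nonnegative frequency response keeps the iteration stable), the filter length, and the tolerance are illustrative assumptions, not the exact operators used by MFIF.

```python
import numpy as np

def stopping_angle(w_curr, w_prev):
    """theta(u) = arccos( w(u)·w(u-1) / (||w(u)|| ||w(u-1)||) )."""
    cos = np.dot(w_curr, w_prev) / (np.linalg.norm(w_curr) * np.linalg.norm(w_prev))
    return np.arccos(np.clip(cos, -1.0, 1.0))

def fif_denoise_channel(x, filter_len=15, tol=1e-3, max_iter=200):
    """Toy FIF pass: repeatedly subtract a local average so that the fast,
    noise-like component is isolated, stopping once theta(u) < tol."""
    half = (filter_len + 1) // 2
    tri = np.convolve(np.ones(half), np.ones(half))  # triangular window (assumption)
    kernel = tri / tri.sum()
    w_prev = x.astype(float)
    for _ in range(max_iter):
        w_curr = w_prev - np.convolve(w_prev, kernel, mode="same")
        if stopping_angle(w_curr, w_prev) < tol:
            break
        w_prev = w_curr
    return x - w_curr  # denoised signal S = input minus the extracted noise part

def mfif_denoise(X, filter_len=15):
    """Apply the channel-wise pass to each row of a multivariate signal."""
    return np.vstack([fif_denoise_channel(ch, filter_len) for ch in X])
```

Applied to a noisy sinusoid, the per-channel pass returns a signal closer to the clean waveform than the raw input, and the wrapper repeats this for every sensor channel.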
Advantages of MFIF
Removes noise effectively: Filters out uncorrelated signal components.
Improves efficiency: Reduces unnecessary processing of noisy data.
Enhances anomaly detection: Helps identify outliers in smart cities, industrial automation, and
environmental monitoring.
Energy-efficient: Reduces computational overhead in IIoT systems.
By using MFIF, IoT networks can process cleaner data, make faster decisions, and improve reliability in
real-time applications. 🚀
Using Multivariate Fast Iterative Filtering (MFIF) for noise removal in sensor data offers several
advantages over other algorithms, making it particularly suitable for IIoT applications. Here are some
reasons why MFIF might be preferred:
1. Multivariate Capability
Handling Multiple Channels: MFIF is designed to process multivariate signals, which is common in
IIoT systems where data is collected from multiple sensors simultaneously. This ability to handle
multiple correlated signals allows for better noise reduction across the entire dataset.
2. Adaptive Filtering
Dynamic Filter Length: MFIF computes a unique filter length that adapts to the characteristics of
the signal. This adaptability allows it to respond more effectively to changing noise conditions,
unlike static filtering methods that may not perform well under varying noise levels.
3. Efficiency in Processing
Fast Iterative Approach: The iterative nature of MFIF allows it to quickly converge to a denoised
solution. This is especially important in real-time applications where timely decision-making is
crucial. Many traditional algorithms may require more computational resources or time to achieve
similar results.
6. Energy Efficiency
Resource Optimization: By effectively reducing noise, MFIF minimizes the processing of irrelevant
data, leading to lower energy consumption. This is particularly valuable in IIoT scenarios where
sensor power efficiency is crucial.
Kalman Filters: Best for linear systems and may require accurate modeling of the system. They can
be computationally intensive and may not handle non-linearities well.
Wavelet Transforms: Excellent for localized frequency analysis and can handle non-stationary
signals, but the implementation can be complex, and choosing the right wavelet is critical.
Moving Average Filters: Simple and easy to implement, but they can introduce lag and may not
effectively deal with sharp spikes or rapid changes in the signal.
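The moving-average lag mentioned above is easy to see on a step change: the filtered output is smeared over roughly one window length before it settles.

```python
import numpy as np

# A unit step at sample 50, smoothed by a 9-sample centered moving average.
x = np.concatenate([np.zeros(50), np.ones(50)])
window = 9
ma = np.convolve(x, np.ones(window) / window, mode="same")

# At the step itself the smoothed value is only 5/9 of the new level; the
# output needs several more samples to settle. This is exactly the lag a
# sharp spike or rapid change suffers under a static moving-average filter.
```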
In summary, MFIF stands out for its adaptability, efficiency, and ability to handle complex, multivariate
signals, making it a strong choice for real-time applications in IIoT systems where noise removal is
crucial.
ϕv: A neural network that transforms the data D into a new d-dimensional space.
s: Ensemble size, which determines how many isolation trees (iTrees) will be created.
θv: Random initialization of the weights of the neural network.
The ensemble therefore contains s × u randomly parameterized representations, and each iTree isolates a data point node by node. The isolation score is:

I(n ∣ U) = Ω_{Uj ∈ U} J(n ∣ Uj)

Ω: An integration function that measures how well the iTrees isolate the data point.
J(n ∣ Uj): Represents the isolation challenge for the data point n with respect to the iTree Uj.
4. Identifying Outliers
If the calculated isolation score exceeds a predefined threshold, the data point is classified as an
outlier. The method's effectiveness largely depends on the proximity of sensor nodes to each
other.
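A minimal, self-contained sketch of this idea follows: a random nonlinear projection stands in for the learned network ϕv, a toy recursive split gives each iTree's path length, and the score averages over the ensemble. Every name, constant, and the projection itself are illustrative assumptions, not the paper's implementation; a threshold on the score would then flag outliers (shorter average paths mean easier isolation, i.e. more anomalous).

```python
import numpy as np

rng = np.random.default_rng(42)

def random_projection(D, d=4):
    """Stand-in for the neural map phi_v: one random linear layer with tanh."""
    W = rng.standard_normal((D.shape[1], d))
    return np.tanh(D @ W)

def itree_path_length(X, x, depth=0, max_depth=8):
    """Path length of point x in one randomly grown isolation tree."""
    if depth >= max_depth or len(X) <= 1:
        return depth
    feat = rng.integers(X.shape[1])
    lo, hi = X[:, feat].min(), X[:, feat].max()
    if lo == hi:
        return depth
    split = rng.uniform(lo, hi)
    keep = X[:, feat] < split
    subset = X[keep] if x[feat] < split else X[~keep]
    return itree_path_length(subset, x, depth + 1, max_depth)

def dif_score(D, x, trees=50):
    """Toy DIF score: average isolation depth of x over an iTree ensemble
    grown in the projected space. Lower score => easier to isolate =>
    more anomalous; comparing it to a threshold classifies outliers."""
    Z = random_projection(np.vstack([D, x]))
    Zd, zx = Z[:-1], Z[-1]
    return float(np.mean([itree_path_length(Zd, zx) for _ in range(trees)]))
```

On a dense cluster of readings, a far-away point is isolated in far fewer splits than a point near the cluster center.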
Advantages of DIF
Adaptability: The method can adjust to the spatial relationships between sensor nodes, enhancing
its accuracy in detecting outliers.
Simplicity: The algorithm is relatively easy to implement and does not require complex parameter
tuning.
2. How does the random initialization of weights θv affect the performance of the DIF?
3. What criteria can be used to select the threshold for identifying outliers?
Investigate different methods for determining a suitable threshold and its influence on the
detection rate.
4. How does the spatial distribution of sensor nodes influence the effectiveness of the DIF?
Analyze the relationship between node proximity and outlier detection accuracy.
5. In what scenarios might the DIF outperform traditional outlier detection methods?
Compare DIF with other algorithms (e.g., statistical methods, clustering-based approaches)
and identify specific use cases where DIF is advantageous.
6. What are the computational requirements for implementing the DIF in a real-time IIoT system?
Discuss the efficiency of the algorithm in terms of processing speed and resource consumption.
7. How does DIF handle dynamic changes in sensor data over time?
Consider the algorithm's ability to adapt to changes in the data distribution.
8. What limitations might the Deep Isolation Forest approach have in certain applications?
Identify potential challenges or drawbacks when using DIF in specific IIoT scenarios.
By addressing these questions, one can gain a deeper understanding of the Deep Isolation Forest
method and its applicability to outlier detection in IIoT networks.
In the context of Industrial Internet of Things (IIoT) sensor networks, an outlier refers to a data point
or observation that significantly deviates from the expected pattern or distribution of the data.
Outliers can arise from various factors, including:
3. Communication Errors: Data transmission issues, such as packet loss or corruption during the
communication between sensors and the central system, can result in inaccurate data.
4. Human Errors: Mistakes during data collection, calibration, or maintenance of the sensors can lead
to outlier readings.
5. Rare Events: Genuine but infrequent events (e.g., equipment failures, accidents) that occur within
the monitored environment can manifest as outliers.
Implications of Outliers
Data Integrity: Outliers can compromise the integrity of the dataset, leading to incorrect analyses
and decisions.
Decision Making: When outliers are not identified and handled appropriately, they can skew results
and lead to faulty conclusions or inappropriate actions.
Anomaly Detection: Identifying outliers is crucial for detecting anomalies that may indicate system
failures, safety hazards, or security breaches.
Example
For instance, in a temperature monitoring system for industrial equipment:
A temperature reading of 200°C when the normal operational range is between 50°C to 100°C
would be considered an outlier. This could indicate a malfunctioning temperature sensor, a
sudden spike in temperature due to a fire, or an incorrect data transmission.
In summary, in the IIoT context, an outlier is a data point that significantly differs from the expected
behavior of the system, warranting investigation to determine its cause and whether it should be
included in further analyses.
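The temperature example above amounts to a simple range check; the 50–100°C bounds come directly from the example:

```python
def is_temperature_outlier(reading_c, low=50.0, high=100.0):
    """Flag a reading outside the normal operational range as an outlier."""
    return not (low <= reading_c <= high)
```

A 200°C reading is flagged for investigation, while a 75°C reading passes; the check alone cannot say whether the cause is a faulty sensor, a real fire, or a transmission error.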
An anomaly is something that is unusual or unexpected. In simpler terms, it's a situation or occurrence
that doesn't fit the normal pattern or behavior. For example:
If most days it rains for an hour but one day it rains for the entire day, that day is an anomaly.
In a classroom, if all students score between 70 and 90 on a test but one student scores 30, that
score is an anomaly.
Anomalies can indicate something important, like a problem or a change that needs attention.
Deep Multi-Scale Fusion Neural Network-Based Data Fusion and Fault Classification
In complex systems, especially in Industrial Internet of Things (IIoT) applications, combining data from
multiple sensors is crucial for accurate monitoring and decision-making. This process is known as
sensor fusion. The goal of sensor fusion is to create a reliable and comprehensive representation of
the system's state, which improves accuracy, reliability, and data quality.
How It Works
Convolution Layers: The first seven convolution layers of the network extract shared features,
resulting in feature maps denoted as Fc .
Sub-Branches: These feature maps are then split into two sub-branches, each containing six
additional convolution layers. The outputs from these branches are the scale-specific feature
maps Fc1 and Fc2 .
Fusion Process: The various feature maps Fc1 , Fc2 , and other relevant features are combined to
produce cross-scale properties f . This combination helps capture a richer representation of the
sensor data.
Multi-Loss Optimization: All learning components, including the fused features and those from the
individual branches, are optimized through a multi-loss approach. This means that the network
learns not only to classify faults but also to effectively fuse the data from multiple sources.
Benefits of DMSFNN
Improved Accuracy: By fusing data from multiple sensors, the DMSFNN can provide a more
accurate and consistent assessment of the system's state than relying on individual sensors.
Enhanced Reliability: The network can better handle uncertainties and missing data, as it
leverages information from multiple sources.
Cost and Complexity Reduction: Integrating sensor data helps simplify the overall system design
by reducing the number of required components while maintaining performance.
Conclusion
The Deep Multi-Scale Fusion Neural Network is a powerful tool for data fusion and fault classification
in IIoT applications. Its architecture enables it to effectively combine information from multiple
sensors, leading to improved accuracy and reliability in monitoring dynamic systems. This approach is
particularly beneficial in environments where sensor data is variable and prone to noise or other
distortions.
3.4.1. Single-Scale Feature Learning

The backbone provides the various scale branches with shared feature maps. Each single-scale branch is built from a six-layer CNN architecture. More precisely, the first three convolution layers efficiently extract the signal features by using an equal number of convolution kernels, while the final three layers contain progressively fewer kernels, which still reduces the feature dimension well and extracts high-level information. Equation (5) defines the branch outputs Fci, i ∈ {1, 2}, for the inputs yj, j ∈ {1, 2, ..., n}:

Fci = nci(nc(yj; θc); θci)    (5)

where Fci represents the raw input's branch feature, yj is the input, and nc and nci denote the backbone network and the scale-specific subnetwork, with network parameters θc and θci, respectively. To give a reduced-dimension feature embedding for each sample, the feature dimension is first compressed using global max-pooling, applied to the top output feature map Fci of each branch. Equations (6) and (7) then give the posterior probability of each class:

zci = hn(Fci)    (6)

q(zci) = exp(xj·zci) / Σl=1..D exp(xl·zci)    (7)

where hn indicates the global max-pooling operation, q(zci) is the likelihood that the model would give input xj the label yj, and xl is the class-l parameter. Equation (8) defines the objective loss function for every observable instance in the training set:

Lci = −(1/n) Σj=1..n Σl=1..D J{xj = l} log q(zci)    (8)

where J(·) is the indicator function, so that J(true) = 1 and otherwise 0.
explain it in simpler words
1. Backbone Network: This part of the network provides shared information that is useful for all the
branches that follow. Think of it as a common foundation for the branches to build upon.
2. Single-Scale Branch: Each branch uses a six-layer convolutional neural network (CNN) to learn
specific features from the data. Here's how it works:
First Three Layers: These layers use an equal number of filters (or convolution kernels) to
capture essential details from the input data. They are good at extracting the initial, important
features.
Last Three Layers: These layers use progressively fewer filters, which helps to reduce the
number of features while still keeping the most relevant high-level information.
3. Outputs of the Branches: Each branch produces outputs (features) denoted as Fci, where i can be 1 or 2 (representing two different branches). Equation (5) describes how these outputs are calculated based on the input data yj.
nci represents the function for the scale-specific subnetwork and backbone.
4. Feature Reduction: To simplify the features and reduce their dimensions, a global max-pooling
operation is applied. This operation takes the most important feature from each area of the
feature map, resulting in a smaller set of key features.
5. Class Probability Calculation: The model then predicts the likelihood of each class (category) using
equation (6). This tells us how confident the model is that the input data belongs to a particular
class.
6. Likelihood of Labels: The notation q(zci) indicates the probability that the model assigns the label yj to the input xj.
7. Loss Function: To train the model, we need a way to measure how well it's doing. The objective
loss function in equation (8) helps calculate the difference between the predicted labels and the
actual labels in the training data.
The function uses an indicator J (⋅) that is 1 if the condition is true (meaning the predicted
label matches the actual label) and 0 otherwise. The goal is to minimize this loss during
training, which helps the model learn better.
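Steps 4–7 can be mirrored in a few lines of numpy; the shapes and values below are illustrative assumptions, not the paper's configuration.

```python
import numpy as np

def global_max_pool(F):
    """h_n in eq. (6): keep the largest activation of each channel."""
    return F.max(axis=-1)

def softmax(z):
    """Posterior class probabilities as in eq. (7)."""
    e = np.exp(z - z.max())
    return e / e.sum()

def branch_loss(probs, labels):
    """Eq. (8): average negative log-probability of the true class; the
    indicator J simply picks out the true-class entry of each row."""
    n = len(labels)
    return -np.mean(np.log(probs[np.arange(n), labels]))
```

For example, pooling two feature channels keeps one peak value per channel, softmax turns pooled scores into probabilities, and the loss is small when the true classes get high probability.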
Summary
In simpler terms, this section describes how the network learns features from sensor data using a specific structure of layers. It involves: a shared backbone, branch-specific convolution layers, max-pooling to compress the features, softmax probabilities for each class, and a loss function that guides training.
3.4.2. Multi-Scale Feature Fusion Learning

To obtain robust features for fault diagnosis and cross-scale information complementation, first concatenate the scale-specific features Fci to create the fusion feature maps f with d channels:

f = Cat(Fc1, Fc2)    (9)

where Cat represents the concatenation operation. After that, a spatial attention module is used to mine the discriminative features further and boost efficiency. In this framework, at each spatial location v of f, a global average pooling operation yields a global feature map s:

sv = (1/d) Σl=1..d fv,l    (10)

Next, the spatial attention map Fa is created by applying a 1 × 1 convolution and a sigmoid function to s. Adding up the weighted features then yields the new fusion features, as stated in equations (11) and (12):

Fa = σ(X∗s + c)    (11)

f = f + Fa ⊕ f    (12)

where ⊕ denotes the channel-wise product operation and σ(·) the sigmoid function. Finally, a global pooling layer is implemented to squeeze the feature dimensions and integrate features from several convolutional channels. For the fusion feature maps f, this framework uses both global max-pooling and average-pooling. Max-pooling effectively extracts specific and discriminative information from signals by selecting the most outstanding value in each zone, while average pooling, applied through equation (13), retrieves the signal's global information:

zcF = hn(f) + hb(f)    (13)

where hb denotes the global average pooling procedure. The features zcF are then used to make predictions, with the softmax classification loss as the task's objective function, expressed in equation (14):

LcF = −(1/n) Σj=1..n Σl=1..D J{xj = l} log q(zcF)    (14)

At last, the proposed DMSFNN network classifies the sensor data as faulty or normal. To enhance the fault diagnosis, the weight parameters zcF are optimized using the MAO algorithm; the stepwise procedure of MAO is detailed below. An analysis was carried out to quantify the effect of network disruptions on fault diagnosis: short outages (up to one minute) increased errors by less than 2%, while prolonged outages (more than five minutes) increased them by up to 15%. This emphasises the importance of an effective network architecture and efficient data recovery plans.
explain it in simpler terms
1. Combining Features: The first step is to merge (concatenate) the features obtained from different
scales (denoted as Fc1 and Fc2 ). This creates a new set of feature maps f that contain more
comprehensive information.
The equation f = Cat(Fc1 , Fc2 ) shows how the features from the two branches are
combined.
2. Enhancing Important Features: To make sure the most important features stand out, a spatial
attention module is used. This module helps the network focus on the parts of the feature maps
that are most relevant for identifying faults.
A global average pooling operation is performed at each spatial location v of the feature map
f . This operation calculates an average value for each feature, resulting in a global feature
map s.
3. Creating Attention Maps: A spatial attention map Fa is then created using a sigmoid function and a 1x1 convolution on the global feature map s. This attention map helps to highlight important features.
The equations (11) and (12) describe how the attention map is calculated and how it is used to adjust the original features, resulting in new fusion features.
4. Pooling for Feature Reduction: To reduce the dimensions of the features further and integrate the
information from different channels, a global pooling layer is applied. This involves:
Max-pooling: This technique extracts the highest values from each section of the feature maps,
emphasizing the most significant features.
Average pooling: This method computes the average of the features, providing a general
overview of the data.
The combined result from these pooling operations is represented in equation (13).
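Steps 2–4 (equations (10)–(13)) can be sketched as follows. The 1×1-convolution weight and bias here are random stand-ins, an assumption rather than learned values, so the block only shows the data flow, not trained behavior.

```python
import numpy as np

rng = np.random.default_rng(0)

def spatial_attention(f):
    """Sketch of eqs. (10)-(12). f has shape (d channels, v locations)."""
    s = f.mean(axis=0)                        # eq. (10): average over channels
    X, c = rng.standard_normal(), 0.0         # stand-ins for the learned 1x1 conv
    Fa = 1.0 / (1.0 + np.exp(-(X * s + c)))   # eq. (11): sigmoid attention map
    return f + Fa * f                         # eq. (12): channel-wise re-weighting

def fused_pool(f):
    """Eq. (13): global max-pooling plus global average-pooling per channel."""
    return f.max(axis=1) + f.mean(axis=1)
```

Because the sigmoid gate lies in (0, 1), the attention step scales each positive feature to somewhere between its original value and twice that value, and the fused pooling collapses each channel to a single max-plus-mean score.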
5. Making Predictions: The resulting features zcF are then used to classify the sensor data as either faulty or normal. The classification is done using a softmax classification loss function, which measures how well the predicted labels match the actual ones.
Equation (14) describes the process for calculating this loss, which helps the model improve its predictions during training.
6. Improving Fault Diagnosis: To enhance the accuracy of the fault diagnosis, the model's weight
parameters zcF are optimized using a specific algorithm called the MAO algorithm.
7. Network Disruptions Impact: An analysis was done to see how network disruptions affect the
model's fault diagnosis ability. It was found that short disruptions (less than one minute) led to a
small increase in errors (less than 2%), while longer disruptions (more than five minutes) caused a
more significant increase in errors (up to 15%). This highlights the importance of having a robust
network architecture and effective recovery strategies.
Summary
In simpler terms, this section describes how to combine and enhance features from different sensor
data sources to improve the identification of faults. Key steps include:
Making predictions about the status of the sensors and optimizing the model to enhance accuracy.
Understanding the impact of network issues on performance to ensure reliable fault detection.
3. Reproduction
4. Assortment (Selection)
Each axolotl represents a potential solution to the optimization problem, where fault diagnosis
parameters are fine-tuned for better results.
Similarly, in the MAO algorithm, some solutions "lose" parts of their values (i.e., they are reset).
If an axolotl "loses" a part, it has a chance to regenerate a new one, helping the algorithm explore
different possibilities.
It mimics nature's way of improving and adapting, leading to more accurate fault detection in
sensor data.
In simple terms, the MAO algorithm is like a group of axolotls learning to survive: they change color to
match the best performers, heal injuries, and reproduce better versions of themselves. This process
helps fine-tune the fault diagnosis system to work as accurately and efficiently as possible.
This data may include temperature, humidity, light intensity, and gas levels.
2. Feature Extraction
Important characteristics (features) are extracted from the raw sensor data.
The model needs hyperparameters (e.g., learning rate, number of layers, weight values) to be
optimized for better accuracy.
Transition Phase: The algorithm makes weaker solutions (poor fault classifiers) learn from the
best-performing ones.
Injury and Restoration: Some parts of the model are randomly modified (reset and
regenerated) to explore new solutions.
Reproduction & Selection: The best-performing solutions are used to create improved versions,
refining the model over multiple iterations.
Temperature (°C) Humidity (%) Gas Level (PPM) Light Intensity (Lux) Machine Condition
Learning rate
Weight values
Activation functions
If these parameters are not chosen correctly, the model may not predict faults accurately.
The algorithm creates many axolotls (solutions) and evaluates which ones perform best.
2. Axolotls Adapt to Improve the Model
Best axolotl guides others: Poor solutions adjust based on the best-performing ones (like
learning from the strongest axolotl).
Regeneration (Mutation): Some parameters are randomly reset and replaced (like an axolotl
regenerating a lost limb).
Selection: The best solutions survive and are used to create even better ones.
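The behaviors listed above can be imitated with a toy population-based search. This is a loose sketch in the spirit of the axolotl analogy, not the actual MAO algorithm, and every constant (population size, step size, regeneration probability, search range) is an assumption chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)

def mao_like_optimize(loss, dim, pop=20, iters=100, step=0.5, regen_p=0.1):
    """Toy population search: weaker solutions move toward the best
    (transition), random entries are reset (injury and regeneration),
    and the best solution always survives (selection)."""
    X = rng.uniform(-5.0, 5.0, size=(pop, dim))
    for _ in range(iters):
        scores = np.array([loss(x) for x in X])
        best = X[scores.argmin()].copy()
        X = X + step * (best - X)                          # transition toward the best
        mask = rng.random(X.shape) < regen_p               # "injury": pick entries...
        X[mask] = rng.uniform(-5.0, 5.0, size=mask.sum())  # ...and regenerate them
        X[scores.argmax()] = best                          # selection: keep the best
    scores = np.array([loss(x) for x in X])
    return X[scores.argmin()]
```

On a simple quadratic objective the population quickly collapses onto the minimum, which is the same role MAO plays when tuning the DMSFNN weight parameters.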
📌 Example Prediction:
New Input: Temp = 48°C, Humidity = 78%, Gas = 620 PPM, Light = 150 Lux
Optimized Model Output: 🚨 Fault Detected!
Now, you can take action before the machine fails, preventing costly damage.
Process: DMSFNN uses convolutional neural networks (CNNs) to learn and combine multi-scale features from sensors; MAO mimics axolotl behaviors (growth, regeneration, reproduction) to find the best parameter settings.
Outcome: DMSFNN provides a fault diagnosis model that classifies data as normal or faulty; MAO enhances fault classification accuracy by fine-tuning DMSFNN parameters.
False Positive (FP): Mistakenly classifying a faulty operation as normal (false alarm).
Performance Metrics:
1. Accuracy: Measures how many predictions are correct.
Accuracy = (TP + TN) / (TP + TN + FP + FN)
2. Precision: Measures how many of the predicted normal operations are actually normal.
Precision = TP / (TP + FP)
3. Sensitivity (Recall): Measures how well the model detects faulty conditions.
Sensitivity = TP / (TP + FN)
6. False Alarm Rate: Measures how often the model wrongly classifies normal operations as faulty.
FAR = FP / (FP + TN)
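Plugging an assumed confusion matrix into the four formulas shows how they combine; the counts below are invented purely for illustration.

```python
# Hypothetical counts: 90 true positives, 85 true negatives,
# 5 false positives, 10 false negatives.
TP, TN, FP, FN = 90, 85, 5, 10

accuracy = (TP + TN) / (TP + TN + FP + FN)   # 175/190 ~ 0.921
precision = TP / (TP + FP)                   # 90/95  ~ 0.947
sensitivity = TP / (TP + FN)                 # 90/100 = 0.900
false_alarm_rate = FP / (FP + TN)            # 5/90   ~ 0.056
```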
Simulation results (Figures 3-10 in the paper) show that the proposed method performs better in
terms of accuracy, precision, and computational time than other techniques.
Key Takeaways:
DMSFNN + MAO helps in more efficient fault detection in IoT sensor networks.
The method is tested using a simulation on a powerful computer.
The performance is compared using standard metrics like accuracy, precision, and false alarm rate.
The proposed method performs better than existing fault detection models.
5. Conclusion
This manuscript effectively implements Enhanced Fault Diagnosis in IoT, Uniting Data Fusion with Deep Multi-Scale Fusion Neural Network (FD-IoT-DMSFNN). Python is used to implement the recommended strategy. Performance metrics like Accuracy, Precision, Sensitivity, Specificity, F-measure, MCC, False Alarm Rate, and Computational Time are examined here. The proposed method achieves 14.41%, 15.98%, and 19.76% higher precision than existing techniques like FD-IoT-LSTM, FD-IoT-DRL, and FD-IoT-ML-LSTM, respectively.

Based on the present stage of fusion model development, the following challenges and barriers are encountered while fusing sensory data. No one-size-fits-all fusion model is available to identify and predict mechanical defects in the field; many of the fusion models available today are built around a particular gadget, so providing a uniform framework for recognising mechanical equipment failures in the future would be beneficial. Environmental factors are uncontrollable, and there is a lot of noise in the data collected throughout the process; data fusion and feature extraction typically go wrong if the raw data are used directly. When raw data are provided, using an appropriate data pretreatment strategy rather than the methods used in this work is crucial. In future development, a set of preprocessing techniques for the various sensors utilised in mechanical gear defect analysis and prediction would be helpful. When employing fusion approaches based on deep learning, finding the correct hyperparameters takes a long time, and overfitting may occur. Fusion methods usually need manual feature extraction, which increases computation time. Most fusion algorithms centre on research into feature-level and decision-level fused algorithms; there aren't many algorithms for combining raw data. In light of this, developing data fusion techniques will be essential.

Declaration of interests: The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
summarise it in simpler terms
Summary of Conclusion
This paper successfully introduces a method for Enhanced Fault Diagnosis in IoT using a Deep Multi-
Scale Fusion Neural Network (FD-IoT-DMSFNN), implemented in Python. The performance of this
method is evaluated using various metrics like Accuracy, Precision, Sensitivity, Specificity, F-measure,
and False Alarm Rate.
Key Findings:
1. Lack of Universal Models: There isn't a one-size-fits-all model for detecting mechanical defects in
different types of equipment. Most existing models are designed for specific devices.
2. Environmental Noise: Data collected can be noisy due to uncontrollable environmental factors,
affecting the accuracy of data fusion and feature extraction.
3. Data Preprocessing Needs: It's important to apply appropriate data preprocessing methods before
using raw data to improve the results.
4. Hyperparameter Optimization: Finding the right hyperparameters for deep learning models can be
time-consuming, and there's a risk of overfitting.
5. Manual Feature Extraction: Many fusion methods require manual feature extraction, which can
increase computation time.
6. Limited Algorithms for Data Fusion: Most current research focuses on feature and decision-level
fusion, with fewer algorithms dedicated to combining raw data effectively.
Future Work Suggestions:
Develop preprocessing techniques for various sensors used in mechanical defect analysis.
Create new data fusion techniques to enhance the overall performance of fault diagnosis systems.
Declaration of Interests: The authors confirm they have no financial conflicts or personal relationships
that could affect this research.
3. Proposed Methodology
This section discusses Enhanced Fault Diagnosis in IoT Uniting Data Fusion with Deep Multi-Scale Fusion Neural Network (FD-IoT-DMSFNN). First, the CWRU dataset is used to obtain input sensor data. The sensor data is then normalised using Multivariate Fast Iterative Filtering. Because the normalised sensor data are widely scattered and varied, the Deep Isolation Forest (DIF) technique is next used to detect data outliers. The cleaned sensor data is then combined via data fusion, and a Deep Multi-Scale Fusion Neural Network (DMSFNN) is used to detect faults in the IIoT. In general, the DMSFNN approach adopts no optimisation procedure to tune the ideal parameters. For this reason, a Mexican Axolotl Optimization (MAO) is presented in this framework to adjust the weight parameters of DMSFNN, yielding an improved fault identification procedure. The suggested methodology's block diagram is depicted in Figure 1, and a thorough description of the proposed framework is provided below.
Figure 1. Block diagram for the proposed FD-IoT-DMSFNN.
3.1. Data Acquisition
Initially, the CWRU dataset serves as the input data source. The Case Western Reserve University Laboratory's CWRU dataset [28] was produced by placing multiple accelerometers around the bearing motor to gather vibration data under various operating situations. In the CWRU dataset, an electric spark was used to damage the bearing motor to simulate real bearing failures; the specific areas of damage lie on the rolling element, inner raceway, and outer raceway of the bearing at the drive or fan end. The dataset comprises vibration signals obtained from a bearing test rig under various working settings, such as standard and malfunctioning states. High-resolution time-series data was obtained by sampling at a rate of 12 kHz; the dataset is roughly 4 GB.

To collect vibration data, accelerometers installed on the bearing housing form part of the sensor network. A data-collecting system digitalises the analogue signals from these sensors, which are then wirelessly transmitted to a central server via the Zigbee protocol, guaranteeing minimal power consumption and excellent dependability.

Figure 2. Configuring a bearing motor to produce a CWRU dataset.

The four available diameters of the bearing rolling element can indicate different types of faults, and each of the four load categories and four bearing motor speeds corresponds to a distinct bearing motor condition. The bearing motor design shown in Figure 2 models real-world industrial applications; its components include a torque transducer, encoder, motor, and bearings. Electrical discharge machining was used to create artificial flaws in the bearings. A portion of the CWRU dataset was used to generate the experimental data. Data collection under typical conditions used a motor load of one horsepower and a speed of 1772 rpm; the fault diameter was 0.007 inches, and the sampling frequency was 12,000 samples per second. The rolling element at the driving end and the outer and inner raceways were all damaged.
provide the important points to say for the presentation regarding the introduction and the data acquisition part
2. Significance of the Dataset: The CWRU dataset is essential for simulating real-world conditions,
particularly for identifying and diagnosing faults in bearing motors.
Sensors gather data under various operating conditions, including normal and faulty states.
2. Dataset Characteristics:
The dataset consists of high-resolution time-series data sampled at 12 kHz.
The total dataset size is approximately 4 GB, providing comprehensive information for analysis.
3. Simulation of Faults:
The dataset was created by intentionally damaging the bearing motor, simulating actual
bearing failures through electrical discharge machining.
Different types of faults can be identified based on variations in the rolling elements of the
bearing, load categories, and motor speeds.
The accelerometers on the bearing housing are part of a sensor network that digitalizes analog
signals for processing.
Data is wirelessly transmitted to a central server using the Zigbee protocol, which ensures low
power consumption and high reliability.
2. Fault Indicators:
The four diameters of the bearing rolling element correspond to different fault types.
The setup reflects real-world industrial applications, as seen in the bearing motor
configuration diagram.
3. Experimental Conditions:
Data was collected under standard conditions with a motor load of one horsepower and a
speed of 1772 rpm.
Faults were introduced in the driving end and the outer and inner raceways of the bearing to
facilitate comprehensive fault analysis.
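In practice, the raw 12 kHz vibration recordings described above are sliced into fixed-length windows before being fed to a model. A minimal sketch of that step (the 0.1 s window length and 50% overlap are illustrative choices, not values stated in the text):

```python
def segment_signal(signal, fs=12000, window_s=0.1, overlap=0.5):
    """Slice a 1-D vibration signal (sampled at fs Hz) into
    fixed-length, optionally overlapping windows."""
    win = int(fs * window_s)                 # samples per window (1200 here)
    step = max(1, int(win * (1 - overlap)))  # hop size between window starts
    return [signal[i:i + win] for i in range(0, len(signal) - win + 1, step)]

# One second of dummy data at 12 kHz yields 19 half-overlapping 0.1 s windows.
windows = segment_signal([0.0] * 12000)
```

Each resulting window becomes one training or inference sample for the downstream fault classifier.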
Visual Aids
Figures: Reference Figure 1 (block diagram of the proposed methodology) and Figure 2
(configuration of the bearing motor for CWRU dataset) to illustrate data acquisition and the
experimental setup.
Conclusion
Highlight the importance of the CWRU dataset in enhancing fault diagnosis accuracy and
effectiveness in IoT applications, emphasizing the relevance of using high-quality, real-world data
for developing robust diagnostic algorithms.
Robustness to Noise:
DIF: High (deep learning improves feature extraction)
Isolation Forest: Moderate (sensitive to fine-tuned hyperparameters)
DBSCAN: Low (requires density parameters)
One-Class SVM: Low (depends on threshold settings)
Unlike traditional Isolation Forest (IF), DIF uses deep learning techniques to extract meaningful
multi-scale features, leading to better anomaly detection.
3. Resilient to Noisy Data:
Industrial sensor data is noisy; DIF is robust to noise due to its deep feature representation.
4. Scalability:
Works efficiently on large datasets like CWRU, whereas methods like One-Class SVM and
DBSCAN struggle.
5. Adaptive to Complex IoT Fault Patterns:
DIF captures complex fault patterns better than rule-based or distance-based methods.
Conclusion
Deep Isolation Forest is chosen because it outperforms traditional outlier detection methods in
handling large-scale, high-dimensional, and noisy IoT sensor data. By using DIF, the system improves
fault diagnosis accuracy by filtering out irrelevant anomalies before the deep learning model
processes the data.
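The isolation principle behind DIF can be sketched in pure Python: build trees that split the data on random features at random thresholds, and score points by how quickly they become isolated (anomalies end up on short paths). This is an illustrative toy of the isolation-forest core, not the published DIF implementation, which additionally uses randomly initialized neural networks for representation:

```python
import random

def build_tree(points, depth=0, max_depth=8):
    # Recursively split on a random feature at a random threshold.
    if depth >= max_depth or len(points) <= 1:
        return ("leaf",)
    dim = random.randrange(len(points[0]))
    lo = min(p[dim] for p in points)
    hi = max(p[dim] for p in points)
    if lo == hi:
        return ("leaf",)
    split = random.uniform(lo, hi)
    left = [p for p in points if p[dim] < split]
    right = [p for p in points if p[dim] >= split]
    return ("node", dim, split,
            build_tree(left, depth + 1, max_depth),
            build_tree(right, depth + 1, max_depth))

def path_length(tree, p, depth=0):
    if tree[0] == "leaf":
        return depth
    _, dim, split, left, right = tree
    return path_length(left if p[dim] < split else right, p, depth + 1)

def avg_path(point, forest):
    # Anomalies are isolated quickly, so a shorter average path = more anomalous.
    return sum(path_length(t, point) for t in forest) / len(forest)

random.seed(0)
normal = [(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(200)]
forest = [build_tree(normal) for _ in range(50)]
outlier = (10.0, 10.0)
```

An extreme reading like `outlier` is separated after only a few random splits, so its average path length is clearly shorter than that of an in-distribution point.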
Implemented an ε-greedy algorithm to train the actor and dynamically adjust policies via the
critic.
Overfitting Risk: The actor-critic model may overfit to the training set, reducing generalization
to real-world industrial environments.
Lack of Multi-Scale Feature Extraction: Does not effectively extract multi-scale patterns, which
are crucial in complex industrial faults.
Why It Failed
The model struggles when data is scarce, leading to unreliable fault detection in real-world
settings.
Policy adjustment using the critic does not always adapt well to dynamic industrial conditions.
The model does not consider feature fusion, leading to suboptimal fault identification.
Memory Constraints: LSTM models rely heavily on past data and may struggle with large-scale
IIoT datasets.
Why It Failed
The approach cannot handle real-time fault detection efficiently due to the long training time.
Feature selection is not optimized, which reduces the model's ability to detect subtle fault
patterns.
It does not integrate data fusion techniques, limiting its performance when multiple sensors
are involved.
Privacy Concerns: Although FL improves data security, it does not entirely eliminate privacy
risks (e.g., model inversion attacks).
Limited Generalization: FL struggles with unbalanced datasets, leading to poor fault detection
for rare fault conditions.
Why It Failed
Computationally expensive, as FL requires multiple rounds of distributed model training.
FL-based models struggle when fault patterns are highly complex or rare.
The aggregation technique (FedAvg) does not handle sensor fusion, making it less effective for
multi-sensor data integration.
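FedAvg itself is just a dataset-size-weighted average of client model parameters, which is why it has no notion of sensor fusion. A minimal sketch, with client parameters represented as flat lists (purely illustrative):

```python
def fedavg(client_params, client_sizes):
    """Weighted average of per-client parameter vectors,
    weighted by each client's local dataset size (FedAvg aggregation)."""
    total = sum(client_sizes)
    dim = len(client_params[0])
    return [
        sum(params[i] * size for params, size in zip(client_params, client_sizes)) / total
        for i in range(dim)
    ]

# Two clients: the second holds 3x as much data, so it dominates the average.
merged = fedavg([[1.0, 2.0], [3.0, 4.0]], [1, 3])  # -> [2.5, 3.5]
```

Because each coordinate is averaged independently, the aggregation never models cross-sensor relationships, supporting the criticism above.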
Why FD-IoT-DMSFNN is Better?
Why Deep Learning (DL) Instead of Machine Learning (ML) for Fault Diagnosis in
IIoT?
Deep Learning (DL) is preferred over traditional Machine Learning (ML) in fault diagnosis for Industrial
IoT (IIoT) because of its ability to handle complex, multi-sensor data, extract hierarchical features
automatically, and improve fault identification accuracy. Below are the key reasons for choosing DL
over ML in this research:
🔹 Example: In bearing fault detection, traditional ML requires manual feature extraction (e.g.,
frequency domain analysis, statistical features), while DL learns these features automatically.
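To make the contrast concrete, this is what "manual feature extraction" looks like in a traditional ML pipeline: hand-picked statistics computed per vibration window (RMS, kurtosis, and crest factor are common choices for bearing signals; the exact feature set here is illustrative). A DL model learns equivalent representations implicitly:

```python
import math

def manual_features(window):
    """Hand-engineered statistics of one vibration window,
    as a traditional ML pipeline would compute them."""
    n = len(window)
    mean = sum(window) / n
    centered = [x - mean for x in window]
    rms = math.sqrt(sum(x * x for x in window) / n)
    std = math.sqrt(sum(c * c for c in centered) / n)
    kurtosis = (sum(c ** 4 for c in centered) / n) / std ** 4 if std else 0.0
    crest = max(abs(x) for x in window) / rms if rms else 0.0
    return {"rms": rms, "kurtosis": kurtosis, "crest_factor": crest}

# Sanity check: a pure sine wave has crest factor sqrt(2) and kurtosis 1.5.
sine = [math.sin(2 * math.pi * k / 100) for k in range(1000)]
feats = manual_features(sine)
```

Each such feature must be chosen, validated, and re-tuned by an engineer for every new machine type, which is exactly the overhead DL removes.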
Multi-layer neural networks learn low-level features (e.g., noise patterns) in early layers and high-level fault patterns in deeper layers.
🔹 Example: A CNN can detect small cracks in an industrial machine's vibration data, which may be
missed by ML models that rely on manually engineered features.
Traditional ML models struggle to handle heterogeneous sensor data and often require separate
preprocessing steps.
🔹 Example: In FD-IoT-DMSFNN, the model fuses data from different sensors and learns
interdependencies between them, which ML models cannot do effectively.
ML models require carefully labeled and preprocessed data, making them unsuitable for real-time
applications.
DL models, especially autoencoders and CNNs, are resistant to noise and can perform feature
learning even with raw sensor data.
🔹 Example: In IIoT fault diagnosis, if some sensors provide incomplete data, DL can still make
predictions using learned patterns, while ML models may fail or require extensive data cleaning.
🔹 Example: The proposed FD-IoT-DMSFNN model achieves 14.41%–19.76% higher precision than ML-
based fault detection methods.
DL models can perform end-to-end learning, making them more efficient for real-time fault
detection in IIoT systems.
🔹 Example: In real-time industrial fault detection, a DL model can continuously monitor sensor data
and detect anomalies without manual intervention, whereas ML models require preprocessed data
and feature extraction.
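The continuous-monitoring loop described above can be sketched without any model at all: keep a rolling window of recent readings and flag values that deviate sharply from the baseline. The window size and z-score threshold below are arbitrary illustrative choices; in FD-IoT-DMSFNN the anomaly decision would come from the trained network instead:

```python
import math
from collections import deque

class StreamMonitor:
    """Flags readings that deviate sharply from a rolling baseline."""
    def __init__(self, window=50, z_threshold=4.0):
        self.buffer = deque(maxlen=window)
        self.z_threshold = z_threshold

    def update(self, reading):
        alert = False
        if len(self.buffer) >= 10:  # wait for a minimal baseline
            mean = sum(self.buffer) / len(self.buffer)
            var = sum((v - mean) ** 2 for v in self.buffer) / len(self.buffer)
            std = math.sqrt(var) or 1e-12  # guard against zero variance
            alert = abs(reading - mean) / std > self.z_threshold
        self.buffer.append(reading)
        return alert

monitor = StreamMonitor()
alerts = [monitor.update(1.0 + 0.01 * (i % 3)) for i in range(50)]  # steady readings
spike_alert = monitor.update(25.0)                                  # sudden fault spike
```

The point of the sketch is the loop structure, one decision per incoming sample with no offline preprocessing, which is the operating mode an end-to-end DL model supports natively.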
Multi-Sensor Fusion:
Traditional ML: Difficult to integrate multiple sensor sources
Deep Learning: Easily combines data from different sensors
Accuracy in Fault Detection:
Traditional ML: Moderate accuracy
Deep Learning: Higher accuracy and better generalization
📌 Final Takeaway:
Deep Learning is the best choice for IIoT fault diagnosis because it:
✅ Handles complex, multi-sensor data
✅ Learns hierarchical features automatically
✅ Provides higher accuracy and robustness
✅ Enables real-time fault detection
✅ Eliminates the need for manual feature selection
That's why FD-IoT-DMSFNN is designed using Deep Learning, not traditional ML techniques. 🚀