Journal of Mechanical Science and Technology
An improved MF1-FedAvg based Federated Learning method with MSRANet for machinery fault diagnosis
--Manuscript Draft--
Manuscript Number: MEST-D-24-00142
Full Title: An improved MF1-FedAvg based Federated Learning method with MSRANet for
machinery fault diagnosis
Article Type: Original Paper
Keywords: Federated learning; Fault diagnosis; Rolling bearings; Multiscale residual attention
network
Abstract: Current fault detection methods for rolling bearings suffer from insufficient data
features, limiting the generalization capability of models. Typically, conventional
approaches train the model with a significant amount of labeled data to improve
reliability. However, centralized training poses potential risks of data privacy leakage.
In response to this issue, we propose a federated learning-based fault diagnosis
model. In this method, fault diagnosis models for different clients are collaboratively
trained by multiple entities with distinct fault characteristics, eliminating the need for
third-party aggregation and thereby reducing the risk of data leakage. Specifically, we
design a multi-scale residual neural network with the ability to perform direct feature
extraction from fault data. This proposed network integrates attention units for various
scales, emphasizing key features of bearing faults and enhancing the fault recognition
capability of local models. Moreover, to address the inherent problem in traditional
federated learning frameworks—disparities in client contributions, leading to
suboptimal model quality and prolonged training times—this research introduces an
innovative weighted strategy based on multi-class F1 scores. This strategy assigns
higher weight to high-quality local clients, thereby enhancing both model quality and
training speed. Experiments were conducted on two authentic bearing datasets, and
the results demonstrate that the proposed method can achieve an average reduction of
approximately 15% in training iteration times compared to the federated averaging
algorithm, coupled with an average enhancement of about 5% in fault diagnosis
accuracy. The experimental results indicate that the proposed method exhibits
outstanding accuracy and robustness.
Xiuyan Liu 1, Chunqiu Pang 1, Tingting Guo 2, Donglin He 1
1 School of Information and Control Engineering, Qingdao University of Technology, Qingdao, China
2 School of Electromechanical and Automotive Engineering, Yantai University, Yantai, China
Correspondence to: Tingting Guo / [email protected]
1. Introduction

Rolling bearings are crucial components of machinery widely used in the national economy and in critical areas of national defense. These bearings operate under severe conditions of high temperature, pressure, and load for prolonged periods, making them susceptible to mechanical failure, which can significantly impact the overall performance of industrial equipment. Consequently, scholars have been focusing on fault diagnosis for rolling bearings, and data-driven fault diagnosis methods based on supervised learning have achieved remarkable results [1-3].

Although data-driven fault diagnosis methods based on supervised learning have demonstrated strong detection results, the available methods require a considerable amount of high-quality labeled data to train an effective fault detection model. However, in most industries, collecting high-quality labeled data, especially for severe fault states, is very challenging because a bearing spends only a limited time in a fault state. Collecting such data therefore requires significant human and financial resources.

An intuitive solution is to pool labeled data from various organizations and jointly develop a global fault diagnosis model. This approach reduces costs and yields better performance, because all stakeholders contribute to and update the same model.

Despite the potential benefits, such solutions often require sharing data between different companies and plants, which is often undesirable or even infeasible because of data privacy concerns and conflicts of interest in industry. Moreover, varying sampling frequencies among different organizations may make the collection and utilization of data daunting. For example, vibration data and current signals can be obtained separately by different industrial clients, and it is not straightforward to directly utilize different types of data for better performance. These challenges are known as the data silo problem and occur across many industries.

Federated learning presents a promising approach to tackling these challenges. It is a nascent machine learning technique first introduced by Google in 2016 to predict users' text input on many Android devices while preserving data privacy on each device [4]. In essence, federated learning involves a central server coordinating multiple decentralized clients for machine learning, as illustrated in Fig. 1.

In recent years, a plethora of data-driven fault diagnosis techniques based on artificial neural networks (ANN) and support vector machines (SVM) [5-7] have been successfully developed and implemented. Among these, deep neural network-based fault detection methods have gained significant attention for their powerful automatic feature extraction capability and end-to-end training mode integrated with classifiers [8-9]. Deep learning methods have succeeded in various applications related to mechanical fault diagnosis. In addition to traditional fully connected networks, several variants of neural networks have also been successfully developed and applied to mechanical fault diagnosis. For instance, Feng et al. [10] introduced a normalized deep convolutional neural network to address the challenge of imbalanced fault data. Meanwhile, Liu et al. [11] proposed an RNN-based autoencoder for the fault diagnosis of rolling bearings. Additionally, Zhao et al. [12] presented two deep residual shrinkage networks (DRSNs) based on ResNet, one using channel-shared thresholding (DRSN-CS) and the other using channel-wise thresholding (DRSN-CW), to enhance the capability of ResNet to extract features from noisy vibration signals and thereby achieve higher accuracy. Overall, deep neural network-based fault detection methods have demonstrated high success rates in addressing the challenges associated with mechanical fault diagnosis.
The success of data-driven deep learning methods is contingent upon the availability of high-quality data. However, users often face a shortage of such data, and data from different domains vary. To alleviate these domain differences, transfer learning was proposed. Transfer learning can learn the similarity between distinct datasets and exploit this similarity to transfer knowledge when the data stem from different distributions. Zhang et al. [13] proposed a semi-supervised integrated learner (SSIT) that leveraged transfer learning for predicting bearing faults in aero-engine systems. The proposed approach addressed the issues of low prediction accuracy and overfitting frequently encountered during aero-engine bearing fault diagnosis. The experimental findings demonstrated that transfer learning could overcome the instability of data in traditional supervised learning methods.

Nonetheless, to enhance the stability and accuracy of data-driven fault diagnosis models, it is recommended to utilize supervised training data from multiple domains.

Several studies have investigated the feasibility of using federated learning to mitigate the problem of insufficient fault data by utilizing data from multiple local clients while protecting data privacy. McMahan et al. [14] introduced the federated averaging algorithm to reduce the communication frequency between clients and the central server, thereby enhancing the communication efficiency of federated learning. Li et al. [15] proposed a federated learning-based intelligent diagnosis method to tackle the limited sharing of mechanical equipment fault data. Zhang et al. [16] proposed a federated learning method to address the challenge of data islands in machinery fault diagnosis. Wang et al. [17] improved the efficiency of federated learning by allowing edge nodes to select specific models from the cloud for asynchronous updates based on local data distribution, thereby reducing computation and communication. However, the traditional federated averaging algorithm does not consider data imbalance, which calls for solutions that handle data imbalance between devices. To this end, Geng et al. [18] optimized the weighting strategy of the model based on traditional federated learning, resulting in an improved fault classification accuracy of about 8.6% on average. Motivated by these previous studies, the objective of this study is to formulate a centralized fault diagnosis model based on federated learning algorithms. This model enables the collaborative participation of multiple clients in the training process while safeguarding data privacy. Importantly, the model does not require direct access to the local data of different clients. Instead, it continuously updates and refines itself through parameter communication with the clients. This iterative process enhances the fault detection capabilities of the clients, contributing to improved model performance without compromising data confidentiality.

Fig. 1. Illustration of the federated learning scheme.

Unlike traditional centralized machine learning methodologies that require collecting all data for model training, federated learning does not require data collection from individual clients. In each training round of federated learning, the local clients download the current model parameters from the central server, use their local data for training, and subsequently upload the trained model parameters to the server. The server then aggregates the parameters uploaded by each client to generate new server parameters. This iterative process enables the aggregated model to converge to a certain level of accuracy. Throughout the training process of federated learning, each client's data remains local, thereby ensuring the privacy of the data.

The main contributions of this paper are as follows:
(1) We present a novel approach for fault detection in bearings using a multi-scale residual attention network based on Res2Net. The proposed network employs an attention mechanism to extract key features of bearing faults, enhancing the model's feature learning and fault detection capability. Specifically, multi-scale features are extracted and fused to obtain the critical features of faults.
(2) We also propose a Macro-F1 weighted aggregation strategy to optimize training on imbalanced datasets in the model aggregation phase of federated learning. This strategy weights the clients based on their Macro-F1 scores and can improve the accuracy of federated learning.
(3) We evaluated the effectiveness of the proposed network by conducting several experiments on two real bearing datasets. The results demonstrate that the proposed network can significantly improve fault detection performance, and that performance improves further when the proposed network is combined with the Macro-F1 weighted aggregation strategy.

The remainder of this paper is organized as follows: Section 2 introduces the proposed method, and Section 3 experimentally validates the proposed method. Finally, Section 4 presents the conclusions.
2. Proposed method
2.1 Problem definition
In this study, the participating clients are denoted as $N_i$, $i = 1, 2, \ldots$, and the supervised dataset acquired by client $N_i$ is denoted as $D_i = \{d_i\}$. Each client aims to construct a fault diagnosis model by integrating all the available data. Conventionally, a global model $M_{all}$ is trained by pooling the data of all $D_i$. However, this approach is unsuitable for the present scenario, where the server cannot access the clients' data.
To address this issue, we aim to develop a global fault diagnosis model $M_{fed}$ that applies to all clients $N_i$. To achieve this, the locally trained models of the different clients are communicated to the server, which aggregates them to construct $M_{fed}$. It is assumed that $M_{fed}$ has a validation set, and the validation results are used to adjust the model aggregation. By constructing a central fault diagnosis model, individual clients can learn from the fault information of other participants while ensuring data privacy.
2.2 Multiscale Residual Attention Network

This study investigates the effectiveness of multi-scale feature extraction in fault detection models. Conventionally, fault detection models employ Convolutional Neural Networks (CNN) or Recurrent Neural Networks (RNN). Motivated by the multi-scale feature extraction proposed in Res2Net [19], we developed a multi-scale residual attention network, referred to as MSRANet. As illustrated in Fig. 2, MSRANet comprises three residual blocks with fault classification units.

Fig. 2. Architecture of the proposed MSRANet model.

(1) Convolutional blocks in residual blocks: The residual blocks consist of two convolutional blocks, each with two 3×3 convolutional layers, and the ReLU activation function is applied throughout the network. Three residual blocks are set up on four channels to extract multi-scale information from the features. Each residual block consists of the convolution block and the proposed attention unit.

(2) Attention unit in the residual block: The attention unit, depicted in Fig. 3, extracts bearing fault information at various scales on the four channels. To extract this information effectively, this paper proposes an attention unit that highlights critical fault information and suppresses extraneous feature information.

This study presents a novel approach to acquiring the input features essential for the spatial attention module. In multi-scale networks, the SE (Squeeze-and-Excitation) [20] attention mechanism is extensively utilized; it learns channel-wise weights for the incoming features, which enables the network to prioritize the most informative channels. In contrast to the SE attention mechanism, the attention mechanism proposed in this paper incorporates both channel attention and spatial attention. Experimental testing demonstrates its effectiveness in improving the model's generalization capability and performance.

Specifically, the original features are subjected to global max pooling and global average pooling to obtain two feature maps. These maps are then fed into a shared two-layer Multi-Layer Perceptron (MLP) with the ReLU activation function, where the first layer has C/r neurons (r being the reduction rate) and the second layer has C neurons. The two MLP outputs undergo element-wise summation followed by sigmoid activation, yielding the final channel attention feature. The channel attention feature and the input feature map F are multiplied element-wise to generate the input features required by the spatial attention module. This proposed method shows promise in enhancing the performance of spatial attention models and could have potential applications in various fields.

The input feature map of the spatial attention module is the feature map output by the channel attention module. First, global max pooling and global average pooling are applied along the channel dimension to generate two feature maps, which are concatenated along the channel dimension. The spatial attention feature is then generated using the sigmoid function and multiplied by the input feature of the module to obtain the final feature.

Fig. 3. Illustration of the proposed attention unit.

The original vibration acceleration data is used as input and first passes through a convolutional layer with a kernel size of 1×1 to obtain the initial feature maps $X_i$, where $i \in \{1, 2, 3, 4\}$. Each feature map has the same spatial size, but its number of channels is one-fourth of that of the input features. Except for the first channel, each channel has a convolution block.

The output of the convolution block $K_i$ is defined as $Y_i$. The output of $K_{i-1}$, weighted by the attention unit, is added to the feature map $X_i$ and then passed through $K_i$ to obtain $Y_i$:

$$ Y_i = \begin{cases} X_i, & i = 1 \\ K_i(X_i), & i = 2 \\ K_i(X_i + Y_{i-1}), & 2 < i \le s \end{cases} \tag{1} $$

where $s$ is the number of channels (four in this paper).

Upon receiving the feature map $X_i$, each convolution block computes its output so as to expand the receptive field beyond that of $X_i$. The feature information obtained from different receptive fields differs: a smaller receptive field captures feature details, while a larger receptive field primarily captures feature location information. Consequently, multi-scale convolution is implemented to combine features from different receptive fields. As shown in Fig. 3, the second channel employs a residual block to transform the feature map $X_2$ into the output $Y_2$. After that, $Y_2$ is subjected to a 3×3 convolution block in the same channel as $X_2$, generating output equivalent to that obtained from a 5×5 convolution. Furthermore, a fusion of features from the 3×3 and 5×5 receptive fields produces $Y_3$, while $Y_4$ corresponds to a 7×7 receptive field. As a result, the four channels encode feature information of varying scales, which is integrated using a 1×1 convolution to generate multi-scale features that identify bearing faults.

(3) Fault classifier: The proposed fault classifier is a multi-class classifier that consists of two fully connected layers and a layer with cross-entropy loss.

In MSRANet, the output $Y_{i-1}$ from the previous residual block is subjected to a channel attention unit, which assigns weights based on the significance of critical bearing failure information. The weighted $Y_{i-1}$ is then added to $Y_i$ and $X_{i+1}$ and transmitted to the next residual block. The attention units employed in each residual block are identical. The four outputs $Y_i$ obtained from these attention units are concatenated and subjected to a 1×1 convolution, resulting in a multi-scale feature map of the bearing faults. This feature map is subsequently combined with the original data and sent to the fault classification unit through the skip connection for the final fault classification.
Thus, MSRANet can extract multi-scale features for detecting and classifying bearing faults, improving the model's accuracy.

2.3 Federated Learning with Macro-F1 Weighting Strategy

Federated learning has emerged as a promising approach for the distributed training of models without requiring access to client data. It enables clients to collaborate in training a global model. In every training iteration, clients obtain the global model from the server, use their local data to update the model weights, and then transmit the updated weights back to the server. The server subsequently runs an aggregation algorithm to update the model weights. One such algorithm, FedAvg, was proposed in [21]. In FedAvg (shown in Algorithm 1), a randomly selected subset of clients updates its local parameters in each iteration, after which the server computes a weighted average of the client parameters. This procedure reduces communication bandwidth and costs. The main steps of the FedAvg algorithm are briefly outlined as follows:
(1) Client Update: In lines 1-7, b denotes a mini-batch of the client's local data. Line 6 executes stochastic gradient descent, while line 7 returns the updated local parameters to the server.
(2) Server Aggregation: The server initializes the model parameters and distributes them to each client at the outset, as indicated in line 12. In lines 14-15, m clients are selected by random sampling to form the subset $S_t$. In lines 16-17, each client in $S_t$ trains locally starting from the current global parameters and uploads its updated parameters to the server. Finally, in line 18, the global parameters $w_{t+1}$ are calculated as the weighted average of the parameters received from the participating clients.

The conventional federated averaging algorithm aggregates the model parameters from individual clients using a weighted average based on each participant's sample count. Nevertheless, in practical industrial scenarios, the volume and nature of the data held by each client are frequently imbalanced, which considerably influences the weighted average and hampers fault diagnosis. Hence, maintaining good federated learning performance under imbalanced client data is a pressing problem.
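The sample-count weighting described above can be illustrated with a minimal sketch. It assumes that each client's parameters are held as a dictionary of NumPy arrays; the function name `fedavg`, the parameter key `"w"`, and the toy values are hypothetical and are not taken from the paper's implementation.

```python
import numpy as np

def fedavg(client_params, sample_counts):
    """Average client parameter dicts, weighting client i by n_i / n."""
    counts = np.asarray(sample_counts, dtype=float)
    weights = counts / counts.sum()  # n_i / n for every client
    keys = client_params[0].keys()
    # Weighted sum of each named parameter across all clients.
    return {
        k: sum(w * p[k] for w, p in zip(weights, client_params))
        for k in keys
    }

# Toy example (hypothetical values): two clients sharing one parameter vector.
clients = [
    {"w": np.array([1.0, 1.0])},  # client holding 900 samples
    {"w": np.array([3.0, 5.0])},  # client holding 100 samples
]
global_params = fedavg(clients, [900, 100])
print(global_params["w"])
```

In this toy case the aggregated vector stays close to the parameters of the 900-sample client, which shows how a client with many samples dominates the average regardless of the quality of its local model.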
Algorithm 1 FedAvg.
1: Function ClientUpdate // Run on local clients
2: while iter < max_iter do
3:   // Receive initial model $w_{init}$ from server
4:   for each local epoch i from 1 to E do
5:     for batch $b \in B$ do
6:       $w \leftarrow w - \alpha \nabla \ell(w, b)$
7:       return $w$ to server
8:     end for
9:   end for
10: end while
11: Function ServerAggregation // Run on server node
12: Initialize $w_{init}$
13: for each round $t = 1, 2, \ldots$ do
14:   $m \leftarrow \max(C \cdot K, 1)$
15:   $S_t \leftarrow$ (random set of m clients)
16:   for each participant $i \in S_t$ in parallel do
17:     $w_{t+1}^{i} \leftarrow$ ClientUpdate($i$, $w_t$)
18:     $w_{t+1} \leftarrow \sum_{i=1}^{K} \frac{n_i}{n} w_{t+1}^{i}$
19:   end for
20: end for

The federated averaging algorithm, used in the model aggregation stage, requires a consistent evaluation index for the weights of individual clients. However, the original algorithm fails to consider the data disparities between clients and relies solely on the volume of data they own. In an attempt to enhance the performance of the traditional FedAvg, accuracy was incorporated into the weights of model aggregation, but the performance of federated learning did not improve significantly [22]. With a large number of clients engaged in federated learning, holding different amounts and types of data, the algorithm's performance is affected by the number and types of faults held by each client. Thus, to improve the weighting strategy of the conventional federated averaging algorithm, factors such as classification accuracy, recall, and class scarcity should be considered. The F1-score is frequently employed to balance precision and recall in binary classification and is defined as the harmonic mean of precision and recall. Nonetheless, fault data in practical industries are often multiclass, which makes the binary F1-score difficult to apply directly.

In this paper, a novel approach is proposed to tackle these challenges. Considering that real industrial fault data frequently entails multiclass classification, this paper integrates the Macro-F1 (MF1) metric into the model's weight aggregation strategy. Based on Eqs. (2)-(4), the Macro-F1 score is incorporated into the model-weighted aggregation strategy, building upon the traditional federated averaging algorithm. The Macro-F1 score, computed from the precision and recall averaged over all categories, determines the weight. The Macro-F1 score-based weight calculation is formulated in Algorithm 2.

$$ Precision_{macro} = \frac{\sum_{i=1}^{n} Precision_i}{n} \tag{2} $$

$$ Recall_{macro} = \frac{\sum_{i=1}^{n} Recall_i}{n} \tag{3} $$

$$ F1_{macro} = 2 \cdot \frac{Precision_{macro} \cdot Recall_{macro}}{Precision_{macro} + Recall_{macro}} \tag{4} $$

Here $n$ is the number of fault categories, and $Precision_i$ and $Recall_i$ are the precision and recall of category $i$.

In this study, the Macro-F1 score of each client is calculated from the precision and recall averaged over all categories. However, the precision and recall of certain fault classes may be lower due to smaller sample sizes, which can occur when these classes are scarce and only a few clients possess them. As a result, the Macro-F1 scores of the clients holding these classes may also be lower. Compared with the traditional federated averaging algorithm, this can lead to a faster decrease in the weights of these clients. Conversely, high Macro-F1 scores were consistently observed during the federated learning iterations for fault types with sufficient data.

Algorithm 2 MF1-FedAvg.
1: Function ClientUpdate // Run on local clients
2: while iter < max_iter do
3:   // Receive initial model $w_{init}$ from server
4:   for each local epoch i from 1 to E do
5:     for batch $b \in B$ do
6:       $w \leftarrow w - \alpha \nabla \ell(w, b)$
7:       Calculate the M-F1 for each client
8:       return M-F1, $w$ to server
9:     end for
10:   end for
11: end while
12: Function ServerAggregation // Run on server node
13: Initialize $w_{init}$
14: for each round $t = 1, 2, \ldots$ do
15:   $m \leftarrow \max(C \cdot K, 1)$
16:   $S_t \leftarrow$ (random set of m clients)
17:   for each participant $i \in S_t$ in parallel do
18:     $w_{t+1}^{i} \leftarrow$ ClientUpdate($i$, $w_t$)
19:     $w_{t+1} \leftarrow \frac{\sum_{i=1}^{K} n_i \, \text{M-F1}_i \, w_{t+1}^{i}}{\sum_{i=1}^{K} n_i \, \text{M-F1}_i}$
20:   end for
21: end for

Based on Eq. (5), the weights of model aggregation are affected by the Macro-F1 scores. An improved federated averaging algorithm, MF1-FedAvg, is proposed based on the proposed
model aggregation weighting strategy, and the algorithm proceeds as follows:

$$ w_{t+1} = \frac{\sum_{i=1}^{k} n_i \, F1_{macro}^{i} \, w_{t+1}^{i}}{\sum_{i=1}^{k} n_i \, F1_{macro}^{i}} \tag{5} $$

where $n_i$ is the number of samples held by client $i$, $F1_{macro}^{i}$ is its Macro-F1 score, and $w_{t+1}^{i}$ denotes its locally updated parameters.

Moreover, lower model quality increases the number of model aggregation iterations, which consumes more model transmission time. The proposed model aggregation weighting strategy assigns more weight to high-quality clients. Consequently, it enhances the quality of the federated server model and reduces the number of iterations needed for convergence. This, in turn, reduces model transmission time and accelerates the algorithm's convergence.

The proposed federated learning scheme for bearing fault diagnosis is depicted in Fig. 4. The procedure commences with the initialization of model weights, followed by dissemination to each participating client. Subsequently, each client trains locally using MSRANet and then forwards the trained model and its Macro-F1 score to the server. The server assigns weights to each client based on their Macro-F1 scores to obtain a new aggregated model. The updated model is then broadcast to the clients, and this iterative process continues until the maximum number of iterations is reached. Notably, the integration test data consisting of individual client data is employed for model testing to ensure data confidentiality.

Fig. 4. Overall flow chart of the proposed federated learning scheme for the bearing fault diagnosis scenario.

3. Experimental study
3.1 Data descriptions
3.1.1 Dataset 1: CWRU dataset
The Electrical Engineering Laboratory of Case Western Reserve University [23] provides an open rolling bearing dataset, which is made available to the research community for experimentation and analysis. The dataset was generated using the experimental platform shown in Fig. 5, and faults were introduced using electric discharge machining (EDM). The sampling frequency was 12 kHz, and four mechanical health states were considered: healthy, outer ring failure, inner ring failure, and ball failure. Each failure state was further classified into three degrees of severity, i.e., mild, moderate, and severe, corresponding to damage diameters of 0.007 inches, 0.014 inches, and 0.021 inches, respectively. In total, ten machine operating condition cases were diagnosed across the four states. The data distribution for the different failure types is presented in Table 1, where the bearing failure types are labeled 1-9 and the normal type is labeled 0. The dataset comprises 10,000 samples, with 1000 samples for each type. The dataset was divided into 8000 training samples and 2000 test samples to ensure the experiment's credibility.

Fig. 5. Rolling bearing failure experimental device, panels (a) and (b).

3.1.2 Dataset 2: Jiangnan University bearing dataset
To further assess the efficacy of the proposed approach, the bearing dataset from Jiangnan University [24] was employed as Case 2 for additional verification. This dataset was procured from the fault diagnosis test bench of the centrifugal fan system with rolling bearings at Jiangnan
University. The experimental setup for the rolling bearing fault diagnosis is illustrated in Fig. 5. The vibration signal in the vertical direction was acquired using a PCB MA352A60 acceleration sensor, with a sampling frequency of 50 kHz and a rotating speed of 1000 r/min. Four mechanical health states (healthy, outer ring failure, inner ring failure, and ball failure) were considered, and the four corresponding operating conditions were diagnosed. The data distribution for the different fault types is presented in Table 2, wherein labels 1-3 denote the bearing fault types and 0 denotes the normal type. Each health state comprises 1000 samples, for a total of 4000 samples. To ensure the experiment's credibility, the Jiangnan University dataset was split into 3000 training samples and 1000 test samples.
Table 1. Descriptions of Dataset 1.
Fault location   Motor speed (rpm)   Fault size (mil)   Length of sample   Label
Normal           1797                0                  1024               0
IF               1797                7                  1024               1
BF               1797                14                 1024               2
OF               1797                21                 1024               3
IF               1797                7                  1024               4
BF               1797                14                 1024               5
OF               1797                21                 1024               6
IF               1797                7                  1024               7
BF               1797                14                 1024               8
OF               1797                21                 1024               9
Table 2. Descriptions of Dataset 2.
Fault location   Motor speed (rpm)   Fault size (mil)   Length of sample   Label
Normal           1000                0                  1024               0
IF               1000                7                  1024               1
BF               1000                14                 1024               2
OF               1000                21                 1024               3
3.2 Experimental results
3.2.1 Performance evaluation of the proposed MSRANet model
In this study, we present our proposed MSRANet model for bearing fault diagnosis and assess its performance through comparative experiments on two datasets, Dataset 1 and Dataset 2. We also compare it with variants of the model and with methods commonly used for bearing fault detection, which are described as follows:
(1) CNN: A one-dimensional single-scale CNN composed of two convolutional layers with a kernel size of 3, the same as the convolutional block structure of the proposed model[25]. The model parameters are detailed in Table 3.
(2) SVM: A generalized linear classifier that classifies data by supervised learning[26]. Before the emergence of deep learning, SVM was widely regarded as one of the most successful machine learning algorithms. This paper uses a multi-class SVM with hyperparameters C=5 and gamma=0.05.
(3) Resnet: A one-dimensional residual neural network[27] with four convolutional blocks of the same structure as the proposed model, stacked on a single scale.
(4) MSRNet: The model proposed in this paper with the attention unit removed.
To verify the proposed model and method, we conducted experiments on a computer server with an NVIDIA 4090 GPU. The fault diagnosis model and the proposed MF1-FedAvg approach are implemented in Python with PyTorch.
Table 4 lists the specific parameters of our proposed model. The hyperparameters employed for the training process are the same across all models. The reported results are computed as the mean and standard deviation of five repeated experiments. Table 5 lists the hyperparameters used for the experiments in this paper.
The results, as depicted in Fig. 6, indicate that MSRANet outperforms the other models in terms of diagnostic accuracy. SVM performs poorly on the multi-class task, which we attribute to the limited suitability of support vector machines for such problems. Additionally, the performance of SVM is hindered by the need for matrix operations, leading to increased memory and time consumption, especially with larger datasets.
Moreover, MSRNet's multiscale feature extraction improves accuracy compared to Resnet. These findings indicate that multiscale feature extraction is superior to single-scale extraction for fault diagnosis.
3.2.2 Evaluation of the Proposed Attention Module
An attention unit is proposed herein to improve the diagnostic performance of multiscale residual networks. It automatically accentuates the characteristics within each channel that are strongly correlated with fault features.
To establish the effectiveness of the attention unit, we conducted a comparative analysis between the model variants with and without the attention unit. The comparison outcomes are shown in Fig. 7: the CNN model with the proposed attention unit attains higher diagnostic accuracy than the plain CNN model on both datasets. As shown in Fig. 7(a), the MSRANet model with the attention unit achieves an accuracy 1.5 percentage points higher than the MSRANet model without the attention unit. Additionally, Dataset 2 has a higher noise level than Dataset 1. To further examine the effectiveness of the attention unit, we evaluated the performance of the MSRANet model with and without the
attention unit using Dataset 2. As shown in Fig. 7(b), the MSRANet model with the attention unit outperformed the MSRANet model without the attention unit by 5 percentage points in accuracy.
These results indicate that integrating the proposed attention module into the model architecture improves its fault diagnosis accuracy on both datasets, with the larger gain on the noisier Dataset 2.
The confusion matrices in Fig. 8 correspond to MSRNet and MSRANet from left to right. The proposed MSRANet achieves the highest diagnostic accuracy. Notably, the model incorporating the attention unit performed better across all fault categories on both datasets, indicating that the attention unit is a promising means of improving classification accuracy. These outcomes suggest that the attention unit effectively identifies salient features that are instrumental in detecting faults, thereby leading to improved performance.
Based on the above findings, the attention mechanism in this work represents a useful contribution toward fault detection.
Table 3. Configuration of CNN model parameters.
Layer          Filter   Kernel_size   Strides   Padding   Activation
Conv1D         64       3             1         valid     Relu
MaxPooling1D   -        3             1         valid     -
MaxPooling1D   -        3             1         valid     -
Dense          -        -             -         -         softmax
Table 4. Configuration of MSRANet model parameters.
Layer               Filter    Kernel_size          Strides   Padding   Activation
MaxPooling1D        -         1                    1         valid     -
Conv1D block(4)     64,128    1,3; 1,5; 1,7; 1,9   1         valid     Relu
MaxPooling1D(4)     -         1,3; 1,5; 1,7; 1,9   1         valid     -
Self-Attention(4)   -         -                    -         -         softmax
Concatenate(4)      -         -                    -         -         -
Conv1D              256       1                    1         valid     Relu
AvgPooling1D        -         1                    1         valid     -
Flatten             -         -                    -         -         -
Dense               -         -                    -         -         softmax
Table 5. Hyperparameters used in this paper.
Parameter        Value
Learning rate    0.02
N_round          100
N_input (CWRU)   1024
N_input (JNU)    1024
Batch size       16
Fig. 6. Results of accuracy comparison of different models on two real bearing datasets.
Fig. 7. Comparison of fault diagnosis performance of the proposed attention unit on CNN and MSRANet.
Fig. 8. The confusion matrices of the MSRNet and MSRANet models on the two real bearing failure datasets.
Table 6. The diagnostic accuracy of different methods on the two datasets.
Methods   Dataset 1 accuracy   Dataset 2 accuracy
SVM       51.0%                76.0%
CNN       93.5%                80.0%
Resnet    97.5%                85.0%
MSRNet    98.1%                92.5%
MSRANet   99.6%                97.5%
In addition, the experimental results for both datasets are collected in Table 6, which shows that the proposed model achieves the highest accuracy on both datasets. This indicates that the proposed model generalizes well across the two datasets.
3.2.3 Performance of the proposed model on federated learning
The preceding experiments show that the proposed model performs well in detecting bearing faults. To evaluate the performance of the proposed model in the context of federated learning, the following experiments were conducted under a set of predefined assumptions:
(1) there is no communication or data sharing between different clients;
(2) all clients have identical fault diagnosis tasks and similar fault labels;
(3) each client's data are limited, so that developing an effective fault diagnosis model from local data alone is impractical;
(4) a fault diagnosis model is shared between the central server and each client.
In this study, different learning schemes are implemented to showcase the efficacy of the proposed model. In particular, we applied the following methods, with consistent experimental settings across all of them.
(1) Baseline: We further evaluated an extreme scenario in which communication between different clients is disregarded, in contrast to the traditionally localized model training approach. Specifically, each client uses its local training data to train an individual fault diagnosis model and then uses this locally trained model for testing.
(2) Centralized: This approach adheres to the traditional centralized machine learning paradigm. It presupposes that all data from the different clients can be accessed by the server without privacy constraints. In essence, the datasets from the clients are merged into a single training dataset, which is used directly to develop data-driven fault diagnosis models. This data-sharing scheme is often regarded as an upper bound on model performance for the available training data.
(3) FedAvg: FedAvg is a commonly employed federated learning algorithm that aggregates model parameters through weighted averaging. The fundamental concept of FedAvg involves uploading the parameters of the local models to the server, where the server computes the average of all model parameters and subsequently broadcasts this average back to all local devices. This process can be iterated multiple times until convergence.
The proposed model outperforms the CNN in the baseline approach as well as in the centralized and federated learning schemes, as demonstrated by the accuracy comparison in
Fig. 9. Moreover, the proposed model exhibits the smallest accuracy difference between the federated and centralized learning schemes on both datasets, indicating its potential for approaching the diagnostic performance of the centralized learning scheme.
The classification results of the CNN, MSRNet, and MSRANet under FedAvg for each class on both datasets are shown in the confusion matrices in Fig. 10, further supporting the advantage of the proposed model in fault diagnosis. Overall, this study highlights the effectiveness of the proposed model in federated learning, which has implications for improving the accuracy and efficiency of fault diagnosis.
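As a hedged illustration of the FedAvg scheme compared above, the following minimal sketch averages client parameters on the server. It assumes, following the usual formulation of FedAvg, that each client is weighted by its number of local training samples, which this section does not state explicitly. The function name `fedavg` and the flat-list parameter layout are hypothetical and chosen only for illustration.

```python
from typing import List, Sequence


def fedavg(client_params: Sequence[Sequence[float]],
           client_sizes: Sequence[int]) -> List[float]:
    """Average flat parameter vectors, weighting clients by sample count.

    Assumption: weights are proportional to local dataset size.
    """
    if not client_params or len(client_params) != len(client_sizes):
        raise ValueError("need one sample count per client")
    total = sum(client_sizes)
    if total <= 0:
        raise ValueError("total sample count must be positive")
    n_params = len(client_params[0])
    averaged = [0.0] * n_params
    for params, size in zip(client_params, client_sizes):
        if len(params) != n_params:
            raise ValueError("all clients must share one model shape")
        share = size / total  # this client's share of the global model
        for j, value in enumerate(params):
            averaged[j] += share * value
    return averaged


if __name__ == "__main__":
    # Three hypothetical clients, each holding a two-parameter model.
    clients = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
    print(fedavg(clients, [100, 100, 200]))
```

With equal sample counts this reduces to a plain arithmetic mean of the client models, which is the behavior described for scheme (3).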
Fig. 9. The test accuracy of two data sets under different methods.
Fig. 10. The confusion matrix of the federated MSRANet models under two real datasets.
3.3 Performance comparison of FedAvg and MF1-FedAvg
In this subsection, the performance of the proposed MF1-FedAvg is compared with that of the traditional FedAvg on Dataset 1 and Dataset 2. Fig. 11 compares the diagnostic accuracy of the two algorithms on the two datasets. The model used by each local client is the proposed MSRANet, and the experiments on both datasets were set up with three clients.
The present study investigates the performance of the proposed MF1-FedAvg algorithm in fault diagnosis using two different datasets, namely the CWRU and the Jiangnan University datasets. The CWRU dataset, comprising clean mechanical signals, was examined first, and the achieved test accuracy is presented in Fig. 11(a). The MF1-FedAvg algorithm, which assigns the weight share of each client in the aggregation model
based on the client's MF1 score, outperformed the conventional FedAvg algorithm in diagnostic accuracy. This finding suggests that modifying the weighting strategy of the FedAvg algorithm to allocate a larger weight to clients with high-quality data can effectively enhance the fault diagnosis accuracy within the FedAvg framework. The proposed method attained a reasonably high diagnostic accuracy owing to the relatively clean nature of the CWRU dataset.
Fig. 11. The comparison of testing accuracy between two datasets under FedAvg and the proposed method.
Fig. 12. Comparison of accuracy trends during iterative training on two datasets.
Next, the performance of the proposed method was evaluated using the Jiangnan University dataset, which contains noisier signals than the CWRU dataset. Fig. 11(b) shows the experimental results of the proposed method, indicating that the MF1-FedAvg algorithm still outperforms FedAvg in terms of diagnostic accuracy despite the noise. However, the achieved accuracy is lower than that on the CWRU dataset, which is expected given the increased noise level of the Jiangnan University dataset.
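The weighting idea compared in this subsection can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes that each client's aggregation weight is its Macro-F1 score divided by the sum of all clients' scores, which is one plausible reading of assigning weights based on Macro-F1 scores. The function name `mf1_weighted_average`, the flat-list parameter layout, and the equal-weight fallback are hypothetical.

```python
from typing import List, Sequence


def mf1_weighted_average(client_params: Sequence[Sequence[float]],
                         macro_f1: Sequence[float]) -> List[float]:
    """Aggregate client parameters with weights proportional to Macro-F1."""
    if not client_params or len(client_params) != len(macro_f1):
        raise ValueError("need one Macro-F1 score per client")
    if any(score < 0 for score in macro_f1):
        raise ValueError("Macro-F1 scores must be non-negative")
    total = sum(macro_f1)
    if total > 0:
        # Assumed normalization: weights sum to 1 across clients.
        weights = [score / total for score in macro_f1]
    else:
        # Assumed fallback: if every score is zero, weight clients equally.
        weights = [1.0 / len(macro_f1)] * len(macro_f1)
    n_params = len(client_params[0])
    aggregated = [0.0] * n_params
    for params, weight in zip(client_params, weights):
        if len(params) != n_params:
            raise ValueError("all clients must share one model shape")
        for j, value in enumerate(params):
            aggregated[j] += weight * value
    return aggregated


if __name__ == "__main__":
    # Three hypothetical clients with one-parameter models.
    print(mf1_weighted_average([[1.0], [2.0], [4.0]], [0.9, 0.6, 0.5]))
```

With three clients whose Macro-F1 scores are 0.9, 0.6 and 0.5, the normalized weights under this assumption are 0.45, 0.30 and 0.25, so the client with the best local model contributes most to the aggregated model.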
Fig. 13. Effect of sample size on model performance for different cases of two data sets.
Based on the findings presented in Fig. 12, the MF1-FedAvg algorithm requires a noticeably smaller number of iterations for convergence in comparison to
the traditional FedAvg algorithm on the two real datasets examined. Specifically, the number of iterations is reduced by approximately 15% with the MF1-FedAvg algorithm relative to traditional FedAvg. This reduction can be attributed to the modified weighting strategy. The MF1-FedAvg algorithm employs a weighting strategy based on the multi-class F1 score, which assigns higher weights to clients with higher-quality data during each round of federated learning. Consequently, the model obtained from each aggregation is of higher quality, which reduces the number of iterations required to reach convergence in federated learning.
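For readers unfamiliar with the multi-class F1 score used for weighting, the sketch below computes a Macro-F1 value as the unweighted mean of the per-class F1 scores, which is the standard definition. The function name `macro_f1` and the convention of scoring a class that appears in neither the labels nor the predictions as 0 are assumptions made for this illustration.

```python
from typing import Sequence


def macro_f1(y_true: Sequence[int], y_pred: Sequence[int],
             n_classes: int) -> float:
    """Return the unweighted mean of per-class F1 scores."""
    if len(y_true) != len(y_pred):
        raise ValueError("labels and predictions must have equal length")
    if n_classes <= 0:
        raise ValueError("n_classes must be positive")
    f1_scores = []
    for c in range(n_classes):
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); assumed to be 0 when undefined.
        f1_scores.append(2 * tp / denom if denom else 0.0)
    return sum(f1_scores) / n_classes


if __name__ == "__main__":
    labels = [0, 0, 1, 1, 2, 2]
    predictions = [0, 1, 1, 1, 2, 0]
    print(macro_f1(labels, predictions, n_classes=3))
```

Because every class counts equally in the mean, a client that misclassifies one fault type badly receives a visibly lower score even if its overall accuracy is high.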
3.4 Performance evaluation of MF1-FedAvg algorithm under different parameters
In this subsection, we investigate experimentally the influence of the sample dimension (the length of each input sample) on accuracy for the two real bearing fault datasets. The experimental findings, illustrated in Fig. 13, show that the classification accuracy is notably reduced when the samples of a fault category have a small dimension. This decrease is attributed to the insufficient fault features contained in a sample when its dimension is limited. Conversely, as the sample dimension is increased, the classification accuracy rises progressively.
Fig. 14. MSRANet model visualization results of dataset 1
Nevertheless, beyond a certain point, the effect of the sample dimension on the classification accuracy is minimal. As depicted in Fig. 13, increasing the sample dimension from 1024 to 2048 yields only marginal improvements in accuracy. However, such an increase significantly raises the computational complexity of the federated learning process and slows the algorithm's convergence. Therefore, the sample dimensions of the datasets in this paper are all set to 1024.
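To make the notion of sample dimension concrete, the sketch below cuts a one-dimensional vibration record into fixed-length samples. It assumes non-overlapping windows by default; the function name `segment_signal` and the optional `stride` parameter are hypothetical, and the paper does not state how its samples were actually cut.

```python
from typing import List, Optional, Sequence


def segment_signal(signal: Sequence[float], length: int = 1024,
                   stride: Optional[int] = None) -> List[List[float]]:
    """Cut a 1-D signal into windows of `length` points.

    `stride` defaults to `length` (non-overlapping windows, an assumption).
    Trailing points that do not fill a whole window are discarded.
    """
    if length <= 0:
        raise ValueError("length must be positive")
    step = length if stride is None else stride
    if step <= 0:
        raise ValueError("stride must be positive")
    last_start = len(signal) - length
    return [list(signal[start:start + length])
            for start in range(0, last_start + 1, step)]


if __name__ == "__main__":
    record = [0.0] * 4096  # placeholder for a measured vibration record
    for dim in (512, 1024, 2048):
        print(dim, len(segment_signal(record, length=dim)))
```

Under these assumptions, doubling the length from 1024 to 2048 halves the number of samples obtained from a record while doubling the input size of each sample, which is in line with the higher computational cost noted above.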
3.5 Visualizations of Learned Features
To further demonstrate the feature learning ability of the model for bearing fault signals, the t-SNE technique [28] is used to visualize the features learned by the model. We focus on the high-level data representations in the first fully connected layer of the network. Fig. 14 shows the feature map obtained by t-SNE for the CWRU dataset. It shows that samples of each fault type are clustered together by the proposed model, which further supports that the model performs well in fault classification and identification.
The fault diagnosis confusion matrix and t-SNE visualization results for dataset 2 are shown in Fig. 15. The experimental results in Fig. 15 indicate that the MSRANet model retains good fault classification and diagnosis ability on dataset 2, which supports that the proposed MSRANet model generalizes well. Therefore, the model has reference significance for the fault diagnosis of rolling bearings.
Fig. 15. MSRANet model visualization results of dataset 2
4. Conclusions
In this paper, we introduce a novel federated learning algorithm, denoted as MF1-FedAvg, built upon the established Federated Averaging (FedAvg) algorithm. The proposed approach aims to mitigate issues associated with low-quality client data in traditional federated learning methodologies. This is achieved through the integration of a weighting strategy that incorporates multi-class F1 scores. Additionally, we present a multiscale attention network model, denoted as MSRANet, which enables local clients to identify and emphasize relevant fault features. This is accomplished by extracting fault data at various scales and utilizing an attention module to highlight key features.
To assess the effectiveness of the proposed method, experiments were conducted on two bearing fault datasets. The performance of the MSRANet model was compared against conventional deep learning models, revealing superior diagnostic accuracy. Furthermore, we compared the performance of MF1-FedAvg with traditional FedAvg, demonstrating an improvement in fault classification accuracy with the former. Through the incorporation of weights into the federated averaging algorithm, our proposed method effectively enhances the performance of local models while maintaining performance levels similar to those of centralized models.
This study presents a promising solution for identifying and classifying faults in bearing devices, validating the efficacy of federated learning in fault identification. However, the proposed method has limitations. Similar to the FedAvg algorithm, our approach is primarily designed for data pertaining to the same type of bearings and may not be suitable for experimental data involving different types of bearings. Future research should carefully address the challenge of non-independent data distribution, with a focus on transfer learning, to overcome these limitations.
Acknowledgments
The research work is supported by the National Natural Science Foundation of China, grant number 62001262, and the Nature Science Foundation of Shandong Province, grant number ZR2020QF008.
Nomenclature
M_all : Global fault diagnosis model
M_fed : Global Federated Learning fault diagnosis model
C : Channel
MLP : Multilayer perceptron
r : Reduction rate
F : Feature map
b : Batch size
α : Learning rate
S_t : Randomly chosen subset of clients
E : Epoch
w : Model parameters
References
[1] H. T. Shi, L. Guo, S. Tan, X. T. Bai and J. Sun, Rolling Bearing Initial Fault Detection Using Long Short-Term Memory Recurrent Network, IEEE Access, 7 (2019) 171559-171569.
[2] J. Li, X. Li and D. He, A Directed Acyclic Graph Network Combined With CNN and LSTM for Remaining Useful Life Prediction, IEEE Access, 7 (2019) 75464-75475.
[3] Q. Liu and C. Huang, A Fault Diagnosis Method Based on Transfer Convolutional Neural Networks, IEEE Access, 7 (2019) 171423-171430.
[4] K. Bonawitz, F. Salehi, J. Konečný, B. Mcmahan and M. Gruteser, Federated Learning with Autotuned Communication-Efficient Secure Aggregation, 2019 53rd Asilomar Conference on Signals, Systems, and Computers, (2019).
[5] T. Han, D. Jiang, Q. Zhao, L. Wang and K. Yin, Comparison of random forest, artificial neural networks and support vector machine for intelligent diagnosis of rotating machinery, Transactions of the Institute of Measurement and Control, 40 (8) (2017) 2681-2693.
[6] T. Han, D. Jiang, Y. Sun, N. Wang and Y. Yang, Intelligent Fault Diagnosis Method for Rotating Machinery via Dictionary Learning and Sparse Representation-Based Classification, Measurement, 118 (2018) 181-193.
[7] C. Liu, D. Jiang and W. Yang, Global geometric similarity scheme for feature selection in fault diagnosis, Expert Systems with Applications, 41 (8) (2014) 3585-3595.
[8] W. Zhang, X. Li, X. Jia, H. Ma and X. Li, Machinery fault diagnosis with imbalanced data using deep generative adversarial networks, Measurement, 152 (2019) 107377.
[9] X. Li, W. Zhang, H. Ma, Z. Luo and X. Li, Data alignments in machinery remaining useful life prediction using deep adversarial neural networks, Knowledge-Based Systems, 197 (2020) 105843.
[10] F. Jia, Y. Lei, N. Lu and S. Xing, Deep normalized convolutional neural network for imbalanced fault classification of machinery and its understanding via visualization, Mechanical Systems and Signal Processing, 110 (2018) 349-367.
[11] H. Liu, J. Zhou, Y. Zheng, W. Jiang and Y. Zhang, Fault diagnosis of rolling bearings with recurrent neural network-based autoencoders, ISA Transactions, 77 (2018) 167-178.
[12] M. Zhao, S. Zhong, X. Fu, B. Tang and M. Pecht, Deep Residual Shrinkage Networks for Fault Diagnosis, IEEE Transactions on Industrial Informatics, 16 (7) (2020) 4681-4690.
[13] Z. Zhang, J. Liu, L. Huang, et al., A bearing fault diagnosis method based on semi-supervised and transfer learning, Journal of Beijing University of Aeronautics and Astronautics, 45 (11) (2019) 2291-2300.
[14] H. B. Mcmahan, E. Moore, D. Ramage, S. Hampson and B. Arcas, Communication-Efficient Learning of Deep Networks from Decentralized Data, Proceedings of the 20th International Conference on Artificial Intelligence and Statistics, (2017).
[15] Z. Li, Z. Li, Y. Li, J. Tao, Q. Mao and X. Zhang, An Intelligent Diagnosis Method for Machine Fault Based on Federated Learning, Applied Sciences, 11 (24) (2021) 12117.
[16] W. Zhang, X. Li, H. Ma, Z. Luo and X. Li, Federated learning for machinery fault diagnosis with dynamic validation and self-supervision, Knowledge-Based Systems, 213 (1) (2021) 106679.
[17] Q. Wang, Q. Li, K. Wang, H. Wang and P. Zeng, Efficient federated learning for fault diagnosis in industrial cloud-edge computing, Computing, 103 (10) (2021) 2319-2337.
[18] D. Geng, H. He, X. Lan and C. Liu, Bearing fault diagnosis based on improved federated learning algorithm, Computing, 104 (1) (2022) 1-19.
[19] S. H. Gao, M. M. Cheng, K. Zhao, X. Y. Zhang, M. H. Yang and P. Torr, Res2Net: A New Multi-Scale Backbone Architecture, IEEE Transactions on Pattern Analysis and Machine Intelligence, 43 (2) (2021) 652-662.
[20] J. Hu, L. Shen and G. Sun, Squeeze-and-excitation networks, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, (2018) 7132-7141.
[21] H. B. Mcmahan, E. Moore, D. Ramage, S. Hampson and B. Arcas, Communication-Efficient Learning of Deep Networks from Decentralized Data, Proceedings of the 20th International Conference on Artificial Intelligence and Statistics, (2017).
[22] K. Bonawitz, F. Salehi, J. Konečný, B. Mcmahan and M. Gruteser, Federated Learning with Autotuned Communication-Efficient Secure Aggregation, 2019 53rd Asilomar Conference on Signals, Systems, and Computers, (2019).
[23] W. A. Smith and R. B. Randall, Rolling element bearing diagnostics using the Case Western Reserve University data: A benchmark study, Mechanical Systems and Signal Processing, 64-65 (2015) 100-131.
[24] K. Li, School of Mechanical Engineering, Jiangnan University, (2019). http://madnet.org.
[25] K. He, X. Zhang, S. Ren and J. Sun, Deep residual learning for image recognition, IEEE, (2016).
[26] L. Van der Maaten and G. Hinton, Visualizing data using t-SNE, Journal of Machine Learning Research, 9 (11) (2008).

Author information

Xiuyan Liu received the Ph.D. degree in computer application technology from the Ocean University of China, Qingdao, China, in 2017. She is currently an Associate Professor with the School of Information and Control Engineering, Qingdao University of Technology, Qingdao. Her current research interests include deep learning, mechanical fault diagnosis, and advanced signal processing.

Chunqiu Pang is a Master's student at the School of Information and Control Engineering, Qingdao University of Technology, Qingdao, China. His current research interests include deep learning and federated learning, and their applications in bearing fault diagnosis.
SUBMISSION AGREEMENT
The Korean Society of Mechanical Engineers
#702 KSTC (New Bldg.), 22, 7-gil, Teheran-ro, Gangnam-gu, Seoul 06130, Korea
Tel: +82-2-501-3605, Fax: +82-2-501-3649, E-mail: [email protected]
- Please fill out this Submission Agreement form and upload it to EM System when you submit your manuscript to JMST.
- You may add lines if it is required.
- Incomplete forms will be rejected.
Title of Manuscript:
             First name   Last name   Affiliated institute               Signature
1st author   Xiuyan       Liu         Qingdao University of Technology
2nd author   Chunqiu      Pang        Qingdao University of Technology
3rd author   Tingting     Guo         Qingdao University of Technology
4th author   Donglin      He          Qingdao University of Technology
Author(s) agree to submit the above manuscript to Journal of Mechanical Science and Technology for review.
Date: 2024/1/21
Corresponding Author's Signature: ___________________________________