Article
Enhancing Edge-Assisted Federated Learning with
Asynchronous Aggregation and Cluster Pairing
Xiaobao Sha 1,2 , Wenjian Sun 1, *, Xiang Liu 1 , Yang Luo 1,2 and Chunbo Luo 1,2, *
1 Yangtze Delta Region Institute (Huzhou), University of Electronic Science and Technology of China,
Huzhou 313001, China; [email protected] (X.S.); [email protected] (X.L.);
[email protected] (Y.L.)
2 School of Information and Communication Engineering, University of Electronic Science and Technology of
China, Chengdu 611731, China
* Correspondence: [email protected] (W.S.); [email protected] (C.L.)
Abstract: Federated learning (FL) is widely regarded as highly promising because it enables the
collaborative training of high-performance machine learning models among a large number of clients
while preserving data privacy by keeping the data local. However, many existing FL frameworks
have a two-layered architecture, thus requiring the frequent exchange of large-scale model parameters
between clients and remote cloud servers over often unstable networks and resulting in significant
communication overhead and latency. To address this issue, we propose to introduce edge servers
between the clients and the cloud server to assist in aggregating local models, thus combining
asynchronous client–edge model aggregation with synchronous edge–cloud model aggregation. By
leveraging the clients’ idle time to accelerate training, the proposed framework can achieve faster
convergence and reduce the amount of communication traffic. To make full use of the grouping
properties inherent in three-layer FL, we propose a similarity matching strategy between edges
and clients, thus improving the effect of asynchronous training. We further propose to introduce
model-contrastive learning into the loss function and personalize the clients’ local models to ad-
dress the potential learning issues resulting from asynchronous local training in order to further
improve the convergence speed. Extensive experiments confirm that our method exhibits significant
improvements in model accuracy and convergence speed when compared with other state-of-the-art
federated learning architectures.

Keywords: federated learning; edge computing; Non-IID data
1. Introduction
Federated learning (FL) is a distributed machine learning framework for a large number of clients to collaboratively train machine learning models without sharing local data [1]. It is an effective data privacy protection method in the current artificial intelligence environment that prevents the exposure of raw data [2], which has shown extensive applications in many fields, such as healthcare, the intelligent Internet of Things, and finance services [3].
Traditional federated learning is typically structured with a cloud server connecting
multiple clients, thus requiring the exchange of a large number of model parameters during
multiple update iterations [1]. However, as the number of participating clients increases, the upstream bandwidth load in the network also increases, thus leading to network congestion and a decrease in the speed of upstream communication [4]. Such communication inefficiency directly affects the overall training performance of the machine learning model [4]. In order to alleviate the huge communication pressure towards the cloud server, existing studies have exploited novel architectures such as adding an edge server layer between clients and the cloud server [5]. The edge server acts as an intermediate in the system by grouping the clients and fusing the machine learning models, which is referred to as edge-assisted FL. By preaggregating local models at the edge layer, the amount of data
updated by the model can be reduced by an order of magnitude [6]. With the same total bandwidth, fewer clients occupy the channel simultaneously, so more bandwidth is available for each client's upstream link, thereby significantly reducing the time and traffic overhead of transmitting the model parameters from the clients to the server [7]. However, edge-assisted federated learning is constrained by system heterogeneity, where the clients vary in their computing and communication capabilities, and by data heterogeneity, where different clients hold diverse local data. Furthermore, the communication bottleneck between the cloud and the clients is still a pressing issue [6]. For example, during synchronous global model aggregation, the clients remain idle, so a significant amount of waiting time is wasted. Conversely, when all clients perform asynchronous updates, a rather serious staleness effect occurs if the clients' data are highly non-independently and identically distributed (Non-IID). Therefore, there is a need to develop a synchronous–asynchronous hybrid update strategy for edge-assisted federated learning.
In the context of federated learning, client devices are often resource-constrained in
wireless networks [8]. Therefore, the system needs to address a multiobjective problem,
such as achieving high global model accuracy in the shortest possible time [9]. However,
the data owned by each client are Non-IID and influenced by client data drift [10], which
significantly slows down the convergence of the global model [11]. Current studies have
attempted to solve the FL challenge caused by Non-IID data. FedProx [12] introduces a
proximal term in the loss function to prevent local updates from deviating too far from
the initial global model. SCAFFOLD [13] adds a correction term in the local model update
formula to overcome gradient disparities. The MOON algorithm [14] draws inspiration
from contrastive learning and introduces a model-contrastive loss to ensure the closest
possible correspondence between the parameters of the global and local models. These
methods effectively alleviate the negative impact of Non-IID data. The three-layer archi-
tecture introduces data heterogeneity among the edges, which will be further amplified
by the cloud–edge–client model aggregation, thus leading to a decrease in the overall
model performance.
Clustered federated learning (CFL) investigates the potential relationships between
the data of each client [15] to address the Non-IID problem [16]. In CFL, clients are clustered
based on their similarity, and federated optimization is performed within each cluster. This
approach greatly improves the performance of the model within each cluster and enhances
efficiency [17]. Fundamentally, CFL is closely related to multitask learning: if the tasks within each cluster are similar, they belong to the same learning task. Information exchange between clusters is limited, since it only occurs during the aggregation process with the cloud server. Although multitask learning can train models that are highly adapted to each task within a cluster [18], it is difficult to train a global model that can perfectly match all tasks.
The aforementioned works have demonstrated a clear correlation between edge–client
pairing and CFL. The edge layer in the three-tier FL architecture is paired with a subset of
clients to form a cluster. Multiple clusters can then collectively train a global model instead
of each cluster training its own separate model. The key challenge is how to fully exploit the rich and heterogeneous information contained in each cluster to accelerate the convergence of the global model.
Therefore, we propose EEFL, an enhanced edge-assisted federated learning scheme with asynchronous aggregation and cluster pairing over wireless networks, to address the slow convergence and low accuracy of the global model caused by Non-IID training data. In summary, this
paper makes the following contributions:
• We propose a novel semi-asynchronous training and aggregation strategy to sig-
nificantly improve the convergence speed at the edge for the three-layer federated
learning architecture. In each training round, the clients in the coverage area of the
same edge server are asynchronously trained with the subglobal model obtained after
the edge aggregation. This strategy effectively trades the waiting time with local
model training to accelerate the convergence.
2. Related Work
In this section, we review the existing research on federated learning and summarize
the differences between our work and the existing work. The survey focuses on solution strategies for edge-assisted FL, Non-IID data distributions, and cluster pairing in wireless network scenarios.
2.1. Edge-Assisted FL
In the context of wireless networks, the clients of federated learning are constrained
by their geographical locations, thus resulting in dispersed data that require frequent
aggregations with the server in order to achieve model convergence [19]. When a large number of devices communicate directly with the cloud through a wide area network, congestion in the backbone network is exacerbated, thus leading to significant communication
delays [20]. In order to reduce communication costs and alleviate latency, two approaches
have been proposed. The first one is to reduce the amount of data transmitted in each
iteration [21], for example, via model compression techniques such as network pruning [22] and weight quantization [23]. The other is to decrease the communication frequency by selecting client participation in training based on accuracy, data quality, or model update importance [24,25].
Another stream of research is to introduce edge servers as a relay layer in order to mit-
igate the transmission overhead in the core network [5]. This is achieved by leveraging the
higher efficiency of client–edge communication, thus allowing for partial model aggrega-
tion. In wireless networks, where the distance between clients and edge servers is generally
closer compared to the cloud server, frequent parameter exchanges between edge servers
and clients can effectively accelerate the training process. Building upon this, the FedEdge
framework [26] further improves communication frequency by asynchronously updating
the edge servers with global model parameters received from the cloud server. Neverthe-
less, directly assigning the received asynchronous global model to the client at the edge
does not fully exploit the value of the asynchronous model due to the staleness effect.
In contrast, our EEFL applies an exponential moving average (EMA) to the global model during training, which aims to preserve the historical training information.
(1) Improving local training: FedProx [12] introduces a proximal term in the loss function to prevent local updates from deviating too far from the initial global model. SCAFFOLD [13] adds a corrective term to the update formula of the local model, thus overcoming gradient differences. Meanwhile, the MOON algorithm [14] draws
inspiration from contrastive learning and introduces a model-contrastive loss, thereby
maximizing the similarity between the parameter representations of the global and local
models. (2) Improving the aggregation method between the server and clients: FedMA [27]
utilizes Bayesian nonparametric methods, which identify matching sets of convolutional
filters and average them into the global convolutional filters. FedAvgM [28], on the other
hand, introduces a momentum mechanism when updating the global model on the server.
(3) Personalized federated learning: FedRep [29] divides the model into a base layer and a
personalized layer, thus proposing to use the base layer to learn a dimensionality-reduced
representation of the global feature representation between data, while the personalized
layer serves as a unique local head for each client to achieve personalization. In our
EEFL framework, we adopt the idea of personalization and improve the design of the loss
function for local training. Compared to the MOON algorithm, our approach, by splitting
the model into personalized and shared layers, not only reduces the divergence of model
weights during aggregation but also further enhances the local model’s ability to extract
personalized features.
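As a concrete illustration of such a split, the sketch below separates a small PyTorch CNN into shared base layers (uploaded for aggregation) and a personalized head (kept local). The architecture, layer names, and split point are illustrative assumptions, not the exact network used in this paper.

```python
# Illustrative base/personalized split in the FedRep style; the architecture
# and split point are assumptions, not the exact network used in this paper.
import torch.nn as nn

class SplitCNN(nn.Module):
    def __init__(self, num_classes: int = 10):
        super().__init__()
        # Shared ("base") layers: these parameters are aggregated at the
        # edge/cloud servers.
        self.base = nn.Sequential(
            nn.Conv2d(1, 32, kernel_size=5, padding=2), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, kernel_size=5, padding=2), nn.ReLU(), nn.MaxPool2d(2),
            nn.Flatten(),
            nn.Linear(64 * 7 * 7, 128), nn.ReLU(),
        )
        # Personalized layer: each client keeps its own classification head.
        self.head = nn.Linear(128, num_classes)

    def forward(self, x):
        z = self.base(x)            # shared representation
        return self.head(z), z      # logits and representation

def shared_state(model: SplitCNN) -> dict:
    """Only the base parameters are uploaded for aggregation."""
    return {k: v for k, v in model.state_dict().items() if k.startswith("base.")}
```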
$$\min_{w} f(w) = \sum_{j=1}^{N} \sum_{i \in C_j} \frac{|D_i|}{|D|} f_i(w) \qquad (1)$$
where $N$ is the total number of edges. We denote $C_j$ as the client set of the $j$th edge, and the total number of clients can be defined as $K = \sum_{j=1}^{N} |C_j|$. The size of each local training dataset is $|D_i|$. The local loss function of the $i$th client is
$$f_i(w) = \frac{1}{|D_i|} \sum_{j=1}^{|D_i|} L(w; x_j, y_j) \qquad (2)$$
where $(x_j, y_j)$ denotes the input and label of a training sample, and $L(w; x_j, y_j)$ is the loss function measuring the difference between the predicted label and the true label (usually the cross-entropy loss).
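For concreteness, the sketch below evaluates the objective of Equations (1) and (2) as a data-size-weighted sum of per-client cross-entropy losses. The per-edge bookkeeping (a list of client datasets per edge) is an illustrative assumption.

```python
# Sketch of Equations (1)-(2): the global objective is the data-size-weighted
# sum of per-client cross-entropy losses; dataset handling is illustrative.
import torch
import torch.nn.functional as F

def local_loss(model, dataset):
    """f_i(w): average cross-entropy over one client's local samples."""
    xs = torch.stack([x for x, _ in dataset])
    ys = torch.tensor([y for _, y in dataset])
    return F.cross_entropy(model(xs), ys)     # already averaged over |D_i|

def global_objective(model, clients_per_edge):
    """f(w) = sum_j sum_{i in C_j} (|D_i| / |D|) * f_i(w)."""
    total = sum(len(ds) for edge in clients_per_edge for ds in edge)  # |D|
    obj = 0.0
    for edge_clients in clients_per_edge:     # j = 1, ..., N edges
        for ds in edge_clients:               # i in C_j
            obj = obj + (len(ds) / total) * local_loss(model, ds)
    return obj
```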
We consider the cumulative number of communications with the cloud server as the
overall iteration count, which is denoted as T.
3.2. Motivation
Our ideas are derived from experiments conducted within the HierFAVG frame-
work [5] using the MNIST dataset [34] as an example. Within this framework, no specific
assignment or allocation between clients and edges is performed; instead, a random group-
ing approach is used. We consider the Non-IID data distribution scenario, where each
client possesses only 1–2 label distributions. We draw inspiration from clustered federated
learning, which aims to group clients with similar data distributions by exploiting the
similarity between local models. This strategy accelerates the convergence of the global
model within each group, thus resulting in a higher classification rate for the data labels
present in that group. Based upon this, we propose the following hypothesis: considering that the data distribution among the clients in one group is relatively diversified, and each group contains a substantial number of training samples from different classes, will the training process be expedited?
To validate our hypothesis, we built a three-layer federated learning system consisting
of one cloud server, four edge servers, and 20 client devices. Each client device only
possesses samples from 1–2 classes in the MNIST dataset. Three different aggregation strategies are compared (a sketch of the corresponding grouping logic follows the list):
• Edge-NIID: The data labels owned by clients under each edge are similar (i.e., each
edge only contains samples from 3–4 classes).
• Edge-IID: The data labels owned by clients under each edge are different (i.e., each
edge contains samples from 8–9 classes).
• Edge-random: The matching between edge servers and clients is performed entirely
at random, and the data distribution is random.
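The sketch below illustrates how the three groupings can be generated from the clients' class assignments; the class counts follow the setup above, while dataset loading and seeding are omitted as illustrative simplifications.

```python
# Sketch of the three edge-client groupings used in the pre-experiment;
# only the class-to-client and client-to-edge assignment logic is shown.
import random

NUM_CLIENTS, NUM_EDGES, NUM_CLASSES = 20, 4, 10
PER_EDGE = NUM_CLIENTS // NUM_EDGES

def client_classes():
    # Each client holds samples from two of the ten MNIST classes.
    return [sorted(random.sample(range(NUM_CLASSES), 2)) for _ in range(NUM_CLIENTS)]

def edge_niid(classes):
    # Similar clients under the same edge: sort clients by their classes and
    # cut into contiguous blocks, so each edge covers only a few classes.
    order = sorted(range(NUM_CLIENTS), key=lambda c: classes[c])
    return [order[e * PER_EDGE:(e + 1) * PER_EDGE] for e in range(NUM_EDGES)]

def edge_iid(classes):
    # Dissimilar clients under the same edge: deal the sorted clients out
    # round-robin, so each edge covers most classes.
    order = sorted(range(NUM_CLIENTS), key=lambda c: classes[c])
    return [order[e::NUM_EDGES] for e in range(NUM_EDGES)]

def edge_random():
    # Entirely random client-edge matching.
    order = random.sample(range(NUM_CLIENTS), NUM_CLIENTS)
    return [order[e * PER_EDGE:(e + 1) * PER_EDGE] for e in range(NUM_EDGES)]
```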
As shown in Figure 1, the global model converged slowest in the Edge-NIID scenario, which is clearly inferior to the other two settings. In three-layer federated
learning, the information that the client can interact with is largely derived from the client
information under its paired edge coverage. If this portion of information tends to be
consistent, it will not be conducive to training a comprehensive model. On the other
hand, the convergence speeds of the Edge-IID and Edge-random strategies are similar,
thus achieving high accuracy. However, this does not imply that the random matching
approach is optimal. The pre-experiments only consider the data distribution among clients
while ignoring many other aspects like the data amount and data quality. In the following
sections, we will extend this idea and explore a novel edge-assisted federated learning
scheme designed to ensure more efficient convergence.
Figure 1. Test accuracy versus communication rounds for the Edge-IID, Edge-random, and Edge-NIID strategies.
4. Method
In this section, we first propose a scheme to enhance edge-assisted federated learning, which we name EEFL. Then, we sequentially introduce three innovative aspects of the scheme, namely cross-layer asynchronous aggregation, data feature extraction, and an improved loss function.
Figure 2. Framework of EEFL. To leverage the idle time of clients during the communication between
edges and cloud server, in each round of training, the edge models obtained by edge aggregation are
used to train the clients asynchronously.
After the aggregation is completed, the updated global model $w_{u_j}^t$ is sent to each client for the next round of reaggregation.
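To make the schedule concrete, the highly simplified, single-process sketch below mirrors the round structure in Figure 2: several asynchronous client–edge aggregation steps are performed per synchronous edge–cloud round. The helper functions (local_train, aggregate) and the number of edge rounds are placeholders; in the actual system, client training overlaps with edge–cloud communication instead of running sequentially.

```python
# Simplified, single-process sketch of one EEFL round (Figure 2); helper
# functions are placeholders and the real system runs the client-edge phase
# asynchronously while edge-cloud communication is in flight.
def eefl_round(cloud_model, edges, clients_of, local_train, aggregate,
               edge_rounds=2):
    edge_models = []
    for e in edges:
        # Asynchronous client-edge phase: clients keep training on the latest
        # sub-global (edge) model instead of idling until the cloud replies.
        edge_model = cloud_model
        for _ in range(edge_rounds):
            local_models = [local_train(c, edge_model) for c in clients_of[e]]
            edge_model = aggregate(local_models)
        edge_models.append(edge_model)
    # Synchronous edge-cloud phase: the cloud waits for all edge models.
    return aggregate(edge_models)
```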
each group and enhance the overall efficiency. However, as evidenced in the experiments
shown in Figure 1, when the data distribution is scattered and each client only possesses
one or two labels, clustering similar clients into groups slows down the convergence speed
of the global model, even though the aggregated global model within each group achieves
higher recognition accuracy for that specific label. When similar clients are clustered into
one cluster and optimization is only performed on the models within the cluster, the variety
of data labels is limited, thus making it difficult to train a comprehensive global model.
Based on this premise, the clients were initially grouped based on the similarity of their local models. The goal was to ensure that the data distribution within each cluster encompassed as many characteristics of the entire dataset as possible (i.e., client distributions within clusters are dissimilar, while those between clusters are similar). We considered a
cluster as the coverage of an edge server. In conjunction with three-tier federated learning,
we designed the algorithm below to improve the accuracy of the final global model and
accelerate the convergence speed.
To measure the similarity of the clients and avoid excessive computational overhead,
we employed Singular Value Decomposition (SVD) on specific layers of the local model. The
singular values Σi obtained from the decomposition serve as the model’s features and are
reduced to a one-dimensional array. It is recommended to use a specific layer close to the
input layer for analysis. Subsequently, we calculate the cosine similarity between different
groups of clients based on the singular values Σi , which is written as
$$\mathrm{sim}(\Sigma_i, \Sigma_{i'}) = \frac{\langle \Sigma_i, \Sigma_{i'} \rangle}{\|\Sigma_i\| \cdot \|\Sigma_{i'}\|} \qquad (4)$$
and thereby, we can construct a cosine similarity matrix $S$. Using the similarity matrix $S$, we group the clients based on their similarity, thus dividing them into $K/N$ groups, denoted $arr$. Starting from the first client, we select $N$ clients for each group in descending order of similarity until all the groups are formed (ensuring that clients within each group have similar data distribution characteristics). Each edge server then randomly pairs with one client from each group, thus resulting in $N$ pairs (where the data distributions of clients within each pair are dissimilar). The specific approach is given in Algorithm 1.
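The sketch below illustrates the singular-value features, the cosine similarity of Equation (4), and the grouping-and-pairing step; the choice of layer, the seeding, and the tie-breaking are simplified relative to Algorithm 1.

```python
# Sketch of the SVD-based similarity features, the cosine similarity matrix of
# Equation (4), and the grouping/pairing step; simplified relative to Algorithm 1.
import numpy as np

def client_feature(layer_weight: np.ndarray) -> np.ndarray:
    # Singular values of one layer (close to the input) serve as a compact,
    # one-dimensional feature of the local model.
    w = layer_weight.reshape(layer_weight.shape[0], -1)
    return np.linalg.svd(w, compute_uv=False)

def cosine_similarity_matrix(features) -> np.ndarray:
    f = np.stack(features)
    f = f / np.linalg.norm(f, axis=1, keepdims=True)
    return f @ f.T                                    # Equation (4)

def group_and_pair(features, num_edges, seed=0):
    sim = cosine_similarity_matrix(features)
    unassigned = set(range(len(features)))            # K clients
    groups = []
    while unassigned:                                  # K/N groups of N clients
        anchor = min(unassigned)
        ranked = sorted(unassigned, key=lambda c: -sim[anchor, c])
        group = ranked[:num_edges]
        groups.append(group)
        unassigned -= set(group)
    # Each edge takes one client from every group, so the clients paired with
    # a given edge have dissimilar data distributions.
    rng = np.random.default_rng(seed)
    pairs = {e: [] for e in range(num_edges)}
    for group in groups:
        for e, c in enumerate(rng.permutation(group)):
            pairs[e].append(int(c))
    return groups, pairs
```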
(Figure: model-contrastive structure, in which the local models on the $t$-th and $(t-1)$-th rounds, $W_{local}^{T,t}$ and $W_{local}^{T,t-1}$, produce representations from the input, and the agreement between $z_g^t$ and $z_s^t$ is maximized.)
In Equation (5), $\mu$ and $\gamma$ represent the weights of the two contrastive losses, respectively. The first loss $l_{con1}$ represents the maximization of consistency between the representations of the shared layers in the local model and the global model.
where $z_g^t$ and $z_s^{t-1}$ are the positive and negative sample pairs of $z_s^t$, respectively.
$l_{con2}$ represents the maximization of consistency between the personalized-layer representations of the local model in the current round and in the historical rounds.
$$l_{con2} = -\frac{\langle z_p^t, z_p^{t-1} \rangle}{\|z_p^t\|_2 \cdot \|z_p^{t-1}\|_2} \qquad (7)$$
Here, the historical models are updated using the EMA method.
$$w_{local}^{t-1} = \begin{cases} \alpha w_{local}^{t-1} + (1-\alpha)\, w_{local}^{t}, & t > t_{EMA} \\ w_{local}^{t}, & t \le t_{EMA} \end{cases} \qquad (8)$$
When the global update rounds have not reached the threshold $t_{EMA}$, the local historical model is assigned in a one-shot manner, indicating that the EMA update method is not beneficial for the learning of the local model at this stage. Once the global round exceeds the threshold $t_{EMA}$, the local historical model starts to remember the historical models using the EMA method. The EMA approach allows the historical model of a client to comprehensively represent the information from all its previous rounds, rather than just the information from the previous round alone.
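A minimal PyTorch sketch of the resulting local objective and the EMA update is given below, assuming the shared/personalized split described above. Because Equations (5) and (6) are not fully reproduced in this text, the first contrastive term follows the MOON formulation that the paper builds on; the weights $\mu$ and $\gamma$, the temperature, and the EMA coefficient $\alpha$ are placeholder values.

```python
# Sketch of the local objective (cross-entropy plus two contrastive terms) and
# the EMA historical model of Equation (8); the first contrastive term is
# assumed to follow the MOON formulation, and all coefficients are placeholders.
import torch
import torch.nn.functional as F

def ema_update(hist_params, local_params, t, t_ema, alpha=0.9):
    """Equation (8): keep an EMA of the local model once t exceeds t_EMA."""
    if t <= t_ema:
        return {k: v.clone() for k, v in local_params.items()}
    return {k: alpha * hist_params[k] + (1 - alpha) * local_params[k]
            for k in local_params}

def contrastive_terms(z_s, z_g, z_s_prev, z_p, z_p_prev, tau=0.5):
    # l_con1: pull the shared-layer representation z_s toward the global
    # model's z_g (positive pair) and away from the previous local model's
    # z_s_prev (negative pair), in the MOON style.
    pos = F.cosine_similarity(z_s, z_g, dim=-1) / tau
    neg = F.cosine_similarity(z_s, z_s_prev, dim=-1) / tau
    l_con1 = -torch.log(torch.exp(pos) / (torch.exp(pos) + torch.exp(neg))).mean()
    # l_con2, Equation (7): negative cosine similarity between the
    # personalized-layer representation and its EMA history.
    l_con2 = -F.cosine_similarity(z_p, z_p_prev, dim=-1).mean()
    return l_con1, l_con2

def local_objective(logits, labels, z_s, z_g, z_s_prev, z_p, z_p_prev,
                    mu=1.0, gamma=1.0):
    # Equation (5): cross-entropy plus the two weighted contrastive terms.
    l_con1, l_con2 = contrastive_terms(z_s, z_g, z_s_prev, z_p, z_p_prev)
    return F.cross_entropy(logits, labels) + mu * l_con1 + gamma * l_con2
```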
5. Experiment
In this section, we present simulation results for our EEFL method to illustrate the
advantages of the new three-layer FL system.
layers followed by 2 × 2 max pooling and fully connected layers with ReLU activation. Note that all baselines used the same network architecture as ours for a fair comparison. The detailed setup is displayed in Table 1. The hyperparameters $\mu$ and $\gamma$ can be determined by grid search in practice.
To simulate the data heterogeneity among clients similar to practical scenarios, in ad-
dition to the common IID setting, we employed different data distribution strategies for
different datasets. For MNIST and FashionMNIST, the samples in each client were drawn
from one to two randomly selected classes. For the CIFAR10 dataset, we utilized the
Dirichlet distribution [37] to generate Non-IID data partitions among participants. We sample $p_k \sim \mathrm{Dir}_N(\beta)$ and allocate the proportion $p_{k,j}$ of class $k$ instances to participant $j$, where $\mathrm{Dir}(\beta)$ is a Dirichlet distribution with concentration parameter $\beta$, which defaults to 0.5. With this partitioning strategy, each participant may have relatively few (or even no) data samples in certain classes.
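A minimal sketch of this Dirichlet partitioning is given below; dataset loading is omitted and the seed handling is illustrative.

```python
# Sketch of the Dirichlet-based Non-IID partition: for each class k, a
# proportion vector p_k ~ Dir_N(beta) splits that class's sample indices
# across the participants. Dataset loading is omitted.
import numpy as np

def dirichlet_partition(labels: np.ndarray, num_participants: int,
                        beta: float = 0.5, seed: int = 0):
    rng = np.random.default_rng(seed)
    parts = [[] for _ in range(num_participants)]
    for k in np.unique(labels):
        idx = np.where(labels == k)[0]
        rng.shuffle(idx)
        p_k = rng.dirichlet(beta * np.ones(num_participants))  # p_k ~ Dir_N(beta)
        # Cut the class-k indices at the cumulative proportions p_{k,j}.
        cuts = (np.cumsum(p_k)[:-1] * len(idx)).astype(int)
        for j, chunk in enumerate(np.split(idx, cuts)):
            parts[j].extend(chunk.tolist())
    return parts
```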
$$E_{comp} = \frac{\alpha c D f^2}{2}, \qquad E_{comm} = p\, T_{comm} \qquad (10)$$
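The sketch below evaluates Equation (10) per round; the symbol meanings (effective capacitance coefficient $\alpha$, CPU cycles per sample $c$, number of local samples $D$, CPU frequency $f$, transmit power $p$, and transmission time $T_{comm}$) are assumed here, since their definitions fall outside the extracted text.

```python
# Sketch of the per-round energy model in Equation (10); the symbol meanings
# are assumptions, as their definitions are not in the extracted text.
def round_energy(alpha, c, D, f, p, t_comm):
    e_comp = alpha * c * D * f ** 2 / 2   # local computation energy
    e_comm = p * t_comm                   # uplink communication energy
    return e_comp + e_comm
```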
Baseline models. We compared our proposed algorithm with both traditional centralized
methods and federated learning schemes for performance evaluation:
• Centralized learning: This scheme collects all the raw data to the cloud for training
and provides an upper bound for model accuracy.
• FedAVG [1]: A traditional cloud-based FL scheme with synchronous aggregation.
• FedProx [12]: A cloud-based FL scheme with synchronous aggregation, which adds a
proximal term in the local training process.
• MOON [14]: A cloud-based FL scheme with synchronous aggregation, which adds an extra model-contrastive loss in the local training process.
• HierFAVG [5]: A cloud–edge–client hierarchical FL scheme that performs synchronous
update in both client–edge aggregation and edge–cloud aggregation.
It is worth noting that we adopted random client–edge pairing for the traditional
two-layer FL schemes (e.g., FedAvg, FedProx, and MOON) to extend to three-layer FL for
fair comparison. In addition, the other algorithms were trained from scratch. However, since our algorithm performs pretraining on the client side for similarity grouping, the same number of pretraining rounds was also deployed for the other algorithms to ensure fairness. The update strategies were divided into synchronous and asynchronous types for fair comparison.
(Figure: test accuracy versus communication round, three subplots comparing EEFL, MOON, FedAVG, and FedProx, each in synchronous (-syn) and asynchronous (-asyn) variants.)
Table 2. The top accuracy in 100 rounds with cloud server and the communication rounds to achieve
target accuracy.
(Figure: test accuracy versus communication round for EEFL (OURS), HiFlash, and random pairing, with an inset magnifying rounds 70–80.)
require obtaining the label distribution of clients in advance; instead, it directly pairs
through the models’ partial layers. Hence, our EEFL approach is safer and also facilitates
extension to other types of datasets.
(Figure: ablation of the loss function, showing test accuracy versus communication rounds for ours, cross-entropy loss plus the first comparison loss, cross-entropy loss plus the second comparison loss, and cross-entropy loss only.)
The reason behind this improvement lies in the segmentation of the local models,
which not only retains the benefits of contrastive learning for the shared layer models but
also enhances the training process by incorporating the contrastive learning loss of the
personalized layer models. As a result, the extracted information from each model becomes
more diverse and comprehensive, thus leading to the observed improvement in the overall
accuracy of the global model.
6. Conclusions
In this paper, we have proposed a new edge-assisted federated learning (FL) scheme in
wireless network scenarios to address the issues of slow convergence speed and low global
Author Contributions: Conceptualization, W.S.; methodology, X.S. and Y.L.; validation, X.L.; writing—
original draft preparation, X.S.; writing—review and editing, C.L. All authors have read and agreed
to the published version of the manuscript.
Funding: This work was supported in part by the S&T Special Program of Huzhou under Grant
2022GZ14.
Data Availability Statement: Data is contained within the article.
Conflicts of Interest: The authors declare no conflicts of interest.
References
1. McMahan, B.; Moore, E.; Ramage, D.; Hampson, S.; Agüera y Arcas, B. Communication-efficient learning of deep networks from
decentralized data. In Proceedings of the Artificial Intelligence and Statistics, Ft. Lauderdale, FL, USA, 20–22 April 2017; PMLR:
New York, NY, USA, 2017; pp. 1273–1282.
2. Nasr, M.; Shokri, R.; Houmansadr, A. Comprehensive privacy analysis of deep learning: Passive and active white-box inference
attacks against centralized and federated learning. In Proceedings of the 2019 IEEE Symposium on Security and Privacy (SP),
San Francisco, CA, USA, 19–23 May 2019; IEEE: Piscataway, NJ, USA, 2019; pp. 739–753.
3. Kairouz, P.; McMahan, H.B.; Avent, B.; Bellet, A.; Bennis, M.; Bhagoji, A.N.; Bonawitz, K.; Charles, Z.; Cormode, G.; Cummings,
R.; et al. Advances and open problems in federated learning. Found. Trends Mach. Learn. 2021, 14, 1–210. [CrossRef]
4. Lim, W.Y.B.; Luong, N.C.; Hoang, D.T.; Jiao, Y.; Liang, Y.C.; Yang, Q.; Niyato, D.T.; Miao, C. Federated Learning in Mobile Edge
Networks: A Comprehensive Survey. IEEE Commun. Surv. Tutor. 2019, 22, 2031–2063. [CrossRef]
5. Liu, L.; Zhang, J.; Song, S.H.; Letaief, K.B. Client-Edge-Cloud Hierarchical Federated Learning. In Proceedings of the IEEE
International Conference on Communications (ICC), Dublin, Ireland, 7–11 June 2020; pp. 1–6.
6. Wu, Q.; Chen, X.; Ouyang, T.; Zhou, Z.; Zhang, X.; Yang, S.; Zhang, J. Hiflash: Communication-efficient hierarchical federated
learning with adaptive staleness control and heterogeneity-aware client-edge association. IEEE Trans. Parallel Distrib. Syst. 2023,
34, 1560–1579. [CrossRef]
7. Xu, M.; Fu, Z.; Ma, X.; Zhang, L.; Li, Y.; Qian, F.; Wang, S.; Li, K.; Yang, J.; Liu, X. From cloud to edge: A first look at public edge
platforms. In Proceedings of the 21st ACM Internet Measurement Conference, Virtual, 2–4 November 2021; pp. 37–53.
8. Tran, N.H.; Bao, W.; Zomaya, A.Y.; Nguyen, M.N.H.; Hong, C.S. Federated Learning over Wireless Networks: Optimization
Model Design and Analysis. In Proceedings of the IEEE INFOCOM 2019—IEEE Conference on Computer Communications,
Paris, France, 29 April–2 May 2019; pp. 1387–1395.
9. Chen, M.; Shlezinger, N.; Poor, H.V.; Eldar, Y.C.; Cui, S. Communication-efficient federated learning. Proc. Natl. Acad. Sci. USA
2021, 118, e2024789118. [CrossRef] [PubMed]
10. Jiang, M.; Wang, Z.; Dou, Q. HarmoFL: Harmonizing Local and Global Drifts in Federated Learning on Heterogeneous Medical
Images. In Proceedings of the AAAI Conference on Artificial Intelligence, Virtual, 2–9 February 2021; pp. 1087–1095.
11. Li, Q.; Diao, Y.; Chen, Q.; He, B. Federated Learning on Non-IID Data Silos: An Experimental Study. In Proceedings of the IEEE
38th International Conference on Data Engineering (ICDE), Kuala Lumpur, Malaysia, 9–12 May 2022; pp. 965–978.
12. Li, T.; Sahu, A.K.; Zaheer, M.; Sanjabi, M.; Talwalkar, A.; Smith, V. Federated optimization in heterogeneous networks. Proc.
Mach. Learn. Syst. 2020, 2, 429–450.
13. Karimireddy, S.P.; Kale, S.; Mohri, M.; Reddi, S.J.; Stich, S.U.; Suresh, A.T. SCAFFOLD: Stochastic Controlled Averaging for Federated
Learning. In Proceedings of the International Conference on Machine Learning, Long Beach, CA, USA, 9–15 June 2019; pp. 5132–5143.
14. Li, Q.; He, B.; Song, D.X. Model-Contrastive Federated Learning. In Proceedings of the Computer Vision and Pattern Recognition
(CVPR), Nashville, TN, USA, 20–25 June 2021; pp. 10708–10717.
15. Ghosh, A.; Hong, J.; Yin, D.; Ramchandran, K. Robust federated learning in a heterogeneous environment. arXiv 2019,
arXiv:1906.06629.
16. Ghosh, A.; Chung, J.; Yin, D.; Ramchandran, K. An Efficient Framework for Clustered Federated Learning. IEEE Trans. Inf. Theory
2020, 68, 8076–8091. [CrossRef]
17. Gong, B.; Xing, T.; Liu, Z.; Xi, W.; Chen, X. Adaptive client clustering for efficient federated learning over non-iid and imbalanced
data. IEEE Trans. Big Data 2022, 99, 1–15. [CrossRef]
18. Liu, B.; Guo, Y.; Chen, X. PFA: Privacy-preserving federated adaptation for effective model personalization. In Proceedings of the
Web Conference 2021, Ljubljana, Slovenia, 19–23 April 2021; pp. 923–934.
19. Xia, W.; Wen, W.; Wong, K.; Quek, T.Q.S.; Zhang, J.; Zhu, H. Federated-Learning-Based Client Scheduling for Low-Latency
Wireless Communications. IEEE Wirel. Commun. 2021, 28, 32–38. [CrossRef]
20. Dinh, T.Q.; Nguyen, D.N.; Hoang, D.T.; Pham, T.V.; Dutkiewicz, E. In-Network Computation for Large-Scale Federated Learning
Over Wireless Edge Networks. IEEE Trans. Mob. Comput. 2021, 22, 5918–5932. [CrossRef]
21. Sattler, F.; Wiedemann, S.; Müller, K.R.; Samek, W. Robust and Communication-Efficient Federated Learning from Non-i.i.d. Data.
IEEE Trans. Neural Netw. Learn. Syst. 2019, 31, 3400–3413. [CrossRef] [PubMed]
22. Liu, S.; Yu, G.; Yin, R.; Yuan, J.; Shen, L.; Liu, C. Joint Model Pruning and Device Selection for Communication-Efficient Federated
Edge Learning. IEEE Trans. Commun. 2022, 70, 231–244. [CrossRef]
23. Reisizadeh, A.; Mokhtari, A.; Hassani, H.; Jadbabaie, A.; Pedarsani, R. FedPAQ: A Communication-Efficient Federated Learning
Method with Periodic Averaging and Quantization. In Proceedings of the International Conference on Artificial Intelligence and
Statistics, Online, 26–28 August 2020; pp. 2021–2031.
24. Qu, Z.; Duan, R.; Chen, L.; Xu, J.; Lu, Z.; Liu, Y. Context-Aware Online Client Selection for Hierarchical Federated Learning. IEEE
Trans. Parallel Distrib. Syst. 2021, 33, 4353–4367. [CrossRef]
25. Wolfrath, J.; Sreekumar, N.; Kumar, D.; Wang, Y.; Chandra, A. HACCS: Heterogeneity-Aware Clustered Client Selection for
Accelerated Federated Learning. In Proceedings of the 2022 IEEE International Parallel and Distributed Processing Symposium
(IPDPS), Lyon, France, 30 May–3 June 2022; pp. 985–995.
26. Wang, K.; He, Q.; Chen, F.; Jin, H.; Yang, Y. FedEdge: Accelerating Edge-Assisted Federated Learning. In Proceedings of the
ACM Web Conference 2023, Austin, TX, USA, 30 April–4 May 2023; Association for Computing Machinery: New York, NY, USA,
2023; pp. 2895–2904.
27. Wang, H.; Yurochkin, M.; Sun, Y.; Papailiopoulos, D.; Khazaeni, Y. Federated learning with matched averaging. arXiv 2020,
arXiv:2002.06440.
28. Hsu, T.M.H.; Qi, H.; Brown, M. Measuring the Effects of Non-Identical Data Distribution for Federated Visual Classification.
arXiv 2019, arXiv:1909.06335.
29. Collins, L.; Hassani, H.; Mokhtari, A.; Shakkottai, S. Exploiting Shared Representations for Personalized Federated Learning. In
Proceedings of the International Conference on Machine Learning, Virtual, 18–24 July 2021; pp. 2089–2099.
30. Duan, Q.; Huang, J.; Hu, S.; Deng, R.; Lu, Z.; Yu, S. Combining federated learning and edge computing toward ubiquitous
intelligence in 6G network: Challenges, recent advances, and future directions. IEEE Commun. Surv. Tutor. 2023, 25, 2892–2950.
[CrossRef]
31. Dennis, D.K.; Li, T.; Smith, V. Heterogeneity for the win: One-shot federated clustering. In Proceedings of the International
Conference on Machine Learning, Virtual, 18–24 July 2021; PMLR: New York, NY, USA, 2021; pp. 2611–2620.
32. Long, G.; Xie, M.; Shen, T.; Zhou, T.; Wang, X.; Jiang, J. Multi-center federated learning: Clients clustering for better personalization.
World Wide Web 2023, 26, 481–500. [CrossRef]
33. Duan, M.; Liu, D.; Ji, X.; Liu, R.; Liang, L.; Chen, X.; Tan, Y. FedGroup: Efficient Federated Learning via Decomposed
Similarity-Based Clustering. In Proceedings of the 2021 IEEE Intl Conf on Parallel & Distributed Processing with Ap-
plications, Big Data & Cloud Computing, Sustainable Computing & Communications, Social Computing & Networking
(ISPA/BDCloud/SocialCom/SustainCom), New York, NY, USA, 30 September–3 October 2021; pp. 228–237.
34. Deng, L. The mnist database of handwritten digit images for machine learning research. IEEE Signal Process. Mag. 2012, 29, 141–142.
[CrossRef]
35. Krizhevsky, A.; Hinton, G. Learning multiple layers of features from tiny images. Handb. Syst. Autoimmune Dis. 2009, 1, 1–60.
36. Xiao, H.; Rasul, K.; Vollgraf, R. Fashion-mnist: A novel image dataset for benchmarking machine learning algorithms. arXiv 2017,
arXiv:1708.07747.
37. Yurochkin, M.; Agarwal, M.; Ghosh, S.; Greenewald, K.; Hoang, N.; Khazaeni, Y. Bayesian nonparametric federated learning of
neural networks. In Proceedings of the International Conference on Machine Learning, Long Beach, CA, USA, 9–15 June 2019;
PMLR: New York, NY, USA, 2019; pp. 7252–7261.
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual
author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to
people or property resulting from any ideas, methods, instructions or products referred to in the content.