Relay-Assisted Federated Edge Learning: Performance Analysis and System Optimization
Chen, L., Fan, L., Lei, X., Duong, T. Q., Nallanathan, A., & Karagiannidis, G. K. (2023). Relay-assisted federated
edge learning: performance analysis and system optimization. IEEE Transactions on Communications.
https://doi.org/10.1109/tcomm.2023.3263566
Published in:
IEEE Transactions on Communications
Document Version:
Peer reviewed version
Publisher rights
© 2023 IEEE.
This work is made available online in accordance with the publisher’s policies. Please refer to any applicable terms of use of the publisher.
General rights
Copyright for the publications made accessible via the Queen's University Belfast Research Portal is retained by the author(s) and / or other
copyright owners and it is a condition of accessing these publications that users recognise and abide by the legal requirements associated
with these rights.
Abstract—In this paper, we study a relay-assisted federated edge learning (FEEL) network under latency and bandwidth constraints. In this network, N users collaboratively train a global model assisted by M intermediate relays and one edge server. We first propose partial aggregation and spectrum resource multiplexing at the relays in order to improve the communication of the relay-assisted FEEL system. Furthermore, we derive analytical and asymptotic expressions of the system outage probability and convergence rate. For the purpose of improving the system performance, we further optimize the relay-assisted FEEL network by maximizing the number of users who participate in each round of federated learning, through allocation of the wireless bandwidth among users and relays. Specifically, two bandwidth allocation (BA) schemes are proposed, assuming either instantaneous or statistical channel state information (CSI). Simulations show the advantages of the proposed BA schemes over other benchmarks, regarding the accuracy and convergence rate of the considered relay-assisted FEEL network.
Index Terms—Federated learning, edge learning, relay, outage probability, Internet of Things.
I. INTRODUCTION
Recently, fast-growing applications of the Internet of Things (IoT) have generated an explosive amount of data to drive artificial intelligence (AI), widely applied in wireless communication, image processing and other fields [1]–[4]. The centralized AI applications need to aggregate distributed data from users into the server for training, which is hard to achieve due to privacy concerns. To tackle this issue, an intelligent paradigm, namely federated learning (FL), was proposed to enable multiple users to train a global model without transmitting the sensitive data [5]–[8]. In this framework, the FL server periodically selects some users as the candidates to join each round’s training. Then, the selected users calculate the training loss, update the weights and transmit the local models to the server. Once they are received, the server can aggregate the models and repeat the whole procedure until it converges [9]–[11].
At the same time, mobile edge computing (MEC) has become one of the most advanced technologies for reducing communication latency and energy consumption [12], [13]. For example, MEC could be used for video transmission to suppress jamming [14], where the compression parameter and power control were optimized by reinforcement learning. Besides, a similar concept was used to decide offloading against jamming attacks and interference in [15], which could achieve a significant reduction in latency and energy consumption. Therefore, FL can be used in the MEC scenarios, where the mobile users perform distributed learning and transmit the trained models to be aggregated at the edge server, called federated edge learning (FEEL) [16]–[18]. The FEEL performance depends on the number of users that successfully participate in the federated learning, which is however limited by the communication overhead, due to practical constraints such as latency and bandwidth [19]–[21]. To reduce the communication overhead, a physical-layer quantization scheme was proposed to upload training models, where the compromise between FEEL performance and quantization ratio was revealed [22]. Also, to further cope with this overhead, the system resources of FEEL networks can be exploited to support more users to successfully participate in the federated learning [23], [24]. For instance, the trade-off between the communication overhead and computational capability was investigated in [25], by dividing the deep model into several sub-models, where the authors enabled heterogeneous mobile users to select models of appropriate size to reduce the amount of transmitted data. In addition, the system resources such as bandwidth can be optimized among the users, in order to meet practical requirements such as latency and energy consumption, by exploiting the channel state information (CSI) [26], [27].
Besides the above techniques, relays can be deployed in FEEL to decrease the communication overhead and thus enhance the system communication and learning performance. In recent works, relaying has been proposed to be an effective technology in wireless communication systems to extend
L. Chen and L. Fan are both with the School of Computer Science, Guangzhou University, Guangzhou 510006, China (e-mail: [email protected], [email protected]).
X. Lei is with the School of Information Science and Technology, Institute of Mobile Communications, Southwest Jiaotong University, Chengdu 610031, China (e-mail: [email protected]).
T. Q. Duong is with the School of Electronics, Electrical Engineering and Computer Science, Queen's University Belfast, Belfast, BT7 1NN, UK and also with the Department of Electronic Engineering, Kyung Hee University, Yongin-si, Gyeonggi-do 17104, South Korea (e-mail: [email protected]).
A. Nallanathan is with the School of Electronic Engineering and Computer Science, Queen Mary University of London, London, U.K. (e-mail: [email protected]).
G. K. Karagiannidis is with Aristotle University of Thessaloniki, Greece and is also with Cyber Security Systems and Applied AI Research Center, Lebanese American University (AUL), Lebanon (e-mail: [email protected]).
The work of Lisheng Fan was supported in part by NSFC under Grant 62271158 and Grant 62101145; and in part by the Natural Science Foundation of Guangdong Province under Grant 2021A1515011392. The work of Xianfu Lei was supported in part by the National Natural Science Foundation of China under Grants 61971360 and 62271420.
The corresponding author of this paper is L. Fan.
procedure for the relay selection and partial aggregation.
A. System Model
A relay-assisted FEEL network is shown in Fig. 1, where N users collaboratively train a global model assisted by one
1 It is straightforward to adopt the DF relaying protocol to decode and recover the model weights, in order to aggregate models at the relays in this paper. Note that this work can be extended to other relaying protocols, like the AF protocol, with some minor modification. In particular, we can use the summation property of wireless channels and introduce over-the-air computing technology to aggregate models without decoding and re-modulation.
where $w_k$ denotes the model parameter of user $U_k$, and $L(w_k, x)$ is the corresponding loss function. As the data is distributed, it is generally difficult to solve (1) directly. Hence, FL tends to be used by employing an iterative algorithm to train a global model from the users. Specifically, for each round, user $U_k$ calculates the training loss, and then the weights are updated using gradient descent as
$$v_k \leftarrow w_k - \eta \nabla F_k(w_k), \tag{3}$$
where $v_k$ is the updated model parameter of user $U_k$ and $\eta$ denotes the learning rate. After that, the updated local models from multiple users are gathered and aggregated at ES.
C. Relay-assisted FEEL
In the considered relay-assisted FEEL, the intermediate relays cooperatively assist the model exchange between users and ES, to extend the coverage and enhance the transmission reliability. Moreover, the relays can perform the operation of aggregation early to cut down the communication cost.
Next, we present in detail the procedure of the FEEL assisted by the relays under the paradigm of FedAvg. Specifically, the global model parameter is initialized to $w^0$, and then the global model is updated in a number of rounds. At each round, we can divide the model update into the following four steps.
1) User sampling and model broadcast: At this step, ES first selects a group of users for each round $t$. It then broadcasts the global model parameters $w^t$ of the previous round to the selected users with the help of the relays. In particular, ES may uniformly select the user subset $\mathcal{K}$ out of $N$ users without replacement, where $|\mathcal{K}| = K$ is the user number in the user subset $\mathcal{K}$. Note that the uniform selection can be applied to many scenarios where the importance of users is unknown or identical [35]–[37], and it can guarantee the unbiasedness of the model aggregation with full client participation in each round. For other scenarios where the users have different importance, importance-aware scheduling can be adopted to enhance the federated learning performance.
2) Local model update: At this step, user $U_k$ first sets the initial local model parameters as $w_k^{t+1} = w^t$, after receiving the global model parameters $w^t$ from ES. Then, user $U_k$ trains its model on its local dataset. Specifically, user $U_k$ conducts $E$ epochs of SGD on its local dataset, where there are in total $e_k = E|\mathcal{D}_k|/b$ SGD iterations, and $b$ is the mini-batch size. Therefore, the local model will be updated a total of $e_k$ times, and in each SGD iteration it holds that
$$v_k^{t+1,j+1} \leftarrow v_k^{t+1,j} - \eta_{t+1} \nabla F_k(w_k^{t+1,j}; \xi_k), \tag{4}$$
where $j \in \{1, \cdots, e_k\}$ is the local SGD iteration index, and $\xi_k$ is the data batch uniformly chosen from the local dataset $\mathcal{D}_k$.
3) Relay selection and partial aggregation: After finishing the local update, user $U_k$ needs to transmit its updated weight $v_k^{t+1}$ to a selected intermediate relay $R_m$. Let $\mathcal{J}_m$ denote the user subset uploaded to relay $R_m$, and let $|\mathcal{D}_{\mathcal{J}_m}|$ be the total number of training samples in the user subset $\mathcal{J}_m$. After receiving and decoding the local models, relay $R_m$ will aggregate the collected models, where some aggregation method can be applied for the considered relay-assisted federated framework. Without loss of generality, the well-known FedAvg is adopted in this work to aggregate the local trained models, given by
$$w_m^{t+1} = \sum_{U_k \in \mathcal{J}_m} \frac{|\mathcal{D}_k|}{\sum_{U_k \in \mathcal{J}_m} |\mathcal{D}_k|} v_k^{t+1}, \tag{5}$$
where the aggregation at the relay $R_m$ is synchronized, which can help reduce the communication overhead and avoid model staleness, in contrast with asynchronous federated learning. Although synchronous federated learning has the limitation of waiting for slow learners, i.e., stragglers, this limitation can be alleviated by setting a latency threshold to drop out slow users and by using proper resource allocation to avoid a long waiting time.
Note that in the above FedAvg, the problem of “objective inconsistency” may arise due to the heterogeneity in the size of the local dataset and the number of local SGD iterations among users [38]. This is because the aggregated model will be biased towards the users with more SGD iterations, which eventually affects the federated learning performance. To tackle this problem, we can use the important works [34], [38] and especially the normalized cumulative gradients to replace the FedAvg, given by
$$w_m^{t+1} = w^t + \left( \sum_{U_k \in \mathcal{J}_m} \frac{|\mathcal{D}_k|}{|\mathcal{D}_{\mathcal{J}_m}|} e_k \right) \sum_{U_k \in \mathcal{J}_m} \frac{|\mathcal{D}_k|}{|\mathcal{D}_{\mathcal{J}_m}|} \frac{v_k^{t+1} - w^t}{e_k}. \tag{6}$$
If not specified, FedAvg will be used for aggregating the local trained models in the subsequent sections.
4) Global Aggregation: At this step, each relay needs to send its aggregated model to the edge server via the second-hop relaying link. After gathering all the models from the relays, the ES can perform the aggregation as
$$w^{t+1} = \sum_{R_m \in \mathcal{R}} \frac{|\mathcal{D}_{\mathcal{J}_m}|}{\sum_{R_m \in \mathcal{R}} |\mathcal{D}_{\mathcal{J}_m}|} w_m^{t+1}. \tag{7}$$
D. Problem Formulation
For the considered relay-assisted FEEL system under latency and bandwidth constraints, we can optimize the system performance through minimizing the global loss function, given by
$$\text{P0:} \quad \min \frac{1}{|\mathcal{D}|} \sum_{k=1}^{N} \sum_{x \in \mathcal{D}_k} L(w_k, x). \tag{8}$$
However, obtaining an exact expression for the global loss function of FEEL is generally hard, which causes much difficulty in solving the optimization in problem P0. To overcome this difficulty, we turn to perform some analysis on the system performance, as shown in the following section.
III. SYSTEM PERFORMANCE ANALYSIS
A. Latency analysis
The latency is a critical performance metric in the FEEL network, as it determines whether the users can finish the model training and model upload in time or not. When the
devices fail to accomplish uploading in time, the effective number of successfully participated users will decrease, causing deterioration in the convergence of federated learning.
In the considered relay-assisted FEEL, the latency of each IoT device is related to the computational capability, wireless channel quality, and relay selection. The latency of local training and global aggregation may significantly affect the system training performance. Thus, investigating the latency for the considered relay-assisted FEEL is very important.
The total latency of user $U_k$ is denoted as $T_k^{\mathrm{total}}$, which consists of both the local training latency and the uplink latency. Note that the downlink latency is ignored in this work, as it is generally much smaller than the uplink latency, because the transmit power at the server can be much larger. Specifically, the local training latency $T_k^{\mathrm{local}}$ of user $U_k$ is given by
$$T_k^{\mathrm{local}} = \frac{e_k b \rho}{f_k}, \tag{9}$$
where CPU needs $\rho$ cycles to process one sample training, and $f_k$ denotes the computational capability at user $U_k$. Then, the local trained model needs to be uploaded to ES via the uplink relaying links. In this paper, we perform the relay selection
From (11), the transmission latency from user $U_k$ to relay $R_{m_k^*}$ is given by
$$T_{k,m_k^*}^{\mathrm{I}} = \frac{|L|}{R_{k,m_k^*}^{\mathrm{I}}}, \tag{13}$$
where $|L|$ is the size of the uploaded model. After receiving all the model parameters from the user set $\mathcal{J}_{m_k^*}$, relay $R_{m_k^*}$ aggregates the local model according to (5). Then, relay $R_{m_k^*}$ needs to transmit the aggregated model to ES, where the corresponding transmission data rate from relay $R_{m_k^*}$ to ES is
$$R_{m_k^*}^{\mathrm{II}} = B_{m_k^*}^{\mathrm{II}} \log_2\left(1 + \frac{P_{m_k^*} |g_{m_k^*}|^2}{\sigma^2}\right), \tag{14}$$
where $P_{m_k^*}$ denotes the transmit power at relay $R_{m_k^*}$, $g_{m_k^*}$ denotes the instantaneous channel parameter of the link $R_{m_k^*}$–ES, and it follows Rayleigh fading with $\mathbb{E}[|g_{m_k^*}|^2] = \lambda_{m_k^*}$.
In this paper, the relays work in a time-division multiplexing mode, where the dual hops share the same frequency resources, i.e.,
$$B_{m_k^*}^{\mathrm{II}} = \sum_{U_k \in \mathcal{J}_{m_k^*}} B_k^{\mathrm{I}}. \tag{15}$$
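The latency terms above can be evaluated numerically with a short sketch. It is only an illustration under stated assumptions: the function names and all numeric values are hypothetical, the first-hop rate is assumed to take the same Shannon form as (14) with the user bandwidth $B_k^{\mathrm{I}}$ (the first-hop rate expression is not reproduced in this excerpt), and the second-hop latency is assumed to be the model size divided by the rate in (14).

```python
import math


def local_training_latency(e_k, batch_size, rho, f_k):
    """Eq. (9): e_k SGD iterations of batch_size samples, rho CPU cycles per
    sample, on a CPU running at f_k cycles per second."""
    return e_k * batch_size * rho / f_k


def link_rate(bandwidth_hz, snr):
    """Shannon rate in bit/s, the form of Eq. (14); snr stands for P|g|^2 / sigma^2."""
    return bandwidth_hz * math.log2(1.0 + snr)


def total_latency(model_bits, t_local, user_bw_hz, relay_user_bws_hz,
                  snr_first_hop, snr_second_hop):
    """Local training latency plus the two-hop upload latency of one user."""
    relay_bw_hz = sum(relay_user_bws_hz)  # Eq. (15): relay reuses its users' bandwidth
    t_hop1 = model_bits / link_rate(user_bw_hz, snr_first_hop)    # Eq. (13)
    t_hop2 = model_bits / link_rate(relay_bw_hz, snr_second_hop)  # assumed |L| / R^II
    return t_local + t_hop1 + t_hop2


# Illustrative numbers only (not taken from the paper's simulation setup).
t_local = local_training_latency(e_k=10, batch_size=32, rho=1e5, f_k=1e9)
t_total = total_latency(model_bits=1e6, t_local=t_local, user_bw_hz=1e6,
                        relay_user_bws_hz=[1e6, 1e6],
                        snr_first_hop=3.0, snr_second_hop=3.0)
gamma_th = 1.2  # hypothetical latency threshold in seconds
print(f"T_local = {t_local:.3f} s, T_total = {t_total:.3f} s, "
      f"in time: {t_total <= gamma_th}")
```

Comparing `t_total` with a latency threshold, as in the last line, is the check that decides whether a user counts as a successful participant in a round.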
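where $P_{\mathrm{out},k}$ is the outage probability of $U_k$ in the process of FL, given by
$$P_{\mathrm{out},k} = \Pr[T_k^{\mathrm{total}} > \gamma_{\mathrm{th}}] = \Pr[T_k^{\mathrm{local}} + T_{k,m_k^*}^{\mathrm{I}} + T_{m_k^*}^{\mathrm{II}} > \gamma_{\mathrm{th}}]. \tag{21}$$
To analyze the system outage performance of the relay-assisted FEEL, we need first to derive the outage probability of user $U_k$. In practice, the local training latency of user $U_k$ can be regarded as deterministic, as it is not affected by the stochastic nature of the channels. Hence, we can re-write
$$P_{\mathrm{out},k} = \Pr\left[T_{k,m_k^*}^{\mathrm{I}} + T_{m_k^*}^{\mathrm{II}} > \gamma_{\mathrm{th}} - \frac{d_k}{f_k}\right] = \Pr\left[\frac{|L|}{R_{k,m_k^*}^{\mathrm{I}}} + \frac{|L|}{R_{m_k^*}^{\mathrm{II}}} > \gamma_{\mathrm{th}} - \frac{d_k}{f_k}\right] = \Pr\left[\frac{1}{|L|} \frac{R_{k,m_k^*}^{\mathrm{I}} R_{m_k^*}^{\mathrm{II}}}{R_{k,m_k^*}^{\mathrm{I}} + R_{m_k^*}^{\mathrm{II}}} < \frac{f_k}{\gamma_{\mathrm{th}} f_k - d_k}\right], \tag{22}$$
where $d_k = e_k b \rho$ denotes the CPU cycles needed to finish local training for user $U_k$.
As deriving an exact closed-form solution to $P_{\mathrm{out},k}$ from (22) is generally hard, we turn to use the inequality of $xy/(x+y) < \min(x, y)$ for positive $x$ and $y$ 3, and then obtain a tight upper bound for the first form in (22) as,
$$\frac{1}{|L|} \frac{R_{k,m_k^*}^{\mathrm{I}} R_{m_k^*}^{\mathrm{II}}}{R_{k,m_k^*}^{\mathrm{I}} + R_{m_k^*}^{\mathrm{II}}} < \frac{1}{|L|} \min(R_{k,m_k^*}^{\mathrm{I}}, R_{m_k^*}^{\mathrm{II}}). \tag{23}$$
3 Note that in this inequality, the approximation error is large when x is equal to y, and the approximation accuracy improves when x differs from y. In general, x is often different from y due to random wireless channels, resulting in a fine approximation accuracy on average. Due to these reasons, the inequality of xy/(x + y) < min(x, y) is widely used in the existing works such as [39]–[41].
Then, substituting (23) into (22), we can obtain the lower bound on the outage probability of user $U_k$, which can be analytically solved, as shown in Theorem 1,
Theorem 1. A lower bound on the outage probability of user $U_k$ is
$$P_{\mathrm{out},k}^{\mathrm{lb}} = 1 - \exp\left(\frac{1 - \exp\left(\frac{f_k |L| \ln 2}{A_k^{\mathrm{II}} (\gamma_{\mathrm{th}} f_k - d_k)}\right)}{\lambda_{m_k^*} \zeta_{m_k^*}}\right) \times \left(1 - \prod_{m=1}^{M} \left(1 - \exp\left(\frac{1 - \exp\left(\frac{f_k |L| \ln 2}{B_k^{\mathrm{I}} (\gamma_{\mathrm{th}} f_k - d_k)}\right)}{\lambda_{k,m} \zeta_k}\right)\right)\right), \tag{24}$$
where $\zeta_k = P_k/\sigma^2$ and $\zeta_{m_k^*} = P_{m_k^*}/\sigma^2$ are the transmit SNRs at the user $U_k$ and relay $R_{m_k^*}$, respectively, and $A_k^{\mathrm{II}}$ is given by
$$A_k^{\mathrm{II}} = \sum_{i=1}^{K-1} \left[\binom{K-1}{i} B_k^{\mathrm{I}} + \binom{K-2}{i-1} (B_{\mathrm{total}} - B_k^{\mathrm{I}})\right] \times \left(\frac{1}{M}\right)^{i} \left(1 - \frac{1}{M}\right)^{K-i-1} + B_k^{\mathrm{I}} \left(1 - \frac{1}{M}\right)^{K-1}. \tag{25}$$
Proof: See Appendix A.
Thus, a lower bound on the system outage probability can be obtained in Theorem 2
Theorem 2. A lower bound on the system outage probability is given by
$$P_{\mathrm{out}}^{\mathrm{lb}} = \frac{1}{K} \sum_{k=1}^{K} P_{\mathrm{out},k}^{\mathrm{lb}} = \frac{1}{K} \sum_{k=1}^{K} \left[1 - \exp\left(\frac{1 - \exp\left(\frac{f_k |L| \ln 2}{A_k^{\mathrm{II}} (\gamma_{\mathrm{th}} f_k - d_k)}\right)}{\lambda_{m_k^*} \zeta_{m_k^*}}\right) \times \left(1 - \prod_{m=1}^{M} \left(1 - \exp\left(\frac{1 - \exp\left(\frac{f_k |L| \ln 2}{B_k^{\mathrm{I}} (\gamma_{\mathrm{th}} f_k - d_k)}\right)}{\lambda_{k,m} \zeta_k}\right)\right)\right)\right]. \tag{26}$$
Proof: By applying Theorem 1 into (20), the lower bound on the system outage probability can be proved.
Note that the above bound contains elementary functions only, which can be easily computed. Therefore, the system outage probability can be easily evaluated in the whole range of SNR.
To obtain more insights on the system design of the relay-assisted FEEL, we use (26) to provide an approximate expression for $P_{\mathrm{out}}^{\mathrm{lb}}$, when high SNR region is assumed
$$P_{\mathrm{out}}^{\mathrm{lb}} \simeq \frac{1}{K} \sum_{k=1}^{K} \left[1 - \left(1 - \prod_{m=1}^{M} \frac{\exp\left(\frac{f_k |L| \ln 2}{B_k^{\mathrm{I}} (\gamma_{\mathrm{th}} f_k - d_k)}\right) - 1}{\lambda_{k,m} \zeta_k}\right) \times \left(1 - \frac{\exp\left(\frac{f_k |L| \ln 2}{A_k^{\mathrm{II}} (\gamma_{\mathrm{th}} f_k - d_k)}\right) - 1}{\lambda_{m_k^*} \zeta_{m_k^*}}\right)\right], \tag{27}$$
where the Taylor series approximation of $\lim_{x \to 0} e^{-x} \simeq 1 - x$ is applied [42]. We further use the approximation of $\lim_{x \to 0, y \to 0} 1 - (1 - x)(1 - y) \simeq x + y$ and get the asymptotic expression of $P_{\mathrm{out}}^{\mathrm{lb}}$ for high SNR as
$$P_{\mathrm{out}}^{\mathrm{lb}} \simeq \frac{1}{K} \sum_{k=1}^{K} \left[\underbrace{\prod_{m=1}^{M} \frac{\exp\left(\frac{f_k |L| \ln 2}{B_k^{\mathrm{I}} (\gamma_{\mathrm{th}} f_k - d_k)}\right) - 1}{\lambda_{k,m} \zeta_k}}_{O_1} + \underbrace{\frac{\exp\left(\frac{f_k |L| \ln 2}{A_k^{\mathrm{II}} (\gamma_{\mathrm{th}} f_k - d_k)}\right) - 1}{\lambda_{m_k^*} \zeta_{m_k^*}}}_{O_2}\right] = P_{\mathrm{out}}^{\mathrm{asy}}. \tag{28}$$
Note that the above asymptotic expression contains two parts, where the first part $O_1$ depends on the transmission between users and relays, while the second part $O_2$ depends on the transmission between relays and edge server. From $P_{\mathrm{out}}^{\mathrm{asy}}$, several insights on the FL system can be obtained,
• The first part $O_1$ decays exponentially with factor $M$, which indicates that the $M$ intermediate relays can be fully exploited.
• When relay number $M$ is large, the first part $O_1$ approaches to 0, and the second part $O_2$ will dominate in the system outage probability, indicating that the transmission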
between the relays and edge server becomes the system bottleneck.
• The outage performance of the relay-assisted FEEL system improves with a larger $\lambda_{k,m}$ and $\lambda_{m_k^*}$, revealing that a better transmission channel can enhance FL transmission.
• Both $O_1$ and $O_2$ are decreasing with respect to $B_k^{\mathrm{I}}$ and $A_k^{\mathrm{II}}$, indicating that a larger bandwidth of user $U_k$ and intermediate relays $m_k^*$ will improve the system outage performance.
C. Convergence Analysis
The convergence of the relay-assisted FEEL is now analyzed, which is of vital importance for the FL training. For this purpose, we first introduce the following assumptions,
Assumption 1: For any user $U_k$, $F_k(\cdot)$ is $\mu$-strongly convex, i.e., for any $w_0$ and $w_1$,
$$F_k(w_1) \ge F_k(w_0) + (w_1 - w_0)^T \nabla F_k(w_0) + \frac{\mu}{2} \|w_1 - w_0\|^2. \tag{29}$$
Assumption 2: For any user $U_k$, $F_k(\cdot)$ is $L$-smooth, i.e., for any $w_0$ and $w_1$,
$$F_k(w_1) \le F_k(w_0) + (w_1 - w_0)^T \nabla F_k(w_0) + \frac{L}{2} \|w_1 - w_0\|^2. \tag{30}$$
Assumption 3: For $\xi_k$ uniformly and randomly sampled from the local dataset $\mathcal{D}_k$, the variance of user $U_k$ is bounded for all $k$ by
$$\mathbb{E} \|\nabla F_k(w; \xi_k) - \nabla F_k(w)\|^2 \le \delta_k^2. \tag{31}$$
Assumption 4: For all users, the expected second-order moment of the norm of the stochastic gradient is uniformly bounded by $\mathbb{E} \|\nabla F_k(w; \xi_k)\|^2 \le G^2$.
In addition to the above assumptions, we use the term $\Gamma = F^* - \sum_{k=1}^{N} p_k F_k^*$ to quantify the degree of non-i.i.d., where $F^*$ and $F_k^*$ are the minimum values of $F$ and $F_k$, respectively. We can find from the definition of $\Gamma$ that the data distribution is i.i.d. if $\Gamma = 0$, or non-i.i.d. otherwise. Moreover, in order to simplify the analysis, we change the timeline to SGD iterations and assume that all users have the same $e$ SGD iterations in the convergence analysis.
From the above assumptions, the convergence performance of the relay-assisted FEEL can be analyzed, which is presented in Theorem 3.
Theorem 3. Under Assumptions 1-4, with $\psi = \max\{8 \frac{L}{\mu}, e\}$ and $\eta_t = \frac{2}{\mu(\psi + t)}$, the convergence should satisfy
$$\mathbb{E}[F(w^T) - F^*] \le \frac{L}{\mu(\psi + T)} \left[\frac{2}{\mu} \left(\sum_{k=1}^{N} p_k^2 \delta_k^2 + 6L\Gamma + 8(e - 1)^2 G^2 + 4e^2 G^2 H\right) + \frac{\mu \psi}{2} \|w^0 - w^*\|^2\right], \tag{32}$$
where $H = \sum_{k=1}^{N} p_k \frac{N - K(1 - P_{\mathrm{out}})}{K(1 - P_{\mathrm{out}})}$, and $w^0$ is the initial value of the global model weights.
Proof: See Appendix B.
From Theorem 3, we can conclude that for the relay-assisted FEEL with partial user participation and user dropout, the terms of $\sum_{k=1}^{N} p_k^2 \delta_k^2$, $6L\Gamma$, $8(e - 1)^2 G^2$, and $4e^2 G^2 H$ dominate the convergence performance. Specifically, the term $\sum_{k=1}^{N} p_k^2 \delta_k^2$ is related to the mini-batch SGD used in the local training, and the term $6L\Gamma$ is related to the non-i.i.d. distribution of user data. In particular, the convergence upper bound decreases monotonically as $\Gamma$ decreases, and when $\Gamma$ becomes zero, i.e., i.i.d. dataset, the term $6L\Gamma$ can be removed. Moreover, the terms $8(e - 1)^2 G^2$ and $4e^2 G^2 H$ are both related to the distributed SGD algorithm and the model aggregation, where the term $4e^2 G^2 H$ also shows that the effective number of participated users directly affects the convergence upper bound, revealing that a larger outage probability will deteriorate the convergence rate seriously. Thus, it is critical to enhance the convergence performance through reducing the number of users dropped from the FEEL training, by designing a bandwidth allocation scheme for the considered system.
IV. BANDWIDTH ALLOCATION
Inspired by the above convergence results in Theorem 3, which show that more users successfully participating in each round’s learning process can improve the convergence, problem P0 is reformulated as maximizing the number of successfully participating users in each round’s FL by allocating the wireless bandwidth among users and intermediate relays, given by
$$\text{P1:} \quad \max_{\{B_k^{\mathrm{I}}, B_m^{\mathrm{II}} | U_k \in \mathcal{U}, R_m \in \mathcal{R}\}} K_{\mathrm{eff}} = \sum_{k=1}^{K} I(T_k^{\mathrm{total}} \le \gamma_{\mathrm{th}}) \tag{33a}$$
$$\text{s.t.} \quad \sum_{R_m \in \mathcal{R}} B_m^{\mathrm{II}} \le B_{\mathrm{total}}, \tag{33b}$$
$$\sum_{U_k \in \mathcal{J}_m} B_k^{\mathrm{I}} = B_m^{\mathrm{II}}, \tag{33c}$$
where (33b) and (33c) are the bandwidth constraints at the relays and users, respectively. These two bandwidth constraints also indicate that multiple users will collaborate or compete with each other in the frequency domain, which can be found in many application scenarios where the users employ some orthogonal frequency resources to communicate, such as OFDMA systems. On the other hand, if the users employ the same frequency resource to communicate simultaneously, co-channel interference will arise, and multiple users have to collaborate or compete in some other domains, such as the power domain in multiuser NOMA systems. In this case, the proposed framework of performance analysis and system optimization in this paper is still applicable, and the results in this work can serve as a useful benchmark for the federated learning with multiuser interference, which can help obtain some insights on the system design.
In the following, the optimization problem is solved by exploiting the instantaneous or statistical CSI, where flexible choices can be provided for the system optimization.
A. Instantaneous Bandwidth Allocation
For the instantaneous bandwidth allocation method, the edge server needs to make bandwidth allocation decision at each
time slot, so that the instantaneous bandwidth allocation tends to be used in the system which is sensitive to the performance of communication and training. Due to the indicator function and the coupling of constraints (33b) and (33c), the problem P1 is hard to solve directly. Thus, we propose to solve this problem by dividing it into two sub-problems: minimizing the total bandwidth required for the selected users and choosing some users to be dropped out from the FEEL process. Specifically, for the first sub-problem, we relax the problem P1 by removing the bandwidth constraint (33b), so that all the relays can be allocated the required bandwidth, in order to support the selected users to successfully participate in the FEEL process. The first sub-problem can be given by
$$\text{P2:} \quad \min_{\{B_k^{\mathrm{I}}, \alpha_{k,m} | U_k \in \mathcal{U}, R_m \in \mathcal{R}\}} \sum_{R_m \in \mathcal{R}} B_m^{\mathrm{II}} \tag{34a}$$
$$\text{s.t.} \quad T_k^{\mathrm{local}} + \frac{|L|}{\alpha_{k,m} B_m^{\mathrm{II}} r_{k,m}^{\mathrm{I}}} + \frac{|L|}{B_m^{\mathrm{II}} r_m^{\mathrm{II}}} \le \gamma_{\mathrm{th}}, \quad \forall U_k \in \mathcal{U}, \tag{34b}$$
$$\sum_{U_k \in \mathcal{J}_m} \alpha_{k,m} = 1, \tag{34c}$$
$$0 \le \alpha_{k,m} \le 1, \tag{34d}$$
where we have $r_{k,m}^{\mathrm{I}} = \log_2\left(1 + \frac{P_k |h_{k,m}|^2}{\sigma^2}\right)$, $r_m^{\mathrm{II}} = \log_2\left(1 + \frac{P_m |g_m|^2}{\sigma^2}\right)$, and $\alpha_{k,m}$ is the bandwidth allocation ratio from relay $R_m$ to user $U_k$, which satisfies $0 \le \alpha_{k,m} \le 1$ and $\sum_{U_k \in \mathcal{J}_m} \alpha_{k,m} = 1$. Constraint (34b) guarantees that all users can successfully participate in the training process. Constraints (34c) and (34d) are the reformulation of (33c) using $B_k^{\mathrm{I}} = \alpha_{k,m} B_m^{\mathrm{II}}$ as the bandwidth allocated to user $U_k$ from relay $R_m$. We can find that the optimal solution of P2 should satisfy the conditions given in Theorem 4,
Theorem 4. For relay $R_m$, the optimal $B_m^{\mathrm{II}*}$ and $\alpha_{k,m}^*$ to solve problem P2 should satisfy
$$T_k^{\mathrm{local}} + \frac{|L|}{\alpha_{k,m}^* B_m^{\mathrm{II}*} r_{k,m}^{\mathrm{I}}} + \frac{|L|}{B_m^{\mathrm{II}*} r_m^{\mathrm{II}}} = \gamma_{\mathrm{th}}, \tag{35a}$$
$$\sum_{U_k \in \mathcal{J}_m} \alpha_{k,m}^* = 1, \tag{35b}$$
$$0 \le \alpha_{k,m}^* \le 1, \tag{35c}$$
$$B_m^{\mathrm{II}} \ge 0. \tag{35d}$$
Proof: See Appendix C.
From Theorem 4, we can observe that there is one and only one solution to (35) because of the monotonicity and non-trivial value of $\alpha_{k,m}^*$ and $B_m^{\mathrm{II}*}$. Moreover, with a given $B_m^{\mathrm{II}*}$, we can get the optimal value of $\alpha_{k,m}$ as
$$\alpha_{k,m}^* = \frac{r_m^{\mathrm{II}} |L|}{B_m^{\mathrm{II}*} (\gamma_{\mathrm{th}} - T_k^{\mathrm{local}}) r_{k,m}^{\mathrm{I}} r_m^{\mathrm{II}} - r_{k,m}^{\mathrm{I}} |L|}. \tag{36}$$
With (36), we can obtain a numerical value of $B_m^{\mathrm{II}*}$ by using an efficient searching algorithm based on the bisection method, as shown in Algorithm 1. In particular, we start the search with the middle point $B_{\mathrm{mid}}$ of an initial range $[B_{\mathrm{lower}}, B_{\mathrm{upper}}]$. With $B_{\mathrm{mid}}$ as the bandwidth allocated to relay $R_m$, we then calculate $\alpha_{k,m}$ for each user $U_k \in \mathcal{J}_m$ and sum up all $\alpha_{k,m}$. Since $\alpha_{k,m}$ in (36) decreases as the relay bandwidth grows, a sum larger than 1 means that $B_{\mathrm{mid}}$ is too small. Hence, by comparing $\sum_{U_k \in \mathcal{J}_m} \alpha_{k,m}$ with 1, we can halve the search region with $B_{\mathrm{lower}} = B_{\mathrm{mid}}$ if $\sum_{U_k \in \mathcal{J}_m} \alpha_{k,m} > 1$, or halve the search region with $B_{\mathrm{upper}} = B_{\mathrm{mid}}$ otherwise. The search process will continue until the constraint (35b) is satisfied, which finally outputs the optimal $\alpha_{k,m}$ for $U_k \in \mathcal{J}_m$ and $B_m^{\mathrm{II}}$.
Algorithm 1: Bisection search of $B_m^{\mathrm{II}*}$ and $\alpha_{k,m}^*$
Input $B_{\mathrm{total}}$, $\mathcal{J}_m$;
$B_{\mathrm{lower}} = 0$, $B_{\mathrm{upper}} = B_{\mathrm{total}}$;
while $B_{\mathrm{lower}} < B_{\mathrm{upper}}$ do
    $B_{\mathrm{mid}} = (B_{\mathrm{lower}} + B_{\mathrm{upper}})/2$;
    For $U_k \in \mathcal{J}_m$, calculate the bandwidth ratio $\alpha_{k,m}$ according to (36) with $B_{\mathrm{mid}}$;
    if $\sum_{U_k \in \mathcal{J}_m} \alpha_{k,m} > 1$ then
        $B_{\mathrm{lower}} = B_{\mathrm{mid}}$;
    else if $\sum_{U_k \in \mathcal{J}_m} \alpha_{k,m} < 1$ then
        $B_{\mathrm{upper}} = B_{\mathrm{mid}}$;
    else if $\sum_{U_k \in \mathcal{J}_m} \alpha_{k,m} = 1$ then
        $B_m^{\mathrm{II}*} = B_{\mathrm{mid}}$, $\alpha_{k,m}^* = \alpha_{k,m}$, $U_k \in \mathcal{J}_m$;
        break;
    end
end
Output $B_m^{\mathrm{II}*}$, $\{\alpha_{k,m}^* | U_k \in \mathcal{J}_m\}$
We proceed to solve the second sub-problem when the total bandwidth needed exceeds the system total bandwidth, i.e., $\sum_{R_m \in \mathcal{R}} B_m^{\mathrm{II}} > B_{\mathrm{total}}$. In this case, the participating users should be adjusted and certain users have to be dropped out to satisfy the bandwidth constraint (33b). Here, a greedy algorithm is utilized to solve the second sub-problem. Specifically, the user with the largest $\alpha_{k,m} B_m^{\mathrm{II}}$, i.e., the user that occupies the largest bandwidth, will be dropped out from the FEEL process. After removing the first dropped user, we continue to solve problem P2 until the constraint $\sum_{R_m \in \mathcal{R}} B_m^{\mathrm{II}} \le B_{\mathrm{total}}$ is satisfied. In this way, we finally solve problem P1 with the instantaneous CSI. The greedy based bandwidth allocation algorithm with the instantaneous CSI is summarized in Algorithm 2.
B. Statistical Bandwidth Allocation
Besides the above instantaneous BA method, we also provide a statistical bandwidth allocation, which is performed once for many time slots and is applicable to systems that are sensitive to the computational complexity of bandwidth allocation, at the price of some performance deterioration compared with the instantaneous bandwidth allocation. In this case, we turn problem P1 into optimizing the statistical expectation of the successfully participated user number in
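Algorithm 2: Greedy based bandwidth allocation algorithm
Input $\mathcal{U}$, $\mathcal{R}$, $\mathcal{J}_m$, $B_{\mathrm{total}}$;
For $R_m \in \mathcal{R}$, solve $B_m^{\mathrm{II}*}$, $\alpha_{k,m}^*$ using Algorithm 1;
while $\sum_{R_m \in \mathcal{R}} B_m^{\mathrm{II}*} > B_{\mathrm{total}}$ do
    $U_{k'}, R_{m'} = \arg\max_{U_k \in \mathcal{U}, R_m \in \mathcal{R}} \alpha_{k,m}^* B_m^{\mathrm{II}*}$;
    $B_{k'}^{\mathrm{I}} = 0$;
    $\mathcal{U} = \mathcal{U} \setminus U_{k'}$, $\mathcal{J}_{m'} \setminus U_{k'}$;
    Solve $B_m^{\mathrm{II}*}$, $\alpha_{k,m}^*$ using Algorithm 1 with $\mathcal{U}$ and
Algorithm 3: PSO based bandwidth allocation algorithm
Input $\mathcal{U}$, $\mathcal{R}$, $\mathcal{J}_m$, $B_{\mathrm{total}}$, $I$, $T$, $\omega$, $\varphi_1$, $\varphi_2$;
Initialize Create $I$ particles randomly;
for $t = 1$ to $T$ do
    for $i = 1$ to $I$ do
        Update $v_i^t$ by (39), and update $p_i^t$ by (40);
        if $F_{\mathrm{fitness}}(p_i^t) \le F_{\mathrm{fitness}}(\mathrm{pbest}_i^t)$ then
            $\mathrm{pbest}_i^t = p_i^t$;
        end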
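The bisection search of Algorithm 1 and the greedy user dropping of Algorithm 2 can be sketched together as follows. This is a minimal illustration rather than the authors' implementation: the function names, the dictionary layout of `relays`, the tolerance-based stopping rule and the example numbers are all assumptions. The sketch raises the lower end of the search interval when the ratios sum to more than 1, because the ratio in (36) shrinks as the relay bandwidth grows, and it treats a non-positive denominator in (36) as an infeasible (infinite) ratio.

```python
import math


def alpha_ratio(b_relay, model_bits, slack, r1, r2):
    """Eq. (36): fraction of the relay bandwidth a user needs to meet the deadline.
    slack = gamma_th - T_local; r1, r2 are the spectral efficiencies of the two hops."""
    denom = b_relay * slack * r1 * r2 - r1 * model_bits
    if denom <= 0.0:
        return math.inf  # assumption: the deadline cannot be met at this bandwidth
    return r2 * model_bits / denom


def min_relay_bandwidth(users, r2, model_bits, b_max, tol=1e-9):
    """Bisection for the smallest relay bandwidth whose ratios sum to 1.
    users is a list of (slack, r1) pairs; returns math.inf if b_max is not enough."""
    if not users:
        return 0.0

    def alpha_sum(b):
        return sum(alpha_ratio(b, model_bits, s, r1, r2) for s, r1 in users)

    if alpha_sum(b_max) > 1.0:
        return math.inf
    lo, hi = 0.0, b_max
    while hi - lo > tol * b_max:
        mid = 0.5 * (lo + hi)
        if alpha_sum(mid) > 1.0:
            lo = mid  # too little bandwidth: the users need more than the whole band
        else:
            hi = mid
    return hi


def greedy_allocation(relays, model_bits, b_total):
    """Drop the most bandwidth-hungry user until the relays fit into b_total.
    relays maps a relay id to {"r2": float, "users": {user_id: (slack, r1)}}."""
    groups = {m: dict(info["users"]) for m, info in relays.items()}
    while True:
        need = {m: min_relay_bandwidth(list(groups[m].values()), relays[m]["r2"],
                                       model_bits, b_total) for m in groups}
        if sum(need.values()) <= b_total:
            return need, groups
        worst, worst_bw = None, -1.0
        for m, users in groups.items():
            b = min(need[m], b_total)
            for k, (slack, r1) in users.items():
                bw = alpha_ratio(b, model_bits, slack, r1, relays[m]["r2"]) * b
                if bw > worst_bw:
                    worst, worst_bw = (m, k), bw
        del groups[worst[0]][worst[1]]


# Hypothetical example: user u2 has a tighter latency budget and is dropped.
relays = {"R1": {"r2": 2.0, "users": {"u1": (1.0, 2.0)}},
          "R2": {"r2": 2.0, "users": {"u2": (0.5, 2.0)}}}
need, groups = greedy_allocation(relays, model_bits=1e6, b_total=1.5e6)
print(need, groups)
```

For a single user the bisection can be checked against the closed form obtained by setting the ratio in (36) to 1, which gives a required bandwidth of $|L|(r^{\mathrm{I}} + r^{\mathrm{II}})/((\gamma_{\mathrm{th}} - T^{\mathrm{local}}) r^{\mathrm{I}} r^{\mathrm{II}})$.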
[Figure: outage probability versus transmit SNR (dB), with simulation, analytical lower-bound (LB) and asymptotic LB curves for M = 1 and M = 2.]
[Figure: (a) test accuracy versus communication round for the Instantaneous BA, Statistical BA, UA, UA-wo-PA and Ideal FEEL schemes.]
[Figure: (a) test accuracy versus communication round for the Instantaneous BA, Statistical BA, UA, UA-wo-PA and Ideal FEEL schemes.]
[Figure: test accuracy versus latency threshold γth (s) for the Instantaneous BA, Statistical BA, UA and UA-wo-PA schemes with M = 1 and M = 2, and for Ideal FEEL.]
[Figure: (b) training loss versus communication round.]
Fig. 4. Test accuracy and training loss through aggregating the normalized cumulative gradients.
and statistical bandwidth allocation schemes outperform UA, showing the effectiveness of the two bandwidth allocation schemes. Furthermore, the instantaneous bandwidth allocation scheme can achieve a better near-optimal convergence rate and test accuracy than the statistical bandwidth allocation scheme, indicating that the instantaneous CSI can help maximize the number of users who can successfully participate in FL at each round more effectively.
Fig. 4(a) and Fig. 4(b) show the test accuracy and training loss of the aforementioned BA schemes versus the communication round through aggregating the normalized cumulative gradients, where Btotal = 60 MHz and γth = 1.2 s. We can observe that the proposed instantaneous and statistical BA schemes outperform UA, showing the effectiveness of the two bandwidth allocation schemes when aggregating the normalized cumulative gradients. Moreover, aggregating the normalized cumulative gradients can provide a better performance, with a test accuracy improvement of 1%-1.5% over simply aggregating the trained models in the FedAvg, which [...] gradients in the aggregation can help solve the inconsistency problem and enhance the system performance.
Fig. 5 is provided to show the test accuracy of the several BA schemes versus γth, where M ∈ {1, 2} and the system latency threshold varies from 0.8 s to 1.8 s. We can observe that for all the aforementioned schemes except the ideal FEEL one, the test accuracy improves with a larger system threshold, as a larger threshold can allow more users to successfully participate in FEEL. Moreover, for all the aforementioned schemes, the performances with two relays are better than those with only one relay, since more relays can help improve the model transmission rate. Furthermore, the UA and UA-wo-PA schemes have a lower test accuracy than the instantaneous and statistical BA schemes. In particular, when the latency threshold is low, the relay-assisted FEEL system using the UA and UA-wo-PA schemes cannot even train an effective model. This is because only very few users can successfully participate in FEEL under those schemes. However, the proposed instantaneous and statistical BA schemes can achieve sufficiently good performance for various latency thresholds, which shows that the instantaneous and statistical BA schemes can provide a feasible bandwidth allocation strategy for the relay-assisted FEEL.
Fig. 6 shows the impact of Btotal on the test accuracy of the several bandwidth allocation schemes, where the relay number M ∈ {1, 2}, γth = 1.2 s, and Btotal varies from 50 MHz to 100 MHz. The test accuracy improvements are observed for all the aforementioned schemes except the ideal FEEL one, as Btotal increases, indicating that a larger bandwidth can help increase the transmission rate of the models. Moreover, we can see that with the number of relays increasing from 1 to 2, all the bandwidth allocation schemes improve, because more relays can help enhance the outage performance and allow more users to successfully participate in FEEL. Furthermore, the proposed instantaneous and statistical BA schemes
outperform the other bandwidth allocation schemes for a wide range of $B_{total}$, and they can achieve almost the same accuracy as the ideal FEEL. These results further verify the proposed bandwidth allocation schemes.

Fig. 7. Test accuracy of the several BA schemes versus $M$ (horizontal axis: relay number $M$; vertical axis: test accuracy; curves: Instantaneous BA, Statistical BA, UA, UA-wo-PA, Ideal FEEL).

In Fig. 7, the influence of the relay number on the test accuracy of the several BA schemes is studied, where the relay number varies from 1 to 5, $B_{total} = 60$ MHz, and $\gamma_{th} = 1.2$ s. We can observe from this figure that all the bandwidth allocation schemes improve with a larger $M$, as spatial diversity and better transmission connections can be provided for model uploading. Moreover, the proposed instantaneous and statistical BA schemes are superior to the other bandwidth allocation schemes, including the UA and UA-wo-PA schemes. In particular, when there are four relays in the network, the proposed instantaneous and statistical BA schemes can achieve a better test accuracy, at least 5.1% and 3.8% higher than that of the UA and UA-wo-PA schemes. These results indicate that the proposed instantaneous and statistical BA schemes can efficiently exploit multiple relays and improve the performance of the relay-assisted FEEL.

In this article, a relay-assisted FEEL system was studied under latency and bandwidth constraints, where we evaluated the

APPENDIX A
PROOF OF THEOREM 1

To prove Theorem 1, we substitute (23) into (22), and then the lower bound of $P_{out,k}$ is

$$
\begin{aligned}
P_{out,k} &\ge \Pr\left[\frac{1}{|L|}\min\left(R^{I}_{k,m_k^*}, R^{II}_{m_k^*}\right) < \frac{1}{\gamma_{th}-T_k^{local}}\right] \\
&= 1 - \Pr\left[\frac{1}{|L|}\min\left(R^{I}_{k,m_k^*}, R^{II}_{m_k^*}\right) \ge \frac{1}{\gamma_{th}-T_k^{local}}\right] \\
&= 1 - \left[1 - \Pr\left(R^{I}_{k,m_k^*} < \frac{|L|}{\gamma_{th}-T_k^{local}}\right)\right] \times \left[1 - \Pr\left(R^{II}_{m_k^*} < \frac{|L|}{\gamma_{th}-T_k^{local}}\right)\right].
\end{aligned}
\tag{A.1}
$$

After some manipulations, we further have

$$
\begin{aligned}
P_{out,k} &\ge 1 - \left[1 - \Pr\left(|h_{k,m_k^*}|^2 < \frac{2^{\frac{|L|}{B_k^{I}(\gamma_{th}-T_k^{local})}} - 1}{\zeta_k}\right)\right] \times \left[1 - \Pr\left(|g_{m_k^*}|^2 < \frac{2^{\frac{|L|}{B^{II}_{m_k^*}(\gamma_{th}-T_k^{local})}} - 1}{\zeta_{m_k^*}}\right)\right] \\
&= 1 - \left[1 - \prod_{m=1}^{M}\Pr\left(|h_{k,m}|^2 < \frac{2^{\frac{|L|}{B_k^{I}(\gamma_{th}-T_k^{local})}} - 1}{\zeta_k}\right)\right] \times \left[1 - \Pr\left(|g_{m_k^*}|^2 < \frac{2^{\frac{|L|}{B^{II}_{m_k^*}(\gamma_{th}-T_k^{local})}} - 1}{\zeta_{m_k^*}}\right)\right].
\end{aligned}
\tag{A.2}
$$
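As a numerical sanity check of the bound in (A.1) and (A.2), the following Python sketch compares a Monte Carlo estimate of the outage probability with the closed-form value of the bounding event. It is only an illustration under assumptions that are not taken from the paper: the channel gains are exponentially distributed (Rayleigh fading), the relay with the largest first-hop gain is selected, the second-hop bandwidth is fixed rather than random, and all names and parameter values (`lam_h`, `lam_g`, `b1`, `b2`, `bits`, `tau`) are hypothetical.

```python
import math
import random


def outage_lower_bound(lam_h, lam_g, zeta_k, zeta_m, b1, b2, bits, tau):
    """Closed-form value of the bound in (A.2), assuming exponential gains.

    lam_h: mean first-hop gains, one per relay; lam_g: mean second-hop gain.
    tau stands for gamma_th - T_local, bits for the model size |L|.
    """
    x1 = (2.0 ** (bits / (b1 * tau)) - 1.0) / zeta_k
    x2 = (2.0 ** (bits / (b2 * tau)) - 1.0) / zeta_m
    # Best-relay selection: the largest gain is below x1 only if all gains are.
    p_first = 1.0
    for lam in lam_h:
        p_first *= 1.0 - math.exp(-x1 / lam)
    p_second = 1.0 - math.exp(-x2 / lam_g)
    return 1.0 - (1.0 - p_first) * (1.0 - p_second)


def simulate(lam_h, lam_g, zeta_k, zeta_m, b1, b2, bits, tau, trials, seed=0):
    """Return (empirical outage probability, empirical bounding-event rate)."""
    rng = random.Random(seed)
    outage = 0
    bound_event = 0
    for _ in range(trials):
        # expovariate takes the rate, i.e. the inverse of the mean gain.
        h2 = max(rng.expovariate(1.0 / lam) for lam in lam_h)
        g2 = rng.expovariate(1.0 / lam_g)
        r1 = b1 * math.log2(1.0 + zeta_k * h2)
        r2 = b2 * math.log2(1.0 + zeta_m * g2)
        if min(r1, r2) < bits / tau:
            bound_event += 1
        # True outage: the two-hop upload does not fit into the budget tau.
        if r1 <= 0.0 or r2 <= 0.0 or bits / r1 + bits / r2 > tau:
            outage += 1
    return outage / trials, bound_event / trials


if __name__ == "__main__":
    args = ([1.0, 1.0], 1.0, 10.0, 10.0, 1e6, 1e6, 1e6, 0.5)
    print("closed-form bound:", outage_lower_bound(*args))
    print("simulated (outage, bounding event):", simulate(*args, trials=20000))
```

Because the bounding event (the slower hop alone exceeds the budget) implies an outage in every sample, the simulated outage rate can never fall below the simulated rate of the bounding event, which is the inequality used in (A.1).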
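As $|h_{k,m}|^2$ and $|g_{m_k^*}|^2$ are exponentially distributed with $E[|h_{k,m}|^2] = \lambda_{k,m}$ and $E[|g_{m_k^*}|^2] = \lambda_{m_k^*}$, the analytical lower bound on $P_{out,k}$ is written as

$$
P_{out,k} \ge 1 - \left[1 - \prod_{m=1}^{M}\left(1 - \exp\left(\frac{1 - \exp\left(\frac{|L|\ln 2}{B_k^{I}(\gamma_{th}-T_k^{local})}\right)}{\lambda_{k,m}\zeta_k}\right)\right)\right] \times E\left[\exp\left(\frac{1 - \exp\left(\frac{|L|\ln 2}{B^{II}_{m_k^*}(\gamma_{th}-T_k^{local})}\right)}{\lambda_{m_k^*}\zeta_{m_k^*}}\right)\right].
\tag{A.3}
$$

We can observe from (A.3) that $\exp\left(\left(1 - \exp\left(\frac{|L|\ln 2}{B^{II}_{m_k^*}(\gamma_{th}-T_k^{local})}\right)\right) / \left(\lambda_{m_k^*}\zeta_{m_k^*}\right)\right)$ is concave for a positive $B^{II}_{m_k^*}$. By using Jensen's inequality, we have

Proof: Using (B.1), we have

$$
\begin{aligned}
E[\bar{w}^t] &= E\left[w^{t-e} + \sum_{k=1}^{N}\frac{I(k\in\mathcal{K}, T_k^{total}\le\gamma_{th})\,|D_k|}{\sum_{k\in\mathcal{K}} I(T_k^{total}\le\gamma_{th})\,|D_k|}\,(v_k^t - w^{t-e})\right] \\
&= w^{t-e} + \sum_{k=1}^{N}\frac{|D_k|}{\frac{K}{N}(1-P_{out})|D|}\, E\left[I(k\in\mathcal{K}, T_k^{total}\le\gamma_{th})\right](v_k^t - w^{t-e}) \\
&= w^{t-e} + \sum_{k=1}^{N} p_k\,(v_k^t - w^{t-e}) \\
&= \sum_{k=1}^{N} p_k v_k^t = \bar{v}^t.
\end{aligned}
\tag{B.2}
$$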
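The middle step of (B.2) uses the fact that a user is selected and meets the latency threshold with probability $K(1-P_{out})/N$, so that scaling each successful update by the inverse of this probability removes the bias. A minimal Python sketch of this argument is given below. It is not the authors' implementation: it assumes a scalar model, uniform selection of $K$ out of $N$ users without replacement, a common outage probability that is independent across users, and the fixed normalization of the second line of (B.2) rather than the random denominator of the first line. The function names and all numerical values are hypothetical.

```python
import random


def aggregate_once(rng, w_prev, v, p, num_selected, p_out):
    """One aggregation round with inverse-probability scaling (scalar model).

    v[k] is the local model of user k, p[k] its data weight |D_k|/|D|.
    """
    n = len(v)
    # Inverse of the participation probability K(1 - P_out)/N.
    scale = n / (num_selected * (1.0 - p_out))
    w_bar = w_prev
    for k in rng.sample(range(n), num_selected):
        # Assumption: each selected user misses the threshold independently.
        if rng.random() >= p_out:
            w_bar += p[k] * scale * (v[k] - w_prev)
    return w_bar


def mean_aggregate(w_prev, v, p, num_selected, p_out, trials, seed=0):
    """Average the aggregated model over many independent rounds."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        total += aggregate_once(rng, w_prev, v, p, num_selected, p_out)
    return total / trials


if __name__ == "__main__":
    v = [0.1 * k for k in range(10)]   # hypothetical local models
    p = [0.1] * 10                     # equal data weights
    v_bar = sum(pk * vk for pk, vk in zip(p, v))
    est = mean_aggregate(0.5, v, p, num_selected=4, p_out=0.2, trials=50000)
    print("weighted average of local models:", v_bar)
    print("empirical mean of aggregated model:", est)
```

With enough rounds, the empirical mean of the aggregated model should approach the weighted average of the local models, which is the statement $E[\bar{w}^t] = \bar{v}^t$ under the assumptions listed above.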
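Using the convexity of the second-order norm, we further have

$$
\begin{aligned}
Q_1 &\le E\left[\sum_{k=1}^{N} p_k \left\|\left(\frac{N\cdot I(k\in\mathcal{K}, T_k^{total}\le\gamma_{th})}{K(1-P_{out})} - 1\right)\left(v_k^{t+1} - w^{t+1-e}\right)\right\|^2\right] \\
&= \sum_{k=1}^{N} p_k\cdot\frac{N - K(1-P_{out})}{K(1-P_{out})}\, E\left[\left\|v_k^{t+1} - w^{t+1-e}\right\|^2\right].
\end{aligned}
\tag{B.6}
$$

We can write $E\left[\left\|v_k^{t+1} - w^{t+1-e}\right\|^2\right]$ in (B.6) as

$$
\begin{aligned}
E\left[\left\|v_k^{t+1} - w^{t+1-e}\right\|^2\right] &= E\left[\left\|\sum_{i=t+1-e}^{t}\eta_i\nabla F_k(w_k^i;\xi_k^i)\right\|^2\right] \\
&\le E\left[e\cdot\sum_{i=t+1-e}^{t}\left\|\eta_i\nabla F_k(w_k^i;\xi_k^i)\right\|^2\right] \\
&\le E\left[\eta_{t+1-e}^2\, e\cdot\sum_{i=t+1-e}^{t}\left\|\nabla F_k(w_k^i;\xi_k^i)\right\|^2\right] \\
&\le \eta_{t+1-e}^2 e^2 G^2 \le 4\eta_{t+1}^2 e^2 G^2 \le 4\eta_t^2 e^2 G^2,
\end{aligned}
\tag{B.7}
$$

where we use the Cauchy-Schwarz inequality for the first inequality, and we assume that $\eta_t$ is non-increasing with respect to $t$ and $\eta_t \le 2\eta_{t+e}$ to derive the other inequalities. Then, we can bound the first part $Q_1$ as

$$
Q_1 = E\left[\left\|\bar{w}^{t+1} - \bar{v}^{t+1}\right\|^2\right] \le \sum_{k=1}^{N} p_k\cdot\frac{N - K(1-P_{out})}{K(1-P_{out})}\cdot 4\eta_t^2 e^2 G^2.
\tag{B.8}
$$

For the second part $Q_2$, its bound can be found in [43] and still holds for this paper. Thus, according to [43], for any round $t$, if we choose $\psi = \max\left\{8\frac{L}{\mu}, e\right\}$ and $\eta_t = \frac{2}{\mu(\psi+t)}$, the second part $Q_2$ is bounded as

$$
Q_2 = E\left[\left\|\bar{v}^{t+1} - w^*\right\|^2\right] \le (1-\mu\eta_t)\,E\left[\left\|\bar{w}^t - w^*\right\|^2\right] + \eta_t^2\left(\sum_{k=1}^{N} p_k^2\delta_k^2 + 6L\Gamma + 8(e-1)^2G^2\right).
\tag{B.9}
$$

For the third part $Q_3$, we can derive from Lemma 2 that $Q_3$ equals 0 due to the unbiasedness of $\bar{w}^{t+1}$. By summarizing the above three parts, we have that, for any round $t$, $E\left[\left\|\bar{w}^{t+1} - w^*\right\|^2\right]$ is bounded as

$$
E\left[\left\|\bar{w}^{t+1} - w^*\right\|^2\right] \le (1-\mu\eta_t)\,E\left[\left\|\bar{w}^t - w^*\right\|^2\right] + \eta_t^2\left(\sum_{k=1}^{N} p_k^2\delta_k^2 + 6L\Gamma + 8(e-1)^2G^2 + 4e^2G^2H\right),
\tag{B.10}
$$

in which $H = \sum_{k=1}^{N} p_k\frac{N - K(1-P_{out})}{K(1-P_{out})}$. For brevity, we rewrite (B.10) as

$$
\Delta_{t+1} \le (1-\mu\eta_t)\Delta_t + \eta_t^2 C,
\tag{B.11}
$$

where $\Delta_{t+1} = E\left[\left\|\bar{w}^{t+1} - w^*\right\|^2\right]$ and

$$
C = \sum_{k=1}^{N} p_k^2\delta_k^2 + 6L\Gamma + 8(e-1)^2G^2 + 4e^2G^2H.
\tag{B.12}
$$

Then, we use the recurrence method (induction on $t$) to prove that $\Delta_t \le \frac{v}{\psi+t}$, where $v = \max\left\{\psi\Delta_0, \frac{\beta^2 C}{\mu\beta-1}\right\}$ with $\beta = \frac{2}{\mu}$, i.e., $\eta_t = \frac{\beta}{\psi+t}$. First, for $t = 0$, we have $\Delta_0 \le \frac{v}{\psi+0}$, since $v \ge \psi\Delta_0$. Next, assuming that $\Delta_t \le \frac{v}{\psi+t}$ holds for some $t \ge 0$, we have

$$
\begin{aligned}
\Delta_{t+1} &\le (1-\mu\eta_t)\Delta_t + \eta_t^2 C \\
&\le \left(1 - \frac{\mu\beta}{t+\psi}\right)\frac{v}{t+\psi} + \frac{\beta^2 C}{(t+\psi)^2} \\
&= \frac{t+\psi-1}{(t+\psi)^2}\,v + \left[\frac{\beta^2 C}{(t+\psi)^2} - \frac{\mu\beta-1}{(t+\psi)^2}\,v\right] \\
&\le \frac{v}{t+\psi+1}.
\end{aligned}
\tag{B.13}
$$

Therefore, we have

$$
\begin{aligned}
E[F(w^T) - F^*] &\le \frac{L}{2}\,E\left[\left\|\bar{w}^{T} - w^*\right\|^2\right] \\
&\le \frac{L}{\mu(\psi+T)}\left[\frac{2}{\mu}\left(\sum_{k=1}^{N} p_k^2\delta_k^2 + 6L\Gamma + 8(e-1)^2G^2 + 4e^2G^2H\right) + \frac{\mu\psi}{2}\left\|w^0 - w^*\right\|^2\right].
\end{aligned}
\tag{B.14}
$$

In this way, we have proven Theorem 3.

APPENDIX C
PROOF OF THEOREM 4

To prove Theorem 4, we start from $\alpha_{k,m} > 0$ and $B_m^{II} > 0$ to have

$$
\frac{dR^{I}_{k,m}}{d\alpha_{k,m}} = \frac{d}{d\alpha_{k,m}}\left[\alpha_{k,m}B_m^{II}\log_2\left(1 + \frac{P_k|h_{k,m}|^2}{\sigma^2}\right)\right] = B_m^{II}\log_2\left(1 + \frac{P_k|h_{k,m}|^2}{\sigma^2}\right) > 0,
\tag{C.1}
$$

and

$$
\frac{dR^{I}_{k,m}}{dB_m^{II}} = \frac{d}{dB_m^{II}}\left[\alpha_{k,m}B_m^{II}\log_2\left(1 + \frac{P_k|h_{k,m}|^2}{\sigma^2}\right)\right] = \alpha_{k,m}\log_2\left(1 + \frac{P_k|h_{k,m}|^2}{\sigma^2}\right) > 0.
\tag{C.2}
$$

From (C.1) and (C.2), we can see that $R^{I}_{k,m}$ monotonically increases with $\alpha_{k,m}$ and $B_m^{II}$. Therefore, for $\alpha_{k,m} > 0$ and $B_m^{II} > 0$, we have that $T^{I}_{k,m}$ and $T^{II}_m$ monotonically decrease with $\alpha_{k,m}$ and $B_m^{II}$. Moreover, the system latency is determined by the slowest user. Therefore, we can achieve the optimal solution of P2 if and only if: 1) all of the bandwidth $B_m^{II}$ is allocated (i.e., $\sum_{k\in\mathcal{J}_m}\alpha_{k,m} = 1$) and 2) all selected users have the same total latency of $\gamma_{th}$ (i.e.,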
$T_k^{total} = T_k^{local} + T^{I}_{k,m} + T^{II}_m = \gamma_{th}$). So the optimal solution can be given by

$$
\begin{cases}
T_k^{local} + \dfrac{|L|}{\alpha^*_{k,m}B_m^{II*}r^{I}_{k,m}} + \dfrac{|L|}{B_m^{II*}r^{II}_m} = \gamma_{th}, \\
\sum_{U_k\in\mathcal{J}_m}\alpha^*_{k,m} = 1, \\
0 \le \alpha^*_{k,m} \le 1, \\
B_m^{II*} \ge 0.
\end{cases}
\tag{C.3}
$$

Because of the monotonicity and non-trivial value of $\alpha^*_{k,m}$ and $B_m^{II*}$, there is one and only one solution to (C.3), which finishes the proof of Theorem 4.

REFERENCES

[1] W. Saad, M. Bennis, and M. Chen, “A vision of 6G wireless systems: Applications, trends, technologies, and open research problems,” IEEE Netw., vol. 34, no. 3, pp. 134–142, 2020.
[2] S. Deng, H. Zhao, W. Fang, J. Yin, S. Dustdar, and A. Y. Zomaya, “Edge intelligence: The confluence of edge computing and artificial intelligence,” IEEE Internet Things J., vol. 7, no. 8, pp. 7457–7469, 2020.
[3] X. Wang, Y. Han, C. Wang, Q. Zhao, X. Chen, and M. Chen, “In-edge AI: Intelligentizing mobile edge computing, caching and communication by federated learning,” IEEE Netw., vol. 33, no. 5, pp. 156–165, 2019.
[4] W. Zhou, J. Xia, and F. Zhou, “Profit maximization for cache-enabled vehicular mobile edge computing networks,” to appear in IEEE Trans. Veh. Technol., pp. 1–6, 2023.
[5] B. McMahan, E. Moore, D. Ramage, S. Hampson, and B. A. y Arcas, “Communication-efficient learning of deep networks from decentralized data,” in Proc. AISTATS, vol. 54, 2017, pp. 1273–1282.
[6] S. Wang, T. Tuor, T. Salonidis, K. K. Leung, C. Makaya, T. He, and K. Chan, “Adaptive federated learning in resource constrained edge computing systems,” IEEE J. Sel. Areas Commun., vol. 37, no. 6, pp. 1205–1221, 2019.
[7] W. Shi, S. Zhou, Z. Niu, M. Jiang, and L. Geng, “Joint device scheduling and resource allocation for latency constrained wireless federated learning,” IEEE Trans. Wirel. Commun., vol. 20, no. 1, pp. 453–467, 2021.
[8] A. Hammoud, H. Otrok, A. Mourad, and Z. Dziong, “On demand fog federations for horizontal federated learning in IoV,” IEEE Trans. Netw. Serv. Manag., vol. 19, no. 3, pp. 3062–3075, 2022.
[9] Z. Zhao, C. Feng, W. Hong, J. Jiang, C. Jia, T. Q. S. Quek, and M. Peng, “Federated learning with non-iid data in wireless networks,” IEEE Trans. Wirel. Commun., vol. 21, no. 3, pp. 1927–1942, 2022.
[10] Y. Zhan, J. Zhang, Z. Hong, L. Wu, P. Li, and S. Guo, “A survey of incentive mechanism design for federated learning,” IEEE Trans. Emerg. Top. Comput., vol. 10, no. 2, pp. 1035–1044, 2022.
[11] M. Wazzeh, H. Ould-Slimane, C. Talhi, A. Mourad, and M. Guizani, “Privacy-preserving continuous authentication for mobile and IoT systems using warmup-based federated learning,” IEEE Netw., pp. 1–7, 2022.
[12] G. Zhu, D. Liu, Y. Du, C. You, J. Zhang, and K. Huang, “Toward an intelligent edge: Wireless communication meets machine learning,” IEEE Commun. Mag., vol. 58, no. 1, pp. 19–25, 2020.
[13] W. Zhou, L. Fan, F. Zhou, F. Li, X. Lei, W. Xu, and A. Nallanathan, “Priority-aware resource scheduling for UAV-mounted mobile edge computing networks,” to appear in IEEE Trans. Veh. Technol., pp. 1–6, 2023.
[14] L. Xiao, Y. Ding, J. Huang, S. Liu, Y. Tang, and H. Dai, “UAV anti-jamming video transmissions with QoE guarantee: A reinforcement learning-based approach,” IEEE Trans. Commun., vol. 69, no. 9, pp. 5933–5947, 2021.
[15] L. Xiao, X. Lu, T. Xu, X. Wan, W. Ji, and Y. Zhang, “Reinforcement learning-based mobile offloading for edge computing against jamming and interference,” IEEE Trans. Commun., vol. 68, no. 10, pp. 6114–6126, 2020.
[16] R. Yu and P. Li, “Toward resource-efficient federated learning in mobile edge computing,” IEEE Netw., vol. 35, no. 1, pp. 148–155, 2021.
[17] X. Huang, P. Li, R. Yu, Y. Wu, K. Xie, and S. Xie, “Fedparking: A federated learning based parking space estimation with parked vehicle assisted edge computing,” IEEE Trans. Veh. Technol., vol. 70, no. 9, pp. 9355–9368, 2021.
[18] J. Ren, Y. He, D. Wen, G. Yu, K. Huang, and D. Guo, “Scheduling for cellular federated edge learning with importance and channel awareness,” IEEE Trans. Wirel. Commun., vol. 19, no. 11, pp. 7690–7703, 2020.
[19] X. Cao, G. Zhu, J. Xu, Z. Wang, and S. Cui, “Optimized power control design for over-the-air federated edge learning,” IEEE J. Sel. Areas Commun., vol. 40, no. 1, pp. 342–358, 2022.
[20] H. Sun, X. Ma, and R. Q. Hu, “Adaptive federated learning with gradient compression in uplink NOMA,” IEEE Trans. Veh. Technol., vol. 69, no. 12, pp. 16 325–16 329, 2020.
[21] H. Yang, J. Zhao, Z. Xiong, K. Lam, S. Sun, and L. Xiao, “Privacy-preserving federated learning for UAV-enabled networks: Learning-based joint scheduling and resource management,” IEEE J. Sel. Areas Commun., vol. 39, no. 10, pp. 3144–3159, 2021.
[22] S. Zheng, C. Shen, and X. Chen, “Design and analysis of uplink and downlink communications for federated learning,” IEEE J. Sel. Areas Commun., vol. 39, no. 7, pp. 2150–2167, 2021.
[23] M. Salehi and E. Hossain, “Federated learning in unreliable and resource-constrained cellular wireless networks,” IEEE Trans. Commun., vol. 69, no. 8, pp. 5136–5151, 2021.
[24] J. Xu and H. Wang, “Client selection and bandwidth allocation in wireless federated learning networks: A long-term perspective,” IEEE Trans. Wirel. Commun., vol. 20, no. 2, pp. 1188–1200, 2021.
[25] S. Tang, L. Chen, K. He, J. Xia, L. Fan, and A. Nallanathan, “Computational intelligence and deep learning for next-generation edge-enabled industrial IoT,” IEEE Trans. Netw. Sci. Eng., vol. PP, no. 99, pp. 1–12, 2023.
[26] Z. Zhao, J. Xia, L. Fan, X. Lei, G. K. Karagiannidis, and A. Nallanathan, “System optimization of federated learning networks with a constrained latency,” IEEE Trans. Veh. Technol., vol. 71, no. 1, pp. 1095–1100, 2022.
[27] Y. Wang, Y. Xu, Q. Shi, and T. Chang, “Quantized federated learning under transmission delay and outage constraints,” IEEE J. Sel. Areas Commun., vol. 40, no. 1, pp. 323–341, 2022.
[28] Q. Bie, Y. Liu, Y. Wang, X. Zhao, and X. Y. Zhang, “Deployment optimization of reconfigurable intelligent surface for relay systems,” IEEE Trans. Green Commun. Netw., vol. 6, no. 1, pp. 221–233, 2022.
[29] J. Xia, L. Fan, W. Xu, X. Lei, X. Chen, G. K. Karagiannidis, and A. Nallanathan, “Secure cache-aided multi-relay networks in the presence of multiple eavesdroppers,” IEEE Trans. Commun., vol. 67, no. 11, pp. 7672–7685, 2019.
[30] X. Li, R. Fan, H. Hu, N. Zhang, X. Chen, and A. Meng, “Energy-efficient resource allocation for mobile edge computing with multiple relays,” IEEE Internet Things J., vol. 9, no. 13, pp. 10 732–10 750, 2022.
[31] S. Feng, D. Niyato, P. Wang, D. I. Kim, and Y.-C. Liang, “Joint service pricing and cooperative relay communication for federated learning,” in 2019 International Conference on Internet of Things (iThings), 2019, pp. 815–820.
[32] Z. Lin, H. Liu, and Y. A. Zhang, “Relay-assisted cooperative federated learning,” IEEE Trans. Wirel. Commun., vol. 21, no. 9, pp. 7148–7164, 2022.
[33] Z. Qu, S. Guo, H. Wang, B. Ye, Y. Wang, A. Y. Zomaya, and B. Tang, “Partial synchronization to accelerate federated learning over relay-assisted edge networks,” IEEE Trans. Mob. Comput., vol. 21, no. 12, pp. 4502–4516, 2022.
[34] S. Hosseinalipour, S. Wang, N. Michelusi, V. Aggarwal, C. G. Brinton, D. J. Love, and M. Chiang, “Parallel successive learning for dynamic distributed model training over heterogeneous wireless networks,” CoRR, vol. abs/2202.02947, 2022.
[35] C. Shen, J. Xu, S. Zheng, and X. Chen, “Resource rationing for wireless federated learning: Concept, benefits, and challenges,” IEEE Commun. Mag., vol. 59, no. 5, pp. 82–87, 2021.
[36] H. Yang, M. Fang, and J. Liu, “Achieving linear speedup with partial worker participation in non-iid federated learning,” in 9th International Conference on Learning Representations, ICLR 2021, 2021.
[37] S. P. Karimireddy, S. Kale, M. Mohri, S. J. Reddi, S. U. Stich, and A. T. Suresh, “SCAFFOLD: Stochastic controlled averaging for on-device federated learning,” CoRR, vol. abs/1910.06378, 2019.
[38] J. Wang, Q. Liu, H. Liang, G. Joshi, and H. V. Poor, “Tackling the objective inconsistency problem in heterogeneous federated optimization,” in NeurIPS 2020, 2020.
[39] J. Tong and C. Zhong, “Full-duplex two-way AF relaying systems with imperfect interference cancellation in Nakagami-m fading channels,” Sci. China Inf. Sci., vol. 64, no. 8, 2021.
[40] O. Waqar, H. Tabassum, and R. Adve, “Secure beamforming and ergodic secrecy rate analysis for amplify-and-forward relay networks with wireless powered jammer,” IEEE Trans. Veh. Technol., vol. 70, no. 4, pp. 3908–3913, 2021.
[41] L. Fan, X. Lei, T. Q. Duong, M. Elkashlan, and G. K. Karagiannidis, “Secure multiuser communications in multiple amplify-and-forward relay networks,” IEEE Trans. Commun., vol. 62, no. 9, pp. 3299–3310, 2014.
[42] I. S. Gradshteyn and I. M. Ryzhik, Table of Integrals, Series, and Products, 7th ed. San Diego, CA: Academic, 2007.
[43] X. Li, K. Huang, W. Yang, S. Wang, and Z. Zhang, “On the convergence of fedavg on non-iid data,” in ICLR, 2020.

Lunyuan Chen received the bachelor’s degree in Communication Engineering from Xidian University in 2019. He is currently pursuing the master’s degree with the School of Electronics and Communication Engineering, Guangzhou University. His current research interests focus on statistical machine learning and deep learning.

Trung Q. Duong (Fellow, IEEE) received his B.Eng. degree in electrical and electronics engineering from Bach Khoa Sai Gon (Vietnam) in 2002, the M.Sc. degree in computer science from Kyung Hee University (South Korea) in 2005, and the Ph.D. degree in telecommunications systems from Blekinge Institute of Technology (Sweden) in 2012. In 2013, he joined Queen’s University Belfast (U.K.) as a member of the academic staff, where he is now a Chair Professor in Telecommunications. He also holds a prestigious Research Chair of the Royal Academy of Engineering. His current research interests include quantum communications, wireless communications, signal processing, machine learning, and realtime optimisation.

Dr. Duong has served as an Editor/Guest Editor for the IEEE Transactions on Wireless Communications, IEEE Transactions on Communications, IEEE Transactions on Vehicular Technology, IEEE Communications Letters, IEEE Wireless Communications Letters, IEEE Wireless Communications, IEEE Communications Magazine, and IEEE Journal on Selected Areas in Communications. Currently, he is serving as an Executive Editor for IEEE Communications Letters. He received the Best Paper Award at the IEEE VTC-Spring 2013, IEEE ICC 2014, IEEE GLOBECOM 2016, 2019, 2022, IEEE DSP 2017, and IWCMC 2019. He is the recipient of the prestigious Royal Academy of Engineering Research Fellowship (2015-2020) and has won the prestigious Newton Prize 2017.