Clustering-Enhanced Reinforcement Learning for Adaptive Offloading in Resource-Constrained Devices
Abstract—Federated Edge Artificial Intelligence (Edge AI) deploys AI applications on Internet-of-Things (IoT) devices, addressing data privacy concerns in the real world. To achieve effective Federated Learning (FL), three challenges must be addressed: i) limited computing power on devices, ii) non-uniform impacts on devices, and iii) adaptability to changing network conditions. This study introduces a new algorithm called Adaptive Offloading Point (AOP), designed to accelerate local training on constrained devices. It decomposes deep neural network (DNN) layer blocks, enabling training on both the client and server sides. The novelty of the proposed method lies in using reinforcement learning-based Gaussian mixture model (GMM) clustering to dynamically determine DNN layer offloading, addressing nullability, computation uniformity, training time, and network bandwidth variation issues. Experimental results on real devices, using vision transformer (ViT) models with an incident image dataset, show that AOP's training time is significantly faster than that of previous baseline methods.

Index Terms—Edge AI, Federated learning, Adaptive offloading point, Reinforcement learning, Vision transformer.

I. INTRODUCTION

Federated learning (FL) has gained attention as a privacy-preserving distributed learning technique and has recently become popular [1]. In FL, machine learning (ML) models such as deep neural networks (DNNs) run on IoT devices known for their compact and flexible nature (e.g., NVIDIA Jetson or Raspberry Pi) [2]. FL allows DNN models to be trained on devices without transmitting raw data to the server; only intermediate models are sent. Despite its benefits, FL faces challenges in computational cost and on-device storage that hinder practical applications [3].

Implementing FL in edge AI on resource-constrained devices is challenging due to DNN model complexity and the large number of parameters, often in the millions or hundreds of millions [4]. To address this, the concept of partitioning and offloading the DNN model at the client edge has emerged. The goal is to ease computational burdens and distribute tasks among edge clients. For instance, DNN performance in this context has been explored in [5], and split learning (SL) [6] is an ML technique leveraging this concept.

In SL [6], the DNN model is split into head and tail segments located on the server and client sides. Training occurs on the client side, and a separation layer breaks down the data for transmission to the server. The server then trains the remaining DNN layers using the decomposed data and sends the gradients of the decomposed data back to the devices through the separation layer. Devices use these gradients for backpropagation on the rest of the DNN. SL significantly reduces on-device computation compared to FL, where the entire DNN is trained on the device, because only a few layers remain on the device side.

While SL benefits collaborators on specific devices, its practicality is limited by time-intensive sequential training across all devices. This limitation led to the development of split federated learning (SFL) [7], which aims for parallel FL and accelerated device training within the SL algorithm. However, current SFL-based research gives too little consideration to optimal partitioning strategies and often requires hardware configuration data for model partitioning before manual training. Furthermore, a static partitioning strategy may become suboptimal as operating conditions fluctuate during training.

Here, we list some of the challenges that FL and SL face and that must be addressed for them to be effective in edge AI applications:

1) Data and Model Size Impact on Training Efficiency: FL struggles with large DNN models on resource-constrained IoT devices such as Raspberry Pis or NVIDIA Jetsons [8]. For instance, the lightweight MobileNetV1 model (3.2 million parameters) takes over 8 hours for one round and one epoch of training [8]. That study proposes SplitNN as an efficient alternative, demonstrating a significant reduction in training time to 2.5 hours for one epoch on five Raspberry Pis. While opting for lightweight models is a temporary solution, the study highlights the importance of a sustainable, long-term approach, advocating model-sharing methods where training occurs partly on the client and partly on the server.

2) Heterogeneous Devices Impacting System Operation: FL faces complications due to heterogeneous computing capabilities, architectures, sequential round-robin processing, and varying network conditions among IoT devices [9]. The fusion server has to wait for all devices to complete training, impacting model accuracy as each device contributes differently to the training process [10].
Addressing this challenge requires a solution that ensures effective coordination and communication among diverse IoT devices during the FL process.

3) Networking Infrastructure and Device Disparities: Disparities in IoT device configurations, processing speeds, and network conditions pose challenges for FL system deployment in edge AI applications [11]. Mitigating these challenges involves developing robust strategies to adapt to varying network speeds and geographical considerations, ensuring stability and effectiveness in diverse edge AI applications.

SL integration with ViT models is gaining traction, primarily explored on standard image classification datasets [12]. However, practical applications and real-world testing on edge devices are limited. Despite SL's potential, applying it in practice is crucial, given the challenges posed by the lower configurations of edge devices, emphasizing the ongoing evolution of SL applications in addressing constraints in edge environments.

The authors in [13] introduced the FedAdapt algorithm, which combines reinforcement learning-based optimization and K-means clustering [14]. This algorithm dynamically allocates DNN layers for server offloading, effectively addressing the challenges of computational heterogeneity and network bandwidth (BW) variability. Evaluation results indicate that FedAdapt reduces training time by over 40% compared to classical FL. However, its use of K-means clustering has limitations: clusters are identified based on training time per iteration (TTPi) values, and the maximum value within each cluster serves as the agent input (TTPi and FLOPs). The output action is initially randomized using a multivariate normal technique and iteratively refined through reward evaluation scoring, introducing inherent randomness that may slow convergence and lead to sub-optimal action selection.

In this study, we present the AOP algorithm, a novel approach detailed in Section II. This algorithm not only tackles the three challenges mentioned above but also expedites FL, alleviates the impact of computational heterogeneity, and adapts to varying network BW. Our proposed method employs the GMM approach [14], seeking to identify clusters based on the combination of TTPi, BW, and FLOP values. The agent input (TTPi, BW, FLOP) represents the average values of the clusters. The output action is determined using a combination of multivariate normal and max-min scaling techniques. By comparing the means of the central clusters, a scaling factor is obtained that adjusts the action. This process accelerates the identification of suitable actions, ultimately enhancing the reliability of reward scoring. In comparison to K-means, GMM significantly reduces training time by constraining random amplitudes, preventing excessively large or time-consuming actions during training. This is particularly advantageous for weaker devices, which may otherwise struggle to handle large OP scores within a limited number of iterations.

This study makes the following contributions:

1) We develop an adaptive offloading point (AOP) algorithm to generate optimal strategies for device offloading, minimizing the impact of computational heterogeneity and ensuring efficiency for diverse devices.

2) Applying reinforcement learning (RL) [15] on top of GMM clustering offers outstanding efficiency compared to the K-means used in FedAdapt. This approach facilitates faster identification of action points, mitigates excessive randomness during training, and accelerates the training process on devices.

3) Experiments were conducted on five Raspberry Pi 4 boards and one desktop PC as edge clients, with various parameter configurations applied to both the ViT and DeepViT transformer models [16], [17], using incident image datasets. The results demonstrate that the AOP method is 7% faster in training time than FedAdapt and 26% faster than classical FL with the ViT model. Similarly, with the DeepViT model, the AOP method achieves a 5.6% improvement over FedAdapt and a notable 25.6% improvement over classical FL.

The rest of this paper is organized as follows: Section II introduces the proposed AOP algorithm. Section III presents the optimization of training for AOP. Section IV presents the performance evaluation, and Section V concludes the paper and highlights directions for future research.

II. THE PROPOSED AOP ALGORITHM

This section describes the design and implementation of our system optimized for real-world FL.

A. System overview

The overarching context centers on employing GMM and RL in adaptive settings, specifically for optimizing offloading strategies in FL scenarios. The procedure encompasses pre-processing, clustering, RL-based decision-making, and post-processing to improve the efficiency of training models across distributed devices, delineated in steps 1 to 4 and illustrated in Fig. 1.

1) FL Round and Pre-processing: Initialize a complete FL process. After completing the previous FL round (round t-1), the pre-processing stage collects observations on the state of the devices, including computational capabilities and the network bandwidth between each device and the server. Normalization of the training time per iteration is performed during this stage.

2) Clustering Module: Employ the clustering module to group devices with similar training times based on computational homogeneity and network bandwidth. This clustering is crucial for the RL agent to determine offloading strategies for each cluster.

3) Trained RL Agent and Offloading Decision: Use the trained RL agent, equipped with group information and observations (referred to as the state), to generate offloading decisions (referred to as actions) for each device cluster. This process involves a fully connected neural network, and the training details are covered in a dedicated section.
4) Post-processing and Offloading Strategy: In the post-processing stage, apply the output of the trained RL agent to implement offloading decisions for the devices within each cluster. All devices in a cluster execute the same offloading strategy, which determines the layers of the DNN model residing on each device for the current FL round (round t).

Fig. 1: Automated splitting-point determination algorithm for the client-server model.

B. Model architecture

To maximize the efficiency of each transformer model, we employ a strategy of dividing it into smaller, evenly distributed sub-models, layers, or blocks, referred to as offloading points (OPs). For example, we partition a ViT model into 9 OPs and a DeepViT model into 10 OPs, as illustrated in Table I. Importantly, this division is carried out meticulously to ensure a balanced selection of OPs without unnecessary redundancy. This precise approach aims to maintain equilibrium in OP selection, thereby promoting optimal performance.

C. Problem formulation

The network BW between a device and the server may vary across FL rounds. To address this variability, we observe the bandwidth from the previous FL round to formulate a load reduction strategy. Unlike the previous study in [11], where the goal was solely to reduce the load over the overall training period, our approach goes beyond that. Our objective is to reduce the training time per communication round by establishing consistent offloading strategies for all devices and adapting to observed network changes.

In the context of the AOP algorithm, FL training is conducted on each of the $C$ devices, each with a training workload $W_t^c$ in round $t$. In this FL task, server $s$ has a training capacity $C_t^s$, the participating client $c$ has a training capacity $C_t^c$, and the network bandwidth between the device and the server is denoted as $BW_t^c$ at round $t$. The offloading strategy for the device and the server is determined by the proportion of the workload executed, $\mu_t^c W_t^c$ and $(1-\mu_t^c) W_t^c$, which are offloaded to the client and server, respectively. Let $S(\mu_t^c)$ denote the size of the feature maps transferred between the device and server during the training of round $t$; $S(\mu_t^c)$ depends on $\mu_t^c$, as the offloading strategy determines the size of the transferred feature map. Finally, the training time for device $c$ in round $t$ can be calculated as follows:

\[
T_t^c = \frac{\mu_t^c W_t^c}{C_t^c} + \frac{(1-\mu_t^c) W_t^c}{C_t^s} + \frac{S(\mu_t^c)}{BW_t^c}, \tag{1}
\]

where $\mu_t^c W_t^c / C_t^c$ and $(1-\mu_t^c) W_t^c / C_t^s$ represent the training time on the device and on the server, respectively, and $S(\mu_t^c)/BW_t^c$ is the communication time during training. In round $t$, $W_t^c$, $C_t^s$, $C_t^c$, and $BW_t^c$ are either constants or variables that are not controlled by AOP. Since the offloading strategy $\mu_t^c$ of each device is an offloading point and is controlled by AOP, our optimization target is defined as minimizing the average training time of the $C$ devices in a round in order to reduce the overall training time effectively.

The optimization problem is formulated in Eq. (2) as follows:

\[
\min_{\mu_t^c} \; \frac{1}{C}\sum_{c=1}^{C} T_t^c
\quad \text{subject to} \quad
T_t^c = \frac{\mu_t^c W_t^c}{C_t^c} + \frac{(1-\mu_t^c) W_t^c}{C_t^s} + \frac{S(\mu_t^c)}{BW_t^c}. \tag{2}
\]

Here, the objective is to minimize the average training time $\frac{1}{C}\sum_{c=1}^{C} T_t^c$ of the $C$ devices in round $t$. The training time $T_t^c$ of each device is subject to constraints that consider the offloading strategy, the device capacities ($C_t^c$ and $C_t^s$), and the network bandwidth ($BW_t^c$).

While the optimization seeks to minimize the overall training time for all devices, AOP goes beyond optimizing the maximum training time, which may be constrained by straggler devices; it also aims to reduce the training time of each individual device. This ensures a more balanced distribution of the computation load across devices. Therefore, the objective defined in AOP is to minimize the average training time of the $C$ devices, contributing to the overarching goal of reducing the total training time over all FL rounds. The optimization is carried out for $\mu_t \in [0, 1]$, representing the offloading point in each round, accounting for variable operating conditions such as $C_t^c$, $C_t^s$, and $BW_t^c$.
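To make Eqs. (1) and (2) concrete, the following is a minimal Python sketch, not the paper's implementation, that evaluates the per-device round time and searches a small grid of offloading fractions for the combination minimizing the average; the device numbers and the feature_map_size helper are illustrative assumptions.

```python
# Minimal sketch of Eq. (1) and the averaged objective of Eq. (2).
# All numeric values below are illustrative placeholders, not measurements from the paper.

def device_time(mu, W, C_client, C_server, BW, S):
    """Eq. (1): on-device time + server time + communication time for one round."""
    return mu * W / C_client + (1.0 - mu) * W / C_server + S(mu) / BW

def feature_map_size(mu):
    # Hypothetical stand-in for S(mu): the transferred feature map changes with the cut point.
    return 4.0 * (1.0 - mu) + 0.5  # MB, illustrative

def average_time(mu_per_device, devices):
    """Objective of Eq. (2): mean training time over the C participating devices."""
    times = [device_time(mu, d["W"], d["C_c"], d["C_s"], d["BW"], feature_map_size)
             for mu, d in zip(mu_per_device, devices)]
    return sum(times) / len(times)

if __name__ == "__main__":
    devices = [
        {"W": 7.5, "C_c": 0.4, "C_s": 8.0, "BW": 1.25},   # Raspberry-Pi-like client on a 10 Mbps link
        {"W": 7.5, "C_c": 2.0, "C_s": 8.0, "BW": 12.5},   # laptop-class client
    ]
    candidates = [i / 8 for i in range(9)]  # nine discrete offloading points
    best = min(
        ((m1, m2) for m1 in candidates for m2 in candidates),
        key=lambda ms: average_time(ms, devices),
    )
    print("offloading fractions minimizing the average round time:", best)
```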
III. REINFORCEMENT LEARNING AGENT FOR THE TRAINING METHOD IN AOP

This section outlines the training process of the RL agent used in AOP to achieve the objectives of the GMM clustering method. RL, a powerful ML technique, is a sequential decision optimization method used in various applications. Its primary function within AOP is to autonomously generate a well-considered load reduction strategy for the participating devices, ultimately maximizing the reward, i.e., the reduction in training time under AOP.
TABLE I: FLOPs and trainable parameters of the offloading points (OPs) for the ViT/DeepViT transformer models.

ViT:
  Trainable parameters: 7.750M / 10.484M
  FLOPs per OP: [0.78M, ~1e-6M, 1.1M, 1.1M, 1.1M, 1.1M, 1.1M, 1.1M, ~1e-6M]
  FLOPs (%) per OP: [10.5%, ~0%, 14.9%, 14.9%, 14.9%, 14.9%, 14.9%, 14.9%, ~0%]
  Offloading points: 9

DeepViT:
  Trainable parameters: 3.878M / 6.617M
  FLOPs per OP: [0.39M, ~1e-6M, 0.55M, 0.55M, 0.55M, 0.55M, 0.55M, 0.55M, ~1e-6M, 0.128M]
  FLOPs (%) per OP: [10.2%, ~0%, 14.4%, 14.4%, 14.4%, 14.4%, 14.4%, 14.4%, ~0%, 3.3%]
  Offloading points: 10
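The table above can be read as a recipe for cutting the encoder into OP blocks. Below is a minimal PyTorch sketch, under the assumption that the transformer is available as an ordered list of blocks, of how a model might be split at a chosen offloading point into a client-side head and a server-side tail; the tiny Block module and the block count are placeholders, not the paper's actual ViT implementation.

```python
import torch
import torch.nn as nn

class Block(nn.Module):
    """Placeholder transformer block; stands in for one offloading point (OP)."""
    def __init__(self, dim=64):
        super().__init__()
        self.ff = nn.Sequential(nn.LayerNorm(dim), nn.Linear(dim, dim), nn.GELU())
    def forward(self, x):
        return x + self.ff(x)

def split_at_op(blocks, op_index):
    """Return (client head, server tail) for the chosen offloading point."""
    head = nn.Sequential(*blocks[:op_index])   # layers trained on the device
    tail = nn.Sequential(*blocks[op_index:])   # layers trained on the server
    return head, tail

if __name__ == "__main__":
    blocks = [Block() for _ in range(9)]       # e.g., 9 OPs as in the ViT column of Table I
    head, tail = split_at_op(blocks, op_index=3)
    x = torch.randn(2, 16, 64)                 # dummy token batch
    smashed = head(x)                          # client output transferred to the server
    out = tail(smashed)                        # server completes the forward pass
    print(smashed.shape, out.shape)
```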
A. Optimizing for multiple tasks

The offloading strategy within the AOP algorithm needs to be adaptable to changing operating conditions. The GMM approach aims to categorize devices and servers into buckets based on a combination of parameters $X = \{TTPi, BW, FLOP\}$. This categorization is crucial for optimizing the performance of AOP. When using the RL method, it is essential to generate different output actions in response to changes in multiple tasks. However, in practical scenarios, it has been observed that this approach may lead to suboptimal offloading, especially when the network bandwidth is not constrained. To address this challenge, the agent's input (comprising TTPi, BW, and FLOP) represents the mean values $\mu_K$ of the $N$ groups. The output action is determined through a combination of the standard multivariate technique and rate maximization. A scaling factor $\pi$ is obtained by comparing the parameters of the central groups, and this factor is used to adjust the action in the heterogeneous groups. This process accelerates the identification of appropriate actions, ultimately enhancing the reliability of the reward score. The RL agent's training occurs in a controlled environment where the network bandwidth between the device and the pre-processing server automatically adjusts to represent the mean of the group through the expectation-maximization (EM) algorithm. The EM algorithm consists of two steps, the E-step and the M-step.

E-step: for each $n$ and $k$, set

\[
\gamma(z_{nk}) = \frac{\pi_k \,\mathcal{N}(x_n \mid \mu_k, \Sigma_k)}{\sum_{j=1}^{K} \pi_j \,\mathcal{N}(x_n \mid \mu_j, \Sigma_j)}. \tag{3}
\]

M-step: update the parameters

\[
\mu_k = \frac{\sum_{n=1}^{N} \gamma(z_{nk})\, x_n}{\sum_{n=1}^{N} \gamma(z_{nk})}, \quad
\Sigma_k = \frac{\sum_{n=1}^{N} \gamma(z_{nk})\,(x_n-\mu_k)(x_n-\mu_k)^{T}}{\sum_{n=1}^{N} \gamma(z_{nk})}, \quad
\pi_k = \frac{1}{N}\sum_{n=1}^{N} \gamma(z_{nk}). \tag{4}
\]
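As a minimal sketch of the clustering in Eqs. (3) and (4), the snippet below fits a Gaussian mixture to per-device observations (TTPi, BW, FLOPs) with scikit-learn, whose fit routine runs the same E/M updates; the feature values and the choice of three components are illustrative assumptions rather than the paper's configuration.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

# One row per device: [normalized TTPi, bandwidth (Mbps), FLOPs share of last action].
# Values are illustrative placeholders for the six clients used in the experiments.
X = np.array([
    [0.95, 10.0, 0.30],   # slow, bandwidth-limited Raspberry Pi
    [0.90, 10.0, 0.35],
    [0.60, 100.0, 0.55],
    [0.55, 100.0, 0.60],
    [0.50, 100.0, 0.65],
    [0.10, 100.0, 0.95],  # laptop-class client
])

gmm = GaussianMixture(n_components=3, covariance_type="full", random_state=0)
labels = gmm.fit_predict(X)          # E-step responsibilities -> hard group labels (Eq. 3)
group_means = gmm.means_             # mu_k from the M-step (Eq. 4), used as the RL state
mixing_weights = gmm.weights_        # pi_k, used to scale the action across groups

print("group label per device:", labels)
print("per-group mean (TTPi, BW, FLOP):", group_means.round(2))
```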
sharing a three-layer architecture. During RL training, the
B. State and action critic network assists the actor network. After training, only
The input states for the RL agent consist of the average the actor network develops the offload action. Ideally, the RL
training time and local bandwidth of devices within a group. model should be trained online during the FL task. However,
In each training round, the RL’s task is to generate each online training during FL requires waiting for each FL training
group’s corresponding load reduction action, denoted as yn . round to complete for reward calculation. To address this,
The action of each group is then processed to map to the we opt for offline RL model training before FL tasks. To
model of all devices in the group, resulting in the action expedite RL training, we reduce batches for each round,
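A small sketch of the reward in Eq. (6): each group's reward is the gap between the whole-backbone baseline time B^N and the observed TTPi, averaged over the N groups. The baseline and observed times below are illustrative values, not measurements from the paper.

```python
def reward(baseline_times, observed_ttpi):
    """Eq. (6): mean of (B^N - TTPi_t^N) over the N device groups."""
    assert len(baseline_times) == len(observed_ttpi)
    gaps = [b - t for b, t in zip(baseline_times, observed_ttpi)]
    return sum(gaps) / len(gaps)

# Illustrative values (seconds per round): the further below the full-backbone
# baseline a group trains, the larger the reward for the chosen offloading action.
baseline = [120.0, 95.0, 40.0]    # B^N: each group training the whole backbone
observed = [70.0, 60.0, 35.0]     # TTPi after applying the offloading action
print(reward(baseline, observed))  # -> 30.0
```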
D. RL training methodology

The RL task involves two networks, the actor and the critic, which share a three-layer architecture. During RL training, the critic network assists the actor network; after training, only the actor network produces the offloading action. Ideally, the RL model would be trained online during the FL task. However, online training during FL requires waiting for each FL training round to complete before the reward can be calculated. To address this, we opt for offline RL model training before the FL tasks. To expedite RL training, we reduce the number of batches in each round, called "truncated FL bullets." Instead of using the round time as input/output, we gather batch training times per device. The FL model is retrained with regular loops, excluding truncated ones if the offload time surpasses the limit or the training shows a low standard deviation in the offload action's multivariate Gaussian function.
In training, the parameters are set as follows: the RL agent's discount factor for future state rewards is 0.9; the PPO clip parameter is 0.2; the actor and critic network learning rates are 0.001; and the policy is updated for 50 epochs.
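The following is a minimal PyTorch sketch of the clipped PPO surrogate that appears in Algorithm 1, using the hyperparameters quoted above (clip 0.2, learning rate 0.001, 50 update epochs); the three-layer actor, the use of Adam, and the Gaussian action distribution are assumptions about the implementation, not the paper's exact network.

```python
import torch
import torch.nn as nn
from torch.distributions import Normal

class Actor(nn.Module):
    """Three-layer policy head: group state (TTPi, BW, FLOP) -> mean of the OP action in [0, 1]."""
    def __init__(self, state_dim=3, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.Tanh(),
            nn.Linear(hidden, hidden), nn.Tanh(),
            nn.Linear(hidden, 1), nn.Sigmoid(),
        )
        self.log_std = nn.Parameter(torch.zeros(1))

    def forward(self, state):
        return Normal(self.net(state), self.log_std.exp())

def ppo_update(actor, optimizer, states, actions, old_log_probs, advantages,
               clip_eps=0.2, epochs=50):
    """Clipped surrogate: maximize min(r * A, clip(r, 1-eps, 1+eps) * A)."""
    for _ in range(epochs):                      # "update policy for 50 epochs"
        dist = actor(states)
        ratio = (dist.log_prob(actions) - old_log_probs).exp()
        unclipped = ratio * advantages
        clipped = torch.clamp(ratio, 1 - clip_eps, 1 + clip_eps) * advantages
        loss = -torch.min(unclipped, clipped).mean()
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

if __name__ == "__main__":
    torch.manual_seed(0)
    actor = Actor()
    optimizer = torch.optim.Adam(actor.parameters(), lr=1e-3)  # learning rate 0.001
    states = torch.rand(8, 3)                  # illustrative group observations
    actions = torch.rand(8, 1)                 # previously sampled OP actions
    with torch.no_grad():
        old_log_probs = actor(states).log_prob(actions)
    advantages = torch.randn(8, 1)             # advantage estimates (discount 0.9 used upstream)
    ppo_update(actor, optimizer, states, actions, old_log_probs, advantages)
```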
Algorithm 1: Conditioning the workload with the EM algorithm for GMM and PPO

1  Input: a given dataset X = {x_1, x_2, ..., x_n}; π = {π_1, π_2, ..., π_k}
2  Output: µ = {µ_1, µ_2, ..., µ_k}; Σ = {Σ_1, Σ_2, ..., Σ_k}
3  for each i do
4      x(ttpi, bw, flops)
5      randomly initialize π, µ, Σ
6      for t ∈ T do
7          /* E-step */
8          for n ∈ N do
9              for k ∈ K do
10                 compute γ(z_nk) from Eq. (3)
11             end for
12             y_n = argmax_k γ(z_nk)
13         end for
14         /* M-step */
15         for k ∈ K do
16             for k ∈ K do
17                 update µ_k, Σ_k, π_k from Eq. (4)
18             end for
19             µ: mixture-component means used as the environment state for the policy at each i
20             µπ: scaled weights of the mixture components
21             γ: prediction conducted by group label for each client i
22         end for
23     end for
24     minibatch size M_f for the policy action for all c
25 end for
26 for k ∈ K do
27     Input: initial policy parameters θ_0, clipping threshold ε = 0.2
28     for iteration 1, 2, ..., K do
29         for actor 1, 2, ..., N do
30             run policy π_old in the environment for T timesteps
31             compute advantage estimates Â_1, ..., Â_T with the conditioned GMM scale
32         end for
33         L_t(θ) = min( r_t(θ) Â_t, clip(r_t(θ), 1 − ε, 1 + ε) Â_t )
34         optimize the surrogate L w.r.t. θ, with K epochs and minibatch size M_f < NT
35         θ_old ← θ
36     end for
37     OP action for all c
38 end for

Algorithm 2: AOP algorithm

1  Requirement: |ω| = ω_{h,i} + ω_t; ω_{h,i} is the head model and ω_t is the tail model of the OP model M = M_{h,i} + M_{t,i}
2  η: learning rate
3  unicast-initialize ω_{h,0} of basic training to the full offloading OP_max
4  for each round i ∈ {0, ..., R − 1} do
5      assign round-robin scheduling i to the c-th client for each c
6      /* run training on clients c ∈ K */
7      generate smashed data s_{c,i} by passing input data x_{c,i} through ω_{c,i}
8      produce s_{c,i}, y_{c,i} by applying the M_{t,i} model forward to the server
9      /* run training on the server */
10     produce s_{c,i}, y_{c,i} via s_{h,i}, y_{c,i} and continue training for all c at i
11     produce s_{c,i}, y_{c,i} by applying the M_{h,i} model from the server
12     generate the loss Σ_i^R L_i by passing x_{c,i}, y_{c,i} through ω_t in parallel
13     update |ω| via ω_t ← η · ∇_{ω_t}(Σ_i^R L_i)
14     backpropagate the c-th model grad_{h,i} with the cut-layer gradient for all c at each i
15     /* run update on the server */
16     send the c-th model M with average weight M = (1/K) Σ_c |ω|
17     if frequency f ∈ R then
18         the actor and critic training of RL finds the OP workload from Alg. 1
19     end if
20     /* run update on clients c ∈ K */
21     update the next-round weight ω_{c,i+1} via M_{c,h,i} ← M with the cut-layer gradient for all c
22 end for
IV. PERFORMANCE EVALUATION

In this section, we assess the performance of the proposed method, AOP, using incident datasets. Initially, we elaborate on the parameter settings for our experiments, encompassing the physical clients, dataset, and models. Subsequently, we evaluate and discuss the performance of our proposed method in comparison with classical FL and the FedAdapt algorithm, focusing on aspects such as training time and offloading points on each client.

A. Simulation settings

1) Heterogeneous edge clients: In this section, we examine diverse heterogeneous devices with varying computing, memory, and communication capabilities. The ensemble comprises a desktop PC serving as the server (Intel i9-9900K CPU, 3.60 GHz, 64 GB RAM) and six edge clients dispersed across different locations and distances, including five Raspberry Pis (client1 to client5; ARM Cortex, 8 GB RAM) and a PC laptop (client6; Intel i7 CPU, 1.10 GHz, 16 GB RAM).
Fig. 2: Incident datasets with eight classes: (a) Animals, (b) Collapse, (c) Crash, (d) Fire, (e) Flooding, (f) Landslide, (g) Snow, (h) Treefall.
To ensure a comprehensive evaluation, we allocate different bandwidths to devices with varying speeds. For example, for client1 and client2 we limit the client's data transfer rate to a mere 10 Mbps. In contrast, for the other devices, client3 to client6, we denote the bandwidth as Inf, meaning the speed is unspecified and depends on the network bandwidth environment.

2) Dataset: We chose the incident image dataset for the simulations to assess the efficacy of our offloading point concept. This dataset, previously employed in numerous studies [18], comprises various street incident scenarios categorized into eight classes, as depicted in Fig. 2. These images serve as a semantically rich resource for training, with an image size of 128 × 128. Each class is anticipated to encompass hundreds of images, providing a diverse spectrum and capturing commonplace objects and surroundings whenever feasible.

3) Models and baselines: We chose transformer models, specifically ViT and DeepViT, for evaluating our incident image dataset due to their inherent parallel processing capabilities in the encoder phase. This study focuses solely on image datasets, assessing the models' reduction in training time while maintaining accuracy. Detailed specifications of the chosen models are provided in Section II and Table I. For a comprehensive comparison, the classical FL and FedAdapt baseline methods are included, considering aspects such as accuracy, total training time across devices and on individual devices, and optimal offloading point selection. The dataset is randomly split into train/valid/test sets (7 : 2 : 1). All models and baselines are implemented in Python 3.10.9 using the PyTorch library. Experiments are conducted on the server's NVIDIA GTX 1080Ti GPU and on CPU only for the clients. We use the SGD optimizer with a learning rate of 0.1 and a batch size of 32, and each client has a batch size of 5. Simulations involve 500 communication rounds for each task, with each round comprising 1 epoch of local training. It is crucial to note that, for a fair comparison, all methods are trained in the same environment, ensuring identical simulation settings.
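A condensed sketch of the training configuration described above (7:2:1 split of 128 × 128 images, server batch size 32, client batch size 5, SGD with learning rate 0.1, 500 rounds of 1 local epoch). The dataset path and the linear stand-in model are hypothetical placeholders; the actual experiments use the ViT/DeepViT models of Table I.

```python
import torch
from torch.utils.data import DataLoader, random_split
from torchvision import datasets, transforms

# 128x128 incident images in eight classes; the directory layout is a hypothetical placeholder.
tfm = transforms.Compose([transforms.Resize((128, 128)), transforms.ToTensor()])
full = datasets.ImageFolder("data/incidents", transform=tfm)

n = len(full)
n_train, n_valid = int(0.7 * n), int(0.2 * n)
train, valid, test = random_split(full, [n_train, n_valid, n - n_train - n_valid])  # 7:2:1

server_loader = DataLoader(train, batch_size=32, shuffle=True)   # server-side batch size
client_batch_size = 5                                            # per-client batch size
rounds, local_epochs = 500, 1                                     # communication rounds, local epochs

# Stand-in classifier so the snippet is self-contained; replace with the ViT/DeepViT under test.
model = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(3 * 128 * 128, 8))
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
```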
B. Performance evaluation of total training time and average accuracy on actual edge clients

To assess the effectiveness of the AOP approach with the ViT and DeepViT transformer models, we compare it with two baseline methods, classical FL and FedAdapt, analyzing the training time spent on each communication round using Algorithms 1 and 2 (see Fig. 3). For ViT, the training time is much faster than both baseline methods in each communication round and over all rounds. Applying AOP, which trains part of the ViT model at the client and the rest at the server, is faster than classical FL. Furthermore, compared with FedAdapt, our results show that AOP is much faster than this leading offloading-point method, as shown in Fig. 3a. Next, we analyze the accuracy. ViT-AOP's accuracy is much higher than that of classical FL and FedAdapt due to its faster convergence time, as shown in Fig. 3b. Second, comparing the total training time, DeepViT-AOP is still much faster than classical FL and slightly faster than FedAdapt. Considering the average test accuracy in our results, it is higher than classical FL and lower than FedAdapt. After the training process is completed, the total training times of classical FL, ViT-FedAdapt, and ViT-AOP are 26h33m4s, 21h41m24s, and 19h59m13s, respectively. Thus, ViT-AOP is 7% faster than ViT-FedAdapt and 26% faster than classical FL.

Similarly, in Fig. 3c, DeepViT-AOP has a total training time of 12h35m12s, while DeepViT-FedAdapt takes 13h35m22s and classical FL takes 16h58m49s. Thus, DeepViT-AOP is 5.6% faster than DeepViT-FedAdapt and 25.6% faster than classical FL, while Fig. 3d shows DeepViT achieving the highest accuracy. Finally, in the AOP technique, training time is the most important factor when using offloading points to find bottlenecks based on bandwidth and to split part of the deep learning model between client and server devices. Choosing a suitable model is even more critical, because the transformer model is the deciding factor for results such as training time and accuracy.

C. Performance evaluation of training time on each actual edge client

In this section, we evaluate the training time on each edge client (here, five Raspberry Pi 4 boards and one PC laptop) installed in different locations. Consider Fig. 4a for the case of ViT-AOP: client6 is the PC laptop, and with such a powerful edge device, superior to a Raspberry Pi 4, training is obviously faster. However, the two Raspberry Pis client1 and client2 show training times exceeding those of the other three Raspberry Pi edge devices. The reason is straightforward: we set their bandwidth limit to a maximum of only 10 Mbps, so it is reasonable that the training time on these two devices is longer than on the other devices. This shows that the use of AI applications on edge devices is the future trend and, more importantly, that the devices need a robust configuration.
Fig. 3: The total training time (s) and average test accuracy (%) for each communication round, shown for classical FL, FedAdapt, and AOP on two models: ViT (figures (a) and (b)) and DeepViT (figures (c) and (d)).

Fig. 4: The training time with the AOP algorithm on each client for two models: ViT (figure (a)) and DeepViT (figure (b)).

Fig. 5: The action mean with the AOP algorithm on each client for the ViT (figure (a)) and DeepViT (figure (b)) models.
The bandwidth between the client and the server needs to be strong enough and, most importantly, there must be an optimal solution for sharing tasks in edge computing, such as the AOP solution we introduce. Similarly, for the DeepViT case in Fig. 4b, the training time results on each edge device also re-validate the focus, contributions, and new ideas of this paper.

D. Performance evaluation of the action mean on each actual edge client

As highlighted in Section II, determining the workload score for each device in a group is based on a percentage derived from the action mean; Eq. (1) can be applied to calculate the workload score. We define the action mean as the percentage of FLOPs at layer i out of the total number of FLOPs in the model. This enables us to assess the impact and importance of each workload. Refer to Figs. 5a and 5b for the ViT and DeepViT models, respectively. Higher workload values indicate that the device is adept at handling the offloading point selection and related tasks. For instance, the workload attains its highest value for the most capable edge device, the PC laptop (client6). Additionally, the workload values of each client reflect the environmental conditions influencing the cutting point on each device. This information provides insight into the device's offloading point and the corresponding workload percentage (i.e., how much of the maximum work it performs). Similarly, simultaneous evaluation of the workload on different devices allows us to ascertain the maximum value achievable by other edge devices.
the workload on different devices allows us to ascertain the Real-Time Systems Symposium (RTSS), Hong Kong, China, 2019, pp.
maximum value achievable by other edge devices. 406-418, doi: 10.1109/RTSS46320.2019.00043.
[10] Imteaj, A., Mamun Ahmed, K., Thakker, U., Wang, S., Li, J., Amini,
V. C ONCLUSIONS M.H. (2023). Federated Learning for Resource-Constrained IoT Devices:
Panoramas and State of the Art. In: Razavi-Far, R., Wang, B., Taylor,
By seamlessly integrating the transformer model from clas- M.E., Yang, Q. (eds) Federated and Transfer Learning. Adaptation,
sical FL with the offloading point technique in edge devices, Learning, and Optimization, vol 27. Springer, Cham.
[11] Bonawitz, K., Eichner, H., Grieskamp, W., Huba, D., Ingerman,
we introduce a groundbreaking solution named adaptive of- A., Ivanov, V., Kiddon, C., Konecný, J., Mazzocchi, S., McMahan,
floading point (AOP). This novel approach is designed for H.B., Overveldt, T.V., Petrou, D., Ramage, D., & Roselander, J.
application in edge AI, offering enhanced model privacy (2019). Towards Federated Learning at Scale: System Design. ArXiv,
abs/1902.01046.
through network splitting and incorporating differential pri- [12] Almalik, Faris et al. “FeSViBS: Federated Split Learning of Vision
vate client-side model updates. AOP outperforms classical FL Transformer with Block Sampling.” International Conference on Medical
and FedAdapt by leveraging the actual network bandwidth, Image Computing and Computer-Assisted Intervention (2023).
[13] D. Wu, R. Ullah, P. Harvey, P. Kilpatrick, I. Spence and B. Varghese,
facilitating parallel processing across clients, and achieving ”FedAdapt: Adaptive Offloading for IoT Devices in Federated Learning,”
significantly faster training times on real edge devices. Our in IEEE Internet of Things Journal, vol. 9, no. 21, pp. 20889-20901, 1
experimental results, conducted on a incident image dataset, Nov.1, 2022, doi: 10.1109/JIOT.2022.3176469.
[14] Christopher M. Bishop. 2006. Pattern Recognition and Machine Learn-
showcase AOP’s superior performance in terms of training ing (Information Science and Statistics). Springer-Verlag, Berlin, Hei-
efficiency and accuracy. These findings hold promising im- delberg.
plications for practical applications, particularly in the realm [15] Richard S. Sutton and Andrew G. Barto. 2018. Reinforcement Learning:
An Introduction. A Bradford Book, Cambridge, MA, USA.
of FL for Edge AI. [16] Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X.,
Unterthiner, T., Dehghani, M., Minderer, M., Heigold, G., Gelly, S.,
ACKNOWLEDGMENT Uszkoreit, J., & Houlsby, N. (2020). An Image is Worth 16x16 Words:
Transformers for Image Recognition at Scale. ArXiv, abs/2010.11929.
This R&D includes the results of ”Research and [17] Zhou, D., Kang, B., Jin, X., Yang, L., Lian, X., Hou, Q., & Feng,
development of optimized AI technology by secure data J. (2021). DeepViT: Towards Deeper Vision Transformer. ArXiv,
coordination (JPMI00316)” by the Ministry of Internal abs/2103.11886.
[18] Levering, A., Tomko, M., Tuia, D., Khoshelham, K.: Detecting unsigned
Affairs and Communications (MIC), Japan. physical road incidents from driver-view images. IEEE Trans. Intell.
Veh. 6(1), 24–33 (2021)
We would like to thank Takamasa Mizoi, Senior Research
Engineer, and Isao Kikuchi, System Engineer, at the Big Data
Integration Research Center, National Institute of Information
and Communications Technology (NICT), for their great co-
operation in the development of the experimental system.