Federated Continual Learning For Edge-AI: A Comprehensive Survey
arXiv:2411.13740v1 [cs.LG] 20 Nov 2024
ZI WANG, FEI WU, FENG YU, YURUI ZHOU, JIA HU, and GEYONG MIN, Department of Computer Science, Faculty of Environment, Science and Economy, University of Exeter, United Kingdom
Edge-AI, the convergence of edge computing and artificial intelligence (AI), has become a promising para-
digm that enables the deployment of advanced AI models at the network edge, close to users. In Edge-AI,
federated continual learning (FCL) has emerged as an imperative framework, which fuses knowledge from
different clients while preserving data privacy and retaining knowledge from previous tasks as it learns new
ones. By so doing, FCL aims to ensure stable and reliable performance of learning models in dynamic and
distributed environments. In this survey, we thoroughly review the state-of-the-art research and present the
first comprehensive survey of FCL for Edge-AI. We categorize FCL methods based on three task characteristics:
federated class continual learning, federated domain continual learning, and federated task continual learning.
For each category, an in-depth investigation and review of the representative methods are provided, covering
background, challenges, problem formalisation, solutions, and limitations. Besides, existing real-world ap-
plications empowered by FCL are reviewed, indicating the current progress and potential of FCL in diverse
application domains. Furthermore, we discuss and highlight several prospective research directions of FCL
such as algorithm-hardware co-design for FCL and FCL with foundation models, which could provide insights
into the future development and practical deployment of FCL in the era of Edge-AI.
CCS Concepts: • Networks → Network architectures; • Computing methodologies → Distributed
computing methodologies; Artificial intelligence.
Additional Key Words and Phrases: Federated Continual Learning, Edge-AI, Edge Computing, Artificial
Intelligence, Lifelong Learning, Incremental Learning, Federated Learning
ACM Reference Format:
Zi Wang, Fei Wu, Feng Yu, Yurui Zhou, Jia Hu, and Geyong Min. 2024. Federated Continual Learning for Edge-
AI: A Comprehensive Survey. ACM Comput. Surv. 1, 1 (November 2024), 35 pages. https://doi.org/XXXXXXX.
XXXXXXX
1 INTRODUCTION
Deep Learning (DL) has emerged as a leading approach in artificial intelligence (AI), with demonstra-
ble efficacy across various scientific fields, including computer vision, natural language processing,
and speech recognition [1]. DL utilises artificial neural networks with multiple hidden layers to
model high-level abstractions and learn complex patterns and representations from data [2]. In
recent years, the proliferation of DL applications has catalysed advancements in various sectors,
exemplified by their role in assisting medical diagnostics [3], enhancing autonomous driving sys-
tems [4], and accelerating genomics research [5]. However, traditional implementations of DL rely
on cloud computing systems with centralised servers and data storage, which can raise privacy
Authors’ address: Zi Wang, [email protected]; Fei Wu, [email protected]; Feng Yu, [email protected]; Yurui Zhou,
[email protected]; Jia Hu, [email protected]; Geyong Min, [email protected], Department of Computer Science, Faculty
of Environment, Science and Economy, University of Exeter, Exeter, Devon, United Kingdom, EX4 4RN.
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee
provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the
full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored.
Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires
prior specific permission and/or a fee. Request permissions from [email protected].
© 2024 Copyright held by the owner/author(s). Publication rights licensed to ACM.
ACM 0360-0300/2024/11-ART
https://doi.org/XXXXXXX.XXXXXXX
concerns when collecting user data, incur high communication costs, and increase latency between
servers and clients. To address these challenges, edge computing has emerged as a promising
approach, which is a distributed computing paradigm that brings computation and storage closer to
data sources, rather than relying on centralised cloud-based data processing. This paradigm shift can
significantly reduce the latency and cost, making it suitable for data-intensive and latency-sensitive
AI applications. Therefore, the convergence of edge computing and AI gives rise to Edge-AI, which
aims to enable real-time AI applications powered by edge computing.
Edge-AI employs a popular distributed machine learning approach called federated learning (FL)
[6], which allows collaborative DL model training across clients while keeping the data localised.
To achieve this, a coordinating server distributes the global model to participating clients, which
then train the model using their local data. By aggregating processed parameters such as gradients
rather than raw data from each client on the coordinating server, FL ensures the overall training
performance and effectiveness of the global model while complying with data security regulations
[7, 8] such as the General Data Protection Regulation (GDPR) and the Data Protection Act (DPA),
addressing growing concerns about user privacy in AI applications.
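To make this workflow concrete, the following minimal sketch shows one FedAvg-style training round [6] in PyTorch: the server broadcasts the global model, each client trains on its private data, and the server averages the returned parameters weighted by local dataset size. The helper attributes (client.dataloader, client.num_samples) and hyperparameters are illustrative assumptions rather than part of any particular FCL method.

```python
import copy
import torch
import torch.nn.functional as F

def federated_round(global_model, clients, local_epochs=1, lr=0.01):
    """One illustrative FedAvg-style round: broadcast, local training, weighted averaging."""
    client_states, client_sizes = [], []
    for client in clients:  # each `client` is assumed to expose .dataloader and .num_samples
        local_model = copy.deepcopy(global_model)        # broadcast the current global model
        optimizer = torch.optim.SGD(local_model.parameters(), lr=lr)
        for _ in range(local_epochs):
            for x, y in client.dataloader:               # train only on local, private data
                optimizer.zero_grad()
                F.cross_entropy(local_model(x), y).backward()
                optimizer.step()
        client_states.append(local_model.state_dict())   # upload parameters, never raw data
        client_sizes.append(client.num_samples)

    total = float(sum(client_sizes))
    averaged = {                                         # dataset-size-weighted averaging
        key: sum((n / total) * state[key].float() for n, state in zip(client_sizes, client_states))
        for key in client_states[0]
    }
    global_model.load_state_dict(averaged)
    return global_model
```

FCL methods reviewed in this survey keep this round structure but modify the local training step, the aggregation step, or both, so that knowledge of previously seen tasks is retained.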
FL research has mainly focused on model convergence under non-independent and identically
distributed (non-IID) data [9], model aggregation [10], security and privacy [11], resource optimi-
sation and incentive mechanisms [12], etc. Furthermore, most FL works assume that the training
dataset of clients is sampled from a static data distribution [13] and available from the beginning of
the training [14]. However, in real-world scenarios, data are collected progressively, and the data distribution, the classes of samples, and the number of tasks can change over time, posing significant challenges to model adaptability [15].
Recently, continual learning (CL), also known as incremental learning (IL) or lifelong learning
(LL), has become an important approach for learning and accumulating knowledge from a continual
stream of data [13]. Thus, integrating the concept of CL into the FL framework, known as Federated
Continual Learning (FCL), leverages the strengths of both FL and CL to establish a robust
foundation for Edge-AI in dynamic and distributed environments. However, continual learning
from a series of new tasks can cause the model to experience significant performance degradation on
previously learned tasks, a phenomenon known as catastrophic forgetting (CF) [13]. FCL exacerbates this problem because FL allows clients to join and leave the learning process arbitrarily. Furthermore, the
heterogeneity of FL clients leads local models to learn diverse knowledge, exacerbating catastrophic
forgetting in the global model during the aggregation of these local models. Recent studies (e.g.,
[15–18]) have proposed solutions to tackle these challenges, giving rise to an emerging research
field that is increasingly attracting attention.
methods. They also provided an extensive experimental evaluation of those methods for image
classification tasks.
These surveys are focused on separate areas of FL and CL. None of them has systematically
investigated the challenges and solutions proposed in the emerging paradigm of FCL, especially in
the Edge-AI environments. Recently, Yang et al. [24] conducted a survey of FCL from the perspective
of knowledge fusion. They proposed two frameworks, namely synchronous and asynchronous
FCL, for addressing the spatial-temporal catastrophic forgetting challenge in FCL with knowledge
fusion. Different from their work, our survey thoroughly investigates and categorizes the existing
FCL methods in Edge-AI based on three task characteristics: federated class continual learning,
federated domain continual learning, and federated task continual learning. These taxonomies are explained in more detail in Sections 2, 3, and 4.
• We present a comprehensive review and clear taxonomy of the state-of-the-art FCL research
based on different task characteristics: federated class continual learning, federated domain
continual learning, and federated task continual learning, including a large number of papers
in this rapidly expanding research field. The taxonomy, definitions, challenges, and advantages
and disadvantages of the representative methods are thoroughly discussed.
• We provide a review and summary of current real-world applications empowered by FCL,
such as intelligent transportation systems, intelligent medical systems, IoT, and digital twins,
highlighting the versatility and potential of FCL for making real-world impact.
• We deliberate upon and posit several open research challenges including the lack of universal
benchmarks, explainability, algorithm-hardware co-design, and FCL with foundation models,
while proposing prospective directions that could inspire the research community to advance
the field of FCL for its rapid development and wide deployment in the era of Edge-AI.
[Figure: structure of this survey — Introduction (related surveys, contributions); Federated Class Continual Learning; Federated Task Continual Learning with regularization-based, architecture-based, replay-based, and meta learning-based methods (task identity, i.e., t, t+1, ..., t+n, is provided during testing); Federated Continual Unsupervised Learning; Applications (intelligent transportation systems, intelligent medical systems, Internet of Things, UAVs, robotics, smart energy, digital twins, financial audit); Future Directions and Challenges (FCL benchmark, explainable FCL, algorithm-hardware co-design for FCL, FCL with foundation models); Conclusion. The diagram also depicts local clients uploading and downloading models over time as new classes and tasks arrive.]
sequence of continual tasks $T_{m_k} = \{t_{m_k}^1, t_{m_k}^2, \ldots, t_{m_k}^t, \ldots\}$. Each task $t_{m_k}^t = \{(x_i^t, y_i^t)\}_{i=1}^{N_t}$ consists of $N_t$ pairs of sample $x_i^t$ and corresponding label $y_i^t$. The class set $C^t$ of task $t_{m_k}^t$ includes its new classes and the old class set $C^{t-1}$ from the previous tasks up to $t_{m_k}^{t-1}$. After local training is complete, each client $m_k$ transmits the updated model parameters $\theta_{m_k}^r$ to the server $S_g$, and server $S_g$ aggregates them into the global parameter $\theta_G^r$ to integrate the task knowledge across all clients. Finally, the server $S_g$ distributes the global parameter $\theta_G^r$ to all participating clients in the next training round.
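As a point of reference, one common instantiation of the aggregation step is dataset-size-weighted averaging in the style of FedAvg, together with the accumulation of the class set over tasks; the weighting below is an assumption for illustration, since individual FCCL methods may aggregate differently:

\[
\theta_G^{r} = \sum_{k=1}^{K} \frac{|D_{m_k}|}{\sum_{j=1}^{K} |D_{m_j}|}\, \theta_{m_k}^{r},
\qquad
C^{t} = C^{t-1} \cup C^{t}_{\text{new}},
\]

where $|D_{m_k}|$ denotes the size of client $m_k$'s current local dataset and $C^{t}_{\text{new}}$ denotes the classes newly introduced by task $t$.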
Next, inspired by the tri-level (data-centric, model-centric, and algorithmic) division in CCL [25],
we categorise existing methods into four distinct groups: i) Generative Replay (Section 2.1), which
falls under the data-centric FCCL; ii) Parameter Regularization (Section 2.2), and iii) Parameter
Decomposition and Prompt-based methods (Section 2.3 and 2.4 respectively), which are the model-
centric FCCL; and iv) Knowledge Distillation (Section 2.5), which aligns with the algorithmic FCCL.
This categorization aims to provide a structured overview and facilitate a deeper understanding of
the rapidly evolving field. Finally, in Section 2.6, we summarize these FCCL approaches and analyse
the relation between them.
to reduce both the local training time and computational costs for clients, and it doesn’t necessitate
access to their private data.
Recently, considering that exemplar-based methods may not be suitable for privacy-sensitive
scenarios, Zhang et al. [17] proposed TARGET which is an effective solution for addressing cat-
astrophic forgetting in FCCL without storing local private client data or any datasets. They first
experimentally confirmed that non-IID settings can intensify the catastrophic forgetting problem
in FL. Then, they used the previously trained global model to transfer knowledge of old tasks to
current ones at the model level. Additionally, a trained generator synthesizes data to simulate
non-IID training datasets with assistant model distillation on the clients at the data level. Therefore,
TARGET does not require extra datasets or the retention of private data from previous tasks, making
it especially suitable for data-sensitive environments.
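A heavily simplified sketch of this data-free idea is shown below: a generator produces synthetic inputs, and the previously trained global model acts as a teacher whose softened outputs the current model learns to match. The function, its arguments, and the plain distillation loss are illustrative assumptions; TARGET's actual generator training and assistant-model distillation are more involved.

```python
import torch
import torch.nn.functional as F

def data_free_distillation_step(student, old_global_model, generator, optimizer,
                                batch_size=64, latent_dim=100, temperature=2.0):
    """Illustrative data-free distillation step: old-task knowledge is transferred
    through synthetic samples instead of stored client data."""
    z = torch.randn(batch_size, latent_dim)              # sample latent noise
    synthetic_x = generator(z)                            # generator synthesises pseudo-data
    with torch.no_grad():
        teacher_logits = old_global_model(synthetic_x)    # previous global model as the teacher
    student_logits = student(synthetic_x)
    # standard softened knowledge-distillation loss between teacher and student predictions
    loss = F.kl_div(F.log_softmax(student_logits / temperature, dim=1),
                    F.softmax(teacher_logits / temperature, dim=1),
                    reduction="batchmean") * temperature ** 2
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```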
method added a regularization term reflecting the class proportions in the client dataset to the
standard cross-entropy loss, reducing excessive pressure and subsequent loss on other client data.
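As a rough illustration of this idea, the snippet below re-weights the softmax normalisation by the class proportions observed in a client's local dataset, so that classes that are absent or rare on the client exert little pressure on the loss; this is only a sketch of the general principle, not the exact formulation of the method above.

```python
import torch

def class_proportion_weighted_ce(logits, targets, class_proportions, eps=1e-8):
    """Cross-entropy whose softmax normalisation is weighted by the local class
    proportions (classes with zero local proportion barely contribute).
    Illustrative sketch only, not the exact published loss."""
    # log p(y|x) = z_y - log( sum_c w_c * exp(z_c) ), with w_c = local proportion of class c
    log_norm = torch.logsumexp(logits + torch.log(class_proportions + eps), dim=1)
    target_logits = logits.gather(1, targets.unsqueeze(1)).squeeze(1)
    return -(target_logits - log_norm).mean()
```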
Hu et al. [35] designed a new FCL framework called DuAFed, featuring a dual attention mecha-
nism for the scenario of different class increments and unbalanced features of clients. DuAFed first
ensures a balanced pre-training sample distribution by randomly sampling an equal number of
instances from each client. Further, the iCaRL strategy [36] is employed to accommodate dynamic
changes in training tasks. To mitigate the noise generated by clients with an imbalanced quantity
of classes, a channel attention mechanism is added on the client side, where feature compression,
feature map retrieval and regularization via learned weight coefficients of each channel with all the
elements of the corresponding channel are successively performed. Moreover, to solve the challenge
that the respective features are unbalanced and the importance is difficult to capture in FCCL, they
introduced a feature attention mechanism, which can capture the hierarchical importance of the
neural network in multiple local models, for the model aggregation of clients.
Yao et al. [37] proposed federated learning with local continual training (FedCL) leveraging
a parameter-regularization constrained local continual learning strategy to mitigate the weight
divergence and continually integrate knowledge on different local models into the global model,
whose efficiency is verified under different non-IID class data distributions. Specifically, they utilized the diagonal of the Fisher information matrix, as in EWC [38], to evaluate the importance weight matrix of the global model on a small proxy dataset on the server. This matrix is then used as a penalty term in the local loss function, constraining the locally updated model from drifting away from important global parameters while it fits the local data distribution.
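A minimal sketch of such an EWC-style penalty is shown below: the local objective adds a quadratic term, weighted by the diagonal Fisher importance estimated on the server's proxy dataset, that discourages locally updated parameters from drifting away from the received global parameters. Variable names and the penalty coefficient are assumptions for illustration.

```python
def ewc_regularized_loss(task_loss, local_model, global_params, fisher_diag, lam=0.4):
    """task_loss: standard loss on the client's local data.
    global_params / fisher_diag: dicts of tensors received from the server,
    keyed like local_model.named_parameters(). Illustrative EWC-style penalty."""
    penalty = 0.0
    for name, param in local_model.named_parameters():
        if name in fisher_diag:
            penalty = penalty + (fisher_diag[name] * (param - global_params[name]) ** 2).sum()
    return task_loss + 0.5 * lam * penalty
```

Following the description above, fisher_diag would be computed on the server's proxy dataset and distributed to clients alongside the global parameters.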
and restored a specified number of previous gradients from the tasks most dissimilar to the current task's gradient, based on the weight-based pruning technique, to prevent catastrophic forgetting. Then, a gradient integrator is designed to mitigate negative knowledge transfer and improve overall model performance by incorporating gradients from both before and
after aggregation. It is worth noting that FedKNOW as a client-side solution is more scalable than
FedWeIT as a server-side solution for FCCL due to the lower communication cost.
an ‘old model’ to aid the currently updating ‘new model’. Although learning without forgetting
(LwF) [47] was the first successful application of KD in CCL, it cannot be directly applied to the federated learning framework due to the centralized nature of CCL. To address this, Usmanova et al. [48, 49] introduced FLwF to recognize human activities based on all incrementally seen behaviour classes from local clients, evaluated on the six activity classes of the UCI HAR dataset. FLwF, which first extended KD to the federated setting, implements the standard LwF method in FCL, using the past model of a client as the teacher and the current client model as the student. Additionally, FLwF-2T, which uses two teacher models, namely the past model of a client and the server model, was proposed to reduce forgetting in FCCL by leveraging a server that maintains a general knowledge base across all clients' class distributions.
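The snippet below sketches the two-teacher idea: the student's loss combines the usual cross-entropy on new local data with distillation terms from the client's past model and from the server model. The weighting scheme and temperature are illustrative assumptions rather than the exact FLwF-2T loss.

```python
import torch.nn.functional as F

def distill(student_logits, teacher_logits, T=2.0):
    """Standard softened knowledge-distillation term [46]."""
    return F.kl_div(F.log_softmax(student_logits / T, dim=1),
                    F.softmax(teacher_logits / T, dim=1),
                    reduction="batchmean") * T ** 2

def two_teacher_loss(student_logits, labels, past_client_logits, server_logits,
                     alpha=0.5, beta=0.5):
    """Illustrative FLwF-2T-style objective: cross-entropy on the current local data
    plus distillation from the client's past model and from the server model."""
    ce = F.cross_entropy(student_logits, labels)
    return (ce + alpha * distill(student_logits, past_client_logits)
               + beta * distill(student_logits, server_logits))
```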
Ma et al. [15] proposed continual federated learning with distillation (CFeD), which performs
KD at both client and server levels, uniquely featuring an independent unlabeled surrogate dataset
for each client. Specifically, it introduces a client division mechanism to utilize under-exploited
computational resources, aiding in reducing inter-task forgetting. Additionally, inspired by the
mini-batch iterative update approach in centralized training, server-side distillation is designed
to alleviate intra-task forgetting. In their class continual learning experimental scenarios, CFeD
outperforms other baselines, demonstrating the advantage of using the surrogate dataset to obtain
reasonable soft labels for old tasks.
Wei and Li [50] developed the federated learning with knowledge lock (FedKL) to tackle the
issue of catastrophic forgetting in federated learning, particularly the loss of knowledge from other
participants due to local updates. FedKL utilizes KD techniques to preserve previously acquired
knowledge while overcoming server knowledge forgetting caused by data isolation.
Efforts to expand FCCL into areas beyond computer vision, such as intrusion detection [51], are
emerging. Jin et al. [51] introduced FL-IIDS to solve catastrophic forgetting in federated intrusion
detection systems (IDS). However, this approach simplifies the challenge by assuming that traffic
data across local clients in FL-IIDS is IID. They identify three key issues in real-world intrusion
detection: (1) class imbalance in various traffic data types, (2) a predominance of new over old
classes in current tasks, leading to a bias towards new knowledge, and (3) a shrinking sample size of old classes in the dynamic example memory, weakening the ability to retain old classes. To combat
these, they proposed dynamic example memory, class gradient balancing loss, and sampling label
smoothing loss, respectively. Notably, their KD strategy, termed label smoothing loss, incorporates
soft labels of old classes into current training, enhancing the model’s generalization over old classes
and mitigating local model forgetting.
• The impact of GLFC [16] on subsequent research and the integration of techniques from
different fields are evident.
• The impact of LwF [47], iCaRL [36] and APD [39] upon the field of FCCL is huge.
• With the rapid development of generative models, GAN-based techniques have emerged in replay-based FCCL, which can satisfy users' data privacy protection needs in a data-free manner.
• Benefiting from the advanced representation and transfer capabilities of foundation models, innovative FCL methods incorporating well-pre-trained models, such as ViT-based methods, have progressively emerged [43, 45, 52].
These approaches offer various solutions and pivotal insights to address the challenges encountered in FCCL, which is still in its nascent stage.
[Figure 2 diagram: starting from vanilla FL methods, the relations among FCCL methods are depicted, including Fed-CPrompt, FedCIL, GLFC, DuAFed, WSM, FLwF & FLwF-2T, FedWeIT, Cross-FCL, HePCo, MFCL, FL-IIDS, FCILPT, TARGET, CFeD, FedKL, FedKNOW, LGA, and FBL, grouped into prompt-based, generative replay, parameter regularization, knowledge distillation, and parameter decomposition categories.]
Fig. 2. Diagram of the relation among FCCL methods. There are five categories in our paper: generative replay
(purple), parameter regularization (yellow), parameter decomposition (orange), prompt-based methods (green)
and knowledge distillation (blue). Auxiliary datasets, the data-free manner with distillation and rehearsal-free
methods, frequently employed in some methods, are indicated by the dashed boxes.
to the distribution of datasets. Traditional continual learning focuses on the dynamics of individual
domains, while the integration of FL further allows each client with its private dataset to be treated
as a separate domain. Therefore, FDCL research focuses on the generalization of different domains
and the dynamic adaptation of individual domains.
To provide a clear understanding of the problem definition of FDCL, we first conceptualize it as
follows. For a given period $[0, T]$ and $K$ clients engaged in FL, we assume that the $k$th client contains a set of sample $x$ and label $y$ pairs $D_k^t = \{(x_i^t, y_i^t)\}_{i=1}^{|K|}$ at a given time $t$. It is worth noting that there may be unlabeled samples in some clients, but all samples belong to known classes. Multiple clients in FL form a global known domain $D_g^t = \{D_1^t, D_2^t, \ldots, D_K^t\}$. Subsequently, local training and aggregation are performed in the classical FL paradigm. In this way, each client constructs a local model $f_{\theta_k}: X_k \to Y_k$ with its private dataset, which is aggregated by the server to generate a comprehensive global model $f_{\theta_g}: X_g \to Y_g$ after multiple rounds of communication. In FDCL scenarios, the global model $f_{\theta_g}$ not only serves as a representation of the known domain $D_g^t$ but also can be used to generalize to the unknown domain $D_{unk}^t$. Furthermore, the local domain changes over time, which means $D_k^t \neq D_k^{t+d}$ after a time interval $d$. Moreover, in FDCL, the task identity is
not necessary during testing, because if each task has the same classes, the output would be the
same as well.
As illustrated in Fig. 3, FDCL faces two unique challenges:
• Challenge 1: distributed multi-source domain generalization under privacy protection: In the distributed environment of FL, each client constitutes a separate domain.
This diversity significantly increases the challenges associated with the global model gener-
alization. Moreover, the commitment to protecting privacy results in data isolation, which
intensifies the complexity of learning and optimizing the global model.
• Challenge 2: unknown domain generalization and known domain drift: In the context
of FDCL, the global model also needs to extend its generalization capabilities beyond the
multi-source domains to cover unknown domains. Furthermore, dynamic data changes in
continual learning result in known domain drift, so models need to have the ability to learn
and adapt efficiently in a time-evolving and uncertain data environment.
Overall, these two challenges outline the balance between maintaining data privacy and improv-
ing model generalization capabilities in a changing environment. According to the different types
Fig. 3. Overall diagram of challenges faced by FDCL, challenge 1: privacy protection of multi-source domains
and domain drift of the global model concerning the local model (inter-domain); challenge 2: generalization
of the global model for the unknown domain, and domain drift of locally known domains over time and data
(intra-domain).
of approaches, current studies on FDCL can be divided into four main areas: Domain Data Sup-
plementation (Section 3.1), Domain Knowledge Learning (Section 3.2), Domain Model Enhancement
(Section 3.3), and Domain Weight Aggregation (Section 3.4). Various research approaches and the
key contributions of FDCL are detailed in Table 2. In the subsequent subsections, we thoroughly
examine the approaches associated with these four areas.
Instead of synthesizing data locally, Liu et al. [54] trained a data generator on the central server.
The server initially collects data from each client as a constant reference point to train the data
generator. Then, the trained generator and global model are broadcast to each client to produce
synthetic data. Furthermore, a mechanism of variable weights is also introduced to alleviate the
imbalance in the number of local classes across various clients. Although the above method of
supplementing data from other clients can improve the generalization ability of the global model,
there is still some risk of privacy leakage.
To further improve the security of proxy datasets, Park et al. [55] introduced variational embedding rehearsal (VER) and server-side training (SST) strategies. On one hand, the authors used the VER method, which combines the security advantages of the variational autoencoder (VAE) and embedding-based reformulation (EBR) by generating random representations of a subset of data from each client. On the other hand, the SST strategy facilitates training by rehearsing the data representations that have been safely collected from each client, avoiding direct access to the original dataset.
Supplementing old data from local client. To address the issue of individual domain drift
over time in FDCL, Casado et al. [56–58] utilized different methods to detect and adapt to local
client domain drift. For domain drift detection, the authors used a CUSUM-type (Cumulative Sum)
method based on a beta distribution. Building upon the original method, they proposed a sliding
window technique to detect changes in the confidence distribution of local classifiers. For domain
drift adaptation, they gathered data from new domains to update the long-term storage of the local
client, thereby preserving the memory of the previous domain. Over time, this approach requires
adding more long-term storage memory locally, otherwise forgetting will still occur, which may
not be applicable in some resource-constrained scenarios.
Since it is impractical to store all the data before the local client domain drift, Zhang et al. [59]
proposed a method to periodically store a generalized representation of the local client data, while
taking advantage of dynamically changing data to update models with new domain knowledge. In
addition, the central server also merges new domains from different clients based on the relevance
of the spatial and temporal dimensions.
Intra-domain knowledge learning. To cope with the problem of domain drift in newly
collected data, Huang et al. [63] proposed incremental unsupervised adversarial domain adaptation (IUADA), which merges FL and adversarial learning. This method aims to transfer knowledge from the
local target domain to the model learned from labelled data. In particular, the local target feature
extractor and discriminator are alternately trained through adversarial learning, separating source
features into positive and negative. Meanwhile, the gradient of the prediction score serves as an
attention weight to obtain distinctive features, which are aligned with local domain features to
adapt domain drift. This adversarial learning approach increases computational complexity and
resource requirements, especially for local clients with limited resources.
Regarding local domain drift in dynamic environments, Guo et al. [62] modelled the time-evolving domain drift of local clients. Their theoretical analysis reveals that the convergence rate of the method in time-evolving scenarios is related to the approximation accuracy.
Moreover, Chen and Xu [64] introduced a dynamic update mechanism that leverages new weights
to adjust the parameters of the output classifier. This mechanism allows the model to seamlessly
integrate information from recently acquired data while preserving previously learned knowledge.
FCL, which is a state-of-the-art method for training recurrent neural networks (RNN) in dynamic
environments. They slowed down weight forgetting by fixing the weights of hidden layers in the
RNN and training multiple competing prediction heads simultaneously. Mori et al. [70] split the
neural network for each client into a unique feature extraction component and a common feature
extraction component. The authors regarded the local training as learning a unique task without
forgetting the knowledge of a common task, thus introducing the progressive neural network
(PNN) as the continual learning method in their solution. Le et al. [71] mitigated the catastrophic
forgetting and adapted to environmental changes by broad learning (BL), which supports CL without
retraining each client for new data. Moreover, they designed a weighted processing strategy and a
batch-asynchronous technique to support accurate and fast training. This asynchronous update
method combined with BL can decouple local training from the knowledge of the global model.
Zhu et al. [72] proposed the SOINN-RBF method, which effectively combines radial basis function
(RBF) networks and self-organizing incremental neural networks (SOINN). This method aims
to optimize data labelling management and real-time sample domain adaptation through high
dimensional spatial mapping to improve data regularity identification and generalization. Han
et al. [73] presented an incremental tree model construction method based on very fast decision
tree (VFDT) for efficiently handling domain drift. They developed a lightweight practically order-
preserving encoding (POPE) method, which replaces complex encryption algorithms while reducing
computational and communication burdens. Additionally, they adapted a region-counting method
to effectively reduce the memory overhead of POPE.
collaborative semi-supervised prediction tasks by merging global and local models and incorporates
probabilistic client-server consistency techniques to address domain drift.
Yao et al. [78] introduced a graph-aided federated learning (GAFL) approach with a few-shot
node inhibition mechanism to improve the generalization capability of global models. GAFL designs
collaborative graphs at the pair-wise and category-wise levels to describe the relationships among customers and distinguish different data distributions. A continual learning approach is tailored to new clients, limiting graph and model updates to a smaller scope, thus minimizing the disruption to the original model domain.
participating clients in the next training round. In FTCL, the task identity is clearly provided during
learning and testing, so the model can be trained and performed by referring to the task-specific
components.
Under this FTCL scenario, there are two main challenges that need to be solved after each client
updates its local model with the global parameter 𝜃𝐺 to obtain the cross-client task knowledge:
• Challenge 1: Catastrophic forgetting happens due to the insufficient training data of tasks from other clients in $TD_{c_i}$.
• Challenge 2: The performance of local clients degrades as client and task heterogeneity
increases, causing local model training to update its parameters in the wrong direction.
Recently, many studies have been conducted to provide different methods to solve these chal-
lenges in FTCL. In the following subsections, we will present an elaborated taxonomy of represen-
tative federated task continual learning methods as illustrated in Fig. 4, analyzing extensively their
main motivations, proposed solutions, and related evaluations.
Fig. 4. The elaborated taxonomy of representative federated task continual learning methods.
and the online determination module (ODM), to evaluate the model performance, determine the
inter-institutional training order and adjust transmission costs in real time.
Chaudhary et al. [81] applied FCL in text classification to minimize catastrophic forgetting,
maximize the inter-client transfer learning and minimize inter-client interference by proposing a
framework called federated selective inter-client transfer (FedSeIT). FedSeIT uses parameter decom-
position methods to decompose each client’s model parameters into three different parameter sets
to access task-adaptive parameters better and selectively leverage task-specific knowledge. Specifi-
cally, the dense local base parameters capture the task-generic knowledge across clients. Sparse
task-adaptive parameters capture task-specific knowledge for each task. Sparse mask parameters
selectively utilize the global knowledge. The authors also proposed a task selection strategy named
selective inter-client transfer (SIT). SIT is designed for efficient assessment of domain overlap
at the global server using encoded data representations and selection of relevant task-adaptive
parameters of foreign clients without sharing data, therefore preserving privacy while keeping
the performance. In evaluation, they used five datasets with unique labels as the FTCL scenario to
demonstrate the effectiveness compared with the baseline method.
Zhang et al. [41] proposed a parameter decomposition-based FCL framework named Cross-FCL.
Cross-FCL uses additive parameter decomposition to separate knowledge of the local model into
base parameters for common knowledge and task-specific parameters for personalized knowledge
of the current local task to minimize the interference between federated learning and continual
learning. The authors also introduced cross-edge strategies on biased global aggregation and local
optimization, which helps reduce memory and computation costs as well as balancing memory
usage and adaptation trade-offs. The authors built a testbed for multi-edge federated learning on
real-world image recognition datasets and other public datasets that are divided into different
disjoint sub-datasets as local task datasets in FTCL settings to demonstrate the effectiveness of the
proposed Cross-FCL framework compared with the baseline.
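Methods in this family share the idea of additively decomposing each client's weights; a generic form, written here with illustrative symbols in the spirit of APD [39], FedWeIT [40], FedSeIT, and Cross-FCL (each of which has its own exact formulation), is

\[
\theta_{m_k}^{t} = \theta_{\text{base}} \odot m_{k}^{t} + \tau_{k}^{t},
\]

where $\theta_{\text{base}}$ captures task-generic knowledge shared across clients, $m_{k}^{t}$ is a (typically sparse) mask selecting the part of that shared knowledge relevant to task $t$, and $\tau_{k}^{t}$ holds sparse task-adaptive parameters; in several of these methods, only selected components (e.g., the sparse task-adaptive parameters) are communicated for cross-client knowledge transfer.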
FedViT addresses the challenges of catastrophic forgetting, negative knowledge transfer, and
scalability issues in FCL under FTCL settings. By considering the limited storage and computation
capabilities of edge devices, FedViT utilises a small number of samples from each task to improve
the performance against the above challenges. It proposes a knowledge extractor that retains critical
knowledge from past tasks using a small subset of samples, a gradient restorer that converts this
knowledge into gradients to help the model recover past task knowledge quickly, and a gradient
integrator that ensures the combination of new and old task gradients does not lead to a loss in
accuracy for any task.
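As a rough illustration of how stored old-task gradients can constrain the current update, the sketch below projects the new gradient whenever it conflicts with a reference gradient aggregated from past tasks, in the spirit of (A-)GEM; FedViT's actual gradient restorer and integrator follow their own combination rule, so this is only an analogy.

```python
import torch

def integrate_gradients(new_grad, old_grad):
    """If the current-task gradient conflicts with the stored old-task gradient
    (negative inner product), project it onto the non-conflicting direction.
    Both arguments are flattened gradient vectors; A-GEM-style rule, for illustration."""
    dot = torch.dot(new_grad, old_grad)
    if dot < 0:
        new_grad = new_grad - (dot / torch.dot(old_grad, old_grad)) * old_grad
    return new_grad
```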
vehicle-to-everything networks and then applied continual learning settings in their decentralised
consensus-driven federated learning method.
To improve the safety of ITS, Yuan et al. [90] utilised federated continual learning for naturalistic
driving action recognition to prevent driver distraction, reduce the risk of traffic accidents, and
alleviate the privacy concerns caused by in-cabin cameras.
Under a similar scenario, Guo et al. [91] targeted the dynamics and heterogeneity challenges
within real-world driver distraction detection and proposed a cost-efficient mechanism ICMFed by
integrating incremental learning, meta-learning and federated learning to improve the efficiency
and safety of intelligent transportation systems.
In the cognitive Internet of Things, modulation classification is an essential enabler for primary
user detection and signal recognition. To process a large amount of heterogeneously cognitive IoT
data in a distributed mechanism, Qi et al. [97] proposed a federated continual learning method
with knowledge distillation to learn the modulation classification knowledge of private classes in
each local device. Similar to [70], they divided the training into two phases, i.e., a warm-up phase for global model learning and a customised incremental learning phase for client model learning.
Yang et al. [98] proposed a federated continual learning framework with an asynchronous
semi-supervised training algorithm. Their proposed FedIL framework can help open platform
applications such as IoT to prevent deep learning models from forgetting the learned information
of labelled data and accelerate the convergence of the global model during training.
5.4 UAVs
He et al. [99] combined a stacked board learning system with federated continual learning to
accommodate the increment of input data and enhancement nodes in UAV systems. Their proposed
model can effectively relieve the catastrophic forgetting problem generated by dynamic data
collection, and improve the accuracy of intrusion detection with low computational cost.
Failure detection is also an essential module in UAV networks for swarm-based drone delivery services. To efficiently utilise the energy of UAVs and the knowledge learned from old drone flight history, Alkouz et al. [100] proposed a weighted continual federated learning method that incrementally allocates different weights to balance the importance of old and new drone flight data, performing failure prediction at the source or when the drones land at intermediate nodes.
5.8 Robotics
Obstacle avoidance is a critical and essential function in autonomous mobile robot development.
Robots need the capability to continually learn obstacle-avoidance models, much like humans. Yu et al. [105] proposed a federated continual learning-empowered obstacle avoidance approach covering data collection, model training, and model sharing.
In the domain of socially aware robotics, Guerdan et al. [106] proposed a framework that enables
robots to personalize their settings for new individuals or groups based on FCL. They introduced
four key components as evaluation metrics for the decentralized robot learning framework: adapta-
tion quality, adaptation time, knowledge sharing, and model overhead. Moreover, they developed
an Elastic Transfer method based on importance regularization, which facilitates retaining rele-
vant parameters across multiple robots, thereby enhancing knowledge sharing among robots and
improving both the quality and speed of adaptation.
in FL, and performance across current, past, and future tasks in CL. The specific metrics vary,
with some studies using averaged accuracy to assess forgetting [15, 16] and others employing
forward and backward transfer metrics [17, 40, 49]. Apart from forgetting, future FCL research
should consider the inherent variability across clients more thoroughly, particularly concerning
constraints in computational capacity, energy and memory. Additionally, the characteristics of
data resources, including non-IID data distribution, sample quantity, and class imbalance, demand
significant attention, especially in the context of Edge AI. Consequently, the formulation of diverse
metrics, meticulously designed to encapsulate these specific client-side factors, is indispensable for
the nuanced evaluation and advancement of FCL.
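For concreteness, the accuracy-based metrics mentioned above can be computed from an accuracy matrix R, where R[i, j] is the accuracy on task j after training up to task i; the sketch below uses the commonly cited definitions of average accuracy, backward transfer (a negative value indicates forgetting), and forward transfer, although individual FCL papers sometimes adapt them.

```python
import numpy as np

def cl_metrics(R, b=None):
    """R: (T, T) array, R[i, j] = accuracy on task j after learning up to task i.
    b: optional length-T array with the accuracy of a randomly initialised model on
    each task (needed for forward transfer). Commonly used continual-learning metrics."""
    T = R.shape[0]
    avg_acc = R[-1, :].mean()                                  # average accuracy after the final task
    bwt = np.mean([R[-1, j] - R[j, j] for j in range(T - 1)])  # backward transfer; negative = forgetting
    fwt = np.mean([R[j - 1, j] - b[j] for j in range(1, T)]) if b is not None else None
    return avg_acc, bwt, fwt
```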
3) User-friendly and modular frameworks. Existing frameworks and libraries such as FATE
[126], PySyft [127], TFF [128], Flower [129], FedML [130], FederatedScope [131] and Avalanche [132]
have significantly facilitated FL and CL research. All of these tools are open-source, accompanied
by comprehensive documentation, and support effortless, customized modular implementation in
practice, owing to their plug-and-play nature. Nevertheless, we firmly believe that crafting a user-
friendly and modular framework stands as a fundamental and advantageous initiative to foster the
FCL community for collaborative and sustainable growth. To this end, it is more efficient to introduce
CL-empowered and FL-enabled plug-ins for existing FL and CL frameworks, respectively, rather
than starting from scratch. Alternatively, developing a streamlined and lightweight framework
dedicated to FCL presents another viable strategy.
may lack a clear understanding of how clients’ contributions affect the global model considering
complex aggregation mechanisms under a decentralised framework. By further exploring the
synergistic consolidation between explainable FL and CL, we can enable more effective, secure,
transparent and trustworthy FCL model development. This synergistic consolidation for explainable
FCL has the potential to facilitate the deployment of secure Edge-AI systems that are not only
powerful but also ethically responsible.
2) Scalability. Scalability challenges also arise as more clients continually join federated training, making the process more heterogeneous and less efficient and making it increasingly difficult for the global model to accurately generate and efficiently communicate meaningful and consistent explanations. Therefore, addressing scalability when enhancing explainability for FCL in large-scale Edge-AI scenarios also requires further exploration.
In short, explainable FCL will be an essential, challenging, but highly rewarding research direction,
helping to accelerate the development of various robust and reliable applications of Edge AI.
7 CONCLUSION
Edge-AI is an emerging and rapidly developing area. To ensure the performance of Edge-AI
applications when handling various devices and evolving data at the edge, federated continual
learning emerges to provide sustained adaptability and stable performance for learning models over
time. In this paper, we are the first to conduct an extensive and comprehensive survey on federated
continual learning for Edge-AI and categorize three scenarios for federated continual learning based
on different task characteristics: federated class continual learning, federated domain continual
learning, and federated task continual learning. We thoroughly summarised the background,
challenges, problem formalisation, advanced solutions, and limitations of each scenario. We also
provide a review and summary of nine real-world applications empowered by federated continual learning. In addition, we highlighted four open research challenges and proposed prospective
directions. We hope this survey will inspire the research community to accelerate the progress of
improving federated continual learning for Edge-AI.
REFERENCES
[1] Shi Dong, Ping Wang, and Khushnood Abbas. A survey on deep learning and its applications. Computer Science
Review, 40:100379, 2021.
[2] Yiping Zuo, Jiajia Guo, Ning Gao, Yongxu Zhu, Shi Jin, and Xiao Li. A survey of blockchain and artificial intelligence
for 6g wireless communications. IEEE Communications Surveys & Tutorials, 2023.
[3] Andre Esteva, Katherine Chou, Serena Yeung, Nikhil Naik, Ali Madani, Ali Mottaghi, Yun Liu, Eric Topol, Jeff Dean,
and Richard Socher. Deep learning-enabled medical computer vision. NPJ digital medicine, 4(1):5, 2021.
[4] Sampo Kuutti, Richard Bowden, Yaochu Jin, Phil Barber, and Saber Fallah. A survey of deep learning applications to
autonomous vehicle control. IEEE Transactions on Intelligent Transportation Systems, 22(2):712–733, 2020.
[5] Nicolae Sapoval, Amirali Aghazadeh, Michael G Nute, Dinler A Antunes, Advait Balaji, Richard Baraniuk, CJ Barberan,
Ruth Dannenfelser, Chen Dun, Mohammadamin Edrisi, et al. Current progress and open challenges for applying
deep learning across the biosciences. Nature Communications, 13(1):1728, 2022.
[6] Brendan McMahan, Eider Moore, Daniel Ramage, Seth Hampson, and Blaise Aguera y Arcas. Communication-efficient
learning of deep networks from decentralized data. In Artificial intelligence and statistics, pages 1273–1282. PMLR,
2017.
[7] Chen Zhang, Yu Xie, Hang Bai, Bin Yu, Weihong Li, and Yuan Gao. A survey on federated learning. Knowledge-Based
Systems, 216:106775, 2021.
[8] Nicola Rieke, Jonny Hancox, Wenqi Li, Fausto Milletari, Holger R Roth, Shadi Albarqouni, Spyridon Bakas, Mathieu N
Galtier, Bennett A Landman, Klaus Maier-Hein, et al. The future of digital health with federated learning. NPJ digital
medicine, 3(1):1–7, 2020.
[9] Hangyu Zhu, Jinjin Xu, Shiqing Liu, and Yaochu Jin. Federated learning on non-iid data: A survey. Neurocomputing,
465:371–390, 2021.
[10] Pian Qi, Diletta Chiaro, Antonella Guzzo, Michele Ianni, Giancarlo Fortino, and Francesco Piccialli. Model aggregation
techniques in federated learning: A comprehensive survey. Future Generation Computer Systems, 2023.
[11] Viraaji Mothukuri, Reza M Parizi, Seyedamin Pouriyeh, Yan Huang, Ali Dehghantanha, and Gautam Srivastava. A
survey on security and privacy of federated learning. Future Generation Computer Systems, 115:619–640, 2021.
[12] Latif U Khan, Shashi Raj Pandey, Nguyen H Tran, Walid Saad, Zhu Han, Minh NH Nguyen, and Choong Seon Hong.
Federated learning for edge networks: Resource optimization and incentive mechanism. IEEE Communications
Magazine, 58(10):88–93, 2020.
[13] Liyuan Wang, Xingxing Zhang, Hang Su, and Jun Zhu. A comprehensive survey of continual learning: Theory,
method and application. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2024.
[14] Marcos F Criado, Fernando E Casado, Roberto Iglesias, Carlos V Regueiro, and Senén Barro. Non-iid data and continual
learning processes in federated learning: A long road ahead. Information Fusion, 88:263–280, 2022.
[15] Yuhang Ma, Zhongle Xie, Jue Wang, Ke Chen, and Lidan Shou. Continual federated learning based on knowledge
distillation. In IJCAI, pages 2182–2188, 2022.
[16] Jiahua Dong, Lixu Wang, Zhen Fang, Gan Sun, Shichao Xu, Xiao Wang, and Qi Zhu. Federated class-incremental
learning. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 10164–10173,
2022.
[17] Jie Zhang, Chen Chen, Weiming Zhuang, and Lingjuan Lyu. Target: Federated class-continual learning via exemplar-
free distillation. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 4782–4793,
2023.
[18] Donald Shenaj, Marco Toldo, Alberto Rigon, and Pietro Zanuttigh. Asynchronous federated continual learning. In
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 5054–5062, 2023.
[19] Tuo Zhang, Lei Gao, Chaoyang He, Mi Zhang, Bhaskar Krishnamachari, and A Salman Avestimehr. Federated learning
for the internet of things: Applications, challenges, and opportunities. IEEE Internet of Things Magazine, 5(1):24–29,
2022.
[20] Mang Ye, Xiuwen Fang, Bo Du, Pong C Yuen, and Dacheng Tao. Heterogeneous federated learning: State-of-the-art
and research challenges. ACM Computing Surveys, 56(3):1–44, 2023.
[21] Gido M van de Ven, Tinne Tuytelaars, and Andreas S Tolias. Three types of incremental learning. Nature Machine
Intelligence, 4(12):1185–1197, 2022.
[22] Matthias De Lange, Rahaf Aljundi, Marc Masana, Sarah Parisot, Xu Jia, Aleš Leonardis, Gregory Slabaugh, and Tinne
Tuytelaars. A continual learning survey: Defying forgetting in classification tasks. IEEE transactions on pattern
analysis and machine intelligence, 44(7):3366–3385, 2021.
[23] Marc Masana, Xialei Liu, Bartłomiej Twardowski, Mikel Menta, Andrew D Bagdanov, and Joost Van De Weijer.
Class-incremental learning: survey and performance evaluation on image classification. IEEE Transactions on Pattern
Analysis and Machine Intelligence, 45(5):5513–5533, 2022.
[24] Xin Yang, Hao Yu, Xin Gao, Hao Wang, Junbo Zhang, and Tianrui Li. Federated continual learning via knowledge
fusion: A survey. arXiv preprint arXiv:2312.16475, 2023.
[25] Da-Wei Zhou, Qi-Wei Wang, Zhi-Hong Qi, Han-Jia Ye, De-Chuan Zhan, and Ziwei Liu. Deep class-incremental
learning: A survey, February 2023.
[26] Othmane Marfoq, Giovanni Neglia, Laetitia Kameni, and Richard Vidal. Federated Learning for Data Streams. In
Proceedings of The 26th International Conference on Artificial Intelligence and Statistics, pages 8889–8924. PMLR, April
2023.
[27] Sean M. Hendryx, Dharma Raj KC, Bradley Walls, and Clayton T. Morrison. Federated Reconnaissance: Efficient,
Distributed, Class-Incremental Learning, August 2021.
[28] Ian J. Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville,
and Yoshua Bengio. Generative Adversarial Networks, June 2014.
[29] Daiqing Qi, Handong Zhao, and Sheng Li. Better generative replay for continual federated learning. In The Eleventh
International Conference on Learning Representations, 2022.
[30] Sara Babakniya, Zalan Fabian, Chaoyang He, Mahdi Soltanolkotabi, and Salman Avestimehr. Don’t memorize; mimic
the past: Federated class incremental learning without episodic memory. In Federated Learning and Analytics in
Practice: Algorithms, Systems, Applications, and Opportunities, 2023.
[31] Sara Babakniya, Zalan Fabian, Chaoyang He, Mahdi Soltanolkotabi, and Salman Avestimehr. A data-free approach to
mitigate catastrophic forgetting in federated class incremental learning for vision tasks. In Thirty-seventh Conference
on Neural Information Processing Systems, 2023.
[32] Jiahua Dong, Yang Cong, Gan Sun, Yulun Zhang, Bernt Schiele, and Dengxin Dai. No one left behind: Real-world
federated class-incremental learning. arXiv preprint arXiv:2302.00903, 2023.
[33] Jiahua Dong, Duzhen Zhang, Yang Cong, Wei Cong, Henghui Ding, and Dengxin Dai. Federated incremental semantic
segmentation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 3934–3943,
2023.
[34] Gwen Legate, Lucas Caccia, and Eugene Belilovsky. Re-weighted softmax cross-entropy to control forgetting in
federated learning. arXiv preprint arXiv:2304.05260, 2023.
[35] Kai Hu, Meixia Lu, Yaogen Li, Sheng Gong, Jiasheng Wu, Fenghua Zhou, Shanshan Jiang, and Yi Yang. A federated
incremental learning algorithm based on dual attention mechanism. Applied Sciences, 12(19):10025, 2022.
[36] Sylvestre-Alvise Rebuffi, Alexander Kolesnikov, Georg Sperl, and Christoph H. Lampert. ICaRL: Incremental classifier
and representation learning. In 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages
5533–5542, Honolulu, HI, July 2017. IEEE.
[37] Xin Yao and Lifeng Sun. Continual local training for better initialization of federated models. In 2020 IEEE International
Conference on Image Processing (ICIP), pages 1736–1740. IEEE, 2020.
[38] James Kirkpatrick, Razvan Pascanu, Neil Rabinowitz, Joel Veness, Guillaume Desjardins, Andrei A. Rusu, Kieran
Milan, John Quan, Tiago Ramalho, Agnieszka Grabska-Barwinska, Demis Hassabis, Claudia Clopath, Dharshan
Kumaran, and Raia Hadsell. Overcoming catastrophic forgetting in neural networks. Proceedings of the National
Academy of Sciences, 114(13):3521–3526, March 2017.
[39] Jaehong Yoon, Saehoon Kim, Eunho Yang, and Sung Ju Hwang. Scalable and order-robust continual learning with
additive parameter decomposition. In International Conference on Learning Representations, 2019.
[40] Jaehong Yoon, Wonyong Jeong, Giwoong Lee, Eunho Yang, and Sung Ju Hwang. Federated continual learning with
weighted inter-client transfer. In International Conference on Machine Learning, pages 12073–12086. PMLR, 2021.
[41] Zhouyangzi Zhang, Bin Guo, Wen Sun, Yan Liu, and Zhiwen Yu. Cross-fcl: Toward a cross-edge federated continual
learning framework in mobile edge computing systems. IEEE Transactions on Mobile Computing, 2022.
[42] Yaxin Luopan, Rui Han, Qinglong Zhang, Chi Harold Liu, Guoren Wang, and Lydia Y Chen. Fedknow: Federated
continual learning with signature task knowledge integration at edge. In 2023 IEEE 39th International Conference on
Data Engineering (ICDE), pages 341–354. IEEE, 2023.
[43] Shaunak Halbe, James Seale Smith, Junjiao Tian, and Zsolt Kira. Hepco: Data-free heterogeneous prompt consolidation
for continual federated learning. arXiv preprint arXiv:2306.09970, 2023.
[44] Gaurav Bagwe, Xiaoyong Yuan, Miao Pan, and Lan Zhang. Fed-CPrompt: Contrastive Prompt for Rehearsal-Free
Federated Continual Learning, September 2023.
[45] Jiale Liu, Yu-Wei Zhan, Chong-Yu Zhang, Xin Luo, Zhen-Duo Chen, Yinwei Wei, and Xin-Shun Xu. Federated
class-incremental learning with prompting. arXiv preprint arXiv:2310.08948, 2023.
[46] Geoffrey Hinton, Oriol Vinyals, and Jeff Dean. Distilling the Knowledge in a Neural Network, March 2015.
[47] Zhizhong Li and Derek Hoiem. Learning without Forgetting. IEEE Transactions on Pattern Analysis and Machine
Intelligence, 40(12):2935–2947, December 2018.
[48] Anastasiia Usmanova, François Portet, Philippe Lalanda, and German Vega. Federated continual learning through
distillation in pervasive computing. In 2022 IEEE International Conference on Smart Computing (SMARTCOMP), pages
86–91. IEEE, 2022.
[49] Anastasiia Usmanova, François Portet, Philippe Lalanda, and German Vega. A distillation-based approach integrating
continual learning and federated learning for pervasive services. arXiv preprint arXiv:2109.04197, 2021.
[50] Guoyizhe Wei and Xiu Li. Knowledge lock: Overcoming catastrophic forgetting in federated learning. In Pacific-Asia
Conference on Knowledge Discovery and Data Mining, pages 601–612. Springer, 2022.
[51] Zhigang Jin, Junyi Zhou, Bing Li, Xiaodong Wu, and Chenxu Duan. Fl-iids: A novel federated learning-based
incremental intrusion detection system. Future Generation Computer Systems, 151:57–70, 2024.
[52] Chenghao Liu, Xiaoyang Qu, Jianzong Wang, and Jing Xiao. Fedet: a communication-efficient federated class-
incremental learning framework based on enhanced transformer. In Proceedings of the Thirty-Second International
Joint Conference on Artificial Intelligence, pages 3984–3992, 2023.
[53] Quande Liu, Cheng Chen, Jing Qin, Qi Dou, and Pheng-Ann Heng. Feddg: Federated domain generalization on
medical image segmentation via episodic learning in continuous frequency space. In Proceedings of the IEEE/CVF
Conference on Computer Vision and Pattern Recognition, pages 1013–1023, 2021.
[54] Shunjian Liu, Xinxin Feng, and Haifeng Zheng. Overcoming forgetting in local adaptation of federated learning
model. In Pacific-Asia Conference on Knowledge Discovery and Data Mining, pages 613–625. Springer, 2022.
[55] Tae Jin Park, Kenichi Kumatani, and Dimitrios Dimitriadis. Tackling dynamics in federated incremental learning
with variational embedding rehearsal. arXiv preprint arXiv:2110.09695, 2021.
[56] Fernando E Casado, Dylan Lema, Roberto Iglesias, Carlos V Regueiro, and Senén Barro. Federated and continual
learning for classification tasks in a society of devices. arXiv preprint arXiv:2006.07129, 2020.
[57] Fernando E Casado, Dylan Lema, Marcos F Criado, Roberto Iglesias, Carlos V Regueiro, and Senén Barro. Concept
drift detection and adaptation for federated and continual learning. Multimedia Tools and Applications, pages 1–23,
2022.
[58] Fernando E Casado, Dylan Lema, Roberto Iglesias, Carlos V Regueiro, and Senén Barro. Ensemble and continual
federated learning for classification tasks. Machine Learning, pages 1–41, 2023.
[59] Lei Zhang, Guanyu Gao, and Huaizheng Zhang. Spatial-temporal federated learning for lifelong person re-
identification on distributed edges. IEEE Transactions on Circuits and Systems for Video Technology, 2023.
[60] Wenke Huang, Mang Ye, and Bo Du. Learn from others and be yourself in heterogeneous federated learning. In
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 10143–10153, 2022.
[61] Ying Wang, Fengjun Shang, and Jianjun Lei. Multi-granularity fusion resource allocation algorithm based on dual-
attention deep reinforcement learning and lifelong learning architecture in heterogeneous iiot. Information Fusion,
page 101871, 2023.
[62] Yongxin Guo, Tao Lin, and Xiaoying Tang. Towards federated learning on time-evolving heterogeneous data. arXiv
preprint arXiv:2112.13246, 2021.
[63] Yan Huang, Mengxuan Du, Haifeng Zheng, and Xinxin Feng. Incremental unsupervised adversarial domain adaptation
for federated learning in iot networks. In 2022 18th International Conference on Mobility, Sensing and Networking
(MSN), pages 186–190. IEEE, 2022.
[64] Zhiyong Chen and Shugong Xu. Learning domain-heterogeneous speaker recognition systems with personalized
continual federated learning. EURASIP Journal on Audio, Speech, and Music Processing, 2023(1):33, 2023.
[65] Valerio De Caro, Claudio Gallicchio, and Davide Bacciu. Continual adaptation of federated reservoirs in pervasive
environments. Neurocomputing, 556:126638, 2023.
[66] Zhao Zhang, Yong Zhang, Da Guo, Shuang Zhao, and Xiaolin Zhu. Communication-efficient federated continual
learning for distributed learning system with non-iid data. Science China Information Sciences, 66(2):122102, 2023.
[67] Ajesh Koyatan Chathoth, Clark P Necciai, Abhyuday Jagannatha, and Stephen Lee. Differentially private federated
continual learning with heterogeneous cohort privacy. In 2022 IEEE International Conference on Big Data (Big Data),
pages 5682–5691. IEEE, 2022.
[68] Zichen Ma, Yu Lu, Wenye Li, and Shuguang Cui. Efl: Elastic federated learning on non-iid data. In Conference on
Lifelong Learning Agents, pages 92–115. PMLR, 2022.
[69] Leonard Bereska and Efstratios Gavves. Continual learning of dynamical systems with competitive federated reservoir
computing. In Conference on Lifelong Learning Agents, pages 335–350. PMLR, 2022.
[70] Junki Mori, Isamu Teranishi, and Ryo Furukawa. Continual horizontal federated learning for heterogeneous data. In
2022 International Joint Conference on Neural Networks (IJCNN), pages 1–8. IEEE, 2022.
[71] Junqing Le, Xinyu Lei, Nankun Mu, Hengrun Zhang, Kai Zeng, and Xiaofeng Liao. Federated continuous learning
with broad network architecture. IEEE Transactions on Cybernetics, 51(8):3874–3888, 2021.
[72] Meng-yuan Zhu, Zhuo Chen, Ke-fan Chen, Na Lv, and Yun Zhong. Attention-based federated incremental learning
for traffic classification in the internet of things. Computer Communications, 185:168–175, 2022.
[73] Zhaoyang Han, Chunpeng Ge, Bingzhe Wu, and Zhe Liu. Lightweight privacy-preserving federated incremental
decision trees. IEEE Transactions on Services Computing, 2022.
[74] Ruipeng Zhang, Qinwei Xu, Jiangchao Yao, Ya Zhang, Qi Tian, and Yanfeng Wang. Federated domain generalization
with generalization adjustment. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition,
pages 3954–3963, 2023.
[75] Christophe Dupuy, Tanya G Roosta, Leo Long, Clement Chung, Rahul Gupta, and Salman Avestimehr. Learnings
from federated learning in the real world. In ICASSP 2022-2022 IEEE International Conference on Acoustics, Speech and
Signal Processing (ICASSP), pages 8767–8771. IEEE, 2022.
[76] Xiaoying Wang, Zhiwei Liang, Arthur Sandor Voundi Koe, Qingwu Wu, Xiaodong Zhang, Haitao Li, and Qintai
Yang. Secure and efficient parameters aggregation protocol for federated incremental learning and its applications.
International Journal of Intelligent Systems, 37(8):4471–4487, 2022.
[77] Cobbinah B Mawuli, Jay Kumar, Ebenezer Nanor, Shangxuan Fu, Liangxu Pan, Qinli Yang, Wei Zhang, and Junming
Shao. Semi-supervised federated learning on evolving data streams. Information Sciences, page 119235, 2023.
[78] Zoujing Yao, Pengyu Song, and Chunhui Zhao. Finding trustworthy neighbors: Graph aided federated learning for
few-shot industrial fault diagnosis with data heterogeneity. Journal of Process Control, 129:103038, 2023.
[79] Yavuz Faruk Bakman, Duygu Nur Yaldiz, Yahya H Ezzeldin, and Salman Avestimehr. Federated orthogonal training:
Mitigating global catastrophic forgetting in continual federated learning. arXiv preprint arXiv:2309.01289, 2023.
[80] Hao Wang, Ruihong He, Xiaoyu Zhang, Zhaoying Bian, Dong Zeng, and Jianhua Ma. A peer-to-peer federated
continual learning network for improving ct imaging from multiple institutions. arXiv preprint arXiv:2306.02037, 2023.
[81] Yatin Chaudhary, Pranav Rai, Matthias Schubert, Hinrich Schütze, and Pankaj Gupta. Federated continual learning
for text classification via selective inter-client transfer. arXiv preprint arXiv:2210.06101, 2022.
[82] Giulio Zizzo, Ambrish Rawat, Naoise Holohan, and Seshu Tirupathi. Federated continual learning with differentially
private data sharing. In Workshop on Federated Learning: Recent Advances and New Challenges (in Conjunction with
NeurIPS 2022), 2022.
[83] Zhe Wang, Yu Zhang, Xinlei Xu, Zhiling Fu, Hai Yang, and Wenli Du. Federated probability memory recall for
federated continual learning. Information Sciences, 629:551–565, 2023.
[84] Xiaojiang Zuo, Yaxin Luopan, Rui Han, Qinglong Zhang, Chi Harold Liu, Guoren Wang, and Lydia Y. Chen. FedViT:
Federated continual learning of vision transformer at edge. Future Generation Computer Systems, 154:1–15, May 2024.
[85] Felix Schur, Parnian Kassraie, Jonas Rothfuss, and Andreas Krause. Lifelong bandit optimization: no prior and no
regret. In Uncertainty in Artificial Intelligence, pages 1847–1857. PMLR, 2023.
[86] Dongdong Li, Nan Huang, Zhe Wang, and Hai Yang. Personalized federated continual learning for task-incremental
biometrics. IEEE Internet of Things Journal, 2023.
[87] Subarnaduti Paul, Lars-Joel Frey, Roshni Kamath, Kristian Kersting, and Martin Mundt. Masked autoencoders are efficient continual federated learners. arXiv preprint arXiv:2306.03542, 2023.
[88] K Hemant Kumar Reddy, Rajat Shubhra Goswami, and Diptendu Sinha Roy. A deep learning-based smart service
model for context-aware intelligent transportation system. The Journal of Supercomputing, pages 1–23, 2023.
[89] Luca Barbieri, Stefano Savazzi, Mattia Brambilla, and Monica Nicoli. Decentralized federated learning for extended
sensing in 6g connected vehicles. Vehicular Communications, 33:100396, 2022.
[90] Liangqi Yuan, Yunsheng Ma, Lu Su, and Ziran Wang. Peer-to-peer federated continual learning for naturalistic
driving action recognition. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition,
pages 5249–5258, 2023.
[91] Zihan Guo, Linlin You, Sheng Liu, Junshu He, and Bingran Zuo. Icmfed: An incremental and cost-efficient mechanism
of federated meta-learning for driver distraction detection. Mathematics, 11(8):1867, 2023.
[92] Le Sun, Jin Wu, Yang Xu, and Yanchun Zhang. A federated learning and blockchain framework for physiological
signal classification based on continual learning. Information Sciences, 630:586–598, 2023.
[93] Yixing Huang, Christoph Bert, Stefan Fischer, Manuel Schmidt, Arnd Dörfler, Andreas Maier, Rainer Fietkau, and
Florian Putz. Continual learning for peer-to-peer federated learning: A study on automated brain metastasis
identification. arXiv preprint arXiv:2204.13591, 2022.
[94] Kehua Guo, Tianyu Chen, Sheng Ren, Nan Li, Min Hu, and Jian Kang. Federated learning empowered real-time medical
data processing method for smart healthcare. IEEE/ACM Transactions on Computational Biology and Bioinformatics,
2022.
[95] Dong Jin, Shuangwu Chen, Huasen He, Xiaofeng Jiang, Siyu Cheng, and Jian Yang. Federated incremental learning
based evolvable intrusion detection system for zero-day attacks. IEEE Network, 37(1):125–132, 2023.
[96] Martins O Osifeko, Gerhard P Hancke, and Adnan M Abu-Mahfouz. Surveilnet: A lightweight anomaly detection
system for cooperative iot surveillance networks. IEEE Sensors Journal, 21(22):25293–25306, 2021.
[97] Peihan Qi, Xiaoyu Zhou, Yuanlei Ding, Shilian Zheng, Tao Jiang, and Zan Li. Collaborative and incremental
learning for modulation classification with heterogeneous local dataset in cognitive iot. IEEE Transactions on Green
Communications and Networking, 2022.
[98] Nan Yang, Dong Yuan, Yuning Zhang, Yongkun Deng, and Wei Bao. Asynchronous semi-supervised federated
learning with provable convergence in edge computing. IEEE Network, 36(5):136–143, 2022.
[99] Xiaoqiang He, Qianbin Chen, Lun Tang, Weili Wang, Tong Liu, Li Li, Qinghai Liu, et al. Federated continuous learning
based on stacked broad learning system assisted by digital twin networks: An incremental learning approach for
[124] Andrea Cossu, Gabriele Graffieti, Lorenzo Pellegrini, Davide Maltoni, Davide Bacciu, Antonio Carta, and Vincenzo
Lomonaco. Is Class-Incremental Enough for Continual Learning? Frontiers in Artificial Intelligence, 5, 2022.
[125] Hamed Hemati, Andrea Cossu, Antonio Carta, Julio Hurtado, Lorenzo Pellegrini, Davide Bacciu, Vincenzo Lomonaco,
and Damian Borth. Class-incremental learning with repetition. In Conference on Lifelong Learning Agents, pages
437–455. PMLR, 2023.
[126] Yang Liu, Tao Fan, Tianjian Chen, Qian Xu, and Qiang Yang. FATE: An Industrial Grade Platform for Collaborative
Learning With Data Protection.
[127] Alexander Ziller, Andrew Trask, Antonio Lopardo, Benjamin Szymkow, Bobby Wagner, Emma Bluemke, Jean-Mickael
Nounahon, Jonathan Passerat-Palmbach, Kritika Prakash, Nick Rose, Théo Ryffel, Zarreen Naowal Reza, and Georgios
Kaissis. PySyft: A Library for Easy Federated Learning. In Muhammad Habib ur Rehman and Mohamed Medhat
Gaber, editors, Federated Learning Systems: Towards Next-Generation AI, Studies in Computational Intelligence, pages
111–139. Springer International Publishing, Cham, 2021.
[128] TensorFlow Federated. https://www.tensorflow.org/federated.
[129] Daniel J. Beutel, Taner Topal, Akhil Mathur, Xinchi Qiu, Javier Fernandez-Marques, Yan Gao, Lorenzo Sani, Kwing Hei Li, Titouan Parcollet, Pedro Porto Buarque de Gusmão, and Nicholas D. Lane. Flower: A friendly federated learning research framework. arXiv preprint arXiv:2007.14390, 2022.
[130] Chaoyang He, Songze Li, Jinhyun So, Xiao Zeng, Mi Zhang, Hongyi Wang, Xiaoyang Wang, Praneeth Vepakomma, Abhishek Singh, Hang Qiu, Xinghua Zhu, Jianzong Wang, Li Shen, Peilin Zhao, Yan Kang, Yang Liu, Ramesh Raskar, Qiang Yang, Murali Annavaram, and Salman Avestimehr. FedML: A research library and benchmark for federated machine learning. arXiv preprint arXiv:2007.13518, 2020.
[131] Yuexiang Xie, Zhen Wang, Dawei Gao, Daoyuan Chen, Liuyi Yao, Weirui Kuang, Yaliang Li, Bolin Ding, and Jingren
Zhou. FederatedScope: A Flexible Federated Learning Platform for Heterogeneity, November 2022.
[132] Vincenzo Lomonaco, Lorenzo Pellegrini, Andrea Cossu, Antonio Carta, Gabriele Graffieti, Tyler L. Hayes, Matthias
De Lange, Marc Masana, Jary Pomponi, Gido M. van de Ven, Martin Mundt, Qi She, Keiland Cooper, Jeremy Forest,
Eden Belouadah, Simone Calderara, German I. Parisi, Fabio Cuzzolin, Andreas S. Tolias, Simone Scardapane, Luca
Antiga, Subutai Ahmad, Adrian Popescu, Christopher Kanan, Joost van de Weijer, Tinne Tuytelaars, Davide Bacciu,
and Davide Maltoni. Avalanche: An End-to-End Library for Continual Learning. In 2021 IEEE/CVF Conference on
Computer Vision and Pattern Recognition Workshops (CVPRW), pages 3595–3605, June 2021.
[133] P. Ramya, S. Venkatesh Babu, and G. Venkatesan. Advancing cybersecurity with explainable artificial intelligence: A
review of the latest research. In 2023 5th International Conference on Inventive Research in Computing Applications
(ICIRCA), pages 1351–1357, 2023.
[134] Michael Ungersböck, Thomas Hiessl, Daniel Schall, and Florian Michahelles. Explainable federated learning: A
lifecycle dashboard for industrial settings. IEEE Pervasive Computing, 22(1):19–28, 2023.
[135] Truong Thu Huong, Ta Phuong Bac, Kieu Ngan Ha, Nguyen Viet Hoang, Nguyen Xuan Hoang, Nguyen Tai Hung,
and Kim Phuc Tran. Federated learning-based explainable anomaly detection for industrial control systems. IEEE
Access, 10:53854–53872, 2022.
[136] Peng Chen, Xin Du, Zhihui Lu, Jie Wu, and Patrick CK Hung. Evfl: An explainable vertical federated learning for
data-oriented artificial intelligence systems. Journal of Systems Architecture, 126:102474, 2022.
[137] José Luis Corcuera Bárcena, Pietro Ducange, Francesco Marcelloni, Giovanni Nardini, Alessandro Noferi, Alessandro
Renda, Fabrizio Ruffini, Alessio Schiavo, Giovanni Stea, and Antonio Virdis. Enabling federated learning of explainable
ai models within beyond-5g/6g networks. Computer Communications, 210:356–375, 2023.
[138] Andreas Holzinger, Anna Saranti, Anne-Christin Hauschild, Jacqueline Beinecke, Dominik Heider, Richard Roettger,
Heimo Mueller, Jan Baumbach, and Bastian Pfeifer. Human-in-the-loop integration with domain-knowledge graphs
for explainable federated deep learning. In International Cross-Domain Conference for Machine Learning and Knowledge
Extraction, pages 45–64. Springer, 2023.
[139] Witold Pedrycz. Design, interpretability, and explainability of models in the framework of granular computing and
federated learning. In 2021 IEEE Conference on Norbert Wiener in the 21st Century (21CW), pages 1–6. IEEE, 2021.
[140] Dawid Rymarczyk, Joost van de Weijer, Bartosz Zieliński, and Bartlomiej Twardowski. Icicle: Interpretable class
incremental continual learning. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages
1887–1898, 2023.
[141] Zhaoxiong Yang, Shuihai Hu, and Kai Chen. Fpga-based hardware accelerator of homomorphic encryption for
efficient federated learning. arXiv preprint arXiv:2007.10560, 2020.
[142] Junxue Zhang, Xiaodian Cheng, Wei Wang, Liu Yang, Jinbin Hu, and Kai Chen. FLASH: Towards a high-performance
hardware acceleration architecture for cross-silo federated learning. In 20th USENIX Symposium on Networked Systems
Design and Implementation (NSDI 23), pages 1057–1079, 2023.
[143] Zixiao Wang, Biyao Che, Liang Guo, Yang Du, Ying Chen, Jizhuang Zhao, and Wei He. Pipefl: Hardware/software
co-design of an fpga accelerator for federated learning. IEEE Access, 10:98649–98661, 2022.
[144] Huimin Li, Phillip Rieger, Shaza Zeitouni, Stjepan Picek, and Ahmad-Reza Sadeghi. Flairs: Fpga-accelerated inference-
resistant & secure federated learning. In 2023 33rd International Conference on Field-Programmable Logic and Applica-
tions (FPL), pages 271–276. IEEE, 2023.
[145] Biyao Che, Zixiao Wang, Ying Chen, Liang Guo, Yuan Liu, Yuan Tian, and Jizhuang Zhao. Unifl: Accelerating federated
learning using heterogeneous hardware under a unified framework. IEEE Access, 2023.
[146] Stefano Bianchi, Irene Muñoz-Martin, and Daniele Ielmini. Bio-inspired techniques in a fully digital approach for
lifelong learning. Frontiers in Neuroscience, 14:379, 2020.
[147] Duvindu Piyasena, Miyuru Thathsara, Sathursan Kanagarajah, Siew Kei Lam, and Meiqing Wu. Dynamically
growing neural network architecture for lifelong deep learning on the edge. In 2020 30th International Conference on
Field-Programmable Logic and Applications (FPL), pages 262–268. IEEE, 2020.
[148] Duvindu Piyasena, Siew-Kei Lam, and Meiqing Wu. Accelerating continual learning on edge fpga. In 2021 31st
International Conference on Field-Programmable Logic and Applications (FPL), pages 294–300. IEEE, 2021.
[149] Geethan Karunaratne, Michael Hersche, Jovin Langenegger, Giovanni Cherubini, Manuel Le Gallo, Urs Egger, Kevin
Brew, Sam Choi, Injo Ok, Claire Silvestre, et al. In-memory realization of in-situ few-shot continual learning with a
dynamically evolving explicit memory. In ESSCIRC 2022-IEEE 48th European Solid State Circuits Conference (ESSCIRC),
pages 105–108. IEEE, 2022.
[150] Andrés Otero, Guillermo Sanllorente, Eduardo de la Torre, and Jose Nunez-Yanez. Evolutionary fpga-based spiking
neural networks for continual learning. In International Symposium on Applied Reconfigurable Computing, pages
260–274. Springer, 2023.
[151] Shivam Aggarwal, Kuluhan Binici, and Tulika Mitra. Chameleon: Dual memory replay for online continual learning
on edge devices. In 2023 Design, Automation & Test in Europe Conference & Exhibition (DATE), pages 1–6. IEEE, 2023.
[152] Dhireesha Kudithipudi, Anurag Daram, Abdullah M Zyarah, Fatima Tuz Zohora, James B Aimone, Angel Yanguas-Gil,
Nicholas Soures, Emre Neftci, Matthew Mattina, Vincenzo Lomonaco, et al. Design principles for lifelong learning ai
accelerators. Nature Electronics, pages 1–16, 2023.
[153] Torsten Hoefler, Dan Alistarh, Tal Ben-Nun, Nikoli Dryden, and Alexandra Peste. Sparsity in deep learning: Pruning
and growth for efficient inference and training in neural networks. The Journal of Machine Learning Research,
22(1):10882–11005, 2021.
[154] Ji Lin, Ligeng Zhu, Wei-Ming Chen, Wei-Chen Wang, Chuang Gan, and Song Han. On-device training under 256kb
memory. Advances in Neural Information Processing Systems, 35:22941–22954, 2022.
[155] Chaoyang He, Erum Mushtaq, Jie Ding, and Salman Avestimehr. Fednas: Federated deep learning via neural
architecture search. 2021.
[156] Hangyu Zhu and Yaochu Jin. Real-time federated evolutionary neural architecture search. IEEE Transactions on Evolutionary Computation, 26(2):364–378, 2021.
[157] Wayne Luk. Heterogeneous reconfigurable accelerators: Trends and perspectives. In 2023 60th ACM/IEEE Design
Automation Conference (DAC), pages 1–2. IEEE, 2023.
[158] Tom Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared D Kaplan, Prafulla Dhariwal, Arvind Neelakantan,
Pranav Shyam, Girish Sastry, Amanda Askell, et al. Language models are few-shot learners. Advances in Neural Information Processing Systems, 33:1877–1901, 2020.
[159] Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. Bert: Pre-training of deep bidirectional
transformers for language understanding. arXiv preprint arXiv:1810.04805, 2018.
[160] Alec Radford, Jong Wook Kim, Chris Hallacy, Aditya Ramesh, Gabriel Goh, Sandhini Agarwal, Girish Sastry, Amanda
Askell, Pamela Mishkin, Jack Clark, et al. Learning transferable visual models from natural language supervision. In
International Conference on Machine Learning, pages 8748–8763. PMLR, 2021.
[161] Liangqiong Qu, Yuyin Zhou, Paul Pu Liang, Yingda Xia, Feifei Wang, Ehsan Adeli, Li Fei-Fei, and Daniel Rubin.
Rethinking architecture design for tackling data heterogeneity in federated learning. In Proceedings of the IEEE/CVF
Conference on Computer Vision and Pattern Recognition, pages 10061–10071, 2022.
[162] Yuanyishu Tian, Yao Wan, Lingjuan Lyu, Dezhong Yao, Hai Jin, and Lichao Sun. Fedbert: When federated learning
meets pre-training. ACM Transactions on Intelligent Systems and Technology (TIST), 13(4):1–26, 2022.
[163] Neil Houlsby, Andrei Giurgiu, Stanislaw Jastrzebski, Bruna Morrone, Quentin De Laroussilhe, Andrea Gesmundo,
Mona Attariyan, and Sylvain Gelly. Parameter-efficient transfer learning for nlp. In International Conference on
Machine Learning, pages 2790–2799. PMLR, 2019.
[164] Brian Lester, Rami Al-Rfou, and Noah Constant. The power of scale for parameter-efficient prompt tuning. arXiv
preprint arXiv:2104.08691, 2021.
[165] Edward J Hu, Yelong Shen, Phillip Wallis, Zeyuan Allen-Zhu, Yuanzhi Li, Shean Wang, Lu Wang, and Weizhu Chen.
Lora: Low-rank adaptation of large language models. arXiv preprint arXiv:2106.09685, 2021.
[166] Sara Babakniya, Ahmed Roushdy Elkordy, Yahya H Ezzeldin, Qingfeng Liu, Kee-Bong Song, Mostafa El-Khamy,
and Salman Avestimehr. Slora: Federated parameter efficient fine-tuning of language models. arXiv preprint
arXiv:2308.06522, 2023.
[167] Liping Yi, Han Yu, Gang Wang, and Xiaoguang Liu. Fedlora: Model-heterogeneous personalized federated learning
with lora tuning. arXiv preprint arXiv:2310.13283, 2023.
[168] Shangchao Su, Bin Li, and Xiangyang Xue. Fedra: A random allocation strategy for federated tuning to unleash the
power of heterogeneous clients. arXiv preprint arXiv:2311.11227, 2023.
[169] Xuechen Zhang, Mingchen Li, Xiangyu Chang, Jiasi Chen, Amit K Roy-Chowdhury, Ananda Theertha Suresh, and
Samet Oymak. Fedyolo: Augmenting federated learning with pretrained transformers. arXiv preprint arXiv:2307.04905,
2023.
[170] Yuyuan Zhao, Tian Zhao, Peng Xiang, Qingshan Li, and Zhong Chen. Multi-task federated learning medical analysis
algorithm integrated into adapter. In 2023 IEEE 8th International Conference on Big Data Analytics (ICBDA), pages
24–30. IEEE, 2023.