
Federated Continual Learning for Edge-AI: A Comprehensive

Survey
ZI WANG, FEI WU, FENG YU, YURUI ZHOU, JIA HU, and GEYONG MIN, Department of Computer Science, Faculty of Environment, Science and Economy, University of Exeter, United Kingdom

arXiv:2411.13740v1 [cs.LG] 20 Nov 2024
Edge-AI, the convergence of edge computing and artificial intelligence (AI), has become a promising paradigm that enables the deployment of advanced AI models at the network edge, close to users. In Edge-AI, federated continual learning (FCL) has emerged as an imperative framework, which fuses knowledge from different clients while preserving data privacy and retaining knowledge from previous tasks as it learns new ones. By so doing, FCL aims to ensure stable and reliable performance of learning models in dynamic and distributed environments. In this survey, we thoroughly review the state-of-the-art research and present the first comprehensive survey of FCL for Edge-AI. We categorize FCL methods based on three task characteristics: federated class continual learning, federated domain continual learning, and federated task continual learning. For each category, an in-depth investigation and review of the representative methods are provided, covering background, challenges, problem formalisation, solutions, and limitations. Besides, existing real-world applications empowered by FCL are reviewed, indicating the current progress and potential of FCL in diverse application domains. Furthermore, we discuss and highlight several prospective research directions of FCL, such as algorithm-hardware co-design for FCL and FCL with foundation models, which could provide insights into the future development and practical deployment of FCL in the era of Edge-AI.
CCS Concepts: • Networks → Network architectures; • Computing methodologies → Distributed
computing methodologies; Artificial intelligence.
Additional Key Words and Phrases: Federated Continual Learning, Edge-AI, Edge Computing, Artificial
Intelligence, Lifelong Learning, Incremental Learning, Federated Learning
ACM Reference Format:
Zi Wang, Fei Wu, Feng Yu, Yurui Zhou, Jia Hu, and Geyong Min. 2024. Federated Continual Learning for Edge-AI: A Comprehensive Survey. ACM Comput. Surv. 1, 1 (November 2024), 35 pages. https://doi.org/XXXXXXX.XXXXXXX

1 INTRODUCTION
Deep Learning (DL) has emerged as a leading approach in artificial intelligence (AI), with demonstra-
ble efficacy across various scientific fields, including computer vision, natural language processing,
and speech recognition [1]. DL utilises artificial neural networks with multiple hidden layers to
model high-level abstractions and learn complex patterns and representations from data [2]. In
recent years, the proliferation of DL applications has catalysed advancements in various sectors,
exemplified by their role in assisting medical diagnostics [3], enhancing autonomous driving sys-
tems [4], and accelerating genomics research [5]. However, traditional implementations of DL rely
on cloud computing systems with centralised servers and data storage, which can raise privacy
concerns when collecting user data, incur high communication costs, and increase latency between
servers and clients. To address these challenges, edge computing has emerged as a promising
approach, which is a distributed computing paradigm that brings computation and storage closer to
data sources, rather than relying on centralised cloud-based data processing. This paradigm shift can
significantly reduce the latency and cost, making it suitable for data-intensive and latency-sensitive
AI applications. Therefore, the convergence of edge computing and AI gives rise to Edge-AI, which
aims to enable real-time AI applications powered by edge computing.
Edge-AI employs a popular distributed machine learning approach called federated learning (FL)
[6], which allows collaborative DL model training across clients while keeping the data localised.
To achieve this, a coordinating server distributes the global model to participating clients, which
then train the model using their local data. By aggregating processed parameters such as gradients
rather than raw data from each client on the coordinating server, FL ensures the overall training
performance and effectiveness of the global model while complying with data security regulations
[7, 8] such as the General Data Protection Regulation (GDPR) and the Data Protection Act (DPA),
addressing growing concerns about user privacy in AI applications.
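To make this workflow concrete, below is a minimal, hypothetical sketch of one FL round in the FedAvg style: clients start from the downloaded global parameters, train on their private data, and the server computes a dataset-size-weighted average of the returned parameters. The `local_update` routine and the toy linear model are illustrative stand-ins rather than the API of any particular FL framework.

```python
import numpy as np

def local_update(global_params, local_data, lr=0.01, epochs=1):
    """Client-side training: start from the global parameters and fit local data.
    A single linear model trained by gradient descent stands in for an arbitrary DL model."""
    w = global_params.copy()
    X, y = local_data
    for _ in range(epochs):
        grad = X.T @ (X @ w - y) / len(y)   # gradient of the mean squared error
        w -= lr * grad
    return w

def federated_round(global_params, clients):
    """Server-side aggregation: average client parameters weighted by local dataset size."""
    sizes = np.array([len(y) for _, y in clients], dtype=float)
    updates = [local_update(global_params, c) for c in clients]
    weights = sizes / sizes.sum()
    return sum(w * u for w, u in zip(weights, updates))

# toy usage: three clients whose private (X, y) data never leave the client
rng = np.random.default_rng(0)
clients = [(rng.normal(size=(20, 5)), rng.normal(size=20)) for _ in range(3)]
theta = np.zeros(5)
for r in range(10):
    theta = federated_round(theta, clients)
```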
FL research has mainly focused on model convergence under non-independent and identically
distributed (non-IID) data [9], model aggregation [10], security and privacy [11], resource optimi-
sation and incentive mechanisms [12], etc. Furthermore, most FL works assume that the training
dataset of clients is sampled from a static data distribution [13] and available from the beginning of
the training [14]. However, in real-world scenarios, data are collected progressively, and the data distribution, the classes of samples, and the number of tasks can change over time, bringing significant challenges to model adaptability [15].
Recently, continual learning (CL), also known as incremental learning (IL) or lifelong learning
(LL), has become an important approach for learning and accumulating knowledge from a continual
stream of data [13]. Thus, integrating the concept of CL into the FL framework, known as Federated
Continual Learning (FCL), leverages the strengths of both FL and CL to establish a robust
foundation for Edge-AI in dynamic and distributed environments. However, continual learning
from a series of new tasks can cause the model to experience significant performance degradation on
previously learned tasks, a phenomenon known as catastrophic forgetting (CF) [13]. FCL exacerbates this problem, as FL allows clients to join and leave the learning process arbitrarily. Furthermore, the
heterogeneity of FL clients leads local models to learn diverse knowledge, exacerbating catastrophic
forgetting in the global model during the aggregation of these local models. Recent studies (e.g.,
[15–18]) have proposed solutions to tackle these challenges, giving rise to an emerging research
field that is increasingly attracting attention.

1.1 Related Surveys


In recent years, comprehensive surveys for FL and CL have been conducted separately [19–23]. For
federated learning, Zhang et al. [19] surveyed FL in the IoT domain and explored FL-empowered
IoT applications such as healthcare, smart city, and autonomous driving. Ye et al. [20] focused on the
challenges of heterogeneous FL from five perspectives: statistical heterogeneity, model heterogene-
ity, communication heterogeneity, device heterogeneity and additional challenges. For continual
learning, Van et al. [21] reviewed CL methods and summarised three types of CL as a common
framework to cross-compare the performances of various methods. Lange et al. [22] surveyed
works on CL for task classification, categorising them into replay-based, regularisation-based, and
parameter isolation methods, based on how task information is stored and used throughout the
learning process. Masana et al. [23] focused on class-incremental learning and categorized the
existing CL methods for image classification into regularisation, rehearsal, and bias-correction
methods. They also provided an extensive experimental evaluation of those methods for image
classification tasks.
These surveys are focused on separate areas of FL and CL. None of them has systematically
investigated the challenges and solutions proposed in the emerging paradigm of FCL, especially in
the Edge-AI environments. Recently, Yang et al. [24] conducted a survey of FCL from the perspective
of knowledge fusion. They proposed two frameworks, namely synchronous and asynchronous
FCL, for addressing the spatial-temporal catastrophic forgetting challenge in FCL with knowledge
fusion. Different from their work, our survey thoroughly investigates and categorizes the existing
FCL methods in Edge-AI based on three task characteristics: federated class continual learning,
federated domain continual learning, and federated task continual learning. In Sections 2, 3, and 4, these taxonomies will be explained in more detail.

1.2 Aim and Contributions


This survey aims to comprehensively investigate the state-of-the-art research on FCL to provide an
in-depth and consolidated review. From the perspectives of different task characteristics in FCL,
we thoroughly review the background, challenges, and methods of FCL. Furthermore, we explore
existing FCL-empowered applications for Edge-AI. This survey also provides an in-depth discussion
about future research directions, motivating researchers to address important open challenges in
FCL and offering insights that could inspire future advancement in Edge-AI. To the best of our
knowledge, this paper is the first comprehensive survey of federated continual learning
for Edge-AI.
The main contributions of this survey are summarised as follows:

• We present a comprehensive review and clear taxonomy of the state-of-the-art FCL research
based on different task characteristics: federated class continual learning, federated domain
continual learning, and federated task continual learning, including a large number of papers
in this rapidly expanding research field. The taxonomy, definitions, challenges, and advantages
and disadvantages of the representative methods are thoroughly discussed.
• We provide a review and summary of current real-world applications empowered by FCL,
such as intelligent transportation systems, intelligent medical systems, IoT, and digital twins,
highlighting the versatility and potential of FCL for making real-world impact.
• We deliberate upon and posit several open research challenges including the lack of universal
benchmarks, explainability, algorithm-hardware co-design, and FCL with foundation models,
while proposing prospective directions that could inspire the research community to advance
the field of FCL for its rapid development and wide deployment in the era of Edge-AI.

1.3 Survey Organisation


The overview of the survey is shown in Fig. 1, and the remainder of this paper is structured as
follows. Section 2 first provides a detailed definition of dynamically adding new classes in FCL. Then,
four types of approaches for this problem are categorized, and we elucidate the interrelationships
among these approaches. Section 3 analyses four types of solutions to the problem of domain drift
in FCL. Section 4 analyses current popular approaches in federated task continual learning. Section
5 investigates various applications empowered by FCL. Section 6 discusses several important open
challenges and highlights exciting future research directions in FCL for Edge-AI. Finally, Section 7
concludes this survey.


[Fig. 1 (overview diagram): the structure of this survey. It depicts the federated continual learning setting, in which local clients upload local models to and download the aggregated global model from a global server, together with the three FCL scenarios over time: class continual learning and domain continual learning, where the task identity is not provided during testing, and task continual learning, where it is. The diagram also lists the method categories covered in Sections 2, 3 and 4 (generative replay, parameter regularization, parameter decomposition, prompt-based methods, and knowledge distillation for federated class continual learning; domain data supplementation, domain knowledge learning, domain model enhancement, and domain weight aggregation for federated domain continual learning; regularization-based, architecture-based, replay-based, and meta learning-based methods for federated task continual learning), the applications of Section 5 (intelligent transportation systems, intelligent medical systems, IoT, UAVs, robotics, smart energy, digital twins, financial audit), and the future directions of Section 6 (FCL benchmark, explainable FCL, algorithm-hardware co-design for FCL, FCL with foundation models).]

Fig. 1. An overview of our federated continual learning survey

2 FEDERATED CLASS CONTINUAL LEARNING


The first FCL scenario that we categorized is federated class continual learning (FCCL). Specifically,
the objective of class continual learning (CCL) is to discriminate between incrementally observed
new classes. For instance, a well-trained model in the CCL setting should distinguish all classes seen so far, such as ‘birds’ and ‘dogs’ from the first task and ‘tigers’ and ‘fish’ from the second task [25]. However, CCL is regarded as a challenging setting since the task identity is not provided at test time. Moreover, when CCL is integrated into the federated learning paradigm, where clients gather data on new classes in a streaming manner, the forgetting of old classes during training is further exacerbated [16, 17, 26].
Specifically, there are two challenges in FCCL:
• Challenge 1: intra-task forgetting: When a client is not involved in a particular training round, the newly aggregated global model risks failing to retain the knowledge previously contributed by that client’s data. Consequently, this can result in unsatisfactory performance when the global model is applied to the local data of the non-participating client.
• Challenge 2: inter-task forgetting: When clients train models on new tasks, the performance of the new global model degrades on old tasks.
To provide a clear understanding of the FCCL problem, we first formalize its definition as follows. Given a global server $S_g$ and $M$ clients in the FL process, in each federated training round $r \in \{1, \ldots, R\}$, each client $m_k$ ($k \in [1, \ldots, M]$) trains its local model parameters $\theta_{m_k}^r$ by using a sequence of continual tasks $T_{m_k} = \{t_{m_k}^1, t_{m_k}^2, \ldots, t_{m_k}^t, \ldots\}$. Each task $t_{m_k}^t = \{(x_i^t, y_i^t)\}_{i=1}^{N_t}$ consists of $N_t$ pairs of a sample $x_i^t$ and its corresponding label $y_i^t$. The class set $C^t$ of task $t_{m_k}^t$ includes its new classes and the old class set $C^{t-1}$ from the previous $t-1$ tasks. After local training is complete, each client $m_k$ transmits the updated model parameters $\theta_{m_k}^r$ to the server $S_g$, and the server $S_g$ aggregates them into the global parameter $\theta_G^r$ to integrate the task knowledge across all clients. Finally, the server $S_g$ distributes the global parameter $\theta_G^r$ to all participating clients in the next training round.
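Although this formulation does not prescribe a particular aggregation rule, a common instantiation (stated here only as an illustrative assumption) is a dataset-size-weighted average in the FedAvg style:

$$\theta_G^r \;=\; \sum_{k=1}^{M} \frac{n_{m_k}}{\sum_{j=1}^{M} n_{m_j}} \, \theta_{m_k}^r,$$

where $n_{m_k}$ denotes the number of samples client $m_k$ used for local training in round $r$.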
Next, inspired by the tri-level (data-centric, model-centric, and algorithmic) division in CCL [25],
we categorise existing methods into four distinct groups: i) Generative Replay (Section 2.1), which
falls under the data-centric FCCL; ii) Parameter Regularization (Section 2.2), and iii) Parameter
Decomposition and Prompt-based methods (Section 2.3 and 2.4 respectively), which are the model-
centric FCCL; and iv) Knowledge Distillation (Section 2.5), which aligns with the algorithmic FCCL.
This categorization aims to provide a structured overview and facilitate a deeper understanding of
the rapidly evolving field. Finally, in Section 2.6, we summarize these FCCL approaches and analyse
the relation between them.

2.1 Generative Replay


Replay is a critical strategy to recreate or preserve representations of old classes and combine them
with available training data to address the catastrophic forgetting problem in FCCL.
Shenaj et al. [18] proposed a federated learning system with prototype aggregation for continual representation (FedSpace), which utilizes class prototypes in feature space for each old class as a replay signal, together with contrastive learning, to preserve previous knowledge and avoid overly divergent behaviour between different clients. Specifically, each client receives a model initialized by pre-training on a custom-generated fractal dataset on the server side. The client then computes the prototypes of each class and aggregates them with weighting parameters, followed by prototype augmentation. Furthermore, they introduced a prototype-based loss and an additional loss function based on contrastive learning for the clients’ optimization.
Hendryx et al. [27] proposed federated prototypical networks to facilitate more efficient sequential learning of new classes, building on prototypical networks. Specifically, the method enhances model performance by replaying feature vectors representative of previously seen classes. However, it overlooks the non-IID data distribution across distinct clients. Recent works demonstrated that this can be overcome via generative replay.
Generative replay (GR), typically implemented through generative adversarial networks (GANs)
[28], acts as a data replay method by modelling the class distribution of real samples and then
synthesizing instances. However, adapting GR to FCL settings is not straightforward. To solve this
challenge, Qi et al. [29] discovered empirically that the unstable learning process from distributed
training on non-IID data using standard federated learning algorithms can significantly impair
GR-based models’ performance. In response, they proposed FedCIL, which employs model consolidation and consistency enforcement. On the server side, the global model is initialized with combined
parameters and a collection of classification heads from various clients, then consolidated using
instances synthesized by client generators. This can avoid failure caused by simply merging the
parameters originating from clients with imbalanced new class data. To enforce consistency on the
client side, a consistency loss is applied to the output logits of the client’s classification module
during local training.
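To illustrate the generative-replay idea shared by FedCIL and related methods, the following is a minimal single-client sketch, under the assumption of a conditional generator callable as `generator(z, class_index)`; the generator training, federated aggregation, and FedCIL's consistency loss are omitted. Real samples of the current classes are mixed with synthesized samples of old classes so that the local update sees both.

```python
import torch
import torch.nn.functional as F

def replay_batch(generator, old_classes, batch_size, latent_dim=64, device="cpu"):
    """Synthesize a pseudo-batch of previously seen classes (no stored raw data)."""
    idx = torch.randint(0, len(old_classes), (batch_size,), device=device)
    z = torch.randn(batch_size, latent_dim, device=device)
    with torch.no_grad():
        fake_x = generator(z, idx)  # assumed conditional generator, e.g. ACGAN-style
    fake_y = torch.tensor([old_classes[int(i)] for i in idx], device=device)
    return fake_x, fake_y

def local_step(model, optimizer, generator, real_x, real_y, old_classes, replay_ratio=0.5):
    """One local update mixing real new-class data with generative replay of old classes."""
    loss = F.cross_entropy(model(real_x), real_y)              # current-task loss
    n_replay = int(len(real_x) * replay_ratio)
    if old_classes and n_replay > 0:
        fake_x, fake_y = replay_batch(generator, old_classes, n_replay,
                                      device=real_x.device)
        loss = loss + F.cross_entropy(model(fake_x), fake_y)   # rehearse old classes
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```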
Babakniya et al. [30, 31] introduced mimicking federated continual learning (MFCL), a method akin to FedCIL, designed to compensate for the lack of old data through generative replay and thereby mitigate forgetting. In MFCL, the generative model, which is encouraged to synthesize class-balanced and uniform images, is trained on the server side in a data-free manner. This enables MFCL to reduce both the local training time and computational costs for clients, and it does not require access to their private data.
Recently, considering that exemplar-based methods may not be suitable for privacy-sensitive
scenarios, Zhang et al. [17] proposed TARGET, an effective solution for addressing catastrophic forgetting in FCCL without storing local private client data or any additional datasets. They first
experimentally confirmed that non-IID settings can intensify the catastrophic forgetting problem
in FL. Then, they used the previously trained global model to transfer knowledge of old tasks to
current ones at the model level. Additionally, a trained generator synthesizes data to simulate
non-IID training datasets with assistant model distillation on the clients at the data level. Therefore,
TARGET does not require extra datasets or the retention of private data from previous tasks, making
it especially suitable for data-sensitive environments.

2.2 Parameter Regularization


Parameter regularization in FCCL is crucial to achieve the dual goals of adapting to new tasks or
data distributions while preserving previously acquired knowledge. This method evaluates the
significance of each network parameter and assigns greater weight to more critical parameters,
thus minimizing catastrophic forgetting.
However, FL clients often cannot accumulate comprehensive data on all classes due to their limited storage capacity. To tackle this challenge, Dong et al. [16] proposed the global-local forgetting
compensation (GLFC) model. This work targets local forgetting caused by the class imbalance
in local clients by implementing a class-aware gradient compensation loss and a class-semantic
relation distillation loss. These losses aim to balance the forgetting of old classes and maintain
consistent inter-class relations across tasks. Moreover, a proxy server is introduced to select the
best old global model for aiding each client’s local training to address global forgetting caused
by the non-IID distribution of classes among clients. Further, a gradient-based prototype sample
communication mechanism is developed to safeguard the privacy of communications between
the proxy server and clients. Then, Dong et al. [32] extended GLFC and proposed the local-global
anti-forgetting (LGA), which surpasses GLFC by efficiently performing local anti-forgetting on old
classes. They proposed a category-balanced gradient-adaptive compensation loss and a category
gradient-induced semantic distillation loss to solve local catastrophic forgetting on old categories. A
proxy server is designed to collect perturbed prototype images of new classes, which can help select
the best old model for global anti-forgetting via self-supervised prototype augmentation. Compared
to the experiments in GLFC, this work conducted more detailed experiments on representative
datasets under various FCCL settings and metrics such as top-1 accuracy, F1 score, and recall.
Apart from these two works, Dong et al. [33] observed that challenges in federated incremental
semantic segmentation (FISS) are heterogeneous forgetting of old classes from both intra-client and
inter-client perspectives. Therefore, they developed the forgetting-balanced learning (FBL) model to
tackle these challenges. Specifically, they introduced a forgetting-balanced semantic compensation
loss and a forgetting-balanced relation consistency loss to handle intra-client heterogeneous
forgetting across old classes, guided by confidently generated pseudo-labels through adaptive class-
balanced pseudo-labelling. Then, a task transition monitor is designed to surmount inter-client
heterogeneous forgetting, enabling new class recognition under privacy protection and storing the
latest old model for global relation distillation.
Inspired by the connection between client drift in FL, caused by clients’ unbalanced classes, and catastrophic forgetting of old classes in CL, Legate et al. [34] introduced the local client forgetting problem. Motivated by the balanced softmax cross-entropy method for CL, they applied a re-weighted softmax (WSM) to the loss function of each client based on its class distribution. Their method adds a regularization term reflecting the class proportions in the client dataset to the standard cross-entropy loss, reducing excessive pressure on, and the subsequent loss over, other clients’ data.
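A minimal sketch of such a re-weighting, assuming (in the spirit of the balanced softmax that motivates WSM) that the client shifts the logits by the log of its local class proportions before the cross-entropy; the exact form used in [34] may differ in detail.

```python
import torch
import torch.nn.functional as F

def weighted_softmax_loss(logits, targets, class_counts, eps=1e-8):
    """Cross-entropy with logits shifted by the log class priors of the local client.
    Classes that are rare (or absent) locally contribute less pressure, which is
    intended to reduce forgetting of classes held mostly by other clients."""
    priors = class_counts.float() / (class_counts.sum() + eps)   # local class proportions
    adjusted = logits + torch.log(priors + eps)                  # balanced-softmax style shift
    return F.cross_entropy(adjusted, targets)

# toy usage: 4 classes, but this client only holds classes 0 and 1
logits = torch.randn(8, 4, requires_grad=True)
targets = torch.randint(0, 2, (8,))
class_counts = torch.tensor([50, 30, 0, 0])
loss = weighted_softmax_loss(logits, targets, class_counts)
loss.backward()
```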
Hu et al. [35] designed a new FCL framework called DuAFed, featuring a dual attention mecha-
nism for the scenario of different class increments and unbalanced features of clients. DuAFed first
ensures a balanced pre-training sample distribution by randomly sampling an equal number of
instances from each client. Further, the iCaRL strategy [36] is employed to accommodate dynamic
changes in training tasks. To mitigate the noise generated by clients with an imbalanced quantity
of classes, a channel attention mechanism is added on the client side, where feature compression,
feature map retrieval and regularization via learned weight coefficients of each channel with all the
elements of the corresponding channel are successively performed. Moreover, to solve the challenge
that the respective features are unbalanced and the importance is difficult to capture in FCCL, they
introduced a feature attention mechanism, which can capture the hierarchical importance of the
neural network in multiple local models, for the model aggregation of clients.
Yao et al. [37] proposed federated learning with local continual training (FedCL), which leverages a parameter-regularization-constrained local continual learning strategy to mitigate weight divergence and continually integrate the knowledge of different local models into the global model; its effectiveness is verified under different non-IID class data distributions. Specifically, they utilized the diagonal of the Fisher information matrix, as in EWC [38], to evaluate the importance weight matrix of the global model on a small proxy dataset on the server. This matrix is then used as a penalty in the local loss function while the local model fits the local data distribution.
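The following is a minimal sketch of how an EWC-style penalty of this kind can be attached to the local objective, assuming the server has already estimated a diagonal Fisher importance value for every global parameter; it illustrates the general mechanism rather than the exact FedCL implementation.

```python
import torch
import torch.nn.functional as F

def ewc_regularized_loss(model, batch, global_params, fisher_diag, lam=10.0):
    """Local loss = task loss + EWC penalty keeping important parameters close
    to the downloaded global model (importance given by the diagonal Fisher)."""
    x, y = batch
    task_loss = F.cross_entropy(model(x), y)
    penalty = 0.0
    for name, p in model.named_parameters():
        # quadratic penalty weighted by per-parameter importance
        penalty = penalty + (fisher_diag[name] * (p - global_params[name]) ** 2).sum()
    return task_loss + 0.5 * lam * penalty

# toy usage with a tiny linear classifier
model = torch.nn.Linear(10, 3)
global_params = {n: p.detach().clone() for n, p in model.named_parameters()}
fisher_diag = {n: torch.ones_like(p) for n, p in model.named_parameters()}  # placeholder importances
x, y = torch.randn(16, 10), torch.randint(0, 3, (16,))
loss = ewc_regularized_loss(model, (x, y), global_params, fisher_diag)
loss.backward()
```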

2.3 Parameter Decomposition


Parameter decomposition used in FCCL is a method where the model’s parameters are usually
divided into shared global parameters and task-specific parameters, which capture the general
knowledge among all learned tasks and informative knowledge for tasks with specific classes.
This enables the learned global model to adapt to new class-incremental tasks without losing the knowledge learned from previous tasks.
Motivated by additive parameter decomposition (APD) [39], Yoon et al. [40] proposed FedWeIT.
They decomposed the model parameters into dense global parameters and sparse task-specific
parameters to maximize the knowledge transfer between clients while minimizing the interference
of irrelevant knowledge from other clients and communication costs. Further, they divided task-
specific parameters into local base parameters and task-adaptive parameters, which capture the
general knowledge for each client and each task with specific classes per client, respectively.
Moreover, the sparse mask is applied to select only relevant base parameters for the knowledge of
specific classes to lower the number of parameters transferred, thus reducing communication costs.
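A highly simplified sketch of the decomposition idea (not the full FedWeIT algorithm, which additionally learns sparse masks end-to-end and transfers selected task-adaptive parameters between clients): each effective weight is composed of a dense base that is federated and a sparse, task-specific component that stays local.

```python
import torch
import torch.nn as nn

class DecomposedLinear(nn.Module):
    """Weight = shared base (aggregated by the server) + mask * task-adaptive delta (kept local)."""
    def __init__(self, in_dim, out_dim, sparsity=0.9):
        super().__init__()
        self.base = nn.Parameter(torch.randn(out_dim, in_dim) * 0.01)   # dense, federated
        self.task_delta = nn.Parameter(torch.zeros(out_dim, in_dim))    # sparse, task-specific
        # a fixed random sparsity pattern stands in for a learned sparse mask
        self.register_buffer("mask", (torch.rand(out_dim, in_dim) > sparsity).float())
        self.bias = nn.Parameter(torch.zeros(out_dim))

    def forward(self, x):
        weight = self.base + self.mask * self.task_delta
        return x @ weight.t() + self.bias

    def shared_state(self):
        """Only the dense base is uploaded, which also reduces communication."""
        return {"base": self.base.detach().clone()}

# usage: the server would average shared_state()["base"] across clients,
# while each client keeps its own task_delta per task.
layer = DecomposedLinear(10, 3)
out = layer(torch.randn(4, 10))
```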
Similar to FedWeIT, Zhang et al. [41] proposed cross-FCL based on parameter decomposition
inspired by APD [39] and several cross-edge strategies, which is a cross-edge FCL algorithm to
enable cross-edge devices to continually learn tasks without forgetting. They used parameter
decomposition by only aggregating base parameters from given tasks with specific classes to
solve the challenge of knowledge interference from model aggregation in FL and from inter-task
knowledge in CL. In addition, to tackle the initial cross-edge decision of whether to use the local or the global model, several cross-edge strategies, including discard, replace, finetune, fusion, and EWC fusion, are proposed for different task relationships.
Different from the above methods, Luo et al. [42] proposed an FCL framework called FedKNOW,
which continually extracts and integrates the knowledge of signature tasks, featuring the concept of
signature tasks that are the most dissimilar tasks identified from local past tasks. Specifically, each
client consists of a knowledge extractor, a gradient restorer, and a gradient integrator. Based on a weight-based pruning technique, FedKNOW first retains the top-ranked weight parameters extracted as the knowledge of signature tasks and restores a specified number of previous gradients from the tasks most dissimilar to the current task, in order to prevent catastrophic forgetting. Then, the gradient integrator is designed to mitigate negative knowledge transfer and improve model performance by incorporating gradients from before and after aggregation. It is worth noting that FedKNOW, as a client-side solution, is more scalable than FedWeIT, a server-side solution for FCCL, due to its lower communication cost.

2.4 Prompt-based methods


Recent trends involve designing FCCL approaches using a pre-trained Vision Transformer (ViT)
as a backbone. The ViT adapts its representational capability to streaming class-incremental data
through a continual learning method known as ‘learning to prompt’, by dynamically incorporating
a set of learned model embeddings, i.e., prompts.
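A minimal sketch of the ‘learning to prompt’ mechanism these methods build on (an L2P-style selection; the federated extensions discussed below add prompt aggregation, generation, or contrastive losses on top): a frozen backbone produces a query feature, the most similar keys in a small prompt pool are selected by cosine similarity, and only the prompts, keys, and classifier are trained and communicated.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PromptPool(nn.Module):
    """Learnable (key, prompt) pairs; the backbone stays frozen."""
    def __init__(self, pool_size=10, prompt_len=5, dim=32, top_k=3):
        super().__init__()
        self.keys = nn.Parameter(torch.randn(pool_size, dim))
        self.prompts = nn.Parameter(torch.randn(pool_size, prompt_len, dim))
        self.top_k = top_k

    def forward(self, query):                       # query: (B, dim) from the frozen encoder
        sim = F.cosine_similarity(query.unsqueeze(1), self.keys.unsqueeze(0), dim=-1)
        _, idx = sim.topk(self.top_k, dim=-1)       # pick the most relevant prompts
        return self.prompts[idx]                    # (B, top_k, prompt_len, dim)

# toy usage: only the prompt pool and classifier head would be trained/communicated
frozen_encoder = nn.Linear(64, 32).requires_grad_(False)   # stand-in for a pre-trained ViT
pool = PromptPool()
x = torch.randn(8, 64)
query = frozen_encoder(x)
selected = pool(query)       # prepended to the token sequence in a real ViT
print(selected.shape)        # torch.Size([8, 3, 5, 32])
```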
Halbe et al. [43] first formulated intra-task forgetting and inter-task forgetting in FCCL. To mitigate
forgetting while minimizing communication costs, protecting client privacy, and enhancing client-
level computational efficiency, they proposed HePCo, a prompt-based data-free FCCL method.
In HePCo, each client performs decomposed prompting, where prompts holding class-specific
task information are used to solve corresponding local tasks. The final prompt is obtained with a
weighted summation by the cosine scores of all these prompts. During local learning, each client
learns the key and prompt matrices along with the classifier while keeping the ViT backbone frozen.
Then, a few parameters including key, prompt weights and classifier weights are transferred to the
server, which lowers communication overhead and safeguards client privacy by preventing local
model inversion. On the server side, a latent generator, which takes as input a class label encoded with an embedding layer and a noise vector sampled from the standard normal distribution, is trained for the current and previous tasks. Once generator training is finished, data-free distillation in the latent space is then employed: finetuning the server model combats intra-task forgetting on the current task, while using pseudo-data corresponding to past tasks helps mitigate inter-task forgetting.
Bagwe et al. [44] pointed out that the challenge in implementing prompting techniques in
FCCL is the unbalanced class distribution of distributed clients, which can cause biased learning
performance and slow convergence, while asynchronous task appearances further deteriorate it.
They proposed Fed-CPrompt by incorporating asynchronous prompting learning and contrastive
and continual loss (C2Loss) to alleviate inter-task forgetting and inter-client data heterogeneity.
Fed-CPrompt allows class-specific task prompts aggregation in parallel by taking advantage of
task synchronicity. C2Loss is designed to accommodate discrepancies due to biased local training
between clients in environments with heterogeneous data distribution and to curb the forgetting
effect via enforcing distinct task-specific prompts construction.
Differing from previous methods, motivated by the intuition that task-irrelevant prompts may
contain potential common knowledge to enhance the embedded features, Liu et al. [45] integrated
three types of prompts (i.e., task-specific, task-similar prompts and task-irrelevant prompts) into
image feature embedding. This strategy effectively preserves both old and new knowledge within
local clients, thereby addressing the issue of catastrophic forgetting. Additionally, it ensures the
thorough integration of knowledge related to the same task across different clients by sorting and aligning the task information in the prompt pool. This effectively mitigates the non-IID problem,
which arises due to class imbalances among various clients engaged in the same incremental task.

2.5 Knowledge Distillation


Knowledge Distillation (KD) [46], initially developed for transferring knowledge from larger and
complex models to smaller and compact models, has gained widespread use in FCCL. It enables
an ‘old model’ to aid the currently updating ‘new model’. Although learning without forgetting (LwF) [47] was the first successful application of KD in CCL, it cannot be directly applied to the federated learning framework due to the centralized nature of CCL. To address this, Usmanova et al.
[48, 49] introduced FLwF to recognize human activities based on all incrementally seen classes of behaviour from local clients, evaluated on the 6 classes representing different human activities in the UCI HAR dataset. FLwF, which first extended KD to the federated setting, is the implementation of a standard
LwF method in FCL consisting of the past model of a client as the teacher model and one current
client model as the student model. Additionally, FLwF-2T, which consists of two teacher models
including the past model of a client and the server, was proposed to reduce forgetting in FCCL by
leveraging a server that maintains a general knowledge base across all clients’ class distribution.
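A minimal sketch of the distillation term underlying these methods (LwF-style; FLwF-2T would add a second, server-side teacher with its own temperature-scaled term): the frozen old model provides soft targets over the old classes, while the usual cross-entropy handles the current labels.

```python
import torch
import torch.nn.functional as F

def lwf_loss(student_logits, teacher_logits, targets, n_old_classes, T=2.0, alpha=1.0):
    """Cross-entropy on current labels + temperature-scaled KD on old-class logits."""
    ce = F.cross_entropy(student_logits, targets)
    # distill only over the classes the (frozen) teacher knows about
    p_teacher = F.softmax(teacher_logits[:, :n_old_classes] / T, dim=-1)
    log_p_student = F.log_softmax(student_logits[:, :n_old_classes] / T, dim=-1)
    kd = F.kl_div(log_p_student, p_teacher, reduction="batchmean") * (T * T)
    return ce + alpha * kd

# toy usage: 6 old classes, 4 new ones
student_logits = torch.randn(8, 10, requires_grad=True)
teacher_logits = torch.randn(8, 10)       # produced by the frozen previous model
targets = torch.randint(0, 10, (8,))
loss = lwf_loss(student_logits, teacher_logits, targets, n_old_classes=6)
loss.backward()
```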
Ma et al. [15] proposed continual federated learning with distillation (CFeD), which performs
KD at both client and server levels, uniquely featuring an independent unlabeled surrogate dataset
for each client. Specifically, it introduces a client division mechanism to utilize under-exploited
computational resources, aiding in reducing inter-task forgetting. Additionally, inspired by the
mini-batch iterative update approach in centralized training, server-side distillation is designed
to alleviate intra-task forgetting. In their class continual learning experimental scenarios, CFeD
outperforms other baselines, demonstrating the advantage of using the surrogate dataset to obtain
reasonable soft labels for old tasks.
Wei and Li [50] developed the federated learning with knowledge lock (FedKL) to tackle the
issue of catastrophic forgetting in federated learning, particularly the loss of knowledge from other
participants due to local updates. FedKL utilizes KD techniques to preserve previously acquired
knowledge while overcoming server knowledge forgetting caused by data isolation.
Efforts to expand FCCL into areas beyond computer vision, such as intrusion detection [51], are
emerging. Jin et al. [51] introduced FL-IIDS to solve catastrophic forgetting in federated intrusion
detection systems (IDS). However, this approach simplifies the challenge by assuming that traffic
data across local clients in FL-IIDS is IID. They identify three key issues in real-world intrusion
detection: (1) class imbalance in various traffic data types, (2) a predominance of new over old
classes in current tasks, leading to a bias towards new knowledge, and (3) a shrinking in the sample
size of old classes in dynamic example memory, weakening the ability to learn old classes. To combat
these, they proposed dynamic example memory, class gradient balancing loss, and sampling label
smoothing loss, respectively. Notably, their KD strategy, termed label smoothing loss, incorporates
soft labels of old classes into current training, enhancing the model’s generalization over old classes
and mitigating local model forgetting.

2.6 Summary and analysis of FCCL approaches


The major contribution of FCCL methods categorised above is summarised in Table 1. To further
indicate the relation between these representative methods, Fig. 2 distinguishes these methods using
generative replay (purple), parameter regularization (yellow), parameter decomposition (orange),
prompt-based methods (green) and knowledge distillation (blue). Moreover, as indicated by the
dashed boxes, three common strategies and methods, i.e., auxiliary datasets, the data-free manner
with distillation, and the rehearsal-free method, are utilized in several FCCL methods. From this
diagram, we can see that:

• The impact of GLFC [16] on subsequent research, and the integration of techniques from
different fields, are evident.
• The influence of LwF [47], iCaRL [36], and APD [39] on the field of FCCL is substantial.
• With the rapid development of generative models, techniques represented by GANs have
emerged in the field of replay-based FCCL, which satisfy users’ data privacy protection
needs in a data-free manner.
• Benefiting from the advanced representation and transfer capabilities of rapidly developing
foundation models, innovative FCL methods incorporating well-pre-trained models, such as
ViT-based methods, have progressively surfaced [43, 45, 52].

These approaches offer various solutions and pivotal insights to address the challenges encountered
in FCCL, which is still in its nascent stage.

Table 1. Major Contribution of FCCL Methods

Approach | Paper | Key Contribution
Generative Replay (Section 2.1) | [18] | Asynchronous FCL with class prototypes replay
Generative Replay (Section 2.1) | [27] | Federated prototypical networks
Generative Replay (Section 2.1) | [29] | Model consolidation and consistency enforcement
Generative Replay (Section 2.1) | [30, 31] | Compensating for the absence of old data via data-free generative replay
Generative Replay (Section 2.1) | [17] | GR and KD within exemplar-free continual learning
Parameter Regularization (Section 2.2) | [16] | The first work to alleviate local and global forgetting in FCCL
Parameter Regularization (Section 2.2) | [32] | A category-balanced gradient-adaptive compensation loss and a category gradient-induced semantic distillation loss
Parameter Regularization (Section 2.2) | [33] | The first global continual segmentation model for FISS
Parameter Regularization (Section 2.2) | [34] | Re-weighting the softmax logits prior to computing the loss
Parameter Regularization (Section 2.2) | [35] | Channel attention NN model and federated aggregation algorithm based on the feature attention mechanism
Parameter Regularization (Section 2.2) | [37] | Importance weight matrix for better initialization of federated models
Parameter Decomposition (Section 2.3) | [40] | Weighted inter-client transfer based on task-specific parameters
Parameter Decomposition (Section 2.3) | [41] | Task-specific parameter aggregation and cross-edge strategies for the initial decision of federated models
Parameter Decomposition (Section 2.3) | [42] | Knowledge extraction and gradient restoration based on weight-based pruning, and gradient integration
Prompt-based methods (Section 2.4) | [43] | A lightweight generation and distillation scheme to consolidate client models at the server based on prompting
Prompt-based methods (Section 2.4) | [44] | Asynchronous prompt learning and contrastive continual loss
Prompt-based methods (Section 2.4) | [45] | A rehearsal-free FCL method based on prompting with the consideration of privacy and limited memory
Knowledge Distillation (Section 2.5) | [48, 49] | The first work to extend LwF to the federated setting
Knowledge Distillation (Section 2.5) | [15] | A client division mechanism and server distillation with an unlabeled surrogate dataset
Knowledge Distillation (Section 2.5) | [50] | Overcoming server knowledge forgetting caused by data isolation
Knowledge Distillation (Section 2.5) | [51] | Sample label smoothing loss function leveraging KD to enhance local model memory

[Fig. 2 (diagram): FCCL methods and their origins. Building on vanilla FL and techniques such as ACGAN, pseudo rehearsal, iCaRL, re-weighted softmax, LwF, APD, and ViT-based prompting, the diagram relates FedCIL, MFCL, TARGET, GLFC, LGA, FBL, DuAFed, WSM, FLwF and FLwF-2T, CFeD, FedKL, FL-IIDS, FedWeIT, Cross-FCL, FedKNOW, HePCo, Fed-CPrompt, and FCILPT.]
Fig. 2. Diagram of the relation among FCCL methods. There are five categories in our paper: generative replay (purple), parameter regularization (yellow), parameter decomposition (orange), prompt-based methods (green) and knowledge distillation (blue). Auxiliary datasets, the data-free manner with distillation and rehearsal-free methods, frequently employed in some methods, are indicated by the dashed boxes.

3 FEDERATED DOMAIN CONTINUAL LEARNING


In this section, we delve into various strategies aimed at addressing challenges in the second FCL
scenario, namely Federated Domain Continual Learning (FDCL). In FDCL, ‘domain’ typically refers
to the distribution of datasets. Traditional continual learning focuses on the dynamics of individual
domains, while the integration of FL further allows each client with its private dataset to be treated
as a separate domain. Therefore, FDCL research focuses on the generalization of different domains
and the dynamic adaptation of individual domains.
To provide a clear understanding of the problem definition of FDCL, we first conceptualize it as follows. For a given period $[0, T]$ and $K$ clients engaged in FL, we assume that the $k$th client contains a set of sample and label pairs $D_k^t = \{(x_i^t, y_i^t)\}_{i=1}^{|D_k^t|}$ at a given time $t$. It is worth noting that there may be unlabeled samples in some clients, but all samples belong to known classes. Multiple clients in FL form a global known domain $D_g^t = \{D_1^t, D_2^t, \ldots, D_K^t\}$. Subsequently, local training and aggregation are performed in the classical FL paradigm. In this way, each client constructs a local model $f_{\theta_k}: X_k \rightarrow Y_k$ with its private dataset, which is aggregated by the server to generate a comprehensive global model $f_{\theta_g}: X_g \rightarrow Y_g$ after multiple rounds of communication. In FDCL scenarios, the global model $f_{\theta_g}$ not only serves as a representation of the known domain $D_g^t$ but can also be used to generalize to the unknown domain $D_{unk}^t$. Furthermore, the local domain changes over time, which means $D_k^t \neq D_k^{t+d}$ after a time interval $d$. Moreover, in FDCL, the task identity is not necessary during testing, because if each task has the same classes, the output would be the same as well.
As illustrated in Fig. 3, FDCL faces two unique challenges:
• Challenge 1: distributed multi-source domains generalization based on privacy pro-
tection: In the distributed environment of FL, each client constitutes a separate domain.
This diversity significantly increases the challenges associated with the global model gener-
alization. Moreover, the commitment to protecting privacy results in data isolation, which
intensifies the complexity of learning and optimizing the global model.
• Challenge 2: unknown domain generalization and known domain drift: In the context
of FDCL, the global model also needs to extend its generalization capabilities beyond the
multi-source domains to cover unknown domains. Furthermore, dynamic data changes in
continual learning result in known domain drift, so models need to have the ability to learn
and adapt efficiently in a time-evolving and uncertain data environment.
Fig. 3. Overall diagram of the challenges faced by FDCL. Challenge 1: privacy protection across multi-source domains and domain drift of the global model with respect to the local models (inter-domain). Challenge 2: generalization of the global model to the unknown domain, and drift of locally known domains over time and data (intra-domain).

Overall, these two challenges outline the balance between maintaining data privacy and improving model generalization capabilities in a changing environment. According to the different types of approaches, current studies on FDCL can be divided into four main areas: Domain Data Supplementation (Section 3.1), Domain Knowledge Learning (Section 3.2), Domain Model Enhancement (Section 3.3), and Domain Weight Aggregation (Section 3.4). Various research approaches and the key contributions of FDCL are detailed in Table 2. In the subsequent subsections, we thoroughly examine the approaches associated with these four areas.

3.1 Domain Data Supplementation


The approach of domain data supplementation aims to achieve local dataset expansion by incor-
porating the data distribution from other clients. It can address the issue of poor generalization
caused by dataset isolation from different clients and the absence of old data from the local client.
A viable strategy involves employing data synthesis techniques to create proxy datasets that mimic
others’ domains by supplementing data from other clients while maintaining privacy [53–55].
Additionally, directly storing old data is also effective, as it supplements the local client with old data to retain the memory of the previous domain [56–59].
Supplementing data from other clients. Inspired by feature extraction in image frequency
domain space, Liu et al. [53] proposed federated domain generalization (FedDG), which exchanges
part of frequency information across clients to supplement data in a privacy-conscious manner.
Specifically, FedDG decomposes the amplitude (i.e., low-level distribution) and phase signals (i.e.,
high-level semantics) by the fast Fourier transform (FFT). Based on this, an ‘amplitude distribution
bank’ is created for client data sharing, where each client generates new signals by interpolating the
local amplitude with data from the shared bank while maintaining the local phase signal constant.
These interpolated signals are transformed by the inverse Fourier transform (IFT) to create a proxy
dataset as complements. FedDG implicitly synthesizes data from other clients, bridging the gap
between local and global models.


Table 2. Different Research Approaches in FDCL

Type of Approaches | Paper | Key Contribution
Domain Data Supplementation (Section 3.1) | [53] | Continuous frequency space interpolation
Domain Data Supplementation (Section 3.1) | [54] | Training generators on the central server for clients to produce synthetic global data
Domain Data Supplementation (Section 3.1) | [55] | Variable embedding rehearsal and server-side training
Domain Data Supplementation (Section 3.1) | [56–58] | Client domain drift detection and adaptation
Domain Data Supplementation (Section 3.1) | [59] | Storing generalized representations of local data
Domain Knowledge Learning (Section 3.2) | [60] | Federated cross-correlation learning and dual-domain knowledge distillation loss
Domain Knowledge Learning (Section 3.2) | [15] | Continuous distillation federated learning
Domain Knowledge Learning (Section 3.2) | [61] | Lifetime federated meta-reinforcement learning
Domain Knowledge Learning (Section 3.2) | [62] | Incremental unsupervised adversarial domain adaptation
Domain Knowledge Learning (Section 3.2) | [63] | Client-side and time-drift modeling
Domain Knowledge Learning (Section 3.2) | [64] | Weights dynamic update
Domain Model Enhancement (Section 3.3) | [65] | Echo state networks and intrinsic plasticity
Domain Model Enhancement (Section 3.3) | [66] | Synaptic intelligence in FCL
Domain Model Enhancement (Section 3.3) | [67] | Differential privacy and synaptic intelligence
Domain Model Enhancement (Section 3.3) | [68] | Elastic federated learning
Domain Model Enhancement (Section 3.3) | [69] | Reservoir computing and recurrent neural network
Domain Model Enhancement (Section 3.3) | [70] | Progressive neural network
Domain Model Enhancement (Section 3.3) | [71] | Broad learning in FCL
Domain Model Enhancement (Section 3.3) | [72] | Radial basis function and self-organizing incremental neural network
Domain Model Enhancement (Section 3.3) | [73] | Very fast decision tree and order-preserving encoding
Domain Model Enhancement (Section 3.3) | [74] | Genetic algorithm and domain flatness constraint
Domain Weight Aggregation (Section 3.4) | [75] | Non-uniform device selection for natural language understanding
Domain Weight Aggregation (Section 3.4) | [76] | Orthogonal gradient aggregation
Domain Weight Aggregation (Section 3.4) | [77] | Semi-supervised FL on evolving data streams
Domain Weight Aggregation (Section 3.4) | [78] | Graph-aided FL approach with a few-shot node inhibition

Instead of synthesizing data locally, Liu et al. [54] trained a data generator on the central server.
The server initially collects data from each client as a constant reference point to train the data
generator. Then, the trained generator and global model are broadcast to each client to produce
synthetic data. Furthermore, a mechanism of variable weights is also introduced to alleviate the
imbalance in the number of local classes across various clients. Although the above method of
supplementing data from other clients can improve the generalization ability of the global model,
there is still some risk of privacy leakage.
To further improve the security of proxy datasets, Park et al. [55] introduced variable embedding
rehearsal (VER) and server-side training (SST) strategies. On one hand, the authors used the VER
method that combines the security advantages of variable autoencoder (VAE) and embedding-based
reformulation (EBR) by generating random representations of a subset of data from each client. On
the other hand, the SST strategy facilitates training by rehearsing the data representations that
have been safely collected from each client avoiding direct access to the original dataset.


Supplementing old data from local client. To address the issue of individual domain drift
over time in FDCL, Casado et al. [56–58] utilized different methods to detect and adapt to local
client domain drift. For domain drift detection, the authors used a CUSUM-type (Cumulative Sum)
method based on a beta distribution. Building upon the original method, they proposed a sliding
window technique to detect changes in the confidence distribution of local classifiers. For domain
drift adaptation, they gathered data from new domains to update the long-term storage of the local
client, thereby preserving the memory of the previous domain. Over time, this approach requires
adding more long-term storage memory locally, otherwise forgetting will still occur, which may
not be applicable in some resource-constrained scenarios.
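A minimal sketch of confidence-based drift detection in this spirit (a simple sliding-window mean-shift test rather than the beta-distribution CUSUM statistic used in [56–58]): the client monitors the stream of classifier confidences and flags a drift when the recent window drops well below the longer-run reference.

```python
from collections import deque

class ConfidenceDriftDetector:
    """Flags domain drift when the mean confidence in a recent window falls
    noticeably below the reference level estimated from earlier observations."""
    def __init__(self, window=100, threshold=0.15):
        self.window = deque(maxlen=window)
        self.reference_sum = 0.0
        self.reference_n = 0
        self.threshold = threshold

    def update(self, confidence):
        if len(self.window) == self.window.maxlen:
            self.reference_sum += self.window[0]   # fold the evicted value into the reference
            self.reference_n += 1
        self.window.append(confidence)
        if self.reference_n < self.window.maxlen or len(self.window) < self.window.maxlen:
            return False                            # not enough history yet
        recent = sum(self.window) / len(self.window)
        reference = self.reference_sum / self.reference_n
        return (reference - recent) > self.threshold   # drift if confidence dropped

# toy usage: confident predictions, then a drop after a domain change
det = ConfidenceDriftDetector(window=50, threshold=0.1)
stream = [0.9] * 200 + [0.6] * 100
flags = [det.update(c) for c in stream]
print(flags.index(True))   # index of the first detected drift
```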
Since it is impractical to store all the data before the local client domain drift, Zhang et al. [59]
proposed a method to periodically store a generalized representation of the local client data, while
taking advantage of dynamically changing data to update models with new domain knowledge. In
addition, the central server also merges new domains from different clients based on the relevance
of the spatial and temporal dimensions.

3.2 Domain Knowledge Learning


The approach of domain knowledge learning considers the potential privacy risks associated with
proxy datasets. It prioritizes direct knowledge transfer over data transfer to
address inter-domain differences among different clients and intra-domain drift within the local
client. In the case of inter-domain knowledge learning, the strategy focuses on efficiently
transferring knowledge from other clients to the local client [15, 60, 61]. In the case of intra-
domain knowledge learning, it emphasizes the application of previously acquired knowledge to
adapt to local domain drift [62–64].
Inter-domain knowledge learning. To solve heterogeneity and catastrophic forgetting prob-
lems in distributed domains, Huang et al. [60] presented a method based on federated cross-
correlation and continual learning. The authors built a cross-correlation matrix across different
clients with an unlabeled public dataset and exploited knowledge distillation techniques in the
local updating process. A new loss function is proposed by federated cross-correlation learning
that boosts model similarity while accounting for model diversity. The generated loss function is
combined with a dual-domain knowledge distillation-based loss function, where the latest model is
computed from a mixture of the learned global model and the previous local model.
Similarly using the knowledge distillation method, Ma et al. [15] implemented continual federated
learning with distillation (CFeD) at both the client and server side with different learning objectives
for different clients. The essence of CFeD lies in using the learned model in the previous domain
to predict a proxy dataset and then utilising the predictions as pseudo-labels to retain knowledge
in currently inaccessible domains. Moreover, CFeD also presents a server distillation mechanism
specifically designed to address within-task forgetting in different domains. It involves adjusting
the aggregated global model to imitate the output from both the previous global domain and the
current local domain model.
Different from the above methods, Wang et al. [61] developed a complex lifetime federated
meta-reinforcement learning (LFMRL) algorithm, which leveraged prior obtained knowledge from
federated meta-learning to quickly adapt to new domains. Specifically, LFMRL devises a knowledge
fusion algorithm that integrates federated meta-learning and dual-attention deep reinforcement
learning to update local gradient data and generate shared models. Additionally, LFMRL also
designs an efficient knowledge transfer mechanism for the rapid learning of new domains in new
environments. This approach enhances the generalization of the model not only in the known
domains but also in the unknown domains.


Intra-domain knowledge learning. To cope with the problem of domain drift in newly
collected data, Guo et al. [62] proposed incremental unsupervised adversarial domain adaptation
(IUADA) that merges FL and adversarial learning. This method aims to transfer knowledge from the
local target domain to the model learned from labelled data. In particular, the local target feature
extractor and discriminator are alternately trained through adversarial learning, separating source
features into positive and negative. Meanwhile, the gradient of the prediction score serves as an
attention weight to obtain distinctive features, which are aligned with local domain features to
adapt domain drift. This adversarial learning approach increases computational complexity and
resource requirements, especially for local clients with limited resources.
Regarding local domain drift in dynamic environments, Huang et al. [63] constructed a model
of evolving local client domain drift. The theoretical analysis in this work reveals that the
convergence rate of the method in time-evolving scenarios is related to the approximation accuracy.
Moreover, Chen and Xu [64] introduced a dynamic update mechanism that leverages new weights
to adjust the parameters of the output classifier. This mechanism allows the model to seamlessly
integrate information from recently acquired data while preserving previously learned knowledge.

3.3 Domain Model Enhancement


The approach of domain model enhancement emphasizes improving the models of local clients to
mitigate the issue of memory loss. It enhances the model’s capabilities of resistance to forgetting
and adaptability by integrating innovative network structures or techniques. According to whether
the structure of the model is fixed, existing works based on this approach can be divided into two
categories: fixed structure [65–68] and non-fixed structure [69–73]. This approach aims to
leverage the learning capabilities of the client model to retain the memory of the old domain while
seamlessly integrating new domain knowledge.
Fixed Structure. De Caro et al. [65] facilitated effective learning in dynamic environments by
employing echo state networks (ESNs) and intrinsic plasticity (IP). The authors introduced the
FedIP algorithm to optimize the processing of stationary data in a federated learning setting and
adapt the learning rules of IP. In non-stationary scenarios, FedCLIP extends FedIP by updating
memory buffers and sampling the mini-batches. Zhang et al. [66] exploited synaptic intelligence
(SI) for weight updating to maintain the memory of the previous domain. They also added a
structural regularization loss term that integrates knowledge from other local models to tune
the global model towards a global optimum while minimizing weight variance. Chathoth et al.
[67] observed that non-IID data distributions have a significant impact on the performance of
differential privacy (DP) stochastic algorithms. To counteract this issue, they integrated DP with SI
to meet the privacy requirements of each client. In particular, they mitigated catastrophic forgetting
by adding a quadratic SI loss to the objective function to minimize modifications to parameters
that affect the previous model. Furthermore, they improved the (𝜀, 𝛿)-DP training method for a
cohort-based DP setting, tailoring it to meet the distinct privacy requirements of each cohort. To
bridge continual learning with federated learning and improve the robustness of different clients
to non-IID problems, Ma et al. [68] introduced the elastic federated learning (EFL) framework. It
integrates an elasticity term that constrains the volatility of crucial parameters, as determined by
the Fisher information matrix, within the local objective function. Moreover, they employed scaling
aggregation coefficients to counteract convergence degradation. The framework is further optimized
through sparsification and quantization techniques, effectively compressing both upstream and
downstream communications.
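A common thread in [66-68] is a quadratic penalty of the form $L = L_{task} + \lambda \sum_k \Omega_k (\theta_k - \theta_k^*)^2$, where $\Omega_k$ is a per-parameter importance estimate (e.g., synaptic intelligence or the diagonal of the Fisher information matrix) and $\theta_k^*$ are the parameters learned on the previous domain. The snippet below is an illustrative PyTorch sketch of this shared pattern, not the implementation of any particular paper:

```python
import torch

def importance_penalty(model, old_params, importance, lam=1.0):
    """Illustrative quadratic regularizer used (in different variants) by SI- and
    Fisher-based FCL methods: parameters that were important for the previous
    domain are pulled back towards their old values. `old_params` and
    `importance` map parameter names to tensors of the same shape."""
    penalty = torch.zeros(())
    for name, p in model.named_parameters():
        if name in importance:
            penalty = penalty + (importance[name] * (p - old_params[name]) ** 2).sum()
    return lam * penalty

# total_loss = task_loss + importance_penalty(model, old_params, fisher_diag)
```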
Non-fixed Structure. Drawing on insights from predictive coding in neuroscience, where
updating the parameters of only some of the active heads can prevent the other inactive heads
from being forgotten, Bereska et al. [69] proposed an approach based on reservoir computing in
FCL, which is a state-of-the-art method for training recurrent neural networks (RNN) in dynamic
environments. They slowed down weight forgetting by fixing the weights of hidden layers in the
RNN and training multiple competing prediction heads simultaneously. Mori et al. [70] split the
neural network for each client into a unique feature extraction component and a common feature
extraction component. The authors regarded the local training as learning a unique task without
forgetting the knowledge of a common task, thus introducing the progressive neural network
(PNN) as the continual learning method in their solution. Le et al. [71] mitigated the catastrophic
forgetting and adapted to environmental changes by broad learning (BL), which supports CL without
retraining each client for new data. Moreover, they designed a weighted processing strategy and a
batch-asynchronous technique to support accurate and fast training. This asynchronous update
method combined with BL can decouple local training from the knowledge of the global model.
Zhu et al. [72] proposed the SOINN-RBF method, which effectively combines radial basis function
(RBF) networks and self-organizing incremental neural networks (SOINN). This method aims
to optimize data labelling management and real-time sample domain adaptation through high
dimensional spatial mapping to improve data regularity identification and generalization. Han
et al. [73] presented an incremental tree model construction method based on very fast decision
tree (VFDT) for efficiently handling domain drift. They developed a lightweight practically order-
preserving encoding (POPE) method, which replaces complex encryption algorithms while reducing
computational and communication burdens. Additionally, they adapted a region-counting method
to effectively reduce the memory overhead of POPE.
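As an illustration of the "fixed backbone, competing heads" idea in [69], the following sketch freezes a shared recurrent trunk and trains only lightweight per-domain prediction heads; the layer sizes, the use of a GRU, and the two-head setup are illustrative assumptions rather than the authors' architecture:

```python
import torch
import torch.nn as nn

class FrozenTrunkMultiHead(nn.Module):
    """Minimal sketch of the reservoir-style design: a shared recurrent trunk is
    frozen, and only per-domain prediction heads are trained, so updating one
    head cannot overwrite knowledge stored in the others."""
    def __init__(self, in_dim=16, hidden_dim=64, out_dim=10, num_heads=2):
        super().__init__()
        self.trunk = nn.GRU(in_dim, hidden_dim, batch_first=True)
        for p in self.trunk.parameters():   # reservoir-style: trunk stays fixed
            p.requires_grad_(False)
        self.heads = nn.ModuleList(
            [nn.Linear(hidden_dim, out_dim) for _ in range(num_heads)])

    def forward(self, x, head_id):
        _, h = self.trunk(x)                 # h: (num_layers, batch, hidden_dim)
        return self.heads[head_id](h[-1])    # only the selected head is used
```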

3.4 Domain Weight Aggregation


The approach of domain weight aggregation assesses and reorganizes the relationships between
clients based on the uploaded model weights. Studies based on this approach can be divided into two key parts: non-uniform weight aggregation [74-76] and reorganization of client relationships [77, 78]. The first part concentrates on directly optimizing the weight aggregation process
to improve the model’s generalization capabilities, while the second one emphasizes indirectly
influencing the weight aggregation process by organizing relationships between clients and further
considers adaptation to dynamically changing environments.
Non-uniform weight aggregation. Zhang et al. [74] devised an FL-friendly generalization
adjustment (GA) method that combines a genetic algorithm with domain flatness constraint to
determine the best weights for each client. Specifically, the flatness of each domain is evaluated by
the difference in generalization between the global and local models. Meanwhile, domain weights are
dynamically adjusted during server aggregation. Dupuy et al. [75] showed that in natural language
understanding (NLU) training, non-uniform device selection based on the number of interactions
improves model performance, with benefits increasing over time. Wang et al. [76] introduced the
orthogonal gradient aggregation (OGA) method instead of uniform weight aggregation, which
updates gradients orthogonally to prior parameter spaces to prevent catastrophic forgetting in
domain transfer, thereby retaining old knowledge and enhancing privacy. Although this approach
solves the problem of generalization of data from different source domains (known domains), the
generalization performance for unknown domains and domain drift problems still needs to be
discussed thoroughly.
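The orthogonal-projection idea behind OGA can be sketched in a few lines. The snippet below is a simplified illustration (it assumes the stored prior directions are already orthogonal to each other and that the update has been flattened into a single vector), not the authors' algorithm:

```python
import numpy as np

def project_orthogonal(update, prior_directions):
    """Remove from the flattened global update the components that lie along
    directions associated with previously learned domains, so applying the
    update perturbs old knowledge as little as possible."""
    u = update.astype(np.float64).copy()
    for d in prior_directions:            # assumed non-zero and mutually orthogonal
        d = d / np.linalg.norm(d)
        u -= np.dot(u, d) * d             # subtract the component along d
    return u

# Example: a 2-D update with one stored prior direction along the x-axis.
update = np.array([3.0, 4.0])
print(project_orthogonal(update, [np.array([1.0, 0.0])]))  # -> [0., 4.]
```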
Reorganization of client relationships. Mawuli et al. [77] proposed a semi-supervised
federated learning approach on evolving data streams (SFLEDS) that addresses domain drift and
privacy protection. Their proposed method utilized a distributed prototype-based technique that
uses k-means clustering to group data stream instances into micro-clusters. Then, an error-driven
technique is employed to capture inter- and intra-domain drift. Specifically, it efficiently performs
collaborative semi-supervised prediction tasks by merging global and local models and incorporates
probabilistic client-server consistency techniques to address domain drift.
Yao et al. [78] introduced a graph-aided federated learning (GAFL) approach with a few-shot
node inhibition mechanism to improve the generalization capability of global models. GAFL designs
collaborative graphs at pair-wise and category-wise levels to describe the relationships among clients
to distinguish different data distributions. A continual learning approach is tailored to new clients,
limiting graph and model updates to a smaller scope, thus minimizing the disruption to the original model domain.

3.5 Summary and analysis of FDCL approaches


This section summarizes and analyzes the four types of FDCL approaches mentioned above,
addressing domain generalization and domain drift adaptation from a unique perspective. In
summary:
• Domain Data Supplementation. This approach emphasizes strengthening the data-level
complements to mitigate data scarcity and diversity issues. However, the use of synthetic
datasets may pose a risk of privacy leakage, and there is the possibility of using synthetic
data to infer the distribution of the original data.
• Domain Knowledge Learning. This approach focuses on leveraging acquired knowledge for
better application across different clients and drift issues. Despite the benefits, the complexity
of distillation methods may lead to additional computational and communication costs, which
may be detrimental to the implementation of some resource-constrained edge devices.
• Domain Model Enhancement. This approach aims at improving the model’s resistance to
forgetting. Specific model enhancement methods may be effective in small datasets or simple
tasks, while their generalizability and adaptability to other complex tasks are inconclusive.
• Domain Weight Aggregation. This approach prioritizes the optimization of weight rela-
tionships between clients. It proves beneficial for known domains, however, its performance
in generalizing to unknown domains deserves further investigation.
Each approach contributes valuable insights for overcoming challenges in FDCL. Nonetheless,
they are also accompanied by specific limitations that require further attention.

4 FEDERATED TASK CONTINUAL LEARNING


In this section, we will investigate the emerging challenges and state-of-the-art research of the
third FCL scenario, namely Federated Task Continual Learning (FTCL).
In this scenario, the local clients learn a set of distinct tasks over time for which the task
identity is explicitly provided, i.e., the learning algorithm is clear about which task will be executed.
Then global aggregation is performed in multiple rounds to generate a global model enabling the
distribution and update of the knowledge for different tasks across clients.
Similar to the previous sections, we first provide a clear formalisation of the FTCL problem.
Considering a global server $S$ and $C$ distributed clients in a federated framework, each client $c_i \in \{c_1, ..., c_C\}$ learns a local model on its private task dataset $TD_{c_i}$ over the task sequence $\{1, ..., t, ..., T\}$, where $TD_{c_i}^t = \{x_j^t, y_j^t\}_{j=1}^{N^t}$ is a labeled dataset for task $t$ with $N^t$ instances $x_j^t$ and their labels $y_j^t$. In FTCL, there is no relationship among the datasets $TD_{c_i}$ across clients. In each federated training round $r \in \{1, ..., R\}$, each client $c_i$ updates its model parameters $\theta_{c_i}^r$ using the task dataset $TD_{c_i}$ in a task continual learning setting and accelerates the current task learning with knowledge learned from past tasks. Then, each client $c_i$ transmits its updated model parameters $\theta_{c_i}^r$ to the server $S$, and the server $S$ aggregates them into the global parameter $\theta_G^r$ to integrate the task knowledge across all clients. Finally, the server $S$ distributes the global parameter $\theta_G^r$ to all participating clients in the next training round. In FTCL, the task identity is explicitly provided during learning and testing, so the model can be trained and evaluated with reference to its task-specific components.
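The round structure formalised above can be summarised in a short sketch. The client interface (`local_update`, `current_task`) and the plain weighted average are assumptions standing in for whatever local trainer and aggregation rule a concrete FTCL method uses:

```python
import copy

def ftcl_training(server_params, clients, num_rounds):
    """Minimal sketch of the FTCL protocol: in each round every client trains on
    its current task (whose identity is known), uploads its parameters, and the
    server aggregates them into a new global model. Parameters are assumed to be
    dicts of arrays supporting scalar multiplication and addition."""
    global_params = copy.deepcopy(server_params)
    for r in range(num_rounds):
        updates, weights = [], []
        for client in clients:
            task_id = client.current_task(r)            # task identity is explicit in FTCL
            theta_i, n_i = client.local_update(global_params, task_id)
            updates.append(theta_i)
            weights.append(n_i)
        total = sum(weights)
        global_params = {                               # FedAvg-style weighted aggregation
            k: sum((w / total) * u[k] for u, w in zip(updates, weights))
            for k in global_params
        }
    return global_params
```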
Under this FTCL scenario, there are two main challenges that need to be solved after each client updates its local model with the global parameter $\theta_G$ to obtain the cross-client task knowledge:
• Challenge 1: Catastrophic forgetting occurs due to insufficient training data for tasks from other clients in $TD_{c_i}$.
• Challenge 2: The performance of local clients degrades as client and task heterogeneity increases, causing local model training to update its parameters in the wrong direction.
Recently, many studies have been conducted to provide different methods to solve these chal-
lenges in FTCL. In the following subsections, we will present an elaborated taxonomy of represen-
tative federated task continual learning methods as illustrated in Fig. 4, analyzing extensively their
main motivations, proposed solutions, and related evaluations.

[Fig. 4 here: a taxonomy tree rooted at "Federated Task Continual Learning" with five branches: regularization-based methods, architecture-based methods, replay-based methods, meta learning-based approaches, and unsupervised learning.]

Fig. 4. The elaborated taxonomy of representative federated task continual learning methods

4.1 Regularization-based methods


In this category, the solution is characterized by adding explicit regularization terms to balance
the old and new tasks. Bakman et al. [79] aimed at addressing the global catastrophic forgetting
problem in FTCL under realistic assumptions that do not require access to past data samples. The
authors compared and analyzed the conventional regularization-based approaches and proposed
a federated orthogonal training (FOT) framework. FOT uses their proposed FedProject average
method in the aggregation to make the global updates of new tasks orthogonal to previous tasks’
activation principal subspace to decrease the performance disruption on old tasks. Their evaluation compared FOT with state-of-the-art methods, and the results indicated that FOT alleviates global
forgetting while maintaining high accuracy performance with negligible extra communication and
computation costs.
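A simplified way to picture the FedProject idea is the server-side projection below; the SVD-based subspace estimate and the choice of k principal directions are our assumptions for illustration, not the exact procedure in [79]:

```python
import numpy as np

def project_away_from_subspace(agg_update, old_activations, k=5):
    """Sketch: extract the top-k principal directions of activations collected
    from previous tasks and remove those components from the aggregated update,
    so new-task updates lie (approximately) orthogonal to the old tasks'
    activation subspace."""
    # old_activations: (num_samples, dim) matrix of layer activations.
    _, _, vh = np.linalg.svd(old_activations, full_matrices=False)
    basis = vh[:k]                                  # (k, dim) orthonormal directions
    return agg_update - basis.T @ (basis @ agg_update)
```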

4.2 Architecture-based methods


To better solve the inter-task interference problem in FTCL, constructing specific modules or adding
different parameters in the architecture is an effective and flexible solution that can explicitly
help. Wang et al. [80] focused on the scenario where both data privacy and high-performance
image reconstruction are required in multi-institutional collaborations. The authors proposed
a peer-to-peer federated continual learning network called icP2P-FL to alleviate catastrophic
forgetting with reduced communication costs. icP2P-FL uses the cyclic task-incremental continual
learning mechanism across multiple institutions as the FTCL setting. The authors also designed
an intermediate controller that includes two modules, the performance assessment module (PAM)
and the online determination module (ODM), to evaluate the model performance, determine the
inter-institutional training order and adjust transmission costs in real time.
Chaudhary et al. [81] applied FCL in text classification to minimize catastrophic forgetting,
maximize the inter-client transfer learning and minimize inter-client interference by proposing a
framework called federated selective inter-client transfer (FedSeIT). FedSeIT uses parameter decom-
position methods to decompose each client’s model parameters into three different parameter sets
to access task-adaptive parameters better and selectively leverage task-specific knowledge. Specifi-
cally, the dense local base parameters capture the task-generic knowledge across clients. Sparse
task-adaptive parameters capture task-specific knowledge for each task. Sparse mask parameters
selectively utilize the global knowledge. The authors also proposed a task selection strategy named
selective inter-client transfer (SIT). SIT is designed for efficient assessment of domain overlap
at the global server using encoded data representations and selection of relevant task-adaptive
parameters of foreign clients without sharing data, therefore preserving privacy while keeping
the performance. In evaluation, they used five datasets with unique labels as the FTCL scenario to
demonstrate the effectiveness compared with the baseline method.
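An illustrative sketch of this kind of parameter decomposition is shown below for a single linear layer (a similar additive decomposition also underlies Cross-FCL, discussed next); the sizes, the sigmoid mask, and the initialisation are assumptions for illustration, not the authors' implementation:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DecomposedLinear(nn.Module):
    """Sketch of an additively decomposed layer: a dense base weight captures
    task-generic knowledge, a sparse task-adaptive weight captures task-specific
    knowledge, and a per-task mask gates how much of the shared base is used."""
    def __init__(self, in_dim, out_dim, num_tasks):
        super().__init__()
        self.base = nn.Parameter(torch.randn(out_dim, in_dim) * 0.01)
        self.adaptive = nn.ParameterList(
            [nn.Parameter(torch.zeros(out_dim, in_dim)) for _ in range(num_tasks)])
        self.mask = nn.ParameterList(
            [nn.Parameter(torch.ones(in_dim)) for _ in range(num_tasks)])

    def forward(self, x, task_id):
        # Effective weight for this task: masked base + task-adaptive part.
        weight = self.base * torch.sigmoid(self.mask[task_id]) + self.adaptive[task_id]
        return F.linear(x, weight)
```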
Zhang et al. [41] proposed a parameter decomposition-based FCL framework named Cross-FCL.
Cross-FCL uses additive parameter decomposition to separate knowledge of the local model into
base parameters for common knowledge and task-specific parameters for personalized knowledge
of the current local task to minimize the interference between federated learning and continual
learning. The authors also introduced cross-edge strategies on biased global aggregation and local
optimization, which helps reduce memory and computation costs as well as balancing memory
usage and adaptation trade-offs. The authors built a testbed for multi-edge federated learning on
real-world image recognition datasets and other public datasets that are divided into different
disjoint sub-datasets as local task datasets in FTCL settings to demonstrate the effectiveness of the
proposed Cross-FCL framework compared with the baseline.

4.3 Replay-based methods


In FTCL, replay-based methods include saving samples in memory and approximating and re-
covering old data distributions which are then used to rehearse knowledge in training current
tasks. Zizzo et al. [82] defined the classic FTCL problem and proposed to mitigate the catastrophic
forgetting by extending the conventional local replay methods with the global buffer to adapt a
novel scenario where clients dynamically join the FL system and have varying participation rates in
training rounds. By sharing differential private data of participating clients to the global aggregator,
the local models can access prior task information across clients after aggregation and use the
global buffer together with the local buffer to improve performance.
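A minimal sketch of this dual-buffer rehearsal is shown below; the buffer contents, sizes, and sampling rule are illustrative assumptions rather than the authors' exact mechanism:

```python
import random

def build_rehearsal_batch(local_buffer, global_buffer, new_batch,
                          n_local=16, n_global=16):
    """Mix fresh samples with exemplars from a local buffer and a (differentially
    private) global buffer, so earlier tasks seen by other clients are rehearsed
    alongside the current local task."""
    replay = []
    if local_buffer:
        replay += random.sample(local_buffer, min(n_local, len(local_buffer)))
    if global_buffer:
        replay += random.sample(global_buffer, min(n_global, len(global_buffer)))
    return list(new_batch) + replay
```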
Wang et al. [83] found most existing FCL work neglected the maintenance or consolidation of
old knowledge, resulting in performance degradation on previous tasks. Therefore, the authors
defined this problem under the FTCL setting and designed a federated probability memory replay
(FedPMR) framework including a probability distribution alignment (PDA) module and a parameter
consistency constraint (PCC) module to enhance the resistance ability to the catastrophic forgetting
problem. Specifically, PDA uses a simple but effective replay buffer in the learning process for
retaining exemplars and replaying initial probability experiences from past tasks to solve the
probability bias problem occurring in previous tasks. PCC personalizes the guidance for past
tasks learned at different times with the adaptive weight assignment to mitigate the imbalance in
parameter variations between previous and new tasks.
Recently, to solve the challenges of continually learning new tasks without forgetting previous
ones due to limited resources on edge devices for transformer-based computer vision models, Zuo et
al. [84] proposed a framework for FCL of vision transformers (ViTs) on edge devices, called FedViT.
FedViT addresses the challenges of catastrophic forgetting, negative knowledge transfer, and
scalability issues in FCL under FTCL settings. By considering the limited storage and computation
capabilities of edge devices, FedViT utilises a small number of samples from each task to improve
the performance against the above challenges. It proposes a knowledge extractor that retains critical
knowledge from past tasks using a small subset of samples, a gradient restorer that converts this
knowledge into gradients to help the model recover past task knowledge quickly, and a gradient
integrator that ensures the combination of new and old task gradients does not lead to a loss in
accuracy for any task.

4.4 Meta Learning-based Approaches


FTCL can be achieved not only by adding additional terms to the loss function but also by explicitly
designing and integrating other optimization methods such as meta-learning. Schur et al. [85]
focused on a lifelong learning (i.e., continual learning) scenario where an agent faces kernelized
bandit problems sequentially, with different unknown but shared kernel information between prob-
lems. The authors designed a lifelong bandit optimizer (LIBO) based on meta-learning approaches to
transfer knowledge across Bayesian optimization problems and extended LIBO to F-LIBO under the
federated learning framework. In F-LIBO, each BO task is performed by a peer in a network without
data exchanges, sequentially updating a kernel estimate to approximate the true kernel across tasks,
guaranteeing optimal performance in comparison with an oracle with complete environmental
knowledge over time.
Li et al. [86] aimed to solve privacy threats in real-world task-incremental scenarios of distributed
systems for biometrics. The authors proposed a personalized FCL framework to avoid memory
explosion and catastrophic forgetting in FTCL for biometrics. This framework includes a continual
task-distillation-based adaptive model-agnostic meta-learning module to retain the old knowledge
and learn the knowledge transferring between incremental users. Besides, a personalized FCL
strategy, sharing global meta parameters and reserving local learnable learning rates, is designed
in the framework to enhance local performance and reduce communication costs.
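The personalisation scheme can be pictured with a MAML-style inner/outer split, shown in the highly simplified sketch below; the single inner step, the per-client learnable learning rates, and the loss interfaces are our assumptions, not the authors' implementation:

```python
import torch

def local_meta_adaptation(meta_params, lr_per_client, support_loss_fn, query_loss_fn):
    """Sketch of meta-learning with client-specific learning rates: the shared
    meta parameters are adapted on a client's support data using that client's
    own learnable learning rates, and the query loss of the adapted parameters
    drives the meta-update of both the shared parameters and the rates."""
    grads = torch.autograd.grad(support_loss_fn(meta_params), meta_params,
                                create_graph=True)
    adapted = [p - lr * g for p, g, lr in zip(meta_params, grads, lr_per_client)]
    # Backpropagating this loss updates the global meta parameters and the
    # locally reserved learning rates together.
    return query_loss_fn(adapted)
```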

4.5 Unsupervised Learning


Paul et al. [87] extended FedWeIT [40] to an unsupervised continual federated learning framework called unsupervised continual federated masked autoencoders for density estimation (CONFEDMADE). CONFEDMADE integrates masked autoencoders with federated learning within an unsupervised learning framework under the FTCL setting. It indirectly benefits from the experiences
of other clients without direct exposure to specific tasks and data to protect privacy. The masked
autoencoders, together with a masking strategy, are integrated with task attention mechanisms, facilitating selective knowledge transfer between clients to mitigate catastrophic forgetting.

4.6 Summary and analysis of FTCL approaches


In summary, as the task identity is explicitly provided in FTCL, it will be more effective and
efficient to train models with task-specific components. Therefore, compared to regularization-
based approaches, architecture-based and replay-based approaches have attracted more attention
in recent research, which not only prevent catastrophic forgetting but also improve the efficiency
of sharing learned representations across tasks and clients. To further improve the effectiveness
of knowledge transferring and privacy preservation between FL clients, meta-learning-based
approaches and unsupervised learning approaches are integrated to better solve FTCL challenges.
However, how to optimize the trade-off between the performance and computational complexity
of these advanced approaches requires further attention.


5 FEDERATED CONTINUAL LEARNING APPLICATIONS


The previous three sections have thoroughly described the definitions, challenges, and strategies of
different scenarios of FCL. In this section, we investigate various applications empowered by FCL.
Table 3 shows a summary of existing application scenarios of FCL research.
Table 3. Summary of Federated Continual Learning Applications

Application | Paper | Brief Summary
Intelligent Transportation System | [88] | ITS-SS: Real-time prediction in the ITS.
Intelligent Transportation System | [89] | C-FL: Adapt to changing environments on the road.
Intelligent Transportation System | [90] | FedPC: Naturalistic driving action recognition for a safe intelligent transportation system.
Intelligent Transportation System | [91] | ICMFed: Driver distraction detection for an efficient and safe intelligent transportation system.
Intelligent Medical Systems | [92] | MetaCL: A smart physiological signal classification.
Intelligent Medical Systems | [93] | ICL: Automatic brain metastasis identification.
Intelligent Medical Systems | [94] | A real-time medical data processing for computer-aided diagnosis.
Internet of Things | [95] | MENIFLD_QoS: Intrusion detection system in resource-constrained IoT.
Internet of Things | [96] | SurveilNet: Lightweight federated IoT surveillance system with continual learning ability.
Internet of Things | [97] | FIL: Modulation classification in cognitive IoTs.
Internet of Things | [98] | FedIL: Federated continual learning framework with asynchronous training in edge networks.
UAVs | [99] | FCL-SBLS: Intrusion detection system in UAV networks with low computational cost.
UAVs | [100] | Failure prediction for drones at the source or intermediate nodes.
Smart Energy | [101] | A photovoltaic power prediction to improve the power supply reliability.
Smart Energy | [102] | FLD: A fault line detection system for medium- and low-voltage power distribution networks.
Digital Twin | [103] | BL-FCL: An efficient distributed model training framework for digital twin networks.
Financial Audit | [104] | A federated continual learning framework improving the assurance of financial statements.
Robotics | [105] | A visual obstacle avoidance system for robots.
Robotics | [106] | A federated continual learning framework for socially aware robotics supporting settings personalization.

5.1 Intelligent Transportation System


Reddy et al. [88] applied incremental federated learning to process the real-time data collected in
the moving vehicles to achieve real-time prediction tasks in the intelligent transportation system.
The prediction models deployed in the edge node will be incrementally trained based on the vehicle
learning updates after a predetermined amount of time, achieving real-time forecasting of the state
of the road and other conditions for the autonomous vehicle system.
From the vehicle aspect, Barbieri et al. [89] periodically collected new sensor data for model
training to adapt to changing environments on the road. They considered this incremental data in the
vehicle-to-everything networks and then applied continual learning settings in their decentralised
consensus-driven federated learning method.
To improve the safety of ITS, Yuan et al. [90] utilised federated continual learning for naturalistic
driving action recognition to prevent driver distraction, reduce the risk of traffic accidents, and
alleviate the privacy concerns caused by in-cabin cameras.
Under a similar scenario, Guo et al. [91] targeted the dynamics and heterogeneity challenges
within real-world driver distraction detection and proposed a cost-efficient mechanism ICMFed by
integrating incremental learning, meta-learning and federated learning to improve the efficiency
and safety of intelligent transportation systems.

5.2 Intelligent Medical Systems


Sun et al. [92] combined federated learning, meta-learning-empowered continual learning and block-
chain for physiological signal classification to protect data privacy and overcome the catastrophic
forgetting problem. In their proposed framework, the federated learning method was used to train
an Auto-Encoder-based feature extractor for the original physiological signal. They proposed a
knowledge base module to process and store the knowledge representations learned by each task
to solve the catastrophic forgetting caused by time, domain and institution change. Specifically,
they created a mask function for each task using the feature representation vectors obtained
by the feature extractor and used meta-learning methods to continuously accumulate important
knowledge of all tasks in updating the knowledge base.
Brain metastasis identification is a critical scenario that requires multicenter collaboration
while maintaining strict data privacy requirements among involved medical institutions. Huang
et al. [93] proposed an effective continual learning method integrated with peer-to-peer feder-
ated learning to address the performance fluctuation in cyclic weight transfer. They investigated
regularization-based methods and utilised synaptic intelligence by adding penalties for important
network parameter changes. This method can effectively improve automatic brain metastasis
identification sensitivity with peer-to-peer federated learning.
Computer-aided diagnosis (CAD) is a critical research in the medical field, which has benefited
from advances in AI technology in recent years. To further achieve real-time medical data processing
and human-like progressive learning, Guo et al. [94] combined the idea of federated learning and
incremental learning and proposed a real-time medical data processing method, which reduces the
time and space resource costs while mitigates the catastrophic forgetting of the disease diagnosis
model.

5.3 Internet of Things


With the increase of zero-day attacks, which may escape existing intrusion detection systems through unknown vulnerabilities while collecting sensitive information such as voice, fingerprints, and images on IoT devices, Jin et al. [95] proposed a federated continual learning method that embeds a discriminative auto-encoder model to help the intrusion detection model keep pace with evolving attacks. Considering the features of resource-constrained IoT devices, their proposed
method can infer unseen network attacks while performing fine-grained known attack identification
without intensive model retraining.
In IoT surveillance networks, introducing federated learning to develop a collaborative surveil-
lance system is necessary to solve the problem of data sharing and the inadequacy of training data
in anomaly detection. Osifeko et al. [96] presented a lightweight scheme that allows nodes to learn
from new anomalies continuously. Their proposed SurveilNet updates the model after receiving a
false classification report or interval trigger and then slows down the learning process from the
obtained knowledge to prevent forgetting issues.


In the cognitive Internet of Things, modulation classification is an essential enabler for primary
user detection and signal recognition. To process a large amount of heterogeneously cognitive IoT
data in a distributed mechanism, Qi et al. [97] proposed a federated continual learning method
with knowledge distillation to learn the modulation classification knowledge of private classes in
each local device. Similar to [70], they divided the training into two phases, i.e., a warm-up phase for global model learning and a customised incremental learning phase for client model learning.
Yang et al. [98] proposed a federated continual learning framework with an asynchronous
semi-supervised training algorithm. Their proposed FedIL framework can help open platform
applications such as IoT to prevent deep learning models from forgetting the learned information
of labelled data and accelerate the convergence of the global model during training.

5.4 UAVs
He et al. [99] combined a stacked board learning system with federated continual learning to
accommodate the increment of input data and enhancement nodes in UAV systems. Their proposed
model can effectively relieve the catastrophic forgetting problem generated by dynamic data
collection, and improve the accuracy of intrusion detection with low computational cost.
The failure detection is also an essential module in UAV networks for swarm-based drone
delivery services. To efficiently utilise the energy of UAVs and the knowledge learned from old
drone flight history, Alkouz et al. [100] proposed a weighted continual federated learning method
by allocating different weights to balance the importance between old and new flying data of drones
incrementally, which performs the failure prediction at the source or when the drones land at
intermediate nodes.

5.5 Smart Energy


Solar energy has become the most potential alternative energy source because of its inexhaustibility
and non-pollution. However, photovoltaic (PV) power output has strong fluctuation and intermit-
tency, and its power curve has obvious non-stationarity. As a result, PV power generation is bound
to affect the power quality and supply-demand balance when connected to the grid on a large scale.
In this scenario, Zhang et al. [101] proposed a federated continual learning method which utilised
the broad learning system through regional data sharing with incremental model update strategy
to predict PV data to ensure the power supply reliability.
The power transmission and distribution networks with voltage levels of 35 kV and below are
treated as critical infrastructures for large-scale power supply in cities and industrial areas. However,
achieving accurate fault line detection is challenging in medium- and low-voltage distribution
network systems because of training data scarcity caused by inactive relay protection devices under
this scenario. To solve this problem, Zhang et al. [102] proposed a fault line detection system by
integrating federated learning and incremental learning strategy to improve the detection accuracy
for small-sample and streaming data environments.

5.6 Digital Twin


In the digital twin network (DTN), distributed data protection mechanisms are considered to be
utilised to mitigate user privacy threats. To achieve this objective, Lv et al. [103] improved the
federated learning framework with continual learning and proposed a blockchain-based secure
distributed data sharing architecture. In this work, their proposed architecture can avoid retraining
when new data comes by introducing Broad Learning into Federated Continuous Learning to speed
up the model training process in DTN.


5.7 Financial Auditing


Some researchers are also exploring the improvement of financial statements through auditing.
Schreyer et al. [104] identified two data distribution shift problems as catastrophic forgetting
and model interference during auditing. To solve these two problems, they proposed a federated
continual learning framework which applied an auto-encoder network model to utilise the previous
knowledge. Their proposed framework enables auditors to incrementally learn industry-specific
models from distributed data of multiple audit clients to improve the assurance of financial state-
ments.

5.8 Robotics
Obstacle avoidance is a critical and essential function in autonomous mobile robot development.
Robots need to have the capabilities of continually learning the model for obstacle avoidance like
humans. Yu et al. [105] proposed a federated continual learning-empowered obstacle avoidance framework covering data collection, model training, and model sharing.
In the domain of socially aware robotics, Guerdan et al. [106] proposed a framework that enables
robots to personalize their settings for new individuals or groups based on FCL. They introduced
four key components as evaluation metrics for the decentralized robot learning framework: adapta-
tion quality, adaptation time, knowledge sharing, and model overhead. Moreover, they developed
an Elastic Transfer method based on importance regularization, which facilitates retaining rele-
vant parameters across multiple robots, thereby enhancing knowledge sharing among robots and
improving both the quality and speed of adaptation.

6 FUTURE DIRECTIONS AND CHALLENGES


In this section, we highlight and discuss four future directions of FCL that provide promising opportunities for the growth of Edge-AI: FCL benchmarks, explainable FCL, algorithm-hardware co-design for FCL, and FCL with foundation models.

6.1 FCL Benchmark


As research in FCL intensifies, establishing a benchmark with representative datasets, fair evaluation criteria and frameworks is crucial for assessing existing FCL methods and guiding future developments. To the best of our knowledge, although there have been some notable works [15, 16, 40, 49], a robust and universally accepted benchmark is yet to emerge in FCL research. Moreover, existing
FCL studies often adopt datasets and evaluation metrics from FL and CL fields. Here, we describe
potential FCL benchmarks from three distinct aspects as follows.
1) Common datasets. A prevalent benchmark in FL is LEAF [107], which comprises four vision
task datasets and two for NLP tasks. In addition, specialized benchmarks are emerging for areas like
multimodal federated learning [108] and federated graph learning [109]. In CL, several classification
datasets such as CORe50 [110], SVHN [111], Stream-51 [112], CUB-200 [113], CLEAR [114], and
SlimageNet [115] have been specifically designed, alongside commonly used datasets like MNIST
[116], CIFAR-100 [117], and ImageNet [118] variants. However, we believe that FCL tasks with blurry task boundaries, i.e., scenarios featuring class overlap or sharing across tasks, align more closely with real-world applications such as e-commerce services and food image classification than the prevalent disjoint tasks [119, 120], as evidenced in the works [121-125]. Therefore, this practical
setting merits increased attention in FCL studies.
2) Diverse evaluation metrics. Comprehensive yet unified metrics are equally important
for fair comparisons in FCL experiments. Most FCL studies have concentrated on addressing the
catastrophic forgetting problem by evaluating the performance of the final global or local model
in FL, and performance across current, past, and future tasks in CL. The specific metrics vary,
with some studies using averaged accuracy to assess forgetting [15, 16] and others employing
forward and backward transfer metrics [17, 40, 49]. Apart from forgetting, future FCL research
should consider the inherent variability across clients more thoroughly, particularly concerning
constraints in computational capacity, energy and memory. Additionally, the characteristics of
data resources, including non-i.i.d data distribution, sample quantity and class imbalance, demand
significant attention, especially in the context of Edge AI. Consequently, the formulation of diverse
metrics, meticulously designed to encapsulate these specific client-side factors, is indispensable for
the nuanced evaluation and advancement of FCL.
3) User-friendly and modular frameworks. Existing frameworks and libraries such as FATE
[126], PySyft [127], TFF [128], Flower [129], FedML [130], FederatedScope [131] and Avalanche [132]
have significantly facilitated FL and CL research. All of these tools are open-source, accompanied
by comprehensive documentation, and support effortless, customized modular implementation in
practice, owing to their plug-and-play nature. Nevertheless, we firmly believe that crafting a user-
friendly and modular framework stands as a fundamental and advantageous initiative to foster the
FCL community for collaborative and sustainable growth. To this end, it is more efficient to introduce
CL-empowered and FL-enabled plug-ins for existing FL and CL frameworks, respectively, rather
than starting from scratch. Alternatively, developing a streamlined and lightweight framework
dedicated to FCL presents another viable strategy.

6.2 Explainable FCL


For Edge-AI, intelligent models are distributedly deployed across various places such as industries
or private communities with distinct security requirements and constraints. In the near future, with
the increasing demands for secure, robust and reliable FCL systems and applications, developing
methods to provide insights into model updates and decisions in decentralised and collaborative
FCL environments will be a trending need in the field of Edge AI. By enhancing the explainability
and interpretability of FCL for Edge AI, it is easier to detect and prevent attacks, as well as to verify
and validate the correctness and fairness of the models [133], thereby building trust in the entire
process of FCL.
Recently, some researchers have invested their efforts in combining FL with explainable AI
models to enhance transparency and trustworthiness [134-139]. For example, the accuracy and trustworthiness of Quality of Experience (QoE) forecasting in Beyond 5G/6G networks are improved in [137], and both the trust and the prediction performance of anomaly detection for industrial control systems are improved by the FL-based explainable model proposed in [135]. Meanwhile, the
preliminary exploration of an interpretable CL approach was introduced in [140], which mitigates interpretability concept drift and outperforms existing exemplar-free methods in common CCL. However, the authors only analysed the exemplar-free scenario and closed-set recognition, where training and test samples share the same label space, without investigating the potential impact of incorporating a replay buffer on model performance or exploring its compatibility with open-set settings, where test samples do not come from the training label space.
To date, no research has explored explainable FCL, but the above requirements for improving model transparency and trustworthiness, as well as the limitations and challenges of explainable FL and explainable CL, also exist in FCL. To solve these challenges, two important research aspects
deserve attention for explainable FCL.
1) Synergistic Consolidation. Simply introducing the CL methods into existing explainable FL
or vice versa is not appropriate due to (i) the explainable FL model may present fewer capabilities
in preventing catastrophic forgetting and maintaining the explainability over time to generate
explanations for dynamically changing models after applying CL, or (ii) the explainable CL model
may lack a clear understanding of how clients’ contributions affect the global model considering
complex aggregation mechanisms under a decentralised framework. By further exploring the
synergistic consolidation between explainable FL and CL, we can enable more effective, secure,
transparent and trustworthy FCL model development. This synergistic consolidation for explainable
FCL has the potential to facilitate the deployment of secure Edge-AI systems that are not only
powerful but also ethically responsible.
2) Scalability. The scalability challenge will also arise as more clients continually participate in federated training, making the process more heterogeneous and less efficient and making it increasingly difficult for the global model to accurately generate and efficiently communicate meaningful and consistent explanations. Therefore, how to enhance the explainability of FCL in large-scale Edge-AI scenarios also needs to be further explored.
In short, explainable FCL will be an essential, challenging, but highly rewarding research direction,
helping to accelerate the development of various robust and reliable applications of Edge AI.

6.3 Algorithm-Hardware Co-design for FCL


Recently, some studies have investigated the possibility of algorithm-hardware co-design in FL and CL, respectively. In FL, existing research focuses on addressing the computational bottleneck of cryptographic algorithms [141-145]. In CL, researchers emphasize fitting the computational and storage resource constraints of edge devices and accelerating the training and inference of forgetting-resistant neural networks [146-152]. However, current FCL research focuses on algorithms, while hardware-involved co-design has not yet been explored.
As network model parameters continuously increase and computational complexity signifi-
cantly grows, FCL research confronts communication and computation challenges, particularly in
resource-constrained environments. The algorithms and hardware in FCL are closely related and
complement each other. Currently, ongoing studies are further advancing the co-design approach,
offering significant potential for addressing these challenges. In this section, four potential research
directions and associated challenges are outlined below.
1) Hardware-aware pruning and quantization algorithms [153, 154] could significantly
reduce communication and computational overheads in FCL. Furthermore, the quantized fixed-
point numbers are well-suited for parallel computation on hardware such as GPU, FPGA, ASIC,
and CIM. The challenge lies in maintaining the accuracy of the network model after such lightweight compression.
2) Neural Architecture Search (NAS) could be used for automated hardware-aware design to
address device heterogeneity in FCL [155, 156]. Hardware-aware NAS enables aggressive control
of hardware resource requirements to ensure latency of training and inference on different devices.
The complexity of NAS is a key barrier to the application of this method.
3) Spare matrix multiplication and mixed precision supported hardware could be designed
to improve the performance and reduce the energy consumption of the FCL device, addressing
the sparsity and low bit-width of the network structure brought by the pruning and quantization
algorithms. However, designing hardware to support these operations and ensuring high utilization
of hardware computing arrays is not trivial.
4) Domain-specific hardware [142, 157] consisting of reconfigurable hardware and instructions
can take advantage of its flexibility to adapt to multi-tasking in FCL, where different tasks can be
abstracted into reusable basic operators for acceleration hardware design. The primary difficulty
originates from co-optimization of the reconfigurable circuits and compilers.
Although there are still challenges in the promising research directions mentioned above, algorithm-hardware co-design for FCL is expected to become a trending topic and will fundamentally facilitate the development of intelligent learning systems.


6.4 FCL with Foundation Models


Foundation models (FMs), such as GPT [158], BERT [159], and CLIP [160], capture rich knowledge
and data representations through pre-training on large-scale datasets, making them adaptable to a
wide range of downstream tasks via fine-tuning. It has evolved into fundamental infrastructures
across domains like natural language processing (NLP), computer vision (CV), and speech processing.
Recently, there has been an emerging trend in research that integrates FMs with CL or FL. For
instance, the pre-trained model based on Transformers can effectively mitigate the catastrophic
forgetting problem of CL compared to the convolutional neural network (CNN)-based approach in
CV [161]. Similarly, FMs combined with FL also improve performance while preserving privacy in
NLP [162]. To further drive advancements in the combination between FCL and FMs for Edge-AI,
two promising research directions are highlighted as follows.
1) Reducing computational overhead on resource-constrained edge devices. FMs en-
counter significant challenges in communication and computation while facilitating FCL. Currently,
the communication overhead has been mitigated by adopting parameters-efficient fine-tuning
(PEFT) methods [163–165], but the challenge of computational bottlenecks has not yet received
sufficient attention, especially on resource-constrained edge devices in Edge-AI. Consequently, a
promising research direction lies in reducing the computation and storage requirements of FCL with
FMs by leveraging model compression and knowledge distillation techniques, making it suitable
for resource-constrained devices while maintaining performance in FCL for Edge-AI.
2) FMs-based cross-modal FCL for evolving environments. Although FMs hold promise
for dealing with data heterogeneity, device heterogeneity, and multi-tasking in FCL [166–170],
the aspect of continuous learning for data streams in dynamic settings remains underexplored.
Additionally, the complexity of integrating multimodal data within FMs further introduces ad-
ditional difficulties. Therefore, another promising research direction leads to the development
of cross-modal FCL strategies based on FMs, aiming to adapt to different types of data drifts in
evolving environments.
To summarize, the challenges and opportunities of applying FMs in FCL are intertwined. Integrating FCL with foundation models covers core issues of both FL and CL, as well as bridging interdisciplinary fields such as data privacy, communication technology, and software engineering. Solving these
complex problems requires a collaborative effort from both academia and industry.

7 CONCLUSION
Edge-AI is an emerging and rapidly developing area. To ensure the performance of Edge-AI
applications when handling various devices and evolving data at the edge, federated continual
learning emerges to provide sustained adaptability and stable performance for learning models over
time. In this paper, we are the first to conduct an extensive and comprehensive survey on federated
continual learning for Edge-AI and categorize three scenarios for federated continual learning based
on different task characteristics: federated class continual learning, federated domain continual
learning, and federated task continual learning. We thoroughly summarised the background,
challenges, problem formalisation, advanced solutions, and limitations of each scenario. We also
provide a review and summary of real-world applications empowered by federated continual learning. In addition, we highlighted four open research challenges and proposed prospective
directions. We hope this survey will inspire the research community to accelerate the progress of
improving federated continual learning for Edge-AI.

REFERENCES
[1] Shi Dong, Ping Wang, and Khushnood Abbas. A survey on deep learning and its applications. Computer Science
Review, 40:100379, 2021.


[2] Yiping Zuo, Jiajia Guo, Ning Gao, Yongxu Zhu, Shi Jin, and Xiao Li. A survey of blockchain and artificial intelligence
for 6g wireless communications. IEEE Communications Surveys & Tutorials, 2023.
[3] Andre Esteva, Katherine Chou, Serena Yeung, Nikhil Naik, Ali Madani, Ali Mottaghi, Yun Liu, Eric Topol, Jeff Dean,
and Richard Socher. Deep learning-enabled medical computer vision. NPJ digital medicine, 4(1):5, 2021.
[4] Sampo Kuutti, Richard Bowden, Yaochu Jin, Phil Barber, and Saber Fallah. A survey of deep learning applications to
autonomous vehicle control. IEEE Transactions on Intelligent Transportation Systems, 22(2):712–733, 2020.
[5] Nicolae Sapoval, Amirali Aghazadeh, Michael G Nute, Dinler A Antunes, Advait Balaji, Richard Baraniuk, CJ Barberan,
Ruth Dannenfelser, Chen Dun, Mohammadamin Edrisi, et al. Current progress and open challenges for applying
deep learning across the biosciences. Nature Communications, 13(1):1728, 2022.
[6] Brendan McMahan, Eider Moore, Daniel Ramage, Seth Hampson, and Blaise Aguera y Arcas. Communication-efficient
learning of deep networks from decentralized data. In Artificial intelligence and statistics, pages 1273–1282. PMLR,
2017.
[7] Chen Zhang, Yu Xie, Hang Bai, Bin Yu, Weihong Li, and Yuan Gao. A survey on federated learning. Knowledge-Based
Systems, 216:106775, 2021.
[8] Nicola Rieke, Jonny Hancox, Wenqi Li, Fausto Milletari, Holger R Roth, Shadi Albarqouni, Spyridon Bakas, Mathieu N
Galtier, Bennett A Landman, Klaus Maier-Hein, et al. The future of digital health with federated learning. NPJ digital
medicine, 3(1):1–7, 2020.
[9] Hangyu Zhu, Jinjin Xu, Shiqing Liu, and Yaochu Jin. Federated learning on non-iid data: A survey. Neurocomputing,
465:371–390, 2021.
[10] Pian Qi, Diletta Chiaro, Antonella Guzzo, Michele Ianni, Giancarlo Fortino, and Francesco Piccialli. Model aggregation
techniques in federated learning: A comprehensive survey. Future Generation Computer Systems, 2023.
[11] Viraaji Mothukuri, Reza M Parizi, Seyedamin Pouriyeh, Yan Huang, Ali Dehghantanha, and Gautam Srivastava. A
survey on security and privacy of federated learning. Future Generation Computer Systems, 115:619–640, 2021.
[12] Latif U Khan, Shashi Raj Pandey, Nguyen H Tran, Walid Saad, Zhu Han, Minh NH Nguyen, and Choong Seon Hong.
Federated learning for edge networks: Resource optimization and incentive mechanism. IEEE Communications
Magazine, 58(10):88–93, 2020.
[13] Liyuan Wang, Xingxing Zhang, Hang Su, and Jun Zhu. A comprehensive survey of continual learning: Theory,
method and application. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2024.
[14] Marcos F Criado, Fernando E Casado, Roberto Iglesias, Carlos V Regueiro, and Senén Barro. Non-iid data and continual
learning processes in federated learning: A long road ahead. Information Fusion, 88:263–280, 2022.
[15] Yuhang Ma, Zhongle Xie, Jue Wang, Ke Chen, and Lidan Shou. Continual federated learning based on knowledge
distillation. In IJCAI, pages 2182–2188, 2022.
[16] Jiahua Dong, Lixu Wang, Zhen Fang, Gan Sun, Shichao Xu, Xiao Wang, and Qi Zhu. Federated class-incremental
learning. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 10164–10173,
2022.
[17] Jie Zhang, Chen Chen, Weiming Zhuang, and Lingjuan Lyu. Target: Federated class-continual learning via exemplar-
free distillation. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 4782–4793,
2023.
[18] Donald Shenaj, Marco Toldo, Alberto Rigon, and Pietro Zanuttigh. Asynchronous federated continual learning. In
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 5054–5062, 2023.
[19] Tuo Zhang, Lei Gao, Chaoyang He, Mi Zhang, Bhaskar Krishnamachari, and A Salman Avestimehr. Federated learning
for the internet of things: Applications, challenges, and opportunities. IEEE Internet of Things Magazine, 5(1):24–29,
2022.
[20] Mang Ye, Xiuwen Fang, Bo Du, Pong C Yuen, and Dacheng Tao. Heterogeneous federated learning: State-of-the-art
and research challenges. ACM Computing Surveys, 56(3):1–44, 2023.
[21] Gido M van de Ven, Tinne Tuytelaars, and Andreas S Tolias. Three types of incremental learning. Nature Machine
Intelligence, 4(12):1185–1197, 2022.
[22] Matthias De Lange, Rahaf Aljundi, Marc Masana, Sarah Parisot, Xu Jia, Aleš Leonardis, Gregory Slabaugh, and Tinne
Tuytelaars. A continual learning survey: Defying forgetting in classification tasks. IEEE transactions on pattern
analysis and machine intelligence, 44(7):3366–3385, 2021.
[23] Marc Masana, Xialei Liu, Bartłomiej Twardowski, Mikel Menta, Andrew D Bagdanov, and Joost Van De Weijer.
Class-incremental learning: survey and performance evaluation on image classification. IEEE Transactions on Pattern
Analysis and Machine Intelligence, 45(5):5513–5533, 2022.
[24] Xin Yang, Hao Yu, Xin Gao, Hao Wang, Junbo Zhang, and Tianrui Li. Federated continual learning via knowledge
fusion: A survey. arXiv preprint arXiv:2312.16475, 2023.
[25] Da-Wei Zhou, Qi-Wei Wang, Zhi-Hong Qi, Han-Jia Ye, De-Chuan Zhan, and Ziwei Liu. Deep class-incremental
learning: A survey, February 2023.


[26] Othmane Marfoq, Giovanni Neglia, Laetitia Kameni, and Richard Vidal. Federated Learning for Data Streams. In
Proceedings of The 26th International Conference on Artificial Intelligence and Statistics, pages 8889–8924. PMLR, April
2023.
[27] Sean M. Hendryx, Dharma Raj KC, Bradley Walls, and Clayton T. Morrison. Federated Reconnaissance: Efficient,
Distributed, Class-Incremental Learning, August 2021.
[28] Ian J. Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville,
and Yoshua Bengio. Generative Adversarial Networks, June 2014.
[29] Daiqing Qi, Handong Zhao, and Sheng Li. Better generative replay for continual federated learning. In The Eleventh
International Conference on Learning Representations, 2022.
[30] Sara Babakniya, Zalan Fabian, Chaoyang He, Mahdi Soltanolkotabi, and Salman Avestimehr. Don’t memorize; mimic
the past: Federated class incremental learning without episodic memory. In Federated Learning and Analytics in
Practice: Algorithms, Systems, Applications, and Opportunities, 2023.
[31] Sara Babakniya, Zalan Fabian, Chaoyang He, Mahdi Soltanolkotabi, and Salman Avestimehr. A data-free approach to
mitigate catastrophic forgetting in federated class incremental learning for vision tasks. In Thirty-seventh Conference
on Neural Information Processing Systems, 2023.
[32] Jiahua Dong, Yang Cong, Gan Sun, Yulun Zhang, Bernt Schiele, and Dengxin Dai. No one left behind: Real-world
federated class-incremental learning. arXiv preprint arXiv:2302.00903, 2023.
[33] Jiahua Dong, Duzhen Zhang, Yang Cong, Wei Cong, Henghui Ding, and Dengxin Dai. Federated incremental semantic
segmentation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 3934–3943,
2023.
[34] Gwen Legate, Lucas Caccia, and Eugene Belilovsky. Re-weighted softmax cross-entropy to control forgetting in
federated learning. arXiv preprint arXiv:2304.05260, 2023.
[35] Kai Hu, Meixia Lu, Yaogen Li, Sheng Gong, Jiasheng Wu, Fenghua Zhou, Shanshan Jiang, and Yi Yang. A federated
incremental learning algorithm based on dual attention mechanism. Applied Sciences, 12(19):10025, 2022.
[36] Sylvestre-Alvise Rebuffi, Alexander Kolesnikov, Georg Sperl, and Christoph H. Lampert. iCaRL: Incremental classifier
and representation learning. In 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages
5533–5542, Honolulu, HI, July 2017. IEEE.
[37] Xin Yao and Lifeng Sun. Continual local training for better initialization of federated models. In 2020 IEEE International
Conference on Image Processing (ICIP), pages 1736–1740. IEEE, 2020.
[38] James Kirkpatrick, Razvan Pascanu, Neil Rabinowitz, Joel Veness, Guillaume Desjardins, Andrei A. Rusu, Kieran
Milan, John Quan, Tiago Ramalho, Agnieszka Grabska-Barwinska, Demis Hassabis, Claudia Clopath, Dharshan
Kumaran, and Raia Hadsell. Overcoming catastrophic forgetting in neural networks. Proceedings of the National
Academy of Sciences, 114(13):3521–3526, March 2017.
[39] Jaehong Yoon, Saehoon Kim, Eunho Yang, and Sung Ju Hwang. Scalable and order-robust continual learning with
additive parameter decomposition. In International Conference on Learning Representations, 2019.
[40] Jaehong Yoon, Wonyong Jeong, Giwoong Lee, Eunho Yang, and Sung Ju Hwang. Federated continual learning with
weighted inter-client transfer. In International Conference on Machine Learning, pages 12073–12086. PMLR, 2021.
[41] Zhouyangzi Zhang, Bin Guo, Wen Sun, Yan Liu, and Zhiwen Yu. Cross-fcl: Toward a cross-edge federated continual
learning framework in mobile edge computing systems. IEEE Transactions on Mobile Computing, 2022.
[42] Yaxin Luopan, Rui Han, Qinglong Zhang, Chi Harold Liu, Guoren Wang, and Lydia Y Chen. Fedknow: Federated
continual learning with signature task knowledge integration at edge. In 2023 IEEE 39th International Conference on
Data Engineering (ICDE), pages 341–354. IEEE, 2023.
[43] Shaunak Halbe, James Seale Smith, Junjiao Tian, and Zsolt Kira. Hepco: Data-free heterogeneous prompt consolidation
for continual federated learning. arXiv preprint arXiv:2306.09970, 2023.
[44] Gaurav Bagwe, Xiaoyong Yuan, Miao Pan, and Lan Zhang. Fed-CPrompt: Contrastive Prompt for Rehearsal-Free
Federated Continual Learning, September 2023.
[45] Jiale Liu, Yu-Wei Zhan, Chong-Yu Zhang, Xin Luo, Zhen-Duo Chen, Yinwei Wei, and Xin-Shun Xu. Federated
class-incremental learning with prompting. arXiv preprint arXiv:2310.08948, 2023.
[46] Geoffrey Hinton, Oriol Vinyals, and Jeff Dean. Distilling the Knowledge in a Neural Network, March 2015.
[47] Zhizhong Li and Derek Hoiem. Learning without Forgetting. IEEE Transactions on Pattern Analysis and Machine
Intelligence, 40(12):2935–2947, December 2018.
[48] Anastasiia Usmanova, François Portet, Philippe Lalanda, and German Vega. Federated continual learning through
distillation in pervasive computing. In 2022 IEEE International Conference on Smart Computing (SMARTCOMP), pages
86–91. IEEE, 2022.
[49] Anastasiia Usmanova, François Portet, Philippe Lalanda, and German Vega. A distillation-based approach integrating
continual learning and federated learning for pervasive services. arXiv preprint arXiv:2109.04197, 2021.

[50] Guoyizhe Wei and Xiu Li. Knowledge lock: Overcoming catastrophic forgetting in federated learning. In Pacific-Asia
Conference on Knowledge Discovery and Data Mining, pages 601–612. Springer, 2022.
[51] Zhigang Jin, Junyi Zhou, Bing Li, Xiaodong Wu, and Chenxu Duan. Fl-iids: A novel federated learning-based
incremental intrusion detection system. Future Generation Computer Systems, 151:57–70, 2024.
[52] Chenghao Liu, Xiaoyang Qu, Jianzong Wang, and Jing Xiao. Fedet: a communication-efficient federated class-
incremental learning framework based on enhanced transformer. In Proceedings of the Thirty-Second International
Joint Conference on Artificial Intelligence, pages 3984–3992, 2023.
[53] Quande Liu, Cheng Chen, Jing Qin, Qi Dou, and Pheng-Ann Heng. Feddg: Federated domain generalization on
medical image segmentation via episodic learning in continuous frequency space. In Proceedings of the IEEE/CVF
Conference on Computer Vision and Pattern Recognition, pages 1013–1023, 2021.
[54] Shunjian Liu, Xinxin Feng, and Haifeng Zheng. Overcoming forgetting in local adaptation of federated learning
model. In Pacific-Asia Conference on Knowledge Discovery and Data Mining, pages 613–625. Springer, 2022.
[55] Tae Jin Park, Kenichi Kumatani, and Dimitrios Dimitriadis. Tackling dynamics in federated incremental learning
with variational embedding rehearsal. arXiv preprint arXiv:2110.09695, 2021.
[56] Fernando E Casado, Dylan Lema, Roberto Iglesias, Carlos V Regueiro, and Senén Barro. Federated and continual
learning for classification tasks in a society of devices. arXiv preprint arXiv:2006.07129, 2020.
[57] Fernando E Casado, Dylan Lema, Marcos F Criado, Roberto Iglesias, Carlos V Regueiro, and Senén Barro. Concept
drift detection and adaptation for federated and continual learning. Multimedia Tools and Applications, pages 1–23,
2022.
[58] Fernando E Casado, Dylan Lema, Roberto Iglesias, Carlos V Regueiro, and Senén Barro. Ensemble and continual
federated learning for classification tasks. Machine Learning, pages 1–41, 2023.
[59] Lei Zhang, Guanyu Gao, and Huaizheng Zhang. Spatial-temporal federated learning for lifelong person re-
identification on distributed edges. IEEE Transactions on Circuits and Systems for Video Technology, 2023.
[60] Wenke Huang, Mang Ye, and Bo Du. Learn from others and be yourself in heterogeneous federated learning. In
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 10143–10153, 2022.
[61] Ying Wang, Fengjun Shang, and Jianjun Lei. Multi-granularity fusion resource allocation algorithm based on dual-
attention deep reinforcement learning and lifelong learning architecture in heterogeneous iiot. Information Fusion,
page 101871, 2023.
[62] Yongxin Guo, Tao Lin, and Xiaoying Tang. Towards federated learning on time-evolving heterogeneous data. arXiv
preprint arXiv:2112.13246, 2021.
[63] Yan Huang, Mengxuan Du, Haifeng Zheng, and Xinxin Feng. Incremental unsupervised adversarial domain adaptation
for federated learning in iot networks. In 2022 18th International Conference on Mobility, Sensing and Networking
(MSN), pages 186–190. IEEE, 2022.
[64] Zhiyong Chen and Shugong Xu. Learning domain-heterogeneous speaker recognition systems with personalized
continual federated learning. EURASIP Journal on Audio, Speech, and Music Processing, 2023(1):33, 2023.
[65] Valerio De Caro, Claudio Gallicchio, and Davide Bacciu. Continual adaptation of federated reservoirs in pervasive
environments. Neurocomputing, 556:126638, 2023.
[66] Zhao Zhang, Yong Zhang, Da Guo, Shuang Zhao, and Xiaolin Zhu. Communication-efficient federated continual
learning for distributed learning system with non-iid data. Science China Information Sciences, 66(2):122102, 2023.
[67] Ajesh Koyatan Chathoth, Clark P Necciai, Abhyuday Jagannatha, and Stephen Lee. Differentially private federated
continual learning with heterogeneous cohort privacy. In 2022 IEEE International Conference on Big Data (Big Data),
pages 5682–5691. IEEE, 2022.
[68] Zichen Ma, Yu Lu, Wenye Li, and Shuguang Cui. Efl: Elastic federated learning on non-iid data. In Conference on
Lifelong Learning Agents, pages 92–115. PMLR, 2022.
[69] Leonard Bereska and Efstratios Gavves. Continual learning of dynamical systems with competitive federated reservoir
computing. In Conference on Lifelong Learning Agents, pages 335–350. PMLR, 2022.
[70] Junki Mori, Isamu Teranishi, and Ryo Furukawa. Continual horizontal federated learning for heterogeneous data. In
2022 International Joint Conference on Neural Networks (IJCNN), pages 1–8. IEEE, 2022.
[71] Junqing Le, Xinyu Lei, Nankun Mu, Hengrun Zhang, Kai Zeng, and Xiaofeng Liao. Federated continuous learning
with broad network architecture. IEEE Transactions on Cybernetics, 51(8):3874–3888, 2021.
[72] Meng-yuan Zhu, Zhuo Chen, Ke-fan Chen, Na Lv, and Yun Zhong. Attention-based federated incremental learning
for traffic classification in the internet of things. Computer Communications, 185:168–175, 2022.
[73] Zhaoyang Han, Chunpeng Ge, Bingzhe Wu, and Zhe Liu. Lightweight privacy-preserving federated incremental
decision trees. IEEE Transactions on Services Computing, 2022.
[74] Ruipeng Zhang, Qinwei Xu, Jiangchao Yao, Ya Zhang, Qi Tian, and Yanfeng Wang. Federated domain generalization
with generalization adjustment. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition,
pages 3954–3963, 2023.

[75] Christophe Dupuy, Tanya G Roosta, Leo Long, Clement Chung, Rahul Gupta, and Salman Avestimehr. Learnings
from federated learning in the real world. In ICASSP 2022-2022 IEEE International Conference on Acoustics, Speech and
Signal Processing (ICASSP), pages 8767–8771. IEEE, 2022.
[76] Xiaoying Wang, Zhiwei Liang, Arthur Sandor Voundi Koe, Qingwu Wu, Xiaodong Zhang, Haitao Li, and Qintai
Yang. Secure and efficient parameters aggregation protocol for federated incremental learning and its applications.
International Journal of Intelligent Systems, 37(8):4471–4487, 2022.
[77] Cobbinah B Mawuli, Jay Kumar, Ebenezer Nanor, Shangxuan Fu, Liangxu Pan, Qinli Yang, Wei Zhang, and Junming
Shao. Semi-supervised federated learning on evolving data streams. Information Sciences, page 119235, 2023.
[78] Zoujing Yao, Pengyu Song, and Chunhui Zhao. Finding trustworthy neighbors: Graph aided federated learning for
few-shot industrial fault diagnosis with data heterogeneity. Journal of Process Control, 129:103038, 2023.
[79] Yavuz Faruk Bakman, Duygu Nur Yaldiz, Yahya H Ezzeldin, and Salman Avestimehr. Federated orthogonal training:
Mitigating global catastrophic forgetting in continual federated learning. arXiv preprint arXiv:2309.01289, 2023.
[80] Hao Wang, Ruihong He, Xiaoyu Zhang, Zhaoying Bian, Dong Zeng, and Jianhua Ma. A peer-to-peer federated
continual learning network for improving ct imaging from multiple institutions. arXiv preprint arXiv:2306.02037, 2023.
[81] Yatin Chaudhary, Pranav Rai, Matthias Schubert, Hinrich Schütze, and Pankaj Gupta. Federated continual learning
for text classification via selective inter-client transfer. arXiv preprint arXiv:2210.06101, 2022.
[82] Giulio Zizzo, Ambrish Rawat, Naoise Holohan, and Seshu Tirupathi. Federated continual learning with differentially
private data sharing. In Workshop on Federated Learning: Recent Advances and New Challenges (in Conjunction with
NeurIPS 2022), 2022.
[83] Zhe Wang, Yu Zhang, Xinlei Xu, Zhiling Fu, Hai Yang, and Wenli Du. Federated probability memory recall for
federated continual learning. Information Sciences, 629:551–565, 2023.
[84] Xiaojiang Zuo, Yaxin Luopan, Rui Han, Qinglong Zhang, Chi Harold Liu, Guoren Wang, and Lydia Y. Chen. FedViT:
Federated continual learning of vision transformer at edge. Future Generation Computer Systems, 154:1–15, May 2024.
[85] Felix Schur, Parnian Kassraie, Jonas Rothfuss, and Andreas Krause. Lifelong bandit optimization: no prior and no
regret. In Uncertainty in Artificial Intelligence, pages 1847–1857. PMLR, 2023.
[86] Dongdong Li, Nan Huang, Zhe Wang, and Hai Yang. Personalized federated continual learning for task-incremental
biometrics. IEEE Internet of Things Journal, 2023.
[87] Subarnaduti Paul, Lars-Joel Frey, Roshni Kamath, Kristian Kersting, and Martin Mundt. Masked autoencoders are
efficient continual federated learners. arXiv preprint arXiv:2306.03542, 2023.
[88] K Hemant Kumar Reddy, Rajat Shubhra Goswami, and Diptendu Sinha Roy. A deep learning-based smart service
model for context-aware intelligent transportation system. The Journal of Supercomputing, pages 1–23, 2023.
[89] Luca Barbieri, Stefano Savazzi, Mattia Brambilla, and Monica Nicoli. Decentralized federated learning for extended
sensing in 6g connected vehicles. Vehicular Communications, 33:100396, 2022.
[90] Liangqi Yuan, Yunsheng Ma, Lu Su, and Ziran Wang. Peer-to-peer federated continual learning for naturalistic
driving action recognition. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition,
pages 5249–5258, 2023.
[91] Zihan Guo, Linlin You, Sheng Liu, Junshu He, and Bingran Zuo. Icmfed: An incremental and cost-efficient mechanism
of federated meta-learning for driver distraction detection. Mathematics, 11(8):1867, 2023.
[92] Le Sun, Jin Wu, Yang Xu, and Yanchun Zhang. A federated learning and blockchain framework for physiological
signal classification based on continual learning. Information Sciences, 630:586–598, 2023.
[93] Yixing Huang, Christoph Bert, Stefan Fischer, Manuel Schmidt, Arnd Dörfler, Andreas Maier, Rainer Fietkau, and
Florian Putz. Continual learning for peer-to-peer federated learning: A study on automated brain metastasis
identification. arXiv preprint arXiv:2204.13591, 2022.
[94] Kehua Guo, Tianyu Chen, Sheng Ren, Nan Li, Min Hu, and Jian Kang. Federated learning empowered real-time medical
data processing method for smart healthcare. IEEE/ACM Transactions on Computational Biology and Bioinformatics,
2022.
[95] Dong Jin, Shuangwu Chen, Huasen He, Xiaofeng Jiang, Siyu Cheng, and Jian Yang. Federated incremental learning
based evolvable intrusion detection system for zero-day attacks. IEEE Network, 37(1):125–132, 2023.
[96] Martins O Osifeko, Gerhard P Hancke, and Adnan M Abu-Mahfouz. Surveilnet: A lightweight anomaly detection
system for cooperative iot surveillance networks. IEEE Sensors Journal, 21(22):25293–25306, 2021.
[97] Peihan Qi, Xiaoyu Zhou, Yuanlei Ding, Shilian Zheng, Tao Jiang, and Zan Li. Collaborative and incremental
learning for modulation classification with heterogeneous local dataset in cognitive iot. IEEE Transactions on Green
Communications and Networking, 2022.
[98] Nan Yang, Dong Yuan, Yuning Zhang, Yongkun Deng, and Wei Bao. Asynchronous semi-supervised federated
learning with provable convergence in edge computing. IEEE Network, 36(5):136–143, 2022.
[99] Xiaoqiang He, Qianbin Chen, Lun Tang, Weili Wang, Tong Liu, Li Li, Qinghai Liu, et al. Federated continuous learning based on stacked broad learning system assisted by digital twin networks: An incremental learning approach for intrusion detection in UAV networks. IEEE Internet of Things Journal, 2023.
[100] Balsam Alkouz, Athman Bouguettaya, and Abdallah Lakhdari. Failure-sentient composition for swarm-based drone
services. arXiv preprint arXiv:2305.13892, 2023.
[101] Le Zhang, Jizhong Zhu, Di Zhang, and Yun Liu. An incremental photovoltaic power prediction method considering
concept drift and privacy protection. Applied Energy, 351:121919, 2023.
[102] Le Zhang, Jizhong Zhu, Shenglin Li, Alberto Borghetti, and Di Zhang. Online fault line detection in small-sample and
streaming data environments. IEEE Transactions on Instrumentation and Measurement, 2023.
[103] Zhihan Lv, Chen Cheng, and Haibin Lv. Blockchain based decentralized learning for security in digital twins. IEEE
Internet of Things Journal, 2023.
[104] Marco Schreyer, Hamed Hemati, Damian Borth, and Miklos A Vasarhelyi. Federated continual learning to detect
accounting anomalies in financial auditing. arXiv preprint arXiv:2210.15051, 2022.
[105] Xianjia Yu, Jorge Pena Queralta, and Tomi Westerlund. Towards lifelong federated learning in autonomous mobile
robots with continuous sim-to-real transfer. Procedia Computer Science, 210:86–93, 2022.
[106] Luke Guerdan and Hatice Gunes. Federated continual learning for socially aware robotics. In 2023 32nd IEEE
International Conference on Robot and Human Interactive Communication (RO-MAN), pages 1522–1529. IEEE, 2023.
[107] Sebastian Caldas, Sai Meher Karthik Duddu, Peter Wu, Tian Li, Jakub Konečný, H. Brendan McMahan, Virginia Smith,
and Ameet Talwalkar. LEAF: A Benchmark for Federated Settings, December 2019.
[108] Tiantian Feng, Digbalay Bose, Tuo Zhang, Rajat Hebbar, Anil Ramakrishna, Rahul Gupta, Mi Zhang, Salman Aves-
timehr, and Shrikanth Narayanan. FedMultimodal: A Benchmark for Multimodal Federated Learning. In Proceedings
of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, KDD ’23, pages 4035–4045, New York,
NY, USA, August 2023. Association for Computing Machinery.
[109] Chaoyang He, Keshav Balasubramanian, Emir Ceyani, Carl Yang, Han Xie, Lichao Sun, Lifang He, Liangwei Yang,
Philip S. Yu, Yu Rong, Peilin Zhao, Junzhou Huang, Murali Annavaram, and Salman Avestimehr. FedGraphNN: A
Federated Learning System and Benchmark for Graph Neural Networks, September 2021.
[110] Vincenzo Lomonaco and Davide Maltoni. CORe50: A New Dataset and Benchmark for Continuous Object Recognition.
In Proceedings of the 1st Annual Conference on Robot Learning, pages 17–26. PMLR, October 2017.
[111] Yuval Netzer, Tao Wang, Adam Coates, Alessandro Bissacco, Bo Wu, and Andrew Y. Ng. Reading digits in natural
images with unsupervised feature learning. NIPS Workshop on Deep Learning and Unsupervised Feature Learning, 2011.
[112] Ryne Roady, Tyler L. Hayes, Hitesh Vaidya, and Christopher Kanan. Stream-51: Streaming Classification and Novelty
Detection From Videos. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition
Workshops, pages 228–229, 2020.
[113] Catherine Wah, Steve Branson, Peter Welinder, Pietro Perona, and Serge Belongie. The Caltech-UCSD Birds-200-2011
Dataset.
[114] Zhiqiu Lin, Jia Shi, Deepak Pathak, and Deva Ramanan. The CLEAR Benchmark: Continual LEArning on Real-World
Imagery.
[115] Antreas Antoniou, Massimiliano Patacchiola, Mateusz Ochal, and Amos Storkey. Defining Benchmarks for Continual
Few-Shot Learning, April 2020.
[116] Li Deng. The MNIST database of handwritten digit images for machine learning research. IEEE Signal Processing
Magazine, 29(6):141–142, 2012.
[117] Alex Krizhevsky and Geoffrey Hinton. Learning multiple layers of features from tiny images. 2009.
[118] Jia Deng, Wei Dong, Richard Socher, Li-Jia Li, Kai Li, and Li Fei-Fei. Imagenet: A large-scale hierarchical image
database. In 2009 IEEE Conference on Computer Vision and Pattern Recognition, pages 248–255. IEEE, 2009.
[119] Jihwan Bang, Heesu Kim, YoungJoon Yoo, Jung-Woo Ha, and Jonghyun Choi. Rainbow memory: Continual learning
with a memory of diverse samples. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern
Recognition, pages 8218–8227, 2021.
[120] Siddeshwar Raghavan, Jiangpeng He, and Fengqing Zhu. Online Class-Incremental Learning for Real-World Food
Image Classification. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, pages
8195–8204, 2024.
[121] Rahaf Aljundi, Min Lin, Baptiste Goujaud, and Yoshua Bengio. Gradient based sample selection for online continual
learning. In Advances in Neural Information Processing Systems, volume 32. Curran Associates, Inc., 2019.
[122] Hyunseo Koh, Dahyun Kim, Jung-Woo Ha, and Jonghyun Choi. Online Continual Learning on Class Incremental
Blurry Task Configuration with Anytime Inference. In International Conference on Learning Representations, October
2021.
[123] Ameya Prabhu, Philip H. S. Torr, and Puneet K. Dokania. GDumb: A Simple Approach that Questions Our Progress
in Continual Learning. In Andrea Vedaldi, Horst Bischof, Thomas Brox, and Jan-Michael Frahm, editors, Computer
Vision – ECCV 2020, volume 12347, pages 524–540. Springer International Publishing, Cham, 2020.

[124] Andrea Cossu, Gabriele Graffieti, Lorenzo Pellegrini, Davide Maltoni, Davide Bacciu, Antonio Carta, and Vincenzo
Lomonaco. Is Class-Incremental Enough for Continual Learning? Frontiers in Artificial Intelligence, 5, 2022.
[125] Hamed Hemati, Andrea Cossu, Antonio Carta, Julio Hurtado, Lorenzo Pellegrini, Davide Bacciu, Vincenzo Lomonaco,
and Damian Borth. Class-incremental learning with repetition. In Conference on Lifelong Learning Agents, pages
437–455. PMLR, 2023.
[126] Yang Liu, Tao Fan, Tianjian Chen, Qian Xu, and Qiang Yang. FATE: An Industrial Grade Platform for Collaborative
Learning With Data Protection.
[127] Alexander Ziller, Andrew Trask, Antonio Lopardo, Benjamin Szymkow, Bobby Wagner, Emma Bluemke, Jean-Mickael
Nounahon, Jonathan Passerat-Palmbach, Kritika Prakash, Nick Rose, Théo Ryffel, Zarreen Naowal Reza, and Georgios
Kaissis. PySyft: A Library for Easy Federated Learning. In Muhammad Habib ur Rehman and Mohamed Medhat
Gaber, editors, Federated Learning Systems: Towards Next-Generation AI, Studies in Computational Intelligence, pages
111–139. Springer International Publishing, Cham, 2021.
[128] TensorFlow Federated. https://www.tensorflow.org/federated.
[129] Daniel J. Beutel, Taner Topal, Akhil Mathur, Xinchi Qiu, Javier Fernandez-Marques, Yan Gao, Lorenzo Sani, Kwing Hei
Li, Titouan Parcollet, Pedro Porto Buarque de Gusmão, and Nicholas D. Lane. Flower: A Friendly Federated Learning
Research Framework, March 2022.
[130] Chaoyang He, Songze Li, Jinhyun So, Xiao Zeng, Mi Zhang, Hongyi Wang, Xiaoyang Wang, Praneeth Vepakomma,
Abhishek Singh, Hang Qiu, Xinghua Zhu, Jianzong Wang, Li Shen, Peilin Zhao, Yan Kang, Yang Liu, Ramesh Raskar,
Qiang Yang, Murali Annavaram, and Salman Avestimehr. FedML: A Research Library and Benchmark for Federated
Machine Learning, November 2020.
[131] Yuexiang Xie, Zhen Wang, Dawei Gao, Daoyuan Chen, Liuyi Yao, Weirui Kuang, Yaliang Li, Bolin Ding, and Jingren
Zhou. FederatedScope: A Flexible Federated Learning Platform for Heterogeneity, November 2022.
[132] Vincenzo Lomonaco, Lorenzo Pellegrini, Andrea Cossu, Antonio Carta, Gabriele Graffieti, Tyler L. Hayes, Matthias
De Lange, Marc Masana, Jary Pomponi, Gido M. van de Ven, Martin Mundt, Qi She, Keiland Cooper, Jeremy Forest,
Eden Belouadah, Simone Calderara, German I. Parisi, Fabio Cuzzolin, Andreas S. Tolias, Simone Scardapane, Luca
Antiga, Subutai Ahmad, Adrian Popescu, Christopher Kanan, Joost van de Weijer, Tinne Tuytelaars, Davide Bacciu,
and Davide Maltoni. Avalanche: An End-to-End Library for Continual Learning. In 2021 IEEE/CVF Conference on
Computer Vision and Pattern Recognition Workshops (CVPRW), pages 3595–3605, June 2021.
[133] P. Ramya, S. Venkatesh Babu, and G. Venkatesan. Advancing cybersecurity with explainable artificial intelligence: A
review of the latest research. In 2023 5th International Conference on Inventive Research in Computing Applications
(ICIRCA), pages 1351–1357, 2023.
[134] Michael Ungersböck, Thomas Hiessl, Daniel Schall, and Florian Michahelles. Explainable federated learning: A
lifecycle dashboard for industrial settings. IEEE Pervasive Computing, 22(1):19–28, 2023.
[135] Truong Thu Huong, Ta Phuong Bac, Kieu Ngan Ha, Nguyen Viet Hoang, Nguyen Xuan Hoang, Nguyen Tai Hung,
and Kim Phuc Tran. Federated learning-based explainable anomaly detection for industrial control systems. IEEE
Access, 10:53854–53872, 2022.
[136] Peng Chen, Xin Du, Zhihui Lu, Jie Wu, and Patrick CK Hung. Evfl: An explainable vertical federated learning for
data-oriented artificial intelligence systems. Journal of Systems Architecture, 126:102474, 2022.
[137] José Luis Corcuera Bárcena, Pietro Ducange, Francesco Marcelloni, Giovanni Nardini, Alessandro Noferi, Alessandro
Renda, Fabrizio Ruffini, Alessio Schiavo, Giovanni Stea, and Antonio Virdis. Enabling federated learning of explainable
ai models within beyond-5g/6g networks. Computer Communications, 210:356–375, 2023.
[138] Andreas Holzinger, Anna Saranti, Anne-Christin Hauschild, Jacqueline Beinecke, Dominik Heider, Richard Roettger,
Heimo Mueller, Jan Baumbach, and Bastian Pfeifer. Human-in-the-loop integration with domain-knowledge graphs
for explainable federated deep learning. In International Cross-Domain Conference for Machine Learning and Knowledge
Extraction, pages 45–64. Springer, 2023.
[139] Witold Pedrycz. Design, interpretability, and explainability of models in the framework of granular computing and
federated learning. In 2021 IEEE Conference on Norbert Wiener in the 21st Century (21CW), pages 1–6. IEEE, 2021.
[140] Dawid Rymarczyk, Joost van de Weijer, Bartosz Zieliński, and Bartlomiej Twardowski. Icicle: Interpretable class
incremental continual learning. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages
1887–1898, 2023.
[141] Zhaoxiong Yang, Shuihai Hu, and Kai Chen. FPGA-based hardware accelerator of homomorphic encryption for
efficient federated learning. arXiv preprint arXiv:2007.10560, 2020.
[142] Junxue Zhang, Xiaodian Cheng, Wei Wang, Liu Yang, Jinbin Hu, and Kai Chen. FLASH: Towards a high-performance
hardware acceleration architecture for cross-silo federated learning. In 20th USENIX Symposium on Networked Systems
Design and Implementation (NSDI 23), pages 1057–1079, 2023.
[143] Zixiao Wang, Biyao Che, Liang Guo, Yang Du, Ying Chen, Jizhuang Zhao, and Wei He. Pipefl: Hardware/software
co-design of an FPGA accelerator for federated learning. IEEE Access, 10:98649–98661, 2022.

[144] Huimin Li, Phillip Rieger, Shaza Zeitouni, Stjepan Picek, and Ahmad-Reza Sadeghi. Flairs: FPGA-accelerated inference-
resistant & secure federated learning. In 2023 33rd International Conference on Field-Programmable Logic and Applica-
tions (FPL), pages 271–276. IEEE, 2023.
[145] Biyao Che, Zixiao Wang, Ying Chen, Liang Guo, Yuan Liu, Yuan Tian, and Jizhuang Zhao. Unifl: Accelerating federated
learning using heterogeneous hardware under a unified framework. IEEE Access, 2023.
[146] Stefano Bianchi, Irene Muñoz-Martin, and Daniele Ielmini. Bio-inspired techniques in a fully digital approach for
lifelong learning. Frontiers in Neuroscience, 14:379, 2020.
[147] Duvindu Piyasena, Miyuru Thathsara, Sathursan Kanagarajah, Siew Kei Lam, and Meiqing Wu. Dynamically
growing neural network architecture for lifelong deep learning on the edge. In 2020 30th International Conference on
Field-Programmable Logic and Applications (FPL), pages 262–268. IEEE, 2020.
[148] Duvindu Piyasena, Siew-Kei Lam, and Meiqing Wu. Accelerating continual learning on edge FPGA. In 2021 31st
International Conference on Field-Programmable Logic and Applications (FPL), pages 294–300. IEEE, 2021.
[149] Geethan Karunaratne, Michael Hersche, J. Langenegger, Giovanni Cherubini, Manuel Le Gallo, Urs Egger, Kevin
Brew, Sam Choi, Injo Ok, Claire Silvestre, et al. In-memory realization of in-situ few-shot continual learning with a
dynamically evolving explicit memory. In ESSCIRC 2022-IEEE 48th European Solid State Circuits Conference (ESSCIRC),
pages 105–108. IEEE, 2022.
[150] Andrés Otero, Guillermo Sanllorente, Eduardo de la Torre, and Jose Nunez-Yanez. Evolutionary FPGA-based spiking
neural networks for continual learning. In International Symposium on Applied Reconfigurable Computing, pages
260–274. Springer, 2023.
[151] Shivam Aggarwal, Kuluhan Binici, and Tulika Mitra. Chameleon: Dual memory replay for online continual learning
on edge devices. In 2023 Design, Automation & Test in Europe Conference & Exhibition (DATE), pages 1–6. IEEE, 2023.
[152] Dhireesha Kudithipudi, Anurag Daram, Abdullah M Zyarah, Fatima Tuz Zohora, James B Aimone, Angel Yanguas-Gil,
Nicholas Soures, Emre Neftci, Matthew Mattina, Vincenzo Lomonaco, et al. Design principles for lifelong learning ai
accelerators. Nature Electronics, pages 1–16, 2023.
[153] Torsten Hoefler, Dan Alistarh, Tal Ben-Nun, Nikoli Dryden, and Alexandra Peste. Sparsity in deep learning: Pruning
and growth for efficient inference and training in neural networks. The Journal of Machine Learning Research,
22(1):10882–11005, 2021.
[154] Ji Lin, Ligeng Zhu, Wei-Ming Chen, Wei-Chen Wang, Chuang Gan, and Song Han. On-device training under 256kb
memory. Advances in Neural Information Processing Systems, 35:22941–22954, 2022.
[155] Chaoyang He, Erum Mushtaq, Jie Ding, and Salman Avestimehr. Fednas: Federated deep learning via neural
architecture search. 2021.
[156] Hangyu Zhu and Yaochu Jin. Real-time federated evolutionary neural architecture search. IEEE Transactions on Evolutionary Computation, 26(2):364–378, 2021.
[157] Wayne Luk. Heterogeneous reconfigurable accelerators: Trends and perspectives. In 2023 60th ACM/IEEE Design
Automation Conference (DAC), pages 1–2. IEEE, 2023.
[158] Tom Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared D Kaplan, Prafulla Dhariwal, Arvind Neelakantan,
Pranav Shyam, Girish Sastry, Amanda Askell, et al. Language models are few-shot learners. Advances in Neural Information Processing Systems, 33:1877–1901, 2020.
[159] Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. Bert: Pre-training of deep bidirectional
transformers for language understanding. arXiv preprint arXiv:1810.04805, 2018.
[160] Alec Radford, Jong Wook Kim, Chris Hallacy, Aditya Ramesh, Gabriel Goh, Sandhini Agarwal, Girish Sastry, Amanda
Askell, Pamela Mishkin, Jack Clark, et al. Learning transferable visual models from natural language supervision. In
International Conference on Machine Learning, pages 8748–8763. PMLR, 2021.
[161] Liangqiong Qu, Yuyin Zhou, Paul Pu Liang, Yingda Xia, Feifei Wang, Ehsan Adeli, Li Fei-Fei, and Daniel Rubin.
Rethinking architecture design for tackling data heterogeneity in federated learning. In Proceedings of the IEEE/CVF
Conference on Computer Vision and Pattern Recognition, pages 10061–10071, 2022.
[162] Yuanyishu Tian, Yao Wan, Lingjuan Lyu, Dezhong Yao, Hai Jin, and Lichao Sun. Fedbert: When federated learning
meets pre-training. ACM Transactions on Intelligent Systems and Technology (TIST), 13(4):1–26, 2022.
[163] Neil Houlsby, Andrei Giurgiu, Stanislaw Jastrzebski, Bruna Morrone, Quentin De Laroussilhe, Andrea Gesmundo,
Mona Attariyan, and Sylvain Gelly. Parameter-efficient transfer learning for nlp. In International Conference on
Machine Learning, pages 2790–2799. PMLR, 2019.
[164] Brian Lester, Rami Al-Rfou, and Noah Constant. The power of scale for parameter-efficient prompt tuning. arXiv
preprint arXiv:2104.08691, 2021.
[165] Edward J Hu, Yelong Shen, Phillip Wallis, Zeyuan Allen-Zhu, Yuanzhi Li, Shean Wang, Lu Wang, and Weizhu Chen.
Lora: Low-rank adaptation of large language models. arXiv preprint arXiv:2106.09685, 2021.
[166] Sara Babakniya, Ahmed Roushdy Elkordy, Yahya H Ezzeldin, Qingfeng Liu, Kee-Bong Song, Mostafa El-Khamy, and Salman Avestimehr. Slora: Federated parameter efficient fine-tuning of language models. arXiv preprint arXiv:2308.06522, 2023.
[167] Liping Yi, Han Yu, Gang Wang, and Xiaoguang Liu. Fedlora: Model-heterogeneous personalized federated learning
with lora tuning. arXiv preprint arXiv:2310.13283, 2023.
[168] Shangchao Su, Bin Li, and Xiangyang Xue. Fedra: A random allocation strategy for federated tuning to unleash the
power of heterogeneous clients. arXiv preprint arXiv:2311.11227, 2023.
[169] Xuechen Zhang, Mingchen Li, Xiangyu Chang, Jiasi Chen, Amit K Roy-Chowdhury, Ananda Theertha Suresh, and
Samet Oymak. Fedyolo: Augmenting federated learning with pretrained transformers. arXiv preprint arXiv:2307.04905,
2023.
[170] Yuyuan Zhao, Tian Zhao, Peng Xiang, Qingshan Li, and Zhong Chen. Multi-task federated learning medical analysis
algorithm integrated into adapter. In 2023 IEEE 8th International Conference on Big Data Analytics (ICBDA), pages
24–30. IEEE, 2023.
