FL For Computer Vision
Owing to numerous flaws, such delicate visual data could be exploited or exposed [19]. Secondly, the privacy-preserving analysis of images and videos from video surveillance applications, where CV methods are used to detect violations (e.g. mask-wearing, distance keeping, etc.) in public spaces, can be quite challenging too [20], [19]. Such videos frequently include inadvertently captured private objects, including faces, car plates, computer screens, and more. The ability of methods to detect and even identify humans and other objects in unconstrained environments can put at risk the anonymity of people in monitored places and, if not used properly, can become a threat to citizen privacy [21].
The rise of edge computing architectures has introduced a new potential for privacy-preserving CV, since with the proper use of the limited edge resources it has become possible to perform basic CV tasks without transferring sensitive data to the cloud [22]. Sharing the trained model rather than releasing the actual data also helps to retain the privacy of the data, without losing prediction performance [23]. Due to their built-in privacy-preserving features, federated learning (FL) [24] and split learning (SL) [25] are two ML techniques that operate on visual data in a distributed manner and have attracted the interest of researchers in the CV field. Such approaches assume a client-server (edge-cloud) architecture, where the clients usually have fewer resources than the server. The clients train their models individually, using their own data, and then exchange their models either with the server or among themselves in order to synchronize what they have learned [26]. The ML models are trained locally on the client (edge) devices, and this happens in parallel, as long as the clients receive and process training data. Periodically, the clients exchange models and aggregate them following simple or more sophisticated strategies, which may involve additional training on the cloud to avoid the bias introduced at the clients [27]. In the case of SL, the training of the model layers is also split between the edge and the cloud. Only a few layers are trained on the edge, in order to facilitate training with little resources, and the remaining layers are trained on the cloud [28].
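To make the edge-cloud split concrete, the following PyTorch sketch illustrates the idea under our own simplifying assumptions (the toy architecture, the cut point after the first pooling layer, and the random batch are illustrative choices, not a prescribed SL design): only the first layers run on the device, and only their intermediate activations, never raw images, are handed to the server.

import torch
import torch.nn as nn

# Edge-side sub-network: the few layers that are kept and trained on the device.
edge_part = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, padding=1), nn.ReLU(),
    nn.MaxPool2d(2),
)

# Cloud-side sub-network: the remaining layers, hosted by the server.
cloud_part = nn.Sequential(
    nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(),
    nn.Linear(32, 10),
)

images = torch.randn(8, 3, 64, 64)   # a toy batch of private client images (assumed shape)
activations = edge_part(images)      # only these intermediate activations leave the device
logits = cloud_part(activations)     # the server completes the forward pass
print(logits.shape)                  # torch.Size([8, 10])

During training, the server back-propagates to the cut layer and returns the gradient of the transmitted activations, so the edge can finish the backward pass locally.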
A risk that emerges from this distributed training process for CV models is the exposure to malicious users that intentionally introduce noise or falsified models in order to bias the FL model for their benefit [29]. Adversarial training techniques can be employed to strengthen the FL models, but even adversarial training has leakages [30]. The use of blockchain technologies can help mitigate many of the threats of data- or model-sharing methods. All the model-sharing activities of a node that is allowed to share are stored as transactions in the blockchain, and all information about the providers' profiles is also stored in the blockchain [31]. In this multi-party setup, when a node needs to update its model by using the models of other nodes, it first performs a request for models to its neighbouring nodes, then validates the nodes by checking the blockchain and retrieves their models from the blockchain. The resulting FL model is consequently stored on the blockchain in a new transaction [32].

A. Survey Methodology
FL is emerging as a powerful machine learning paradigm that allows for the training of AI models across numerous decentralized devices while maintaining data privacy. In essence, FL enables devices to collaboratively learn a shared model without having to share raw data, a feature that holds immense potential for applications in diverse domains, particularly in CV. In CV, the applications of FL range from object detection and image classification to semantic segmentation, among others, with an array of potential real-world use cases in sectors such as healthcare, autonomous driving, and surveillance systems. However, the adoption of FL in CV is not without its challenges. Issues related to model performance, communication efficiency, data heterogeneity, and privacy preservation need to be addressed and carefully managed. This systematic literature review aims to delve deep into these challenges, exploring the progress, limitations, and future directions of applying FL in the realm of CV. The review will address several research questions that encapsulate the core facets of this intriguing intersection of FL and CV, which can be summarized in Table I.
During our systematic review, we initially collected 912 papers. To ensure the relevance of the literature, we applied basic criteria such as title, abstract, and topic alignment with our research question. We then established detailed inclusion and exclusion criteria to streamline the selection process. The inclusion criteria encompassed papers proposing FL solutions, discussing the applications of FL in CV, implementing techniques, or proposing enhanced versions of FL. Conversely, the exclusion criteria were applied to exclude publications that did not specifically use FL (i.e., FL was only mentioned in the literature review or used for comparison purposes) or that targeted other research sectors. One author took the lead in the selection strategy and conducted the initial screening, ensuring consistency with our research theme. After removing duplicates, we identified 385 unique articles. We then conducted a thorough assessment of the remaining articles by carefully reviewing titles, abstracts, and conclusions. Based on this assessment, we narrowed down the selection to 255 articles that exhibited relevance based on title and abstract. In the subsequent stage, we applied the specified inclusion and exclusion criteria to the remaining articles, leading to the exclusion of certain studies that did not meet our criteria.
During the selection process, the following exclusion criteria were adopted: (i) duplicate records, (ii) papers that did not comment on the performance of FL in CV, (iii) papers related to the implementation of similar but not FL techniques, (iv) papers related to research sectors other than CV, and (v) papers written in languages other than English. Furthermore, the following inclusion criteria were used to select relevant literature: the paper (i) proposes an improvement to an FL-based solution in CV, (ii) addresses privacy and security issues in CV using FL techniques, (iii) measures and optimizes model performance in FL, or (iv) discusses the deployment of FL systems in real-world CV applications.
By applying these exclusion and inclusion criteria, we ensured that the selected articles provided insights, solutions, or advancements specifically related to FL in CV. This process resulted in a final selection of 28 articles that met our inclusion criteria. In order to ensure a comprehensive review, we conducted a reference scan of the selected articles, which led us to identify an additional 6 relevant papers. Consequently, a total of 34 articles were included in our systematic review.
TABLE I. Research questions covered in this review.
[Figure: typical FL workflow, with numbered steps: initialization of the global model, local training at the clients, encryption and transmission of gradients/updates to the aggregator, secure aggregation into the global model, and redistribution of the updated model; a second variant adds a data preprocessing step and the exchange of results among participants.]
• Communication Overhead: In CL, participants typically need to establish direct communication channels to exchange data and model updates. This can require significant communication overhead, especially when the number of participants is large. FL reduces this communication overhead by relying on a central server or coordinator that facilitates the aggregation process.
• Privacy and Security Focus: While both approaches prioritize privacy and security, FL places a stronger emphasis on data privacy. FL minimizes the exposure of raw data by exchanging only model updates or gradients between participants and the central server. CL may involve more direct sharing of data or model parameters, which can introduce potential privacy risks.
• Scalability: FL is particularly suitable for large-scale distributed environments with a large number of participants or devices. Its decentralized nature allows for scalability and efficient training across a vast network of devices. CL may face challenges in terms of scalability, as direct communication and coordination among all participants become increasingly complex with a larger number of entities.
These differences highlight the varying degrees of decentralization, data ownership, communication, and scalability between CL and FL. While they share common principles, the specific implementation and focus of each approach can differ based on the context and requirements of the collaborative training scenario.

B. Problem formulation
Konečnỳ et al.'s work has made FL well-known, but there are other definitions of the concept available in the literature [44]. FL can be realized through various topologies and compute plans despite a shared objective of combining knowledge from non-co-located data. This section aims to provide a detailed explanation of what FL is, while also highlighting the
significant challenges and technical considerations that arise when using FL in CV.
The goal of FL is generally to build a global statistical model using data from many remote devices, which can range from tens to millions in number. This process involves minimizing an objective function that captures the desired characteristics of the model:

\min_{w} F(w), \quad \text{where } F(w) := \sum_{k=1}^{m} p_k F_k(w) \qquad (2)

where m is the overall number of devices/centers, F_k is the local objective function for the k-th device/center, and p_k represents the relative impact of every device/center, with p_k >= 0 and \sum_{k=1}^{m} p_k = 1.
The empirical risk for a local dataset can be expressed using the local objective function F_k. The relative weight or influence of each device, represented by p_k, can be set by the user. There are two common choices for setting p_k: p_k = 1/m or p_k = n_k/n, where n_k is the number of samples held by device k and n is the total number of samples from all devices. Although this is a common FL objective, there are other alternatives, such as multi-task learning [45], where related local models are learned simultaneously and each client corresponds to a task. Both multi-task learning and meta-learning allow for personalized or device-specific modeling, which can be a useful approach to handle the statistical heterogeneity of the data in FL. While FL and classical distributed learning both aim to minimize the empirical risk across distributed entities, there are several challenges that must be addressed when implementing this objective in a federated setting.

C. Aggregation approaches in FL
Following local training, the models are combined using an aggregation algorithm. To be more specific, each model takes an update step in its respective center k, utilizing a learning rate \eta and the gradients g_k, so that [46]:

w_{t+1}^{k} = w_t - \eta \cdot g_k, \quad \forall k \qquad (3)

Subsequently, the weights are gathered into the global model in a manner that is relative to the sample size of each center [46]:

W_{t+1} = \sum_{k=1}^{m} \frac{n_k}{n} \, w_{t+1}^{k} \qquad (4)

There are numerous FL aggregation techniques in the literature. In the following, we will discuss some of the most useful techniques specifically applicable to CV-based FL.
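As a concrete reading of Eqs. (3)-(4), the short Python/NumPy sketch below performs one local update step per center and then aggregates the resulting weights proportionally to each center's sample count. It is only an illustration of sample-size-weighted FedAvg; the vector size, learning rate, and client sample counts are arbitrary assumptions.

import numpy as np

def local_step(w_global, grad_k, lr=0.1):
    # Eq. (3): one local update step at center k.
    return w_global - lr * grad_k

def fedavg(local_weights, sample_counts):
    # Eq. (4): sample-size-weighted average of the local models.
    n = float(sum(sample_counts))
    return sum((n_k / n) * w_k for w_k, n_k in zip(local_weights, sample_counts))

w_t = np.zeros(4)                                # toy global model with 4 parameters
grads = [np.random.randn(4) for _ in range(3)]   # gradients reported by 3 centers
counts = [120, 45, 300]                          # n_k: number of samples held by each center
local_models = [local_step(w_t, g) for g in grads]
w_next = fedavg(local_models, counts)
print(w_next)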
[Figure: overview taxonomy of FL in CV, covering open challenges (communication overhead, heterogeneity of client nodes, non-IID data, device compatibility, network issues, human bias, privacy and robustness to attacks, performance, replication of research results); future directions (deployment over heterogeneous environments, efficient communication, dispersed FL, new organizational models); aggregation techniques (averaging, weighted variants, federated averaging, trimmed mean, Krum-based, majority voting, quantization and compression); collaborative learning (distributed data, entities sharing model updates directly with each other, access to the entire dataset, more centralized approach); data partitioning (horizontal, vertical, federated transfer learning); learning paradigms (supervised, unsupervised, semi-supervised, self-supervised, few-shot, FSSL); model families (federated NN, SVM, DT, linear models, K-means, PCA); privacy technologies (differential privacy, blockchain, secure aggregation, cryptographic techniques, homomorphic encryption, MPC); and application domains (object detection, video surveillance, face detection, medical AI, autonomous driving).]
These techniques include:
1) Averaging aggregation: There are many federated averaging (FedAvg) aggregation strategies in the literature. The FedAvg algorithm is thoroughly detailed in [33]. Fig. 5 provides a summary of the main differences, characteristics, and principles among the alternatives.
[Fig. 5: FedAvg and its variants, grouped into statistical-, communication-, and security-oriented approaches. Principles include: clients executing several local batch updates and sending updated weights instead of gradients; allowing clients to perform multiple local updates before sharing them; adding a proximal term to each local sub-problem to restrict the impact of each local update on the global model; Bayesian non-parametric matching that considers the permutation invariance of neurons to adapt the global model's size; a multi-group approach in which model updates are shared cyclically among groups with additive secret sharing; and hierarchical client-edge-cloud aggregation, where edge servers aggregate their clients' updates before forwarding them to the cloud. Noted characteristics: plain FedAvg has been demonstrated to diverge when data is not identically distributed among clients and prevents devices from performing varying amounts of local work based on their system constraints; the Bayesian non-parametric mechanism can adapt the central model's size to diverse data distributions but is susceptible to model poisoning attacks; the multi-group (Turbo-Aggregate-style) secure aggregation suits rapidly changing wireless topologies and handles user dropouts but lacks adaptability for new users, which a self-configurable protocol adjusting encoding configuration and clustering could remedy.]
2) Progressive Fourier aggregation: The paper [47] addressed the retrogress and class-imbalance problems using a personalized FL approach. Precisely, for better integrating the parameters of client models in the frequency domain, a progressive Fourier aggregation (PFA) is used at the server. Next, a deputy-enhanced transfer (DET) is designed at the clients' side to easily share overall knowledge with the personalized local model. In particular, the approach involves the development of PFA at the server, which ensures a stable and efficient gathering of global knowledge. This is achieved by gradually integrating client models from low frequency to high frequency. Additionally, the authors introduce a deputy model at the client's end to receive the aggregated server model. This facilitates the implementation of the DET strategy, which follows three types of decisions (d_k): Recover-Exchange-Sublimate. These steps aim to enhance the personalized local model by smoothly transferring global knowledge. In this process, the advantage of utilizing the Fast Fourier Transform (FFT) is to obtain both the amplitude map and the phase map of the parameters (conversion from parameters to map). The inverse FFT (IFFT) is used for the inverse operation (conversion from map to parameters). Fig. 6 illustrates the principle of the PFA algorithm, which can be employed as an aggregation technique in FL.
3) FedGKT aggregation: FL group knowledge transfer (FedGKT) represents a streamlined FL approach tailored to edge devices with limited resources. Its primary objective is to combine the advantages of FedAvg and split learning (SL) by employing local stochastic gradient descent (SGD) training, similar to FedAvg, while simultaneously ensuring low computational burden at the edge, akin to SL. Notably, FedGKT facilitates the seamless transmission of knowledge from numerous compact edge-trained convolutional neural networks (CNNs) to a significantly larger CNN trained at a cloud server. Fig. 7 illustrates the overview of FedGKT: (i) at the edge device, a compact CNN is employed, comprising a lightweight feature extractor and classifier, efficiently trained using its private data (local training). (ii) Following local training, consensus is reached among all edge nodes to generate uniform tensor dimensions as output from the feature extractor. Subsequently, the larger server model is trained, wherein the extracted features from the edge-side model
serve as inputs. The training process employs a knowledge distillation (KD)-based loss function [10] to minimize the disparity between ground truth and soft-label predictions, representing probabilistic estimations obtained from the edge-side model (periodic transfer). (iii) To enhance the edge model's performance, the server conveys its predicted soft labels to the edge, enabling the edge to conduct further training on its local dataset using a KD-based loss function with the server's soft labels (transfer back). As a result, both the server and edge models mutually benefit from knowledge exchange, leading to performance improvement. (iv) After completion of the training process, the final model is a fusion of the local feature extractor and the shared server model (edge-sided model) [48]. The researchers who proposed FedGKT [48] claimed that it enables cost-effective edge training by significantly reducing computational expenses. In comparison to edge training with FedAvg, FedGKT demands only 9 to 17 times less computational power (measured in FLOPs) and requires a substantially smaller number of parameters, ranging from 54 to 105 times fewer.

D. Privacy technologies of FL
FL places a strong emphasis on privacy management, which involves analyzing security models in order to effectively protect personal information. In this section, various technologies that are currently used to safeguard privacy in the context of FL will be discussed.
1) Secure MPC model: Secure MPC (SMPC) models involve data from multiple parties and use a well-defined simulation framework to provide safety certification. These models are designed to ensure that there is no interaction or sharing of knowledge data between the parties, meaning that users are unaware of the input and output data as well as any other information [49]. This zero-knowledge model can be considered a form of secure and complex computing protocol that is based on publicly available knowledge. Research has
also investigated this direction [50]. The MPC protocol enables model training and verification without requiring users to share sensitive data. However, SMPC has certain limitations. It is unable to protect servers from privacy attacks by other clients. Additionally, the four-round interactive nature of the protocol can lead to wasted data and reduced model accuracy, as the server does not have access to the client data until the submission phase is completed [51].
2) Differential privacy: Differential privacy technology, as indicated by past studies, has been employed as a protective mechanism that perturbs or obscures sensitive attributes so that third parties cannot identify individual users, thus rendering the data irrecoverable [52], [53]. However, this method necessitates the transfer of data, which can negatively impact data accuracy in some scenarios. In addition, some researchers have suggested the incorporation of differential privacy within FL to safeguard client data by masking client contributions during training [54].
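A minimal sketch of this idea of masking a client's contribution, assuming update-level Gaussian noise (the clipping bound and noise multiplier below are arbitrary values chosen for illustration, not parameters from [54] or [55]):

import numpy as np

rng = np.random.default_rng(42)
update = rng.normal(size=5)   # a client's model update (e.g., a flattened weight delta)

clip = 1.0                    # assumed L2 bound on a single client's contribution
sigma = 0.8                   # assumed noise multiplier; larger values give stronger privacy

clipped = update * min(1.0, clip / (np.linalg.norm(update) + 1e-12))
noisy_update = clipped + rng.normal(scale=sigma * clip, size=clipped.shape)
# Only noisy_update is sent to the aggregator; averaging many such updates preserves
# utility while each individual contribution remains obscured.
print(noisy_update)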
For instance, [55] introduced a differentially private FL framework for analyzing histopathology images, which are considered some of the most intricate medical images; the evaluation was conducted in a setting designed to mimic a distributed environment. Furthermore, [56] presented an FL framework capable of generating a global model from distributed health data stored in diverse locations, without the need to move or share raw data. This framework also ensures privacy protection through differential privacy. An evaluation using real electronic health data from a million patients across two healthcare applications indicated the framework's success in delivering considerable privacy without compromising the global model's utility.
Differential privacy can be used in the training of DNNs through the differentially private stochastic gradient descent (DP-SGD) algorithm. [57] introduces deepee, a free and open-source framework for differentially private DL that is compatible with the PyTorch DL framework for medical image analysis. This study uses parallel processing to calculate and modify the gradients for each sample. This process is made efficient through the use of a data structure that stores shared memory references to the neural network weights in order to save memory. It also provides specialized data loading procedures and privacy budget tracking based on the Gaussian differential privacy framework, as well as the ability to automatically modify the user-provided neural network architecture to ensure that it adheres to DP standards. Besides, [58] incorporates FL into the DL of medical image analysis models to enhance the protection of local models and prevent adversaries from inferring private medical data through attacks such as model reconstruction or model inversion. Cryptographic techniques such as masks and homomorphic encryption are utilized. Instead of relying on the size of the datasets, as is commonly done in DL, the contribution rate of each local model to the global model during each training epoch is determined based on the qualities of the datasets owned by the participating entities. Additionally, a dropout-tolerant approach for FL is presented in which the process is not interrupted if the number of online clients is above a predetermined threshold.
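The "masks" mentioned above are often realized as pairwise additive masks that cancel out when the server sums the clients' contributions. The toy NumPy sketch below is a generic illustration of that principle, not the protocol of [58]; the client names and vector sizes are our own assumptions, and dropout handling is omitted.

import numpy as np

rng = np.random.default_rng(7)
updates = {c: rng.normal(size=4) for c in ["A", "B", "C"]}   # each client's private update

# Every pair of clients agrees on a random mask; one adds it, the other subtracts it.
names = sorted(updates)
masks = {(i, j): rng.normal(size=4) for a, i in enumerate(names) for j in names[a + 1:]}

masked = {}
for c in names:
    m = updates[c].copy()
    for (i, j), r in masks.items():
        if c == i:
            m += r
        elif c == j:
            m -= r
    masked[c] = m                 # this is all the server ever sees from client c

# The pairwise masks cancel in the sum, so the server recovers only the aggregate.
aggregate = sum(masked.values())
assert np.allclose(aggregate, sum(updates.values()))
print(aggregate)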
3) Homomorphic encryption: Homomorphic encryption is a type of encryption used in the ML process that uses parameter exchange to protect the privacy of user data. Unlike differential privacy, it does not transmit the data or models themselves and encrypts the data without allowing it to be discovered. This makes it less likely for the original data to be leaked. The additive homomorphic encryption model is the most commonly used version in practice. Besides, despite the benefits of FL, there is a risk that private or sensitive personal data may be exposed through membership attacks when model parameters or summary statistics are shared with a central site. To address this issue, [59] presents a secure FL-based framework that employs fully homomorphic encryption (FHE). In doing so, the Cheon-Kim-Kim-Song (CKKS) construction has been used, which enables the execution of approximate calculations on real and floating-point numbers, benefiting from ciphertext rescaling and packing. Moving on, [60] introduces a homomorphic encryption and blockchain-based privacy-preserving aggregation framework for medical image analysis. This allows hospitals to collaborate and train encrypted federated models while maintaining data privacy, using blockchain ledger technology to decentralize the FL process without the need for a central server. Specifically, homomorphic encryption protects the privacy of the model's gradients. This framework involves training a local model using a capsule network for the segmentation and classification of COVID-19 images, securing the local model with the homomorphic encryption scheme, and sharing the model over a decentralized platform using a blockchain-based FL algorithm.

E. Learning process
1) Supervised FL: Supervised FL trains ML models on sensitive labeled data across multiple devices and learns to predict an output based on the input data. Each device runs the model and updates its parameters based on its local data, and then the updates are aggregated to improve the global model. The supervision is ensured through the presence of labeled data in the participating entities' local datasets, and the learning process can be implemented using support vector machines (SVM) [61], linear models (LMs), neural networks (NN), or decision trees (DT) [34]. Fig. 8 illustrates an example of a typical supervised FL with three clients.
Fig. 8. The basic setup of a typical FL system with supervised learning involves several steps. Initially, the server picks three clients and transmits the global model w_t^g to them. Subsequently, these clients employ their labeled data to update the global model w_t^g, generating their respective local models. Once this process is completed, the clients send their updated local models back to the server. The server then aggregates these local models into a new global model using the FedAvg algorithm [62].
2) Unsupervised FL: Unsupervised FL is a variant of FL in which the participating devices do not have access to labeled data. Instead, they must rely on unsupervised learning techniques, such as clustering or dimensionality reduction, to learn useful representations of the data. This can be useful in scenarios where collecting and annotating labeled data is difficult or expensive, or where the data is highly sensitive and cannot be shared with a centralized server. The learning process can be implemented using federated principal component analysis (PCA) [63], federated k-means [64], etc.
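As a flavour of how an unsupervised objective can be federated, the toy sketch below runs federated Lloyd iterations: each client assigns its private points to the current global centroids and reports only per-cluster sums and counts, which the server merges. This is our own illustrative construction (the data, number of clusters, and number of rounds are arbitrary), not the federated k-means algorithm of [64].

import numpy as np

def local_kmeans_stats(points, centroids):
    # Each client assigns its private points to the nearest global centroid
    # and reports only per-cluster sums and counts (no raw data).
    assign = np.argmin(((points[:, None, :] - centroids[None, :, :]) ** 2).sum(-1), axis=1)
    k, d = centroids.shape
    sums, counts = np.zeros((k, d)), np.zeros(k)
    for c in range(k):
        mask = assign == c
        sums[c] = points[mask].sum(axis=0)
        counts[c] = mask.sum()
    return sums, counts

rng = np.random.default_rng(0)
clients = [rng.normal(loc=m, size=(50, 2)) for m in (0.0, 3.0, -3.0)]   # toy private datasets
centroids = rng.normal(size=(3, 2))                                     # initial global centroids

for _ in range(5):   # a few federated Lloyd iterations
    stats = [local_kmeans_stats(pts, centroids) for pts in clients]
    total_sums = sum(s for s, _ in stats)
    total_counts = sum(c for _, c in stats)
    centroids = total_sums / np.maximum(total_counts, 1)[:, None]
print(centroids)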
3) Semi-supervised FL: FL allows training ML algorithms with a semi-supervised learning (SSL) process on remote datasets without the need to share the data itself. However, data annotation remains a challenge, especially in fields like medicine and surgery where specialized knowledge is often required. To address this issue, semi-supervised FL has recently been used, where the participating devices have access to a dataset that contains both labeled and unlabeled examples. The labeled examples are used to train a model using supervised learning techniques, while the unlabeled examples are used to learn additional useful features using unsupervised techniques such as clustering or dimensionality reduction. Fig. 9 illustrates an example of a typical semi-supervised FL with three clients.
In this direction, Kassem et al. [65] propose FedCy, a federated SSL (FSSL) system that combines FL with self-supervised learning to improve the performance of surgical phase recognition using a decentralized dataset containing both labeled and unlabeled videos. FedCy uses temporal patterns in the labeled data to guide the unsupervised training of unlabeled data towards task-specific features for phase recognition. This scheme outperforms state-of-the-art (SOTA) FSSL methods on the task of automatically recognizing surgical phases using a multi-institutional dataset of laparoscopic cholecystectomy videos. Additionally, it learns more generalizable features when tested on data from an unseen domain. [66] examines the effectiveness of SOTA video SSL techniques when used in a large-scale FL setting, as simulated using the Kinetics-400 dataset. The limitations of these techniques in this context are identified before introducing a new federated SSL framework for video called FedVSSL. FedVSSL incorporates various aggregation strategies and partial weight updates and has been shown through experiments to outperform centralized SOTA by 6.66% on UCF-101 and 5.13% on HMDB-51 in downstream retrieval tasks.
Moving forward, [67] presents a self-supervised privacy preservation framework for action recognition, namely SPAct. It includes three main components: (i) an anonymization function, (ii) a self-supervised privacy removal module, and (iii) an action recognition module. A minimax optimization strategy is used to train this framework, which minimizes the action recognition cost function and maximizes the privacy cost function through a contrastive self-supervised loss. By using existing protocols for known action and privacy attributes, this framework achieves a good balance between action recognition and privacy protection, similar to the current SOTA supervised methods. Additionally, a new protocol to test the generalization of the learned anonymization function to novel action and privacy attributes is introduced.
In [68], FedUTN is proposed, an FL scheme allowing each client to train a model that works well on both independently and identically distributed (IID) and non-independent and identically distributed (non-IID) data. In this framework, each party has two networks, a target network and an online network. FedUTN uses the online network parameters from each terminal to update the target network, which is different from the method used in previous studies. FedUTN also introduces a new control algorithm for training. After testing, it was found that FedUTN's method of aggregation is simpler, more effective, and more robust than other FL algorithms, and outperforms the SOTA algorithm by 0.5%-1.6% under normal conditions. In [69], a few-shot FL framework utilizing a small amount of labeled private facial expression data to train local models on individual devices is proposed. The local models are then aggregated in a central server to create a globally optimal model. FL is also used to update the feature extractor network on unlabeled private facial data from user devices to learn robust face representations.

F. Evaluation of FL-based CV schemes
To assess the efficiency of FL systems in CV, it is essential to employ metrics that are responsive in gauging the performance of implementing FL across various CV applications. Besides the famous metrics that are commonly employed in ML and DL, such as accuracy, F1-score, and area under the curve (AUC) [70], other metrics are summarized in Table II.

III. APPLICATIONS OF FL IN CV
FL has diverse applications. It enables privacy-preserving object detection, face detection, and video surveillance in smart environments. It also facilitates advancements in healthcare and medical AI, and plays a crucial role in autonomous driving within CV. Below, the survey provides detailed information about each of these domains.

A. Object detection
A special CV task is the segmentation of images and more specifically the detection of salient objects in one or multiple images. Salient object detection (SOD) is an important pre-processing step in many CV tasks, such as object detection and tracking, semantic segmentation, and the interaction between robots and humans (e.g. for tracing human hands during imitation learning scenarios).
1) CL for object detection: CL techniques that combine the merits of multiple feature learning techniques from images seem to be the most promising approach. The main issue to handle is the separation of the object from its background, any foreground occlusions, or noise. Other important issues relate to the size and variety of objects to be detected in the same image and the need to track objects across consecutive or groups of images. For example, the work in [71] emphasized the use of RGB-D methods that combine RGB images with depth images in order to improve SOD across multiple images. They proposed a CL framework (CoNet), which combines the edge information extracted from low-level features of the images with the merits of a spatial attention map that detects salient features in the images, and depth images that better locate salient objects in a scene. The three different collaborators are combined in a knowledge collector module that first concatenates salient and edge features to jointly learn the boundaries and locations of salient objects and then uses the depth information to separate salient regions from their background. Another work [72] presents an NN architecture for content-aware segmentation of sets of images that employs co-saliency maps generated from the input images using a group CL framework (GCoNet). The proposed method outperforms standard SOD alternatives and is capable of detecting co-salient objects in real-time. The main criteria employed are the compactness of the extracted objects within the group of images and the separability of objects from other noise objects in the scenes.
Fig. 9. The structure of the semi-supervised FL system involves a server and three clients, which adhere to the standard FL process to update a global autoencoder w_t^a. Next, the server employs the encoder from w_{t+1}^a to encode the labeled dataset D on the server, resulting in a labeled representation dataset D'. Afterward, the server utilizes supervised learning with D' to train a classifier w_t^s, producing a new classifier w_{t+1}^s [62].
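The server-side portion of this pipeline can be pictured with the schematic PyTorch sketch below. It is an illustrative reading of the recipe in the caption (aggregate the client autoencoders, encode the server's labeled set, then fit a classifier on the representations); the toy encoder, dataset sizes, and training loop are our own assumptions and not the implementation of [62].

import torch
import torch.nn as nn

encoder = nn.Sequential(nn.Linear(32, 8), nn.ReLU())   # encoder part of the global autoencoder

def fedavg_state(states):
    # Plain FedAvg of the client encoder weights (equal client weighting assumed).
    return {k: torch.stack([s[k] for s in states]).mean(0) for k in states[0]}

# Pretend three clients returned locally trained encoder weights.
client_states = [{k: v + 0.01 * torch.randn_like(v) for k, v in encoder.state_dict().items()}
                 for _ in range(3)]
encoder.load_state_dict(fedavg_state(client_states))

# Server-side supervised step: encode the labeled set D and train a classifier on D'.
x_labeled = torch.randn(64, 32)
y_labeled = torch.randint(0, 5, (64,))
with torch.no_grad():
    reps = encoder(x_labeled)                           # the labeled representation dataset D'
classifier = nn.Linear(8, 5)
opt = torch.optim.SGD(classifier.parameters(), lr=0.1)
for _ in range(20):
    opt.zero_grad()
    loss = nn.functional.cross_entropy(classifier(reps), y_labeled)
    loss.backward()
    opt.step()
print(float(loss))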
TABLE II. The evaluation metrics used in the suggested FL-based CV schemes.
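For reference, the standard metrics mentioned above (accuracy, F1-score, AUC) can be computed with scikit-learn as in the short snippet below; the labels and scores are made-up values for a toy binary task, not results from any scheme in Table II.

from sklearn.metrics import accuracy_score, f1_score, roc_auc_score

y_true = [0, 1, 1, 0, 1, 0, 1, 1]                       # ground-truth labels
y_pred = [0, 1, 0, 0, 1, 0, 1, 1]                       # hard predictions of the federated model
y_score = [0.2, 0.9, 0.4, 0.1, 0.8, 0.3, 0.7, 0.95]     # predicted positive-class probabilities

print("accuracy:", accuracy_score(y_true, y_pred))
print("F1-score:", f1_score(y_true, y_pred))
print("AUC:", roc_auc_score(y_true, y_score))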
Moving on, the research work on hand detection and tracking over long videos [73], [74] focused on the need for fast and long-term object detection even in the presence of temporal occlusions and changes among frames. In order to tackle these problems, the researchers suggested the use of a hand detection model (based on Faster R-CNN) combined with the projection across an image (frame) sequence in order to detect clusters of bounding boxes that remain almost stable across frames, even if they are occluded in some of the frames. In order to handle the problem of occlusions, aspect changes and articulations that may hinder the proper detection of objects in uncontrolled scenes, [75] propose a method that first detects object parts and consequently tries to associate them and detect the object. The detected object parts are called granules, i.e. small areas with simple properties (e.g. color) that separate them from their context. Using a combinatorial optimization method, based on simulated annealing, they learn how to associate the proper neighboring granules that compose the object. The problem of overlapping objects is also discussed by [76], where the authors propose a new object detection algorithm that improves the localization of the detected objects and suppresses redundant detection boxes, using a better loss function.
Moreover, CL for medical image segmentation and classification has been applied by [77]. The authors distinguish between the image segmentation and annotation-of-segments task, and that of detecting the severity of the disease by considering the image as a whole. They propose the use of a CL method for disease severity classification that is based on attention over the detected and annotated segments. A weakly supervised framework for CL is proposed by [78] for allowing object detection using labels at the image level. The proposed framework combines a weakly supervised learner (i.e. a two-stream CNN built on VGG16) with a strongly supervised learner (i.e. Faster-RCNN) and trains the two subnetworks in a collaborative manner. The former subnetwork optimizes the multi-label classification task, whereas the latter optimizes the prediction consistency of the detected objects' bounding boxes. Another weakly supervised object detection approach that employs image-level labels in order to learn to detect the accurate location of objects in the training images has been presented by [79]. A CL framework has first been trained on the image-level labels, aiming to optimize the image-level classifier and to assign higher weights to instances with more confidence (i.e. without much noise, with fewer objects and simpler backgrounds). Then, a curriculum learning component is employed for guiding the learning procedure of object localization and classification. In [80], the authors also propose an object detection method for remote sensing images, which combines a weakly supervised detector for image-level labels and a strongly supervised detector for object localization. In a similar manner, [81] split the salient object detection task into two sub-tasks that are examined in parallel. The first task relates to the estimation of internal semantics and the second to the prediction of the object boundaries. VGG-16 is used as a basis for feature extraction from images and an additional layer is used to define their semantics. Two decoders are combined for the second part. The first detects the broader area of interest in the images and the second performs a fine-grained boundary detection. The CL network joins the two networks in order to efficiently fuse semantics and boundaries and extract the saliency maps of each image. Combining an image enhancement network with the object recognition network in order to be able to recognize objects in images of extremely low resolution is proposed in [82]. The two networks were trained using a CL technique in which the knowledge from the object recognition network is used to enhance the low-resolution images that are given as input to the other network. The enhanced images are then fed to the classifier to improve its performance. Four different losses (reconstruction, perceptual, classification and edge loss) are combined in order to optimize the object recognition performance. Transferring the knowledge distillation method to a distributed and CL setup has been proven beneficial for online object detection tasks [83]. The student models (one in each node) are trained separately using as their teacher model an ensemble that comes from the fusion of the student logits. The teacher's knowledge is then distilled back to the students in the form of a soft target. This logic can easily be ported to the FL paradigm as an alternative to FL-based averaging (FedAvg) or more proficient averaging techniques.
2) FL for object detection: The learning process for object detection is usually handled as a centralized offline task, but in resource-restricted environments and applications that need mobility, privacy, and security, decentralized and distributed approaches, as well as cloud-edge collaborative approaches, have been proposed. In an attempt to optimize object detection performance, the authors in [84] proposed an FL approach that improves the performance of federated averaging model aggregation over independent and identically distributed (IID) data. The main claim of their work is that SOTA DL CNN models have been trained on controlled, centralized and balanced datasets and cannot perform well on non-independent and identically distributed (non-IID) data. Since in cases that need privacy and security, such as medical images for example, the data can be non-IID, the use of simple federated algorithms, such as FedAvg, can be problematic. For this purpose, they propose a weighted variant of FedAvg that improves FL performance on non-IID image data. Using collaborative intelligence on the edge, or balancing between the edge and the cloud, is an efficient FL paradigm that can improve the performance of object detection tasks. The work of [85] has shown that splitting the deep NN models between the cloud and the edge, apart from being privacy-preserving and secure, can also be more efficient for object detection tasks. Efficiency can be achieved by quantizing and compressing the tensors of the first layers before sending them to the final layers that reside in the cloud. Depending on where the split is performed, different compression techniques (lossless or lossy ones) are preferred. The same network split strategy (splitNN) has been used in [86] for protecting the privacy of medical data. More specifically, the client nodes train the network up to a certain layer and the outputs are sent to the server to perform the rest of the training. The inverse process takes place during the back-propagation of the gradients. In order to further protect the labels of the training samples from exposure to the server, the authors also propose a u-shaped forward and backward propagation process in which the first and last layers of the network (along with the training labels) are kept in the clients. Finally, they propose a vertically partitioned data model in which multiple clients train the same first layers of their networks in parallel and send the outputs to the server, which concatenates them before proceeding with the remaining layers. This way the clients share the model (at least the last layers) without sharing their data.

B. Self-supervised-based FL
Self-supervised learning (SSL) and its variants, momentum contrast (MoCo), bootstrap your own latent (BYOL), and simple siamese (SimSiam), are powerful techniques for learning representations from centralized data [87]. FL has been combined with SSL to address privacy concerns with decentralized data. However, there is a lack of in-depth understanding of the fundamental building blocks for federated self-supervised learning (FedSSL). The reference [87] introduces a federated hybrid self-supervised learning (FedHSSL) framework that combines VFL and SSL techniques such as SimSiam, addressing data deficiency. FedHSSL utilizes cross-party views and local views of aligned and unaligned
TABLE III. A summary of FL frameworks proposed for object detection tasks. Learning type (LT), best performance (BP), project link availability (PLA).
processing aerial images collected by unmanned aerial vehicle (UAV) swarms. The UAV swarms used for vision sensing collect haze images, and are supported by ground sensors that collect information about air quality. Each UAV employs a ML model in order to correlate air quality with the haze images and shares its trained gradient with a central server. The server combines the gradients and learns a global model, which is then used to predict the air-quality distribution in the region. Besides, platforms that support FL-powered applications of CV are presented in [96], [97]. Liu et al. in [96] demonstrate FedVision in a fire detection task, using a YOLOv3 model for object detection, and they report its use on three pilots concerning safety hazard detection, photovoltaic panel monitoring and suspicious transaction monitoring in ATMs via cameras. Catalfamo et al. [97] thoroughly examine the use of edge federation for implementing ML-based solutions. The platform is introduced for deploying and managing complex services on the edge and utilizes micro-services, which are small, independent, and loosely-connected services. This platform enables the management of these services across a network of edge devices by abstracting the physical devices they run on. The effectiveness of this solution was demonstrated through a case study involving video analysis in the field of morphology.
1) Action recognition: Using knowledge distillation, [98] allows client nodes with limited computational resources to execute action recognition by performing model compression at the central server on a large-scale data repository. In doing so, fine-tuning is used, since small datasets are not appropriate for action recognition models to learn complex spatio-temporal features. Moving forward, an asynchronous federated optimization is adopted and the convergence bound has been shown, since the participating clients' computing resources were heterogeneous. In a different paradigm, a driver action recognition system is built by [99] using FL. The latter has been used for model training to protect users' privacy while enabling online model upgrades. Similarly, in [48], an FL-based driver activity recognition system is implemented by training the detection model in a decentralized fashion. FedAvg and FedGKT have been implemented and their performance has been demonstrated on the 2022 AI City Challenge. Zhao et al. [62] proposed a semi-supervised FL for activity recognition, which helps edge devices conduct unsupervised learning of general representations using autoencoders and non-annotated local data. In this case the cloud server performed supervised learning by training activity classifiers on the learned representations and annotated data.
2) Crowd counting: When more refined tasks are assigned to CV algorithms in dense crowds, such as the detection of face mask wearing [91], the difficulty increases, since the target objects (i.e. masks) appear at different scales and with occlusions. FL is a promising solution that enhances the privacy of individuals. However, scale variations, occlusions and the diversity in crowd distribution are still the open issues that demand efficient detection techniques, such as deep negative correlation learning [100], relational attention [101], etc.
Crowd counting is a complex CV task, especially when multiple sensors (i.e. cameras) are combined. FL approaches can be helpful since they allow the distributed trainers to exchange their models and improve their performance quickly, especially when a centralized trainer is able to periodically validate and improve the quality of the aggregated model [27]. However, when the distributed nodes are not trusted in advance, or when the quality of their data is ambiguous, more control mechanisms and incentives are needed to avoid the deterioration of the resulting model. When the distributed trainers have to co-operate in order to reach a consensus, whether for crowd counting or any other task, it is important to provide them enough incentives in order to become trustful [102]. Blockchain-based approaches [103] manage to distribute the incentive among the trained models and evaluate their reliability in order to fairly partition any potential profit (or trust). This way, they allow FL algorithms to become more robust, by including trustful trainers and honest reporters that detect misbehaviors and block them from the FL process.
3) Anomaly detection: An interesting data mining task with many CV applications is anomaly detection. It involves the identification of strange patterns in data, which may indicate a fake or incorrect situation. Several cutting-edge ML and DL algorithms have been developed in the literature in order to detect and prevent such incidents. When it comes to CV, various DL architectures, from variational autoencoders (VAE) and generative adversarial networks (GANs) to recurrent neural networks (RNNs), can be trained to detect anomalies in video sequences [104]. The combination of autoencoders and FL for detecting anomalies is introduced in [105]. The bipartite structure of autoencoders, and their ability to reconstruct the input in their output, allows the detection of anomalies based on the errors spotted during the regeneration of the input. The authors train two clients using different parts of the training dataset and merge the resulting models using the FL library PySyft for secure and private DL. They employ the MSE loss metric to compare the input and output and examine whether it exceeds a threshold in order to decide whether it is an anomaly or not. Bharti et al. [106] have proposed an edge-enabled FL approach for the automatic inspection of products using CV. Their main basis is SqueezeNet, a lightweight model pre-trained on the ImageNet dataset, which is able to identify 100 different types of objects. Although the image processing model has been trained using multiple images of normal products, the various defects are detected using an anomaly detection algorithm. SqueezeNet acts as a feature extractor from images, which are then fed to a dense layer with as many output neurons as there are possible anomaly classes, plus one for the normal products. In its federated version, the edge server aggregates the various local models and computes a new global model that is reshared with the local nodes. Table IV summarizes some of the FL frameworks proposed for video surveillance and smart environments.

E. Healthcare and medical AI
Nowadays, DL methods with large-scale datasets can produce clinically useful models for computer-aided diagnosis [107]. However, privacy and ethical concerns are increasingly critical, which makes it difficult to collect large quantities of data from multiple institutions. FL provides a promising decentralized alternative to CL by exchanging client models instead of private data. Sheller et al. [108] performed the first study that investigated the use of FL for multi-institutional collaboration, and enabled the training of DL models without sharing patients' data. In particular, the aggregation process involves calculating a weighted average of institutional updates, where each institution's weight is determined by the proportion of total data instances it holds. This entire process, comprising local training, update aggregation, and distribution of new parameters, is referred to as a federated round. Linardos et al. [46] modeled cardiovascular magnetic resonance using a FL scheme that concentrated on the diagnosis of hypertrophic cardiomyopathy. A 3D-CNN model, pre-trained on action recognition, was deployed. Moreover, shape prior information has been integrated into the 3D-CNN using two techniques and four data augmentation strategies (Fig. 10). This approach has then been evaluated on the automatic cardiac diagnosis challenge (ACDC) dataset. The multi-site fMRI classification problem is addressed by [109] while preserving privacy using a FL model. Accordingly, a decentralized iterative optimization has been deployed before using a randomization mechanism to alter the shared local model weights. In the same way, Dayan et al. [110] developed and trained a FL model on data from 20 institutes worldwide, namely the EMR chest X-ray AI model (EXAM). The latter was built based on the COV-2 clinical decision support (CDS) model. This helps predict the future oxygen requirements of COVID-19 patients using chest X-rays, laboratory data, and inputs of vital signs. In the same way, an FL-based solution that screens COVID-19 from chest X-ray (CXR) images is deployed by [111]. In addition, a communication-efficient CNN-based FL scheme for multi-chest-disease classification from CXR images is proposed by [112]. Yan et al. [113] proposed a real-time contribution measurement approach for participants in FL, which is called Fedcm. The latter has been applied to identify COVID-19 based on medical images. Moving forward, Roth et al. [114] deployed a FL approach for building medical imaging breast density classification solutions. Data from seven clinical institutions around the world has been used to train the FL algorithm.
Because of the gaps in fMRI distributions from distinct sites, mixture of experts domain adaptation (MoE-DA) and adversarial domain alignment (ADA) schemes have been integrated into the FL algorithm. The reference [115] introduced a variation-aware FL approach, in which the variations between clients have been reduced by transforming the images of all clients onto a common image space. A privacy-preserving generative adversarial network, namely PPWGAN-GP, is introduced. Moving on, a modified CycleGAN is deployed for every client to transfer its raw images to the target image space defined by the shared synthesized images. Accordingly, this helps address the cross-client variation problem while preserving privacy. Similarly, for data privacy preservation, [116] used a FL scheme to securely access and meta-analyze biomedical data without sharing individual information. Specifically, brain structural relationships are investigated across clinical cohorts and diseases.
TABLE IV. Summary of FL frameworks proposed for video surveillance and smart environments. Best performance (BP), project link availability (PLA).
[48]: Model: ResNet-8; Datasets: AI City Challenge, StateFarm; Contribution: optimise FL for resource-limited devices with FedGKT; BP: accuracy = 95%; Limitation: FL results are relatively close to the centralized results but not superior; PLA: No.
[62]: Model: LSTM; Datasets: OPP, DG, PAMAP2; Contribution: addressed the problem of the lack of labeled data at the client level; BP: accuracy = 82%; Limitation: requires labeled data on the central server; PLA: No.
[99]: Models: MobileNetV2, ResNet50, VGG16, Xception; Dataset: StateFarm; Contribution: evaluation of FL for driver's action recognition; BP: accuracy = 85%; Limitation: evaluation considered only the accuracy of the model; PLA: No.
[98]: Model: ResNet; Datasets: HMDB51, UCF101; Contribution: custom model initialisation through knowledge distillation with asynchronous aggregation; BP: accuracy = 89.5%; Limitation: limited consideration of non-IID data; PLA: No.
[102]: Model: CNN; Datasets: MNIST, Next-Character-Prediction; Contribution: federated crowd sensing with an incentive mechanism to reward and motivate participants; BP: accuracy = 80%; Limitation: even though the loss reduced significantly, it is still high; PLA: No.
[103]: Model: DDCBF; Dataset: N/A; Contribution: suggests an FL framework to distribute trust and incentive among trainers; BP: accuracy = 97%; Limitation: requires a long time for convergence; PLA: No.
[100]: Model: DRFL; Dataset: Wider Face; Contribution: proposes a cascade network with two stages trained with FL; BP: mAP = 98.5%; Limitation: cannot achieve real-time detection due to computational complexity; PLA: Yes.
[106]: Model: CNN; Dataset: MVTec; Contribution: FL-based approach to enable visual inspectors to recognise unseen defects in industrial setups; BP: F1-score >= 90%; Limitation: limited evaluation considering only detection performance; PLA: Yes.
collaborations without the need to share patients' data. In [118], MR data from multiple institutions is shared with privacy preservation. Moreover, cross-site modeling for MR image reconstruction is introduced to reduce domain shift and improve the generalization of the FL model. Moving on, [119] combined differential privacy and weakly-supervised attention multiple instance learning (WS-AMIL) in order to develop a privacy-preserving FL approach for gigapixel whole slide images in computational pathology. Researchers in [120] implemented differential-privacy schemes for protecting patients' data in a FL setup designed for brain tumour segmentation on the BraTS dataset. In the same field, Bercea et al. [121] developed a disentangled FL approach to segment brain pathologies in an unsupervised mode. Cetinkaya et al. [122] attempted to improve the performance of FL-based medical image analysis in non-IID settings using image augmentation.

The schemes [123], [124], [125] consider whether watermarking is necessary when using FL. For example, [123] introduced a FL-based zero-watermarking technique for security and privacy preservation in teledermatology healthcare frameworks. A FL-based autoencoder is employed for extracting image features from dermatology data using the two-dimensional discrete cosine transform (2D-DCT). Conversely, [124], [125] are two strategies proposed for watermarking-based FL. The first, called WAFFLE, is proposed to prevent global FL model theft by offering a mechanism for model owners to showcase their ownership of the models, while the second adopts client-side backdoor-triggered watermarking to secure FL model verification. Blockchain is another means used for data privacy: Polap et al. [126] developed an intelligent medical system based on an agent architecture using blockchain and FL technologies. Since FL algorithms do not inherently contain privacy-preserving mechanisms and can be sensitive to privacy-centered attacks that can divulge patients' data, it is important to augment them with privacy-enhancing technologies, especially in clinical applications that are implemented in a multi-institutional setting. In this context, a differentially private FL solution is suggested by [127] to segment multi-site medical images in order to further enhance privacy against attacks. Besides, Guo et al. [128] propose an FL-based approach for distributed data in medical cyber-physical systems. Their approach helps in training DL models for disease diagnosis following three steps which are repeated in a cycle: i) training a global model using offline medical images and transferring the global model to the local diagnosis nodes, ii) re-training the local models using local data, and iii) sending them back to the central server for federated averaging.
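To make the three-step cycle of [128] concrete, the following minimal Python sketch illustrates one round of the generic broadcast, local re-training, and federated-averaging pattern it follows. The PyTorch model, data loaders, and hyper-parameters are placeholders chosen for illustration and do not reproduce the implementation of [128].

# Minimal federated-averaging cycle illustrating the three repeated steps
# described above: (i) broadcast the global model, (ii) re-train locally on
# each client, (iii) send the local models back and average them.
import copy
import torch
import torch.nn as nn

def local_update(global_model, loader, epochs=1, lr=1e-3):
    """Step (ii): re-train a copy of the global model on one client's data."""
    model = copy.deepcopy(global_model)
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    model.train()
    for _ in range(epochs):
        for x, y in loader:
            opt.zero_grad()
            loss_fn(model(x), y).backward()
            opt.step()
    return model.state_dict(), len(loader.dataset)

def federated_average(states, sizes):
    """Step (iii): size-weighted average of the clients' parameters (FedAvg)."""
    total = float(sum(sizes))
    avg = copy.deepcopy(states[0])
    for key in avg:
        stacked = torch.stack([s[key].float() * (n / total) for s, n in zip(states, sizes)])
        avg[key] = stacked.sum(dim=0).to(states[0][key].dtype)
    return avg

def run_round(global_model, client_loaders):
    """One communication round: broadcast (i), local training (ii), averaging (iii)."""
    results = [local_update(global_model, loader) for loader in client_loaders]
    states, sizes = zip(*results)
    global_model.load_state_dict(federated_average(states, sizes))
    return global_model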
Fig. 10. The methodology applied for conducting the experiments in this study, as outlined in [46]. Both the leave-center-out and collaborative cross-validations employed collaborative data sharing (CDS) and FL for training.
The authors in [129] propose a technique to harmonize local and global drifts in FL models on heterogeneous medical images. In doing so, the local update drift is first mitigated by normalizing the amplitudes of images transformed into the frequency domain, and then a client weight perturbation, guiding each local model to reach a flat optimum, is designed based on the harmonized features.

The problem setting of federated domain generalization (FedDG) is solved in [130]. This helps in learning a FL architecture from various distributed SDs, which enables its generalization to unseen TDs. This was made possible by introducing the episodic learning in continuous frequency space (ELCFS) technique. Moving on, [131] propose a partial initialization-based cross-domain personalized FL, namely PartialFed. Their method is based on loading a subset of the global model's parameters instead of loading the entire model, as is done in most FL approaches; thus, it is closer to the split learning paradigm. Moreover, Wang et al. [132] design an effective communication- and computation-efficient FL scheme using progressive training. This helps to reduce computation and two-way communication costs, while preserving almost the same performance of the final models. While FL allows collaborative training using a joint model that is trained in multiple medical centers that keep their data decentralized to preserve privacy, the federated optimizations face the heterogeneity and non-uniformity of data distribution across medical centers. To overcome this issue, an FL scheme with shared label distribution, namely FedSLD, is proposed by [133]. This approach can reduce the discrepancy brought by data heterogeneity and adjust the contribution of every sample to the local objective during optimization via the knowledge of clients' label distributions.
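The FedSLD idea of letting the known label distributions modulate each sample's contribution can be illustrated with a small sketch. The weighting rule below, which scales the per-sample loss by the ratio of the federation-wide to the local label frequency, is only one plausible instantiation assumed for illustration; the exact formulation in [133] may differ.

# Illustrative re-weighting of a local objective by label-distribution
# knowledge, in the spirit of FedSLD [133]. The specific weighting rule is an
# assumption made for this sketch, not the paper's exact formulation.
import torch
import torch.nn.functional as F

def label_weights(local_counts, global_counts):
    """Per-class weight: how under/over-represented the class is on this client."""
    local_p = local_counts / local_counts.sum()
    global_p = global_counts / global_counts.sum()
    return global_p / local_p.clamp_min(1e-8)

def weighted_local_loss(logits, targets, class_weights):
    """Per-sample cross-entropy scaled by the weight of each sample's label."""
    per_sample = F.cross_entropy(logits, targets, reduction="none")
    return (per_sample * class_weights[targets]).mean()

# Example: a client that over-represents class 0 relative to the federation.
local_counts = torch.tensor([800., 100., 100.])
global_counts = torch.tensor([1000., 1000., 1000.])
w = label_weights(local_counts, global_counts)          # class 0 is down-weighted
logits, targets = torch.randn(16, 3), torch.randint(0, 3, (16,))
loss = weighted_local_loss(logits, targets, w)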
Table V presents a summary of the FL frameworks proposed for healthcare and medical AI applications.

F. Autonomous driving

Autonomous driving has recently received increasing interest due to the advances in CV, which is at the core of this technology. Vehicles leverage object detectors to analyze images collected by multiple sensors and cameras, analyze their surroundings in real time, and then recognize different objects, including other vehicles, road signs, barriers, pedestrians, etc., which help them safely navigate the roads. While a plethora of studies have focused on improving accuracy by training DL algorithms on centralized large-scale datasets, few of them have addressed the users' privacy. To that end, using FL in autonomous driving has recently attracted increasing attention. However, numerous challenges have been raised, including data discrepancy across clients and the server, expensive communication, systems heterogeneity, and privacy concerns. Typically, privacy issues include internal and external data, such as the faces of pedestrians, vehicles' positions, etc.

An approach to on-device ML using FL, validated through a case study on wheel steering angle prediction for autonomous driving vehicles, is presented in [135]. The results show that FL can significantly improve the quality of local edge models and reach the same accuracy level as centralized ML without negative effects. FL can also accelerate model training speed and reduce communication overhead, making it useful for deploying ML/DL components to various embedded systems. Fig. 11 presents a FL-based framework for wheel steering angle prediction in autonomous vehicles.
TABLE V: SUMMARY OF FL FRAMEWORKS PROPOSED FOR HEALTHCARE AND MEDICAL AI APPLICATIONS. BEST PERFORMANCE (BP), PROJECT LINK AVAILABILITY (PLA).

| Ref. | ML model | Dataset | Main contribution | BP | Limitations | PLA |
| --- | --- | --- | --- | --- | --- | --- |
| [46] | 3D-CNN | M&M and ACDC datasets | Modeling cardiovascular magnetic resonance with a focus on diagnosing hypertrophic cardiomyopathy | AUC = 78 | Presence of bias against ACDC on the shape and intensity set-up, where FL exhibits an AUC performance of about 0.85 to 0.89. | Yes⁴ |
| [47] | PFA | Real-World Dermoscopic FL Dataset⁵ | A personalized retrogress-resilient FL with modification in the clients and server. | AUC = 88.92, F1 = 70.75 | Generalization with other datasets is not confirmed. | Yes⁶ |
| [109] | MoE-DA, ADA | Autism Brain Imaging Data Exchange dataset (ABIDE I) | Privacy-preserving FL and domain adaptation for multi-site fMRI analysis | ACC = 78.9 | The model updating strategy is not optimal. Additionally, the sensitivity of the mapping function was difficult to estimate. | Yes⁷ |
| [115] | PPWGAN-GP, modified CycleGAN | LocalPCa, PROSTATEx challenge [134] | Address the cross-client variation problem among medical image data using VAFL. | AUC = 96.79, ACC = 98.3% | Does not address the inter-observer problem and incomplete image-to-image translation. | No |
| [116] | fPCA | ADNI, PPMI, MIRIAD and UK Biobank | Meta-analysis of large-scale subcortical brain data using FL | N/A | No comparisons with the SOTA have been reported. | No |
| [110] | fPCA | Synthetic and real-world private data | Predict the future oxygen requirements of symptomatic patients with COVID-19 using inputs of vital signs | AUC > 92 | Normalization techniques to enable the training of AI models in FL were not investigated. | Yes⁸ |
| [126] | Agent-based model | Private data | Blockchain technology and threaded FL for private sharing of medical images. | ACC = 80 | Poor internet connection hinders accessing the data. The initial adaptation of the classifier for practical use needs more investigation. | No |
| [133] | FedSLD | MNIST, CIFAR10, OrganMNIST (axial), PathMNIST | FL with shared label distribution to classify medical images | ACC = 95.85 | Reduces the impact of non-IID data by leveraging the clients' label distribution. | No |
| [131] | PartialFed, PartialFed-Adaptive | KITTI, WFace, VOC, LISA, DOTA, COCO, WC, CP, CM, Kit, DL | Partial initialization for cross-domain personalized FL | ACC = 95.92 | Reduces performance degradation caused by extreme distribution heterogeneity. | No |
Nguyen et al. [136] propose a communication-efficient FL approach to detect fatigue driving behaviors, namely FedSup. It helps in progressively optimizing the shared model with a tailored client–edge–cloud architecture and reduces communication overhead through a Bayesian convolutional neural network (BCNN) data selection strategy. In [136], a federated autonomous driving network (FADNet) solution is introduced for enhancing models' stability, ensuring convergence, and handling imbalanced data distribution problems where FL models are trained.

Zhou and their colleagues [137] discuss the need for distributed ML techniques to take advantage of the massive interconnected networks and heterogeneous data generated at the network edge in the upcoming 6G environment. A two-layer FL model is proposed that utilizes the distributed end-edge-cloud architecture to achieve more efficient and accurate learning while ensuring data privacy protection and reducing communication overhead. A novel multi-layer heterogeneous model selection and aggregation scheme is designed to better utilize the local and global contexts of individual vehicles and roadside units (RSUs) in 6G-supported vehicular networks.
Fig. 11. FL-based framework for wheel steering angle prediction in autonomous vehicles [135].
This context-aware distributed learning mechanism is then applied to address intelligent object detection in modern intelligent transportation systems with autonomous vehicles. Fig. 12 presents an overview of the two-layer FL model based on a convolutional neural network (TFL-CNN) framework.

Fig. 12. Overview of the two-layer FL model (TFL-CNN) framework [137]: aggregated parameters, global aggregation, contextual information, weighted aggregation, global caching, and node selection.

Table VI presents a summary of FL frameworks proposed for autonomous vehicles.

IV. OPEN ISSUES

The advances in CV algorithms have increased the number and variety of tasks, which range from the detection and tracking of specific objects in static images or controlled image streams to public surveillance and the detection of normal or abnormal behaviors and conditions in the wild, with applications from agricultural crop monitoring to law enforcement and medical diagnostics. The information captured in surveillance cameras' footage may contain sensitive and private data that must be protected (e.g., the presence of individuals at a place), the CV models trained to detect abnormal behaviors in public may suffer from race, ethnicity or gender bias, and scene perception software in self-driving cars may lack the ability to adapt to new environments or conditions (e.g. fog, mist or haze).

FL is a new method for protecting privacy when developing a DNN model that uses data from various clients and trains a common model for all clients. This is achieved by fusing distributed ML, encryption and security, and introducing incentive mechanisms based on game theory and economic theory. FL could therefore serve as the cornerstone of next-generation ML that meets societal and technological requirements for ethical AI development and implementation.

Despite their many advantages, FL solutions also have several challenges to address. First of all, since FL is a distributed learning technique, it is important to guarantee efficient communication between the federated network nodes so that they can communicate the learned model parameters. In addition to this, the models may be trained on nodes of different hardware architectures and capabilities (e.g. IoT devices, smartphones, etc.), which constitute a heterogeneous environment for FL. Consequently, it is important to establish the mechanisms that manage heterogeneous nodes in the same network, considering their restrictions and capabilities. The use of central (cloud-based) nodes that carry part of the training process, either using additional training data or by training parts of the learning model (e.g. some layers of the DNN, as in the split learning paradigm), can be beneficial for the overall performance. A third issue that must be considered is the statistical heterogeneity of the data that arrives at each node. Learning using non-IID data is still an open challenge for FL algorithms, and various alternatives are still under evaluation. Finally, privacy and robustness concerns are still present, and methods that preserve privacy and at the same time guarantee the resulting model's robustness to any kind of breaches or attacks must be properly designed and developed.

A. Communication overhead

A crucial component of a FL environment is communication between the client nodes and the central server. The ML model is downloaded, uploaded, and trained over the course of several communication rounds. Although transferring models instead of training datasets is significantly more efficient, these communication rounds may be delayed when the device has limited bandwidth, energy, or power. The communication overhead also increases when multiple client devices participate in each communication round, thus leading to a bottleneck. Further delays may occur from the need to synchronise the receipt of models from all clients, including those with low bandwidth or unstable connections. An additional burden to the client synchronization overhead can be caused by the non-identical distribution of training data to the nodes, which may delay the training of some models, and consequently the model update process [144]. This delay can be considered an intrusion in cybersecurity terms if it surpasses a predetermined threshold [10].

In order to mitigate the communication overhead and establish communication-efficient federation strategies, [145] have proposed the compression of transferred data. In a different approach, [146] have focused on identifying irrelevant
models and precluding them from the aggregation, thus significantly reducing the communication cost. This is achieved by sharing the global tendency of model updating between the central server and all the client nodes and asking each client to align its update with this tendency before communicating the model.
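A common way to realise such communication-efficient strategies is to compress the model update before it is transmitted, for instance by sparsification and quantization. The sketch below illustrates this generic family of techniques and is not the specific compression scheme of [145]; the sparsity ratio and the 8-bit quantization are arbitrary choices made for illustration.

# Generic compression of a model update before communication: keep only the
# top-k largest entries (sparsification) and quantize them to 8 bits.
import numpy as np

def compress_update(delta, k_ratio=0.01):
    flat = delta.ravel()
    k = max(1, int(k_ratio * flat.size))
    idx = np.argpartition(np.abs(flat), -k)[-k:]          # indices of the top-k entries
    values = flat[idx]
    scale = np.abs(values).max() or 1.0                    # symmetric 8-bit quantization
    q = np.round(values / scale * 127).astype(np.int8)
    return idx.astype(np.int64), q, float(scale), delta.shape

def decompress_update(idx, q, scale, shape):
    flat = np.zeros(int(np.prod(shape)), dtype=np.float32)
    flat[idx] = q.astype(np.float32) / 127.0 * scale
    return flat.reshape(shape)

delta = np.random.randn(1000, 100).astype(np.float32)      # a fake weight update
payload = compress_update(delta, k_ratio=0.01)              # what the client transmits
approx = decompress_update(*payload)                        # server-side reconstruction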
In the case of FLchain [36], the model communication cost between the clients and a central server, which is common in FL, is replaced by the cost of sharing the models with the blockchain ledger. For this reason, it is important to consider the time needed for this process, including local training, model transmission, consensus, and block mining.

B. Heterogeneity of client nodes

One of the major challenges in FL is the heterogeneity of data on different clients, which can hinder effective training. To address this issue, client selection strategies are often used in an attempt to improve the convergence rate of the FL process. While active client selection strategies have been proposed in recent studies, they do not take into account the loss correlations between clients and only offer limited improvement compared to a uniform selection strategy [147]. To overcome this limitation, FedCor, an FL framework that utilizes a correlation-based client selection strategy to improve the convergence rate of FL, is proposed in [148]. The loss correlations between clients are modeled using a Gaussian process (GP), and this model is used to select clients in a way that significantly reduces the expected global loss in each round. Moreover, an efficient GP training method is developed with low communication overhead for use in the FL scenario by leveraging the covariance stationarity. The experimental results show that FedCor can improve convergence rates by 34% to 99% on FMNIST and 26% to 51% on CIFAR-10 compared to the SOTA method.
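While [148] models the loss correlations with a Gaussian process, the much simpler heuristic sketched below only conveys the underlying intuition: keep a history of per-client losses and greedily select clients whose loss behaviour is least redundant with the ones already chosen. This is an illustrative simplification, not the FedCor algorithm.

# A simplified correlation-aware client selection heuristic inspired by the
# idea behind FedCor [148]; it is NOT the paper's GP-based method.
import numpy as np

def select_clients(loss_history, budget):
    """loss_history: array of shape (num_clients, num_rounds) of past local losses."""
    corr = np.corrcoef(loss_history)                  # client-by-client loss correlation
    latest = loss_history[:, -1]
    selected = [int(np.argmax(latest))]               # start from the currently worst client
    while len(selected) < budget:
        # Pick the client least correlated (on average) with the current selection.
        remaining = [c for c in range(loss_history.shape[0]) if c not in selected]
        scores = [np.mean(np.abs(corr[c, selected])) for c in remaining]
        selected.append(remaining[int(np.argmin(scores))])
    return selected

history = np.random.rand(20, 15)                       # 20 clients, 15 past rounds
chosen = select_clients(history, budget=5)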
Besides, FL still suffers from significant challenges such as the lack of convergence and the possibility of catastrophic forgetting. However, it is demonstrated in [149] that self-attention-based architectures (e.g., Transformers) have shown to be more resistant to changes in data distribution and therefore can improve the effectiveness of FL on heterogeneous devices. The authors of [150] propose an alternative approach, called FedAlign, which aims to address data heterogeneity by focusing on local learning rather than proximal restriction. They conducted a study using second-order indicators to evaluate the effectiveness of different algorithms in FL and found that standard regularization methods performed well in mitigating the effects of data heterogeneity. FedAlign was found to be a simple and effective method for overcoming data heterogeneity, with competitive accuracy compared to SOTA FL methods and minimal computational and memory overhead.

On the other hand, when FL is limited only to client nodes that share the same model architecture, then the FedAvg algorithm and its alternatives (e.g. FedSDG, FedProx, etc.) can be applied to merge the locally trained models in each iteration [151]. However, such approaches assume that the underlying nodes share similar hardware architectures, specifications, and processing capabilities in general, which is not always the case in FL settings. For example, when smartphones, monitoring
TABLE VI: SUMMARY OF FL FRAMEWORKS PROPOSED FOR AUTONOMOUS VEHICLES, INCLUDING THE ML MODEL USED, DATASET, DESCRIPTION OF THE MAIN CONTRIBUTION, BEST PERFORMANCE (BP), LIMITATIONS AND PROJECT LINK AVAILABILITY (PLA).

| Ref. | ML model | Dataset | Main contribution | BP | Limitations | PLA |
| --- | --- | --- | --- | --- | --- | --- |
| [136] | FADNet | Udacity, Gazebo, Carla | P2P federated framework for training autonomous driving models | RMSE = 0.07 | Limited deployment experiment | Yes⁹ |
| [138] | BSUM-based solution | MNIST | Dispersed FL framework to improve the robustness, privacy-awareness and communication constraints | accuracy = 99% | NP-hard and non-convex formulation of the problem | No |
| [139] | CNN | Private dataset | Addresses the case where vehicles and servers are considered honest but curious, through blockchain | accuracy = 92.5% | Require long training iterations for small images | No |
| [140] | CNN | MNIST, FMNIST | A reactive method for the allocation of wireless resources, dynamically occurring at each round | accuracy = 88% | The impact of resource allocation methods is diminished when strengthening the role of the proximal term | No |
| [135] | CNN | SullyChen | End-to-end ML model for handling real-time generated data | RMSE = 9.2 | Synchronous aggregation | No |
| [141] | U-NET, CNN | Lyft Level 5 AV | Feasibility study of using FL for vehicular applications | accuracy = 95% | Evaluated only the accuracy | No |
| [137] | TFL-CNN | BelgiumTSC | An improved model selection and aggregation algorithm | F1-score = 94% | Limited evaluation of the framework | No |
| [142] | YOLOv3 | KITTI | Evaluation of real-time object detection in real traffic environments | mAP = 68.5% | Sensitive to changes in labels across clients | No |
| [143] | SNNs | BelgiumTSC | Leverages spiking NNs to optimize the resources required for training | accuracy = 95% | Suffers from security issues | No |
cameras, and field programmable gate array (FPGA)-attached cameras are orchestrated in an FL setting, the models have heterogeneous architectures and model parameter sharing is infeasible [152], [153]. The barriers of conventional FL are removed when the nodes share their models' outputs instead of their parameters or updates. Algorithms such as federated model distillation (FedMD) propose a model-agnostic federation solution, which sends the predictions of the local models to the central node instead of the models themselves [154].
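The prediction-sharing idea behind FedMD can be sketched as follows: each client scores a common public set, the server averages the predictions, and every client distils the consensus into its own architecture. The models, public data, and distillation temperature below are illustrative placeholders rather than the exact procedure of [154].

# Sketch of model-agnostic federation in the spirit of FedMD [154]: clients
# exchange predictions on a shared public set instead of model parameters,
# so heterogeneous architectures can still learn from each other.
import torch
import torch.nn.functional as F

def client_logits(model, public_x):
    model.eval()
    with torch.no_grad():
        return model(public_x)

def aggregate_logits(all_logits):
    return torch.stack(all_logits).mean(dim=0)            # server-side consensus

def distil_step(model, public_x, consensus_logits, lr=1e-3, temperature=2.0):
    """One distillation step aligning a client with the consensus predictions."""
    opt = torch.optim.Adam(model.parameters(), lr=lr)      # fresh optimizer for the sketch
    opt.zero_grad()
    student = F.log_softmax(model(public_x) / temperature, dim=1)
    teacher = F.softmax(consensus_logits / temperature, dim=1)
    loss = F.kl_div(student, teacher, reduction="batchmean")
    loss.backward()
    opt.step()
    return loss.item()

def federation_round(models, public_x):
    consensus = aggregate_logits([client_logits(m, public_x) for m in models])
    return [distil_step(m, public_x, consensus) for m in models]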
Another impact of node heterogeneity applies to the overall performance of the FL process, since the federation of heterogeneous nodes with varying training data structure and size and varying processing capabilities usually requires, in each round, all the local nodes to train their models before proceeding to the next iteration. As a result, slow nodes with low-usability data may degrade the time performance of the federation. The work [155] proposes a reinforcement learning-based central server, which gradually weights the clients based on the quality of their models and their overall response, in an attempt to establish a group of clients that are used to achieve almost optimal performance.

C. Non-IID data

Although FL offers a promising method for privacy protection, there are significant difficulties when FL is used in the real world as opposed to centralized learning. Numerous studies have shown that the accuracy of FL on non-IID or heterogeneous data would inevitably deteriorate, mainly because of the divergence in the weights of local models that results from non-IID data [156]. Whether the FL approach is horizontal (i.e. aggregating the local models' weights on a central server) or vertical (i.e. aggregating the model outputs in the guest client to calculate the loss function), the non-IID data can take various forms (e.g. attribute, label or temporal skew) that can affect the overall performance, and mitigation measures must be taken to avoid it [157]. These include data sharing and augmentation, and the fine-tuning of local models using a combination of local and global information. Wang et al. [158] propose FAVOR, a reinforcement-based method (deep Q-learning) for choosing the clients that contribute models to the aggregation phase in each round. The proposed method reduces the non-IID data bias and the communication overhead.
Since the number of dimensions of the state space equals the number of model weights times the number of client nodes, a dimensionality reduction technique, such as PCA, is applied in order to compress the state space. Another group of approaches tries to balance the bias of non-IID data by clustering the local updates using a hierarchical algorithm [159]. Such approaches result in multiple models that are trained independently and in parallel, leading to a faster and better convergence of each group.
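A minimal sketch of this clustering idea is given below: clients are grouped by the similarity of their flattened updates using hierarchical (Ward) clustering, and each group is aggregated separately. The clustering criterion and the number of groups are assumptions made for illustration and do not reproduce the exact algorithm of [159].

# Grouping clients by the similarity of their local updates with hierarchical
# clustering, so each group can be aggregated and trained separately.
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

def cluster_clients(updates, n_groups=3):
    """updates: list of flattened client update vectors (np.ndarray)."""
    X = np.stack(updates)
    Z = linkage(X, method="ward")                      # agglomerative clustering
    labels = fcluster(Z, t=n_groups, criterion="maxclust")
    groups = {}
    for client_id, g in enumerate(labels):
        groups.setdefault(int(g), []).append(client_id)
    return groups                                       # {group_id: [client ids]}

def aggregate_per_group(updates, groups):
    """Average the updates inside each group, yielding one model per group."""
    return {g: np.mean([updates[i] for i in ids], axis=0) for g, ids in groups.items()}

updates = [np.random.randn(10_000).astype(np.float32) for _ in range(8)]
groups = cluster_clients(updates, n_groups=3)
group_models = aggregate_per_group(updates, groups)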
D. Device compatibility and network issues

FL relies on the participation of multiple devices, which may have different hardware and software configurations. This can make it difficult to ensure that the model can be trained on all devices, and may require additional efforts to optimize the model for different device types [160]. FL requires a stable and reliable network connection in order to train the model and exchange model updates between participants and the central server. If the network connection is unreliable or has low bandwidth, this can negatively impact the training process and the performance of the model. In particular, if the parties involved in FL are located in different geographical regions, the time it takes for model updates to be transmitted over the network can vary significantly, leading to delays in the training process.

E. Human bias

FL relies on the participation of multiple parties, which may have different biases or perspectives that can influence the model's training and performance. This can be particularly challenging in the context of CV, as the model may be trained on data with biased or inaccurate annotations. It is important to carefully consider and address these issues in order to ensure that the model is fair and unbiased [161].

F. Privacy and robustness to attacks

The resilience and data privacy of existing FL protocol designs have been shown to be compromised by adversaries both inside and outside the system. The sharing of gradients during training can reveal private information and cause data leakage to the central server or a third party. Similarly, malicious users can try to affect the global model by introducing poisoned local data or gradients (model poisoning) in an attempt to destroy its convergence (i.e. a Byzantine attack) or to implant a trigger in the model that constantly predicts adversarial classes, without losing performance on the main task. These two types of attacks either aim to breach user privacy (e.g. infer class representatives, infer class membership, or infer user properties) or poison the model in order to control its behavior [162]. Homomorphic encryption, secure joint computation from multiple parties, and differential privacy are some of the means for mitigating privacy breaches. Respectively, the defenses against poisoning attacks focus on the detection of malicious users based on the anomalies detected in their models (e.g. different distributions of features than the rest of the users [163]), focusing on increasing the robustness against Byzantine attacks. They also examine the resulting models to detect whether they are compromised or not, using a combination of backdoor-enabled and clean inputs and examining the behavior of the model [164].
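Among the mitigations listed above, differential privacy is the simplest to sketch: the client clips the norm of its update and adds Gaussian noise before transmission. The clip norm and noise multiplier in the snippet below are arbitrary illustrative values; a deployed system would calibrate them with a proper privacy accountant.

# Minimal client-side update sanitisation in the spirit of differentially
# private FL: clip the update's L2 norm, then add Gaussian noise before
# the update leaves the device.
import numpy as np

def privatize_update(delta, clip_norm=1.0, noise_multiplier=1.1, rng=None):
    rng = rng or np.random.default_rng()
    norm = np.linalg.norm(delta)
    clipped = delta * min(1.0, clip_norm / (norm + 1e-12))   # bound the sensitivity
    noise = rng.normal(0.0, noise_multiplier * clip_norm, size=delta.shape)
    return clipped + noise

delta = np.random.randn(50_000)                               # a fake local update
safe_delta = privatize_update(delta)                           # what is sent to the server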
Blockchain can be beneficial for preserving privacy in FL settings and can help avoid various attack types, from single point of failure and membership inference to Byzantine, label flipping, and data poisoning ones [165]. The FL blockchain, or FLchain for short, paradigm can transform mobile-edge computing into a secure and privacy-preserving system, once the proper solutions are found for allocating resources, providing incentives to the client nodes, and protecting the security and privacy of data at an optimal communication cost. In their proposed architecture, the mobile-edge computing (MEC) servers can act either as learning clients or as miners to establish blockchain consensus, and the mobile devices associated with each server focus only on the learning tasks. The mobile devices transmit their models to the MEC server as transactions. Each MEC server stores the local models it collects to the blockchain after verifying them with other MEC servers. The aggregation node or any other local node is then able to retrieve the stored and verified models from the blockchain, aggregate them, and use the resulting model in the next iteration [36]. Watermarking is another means of preserving model security [123], [124], [125].

G. Replication of research results

Despite the fact that the performance of FL models requires further improvement compared to that obtained with centralized training, replicating FL research results and conducting fair comparisons are still challenging. This is mainly due to the lack of exploration of different tasks within a unified FL framework. To that end, [166] develop an FL benchmarking and library platform for CV applications, called FedCV. This helps in bridging gaps between SOTA algorithms and facilitating the development of FL solutions. Additionally, three main tasks of CV, including image object detection, image segmentation and image classification, can be evaluated on this toolkit, as shown in Fig. 13. Moreover, numerous FL algorithms, models, and non-IID benchmarking datasets have been uploaded, and the toolkit can still be updated.

Besides, enhancing FL-based systems' efficiency is a delicate task due to the per-client memory cost and large number of parameters. Thus, using the FedCV framework, which is an efficient and flexible distributed training toolkit with easy-to-use APIs, along with benchmarks and different evaluation settings, can be useful for the FL research community to conduct advanced CV research studies.

V. CHALLENGES OF USING FL IN CV

FL is a promising approach for solving privacy and data distribution challenges in CV tasks such as object detection, autonomous vehicles, etc. However, there are several challenges associated with the use of FL in these applications. For instance, FL assumes that the data distribution across clients is similar. However, in CV tasks, the data may be heterogeneous, and the models need to be robust to handle variations in lighting, viewpoint, and occlusions. FL relies on communication between clients and the server. For CV tasks, such as object detection, the models can be large, and the communication costs can be high. This can result in slower training times and higher energy consumption. Moreover, CV tasks require labeled data for training the models, while FL requires each client to have labeled data, which can be challenging in scenarios where labeling data is expensive or time-consuming. Additionally, FL requires clients to share their model updates with the server, which can raise privacy and security concerns. Malicious clients can manipulate the data or models, compromising the integrity of the system. Lastly, FL requires aggregating the models from multiple clients to produce a global model. In CV tasks, this can be challenging due to differences in the quality of the models and the data distribution across clients.

On the other hand, FedCV [166] is a benchmarking framework for the evaluation of FL in popular CV tasks such as image classification, segmentation, and object detection. It comprises non-IID datasets and various models and algorithms for experimentation. The experiments validate the aforementioned open challenges for FL in CV tasks: i) the degradation of model accuracy when non-IID data are used, ii) the complexity of optimising the training in the FL setting, and iii) the huge number of parameters of NN models used in CV, which affects the FL performance. The same benchmark can be employed to validate all the remaining issues discussed in this section, such as the node and data heterogeneity, the need for robustness and privacy, etc.

FedVision [96] is an online platform for developing object detection solutions using FL and a three-step workflow that comprises: i) image annotation, ii) horizontal FL model training, and iii) model update. Since the platform is generic, it allows users to configure the learning parameters, schedule the communication between the server and clients, allocate tasks, and monitor the utilization of resources. The main challenges mentioned in this section for FL also apply in the case of FedVision, which however chooses specific strategies to tackle them. For example, it uses model compression to reduce communication overhead, cloud object storage to store huge amounts of data (model parameters) in the server, and a one-stage approach, based on YOLOv3, to perform end-to-end training of the model that identifies the bounding box and the class of the object. However, the real challenges for FL
approaches in CV come with the application to real-world images from surveillance cameras, etc. [167]. The large scale of the data collected and the requirement for almost real-time inference raise more design challenges for FL experts.

VI. FUTURE DIRECTIONS

The future of FL is very promising, but also challenging for researchers, with the main directions being as follows:
• Deployment over heterogeneous environments: Smartphones are becoming the most popular edge devices for ML applications since they allow users to perform a large variety of tasks, from face detection to voice recognition. In this direction, FL can be used to support such tasks without exposing private information. On the other side, IoT networks combine wearables, mobile and stable sensors, and smartphones in order to establish smart environments for the end users. All these constitute a diverse and heterogeneous environment in which models of varying complexity have to be communicated in the place of data in order to implement federation while protecting data privacy.
• Efficient communication: When creating techniques for federated networks, communication is a crucial bottleneck to take into account. This is due to the fact that federated networks may contain a sizable number of devices (such as millions of smartphones), and communication throughout the network may be much slower than local computing. Reducing the overall number of communication rounds or the quantity of communicated messages at each round are the two main ways to further cut communication. The communication can be made more efficient using: i) local updating techniques that cut down the overall number of communication rounds; ii) model compression techniques, including quantization, subsampling, and sparsification, that reduce the size of the messages conveyed during each update round; and iii) decentralized topologies, which provide an alternative when the connection with the server becomes a bottleneck, particularly when working in networks with low bandwidth or excessive latency. Iterative optimization algorithms are parallelized using asynchronous communication, and an appealing strategy for reducing stragglers in heterogeneous contexts is the use of asynchronous systems. Moreover, using 5G/6G networks will offer significantly higher speeds and lower latency compared to previous generations. This can enable faster and more efficient FL in CV applications.
• Dispersed FL: Concerns about FL's robustness exist since it could cease to function if the aggregation server fails (e.g., due to a malicious attack or physical defect). Dispersed FL can be used as a more robust alternative to FL, with groups of nodes with a lot of computing power collaborating in more sub-global iterations to increase their performance and consequently support the overall performance of the federation.
• New organizational models: The term "devices" can refer to entire companies or institutions in the context of FL. For applications in predictive healthcare, hospitals are an example of an organization with a lot of patient data. Hospitals must adhere to strong privacy laws and may have to maintain local data due to ethical, administrative, or regulatory requirements. Such applications can benefit from FL since it can ease network load and enable private learning amongst many devices and organizations.
• Large language models and generative chatbots: Advanced language models, such as ChatGPT, can assist in various ways to improve the use of FL in CV by (i) offering explanations, answering questions, and providing tutorials related to FL and CV [168]; by helping researchers and practitioners understand the concepts better, this can drive wider adoption and more informed application of FL in CV [169]; (ii) simplifying complex algorithms, providing pseudocode, and suggesting optimization strategies, as well as assisting with code debugging or suggesting modifications to FL algorithms, thereby improving their efficiency or performance [170]; (iii) providing insights into the ethical implications and considerations when applying FL in CV, especially in terms of data privacy and usage; (iv) simulating client-server conversations, allowing developers to anticipate challenges and refine their systems; and (v) offering advice on best practices for integration, be it with databases, cloud services, or edge devices [171].

VII. CONCLUSION

Federated Learning (FL) has emerged as a revolutionary paradigm in the realm of Computer Vision (CV), fostering collaborative machine learning without compromising data privacy. This review navigated through the intricate alleys of FL, from its foundational concepts to the myriad applications in CV. The aggregation approaches such as averaging, Progressive Fourier, and FedGKT accentuate FL's versatility. Moreover, the inclusion of privacy technologies like the Secure MPC model, differential privacy, and homomorphic encryption underscores its commitment to safeguarding data.

It is remarkable to note the vast landscape of CV applications benefitting from FL, ranging from object and face detection to innovative domains like healthcare, autonomous driving, and smart environment surveillance. Yet, like any evolving technology, FL in CV is not devoid of challenges. Issues like communication overhead, device heterogeneity, and the conundrums posed by non-IID data offer fertile grounds for future research.

While current advancements set a promising trajectory, the open issues highlight areas ripe for exploration and innovation. The challenges also underscore the importance of collaboration between researchers, practitioners, and industries to make FL more efficient, inclusive, and robust for CV.

As we stand on the cusp of a technological evolution, FL offers a beacon of hope, combining the best of collaborative learning and data privacy. The journey ahead is replete with opportunities and challenges, making it an exhilarating era for researchers and enthusiasts alike.

REFERENCES

[1] Y. Himeur, K. Ghanem, A. Alsalemi, F. Bensaali, and A. Amira, "Artificial intelligence based anomaly detection of energy consumption in buildings: A review, current trends and new perspectives," Applied Energy, vol. 287, p. 116601, 2021.
[2] A. Sayed, Y. Himeur, F. Bensaali, and A. Amira, "Artificial intelligence with iot for energy efficiency in buildings," Emerging Real-World Applications of Internet of Things, pp. 233–252, 2022.
[3] D. Ng, X. Lan, M. M.-S. Yao, W. P. Chan, and M. Feng, "Federated learning: a collaborative effort to achieve better medical imaging models for individual sites that have small labelled datasets," Quantitative Imaging in Medicine and Surgery, vol. 11, no. 2, p. 852, 2021.
[4] A. Al-Kababji, F. Bensaali, S. P. Dakua, and Y. Himeur, "Automated liver tissues delineation techniques: A systematic survey on machine learning current trends and future orientations," Engineering Applications of Artificial Intelligence, vol. 117, p. 105532, 2023.
[5] A. N. Sayed, Y. Himeur, and F. Bensaali, "From time-series to 2d images for building occupancy prediction using deep transfer learning," Engineering Applications of Artificial Intelligence, vol. 119, p. 105786, 2023.
[6] Y. Teng, J. Zhang, and T. Sun, "Data-driven decision-making model based on artificial intelligence in higher education system of colleges and universities," Expert Systems, vol. 40, no. 4, p. e12820, 2023.
[7] Y. Himeur, S. Al-Maadeed, N. Almaadeed, K. Abualsaud, A. Mohamed, T. Khattab, and O. Elharrouss, "Deep visual social distancing monitoring to combat covid-19: A comprehensive survey," Sustainable cities and society, vol. 85, p. 104064, 2022.
[8] Y. Elmir, Y. Himeur, and A. Amira, "Ecg classification using deep cnn and gramian angular field," arXiv preprint arXiv:2308.02395, 2023.
[9] A. Chouchane, A. Ouamane, Y. Himeur, W. Mansoor, S. Atalla, A. Benzaibak, and C. Boudellal, "Improving cnn-based person re-identification using score normalization," arXiv preprint arXiv:2307.00397, 2023.
[10] H. Kheddar, Y. Himeur, S. Al-Maadeed, A. Amira, and F. Bensaali, "Deep transfer learning for automatic speech recognition: Towards better generalization," arXiv preprint arXiv:2304.14535, 2023.
[11] Y. Himeur, S. Al-Maadeed, I. Varlamis, N. Al-Maadeed, K. Abualsaud, and A. Mohamed, "Face mask detection in smart cities using deep and transfer learning: lessons learned from the covid-19 pandemic," Systems, vol. 11, no. 2, p. 107, 2023.
[12] F. Esposito and D. Malerba, "Machine learning in computer vision," Applied Artificial Intelligence, vol. 15, no. 8, pp. 693–705, 2001.
[13] A. Copiaco, Y. Himeur, A. Amira, W. Mansoor, F. Fadli, S. Atalla, and S. S. Sohail, "An innovative deep anomaly detection of building energy consumption using energy time-series images," Engineering Applications of Artificial Intelligence, vol. 119, p. 105775, 2023.
[14] S. Alyamkin, M. Ardi, A. C. Berg, A. Brighton, B. Chen, Y. Chen, H.-P. Cheng, Z. Fan, C. Feng, B. Fu et al., "Low-power computer vision: Status, challenges, and opportunities," IEEE Journal on Emerging and Selected Topics in Circuits and Systems, vol. 9, no. 2, pp. 411–421, 2019.
[15] Z. Tong and G. Tanaka, "Reservoir computing with untrained convolutional neural networks for image recognition," in 2018 24th International Conference on Pattern Recognition (ICPR). IEEE, 2018, pp. 1289–1294.
[16] S. Mohamed and G. Rubino, "A study of real-time packet video quality using random neural networks," IEEE Transactions on Circuits and Systems for Video Technology, vol. 12, no. 12, pp. 1071–1083, 2002.
[17] Y. Himeur, S. S. Sohail, F. Bensaali, A. Amira, and M. Alazab, "Latest trends of security and privacy in recommender systems: a comprehensive review and future perspectives," Computers & Security, vol. 118, p. 102746, 2022.
[18] S. Quach, P. Thaichon, K. D. Martin, S. Weaven, and R. W. Palmatier, "Digital technologies: tensions in privacy and data," Journal of the Academy of Marketing Science, vol. 50, no. 6, pp. 1299–1323, 2022.
[19] Y. Himeur and K. A. Sadi, "Robust video copy detection based on ring decomposition based binarized statistical image features and invariant color descriptor (rbsif-icd)," Multimedia Tools and Applications, vol. 77, pp. 17309–17331, 2018.
[20] D. Patrikar and M. Parate, "Anomaly detection using edge computing in video surveillance system: review," Int J Multimed Info Retr, vol. 11, pp. 85–110, 2022.
[21] A. Xiang, "Being 'seen' vs. 'mis-seen': Tensions between privacy and fairness in computer vision," Harvard Journal of Law & Technology, Forthcoming, 2022.
[22] A. Alsalemi, Y. Himeur, F. Bensaali, and A. Amira, "An innovative edge-based internet of energy solution for promoting energy saving in buildings," Sustainable Cities and Society, vol. 78, p. 103571, 2022.
[23] A. Sayed, Y. Himeur, A. Alsalemi, F. Bensaali, and A. Amira, "Intelligent edge-based recommender system for internet of energy applications," IEEE Systems Journal, vol. 16, no. 3, pp. 5001–5010, 2021.
[24] Q. Yang, Y. Liu, Y. Cheng, Y. Kang, T. Chen, and H. Yu, "Federated learning," Synthesis Lectures on Artificial Intelligence and Machine Learning, vol. 13, no. 3, pp. 1–207, 2019.
[25] O. Gupta and R. Raskar, "Distributed learning of deep neural network over multiple agents," Journal of Network and Computer Applications, vol. 116, pp. 1–8, 2018.
[26] A. R. Khan, A. Zoha, L. Mohjazi, H. Sajid, Q. Abbasi, and M. A. Imran, "When federated learning meets vision: An outlook on opportunities and challenges," in EAI International Conference on Body Area Networks. Springer, 2021, pp. 308–319.
[27] K. Giorgas and I. Varlamis, "Online federated learning with imbalanced class distribution," in 24th Pan-Hellenic Conference on Informatics, 2020, pp. 91–95.
[28] S. Abuadbba, K. Kim, M. Kim, C. Thapa, S. A. Camtepe, Y. Gao, H. Kim, and S. Nepal, "Can we use split learning on 1d cnn models for privacy preserving training?" in Proceedings of the 15th ACM Asia Conference on Computer and Communications Security, 2020, pp. 305–318.
[29] M. S. Jere, T. Farnan, and F. Koushanfar, "A taxonomy of attacks on federated learning," IEEE Security & Privacy, vol. 19, no. 2, pp. 20–28, 2020.
[30] J. Zhang, Y. Chen, and H. Li, "Privacy leakage of adversarial training models in federated learning systems," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022, pp. 108–114.
[31] Y. Himeur, A. Sayed, A. Alsalemi, F. Bensaali, A. Amira, I. Varlamis, M. Eirinaki, C. Sardianos, and G. Dimitrakopoulos, "Blockchain-based recommender systems: Applications, challenges and future opportunities," Computer Science Review, vol. 43, p. 100439, 2022.
[32] Y. Lu, X. Huang, Y. Dai, S. Maharjan, and Y. Zhang, "Blockchain and federated learning for privacy-preserved data sharing in industrial iot," IEEE Transactions on Industrial Informatics, vol. 16, no. 6, pp. 4177–4186, 2019.
[33] O. A. Wahab, A. Mourad, H. Otrok, and T. Taleb, "Federated machine learning: Survey, multi-level classification, desirable criteria and future directions in communication and networking systems," IEEE Communications Surveys & Tutorials, vol. 23, no. 2, pp. 1342–1397, 2021.
[34] Q. Li, Z. Wen, Z. Wu, S. Hu, N. Wang, Y. Li, X. Liu, and B. He, "A survey on federated learning systems: vision, hype and reality for data privacy and protection," IEEE Transactions on Knowledge and Data Engineering, 2021.
[35] M. Aledhari, R. Razzak, R. M. Parizi, and F. Saeed, "Federated learning: A survey on enabling technologies, protocols, and applications," IEEE Access, vol. 8, pp. 140699–140725, 2020.
[36] D. C. Nguyen, M. Ding, P. N. Pathirana, A. Seneviratne, J. Li, and H. V. Poor, "Federated learning for internet of things: A comprehensive survey," IEEE Communications Surveys & Tutorials, vol. 23, no. 3, pp. 1622–1658, 2021.
[37] S. AbdulRahman, H. Tout, H. Ould-Slimane, A. Mourad, C. Talhi, and M. Guizani, "A survey on federated learning: The journey from centralized to distributed on-site learning and beyond," IEEE Internet of Things Journal, vol. 8, no. 7, pp. 5476–5497, 2020.
[38] Y. Cheng, Y. Liu, T. Chen, and Q. Yang, "Federated learning for privacy-preserving ai," Communications of the ACM, vol. 63, no. 12, pp. 33–36, 2020.
[39] S. Yang, B. Ren, X. Zhou, and L. Liu, "Parallel distributed logistic regression for vertical federated learning without third-party coordinator," arXiv preprint arXiv:1911.09824, 2019.
[40] S. Sharma, C. Xing, Y. Liu, and Y. Kang, "Secure and efficient federated transfer learning," in 2019 IEEE International Conference on Big Data (Big Data). IEEE, 2019, pp. 2569–2576.
[41] G. Bjelobaba, A. Savić, T. Tošić, I. Stefanović, and B. Kocić, "Collaborative learning supported by blockchain technology as a model for improving the educational process," Sustainability, vol. 15, no. 6, p. 4780, 2023.
[42] Y. Liu, X. Zhang, Y. Kang, L. Li, T. Chen, M. Hong, and Q. Yang, "Fedbcd: A communication-efficient collaborative learning framework for distributed features," IEEE Transactions on Signal Processing, vol. 70, pp. 4277–4290, 2022.
[43] E. Gabrielli, G. Pica, and G. Tolomei, "A survey on decentralized federated learning," arXiv preprint arXiv:2308.04604, 2023.
[44] J. Konečný, H. B. McMahan, D. Ramage, and P. Richtárik, "Federated optimization: Distributed machine learning for on-device intelligence," arXiv preprint arXiv:1610.02527, 2016.
[45] V. Smith, C.-K. Chiang, M. Sanjabi, and A. S. Talwalkar, "Federated multi-task learning," Advances in Neural Information Processing Systems, vol. 30, 2017.
[46] A. Linardos, K. Kushibar, S. Walsh, P. Gkontra, and K. Lekadir, "Federated learning for multi-center imaging diagnostics: A study in cardiovascular disease," arXiv preprint arXiv:2107.03901, 2021.
[47] Z. Chen, M. Zhu, C. Yang, and Y. Yuan, "Personalized retrogress-resilient framework for real-world medical federated learning," in International Conference on Medical Image Computing and Computer-Assisted Intervention. Springer, 2021, pp. 347–356.
[48] K. Doshi and Y. Yilmaz, "Federated learning-based driver activity recognition for edge devices," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022, pp. 3338–3346.
[49] H. Zhu, "On the relationship between (secure) multi-party computation and (secure) federated learning," arXiv preprint arXiv:2008.02609, 2020.
[50] D. Byrd and A. Polychroniadou, "Differentially private secure multi-party computation for federated learning in financial applications," in Proceedings of the First ACM International Conference on AI in Finance, 2020, pp. 1–9.
[51] R. Kanagavelu, Z. Li, J. Samsudin, Y. Yang, F. Yang, R. S. M. Goh, M. Cheah, P. Wiwatphonthana, K. Akkarajitsakul, and S. Wang, "Two-phase multi-party computation enabled privacy-preserving federated learning," in 2020 20th IEEE/ACM International Symposium on Cluster, Cloud and Internet Computing (CCGRID). IEEE, 2020, pp. 410–419.
[52] A. Triastcyn and B. Faltings, "Federated learning with bayesian differential privacy," in 2019 IEEE International Conference on Big Data (Big Data). IEEE, 2019, pp. 2587–2596.
[53] X. Wu, Y. Zhang, M. Shi, P. Li, R. Li, and N. N. Xiong, "An adaptive federated learning scheme with differential privacy preserving," Future Generation Computer Systems, vol. 127, pp. 362–372, 2022.
[54] U. Shah, I. Dave, J. Malde, J. Mehta, and S. Kodeboyina, "Maintaining privacy in medical imaging with federated learning, deep learning, differential privacy, and encrypted computation," in 2021 6th International Conference for Convergence in Technology (I2CT). IEEE, 2021, pp. 1–6.
[55] M. Adnan, S. Kalra, J. C. Cresswell, G. W. Taylor, and H. R. Tizhoosh, "Federated learning and differential privacy for medical image analysis," Scientific Reports, vol. 12, no. 1, pp. 1–10, 2022.
[56] O. Choudhury, A. Gkoulalas-Divanis, T. Salonidis, I. Sylla, Y. Park, G. Hsu, and A. Das, "Differential privacy-enabled federated learning for sensitive health data," arXiv preprint arXiv:1910.02578, 2019.
[57] A. Ziller, D. Usynin, R. Braren, M. Makowski, D. Rueckert, and G. Kaissis, "Medical imaging deep learning with differential privacy," Scientific Reports, vol. 11, no. 1, pp. 1–8, 2021.
[58] L. Zhang, J. Xu, P. Vijayakumar, P. K. Sharma, and U. Ghosh, "Homomorphic encryption-based privacy-preserving federated learning in iot-enabled healthcare system," IEEE Transactions on Network Science and Engineering, pp. 1–17, 2022.
[59] D. Stripelis, H. Saleem, T. Ghai, N. Dhinagar, U. Gupta, C. Anastasiou, G. Ver Steeg, S. Ravi, M. Naveed, P. M. Thompson et al., "Secure neuroimaging analysis using federated learning with homomorphic encryption," in 17th International Symposium on Medical Information Processing and Analysis, vol. 12088. SPIE, 2021, pp. 351–359.
[60] R. Kumar, J. Kumar, A. A. Khan, H. Ali, C. M. Bernard, R. U. Khan, S. Zeng et al., "Blockchain and homomorphic encryption based privacy-preserving model aggregation for medical images," Computerized Medical Imaging and Graphics, vol. 102, p. 102139, 2022.
[61] T. S. Brisimi, R. Chen, T. Mela, A. Olshevsky, I. C. Paschalidis, and W. Shi, "Federated learning of predictive models from federated electronic health records," International Journal of Medical Informatics, vol. 112, pp. 59–67, 2018.
[62] Y. Zhao, H. Liu, H. Li, P. Barnaghi, and H. Haddadi, "Semi-supervised federated learning for activity recognition," arXiv preprint arXiv:2011.00851, 2020.
[63] A. Grammenos, R. Mendoza Smith, J. Crowcroft, and C. Mascolo, "Federated principal component analysis," Advances in Neural Information Processing Systems, vol. 33, pp. 6453–6464, 2020.
[64] H. H. Kumar, V. Karthik, and M. K. Nair, "Federated k-means clustering: A novel edge ai based approach for privacy preservation," in 2020 IEEE International Conference on Cloud Computing in Emerging Markets (CCEM). IEEE, 2020, pp. 52–56.
[65] H. Kassem, D. Alapatt, P. Mascagni, C. AI4SafeChole, A. Karargyris, and N. Padoy, "Federated cycling (fedcy): Semi-supervised federated learning of surgical phases," IEEE Transactions on Medical Imaging, pp. 1–1, 2022.
[66] Y. Rehman, Y. Gao, J. Shen, P. de Gusmão, and N. Lane, "Federated self-supervised learning for video understanding," Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), vol. 13691 LNCS, pp. 506–522, 2022, cited By 0.
[67] I. Dave, C. Chen, and M. Shah, "Spact: Self-supervised privacy preservation for action recognition," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, vol. 2022-June. IEEE Computer Society, 2022, pp. 20132–20141, cited By 3.
[68] S. Li, Y. Mao, J. Li, Y. Xu, J. Li, X. Chen, S. Liu, and X. Zhao, "Fedutn: federated self-supervised learning with updating target network," Applied Intelligence, 2022, cited By 0.
[69] D. Shome and T. Kar, "Fedaffect: Few-shot federated learning for facial expression recognition," in Proceedings of the IEEE/CVF International Conference on Computer Vision, 2021, pp. 4168–4175.
[70] H. Kheddar, M. Hemis, Y. Himeur, D. Megías, and A. Amira, "Deep learning for diverse data types steganalysis: A review," arXiv preprint arXiv:2308.04522, 2023.
[71] W. Ji, J. Li, M. Zhang, Y. Piao, and H. Lu, "Accurate rgb-d salient object detection via collaborative learning," in Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part XVIII 16. Springer, 2020, pp. 52–69.
[72] Q. Fan, D.-P. Fan, H. Fu, C.-K. Tang, L. Shao, and Y.-W. Tai, "Group collaborative learning for co-salient object detection," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2021, pp. 12288–12298.
[73] S. Teeparthi, V. Jatla, M. S. Pattichis, S. Celedón-Pattichis, and C. LópezLeiva, "Fast hand detection in collaborative learning environments," in International Conference on Computer Analysis of Images and Patterns. Springer, 2021, pp. 445–454.
[74] S. Teeparthi, "Long term object detection and tracking in collaborative learning environments," arXiv preprint arXiv:2106.07556, 2021.
[75] C. Huang and R. Nevatia, "High performance object detection by collaborative learning of joint ranking of granules features," in 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition. IEEE, 2010, pp. 41–48.
[76] X. Fang, Z. Kuang, R. Zhang, X. Shao, and H. Wang, "Collaborative learning in bounding box regression for object detection," Pattern Recognition Letters, 2021.
[77] Y. Zhou, X. He, L. Huang, L. Liu, F. Zhu, S. Cui, and L. Shao, "Collaborative learning of semi-supervised segmentation and classification for medical images," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2019, pp. 2079–2088.
[78] J. Wang, J. Yao, Y. Zhang, and R. Zhang, "Collaborative learning for weakly supervised object detection," arXiv preprint arXiv:1802.03531, 2018.
[79] D. Zhang, J. Han, L. Zhao, and D. Meng, "Leveraging prior-knowledge for weakly supervised object detection under a collaborative self-paced curriculum learning framework," International Journal of Computer Vision, vol. 127, no. 4, pp. 363–380, 2019.
[80] S. Chen, D. Shao, X. Shu, C. Zhang, and J. Wang, "FCC-Net: A full-coverage collaborative network for weakly supervised remote sensing object detection," Electronics, vol. 9, no. 9, p. 1356, 2020.
[81] Y. Liang, G. Qin, M. Sun, J. Qin, J. Yan, and Z. Zhang, "Semantic and detail collaborative learning network for salient object detection," Neurocomputing, vol. 462, pp. 478–490, 2021.
[82] J. Seo and H. Park, "Object recognition in very low resolution images using deep collaborative learning," IEEE Access, vol. 7, pp. 134071–134082, 2019.
[83] Q. Guo, X. Wang, Y. Wu, Z. Yu, D. Liang, X. Hu, and P. Luo, "Online knowledge distillation via collaborative learning," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020, pp. 11020–11029.
[84] P. Yu and Y. Liu, "Federated object detection: Optimizing object detection model with federated learning," in Proceedings of the 3rd International Conference on Vision, Image and Signal Processing, 2019, pp. 1–6.
[85] H. Choi and I. V. Bajić, "Deep feature compression for collaborative object detection," in 2018 25th IEEE International Conference on Image Processing (ICIP). IEEE, 2018, pp. 3743–3747.
[86] P. Vepakomma, O. Gupta, T. Swedish, and R. Raskar, "Split learning for health: Distributed deep learning without sharing raw patient data," arXiv preprint arXiv:1812.00564, 2018.
[87] Y. He, Y. Kang, J. Luo, L. Fan, and Q. Yang, "A hybrid self-supervised learning framework for vertical federated learning," arXiv preprint arXiv:2208.08934, 2022.
[88] W. Zhuang, Y. Wen, and S. Zhang, "Divergence-aware federated self-supervised learning," arXiv preprint arXiv:2204.04385, 2022.
[89] A. Saeed, F. D. Salim, T. Ozcelebi, and J. Lukkien, "Federated self-supervised learning of multisensor representations for embedded intelligence," IEEE Internet of Things Journal, vol. 8, no. 2, pp. 1030–1040, 2020.
[90] R. Yan, L. Qu, Q. Wei, S.-C. Huang, L. Shen, D. Rubin, L. Xing, and Y. Zhou, "Label-efficient self-supervised federated learning for tackling data heterogeneity in medical imaging," IEEE Transactions on Medical Imaging, 2023.
[91] R. Zhu, K. Yin, H. Xiong, H. Tang, and G. Yin, "Masked face detection algorithm in the dense crowd based on federated learning," Wireless Communications and Mobile Computing, vol. 2021, 2021.
29
Engineering Applications of Artificial Intelligence, vol. 119, p. 105698, [113] B. Yan, B. Liu, L. Wang, Y. Zhou, Z. Liang, M. Liu, and C.-Z. Xu,
2023. “Fedcm: A real-time contribution measurement method for participants
[93] A. B. Sada, M. A. Bouras, J. Ma, H. Runhe, and H. Ning, “A distributed in federated learning,” in 2021 International Joint Conference on Neural
video analytics architecture based on edge-computing and federated Networks (IJCNN). IEEE, 2021, pp. 1–8.
learning,” in 2019 IEEE Intl Conf on Dependable, Autonomic and [114] H. R. Roth, K. Chang, P. Singh, N. Neumark, W. Li, V. Gupta,
Secure Computing, Intl Conf on Pervasive Intelligence and Computing, S. Gupta, L. Qu, A. Ihsani, B. C. Bizzo et al., “Federated learning
Intl Conf on Cloud and Big Data Computing, Intl Conf on Cyber Sci- for breast density classification: A real-world implementation,” in
ence and Technology Congress (DASC/PiCom/CBDCom/CyberSciTech). Domain Adaptation and Representation Transfer, and Distributed and
IEEE, 2019, pp. 215–220. Collaborative Learning. Springer, 2020, pp. 181–191.
[94] F. Concone, C. Ferdico, G. L. Re, and M. Morana, “A federated learning [115] Z. Yan, J. Wicaksana, Z. Wang, X. Yang, and K.-T. Cheng, “Variation-
approach for distributed human activity recognition,” in 2022 IEEE aware federated learning with multi-source decentralized medical image
International Conference on Smart Computing (SMARTCOMP). IEEE, data,” IEEE Journal of Biomedical and Health Informatics, 2020.
2022, pp. 269–274. [116] S. Silva, B. A. Gutman, E. Romero, P. M. Thompson, A. Altmann,
[95] Y. Liu, J. Nie, X. Li, S. H. Ahmed, W. Y. B. Lim, and C. Miao, and M. Lorenzi, “Federated learning in distributed medical databases:
“Federated learning in the sky: Aerial-ground air quality sensing Meta-analysis of large-scale subcortical brain data,” in 2019 IEEE 16th
framework with uav swarms,” IEEE Internet of Things Journal, 2020. international symposium on biomedical imaging (ISBI 2019). IEEE,
[96] Y. Liu, A. Huang, Y. Luo, H. Huang, Y. Liu, Y. Chen, L. Feng, T. Chen, 2019, pp. 270–274.
H. Yu, and Q. Yang, “Fedvision: An online visual object detection [117] M. J. Sheller, B. Edwards, G. A. Reina, J. Martin, S. Pati, A. Kotrotsou,
platform powered by federated learning,” in Proceedings of the AAAI M. Milchenko, W. Xu, D. Marcus, R. R. Colen et al., “Federated
Conference on Artificial Intelligence, vol. 34, 2020, pp. 13 172–13 179. learning in medicine: facilitating multi-institutional collaborations
[97] A. Catalfamo, A. Celesti, M. Fazio, G. Randazzo, and M. Villari, “A without sharing patient data,” Scientific reports, vol. 10, no. 1, pp.
platform for federated learning on the edge: a video analysis use case,” 1–12, 2020.
in 2022 IEEE Symposium on Computers and Communications (ISCC), [118] P. Guo, P. Wang, J. Zhou, S. Jiang, and V. M. Patel, “Multi-institutional
2022, pp. 1–7. collaborations for improving deep learning-based magnetic resonance
[98] P. Jain, S. Goenka, S. Bagchi, B. Banerjee, and S. Chaterji, “Federated image reconstruction using federated learning,” in Proceedings of the
action recognition on heterogeneous embedded devices,” arXiv preprint IEEE/CVF Conference on Computer Vision and Pattern Recognition,
arXiv:2107.12147, 2021. 2021, pp. 2423–2432.
[99] B. Zhang, J. Wang, J. Fu, and J. Xia, “Driver action recognition [119] M. Y. Lu, R. J. Chen, D. Kong, J. Lipkova, R. Singh, D. F. Williamson,
using federated learning,” in 2021 the 7th International Conference on T. Y. Chen, and F. Mahmood, “Federated learning for computational
Communication and Information Processing (ICCIP), 2021, pp. 74–77. pathology on gigapixel whole slide images,” Medical image analysis,
[100] Z. Shi, L. Zhang, Y. Liu, X. Cao, Y. Ye, M.-M. Cheng, and vol. 76, p. 102298, 2022.
G. Zheng, “Crowd counting with deep negative correlation learning,” [120] W. Li, F. Milletarı̀, D. Xu, N. Rieke, J. Hancox, W. Zhu, M. Baust,
in Proceedings of the IEEE conference on computer vision and pattern Y. Cheng, S. Ourselin, M. J. Cardoso et al., “Privacy-preserving
recognition, 2018, pp. 5382–5390. federated brain tumour segmentation,” in International workshop on
[101] A. Zhang, J. Shen, Z. Xiao, F. Zhu, X. Zhen, X. Cao, and L. Shao, machine learning in medical imaging. Springer, 2019, pp. 133–141.
“Relational attention network for crowd counting,” in Proceedings of [121] C. I. Bercea, B. Wiestler, D. Rueckert, and S. Albarqouni, “Feddis:
the IEEE/CVF international conference on computer vision, 2019, pp. Disentangled federated learning for unsupervised brain pathology
6788–6797. segmentation,” arXiv preprint arXiv:2103.03705, 2021.
[102] Y. Jiang, R. Cong, C. Shu, A. Yang, Z. Zhao, and G. Min, “Federated [122] A. E. Cetinkaya, M. Akin, and S. Sagiroglu, “Improving performance
learning based mobile crowd sensing with unreliable user data,” in 2020 of federated learning based medical image analysis in non-iid settings
IEEE 22nd International Conference on High Performance Computing using image augmentation,” in 2021 International Conference on
and Communications; IEEE 18th International Conference on Smart Information Security and Cryptology (ISCTURKEY). IEEE, 2021, pp.
City; IEEE 6th International Conference on Data Science and Systems 69–74.
(HPCC/SmartCity/DSS). IEEE, 2020, pp. 320–327. [123] B. Han, R. Jhaveri, H. Wang, D. Qiao, and J. Du, “Application of robust
[103] X. Bao, C. Su, Y. Xiong, W. Huang, and Y. Hu, “Flchain: A blockchain zero-watermarking scheme based on federated learning for securing the
for auditable federated learning with trust and incentive,” in 2019 5th healthcare data,” IEEE Journal of Biomedical and Health Informatics,
International Conference on Big Data Computing and Communications 2021.
(BIGCOM). IEEE, 2019, pp. 151–159. [124] B. G. Tekgul, Y. Xia, S. Marchal, and N. Asokan, “Waffle: Watermark-
[104] B. R. Kiran, D. M. Thomas, and R. Parakkal, “An overview of deep ing in federated learning,” in 2021 40th International Symposium on
learning based methods for unsupervised and semi-supervised anomaly Reliable Distributed Systems (SRDS). IEEE, 2021, pp. 310–320.
detection in videos,” Journal of Imaging, vol. 4, no. 2, p. 36, 2018. [125] X. Liu, S. Shao, Y. Yang, K. Wu, W. Yang, and H. Fang, “Secure
[105] S. Singh, S. Bhardwaj, H. Pandey, and G. Beniwal, “Anomaly detection federated learning model verification: A client-side backdoor triggered
using federated learning,” in Proceedings of International Conference watermarking scheme,” in 2021 IEEE International Conference on
on Artificial Intelligence and Applications. Springer, 2021, pp. 141– Systems, Man, and Cybernetics (SMC). IEEE, 2021, pp. 2414–2419.
148. [126] D. Połap, G. Srivastava, and K. Yu, “Agent architecture of an intelligent
[106] S. Bharti, A. McGibney, and T. O’Gorman, “Edge-enabled federated medical system based on federated learning and blockchain technology,”
learning for vision based product quality inspection,” in 2022 33rd Journal of Information Security and Applications, vol. 58, p. 102748,
Irish Signals and Systems Conference (ISSC). IEEE, 2022, pp. 1–6. 2021.
[107] A. Esteva, K. Chou, S. Yeung, N. Naik, A. Madani, A. Mottaghi, Y. Liu, [127] A. Ziller, D. Usynin, N. Remerscheid, M. Knolle, M. Makowski,
E. Topol, J. Dean, and R. Socher, “Deep learning-enabled medical R. Braren, D. Rueckert, and G. Kaissis, “Differentially private federated
computer vision,” NPJ digital medicine, vol. 4, no. 1, pp. 1–9, 2021. deep learning for multi-site medical image segmentation,” arXiv
[108] M. J. Sheller, G. A. Reina, B. Edwards, J. Martin, and S. Bakas, preprint arXiv:2107.02586, 2021.
“Multi-institutional deep learning modeling without sharing patient data: [128] K. Guo, N. Li, J. Kang, and J. Zhang, “Towards efficient federated
A feasibility study on brain tumor segmentation,” in International learning-based scheme in medical cyber-physical systems for distributed
MICCAI Brainlesion Workshop. Springer, 2018, pp. 92–104. data,” Software: Practice and Experience, vol. 51, no. 11, pp. 2274–
[109] X. Li, Y. Gu, N. Dvornek, L. H. Staib, P. Ventola, and J. S. Duncan, 2289, 2021.
“Multi-site fmri analysis using privacy-preserving federated learning [129] M. Jiang, Z. Wang, and Q. Dou, “Harmofl: Harmonizing local and
and domain adaptation: Abide results,” Medical Image Analysis, vol. 65, global drifts in federated learning on heterogeneous medical images,”
p. 101765, 2020. arXiv preprint arXiv:2112.10775, 2021.
[110] I. Dayan, H. R. Roth, A. Zhong, A. Harouni, A. Gentili, A. Z. Abidin, [130] Q. Liu, C. Chen, J. Qin, Q. Dou, and P.-A. Heng, “Feddg: Federated
A. Liu, A. B. Costa, B. J. Wood, C.-S. Tsai et al., “Federated learning domain generalization on medical image segmentation via episodic
for predicting clinical outcomes in patients with covid-19,” Nature learning in continuous frequency space,” in Proceedings of the
medicine, vol. 27, no. 10, pp. 1735–1743, 2021. IEEE/CVF Conference on Computer Vision and Pattern Recognition,
[111] I. Feki, S. Ammar, Y. Kessentini, and K. Muhammad, “Federated 2021, pp. 1013–1023.
learning for covid-19 screening from chest x-ray images,” Applied Soft [131] B. Sun, H. Huo, Y. Yang, and B. Bai, “Partialfed: Cross-domain
Computing, vol. 106, p. 107330, 2021. personalized federated learning via partial initialization,” Advances in
[112] A. E. Cetinkaya, M. Akin, and S. Sagiroglu, “A communication efficient Neural Information Processing Systems, vol. 34, 2021.
federated learning approach to multi chest diseases classification,” [132] H.-P. Wang, S. U. Stich, Y. He, and M. Fritz, “Progfed: Effective, com-
in 2021 6th International Conference on Computer Science and munication, and computation efficient federated learning by progressive
Engineering (UBMK). IEEE, 2021, pp. 429–434. training,” arXiv preprint arXiv:2110.05323, 2021.
30
[133] J. Luo and S. Wu, “Fedsld: Federated learning with shared label distribu- area-efficient and flexible architectures for optimal ate pairing on fpga,”
tion for medical image classification,” arXiv preprint arXiv:2110.08378, arXiv preprint arXiv:2308.04261, 2023.
2021. [154] D. Li and J. Wang, “Fedmd: Heterogenous federated learning via model
[134] G. Litjens, O. Debats, J. Barentsz, N. Karssemeijer, and H. Huisman, distillation,” arXiv preprint arXiv:1910.03581, 2019.
“Computer-aided detection of prostate cancer in mri,” IEEE transactions [155] J. Pang, Y. Huang, Z. Xie, Q. Han, and Z. Cai, “Realizing the
on medical imaging, vol. 33, no. 5, pp. 1083–1092, 2014. heterogeneity: A self-organized federated learning framework for iot,”
[135] H. Zhang, J. Bosch, and H. H. Olsson, “End-to-end federated learning IEEE Internet of Things Journal, vol. 8, no. 5, pp. 3088–3098, 2020.
for autonomous driving vehicles,” in 2021 International Joint Confer- [156] A. Alsalemi, Y. Himeur, F. Bensaali, and A. Amira, “Smart sensing and
ence on Neural Networks (IJCNN). IEEE, 2021, pp. 1–8. end-users’ behavioral change in residential buildings: An edge-based
[136] A. Nguyen, T. Do, M. Tran, B. X. Nguyen, C. Duong, T. Phan, internet of energy perspective,” IEEE Sensors Journal, vol. 21, no. 24,
E. Tjiputra, and Q. D. Tran, “Deep federated learning for autonomous pp. 27 623–27 631, 2021.
driving,” in 2022 IEEE Intelligent Vehicles Symposium (IV). IEEE, [157] H. Zhu, J. Xu, S. Liu, and Y. Jin, “Federated learning on non-iid data:
2022, pp. 1824–1830. A survey,” Neurocomputing, vol. 465, pp. 371–390, 2021.
[137] X. Zhou, W. Liang, J. She, Z. Yan, I. Kevin, and K. Wang, “Two- [158] H. Wang, Z. Kaplan, D. Niu, and B. Li, “Optimizing federated learning
layer federated learning with heterogeneous model aggregation for on non-iid data with reinforcement learning,” in IEEE INFOCOM
6g supported internet of vehicles,” IEEE Transactions on Vehicular 2020-IEEE Conference on Computer Communications. IEEE, 2020,
Technology, vol. 70, no. 6, pp. 5308–5317, 2021. pp. 1698–1707.
[159] C. Briggs, Z. Fan, and P. Andras, “Federated learning with hierarchical
[138] L. U. Khan, Y. K. Tun, M. Alsenwi, M. Imran, Z. Han, and C. S. Hong,
clustering of local updates to improve training on non-iid data,” in 2020
“A dispersed federated learning framework for 6g-enabled autonomous
International Joint Conference on Neural Networks (IJCNN). IEEE,
driving cars,” IEEE Transactions on Network Science and Engineering,
2020, pp. 1–9.
2022.
[160] H. Bousbiat, Y. Himeur, I. Varlamis, F. Bensaali, and A. Amira, “Neural
[139] Y. Li, X. Tao, X. Zhang, J. Liu, and J. Xu, “Privacy-preserved federated load disaggregation: Meta-analysis, federated learning and beyond,”
learning for autonomous driving,” IEEE Transactions on Intelligent Energies, vol. 16, no. 2, p. 991, 2023.
Transportation Systems, vol. 23, no. 7, pp. 8423–8434, 2021. [161] S. Pouriyeh, O. Shahid, R. M. Parizi, Q. Z. Sheng, G. Srivastava,
[140] I. Donevski, J. J. Nielsen, and P. Popovski, “On addressing heterogeneity L. Zhao, and M. Nasajpour, “Secure smart communication efficiency
in federated learning for autonomous vehicles connected to a drone in federated learning: Achievements and challenges,” Applied Sciences,
orchestrator,” Frontiers in Communications and Networks, vol. 2, p. vol. 12, no. 18, p. 8980, 2022.
709946, 2021. [162] L. Lyu, H. Yu, X. Ma, L. Sun, J. Zhao, Q. Yang, and P. S. Yu, “Privacy
[141] A. M. Elbir, B. Soner, S. Çöleri, D. Gündüz, and M. Bennis, “Federated and robustness in federated learning: Attacks and defenses,” arXiv
learning in vehicular networks,” in 2022 IEEE International Mediter- preprint arXiv:2012.06337, 2020.
ranean Conference on Communications and Networking (MeditCom). [163] S. Shen, S. Tople, and P. Saxena, “Auror: Defending against poisoning
IEEE, 2022, pp. 72–77. attacks in collaborative deep learning systems,” in Proceedings of the
[142] D. Jallepalli, N. C. Ravikumar, P. V. Badarinath, S. Uchil, and M. A. 32nd Annual Conference on Computer Security Applications, 2016, pp.
Suresh, “Federated learning for object detection in autonomous vehicles,” 508–519.
in 2021 IEEE Seventh International Conference on Big Data Computing [164] S. Andreina, G. A. Marson, H. Möllering, and G. Karame, “Baffle:
Service and Applications (BigDataService). IEEE, 2021, pp. 107–114. Backdoor detection via feedback-based federated learning,” in 2021
[143] K. Xie, Z. Zhang, B. Li, J. Kang, D. Niyato, S. Xie, and Y. Wu, IEEE 41st International Conference on Distributed Computing Systems
“Efficient federated learning with spike neural networks for traffic sign (ICDCS). IEEE, 2021, pp. 852–863.
recognition,” IEEE Transactions on Vehicular Technology, vol. 71, [165] Y. Qi, M. S. Hossain, J. Nie, and X. Li, “Privacy-preserving blockchain-
no. 9, pp. 9980–9992, 2022. based federated learning for traffic flow prediction,” Future Generation
[144] F. Sattler, S. Wiedemann, K.-R. Müller, and W. Samek, “Robust and Computer Systems, vol. 117, pp. 328–337, 2021.
communication-efficient federated learning from non-iid data,” IEEE [166] C. He, A. D. Shah, Z. Tang, D. F. N. Sivashunmugam, K. Bhogaraju,
transactions on neural networks and learning systems, vol. 31, no. 9, M. Shimpi, L. Shen, X. Chu, M. Soltanolkotabi, and S. Avestimehr,
pp. 3400–3413, 2019. “Fedcv: a federated learning framework for diverse computer vision
[145] F. Haddadpour, M. M. Kamani, A. Mokhtari, and M. Mahdavi, tasks,” arXiv preprint arXiv:2111.11066, 2021.
“Federated learning with compression: Unified analysis and sharp [167] J. Luo, X. Wu, Y. Luo, A. Huang, Y. Huang, Y. Liu, and Q. Yang,
guarantees,” in International Conference on Artificial Intelligence and “Real-world image datasets for federated learning,” arXiv preprint
Statistics. PMLR, 2021, pp. 2350–2358. arXiv:1910.11089, 2019.
[146] W. Luping, W. Wei, and L. Bo, “Cmfl: Mitigating communication [168] S. S. Sohail, F. Farhat, Y. Himeur, M. Nadeem, D. Ø. Madsen, Y. Singh,
overhead for federated learning,” in 2019 IEEE 39th international S. Atalla, and W. Mansoor, “Decoding chatgpt: A taxonomy of existing
conference on distributed computing systems (ICDCS). IEEE, 2019, research, current challenges, and possible future directions,” Journal of
pp. 954–964. King Saud University-Computer and Information Sciences, p. 101675,
[147] I. Varlamis, C. Sardianos, C. Chronis, G. Dimitrakopoulos, Y. Himeur, 2023.
A. Alsalemi, F. Bensaali, and A. Amira, “Using big data and [169] F. Farhat, E. S. Silva, H. Hassani, D. Ø. Madsen, S. S. Sohail, Y. Himeur,
federated learning for generating energy efficiency recommendations,” M. A. Alam, and A. Zafar, “Analyzing the scholarly footprint of chatgpt:
International Journal of Data Science and Analytics, pp. 1–17, 2022. mapping the progress and identifying future trends,” 2023.
[148] M. Tang, X. Ning, Y. Wang, J. Sun, Y. Wang, H. Li, and Y. Chen, [170] S. S. Sohail, F. Farhat, Y. Himeur, M. Nadeem, D. Ø. Madsen, Y. Singh,
“Fedcor: Correlation-based active client selection strategy for heteroge- S. Atalla, and W. Mansoor, “The future of gpt: A taxonomy of existing
neous federated learning,” in 2022 IEEE/CVF Conference on Computer chatgpt research, current challenges, and possible future directions,”
Vision and Pattern Recognition (CVPR), 2022, pp. 10 092–10 101. Current Challenges, and Possible Future Directions (April 8, 2023),
2023.
[149] L. Qu, Y. Zhou, P. P. Liang, Y. Xia, F. Wang, E. Adeli, L. Fei-
[171] S. S. Sohail, D. Ø. Madsen, Y. Himeur, and M. Ashraf, “Using chatgpt
Fei, and D. Rubin, “Rethinking architecture design for tackling data
to navigate ambivalent and contradictory research findings on artificial
heterogeneity in federated learning,” in Proceedings of the IEEE/CVF
intelligence,” Available at SSRN 4413913, 2023.
Conference on Computer Vision and Pattern Recognition, 2022, pp.
10 061–10 071.
[150] M. Mendieta, T. Yang, P. Wang, M. Lee, Z. Ding, and C. Chen, “Local
learning matters: Rethinking data heterogeneity in federated learning,”
in Proceedings of the IEEE/CVF Conference on Computer Vision and
Pattern Recognition, 2022, pp. 8397–8406.
[151] H. Bousbiat, R. Bousselidj, Y. Himeur, A. Amira, F. Bensaali, F. Fadli,
W. Mansoor, and W. Elmenreich, “Crossing roads of federated learning
and smart grids: Overview, challenges, and perspectives,” arXiv preprint
arXiv:2304.08602, 2023.
[152] M. Domı́nguez-Morales, J. P. Domı́nguez-Morales, Á. Jiménez-
Fernández, A. Linares-Barranco, and G. Jiménez-Moreno, “Stereo
matching in address-event-representation (aer) bio-inspired binocular
systems in a field-programmable gate array (fpga),” Electronics, vol. 8,
no. 4, p. 410, 2019.
[153] O. Azzouzi, M. Anane, M. Koudil, M. Issad, and Y. Himeur, “Novel