Federated Learning for Computer Vision


Yassine Himeur∗ , Iraklis Varlamis† , Hamza Kheddar‡ , Abbes Amira§ , Shadi Atalla∗ , Yashbir Singh∥ , Faycal
Bensaali‡ and Wathiq Mansoor∗
∗ College of Engineering and Information Technology, University of Dubai, Dubai, UAE
† Department of Informatics and Telematics, Harokopio University of Athens, Greece
‡ LSEA Laboratory, Electrical Engineering Department, University of Medea, Algeria
§ Department of Computer Science, University of Sharjah, UAE
¶ Institute of Artificial Intelligence, De Montfort University, Leicester, United Kingdom
∥ Department of Radiology, Mayo Clinic, Rochester, MN, USA
∥ Electrical Engineering Department, Qatar University, Qatar

arXiv:2308.13558v1 [cs.CV] 24 Aug 2023

Abstract—Computer Vision (CV) is playing a significant role in transforming society by utilizing machine learning (ML) tools for a wide range of tasks. However, the need for large-scale datasets to train ML models creates challenges for centralized ML algorithms. The massive computation loads required for processing and the potential privacy risks associated with storing and processing data on central cloud servers put these algorithms under severe strain. To address these issues, federated learning (FL) has emerged as a promising solution, allowing privacy preservation by training models locally and exchanging them to improve overall performance. Additionally, the computational load is distributed across multiple clients, reducing the burden on central servers. This paper presents, to the best of the authors' knowledge, the first review discussing recent advancements of FL in CV applications, comparing them to conventional centralized training paradigms. It provides an overview of current FL applications in various CV tasks, emphasizing the advantages of FL and the challenges of implementing it in CV. To facilitate this, the paper proposes a taxonomy of FL techniques in CV, outlining their applications and security threats. It also discusses privacy concerns related to implementing blockchain in FL schemes for CV tasks and summarizes existing privacy preservation methods. Moving on, the paper identifies open research challenges and potential future research directions to further exploit the potential of FL and blockchain in CV applications.

Index Terms—Computer vision (CV), Federated learning (FL), Blockchain, Horizontal federated learning (HFL), Vertical federated learning (VFL).

I. INTRODUCTION

In recent years, the emergence and evolution of machine learning (ML) has carved out a distinct niche for itself within the broader realm of artificial intelligence (AI). Specifically, ML emphasizes the design, development, and training of algorithms and software agents to sift through, interpret, and make data-driven decisions based on patterns hidden within vast datasets [1]. These artificial agents are increasingly being equipped with multifaceted sensory capabilities that allow them to interact more intuitively with their environment. Among these, vision stands out as the most profound, granting machines the ability to 'see' and 'understand' the world around them in ways comparable to human cognition [2]. Given the intrinsic importance of visual data, computer vision (CV) has rapidly risen to prominence within the ML community. CV seeks to enable machines to interpret and derive meaningful conclusions from visual data, mimicking the human ability to recognize, process, and respond to visual stimuli [3]. This capability not only empowers artificial agents to better grasp their surroundings but also paves the way for more informed decision-making processes [4]. By incorporating CV techniques, artificial agents can move beyond mere data analysis, delving into a more profound comprehension of the visual realm [5]. The integration of such insights holds the potential to revolutionize various domains by enhancing the depth of understanding and the efficacy of decision-making processes in AI systems [6], [7].

Understanding the contents and concepts of an image involves a significant amount of information, which is connected with image segmentation, extraction of features and objects, and synthesis of the scene as a whole [8]. Although the CV and perception task is reflexively performed by humans, thanks to their ability for abstraction, it is still quite complex for artificial agents [9]. It is also hard for humans to explain how they perceive the world and to develop ML modules that behave in a similar manner [10]. The four main techniques that are employed for training machines to perform CV tasks are based either on statistics (i.e., on patterns learned from large training datasets), on logic expressed in the form of rules, on deep neural networks (DNNs) that capture the non-linear relations between image features and the final decision, or on genetic and evolutionary algorithms that combine multiple decisions in order to find the one that maximizes the overall performance [11], [12].

The huge advancements and impressive results in the performance of CV algorithms in tasks such as image classification, image segmentation, object detection, and scene perception came from a shift from signal processing methods to solutions that rely on deep learning (DL) methods [13]. The biggest challenge for DL methods is that they require huge computational resources and energy for training, which in turn makes them hardly applicable in many application setups where decisions have to be taken on the edge, using low resources and limited power (e.g., in drones, mobile phones, etc.) [14]. For this purpose, low-power CV challenges have been established and new energy-efficient algorithms and lightweight DL architectures have been proposed, such as reservoir computing [15] or random neural networks [16].

Another challenge for CV systems is the protection of user privacy. Collecting millions of people's images and videos poses serious privacy risks, which must be seriously considered [17], [18]. First of all, the right to be forgotten is constantly violated by companies that collect visual data and retain it permanently or use it for multiple purposes. Also, through numerous flaws, such delicate visual data could be exploited or exposed [19]. Secondly, the privacy-preserving analysis of images and videos from video surveillance applications, where CV methods are used to detect violations (e.g., mask-wearing, distance keeping, etc.) in public spaces, can be quite challenging too [20], [19]. Such videos frequently include inadvertently caught private objects including faces, car plates, computer screens, and more. The ability of methods to detect and even identify humans and other objects in unconstrained environments can put at risk the anonymity of people in monitored places and, if not used properly, can become a threat to citizen privacy [21].

The rise of edge computing architectures has introduced a new potential for privacy-preserving CV, since with the proper use of the limited edge resources, it has become possible to perform basic CV tasks without transferring sensitive data to the cloud [22]. Sharing the trained model rather than releasing the actual data also helps to retain the privacy of the data without losing prediction performance [23]. Due to their built-in privacy-preserving features, federated learning (FL) [24] and split learning (SL) [25] are the two ML techniques that operate on visual data in a distributed manner and have attracted the interest of researchers in the CV field. Such approaches assume a client-server (edge-cloud) architecture, where the clients usually have fewer resources than the server. The clients train their models individually, using their own data, and then exchange their models either with the server or among themselves in order to synchronize what they have learned [26]. The ML models are trained locally on the client (edge) devices and this happens in parallel, as long as the clients receive and process training data. Periodically, they exchange models and aggregate them following simple or more sophisticated strategies, which may involve additional training on the cloud to avoid the bias introduced in the clients [27]. In the case of SL, the training of the model layers is also split between the edge and the cloud. Only a few layers are trained on the edge, in order to facilitate training with little resources, and the remaining layers are trained on the cloud [28].

A risk that emerges from this distributed training process for CV models is the exposure to malicious users that intentionally introduce noise or falsified models in order to bias the FL model for their benefit [29]. Adversarial training techniques can be employed to strengthen the FL models, but even adversarial training has leakages [30]. The use of blockchain technologies can help mitigate many of the threats of data or model-sharing methods. All the model-sharing activities of a node that is allowed to share are stored as transactions in the blockchain, and all information about the providers' profiles is also stored in the blockchain [31]. In this multi-party setup, when a node needs to update its model by using the models of other nodes, it first performs a request for models to its neighbouring nodes, then validates the nodes by checking the blockchain and retrieves their models from the blockchain. The resulting FL model is consequently stored on the blockchain in a new transaction [32].

A. Survey Methodology

FL is emerging as a powerful machine learning paradigm that allows for the training of AI models across numerous decentralized devices while maintaining data privacy. In essence, FL enables devices to collaboratively learn a shared model without having to share raw data, a feature that holds immense potential for applications in diverse domains, particularly in CV. In CV, the applications of FL range from object detection and image classification to semantic segmentation, among others, with an array of potential real-world use cases in sectors such as healthcare, autonomous driving, and surveillance systems. However, the adoption of FL in CV is not without its challenges. Issues related to model performance, communication efficiency, data heterogeneity, and privacy preservation need to be addressed and carefully managed. This systematic literature review aims to delve deep into these challenges, exploring the progress, limitations, and future directions of applying FL in the realm of CV. The review addresses several research questions that encapsulate the core facets of this intersection of FL and CV, which are summarized in Table I.

During our systematic review, we initially collected 912 papers. To ensure the relevance of the literature, we applied basic criteria such as title, abstract, and topic alignment with our research question. We then established detailed inclusion and exclusion criteria to streamline the selection process. The inclusion criteria encompassed papers proposing FL solutions, discussing the applications of FL in CV, implementing techniques, or proposing enhanced versions of FL. Conversely, the exclusion criteria were applied to exclude publications that did not specifically use FL (e.g., FL was only mentioned in the literature review or used for comparison purposes) or that targeted other research sectors. One author took the lead in the selection strategy and conducted the initial screening, ensuring consistency with our research theme. After removing duplicates, we identified 385 unique articles. We then conducted a thorough assessment of the remaining articles by carefully reviewing titles, abstracts, and conclusions. Based on this assessment, we narrowed down the selection to 255 articles that exhibited relevance based on title and abstract. In the subsequent stage, we applied the specified inclusion and exclusion criteria to the remaining articles, leading to the exclusion of certain studies that did not meet our criteria.

During the selection process, the following exclusion criteria were adopted: (i) duplicate records, (ii) papers that did not comment on the performance of FL in CV, (iii) papers related to the implementation of similar but not FL techniques, (iv) papers related to research sectors other than CV, and (v) papers written in languages other than English. Furthermore, the following inclusion criteria were used to select relevant literature: (i) proposes or improves an FL-based solution in CV, (ii) addresses the privacy and security issues in CV using FL techniques, (iii) measures and optimizes model performance in FL, (iv) discusses the deployment of FL systems in real-world CV applications.

By applying these exclusion and inclusion criteria, we ensured that the selected articles provided insights, solutions, or advancements specifically related to FL in CV. This process resulted in a final selection of 28 articles that met our inclusion criteria. In order to ensure a comprehensive review, we conducted a reference scan of the selected articles, which led us to identify an additional 6 relevant papers. Consequently, a total of 34 articles were included in our systematic review.
TABLE I
RESEARCH QUESTIONS COVERED IN THIS REVIEW.

RQ1: What is the primary motivation behind conducting a review of FL for CV?
     Objective: Present the underlying reasons or driving factors for undertaking a comprehensive review specifically on the application of FL in the domain of CV.
RQ2: How is the problem of data privacy handled in FL for CV?
     Objective: Investigate the mechanisms, strategies, or solutions implemented within the FL framework, specifically for CV tasks, to address and ensure data privacy.
RQ3: How is model performance measured and optimized in FL scenarios?
     Objective: Delve into the methods, metrics, and techniques used to evaluate the efficacy of models trained using FL.
RQ4: How are issues related to data distribution across nodes (non-IID data) managed?
     Objective: Understand the strategies, techniques, and solutions employed to handle the challenges posed by non-Independent and Identically Distributed (non-IID) data in decentralized or FL environments.
RQ5: What are the strategies to handle communication efficiency in FL?
     Objective: Discuss the various techniques and methods implemented to optimize or enhance the communication process between nodes or devices in an FL system.
RQ6: What are the approaches to ensure robustness and security in FL?
     Objective: Investigate the various methodologies, techniques, or safeguards that are employed to protect FL systems against potential vulnerabilities, attacks, or failures.
RQ7: How are FL systems for CV deployed in the real world?
     Objective: Explore and understand the practical applications, implementation challenges, infrastructure requirements, and real-world scenarios where FL techniques are used specifically in the domain of CV.
RQ8: What are the challenges and limitations in applying FL to CV tasks?
     Objective: Identify the potential difficulties, drawbacks, and constraints encountered when using FL in the context of CV.
RQ9: What are the trends and future directions in FL for CV?
     Objective: Explore the current trajectories, innovations, and potential advancements in the application of FL to CV.

B. Related reviews

While several studies have recently addressed the topic of FL and its various applications, no work has specifically focused on reviewing the applications of FL in CV. For instance, [33] discusses various applications of FL in communication and networking systems, including wireless sensor networks, IoT, and vehicular networks, and highlights some challenges that need to be addressed to fully realize the potential of FL, such as security concerns, communication overheads, and model heterogeneity. Moving on, [34] provides an extensive examination of FL: it (i) reviews the growing concerns about data privacy and security; (ii) offers design considerations and solutions; (iii) highlights different privacy-preserving approaches utilized in FL; and (iv) suggests potential future paths. Similarly, [35] discusses (i) the key enabling technologies and protocols used in FL, such as secure aggregation, differential privacy, homomorphic encryption, etc.; and (ii) the challenges faced by FL, such as communication overheads, heterogeneity of devices and data, privacy concerns, and security issues.

Besides, [36] provides a thorough exploration of the increasing applications of FL in IoT networks. It begins by acknowledging the limitations of centralized AI methods reliant on data collection and privacy concerns. FL, as a collaborative and distributed AI approach, offers a solution for intelligent IoT applications without data-sharing needs. The authors showcase various use cases in healthcare, transportation, etc., where FL can be utilized in IoT networks. Additionally, they address the challenges and potential opportunities linked with FL. Nguyen et al. [36] discuss IoT-based FL and its applications and summarize the technical aspects, key contributions, and limitations of each reference work. Moreover, [37] introduces a novel classification of FL subjects and research domains and encompasses comprehensive taxonomies that address various challenging aspects, contributions, and trends in the literature. Additionally, that survey delves into significant challenges and potential research paths for developing more robust FL systems.

C. Contribution

By contrast to the above-mentioned reviews, this study presents, to the best of the authors' knowledge, the first review of FL-based contributions in CV. Typically, it (i) offers a comparative perspective by presenting related reviews, thereby situating its contributions in the broader scholarly conversation; (ii) provides foundational knowledge about FL, including its definition and problem formulation; (iii) discusses different aggregation methods, covering averaging aggregation, progressive Fourier aggregation, and FedGKT aggregation; (iv) presents a detailed overview of privacy technologies associated with FL, including the secure MPC model, differential privacy, and homomorphic encryption; and (v) distinguishes and elaborates on supervised, unsupervised, and semi-supervised FL and evaluates CV schemes based on FL.

Moreover, it offers an in-depth look at how FL is employed in various CV tasks such as object detection, video surveillance, healthcare, autonomous driving, and more. Moving on, it addresses significant challenges and concerns in the FL domain, such as communication overhead, device compatibility, and human bias. Additionally, by pinpointing the specific obstacles that researchers and practitioners might face, the review contributes to the growing body of knowledge about the intersection of FL and CV. Lastly, beyond its retrospective lens, the review also peers into the future, suggesting potential avenues of research and development in the field.

II. BACKGROUND OF FL

FL enables the training of AI models without the sharing of training data. While most AI has been trained on data gathered and crunched in a unique repository, today's ML algorithms are shifting towards a decentralized scheme. ML models can collaboratively be trained on edge devices, e.g., mobile phones, laptops, or private servers.

A. Definition

By overviewing recent studies of FL frameworks proposed in CV applications, FL models can be categorized into three types:
Horizontal FL (HFL): refers to training a shared overall model by the clients using their datasets, which are characterized by the same feature space but have different sample spaces, as portrayed in Fig. 1. In this regard, local FL participants can adopt the same AI model (e.g., a neural network, NN) for training their datasets. Afterwards, the server combines the local updates communicated by the local clients to develop the overall update without accessing local data [38].

Let us consider an HFL scenario with N parties. Each party i, where i ∈ {1, 2, . . . , N}, has a local dataset denoted by D_i. Each D_i consists of a set of input data points X_i and corresponding labels Y_i. The goal is to train a global model denoted by f, which can generalize well on all parties' datasets. The model f is typically represented as a parameterized function, such as a neural network, with trainable parameters denoted by θ. The training process in HFL typically involves the following steps: (1) The process starts with the initialization of local model parameters θ_i for each participating party i. (2) Then, each party performs local training on its own dataset D_i using its local model parameters θ_i. During local training, a local loss function L_i(θ_i) is optimized to minimize the discrepancy between the local model's predictions and the corresponding labels in D_i. The aim is to update the local model parameters θ_i and reduce the local loss L_i. (3) After local training, the parties encrypt their gradients calculated with respect to their local loss functions. The encrypted gradients ensure that the parties' local updates remain private. The encrypted gradients are then sent to a trusted aggregator or a central server for further processing. (4) The trusted aggregator or central server performs secure aggregation on the encrypted gradients received from the parties. Secure aggregation techniques, such as secure multiparty computation (MPC) or homomorphic encryption, are employed to aggregate the gradients while maintaining privacy. The aggregator can compute the average of the encrypted gradients or perform more sophisticated aggregation schemes, such as averaging the parameters θ_i, to obtain the global model parameters θ. (5) Once the secure aggregation is completed, the aggregated gradients are decrypted by the trusted aggregator or central server. The global model parameters θ are then updated using the decrypted aggregated gradients. This update ensures that the global model benefits from the collective knowledge of all parties while preserving the privacy of individual data.

The mathematical formulation of the HFL objective typically involves minimizing a global loss function L(θ) over all parties' datasets. This can be expressed as:

    L(θ) = Σ_{i=1}^{N} L_i(θ_i) + λ · R(θ),        (1)

where Σ_{i=1}^{N} L_i(θ_i) represents the sum of local loss functions for all parties, λ is a regularization parameter, and R(θ) represents a regularization term that helps prevent overfitting and encourages model simplicity.

Vertical FL (VFL): as illustrated in Fig. 2, VFL trains ML models on datasets having the same sample space but different feature spaces. In this context, techniques that rely on entity alignment can be used in combination with encryption to overcome the problem of data sample overlapping at distributed clients during the local training [39]. With VFL, different clients can cooperatively train their models without sharing their data (which differ in features), by only sharing their predictions for the samples they have in common. Let us consider a scenario with two parties, party A and party B. Party A holds a dataset with features X_A and labels Y_A, while party B holds a dataset with features X_B and labels Y_B. The goal is to jointly train a model that leverages the complementary information from both parties' datasets without sharing their raw data. The training process in VFL typically involves the following steps: (1) The datasets at party A and party B need to be pre-processed to ensure compatibility and feature alignment, through data transformation, feature mapping, etc. (2) Initialize the parameters θ_A, θ_B, and θ of the local and global models. (3) Each party independently performs local training on its own dataset using its local model parameters θ_A and θ_B. Party A uses X_A and Y_A, while party B uses X_B and Y_B. Local loss functions L_A(θ) and L_B(θ) are optimized to update the local model parameters and reduce the discrepancies between the local model's predictions and the corresponding labels in each party's dataset. Steps (4), (5), and (6) are similar to steps (3), (4), and (5) of HFL. The mathematical formulation of L(θ) in VFL, for N parties, remains the same as in the HFL case (Equation 1).

Federated transfer learning (FTL): handles datasets with limited overlap both in the sample space and in the feature space, as shown in Fig. 3. Unlike VFL systems, the number of common samples, and thus the labels that can be shared across the models, is small, which limits their reusability across the different models. A common transfer space is used for mapping the different feature spaces and exchanging knowledge. Specifically, adopting transfer learning methods helps calculate feature values from distinct feature spaces, which are then utilized for training local datasets [40], [10].

Let us consider a scenario with N parties, where each party i, where i ∈ {1, 2, . . . , N}, has a local dataset denoted by D_i and a task denoted by T_i. The goal in FTL is to jointly train a global model that leverages the knowledge learned from the different tasks. The training process in FTL typically involves the following steps: (1) Each party i performs local training on its own D_i with task-specific model parameters θ_i. During this step, the local model optimizes a task-specific loss function L_i(θ_i) to minimize the discrepancy between the local model's predictions and the corresponding labels in D_i. (2) The parties communicate and align their models (shared layers or parameters) to create a joint model architecture. (3) Knowledge is transferred from party i to the global model, denoted K_i(θ); this can occur through parameter sharing or feature extraction. Steps (4) and (5) remain the same as in HFL. The global loss function L(θ) in FTL typically includes a combination of task-specific loss terms and transfer-related terms.

In smart healthcare, FTL can support disease diagnosis by collaborating across countries with multiple hospitals that have different patients (sample space) and different monitoring and therapeutic programs (feature space). In this way, FTL can enrich the shared AI model output to improve the accuracy of diagnosis.
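To make the HFL steps above concrete, a simplified sketch of one training round is given below. It omits the encryption and secure-aggregation machinery of steps (3)-(5) and simply averages the locally trained parameters in the clear, which corresponds to the plain FedAvg special case; the toy model, the optimizer settings, and the client_loaders variable are illustrative assumptions rather than elements of any surveyed framework.

```python
import copy
import torch
from torch import nn

def local_training(global_model, loader, epochs=1, lr=0.01):
    """Step (2): one party minimizes its local loss L_i(theta_i) on its dataset D_i."""
    model = copy.deepcopy(global_model)
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(epochs):
        for x, y in loader:
            opt.zero_grad()
            loss_fn(model(x), y).backward()
            opt.step()
    return model.state_dict()

def aggregate(states):
    """Steps (4)-(5), simplified: unweighted parameter averaging in the clear."""
    avg = copy.deepcopy(states[0])
    for key in avg:
        avg[key] = torch.stack([s[key].float() for s in states]).mean(dim=0)
    return avg

# Step (1): initialize a shared global model (a toy image classifier here).
global_model = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 10))

def federated_round(global_model, client_loaders):
    """One synchronous round over all parties; client_loaders holds one DataLoader per party."""
    states = [local_training(global_model, dl) for dl in client_loaders]
    global_model.load_state_dict(aggregate(states))
    return global_model
```

In a privacy-preserving deployment, the state dictionaries returned by local_training would be encrypted or masked before leaving each party, and aggregate would run inside a secure-aggregation protocol rather than on plaintext parameters.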
Fig. 1. Architecture of HFL used in CV. (The figure shows N clients, each training a local model, Model 1 to Model N, on its own database, Database 1 to Database n, and a central aggregator that produces the global model; the numbered steps are: 1 Initialization, 2 Local Training, 3 Encrypt and Send Gradients, 4 Secure Aggregation, 5 Decrypt and Update Global Model.)

Collaborative learning (CL): CL refers to a cooperative approach in which multiple entities actively participate and share their data and model updates to collectively train a shared model. It involves a collaborative effort where participants collaborate and contribute to the model's training by sharing their local knowledge and insights [41]. CL, within the context of FL, specifically refers to the scenario where participants actively collaborate and exchange data and model updates among themselves to collectively improve the shared model [42]. In this sense, CL represents a specific instance of FL that emphasizes the collaborative nature of model training, where participants engage in direct data and model parameter sharing with each other, while still adhering to the principles of decentralized data ownership and privacy preservation characteristic of FL [43]. CL can be justified as a form of FL based on the following key aspects:

• Data Distribution: In CL, data is distributed across multiple entities or participants who collaborate to train a shared model. Similarly, FL involves training a model on decentralized data stored across multiple devices or servers, where each device contributes its local data.
• Local Model Updates: In both CL and FL, individual participants or devices perform local model updates using their respective data. These updates capture the specific patterns and characteristics present in their local data.
• Aggregation and Collaboration: Both approaches involve aggregating the local model updates from multiple participants or devices. In CL, participants typically share their model updates directly with each other, whereas in FL, the updates are aggregated on a central server or coordinator.
• Privacy and Security: CL and FL prioritize privacy and security. They aim to protect sensitive data by keeping it decentralized and only exchanging model updates or gradients instead of raw data. This ensures that participants' data remains private while enabling collaborative model training.
• Iterative Improvement: Both methods involve iterative improvement, where the shared model is updated based on the aggregated information from all participants. The process of local updates, aggregation, and model refinement is repeated until convergence is achieved.

While CL and FL share similarities, there are also some key differences between the two approaches:

• Data Ownership: In CL, all participants typically have access to and ownership of the entire dataset. They actively share and collaborate on the data and model updates. In contrast, FL operates on decentralized data, where each participant retains ownership and control over their local data, which is not directly accessible to other participants.
• Centralization vs. Decentralization: CL tends to have a more centralized approach, where participants directly communicate and share their updates with each other. FL, on the other hand, employs a decentralized approach, where local updates are sent to a central server or coordinator for aggregation and model updates.
Fig. 2. Architecture of VFL used in CV. (The figure shows two clients, A and B, holding different feature sets, with no raw data exchanged between them; A denotes sending the public key and B the results to be exchanged after encrypted data alignment; the numbered steps are: 1 Data Preprocessing, 2 Initialization, 3 Local Training, 4 Encrypt and Send Gradients, 5 Secure Aggregation, 6 Decrypt and Update Global Model; encrypted models trained on each side are combined by the aggregator into the global model.)

• Communication Overhead: In CL, participants typically need to establish direct communication channels to exchange data and model updates. This can require significant communication overhead, especially when the number of participants is large. FL reduces this communication overhead by relying on a central server or coordinator that facilitates the aggregation process.
• Privacy and Security Focus: While both approaches prioritize privacy and security, FL places a stronger emphasis on data privacy. FL minimizes the exposure of raw data by exchanging only model updates or gradients between participants and the central server. CL may involve more direct sharing of data or model parameters, which can introduce potential privacy risks.
• Scalability: FL is particularly suitable for large-scale distributed environments with a large number of participants or devices. Its decentralized nature allows for scalability and efficient training across a vast network of devices. CL may face challenges in terms of scalability, as direct communication and coordination among all participants become increasingly complex with a larger number of entities.

These differences highlight the varying degrees of decentralization, data ownership, communication, and scalability between CL and FL. While they share common principles, the specific implementation and focus of each approach can differ based on the context and requirements of the collaborative training scenario.

B. Problem formulation

Konečný et al.'s work has made FL well-known, but there are other definitions of the concept available in the literature [44]. FL can be realized through various topologies and compute plans despite a shared objective of combining knowledge from non-co-located data.
Fig. 3. Architecture of FTL used in CV. (The figure shows N clients, each with its own database, local model, and target model, coordinated by an aggregator that produces the global model; transfer learning is applied on the client side and federated learning across clients; the numbered steps are: 1 Task-Specific Pre-training, 2 Model Alignment, 3 Knowledge Transfer to the Global Model, 4 Secure Aggregation, 5 Decrypt and Global Model Training.)

This section aims to provide a detailed explanation of what FL is, while also highlighting the significant challenges and technical considerations that arise when using FL in CV.

The goal of FL is generally to build a global statistical model using data from many remote devices, which can range from tens to millions in number. This process involves minimizing an objective function that captures the desired characteristics of the model:

    min_w F(w),   where   F(w) := Σ_{k=1}^{m} p_k F_k(w),        (2)

where m is the overall number of devices/centers, F_k is the local objective function for the k-th device/center, and p_k represents the relative impact of every device/center, with p_k ≥ 0 and Σ_{k=1}^{m} p_k = 1.

The empirical risk for a local dataset can be expressed using the local objective function F_k. The relative weight or influence of each device, represented by p_k, can be set by the user. There are two common choices for setting p_k: p_k = 1/m or p_k = n_k/n, where n is the total number of samples from all devices. Although this is a common FL objective, there are other alternatives, such as multi-task learning [45], where related local models are learned simultaneously and each client corresponds to a task. Both multi-task learning and meta-learning allow for personalized or device-specific modeling, which can be a useful approach to handle the statistical heterogeneity of the data in FL. While FL and classical distributed learning both aim to minimize the empirical risk across distributed entities, there are several challenges that must be addressed when implementing this objective in a federated setting.

C. Aggregation approaches in FL

Following local training, the models are combined using an aggregation algorithm. To be more specific, each model takes an update step in its respective center k, utilizing a learning rate η and the gradients g_k, so that [46]:

    w^k_{t+1} = w_t − η · g_k,   ∀k        (3)

Subsequently, the weights are gathered into the global model in a manner that is relative to the sample size of each center [46]:

    W_{t+1} = Σ_{k=1}^{m} (n_k / n) · w^k_{t+1}        (4)

There are numerous FL aggregation techniques in the literature. In the following, we discuss some of the most useful techniques specifically applicable to CV-based FL.
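To illustrate Eqs. (3) and (4), the following minimal sketch performs one client update step and one sample-size-weighted aggregation over parameter vectors stored as NumPy arrays; the gradients, sample counts, and learning rate are toy values chosen only for illustration.

```python
import numpy as np

def client_step(w_t, grad_k, lr=0.1):
    # Eq. (3): w^k_{t+1} = w_t - eta * g_k for center k.
    return w_t - lr * grad_k

def global_aggregate(client_weights, sample_counts):
    # Eq. (4): W_{t+1} = sum_k (n_k / n) * w^k_{t+1}.
    n = float(sum(sample_counts))
    return sum((n_k / n) * w_k for w_k, n_k in zip(client_weights, sample_counts))

w_t = np.zeros(4)                          # current global parameters
grads = [np.ones(4), 2 * np.ones(4)]       # per-center gradients g_k (toy values)
counts = [100, 300]                        # per-center sample sizes n_k
W_next = global_aggregate([client_step(w_t, g) for g in grads], counts)
```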
Fig. 4. Taxonomy of FL frameworks for CV applications. (The taxonomy organizes the surveyed literature along the following axes.)
● Data partition: vertical FL, horizontal FL, federated transfer learning.
● Learning process: unsupervised (federated k-means, federated PCA), supervised (federated NN, federated SVM, federated DT, federated linear models), semi-supervised (FSSL, few-shot, self-supervised).
● Privacy level: local and global differential privacy, blockchain, secure aggregation, cryptographic techniques (homomorphic encryption, MPC).
● Application: object detection, video surveillance, face detection, medical AI, autonomous driving.
● Aggregation techniques: averaging and weighted variants (federated averaging), trimmed mean, Krum-based, majority voting, quantization and compression.
● Collaborative learning: data is distributed; entities share their model updates directly with each other; access to the entire dataset; a more centralized approach.
● Open challenges: communication overhead, heterogeneity of client nodes, non-IID data, device compatibility, network issues, human bias, privacy and robustness to attacks, performance, replication of research results.
● Future directions: deployment over heterogeneous environments, efficient communication, dispersed FL, new organizational models.

These techniques include:

1) Averaging aggregation: There are many federated averaging (FedAvg) aggregation strategies in the literature. The FedAvg algorithm is thoroughly detailed in [33]. Fig. 5 provides a summary of the main differences, characteristics, and principles among the alternatives.

2) Progressive Fourier aggregation: The paper [47] addressed the retrogress and class-imbalance problems using a personalized FL approach. Precisely, for better integrating the parameters of client models in the frequency domain, a progressive Fourier aggregation (PFA) is used at the server. Next, a deputy-enhanced transfer (DET) strategy is designed at the clients' side to easily share overall knowledge with the personalized local model. In particular, the approach involves the development of PFA at the server, which ensures a stable and efficient gathering of global knowledge. This is achieved by gradually integrating client models from low-frequency to high-frequency components. Additionally, the authors introduce a deputy model at the client's end to receive the aggregated server model. This facilitates the implementation of the DET strategy, which follows three types of decisions (d_k): Recover-Exchange-Sublimate. These steps aim to enhance the personalized local model by smoothly transferring global knowledge. In this process, the advantage of utilizing the Fast Fourier Transform (FFT) is to obtain both the amplitude map and the phase map of the parameters (conversion from parameters to map). The inverse FFT (IFFT) is used for the inverse operation (conversion from map to parameters). Fig. 6 illustrates the principle of the PFA algorithm, which can be employed as an aggregation technique in FL.
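The low-frequency-first idea behind PFA can be illustrated with the rough sketch below. It is not the exact procedure of [47], which operates progressively on amplitude maps of reshaped convolutional kernels; it merely shows how the low-frequency components of flattened client parameters could be averaged across clients while each client keeps its high-frequency, client-specific components. The cutoff ratio is an illustrative assumption.

```python
import numpy as np

def pfa_like_aggregate(client_params, low_freq_ratio=0.1):
    """Share averaged low-frequency components across clients; keep the rest local.

    client_params: list of 1-D NumPy arrays (flattened model parameters),
    all of the same length. Returns one personalized vector per client.
    """
    n = len(client_params[0])
    spectra = [np.fft.rfft(p) for p in client_params]
    cutoff = max(1, int(low_freq_ratio * len(spectra[0])))
    low_avg = np.mean([s[:cutoff] for s in spectra], axis=0)
    merged = []
    for s in spectra:
        s = s.copy()
        s[:cutoff] = low_avg                 # aggregated global knowledge
        merged.append(np.fft.irfft(s, n=n))  # back to parameter space
    return merged
```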
Fig. 5. Overview of key federated averaging aggregation approaches. (The figure compares six variants.)
● FedAvg (statistical-oriented). Principle: the clients execute several batch updates on their local data and then send the updated weights, instead of the gradients, to the server. Characteristics: it has been demonstrated to diverge in scenarios where the data is not identically distributed among clients, and it prevents participating devices from performing varying amounts of local work based on their underlying system constraints.
● FedMA (statistical-oriented). Principle: before aggregating, the permutation invariance of neurons is considered, which enables updating the global model's size. Characteristics: the central model's size can be adapted to the diversity of the data distribution through a Bayesian non-parametric technique; this mechanism is, however, susceptible to model poisoning attacks, wherein adversaries manipulate the system to expand the global model so that it lodges poisoned local models.
● Turbo-Aggregate (communication- and security-oriented). Principle: a multi-group approach in which clients are categorized into multiple groups and model updates are shared among these groups in a cyclic fashion; an additive secret-sharing mechanism safeguards the privacy of clients' data. Characteristics: highly appropriate for wireless topologies, where network conditions and user availability can change rapidly; its secure aggregation handles user dropouts but lacks adaptability for new users, and extending it with a self-configurable protocol could accommodate new users on the go by adjusting system specifications such as the encoding configuration and clustering for resilience and privacy guarantees.
● FedPAQ (communication-oriented). Principle: clients perform multiple local updates on the model before sharing those updates with the server. Characteristics: the updated global model is calculated as the average of the local models, leading to significant complexity in both strongly convex and non-convex scenarios.
● FedProx (statistical-oriented). Principle: a proximal term is added to the local training sub-problem on every client device to restrict the impact of each local model update on the global model. Characteristics: it aims to enhance convergence on statistically heterogeneous data; during the global aggregation phase, all devices are given equal weight, without considering variations in device capabilities.
● HierFAVG (communication-oriented). Principle: a hierarchical client-edge-cloud aggregation architecture is implemented, where edge servers aggregate model updates from their respective clients and subsequently send them to the cloud server for global aggregation. Characteristics: this multi-tier arrangement facilitates a more streamlined model exchange within the existing edge-cloud architecture, enhancing efficiency, but it is susceptible to device stragglers and end-device drop-outs.
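Among the variants summarized in Fig. 5, FedProx changes only the local objective. A minimal PyTorch sketch of such a proximal term is given below; the coefficient mu is an illustrative assumption, and task_loss stands for the usual supervised loss on the client's mini-batch.

```python
import torch

def fedprox_local_loss(model, global_params, task_loss, mu=0.01):
    """FedProx local objective: task loss + (mu / 2) * ||w - w_global||^2.

    global_params: list of tensors holding the server model's parameters,
    captured at the start of the round and kept fixed during local training.
    """
    prox = sum(((w - w_g.detach()) ** 2).sum()
               for w, w_g in zip(model.parameters(), global_params))
    return task_loss + 0.5 * mu * prox
```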

3) FedGKT aggregation: FL group knowledge transfer (FedGKT) represents a streamlined FL approach tailored to edge devices with limited resources. Its primary objective is to combine the advantages of FedAvg and split learning (SL) by employing local stochastic gradient descent (SGD) training, similar to FedAvg, while simultaneously ensuring a low computational burden at the edge, akin to SL. Notably, FedGKT facilitates the seamless transmission of knowledge from numerous compact edge-trained convolutional neural networks (CNNs) to a significantly larger CNN trained at a cloud server. Fig. 7 illustrates the overview of FedGKT: (i) At the edge device, a compact CNN is employed, comprising a lightweight feature extractor and classifier, efficiently trained using its private data (local training). (ii) Following local training, consensus is reached among all edge nodes to generate uniform tensor dimensions as output from the feature extractor. Subsequently, the larger server model is trained, wherein the extracted features from the edge-side model serve as inputs. The training process employs a knowledge distillation (KD)-based loss function [10] to minimize the disparity between ground-truth and soft-label predictions, the latter representing probabilistic estimations obtained from the edge-side model (periodic transfer). (iii) To enhance the edge model's performance, the server conveys its predicted soft labels to the edge, enabling the edge to conduct further training on its local dataset using a KD-based loss function with the server's soft labels (transfer back). As a result, both the server and edge models mutually benefit from knowledge exchange, leading to performance improvement. (iv) After completion of the training process, the final model is a fusion of the local feature extractor and the shared server model (edge-sided model) [48]. The researchers who proposed FedGKT [48] claimed that it enables cost-effective edge training by significantly reducing computational expenses. In comparison to edge training with FedAvg, FedGKT demands 9 to 17 times less computational power (measured in FLOPs) and requires a substantially smaller number of parameters, ranging from 54 to 105 times fewer.
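The KD-based objective used on both sides of FedGKT combines a cross-entropy term on the ground truth with a soft-label term. The PyTorch sketch below shows one common formulation of such a loss; the temperature and weighting factor are illustrative assumptions rather than the exact settings of [48].

```python
import torch.nn.functional as F

def kd_loss(student_logits, teacher_logits, labels, temperature=2.0, alpha=0.5):
    """Cross-entropy on ground-truth labels plus KL divergence to the soft labels."""
    hard = F.cross_entropy(student_logits, labels)
    soft = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=1),
        F.softmax(teacher_logits / temperature, dim=1),
        reduction="batchmean",
    ) * (temperature ** 2)
    return alpha * hard + (1.0 - alpha) * soft
```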
Fig. 6. Principle of the PFA algorithm. (PFA aggregates the relatively low-frequency components of the client models' parameters, obtained by applying the Fourier transform to each layer's reshaped parameter matrix, so that global knowledge is shared across clients while each client retains its high-frequency, client-specific components.)

Fig. 7. Principle of the FedGKT algorithm.
D. Privacy technologies of FL

FL places a strong emphasis on privacy management, which involves analyzing security models in order to effectively protect personal information. In this section, various technologies that are currently used to safeguard privacy in the context of FL are discussed.

1) Secure MPC model: Secure MPC (SMPC) models involve data from multiple parties and use a well-defined simulation framework to provide safety certification. These models are designed to ensure that there is no interaction or sharing of knowledge data between the parties, meaning that users are unaware of the input and output data as well as any other information [49]. This zero-knowledge model can be considered a form of secure and complex computing protocol that is based on publicly available knowledge. Research has shown that SMPC can be used to create a security model that improves computing efficiency in low-security environments [50]. The MPC protocol enables model training and verification without requiring users to share sensitive data. However, SMPC has certain limitations. It is unable to prevent servers from being curious and may not provide sufficient protection against privacy attacks from other clients. Additionally, the four-round interactive nature of the protocol can lead to wasted data and reduced model accuracy, as the server does not have access to the client data until the submission phase is completed [51].

2) Differential privacy: Differential privacy technology, as indicated by past studies, has been employed as a protective measure for data privacy, where it processes and obscures sensitive attributes. This practice makes it challenging for third parties to identify individual users, thus rendering the data irrecoverable [52], [53]. However, this method necessitates the transfer of data, which can negatively impact data accuracy, indicating a compromise between accuracy and privacy in such scenarios. In addition, some researchers have suggested the incorporation of differential privacy within FL to safeguard client data by masking client contributions during training [54]. For instance, [55] introduced a differentially private FL framework for analyzing histopathology images, which are considered some of the most intricate medical images.
The authors assessed how data distribution, the number of healthcare providers, and individual dataset sizes influenced the framework's performance. They used The Cancer Genome Atlas (TCGA) dataset, an openly accessible repository, to mimic a distributed environment. Furthermore, [56] presented an FL framework capable of generating a global model from distributed health data stored in diverse locations, without the need to move or share raw data. This framework also ensures privacy protection through differential privacy. An evaluation using real electronic health data from a million patients across two healthcare applications indicated the framework's success in delivering considerable privacy without compromising the global model's utility.

Differential privacy can be used in the training of DNNs through the differentially private stochastic gradient descent (DP-SGD) algorithm. [57] introduces deepee, a free and open-source framework for differentially private DL that is compatible with the PyTorch DL framework for medical image analysis. This study uses parallel processing to calculate and modify the gradients for each sample. This process is made efficient through the use of a data structure that stores shared memory references to the neural network weights in order to save memory. It also provides specialized data loading procedures and privacy budget tracking based on the Gaussian differential privacy framework, as well as the ability to automatically modify the user-provided neural network architecture to ensure that it adheres to DP standards. Besides, [58] incorporates FL into the DL of medical image analysis models to enhance the protection of local models and prevent adversaries from inferring private medical data through attacks such as model reconstruction or model inversion. Cryptographic techniques such as masks and homomorphic encryption are utilized. Instead of relying on the size of the datasets, as is commonly done in DL, the contribution rate of each local model to the global model during each training epoch is determined based on the qualities of the datasets owned by the participating entities. Additionally, a dropout-tolerant approach for FL is presented, in which the process is not interrupted if the number of online clients is above a predetermined threshold.

3) Homomorphic encryption: Homomorphic encryption is a type of encryption used in the ML process that uses parameter exchange to protect the privacy of user data. Unlike differential privacy, it does not transmit the data or models themselves and encrypts the data without allowing it to be discovered. This makes it less likely for the original data to be leaked. The additive homomorphic encryption model is the most commonly used version in practice. Besides, despite the benefits of FL, there is a risk that private or sensitive personal data may be exposed through membership attacks when model parameters or summary statistics are shared with a central site. To address this issue, [59] presents a secure FL-based framework that employs fully homomorphic encryption (FHE). In doing so, the Cheon-Kim-Kim-Song (CKKS) construction has been used, which enables the execution of approximate calculations on real and floating-point numbers, benefiting from ciphertext rescaling and packing. Moving on, [60] introduces a homomorphic encryption and blockchain-based privacy-preserving aggregation framework for medical image analysis. This allows hospitals to collaborate and train encrypted federated models while maintaining data privacy, using blockchain ledger technology to decentralize the FL process without the need for a central server. Specifically, homomorphic encryption protects the privacy of the model's gradients. This framework involves training a local model using a capsule network for the segmentation and classification of COVID-19 images, securing the local model with the homomorphic encryption scheme, and sharing the model over a decentralized platform using a blockchain-based FL algorithm.

Fig. 8. The basic setup of a typical FL system with supervised learning involves several steps. Initially, the server picks three clients and transmits the global model w_t^g to them. Subsequently, these clients employ their labeled data to update the global model w_t^g, generating their respective local models. Once this process is completed, the clients send their updated local models back to the server. The server then aggregates these local models into a new global model using the FedAvg algorithm [62].

E. Learning process

1) Supervised FL: Supervised FL trains ML models on sensitive labeled data across multiple devices and learns to predict an output based on the input data. Each device runs the model and updates its parameters based on its local data, and then the updates are aggregated to improve the global model. The supervision is ensured through the presence of labeled data in the participating entities' local datasets, and the learning process can be implemented using support vector machines (SVM) [61], linear models (LMs), neural networks (NN), or decision trees (DT) [34]. Fig. 8 illustrates an example of a typical supervised FL with three clients.

2) Unsupervised FL: Unsupervised FL is a variant of FL in which the participating devices do not have access to labeled data. Instead, they must rely on unsupervised learning techniques, such as clustering or dimensionality reduction, to learn useful representations of the data. This can be useful in scenarios where collecting and annotating labeled data is difficult or expensive, or where the data is highly sensitive and cannot be shared with a centralized server. The learning process can be implemented using federated principal component analysis (PCA) [63], federated k-means [64], etc.
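Federated k-means of the kind cited above can be approximated by a distributed variant of Lloyd's algorithm: the server broadcasts the current centroids, each client returns only per-centroid sums and counts computed on its private points, and the server re-estimates the centroids from these statistics. The following sketch of one such round rests on these assumptions and is not tied to the specific method of [64].

```python
import numpy as np

def client_stats(points, centroids):
    """Assign local points to the nearest centroid; return per-centroid sums and counts."""
    k, d = centroids.shape
    dists = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
    assign = dists.argmin(axis=1)
    sums, counts = np.zeros((k, d)), np.zeros(k)
    for j in range(k):
        mask = assign == j
        sums[j] = points[mask].sum(axis=0)
        counts[j] = mask.sum()
    return sums, counts

def server_update(all_sums, all_counts, old_centroids):
    """Re-estimate centroids from aggregated statistics; raw points never leave the clients."""
    total_sums = np.sum(all_sums, axis=0)
    total_counts = np.sum(all_counts, axis=0)
    new_centroids = old_centroids.copy()
    nonempty = total_counts > 0
    new_centroids[nonempty] = total_sums[nonempty] / total_counts[nonempty, None]
    return new_centroids
```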
12

3) Semi-supervised FL: FL allows training ML algorithms a small amount of labeled private facial expression data to
with a semi-supervised learning (SSL) process on remote train local models on individual devices is proposed. They
datasets without the need to share the data itself. However, are then aggregated in a central server to create a globally
data annotation remains a challenge, especially in fields like optimal model. FL is also used to update the feature extractor
medicine and surgery where specialized knowledge is often network on unlabeled private facial data from user devices to
required. To address this issue, semi-supervised FL has recently learn robust face representations.
been used where the participating devices have access to a
dataset that contains both labeled and unlabeled examples. The F. Evaluation of FL-based CV scheme
labeled examples are used to train a model using supervised
learning techniques, while the unlabeled examples are used to To assess the efficiency of FL systems in CV, it is essential to
learn additional useful features using unsupervised techniques employ metrics that are responsive in gauging the performance
such as clustering or dimensionality reduction. Fig. 9 illustrates of implementing FL across various CV applications. Besides
an example of a typical semi-supervised FL with three clients. the famous metrics that are commonly employed in ML and
In this direction, Kassem et al. [65] propose FedCy, a DL, such as accuracy, F1-score, and area under the curve
federated SSL (FSSL) system that combines FL with self- (AUC) [70], other metrics are summarized in Table II.
supervised learning to improve the performance of surgical
phase recognition using a decentralized dataset containing both III. A PPLICATIONS OF FL IN CV
labeled and unlabeled videos. FedCy uses temporal patterns in FL has diverse applications. It enables privacy-preserving
the labeled data to guide the unsupervised training of unlabeled object detection, face detection, and video surveillance in smart
data towards task-specific features for phase recognition. This environments. It also facilitates advancements in healthcare
scheme outperforms state-of-the-art (SOTA) FSSL methods and medical AI, and plays a crucial role in autonomous driving
on the task of automatically recognizing surgical phases using within CV. Below, the survey provides detailed information
a multi-institutional dataset of laparoscopic cholecystectomy about each of these domains.
videos. Additionally, it learns more generalizable features when
tested on data from an unseen domain. [66] examines the
effectiveness of SOTA video SSL techniques when used in A. Object detection
a large-scale FL setting, as simulated using the kinetics-400 A special CV task is the segmentation of images and more
dataset. The limitations of these techniques in this context are specifically the detection of salient objects in one or multiple
identified before introducing a new federated SSL framework images. Salient object detection (SOD) is an important pre-
for a video called FedVSSL. FedVSSL incorporates various processing step in many CV tasks, such as object detection and
aggregation strategies and partial weight updates and has been tracking, semantic segmentation, and the interaction between
shown through experiments to outperform centralized SOTA by robots and humans (e.g. for tracing human hands during
6.66% on UCF-101 and 5.13% on HMDB-51 in downstream imitation learning scenarios).
retrieval tasks. 1) CL for object detection: CL techniques that combine the
Moving forward, [67] presents a self-supervised privacy merits of multiple feature learning techniques from images
preservation framework for action recognition, namely SPAct. seem to be the most promising technique. The main issue to
It includes three main components: (i) an anonymization handle is the separation of the object from its background,
function, (ii) a self-supervised privacy removal module, and any foreground occlusions, or noise. Other important issues
(iii) an action recognition module. A minimax optimization related to the size and variety of objects to be detected in the
strategy is used to train this framework, which minimizes the same image and the need to track objects across consecutive or
action recognition cost function and maximizes the privacy cost groups of images. For example, the work in [71] emphasized
function through a contrastive self-supervised loss. By using on the use of RGB-D methods that combine RGB images
existing protocols for known action and privacy attributes, this with depth images in order to improve the SOD across
framework achieves a good balance between action recognition multiple images. They proposed a CL framework (CoNet),
and privacy protection, similar to the current SOTA supervised which combines the edge information extracted from low-level
methods. Additionally, a new protocol to test the generalization features of the images with the merits of a spatial attention map
of the learned anonymization function to novel action and that detects salient features in the images and depth images
privacy attributes is introduced. that better locate salient objects in a scene. The three different
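As a rough illustration of the partial weight update idea used by FedVSSL-style aggregation discussed above, the sketch below averages only the parameters of a shared backbone and leaves the remaining parameters local; the parameter-name filter, the uniform client weights, and the function name are assumptions made for this example:

import numpy as np

def aggregate_partial(client_states, weights, prefix="backbone."):
    """FedAvg-style aggregation restricted to parameters whose names match a
    prefix (e.g., the SSL backbone); all other parameters stay on the clients."""
    total = float(sum(weights))
    agg = {}
    for name in client_states[0]:
        if not name.startswith(prefix):
            continue  # personalised / non-shared parameters are skipped
        agg[name] = sum(w / total * s[name] for w, s in zip(weights, client_states))
    return agg

# Toy example with two clients holding a shared backbone and a local head
c1 = {"backbone.conv": np.ones((3, 3)), "head.fc": np.zeros(10)}
c2 = {"backbone.conv": 3 * np.ones((3, 3)), "head.fc": np.ones(10)}
global_update = aggregate_partial([c1, c2], weights=[1, 1])
print(global_update["backbone.conv"][0, 0])  # 2.0 -- only the backbone is averaged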
In [68], FedUTN is proposed which is an FL allowing each collaborators are combined in a knowledge collector module
client to train a model that works well on both independently that first concatenates salient and edge features to jointly learn
and identically distributed (IID) and non-independent and the boundaries and locations of salient objects and then uses
identically distributed (non-IID) data. In this framework, each the depth information to separate salient regions from their
party has two networks, a target network, and an online background. Another work [72] presents a NN architecture for
network. FedUTN uses the online network parameters from content-aware segmentation of sets of images that employs co-
each terminal to update the target network, which is different saliency maps generated from the input images using a group
from the method used in previous studies. FedUTN also CL framework (GCoNet). The proposed method outperforms
introduces a new control algorithm for training. After testing, standard SOD alternatives and is capable of detecting co-
it was found that FedUTN’s method of aggregation is simpler, salient objects in real-time. The main criteria employed are
more effective, and more robust than other FL algorithms, the compactness of the extracted objects within the group of
and outperforms the SOTA algorithm by 0.5%-1.6% under images and the separability of objects from other noise objects
normal conditions. In [69], a few-shot FL framework utilizing in the scenes.
Fig. 9. The structure of our semi-supervised FL system involves a server and three clients, which adhere to the standard FL process to update a global autoencoder $w_t^{ag}$. Next, the server employs the encoder from $w_{t+1}^{ag}$ to encode the labeled dataset $D$ on the server, resulting in a labeled representation dataset $D'$. Afterward, the server utilizes supervised learning with $D'$ to train a classifier $w_t^s$, producing a new classifier $w_{t+1}^s$ [62].
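The server-side step of this semi-supervised pipeline can be sketched as follows; scikit-learn and the random-projection encoder are used purely for illustration and stand in for the aggregated autoencoder and classifier of [62], and the function name is hypothetical:

import numpy as np
from sklearn.linear_model import LogisticRegression

def server_round(encoder, labeled_X, labeled_y):
    """After FedAvg produces a new global autoencoder, the server encodes its
    small labeled set and retrains the classifier on the representations."""
    reps = encoder(labeled_X)               # labeled representation dataset D'
    clf = LogisticRegression(max_iter=500)  # stands in for the classifier w^s
    clf.fit(reps, labeled_y)
    return clf

# Toy stand-in encoder: a fixed random projection
rng = np.random.default_rng(0)
proj = rng.normal(size=(64, 16))
encoder = lambda X: X @ proj

X = rng.normal(size=(200, 64))
y = (X.sum(axis=1) > 0).astype(int)
clf = server_round(encoder, X, y)
print(clf.score(encoder(X), y))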

TABLE II
THE EVALUATION METRICS USED IN THE SUGGESTED FL-BASED CV SCHEMES.

Metric | Math description | Characteristic
AP | $\mathrm{AP} = \frac{1}{N}\sum_{k=1}^{N} P(k)\cdot rel(k)$ | The average precision (AP) is commonly used in information retrieval and object detection tasks. It measures the precision of a model across different levels of recall. $N$ represents the total number of retrieved items, $P(k)$ denotes the precision at the k-th retrieved item, and $rel(k)$ indicates whether the k-th retrieved item is relevant (1 if relevant, 0 if not relevant).
mAP | $\mathrm{mAP} = \frac{1}{N_{class}}\sum_{c=1}^{N_{class}} \mathrm{AP}_c$ | The mean average precision (mAP) is often employed to assess the performance of models that generate multiple predictions or retrieve ranked results. $N_{class}$ represents the total number of classes, and $\mathrm{AP}_c$ denotes the AP for class $c$.
MAE | $\mathrm{MAE} = \frac{1}{H\times W}\sum_{i=1}^{H}\sum_{j=1}^{W} |P(i,j) - G(i,j)|$ | The mean absolute error (MAE) measures the average absolute difference between each pixel in a normalized prediction saliency map $P(i,j)$ and its corresponding pixel in the binary ground truth mask $G(i,j)$. $H$ and $W$ represent the image's height and width.
RMSE | $\mathrm{RMSE} = \sqrt{\frac{1}{M\times N}\sum_{i=1}^{M}\sum_{j=1}^{N} \left(P(i,j) - G(i,j)\right)^2}$ | The root mean squared error (RMSE) is a widely used metric to measure the difference between prediction and ground truth. It calculates the average magnitude of squared differences between corresponding elements in matrices, providing a measure of accuracy.
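For reference, the saliency-oriented metrics of Table II can be computed directly from a prediction map P and a binary ground-truth mask G, as in the short Python sketch below (the array shapes are arbitrary placeholders):

import numpy as np

def mae(P, G):
    """Mean absolute error between a normalized saliency map and the mask."""
    return np.abs(P - G).mean()

def rmse(P, G):
    """Root mean squared error between prediction and ground truth."""
    return np.sqrt(((P - G) ** 2).mean())

P = np.random.rand(240, 320)                        # predicted saliency map in [0, 1]
G = (np.random.rand(240, 320) > 0.5).astype(float)  # binary ground-truth mask
print(mae(P, G), rmse(P, G))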

Moving on, the research work on hand detection and tracking the problem of occlusions, aspect changes and articulations
over long videos [73], [74] focused on the need for fast and that may hinder the proper detection of objects in uncontrolled
long-term object detection even in the presence of temporal scenes, [75] propose a method that first detects object parts
occlusions and changes among frames. In order to tackle these and consequently tries to associate them and detect the object.
problems, the researchers suggested the use of a hand detection The detected object parts are called granules, i.e. small areas
model (Based on faster R-CNN) combined with the projection with simple properties (e.g. color) that separate them from their
across an image (frame) sequences in order to detect clusters of context. Using a combinatorial optimization method, based on
bounding boxes that remain almost stable across frames, even simulated annealing, they learn how to associate the proper
if they are occluded in some of the frames. In order to handle neighboring granules that compose the object. The problem of
overlapping objects is also discussed by [76], where the authors paradigm as an alternative to FL-based averaging (FedAvg) or
propose a new object detection algorithm that improves the more proficient averaging techniques.
localization of the detected objects and suppresses redundant 2) FL for object detection: The learning process for object
detection boxes, using a better loss function. detection is usually handled as a centralized offline task, but
in resource-restricted environments and applications that need
Moreover, CL for medical image segmentation and clas- mobility, privacy, and security decentralized and distributed
sification has been applied by [77]. The authors distinguish approaches, as well as cloud-edge collaborative approaches
between the image segmentation and annotation of segments have been proposed. In an attempt to optimize object detection
task, and that of detecting the severity of the disease by performance, authors in [84] proposed an FL approach that
considering the image as a whole. They propose the use of improves the performance of the federated averaging model
CL method for disease severity classification that is based on aggregation over independent and identically distributed (IID)
attention over the detected and annotated segments. A weakly data. The main claim of their work is that SOTA DL CNN
supervised framework for CL is proposed by [78] for allowing models have been trained in controlled, centralized and
object detection using labels at the image level. The proposed balanced datasets and cannot perform well on non-independent
framework combines a weakly supervised learner (i.e. a two- and identically distributed (non-IID) data. Since in cases
stream CNN built on VGG16) with a strongly supervised that need privacy and security, such as in medical images
learner (i.e. faster-RCNN) and trains the two subnetworks for example, the data can be non-IID, the use of simple
in a collaborative manner. The former subnetwork optimizes federated algorithms, such as FedAvg, can be problematic.
the multi-label classification task, whereas the later optimize For this purpose, they propose a weighted variant of FedAvg
the prediction consistency of the detected objects’ bounding that improves FL performance in non-IID image data. Using
boxes. Another weakly supervised object detection approach collaborative intelligence on the edge, or balancing between
that employs image-level labels in order to learn to detect the the edge and the cloud is an efficient FL paradigm that can
accurate location of objects in the training images has been improve the performance of object detection tasks. The work
presented by [79]. A CL framework has first been trained of [85] has shown that splitting the deep NN models between
on the image-level labels aiming to optimize the image-level the cloud and the edge, apart from being privacy-preserving
classifier and to assign higher weights to instances with more and secure can also be more efficient for object detection tasks.
confidence (i.e. without much noise, with fewer objects and Efficiency can be achieved by quantizing and compressing
simpler backgrounds). Then, a curriculum learning component the tensors of the first layers before sending them to the final
is employed for guiding the learning procedure of object layers that reside in the cloud. Depending on where the split is
localization and classification. In [80], the authors also propose performed, different compression techniques (lossless or lossy
an object detection method from remote sensing images, which ones) are preferred. The same network split strategy (splitNN)
combines a weakly supervised detector for image-level labels has been used in [86] for protecting the privacy of medical
and a strongly supervised detector for object localization. In data. More specifically, the client nodes train the network up
a similar manner, [81] split the salient object detection task to a certain layer and the outputs are sent to the server to
into two sub-tasks that are examined in parallel. The first perform the rest of the training. The inverse process takes
task relates to the estimation of internal semantics and the place during the back-propagation of the gradients. In order to
second to the prediction of the object boundaries. VGG-16 further protect the labels of the training samples from exposure
is used as a basis for feature extraction from images and an to the server, the authors also propose a u-shaped forward
additional layer is used to define their semantics. Two decoders and backward propagation process in which the first and last
are combined for the second part. The first detects the broader layers of the network (along with the training labels) are kept
area of interest in the images and the second performs a fine- in the clients. Finally, they propose a vertically partitioned
grained boundary detection. The CL network joins the two data model in which multiple clients train the same first layers
networks in order to efficiently fuse semantics and boundaries of their networks in parallel and send the outputs to the server
and extract the saliency maps of each image. Combining that concatenates them before proceeding with the remaining
an image enhancement network with the object recognition layers. This way the clients share the model (at least the last
network in order to be able to recognize objects in images of layers) without sharing their data.
extremely low resolution is proposed in [82]. The two networks
were trained using a CL technique in which the knowledge
from the object recognition network is used to enhance the B. Self-supervised-based FL
low-resolution images that are given as input to the other Self-supervised learning (SSL) and its variants, momen-
network. The enhanced images are then fed to the classifier to tum contrast (MoCo), bootstrap your own latent (BYOL),
improve its performance. Four different losses (reconstruction, and simple siamese (SimSiam), are powerful techniques
perpetual, classification and edge loss) are combined, in order for learning representations from centralized data [87]. FL
to optimize the object recognition performance. Transferring has been combined with SSL to address privacy concerns
the knowledge distillation method in a distributed and CL with decentralized data. However, there is a lack of in-
setup has been proven beneficial for online object detection depth understanding of the fundamental building blocks for
tasks [83]. The student models (one in each node) are trained federated self-supervised learning (FedSSL). The reference
separately using as their teacher model, an ensemble that [87] introduces a federated hybrid self-supervised learning
comes from the fusion of the student logits. The teacher’s (FedHSSL) framework that combines VFL and SSL techniques
knowledge is then distilled back to the students in the form such as SimSiam, addressing data deficiency. FedHSSL utilizes
of a soft target. This logic can easily be ported to the FL cross-party views and local views of aligned and unaligned
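The network split strategy (splitNN) discussed above can be sketched with a client-side and a server-side sub-network; the cut point, layer sizes, and the use of a single joint optimizer are simplifications made only for illustration:

import torch
import torch.nn as nn

# The client keeps the first layers; raw images never leave the device
client_net = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
                           nn.AdaptiveAvgPool2d(8), nn.Flatten())
# The server holds the remaining layers and receives only intermediate tensors
server_net = nn.Sequential(nn.Linear(16 * 8 * 8, 128), nn.ReLU(), nn.Linear(128, 2))

opt = torch.optim.SGD(list(client_net.parameters()) + list(server_net.parameters()), lr=0.01)
loss_fn = nn.CrossEntropyLoss()

images, labels = torch.randn(4, 3, 64, 64), torch.randint(0, 2, (4,))

# Forward: the client computes the cut-layer activations and sends them onward
smashed = client_net(images)
logits = server_net(smashed)
loss = loss_fn(logits, labels)

# Backward: gradients flow back through the cut so the client part is updated too
opt.zero_grad()
loss.backward()
opt.step()
print(float(loss))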
samples within each party to enhance representation learning. C. Face detection


It also incorporates partial model aggregation using shared The detection of face mask wearing is another CV task that
generic features among parties. Experimental results show that can be significantly benefited by FL [91]. When visual data are
FedHSSL outperforms baseline methods, particularly when collected by surveillance cameras, the monitoring of face mask-
labeled samples are limited. The researchers also analyze wearing can be split into two separate tasks: i) the detection
privacy protection mechanisms in FedHSSL, demonstrating of faces, ii) the detection of proper mask-wearing. The second
its effectiveness in thwarting label inference attacks. Moving task poses a risk to user privacy, making FL the preferred
forward, Zhuang et al. [88] propose a generalized FedSSL approach. The proposed scheme utilizes a dilation retina-Net
framework that accommodates existing SSL methods based on face location network for detecting faces, including both clear
Siamese networks and allows for flexibility in future methods. and occluded ones, in a dense crowd. Additionally, an SRNet20
Through empirical study, the authors uncover insights that network has been employed to process the detected faces. The
challenge the conventional wisdom, showing that stop-gradient second network is trained using multiple client nodes that
operation is not always necessary in FedSSL and retaining share their models with a central server, which aggregates the
local knowledge of clients is beneficial for non-IID data. client models.
Building on these insights, the authors propose a new model Furthermore, a related-face detection task is facial expression
update approach called federated divergence-aware exponential recognition, which has numerous applications in the field of
moving average (FedEMA) update, which adaptively updates affective computing. The study conducted by [69] proposes
local models of clients using exponential moving average of a few-shot meta-learning-based FL framework that requires
the global model with dynamically measured decay rate based only a limited number of labeled images for training. The
on model divergence. Experimental results demonstrate that local devices process numerous unlabeled face images, which
FedEMA outperforms existing methods on linear evaluation. are utilized to train an encoder network for learning face
The authors hope that this work provides valuable insights representations. Periodically, a central server aggregates the
for future research in the field of FedSSL. Saeed et al. [89] models using the FedAvg algorithm to enhance representation
propose a self-supervised approach called scalogram-signal learning. Additionally, a second few-shot learner employs a
correspondence learning based on wavelet transform (WT) to small number of labeled private facial expressions to train
learn representations from unlabeled sensor inputs such as local models, which are subsequently aggregated in the central
electroencephalography, blood volume pulse, accelerometer, server.
and WiFi channel-state information. The proposed method
addresses issues related to privacy, bandwidth limitations,
and cost of annotations associated with centralized data D. Video surveillance and smart environments
repositories for supervised learning. The auxiliary task involves An application that attracts the interest of CV researchers
a deep temporal neural network to determine if a given pair in terms of balancing the workload between centralized
of a signal and its complementary view align with each (cloud) and distributed (edge-based) architectures is video
other, optimized through a contrastive objective. The learned surveillance [92]. Edge computing has offered several solutions
features are extensively evaluated on various public datasets, in the direction of workload balancing. However, it added
demonstrating strong performance in multiple domains. The new challenges concerning the compression and filtering of
proposed methodology achieves competitive performance with data that is transmitted over the communication network, and
fully supervised networks and outperforms autoencoders in the fragmentation of knowledge across the edge devices. In
both central and federated contexts. Notably, it improves order to tackle the aforementioned projects, [93] propose a
generalization in a semi-supervised setting by reducing the multi-layer design in order to minimize the communication
volume of labeled data required through leveraging self- between the edge and the cloud and employ FL to update the
supervised learning. detection models without sending training data to the cloud.
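A simplified form of the divergence-aware EMA update behind FedEMA can be written as follows; the exact decay rule of [88] differs, and the scaling constant used here is an assumption:

import numpy as np

def fedema_update(w_local, w_global, lam=1.0):
    """Blend the new global model into the local one with a decay that grows
    with the divergence between the two models (larger divergence -> keep
    more local knowledge), in the spirit of FedEMA."""
    divergence = np.linalg.norm(w_global - w_local)
    mu = min(lam * divergence, 1.0)   # dynamically measured decay rate
    return mu * w_local + (1.0 - mu) * w_global

w_local = np.array([1.0, 2.0, 3.0])
w_global = np.array([1.1, 1.9, 3.2])
print(fedema_update(w_local, w_global, lam=0.5))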
The multi-layer architecture comprises a data acquisition layer
on the edge that performs compression and filtering, an object
Yan et al. [90] propose a robust and label-efficient self- detection and identification layer with edge servers that are
supervised FL (LE-SSFL) framework for medical image anal- specialized in specific tasks (e.g. face recognition, licence
ysis. The method utilizes a Transformer-based self-supervised plate detection, etc.) and a FL layer that solves tasks on the
pre-training approach on decentralized target task datasets edge and marks data and models that must communicated to
using masked image modeling. This enables robust represen- the cloud. The information transferred to the cloud usually
tation learning and effective knowledge transfer. Experimental comprises model parameters and the respective weights.
results on simulated and real-world medical imaging non-IID A generic FL architecture that processes sensor data for
federated datasets demonstrate improved model robustness human activity recognition has been presented by [94]. The
against data heterogeneity. The approach achieves significant proposed FL architecture is based on a federated aggregator
enhancements in test accuracy on retinal, dermatology, and that is trained using local data on the edge nodes. The
chest X-ray classification without additional pre-training data, aggregator sends the models to a central entity where the
outperforming a supervised baseline with ImageNet pre- Federated Average algorithm is employed to generate a
training. The FedSSL pre-training approach demonstrates common model that is distributed back to the edge nodes.
better generalization to out-of-distribution data and improved The federated aggregator in the video surveillance scenario
performance with limited labeled data compared to existing FL could be a local server that is trained on the edge using
algorithms. Table III presents of summary of FL frameworks local data. Moving on, [95] introduces a FL framework based
proposed for object detection tasks. on a lightweight version of the Dense-MobileNet model for
TABLE III
A SUMMARY OF FL FRAMEWORKS PROPOSED FOR OBJECT DETECTION TASKS. LEARNING TYPE (LT), BEST PERFORMANCE (BP), PROJECT LINK AVAILABILITY (PLA).

Ref. | LT | Model | Dataset | Description | BP | Limitations | PLA
[76] | CL | ResNet50, YOLO 3 | Pascal Voc 2007, MS COCO | A new effective loss function handling both overlapping and non-overlapping boxes | AP=17.08 | Only applicable to one-stage detectors | Yes1
[78] | CL | WSDNN, Faster R-CNN | Pascal 2007, 2012 | An end-to-end weakly supervised collaborative learning framework to improve model accuracy | AP=20.3 | Achieves maximum performance with a high number of iterations. | No
[79] | CL | VGG16, Fast RCNN | Pascal Voc 2007, test-2007, test-2010, COCO 2014 | A novel collaborative self-paced curriculum learning exploiting prior knowledge to enhance the accuracy of weakly supervised object detection | mAP=7.2 | Generalizability needs to be verified with other tasks and datasets. | No
[80] | CL | FCCNET | TGRS-HRRSD, DIOR | A new CNN model based on several mechanisms that is trained by alternating strongly and weakly supervised detectors | mAP=0.2 | The results vary drastically between classes. | No
[81] | CL | SDCLNET, VGG, ResNet | ECSSD, Pascal-C, DUTS, HKU-IS, DUT-OMRON, SOD | A new model leveraging collaborative learning to optimize two sub-goals, internal semantic estimation and boundary detail prediction, to improve the accuracy of saliency maps | MAE=0.03 | Considers only the accuracy while disregarding aspects such as network space redundancy. | No
[82] | CL | ResNet-152 | CIFAR-10, CIFAR-100, ImageNet | Improvement of object detection in low-resolution images through the use of two neural networks trained jointly | ACC=7.31 | Acceptable yet unsatisfactory results. | No
[83] | CL | ResNet | MS COCO, CIFAR, ImageNet | A new online knowledge distillation method leveraging several networks trained in parallel, all acting as students | ACC=72.9 | Generalization issues with limited data. | No
[84] | FL | SSD300 | Pascal Voc 2007 | FL is used to train the model with a large number of features. KLD measures weight divergence between non-IID models. The scheme reduces the impact of abnormal weights in FedAvg using SSD. | FedAvg=76.6 (non-IID) | Generalizability needs to be verified with other datasets. | No
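To give the flavour of the weighted FedAvg variant of [84], the sketch below shrinks the contribution of clients whose label distributions diverge (measured by KL divergence) from the global one; the exponential weighting rule and the helper names are assumptions made for this example, not the exact rule from [84]:

import numpy as np

def kl(p, q, eps=1e-8):
    p, q = p + eps, q + eps
    return float(np.sum(p * np.log(p / q)))

def weighted_fedavg(client_models, client_label_dists):
    """Average client weight vectors, shrinking the contribution of clients
    whose local label distribution is far from the global distribution."""
    global_dist = np.mean(client_label_dists, axis=0)
    divs = np.array([kl(d, global_dist) for d in client_label_dists])
    w = np.exp(-divs)           # illustrative: more divergence -> smaller weight
    w = w / w.sum()
    return sum(wi * m for wi, m in zip(w, client_models))

models = [np.ones(5), 3 * np.ones(5)]
dists = [np.array([0.5, 0.5]), np.array([0.9, 0.1])]
print(weighted_fedavg(models, dists))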

processing aerial images collected by unmanned aerial vehicles at the central server on a large-scale data repository. In
(UAV) swarms. The UAV swarms used for vision sensing doing so, fine-tuning is used since small datasets are not
collect haze images, and are supported by ground sensors that appropriate for action recognition models to learn complex
collect information about air quality. Each UAV employs a ML spatio-temporal features. Moving forward, an asynchronous
model in order to correlate air quality with the haze images federated optimization is adopted and the convergence bound
and shares its trained gradient with a central server. The server has been shown since the present clients’ computing resources
combines the gradients a learns a global model, which is then were heterogeneous. In a different paradigm, a driver action
used to predict air-quality distribution in the region. Besides, recognition system is built by [99] using FL. The latter has
platforms that supports FL powered applications of CV in been used for model training to protect users’ privacy while
[96], [97]. Lia et al. in [96] demonstrate FedVision in a fire enabling online model upgrades. Similarly, in [48], an FL-
detection task, using a YOLOv3 model for object detection based driver activity recognition system is implemented by
and they report its use on three pilots concerning safety training the detection model in a decentralized fashion. FedAvg
hazard detection, photovoltaic panel monitoring and suspicious and FedGKT have been implemented and their performance
transaction monitoring in ATMs via cameras. Catalfamo et has been demonstrated on the 2022 AI City Challenge.
al. [97] thoroughly examine the use of edge federation for Zhao et al. [62] proposed a semi-supervised FL for activity
implementing ML-based solutions. The platform is introduced recognition, which helps edge devices conducting unsupervised
for deploying and managing complex services on the edge learning of general representations using autoencoders and non
that utilizes micro-services, which are small, independent, annotated local data. In this case the cloud server performed
and loosely-connected services. This platform enables the supervised learning by training activity classifiers on the
management of these services across a network of edge devices learned representations, and annotated data.
by abstracting the physical devices they run on. The the 2) Crowd counting: When more refined tasks are assigned
effectiveness of this solution was demonstrated through a to CV algorithms in dense crowds, such as the detection of
case study involving video analysis in the field of morphology. face mask wearing [91], the difficulty increases since the target
1) Action recognition: Using knowledge distillation, [98] objects (i.e. masks) are in different scales and occlusions. FL is
allows client nodes with limited computational resources to a promising solution that enhances the privacy of individuals.
execute action recognition by performing model compression However, scale variations, occlusions and the diversity in
crowd distribution are still the open issues that demand efficient [107]. However, privacy and ethical concerns are increasingly
detection techniques, such as deep negative correlation learning critical, which makes it difficult to collect large quantities
[100], relational attention [101], etc. of data from multiple institutions. FL provides a promising
Crowd counting is a complex CV task especially when decentralized solution to CL by exchanging client models
multiple sensors (i.e. cameras) are combined. FL approaches instead of private data. Sheller et al. [108] performed the first
can be helpful since they allow the distributed trainers to study that investigated the use of FL for multi-institutional
exchange their models and improve their performance quickly, collaboration, and enabled the training of DL models without
especially when a centralized trainer is able to periodically sharing patients’ data. In particular, the aggregation process
validate and improve the quality of the aggregated model involves calculating a weighted average of institutional updates,
[27]. However, when the distributed nodes are not trusted where each institution’s weight is determined by the proportion
in advance, or when the quality of their data is ambiguous, of total data instances it holds. This entire process, comprising
more control mechanisms and incentives are needed to avoid local training, update aggregation, and distribution of new
the deterioration of the resulting model. When the distributed parameters, is referred to as a federated round. Linardos et
trainers have to co-operate in order to reach a consensus, either al. [46] modeled cardiovascular magnetic resonance using a
it is for crowd counting or for any other task, it is important FL scheme that concentrated on the diagnosis of hypertrophic
to provide them enough incentives in order to become trustful cardiomyopathy. A 3D-CNN model, pre-trained on action
[102]. Blockchain-based approaches [103] manage to distribute recognition, was deployed. Moreover, shape prior information
the incentive among the trained models and evaluate their has been integrated into 3D-CNN using two techniques and
reliability in order to fairly partition any potential profit (or four data augmentation strategies (Fig. 10). This approach
trust). This way, they allow FL algorithms to become more has then been evaluated on the automatic cardiac diagnosis
robust, by including trustful trainers and honest reporters that challenge (ACDC) dataset. The multi-site fMRI classification
detect misbehaviors and block them from the FL process. problem is addressed by [109] while preserving privacy using
3) Anomaly detection: An interesting data mining task a FL model. Accordingly, a decentralized iterative optimization
with many CV applications is anomaly detection. It involves has been deployed before using a randomization mechanism
the identification of strange patterns in data, which may to alter shared local model weights. In the same way, Dayan
indicate a fake or incorrect situation. Several cutting-edge et al. [110] developed and trained a FL model on data from
ML and DL algorithms have been developed in the literature data from 20 institutes worldwide, namely EMR chest X-ray
in order to detect and prevent such incidents. When it comes AI model (EXAM). The latter was built based on the COV-
to CV various DL architectures, from variational autoencoders 2 clinical decision support (CDS) model. This helps predict
(VAE), generative adversarial networks (GANs) or recurrent the future oxygen requirements of COVID-19 patients using
neural networks (RNNs) can be trained to detect anomalies chest X-rays, laboratory data, and inputs of vital signs. In the
in video sequences [104]. The capitalization of the use of same way, an FL-based solution that screens COVID-19 from
autoencoders and FL for detecting anomalies is introduced chest X-ray (CXR) images is deployed by [111]. In addition,
in [105]. The bipartite structure of autoencoders, and their a communication-efficient CNN-based FL scheme to multi-
ability to reconstruct the input in their output, allows the chest diseases classification from CXR images is proposed
detection of anomalies based on the errors spotted during by [112]. Yan et al. [113] proposed a real-time contribution
the regeneration of the input. The authors train two clients measurement approach for participants in FL, which is called
using different parts of the training dataset and they merge the Fedcm. The latter has been applied to identify COVID-19 based
resulting models using FL library called PySyft for secure and on medical images. Moving forward, Roth et al. [114] deployed
private DL. They employ the MSE loss metric to compare the a FL approach for building medical imaging breast density
input and output and examine whether it exceeds a threshold classification solutions. Data from seven clinical institutions
in order to decide whether it is an anomaly or not. Bharti et around the world has been used to train the FL algorithm.
al. [106] have proposed an edge-enabled FL approach for the
automatic inspection of products using CV. Their main basis is Because of the gaps in fMRI distributions from distinct
SqueezeNet, a lightweight model pre-trained on the ImageNet sites, a mixture of experts domain adaptation (MoE-DA) and
dataset, which is able to identify 100 different types of objects. adversarial domain alignment (ADA) schemes have been inte-
Although the image processing model has been trained using grated into the FL algorithm. The reference [115] introduced a
multiple images of normal products the various defects are variation-aware FL approach, in which the variations between
detected using an anomaly detection algorithm. SqueezeNet clients have been reduced by transforming the images of all
acts as a feature extractor from images, which are then fed to a clients onto a common image space. A privacy-preserving
dense layer with as many output neurons as possible anomaly generative adversarial network, namely PPWGAN-GP is
classes, plus one for the normal products. In its federated introduced. Moving on, a modified CycleGAN is deployed
version, the edge server aggregates the various local models for every client to transfer its raw images to the target image
and computes a new global model that is reshared with the space defined by the shared synthesized images. Accordingly,
local nodes. Table IV summarizes some of the FL frameworks this helps address the cross-client variation problem while
proposed for video surveillance and smart environments. preserving privacy. Similarly, for data privacy-preserving,
[116] used a FL scheme to securely access and meta-analyze
biomedical data without sharing individual information. Specif-
E. Healthcare and medical AI ically, brain structural relationships are investigated across
Nowadays, DL methods with large-scale datasets can clinical cohorts and diseases. Sheller and their colleagues
produce clinically useful models for computer-aided diagnosis [117] deployed a FL scheme to facilitate multi-institutional
TABLE IV
SUMMARY OF FL FRAMEWORKS PROPOSED FOR VIDEO SURVEILLANCE AND SMART ENVIRONMENTS. BEST PERFORMANCE (BP), PROJECT LINK AVAILABILITY (PLA).

Ref. | Model | Dataset | Description | BP | Limitations | PLA
[48] | ResNet 8 | AI City Challenge, StateFarm | Optimise FL for resource-limited devices with FedGKT | accuracy = 95% | FL results are relatively close to the centralized results but are not superior. | No
[62] | LSTM | OPP, DG, PAMAP2 | Addressed the problem of the lack of labeled data on the client's level | accuracy = 82% | Requires labeled data on the central server. | No
[99] | MobileNetV2, ResNet50, VGG16, Xception | StateFarm | Evaluation of FL for driver's action recognition | accuracy = 85% | Evaluation considered only the accuracy of the model. | No
[98] | ResNet | HMDB51, UCF101 | Custom model initialisation through knowledge distillation with asynchronous aggregation | accuracy = 89.5% | Limited consideration of non-IID data. | No
[102] | CNN | MNIST, Next-Character-Prediction | Federated crowd sensing with an incentive mechanism to reward and motivate participants | accuracy = 80% | Even though the loss reduced significantly, it is still high. | No
[103] | DDCBF | N/A | Suggests an FL framework to distribute trust and incentive among trainers | accuracy = 97% | Requires a long time for convergence. | No
[100] | DRFL | Wider Face | Proposes a cascade network with two stages trained with FL | mAP = 98.5% | Cannot achieve real-time detection due to computational complexity. | Yes2
[106] | CNN | MVTech | FL-based approach to enable visual inspectors to recognise unseen defects in industrial setups | F1-score >= 90% | Limited evaluation considering only detection performance. | Yes3

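The reconstruction-error criterion used by the autoencoder-based FL anomaly detectors described above can be sketched as follows; the threshold value is a placeholder, whereas [105] derives it from validation data, and the model here is an untrained toy stand-in:

import torch
import torch.nn as nn

autoencoder = nn.Sequential(nn.Linear(32, 8), nn.ReLU(), nn.Linear(8, 32))

def is_anomaly(x, threshold):
    """Flag a sample when its reconstruction error (MSE) exceeds a threshold."""
    with torch.no_grad():
        err = nn.functional.mse_loss(autoencoder(x), x).item()
    return err > threshold, err

# In an FL setting each client trains its own copy and the models are merged
# (e.g., with FedAvg); the decision rule itself stays purely local:
x_sample = torch.zeros(32)
flag, err = is_anomaly(x_sample, threshold=1.0)
print(flag, err)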
collaborations without the need to share patients’ data. In [118], for watermarking-based FL. The first is called WAFFLE
MR data from multiple institutions is shared with privacy proposed to prevent global FL model theft by offering a
preservation. Moreover, cross-site modeling for MR image mechanism for model owners to showcase their ownership of
reconstruction is introduced to reduce domain shift and improve the models, and the second is a client-side backdoor-triggered
the generalization of the FL model. Moving on, [119] combined watermarking is adopted to secure FL model verification.
differential privacy and weakly-supervised attention multiple Blockchain is also another means used for data privacy, Polap
instance learning (WS-AMIL) in order to develop a privacy- et al. [126] developed an intelligent medical system based on
preserving FL approach for gigapixel whole slide images in agent architecture using blockchain and FL technologies. Since
computational pathology. Researchers in [120] implemented FL algorithms do not inherently contain privacy-preserving
differential-privacy schemes for protecting patients’ data in mechanisms and can be sensitive to privacy-centered attacks
a FL setup designed for brain tumour segmentation on the that can divulge patients’ data, it is important to augment them
BraTS dataset. In the same field, Bercea et al. [121] developed with privacy-enhancing technologies, especially in clinical
a disentangled FL approach to segment brain pathologies in applications that are implemented in a multi-institutional
an unsupervised mode. Cetinkaya et al. [122] attempted to setting. In this context, a differentially private FL solution
improve the performance of FL-based medical image analysis is suggested by [127] to segment multi-site medical images
in non IID settings using image augmentation. in order to further enhance privacy against attacks. Besides,
Guo et al. [128] propose an FL-based approach for distributed
The schemes [123], [124], [125] consider whether the data in medical cyber-physical systems. Their approach helps
necessity of watermarking is required when using FL. For in training DL models for disease diagnosis following three
example, [123] introduced a FL-based zero-watermarking steps which are repeated in cycle: i) training a global model
technique for security and privacy preservation in teleder- using offline medical images and transferring the global model
matology healthcare frameworks. FL-based autoencoder is to the local diagnosis nodes, ii) re-training the local models
employed for extracting image features from dermatology using local data and iii) sending them back to the central
data using two-dimensional discrete cosine transform (2D- server for federated averaging. The authors in [129] propose a
DCT). Conversely, [124], [125] are two strategies proposed
ˆ‰Š‹ŒŽ‘‘gTX[YScO\XUYZU^T~QcQR[UMQR[QcUTR[QRXT[TQXUVQZYcQUOR^UOQcUNTX[YScO\UX[OR^Oc^TkO[TYRhUiQU[aYU
QjOPWO[TYRU[QMNRT’WQXUT\`PbU^T~QcQR[UO^O`[O[TYRUXQ[rW`XUOXUZYcU[NQU““”UaQUOcQUX[OR^Oc^TkTRSUWXTRSUORUOjQcOSQU
ZcY\UOPPUYZU[NQUMQR[QcX]UaNTPQUZYcU•“–—“”UaQUMYRXT^QcUYRQUMQR[QcUQR[TcQPbUWRXQQRh 19

µ¡œ¥šž¢« µ¢ž«šž¢«

´œž«ª¡§
Ν«£Ïœ¥ª
Öԝ©²«œ¥š¥š
°¥±²£¢š³¡œ¡ž ¸¹š«¡œ¥œ¥ª ²¥ž¢¢¥š
´œž«ª¡§ £¢¥«¢¡
µ¡œ¥œ¥ª
˜™š›œžšŸ ¡¡¢£¤ ¥ µ¡œ¥šž¢« µ¢ž«šž¢« ¶«šº²ª§¢¥««œ¥ž ¹£©š§ ±¢©š£¨œ¢žš«¡œ¥¢±š¥š¢£ÏšÐš
«Ï¢š«Ï¡¢¢š«¡œ¥š£¢¥«¢¡ž
Öԝ©²«œ¥š¥š
¦¢ž§¨©œ¥ªš«š¬­¬­¬š Ÿ¶·š«¡œ¥œ¥ª ²¥ž¢¢¥š
§§ £¢¥«¢¡
΢¡ª¢š«Ï¢š«Ï¡¢¢š«¡œ¥š£¢¥«¢¡žšœ¥«š¥¢š
Ÿ¡¨¨œ¥ªš«š ¨ ©šÐ¡š«¡œ¥œ¥ª
¬®¯­¬®¯­¬¯ ´œž«ª¡§ ´œž«ª¡§
µ¡œ¥œ¥ª Ν«£Ïœ¥ª
·¨©œ«šœ¥«šÐ©±ž
¸¹š«¡œ¥œ¥ª ¸¢±¢¡«¢±š
°¥±²£¢š³¡œ¡ž ¢Ô©²«œ¥
»¼½¾¼¿À¼Á¼ÿÄÅ¿ƿÀÃÇÈȿɽÊË̽ÂËÇÁ
¹£©š§ ±¢©š£¨œ¢žšš«¡œ¥¢±š¥š©©šÐ²¡š
ÀÇÊʽÍÇýÂ˾¼¿ÀÃÇÈȿɽÊË̽ÂËÇÁ ¶«šº²ª§¢¥««œ¥ž £¢¥«¢¡ž
ÑÊÇÒ¿ÇÓ¿ÂýËÁ¿È¼Â Öԝ©²«œ¥š¥š
ÑÊÇÒ¿Çӿ¼È¿ȼ Ÿ¶·š«¡œ¥œ¥ª §¢¡ª¢±š«¢ž«š
ž¢« ΢¡ª¢š«¢ž«š
ÑÊÇÒ¿ÇÓ¿ÕÇÌ¼Ê ž¢«žš
΢¡ª¢š©©š£¢¥«¢¡žšœ¥«š¥¢š¨ ©šÐ¡š
«¡œ¥œ¥ª

ˆ‰Š‹ŒŽ×‘‘€RUYjQcjTQaUYZU[NQÙT̀QPTRQUWXQ^UZYcU[NQUQ_`QcT\QR[XUTRU[NTXUX[W^bh
Fig. 10. The methodology applied for conducting the experiments in this study is outlined in [46]. Leave center out and collaborative cross-validations, both
employed collaborative data sharing (CDS) and FL for the training.
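A common way to realize the differential-privacy protection mentioned above is to clip each client update and add Gaussian noise before it is shared; the clipping norm, noise multiplier, and function name below are placeholders for illustration, not the values or mechanism used in [119] or [120]:

import numpy as np

def privatize_update(update, clip_norm=1.0, noise_multiplier=1.1, rng=None):
    """Clip the update to a maximum L2 norm and add calibrated Gaussian noise,
    so that the shared parameters reveal less about any single patient."""
    rng = rng or np.random.default_rng()
    norm = np.linalg.norm(update)
    clipped = update * min(1.0, clip_norm / (norm + 1e-12))
    noise = rng.normal(0.0, noise_multiplier * clip_norm, size=update.shape)
    return clipped + noise

delta_w = np.random.randn(1000) * 0.05
print(np.linalg.norm(privatize_update(delta_w)))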

technique to harmonize local and global drifts in FL models on
heterogeneous medical images. In doing so, the local update
drift is first mitigated by normalizing amplitudes of images
transformed into the frequency domain, and then a client weight
perturbation guiding each local model to reach a flat optimum
is designed based on harmonized features.

on healthcare and AI applications.

F. Autonomous driving
Autonomous driving has recently received increasing interest
due to the advance of CV, which is in the core of this
is designed based on harmonized features.
NQOc[fXUX[cWM[WcQhUlzUQ_NTVT[XUXT\TPOcÙQcZYc\ORMQU[YU{|}UTRUOPPUMOXQX]UNYaQjQcU[NQUX[OR^Oc^U^QjTO[TYRXUOcQU\WMNU
due to the advance of CV, which is in the core of this

PYaQcUTRU[NQUMOXQUYZUlzhUvYcQYjQc]U[NQU^T~QcQRMQUTRÙQcZYc\ORMQXUYZU{|}UOR^UlzU^QMcQOXQXUOQcUMYRX[cOTRTRSU
The problem setting of federated domain generalization technology. Vehicles leverage object detectors to analyze
[NQUXYPW[TYRUX`OMQUWXTRSU[NQUXNO`QÙcTYcXh
(FedDG) is solved in [130]. This helps in learning a FL images collected by multiple sensors and cameras, analyze their
€UXbX[Q\O[TMUMY\`OcO[TjQUX[W^bUTXUMYR^WM[Q^UYRU[NQU^T~QcQR[UOWS\QR[O[TYRUXQ[rW`XUWXTRSUVY[NU[NQU{{eUOR^U
architecture from various distributed SDs, which enables its surroundings in real-time and then recognize different objects,
z{r{eUQjOPWO[TYRÙcYMQ^WcQXhU‚QUƒR^U[NO[U{{eÙQcZYc\ORMQU^cY`XUTRUOPPUMYPPOVYcO[TjQUPQOcRTRSUZcO\QaYcdXU
generalization to unseen TDs. This was made possible by including other vehicles, road signs, barriers, pedestrians, etc.,
utOVPQm„UOR^UlTShmxUYRMQUOWS\QR[O[TYRXUOcQUTR[cY^WMQ^h
introducing a the episodic learning in continuous frequency which help them in safely navigating the roads. While a
qRU[NQUMOXQUYZUz{r{e]UaNQcQU[NQU[cOTRUOR^U[QX[TRSUXQ[XUMY\QUZcY\U^T~QcQR[U^TX[cTVW[TYRX]U{|}ÙQcZYc\ORMQU
space (ELCFS) technique. Moving on, [131] propose a partial plethora of studies have focused on improving the accuracy by
MYRXTX[QR[PbUT\`cYjQXUZcY\UO^^T[TYROPUOWS\QR[O[TYRX]UcQOMNTRSUT[XUNTSNQX[UaNQRU[NQU\YX[UOWS\QR[O[TYRXUOcQU
O``PTQ^UulTShmxhUlzUOPXYUVQRQƒ[XUZcY\U[NQUVOXTMUOWS\QR[O[TYRXUYZUcY[O[TYRUOR^U†T̀`TRS]UVW[UT[XÙQcZYc\ORMQU^cY`XU
initialization-based cross-domain personalized FL, namely training DL algorithms on centralized large-scale datasets,

OXUXNO`QUOR^UTR[QRXT[bUOWS\QR[O[TYRXUOcQUTR[cY^WMQ^]UOR^U{|}UXWc`OXXQXUlzhUqR[QcQX[TRSPb]U\Y^QPXU[cOTRQ^UWR^QcU
PartialFed. Their method is based on loading a subset of few of them have addressed the users’ privacy. To that
ORUlzUZcO\QaYcdUuQT[NQcUlzUYcUlzr‡exUQ_NTVT[UTRMcQ^TVPbUMYRXTX[QR[UcQXWP[XUOMcYXXU^T~QcQR[UTRT[TOPTkO[TYRXUYZU[NQU
the global model’s parameters instead of loading the entire end, using FL in autonomous driving has recently attracted
XO\QUXQ[rW`X]UaNTPQU{|}rZcO\Q^U\Y^QPXUXNYaUOUNTSNUO\YWR[UYZUjOcTORMQh
model, as it is done in most of the FL approaches. Thus, it increasing attention. However, numerous challenges have been
is closer to the split Learning paradigm. More over, Wang et raised, including data discrepancy across clients and the server,
al. [132] design an effective communication, and computation expensive communication, systems heterogeneity and privacy
efficient FL scheme using progressive training. This helps concerns. Typically, privacy issues include internal and external 0
to reduce computation and two-way communication costs, data, such as the faces of pedestrians, vehicles’ positions, etc. 123456789 
while preserving almost the same performance of the final The approach to on-device ML using FL and validates
models. While FL allows collaboratively training, using a joint it through a case study on wheel steering angle prediction
model that is trained in multiple medical centers that maintain for autonomous driving vehicles is presented in [135]. The
their data decentralized to preserve privacy, the federated results show that FL can significantly improve the quality
optimizations face the heterogeneity and non-uniformity of of local edge models and reach the same accuracy level as
data distribution across medical centers. To overcome this centralized ML without negative effects. FL can also accelerate
issue, an FL scheme with shared label distribution, namely model training speed and reduce communication overhead,
FedSLD, is proposed by [133]. This approach can reduce making it useful for deploying ML/DL components to various
the discrepancy brought by data heterogeneity and adjust the embedded systems. Fig. 11 presents a FL-based framework
contribution of every sample to the local objective during for wheel steering angle prediction in autonomous vehicles.
optimization via the knowledge of clients’ label distributions. Nguyen et al. [136] propose a communication-efficient FL
Table V presents a summary of FL frameworks proposed for to detect fatigue driving behaviors, namely FedSup. It helps
TABLE V
SUMMARY OF FL FRAMEWORKS PROPOSED FOR HEALTHCARE AND MEDICAL AI APPLICATIONS. BEST PERFORMANCE (BP), PROJECT LINK AVAILABILITY (PLA).

Ref. | Model | Dataset | Description | BP (%) | Limitations | PLA
[46] | 3D-CNN | M&M and ACDC datasets | Modeling cardiovascular magnetic resonance with focus on diagnosing hypertrophic cardiomyopathy | AUC = 78 | Presence of bias against ACDC on the shape and intensity set-up, where FL exhibits an AUC performance of about 0.85 to 0.89. | Yes4
[47] | PFA | Real-World Dermoscopic FL Dataset 5 | A personalized retrogress-resilient FL with modification in the clients and server. | AUC = 88.92, F1 = 70.75 | Generalization with other datasets is not confirmed. | Yes6
[109] | MoE-DA, ADA | Autism Brain Imaging Data Exchange dataset (ABIDE I) | Privacy-preserving FL and domain adaptation for multi-site fMRI analysis | ACC = 78.9 | The model updating strategy is not optimal. Additionally, the sensitivity of the mapping function was difficult to estimate. | Yes7
[115] | PPWGAN-GP, modified CycleGAN | LocalPCa, PROSTATEx challenge [134] | Addresses the cross-client variation problem among medical image data using VAFL. | AUC = 96.79, ACC = 98.3 | Does not address the inter-observer problem and incomplete image-to-image translation. | No
[116] | fPCA | ADNI, PPMI, MIRIAD and UK Biobank | Meta-analysis of large-scale subcortical brain data using FL | N/A | No comparisons with the SOTA have been reported. | No
[110] | fPCA | Synthetic and real-world private data | Predict the future oxygen requirements of symptomatic patients with COVID-19 using inputs of vital signs | AUC > 92 | Normalization techniques to enable the training of AI models in FL were not investigated. | Yes8
[126] | Agent-based model | Private data | Blockchain technology and threaded FL for private sharing of medical images. | ACC = 80 | Poor internet connection hinders accessing the data. The initial adaptation of the classifier for practical use needs more investigation. | No
[133] | FedSLD | MNIST, CIFAR10, OrganMNIST(axial), PathMNIST | FL with shared label distribution to classify medical images | ACC = 95.85 | Reduces the impact of non-IID data by leveraging the clients' label distribution. | No
[131] | PartialFed, PartialFed-Adaptive | KITTI, WFace, VOC, LISA, DOTA, COCO, WC, CP, CM, Kit, DL | Partial initialization for cross-domain personalized FL | ACC = 95.92 | Reduces performance degradation caused by extreme distribution heterogeneity. | No

in progressively optimizing the sharing model with tailored interconnected networks and heterogeneous data generated at
client–edge–cloud architecture and reduces communication the network edge in the upcoming 6G environment. A two-
overhead by a Bayesian convolutional neural network (BCNN) layer FL model is proposed that utilizes the distributed end-
data selection strategy. In [136], a federated autonomous edge-cloud architecture to achieve more efficient and accurate
driving network (FADNet) solution is introduced for enhancing learning while ensuring data privacy protection and reducing
models’ stability, ensuring convergence, and handling imbal- communication overhead. A novel multi-layer heterogeneous
anced data distribution problems where FL models are trained. model selection and aggregation scheme is designed to better
Zhou and their colleagues [137] discuss the need for utilize the local and global contexts of individual vehicles and
distributed ML techniques to take advantage of the massive roadside units (RSUs) in 6 G-supported vehicular networks.
Fig. 11. FL-based framework for wheel steering angle prediction in autonomous vehicles [135].
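One local training round in such a pipeline reduces to a small regression loop over the vehicle's own frames; the architecture, input sizes, and hyper-parameters below are illustrative assumptions, not those of [135]:

import torch
import torch.nn as nn

# Tiny stand-in for the CNN that maps camera frames to a steering angle
model = nn.Sequential(nn.Conv2d(3, 8, 5, stride=2), nn.ReLU(),
                      nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(8, 1))
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

def local_round(frames, angles, epochs=1):
    """Train on the vehicle's own frames; only the weights leave the device."""
    for _ in range(epochs):
        opt.zero_grad()
        pred = model(frames).squeeze(1)
        loss = loss_fn(pred, angles)
        loss.backward()
        opt.step()
    return {k: v.detach().clone() for k, v in model.state_dict().items()}

frames = torch.randn(16, 3, 66, 200)   # e.g., cropped dashboard camera frames
angles = torch.randn(16)               # ground-truth steering wheel angles
update = local_round(frames, angles)
print(len(update), "tensors ready to send to the aggregation server")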

This context-aware distributed learning mechanism is then environment for FL. Consequently, it is important to establish
applied to address intelligent object detection in modern the mechanisms that manage heterogeneous nodes in the same
intelligent transportation systems with autonomous vehicles. network, considering their restrictions and capabilities. The use
Fig. 12 presents an overview of the two-layer FL model of central (cloud-based) nodes that carry part of the training
based on convolutional neural network (TFL-CNN) framework. process, either using additional training data or by training
Table VI presents a summary of FL frameworks proposed for parts of the learning model (e.g. some layers of the DNN
autonomous vehicles. as in the split learning paradigm) can be beneficial for the
overall performance. A third issue that must be considered is
IV. O PEN ISSUES the statistical heterogeneity of data that arrives in each node.
Learning using non-IID data is still an open challenge for FL
The advances in CV algorithms have increased the number algorithms, and various alternatives are still under evaluation.
and variety of tasks that range from the detection and tracking Finally, privacy and robustness concerns are still present and
of specific objects in static images or controlled image streams methods that preserve privacy and at the same time guarantee
to public surveillance and the detection of normal or abnormal the resulting model robustness to any kind of breaches or
behaviors and conditions in the wild, with applications from attacks must be properly designed and developed.
agricultural crop monitoring to law enforcement and medical
diagnostics. The information captured in surveillance cameras’
footage may contain sensitive and private data that must be A. Communication overhead
protected (e.g., the presence of individuals at a place), the CV A crucial component of a FL environment is communication
models trained to detect abnormal behaviors in public may between the client nodes and the central server. The ML
suffer from race, ethnicity or gender bias, and scene perception model is downloaded, uploaded, and trained over the course of
software in self-driving cars may suffer from the ability to several communication rounds. Although transferring models
adapt to new environments or conditions (e.g. fog, mist or instead of training datasets is significantly more efficient,
haze). these communication rounds may be delayed when the device
FL is a new method for protecting privacy when developing has limited bandwidth, energy, or power. The communication
a DNN model, that uses data from various clients and trains overhead also increases when multiple client devices participate
a common model for all clients. This is achieved by fusing in each communication round thus leading to a bottleneck.
distributed ML, encryption and security, and introducing incen- Further delays may occur from the need to synchronise the
tive mechanisms based on game theory and economic theory. receipt of models from all clients, including those with low
FL could therefore serve as the cornerstone of next-generation bandwidth or unstable connections. An additional burden to
ML that meets societal and technological requirements for the client synchronization overhead can be caused by the
ethical AI development and implementation. non-identical distribution of training data to the nodes, which
Despite their many advantages, FL solutions also have may delay the training of some models, and consequently
several challenges to address. First of all, since FL is a the model update process [144]. This delay in cybersecurity
distributed learning technique it is important to guarantee the can be considered an intrusion if it surpasses a predetermined
efficient communication between the federated network nodes threshold [10].
so that they can communicate the learned model parameters. In order to mitigate the communication overhead and
In addition to this, the models may be trained on nodes establish communication-efficient federation strategies, [145]
of different hardware architecture and capabilities (e.g. IoT have proposed the compression of transferred data. In a
devices, smartphones, etc.) which constitute a heterogeneous different approach, [146] have focused on identifying irrelevant
Fig. 12. Overview of the TFL-CNN framework introduced in [137].
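The two-layer aggregation in TFL-CNN-like designs can be sketched as a weighted average at each edge server (RSU) followed by a second weighted average in the cloud; weighting each model by its client sample counts is an assumption made for this example:

import numpy as np

def weighted_average(models, weights):
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()
    return sum(wi * m for wi, m in zip(w, models))

def two_layer_aggregate(edge_groups):
    """edge_groups: list of (client_models, client_sample_counts) per edge node.
    Edge servers aggregate their own vehicles first; the cloud then merges the
    edge models, weighted by how much data each edge server represents."""
    edge_models, edge_sizes = [], []
    for models, counts in edge_groups:
        edge_models.append(weighted_average(models, counts))
        edge_sizes.append(sum(counts))
    return weighted_average(edge_models, edge_sizes)

group1 = ([np.ones(4), 2 * np.ones(4)], [100, 300])
group2 = ([4 * np.ones(4)], [200])
print(two_layer_aggregate([group1, group2]))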

models and precluding them from the aggregation, thus with low communication overhead for use in the FL scenario
significantly reducing the communication cost. This is achieved by leveraging the covariance stationarity. The experimental
by sharing the global tendency of model updating between the results show that FedCor can improve convergence rates by
central server and all the client nodes and asking each client 34% to 99% on FMNIST and 26% to 51% on CIFAR-10
to align its update with this tendency before communicating compared to the SOTA method. Besides, FL still suffers
the model. from significant challenges such as the lack of convergence
In the case of FLchain [36], the model communication cost and the possibility of catastrophic forgetting. However, it is
between the clients and a central server, which is common demonstrated in [149] that the self-attention-based architectures
in FL, is replaced by the cost of sharing the models with the (e.g., Transformers) have shown to be more resistant to changes
blockchain ledger. For this reason, it is important to consider in data distribution and therefore can improve the effectiveness
the time needed for this process, including local training, model of FL on heterogeneous devices. The authors of [150] propose
transmission, consensus, and block mining. an alternative approach, called FedAlign, which aims to address
data heterogeneity by focusing on local learning rather than
proximal restriction. They conducted a study using second-
B. Heterogeneity of client nodes
order indicators to evaluate the effectiveness of different algo-
One of the major challenges in FL is the heterogeneity of rithms in FL and found that standard regularization methods
data on different clients, which can hinder effective training. performed well in mitigating the effects of data heterogeneity.
To address this issue, client selection strategies are often used FedAlign was found to be a simple and effective method
in an attempt to improve the convergence rate of the FL for overcoming data heterogeneity, with competitive accuracy
process. While active client selection strategies have been compared to SOTA FL methods and minimal computational
proposed in recent studies, they do not take into account and memory overhead.
Besides, FL still suffers from significant challenges such as the lack of convergence and the possibility of catastrophic forgetting. However, it is demonstrated in [149] that self-attention-based architectures (e.g., Transformers) are more resistant to changes in data distribution and can therefore improve the effectiveness of FL on heterogeneous devices. The authors of [150] propose an alternative approach, called FedAlign, which aims to address data heterogeneity by focusing on local learning rather than proximal restriction. They conducted a study using second-order indicators to evaluate the effectiveness of different algorithms in FL and found that standard regularization methods performed well in mitigating the effects of data heterogeneity. FedAlign was found to be a simple and effective method for overcoming data heterogeneity, with competitive accuracy compared to SOTA FL methods and minimal computational and memory overhead.

On the other hand, when FL is limited only to client nodes that share the same model architecture, the FedAvg algorithm and its alternatives (e.g. FedSGD, FedProx, etc.) can be applied to merge the locally trained models in each iteration [151], as sketched below. However, such approaches assume that the underlying nodes share similar hardware architectures, specifications and processing capabilities in general, which is not always the case in FL settings.
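For concreteness, a minimal FedAvg-style merge of locally trained models can be written as a data-size-weighted average of the client parameter vectors (the function and variable names below are illustrative and not taken from a specific library):

import numpy as np

def fedavg(client_weights, client_sizes):
    """Weighted average of flattened client parameter vectors, as in FedAvg.

    client_weights: list of 1-D numpy arrays (flattened local model parameters).
    client_sizes:   number of local training samples per client, used as weights.
    """
    sizes = np.asarray(client_sizes, dtype=float)
    coeffs = sizes / sizes.sum()
    stacked = np.stack(client_weights)                # shape: (n_clients, n_params)
    return (coeffs[:, None] * stacked).sum(axis=0)    # global model for the next round

# toy usage: three clients with different dataset sizes
rng = np.random.default_rng(2)
local_models = [rng.standard_normal(4) for _ in range(3)]
print(fedavg(local_models, client_sizes=[100, 50, 350]))

FedProx keeps the same server-side merge but typically adds a proximal term to each client's local objective, so only the local training step changes.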

TABLE VI
SUMMARY OF FL FRAMEWORKS PROPOSED FOR AUTONOMOUS VEHICLES, INCLUDING THE ML MODEL USED, DATASET, DESCRIPTION OF THE MAIN CONTRIBUTION, BEST PERFORMANCE (BP), LIMITATIONS AND PROJECT LINK AVAILABILITY (PLA).

Work | Model | Dataset | Description | BP | Limitations | PLA
[136] | FADNet | Udacity, Gazebo, Carla | P2P federated framework for training autonomous driving models | RMSE = 0.07 | Limited deployment experiment | Yes
[138] | BSUM-based solution | MNIST | Dispersed FL framework to improve robustness, privacy-awareness and communication constraints | Accuracy = 99% | NP-hard and non-convex formulation of the problem | No
[139] | CNN | Private dataset | Addresses the case where vehicles and servers are considered honest but curious, through blockchain | Accuracy = 92.5% | Requires long training iterations for small images | No
[140] | CNN | MNIST, FMNIST | A reactive method for the allocation of wireless resources, dynamically occurring at each round | Accuracy = 88% | The impact of resource allocation methods is diminished when strengthening the role of the proximal term | No
[135] | CNN | SullyChen | End-to-end ML model for handling real-time generated data | RMSE = 9.2 | Synchronous aggregation | No
[141] | U-NET, CNN | Lyft Level 5 AV | Feasibility study of using FL for vehicular applications | Accuracy = 95% | Evaluated only the accuracy | No
[137] | TFL-CNN | BelgiumTSC | An improved model selection and aggregation algorithm | F1-score = 94% | Limited evaluation of the framework | No
[142] | YOLOv3 | KITTI | Evaluation of real-time object detection in real traffic environments | mAP = 68.5% | Sensitive to changes in labels across clients | No
[143] | SNNs | BelgiumTSC | Leverages spiking neural networks to optimize the resources required for training | Accuracy = 95% | Suffers from security issues | No

For example, when smartphones, monitoring cameras and field programmable gate array (FPGA)-attached cameras are orchestrated in an FL setting, the models have heterogeneous architectures and model parameter sharing is infeasible [152], [153]. The barriers of conventional FL are removed when the nodes share their predictions instead of their parameters or updates. Algorithms such as federated model distillation (FedMD) propose a model-agnostic federation solution, which sends the predictions of the local models to the central node instead of the models themselves [154].
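A rough sketch of the idea behind such prediction sharing (a simplification, not the full FedMD protocol of [154]) is shown below: each client sends its class scores on a shared public set, the server averages them into a consensus, and each client then distills that consensus into its own architecture, so no parameters are ever exchanged.

import numpy as np

def aggregate_predictions(client_logits):
    """Server step: average per-client class scores on the shared public set."""
    # client_logits: list of arrays, each of shape (n_public_samples, n_classes)
    return np.mean(np.stack(client_logits), axis=0)   # consensus targets

def distillation_targets(consensus, temperature=1.0):
    """Soft labels each client uses to fine-tune its own (arbitrary) architecture."""
    scaled = consensus / temperature
    exp = np.exp(scaled - scaled.max(axis=1, keepdims=True))
    return exp / exp.sum(axis=1, keepdims=True)

# toy usage: three clients, 5 public samples, 4 classes
rng = np.random.default_rng(3)
logits = [rng.standard_normal((5, 4)) for _ in range(3)]
soft_labels = distillation_targets(aggregate_predictions(logits))
print(soft_labels.shape)   # (5, 4); each client trains against these targets locally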
Another impact of node heterogeneity applies to the overall performance of the FL process, since the federation of heterogeneous nodes with varying training data structure and size and varying processing capabilities usually requires, in each round, all the local nodes to train their models before proceeding to the next iteration. As a result, slow nodes with low-usability data may degrade the time performance of the federation. The work in [155] proposes a reinforcement learning-based central server, which gradually weights the clients based on the quality of their models and their overall response, in an attempt to establish a group of clients that are used to achieve almost optimal performance.

C. Non-IID data

Although FL offers a promising method for privacy protection, there are significant difficulties when FL is used in the real world as opposed to centralized learning. Numerous studies have shown that the accuracy of FL on non-IID or heterogeneous data would inevitably deteriorate, mainly because of the divergence in the weights of local models that result from non-IID data [156]. Whether the FL approach is horizontal (i.e. aggregating the local models' weights on a central server) or vertical (i.e. aggregating the model outputs in the guest client to calculate the loss function), the non-IID data can take various forms (e.g. attribute, label or temporal skew) that affect the overall performance, and mitigation measures must be taken to avoid it [157]. These include data sharing and augmentation, and the fine-tuning of local models using a combination of local and global information. Wang et al. [158] propose FAVOR, a reinforcement-based method (deep Q-learning) for choosing the clients that contribute models to the aggregation phase in each round. The proposed method reduces the non-IID data bias and the communication overhead.

Since the number of dimensions of the state space equals the number of model weights times the number of client nodes, a dimensionality reduction technique, such as PCA, is applied in order to compress the state space. Another group of approaches tries to balance the bias of non-IID data by clustering the local updates using a hierarchical algorithm [159], as sketched below. Such approaches result in multiple models that are trained independently and in parallel, leading to a faster and better convergence of each group.
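A minimal sketch of this clustered-federation idea (an illustration under assumed inputs, not the exact method of [159]) groups clients by the similarity of their flattened updates and then aggregates each cluster separately:

import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

def cluster_and_aggregate(client_updates, distance_threshold=1.0):
    """Group similar client updates hierarchically and average each group.

    client_updates: array of shape (n_clients, n_params) with flattened local updates.
    Returns a dict mapping cluster id -> aggregated (mean) update for that cluster.
    """
    tree = linkage(client_updates, method="ward")                       # hierarchical clustering
    labels = fcluster(tree, t=distance_threshold, criterion="distance")
    return {c: client_updates[labels == c].mean(axis=0) for c in np.unique(labels)}

# toy usage: two natural groups of clients with dissimilar updates
rng = np.random.default_rng(4)
updates = np.vstack([rng.normal(0.0, 0.1, (4, 6)), rng.normal(5.0, 0.1, (4, 6))])
groups = cluster_and_aggregate(updates, distance_threshold=2.0)
print(len(groups))   # expected: 2 group models, each trained further in parallel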
D. Device compatibility and network issues

FL relies on the participation of multiple devices, which may have different hardware and software configurations. This can make it difficult to ensure that the model can be trained on all devices, and may require additional efforts to optimize the model for different device types [160]. FL requires a stable and reliable network connection in order to train the model and exchange model updates between participants and the central server. If the network connection is unreliable or has low bandwidth, this can negatively impact the training process and the performance of the model. Explicitly, if the parties involved in FL are located in different geographical regions, the time it takes for model updates to be transmitted over the network can vary significantly, leading to delays in the training process.

E. Human bias

FL relies on the participation of multiple parties, which may have different biases or perspectives that can influence the model's training and performance. This can be particularly challenging in the context of CV, as the model may be trained on data with biased or inaccurate annotations. It is important to carefully consider and address these issues in order to ensure that the model is fair and unbiased [161].

F. Privacy and robustness to attacks

The resilience and data privacy of existing FL protocol designs have been shown to be compromised by adversaries both inside and outside the system. The sharing of gradients during training can reveal private information and cause data leakage to the central server or a third party. Similarly, malicious users can try to affect the global model by introducing poisoned local data or gradients (model poisoning) in an attempt to destroy its convergence (i.e. Byzantine attack) or to implant a trigger in the model that constantly predicts adversarial classes, without losing performance on the main task. These two types of attacks either aim to breach user privacy (e.g. inferring class representatives, class membership or user properties) or poison the model in order to control its behavior [162]. Homomorphic encryption, secure multi-party computation, and differential privacy are some of the means for mitigating privacy breaches. Respectively, the defenses against poisoning attacks focus on the detection of malicious users based on anomalies detected in their models (e.g. distributions of features that differ from the rest of the users [163]), thus increasing the robustness against Byzantine attacks. They also examine the resulting models to detect whether they are compromised or not, using a combination of backdoor-enabled and clean inputs and examining the behavior of the model [164].
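One widely used server-side defense of this kind can be sketched as follows (a simplified illustration, not the specific schemes of [163], [164]): client updates whose distance from the coordinate-wise median update is anomalously large are excluded before averaging, which limits the influence of poisoned or Byzantine contributions.

import numpy as np

def robust_aggregate(client_updates, z_threshold=2.5):
    """Filter anomalous client updates, then average the remaining ones.

    client_updates: array of shape (n_clients, n_params).
    A client is dropped when its distance to the coordinate-wise median update
    is more than z_threshold standard deviations above the mean distance.
    """
    median_update = np.median(client_updates, axis=0)
    dists = np.linalg.norm(client_updates - median_update, axis=1)
    keep = dists <= dists.mean() + z_threshold * dists.std()
    return client_updates[keep].mean(axis=0), np.where(~keep)[0]

# toy usage: one client submits a poisoned (heavily shifted) update
rng = np.random.default_rng(5)
updates = rng.normal(0.0, 0.05, (10, 8))
updates[3] += 10.0                               # malicious outlier
aggregated, flagged = robust_aggregate(updates)
print(flagged)                                   # expected to contain client 3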
Blockchain can be beneficial for preserving privacy in FL settings and can help avoid various attack types, from single point of failure and membership inference to Byzantine, label flipping, and data poisoning ones [165]. The FL blockchain, or FLchain for short, paradigm can transform mobile-edge computing into a secure and privacy-preserving system, once the proper solutions are found for allocating resources, providing incentives to the client nodes, and protecting the security and privacy of data at an optimal communication cost. In their proposed architecture, the mobile-edge computing (MEC) servers can act either as learning clients or as miners to establish blockchain consensus, and the mobile devices associated with each server focus only on the learning tasks. The mobile devices transmit their models to the MEC server as transactions. Each MEC server stores the local models it collects to the blockchain after verifying them with other MEC servers. The aggregation node or any other local node is then able to retrieve the stored and verified models from the blockchain, aggregate them and use the resulting model in the next iteration [36]. Watermarking is another means of preserving model security [123], [124], [125].

G. Replication of research results

Despite the fact that the performance of FL models requires further improvement compared to that obtained with centralized training, replicating FL research results and conducting fair comparisons are still challenging. This is mainly due to the lack of exploration of different tasks within a unified FL framework. To that end, [166] develop an FL benchmarking and library platform for CV applications, called FedCV. This helps in bridging gaps between SOTA algorithms and facilitating the development of FL solutions. Additionally, three main tasks of CV, namely image object detection, image segmentation and image classification, can be evaluated on this toolkit, as shown in Fig. 13. Moreover, numerous FL algorithms, models, and non-IID benchmarking datasets have been uploaded, and the toolkit can still be updated.

Besides, enhancing the efficiency of FL-based systems is a delicate task due to the per-client memory cost and the large number of parameters. Thus, using the FedCV framework, which is an efficient and flexible distributed training toolkit with easy-to-use APIs, along with benchmarks and different evaluation settings, can be useful for the FL research community to conduct advanced CV research studies.

V. CHALLENGES OF USING FL IN CV

FL is a promising approach for solving privacy and data distribution challenges in CV tasks such as object detection, autonomous vehicles, etc. However, there are several challenges associated with the use of FL in these applications. For instance, FL assumes that the data distribution across clients is similar. However, in CV tasks, the data may be heterogeneous, and the models need to be robust to handle variations in lighting, viewpoint, and occlusions. FL relies on communication between clients and the server. For CV tasks, such as object detection, the models can be large and the communication costs can be high. This can result in slower training times and higher energy consumption. Moreover, CV tasks require labeled data for training the models, while FL requires each client to have labeled data, which can be challenging in scenarios where
Fig. 13. Overview of FedCV platform architecture design proposed in [166]. (Figure: computer vision applications such as healthcare/medical AI, smart home/surveillance, autonomous vehicles and video surveillance sit on top of high-level FedML APIs for models and data, covering object detection, image segmentation, image classification, non-IID datasets, data loaders and partition methods; a layer of federated learning algorithms (distributed/standalone) includes FedAvg, FedOpt, Vertical FL, FedNAS, FedGKT, FedDG, FedSLD, FedCM, PartialFed, WAFFLE, PPWGAN-GP, FedSup and security-guaranteed FL algorithms; low-level APIs provide distributed communication (MPI, MQTT, RPC), the training engine, and security and privacy primitives.)

labeling data is expensive or time-consuming. Additionally, FL requires clients to share their data with the server, which can raise privacy and security concerns. Malicious clients can manipulate the data or models, compromising the integrity of the system. Lastly, FL requires aggregating the models from multiple clients to produce a global model. In CV tasks, this can be challenging due to differences in the quality of the models and the data distribution across clients.

On the other hand, FedCV [166] is a benchmarking framework for the evaluation of FL in popular CV tasks such as image classification, segmentation and object detection. It comprises non-IID datasets and various models and algorithms for experimentation. The experiments validate the aforementioned open challenges for FL in CV tasks: i) the degradation of model accuracy when non-IID data are used, ii) the complexity of optimising the training in the FL setting, and iii) the huge number of parameters of the NN models used in CV, which affects the FL performance. The same benchmark can be employed to validate all the remaining issues discussed in this section, such as node and data heterogeneity, the need for robustness and privacy, etc.

FedVision [96] is an online platform for developing object detection solutions using FL and a three-step workflow that comprises: i) image annotation, ii) horizontal FL model training and iii) model update. Since the platform is generic, it allows users to configure the learning parameters, schedule the communication between the server and clients, allocate tasks and monitor the utilization of resources. The main challenges mentioned in this section for FL also apply in the case of FedVision, which however chooses specific strategies to tackle them. For example, it uses model compression to reduce communication overhead, cloud object storage to store huge amounts of data (model parameters) in the server, and a one-stage approach, based on YOLOv3, to perform end-to-end training of the model that identifies the bounding box and the class of the object. However, the real challenges for FL

approaches in CV come with the application in real-world images from surveillance cameras, etc. [167]. The large scale of the collected data and the requirement for almost real-time inference raise more design challenges for FL experts.

VI. FUTURE DIRECTIONS

The future of FL is very promising, but also challenging for researchers, with the main directions being as follows:
• Deployment over heterogeneous environments: Smartphones are becoming the most popular edge devices for ML applications since they allow users to perform a large variety of tasks, from face detection to voice recognition. In this direction, FL can be used to support such tasks without exposing private information. On the other side, IoT networks combine wearables, mobile and stationary sensors, and smartphones in order to establish smart environments for the end users. All these constitute a diverse and heterogeneous environment in which models of varying complexity have to be communicated in the place of data in order to implement federation while protecting data privacy.
• Efficient communication: When creating techniques for federated networks, communication is a crucial bottleneck to take into account. This is due to the fact that federated networks may contain a sizable number of devices (such as millions of smartphones), and communication throughout the network may be much slower than local computing. Reducing the overall number of communication rounds or the quantity of communicated messages at each round are the two main ways to further cut communication. Communication can be made more efficient using: i) local updating techniques that cut down the overall number of communication rounds; ii) model compression techniques, including quantization, subsampling, and sparsification, which reduce the size of the messages conveyed during each update round (see the sketch after this list); and iii) decentralized topologies, which provide an alternative when the connection with the server becomes a bottleneck, particularly when working in networks with low bandwidth or excessive latency. Iterative optimization algorithms can also be parallelized using asynchronous communication, an appealing strategy for reducing stragglers in heterogeneous contexts. Moreover, 5G/6G networks will offer significantly higher speeds and lower latency compared to previous generations, which can enable faster and more efficient FL in CV applications.
• Dispersed FL: Concerns about FL's robustness exist since it could cease to function if the aggregation server fails (e.g., due to a malicious attack or a physical defect). Dispersed FL can be used as a more robust alternative, with groups of nodes with high computing power collaborating in more sub-global iterations to increase their performance and consequently support the overall performance of the federation.
• New organizational models: The term "devices" can refer to entire companies or institutions in the context of FL. For applications in predictive healthcare, hospitals are an example of an organization with a lot of patient data. Hospitals must adhere to strong privacy laws and may have to maintain local data due to ethical, administrative, or regulatory requirements. Such applications can benefit from FL since it can ease network load and enable private learning amongst many devices and organizations.
• Large language models and generative chatbots: Advanced language models, such as ChatGPT, can assist in various ways to improve the use of FL in CV by (i) offering explanations, answering questions, and providing tutorials related to FL and CV [168]; by helping researchers and practitioners understand the concepts better, they can drive wider adoption and more informed application of FL in CV [169]; (ii) simplifying complex algorithms, providing pseudocode, and suggesting optimization strategies, as well as assisting with code debugging or suggesting modifications to FL algorithms, thereby improving their efficiency or performance [170]; (iii) providing insights into the ethical implications and considerations when applying FL in CV, especially in terms of data privacy and usage; (iv) simulating client-server conversations, allowing developers to anticipate challenges and refine their systems; and (v) offering advice on best practices for integration, be it with databases, cloud services, or edge devices [171].
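As a concrete example of the compression techniques mentioned in the second bullet, a minimal top-k sparsification sketch (illustrative names and ratio, not a specific published scheme) keeps only the largest-magnitude entries of an update and transmits them as index-value pairs:

import numpy as np

def sparsify_topk(update, ratio=0.01):
    """Keep only the top `ratio` fraction of entries (by magnitude) of an update."""
    k = max(1, int(ratio * update.size))
    idx = np.argpartition(np.abs(update), -k)[-k:]     # indices of the k largest entries
    return idx.astype(np.int32), update[idx]           # what is actually transmitted

def densify(idx, values, size):
    """Server-side reconstruction of the sparse update."""
    dense = np.zeros(size, dtype=values.dtype)
    dense[idx] = values
    return dense

# toy usage: a 10,000-parameter update compressed to 1% of its entries
rng = np.random.default_rng(6)
u = rng.standard_normal(10_000)
idx, vals = sparsify_topk(u, ratio=0.01)
print(idx.size, np.allclose(densify(idx, vals, u.size)[idx], u[idx]))   # 100 True

In practice, the residual (the entries that were zeroed out) is usually accumulated locally and added to the next round's update so that no information is permanently lost.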
VII. CONCLUSION

Federated Learning (FL) has emerged as a revolutionary paradigm in the realm of Computer Vision (CV), fostering collaborative machine learning without compromising data privacy. This review navigated through the intricate alleys of FL, from its foundational concepts to the myriad applications in CV. Aggregation approaches such as averaging, Progressive Fourier, and FedGKT accentuate FL's versatility. Moreover, the inclusion of privacy technologies like the secure MPC model, differential privacy, and homomorphic encryption underscores its commitment to safeguarding data.

It is remarkable to note the vast landscape of CV applications benefitting from FL, ranging from object and face detection to innovative domains like healthcare, autonomous driving, and smart environment surveillance. Yet, like any evolving technology, FL in CV is not devoid of challenges. Issues like communication overhead, device heterogeneity, and the conundrums posed by non-IID data offer fertile grounds for future research.

While current advancements set a promising trajectory, the open issues highlight areas ripe for exploration and innovation. The challenges also underscore the importance of collaboration between researchers, practitioners, and industries to make FL more efficient, inclusive, and robust for CV.

As we stand on the cusp of a technological evolution, FL offers a beacon of hope, combining the best of collaborative learning and data privacy. The journey ahead is replete with opportunities and challenges, making it an exhilarating era for researchers and enthusiasts alike.

REFERENCES

[1] Y. Himeur, K. Ghanem, A. Alsalemi, F. Bensaali, and A. Amira, "Artificial intelligence based anomaly detection of energy consumption in buildings: A review, current trends and new perspectives," Applied Energy, vol. 287, p. 116601, 2021.
[2] A. Sayed, Y. Himeur, F. Bensaali, and A. Amira, "Artificial intelligence with iot for energy efficiency in buildings," Emerging Real-World Applications of Internet of Things, pp. 233-252, 2022.

[3] D. Ng, X. Lan, M. M.-S. Yao, W. P. Chan, and M. Feng, “Federated [26] A. R. Khan, A. Zoha, L. Mohjazi, H. Sajid, Q. Abbasi, and M. A. Imran,
learning: a collaborative effort to achieve better medical imaging models “When federated learning meets vision: An outlook on opportunities and
for individual sites that have small labelled datasets,” Quantitative challenges,” in EAI International Conference on Body Area Networks.
Imaging in Medicine and Surgery, vol. 11, no. 2, p. 852, 2021. Springer, 2021, pp. 308–319.
[4] A. Al-Kababji, F. Bensaali, S. P. Dakua, and Y. Himeur, “Automated [27] K. Giorgas and I. Varlamis, “Online federated learning with imbalanced
liver tissues delineation techniques: A systematic survey on machine class distribution,” in 24th Pan-Hellenic Conference on Informatics,
learning current trends and future orientations,” Engineering Applica- 2020, pp. 91–95.
tions of Artificial Intelligence, vol. 117, p. 105532, 2023. [28] S. Abuadbba, K. Kim, M. Kim, C. Thapa, S. A. Camtepe, Y. Gao,
[5] A. N. Sayed, Y. Himeur, and F. Bensaali, “From time-series to 2d H. Kim, and S. Nepal, “Can we use split learning on 1d cnn models
images for building occupancy prediction using deep transfer learning,” for privacy preserving training?” in Proceedings of the 15th ACM
Engineering Applications of Artificial Intelligence, vol. 119, p. 105786, Asia Conference on Computer and Communications Security, 2020, pp.
2023. 305–318.
[6] Y. Teng, J. Zhang, and T. Sun, “Data-driven decision-making model [29] M. S. Jere, T. Farnan, and F. Koushanfar, “A taxonomy of attacks on
based on artificial intelligence in higher education system of colleges federated learning,” IEEE Security & Privacy, vol. 19, no. 2, pp. 20–28,
and universities,” Expert Systems, vol. 40, no. 4, p. e12820, 2023. 2020.
[7] Y. Himeur, S. Al-Maadeed, N. Almaadeed, K. Abualsaud, A. Mohamed, [30] J. Zhang, Y. Chen, and H. Li, “Privacy leakage of adversarial training
T. Khattab, and O. Elharrouss, “Deep visual social distancing monitor- models in federated learning systems,” in Proceedings of the IEEE/CVF
ing to combat covid-19: A comprehensive survey,” Sustainable cities Conference on Computer Vision and Pattern Recognition, 2022, pp.
and society, vol. 85, p. 104064, 2022. 108–114.
[8] Y. Elmir, Y. Himeur, and A. Amira, “Ecg classification using deep cnn [31] Y. Himeur, A. Sayed, A. Alsalemi, F. Bensaali, A. Amira, I. Varlamis,
and gramian angular field,” arXiv preprint arXiv:2308.02395, 2023. M. Eirinaki, C. Sardianos, and G. Dimitrakopoulos, “Blockchain-
[9] A. Chouchane, A. Ouamane, Y. Himeur, W. Mansoor, S. Atalla, A. Ben- based recommender systems: Applications, challenges and future
zaibak, and C. Boudellal, “Improving cnn-based person re-identification opportunities,” Computer Science Review, vol. 43, p. 100439, 2022.
using score normalization,” arXiv preprint arXiv:2307.00397, 2023. [32] Y. Lu, X. Huang, Y. Dai, S. Maharjan, and Y. Zhang, “Blockchain
[10] H. Kheddar, Y. Himeur, S. Al-Maadeed, A. Amira, and F. Bensaali, and federated learning for privacy-preserved data sharing in industrial
“Deep transfer learning for automatic speech recognition: Towards better iot,” IEEE Transactions on Industrial Informatics, vol. 16, no. 6, pp.
generalization,” arXiv preprint arXiv:2304.14535, 2023. 4177–4186, 2019.
[11] Y. Himeur, S. Al-Maadeed, I. Varlamis, N. Al-Maadeed, K. Abualsaud, [33] O. A. Wahab, A. Mourad, H. Otrok, and T. Taleb, “Federated
and A. Mohamed, “Face mask detection in smart cities using deep machine learning: Survey, multi-level classification, desirable criteria
and transfer learning: lessons learned from the covid-19 pandemic,” and future directions in communication and networking systems,” IEEE
Systems, vol. 11, no. 2, p. 107, 2023. Communications Surveys & Tutorials, vol. 23, no. 2, pp. 1342–1397,
[12] F. Esposito and D. Malerba, “Machine learning in computer vision,” 2021.
Applied Artificial Intelligence, vol. 15, no. 8, pp. 693–705, 2001. [34] Q. Li, Z. Wen, Z. Wu, S. Hu, N. Wang, Y. Li, X. Liu, and B. He, “A
[13] A. Copiaco, Y. Himeur, A. Amira, W. Mansoor, F. Fadli, S. Atalla, survey on federated learning systems: vision, hype and reality for data
and S. S. Sohail, “An innovative deep anomaly detection of building privacy and protection,” IEEE Transactions on Knowledge and Data
energy consumption using energy time-series images,” Engineering Engineering, 2021.
Applications of Artificial Intelligence, vol. 119, p. 105775, 2023. [35] M. Aledhari, R. Razzak, R. M. Parizi, and F. Saeed, “Federated learning:
[14] S. Alyamkin, M. Ardi, A. C. Berg, A. Brighton, B. Chen, Y. Chen, H.-P. A survey on enabling technologies, protocols, and applications,” IEEE
Cheng, Z. Fan, C. Feng, B. Fu et al., “Low-power computer vision: Access, vol. 8, pp. 140 699–140 725, 2020.
Status, challenges, and opportunities,” IEEE Journal on Emerging and [36] D. C. Nguyen, M. Ding, P. N. Pathirana, A. Seneviratne, J. Li, and
Selected Topics in Circuits and Systems, vol. 9, no. 2, pp. 411–421, H. V. Poor, “Federated learning for internet of things: A comprehensive
2019. survey,” IEEE Communications Surveys & Tutorials, vol. 23, no. 3, pp.
[15] Z. Tong and G. Tanaka, “Reservoir computing with untrained con- 1622–1658, 2021.
volutional neural networks for image recognition,” in 2018 24th [37] S. AbdulRahman, H. Tout, H. Ould-Slimane, A. Mourad, C. Talhi,
International Conference on Pattern Recognition (ICPR). IEEE, 2018, and M. Guizani, “A survey on federated learning: The journey from
pp. 1289–1294. centralized to distributed on-site learning and beyond,” IEEE Internet
[16] S. Mohamed and G. Rubino, “A study of real-time packet video quality of Things Journal, vol. 8, no. 7, pp. 5476–5497, 2020.
using random neural networks,” IEEE transactions on circuits and [38] Y. Cheng, Y. Liu, T. Chen, and Q. Yang, “Federatfed learning for
systems for video technology, vol. 12, no. 12, pp. 1071–1083, 2002. privacy-preserving ai,” Communications of the ACM, vol. 63, no. 12,
[17] Y. Himeur, S. S. Sohail, F. Bensaali, A. Amira, and M. Alazab, pp. 33–36, 2020.
“Latest trends of security and privacy in recommender systems: a [39] S. Yang, B. Ren, X. Zhou, and L. Liu, “Parallel distributed logistic re-
comprehensive review and future perspectives,” Computers & Security, gression for vertical federated learning without third-party coordinator,”
vol. 118, p. 102746, 2022. arXiv preprint arXiv:1911.09824, 2019.
[18] S. Quach, P. Thaichon, K. D. Martin, S. Weaven, and R. W. Palmatier, [40] S. Sharma, C. Xing, Y. Liu, and Y. Kang, “Secure and efficient federated
“Digital technologies: tensions in privacy and data,” Journal of the transfer learning,” in 2019 IEEE International Conference on Big Data
Academy of Marketing Science, vol. 50, no. 6, pp. 1299–1323, 2022. (Big Data). IEEE, 2019, pp. 2569–2576.
[19] Y. Himeur and K. A. Sadi, “Robust video copy detection based on ring [41] G. Bjelobaba, A. Savić, T. Tošić, I. Stefanović, and B. Kocić,
decomposition based binarized statistical image features and invariant “Collaborative learning supported by blockchain technology as a model
color descriptor (rbsif-icd),” Multimedia Tools and Applications, vol. 77, for improving the educational process,” Sustainability, vol. 15, no. 6,
pp. 17 309–17 331, 2018. p. 4780, 2023.
[20] D. Patrikar and M. Parate, “Anomaly detection using edge computing [42] Y. Liu, X. Zhang, Y. Kang, L. Li, T. Chen, M. Hong, and Q. Yang,
in video surveillance system: review,” Int J Multimed Info Retr, vol. 11, “Fedbcd: A communication-efficient collaborative learning framework
pp. 85–110, 2022. for distributed features,” IEEE Transactions on Signal Processing,
[21] A. Xiang, “Being’seen’vs.’mis-seen’: Tensions between privacy and vol. 70, pp. 4277–4290, 2022.
fairness in computer vision,” Harvard Journal of Law & Technology, [43] E. Gabrielli, G. Pica, and G. Tolomei, “A survey on decentralized
Forthcoming, 2022. federated learning,” arXiv preprint arXiv:2308.04604, 2023.
[22] A. Alsalemi, Y. Himeur, F. Bensaali, and A. Amira, “An innovative [44] J. Konečnỳ, H. B. McMahan, D. Ramage, and P. Richtárik, “Federated
edge-based internet of energy solution for promoting energy saving in optimization: Distributed machine learning for on-device intelligence,”
buildings,” Sustainable Cities and Society, vol. 78, p. 103571, 2022. arXiv preprint arXiv:1610.02527, 2016.
[23] A. Sayed, Y. Himeur, A. Alsalemi, F. Bensaali, and A. Amira, [45] V. Smith, C.-K. Chiang, M. Sanjabi, and A. S. Talwalkar, “Federated
“Intelligent edge-based recommender system for internet of energy multi-task learning,” Advances in neural information processing systems,
applications,” IEEE Systems Journal, vol. 16, no. 3, pp. 5001–5010, vol. 30, 2017.
2021. [46] A. Linardos, K. Kushibar, S. Walsh, P. Gkontra, and K. Lekadir,
[24] Q. Yang, Y. Liu, Y. Cheng, Y. Kang, T. Chen, and H. Yu, “Federated “Federated learning for multi-center imaging diagnostics: A study in
learning,” Synthesis Lectures on Artificial Intelligence and Machine cardiovascular disease,” arXiv preprint arXiv:2107.03901, 2021.
Learning, vol. 13, no. 3, pp. 1–207, 2019. [47] Z. Chen, M. Zhu, C. Yang, and Y. Yuan, “Personalized retrogress-
[25] O. Gupta and R. Raskar, “Distributed learning of deep neural network resilient framework for real-world medical federated learning,” in
over multiple agents,” Journal of Network and Computer Applications, International Conference on Medical Image Computing and Computer-
vol. 116, pp. 1–8, 2018. Assisted Intervention. Springer, 2021, pp. 347–356.

[48] K. Doshi and Y. Yilmaz, “Federated learning-based driver activity [69] D. Shome and T. Kar, “Fedaffect: Few-shot federated learning for facial
recognition for edge devices,” in Proceedings of the IEEE/CVF expression recognition,” in Proceedings of the IEEE/CVF International
Conference on Computer Vision and Pattern Recognition, 2022, pp. Conference on Computer Vision, 2021, pp. 4168–4175.
3338–3346. [70] H. Kheddar, M. Hemis, Y. Himeur, D. Megı́as, and A. Amira, “Deep
[49] H. Zhu, “On the relationship between (secure) multi-party computation learning for diverse data types steganalysis: A review,” arXiv preprint
and (secure) federated learning,” arXiv preprint arXiv:2008.02609, arXiv:2308.04522, 2023.
2020. [71] W. Ji, J. Li, M. Zhang, Y. Piao, and H. Lu, “Accurate rgb-d salient
[50] D. Byrd and A. Polychroniadou, “Differentially private secure multi- object detection via collaborative learning,” in Computer Vision–ECCV
party computation for federated learning in financial applications,” 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020,
in Proceedings of the First ACM International Conference on AI in Proceedings, Part XVIII 16. Springer, 2020, pp. 52–69.
Finance, 2020, pp. 1–9. [72] Q. Fan, D.-P. Fan, H. Fu, C.-K. Tang, L. Shao, and Y.-W. Tai, “Group
[51] R. Kanagavelu, Z. Li, J. Samsudin, Y. Yang, F. Yang, R. S. M. Goh, collaborative learning for co-salient object detection,” in Proceedings of
M. Cheah, P. Wiwatphonthana, K. Akkarajitsakul, and S. Wang, “Two- the IEEE/CVF Conference on Computer Vision and Pattern Recognition,
phase multi-party computation enabled privacy-preserving federated 2021, pp. 12 288–12 298.
learning,” in 2020 20th IEEE/ACM International Symposium on Cluster, [73] S. Teeparthi, V. Jatla, M. S. Pattichis, S. Celedón-Pattichis, and
Cloud and Internet Computing (CCGRID). IEEE, 2020, pp. 410–419. C. LópezLeiva, “Fast hand detection in collaborative learning environ-
[52] A. Triastcyn and B. Faltings, “Federated learning with bayesian ments,” in International Conference on Computer Analysis of Images
differential privacy,” in 2019 IEEE International Conference on Big and Patterns. Springer, 2021, pp. 445–454.
Data (Big Data). IEEE, 2019, pp. 2587–2596. [74] S. Teeparthi, “Long term object detection and tracking in collaborative
[53] X. Wu, Y. Zhang, M. Shi, P. Li, R. Li, and N. N. Xiong, “An adaptive learning environments,” arXiv preprint arXiv:2106.07556, 2021.
federated learning scheme with differential privacy preserving,” Future [75] C. Huang and R. Nevatia, “High performance object detection by
Generation Computer Systems, vol. 127, pp. 362–372, 2022. collaborative learning of joint ranking of granules features,” in 2010
[54] U. Shah, I. Dave, J. Malde, J. Mehta, and S. Kodeboyina, “Maintaining IEEE Computer Society Conference on Computer Vision and Pattern
privacy in medical imaging with federated learning, deep learning, Recognition. IEEE, 2010, pp. 41–48.
differential privacy, and encrypted computation,” in 2021 6th Inter- [76] X. Fang, Z. Kuang, R. Zhang, X. Shao, and H. Wang, “Collaborative
national Conference for Convergence in Technology (I2CT). IEEE, learning in bounding box regression for object detection,” Pattern
2021, pp. 1–6. Recognition Letters, 2021.
[55] M. Adnan, S. Kalra, J. C. Cresswell, G. W. Taylor, and H. R. Tizhoosh, [77] Y. Zhou, X. He, L. Huang, L. Liu, F. Zhu, S. Cui, and L. Shao, “Col-
“Federated learning and differential privacy for medical image analysis,” laborative learning of semi-supervised segmentation and classification
Scientific reports, vol. 12, no. 1, pp. 1–10, 2022. for medical images,” in Proceedings of the IEEE/CVF Conference on
[56] O. Choudhury, A. Gkoulalas-Divanis, T. Salonidis, I. Sylla, Y. Park, Computer Vision and Pattern Recognition, 2019, pp. 2079–2088.
G. Hsu, and A. Das, “Differential privacy-enabled federated learning [78] J. Wang, J. Yao, Y. Zhang, and R. Zhang, “Collaborative learning for
for sensitive health data,” arXiv preprint arXiv:1910.02578, 2019. weakly supervised object detection,” arXiv preprint arXiv:1802.03531,
[57] A. Ziller, D. Usynin, R. Braren, M. Makowski, D. Rueckert, and 2018.
G. Kaissis, “Medical imaging deep learning with differential privacy,” [79] D. Zhang, J. Han, L. Zhao, and D. Meng, “Leveraging prior-knowledge
Scientific Reports, vol. 11, no. 1, pp. 1–8, 2021. for weakly supervised object detection under a collaborative self-paced
[58] L. Zhang, J. Xu, P. Vijayakumar, P. K. Sharma, and U. Ghosh, curriculum learning framework,” International Journal of Computer
“Homomorphic encryption-based privacy-preserving federated learning Vision, vol. 127, no. 4, pp. 363–380, 2019.
in iot-enabled healthcare system,” IEEE Transactions on Network [80] S. Chen, D. Shao, X. Shu, C. Zhang, and J. Wang, “FCC-Net: A full-
Science and Engineering, pp. 1–17, 2022. coverage collaborative network for weakly supervised remote sensing
[59] D. Stripelis, H. Saleem, T. Ghai, N. Dhinagar, U. Gupta, C. Anastasiou, object detection,” Electronics, vol. 9, no. 9, p. 1356, 2020.
G. Ver Steeg, S. Ravi, M. Naveed, P. M. Thompson et al., “Secure [81] Y. Liang, G. Qin, M. Sun, J. Qin, J. Yan, and Z. Zhang, “Semantic
neuroimaging analysis using federated learning with homomorphic and detail collaborative learning network for salient object detection,”
encryption,” in 17th International Symposium on Medical Information Neurocomputing, vol. 462, pp. 478–490, 2021.
Processing and Analysis, vol. 12088. SPIE, 2021, pp. 351–359. [82] J. Seo and H. Park, “Object recognition in very low resolution images
[60] R. Kumar, J. Kumar, A. A. Khan, H. Ali, C. M. Bernard, R. U. Khan, using deep collaborative learning,” IEEE Access, vol. 7, pp. 134 071–
S. Zeng et al., “Blockchain and homomorphic encryption based privacy- 134 082, 2019.
preserving model aggregation for medical images,” Computerized [83] Q. Guo, X. Wang, Y. Wu, Z. Yu, D. Liang, X. Hu, and P. Luo, “Online
Medical Imaging and Graphics, vol. 102, p. 102139, 2022. knowledge distillation via collaborative learning,” in Proceedings of the
[61] T. S. Brisimi, R. Chen, T. Mela, A. Olshevsky, I. C. Paschalidis, IEEE/CVF Conference on Computer Vision and Pattern Recognition,
and W. Shi, “Federated learning of predictive models from federated 2020, pp. 11 020–11 029.
electronic health records,” International journal of medical informatics, [84] P. Yu and Y. Liu, “Federated object detection: Optimizing object
vol. 112, pp. 59–67, 2018. detection model with federated learning,” in Proceedings of the 3rd
[62] Y. Zhao, H. Liu, H. Li, P. Barnaghi, and H. Haddadi, “Semi- International Conference on Vision, Image and Signal Processing, 2019,
supervised federated learning for activity recognition,” arXiv preprint pp. 1–6.
arXiv:2011.00851, 2020. [85] H. Choi and I. V. Bajić, “Deep feature compression for collaborative
[63] A. Grammenos, R. Mendoza Smith, J. Crowcroft, and C. Mascolo, “Fed- object detection,” in 2018 25th IEEE International Conference on
erated principal component analysis,” Advances in Neural Information Image Processing (ICIP). IEEE, 2018, pp. 3743–3747.
Processing Systems, vol. 33, pp. 6453–6464, 2020. [86] P. Vepakomma, O. Gupta, T. Swedish, and R. Raskar, “Split learning
[64] H. H. Kumar, V. Karthik, and M. K. Nair, “Federated k-means clustering: for health: Distributed deep learning without sharing raw patient data,”
A novel edge ai based approach for privacy preservation,” in 2020 IEEE arXiv preprint arXiv:1812.00564, 2018.
International Conference on Cloud Computing in Emerging Markets [87] Y. He, Y. Kang, J. Luo, L. Fan, and Q. Yang, “A hybrid self-supervised
(CCEM). IEEE, 2020, pp. 52–56. learning framework for vertical federated learning,” arXiv preprint
[65] H. Kassem, D. Alapatt, P. Mascagni, C. AI4SafeChole, A. Karargyris, arXiv:2208.08934, 2022.
and N. Padoy, “Federated cycling (fedcy): Semi-supervised federated [88] W. Zhuang, Y. Wen, and S. Zhang, “Divergence-aware federated self-
learning of surgical phases,” IEEE Transactions on Medical Imaging, supervised learning,” arXiv preprint arXiv:2204.04385, 2022.
pp. 1–1, 2022. [89] A. Saeed, F. D. Salim, T. Ozcelebi, and J. Lukkien, “Federated
[66] Y. Rehman, Y. Gao, J. Shen, P. de Gusmão, and N. Lane, “Federated self-supervised learning of multisensor representations for embedded
self-supervised learning for video understanding,” Lecture Notes in intelligence,” IEEE Internet of Things Journal, vol. 8, no. 2, pp. 1030–
Computer Science (including subseries Lecture Notes in Artificial 1040, 2020.
Intelligence and Lecture Notes in Bioinformatics), vol. 13691 LNCS, [90] R. Yan, L. Qu, Q. Wei, S.-C. Huang, L. Shen, D. Rubin, L. Xing, and
pp. 506–522, 2022, cited By 0. Y. Zhou, “Label-efficient self-supervised federated learning for tackling
[67] I. Dave, C. Chen, and M. Shah, “Spact: Self-supervised privacy data heterogeneity in medical imaging,” IEEE Transactions on Medical
preservation for action recognition,” in Proceedings of the IEEE/CVF Imaging, 2023.
Conference on Computer Vision and Pattern Recognition, vol. 2022- [91] R. Zhu, K. Yin, H. Xiong, H. Tang, and G. Yin, “Masked face detection
June. IEEE Computer Society, 2022, pp. 20 132–20 141, cited By algorithm in the dense crowd based on federated learning,” Wireless
3. Communications and Mobile Computing, vol. 2021, 2021.
[68] S. Li, Y. Mao, J. Li, Y. Xu, J. Li, X. Chen, S. Liu, and X. Zhao, [92] Y. Himeur, S. Al-Maadeed, H. Kheddar, N. Al-Maadeed, K. Abualsaud,
“Fedutn: federated self-supervised learning with updating target network,” A. Mohamed, and T. Khattab, “Video surveillance using deep transfer
Applied Intelligence, 2022, cited By 0. learning and deep domain adaptation: Towards better generalization,”

Engineering Applications of Artificial Intelligence, vol. 119, p. 105698, [113] B. Yan, B. Liu, L. Wang, Y. Zhou, Z. Liang, M. Liu, and C.-Z. Xu,
2023. “Fedcm: A real-time contribution measurement method for participants
[93] A. B. Sada, M. A. Bouras, J. Ma, H. Runhe, and H. Ning, “A distributed in federated learning,” in 2021 International Joint Conference on Neural
video analytics architecture based on edge-computing and federated Networks (IJCNN). IEEE, 2021, pp. 1–8.
learning,” in 2019 IEEE Intl Conf on Dependable, Autonomic and [114] H. R. Roth, K. Chang, P. Singh, N. Neumark, W. Li, V. Gupta,
Secure Computing, Intl Conf on Pervasive Intelligence and Computing, S. Gupta, L. Qu, A. Ihsani, B. C. Bizzo et al., “Federated learning
Intl Conf on Cloud and Big Data Computing, Intl Conf on Cyber Sci- for breast density classification: A real-world implementation,” in
ence and Technology Congress (DASC/PiCom/CBDCom/CyberSciTech). Domain Adaptation and Representation Transfer, and Distributed and
IEEE, 2019, pp. 215–220. Collaborative Learning. Springer, 2020, pp. 181–191.
[94] F. Concone, C. Ferdico, G. L. Re, and M. Morana, “A federated learning [115] Z. Yan, J. Wicaksana, Z. Wang, X. Yang, and K.-T. Cheng, “Variation-
approach for distributed human activity recognition,” in 2022 IEEE aware federated learning with multi-source decentralized medical image
International Conference on Smart Computing (SMARTCOMP). IEEE, data,” IEEE Journal of Biomedical and Health Informatics, 2020.
2022, pp. 269–274. [116] S. Silva, B. A. Gutman, E. Romero, P. M. Thompson, A. Altmann,
[95] Y. Liu, J. Nie, X. Li, S. H. Ahmed, W. Y. B. Lim, and C. Miao, and M. Lorenzi, “Federated learning in distributed medical databases:
“Federated learning in the sky: Aerial-ground air quality sensing Meta-analysis of large-scale subcortical brain data,” in 2019 IEEE 16th
framework with uav swarms,” IEEE Internet of Things Journal, 2020. international symposium on biomedical imaging (ISBI 2019). IEEE,
[96] Y. Liu, A. Huang, Y. Luo, H. Huang, Y. Liu, Y. Chen, L. Feng, T. Chen, 2019, pp. 270–274.
H. Yu, and Q. Yang, “Fedvision: An online visual object detection [117] M. J. Sheller, B. Edwards, G. A. Reina, J. Martin, S. Pati, A. Kotrotsou,
platform powered by federated learning,” in Proceedings of the AAAI M. Milchenko, W. Xu, D. Marcus, R. R. Colen et al., “Federated
Conference on Artificial Intelligence, vol. 34, 2020, pp. 13 172–13 179. learning in medicine: facilitating multi-institutional collaborations
[97] A. Catalfamo, A. Celesti, M. Fazio, G. Randazzo, and M. Villari, “A without sharing patient data,” Scientific reports, vol. 10, no. 1, pp.
platform for federated learning on the edge: a video analysis use case,” 1–12, 2020.
in 2022 IEEE Symposium on Computers and Communications (ISCC), [118] P. Guo, P. Wang, J. Zhou, S. Jiang, and V. M. Patel, “Multi-institutional
2022, pp. 1–7. collaborations for improving deep learning-based magnetic resonance
[98] P. Jain, S. Goenka, S. Bagchi, B. Banerjee, and S. Chaterji, “Federated image reconstruction using federated learning,” in Proceedings of the
action recognition on heterogeneous embedded devices,” arXiv preprint IEEE/CVF Conference on Computer Vision and Pattern Recognition,
arXiv:2107.12147, 2021. 2021, pp. 2423–2432.
[99] B. Zhang, J. Wang, J. Fu, and J. Xia, “Driver action recognition [119] M. Y. Lu, R. J. Chen, D. Kong, J. Lipkova, R. Singh, D. F. Williamson,
using federated learning,” in 2021 the 7th International Conference on T. Y. Chen, and F. Mahmood, “Federated learning for computational
Communication and Information Processing (ICCIP), 2021, pp. 74–77. pathology on gigapixel whole slide images,” Medical image analysis,
[100] Z. Shi, L. Zhang, Y. Liu, X. Cao, Y. Ye, M.-M. Cheng, and vol. 76, p. 102298, 2022.
G. Zheng, “Crowd counting with deep negative correlation learning,” [120] W. Li, F. Milletarı̀, D. Xu, N. Rieke, J. Hancox, W. Zhu, M. Baust,
in Proceedings of the IEEE conference on computer vision and pattern Y. Cheng, S. Ourselin, M. J. Cardoso et al., “Privacy-preserving
recognition, 2018, pp. 5382–5390. federated brain tumour segmentation,” in International workshop on
[101] A. Zhang, J. Shen, Z. Xiao, F. Zhu, X. Zhen, X. Cao, and L. Shao, machine learning in medical imaging. Springer, 2019, pp. 133–141.
“Relational attention network for crowd counting,” in Proceedings of [121] C. I. Bercea, B. Wiestler, D. Rueckert, and S. Albarqouni, “Feddis:
the IEEE/CVF international conference on computer vision, 2019, pp. Disentangled federated learning for unsupervised brain pathology
6788–6797. segmentation,” arXiv preprint arXiv:2103.03705, 2021.
[102] Y. Jiang, R. Cong, C. Shu, A. Yang, Z. Zhao, and G. Min, “Federated [122] A. E. Cetinkaya, M. Akin, and S. Sagiroglu, “Improving performance
learning based mobile crowd sensing with unreliable user data,” in 2020 of federated learning based medical image analysis in non-iid settings
IEEE 22nd International Conference on High Performance Computing using image augmentation,” in 2021 International Conference on
and Communications; IEEE 18th International Conference on Smart Information Security and Cryptology (ISCTURKEY). IEEE, 2021, pp.
City; IEEE 6th International Conference on Data Science and Systems 69–74.
(HPCC/SmartCity/DSS). IEEE, 2020, pp. 320–327. [123] B. Han, R. Jhaveri, H. Wang, D. Qiao, and J. Du, “Application of robust
[103] X. Bao, C. Su, Y. Xiong, W. Huang, and Y. Hu, “Flchain: A blockchain zero-watermarking scheme based on federated learning for securing the
for auditable federated learning with trust and incentive,” in 2019 5th healthcare data,” IEEE Journal of Biomedical and Health Informatics,
International Conference on Big Data Computing and Communications 2021.
(BIGCOM). IEEE, 2019, pp. 151–159. [124] B. G. Tekgul, Y. Xia, S. Marchal, and N. Asokan, “Waffle: Watermark-
[104] B. R. Kiran, D. M. Thomas, and R. Parakkal, “An overview of deep ing in federated learning,” in 2021 40th International Symposium on
learning based methods for unsupervised and semi-supervised anomaly Reliable Distributed Systems (SRDS). IEEE, 2021, pp. 310–320.
detection in videos,” Journal of Imaging, vol. 4, no. 2, p. 36, 2018. [125] X. Liu, S. Shao, Y. Yang, K. Wu, W. Yang, and H. Fang, “Secure
[105] S. Singh, S. Bhardwaj, H. Pandey, and G. Beniwal, “Anomaly detection federated learning model verification: A client-side backdoor triggered
using federated learning,” in Proceedings of International Conference watermarking scheme,” in 2021 IEEE International Conference on
on Artificial Intelligence and Applications. Springer, 2021, pp. 141– Systems, Man, and Cybernetics (SMC). IEEE, 2021, pp. 2414–2419.
148. [126] D. Połap, G. Srivastava, and K. Yu, “Agent architecture of an intelligent
[106] S. Bharti, A. McGibney, and T. O’Gorman, “Edge-enabled federated medical system based on federated learning and blockchain technology,”
learning for vision based product quality inspection,” in 2022 33rd Journal of Information Security and Applications, vol. 58, p. 102748,
Irish Signals and Systems Conference (ISSC). IEEE, 2022, pp. 1–6. 2021.
[107] A. Esteva, K. Chou, S. Yeung, N. Naik, A. Madani, A. Mottaghi, Y. Liu, [127] A. Ziller, D. Usynin, N. Remerscheid, M. Knolle, M. Makowski,
E. Topol, J. Dean, and R. Socher, “Deep learning-enabled medical R. Braren, D. Rueckert, and G. Kaissis, “Differentially private federated
computer vision,” NPJ digital medicine, vol. 4, no. 1, pp. 1–9, 2021. deep learning for multi-site medical image segmentation,” arXiv
[108] M. J. Sheller, G. A. Reina, B. Edwards, J. Martin, and S. Bakas, preprint arXiv:2107.02586, 2021.
“Multi-institutional deep learning modeling without sharing patient data: [128] K. Guo, N. Li, J. Kang, and J. Zhang, “Towards efficient federated
A feasibility study on brain tumor segmentation,” in International learning-based scheme in medical cyber-physical systems for distributed
MICCAI Brainlesion Workshop. Springer, 2018, pp. 92–104. data,” Software: Practice and Experience, vol. 51, no. 11, pp. 2274–
[109] X. Li, Y. Gu, N. Dvornek, L. H. Staib, P. Ventola, and J. S. Duncan, 2289, 2021.
“Multi-site fmri analysis using privacy-preserving federated learning [129] M. Jiang, Z. Wang, and Q. Dou, “Harmofl: Harmonizing local and
and domain adaptation: Abide results,” Medical Image Analysis, vol. 65, global drifts in federated learning on heterogeneous medical images,”
p. 101765, 2020. arXiv preprint arXiv:2112.10775, 2021.
[110] I. Dayan, H. R. Roth, A. Zhong, A. Harouni, A. Gentili, A. Z. Abidin, [130] Q. Liu, C. Chen, J. Qin, Q. Dou, and P.-A. Heng, “Feddg: Federated
A. Liu, A. B. Costa, B. J. Wood, C.-S. Tsai et al., “Federated learning domain generalization on medical image segmentation via episodic
for predicting clinical outcomes in patients with covid-19,” Nature learning in continuous frequency space,” in Proceedings of the
medicine, vol. 27, no. 10, pp. 1735–1743, 2021. IEEE/CVF Conference on Computer Vision and Pattern Recognition,
[111] I. Feki, S. Ammar, Y. Kessentini, and K. Muhammad, “Federated 2021, pp. 1013–1023.
learning for covid-19 screening from chest x-ray images,” Applied Soft [131] B. Sun, H. Huo, Y. Yang, and B. Bai, “Partialfed: Cross-domain
Computing, vol. 106, p. 107330, 2021. personalized federated learning via partial initialization,” Advances in
[112] A. E. Cetinkaya, M. Akin, and S. Sagiroglu, “A communication efficient Neural Information Processing Systems, vol. 34, 2021.
federated learning approach to multi chest diseases classification,” [132] H.-P. Wang, S. U. Stich, Y. He, and M. Fritz, “Progfed: Effective, com-
in 2021 6th International Conference on Computer Science and munication, and computation efficient federated learning by progressive
Engineering (UBMK). IEEE, 2021, pp. 429–434. training,” arXiv preprint arXiv:2110.05323, 2021.

[133] J. Luo and S. Wu, “Fedsld: Federated learning with shared label distribu- area-efficient and flexible architectures for optimal ate pairing on fpga,”
tion for medical image classification,” arXiv preprint arXiv:2110.08378, arXiv preprint arXiv:2308.04261, 2023.
2021. [154] D. Li and J. Wang, “Fedmd: Heterogenous federated learning via model
[134] G. Litjens, O. Debats, J. Barentsz, N. Karssemeijer, and H. Huisman, distillation,” arXiv preprint arXiv:1910.03581, 2019.
“Computer-aided detection of prostate cancer in mri,” IEEE transactions [155] J. Pang, Y. Huang, Z. Xie, Q. Han, and Z. Cai, “Realizing the
on medical imaging, vol. 33, no. 5, pp. 1083–1092, 2014. heterogeneity: A self-organized federated learning framework for iot,”
[135] H. Zhang, J. Bosch, and H. H. Olsson, “End-to-end federated learning IEEE Internet of Things Journal, vol. 8, no. 5, pp. 3088–3098, 2020.
for autonomous driving vehicles,” in 2021 International Joint Confer- [156] A. Alsalemi, Y. Himeur, F. Bensaali, and A. Amira, “Smart sensing and
ence on Neural Networks (IJCNN). IEEE, 2021, pp. 1–8. end-users’ behavioral change in residential buildings: An edge-based
[136] A. Nguyen, T. Do, M. Tran, B. X. Nguyen, C. Duong, T. Phan, internet of energy perspective,” IEEE Sensors Journal, vol. 21, no. 24,
E. Tjiputra, and Q. D. Tran, “Deep federated learning for autonomous pp. 27 623–27 631, 2021.
driving,” in 2022 IEEE Intelligent Vehicles Symposium (IV). IEEE, [157] H. Zhu, J. Xu, S. Liu, and Y. Jin, “Federated learning on non-iid data:
2022, pp. 1824–1830. A survey,” Neurocomputing, vol. 465, pp. 371–390, 2021.
[137] X. Zhou, W. Liang, J. She, Z. Yan, I. Kevin, and K. Wang, “Two- [158] H. Wang, Z. Kaplan, D. Niu, and B. Li, “Optimizing federated learning
layer federated learning with heterogeneous model aggregation for on non-iid data with reinforcement learning,” in IEEE INFOCOM
6g supported internet of vehicles,” IEEE Transactions on Vehicular 2020-IEEE Conference on Computer Communications. IEEE, 2020,
Technology, vol. 70, no. 6, pp. 5308–5317, 2021. pp. 1698–1707.
[159] C. Briggs, Z. Fan, and P. Andras, “Federated learning with hierarchical
[138] L. U. Khan, Y. K. Tun, M. Alsenwi, M. Imran, Z. Han, and C. S. Hong,
clustering of local updates to improve training on non-iid data,” in 2020
“A dispersed federated learning framework for 6g-enabled autonomous
International Joint Conference on Neural Networks (IJCNN). IEEE,
driving cars,” IEEE Transactions on Network Science and Engineering,
2020, pp. 1–9.
2022.
[160] H. Bousbiat, Y. Himeur, I. Varlamis, F. Bensaali, and A. Amira, “Neural
[139] Y. Li, X. Tao, X. Zhang, J. Liu, and J. Xu, “Privacy-preserved federated load disaggregation: Meta-analysis, federated learning and beyond,”
learning for autonomous driving,” IEEE Transactions on Intelligent Energies, vol. 16, no. 2, p. 991, 2023.
Transportation Systems, vol. 23, no. 7, pp. 8423–8434, 2021. [161] S. Pouriyeh, O. Shahid, R. M. Parizi, Q. Z. Sheng, G. Srivastava,
[140] I. Donevski, J. J. Nielsen, and P. Popovski, “On addressing heterogeneity L. Zhao, and M. Nasajpour, “Secure smart communication efficiency
in federated learning for autonomous vehicles connected to a drone in federated learning: Achievements and challenges,” Applied Sciences,
orchestrator,” Frontiers in Communications and Networks, vol. 2, p. vol. 12, no. 18, p. 8980, 2022.
709946, 2021. [162] L. Lyu, H. Yu, X. Ma, L. Sun, J. Zhao, Q. Yang, and P. S. Yu, “Privacy
[141] A. M. Elbir, B. Soner, S. Çöleri, D. Gündüz, and M. Bennis, “Federated and robustness in federated learning: Attacks and defenses,” arXiv
learning in vehicular networks,” in 2022 IEEE International Mediter- preprint arXiv:2012.06337, 2020.
ranean Conference on Communications and Networking (MeditCom). [163] S. Shen, S. Tople, and P. Saxena, “Auror: Defending against poisoning
IEEE, 2022, pp. 72–77. attacks in collaborative deep learning systems,” in Proceedings of the
[142] D. Jallepalli, N. C. Ravikumar, P. V. Badarinath, S. Uchil, and M. A. 32nd Annual Conference on Computer Security Applications, 2016, pp.
Suresh, “Federated learning for object detection in autonomous vehicles,” 508–519.
in 2021 IEEE Seventh International Conference on Big Data Computing [164] S. Andreina, G. A. Marson, H. Möllering, and G. Karame, “Baffle:
Service and Applications (BigDataService). IEEE, 2021, pp. 107–114. Backdoor detection via feedback-based federated learning,” in 2021
[143] K. Xie, Z. Zhang, B. Li, J. Kang, D. Niyato, S. Xie, and Y. Wu, IEEE 41st International Conference on Distributed Computing Systems
“Efficient federated learning with spike neural networks for traffic sign (ICDCS). IEEE, 2021, pp. 852–863.
recognition,” IEEE Transactions on Vehicular Technology, vol. 71, [165] Y. Qi, M. S. Hossain, J. Nie, and X. Li, “Privacy-preserving blockchain-
no. 9, pp. 9980–9992, 2022. based federated learning for traffic flow prediction,” Future Generation
[144] F. Sattler, S. Wiedemann, K.-R. Müller, and W. Samek, “Robust and Computer Systems, vol. 117, pp. 328–337, 2021.
communication-efficient federated learning from non-iid data,” IEEE [166] C. He, A. D. Shah, Z. Tang, D. F. N. Sivashunmugam, K. Bhogaraju,
transactions on neural networks and learning systems, vol. 31, no. 9, M. Shimpi, L. Shen, X. Chu, M. Soltanolkotabi, and S. Avestimehr,
pp. 3400–3413, 2019. “Fedcv: a federated learning framework for diverse computer vision
[145] F. Haddadpour, M. M. Kamani, A. Mokhtari, and M. Mahdavi, tasks,” arXiv preprint arXiv:2111.11066, 2021.
“Federated learning with compression: Unified analysis and sharp [167] J. Luo, X. Wu, Y. Luo, A. Huang, Y. Huang, Y. Liu, and Q. Yang,
guarantees,” in International Conference on Artificial Intelligence and “Real-world image datasets for federated learning,” arXiv preprint
Statistics. PMLR, 2021, pp. 2350–2358. arXiv:1910.11089, 2019.
[146] W. Luping, W. Wei, and L. Bo, “Cmfl: Mitigating communication [168] S. S. Sohail, F. Farhat, Y. Himeur, M. Nadeem, D. Ø. Madsen, Y. Singh,
overhead for federated learning,” in 2019 IEEE 39th international S. Atalla, and W. Mansoor, “Decoding chatgpt: A taxonomy of existing
conference on distributed computing systems (ICDCS). IEEE, 2019, research, current challenges, and possible future directions,” Journal of
pp. 954–964. King Saud University-Computer and Information Sciences, p. 101675,
[147] I. Varlamis, C. Sardianos, C. Chronis, G. Dimitrakopoulos, Y. Himeur, 2023.
A. Alsalemi, F. Bensaali, and A. Amira, “Using big data and [169] F. Farhat, E. S. Silva, H. Hassani, D. Ø. Madsen, S. S. Sohail, Y. Himeur,
federated learning for generating energy efficiency recommendations,” M. A. Alam, and A. Zafar, “Analyzing the scholarly footprint of chatgpt:
International Journal of Data Science and Analytics, pp. 1–17, 2022. mapping the progress and identifying future trends,” 2023.
[148] M. Tang, X. Ning, Y. Wang, J. Sun, Y. Wang, H. Li, and Y. Chen, [170] S. S. Sohail, F. Farhat, Y. Himeur, M. Nadeem, D. Ø. Madsen, Y. Singh,
“Fedcor: Correlation-based active client selection strategy for heteroge- S. Atalla, and W. Mansoor, “The future of gpt: A taxonomy of existing
neous federated learning,” in 2022 IEEE/CVF Conference on Computer chatgpt research, current challenges, and possible future directions,”
Vision and Pattern Recognition (CVPR), 2022, pp. 10 092–10 101. Current Challenges, and Possible Future Directions (April 8, 2023),
2023.
[149] L. Qu, Y. Zhou, P. P. Liang, Y. Xia, F. Wang, E. Adeli, L. Fei-
[171] S. S. Sohail, D. Ø. Madsen, Y. Himeur, and M. Ashraf, “Using chatgpt
Fei, and D. Rubin, “Rethinking architecture design for tackling data
to navigate ambivalent and contradictory research findings on artificial
heterogeneity in federated learning,” in Proceedings of the IEEE/CVF
intelligence,” Available at SSRN 4413913, 2023.
Conference on Computer Vision and Pattern Recognition, 2022, pp.
10 061–10 071.
[150] M. Mendieta, T. Yang, P. Wang, M. Lee, Z. Ding, and C. Chen, “Local
learning matters: Rethinking data heterogeneity in federated learning,”
in Proceedings of the IEEE/CVF Conference on Computer Vision and
Pattern Recognition, 2022, pp. 8397–8406.
[151] H. Bousbiat, R. Bousselidj, Y. Himeur, A. Amira, F. Bensaali, F. Fadli,
W. Mansoor, and W. Elmenreich, “Crossing roads of federated learning
and smart grids: Overview, challenges, and perspectives,” arXiv preprint
arXiv:2304.08602, 2023.
[152] M. Domı́nguez-Morales, J. P. Domı́nguez-Morales, Á. Jiménez-
Fernández, A. Linares-Barranco, and G. Jiménez-Moreno, “Stereo
matching in address-event-representation (aer) bio-inspired binocular
systems in a field-programmable gate array (fpga),” Electronics, vol. 8,
no. 4, p. 410, 2019.
[153] O. Azzouzi, M. Anane, M. Koudil, M. Issad, and Y. Himeur, “Novel
