0% found this document useful (0 votes)

11 views21 pages

Privacy and Robustness in Federated Learning: Attacks and Defenses

This document provides a comprehensive survey on privacy and robustness in federated learning (FL), highlighting vulnerabilities to attacks and potential defenses over the past five years. It categorizes FL based on data distribution and architecture, discussing various threat models, privacy attacks, and defenses, while emphasizing the importance of designing FL systems that ensure data privacy and robustness. The article also reviews existing methodologies and future research directions in securing FL against adversarial threats.

Uploaded by

Phương Nghi

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

0% found this document useful (0 votes)

11 views21 pages

Privacy and Robustness in Federated Learning: Attacks and Defenses

Uploaded by

Phương Nghi

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

You are on page 1/ 21

8726 IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, VOL. 35, NO.

7, JULY 2024

Privacy and Robustness in Federated

Learning: Attacks and Defenses
Lingjuan Lyu , Member, IEEE, Han Yu , Senior Member, IEEE, Xingjun Ma , Chen Chen , Lichao Sun ,
Jun Zhao , Member, IEEE, Qiang Yang, Fellow, IEEE, and Philip S. Yu , Life Fellow, IEEE

Abstract— As data are increasingly being stored in different directions toward robust and privacy-preserving FL, and their
silos and societies becoming more aware of data privacy issues, interplays with the multidisciplinary goals of FL.
the traditional centralized training of artificial intelligence (AI)
models is facing efficiency and privacy challenges. Recently, Index Terms— Attacks, defenses, federated learning (FL), pri-
federated learning (FL) has emerged as an alternative solution vacy, robustness.
and continues to thrive in this new reality. Existing FL protocol
designs have been shown to be vulnerable to adversaries within N OMENCLATURE
or outside of the system, compromising data privacy and system
robustness. Besides training powerful global models, it is of
AI Artificial intelligence.
paramount importance to design FL systems that have privacy ML Machine learning.
guarantees and are resistant to different types of adversaries. FL Federated learning.
In this article, we conduct a comprehensive survey on privacy GDPR General data protection regulation.
and robustness in FL over the past five years. Through a concise i.i.d. Independent identically distributed.
introduction to the concept of FL and a unique taxonomy
covering: 1) threat models; 2) privacy attacks and defenses; and
IoT Internet of Things.
3) poisoning attacks and defenses, we provide an accessible review HFL Horizontally federated learning.
of this important topic. We highlight the intuitions, key tech- VFL Vertically federated learning.
niques, and fundamental assumptions adopted by various attacks FTL Federated transfer learning.
and defenses. Finally, we discuss promising future research H2B HFL to businesses.
H2C HFL to consumers.
Manuscript received 13 January 2022; revised 11 August 2022; SGD Stochastic gradient descent.
accepted 14 October 2022. Date of publication 10 November 2022; date
of current version 9 July 2024. This work was supported in part by Sony SMC Secure multiparty computation.
AI; in part by the Joint NTU-WeBank Research Centre on Fintech under DP Differential privacy.
Award NWJ-2020-008; in part by Nanyang Technological University, Sin- CDP Centralized differential privacy.
gapore; in part by the Joint SDU-NTU Centre for Artificial Intelligence
Research (C-FAIR) under Grant NSC-2019-011; in part by the National LDP Local differential privacy.
Research Foundation, Singapore, under its AI Singapore Programme, under DDP Distributed differential privacy.
AISG Award AISG2-RP-2020-019; in part by the RIE 2020 Advanced HE Homomorphic encryption.
Manufacturing and Engineering (AME) Programmatic Fund, Singapore, under
Grant A20G8b0102; in part by Nanyang Technological University through RFA Robust federated aggregation.
the Nanyang Assistant Professorship (NAP); and in part by the Future GAN Generative adversarial network.
Communications Research & Development Programme under Grant FCP- MIA Membership inference attack.
NTU-RG-2021-014. The work of Qiang Yang was supported in part by
the Hong Kong RGC Theme-Based Research Scheme under Grant T41- AT Adversarial training.
603/20-R. The work of Philip S. Yu was supported in part by NSF under FAT Federated adversarial training.
Grant III-1763325, Grant III-1909323, Grant III-2106758, and Grant SaTC- API Application programming interface.
1930941. (Corresponding authors: Lingjuan Lyu; Han Yu; and Qiang Yang.)
Lingjuan Lyu is with Sony AI, Tokyo 108-0075, Japan (e-mail:
[email protected]).
Han Yu and Jun Zhao are with the School of Computer Science and I. I NTRODUCTION
Engineering, Nanyang Technological University, Singapore 639798 (e-mail:
[email protected]; [email protected]).
Xingjun Ma is with the School of Computer Science, Fudan University,
Shanghai 200437, China (e-mail: [email protected]).
A S COMPUTING devices become increasingly ubiqui-
tous, people generate huge amounts of data through
their day-to-day usage. Collecting such data into centralized
Chen Chen was with Sony AI, Tokyo 108-0075, Japan. He is now with the storage facilities is costly and time-consuming. Traditional
College of Computer Science, Zhejiang University, Hangzhou 310027, China
(e-mail: [email protected]). centralized ML approaches cannot support such ubiquitous
Lichao Sun is with the Department of Computer Science and Engineering, deployments and applications due to infrastructure shortcom-
Lehigh University, Bethlehem, PA 18015 USA (e-mail: [email protected]). ings, such as limited communication bandwidth, intermittent
Qiang Yang is with the Department of Artificial Intelligence (AI), WeBank,
Shenzhen 518000, China, and also with the Department of Computer Science network connectivity, and strict delay constraints [1]. Another
and Engineering, The Hong Kong University of Science and Technology, critical concern is data privacy and user confidentiality as the
Hong Kong (e-mail: [email protected]). usage data usually contain sensitive information [2]. Sensitive
Philip S. Yu is with the Department of Computer Science, University of
Illinois Chicago, Chicago, IL 60607 USA (e-mail: [email protected]). data, such as facial images, location-based services, or health
Digital Object Identifier 10.1109/TNNLS.2022.3216981 information, can be used for targeted social advertising and
2162-237X © 2022 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission.
See https://www.ieee.org/publications/rights/index.html for more information.

Authorized licensed use limited to: Vietnam National University. Downloaded on December 07,2024 at 07:18:32 UTC from IEEE Xplore. Restrictions apply.
LYU et al.: PRIVACY AND ROBUSTNESS IN FL: ATTACKS AND DEFENSES 8727

recommendation, posing immediate or potential privacy risks.

Hence, private data should not be directly shared without any
privacy protection. As societies become increasingly aware of
privacy preservation, legal restrictions, such as the GDPR,
are emerging, which makes data aggregation practices less
feasible [3].
In this scenario, FL (also well known as collaborative
learning), which distributes model training to the devices
from which data originate, emerged as a promising alternative
ML paradigm [4]. FL enables a multitude of participants to
construct a joint ML model without exposing their private
training data [4], [5]. It can also handle unbalanced and
non-i.i.d. data, which naturally arise in the real world [6]. Fig. 1. Typical FL training process, in which both the (potentially malicious)
In recent years, FL has benefited a wide range of appli- FL server/aggregator and malicious participants may pose threats to the FL
cations such as next word prediction [6], [7], visual object system.
detection for safety [8], entity resolution [9], recommenda-
tion [10], [11], [12], industrial IoT [13], unmanned aerial
under a federation. FTL enables complementary knowledge
vehicles (UAVs) [14], and graph-based analysis [15], [16],
to be transferred across domains in a data federation, thereby
[17], [18].
enabling a target-domain party to build flexible and effective
models by leveraging rich labels from a source domain [23].
A. Categorization of FL Based on Distribution
Based on the distribution of data features and data samples B. Categorization of FL Based on Architectures
among participants, FL can be generally classified as HFL, 1) FL With Homogeneous Architectures: Sharing gradients
VFL, and FTL [19]. is typically limited only to homogeneous FL architectures, i.e.,
Under HFL, datasets owned by each participant share the same model is shared with all participants. Participants
similar features but concern different users [20]. For exam- aim to collaboratively learn a more accurate model. Specifi-
ple, several hospitals may each store similar types of data cally, model parameters w are often obtained
via solving the
(e.g., demographic, clinical, and genomics) about different following optimization problem: minw ni=1 F(w, Di ), where
patients. If they decide to build an ML model together F(w, Di ) is the objective function for the local training
using FL, we refer to such a scenario as HFL. In this dataset on the i th participant and characterizes how well the
article, we further classify HFL into H2B and H2C. The main parameters w model the local training dataset Di on the
difference between H2B and H2C lies in the number of partic- i th participant. Different classifiers (e.g., logistic regression
ipants, FL training participation level, and technical capability, and deep neural networks) use different objective functions.
which can influence how adversaries attempt to compromise In FL, each participant maintains a local model for its local
the FL system. Under H2B, there are typically a handful training dataset. The server maintains a global model via
of participants. They can be frequently selected during FL aggregating local models from n participants. Specifically,
training. The participants tend to possess significant computa- FL with homogeneous architectures performs the steps in
tional power and sophisticated technical capabilities [3]. Under Fig. 1. Generally, FL with homogeneous architectures comes
H2C, there can be thousands or even millions of potential in two forms [6]: 1) FedSGD, in which each participant sends
participants. In each round of training, only a subset of them every SGD update to the server and 2) FedAvg, in which
is selected. As their datasets tend to be small, the chance of participants locally batch multiple iterations of SGD before
a participant being selected repeatedly for FL training is low. sending updates to the server, which is more communication
They generally possess limited computational power and low efficient. These methods are all based on the mean aggregation
technical capabilities. An example of H2C is Google’s GBoard rule that takes the average of the local model parameters as
application [7]. the global model.
VFL is applicable to cases in which participants have large 2) FL With Heterogeneous Architectures: The most recent
overlaps in the sample space but differ in the feature space, efforts extended FL to collaboratively train models with het-
i.e., different participants hold different attributes of the same erogeneous architectures [24], [25], [26]. Conventional fed-
records [21]. VFL mainly targets business participants. Thus, erated model training that directly averages model weights
the characteristics of VFL participants are similar to those of is only possible if all local models have the same model
H2B participants. structure. Naturally, it limits collaboration among data owners
Nowadays, FTL is attracting increasing attention in indus- with heterogeneous model architectures. One representative
tries such as finance, medicine, and healthcare. FTL deals work called federated model distillation (FedMD) [25] does
with scenarios in which FL participants have little overlap not force a single global model onto local models. Instead,
in both the sample space and the feature space [3], [19]. it is conducted in a succinct, black box, and model-agnostic
In this case, transfer learning [22] techniques can be applied manner. Each local model is updated separately; participants
to provide solutions for the entire sample and feature space share the knowledge of their local models via their predictions

Authorized licensed use limited to: Vietnam National University. Downloaded on December 07,2024 at 07:18:32 UTC from IEEE Xplore. Restrictions apply.
8728 IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, VOL. 35, NO. 7, JULY 2024

TABLE I
S UMMARY OF ATTACKS A GAINST S ERVER -BASED FL

on an unlabeled public set [25]. One obvious benefit of important to understand the principles behind these privacy
sharing logits is the reduced communication costs, without and robustness attacks. The properties of the representative
significantly affecting utility [25]. privacy and robustness attacks in server-based FL are summa-
In summary, all the above sharing methods did not inher- rized in Table I.
ently provide defenses against privacy and poisoning attacks— Note that all the above threats are mainly for homogeneous
two main sources of threats to FL. FL. Although heterogeneous FL is more privacy friendly
compared to homogeneous FL, as sharing model prediction
instead of model parameters or updates eliminates the risk of
C. Threats to FL
white-box inference attacks in homogeneous FL [25], [41],
FL offers a privacy-aware paradigm of model training, there is no theoretic guarantee that sharing prediction is
which does not require data sharing and allows participants to private and secure. In fact, the predictions from local models
join and leave a federation freely. Nevertheless, recent works also encode some private information [42], [43], [44], [45].
have demonstrated that FL may not always provide suffi- Similarly, local model predictions can also be arbitrarily
cient privacy and robustness guarantees. Existing FL protocol manipulated by any malicious participant. However, how it is
designs are vulnerable to: 1) a malicious server that aims to vulnerable to different privacy and poisoning attacks, as listed
infer sensitive information from individual updates over time, in Sections III and IV, remains largely unknown, which needs
tamper with the training process, or control the view of the further investigation. Henceforth, we are mainly focusing on
participants on the global parameters and 2) any adversarial the homogeneous FL throughout this survey.
participant who can infer other participants’ sensitive informa-
tion, tamper the global parameter aggregation, or poison the
global model. D. Secure FL
In terms of privacy, communicating gradients throughout Attacks on FL come from either the privacy perspective
the training process can reveal sensitive information [27], [28] when a malicious participant or the central server attempts
and even cause deep leakage [29], either to a third party or to infer the private information of a victim participant or the
the central server [7], [30]. Even a small portion of gradients robustness perspective when a malicious participant aims to
can reveal a fair amount of sensitive information about the compromise the global model.
local data [31]. Recent works further show that, by simply To secure FL against privacy attacks, existing privacy-
observing the gradients, a malicious attacker can successfully preserving methodologies in centralized ML have been tried in
steal the training data [29], [32]. FL, including HE, SMC, and DP. However, HE and SMC may
In terms of robustness, FL systems are vulnerable to both not be applicable to large-scale FL, as they incur substantial
data poisoning [33], [34] and model poisoning attacks [35], communication and computation overhead. In aggregation-
[36], [37], [38]. Malicious participants can attack the con- based tasks, DP requires the aggregated value to contain
vergence of the global model or implant backdoor triggers random noise up to a certain magnitude to ensure (, δ)-DP
into the global model by deliberately altering their local data and, thus, is also not ideal for FL. The noise addition required
(data poisoning) or their gradient uploads (model poisoning). by DP is also hard to execute in FL. In an ideal scenario where
More broadly, poisoning attacks can be categorized into: the server (aggregator) is trusted, the server can add the noise
1) untargeted attack, such as a Byzantine attack, where the to the aggregated gradients [7]. However, in many real-world
adversary aims to destroy the convergence and performance scenarios, the participants may not trust the central server
of the global model [39], [40] and 2) targeted attack, such or each other. In this case, the participants would compete
as a backdoor attack, where the adversary aims to implant a with each other, and all want to ensure LDP by adding as
backdoor trigger into the global model, so as to trick the model much noise as possible to their local gradients [30], [44], [46].
to constantly predict an adversarial class on a subtask while This tends to accumulate significant errors on the server side.
keeping good performance on the main task [34], [35], [36]. DDP [30], [44], [46] can mitigate this problem to some extent
These privacy and robustness attacks pose significant threats when at least a certain fraction of the participants are honest
to FL. In centralized learning, the server is responsible for and do not conduct such malicious competition.
all the participants’ privacy and model robustness. However, Defending FL against various robustness attacks (e.g.,
in FL, any participant can attack the server and spy on other untargeted Byzantine attack and targeted backdoor attack)
participants, even without involving the server. Therefore, it is is an extremely challenging task. This is due to two main

reasons. First, the defense can only be executed on the privacy analysis of FL under both passive and active white-box
server side where only local gradients are available. This inference attacks; Huang et al. [72] did a comprehensive eval-
invalids many backdoor defense methods developed in the uation of defenses against gradient inversion attacks in FL,
centralized ML, for example, denoising (preprocessing) meth- including encrypting gradients, perturbing gradients, encoding
ods [47], [48], [49], [50], [51], backdoor sample/trigger inputs, and combined defenses; and Shejwalkar et al. [73]
detection methods [52], [53], [54], [55], [56], [57], robust clearly showed that FL, even without any defenses, is highly
data augmentations [58], fine-tuning methods [58], the neural robust in practice. For production cross-device FL (H2C),
attention distillation (NAD)-based method [59], and more which contains thousands to billions of clients, poisoning
recent anti-backdoor learning (ABL) method based on a attacks have no impact on existing robust FL algorithms even
sophisticated learning process [60]. Second, the defense with impractically high percentages of compromised clients.
method has to be robust to both data poisoning and For production cross-silo FL (H2B), which contains up to
model poisoning attacks. Most existing robustness defenses 100 clients, data poisoning attacks are completely ineffective;
are gradient aggregation methods mainly developed for model poisoning attacks are unlikely to play a major risk
defending against the untargeted Byzantine attackers, such when the clients involved are bound by contract and their
as Krum/multi-Krum [40], AGGREGATHOR [61], Byzan- software stacks are professionally maintained (e.g., in banks
tine gradient descent (BGD) [62], median-based gradient and hospitals).
descent [63], trimmed-mean-based gradient descent [63], and Overall, the major contributions of this survey include the
SIGN SGD [39]. These defense methods have never been tested following.
on the targeted backdoor attacks [33], [34], [35], [36], [38]. 1) This survey presents a comprehensive categorization
Dedicated defense methods against both data poisoning and of FL, and summarized threats and the corresponding
model poisoning attacks have been investigated, such as norm protections for FL in a systematic manner.
clipping [38], geometric median-based RFA [64], and robust 2) Existing privacy and robustness attacks and defenses
learning rate [65]. For the collusion of Sybil attacks, contribu- are well explored to help readers better understand the
tion similarity [37] can be leveraged as a strategy for defense. assumptions, principles, reasons, and differences of the
current progress in the privacy and robustness domain
of FL.
E. Motivation of This Survey and Our Contribution 3) The conflicts between privacy and robustness, and
Most existing surveys on FL are mostly focused on the sys- among multiple design goals are identified; the gaps
tem or protocol design [19], [66], [67]. A few surveys touched between the current works and the real scenarios in
on either privacy or robustness, but did not systematically FL are summarized.
explore both, and their intersections with the other aspects in 4) Future research directions will assist the community
FL, such as fairness and efficiency [68], [69], [70]. A notable to rethink and improve their current designs toward
number of research works have been conducted on privacy and robust and privacy-preserving FL of real practicality
robustness. Although these works attempt to discover the vul- and impact. Meanwhile, it is suggested to integrate
nerabilities of FL and aim to enhance the privacy and system multidisciplinary goals in the system design of FL.
robustness of FL, there are very few efforts for categorizing
them in a systemic manner, and privacy and robustness threats
to FL have not been systematically explored. To fill in this F. Survey Organization
gap, in this article, we have conducted an extensive survey The rest of the survey is organized as follows. Before going
on the recent advances in privacy and robustness threats to into an in-depth discussion on privacy and robustness in FL,
FL and their defenses. In particular, we focus on two specific in Section II, we first summarize the threat models from a
threats initiated by insiders in FL systems: 1) privacy attacks general perspective and discuss the customized threat models
that attempt to infer the victim participants’ private infor- for privacy and robustness, respectively. Section III presents a
mation and 2) poisoning attacks that attempt to prevent the comprehensive review of the privacy attacks in FL, particularly
learning of a global model or implant triggers to control targeting the sensitive information (class representative, mem-
the behavior of the global model. This article mainly surveys bership, properties, training inputs, and labels) in HFL with
the literature over the past five years on privacy and robustness homogeneous architectures. Section IV shows the detailed
in FL; it can be a notable inclusion to the existing literature, poisoning attacks that aim to compromise system robust-
helping the community better understand the state-of-the-art ness, including the untargeted and targeted poisoning attacks.
privacy and robustness progress in FL. The limitations and the Sections V and VI list the most representative privacy-
promising use cases of the existing works in the literature and preserving techniques and defense mechanisms for robustness,
open directions for future research are also offered to identify and current practices that have applied these techniques in FL.
the research gaps to address the challenges of privacy and From the lessons learned in this survey paper, the research
robustness in FL. gaps toward realizing trustworthy FL along with directions for
For empirical and use case analysis of privacy and robust- future research are provided in Section VII. Finally, concluding
ness, interesting readers can refer to [71], [72], and [73], which remarks are drawn in Section VIII.
showcased where and how the attacks and defense stand so far For better readability, we give a diagram in Fig. 2 show-
in FL. For example, Nasr et al. [71] provided a comprehensive ing the different aspects covered in the survey. The list of

Authorized licensed use limited to: Vietnam National University. Downloaded on December 07,2024 at 07:18:32 UTC from IEEE Xplore. Restrictions apply.
8730 IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, VOL. 35, NO. 7, JULY 2024

replaying, or removing messages. This setting allows the

adversary to conduct particularly devastating attacks.
Note that both the honest and semihonest participants
will follow the protocol honestly; however, semihonest may
attempt to learn or infer sensitive information from the infor-
mation he or she received [43], [67]. By contrast, the malicious
participant will arbitrarily betray the protocol in order to
compromise the privacy of other participants or compromise
the integrity of the FL model in either a targeted or untargeted
manner, as indicated in Sections III and IV.

C. Training Phase Versus Inference Phase

1) Training Phase: During the training phase, the insider
Fig. 2. Survey organization.
attacker has access to the full model, notably its architecture
abbreviations used in this survey is provided in Nomenclature. and parameters, and any hyperparameter that is needed to
use the model for predictions [71]. Attacks conducted during
Throughout this survey, we will interchangeably use partici-
pants/clients/users to represent the participants in FL. the training phase attempt to learn, influence, or corrupt the
FL model itself [74].
In terms of the privacy vulnerability during the training
II. T HREAT M ODELS phase, the attacker can also launch a range of inference
Before reviewing attacks on FL, we first present a summary attacks on an individual participant’s shared information or the
of the threat models. Generally, we categorize the threat aggregated information in order to compromise privacy [28],
models in FL into the following types: 1) insider versus [29], [71], [75]. For example, from the shared local model
outsider according to the role of the attacker; 2) semihonest parameter, the insider attacker (participant or the server)
versus malicious according to whether the attacker will obey can infer class representatives [75], record membership, and
the protocol; and 3) training phase versus test/inference phase properties [28], [71], even the original training inputs and
according to the happening phase of the attack. These threat labels [29]. More details can be referred to Section III.
models apply to both privacy and robustness attacks. In terms of the robustness vulnerability during the training
phase, the attacker can run data poisoning attacks to com-
promise the integrity of the training dataset [35], [37], [76],
A. Insider Versus Outsider [77], [78], [79] or model poisoning attacks to compromise
1) Insider Attacks: Insider attacks can be launched by either the integrity of the learning process [36], [80]. The con-
the FL server or the participants in the FL system. More crete attacks could be the targeted label-flipping attack [37],
concretely, the central server can observe individual updates [81] and backdoor attack [35], or the untargeted Byzantine
over time and can control the view of the participants on attack [40], [63]. More details can be referred to Section IV.
the global parameters; any of the participants can observe the 2) Test/Inference Phase: In this phase, attackers do not alter
global parameter updates and can control his or her parameter the targeted model; instead, they will query the targeted model
uploads [71]. to leak some private information or trick the targeted model
2) Outsider Attacks: Outsider attacks include those to compromise robustness by producing wrong predictions.
launched by the eavesdroppers on the communication channel The effectiveness of such attacks is largely determined by the
between participants and the FL server, and by users of the information that is available to the adversary about the model.
final FL model when it is deployed as a service. In terms of privacy vulnerability during the inference phase,
Insider attacks are generally more dangerous than outsider the trained global model may reveal sensitive information from
attacks, as it strictly enhances the capability of the adversary. model predictions when deployed as a service, causing privacy
Thus, our discussion of attacks against FL will focus primarily leakage. In such a setting, an attacker does not have direct
on insider attacks. access to the model parameters but may be able to view
input–output pairs, thus launching model stealing attacks in
which model parameters can be reconstructed [82], [83], [84],
B. Semihonest Versus Malicious [85] or MIAs that aim to determine if a particular record was
1) Semihonest Setting: Adversaries are considered passive used to train the model [71], [86].
or honest-but-curious. They try to learn the private information In terms of the robustness vulnerability during the inference
of other participants without deviating from the FL protocol. phase, the global model maintained by the server suffers
The adversaries can only observe the received information, from the same evasion attacks [87] as in the conventional
i.e., parameters of the global model. ML setting when the target model is deployed as a service.
2) Malicious Setting: An active or malicious adversary tries One well-studied form of evasion attacks is the so-called
to learn the private information of the other honest participants adversarial examples, which seem almost indistinguishable
and deviates arbitrarily from the FL protocol by modifying, from the original test input to a human, but can fool the

A. Inferring Class Representatives

Hitaj et al. [75] first devised an active inference attack
called GAN attack against deep FL models. In this attack,
a malicious participant can intentionally compromise any other
participant. The GAN attack exploits the real-time nature
of the FL learning process, which allows the adversarial
participant to train a GAN to generate prototypical samples
of the targeted private training data. The generated samples
appear to come from the same distribution as the training data.
Fig. 3. Demo of privacy leakage in FL. An attacker can infer various private Hence, the GAN attack is not targeted to reconstruct the exact
information about the victim participant from the received gradients or the
snapshot of the FL model parameters. training inputs but only the class representatives. It should
be noted that the GAN attack assumes the entire training
corpus for a given class comes from a single participant, which
trained model [88], [89]. Recent studies [90], [91], [92] have means that the GAN-constructed representatives are similar to
shown that FL is also vulnerable to well-crafted adversarial the training data only when all class members are similar.
examples during the inference phase. One typical defense This resembles model inversion attacks in the centralized ML
against robustness attacks is AT, in which a robust model settings [96]. Note that these assumptions are less practical
is trained with adversarial examples and, hence, provides in FL. Since the GAN attack requires a substantial amount
some robustness to white-box evasion attacks [93]. However, of computational resources to train the GAN model, it is less
adapting AT methods to FL brings a host of open questions, suitable for H2C scenarios.
which we discuss in Section VII.
B. Inferring Membership
III. P RIVACY ATTACKS
Given an exact data point, MIAs aim to determine if it was
Although FL prevents the participants from directly sharing used to train the model [86]. For example, an attacker can infer
their private data, a series of works have demonstrated that whether a specific patient profile was used to train a classifier
exchanging gradients in FL can also leak sensitive information associated with a certain disease. FL opens new possibilities
about the participants’ private data to either passive or active for such attacks. In FL, the adversary can infer if a particular
attackers [28], [29], [31], [71], [94]. For example, gradients sample belongs to the private training data of a particular
or two consecutive snapshots of the FL model parameters participant (if the target update is from a single participant)
can leak unintended features of the participants’ training data or any participant (if the target update is the aggregate). For
to the adversarial participants, as deep learning models tend example, during FL model training, the nonzero gradients of
to recognize and remember more features of the data than the embedding layer of a deep natural language processing
needed for the main learning task [95]. Fig. 3 illustrates that model trained on text data can reveal which words are in the
the set of information an adversary can infer from the gradients training batches of the honest participants [28], [97].
(i.e., wt1 ) or, equivalently, the difference of two successive Attackers in an FL system can conduct both active and
snapshots of the model parameters (i.e., w t+1 − wt ). passive MIAs [28], [71]. In the passive case, the attacker
The reason why gradients can cause privacy leakage is observes the updated model parameters and performs inference
that the gradients are derived from the participants’ private without modifying the learning process. In the active case,
training data, and a learning model can be considered as a the attacker can tamper with the FL model training protocol
representation of the high-level statistics of the dataset that it and perform a more powerful attack against other participants.
was trained on [43]. In deep learning models, gradients of a For instance, the attacker may share malicious updates and
given layer are computed based on the layer’s features and trick the FL model to expose more information about other
the error from the layer after (i.e., backpropagation). In the participants’ local data. One such attack is the gradient ascent
case of sequential fully connected layers, the gradients of the attack [71], where the attacker runs gradient ascent on a target
weights are the inner products of the current layer’s features data sample and observes whether its increased loss can be
and the error from the layer after. Similarly, for a convolu- drastically reduced in the next communication round; if so,
tional layer, the gradients of the weights are convolutions of the sample is very likely to be in the training set. This attack
the layer’s features and the error from the layer after [28]. can be applied on a batch of target data samples all at the
Consequently, observations of gradients can be used to infer same time [71].
a significant amount of private information, such as class
representatives, membership, and properties of a subset of the
training data. Even worse, an attacker can infer labels from C. Inferring Properties
the shared gradients and recover the original training samples An adversary can launch both passive and active property
without any prior knowledge about the training data [29]. inference attacks to infer certain properties of other partici-
Next, we detail the potential privacy leakage of FL accord- pants’ training data [28]. Property inference attacks assume
ing to the type of sensitive information that the attacker is that the adversary has auxiliary training data that are correctly
targeting. labeled with the target property. A passive adversary can only

Authorized licensed use limited to: Vietnam National University. Downloaded on December 07,2024 at 07:18:32 UTC from IEEE Xplore. Restrictions apply.
8732 IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, VOL. 35, NO. 7, JULY 2024

observe or eavesdrop on the gradients and perform inference

by training a binary property classifier. An active adversary can
exploit multitask learning to trick the FL model into learning
a better separation between data with and without the target
property, so as to extract more information. An adversarial
participant can also infer when a property appears or disap-
pears in the training data (e.g., identifying when a person first
appears in the photographs used to train a gender classifier).
The assumption of auxiliary training data in property inference
attacks may limit its applicability in H2C. Fig. 4. Data versus model poisoning attacks in FL.

D. Inferring Training Inputs and Labels centralized poisoning attack in which a subset of the training
data is poisoned.
One recent work called deep leakage from gradient (DLG)
proposes an optimization algorithm to extract both the training
A. Untargeted Attacks
inputs and the labels [29]. This attack is much stronger than
previous approaches. It can accurately recover the raw images Untargeted poisoning attacks aim to arbitrarily compromise
and texts used to train a deep learning model. In a follow- the integrity of the target model. The Byzantine attack is one
up work [32], an analytical approach called improved DLG type of untargeted poisoning attack that upload arbitrarily
(iDLG) was proposed to extract labels based on the shared malicious gradients to the server so as to cause the failure
gradients and an exploration of the correlation between the of the global model [39], [40], [61], [100], [107]. A formal
labels and the signs of the gradients. iDLG can be applied definition of Byzantine attack is given in Definition 1.
to attack any differentiable models trained with cross-entropy Definition 1 (Byzantine Attack [40], [63]): An honest par-
loss and one-hot labels, which is a typical setting for classifi- ticipant uploads wi := ∇ Fi (wi ), while a dishonest partici-
cation tasks. pant can upload arbitrary values

In summary, inference attacks generally assume that the ∗, if i -th participant is Byzantine
adversaries possess sophisticated technical capabilities and wi = (1)
unlimited computational resources. Moreover, most attacks ∇ Fi (wi ), otherwise
assume that the adversarial participants can be selected where “∗” represents arbitrary values and Fi represents par-
(to update the global model) in many rounds of the FL ticipant i ’s local model objective function.
training process. In FL, these assumptions are generally Blanchard et al. [40] showed that the aggregation of FL can
not practical in H2C scenarios but more likely to happen in be completely controlled by a single Byzantine participant if
H2B scenarios. These inference attacks highlight the need for there is no defense in the FL. In particular, suppose that there
gradient protection in FL, possibly through various privacy- are n − 1 benign participants and a Byzantine participant;
the
preserving mechanisms [3] detailed in Section V. server aggregates the gradients by w = (1/n) ni=1 wi ,
where w is the aggregated gradient. Assume the nth partic-
IV. P OISONING ATTACKS ipant is Byzantine; it can always make the aggregated gradient
Different from privacy attacks that are targeting at data become any vector u by uploading the following gradient:
privacy, poisoning attacks aim to compromise the system’s
n−1

robustness. Depending on the attacker’s objective, poison- wn = nu − wi . (2)

ing attacks can be broadly classified into two categories: i=1
1) untargeted poisoning attacks [80], [81], [98], [99], [100] and Such a simple attack exposes the vulnerability of FL against
2) targeted poisoning attacks [35], [36], [101], [102], [103], Byzantine attacks.
[104], [105], [106]. Chen et al. [108] discussed Byzantine attacks in Adam-
Note that the untargeted and targeted poisoning attacks based FL and proposed a camouflaged attack that can cam-
during the training phase can be mounted on both the data ouflage the model updates and launch effective attacks. Their
and the model. Fig. 4 shows that the poisoned updates can proposed attack also works on other well-known optimizers,
be sourced from two poisoning attacks: 1) data poisoning such as AdaGrad and RMSProp. Baruch et al. [109] showed
attack during local data collection and 2) model poisoning that the core part of gradient descent algorithms is the direction
attack during local model training process. At a high level, of the descent. Specifically, for gradient descent algorithms,
both poisoning attacks attempt to modify the behavior of the to guarantee the descent of the loss, the inner product between
target model in some undesirable way. However, due to the the ground-truth gradient and the robust aggregated gradient
model-sharing nature of FL with homogeneous architectures, must be nonnegative
data poisoning attacks are generally less effective than model
w, w ≥ 0 (3)
poisoning attacks [35], [36], [37], [38]. In fact, model poison-
ing subsumes data poisoning in FL settings, as data poisoning where ·, · is the inner product operation, w is the optimal
attacks eventually change a subset of updates sent to the gradient, and w = aggregate(w1 , . . . , wn ) is the aggre-
model at any given iteration. This is functionally identical to a gated gradient with aggregate(·) being arbitrary aggregation

function. To make the aggregation fail, they proposed an TABLE II

“inner product manipulation attack” that can make the inner P RIVACY-P RESERVING T ECHNIQUES FOR FL
product between the ground-truth gradient and the robust
aggregated gradient negative. To do this, each Byzantine par-
ticipant uploaded the negative of the average benign gradients.
Their proposed attack can successfully bypass coordinatewise
median [63] and Krum [40]. Xie et al. [100] claimed that,
by consistently applying small changes to many parameters,
a Byzantine participant can perturb the model’s convergence.
First, they used the local data of Byzantine participants to the data or the gradients. Bhagoji et al. [36] demonstrated
estimate the mean and standard deviation of the distribution. that a single, noncolluding malicious participant can cause
Then, they analyzed the range in which changes to the para- the model to misclassify a set of chosen inputs with high
meters will not be detected by the defense, and upon choosing confidence. Bagdasaryan et al. [35] pointed out that the poi-
the maxima of this range, the convergence is averted. soned updates can be generated by training the local model on
backdoored local training data, and even a single-shot attack
may be enough to inject a backdoor into the global model.
B. Targeted Attacks Xie et al. [34] demonstrated that a global trigger pattern can
In targeted poisoning attacks, the learned model outputs the be decomposed into separate local patterns and embedded into
target label specified by the adversary for particular testing the training set of colluding adversarial participants, respec-
examples, e.g., predicting spams as nonspam and predicting tively. The impact on the FL model depends on the extent to
attacker-desired labels for testing examples with a particular which the backdoor participants engage in the attacks and the
Trojan trigger (backdoor/Trojan attacks). However, the test amount of training data being poisoned. Recent work shows
error for other testing examples is unaffected. Generally, that poisoning edge-case (low probability) training samples are
targeted attacks are more difficult to conduct than untargeted more effective [33].
attacks as the attacker has a specific goal to achieve. 4) Remark: We remark that most of the previous research
1) Label-Flipping Attack: One common example of targeted on poisoning attacks focus on Byzantine or backdoor attackers.
poisoning attack is the label-flipping attack [37], [81]. The A system that allows participants to join and leave is suscepti-
labels of honest training examples of one class are flipped ble to Sybil attacks [115], in which an attacker gains influence
to another class, while the features of the data are kept by joining a system to inject c fake participants into the FL
unchanged. For example, the malicious participants in the system or compromise c benign participants [37]. Sybil attacks
system can poison their local data by flipping all 1s into 7s. can be launched in both the untargeted and targeted manners.
A successful attack produces a model that is unable to cor- For example, targeted poisoning can be conducted by Sybil
rectly classify 1s and incorrectly predicts them to be 7s. clones who contribute updates toward a specific poisoning
2) Backdoor Attack: Another realistic targeted poisoning objective [37]. Concretely, Fung et al. [37] considered two
attack is a backdoor attack, in which an adversary can modify types of targeted attacks by Sybil clones: label-flipping attacks
individual features or small regions of the original training and backdoor attacks.
dataset to implant a backdoor trigger into the model. The
model will behave normally on clean data yet will constantly V. D EFENSES AGAINST P RIVACY ATTACKS
predict a target class whenever the trigger (e.g., a stamp on an While privacy preservation has been extensively studied
image) appears. For instance, a backdoor attack can cause the in the ML community, privacy preservation in FL can be
FL model to reach 100% accuracy on the backdoor task, e.g., more challenging due to the sporadic access to power and
to control an image classifier to assign an attacker-chosen label network connectivity, statistical heterogeneity in the data,
to images with certain features in an image-classification task, and so on. Existing works in privacy-preserving FL are
or a next-word predictor completes certain sentences with an mostly developed based on the well-known privacy-preserving
attacker-chosen word in a word-prediction task [35]. techniques, including: 1) HE, such as Paillier [126], Elga-
Backdoor attacks can be further divided into two categories: mal [127], and Brakerski-Gentry-Vaikuntanathan cryptosys-
dirty-label attacks [58], [103], [104], [110] and clean-label tems [128]; 2) SMC, such as garbled circuits [129], and secret
attacks [58], [79], [105], [111], [112], [113], [114]. Clean- sharing [130]; and 3) DP [131], [132]. A concise summary of
label attacks assume that the adversary cannot change the label privacy-preserving techniques is listed in Table II.
of any training data as there is a process by which data are
certified as belonging to the correct class, and the poisoning
of data samples has to be imperceptible. In contrast, in dirty- A. Privacy Preservation Through Homomorphic Encryption
label poisoning, the adversary can introduce a number of data An HE scheme allows arithmetic operations to be directly
samples that are expected to be misclassified by the model performed on ciphertexts, which is equivalent to a specific
with the desired target label into the training data. Clean-label linear algebraic manipulation of the plaintext. Existing HE
attacks are arguably stealthier as they do not change the labels. techniques can be categorized into: 1) fully HE; 2) somewhat
3) How to Poison: The targeted poisoning attack in FL can HE; and 3) partially HE. Fully HE can support arbitrary com-
be carried out by any FL participant or via collusion on either putation on ciphertexts but is less efficient [128]. On the other

Authorized licensed use limited to: Vietnam National University. Downloaded on December 07,2024 at 07:18:32 UTC from IEEE Xplore. Restrictions apply.
8734 IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, VOL. 35, NO. 7, JULY 2024

hand, somewhat HE and partially HE are more efficient but In general, SMC techniques ensure a high level of privacy
are specified by a limited number of operations [126], [127], and accuracy, at the expense of high computation and commu-
[133], [134]. Partially HE schemes are more widely used in nication overhead, thereby doing a disservice to attracting par-
practice, including RSA [134], El Gamal [127], Paillier [126], ticipation. Another main challenge facing SMC-based schemes
and so on. The homomorphic properties can be described as is the requirement for simultaneous coordination of all partic-
ipants during the entire training process. Such a multiparty
E pk (m 1 + m 2 ) = c1 ⊕ c2
interaction model may not be desirable in practical settings,
E pk (a · m 1 ) = a ⊗ c1 especially under the commonly considered participant-server
where a is a constant, m 1 and m 2 are the plaintexts that need architecture in FL settings. Besides, SMC-based protocols
to be encrypted, and c1 and c2 are the ciphertext of m 1 and can enable multiple participants to collaboratively compute
m 2 , respectively. an agreed-upon function without leaking input information
HE is widely used and is especially useful for securing from any participant except for what can be inferred from the
the learning process by computing encrypted data. However, outcomes of the computation [135], [136]. That said, SMC
doing arithmetic on the encrypted numbers comes at a cost of cannot fully guarantee protection from information leakage,
memory and processing time. For example, with the Paillier which requires additional DP techniques to be incorporated
encryption scheme, the encryption of an encoded floating-point into the multiparty protocol to address such concerns [137],
number (whether single or double precision) is 2m bits long, [138], [139], [140].
where m is typically at least 1024 and the addition of two In summary, HE- or SMC-based approaches may not be
encrypted numbers is 2∼3 orders of magnitude slower than applicable to large-scale FL scenarios as they incur substantial
the unencrypted equivalent [9]. Moreover, polynomial approx- additional communication and computation costs. Moreover,
imations need to be made to evaluate nonlinear functions encryption-based techniques need to be carefully designed
in ML algorithms, resulting in a tradeoff between utility and implemented for each operation in the target learning
and privacy [116], [117]. For example, to protect individual algorithm [141], [142]. Finally, all the cryptography-based
gradients, Aono et al. [31] used additively HE to preserve the protocols prevent anyone from auditing participants’ updates
privacy of gradients and enhance the security of the distributed to the joint model, which leaves space for the malicious
learning system. However, their protocol not only incurs large participants to attack. For example, malicious participants can
communication and computational overheads but also results introduce stealthy backdoor functionality into the global model
in utility loss. Furthermore, it is not able to withstand collusion without being detected [36].
between the server and multiple participants. Hardy et al. [9]
C. Privacy-Preservation Through Differential Privacy
applied federated logistic regression on vertically partitioned
data encrypted with an additively homomorphic scheme to DP was originally designed for the single database scenario,
secure against an honest-but-curious adversary. Overall, all where, for every query made, a database server answers the
these works incur extra communication and computational query in a privacy-preserving manner with tailored randomiza-
overheads, which limit their applications in H2C scenarios. tion [131]. In comparison with encryption-based approaches,
DP trades off privacy and accuracy by perturbing the data in a
way that: 1) is computationally efficient; 2) does not allow an
B. Privacy-Preservation Through SMC attacker to recover the original data; and 3) does not severely
SMC [129] enables different participants with private inputs affect the utility.
to perform a joint computation on their inputs without reveal- The concept of DP is that the effect of the presence or
ing them to each other. Mohassel and Zhang [125] proposed the absence of a single record on the output likelihood is
SecureML that conducts privacy-preserving learning via SMC, bounded by a small factor . As defined in Definition 2, (, δ)-
where data owners need to process, encrypt and/or secret-share approximate DP [132] relaxes pure -DP by a δ additive term,
their data among two noncolluding servers in the initial setup which means that the unlikely responses need not satisfy the
phase. SecureML allows data owners to train various models pure DP criterion.
on their joint data without revealing any information beyond Definition 2 ((, δ)-DP [132]): For scalars > 0 and 0 ≤
the outcome. However, this comes at a cost of high compu- δ < 1, mechanism M is said to preserve (approximate) (, δ)-
tation and communication overhead, which may hamper par- DP if, for all adjacent datasets, D, D ∈ Dn and measurable
ticipants’ interest to collaborate. Bonawitz et al. [5] proposed S ∈ range(M)
a secure, communication-efficient, and failure-robust protocol
Pr{M(D) ∈ S} ≤ exp() · Pr M D ∈ S + δ .
based on SMC for secure aggregation of individual gradients.
It ensures that the only information about the individual users To avoid the worst case scenario of always violating the
the server learns is what can be inferred from the aggregated privacy of a δ fraction, the standard recommendation is to
results. The security of their protocol is maintained under both choose δ 1/|D|, where |D| is the size of the database.
the honest-but-curious and malicious settings, even when the This strategy forecloses the possibility of one particularly
server and a subset of users act maliciously—colluding and devastating outcome, but other forms of information leakage
deviating arbitrarily from the protocol. That is, no party learns remain.
anything more than the sum of the inputs of a subset of honest The privacy community generally categorizes DP into the
users of a large size [5]. following three categories as per different trust assumptions

TABLE III
C OMPARATIVE A NALYSIS A MONG CDP, LDP, AND DDP

and noise sources: CDP, LDP, and DDP. A comprehensive releases the aggregated value with an expected additive error
comparison among CDP, LDP, and DDP is listed in Table III. of at most (1/) to ensure -DP (e.g., using the Laplace
1) Centralized Differential Privacy: CDP was originally mechanism
√ [132]). In contrast, under the LDP model, at least
designed for the centralized scenario where a trusted database ( n/) additive error in expectation must be incurred by
server, which is entitled to see all participants’ data in the any -DP mechanism for the same task [146], [152]. This gap
clear, wishes to answer queries or publish statistics in a is essential for eliminating the trust in the centralized server
privacy-preserving manner by randomizing query results [42], and cannot be removed by algorithmic improvement [153].
[131], [144]. When CDP meets FL, CDP assumes a trusted To protect FL with homogeneous architectures, in which
aggregator, who is responsible for adding noise to the aggre- model parameters or gradients are shared, for example, Shokri
gated local gradients to ensure record-level privacy of the and Shmatikov [154] first applied LDP to distributed learn-
whole data of all participants [7], [118]. However, CDP is ing/FL, in which each participant individually adds noise to
geared to tackle thousands of users for training to converge its gradients before releasing to the server, thus ensuring LDP.
and achieve an acceptable tradeoff between privacy and accu- However, their privacy bounds are given per-parameter, and a
racy [7], resulting in a convergence problem with a small large number of parameters prevents their method from pro-
number of participants [145]. Moreover, CDP can achieve viding a meaningful privacy guarantee [42]. Other approaches
acceptable accuracy only with a large number of participants, that are also considered to apply LDP to FL can only support
thus not applicable to H2B with relatively a small number of shallow models, such as logistic regression, and only focus
participants. on simple tasks and datasets [119], [120], [121]. Bhowmick
Meanwhile, the assumption of a trusted server in CDP is et al. [27] presented a viable approach to large-scale local
ill-suited in many applications as it constitutes a single point private model training and introduced a relaxed version of
of failure for data breaches and saddles the trusted curator LDP by limiting the power of potential adversaries. Due to
with legal and ethical obligations to keep the user data secure. the high variance of their mechanism, it requires more than
When the aggregator is untrusted, which is often the case 200 communication rounds and incurs much higher privacy
in distributed scenarios, LDP [146] or DDP is needed [138], costs, i.e., MNIST ( = 500) and CIFAR-10 ( = 5000). Note
[147] to protect the privacy of individuals. that required in [27] is relatively large, as they considered
2) Local Differential Privacy: LDP [146] offers a stronger only privacy protection against reconstruction attacks instead
privacy guarantee, and data owners perturb their private infor- of membership attacks. Their obtained results suggested that
mation to satisfy DP locally before reporting it to an untrusted using LDP mechanisms with large may still provide decent
data curator [122], [123], [148]. A comprehensive survey of protection against reconstruction. Li et al. [143] proposed
LDP can be referred to [149]. A formal definition of LDP is locally differentially private algorithms in the context of met-
given in Definition 3. alearning, which might be applicable to FL with personaliza-
Definition 3 ((, δ)-LDP): A randomized algorithm M sat- tion. However, it only provides provable learning guarantees
isfies (, δ)-LDP ((, δ)-LDP) if and only if, for any input v in convex settings. Truex et al. [123] applied condensed LDP
and v , we have (α-CLDP) into FL. However, α-CLDP results in a weak
privacy guarantee. Another contemporary work called LDP-
Pr{M(v) = o} ≤ exp() · Pr M v = o + δ
FL [122] achieves better performance on both effectiveness
for ∀o ∈ Range(M), where Range(M) denotes the set of all and efficiency than [123] with a special communication design
possible outputs of the algorithm M. Furthermore, M is said for deep learning approaches.
to preserve (pure) -LDP if the condition holds for δ = 0. To protect FL with heterogeneous architectures, in which
Although the randomized response [150] and its vari- model predictions are shared, one naive approach is adding the
ants [151] have been widely used to provide LDP when locally differentially private random noise to the predictions
individuals disclose their personal information, we remark like in previous works. Although the privacy concern is miti-
that all the randomization mechanisms used for CDP, such as gated with random noise perturbation, it brings a new problem
Laplace and Gaussian mechanisms [132], can be individually with a substantial tradeoff between privacy budget and model
used by each participant to ensure LDP in isolation. However, utility. Sun and Lyu [45] filled in this gap by proposing a
in the distributed scenario, without the help of cryptographic novel framework called FEDMD-NFDP, which integrated a
techniques, each participant has to add enough calibrated novel noise-free DP (NFDP) mechanism into FedMD. The
noise to ensure LDP. The attractive privacy properties of LDP, LDP guarantee of NFDP roots in the local data sampling
thus, come with a huge utility degradation, especially with process, which explicitly eliminates noise addition and privacy
billions of individuals. Under the CDP model, the aggregator cost explosion issues in previous works.

Authorized licensed use limited to: Vietnam National University. Downloaded on December 07,2024 at 07:18:32 UTC from IEEE Xplore. Restrictions apply.
8736 IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, VOL. 35, NO. 7, JULY 2024

Fig. 5. Illustration of FL without privacy and with different DP mechanisms. M denotes a DP mechanism used to privatize the data. (b) In centralized DP,
the central server is trusted. (c) In LDP, the central server is not trusted; gradients are perturbed to ensure LDP before forwarding to the central server. (d) In
distributed DP, the central server is also not trusted; gradients are perturbed via DP mechanism M and encrypted via encryption operation E to ensure privacy
before forwarding to the central server, which needs to finally decrypt (D) the aggregated ciphertext. (a) FL without privacy. (b) Centralized DP: FL with a
trusted server. (c) Local DP: FL without a trusted server. (d) Distributed DP with SMC: FL without a trusted server.

3) Distributed Differential Privacy: DDP bridges the gap SMC to maintain utility and ensure aggregator oblivious-
between LDP and CDP while ensuring the privacy of each ness, as evidenced in [30], [46], [125], [138], [139], [140],
individual by combining with cryptographic protocols [30], and [141].
[137], [138], [139], [140]. Therefore, DDP avoids placing trust An illustration of FL without privacy and with different
in any server and offers better utility than LDP. Theoretically, DP mechanisms is given in Fig. 5. Another parallel line
DDP offers the same utility as CDP, as the total amount of of work for privacy-preserving distributed learning is to
noise is the same. transfer the knowledge of the ensemble of multiple models
The notion of DDP reflects the fact that the required noise in to a student model [42], [144], [155], [156]. For example,
the target statistic is sourced from multiple participants [147]. Hamm et al. [155] first created labeled data from auxiliary
Approaches to DDP that implement an overall additive noise unlabeled data, then used the labeled auxiliary data to find an
mechanism by summing the same mechanism run at each empirical risk minimizer, and, finally, released a differentially
participant (typically with less noise) necessitate mechanisms private classifier using output perturbation [157]. Similarly,
with stable distributions—to guarantee proper calibration of Papernot et al. [42], [144] proposed private aggregation of
known end-to-end response distribution—and cryptography teacher ensembles (PATE) to first train an ensemble of teach-
for hiding all but the final result from participants [30], ers on disjoint subsets of private data and then perturb the
[46], [124], [137], [138], [139], [147]. Stable distributions knowledge of the ensemble of teachers by adding noise to the
include Gaussian distribution, Binomial distribution [44], and aggregated teacher votes before transferring the knowledge to
so on, i.e., the sum of Gaussian random variables still follows a student. Finally, a student model is trained on the aggregate
a Gaussian distribution, and the sum of Binomial random output of the ensemble such that the student learns to accu-
variables still follows a Binomial distribution. DDP utilizes rately mimic the ensemble. PATE requires a lot of participants
this nice stability to permit each participant to randomize its to achieve reasonable accuracy, and each participant needs to
local statistic to a lesser degree than would LDP. However, have enough data to train an accurate model, which might
in DDP, only the sum of the individually released statis- not hold in the FL system, where the data distribution of
tics is (, δ)-differentially private but not the individually participants might be highly unbalanced, making this approach
released statistic. Therefore, DDP necessitates the help of unsuitable to the FL system.

VI. D EFENSES AGAINST P OISONING ATTACKS and geometric-median-based robust aggregation rules are also
Robustness to poisoning attacks is a desirable property extensively explored in [165] and [166]. Pillutla et al. [64]
in FL. To address poisoning attacks, many robust aggregation proposed a robust aggregation approach called RFA by replac-
schemes are proposed in the literature. Known defenses to ing the weighted arithmetic mean with an approximate geo-
poisoning attacks in a centralized setting, such as robust metric median, so as to reduce the impact of the contaminated
losses [158] and anomaly detection [98], assume control of updates. Unfortunately, RFA can only handle a few types of
the participants or explicit observation of the training data. poisoning attackers but is not applicable to Byzantine attacks.
Neither of these assumptions is applicable to FL in which the 4) Weakness of Current Defenses: In spite of their robust-
server only observes the model parameters/updates that are ness guarantees, recent inspections revealed that previous
sent as part of the iterative ML algorithm [37]. We summarize Byzantine-robust FL mechanisms are also quite brittle and
the defenses against untargeted and targeted attacks as follows. can be easily circumvented. Bhagoji et al. [36] showed that
targeted model poisoning of deep neural networks is effec-
A. Defenses Against Untargeted Attacks tive even against the Byzantine-robust aggregation rules,
For Byzantine-resilient aggregation, an algorithm is Byzan- such as Krum and coordinatewise median. Baruch et al. [109]
tine fault tolerant [40] if its convergence is robust even when and Xie et al. [100] showed that, while the Byzantine-robust
a large portion of participants is adversarial. In the following, aggregation rules may ensure that the influence of the Byzan-
we list several representative attempts that try to defend against tine workers in any single round is limited, the attackers
untargeted Byzantine attacks. can couple their attacks across the rounds, moving weights
1) AUROR: Shen et al. [159] introduced a statistical mech- significantly away from the desired direction, and, thus,
anism called AUROR to detect malicious users while gener- achieve the goal of lowering the model quality. Xu and
ating an accurate model. AUROR is based on the observation Lyu [166] demonstrated that multi-Krum is not robust against
that indicative features (most important model features) from untargeted poisoning. This is because multi-Krum is based
the majority of honest users will exhibit a similar distribution, on the distance between each gradient vector and the mean
while those from malicious users will exhibit an anomalous vector, while the mean vector is not robust against untargeted
distribution. It then uses k-means to cluster participants’ poisoning. Fang et al. [80] showed that aggregation rules
updates across training rounds and discards the outliers, i.e., (e.g., Krum [40], Bulyan [160], trimmed mean [63], coor-
contributions from small clusters that exceed a threshold dinatewise median [63], and other median-based aggrega-
distance are removed. The accuracy of a model trained using tors [62]) that were claimed to be robust against Byzantine
AUROR drops by only 3% even when 30% of all the users failures are not effective in practice against optimized local
are adversarial. model poisoning attacks that carefully craft local models on
2) Krum: Blanchard et al. [40] proposed Krum, in which the compromised participants such that the aggregated global
the top f contributions to the model that is furthest from the model deviates the most toward the inverse of the direction
mean participant contribution are removed from the aggrega- along which the global model would change when there are
tion. Krum uses the Euclidean distance to determine which no attacks. All these highlight the need for more effective
gradient contributions should be removed. It can theoretically defenses against Byzantine attackers in FL.
withstand poisoning attacks of up to 33% adversaries among 5) Other Possibilities: Other works investigate Byzantine
the participants, i.e., given n agents of which f is Byzantine, robustness from different lenses. Chen et al. [163] presented
Krum requires that n ≥ 2 f + 3. Krum is resistant to DRACO, a framework for robust distributed training via
attacks by omniscient adversaries—aware of a good estimate algorithmic redundancy. DRACO is robust to arbitrarily mali-
of the gradient—who send the opposite vector multiplied by cious computing nodes while being orders of magnitude
a large factor. It is also resistant to attacks by adversaries faster than state-of-the-art robust distributed systems. How-
who send random vectors drawn from a Gaussian distribution ever, DRACO assumes that each participant can access other
(the larger the variance of the distribution, the stronger the participants’ data, limiting its practicability in FL. Su and
attack). Multi-Krum is a variant of Krum, which intuitively Xu [94] proposed to robustly aggregate the gradients computed
interpolates between Krum and averaging, thereby combining by the Byzantine participants based on the filtering proce-
the resilience properties of Krum with the convergence speed dure proposed by Steinhardt et al. [167]. Bernstein et al. [39]
of averaging. Essentially, Krum filters outliers based on the proposed SIGNSGD, which is combined with a majority vote
entire update vector but does not filter coordinatewise outliers. to enable participants to upload elementwise signs of their
3) Coordinatewise Statistics: To address this issue, gradients to defend against three types of half “blind” Byzan-
Yin et al. [63] proposed two robust distributed gradient descent tine adversaries: 1) adversaries that arbitrarily rescale their
algorithms: one based on coordinatewise median and the other stochastic gradient estimate; 2) adversaries that randomize
based on the coordinatewise trimmed mean. Unfortunately, the sign of each coordinate of the stochastic gradient; and
median-based rules can incur a prohibitive computational 3) adversaries that invert their stochastic gradient estimate.
overhead in large-scale settings [163]. Guerraoui et al. [160]
proposed a meta-aggregation rule called Bulyan, a two-step B. Defenses Against Targeted Attacks
meta-aggregation algorithm based on the Krum and trimmed Existing defenses against the targeted backdoor attacks can
median, which filters malicious updates followed by comput- be categorized into two types: detection methods and erasing
ing the trimmed median of the remaining updates. Median methods [168].

Authorized licensed use limited to: Vietnam National University. Downloaded on December 07,2024 at 07:18:32 UTC from IEEE Xplore. Restrictions apply.
8738 IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, VOL. 35, NO. 7, JULY 2024

TABLE IV
S TATE - OF - THE -A RT D EFENSES A GAINST FL P OISONING . n I S THE N UMBER OF PARTICIPANTS .
N OTE T HAT S OME D EFENSES H AVE N O T HEORETIC B REAKING P OINT

1) Detection: Detection methods exploit activation statis- largely unexplored. Xie et al. [162] provided the first general
tics or model properties to determine whether a model is framework called certifiably robust FL (CRFL) to train CRFL
backdoored [169], [170] or whether a training/test example models against backdoors.
is a backdoor example [52]. There are a number of detection 4) Sybil Defenses in FL: In addition to backdoors, the
algorithms that are designed to detect which inputs contain targeted attack can also be launched by Sybil clones [37].
a backdoor, and which parts of the model (its activation To defend against the targeted poisoning attack by Sybil
functions specifically) are responsible for triggering the adver- clones, Fung et al. [37] exploited the characteristic behavior
sarial behavior of the model, in order to remove the back- that Sybils are more similar to each other than the similarity
door [47], [52], [53], [103], [171]. These algorithms rely on observed amongst the honest clients and proposed FoolsGold:
the statistical difference between the latent representations of a new defense scheme against FL Sybil attacks by adapting the
backdoor-enabled and clean (benign) inputs in the poisoned learning rate of participants based on contribution similarity.
model. These backdoor detection algorithms can, however, Note that FoolsGold does not bound the expected number of
be bypassed by maximizing the latent indistinguishability of attackers by assuming that attackers can spawn a large number
backdoor-enabled adversarial inputs and clean inputs [172]. of Sybils, rendering assumptions about proportions of honest
2) Erasing: While detection can help identify potential participants unrealistic [40]. In addition, FoolsGold requires no
risks, the backdoored model still needs to be purified/erased auxiliary information beyond the learning process and makes
since the potential impact of backdoor triggers remains fewer assumptions about participants and their data. The
uncleared in the backdoored models. The erasing methods robustness of FoolsGold holds for different distributions of
take a step further and aim to purify the adverse impacts participant data, varying poisoning targets, and various Sybil
on models caused by the backdoor triggers. The current strategies and can be applied successfully on both FedSGD
state-of-the-art erasing methods are mode connectivity repair and FedAvg.
(MCR) [173] and NAD [59]. MCR mitigates the backdoors 5) Summary: We list the most representative defenses
by selecting a robust model in the path of loss landscape, against poisoning attacks in FL in Table IV. Some of them
while NAD leverages knowledge distillation to erase triggers. have breaking points, i.e., the fraction of malicious partici-
Other previous methods, including fine-tuning, denoising, and pants, and robustness guarantees cannot be provided if the
fine-pruning [171], have been shown to be insufficient against fraction of malicious participants is larger than the breaking
the latest attacks [58], [174]. Another more recent work called point.
ABL [60] aims to train clean models given backdoor-poisoned 6) Remark: Note that both the untargeted and targeted
data. The overall learning process is framed as a dual task of poisoning attacks are less effective in settings with infrequent
learning the clean and the backdoor portions of data. Based participation like H2C [35]. Moreover, under practical produc-
on this process, ABL can: 1) help isolate backdoor examples tion FL environments, Shejwalkar et al. [73] have shown that
at an early training stage and 2) break the correlation between FL, even without any defenses, is highly robust in practice. For
backdoor examples and the target class at a later training production cross-device FL (H2C), which contains thousands
stage. to billions of clients, poisoning attacks have no impact on
3) Backdoor Defenses in FL: Despite the promising back- existing robust FL algorithms even with impractically high
door defense results in the centralized setting, it is still percentages of compromised clients. For production cross-silo
unclear whether these defenses can be smoothly adapted to FL (H2B), which contains up to 100 clients, data poisoning
the FL setting, especially in the non-i.i.d. setting. For backdoor attacks are completely ineffective; model poisoning attacks
defense in FL, Sun et al. [38] showed that clipping the norm are unlikely to play a major risk when the clients involved
of model updates and adding Gaussian noise can mitigate are bound by contract and their software stacks profession-
backdoor attacks that are based on the model replacement ally maintained (e.g., in banks, hospitals). Some exceptional
paradigm. Andreina et al. [175] incorporated an additional cross-silo scenarios are most likely with a strong incentive
validation phase in each round of FL to detect backdoor. How- (e.g., financial) causing multiple parties to be willing to risk
ever, none of these provides certified robustness guarantees. a breach of contract by colluding or for one party to hack
Certified robustness for FL against backdoor attacks remains thereby risking criminal liability. Therefore, we conclude that

these poisoning attacks are more likely to happen in some H2C scenario. In terms of DP-based methods [7], [121], [182],
exceptional H2B scenarios. [183], [184], [185], record-level DP bounds the success of
membership inference but does not prevent property inference
VII. D ISCUSSION AND P ROMISING D IRECTIONS applied to a group of training records [28]. Participant-level
There are still potential vulnerabilities that need to be DP, on the other hand, is geared to work with thousands of
addressed in order to improve the privacy and robustness of users for training to converge and achieving an acceptable
FL systems. Moreover, there are multiple design goals that tradeoff between privacy and accuracy [7]. The FL model
are equally important with privacy and robustness and, thus, fails to converge with a small number of participants, making
need to be considered simultaneously in FL. In this section, it unsuitable for H2B scenarios. Furthermore, DP may hurt
we outline research directions that we believe are promising. the accuracy of the learned model [186], which may not be
7) Curse of Dimensionality: Large models, with high appealing to the industry. Further work is needed to investigate
dimensional parameter vectors, are particularly susceptible if participant-level DP can protect FL systems with fewer
to privacy and security attacks [176]. Most FL algorithms participants. It is also worthwhile to explore whether we can
require overwriting the local model parameters with the global use the condensed data [187] rather than the raw data for local
model. This makes them susceptible to poisoning attacks, model training in order to better protect privacy.
as the adversary can make small but damaging changes in the 10) Optimizing Defense Mechanism Deployment: When
high-dimensional models without being detected. Almost all deploying defense mechanisms to check if any adversary is
of the well-designed Byzantine-robust aggregators [40], [63], attacking the FL system, the FL server will need additional
[64] still suffer from the curse of dimensionality. Specifically, computational costs. In addition, different types of defense
the estimation error scales up with the size of the model in mechanisms may exhibit different effectiveness against differ-
a square-root manner. Thus, sharing model parameters may ent attacks and incur different costs. It is important to study
not be a strong design choice in FL; it opens all the internal how to optimize the timing of deploying defense mechanisms
states of the model to inference attacks and maximizes the or the announcement of deterrence measures. Game theoretic
model’s malleability by poisoning attacks. To address these research holds promise in addressing this challenge.
fundamental shortcomings of FL, it is worthwhile to explore 11) Test-Phase Privacy in FL: This survey mainly focuses
whether sharing gradients is essential. Instead, sharing less on the training phase attacks and defenses in FL, considering
sensitive information (e.g., SIGNSGD [39]) or only sharing the more attack possibilities opened by the distributed training
model predictions [25], [45], [176] in a black-box manner may property of FL systems. In fact, FL is also vulnerable to both
result in more robust privacy protection in FL. privacy and robustness attacks during the test/inference phase
8) Rethinking Current Privacy Attacks: There are several by the users of the final FL model when it is deployed as a
inherent weaknesses in current attacks that may limit their service.
applicability in FL [177]. For example, the GAN attack In terms of privacy vulnerability, the trained global model
assumes that the entire training corpus for a given class comes may reveal sensitive information from model predictions when
from a single participant, and only in the special case where all deployed as a service, causing privacy leakage. In such a
class members are similar, GAN-constructed representatives setting, an adversary does not have direct access to the
are similar to the training data [75]. These assumptions are model parameters but may be able to view input–output
less practical in FL. For DLG [29] and iDLG [32], both works: pairs. Previous studies have shown a series of privacy leakage
1) adopt a second-order optimization method called L-BFGS given only black-box access to the trained models, such as:
that is more computationally expensive compared with first- 1) model stealing attacks in which model parameters can be
order optimizations;2) are only applicable to gradients com- reconstructed by an adversary who only has access to an
puted on minibatches of data, i.e., at most B = 8 in DLG and inference/prediction API based on those parameters [82], [83],
B = 1 in iDLG, which is not the real case for FL, in which [84], [85] and 2) MIAs that aim to determine if a particular
gradient is normally shared after at least 1 epoch of local record was used to train the model [86]. FL models face a sim-
training; and 3) used untrained model, neglecting gradients ilar dilemma during model deployment for testing purposes.
over multiple communication rounds. Attacking FL systems The development of effective defenses against privacy leakage
in a more efficient manner and under more practical settings during model deployment calls for further investigations.
remain largely unexplored. In addition, whether current attacks 12) Test-Phase Robustness in FL: In terms of robustness
still work in FL that uses adaptive optimization methods [178], vulnerability, recent studies [90], [91], [92] have shown that
such as SGDM and Adam, remains unknown. FL is also vulnerable to well-crafted adversarial examples.
9) Rethinking Current Defenses: FL with secure aggrega- During inference time, the attackers can add a very small
tion for the purpose of privacy is more susceptible to poisoning perturbation to the test data, making the test data almost indis-
attacks as individual updates cannot be inspected. Similarly, tinguishable from natural data and yet classified incorrectly by
it is still unclear if AT, one state-of-the-art defense approach the global model. For federated robustness against adversarial
against adversarial attacks in conventional ML [93], [179], examples, Zizzo et al. [90] and Hong et al. [91] proposed
[180], can be adapted to FL, as AT was developed primarily for to apply AT to FL, i.e., FAT, in order to achieve adversarial
i.i.d. data and remains unclear for its performance in non-i.i.d. robustness in FL. Zizzo et al. [90] noticed that conducting AT
settings. Moreover, AT is computationally expensive and may on all participants leads to divergence of the model. To solve
hurt the performance [181], which may not be feasible for the this problem, they conducted AT on only a proportion of

Authorized licensed use limited to: Vietnam National University. Downloaded on December 07,2024 at 07:18:32 UTC from IEEE Xplore. Restrictions apply.
8740 IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, VOL. 35, NO. 7, JULY 2024

participants for better convergence. Another recent work by For VFL, Secureboost [191] considered user privacy and
Hong et al. [91] considered hardware heterogeneity in FL, data confidentiality in VFL and presented an approach to train
i.e., only limited users can afford AT. Hence, they conduct a high-quality tree-boosting model collaboratively. A recent
AT on only a proportion of participants that have powerful work called FederBoost [192] pointed out that Secureboost
computation resources while standard training on the rest of is expensive since it requires cryptographic computation and
the participants. Shah et al. [92] investigated the impact of communication for each possible split; thus, they proposed a
communication rounds in FAT and proposed a dynamic AT. vertical FederBoost, which does not require any cryptographic
The training of all the above FAT works is unstable, which operation. Another recent work by Jin et al. [193] uncovered
potentially hurts the convergence and performance. Moreover, the risk of catastrophic data leakage in vertical FL (CAFE)
AT typically requires significant computation and a longer time through a novel algorithm that can perform large-batch data
to converge, and it is unclear how it performs in non-i.i.d. leakage with high data recovery quality and theoretical guar-
settings. Chen et al. [188] took the first step to investigate FAT antees. They empirically demonstrated that CAFE can recover
under non-i.i.d. setting with label skewness. However, how to large-scale private data from the shared aggregated gradients
speed up AT in FL may be required in the future. Overall, there in VFL settings, overcoming the batch limitation problem in
exist difficulties in applying AT to the federated setting. This current data leakage attacks.
motivates future works to explore more effective approaches For FTL, Gao et al. [24] proposed an end-to-end privacy-
to maintain both natural accuracy and robustness in FL. preserving multiparty learning approach with two variants
In addition to the adversarial examples, recent works [83], based on HE and secret-sharing techniques, respectively,
[84] have validated that the API services (the victim/target in order to build a heterogeneous FTL (HFTL) frame-
model) can be easily stolen and are vulnerable to adversarial work. Liu et al. [23] adopted two secure approaches, namely,
example transferability attacks. It would be interesting to HE and secret sharing for preserving privacy. The HE approach
explore whether the collaboratively built global model in FL is simple but computationally expensive. By contrast, the
is also facing a similar problem and how to effectively claim secret-sharing approach offers the following advantages: 1)
the ownership of the trained model [189]. there is no accuracy loss and 2) computation is much
13) Relationship With GDPR: GDPR1 defines six-core prin- faster than the HE approach. The major drawback of the
ciples as rational guidelines for service providers to manage secret-sharing approach is that one has to offline generate and
personal data, including: 1) lawfulness, fairness, and trans- store many triplets before online computation.
parency; 2) purpose limitation; 3) data minimization; 4) accu- Overall, there is still a large space for VFL and FTL. It is
racy; 5) storage limitation; and 6) integrity and confidentiality worth further investigation as to whether existing threats in
(security). GDPR also requires data controllers to provide HFL are all valid in VFL and FTL or if there are new threats
the following rights for data subjects if capable (the GDPR and countermeasures in VFL and FTL.
Articles 12–23): 1) right to be informed; 2) right of access; 15) Vulnerabilities to Free-Riding Participants: In FL sys-
3) right to rectification; 4) right to erasure (right to be forgot- tems, there may exist free-riders who aim to benefit from
ten); 5) right to restrict processing; 6) right to data portability; the global model but do not want to contribute any useful
7) right to object; and 8) rights in relation to automated information, thus compromising collaborative fairness [183],
decision making and profiling. Although FL has emerged as [184], [194]. The main incentives for free-riders include:
a prospective solution that facilitates distributed collaborative 1) the participant does not have any data to train the local
learning without disclosing original training data, unfortu- model; 2) the participant is too concerned about its privacy
nately, FL is not naturally compliant with the GDPR [190], and, thus, chooses to release fake updates; and 3) the partic-
as pointed out by a recent survey [190], which has dedicated ipant does not want to consume or does not have any local
to surveying the relationship between FL and GDPR require- computation power to train the local model. In the current FL
ments. For example, the secure aggregation mechanism in FL paradigm [6], all participants receive the same federated model
amplifies the lack of transparency and fairness in FL systems at the end of the collaborative training, regardless of their
and, thus, fails to fully comply with the GDPR requirements individual contributions. This makes the paradigm vulnerable
of fairness and transparency; malicious participants in FL may to free-riding participants [166], [183], [184], [195], [196].
conduct either data or model poisoning attacks for an unautho- How to prevent free-riding remains an open challenge. Incen-
rized purpose, and local ML model parameters obtained from tivized FL (via allocating different reputations to different
participants are no longer minimal for the original purpose. participants and penalizing unreliable or malicious partici-
These possible attacks, which lead to noncompliance with the pants) [194], [197] would be an important direction to help
GDPR, should be addressed. Henceforth, it is worthwhile to address free-riding problem and possibly the before-mentioned
explore approaches to empower FL-based systems to follow privacy and poisoning attack problems in Sections III and IV.
the GDPR regulatory guidelines and, thus, fully comply with 16) Reliability of FL Over Wireless Network: When FL
the GDPR. systems are deployed in the real world, unreliable data may be
14) Threats and Protections of VFL and FTL: This survey uploaded by mobile devices (i.e., workers). The workers may
mainly focuses on the threats to HFL; there are some recent perform unreliable updates intentionally, e.g., the data poi-
exploratory efforts on threats and protections of VFL and FTL. soning attack, or unintentionally, e.g., low-quality data caused
by energy constraints or high-speed mobility [198]. Similarly,
1 https://gdpr-info.eu when FL meets UAVs, reliability is a key factor that may

affect performance [14]. Therefore, finding out trusted and and there exist large gaps as to how to simultaneously achieve
reliable workers for FL tasks becomes critical. The concept of all the above six objectives.
reputation could be used for reliable worker selection strategy
design in order to keep the low-quality devices from affecting VIII. C ONCLUSION
the learning efficiency and accuracy [198]. Although FL is still in its infancy, it will continue to
thrive and will be an active and important research area in
17) Extension to Decentralized FL: Decentralized FL is the foreseeable future. As FL evolves, so will the privacy
an emerging research area, where there is no single central and robustness threats to FL. It is of vital importance to
server in the system [3], [7], [183], [184]. Decentralized FL is provide a broad overview of current attacks and defenses on
potentially more useful for H2B scenarios where the business FL so that future FL system designers are well aware of
participants do not trust any third party. In this paradigm, the potential vulnerabilities in the current designs and help
each participant could be elected as a server in a round- them clear roadblocks toward the real-world deployment of
robin manner. The recent emerging swarm learning [199] can FL. This survey serves as a concise and accessible overview
be deemed as a decentralized FL framework, which unites of this topic, and it would greatly help our understanding
edge computing, blockchain-based peer-to-peer networking, of the privacy and robustness attack and defense landscape
and coordination while maintaining confidentiality without the in FL. Global collaboration on FL is emerging through a
need for a central coordinator. It is interesting to investigate number of workshops at leading AI conferences.2 The ultimate
whether existing threats to server-based FL still apply in goal of developing a general-purpose FL defense mechanism
decentralized FL. that can be robust against various attacks without degrading
18) Efficient FL With Single Round Communication: In model performance will require interdisciplinary effort from
addition to privacy and robustness, communication cost is the wider research community.
another major concern that may hinder the large-scale imple-
mentation of FL [200]. One-shot FL has recently emerged as ACKNOWLEDGMENT
a promising approach for communication efficiency. It allows Any opinions, findings, and conclusions or recommenda-
the central server to learn a model in a single communication tions expressed in this material are those of the author(s) and
round. Despite the low communication cost, existing one-shot do not reflect the views of the National Research Foundation,
FL methods are mostly impractical or face inherent limitations, Singapore.
e.g., a public dataset is required, participants’ models are
R EFERENCES
homogeneous, additional data/model information needs to be
uploaded, and unsatisfactory performance [201], [202], [203]. [1] H. Li, K. Ota, and M. Dong, “Learning IoT in edge: Deep learning
for the Internet of Things with edge computing,” IEEE Netw., vol. 32,
Recent work proposed a more practical data-free approach no. 1, pp. 96–101, Jan. 2018.
named DENSE for a one-shot FL framework with hetero- [2] M. Abadi et al., “Deep learning with differential privacy,” in Proc.
geneity [204]. Other alternative one-shot FL approaches with CCS, 2016, pp. 308–318.
[3] Q. Yang, Y. Liu, Y. Cheng, Y. Kang, T. Chen, and H. Yu, “Federated
practical assumptions are worthwhile to explore, considering learning,” Synth. Lect. Artif. Intell. Mach. Learn., vol. 13, no. 3,
the alluring communication efficiency and less privacy and pp. 1–207, 2019.
robustness attack surfaces exposed in one-shot FL. [4] H. B. McMahan, E. Moore, D. Ramage, S. Hampson, and
B. A. Y. Arcas, “Communication-efficient learning of deep networks
19) Achieving Multiple Objectives Simultaneously: There from decentralized data,” 2016, arXiv:1602.05629.
[5] K. Bonawitz et al., “Practical secure aggregation for privacy-preserving
are no existing works that can satisfy multiple goals simultane- machine learning,” in Proc. CCS, 2017, pp. 1175–1191.
ously: 1) fast algorithmic convergence; 2) good generalization [6] B. McMahan, E. Moore, D. Ramage, S. Hampson, and B. A. Y. Arcas,
performance; 3) communication efficiency; 4) fault tolerance; “Communication-efficient learning of deep networks from decentral-
ized data,” in Artificial Intelligence and Statistics. PMLR, 2017,
5) privacy preservation; and 6) robustness to targeted, untar- pp. 1273–1282.
geted poisoning attacks, and free-riders. Previous works have [7] H. B. McMahan, D. Ramage, K. Talwar, and L. Zhang, “Learning
attempted to solve multiple objectives at the same time. For differentially private recurrent language models,” in Proc. ICLR, 2018,
pp. 1–14.
example, Lyu et al. [183], [184] addressed collaborative fair- [8] Y. Liu et al., “Fedvision: An online visual object detection platform
ness and privacy simultaneously; Xu and Lyu et al. [166] pro- powered by federated learning,” in Proc. IAAI, 2020, pp. 13172–13179.
posed a robust and fair FL (RFFL) framework to address both [9] S. Hardy et al., “Private federated learning on vertically partitioned data
via entity resolution and additively homomorphic encryption,” 2017,
collaborative fairness and Byzantine robustness. However, it is arXiv:1711.10677.
important to highlight that there is an inherent conflict between [10] C. Wu, F. Wu, L. Lyu, Y. Huang, and X. Xie, “Communication-
privacy and robustness: defending against robustness attacks efficient federated learning via knowledge distillation,” Nature Com-
mun., vol. 13, no. 1, pp. 1–8, 2022.
usually requires complete control of the training process or [11] C. Wu, F. Wu, L. Lyu, Y. Huang, and X. Xie, “FedCTR: Federated
access to the training data [37], [40], [63], [159], [205], [206], native ad CTR prediction with cross platform user behavior data,” ACM
which goes against the privacy requirements of FL. Although Trans. Intell. Syst. Technol., vol. 13, no. 4, pp. 62:1–62:19, 2022.
[12] J. Cui, C. Chen, L. Lyu, C. Yang, and W. Li, “Exploiting data sparsity
using encryption or DP-based techniques can provide provably in secure cross-platform social recommendation,” in Proc. NIPS, 2021,
privacy preservation, they are not robust to poisoning attacks pp. 10524–10534.
and may produce models with undesirably poor privacy-utility [13] J. Li, L. Lyu, X. Liu, X. Zhang, and X. Lyu, “FLEAM: A federated
learning empowered architecture to mitigate DDoS in industrial IoT,”
tradeoffs. Agarwal et al. [30] combined DP with model com- IEEE Trans. Ind. Informat., vol. 18, no. 6, pp. 4059–4068, Jun. 2021.
pression techniques to reduce communication costs and obtain
privacy benefits simultaneously. It remains largely unexplored, 2 http://www.federated-learning.org/

Authorized licensed use limited to: Vietnam National University. Downloaded on December 07,2024 at 07:18:32 UTC from IEEE Xplore. Restrictions apply.
8742 IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, VOL. 35, NO. 7, JULY 2024

[14] H. Yang, J. Zhao, Z. Xiong, K.-Y. Lam, S. Sun, and L. Xiao, “Privacy- [42] N. Papernot, M. Abadi, U. Erlingsson, I. Goodfellow, and K. Talwar,
preserving federated learning for UAV-enabled networks: Learning- “Semi-supervised knowledge transfer for deep learning from private
based joint scheduling and resource management,” IEEE J. Sel. Areas training data,” in Proc. ICLR, 2017, pp. 1–16.
Commun., vol. 39, no. 10, pp. 3144–3159, Oct. 2021. [43] L. Lyu, “Privacy-preserving machine learning and data aggregation for
[15] C. Wu, F. Wu, Y. Cao, Y. Huang, and X. Xie, “FedGNN: Federated Internet of Things,” Ph.D. dissertation, Dept. Elect. Electron. Eng.,
graph neural network for privacy-preserving recommendation,” 2021, Univ. Melbourne, Melbourne, VIC, Australia, 2018.
arXiv:2102.04925. [44] L. Lyu et al., “Distributed privacy-preserving prediction,” in Proc. Int.
[16] C. Chen et al., “Vertically federated graph neural network for privacy- Conf. Syst., Man, Cybern., 2020.
preserving node classification,” in Proc. IJCAI, 2022, pp. 1–9. [45] L. Sun and L. Lyu, “Federated model distillation with noise-free
[17] X. Ni, X. Xu, L. Lyu, C. Meng, and W. Wang, “A vertical fed- differential privacy,” in Proc. IJCAI, 2021.
erated learning framework for graph convolutional network,” 2021, [46] S. Truex et al., “A hybrid approach to privacy-preserving federated
arXiv:2106.11593. learning,” in Proc. 12th ACM Workshop Artif. Intell. Secur., 2019,
[18] C. Wu, F. Wu, L. Lyu, T. Qi, Y. Huang, and X. Xie, “A federated graph pp. 1–11.
neural network framework for privacy-preserving personalization,” [47] Y. Liu, Y. Xie, and A. Srivastava, “Neural trojans,” in Proc. IEEE Int.
Nature Commun., vol. 13, no. 1, pp. 1–10, Dec. 2022. Conf. Comput. Design (ICCD), Nov. 2017, pp. 45–48.
[19] Q. Yang, Y. Liu, T. Chen, and Y. Tong, “Federated machine learning:
[48] B. G. Doan, E. Abbasnejad, and D. C. Ranasinghe, “Februus: Input
Concept and applications,” ACM Trans. Intell. Syst. Technol., vol. 10,
purification defense against trojan attacks on deep neural network sys-
no. 2, pp. 1–19, 2019.
tems,” in Proc. Annu. Comput. Secur. Appl. Conf., 2020, pp. 897–912.
[20] M. Kantarcioglu and C. Clifton, “Privacy-preserving distributed mining
[49] S. Udeshi, S. Peng, G. Woo, L. Loh, L. Rawshan, and S. Chattopad-
of association rules on horizontally partitioned data,” IEEE Trans.
hyay, “Model agnostic defence against backdoor attacks in machine
Knowl. Data Eng., vol. 16, no. 9, pp. 1026–1037, Sep. 2004.
learning,” IEEE Trans. Rel., vol. 71, no. 2, pp. 880–895, Jun. 2022.
[21] J. Vaidya and C. Clifton, “Privacy preserving association rule mining
in vertically partitioned data,” in Proc. KDD, 2002, pp. 639–644. [50] M. Villarreal-Vasquez and B. Bhargava, “ConFoc: Content-focus
[22] S. J. Pan and Q. Yang, “A survey on transfer learning,” IEEE Trans. protection against trojan attacks on neural networks,” 2020,
Knowl. Data Eng., vol. 22, no. 10, pp. 1345–1359, Dec. 2009. arXiv:2007.00711.
[23] Y. Liu, Y. Kang, C. Xing, T. Chen, and Q. Yang, “A secure feder- [51] Y. Li, T. Zhai, B. Wu, Y. Jiang, Z. Li, and S. Xia, “Rethinking the
ated transfer learning framework,” IEEE Intell. Syst., vol. 35, no. 4, trigger of backdoor attack,” 2020, arXiv:2004.04692.
pp. 70–82, Jul./Aug. 2020. [52] B. Tran, J. Li, and A. Madry, “Spectral signatures in backdoor attacks,”
[24] D. Gao, Y. Liu, A. Huang, C. Ju, H. Yu, and Q. Yang, “Privacy- in Proc. NIPS, 2018, pp. 8000–8010.
preserving heterogeneous federated transfer learning,” in Proc. IEEE [53] B. Chen et al., “Detecting backdoor attacks on deep neural networks
Int. Conf. Big Data (Big Data), Dec. 2019, pp. 2552–2559. by activation clustering,” 2018, arXiv:1811.03728.
[25] D. Li and J. Wang, “FedMD: Heterogenous federated learning via [54] D. Tang, X. Wang, H. Tang, and K. Zhang, “Demon in the variant:
model distillation,” 2019, arXiv:1910.03581. Statistical analysis of DNNs for robust backdoor contamination detec-
[26] R. Liu et al., “No one left behind: Inclusive federated learning over tion,” in Proc. 30th USENIX Secur. Symp. (USENIX Security), 2021,
heterogeneous devices,” in Proc. KDD, 2022, pp. 1–9. pp. 1541–1558.
[27] A. Bhowmick, J. Duchi, J. Freudiger, G. Kapoor, and R. Rogers, “Pro- [55] E. Soremekun, S. Udeshi, and S. Chattopadhyay, “Exposing backdoors
tection against reconstruction and its applications in private federated in robust machine learning models,” 2020, arXiv:2003.00865.
learning,” 2018, arXiv:1812.00984. [56] A. Chan and Y.-S. Ong, “Poison as a cure: Detecting & neutraliz-
[28] L. Melis, C. Song, E. De Cristofaro, and V. Shmatikov, “Exploiting ing variable-sized backdoor attacks in deep neural networks,” 2019,
unintended feature leakage in collaborative learning,” in Proc. IEEE arXiv:1911.08040.
Symp. Secur. Privacy (SP), May 2019, pp. 691–706. [57] E. Chou, F. Tramer, and G. Pellegrino, “SentiNet: Detecting localized
[29] L. Zhu, Z. Liu, and S. Han, “Deep leakage from gradients,” in Proc. universal attack against deep learning systems,” in Proc. IEEE Secur.
NIPS, 2019, pp. 14747–14756. Privacy Workshops (SPW), May 2020, pp. 48–54.
[30] N. Agarwal, A. T. Suresh, F. X. X. Yu, S. Kumar, and B. McMahan, [58] Y. Liu, X. Ma, J. Bailey, and F. Lu, “Reflection backdoor: A natural
“CpSGD: Communication-efficient and differentially-private distrib- backdoor attack on deep neural networks,” in Proc. ECCV. Springer,
uted SGD,” in Proc. NIPS, 2018, pp. 7564–7575. 2020, pp. 182–199.
[31] L. T. Phong, Y. Aono, T. Hayashi, L. Wang, and S. Moriai, “Privacy- [59] Y. Li, X. Lyu, N. Koren, L. Lyu, B. Li, and X. Ma, “Neural attention
preserving deep learning via additively homomorphic encryption,” distillation: Erasing backdoor triggers from deep neural networks,” in
IEEE Trans. Inf. Forensics Security, vol. 13, no. 5, pp. 1333–1345, Proc. ICLR, 2021, pp. 1–19.
May 2018. [60] Y. Li, X. Lyu, N. Koren, L. Lyu, B. Li, and X. Ma, “Anti-backdoor
[32] B. Zhao, K. R. Mopuri, and H. Bilen, “IDLG: Improved deep leakage learning: Training clean models on poisoned data,” in Proc. NIPS, 2021,
from gradients,” 2020, arXiv:2001.02610. pp. 14900–14912.
[33] H. Wang et al., “Attack of the tails: Yes, you really can backdoor [61] G. Damaskinos, E. M. El Mhamdi, R. Guerraoui, A. H. A. Guirguis,
federated learning,” in Proc. NIPS, 2020, pp. 1–15. and S. L. A. Rouault, “Aggregathor: Byzantine machine learning via
[34] C. Xie, K. Huang, P. Chen, and B. Li, “DBA: Distributed backdoor robust gradient aggregation,” in Proc. SysML, 2019, pp. 81–106.
attacks against federated learning,” in Proc. ICLR, 2020, pp. 1–19. [62] Y. Chen, L. Su, and J. Xu, “Distributed statistical machine learning
[35] E. Bagdasaryan, A. Veit, Y. Hua, D. Estrin, and V. Shmatikov, “How to in adversarial settings: Byzantine gradient descent,” ACM Meas. Anal.
backdoor federated learning,” in Proc. Int. Conf. Artif. Intell. Statist., Comput. Syst., vol. 1, no. 2, p. 44, 2017.
2020, pp. 2938–2948.
[63] D. Yin, Y. Chen, R. Kannan, and P. Bartlett, “Byzantine-robust distrib-
[36] A. N. Bhagoji, S. Chakraborty, P. Mittal, and S. Calo, “Analyzing uted learning: Towards optimal statistical rates,” in Proc. ICML, 2018,
federated learning through an adversarial lens,” in Proc. ICML, 2019,
pp. 5650–5659.
pp. 634–643.
[64] K. Pillutla, S. M. Kakade, and Z. Harchaoui, “Robust aggrega-
[37] C. Fung, C. J. Yoon, and I. Beschastnikh, “The limitations of federated
tion for federated learning,” IEEE Trans. Signal Process., vol. 70,
learning in Sybil settings,” in Proc. 23rd Int. Symp. Res. Attacks,
pp. 1142–1154, 2022.
Intrusions Defenses (RAID), 2020, pp. 301–316.
[38] Z. Sun, P. Kairouz, A. T. Suresh, and H. B. McMahan, “Can you really [65] M. S. Ozdayi, M. Kantarcioglu, and Y. R. Gel, “Defending against
backdoor federated learning?” 2019, arXiv:1911.07963. backdoors in federated learning with robust learning rate,” in Proc.
[39] J. Bernstein, J. Zhao, K. Azizzadenesheli, and A. Anandkumar, AAAI Conf. Artif. Intell., vol. 35, no. 10, 2021, pp. 9268–9276.
“SignSGD with majority vote is communication efficient and fault [66] T. Li, A. K. Sahu, A. Talwalkar, and V. Smith, “Federated learning:
tolerant,” in Proc. ICLR, 2019, pp. 1–20. Challenges, methods, and future directions,” IEEE Signal Process.
[40] P. Blanchard et al., “Machine learning with adversaries: Byzantine Mag., vol. 37, no. 3, pp. 50–60, May 2020.
tolerant gradient descent,” in Proc. NIPS, 2017, pp. 119–129. [67] P. Kairouz et al., “Advances and open problems in federated learning,”
[41] E. Jeong, S. Oh, H. Kim, J. Park, M. Bennis, and S.-L. Kim, 2019, arXiv:1912.04977.
“Communication-efficient on-device machine learning: Federated dis- [68] Q. Li et al., “A survey on federated learning systems: Vision, hype
tillation and augmentation under non-IID private data,” 2018, and reality for data privacy and protection,” IEEE Trans. Knowl. Data
arXiv:1811.11479. Eng., early access, Nov. 2, 2021, doi: 10.1109/TKDE.2021.3124599.

[69] V. Mothukuri, R. M. Parizi, S. Pouriyeh, Y. Huang, A. Dehghantanha, [96] M. Fredrikson, S. Jha, and T. Ristenpart, “Model inversion attacks
and G. Srivastava, “A survey on security and privacy of federated learn- that exploit confidence information and basic countermeasures,” in
ing,” Future Gener. Comput. Syst., vol. 115, pp. 619–640, Feb. 2021. Proc. 22nd ACM SIGSAC Conf. Comput. Commun. Secur., Oct. 2015,
[70] C. Zhang, Y. Xie, H. Bai, B. Yu, W. Li, and Y. Gao, “A survey pp. 1322–1333.
on federated learning,” Knowl.-Based Syst., vol. 216, Mar. 2021, [97] S. Guo, C. Xie, J. Li, L. Lyu, and T. Zhang, “Threats to pre-trained
Art. no. 106775. language models: Survey and taxonomy,” 2022, arXiv:2202.06862.
[71] M. Nasr, R. Shokri, and A. Houmansadr, “Comprehensive privacy [98] B. I. P. Rubinstein et al., “ANTIDOTE: Understanding and defending
analysis of deep learning: Passive and active white-box inference against poisoning of anomaly detectors,” in Proc. 9th ACM SIGCOMM
attacks against centralized and federated learning,” in Proc. IEEE Symp. Internet Measurement Conf., 2009, pp. 1–14.
Secur. Privacy (SP), May 2019, pp. 739–753. [99] M. Jagielski, A. Oprea, B. Biggio, C. Liu, C. Nita-Rotaru, and B. Li,
[72] Y. Huang, S. Gupta, Z. Song, K. Li, and S. Arora, “Evaluating gradient “Manipulating machine learning: Poisoning attacks and countermea-
inversion attacks and defenses in federated learning,” in Proc. NIPS, sures for regression learning,” in Proc. IEEE Symp. Secur. Privacy
2021, pp. 7232–7241. (SP), May 2018, pp. 19–35.
[73] V. Shejwalkar, A. Houmansadr, P. Kairouz, and D. Ramage, “Back [100] C. Xie, O. Koyejo, and I. Gupta, “Fall of empires: Breaking Byzantine-
to the drawing board: A critical evaluation of poisoning attacks on tolerant SGD by inner product manipulation,” in Uncertainty in Artifi-
production federated learning,” in Proc. IEEE Symp. Secur. Privacy cial Intelligence. PMLR, 2020, pp. 261–270.
(SP), May 2022, pp. 1354–1371. [101] B. Nelson et al., “Exploiting machine learning to subvert your spam
[74] B. Biggio, B. Nelson, and P. Laskov, “Support vector machines under filter,” in Proc. LEET, vol. 8, 2008, pp. 1–9.
adversarial label noise,” in Proc. ACML, 2011, pp. 97–112.
[102] L. Huang, A. D. Joseph, B. Nelson, B. I. Rubinstein, and J. D. Tygar,
[75] B. Hitaj, G. Ateniese, and F. Perez-Cruz, “Deep models under the
“Adversarial machine learning,” in Proc. 4th ACM Workshop Secur.
GAN: Information leakage from collaborative deep learning,” in Proc.
Artif. Intell., 2011, pp. 43–58.
ACM SIGSAC Conf. Comput. Commun. Secur., Oct. 2017, pp. 603–618.
[103] X. Chen, C. Liu, B. Li, K. Lu, and D. Song, “Targeted back-
[76] C. Miao, Q. Li, H. Xiao, W. Jiang, M. Huai, and L. Su, “Towards data
door attacks on deep learning systems using data poisoning,” 2017,
poisoning attacks in crowd sensing systems,” in Proc. 18th ACM Int.
arXiv:1712.05526.
Symp. Mobile Ad Hoc Netw. Comput., Jun. 2018, pp. 111–120.
[77] C. Miao, Q. Li, L. Su, M. Huai, W. Jiang, and J. Gao, “Attack [104] T. Gu, B. Dolan-Gavitt, and S. Garg, “BadNets: Identifying vul-
under disguise: An intelligent data poisoning attack mechanism in nerabilities in the machine learning model supply chain,” 2017,
crowdsourcing,” in Proc. World Wide Web Conf., 2018, pp. 13–22. arXiv:1708.06733.
[78] H. Zhang et al., “Data poisoning attack against knowledge graph [105] A. Shafahi et al., “Poison frogs! Targeted clean-label poisoning attacks
embedding,” 2019, arXiv:1904.12052. on neural networks,” in Proc. NIPS, 2018, pp. 6103–6113.
[79] G. Sun, Y. Cong, J. Dong, Q. Wang, L. Lyu, and J. Liu, “Data poisoning [106] Y. Liu et al., “Trojaning attack on neural networks,” in Proc. NDSS,
attacks on federated machine learning,” IEEE Internet Things J., vol. 9, 2018, pp. 1–17.
no. 13, pp. 11365–11375, Jul. 2022. [107] L. Lamport, R. Shostak, and M. Pease, “The Byzantine generals
[80] M. Fang, X. Cao, J. Jia, and N. Gong, “Local model poisoning attacks problem,” ACM Trans. Program. Lang. Syst., vol. 4, no. 3, pp. 382–401,
to Byzantine-robust federated learning,” in Proc. 29th USENIX Secur. Jul. 1982.
Symp. (USENIX), 2020, pp. 1605–1622. [108] C. Chen, J. Zhang, A. K. H. Tung, M. Kankanhalli, and G. Chen,
[81] B. Biggio, B. Nelson, and P. Laskov, “Poisoning attacks against support “Robust federated recommendation system,” 2020, arXiv:2006.08259.
vector machines,” 2012, arXiv:1206.6389. [109] G. Baruch, M. Baruch, and Y. Goldberg, “A little is enough: Cir-
[82] F. Tramèr, F. Zhang, A. Juels, M. K. Reiter, and T. Ristenpart, “Stealing cumventing defenses for distributed learning,” in Proc. NIPS, 2019,
machine learning models via prediction APIs,” in Proc. USENIX Secur. pp. 8632–8642.
Symp., 2016, pp. 601–618. [110] T. A. Nguyen and A. Tran, “Input-aware dynamic backdoor attack,” in
[83] X. He, L. Lyu, L. Sun, and Q. Xu, “Model extraction and adversarial Proc. NIPS, 2020, pp. 3454–3464.
transferability, your BERT is vulnerable!” in Proc. Conf. North Amer. [111] L. Muñoz-González et al., “Towards poisoning of deep learning algo-
Chapter Assoc. Comput. Linguistics, Hum. Lang. Technol., 2021, rithms with back-gradient optimization,” in Proc. 10th ACM Workshop
pp. 2006–2012. Artif. Intell. Secur., 2017, pp. 27–38.
[84] Q. Xu, X. He, L. Lyu, L. Qu, and G. Haffari, “Beyond model extraction: [112] P. W. Koh and P. Liang, “Understanding black-box predictions via
Imitation attack for black-box NLP APIs,” in Proc. COLING, 2022, influence functions,” in Proc. ICML, 2017, pp. 1885–1894.
pp. 1–12. [113] S. Zhao, X. Ma, X. Zheng, J. Bailey, J. Chen, and Y.-G. Jiang, “Clean-
[85] X. He et al., “CATER: Intellectual property protection on text genera- label backdoor attacks on video recognition models,” in Proc. CVPR,
tion APIs via conditional watermarks,” in Proc. NIPS, 2022, pp. 1–19. Jun. 2020, pp. 14443–14452.
[86] R. Shokri, M. Stronati, C. Song, and V. Shmatikov, “Membership [114] Y. Zeng, M. Pan, H. A. Just, L. Lyu, M. Qiu, and R. Jia, “Narcissus: A
inference attacks against machine learning models,” in Proc. IEEE practical clean-label backdoor attack with limited information,” 2022,
Symp. Secur. Privacy (SP), May 2017, pp. 3–18. arXiv:2204.05255.
[87] M. Barreno, B. Nelson, R. Sears, A. D. Joseph, and J. D. Tygar, “Can
[115] J. R. Douceur, “The Sybil attack,” in Proc. Int. Workshop Peer-to-Peer
machine learning be secure?” in Proc. ICCS, 2006, pp. 16–25.
Syst., 2002, pp. 251–260.
[88] B. Biggio et al., “Evasion attacks against machine learning at test time,”
in Proc. Joint Eur. Conf. Mach. Learn. Knowl. Discovery Databases, [116] Y. Aono, T. Hayashi, L. T. Phong, and L. Wang, “Scalable and secure
2013, pp. 387–402. logistic regression via homomorphic encryption,” in Proc. 6th ACM
Conf. Data Appl. Secur. Privacy, Mar. 2016, pp. 142–144.
[89] C. Szegedy et al., “Intriguing properties of neural networks,” 2013,
arXiv:1312.6199. [117] M. Kim, Y. Song, S. Wang, Y. Xia, and X. Jiang, “Secure logistic
[90] G. Zizzo, A. Rawat, M. Sinn, and B. Buesser, “FAT: Federated regression based on homomorphic encryption: Design and evaluation,”
adversarial training,” 2020, arXiv:2012.01791. JMIR Med. Informat., vol. 6, no. 2, p. e19, Apr. 2018.
[91] J. Hong, H. Wang, Z. Wang, and J. Zhou, “Federated robustness [118] R. C. Geyer, T. Klein, and M. Nabi, “Differentially private federated
propagation: Sharing robustness in heterogeneous federated learning,” learning: A client level perspective,” 2017, arXiv:1712.07557.
2021, arXiv:2106.10196. [119] T. T. Nguyên, X. Xiao, Y. Yang, S. C. Hui, H. Shin, and J. Shin,
[92] D. Shah, P. Dube, S. Chakraborty, and A. Verma, “Adversarial “Collecting and analyzing data from smart device users with local
training in communication constrained federated learning,” 2021, differential privacy,” 2016, arXiv:1606.05053.
arXiv:2103.01319. [120] N. Wang et al., “Collecting and analyzing multidimensional data with
[93] A. Madry, A. Makelov, L. Schmidt, D. Tsipras, and A. Vladu, “Towards local differential privacy,” in Proc. IEEE 35th Int. Conf. Data Eng.
deep learning models resistant to adversarial attacks,” in Proc. ICLR, (ICDE), Apr. 2019, pp. 638–649.
2018, pp. 1–28. [121] Y. Zhao et al., “Local differential privacy-based federated learning
[94] L. Su and J. Xu, “Securing distributed gradient descent in high for Internet of Things,” IEEE Internet Things J., vol. 8, no. 11,
dimensional statistical learning,” 2018, arXiv:1804.10140. pp. 8836–8853, Nov. 2020.
[95] C. Zhang, S. Bengio, M. Hardt, B. Recht, and O. Vinyals, “Understand- [122] L. Sun, J. Qian, X. Chen, and P. S. Yu, “LDP-FL: Practical private
ing deep learning (still) requires rethinking generalization,” Commun. aggregation in federated learning with local differential privacy,” in
ACM, vol. 64, no. 3, pp. 107–115, 2021. Proc. IJCAI, 2021, pp. 1–9.

Authorized licensed use limited to: Vietnam National University. Downloaded on December 07,2024 at 07:18:32 UTC from IEEE Xplore. Restrictions apply.
8744 IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, VOL. 35, NO. 7, JULY 2024

[123] S. Truex, L. Liu, K.-H. Chow, M. E. Gursoy, and W. Wei, “LDP-fed: [149] M. Yang, L. Lyu, J. Zhao, T. Zhu, and K.-Y. Lam, “Local differ-
Federated learning with local differential privacy,” in Proc. 3rd ACM ential privacy and its applications: A comprehensive survey,” 2020,
Int. Workshop Edge Syst., Analytics Netw., Apr. 2020, pp. 61–66. arXiv:2008.03686.
[124] L. Lyu, “Lightweight crypto-assisted distributed differential privacy for [150] S. L. Warner, “Randomized response: A survey technique for elimi-
privacy-preserving distributed learning,” in Proc. Int. Joint Conf. Neural nating evasive answer bias,” J. Amer. Statist. Assoc., vol. 60, no. 309,
Netw. (IJCNN), Jul. 2020, pp. 1–8. pp. 63–69, 1965.
[125] P. Mohassel and Y. Zhang, “SecureML: A system for scalable privacy- [151] U. Erlingsson, V. Pihur, and A. Korolova, “Rappor: Randomized
preserving machine learning,” in Proc. IEEE Symp. Secur. Privacy (SP), aggregatable privacy-preserving ordinal response,” in Proc. CCS, 2014,
May 2017, pp. 19–38. pp. 1054–1067.
[152] T. H. Chan, E. Shi, and D. Song, “Optimal lower bound for differen-
[126] P. Paillier et al., “Public-key cryptosystems based on composite degree
tially private multi-party aggregation,” in Proc. Eur. Symp. Algorithms,
residuosity classes,” in Proc. Eurocrypt, vol. 99, 1999, pp. 223–238.
2012, pp. 277–288.
[127] T. ElGamal, “A public key cryptosystem and a signature scheme based [153] T. Chan, K.-M. Chung, B. M. Maggs, and E. Shi, “Foundations of
on discrete logarithms,” IEEE Trans. Inf. Theory, vol. IT-31, no. 4, differentially oblivious algorithms,” in Proc. 30th Annu. ACM-SIAM
pp. 469–472, Jul. 1985. Symp. Discrete Algorithms, 2019, pp. 2448–2467.
[128] C. Gentry, “Fully homomorphic encryption using ideal lattices,” in [154] R. Shokri and V. Shmatikov, “Privacy-preserving deep learning,” in
Proc. 41st Annu. ACM Symp. Symp. theory Comput. (STOC), 2009, Proc. 53rd Annu. Allerton Conf. Commun., Control, Comput. (Allerton),
pp. 169–178. Sep. 2015, pp. 1310–1321.
[129] A. C. Yao, “Protocols for secure computations,” in Proc. 23rd Annu. [155] J. Hamm, Y. Cao, and M. Belkin, “Learning privately from multiparty
Symp. Found. Comput. Sci., Nov. 1982, pp. 160–164. data,” in Proc. ICML, 2016, pp. 555–563.
[130] D. Demmler, T. Schneider, and M. Zohner, “ABY—A framework for [156] L. Lyu and C.-H. Chen, “Differentially private knowledge distillation
efficient mixed-protocol secure two-party computation,” in Proc. Netw. for mobile analytics,” in Proc. 43rd Int. ACM SIGIR Conf. Res.
Distrib. Syst. Secur. Symp., 2015, pp. 1–15. Develop. Inf. Retr., Jul. 2020, pp. 1809–1812.
[131] C. Dwork, F. McSherry, K. Nissim, and A. Smith, “Calibrating noise [157] K. Chaudhuri, C. Monteleoni, and A. D. Sarwate, “Differentially
to sensitivity in private data analysis,” in Proc. Theory Cryptography private empirical risk minimization,” J. Mach. Learn. Res., vol. 12,
Conf., 2006, pp. 265–284. pp. 1069–1109, Mar. 2011.
[132] C. Dwork and A. Roth, “The algorithmic foundations of differen- [158] B. Han, I. W. Tsang, and L. Chen, “On the convergence of a family
tial privacy,” Found. Trends Theor. Comput. Sci., vol. 9, nos. 3–4, of robust losses for stochastic gradient descent,” in Machine Learning
pp. 211–407, Aug. 2014. and Knowledge Discovery in Databases. 2016, pp. 665–680.
[159] S. Shen, S. Tople, and P. Saxena, “A uror: Defending against poisoning
[133] I. Damgård, V. Pastro, N. Smart, and S. Zakarias, “Multiparty computa-
attacks in collaborative deep learning systems,” in Proc. 32nd Annu.
tion from somewhat homomorphic encryption,” in Proc. Annu. Cryptol.
Conf. Comput. Secur. Appl., Dec. 2016, pp. 508–519.
Conf., 2012, pp. 643–662. [160] R. Guerraoui et al., “The hidden vulnerability of distributed learning
[134] R. L. Rivest, A. Shamir, and L. Adleman, “A method for obtaining dig- in byzantium,” in Proc. ICML, 2018, pp. 3521–3530.
ital signatures and public-key cryptosystems,” Commun. ACM, vol. 21, [161] C. Wu, X. Yang, S. Zhu, and P. Mitra, “Mitigating backdoor attacks
no. 2, pp. 120–126, Feb. 1978. in federated learning,” 2020, arXiv:2011.01767.
[135] S. Goryczka and L. Xiong, “A comprehensive comparison of multiparty [162] C. Xie, M. Chen, P.-Y. Chen, and B. Li, “CRFL: Certifiably robust
secure additions with differential privacy,” IEEE Trans. Dependable federated learning against backdoor attacks,” in Proc. ICML, 2021,
Sec. Comput., vol. 14, no. 5, pp. 463–477, Oct. 2015. pp. 11372–11382.
[136] M. S. Riazi, C. Weinert, O. Tkachenko, E. M. Songhori, T. Schneider, [163] L. Chen, H. Wang, Z. Charles, and D. Papailiopoulos, “Draco:
and F. Koushanfar, “Chameleon: A hybrid secure computation frame- Byzantine-resilient distributed training via redundant gradients,” in
work for machine learning applications,” in Proc. Asia Conf. Comput. Proc. ICML, 2018, pp. 903–912.
Commun. Secur., May 2018, pp. 707–721. [164] C. Xie, O. Koyejo, and I. Gupta, “Generalized Byzantine-tolerant
[137] V. Rastogi and S. Nath, “Differentially private aggregation of distrib- SGD,” 2018, arXiv:1802.10116.
uted time-series with transformation and encryption,” in Proc. ACM [165] D. Alistarh, Z. Allen-Zhu, and J. Li, “Byzantine stochastic gradient
SIGMOD Int. Conf. Manage. data, Jun. 2010, pp. 735–746. descent,” in Proc. NIPS, 2018, pp. 4613–4623.
[138] E. Shi, H. Chan, E. Rieffel, R. Chow, and D. Song, “Privacy-preserving [166] X. Xu and L. Lyu, “A reputation mechanism is all you need: Collabo-
rative fairness and adversarial robustness in federated learning,” 2020,
aggregation of time-series data,” in Proc. Annu. Netw. Distrib. Syst.
Secur. Symp. (NDSS), 2011, pp. 1–17. arXiv:2011.10464.
[167] J. Steinhardt, M. Charikar, and G. Valiant, “Resilience: A criterion for
[139] G. Ács and C. Castelluccia, “I have a dream! (differentially private learning in the presence of arbitrary outliers,” 2017, arXiv:1703.04940.
smart metering),” in Proc. Int. Workshop Inf. Hiding. Berlin, Germany: [168] Y. Li, Y. Jiang, Z. Li, and S.-T. Xia, “Backdoor learning: A survey,”
Springer, 2011, pp. 118–132. IEEE Trans. Neural Netw. Learn. Syst., early access, Jun. 22, 2022,
[140] L. Lyu, K. Nandakumar, B. Rubinstein, J. Jin, J. Bedo, and doi: 10.1109/TNNLS.2022.3182979.
M. Palaniswami, “PPFA: Privacy preserving fog-enabled aggregation in [169] B. Wang et al., “Neural cleanse: Identifying and mitigating backdoor
smart grid,” IEEE Trans. Ind. Informat., vol. 14, no. 8, pp. 3733–3744, attacks in neural networks,” in Proc. IEEE Symp. Secur. Privacy (SP),
Aug. 2018. May 2019, pp. 707–723.
[141] V. Chen, V. Pastro, and M. Raykova, “Secure computation for machine [170] H. Chen, C. Fu, J. Zhao, and F. Koushanfar, “Deepinspect: A black-box
learning with SPDZ,” 2019, arXiv:1901.00329. trojan detection and mitigation framework for deep neural networks,”
[142] P. Mohassel and P. Rindal, “ABY 3: A mixed protocol framework for in IJCAI, vol. 2019, pp. 4658–4664.
machine learning,” in Proc. ACM SIGSAC Conf. Comput. Commun. [171] K. Liu, B. Dolan-Gavitt, and S. Garg, “Fine-pruning: Defending against
Secur., Oct. 2018, pp. 35–52. backdooring attacks on deep neural networks,” in Proc. Int. Symp. Res.
[143] J. Li, M. Khodak, S. Caldas, and A. Talwalkar, “Differentially private Attacks, Intrusions, Defenses, 2018, pp. 273–294.
meta-learning,” 2019, arXiv:1909.05830. [172] T. J. L. Tan and R. Shokri, “Bypassing backdoor detection algorithms
[144] N. Papernot, S. Song, I. Mironov, A. Raghunathan, K. Talwar, and in deep learning,” in Proc. IEEE Eur. Symp. Secur. Privacy (EuroS&P),
U. Erlingsson, “Scalable private learning with pate,” in Proc. ICLR, Sep. 2020, pp. 175–183.
2018, pp. 1–34. [173] P. Zhao, P.-Y. Chen, P. Das, K. N. Ramamurthy, and X. Lin, “Bridging
mode connectivity in loss landscapes and adversarial robustness,” 2020,
[145] L. Lyu, H. Yu, and Q. Yang, “Threats to federated learning: A survey,”
arXiv:2005.00060.
2020, arXiv:2003.02133. [174] Y. Yao, H. Li, H. Zheng, and B. Y. Zhao, “Latent backdoor attacks on
[146] J. C. Duchi, M. I. Jordan, and M. J. Wainwright, “Local privacy and deep neural networks,” in Proc. ACM SIGSAC Conf. Comput. Commun.
statistical minimax rates,” in Proc. 54th IEEE Annu. Symp. Found. Secur., Nov. 2019, pp. 2041–2055.
Comput. Sci., Oct. 2013, pp. 429–438. [175] S. Andreina, G. A. Marson, H. Mollering, and G. Karame, “BaFFLe:
[147] C. Dwork, K. Kenthapadi, F. McSherry, I. Mironov, and M. Naor, Backdoor detection via feedback-based federated learning,” in Proc.
“Our data, ourselves: Privacy via distributed noise generation,” in Proc. IEEE 41st Int. Conf. Distrib. Comput. Syst. (ICDCS), Jul. 2021,
Annu. Int. Conf. Theory Appl. Cryptograph. Techn., 2006, pp. 486–503. pp. 852–863.
[148] L. Lyu, X. He, and Y. Li, “Differentially private representation for NLP: [176] H. Chang, V. Shejwalkar, R. Shokri, and A. Houmansadr, “Cronus:
Formal guarantee and an empirical study on privacy and fairness,” in Robust and heterogeneous collaborative learning with black-box knowl-
Proc. Findings Assoc. Comput. Linguistics (EMNLP), 2020, pp. 1–11. edge transfer,” 2019, arXiv:1912.11279.

[177] C. Chen, L. Lyu, H. Yu, and G. Chen, “Practical attribute reconstruction [203] D. K. Dennis, T. Li, and V. Smith, “Heterogeneity for the win: One-
attack against federated learning,” IEEE Trans. Big Data, early access, shot federated clustering,” in Proc. Int. Conf. Mach. Learn., 2021,
Mar. 15, 2022, doi: 10.1109/TBDATA.2022.3159236. pp. 2611–2620.
[178] J. Jin, J. Ren, Y. Zhou, L. Lyu, J. Liu, and D. Dou, “Accelerated [204] J. Zhang, C. Chen, B. Li, L. Lyu, S. Wu, J. Xu, S. Ding, and C. Wu,
federated learning with decoupled adaptive optimization,” in Proc. “A practical data-free approach to one-shot federated learning with
ICML, 2022, pp. 10298–10322. heterogeneity,” 2021, arXiv:2112.12371.
[179] Y. Wang, X. Ma, J. Bailey, J. Yi, B. Zhou, and Q. Gu, “On the [205] M. Barreno, B. Nelson, A. D. Joseph, and J. D. Tygar, “The security
convergence and robustness of adversarial training,” in Proc. ICML, of machine learning,” Mach. Learn., vol. 81, no. 2, pp. 121–148, 2010.
vol. 1, 2019, p. 2. [206] J. Steinhardt, P. W. W. Koh, and P. S. Liang, “Certified defenses for
[180] Y. Wang, D. Zou, J. Yi, J. Bailey, X. Ma, and Q. Gu, “Improving data poisoning attacks,” in Proc. NIPS, 2017, pp. 3517–3529.
adversarial robustness requires revisiting misclassified examples,” in
Proc. ICLR, 2019, pp. 1–14.
[181] D. Tsipras, S. Santurkar, L. Engstrom, A. Turner, and A. Madry,
“Robustness may be at odds with accuracy,” in Proc. ICLR, 2019,
pp. 1–14.
[182] L. Lyu, J. C. Bezdek, X. He, and J. Jin, “Fog-embedded deep learning
for the Internet of Things,” IEEE Trans. Ind. Informat., vol. 15, no. 7,
pp. 4206–4215, Jul. 2019.
[183] L. Lyu et al., “Towards fair and privacy-preserving federated deep mod- Lingjuan Lyu (Member, IEEE) received the
els,” IEEE Trans. Parallel Distrib. Syst., vol. 31, no. 11, pp. 2524–2541, Ph.D. degree from The University of Melbourne,
Mar. 2020. Melbourne, VIC, Australia.
[184] L. Lyu, Y. Li, K. Nandakumar, J. Yu, and X. Ma, “How to democratise She is currently a Senior Research Scientist and
and protect AI: Fair and differentially private decentralised deep the Team Leader with Sony AI, Tokyo, Japan. She
learning,” IEEE Trans. Dependable Secure Comput., vol. 19, no. 2, had published over 50 papers in top conferences and
pp. 1003–1017, Apr. 2020. journals, including NeurIPS, ICLR, ICML, Nature,
[185] L. Lyu, J. C. Bezdek, J. Jin, and Y. Yang, “FORESEEN: Towards differ- and so on. Her current research interest is trustwor-
entially private deep inference for intelligent Internet of Things,” IEEE thy artificial intelligence (AI).
J. Sel. Areas Commun., vol. 38, no. 10, pp. 2418–2429, Oct. 2020. Dr. Lyu was a winner of the IBM Ph.D. Fellowship
[186] X. Pan, M. Zhang, S. Ji, and M. Yang, “Privacy risks of general-purpose Worldwide. Her works had won several best paper
language models,” in Proc. IEEE Symp. Secur. Privacy (SP), May 2020, awards and oral presentations from top conferences.
pp. 1314–1331.
[187] T. Dong, B. Zhao, and L. Lyu, “Privacy for free: How does dataset
condensation help privacy?” in Proc. ICML, 2022, pp. 1–19.
[188] C. Chen, Y. Liu, X. Ma, and L. Lyu, “CalFAT: Calibrated feder-
ated adversarial training with label skewness,” in Proc. NIPS, 2022,
pp. 1–16.
[189] X. He, Q. Xu, L. Lyu, F. Wu, and C. Wang, “Protecting intellectual
property of language generation APIs with lexical watermark,” in Proc.
AAAI, 2022, pp. 1–9. Han Yu (Senior Member, IEEE) received the Ph.D.
[190] N. Truong, K. Sun, S. Wang, F. Guitton, and Y. Guo, “Privacy degree from the School of Computer Science and
preservation in federated learning: An insightful survey from the GDPR Engineering, NTU, Singapore.
perspective,” Comput. Secur., vol. 110, Nov. 2021, Art. no. 102402. He is a Nanyang Assistant Professor (NAP) in
[191] K. Cheng et al., “SecureBoost: A lossless federated learning frame- the School of Computer Science and Engineering
work,” IEEE Intell. Syst., vol. 36, no. 6, pp. 87–98, Dec. 2021. (SCSE), Nanyang Technological University (NTU), .
[192] Z. Tian, R. Zhang, X. Hou, J. Liu, and K. Ren, “FederBoost: Private He held the prestigious Lee Kuan Yew Post-Doctoral
federated learning for GBDT,” 2020, arXiv:2011.02796. Fellowship (LKY PDF), from 2015 to 2018. He
has published over 200 research papers and book
[193] X. Jin, P.-Y. Chen, C.-Y. Hsu, C.-M. Yu, and T. Chen, “Catastrophic
chapters in leading international conferences and
data leakage in vertical federated learning,” in Proc. NIPS, vol. 2021,
journals. He is a coauthor of the book Federated
pp. 994–1006.
Learning - the first monograph on the topic of federated learning. His research
[194] X. Xu, L. Lyu, X. Ma, C. Miao, C. S. Foo, and B. K. H. Low, “Gradient focuses on trustworthy federated learning.
driven rewards to guarantee fairness in collaborative machine learning,” Dr.Yu is a Senior Member of CCF. His research works have won multiple
in Proc. NIPS, 2021, pp. 16104–16117. awards from conferences and journals.
[195] L. Lyu, X. Xu, Q. Wang, and H. Yu, “Collaborative fairness in federated
learning,” in Federated Learning. Springer, 2020, pp. 189–204.
[196] Q. Yang, L. Fan, and H. Yu, Federated Learning: Privacy Incentive.
Springer, 2020.
[197] J. Kang, Z. Xiong, D. Niyato, H. Yu, Y.-C. Liang, and D. I. Kim,
“Incentive design for efficient federated learning in mobile networks:
A contract theory approach,” in Proc. IEEE VTS Asia Pacific Wireless
Commun. Symp. (APWCS), Aug. 2019, pp. 1–5.
[198] J. Kang, Z. Xiong, D. Niyato, Y. Zou, Y. Zhang, and M. Guizani, Xingjun Ma received the bachelor’s degree from
“Reliable federated learning for mobile networks,” IEEE Wireless
Jilin University, Changchun, China, the master’s
Commun., vol. 27, no. 2, pp. 72–80, Feb. 2020.
degree from Tsinghua University, Beijing, China,
[199] S. Warnat-Herresthal et al., “Swarm learning for decentralized and and the Ph.D. degree from The University of
confidential clinical machine learning,” Nature, vol. 594, no. 7862, Melbourne, Melbourne, VIC, Australia.
pp. 265–270, 2021. He was a Lecturer with the School of Informa-
[200] Y. Liu, X. Yuan, Z. Xiong, J. Kang, X. Wang, and D. Niyato, tion Technology, Deakin University, Geelong, VIC,
“Federated learning for 6G communications: Challenges, methods, Australia. He was a Post-Doctoral Research Fellow
and future directions,” China Commun., vol. 17, no. 9, pp. 105–118, with the School of Computing and Information Sys-
Sep. 2020. tems, The University of Melbourne. He is currently
[201] N. Guha, A. Talwalkar, and V. Smith, “One-shot federated learning,” an Associate Professor of computer science with
2019, arXiv:1902.11175. Fudan University, Shanghai, China. He works in the areas of adversarial
[202] Q. Li, B. He, and D. Song, “Practical one-shot federated learning for machine learning, deep learning, artificial intelligence (AI) security, and data
cross-silo setting,” 2020, arXiv:2010.01017. privacy.

Authorized licensed use limited to: Vietnam National University. Downloaded on December 07,2024 at 07:18:32 UTC from IEEE Xplore. Restrictions apply.
8746 IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, VOL. 35, NO. 7, JULY 2024

Chen Chen received the B.S. degree in computer Qiang Yang (Fellow, IEEE) received the B.Sc.
science from the Chu Kochen Honors College, degree in astrophysics from Peking University,
Zhejiang University, Hangzhou, China, in 2017, Beijing, China, in 1982, and the M.Sc. degree in
where he is currently pursuing the Ph.D. degree with astrophysics and the Ph.D. degree in computer sci-
the College of Computer Science. ence from the University of Maryland, College Park,
He is currently an Intern with Sony AI, Tokyo, MD, USA, in 1985 and 1989, respectively.
Japan. His research interests include federated learn- He was a Faculty Member with the Uni-
ing, adversarial training, multilabel learning, and versity of Waterloo, Waterloo, ON, Canada,
recommendation systems. from 1989 to 1995, and Simon Fraser University,
Burnaby, BC, Canada, from 1995 to 2001. He was
the Founding Director of Huawei’s Noah’s Ark Lab,
Hong Kong, from 2012 to 2014 and a Co-Founder of 4Paradigm Corporation,
Beijing, an artificial intelligence (AI) platform company. He is currently the
Head of the AI Department and the Chief AI Officer of WeBank, Shenzhen,
China, and a Chair Professor with the Computer Science and Engineering
(CSE) Department, The Hong Kong University of Science and Technology
(HKUST), Hong Kong, where he was a former Head of the CSE Department
and the Founding Director of the Big Data Institute from 2015 to 2018.
He is the author of several books, including Intelligent Planning (Springer),
Lichao Sun received the Ph.D. degree in computer Crafting Your Research Future (Morgan & Claypool), and Constraint-Based
science from the University of Illinois Chicago, Design Recovery for Software Engineering (Springer). His research interests
Chicago, IL, USA, in 2020, under the supervision include AI, machine learning, and data mining, especially in transfer learning,
of Prof. Philip S. Yu. automated planning, federated learning, and case-based reasoning.
He is currently an Assistant Professor with the Dr. Yang has served as an Executive Council Member of the Advancement of
Department of Computer Science and Engineering, AI (AAAI) from 2016 to 2020. He is a fellow of several international societies,
Lehigh University, Bethlehem, PA, USA. He has including ACM, AAAI, IEEE, IAPR, and AAAS. He was a recipient of several
published more than 45 research articles in top awards, including the 2004/2005 ACM KDDCUP Championship, the ACM
conferences and journals, such as CCS, USENIX- SIGKDD Distinguished Service Award in 2017, and the AAAI Innovative
Security, NeurIPS, KDD, ICLR, the Advancement AI Applications Awards in 2018 and 2020. He was the Founding Editor-
of AI (AAAI), the International Joint Conference on in-Chief of the ACM Transactions on Intelligent Systems and Technology
AI (IJCAI), ACL, NAACL, TII, TNNLS, and TMC. His research interests (ACM TIST) and IEEE T RANSACTIONS ON B IG D ATA (IEEE TBD). He has
include security and privacy in deep learning and data mining. He mainly served as the President of the International Joint Conference on AI (IJCAI)
focuses on artificial intelligence (AI) security and privacy, social networks, from 2017 to 2019.
and NLP applications.
Philip S. Yu (Life Fellow, IEEE) received the
B.S. degree in electrical engineering (E.E.) from
the National Taiwan University, New Taipei, Taiwan,
in 1992, the M.S. and Ph.D. degrees in E.E. from
Stanford University, Stanford, CA, USA, in 1976
and 1978, respectively, and the M.B.A. degree from
New York University, New York, NY, USA, in 1982.
He is currently a Distinguished Professor of com-
Jun Zhao (Member, IEEE) received the bachelor’s puter science with the University of Illinois Chicago
degree from Shanghai Jiao Tong University, (UIC), Chicago, IL, USA, and also holds the Wexler
Shanghai, China, in July 2010, and the Ph.D. deg- Chair in Information Technology. Before joining
ree in electrical and computer engineering from UIC, he was with IBM, USA, where he was the Manager of the Software
Carnegie Mellon University (CMU), Pittsburgh, Tools and Techniques Department, Watson Research Center. He has published
PA, USA, in May 2015, affiliating with CMU’s more than 1200 papers in refereed journals and conferences. He holds or has
renowned CyLab Security & Privacy Institute. applied for more than 300 U.S. patents. His research interest is on big data,
He is currently an Assistant Professor with the including data mining, data stream, database, and privacy.
School of Computer Science and Engineering, Dr. Yu is a fellow of the ACM. He was a recipient of the ACM
Nanyang Technological University (NTU), SIGKDD 2016 Innovation Award for his influential research and scientific
Singapore. Before joining NTU first as a contributions to mining, fusion, and anonymization of big data, the IEEE
Post-Doctoral Researcher with Xiaokui Xiao and then as a Faculty Member, Computer Society’s 2013 Technical Achievement Award for pioneering and
he was a Post-Doctoral Researcher and Arizona Computing PostDoc Best fundamentally innovative contributions to the scalable indexing, querying,
Practices Fellow with Arizona State University, Tempe, AZ, USA. His searching, mining, and anonymization of big data, and the Research Contribu-
research interests include communication networks, security/privacy, and tions Award from IEEE International Conference on Data Mining (ICDM) in
artificial intelligence (AI). 2003 for his pioneering contributions to the field of data mining. He received
Dr. Zhao’s coauthored papers received the Best Paper Award (IEEE the ICDM 2013 10-Year Highest-Impact Paper Award and the EDBT Test of
T RANSACTIONS) by the IEEE Vehicular Society (VTS) Singapore Chapter Time Award in 2014. He was the Editor-in-Chief of ACM Transactions on
in 2019 and the Best Paper Award in the EAI International Conference on Knowledge Discovery from Data from 2011 to 2017 and IEEE T RANSAC -
6G for Future Wireless Networks (EAI 6GN) 2020. TIONS ON K NOWLEDGE AND D ATA E NGINEERING from 2001 to 2004.

Authorized licensed use limited to: Vietnam National University. Downloaded on December 07,2024 at 07:18:32 UTC from IEEE Xplore. Restrictions apply.

Federated Learning Book
No ratings yet
Federated Learning Book
209 pages
A Survey On Federated Learning: Challenges and Applications
No ratings yet
A Survey On Federated Learning: Challenges and Applications
23 pages
Elective Mathematics Super Mock 2025
No ratings yet
Elective Mathematics Super Mock 2025
4 pages
Learning Critically Selective Self-Distillation in Federated Learning On Non-IID Data
No ratings yet
Learning Critically Selective Self-Distillation in Federated Learning On Non-IID Data
12 pages
A Survey of Security Threats in Federated Learning
100% (1)
A Survey of Security Threats in Federated Learning
26 pages
Advances and Open Problems in Federated Learning
No ratings yet
Advances and Open Problems in Federated Learning
105 pages
Federated Unlearning A Survey On Methods Design Guidelines and Evaluation Metrics
No ratings yet
Federated Unlearning A Survey On Methods Design Guidelines and Evaluation Metrics
21 pages
CLFLDP Communication Efficient Layer Clipping Federat 2024 Journal of Syste
No ratings yet
CLFLDP Communication Efficient Layer Clipping Federat 2024 Journal of Syste
17 pages
BPS-FL Blockchain-Based Privacy-Preserving and Sec
No ratings yet
BPS-FL Blockchain-Based Privacy-Preserving and Sec
25 pages
Scope On Detecting Constrained Backdoor Attacks in Federated Learning
No ratings yet
Scope On Detecting Constrained Backdoor Attacks in Federated Learning
14 pages
Enhancing IoT Security: A Novel Approach With Federated Learning and Differential Privacy Integration
No ratings yet
Enhancing IoT Security: A Novel Approach With Federated Learning and Differential Privacy Integration
17 pages
Kairouz, McMahan Et Al 2019 - Advances and Open Problems in Federated Learning
No ratings yet
Kairouz, McMahan Et Al 2019 - Advances and Open Problems in Federated Learning
121 pages
1differentially Private Federated Learning With An Adaptive Noise Mechanism
No ratings yet
1differentially Private Federated Learning With An Adaptive Noise Mechanism
14 pages
3215 Byzantine Robust Dynamic
No ratings yet
3215 Byzantine Robust Dynamic
24 pages
A Survey On Security and Privacy of Federated Learning
No ratings yet
A Survey On Security and Privacy of Federated Learning
61 pages
BIT-FL Blockchain-Enabled Incentivized and Secure Federated Learning Framework PDF
No ratings yet
BIT-FL Blockchain-Enabled Incentivized and Secure Federated Learning Framework PDF
18 pages
2021 Digestive Neural Network A Novel Defense Strategy Against Inference Attacks in Federated Learning
No ratings yet
2021 Digestive Neural Network A Novel Defense Strategy Against Inference Attacks in Federated Learning
20 pages
BTP Research Internship Final Report
No ratings yet
BTP Research Internship Final Report
21 pages
Blockchain-Based Privacy-Preserving Multi-Tasks Federated Learning Framework
No ratings yet
Blockchain-Based Privacy-Preserving Multi-Tasks Federated Learning Framework
24 pages
PPFL Privacy Preserving FL With TEE
No ratings yet
PPFL Privacy Preserving FL With TEE
15 pages
Privacy-Preserving Traffic Flow Prediction A Federated Learning Approach
No ratings yet
Privacy-Preserving Traffic Flow Prediction A Federated Learning Approach
13 pages
Adapting Security and Decentralized Knowledge Enhancement in Federated Learning Using Blockchain Technology: Literature Review
No ratings yet
Adapting Security and Decentralized Knowledge Enhancement in Federated Learning Using Blockchain Technology: Literature Review
24 pages
Journal of Automation and Intelligence: Jia Liu Yaochu Jin
No ratings yet
Journal of Automation and Intelligence: Jia Liu Yaochu Jin
21 pages
Privacy-Preserving Federated Learning Based On Differential Privacy and Momentum
No ratings yet
Privacy-Preserving Federated Learning Based On Differential Privacy and Momentum
6 pages
Does Differential Privacy Really Protect Federated Learning From Gradient Leakage Attacks
No ratings yet
Does Differential Privacy Really Protect Federated Learning From Gradient Leakage Attacks
15 pages
Anomaly Detection and Defense Techniques in Federated Learning A Comprehensive Review
No ratings yet
Anomaly Detection and Defense Techniques in Federated Learning A Comprehensive Review
34 pages
22 TBD
No ratings yet
22 TBD
14 pages
MDPI Article Template
No ratings yet
MDPI Article Template
19 pages
A Survey On Privacy-Preserving Federated Learning Against Poisoning Attacks
No ratings yet
A Survey On Privacy-Preserving Federated Learning Against Poisoning Attacks
18 pages
Big Data Project Privacy and FL DP
No ratings yet
Big Data Project Privacy and FL DP
3 pages
File - BB 6 15
No ratings yet
File - BB 6 15
10 pages
Introduction To Cooling Water Treatment
No ratings yet
Introduction To Cooling Water Treatment
40 pages
A Comprehensive Survey of Privacy-Preserving Federated Learning A Taxonomy, Review, and Future Directions
No ratings yet
A Comprehensive Survey of Privacy-Preserving Federated Learning A Taxonomy, Review, and Future Directions
36 pages
Applied Sciences Privacy-Preserving Federated Learning Using Homomorphic Encryption
No ratings yet
Applied Sciences Privacy-Preserving Federated Learning Using Homomorphic Encryption
17 pages
Jsan 12 00013
No ratings yet
Jsan 12 00013
18 pages
Threats To Federated Learning A Survey FLPI
No ratings yet
Threats To Federated Learning A Survey FLPI
15 pages
Threats, Attacks and Defenses To Federated Learning: Issues, Taxonomy and Perspectives
No ratings yet
Threats, Attacks and Defenses To Federated Learning: Issues, Taxonomy and Perspectives
19 pages
Research Paper Mine
No ratings yet
Research Paper Mine
9 pages
3-PPFL Enhancing Privacy in FL With Confidential Computing
No ratings yet
3-PPFL Enhancing Privacy in FL With Confidential Computing
4 pages
Shreya Ghosh MS Thesis Final Revised
No ratings yet
Shreya Ghosh MS Thesis Final Revised
64 pages
Challenges and Approaches For Mitigating Byzantine Attacks in Federated Learning
No ratings yet
Challenges and Approaches For Mitigating Byzantine Attacks in Federated Learning
8 pages
Federated Learning Article
No ratings yet
Federated Learning Article
68 pages
1907 07157v91
No ratings yet
1907 07157v91
7 pages
SecureBoost A Lossless Federated Learning Framework
No ratings yet
SecureBoost A Lossless Federated Learning Framework
9 pages
1 s2.0 S0925231223010202 Main
No ratings yet
1 s2.0 S0925231223010202 Main
18 pages
Federated Learning A Survery
No ratings yet
Federated Learning A Survery
31 pages
UFO Glasnost Marina Popowitsch LQ
No ratings yet
UFO Glasnost Marina Popowitsch LQ
288 pages
A Survey On Federated Learning For Resource-Constrained IoT Devices
No ratings yet
A Survey On Federated Learning For Resource-Constrained IoT Devices
24 pages
Research Paper 7
No ratings yet
Research Paper 7
12 pages
Intrusion Detection Based On Privacy-Preserving Federated Learning For The Industrial IoT
No ratings yet
Intrusion Detection Based On Privacy-Preserving Federated Learning For The Industrial IoT
10 pages
Federated Learning Attacks and Defenses: A Survey
No ratings yet
Federated Learning Attacks and Defenses: A Survey
10 pages
Decentralized Federated Learning Based On Blockchain Co 2024 Computer Commu
No ratings yet
Decentralized Federated Learning Based On Blockchain Co 2024 Computer Commu
11 pages
A Survey On Federated Learning Systems: Vision, Hype and Reality For Data Privacy and Protection
No ratings yet
A Survey On Federated Learning Systems: Vision, Hype and Reality For Data Privacy and Protection
41 pages
A Detailed Survey On Federated Learning Attacks and Defenses
No ratings yet
A Detailed Survey On Federated Learning Attacks and Defenses
18 pages
BL 5
No ratings yet
BL 5
6 pages
Visual Basic 6.0 Documentation
No ratings yet
Visual Basic 6.0 Documentation
33 pages
Federated Learning With Differential Privacy Algorithms and Performance Analysis
No ratings yet
Federated Learning With Differential Privacy Algorithms and Performance Analysis
16 pages
NIT Research Proposal1
No ratings yet
NIT Research Proposal1
2 pages
Bharati Et Al 2022 Federated Learning Applications Challenges and Future Directions
No ratings yet
Bharati Et Al 2022 Federated Learning Applications Challenges and Future Directions
17 pages
Paper 5
No ratings yet
Paper 5
7 pages
1 s2.0 S0167739X21004726 Main
No ratings yet
1 s2.0 S0167739X21004726 Main
9 pages
Federated Foundation Models: Privacy-Preserving and Collaborative Learning For Large Models
No ratings yet
Federated Foundation Models: Privacy-Preserving and Collaborative Learning For Large Models
10 pages
Vendor P Q App
No ratings yet
Vendor P Q App
441 pages
FLAIRS FPGA-Accelerated Inference-Resistant Amp Secure Federated Learning
No ratings yet
FLAIRS FPGA-Accelerated Inference-Resistant Amp Secure Federated Learning
6 pages
Emailing Metrology Lab Manual - Consolidated Mar2021
No ratings yet
Emailing Metrology Lab Manual - Consolidated Mar2021
113 pages
BULLETIN FOR THE HISTORYvol30-2
No ratings yet
BULLETIN FOR THE HISTORYvol30-2
100 pages
Jan 25 Chem Pastec Paper CXC
No ratings yet
Jan 25 Chem Pastec Paper CXC
20 pages
290 Module III
No ratings yet
290 Module III
31 pages
Class 10 2019 Science Set 2
No ratings yet
Class 10 2019 Science Set 2
11 pages
JOTRON TRON UAIS TR-2500 - Operation - Installation Manual
No ratings yet
JOTRON TRON UAIS TR-2500 - Operation - Installation Manual
77 pages
ONGC Uran
No ratings yet
ONGC Uran
10 pages
Chapter 2 - FIR Filters - Digital Filter Design
No ratings yet
Chapter 2 - FIR Filters - Digital Filter Design
100 pages
Structural Optimization in Civil Engineering A Lit
No ratings yet
Structural Optimization in Civil Engineering A Lit
28 pages
NDA Books 2022: Best Books To Crack NDA 1 & 2 Exams: Anangsha Patra
No ratings yet
NDA Books 2022: Best Books To Crack NDA 1 & 2 Exams: Anangsha Patra
7 pages
Everhard™: Abrasion-Resistant Steel Plate
No ratings yet
Everhard™: Abrasion-Resistant Steel Plate
12 pages
2.003J/1.053J Dynamics and Control I Fall 2007 Problem Set 4
No ratings yet
2.003J/1.053J Dynamics and Control I Fall 2007 Problem Set 4
4 pages
PAPERGOC2
No ratings yet
PAPERGOC2
7 pages
Suggestion For ICT-2024
No ratings yet
Suggestion For ICT-2024
5 pages
Practice Problems For Mid Term Test
No ratings yet
Practice Problems For Mid Term Test
11 pages
AC Voltmeter: PMMC Based: Known: FSD of I Solution
No ratings yet
AC Voltmeter: PMMC Based: Known: FSD of I Solution
21 pages
SAQ Ans 24
No ratings yet
SAQ Ans 24
2 pages
Books and Stationery Grade 9
No ratings yet
Books and Stationery Grade 9
2 pages
Bla Bla
No ratings yet
Bla Bla
6 pages
Math Project Correction
No ratings yet
Math Project Correction
8 pages
Iot Based Garbage Management System For Smart City Using Raspberry Pi
No ratings yet
Iot Based Garbage Management System For Smart City Using Raspberry Pi
10 pages
ENGR 2530 Syllabus-Spring 2015 - KLM Abbreviated
No ratings yet
ENGR 2530 Syllabus-Spring 2015 - KLM Abbreviated
2 pages
Fanuc Pmc-Model Sa1/Sb7 Supplemental Programming Manual (LADDER Language)
No ratings yet
Fanuc Pmc-Model Sa1/Sb7 Supplemental Programming Manual (LADDER Language)
12 pages
Windows 7 Hyper Terminal
No ratings yet
Windows 7 Hyper Terminal
4 pages
Botnet Attack Detection in the Internet of Things Using Selected Learning Algorithms: A Research Study on Securing IoT Against Cyber Threats Using Machine Learning
From Everand
Botnet Attack Detection in the Internet of Things Using Selected Learning Algorithms: A Research Study on Securing IoT Against Cyber Threats Using Machine Learning
Bolakale Aremu
5/5 (1)

Privacy and Robustness in Federated Learning: Attacks and Defenses

Uploaded by

Privacy and Robustness in Federated Learning: Attacks and Defenses

Uploaded by

8726 IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, VOL. 35, NO.

Privacy and Robustness in Federated

recommendation, posing immediate or potential privacy risks.

replaying, or removing messages. This setting allows the

C. Training Phase Versus Inference Phase

A. Inferring Class Representatives

observe or eavesdrop on the gradients and perform inference

robustness. Depending on the attacker’s objective, poison- wn = nu − wi . (2)

function. To make the aggregation fail, they proposed an TABLE II

You might also like