Privacy and Robustness in Federated Learning: Attacks and Defenses

IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, VOL. 35, NO. 7, JULY 2024

Abstract— As data are increasingly being stored in different silos and societies becoming more aware of data privacy issues, the traditional centralized training of artificial intelligence (AI) models is facing efficiency and privacy challenges. Recently, federated learning (FL) has emerged as an alternative solution and continues to thrive in this new reality. Existing FL protocol designs have been shown to be vulnerable to adversaries within or outside of the system, compromising data privacy and system robustness. Besides training powerful global models, it is of paramount importance to design FL systems that have privacy guarantees and are resistant to different types of adversaries. In this article, we conduct a comprehensive survey on privacy and robustness in FL over the past five years. Through a concise introduction to the concept of FL and a unique taxonomy covering: 1) threat models; 2) privacy attacks and defenses; and 3) poisoning attacks and defenses, we provide an accessible review of this important topic. We highlight the intuitions, key techniques, and fundamental assumptions adopted by various attacks and defenses. Finally, we discuss promising future research directions toward robust and privacy-preserving FL, and their interplays with the multidisciplinary goals of FL.

Index Terms— Attacks, defenses, federated learning (FL), privacy, robustness.

NOMENCLATURE
AI      Artificial intelligence.
ML      Machine learning.
FL      Federated learning.
GDPR    General Data Protection Regulation.
i.i.d.  Independent identically distributed.
IoT     Internet of Things.
HFL     Horizontally federated learning.
VFL     Vertically federated learning.
FTL     Federated transfer learning.
H2B     HFL to businesses.
H2C     HFL to consumers.
SGD     Stochastic gradient descent.
SMC     Secure multiparty computation.
DP      Differential privacy.
CDP     Centralized differential privacy.
LDP     Local differential privacy.
DDP     Distributed differential privacy.
HE      Homomorphic encryption.
RFA     Robust federated aggregation.
GAN     Generative adversarial network.
MIA     Membership inference attack.
AT      Adversarial training.
FAT     Federated adversarial training.
API     Application programming interface.

Manuscript received 13 January 2022; revised 11 August 2022; accepted 14 October 2022. Date of publication 10 November 2022; date of current version 9 July 2024. This work was supported in part by Sony AI; in part by the Joint NTU-WeBank Research Centre on Fintech under Award NWJ-2020-008; in part by Nanyang Technological University, Singapore; in part by the Joint SDU-NTU Centre for Artificial Intelligence Research (C-FAIR) under Grant NSC-2019-011; in part by the National Research Foundation, Singapore, under its AI Singapore Programme, under AISG Award AISG2-RP-2020-019; in part by the RIE 2020 Advanced Manufacturing and Engineering (AME) Programmatic Fund, Singapore, under Grant A20G8b0102; in part by Nanyang Technological University through the Nanyang Assistant Professorship (NAP); and in part by the Future Communications Research & Development Programme under Grant FCP-NTU-RG-2021-014. The work of Qiang Yang was supported in part by the Hong Kong RGC Theme-Based Research Scheme under Grant T41-603/20-R. The work of Philip S. Yu was supported in part by NSF under Grant III-1763325, Grant III-1909323, Grant III-2106758, and Grant SaTC-1930941. (Corresponding authors: Lingjuan Lyu; Han Yu; and Qiang Yang.)

Lingjuan Lyu is with Sony AI, Tokyo 108-0075, Japan (e-mail: [email protected]).
Han Yu and Jun Zhao are with the School of Computer Science and Engineering, Nanyang Technological University, Singapore 639798 (e-mail: [email protected]; [email protected]).
Xingjun Ma is with the School of Computer Science, Fudan University, Shanghai 200437, China (e-mail: [email protected]).
Chen Chen was with Sony AI, Tokyo 108-0075, Japan. He is now with the College of Computer Science, Zhejiang University, Hangzhou 310027, China (e-mail: [email protected]).
Lichao Sun is with the Department of Computer Science and Engineering, Lehigh University, Bethlehem, PA 18015 USA (e-mail: [email protected]).
Qiang Yang is with the Department of Artificial Intelligence (AI), WeBank, Shenzhen 518000, China, and also with the Department of Computer Science and Engineering, The Hong Kong University of Science and Technology, Hong Kong (e-mail: [email protected]).
Philip S. Yu is with the Department of Computer Science, University of Illinois Chicago, Chicago, IL 60607 USA (e-mail: [email protected]).
Digital Object Identifier 10.1109/TNNLS.2022.3216981

I. INTRODUCTION

AS COMPUTING devices become increasingly ubiquitous, people generate huge amounts of data through their day-to-day usage. Collecting such data into centralized storage facilities is costly and time-consuming. Traditional centralized ML approaches cannot support such ubiquitous deployments and applications due to infrastructure shortcomings, such as limited communication bandwidth, intermittent network connectivity, and strict delay constraints [1]. Another critical concern is data privacy and user confidentiality as the usage data usually contain sensitive information [2]. Sensitive data, such as facial images, location-based services, or health information, can be used for targeted social advertising and
TABLE I
SUMMARY OF ATTACKS AGAINST SERVER-BASED FL
on an unlabeled public set [25]. One obvious benefit of sharing logits is the reduced communication costs, without significantly affecting utility [25].

In summary, all the above sharing methods do not inherently provide defenses against privacy and poisoning attacks—two main sources of threats to FL.

C. Threats to FL

FL offers a privacy-aware paradigm of model training, which does not require data sharing and allows participants to join and leave a federation freely. Nevertheless, recent works have demonstrated that FL may not always provide sufficient privacy and robustness guarantees. Existing FL protocol designs are vulnerable to: 1) a malicious server that aims to infer sensitive information from individual updates over time, tamper with the training process, or control the view of the participants on the global parameters and 2) any adversarial participant who can infer other participants' sensitive information, tamper with the global parameter aggregation, or poison the global model.

In terms of privacy, communicating gradients throughout the training process can reveal sensitive information [27], [28] and even cause deep leakage [29], either to a third party or the central server [7], [30]. Even a small portion of gradients can reveal a fair amount of sensitive information about the local data [31]. Recent works further show that, by simply observing the gradients, a malicious attacker can successfully steal the training data [29], [32].

In terms of robustness, FL systems are vulnerable to both data poisoning [33], [34] and model poisoning attacks [35], [36], [37], [38]. Malicious participants can attack the convergence of the global model or implant backdoor triggers into the global model by deliberately altering their local data (data poisoning) or their gradient uploads (model poisoning). More broadly, poisoning attacks can be categorized into: 1) untargeted attacks, such as the Byzantine attack, where the adversary aims to destroy the convergence and performance of the global model [39], [40] and 2) targeted attacks, such as the backdoor attack, where the adversary aims to implant a backdoor trigger into the global model so as to trick the model into constantly predicting an adversarial class on a subtask while keeping good performance on the main task [34], [35], [36].

These privacy and robustness attacks pose significant threats to FL. In centralized learning, the server is responsible for all the participants' privacy and model robustness. However, in FL, any participant can attack the server and spy on other participants, even without involving the server. Therefore, it is important to understand the principles behind these privacy and robustness attacks. The properties of the representative privacy and robustness attacks in server-based FL are summarized in Table I.

Note that all the above threats mainly apply to homogeneous FL. Although heterogeneous FL is more privacy friendly than homogeneous FL, as sharing model predictions instead of model parameters or updates eliminates the risk of white-box inference attacks in homogeneous FL [25], [41], there is no theoretical guarantee that sharing predictions is private and secure. In fact, the predictions from local models also encode some private information [42], [43], [44], [45]. Similarly, local model predictions can also be arbitrarily manipulated by any malicious participant. However, how vulnerable heterogeneous FL is to the different privacy and poisoning attacks listed in Sections III and IV remains largely unknown and needs further investigation. Henceforth, we mainly focus on homogeneous FL throughout this survey.

D. Secure FL

Attacks on FL come either from the privacy perspective, when a malicious participant or the central server attempts to infer the private information of a victim participant, or from the robustness perspective, when a malicious participant aims to compromise the global model.

To secure FL against privacy attacks, existing privacy-preserving methodologies in centralized ML have been tried in FL, including HE, SMC, and DP. However, HE and SMC may not be applicable to large-scale FL, as they incur substantial communication and computation overhead. In aggregation-based tasks, DP requires the aggregated value to contain random noise up to a certain magnitude to ensure (ε, δ)-DP and, thus, is also not ideal for FL. The noise addition required by DP is also hard to execute in FL. In an ideal scenario where the server (aggregator) is trusted, the server can add the noise to the aggregated gradients [7]. However, in many real-world scenarios, the participants may not trust the central server or each other. In this case, the participants would compete with each other, and all want to ensure LDP by adding as much noise as possible to their local gradients [30], [44], [46]. This tends to accumulate significant errors on the server side. DDP [30], [44], [46] can mitigate this problem to some extent when at least a certain fraction of the participants are honest and do not conduct such malicious competition.

Defending FL against various robustness attacks (e.g., untargeted Byzantine attacks and targeted backdoor attacks) is an extremely challenging task. This is due to two main
reasons. First, the defense can only be executed on the server side, where only local gradients are available. This invalidates many backdoor defense methods developed for centralized ML, for example, denoising (preprocessing) methods [47], [48], [49], [50], [51], backdoor sample/trigger detection methods [52], [53], [54], [55], [56], [57], robust data augmentations [58], fine-tuning methods [58], the neural attention distillation (NAD)-based method [59], and the more recent anti-backdoor learning (ABL) method based on a sophisticated learning process [60]. Second, the defense method has to be robust to both data poisoning and model poisoning attacks. Most existing robustness defenses are gradient aggregation methods mainly developed for defending against untargeted Byzantine attackers, such as Krum/multi-Krum [40], AGGREGATHOR [61], Byzantine gradient descent (BGD) [62], median-based gradient descent [63], trimmed-mean-based gradient descent [63], and SIGNSGD [39]. These defense methods have never been tested on the targeted backdoor attacks [33], [34], [35], [36], [38]. Dedicated defense methods against both data poisoning and model poisoning attacks have been investigated, such as norm clipping [38], geometric-median-based RFA [64], and the robust learning rate [65]. For the collusion of Sybil attacks, contribution similarity [37] can be leveraged as a strategy for defense.

E. Motivation of This Survey and Our Contribution

Most existing surveys on FL focus on the system or protocol design [19], [66], [67]. A few surveys touched on either privacy or robustness, but did not systematically explore both, and their intersections with the other aspects of FL, such as fairness and efficiency [68], [69], [70]. A notable number of research works have been conducted on privacy and robustness. Although these works attempt to discover the vulnerabilities of FL and aim to enhance the privacy and system robustness of FL, there are very few efforts to categorize them in a systematic manner, and privacy and robustness threats to FL have not been systematically explored. To fill this gap, in this article, we have conducted an extensive survey on the recent advances in privacy and robustness threats to FL and their defenses. In particular, we focus on two specific threats initiated by insiders in FL systems: 1) privacy attacks that attempt to infer the victim participants' private information and 2) poisoning attacks that attempt to prevent the learning of a global model or implant triggers to control the behavior of the global model. This article mainly surveys the literature over the past five years on privacy and robustness in FL; it can be a notable inclusion to the existing literature, helping the community better understand the state-of-the-art privacy and robustness progress in FL. The limitations and the promising use cases of the existing works in the literature and open directions for future research are also offered to identify the research gaps to address the challenges of privacy and robustness in FL.

For empirical and use case analyses of privacy and robustness, interested readers can refer to [71], [72], and [73], which showcase where and how the attacks and defenses stand so far in FL. For example, Nasr et al. [71] provided a comprehensive privacy analysis of FL under both passive and active white-box inference attacks; Huang et al. [72] did a comprehensive evaluation of defenses against gradient inversion attacks in FL, including encrypting gradients, perturbing gradients, encoding inputs, and combined defenses; and Shejwalkar et al. [73] clearly showed that FL, even without any defenses, is highly robust in practice. For production cross-device FL (H2C), which contains thousands to billions of clients, poisoning attacks have no impact on existing robust FL algorithms even with impractically high percentages of compromised clients. For production cross-silo FL (H2B), which contains up to 100 clients, data poisoning attacks are completely ineffective; model poisoning attacks are unlikely to pose a major risk when the clients involved are bound by contract and their software stacks are professionally maintained (e.g., in banks and hospitals).

Overall, the major contributions of this survey include the following.
1) This survey presents a comprehensive categorization of FL, and summarizes the threats and the corresponding protections for FL in a systematic manner.
2) Existing privacy and robustness attacks and defenses are well explored to help readers better understand the assumptions, principles, reasons, and differences of the current progress in the privacy and robustness domain of FL.
3) The conflicts between privacy and robustness, and among multiple design goals, are identified; the gaps between the current works and the real scenarios in FL are summarized.
4) Future research directions are outlined to assist the community in rethinking and improving their current designs toward robust and privacy-preserving FL of real practicality and impact. Meanwhile, it is suggested to integrate multidisciplinary goals in the system design of FL.

F. Survey Organization

The rest of the survey is organized as follows. Before going into an in-depth discussion on privacy and robustness in FL, in Section II, we first summarize the threat models from a general perspective and discuss the customized threat models for privacy and robustness, respectively. Section III presents a comprehensive review of the privacy attacks in FL, particularly targeting the sensitive information (class representatives, membership, properties, training inputs, and labels) in HFL with homogeneous architectures. Section IV details the poisoning attacks that aim to compromise system robustness, including the untargeted and targeted poisoning attacks. Sections V and VI list the most representative privacy-preserving techniques and defense mechanisms for robustness, and current practices that have applied these techniques in FL. From the lessons learned in this survey, the research gaps toward realizing trustworthy FL, along with directions for future research, are provided in Section VII. Finally, concluding remarks are drawn in Section VIII.

For better readability, we give a diagram in Fig. 2 showing the different aspects covered in the survey. The list of
D. Inferring Training Inputs and Labels

One recent work called deep leakage from gradients (DLG) proposes an optimization algorithm to extract both the training inputs and the labels [29]. This attack is much stronger than previous approaches. It can accurately recover the raw images and texts used to train a deep learning model. In a follow-up work [32], an analytical approach called improved DLG (iDLG) was proposed to extract labels based on the shared gradients by exploring the correlation between the labels and the signs of the gradients. iDLG can be applied to attack any differentiable model trained with cross-entropy loss and one-hot labels, which is a typical setting for classification tasks.

In summary, inference attacks generally assume that the adversaries possess sophisticated technical capabilities and unlimited computational resources. Moreover, most attacks assume that the adversarial participants can be selected (to update the global model) in many rounds of the FL training process. In FL, these assumptions are generally not practical in H2C scenarios but more likely to hold in H2B scenarios. These inference attacks highlight the need for gradient protection in FL, possibly through the various privacy-preserving mechanisms [3] detailed in Section V.

IV. POISONING ATTACKS

Different from privacy attacks that target data privacy, poisoning attacks aim to compromise the system's robustness. This differs from the centralized poisoning attack, in which a subset of the training data is poisoned.

A. Untargeted Attacks

Untargeted poisoning attacks aim to arbitrarily compromise the integrity of the target model. The Byzantine attack is one type of untargeted poisoning attack that uploads arbitrarily malicious gradients to the server so as to cause the failure of the global model [39], [40], [61], [100], [107]. A formal definition of the Byzantine attack is given in Definition 1.

Definition 1 (Byzantine Attack [40], [63]): An honest participant uploads w_i := ∇F_i(w_i), while a dishonest participant can upload arbitrary values

$$
w_i =
\begin{cases}
\ast, & \text{if the $i$th participant is Byzantine} \\
\nabla F_i(w_i), & \text{otherwise}
\end{cases}
\tag{1}
$$

where "∗" represents arbitrary values and F_i represents participant i's local model objective function.

Blanchard et al. [40] showed that the aggregation of FL can be completely controlled by a single Byzantine participant if there is no defense in the FL. In particular, suppose that there are n − 1 benign participants and one Byzantine participant; the server aggregates the gradients by w = (1/n) Σ_{i=1}^{n} w_i, where w is the aggregated gradient. Assume that the nth participant is Byzantine; it can always make the aggregated gradient become any vector u by uploading the following gradient:

$$
w_n = n \cdot u - \sum_{i=1}^{n-1} w_i.
$$
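The argument above is easy to verify numerically. The following minimal sketch (our own illustration with hypothetical dimensions and values, not code from any cited work) shows a single malicious upload steering an undefended mean aggregator to an arbitrary target vector u:

```python
# Single Byzantine participant controlling plain mean aggregation:
# it uploads w_n = n * u - sum(w_1, ..., w_{n-1}) so that the average equals u.
import numpy as np

rng = np.random.default_rng(0)
n, d = 10, 5                               # 10 participants, 5-dimensional gradients
benign = rng.normal(size=(n - 1, d))       # gradients of the n - 1 honest participants
u = np.full(d, 42.0)                       # arbitrary vector the attacker wants as output

malicious = n * u - benign.sum(axis=0)     # the single Byzantine upload
aggregated = np.vstack([benign, malicious]).mean(axis=0)

assert np.allclose(aggregated, u)          # the mean is exactly u, regardless of the honest updates
print(aggregated)
```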
hand, somewhat HE and partially HE are more efficient but support only a limited number of operations [126], [127], [133], [134]. Partially HE schemes are more widely used in practice, including RSA [134], El Gamal [127], Paillier [126], and so on. The homomorphic properties can be described as

$$
E_{pk}(m_1 + m_2) = c_1 \oplus c_2, \qquad E_{pk}(a \cdot m_1) = a \otimes c_1
$$

where a is a constant, m_1 and m_2 are the plaintexts that need to be encrypted, and c_1 and c_2 are the ciphertexts of m_1 and m_2, respectively.
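As a concrete illustration of these two properties, the toy Paillier sketch below (our own example with tiny, insecure hard-coded primes; real deployments use 1024-bit or larger moduli and a vetted library) checks that multiplying ciphertexts adds the underlying plaintexts and that raising a ciphertext to a constant scales the plaintext:

```python
# Toy Paillier demo of the additive homomorphism described above:
#   E(m1) * E(m2) mod n^2  decrypts to  m1 + m2   (the "⊕" operation)
#   E(m1) ^ a     mod n^2  decrypts to  a * m1    (the "⊗" operation)
# Illustrative only: the primes are far too small to be secure.
import math
import random

p, q = 293, 433                          # small demo primes
n, n_sq = p * q, (p * q) ** 2
g = n + 1                                # standard generator choice g = n + 1
lam = math.lcm(p - 1, q - 1)             # lambda = lcm(p - 1, q - 1)

def L(x):                                # L(x) = (x - 1) / n
    return (x - 1) // n

mu = pow(L(pow(g, lam, n_sq)), -1, n)    # mu = L(g^lambda mod n^2)^(-1) mod n

def encrypt(m):
    r = random.randrange(2, n)
    while math.gcd(r, n) != 1:
        r = random.randrange(2, n)
    return (pow(g, m, n_sq) * pow(r, n, n_sq)) % n_sq

def decrypt(c):
    return (L(pow(c, lam, n_sq)) * mu) % n

m1, m2, a = 17, 25, 3
c1, c2 = encrypt(m1), encrypt(m2)
assert decrypt((c1 * c2) % n_sq) == m1 + m2        # ciphertext product -> plaintext sum
assert decrypt(pow(c1, a, n_sq)) == a * m1         # ciphertext power -> scaled plaintext
print("additive homomorphism verified")
```

(Requires Python 3.9+ for math.lcm and the three-argument pow-based modular inverse.)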
HE is widely used and is especially useful for securing the learning process by computing on encrypted data. However, doing arithmetic on the encrypted numbers comes at a cost of memory and processing time. For example, with the Paillier encryption scheme, the encryption of an encoded floating-point number (whether single or double precision) is 2m bits long, where m is typically at least 1024, and the addition of two encrypted numbers is 2∼3 orders of magnitude slower than the unencrypted equivalent [9]. Moreover, polynomial approximations need to be made to evaluate nonlinear functions in ML algorithms, resulting in a tradeoff between utility and privacy [116], [117]. For example, to protect individual gradients, Aono et al. [31] used additively HE to preserve the privacy of gradients and enhance the security of the distributed learning system. However, their protocol not only incurs large communication and computational overheads but also results in utility loss. Furthermore, it is not able to withstand collusion between the server and multiple participants. Hardy et al. [9] applied federated logistic regression on vertically partitioned data encrypted with an additively homomorphic scheme to secure against an honest-but-curious adversary. Overall, all these works incur extra communication and computational overheads, which limit their applications in H2C scenarios.

B. Privacy-Preservation Through SMC

SMC [129] enables different participants with private inputs to perform a joint computation on their inputs without revealing them to each other. Mohassel and Zhang [125] proposed SecureML, which conducts privacy-preserving learning via SMC, where data owners need to process, encrypt, and/or secret-share their data among two noncolluding servers in the initial setup phase. SecureML allows data owners to train various models on their joint data without revealing any information beyond the outcome. However, this comes at a cost of high computation and communication overhead, which may hamper participants' interest to collaborate. Bonawitz et al. [5] proposed a secure, communication-efficient, and failure-robust protocol based on SMC for secure aggregation of individual gradients. It ensures that the only information about the individual users the server learns is what can be inferred from the aggregated results. The security of their protocol is maintained under both the honest-but-curious and malicious settings, even when the server and a subset of users act maliciously—colluding and deviating arbitrarily from the protocol. That is, no party learns anything more than the sum of the inputs of a subset of honest users of a large size [5].
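The core idea behind such secure aggregation, stripped of the key-agreement, secret-sharing, and dropout-recovery machinery of the full protocol in [5], can be sketched as follows: every pair of participants agrees on a random mask that one adds and the other subtracts, so each individual upload looks random to the server while the masks cancel in the sum (a simplified illustration of ours, not the actual protocol):

```python
# Simplified pairwise-masking sketch of secure aggregation: the masks cancel in
# the sum, so the server only learns the aggregate of the updates.
# (Omits key agreement, dropout handling, and the finite-field arithmetic used in [5].)
import numpy as np

rng = np.random.default_rng(0)
n, d = 5, 4
updates = rng.normal(size=(n, d))          # true local model updates

masked = updates.copy()
for i in range(n):
    for j in range(i + 1, n):
        m_ij = rng.normal(size=d)          # pairwise mask shared by participants i and j
        masked[i] += m_ij                  # participant i adds the mask...
        masked[j] -= m_ij                  # ...participant j subtracts it

# Each masked upload is obscured by the random masks, yet the sum is exact.
assert np.allclose(masked.sum(axis=0), updates.sum(axis=0))
print(masked.sum(axis=0))
```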
In general, SMC techniques ensure a high level of privacy and accuracy, at the expense of high computation and communication overhead, thereby doing a disservice to attracting participation. Another main challenge facing SMC-based schemes is the requirement for simultaneous coordination of all participants during the entire training process. Such a multiparty interaction model may not be desirable in practical settings, especially under the commonly considered participant-server architecture in FL settings. Besides, SMC-based protocols can enable multiple participants to collaboratively compute an agreed-upon function without leaking input information from any participant except for what can be inferred from the outcomes of the computation [135], [136]. That said, SMC cannot fully guarantee protection from information leakage, which requires additional DP techniques to be incorporated into the multiparty protocol to address such concerns [137], [138], [139], [140].

In summary, HE- or SMC-based approaches may not be applicable to large-scale FL scenarios as they incur substantial additional communication and computation costs. Moreover, encryption-based techniques need to be carefully designed and implemented for each operation in the target learning algorithm [141], [142]. Finally, all the cryptography-based protocols prevent anyone from auditing participants' updates to the joint model, which leaves space for malicious participants to attack. For example, malicious participants can introduce stealthy backdoor functionality into the global model without being detected [36].

C. Privacy-Preservation Through Differential Privacy

DP was originally designed for the single-database scenario, where, for every query made, a database server answers the query in a privacy-preserving manner with tailored randomization [131]. In comparison with encryption-based approaches, DP trades off privacy and accuracy by perturbing the data in a way that: 1) is computationally efficient; 2) does not allow an attacker to recover the original data; and 3) does not severely affect the utility.

The concept of DP is that the effect of the presence or the absence of a single record on the output likelihood is bounded by a small factor ε. As defined in Definition 2, (ε, δ)-approximate DP [132] relaxes pure ε-DP by a δ additive term, which means that the unlikely responses need not satisfy the pure DP criterion.

Definition 2 ((ε, δ)-DP [132]): For scalars ε > 0 and 0 ≤ δ < 1, mechanism M is said to preserve (approximate) (ε, δ)-DP if, for all adjacent datasets D, D′ ∈ D^n and measurable S ⊆ range(M)

$$
\Pr\{M(D) \in S\} \le \exp(\epsilon) \cdot \Pr\{M(D') \in S\} + \delta.
$$

To avoid the worst-case scenario of always violating the privacy of a δ fraction, the standard recommendation is to choose δ ≪ 1/|D|, where |D| is the size of the database. This strategy forecloses the possibility of one particularly devastating outcome, but other forms of information leakage remain.
TABLE III
COMPARATIVE ANALYSIS AMONG CDP, LDP, AND DDP
The privacy community generally categorizes DP into the following three categories as per different trust assumptions and noise sources: CDP, LDP, and DDP. A comprehensive comparison among CDP, LDP, and DDP is listed in Table III.

1) Centralized Differential Privacy: CDP was originally designed for the centralized scenario where a trusted database server, which is entitled to see all participants' data in the clear, wishes to answer queries or publish statistics in a privacy-preserving manner by randomizing query results [42], [131], [144]. When CDP meets FL, CDP assumes a trusted aggregator, who is responsible for adding noise to the aggregated local gradients to ensure record-level privacy of the whole data of all participants [7], [118]. However, CDP is geared to tackle thousands of users for training to converge and achieve an acceptable tradeoff between privacy and accuracy [7], resulting in a convergence problem with a small number of participants [145]. Moreover, CDP can achieve acceptable accuracy only with a large number of participants and is thus not applicable to H2B with a relatively small number of participants.

Meanwhile, the assumption of a trusted server in CDP is ill-suited in many applications, as it constitutes a single point of failure for data breaches and saddles the trusted curator with legal and ethical obligations to keep the user data secure. When the aggregator is untrusted, which is often the case in distributed scenarios, LDP [146] or DDP [138], [147] is needed to protect the privacy of individuals.

2) Local Differential Privacy: LDP [146] offers a stronger privacy guarantee: data owners perturb their private information to satisfy DP locally before reporting it to an untrusted data curator [122], [123], [148]. A comprehensive survey of LDP can be found in [149]. A formal definition of LDP is given in Definition 3.

Definition 3 ((ε, δ)-LDP): A randomized algorithm M satisfies (ε, δ)-LDP if and only if, for any inputs v and v′, we have

$$
\Pr\{M(v) = o\} \le \exp(\epsilon) \cdot \Pr\{M(v') = o\} + \delta
$$

for ∀o ∈ Range(M), where Range(M) denotes the set of all possible outputs of the algorithm M. Furthermore, M is said to preserve (pure) ε-LDP if the condition holds for δ = 0.
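A classic mechanism satisfying Definition 3 with δ = 0 for a single binary value is randomized response: report the true bit with probability e^ε/(1 + e^ε) and flip it otherwise, so that the likelihood ratio of any output under two different inputs is at most e^ε. A minimal sketch (our own illustration with hypothetical data):

```python
# Binary randomized response, a pure epsilon-LDP mechanism: each report is
# truthful with probability p = e^eps / (1 + e^eps), so
# Pr[M(v) = o] / Pr[M(v') = o] <= p / (1 - p) = e^eps for any output o.
import math
import random

def randomized_response(bit: int, eps: float) -> int:
    p_truth = math.exp(eps) / (1.0 + math.exp(eps))
    return bit if random.random() < p_truth else 1 - bit

def debias(reports, eps):
    # Unbiased estimate of the true mean from the privatized reports.
    p = math.exp(eps) / (1.0 + math.exp(eps))
    return (sum(reports) / len(reports) - (1 - p)) / (2 * p - 1)

random.seed(0)
true_bits = [1] * 7000 + [0] * 3000                       # 70% of users hold "1"
reports = [randomized_response(b, eps=1.0) for b in true_bits]
print(round(debias(reports, eps=1.0), 3))                 # close to 0.7, without raw bits
```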
Although randomized response [150] and its variants [151] have been widely used to provide LDP when individuals disclose their personal information, we remark that all the randomization mechanisms used for CDP, such as the Laplace and Gaussian mechanisms [132], can be individually used by each participant to ensure LDP in isolation. However, in the distributed scenario, without the help of cryptographic techniques, each participant has to add enough calibrated noise to ensure LDP. The attractive privacy properties of LDP, thus, come with a huge utility degradation, especially with billions of individuals. Under the CDP model, the aggregator releases the aggregated value with an expected additive error of at most O(1/ε) to ensure ε-DP (e.g., using the Laplace mechanism [132]). In contrast, under the LDP model, at least Ω(√n/ε) additive error in expectation must be incurred by any ε-DP mechanism for the same task [146], [152]. This gap is essential for eliminating the trust in the centralized server and cannot be removed by algorithmic improvement [153].
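The gap is easy to see in a quick simulation (our own illustrative sketch with a hypothetical sum query and parameters): a trusted CDP aggregator adds a single Laplace noise draw to the exact sum, whereas under LDP every participant perturbs its own value before reporting, and the per-report noise accumulates in the aggregate.

```python
# Rough numerical check of the CDP vs. LDP error gap for a sum query:
# CDP needs one Laplace(1/eps) draw in total, while LDP adds one per participant,
# so the aggregate error grows on the order of sqrt(n)/eps.
import numpy as np

rng = np.random.default_rng(0)
n, eps, trials = 10_000, 1.0, 200
x = rng.random(n)                          # each participant holds a value in [0, 1]
true_sum = x.sum()

cdp_err = [abs(rng.laplace(0, 1 / eps)) for _ in range(trials)]
ldp_err = [abs((x + rng.laplace(0, 1 / eps, n)).sum() - true_sum) for _ in range(trials)]

print(f"CDP mean |error| ~ {np.mean(cdp_err):.1f}")   # on the order of 1/eps
print(f"LDP mean |error| ~ {np.mean(ldp_err):.1f}")   # on the order of sqrt(n)/eps
```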
To protect FL with homogeneous architectures, in which model parameters or gradients are shared, Shokri and Shmatikov [154] first applied LDP to distributed learning/FL, in which each participant individually adds noise to its gradients before releasing them to the server, thus ensuring LDP. However, their privacy bounds are given per parameter, and a large number of parameters prevents their method from providing a meaningful privacy guarantee [42]. Other approaches that apply LDP to FL can only support shallow models, such as logistic regression, and only focus on simple tasks and datasets [119], [120], [121]. Bhowmick et al. [27] presented a viable approach to large-scale local private model training and introduced a relaxed version of LDP by limiting the power of potential adversaries. Due to the high variance of their mechanism, it requires more than 200 communication rounds and incurs much higher privacy costs, i.e., MNIST (ε = 500) and CIFAR-10 (ε = 5000). Note that the ε required in [27] is relatively large, as they considered only privacy protection against reconstruction attacks instead of membership attacks. Their results suggested that using LDP mechanisms with large ε may still provide decent protection against reconstruction. Li et al. [143] proposed locally differentially private algorithms in the context of metalearning, which might be applicable to FL with personalization. However, it only provides provable learning guarantees in convex settings. Truex et al. [123] applied condensed LDP (α-CLDP) to FL. However, α-CLDP results in a weak privacy guarantee. Another contemporary work called LDP-FL [122] achieves better performance in both effectiveness and efficiency than [123] with a special communication design for deep learning approaches.

To protect FL with heterogeneous architectures, in which model predictions are shared, one naive approach is to add locally differentially private random noise to the predictions as in previous works. Although the privacy concern is mitigated by random noise perturbation, this brings a new problem: a substantial tradeoff between privacy budget and model utility. Sun and Lyu [45] filled this gap by proposing a novel framework called FEDMD-NFDP, which integrates a novel noise-free DP (NFDP) mechanism into FedMD. The LDP guarantee of NFDP is rooted in the local data sampling process, which explicitly eliminates the noise addition and privacy cost explosion issues of previous works.
Fig. 5. Illustration of FL without privacy and with different DP mechanisms. M denotes a DP mechanism used to privatize the data. (a) FL without privacy. (b) Centralized DP: FL with a trusted central server. (c) Local DP: FL without a trusted server; gradients are perturbed to ensure LDP before being forwarded to the central server. (d) Distributed DP with SMC: FL without a trusted server; gradients are perturbed via the DP mechanism M and encrypted via an encryption operation E to ensure privacy before being forwarded to the central server, which finally decrypts (D) the aggregated ciphertext.
3) Distributed Differential Privacy: DDP bridges the gap between LDP and CDP while ensuring the privacy of each individual by combining DP with cryptographic protocols [30], [137], [138], [139], [140]. Therefore, DDP avoids placing trust in any server and offers better utility than LDP. Theoretically, DDP offers the same utility as CDP, as the total amount of noise is the same.

The notion of DDP reflects the fact that the required noise in the target statistic is sourced from multiple participants [147]. Approaches to DDP that implement an overall additive noise mechanism by summing the same mechanism run at each participant (typically with less noise) necessitate mechanisms with stable distributions—to guarantee proper calibration of a known end-to-end response distribution—and cryptography for hiding all but the final result from participants [30], [46], [124], [137], [138], [139], [147]. Stable distributions include the Gaussian distribution, the Binomial distribution [44], and so on, i.e., the sum of Gaussian random variables still follows a Gaussian distribution, and the sum of Binomial random variables still follows a Binomial distribution. DDP utilizes this stability to permit each participant to randomize its local statistic to a lesser degree than LDP would. However, in DDP, only the sum of the individually released statistics is (ε, δ)-differentially private, but not each individually released statistic. Therefore, DDP necessitates the help of SMC to maintain utility and ensure aggregator obliviousness, as evidenced in [30], [46], [125], [138], [139], [140], and [141].
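The stability argument can be illustrated numerically (our own sketch with hypothetical parameters; the SMC layer that hides the individual noisy shares is omitted): if each of k participants adds Gaussian noise of variance σ²/k to its contribution, the noise in the sum is distributed exactly like a single central Gaussian draw of variance σ², so the aggregate matches CDP utility even though no single participant adds the full noise.

```python
# Sketch of the distribution-stability property behind DDP: k per-participant
# Gaussian noise shares of variance sigma^2 / k sum to one Gaussian of variance
# sigma^2, i.e., the same end-to-end noise a trusted CDP server would add.
import numpy as np

rng = np.random.default_rng(0)
k, sigma, trials = 50, 4.0, 100_000

central = rng.normal(0.0, sigma, size=trials)                           # CDP noise
distributed = rng.normal(0.0, sigma / np.sqrt(k), size=(trials, k)).sum(axis=1)

print(f"central noise std    ~ {central.std():.3f}")       # ~ sigma
print(f"summed DDP noise std ~ {distributed.std():.3f}")    # ~ sigma as well
```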
An illustration of FL without privacy and with different DP mechanisms is given in Fig. 5. Another parallel line of work for privacy-preserving distributed learning is to transfer the knowledge of an ensemble of multiple models to a student model [42], [144], [155], [156]. For example, Hamm et al. [155] first created labeled data from auxiliary unlabeled data, then used the labeled auxiliary data to find an empirical risk minimizer, and, finally, released a differentially private classifier using output perturbation [157]. Similarly, Papernot et al. [42], [144] proposed private aggregation of teacher ensembles (PATE) to first train an ensemble of teachers on disjoint subsets of private data and then perturb the knowledge of the ensemble of teachers by adding noise to the aggregated teacher votes before transferring the knowledge to a student. Finally, a student model is trained on the aggregate output of the ensemble such that the student learns to accurately mimic the ensemble. PATE requires a lot of participants to achieve reasonable accuracy, and each participant needs to have enough data to train an accurate model, which might not hold in the FL system, where the data distribution of participants might be highly unbalanced, making this approach unsuitable for the FL system.
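The noisy-vote aggregation at the core of PATE can be sketched in a few lines (our own simplified illustration with hypothetical teacher predictions and noise scale; the full method also accounts for the cumulative privacy budget across queries):

```python
# Simplified PATE-style noisy aggregation: count the teachers' votes per class,
# add Laplace noise, and release only the noisy argmax as the label for the student.
import numpy as np

rng = np.random.default_rng(0)

def noisy_vote(teacher_preds, num_classes, gamma=0.1):
    votes = np.bincount(teacher_preds, minlength=num_classes)
    noisy = votes + rng.laplace(0.0, 1.0 / gamma, size=num_classes)
    return int(np.argmax(noisy))

teacher_preds = np.array([2] * 60 + [7] * 30 + [1] * 10)   # predictions of 100 teachers
print(noisy_vote(teacher_preds, num_classes=10))           # almost always class 2
```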
VI. DEFENSES AGAINST POISONING ATTACKS

Robustness to poisoning attacks is a desirable property in FL. To address poisoning attacks, many robust aggregation schemes have been proposed in the literature. Known defenses to poisoning attacks in a centralized setting, such as robust losses [158] and anomaly detection [98], assume control of the participants or explicit observation of the training data. Neither of these assumptions is applicable to FL, in which the server only observes the model parameters/updates that are sent as part of the iterative ML algorithm [37]. We summarize the defenses against untargeted and targeted attacks as follows.

A. Defenses Against Untargeted Attacks

For Byzantine-resilient aggregation, an algorithm is Byzantine fault tolerant [40] if its convergence is robust even when a large portion of participants is adversarial. In the following, we list several representative attempts that try to defend against untargeted Byzantine attacks.

1) AUROR: Shen et al. [159] introduced a statistical mechanism called AUROR to detect malicious users while generating an accurate model. AUROR is based on the observation that indicative features (the most important model features) from the majority of honest users will exhibit a similar distribution, while those from malicious users will exhibit an anomalous distribution. It then uses k-means to cluster participants' updates across training rounds and discards the outliers, i.e., contributions from small clusters that exceed a threshold distance are removed. The accuracy of a model trained using AUROR drops by only 3% even when 30% of all the users are adversarial.

2) Krum: Blanchard et al. [40] proposed Krum, in which the top f contributions that are furthest from the mean participant contribution are removed from the aggregation. Krum uses the Euclidean distance to determine which gradient contributions should be removed. It can theoretically withstand poisoning attacks with up to 33% adversaries among the participants, i.e., given n agents of which f are Byzantine, Krum requires that n ≥ 2f + 3. Krum is resistant to attacks by omniscient adversaries—aware of a good estimate of the gradient—who send the opposite vector multiplied by a large factor. It is also resistant to attacks by adversaries who send random vectors drawn from a Gaussian distribution (the larger the variance of the distribution, the stronger the attack). Multi-Krum is a variant of Krum, which intuitively interpolates between Krum and averaging, thereby combining the resilience properties of Krum with the convergence speed of averaging. Essentially, Krum filters outliers based on the entire update vector but does not filter coordinatewise outliers.
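A compact sketch of the Krum selection rule from [40] (our own simplified implementation; multi-Krum, which averages the lowest-scoring updates rather than picking one, is omitted): each update is scored by the sum of squared distances to its n − f − 2 nearest peers, and the lowest-scoring update is returned as the aggregate.

```python
# Simplified Krum: score each update by the summed squared distance to its
# n - f - 2 closest peers and return the lowest-scoring update.
import numpy as np

def krum(updates: np.ndarray, f: int) -> np.ndarray:
    n = len(updates)
    dists = np.linalg.norm(updates[:, None, :] - updates[None, :, :], axis=-1) ** 2
    scores = []
    for i in range(n):
        closest = np.sort(np.delete(dists[i], i))[: n - f - 2]   # nearest peers only
        scores.append(closest.sum())
    return updates[int(np.argmin(scores))]

rng = np.random.default_rng(0)
honest = rng.normal(0.0, 0.1, size=(8, 3)) + 1.0    # clustered around the true gradient
byzantine = np.full((2, 3), -50.0)                  # two large malicious uploads
print(krum(np.vstack([honest, byzantine]), f=2))    # selects one of the honest updates
```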
3) Coordinatewise Statistics: To address this issue, Yin et al. [63] proposed two robust distributed gradient descent algorithms: one based on the coordinatewise median and the other based on the coordinatewise trimmed mean. Unfortunately, median-based rules can incur a prohibitive computational overhead in large-scale settings [163]. Guerraoui et al. [160] proposed a meta-aggregation rule called Bulyan, a two-step meta-aggregation algorithm based on Krum and the trimmed median, which filters malicious updates and then computes the trimmed median of the remaining updates. Median- and geometric-median-based robust aggregation rules are also extensively explored in [165] and [166]. Pillutla et al. [64] proposed a robust aggregation approach called RFA by replacing the weighted arithmetic mean with an approximate geometric median, so as to reduce the impact of contaminated updates. Unfortunately, RFA can only handle a few types of poisoning attackers and is not applicable to Byzantine attacks.

4) Weakness of Current Defenses: In spite of their robustness guarantees, recent inspections revealed that previous Byzantine-robust FL mechanisms are also quite brittle and can be easily circumvented. Bhagoji et al. [36] showed that targeted model poisoning of deep neural networks is effective even against Byzantine-robust aggregation rules, such as Krum and the coordinatewise median. Baruch et al. [109] and Xie et al. [100] showed that, while the Byzantine-robust aggregation rules may ensure that the influence of the Byzantine workers in any single round is limited, the attackers can couple their attacks across rounds, moving the weights significantly away from the desired direction and, thus, achieving the goal of lowering the model quality. Xu and Lyu [166] demonstrated that multi-Krum is not robust against untargeted poisoning. This is because multi-Krum is based on the distance between each gradient vector and the mean vector, while the mean vector is not robust against untargeted poisoning. Fang et al. [80] showed that aggregation rules (e.g., Krum [40], Bulyan [160], trimmed mean [63], coordinatewise median [63], and other median-based aggregators [62]) that were claimed to be robust against Byzantine failures are not effective in practice against optimized local model poisoning attacks that carefully craft local models on the compromised participants such that the aggregated global model deviates the most toward the inverse of the direction along which the global model would change when there are no attacks. All these highlight the need for more effective defenses against Byzantine attackers in FL.

5) Other Possibilities: Other works investigate Byzantine robustness from different lenses. Chen et al. [163] presented DRACO, a framework for robust distributed training via algorithmic redundancy. DRACO is robust to arbitrarily malicious computing nodes while being orders of magnitude faster than state-of-the-art robust distributed systems. However, DRACO assumes that each participant can access other participants' data, limiting its practicability in FL. Su and Xu [94] proposed to robustly aggregate the gradients computed by the Byzantine participants based on the filtering procedure proposed by Steinhardt et al. [167]. Bernstein et al. [39] proposed SIGNSGD, which is combined with a majority vote to enable participants to upload elementwise signs of their gradients to defend against three types of half-"blind" Byzantine adversaries: 1) adversaries that arbitrarily rescale their stochastic gradient estimate; 2) adversaries that randomize the sign of each coordinate of the stochastic gradient; and 3) adversaries that invert their stochastic gradient estimate.
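The sign-and-majority-vote aggregation underlying SIGNSGD can be sketched as follows (our own minimal illustration with hypothetical gradients; the full algorithm also specifies the learning-rate schedule and convergence analysis):

```python
# Minimal sketch of SIGNSGD with majority vote: participants upload only the
# elementwise signs of their gradients, and the server takes the coordinatewise
# majority sign, which neutralizes arbitrary rescaling by Byzantine workers.
import numpy as np

rng = np.random.default_rng(0)
true_grad = np.array([0.5, -1.2, 0.3, -0.7])
honest_signs = np.sign(true_grad + rng.normal(0, 0.1, size=(7, 4)))   # 7 honest workers
byzantine_signs = np.tile(-np.sign(true_grad), (3, 1))                # 3 sign-flipping workers

all_signs = np.vstack([honest_signs, byzantine_signs])
server_step = np.sign(all_signs.sum(axis=0))        # coordinatewise majority vote
print(server_step)                                  # matches np.sign(true_grad): the honest majority wins
```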
B. Defenses Against Targeted Attacks

Existing defenses against targeted backdoor attacks can be categorized into two types: detection methods and erasing methods [168].
TABLE IV
STATE-OF-THE-ART DEFENSES AGAINST FL POISONING. n IS THE NUMBER OF PARTICIPANTS.
NOTE THAT SOME DEFENSES HAVE NO THEORETIC BREAKING POINT
1) Detection: Detection methods exploit activation statistics or model properties to determine whether a model is backdoored [169], [170] or whether a training/test example is a backdoor example [52]. A number of detection algorithms are designed to detect which inputs contain a backdoor and which parts of the model (its activation functions specifically) are responsible for triggering the adversarial behavior of the model, in order to remove the backdoor [47], [52], [53], [103], [171]. These algorithms rely on the statistical difference between the latent representations of backdoor-enabled and clean (benign) inputs in the poisoned model. These backdoor detection algorithms can, however, be bypassed by maximizing the latent indistinguishability of backdoor-enabled adversarial inputs and clean inputs [172].

2) Erasing: While detection can help identify potential risks, the backdoored model still needs to be purified, since the impact of the backdoor triggers remains in the backdoored model. The erasing methods take a step further and aim to purify the adverse impacts on models caused by the backdoor triggers. The current state-of-the-art erasing methods are mode connectivity repair (MCR) [173] and NAD [59]. MCR mitigates backdoors by selecting a robust model along a path in the loss landscape, while NAD leverages knowledge distillation to erase triggers. Other previous methods, including fine-tuning, denoising, and fine-pruning [171], have been shown to be insufficient against the latest attacks [58], [174]. Another more recent work called ABL [60] aims to train clean models given backdoor-poisoned data. The overall learning process is framed as a dual task of learning the clean and the backdoor portions of the data. Based on this process, ABL can: 1) help isolate backdoor examples at an early training stage and 2) break the correlation between backdoor examples and the target class at a later training stage.

3) Backdoor Defenses in FL: Despite the promising backdoor defense results in the centralized setting, it is still unclear whether these defenses can be smoothly adapted to the FL setting, especially in the non-i.i.d. setting. For backdoor defense in FL, Sun et al. [38] showed that clipping the norm of model updates and adding Gaussian noise can mitigate backdoor attacks that are based on the model replacement paradigm. Andreina et al. [175] incorporated an additional validation phase in each round of FL to detect backdoors. However, none of these provides certified robustness guarantees. Certified robustness for FL against backdoor attacks remains largely unexplored. Xie et al. [162] provided the first general framework called certifiably robust FL (CRFL) to train certifiably robust FL models against backdoors.
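A server-side sketch of this clipping-plus-noise defense (our own illustration; the clipping bound and noise scale are hypothetical hyperparameters, not values from [38]):

```python
# Norm clipping + Gaussian noise at the server: rescale any update whose L2 norm
# exceeds a bound C, average, and add Gaussian noise, which limits how far a single
# (possibly backdoored, model-replacement-style) update can move the global model.
import numpy as np

def clip_and_noise_aggregate(updates, clip_norm=1.0, noise_std=0.01, rng=None):
    rng = rng or np.random.default_rng()
    clipped = [u * min(1.0, clip_norm / (np.linalg.norm(u) + 1e-12)) for u in updates]
    agg = np.mean(clipped, axis=0)
    return agg + rng.normal(0.0, noise_std, size=agg.shape)

rng = np.random.default_rng(0)
benign = [rng.normal(0, 0.1, 20) for _ in range(9)]
scaled_backdoor = [rng.normal(0, 0.1, 20) + 30.0]    # one heavily scaled malicious update
agg = clip_and_noise_aggregate(benign + scaled_backdoor, rng=rng)
print(np.linalg.norm(agg))                           # stays small despite the scaled update
```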
4) Sybil Defenses in FL: In addition to backdoors, targeted attacks can also be launched by Sybil clones [37]. To defend against the targeted poisoning attack by Sybil clones, Fung et al. [37] exploited the characteristic behavior that Sybils are more similar to each other than the similarity observed amongst honest clients and proposed FoolsGold: a defense scheme against FL Sybil attacks that adapts the learning rate of participants based on contribution similarity. Note that FoolsGold does not bound the expected number of attackers; it assumes that attackers can spawn a large number of Sybils, rendering assumptions about proportions of honest participants unrealistic [40]. In addition, FoolsGold requires no auxiliary information beyond the learning process and makes fewer assumptions about participants and their data. The robustness of FoolsGold holds for different distributions of participant data, varying poisoning targets, and various Sybil strategies, and it can be applied successfully to both FedSGD and FedAvg.
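A much-simplified sketch of this similarity idea (our own illustration, not the full FoolsGold algorithm, which additionally aggregates historical updates and applies pardoning and logit rescaling): clients whose updates closely mirror another client's update are down-weighted in the aggregation.

```python
# Simplified contribution-similarity defense in the spirit of FoolsGold: compute
# pairwise cosine similarity between client updates and shrink the aggregation
# weight of clients that closely mirror another client (Sybil-like behavior).
import numpy as np

def similarity_weights(updates: np.ndarray) -> np.ndarray:
    normed = updates / (np.linalg.norm(updates, axis=1, keepdims=True) + 1e-12)
    cos = normed @ normed.T
    np.fill_diagonal(cos, -1.0)                    # ignore self-similarity
    max_sim = cos.max(axis=1)                      # similarity to each client's closest peer
    weights = np.clip(1.0 - max_sim, 0.0, 1.0)     # near-duplicates get ~0 weight
    return weights / (weights.sum() + 1e-12)

rng = np.random.default_rng(0)
honest = rng.normal(size=(5, 10))                                      # diverse honest updates
sybils = np.tile(rng.normal(size=(1, 10)), (3, 1)) + rng.normal(0, 0.01, (3, 10))
print(np.round(similarity_weights(np.vstack([honest, sybils])), 3))    # Sybil clones get ~0 weight
```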
5) Summary: We list the most representative defenses against poisoning attacks in FL in Table IV. Some of them have breaking points, i.e., a maximum fraction of malicious participants; robustness guarantees cannot be provided if the fraction of malicious participants is larger than the breaking point.

6) Remark: Note that both untargeted and targeted poisoning attacks are less effective in settings with infrequent participation like H2C [35]. Moreover, under practical production FL environments, Shejwalkar et al. [73] have shown that FL, even without any defenses, is highly robust in practice. For production cross-device FL (H2C), which contains thousands to billions of clients, poisoning attacks have no impact on existing robust FL algorithms even with impractically high percentages of compromised clients. For production cross-silo FL (H2B), which contains up to 100 clients, data poisoning attacks are completely ineffective; model poisoning attacks are unlikely to pose a major risk when the clients involved are bound by contract and their software stacks are professionally maintained (e.g., in banks and hospitals). Exceptions are most likely in cross-silo scenarios where a strong (e.g., financial) incentive causes multiple parties to be willing to risk a breach of contract by colluding, or one party to hack, thereby risking criminal liability. Therefore, we conclude that
these poisoning attacks are more likely to happen in some H2C scenario. In terms of DP-based methods [7], [121], [182],
exceptional H2B scenarios. [183], [184], [185], record-level DP bounds the success of
membership inference but does not prevent property inference
VII. D ISCUSSION AND P ROMISING D IRECTIONS applied to a group of training records [28]. Participant-level
There are still potential vulnerabilities that need to be DP, on the other hand, is geared to work with thousands of
addressed in order to improve the privacy and robustness of users for training to converge and achieving an acceptable
FL systems. Moreover, there are multiple design goals that tradeoff between privacy and accuracy [7]. The FL model
are equally important with privacy and robustness and, thus, fails to converge with a small number of participants, making
need to be considered simultaneously in FL. In this section, it unsuitable for H2B scenarios. Furthermore, DP may hurt
we outline research directions that we believe are promising. the accuracy of the learned model [186], which may not be
7) Curse of Dimensionality: Large models, with high appealing to the industry. Further work is needed to investigate
dimensional parameter vectors, are particularly susceptible if participant-level DP can protect FL systems with fewer
to privacy and security attacks [176]. Most FL algorithms participants. It is also worthwhile to explore whether we can
require overwriting the local model parameters with the global use the condensed data [187] rather than the raw data for local
model. This makes them susceptible to poisoning attacks, model training in order to better protect privacy.
as the adversary can make small but damaging changes in the 10) Optimizing Defense Mechanism Deployment: When
high-dimensional models without being detected. Almost all deploying defense mechanisms to check if any adversary is
of the well-designed Byzantine-robust aggregators [40], [63], attacking the FL system, the FL server will need additional
[64] still suffer from the curse of dimensionality. Specifically, computational costs. In addition, different types of defense
the estimation error scales up with the size of the model in mechanisms may exhibit different effectiveness against differ-
a square-root manner. Thus, sharing model parameters may ent attacks and incur different costs. It is important to study
not be a strong design choice in FL; it opens all the internal how to optimize the timing of deploying defense mechanisms
states of the model to inference attacks and maximizes the or the announcement of deterrence measures. Game theoretic
model’s malleability by poisoning attacks. To address these research holds promise in addressing this challenge.
fundamental shortcomings of FL, it is worthwhile to explore 11) Test-Phase Privacy in FL: This survey mainly focuses
whether sharing gradients is essential. Instead, sharing less on the training phase attacks and defenses in FL, considering
sensitive information (e.g., SIGNSGD [39]) or only sharing the more attack possibilities opened by the distributed training
model predictions [25], [45], [176] in a black-box manner may property of FL systems. In fact, FL is also vulnerable to both
result in more robust privacy protection in FL. privacy and robustness attacks during the test/inference phase
8) Rethinking Current Privacy Attacks: There are several by the users of the final FL model when it is deployed as a
inherent weaknesses in current attacks that may limit their service.
applicability in FL [177]. For example, the GAN attack In terms of privacy vulnerability, the trained global model
assumes that the entire training corpus for a given class comes may reveal sensitive information from model predictions when
from a single participant, and only in the special case where all deployed as a service, causing privacy leakage. In such a
class members are similar, GAN-constructed representatives setting, an adversary does not have direct access to the
are similar to the training data [75]. These assumptions are model parameters but may be able to view input–output
less practical in FL. For DLG [29] and iDLG [32], both works: pairs. Previous studies have shown a series of privacy leakage
1) adopt a second-order optimization method called L-BFGS given only black-box access to the trained models, such as:
that is more computationally expensive compared with first- 1) model stealing attacks in which model parameters can be
order optimizations;2) are only applicable to gradients com- reconstructed by an adversary who only has access to an
puted on minibatches of data, i.e., at most B = 8 in DLG and inference/prediction API based on those parameters [82], [83],
B = 1 in iDLG, which is not the real case for FL, in which [84], [85] and 2) MIAs that aim to determine if a particular
gradient is normally shared after at least 1 epoch of local record was used to train the model [86]. FL models face a sim-
training; and 3) used untrained model, neglecting gradients ilar dilemma during model deployment for testing purposes.
over multiple communication rounds. Attacking FL systems The development of effective defenses against privacy leakage
in a more efficient manner and under more practical settings during model deployment calls for further investigations.
remain largely unexplored. In addition, whether current attacks 12) Test-Phase Robustness in FL: In terms of robustness
still work in FL that uses adaptive optimization methods [178], vulnerability, recent studies [90], [91], [92] have shown that
such as SGDM and Adam, remains unknown. FL is also vulnerable to well-crafted adversarial examples.
9) Rethinking Current Defenses: FL with secure aggrega- During inference time, the attackers can add a very small
tion for the purpose of privacy is more susceptible to poisoning perturbation to the test data, making the test data almost indis-
attacks as individual updates cannot be inspected. Similarly, tinguishable from natural data and yet classified incorrectly by
it is still unclear if AT, one state-of-the-art defense approach the global model. For federated robustness against adversarial
12) Test-Phase Robustness in FL: In terms of robustness vulnerability, recent studies [90], [91], [92] have shown that FL is also vulnerable to well-crafted adversarial examples. At inference time, attackers can add a very small perturbation to the test data, making it almost indistinguishable from natural data and yet incorrectly classified by the global model. For federated robustness against adversarial examples, Zizzo et al. [90] and Hong et al. [91] proposed to apply AT to FL, i.e., FAT, in order to achieve adversarial robustness in FL. Zizzo et al. [90] observed that conducting AT on all participants leads to divergence of the model. To solve this problem, they conducted AT on only a proportion of participants for better convergence. Another recent work by Hong et al. [91] considered hardware heterogeneity in FL, i.e., only limited users can afford AT. Hence, they conduct AT only on the proportion of participants that have powerful computation resources, while the rest perform standard training. Shah et al. [92] investigated the impact of communication rounds in FAT and proposed a dynamic AT. The training in all the above FAT works is unstable, which potentially hurts convergence and performance. Moreover, AT typically requires significant computation and a longer time to converge, and it is unclear how it performs in non-i.i.d. settings. Chen et al. [188] took the first step toward investigating FAT under a non-i.i.d. setting with label skewness. However, how to speed up AT in FL may be required in the future. Overall, difficulties remain in applying AT to the federated setting. This motivates future work to explore more effective approaches that maintain both natural accuracy and robustness in FL.
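A schematic PyTorch sketch of this line of work is given below: only a fraction of clients (e.g., those with sufficient compute, as in [91]) run one-step FGSM-based AT, while the rest train normally. The tiny model, data, and hyperparameters are illustrative assumptions, not the exact algorithms of [90], [91], [92]:

    import copy
    import torch
    import torch.nn.functional as F

    def fgsm(model, x, y, eps=0.1):
        """One-step FGSM perturbation used for adversarial training."""
        x = x.clone().detach().requires_grad_(True)
        loss = F.cross_entropy(model(x), y)
        grad, = torch.autograd.grad(loss, x)
        return (x + eps * grad.sign()).detach()

    def local_train(global_model, data, adversarial, lr=0.05, steps=5):
        model = copy.deepcopy(global_model)
        opt = torch.optim.SGD(model.parameters(), lr=lr)
        x, y = data
        for _ in range(steps):
            inputs = fgsm(model, x, y) if adversarial else x
            opt.zero_grad()
            F.cross_entropy(model(inputs), y).backward()
            opt.step()
        return model.state_dict()

    def fedavg(states):
        return {k: torch.stack([s[k] for s in states]).mean(0) for k in states[0]}

    # Toy round: clients 0-1 can afford AT, clients 2-4 do standard training.
    global_model = torch.nn.Linear(10, 2)
    clients = [(torch.randn(16, 10), torch.randint(0, 2, (16,))) for _ in range(5)]
    states = [local_train(global_model, d, adversarial=(i < 2)) for i, d in enumerate(clients)]
    global_model.load_state_dict(fedavg(states))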
In addition to adversarial examples, recent works [83], [84] have validated that API services (the victim/target model) can be easily stolen and are vulnerable to adversarial-example transferability attacks. It would be interesting to explore whether the collaboratively built global model in FL faces a similar problem and how to effectively claim ownership of the trained model [189].

13) Relationship With GDPR: GDPR (https://gdpr-info.eu) defines six core principles as guidelines for service providers to manage personal data: 1) lawfulness, fairness, and transparency; 2) purpose limitation; 3) data minimization; 4) accuracy; 5) storage limitation; and 6) integrity and confidentiality (security). GDPR also requires data controllers to provide the following rights for data subjects where applicable (GDPR Articles 12–23): 1) right to be informed; 2) right of access; 3) right to rectification; 4) right to erasure (right to be forgotten); 5) right to restrict processing; 6) right to data portability; 7) right to object; and 8) rights in relation to automated decision making and profiling. Although FL has emerged as a prospective solution that facilitates distributed collaborative learning without disclosing original training data, FL is unfortunately not naturally compliant with GDPR, as pointed out by a recent survey [190] dedicated to the relationship between FL and GDPR requirements. For example, the secure aggregation mechanism in FL amplifies the lack of transparency and fairness in FL systems and, thus, fails to fully comply with the GDPR requirements of fairness and transparency; malicious participants in FL may conduct either data or model poisoning attacks for an unauthorized purpose; and local ML model parameters obtained from participants are no longer minimal for the original purpose. These possible attacks, which lead to noncompliance with GDPR, should be addressed. Hence, it is worthwhile to explore approaches that empower FL-based systems to follow the GDPR regulatory guidelines and, thus, fully comply with GDPR.
14) Threats and Protections of VFL and FTL: This survey mainly focuses on threats to HFL; there are some recent exploratory efforts on threats and protections of VFL and FTL. For VFL, SecureBoost [191] considered user privacy and data confidentiality in VFL and presented an approach to collaboratively train a high-quality tree-boosting model. A recent work called FederBoost [192] pointed out that SecureBoost is expensive, since it requires cryptographic computation and communication for each possible split; thus, the authors proposed a vertical FederBoost, which does not require any cryptographic operation. Another recent work by Jin et al. [193] uncovered the risk of catastrophic data leakage in vertical FL (CAFE) through a novel algorithm that can perform large-batch data leakage with high data recovery quality and theoretical guarantees. They empirically demonstrated that CAFE can recover large-scale private data from the shared aggregated gradients in VFL settings, overcoming the batch limitation of current data leakage attacks.

For FTL, Gao et al. [24] proposed an end-to-end privacy-preserving multiparty learning approach with two variants based on HE and secret-sharing techniques, respectively, in order to build a heterogeneous FTL (HFTL) framework. Liu et al. [23] adopted two secure approaches, namely, HE and secret sharing, for preserving privacy. The HE approach is simple but computationally expensive. By contrast, the secret-sharing approach offers the following advantages: 1) there is no accuracy loss and 2) computation is much faster than with HE. The major drawback of the secret-sharing approach is that one has to generate and store many triplets offline before the online computation.

Overall, there is still a large unexplored space for VFL and FTL. It is worth further investigation as to whether existing threats in HFL are all valid in VFL and FTL, or whether there are new threats and countermeasures specific to VFL and FTL.
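As a minimal illustration of the secret-sharing building block mentioned above, the numpy sketch below performs additive secret sharing over the reals for readability (real protocols work over a finite ring, and Beaver-triplet generation for multiplications is omitted); every name is illustrative:

    import numpy as np

    rng = np.random.default_rng(0)

    def share(secret: np.ndarray, n_parties: int) -> list:
        """Split a secret vector into n additive shares that sum back to the secret."""
        shares = [rng.normal(size=secret.shape) for _ in range(n_parties - 1)]
        shares.append(secret - sum(shares))
        return shares

    # Two parties secret-share their private updates with each other.
    u_a, u_b = np.array([1.0, 2.0, 3.0]), np.array([-0.5, 0.5, 1.5])
    shares_a, shares_b = share(u_a, 2), share(u_b, 2)

    # Each party locally adds the shares it holds (one share of each input);
    # only the aggregate is ever revealed.
    partial_0 = shares_a[0] + shares_b[0]
    partial_1 = shares_a[1] + shares_b[1]
    print(partial_0 + partial_1)                          # equals u_a + u_b
    print(np.allclose(partial_0 + partial_1, u_a + u_b))  # individual updates stay hidden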
15) Vulnerabilities to Free-Riding Participants: In FL systems, there may exist free-riders who aim to benefit from the global model without contributing any useful information, thus compromising collaborative fairness [183], [184], [194]. The main incentives for free-riding include: 1) the participant does not have any data to train a local model; 2) the participant is too concerned about its privacy and, thus, chooses to release fake updates; and 3) the participant does not want to spend, or does not have, the local computation power to train a local model. In the current FL paradigm [6], all participants receive the same federated model at the end of collaborative training, regardless of their individual contributions. This makes the paradigm vulnerable to free-riding participants [166], [183], [184], [195], [196]. How to prevent free-riding remains an open challenge. Incentivized FL (allocating different reputations to different participants and penalizing unreliable or malicious participants) [194], [197] would be an important direction to help address the free-riding problem and possibly the aforementioned privacy and poisoning attack problems in Sections III and IV.
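A simplified numpy sketch of a reputation-style mechanism in this spirit is shown below: clients whose updates align poorly with the weighted aggregate (e.g., a free-rider uploading noise) see their reputation, and hence their aggregation weight, decay. The update rule and constants are illustrative and not the exact mechanisms of [166], [194], [197]:

    import numpy as np

    def cosine(a: np.ndarray, b: np.ndarray) -> float:
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

    def reputation_round(updates: list, reputation: np.ndarray, decay: float = 0.8):
        """Aggregate with reputation weights, then update reputations by alignment."""
        w = reputation / reputation.sum()
        aggregate = np.sum([wi * u for wi, u in zip(w, updates)], axis=0)
        alignment = np.array([max(cosine(u, aggregate), 0.0) for u in updates])
        reputation = decay * reputation + (1 - decay) * alignment
        return aggregate, reputation

    rng = np.random.default_rng(1)
    honest = [np.array([1.0, 1.0, -0.5]) + 0.1 * rng.normal(size=3) for _ in range(4)]
    reputation = np.ones(5)
    for _ in range(20):
        updates = honest + [rng.normal(size=3)]   # free-rider sends fresh noise each round
        _, reputation = reputation_round(updates, reputation)
    print(reputation)   # the free-rider (last entry) typically ends up with the lowest reputation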
16) Reliability of FL Over Wireless Networks: When FL systems are deployed in the real world, unreliable data may be uploaded by mobile devices (i.e., workers). Workers may perform unreliable updates intentionally, e.g., through data poisoning attacks, or unintentionally, e.g., by uploading low-quality data caused by energy constraints or high-speed mobility [198]. Similarly, when FL meets UAVs, reliability is a key factor that may affect performance [14]. Therefore, identifying trusted and reliable workers for FL tasks becomes critical. The concept of reputation could be used to design reliable worker selection strategies in order to keep low-quality devices from affecting the learning efficiency and accuracy [198].
17) Extension to Decentralized FL: Decentralized FL is an emerging research area in which there is no single central server in the system [3], [7], [183], [184]. Decentralized FL is potentially more useful for H2B scenarios where the business participants do not trust any third party. In this paradigm, each participant could be elected as the server in a round-robin manner. The recently emerging swarm learning [199] can be deemed a decentralized FL framework, which unites edge computing, blockchain-based peer-to-peer networking, and coordination while maintaining confidentiality, without the need for a central coordinator. It is interesting to investigate whether existing threats to server-based FL still apply in decentralized FL.
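As a toy illustration, the numpy sketch below rotates the aggregator role among participants in a round-robin manner; in a real deployment, the exchange would run over a peer-to-peer or blockchain layer as in swarm learning [199], and all names and dynamics here are illustrative:

    import numpy as np

    rng = np.random.default_rng(0)

    def local_update(model: np.ndarray, client_id: int) -> np.ndarray:
        """Stand-in for one epoch of local training on client client_id."""
        return model - 0.1 * (model - targets[client_id])

    n_clients, dim = 4, 3
    targets = rng.normal(size=(n_clients, dim))     # each client's local optimum (toy data)
    model = np.zeros(dim)

    for rnd in range(8):
        aggregator = rnd % n_clients                # round-robin election of the "server"
        updates = [local_update(model, c) for c in range(n_clients)]
        model = np.mean(updates, axis=0)            # averaging performed by client `aggregator`
    print(model, targets.mean(axis=0))              # the model drifts toward the average optimum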
18) Efficient FL With Single-Round Communication: In addition to privacy and robustness, communication cost is another major concern that may hinder the large-scale deployment of FL [200]. One-shot FL has recently emerged as a promising approach for communication efficiency: it allows the central server to learn a model in a single communication round. Despite the low communication cost, existing one-shot FL methods are mostly impractical or face inherent limitations, e.g., a public dataset is required, participants’ models must be homogeneous, additional data/model information needs to be uploaded, or the performance is unsatisfactory [201], [202], [203]. Recent work proposed a more practical data-free approach named DENSE for one-shot FL with heterogeneity [204]. Other one-shot FL approaches with practical assumptions are worthwhile to explore, considering the alluring communication efficiency and the smaller privacy and robustness attack surface exposed by one-shot FL.
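The simplest one-shot recipe, roughly in the spirit of [201], is for each client to upload one locally trained model and for the server to ensemble their predictions; the scikit-learn models and synthetic shards below are illustrative assumptions:

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    rng = np.random.default_rng(0)

    # Synthetic non-i.i.d. shards: each client sees a shifted version of the task.
    def make_client_data(shift):
        x = rng.normal(size=(200, 5)) + shift
        y = (x.sum(axis=1) > 5 * shift).astype(int)
        return x, y

    clients = [make_client_data(s) for s in (0.0, 0.5, 1.0)]

    # Single communication round: each client uploads one locally trained model.
    local_models = [LogisticRegression(max_iter=200).fit(x, y) for x, y in clients]

    # Server-side ensemble: average the clients' predicted probabilities.
    x_test = rng.normal(size=(50, 5)) + 0.5
    probs = np.mean([m.predict_proba(x_test)[:, 1] for m in local_models], axis=0)
    y_pred = (probs > 0.5).astype(int)
    print(y_pred[:10])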
19) Achieving Multiple Objectives Simultaneously: There are no existing works that can satisfy multiple goals simultaneously: 1) fast algorithmic convergence; 2) good generalization performance; 3) communication efficiency; 4) fault tolerance; 5) privacy preservation; and 6) robustness to targeted and untargeted poisoning attacks and to free-riders. Previous works have attempted to address several objectives at the same time. For example, Lyu et al. [183], [184] addressed collaborative fairness and privacy simultaneously; Xu and Lyu [166] proposed a robust and fair FL (RFFL) framework to address both collaborative fairness and Byzantine robustness. However, it is important to highlight that there is an inherent conflict between privacy and robustness: defending against robustness attacks usually requires complete control of the training process or access to the training data [37], [40], [63], [159], [205], [206], which goes against the privacy requirements of FL. Although encryption- or DP-based techniques can provide provable privacy preservation, they are not robust to poisoning attacks and may produce models with undesirably poor privacy-utility tradeoffs. Agarwal et al. [30] combined DP with model compression techniques to reduce communication costs and obtain privacy benefits simultaneously. It remains largely unexplored, and there exist large gaps, as to how to simultaneously achieve all of the above six objectives.
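For concreteness, the client-side recipe underlying DP-based protection can be sketched in a few lines of Python: clip the update’s norm and add calibrated Gaussian noise before sharing (a compressed variant in the spirit of cpSGD [30] would additionally quantize the noisy update). The clipping bound and noise multiplier below are illustrative and do not by themselves constitute a full DP accounting:

    import numpy as np

    rng = np.random.default_rng(0)

    def privatize_update(update: np.ndarray, clip_norm: float = 1.0,
                         noise_multiplier: float = 1.1) -> np.ndarray:
        """Clip the update to a fixed L2 norm and add Gaussian noise before sharing."""
        norm = np.linalg.norm(update)
        clipped = update * min(1.0, clip_norm / (norm + 1e-12))
        noise = rng.normal(scale=noise_multiplier * clip_norm, size=update.shape)
        return clipped + noise

    raw = np.array([3.0, -1.0, 0.5, 2.0])        # a client's raw model delta
    shared = privatize_update(raw)
    print(shared)                                # what the server actually sees
    # Note: the same noise that protects privacy also blurs the statistics that
    # robust aggregators rely on, illustrating the privacy-robustness tension.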
VIII. CONCLUSION

Although FL is still in its infancy, it will continue to thrive and will remain an active and important research area for the foreseeable future. As FL evolves, so will the privacy and robustness threats to FL. It is of vital importance to provide a broad overview of current attacks and defenses on FL, so that future FL system designers are well aware of the potential vulnerabilities in current designs and can clear the roadblocks toward the real-world deployment of FL. This survey serves as a concise and accessible overview of this topic, and it should greatly help our understanding of the privacy and robustness attack and defense landscape in FL. Global collaboration on FL is emerging through a number of workshops at leading AI conferences (http://www.federated-learning.org/). The ultimate goal of developing a general-purpose FL defense mechanism that is robust against various attacks without degrading model performance will require an interdisciplinary effort from the wider research community.

ACKNOWLEDGMENT

Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not reflect the views of the National Research Foundation, Singapore.

REFERENCES

[1] H. Li, K. Ota, and M. Dong, “Learning IoT in edge: Deep learning for the Internet of Things with edge computing,” IEEE Netw., vol. 32, no. 1, pp. 96–101, Jan. 2018.
[2] M. Abadi et al., “Deep learning with differential privacy,” in Proc. CCS, 2016, pp. 308–318.
[3] Q. Yang, Y. Liu, Y. Cheng, Y. Kang, T. Chen, and H. Yu, “Federated learning,” Synth. Lect. Artif. Intell. Mach. Learn., vol. 13, no. 3, pp. 1–207, 2019.
[4] H. B. McMahan, E. Moore, D. Ramage, S. Hampson, and B. A. Y. Arcas, “Communication-efficient learning of deep networks from decentralized data,” 2016, arXiv:1602.05629.
[5] K. Bonawitz et al., “Practical secure aggregation for privacy-preserving machine learning,” in Proc. CCS, 2017, pp. 1175–1191.
[6] B. McMahan, E. Moore, D. Ramage, S. Hampson, and B. A. Y. Arcas, “Communication-efficient learning of deep networks from decentralized data,” in Artificial Intelligence and Statistics. PMLR, 2017, pp. 1273–1282.
[7] H. B. McMahan, D. Ramage, K. Talwar, and L. Zhang, “Learning differentially private recurrent language models,” in Proc. ICLR, 2018, pp. 1–14.
[8] Y. Liu et al., “FedVision: An online visual object detection platform powered by federated learning,” in Proc. IAAI, 2020, pp. 13172–13179.
[9] S. Hardy et al., “Private federated learning on vertically partitioned data via entity resolution and additively homomorphic encryption,” 2017, arXiv:1711.10677.
[10] C. Wu, F. Wu, L. Lyu, Y. Huang, and X. Xie, “Communication-efficient federated learning via knowledge distillation,” Nature Commun., vol. 13, no. 1, pp. 1–8, 2022.
[11] C. Wu, F. Wu, L. Lyu, Y. Huang, and X. Xie, “FedCTR: Federated native ad CTR prediction with cross platform user behavior data,” ACM Trans. Intell. Syst. Technol., vol. 13, no. 4, pp. 62:1–62:19, 2022.
[12] J. Cui, C. Chen, L. Lyu, C. Yang, and W. Li, “Exploiting data sparsity in secure cross-platform social recommendation,” in Proc. NIPS, 2021, pp. 10524–10534.
[13] J. Li, L. Lyu, X. Liu, X. Zhang, and X. Lyu, “FLEAM: A federated learning empowered architecture to mitigate DDoS in industrial IoT,” IEEE Trans. Ind. Informat., vol. 18, no. 6, pp. 4059–4068, Jun. 2021.
[14] H. Yang, J. Zhao, Z. Xiong, K.-Y. Lam, S. Sun, and L. Xiao, “Privacy- [42] N. Papernot, M. Abadi, U. Erlingsson, I. Goodfellow, and K. Talwar,
preserving federated learning for UAV-enabled networks: Learning- “Semi-supervised knowledge transfer for deep learning from private
based joint scheduling and resource management,” IEEE J. Sel. Areas training data,” in Proc. ICLR, 2017, pp. 1–16.
Commun., vol. 39, no. 10, pp. 3144–3159, Oct. 2021. [43] L. Lyu, “Privacy-preserving machine learning and data aggregation for
[15] C. Wu, F. Wu, Y. Cao, Y. Huang, and X. Xie, “FedGNN: Federated Internet of Things,” Ph.D. dissertation, Dept. Elect. Electron. Eng.,
graph neural network for privacy-preserving recommendation,” 2021, Univ. Melbourne, Melbourne, VIC, Australia, 2018.
arXiv:2102.04925. [44] L. Lyu et al., “Distributed privacy-preserving prediction,” in Proc. Int.
[16] C. Chen et al., “Vertically federated graph neural network for privacy- Conf. Syst., Man, Cybern., 2020.
preserving node classification,” in Proc. IJCAI, 2022, pp. 1–9. [45] L. Sun and L. Lyu, “Federated model distillation with noise-free
[17] X. Ni, X. Xu, L. Lyu, C. Meng, and W. Wang, “A vertical fed- differential privacy,” in Proc. IJCAI, 2021.
erated learning framework for graph convolutional network,” 2021, [46] S. Truex et al., “A hybrid approach to privacy-preserving federated
arXiv:2106.11593. learning,” in Proc. 12th ACM Workshop Artif. Intell. Secur., 2019,
[18] C. Wu, F. Wu, L. Lyu, T. Qi, Y. Huang, and X. Xie, “A federated graph pp. 1–11.
neural network framework for privacy-preserving personalization,” [47] Y. Liu, Y. Xie, and A. Srivastava, “Neural trojans,” in Proc. IEEE Int.
Nature Commun., vol. 13, no. 1, pp. 1–10, Dec. 2022. Conf. Comput. Design (ICCD), Nov. 2017, pp. 45–48.
[19] Q. Yang, Y. Liu, T. Chen, and Y. Tong, “Federated machine learning:
[48] B. G. Doan, E. Abbasnejad, and D. C. Ranasinghe, “Februus: Input
Concept and applications,” ACM Trans. Intell. Syst. Technol., vol. 10,
purification defense against trojan attacks on deep neural network sys-
no. 2, pp. 1–19, 2019.
tems,” in Proc. Annu. Comput. Secur. Appl. Conf., 2020, pp. 897–912.
[20] M. Kantarcioglu and C. Clifton, “Privacy-preserving distributed mining
[49] S. Udeshi, S. Peng, G. Woo, L. Loh, L. Rawshan, and S. Chattopad-
of association rules on horizontally partitioned data,” IEEE Trans.
hyay, “Model agnostic defence against backdoor attacks in machine
Knowl. Data Eng., vol. 16, no. 9, pp. 1026–1037, Sep. 2004.
learning,” IEEE Trans. Rel., vol. 71, no. 2, pp. 880–895, Jun. 2022.
[21] J. Vaidya and C. Clifton, “Privacy preserving association rule mining
in vertically partitioned data,” in Proc. KDD, 2002, pp. 639–644. [50] M. Villarreal-Vasquez and B. Bhargava, “ConFoc: Content-focus
[22] S. J. Pan and Q. Yang, “A survey on transfer learning,” IEEE Trans. protection against trojan attacks on neural networks,” 2020,
Knowl. Data Eng., vol. 22, no. 10, pp. 1345–1359, Dec. 2009. arXiv:2007.00711.
[23] Y. Liu, Y. Kang, C. Xing, T. Chen, and Q. Yang, “A secure feder- [51] Y. Li, T. Zhai, B. Wu, Y. Jiang, Z. Li, and S. Xia, “Rethinking the
ated transfer learning framework,” IEEE Intell. Syst., vol. 35, no. 4, trigger of backdoor attack,” 2020, arXiv:2004.04692.
pp. 70–82, Jul./Aug. 2020. [52] B. Tran, J. Li, and A. Madry, “Spectral signatures in backdoor attacks,”
[24] D. Gao, Y. Liu, A. Huang, C. Ju, H. Yu, and Q. Yang, “Privacy- in Proc. NIPS, 2018, pp. 8000–8010.
preserving heterogeneous federated transfer learning,” in Proc. IEEE [53] B. Chen et al., “Detecting backdoor attacks on deep neural networks
Int. Conf. Big Data (Big Data), Dec. 2019, pp. 2552–2559. by activation clustering,” 2018, arXiv:1811.03728.
[25] D. Li and J. Wang, “FedMD: Heterogenous federated learning via [54] D. Tang, X. Wang, H. Tang, and K. Zhang, “Demon in the variant:
model distillation,” 2019, arXiv:1910.03581. Statistical analysis of DNNs for robust backdoor contamination detec-
[26] R. Liu et al., “No one left behind: Inclusive federated learning over tion,” in Proc. 30th USENIX Secur. Symp. (USENIX Security), 2021,
heterogeneous devices,” in Proc. KDD, 2022, pp. 1–9. pp. 1541–1558.
[27] A. Bhowmick, J. Duchi, J. Freudiger, G. Kapoor, and R. Rogers, “Pro- [55] E. Soremekun, S. Udeshi, and S. Chattopadhyay, “Exposing backdoors
tection against reconstruction and its applications in private federated in robust machine learning models,” 2020, arXiv:2003.00865.
learning,” 2018, arXiv:1812.00984. [56] A. Chan and Y.-S. Ong, “Poison as a cure: Detecting & neutraliz-
[28] L. Melis, C. Song, E. De Cristofaro, and V. Shmatikov, “Exploiting ing variable-sized backdoor attacks in deep neural networks,” 2019,
unintended feature leakage in collaborative learning,” in Proc. IEEE arXiv:1911.08040.
Symp. Secur. Privacy (SP), May 2019, pp. 691–706. [57] E. Chou, F. Tramer, and G. Pellegrino, “SentiNet: Detecting localized
[29] L. Zhu, Z. Liu, and S. Han, “Deep leakage from gradients,” in Proc. universal attack against deep learning systems,” in Proc. IEEE Secur.
NIPS, 2019, pp. 14747–14756. Privacy Workshops (SPW), May 2020, pp. 48–54.
[30] N. Agarwal, A. T. Suresh, F. X. X. Yu, S. Kumar, and B. McMahan, [58] Y. Liu, X. Ma, J. Bailey, and F. Lu, “Reflection backdoor: A natural
“CpSGD: Communication-efficient and differentially-private distrib- backdoor attack on deep neural networks,” in Proc. ECCV. Springer,
uted SGD,” in Proc. NIPS, 2018, pp. 7564–7575. 2020, pp. 182–199.
[31] L. T. Phong, Y. Aono, T. Hayashi, L. Wang, and S. Moriai, “Privacy- [59] Y. Li, X. Lyu, N. Koren, L. Lyu, B. Li, and X. Ma, “Neural attention
preserving deep learning via additively homomorphic encryption,” distillation: Erasing backdoor triggers from deep neural networks,” in
IEEE Trans. Inf. Forensics Security, vol. 13, no. 5, pp. 1333–1345, Proc. ICLR, 2021, pp. 1–19.
May 2018. [60] Y. Li, X. Lyu, N. Koren, L. Lyu, B. Li, and X. Ma, “Anti-backdoor
[32] B. Zhao, K. R. Mopuri, and H. Bilen, “IDLG: Improved deep leakage learning: Training clean models on poisoned data,” in Proc. NIPS, 2021,
from gradients,” 2020, arXiv:2001.02610. pp. 14900–14912.
[33] H. Wang et al., “Attack of the tails: Yes, you really can backdoor [61] G. Damaskinos, E. M. El Mhamdi, R. Guerraoui, A. H. A. Guirguis,
federated learning,” in Proc. NIPS, 2020, pp. 1–15. and S. L. A. Rouault, “Aggregathor: Byzantine machine learning via
[34] C. Xie, K. Huang, P. Chen, and B. Li, “DBA: Distributed backdoor robust gradient aggregation,” in Proc. SysML, 2019, pp. 81–106.
attacks against federated learning,” in Proc. ICLR, 2020, pp. 1–19. [62] Y. Chen, L. Su, and J. Xu, “Distributed statistical machine learning
[35] E. Bagdasaryan, A. Veit, Y. Hua, D. Estrin, and V. Shmatikov, “How to in adversarial settings: Byzantine gradient descent,” ACM Meas. Anal.
backdoor federated learning,” in Proc. Int. Conf. Artif. Intell. Statist., Comput. Syst., vol. 1, no. 2, p. 44, 2017.
2020, pp. 2938–2948.
[63] D. Yin, Y. Chen, R. Kannan, and P. Bartlett, “Byzantine-robust distrib-
[36] A. N. Bhagoji, S. Chakraborty, P. Mittal, and S. Calo, “Analyzing uted learning: Towards optimal statistical rates,” in Proc. ICML, 2018,
federated learning through an adversarial lens,” in Proc. ICML, 2019,
pp. 5650–5659.
pp. 634–643.
[64] K. Pillutla, S. M. Kakade, and Z. Harchaoui, “Robust aggrega-
[37] C. Fung, C. J. Yoon, and I. Beschastnikh, “The limitations of federated
tion for federated learning,” IEEE Trans. Signal Process., vol. 70,
learning in Sybil settings,” in Proc. 23rd Int. Symp. Res. Attacks,
pp. 1142–1154, 2022.
Intrusions Defenses (RAID), 2020, pp. 301–316.
[38] Z. Sun, P. Kairouz, A. T. Suresh, and H. B. McMahan, “Can you really [65] M. S. Ozdayi, M. Kantarcioglu, and Y. R. Gel, “Defending against
backdoor federated learning?” 2019, arXiv:1911.07963. backdoors in federated learning with robust learning rate,” in Proc.
[39] J. Bernstein, J. Zhao, K. Azizzadenesheli, and A. Anandkumar, AAAI Conf. Artif. Intell., vol. 35, no. 10, 2021, pp. 9268–9276.
“SignSGD with majority vote is communication efficient and fault [66] T. Li, A. K. Sahu, A. Talwalkar, and V. Smith, “Federated learning:
tolerant,” in Proc. ICLR, 2019, pp. 1–20. Challenges, methods, and future directions,” IEEE Signal Process.
[40] P. Blanchard et al., “Machine learning with adversaries: Byzantine Mag., vol. 37, no. 3, pp. 50–60, May 2020.
tolerant gradient descent,” in Proc. NIPS, 2017, pp. 119–129. [67] P. Kairouz et al., “Advances and open problems in federated learning,”
[41] E. Jeong, S. Oh, H. Kim, J. Park, M. Bennis, and S.-L. Kim, 2019, arXiv:1912.04977.
“Communication-efficient on-device machine learning: Federated dis- [68] Q. Li et al., “A survey on federated learning systems: Vision, hype
tillation and augmentation under non-IID private data,” 2018, and reality for data privacy and protection,” IEEE Trans. Knowl. Data
arXiv:1811.11479. Eng., early access, Nov. 2, 2021, doi: 10.1109/TKDE.2021.3124599.
[69] V. Mothukuri, R. M. Parizi, S. Pouriyeh, Y. Huang, A. Dehghantanha, [96] M. Fredrikson, S. Jha, and T. Ristenpart, “Model inversion attacks
and G. Srivastava, “A survey on security and privacy of federated learn- that exploit confidence information and basic countermeasures,” in
ing,” Future Gener. Comput. Syst., vol. 115, pp. 619–640, Feb. 2021. Proc. 22nd ACM SIGSAC Conf. Comput. Commun. Secur., Oct. 2015,
[70] C. Zhang, Y. Xie, H. Bai, B. Yu, W. Li, and Y. Gao, “A survey pp. 1322–1333.
on federated learning,” Knowl.-Based Syst., vol. 216, Mar. 2021, [97] S. Guo, C. Xie, J. Li, L. Lyu, and T. Zhang, “Threats to pre-trained
Art. no. 106775. language models: Survey and taxonomy,” 2022, arXiv:2202.06862.
[71] M. Nasr, R. Shokri, and A. Houmansadr, “Comprehensive privacy [98] B. I. P. Rubinstein et al., “ANTIDOTE: Understanding and defending
analysis of deep learning: Passive and active white-box inference against poisoning of anomaly detectors,” in Proc. 9th ACM SIGCOMM
attacks against centralized and federated learning,” in Proc. IEEE Symp. Internet Measurement Conf., 2009, pp. 1–14.
Secur. Privacy (SP), May 2019, pp. 739–753. [99] M. Jagielski, A. Oprea, B. Biggio, C. Liu, C. Nita-Rotaru, and B. Li,
[72] Y. Huang, S. Gupta, Z. Song, K. Li, and S. Arora, “Evaluating gradient “Manipulating machine learning: Poisoning attacks and countermea-
inversion attacks and defenses in federated learning,” in Proc. NIPS, sures for regression learning,” in Proc. IEEE Symp. Secur. Privacy
2021, pp. 7232–7241. (SP), May 2018, pp. 19–35.
[73] V. Shejwalkar, A. Houmansadr, P. Kairouz, and D. Ramage, “Back [100] C. Xie, O. Koyejo, and I. Gupta, “Fall of empires: Breaking Byzantine-
to the drawing board: A critical evaluation of poisoning attacks on tolerant SGD by inner product manipulation,” in Uncertainty in Artifi-
production federated learning,” in Proc. IEEE Symp. Secur. Privacy cial Intelligence. PMLR, 2020, pp. 261–270.
(SP), May 2022, pp. 1354–1371. [101] B. Nelson et al., “Exploiting machine learning to subvert your spam
[74] B. Biggio, B. Nelson, and P. Laskov, “Support vector machines under filter,” in Proc. LEET, vol. 8, 2008, pp. 1–9.
adversarial label noise,” in Proc. ACML, 2011, pp. 97–112.
[102] L. Huang, A. D. Joseph, B. Nelson, B. I. Rubinstein, and J. D. Tygar,
[75] B. Hitaj, G. Ateniese, and F. Perez-Cruz, “Deep models under the
“Adversarial machine learning,” in Proc. 4th ACM Workshop Secur.
GAN: Information leakage from collaborative deep learning,” in Proc.
Artif. Intell., 2011, pp. 43–58.
ACM SIGSAC Conf. Comput. Commun. Secur., Oct. 2017, pp. 603–618.
[103] X. Chen, C. Liu, B. Li, K. Lu, and D. Song, “Targeted back-
[76] C. Miao, Q. Li, H. Xiao, W. Jiang, M. Huai, and L. Su, “Towards data
door attacks on deep learning systems using data poisoning,” 2017,
poisoning attacks in crowd sensing systems,” in Proc. 18th ACM Int.
arXiv:1712.05526.
Symp. Mobile Ad Hoc Netw. Comput., Jun. 2018, pp. 111–120.
[77] C. Miao, Q. Li, L. Su, M. Huai, W. Jiang, and J. Gao, “Attack [104] T. Gu, B. Dolan-Gavitt, and S. Garg, “BadNets: Identifying vul-
under disguise: An intelligent data poisoning attack mechanism in nerabilities in the machine learning model supply chain,” 2017,
crowdsourcing,” in Proc. World Wide Web Conf., 2018, pp. 13–22. arXiv:1708.06733.
[78] H. Zhang et al., “Data poisoning attack against knowledge graph [105] A. Shafahi et al., “Poison frogs! Targeted clean-label poisoning attacks
embedding,” 2019, arXiv:1904.12052. on neural networks,” in Proc. NIPS, 2018, pp. 6103–6113.
[79] G. Sun, Y. Cong, J. Dong, Q. Wang, L. Lyu, and J. Liu, “Data poisoning [106] Y. Liu et al., “Trojaning attack on neural networks,” in Proc. NDSS,
attacks on federated machine learning,” IEEE Internet Things J., vol. 9, 2018, pp. 1–17.
no. 13, pp. 11365–11375, Jul. 2022. [107] L. Lamport, R. Shostak, and M. Pease, “The Byzantine generals
[80] M. Fang, X. Cao, J. Jia, and N. Gong, “Local model poisoning attacks problem,” ACM Trans. Program. Lang. Syst., vol. 4, no. 3, pp. 382–401,
to Byzantine-robust federated learning,” in Proc. 29th USENIX Secur. Jul. 1982.
Symp. (USENIX), 2020, pp. 1605–1622. [108] C. Chen, J. Zhang, A. K. H. Tung, M. Kankanhalli, and G. Chen,
[81] B. Biggio, B. Nelson, and P. Laskov, “Poisoning attacks against support “Robust federated recommendation system,” 2020, arXiv:2006.08259.
vector machines,” 2012, arXiv:1206.6389. [109] G. Baruch, M. Baruch, and Y. Goldberg, “A little is enough: Cir-
[82] F. Tramèr, F. Zhang, A. Juels, M. K. Reiter, and T. Ristenpart, “Stealing cumventing defenses for distributed learning,” in Proc. NIPS, 2019,
machine learning models via prediction APIs,” in Proc. USENIX Secur. pp. 8632–8642.
Symp., 2016, pp. 601–618. [110] T. A. Nguyen and A. Tran, “Input-aware dynamic backdoor attack,” in
[83] X. He, L. Lyu, L. Sun, and Q. Xu, “Model extraction and adversarial Proc. NIPS, 2020, pp. 3454–3464.
transferability, your BERT is vulnerable!” in Proc. Conf. North Amer. [111] L. Muñoz-González et al., “Towards poisoning of deep learning algo-
Chapter Assoc. Comput. Linguistics, Hum. Lang. Technol., 2021, rithms with back-gradient optimization,” in Proc. 10th ACM Workshop
pp. 2006–2012. Artif. Intell. Secur., 2017, pp. 27–38.
[84] Q. Xu, X. He, L. Lyu, L. Qu, and G. Haffari, “Beyond model extraction: [112] P. W. Koh and P. Liang, “Understanding black-box predictions via
Imitation attack for black-box NLP APIs,” in Proc. COLING, 2022, influence functions,” in Proc. ICML, 2017, pp. 1885–1894.
pp. 1–12. [113] S. Zhao, X. Ma, X. Zheng, J. Bailey, J. Chen, and Y.-G. Jiang, “Clean-
[85] X. He et al., “CATER: Intellectual property protection on text genera- label backdoor attacks on video recognition models,” in Proc. CVPR,
tion APIs via conditional watermarks,” in Proc. NIPS, 2022, pp. 1–19. Jun. 2020, pp. 14443–14452.
[86] R. Shokri, M. Stronati, C. Song, and V. Shmatikov, “Membership [114] Y. Zeng, M. Pan, H. A. Just, L. Lyu, M. Qiu, and R. Jia, “Narcissus: A
inference attacks against machine learning models,” in Proc. IEEE practical clean-label backdoor attack with limited information,” 2022,
Symp. Secur. Privacy (SP), May 2017, pp. 3–18. arXiv:2204.05255.
[87] M. Barreno, B. Nelson, R. Sears, A. D. Joseph, and J. D. Tygar, “Can
[115] J. R. Douceur, “The Sybil attack,” in Proc. Int. Workshop Peer-to-Peer
machine learning be secure?” in Proc. ICCS, 2006, pp. 16–25.
Syst., 2002, pp. 251–260.
[88] B. Biggio et al., “Evasion attacks against machine learning at test time,”
in Proc. Joint Eur. Conf. Mach. Learn. Knowl. Discovery Databases, [116] Y. Aono, T. Hayashi, L. T. Phong, and L. Wang, “Scalable and secure
2013, pp. 387–402. logistic regression via homomorphic encryption,” in Proc. 6th ACM
Conf. Data Appl. Secur. Privacy, Mar. 2016, pp. 142–144.
[89] C. Szegedy et al., “Intriguing properties of neural networks,” 2013,
arXiv:1312.6199. [117] M. Kim, Y. Song, S. Wang, Y. Xia, and X. Jiang, “Secure logistic
[90] G. Zizzo, A. Rawat, M. Sinn, and B. Buesser, “FAT: Federated regression based on homomorphic encryption: Design and evaluation,”
adversarial training,” 2020, arXiv:2012.01791. JMIR Med. Informat., vol. 6, no. 2, p. e19, Apr. 2018.
[91] J. Hong, H. Wang, Z. Wang, and J. Zhou, “Federated robustness [118] R. C. Geyer, T. Klein, and M. Nabi, “Differentially private federated
propagation: Sharing robustness in heterogeneous federated learning,” learning: A client level perspective,” 2017, arXiv:1712.07557.
2021, arXiv:2106.10196. [119] T. T. Nguyên, X. Xiao, Y. Yang, S. C. Hui, H. Shin, and J. Shin,
[92] D. Shah, P. Dube, S. Chakraborty, and A. Verma, “Adversarial “Collecting and analyzing data from smart device users with local
training in communication constrained federated learning,” 2021, differential privacy,” 2016, arXiv:1606.05053.
arXiv:2103.01319. [120] N. Wang et al., “Collecting and analyzing multidimensional data with
[93] A. Madry, A. Makelov, L. Schmidt, D. Tsipras, and A. Vladu, “Towards local differential privacy,” in Proc. IEEE 35th Int. Conf. Data Eng.
deep learning models resistant to adversarial attacks,” in Proc. ICLR, (ICDE), Apr. 2019, pp. 638–649.
2018, pp. 1–28. [121] Y. Zhao et al., “Local differential privacy-based federated learning
[94] L. Su and J. Xu, “Securing distributed gradient descent in high for Internet of Things,” IEEE Internet Things J., vol. 8, no. 11,
dimensional statistical learning,” 2018, arXiv:1804.10140. pp. 8836–8853, Nov. 2020.
[95] C. Zhang, S. Bengio, M. Hardt, B. Recht, and O. Vinyals, “Understand- [122] L. Sun, J. Qian, X. Chen, and P. S. Yu, “LDP-FL: Practical private
ing deep learning (still) requires rethinking generalization,” Commun. aggregation in federated learning with local differential privacy,” in
ACM, vol. 64, no. 3, pp. 107–115, 2021. Proc. IJCAI, 2021, pp. 1–9.
[123] S. Truex, L. Liu, K.-H. Chow, M. E. Gursoy, and W. Wei, “LDP-fed: [149] M. Yang, L. Lyu, J. Zhao, T. Zhu, and K.-Y. Lam, “Local differ-
Federated learning with local differential privacy,” in Proc. 3rd ACM ential privacy and its applications: A comprehensive survey,” 2020,
Int. Workshop Edge Syst., Analytics Netw., Apr. 2020, pp. 61–66. arXiv:2008.03686.
[124] L. Lyu, “Lightweight crypto-assisted distributed differential privacy for [150] S. L. Warner, “Randomized response: A survey technique for elimi-
privacy-preserving distributed learning,” in Proc. Int. Joint Conf. Neural nating evasive answer bias,” J. Amer. Statist. Assoc., vol. 60, no. 309,
Netw. (IJCNN), Jul. 2020, pp. 1–8. pp. 63–69, 1965.
[125] P. Mohassel and Y. Zhang, “SecureML: A system for scalable privacy- [151] U. Erlingsson, V. Pihur, and A. Korolova, “Rappor: Randomized
preserving machine learning,” in Proc. IEEE Symp. Secur. Privacy (SP), aggregatable privacy-preserving ordinal response,” in Proc. CCS, 2014,
May 2017, pp. 19–38. pp. 1054–1067.
[152] T. H. Chan, E. Shi, and D. Song, “Optimal lower bound for differen-
[126] P. Paillier et al., “Public-key cryptosystems based on composite degree
tially private multi-party aggregation,” in Proc. Eur. Symp. Algorithms,
residuosity classes,” in Proc. Eurocrypt, vol. 99, 1999, pp. 223–238.
2012, pp. 277–288.
[127] T. ElGamal, “A public key cryptosystem and a signature scheme based [153] T. Chan, K.-M. Chung, B. M. Maggs, and E. Shi, “Foundations of
on discrete logarithms,” IEEE Trans. Inf. Theory, vol. IT-31, no. 4, differentially oblivious algorithms,” in Proc. 30th Annu. ACM-SIAM
pp. 469–472, Jul. 1985. Symp. Discrete Algorithms, 2019, pp. 2448–2467.
[128] C. Gentry, “Fully homomorphic encryption using ideal lattices,” in [154] R. Shokri and V. Shmatikov, “Privacy-preserving deep learning,” in
Proc. 41st Annu. ACM Symp. Symp. theory Comput. (STOC), 2009, Proc. 53rd Annu. Allerton Conf. Commun., Control, Comput. (Allerton),
pp. 169–178. Sep. 2015, pp. 1310–1321.
[129] A. C. Yao, “Protocols for secure computations,” in Proc. 23rd Annu. [155] J. Hamm, Y. Cao, and M. Belkin, “Learning privately from multiparty
Symp. Found. Comput. Sci., Nov. 1982, pp. 160–164. data,” in Proc. ICML, 2016, pp. 555–563.
[130] D. Demmler, T. Schneider, and M. Zohner, “ABY—A framework for [156] L. Lyu and C.-H. Chen, “Differentially private knowledge distillation
efficient mixed-protocol secure two-party computation,” in Proc. Netw. for mobile analytics,” in Proc. 43rd Int. ACM SIGIR Conf. Res.
Distrib. Syst. Secur. Symp., 2015, pp. 1–15. Develop. Inf. Retr., Jul. 2020, pp. 1809–1812.
[131] C. Dwork, F. McSherry, K. Nissim, and A. Smith, “Calibrating noise [157] K. Chaudhuri, C. Monteleoni, and A. D. Sarwate, “Differentially
to sensitivity in private data analysis,” in Proc. Theory Cryptography private empirical risk minimization,” J. Mach. Learn. Res., vol. 12,
Conf., 2006, pp. 265–284. pp. 1069–1109, Mar. 2011.
[132] C. Dwork and A. Roth, “The algorithmic foundations of differen- [158] B. Han, I. W. Tsang, and L. Chen, “On the convergence of a family
tial privacy,” Found. Trends Theor. Comput. Sci., vol. 9, nos. 3–4, of robust losses for stochastic gradient descent,” in Machine Learning
pp. 211–407, Aug. 2014. and Knowledge Discovery in Databases. 2016, pp. 665–680.
[159] S. Shen, S. Tople, and P. Saxena, “A uror: Defending against poisoning
[133] I. Damgård, V. Pastro, N. Smart, and S. Zakarias, “Multiparty computa-
attacks in collaborative deep learning systems,” in Proc. 32nd Annu.
tion from somewhat homomorphic encryption,” in Proc. Annu. Cryptol.
Conf. Comput. Secur. Appl., Dec. 2016, pp. 508–519.
Conf., 2012, pp. 643–662. [160] R. Guerraoui et al., “The hidden vulnerability of distributed learning
[134] R. L. Rivest, A. Shamir, and L. Adleman, “A method for obtaining dig- in byzantium,” in Proc. ICML, 2018, pp. 3521–3530.
ital signatures and public-key cryptosystems,” Commun. ACM, vol. 21, [161] C. Wu, X. Yang, S. Zhu, and P. Mitra, “Mitigating backdoor attacks
no. 2, pp. 120–126, Feb. 1978. in federated learning,” 2020, arXiv:2011.01767.
[135] S. Goryczka and L. Xiong, “A comprehensive comparison of multiparty [162] C. Xie, M. Chen, P.-Y. Chen, and B. Li, “CRFL: Certifiably robust
secure additions with differential privacy,” IEEE Trans. Dependable federated learning against backdoor attacks,” in Proc. ICML, 2021,
Sec. Comput., vol. 14, no. 5, pp. 463–477, Oct. 2015. pp. 11372–11382.
[136] M. S. Riazi, C. Weinert, O. Tkachenko, E. M. Songhori, T. Schneider, [163] L. Chen, H. Wang, Z. Charles, and D. Papailiopoulos, “Draco:
and F. Koushanfar, “Chameleon: A hybrid secure computation frame- Byzantine-resilient distributed training via redundant gradients,” in
work for machine learning applications,” in Proc. Asia Conf. Comput. Proc. ICML, 2018, pp. 903–912.
Commun. Secur., May 2018, pp. 707–721. [164] C. Xie, O. Koyejo, and I. Gupta, “Generalized Byzantine-tolerant
[137] V. Rastogi and S. Nath, “Differentially private aggregation of distrib- SGD,” 2018, arXiv:1802.10116.
uted time-series with transformation and encryption,” in Proc. ACM [165] D. Alistarh, Z. Allen-Zhu, and J. Li, “Byzantine stochastic gradient
SIGMOD Int. Conf. Manage. data, Jun. 2010, pp. 735–746. descent,” in Proc. NIPS, 2018, pp. 4613–4623.
[138] E. Shi, H. Chan, E. Rieffel, R. Chow, and D. Song, “Privacy-preserving [166] X. Xu and L. Lyu, “A reputation mechanism is all you need: Collabo-
rative fairness and adversarial robustness in federated learning,” 2020,
aggregation of time-series data,” in Proc. Annu. Netw. Distrib. Syst.
Secur. Symp. (NDSS), 2011, pp. 1–17. arXiv:2011.10464.
[167] J. Steinhardt, M. Charikar, and G. Valiant, “Resilience: A criterion for
[139] G. Ács and C. Castelluccia, “I have a dream! (differentially private learning in the presence of arbitrary outliers,” 2017, arXiv:1703.04940.
smart metering),” in Proc. Int. Workshop Inf. Hiding. Berlin, Germany: [168] Y. Li, Y. Jiang, Z. Li, and S.-T. Xia, “Backdoor learning: A survey,”
Springer, 2011, pp. 118–132. IEEE Trans. Neural Netw. Learn. Syst., early access, Jun. 22, 2022,
[140] L. Lyu, K. Nandakumar, B. Rubinstein, J. Jin, J. Bedo, and doi: 10.1109/TNNLS.2022.3182979.
M. Palaniswami, “PPFA: Privacy preserving fog-enabled aggregation in [169] B. Wang et al., “Neural cleanse: Identifying and mitigating backdoor
smart grid,” IEEE Trans. Ind. Informat., vol. 14, no. 8, pp. 3733–3744, attacks in neural networks,” in Proc. IEEE Symp. Secur. Privacy (SP),
Aug. 2018. May 2019, pp. 707–723.
[141] V. Chen, V. Pastro, and M. Raykova, “Secure computation for machine [170] H. Chen, C. Fu, J. Zhao, and F. Koushanfar, “Deepinspect: A black-box
learning with SPDZ,” 2019, arXiv:1901.00329. trojan detection and mitigation framework for deep neural networks,”
[142] P. Mohassel and P. Rindal, “ABY 3: A mixed protocol framework for in IJCAI, vol. 2019, pp. 4658–4664.
machine learning,” in Proc. ACM SIGSAC Conf. Comput. Commun. [171] K. Liu, B. Dolan-Gavitt, and S. Garg, “Fine-pruning: Defending against
Secur., Oct. 2018, pp. 35–52. backdooring attacks on deep neural networks,” in Proc. Int. Symp. Res.
[143] J. Li, M. Khodak, S. Caldas, and A. Talwalkar, “Differentially private Attacks, Intrusions, Defenses, 2018, pp. 273–294.
meta-learning,” 2019, arXiv:1909.05830. [172] T. J. L. Tan and R. Shokri, “Bypassing backdoor detection algorithms
[144] N. Papernot, S. Song, I. Mironov, A. Raghunathan, K. Talwar, and in deep learning,” in Proc. IEEE Eur. Symp. Secur. Privacy (EuroS&P),
U. Erlingsson, “Scalable private learning with pate,” in Proc. ICLR, Sep. 2020, pp. 175–183.
2018, pp. 1–34. [173] P. Zhao, P.-Y. Chen, P. Das, K. N. Ramamurthy, and X. Lin, “Bridging
mode connectivity in loss landscapes and adversarial robustness,” 2020,
[145] L. Lyu, H. Yu, and Q. Yang, “Threats to federated learning: A survey,”
arXiv:2005.00060.
2020, arXiv:2003.02133. [174] Y. Yao, H. Li, H. Zheng, and B. Y. Zhao, “Latent backdoor attacks on
[146] J. C. Duchi, M. I. Jordan, and M. J. Wainwright, “Local privacy and deep neural networks,” in Proc. ACM SIGSAC Conf. Comput. Commun.
statistical minimax rates,” in Proc. 54th IEEE Annu. Symp. Found. Secur., Nov. 2019, pp. 2041–2055.
Comput. Sci., Oct. 2013, pp. 429–438. [175] S. Andreina, G. A. Marson, H. Mollering, and G. Karame, “BaFFLe:
[147] C. Dwork, K. Kenthapadi, F. McSherry, I. Mironov, and M. Naor, Backdoor detection via feedback-based federated learning,” in Proc.
“Our data, ourselves: Privacy via distributed noise generation,” in Proc. IEEE 41st Int. Conf. Distrib. Comput. Syst. (ICDCS), Jul. 2021,
Annu. Int. Conf. Theory Appl. Cryptograph. Techn., 2006, pp. 486–503. pp. 852–863.
[148] L. Lyu, X. He, and Y. Li, “Differentially private representation for NLP: [176] H. Chang, V. Shejwalkar, R. Shokri, and A. Houmansadr, “Cronus:
Formal guarantee and an empirical study on privacy and fairness,” in Robust and heterogeneous collaborative learning with black-box knowl-
Proc. Findings Assoc. Comput. Linguistics (EMNLP), 2020, pp. 1–11. edge transfer,” 2019, arXiv:1912.11279.
[177] C. Chen, L. Lyu, H. Yu, and G. Chen, “Practical attribute reconstruction [203] D. K. Dennis, T. Li, and V. Smith, “Heterogeneity for the win: One-
attack against federated learning,” IEEE Trans. Big Data, early access, shot federated clustering,” in Proc. Int. Conf. Mach. Learn., 2021,
Mar. 15, 2022, doi: 10.1109/TBDATA.2022.3159236. pp. 2611–2620.
[178] J. Jin, J. Ren, Y. Zhou, L. Lyu, J. Liu, and D. Dou, “Accelerated [204] J. Zhang, C. Chen, B. Li, L. Lyu, S. Wu, J. Xu, S. Ding, and C. Wu,
federated learning with decoupled adaptive optimization,” in Proc. “A practical data-free approach to one-shot federated learning with
ICML, 2022, pp. 10298–10322. heterogeneity,” 2021, arXiv:2112.12371.
[179] Y. Wang, X. Ma, J. Bailey, J. Yi, B. Zhou, and Q. Gu, “On the [205] M. Barreno, B. Nelson, A. D. Joseph, and J. D. Tygar, “The security
convergence and robustness of adversarial training,” in Proc. ICML, of machine learning,” Mach. Learn., vol. 81, no. 2, pp. 121–148, 2010.
vol. 1, 2019, p. 2. [206] J. Steinhardt, P. W. W. Koh, and P. S. Liang, “Certified defenses for
[180] Y. Wang, D. Zou, J. Yi, J. Bailey, X. Ma, and Q. Gu, “Improving data poisoning attacks,” in Proc. NIPS, 2017, pp. 3517–3529.
adversarial robustness requires revisiting misclassified examples,” in
Proc. ICLR, 2019, pp. 1–14.
[181] D. Tsipras, S. Santurkar, L. Engstrom, A. Turner, and A. Madry,
“Robustness may be at odds with accuracy,” in Proc. ICLR, 2019,
pp. 1–14.
[182] L. Lyu, J. C. Bezdek, X. He, and J. Jin, “Fog-embedded deep learning
for the Internet of Things,” IEEE Trans. Ind. Informat., vol. 15, no. 7,
pp. 4206–4215, Jul. 2019.
[183] L. Lyu et al., “Towards fair and privacy-preserving federated deep mod- Lingjuan Lyu (Member, IEEE) received the
els,” IEEE Trans. Parallel Distrib. Syst., vol. 31, no. 11, pp. 2524–2541, Ph.D. degree from The University of Melbourne,
Mar. 2020. Melbourne, VIC, Australia.
[184] L. Lyu, Y. Li, K. Nandakumar, J. Yu, and X. Ma, “How to democratise She is currently a Senior Research Scientist and
and protect AI: Fair and differentially private decentralised deep the Team Leader with Sony AI, Tokyo, Japan. She
learning,” IEEE Trans. Dependable Secure Comput., vol. 19, no. 2, had published over 50 papers in top conferences and
pp. 1003–1017, Apr. 2020. journals, including NeurIPS, ICLR, ICML, Nature,
[185] L. Lyu, J. C. Bezdek, J. Jin, and Y. Yang, “FORESEEN: Towards differ- and so on. Her current research interest is trustwor-
entially private deep inference for intelligent Internet of Things,” IEEE thy artificial intelligence (AI).
J. Sel. Areas Commun., vol. 38, no. 10, pp. 2418–2429, Oct. 2020. Dr. Lyu was a winner of the IBM Ph.D. Fellowship
[186] X. Pan, M. Zhang, S. Ji, and M. Yang, “Privacy risks of general-purpose Worldwide. Her works had won several best paper
language models,” in Proc. IEEE Symp. Secur. Privacy (SP), May 2020, awards and oral presentations from top conferences.
pp. 1314–1331.
[187] T. Dong, B. Zhao, and L. Lyu, “Privacy for free: How does dataset
condensation help privacy?” in Proc. ICML, 2022, pp. 1–19.
[188] C. Chen, Y. Liu, X. Ma, and L. Lyu, “CalFAT: Calibrated feder-
ated adversarial training with label skewness,” in Proc. NIPS, 2022,
pp. 1–16.
[189] X. He, Q. Xu, L. Lyu, F. Wu, and C. Wang, “Protecting intellectual
property of language generation APIs with lexical watermark,” in Proc.
AAAI, 2022, pp. 1–9. Han Yu (Senior Member, IEEE) received the Ph.D.
[190] N. Truong, K. Sun, S. Wang, F. Guitton, and Y. Guo, “Privacy degree from the School of Computer Science and
preservation in federated learning: An insightful survey from the GDPR Engineering, NTU, Singapore.
perspective,” Comput. Secur., vol. 110, Nov. 2021, Art. no. 102402. He is a Nanyang Assistant Professor (NAP) in
[191] K. Cheng et al., “SecureBoost: A lossless federated learning frame- the School of Computer Science and Engineering
work,” IEEE Intell. Syst., vol. 36, no. 6, pp. 87–98, Dec. 2021. (SCSE), Nanyang Technological University (NTU), .
[192] Z. Tian, R. Zhang, X. Hou, J. Liu, and K. Ren, “FederBoost: Private He held the prestigious Lee Kuan Yew Post-Doctoral
federated learning for GBDT,” 2020, arXiv:2011.02796. Fellowship (LKY PDF), from 2015 to 2018. He
has published over 200 research papers and book
[193] X. Jin, P.-Y. Chen, C.-Y. Hsu, C.-M. Yu, and T. Chen, “Catastrophic
chapters in leading international conferences and
data leakage in vertical federated learning,” in Proc. NIPS, vol. 2021,
journals. He is a coauthor of the book Federated
pp. 994–1006.
Learning - the first monograph on the topic of federated learning. His research
[194] X. Xu, L. Lyu, X. Ma, C. Miao, C. S. Foo, and B. K. H. Low, “Gradient focuses on trustworthy federated learning.
driven rewards to guarantee fairness in collaborative machine learning,” Dr.Yu is a Senior Member of CCF. His research works have won multiple
in Proc. NIPS, 2021, pp. 16104–16117. awards from conferences and journals.
[195] L. Lyu, X. Xu, Q. Wang, and H. Yu, “Collaborative fairness in federated
learning,” in Federated Learning. Springer, 2020, pp. 189–204.
[196] Q. Yang, L. Fan, and H. Yu, Federated Learning: Privacy Incentive.
Springer, 2020.
[197] J. Kang, Z. Xiong, D. Niyato, H. Yu, Y.-C. Liang, and D. I. Kim,
“Incentive design for efficient federated learning in mobile networks:
A contract theory approach,” in Proc. IEEE VTS Asia Pacific Wireless
Commun. Symp. (APWCS), Aug. 2019, pp. 1–5.
[198] J. Kang, Z. Xiong, D. Niyato, Y. Zou, Y. Zhang, and M. Guizani, Xingjun Ma received the bachelor’s degree from
“Reliable federated learning for mobile networks,” IEEE Wireless
Jilin University, Changchun, China, the master’s
Commun., vol. 27, no. 2, pp. 72–80, Feb. 2020.
degree from Tsinghua University, Beijing, China,
[199] S. Warnat-Herresthal et al., “Swarm learning for decentralized and and the Ph.D. degree from The University of
confidential clinical machine learning,” Nature, vol. 594, no. 7862, Melbourne, Melbourne, VIC, Australia.
pp. 265–270, 2021. He was a Lecturer with the School of Informa-
[200] Y. Liu, X. Yuan, Z. Xiong, J. Kang, X. Wang, and D. Niyato, tion Technology, Deakin University, Geelong, VIC,
“Federated learning for 6G communications: Challenges, methods, Australia. He was a Post-Doctoral Research Fellow
and future directions,” China Commun., vol. 17, no. 9, pp. 105–118, with the School of Computing and Information Sys-
Sep. 2020. tems, The University of Melbourne. He is currently
[201] N. Guha, A. Talwalkar, and V. Smith, “One-shot federated learning,” an Associate Professor of computer science with
2019, arXiv:1902.11175. Fudan University, Shanghai, China. He works in the areas of adversarial
[202] Q. Li, B. He, and D. Song, “Practical one-shot federated learning for machine learning, deep learning, artificial intelligence (AI) security, and data
cross-silo setting,” 2020, arXiv:2010.01017. privacy.
Chen Chen received the B.S. degree in computer Qiang Yang (Fellow, IEEE) received the B.Sc.
science from the Chu Kochen Honors College, degree in astrophysics from Peking University,
Zhejiang University, Hangzhou, China, in 2017, Beijing, China, in 1982, and the M.Sc. degree in
where he is currently pursuing the Ph.D. degree with astrophysics and the Ph.D. degree in computer sci-
the College of Computer Science. ence from the University of Maryland, College Park,
He is currently an Intern with Sony AI, Tokyo, MD, USA, in 1985 and 1989, respectively.
Japan. His research interests include federated learn- He was a Faculty Member with the Uni-
ing, adversarial training, multilabel learning, and versity of Waterloo, Waterloo, ON, Canada,
recommendation systems. from 1989 to 1995, and Simon Fraser University,
Burnaby, BC, Canada, from 1995 to 2001. He was
the Founding Director of Huawei’s Noah’s Ark Lab,
Hong Kong, from 2012 to 2014 and a Co-Founder of 4Paradigm Corporation,
Beijing, an artificial intelligence (AI) platform company. He is currently the
Head of the AI Department and the Chief AI Officer of WeBank, Shenzhen,
China, and a Chair Professor with the Computer Science and Engineering
(CSE) Department, The Hong Kong University of Science and Technology
(HKUST), Hong Kong, where he was a former Head of the CSE Department
and the Founding Director of the Big Data Institute from 2015 to 2018.
He is the author of several books, including Intelligent Planning (Springer),
Lichao Sun received the Ph.D. degree in computer Crafting Your Research Future (Morgan & Claypool), and Constraint-Based
science from the University of Illinois Chicago, Design Recovery for Software Engineering (Springer). His research interests
Chicago, IL, USA, in 2020, under the supervision include AI, machine learning, and data mining, especially in transfer learning,
of Prof. Philip S. Yu. automated planning, federated learning, and case-based reasoning.
He is currently an Assistant Professor with the Dr. Yang has served as an Executive Council Member of the Advancement of
Department of Computer Science and Engineering, AI (AAAI) from 2016 to 2020. He is a fellow of several international societies,
Lehigh University, Bethlehem, PA, USA. He has including ACM, AAAI, IEEE, IAPR, and AAAS. He was a recipient of several
published more than 45 research articles in top awards, including the 2004/2005 ACM KDDCUP Championship, the ACM
conferences and journals, such as CCS, USENIX- SIGKDD Distinguished Service Award in 2017, and the AAAI Innovative
Security, NeurIPS, KDD, ICLR, the Advancement AI Applications Awards in 2018 and 2020. He was the Founding Editor-
of AI (AAAI), the International Joint Conference on in-Chief of the ACM Transactions on Intelligent Systems and Technology
AI (IJCAI), ACL, NAACL, TII, TNNLS, and TMC. His research interests (ACM TIST) and IEEE T RANSACTIONS ON B IG D ATA (IEEE TBD). He has
include security and privacy in deep learning and data mining. He mainly served as the President of the International Joint Conference on AI (IJCAI)
focuses on artificial intelligence (AI) security and privacy, social networks, from 2017 to 2019.
and NLP applications.
Philip S. Yu (Life Fellow, IEEE) received the
B.S. degree in electrical engineering (E.E.) from
the National Taiwan University, New Taipei, Taiwan,
in 1992, the M.S. and Ph.D. degrees in E.E. from
Stanford University, Stanford, CA, USA, in 1976
and 1978, respectively, and the M.B.A. degree from
New York University, New York, NY, USA, in 1982.
He is currently a Distinguished Professor of com-
Jun Zhao (Member, IEEE) received the bachelor’s puter science with the University of Illinois Chicago
degree from Shanghai Jiao Tong University, (UIC), Chicago, IL, USA, and also holds the Wexler
Shanghai, China, in July 2010, and the Ph.D. deg- Chair in Information Technology. Before joining
ree in electrical and computer engineering from UIC, he was with IBM, USA, where he was the Manager of the Software
Carnegie Mellon University (CMU), Pittsburgh, Tools and Techniques Department, Watson Research Center. He has published
PA, USA, in May 2015, affiliating with CMU’s more than 1200 papers in refereed journals and conferences. He holds or has
renowned CyLab Security & Privacy Institute. applied for more than 300 U.S. patents. His research interest is on big data,
He is currently an Assistant Professor with the including data mining, data stream, database, and privacy.
School of Computer Science and Engineering, Dr. Yu is a fellow of the ACM. He was a recipient of the ACM
Nanyang Technological University (NTU), SIGKDD 2016 Innovation Award for his influential research and scientific
Singapore. Before joining NTU first as a contributions to mining, fusion, and anonymization of big data, the IEEE
Post-Doctoral Researcher with Xiaokui Xiao and then as a Faculty Member, Computer Society’s 2013 Technical Achievement Award for pioneering and
he was a Post-Doctoral Researcher and Arizona Computing PostDoc Best fundamentally innovative contributions to the scalable indexing, querying,
Practices Fellow with Arizona State University, Tempe, AZ, USA. His searching, mining, and anonymization of big data, and the Research Contribu-
research interests include communication networks, security/privacy, and tions Award from IEEE International Conference on Data Mining (ICDM) in
artificial intelligence (AI). 2003 for his pioneering contributions to the field of data mining. He received
Dr. Zhao’s coauthored papers received the Best Paper Award (IEEE the ICDM 2013 10-Year Highest-Impact Paper Award and the EDBT Test of
T RANSACTIONS) by the IEEE Vehicular Society (VTS) Singapore Chapter Time Award in 2014. He was the Editor-in-Chief of ACM Transactions on
in 2019 and the Best Paper Award in the EAI International Conference on Knowledge Discovery from Data from 2011 to 2017 and IEEE T RANSAC -
6G for Future Wireless Networks (EAI 6GN) 2020. TIONS ON K NOWLEDGE AND D ATA E NGINEERING from 2001 to 2004.