
Does Differential Privacy Really Protect Federated Learning From Gradient Leakage Attacks?

Jiahui Hu, Jiacheng Du, Zhibo Wang, Senior Member, IEEE, Xiaoyi Pang, Yajie Zhou, Peng Sun, and Kui Ren, Fellow, IEEE

IEEE Transactions on Mobile Computing, vol. 23, no. 12, December 2024

Abstract—Federated Learning (FL) is susceptible to the gradient leakage attack (GLA), which can recover local private training data from the shared gradients or model updates. To ensure privacy, differential privacy (DP) is applied in FL by clipping and adding noise to local gradients (i.e., Local Differential Privacy (LDP)) or to the global model update (i.e., Central Differential Privacy (CDP)). However, the effectiveness of DP in defending against GLAs needs to be thoroughly investigated, since some works briefly verify that DP can guard FL against GLAs while others question its defense capability. In this paper, we empirically evaluate CDP and LDP on their resistance to GLAs and pay close attention to the trade-offs between privacy and utility in FL. Our findings reveal that: 1) existing GLAs can be defended by CDP using a per-layer clipping strategy and by LDP with a reasonable privacy guarantee, and 2) both CDP and LDP ensure the trade-off between privacy and utility when training shallow models, but cannot guarantee this trade-off in deeper model training (e.g., ResNets). Triggered by the crucial role of the clipping operation in DP, we propose an improved attack that incorporates the clipping operation into existing GLAs without requiring additional information. The experimental results show that our attack can break the protection of CDP and weaken the effectiveness of LDP. Overall, our work validates the effectiveness as well as reveals the vulnerability of DP under GLAs. We hope this work can provide guidance on utilizing DP for defending against GLAs in FL and inspire the design of future privacy-preserving FL.

Index Terms—Differential privacy, federated learning, gradient leakage attack.

Fig. 1. CDP and LDP in FL.

Manuscript received 28 September 2023; revised 27 May 2024; accepted 18 June 2024. Date of publication 24 June 2024; date of current version 5 November 2024. This work was supported in part by the National Key R&D Program of China under Grant 2021ZD0112803, in part by the National Natural Science Foundation of China under Grant 62122066, Grant U20A20182, Grant 61872274, and Grant 62102337, in part by the Key R&D Program of Zhejiang under Grant 2024C01164 and Grant 2022C01018, in part by the Natural Science Foundation of Hunan Province, China under Grant 2023JJ40174, and in part by the Young Elite Scientists Sponsorship Program by CAST under Grant 2023QNRC001. Recommended for acceptance by F. Wu. (Corresponding author: Zhibo Wang.)

Jiahui Hu, Jiacheng Du, Zhibo Wang, Xiaoyi Pang, Yajie Zhou, and Kui Ren are with the State Key Laboratory of Blockchain and Data Security, Zhejiang University, Hangzhou 310027, China, and also with the School of Cyber Science and Technology, Zhejiang University, Hangzhou 310027, China. Peng Sun is with the College of Computer Science and Electronic Engineering, Hunan University, Changsha 410082, China.

Digital Object Identifier 10.1109/TMC.2024.3417930

I. INTRODUCTION

The Internet of Things (IoT) has ushered in an era where billions of devices are connected to the internet, generating vast amounts of data. The promise of IoT lies in leveraging this data to drive intelligence, and edge intelligence will become a key solution in the IoT ecosystem. Complementing the rise of edge intelligence, federated learning (FL) [1] is a decentralized machine learning technique where multiple clients collectively train a model without sending their local data to a central server. In each communication round, the selected clients receive a global model from the server, use their private data to train local models, and then upload the weights or gradients to the server. The server aggregates the local updates to produce a new global model according to the predefined aggregation rule and sends it to clients for the next round. Compared to centralized machine learning, FL significantly alleviates clients' privacy concerns since it does not require centralizing their private data.

However, FL still faces some privacy risks. For example, an attacker can launch membership inference attacks [2], [3], [4], [5], property inference attacks [6], [7], [8], and data reconstruction attacks [9], [10], [11], [12] to infer sensitive information about the training data. Besides, it is especially noteworthy that the adversary can reconstruct victim clients' private training data from the shared gradients by the gradient leakage attack (GLA) [9], since the shared model parameters or gradients contain rich information about the private training data [13].

To further protect data privacy, differential privacy (DP) is adopted in FL to clip and perturb sensitive information (e.g., gradients). As shown in Fig. 1, there are two forms of DP in existing differentially private FL (DP-FL) [14], [15], [16], [17]: 1) Central Differential Privacy (CDP), where the DP mechanism is performed on the trusted central server. Clients clip their model updates before sending them to the server. Then, to prevent privacy breaches of models from external attackers, the server aggregates the local model updates, adds noise to the aggregated update, and broadcasts the obfuscated global model to clients.


2) Local Differential Privacy (LDP), where the server is untrusted and the DP mechanism is performed on clients. Specifically, to prevent privacy breaches from the untrusted server, clients clip and add noise to the gradient during training before sharing the model update with the untrusted server.

Prior works have shown that DP can address membership inference attacks and property inference attacks [18], [19], [20], [21]. As an emerging privacy attack paradigm, the reaction of gradient leakage attacks to DP-FL has not been thoroughly investigated. On the one hand, previous studies [16], [17], [22] have proposed to leverage DP to defend against GLAs in FL. However, the effectiveness is demonstrated by only a limited evaluation conducted on relatively early GLAs [9], [10]. On the other hand, some studies [23], [24] have indicated that DP may not be sufficient to provide a defense against GLAs in FL. However, [23] only separately evaluated the performance of gradient clipping and gradient perturbation, and [24] evaluated the performance of LDP with a small number of experiments and did not adjust privacy parameters such as the clipping value. In short, they do not conduct a comprehensive evaluation of LDP and CDP and thus cannot fully demonstrate the ineffectiveness of DP. The conclusions of the above two types of works contradict each other, and neither of them has sufficient evidence. Hence, it is urgent to comprehensively evaluate the effectiveness of CDP and LDP in resisting GLAs in FL.

In this paper, we concentrate on answering the following question: Does Differential Privacy Really Protect Federated Learning from Gradient Leakage Attacks? Particularly, we consider this question from the perspective of the trade-off between privacy and utility. That is, we aim to investigate whether DP methods can protect FL from GLAs without significantly reducing model accuracy. To this end, we first comprehensively evaluate the performance of DP-FL against GLAs by changing their privacy parameters (e.g., noise multiplier, privacy budget, clipping norm), privacy operations (e.g., clipping and perturbation), and clipping strategies (e.g., layer-wise clip and model-wise clip). Then, we explore the vulnerability of DP-FL under GLAs by introducing an improved attack that incorporates the clipping operation into existing attack schemes to undermine the defense of DP. With our evaluations, we can validate the effectiveness as well as reveal the vulnerability of DP in FL under GLAs. We hope that this work can provide suggestions on how to better use DP for defending against GLAs in FL, while also offering guidance for the design of privacy-preserving FL.

The main contributions of this paper are summarized as follows:
• We evaluate the resistance of CDP and LDP to GLAs on various model architectures and datasets. The results validate that CDP can effectively defend against GLAs when using the per-layer clipping strategy and that LDP is effective when adopting a reasonable perturbation level (i.e., non-trivial noise). Moreover, CDP and LDP only achieve good performance on the trade-off between model utility and privacy protection under shallow networks.
• We illustrate the influence of privacy parameters, operations, and clipping strategies on the effectiveness of DP in defending against GLAs. The results demonstrate the superiority of the clipping operation and the per-layer clipping manner on the resilience to GLAs.
• We propose a novel GLA approach that incorporates the clipping operation to enhance the attack effect against DP-FL. The results show that our proposed attack can completely recover private training data from FL under CDP and reveal more information than SOTA attacks under LDP, which verifies the vulnerability of DP-FL.

The remainder of this paper is organized as follows. In Section II, a review of GLAs and privacy preservation technologies in FL is presented. In Section III, we introduce the attack process of GLAs, typical GLA algorithms, and differentially private FL. In Section IV, a comprehensive evaluation of CDP and LDP against GLAs is conducted. The proposed GLA approach targeting DP-FL and its corresponding performance evaluation are presented in Section V. In Section VI, we discuss the existing DP-FL techniques and recommend several strategies to defend against GLAs. Finally, the paper is concluded in Section VII.

II. RELATED WORK

A. Gradient Leakage Attacks

The privacy vulnerability of gradients was discovered by Wang et al. [13], who first reconstructed original training images from shared updates in FL. After that, the GLA was formally proposed in [9] and immediately improved in [10], in which the attackers can recover pixel-level training images and token-level training texts through the model's corresponding shared gradients. Most existing GLAs are optimization-based attacks, where the attacker reconstructs the training sample by generating a dummy input that minimizes the distance between the original gradient and the dummy gradient derived from the dummy input. Based on this, some recent GLAs focus on more complicated learning tasks (e.g., transformers [25], [26], deeper networks [11]) and achieve better reconstruction performance with the averaged gradients of large batches and high-resolution inputs [11], [12]. Besides, there are also some analytics-based GLAs [27], [28], which can recover private training data from the gradient by solving a linear system of equations, but they only work well on the reconstruction of a single or a few data samples. The above attacks (i.e., passive attacks) usually assume that the server is honest-but-curious, i.e., it infers the privacy of the training data but honestly follows the process of FL.

More recent GLAs [29], [30], [31], called active attacks, assume a malicious server and modify the shared model's architecture and weights to achieve more powerful recovery of user data. However, [29] only works on simple networks (e.g., CNNs with a few convolutional layers), and other works [30], [31] require inserting a few linear layers at the start of the original model, which is easy to detect. Hence, considering the possibility of happening in real FL scenarios, we choose to focus on passive attacks in this paper.

B. Privacy Preservation in Federated Learning

Existing research on privacy protection in FL can generally be divided into cryptography-based and DP-based methods.

The commonly used technology in cryptography-based methods is secure multi-party computation (MPC). MPC can compute the output of a function over a set of private inputs from clients in a secure manner and only reveal the intended output to the server. Specifically, FL uses homomorphic encryption [32], [33] or secret sharing [34], [35] from MPC technologies to achieve secure aggregation, which can protect individual gradients from server access. Cryptography-based methods can generally guarantee the security and privacy of individual gradients without compromising their utility. However, these approaches exponentially increase the computation time, communication bandwidth, and data storage. Moreover, recent research indicates that the server can still infer privacy from aggregated gradients [36], [37].

DP [38] is also widely adopted in FL due to its lightweight overhead and flexibility. DP-based FL can be divided into two categories according to the trust level of the server. CDP-based FL (CDP-FL) [14], [39], [40], [41] always assumes a trusted server to perturb the aggregated updates, which provides higher model accuracy but a weaker privacy guarantee. Although the clipping operation in CDP-FL can be performed either on the client or the server side, it is typically executed on the client side to enhance privacy. LDP-based FL (LDP-FL) [16], [42], [43], [44], [45], [46] can work under an untrusted server and perturbs the updated gradients locally on the client side. However, despite offering a heightened level of privacy guarantee, LDP-FL tends to compromise model accuracy. To improve model utility, the shuffle scheme has been employed in recent LDP works [44], [45], [46]. The adopted shuffler permutes the locally obfuscated updates before transmitting them to the server (i.e., introducing anonymization of updates from clients), allowing for less noise to achieve the same level of privacy.

In short, DP is the mainstream privacy protection technology in FL. Prior works have shown that DP can address membership inference attacks and property inference attacks [18], [19], [20], [21]. As an emerging privacy attack paradigm, the reaction of GLAs to DP-FL has not been thoroughly investigated.

III. PRELIMINARIES

A. GLA Algorithms

We introduce the workflow of optimization-based GLAs (i.e., our focus in this paper) and present four typical GLA methods.

Attack Workflow: The attacker first obtains the gradient information of the victim g and the global model with parameters θ through eavesdropping. However, if the attacker is the server, it can directly get the model and gradient information. Then, the attacker introduces some dummy data (x, y) (including inputs x and corresponding labels y) and performs a similar model training on them, yielding dummy gradients \nabla_\theta \mathcal{L}_\theta(x, y). The attacker iteratively updates the dummy data to minimize the distance between the dummy gradient and the victim client's gradient. After a certain number of iterations, the attacker obtains dummy data samples that are close to the victim's private data.

The existing optimization-based attacks generally follow the above process but mainly differ in the optimization objectives, optimizer, and label prediction approaches. We then list four representative studies about GLAs.

DLG [9]: This work formulated the gradient matching problem as (1), where the Euclidean distance is used to measure the similarity of the dummy gradient and the original gradient and L-BFGS is adopted as the optimizer. As for label prediction, the authors synchronously update the input and label information during optimization.

\arg\min_{x, y} \|\nabla_\theta \mathcal{L}_\theta(x, y) - g\|^2 .  (1)

iDLG [10]: A follow-up work of DLG [9], which proposed a simple approach to extract the ground-truth labels from the gradient and then optimize (1) with the extracted label using L-BFGS.

DLG and iDLG are only effective when reconstructing low-resolution images with a maximum batch size of 8. Moreover, according to our experiments, they mainly have stable performance in reconstructing a single image.

InvertGrad [11]: The authors substituted the Euclidean distance with the cosine distance and added the total variation norm (TV) to the optimization function. TV serves as a regularization term to recover more realistic data from the average gradient of large batches. Therefore, the optimization problem is formulated as

\arg\min_{x} 1 - \frac{\langle \nabla_\theta \mathcal{L}_\theta(x, y), g \rangle}{\|\nabla_\theta \mathcal{L}_\theta(x, y)\| \, \|g\|} + \alpha_{TV} R_{TV}(x),  (2)

where R_{TV}(x) denotes the total variation and \alpha_{TV} controls the weight of the regularization term in the objective function. In this work, the authors infer the label information in advance and use the Adam optimizer to solve the optimization problem.

GI [12]: This work still uses the Euclidean distance, but adds the total variation norm, the l2-norm, and batch normalization (BN) statistics to a regularization term denoted R_f. Meanwhile, the regularization jointly adds a group consistency term R_g by considering multiple random seeds for initialization and calculating the averaged data as a reference. Then, the optimization problem is

\arg\min_{x} \|\nabla_\theta \mathcal{L}_\theta(x, y) - g\|^2 + \alpha_f R_f(x) + \alpha_g R_g(x).  (3)

Among all these attacks, InvertGrad¹ and GI are the SOTA attacks widely used in recent works [23], [47]. GI achieves better reconstruction performance than InvertGrad when the input has a higher resolution. However, GI is based on the strong assumption that the attacker knows the exact BatchNorm (BN) statistics of the private batch. The ablation analysis of BN statistics has been well discussed in [47], and the results showed that the reconstruction of a batch of high-resolution images remains elusive without knowing the BN statistics. Moreover, the BN statistics of single-batch samples are lost during the multiple iterations of forward propagation in local training with federated averaging (FedAvg), which is commonly used in the FL community. Hence, in our paper, we select InvertGrad as the default attack scheme unless otherwise specified.

¹The official implementation: https://github.com/JonasGeiping/invertinggradients [11].

In the above, we have theoretically analyzed the differences between InvertGrad and GI, and explained why we chose InvertGrad as the main attack algorithm in this paper.
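For concreteness, the following PyTorch sketch illustrates the optimization-based attack loop described above, using the cosine distance plus a TV prior in the spirit of (2). It is a simplified illustration under stated assumptions (labels already inferred, a single restart, no learning-rate schedule, generic helper names of our own), not the official InvertGrad implementation or the code evaluated in this paper.

```python
import torch
import torch.nn.functional as F

def total_variation(x):
    # Simple TV prior over a batch of images shaped (B, C, H, W).
    dh = (x[:, :, 1:, :] - x[:, :, :-1, :]).abs().mean()
    dw = (x[:, :, :, 1:] - x[:, :, :, :-1]).abs().mean()
    return dh + dw

def gradient_matching_attack(model, loss_fn, observed_grads, labels,
                             img_shape, steps=2000, tv_weight=1e-2, lr=0.1):
    # Dummy images initialized from Gaussian noise; only they are optimized.
    dummy = torch.randn(img_shape, requires_grad=True)
    optimizer = torch.optim.Adam([dummy], lr=lr)
    flat_obs = torch.cat([g.detach().flatten() for g in observed_grads])
    for _ in range(steps):
        optimizer.zero_grad()
        loss = loss_fn(model(dummy), labels)
        dummy_grads = torch.autograd.grad(loss, model.parameters(),
                                          create_graph=True)
        flat_dummy = torch.cat([g.flatten() for g in dummy_grads])
        # Cosine-distance gradient matching plus a TV regularizer, cf. (2).
        rec_loss = 1 - F.cosine_similarity(flat_dummy, flat_obs, dim=0)
        (rec_loss + tv_weight * total_variation(dummy)).backward()
        optimizer.step()
    return dummy.detach()
```

The practical attacks additionally rely on label inference, multiple restarts, and extra regularizers, but the core search over dummy inputs is the loop shown here.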

Algorithm 1: Central Differential Privacy Based Federated Learning (CDP-FL).

Fig. 2. Reconstruction results on ImageNet.

Next, we mainly present the experimental results of the two algorithms on ImageNet [48] to visually introduce their differences. Note that the improvement of GI over InvertGrad comes from the use of BN information, which is actually an over-assumption.

To compare InvertGrad and GI, we present reconstructed images from ResNet-18 on ImageNet in Fig. 2. We evaluate the impact of various image resolutions on reconstruction performance, ranging from 32×32 to 256×256. We can see that there is a certain degree of privacy leakage at any image resolution, while the reconstructed images become increasingly difficult to recognize once the resolution exceeds 64×64. Overall, GI performs better on high-resolution images because it incorporates BN information into the optimization objective. At a resolution of 32×32, both attacks exhibit relatively stable recovery effects. Considering the stable performance of InvertGrad in FL, we select 32×32 as the default image resolution in our evaluation.

B. Federated Learning With Differential Privacy

Differential privacy is a mathematical privacy technique in which every change to a single data point in a dataset results in only a statistically negligible change in the algorithm's output. Hence, differential privacy is widely used in the literature to protect data privacy [49], [50].

Definition 1 (Differential Privacy [38]): A randomized mechanism M provides (ε, δ)-differential privacy if for any two neighboring datasets D and D' that differ in only a single record, and for all S ⊆ Range(M),

\Pr(\mathcal{M}(D) \in S) \le e^{\epsilon} \Pr(\mathcal{M}(D') \in S) + \delta .

Here ε is the privacy budget, and a smaller ε indicates a smaller difference between the outputs of M(D) and M(D'), which means higher privacy protection. The parameter δ (0 ≤ δ < 1) guarantees that the probability of the privacy loss exceeding ε is bounded by δ, and it should be smaller than 1/|D|.

The Gaussian mechanism is a classical DP mechanism widely used in machine learning, which adds zero-mean Gaussian noise with a certain variance to each coordinate of the output f(D).

Definition 2 (Gaussian mechanism [51]): Given a function f : D → R over domain D, the mechanism M satisfies (ε, δ)-differential privacy if

\mathcal{M}(D) = f(D) + \mathcal{N}(0, \sigma^2 S_f^2),

where σ is the noise multiplier and S_f = \max_{D, D'} \|f(D) - f(D')\|_2 is the sensitivity of f (i.e., the maximum possible change in f's output when a single record is added to or deleted from the input D). Usually, we have \sigma^2 > 2\ln(1.25/\delta)/\epsilon^2. For high-dimensional functions of the form f : D → R^d, where d ≥ 2, each coordinate receives noise sampled according to N(0, σ²S_f²) independently of the other coordinates.

Applying DP to FL (and to machine learning in general) usually involves clipping and perturbation of the parameters (i.e., the sensitive information to be protected). Specifically, clipping is used to upper-bound the sensitivity S_f of the query function f by restricting the l2-norm of the queried value contributed by one record in the dataset, and perturbation refers to adding noise to the sensitive query information.

CDP vs. LDP: Following [19], DP-FL can be divided into two categories, i.e., CDP-FL and LDP-FL. 1) In CDP-FL, the server is trusted and perturbs the aggregated updates with Gaussian noise. 2) In LDP-FL, the server is untrusted, and clients perturb local updates before sharing them with the server. More specifically, each client implements DP-SGD [52] (a popular method for DP training in practice) with its local training dataset and uploads the final perturbed model or updates to the untrusted server. In essence, the difference is that CDP performs privacy protection at the server level, while LDP performs privacy protection at the client level. This is reflected in the fact that CDP typically calculates the privacy budget for the entire FL training process, while LDP only keeps track of the privacy budget of each individual client separately.
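Both variants rest on the same clip-then-perturb primitive described above. The following is a minimal PyTorch sketch of that primitive, with helper names of our own rather than the paper's: in CDP-FL the noise addition would be executed by the server on the aggregated update (with the sensitivity set by the per-client clipping bound), whereas in LDP-FL each client applies both steps locally at every DP-SGD step.

```python
import torch

def clip_and_perturb(update, clip_norm, noise_multiplier, add_noise=True):
    # Clip the list of per-layer update tensors to a total l2-norm of at most
    # clip_norm, then optionally add per-coordinate Gaussian noise with
    # standard deviation noise_multiplier * clip_norm (Gaussian mechanism).
    total_norm = torch.cat([p.flatten() for p in update]).norm(p=2).item()
    scale = min(1.0, clip_norm / (total_norm + 1e-12))
    clipped = [p * scale for p in update]
    if add_noise:
        clipped = [p + torch.randn_like(p) * noise_multiplier * clip_norm
                   for p in clipped]
    return clipped
```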


TABLE I: ILLUSTRATION OF TWO CLIPPING STRATEGIES.

Algorithm 2: Local Differential Privacy Based Federated Learning (LDP-FL).

We introduce the pseudocode for CDP-FL in Algorithm 1 and for LDP-FL in Algorithm 2. Note that the uncertainty of whether a "sample" (i.e., a randomly selected client in CDP-FL or a subsampled record in LDP-FL) has contributed can amplify the privacy of this "sample". In the context of privacy amplification via sampling, the moments accountant (MA) [52] remains the standard method implemented in many DP training libraries to conduct privacy accounting (i.e., to calculate the privacy budget).
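As an illustration of such library-based accounting, the snippet below assumes the RDPAccountant API of the Opacus library; it is not the authors' code, the parameter values are examples only (cf. Section IV-C), and the exact ε returned depends on the accountant implementation.

```python
# Hypothetical accounting sketch using Opacus's moments/RDP accountant.
from opacus.accountants import RDPAccountant

accountant = RDPAccountant()
sample_rate = 16 / 500            # batch size B over local dataset size N
noise_multiplier = 0.65           # example sigma (cf. Section IV-C)
steps = 5 * (500 // 16)           # E = 5 local epochs of B-sized batches

for _ in range(steps):
    accountant.step(noise_multiplier=noise_multiplier, sample_rate=sample_rate)

epsilon = accountant.get_epsilon(delta=1e-5)
print(f"accumulated local privacy budget: epsilon = {epsilon:.2f}")
```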
Clipping Strategy: Clipping strategies are also discussed in the DP literature [14], [53], [54], and commonly include flat clipping (FClip) and per-layer clipping (LClip). In FClip, all of the model layers are clipped together as one vector, while in LClip each layer is treated as a separate vector and clipped independently. The details of the two clipping strategies are presented in Table I. Considering that how to determine C_j is non-trivial and irrelevant to our focus in this paper, we follow [14] and simply set C_j = C/\sqrt{m} for every layer j, where m is the number of layers of the model.
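The following sketch contrasts the two strategies on a list of per-layer gradient (or update) tensors, following the description above and the C_j = C/\sqrt{m} convention of [14]; the helper names are ours and the code is illustrative rather than the implementation used in our experiments.

```python
import math
import torch

def flat_clip(grads, C):
    # FClip: treat all layers as one concatenated vector and rescale it so
    # that its total l2-norm does not exceed C.
    total = torch.cat([g.flatten() for g in grads]).norm(p=2).item()
    scale = min(1.0, C / (total + 1e-12))
    return [g * scale for g in grads]

def per_layer_clip(grads, C):
    # LClip: clip each layer independently to C_j = C / sqrt(m), where m is
    # the number of layers.
    Cj = C / math.sqrt(len(grads))
    return [g * min(1.0, Cj / (g.norm(p=2).item() + 1e-12)) for g in grads]
```

Because LClip rescales every layer by its own factor, it changes both the direction and the magnitude of the stacked gradient vector, whereas FClip only shrinks its magnitude; this difference underlies the defense gap observed in Section IV.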

C. Threat Model
In our study, we aim to examine specific vulnerabilities related to GLAs in the context of FL systems employing DP techniques. The threat model we consider encompasses the following aspects: attacker roles, attacker capabilities, and attacker objectives.

Attacker Roles: In CDP-FL, an external attacker eavesdrops on the communication channel and infers sensitive data from the stolen parameters. In LDP-FL, the attacker could be an internal entity (i.e., the server) who attempts to infer sensitive data from the observed parameters.

Attacker Capabilities: In principle, the server, as an internal attacker, would have stronger attack capabilities than external attackers. However, considering stealthiness, this paper only focuses on passive attack strategies, which only allow the server to observe and analyze parameters rather than actively manipulate the FL process. In this context, attackers in CDP-FL and LDP-FL exhibit comparable attack capabilities, as essential knowledge like the global model and gradients is also transmitted during the FL training process.

Attacker Objectives: The attacker's primary objective is to breach individual privacy by reconstructing original training data from observed model parameters. In CDP-FL, the attacker aims to circumvent the defenses offered by CDP, thereby performing the attack on the clipped model updates. In LDP-FL, the attacker should bypass the protection of LDP, inferring the original data from the noisy model updates. In essence, the attacker in LDP-FL encounters stronger defenses compared to CDP-FL.

By clearly outlining these aspects within our threat model, we provide an initial, intuitive understanding of the potential risks and vulnerabilities in both LDP and CDP settings. The further exploration of GLAs encountering DP is conducted through a comprehensive evaluation.

D. Existing Evaluation of DP on GLA

We mentioned above that [24] has claimed that LDP may not be sufficient to provide a defense against GLAs in FL. However, the authors only evaluated the performance of FedCDP (an LDP mechanism with LClip) [16] in terms of utility-privacy trade-offs, based on a 10-category image classification task on CIFAR10. Moreover, the model used therein was not explicitly disclosed. Therefore, a single experiment is insufficient to illustrate the performance of DP-FL in the trade-off between privacy and utility.


TABLE II: SUMMARY OF HYPERPARAMETERS OF ATTACK SCHEMES IN THE EVALUATION.

TABLE III: SUMMARY OF AVERAGE l2-NORM OF MODEL UPDATE (Δ) AND GRADIENT (g) ON VARIOUS MODELS AND DATASETS.

Fig. 3. Reconstruction results of existing evaluation work [24] on FedCDP with different clipping norms.

Most importantly, and what primarily motivates our work, the authors claim that LDP is insufficient to withstand GLAs based only on experiments with a single clipping norm (C = 4). In fact, this is a large clipping norm, which would render the clipping operation ineffective. Based on the open code from [24],² we replicated the evaluation experiments on FedCDP using several smaller clipping norms and observed a substantial decline in reconstruction performance. As shown in Fig. 3, the results show that FedCDP effectively defends against GLAs when C = 0.5. Hence, existing works do not conduct a comprehensive evaluation of LDP and CDP and thus cannot fully demonstrate the ineffectiveness of DP.

²The official implementation: https://github.com/KAI-YUE/rog.

IV. EVALUATING DP ON GRADIENT LEAKAGE ATTACKS

In this section, we evaluate whether we can use DP to protect data privacy in FL from GLAs by clipping and adding noise to model updates.³ In particular, we aim to determine whether CDP and LDP can achieve a good trade-off between utility and resistance to such attacks. Specifically, we evaluate the impact of privacy parameters (e.g., noise multiplier σ, clipping norm C) and privacy operations (e.g., clipping and perturbation) in DP on the resistance to GLAs.

³The model update can be seen as the sum of gradients after several iterations of local training.

A. Experiment Settings

All experiments are implemented in PyTorch and performed on a workstation equipped with an Intel(R) Xeon(R) Gold 6248 CPU @ 2.50 GHz, 251 GB RAM, and four GeForce RTX 2080Ti cards.

We consider an FL system with 100 clients in total and evenly distribute the training data among them. We conduct our evaluation on four model architectures (LeNet [55], ResNet-18, ResNet-34 [56], and VGG-11 [57]) and six benchmark image datasets (MNIST [58], EMNIST [59], Tiny-ImageNet [60], GTSRB [61], CIFAR10, and CIFAR100 [62]). For MNIST, we resize the images to 32×32 and divide them into a training set (60,000 samples) and a testing set (10,000 samples). For EMNIST, images are resized to 32×32 and split into a training set (387,361 samples) and a testing set (23,941 samples). For CIFAR10 and CIFAR100, we divide them into a training set (50,000 samples) and a testing set (10,000 samples). Tiny-ImageNet contains 100,000 images of 200 classes (500 for each class) downsized to 32×32 colored images, where each class has 500 training images, 50 validation images, and 50 test images. GTSRB contains 43 classes of traffic signs, split into 26,640 training images and 12,630 test images; the images have varying light conditions and rich backgrounds. The number of communication rounds is set as T = 200, and 10 clients are randomly selected to participate in each round.

For all experiments about sample reconstruction, we implement the attack at the beginning of the training process and set the training hyper-parameters of the victim as B = N = 16 by default, where N denotes the number of samples and B denotes the batch size. The choice of a batch size of 16 aims to achieve a compromise between the need to ensure the effectiveness of GLAs, which typically favor smaller batch sizes, and the practical considerations of FL systems, which generally benefit from larger batch sizes. Moreover, we use SGD as the local optimizer with a learning rate (lr) of 0.01 by default and set lr to 0.05 when we perform the whole FL training process in Tables VI and VIII. For each client, we set the number of local training epochs E = 5, except for E = 1 for the victim to guarantee the attack performance. Table II summarizes the detailed setup of attack hyperparameters used in each figure or table in the evaluation. Table III summarizes the average l2-norm of the update (CDP) and gradient (LDP) over 10 independent runs when the batch size is 16, which guides the setup of the clipping norm C in the evaluation. Generally, the clipping operation will work only if the clipping norm C is smaller than the l2-norm of the private value to be protected.
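The kind of norm reported in Table III can be measured with a helper like the following; this is a hypothetical utility of our own for illustration, not the paper's code, and the model variables in the usage comment are placeholders.

```python
import torch

def update_l2_norm(old_params, new_params):
    # l2-norm of the local model update (new - old), flattened across layers;
    # a clipping norm C only takes effect when it is below this value.
    deltas = [n.detach() - o.detach() for n, o in zip(new_params, old_params)]
    return torch.cat([d.flatten() for d in deltas]).norm(p=2).item()

# usage sketch:
# update_l2_norm(global_model.parameters(), local_model.parameters())
```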


TABLE IV: (FOR SEC. IV-B ANS. 2) QUALITY OF RECONSTRUCTED SAMPLES FROM RESNET-18 ON CIFAR100 UNDER DIFFERENT CLIPPING STRATEGIES AND NORMS.

TABLE V: (FOR SEC. IV-B ANS. 2) QUALITY OF RECONSTRUCTED SAMPLES (I.E., LPIPS ↑) FROM DIFFERENT MODELS AND DATASETS UNDER CDP WITH TWO CLIPPING STRATEGIES.

TABLE VI: (FOR SEC. IV-B ANS. 3) TEST ACCURACY (%) OF VARIOUS IMAGE CLASSIFICATION TASKS UNDER DIFFERENT CLIPPING NORMS AND CLIPPING STRATEGIES IN CDP.

Fig. 4. (For Sec. IV-B Ans. 1) Batch reconstruction from ResNet-18 on CIFAR100 when gradients are protected by CDP. The clipping norm is C = 0.1 in both LClip and FClip.

B. Evaluation of CDP

In CDP-FL, each client clips the local update after local training and then sends the clipped update to the server for further aggregation and global perturbation. In this case, the attacker can eavesdrop and obtain the client's shared clipped update before the global perturbation. Therefore, only the clipping operation plays a role in protecting gradients from attacks, and it is hard for the perturbation operation to make a difference because it happens on the server side. For external attackers, the obfuscated global model has no negative impact on the performance of GLAs because the stolen gradients are also generated on this obfuscated global model. For that reason, in this part, we focus on the impact of the clipping strategy on the effectiveness of CDP, and set out to answer the following research questions:
• RQ1: Can CDP protect FL from gradient leakage attacks?
• RQ2: Which clipping strategy is better for defense, FClip or LClip?
• RQ3: Can CDP achieve the trade-off between utility and privacy?

Ans. 1. CDP can resist gradient leakage attacks to a certain extent: To correctly perform the clipping, the clipping norm C should be smaller than the l2-norm of the model update. Fig. 4 shows the reconstruction results using InvertGrad when the victim performs local training with batch size 16 and clips the model update before uploading. After observing the average l2-norm of updates (about 0.3), we set C = 0.1 and see that LClip can make the recovered images indistinguishable by clipping the updates of different layers at different scales. Meanwhile, FClip has little resistance to GLAs, which illustrates that the attack performance of InvertGrad is not affected by directly decreasing the magnitude of gradients. This is because, considering the gradient as a high-dimensional vector, LClip changes the direction and the magnitude of the original gradient at the same time, while FClip just cuts the magnitude of the original gradient shorter. However, the reconstructed images still leak some background feature information (e.g., Columns 4, 11, and 13 in LClip).

Ans. 2. Layer clipping performs better than flat clipping in defending against attacks: We evaluate the resistance performance of the clipping strategies with different clipping norms. We use Mean Square Error (MSE), Peak Signal-to-Noise Ratio (PSNR) [63], and Learned Perceptual Image Patch Similarity (LPIPS) [64] to measure the similarity between reconstructed and original images, which indicates the reconstruction quality. Table IV shows the average results of these three metrics for batch reconstruction with batch size 16 under different clipping strategies and norms. Our results show that the reconstruction quality under the protection of LClip is worse than that under FClip even with a very small clipping norm (C = 1e-4), demonstrating that LClip always performs better than FClip in defending against attacks. Table V shows the reconstruction results from four model architectures and six datasets under LClip and FClip with similar clipping norms (i.e., C = 1 for LeNet and VGG, and C = 0.1 for ResNets). These results further demonstrate that LClip has a better defense performance than FClip. Moreover, FClip with a reasonable clipping norm struggles to safeguard the privacy of training data from GLAs.

Ans. 3. CDP with LClip achieves a good trade-off between utility and privacy under shallow models: We evaluate the impact of clipping norms on the training performance of CDP-FL under the two clipping strategies, where all selected clipping norms are sufficient to preserve the privacy of training samples. Table VI shows the test accuracy of various image classification tasks using different models and datasets when the whole FL training procedure is protected by CDP with different clipping norms that could protect privacy from GLAs (snapshots of the visual reconstruction results are presented in Fig. 5). The results show that the training performance under a larger clipping norm (i.e., less loss of original information) is not necessarily better. This is because the noise added to the aggregated update is amplified by the clipping norm. Under the same privacy budget, a larger clipping norm results in a greater noise scale, leading to lower utility and model performance. Overall, CDP with LClip achieves a better trade-off between utility and privacy than CDP with FClip. CDP with FClip requires an extremely small clipping norm (C ≤ 1e-4) to prevent GLAs, which completely destroys the performance of model training.


Fig. 5. (For Sec. IV-B Ans. 3) Batch reconstruction from the image classification tasks in Table VI. We only present the results with the upper bound of the clipping norm Ĉ in Table VI; all C ≤ Ĉ could protect the privacy of training samples.

Fig. 6. (For Sec. IV-C Ans. 1) Batch image reconstruction from ResNet-18 trained with CIFAR100 when gradients are protected by gradient perturbation (Cσ = 0.001), gradient flat clipping (C = 0.1), and LDP (C = 0.1, Cσ = 0.001).

TABLE VII: TEST ACCURACY OF TRAINING RESNET-18 ON CIFAR10 UNDER CDP WITH DIFFERENT NOISE MULTIPLIERS (I.E., PRIVACY BUDGETS) AND CLIPPING NORMS.

In addition, the value of the clipping norm and its impact on training performance also vary across different image classification tasks. Therefore, even though it is easy to choose a clipping norm that defends against gradient attacks in CDP, how to choose a suitable clipping norm that also guarantees accuracy is still a very important challenge in CDP.

Table VII shows further accuracy results of training ResNet-18 on CIFAR10 under CDP (LClip) with various noise multipliers and clipping norms. We refrain from evaluating model training under CDP with FClip, as FClip cannot safeguard the privacy of clients. As shown in Table IV and Fig. 5, FClip can only protect the clipped gradients from GLAs with an extremely small clipping value (C = 1e-4), a level at which the information in the gradients is already compromised. Table VI also confirms that when C = 1e-4, the model is already untrainable. We can see that LClip also degrades accuracy when the added noise is zero. The accuracy drops to 52% when the clipping norm is C = 1, while the accuracy can reach 70% if C = 10. Note that the clipping operation in CDP does not work if the chosen clipping norm (C = 10) is significantly larger than the l2-norm of the model update. Moreover, the accuracy does not necessarily degrade with a decrease in the privacy budget when the adopted clipping norm is relatively small and sufficient to protect privacy (C < 1).

C. Evaluation of LDP

In LDP-FL, clients use the Differentially Private Stochastic Gradient Descent (DP-SGD) algorithm to train the model on their datasets. At each step of DP-SGD, the gradient computed from a random subset of examples is clipped and then noised for privacy protection. The added noise is determined by the noise multiplier σ and the clipping norm C, and is thus sampled from N(0, C²σ²). After several steps of gradient clipping, perturbation, and descent, the clients upload the final obfuscated model or updates to the server. In this DP mode, both the clipping and perturbation operations provide protection against GLAs from the honest-but-curious server or external attackers. In this subsection, we answer the following research questions:
• RQ1: Can LDP protect FL from gradient leakage attacks?
• RQ2: Which privacy operation is more crucial for defense, gradient clipping or gradient perturbation?
• RQ3: Can LDP achieve the trade-off between utility and privacy?

Ans. 1. LDP with gradient clipping and gradient perturbation can well resist gradient leakage attacks: We evaluate the performance of LDP in defending against GLAs. In LDP, a gradient is protected by gradient clipping and then gradient perturbation before being uploaded to the server. Fig. 6 presents the reconstruction results when gradients are protected by FClip with C = 0.1, by gradient perturbation with Cσ = 0.001, and by both privacy operations. The results show that LDP with both gradient clipping and gradient perturbation can well resist GLAs, while either privacy operation alone leaves the reconstructions highly recognizable. This is because, when the noise is relatively small (Cσ = 0.001), the gradient still retains much of the original information, allowing GLAs to reconstruct the data from the noisy gradients. Similarly, FClip alone is unable to disrupt the effectiveness of GLAs due to its uniform clipping approach. However, once the gradients are clipped (i.e., reduced to smaller values), the originally minor noise can effectively perturb the gradient information.

Ans. 2. Gradient clipping performs better than gradient perturbation in resisting gradient leakage attacks: We evaluate the resistance performance of gradient perturbation and gradient clipping in LDP against the GLA separately. To make the final reconstruction results more distinguishable, we use ImageNet resized to 64×64 as the targeted dataset and adopt GI as the attack scheme. Fig. 7 shows the reconstruction results of a single image when gradients are protected only by gradient perturbation or only by gradient clipping. To disable gradient clipping, we set a clipping norm C that is much larger than the gradient's l2-norm, while we set the noise multiplier σ = 0 to incapacitate gradient perturbation. The experimental results show that even when non-negligible noise is added (Cσ = 1, with the l2-norm of the raw gradient ‖g‖ = 48), the reconstructed image still leaks some information about the original image.

TABLE VIII: (FOR SEC. IV-C ANS. 3) TEST ACCURACY AND RECONSTRUCTION QUALITY (LPIPS) OF DIFFERENT IMAGE CLASSIFICATION TASKS UNDER LDP WITH DIFFERENT CLIPPING STRATEGIES.

Fig. 7. (For Sec. IV-C Ans. 2) Single-image reconstruction from ResNet-18 trained with ImageNet under different privacy operations.

Meanwhile, we can see that gradient clipping (LClip) can make the reconstructed image completely unrecognizable even with a large clipping norm (C = 40). Therefore, compared to gradient perturbation, gradient clipping achieves better resistance against GLAs because it greatly degrades reconstruction quality with relatively small gradient distortions. Be aware that the aforementioned superiority of gradient clipping pertains exclusively to LClip, while standalone FClip is ineffective in resisting GLAs, as demonstrated in the CDP evaluation.

Ans. 3. LDP achieves a better trade-off between utility and privacy when training shallow models: We evaluate the performance of LDP on the trade-off between utility and defense against GLAs. We use the moments accountant (MA) [52] to accumulate the privacy budget⁴ of local training, which is calculated from the sample rate (batch size B / sample size N), the number of local epochs E, and the noise multiplier σ. To obtain a reasonable privacy budget, we set σ = 0.65 with B = 16, E = 5, and N = 600 for MNIST, N = 500 for CIFAR10 and CIFAR100, N = 1000 for ImageNet, and N = 260 for GTSRB. Hence, the privacy budget for training LeNet with MNIST satisfies (ε = 9.35, δ = 1e-5) and that with CIFAR10 satisfies (ε = 10, δ = 1e-5). Table VIII shows the test accuracy and reconstruction quality of various models and datasets under different clipping norms. The LPIPS values (larger than 0.3) indicate that the current privacy parameter settings are sufficient to resist GLAs. The accuracy results illustrate that the shallow model (LeNet) achieves a better trade-off between accuracy and privacy. When training LeNet under LDP with ε ≤ 10, the test accuracy only drops by 14.6% with LClip and 9.4% with FClip on CIFAR10. However, LDP leads to 41.3% (LClip) and 43.8% (FClip) accuracy loss when training ResNet-18 on CIFAR10. As for ResNet-34 with CIFAR100, LDP makes the resulting model useless, since training yields a near-zero accuracy increment for any value of C that is sufficient to protect privacy. Regarding VGG, LDP results in a 53.2% (LClip) and 49.1% (FClip) degradation in accuracy with CIFAR10 and a 44% (LClip) and 43.4% (FClip) reduction in accuracy with CIFAR100. As for GTSRB, the model accuracy drops by 86.9% under LClip and 85.1% under FClip. It is evident that LDP significantly compromises model utility when applied to deeper models.

⁴The batch normalization employed in ResNets results in interdependence among instances within the same batch. Therefore, a privacy analysis for ResNet cannot be provided.

Based on Table VIII, Table IX shows further accuracy results of training ResNet-18 on CIFAR10 under LDP with various noise multipliers when the clipping norm C = 1. Note that the privacy budget cannot be calculated in this case because of the batch normalization employed in ResNet. We observe that the accuracy does not fluctuate significantly with changes in noise level, and adding less noise does not necessarily improve training performance.

TABLE IX: TEST ACCURACY OF TRAINING RESNET-18 ON CIFAR10 UNDER LDP WITH DIFFERENT NOISE MULTIPLIERS.

For example, the test accuracy with a noise multiplier of σ = 0.1 exhibits a slight decrease compared to the accuracy with a higher noise level of σ = 0.5. If DP is to be incorporated into FL, investigating this phenomenon further would be a worthwhile pursuit for future research.

D. Comparative Discussion of CDP and LDP

Based on the above experimental results, we present a comparison between CDP and LDP in terms of their effectiveness in defending against GLAs and their impact on model utility.

As for the effectiveness of privacy protection, LDP undoubtedly offers stronger privacy guarantees for training samples by adding noise to the gradient before sharing it with the server. It can be seen that the images reconstructed under CDP protection (i.e., Fig. 4) divulge more information than the images reconstructed under LDP protection (i.e., Fig. 6). Even so, the implementation of CDP with LClip can effectively protect the sensitive data in original images, since the reconstructed images appear indistinct and are difficult to distinguish. However, CDP implemented via FClip cannot be used to defend against GLAs in FL systems. This is evidenced by Tables IV and VI, where CDP with FClip fails to protect training samples when applying a reasonable clipping norm, while employing an excessively small clipping norm completely undermines the model training process.

As for model utility, both CDP (with LClip) and LDP achieve a better trade-off between utility and privacy when training shallow models. However, CDP performs better than LDP in model utility when the privacy parameter settings are able to defend against GLAs. As shown in Tables VI and VIII, when training LeNet, CDP can achieve high model accuracy that is comparable to scenarios without DP, while LDP leads to a noticeable reduction in model accuracy. This is because LDP must account for the worst-case privacy leakage of each individual client, necessitating more significant noise to ensure privacy. In CDP, the noise is added after aggregation, which usually allows for higher model accuracy at the same level of privacy protection.

V. IMPROVED GRADIENT LEAKAGE ATTACKS AGAINST DP-FL

In this section, we introduce a targeted attack method against DP-FL that incorporates the clipping operation into the existing attack strategy. Through experiments, we demonstrate the effectiveness of this method.

A. Attack Methodology

In the above, we have corroborated that the clipping operation in both CDP and LDP contributes a lot to protecting against GLAs. Intuitively, the clipping operation results in a shift of the gradient space, thus misleading the existing optimization-based attacks. To enhance the attack effect and suppress the defense effect provided by clipping, we propose to integrate the clipping process into the attack to revise the search space. Specifically, we aim to solve the following optimization problem:

\arg\min_{x} D(\tilde{g}, T(\nabla_\theta \mathcal{L}_\theta(x, y))) + \alpha \, TV(x).  (4)

Here, g̃ denotes the obfuscated gradient shared by the victim client and \nabla_\theta \mathcal{L}_\theta(x, y) is the dummy gradient derived from the dummy sample (x, y), where x is the reconstruction result and y denotes its label (which can be inferred in advance [65]). T(·) is the transformation function, which implements the same clipping operation as the victim. The function D(g₁, g₂) is used to calculate the distance between two gradients, which can be the Euclidean distance ‖g₁ − g₂‖₂ or the cosine distance 1 − ⟨g₁, g₂⟩/(‖g₁‖‖g₂‖). TV(x) is the total variation, used as a simple image prior in the overall optimization problem. By optimizing (4), the attacker can minimize the distance between the original obfuscated gradient and the clipped dummy gradient to achieve better data reconstruction. The optimization problem can be solved by Adam [66] over multiple optimization iterations. In the following, we describe the details of the transformation process for LClip and FClip, and introduce the experimental results of the proposed attack method on CDP and LDP with LClip.

Transformation Process: The core idea of the transformation process is to infer the clipping norm used at the client side and then perform the clipping operation with the same clipping strategy. The intuition is that the l2-norm of the gradient is constrained by the clipping norm, so in turn the clipping norm can be estimated from the l2-norm of the observed gradient.
• LClip: For this layer-wise clipping manner, the adversary can calculate the l2-norm of the observed gradient at each layer and use the maximum value as the clipping norm. After that, during the attack process, the adversary adopts LClip to clip \nabla_\theta \mathcal{L}_\theta(x, y) at each optimization iteration.
• FClip: For this model-wise clipping manner, the adversary uses the l2-norm of the observed gradient as the clipping norm and then performs the FClip process on the gradient derived from the current reconstruction result.

Note that the proposed transformation process can be incorporated into any existing optimization-based GLA. In our experiments, we select InvertGrad as the baseline and instantiate D in (4) with the cosine distance to measure the similarity of gradients.
strict (4) by using cosine distance to measure the similarity of
B. Experimental Results

Our attack breaks the protection of CDP: Since the defense effectiveness of CDP with FClip is limited, we focus on attacking CDP with LClip.


TABLE X: COMPARISON OF QUANTITATIVE RECONSTRUCTION RESULTS (LPIPS↓) WITH INVERTGRAD ON CDP-FL WITH LCLIP UNDER VARIOUS BATCH SIZES, MODELS, AND DATASETS.

TABLE XI: ERROR BARS OF COMPARATIVE RECONSTRUCTION RESULTS (LPIPS↓) WITH INVERTGRAD ON CDP-FL WITH LCLIP UNDER VARIOUS BATCH SIZES, MODELS, AND DATASETS.

Fig. 8. Comparison of the visual reconstruction results of four optimization-based GLAs before and after adding the clipping transformation to the attack process when training the LeNet model on CIFAR10 under the defense of CDP with LClip (C = 10).

Fig. 9. Comparison of visual reconstruction results with InvertGrad when training the LeNet model on CIFAR10 and MNIST under the defense of LDP with LClip (C = 10).

Table X compares the performance of the proposed attack method with InvertGrad on CDP-FL with LClip. We can observe that the proposed attack outperforms InvertGrad in reconstruction quality and completely recovers the original images when the client trains a shallow model with a small batch size. Moreover, the reconstructed images are still recognizable when the batch size is increased. A smaller LPIPS value represents better image reconstruction and, consequently, more severe privacy leakage. We observe that the maximum LPIPS (the worst case) in our reconstruction results is 0.0910, indicating information leakage from the original image.

To demonstrate that our attack (i.e., adding the clipping transformation) can stably achieve better attack performance, we run 10 independent comparative experiments between InvertGrad and our attack (i.e., InvertGrad + Clipping) and present the mean, standard deviation (std.), best-case, median, and worst-case values of LPIPS in Table XI. The conclusion is consistent with Table X: our attack breaks the protection of CDP.

Fig. 8 illustrates that the proposed transformation process can be incorporated into any existing optimization-based GLA and significantly improves its performance on image reconstruction under the protection of CDP. In addition, we have demonstrated the effectiveness of our attack on batch reconstruction by presenting reconstruction results from batch-averaged gradients under various batch sizes, which also shows the superiority of InvertGrad. DLG and iDLG perform badly in batch reconstruction, and the performance of GI significantly deteriorates when the target model (i.e., LeNet) is relatively simple and lacks BN information.

Our attack weakens the protection of LDP: In the context of LDP-FL, due to the noise added to the gradients, it is truly impossible to estimate the accurate clipping norm from the observed obfuscated gradients. However, we still choose to incorporate the clipping transformation into the attack process based on an approximate clipping norm, in order to assess the effectiveness of the proposed attack method. Considering the effectiveness of LClip, we continue to evaluate our attack on LDP with LClip.

Fig. 9 shows the comparative results of InvertGrad and our attack method on LDP-FL. The results show that our attack can recover the original images when LDP is applied with trivially small additive noise (Cσ < 1e-2), while the original InvertGrad is already defended when Cσ = 1e-5.


small additive noise (Cσ < 1e−2), while InvertGrad is already defended at Cσ = 1e−5. This illustrates that adding the clipping operation effectively improves the performance of the original InvertGrad on both CDP and LDP, even when based on an approximate clipping norm. However, LDP is still effective in practice, because such small noise implies a very large privacy budget: for Cσ = 0.01 with C = 10, σ = 0.001 leads to a privacy budget of ε = 3193, which is an unreasonable value in the DP literature. Since it is already difficult to attack LDP with a batch size of 1, we do not further explore the attack capabilities with larger batch sizes as we did for CDP.
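As a rough, back-of-the-envelope check on this order of magnitude (our own illustration; the paper's exact privacy accounting is not restated here), the classical Gaussian-mechanism calibration with sensitivity C and noise standard deviation Cσ relates the noise scale to the budget as

$$
C\sigma \;=\; \frac{C\sqrt{2\ln(1.25/\delta)}}{\varepsilon}
\quad\Longrightarrow\quad
\varepsilon \;=\; \frac{\sqrt{2\ln(1.25/\delta)}}{\sigma},
$$

so σ = 0.001 with, e.g., δ = 10⁻⁵ gives ε ≈ 4.8 × 10³ for a single release, the same order of magnitude as the ε = 3193 quoted above and far beyond any meaningful privacy guarantee.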
VI. DISCUSSION

A. CDP and LDP Techniques in FL

Below, we discuss the existing CDP and LDP techniques in FL, highlighting their strengths and weaknesses, to provide a more holistic perspective of this field and guide future research directions.

Overview of CDP-FL Techniques: In existing CDP-FL techniques, [14] first incorporates CDP into the federated averaging algorithm and demonstrates the feasibility of training recurrent language models with user-level DP. Almost at the same time, [39] also integrates CDP into FL to prevent information about a client's dataset from being inferred from the shared model. Both of these works achieve user-level privacy preservation in the FL environment with only a minor loss in model performance. However, the effectiveness of the privacy protection relies on a large number of clients per round, which might not be practical for applications with smaller client bases. [40] tackles the problem of backdoor attacks in FL by using CDP techniques. Specifically, the proposed Clip-Norm-Decay (CND) reduces the clipping threshold of model updates throughout the training process to mitigate the effects of malicious updates. This paper innovatively adjusts the clipping threshold dynamically, which helps preserve model accuracy better than traditional DP methods. However, CND depends on the specific settings of the clipping threshold and several hyperparameters, which may not generalize across different datasets or attack scenarios. [41] introduces BREM to combine DP and Byzantine robustness in FL, which involves averaging updates from clients over time to smooth out malicious fluctuations and adding noise to the aggregated momentum for privacy. This integration addresses two significant concerns in FL simultaneously and achieves a good privacy-utility trade-off. However, the performance of BREM heavily relies on the correct setting of several parameters (e.g., the clipping threshold and noise levels), which may not be straightforward to tune in practice.

Overview of LDP-FL Techniques: In existing LDP-FL techniques, some works ([16], [42], [43]) directly use LDP in FL, and other works ([44], [45], [46]) explore the application of LDP in FL with a shuffle scheme to improve model utility. [16] formally incorporates LDP into FL to defend against the GLA and demonstrates its effectiveness in mitigating the GLA. This work provides both theoretical guarantees and empirical evidence of its effectiveness in protecting against privacy breaches. However, the adopted attack is weak, and the evaluated network (i.e., a fully connected model with two hidden layers) is too shallow. [42] applies LDP to FL and addresses the dimension dependency problem in LDP by privately selecting the top-k dimensions of the gradient updates. The newly added dimension selection mechanism effectively enhances the model's utility and improves convergence. However, the additional computational overhead of the dimension selection mechanism can be burdensome for resource-constrained devices. [43] enables clients to customize their LDP privacy budget locally and replaces LDP with condensed LDP (CLDP) to handle large-scale model parameter updates. However, this work requires more communication rounds between the cloud and clients due to the high variance of the noise.

Regarding the LDP-FL techniques with a shuffle scheme, they randomly permute the locally obfuscated updates before transmitting them to the server. In particular, [44] proposes a new LDP mechanism for FL with DNNs, which applies data perturbation with an adaptive range and parameter shuffling to each client's weights. This work considers the necessity of adapting to varying weight ranges in different DNN layers, which helps reduce variance and enhance model accuracy. However, the adaptive range setting and parameter shuffling mechanisms introduce additional computational overhead, which may be challenging for resource-constrained devices. [45] proposes a communication-efficient and local differentially private stochastic gradient descent (CLDP-SGD) algorithm for FL. The proposed method improves model performance by combining shuffling and subsampling techniques to amplify privacy, and proves that communication efficiency can be achieved without compromising privacy. However, its effectiveness hinges on the reliability and integrity of the shuffler, introducing a potential single point of failure. [46] aims to integrate the advantages of both CDP and LDP in FL by utilizing the shuffle model. The proposed FLAME combines the strengths of CDP and LDP, boosting the accuracy of the central model and ensuring strong privacy without relying on any trusted party. However, the effectiveness of the shuffle model might diminish with very large-scale data or in cases where the shuffler becomes a bottleneck for some clients.

Summary and Future Directions: On the whole, research on DP-FL focuses on the trade-off between the level of privacy assurance and model performance. CDP-FL techniques achieve better model performance than LDP-FL approaches, but require strong trust assumptions to ensure the effectiveness of the privacy protection. Moreover, our proposed attack has demonstrated that CDP can hardly protect FL from GLAs. Hence, it is advisable to enhance security in FL systems by combining CDP with cryptographic techniques (e.g., secure multi-party computation), rather than using CDP alone. Most LDP-FL techniques experience reduced accuracy and employ various methods to enhance model performance. However, existing works require more complex mechanisms on the client side, which can introduce computational overhead and implementation challenges. Hence, future research requires the development of lightweight and scalable LDP-FL mechanisms that ensure a balanced utility-privacy trade-off in FL frameworks.


B. Defensive Strategies

In the following, we outline several countermeasures that can defend against our proposed attack and enhance the robustness of FL systems.

LDP Ensuring Model Utility: It is difficult for our improved attack to reconstruct data from LDP-FL with a reasonable privacy budget. However, FL with LDP mechanisms often underperforms in terms of model utility, and further research is necessary to address this issue. For instance, LDP mechanisms implementing adaptive clipping and noise injection could help balance privacy and utility. Indeed, selecting the appropriate factors in the FL process to control the adjustment of the clipping threshold and the privacy budget presents a significant challenge.
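As one possible direction, the sketch below illustrates quantile-style adaptive clipping in the spirit of [54] combined with an LDP-style Gaussian perturbation. The function names, the update rule, and the parameter values are our assumptions for illustration, not a mechanism proposed in this paper.

```python
import numpy as np

def adapt_clip_threshold(update_norms, clip_c, target_quantile=0.5, eta=0.2):
    """Geometrically move the clipping threshold toward a target quantile of
    the observed client-update norms (quantile-style adaptation as in [54])."""
    frac_below = float(np.mean(np.asarray(update_norms) <= clip_c))
    # Shrink C when most updates already fit within it, grow it otherwise.
    return clip_c * np.exp(-eta * (frac_below - target_quantile))

def perturb_update(update, clip_c, sigma, seed=None):
    """Clip a flattened update to L2 norm clip_c, then add Gaussian noise with
    standard deviation sigma * clip_c (LDP-style perturbation on the client)."""
    rng = np.random.default_rng(seed)
    scale = min(1.0, clip_c / (np.linalg.norm(update) + 1e-12))
    return update * scale + rng.normal(0.0, sigma * clip_c, size=update.shape)
```

Note that in a full mechanism the quantile statistic itself must be released privately and therefore consumes part of the budget; that accounting is omitted in this sketch.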
CDP Combined with Cryptographic Protocols: We have remarked that CDP cannot resist the improved GLA proposed in this paper. This is because the adversary has access to the clipped update from each individual client. In order to preserve the utility of CDP while preventing GLAs, the combination of CDP with cryptographic protocols can ensure that the central server only obtains the aggregated result. One promising tool is secure multi-party computation, where each client encrypts (or masks) its local update and sends it to the server. The server then computes the aggregated update over the encrypted updates and refreshes the global model.
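To illustrate the idea, the following is a minimal sketch of pairwise additive masking in the spirit of secure aggregation [34]. Real protocols derive the shared seeds via key agreement and handle client dropouts, both of which are omitted here, and all names are hypothetical.

```python
import numpy as np

def mask_update(update, client_id, peer_ids, pairwise_seeds):
    """Add cancelling pairwise masks so the server learns only the sum.
    pairwise_seeds[(i, j)] is a seed shared by clients i and j, assumed to be
    established out of band (e.g., via a Diffie-Hellman key agreement)."""
    masked = update.astype(np.float64).copy()
    for peer in peer_ids:
        seed = pairwise_seeds[tuple(sorted((client_id, peer)))]
        mask = np.random.default_rng(seed).standard_normal(update.shape)
        # The two endpoints add the same mask with opposite signs,
        # so every pairwise mask cancels exactly in the server-side sum.
        masked += mask if client_id < peer else -mask
    return masked
```

Because the masks cancel in the aggregate, the server can add CDP noise to the sum and update the global model without ever observing an individual clipped update, which is exactly the quantity our improved attack exploits.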
Appropriate Training Parameter Settings: Setting training parameters appropriately, such as the batch size and the number of local updates, can significantly enhance the robustness of FL systems. Based on our experimental observations, GLAs struggle to reconstruct data from updates produced with large batch sizes and multiple local updates. Therefore, for clients with sufficient computational resources, increasing the batch size and the number of local updates can be an effective defense strategy.
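For illustration, a client-side routine along the following lines (our sketch, not the paper's training code; all names are hypothetical) shares only a multi-epoch, batch-averaged model delta rather than a single per-batch gradient:

```python
import torch

def local_update(model, loader, epochs=5, lr=0.01):
    """Run several local epochs over large batches and return only the
    accumulated model delta, which is what leaves the client instead of a
    single per-sample or per-batch gradient."""
    initial = {k: v.detach().clone() for k, v in model.state_dict().items()}
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    loss_fn = torch.nn.CrossEntropyLoss()
    for _ in range(epochs):
        for x, y in loader:  # loader should be built with a large batch_size
            opt.zero_grad()
            loss_fn(model(x), y).backward()
            opt.step()
    return {k: model.state_dict()[k] - initial[k] for k in initial}
```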
VII. CONCLUSION

This work investigates the resistance of CDP and LDP to GLAs in FL through extensive experiments. Our evaluation shows that CDP using the per-layer clipping strategy and LDP with a reasonable privacy guarantee (ε ≤ 10) can defend against SOTA GLAs. Meanwhile, both CDP and LDP achieve good trade-offs between model utility and privacy protection only under shallow networks. Finally, we propose to incorporate gradient clipping into the attack process, which effectively recovers private training data from DP-FL. Hence, the vulnerability of DP-FL remains serious under increasingly powerful attackers. We are still at an early stage of studying this problem, and many more explorations remain to be done.

Limitation & Future Works: In this paper, we only discussed the effect of DP in defending against SOTA optimization-based attacks [67], considering that they are more powerful and general than analytic-based attacks [27], [28], especially in the FL scenario. Although analytic-based attack methods may not be as practical, their interpretability may help us better understand the defense of DP against GLAs. Therefore, in our future work, we will pay more attention to the effectiveness of DP in defending against such attacks with more theoretical and experimental analysis.

REFERENCES

[1] B. McMahan, E. Moore, D. Ramage, S. Hampson, and B. A. y Arcas, “Communication-efficient learning of deep networks from decentralized data,” in Proc. Int. Conf. Artif. Intell. Statist., 2017, pp. 1273–1282.
[2] B. Hui, Y. Yang, H. Yuan, P. Burlina, N. Z. Gong, and Y. Cao, “Practical blind membership inference attack via differential comparisons,” 2021, arXiv:2101.01341.
[3] M. Nasr, R. Shokri, and A. Houmansadr, “Comprehensive privacy analysis of deep learning,” in Proc. IEEE Symp. Secur. Privacy, 2018, pp. 1–15.
[4] A. Salem, Y. Zhang, M. Humbert, P. Berrang, M. Fritz, and M. Backes, “ML-leaks: Model and data independent membership inference attacks and defenses on machine learning models,” 2018, arXiv:1806.01246.
[5] L. Song and P. Mittal, “Systematic evaluation of privacy risks of machine learning models,” in Proc. 30th USENIX Secur. Symp., 2021, pp. 2615–2632.
[6] K. Ganju, Q. Wang, W. Yang, C. A. Gunter, and N. Borisov, “Property inference attacks on fully connected neural networks using permutation invariant representations,” in Proc. ACM SIGSAC Conf. Comput. Commun. Secur., 2018, pp. 619–633.
[7] L. Melis, C. Song, E. De Cristofaro, and V. Shmatikov, “Exploiting unintended feature leakage in collaborative learning,” in Proc. IEEE Symp. Secur. Privacy, 2019, pp. 691–706.
[8] Z. Wang, K. Liu, J. Hu, J. Ren, H. Guo, and W. Yuan, “Attrleaks on the edge: Exploiting information leakage from privacy-preserving co-inference,” Chin. J. Electron., vol. 32, no. 1, pp. 1–12, 2023.
[9] L. Zhu, Z. Liu, and S. Han, “Deep leakage from gradients,” in Proc. Adv. Neural Inf. Process. Syst., 2019, pp. 14747–14756.
[10] B. Zhao, K. R. Mopuri, and H. Bilen, “iDLG: Improved deep leakage from gradients,” 2020, arXiv:2001.02610.
[11] J. Geiping, H. Bauermeister, H. Dröge, and M. Moeller, “Inverting gradients - how easy is it to break privacy in federated learning?,” in Proc. Adv. Neural Inf. Process. Syst., 2020, pp. 16937–16947.
[12] H. Yin, A. Mallya, A. Vahdat, J. M. Alvarez, J. Kautz, and P. Molchanov, “See through gradients: Image batch recovery via GradInversion,” in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit., 2021, pp. 16337–16346.
[13] Z. Wang, M. Song, Z. Zhang, Y. Song, Q. Wang, and H. Qi, “Beyond inferring class representatives: User-level privacy leakage from federated learning,” in Proc. IEEE Conf. Comput. Commun., 2019, pp. 2512–2520.
[14] H. B. McMahan, D. Ramage, K. Talwar, and L. Zhang, “Learning differentially private recurrent language models,” 2017, arXiv:1710.06963.
[15] A. Cheng, P. Wang, X. S. Zhang, and J. Cheng, “Differentially private federated learning with local regularization and sparsification,” in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit., 2022, pp. 10122–10131.
[16] W. Wei, L. Liu, Y. Wu, G. Su, and A. Iyengar, “Gradient-leakage resilient federated learning,” in Proc. IEEE 41st Int. Conf. Distrib. Comput. Syst., 2021, pp. 797–807.
[17] W. Wei and L. Liu, “Gradient leakage attack resilient deep learning,” IEEE Trans. Inf. Forensics Secur., vol. 17, pp. 303–316, 2021.
[18] R. Shokri, M. Stronati, C. Song, and V. Shmatikov, “Membership inference attacks against machine learning models,” in Proc. IEEE Symp. Secur. Privacy, 2017, pp. 3–18.
[19] M. Naseri, J. Hayes, and E. De Cristofaro, “Local and central differential privacy for robustness and privacy in federated learning,” 2020, arXiv:2009.03561.
[20] N. Carlini, C. Liu, Ú. Erlingsson, J. Kos, and D. Song, “The secret sharer: Evaluating and testing unintended memorization in neural networks,” in Proc. USENIX Secur. Symp., 2019, pp. 267–284.
[21] B. Jayaraman and D. Evans, “Evaluating differentially private machine learning in practice,” in Proc. USENIX Secur. Symp., 2019, pp. 1895–1912.
[22] J. Hu et al., “Shield against gradient leakage attacks: Adaptive privacy-preserving federated learning,” IEEE/ACM Trans. Netw., vol. 32, no. 2, pp. 1407–1422, Apr. 2024, doi: 10.1109/TNET.2023.3317870.
[23] Z. Li, J. Zhang, L. Liu, and J. Liu, “Auditing privacy defenses in federated learning via generative gradient leakage,” in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit., 2022, pp. 10132–10142.
[24] K. Yue, R. Jin, C.-W. Wong, D. Baron, and H. Dai, “Gradient obfuscation gives a false sense of security in federated learning,” 2022, arXiv:2206.04055.
[25] A. Hatamizadeh et al., “GradViT: Gradient inversion of vision transformers,” in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit., 2022, pp. 10021–10030.
[26] T. S. Brisimi, R. Chen, T. Mela, A. Olshevsky, I. C. Paschalidis, and W. Shi, “Federated learning of predictive models from federated electronic health records,” Int. J. Med. Informat., vol. 112, pp. 59–67, 2018.


[27] C. Chen and N. D. Campbell, “Understanding training-data leakage from gradients in neural networks for image classification,” 2021, arXiv:2111.10178.
[28] J. Zhu and M. Blaschko, “R-GAP: Recursive gradient attack on privacy,” 2020, arXiv:2010.07733.
[29] F. Boenisch, A. Dziedzic, R. Schuster, A. S. Shamsabadi, I. Shumailov, and N. Papernot, “When the curious abandon honesty: Federated learning is not private,” 2021, arXiv:2112.02918.
[30] L. Fowl, J. Geiping, W. Czaja, M. Goldblum, and T. Goldstein, “Robbing the fed: Directly obtaining private data in federated learning with modified models,” 2021, arXiv:2110.13057.
[31] J. C. Zhao, A. Sharma, A. R. Elkordy, Y. H. Ezzeldin, S. Avestimehr, and S. Bagchi, “Secure aggregation in federated learning is not private: Leaking user data at large scale through model modification,” 2023, arXiv:2303.12233.
[32] Y. Aono et al., “Privacy-preserving deep learning via additively homomorphic encryption,” IEEE Trans. Inf. Forensics Secur., vol. 13, no. 5, pp. 1333–1345, May 2018.
[33] C. Zhang, S. Li, J. Xia, W. Wang, F. Yan, and Y. Liu, “BatchCrypt: Efficient homomorphic encryption for cross-silo federated learning,” in Proc. USENIX Annu. Tech. Conf., 2020, pp. 493–506.
[34] K. Bonawitz et al., “Practical secure aggregation for privacy-preserving machine learning,” in Proc. ACM SIGSAC Conf. Comput. Commun. Secur., 2017, pp. 1175–1191.
[35] G. Xu, H. Li, S. Liu, K. Yang, and X. Lin, “VerifyNet: Secure and verifiable federated learning,” IEEE Trans. Inf. Forensics Secur., vol. 15, pp. 911–926, 2019.
[36] D. Pasquini, D. Francati, and G. Ateniese, “Eluding secure aggregation in federated learning via model inconsistency,” in Proc. ACM SIGSAC Conf. Comput. Commun. Secur., 2022, pp. 2429–2443.
[37] M. Lam, G.-Y. Wei, D. Brooks, V. J. Reddi, and M. Mitzenmacher, “Gradient disaggregation: Breaking privacy in federated learning by reconstructing the user participant matrix,” in Proc. Int. Conf. Mach. Learn., 2021, pp. 5959–5968.
[38] C. Dwork, “Differential privacy,” in Proc. Int. Colloq. Automata, Languages, and Programming, Berlin, Germany: Springer, 2006, pp. 1–12.
[39] R. C. Geyer, T. Klein, and M. Nabi, “Differentially private federated learning: A client level perspective,” 2017, arXiv:1712.07557.
[40] L. Miao, W. Yang, R. Hu, L. Li, and L. Huang, “Against backdoor attacks in federated learning with differential privacy,” in Proc. IEEE Int. Conf. Acoust. Speech Signal Process., 2022, pp. 2999–3003.
[41] X. Gu, M. Li, and L. Xiong, “DP-BREM: Differentially-private and Byzantine-robust federated learning with client momentum,” 2023, arXiv:2306.12608.
[42] R. Liu, Y. Cao, M. Yoshikawa, and H. Chen, “FedSel: Federated SGD under local differential privacy with top-k dimension selection,” in Proc. 25th Int. Conf. Database Syst. Adv. Appl., Jeju, South Korea, 2020, pp. 485–501.
[43] S. Truex, L. Liu, K.-H. Chow, M. E. Gursoy, and W. Wei, “LDP-Fed: Federated learning with local differential privacy,” in Proc. 3rd ACM Int. Workshop Edge Syst. Anal. Netw., 2020, pp. 61–66.
[44] L. Sun, J. Qian, and X. Chen, “LDP-FL: Practical private aggregation in federated learning with local differential privacy,” 2020, arXiv:2007.15789.
[45] A. Girgis, D. Data, S. Diggavi, P. Kairouz, and A. T. Suresh, “Shuffled model of differential privacy in federated learning,” in Proc. Int. Conf. Artif. Intell. Statist., 2021, pp. 2521–2529.
[46] R. Liu, Y. Cao, H. Chen, R. Guo, and M. Yoshikawa, “FLAME: Differentially private federated learning in the shuffle model,” in Proc. AAAI Conf. Artif. Intell., 2021, pp. 8688–8696.
[47] Y. Huang, S. Gupta, Z. Song, K. Li, and S. Arora, “Evaluating gradient inversion attacks and defenses in federated learning,” in Proc. Adv. Neural Inf. Process. Syst., 2021, pp. 7232–7241.
[48] J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li, and L. Fei-Fei, “ImageNet: A large-scale hierarchical image database,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., 2009, pp. 248–255.
[49] P. Sun et al., “Pain-FL: Personalized privacy-preserving incentive for federated learning,” IEEE J. Sel. Areas Commun., vol. 39, no. 12, pp. 3805–3820, Dec. 2021.
[50] X. Pang, Z. Wang, D. Liu, J. C. Lui, Q. Wang, and J. Ren, “Towards personalized privacy-preserving truth discovery over crowdsourced data streams,” IEEE/ACM Trans. Netw., vol. 30, no. 1, pp. 327–340, Feb. 2022.
[51] C. Dwork et al., “The algorithmic foundations of differential privacy,” Found. Trends Theor. Comput. Sci., vol. 9, no. 3/4, pp. 211–407, 2014.
[52] M. Abadi et al., “Deep learning with differential privacy,” in Proc. ACM SIGSAC Conf. Comput. Commun. Secur., 2016, pp. 308–318.
[53] X. Chen, S. Z. Wu, and M. Hong, “Understanding gradient clipping in private SGD: A geometric perspective,” in Proc. Adv. Neural Inf. Process. Syst., 2020, pp. 13773–13782.
[54] G. Andrew, O. Thakkar, B. McMahan, and S. Ramaswamy, “Differentially private learning with adaptive clipping,” in Proc. Adv. Neural Inf. Process. Syst., 2021, pp. 17455–17466.
[55] Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner, “Gradient-based learning applied to document recognition,” Proc. IEEE, vol. 86, no. 11, pp. 2278–2324, Nov. 1998.
[56] K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recognition,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., 2016, pp. 770–778.
[57] K. Simonyan and A. Zisserman, “Very deep convolutional networks for large-scale image recognition,” 2014, arXiv:1409.1556.
[58] L. Deng, “The MNIST database of handwritten digit images for machine learning research,” IEEE Signal Process. Mag., vol. 29, no. 6, pp. 141–142, Nov. 2012.
[59] G. Cohen, S. Afshar, J. Tapson, and A. van Schaik, “EMNIST: Extending MNIST to handwritten letters,” in Proc. Int. Joint Conf. Neural Netw., 2017, pp. 2921–2926.
[60] Y. Le and X. Yang, “Tiny ImageNet visual recognition challenge,” CS 231N, vol. 7, no. 7, 2015, Art. no. 3.
[61] J. Stallkamp, M. Schlipsing, J. Salmen, and C. Igel, “Man vs. computer: Benchmarking machine learning algorithms for traffic sign recognition,” Neural Netw., vol. 32, pp. 323–332, 2012.
[62] A. Krizhevsky et al., “Learning multiple layers of features from tiny images,” Univ. Toronto, Toronto, Canada, Tech. Rep., 2009.
[63] K. R. Castleman, Digital Image Processing. Hoboken, NJ, USA: Prentice Hall Press, 1996.
[64] R. Zhang, P. Isola, A. A. Efros, E. Shechtman, and O. Wang, “The unreasonable effectiveness of deep features as a perceptual metric,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., 2018, pp. 586–595.
[65] A. Wainakh et al., “User-level label leakage from gradients in federated learning,” Proc. Privacy Enhancing Technol., vol. 2022, no. 2, pp. 227–244, 2022.
[66] D. P. Kingma and J. Ba, “Adam: A method for stochastic optimization,” 2014, arXiv:1412.6980.
[67] H. Yang, M. Ge, D. Xue, K. Xiang, H. Li, and R. Lu, “Gradient leakage attacks in federated learning: Research frontiers, taxonomy and future directions,” IEEE Netw., vol. 38, no. 2, pp. 247–254, Mar. 2024.

Jiahui Hu received the MS degree in cyber security from Wuhan University, China, in 2019. She is currently working toward the PhD degree with the School of Cyber Science and Engineering, Zhejiang University. Her research interest focuses on federated learning and privacy.

Jiacheng Du received the BS degree from the Hefei University of Technology, China, in 2023. He is currently working toward the master's degree with Zhejiang University. His main research interests include federated learning and privacy-preserving machine learning systems.


Zhibo Wang (Senior Member, IEEE) received the BE degree in automation from Zhejiang University, China, in 2007, and the PhD degree in electrical engineering and computer science from the University of Tennessee, Knoxville, in 2014. He is currently a professor with the School of Cyber Science and Technology, Zhejiang University, China. His current research interests include Internet of Things, AI security, and data security and privacy. He is a member of ACM.

Xiaoyi Pang received the BE degree in information security and the PhD degree in cyberspace security from the School of Cyber Science and Engineering, Wuhan University, in 2018 and 2023, respectively. Her research interests focus on edge intelligence, collaborative computing, IoT and its security, and privacy-preserving mobile crowdsensing systems.

Yajie Zhou received the BS degree from the Huazhong University of Science and Technology, China, in 2023. She is currently working toward the PhD degree with the School of Cyber Science and Technology, Zhejiang University. Her main research interests include edge intelligence and Internet of Things.

Peng Sun received the BE degree in automation from Tianjin University, China, in 2015, and the PhD degree in control science and engineering from Zhejiang University, China, in 2020. From 2020 to 2022, he worked as a postdoctoral researcher with the School of Science and Engineering, The Chinese University of Hong Kong, Shenzhen. He is currently an associate professor with the College of Computer Science and Electronic Engineering, Hunan University, China. His research interests include Internet of Things, mobile crowdsensing, and federated learning.

Kui Ren (Fellow, IEEE) received the PhD degree from the Worcester Polytechnic Institute. He is currently a professor and the associate dean with the College of Computer Science and Technology, Zhejiang University, where he also directs the School of Cyber Science and Technology. Before that, he was the SUNY Empire Innovation Professor with the State University of New York at Buffalo. His current research interests include data security, IoT security, AI security, and privacy. He received the Guohua Distinguished Scholar Award from ZJU in 2020, the IEEE CISTC Technical Recognition Award in 2017, the SUNY Chancellor's Research Excellence Award in 2017, the Sigma Xi Research Excellence Award in 2012, and the NSF CAREER Award in 2011. He has published extensively in peer-reviewed journals and conferences and received the Test-of-Time Paper Award from IEEE INFOCOM and many Best Paper Awards from IEEE and ACM, including MobiSys'20, ICDCS'20, Globecom'19, ASIACCS'18, ICDCS'17, etc. His h-index is 74, and his total publication citation count exceeds 32000 according to Google Scholar. He is a fellow of ACM and a Clarivate Highly Cited Researcher. He is a frequent reviewer for funding agencies internationally and serves on the editorial boards of many IEEE and ACM journals. He currently serves as Chair of SIGSAC of ACM China.
