Does Differential Privacy Really Protect Federated Learning From Gradient Leakage Attacks?
Local Differential Privacy (LDP), where the server is untrusted and the DP mechanism is performed on the clients. Specifically, to prevent privacy breaches from the untrusted server, clients clip and add noise to the gradient during training before sharing the model update with the untrusted server.

Prior works have shown that DP can mitigate membership inference attacks and property inference attacks [18], [19], [20], [21]. However, as an emerging privacy attack paradigm, how gradient leakage attacks behave against DP-FL has not been thoroughly investigated. On the one hand, previous studies [16], [17], [22] have proposed to leverage DP to defend against GLAs in FL. However, their effectiveness is demonstrated by only a limited evaluation based on relatively early GLA schemes [9], [10]. On the other hand, some studies [23], [24] have indicated that DP may not be sufficient to provide a defense against GLAs in FL. However, [23] only separately evaluated the performance of gradient clipping and gradient perturbation, and [24] evaluated the performance of LDP with a small number of experiments and did not adjust privacy parameters such as the clipping value. In short, neither work conducts a comprehensive evaluation of LDP and CDP, and thus neither can fully demonstrate the ineffectiveness of DP. The conclusions of these two lines of work contradict each other, and neither side has sufficient evidence. Hence, it is urgent to comprehensively evaluate the effectiveness of CDP and LDP in resisting GLAs in FL.

In this paper, we concentrate on answering the following question: Does Differential Privacy Really Protect Federated Learning from Gradient Leakage Attacks? In particular, we consider this question from the perspective of the trade-off between privacy and utility. That is, we aim to investigate whether DP methods can protect FL from GLAs without significantly reducing model accuracy. To this end, we first comprehensively evaluate the performance of DP-FL against GLAs by varying the privacy parameters (e.g., noise multiplier, privacy budget, clipping norm), privacy operations (e.g., clipping and perturbation), and clipping strategies (e.g., layer-wise clipping and model-wise clipping). Then, we explore the vulnerability of DP-FL under GLAs by introducing an improved attack that incorporates the clipping operation into existing attack schemes to undermine the defense of DP. With our evaluations, we can validate the effectiveness as well as reveal the vulnerability of DP in FL under GLAs. We hope that this work can provide suggestions on how to better use DP for defending against GLAs in FL, while also offering guidance for the design of privacy-preserving FL.

The main contributions of this paper are summarized as follows:
• We evaluate the resistance of CDP and LDP to GLAs on various model architectures and datasets. The results validate that CDP can effectively defend against GLAs when using the per-layer clipping strategy, and that LDP is effective when adopting a reasonable perturbation level (i.e., non-trivial noise). Moreover, CDP and LDP only achieve a good trade-off between model utility and privacy protection on shallow networks.
• We illustrate the influence of privacy parameters, operations, and clipping strategies on the effectiveness of DP in defending against GLAs. The results demonstrate the superiority of the clipping operation and the per-layer clipping manner in resisting GLAs.
• We propose a novel GLA approach that incorporates the clipping operation to enhance the attack effect against DP-FL. The results show that our proposed attack can completely recover private training data from FL under CDP and reveal more information than SOTA attacks under LDP, which verifies the vulnerability of DP-FL.

The remainder of this paper is organized as follows. In Section II, a review of GLAs and privacy preservation technologies in FL is presented. In Section III, we introduce the attack process of GLAs, typical GLA algorithms, and differentially private FL. In Section IV, a comprehensive evaluation of CDP and LDP against GLAs is conducted. The proposed GLA approach targeting DP-FL and its corresponding performance evaluation are presented in Section V. In Section VI, we discuss the existing DP-FL techniques and recommend several strategies to defend against GLAs. Finally, the paper is concluded in Section VII.

II. RELATED WORK

A. Gradient Leakage Attacks

The privacy vulnerability of gradients was first revealed by Wang et al. [13], who reconstructed original training images from shared updates in FL. After that, GLA was formally proposed in [9] and immediately improved in [10], where the attacker can recover pixel-level training images and token-level training texts from the model's corresponding shared gradients. Most existing GLAs are optimization-based attacks, in which the attacker reconstructs the training sample by generating a dummy input that minimizes the distance between the original gradient and the dummy gradient derived from the dummy input. Based on this, some recent GLAs focus on more complicated learning tasks (e.g., transformers [25], [26] and deeper networks [11]) and achieve better reconstruction performance with the averaged gradients of large batches and high-resolution inputs [11], [12]. Besides, there are also some analytics-based GLAs [27], [28], which can recover private training data from the gradient by solving a linear system of equations, but they only work well when reconstructing a single or a few data samples. The above attacks (i.e., passive attacks) usually assume that the server is honest-but-curious, i.e., it infers the privacy of the training data but honestly follows the FL protocol.

More recent GLAs [29], [30], [31], called active attacks, assume a malicious server and modify the shared model's architecture and weights to achieve more powerful recovery of user data. However, [29] only works on simple networks (e.g., CNNs with a few convolutional layers), and other works [30], [31] require inserting a few linear layers at the start of the original model, which is easy to detect. Hence, considering the likelihood of such attacks occurring in real FL scenarios, we choose to focus on passive attacks in this paper.

B. Privacy Preservation in Federated Learning

Existing research on privacy protection in FL can generally be divided into cryptography-based and DP-based methods.
The commonly used technology in cryptography-based methods is secure multi-party computation (MPC). MPC can compute the output of a function over a set of private inputs from clients in a secure manner and only reveal the intended output to the server. Specifically, FL uses homomorphic encryption [32], [33] or secret sharing [34], [35] among MPC technologies to achieve secure aggregation, which protects individual gradients from server access. Cryptography-based methods can generally guarantee the security and privacy of individual gradients without compromising their utility. However, these approaches significantly increase the computation time, communication bandwidth, and data storage. Moreover, recent research indicates that the server can still infer privacy from aggregated gradients [36], [37].

DP [38] is also widely adopted in FL due to its lightweight overhead and flexibility. DP-based FL can be divided into two categories according to the trust level of the server. CDP-based FL (CDP-FL) [14], [39], [40], [41] always assumes a trusted server to perturb the aggregated updates, which provides higher model accuracy but a weaker privacy guarantee. Although the clipping operation in CDP-FL can be performed on either the client or the server side, it is typically executed on the client side to enhance privacy. LDP-based FL (LDP-FL) [16], [42], [43], [44], [45], [46] can work under an untrusted server and perturbs the updated gradients locally on the client side. However, despite offering a stronger privacy guarantee, LDP-FL tends to compromise model accuracy. To improve the model utility, the shuffle scheme has been employed in recent LDP works [44], [45], [46]. The adopted shuffler permutes the locally obfuscated updates before transmitting them to the server (i.e., it anonymizes the updates from clients), allowing less noise to achieve the same level of privacy.

Indeed, DP is the mainstream privacy protection technology in FL. Prior works have shown that DP can mitigate membership inference attacks and property inference attacks [18], [19], [20], [21]. As an emerging privacy attack paradigm, how GLAs behave against DP-FL has not been thoroughly investigated.

III. PRELIMINARIES

A. GLA Algorithms

We introduce the workflow of optimization-based GLAs (i.e., our focus in this paper) and present four typical GLA methods.

Attack Workflow: The attacker first obtains the victim's gradient information g and the global model with parameters θ through eavesdropping; if the attacker is the server, it can directly obtain the model and gradient information. Then, the attacker introduces some dummy data (x, y) (including inputs x and corresponding labels y) and performs a similar model training on them, yielding dummy gradients ∇_θ L_θ(x, y). The attacker iteratively updates the dummy data to minimize the distance between the dummy gradient and the victim client's gradient. After a certain number of iterations, the attacker obtains dummy data samples that are close to the victim's private data.

The existing optimization-based attacks generally follow the above process but mainly differ in their optimization objectives, optimizers, and label prediction approaches. We list four representative GLA studies below.

DLG [9]: This work formulated the gradient matching problem as (1), where the Euclidean distance is used to measure the similarity between the dummy gradient and the original gradient, and L-BFGS is adopted as the optimizer. As for label prediction, the authors synchronously update the input and the label information during optimization.

    \arg\min_{x, y} \|\nabla_\theta \mathcal{L}_\theta(x, y) - g\|^2 .    (1)

iDLG [10]: A follow-up work of DLG [9], which proposed a simple approach to extract the ground-truth labels from the gradient and then optimize (1) with the extracted label using L-BFGS.

DLG and iDLG are only effective when reconstructing low-resolution images with a maximum batch size of 8. Moreover, according to our experiments, they mainly have stable performance when reconstructing a single image.

InvertGrad [11]: The authors substituted the Euclidean distance with the cosine distance and added the total variation (TV) norm to the optimization function. TV serves as a regularization term to recover more realistic data from the average gradient of large batches. Therefore, the optimization problem is formulated as

    \arg\min_{x} \; 1 - \frac{\langle \nabla_\theta \mathcal{L}_\theta(x, y), g \rangle}{\|\nabla_\theta \mathcal{L}_\theta(x, y)\| \, \|g\|} + \alpha_{TV} R_{TV}(x) ,    (2)

where R_{TV}(x) denotes the total variation and α_{TV} controls the weight of the regularization term in the objective function. In this work, the authors inferred the label information in advance and used the Adam optimizer to solve the optimization problem.

GI [12]: This work still used the Euclidean distance, but added the total variation norm, the l2-norm, and a batch normalization (BN) term to a regularization term denoted R_f. Meanwhile, the regularization jointly adds a group consistency term R_g by considering multiple random seeds for initialization and calculating the averaged data as a reference. Then, the optimization problem is

    \arg\min_{x} \|\nabla_\theta \mathcal{L}_\theta(x, y) - g\|^2 + \alpha_f R_f(x) + \alpha_g R_g(x) .    (3)

Among all these attacks, InvertGrad¹ and GI are the SOTA attacks widely used in recent works [23], [47]. GI achieves better reconstruction performance than InvertGrad when the input has a higher resolution. However, GI is based on a strong assumption that the attacker knows the exact BatchNorm (BN) statistics of the private batch. The ablation analysis of BN statistics has been well discussed in [47], and the results showed that reconstructing a batch of high-resolution images remains elusive without knowing the BN statistics. Moreover, the BN statistics of a single batch of samples are lost during the multiple iterations of forward propagation in local training with federated averaging (FedAvg), which is commonly used in the FL community. Hence, in our paper, we select InvertGrad as the default attack scheme unless otherwise specified.

¹The official implementation: https://github.com/JonasGeiping/invertinggradients [11].

In the above, we have theoretically analyzed the differences between InvertGrad and GI.
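To make the optimization-based attack workflow concrete, the following is a minimal sketch of gradient matching with the cosine objective and a TV regularizer in the spirit of (2). It is illustrative only: the dummy-data initialization, the optimizer, and the hyperparameters (iters, alpha_tv, lr) are assumptions rather than the settings used in our evaluation, and the official InvertGrad implementation should be consulted for the exact procedure.

```python
import torch
import torch.nn.functional as F

def total_variation(x):
    """Anisotropic total variation of an image batch of shape (N, C, H, W)."""
    dh = (x[:, :, 1:, :] - x[:, :, :-1, :]).abs().mean()
    dw = (x[:, :, :, 1:] - x[:, :, :, :-1]).abs().mean()
    return dh + dw

def gradient_matching_attack(model, target_grads, label, shape,
                             iters=2000, alpha_tv=1e-4, lr=0.1):
    """Recover a dummy input whose gradient matches the observed gradient g.

    target_grads: list of tensors, the victim's shared gradients (one per parameter).
    label: inferred ground-truth label tensor of shape (N,), e.g., extracted as in iDLG.
    shape: shape of the dummy input, e.g., (1, 3, 32, 32).
    """
    x_dummy = torch.randn(shape, requires_grad=True)   # randomly initialized dummy input
    optimizer = torch.optim.Adam([x_dummy], lr=lr)

    for _ in range(iters):
        optimizer.zero_grad()
        loss = F.cross_entropy(model(x_dummy), label)
        dummy_grads = torch.autograd.grad(loss, list(model.parameters()),
                                          create_graph=True)
        # Cosine distance between flattened dummy and target gradients, cf. (2).
        dg = torch.cat([g.flatten() for g in dummy_grads])
        tg = torch.cat([g.flatten() for g in target_grads])
        rec_loss = 1 - F.cosine_similarity(dg, tg, dim=0) \
                   + alpha_tv * total_variation(x_dummy)
        rec_loss.backward()   # differentiates through the dummy gradients
        optimizer.step()
    return x_dummy.detach()
```

The same loop structure covers DLG and iDLG by swapping the cosine term for the Euclidean distance of (1) and using L-BFGS instead of Adam.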
TABLE I: Illustration of two clipping strategies.

Algorithm 2: Local Differential Privacy based Federated Learning (LDP-FL).
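To give a concrete picture of the two clipping strategies (per-layer clipping, LClip, vs. clipping the whole update, FClip) and of the client-side perturbation in LDP-FL, a minimal sketch follows. The Gaussian noise with standard deviation σ·C and the helper names (fclip, lclip, ldp_perturb) are illustrative assumptions; they do not reproduce the exact procedure summarized in Algorithm 2.

```python
import torch

def fclip(grads, C):
    """FClip: scale the whole update so that its global l2-norm is at most C."""
    total_norm = torch.sqrt(sum(g.pow(2).sum() for g in grads))
    scale = min(1.0, C / (total_norm + 1e-12))
    return [g * scale for g in grads]

def lclip(grads, C):
    """LClip: clip each layer's gradient to l2-norm at most C, independently."""
    return [g * min(1.0, C / (g.norm(p=2) + 1e-12)) for g in grads]

def ldp_perturb(grads, C, sigma, per_layer=True):
    """LDP-FL client step: clip, then add Gaussian noise with std sigma * C."""
    clipped = lclip(grads, C) if per_layer else fclip(grads, C)
    return [g + torch.randn_like(g) * sigma * C for g in clipped]
```

In CDP-FL, the same clipping would be applied to each client's update on the client side, while the noise is added to the aggregated update by the trusted server.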
C. Threat Model

In our study, we aim to examine specific vulnerabilities related to GLAs in FL systems employing DP techniques. The threat model we consider encompasses the following aspects: attacker roles, attacker capabilities, and attacker objectives.

Attacker Roles: In CDP-FL, an external attacker eavesdrops on the communication channel and infers sensitive data from the stolen parameters. In LDP-FL, the attacker could be an internal entity (i.e., the server) who attempts to infer sensitive data from the observed parameters.

Attacker Capabilities: In principle, the server, as the internal attacker, would have stronger attack capabilities than an external attacker. However, considering stealthiness, this paper only focuses on passive attack strategies, which only allow the server to observe and analyze parameters rather than actively manipulate the FL process. In this context, attackers in CDP-FL and LDP-FL exhibit comparable attack capabilities, as essential knowledge such as the global model and gradients is also transmitted during the FL training process.

Attacker Objectives: The attacker's primary objective is to breach individual privacy by reconstructing original training data from observed model parameters. In CDP-FL, the attacker aims to circumvent the defenses offered by CDP, thereby performing the attack on the clipped model updates. In LDP-FL, the attacker should bypass the protection of LDP, inferring the original data from the noisy model updates. Hence, the attacker in LDP-FL encounters stronger defenses than in CDP-FL.

By clearly outlining these aspects within our threat model, we provide an initial, intuitive understanding of the potential risks and vulnerabilities in both LDP and CDP settings. The further exploration of GLAs against DP is conducted through a comprehensive evaluation.

D. Existing Evaluation of DP on GLA

We mentioned above that [24] has claimed that LDP may not be sufficient to provide a defense against GLA in FL. However, the authors only evaluated the performance of FedCDP (an LDP mechanism with LClip) [16] in terms of utility-privacy trade-offs, based on a 10-category image classification task on CIFAR10. Moreover, the model used therein was not explicitly disclosed. Therefore, a single experiment is insufficient to illustrate the performance of DP-FL against GLAs.
TABLE II: Summary of hyperparameters of attack schemes in the evaluation.

TABLE III: Summary of average l2-norm of model update (Δ) and gradient (g) on various models and datasets.

Fig. 3. Reconstruction results of existing evaluation work [24] on FedCDP with different clipping norms.
TABLE V: (For Sec. IV-B Ans.2) Quality of reconstructed samples (i.e., LPIPS ↑) from different models and datasets under CDP with two clipping strategies.

Fig. 4. (For Sec. IV-B Ans.1) Batch reconstruction from ResNet-18 on CIFAR100 when gradients are protected by CDP. The clipping norm C = 0.1 in both LClip and FClip.

TABLE VI: (For Sec. IV-B Ans.3) Test accuracy (%) of various image classification tasks under different clipping norms and clipping strategies in CDP.

TABLE IV: (For Sec. IV-B Ans.2) Quality of reconstructed samples from ResNet-18 on CIFAR100 under different clipping strategies and norms.
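The reconstruction quality reported in these tables is measured with LPIPS [64], which compares deep features of the reconstructed and original images; in Table V a higher LPIPS (↑) means a perceptually worse reconstruction and thus stronger protection. A minimal sketch of computing it with the open-source lpips package is shown below; the AlexNet backbone and the [0, 1] image range are assumptions of this example.

```python
import lpips
import torch

# LPIPS perceptual distance between reconstructed and original images.
# The lpips package expects (N, 3, H, W) tensors scaled to [-1, 1].
loss_fn = lpips.LPIPS(net='alex')

def reconstruction_quality(x_rec, x_true):
    x_rec = x_rec.clamp(0, 1) * 2 - 1    # assume images are stored in [0, 1]
    x_true = x_true.clamp(0, 1) * 2 - 1
    with torch.no_grad():
        return loss_fn(x_rec, x_true).mean().item()
```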
TABLE VIII: (For Sec. IV-C Ans.3) Test accuracy and reconstruction quality (LPIPS) of different image classification tasks under LDP with different clipping strategies.
TABLE X: Comparison of quantitative reconstruction results (LPIPS ↓) with the InvertGrad on CDP-FL with LClip under various batch sizes, models, and datasets.

TABLE XI: Error bars of comparative reconstruction results (LPIPS ↓) with the InvertGrad on CDP-FL with LClip under various batch sizes, models, and datasets.
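Tables X and XI compare our clipping-aware attack with the original InvertGrad on CDP-FL with LClip. The sketch below shows one plausible way to fold a clipping operation into the gradient-matching objective, assuming the dummy gradient is clipped per layer with an (approximate) norm estimate before matching; it is illustrative only and is not claimed to be the exact formulation of the attack proposed in Section V.

```python
import torch
import torch.nn.functional as F

def clip_per_layer(grads, C):
    """Apply the same per-layer (LClip) operation the DP mechanism is assumed to use."""
    return [g * min(1.0, C / (g.norm(p=2) + 1e-12)) for g in grads]

def clipping_aware_matching_loss(dummy_grads, target_grads, C_est):
    """Cosine gradient-matching loss computed on clipped dummy gradients.

    target_grads are the observed (already clipped) model updates;
    C_est is an approximate estimate of the defender's clipping norm.
    """
    clipped = clip_per_layer(dummy_grads, C_est)
    dg = torch.cat([g.flatten() for g in clipped])
    tg = torch.cat([g.flatten() for g in target_grads])
    return 1 - F.cosine_similarity(dg, tg, dim=0)
```

Such a loss can replace the matching term in the gradient-matching sketch given in Section III-A.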
Our proposed attack can still recover private data under additive noise with Cσ < 1e−2, while the original InvertGrad has already been defended with Cσ = 1e−5. This illustrates that adding the clipping operation effectively improves the performance of the original InvertGrad on both CDP and LDP, even when based on an approximate clipping norm. However, LDP is still effective in practice, because such small noise corresponds to a very large privacy budget. For Cσ = 0.01 with C = 10, σ = 0.001 leads to a privacy budget of ε = 3193, which is an unreasonable value in the DP literature. Since it is already difficult to attack LDP with a batch size of 1, we will not further explore the attack capabilities with larger batch sizes as evaluated on CDP.

VI. DISCUSSION

A. CDP and LDP Techniques in FL

Below, we discuss the existing LDP and CDP techniques in FL, highlighting their strengths and weaknesses, to provide a more holistic perspective of this field and guide future research directions.

Overview of CDP-FL Techniques: Among existing CDP-FL techniques, [14] first incorporates CDP into the federated averaging algorithm and demonstrates the feasibility of training recurrent language models with user-level DP. Almost at the same time, [39] also integrates CDP into FL to prevent information leakage about a client's dataset from being inferred from the shared model. Both of these works achieve user-level privacy preservation in the FL environment with only a minor loss in model performance. However, the effectiveness of privacy protection relies on a large number of clients per round, which might not be practical for some applications with smaller client bases. [40] tackles the problem of backdoor attacks in FL by using CDP techniques. Specifically, the proposed Clip-Norm-Decay (CND) reduces the clipping threshold of model updates throughout the training process to mitigate the effects of malicious updates. This work innovatively adjusts the clipping threshold dynamically, which helps preserve model accuracy better than traditional DP methods. However, CND depends on the specific settings of the clipping threshold and several hyperparameters, which may not generalize across different datasets or attack scenarios. [41] introduces BREM to combine DP and Byzantine robustness in FL, which involves averaging updates from clients over time to smooth out malicious fluctuations and adding noise to the aggregated momentum for privacy. This novel integration addresses two significant concerns in FL simultaneously and achieves a good privacy-utility tradeoff. However, the performance of BREM heavily relies on the correct setting of several parameters (e.g., the clipping threshold and noise levels), which may not be straightforward to tune in practice.

Overview of LDP-FL Techniques: Among existing LDP-FL techniques, some works [16], [42], [43] directly use LDP in FL, and other works [44], [45], [46] explore the application of LDP in FL with a shuffle scheme to improve the model utility. [16] formally incorporates LDP into FL to defend against GLAs and demonstrates its effectiveness in mitigating them. This work provides both theoretical guarantees and empirical evidence of its effectiveness in protecting against privacy breaches. However, the adopted attack is weak, and the evaluated network (i.e., a fully connected model with two hidden layers) is too shallow. [42] applies LDP to FL and addresses the dimension dependency problem in LDP by privately selecting the top-k dimensions of gradient updates. The newly added dimension selection mechanism effectively enhances the model's utility and improves convergence. However, the additional computational overhead of the dimension selection mechanism can be burdensome for resource-constrained devices. [43] enables clients to customize their LDP privacy budget locally and replaces LDP with condensed LDP (CLDP) to handle large-scale model parameter updates. However, this work requires more communication rounds between the cloud and clients due to the high variance of the noise.

Regarding the LDP-FL techniques with a shuffle scheme, they randomly permute the locally obfuscated updates before transmitting them to the server. In particular, [44] proposes a new LDP mechanism for FL with DNNs, which employs data perturbation with an adaptive range and parameter shuffling on each client's weights. This work considers the necessity of adapting to varying weight ranges in different DNN layers, which helps reduce variance and enhance model accuracy. However, the adaptive range setting and parameter shuffling mechanisms introduce additional computational overhead, which may be challenging for resource-constrained devices. [45] proposes a communication-efficient and locally differentially private stochastic gradient descent (CLDP-SGD) algorithm for FL. The proposed method improves the model performance by combining shuffling and subsampling techniques to amplify privacy and proves that communication efficiency can be achieved without compromising privacy. However, the method's effectiveness hinges on the reliability and integrity of the shuffler, introducing a potential single point of failure. [46] aims to integrate the advantages of both CDP and LDP in FL by utilizing the shuffle model. The proposed FLAME combines the strengths of CDP and LDP, boosting the accuracy of the central model and ensuring strong privacy without relying on any trusted party. However, the effectiveness of the shuffle model might diminish with very large-scale data or in cases where the shuffler becomes a bottleneck for some clients.

Summary and Future Directions: On the whole, research on DP-FL focuses on the trade-off between the level of privacy assurance and model performance. CDP-FL techniques can achieve better model performance than LDP-FL approaches, but they require strong trust assumptions to ensure the effectiveness of privacy protection. Moreover, our proposed attack has demonstrated that CDP can hardly protect FL from GLAs. Hence, it is advisable to enhance security in FL systems by combining CDP with cryptographic techniques (e.g., secure multi-party computation), rather than using CDP alone. Most LDP-FL techniques experience reduced accuracy and employ various methods to enhance model performance. However, existing works require more complex mechanisms on the client side, which can introduce computational overhead and implementation challenges. Hence, future research requires the development of lightweight and scalable LDP-FL mechanisms that ensure a balanced utility-privacy trade-off in FL frameworks.
[27] C. Chen and N. D. Campbell, "Understanding training-data leakage from gradients in neural networks for image classification," 2021, arXiv:2111.10178.
[28] J. Zhu and M. Blaschko, "R-GAP: Recursive gradient attack on privacy," 2020, arXiv:2010.07733.
[29] F. Boenisch, A. Dziedzic, R. Schuster, A. S. Shamsabadi, I. Shumailov, and N. Papernot, "When the curious abandon honesty: Federated learning is not private," 2021, arXiv:2112.02918.
[30] L. Fowl, J. Geiping, W. Czaja, M. Goldblum, and T. Goldstein, "Robbing the fed: Directly obtaining private data in federated learning with modified models," 2021, arXiv:2110.13057.
[31] J. C. Zhao, A. Sharma, A. R. Elkordy, Y. H. Ezzeldin, S. Avestimehr, and S. Bagchi, "Secure aggregation in federated learning is not private: Leaking user data at large scale through model modification," 2023, arXiv:2303.12233.
[32] Y. Aono et al., "Privacy-preserving deep learning via additively homomorphic encryption," IEEE Trans. Inf. Forensics Secur., vol. 13, no. 5, pp. 1333–1345, May 2018.
[33] C. Zhang, S. Li, J. Xia, W. Wang, F. Yan, and Y. Liu, "BatchCrypt: Efficient homomorphic encryption for cross-silo federated learning," in Proc. USENIX Annu. Tech. Conf., 2020, pp. 493–506.
[34] K. Bonawitz et al., "Practical secure aggregation for privacy-preserving machine learning," in Proc. ACM SIGSAC Conf. Comput. Commun. Secur., 2017, pp. 1175–1191.
[35] G. Xu, H. Li, S. Liu, K. Yang, and X. Lin, "VerifyNet: Secure and verifiable federated learning," IEEE Trans. Inf. Forensics Secur., vol. 15, pp. 911–926, 2019.
[36] D. Pasquini, D. Francati, and G. Ateniese, "Eluding secure aggregation in federated learning via model inconsistency," in Proc. ACM SIGSAC Conf. Comput. Commun. Secur., 2022, pp. 2429–2443.
[37] M. Lam, G.-Y. Wei, D. Brooks, V. J. Reddi, and M. Mitzenmacher, "Gradient disaggregation: Breaking privacy in federated learning by reconstructing the user participant matrix," in Proc. Int. Conf. Mach. Learn., 2021, pp. 5959–5968.
[38] C. Dwork, "Differential privacy," in Proc. Int. Colloq. Automata, Lang., Program., Berlin, Germany: Springer, 2006, pp. 1–12.
[39] R. C. Geyer, T. Klein, and M. Nabi, "Differentially private federated learning: A client level perspective," 2017, arXiv:1712.07557.
[40] L. Miao, W. Yang, R. Hu, L. Li, and L. Huang, "Against backdoor attacks in federated learning with differential privacy," in Proc. IEEE Int. Conf. Acoust. Speech Signal Process., 2022, pp. 2999–3003.
[41] X. Gu, M. Li, and L. Xiong, "DP-BREM: Differentially-private and Byzantine-robust federated learning with client momentum," 2023, arXiv:2306.12608.
[42] R. Liu, Y. Cao, M. Yoshikawa, and H. Chen, "FedSel: Federated SGD under local differential privacy with top-k dimension selection," in Proc. 25th Int. Conf. Database Syst. Adv. Appl., Jeju, South Korea, 2020, pp. 485–501.
[43] S. Truex, L. Liu, K.-H. Chow, M. E. Gursoy, and W. Wei, "LDP-Fed: Federated learning with local differential privacy," in Proc. 3rd ACM Int. Workshop Edge Syst. Anal. Netw., 2020, pp. 61–66.
[44] L. Sun, J. Qian, and X. Chen, "LDP-FL: Practical private aggregation in federated learning with local differential privacy," 2020, arXiv:2007.15789.
[45] A. Girgis, D. Data, S. Diggavi, P. Kairouz, and A. T. Suresh, "Shuffled model of differential privacy in federated learning," in Proc. Int. Conf. Artif. Intell. Statist., 2021, pp. 2521–2529.
[46] R. Liu, Y. Cao, H. Chen, R. Guo, and M. Yoshikawa, "FLAME: Differentially private federated learning in the shuffle model," in Proc. AAAI Conf. Artif. Intell., 2021, pp. 8688–8696.
[47] Y. Huang, S. Gupta, Z. Song, K. Li, and S. Arora, "Evaluating gradient inversion attacks and defenses in federated learning," in Proc. Adv. Neural Inf. Process. Syst., 2021, pp. 7232–7241.
[48] J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li, and L. Fei-Fei, "ImageNet: A large-scale hierarchical image database," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., 2009, pp. 248–255.
[49] P. Sun et al., "Pain-FL: Personalized privacy-preserving incentive for federated learning," IEEE J. Sel. Areas Commun., vol. 39, no. 12, pp. 3805–3820, Dec. 2021.
[50] X. Pang, Z. Wang, D. Liu, J. C. Lui, Q. Wang, and J. Ren, "Towards personalized privacy-preserving truth discovery over crowdsourced data streams," IEEE/ACM Trans. Netw., vol. 30, no. 1, pp. 327–340, Feb. 2022.
[51] C. Dwork et al., "The algorithmic foundations of differential privacy," Found. Trends Theor. Comput. Sci., vol. 9, no. 3/4, pp. 211–407, 2014.
[52] M. Abadi et al., "Deep learning with differential privacy," in Proc. ACM SIGSAC Conf. Comput. Commun. Secur., 2016, pp. 308–318.
[53] X. Chen, S. Z. Wu, and M. Hong, "Understanding gradient clipping in private SGD: A geometric perspective," in Proc. Adv. Neural Inf. Process. Syst., 2020, pp. 13773–13782.
[54] G. Andrew, O. Thakkar, B. McMahan, and S. Ramaswamy, "Differentially private learning with adaptive clipping," in Proc. Adv. Neural Inf. Process. Syst., 2021, pp. 17455–17466.
[55] Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner, "Gradient-based learning applied to document recognition," Proc. IEEE, vol. 86, no. 11, pp. 2278–2324, Nov. 1998.
[56] K. He, X. Zhang, S. Ren, and J. Sun, "Deep residual learning for image recognition," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., 2016, pp. 770–778.
[57] K. Simonyan and A. Zisserman, "Very deep convolutional networks for large-scale image recognition," 2014, arXiv:1409.1556.
[58] L. Deng, "The MNIST database of handwritten digit images for machine learning research," IEEE Signal Process. Mag., vol. 29, no. 6, pp. 141–142, Nov. 2012.
[59] G. Cohen, S. Afshar, J. Tapson, and A. Van Schaik, "EMNIST: Extending MNIST to handwritten letters," in Proc. Int. Joint Conf. Neural Netw., 2017, pp. 2921–2926.
[60] Y. Le and X. Yang, "Tiny ImageNet visual recognition challenge," CS 231N, vol. 7, no. 7, 2015, Art. no. 3.
[61] J. Stallkamp, M. Schlipsing, J. Salmen, and C. Igel, "Man vs. computer: Benchmarking machine learning algorithms for traffic sign recognition," Neural Netw., vol. 32, pp. 323–332, 2012.
[62] A. Krizhevsky et al., "Learning multiple layers of features from tiny images," Univ. Toronto, Toronto, Canada, Tech. Rep., 2009.
[63] K. R. Castleman, Digital Image Processing. Hoboken, NJ, USA: Prentice Hall Press, 1996.
[64] R. Zhang, P. Isola, A. A. Efros, E. Shechtman, and O. Wang, "The unreasonable effectiveness of deep features as a perceptual metric," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., 2018, pp. 586–595.
[65] A. Wainakh et al., "User-level label leakage from gradients in federated learning," Proc. Privacy Enhancing Technol., vol. 2022, no. 2, pp. 227–244, 2022.
[66] D. P. Kingma and J. Ba, "Adam: A method for stochastic optimization," 2014, arXiv:1412.6980.
[67] H. Yang, M. Ge, D. Xue, K. Xiang, H. Li, and R. Lu, "Gradient leakage attacks in federated learning: Research frontiers, taxonomy and future directions," IEEE Netw., vol. 38, no. 2, pp. 247–254, Mar. 2024.

Jiahui Hu received the MS degree in cyber security from Wuhan University, China, in 2019. She is currently working toward the doctoral degree with the School of Cyber Science and Engineering, Zhejiang University. Her research interest focuses on federated learning and privacy.

Jiacheng Du received the BS degree from the Hefei University of Technology, China, in 2023. He is currently working toward the master's degree with Zhejiang University. His main research interests include federated learning and privacy-preserving machine learning systems.
Zhibo Wang (Senior Member, IEEE) received the BE degree in automation from Zhejiang University, China, in 2007, and the PhD degree in electrical engineering and computer science from the University of Tennessee, Knoxville, in 2014. He is currently a professor with the School of Cyber Science and Technology, Zhejiang University, China. His current research interests include the Internet of Things, AI security, data security, and privacy. He is a member of the ACM.

Peng Sun received the BE degree in automation from Tianjin University, China, in 2015, and the PhD degree in control science and engineering from Zhejiang University, China, in 2020. From 2020 to 2022, he worked as a postdoctoral researcher with the School of Science and Engineering, The Chinese University of Hong Kong, Shenzhen. He is currently an associate professor with the College of Computer Science and Electronic Engineering, Hunan University, China. His research interests include the Internet of Things, mobile crowdsensing, and federated learning.