0% found this document useful (0 votes)

10 views11 pages

A Hybrid Approach To Privacy-Preserving Federated Learning

Uploaded by

Chuyue Tang

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

0% found this document useful (0 votes)

10 views11 pages

A Hybrid Approach To Privacy-Preserving Federated Learning

Uploaded by

Chuyue Tang

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

You are on page 1/ 11

A Hybrid Approach to Privacy-Preserving Federated Learning

Stacey Truex Nathalie Baracaldo Ali Anwar

[email protected] [email protected] [email protected]
Georgia Institute of Technology IBM Research Almaden IBM Research Almaden
Atlanta, Georgia San Jose, California San Jose, California

Thomas Steinke Heiko Ludwig Rui Zhang

[email protected] [email protected] [email protected]
IBM Research Almaden IBM Research Almaden IBM Research Almaden
arXiv:1812.03224v2 [cs.LG] 14 Aug 2019

San Jose, California San Jose, California San Jose, California

Yi Zhou
[email protected]
IBM Research Almaden
San Jose, California
ABSTRACT KEYWORDS
Federated learning facilitates the collaborative training of models Privacy, Federated Learning, Privacy-Preserving Machine Learning,
without the sharing of raw data. However, recent attacks demon- Differential Privacy, Secure Multiparty Computation
strate that simply maintaining data locality during training pro-
ACM Reference Format:
cesses does not provide sufficient privacy guarantees. Rather, we Stacey Truex, Nathalie Baracaldo, Ali Anwar, Thomas Steinke, Heiko Lud-
need a federated learning system capable of preventing inference wig, Rui Zhang, and Yi Zhou. 2019. A Hybrid Approach to Privacy-Preserving
over both the messages exchanged during training and the final Federated Learning. In London ’19: ACM Workshop on Artificial Intelligence
trained model while ensuring the resulting model also has accept- and Security, November 15, 2019, London, UK. ACM, New York, NY, USA,
able predictive accuracy. Existing federated learning approaches 11 pages. https://doi.org/10.1145/1122445.1122456
either use secure multiparty computation (SMC) which is vulnerable
to inference or differential privacy which can lead to low accuracy 1 INTRODUCTION
given a large number of parties with relatively small amounts of
In traditional machine learning (ML) environments, training data is
data each. In this paper, we present an alternative approach that uti-
centrally held by one organization executing the learning algorithm.
lizes both differential privacy and SMC to balance these trade-offs.
Distributed learning systems extend this approach by using a set
Combining differential privacy with secure multiparty computation
of learning nodes accessing shared data or having the data sent to
enables us to reduce the growth of noise injection as the number
the participating nodes from a central node, all of which are fully
of parties increases without sacrificing privacy while maintaining
trusted. For example, MLlib from Apache Spark assumes a trusted
a pre-defined rate of trust. Our system is therefore a scalable ap-
central node to coordinate distributed learning processes [28]. An-
proach that protects against inference threats and produces models
other approach is the parameter server [26], which again requires a
with high accuracy. Additionally, our system can be used to train
fully trusted central node to collect and aggregate parameters from
a variety of machine learning models, which we validate with ex-
the many nodes learning on their different datasets.
perimental results on 3 different machine learning algorithms. Our
However, some learning scenarios must address less open trust
experiments demonstrate that our approach out-performs state of
boundaries, particularly when multiple organizations are involved.
the art solutions.
While a larger dataset improves the performance of a trained model,
organizations often cannot share data due to legal restrictions or
CCS CONCEPTS competition between participants. For example, consider three hos-
• Security and privacy → Privacy-preserving protocols; Trust pitals with different owners serving the same city. Rather than each
frameworks; • Computing methodologies → Learning settings. hospital creating their own predictive model forecasting cancer
risks for their patients, the hospitals want to create a model learned
over the whole patient population. However, privacy laws prohibit
Permission to make digital or hard copies of all or part of this work for personal or
classroom use is granted without fee provided that copies are not made or distributed them from sharing their patients’ data. Similarly, a service provider
for profit or commercial advantage and that copies bear this notice and the full citation may collect usage data both in Europe and the United States. Due
on the first page. Copyrights for components of this work owned by others than ACM to legislative restrictions, the service provider’s data cannot be
must be honored. Abstracting with credit is permitted. To copy otherwise, or republish,
to post on servers or to redistribute to lists, requires prior specific permission and/or a stored in one central location. When creating a predictive model
fee. Request permissions from [email protected]. forecasting service usage, however, all datasets should be used.
London ’19, November 15, 2019, London, UK The area of federated learning (FL) addresses these more restric-
© 2019 Association for Computing Machinery.
ACM ISBN 978-x-xxxx-xxxx-x/YY/MM. . . $15.00 tive environments wherein data holders collaborate throughout the
https://doi.org/10.1145/1122445.1122456 learning process rather than relying on a trusted third party to hold
London ’19, November 15, 2019, London, UK Stacey Truex, Nathalie Baracaldo, Ali Anwar, Thomas Steinke, Heiko Ludwig, Rui Zhang, and Yi Zhou

data [6, 39]. Data holders in FL run a machine learning algorithm single individual, thus limiting an attacker’s ability to infer such
locally and only exchange model parameters, which are aggregated membership. The formal definition for DP is [13]:
and redistributed by one or more central entities. However, this
Definition 1 (Differential Privacy). A randomized mecha-
approach is not sufficient to provide reasonable data privacy guar-
nism K provides (ϵ, δ )- differential privacy if for any two neighboring
antees. We must also consider that information can be inferred
database D 1 and D 2 that differ in only a single entry, ∀S ⊆ Ranдe(K),
from the learning process [30] and that information that can be
traced back to its source in the resulting trained model [40]. Pr(K(D 1 ) ∈ S) ≤ e ϵ Pr(K(D 2 ) ∈ S) + δ (1)
Some previous work has proposed a trusted aggregator as a way If δ = 0, K is said to satisfy ϵ-differential privacy.
to control privacy exposure [1], [32]. FL schemes using Local Dif- To achieve DP, noise is added to the algorithm’s output. This
ferential Privacy also address the privacy problem [39] but entails noise is proportional to the sensitivity of the output, where sen-
adding too much noise to model parameter data from each node, sitivity measures the maximum change of the output due to the
often yielding poor performance of the resulting model. inclusion of a single data instance.
We propose a novel federated learning system which provides Two popular mechanisms for achieving DP are the Laplacian
formal privacy guarantees, accounts for various trust scenarios, and Gaussian mechanisms. Gaussian is defined by
and produces models with increased accuracy when compared
with existing privacy-preserving approaches. Data never leaves M(D) ≜ f (D) + N (0, S f2 σ 2 ), (2)
the participants and privacy is guaranteed using secure multiparty where N (0, S f2 σ 2 ) is the normal distribution with mean 0 and stan-
computation (SMC) and differential privacy. We account for po-
dard deviation S f σ . A single application of the Gaussian mechanism
tential inference from individual participants as well as the risk of
to function f of sensitivity S f satisfies (ϵ, δ )-differential privacy if
collusion amongst the participating parties through a customizable
trust threshold. Our contributions are the following: δ ≥ 54 exp(−(σϵ)2 /2) and ϵ < 1 [16].
To achieve ϵ-differential privacy, the Laplace mechanism may be
• We propose and implement an FL system providing formal used in the same manner by substituting N (0, S f2 σ 2 ) with random
privacy guarantees and models with improved accuracy com- variables drawn from Lap(S f /ϵ) [16].
pared to existing approaches. When an algorithm requires multiple additive noise mechanisms,
• We include a tunable trust parameter which accounts for the evaluation of the privacy guarantee follows from the basic com-
various trust scenarios while maintaining the improved ac- position theorem [14, 15] or from advanced composition theorems
curacy and formal privacy guarantees. and their extensions [7, 17, 18, 23].
• We demonstrate that it is possible to use the proposed ap-
proach to train a variety of ML models through the exper- 2.2 Threshold Homomorphic Encryption
imental evaluation of our system with three significantly
An additively homomorphic encryption scheme is one wherein the
different ML models: decision trees, convolutional neural
following property is guaranteed:
networks and linear support vector machines.
• We include the first federated approach for the private and Enc(m 1 ) ◦ Enc(m 2 ) = Enc(m 1 + m 2 ),
accurate training of a neural network model. for some predefined function ◦. Such schemes are popular in privacy-
The rest of this paper is organized as follows. We outline the preserving data analytics as untrusted parties can perform opera-
building blocks in our system. We then discuss the various privacy tions on encrypted values.
considerations in FL systems followed by outlining our threat model One such additive homomorphic scheme is the Paillier cryptosys-
and general system. We then provide experimental evaluation and tem [31], a probabilistic encryption scheme based on computations
discussion of the system implementation process. Finally, we give in the group Zn∗ 2 , where n is an RSA modulus. In [11] the authors
an overview of related work and some concluding remarks. extend this encryption scheme and propose a threshold variant. In
the threshold variant, a set of participants is able to share the secret
key such that no subset of the parties smaller than a pre-defined
2 PRELIMINARIES
threshold is able to decrypt values.
In this section we introduce building blocks of our approach and
explain how various approaches fail to protect data privacy in FL. 2.3 Privacy in Federated Learning
In centralized learning environments a single party P using a dataset
2.1 Differential Privacy D executes some learning algorithm f M resulting in a model M
Differential privacy (DP) is a rigorous mathematical framework where f M (D) = M. In this case P has access to the complete dataset
wherein an algorithm may be described as differentially private if D. By contrast, in a federated learning environment, multiple parties
and only if the inclusion of a single instance in the training dataset P1 , P2 , ..., Pn , each have their own dataset D 1 , D 2 , ..., D n , respec-
causes only statistically insignificant changes to the algorithm’s tively. The goal is then to learn a model using all of the datasets.
output. For example, consider private medical information from We must consider two potential threats to data privacy in such an
a particular hospital. The authors in [40] have shown that with FL environment: (1) inference during the learning process and (2) in-
access to only a trained ML model, attackers can infer whether ference over the outputs. Inference during the learning process refers
or not an individual was a patient at the hospital, violating their to any participant in the federation inferring information about an-
right to privacy. DP puts a theoretical limit on the influence of a other participant’s private dataset given the data exchanged during
A Hybrid Approach to Privacy-Preserving Federated Learning London ’19, November 15, 2019, London, UK

the execution of f M . Inference over the outputs refers to the leakage 3.1.1 Honest-But-Curious Aggregator. The honest-but-curious or
of any participants’ data from intermediate outputs as well as M. semi-honest adversarial model is commonly used in the field of SMC
We consider two types of inference attacks: insider and outsider. since its introduction in [3] and application to data mining in [27].
Insider attacks include those launched by participants in the FL Honest-but-curious adversaries follow the protocol instructions
system, including both data holders as well as any third parties, correctly but will try to learn additional information. Therefore,
while outsider attacks include those launched both by eavesdroppers the aggregator will not vary from the predetermined ML algorithm
to the communication between participants and by users of the but will attempt to infer private information using all data received
final predictive model when deployed as a service. throughout the protocol execution.
2.3.1 Inference during the learning process. Let us consider f M as 3.1.2 Colluding Parties. Our work also considers the threat of
the combination of computational operations and a set of queries collusion among parties, including the aggregator, through the
Q 1 , Q 2 , ..., Q k . That is, for each step s in f M requiring knowledge trust parameter t which is the minimum number of non-colluding
of the parties’ data there is a query Q s . In the execution of f M each parties. Additionally, in contrast to the aggregator, we consider
party Pi must respond to each such query Q s with appropriate scenarios in which parties in P may deviate from the protocol
information on D i . The types of queries are highly dependent on execution to achieve additional information on data held by honest
f M . For example, to build a decision tree, a query may request parties.
the number of instances in D i matching a certain criteria. In con-
trast, to train an SVM or neural network a query would request 3.1.3 Outsiders. We also consider potential attacks from adver-
model parameters after a certain number of training iterations. Any saries outside of the system. Our work ensures that any adversary
privacy-preserving FL system must account for the risk of inference monitoring communications during training cannot infer the pri-
over the responses to these queries. vate data of the participants. We also consider users of the final
Privacy-preserving ML approaches addressing this risk often model as potential adversaries. A predictive model output from our
do so by using secure multiparty computation (SMC). Generally, system may therefore be deployed as a service, remaining resilient
SMC protocols allow n parties to obtain the output of a function to inference against adversaries who may be users of the service.
over their n inputs while preventing knowledge of anything other We now detail the assumptions made in our system to more
than this output [20]. Unfortunately, approaches exclusively using concretely formulate our threat model.
secure multiparty computation remain vulnerable to inference over 3.1.4 Communication. We assume secure channels between each
the output. As the function output remains unchanged from func- party and the aggregator. This allows the aggregator to authenticate
tion execution without privacy, the output can reveal information incoming messages and prevents an adversary, whether they be an
about individual inputs. Therefore, we must also consider potential outsider or malicious data party, from injecting their own responses.
inference over outputs.
3.1.5 System set up. We additionally make use of the threshold
2.3.2 Inference over the outputs. This refers to intermediate out- variant of the Paillier encryption scheme from [11] assuming secure
puts available to participants as well as the predictive model. Recent key distribution. It is sufficient within our system to say that se-
work shows that given only black-box access to the model through mantic security of encrypted communication is equivalent to the
an ML as a service API, an attacker can still make training data decisional composite residuosity assumption. For further discussion
inferences [40]. An FL system should prevent such outsider attacks we direct the reader to [11]. Our use of the threshold variant of the
while also considering insiders. That is, participant Pi should not Paillier system ensures that any set of n − t or fewer parties can-
be able to infer information about D j when i , j as shown in [30]. not decrypt ciphertexts. Within the context of our FL system, this
Solutions addressing privacy of output often make use of the DP ensures the privacy of individual messages sent to the aggregator.
framework discussed in Preliminaries. As a mechanism satisfying
differential privacy guarantees that if an individual contained in a
3.2 Proposed Approach
given dataset is removed, no outputs would become significantly
more or less likely [13], a learning algorithm f M which is theoreti- We propose an FL system that addresses risk of inference during
cally proven to be ϵ-differentially private is guaranteed to have a the learning process, risk of inference over the outputs, and trust.
certain privacy of output quantified by the ϵ privacy parameter. We combine methods from SMC and DP to develop protocols that
In the federated learning setting it is important to note that the guarantee privacy without sacrificing accuracy.
definition of neighboring databases is consistent with the usual We consider the following scenario. There exists a set of n parties
DP definition – that is, privacy is provided at the individual record P = P1 , P2 , ..., Pn , a set of disjoint datasets D 1 , D 2 , ..., D n belonging
level, not the party level (which may represent many individuals). to the respective parties and adhering to the same structure, and
an aggregator A. Our system takes as additional input three pa-
3 AN END-TO-END APPROACH WITH TRUST rameters: f M , ϵ, and t. f M specifies the training algorithm, ϵ is the
privacy guarantee against inference, and t specifies the minimum
3.1 Threat Model number of honest, non-colluding parties.
We propose a system wherein n data parties use an ML service for The aggregator A runs the learning algorithm f M consisting
FL. We refer to this service as the aggregator. Our system is designed of k or fewer linear queries Q 1 , Q 2 , ..., Q k , each requiring infor-
to withstand three potential adversaries: (1) the aggregator, (2) the mation from the n datasets. This information may include model
data parties, and (3) outsiders. parameters after some local learning on each dataset or may be
London ’19, November 15, 2019, London, UK Stacey Truex, Nathalie Baracaldo, Ali Anwar, Thomas Steinke, Heiko Ludwig, Rui Zhang, and Yi Zhou

Algorithm 1 Private Federated Learning

Input: ML algorithm f M ; set of data parties P of size N , with each Pi ∈ P holding
a private dataset D i and a portion of the secret key sk i ; minimum number of honest,
non-colluding parties t ; privacy guarantee ϵ
t̄ = n − t + 1
for each Q s ∈ f M do
for each Pi ∈ P do
A asynchronously queries Pi with Q s
Pi sends r i,s = Enc pk (Q s (D i ) + noise(ϵ, t ))
end for
A aggregates Enc pk (r s ) ← r 1,s ◦ r 2,s ◦ ... ◦ r N , s
A selects Pd ec ⊆ P such that | Pd ec | = t̄
for each Pi ∈ Pd ec do
A asynchronously queries Pi with Enc pk (r s )
Figure 1: System Overview A receives partial decryption of r s from Pi using sk i
end for
A computes r s from partial decryptions
more traditionally queried information such as how many individ- A updates M with r s
uals match a set of criterion. For each algorithm deployed into our end for
system, any step s requiring such information reflective of some return M
analysis on the private, local datasets must be represented by a each Yi is drawn from the Gaussian distribution with standard
corresponding query Q s . Figure 1 shows an outline of a step s in nσ 2 Í
deviation Ss √σs . This is equivalent to N (0, Ss2 t −1s ) ni=1 Q s (D i ).
f M . Using secure channels between A and each of the parties, the t −1
aggregator will send query Q s . Each party will then calculate a Since we know that t − 1 < n, the noise included in the decrypted
response using their respective datasets. The participant will use a value is strictly greater than that required to satisfy DP. Additionally,
differential privacy mechanism that depends on the algorithm f M the encryption scheme guarantees that the maximum number of
to add the appropriate amount of noise according to the privacy colluders, t¯, cannot decrypt values of honest parties.
budget allocated to that step, the sensitivity of Q s , and the level Given this approach, we are able to maintain the customizable na-
of trust in the system. The noisy response is then encrypted using ture of our system with the trust parameter t and the formal privacy
the threshold variant of the Paillier cryptosystem and sent to A. guarantees of the DP framework while decreasing the amount of
Homomorphic properties then allow A to aggregate the individual noise for each query response leading to more accurate ML models.
responses. A subsequently queries at least n − t + 1 data parties
to decrypt the aggregate value and updates the model M. At the 4 EXPERIMENTAL EVALUATION
conclusion of f M , the model M is exposed to all participants. This In this section we empirically demonstrate how to apply our ap-
process is outlined in Algorithm 1. proach to train three distinct learning models: decision trees (DT),
We consider trust with respect to collusion in two steps: (1) in convolutional neural networks (CNN) and linear Support Vector
the addition of noise and (2) in the threshold setting of the encryp- Machines (SVM). We additionally provide analysis on the impact
tion scheme. The more participants colluding, the more knowledge of certain settings on the performance of our approach.
which is available to infer the data of an honest participant. There-
fore, the noise introduced by an honest participant must account 4.1 Decision Trees
for collusion. The use of homomorphic encryption however allows We first consider DT learning using the ID3 algorithm. In this
for significant increases in accuracy (over local privacy approaches). scenario, each dataset D i owned by some Pi ∈ P contains a set of
We now detail this strategy for FL. instances described by the same set of categorical features F and
a class attribute C. The aggregator initializes the DT model with
a root node. Then, the feature F ∈ F that maximizes information
3.3 Reducing Noise with SMC
gain is chosen based on counts queried from each party in P and
A key component to our system is the ability to reduce noise by child nodes are generated for each possible value of F . The feature
leveraging the SMC framework while considering a customizable F is then removed from F . This process continues recursively for
trust parameter. each child node until either (a) there are no more features in F , (b)
Specifically, let σs and Ss respectively be the noise parameter a pre-determined max-depth is reached, or (c) responses are too
and sensitivity to step Q s allocated a budget ϵs in the learning noisy to be deemed meaningful. This process is specifically detailed
algorithm f M . In a traditional application of differential privacy to as algorithmic pseudocode in Section 5.1.
federated learning, each party will use the Gaussian mechanism to There are two types of participant queries in private, federated
add N (0, Ss2 σs2 ) noise to their response r i,s when queried by A at DT learning: counts and class_counts. For executing these queries A
step Q s . This guarantees the privacy of each r i,s . first divides the entire privacy budget ϵ equally between each layer
If, however, each r i,s is encrypted using the scheme proposed of the tree. According to the composition property of differential
in [11] with a threshold setting of t¯ = n − t + 1, the noise may privacy, because different nodes within the same layer are evaluated
be reduced by a factor of t − 1. Rather than returning Q s (D i ) + on disjoint subsets of the datasets, they do not accumulate privacy
σ2
N (0, Ss2 σs2 ), each party may return Enc(Q s (D i ) + N (0, Ss2 t −1
s
)). loss and therefore the budget allocated to a single layer is not
Note that when A aggregates these responses the value that is divided further. Within each node, half of the budget (ϵ1 ) is allocated
eventually decrypted and exposed will be ni=1 Q s (Pi ) + Yi where
Í
to determining total counts and half is allocated to either class
A Hybrid Approach to Privacy-Preserving Federated Learning London ’19, November 15, 2019, London, UK

No Privacy Our Approach Local DP No Privacy Our Approach Local DP

Random Guess Uniform Guess Random Guess Uniform Guess
1.0 1.0
0.8 0.8
F1 Score

F1 score
0.6 0.6
0.4 0.4
0.2 0.2
0.0 0.0
2
1.8
1.6
1.4
1.2
1
0.8
0.6
0.4
0.2
0.1
0.05

100

500
epsilon data parties
Figure 2: Effect of privacy budgets on the overall F1-score Figure 3: Effect of increasing number of parties on the over-
for Decision Trees all F1-score for Decision Trees

counts (done at the leaf nodes) or evaluating attributes (done at in degraded performance as the budget decreases, which is expected.
internal nodes). For internal nodes, each feature is evaluated for It is clear that our approach maintains improved performance over
potential splitting against the same dataset. The budget allocated the local DP approach for all budgets (until both approaches con-
to evaluating attributes must therefore be divided amongst each verge to the random guessing baseline). Particularly as the budget
|F|
feature (ϵ2 ). In all experiments the max depth is set to d = 2 . decreases from 1.0 to 0.4 we see our approach maintaining better
resilience to the decrease in the privacy budget.
Dataset. We conduct a number of experiments using the Nurs-
ery dataset from the UCI Machine Learning Repository [12]. This Number of Parties. Another important consideration for FL sys-
dataset contains 8 categorical attributes about 12,960 nursery school tems is the ability to maintain accuracy in highly distributed sce-
applications. The target attribute has five distinct classes with the narios. That is, when many parties, each with a small amount of
following distribution: 33.333%, 0.015%, 2.531%, 32.917%, 31.204%. data, such as in an IoT scenario, are contributing to the learning.
In Figures 3 and 4 we show the impact that |P | has on perfor-
Comparison Methods. To put model performance into context, mance. The results are for a fixed overall privacy budget of 0.5 and
we compare with two different random baselines and two current assume no collusion. For each experiment, the overall dataset was
FL approaches. Random baselines enable us to characterize when a divided into |P | equal sized partitions.
particular approach is no longer learning meaningful information The results in Figure 3 demonstrate the viability of our system
while the FL approaches visualize relative performance cost. for FL in highly distributed environments while highlighting the
(1) Uniform Guess. In this approach, class predictions are ran- shortcomings of the local DP approach. As |P | increases, the noise
domly sampled with a |C1 | chance for each class. in the local DP approach increases proportionally while our ap-
proach maintains consistent accuracy. We can see that with as few
(2) Random Guess. Random guess improves upon Uniform Guess
as 25 parties, the local DP results begin to approach the baseline
with consideration of class value distribution in the training
and even dip below random guessing by 100 participants.
data. At test time, each prediction is sampled from the set of
training class labels.
Unencrypted Encrypted
(3) Local DP. In the local approach, parties add noise to protect 80
training time

60
(seconds)

the privacy of their own data in isolation. The amount of

noise necessary to provide ϵ-differential privacy to each 40
dataset is defined in [5]. 20
(4) No Privacy. This is the result of running the distributed 0
1 2 3 4 5 6 7 8 9 10
learning algorithm without any privacy guarantee.
data parties
4.1.1 Variation in Settings. We now look at how different set- Figure 4: Decision Tree Training Time with Encryption
tings impact results.
Another consideration relative to the scaling of participants is
Privacy Budget. We first look at the impact of the privacy budget
the overhead of encryption. Figure 4 highlights the scalability of our
on performance in our system. To isolate the impact of the privacy
system, showing the impact that encryption has on overall training
budget we set the number of parties, |P |, to 10 and assume no collu-
time in our system as the number of parties increases from 1 to 10.
sion. We consider budget values between 0.05 and 2.0. Recall from
While the entire system experiences a steady increase in cost as
Preliminaries that for a mechanism to be ϵ-differentially private
the number of participants increases, the impact of the encryption
the amount of noise added will be inversely proportional to value
remains consistent. Because our system is designed for a distributed
of ϵ. In other words, the smaller the ϵ value, the smaller the privacy
scenario, the interactions with the aggregator are done in parallel
budget, and the more noise added to each query.
and therefore the overhead of encryption remains constant as the
We can see in Figure 2 that our approach maintains an F1-score
number of parties increases.
above 0.8 for privacy budgets as small as 0.4. Once the budget dips
below 0.4 we see the noise begins to overwhelm the information Trust. An important part of our system is the trust parameter.
being provided which can have one of two outcomes: (1) learning While the definition of a neighboring database within the con-
pre-maturely haults or (2) learning become inaccurate. This results text of the differential privacy framework considers privacy at the
London ’19, November 15, 2019, London, UK Stacey Truex, Nathalie Baracaldo, Ali Anwar, Thomas Steinke, Heiko Ludwig, Rui Zhang, and Yi Zhou

Distributed No Privacy Our Approach

11 Existing Approach - Epsilon 2 Our Approach - Epsilon 2 Local DP Central Data Holder, With Privacy
Existing Approach - Epsilon 1 Our Approach - Epsilon 1 Central Data Holder, No Privacy
1.0
Epsilon (Log Scale)

0.9

F1 score
0.1
0.1
0.8
0.7
0.01
0.01
0.6
0.5

1
10
19
28
37
46
55
64
73
82
91
100
0.001
0.001 epoch
0 10 20 30 40 50 60 70 80 90 100
0 10 20 30 40 50 60 70 80 90 100
Untrusted (%) Figure 6: Convolutional Neural Network Training with
Figure 5: Query Epsilons in Decision Tree Training with MNIST Data (10 parties and σ = 8, (ϵ, δ ) = (0.5, 10−5 ))
Varying Rate of Trust (50 parties). Epsilon 1 is defined as the to each randomly sampled batch. Using the moments accountant
privacy budget for count queries while Epsilon 2 is used for √
in [1], our approach is (O(bϵ E/b ), δ )-DP overall.
class counts.
record level, the trust model for adversarial knowledge is consid- Dataset and Model Structure. For our CNN experiments we use
ered within the context of the entire system. The trust parameter the publicly available MNIST dataset. This includes 60,000 training
therefore represents the degree of adversarial knowledge by cap- instances of handwritten digits and 10,000 testing instances. Each
turing the maximum number of colluding parties which the system example is a 28x28 grey-scale image of a digit between 0 and 9 [24].
may tolerate. Figure 5 demonstrates how the ϵ values used for both We use a model structure similar to that in [1]. Our model is a
count and distribution queries in private, federated DT learning are feedforward neural network with 2 internal layers of ReLU units
impacted by the trust parameter setting when |P | = 50. and a softmax layer of 10 classes with cross-entropy loss. The first
In the worst case scenario where a party Pi ∈ P assumes that layer contains 60 units and the second layer contains 1000. We set
all other P j ∈ P, i , j are colluding, our approach converges with the norm clipping to 4.0, learning rate to 0.1 and batch rate to 0.01.
existing local DP approaches. In all other scenarios the query ϵ We use Keras with a Tensorflow backend.
values will be increased in our system leading to more accurate
Comparison Methods. To the best of our knowledge, this paper
outcomes. Additionally, we believe the aforementioned scenario of
presents the first approach to accurately train a neural network
no trust is unlikely to exist in real world instances. Let us consider
in a private federated fashion without reliance on any public or
smart phone users as an IoT example. Collusion of all but one
non-protected data. We therefore compare our approach with the
party is impractical not only due to scale but also since such a
following baselines:
system is likely the be running without many users even knowing.
Additionally, on a smaller scale, if there is a set of five parties in (1) Central Data Holder, No Privacy. In this approach all the data
the system and one party is concerned that the other four are all is centrally held by one party and no privacy is considered
colluding, there is no reason for the honest party to continue to in the learning process.
participate. We therefore believe that realistic scenarios of FL will (2) Central Data Holder, With Privacy. While all the data is still
see accuracy gains when deploying our system. centrally held by one entity, this data holder now conducts
privacy-preserving learning. This is representative of the
scenario in [1].
4.2 Convolutional Neural Networks (3) Distributed, No Privacy. In this approach the data is dis-
We additionally demonstrate how to use our method to train a tributed to multiple parties, but the parties do not add noise
distributed differentially private CNN. In our approach, similarly during the learning process.
to centrally trained CNNs, each party is sent a model with the (4) Local DP. Parties add noise to protect the privacy of their
same initial structure and randomly initialized parameters. Each own data in isolation, adapting from [1] and [39].
party will then conduct one full epoch of learning locally. At the Figure 6 shows results with 10 parties conducting 100 epochs of
conclusion of each batch, Gaussian noise is introduced according training with the privacy parameter σ set to 8.0, the “large noise”
to the norm clipping value c and the privacy parameter σ . Norm setting in [1]. Note that Central Data Holder, No Privacy and Dis-
clipping allows us to put a bound on the sensitivity of the gradient tributed Data, No Privacy achieve similar results and thus overlap.
update.We use the same privacy strategy used in the centralized Our model is able to achieve an F1-score in this setting of 0.9. While
training approach presented in [1]. Once an entire epoch, or b1 this is lower than the central data holder setting where an F1-score
batches where b = batch rate, has completed the final parameters of approximately 0.95 is achieved, our approach again significantly
are sent back to A. A then averages the parameters and sends out-performs the local approach which only reaches 0.723. Addi-
back an updated model for another epoch of learning. After a pre- tionally, we see a drop off in the performance of the local approach
determined E number of epochs, the final model M is output. This early on as updates become overwhelmed by noise.
process for the aggregator and data parties are specifically detailed We additionally experiment with σ = 4 and σ = 2 as was done
as algorithmic pseudocode in Section 5.2. in [1]. When σ = 4 ((ϵ, δ ) = (2, 10−5 )) the central data holder with
Within our private, federated NN learning system, if σ = 2 · log 1.25 /ϵ
q
δ privacy is able to reach an F1 score of 0.960, the local approach
then by [16] our approach is (ϵ, δ )-differentially private with respect reaches 0.864, and our approach results in an F1-score of 0.957.
A Hybrid Approach to Privacy-Preserving Federated Learning London ’19, November 15, 2019, London, UK

Our Approach Local Differential Privacy Distributed, No Privacy Our Approach

Standard Deviation of Local DP Central DP
40
Noise DIstribution Central, No Privacy
30 1
20 0.9

F1 score
10 0.8

0 0.7

0 10 20 30 40 50 60 70 80 90 100 0.6

Untrusted (%) 0.5

1 10 19 28 37 46 55 64 73 82 91 100
Figure 7: Degree of Noise in Convolutional Neural Network epoch
Training with Varying Rate of Trust Figure 8: Linear SVM Training(10 parties and (ϵ, δ ) = (5,
0.0059))
When σ = 2 ((ϵ, δ ) = (8, 10−5 )) those scores become 0.973, 0.937,
and 0.963 respectively. We can see that our approach therefore FL methods, each party runs 10 epochs locally. We used 10 non-
demonstrates the most gain with larger σ values which translates colluding parties. Using σ =4, we report findings to achieve (ϵ, δ ) =
to tighter privacy guarantees. (5, 0.0059) according to [7].
Figure 7 again shows how the standard deviation of noise is Figure 8 shows F1-scores for the evaluated training methods.
significantly decreased in our system for most scenarios. Central, No Privacy and Distributed, No Privacy perform simi-
Our experiments demonstrate that the encryption time for one larly with F1-scores around 0.99 after fewer than 10 epochs due to
parameter at a party Pi takes approximately 0.001095 sec while the balanced nature of the dataset. Among the privacy-preserving
decryption between A and Pdec takes 0.007112 sec. While each pa- approaches, Central DP introduces the least noise and achieves
rameter requires encryption and decryption, these processes can be the highest F1-score. Among private FL methods, our approach
done completely in parallel. Therefore overhead remains relatively achieves an F1-score over 0.87 which is almost equal to Central DP
constant as both |P | and the number of parameters increase. and significantly out-performs Local DP after 100 epochs.
Overall, we have demonstrated that our system provides signif- We also evaluated our system in a lower trust setting with only
icant accuracy gains for FL compared with local DP in plausible, half of the parties trusted as non-colluding. Our approach again out-
real world scenarios and scales well. performed Local DP. Specifically, after 100 epochs, our approach
reached an F1-score of 0.85, while the Local DP achieves only 0.75.
4.3 Support Vector Machines (SVM) These experimental results show that our approach consistently
out-performs state of the art methods to train different ML models
We also demonstrate and assess our approach when solving a classic
in a private FL fashion. We similarly showed that our approach
ℓ2 -regularized binary linear SVM problem with hinge loss.
consistently out-performs baselines such as random guessing while
To train the linear SVM in a private distributed fashion, the
remaining reasonably close to non-private settings.
aggregator distributes a model with the same weight vector w to
all parties. Each party then runs a predefined number of epochs to
learn locally. To apply differential privacy in this setting, we first 5 SYSTEM IMPLEMENTATION
perform norm clipping on the feature vector x to obtain a bound The development and deployment of new machine learning training
on the sensitivity of the gradient update. Then, Gaussian noise is algorithms to our system requires the training process be first
added to the gradient according to [25]. After each party completes broken down into a set of queries in which meaningful aggregation
its local training, the final noisy encrypted weights are sent back may be done via summation. Each query must then be analyzed
to the aggregator. The aggregator averages the encrypted weights for its privacy impact and and designated a portion of the overall
and sends back an updated model with a new weight vector for privacy budget. Additionally, support must exist at each party for
another epoch of learning. Training ends after a predefined number each type of query required by the private training algorithm. We
of epochs. The detailed process is presented in Section 5.3. will now provide implementation details for each of the model types
evaluated, Decision Trees, Neural Networks, and Support Vector
Dataset. We use the publicly available ‘gisette’ dataset, which Machines, with additional discussion on the applicability of our
was used for NIPS 2003 Feature Selection Challenge [9]. This dataset framework to machine learning algorithms for other model types.
has 6, 000 training and 1, 000 testing samples with 5, 000 features.

Comparison Methods. We contrast the performance of our ap-

5.1 Application to Private Decision Tree
proach against other ways to train the model. Training
DT learning follows these steps: (1) determine the feature which
(1) Central, No Privacy. Centralized training without privacy.
best splits the training data, (2) split the training data into subsets
(2) Central DP. Centralized training with DP.
according to the chosen feature, (3) repeat steps (1) and (2) for
(3) Distributed, No Privacy. Model trained through federated
each subset. This is repeated until the subsets have reached a pre-
learning without privacy.
determined level of uniformity with respect to the target variable.
(4) Local DP. In this approach each party adds enough noise
To conduct private decision tree learning in our system we first
independently to protect their data according to [5].
address step (1): determining the best feature on which to split the
In these experiments, the learning rate was set to 0.01 for all data. We define the “best” feature as the feature which maximizes
settings. We used 100 epochs for all approaches. Additionally, for information gain. This is the same metric used in the ID3 [34],
London ’19, November 15, 2019, London, UK Stacey Truex, Nathalie Baracaldo, Ali Anwar, Thomas Steinke, Heiko Ludwig, Rui Zhang, and Yi Zhou

Algorithm 2 Private Decision Tree Learning meaningful information. The resulting private algorithm deployed
Input: Set of data parties P ; minimum number of honest, non-colluding parties t ; in our system is detailed in Algorithm 2.
privacy guarantee ϵ ; attribute set F ; class attribute C ; max tree depth d ; public key
pk
t̄ = n − t + 1 5.2 Application to Private Neural Network
ϵ1 = 2(dϵ+1) Training
Define current splits, S = ∅, for root node
M = BuildTree( S , P , t , ϵ1 , F , C , d , pk ) The process of deploying our system for neural network learning
return M is distinct from the process outlined in the previous section for
procedure BuildTree( S , P , t , ϵ1 , F , C , d , pk )
f = maxF ∈F |F | decision tree learning. In central neural network training, after a
Asynchronously query P : counts( S, ϵ1 , t ) randomly initialized model of pre-defined structure is created, the
N = decrypted aggregate of noisy √
counts following process is used: (1) the dataset D is shuffled and then
if F = ∅ or d = 0 or f N 2
|C | < ϵ1 then equally divided into batches, (2) each batch is passed through the
Asynchronously query P : class_counts( S, ϵ1 , t ) model iteratively, (3) a loss function L is used to compute the
Nc = vector of decrypted, noisy class counts
return node labeled with arg maxc Nc error of the model on each batch, (4) errors are then propagated
else
ϵ1
back through the network where an optimizer such as Stochastic
ϵ2 = 2|F|
Gradient Descent (SGD) is used to update network weights before
for each F ∈ F do
for each f i ∈ F do processing the next batch. Steps (1) through (4) constitute one
Update set of split values to send to child node: S i = S + {F = f i } epoch of learning and are repeated until the model converges (stops
Asynchronously query P : counts( Si , ϵ2 , t ) demonstrating improved performance).
and class_counts( Si , ϵ2 , t )
N ′Fi = aggregate of counts In our system we equate one query to the data parties as one
N ′Fi,c = element-wise aggregate of class_counts epoch of local learning. That is, each party conducts steps (1)
Recover N iF from t̄ partial decryptions of N ′Fi through (4) for one iteration and then sends an updated model
Recover N i,cF from t̄ partial decryptions of N ′F to the aggregator. The aggregator then averages the new model
i,c
end for weights provided by each party. An updated model is then sent
N F
· log i,c
Í |F | Í |C | F
VF = i =1 c =1 N i,c F along with a new query for another epoch of learning to each party.
Ni
end for
F¯ = arg maxF VF
Create root node M with label F¯ Algorithm 3 Private CNN Learning: Aggregator
for each f i ∈ F¯ do
Input: Set of data parties P ; minimum number of honest, non-colluding parties t ;
S i = S + {F = f i } noise parameter σ ; learning rate η ; sampling probability b ; loss function L ; clipping
M i = BuildTree(S i , P , t , ϵ1 , F \ F¯ , C , d − 1, pk ) value c ; number of epochs E ; public key pk
Set M i as child of M with edge f i t̄ = n − t + 1
end for Initialize model M with random weights θ ;
return M for each e ∈ [E] do
end if Asynchronously query P :
end procedure train_epoch(M , η , b , L , c , σ , t )
θ e = decrypted aggregate, noisy parameters from P
C4.5 [35] and C5.0 [36] tree training algorithms. Information gain M ← θe
end for
for a candidate feature f quantifies the difference between the return M
entropy of the current data with the weighted sum of the entropy
values for each of the data subsets which would be generated if f
were to be chosen as the splitting feature. Entropy for a dataset (or Each epoch receives the noise parameter σ and cost to the overall
subset) D is computed via the following equation: privacy budget is determined through a separate privacy accoun-
tant utility. Just as the decision tree stopping condition was replaced
|C |
Õ with a pre-set depth the neural network stopping condition of con-
Entropy(D) = pi log2pi (3) vergence is replaced with a pre-defined number of epochs E. This
i=1
process from the aggregator perspective is outlined Algorithm 3.
where C is the set of potential class values and pi indicates the At each data party we deploy code to support the process de-
probability that a random instance in D is of class i. Therefore, the tailed in Algorithm 4. To conduct a complete epoch of learning we
selection of the “best” feature on which to switch can be chosen follow the approach proposed in [1] for private centralized neural
via determining class probabilities which in turn may be computed network learning. This requires a number of changes to the tra-
via counts. Queries to the parties from the aggregator are therefore ditional learning approach. Rather than shuffling the dataset into
counts and class counts, known to have a sensitivity of 1. equal sized batches, a batch is randomly sampled for processing
Given the ability to query parties for class counts the aggregator with sampling probability b. An epoch then becomes defined as
may then control the iterative learning process. To ensure this the number of batch iterations required to process |D i | instances.
process is differentially private according to a pre-defined privacy Additionally, parameter updates determined through the loss func-
budget, we follow the approach from [19] to divide the budget for tion L are clipped to define the sensitivity of the neural network
each iteration and set a fixed number of iterations rather than a learning to individual training instances. Noise is then added to
purity test as a stopping condition. The algorithm will also stop if the weight updates. Once an entire epoch is completed the updated
counts appear too small relative to the degree of noise to provide weights can be sent back to the aggregator.
A Hybrid Approach to Privacy-Preserving Federated Learning London ’19, November 15, 2019, London, UK

Algorithm 4 Private CNN Learning: Data Party Pi Algorithm 6 Private SVM Learning: Data Party Pi
procedure train_epoch(M , η , b , L , c , σ , t ) procedure train_epoch(M , η , K , L , c , σ , t )
θ = parameters of M w = parameters of M
for j ∈ {1, 2, ..., b1 } do for each (x i , yi ) ∈ D do
Randomly sample D i, j from D i w/ probability b ||x i ||2
x i ← x i / max 1, c
for each d ∈ D i, j do
end for
gj (d ) ← ▽θ L(θ, d ) for k ∈ {1, 2, ..., K } do
||g (d )||
ḡj (d ) ← g j (d )/ max 1, j c 2 g(D) ← ▽w L(w, D)
σ2
end for ḡ ← g(D) + N 0, t −1
σ2
Í
ḡj ← |D 1 ∀d ḡj (d ) + N 0, c2 · t −1 w ← w − η ḡ
i, j |
θ ← θ − η ḡj M ←w
end for
M ←θ
return Enc pk (w )
end for
return Enc pk (θ ) end procedure
end procedure
interest and significantly different. The task of generating and de-
5.3 Application to Private Support Vector ploying our system for each algorithm, however, is non-trivial. First,
Machine Training a DP version of the algorithm must be developed. Second, this must
be written as a series of queries. Finally, each query must have
Finally, we focus on the classic ℓ2 -regularized binary linear SVM
an appropriate aggregation procedure. Our approach may then be
problem with hinge loss, which is given in the following form:
applied for accurate, federated, private results.
1 Õ
L(w) := max{0, 1 − yi ⟨w, x i ⟩} + λ∥w ∥22 , (4) Due to our choices to use the threshold Paillier cryptosystem in
|D| conjunction with an aggregator, rather than a complex SMC pro-
∀(x i ,yi )∈D
tocol run by the parties themselves, we can provide a streamlined
where (x i , yi ) ∈ Rd × {−1, 1} is a feature vector, class label pair, w
∈ interface between the aggregator and the parties. Parties need only
Rd is the model weight vector, and λ is the regularized coefficient. answer data queries with encrypted, noisy responses and decryp-
From the aggregator perspective, specified in Algorithm 5, the tion queries with partial decryption values. Management of the
process of SVM training is similar to that of neural network training. global model and communication with all other parties falls to the
Each query to the data parties is defined as K epochs of training. aggregator, therefore decreasing the barrier to entry for parties to
Once query responses are received, model parameters are averaged engage in our federated learning system. Figure 4 demonstrates the
to generate a new support vector machine model. This new model impact of this choice as our approach is able to effectively handle
is then sent to the data parties for another K epochs of training. We the introduction of more parties into the federated learning system
again specify a pre-determined number of epochs E to control the without the introduction of increased encryption overhead.
number of training iterations. Another issue in the deployment of new machine learning train-
ing algorithms is the choice of algorithmic parameters. Key deci-
Algorithm 5 Private SVM Learning: Aggregator sions must be made when using our system and many are domain-
Input: Set of data parties P ; minimum number of honest, non-colluding parties t ; specific. We aim to inform such decisions with our analysis of
noise parameter σ ; learning rate η ; loss function L ; clipping value c ; number of
epochs E ; number of epochs per query K , public key pk
trade-offs between privacy, trust and accuracy in Section 4.1.1, but
t̄ = n − t + 1 note that the impact will vary depending on the data and the train-
Initialize model M with random weights w ; ing algorithm chosen. While our system will reduce the amount of
for each e ∈ [E/K ] do
Asynchronously query P : noise required to train any federated ML algorithm, questions sur-
train_epoch(M , η , K , L , c , σ , t ) rounding what impact various data-specific features will have on
θ e = decrypted aggregate, noisy parameters from P the privacy budget are algorithm-specific. For example, Algorithm 2
M ← θe
end for demonstrates how, in decision tree training, the number of features
return M and classes impact the privacy budget at each level. Similarly, Al-
gorithms 4 and 6 show the role of norm clipping in neural network
To complete an epoch of learning at each data party, we iterate and SVM learning. In neural networks, this value not only impacts
through each instance in the local training dataset D i . We again noise but will also have a different impact on learning depending
deploy a clipping approach to constrain the sensitivity of the upon the size of the network and number of features.
dates. The model parameters are then updated according to the
loss function L as well as the noise parameter. The process con- 6 RELATED WORK
ducted at each data party for K epochs of training in response to
Our work relates to both the areas of FL as well as privacy-preserving
an aggregator query is outlined in Algorithm 6.
ML. Existing work can be classified into three categories: trusted
aggregator, local DP, and cryptographic.
5.4 Expanding the Algorithm Repository
Beyond the three models evaluated here, our approach can be used Trusted Aggregator. Approaches in this area trust the aggregator
to extend any differentially private machine learning algorithm into to obtain data in plaintext or add noise. [1] and [22] propose differ-
a federated learning environment. We demonstrate the flexibility entially private ML systems, but do not consider a distributed data
of our system through 3 example algorithms which are of broad scenario, thus requiring a central party. In [41], the authors develop
London ’19, November 15, 2019, London, UK Stacey Truex, Nathalie Baracaldo, Ali Anwar, Thomas Steinke, Heiko Ludwig, Rui Zhang, and Yi Zhou

a distributed data mining system with DP but show significant counting the number of values in an array. In contrast, we propose
accuracy loss and require a trusted aggregator to add noise. an accurate, private FL system for predictive model training.
Recently, [32] presented PATE, an ensemble approach to pri- Dwork et al. [14] present a distributed noise generation scheme
vate learning wherein several “teacher” models are independently and focus on methods for generating noise from different distribu-
trained over local datasets. A trusted aggregator then provides a DP tions. This scheme is based on secret sharing, an MPC mechanism
query interface to a “student” model that has unlabelled public data that requires extensive exchange of messages and entails a commu-
(but no direct access to private data) and obtains labels through nication overhead not viable in many federated learning settings.
queries to the teachers. While we have proposed a federated learn- [10] proposes a method to train neural networks in a private
ing (FL) approach wherein one global model is learned over the collaborative fashion by combining MPC, DP and secret sharing
aggregate of the parties’ datasets, the PATE method develops an assuming non-colluding honest parties. In contrast, our system
ensemble model with independently trained base models using prevents privacy leakages even if parties actively collude.
local datasets. Unlike the methods we evaluate, PATE assumes a Approaches for the private collection of streaming data, includ-
fully trusted party to aggregate the teachers’ labels; focuses on ing [2, 8, 21, 37], aim to recover computation when one or more
scenarios wherein each party has enough data to train an accurate parties go down. Our system, however, enables private federated
model, which might not hold, e.g., for cellphone users training a learning which allows for checkpoints in each epoch of training.
neural network; and assumes access to publicly available data, an The use of threshold cryptography also enables our system to de-
assumption not made in our FL system. Models produced from our crypt values when only a subset of the participants is available.
FL system learn from all available data, leading to more accurate
models than the local models trained by each participant in PATE 7 CONCLUSION
(Figure 4b in [32] demonstrates the need for a lot of parties to
In this paper, we present a novel approach to perform FL that com-
achieve reasonable accuracy in such a setting).
bines DP and SMC to improve model accuracy while preserving
provable privacy guarantees and protecting against extraction at-
Local Differential Privacy. [39] presents a distributed learning
tacks and collusion threats. Our approach can be applied to train
system using DP without a central trusted party. However, the DP
different ML models in a federated learning fashion for varying trust
guarantee is per-parameter and becomes meaningless for models
scenarios. Through adherence to the DP framework we are able
with more than a small number of parameters.
to guarantee overall privacy from inference of any model output
from our system as well as any intermediate result made available
Cryptographic Approaches. [38] presents a protocol to privately
to A or P. SMC additionally guarantees any messages exchanged
aggregate sums over multiple time periods. Their protocol is de-
without DP protection are not revealed and therefore do not leak
signed to allow participants to periodically upload encrypted values
any private information. This provides end-to-end privacy guar-
to an oblivious aggregator with minimum communication costs.
antees with respect to the participants as well as any attackers of
Their approach however has participants sending in a stream of
the model itself. Given these guarantees, models produced by our
statistics and does not address FL or propose an FL system. Ad-
system can be safely deployed to production without infringing on
ditionally, their approach calls for each participant to add noise
privacy guarantees.
independently. As our experimental results show, allowing each
We demonstrated how to apply our approach to train a variety
participant to add noise in this fashion results in models with low
of ML models and showed that it out-performs existing state-of-
accuracy, making this approach is unsuitable for FL. In contrast, our
the art techniques for FL. Our system provides significant gains in
approach reduces the amount of noise injected by each participant
accuracy when compared to a naïve application of state-of-the-art
by taking advantage of the additive properties of DP and the use
differentially private protocols to FL systems.
of threshold-based homomorphic encryption to produce accurate
For a tailored threat model, we propose an end-to-end private
models that protect individual parties’ privacy.
federated learning system which uses SMC in combination with
In [6, §B] the authors propose the use of multiparty computation
DP to produce models with high accuracy. As far as we know
to securely aggregate data for FL. The focus of the paper is to present
this is the first paper to demonstrate that the application of these
suitable cryptographic techniques to ensure that the aggregation
combined techniques allow us to maintain this high accuracy at
process can take place in mobile environments. While the authors
a given level of privacy over different learning approaches. In the
propose FL as motivation, no complete system is developed with
light of the ongoing social discussion on privacy, this proposed
“a detailed study of the integration of differential privacy, secure
approach provides a novel method for organizations to use ML in
aggregation, and deep learning” remaining beyond the scope.
applications requiring high model performance while addressing
[4] provides a theoretical analysis on how differentially private
privacy needs and regulatory compliance.
computations could be done in a federated setting for single in-
stance operations using either secure function evaluation or the
local model with a semi-trusted curator. By comparison, we consider REFERENCES
[1] Martin Abadi, Andy Chu, Ian Goodfellow, H Brendan McMahan, Ilya Mironov,
multiple operations to conduct FL and provide empirical evaluation Kunal Talwar, and Li Zhang. 2016. Deep learning with differential privacy. In
of the FL system. [29] proposes a system to perform differentially Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communications
private database joins. This approach combines private set intersec- Security. ACM, 308–318.
[2] Gergely Ács and Claude Castelluccia. 2011. I have a dream!(differentially private
tion with random padding, but cannot be generally applied to FL. smart metering). In International Workshop on Information Hiding. Springer, 118–
In [33] the authors’ protocols are tailored to inner join tables and 132.
A Hybrid Approach to Privacy-Preserving Federated Learning London ’19, November 15, 2019, London, UK

[3] Donald Beaver. 1991. Foundations of secure interactive computing. In Annual [27] Yehuda Lindell and Benny Pinkas. 2000. Privacy preserving data mining. In
International Cryptology Conference. Springer, 377–391. Annual International Cryptology Conference. Springer, 36–54.
[4] Amos Beimel, Kobbi Nissim, and Eran Omri. 2008. Distributed Private Data [28] Xiangrui Meng, Joseph Bradley, Burak Yavuz, Evan Sparks, Shivaram Venkatara-
Analysis: Simultaneously Solving How and What. In Advances in Cryptology – man, Davies Liu, Jeremy Freeman, DB Tsai, Manish Amde, Sean Owen, et al.
CRYPTO 2008, David Wagner (Ed.). Springer Berlin Heidelberg, Berlin, Heidelberg, 2016. Mllib: Machine learning in apache spark. The Journal of Machine Learning
451–468. Research 17, 1 (2016), 1235–1241.
[5] Avrim Blum, Cynthia Dwork, Frank McSherry, and Kobbi Nissim. 2005. Practical [29] Arjun Narayan and Andreas Haeberlen. 2012. DJoin: Differentially Private Join
privacy: the SuLQ framework. In Proceedings of the twenty-fourth ACM SIGMOD- Queries over Distributed Databases.. In OSDI. 149–162.
SIGACT-SIGART symposium on Principles of database systems. ACM, 128–138. [30] Milad Nasr, Reza Shokri, and Amir Houmansadr. 2019. Comprehensive Privacy
[6] Keith Bonawitz, Vladimir Ivanov, Ben Kreuter, Antonio Marcedone, H Brendan Analysis of Deep Learning: Stand-alone and Federated Learning under Passive
McMahan, Sarvar Patel, Daniel Ramage, Aaron Segal, and Karn Seth. 2017. Prac- and Active White-box Inference Attacks. In Security and Privacy (SP), 2019 IEEE
tical secure aggregation for privacy-preserving machine learning. In Proceedings Symposium on.
of the 2017 ACM SIGSAC Conference on Computer and Communications Security. [31] Pascal Paillier. 1999. Public-key cryptosystems based on composite degree resid-
ACM, 1175–1191. uosity classes. In International Conference on the Theory and Applications of
[7] Mark Bun and Thomas Steinke. 2016. Concentrated differential privacy: Simpli- Cryptographic Techniques. Springer, 223–238.
fications, extensions, and lower bounds. In Theory of Cryptography Conference. [32] Nicolas Papernot, Shuang Song, Ilya Mironov, Ananth Raghunathan, Kunal Tal-
Springer, 635–658. war, and Úlfar Erlingsson. 2018. Scalable Private Learning with PATE. arXiv
[8] T. H. Hubert Chan, Elaine Shi, and Dawn Song. 2012. Privacy-Preserving Stream preprint arXiv:1802.08908 (2018).
Aggregation with Fault Tolerance. In Financial Cryptography and Data Security, [33] Martin Pettai and Peeter Laud. 2015. Combining differential privacy and secure
Angelos D. Keromytis (Ed.). Springer Berlin Heidelberg, Berlin, Heidelberg, 200– multiparty computation. In Proceedings of the 31st Annual Computer Security
214. Applications Conference. ACM, 421–430.
[9] Chih-Chung Chang and Chih-Jen Lin. 2011. LIBSVM: A library for support vector [34] J. Ross Quinlan. 1986. Induction of decision trees. Machine learning 1, 1 (1986),
machines. ACM Transactions on Intelligent Systems and Technology 2 (2011), 27:1– 81–106.
27:27. Issue 3. Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm. [35] J Ross Quinlan. 1993. C4. 5: Programming for machine learning. Morgan Kauff-
[10] Melissa Chase, Ran Gilad-Bachrach, Kim Laine, Kristin E Lauter, and Peter Rindal. mann 38 (1993), 48.
2017. Private Collaborative Neural Network Learning. IACR Cryptology ePrint [36] J. Ross Quinlan. 2007. C5. (2007). http://rulequest.com
Archive 2017 (2017), 762. [37] Vibhor Rastogi and Suman Nath. 2010. Differentially private aggregation of
[11] Ivan Damgård and Mats Jurik. 2001. A Generalisation, a Simplification and Some distributed time-series with transformation and encryption. In Proceedings of
Applications of Paillier’s Probabilistic Public-Key System. In Proceedings of the the 2010 ACM SIGMOD International Conference on Management of data. ACM,
4th International Workshop on Practice and Theory in Public Key Cryptography: 735–746.
Public Key Cryptography (PKC ’01). Springer-Verlag, London, UK, UK, 119–136. [38] Elaine Shi, HTH Chan, Eleanor Rieffel, Richard Chow, and Dawn Song. 2011.
http://dl.acm.org/citation.cfm?id=648118.746742 Privacy-preserving aggregation of time-series data. In Annual Network & Dis-
[12] Dua Dheeru and Efi Karra Taniskidou. 2017. UCI Machine Learning Repository. tributed System Security Symposium (NDSS). Internet Society.
(2017). http://archive.ics.uci.edu/ml [39] Reza Shokri and Vitaly Shmatikov. 2015. Privacy-preserving deep learning. In
[13] Cynthia Dwork. 2008. Differential privacy: A survey of results. In International Proceedings of the 22nd ACM SIGSAC conference on computer and communications
Conference on Theory and Applications of Models of Computation. Springer, 1–19. security. ACM, 1310–1321.
[14] Cynthia Dwork, Krishnaram Kenthapadi, Frank McSherry, Ilya Mironov, and [40] Reza Shokri, Marco Stronati, Congzheng Song, and Vitaly Shmatikov. 2017. Mem-
Moni Naor. 2006. Our data, ourselves: Privacy via distributed noise generation. In bership inference attacks against machine learning models. In Security and Privacy
Annual International Conference on the Theory and Applications of Cryptographic (SP), 2017 IEEE Symposium on. IEEE, 3–18.
Techniques. Springer, 486–503. [41] Ning Zhang, Ming Li, and Wenjing Lou. 2011. Distributed data mining with
[15] Cynthia Dwork and Jing Lei. 2009. Differential privacy and robust statistics. In differential privacy. In Communications (ICC), 2011 IEEE International Conference
Proceedings of the forty-first annual ACM symposium on Theory of computing. on. IEEE, 1–5.
ACM, 371–380.
[16] Cynthia Dwork, Aaron Roth, et al. 2014. The algorithmic foundations of differ-
ential privacy. Foundations and Trends® in Theoretical Computer Science 9, 3–4
(2014), 211–407.
[17] Cynthia Dwork and Guy N Rothblum. 2016. Concentrated differential privacy.
arXiv preprint arXiv:1603.01887 (2016).
[18] Cynthia Dwork, Guy N Rothblum, and Salil Vadhan. 2010. Boosting and differ-
ential privacy. In 2010 IEEE 51st Annual Symposium on Foundations of Computer
Science. IEEE, 51–60.
[19] Arik Friedman and Assaf Schuster. 2010. Data mining with differential privacy.
In Proceedings of the 16th ACM SIGKDD international conference on Knowledge
discovery and data mining. ACM, 493–502.
[20] Oded Goldreich. 1998. Secure multi-party computation. Manuscript. Preliminary
version 78 (1998).
[21] S. Goryczka and L. Xiong. 2017. A Comprehensive Comparison of Multiparty
Secure Additions with Differential Privacy. IEEE Transactions on Dependable and
Secure Computing 14, 5 (Sep. 2017), 463–477. https://doi.org/10.1109/TDSC.2015.
2484326
[22] Geetha Jagannathan, Krishnan Pillaipakkamnatt, and Rebecca N Wright. 2009.
A practical differentially private random decision tree classifier. In Data Mining
Workshops, 2009. ICDMW’09. IEEE International Conference on. IEEE, 114–121.
[23] Peter Kairouz, Sewoong Oh, and Pramod Viswanath. 2017. The composition
theorem for differential privacy. IEEE Transactions on Information Theory 63, 6
(2017), 4037–4049.
[24] Yann LeCun, Léon Bottou, Yoshua Bengio, and Patrick Haffner. 1998. Gradient-
based learning applied to document recognition. Proc. IEEE 86, 11 (1998), 2278–
2324.
[25] Jaewoo Lee and Daniel Kifer. 2018. Concentrated differentially private gradient
descent with adaptive per-iteration privacy budget. In Proceedings of the 24th
ACM SIGKDD International Conference on Knowledge Discovery & Data Mining.
ACM, 1656–1665.
[26] Mu Li, David G Andersen, Jun Woo Park, Alexander J Smola, Amr Ahmed,
Vanja Josifovski, James Long, Eugene J Shekita, and Bor-Yiing Su. 2014. Scaling
Distributed Machine Learning with the Parameter Server.. In OSDI, Vol. 14. 583–
598.

Federated Learning Decentralized AI For Privacy-Preserving Collaboration
No ratings yet
Federated Learning Decentralized AI For Privacy-Preserving Collaboration
1 page
Final Project PPt-25CSDS05
No ratings yet
Final Project PPt-25CSDS05
24 pages
Differentially Private Federated Learning: Servers Trustworthiness, Estimation, and Statistical Inference
No ratings yet
Differentially Private Federated Learning: Servers Trustworthiness, Estimation, and Statistical Inference
56 pages
Privacy Preserving AI
No ratings yet
Privacy Preserving AI
2 pages
Data Fusion - KEN4223
No ratings yet
Data Fusion - KEN4223
45 pages
Kairouz, McMahan Et Al 2019 - Advances and Open Problems in Federated Learning
No ratings yet
Kairouz, McMahan Et Al 2019 - Advances and Open Problems in Federated Learning
121 pages
COM3030 Week 10 Slides
No ratings yet
COM3030 Week 10 Slides
63 pages
Blockchain For Federated Learning
No ratings yet
Blockchain For Federated Learning
18 pages
Rays of Truth Crystals of Light Information and Guidance For The Golden Age by Fred Bell
100% (2)
Rays of Truth Crystals of Light Information and Guidance For The Golden Age by Fred Bell
868 pages
Sec22 Stevens
No ratings yet
Sec22 Stevens
18 pages
Advances and Open Problems in Federated Learning
No ratings yet
Advances and Open Problems in Federated Learning
105 pages
A Survey On Security and Privacy of Federated Learning
No ratings yet
A Survey On Security and Privacy of Federated Learning
61 pages
FL Medical SLR
No ratings yet
FL Medical SLR
31 pages
Empowering Privacy-Preserving Machine Learning: A Comprehensive Survey On Federated Learning
No ratings yet
Empowering Privacy-Preserving Machine Learning: A Comprehensive Survey On Federated Learning
12 pages
BTP Research Internship Final Report
No ratings yet
BTP Research Internship Final Report
21 pages
Selective Knowledge Sharing For Privacy-Preserving Federated Distillation Without A Good Teacher
No ratings yet
Selective Knowledge Sharing For Privacy-Preserving Federated Distillation Without A Good Teacher
11 pages
100percent Updated 1
No ratings yet
100percent Updated 1
13 pages
2017 - Privacy-Preserving
No ratings yet
2017 - Privacy-Preserving
20 pages
Developing DApps On EOS MuditMarda
No ratings yet
Developing DApps On EOS MuditMarda
18 pages
Recent Advances
No ratings yet
Recent Advances
18 pages
FL 1
No ratings yet
FL 1
25 pages
SecureBoost A Lossless Federated Learning Framework
No ratings yet
SecureBoost A Lossless Federated Learning Framework
9 pages
Federated Learning A Survery
No ratings yet
Federated Learning A Survery
31 pages
Federated Foundation Models: Privacy-Preserving and Collaborative Learning For Large Models
No ratings yet
Federated Foundation Models: Privacy-Preserving and Collaborative Learning For Large Models
10 pages
Report With Results and Graph
No ratings yet
Report With Results and Graph
30 pages
Federated Learning With Differential Privacy Algorithms and Performance Analysis
No ratings yet
Federated Learning With Differential Privacy Algorithms and Performance Analysis
16 pages
Hybridalpha: An Efficient Approach For Privacy-Preserving Federated Learning
No ratings yet
Hybridalpha: An Efficient Approach For Privacy-Preserving Federated Learning
11 pages
Experiment 1 Photocell
83% (6)
Experiment 1 Photocell
6 pages
InfoSec Project Report
No ratings yet
InfoSec Project Report
7 pages
Privacy
No ratings yet
Privacy
14 pages
Bharati Et Al 2022 Federated Learning Applications Challenges and Future Directions
No ratings yet
Bharati Et Al 2022 Federated Learning Applications Challenges and Future Directions
17 pages
Federated Learning
No ratings yet
Federated Learning
11 pages
Base Paper
No ratings yet
Base Paper
10 pages
1 s2.0 S0167739X21004726 Main
No ratings yet
1 s2.0 S0167739X21004726 Main
9 pages
Research Paper 8
No ratings yet
Research Paper 8
7 pages
Splitfed: When Federated Learning Meets Split Learning
No ratings yet
Splitfed: When Federated Learning Meets Split Learning
14 pages
Federated Learning
No ratings yet
Federated Learning
10 pages
Radiation Protection in Medical Radiography 9th Edition Sherer Solution Manual Full Download
100% (2)
Radiation Protection in Medical Radiography 9th Edition Sherer Solution Manual Full Download
411 pages
Research Paper Mine
No ratings yet
Research Paper Mine
9 pages
Federated Learning Advancements Applications and F
No ratings yet
Federated Learning Advancements Applications and F
7 pages
Hirac (Manhole Installation)
No ratings yet
Hirac (Manhole Installation)
7 pages
A Federated Learning Framework Based On CSP Homomorphic Encryption
No ratings yet
A Federated Learning Framework Based On CSP Homomorphic Encryption
6 pages
Privacy-Preserving Federated Learning Based On Differential Privacy and Momentum
No ratings yet
Privacy-Preserving Federated Learning Based On Differential Privacy and Momentum
6 pages
1differentially Private Federated Learning With An Adaptive Noise Mechanism
No ratings yet
1differentially Private Federated Learning With An Adaptive Noise Mechanism
14 pages
Adapting Security and Decentralized Knowledge Enhancement in Federated Learning Using Blockchain Technology: Literature Review
No ratings yet
Adapting Security and Decentralized Knowledge Enhancement in Federated Learning Using Blockchain Technology: Literature Review
24 pages
Federated Learning IoT Presentation
No ratings yet
Federated Learning IoT Presentation
9 pages
Privacy Preserving Federated Learning From Multi-Input Functional Proxy Re-Encryption
No ratings yet
Privacy Preserving Federated Learning From Multi-Input Functional Proxy Re-Encryption
5 pages
PPFL Privacy Preserving FL With TEE
No ratings yet
PPFL Privacy Preserving FL With TEE
15 pages
Extended Research Paper 4
No ratings yet
Extended Research Paper 4
6 pages
8 A Privacy Preserving Federated Learning Scheme Using Homomorphic
No ratings yet
8 A Privacy Preserving Federated Learning Scheme Using Homomorphic
15 pages
Split-Fed Learning A Deep Dive Into Methods Innova
No ratings yet
Split-Fed Learning A Deep Dive Into Methods Innova
24 pages
The Future of Privacy-First AI
No ratings yet
The Future of Privacy-First AI
2 pages
1-A Privacy-Preserving FL Scheme With TEE
No ratings yet
1-A Privacy-Preserving FL Scheme With TEE
11 pages
Blockchain-Based Privacy-Preserving Multi-Tasks Federated Learning Framework
No ratings yet
Blockchain-Based Privacy-Preserving Multi-Tasks Federated Learning Framework
24 pages
Federated Learning Presentation
No ratings yet
Federated Learning Presentation
11 pages
Ai For The Future of Privacy Part 2
No ratings yet
Ai For The Future of Privacy Part 2
2 pages
NIT Research Proposal1
No ratings yet
NIT Research Proposal1
2 pages
Big Data Project Privacy and FL DP
No ratings yet
Big Data Project Privacy and FL DP
3 pages
SAFELearn Secure Aggregation For Private FEderated Learning
No ratings yet
SAFELearn Secure Aggregation For Private FEderated Learning
7 pages
File - BB 6 15
No ratings yet
File - BB 6 15
10 pages
Federated Learning Differential Privacy Preservation Method Based On Differentiated Noise Addition
No ratings yet
Federated Learning Differential Privacy Preservation Method Based On Differentiated Noise Addition
5 pages
Federated Learning
No ratings yet
Federated Learning
2 pages
Federated Learning For Privacy-Preserving Iot Data Analytics
No ratings yet
Federated Learning For Privacy-Preserving Iot Data Analytics
2 pages
Amity University, Mumbai Aibas: Title: Learned Optimism Scale
No ratings yet
Amity University, Mumbai Aibas: Title: Learned Optimism Scale
9 pages
Cooling Tower Performance Test (Id CT) : Manual Input Sheet Station: Report Date: Unit: Test Date
No ratings yet
Cooling Tower Performance Test (Id CT) : Manual Input Sheet Station: Report Date: Unit: Test Date
12 pages
Eddy Current Level-1
No ratings yet
Eddy Current Level-1
111 pages
DFPC Fire Instructor I NFPA 1041 2007
No ratings yet
DFPC Fire Instructor I NFPA 1041 2007
10 pages
TLE9 CSS Q3 M4 Maintain-Hand-Tools
No ratings yet
TLE9 CSS Q3 M4 Maintain-Hand-Tools
8 pages
The Influence of Cultural Background On Language Anxiety Among Malaysian ESL Tertiary Learners
No ratings yet
The Influence of Cultural Background On Language Anxiety Among Malaysian ESL Tertiary Learners
12 pages
Oliver Twist Essay Questions
100% (2)
Oliver Twist Essay Questions
4 pages
Written Addition and Subtraction Problem Solving
No ratings yet
Written Addition and Subtraction Problem Solving
15 pages
TCS Codevita Previous Papers - Pdf-Edited
No ratings yet
TCS Codevita Previous Papers - Pdf-Edited
8 pages
The Five Stages of Hawthorne Studies and Their Purposes
No ratings yet
The Five Stages of Hawthorne Studies and Their Purposes
3 pages
Soil Nails Field Pull Out Testing Evaluation and Applications
No ratings yet
Soil Nails Field Pull Out Testing Evaluation and Applications
11 pages
1 s2.0 S2212827124000428 Main
No ratings yet
1 s2.0 S2212827124000428 Main
6 pages
Hardware Implementation of RLC Plant Using OP-AMP: Hammad Muneer: RP21-EE-429
No ratings yet
Hardware Implementation of RLC Plant Using OP-AMP: Hammad Muneer: RP21-EE-429
6 pages
Childhood Interests and Motivation - CCCHE (PVT) LTD
No ratings yet
Childhood Interests and Motivation - CCCHE (PVT) LTD
9 pages
Flexural Behaviour of RC One-Way Slabs Reinforced Using PAN Based Carbon Textile Grid
No ratings yet
Flexural Behaviour of RC One-Way Slabs Reinforced Using PAN Based Carbon Textile Grid
13 pages
Reliability Analysis
No ratings yet
Reliability Analysis
22 pages
Luyện Đọc Điền - Đọc Hiểu (Buổi 8) Livestream
No ratings yet
Luyện Đọc Điền - Đọc Hiểu (Buổi 8) Livestream
3 pages
25 Combi and NT Problems
No ratings yet
25 Combi and NT Problems
7 pages
CelebritySchool Pitch Deck
No ratings yet
CelebritySchool Pitch Deck
13 pages
Block Wise Sub Allocation NFSM CC (Jute) 24-25
No ratings yet
Block Wise Sub Allocation NFSM CC (Jute) 24-25
1 page
AAiT PECC 2015 Year II Sem I Sections 21
No ratings yet
AAiT PECC 2015 Year II Sem I Sections 21
21 pages
Compair Activated Carbon Tower Flyer 2pp GB v2 Work
No ratings yet
Compair Activated Carbon Tower Flyer 2pp GB v2 Work
2 pages
Generation Gap
No ratings yet
Generation Gap
3 pages
CVATSFriendly 1706242751 582645 344192
No ratings yet
CVATSFriendly 1706242751 582645 344192
1 page
Ocb & LMX
No ratings yet
Ocb & LMX
12 pages
Lab 1 - Experiment On Electrostatics
No ratings yet
Lab 1 - Experiment On Electrostatics
5 pages
Privacy-Preserving Machine Learning: A use-case-driven approach to building and protecting ML pipelines from privacy and security threats
From Everand
Privacy-Preserving Machine Learning: A use-case-driven approach to building and protecting ML pipelines from privacy and security threats
Srinivasa Rao Aravilli
No ratings yet

A Hybrid Approach To Privacy-Preserving Federated Learning

Uploaded by

A Hybrid Approach To Privacy-Preserving Federated Learning

Uploaded by

A Hybrid Approach to Privacy-Preserving Federated Learning

Stacey Truex Nathalie Baracaldo Ali Anwar

Thomas Steinke Heiko Ludwig Rui Zhang

San Jose, California San Jose, California San Jose, California

Algorithm 1 Private Federated Learning

No Privacy Our Approach Local DP No Privacy Our Approach Local DP

the privacy of their own data in isolation. The amount of

Distributed No Privacy Our Approach

Our Approach Local Differential Privacy Distributed, No Privacy Our Approach

Untrusted (%) 0.5

Comparison Methods. We contrast the performance of our ap-

You might also like