

SecureBoost: A Lossless Federated Learning Framework

Kewei Cheng, Tao Fan, Yilun Jin, Yang Liu, Member, IEEE, Tianjian Chen, Dimitrios Papadopoulos, Qiang Yang, Fellow, IEEE

arXiv:1901.08755v3 [cs.LG] 7 Apr 2021

Abstract—The protection of user privacy is an important concern in machine learning, as evidenced by the rolling out of the General Data Protection Regulation (GDPR) in the European Union (EU) in May 2018. The GDPR is designed to give users more control over their personal data, which motivates us to explore machine learning frameworks for data sharing that do not violate user privacy. To meet this goal, in this paper we propose a novel lossless privacy-preserving tree-boosting system, known as SecureBoost, in the setting of federated learning. SecureBoost first conducts entity alignment under a privacy-preserving protocol and then constructs boosting trees across multiple parties with a carefully designed encryption strategy. This federated learning system allows the learning process to be jointly conducted over multiple parties with common user samples but different feature sets, which corresponds to a vertically partitioned data set. An advantage of SecureBoost is that it provides the same level of accuracy as the non-privacy-preserving approach while revealing no information about any private data provider. We show that the SecureBoost framework is as accurate as other non-federated gradient tree-boosting algorithms that require centralized data, and thus it is highly scalable and practical for industrial applications such as credit risk analysis. To this end, we discuss information leakage during the protocol execution and propose ways to provably reduce it.

Index Terms—Federated Learning, Privacy, Security, Decision Tree

1 INTRODUCTION

Modern society is increasingly concerned with the unlawful use and exploitation of personal data. At the individual level, improper use of personal data may cause potential risk to user privacy. At the enterprise level, data leakage may have grave consequences on commercial interests. Actions are being taken by different societies. For example, the European Union has enacted a law known as the General Data Protection Regulation (GDPR), which is designed to give users more control over their personal data [1], [2], [3], [4]. Many enterprises that rely heavily on machine learning are beginning to make sweeping changes as a consequence.

Despite the difficulty in meeting the goal of user privacy protection, the need for different organizations to collaborate while building machine learning models still stays strong. In reality, many data owners do not have a sufficient amount of data to build high-quality models. For example, retail companies have users' purchase and transaction data, which are highly useful if provided to banks for credit rating applications. Likewise, mobile phone companies have users' usage data, but each company may only have a small number of users, which is not enough to train high-quality user preference models. Such companies have strong motivation to collaboratively exploit the joint data value.

So far, it is still a challenge to allow different data owners to collaboratively build high-quality machine learning models while at the same time protecting user data privacy and confidentiality. In the past, several attempts have been made to address the user privacy issue in machine learning [5], [6]. For example, Apple proposed to use differential privacy (DP) [7], [8] to address the privacy preservation issue. The basic idea of DP is to add properly calibrated noise to data to disambiguate the identity of any individual when data is being exchanged and analyzed by a third party. However, DP only prevents user-data leakage to a certain degree and cannot completely rule out the identity of an individual. In addition, data exchange under DP still requires that data change hands between organizations, which may not be allowed by strict laws like the GDPR. Furthermore, the DP method is lossy in machine learning, in that models built after noise is injected may perform unsatisfactorily in prediction accuracy.

More recently, Google introduced a federated learning (FL) framework [9] and deployed it on Android cloud. The basic idea is to allow individual clients to upload only model updates, but not raw data, to a central server where the models are aggregated. A secure aggregation protocol was further introduced [10] to ensure that the model parameters do not leak user information to the server. This framework is also referred to as horizontal FL [11] or data-partition FL, where each partition corresponds to a subset of data samples collected from one or multiple users.

In this paper, we consider another setting, in which multiple parties collaboratively build their machine learning models while protecting user privacy and data confidentiality. Our setting is shown in Figure 2 and is typically referred to as vertical FL [11], because data are partitioned by features among different parties. This setting has a wide range of real-world applications. For example, financial institutes can leverage alternative data from a third party to enhance users' and small and medium enterprises' credit ratings [12]. Patients' records from multiple hospitals can be used together for diagnoses [13], [14]. We can regard the data located at different parties as a subsection of a virtual big data table obtained by taking the union of all data at different parties.

Corresponding authors: Yang Liu and Qiang Yang. Email: [email protected], [email protected]. Kewei Cheng is with the University of California, Los Angeles, USA. Tao Fan, Yang Liu, and Tianjian Chen are with the Department of Artificial Intelligence, Webank, Shenzhen, China. Yilun Jin and Dimitrios Papadopoulos are with the Hong Kong University of Science and Technology. Qiang Yang is with both Webank and the Hong Kong University of Science and Technology.
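The vertically partitioned setting and the entity alignment it requires can be sketched as follows. This is a minimal toy illustration with hypothetical data and party contents, not the paper's protocol: the plain set intersection only shows the outcome of alignment, which SecureBoost performs under a privacy-preserving intersection protocol.

```python
# Toy illustration of vertical partitioning (hypothetical data).
# Party A holds features x1, x2; Party B holds x3 and the label y;
# they share only some user IDs.
party_a = {"u1": {"x1": 0.3, "x2": 1.2},
           "u2": {"x1": 0.7, "x2": 0.4},
           "u3": {"x1": 0.1, "x2": 0.9}}
party_b = {"u2": {"x3": 5.0, "y": 1},
           "u3": {"x3": 2.5, "y": 0},
           "u4": {"x3": 7.1, "y": 1}}

# Entity alignment: keep only the common users. A plain intersection is used
# here purely for illustration; the real system hides the non-shared IDs.
common = sorted(party_a.keys() & party_b.keys())

# The "virtual big data table": the union of features over aligned users.
# During federated training it is never materialized at any single party.
virtual_table = {u: {**party_a[u], **party_b[u]} for u in common}
print(common)               # ['u2', 'u3']
print(virtual_table["u2"])  # {'x1': 0.7, 'x2': 0.4, 'x3': 5.0, 'y': 1}
```

Note that users u1 and u4 never appear in the joint view, mirroring requirement 3) below that training runs only over the shared users.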
[Figure] Fig. 1: Illustration of the proposed SecureBoost framework (sub-models at the active party and two passive parties, connected via privacy-preserving entity alignment, intermediate computation exchange, and confidential information exchange).

[Figure] Fig. 2: Vertically partitioned data set (Party 1 holds features X1–X2 and labels Y, Party 2 holds features X3–X5; their partially overlapping users are virtually joined into one table with columns X1–X5 and Y).

Then the data at each party have the following properties:

1) The big data table is vertically split, such that the data are split in the feature dimension among parties;
2) Only one data provider has the label information;
3) Parties share a common set of users.

Our goal is then to allow parties to build a prediction model jointly while protecting all parties from leaking data information to other parties. In contrast with most existing work on privacy-preserving data mining and machine learning, the complexity in our setting is significantly increased. Unlike sample-partitioned/horizontal FL, the vertical FL setting requires a more complex mechanism to decompose the loss function at each party [5], [15], [16]. In addition, since only one data provider owns the label information, we need to propose a secure protocol to guide the learning process instead of sharing label information explicitly among all parties. Finally, data confidentiality and privacy concerns prevent parties from exposing their own users. Hence, entity alignment should also be conducted in a sufficiently secure manner.

Tree boosting is a highly effective and widely used machine learning method, which excels in many machine learning tasks due to its high efficiency as well as strong interpretability. For example, XGBoost [17] has been widely used in various applications including credit risk analysis and user behavior studies. In this paper, we propose a novel end-to-end privacy-preserving tree-boosting algorithm and framework known as SecureBoost to enable machine learning in a federated setting. SecureBoost has been implemented in an open-sourced FL project, FATE (https://github.com/FederatedAI/FATE), to enable industrial applications. Our federated learning framework operates in two steps. First, we find the common users among the parties under a privacy-preserving constraint. Then, we collaboratively learn a shared classification or regression model without leaking any user information to each other. We summarize our main contributions as follows:

• We formally define a novel problem of privacy-preserving machine learning over vertically partitioned data in the setting of federated learning.
• We present an approach to train a high-quality tree-boosting model collaboratively while keeping the training data local over multiple parties. Our protocol does not need the participation of a trusted third party.
• Finally and importantly, we prove that our approach is lossless, in the sense that it is as accurate as any centralized non-privacy-preserving method that brings all data to a central location.
• In addition, along with a proof of security, we discuss what would be required to make the protocols completely secure.

2 PRELIMINARIES AND RELATED WORK

To protect the privacy of the data used for learning a model, the authors in [18] proposed to take advantage of differential privacy (DP) for learning a deep learning model. Recently, Google introduced a federated learning framework to prevent the data from being transmitted by bringing the model training to each mobile terminal [9], [10], [19]. Its basic idea is that each local mobile terminal trains a local model on its local data with the same model architecture; the global model can then simply be updated by averaging all the local models. Following the same idea, several attempts have been made to adapt different machine learning models to the federated setting, including decision trees [20], [21], linear/logistic regression [22], [23], [24] and neural networks [25], [26].

All the above methods are designed for horizontally partitioned data. Unlike sample-partitioned/horizontal FL, the vertical FL setting requires a more complex mechanism to decompose the loss function at each party. The concept of vertical FL is first proposed in [5], [11], and protocols are proposed for linear models [5], [13] and neural networks [27]. Some previous works have been proposed for privacy-preserving decision trees over vertically partitioned data [16], [28]. However, their proposed
methods have to reveal the class distribution over given attributes, which causes potential security risks. In addition, they can only handle discrete data, which is less practical for real-life scenarios. In contrast, our method guarantees better protection of the data and can easily be applied to continuous data. Another work proposed in [29] jointly performs logistic regression over encrypted vertically partitioned data by approximating the non-linear logistic loss with a Taylor expansion, which inevitably compromises the performance of the model. In contrast to these works, we propose a novel approach that is lossless in nature.

3 PROBLEM STATEMENT

Let $\{X^k \in \mathbb{R}^{n_k \times d_k}\}_{k=1}^{m}$ be the data matrices distributed on $m$ private parties, with each row $X^k_{i*} \in \mathbb{R}^{1 \times d_k}$ being a data instance. We use $F^k = \{f_1, \ldots, f_{d_k}\}$ to denote the feature set of the corresponding data matrix $X^k$. Any two parties $p$ and $q$ have disjoint sets of features, i.e., $F^p \cap F^q = \emptyset$, $\forall p \neq q \in \{1, \ldots, m\}$. Different parties may hold different sets of users as well, allowing some degree of overlap. Only one of the parties holds the class labels $y$.

Definition 1. Active Party: We define the active party as the data provider who holds both a data matrix and the class labels. Since the class label information is indispensable for supervised learning, the active party naturally takes the responsibility as a dominating server in federated learning.

Definition 2. Passive Party: We define a data provider which has only a data matrix as a passive party. Passive parties play the role of clients in the federated learning setting.

The problem of privacy-preserving machine learning over vertically split data in federated learning can be stated as:

Given: a vertically partitioned data matrix $\{X^k\}_{k=1}^{m}$ distributed on $m$ private parties, and the class labels $y$ held by the active party.

Learn: a machine learning model $M$ without giving information about the data matrix of any party to the others in the process. The model $M$ is a function that has a projection $M_i$ at each party $i$, such that $M_i$ takes as input only that party's own features $X^i$.

Lossless Constraint: We require that the model $M$ is lossless, which means that the loss of $M$ under federated learning over the training data is the same as the loss of $M'$ when $M'$ is built on the union of all data.

4 FEDERATED LEARNING WITH SECUREBOOST

As one of the most popular machine learning algorithms, gradient tree boosting excels in many machine learning tasks, such as fraud detection, feature selection and product recommendation. In this section, we propose a novel gradient tree boosting algorithm called SecureBoost in the federated learning setting. It consists of two major steps. First, it aligns the data under the privacy constraint. Second, it collaboratively learns a shared gradient tree boosting model while keeping all the training data secret over multiple private parties. We explain each step below.

Our first goal is to find a common set of data samples at all participating parties so as to build a joint model $M$. When the data is vertically partitioned among parties, different parties hold different but partially overlapping users, which can be identified by their IDs. The problem is how to find the common data samples across the parties without revealing the non-shared parts. To achieve this goal, we align the data samples under a privacy-preserving protocol for inter-database intersections [30].

After aligning the data across different parties under the privacy constraint, we now consider the problem of jointly building tree ensemble models over multiple parties without violating privacy. Before discussing the details of the algorithm, we first introduce the general framework of federated learning. In federated learning, a typical iteration consists of four steps. First, each client downloads the current global model from the server. Second, each client computes an updated model based on its local data and the current global model, which resides within the active party. Third, each client sends the model update back to the server under encryption. Finally, the server aggregates these model updates and constructs the updated global model.

Following the general framework of federated learning, we see that to design a privacy-preserving tree boosting framework in the setting of federated learning, we essentially have to answer the following three questions: (1) How can each client (i.e., a passive party) compute an updated model based on its local data without reference to the class labels? (2) How can the server (i.e., the active party) aggregate all the updated models and obtain a new global model? (3) How can the updated global model be shared among all parties without leaking any information at inference time? To answer these three questions, we start by reviewing a tree ensemble model, XGBoost [31], in a non-federated setting.

Given a data set $X \in \mathbb{R}^{n \times d}$ with $n$ samples and $d$ features, XGBoost predicts the output by using $K$ regression trees:

$$\hat{y}_i = \sum_{k=1}^{K} f_k(x_i) \quad (1)$$

To learn the set of regression trees used in Eq.(1), it greedily adds a tree $f_t$ at the $t$-th iteration to minimize the following loss:

$$L^{(t)} \simeq \sum_{i=1}^{n} \left[ l(y_i, \hat{y}_i^{(t-1)}) + g_i f_t(x_i) + \frac{1}{2} h_i f_t^2(x_i) \right] + \Omega(f_t) \quad (2)$$

where $\Omega(f_t) = \gamma T + \frac{1}{2} \lambda \|w\|^2$, $g_i = \partial_{\hat{y}^{(t-1)}} l(y_i, \hat{y}^{(t-1)})$ and $h_i = \partial^2_{\hat{y}^{(t-1)}} l(y_i, \hat{y}^{(t-1)})$.

When constructing the regression tree at the $t$-th iteration, it starts from a tree of depth 0 and adds a split to each leaf node until reaching the maximum depth. In particular, it maximizes the following score to determine the best split, where $I_L$ and $I_R$ are the instance spaces of the left and right tree nodes after the split:

$$L_{split} = \frac{1}{2} \left[ \frac{(\sum_{i \in I_L} g_i)^2}{\sum_{i \in I_L} h_i + \lambda} + \frac{(\sum_{i \in I_R} g_i)^2}{\sum_{i \in I_R} h_i + \lambda} - \frac{(\sum_{i \in I} g_i)^2}{\sum_{i \in I} h_i + \lambda} \right] - \gamma \quad (3)$$

After it obtains an optimal tree structure, the optimal weight $w_j^*$ of leaf $j$ can be computed by the following equation, where $I_j$ is the instance space of leaf $j$:

$$w_j^* = -\frac{\sum_{i \in I_j} g_i}{\sum_{i \in I_j} h_i + \lambda} \quad (4)$$
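The scoring arithmetic of Eqs.(3) and (4) can be exercised in plain, non-federated form. The sketch below assumes the square loss (so $g_i = \hat{y}_i - y_i$, $h_i = 1$) and uses toy gradient values; the regularization constants `lam` and `gamma` are illustrative, not values from the paper.

```python
# Non-federated sketch of the split score (Eq. 3) and leaf weight (Eq. 4).
# Square loss assumed: g_i = yhat_i - y_i, h_i = 1. lam/gamma illustrative.
lam, gamma = 1.0, 0.0

def split_score(gl, hl, gr, hr):
    # Eq.(3): 0.5 * [GL^2/(HL+lam) + GR^2/(HR+lam) - G^2/(H+lam)] - gamma
    g, h = gl + gr, hl + hr
    return 0.5 * (gl ** 2 / (hl + lam)
                  + gr ** 2 / (hr + lam)
                  - g ** 2 / (h + lam)) - gamma

def leaf_weight(gs, hs):
    # Eq.(4): w* = -sum(g_i) / (sum(h_i) + lam)
    return -sum(gs) / (sum(hs) + lam)

y = [1.0, 1.0, 0.0, 0.0]
yhat = [0.5, 0.5, 0.5, 0.5]                 # current predictions
g = [p - t for p, t in zip(yhat, y)]        # [-0.5, -0.5, 0.5, 0.5]
h = [1.0, 1.0, 1.0, 1.0]                    # square loss: h_i = 1

good = split_score(g[0] + g[1], 2.0, g[2] + g[3], 2.0)  # separates classes
bad = split_score(g[0] + g[2], 2.0, g[1] + g[3], 2.0)   # mixes classes
print(good > bad)                  # True
print(leaf_weight(g[:2], h[:2]))   # 0.3333333333333333
```

Because both quantities depend only on sums of $g_i$ and $h_i$, the same arithmetic can be evaluated on encrypted aggregates, which is the key fact the federated algorithm exploits.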
From the above review, we make the following observations:

(1) The evaluation of split candidates and the calculation of the optimal leaf weight depend only on the $g_i$ and $h_i$, which makes the algorithm easily adapted to the setting of federated learning.

(2) The class label can be inferred from $g_i$ and $h_i$. For instance, when we take the square loss as the loss function, we have $g_i = \hat{y}_i^{(t-1)} - y_i$.

With the above observations, we now introduce our federated gradient tree boosting algorithm. Following observation (1), we can see that a passive party can determine its locally optimal split with only its local data and the $g_i$, $h_i$, which motivates us to decompose the learning task at each party in this manner. However, according to observation (2), $g_i$ and $h_i$ should be regarded as sensitive data, since they are able to disclose class label information to passive parties. Therefore, in order to keep $g_i$ and $h_i$ confidential, the active party is required to encrypt $g_i$ and $h_i$ before sending them to passive parties. The remaining challenge is how to determine the locally optimal split with encrypted $g_i$ and $h_i$ at each passive party.

According to Eq.(3), the optimal split can be found if $g_l = \sum_{i \in I_L} g_i$ and $h_l = \sum_{i \in I_L} h_i$ are calculated for every possible split. So next, we show how to obtain $g_l$ and $h_l$ from encrypted $g_i$ and $h_i$ using an additively homomorphic encryption scheme [32]. The Paillier encryption scheme is taken as our encryption scheme. Denoting the encryption of a number $u$ under the Paillier cryptosystem as $\langle u \rangle$, the main property of the Paillier cryptosystem ensures that for arbitrary numbers $u$ and $v$, we have $\langle u \rangle \cdot \langle v \rangle = \langle u + v \rangle$. Therefore, $\langle h_l \rangle = \prod_{i \in I_L} \langle h_i \rangle$ and $\langle g_l \rangle = \prod_{i \in I_L} \langle g_i \rangle$. Consequently, the best split can be found in the following way. First, each passive party computes $\langle g_l \rangle$ and $\langle h_l \rangle$ for all possible splits locally, which are then sent back to the active party. The active party deciphers all $\langle g_l \rangle$ and $\langle h_l \rangle$ and calculates the globally optimal split according to Eq.(3). We adopt the approximation scheme used by [31], so as to alleviate the need of enumerating all possible split candidates and communicating their $\langle g_i \rangle$ and $\langle h_i \rangle$. The details of our secure gradient aggregation algorithm are shown in Algorithm 1.

Algorithm 1 Aggregate Encrypted Gradient Statistics
Input: I, instance space of current node
Input: d, feature dimension
Input: {<g_i>, <h_i>}_{i in I}
Output: G in R^{d x l}, H in R^{d x l}
1: for k = 0 -> d do
2:   Propose S_k = {s_{k1}, s_{k2}, ..., s_{kl}} by percentiles on feature k
3: end for
4: for k = 0 -> d do
5:   G_{kv} = Sum_{i in {i | s_{k,v} >= x_{i,k} > s_{k,v-1}}} <g_i>
6:   H_{kv} = Sum_{i in {i | s_{k,v} >= x_{i,k} > s_{k,v-1}}} <h_i>
7: end for

Following observation (1), the split finding algorithm remains largely the same as in XGBoost, except for minor adjustments to fit the federated learning framework. Due to the separation in features, SecureBoost requires different parties to store certain information for each split, so as to perform prediction for new samples. Passive parties should keep a lookup table as shown in Figure 3. It contains split thresholds [feature id $k$, threshold value $v$] and a unique record id $r$ used to index the table, in order to look up split conditions during inference. In the meantime, because the active party does not have the features located at passive parties, for the active party to know which passive party to deliver an instance to, as well as to instruct the passive party which split condition to use at inference time, it associates every tree node with a pair (party id $i$, record id $r$). Specific details of the split finding algorithm for SecureBoost are summarized in Algorithm 2.

Algorithm 2 Split Finding
Input: I, instance space of current node
Input: {G^i, H^i}_{i=1}^{m}, aggregated encrypted gradient statistics from m parties
Output: partition of the current instance space according to the selected attribute's value
1: /* Conducted on the Active Party */
2: g <- Sum_{i in I} g_i, h <- Sum_{i in I} h_i
3: for i = 0 to m do
4:   for k = 0 to d_i do
5:     g_l <- 0, h_l <- 0
6:     // enumerate all threshold values
7:     for v = 0 to l_k do
8:       get decrypted values D(G^i_{kv}) and D(H^i_{kv})
9:       g_l <- g_l + D(G^i_{kv}), h_l <- h_l + D(H^i_{kv})
10:      g_r <- g - g_l, h_r <- h - h_l
11:      score <- max(score, g_l^2/(h_l+lambda) + g_r^2/(h_r+lambda) - g^2/(h+lambda))
12:    end for
13:  end for
14: end for
15: Return k_opt and v_opt to the passive party i_opt when the max score is obtained.
16: /* Conducted on Passive Party i_opt */
17: Determine the selected attribute's value according to k_opt and v_opt and partition the current instance space.
18: Record the selected attribute's value and return [record id, I_L] back to the active party.
19: /* Conducted on the Active Party */
20: Split the current node according to I_L and associate the current node with [party id, record id].

The remaining problem is the computation of the optimal leaf weights. According to Eq.(4), the optimal weight of leaf $j$ depends only on $\sum_{i \in I_j} g_i$ and $\sum_{i \in I_j} h_i$. Consequently, it follows similar procedures as split finding. When a leaf node is reached, the passive party sends $\langle \sum_{i \in I_j} g_i \rangle$ and $\langle \sum_{i \in I_j} h_i \rangle$ to the active party, which are then deciphered to compute the corresponding weights through Eq.(4).

5 FEDERATED INFERENCE

In this section, we describe how the learned model (distributed among parties) can be used to classify a new instance, even though the features of the instance to be classified are private and distributed among parties. Since each party knows its own features but nothing of the others', we need a secure distributed inference protocol that controls the passes from one party to another, based on the decisions made. To illustrate the inference process, we consider a system with three parties as depicted in Figure 3. Specifically, party 2 is the active party, which collects the user's age, gender and marriage status, as well as the label, whether the user made the payment on time. Party 1 and party 3 are passive parties, holding the features monthly bill payment and education, and amount of given credit, respectively.
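The additive property $\langle u \rangle \cdot \langle v \rangle = \langle u + v \rangle$ that the secure aggregation relies on can be checked with a from-scratch toy Paillier instance. This sketch uses tiny primes and deterministic randomness, so it is insecure and for illustration only; a real deployment uses a full-size implementation, and the float gradients would first be fixed-point encoded.

```python
import math
import random

# Toy Paillier cryptosystem (tiny primes; NOT secure). It demonstrates the
# additive property used by Algorithm 1: <u> * <v> mod n^2 == <u + v>.
p, q = 17, 19
n = p * q
n2 = n * n
lam = math.lcm(p - 1, q - 1)      # Carmichael function of n
mu = pow(lam, -1, n)              # decryption helper, valid since g = n + 1
rng = random.Random(0)            # deterministic randomness for the demo

def enc(m):
    r = rng.randrange(1, n)
    while math.gcd(r, n) != 1:
        r = rng.randrange(1, n)
    return (pow(n + 1, m, n2) * pow(r, n, n2)) % n2   # g^m * r^n mod n^2

def dec(c):
    return ((pow(c, lam, n2) - 1) // n) * mu % n

# One bucket of g_i values at a passive party (small integers for clarity).
g_bucket = [3, 5, 7]
agg = 1
for gi in g_bucket:
    agg = (agg * enc(gi)) % n2    # multiplying ciphertexts adds plaintexts
print(dec(agg))                   # 15, i.e. 3 + 5 + 7
```

Multiplying the encrypted bucket entries this way is how the $G_{kv}$, $H_{kv}$ sums of Algorithm 1 are formed without any passive party ever seeing an individual $g_i$ or $h_i$.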
[Figure] Fig. 3: An illustration of Federated Inference. (Training examples X1–X5 and the prediction example X6 are split across the three parties; lookup tables: Party 1, record 1: Bill Payment, threshold 5000; Party 2, record 1: Age, threshold 40; Party 3, record 1: Amount of given credit, threshold 800. The tree stores Root [Party ID: 1, Record ID: 1], Node 1 [Party ID: 3, Record ID: 1], Node 2 [Party ID: 2, Record ID: 1] and leaves w1–w4 holding {X5}, {X1}, {X2, X3}, {X4}. Inference steps: (1) Party 1 queries record 1 from its lookup table; (2) Party 1: 4367 < 5000; (3) Party 3 queries record 1 from its lookup table; (4) Party 3: 5500 > 800.)

Suppose we wish to predict whether a user X6 would make payment on time. Then all sites would have to collaborate to make the prediction. The whole process is coordinated by the active party. Starting from the root, by referring to the record [party id: 1, record id: 1], the active party knows that party 1 holds the root node, thereby requiring party 1 to retrieve the corresponding attribute, Bill Payment, from its lookup table based on record id 1. Since the classifying attribute is the bill payment, and party 1 knows that the bill payment for user X6 is 4367, which is less than the threshold 5000, it makes the decision to move down to its left child, node 1. Then, the active party refers to the record [party id: 3, record id: 1] associated with node 1 and requires party 3 to conduct the same operations. This process continues until a leaf is reached.

6 THEORETICAL ANALYSIS FOR LOSSLESS PROPERTY

Theorem 1. SecureBoost is lossless, i.e., the SecureBoost model $M$ and the XGBoost model $M'$ behave identically, provided that the models $M$ and $M'$ are identically initialized and hyper-parameterized.

Proof. According to Eq.(3), $g_l$ and $h_l$ are the only information needed for the calculation of the best split, and they can be obtained from the encrypted $g_i$ and $h_i$ using the Paillier cryptosystem in SecureBoost. In the Paillier cryptosystem, the encryption of a message $m$ is $\langle m \rangle = g^m r^n \bmod n^2$, for some random $r \in \{0, \ldots, n-1\}$. Given this definition, we have $\langle m_1 \rangle \cdot \langle m_2 \rangle = \langle m_1 + m_2 \rangle$ for arbitrary messages $m_1$ and $m_2$ under the Paillier cryptosystem, which can be proved as follows:

$$\langle m_1 \rangle \cdot \langle m_2 \rangle = (g^{m_1} r_1^n)(g^{m_2} r_2^n) \bmod n^2 = g^{m_1+m_2} (r_1 r_2)^n \bmod n^2 = \langle m_1 + m_2 \rangle \quad (5)$$

Therefore, we have $\langle h_l \rangle = \prod_{i \in I_L} \langle h_i \rangle$ and $\langle g_l \rangle = \prod_{i \in I_L} \langle g_i \rangle$. Provided the same initialization, an instance $i$ will have the same values of $g_i$ and $h_i$ under either setting. Thus, models $M$ and $M'$ always achieve the same best split throughout the construction of the tree and result in identical $M$ and $M'$, which ensures the lossless property.

7 SECURITY DISCUSSION

SecureBoost avoids revealing the data records held by each of the parties to the others during training and inference, thus protecting the privacy of individual parties' data. However, we stress that there is some leakage that can be inferred during the protocol execution, and it is quite different for passive vs. active parties.

The active party is in an advantageous position with SecureBoost, as it learns the instance space for each split and which party is responsible for the decision at each node. Also, it learns all the possible values of $g_l$, $g_r$ and $h_l$, $h_r$ during learning. The former seems unavoidable in this setting, unless one is willing to severely increase the overhead during the inference phase. However, the latter can be avoided using secure multi-party computation techniques for comparison of encrypted values (e.g., [33], [34]). In this way, the active party learns only the optimal $g_l$, $g_r$, $h_l$, $h_r$ per party; on the other hand, this significantly affects the efficiency during learning.

Note that instances associated with the same leaf are strongly indicated to belong to the same class. We denote the proportion of samples belonging to the majority class as the leaf purity. The information leakage with respect to passive parties is directly related to the leaf purity of the first tree of SecureBoost. Moreover, the first tree's leaf purity can be inferred from the weights of its leaves.

Theorem 2. Given a learned SecureBoost model, its first tree's leaf purity can be inferred from the weights of the leaves.

Proof. The loss function for the binary classification problem is given as follows:

$$L = y_i \log(1 + e^{-\hat{y}_i}) + (1 - y_i) \log(1 + e^{\hat{y}_i}) \quad (6)$$

Based on the loss function, we have $g_i = \hat{y}_i^{(0)} - y_i$ and $h_i = \hat{y}_i^{(0)} (1 - \hat{y}_i^{(0)})$ during the construction of the decision tree at the first iteration. Specifically, $\hat{y}_i^{(0)}$ is given as the initialized value. Suppose
we initialize all $\hat{y}_i^{(0)}$ to $a$, where $0 < a < 1$. According to Eq.(4), for the instances associated with a specific leaf $j$, $\hat{y}_i^{(1)} = S(w_j^*) = S(-\frac{\sum_{i \in I_j} g_i}{\sum_{i \in I_j} h_i + \lambda})$, where $S(x)$ is the sigmoid function. Suppose the number of instances associated with leaf $j$ is $n_j$ and the percentage of positive samples is $\theta_j$. When $n_j$ is relatively big, we can ignore $\lambda$ in $-\frac{\sum_{i \in I_j} g_i}{\sum_{i \in I_j} h_i + \lambda}$ and rewrite the weight of leaf $j$ as

$$w_j^* = -\frac{\sum_{i \in I_j} g_i}{\sum_{i \in I_j} h_i} = -\frac{\theta_j n_j (a-1) + (1-\theta_j) n_j a}{n_j a (1-a)} = \frac{a - \theta_j}{a(a-1)}.$$

By reformulating the equation, we have $\theta_j = a - a(a-1) w_j^*$. $\theta_j$ depends on $a$ and $w_j^*$, and $a$ is given at initialization. Thus, $w_j^*$ is the key to determining $\theta_j$. Note that $\theta_j$ can be used to represent the leaf purity of leaf $j$ (i.e., the purity of leaf $j$ can be formally written as $\max(\theta_j, 1-\theta_j)$), so the leaf purity of the first tree can be inferred from the weights of the leaves $(w_j^*)$ given a learned SecureBoost model.

According to Theorem 2, given a SecureBoost model, the weights of the leaves of its first tree can reveal sensitive information. In order to reduce the information leakage with respect to passive parties, we opt to store decision tree leaves at the active party and propose a modified version of our framework, called Reduced-Leakage SecureBoost (RL-SecureBoost). With RL-SecureBoost, the active party learns the first tree independently, based only on its own features, which fully protects the instance space of its leaves. Hence, all the information that passive parties learn is based on residuals. Although the residuals may also reveal information, we prove that as the purity of the first tree increases, this residual information decreases.

Theorem 3. As the purity of the first tree increases, the residual information decreases.

Proof. As mentioned before, for the binary classification problem we have $g_i = \hat{y}_i^{(t-1)} - y_i$ and $h_i = \hat{y}_i^{(t-1)} (1 - \hat{y}_i^{(t-1)})$, where $g_i \in [-1, 1]$. Hence,

$$h_i = g_i (1 - g_i) \ \text{if} \ y_i = 0; \qquad h_i = -g_i (g_i + 1) \ \text{if} \ y_i = 1 \quad (7)$$

When we construct the decision tree at the $t$-th iteration with $k$ leaves to fit the residuals of the previous tree, we in essence split the data into $k$ clusters so as to minimize the following loss:

$$L = -\sum_{j=1}^{k} \frac{(\sum_{i \in I_j} g_i)^2}{\sum_{i \in I_j} h_i} = -\sum_{j=1}^{k} \frac{(\sum_{i \in I_j} g_i)^2}{\sum_{i \in I_j^N} g_i (1 - g_i) + \sum_{i \in I_j^P} -g_i (1 + g_i)} \quad (8)$$

We know $\hat{y}_i^{(t-1)} \in [0, 1]$ and $g_i = \hat{y}_i^{(t-1)} - y_i$. Thus, $g_i \in [-1, 0]$ for positive samples and $g_i \in [0, 1]$ for negative samples. Taking the range of $g_i$ into consideration, we can rewrite the above equation as

$$\sum_{j=1}^{k} \frac{(\sum_{i \in I_j^N} |g_i| - \sum_{i \in I_j^P} |g_i|)^2}{\sum_{i \in I_j^N} |g_i|(|g_i| - 1) + \sum_{i \in I_j^P} |g_i|(|g_i| - 1)} \quad (9)$$

where $I_j^N$ and $I_j^P$ denote the sets of negative and positive samples associated with leaf $j$, respectively. We denote the expectation of $|g_i|$ for positive samples as $\mu_p$ and the expectation of $|g_i|$ for negative samples as $\mu_n$. When we have a large number of samples but a small number of leaf nodes $k$, we can use the following expression to approximate Eq.(9):

$$\sum_{j=1}^{k} \frac{(n_j^n \mu_n - n_j^p \mu_p)^2}{n_j^n \mu_n (\mu_n - 1) + n_j^p \mu_p (\mu_p - 1)} \quad (10)$$

where $n_j^n$ and $n_j^p$ represent the numbers of negative and positive samples associated with leaf $j$. Since $\mu_n \in [0, 1]$ and $\mu_p \in [0, 1]$, we know the numerator has to be positive and the denominator has to be negative; thus, the whole expression has to be negative. Minimizing Eq.(10) amounts to maximizing the numerator while minimizing the denominator. Note that the denominator involves terms of the form $\sum x^2$ and the numerator terms of the form $(\sum x)^2$, where $x \in [0, 1]$, so the expression is dominated by the numerator. Thereby, minimizing Eq.(10) can be regarded as maximizing the numerator $(n_j^n \mu_n - n_j^p \mu_p)^2$. Ideally, we require $n_j^n = n_j^p$ in order to prevent label information from divulging; the bigger $|\mu_n - \mu_p|$ is, the more possible it is to achieve this goal. We know $|g_i| = |\hat{y}_i^{(t-1)} - y_i| = \hat{y}_i^{(t-1)}$ for negative samples and $|g_i| = |\hat{y}_i^{(t-1)} - y_i| = 1 - \hat{y}_i^{(t-1)}$ for positive samples. Thereby, $\mu_n = \frac{1}{N_n} \sum_{j=1}^{k} (1 - \theta_j) n_j \hat{y}_i^{(t-1)}$ and $\mu_p = \frac{1}{N_p} \sum_{j=1}^{k} \theta_j n_j (1 - \hat{y}_i^{(t-1)})$, so $|\mu_n - \mu_p|$ can be calculated as

$$|\mu_n - \mu_p| = \left| \frac{1}{N_n} \sum_{j=1}^{k} (1 - \theta_j) n_j \hat{y}_i^{(t-1)} - \frac{1}{N_p} \sum_{j=1}^{k} \theta_j n_j (1 - \hat{y}_i^{(t-1)}) \right| \quad (11)$$

where $N_n$ and $N_p$ correspond to the total numbers of negative and positive samples, $\theta_j$ is the percentage of positive samples associated with leaf $j$ of the decision tree at the $(t-1)$-th iteration (the previous decision tree), $n_j$ denotes the number of instances associated with leaf $j$ of the previous decision tree, and $\hat{y}_i^{(t-1)} = S(w_j)$, where $w_j$ represents the weight of the $j$-th leaf of the previous decision tree. When the positive and negative samples are balanced, $N_n = N_p$, and we have

$$|\mu_n - \mu_p| = \frac{1}{N_n} \left| \sum_{j=1}^{k} \left( (1 - \theta_j) n_j S(w_j) - \theta_j n_j (1 - S(w_j)) \right) \right| = \frac{1}{N_n} \sum_{j=1}^{k} n_j |S(w_j) - \theta_j| = \frac{1}{N_n} \sum_{j=1}^{k} n_j \left| S\!\left(\frac{a - \theta_j}{a(a-1)}\right) - \theta_j \right| \quad (12)$$

As observed from Eq.(12), the expression achieves its minimum value when $S(\frac{a - \theta_j}{a(a-1)}) = a$. By solving this equation, we obtain the optimal solution $\theta_j^* = a(1 + (1-a)\ln(\frac{a}{1-a}))$. In order to achieve a bigger $|\mu_n - \mu_p|$, we want the deviation of $\theta_j$ from $\theta_j^*$ to be as big as possible. With a proper initialization of $a$, for instance $a = 0.5$, we get $\theta_j^* = 0.5$. In this case, maximizing $|\theta_j - \theta_j^*|$ is the same as maximizing $\max(\theta_j, 1 - \theta_j)$, which is exactly the leaf purity. Therefore, we have proved that high leaf purity guarantees a big difference between $\mu_n$ and $\mu_p$, which finally results in less information leakage. This completes our proof.
7

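The closed form for $\theta_j^*$ and the leaf-weight expression above can be sanity-checked numerically. The sketch below is a plain-Python check, not part of the SecureBoost protocol; `a` denotes the uniform initial prediction assumed in the derivation:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def leaf_weight(theta, a):
    # First-tree leaf weight under a uniform initial prediction a:
    # g_i = a - y_i and h_i = a(1 - a), so
    # w_j = -sum(g_i)/sum(h_i) = (a - theta)/(a*(a - 1)), as in Eq. (12).
    return (a - theta) / (a * (a - 1.0))

def theta_star(a):
    # Closed-form solution of S((a - theta)/(a(a - 1))) = a for theta.
    return a * (1.0 + (1.0 - a) * math.log(a / (1.0 - a)))

# S(w_j) equals a exactly at theta = theta*(a), for any a in (0, 1).
for a in (0.3, 0.5, 0.7):
    assert abs(sigmoid(leaf_weight(theta_star(a), a)) - a) < 1e-9

# With a = 0.5 we get theta* = 0.5, and the per-leaf term
# |S(w_j) - theta_j| grows monotonically with the leaf purity.
assert abs(theta_star(0.5) - 0.5) < 1e-12
gaps = [abs(sigmoid(leaf_weight(t, 0.5)) - t) for t in (0.5, 0.7, 0.9, 0.99)]
assert all(g1 < g2 for g1, g2 in zip(gaps, gaps[1:]))
```

The last assertion mirrors the final step of the proof: the further $\theta_j$ is from $\theta_j^* = 0.5$ (i.e., the purer the leaf), the larger the per-leaf contribution to $|\mu_n - \mu_p|$.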
Given Theorem 3, we prove that RL-SecureBoost is secure as long as its first tree learns enough information to mask the actual label with residuals. Moreover, as we experimentally demonstrate in Section 8, RL-SecureBoost performs identically to SecureBoost in terms of prediction accuracy.

8 EXPERIMENTS

We conduct experiments on two public datasets.

Credit 1²: It involves the problem of classifying whether a user will suffer from serious financial problems. It contains a total of 150000 instances and 10 attributes.

Credit 2³: It is also a credit scoring dataset, correlated with the task of predicting whether a user will make payments on time. It consists of 30000 instances and 25 attributes in all.

In our experiments, we use 2/3 of each dataset for training and the remainder for testing. We split the data vertically into two halves and distribute them to two parties. To fairly compare different methods, we set the maximum depth of each tree to 3, the fraction of samples used to fit individual regression trees to 0.8, and the learning rate to 0.3 for all methods. The Paillier encryption scheme is taken as our encryption scheme, with a key size of 512 bits. All experiments are conducted on a machine with 8GB RAM and an Intel Core i5-7200U CPU.

8.1 Scalability

Note that the efficiency of SecureBoost may be reflected by the rate of convergence and the runtime, which may be influenced by (1) the maximum depth of individual regression trees and (2) the size of the datasets. In this subsection, we conduct a convergence analysis as well as study the impact of these variables on the runtime of learning. All experiments are conducted on the dataset Credit 2.

First, we are interested in the convergence rate of our proposed system. We compare the convergence behavior of SecureBoost with non-federated tree-boosting counterparts, including GBDT⁴ and XGBoost⁵. As can be observed from Figure 4, SecureBoost shows a learning curve similar to the other non-federated methods on the training dataset and even performs slightly better than the others on the test dataset. In addition, the convergence behavior of the training and test loss of SecureBoost closely resembles that of GBDT and XGBoost.

[Fig. 4: Loss convergence. (a) Learning Curve; (b) Test Error. Loss versus the number of boosting stages for SecureBoost, GBDT, and XGBoost.]

Next, to investigate how the maximum depth of individual trees affects the runtime of learning, we vary the maximum depth of individual trees among {3, 4, 5, 6, 7, 8} and report the runtime of one boosting stage. As depicted in Figure 5 (a), the runtime increases almost linearly with the maximum depth of individual trees, which indicates that we can train deep trees with relatively little additional time. This is very appealing in practice, especially in big-data scenarios.

Finally, we study the impact of data size on the runtime of our proposed system. We augment the feature sets with feature products, fix the maximum depth of individual regression trees to 3, and vary the feature number in {50, 500, 1000, 5000} and the sample number in {5000, 10000, 30000}. We compare the runtime of one boosting stage to investigate how each variable affects the efficiency of the algorithm. We make similar observations on both Figure 5 (b) and Figure 5 (c), which imply that sample and feature numbers contribute equally to the running time. In addition, we can see that our proposed framework scales well even with relatively big data.

8.2 Performance of RL-SecureBoost

To investigate the performance of RL-SecureBoost in terms of both security and prediction accuracy, we aim to answer the following questions: (1) Does the first tree, built upon only the features held by the active party, learn enough information to reduce information leakage? (2) Does RL-SecureBoost suffer from a loss of accuracy compared with SecureBoost?

First, we study the security performance of RL-SecureBoost. Following the analysis in Section 7, we evaluate information leakage in terms of leaf purity; as the leaf purity of the first tree increases, the leaked information is reduced. Thereby, to verify the security of RL-SecureBoost, we have to illustrate that the first tree of RL-SecureBoost performs well enough to reduce the information leaked from the second tree. As shown in Table 1, we compare the mean leaf purity of the first tree with that of the second tree. In particular, the mean leaf purity is the weighted average $\sum_{i=1}^{k} \frac{n_i}{n} p_i$, where $k$ and $n$ represent the number of leaves and the total number of instances, and $p_i$ and $n_i$ denote the leaf purity and the number of instances associated with leaf $i$.

TABLE 1: First Tree vs. Second Tree in terms of Leaf Purity

    Mean Purity    Credit 1    Credit 2
    1st Tree       0.8058      0.7159
    2nd Tree       0.66663     0.638

According to Table 1, the mean leaf purity decreases significantly from the first to the second tree on both datasets, which reflects a great reduction in information leakage. Moreover, the mean leaf purity of the second tree is just over 0.6 on both datasets, which is good enough to ensure a safe protocol.

Next, to investigate the prediction performance of RL-SecureBoost, we compare RL-SecureBoost with SecureBoost with respect to both the first tree's performance and the overall performance. We consider commonly used metrics including accuracy (ACC), area under the ROC curve (AUC), and F1-score. The results are presented in Table 2. As observed, RL-SecureBoost performs equally well compared to SecureBoost in almost all cases. We also conduct a pairwise Wilcoxon signed-rank test between RL-SecureBoost and SecureBoost. The comparison results indicate that RL-SecureBoost is as accurate as SecureBoost at a significance level of 0.05. The lossless property is therefore still guaranteed for RL-SecureBoost.
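For reference, the mean leaf purity used in Table 1 is just the instance-weighted average of per-leaf purities. The sketch below implements that weighted average; the per-leaf (positive, negative) counts are hypothetical illustration data, not values from the experiments:

```python
def mean_leaf_purity(counts):
    """counts: list of (n_pos, n_neg) pairs, one per leaf."""
    # Leaf purity p_i = max(theta_i, 1 - theta_i), where theta_i is the
    # fraction of positives in leaf i; the mean purity is the
    # n_i-weighted average over all k leaves: sum_i (n_i / n) * p_i.
    n = sum(n_pos + n_neg for n_pos, n_neg in counts)
    total = 0.0
    for n_pos, n_neg in counts:
        n_i = n_pos + n_neg
        theta = n_pos / n_i
        total += (n_i / n) * max(theta, 1.0 - theta)
    return total

# A tree with perfectly pure leaves scores 1.0; perfectly mixed
# leaves score 0.5, which is the minimum possible value.
assert mean_leaf_purity([(10, 0), (0, 30)]) == 1.0
assert mean_leaf_purity([(5, 5), (15, 15)]) == 0.5
```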
2. https://www.kaggle.com/c/GiveMeSomeCredit/data
3. https://www.kaggle.com/uciml/default-of-credit-card-clients-dataset
4. http://scikit-learn.org/stable/modules/generated/sklearn.ensemble.GradientBoostingClassifier.html
5. https://github.com/dmlc/xgboost

9 CONCLUSION

In this paper, we proposed a lossless privacy-preserving tree-boosting algorithm, SecureBoost, to train a high-quality tree
[Fig. 5: Scalability Analysis of SecureBoost. (a) Runtime w.r.t. maximum depth of individual trees (depths 3 to 8); (b) Runtime w.r.t. feature size; (c) Runtime w.r.t. sample size. Curves shown for sample sizes {5000, 10000, 30000} and feature sizes {50, 500, 1000, 5000}.]
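The near-linear growth in Figure 5 (a) is consistent with level-wise split finding that scans each instance once per tree level and per candidate feature. The toy cost model below encodes that assumption (ours, not a measured model from the paper) and also reproduces the symmetry between sample and feature counts seen in Figure 5 (b) and (c):

```python
def stage_cost(depth, n_samples, n_features):
    # Toy model: one boosting stage with level-wise split finding
    # touches every instance once per level and per candidate feature,
    # i.e. cost ~ depth * n_samples * n_features (arbitrary units).
    return depth * n_samples * n_features

base = stage_cost(3, 30000, 500)
# Linear in depth (cf. Fig. 5 (a)): doubling the depth doubles the cost.
assert stage_cost(6, 30000, 500) == 2 * base
# Samples and features enter symmetrically (cf. Fig. 5 (b) and (c)).
assert stage_cost(3, 500, 30000) == base
```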
TABLE 2: Classification Performance for RL-SecureBoost vs. SecureBoost

                                      Credit 1                    Credit 2
    Model                     ACC     F1-score  AUC      ACC     F1-score  AUC
    1st Tree, SecureBoost     0.9298  0.012     0.7002   0.7806  0         0.6381
    1st Tree, RL-SecureBoost  0.9186  0         0.6912   0.7793  0         0.6320
    Overall, SecureBoost      0.9345  0.2576    0.8461   0.8180  0.4634    0.7701
    Overall, RL-SecureBoost   0.9331  0.2549    0.8423   0.8179  0.4650    0.7682
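The metrics in Table 2 follow their standard definitions. The sketch below gives dependency-free versions (a 0.5 threshold for ACC and F1, and AUC via the pairwise-ranking formulation); the toy labels and scores are illustrative only:

```python
def accuracy(y_true, y_pred):
    # Fraction of instances whose predicted label matches the true label.
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def f1_score(y_true, y_pred):
    # Harmonic mean of precision and recall: 2*TP / (2*TP + FP + FN).
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0

def auc(y_true, scores):
    # Probability that a random positive outranks a random negative
    # (ties counted as 1/2), which equals the area under the ROC curve.
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

y = [1, 1, 0, 0]
scores = [0.9, 0.4, 0.6, 0.2]
preds = [1 if s >= 0.5 else 0 for s in scores]
assert accuracy(y, preds) == 0.5
assert f1_score(y, preds) == 0.5
assert auc(y, scores) == 0.75
```

Note that an F1-score of 0 (as in the first-tree rows of Table 2) simply means the model produced no true-positive predictions at the 0.5 threshold, even while its ACC and AUC remain informative.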
boosting model with private data split across multiple parties. We theoretically prove that our proposed framework is as accurate as its non-federated gradient tree-boosting counterparts. In addition, we analyze the information leakage during protocol execution and propose provable ways to reduce it.

ACKNOWLEDGMENT

This work was partially supported by the National Key Research and Development Program of China under Grant No. 2018AAA0101100.