Privacy-Preserving Feature Selection With Secure Multiparty Computation

arXiv:2102.03517v1 [cs.CR] 6 Feb 2021

Abstract—Existing work on privacy-preserving machine learning with Secure Multiparty Computation (MPC) is almost exclusively focused on model training and on inference with trained models, thereby overlooking the important data pre-processing stage. In this work, we propose the first MPC based protocol for private feature selection based on the filter method, which is independent of model training, and can be used in combination with any MPC protocol to rank features. We propose an efficient feature scoring protocol based on Gini impurity to this end. To demonstrate the feasibility of our approach for practical data science, we perform experiments with the proposed MPC protocols for feature selection in a commonly used machine-learning-as-a-service configuration where computations are outsourced to multiple servers, with semi-honest and with malicious adversaries. Regarding effectiveness, we show that secure feature selection with the proposed protocols improves the accuracy of classifiers on a variety of real-world data sets, without leaking information about the feature values or even which features were selected. Regarding efficiency, we document runtimes ranging from several seconds to an hour for our protocols to finish, depending on the size of the data set and the security settings.

Xiling Li is with the School of Engineering and Technology, University of Washington, Tacoma, WA, USA. Email: [email protected]
Rafael Dowsley is with the Faculty of Information Technology, Monash University, Clayton, Australia. Email: [email protected]
Martine De Cock is with the School of Engineering and Technology, University of Washington, Tacoma, WA, USA and Ghent University, Ghent, Belgium. Email: [email protected]

I. INTRODUCTION

Machine learning (ML) thrives because of the availability of an abundant amount of data, and of computational resources and devices to collect and process such data. In many effective ML applications, the data that is consumed during ML model training and inference is often of a very personal nature. Protection of user data has become a significant concern in ML model development and deployment, giving rise to laws to safeguard the privacy of users, such as the European General Data Protection Regulation (GDPR) and the California Consumer Privacy Act (CCPA). Cryptographic protocols that allow computations on encrypted data are an increasingly important mechanism to enable data science applications while complying with privacy regulations. In this paper, we contribute to the field of privacy-preserving machine learning (PPML), a burgeoning and interdisciplinary research area at the intersection of cryptography and ML that has gained significant traction in tackling privacy issues.

In particular, we use techniques from Secure Multiparty Computation (MPC), an umbrella term for cryptographic approaches that allow two or more parties to jointly compute a specified output from their private information in a distributed fashion, without actually revealing their private information itself [27] to each other [12].

We consider the scenario where different data owners or enterprises are interested in training an ML model over their combined data. There is a lot of potential in training ML models over the aggregated data from multiple enterprises. First of all, training on more data typically yields higher quality ML models. For instance, one could train a more accurate model to predict the length of hospital stay of COVID-19 patients when combining data from multiple clinics. This is an application where the data is horizontally distributed, meaning that each data owner or enterprise has records/rows of the data. Furthermore, being able to combine different data sets enables new applications that pool together data from multiple enterprises, or even from different entities within the same enterprise. An example of this would be an ML model that relies on lab test results as well as healthcare bill payment information about patients, which are usually managed by different departments within a hospital system. This is an example of an application where the data is vertically distributed, i.e. each data owner has their own columns. While there are clear advantages to training ML models over data that is distributed across multiple data owners, often these data owners do not want to disclose their data to each other, because the data in itself constitutes a competitive advantage, or because the data owners need to comply with data privacy regulations. These roadblocks can even affect different departments within the same enterprise, such as different clinics within a healthcare system.

During the last decade, cryptographic protocols designed with MPC have been developed for training of ML models over aggregated data, without the need for the individual data owners or enterprises to reveal their data to anyone in an unencrypted manner. This existing work includes MPC protocols for training of decision tree models [26], [17], [11], [1], linear regression models [29], [15], [2], and neural network architectures [28], [3], [34], [21], [16]. Existing approaches assume that the data sets are pre-processed and clean, with features that have been pre-selected and constructed. In practical data science projects, model building constitutes only a small part of the workflow: real-world data sets must be cleaned and pre-processed, outliers must be removed, training features must be selected, and missing values need to be addressed before model training can begin. Data scientists are estimated to spend 50% to 80% of their time on data wrangling as opposed to model training. PPML solutions will not be adopted in practice if they do not encompass these data preparation steps. Indeed, there is little point in preserving the privacy of clean data sets during model training – which is currently already possible – if the raw data has to be leaked first to arrive at those clean data sets!
Fig. 1. Overview of private feature selection and model training in 3PC setting with computing servers (parties) Alice, Bob, and Carol.
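The secret-sharing mechanics behind Steps 1 and 3 of Fig. 1 (the replicated scheme detailed in Sec. II-B) can be illustrated with a plaintext Python sketch. The modulus and helper names here are illustrative only; a real deployment runs each party on a separate server inside an MPC framework, not in one insecure process:

```python
import random

Q = 2**32  # illustrative ring modulus for Z_q

def share(x):
    """Replicated secret sharing among P1, P2, P3:
    x1 + x2 + x3 = x mod Q, and party Pi holds a pair of shares
    (P1: x1,x2; P2: x2,x3; P3: x3,x1)."""
    x1 = random.randrange(Q)
    x2 = random.randrange(Q)
    x3 = (x - x1 - x2) % Q
    return [(x1, x2), (x2, x3), (x3, x1)]

def reconstruct(shares):
    """Any two parties jointly hold all three additive shares (Step 3)."""
    (x1, _), (x2, x3) = shares[0], shares[1]
    return (x1 + x2 + x3) % Q

def add(sh_x, sh_y):
    """Secure addition is local: each party adds its pairs component-wise."""
    return [((a1 + b1) % Q, (a2 + b2) % Q)
            for (a1, a2), (b1, b2) in zip(sh_x, sh_y)]

def mul(sh_x, sh_y):
    """Each party locally computes one additive share z_i of x*y.
    The resharing step back to replicated form (which needs communication)
    is omitted; for illustration we simply reconstruct the product."""
    z = [(xi * yi + xi * yj + xj * yi) % Q
         for (xi, xj), (yi, yj) in zip(sh_x, sh_y)]
    return sum(z) % Q
```

Note that no single pair of shares reveals anything about the secret; only by combining the views of two parties can a value be opened, matching the honest-majority guarantee described in the text.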
In this paper, we contribute to filling this gap in the open literature by proposing the first MPC based protocol for privacy-preserving feature selection. Feature selection is the process of selecting a subset of relevant features for model training [10]. Using a well chosen subset of features can lead to more accurate models, as well as efficiency gains during model training. A commonly used technique for feature selection is the so-called filter method, in which features are ranked according to a score indicative of their predictive ability, and subsequently the highest ranked features are retained. Despite its known shortcomings, including the fact that it considers each feature in isolation and ignores feature dependencies, the filter method is popular in practical data science because it is computationally very efficient and independent of any specific ML model architecture.

The MPC based protocol πFILTER−FS for private feature selection that we propose in this paper can be used in combination with any MPC protocol to rank features in a privacy-preserving manner. Well-known techniques to score features in terms of their informativeness include mutual information (MI), Gini impurity (GI), and Pearson's correlation coefficient (PCC). We propose an efficient feature scoring protocol πMS−GINI based on Gini impurity, leaving the development of privacy-preserving protocols for other feature scoring techniques as future work. The computation of a GI score for continuous valued features traditionally requires sorting of the feature values to determine candidate split points in the feature value range. As sorting is an expensive operation to perform in a privacy-preserving way, we instead propose a "mean-split Gini score" (MS-GINI) that avoids the need for sorting by selecting the mean of the feature values as the split point. As we show in Sec. V, feature selection with MS-GINI leads to accuracy improvements that are on par with those obtained with GI, PCC, and MI in the data sets used in our experiments. Depending on the application and the data set at hand, one may want to use a different feature scoring technique in combination with our protocol πFILTER−FS for private feature selection.

Fig. 1 illustrates the flow of private feature selection and subsequent model training at a high level in an outsourced "ML as a service" setting with three computing servers, nicknamed Alice, Bob, and Carol (three-party computation, 3PC). 3PC with an honest majority, i.e. with at most one server being corrupted, is a configuration that is often used in MPC because this setup allows for some of the most efficient MPC schemes. In Step 1 of Fig. 1, each of m data owners sends secret shares of their data to the three servers (parties). While the secret shared data can be trivially revealed by combining shares, no information about the data is revealed by the shares received by any single server, meaning that none of the servers by themselves learns anything about the actual values of the data. In Step 2A, the three servers execute protocols πMS−GINI and πFILTER−FS to create a reduced version of the data set that contains only the selected features. Throughout this process, none of the parties learns the values of the data or even which features are selected, as all computations are done over secret shares. Next, in Step 2B, the parties train an ML model over the pre-processed data using existing privacy-preserving training protocols, e.g., a privacy-preserving protocol for logistic regression training [16]. Finally, in Step 3, the servers can disclose the trained model to the intended model owner by revealing their shares. Steps 1 and 3 are trivial as they follow directly from the choice of the underlying MPC scheme (see Sec. II-B). MPC protocols for Step 2B have previously been proposed. The focus of this paper is on Step 2A. Our approach works in scenarios where the data is horizontally partitioned (each data owner has one or more of the rows or instances), scenarios where the data is vertically partitioned (each data owner has some of the columns or attributes), or any other partition.

After presenting preliminaries about Gini impurity and MPC in Sec. II, and discussing related work in Sec. III, we present our main protocol πFILTER−FS for private feature selection and the supporting protocols πGINI−FS and πMS−GINI in Sec. IV. In Sec. V we demonstrate the feasibility of our approach for practical data science in terms of accuracy and runtime results through experiments executed on real-world data sets. In our experiments, we consider honest-majority 3PC settings with semi-honest as well as malicious adversaries. While parties corrupted by semi-honest adversaries follow the protocol instructions correctly but try to obtain additional information, parties corrupted by malicious adversaries can deviate from the protocol instructions. Defending against the latter comes at a higher computational cost which, as we show, can be mitigated by using a recently proposed MPC scheme for 4PC.

II. PRELIMINARIES

A. Feature Selection based on Gini Impurity

Assume that we have a set S of m training examples, where each training example consists of an input feature vector (x1, ..., xp) and a corresponding label y. Throughout this paper, we assume that there are n possible class labels. We wish to induce an ML model from this training data that can
infer, for a previously unseen input feature vector, a label y as accurately as possible. Not all p features may be equally beneficial to this end. In the filter approach to feature selection, all features are first assigned a score that is indicative of their predictive ability. Subsequently, only the best scoring features are retained. A well-known feature scoring criterion is Gini impurity, made popular as part of the classification and regression tree algorithm (CART) [7].

If the jth feature Fj is a discrete feature that can assume ℓ different values, then it induces a partition S1 ∪ S2 ∪ ... ∪ Sℓ of S in which Si is the set of instances that have the ith value for the jth feature. The Gini impurity of Si is defined as:

  G(Si) = Σ_{c=1}^{n} pc · (1 − pc) = 1 − Σ_{c=1}^{n} pc²    (1)

where pc is the probability of a randomly selected instance from Si belonging to the cth class. The Gini score of feature Fj is a weighted average of the Gini impurities of the Si's:

  G(Fj) = Σ_{i=1}^{ℓ} (|Si| / m) · G(Si)    (2)

Conceptually, G(Fj) estimates the likelihood of a randomly selected instance being misclassified based on knowledge of the value of the jth feature. During feature selection, the k features with the lowest Gini scores are retained.

If Fj is a feature with continuous values, then G(Fj) is defined as the weighted average of the Gini impurities of a set S≤θ containing all instances for which the jth feature value is smaller than or equal to θ, and a set S>θ with all instances for which the jth feature value is larger than θ. In the CART algorithm, an optimal threshold θ is determined based on sorting of all the instances on their feature values. Since privacy-preserving sorting is a time-consuming operation in MPC [6], [20], in Sec. IV-B we propose a more straightforward approach for threshold selection which, as we show in Sec. V, yields desirable improvements in accuracy.

B. Secure Multiparty Computation

Protocols for MPC enable a set of parties to jointly compute the output of a function over each of the parties' private inputs, without requiring the parties to disclose their inputs to anyone. MPC is concerned with the protocol execution coming under attack by an adversary which may corrupt parties to learn private information or cause the result of the computation to be incorrect. MPC protocols are designed to prevent such attacks from being successful, and use proven cryptographic techniques to guarantee privacy.

Adversarial Model: An adversary A can corrupt any number of parties. In a dishonest-majority setting, half or more of the parties may be corrupt, while in an honest-majority setting, more than half of the parties are honest (not corrupted). Furthermore, A can be a semi-honest or a malicious adversary. While a party corrupted by a semi-honest or "passive" adversary follows the protocol instructions correctly but tries to obtain additional information, parties corrupted by malicious or "active" adversaries can deviate from the protocol instructions.

The protocols in Sec. IV are sufficiently generic to be used in dishonest-majority as well as honest-majority settings, with passive or active adversaries. This is achieved by changing the underlying MPC scheme to align with the desired security setting. Some of the most efficient MPC schemes have been developed for 3 parties, out of which at most one is corrupted. We evaluate the runtime of our protocols in this honest-majority 3PC setting, which is growing in popularity in the PPML literature, e.g. [14], [24], [31], [34], and we demonstrate how even better runtimes can be obtained with a recently proposed MPC scheme for 4PC with one corruption [13].

In the MPC schemes used in this paper, all computations by the parties (servers) are done over integers in a ring Zq. Raw data in ML applications is often real-valued. As is common in the MPC literature, we convert real numbers to integers using a fixed-point representation [9]. After this conversion, the data owners secret share their values with the parties using a secret sharing scheme, and the parties proceed by performing operations over the secret shares.

For the passive 3PC setting, we follow a replicated secret sharing scheme from Araki et al. ([4]). To share a secret value x ∈ Zq among parties P1, P2 and P3, the shares x1, x2, x3 are chosen uniformly at random in Zq with the constraint that x1 + x2 + x3 = x mod q. P1 receives x1 and x2, P2 receives x2 and x3, and P3 receives x3 and x1. Note that it is necessary to combine the shares available to two parties in order to recover x, and no information about the secret shared value x is revealed to any single party. For short, we denote this secret sharing by [[x]]q. Let [[x]]q, [[y]]q be secret shared values and c be a constant; the following computations can be done locally by the parties without communication:
• Addition (z = x + y): Each party Pi gets shares of z by computing zi = xi + yi and z(i+1 mod 3) = x(i+1 mod 3) + y(i+1 mod 3). This is denoted by [[z]]q ← [[x]]q + [[y]]q.
• Subtraction ([[z]]q ← [[x]]q − [[y]]q) is performed analogously.
• Multiplication by a constant (z = c · x): Each party multiplies its local shares of x by c to obtain shares of z. This is denoted by [[z]]q ← c · [[x]]q.
• Addition of a constant (z = x + c): P1 and P3 add c to their share x1 of x to obtain z1, while the parties set z2 = x2 and z3 = x3. This is denoted by [[z]]q ← [[x]]q + c.

The main advantage of replicated secret sharing compared to other secret sharing schemes is that replicated shares enable a very efficient procedure for multiplying secret shared values. To compute x · y = (x1 + x2 + x3)(y1 + y2 + y3), the parties locally perform the following computations: P1 computes z1 = x1·y1 + x1·y2 + x2·y1, P2 computes z2 = x2·y2 + x2·y3 + x3·y2, and P3 computes z3 = x3·y3 + x3·y1 + x1·y3. By doing so, without any interaction, each Pi obtains a zi such that z1 + z2 + z3 = x · y mod q. After that, the parties are required to convert from this additive secret sharing representation back to the original replicated secret sharing representation (which requires that the parties add a secret sharing of zero and that each party sends one share to one other party, for a total communication of three shares). See [4] for more details.

In the active 3PC setting, we use the MPC scheme SYReplicated2k recently proposed by Dalskov et al. ([13]). In this MPC scheme, the parties are prevented from deviating from the protocol and from gaining knowledge from other parties
through the use of information-theoretic message authentication codes (MACs). In addition to computations over secret shares of the data, the parties also perform computations required for MACs. See [13] for details. Finally, we use the MPC scheme recently proposed by Dalskov et al. ([13]) for the active 4PC setting, where the computations are outsourced to four servers out of which at most one has been corrupted by a malicious adversary.

Building Blocks: Building on the cryptographic primitives listed above for addition and multiplication of secret shared values, MPC protocols for other operations have been developed in the literature. In this paper, we use:
• Secure matrix multiplication πDMM: at the start of this protocol, the parties have secret sharings [[A]] and [[B]] of matrices A and B; at the end, the parties have a secret sharing [[C]] of the product of the matrices, C = A × B. πDMM can be constructed as a direct extension of the secure multiplication protocol for two integers, which we will denote as πDM in the remainder of the paper. Similarly, we use πDP to denote the protocol for the secure dot product of two vectors. In a replicated sharing scheme, dot products can be computed more efficiently than by the direct extension from πDM, and matrix multiplication can use this optimized version of dot products; we refer to Keller ([23]) for details.
• Secure comparison protocol πLT [8]: at the start of this protocol, the parties have secret sharings [[x]] and [[y]] of two integers x and y; at the end, they have a secret sharing of 1 if x < y, and a secret sharing of 0 otherwise.
• Secure argmin protocol πARGMIN: this protocol accepts secret sharings of a vector of integers and returns a secret sharing of the index at which the vector has the minimum value. πARGMIN is straightforwardly constructed using the above mentioned secure comparison protocol.
• Secure equality test protocol πEQ [9]: at the start of this protocol, the parties have secret sharings [[x]] and [[y]] of two integers x and y; at the end, they have a secret sharing of 1 if x = y, and a secret sharing of 0 otherwise.
• Secure division protocol πDIV [9]: at the start of this protocol, the parties have secret sharings [[x]]q and [[y]]q of two integers x and y; at the end, they have a secret sharing [[z]]q of z = x/y.

III. RELATED WORK

[30] proposed a more principled 2PC protocol with Paillier homomorphic encryption for private feature selection with χ2 as filter criterion in the semi-honest setting, without an experimental evaluation of the proposed approach. To the best of our knowledge, private feature selection with malicious adversaries has not yet been proposed or evaluated. The recent approach by [35] is not based on cryptography, does not provide any formal privacy guarantees, and leaks information through disclosure of intermediate representations.

Secure Gini Score Computation: Besides as a technique to score features for feature selection, as we do in this paper, Gini impurity is traditionally used in ML in the CART algorithm for training decision trees [7], and it has been adopted in MPC protocols for privacy-preserving training of decision tree models [17], [11], [1]. Gini score computation for continuous valued features, as we do in this paper, is especially challenging from an MPC point of view, as it requires sorting of feature values to determine candidate split points in the feature range. Abspoel et al. ([1]) put ample effort into performing this sorting process as efficiently as possible in a secure manner. We take a drastically different approach by assuming that the mean of the feature values serves as a good approximation for an optimal split threshold. This has the double advantage that (1) there is no need for oblivious sorting of feature values, and (2) for each feature only one Gini score for one threshold θ has to be computed, as opposed to computing the Gini score for multiple candidate thresholds and then selecting the best one through secure comparisons. This leads to significant efficiency gains, while preserving good accuracy, as we demonstrate in Sec. V.

Protocol 1 Protocol πFILTER−FS for Secure Filter based Feature Selection
Input: A secret shared m × p data matrix [[D]]q, a secret shared p-length score vector [[G]]q, the number k < p of features to be selected, and a constant t that is bigger than the highest possible score in [[G]]q
Output: a secret shared m × k matrix [[D′]]q
 1: for i ← 1 to k do
 2:   [[I[i]]]q ← πARGMIN([[G]]q)
 3:   for j ← 1 to p do
 4:     [[flagk]]q ← πEQ([[I[i]]]q, j)
 5:     [[T[j][i]]]q ← [[flagk]]q
 6:     [[G[j]]]q ← [[G[j]]]q + πDM([[flagk]]q, t − [[G[j]]]q)
 7:   end for
 8: end for
 9: [[D′]]q ← πDMM([[D]]q, [[T]]q)
10: return [[D′]]q
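To make the data flow of Protocol 1 concrete, here is a plaintext (and hence non-private) Python rendering of its logic. Python's min stands in for πARGMIN, a public equality test for πEQ, and ordinary arithmetic for πDM and πDMM; variable names follow the pseudocode, and t is any value larger than every score:

```python
def filter_fs(D, G, k, t):
    """Plaintext sketch of Protocol 1: select the k lowest-scoring
    features via equality flags and a matrix product, mirroring the
    oblivious structure (no data-dependent branching)."""
    m, p = len(D), len(D[0])
    G = list(G)                        # working copy of the score vector
    T = [[0] * k for _ in range(p)]    # p x k one-hot selection matrix
    for i in range(k):
        idx = min(range(p), key=lambda j: G[j])  # stands in for pi_ARGMIN
        for j in range(p):                       # Lines 3-7
            flag = 1 if j == idx else 0          # stands in for pi_EQ
            T[j][i] = flag
            G[j] = G[j] + flag * (t - G[j])      # oblivious overwrite with t
    # Line 9: D' = D x T keeps only the selected columns
    return [[sum(D[r][j] * T[j][i] for j in range(p)) for i in range(k)]
            for r in range(m)]
```

On the data of Example 1 below (G = [65, 26, 83, 14], k = 2), this returns the reduced matrix of Equation (3). In the secure protocol the same arithmetic is carried out over secret shares, so neither the flags nor the selected indices are ever revealed.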
features. At the end of the protocol, the parties have a reduced matrix D′ of size m × k in which only the columns from D corresponding to the lowest scores in G are retained (note that this protocol can be trivially modified to select the k features with the highest scores). The main ideas behind the protocol (which is described in Protocol 1) are to:
1) Determine the indices of the features that need to be selected (these are stored in a secret-shared way in I).
2) Create a matrix T in which the columns are one-hot-encoded representations of these indices.
3) Multiply D with this feature selection matrix T.
Before walking through the pseudocode of Protocol 1, we present a plaintext example to illustrate the notation.

Example 1. Consider the data matrix D at the left of Equation (3), containing values for m = 5 instances (rows) and p = 4 features (columns). Assume that the feature score vector is G = [65, 26, 83, 14] and that we want to select the k = 2 features with the lowest scores in G.

  [ 1  2  3  4]   [0 0]   [ 4  2]
  [ 5  6  7  8]   [0 1]   [ 8  6]
  [ 9 10 11 12] · [0 0] = [12 10]    (3)
  [13 14 15 16]   [1 0]   [16 14]
  [17 18 19 20]           [20 18]
        D           T        D′

The lowest scores in G are 14 and 26, hence the 4th and the 2nd column of D should be selected. The columns of T in Equation (3) are a one-hot-encoding of 4 and 2 respectively, and multiplying D with T will yield the desired reduced data matrix D′. This multiplication takes place on Line 9 of Protocol 1. The bulk of Protocol 1 is about how to construct T based on G. As explained below, this process involves an auxiliary vector which, at the end of the protocol, contains the following values for our example: I = [4, 2].

In the protocol, vector [[I]]q of length k stores the indices of the k selected features out of the p features of [[D]]q, and matrix [[T]]q is a p × k transformation matrix that eventually holds one-hot-encodings of the indices in I. Through executing Lines 1-8 of Protocol 1, the parties construct a feature selection matrix T based on the values in G. On Line 2 the index of the ith smallest value in [[G]]q is identified. To this end, the parties run a secure argmin protocol πARGMIN. The inner for-loop serves two purposes, namely constructing the ith column of matrix T, and overwriting the score in G of the feature that was selected on Line 2 by the upper bound, so that it will not be selected anymore in further iterations of the outer for-loop (such an upper bound t is passed as input to Protocol 1 and is usually very easy to determine in practice, as most common feature scoring techniques range between 0 and 1):
• To construct the ith column of T, the parties loop through rows j = 1 ... p and, on Line 5, update T[j][i] with either a 0 or a 1, depending on the outcome of the secure equality test on Line 4. The outcome of this test will be 1 exactly once, namely when j equals I[i], hence Line 5 results in a one-hot-encoding of I[i] stored in the ith column of T.
• The flag flagk computed on Line 4 is used again on Line 6 to overwrite G[I[i]] with t in an oblivious manner, where t is a value that is larger than the highest possible score that occurs in [[G]]q. This theoretical upper bound t ensures that feature I[i] will not be selected again in later iterations of the outer for-loop.

As is common in MPC protocols, we use multiplication instead of control flow logic for conditional assignments. To this end, a condition-based branch operation such as "if c then a ← b" is rephrased as a ← a + c · (b − a). In this way, the number and the kind of operations executed by the parties do not depend on the actual values of the inputs, so they do not leak information that could be exploited by side-channel attacks. Such a conditional assignment occurs on Line 6 of Protocol 1, where the value of the condition c itself is computed on Line 4. In the final step, on Line 9, the parties multiply matrix D with matrix T in a secure manner to obtain a matrix D′ that contains only the feature columns corresponding to the k best features. Throughout this process, the parties are unaware of which features were actually selected. The secret shared matrix D′ can subsequently be used as input for a privacy-preserving ML model training protocol, e.g. [16].

B. Secure Feature Score Computation

Protocol πFILTER−FS assumes the availability of a feature score vector G and an upper bound t for the values in G. Below we explain how these can be obtained from the data in a secure manner. To this end, we present a protocol πMS−GINI for computation of the score of a feature based on Gini impurity. This protocol is applicable to data sets with continuous features. It is computationally cheaper than previously proposed protocols for Gini impurity that rely on sorting of feature values. Furthermore, as shown in previous work [25] and in Sec. V, the "Mean-Split" Gini score can yield similar accuracy improvements.

Recall that we have a set S of m training examples, where each training example consists of an input feature vector (x1, ..., xp) and a corresponding label y. We propose to split the set of values of the jth feature Fj based on its mean value as a threshold θ. We denote by S≤θ the set of instances that have xj ≤ θ, and by S>θ the set of instances that have xj > θ. Furthermore, for c = 1, ..., n, we denote by Lc the set of examples from S that have class label y = c. Based on this binary split, we define the MS-GINI ("Mean-Split" Gini) score for feature Fj as:

  G(Fj) = (1/m) · (|S≤θ| · G(S≤θ) + |S>θ| · G(S>θ))    (4)

with the Gini impurities of S≤θ and S>θ defined as:

  G(S≤θ) = 1 − Σ_{c=1}^{n} (pc≤θ)² ;  G(S>θ) = 1 − Σ_{c=1}^{n} (pc>θ)²    (5)

and the probabilities defined as:

  pc≤θ = |S≤θ ∩ Lc| / |S≤θ| ;  pc>θ = |S>θ ∩ Lc| / |S>θ|    (6)

Formulas (4), (5) and (6) are consistent with the definition of the Gini score given in Sec. II, and are presented here in more detail to enhance the readability of our secure protocol πMS−GINI for the computation of the Gini score G(F) of a feature F (described in Protocol 2).

At the start of Protocol πMS−GINI, the parties have secret shares of a feature column F (think of this as a column from data matrix D in Example 1), as well as secret shares of a one-hot-encoded version of the label vector. The latter is represented
Protocol 2 Protocol πMS−GINI for Secure MS-GINI Score of 13-14 can be performed locally by the parties, on their own
a Feature shares. Moving the computation of [[A[n]]]q and [[B[n]]]q out
Input: A secret shared feature column [[F ]]q = ([[f1 ]]q ,[[f2 ]]q ,...,[[fm ]]q ), a of the for-loop, reduces the number of secure multiplications
secret shared m × (n − 1) label-class matrix [[L]]q , where m is the number
of instances and n is the number of classes. needed from m × n to m × (n − 1). In the case of a binary
Output: MS-GINI score [[G(F )]]q of the feature F classification problem, i.e. n = 2, this means that the number
1
1: [[θ]]q ← ([[f1 ]]q + [[f2 ]]q + ... + [[fm ]]q ) · m of secure multiplications required is cut down by half.
2: Initialize [[a]]q , [[b]]q , [[A]]q and [[B]]q with zeros. Using the notations for the counters from the pseudocode
3: for i ← 1 to m do
4: [[f lags ]]q ← πLT ([[θ]]q , [[fi ]]q ) of Protocol 2, Equation (4) comes down to:
5: [[b]]q ← [[b]]q + [[f lags ]]q
6: for j ← 1 to n − 1 do 1 Xn
A[j] 2
Xn
B[j] 2
7: [[f lagm ]]q ← πDM ([[f lags ]]q , [[L[i][j]]]q ) G(F ) = · a· 1−
+ b· 1−
m j=1
a b
8: [[B[j]]]q ← [[B[j]]]q + [[f lagm ]]q j=1
1 1 1
9: [[A[j]]]q ← [[A[j]]]q + [[L[i][j]]]q − [[f lagm ]]q = · a− ·A•A + b− ·B•B
10: end for m a b
11: end for
12: [[a]]q ← m − [[b]]q in which A•A and B •B are the dot products of A and B with
13: [[A[n]]]q ← [[a]]q − ([[A[1]]]q + ... + [[A[n − 1]]]q ) themselves, respectively. These computations are performed
14: [[B[n]]]q ← [[b]]q − ([[B[1]]]q + ... + [[B[n − 1]]]q ) by the parties on Lines 15-17 using, among other things, the
15: [[G(S≤θ )]]q ← [[a]]q − πDM ( πDP ([[A]]q , [[A]]q ), πDIV (1, [[a]]q ))
protocol πDP for secure dot product of vectors, and the protocol
16: [[G(S>θ )]]q ← [[b]]q − πDM ( πDP ([[B]]q , [[B]]q ), πDIV (1, [[b]]q ))
17: [[G(F )]]q ← [[G(S≤θ )]]q + [[G(S>θ )]]q πDIV for secure division. We note that the final multiplication
18: return [[G(F )]]q with the factor 1/m is omitted altogether from Protocol 2 as
this will have no effect on the relative ordering of the scores
of the individual features.
as a label-class matrix [[L]]q, in which [[L[i][j]]]q = [[1]]q means that the label of the ith instance is equal to the jth class. Otherwise, [[L[i][j]]]q = [[0]]q. We note that, while there are n classes, it is sufficient for L to contain only n − 1 columns: as there is exactly one value 1 per row, the value of the nth column is implicit from the values of the other columns. We indirectly take advantage of this fact by terminating the loop on Lines 6-10 at n − 1, and performing the calculations for the nth class separately and in a cheaper manner on Lines 13-14, as we explain in more detail below.

On Line 1, the parties compute [[θ]]q as a threshold to split the input feature [[F]]q, namely as the mean of the feature values in the column. To this end, each party first sums up the secret shares of the feature values, and then multiplies the sum with the known constant 1/m locally. Line 2 initializes all counters related to S≤θ and S>θ to zero. After Line 14, these counters will contain the following values:

  a = |S≤θ|
  b = |S>θ|
  A[j] = |S≤θ ∩ Lj|, for j = 1 . . . n
  B[j] = |S>θ ∩ Lj|, for j = 1 . . . n

These counters are needed for the probabilities in Equation (6). For each instance, on Line 4 of Protocol 2, the parties perform a secure comparison to determine whether the instance belongs to S>θ. The outcome of that test is added to b on Line 5. Since the total number of instances is m, a can be straightforwardly computed as m − b after the outer for-loop, i.e. on Line 12. Lines 7-8 check whether the instance belongs to S>θ ∩ Lj, in which case B[j] is incremented by 1. The equivalent operation of Lines 7-8 for A[j] would be [[A[j]]]q ← [[A[j]]]q + πDM((1 − [[flag_s]]q), [[L[i][j]]]q). We have simplified this instruction on Line 9, taking advantage of the fact that πDM([[flag_s]]q, [[L[i][j]]]q) has been precomputed as [[flag_m]]q on Line 7.

On Lines 13-14 the parties compute [[A[n]]]q and [[B[n]]]q, leveraging the fact that the sum of all values in [[A]]q is [[a]]q, and the sum of all values in [[B]]q is [[b]]q. All operations on Lines 15-17 then turn these counters into the MS-GINI score of the feature.

If data are vertically partitioned and all data owners have the label vector, they can compute MS-GINI scores offline without πMS−GINI, and the computing servers would only have to do feature selection based on pre-computed MS-GINI scores with Protocol πFILTER−FS. In practice, however, it is often not reasonable to allow each data owner to have all labels, so we do not assume this scenario in our protocols.

C. Secure Feature Selection with MS-GINI

Protocol πGINI−FS (described in Protocol 3) performs secure filter-based feature selection with MS-GINI, as used for the experiments in this work. It combines the building blocks presented earlier in this section. By executing the loop on Lines 1-3, the parties compute the MS-GINI score of the ith feature from the original data matrix [[D]]q using Protocol πMS−GINI, and store it in [[G[i]]]q. On Line 4, the parties perform filter-based feature selection using Protocol πFILTER−FS to obtain an m × k matrix [[D′]]q with the k selected features from [[D]]q. As the standard GINI score is upper bounded by 1, and πMS−GINI omits the multiplication by 1/m for efficiency reasons, it is safe to use m as the upper bound that is passed to Protocol πFILTER−FS on Line 4.

Protocol 3 Protocol πGINI−FS for Secure Filter-based Feature Selection with MS-GINI
Input: A secret shared m × p data matrix [[D]]q = ([[F1]]q, [[F2]]q, ..., [[Fp]]q), a secret shared m × (n − 1) label-class matrix [[L]]q, where m is the number of instances, p the number of features, n the number of classes, and k the number of features to be selected.
Output: a secret shared m × k matrix [[D′]]q
1: for i ← 1 to p do
2:   [[G[i]]]q ← πMS−GINI([[Fi]]q, [[L]]q, m, n)
3: end for
4: [[D′]]q ← πFILTER−FS([[D]]q, [[G]]q, k, m)
5: return [[D′]]q

V. EXPERIMENTS AND RESULTS

The first four columns of Table I contain details for three data sets corresponding to binary classification tasks with continuous
TABLE I
FEATURE SELECTION ACCURACY AND RUNTIME RESULTS

         |    data set details    |     logistic regression accuracy results     |              runtime
Data set |  m    | p   | k   | #folds | RAW    | MS-GINI | GI     | PCC    | MI     | passive 3PC | active 3PC | active 4PC
CogLoad  |  632  | 120 | 12  |   6    | 50.90% | 52.50%  | 52.70% | 48.57% | 51.59% |   50 sec    |  163 sec   |    79 sec
LSVT     |  126  | 310 | 103 |  10    | 80.09% | 86.15%  | 82.74% | 78.89% | 85.38% |   60 sec    |  254 sec   |    89 sec
SPEED    | 8,378 | 122 | 67  |  10    | 95.24% | 97.26%  | 95.56% | 95.89% | 95.83% |  949 sec    | 3,634 sec  | 1,435 sec
[6] Dan Bogdanov, Sven Laur, and Riivo Talviste. Oblivious sorting of secret-shared data. Technical Report, 2013.
[7] Leo Breiman, Jerome Friedman, Charles Stone, and Richard Olshen. Classification and Regression Trees. Taylor and Francis, 1st edition, 1984.
[8] O. Catrina and S. De Hoogh. Improved primitives for secure multiparty integer computation. In International Conference on Security and Cryptography for Networks, pages 182–199. Springer, 2010.
[9] O. Catrina and A. Saxena. Secure computation with fixed-point numbers. In 14th International Conference on Financial Cryptography and Data Security, volume 6052 of Lecture Notes in Computer Science, pages 35–50. Springer, 2010.
[10] Girish Chandrashekar and Ferat Sahin. A survey on feature selection methods. Computers & Electrical Engineering, 40(1):16–28, 2014.
[11] C.A. Choudhary, M. De Cock, R. Dowsley, A. Nascimento, and D. Railsback. Secure training of extra trees classifiers over continuous data. In AAAI-20 Workshop on Privacy-Preserving Artificial Intelligence, 2020.
[12] Ronald Cramer, Ivan Bjerre Damgard, and Jesper Buus Nielsen. Secure Multiparty Computation and Secret Sharing. Cambridge University Press, 1st edition, 2015.
[13] A. Dalskov, D. Escudero, and M. Keller. Fantastic four: Honest-majority four-party secure computation with malicious security. Cryptology ePrint Archive, Report 2020/1330, 2020.
[14] A. Dalskov, D. Escudero, and M. Keller. Secure evaluation of quantized neural networks. Proceedings on Privacy Enhancing Technologies, 2020(4):355–375, 2020.
[15] Martine De Cock, Rafael Dowsley, Anderson C. A. Nascimento, and Stacey C. Newman. Fast, privacy preserving linear regression over distributed datasets based on pre-distributed data. In 8th ACM Workshop on Artificial Intelligence and Security (AISec), pages 3–14, 2015.
[16] Martine De Cock, Rafael Dowsley, Anderson C. A. Nascimento, Davis Railsback, Jianwei Shen, and Ariel Todoki. High performance logistic regression for privacy-preserving genome analysis. BMC Medical Genomics, 14(1):23, 2021.
[17] Sebastiaan De Hoogh, Berry Schoenmakers, Ping Chen, and Harm op den Akker. Practical secure decision tree learning in a teletreatment application. In International Conference on Financial Cryptography and Data Security, pages 179–194. Springer, 2014.
[18] Raymond Fisman, Sheena S. Iyengar, Emir Kamenica, and Itamar Simonson. Gender differences in mate selection: Evidence from a speed dating experiment. The Quarterly Journal of Economics, 121(2):673–697, 2006.
[19] Martin Gjoreski, Tine Kolenik, Timotej Knez, Mitja Luštrek, Matjaž Gams, Hristijan Gjoreski, and Veljko Pejović. Datasets for cognitive load inference using wearable sensors and psychological traits. Applied Sciences, 10(11):38–43, 2020.
[20] M. Goodrich. Zig-zag sort: A simple deterministic data-oblivious sorting algorithm running in O(n log n) time. In Proceedings of the 46th Annual ACM Symposium on Theory of Computing, pages 684–693, 2014.
[21] Chuan Guo, Awni Hannun, Brian Knott, Laurens van der Maaten, Mark Tygert, and Ruiyu Zhu. Secure multiparty computations in floating-point arithmetic. arXiv preprint arXiv:2001.03192, 2020.
[22] Yasser Jafer, Stan Matwin, and Marina Sokolova. A framework for a privacy-aware feature selection evaluation measure. In 13th Annual Conference on Privacy, Security and Trust (PST), pages 62–69. IEEE, 2015.
[23] Marcel Keller. MP-SPDZ: A versatile framework for multi-party computation. In Proceedings of the 2020 ACM SIGSAC Conference on Computer and Communications Security, pages 1575–1590, 2020.
[24] N. Kumar, M. Rathee, N. Chandran, D. Gupta, A. Rastogi, and R. Sharma. CrypTFlow: Secure TensorFlow inference. In 41st IEEE Symposium on Security and Privacy, 2020.
[25] Xiling Li and Martine De Cock. Cognitive load detection from wrist-band sensors. In Adjunct Proceedings of the 2020 ACM International Joint Conference on Pervasive and Ubiquitous Computing and Proceedings of the 2020 ACM International Symposium on Wearable Computers, pages 456–461, 2020.
[26] Yehuda Lindell and Benny Pinkas. Privacy preserving data mining. In Annual International Cryptology Conference, pages 36–54. Springer, 2000.
[27] Steven Lohr. For big-data scientists, 'janitor work' is key hurdle to insights. The New York Times, 2014.
[28] P. Mohassel and Y. Zhang. SecureML: A system for scalable privacy-preserving machine learning. In IEEE Symposium on Security and Privacy (SP), pages 19–38, 2017.
[29] Valeria Nikolaenko, Udi Weinsberg, Stratis Ioannidis, Marc Joye, Dan Boneh, and Nina Taft. Privacy-preserving ridge regression on hundreds of millions of records. In IEEE Symposium on Security and Privacy (SP), pages 334–348, 2013.
[30] Vanishree Rao, Yunhui Long, Hoda Eldardiry, Shantanu Rane, Ryan A. Rossi, and Frank Torres. Secure two-party feature selection. arXiv preprint arXiv:1901.00832, 2019.
[31] M.S. Riazi, C. Weinert, O. Tkachenko, E.M. Songhori, T. Schneider, and F. Koushanfar. Chameleon: A hybrid secure computation framework for machine learning applications. In Asia Conference on Computer and Communications Security, pages 707–721, 2018.
[32] Mina Sheikhalishahi and Fabio Martinelli. Privacy-utility feature selection as a privacy mechanism in collaborative data classification. In IEEE 26th International Conference on Enabling Technologies: Infrastructure for Collaborative Enterprises (WETICE), pages 244–249, 2017.
[33] Athanasios Tsanas, Max A. Little, Cynthia Fox, and Lorraine O. Ramig. Objective automatic assessment of rehabilitative speech treatment in Parkinson's disease. IEEE Transactions on Neural Systems and Rehabilitation Engineering, 22(1):181–190, 2014.
[34] Sameer Wagh, Divya Gupta, and Nishanth Chandran. SecureNN: 3-party secure computation for neural network training. Proceedings on Privacy Enhancing Technologies (PoPETs), 2019(3):26–49, 2019.
[35] Xiucai Ye, Hongmin Li, Akira Imakura, and Tetsuya Sakurai. Distributed collaborative feature selection based on intermediate representation. In International Joint Conference on Artificial Intelligence, pages 4142–4149, 2019.