
computers & security 109 (2021) 102378

Available online at www.sciencedirect.com

journal homepage: www.elsevier.com/locate/cose

Digestive neural networks: A novel defense strategy against inference attacks in federated learning ✩

Hongkyu Lee a, Jeehyeong Kim b, Seyoung Ahn c, Rasheed Hussain d, Sunghyun Cho c, Junggab Son a,1,∗

a Information and Intelligent Security (IIS) Lab, Kennesaw State University, Marietta, GA 30060, USA
b Korea Electronics Technology Institute, Seongnam, South Korea
c Department of Computer Science and Engineering, Major in Bio Artificial Intelligence, Hanyang University, Ansan, South Korea
d Networks and Blockchain Lab, Innopolis University, Innopolis, Russia

Article info

Article history: Received 1 March 2021; Revised 3 June 2021; Accepted 14 June 2021; Available online 23 June 2021.

Keywords: Federated learning (FL); Inference attack; White-box assumption; Digestive neural networks; t-SNE analysis; Federated learning security; ML security; AI security.

Abstract

Federated Learning (FL) is an efficient and secure machine learning technique designed for decentralized computing systems such as fog and edge computing. Its learning process employs frequent communications, as the participating local devices send updates, either gradients or parameters of their models, to a central server that aggregates them and redistributes new weights to the devices. In FL, private data do not leave the individual local devices, and thus FL is regarded as a robust solution in terms of privacy preservation. However, the recently introduced membership inference attacks pose a critical threat to the impeccability of FL mechanisms. By eavesdropping only on the updates transferred to the central server, these attacks can recover the private data of a local device. A prevalent solution against such attacks is the differential privacy scheme, which adds a sufficient amount of noise to each update to hinder the recovery process. However, it suffers from a significant sacrifice in the classification accuracy of the FL. To effectively alleviate this problem, this paper proposes a Digestive Neural Network (DNN), an independent neural network attached to the FL. The private data owned by each device pass through the DNN and then train the FL. The DNN modifies the input data, which results in distorted updates, in a way that maximizes the classification accuracy of the FL while the accuracy of inference attacks is minimized. Our simulation results show that the proposed DNN performs well on both gradient sharing- and weight sharing-based FL mechanisms. For gradient sharing, the DNN achieved 16.17% higher classification accuracy and 9% lower attack accuracy than the existing differential privacy schemes. For the weight sharing FL scheme, the DNN achieved an at most 46.68% lower attack success rate with 3% higher classification accuracy.

© 2021 The Authors. Published by Elsevier Ltd. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).
https://doi.org/10.1016/j.cose.2021.102378
ISSN 0167-4048

1. Introduction

Conventional ML approaches for distributed data require aggregation of the data into a single repository. This is particularly problematic for several reasons. Not only does the aggregation require excessive network resources, but the data center also requires frequent updates from mobile devices to keep up-to-date data available for proper training. The data often include privacy-sensitive information, entailing a privacy concern over the fabrication of a monolithic data repository. Moreover, from the ML perspective, a huge dataset can slow down the learning speed.

In contrast, Federated Learning (FL) is a decentralized Machine Learning (ML) algorithm that is better suited to learning from distributed data (McMahan et al., 2017). The learning process in FL involves a central server and multiple participating mobile devices. It begins with the server's coordination of the learning model with the participating devices. In each communication round, the server broadcasts parameters generated from its global model to the devices. Each device then copies the parameters into its local model and trains it using its private data. The device generates updates, either parameters or gradients depending on the type of FL protocol, that include the changes in the local model, and sends them back to the server. To this end, FL intrinsically resolves the privacy and efficiency problems, as the devices only need to send a piece of information rather than the whole data for training purposes.

The reason that FL is considered secure (despite involving frequent communication) is that extracting private data from an ML model is assumed to be infeasible. However, recent studies have shown that retrieving private training data from an ML model is possible. The membership inference attack is an adversarial algorithm designed to recover training data of an ML model (Li et al., 2020; Liang et al., 2018; Shokri et al., 2017). Shokri et al. identified that it is possible to recognize a sample in a private training dataset only by assessing the output of an ML model (Shokri et al., 2017). An over-fitted classifier model tends to yield higher classification confidence for samples of the training data. The authors trained shadow models in order to exploit this divergent behavior of the target model. Following this initial work, several research results have identified various techniques to recover private information from a target model (Hisamoto et al., 2020; Salem et al., 2019; Shokri et al., 2017; Yeom et al., 2018). The membership inference attack typically has either a white-box or a black-box assumption (Salem et al., 2019; Yeom et al., 2018). In the black-box assumption, the attacker can only feed inputs and observe the resulting prediction vector. On the other hand, attackers can fully access the target model under the white-box assumption. It is obvious that the white-box assumption is stronger than the black-box assumption, since the attacker can fully investigate the target model.

FL is more vulnerable to the inference attack than conventional ML, since its training topology divulges parameters during communications. Accordingly, an attack with the white-box setting can be more critical in FL, as the attackers can have full access to the parameters and gradients of the target model. Nasr et al. designed a membership inference attack against the FedAvg algorithm (Nasr et al., 2019). They showed that a malicious device could send a malicious update to make the target model reveal private information more easily. A gradient resulting from the stochastic gradient descent algorithm suggests a direction in which the model parameters need to be updated. This direction in the gradient is strongly correlated with the corresponding mini-batch of the training data. For these reasons, other attacks have also exploited gradients shared during decentralized training in FedSgd. To this end, deep leakage from gradients is one crucial example in which it is possible to recover raw training data from the gradient updates (Zhu et al., 2019). It updates a data sample so that the gradient of the sample has a smaller distance from a captured gradient. Similar studies have demonstrated that training data can be fully recovered through alternative approaches (Geiping et al., 2020; Zhao et al., 2020).

Existing works use differential privacy to protect the privacy of FL (Agarwal et al., 2018; Geyer et al., 2017; Wei et al., 2020). Although FL with differential privacy is proven to converge, it compromises the classification accuracy of the trained model (Geyer et al., 2017). Therefore, encryption has been used as an alternative solution, as well as other security protocols for FL, to eliminate the exposure of raw updates to an adversary (Dong et al., 2020; Liu et al., 2019; Phong et al., 2017; Zhang et al., 2020a). However, incorporating a cryptographic scheme incurs computational overhead and could potentially become a bottleneck throughout the learning process. In summary, both differential privacy and cryptographic schemes are not sufficient because they degrade either the accuracy or the processing speed.

In this paper, we propose a novel FL scheme with a Digestive Neural Network (DNN). The DNN is a set of sequential convolutional neural networks that have identical input and output dimensions. A mini-batch is transformed within the DNN into another form; hence we denote it as the digested mini-batch. Each mobile device has a DNN and a collaborative neural network. The collaborative neural network accepts the digested mini-batch and yields a corresponding prediction vector. Each device collects updates from its collaborative neural network using the digested mini-batch. The DNN is trained on the digestive loss, a conjunction of a distance loss and a classification loss. Using the digestive loss, the DNN produces a digested mini-batch that has features useful for classification and has the maximum distance from the original mini-batch. The digestive loss is controlled with a threshold value, which can be chosen by the FL service provider to offer various privacy levels. The proposed DNN achieves higher classification accuracy and a lower attack success rate than FL schemes with differential privacy. The proposed DNN is applicable to both FedAvg and FedSgd, and is robust against the state-of-the-art membership inference attacks.


✩ The preliminary version of this paper was accepted at the IEEE International Conference on Communications (ICC'21).
∗ Corresponding author.
E-mail addresses: [email protected] (H. Lee), [email protected] (J. Kim), [email protected] (S. Ahn), [email protected] (R. Hussain), [email protected] (S. Cho), [email protected] (J. Son).
1 http://i2s.kennesaw.edu

Fig. 1 – Various FL protocols with adversary performing membership inference attack.

The contributions of this paper are summarized as follows:

• A novel DNN is proposed to defend against inference attacks in FL by transforming input data into unrecognizable, yet trainable, data. The proposed scheme has a negligible effect on the accuracy.
• Scalability of the proposed scheme is demonstrated with experiments on both gradient-sharing FL (FedSgd) and parameter-sharing FL (FedAvg).
• Convergence stability of the proposed scheme is illustrated.
• Experimental analysis of the trainability of the digested mini-batch is demonstrated with t-distributed Stochastic Neighbor Embedding (t-SNE) (Maaten and Hinton, 2008).
• To assess the resilience against the membership inference attack, pre-trained models are utilized to quantify the resemblance between the recovered data and the original data.

The rest of the paper is organized as follows. Section 2 introduces the background and preliminaries of this research. Section 3 introduces our proposed scheme, the Digestive Neural Network (DNN), within a federated learning scheme. In Section 4, we show the experimental results. Section 5 provides related work. Finally, conclusions are drawn in Section 6.

2. Preliminaries

In this section, we describe the fundamentals of federated learning, inference attacks, and differential privacy.

2.1. Federated learning

FedAvg (McMahan et al., 2017), a method updating parameters, and FedSgd (Shokri and Shmatikov, 2015), a method updating gradients, are two prevalent FL protocols. Fig. 1 describes the training protocol for both of them. Let Di be the i-th participating device and di be the private dataset of the i-th device, where i ∈ {1, 2, ..., N}. We use "FL scheme" as a comprehensive term for federated learning that includes every Di and a central server C with a training protocol. Each Di has its private dataset di along with its local model hi, which has an identical structure to the central server's model hc. Each hi is parameterized by wi, and the gradient of wi is gi,b, which is defined as follows:

    gi,b = ∂L(hi(wi, xi,b), yi,b) / ∂wi,    (1)

where L is the cross-entropy loss and (xi,b, yi,b) is the b-th mini-batch of data samples. We define a mini-batch as a set of data samples that satisfies (xi,b, yi,b) ∈ di and ∪ (from b = 1 to N/B) (xi,b, yi,b) = di.

In FedAvg, each device trains its local model on every (xi,b, yi,b) ∈ di. After training, the device sends wi from hi directly to the central server. The central server obtains the aggregated parameter W as follows:

    W = (1/N) Σ (from i = 1 to N) wi,
    wc ← W.

The central server replaces wc with W and distributes W to all devices. Each device replaces its wi with W and continues the protocol.

In the FedSgd protocol, each device sends gi to the central server. The central server aggregates all gradients into G and executes gradient descent on hc. Mathematically, this can be described as follows:

    G = (1/N) Σ (from i = 1 to N) gi,b,
    wc ← wc − γ G,

where γ is a learning rate. The wc is then distributed back to all devices, and the devices replace their wi with wc.
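To make the two update rules concrete, the following is a minimal PyTorch-style sketch of the server-side aggregation, assuming each update is a dict of tensors keyed by layer name. The function names and the toy usage are illustrative assumptions, not the authors' implementation.

```python
import torch

def fedavg_aggregate(device_weights):
    """FedAvg: the server averages the parameters w_i reported by N devices, W = (1/N) * sum(w_i)."""
    n = len(device_weights)
    return {k: sum(w[k] for w in device_weights) / n for k in device_weights[0]}

def fedsgd_step(server_weights, device_grads, lr=1e-4):
    """FedSgd: the server averages the reported gradients and applies one descent step to w_c."""
    n = len(device_grads)
    avg_grad = {k: sum(g[k] for g in device_grads) / n for k in device_grads[0]}
    return {k: server_weights[k] - lr * avg_grad[k] for k in server_weights}

# toy usage with a single 2x2 "layer"
w = {"layer": torch.zeros(2, 2)}
grads = [{"layer": torch.ones(2, 2)} for _ in range(5)]
w = fedsgd_step(w, grads, lr=0.1)   # every entry of w["layer"] is now -0.1
```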

The main difference between the two protocols is the type and the frequency of the update. FedAvg performs communication once per training epoch, whereas devices participating in FedSgd communicate for every (xi,b, yi,b). For these reasons, FedAvg is often incorporated in FL schemes with wireless communication, while FedSgd is often used for collaborative learning in distributed computing units (Lim et al., 2020).

2.2. Inference attack

A successfully trained ML model predicts unseen data with high accuracy using features learned from its training dataset. This implies that the performance of an ML model is determined by its training dataset. Therefore, it is possible to trace back what kind of samples exist in the training data of an ML model based on its performance. This introduces a significant threat to the dataset itself. An adversary can recover samples of a commercially available dataset by analyzing the ML model trained on that dataset. Also, if the dataset includes private information, the adversary may divulge the private information from the dataset. As the name suggests, the membership inference attack is an adversarial algorithm that analyzes a target ML model to discover samples of its training dataset.

There are two assumptions on the target model for the membership inference attack: the black-box and the white-box. In the black-box setting, the adversary does not have access to the target model and is only allowed to observe the output prediction vector for a given input sample. Based on the model's behavior, the adversary analyzes the distributions of the dataset. The white-box setting, in contrast, assumes that the adversary has full access to the target model: the adversary can run a fully functioning copy of the target model and compute gradients or inferences. Therefore, the white-box attack is more destructive than the black-box attack, since the assumption provides more knowledge to the adversary.

Pivotal research on the membership inference attack demonstrated effective performance in discriminating samples that are in the training dataset (Agarwal et al., 2018). The authors developed a membership inference attack model based on neural networks. To mimic the target model's behavior, the authors trained a number of shadow models. Using the shadow models, the authors trained an attack model that determines whether a given sample is included in the target's training dataset. The ML model shows higher prediction confidence on samples that are in the training dataset.

The red box at the top left of Fig. 1 depicts an adversary conducting the membership inference attack on the FL scheme. The adversary A targets D1 and eavesdrops on the update from D1. A eavesdrops w1 or g1, depending on whether the FedAvg or the FedSgd protocol is used, and trains its attack model a on either w1 or g1. The goal of a can be defined mathematically as follows:

    a(x) = 1 if x ∈ di, and 0 otherwise.

2.3. Differential privacy

Constructing a dataset involves data sanitation. That is, the data should not include any information that leads to the leakage of a person's private information. However, with a sophisticated set of queries, it is possible to extract private information from the data. An example is querying hospital records. Although the names of patients are removed from the dataset, it is possible to identify particular individuals by using multiple datasets. If a patient living in a particular area has a unique pattern of disease, it is possible to retrieve that individual's information using other data. This attack is known as the linkage attack.

To prevent privacy abuse, differential privacy has been introduced. Differential privacy hides the contribution of a particular sample to a query. Dwork et al. proposed ε-differential privacy (Dwork, 2006) to assure the confidentiality of a statistical dataset based on how much contribution a single data point has on a query. Formally, ε-differential privacy is defined as follows. Let D1 and D2 be two datasets that differ by one sample, let A be a randomized algorithm, and let S be a subset of the outputs of A. The algorithm A provides ε-differential privacy if the following holds:

    Pr[A(D1) ∈ S] ≤ exp(ε) · Pr[A(D2) ∈ S].

Thus, the dataset provider can inject an amount of noise determined by ε into the dataset to achieve differential privacy. Differential privacy is highly scalable, as the definition can also be applied by injecting noise into the parameters of ML models.

3. Proposed digestive neural network based secure federated learning against inference attack

3.1. Baseline overview

Fig. 2 illustrates the proposed DNN scheme with FL. The scheme consists of a single central server and multiple mobile devices. Each device has a DNN and a collaborative neural network, while the central server only has the collaborative neural network. The collaborative neural network is a classifier; the architecture accepts multi-dimensional data through convolutional neural networks and yields a one-dimensional prediction vector. The devices and the central server must have a homogeneous architecture of collaborative neural networks, because only the parameters of the collaborative neural networks are communicated during the protocol. The DNN accepts multi-dimensional data and produces multi-dimensional data with dimensions identical to the input. The parameters of the DNN are not shared; more precisely, the devices never expose the DNN to the communication channel.

The arrows in Fig. 2 depict our proposed training protocol for the DNN. A mini-batch of private training data sequentially passes through the DNN and the collaborative neural network. The DNN digests the features of the given mini-batch into a completely different domain to remove private information from the data. The digested mini-batch needs to exclude the original information yet must contain features useful for high classification accuracy. The collaborative network receives the digested mini-batch and optimizes its parameters to improve the classification accuracy on the digested data. To increase the prediction accuracy of the collaborative neural networks, both the DNN and the collaborative networks need to be trained cooperatively.
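As an illustration of this data flow, the following is a minimal PyTorch-style sketch of one local step, assuming hypothetical modules digestive_net and collaborative_net with the shapes described above; it is a sketch of the idea, not the authors' code.

```python
import torch
import torch.nn.functional as F

def local_step(digestive_net, collaborative_net, x, y):
    """One device-side step: digest the mini-batch, classify it, and return the
    classification loss computed only on the digested data."""
    x_digested = digestive_net(x)            # same shape as x, but in a different feature domain
    logits = collaborative_net(x_digested)   # prediction vector obtained from the digested data
    loss = F.cross_entropy(logits, y)        # L_c; only updates derived from this flow are shared
    return x_digested, loss
```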

Fig. 2 – Overview of FL procedures with the proposed scheme.

Each device prepares an update to send to the central server using its collaborative neural network. The update is generated only using the digested mini-batch and the collaborative neural network. After every device finishes sending its update, the server distributes new parameters obtained from the aggregation of all the updates. The mobile devices then train their DNNs based on the updated collaborative neural network. The training of the DNN is governed by the digestive loss, a conjunction of the classification loss and the distance loss controlled by a threshold. The goal of the digestive loss is to maximize the distance between the original mini-batch and the digested mini-batch while including crucial information within the digested mini-batch for high classification accuracy. The classification loss is a cross-entropy loss, and the distance loss focuses on increasing the distance between the input mini-batch and the digested mini-batch. More specifically, the distance loss increases the distance between the gradients of the mini-batch and the digested mini-batch for FedSgd, while it increases the pixel-wise distance between the mini-batch and the digested mini-batch for the FedAvg protocol. After updating the DNN, each device proceeds to the next training round. It is worth mentioning that the proposed DNN applies to both the FedAvg and FedSgd protocols. A detailed description of each training protocol of the proposed scheme is given in the subsequent sections.

The adversary in FedSgd and FedAvg utilizes the gradient update and the parameter update from the target device, respectively. The adversary in FedSgd eavesdrops on the gradient update to recover the mini-batch. To this end, the proposed FL scheme with DNN prevents the attack, since the target device generates the gradient from the digested mini-batch. This forces the adversary to recover information that is irrelevant to the original mini-batch. The adversary in FedAvg is assumed to already possess a portion of the target device's private training data. Therefore, the adversary trains an attack model that can identify a given sample's membership in the target device's dataset using the eavesdropped parameters and the known portion of the private training data. The proposed FL scheme with DNN mitigates this attack, since the parameter update from the target device is trained on the digested mini-batch. This makes the adversary's knowledge of the private data totally useless. It also forces the adversary to train the attack model in a completely erroneous environment and mitigates the attack that aims at properly identifying the private training data.

3.2. Training protocol: FedSgd

Fig. 3 depicts the procedure of a communication round in the training protocol of the proposed scheme with the FedSgd protocol. A communication round begins with the training of the collaborative neural network hC,i, followed by the training of the DNN hD,i. Let S be the central server and I be the set of all devices participating in the training. In a communication round t, device i ∈ I gets a mini-batch (xi,t, yi,t) from its private dataset. We denote the mini-batch after passing through the digestive network as x′i,t, which is x′i,t = hD,i(xi,t).

Fig. 3 – Training protocol of the proposed scheme on FedSgd.

In training the collaborative neural network, the device obtains a gradient gi,t using the classification loss Lc from the collaborative neural network, where gi,t is defined as follows:

    gi,t = ∂Lc(hC,i(x′i,t)) / ∂wi,t.    (2)

The gradient gi,t is an update for the server S, which distributes the new parameter wt+1 after receiving and accumulating gi,t from all the participating devices. The server conducts a step of gradient descent with a learning rate τ as follows:

    wt+1 = wt − τ · (1/|I|) Σ (from j = 1 to |I|) gj,t.    (3)

Once the mobile devices receive wt+1 for their collaborative neural networks, each device trains its DNN. The training objective of the DNN is to digest the data so that the attack algorithm fails to recover the original data. Simultaneously, x′i,t needs to contain recognizable features for the collaborative neural network to make accurate predictions. To achieve both requirements, we present the loss LD defined as follows:

    LD = Lc + max(α − Ld, 0),    (4)

where Lc is a classification loss and Ld is a distance loss. Lc is responsible for assisting the collaborative neural network to increase the classification accuracy, while Ld updates the DNN to discourage the attack. We define a threshold α to properly regulate Ld, as a static influence of Ld impairs the convergence of the entire FL scheme. Ld requires gCi,t, gDi,t, and Lc, where gCi,t and gDi,t are the gradients of the collaborative neural network on xi,t and x′i,t, respectively. The gradients and losses are derived from the same mini-batch that was used to train the collaborative network. Mathematically, the two gradients can be represented as follows:

    gCi,t = ∂Lc(hC,i(wi,t+1, xi,t)) / ∂wi,t+1,    (5)

    gDi,t = ∂Lc(hC,i(wi,t+1, x′i,t)) / ∂wi,t+1.    (6)

The digestive loss LD only updates the DNN. Furthermore, training of the digestive network does not affect the collaborative neural network and vice versa. After every device updates its DNN, the mobile devices collect the next mini-batch for the subsequent communication round.

3.3. Attack scenario: FedSgd

We suppose an adversary as defined in (Geiping et al., 2020; Zhao et al., 2020; Zhu et al., 2019) for attacking the FedSgd protocol. The adversary eavesdrops on the communication channel and acquires wi,t and gj∈I,t. It is assumed that the adversary does not possess any prior knowledge of each device's private dataset but does have enough computing resources to fabricate a data sample and derive a gradient using the acquired weights wi,t. When the adversary eavesdrops gj,t and wi,t+1, it generates a random data sample (xadv, yadv) and starts to minimize the distance between the gradient of xadv and the eavesdropped gj,t. The adversary updates the random data sample to minimize the following optimization goal:

    arg min over xadv of || ∂Lc(yadv, hC,adv(wi,t, xadv)) / ∂wi,t − gj,t ||².    (7)

That is, the attacker updates xadv and yadv so that the gradient of xadv becomes similar to gj,t. Iterating this process alters xadv to be analogous to xi,t.
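For concreteness, the following is a minimal PyTorch sketch of this gradient-matching recovery loop in the spirit of Eq. (7). The names model, g_captured, and the use of Adam with a jointly optimized soft label are illustrative assumptions rather than the exact attack implementations cited above.

```python
import torch
import torch.nn.functional as F

def gradient_matching_attack(model, g_captured, x_shape, n_classes, steps=200, lr=0.1):
    """Recover a dummy sample whose gradient matches an eavesdropped gradient (cf. Eq. (7))."""
    x_adv = torch.randn(x_shape, requires_grad=True)        # random initial image, e.g. (1, 3, 32, 32)
    y_adv = torch.randn(1, n_classes, requires_grad=True)   # soft label, also optimized
    opt = torch.optim.Adam([x_adv, y_adv], lr=lr)

    for _ in range(steps):
        opt.zero_grad()
        # probability targets for cross_entropy require a recent PyTorch (>= 1.10)
        loss = F.cross_entropy(model(x_adv), y_adv.softmax(dim=-1))
        g_adv = torch.autograd.grad(loss, list(model.parameters()), create_graph=True)
        # L2 distance between the dummy gradient and the captured gradient
        dist = sum(((ga - gc) ** 2).sum() for ga, gc in zip(g_adv, g_captured))
        dist.backward()
        opt.step()
    return x_adv.detach(), y_adv.detach()
```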

The attack model presented in (Geiping et al., 2020) minimizes the cosine similarity between the two gradients. The intuition behind the cosine similarity is that the gradient suggests a direction to optimize the neural network with respect to the mini-batch. Finding a data sample that suggests the same direction reproduces the original data. A detailed formulation of the optimization problem is defined as follows, where ⟨x, y⟩ / (||x|| · ||y||) is the cosine similarity of x and y:

    arg min over xadv ∈ [0,1]^n of 1 − ⟨∂Lc(yadv, hC,adv(wi,t, xadv)) / ∂wi,t, gj,t⟩ / ( ||∂Lc(yadv, hC,adv(wi,t, xadv)) / ∂wi,t|| · ||gj,t|| ).    (8)

In the simulations, we experimented with both attackers to examine the effectiveness of the two attacks and compare them with our proposed defense mechanism.

3.4. Digestive loss: FedSgd

The digestive loss consists of Ld and Lc, which are responsible for improving privacy and accuracy, respectively. The design of Ld starts from scrutinizing the attack scenario. A gradient-based attack minimizes the distance between a target gradient and a gradient from the dummy data. The DNN originated from the intuition of reversing the attack process using a neural network architecture. To this end, the optimization goal of the DNN is to increase the distance between the two gradients. The update from device i is gi,t, which is defined as in Eq. (2) and is the gradient of x′i,t with respect to hC,i. Thus, when an adversary launches a successful attack using gi,t, it will recover x′i,t. To prevent the attack, we need to convert gi,t so as to pose difficulty to the attack algorithm. Moreover, even if the attack algorithm successfully recovers x′i,t, recovering x′i,t is of no use if x′i,t ≠ xi,t. In other words, it does not give any advantage to the adversary.

Thus, we propose pushing ∂hC,i(x′i,t)/∂wC,i far from ∂hC,i(xi,t)/∂wC,i. This introduces difficulty for the adversarial algorithm, as pushing the gradient increases the distance that the adversarial algorithm has to minimize. Accordingly, pushing makes x′i,t ≠ xi,t, as the two data samples will produce two distant gradients. This way, the adversary will not be able to recover private information of xi,t from x′i,t, as it is completely different from xi,t. To achieve this, we propose an optimization goal of hD,i as follows:

    ∂hC,i(hD,i(xi,t)) / ∂wC,i = max over x′i,t of dist( gi,t, ∂hC,i(wC,i, xi,t) / ∂wC,i ).    (9)

The x′i,t generates a gradient gi,t that has a significant difference from ∂hC,i(wC,i, xi,t)/∂wC,i. To optimize hD,i for the successful generation of x′i,t, the distance loss that contributes to updating hD,i is defined as follows:

    Ld = − dist( ∂Lc(hC(wi,t+1, xi,t)) / ∂wi,t+1, ∂Lc(hC(wi,t+1, x′i,t)) / ∂wi,t+1 ) / |hC|.    (10)

The distance loss is defined as the average distance between the two gradients, normalized by the size of hC,i. It updates hD,i so that the gradient of x′i,t becomes distant from the gradient of xi,t. The difference between the two gradients is determined by various distance metrics, including cosine similarity and the Minkowski distance with p = 1 and p = 2. We denote the Minkowski distance with p = 2, or Euclidean distance, as L2 and the Minkowski distance with p = 1 as L1, as they are Lp norms of differences (Friedrich, 1910).

In the proposed FL scheme, the constant presence of Ld impairs the global convergence of all mobile devices. Thus we introduce a threshold α to control the influence of Ld over the scheme. As in Eq. (4), Ld becomes 0 when it exceeds α. This keeps the distance between the gradient of x′i,t and the gradient of xi,t farther than α. Unlike Ld, Lc has a constant influence on the training of the DNN to maintain high accuracy.

3.5. Training protocol: FedAvg

Devices participating in the FedAvg algorithm upload the weight parameters of their local neural networks after training for an epoch. The central server distributes the averaged weights to all the devices. The training protocol of the proposed scheme includes training of both the collaborative neural network and the digestive neural network without violating the underlying FedAvg algorithm. Fig. 4 depicts the training protocol of the i-th device participating in the FL of the proposed scheme for the FedAvg algorithm (McMahan et al., 2017). The training is divided into two phases. In training the collaborative neural network, each device freezes the digestive neural network and trains only the collaborative neural network. In training the digestive neural network, each device freezes the collaborative neural network and trains only the digestive neural network.

The initial phase is training the collaborative neural network hC,i. First, each device inputs its private data (xi, yi) into its own digestive neural network hD,i and collaborative neural network hC,i. The device then derives updates only from its hC,i using the cross-entropy loss Lc from the outputs. After training for an iteration on the whole data, the device sends its trained parameters wi,t to the central server. As the central server receives the parameters from all the devices, the server averages the parameters and distributes the newly updated parameter.

The second phase is training the digestive neural network hD,i. Similar to the proposed scheme on FedSgd, the digestive neural network needs to transform the original mini-batch xi into a totally divergent mini-batch; however, it still needs to contain proper features that can assist the classification performed in the collaborative neural network. After receiving a new parameter from the server, the device freezes the parameters of hC,i. For each mini-batch xi collected from the private data, the device obtains the digested data x′i by passing xi through the digestive neural network hD,i. Using xi and x′i, the device calculates the distance loss Ld as the batch-wise mean distance between xi and x′i. This will maximize the pixel-wise distance between xi and x′i. Also, the device sends its digested data x′i to the collaborative neural network and obtains the cross-entropy loss Lc. The two losses are aggregated into LD and used for updating the digestive neural network:

    LD = Lc + max(α − Ld, 0).    (11)

The digestive loss is controlled by the threshold value α. If the distance loss is greater than the threshold value, the distance loss is deactivated. After deactivation, the DNN minimizes the classification loss only.
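The following is a minimal sketch of this threshold-controlled loss, interpreting the distance term here as the non-negative batch-wise mean pixel distance of Section 3.5 so that the hinge switches off once the distance exceeds α (for FedSgd, the same hinge would be applied to a gradient distance instead). Names are illustrative.

```python
import torch
import torch.nn.functional as F

def digestive_loss(logits, y, x, x_digested, alpha):
    """Digestive loss in the spirit of Eqs. (4)/(11): classification loss plus a hinge on the distance."""
    l_c = F.cross_entropy(logits, y)                # keeps the digested data classifiable
    l_d = (x - x_digested).pow(2).mean()            # batch-wise mean pixel distance to the original data
    return l_c + torch.clamp(alpha - l_d, min=0.0)  # distance term deactivates once l_d exceeds alpha
```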

Fig. 4 – Training protocol of the proposed scheme on FedAvg algorithm.

3.6. Attack scenario: FedAvg

We define the adversary attacking the FedAvg protocol using the passive global attacker in (Nasr et al., 2019). The goal of the passive global attacker is to build an attack model that can identify whether a data sample is included in the target device's training dataset. The attack model is a binary classifier that predicts the membership of the input sample. The attacker already has a portion of the target's data and has enough computing resources to simulate a neural network with the target's parameters. In the designated training rounds, the attacker eavesdrops on the parameter update from the target device. It saves the parameters to its local storage until the FL protocol is completely executed. After acquiring parameters from all the training rounds, the attacker trains its attack model using the portion of the target's training data that the attacker already has. The attacker then extracts gradients, feature maps, outputs from neurons, and loss values by simulating the target's neural network. The attack model is trained in a supervised manner. Let A be an attack model, dt be the target device's private dataset, (xt, yt) ∈ dt be member data of dt, and (x′, y′) ∉ dt be non-member data. The adversary is assumed to have a partial dataset da,t ⊂ dt that satisfies da,t ≠ dt. The adversary aims to infer the unknown portion of the dataset du,t = dt \ da,t. We also assume the target model with the target's weights, ht(wt). Then the attacker forms the attack dataset as follows:

    datk = {(xi, yi, z) ∀(xi, yi) ∈ da,t} ∪ {(xj, yj, z)},    (12)

    z = 1 if (xk, yk) ∈ da,t, and 0 otherwise.    (13)

The attacker uses datk for training the attack model using the gradient, loss value, feature maps, and output of the target model as follows:

    ok = ht(wt, xk),    (14)

    lk = L(ht(wt, xk), yk),    (15)

    gk = ∂L(ht(wt, xk), yk) / ∂xk,    (16)

    fk = {hn,t(wt, xk) ∀ hn,t ∈ ht},    (17)

where hn,t is the n-th layer of the target model; thus hn,t(wt, xk) represents the feature map of the n-th layer. The attacker trains the attack model to build the following binary classifier:

    A(xk, yk, ok, lk, gk, fk) = 1 if (xk, yk) ∈ dt, and 0 otherwise.    (18)

That is, the attacker tries to optimize A to predict the membership of (xk, yk).

3.7. Digestive loss: FedAvg

Similar to FedSgd, the digestive loss consists of the distance loss Ld and the classification loss Lc. In FedAvg, Ld directly derives the mathematical distance between a data sample x and the corresponding digested sample x′. Ld attempts to increase the pixel-wise distance between x and x′. Iteratively optimizing the DNN using Ld converts the DNN so that it delivers x′ to the collaborative neural network. Mathematically, it can be defined as follows:

    Ld = − dist(x, x′) / |x|.    (19)

The difference between the two samples is determined by various distance metrics, including cosine similarity and the Minkowski distance with p = 1 and p = 2. We will denote the Minkowski distance with p = 2, or Euclidean distance, as L2, and the Minkowski distance with p = 1 as L1, as they are Lp norms of the differences (Friedrich, 1910).

The goal of the attacker on the FedAvg protocol is to infer a portion of the target's private data using another portion of the data that the attacker already has. Therefore, the DNN delivers x′ that is totally different from x for training the collaborative neural network. Thus, the attacker will receive the parameters of a collaborative neural network that was trained on completely different data x′ than what the attacker currently has. This fails the training of the attack model, as the data the attacker has is different from x′.

4. Experiment results and evaluation

We simulated the proposed scheme under various conditions and quantified the performance of the DNN. The simulation was conducted with different architectures of DNNs including three different distance metrics. The detailed configuration of the DNN is described in Table 1. The DNN includes five residual blocks. Each block includes three convolutional layers with a residual path. Each row in Table 1 is the configuration of one residual block. The number of kernels in the convolutional layers is determined by K, and S determines downsampling and upsampling: the S columns indicate the strides of the convolutional layers. S = 2 means that the corresponding block includes a convolutional layer that downsamples the input mini-batch's dimensions by half. S = 0 means the output of the corresponding block will be double the size of the input mini-batch. S = 1 indicates that the output dimension is identical to the input dimension.

Table 1 – Simulation configurations for the digestive network (K: number of kernels, S: stride).

Block   Type 1     Type 2     Type 3     Type 4     Type 5     Type 6
        K    S     K    S     K    S     K    S     K    S     K    S
1       32   1     32   2     32   1     32   1     32   2     32   2
2       32   1     32   2     16   1     64   1     16   2     64   2
3       32   1     32   1     16   1     64   1     16   1     64   1
4       32   1     32   0     16   1     64   1     16   0     64   0
5       32   1     32   0     32   1     32   1     32   0     32   0

The experiment involves six different types of architectures for the DNN. Each type is selected to measure the varying performance of the proposed scheme according to the architecture of the DNN. All residual blocks in type 1 have identical kernel sizes as well as identical feature map dimensions. Residual blocks in types 3, 4, 5, and 6 have varying kernel sizes, while types 2, 5, and 6 include blocks that reduce and increase the feature maps.

We simulated our attack on both the FedSgd and FedAvg protocols. For FedSgd, we simulated FL with |I| = 5 using the aforementioned DNNs and ResNet-20 as the collaborative neural network (He et al., 2016). Every FL scheme is trained for 300 epochs with a batch size of 32 and a learning rate of 1e−4. After training multiple FL schemes, we simulated the attack algorithm described in the previous section. The attack simulation runs the attack algorithm to recover 100 private data instances.

For the FedAvg algorithm, we simulated FL with |I| = 3 and randomly selected 30,000 private data instances for each device from the CIFAR-100 dataset, allowing duplicates. The simulation incorporates the same DNNs, but we used AlexNet (Krizhevsky et al., 2012) as our collaborative neural network architecture. Every FL scheme is trained for 300 epochs with a batch size of 64 and a learning rate of 1e−3. The training conditions and hyperparameters are selected to be in compliance with the attack algorithm. Moreover, we implemented the passive local attacker for FedAvg introduced in (Melis et al., 2019). Every FL scheme is implemented in PyTorch on an i7 workstation with four RTX 2080 Ti GPUs.

In this section, we also evaluate the performance of the FL by comparing accuracy and attack success rates. The accuracy denotes the model's performance on the test data, whereas the attack success rate indicates how robust the model is to an attack. As mentioned above, we simulated the attack algorithm for both the FedSgd and FedAvg protocols. For the FedSgd protocol, the attack success rate is measured by comparing the similarity of the original data and the data resulting from 100 attack simulations. The similarity is measured in terms of the accuracy predicted by a pre-trained classifier that can classify the original dataset with high accuracy. A high prediction accuracy of the pre-trained classifier means that the attack algorithm successfully reconstructed data that contains visual features similar to the original data. Thus, a high attack success rate means the model is not defending well against the attack. A low attack success rate, on the other hand, shows that the attack simulation has failed to generate a data sample with visually analogous features. For the FedAvg algorithm, we simulated the passive local attacker. The attack model is a binary classifier that predicts the membership of the sample, and the attack success rate is directly obtained from the accuracy of the attack model. A lower accuracy of the attack algorithm indicates that the attack model could not determine whether a given sample is included in the private training dataset. On the other hand, a higher attack success rate indicates that the attack model is certain whether a given sample is a member of the training data.

For both FedSgd and FedAvg, we trained the models with differential privacy. The differential privacy model injects a predefined amount of noise into the updates. For FedAvg, the noise is added to the parameter update, while for the FedSgd model, the noise is added to the gradient of the data. We trained the differential privacy models for the same number of training epochs and the same batch size to make an exact and fair comparison with the proposed schemes. We also simulated attacks on the differentially private models to compare their attack success rates with that of the proposed DNN.
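For reference, this differential-privacy baseline can be sketched as adding zero-mean Gaussian noise of variance δ² to every tensor of the shared update (gradients for FedSgd, parameters for FedAvg). The helper below is an assumption about that baseline, not the authors' exact implementation.

```python
import torch

def add_dp_noise(update, variance):
    """DP baseline used for comparison: add zero-mean Gaussian noise of the given variance
    to every tensor in the shared update before it is sent to the server."""
    std = variance ** 0.5
    return {k: v + torch.randn_like(v) * std for k, v in update.items()}
```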

Fig. 5 – Test accuracy per epoch for plain and differential privacy, and proposed scheme with type 1 digestive neural
networks on CIFAR10 and SVHN dataset.

4.1. Performance analysis: accuracy and attack success rate on FedSgd

We first compare the accuracy and attack success rate of FL with differential privacy on the FedSgd protocol. Table 2 shows the accuracy and attack success rate of the plain scheme and various differential privacy schemes on the CIFAR-10 and SVHN datasets. The plain scheme (with no differential privacy) is an FL scheme without any defensive mechanism, whereas the differential privacy schemes use additive Gaussian noise with variance δ2 to ensure data privacy. As differential privacy schemes add noise to the gradient updates, the accuracy decreases as the level of noise increases. In return, the attack success rate decreases as the noise level increases, meaning that reconstruction of data through the attack simulation becomes unsuccessful as the noise level increases.

Table 2 – Accuracy and attack success rate of plain and differential privacy models.

Scheme        CIFAR-10                          SVHN
              Accuracy   Attack success rate    Accuracy   Attack success rate
Plain model   70.620     19.0                   77.631     20.0
δ2 = 1e-4     69.620     18.0                   73.087     17.0
δ2 = 1e-3     68.130     17.0                   70.200     15.0
δ2 = 1e-2     64.280     17.0                   54.141     26.0

Table 3 – Performance of the proposed scheme with the L2 distance metric.

Digestive type   α      Accuracy   Attack success rate
Type 1           1e-1   82.448     10.0
Type 2           1e-1   78.320     11.0
Type 3           1e-1   82.416     11.0
Type 4           1e-1   85.794     10.0
Type 5           1e-1   78.080     9.0
Type 6           1e-1   77.894     9.0
Type 1           1e-2   81.476     8.0
Type 2           1e-2   77.262     8.0
Type 3           1e-2   83.704     10.0
Type 4           1e-2   82.728     7.0
Type 5           1e-2   78.682     8.0
Type 6           1e-2   77.740     9.0
Type 1           1e-3   83.018     14.0
Type 2           1e-3   78.218     10.0
Type 3           1e-3   83.334     8.0
Type 4           1e-3   82.778     9.0
Type 5           1e-3   78.098     11.0
Type 6           1e-3   77.248     11.0
Type 1           1e-4   81.960     12.0
Type 2           1e-4   78.914     15.0
Type 3           1e-4   82.934     12.0
Type 4           1e-4   82.438     13.0
Type 5           1e-4   77.822     17.0
Type 6           1e-4   78.240     17.0

Fig. 5 illustrates the performance of each scheme over the training epochs. We compared the plain scheme, a differential privacy scheme, and three proposed schemes with distance metrics of L1, L2, and cosine similarity. All proposed schemes in the graph had a type 1 DNN and a threshold of α = 1e-1. The graph shows that the proposed schemes, regardless of the distance metric, achieved higher accuracy than the rest of the schemes. This is because the DNN is not only responsible for the digestion of the data but also contributes to the improvement in accuracy. The proposed scheme also showed fast convergence on both the CIFAR-10 and SVHN datasets, while the plain and differential privacy schemes showed slow convergence. The graph also shows the downside of differential privacy schemes, which is the drop in accuracy compared to the plain model.

Table 3 shows the accuracy and attack success rate of the proposed scheme with various DNNs. The simulation is conducted to analyze the effects of the various DNN types introduced in Table 1, as well as various α values, on the FedSgd protocol. All schemes in the table use L2 as the distance metric.

Comparing Tables 2 and 3, it is clear that the proposed scheme achieved higher accuracy and a lower attack success rate than the differential privacy schemes. Moreover, the proposed scheme had 7% to 15% higher accuracy than the plain scheme. This is because the digestive neural network not only contributes to the digestion of the input data but also contributes to reducing the classification loss. The proposed scheme also had a lower attack success rate; an attack success rate of 10.0 indicates that the reconstructed data do not have any feature similar to the original data. Furthermore, the pre-trained classifier used for measuring the attack success rate makes a random guess on the reconstructed data.

Table 4 – Accuracy and attack success rate comparisons over various α values on CIFAR-10 dataset.

Digestive type   α       L1                   L2                   sim
                         Acc      Att acc     Acc      Att acc     Acc      Att acc
Type 1 1e-1 83.158 16.0 82.448 10.0 81.532 8.0
Type 1 1e-2 81.064 13.0 81.476 8.0 81.672 9.0
Type 1 1e-3 81.002 13.0 83.018 14.0 81.716 13.0
Type 1 1e-4 81.060 9.0 81.960 12.0 82.912 15.0
Type 2 1e-1 79.240 4.0 78.320 11.0 78.604 14.0
Type 2 1e-2 78.606 14.0 77.262 8.0 78.834 12.0
Type 2 1e-3 78.438 6.0 78.218 10.0 78.034 4.0
Type 2 1e-4 78.594 11.0 78.914 15.0 78.418 14.0
Type 4 1e-1 82.218 10.0 85.794 10.0 82.808 8.0
Type 4 1e-2 82.724 9.0 82.728 7.0 82.636 7.0
Type 4 1e-3 82.756 10.0 82.778 9.0 81.326 11.0
Type 4 1e-4 81.950 6.0 82.438 13.0 82.486 13.0

Table 3 implies that the architecture type and the threshold α determine both the accuracy and the attack success rate. The reduction of α correlates with the attack success rate. This is because α determines the training of Ld. A small value of α puts less burden on the models than a high α, where in the case of a higher α the models have to maintain Ld up to α. After Ld reaches α, the models train the DNN only to minimize the classification error. Thus, models with a smaller α achieve higher accuracy, as they can train their DNN to assist classification more pervasively than the schemes with a high α. Conversely, the models with a high α need to contribute more to maintaining Ld up to α. Although this degrades the classification accuracy, it makes the scheme more robust to the inference attack on FedSgd.

The architecture types also have a varying impact on the accuracy and attack success rate of the model. The reduction of dimensions has a negative effect on accuracy. The simulation of type 2 only shows degradation in classification accuracy, but not in the attack success rate. On the other hand, type 4, an architecture with larger kernels, showed a relatively lower attack success rate than the other models while not showing a decrease in the classification accuracy. The effects resulting from a reduction in dimensions and from large kernels are independent, as the simulations of type 6 show effects from both configurations.

From these observations, we narrowed the remaining experiments down to types 1, 2, and 4, as the selected types show the most distinctive effects. Table 4 shows the classification accuracy and the attack success rate of the proposed scheme with various types, α values, and distance metrics. We can see that the types and α show similar effects even when the distance metrics are different. Also, the effects of α and the effects of the architecture types affect the accuracy independently. The accuracy degradation resulting from α is seldom observable in the proposed schemes with type 4. Since type 4 has wider kernel sizes, the accuracy degradation from the smaller α has been counterbalanced by the architecture. Table 5 shows an identical experiment conducted on the SVHN dataset. Unlike CIFAR-10, the effects resulting from using type 4 have not been observed. Moreover, the attack success rate of the type 4 digestive neural network on the SVHN dataset is almost similar to that of the proposed schemes with the other types of digestive neural networks.

4.2. Performance analysis: accuracy and attack success rate on FedAvg

Table 6 shows the performance of the plain FedAvg protocol and the corresponding attack success rate. The absolute accuracy of plain FedAvg is low, since the neural network for the FL is AlexNet, and AlexNet without batch normalization tends to suffer significantly from overfitting. We can see that the accuracy of the differentially private models is slightly degraded from the plain model, as the noise added to the parameter update affects the global accuracy of the model. The attack success rate is more important, as it shows how differential privacy defends against the attack on FL. The attack success rate on the plain scheme is 98.74%, while the attack success rate on the differential privacy models is around 74%; the attack success rate drops significantly. The differential privacy with the highest variance of 1e−1 reduces the attack success rate significantly. However, the differential privacy model with a variance of 1e−1 is not desirable, as its classification accuracy is significantly low.

Table 7 shows the classification accuracy and the attack accuracy of the proposed scheme with various DNNs on the FedAvg protocol. We used the same DNNs defined in Table 1. As in the previous simulations, we focused on the type 1, 2, and 4 DNNs, which have particular behaviors associated with their structures. With various α values, the proposed DNN converted the input mini-batch into the digested mini-batch. It can be seen that the DNN does not affect the classification accuracy significantly, as the DNN is also trained to reduce the classification error, and the collaborative neural network trains itself to increase the classification accuracy. Similarly to its application on the FedSgd protocol, the proposed DNN converts the underlying features of the original data into different features. The attack success rate substantiates the fact that the DNN completely converts the image features. Furthermore, the adversary trains the attack model using half of the samples of the target device. However, most of the experiments had an attack success rate of less than 10%. This shows that, in most cases, the attack model failed to identify the other half of the data that was transformed through the DNN. This shows that even though the attack algorithm extracts information from the target's collaborative neural network, the attack model failed to extract crucial information for identifying the digested data using the original private data.

Table 5 – Accuracy and attack success rate comparisons over various α values on SVHN dataset.

Digestive type   α       L1                   L2                   sim
                         Acc      Att acc     Acc      Att acc     Acc      Att acc
Type 1 1e-1 92.307 15.0 92.338 14.0 92.411 9.0
Type 1 1e-2 92.273 13.0 92.745 11.0 92.384 16.0
Type 1 1e-3 92.331 17.0 92.189 8.0 92.086 17.0
Type 1 1e-4 92.382 6.0 92.974 20.0 92.441 7.0
Type 2 1e-1 88.692 18.0 88.843 7.0 89.368 14.0
Type 2 1e-2 88.989 9.0 88.949 17.0 88.408 15.0
Type 2 1e-3 87.735 11.0 88.916 16.0 90.209 10.0
Type 2 1e-4 89.593 14.0 88.794 10.0 78.172 16.0
Type 4 1e-1 92.205 14.0 91.927 19.0 92.437 16.0
Type 4 1e-2 92.492 16.0 92.059 15.0 92.430 12.0
Type 4 1e-3 92.402 17.0 92.740 9.0 92.494 14.0
Type 4 1e-4 92.957 7.0 92.501 9.0 91.901 12.0

Table 6 – Accuracy and attack success rate of the passive local attacker on the plain model and differentially private models on the CIFAR-100 dataset.

Scheme        Accuracy   Attack success rate
Plain model   26.80      98.74
δ2 = 1e-4     26.84      74.15
δ2 = 1e-3     22.46      72.27
δ2 = 1e-2     22.07      76.64
δ2 = 1e-1     10.01      49.37

4.3. Analysis on distance loss

The distance loss is one of the most crucial components of our proposed scheme. Throughout the training, the goal was to maintain the distance loss higher than α. Fig. 6 shows the change in the average distance losses of five models throughout the training epochs. The architecture types and α values affect the formation of the distance loss. With α = 1e−1, the model increases the distance loss rapidly up to 1e−1. Beyond that point, the influence of the distance loss within the digestive loss is neglected. We had anticipated that the distance loss would decrease after exceeding α; however, it did not decrease after it exceeded its requirement. This is because of the sequential nature of the training protocol. The DNNs and the collaborative neural networks are trained sequentially. Thus, although the DNN rapidly increases the distance loss, the collaborative network adapts itself to the DNN and vice versa. Hence, after exceeding α, the distance loss never decreases, as the collaborative network has already adapted itself. A similar rapid increment of the distance loss is observed in the simulations with α = 1e−2. Type 4 showed a moderate increment in the distance loss. This slow increment may contribute to its highest accuracy, as the collaborative neural networks could adapt themselves to the digested feature domain.

When α is too small, such as 1e−4, a rapid surge in the distance loss is not observed. Fig. 6(d) is an example where the initial distance loss is already bigger than α. In such cases, the digestive network is only minimizing the classification loss, whereas the distance is automatically slightly increased as the training continues. This implies that increasing the classification accuracy and increasing the distance of the gradients are not inconsistent.

Table 7 – Accuracy and attack success rate of passive local attacker on FedAvg with proposed DNN.

Digestive type   α       L1                   L2                   sim
                         Acc      Att acc     Acc      Att acc     Acc      Att acc
Type 1 1e-1 18.73 37.58 23.02 5.755 21.31 12.20
Type 1 1e-2 24.73 49.37 23.75 9.237 23.53 11.65
Type 1 1e-3 21.74 6.349 24.25 6.448 23.64 14.933
Type 1 1e-4 22.52 2.235 23.77 6.369 21.28 5.657
Type 2 1e-1 23.33 62.05 26.35 13.865 29.56 2.987
Type 2 1e-2 21.53 3.125 19.35 8.979 31.07 13.92
Type 2 1e-3 20.28 9.494 22.87 3.679 23.36 21.34
Type 2 1e-4 22.96 4.451 28.79 8.030 25.32 9.237
Type 4 1e-1 25.68 8.4454 25.90 5.894 22.56 2.690
Type 4 1e-2 25.25 14.750 24.62 12.50 22.90 7.183
Type 4 1e-3 27.44 8.069 23.75 9.138 25.02 8.930
Type 4 1e-4 21.98 26.167 25.08 5.835 23.42 12.26

Fig. 6 – Distance loss per epoch on proposed scheme with L2 distance metric.

This implies that increasing classification accuracy and increasing the distance of gradients are not inconsistent. That is, minimizing the distance loss does not increase the classification loss. Moreover, training for increasing classification accuracy assists in an increase in distance loss.

The same graph also illustrates the different effects caused by the various types of DNN. The distance loss incremented by the collaborative neural network is significantly affected by the feature map sizes. Architecture types that include feature map reduction not only suffered in classification accuracy but also showed a slow increment in the distance loss.

Fig. 7 shows how the distance loss is shaped with respect to different α values under different distance metrics. In every scheme, the simulation with α = 1e−1 showed a rapid increment of distance in the early training period. Since the distance metrics are all different, α = 1e−1 does not have an identical impact on each model either. The value α = 1e−1 is already too small to be a threshold for the proposed schemes with cosine similarity. Thus, the distance losses for all experiments on the similarity stay at a similar level, as they do not have to update their DNN based on the distance loss.

The proposed DNN on the FL scheme offers a better FL environment for an FL service provider. An FL service provider who incorporates differential privacy within its FL scheme has to decide the adequate level of noise for securing the whole scheme. The accuracy degradation due to the presence of noise is inevitable for the FL service provider and every mobile device. Under the FL scheme with the proposed DNN, the FL service provider can select the level of privacy by selecting the proper α. The level of α has little effect on the classification accuracy. That is, the service provider can choose the desired α values with little concern for the performance of the FL scheme.

4.4. Attack simulations on digestive neural networks

The aforementioned experiments have substantiated that the proposed scheme can defend against the membership inference attack more effectively than the differential privacy scheme. Here we investigate in further detail the relationship between the attack algorithm and the proposed scheme. Table 8 shows the progress of the attack initiated on the plain, differential privacy, and proposed schemes with various types, thresholds, and distance metrics. The attack algorithm generates an initial random image and repetitively updates the image so that its gradients become more similar to the target gradients.

The attack on the plain scheme successfully recovered the image. Moreover, the recovered image retains visual features similar to the ground truth. An attack on differential privacy with δ² = 1e−4 also showed that the attack algorithm recovered an image that contains relatively similar features to the ground truth. In contrast, the attacks on the proposed scheme reconstructed images that do not contain any representative features similar to the ground truth. This confirms that our proposed DNN operates as we designed.
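For reference, the reconstruction loop summarized above can be sketched as follows. This is a simplified illustration in the spirit of deep leakage from gradients (Zhu et al., 2019), not the exact attack code used in our simulations; the optimizer, the step count, and the label-recovery form are assumptions.

import torch
import torch.nn.functional as F

def reconstruct_from_gradients(model, target_grads, img_shape, num_classes,
                               steps=300):
    dummy_x = torch.randn(1, *img_shape, requires_grad=True)
    dummy_y = torch.randn(1, num_classes, requires_grad=True)
    opt = torch.optim.LBFGS([dummy_x, dummy_y])

    def closure():
        opt.zero_grad()
        log_probs = F.log_softmax(model(dummy_x), dim=-1)
        loss = torch.mean(torch.sum(-F.softmax(dummy_y, dim=-1) * log_probs,
                                    dim=-1))
        grads = torch.autograd.grad(loss, model.parameters(), create_graph=True)
        # Euclidean gradient matching as in deep leakage from gradients;
        # inverting gradients (Geiping et al., 2020) uses cosine similarity.
        diff = sum(((g - t) ** 2).sum() for g, t in zip(grads, target_grads))
        diff.backward()
        return diff

    for _ in range(steps):
        opt.step(closure)
    return dummy_x.detach(), dummy_y.detach()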

Fig. 7 – Distance loss with respect to various α and training epochs in different distance metrics.

The DNN pushes the input mini-batch so that it yields a gradient dissimilar from the original gradients. Pushing the gradients also alters the visual features of the original image into totally disparate features. This is why our proposed scheme is more secure than the differential privacy scheme. Even if the attack algorithm successfully recovers an image from the gradients, the resulting image already has totally different features from the original data. The attacker may successfully obtain data; however, the attacker will not be able to extract any private information, because the recovered features are totally different from what the attacker is looking for.

Table 9 shows a visualization of the digested data generated from the original data. The digested data is the output of the DNN. The collaborative neural networks receive this mini-batch from the DNN and send gradients to the server based on the mini-batch data. Thus, a successful membership inference attack on the gradient will reconstruct these data, and the attacker will hardly gain any meaningful information from the mini-batch data. More importantly, the digestion of data is conducted locally. Thus, unless the attacker can sneak into the system of a participating mobile device, recovering the ground truth from the digested data is not possible.

4.5. Train-ability of the digested data

We were curious how our proposed scheme could achieve higher accuracy on the CIFAR-10 and SVHN datasets with the DNN. To find the cause, we incorporated t-distributed Stochastic Neighbor Embedding (t-SNE) to see how the model perceives its data. The t-SNE is a dimensionality reduction algorithm that converts multi-dimensional data into two-dimensional data without losing the spatial relationships within the high-dimensional data. The algorithm often forms clusters by placing data points with high similarity together and placing data points of low similarity far apart.

We analyzed each model by applying t-SNE to the output of the last convolutional layer of each model. This visualizes the learned representation of the input data. Since all models have identical architectures and hyper-parameters and are trained for an identical number of epochs, the learned representation can reveal the learnability of a given dataset. To quantify the learned representation without any bias, we conducted the experiments only on the test data of the CIFAR-10 dataset.

Fig. 8 illustrates a visual depiction of the dimensions reduced by t-SNE. We analyzed the plain and differentially private schemes together with the two proposed schemes with the highest accuracy. A dot in the figure represents a data point in the dataset, and the color denotes its class. In the figure, the plain scheme and differential privacy with δ² = 1e−2 and δ² = 1e−3 do not show clear clusters. We can consider the noise added to the update in the differential privacy scheme as noise added to the data. This shows that the data is too sophisticated to differentiate using the convolutional layers' learned representations. On the other hand, clusters are vividly visible in the t-SNE analysis of the proposed schemes. This indicates that the features of the digested data are easier to extract and more easily separable than under the differential privacy schemes.
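The analysis pipeline can be summarized by the following sketch. The feature_extractor handle (for example, obtained by truncating the collaborative model after its last convolutional layer or via a forward hook) and the t-SNE hyper-parameters are illustrative assumptions, not the exact settings used in our experiments.

import numpy as np
import torch
from sklearn.manifold import TSNE

@torch.no_grad()
def tsne_of_last_conv(feature_extractor, test_loader, device="cpu"):
    feats, labels = [], []
    for x, y in test_loader:
        # feature_extractor is assumed to return the last convolutional
        # feature map of the collaborative model for a batch of inputs.
        f = feature_extractor(x.to(device))
        feats.append(f.flatten(1).cpu().numpy())
        labels.append(y.numpy())
    feats, labels = np.concatenate(feats), np.concatenate(labels)
    # Two-dimensional embedding; same-class points should form visible
    # clusters if the learned representation separates the data well.
    emb = TSNE(n_components=2, init="pca", perplexity=30).fit_transform(feats)
    return emb, labels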

Table 8 – Progress of attack simulation.

The DNN actually converts the CIFAR-10 dataset so that the convolutional layers in the collaborative neural networks can extract features effectively. The digested data may seem like random noise; however, it actually converts the dataset so that the collaborative network can take better advantage of it.

5. Related works

In this section, we outline the existing work pertaining to inference attacks on ML and FL, and privacy preservation techniques in FL.

5.1. Membership inference attack on machine learning models

Membership inference attack, or inference attack, can target various ML algorithms over various environments. Cai et al. proposed an inference attack on basic classifiers trained on social network graph data (Cai et al., 2018; Li et al., 2021). For an inference attack on deep learning models, Shokri et al. proposed a scheme to identify training data by inspecting a model (Shokri et al., 2017). The authors leveraged a typical behavior of an overfitted model: when a sample from the training data is given, an overfitted model shows higher confidence than a generalized model. By assessing this different behavior, the attack model determines whether a given datum is included in the training data of a victim model. To train the attack model, multiple shadow models were incorporated to simulate the behavior of the target model. Unlike the initial inference attacks that required multiple shadow models, the data transferring attack by Salem et al. found that the inference attack can be accomplished with only one shadow model (Salem et al., 2019). The authors proposed three different approaches, including an attack model that does not require any training for inference; they reported that statistical analysis of the posteriors is sufficient to determine whether a sample is included in the training data. Another similar work (Yeom et al., 2018) showed that the overfitting of the target model is a sufficient condition, not a necessary condition, for an inference attack. The authors simulated the attack on a real-world dataset.
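As a minimal illustration of this line of attacks, a posterior-threshold membership test can be sketched as follows; the threshold value and the use of the maximum confidence are assumptions made for the example, not the exact procedure of the cited works.

import torch
import torch.nn.functional as F

@torch.no_grad()
def is_member(victim_model, x, threshold=0.9):
    # An overfitted model tends to assign higher confidence to samples it was
    # trained on than to unseen samples; thresholding the top posterior gives
    # a simple membership decision.
    posterior = F.softmax(victim_model(x.unsqueeze(0)), dim=-1)
    return posterior.max().item() >= threshold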

Table 9 – Digested Image of α = 10.

Deviating from using the shadow model, Leino et al. proposed an attack based on white-box assumptions (Leino and Fredrikson, 2020). They identified that an overfitted model only uses certain features to make an inference.

5.2. Membership inference attack on FL

FL is more vulnerable to inference attacks than other ML algorithms. In non-collaborative learning, training is processed securely. However, in FL, the training process is exposed through a communication channel, allowing inference attacks to exploit the training process. Nasr et al. defined possible adversaries against a parameter-sharing FL scheme (Nasr et al., 2019). The authors defined an active and a passive attacker who exploit the communication protocol of the victim FL. The passive attacker only observes the communication channel and snatches parameter updates to launch an inference attack against a participant of the FL. The active attacker actively participates in the FL protocol to induce the victim to reveal more information. However, the attack model for the passive attacker needs a substantial amount of data and extensive training. Melis et al. proposed an alternative approach to defining privacy leakage in the FedAvg scheme (Melis et al., 2019). Feature analysis through a deep neural network examines features that are insignificant for the classification objective. The attack successfully showed the extraction of private information. The proposed attack demonstrated its effectiveness on an ethnicity classifier based on the LFW dataset. The model successfully identified the presence of glasses or sunglasses in the training data.

Generative Adversarial Networks (GANs) are also leveraged to extract private information from the model (Hitaj et al., 2017; Zhang et al., 2020b). Hitaj et al. demonstrated the feasibility of constructing a GAN-based attack model. The attack model maliciously participates in the training protocol to train the generative model effectively.

Fig. 8 – t-SNE analysis.

However, there is no guarantee that the generated samples are identical to the original samples in the training data. Similarly, Wang et al. installed an additional validation step to quantify how similar the synthetic data are compared to the original data in the training set.

In FedSgd, a different approach is proposed, where deep leakage from gradients exploits the gradient updates from the devices (Zhu et al., 2019). In FedSgd, each gradient update from a device includes a direction to improve the current parameters with respect to its mini-batch data. The attack model generates a random data sample and calculates the gradient of the sample with respect to the newest model distributed by the central server. The attack model continuously updates the sample so that the distance between the victim's gradient and the gradient from the sample decreases. This induces the sample to become identical to the original sample in the victim's training dataset. Furthermore, some works focused on increasing the stability of the attack (Geiping et al., 2020; Zhao et al., 2020). Instead of the Euclidean distance used by deep leakage from gradients, inverting gradients uses cosine similarity to match gradient directions.

5.3. Privacy protection mechanisms on FL

Differential privacy has been widely used to dissuade privacy breaches in FL. Differential privacy is implemented on FL as

added noise to its updates (Agarwal et al., 2018; Geyer et al., 2017; Jayaraman et al., 2018; Wei et al., 2020). In (Wei et al., 2020), the authors showed that FL models can converge in the presence of differential privacy. However, most of these schemes suffer from performance degradation.
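The update perturbation applied by such schemes can be summarized by the following minimal sketch; the clipping bound and the noise multiplier are placeholders rather than values from the cited works.

import torch

def privatize_update(update, clip_norm=1.0, sigma=0.1):
    # Clip the whole update to bound its sensitivity, then add Gaussian noise
    # calibrated to the clipping bound before the update leaves the device.
    total_norm = torch.sqrt(sum((p ** 2).sum() for p in update))
    scale = min(1.0, clip_norm / (float(total_norm) + 1e-12))
    return [p * scale + sigma * clip_norm * torch.randn_like(p) for p in update]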
Alternatively, cryptographic approaches are utilized to secure FL. To this end, homomorphic encryption has been applied to FL to enhance privacy (Dong et al., 2020; Fang et al., 2021; Liu et al., 2019; Phong et al., 2017; Zhang et al., 2020a; Zhang et al., 2020). The advantage of homomorphic encryption is that arithmetic calculation is possible on encrypted data. However, it requires significant computational resources. A number of studies tried to reduce the computation cost of homomorphic encryption. For instance, in (Zhang et al., 2020a), the authors utilized the quantization of parameters to reduce overhead. Similarly, ElGamal homomorphic encryption is usually used for lightweight computation on the limited network bandwidth of the Internet of Things (IoT) environment (Fang et al., 2021), and the Chinese remainder theorem is also used for reducing the size of gradient updates (Zhang et al., 2020).

As aggregation is one of the most crucial components in FL, it is essentially important to secure aggregation. To date, different works have focused on the aggregation security of FL (Fang et al., 2020; Li et al., 2020; Tran et al., 2021). Fang et al. incorporated multi-party computation to enable aggregation of averaged encrypted weight updates in the FedAvg algorithm (Fang et al., 2020). Similarly, Chain-PPFL proposed a token-based system with chained participants to counter the honest-but-curious server (Li et al., 2020).

6. Conclusion

In this paper, we proposed the DNN and its training protocol in a collaborative setting for secure and effective FL training. The proposed scheme demonstrated its performance by showing higher classification accuracy and a lower attack success rate than differential privacy models. The paper has substantiated the scalability of the proposed scheme, as it shows high performance on both the FedAvg and FedSgd protocols. The paper also presented an experimental analysis of the successful performance of the proposed scheme on FL. In the future, we plan to identify the relationship between the distance loss and the classification loss in order to clarify the relationship between classification accuracy and attack success rate.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

CRediT authorship contribution statement

Hongkyu Lee: Writing - original draft, Software, Validation, Investigation. Jeehyeong Kim: Methodology, Visualization. Seyoung Ahn: Software, Data curation. Rasheed Hussain: Writing - review & editing. Sunghyun Cho: Resources, Funding acquisition. Junggab Son: Supervision, Project administration, Conceptualization, Methodology, Investigation, Writing - review & editing.

Acknowledgment

This work was supported by the MSIT (Ministry of Science, ICT), Korea, under the High-Potential Individuals Global Training Program (2019-0-01601) supervised by the IITP (Institute for Information & Communications Technology Planning & Evaluation).

Supplementary material

Supplementary material associated with this article can be found, in the online version, at 10.1016/j.cose.2021.102378.

R E F E R E N C E S

Agarwal N, Suresh AT, Yu FX, Kumar S, McMahan B. cpSGD: Communication-efficient and differentially-private distributed SGD, 31; 2018. p. 7575–86.
Cai Z, He Z, Guan X, Li Y. Collective data-sanitization for preventing sensitive information inference attacks in social networks. IEEE Trans. Dependable Secur. Comput. 2018;15(4):577–90. doi:10.1109/TDSC.2016.2613521.
Dong Y, Chen X, Shen L, Wang D. Eastfly: efficient and secure ternary federated learning. Comput. Secur. 2020;94:101824. doi:10.1016/j.cose.2020.101824.
Dwork C. Differential privacy. Int. Colloq. Autom. Lang. Program. 2006;4052:1–12. doi:10.1007/11787006_1.
Fang C, Guo Y, Hu Y, Ma B, Feng L, Yin A. Privacy-preserving and communication-efficient federated learning in internet of things. Comput. Secur. 2021;103:102199. doi:10.1016/j.cose.2021.102199.
Fang C, Guo Y, Wang N, Ju A. Highly efficient federated learning with strong privacy preservation in cloud computing. Comput. Secur. 2020;96:101889. doi:10.1016/j.cose.2020.101889.
Friedrich R. Untersuchungen über Systeme integrierbarer Funktionen. Mathematische Annalen 1910;69(4):449–97. doi:10.1007/bf01457637.
Geiping J, Bauermeister H, Dröge H, Moeller M. Inverting gradients - How easy is it to break privacy in federated learning?, 33; 2020. p. 16937–47.
Geyer RC, Klein T, Nabi M. Differentially private federated learning: a client level perspective. arXiv preprint arXiv:1712.07557; 2017.
He K, Zhang X, Ren S, Sun J. Deep residual learning for image recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR); 2016. p. 770–8.
Hisamoto S, Post M, Duh K. Membership inference attacks on sequence-to-sequence models: is my data in your machine translation system? Trans. Assoc. Comput. Linguist. 2020;8:49–63. doi:10.1162/tacl_a_00299.
Hitaj B, Ateniese G, Perez-Cruz F. Deep models under GAN: Information leakage from collaborative deep learning. In: Proceedings of ACM SIGSAC Conference on Computer and Communications Security; 2017. p. 603–18. doi:10.1145/3133956.3134012.

Jayaraman B, Wang L, Evans D, Gu Q. Distributed learning without distress: Privacy-preserving empirical risk minimization. Proceedings of Advances in Neural Information Processing Systems, 2018.
Krizhevsky A, Sutskever I, Hinton GE. Imagenet classification with deep convolutional neural networks, 25; 2012.
Leino K, Fredrikson M. Stolen memories: Leveraging model memorization for calibrated white-box membership inference. In: Proceedings of USENIX Security Symposium (USENIX Security 20); 2020. p. 1605–22.
Li K, Luo G, Ye Y, Li W, Ji S, Cai Z. Adversarial privacy preserving graph embedding against inference attack. IEEE Internet Things J. 2020;Early Access:1–12. doi:10.1109/JIOT.2020.3036583.
Li K, Luo G, Ye Y, Li W, Ji S, Cai Z. Adversarial privacy-preserving graph embedding against inference attack. IEEE Internet Things J. 2021;8(8):6904–15. doi:10.1109/JIOT.2020.3036583.
Li Y, Zhou Y, Jolfaei A, Yu D, Xu G, Zheng X. Privacy-preserving federated learning framework based on chained secure multi-party computing. IEEE Internet Things J. 2020:1–1. doi:10.1109/JIOT.2020.3022911.
Liang Y, Cai Z, Yu J, Han Q, Li Y. Deep learning based inference of private information using embedded sensors in smart devices. IEEE Netw. 2018;32(4):8–14. doi:10.1109/MNET.2018.1700349.
Lim WYB, Luong NC, Hoang DT, Jiao Y, Liang Y-C, Yang Q, Niyato D, Miao C. Federated learning in mobile edge networks: a comprehensive survey. IEEE Commun. Surv. Tutor. 2020;22(3):2031–63. doi:10.1109/COMST.2020.2986024.
Liu C, Chakraborty S, Verma D. Secure model fusion for distributed learning using partial homomorphic encryption. Policy-Based Auton. Data Gov. 2019:154–79. doi:10.1007/978-3-030-17277-0_9.
Maaten LVD, Hinton G. Visualizing data using t-SNE. J. Mach. Learn. Res. 2008;9(86):2579–605.
McMahan B, Moore E, Ramage D, Hampson S, y Arcas BA. Communication-efficient learning of deep networks from decentralized data, PMLR 54; 2017. p. 1273–82.
Melis L, Song C, Cristofaro ED, Shmatikov V. Exploiting unintended feature leakage in collaborative learning. In: Proceedings of IEEE Symposium on Security and Privacy (SP); 2019. p. 691–706. doi:10.1109/SP.2019.00029.
Nasr M, Shokri R, Houmansadr A. Comprehensive privacy analysis of deep learning: Passive and active white-box inference attacks against centralized and federated learning. In: Proceedings of IEEE Symposium on Security and Privacy (SP); 2019. p. 739–53. doi:10.1109/SP.2019.00065.
Phong LT, Aono Y, Hayashi T, Wang L, Moriai S. Privacy-preserving deep learning via additively homomorphic encryption, 13; 2017. p. 1333–45. doi:10.1109/TIFS.2017.2787987.
Salem A, Zhang Y, Humbert M, Berrang P, Fritz M, Backes M. ML-leaks: Model and data independent membership inference attacks and defenses on machine learning models. Proceedings of the Network and Distributed System Security Symposium (NDSS), 2019.
Shokri R, Shmatikov V. Privacy-preserving deep learning. Proceedings of the 22nd ACM SIGSAC Conference on Computer and Communications Security (CCS), 2015.
Shokri R, Stronati M, Song C, Shmatikov V. Membership inference attacks against machine learning models. Proceedings of the IEEE Symposium on Security and Privacy, 2017.
Tran A-T, Luong T-D, Karnjana J, Huynh V-N. An efficient approach for privacy preserving decentralized deep learning models based on secure multi-party computation. Neurocomputing 2021;422:245–62. doi:10.1016/j.neucom.2020.10.014.
Wei K, Li J, Ding M, Ma C, Yang HH, Farokhi F, Jin S, Quek TQS, Poor HV. Federated learning with differential privacy: Algorithms and performance analysis, 15; 2020. p. 3454–69. doi:10.1109/TIFS.2020.2988575.
Yeom S, Giacomelli I, Fredrikson M, Jha S. Privacy risk in machine learning: analyzing the connection to overfitting. In: Proceedings of IEEE Computer Security Foundations Symposium (CSF); 2018. p. 268–82. doi:10.1109/CSF.2018.00027.
Zhang C, Li S, Xia J, Wang W, Yan F, Liu Y. BatchCrypt: efficient homomorphic encryption for cross-silo federated learning. In: Proceedings of USENIX Annual Technical Conference (USENIX ATC 20); 2020. p. 493–506.
Zhang J, Zhang J, Chen J, Yu S. GAN enhanced membership inference: A passive local attack in federated learning. In: Proceedings of IEEE International Conference on Communications (ICC); 2020. p. 1–6. doi:10.1109/ICC40277.2020.9148790.
Zhang X, Fu A, Wang H, Zhou C, Chen Z. A privacy-preserving and verifiable federated learning scheme. In: ICC 2020 - 2020 IEEE International Conference on Communications (ICC); 2020. p. 1–6. doi:10.1109/ICC40277.2020.9148628.
Zhao B, Mopuri KR, Bilen H. iDLG: improved deep leakage from gradients. arXiv preprint arXiv:2001.02610; 2020. p. 1–5.
Zhu L, Liu Z, Han S. Deep leakage from gradients. Proceedings of Advances in Neural Information Processing Systems (NeurIPS), 2019.

Hongkyu Lee received his BSE degree in Electrical Engineering from Hanyang University, Ansan, South Korea (2019). Since 2019, he has been a master's student with the Information and Intelligent Security (IIS) Lab. at Kennesaw State University. His research interests include privacy-preserving deep learning and trustworthy machine learning that ensures the robustness and accuracy of machine learning models.

Jeehyeong Kim received the BSE and Ph.D. degrees in Computer Science and Engineering from Hanyang University, South Korea, in 2015 and 2020, respectively. He was a Post-doctoral researcher with the Information and Intelligent Security (IIS) Lab. at Kennesaw State University from August 2020 to January 2021. Currently, he is a senior researcher at the Autonomous IoT Platform Research Center at Korea Electronics Technology Institute, Republic of Korea. His research interests include machine learning in cyber security and digital twins.

Seyoung Ahn received the B.E. degree in Computer Science and Engineering from Hanyang University, South Korea, in 2018. He is currently pursuing the M.S.-leading-to-Ph.D. degree in Computer Science and Engineering at Hanyang University, South Korea. Since 2018, he has been with the Department of Computer Science and Engineering, Hanyang University, South Korea. His research interests include federated learning and network automation with machine learning and deep learning techniques.

Rasheed Hussain is currently working as an Associate Professor and Head of the MS program in Security and Network Engineering (SNE) at Innopolis University, Russia. He is also the Head of the Networks and Blockchain Lab at Innopolis University. He received his B.S. Engineering degree in Computer Software Engineering from University of Engineering and Technology, Peshawar, Pakistan in 2007, and his MS and PhD degrees in Computer Science and Engineering from Hanyang University, South Korea in 2010 and 2015, respectively. He worked as a Postdoctoral Fellow at Hanyang University, South Korea from March 2015 to August 2015 and as a guest researcher and consultant at University of Amsterdam (UvA) from September 2015 till May 2016. He also worked as Assistant Professor at Innopolis University, Innopolis, Russia from June 2016 till December 2018. He serves as an editorial board member for various journals including IEEE Access, IEEE Internet Initiative, and Internet Technology Letters, Wiley, and serves as a reviewer for most of the IEEE transactions, Springer and Elsevier journals. He was the symposium chair for IEEE ICC CISS 2021. He also serves as a technical program committee member of various conferences such as

IEEE VTC, IEEE VNC, IEEE Globecom, IEEE ICCVE, and so on. He is a certified trainer for Instructional Skills Workshop (ISW). Furthermore, he is also an ACM Distinguished Speaker and Senior Member of IEEE. His research interests include Information Security and Privacy and particularly security and privacy issues in Vehicular Ad Hoc NETworks (VANETs), vehicular clouds, and vehicular social networking, applied cryptography, Internet of Things, Content-Centric Networking (CCN), Digital Twins (DT), Artificial Intelligence in cybersecurity, eXplainable AI (XAI), and blockchain.

Sunghyun Cho received his B.S., M.S., and Ph.D. in Computer Science and Engineering from Hanyang University, Korea, in 1995, 1997, and 2001, respectively. From 2001 to 2006, he was with Samsung Advanced Institute of Technology and with the Telecommunication R&D Center of Samsung Electronics, where he was engaged in the design and standardization of the MAC and network layers of WiBro/WiMAX and 4G-LTE systems. From 2006 to 2008, he was a Postdoctoral Visiting Scholar in the Department of Electrical Engineering, Stanford University. He is currently a Professor in the Department of Computer Science and Engineering, Hanyang University.

Junggab Son received the BSE degree in computer science and engineering from Hanyang University, Ansan, South Korea (2009), and the Ph.D. degree in computer science and engineering from Hanyang University, Seoul, South Korea (2014). From 2014 to 2016, he was a Post-doctoral Research Associate with the Department of Math and Physics, North Carolina Central University. From 2016 to 2018, he was a Research Fellow and a Limited-term Assistant Professor at Kennesaw State University. Since 2018, he has been an Assistant Professor of Computer Science and a Director of the Information and Intelligent Security (IIS) Lab. at Kennesaw State University.
