MPAF: Model Poisoning Attacks to Federated Learning based on Fake Clients

Xiaoyu Cao, Duke University
Neil Zhenqiang Gong, Duke University
arXiv:2203.08669v2 [cs.CR] 6 May 2022

Abstract

Existing model poisoning attacks to federated learning assume that an attacker has access to a large fraction of compromised genuine clients. However, such an assumption is not realistic in production federated learning systems that involve millions of clients. In this work, we propose the first Model Poisoning Attack based on Fake clients, called MPAF. Specifically, we assume the attacker injects fake clients into a federated learning system and sends carefully crafted fake local model updates to the cloud server during training, such that the learnt global model has low accuracy for many indiscriminate test inputs. Towards this goal, our attack drags the global model towards an attacker-chosen base model that has low accuracy. Specifically, in each round of federated learning, the fake clients craft fake local model updates that point to the base model and scale them up to amplify their impact before sending them to the cloud server. Our experiments show that MPAF can significantly decrease the test accuracy of the global model, even if classical defenses and norm clipping are adopted, highlighting the need for more advanced defenses.

1. Introduction

Federated learning (FL) is an emerging machine learning paradigm for multiple clients (e.g., smartphones or IoT devices) to jointly learn a model with the help of a cloud server. Instead of sharing their private local training data with the cloud server, the clients maintain local models that fit their local training data and iteratively share local model updates with the cloud server, which aggregates the clients' local model updates to obtain global model updates and uses them to update a global model. FL has attracted growing attention in both academia and industry. For instance, Google adopts FL in its Gboard application for next-word prediction [2]; a union of the world's leading pharmaceutical companies uses FL for drug discovery in a project called MELLODDY [3]; and WeBank leverages FL to predict the credit risk of borrowers [5].

However, due to its distributed nature, FL is fundamentally vulnerable to model poisoning attacks [8, 9, 14, 28]. All existing model poisoning attacks assume that an attacker has access to compromised genuine clients and rely on their genuine local training data. Specifically, in all or some FL rounds, the compromised genuine clients first compute local model updates based on their genuine local training data [14] or poisoned versions of it [8, 9], and then further manipulate the local model updates before sending them to the cloud server. As a result, the learnt global model misclassifies many indiscriminate test inputs (known as untargeted attacks) [14] or attacker-chosen ones (known as targeted attacks) [8, 9]. In this work, we focus on untargeted attacks because they are harder to perform, as they need to influence the predictions for many indiscriminate test inputs. Existing untargeted model poisoning attacks have shown their effectiveness against FL, even in the presence of Byzantine-robust defenses [10, 19, 30], i.e., they can reduce the test accuracy of the learnt global model by a significant amount.

However, existing untargeted model poisoning attacks all require a large fraction of compromised genuine clients and are less effective when that fraction is small [14]. A recent work [21] argued that such a requirement is not realistic in production FL that involves millions of clients. Specifically, the cost of compromising genuine clients is so high that an attacker cannot afford to compromise a large fraction of them in production FL. For instance, to compromise genuine clients, an attacker needs to pay for access to a large number of undetected zombie devices. As a result, the fraction of compromised genuine clients is usually small (e.g., 0.01%) in production FL. Moreover, only a subset of clients is selected in each round of production FL to participate in training. Therefore, it is likely that no compromised genuine client is selected in many rounds of production FL. Based on these arguments, they came to the conclusion that production FL with the non-robust FedAvg [18] or classical defenses (e.g., Trimmed-mean [30]) is robust enough against untargeted model poisoning attacks. However, as we will show later, this conclusion does not stand when the attacker can inject fake clients into FL systems and perform model poisoning attacks based on them.
Our work: In this work, we introduce MPAF, the first model poisoning attack to FL that is based on fake clients. We note that the cost of injecting fake clients is much lower than that of compromising genuine clients in FL. Specifically, the attacker can easily emulate many fake clients (e.g., Android devices) using open-source projects [1] or Android emulators [4, 6] on their own machines.

However, a key challenge of model poisoning attacks based on fake clients is that the fake clients provide no extra knowledge (e.g., no genuine local training data) about the FL system beyond the global models they receive from the cloud server during training. All existing model poisoning attacks [14, 28] rely on the assumption that the attacker has some degree of extra knowledge about the FL system, e.g., the genuine local training data on the compromised genuine clients. In this work, we consider an extreme case for the attacker, where no extra knowledge about the FL system (e.g., genuine local training data, global learning rate, or even the FL method) is available to the attacker beyond the global models that the fake clients receive during training. We note that in FL, the global model is shared with the selected clients in each round, including both genuine clients and fake ones. Therefore, our threat model considers the minimum-knowledge scenario for an attacker.

To address this challenge, we propose MPAF, which crafts fake local model updates based on the global models only. Specifically, in MPAF, an attacker chooses an arbitrary model (called the base model) that shares the same architecture as the global model and has low test accuracy. For instance, an attacker could randomly initialize a model as the base model. Our intuition is that if we can force the global model to behave like the base model, whose test accuracy is low, then the test accuracy of the learnt global model would likely decrease. Therefore, in each round of FL, the fake clients generate the direction of the fake local model updates by subtracting the current global model from the base model. The fake clients then scale up the magnitudes of the fake local model updates to enlarge their impact on the global model update. Our evaluations on multiple datasets and multiple FL methods show that MPAF is effective in reducing the test accuracy of the learnt global model even if classical defenses and norm clipping are adopted. For instance, on the Purchase dataset, MPAF decreases the test accuracy of the global model learnt using Trimmed-mean by 32% when 10% fake clients are injected.

Our contributions can be summarized as follows:

• We perform the first study on model poisoning attacks to FL based on fake clients.

• We propose MPAF, a novel untargeted model poisoning attack that is based on fake clients and requires no extra knowledge about the FL system beyond the received global models during training.

• We evaluate MPAF on multiple datasets and multiple FL methods. Our results show that MPAF is effective, even if classical defenses and norm clipping are leveraged as a countermeasure.

2. Related Work

2.1. Federated Learning (FL)

Assume there are n clients in FL, each holding some local training data. These clients aim to collaboratively learn a global model with the help of a cloud server. During training, each client maintains a local model based on its local training data and shares its local model updates with the cloud server. Specifically, in the t-th round of FL, the cloud server first sends the current global model w^t to all or a subset of clients. Then, the clients who receive the global model fine-tune their local models based on the global model using stochastic gradient descent (SGD) and their local training data. The clients then send their local model updates to the cloud server. The cloud server aggregates the local model updates and updates the global model as follows:

w^{t+1} ← w^t + η g^t,    (1)

where η is the global learning rate and g^t is the global model update in the t-th round, obtained as follows:

g^t = A(g_1^t, g_2^t, ..., g_n^t).    (2)

Here, A is the aggregation rule the cloud server uses to aggregate the local model updates, which plays an important role in FL. Different FL methods essentially use different aggregation rules. Next, we discuss three popular aggregation rules, including the non-robust FedAvg [18] and two Byzantine-robust ones, i.e., Median [30] and Trimmed-mean [30].

FedAvg: FedAvg [18] is the most popular aggregation rule in FL. It calculates the average of the local model updates as the global model update. FedAvg achieves state-of-the-art performance in non-adversarial settings.

Median: Median [30] is a coordinate-wise aggregation rule. The server sorts the values of each parameter in the local model updates and takes the median value as the aggregated value for the corresponding parameter in the global model update.

Trimmed-mean: Trimmed-mean [30] is another coordinate-wise aggregation rule. For each model parameter, instead of using its median value, Trimmed-mean removes the largest and smallest k values from its sorted values, and then computes the average of the remaining values as the corresponding parameter in the global model update. In Trimmed-mean, k controls a trade-off between robustness in adversarial settings and test accuracy in non-adversarial settings. In our experiments, we assume a strong defender who knows the number of fake clients, i.e., k equals the number of fake clients in each round.
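To make the three aggregation rules concrete, the following is a minimal NumPy sketch (our illustration, not the authors' implementation). It assumes each local model update has been flattened into a 1-D vector and the updates are stacked into an array of shape (number of clients, number of parameters); the function and variable names are ours.

```python
import numpy as np

def fedavg(updates):
    # FedAvg: coordinate-wise average of the local model updates.
    return updates.mean(axis=0)

def median(updates):
    # Median: coordinate-wise median of the local model updates.
    return np.median(updates, axis=0)

def trimmed_mean(updates, k):
    # Trimmed-mean: per coordinate, drop the k largest and k smallest
    # values, then average the remaining ones.
    sorted_updates = np.sort(updates, axis=0)
    return sorted_updates[k:updates.shape[0] - k].mean(axis=0)

def server_round(global_model, updates, learning_rate, aggregate=fedavg, **kwargs):
    # One global update step, w^{t+1} = w^t + eta * A(g_1^t, ..., g_n^t).
    return global_model + learning_rate * aggregate(updates, **kwargs)
```

For example, server_round(w, updates, 0.01, aggregate=trimmed_mean, k=num_fake_clients) would correspond to the Trimmed-mean setting assumed in the experiments.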
2.2. Existing Model Poisoning Attacks to FL

Various attacks [8, 9, 14, 27, 28] have been proposed to poison the global model in FL, all of which rely on compromised genuine clients. Based on the attacker's goal, they can be divided into two categories: untargeted model poisoning attacks [14, 28] and targeted model poisoning attacks [8, 9, 27]. Untargeted model poisoning attacks aim to decrease the test accuracy of the global model, while targeted model poisoning attacks aim to force the global model to output attacker-chosen target labels for attacker-chosen target inputs. We focus on untargeted model poisoning attacks in this work.

Existing untargeted model poisoning attacks [14, 24, 28] follow two steps in all or multiple rounds of FL. First, the compromised genuine clients compute genuine local model updates based on their genuine local training data. Then, they perturb their genuine local model updates such that the poisoned global model updates will substantially deviate from the genuine ones. These attacks require many compromised genuine clients to be effective. However, in production FL, it may not be affordable for an attacker to obtain access to a large number of compromised genuine clients [21]. Therefore, we consider a more practical scenario for model poisoning attacks, in which an attacker injects fake clients into the FL system. Unfortunately, existing attacks are not applicable to such a scenario, since they require extra knowledge about the FL system (e.g., genuine local training data), which is not available on the fake clients. We note that several works [15, 17] studied free-rider attacks with fake clients, which are orthogonal to model poisoning attacks.

2.3. Defenses against Model Poisoning Attacks

Many defenses [10, 13, 14, 19, 20, 22, 29, 30] have been proposed against model poisoning attacks to FL, which fall into two main categories. The first type of defense [10, 13, 19, 20, 29, 30] designs Byzantine-robust aggregation rules. Their idea is to mitigate the impact of statistical outliers among the local model updates. For instance, Trimmed-mean [30] removes the largest and smallest values of each coordinate in the local model updates before taking the average. The other type of defense [12, 26] aims to provide provable guarantees against poisoning attacks. For instance, Cao et al. [12] leveraged the fundamental robustness of majority voting to design an ensemble-based provably secure federated learning framework. They proved that when the number of compromised genuine clients is bounded, the predictions for test inputs are not affected by any attack. However, their derived provable security guarantee does not consider fake clients.

A recent work [21] claims that production FL with the non-robust FedAvg or classical defenses such as Trimmed-mean is already robust against untargeted model poisoning attacks that rely on compromised genuine clients, because the fraction of compromised genuine clients is small in production FL systems. However, this claim does not hold for fake-client-based model poisoning attacks. As we will show, an attacker can inject many fake clients into FL systems and perform MPAF to degrade the performance of the learnt global model.

In fact, the claim on the robustness of FedAvg is not accurate even if the attacker only has access to a small fraction of compromised genuine clients. [21] claims that FedAvg is robust because 1) the server selects a small fraction of clients in each global training round, 2) compromised genuine clients are unlikely to be selected when their fraction is small, and 3) the compromised genuine clients' impact on the global model will be eliminated during training even if they are selected in certain training rounds. However, robustness/security is about an FL system's performance in the worst-case scenarios. A compromised genuine client can substantially degrade the global model's accuracy in the scenario where it is selected near the end of the training process. Although such a worst-case scenario happens with small probability when the fraction of compromised genuine clients is small, it still invalidates the robustness of FedAvg.

3. Threat Model

3.1. Attacker's Goal

The attacker's goal is to decrease the test accuracy of the learnt global model. Specifically, a larger difference between the test accuracy of the global models with and without the attack indicates a stronger attack.

3.2. Attacker's Capability

We assume the attacker can inject many fake clients into FL systems. The attacker can control these fake clients to send arbitrary fake local model updates to the cloud server.

Compared to compromising genuine clients, the cost of injecting fake clients is much more affordable. Specifically, to compromise genuine clients, an attacker needs to bypass the anti-malware software on the clients' devices, which becomes more difficult as the anti-malware industry evolves. The attacker may also choose to pay for zombie devices that are already compromised and can be remotely accessed. However, it would be too costly to buy a large number of zombie devices. Moreover, performing the attacks on compromised devices requires the attacker to evade the anomaly detection on those systems, making it even harder.
On the contrary, it would be easy and cheap to perform attacks based on fake clients. First, an attacker can emulate fake clients using open-source projects [1], or even free software such as Android emulators on a PC [4, 6]. It is worth noting that modern Android emulators support multi-instance functionality, which means that an attacker can emulate many instances (clients) using a single machine, significantly reducing the cost. Another advantage of using fake clients is that the attacker has full control over the devices. For instance, Android emulators can grant the attacker root access to the devices, and the attacker does not need to deal with any alert that the system may raise during the attack.

3.3. Attacker's Knowledge

Existing model poisoning attacks that rely on compromised genuine clients assume the attacker has extra knowledge about the FL system, e.g., the genuine local training data on the compromised genuine clients, beyond the global models received during training. However, such an assumption often does not hold when it comes to attacks based on fake clients. Specifically, the fake clients are created by the attacker, and there are usually no genuine local training data on them. Therefore, we consider a more realistic threat model, where the attacker has no knowledge about the FL system other than the global models received during training. In particular, the attacker does not know any local training data or local model updates on any genuine client. Moreover, the attacker does not know the FL aggregation rule or the global learning rate that the cloud server uses. Since the global model is broadcast to the selected clients in each round of FL, including both genuine and fake clients, our threat model considers the scenario with minimum knowledge for the attacker.

4. Our Attack

We first discuss two baseline attacks and analyze why they are not effective. Then, we introduce our MPAF.

4.1. Baseline Attacks

A naive way of performing model poisoning attacks with limited knowledge is to use random noise as the fake local model updates. For instance, the fake clients could sample Gaussian random noise for each model parameter. They can then enlarge the magnitudes of the random local model updates using a scaling factor λ. The fake clients send the scaled random noise to the cloud server as the fake local model updates. Formally, the i-th fake client sends g_i^t = −λε to the cloud server in the t-th round, where ε is a random vector sampled from the multivariate Gaussian distribution N(0, I). We call such an attack the random attack.

Another intuitive attack based on fake clients is to estimate the benign global model update using historical information, and then generate fake local model updates that have the opposite direction. Specifically, in the t-th round, given the current global model w^t and the previous global model w^{t−1}, we can compute the global model update g^{t−1} in the (t−1)-th round as g^{t−1} = (w^t − w^{t−1})/η, where η is the global learning rate. Since the global model updates in consecutive rounds do not differ much, especially when the global model is near convergence, we can approximate the benign global model update in the t-th round as ĝ^t ≈ g^{t−1} = (w^t − w^{t−1})/η. Under our threat model, the global learning rate η is unknown to the attacker. However, an attacker does not need to know the exact magnitude of the benign global model update. Instead, the attacker can use a large scaling factor λ to scale up the fake local model updates such that their magnitudes are no smaller than those from the genuine clients. Formally, a fake client i sends g_i^t = −λ(w^t − w^{t−1}) to the cloud server in the t-th round, where the negative sign means the attacker aims to deviate the global model in the opposite direction. We call such an attack the history attack.

The two baseline attacks are intuitive. However, as we will show in Section 5, they have limited impact on the accuracy of the learnt global model when classical defenses (e.g., Trimmed-mean) are applied. We suspect that this is because the attacks are not consistent across consecutive rounds. Specifically, the attacks may successfully deviate the global model in some direction by a small step in each individual FL round. However, such deviations may have different directions in different rounds, which means they may cancel out over multiple rounds.
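The two baselines reduce to a few lines of code on each fake client. The sketch below is our illustration under the same assumptions as before (flattened NumPy vectors); the function names are not from the paper.

```python
import numpy as np

def random_attack_update(global_model, scale, rng):
    # Random attack: send scaled Gaussian noise, g_i^t = -lambda * eps,
    # where eps ~ N(0, I) has the same shape as the model parameters.
    eps = rng.standard_normal(global_model.shape)
    return -scale * eps

def history_attack_update(current_global, previous_global, scale):
    # History attack: reverse the estimated benign global model update.
    # The unknown global learning rate eta only changes the magnitude,
    # so it is absorbed into the scaling factor lambda.
    return -scale * (current_global - previous_global)

# Example usage on a fake client in round t:
#   rng = np.random.default_rng(0)
#   update = history_attack_update(w_t, w_t_minus_1, scale=1e6)
```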
[Figure 1. Illustration of MPAF. w′ is the attacker-chosen base model. w^{t−1}, w^t, and w^{t+1} are the global models in rounds t−1, t, and t+1, respectively, and w∗ is the global model learnt without attack. The fake local model updates from the fake clients drag the global model towards the base model and away from the genuine local model updates.]

4.2. MPAF

Figure 1 illustrates our MPAF. The attacker selects a base model w′ that has low test accuracy. For instance, the attacker can select a randomly initialized model as the base model, whose test accuracy is near random guessing. In MPAF, the fake clients craft their local model updates to drag the global model towards the base model. Specifically, in the t-th round of FL, the fake clients generate fake local model updates whose direction is determined by subtracting the current global model parameters from the base model parameters. Then the fake clients scale up their fake local model updates by a factor λ to amplify their impact.

The key challenge of attacks based on fake clients is that the attacker has minimum knowledge about the FL system, i.e., only the global models received during training. Therefore, finding an effective way to leverage such limited information becomes the critical component of the attack. In MPAF, our main idea is to force the global model to mimic the base model w′. Formally, we formulate our attack as the following optimization problem:

min_{g_i^t, i ∈ [n+1, n+m], t ∈ [0, T−1]} ||w^T − w′||,    (3)

where n is the number of genuine clients, m is the number of fake clients (clients n+1, n+2, ..., n+m are the fake clients), T is the number of FL rounds during training, w^T is the learnt final global model, and ||·|| denotes the ℓ2 norm. Note that our problem formulation takes the entire training process into consideration. Specifically, in every FL round, the fake clients have the same goal of deviating the final global model towards a fixed attacker-chosen base model.

We solve the optimization problem by driving the global model towards the base model in each FL round. Specifically, in the t-th round of FL, the fake clients compute the direction of the fake local model updates by subtracting the current global model from the base model, i.e., d = w′ − w^t. The global model moves closer to the base model if it is deviated in this direction. Then, the fake clients scale up d by a factor λ to amplify its magnitude. The final fake local model update for a fake client i in the t-th round is as follows:

g_i^t = λ(w′ − w^t).    (4)

An attacker can choose a large λ to guarantee that the attack is still effective after the cloud server aggregates the fake local model updates from the fake clients together with the genuine local model updates from the genuine clients.
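The per-round computation on a fake client is a one-liner. The following sketch (our illustration, again on flattened NumPy vectors) implements Eq. (4); every fake client selected in round t sends the same crafted update, so no local data and no knowledge of the aggregation rule are needed.

```python
def mpaf_fake_update(base_model, current_global, scale):
    # MPAF (Eq. 4): point from the current global model w^t towards the
    # attacker-chosen base model w', and scale by lambda so the fake
    # updates survive aggregation with the genuine updates.
    return scale * (base_model - current_global)
```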
5. Evaluation

5.1. Experimental Setup

5.1.1 Datasets and Global Model Architectures

We evaluate our attacks using multiple datasets, i.e., MNIST [16], Fashion-MNIST [25], and Purchase [7].

MNIST: MNIST [16] is a benchmark image classification dataset. There are 60,000 training examples and 10,000 testing examples of 10 classes, where each example is a hand-written digit image of size 32 × 32. Following [14], we distributed the training examples to the clients with a degree of non-IID q = 0.5 to simulate non-IID training data. We use the same CNN architecture for the global model as in [11].

Fashion-MNIST: Like MNIST, Fashion-MNIST [25] is a 10-class image classification dataset with 60,000 training examples and 10,000 testing examples. Similar to MNIST, we distribute the training examples to the clients with a degree of non-IID q = 0.5. We use the same CNN as the one for MNIST.

Purchase: Purchase [7] is a 100-class classification dataset whose goal is to predict customers' purchase styles. There are 197,324 purchase records in Purchase, each of which has 600 binary features. We split the dataset into 180,000 training records and 17,324 test records. We distribute the training data evenly to the clients. We use a fully connected neural network as the global model architecture; the network has one hidden layer with 1,024 neurons and a Tanh activation function.

5.1.2 FL and Attack Settings

For all three datasets, we assume there are n = 1,000 genuine clients in total. We define the fraction of fake clients as the number of injected fake clients divided by the number of genuine clients, i.e., m/n. By default, we assume there are m = 100 fake clients, i.e., the fraction of injected fake clients is 10%, unless otherwise mentioned. In each round of FL, the genuine clients train their local models using SGD with batch sizes of 32, 32, and 128 for MNIST, Fashion-MNIST, and Purchase, respectively. We set the global learning rate η to 0.01, 0.01, and 0.005 for the three datasets, respectively. We use different settings for different datasets to achieve high test accuracy in non-adversarial settings. In each FL round, we assume the cloud server randomly samples a fraction β of the clients to participate in training. We set the default value of β to 1, i.e., the server selects all clients in each round during training; we evaluate the impact of β in our experiments. We further set the number of FL rounds T to 200/β, 200/β, and 500/β for the three datasets, respectively, because a smaller β means fewer clients participate in each FL round and thus more rounds are needed to converge. For our attacks, we set the default value of the scaling factor to λ = 1 × 10^6 and we explore its impact. We repeat the attacks in each experiment 20 times with different random seeds and report the average results.

5.1.3 Evaluation Metric

We focus on untargeted model poisoning attacks in this work, whose goal is to decrease the test accuracy of the learnt global model. Therefore, we use the test accuracy of the learnt global models as our metric. A lower test accuracy indicates a stronger attack.
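For quick reference, the default settings above can be collected in a single configuration object. The summary below is our illustration (the key names are ours, not a configuration file from the paper):

```python
DEFAULT_SETTINGS = {
    "num_genuine_clients": 1000,   # n
    "num_fake_clients": 100,       # m, i.e., 10% fake clients by default
    "batch_size": {"MNIST": 32, "Fashion-MNIST": 32, "Purchase": 128},
    "global_learning_rate": {"MNIST": 0.01, "Fashion-MNIST": 0.01, "Purchase": 0.005},
    "client_sample_rate": 1.0,     # beta, fraction of clients sampled per round
    "base_rounds": {"MNIST": 200, "Fashion-MNIST": 200, "Purchase": 500},  # T = base_rounds / beta
    "scaling_factor": 1e6,         # lambda
    "num_trials": 20,              # repetitions with different random seeds
}
```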
[Figure 2. Test accuracy of the global models learnt by (a) FedAvg, (b) Median, and (c) Trimmed-mean under the three attacks (random attack, history attack, and MPAF) as the fraction of fake clients (%) varies. The datasets are MNIST (first row), Fashion-MNIST (second row), and Purchase (third row).]

5.2. Evaluation Results

Impact of the fraction of fake clients: We explore the impact of the fraction of fake clients on the two baseline attacks (i.e., the random attack and the history attack) and MPAF. Figure 2 shows the test accuracy of the global models learnt by different FL methods when the fraction of fake clients varies on the three datasets. We observe that when FedAvg is used, both baseline attacks and MPAF can reduce the test accuracy of the learnt global models to random guessing with only 1% fake clients. However, when classical defenses (e.g., Median and Trimmed-mean) are applied, MPAF can still significantly decrease the test accuracy while the baseline attacks cannot. For instance, on the Purchase dataset, MPAF reduces the test accuracy of the global model learnt with Trimmed-mean by 32% when there are 10% fake clients, while the baseline attacks can only decrease the test accuracy by at most 4%. Moreover, we also observe that MPAF is more effective when the fraction of fake clients is larger. For instance, on the Purchase dataset with Trimmed-mean, the test accuracy reduction achieved by MPAF increases from 32% to 49% when the fraction of fake clients increases from 10% to 25%.

Impact of the sample rate β: We evaluate the effectiveness of MPAF when the server samples different fractions of clients in each FL round. Figure 3 shows the test accuracy of the global models learnt with Trimmed-mean on all three datasets. We omit the results of the non-robust FedAvg for simplicity, as its test accuracy is consistently close to random guessing under MPAF. We observe that the sample rate β does not have much impact on MPAF and that MPAF can significantly decrease the test accuracy when β ranges from 0.01 to 1.00. The previous claim that FedAvg and classical defenses are robust to untargeted model poisoning attacks when β is small [21] does not apply to our attack, because that claim is based on the assumption that an attacker can only compromise a small fraction of genuine clients.
[Figure 3. Impact of the sample rate β on the test accuracy of the global models learnt by Trimmed-mean, with and without MPAF, on (a) MNIST, (b) Fashion-MNIST, and (c) Purchase.]

[Figure 4. Impact of the scaling factor λ on the test accuracy of the global models learnt by Trimmed-mean, with and without MPAF, on (a) MNIST, (b) Fashion-MNIST, and (c) Purchase.]

Impact of the scaling factor λ: We explore the impact of the scaling factor on MPAF. Figure 4 shows the test accuracy of the global models learnt by Trimmed-mean on all three datasets. We observe that the test accuracy first decreases as λ increases, and then remains almost unchanged as λ increases further. Our results show that even though the attacker does not know the hyperparameters of FL (e.g., the global learning rate η), by choosing a reasonably large value of λ, e.g., λ ≥ 1 in our experiments, MPAF can reduce the test accuracy of the global model significantly.

6. Norm Clipping as a Countermeasure

A recent work [23] proposed norm clipping as a countermeasure against backdoor attacks in federated learning. Specifically, the server selects a norm threshold M and clips all local model updates whose ℓ2-norm is larger than M such that their ℓ2-norm becomes M; the local model updates whose ℓ2-norms are no larger than M remain unchanged. Formally, a local model update g becomes g / max(1, ||g||_2 / M) after norm clipping. The largest ℓ2-norm of the clipped local model updates is therefore M, which limits the impact of the malicious local model updates. As a result, backdoor attacks [8] that rely on scaled local model updates have a lower attack success rate when norm clipping is adopted as a countermeasure.
We note that the idea of using norm clipping as a countermeasure is not limited to backdoor attacks. In fact, it may also be leveraged as a countermeasure against untargeted attacks that involve scaling. In MPAF, we use a scaling factor λ to increase the impact of the fake local model updates during aggregation, so it is intuitive to apply norm clipping as a countermeasure against MPAF. We empirically evaluate the effectiveness of MPAF when norm clipping is used as a countermeasure. Specifically, we use our default setting for the Fashion-MNIST dataset and Trimmed-mean as the aggregation rule. Before using Trimmed-mean to aggregate the local model updates, we clip them with norm threshold M, where we vary the value of M in our experiments. We omit the results of FedAvg for simplicity, as its test accuracy is consistently close to random guessing under MPAF.

[Figure 5. Impact of the norm clipping bound M on the test accuracy of the global model learnt by Trimmed-mean on Fashion-MNIST, with and without MPAF.]

Figure 5 shows the test accuracy of the global model learnt by Trimmed-mean on Fashion-MNIST. We use M → ∞ to represent the case where there is no norm clipping. We observe that MPAF can still effectively decrease the test accuracy of the global model when norm clipping is deployed. Specifically, under no attack, the global model achieves its largest test accuracy of 0.85 when M → ∞, whereas under MPAF, the global model achieves its largest test accuracy of 0.68 when M is around 100, which represents a 0.17 accuracy loss. We also observe that the gap between the test accuracy of the global model under MPAF and that under no attack becomes smaller as M decreases. This is because more fake local model updates are clipped as M decreases. However, as M decreases, the test accuracy under no attack also decreases, e.g., M < 100 in Figure 5 leads to a test accuracy that is much lower than that when M → ∞. This is because when M decreases, more benign local model updates are also clipped, which results in a less accurate global model. Our results indicate that MPAF is still effective in reducing the test accuracy of the global model, even if both classical defenses (e.g., Trimmed-mean) and norm clipping are adopted.

7. Conclusion and Discussion

In this work, we proposed MPAF, the first model poisoning attack to FL that is based on fake clients. We considered a minimum-knowledge setting for the attacker and showed that our attack is effective even when classical defenses and norm clipping are applied, highlighting the need for more advanced defenses against model poisoning attacks based on fake clients.

We hope our work can inspire more future studies on model poisoning attacks and their defenses. First, since it is unrealistic for an attacker to compromise a large fraction of genuine clients, it is more interesting to explore attacks based on fake clients. For instance, an interesting future work is to improve MPAF with extra knowledge, e.g., training data or a model obtained from a similar learning task.

Second, existing untargeted model poisoning attacks based on compromised genuine clients (e.g., [14]) formulate round-wise optimization problems. Specifically, in each individual round of FL, the compromised genuine clients solve an independent problem to obtain the malicious local model updates. The solutions to these independent problems may contradict each other; as a result, the malicious local model updates in different rounds may cancel each other out, leading to a sub-optimal overall attack effect. On the contrary, our MPAF leverages a simple yet effective global optimization formulation that deviates the global model towards a fixed base model.

Third, it is an interesting future direction to extend MPAF to perform targeted model poisoning attacks. Specifically, an attacker can choose a base model that has an attacker-desired targeted behavior, e.g., a backdoored base model. By forcing the learnt global model to be close to a backdoored base model, the learnt global model may inherit the same backdoor behavior as the base model and predict attacker-chosen target labels for attacker-chosen test inputs.

Acknowledgements

We thank the anonymous reviewers for their constructive reviews and comments. This work was supported by the National Science Foundation under grants No. 2112562 and 1937786, as well as the Army Research Office under grant No. W911NF2110182.
References

[1] Android-x86: Run Android on your PC. https://www.android-x86.org/.
[2] Federated learning: Collaborative machine learning without centralized training data. https://ai.googleblog.com/2017/04/federated-learning-collaborative.html.
[3] Machine Learning Ledger Orchestration for Drug Discovery (MELLODDY). https://www.melloddy.eu/.
[4] NoxPlayer, the perfect Android emulator to play mobile games on PC. https://www.bignox.com/.
[5] Utilization of FATE in risk management of credit in small and micro enterprises. https://www.fedai.org/cases/utilization-of-fate-in-risk-management-of-credit-in-small-and-micro-enterprises/.
[6] The world's first cloud-based Android gaming platform. https://www.bluestacks.com/.
[7] Acquire Valued Shoppers Challenge at Kaggle. https://www.kaggle.com/c/acquire-valued-shoppers-challenge/data. Last accessed April 2021.
[8] Eugene Bagdasaryan, Andreas Veit, Yiqing Hua, Deborah Estrin, and Vitaly Shmatikov. How to backdoor federated learning. In AISTATS, 2020.
[9] Arjun Nitin Bhagoji, Supriyo Chakraborty, Prateek Mittal, and Seraphin Calo. Analyzing federated learning through an adversarial lens. In ICML, 2019.
[10] Peva Blanchard, El Mahdi El Mhamdi, Rachid Guerraoui, and Julien Stainer. Machine learning with adversaries: Byzantine tolerant gradient descent. In NeurIPS, 2017.
[11] Xiaoyu Cao, Minghong Fang, Jia Liu, and Neil Zhenqiang Gong. FLTrust: Byzantine-robust federated learning via trust bootstrapping. In NDSS, 2021.
[12] Xiaoyu Cao, Jinyuan Jia, and Neil Zhenqiang Gong. Provably secure federated learning against malicious clients. In AAAI, 2021.
[13] Yudong Chen, Lili Su, and Jiaming Xu. Distributed statistical machine learning in adversarial settings: Byzantine gradient descent. In POMACS, 2017.
[14] Minghong Fang, Xiaoyu Cao, Jinyuan Jia, and Neil Zhenqiang Gong. Local model poisoning attacks to Byzantine-robust federated learning. In USENIX Security Symposium, 2020.
[15] Yann Fraboni, Richard Vidal, and Marco Lorenzi. Free-rider attacks on model aggregation in federated learning. In AISTATS, 2021.
[16] Yann LeCun, Corinna Cortes, and CJ Burges. MNIST handwritten digit database. Available: http://yann.lecun.com/exdb/mnist, 1998.
[17] Jierui Lin, Min Du, and Jian Liu. Free-riders in federated learning: Attacks and defenses. arXiv preprint arXiv:1911.12560, 2019.
[18] H. Brendan McMahan, Eider Moore, Daniel Ramage, Seth Hampson, and Blaise Agüera y Arcas. Communication-efficient learning of deep networks from decentralized data. In AISTATS, 2017.
[19] El Mahdi El Mhamdi, Rachid Guerraoui, and Sébastien Rouault. The hidden vulnerability of distributed learning in Byzantium. In ICML, 2018.
[20] Shashank Rajput, Hongyi Wang, Zachary Charles, and Dimitris Papailiopoulos. DETOX: A redundancy-based framework for faster and more robust gradient aggregation. In NeurIPS, 2019.
[21] Virat Shejwalkar, Amir Houmansadr, Peter Kairouz, and Daniel Ramage. Back to the drawing board: A critical evaluation of poisoning attacks on production federated learning. In S&P, 2022.
[22] Shiqi Shen, Shruti Tople, and Prateek Saxena. Auror: Defending against poisoning attacks in collaborative deep learning systems. In ACSAC, 2016.
[23] Ziteng Sun, Peter Kairouz, Ananda Theertha Suresh, and H. Brendan McMahan. Can you really backdoor federated learning? In FL-NeurIPS 2019 Workshop, 2019.
[24] Vale Tolpegin, Stacey Truex, Mehmet Emre Gursoy, and Ling Liu. Data poisoning attacks against federated learning systems. arXiv preprint arXiv:2007.08432, 2020.
[25] Han Xiao, Kashif Rasul, and Roland Vollgraf. Fashion-MNIST: A novel image dataset for benchmarking machine learning algorithms, 2017.
[26] Chulin Xie, Minghao Chen, Pin-Yu Chen, and Bo Li. CRFL: Certifiably robust federated learning against backdoor attacks. In ICML, 2021.
[27] Chulin Xie, Keli Huang, Pin-Yu Chen, and Bo Li. DBA: Distributed backdoor attacks against federated learning. In ICLR, 2020.
[28] Cong Xie, Oluwasanmi Koyejo, and Indranil Gupta. Fall of empires: Breaking Byzantine-tolerant SGD by inner product manipulation. In UAI, 2020.
[29] Cong Xie, Sanmi Koyejo, and Indranil Gupta. Zeno: Distributed stochastic gradient descent with suspicion-based fault-tolerance. In ICML, 2019.
[30] Dong Yin, Yudong Chen, Kannan Ramchandran, and Peter Bartlett. Byzantine-robust distributed learning: Towards optimal statistical rates. In ICML, 2018.
