Personalized Federated Learning: A Combinational Approach
Sone Kyaw Pye, Han Yu
1 Introduction

Federated Learning (FL) is a distributed machine learning (ML) approach in which multiple users, referred to as clients, collaboratively train a global model without transferring data from local storage to a central server [1]. Ideally, an FL model performs better than individual models trained only on each client's data because it benefits from more training data. FL can be further classified into two scenarios, cross-device and cross-silo FL, with the difference between them being the number of clients: the latter has significantly fewer clients but more data per client [2]. FL's distributed data paradigm contrasts with traditional ML, which requires data to be stored in a single location and thus brings about concerns in terms of communication costs and privacy. Significant communication overheads can be incurred in transferring data from devices to a central location, and such data centralization also raises privacy concerns.

This paper addresses this gap and evaluates various combinations of personalization approaches in scenarios of plain FL, differentially private FL (DP-FL), and robust aggregation FL (RA-FL). We observe that existing personalization approaches affect different aspects of the FL process: for example, MoE does not modify the FL model, FT further trains the FL model after federated training, and KD and MTL modify FT. Therefore, we evaluate all possible combinations of these approaches. Our main contributions in this paper are as follows:

• We demonstrate that, for certain clients, FL does not provide enough performance incentive to be part of the federation of clients, and that incorporating DP and RA can further reduce that incentive due to performance degradation.
• We propose combinations of common personalization approaches.
• We empirically show that these combinations yield better performance gains than standalone personalization approaches and compensate for performance degradation.
• We observe that certain combinations are more impactful in certain scenarios and tasks while others improve performance across the board, and that a combination of approaches tends to be better than individual ones.

The rest of the paper is organized as follows. Section 2 presents background on FL, DP, RA, and personalization approaches. Section 3 presents our experimental setup, Section 4 presents the results and analysis, and Section 5 concludes.
2 Background

2.1 Federated Learning

A typical FL training process encompasses the following steps:

1. Selection of training participants: In each FL training round t = 1, ..., T, the server randomly samples m clients. This selection only occurs in cross-device FL, as cross-silo FL involves all clients due to the small number of total clients.
2. Distribution of Initial Global Model: The clients selected in Step 1 download the latest model G^{t-1} from the server.
3. Local Training: Each client trains G^{t-1} for K epochs using its local data and computes an update P_i^t to G^{t-1}.
4. Aggregation: The server collects the updates and averages them into an updated global model G^t using Federated Averaging (FedAvg) with aggregation learning rate η:

    G^t = G^{t-1} + \frac{\eta}{m} \sum_{i=1}^{m} \left( P_i^t - G^{t-1} \right)    (1)

This process can continue as long as new data is available for training from clients and there are clients eligible for training.
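As a concrete illustration (not part of the original experimental code), the FedAvg update of Equation 1 can be sketched in a few lines of NumPy, assuming model parameters are represented as flattened vectors; the function and variable names are our own.

    import numpy as np

    def fedavg_step(global_model, client_updates, eta=1.0):
        """One FedAvg aggregation round (Eq. 1).

        global_model:   flattened parameter vector G^{t-1}
        client_updates: list of locally trained parameter vectors P_i^t
        eta:            aggregation learning rate
        """
        m = len(client_updates)
        deltas = [p - global_model for p in client_updates]       # P_i^t - G^{t-1}
        return global_model + (eta / m) * np.sum(deltas, axis=0)  # G^t

    # Example: three clients, five parameters each
    g = np.zeros(5)
    updates = [g + 0.1 * np.random.randn(5) for _ in range(3)]
    g_new = fedavg_step(g, updates)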
2.2 Differential Privacy & Robust Aggregation

DP limits the information learnable about clients from model updates or FL models [8]. However, DP degrades the performance of the FL model. In formal terms, differential privacy (DP) provides an (ε, δ) privacy guarantee when the federated mechanism M and two sets of users Q, Q' that differ by one participant produce models in any set G with probabilities that satisfy:

    \Pr[M(Q) \in G] \le e^{\epsilon} \Pr[M(Q') \in G] + \delta    (2)

To incorporate DP into FL, each client's update is clipped and Gaussian noise N(0, σ) is added. Referencing Equation 1, aggregation is modified as follows:

    G^t = G^{t-1} + \frac{\eta}{m} \sum_{i=1}^{m} \mathrm{Clip}\left( P_i^t - G^{t-1}, S \right) + N(0, \sigma)    (3)

where S is the clipping bound and N(0, σ) is the added noise. These values depend on the number of clients: the fewer the clients, the larger the magnitude of clipping and noise that must be added to preserve privacy [8]. This makes DP incompatible with cross-silo FL, with its small number of clients.
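A minimal sketch of the differentially private aggregation in Equation 3, assuming flattened parameter vectors and L2-norm clipping; the helper names and parameters are illustrative rather than taken from any particular DP library.

    import numpy as np

    def dp_fedavg_step(global_model, client_updates, S, sigma, eta=1.0, rng=None):
        """DP aggregation (Eq. 3): clip each client delta to norm S, then add Gaussian noise."""
        if rng is None:
            rng = np.random.default_rng()
        m = len(client_updates)
        clipped = []
        for p in client_updates:
            delta = p - global_model
            norm = np.linalg.norm(delta)
            clipped.append(delta * min(1.0, S / (norm + 1e-12)))  # Clip(P_i^t - G^{t-1}, S)
        noise = rng.normal(0.0, sigma, size=global_model.shape)   # N(0, sigma)
        return global_model + (eta / m) * np.sum(clipped, axis=0) + noise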
RA is a suggested defense against poisoning attacks by malicious clients. RA replaces FedAvg: instead of averaging the updates as FedAvg does, the geometric median is used [9]. Typically, poisoning attacks involve scaling model weights or using poisoned data to train the FL model before sending the poisoned updates for aggregation. RA reduces the impact that statistical outliers have on the model weights, as only the median weight, to which outliers do not contribute, is used. RA is represented as:

    G^t = G^{t-1} + \eta \left( \tilde{P}^t - G^{t-1} \right)    (4)

where \tilde{P}^t is the element-wise median of the updates acquired by the server performing the aggregation in round t of FL training. RA has also been shown to degrade the performance of models [7].
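A sketch of the robust aggregation rule in Equation 4 using the element-wise (coordinate-wise) median of the client updates; this is an illustrative NumPy version, not the exact implementation of [9].

    import numpy as np

    def robust_aggregation_step(global_model, client_updates, eta=1.0):
        """Robust aggregation (Eq. 4): move G toward the element-wise median update."""
        updates = np.stack(client_updates)      # shape: (m, num_params)
        p_median = np.median(updates, axis=0)   # element-wise median ~P^t
        return global_model + eta * (p_median - global_model)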
2.3 Personalization of FL Models

Numerous personalization approaches have been proposed, and most can be categorized into the following archetypes:

Finetuning (FT): The FL model is further trained on a client's local data after federated training. The intuition is akin to transfer learning, where knowledge acquired from a global pool of data is leveraged to learn better local features instead of learning from scratch on a limited local pool of data [10]. A variant of FT, called freeze-base FT (FB), involves freezing some model layers, such as the base layers, and leaving only the top layers unfrozen [7].
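The freeze-base variant can be sketched in PyTorch by disabling gradients on the base layers and finetuning only the remaining ones. This is a minimal sketch, not the authors' implementation; the layer split shown (everything except the final fully connected layer of a ResNet18) is an assumption for illustration.

    import torch
    import torchvision

    model = torchvision.models.resnet18(num_classes=10)  # e.g., the FL model after training

    # Freeze all layers ...
    for param in model.parameters():
        param.requires_grad = False
    # ... then unfreeze only the top (classification) layer for local finetuning.
    for param in model.fc.parameters():
        param.requires_grad = True

    optimizer = torch.optim.SGD(
        [p for p in model.parameters() if p.requires_grad], lr=0.01
    )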
Multi-task learning (MTL): An MTL problem involves solving multiple related tasks together using commonalities across tasks [11]. In FL, training the FL model and personalizing it can be treated as related tasks [12]. In [7], the FL training process is treated as task X and personalization for a client as task Y to formulate an MTL problem. The aim is to take G^T, which is optimized for X, and optimize it for Y, producing a personalized model A.

This optimization can be viewed as an extension of FT with a different loss function. To address possible catastrophic forgetting [13] of X while optimizing for Y, elastic weight consolidation [14] is used [7] to reduce the rate of learning on layers/weights that are critical for X. As such, the cross-entropy loss is augmented:

    l(A, x) = L_{cross}(A, x) + \sum_{i} \frac{\lambda}{2} F_i \left( A_i - G_i^T \right)^2    (5)

where λ is the importance of task X relative to task Y, F is the Fisher information matrix, and i indexes each parameter.
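A sketch of the augmented loss in Equation 5, assuming the diagonal of the Fisher information matrix has already been estimated on the global task; the function signature, dictionary layout, and names are illustrative, not the original code.

    import torch
    import torch.nn.functional as F

    def ewc_loss(model, global_params, fisher, inputs, targets, lam):
        """Cross-entropy plus the elastic-weight penalty of Eq. 5.

        global_params / fisher: dicts mapping parameter names to tensors
        (G^T_i and F_i), e.g. precomputed from the global FL model.
        """
        loss = F.cross_entropy(model(inputs), targets)  # L_cross(A, x)
        penalty = 0.0
        for name, param in model.named_parameters():
            penalty = penalty + (fisher[name] * (param - global_params[name]) ** 2).sum()
        return loss + (lam / 2.0) * penalty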
Knowledge distillation (KD): KD involves extracting the learned features of a teacher model to teach a student model [15]. In FL, by treating the FL model (G^T) as the teacher and the personalized model (A) as the student and using a loss function from the knowledge distillation literature, KD can be viewed as an extension of FT. Like MTL, the cross-entropy loss function is augmented as such:

    l(A, x) = \alpha K^2 L_{cross}(A, x) + (1 - \alpha) \, KL\left( \sigma\left( \frac{G^T(x)}{K} \right), \sigma\left( \frac{A(x)}{K} \right) \right)    (6)

where KL is the Kullback-Leibler divergence loss, σ is the softmax function, α is the weight parameter, and K is the temperature constant.
our study. The compatible approaches (FT, MTL, KD, MoE) have CIFAR-10 FL Dataset: CIFAR-10 [28] is a well-known dataset
not been studied on both cross-silo and cross-device FL scenarios, for image classification. For cross-device FL, the training set is
and a comparison across scenarios, tasks, and combinations of divided into 100 subsets, each acting as a client. Following [29],
personalization approaches has not been done before. to simulate non-iid-ness in the dataset amongst clients, each client
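The MoE-style ensemble is simply a weighted average of the two models' outputs. A minimal sketch, assuming both models produce class logits and that α has been chosen (e.g., tuned on a local validation set); the function name is our own.

    import torch

    def moe_predict(fl_model, domain_expert, x, alpha):
        """Weighted-average ensemble: y = alpha * G^T(x) + (1 - alpha) * DE_i(x)."""
        with torch.no_grad():
            p_global = torch.softmax(fl_model(x), dim=1)      # G^T(x)
            p_local = torch.softmax(domain_expert(x), dim=1)  # DE_i(x)
        return alpha * p_global + (1.0 - alpha) * p_local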
Meta-learning: A meta-learner trains a model on similar tasks with the aim of adapting quickly to a new but similar task despite limited data for the new task [17]. For FL, meta-learning treats personalization for each client as one of these similar tasks [18].

Although these approaches have been studied individually, no study has explored the efficacy of combining personalization approaches. As certain archetypes, such as meta-learning, are incompatible with the others, they are not included in our study. The compatible approaches (FT, MTL, KD, MoE) have not been studied in both cross-silo and cross-device FL scenarios, and a comparison across scenarios, tasks, and combinations of personalization approaches has not been done before.

3 Methodology

This study explores how different combinations of personalization approaches impact the performance of FL across various tasks and scenarios. The tasks and datasets used are listed in Table 1.
Table 1: FL Tasks and Datasets

Scenario       Task                   Dataset         Clients
Cross-Silo     Image Classification   Office          3
                                      DomainNet       5
               Text Classification    Cross-Sector    3
                                      Cross-Product   5
Cross-Device   Image Classification   CIFAR-10        100
               Next Word Prediction   Reddit          80,000
3.1 Datasets

As datasets made explicitly for FL are still rare, it is common to retrofit ML datasets into FL ones by dividing them into subsets, each representing a client. Domain adaptation datasets, which are already divided into domains, can also be used, with each domain representing a client. The subsections that follow elaborate on the six datasets we used.

Office Dataset: This dataset contains three domains/clients: Amazon, Webcam, and DSLR, representing the cross-silo FL scenario for image classification. Each client contains images from Amazon or images taken using a webcam or a DSLR camera [19]. The unequal number and different origin of the images for each client simulate the non-iid nature of FL data.
DomainNet Dataset: This dataset contains five domains/clients: Infograph, Painting, Quickdraw, Real, and Sketch [20], with each client having a different form of visual representation of the same classes of objects. This dataset is used for the cross-silo FL scenario for the task of image classification. Only a subset of the entire DomainNet dataset was used for this project in the interest of time and available computation resources. The subset has seventeen randomly chosen classes but retains the five domains of the full dataset.

Cross-Product Dataset: This dataset comprises five smaller consumer review datasets of different product categories, each acting as a client: Amazon-branded products, Alexa-branded products, food, phones, and headphones [21-25]. Reviews are rated from 1 to 5. The datasets were obtained from Kaggle. This dataset is used for cross-silo FL for text classification.
Cross-Sector Dataset: This dataset comprises three smaller consumer review datasets of different customer service-related sectors, each acting as a client: Amazon, Yelp, and Hotel [21,26,27]. All reviews are rated from 1 to 5. The datasets were obtained from Kaggle. This dataset is used for the cross-silo FL setting for the text classification task.

CIFAR-10 FL Dataset: CIFAR-10 [28] is a well-known dataset for image classification. For cross-device FL, the training set is divided into 100 subsets, each acting as a client. Following [29], to simulate non-iid-ness amongst clients, each client is allocated images from each class using a Dirichlet distribution with α = 0.9. For evaluating the FL model on each client, unlike the other datasets where each client has its own test set, the original CIFAR-10 test set is used, with the model's per-class accuracy multiplied by the corresponding class's ratio in the client's training set and summed up.
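The non-iid split and the per-client evaluation scheme described above can be sketched as follows; the Dirichlet sampling with α = 0.9 follows [29], while the helper names and data structures are our own illustration.

    import numpy as np

    def dirichlet_partition(labels, num_clients=100, alpha=0.9, rng=None):
        """Allocate the sample indices of each class to clients via Dirichlet(alpha)."""
        if rng is None:
            rng = np.random.default_rng()
        client_indices = [[] for _ in range(num_clients)]
        for c in np.unique(labels):
            idx = np.flatnonzero(labels == c)
            rng.shuffle(idx)
            proportions = rng.dirichlet(alpha * np.ones(num_clients))
            splits = np.split(idx, (np.cumsum(proportions)[:-1] * len(idx)).astype(int))
            for client, split in zip(client_indices, splits):
                client.extend(split.tolist())
        return client_indices

    def client_weighted_accuracy(per_class_accuracy, client_labels):
        """Weight the global test set's per-class accuracy by the client's class ratios."""
        counts = np.bincount(client_labels, minlength=len(per_class_accuracy))
        ratios = counts / counts.sum()
        return float(np.sum(per_class_accuracy * ratios))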
Reddit Dataset: For next-word prediction with cross-device FL, the dataset from [29], which is made up of posts from 80,000 Reddit users from November 2017, is used [30]. A corpus of the 50,000 most frequent words is used for the task, with the remaining words replaced by the <unk> token. The data for each user was split into training and testing sets in a ratio of 90:10.
3.2 Tasks & Model Architecture

Image Classification: The ResNet18 model architecture [31] with randomly initialized weights was used for the image classification tasks. Stochastic gradient descent (SGD) was used as the optimizer for all experiments, as most federated learning works currently use SGD. The metric was top-1 accuracy.

The FL model was trained for 100 rounds for cross-silo FL, with all clients participating in every round. For cross-device FL, the FL model was trained for 1000 rounds, with each round involving ten randomly selected clients. For both scenarios, local training for each client was for two epochs.

Text Classification: A CNN model with word embeddings for sentence classification [32] was used. The FL model was trained for 20 rounds, with two epochs of local training for each client and all clients participating in every round.

Next-Word Prediction: For next-word prediction, a two-layer LSTM model with 200 hidden units and 10 million parameters was used, following [29]. The FL model was trained for 2000 rounds with 100 randomly selected clients participating in each round. For personalization of the FL model, 8000 clients were randomly selected for the personalization experiments rather than all clients.
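Per-round local training as described above (SGD, two local epochs per participating client) can be sketched as follows; the model, data loader, and learning rate are placeholders rather than the exact experimental configuration.

    import torch

    def local_training(model, data_loader, epochs=2, lr=0.01):
        """Train the received global model on a client's local data for a few epochs."""
        optimizer = torch.optim.SGD(model.parameters(), lr=lr)
        loss_fn = torch.nn.CrossEntropyLoss()
        model.train()
        for _ in range(epochs):
            for inputs, targets in data_loader:
                optimizer.zero_grad()
                loss = loss_fn(model(inputs), targets)
                loss.backward()
                optimizer.step()
        return model.state_dict()  # P_i^t, sent back to the server for aggregation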
3.3 Personalization Approaches

To form combinations of different personalization approaches, we need to consider the cross-compatibility of the approaches and where each of them augments the traditional FL process. FT is universally compatible and is the basis for the other approaches except for MoE. FB is a modification of FT, so it is universally compatible as well. MTL and KD modify FT/FB's loss function, so they are mutually exclusive in combinations. MoE comes in after local personalization is done through a combination of FT/FB and KD/MTL. As such, the combinations of personalization approaches to be explored can be found in Table 2.
Table 2: Combinations of Approaches

Combination   FT   FB   KD   MTL   MoE
1             ✓
2             ✓         ✓
3             ✓              ✓
4                   ✓
5                   ✓    ✓
6                   ✓         ✓
7                                   ✓
8             ✓                     ✓
9             ✓         ✓           ✓
10            ✓              ✓      ✓
11                  ✓               ✓
12                  ✓    ✓          ✓
13                  ✓         ✓     ✓
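The compatibility rules above (FT or FB as an optional base, KD or MTL only on top of a base, MoE freely combinable, and plain FL excluded) can be checked programmatically. The following sketch, which is our own illustration rather than the original code, enumerates the valid combinations and recovers the thirteen rows of Table 2.

    from itertools import product

    BASES = [None, "FT", "FB"]    # FT and FB are mutually exclusive
    LOSSES = [None, "KD", "MTL"]  # KD and MTL both modify the FT/FB loss,
                                  # so they are mutually exclusive

    combinations = []
    for base, loss, moe in product(BASES, LOSSES, [False, True]):
        if loss and not base:     # KD/MTL only exist on top of FT or FB
            continue
        combo = [c for c in (base, loss, "MoE" if moe else None) if c]
        if combo:                 # drop the empty combination (plain FL)
            combinations.append(combo)

    print(len(combinations))      # 13, matching Table 2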
… since each client contributes a significant proportion of the overall dataset compared to the cross-device scenario.

Figure 1: Performance comparison between local and FL models

4.2 FL with DP and RA

With DP and RA, performance degrades further as expected, as seen in Figure 2, which compares the average accuracy of the tasks for normal FL against DP-FL and RA-FL where applicable. The results suggest that RA-FL causes more performance degradation than DP-FL in cross-device tasks. This could be due to the non-iid data together with median aggregation, which takes the median update belonging to a single client, even though that client might have a skewed distribution of data.
4.4.3 Combinations with KD and MTL. Although KD and MTL on their own are not effective personalization approaches, when combined with MoE there is a greater degree of performance improvement. This could be due to KD and MTL acting to create a personalized FL model that is more influenced by the global pool of data, since both approaches introduce additional influences in the loss function based on the original global FL model. This effect would allow an even wider distribution of features to be accessible to the MoE ensemble, thus increasing performance to a greater degree than FL+FT+MoE. This effect is more pronounced in cross-device FL, with the KD/MTL combinations obtaining the best performance in five out of six setups, compared to three out of eight setups for cross-silo FL. Between KD and MTL, MTL appears to be the better personalization approach in terms of performance, with MTL outperforming KD in 39 setups, KD outperforming MTL in 12 setups, and the two being of equal performance in 5 setups. As such, MTL should be used in cross-device FL scenarios and tasks when possible. KD does not appear to be a viable alternative to MTL in terms of performance. For cross-silo FL scenarios and tasks, the suitability of MTL would be limited to tasks such as image classification and not text classification ones.
4.4.4 Effect of Combination Approaches on RA-FL and DP-FL. All of the combination approaches explored managed to compensate for the degradation caused by the additional privacy and integrity features of RA-FL and DP-FL. Such an effect would re-incentivize clients that may not have joined, or that left, when the setup was plain FL without personalization. These individual and combination approaches also do not incur much additional overhead in terms of resources like time and space, and the mechanism for implementing them is already mostly available through the implementation of the FL framework. Combination approaches clearly provide a better performance gain compared to individual ones, and therefore FL framework implementations should always try to include combination approaches as part of their personalization solutions.

4.4.5 Best Combination Approach for Cross-Silo and Cross-Device FL. Across the eight tasks in cross-silo FL, for image classification tasks, the best combination of personalization approaches contains FT/FB, MoE, and MTL. For text classification tasks, the best combination of personalization approaches contains just FT with MoE. Across the six tasks in cross-device FL, the best combination of personalization approaches contains FT, MoE, and MTL.

5 Conclusions and Future Work

The success of federated learning systems depends on the number of clients that participate, and this is influenced by the benefits that clients get in terms of performance gains, as well as the protections such systems offer, such as privacy and integrity. We have shown that, due to the statistical heterogeneity present across clients' data and the addition of privacy and integrity protection, the performance of FL systems can suffer, sometimes to the point where non-participation is favored. Personalization of FL models, either through standalone approaches or combined ones, can reverse this performance degradation and even bring additional gains in performance, without the need for significant additional resources. Among the combinations of personalization approaches explored for both cross-silo and cross-device FL, combinations with finetuning, mixture of experts, and multi-task learning gave the best performance gains.

Future work could take the form of further exploration of different FL tasks beyond the domains of computer vision and natural language processing, more varied personalization approaches that go beyond the typical archetypes presented here, or the vertical FL scenario, since this project focused solely on the horizontal FL scenario.
Acknowledgements

This research is supported, in part, by the National Research Foundation, Singapore under its AI Singapore Programme (AISG2-RP-2020-019); the Joint NTU-WeBank Research Centre on Fintech (NWJ-2020-008); the Nanyang Assistant Professorship (NAP); the RIE 2020 Advanced Manufacturing and Engineering Programmatic Fund (A20G8b0102), Singapore; and the SDU-NTU Centre for AI Research (C-FAIR), Shandong University, China. Any opinions, findings and conclusions or recommendations expressed in this material are those of the authors and do not reflect the views of the funding agencies.

REFERENCES
[1] H. McMahan, E. Moore, D. Ramage, S. Hampson and B. Arcas, "Communication-efficient learning of deep networks from decentralized data", in Proceedings of the 20th International Conference on Artificial Intelligence and Statistics, pages 1273–1282, 2017. Available: https://research.google/pubs/pub44822/. [Accessed: 6- Mar- 2021].
[2] P. Kairouz et al., "Advances and Open Problems in Federated Learning", Google Research, 2020. [Online]. Available: https://research.google/pubs/pub49232/. [Accessed: 6- Mar- 2021].
[3] "General Data Protection Regulation (GDPR) Compliance Guidelines", GDPR.eu. [Online]. Available: https://gdpr.eu. [Accessed: 6- Mar- 2021].
[4] G. Annas, "HIPAA Regulations — A New Era of Medical-Record Privacy?", New England Journal of Medicine, vol. 348, no. 15, pp. 1486-1490, 2003.
[5] T. Li, A. Sahu, M. Zaheer, M. Sanjabi, A. Talwalkar and V. Smith, "Federated Optimization in Heterogeneous Networks", arXiv.org, 2018. [Online]. Available: https://arxiv.org/abs/1812.06127. [Accessed: 6- Mar- 2021].
[6] L. Melis, C. Song, E. De Cristofaro and V. Shmatikov, "Exploiting Unintended Feature Leakage in Collaborative Learning", arXiv.org, 2019. [Online]. Available: https://arxiv.org/abs/1805.04049. [Accessed: 6- Mar- 2021].
[7] T. Yu, E. Bagdasaryan and V. Shmatikov, "Salvaging Federated Learning by
Local Adaptation", arXiv.org, 2021. [Online]. Available:
https://arxiv.org/abs/2002.04758. [Accessed: 6- Mar- 2021].
[8] H. McMahan, D. Ramage, K. Talwar and L. Zhang, "Learning differentially
private recurrent language models", in International Conference on Learning
Representations (ICLR), 2018 [Online]. Available:
https://arxiv.org/abs/1710.06963. [Accessed: 6- Mar- 2021].
[9] X. Chen, T. Chen, H. Sun, Z. Wu and M. Hong, "Distributed training with
heterogeneous data: Bridging median and mean based algorithms", Arxiv.org,
2019. [Online]. Available: https://arxiv.org/pdf/1906.01736v1.pdf. [Accessed:
6- Mar- 2021].
[10] K. Wang, R. Mathews, C. Kiddon, H. Eichner, F. Beaufays and D. Ramage,
"Federated Evaluation of On-device Personalization", arXiv.org, 2019.
[Online]. Available: https://arxiv.org/abs/1910.10252v1. [Accessed: 6- Mar-
2021].
[11] R. Caruana, "Multi-task learning", Cs.cornell.edu, 1997. [Online]. Available:
https://www.cs.cornell.edu/~caruana/mlj97.pdf. [Accessed: 6- Mar- 2021]
[12] V. Smith, C.-K. Chiang, M. Sanjabi, and A. S. Talwalkar, "Federated multi-task
learning", in Advances in Neural Information Processing Systems, 2017
[Online]. Available: https://proceedings.neurips.cc/paper/7029-federated-multi-
task-learning.pdf. [Accessed: 6- Mar- 2021]
[13] R. French, "Catastrophic forgetting in connectionist networks", Trends in
Cognitive Sciences, vol. 3, no. 4, pp. 128-135, 1999.
[14] J. Kirkpatrick, et al. "Overcoming catastrophic forgetting in neural networks".
Proc. NAS, 114(13):3521–3526, 2017.
[15] G. Hinton, O. Vinyals and J. Dean, "Distilling the Knowledge in a Neural
Network", arXiv.org, 2015. [Online]. Available:
https://arxiv.org/abs/1503.02531v1. [Accessed: 6- Mar- 2021]
[16] D. Peterson, P. Kanani, and V. J. Marathe, "Private Federated Learning with
Domain Adaptation", arXiv.org, 2019. [Online] Available:
http://arxiv.org/abs/1912.06733. [Accessed: 11- Mar- 2021]
[17] C. Finn, P. Abbeel, and S. Levine, "Model-Agnostic Meta-Learning for Fast
Adaptation of Deep Networks," arXiv.org, 2017. [Online]. Available:
http://arxiv.org/abs/1703.03400. [Accessed: 6- Mar- 2021]
[18] Y. Jiang, J. Konečný, K. Rush, and S. Kannan, "Improving Federated Learning
Personalization via Model Agnostic Meta Learning," arXiv.org, 2019. [Online].
Available: http://arxiv.org/abs/1909.12488. [Accessed: 6- Mar- 2021]
[19] "Domain Adaptation - UC Berkeley", Domain Adaptation Project. [Online]. Available: https://people.eecs.berkeley.edu/~jhoffman/domainadapt. [Accessed: 6- Mar- 2021]
[20] X. Peng, Q. Bai, X. Xia, Z. Huang, K. Saenko and B. Wang, "Moment Matching for Multi-Source Domain Adaptation", arXiv.org, 2018. [Online]. Available: https://arxiv.org/abs/1812.01754. [Accessed: 6- Mar- 2021]
[21] "Consumer Reviews of Amazon Products", Kaggle.com. [Online]. Available: https://www.kaggle.com/datafiniti/consumer-reviews-of-amazon-products. [Accessed: 6- Mar- 2021]
[22] "Amazon Alexa Reviews | Kaggle", Kaggle.com. [Online]. Available: https://www.kaggle.com/sid321axn/amazon-alexa-reviews. [Accessed: 6- Mar- 2021]
[23] "Amazon Fine Food Reviews | Kaggle", Kaggle.com. [Online]. Available: https://www.kaggle.com/snap/amazon-fine-food-reviews. [Accessed: 6- Mar- 2021]
[24] "Amazon Reviews: Unlocked Mobile Phones | Kaggle", Kaggle.com. [Online]. Available: https://www.kaggle.com/PromptCloudHQ/amazon-reviews-unlocked-mobile-phones. [Accessed: 6- Mar- 2021]
[25] "Headphone Reviews | Kaggle", Kaggle.com. [Online]. Available: https://www.kaggle.com/pbabvey/headphone-reviews. [Accessed: 6- Mar- 2021]
[26] "Hotel Reviews | Kaggle", Kaggle.com. [Online]. Available: https://www.kaggle.com/datafiniti/hotel-reviews. [Accessed: 6- Mar- 2021]
[27] "Yelp Reviews Dataset | Kaggle", Kaggle.com. [Online]. Available: https://www.kaggle.com/omkarsabnis/yelp-reviews-dataset. [Accessed: 6- Mar- 2021]
[28] A. Krizhevsky, "Learning Multiple Layers of Features from Tiny Images", 2009.
[29] E. Bagdasaryan and V. Shmatikov, "Differential Privacy Has Disparate Impact on Model Accuracy", arXiv.org, 2021. [Online]. Available: https://arxiv.org/abs/1905.12101. [Accessed: 6- Mar- 2021].
[30] "Reddit comments". [Online]. Available: https://bigquery.cloud.google.com/dataset/fh-bigquery:reddit_comments. [Accessed: 6- Mar- 2021]
[31] K. He, X. Zhang, S. Ren and J. Sun, "Deep Residual Learning for Image Recognition", arXiv.org, 2015. [Online]. Available: http://arxiv.org/abs/1512.03385. [Accessed: 6- Mar- 2021]
[32] Y. Kim, "Convolutional Neural Networks for Sentence Classification", arXiv.org, 2014. [Online]. Available: http://arxiv.org/abs/1408.5882. [Accessed: 6- Mar- 2021]