
Data Fusion – KEN4223

Lecture 2
Taxonomy of data fusion
Federated learning as model fusion

https://towardsdatascience.com/introduction-to-ibm-federated-learning-a-collaborative-approach-to-train-ml-models-on-private-data-2b4221c3839
https://doi.org/10.1016/B978-0-444-63984-4.00001-6
Recap
“Federated learning is a machine learning setting where multiple entities (clients) collaborate in solving a machine learning problem, under the coordination of a central server or service provider. Each client’s raw data is stored locally and not exchanged or transferred; instead, focused updates intended for immediate aggregation are used to achieve the learning objective.”

Kairouz et al., Advances and open problems in federated learning, 2019.
Taxonomy of Federated Learning
Federated learning systems:
• Data partitioning: horizontal, vertical, hybrid
• Machine learning model: linear models, neural networks, …
• Privacy mechanisms: differential privacy, cryptographic methods
• Communication architecture: centralized, decentralized
• Scale of federation: cross-silo, cross-device
• Motivation for federation: incentive, regulation

Li et al., A survey on federated learning systems: vision, hype and reality for data privacy and protection, arXiv:1907.09693, 2019.
Data partitioning
[Figure: Horizontal FL vs. Vertical FL — each panel shows data from A, data from B, and the labels; in horizontal FL the parties hold different samples with the same features, in vertical FL they hold different features of the same samples.]
FEDAVG [McMahan et al.]
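The FedAvg figure from the slide is not reproduced here. As an illustration only (a minimal numpy sketch, not the slide's pseudocode), the FedAvg aggregation step averages client parameters weighted by their local sample counts:

```python
import numpy as np

def fedavg(client_weights, client_sizes):
    """FedAvg aggregation: weighted average of client model parameters.

    client_weights: list of 1-D parameter vectors, one per client
    client_sizes:   list of local sample counts n_k, one per client
    """
    coeffs = np.array(client_sizes, dtype=float) / sum(client_sizes)  # n_k / n
    return coeffs @ np.stack(client_weights)                          # sum_k (n_k / n) * w_k

# Example: three clients holding 10, 30 and 60 samples respectively
w_global = fedavg(
    [np.array([1.0, 2.0]), np.array([3.0, 4.0]), np.array([5.0, 6.0])],
    [10, 30, 60],
)
```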
Vertical FL
[Figure: vertical FL — data from A and data from B cover different features of the same samples; the labels are held by one of the parties.]
Possible application
Vertical federated learning

Yang, et al., Federated Machine Learning: Concept and Applications


Vertical federated learning
Part 1. Encrypted entity alignment
Monica Scannapieco, et al., 2007. Privacy Preserving Schema and Data Matching. https://doi.org/10.1145/1247480.1247553
Vertical federated learning
Part 2. Encrypted model training
• Step 1: collaborator C creates encryption pairs and sends the public key to A and B;
• Step 2: A and B encrypt and exchange the intermediate results needed for the gradient and loss calculations;
• Step 3: A and B compute encrypted gradients and add an additional mask, respectively; B also computes the encrypted loss; A and B send the encrypted values to C;
• Step 4: C decrypts and sends the decrypted gradients and loss back to A and B; A and B unmask the gradients and update their model parameters accordingly.
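A minimal sketch of this exchange, assuming the additively homomorphic Paillier scheme from the `phe` package; the party roles follow the steps above, but the variable names, the single gradient coordinate per party, and the fixed masks are illustrative simplifications (in practice the masks are random and the encrypted loss is exchanged as well):

```python
import numpy as np
from phe import paillier

# Step 1: collaborator C creates the key pair and shares the public key with A and B
public_key, private_key = paillier.generate_paillier_keypair(n_length=1024)

# Aligned sample: A holds features x_a, B holds features x_b and the label y
x_a, theta_a = np.array([0.5, 1.2]), np.array([0.1, -0.2])
x_b, theta_b, y = np.array([2.0]), np.array([0.3]), 3.0

# Step 2: A and B encrypt and exchange intermediate results
u_a = float(theta_a @ x_a)            # A's partial prediction
u_b = float(theta_b @ x_b)            # B's partial prediction
enc_u_a = public_key.encrypt(u_a)     # A -> B
enc_d = enc_u_a + (u_b - y)           # B: encrypted residual d = u_a + u_b - y, B -> A

# Step 3: A and B compute encrypted, masked gradients and send them to C
mask_a, mask_b = 0.17, 0.42                      # random in practice
enc_grad_a = enc_d * float(x_a[0]) + mask_a      # only one coordinate shown
enc_grad_b = enc_d * float(x_b[0]) + mask_b      # (encrypted loss omitted for brevity)

# Step 4: C decrypts the masked gradients; A and B unmask and update locally
grad_a = private_key.decrypt(enc_grad_a) - mask_a
grad_b = private_key.decrypt(enc_grad_b) - mask_b
eta = 0.01
theta_a[0] -= eta * grad_a
theta_b[0] -= eta * grad_b
```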
Existing Vertically Federated Learning Algorithms
• Linear regression
(Gascon, et al., Privacy-preserving distributed linear regression on high-dimensional data. Proceedings on Privacy Enhancing Technologies,
2017(4):345-364,2017)
• Association rule-mining
(Vaidya, Clifton, Privacy preserving association rule mining in vertically partitioned data. In Proceedings of the eighth ACM SIGKDD
international conference on Knowledge discovery and data mining, pages 639-644. ACM, 2002.)
• K-means clustering
(Vaidya, Clifton. Privacy-preserving k-means clustering over vertically partitioned data. In Proceedings of the ninth ACM SIGKDD international
conference on Knowledge discovery and data mining, pages 206-215, 2003.)
• Logistic regression
(Hardy et al., Private federated learning on vertically partitioned data via entity resolution and additively homomorphic encryption,
arXiv:1711.10677, 2017.)
• Random forest
(Liu, et al., Federated forest. arXiv:1905.10053, 2019.)
• XGBoost
(Cheng, et al., Secureboost: A lossless federated learning framework. arXiv:1901.08755, 2019.)
• …
Regression

(Zhu et al. Federated Learning on Non-IID Data: A Survey)


Linear regression
• $\eta$ – learning rate
• $\lambda$ – regularization parameter
• $\{x_i^A\}_{i \in D_A}$, $\{x_i^B, y_i\}_{i \in D_B}$ – data set
• $\theta_A$, $\theta_B$ – model parameters

Training objective:
$$\min_{\theta_A, \theta_B} \sum_i \left\| \theta_A x_i^A + \theta_B x_i^B - y_i \right\|^2 + \frac{\lambda}{2}\left( \|\theta_A\|^2 + \|\theta_B\|^2 \right)$$

Yang, et al., Federated Machine Learning: Concept and Applications


Linear regression
Let $u_i^A = \theta_A x_i^A$ and $u_i^B = \theta_B x_i^B$.

Loss:
$$\mathcal{L} = \sum_i \left( u_i^A + u_i^B - y_i \right)^2 + \frac{\lambda}{2}\left( \|\theta_A\|^2 + \|\theta_B\|^2 \right)$$

With
$$\mathcal{L}_A = \sum_i \left( u_i^A \right)^2 + \frac{\lambda}{2}\|\theta_A\|^2, \qquad \mathcal{L}_B = \sum_i \left( u_i^B - y_i \right)^2 + \frac{\lambda}{2}\|\theta_B\|^2,$$
$$\mathcal{L}_{AB} = 2 \sum_i u_i^A \left( u_i^B - y_i \right), \qquad \text{then } \mathcal{L} = \mathcal{L}_A + \mathcal{L}_B + \mathcal{L}_{AB}.$$

Let $d_i = u_i^A + u_i^B - y_i$; then the gradients are
$$\frac{\partial \mathcal{L}}{\partial \theta_A} = \sum_i d_i x_i^A + \lambda \theta_A \qquad \text{and} \qquad \frac{\partial \mathcal{L}}{\partial \theta_B} = \sum_i d_i x_i^B + \lambda \theta_B.$$
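For intuition, a centralized numpy sketch of these gradients (function and variable names are illustrative; in the federated protocol the residuals $d_i$ are exchanged in encrypted form rather than in the clear):

```python
import numpy as np

def vertical_linreg_gradients(theta_a, theta_b, X_a, X_b, y, lam):
    """Gradients of the shared objective for vertically partitioned features.

    X_a: (n, d_A) features held by A, X_b: (n, d_B) features held by B,
    y:   (n,) labels held by B, lam: regularization parameter lambda.
    """
    u_a = X_a @ theta_a                 # u_i^A
    u_b = X_b @ theta_b                 # u_i^B
    d = u_a + u_b - y                   # d_i = u_i^A + u_i^B - y_i
    grad_a = X_a.T @ d + lam * theta_a  # dL/dtheta_A
    grad_b = X_b.T @ d + lam * theta_b  # dL/dtheta_B
    return grad_a, grad_b

# One gradient-descent step with learning rate eta
rng = np.random.default_rng(0)
X_a, X_b, y = rng.normal(size=(100, 3)), rng.normal(size=(100, 2)), rng.normal(size=100)
theta_a, theta_b = np.zeros(3), np.zeros(2)
g_a, g_b = vertical_linreg_gradients(theta_a, theta_b, X_a, X_b, y, lam=0.1)
eta = 0.01
theta_a -= eta * g_a
theta_b -= eta * g_b
```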
Linear regression - training

Yang, et al., Federated Machine Learning: Concept and Applications


Linear regression - evaluation

Yang, et al., Federated Machine Learning: Concept and Applications


Linear regression - possible modification
SGD – linear regression
$$\theta_0^{i+1} = \theta_0^i - \frac{\alpha}{n} \sum_{l=1}^{n} \left( f(\boldsymbol{\theta}, \boldsymbol{x}^{(l)}) - y^{(l)} \right)$$
$$\theta_j^{i+1} = \theta_j^i - \frac{\alpha}{n} \sum_{l=1}^{n} \left( f(\boldsymbol{\theta}, \boldsymbol{x}^{(l)}) - y^{(l)} \right) x_j^{(l)}$$

Why not share $f(\boldsymbol{\theta}, \boldsymbol{x}^{(l)}) - y^{(l)}$ instead of the partial gradients?


Do we need a coordinator?

(Yang et al., Parallel Distributed Logistic Regression for Vertical Federated Learning without Third-Party Coordinator, arXiv:1911.09824)
Updates sequential or parallel?

(Liu, et al., A Communication-Efficient Collaborative Learning Framework for Distributed Features, arXiv:1912.11187)
Federated random forest

(Liu, et al., Federated Forest, arXiv:1905.10053)


Issues?
Communication efficiency

(Khan, ten Thij, Wilbik, Communication-Efficient Vertical Federated Learning, Algorithms 15(8), 273)
Issues - Non-IID data
• Linear models.
- The loss function for training logistic regression in vertical FL is identical to that in centralized learning (see the check after this list).
- Non-IID data therefore does not affect the learning performance of linear models.
• Neural networks
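A quick numerical check of the linear-model claim (a hedged sketch with synthetic data, not from the slides): splitting the features column-wise between two parties and summing their partial predictions reproduces exactly the centralized predictions, so the loss, and hence the learning behaviour, is unchanged.

```python
import numpy as np

rng = np.random.default_rng(1)
X, w = rng.normal(size=(50, 5)), rng.normal(size=5)

# Vertical split: party A holds the first 3 features, party B the remaining 2
X_a, X_b = X[:, :3], X[:, 3:]
w_a, w_b = w[:3], w[3:]

logits_central = X @ w                   # centralized linear predictor
logits_vertical = X_a @ w_a + X_b @ w_b  # sum of the parties' partial predictors
print(np.allclose(logits_central, logits_vertical))  # True
```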
Issues – performance, convergence, speed
E.g.,
• Multiple local updates
(Liu, et al., A Communication-Efficient Collaborative Learning Framework for Distributed Features,
arXiv:1912.11187)
• Using the gradient and Hessian of a Taylor approximation of the logistic regression loss
(Yang, et al. A Quasi-Newton Method Based Vertical Federated Learning Framework for Logistic Regression,
arXiv:1912.00513)
Issues - Privacy

• Cryptographic long-term key (CLK) for multiple personal identifiers

• Similarity between CLKs – Dice coefficient over the matching (set) bits
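A minimal sketch of the CLK similarity computation, assuming each CLK is a Bloom-filter-style 0/1 bit array (the encoding of the identifiers into the CLK is not shown):

```python
import numpy as np

def dice_similarity(clk_a: np.ndarray, clk_b: np.ndarray) -> float:
    """Dice coefficient between two CLKs represented as 0/1 bit arrays:
    2 * |bits set in both| / (|bits set in a| + |bits set in b|)."""
    both = int(np.sum(clk_a & clk_b))
    total = int(np.sum(clk_a)) + int(np.sum(clk_b))
    return 2.0 * both / total if total else 0.0

# Example: two 16-bit CLKs that share most of their set bits
a = np.array([1,0,1,1,0,0,1,0,1,0,0,1,0,1,0,0], dtype=np.uint8)
b = np.array([1,0,1,1,0,0,1,0,0,0,0,1,0,1,1,0], dtype=np.uint8)
print(dice_similarity(a, b))   # ~0.86, suggesting a likely match
```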
Frameworks

(arXiv:2212.00622)
VFL: research & opportunities

(arXiv:2212.00622)
VFL: research & opportunities
• Handling Dynamic Data/ Model Drift – Continual learning
• Explainability
• Fairness
• Incentive Mechanisms
• Dataset Availability

(arXiv:2212.00622)
Privacy (FL)
Preserving the Privacy of User Data
Keeping raw data local to each device is a first step (a privacy–utility trade-off).

Privacy principles
[Figure: the FL pipeline — client data, federated training over the network, early aggregation at the server, aggregate anonymous release, model deployment, deployed model — with the principle of minimizing data exposure at every stage, from collection onwards.]
Federated learning landscape - privacy
[Figure: the same pipeline annotated with privacy technologies — local differential privacy at the clients, secure multi-party computation and encryption on the network, central differential privacy at the server, through model deployment to the deployed model.]
Robustness to attacks and failures
[Figure: threats across the FL pipeline — data poisoning and model poisoning during federated training, client dropout, and evasion attacks against the deployed model.]
Backdoor attacks

(Bagdasaryan, et al., How to Backdoor Federated Learning, AISTATS’20)
Open topics
Open topics
• Going beyond empirical risk minimization formulations: tree-based methods, online learning, Bayesian learning...
• RL, unsupervised and semi-supervised, active learning?
• Support ML workflows like hyperparameter searches?
• Make trained models smaller?
• Fairness in FL?
Open topics
• Security in FL:
- how to mitigate poisoning attacks?
- how to make local computation verifiable?
• Do more with fewer clients or fewer resources per client?
• Reduce training time?
• Achieve personalization?
• Theory for FL?
• Real world applications
TRL
Exam material
• Slides
Next…
• Next week – Carnival Week
No Education!!

• 20/02/2024 (8:30 am) – Lab: Federated Learning


• 21/02/2024 – Lecture: High-Level Fusion
• 22/02/2024 – Guest lecture: Industry perspective
