Data Fusion - KEN4223
Lecture 2
Taxonomy of data fusion
Federated learning as model fusion
https://towardsdatascience.com/introduction-to-ibm-federated-learning-a-collaborative-approach-to-train-ml-models-on-private-data-2b4221c3839
https://doi.org/10.1016/B978-0-444-63984-4.00001-6
Recap
“Federated learning is a machine learning setting where multiple
entities (clients) collaborate in solving a machine learning problem,
under the coordination of a central server or service provider. Each
client’s raw data is stored locally and not exchanged or transferred;
instead focused updates intended for immediate aggregation are
used to achieve the learning objective.”
Li et al., A survey on federated learning systems: vision, hype and reality for data privacy and protection,
arXiv preprint arXiv:1907.09693, 2019.
Data partitioning
[Figure: Horizontal FL vs. Vertical FL; each panel shows data from A, data from B, and labels]
FEDAVG [McMahan et al.]
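The aggregation step of FedAvg can be sketched as follows. This is a minimal illustration, not the reference implementation: the helper names (`fedavg`, `local_sgd`, `fedavg_round`) and the least-squares local objective are assumptions chosen for brevity. Each client trains locally from the current global weights, and the server averages the results weighted by local sample count.

```python
def fedavg(client_weights, client_sizes):
    """Average client parameter vectors, weighted by local sample count."""
    total = sum(client_sizes)
    dim = len(client_weights[0])
    return [
        sum(w[j] * n for w, n in zip(client_weights, client_sizes)) / total
        for j in range(dim)
    ]


def local_sgd(w, data, lr=0.05, epochs=1):
    """Hypothetical local step: SGD on a linear least-squares model."""
    w = list(w)
    for _ in range(epochs):
        for x, y in data:
            err = sum(wj * xj for wj, xj in zip(w, x)) - y
            w = [wj - lr * err * xj for wj, xj in zip(w, x)]
    return w


def fedavg_round(global_w, client_datasets, lr=0.05, epochs=1):
    """One round: every client trains locally, the server averages."""
    updates = [local_sgd(global_w, d, lr, epochs) for d in client_datasets]
    sizes = [len(d) for d in client_datasets]
    return fedavg(updates, sizes)


# Two clients holding samples of y = 2x; raw data never leaves a client.
clients = [[([1.0], 2.0), ([2.0], 4.0)], [([3.0], 6.0)]]
w = [0.0]
for _ in range(50):
    w = fedavg_round(w, clients)
```

Only the parameter vectors are exchanged in this sketch; in practice a round usually involves a sampled subset of clients rather than all of them.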
Vertical FL
[Figure: data from A, data from B, labels]
Possible application
Vertical federated learning
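Training objective:
$$\min_{\theta_A,\theta_B} \sum_i \left\| \theta_A x_i^A + \theta_B x_i^B - y_i \right\|^2 + \frac{\lambda}{2}\left( \|\theta_A\|^2 + \|\theta_B\|^2 \right)$$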
(arXiv:2212.00622)
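A small sketch of how this objective splits across the two parties, reading it as a squared error on the summed partial predictions plus an L2 penalty with weight λ/2. The function names and the plaintext exchange are assumptions for illustration: practical VFL protocols protect the exchanged intermediate values (for example with homomorphic encryption), which is omitted here. Party B is assumed to hold the labels.

```python
def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))


def vfl_loss(theta_a, theta_b, xa, xb, y, lam):
    """Squared error on the summed partial predictions plus L2 penalty."""
    residuals = [
        dot(theta_a, a) + dot(theta_b, b) - yi for a, b, yi in zip(xa, xb, y)
    ]
    data_term = sum(r * r for r in residuals)
    reg = 0.5 * lam * (dot(theta_a, theta_a) + dot(theta_b, theta_b))
    return data_term + reg


def vfl_gradients(theta_a, theta_b, xa, xb, y, lam):
    """Each party needs only the shared residuals and its own features."""
    u_a = [dot(theta_a, a) for a in xa]  # computed by party A
    u_b = [dot(theta_b, b) for b in xb]  # computed by party B
    # Residuals are formed where the labels live (party B in this sketch).
    d = [ua + ub - yi for ua, ub, yi in zip(u_a, u_b, y)]
    grad_a = [
        2 * sum(di * a[j] for di, a in zip(d, xa)) + lam * theta_a[j]
        for j in range(len(theta_a))
    ]
    grad_b = [
        2 * sum(di * b[j] for di, b in zip(d, xb)) + lam * theta_b[j]
        for j in range(len(theta_b))
    ]
    return grad_a, grad_b


# Same two samples, features split between party A and party B.
xa = [[1.0], [2.0]]
xb = [[0.5], [1.0]]
y = [1.0, 2.0]
loss = vfl_loss([1.0], [0.0], xa, xb, y, lam=2.0)
grads = vfl_gradients([1.0], [0.0], xa, xb, y, lam=2.0)
```

Neither party's raw feature columns appear in the other's gradient computation; only the per-sample partial predictions and residuals cross the boundary.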
VFL: research & opportunities
• Handling Dynamic Data / Model Drift – Continual Learning
• Explainability
• Fairness
• Incentive Mechanisms
• Dataset Availability
(arXiv:2212.00622)
Privacy (FL)
Preserving the Privacy of User Data
Keeping raw data local to each device is a first step
[Figure: privacy vs. utility]
Privacy principles
[Figure: FL pipeline (client data, network, federated training, server, model deployment, deployed model) annotated with: minimize data exposure, early aggregation, anonymous collection, aggregate release]
Federated learning landscape - privacy
[Figure: FL pipeline (client data, network, federated training, server, model deployment, deployed model) annotated with: local differential privacy, central differential privacy, secure multi-party computation, encryption]
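As a rough illustration of the central differential privacy setting, the sketch below clips each client update to a bounded L2 norm and adds Gaussian noise to the aggregate on the server. The names `clip`, `dp_average`, `clip_norm`, and `noise_multiplier` are assumptions for this example, and calibrating the noise to a specific privacy budget is not shown. In the local variant, each client would instead add noise before sending its update.

```python
import math
import random


def clip(update, clip_norm):
    """Scale an update down so its L2 norm is at most clip_norm."""
    norm = math.sqrt(sum(v * v for v in update))
    scale = min(1.0, clip_norm / norm) if norm > 0 else 1.0
    return [v * scale for v in update]


def dp_average(updates, clip_norm, noise_multiplier, rng=None):
    """Clip every update, sum, add Gaussian noise, then average."""
    rng = rng or random.Random()
    clipped = [clip(u, clip_norm) for u in updates]
    sigma = noise_multiplier * clip_norm  # noise scales with the clip bound
    dim = len(clipped[0])
    noisy_sum = [
        sum(u[j] for u in clipped) + rng.gauss(0.0, sigma) for j in range(dim)
    ]
    return [s / len(clipped) for s in noisy_sum]


# With noise_multiplier=0 this reduces to a plain average of clipped updates.
avg = dp_average([[3.0, 4.0], [0.0, 0.0]], clip_norm=5.0, noise_multiplier=0.0)
```

Clipping bounds how much any single client can shift the aggregate, which is what lets the added noise mask an individual contribution.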
Robustness to attacks and failures
[Figure: FL pipeline (federated training, model deployment) annotated with: data poisoning, model poisoning, client dropout, evasion attacks]
Backdoor attacks