Federated Learning: Challenges, Methods, and Future Directions
Federated learning involves training statistical models over remote devices or siloed data centers, such as mobile phones or hospitals, while keeping data localized. Training in heterogeneous and potentially massive networks introduces novel challenges that require a fundamental departure from standard approaches for large-scale machine learning, distributed optimization, and privacy-preserving data analysis. In this article, we discuss the unique characteristics and challenges of federated learning, provide a broad overview of current approaches, and outline several directions of future work that are relevant to a wide range of research communities.
Introduction
Mobile phones, wearable devices, and autonomous vehicles
are just a few of the modern distributed networks generating
a wealth of data each day. Due to the growing computational
power of these devices, coupled with concerns over transmitting private information, it is increasingly attractive to store
data locally and push network computation to the edge.
The concept of edge computing is not a new one. Indeed,
computing simple queries across distributed, low-powered
devices is a decades-long area of research that has been
explored under the purview of query processing in sensor networks, computing at the edge, and fog computing [6], [30].
Recent works have also considered training machine learning
models centrally but serving and storing them locally; for
example, this is a common approach in mobile user modeling
and personalization [23].
However, as the storage and computational capabilities of
the devices within distributed networks grow, it is possible to
leverage enhanced local resources on each device. In addition, privacy concerns over transmitting raw data require user-generated data to remain on local devices. This has led to a
growing interest in federated learning [31], which explores
training statistical models directly on remote devices. The
term device is used throughout the article to describe entities
in the communication network, such as nodes, clients, sensors, or organizations.
Digital Object Identifier 10.1109/MSP.2020.2975749
Date of current version: 28 April 2020
FIGURE 1. An example application of federated learning for the task of next-word prediction on mobile phones. To preserve the privacy of the text data
and reduce strain on the network, we seek to train a predictor in a distributed fashion, rather than sending the raw data to a central server. In this setup,
remote devices communicate with a central server periodically to learn a global model. At each communication round, a subset of selected phones
performs local training on their nonidentically distributed user data, and sends these local updates to the server. After incorporating the updates, the
server then sends back the new global model to another subset of devices. This iterative training process continues across the network until convergence
is reached or some stopping criterion is met.
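The round structure described in this caption can be sketched in a few lines of Python. This is a minimal toy simulation, not the authors' implementation: the `Device` class, the one-parameter least-squares objective, the learning rate, and the sampling fraction are all illustrative assumptions.

```python
import random

class Device:
    """A toy remote device holding local (non-IID) data."""
    def __init__(self, data):
        self.data = data  # list of scalars private to this device

    def local_update(self, w, lr=0.1):
        # One local gradient step on a one-parameter least-squares
        # objective: minimize the mean of (w - x)^2 over local data.
        grad = sum(2.0 * (w - x) for x in self.data) / len(self.data)
        return w - lr * grad

def federated_round(global_w, devices, fraction=0.5):
    """One communication round: sample a subset of devices, have each
    train locally on its own data, then average the local models."""
    k = max(1, int(fraction * len(devices)))
    selected = random.sample(devices, k)
    local_models = [d.local_update(global_w) for d in selected]
    return sum(local_models) / len(local_models)
```

Iterating `federated_round` until the global model stops changing mirrors the stopping criterion described in the caption.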
FIGURE 2. (a) The distributed (minibatch) SGD. Each device, k, locally computes gradients from a minibatch of data points to approximate ∇Fk(w), and the
aggregated minibatch updates are applied on the server. (b) The local updating schemes. Each device immediately applies local updates, e.g., gradients, after
they are computed, and a server performs a global aggregation after a (potentially) variable number of local updates. Local updating schemes can reduce
communication by performing additional work locally.
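The contrast between panels (a) and (b) can be made concrete with a short sketch: (a) applies one aggregated minibatch gradient per communication, while (b) lets each device take several local steps before a single aggregation. The quadratic per-device gradient and the step size of 0.1 are assumptions for illustration.

```python
def minibatch_step(w, device_grads):
    """(a) Distributed minibatch SGD: the server averages one gradient
    per device and applies a single update per communication round."""
    avg_grad = sum(device_grads) / len(device_grads)
    return w - 0.1 * avg_grad

def local_updating_round(w, grad_fn, num_devices, local_steps=5):
    """(b) Local updating: each device k applies local_steps updates
    to its own copy of the model, and the server then averages the
    copies -- one communication instead of local_steps."""
    local_models = []
    for k in range(num_devices):
        w_k = w
        for _ in range(local_steps):
            w_k -= 0.1 * grad_fn(k, w_k)  # update applied immediately
        local_models.append(w_k)
    return sum(local_models) / num_devices
```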
Systems heterogeneity
In federated settings, there is significant variability in the systems characteristics across the network, as devices may differ in terms of hardware, network connectivity, and battery power. As depicted in Figure 4, these systems characteristics make issues such as stragglers significantly more prevalent than in typical data center environments. We roughly group several key directions used to handle systems heterogeneity into 1) asynchronous communication, 2) active device sampling, and 3) fault tolerance. As mentioned in the "Decentralized Training" section, we assume a star topology for the discussions presented in the following section.
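As a toy illustration of direction 2), active device sampling can bias selection toward devices that are likely to complete a round. The `charging`, `idle`, and `battery` attributes below are hypothetical, chosen only to make the idea concrete.

```python
def sample_active_devices(devices, k):
    """Prefer devices that are charging and idle; fall back to the
    full pool only when too few devices are currently eligible."""
    eligible = [d for d in devices if d["charging"] and d["idle"]]
    pool = eligible if len(eligible) >= k else devices
    # Rank by battery level so low-power devices are chosen last.
    return sorted(pool, key=lambda d: d["battery"], reverse=True)[:k]
```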
Asynchronous communication
FIGURE 3. Centralized versus decentralized topologies. In the typical federated learning setting and as a focus of this article, we assume (a) a star network where a server connects with all the remote devices. (b) Decentralized topologies are a potential alternative when communication to the server becomes a bottleneck.
In traditional data center settings, synchronous (i.e., workers waiting for each other for synchronization) and asynchronous (i.e., workers running independently without synchronization) schemes are both commonly used to parallelize iterative optimization algorithms, with each approach having advantages and disadvantages [37], [53]. Synchronous schemes are
FIGURE 4. Systems heterogeneity in federated learning. Devices may vary in terms of network connection, power, and hardware. Moreover, some of the
devices may drop at any time during training. Therefore, federated training methods must tolerate heterogeneous systems environments and low participation of devices, i.e., they must allow for only a small subset of devices to be active at each round.
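One way to meet the low-participation requirement in Figure 4 is an aggregation rule that simply skips devices that failed or timed out during the round. Representing a dropped device's update as `None` is an assumption of this sketch, not a detail from the text.

```python
def aggregate_surviving(updates):
    """Average only the updates that actually arrived; devices that
    dropped out during the round are reported as None and ignored."""
    arrived = [u for u in updates if u is not None]
    if not arrived:
        return None  # no progress this round; resample devices
    dim = len(arrived[0])
    return [sum(u[i] for u in arrived) / len(arrived) for i in range(dim)]
```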
FIGURE 5. An illustration of different privacy-enhancing mechanisms in one round of federated learning. M denotes a randomized mechanism used to
privatize the data. (a) Federated learning without additional privacy protection mechanisms, (b) global privacy, where a trusted server is assumed, and (c)
local privacy, where the central server may be malicious.
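A common instantiation of the mechanism M in panel (c) is to clip each update's norm and add Gaussian noise on-device, so that the (possibly malicious) server only ever sees privatized updates. The clipping threshold and noise scale below are illustrative assumptions; a real deployment would calibrate the noise to a formal differential-privacy budget.

```python
import math
import random

def privatize_update(update, clip_norm=1.0, noise_std=0.5):
    """Local-privacy sketch of M: clip the update's L2 norm, then add
    independent Gaussian noise before the update leaves the device."""
    norm = math.sqrt(sum(x * x for x in update))
    scale = min(1.0, clip_norm / norm) if norm > 0 else 1.0
    clipped = [x * scale for x in update]
    return [x + random.gauss(0.0, noise_std) for x in clipped]
```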