Federated Learning For 6G Communications: Challenges, Methods, and Future Directions
arXiv:2006.02931v2 [cs.NI] 12 Jul 2020

Abstract—As the 5G communication networks are being widely deployed worldwide, both industry and academia have started to move beyond 5G and explore 6G communications. It is generally believed that 6G will be established on ubiquitous Artificial Intelligence (AI) to achieve data-driven Machine Learning (ML) solutions in heterogeneous and massive-scale networks. However, traditional ML techniques require centralized data collection and processing by a central server, which is becoming a bottleneck of large-scale implementation in daily life due to significantly increasing privacy concerns. Federated learning, as an emerging distributed AI approach with privacy preservation nature, is particularly attractive for various wireless applications, especially being treated as one of the vital solutions to achieve ubiquitous AI in 6G. In this article, we first introduce the integration of 6G and federated learning and provide potential federated learning applications for 6G. We then describe key technical challenges, the corresponding federated learning methods, and open problems for future research on federated learning in the context of 6G communications.

Index Terms—6G communication, federated learning, security and privacy protection

I. INTRODUCTION

move towards the next generation of wireless technology, i.e., Sixth-generation (6G) [1], [2].

The 5G system represents a new wireless communication paradigm that adopts a service-based architecture (SBA) instead of a communication-oriented architecture (COA) to achieve “connected things”. In contrast to previous generations, 6G with transformative technologies will revolutionize the development of wireless communication from “connected things” to “connected intelligence” [1]. Specifically, 6G will revolutionize technology in three areas: new media, new services, and new infrastructures. It is expected that the 6G system will adopt advanced artificial intelligence (AI) technologies in these fields, and promptly and efficiently collect, transmit, and learn data anytime, anywhere to generate a large number of innovative applications and intelligent services [3]. In particular, ubiquitous AI will empower the promising 6G, a hyper-flexible architecture that brings human-centric development concepts to all aspects of network systems, instead of data-centric, machine-centric, and application-centric [4]. Therefore, 6G communications have higher-level security and stronger privacy protection requirements.

However, traditional Machine Learning (ML) empowered
[10]; (iv) model training and inference efficiency problems among massive-scale 6G networks. We then propose advanced federated learning methods to address the above challenges from different perspectives. Finally, we describe the open research topics and future directions of FL in 6G communications.

II. PRELIMINARIES AND OVERVIEW

A. Key 6G Requirements and Use Cases

Unlike 5G communications, 6G has prominent features to ensure ubiquitous, seamless, intelligent, high-performance connectivity and networking with security and privacy protection. More specifically, we introduce the performance requirements of 6G communications as follows.

1) High Performance Networking: It is commonly believed that 6G is a complex networking system with many heterogeneous space-air-ground-underwater communication networks [2], [4]. The three-dimensional super-connectivity networks provide worldwide connectivity and integrated networking to enable different types of network services and dense coverage through sub-networks and sub-systems, e.g., satellite communication networks and underwater-land communications. With the help of massive-scale heterogeneous networks, 6G communications can achieve up to 1 Tbps data rate per user, ultra-low end-to-end delay, superior end-to-end reliability, and high energy efficiency networking [2]. Compared with 5G communications, 6G communications support networking and connectivity not only in dense areas but also in less dense areas, such as the underwater environment, in an efficient and low-overhead manner [4]. 6G communications employ novel communication networks to support highly diversified data, e.g., audio, video, and AR/VR data, which offers a new communication experience with virtual existence and involvement anywhere [2].

2) Higher Energy Efficiency: In the 6G era, there exist higher energy efficiency requirements for wireless devices with charging constraints and battery life limitations. Therefore, long battery life and low energy consumption are two popular research topics for 6G communications. To address the energy problems of wireless devices, especially smartphones, existing studies have proposed energy harvesting technology, wireless power transfer technology, and green communication to improve energy efficiency and extend the working time of wireless devices [4]. In particular, wireless devices can harvest energy from ambient radio-frequency (RF), solar, geothermal, and wind sources by using different energy harvesting technologies, which can prolong battery life. Similarly, wireless devices with wireless charging equipment can obtain supplementary energy from dense network infrastructures or mobile charging stations, e.g., Unmanned Aerial Vehicles (UAVs) and Electric Vehicles (EVs), through wireless power transfer technology.

Recently, to address energy problems for wireless devices, an emerging technology named symbiotic radio (SR) has been introduced to integrate passive backscatter devices with an active transmission system [11], [12]. A typical example of SR is ambient backscatter communication, which enables network devices to utilize ambient RF signals to transmit information without requiring active RF transmission, making battery-free communication possible [11], [13]. Smart energy management
CHINA COMMUNICATIONS 3
is another promising mechanism with the goal of dynamically optimizing the balance between energy demand and supply [11].

For green communication techniques, AI-based solutions are important for optimizing energy usage and energy scheduling for wireless devices in dynamic environments with complex optimization goals. Advanced machine learning techniques, such as deep reinforcement learning, can be utilized to optimize the computation task offloading decision of a wireless device and to find the best schedule of working and sleeping time, which can lower energy consumption and enhance energy efficiency. AI-based solutions can also be applied to multi-hop information routing in cooperative relay communication and to communication infrastructure deployment in network-densification 6G scenarios, which significantly reduces the transmit power of the wireless devices by avoiding long propagation distances, thus enabling high-efficiency communication [2], [4].

3) High Security and Privacy: Existing research mainly focuses on network throughput, reliability, and delay in 4G and 5G communications [4]. However, in the past few decades, wireless communication security and privacy issues have been ignored to some extent. Since data security and privacy issues are closely related to users’ lives, protecting data security and privacy has become a very important part of human-centric 6G communications. Meanwhile, communication/data service providers legally collect a large amount of user information, which leads to frequent leakage of private data. In order to solve this problem, it is envisaged that FL techniques can be used to achieve privacy-enhanced deep learning in 6G networks.

4) High Intelligence: The high intelligence of 6G will help provide users with high-quality, personalized, and intelligent services. Highly intelligent 6G includes operational intelligence, application intelligence, and service intelligence as follows.
• Operational Intelligence. Traditional network operations involve a series of resource optimization and multi-objective performance optimization problems [1]. In order to achieve a satisfactory level of network operation, optimization methods based on game theory, contract theory, etc. are widely used. However, these optimization theories may not obtain the optimal solution in scenarios with large-scale time-varying variables and multiple objectives. With the development of deep learning technologies, the above problems can be solved by using advanced machine learning technologies. On the other hand, the emergence of federated learning has transformed the multi-objective linear optimization problems into a nonlinear optimization problem, thus finding the best solution for complex and time-varying decisions in operational intelligence [2].
• Application Intelligence. At present, applications related to 5G networks are gradually becoming intelligent. For 6G networks, intelligent applications are a baseline application requirement [9], [14]. FL-empowered wireless communication technologies enable devices to connect with 6G networks to run a variety of intelligent applications. For example, in the future, users may need intelligent voice assistants to complete their daily schedules [15]. The ubiquitous AI of the 6G network will provide users with highly intelligent applications.
• Service Intelligence. Furthermore, as a human-centric network, the high intelligence of the 6G network will provide intelligent services in a satisfactory and personalized manner [1], [2], [4]. For example, FL provides users with personalized healthcare services, personalized recommendation services, and personalized intelligent voice services in a distributed learning manner. In the future, intelligent services will be tightly integrated with the 6G networks [4].

5) Increased Device Density: Compared with 5G, 6G offers much higher transmission rates, shorter delay, greater device density, and the integration of Artificial Intelligence (AI). With the increased device density and explosively increasing data traffic, it is increasingly important to solve the network capacity challenges. One of the potential solutions is to provide more but smaller radio cells that can transmit data quickly and energy-efficiently. These cells are required to be connected as seamlessly as possible to the fiber-optic core networks via high-performance transmission links. An important goal is to connect these wireless transmission links directly to fiber-optic networks without complex electronics. Thus the fiber-optic networks can provide extremely high transmission capacity and reliability for massive devices with insignificant latency through flexible and ubiquitous wireless networks [16].

6) Green Communication: Green communication depends on making good decisions to optimize resource utilization and communication efficiency. In 6G communication scenarios, due to massive network traffic, innumerable network devices, and dynamic network environments, there exist more and more complex resource optimization problems, e.g., green communication optimization and offloading decisions, that traditional mathematical programming techniques and optimization solutions cannot tackle. Recently, data science and AI-based optimization have also been widely used to solve problems related to resource optimization and task assignment in distributed systems because of their advantages of data-driven decisions, dynamic flexibility, and self-adjustment.

B. Typical Use Scenarios

Compared with previous generations, the 5G service model has been transformed into a service-based architecture, and its use cases include: enhanced Mobile Broadband (eMBB), massive Machine-Type Communications (mMTC), and ultra-Reliable Low-Latency Communications (uRLLC). As shown in Fig. 2, driven by Industry 5.0 and deep learning technologies, 6G will provide the following new service types:
• New Media. With the rapid development of wireless network communication technologies, it can be expected that the form of information interaction will gradually evolve from AR/VR to high-fidelity extended reality (XR) interaction after 10 years, and even to wireless holographic communication [1]–[4]. Users can enjoy the new services brought by holographic communication
and holographic display anytime and anywhere, such as virtual education, virtual tourism, virtual sports, virtual painting, virtual concerts, and other fully immersive holographic experiences.
• New Services. According to ITU-T’s 6G communication technology white paper, beyond and high-precision teleport technology will provide users with a variety of new services [1]–[4], [17]. Holographic teleport, quantum communication, visible light communication (VLC), and other communication technologies have subverted the traditional service model. For example, industries such as remote surgery [18], cloud PLC [19], and intelligent transportation systems [20] will be empowered by the new service model to provide users with better services. The goal of these new technologies is to provide high-precision services, deterministic services, and best-guaranteed services.
• New Infrastructure. With the development of deep learning technologies, the 6G communication system has spawned many emerging infrastructures such as Integrated Terrestrial and Space [19], federated learning networks [14], decentralized infrastructures [1], and trustable infrastructure [4]. In particular, the FL network benefits from the high bandwidth and low latency of the 6G network, which has brought many emerging intelligent applications to cities, factories, and people.

[Figure: Human-centric 6G Communications. New Media: XR, virtual education, virtual painting, virtual tourism. New Services: remote surgery, cloud PLC, intelligent transportation system. New Infrastructure: integrated terrestrial and space, federated networks, trustable infrastructure.]

C. Federated Learning

In this subsection, we introduce an FL-based distributed learning architecture in 6G. In this architecture, a large number of decentralized devices associated with different services can collaboratively train a shared global model (e.g., anomaly detection, recommendation system, next-word prediction, etc.) by using locally collected datasets.

As shown in Fig. 3, the procedure of the FL-based architecture is divided into three phases: the initialization, the training, and the aggregation phase. In the initialization phase, a device evaluates its service requests, needs, and connection conditions, and decides whether to register with the nearest cloud to join the training of the shared global model via a wired or wireless connection (e.g., 6G). Then, the cloud acting as task publisher randomly selects a subset of devices from the registered devices to participate in this round of training, and rejects the remaining registered devices. The cloud also sends the initialized or pre-trained global model $\omega_t$ to each selected device (steps ①, ②). In the training phase, each selected device trains the global model $\omega_t^k \leftarrow \omega_t$ on its local dataset to obtain the updated model $\omega_{t+1}^k$ in each iteration. In particular, for the $k$-th device ($k \in \{1, 2, \ldots, K\}$), the loss function needs to be optimized as follows: $\arg\min_{\omega \in \mathbb{R}} F_k(\omega)$, $F_k(\omega) = \frac{1}{D_k} \sum_{i \in D_k} f_i(\omega)$, where $D_k$ denotes the size of the local dataset that contains input-output vector pairs $(x_i, y_i)$, $x_i, y_i \in \mathbb{R}$, $\omega$ is the local model parameter, and $f_i(\omega)$ is a local loss function (e.g., the squared-error loss $f_i(\omega) = \frac{1}{2}(x_i^T \omega - y_i)^2$). Each selected device uploads the model updates to the cloud (steps ③, ④, ⑤). In the aggregation phase, the cloud receives the model updates of all selected devices and aggregates them to obtain a new global model $\omega_{t+1}$ for the next iteration, i.e., $\omega_{t+1} \leftarrow \omega_t - \frac{1}{K} \sum_{k=1}^{K} F_k(\omega)$, where $K$ denotes the number of edge nodes. In the next round,
the device selected by the cloud downloads the latest global model $\omega_{t+1}$ from the cloud. The device uses the received new global model to update its own model. In the next round of training, the cloud randomly selects a new device subset and repeats the above process until the trained model converges or meets the stopping criteria (step ⑥).

Fig. 3. An overview of federated learning process in 6G [24]. [Figure: smart phones, IoT sensors, and edge nodes load a local model and train it on local datasets, upload the model updates to the cloud (uplink), and download the global model after model aggregation (downlink), following steps ① to ⑥. Side panels: efficient and effective issues (communication-efficient federated learning, efficient federated learning); security and privacy issues (secure federated learning, privacy-preserving federated learning).]

III. CORE CHALLENGES FOR FEDERATED LEARNING IN 6G

In this section, we introduce the core challenges of FL, which are the main bottlenecks to be resolved before large-scale deployment of FL in 6G applications.

A. Challenge 1: Expensive Communication

Since FL involves thousands of devices participating in model training, communication is a critical bottleneck for FL being widely used in 6G [14]. Previous studies [6], [8], [10], [24]–[28] have made many efforts to improve the communication efficiency of FL systems. Furthermore, it is challenging to synchronize communication in FL networks with the local computation of the devices [8], [10], [29]. To make the FL model suitable for 6G networks with massive, heterogeneous devices and networks, it is necessary to develop a communication-efficient method that greatly reduces the number of gradients exchanged between the devices and the cloud instead of exchanging all gradient information. In order to further reduce communication overhead in this setting, two key aspects need to be considered: (i) reducing the total number of communication rounds, or (ii) reducing the number of gradients in each communication round.

B. Challenge 2: Security Problems

Since 6G networks can provide ubiquitous services across a wider geographic area, the computing and communication capabilities of each device in the network may vary due to differences in hardware (CPU, GPU), network connectivity (4G, 5G, 6G, WiFi), and energy (battery level). The system heterogeneity between the devices will bring some confusion and faults to the FL model and the 6G network [9], [14]. Additionally, there may be unreliable devices in FL, which may cause Byzantine failures of the system. Similarly, adversaries may launch active learning-based attacks (such as poisoning attacks and backdoor attacks) on heterogeneous devices and cause errors in the FL system. The security vulnerabilities of these FL systems greatly exacerbate challenges such as mitigating attacks and tolerating faults. Therefore, a secure and robust FL system must: (i) defend against malicious attacks, (ii) tolerate heterogeneous hardware, and (iii) achieve robust aggregation algorithms.

C. Challenge 3: Privacy Concerns

Although FL protects the privacy of each device by sharing model updates (e.g., gradient information) instead of the raw data, private data can still be disclosed during the interaction between the device and the cloud [30]. For example, adversaries can launch membership inference [30] or gradient leakage attacks [31] to steal local training data from the devices. Previous work has focused on using tools such as secure multi-party computation (SMC) or homomorphic encryption (HE) to enhance the privacy of FL, but these methods cannot address the above malicious attacks [14]. SMC
…

D. Challenge 4: Effective Issues

Deploying FL models to devices generally involves model training and inference [14]. If model training and inference are relatively slow, users will not be able to experience real-time intelligent services [32]. Therefore, when FL systems are widely deployed in 6G networks, they will encounter the following challenges: (i) the FL model is too large to fit on a single device; (ii) the FL model training is too slow to meet the delay requirements of the 6G network; (iii) the FL model inference is too slow to satisfy users’ real-time demands. Efficient training and inference are necessary for the integration of FL and 6G networks. However, it is challenging for FL systems to achieve efficient model training and inference in a massive, heterogeneous network.

IV. ADVANCED FEDERATED LEARNING METHODS FOR 6G

To address the aforementioned challenges, we propose advanced federated learning systems through different emerging technologies or methods to enable communication-efficient, secure, privacy-enhanced, and effective federated learning, respectively.

A. Communication-efficient Federated Learning For 6G

In 6G, devices that span a larger geographic area can obtain a better global model, but only at the cost of huge communication overhead. The communication overhead affects the gradient exchange between the devices and the cloud, and thus the model aggregation at the cloud. Therefore, we need to find a more efficient way to achieve FL training. In this subsection, we explain communication-efficient FL from the system-level and algorithm-level perspectives, which promotes wider FL deployment and usage for 6G communications.

1) Communication-efficient FL: System Level: From a system perspective, data distribution (e.g., non-independent and identically distributed data), device distribution (e.g., heterogeneous devices across regions and networks), computation methods (e.g., decentralized and centralized), and communication mechanisms (e.g., synchronous and asynchronous schemes) have different impacts on communication efficiency in different application scenarios [8].
• Asynchronous FL System: As shown in Fig. 4, an asynchronous FL system (AFLS) can reduce the computation time of the devices by asynchronously aggregating the model updates, thereby improving the communication efficiency of FL. Let $\kappa = \frac{Comm}{Comp + Comm}$, where $\kappa$ represents communication efficiency, $Comm$ is the communication time, and $Comp$ is the computation time. It can be seen from Fig. 4 that the $Comp$ of the asynchronous model update scheme is shorter than that of the synchronous one, so the communication efficiency $\kappa$ of the asynchronous model update scheme is higher than that of the synchronous one.

Fig. 4. The overview of the synchronous and asynchronous FL. [Figure: per-device timelines (…, Device i) of computation and communication phases under synchronous federated learning and asynchronous federated learning.]

2) Communication-efficient FL: Algorithm Level: At the algorithm level, communication-efficient FL can reduce the number of communication rounds needed to train a model by accelerating convergence [29] and reduce the communication cost of each round by using gradient compression techniques [10] (e.g., sparsification, quantization, etc.). More details are described below.
• Accelerating Model Convergence: Stochastic gradient descent (SGD) algorithms based on zero-order, first-order, second-order, and federated optimization are used to reduce the number of rounds of model training [8]. Since the federated optimization method can protect the private data on each device, it is a particularly popular choice for accelerating model convergence.
• Reducing Communication Overhead: Gradient sparsification and gradient quantization can greatly reduce the number of gradients exchanged between the devices and the cloud to achieve communication-efficient FL. Lin et al. in [10] proposed a Top-k selection-based gradient compression scheme to improve communication efficiency. In this scheme, the gradients are compressed 300 times to reduce the number of gradients exchanged without compromising accuracy.

B. Secure Federated Learning For 6G

Due to the wide range of 6G network connections, FL will suffer malicious attacks from heterogeneous networks, heterogeneous devices, and malicious participants during the training process [9]. To alleviate this problem, researchers have proposed many different defense solutions from three perspectives: aggregation algorithm, detection mechanism, and reputation management.

1) Robust Aggregation Algorithm: Aggregation is a very important operation in the FL training process that directly affects model convergence. The motivation of the robust aggregation algorithm is to greatly reduce the impact of low-quality model updates generated by malicious devices (i.e., poisoning attacks) on global model training. Furthermore, this method allows the cloud to tolerate Byzantine failures of some devices [33], [34]. For example, Ang et al. in [35] proposed the regularizer approximation method to reduce the noise interference of heterogeneous devices and heterogeneous networks.

2) Robust Detection Mechanism: Another intuitive idea is to detect malicious devices to prevent them from participating
in FL training. Such mechanisms generally use the accuracy of the sub-model generated by a device as the evaluation metric to detect malicious devices. Liu et al. in [9] utilized smart contract techniques in the blockchain to design a malicious device detection mechanism to alleviate the malicious attack problems.

3) Reliable Reputation Management: The historical behaviors of a device can be used as a key indicator to evaluate its reliability and trustworthiness through a metric named reputation. A high reputation value indicates a more reliable device. Inspired by this, establishing a reputation management scheme for device historical behaviors in FL can also prevent malicious devices from damaging the global model. Kang et al. in [15] proposed a reputation management scheme that calculates the historical reputation of the devices to achieve robust FL with high-reputation devices.

C. Privacy-preserving Federated Learning For 6G

1) Differential Privacy: Differential privacy (DP) [36] techniques are proposed to protect the privacy of gradient information, thereby achieving cloud-level privacy protection. Geyer et al. in [37] applied the DP technique in an FL system to protect cloud-level privacy. Similarly, in order to protect user-level privacy, local differential privacy (LDP) techniques achieve this goal by perturbing the gradients uploaded by the devices [30]. However, DP and LDP technologies enhance FL privacy at the expense of model performance. Therefore, advanced methods that balance privacy and performance have been developed, as described below.

2) Deep Net Pruning: Neural network pruning is a deep learning technique whose goal is to develop a smaller and more efficient neural network. Recently, Huang et al. [38] utilized pruning as an equivalent technique of DP to protect the privacy of the FL system while ensuring the model performance. Treating model pruning as equivalent to DP techniques provides new opportunities for balancing utility and privacy.

3) Gradient Compression: The reason why adversaries can infer the local data of the devices is that the gradient information contains rich semantic information [31]. Inspired by this, an intuitive idea is to use methods that disrupt the distribution of gradient information, thus protecting gradient privacy. Zhu et al. in [31] showed that gradient compression can defend against gradient leakage attacks without compromising accuracy and that the defense effect is better than that of DP.

D. Effective Federated Learning For 6G

The long-term goal of human-centric communication services in 6G networks is to handle user needs in real time. Therefore, it is necessary to achieve efficient FL in both training and inference.

1) Efficient Training: Efficient training can greatly reduce the training time of mobile devices to achieve efficient FL. The advanced training methods are summarized as follows.
• Federated Parallelization: Data and model parallelization are generally used to accelerate model training. Data parallelization achieves efficient training by running multiple training samples in parallel [39], [40]. Model parallelization accelerates model training by splitting the model over multiple processors [32].
• Federated Distillation: Model distillation adopts transfer learning to utilize the output of a pre-trained complex model (i.e., the Teacher model) as a supervised signal to train another simple network, i.e., the Student model. Training student models in this way improves the efficiency of model training. Jeong et al. in [26] proposed federated distillation (FD), an efficient distributed model training algorithm whose communication payload size is much smaller than that of the FL benchmark scheme, especially when the model size is large.

2) Efficient Inference: The size of existing FL models is too large to realize real-time inference on the devices. Efficient inference can be achieved in the following ways.
• Pruning: Pruning is a model optimization technique that removes excess weights in the weight tensor. The compressed neural network not only runs faster but also reduces the computational cost of training the network, which is a critical step in deploying the model to mobile phones or other edge devices.
• Weight Sharing: Weight sharing reduces the number of model parameters by sharing weights, thereby achieving efficient model inference. The reason is that the fewer parameters the model has, the smaller the model size. Tran et al. in [27] utilized a weight sharing approach for wireless networks to improve model inference efficiency.

V. OPEN RESEARCH TOPICS AND FUTURE DIRECTIONS

A. Trustworthy Federated Learning

1) Privacy-enhanced Federated Learning: Previous work on FL has covered user-level or cloud-level privacy for all devices in the 6G networks. However, in practice, the previous schemes provide strict privacy restrictions at the expense of accuracy [14]. Because the industry is very concerned about the accuracy of the FL model, it is essential to develop privacy-enhanced techniques that provide strict privacy guarantees without compromising accuracy. To this end, a few studies are exploring potential solutions. For example, Huang et al. [38] recently proposed a net pruning technique that provides strict privacy guarantees by replacing the DP technique with pruning, and also improves the training efficiency of the model. Developing methods that can balance efficiency and privacy restrictions is an interesting and ongoing direction for future work.

2) Security-enhanced Federated Learning: Since FL systems normally involve multiple entities such as devices, the cloud, and machine learning model providers, they are vulnerable to malicious attacks from adversaries against different entities. Although existing work has made considerable efforts to provide strong security protection for FL systems, there is little work on defending against or mitigating these malicious attacks from the system perspective. Bonawitz et al. in [28] explored several more secure and robust aggregation algorithms and fault tolerance mechanisms from the perspective of system design.
The security-enhanced techniques are designed from a system perspective so that FL can support more practical industrial applications with the help of 6G networks.
3) Fair Federated Learning: FL involves thousands of devices training a shared global model in massive, heterogeneous networks [41]. Naively optimizing the global model in such a network may be unfair to some devices by causing disproportionate advantages or disadvantages. FL that accounts for fairness is therefore an indispensable requirement for human-centric 6G communication services. Specifically, fair FL in a wireless network involves fair resource allocation and a reasonable incentive mechanism. How to allocate computing and communication resources accurately and fairly in massive, heterogeneous networks has become a critical challenge that needs to be solved urgently. As pioneering work, Li et al. [41] proposed q-Fair FL (q-FFL), a new aggregation algorithm that achieves a fair allocation of resources and accuracy.
4) Explainable Federated Learning: The vast majority of FL models are black-box models (i.e., without interpretability), which leaves users unable to understand what kind of services the model provides for them. In a complex 6G network system, unexplainable predictions or decisions output by a black-box model may cause huge losses to users. For example, 6G-supported self-driving relies on an on-vehicle visual recognition model to determine whether the vehicle is running or stopped. Since the on-vehicle model has no interpretability, the driver cannot understand the decisions output by the vehicle model. In 2018, a self-driving vehicle developed by Uber caused a car accident due to the wrong output of the on-vehicle black-box model¹. Therefore, in the context of a complex network system such as 6G, developing interpretable FL models is a necessary step towards human-centric communication services.

¹ http://tech.sina.com.cn/zt_d/uberincident/

B. Efficient and Effective Federated Learning
1) Novel Asynchronous System: Even though the 6G network can bring the advantage of extremely low latency to FL systems, communication overhead is still the bottleneck that prevents FL systems from being widely used [14]. As described in Section IV-A1, the two most commonly studied communication optimization schemes in distributed machine learning systems are the batch synchronous method and the asynchronous method (where the delay of the model update is assumed to be bounded) [28]. Asynchronous communication schemes involve a scheduler, a coordinator, workers, and updaters, so there are several optimization problems for these roles that can be considered in the future: i) how the scheduler can reasonably schedule the communication and computing resources in the system; ii) how the coordinator can efficiently control the working and idle states of the devices; iii) how workers and updaters can optimize hyperparameters for model updates. These optimization problems are worth studying in future work in order to develop novel asynchronous systems.
2) Neural Architecture Search: The structure of current FL models is generally predefined, but this predefined architecture may not be the best choice because it may not be suitable for non-independent and identically distributed (non-IID) data. Therefore, Neural Architecture Search (NAS)-based automated FL (AutoFL) schemes may be a promising solution to this problem. For example, the study in [42] proposed a federated NAS (FedNAS) algorithm to help distributed devices collaborate to find a better architecture with higher accuracy. NAS provides opportunities for seeking better FL model architectures in the future.

C. Towards Incentive Federated Learning
Existing studies mainly focus on enhancing the performance of FL algorithms, e.g., accuracy and training time. Nevertheless, the optimistic assumption that all data owners are willing to join FL anytime and anywhere is not practical in 6G scenarios with massive numbers of self-interested devices. As a result, incentive mechanisms for honest and active participation are a core and urgent research topic [7], [25], [43]–[46]. Some interesting topics include: i) Due to information asymmetry between task publishers and participating devices, e.g., information about time-varying available resources, unfixed working periods, and changeable participation willingness, it is still an open issue to design effective online-learning-based incentive mechanisms that remove the impacts of both information asymmetry and time-varying factors while also ensuring efficient federated learning in 6G scenarios; ii) Considering the heterogeneous and massive devices with diverse hardware in 6G scenarios, the data quality of the devices is diverse, yet data quality plays an important role in learning performance. It is a challenging problem to design data-quality-based incentive mechanisms that motivate more devices with high-quality data to participate in federated learning and obtain higher rewards for their high-quality data contributions, thus improving both system reliability and learning performance [7], [43], [45].

D. Towards Personalized Federated Learning
It is challenging for the FL system in the 6G network to provide users with personalized services. Prior studies [47]–[50] adopt different personalization techniques to provide users with real-time personalized services, which is a solid step towards personalized FL. However, personalized FL still faces challenges from non-IID data, system heterogeneity, and network heterogeneity. Personalized service is a very important part of human-centric 6G services. Therefore, it is an interesting and meaningful topic for FL to seek novel ways to address the above challenges.

VI. CONCLUSION
In this article, we provided an overview of integrating federated learning into 6G communications. We discussed the requirements of 6G communication and the core challenges of federated learning for 6G applications. For the above challenges, we provided a comprehensive introduction of the emerging
advanced federated learning methods for 6G communications, including communication-efficient federated learning, secure federated learning, and effective federated learning. Finally, we outlined a handful of open problems and directions worth future research efforts.

REFERENCES
[1] K. B. Letaief, W. Chen, Y. Shi, J. Zhang, and Y.-J. A. Zhang, “The roadmap to 6G: AI empowered wireless networks,” IEEE Communications Magazine, vol. 57, no. 8, pp. 84–90, 2019.
[2] Y. Xiao, G. Shi, and M. Krunz, “Towards ubiquitous AI in 6G with federated learning,” arXiv preprint arXiv:2004.13563, 2020.
[3] K. David and H. Berndt, “6G vision and requirements: Is there any need for beyond 5G?” IEEE Vehicular Technology Magazine, vol. 13, no. 3, pp. 72–80, 2018.
[4] S. Dang, O. Amin, B. Shihada, and M.-S. Alouini, “What should 6G be?” Nature Electronics, vol. 3, no. 1, pp. 20–29, 2020.
[5] S. Niknam, H. S. Dhillon, and J. H. Reed, “Federated learning for wireless communications: Motivation, opportunities and challenges,” arXiv preprint arXiv:1908.06847, 2019.
[6] J. Konečný, H. B. McMahan, F. X. Yu, P. Richtárik, A. T. Suresh, and D. Bacon, “Federated learning: Strategies for improving communication efficiency,” arXiv preprint arXiv:1610.05492, 2016.
[7] J. Kang, Z. Xiong, D. Niyato, S. Xie, and J. Zhang, “Incentive mechanism for reliable federated learning: A joint optimization approach to combining reputation and contract theory,” IEEE Internet of Things Journal, vol. 6, no. 6, pp. 10 700–10 714, 2019.
[8] Y. Shi, K. Yang, T. Jiang, J. Zhang, and K. B. Letaief, “Communication-efficient edge AI: Algorithms and systems,” arXiv preprint arXiv:2002.09668, 2020.
[9] Y. Liu, J. Peng, J. Kang, A. M. Iliyasu, D. Niyato, and A. A. A. El-Latif, “A secure federated learning framework for 5G networks,” arXiv preprint arXiv:2005.05752, 2020.
[10] Y. Lin, S. Han, H. Mao, Y. Wang, and B. Dally, “Deep gradient compression: Reducing the communication bandwidth for distributed training,” in International Conference on Learning Representations, 2018. [Online]. Available: https://openreview.net/forum?id=SkhQHMW0W
[11] T. Huang, W. Yang, J. Wu, J. Ma, X. Zhang, and D. Zhang, “A survey on green 6G network: Architecture and technologies,” IEEE Access, vol. 7, pp. 175 758–175 768, 2019.
[12] R. Long, H. Guo, L. Zhang, and Y.-C. Liang, “Full-duplex backscatter communications in symbiotic radio systems,” IEEE Access, vol. 7, pp. 21 597–21 608, 2019.
[13] G. Yang, Q. Zhang, and Y.-C. Liang, “Cooperative ambient backscatter communications for green Internet-of-Things,” IEEE Internet of Things Journal, vol. 5, no. 2, pp. 1116–1130, 2018.
[14] T. Li, A. K. Sahu, A. Talwalkar, and V. Smith, “Federated learning: Challenges, methods, and future directions,” IEEE Signal Processing Magazine, vol. 37, no. 3, pp. 50–60, 2020.
[15] J. Kang, Z. Xiong, D. Niyato, Y. Zou, Y. Zhang, and M. Guizani, “Reliable federated learning for mobile networks,” IEEE Wireless Communications, vol. 27, no. 2, pp. 72–80, 2020.
[16] M. Giordani, M. Polese, M. Mezzavilla, S. Rangan, and M. Zorzi, “Toward 6G networks: Use cases and technologies,” IEEE Communications Magazine, vol. 58, no. 3, pp. 55–61, 2020.
[17] Z. Zhang, Y. Xiao, Z. Ma, M. Xiao, Z. Ding, X. Lei, G. K. Karagiannidis, and P. Fan, “6G wireless networks: Vision, requirements, architecture, and key technologies,” IEEE Vehicular Technology Magazine, vol. 14, no. 3, pp. 28–41, 2019.
[18] S. Nayak and R. Patgiri, “6G communication technology: A vision on intelligent healthcare,” arXiv preprint arXiv:2005.07532, 2020.
[19] M. Giordani, M. Polese, M. Mezzavilla, S. Rangan, and M. Zorzi, “Toward 6G networks: Use cases and technologies,” IEEE Communications Magazine, vol. 58, no. 3, pp. 55–61, 2020.
[20] Y. Liu, J. J. Q. Yu, J. Kang, D. Niyato, and S. Zhang, “Privacy-preserving traffic flow prediction: A federated learning approach,” IEEE Internet of Things Journal, pp. 1–1, 2020.
[21] S. Gu, J. Jiao, Z. Huang, S. Wu, and Q. Zhang, “ARMA-based adaptive coding transmission over millimeter-wave channel for integrated satellite-terrestrial networks,” IEEE Access, vol. 6, pp. 21 635–21 645, 2018.
[22] M. Chen, Z. Yang, W. Saad, C. Yin, H. V. Poor, and S. Cui, “A joint learning and communications framework for federated learning over wireless networks,” arXiv preprint arXiv:1909.07972, 2019.
[23] A. Souri, A. Hussien, M. Hoseyninezhad, and M. Norouzi, “A systematic review of IoT communication strategies for an efficient smart environment,” Transactions on Emerging Telecommunications Technologies, p. e3736, 2019.
[24] L. U. Khan, N. H. Tran, S. R. Pandey, W. Saad, Z. Han, M. N. Nguyen, and C. S. Hong, “Federated learning for edge networks: Resource optimization and incentive mechanism,” arXiv preprint arXiv:1911.05642, 2019.
[25] J. Kang, Z. Xiong, D. Niyato, H. Yu, Y.-C. Liang, and D. I. Kim, “Incentive design for efficient federated learning in mobile networks: A contract theory approach,” in 2019 IEEE VTS Asia Pacific Wireless Communications Symposium (APWCS). IEEE, 2019, pp. 1–5.
[26] E. Jeong, S. Oh, H. Kim, J. Park, M. Bennis, and S.-L. Kim, “Communication-efficient on-device machine learning: Federated distillation and augmentation under non-IID private data,” arXiv preprint arXiv:1811.11479, 2018.
[27] N. H. Tran, W. Bao, A. Zomaya, M. N. H. Nguyen, and C. S. Hong, “Federated learning over wireless networks: Optimization model design and analysis,” in IEEE INFOCOM 2019-IEEE Conference on Computer Communications. IEEE, 2019, pp. 1387–1395.
[28] K. Bonawitz, H. Eichner, W. Grieskamp, D. Huba, A. Ingerman, V. Ivanov, C. M. Kiddon, J. Konečný, S. Mazzocchi, B. McMahan, T. V. Overveldt, D. Petrou, D. Ramage, and J. Roselander, “Towards federated learning at scale: System design,” in SysML 2019, 2019, to appear. [Online]. Available: https://arxiv.org/abs/1902.01046
[29] A. F. Atiya and A. G. Parlos, “New results on recurrent network training: Unifying the algorithms and accelerating convergence,” IEEE Transactions on Neural Networks, vol. 11, no. 3, pp. 697–709, 2000.
[30] Z. Wang, M. Song, Z. Zhang, Y. Song, Q. Wang, and H. Qi, “Beyond inferring class representatives: User-level privacy leakage from federated learning,” in IEEE INFOCOM 2019-IEEE Conference on Computer Communications. IEEE, 2019, pp. 2512–2520.
[31] L. Zhu, Z. Liu, and S. Han, “Deep leakage from gradients,” in Advances in Neural Information Processing Systems, 2019, pp. 14 747–14 756.
[32] L. Li, H. Xiong, Z. Guo, J. Wang, and C.-Z. Xu, “SmartPC: Hierarchical pace control in real-time federated learning system,” in 2019 IEEE Real-Time Systems Symposium (RTSS). IEEE, 2019, pp. 406–418.
[33] A. Portnoy and D. Hendler, “Towards realistic Byzantine-robust federated learning,” arXiv preprint arXiv:2004.04986, 2020.
[34] S. Guo, T. Zhang, X. Xie, L. Ma, T. Xiang, and Y. Liu, “Towards Byzantine-resilient learning in decentralized systems,” arXiv preprint arXiv:2002.08569, 2020.
[35] F. Ang, L. Chen, N. Zhao, Y. Chen, W. Wang, and F. R. Yu, “Robust federated learning with noisy communication,” IEEE Transactions on Communications, 2020.
[36] M. Abadi, A. Chu, I. Goodfellow, H. B. McMahan, I. Mironov, K. Talwar, and L. Zhang, “Deep learning with differential privacy,” in Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security, 2016, pp. 308–318.
[37] R. C. Geyer, T. Klein, and M. Nabi, “Differentially private federated learning: A client level perspective,” arXiv preprint arXiv:1712.07557, 2017.
[38] Y. Huang, Y. Su, S. Ravi, Z. Song, S. Arora, and K. Li, “Privacy-preserving learning via deep net pruning,” arXiv preprint arXiv:2003.01876, 2020.
[39] T.-D. Cao, T. Truong-Huu, H. Tran, and K. Tran, “A federated learning framework for privacy-preserving and parallel training,” arXiv preprint arXiv:2001.09782, 2020.
[40] Z. Jiang, A. Balu, C. Hegde, and S. Sarkar, “Collaborative deep learning in fixed topology networks,” in Advances in Neural Information Processing Systems, 2017, pp. 5904–5914.
[41] T. Li, M. Sanjabi, A. Beirami, and V. Smith, “Fair resource allocation in federated learning,” in International Conference on Learning Representations, 2020. [Online]. Available: https://openreview.net/forum?id=ByexElSYDr
[42] C. He, M. Annavaram, and S. Avestimehr, “FedNAS: Federated deep learning via neural architecture search,” arXiv preprint arXiv:2004.08546, 2020.
[43] L. U. Khan, N. H. Tran, S. R. Pandey, W. Saad, Z. Han, M. N. Nguyen, and C. S. Hong, “Federated learning for edge networks: Resource optimization and incentive mechanism,” arXiv preprint arXiv:1911.05642, 2019.
[44] J. Weng, J. Weng, J. Zhang, M. Li, Y. Zhang, and W. Luo, “DeepChain: Auditable and privacy-preserving deep learning with blockchain-based incentive,” IEEE Transactions on Dependable and Secure Computing, 2019.