Satellite-Based Computing Networks with Federated Learning
Digital Object Identifier: 10.1109/MWC.008.00353
Hao Chen, Ming Xiao (corresponding author) and Zhibo Pang are with the Royal Institute of Technology (KTH), Sweden; Zhibo Pang is also with ABB Corporate Research Sweden.
Authorized licensed use limited to: Deakin University. Downloaded on July 22,2024 at 02:43:59 UTC from IEEE Xplore. Restrictions apply.
with distributed data kept locally is one alternative solution to data analysis, especially for large-scale ML models. As one important distributed ML
[Figure 2: four panels, (a) to (d). Labels: LEO satellite constellation, ISL, relay, gateway, terrestrial network, cloud, server, global model, edge intelligence, footprint.]
FIGURE 2. ML in LEO-based SatCom networks: a) CL with satellites as relays; b) CL with computing servers allocated at satellites; c) FL with satellites as relays; d) FL with satellites as the servers.
Machine Learning and Federated Learning
Machine learning for massively connected IoT has traditionally been focused on centralized learning (CL), where a powerful artificial neural network (ANN) model is often trained by uploading all raw data from each connected UE to the cloud, and this generic model can then be distributed and applied to all UEs. Amazon Web Services, Google Cloud, and Microsoft Azure are the typical ML-as-a-service providers, where models can be deployed and used on a large scale. However, these learning schemes face the challenges of high communication costs and security. To address these problems, FL has been proposed as a communication-efficient and privacy-preserving paradigm of decentralized learning, where participating UEs can collaboratively build a shared learning model while leaving the training data local [10]. In particular, a UE computes its update to the current global model on its local training data and periodically transmits model parameters to a central server, where the local models are then aggregated into global models, which are sent back according to aggregation strategies such as the federated averaging algorithm (FedAvg) [10]. This process is repeated until a target accuracy of the learning model is reached. In this way, user data privacy is well protected since local training data are not shared, and the communication loads are relieved since only model parameters are exchanged instead of a large amount of raw data samples, which distinguishes FL from conventional CL in data acquisition, storage, training, and inference.
Federated Learning in Satellite Networks
In this section, we investigate the possible ways to implement FL in LEO-based satellite constellation networks and then summarize the benefits of FL in SatCom.
Where to Learn?
In LEO-based STNs, UEs may jointly learn a data-driven model, among which UEs are served by terrestrial base stations (TBSs) when they are within the coverage of TBSs and can gain access via satellites when TBSs are not available and work as relays to forward signals between UEs and LEO satellites. In what follows, to simplify illustration, we specifically consider a SatCom that consists of a LEO satellite constellation, a cloud, a server, and multiple UEs. Learning in such a wide area is established with the local data information of UEs. As depicted in Fig. 2, the four possible learning strategies in SatCom are elaborated below. For illustration, the properties of the four possibilities of integrating ML in SatCom are summarized in Table 1.
Mode 1 (Remote Cloud Learning): In Fig. 2a, the central server is deployed in the cloud, and raw data streams of UEs are directly transmitted to the cloud server. Thus, a global model is developed by classical CL in the cloud with centralized processing and inference and then applied to UEs. In this scenario, LEO satellites are utilized as relays to retransmit the traversed data stream. Mode 1 is straightforward to integrate into the current communication system, since hybrid STNs have been conceptualized and developed since 1964 [7]. However, it inevitably incurs longer latency, which makes it an unfavorable scheme for real-time applications, where low end-to-end latency is one of the main technical concerns.
Mode 2 (Onboard Satellite Learning): Different from Mode 1, the computing servers are deployed in satellites in close proximity to UEs instead of in the remote cloud, as displayed in Fig. 2b.
Compared to Mode 1, the latency of this mode due to propagation delay and transmission time is reduced because communication between the cloud and satellites is avoided. Meanwhile, the probability of information leakage is decreased due to the lower number of communication hops. Thus, Mode 2 is quite suitable for delay-sensitive applications such as military communications. However, this scheme requires that satellites be equipped with extensive computation and storage hardware. One main potential drawback of this mode is that embedding a dedicated server onboard the satellite is economically expensive. Most importantly, intensive onboard computation and training consume a large amount of the precious energy of satellites, which may not be practical when the energy supply of satellites is strictly limited.

TABLE 1. Comparison between the four potential modes.
Property                  Remote cloud   Onboard satellite   Federated    Federated
                          learning       learning            learning 1   learning 2
Communication overheads   Very high      High                Moderate     Low
Deployment costs          Low            Moderate            Moderate     High
Privacy and security      Very low       Low                 High         High
Latency                   Very high      High                Moderate     Low
Energy consumption        High           Very high           Low          Moderate
Feasibility               High           Moderate            Very high    High

Mode 3 (Federated Learning 1): As shown in Fig. 2c, SatCom builds a generic model via FL
without raw data sharing. Although the parameter
server is still allocated in the remote cloud as in
Mode 1, the protection of data privacy and secu- to accept a burst of UEs to accelerate the ML pro-
rity of UEs can be significantly enhanced by Mode cess or directly ignore UEs with poor or limited
3. Since only model parameters are transmitted connectivity. Meanwhile, UEs in SatCom have a
among UEs, satellites, and cloud, the sheer bulk high level of flexibility to participate in or leave the
of raw data is not exchanged in the networks. local processing and cooperation among them.
Thus, Mode 3 dramatically improves latency and Third, data privacy and security of individual enti-
communication overheads. Further, it is one of the ties in SatCom are enhanced. Training and pre-
most flexible and robust modes since UEs have dicting can be completed without sharing data.
more flexibility to decide on participating in build- Thus, sensitive data of UEs will not be exposed or
ing the global model whenever a straggler event exported to servers or even third parties. Finally,
happens due to the disconnection of UEs or poor communication efficiency in FL-based SatCom is
wireless connection. However, this mode leads improved compared to those of CL-based archi-
to relatively higher deployment costs in UEs than tectures.
CL-based approaches, since the capabilities of
local computing and training are prerequisites, and Performance Evaluation
learning tasks are distributed to UEs. The iterative To assess the performance of the proposed dis-
process of FL may also increase the cumulative tributed learning strategies in SatCom networks,
communication overhead, although in most cases we consider a SatCom similar to the example
it is still much lower than CL-based strategies. Over- network shown in Fig. 2. A typical Iridium sys-
all, Mode 3 is quite suitable for practical implemen- tem consisting of 66 LEO satellites is utilized to
tation and maintenance. emulate the LEO constellation to provide cellu-
Mode 4 (Federated Learning 2): When it is lar-like service in areas where terrestrial cellular
feasible to deploy parameter servers onboard sat- service is unavailable. A total of N = 100 UEs
ellites and run FL without data sharing, we have identically equipped with computing and mem-
Mode 4, as illustrated in Fig. 2d. As we can see ory are assumed to be dispersed on the under-
from Table 1, communication overheads, informa- served areas for training a model of interest with
tion leakage, and latency are the lowest among all the following parameters: the propagation delay
strategies because the parameter server is not only between directly connected LEO satellite and
closer to UEs, but the number of visited intermedi- UE/cloud is assumed to be 5 ms, and the con-
ate network nodes (i.e., satellite, gateway, etc.) is sidered ISL delay follows the uniform distribution
negligible. It is also insignificant from the current and is in the range of [5, 15] ms; the data rate for
architecture on the burden of the remote cloud. upstream and downstream in SatCom could be
Note that onboard satellite aggregation in Mode up to 8 Mb/s [14]. The well-known MNIST data-
4 evokes more energy consumption compared to set [10] is used to train the FL model for digit rec-
Mode 3, but is more energy-efficient than Mode 2. ognition tasks. The dataset consists of 10 classes
of 28 28 gray-scale images of handwritten dig-
What Can FL Bring in LEO-Based SatCom? its with 60,000 training samples and 10,000 test
In LEO SatCom networks, UEs from multiple areas samples, respectively. During model training, the
(e.g., forests and urban woodlands) may collabo- training data are partitioned over N blocks, each
ratively build a shared learning model (fire mon- of which is only viable at one of the UEs that are
itor) via the FL framework. Similar to the current independently and identically distributed. FedAvg
operating application of Gboard on Android [13], [10] is chosen for the FL implementation. Thus,
FL in SatCom may have the following advantages. the loss function f(wt) of the global model at the
First, FL-based SatCom has fast response. Data tth communication round can be calculated by
processing and learning are at the terminal of the
N
SatCom and closer to the UEs, thereby reducing n
the response time of decision making. Second, f (w t ) = ∑ ni Fi (wti ),
compared to the CL approach, the parameter i=1 (1)
server in FL is flexible enough to decide whether where N is the total number of involved UEs in
one FL task, $n$ is the total number of data samples for all involved UEs, and $w_t$ represents the global FL model parameter, while $n_i$, $w_t^i$, and $F_i(w_t^i)$ are the number of data samples, the local FL model parameter, and the loss function for the $i$th UE, respectively. Considering the slow response due to opportunistically offline, slow, or expensive connections, only 10 percent of UEs are randomly selected to join the learning process in each learning round. For the target model, we consider a simple convolutional neural network (CNN) with two 5 × 5 convolutional layers (the first with 10 channels, the second with 20, and each followed by a ReLU function), a fully connected layer with 320 units and ReLU activation, and a final softmax output layer (8480 total parameters). For fair comparison, the hyperparameters for the learning approaches are tuned and kept the same in different experiments. Our model was simulated with PyTorch in Python, and experiments were carried out on a laptop with an Intel CPU @2.3 GHz and 16 GB RAM.
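The per-round bookkeeping described above (random selection of 10 percent of the UEs, the sample-weighted loss of Eq. (1), and FedAvg aggregation of local parameters) can be sketched in a few lines of plain Python. This is a minimal illustration and not the authors' PyTorch simulation: the function names, the use of flat parameter lists, and the fixed random seed are assumptions made for the example.

```python
import random


def select_clients(num_ues, fraction, rng):
    """Randomly pick a fraction of the UEs for one round (at least one UE)."""
    k = max(1, int(round(fraction * num_ues)))
    return sorted(rng.sample(range(num_ues), k))


def weighted_global_loss(local_losses, sample_counts):
    """Eq. (1): f(w_t) = sum_i (n_i / n) * F_i(w_t^i)."""
    n = sum(sample_counts)
    return sum(n_i / n * loss for n_i, loss in zip(sample_counts, local_losses))


def fedavg_aggregate(local_params, sample_counts):
    """FedAvg aggregation: sample-weighted average of local parameter vectors.

    local_params is a list of equally long lists of floats (an assumed,
    simplified stand-in for real model weights).
    """
    n = sum(sample_counts)
    dim = len(local_params[0])
    return [
        sum(n_i / n * params[j] for n_i, params in zip(sample_counts, local_params))
        for j in range(dim)
    ]


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed, assumed here only for repeatability
    chosen = select_clients(num_ues=100, fraction=0.1, rng=rng)
    print("UEs selected this round:", chosen)
    # Two hypothetical UEs with toy parameters and losses.
    print(fedavg_aggregate([[0.0, 2.0], [4.0, 6.0]], [1, 3]))
    print(weighted_global_loss([1.0, 3.0], [600, 600]))
```

With equal sample counts, as in an even split of the MNIST training set over the UEs, the weights $n_i/n$ are identical and the aggregation reduces to a plain average.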
We first plot in Fig. 3a the communication loads of FL-based architectures (i.e., Mode 3 and Mode 4) vs. CL-based architectures (i.e., Mode 1 and Mode 2) in terms of communication rounds. Communication rounds are defined as the cumulative number of upload communications. Communication loads are measured by the number of transmitted packets during model training. For the CL-based approach, the communication load is equal to the total number of packets for transmitting a raw data stream, while for the FL-based approach it is the cumulative number of packets for transmitting model parameters, which is proportional to the number of participating UEs and the number of communication rounds. It is observed that the communication loads of CL-based approaches remain constant as communication rounds increase, while they increase gradually for Mode 3 and Mode 4. This is because in CL-based approaches (i.e., Mode 1 and Mode 2), raw data are uploaded to a centralized entity (i.e., either the cloud or an onboard satellite) before learning, while Mode 3 and Mode 4 involve an iterative process with model parameter sharing in each communication round. It is also noted that the proposed FL-based approaches can significantly reduce the communication loads compared to traditional CL-based approaches in SatCom.
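The two load definitions above can be made concrete with a short packet-counting sketch. The dataset size, UE count, participation fraction, and parameter count are taken from the text; the 1500-byte packet payload, 4 bytes per parameter, 785 bytes per sample (784 pixels plus a label), and the choice to count uploads only are assumptions for illustration, so the resulting numbers are not expected to match Fig. 3a.

```python
def packets(num_bytes, payload_bytes):
    """Number of packets needed to carry num_bytes (ceiling division)."""
    return -(-num_bytes // payload_bytes)


def cl_load(num_ues, samples_per_ue, bytes_per_sample, payload_bytes):
    """CL: every UE uploads its raw data once, so the load is round-independent."""
    return num_ues * packets(samples_per_ue * bytes_per_sample, payload_bytes)


def fl_load(rounds, ues_per_round, num_params, bytes_per_param, payload_bytes):
    """FL: cumulative uploads of model parameters, linear in rounds and UEs."""
    per_upload = packets(num_params * bytes_per_param, payload_bytes)
    return rounds * ues_per_round * per_upload


if __name__ == "__main__":
    PAYLOAD = 1500  # bytes per packet (assumed)
    # 60,000 MNIST training samples split evenly over 100 UEs.
    cl = cl_load(num_ues=100, samples_per_ue=600, bytes_per_sample=785,
                 payload_bytes=PAYLOAD)
    print("CL packets (constant):", cl)
    for rounds in (1, 10, 50, 100):
        fl = fl_load(rounds, ues_per_round=10, num_params=8480,
                     bytes_per_param=4, payload_bytes=PAYLOAD)
        print(f"FL packets after {rounds} rounds:", fl)
```

Under these assumptions the FL load grows linearly with the number of rounds while the CL load stays flat, which is the qualitative behavior the paragraph describes; where the two curves sit relative to each other depends on the assumed packet and sample sizes.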
In Figs. 3b and 3c, we evaluate the learning performance of the proposed FL-based architectures with respect to communication time (latency) as well as the number of local epochs. We define the number of local epochs (denoted as E in the figures) as the number of training passes each UE makes over its local dataset in each communication round, and the communication time as the sum of the transmission time of uploading and downloading model parameters and the propagation time. In fact, the link speed between UEs and satellites is sufficiently high (e.g., 400 Mb/s in OneWeb [15]) that the transmission time is minor and can be neglected. In such cases, the communication time is mainly determined by the propagation time. For the two proposed potential modes in SatCom, training loss and test accuracy are evaluated in Figs. 3b and 3c, respectively. As can be seen from Fig. 3b, the convergence speed of the proposed Mode 3 (i.e., FL 1 in the figures) is slower than that of Mode 4 (i.e., FL 2 in the
figures). This is because in each communication round, Mode 3 suffers from extra communication costs such as ISL delay and communication time between the cloud and LEO satellites compared to Mode 4. Also, it is demonstrated in Fig. 3b that by increasing E to add more local updating of UEs in each communication round, the communication time can be significantly reduced. In terms of test accuracy, we observe the same learning performance as well. This indicates that the proposed mechanisms can foster the availability and low-cost deployment of FL in LEO satellite constellations and alleviate their long communication time to build delay-sensitive learning.
FIGURE 3. Performance of FL in LEO-based satellite communication networks on the MNIST dataset: a) communication overhead comparison between CL and FL; b) training loss vs. communication time for many local epochs (large E); c) test accuracy vs. communication time for many local epochs (large E).
Challenges in Future Research
For higher connectivity and bandwidth-limited communications, the FL-based computing networks in SatCom proposed in this article may face new technical challenges.
Privacy and Security
Although privacy and security have been among the initial objectives of adopting FL in SatCom as a pertinent solution, the distributed characteristic has raised additional issues to be addressed, such as revealing sensitive information via poisoning local data and shared models. Many challenges remain, although recent efforts have adopted different privacy-based approaches. For instance, when differential privacy is introduced, various levels of artificial noise are injected to improve privacy, which directly degrades the accuracy of the built model. Further, if a malicious server exists, the sensitive information of UEs can still be leaked. As a result, there is an urgent need to develop a robust privacy-preserving and secure system, where formal privacy and security are guaranteed with limited accuracy loss.
Resource Management
Most LEO satellite resources are underutilized in current networks due to the periodic change of the coverage of the satellite. It is thus imperative to maximize resource utilization, particularly under scarce satellite resources with limited spectrum and orbits. Under such circumstances, software defined networking (SDN), network function virtualization (NFV), and ML offer possibilities to overcome the above difficulty. Dynamic resource management leveraging an organic combination of SDN and NFV can be more flexible and reconfigurable in real events.
Communication Reduction
Although current satellites provide much more bandwidth than those of the past few years, their bandwidth still cannot be compared to that of terrestrial media, in particular, optical fibers. Due to the iterative nature of model updates, communication overhead will still be a major issue in satellite-based computing networks with FL. To achieve pertinent, sustainable, and efficient FL-based solutions, the communication efficiency of FL needs to be improved, for example through promising communication compression techniques such as quantization or sparsification.
Conclusions
FL over wireless communication networks can dramatically improve learning performance to satisfy ever-increasing requirements in data privacy and communication overheads in future 6G networks. We have discussed the role of FL in addressing the challenges in LEO-based SatCom. The proposed schemes consider the preliminary combination of FL and LEO satellite systems. Performance evaluation indicates that the proposed integration of FL under a LEO satellite constellation has practical accuracy on the MNIST dataset, and the communication overheads of FL-based approaches are much smaller than those of CL-based approaches. In addition, we have identified several challenges regarding privacy and security, resource management, and communication overheads for future research.
Acknowledgments
This work was supported by the ERA-NET Smart Energy Systems SG+ 2017 Program SMART-MLA with project number 89029 (and SWEA number 42811-2), the Swedish Research Council Project “Coding for Large-Scale Distributed Machine Learning,” the Swedish Foundation for International Cooperation in Research and Higher Education (STINT) project “Efficient and Secure Distributed Machine Learning with Gradient Descent,” and the FORMAS project “Intelligent Energy Management in Smart Community with Distributed Machine Learning” number 2021-00306. Zhibo Pang’s work is partly funded by the Swedish Foundation for Strategic Research (SSF) through project APR20-0023.
References
[1] “Internet of Things - Number of Connected Devices Worldwide 2015–2025,” 2016; https://www.statista.com/statistics/471264/iot-number-of-connecteddevices-worldwide/.
[2] S. Chen et al., “Vision, Requirements, and Technology Trend of 6G: How to Tackle the Challenges of System Coverage, Capacity, User Data Rate, and Movement Speed,” IEEE Wireless Commun., vol. 27, no. 2, Apr. 2020, pp. 218–28.
[3] K. B. Letaief et al., “The Roadmap to 6G: AI-Empowered Wireless Networks,” IEEE Commun. Mag., vol. 57, no. 8, Aug. 2019, pp. 84–90.
[4] O. Kodheli et al., “Satellite Communications in the New Space Era: A Survey and Future Challenges,” IEEE Commun. Surveys & Tutorials, vol. 23, no. 1, 2021, pp. 70–109.
[5] Z. Qu et al., “LEO Satellite Constellation for Internet of Things,” IEEE Access, vol. 5, 2017, pp. 18,391–401.
[6] “Why in the Next Decade Companies Will Launch Thousands More Satellites than in All of History,” 2019; https://www.cnbc.com/2019/12/14/spacex-oneweb-and-amazon-tolaunch-thousands-more-satellites-in-2020s.html.
[7] X. Fang et al., “5G Embraces Satellites for 6G Ubiquitous IoT: Basic Models for Integrated Satellite Terrestrial Networks,” IEEE IoT J., vol. 8, no. 18, 2021, pp. 14,399–417.
[8] Z. Zhang, W. Zhang, and F.-H. Tseng, “Satellite Mobile Edge Computing: Improving QoS of High-Speed Satellite-Terrestrial Networks Using Edge Computing Techniques,” IEEE Network, vol. 33, no. 1, Jan./Feb. 2019, pp. 70–76.
[9] P. Voigt and A. Von dem Bussche, The EU General Data Protection Regulation (GDPR), A Practical Guide, 1st ed., Springer, vol. 10, 2017, p. 3,152,676.
[10] H. B. McMahan et al., “Federated Learning: Strategies for Improving Communication Efficiency,” Proc. 20th Int’l. Conf. Artif. Intell. Stat., 2017.
[11] A. D. Panagopoulos et al., “Satellite Communications at Ku, Ka, and V Bands: Propagation Impairments and Mitigation Techniques,” IEEE Commun. Surveys & Tutorials, vol. 6, no. 3, 2004, pp. 2–14.
[12] M. Giordani and M. Zorzi, “Satellite Communication at Millimeter Waves: A Key Enabler of the 6G Era,” Proc. 2020 Int’l. Conf. Comput. Net. Commun., 2020, pp. 383–88.
[13] A. Hard et al., “Federated Learning for Mobile Keyboard Prediction,” arXiv preprint arXiv:1811.03604, 2018.
[14] R. Goyal et al., “Analysis and Simulation of Delay and Buffer Requirements of Satellite-ATM Networks for TCP/IP Traffic,” arXiv preprint cs/9809052, 1998.
[15] “OneWeb’s Low-Earth Satellites Hit 400 Mbps and 32 ms Latency in New Test,” 2019; https://arstechnica.com/information-technology/2019/07/onewebslow-earth-satellites-hit-400mbps-and-32ms-latency-in-new-test/.
Biographies
Hao Chen ([email protected]) received his B.Sc. degree in communication engineering and M.Sc. degree in electronic and communication engineering from Soochow University, China, in 2014 and 2017, respectively. He is currently pursuing a Ph.D. degree with the School of Electrical Engineering and Computer
Science, Royal Institute of Technology (KTH), Stockholm, Sweden. His current research interests include distributed machine learning, distributed optimization, and edge computing.
Ming Xiao [S’02, M’07, SM’12] ([email protected]) received his Bachelor’s and Master’s degrees in engineering from the University of Electronic Science and Technology of China, Chengdu, in 1997 and 2002, respectively. He received his Ph.D. degree from Chalmers University of Technology, Sweden, in November 2007. From 1997 to 1999, he worked as a network and software engineer at ChinaTelecom. From 2000 to 2002, he also held a position in the Sichuan communications administration. From November 2007 to now, he has been in the Department of Information Science and Engineering, School of Electrical Engineering and Computer Science, KTH, where he is currently an associate professor.
Zhibo Pang [M’13, SM’15] ([email protected]) received his M.B.A. in innovation and growth from the University of Turku in 2012 and Ph.D. in electronic and computer systems from KTH in 2013. He is currently a senior principal scientist at ABB Corporate Research Sweden, and adjunct professor at the University of Sydney and KTH. He is Co-Chair of the IEEE Technical Committee on Industrial Informatics. He is an Associate Editor of IEEE Transactions on Industrial Informatics, the IEEE Journal of Biomedical and Health Informatics, and the IEEE Journal of Emerging and Selected Topics in Industrial Electronics. He was an Invited Speaker at the 2018 Gordon Research Conference on Advanced Health Informatics, General Chair of IEEE ES 2017, and General Co-Chair of IEEE WFCS 2021. He was awarded the 2016 Inventor of the Year Award and 2018 Inventor of the Year Award by ABB Corporate Research Sweden.