Space/Aerial-Assisted Computing Offloading for IoT Applications: A Learning-Based Approach
This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI 10.1109/JSAC.2019.2906789, IEEE Journal on Selected Areas in Communications
Abstract—Internet of things (IoT) computing offloading is a challenging issue, especially in remote areas where common edge/cloud infrastructure is unavailable. In this paper, we present a space-air-ground integrated network (SAGIN) edge/cloud computing architecture for offloading computation-intensive applications under the energy and computation constraints of remote IoT devices. In this architecture, flying unmanned aerial vehicles (UAVs) provide near-user edge computing and satellites provide access to cloud computing. Firstly, for UAV edge servers, we propose a joint resource allocation and task scheduling approach to efficiently allocate the computing resources to virtual machines and schedule the offloaded tasks. Secondly, we investigate the computing offloading problem in SAGIN and propose a learning-based approach to learn the optimal offloading policy from the dynamic SAGIN environment. Specifically, we formulate the offloading decision making as a Markov decision process whose system state captures the network dynamics. To cope with the system dynamics and complexity, we propose a deep reinforcement learning-based computing offloading approach to learn the optimal offloading policy on-the-fly. We adopt the policy gradient method to handle the large action space and the actor-critic method to accelerate the learning process. Simulation results show two things. First, the proposed edge virtual machine allocation and task scheduling approach can achieve near-optimal performance with very low complexity. Second, the proposed learning-based computing offloading algorithm not only converges fast, but also achieves a lower total cost compared with other offloading approaches.

Index Terms—Computing offloading, edge computing, space-air-ground, IoT, reinforcement learning

N. Cheng is with the School of Telecommunication, Xidian University, Xian, China, and the Electrical and Computer Engineering Department, University of Waterloo, Waterloo, ON, N2L 3G1, Canada.
F. Lyu, C. Zhou, W. Shi, and X. Shen are with the Electrical and Computer Engineering Department, University of Waterloo, Waterloo, ON, N2L 3G1, Canada (emails: {f2lyu,c89zhou,w46shi,sshen}@uwaterloo.ca).
W. Quan is with the School of Electronic and Information Engineering, Beijing Jiaotong University, Beijing, China. W. Quan is the corresponding author.
H. He is with the School of Information Engineering, Zhejiang University, Hangzhou, China.

I. INTRODUCTION

With the rapid development of 5G networks and the Internet of things (IoT), a myriad of promising applications and services have emerged. Examples include virtual reality, HD live streaming, autonomous driving, industry automation, and smart homes. These applications reap the benefits provided by 5G networks, such as ultra-high data rate, low latency, high reliability, and massive connections [1], [2]. However, besides efficient and reliable communication, a wide spectrum of applications also require massive computing capabilities. For example, virtual reality and HD video streaming require a large amount of computing resources for rendering and video encoding/decoding, and autonomous vehicles rely on computing for artificial intelligence (AI)-based steering control. These computation-intensive applications pose great challenges to the battery and computing capabilities of resource-constrained end devices, especially IoT devices. This motivates cloud computing, in which computation-intensive applications are offloaded to cloud servers with centralized and abundant computation resources.

Although cloud computing can significantly reduce the computation delay and energy consumption of users, it may fail to meet the demands of delay-sensitive applications, such as mobile gaming and augmented reality. The reason is that the long transmission distances between end users and cloud servers result in long transmission delays. To address this issue, mobile edge computing (MEC) has been extensively investigated, where computing resources at the network edge are employed to provide efficient and flexible computing services. In 5G wireless systems, ultra-dense network edge devices, such as macro/small cell base stations and WiFi access points, will be deployed, which can provide an exponentially growing amount of edge computing resources. Many significant issues in MEC have been studied, including the offloading task model [3], [4], energy efficiency [5], [6], [7], latency reduction [8], [9], [10], and joint optimization of communication and computing [11], [12].

However, 5G networks may fail to provide ubiquitous coverage to suburban and rural areas, where IoT devices could be widely deployed to execute applications with relatively high computing requirements. For example, the fusion of sensing information, especially the handling of high-definition sound or video, will quickly drain the battery of the sink nodes and result in large processing delays. Due to the lack of terrestrial access network coverage, the typical edge and cloud computing paradigms cannot be applied in such scenarios. To this end, we propose to employ the space-air-ground integrated network (SAGIN) architecture for the computing offloading of remote IoT applications. SAGIN integrates the satellite network and aerial network with the terrestrial network to provide seamless and flexible network coverage and services to large areas. It can thus be applied in many promising fields, such as intelligent transportation systems, remote area monitoring, disaster rescue, and large-scale high-speed mobile Internet access [13]. SAGIN is a multidimensional heterogeneous network consisting of three
0733-8716 (c) 2018 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.
network segments, i.e., the satellite network, aerial network, and terrestrial network. Each network segment possesses different resources and is affected by different limitations. Low Earth orbit (LEO) and geostationary (GEO) satellites constitute a hierarchical network. LEO satellites provide high-speed access, and GEO satellites relay data between LEO satellites for long-distance transmission [14]. The aerial network includes flying unmanned aerial vehicles (UAVs), high altitude platforms (HAPs), and communication balloons. It can be deployed on demand at locations with bursty data traffic to offer high-speed and dynamic network services, such as dynamic coverage, edge computing, and crowdsensing [15], [16]. In the proposed SAG-IoT computing offloading architecture, the aerial network nodes can serve as flying edge servers, which provide the IoT devices with low-delay edge computing. On the other hand, satellite communication may have a lower communication rate and a higher transmission delay. Even so, it can provide always-on cloud computing through seamless coverage and satellite backbone networks [17]. However, employing SAGIN in IoT computing offloading introduces several challenging issues. Firstly, the high mobility of the aerial network results in dynamic channel conditions and coverage, leading to varying server availability and communication delay. These must be carefully handled to guarantee the performance of the SAG-IoT system. Secondly, different network segments in SAGIN possess distinct network conditions and resource constraints. It is therefore non-trivial to design an efficient computing offloading approach under such complex and dynamic network conditions and resources.

In this paper, we present a flexible joint communication and computation SAGIN framework to provide powerful edge/cloud computing services to remote IoT users. Under this framework, we propose an efficient computing offloading approach that learns the optimal offloading policy on-the-fly. The policy minimizes the weighted sum of delay, energy consumption, and server usage cost, considering the multidimensional network dynamics and resource constraints. Firstly, the computation resources of the UAV edge servers are virtualized as virtual machines (VMs) for parallel execution of the offloaded tasks. We formulate the joint VM resource allocation and task scheduling problem as a mixed-integer programming problem and propose an efficient heuristic algorithm to solve it. Secondly, we investigate the computing offloading problem in SAGIN, which is formulated as a Markov decision process (MDP). To learn the network dynamics, a model-free reinforcement learning (RL)-based approach is proposed, and an actor-critic learning algorithm is designed to handle the large state and action spaces. To the best of our knowledge, this is the first work to study the computing offloading problem in SAGIN. It validates the feasibility of SAGIN supporting computation-intensive applications for remote IoT users, and it can provide useful guidelines for SAGIN network design and remote computing offloading.

The main contributions of the paper can be summarized as follows.
• We formulate the SAG-IoT computing offloading problem as an MDP and propose an RL-based approach to solve it efficiently. The system state is defined to integrate historical network information so that the system dynamics can be learned. In addition, a policy gradient-based actor-critic learning algorithm is proposed to cope with the curse of dimensionality and to accelerate learning.
• We adopt network virtualization to flexibly allocate the resources of the edge server. We formulate the joint edge server VM computation resource allocation and task scheduling problem as a mixed-integer programming problem, and propose an effective heuristic algorithm to solve it.
• The performance of the proposed approaches is evaluated through extensive simulations. The joint VM allocation and task scheduling approach can achieve near-optimal performance with low complexity. In addition, the performance of the proposed RL-based computing offloading approach is evaluated with respect to the design parameters.

The remainder of the paper is organized as follows. In Section II, we present the related work. Section III describes the system model. In Section IV, the joint edge VM allocation and task scheduling problem is formulated and solved. Section V formulates the SAG-IoT computing offloading problem, followed by the RL-based solution in Section VI. Section VII evaluates the proposed approaches, and Section VIII concludes the paper. The main notations used throughout the paper are listed in Table I.

II. RELATED WORK

A. Mobile Edge Computing

The concept of MEC was originally proposed by ETSI in [18], which discusses its motivation, definition, architecture, and challenging issues. In edge computing, the computation task offloading mechanism determines the overall performance of the MEC system. Energy-efficient computation offloading is crucial for energy-constrained IoT devices, and has been studied in [5] and [6]. In [5], Mahmoodi et al. studied the joint scheduling and computation offloading problem. They proposed an optimization method based on real data measurements to reduce the energy consumption of mobile users. In [6], Mao et al. proposed a Lyapunov method-based dynamic computation offloading scheme for devices with energy harvesting. The scheme uses as its performance metric an execution cost that jointly considers execution latency and task failure. In MEC systems, the energy consumption and task delay depend not only on task processing, but also on the communication of the task's data. Therefore, the joint optimization of communication radio resources and computing offloading has attracted much research attention [11], [12]. In [11], You et al. studied resource allocation for multiuser MEC offloading in TDMA and OFDMA scenarios. In [12], Wu et al. studied multi-access-assisted computing offloading, and presented a joint optimization of computation task scheduling and radio resource allocation. However, these works only focus on the fixed MEC scenario, i.e., the edge computing services are
TABLE I
NOTATIONS USED IN THE PAPER.

| Notation | Description |
|---|---|
| $M$ | Number of IoT users |
| $N$ | Number of IoT applications |
| $C^l$, $C^e$, $C^c$ | Computation resources of the local IoT users, the edge servers, and the cloud server |
| $E^l$ | Local processing power consumption of IoT users |
| $E_i^e$, $E_i^c$ | Transmission power of IoT users to the UAV and the satellite |
| $B_{ij}^e$, $B_{ij}^c$ | Usage cost of task $W_{ij}$ in the edge server and the cloud server |
| $B$ | Bandwidth of UAV-ground communication |
| $H_j^{in}$, $H_j^{out}$ | Size of the input data and output data of the $j$-th application |
| $Z_j$ | Computing requirement of the $j$-th application |
| $\mathbf{M}(t)$, $m_{ij}(t)$ | Remaining-tasks matrix and its element corresponding to task $W_{ij}$ |
| $\mathbf{X}^l(t)$, $\mathbf{X}^e(t)$, $\mathbf{X}^c(t)$ | Offloading decision matrices for local processing, offloading to the UAV, and offloading to the cloud |
| $x_{ij}^l(t)$, $x_{ij}^e(t)$, $x_{ij}^c(t)$ | Elements of $\mathbf{X}^l(t)$, $\mathbf{X}^e(t)$, $\mathbf{X}^c(t)$ corresponding to task $W_{ij}$ |
| $\alpha$, $\beta$ | Weights of the UAV-edge and cloud server usage costs over the IoT user energy consumption |
| $\varpi_i$ | Weight of delay over energy consumption and server usage cost for IoT user $i$ |
for transmission to the UAV and the satellite is denoted by $E_i^e$ and $E_i^c$, respectively. In the edge servers, i.e., UAVs, the computing resources are virtualized as VMs, each serving one specific application [24]. In edge server $k$, the total computation resource is $C^e$. The resource allocated to computation VM $v$ is denoted by $C_v^e$, and the server usage cost of the computation VM for user $i$'s task $j$ is denoted by $B_{i,j}^e$. For the UAV-ground communication, only large-scale channel fading is considered. This is because task offloading decisions are made on a much longer time scale than traditional resource scheduling (1 ms). In addition, since instantaneous channel information is not required, satellite-controlled global decision making is feasible. According to [25], the pathloss between the UAV and a ground user follows

$$L(r,h) = 20\log_{10}\left(\frac{4\pi f_c \left(h^2 + r^2\right)^{1/2}}{c}\right) + P_{\mathrm{LoS}}(r,h)\,\eta_{\mathrm{LoS}} + \left(1 - P_{\mathrm{LoS}}(r,h)\right)\eta_{\mathrm{NLoS}}, \tag{1}$$

where $h$ and $r$ denote the UAV flying altitude and the horizontal distance between the UAV and the ground user, respectively. $\eta_{\mathrm{LoS}}$ and $\eta_{\mathrm{NLoS}}$ denote the additive losses incurred on top of the free-space pathloss for LoS and NLoS links, respectively [26]. $f_c$ denotes the carrier frequency, and $c$ denotes the speed of light. $P_{\mathrm{LoS}}$ is the line-of-sight probability of the UAV-ground link, which can be calculated by

$$P_{\mathrm{LoS}}(r,h) = \frac{1}{1 + a\exp\left(-b\left(\arctan\left(\frac{h}{r}\right) - a\right)\right)}. \tag{2}$$

Here, $(a, b, \eta_{\mathrm{LoS}}, \eta_{\mathrm{NLoS}})$ are environment-dependent parameters; for instance, in remote areas their values are (4.88, 0.43, 0.1, 21) [27]. In addition, the UAV-ground communication uses WiFi protocols with total bandwidth $B$. If $n$ IoT devices communicate with a UAV simultaneously, the bandwidth each IoT device obtains is

$$B_i = \rho B \xi(n), \tag{3}$$

where $\rho$ is the WiFi throughput efficiency factor. $\xi(n)$ is the WiFi channel utilization function, which decreases with the number of contending users $n$. Thus, the instantaneous ground-UAV and UAV-ground data rates can be calculated by

$$r_{GU} = \rho B \xi(n) \log_2\left(1 + \frac{E_i^e\, 10^{-L_i/10}}{\sigma^2}\right), \tag{4}$$

and

$$r_{UG} = \rho B \xi(n) \log_2\left(1 + \frac{E_i^{e-}\, 10^{-L_i/10}}{\sigma^2}\right), \tag{5}$$

respectively, where $E_i^{e-}$ denotes the UAV transmit power to ground IoT users, $L_i$ denotes the pathloss of the corresponding IoT user-UAV link, and $\sigma^2$ denotes the power of the Gaussian noise. For the satellite-ground communication, we consider a constant data rate $r_{SG}$, which is usually smaller than the UAV-ground data rate. The satellite is connected to the Internet/cloud through the satellite backbone network, and we denote the transmission rate between the satellite and the cloud by $r_{SC}$. The cloud has much higher computing capability than IoT devices and edge servers. Its processing rate for each task is denoted by $C^c$, and the usage cost for user $i$'s task $j$ is denoted by $B_{i,j}^c$.

B. Multi-user Multi-task SAG-IoT Computing Offloading

We consider $M$ IoT users and $N$ different computation applications. Each user runs all $N$ applications, leading to $M \times N$ computation tasks in the system. We also consider that the $N$ applications have priorities: if multiple tasks are scheduled simultaneously, the task with the smaller application number is transmitted/processed earlier than those with larger application numbers. For the $j$-th application, the input data size, the output data size, and the workload are denoted by $H_j^{in}$, $H_j^{out}$, and $Z_j$, respectively. These tasks can be executed locally at the IoT devices. However, IoT devices have limited energy and computing capability, so the computing tasks can also be offloaded to the UAV edge servers or further to the cloud through the satellites. The offloading decision is made in each time slot until all the $M \times N$ tasks are completed. At the beginning of time slot $t$, the remaining tasks are denoted by an $M \times N$ matrix $\mathbf{M}(t)$. Its element $m_{i,j}(t) = 1$ indicates that task $W_{ij}$ has not been completed, and $m_{i,j}(t) = 0$ otherwise. At time slot $t$, denote the decisions of processing tasks locally, offloading tasks to the edge, and offloading tasks to the cloud by the $M \times N$ matrices $\mathbf{X}^l(t)$, $\mathbf{X}^e(t)$, and $\mathbf{X}^c(t)$, respectively. Their binary elements $x_{ij}^l(t)$, $x_{ij}^e(t)$, and $x_{ij}^c(t)$ indicate whether task $W_{ij}$ is processed locally, offloaded to the edge, or offloaded to the cloud, respectively. Since task $W_{ij}$ can be scheduled to at most one of these options at time $t$, the offloading decision is constrained by

$$x_{ij}^l(t),\ x_{ij}^e(t),\ x_{ij}^c(t) \in \{0,1\}, \tag{6}$$

$$x_{ij}^l(t) + x_{ij}^e(t) + x_{ij}^c(t) \le m_{ij}(t). \tag{7}$$

The inequality in (7) is strict when an unfinished task is not scheduled at time slot $t$. If task $W_{ij}$ is processed locally or offloaded to the cloud at time $t$, we consider that the task is finished with a certain delay, and $m_{i,j}(t+1) = 0$. However, if $W_{ij}$ is offloaded to the UAV edge server, it may not be completed and returned to user $i$ by the end of slot $t$, for two reasons. Firstly, if multiple tasks are offloaded to one UAV edge server, some of them may not be completed within the time slot. Secondly, the UAVs are moving, so when task $W_{ij}$ is completed in the edge server, the result cannot be transmitted to user $i$ if user $i$ is out of the coverage area of the UAV.

C. Cost Model

The goal of computing task offloading is to minimize the system cost of executing the $M \times N$ tasks. In the considered SAG-IoT system, the system cost is composed of two parts, i.e., the delay cost and the energy and server usage cost.

1) Delay cost: If task $W_{ij}$ is scheduled at time slot $t$, the delay can be calculated according to the offloading decision. If the task is scheduled to be processed locally, the delay is

$$T_{ij}^l = \varepsilon(t-1) + t_{r,i}^l + \frac{Z_j}{C^l}, \tag{8}$$
where $\varepsilon$ is the length of a time slot, and $\varepsilon(t-1)$ is the elapsed time since the generation of the task. IoT devices have low computing capability, so at the beginning of time slot $t$ some tasks scheduled for local processing are likely not yet finished. $t_{r,i}^l$ is the time for user $i$ to complete the remaining local processing tasks. It can be calculated as the remaining local workload divided by the local processing capability $C^l$. If the task is offloaded to the UAV edge server and the result is returned to user $i$ within time slot $t$, the total delay of the task can be calculated by

$$T_{ij}^e = \varepsilon(t-1) + d_{ij}^e + \frac{\sum_{a=1}^{j} x_{ia}^e(t)\, H_a^{in}}{r_{GU}} + \frac{H_j^{out}}{r_{UG}}, \tag{9}$$

where $d_{ij}^e$ denotes the processing delay of $W_{ij}$ in the UAV edge server. This delay depends on the offloading decision and the VM resource allocation in the server, as described in Section IV. If multiple tasks of user $i$ are scheduled to the edge server, $\sum_{a=1}^{j} x_{ia}^e(t) H_a^{in} / r_{GU}$ is the time for transmitting the data of task $W_{ij}$ to the server. This term accounts for the transmission of tasks with higher priorities. Similarly, if the task is offloaded to the cloud through the satellite, the delay is calculated by

$$T_{ij}^c = \varepsilon t + \frac{Z_j}{C^c} + \frac{H_j^{in} + H_j^{out}}{r_{SG}} + \frac{H_j^{in} + H_j^{out}}{r_{SC}}. \tag{10}$$

2) Energy and server usage cost: The energy cost of processing $W_{ij}$ locally can be calculated by

$$L_{ij}^l = E^l \frac{Z_j}{C^l}. \tag{11}$$

If at time slot $t$, task $W_{ij}$ is offloaded to the UAV edge server and the result is successfully transmitted to user $i$, the energy and server usage cost can be calculated by

$$L_{ij}^e = E_i^e \sum_{t=1}^{v} x_{ij}^e(t) \frac{H_j^{in}}{r_{GU}(t)} + \alpha B_{ij}^e, \tag{12}$$

where $\alpha$ represents the weight of the UAV server usage cost over the IoT user energy consumption. The term $E_i^e \sum_{t=1}^{v} x_{ij}^e(t) H_j^{in} / r_{GU}(t)$ gives the total transmission energy consumption. It includes earlier attempts to offload the task to a UAV edge server whose results failed to return within the scheduling slot. Similarly, if task $W_{ij}$ is offloaded to the cloud, the energy and server usage cost can be calculated by

$$L_{ij}^c = E_i^c \frac{H_j^{in}}{r_{SG}} + \beta B_{ij}^c, \tag{13}$$

where $\beta$ denotes the weight of the cloud server usage cost over the IoT user energy consumption.

the UAV quickly, and thus executing such tasks may lead to excessive resources being allocated to the corresponding VM. For example, in Fig. 2, two VMs execute the tasks offloaded to the UAV edge server, and $t_{i,j}$ is the delay requirement of task $j$ in VM $i$. The delay requirement $t_{2,1}$ is very strict, so a large amount of computation resource must be allocated to VM2 to finish the corresponding task before its deadline. However, the total computation resource of an edge server is fixed. Little resource may then be left for VM1, and none of the three tasks in VM1 can be finished in time. Therefore, we jointly optimize the VM allocation and task scheduling in the UAV edge server to reduce the system sum delay.

[Fig. 2 diagram: a UAV edge server hosting VM1 (tasks with deadlines $t_{11}$, $t_{12}$, $t_{13}$) and VM2 (tasks with deadlines $t_{21}$, $t_{22}$), each drawn on a time axis starting at 0.]
Fig. 2. An example of joint VM allocation and task scheduling for UAV edge server.

In the considered problem, there are multiple kinds of applications (Apps), denoted by $\mathcal{A} = \{1, \dots, N\}$, and one UAV edge server with computation capability $C$ cycles/s. For the $m$-th App, there may be multiple offloaded tasks, denoted by $\mathcal{T}_m = \{1, \dots, N_m\}$. These tasks have the same computation workload but different maximum delay requirements. $Z_m$ denotes the computation workload of the $m$-th App's tasks. $\mathbf{C} = \{c_m \mid m \in \mathcal{A}\}$ denotes the computation resource variables, where $c_m$ is the computation resource allocated to the VM executing App $m$. $\mathbf{Y} = \{y_{m,n} \mid m \in \mathcal{A}, n \in \mathcal{T}_m\}$ denotes the decision variables on task execution. Here $y_{m,n} = 1$ if task $n$ of App $m$ is scheduled and executed, and $y_{m,n} = 0$ otherwise. Therefore, our sum delay minimization problem can be formulated as follows.

$$\begin{aligned}
\min_{\mathbf{C},\mathbf{Y}}\ & \sum_{m=1}^{N}\sum_{n=1}^{N_m}\left[ y_{m,n}\sum_{k=1}^{n} y_{m,k}\frac{Z_m}{c_m} + \varepsilon\left(1 - y_{m,n}\right)\right] \\
\text{s.t.}\ & C1:\ \sum_{k=1}^{n} y_{m,k}\frac{Z_m}{c_m} \le t_{m,n},\ \forall m \in \mathcal{A},\ \forall n \in \mathcal{T}_m \\
& C2:\ \sum_{m=1}^{N} c_m \le C
\end{aligned}$$
connection with the UAV. C1 restricts the maximum delay of each task if it is executed in the current time slot. C2 ensures that the total computation resources of the VMs do not exceed $C$.

Problem (14) is a mixed-integer program that is difficult to solve, as it involves the continuous variables $\mathbf{C}$ and the 0-1 integer variables $\mathbf{Y}$. Even if $\mathbf{C}$ is known, the residual subproblem is still a quadratic problem with 0-1 integer constraints, which is NP-hard with an indefinite matrix [28], [29]. Such a problem is commonly reformulated with a specific relaxation approach and then solved by convex optimization techniques. However, this method requires extensive iterations and reveals little insight into the scheduling policy. Thus, we are motivated to design an efficient low-complexity algorithm that obtains a suboptimal solution. In the proposed VM allocation and task scheduling algorithm, we assume that for each VM $m$, the delay requirements of the $N_m$ tasks have been sorted, i.e., $t_{m,n} \le t_{m,n+1}$. At the beginning, we allocate $c_m$ as if all tasks had been scheduled, i.e., $y_{m,n} = 1, \forall m \in \mathcal{A}, \forall n \in \mathcal{T}_m$. With all tasks scheduled, C1 requires $n Z_m / c_m \le t_{m,n}$ for every $n$. The smallest allocation that meets all deadlines of App $m$ is therefore

$$c_m = \max_{n \in \mathcal{T}_m}\left\{\frac{n Z_m}{t_{m,n}}\right\}, \quad \forall m \in \mathcal{A}. \tag{16}$$

Given the allocation results, if $\sum_{m=1}^{N} c_m > C$, not all tasks can be scheduled. Therefore, we choose not to schedule the task with the most stringent delay requirement, i.e., we let

$$y_{m^*,n^*} = 0, \tag{17}$$

where

$$(m^*, n^*) = \arg\max_{m \in \mathcal{A},\, n \in \mathcal{T}_m} \frac{n Z_m}{t_{m,n}}. \tag{18}$$

Then, we calculate the VM allocation $c_m$ again. This process is repeated until the condition $\sum_{m=1}^{N} c_m \le C$ is satisfied, which yields the VM allocation $c_m$ and the task scheduling $\mathbf{Y}$. Note that for a generic $\mathbf{Y}$, the VM allocation is calculated by

$$c_m = \max_{n \in \mathcal{T}_m:\, y_{m,n}=1}\left\{\frac{\sum_{k=1}^{n} y_{m,k} Z_m}{t_{m,n}}\right\}, \quad \forall m \in \mathcal{A}, \tag{19}$$

and the unscheduled task is selected by

$$(m^*, n^*) = \arg\max_{m \in \mathcal{A},\, n \in \mathcal{T}_m:\, y_{m,n}=1} \frac{\sum_{k=1}^{n} y_{m,k} Z_m}{t_{m,n}}. \tag{20}$$

The full algorithm of edge server VM allocation and task scheduling is shown in Algorithm 1. In the worst case, the UAV edge server cannot finish any offloaded task in time. The algorithm then requires $N'(N'+1)/2$ comparisons, where $N'$ is the total number of tasks offloaded to the UAV edge server. Even this worst-case complexity, $O(N'^2)$, is low, and therefore the proposed algorithm can work efficiently in the dynamic SAGIN environment.

Algorithm 1 VM allocation and task scheduling in edge server
1: Input: $C$, $\mathcal{T}_m$, $t_{m,n}$, $\varepsilon$.
2: Output: VM allocation $c_m$, task scheduling $\mathbf{Y}$.
3: Initialize $y_{m,n} = 1, \forall m, n$, and $c_m$ according to (19).
4: while $\sum_{m=1}^{N} c_m > C$ do
5:     Update $y_{m,n}$ according to (17) and (20).
6:     Update $c_m$ according to (19).
7: end while
8: return $c_m$, $\mathbf{Y}$

V. COMPUTATION OFFLOADING PROBLEM FORMULATION

We design an online computing offloading approach for the SAG-IoT system. At each time slot, the computing tasks of IoT devices are processed locally, offloaded to the UAV edge server, or offloaded to the cloud server through the satellite. The goal is to minimize the total system cost in terms of the delay of the tasks, the energy consumption of the IoT users, and the edge and cloud server usage costs. This can be achieved by modeling the computing offloading decisions as an MDP.

An MDP is defined by a tuple $(S, A, T, R)$. $S$ is the set of possible system states, $A$ is the set of actions, and $T = \{p(s'|s,a)\}$ is the set of transition probabilities. $R: S \times A \mapsto \mathbb{R}$ is a real-valued reward (or cost) function that applies when the system is in state $s \in S$ and action $a \in A$ is taken. A policy $\pi$ is a mapping from $S$ to $A$. The MDP of the SAG-IoT computing offloading problem is defined as follows.

1) States: At the beginning of time slot $t$, the network state is defined as $\mathbf{M}(t) \otimes \mathbf{T}_r^l(t) \otimes \mathbf{PL}(t) \otimes \mathbf{PL}(t-1) \otimes \mathbf{PL}(t-2) \otimes \cdots \otimes \mathbf{PL}(t-t_q)$. Here $\mathbf{T}_r^l(t) = \{t_1^l(t), t_2^l(t), \dots, t_M^l(t)\}$ represents the remaining time for each user to complete its locally processed tasks. $\mathbf{PL}(t) = \{PL_1(t), PL_2(t), \dots, PL_M(t)\}$ is the vector of pathloss values of all users to their associated UAVs. The system state includes the pathloss information of the current and the previous $t_q$ time slots in order to learn and predict the pathloss.

2) Actions: At the beginning of time slot $t$, the system takes the action of scheduling the tasks of the users. That is, it determines the matrices $\mathbf{X}^l(t)$, $\mathbf{X}^e(t)$, and $\mathbf{X}^c(t)$, or equivalently $x_{ij}^l$, $x_{ij}^e$, and $x_{ij}^c$, $\forall i, j$. Therefore, we denote $a(t) = \{\mathbf{X}^l(t), \mathbf{X}^e(t), \mathbf{X}^c(t)\}$. At time slot 0, each of the $MN$ tasks can be left unscheduled, processed locally, offloaded to the edge, or offloaded to the cloud. There are thus $4^{MN}$ possible actions, which is a very large number when $M$ and $N$ are large.

3) Transition probability: Since the UAV-user pathloss is not affected by the actions, the system transition probability can be calculated by

$$p(s_{t+1}|s_t, a_t) = p(\mathbf{PL}(t+1)|\mathbf{PL}(t)) \cdot p(\mathbf{T}_r^l(t+1)|\mathbf{T}_r^l(t), a_t) \cdot p(\mathbf{M}(t+1)|\mathbf{M}(t), a_t). \tag{21}$$

Specifically, if the UAV trajectory and flying speed are planned to be fixed, $p(\mathbf{PL}(t+1)|\mathbf{PL}(t))$ is 1 for a specific $\mathbf{PL}(t+1)$ and 0 otherwise. However, due to the uncertainties in UAV mobility, $p(\mathbf{PL}(t+1)|\mathbf{PL}(t))$ is difficult to model. $\mathbf{T}_r^l(t+1)$ can be calculated by

$$T_{r,i}^l(t+1) = \max\left\{T_{r,i}^l(t) + \sum_{j=1}^{N} x_{ij}^l(t)\frac{Z_j}{C^l} - \varepsilon,\ 0\right\}. \tag{22}$$

The transition probability $p(\mathbf{M}(t+1)|\mathbf{M}(t), a_t)$ is difficult to model accurately. For example, if a task is offloaded to a UAV edge server, whether the task can be completed within the time slot depends on the UAV data transmission rate, UAV computation resource
allocation, other users' decisions, and UAV mobility, which are dynamic and correlated.

4) Reward: To minimize the weighted sum of delay, energy, and server usage cost, we use the cost function $C(s_t, a_t) = \sum_{i,j} C_{i,j}(s_t, a_t)$ at time slot $t$. Here $C_{i,j}(s_t, a_t)$ is the cost of task $W_{ij}$, calculated as follows.
1) If $m_{ij}(t) = 0$, the task has already been completed, and thus $C_{ij}(s_t, a_t) = 0$.
2) If $m_{ij}(t) = 1$ and $x_{ij}^l + x_{ij}^e + x_{ij}^c = 0$, the task is not scheduled in this time slot, and thus a delay of $\varepsilon$ is introduced. We define $C_{ij}(s_t, a_t) = \varpi_i \varepsilon$, where $\varpi_i$ is user $i$'s weight on the delay.
3) If $m_{ij}(t) = 1$ and $x_{ij}^l + x_{ij}^e + x_{ij}^c = 1$, then
$$C_{ij}(s_t, a_t) = \varpi_i\left(x_{ij}^l\left(T_{ij}^l - \varepsilon t\right) + x_{ij}^e\left(T_{ij}^e - \varepsilon t\right) + x_{ij}^c\left(T_{ij}^c - \varepsilon t\right)\right) + x_{ij}^l L_{ij}^l + x_{ij}^e L_{ij}^e + x_{ij}^c L_{ij}^c.$$

Define the value function $V$ of state $s$ as the expected long-term discounted cost starting from $s$ under policy $\pi$, i.e.,

$$V(s|\pi) = \mathbb{E}\left[\sum_{t=0}^{\infty}\gamma^t C(s_t, a_t)\,\Big|\, s_0 = s, \pi\right], \tag{23}$$

trace of state-action sequence $s_0, a_0, s_1, a_1, \dots, s_{t_{\max}}, a_{t_{\max}}$ in an episode following $\pi(\cdot|\cdot, \theta)$. Here $t_{\max}$ denotes the preset maximum number of time slots for processing all tasks. Then, $J(\theta)$ is the value function of the start state $s_0$:

$$J(\theta) \doteq V_{\pi_\theta}(s_0) = \mathbb{E}_{\pi_\theta}\left[\sum_{k=0}^{t_{\max}}\gamma^k C(s_k, a_k)\,\Big|\,\pi(\cdot|\cdot, \theta)\right]. \tag{25}$$

To learn the policy parameter $\theta$ that minimizes $J(\theta)$, we can use gradient descent to gradually update $\theta$ by

$$\theta_{t+1} = \theta_t - \phi\nabla J(\theta_t), \tag{26}$$

where $\phi$ represents the learning rate. According to the policy gradient theorem [32], we have

$$\begin{aligned}
\nabla J(\theta_t) &= \mathbb{E}_\pi\left[\sum_a q_\pi(s_t, a)\,\nabla_\theta \pi(a|s_t, \theta)\right] \\
&= \mathbb{E}_\pi\left[\sum_a \pi(a|s_t, \theta)\, q_\pi(s_t, a)\, \frac{\nabla_\theta \pi(a|s_t, \theta)}{\pi(a|s_t, \theta)}\right]
\end{aligned}$$
where $\phi'$ is the learning rate, and the loss function $L(\omega)$ is defined as
$$L(\omega) = \big|\hat{V}(s_t, \omega) - (C_t + \gamma \hat{V}(s_{t+1}, \omega))\big|^2. \tag{31}$$
Finally, because deep neural networks can approximate complex functions, we employ a deep learning architecture to learn both the policy, parameterized by $\theta$, and the estimated state-value function. The full proposed online computing offloading approach for SAG-IoT is shown in Algorithm 2, where $\phi$ and $\phi'$ are the learning rates for the actor and the critic, respectively.

The implementation of the proposed RL-based offloading approach is shown in Fig. 3. It is composed of the SAGIN environment, the computing offloading reward evaluator, an actor network, a critic network, and a temporal-difference component. The system state is observed from the current SAGIN environment and fed to both the actor network and the critic network. The actor network generates the action $a$ according to $a = \pi_\theta(s)$ and updates the policy parameter $\theta$. At time slot $t$, for an arbitrary task $W_{ij}$, the decision $x_{ij}(t)$ has four possibilities: not scheduled, process locally, offload to edge, and offload to cloud. We therefore map these four decisions of $x_{ij}$ to the integers 0, 1, 2, and 3, respectively. The actor network has two output layers, $\sigma$ and $\mu$, which together define $M \times N$ normally distributed random variables representing the actions of the tasks. The critic network estimates the value function $\hat{V}(s_t, \omega)$ and updates the parameter $\omega$. The reward evaluator evaluates the reward of each state-action pair. This reward is used to calculate the temporal-difference (TD) error $\eta = C_t + \gamma \hat{V}(s_{t+1}, \omega) - \hat{V}(s_t, \omega)$, which drives the updates of both the policy parameter $\theta$ and the critic network parameter $\omega$.

Algorithm 2 Deep actor-critic based online computing offloading
1: Input: IoT user information: location, $C^l$, $E^l$, $E^e_i$, $E^{e-}_i$ and $E^c_i$
   UAV edge server information: mobility traces, $C^e_v$, $B^e_{ij}$
   Cloud related information: $r_{SG}$, $r_{SC}$, $C^c$, $B^c_{ij}$
   Task information: $H^{in}$, $H^{out}$, $Z$
2: Output: Optimal computing offloading decision $X(t)$
3: Randomly initialize critic network $\hat{V}(s, \omega)$ and actor network $\pi(s, a \mid \theta)$
4: for episode = 1, G do
5:   Initialize a random vector $\mathcal{N}$ as the noise for action exploration
6:   Observe the initial state $s_1$
7:   for time slot t = 1, $t_{\max}$ do
8:     Select action $a_t = \pi(s_t \mid \theta) + \mathcal{N}_t$
9:     Execute $a_t$ and observe the cost $C_t$ and state $s_{t+1}$
10:    $\eta \leftarrow C_t + \gamma \hat{V}(s_{t+1}, \omega) - \hat{V}(s_t, \omega)$
11:    Update $\omega \leftarrow \omega + \phi' \eta \nabla_\omega \hat{V}(s_t, \omega)$
12:    Update $\theta \leftarrow \theta - \phi \eta \gamma^t \frac{\nabla_\theta \pi(a_t \mid s_t, \theta)}{\pi(a_t \mid s_t, \theta)}$
13:  end for
14: end for
15: return

TABLE II
SIMULATION PARAMETERS
Parameter | Value | Parameter | Value
$M$ | 30 | $N$ | 5
$C^l$ | 200 MC/s | $E^l$ | 141 mW
$C^e$ | 3 GC/s | $E^e$, $E^{e-}$, $E^c$ | 200 mW
$C^c$ | 10 GC/s | $B$ | 20 MHz
$h$ | 90 m | $r_{SG}$ | 10 Mbps
$\alpha$ | $10^{-10}$ J/cycle | $r_{SC}$ | 10 Mbps
$\beta$ | $4 \times 10^{-10}$ J/cycle | $\varpi_i$ | 0.2 J/s

VII. PERFORMANCE EVALUATION

A. Simulation Configurations

In this section, we evaluate the proposed joint VM resource allocation and task scheduling scheme for the UAV edge server, and the RL-based online computing offloading approach for the SAG-IoT system. In the simulation, we consider a remote 1 km × 1 km square area with $M = 30$ IoT users deployed at fixed locations. Each IoT user runs $N = 5$ different applications, and thus each user has 5 tasks to process. We select ARM Cortex-M based IoT devices as the ground users. Referring to [35] and [36], we set the IoT device computing capability $C^l$ to 200 MC/s (MC = $10^6$ cycles), and the power consumption for local task processing to 141 mW. As defined in [37], the transmission and reception power of IoT users with UAVs and satellites, i.e., $E^e$, $E^{e-}$, and $E^c$, are set to 200 mW. Five UAVs serve as flying edge servers for IoT computing. Following Wu's work [38], the UAV trajectories are planned to maximize the minimum throughput, using the practical UAV-ground propagation channel model in (1). Referring to [39], the edge server's computation resource $C^e$ is set to 3 GC/s (GC = $10^9$ cycles). The cloud server's computation resource assigned to each task, i.e., $C^c$, is set to 10 GC/s. For the satellite and remote cloud, we assume that one LEO satellite fully covers the area within one episode of computing offloading. The satellite-ground communication rate $r_{SG}$ is set to 10 Mbps, which is the average observed transmission rate in the high-throughput satellite communication system ViaSat-1. The satellite-cloud data rate is also constrained by the satellite-ground transmission rate, and therefore we set $r_{SC} = r_{SG} = 10$ Mbps. Different computation tasks may have different computation-to-data ratios. For simplicity, we adopt the x264 VBR encoding ratio of 1300 cycles/byte, i.e., $Z = 1300 H^{in}$ [40]. $H^{in}_j$ and $H^{out}_j$ are randomly chosen between 5 MB and 15 MB, and between 1 MB and 5 MB, respectively. We set the usage costs of the edge server and cloud server, i.e., $B^e_{ij}$ and $B^c_{ij}$, to the number of CPU cycles needed to execute task $W_{ij}$, i.e., $W_{ij}$'s workload. In addition, $\alpha = 10^{-10}$ J/cycle, $\beta = 4 \times 10^{-10}$ J/cycle, and $\varpi_i = 1$ J/s for each user $i$. Table II lists the detailed simulation parameters, which apply unless otherwise specified.
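The following is a minimal sketch of the per-slot bookkeeping from Section V, instantiated with the parameters above. It is not the authors' simulator. It derives a task's workload from the x264 ratio ($Z = 1300 H^{in}$), applies the local-queue update (22), and evaluates the per-task cost $C_{ij}(s_t, a_t)$ from the three reward cases. The sketch rests on several assumptions:
- 1 MB = $10^6$ bytes.
- The completion times $T$ and non-delay costs $L$ are passed in as given inputs, because their formulas are defined elsewhere in the paper.
- The cloud terms use $T^c_{ij}$ and $L^c_{ij}$.

All function names are hypothetical.

```python
import random

CYCLES_PER_BYTE = 1300   # x264 VBR computation-to-data ratio [40]
C_LOCAL = 200e6          # C^l: IoT device capability in cycles/s (Table II)


def workload(h_in_bytes: float) -> float:
    """Z = 1300 * H^in, in CPU cycles."""
    return CYCLES_PER_BYTE * h_in_bytes


def sample_task(rng: random.Random) -> tuple[float, float]:
    """Draw (H^in, H^out) uniformly from [5, 15] MB and [1, 5] MB.

    Assumption: 1 MB = 1e6 bytes.
    """
    return rng.uniform(5e6, 15e6), rng.uniform(1e6, 5e6)


def next_local_backlog(backlog: float, x_local: list[int], z: list[float],
                       eps: float, c_local: float = C_LOCAL) -> float:
    """Eq. (22): max{T(t) + sum_j x^l_ij(t) Z_j / C^l - eps, 0}."""
    added = sum(x * zj for x, zj in zip(x_local, z)) / c_local
    return max(backlog + added - eps, 0.0)


def task_cost(pending: int, x: tuple[int, int, int],
              T: tuple[float, float, float], L: tuple[float, float, float],
              weight: float, eps: float, t: int) -> float:
    """Per-task cost C_ij(s_t, a_t) following the three reward cases.

    x = (x^l, x^e, x^c); T and L hold the local/edge/cloud completion
    times and non-delay costs, treated here as given inputs.
    """
    if pending == 0:        # case 1: task already completed
        return 0.0
    if sum(x) == 0:         # case 2: not scheduled, pay weight * eps
        return weight * eps
    # case 3: exactly one of x^l, x^e, x^c equals 1
    delay = sum(xk * (Tk - eps * t) for xk, Tk in zip(x, T))
    return weight * delay + sum(xk * Lk for xk, Lk in zip(x, L))
```

For example, a 10 MB input gives $Z = 1.3 \times 10^{10}$ cycles, i.e., 65 s of local processing at $C^l = 200$ MC/s. This illustrates why local processing is rarely preferred in the later results.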
[Figure: actor (policy) network with $\mu$ and $\sigma$ outputs and an action-mapping stage producing $M \times N$ actions $a = \pi_\theta(s)$; critic (value) network $v(s, \omega)$; SAGIN environment (satellite network, cloud server, UAV edge server VMs, IoT devices) supplying the state $s$; reward evaluator; TD component.]
Fig. 3. RL-based computing offloading approach. The proposed approach implements two components: an actor network that updates the policy, and a critic network that evaluates the value function and guides the update of the offloading policy.
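Algorithm 2 trains deep actor and critic networks. The sketch below replaces them with linear function approximators so that the update rules are easy to inspect. It is a simplified, hypothetical illustration, not the authors' implementation. It assumes the following:
- a Gaussian policy with mean $\theta \cdot s$ and fixed $\sigma$ over a scalar action;
- a linear critic $\hat{V}(s, \omega) = \omega \cdot s$;
- illustrative learning rates.

Since $L(\omega) = \eta^2$, a semi-gradient descent step on (31) moves $\omega$ along $+\eta\nabla_\omega\hat{V}(s_t, \omega)$. The actor step is gradient descent on the expected cost and uses $\nabla_\theta\pi/\pi = \nabla_\theta \ln \pi$.

```python
def dot(u: list[float], v: list[float]) -> float:
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def ac_step(theta: list[float], omega: list[float], s: list[float], a: float,
            cost: float, s_next: list[float], t: int, *, gamma: float = 0.9,
            sigma: float = 0.5, lr_actor: float = 1e-3,
            lr_critic: float = 1e-2) -> tuple[list[float], list[float], float]:
    """One actor-critic update for cost minimization (hypothetical helper).

    Returns the updated (theta, omega) and the TD error eta.
    """
    # TD error with the current critic: eta = C_t + gamma*V(s_{t+1}) - V(s_t).
    eta = cost + gamma * dot(omega, s_next) - dot(omega, s)
    # Critic: semi-gradient descent on L(omega) = eta**2 (factor 2 folded
    # into lr_critic); for a linear critic, grad_omega V(s_t, omega) = s.
    new_omega = [w + lr_critic * eta * si for w, si in zip(omega, s)]
    # Actor: for a ~ N(theta . s, sigma**2),
    # grad_theta pi / pi = grad_theta ln pi = (a - mu) * s / sigma**2.
    mu = dot(theta, s)
    score = [(a - mu) * si / sigma ** 2 for si in s]
    # Gradient descent on the expected cost J(theta), cf. (26).
    new_theta = [th - lr_actor * eta * gamma ** t * g
                 for th, g in zip(theta, score)]
    return new_theta, new_omega, eta
```

As a sanity check, repeat the step on a self-loop transition ($s_{t+1} = s_t$) with cost 2 and $\gamma = 0.9$. The critic weight then approaches $2/(1 - 0.9) = 20$, the discounted cost-to-go of that loop.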
B. VM Computing Resource Allocation and Task Scheduling

We first evaluate the proposed VM computing resource allocation and task scheduling algorithm. We compare the heuristic algorithm with the ‘Brute-force’ method and the ‘Random’ method. In ‘Brute-force’, exhaustive search is used to find the optimal set of unscheduled tasks; this achieves upper-bound performance but has high computational complexity. In ‘Random’, unscheduled tasks are selected randomly.

Fig. 4 shows the delay performance of the proposed algorithm. Fig. 4(a) shows the average delay with respect to the UAV edge server computing resource $C^e$. As $C^e$ increases, the average delay of all three methods decreases. A more capable server reduces the average processing time, so more tasks can be scheduled to satisfy their delay requirements. Fig. 4(b) shows the average delay with respect to the total number of tasks offloaded to the considered edge server, with $C^e$ set to 10 GC/s. More tasks lead to a higher average delay, since more tasks contend for the limited computing resources and fewer tasks complete in time. In both figures, the proposed heuristic algorithm performs very close to the ‘Brute-force’ method, which demonstrates its efficiency.

Fig. 5 compares the run time of the proposed heuristic algorithm and the ‘Brute-force’ method. As the total number of tasks increases, the run time of the ‘Brute-force’ method grows exponentially. This is because the ‘Brute-force’ method uses exhaustive search, and more tasks lead to an exponentially growing search space. In contrast, the run time of the proposed heuristic algorithm remains very small as the number of tasks increases. The zoomed-in run time of the heuristic algorithm clearly grows quadratically with the number of offloaded tasks, which validates our analysis in Section IV. In summary, the proposed algorithm achieves near-optimal performance with very low computational complexity. It is therefore suitable for allocating UAV edge server resources under dynamic network conditions.

C. Deep RL-based IoT Computing Offloading

In this part, we evaluate and compare the performance of the proposed RL-based SAG-IoT computing offloading approach. To show the efficiency of our proposed approach, we compare it with two other computing offloading approaches, ‘Random’ and ‘Greedy on edge’, described as follows.
1) ‘Random’: each task randomly selects a time slot $t \in \{1, 2, \ldots, t_{\max}\}$ and an offloading decision (local, edge, or cloud).
2) ‘Greedy on edge’: edge computing usually provides a lower computing delay at a relatively low price. Each user therefore offloads all tasks to the UAV edge server if it is within the coverage of a UAV. Otherwise, the user decides to wait, process locally, or offload to the cloud with certain probabilities. In the simulation, we set these probabilities to 0.8, 0.1, and 0.1, respectively.

Fig. 6 shows the convergence performance of the proposed RL-based computing offloading algorithm. The total cost is calculated as the sum of the costs of all tasks, which
[Figure: Fig. 4(a), average total delay (s) vs. UAV edge server computing resource ($10^9$ cycles), and Fig. 4(b), average total delay (s) vs. total number of tasks, each for Heuristic, BruteForce, and Random; Fig. 5, run time of Heuristic and BruteForce vs. total number of tasks, with a zoomed-in view; Fig. 6, total cost vs. episode.]
(b) Average total delay vs. total number of tasks.
Fig. 4. Performance of the proposed VM computing resource allocation and task scheduling algorithm.
Fig. 6. Convergence performance of our proposed algorithm.
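The two baselines of Section VII-C can be sketched as per-task decision rules. The sketch makes the following assumptions:
- Decision codes follow the mapping used for the actor's output (0 = not scheduled, 1 = local, 2 = UAV edge, 3 = satellite-cloud).
- A task's UAV coverage in the current slot is given as a Boolean.
- ‘Random’ draws the slot and the decision uniformly.

The function names are hypothetical.

```python
import random

# Decision codes: 0 = wait (not scheduled), 1 = local, 2 = UAV edge, 3 = satellite-cloud.
WAIT, LOCAL, EDGE, CLOUD = 0, 1, 2, 3


def random_policy(t_max: int, rng: random.Random) -> tuple[int, int]:
    """'Random': pick a slot in {1, ..., t_max} and one of local/edge/cloud."""
    return rng.randint(1, t_max), rng.choice([LOCAL, EDGE, CLOUD])


def greedy_on_edge(in_uav_coverage: bool, rng: random.Random) -> int:
    """'Greedy on edge': offload to the UAV when covered; otherwise wait,
    process locally, or offload to the cloud with probabilities 0.8/0.1/0.1."""
    if in_uav_coverage:
        return EDGE
    return rng.choices([WAIT, LOCAL, CLOUD], weights=[0.8, 0.1, 0.1])[0]
```

When a user is outside coverage, ‘Greedy on edge’ waits in most slots. This is consistent with its higher count of UAV-edge selections discussed later.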
[Figure: two panels of total cost for ProposedAC, Greedy, and Random; Fig. 8, energy consumption and delay per offloading mechanism (ProposedAC, Random, Greedy); Fig. 10, number of selections of each offloading means (local, UAV-edge, satellite-cloud) per offloading mechanism.]
Fig. 8. The energy consumption and weighted delay.
Fig. 10. Offloading means selection.
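Fig. 10 counts how often each offloading means is selected. The actor's $\sigma$ and $\mu$ outputs define $M \times N$ Gaussian variables, which are mapped to the decision codes 0 (not scheduled), 1 (local), 2 (UAV edge), and 3 (satellite-cloud). The paper does not state the exact continuous-to-integer rule. The sketch below assumes rounding half up and clipping to [0, 3], then tallies the selected means. All names are hypothetical.

```python
import math
import random
from collections import Counter

MEANS = {0: "wait", 1: "local", 2: "UAV-edge", 3: "satellite-cloud"}


def map_action(sample: float) -> int:
    """Round half up and clip to the decision codes {0, 1, 2, 3} (assumed rule)."""
    return min(3, max(0, math.floor(sample + 0.5)))


def sample_decisions(mu: list[list[float]], sigma: list[list[float]],
                     rng: random.Random) -> list[list[int]]:
    """Draw one N(mu, sigma^2) sample per task (M x N grid) and map it."""
    return [[map_action(rng.gauss(m, s)) for m, s in zip(mu_row, sd_row)]
            for mu_row, sd_row in zip(mu, sigma)]


def tally_means(decisions: list[list[int]]) -> Counter:
    """Count selections of each offloading means, ignoring unscheduled tasks."""
    return Counter(MEANS[d] for row in decisions for d in row if d != 0)
```

Python's built-in `round` uses round-half-to-even, so `math.floor(x + 0.5)` is used here instead. This keeps ties such as 2.5 from mapping to different codes than 1.5 or 3.5 would suggest.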
total cost increases with $\beta$, because a larger $\beta$ increases $\beta B^c$, which is a component of the total cost. The total cost of the proposed approach also increases faster than that of the other two approaches. In the current simulation setting, satellite-cloud offloading can perform relatively better than local processing and UAV-edge processing if properly chosen. The proposed approach therefore learns from the environment to choose cloud offloading with higher probability. Fig. 10 confirms this; it shows the number of selections of each offloading means for each offloading approach. The proposed approach selects satellite-cloud more frequently than the other two offloading means. Compared to satellite-cloud, local processing results in longer delay due to the weak local computation capability. UAV-edge processing offers a high transmission rate and low server usage cost, but may suffer from contention and high UAV mobility. The ‘Random’ and ‘Greedy’ approaches select local processing and satellite-cloud almost the same number of times. The ‘Greedy’ approach selects UAV-edge more often, since it waits for a future UAV connection with high probability when no UAV is currently available.

Fig. 11 shows the total cost with respect to the weight on the delay, i.e., $\varpi$. As $\varpi$ increases, the total cost of all three offloading approaches increases, due to the increase of $\varpi T$, the delay component of the total cost. However, the proposed offloading approach achieves the lowest total cost and a lower increase rate than the other two approaches. This is because it can learn from the environment an optimal policy to reduce the total task delay.

VIII. CONCLUSION

In this paper, we have investigated the IoT computing offloading problem in SAGIN. We have proposed a joint VM allocation and task scheduling mechanism to efficiently allocate the computing resources to different VMs in the UAV edge server. To offload the computation-intensive tasks, we have proposed an RL-based computing offloading approach that handles the multidimensional SAGIN resources and learns the dynamic network conditions. Deep neural networks, policy gradient, and actor-critic methods have been employed to improve the learning performance. Simulation results have validated the convergence and efficiency of the proposed approaches. Our work can offer valuable insights into the important yet underexplored field of edge/cloud computing in
[Figure: total cost of ProposedAC, Greedy, and Random.]

[13] J. Liu, Y. Shi, Z. M. Fadlullah, and N. Kato, “Space-air-ground integrated network: A survey,” IEEE Commun. Surveys Tuts., vol. 20, no. 4, pp. 2714–2741, 2018.
[14] H. Nishiyama, Y. Tada, N. Kato, N. Yoshimura, M. Toyoshima, and N. Kadowaki, “Toward optimized traffic distribution for efficient network capacity utilization in two-layered satellite networks,” IEEE Trans. Veh. Technol., vol. 62, no. 3, pp. 1303–1313, 2013.
[38] Q. Wu, Y. Zeng, and R. Zhang, “Joint trajectory and communication design for multi-UAV enabled wireless networks,” IEEE Trans. Wireless Commun., vol. 17, no. 3, pp. 2109–2121, 2018.
[39] M.-H. Chen, B. Liang, and M. Dong, “Joint offloading and resource allocation for computation and communication in mobile cloud with computing access point,” in Proc. IEEE INFOCOM, 2017, pp. 1–9.
[40] A. P. Miettinen and J. K. Nurminen, “Energy efficiency of mobile clients in cloud computing,” in Proc. HotCloud, 2010, pp. 4–11.

Hongli He received the B.Sc. degree in information engineering from Zhejiang University, Hangzhou, China, in 2014, where he is currently pursuing the Ph.D. degree with the Institute of Information and Communication Engineering. His current research interests include video streaming in vehicular ad hoc networks, LTE in unlicensed spectrum, and edge cloud computing.