This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI 10.1109/JSAC.2019.2906789, IEEE Journal on Selected Areas in Communications

Space/Aerial-Assisted Computing Offloading for IoT Applications: A Learning-based Approach

Nan Cheng, Member, IEEE, Feng Lyu, Member, IEEE, Wei Quan, Member, IEEE, Conghao Zhou, Student Member, IEEE, Hongli He, Student Member, IEEE, Weisen Shi, Student Member, IEEE, and Xuemin (Sherman) Shen, Fellow, IEEE

N. Cheng is with the School of Telecommunication, Xidian University, Xian, China, and the Electrical and Computer Engineering Department, University of Waterloo, Waterloo, ON, N2L 3G1, Canada (email: [email protected]).
F. Lyu, C. Zhou, W. Shi, and X. Shen are with the Electrical and Computer Engineering Department, University of Waterloo, Waterloo, ON, N2L 3G1, Canada (emails: {f2lyu,c89zhou,w46shi,sshen}@uwaterloo.ca).
W. Quan is with the School of Electronic and Information Engineering, Beijing Jiaotong University, Beijing, China (email: [email protected]). W. Quan is the corresponding author.
H. He is with the School of Information Engineering, Zhejiang University, Hangzhou, China (email: hongli [email protected]).

0733-8716 (c) 2018 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.

Abstract—Internet of things (IoT) computing offloading is a challenging issue, especially in remote areas where common edge/cloud infrastructure is unavailable. In this paper, we present a space-air-ground integrated network (SAGIN) edge/cloud computing architecture for offloading computation-intensive applications under remote energy and computation constraints, where flying unmanned aerial vehicles (UAVs) provide near-user edge computing and satellites provide access to cloud computing. Firstly, for UAV edge servers, we propose a joint resource allocation and task scheduling approach to efficiently allocate the computing resources to virtual machines and schedule the offloaded tasks. Secondly, we investigate the computing offloading problem in SAGIN and propose a learning-based approach to learn the optimal offloading policy from the dynamic SAGIN environments. Specifically, we formulate the offloading decision making as a Markov decision process where the system state considers the network dynamics. To cope with the system dynamics and complexity, we propose a deep reinforcement learning-based computing offloading approach to learn the optimal offloading policy on-the-fly, where we adopt the policy gradient method to handle the large action space and the actor-critic method to accelerate the learning process. Simulation results show that the proposed edge virtual machine allocation and task scheduling approach can achieve near-optimal performance with very low complexity, and that the proposed learning-based computing offloading algorithm not only converges fast, but also achieves a lower total cost compared with other offloading approaches.

Index Terms—Computing offloading, edge computing, space-air-ground, IoT, reinforcement learning

I. INTRODUCTION

With the rapid development of 5G networks and Internet of things (IoT), a myriad of promising applications and services have emerged, such as virtual reality, HD live streaming, autonomous driving, industry automation, smart home, and so forth, which reap the benefits provided by 5G networks, such as ultra-high data rate, low latency, high reliability, and massive connections [1], [2]. However, besides efficient and reliable communication, a wide spectrum of applications also require massive computing capabilities. For example, virtual reality and HD video streaming require a large amount of computing resources for rendering and video encoding/decoding, and autonomous vehicles rely on computing for artificial intelligence (AI)-based steering control. These computation-intensive applications pose great challenges to the battery and computing capabilities of the resource-constrained end devices, especially the IoT devices, which motivates cloud computing, in which computation-intensive applications are offloaded to the cloud servers with centralized and abundant computation resources.

Although cloud computing can significantly reduce the computation delay and the energy consumption of the users, it may fail to meet the demands of delay-sensitive applications, such as mobile gaming and augmented reality, since the long transmission distances between end users and the cloud servers result in long transmission delays. To address this issue, mobile edge computing (MEC) has been extensively investigated, where the computing resources in the network edge are employed to provide efficient and flexible computing services. In 5G wireless systems, ultra-dense network edge devices will be deployed, such as macro/small cell base stations and WiFi access points, which can provide an exponentially growing amount of edge computing resources. Many significant issues in MEC have been extensively investigated, including the offloading task model [3], [4], energy efficiency [5], [6], [7], latency reduction [8], [9], [10], and joint optimization of communication and computing [11], [12].

However, 5G networks may fail to provide ubiquitous coverage to suburban and rural areas, where IoT devices could be widely deployed to execute certain applications with relatively high computing requirements. For example, the fusion of sensing information, especially the handling of high-definition sound or video information, will quickly drain the battery of the sink nodes and result in large processing delays. Due to the lack of terrestrial access network coverage, the typical edge and cloud computing paradigms cannot be applied in such scenarios. To this end, we propose to employ the space-air-ground integrated network (SAGIN) architecture for the computing offloading of remote IoT applications. SAGIN integrates the satellite network and aerial network with the terrestrial network to provide seamless and flexible network coverage and services to large areas, and thus can be applied in many promising fields, such as intelligent transportation systems, remote area monitoring, disaster rescue, and large-scale high-speed mobile Internet access [13]. SAGIN is a multidimensional heterogeneous network consisting of three

network segments, i.e., the satellite network, aerial network, and terrestrial network. Each network segment possesses different resources and is affected by different limitations. The Low Earth Orbit (LEO) and geostationary (GEO) satellites constitute a hierarchical network where LEO satellites provide high-speed access and GEO satellites relay the data between LEO satellites for long distance transmission [14]. The aerial network, including flying unmanned aerial vehicles (UAVs), high altitude platforms (HAPs), and communication balloons, can be deployed on demand at locations with burst data traffic to offer high-speed and dynamic network services, such as dynamic coverage, edge computing, crowdsensing, etc. [15], [16]. In the proposed SAG-IoT computing offloading architecture, the aerial network nodes can serve as flying edge servers, which provide the IoT devices with low-delay edge computing. On the other hand, satellite communication, although it may have a lower communication rate and higher transmission delay, can provide always-on cloud computing through seamless coverage and satellite backbone networks [17]. However, employing SAGIN in IoT computing offloading introduces several challenging issues. Firstly, the high mobility of the aerial network results in dynamic channel conditions and coverage, leading to varying server availability and communication delay, which should be carefully handled to guarantee the performance of the SAG-IoT system. Secondly, different network segments in SAGIN possess distinct network conditions and resource constraints, and it is non-trivial to design an efficient computing offloading approach considering the complex and dynamic network conditions and resources.

In this paper, we present a flexible joint communication and computation SAGIN framework to provide powerful edge/cloud computing services to remote IoT users. Under the framework, we propose an efficient computing offloading approach which learns on-the-fly the optimal offloading policy to minimize the weighted sum of delay, energy consumption, and server usage cost, considering the multidimensional network dynamics and resource constraints. Firstly, the UAV edge servers' computation resources are virtualized as virtual machines (VMs) for parallel execution of the offloaded tasks. We formulate the joint VM resource allocation and task scheduling problem as a mixed-integer programming problem and propose an efficient heuristic algorithm to solve it. Secondly, we investigate the computing offloading problem in SAGIN, which is formulated as a Markov decision process (MDP). To learn the network dynamics, a model-free reinforcement learning (RL)-based approach is proposed, and an actor-critic learning algorithm is designed to handle the large state and action spaces. To the best of our knowledge, our work is the first to study the computing offloading problem in SAGIN, which validates the feasibility of SAGIN supporting computation-intensive applications for remote IoT users, and can provide useful guidelines for SAGIN network design and remote computing offloading.

The main contributions of the paper can be summarized as follows.
• We formulate the SAG-IoT computing offloading problem as an MDP and propose an RL-based approach to efficiently solve the problem. The system state is defined to integrate the historical network information to learn the system dynamics. In addition, a policy gradient-based actor-critic learning algorithm is proposed to cope with the curse of dimensionality and accelerate the learning speed.
• We adopt network virtualization to flexibly allocate the resources of the edge server. We formulate the joint edge server VM computation resource allocation and task scheduling problem as a mixed-integer programming problem, and propose an effective heuristic algorithm to solve it.
• The performance of the proposed approaches is evaluated through extensive simulations. The joint VM allocation and task scheduling can achieve near-optimal performance with low complexity. In addition, the performance of the proposed RL-based computing offloading approach is evaluated with respect to design parameters.

The remainder of the paper is organized as follows. In Section II, we present the related work. Section III describes the system model. In Section IV, the joint edge VM allocation and task scheduling problem is formulated and solved. Section V formulates the SAG-IoT computing offloading problem, followed by the RL-based solution in Section VI. Section VII evaluates the proposed approaches, and Section VIII concludes the paper. Useful notations used throughout the paper are listed in Table I.

II. RELATED WORK

A. Mobile Edge Computing

The concept of MEC was originally proposed by ETSI in [18], in which the motivation, definition, architecture, and challenging issues are discussed. In edge computing, the computation task offloading mechanism determines the overall performance of the MEC system. Energy-efficient computation offloading is crucial for energy-constrained IoT devices, and has been studied in [5] and [6]. In [5], Mahmoodi et al. studied the joint scheduling and computation offloading problem and proposed an optimization method based on real data measurements to reduce the energy consumption of the mobile users. In [6], Mao et al. proposed a Lyapunov method-based dynamic computation offloading for devices with energy harvesting. The execution cost, which jointly considers the execution latency and task failure, is taken as the performance metric. In an MEC system, the energy consumption and task delay rely not only on the task processing, but also on the communication of the related data of the task. Therefore, the joint optimization of the communication radio resources and the computing offloading has attracted much research attention [11], [12]. In [11], You et al. studied the resource allocation for the multiuser MEC offloading problem considering TDMA and OFDMA scenarios. In [12], Wu et al. studied the multi-access-assisted computing offloading, and presented a joint optimization of computation task scheduling and radio resource allocation. However, these works only focus on the fixed MEC scenario, i.e., the edge computing services are


provided by cellular BSes or WiFi APs, which is different from our work where flying UAVs serve as the mobile edge servers. In [19], a mobile edge computing mechanism is proposed via a UAV-mounted cloudlet. The bit allocation and UAV trajectory are jointly designed to minimize the mobile energy consumption by solving a non-convex optimization problem. Different from [19], we consider both the energy consumption and task processing delay. In addition, the UAV trajectories are learnt instead of designed for the scenarios where the UAVs are not deployed by network operators and the trajectories are unknown in advance.

TABLE I
NOTATIONS USED IN THE PAPER

Notation | Description
$M$ | Number of IoT users
$N$ | Number of IoT applications
$C^l$, $C^e$, $C^c$ | The computation resources of local IoT users, the edge servers, and the cloud server
$E^l$ | The local processing power consumption of IoT users
$E_i^e$, $E_i^c$ | The transmission power of IoT users to UAV and satellite
$B_{ij}^e$, $B_{ij}^c$ | The usage cost of task $W_{ij}$ in edge server and cloud server
$B$ | Bandwidth of UAV-ground communication
$H_j^{in}$, $H_j^{out}$ | The size of input data and output data of the $j$-th application
$Z_j$ | The computing requirement of the $j$-th application
$\mathbf{M}(t)$, $m_{ij}(t)$ | Remaining tasks matrix and the element corresponding to task $W_{ij}$
$\mathbf{X}^l(t)$, $\mathbf{X}^e(t)$, $\mathbf{X}^c(t)$ | Offloading decision matrices corresponding to local processing, offloading to UAV, and offloading to the cloud
$x_{ij}^l(t)$, $x_{ij}^e(t)$, $x_{ij}^c(t)$ | The elements of $\mathbf{X}^l(t)$, $\mathbf{X}^e(t)$, $\mathbf{X}^c(t)$ corresponding to task $W_{ij}$
$\alpha$, $\beta$ | The weights of UAV-edge and cloud server usage cost over the IoT user energy consumption
$\varpi_i$ | The weight of delay over energy consumption and server usage cost for IoT user $i$

B. Space-air-ground Integrated Network

SAGIN is envisioned as a promising technology to address many problems in future mobile communication networks, such as remote and large-scale coverage, growth of mobile data, uneven data traffic, and rigid backbone networks, and has recently attracted much attention from both academia and industry. Different SAGIN architectures are discussed in [20], [21]. In [20], Hoang et al. studied the optimal energy allocation problem in SAGIN and proposed a learning-based algorithm to optimize the network performance and maximize the service providers' revenue. In [21], Zhang et al. proposed a software-defined SAGIN architecture and discussed the challenging issues therein. Edge caching is employed in SAGIN to reduce the content retrieval delay and offload the backbone networks. In [22], Chen et al. proposed an optimal content caching scheme to place content at UAVs by considering the user's information and the content request distribution. However, the study of edge computing offloading and computation resource allocation considering the cooperation of space, aerial, and ground network segments is still missing in the literature, which is important for supporting a myriad of computing-intensive applications in SAGIN.

III. SYSTEM MODEL

A. Network Model

We consider a remote area where IoT devices are deployed to conduct certain tasks with computation requirements, such as monitoring and video surveillance. In the considered remote area, there is no cellular coverage, and therefore we consider a space-air-ground integrated network (SAGIN) to provide network functions, such as network access, edge computing and caching, to the IoT devices. The overview of the SAG-IoT network is shown in Fig. 1. In the SAG-IoT network, there are three network segments, i.e., the ground segment, the aerial segment, and the space segment. The IoT devices compose the ground segment and have very limited energy and computing capabilities. The applications running at the IoT devices may generate data to upload and computing tasks to execute. In the aerial segment, the flying UAVs can serve as edge servers to provide ground users with edge caching and computing capabilities. The flying UAVs, such as the Facebook Aquila, can fly for months without charging by using solar panels [23]. The UAVs are configured with fixed flying trajectories to serve the considered area. Furthermore, in the space segment, one or more LEO satellites provide the full coverage of the area of interest, and connect the IoT devices with the cloud servers through the satellite backbone network.

Fig. 1. An overview of the SAG-IoT architecture. (Labeled elements: satellite network, Internet, cloud server, UAV edge servers with VMs, IoT devices.)

IoT device (user) $i$ has a local computing capability of $C^l$, which is assumed identical for all users. The energy consumption for local task computing/processing is denoted by $E^l$, which is related to $C^l$. The power consumption


for transmission to UAV and satellite is denoted by $E_i^e$ and $E_i^c$, respectively. In the edge servers, i.e., UAVs, the computing resources are virtualized as VMs, each for one specific application [24]. In edge server $k$, the total computation resource is $C^e$, the resource allocated to the computation VM $v$ is denoted by $C_v^e$, and the server usage cost of the computation VM for user $i$'s task $j$ is denoted by $B_{ij}^e$. For the UAV-ground communication, since we consider the task offloading decision making, which operates on a much longer time scale than traditional resource scheduling (1 ms), only large-scale channel fading is considered. In addition, since the instantaneous channel information is not required, a satellite-controlled global decision making is feasible. According to [25], the pathloss between the UAV and the ground users follows

$$L(r,h) = 20\log\left(\frac{4\pi f_c (h^2+r^2)^{1/2}}{c}\right) + P_{LoS}(r,h)\,\eta_{LoS} + \bigl(1 - P_{LoS}(r,h)\bigr)\,\eta_{NLoS}, \tag{1}$$

where $h$ and $r$ denote the UAV flying altitude and the horizontal distance between the UAV and the ground user, respectively. $\eta_{LoS}$ and $\eta_{NLoS}$ denote respectively the additive loss incurred on top of the free space pathloss for LoS and NLoS links [26]. $f_c$ denotes the carrier frequency, and $c$ denotes the speed of light. $P_{LoS}$ is the line-of-sight probability of the UAV-ground link, which can be calculated by

$$P_{LoS}(r,h) = \frac{1}{1 + a\exp\left(-b\left(\arctan\left(\frac{h}{r}\right) - a\right)\right)}. \tag{2}$$

$(a, b, \eta_{LoS}, \eta_{NLoS})$ are environment-dependent variables. For instance, in remote areas, their values are (4.88, 0.43, 0.1, 21) [27]. In addition, the UAV-ground communication uses WiFi protocols with total bandwidth $B$. If $n$ IoT devices communicate with a UAV simultaneously, the bandwidth each IoT device obtains is calculated by

$$B_i = \rho B \xi(n), \tag{3}$$

where $\rho$ is the WiFi throughput efficiency factor, and $\xi(n)$ is the WiFi channel utilization function, which is a decreasing function of the number of contending users $n$. Thus, the instantaneous ground-to-UAV and UAV-to-ground data rates can be calculated by

$$r_{GU} = \rho B \xi(n)\log_2\left(1 + \frac{E_i^e\,10^{-L_i/10}}{\sigma^2}\right), \tag{4}$$

and

$$r_{UG} = \rho B \xi(n)\log_2\left(1 + \frac{E_i^{e-}\,10^{-L_i/10}}{\sigma^2}\right), \tag{5}$$

respectively, where $E_i^{e-}$ denotes the UAV transmit power to ground IoT users, $L_i$ denotes the pathloss for the corresponding IoT user-UAV link, and $\sigma^2$ denotes the power of the Gaussian noise. For the satellite-ground communication, we consider a constant communication data rate $r_{SG}$, which is usually smaller than the UAV-ground data rate. The satellite is connected to the Internet/cloud through the satellite backbone network. We denote the transmission rate between the satellite and the cloud by $r_{SC}$. The cloud has much higher computing capability than IoT devices and edge servers, and the processing rate for each task is denoted by $C^c$, and the usage cost for user $i$'s task $j$ is denoted by $B_{ij}^c$.

B. Multi-user Multi-task SAG-IoT Computing Offloading

We consider that there are $M$ IoT users and $N$ different computation applications, and each user is running all $N$ applications, leading to $M \times N$ computation tasks in the system. We also consider that the $N$ applications have certain priorities, in the way that if multiple tasks are scheduled simultaneously, the task with the smaller application number will be transmitted/processed earlier than those with larger application numbers. For the $j$-th application, the size of the input data, the output data, and the workload are denoted by $H_j^{in}$, $H_j^{out}$ and $Z_j$, respectively. These tasks can be executed locally at the IoT devices. However, due to the limited energy and computing capability of IoT devices, the computing tasks can also be offloaded to the UAV edge servers or further to the cloud through the satellites. The offloading decision is made in each time slot until all the $M \times N$ tasks are completed. At the beginning of time slot $t$, the remaining tasks are denoted by an $M \times N$ matrix $\mathbf{M}(t)$, where the element $m_{ij}(t) = 1$ indicates that task $W_{ij}$ has not been completed, and $m_{ij}(t) = 0$ otherwise. Denote the decisions of locally processing the tasks, offloading the tasks to the edge, and offloading the tasks to the cloud at time slot $t$ by $M \times N$ matrices $\mathbf{X}^l(t)$, $\mathbf{X}^e(t)$, and $\mathbf{X}^c(t)$, respectively, where each binary element $x_{ij}^l(t)$, $x_{ij}^e(t)$, and $x_{ij}^c(t)$ indicates whether task $W_{ij}$ is processed locally, offloaded to the edge, or offloaded to the cloud, respectively. Note that task $W_{ij}$ can be scheduled to at most one of these options at time $t$, i.e., the offloading decision is constrained by

$$x_{ij}^l(t),\ x_{ij}^e(t),\ x_{ij}^c(t) \in \{0, 1\}, \tag{6}$$

$$x_{ij}^l(t) + x_{ij}^e(t) + x_{ij}^c(t) \le m_{ij}(t). \tag{7}$$

Strict inequality in (7) holds when an unfinished task is not scheduled at time slot $t$. If the task $W_{ij}$ is processed locally or offloaded to the cloud at time $t$, we consider that the task can be finished with a certain delay, and $m_{ij}(t+1) = 0$. However, if $W_{ij}$ is offloaded to the UAV edge server, it may not be completed and returned to user $i$ successfully by the end of $t$, for two reasons. Firstly, if multiple tasks are offloaded to one UAV edge server, some of them may not be completed within the time slot; secondly, since the UAVs are moving, when task $W_{ij}$ is completed in the edge server, the result cannot be transmitted to user $i$ if user $i$ is out of the coverage area of the UAV.

C. Cost Model

The objective of computing task offloading is to minimize the system cost of executing the $M \times N$ tasks. In the considered SAG-IoT system, the system cost is composed of two parts, i.e., the delay cost and the energy and server usage cost.

1) Delay cost: If the task $W_{ij}$ is scheduled at time slot $t$, the delay can be calculated according to the offloading decision. If the task is scheduled to be processed locally, the delay is

$$T_{ij}^l = \varepsilon(t-1) + t_{r,i}^l + \frac{Z_j}{C^l}, \tag{8}$$


where $\varepsilon$ is the length of the time slot, and $\varepsilon(t-1)$ is the elapsed time since the generation of the task. Due to the low computing capability of IoT devices, it is likely that at the beginning of time slot $t$, some tasks that were scheduled for local processing have not yet finished. $t_{r,i}^l$ is the time for user $i$ to complete the remaining local processing tasks, which can be calculated as the remaining local workload divided by the local processing capability $C^l$. If the task is offloaded to the UAV edge server, and the result is returned to user $i$ within time slot $t$, the total delay of the task can be calculated by

$$T_{ij}^e = \varepsilon(t-1) + d_{ij}^e + \frac{\sum_{a=1}^{j} x_{ia}^e(t) H_a^{in}}{r_{GU}} + \frac{H_j^{out}}{r_{UG}}, \tag{9}$$

where $d_{ij}^e$ denotes the processing delay of $W_{ij}$ in the UAV edge server, which depends on the offloading decision and VM resource allocation in the server as described in Section IV. If multiple tasks of user $i$ are scheduled to the edge server, the term $\sum_{a=1}^{j} x_{ia}^e(t) H_a^{in} / r_{GU}$ gives the time for transmitting the $W_{ij}$ task data to the server, considering the transmission of tasks with higher priorities. Similarly, if the task is offloaded to the cloud through the satellite, the delay is calculated by

$$T_{ij}^c = \varepsilon t + \frac{Z_j}{C^c} + \frac{H_j^{in} + H_j^{out}}{r_{SG}} + \frac{H_j^{in} + H_j^{out}}{r_{SC}}. \tag{10}$$

2) Energy and server usage cost: The energy cost of locally processing $W_{ij}$ can be calculated by

$$L_{ij}^l = E^l \frac{Z_j}{C^l}. \tag{11}$$

If at time slot $t$, task $W_{ij}$ is offloaded to the UAV edge server and the result is successfully transmitted to user $i$, the energy and server usage cost can be calculated by

$$L_{ij}^e = E_i^e \sum_{t=1}^{v} x_{ij}^e(t)\frac{H_j^{in}}{r_{GU}(t)} + \alpha B_{ij}^e, \tag{12}$$

where $\alpha$ represents the weight of the UAV server usage cost over the IoT user energy consumption. The term $E_i^e \sum_{t=1}^{v} x_{ij}^e(t) H_j^{in} / r_{GU}(t)$ gives the total energy consumption, accounting for earlier attempts in which the task was offloaded to a UAV edge server but failed to return within the scheduling slot. Similarly, if task $W_{ij}$ is offloaded to the cloud, the energy and server usage cost can be calculated by

$$L_{ij}^c = \frac{E_i^c H_j^{in}}{r_{SG}} + \beta B_{ij}^c, \tag{13}$$

where $\beta$ denotes the weight of cloud server usage cost over the IoT user energy consumption.

IV. COMPUTATION VM ALLOCATION

In time slot $t$, multiple tasks may be offloaded to one UAV edge server. In such a scenario, these tasks are executed in different VMs in parallel to reduce the processing latency. One VM executes the tasks belonging to a specific application. We therefore study a VM allocation problem to allocate the edge server computation resources to different VMs considering the tasks offloaded to the edge server. In addition, due to the mobility of UAVs, some users may lose connection with the UAV quickly, and thus executing such tasks may lead to excessive resources being allocated to the corresponding VM. For example, in Fig. 2, two VMs are considered to execute the tasks offloaded to the UAV edge server, and $t_{i,j}$ is the delay requirement of task $j$ in VM $i$. We can see that the delay requirement $t_{2,1}$ is very strict and a larger amount of computation resource should be allocated to VM2 to finish the corresponding task before the deadline. However, since the total computation resource of an edge server is fixed, it is likely that little resource is allocated to VM1, and none of the three tasks in VM1 can be finished in time. Therefore, we jointly optimize the VM allocation and task scheduling in the UAV edge server to reduce the system sum delay.

Fig. 2. An example of joint VM allocation and task scheduling for UAV edge server. (Timeline labels: VM1 with $t_{1,1}$, $t_{1,2}$, $t_{1,3}$; VM2 with $t_{2,1}$, $t_{2,2}$.)

In the considered problem, there are multiple kinds of applications (Apps), denoted by $\mathcal{A} = \{1, \ldots, N\}$, and one UAV edge server with computation capability $C$ cycles/s (here we use $C$ instead of $C^e$ for simplicity). For the $m$-th App, there might be multiple offloaded tasks, denoted by $\mathcal{T}_m = \{1, \ldots, N_m\}$, which have the same computation workload but different maximum delay requirements. Note that $Z_m$ denotes the computation workload of the $m$-th App's tasks. $\mathcal{C} = \{c_m \mid m \in \mathcal{A}\}$ denotes the computation resource variables, where $c_m$ is the computation resource allocated to the VM executing App $m$. $\mathcal{Y} = \{y_{m,n} \mid m \in \mathcal{A}, n \in \mathcal{T}_m\}$ denotes the decision variables on task execution, where $y_{m,n} = 1$ if task $n$ of App $m$ is scheduled and executed, and $y_{m,n} = 0$ otherwise. Therefore, our sum delay minimization problem can be formulated as follows.

$$\begin{aligned}
\min_{\mathcal{C},\mathcal{Y}} \quad & \sum_{m=1}^{N}\sum_{n=1}^{N_m}\left[ y_{m,n}\sum_{k=1}^{n} y_{m,k}\frac{Z_m}{c_m} + \varepsilon\,(1 - y_{m,n}) \right] \\
\text{s.t.} \quad & \text{C1}: \sum_{k=1}^{n} y_{m,k}\frac{Z_m}{c_m} \le t_{m,n}, \quad \forall m \in \mathcal{A},\ \forall n \in \mathcal{T}_m, \\
& \text{C2}: \sum_{m=1}^{M} c_m \le C, \\
& \text{C3}: c_m \ge 0, \\
& \text{C4}: y_{m,n} \in \{0, 1\}, \quad \forall m \in \mathcal{A},\ \forall n \in \mathcal{T}_m,
\end{aligned} \tag{14}$$

where $t_{m,n}$ is the delay requirement of task $n$ of App $m$ and $\varepsilon$ is the length of the time slot. $t_{m,n}$ can be calculated by

$$t_{m,n} = \min(t_{lc}, \varepsilon), \tag{15}$$

where $t_{lc}$ is the time when the user who offloads this task loses connection with the UAV. C1 restricts the maximum delay for each task if it is executed in the current time slot. C2 requires that the total computation resources of the VMs cannot exceed $C$.

Algorithm 1 VM allocation and task scheduling in edge server
1: Input: $C$, $\mathcal{T}_m$, $t_{m,n}$, $\varepsilon$.
2: Output: VM allocation $c_m$, task scheduling $\mathcal{Y}$.
3: Initialize $y_{m,n} = 1$ $\forall m, n$, and $c_m$ according to (19).
4: while $\sum_{m=1}^{M} c_m > C$ do
5:     Update $y_{m,n}$ according to (17) and (20).
6:     Update $c_m$ according to (19).
7: end while
8: return

It can be seen that Problem (14) is a mixed-integer programming problem that is difficult to solve. It involves the continuous variable $\mathcal{C}$ and the 0-1 integer variable $\mathcal{Y}$. Even if we assume $\mathcal{C}$ is known, the residual subproblem is still a quadratic problem with 0-1 integer constraints, which is NP-hard with a non-definite matrix [28], [29]. This problem is commonly reformulated by a specific relaxation approach and then solved by powerful convex optimization techniques.
However, this method performs extensive iterations and reveals cost in terms of the delay of the tasks, the energy consumption
little insight about scheduling policy. Thus, we are motivated of the IoT users, and the edge and cloud server usage costs.
to design an efficient low-complexity algorithm to obtain the This can be achieved by modeling the computing offloading
suboptimal solution. In the proposed VM allocation and task decisions as an MDP.
scheduling algorithm, we assume for each VM m, the delay An MDP is defined by a tuple (S, A, T, R), where S is the
requirements for Nm tasks have been sorted, i.e., tm,n ≤ set of possible system states, A is the set of actions, T =
tm,n+1 . At the beginning, we try to allocate cm as if all tasks {p(s0 |s, a)} is the set of transition probabilities, and R : S ×
had been scheduled, i.e., ym,n = 1, ∀m ∈ A, ∀n ∈ Tm . The A 7→ R is a real-value reward (or cost) function when the
allocation results would be system is at state s ∈ S and an action a ∈ A is taken. A
nZm policy π is a mapping from S to A. The MDP of the SAG-
cm = min{ }, ∀m ∈ A, ∀n ∈ Tm . (16) IoT computing offloading problem is defined as follows.
tm,n
PM 1) States: at the beginning of time slot t, the network
Given the allocation results, if m=1 cm > C, it means not all state is defined as M(t) ⊗ Tr (t) ⊗ PL(t) ⊗ PL(t −
tasks can be scheduled. Therefore, we choose not to schedule 1) ⊗ PL(t − 2) ⊗ · · · ⊗ PL(t − tq ), where Tlr (t) =
the task with the most harsh delay requirement, i.e., let {tl1 (t), tl2 (t), . . . tlM (t)} represents the remaining time for each
user to complete locally processing tasks, and PL(t) =
ym,n = 0, (17)
{P L1 (t), P L2 (t), . . . , P LM (t)} is the vector of pathloss
where values of all users to their associated UAV. The system
nZm state includes the pathloss information of the current and the
m, n = arg max , ∀m ∈ A, ∀n ∈ Tm . (18) previous tq time slots in order to learn and predict the pathloss
m,n tm,n
information.
Then, we calculate the VMPallocation cm again. Repeat this 2) Actions: at the beginning of time slot t, the system
M
process until the condition m=1 cm ≤ C is satisfied, and the takes the action of scheduling the tasks of the users, i.e., to
VM allocation cm and task scheduling Y is obtained. Note determine the matrices Xl (t), Xe (t), and Xc (t), or equally,
that for a generic Y, the VM allocation is calculated by to determine xlij , xeij , and xcij , ∀i, j. Therefore, we denote
a(t) = {Xl (t), Xe (t), Xc (t)}. Clearly, at time slot 0, there
P
ym,n Zm
cm = min{ n }, ∀m ∈ A, ∀n ∈ Tm , (19) are 4M N possible actions, which is a very large number when
tm,n
M and N are large.
and the unscheduled task selection is calculated by 3) Transition probability: since the UAV-user pathloss is not
P
n ym,n Zm affected by the actions, the system transition probability can
m, n = arg max , ∀m ∈ A, ∀n ∈ Tm . (20) be calculated by
m,n tm,n
The full algorithm of edge server VM allocation and task p(st+1 |st , at ) =p(PL(t + 1)|PL(t))· (Tlr (t + 1)|Tlr (t), at )
scheduling is shown in Algorithm 1. From the algorithm, · p(M(t + 1)|M(t), at ). (21)
we can see that the worst case (the cloud cannot finish any
offloaded task in time) requires N 0 (N 0 + 1)/2 comparisons Specifically, if the UAV trajectory and the flying speed are
where N 0 is the number of total offloaded tasks to the UAV planned to be fixed, p(PL(t + 1)|PL(t)) is 1 with a specific
edge server. Even the worst case complexity O(N 02 ) is very PL(t + 1) and 0 otherwise. However, due to the uncertainties
low, and therefore the proposed algorithm can work efficiently in the UAV mobility, p(PL(t + 1)|PL(t)) will be difficult to
in the dynamic SAGIN environment. model. Tlr (t + 1) can be calculated by
N
l l
X Zj
V. C OMPUTATION O FFLOADING P ROBLEM F ORMULATION Tr,i (t + 1) = max{Tr,i (t) + xlij (t) − ε, 0}. (22)
j=1
Cl
We design an online computing offloading approach for the
SAG-IoT system, in which at each time slot the computing For p(M(t + 1)|M(t), at ), it is difficult to model accurately.
tasks of IoT devices are scheduled to process locally, offloaded For example, if a task is offloaded to a UAV edge server,
to the UAV edge server, or offloaded to the cloud server whether the task can be complete within the time slots depends
through the satellite, in order to minimize the total system on the UAV data transmission rate, UAV computation resource


allocation, other users’ decisions, and UAV mobility, which are dynamic and correlated.

4) Reward: To minimize the weighted sum of delay, energy, and server usage cost, we use the cost function $C(s_t, a_t) = \sum_{ij} C_{ij}(s_t, a_t)$ at time slot $t$, where $C_{ij}(s_t, a_t)$ is the cost function of task $W_{ij}$, which is calculated in the following way.
   1) if $m_{ij}(t) = 0$, the task has already completed, and thus $C_{ij}(s_t, a_t) = 0$.
   2) if $m_{ij}(t) = 1$ and $x_{ij}^l + x_{ij}^e + x_{ij}^c = 0$, the task is not scheduled in this time slot, and thus a delay of $\varepsilon$ is introduced. We define the cost function $C_{ij}(s_t, a_t) = \varpi_i \varepsilon$, where $\varpi_i$ is user $i$’s weight on the delay.
   3) if $m_{ij}(t) = 1$ and $x_{ij}^l + x_{ij}^e + x_{ij}^c = 1$, $C_{ij}(s_t, a_t) = \varpi_i \left(x_{ij}^l (T_{ij}^l - \varepsilon t) + x_{ij}^e (T_{ij}^e - \varepsilon t) + x_{ij}^c (T_{ij}^c - \varepsilon t)\right) + x_{ij}^l L_{ij}^l + x_{ij}^e L_{ij}^e + x_{ij}^c L_{ij}^c$.

Define the value function $V$ of state $s$ as the expected long-term discounted cost starting from $s$ with policy $\pi$, i.e.,

$$V(s|\pi) = \mathbb{E}\left[\sum_{t=0}^{\infty} \gamma^t C(s_t, a_t) \,\Big|\, s_0 = s, \pi\right], \tag{23}$$

where $\gamma \in [0, 1]$ is a discount factor, and the expectation is taken over all possible state trajectories starting from $s$. The online computing offloading approach is to select an optimal policy $\pi^*$, which minimizes the value function of each state, i.e.,

$$\pi^*(s) = \arg\min_{a} \sum_{s'} p(s'|s, a)\left[C(s, a) + \gamma V(s'|\pi^*)\right]. \tag{24}$$

VI. RL-BASED OFFLOADING DECISION MAKING

In problem (24), the reward function and transition probabilities are difficult to model accurately due to the UAV mobility and the dynamic VM allocation of UAV edge servers. In addition, as the system scale, i.e., $M$ and $N$, increases, the exponentially growing system state space makes the system intractable. Therefore, the proposed online computing offloading problem can be solved by model-free RL-based methods, such as Q-learning [30] and policy gradient methods [31]. Although Q-learning methods have shown great potential in solving RL problems with a large state space, they usually cannot efficiently deal with problems with large or even continuous action spaces, which is the case in problem (24). Therefore, in this paper, we propose an online computing offloading approach for the SAG-IoT system by adapting the policy gradient method.

In the proposed online computing offloading approach, the policy is parameterized by a vector $\theta \in \mathbb{R}^d$, i.e., $\pi(a|s, \theta) = \Pr(a_t = a \,|\, s_t = s, \theta_t = \theta)$ is the probability that action $a$ is taken when the system is in state $s$ at time $t$, under the policy with parameter $\theta$. If $\theta$ is defined for each feature of the state, i.e., each element in $\mathbf{M}(t)$, $\mathbf{T}_r^l(t)$, and $\mathbf{PL}(t)$, the length of vector $\theta$ is $M(N + t_q + 2)$. To learn the policy parameter, we first define the performance measure of $\theta$, which is denoted by $J(\theta)$. Since the online computing offloading problem is episodic (an episode ends when all $MN$ tasks are finished), we define the performance measure as the total discounted cost of the episode of computing all tasks. Denote by $\tau$ a trace of the state-action sequence $s_0, a_0, s_1, a_1, \ldots, s_{t_{max}}, a_{t_{max}}$ in an episode following $\pi(\cdot|\cdot, \theta)$, where $t_{max}$ denotes the preset value indicating the possible maximum number of time slots for processing all tasks. Then, we can have $J(\theta)$ as the value function of the start state $s_0$:

$$J(\theta) \doteq V_{\pi_\theta}(s_0) = \mathbb{E}_{\pi_\theta}\left[\sum_{k=0}^{t_{max}} \gamma^k C(s_k, a_k) \,\Big|\, \pi(\cdot|\cdot, \theta)\right]. \tag{25}$$

To learn the policy parameter $\theta$ which minimizes $J(\theta)$, intuitively, we can use the gradient descent method to gradually update $\theta$ by

$$\theta_{t+1} = \theta_t - \phi \nabla J(\theta_t), \tag{26}$$

where $\phi$ represents the learning rate. According to the policy gradient theorem [32], we have

$$\begin{aligned}
\nabla J(\theta_t) &= \mathbb{E}_\pi\left[\sum_a q_\pi(s_t, a) \nabla_\theta \pi(a|s_t, \theta)\right] \\
&= \mathbb{E}_\pi\left[\sum_a \pi(a|s_t, \theta)\, q_\pi(s_t, a) \frac{\nabla_\theta \pi(a|s_t, \theta)}{\pi(a|s_t, \theta)}\right] \\
&= \mathbb{E}_\pi\left[q_\pi(s_t, a_t) \frac{\nabla_\theta \pi(a_t|s_t, \theta)}{\pi(a_t|s_t, \theta)}\right] \\
&= \mathbb{E}_\pi\left[G_t \frac{\nabla_\theta \pi(a_t|s_t, \theta)}{\pi(a_t|s_t, \theta)}\right].
\end{aligned} \tag{27}$$

Note that $q_\pi(s, a)$ is the state-action value function for policy $\pi$, and $G_t = C_t + \gamma C_{t+1} + \gamma^2 C_{t+2} + \ldots$ is the discounted return of cost. Using the above, we can then update $\theta$ by

$$\theta_{t+1} = \theta_t - \phi\, G_t \frac{\nabla_\theta \pi(a_t|s_t, \theta)}{\pi(a_t|s_t, \theta)}. \tag{28}$$

However, although such an update method (which is referred to as the REINFORCE method [33]) can converge to a local minimum asymptotically, it usually leads to high variance and learns slowly. In the online SAG-IoT computing offloading, both the state space and the action space are large, and therefore the REINFORCE method may not be suitable. To further improve the learning performance, we thus employ the actor-critic method, in which approximations to both the policy and the value function are learned [34]. In the actor-critic method, the policy is updated in each time slot instead of every episode of the computing offloading. Therefore, the number of samples required to learn the optimal policy can be reduced dramatically, which accelerates the learning process. To achieve this, we need to learn the value function and use it as a critic to guide the update of the policy at each time slot. Specifically, denote by $\hat{V}(s_t, \omega)$ the estimate of the value function of state $s_t$, where $\omega \in \mathbb{R}^m$ is the parameter vector used to fit the value function. Then, in each time slot $t$, the update of $\theta$ can be done by

$$\theta_{t+1} = \theta_t - \phi\left(C_t + \gamma \hat{V}(s_{t+1}, \omega) - \hat{V}(s_t, \omega)\right) \frac{\nabla_\theta \pi(a_t|s_t, \theta)}{\pi(a_t|s_t, \theta)}. \tag{29}$$

Note that in each time slot, the parameter vector $\omega$ of the estimated value function $\hat{V}$ is also updated according to

$$\omega_{t+1} = \omega_t - \phi' \nabla_\omega L(\omega), \tag{30}$$


where $\phi'$ is the learning rate, and the loss function $L(\omega)$ is defined as

$$L(\omega) = \left|\hat{V}(s_t, \omega) - \left(C_t + \gamma \hat{V}(s_{t+1}, \omega)\right)\right|^2. \tag{31}$$

Finally, motivated by the capability of deep neural networks to approximate complex functions, we employ a deep learning architecture to learn the policy in terms of $\theta$ and the estimated state-value function. The full proposed online computing offloading approach for SAG-IoT is shown in Algorithm 2, where $\phi$ and $\phi'$ are the learning rates for the actor and the critic, respectively.

The implementation of the proposed RL-based offloading approach is shown in Fig. 3, which is composed of the SAGIN environment, the computing offloading reward evaluator, an actor network, a critic network, and a temporal-difference component. The system state is observed from the current SAGIN environment and is then sent to the input of the actor network and the critic network. The actor network generates the action $a$ according to $a = \pi_\theta(s)$, and updates the policy $\theta$. It can be easily seen that at time slot $t$, for an arbitrary task $W_{ij}$, the decision $x_{ij}(t)$ has four possibilities, i.e., not scheduled, process locally, offload to edge, and offload to cloud. Therefore, we map these four possible decisions of $x_{ij}$ to the integers 0, 1, 2, and 3, respectively, and design two output layers of the actor network, i.e., $\sigma$ and $\mu$, which can compose $M \times N$ normally distributed random variables to represent the actions of each task. The critic network estimates the value function $\hat{V}(s_t, \omega)$ and updates the parameter $\omega$. The reward of a state-action pair is evaluated by the reward evaluator, and is used to calculate the temporal-difference (TD) error $\eta = C_t + \gamma \hat{V}(s_{t+1}, \omega) - \hat{V}(s_t, \omega)$, which is used in the update of the policy parameter $\theta$ and the critic network parameter $\omega$.

Algorithm 2 Deep actor-critic based online computing offloading
1: Input: IoT user information: location, $C^l$, $E^l$, $E_i^e$, $E_i^{e-}$, and $E_i^c$
          UAV edge server information: mobility traces, $C_v^e$, $B_{ij}^e$, $B$
          Cloud related information: $r_{SG}$, $r_{SC}$, $C^c$, $B_{ij}^c$
          Task information: $H^{in}$, $H^{out}$, $Z$
2: Output: Optimal computing offloading decision $\mathbf{X}(t)$
3: Randomly initialize critic network $\hat{V}(s, \omega)$ and actor network $\pi(s, a|\theta)$
4: for episode = 1, $G$ do
5:     Initialize a random vector $\mathcal{N}$ as the noise for action exploration
6:     Observe the initial state $s_1$
7:     for time slot $t$ = 1, $t_{max}$ do
8:         select action $a_t = \pi(s|\theta) + \mathcal{N}_t$
9:         execute $a_t$ and observe the cost $C_t$ and state $s_{t+1}$
10:        $\eta \leftarrow C_t + \gamma \hat{V}(s_{t+1}, \omega) - \hat{V}(s_t, \omega)$
11:        update $\omega \leftarrow \omega - \phi' \eta \nabla_\omega \hat{V}(s_t, \omega)$
12:        update $\theta \leftarrow \theta - \phi\, \eta\, \gamma^t \frac{\nabla_\theta \pi(a_t|s_t, \theta)}{\pi(a_t|s_t, \theta)}$
13:    end for
14: end for
15: return

VII. PERFORMANCE EVALUATION

A. Simulation Configurations

In this section, we evaluate the proposed joint VM resource allocation and task scheduling scheme for the UAV edge server, and the RL-based online computing offloading approach for the SAG-IoT system. In the simulation, we consider a remote 1 km × 1 km square area with $M = 30$ IoT users deployed at fixed locations in this area. Each IoT user runs $N = 5$ different applications and thus has 5 tasks to process. We select ARM Cortex-M based IoT devices as the ground users. Referring to [35] and [36], we set the IoT device computing capability $C^l$ to 200 MC/s (MC = $10^6$ cycles), and the energy consumption for local task processing is 141 mW. As defined in [37], the transmission and reception power of IoT users with UAVs and satellites, i.e., $E^e$, $E^{e-}$, and $E^c$, are set to 200 mW. Five UAVs serve as the flying edge servers for the IoT computing. UAV movement trajectories are planned to maximize the minimum throughput, which follows Wu’s work [38] while adopting the practical UAV-ground propagation channels in (1). Referring to [39], the edge server’s computation resource $C^e$ is set to 3 GC/s (GC = $10^9$ cycles), while the cloud server’s computation resource assigned to each task, i.e., $C^c$, is set to 10 GC/s. For the satellite and remote cloud, we consider that within one episode of computing offloading, there is one LEO satellite providing full coverage of the area, and the satellite-ground communication rate $r_{SG}$ is set to 10 Mbps, which is the average observed transmission rate in the high-throughput satellite communication system ViaSat-1. The satellite-cloud data rate is also constrained by the satellite-ground transmission rate, and therefore we set $r_{SC} = r_{SG} = 10$ Mbps. Different computation tasks may have different computation-to-data ratios; however, for simulation simplicity we choose the x264 VBR encode computation-to-data ratio, which is 1300 cycles/byte, i.e., $Z = 1300 H^{in}$ [40]. $H_j^{in}$ and $H_j^{out}$ are randomly chosen between 5 MB and 15 MB, and between 1 MB and 5 MB, respectively. We set the usage cost of the edge server/cloud server, i.e., $B_{ij}^e$ and $B_{ij}^c$, to the CPU cycles needed to execute task $W_{ij}$, i.e., $W_{ij}$’s workload. In addition, $\alpha = 10^{-10}$ J/cycle, $\beta = 4 \times 10^{-10}$ J/cycle, and $\varpi_i = 1$ J/s for each user $i$. The detailed simulation parameters are shown in Table II unless otherwise specified.

TABLE II
SIMULATION PARAMETERS

Parameter   Value                  Parameter             Value
M           30                     N                     5
C^l         200 MC/s               E^l                   141 mW
C^e         3 GC/s                 E^e, E^{e-}, E^c      200 mW
C^c         10 GC/s                B                     20 MHz
h           90 m                   r_SG                  10 Mbps
α           10^{-10} J/cycle       r_SC                  10 Mbps
β           4 × 10^{-10} J/cycle   ϖ_i                   0.2 J/s
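To give a feel for the scale of these settings, the sketch below converts one task's input size into its workload, local processing time and energy, and satellite upload time. It uses only values stated above ($Z = 1300 H^{in}$, $C^l$ = 200 MC/s, 141 mW local power, $C^e$ = 3 GC/s, $C^c$ = 10 GC/s, $r_{SG}$ = 10 Mbps). Treating 1 MB as $10^6$ bytes, giving the whole edge server to a single task, and the function and key names are assumptions made here for illustration only.

```python
# Back-of-the-envelope figures for one task under the simulation settings.
# Assumptions (not from the paper): 1 MB = 10**6 bytes, the edge server is
# dedicated to one task, and all names below are illustrative.

CYCLES_PER_BYTE = 1300   # x264 VBR encode ratio, Z = 1300 * H_in
C_LOCAL = 200e6          # IoT device capability C^l, cycles/s
C_EDGE = 3e9             # UAV edge server capability C^e, cycles/s
C_CLOUD = 10e9           # cloud capability per task C^c, cycles/s
P_LOCAL = 0.141          # local processing power, W
R_SG = 10e6              # satellite-ground rate r_SG, bit/s


def task_profile(h_in_mb: float) -> dict:
    """Return workload and rough per-option timings for an input of h_in_mb MB."""
    h_in_bytes = h_in_mb * 1e6
    workload = CYCLES_PER_BYTE * h_in_bytes      # Z, in CPU cycles
    local_time = workload / C_LOCAL              # seconds on the IoT device
    return {
        "workload_cycles": workload,
        "local_time_s": local_time,
        "local_energy_j": P_LOCAL * local_time,
        "edge_time_s": workload / C_EDGE,        # computation only
        "cloud_time_s": workload / C_CLOUD,      # computation only
        "sat_upload_time_s": h_in_bytes * 8 / R_SG,
    }


profile = task_profile(10.0)
for name, value in profile.items():
    print(f"{name}: {value:.3f}")
```

For a 10 MB input this gives a workload of 1.3 × 10^10 cycles, i.e., 65 s of local processing (about 9.2 J), against 1.3 s of cloud computation after an 8 s upload to the satellite. The satellite-to-cloud leg and the download of results are left out of the sketch.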


[Figure 3 diagram; recoverable labels: SAGIN Environment (IoT device, UAV, edge server, VMs, satellite network, cloud server); State s; Policy Network (actor) with output layers σ and μ (M×N each); Action Mapping; M×N actions a = π_θ(s); Reward Evaluator; TD; Value Network (critic) v(s, ω).]
Fig. 3. RL-based computing offloading approach. The proposed approach implements two components, i.e., one actor network to update the policy, and one
critic network to evaluate the value function and guide the update of offloading policy.
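The interplay of actor, critic, and TD error can be sketched for a single time slot as follows. This is a minimal illustration, not the authors' implementation: it replaces the deep networks with linear functions of a feature vector, uses a scalar Gaussian policy with a fixed standard deviation, writes the critic step as a semi-gradient descent step on the squared TD loss (31), and maps a continuous sample to the four decisions by rounding and clipping. All of these choices, and every name in the code, are assumptions.

```python
# One actor-critic step with linear function approximation (illustrative only).
import random


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def map_action(x):
    """Map a continuous sample to a decision index:
    0 = not scheduled, 1 = local, 2 = edge, 3 = cloud (rounding is an assumption)."""
    return min(3, max(0, int(round(x))))


def sample_action(theta, feat, sigma, rng):
    """Draw a continuous action from N(mu, sigma^2) with mu = theta . feat."""
    return rng.gauss(dot(theta, feat), sigma)


def actor_critic_step(theta, omega, feat, feat_next, action, cost,
                      gamma=0.9, lr_actor=0.1, lr_critic=0.1, sigma=1.0):
    """Return updated (theta, omega) and the TD error eta for one time slot."""
    v = dot(omega, feat)                  # critic estimate of V(s_t)
    v_next = dot(omega, feat_next)        # critic estimate of V(s_{t+1})
    eta = cost + gamma * v_next - v       # TD error

    # grad_theta pi / pi = grad_theta log pi for a Gaussian with fixed sigma.
    mu = dot(theta, feat)
    grad_log_pi = [(action - mu) / sigma ** 2 * f for f in feat]

    # Actor: descend the cost, scaled by the TD error as in (29).
    new_theta = [t - lr_actor * eta * g for t, g in zip(theta, grad_log_pi)]

    # Critic: semi-gradient descent on the squared TD loss (31);
    # the constant factor 2 is absorbed into lr_critic.
    new_omega = [w + lr_critic * eta * f for w, f in zip(omega, feat)]
    return new_theta, new_omega, eta


rng = random.Random(0)
theta, omega = [0.0, 0.0], [0.0, 0.0]
feat, feat_next = [1.0, 0.0], [0.0, 1.0]
a = sample_action(theta, feat, sigma=1.0, rng=rng)
theta, omega, eta = actor_critic_step(theta, omega, feat, feat_next, a, cost=2.0)
print(map_action(a), eta, theta, omega)
```

A positive TD error (the observed cost was higher than the critic expected) pushes the policy mean away from the sampled action and raises the critic's estimate for the visited state, which is the per-slot guidance the caption describes.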

B. VM Computing Resource Allocation and Task Scheduling

We first evaluate the proposed VM computing resource allocation and task scheduling algorithm. We compare the heuristic algorithm with the ‘Brute-force’ method and the ‘Random’ method. In ‘Brute-force’, exhaustive search is used to find the optimal unscheduled tasks, which achieves the upper-bound performance but has high computational complexity. In ‘Random’, unscheduled tasks are randomly selected.

Fig. 4 shows the delay performance of the proposed algorithm. In Fig. 4(a), the average delay with respect to the UAV edge server computing resource $C^e$ is shown. We can see from the figure that as $C^e$ increases, the average delay of all three methods decreases, because with higher server computing capability, the average processing time is reduced, and thus more tasks can be scheduled to satisfy their delay requirements. In Fig. 4(b), the average delay with respect to the total number of tasks offloaded to the considered edge server is shown, when $C^e$ is set to 10 GC/s. It can be seen that a larger number of tasks leads to a higher average delay, since more tasks contend for the limited computing resources, and fewer tasks can complete in time. In both figures, the proposed heuristic algorithm achieves performance very close to that of the ‘Brute-force’ method, which demonstrates the efficiency of the proposed algorithm.

Fig. 5 shows the comparison of the run time between the proposed heuristic algorithm and the ‘Brute-force’ method. We can see from the figure that as the total number of tasks increases, the run time of the ‘Brute-force’ method increases exponentially. This is because the ‘Brute-force’ method uses exhaustive search, and a larger number of tasks leads to an exponentially growing search space. In contrast, the run time of the proposed heuristic algorithm remains very small when the number of tasks increases. The zoomed-in run time for the proposed heuristic algorithm clearly shows a quadratic increase in run time as the number of offloaded tasks increases, which validates our analysis in Section IV. To summarize, the proposed VM computing resource allocation and task scheduling algorithm can simultaneously achieve near-optimal performance and very low computational complexity, and therefore is suitable for allocating UAV edge server resources under dynamic network conditions.

C. Deep RL-based IoT Computing Offloading

In this part, the performance of the proposed RL-based SAG-IoT computing offloading approach is evaluated and compared. To show the efficiency of our proposed approach, we explicitly compare it with two other computing offloading approaches, i.e., ‘Random’ and ‘Greedy on edge’, which are described as follows.

1) ‘Random’: each task randomly selects a time slot $t \in \{1, 2, \ldots, t_{max}\}$ and an offloading decision (locally, edge, cloud).
2) ‘Greedy on edge’: since edge computing can usually provide a lower computing delay at a relatively low price, each user will offload all tasks to the UAV edge server if it is within the coverage of a UAV. Otherwise, the user decides to wait, process locally, or offload to the cloud with certain probabilities. In the simulation, we set the probabilities to 0.8, 0.1, and 0.1, respectively.

Fig. 6 shows the convergence performance of the proposed RL-based computing offloading algorithm. The total cost is calculated by the summation of the cost of each task, which


is the weighted sum of delay, energy consumption, and server usage cost. It can be seen that the algorithm converges very fast: it has already converged at about the 10th episode. The high convergence rate stems from the adopted actor-critic algorithm, in which the critic network judges and guides the actor network to learn the policy in each time slot, instead of in each episode as in non-actor-critic policy gradient methods. The fast convergence of the algorithm can bring many benefits, such as fast reconfiguration if more users and applications are deployed, more flexibility in a dynamic environment, and so forth.

Fig. 4. Performance of the proposed VM computing resource allocation and task scheduling algorithm. (a) Average total delay v.s. $C^e$. (b) Average total delay v.s. total number of tasks. [Plots of average total delay (s) for Heuristic, BruteForce, and Random versus UAV edge server computing resource ($10^9$ cycles) and versus total number of tasks.]

Fig. 5. The run time comparison. [Plot of run time (s) for Heuristic and BruteForce versus total number of tasks, with a zoomed-in inset.]

Fig. 6. Convergence performance of our proposed algorithm. [Plot of total cost versus episode.]

Fig. 7 shows the performance of the proposed computing offloading approach with respect to the UAV server usage cost weight $\alpha$. It can be seen that the proposed RL-based approach achieves a lower total cost than the other approaches, since it can learn the optimal offloading policy through interactions with the environment. The ‘Greedy’ approach incurs the highest total cost among the three approaches. This is because the ‘Greedy’ approach forces many tasks to contend for the UAV channel and edge server computation resources, which increases the time needed to complete the tasks. In addition, due to the mobility of UAVs, within the time duration in which the task is being processed (including the upload, processing, and transmission of the results), the UAV may fly away and the user loses the connection.

In Fig. 8, the main components of the cost, i.e., energy consumption ($E + B \cdot \alpha$ (or $\beta$)) and weighted delay ($\varpi T$), are shown. It can be seen that the proposed computing offloading approach achieves the lowest energy consumption and the lowest delay due to the learnt optimal offloading policy. The reason that the ‘Random’ approach achieves a similar total delay to the RL-based scheme is that in the RL-based scheme, more energy is consumed in transmitting the tasks to the satellite, and in the ‘Random’ scheme, more energy is consumed in locally processing the tasks due to the longer local processing delay, since more tasks are processed locally with the ‘Random’ approach (as shown in Fig. 10). However, the ‘Greedy’ approach has very high energy consumption and delay, because failed execution of tasks in UAV edge servers leads to multiple uploads of the same tasks, which consumes a large amount of energy of the IoT devices and leads to prolonged delay.

Fig. 9 shows the total cost with respect to the cloud server usage cost weight $\beta$. Comparing the three approaches, it can be seen that the proposed RL-based computing offloading approach achieves the lowest average total cost in an episode. The


total cost increases with $\beta$ because the increase of $\beta$ leads to the increase of $\beta B^c$, which is a component of the total cost. It can also be seen that the total cost of the proposed approach increases faster than that of the other two approaches. This is because, in the current setting of the simulation, satellite-cloud offloading can achieve relatively better performance than local processing and UAV offloading, if properly chosen. Therefore, the proposed approach learns the environment and chooses cloud offloading with higher probability. This fact can be seen in Fig. 10, which shows the number of selections of each offloading means for each offloading approach. The proposed approach selects satellite-cloud more frequently than the other two offloading means. Compared to satellite-cloud, local processing results in longer delay due to weak local computation capability, while the UAV-edge may suffer from the contention problem and high UAV mobility, although it has the benefits of a high transmission rate and low server usage cost. The ‘Random’ and ‘Greedy’ approaches select local processing and satellite-cloud almost the same number of times. The ‘Greedy’ approach selects UAV-edge more often since it may wait for a future UAV connection with a high probability if the UAV is currently unavailable.

Fig. 7. Total cost v.s. $\alpha$. [Plot of total cost for ProposedAC, Greedy, and Random versus $\alpha$ ($10^{-10}$ J/cycle).]

Fig. 8. The energy consumption and weighted delay. [Bar chart of energy consumption and weighted delay for each offloading mechanism: ProposedAC, Random, Greedy.]

Fig. 9. Total cost v.s. $\beta$. [Plot of total cost for ProposedAC, Greedy, and Random versus $\beta$ ($10^{-10}$ J/cycle).]

Fig. 10. Offloading means selection. [Bar chart of the average number of tasks selecting Local, UAV-Edge, and Satellite-Cloud for each offloading mechanism: ProposedAC, Random, Greedy.]

Fig. 11 shows the total cost with respect to the weight on the delay, i.e., $\varpi$. With the increase of $\varpi$, the total cost of all three offloading approaches increases, due to the increase of $\varpi T$, which is the delay component of the total cost. However, the proposed offloading approach achieves the lowest total cost and the lowest rate of increase among the three approaches, since it can learn from the environment an optimal policy to reduce the total task delay.

VIII. CONCLUSION

In this paper, we have investigated the IoT computing offloading problem in SAGIN. We have proposed a joint VM allocation and task scheduling mechanism to efficiently allocate the computing resources to different VMs in the UAV edge server. To offload the computation-intensive tasks, we have proposed an RL-based computing offloading approach that handles the multidimensional SAGIN resources and learns the dynamic network conditions. Deep neural networks, policy gradient, and actor-critic methods have been employed to improve the learning performance. Simulation results have validated the convergence and efficiency of the proposed approaches. Our work can offer valuable insights to the important yet underexplored field of edge/cloud computing in


SAGIN. In the future, we will focus on jointly optimizing the communication, caching, and computing resources in SAGIN.

Fig. 11. Total cost v.s. $\varpi$. [Plot of total cost for ProposedAC, Greedy, and Random versus $\varpi$ (J/s).]

ACKNOWLEDGMENT

This work is sponsored by and by the National Natural Science Foundation of China (NSFC) under Grant No. 91638204 and the Natural Sciences and Engineering Research Council (NSERC) of Canada.

REFERENCES

[1] M. Shafi, A. F. Molisch, P. J. Smith, T. Haustein, P. Zhu, P. Silva, F. Tufvesson, A. Benjebbour, and G. Wunder, “5G: A tutorial overview of standards, trials, challenges, deployment, and practice,” IEEE J. Sel. Areas Commun., vol. 35, no. 6, pp. 1201–1221, 2017.
[2] N. Cheng, F. Lyu, J. Chen, W. Xu, H. Zhou, S. Zhang, and X. Shen, “Big data driven vehicular networks,” IEEE Netw., vol. 32, no. 6, pp. 160–167, 2018.
[3] C. You, K. Huang, and H. Chae, “Energy efficient mobile cloud computing powered by wireless energy transfer,” IEEE J. Sel. Areas Commun., vol. 34, no. 5, pp. 1757–1771, 2016.
[4] W. Zhang, Y. Wen, and D. O. Wu, “Collaborative task execution in mobile cloud computing under a stochastic wireless channel,” IEEE Trans. Wireless Commun., vol. 14, no. 1, pp. 81–93, 2015.
[5] S. E. Mahmoodi, R. Uma, and K. Subbalakshmi, “Optimal joint scheduling and cloud offloading for mobile applications,” IEEE Trans. Cloud Comput., early access, 2018.
[6] Y. Mao, J. Zhang, and K. B. Letaief, “Dynamic computation offloading for mobile-edge computing with energy harvesting devices,” IEEE J. Sel. Areas Commun., vol. 34, no. 12, pp. 3590–3605, 2016.
[13] J. Liu, Y. Shi, Z. M. Fadlullah, and N. Kato, “Space-air-ground integrated network: A survey,” IEEE Commun. Surveys Tuts., vol. 20, no. 4, pp. 2714–2741, 2018.
[14] H. Nishiyama, Y. Tada, N. Kato, N. Yoshimura, M. Toyoshima, and N. Kadowaki, “Toward optimized traffic distribution for efficient network capacity utilization in two-layered satellite networks,” IEEE Trans. Veh. Technol., vol. 62, no. 3, pp. 1303–1313, 2013.
[15] Y. Zhou, N. Cheng, N. Lu, and X. Shen, “Multi-UAV-aided networks: aerial-ground cooperative vehicular networking architecture,” IEEE Veh. Technol. Mag., vol. 10, no. 4, pp. 36–44, 2015.
[16] N. Cheng, W. Xu, W. Shi, Y. Zhou, N. Lu, H. Zhou, and X. Shen, “Air-ground integrated mobile edge networks: Architecture, challenges, and opportunities,” IEEE Commun. Mag., vol. 56, no. 8, pp. 26–32, 2018.
[17] Y. Hu and V. O. Li, “Satellite-based Internet: A tutorial,” IEEE Commun. Mag., vol. 39, no. 3, pp. 154–162, 2001.
[18] M. Patel, B. Naughton, C. Chan, N. Sprecher, S. Abeta, A. Neal et al., “Mobile-edge computing introductory technical white paper,” White Paper, Mobile-edge Computing (MEC) industry initiative, 2014.
[19] S. Jeong, O. Simeone, and J. Kang, “Mobile edge computing via a UAV-mounted cloudlet: Optimization of bit allocation and path planning,” IEEE Trans. Veh. Technol., vol. 67, no. 3, pp. 2049–2063, 2018.
[20] D. T. Hoang, D. Niyato, and N. T. Hung, “Optimal energy allocation policy for wireless networks in the sky,” in Proc. IEEE ICC, 2015, pp. 3204–3209.
[21] N. Zhang, S. Zhang, P. Yang, O. Alhussein, W. Zhuang, and X. S. Shen, “Software defined space-air-ground integrated vehicular networks: Challenges and solutions,” IEEE Commun. Mag., vol. 55, no. 7, pp. 101–109, 2017.
[22] M. Chen, M. Mozaffari, W. Saad, C. Yin, M. Debbah, and C. S. Hong, “Caching in the sky: Proactive deployment of cache-enabled unmanned aerial vehicles for optimized quality-of-experience,” IEEE J. Sel. Areas Commun., vol. 35, no. 5, pp. 1046–1061, 2017.
[23] S. Jagtap, N. Gandhi, and P. Kadam. (2017) Comparative study of project loon and facebook aquila. [Online]. Available: http://ijesc.org/upload/4d8ea34db8143025dc7aff3880ed9ba9.Comparative%20Study%20of%20Project%20Loon%20&%20Facebook%20Aquila.pdf
[24] W. Quan, Y. Liu, H. Zhang, and S. Yu, “Enhancing crowd collaborations for software defined vehicular networks,” IEEE Commun. Mag., vol. 55, no. 8, pp. 80–86, 2017.
[25] W. Shi, J. Li, W. Xu, H. Zhou, N. Zhang, S. Zhang, and X. Shen, “Multiple drone-cell deployment analyses and optimization in drone assisted radio access networks,” IEEE Access, vol. 6, pp. 12518–12529, 2018.
[26] A. Al-Hourani, S. Kandeepan, and S. Lardner, “Optimal LAP altitude for maximum coverage,” IEEE Commun. Lett., vol. 3, no. 6, pp. 569–572, 2014.
[27] R. I. Bor-Yaliniz, A. El-Keyi, and H. Yanikomeroglu, “Efficient 3-D placement of an aerial base station in next generation cellular networks,” in Proc. IEEE ICC, 2016, pp. 1–5.
[28] S. Sahni, “Computationally related problems,” SIAM Journal on Computing, vol. 3, no. 4, pp. 262–279, 1974.
[29] P. M. Pardalos and S. A. Vavasis, “Quadratic programming with one negative eigenvalue is NP-hard,” Journal of Global Optimization, vol. 1, no. 1, pp. 15–22, 1991.
[30] H. Van Hasselt, A. Guez, and D. Silver, “Deep reinforcement learning with double Q-learning,” in Proc. AAAI, 2016, pp. 1–5.
[31] R. S. Sutton, A. G. Barto, F. Bach et al., Reinforcement learning: An introduction. MIT Press, 1998.
[7] S. Barbarossa, S. Sardellitti, and P. Di Lorenzo, “Communicating while [32] D. Silver, G. Lever, N. Heess, T. Degris, D. Wierstra, and M. Riedmiller,
computing: Distributed mobile cloud computing over 5g heterogeneous “Deterministic policy gradient algorithms,” in Proc. ICML, 2014, pp. 1–
networks,” IEEE Signal Process. Mag., vol. 31, no. 6, pp. 45–55, 2014. 9.
[8] K. Kumar, J. Liu, Y.-H. Lu, and B. Bhargava, “A survey of computation [33] R. J. Williams, “Simple statistical gradient-following algorithms for
offloading for mobile systems,” Springer Mobile Netw. Appl., vol. 18, connectionist reinforcement learning,” Machine learning, vol. 8, no. 3-4,
no. 1, pp. 129–140, 2013. pp. 229–256, 1992.
[9] Y.-H. Kao, B. Krishnamachari, M.-R. Ra, and F. Bai, “Hermes: Latency [34] V. Mnih, A. P. Badia, M. Mirza, A. Graves, T. Lillicrap, T. Harley,
optimal task assignment for resource-constrained mobile computing,” D. Silver, and K. Kavukcuoglu, “Asynchronous methods for deep
IEEE Trans. Mobile Comput., vol. 16, no. 11, pp. 3056–3069, 2017. reinforcement learning,” in Proc. ICML, 2016, pp. 1928–1937.
[10] F. Lyu, H. Zhu, H. Zhou, W. Xu, N. Zhang, M. Li, and X. Shen, “Ss- [35] ARM. (2017) Arm cortex-m for beginners.
mac: A novel time slot-sharing mac for safety messages broadcasting in [Online]. Available: https://community.arm.com/
vanets,” IEEE Trans. Veh. Technol., vol. 67, no. 4, pp. 3586–3597, Apr. cfs-file/ key/telligent-evolution-components-attachments/
2018. 01-2142-00-00-00-00-52-96/White-Paper- 2D00 -Cortex 2D00
[11] C. You, K. Huang, H. Chae, and B.-H. Kim, “Energy-efficient M-for-Beginners- 2D00 -2016- 2800 final-v3 2900 .pdf
resource allocation for mobile-edge computation offloading,” IEEE [36] A. Hussain, “Energy consumption of wireless iot nodes,” Master’s thesis,
Trans. Wireless Commun., vol. 16, no. 3, pp. 1397–1411, 2017. NTNU, 2017.
[12] Y. Wu, L. P. Qian, H. Mao, X. Yang, H. Zhou, X. Tan, and D. H. [37] 3GPP, “Study on new radio (nr) to support non terrestrial networks,”
Tsang, “Secrecy-driven resource management for vehicular computation Tech. Rep., 2018. [Online]. Available: http://www.3gpp.org/DynaReport/
offloading networks,” IEEE Netw., vol. 32, no. 3, pp. 84–91, 2018. 38811.htm

[38] Q. Wu, Y. Zeng, and R. Zhang, “Joint trajectory and communication design for multi-UAV enabled wireless networks,” IEEE Trans. Wireless Commun., vol. 17, no. 3, pp. 2109–2121, 2018.
[39] M.-H. Chen, B. Liang, and M. Dong, “Joint offloading and resource allocation for computation and communication in mobile cloud with computing access point,” in Proc. IEEE INFOCOM, 2017, pp. 1–9.
[40] A. P. Miettinen and J. K. Nurminen, “Energy efficiency of mobile clients in cloud computing,” in Proc. HotCloud, 2010, pp. 4–11.

Nan Cheng (M’16) received the Ph.D. degree from the Department of Electrical and Computer Engineering, University of Waterloo, and the B.E. and M.S. degrees from the Department of Electronics and Information Engineering, Tongji University. He is currently a joint professor with the School of Telecommunication, Xidian University, and a joint postdoctoral fellow with the Department of Electrical and Computer Engineering, University of Toronto, and the Department of Electrical and Computer Engineering, University of Waterloo. His research interests include performance analysis, MAC, opportunistic communication for vehicular networks, unmanned aerial vehicles, and application of AI for wireless networks.

Feng Lyu (M’18) received the Ph.D. degree from the Department of Computer Science and Engineering, Shanghai Jiao Tong University, Shanghai, China, in 2018, and the B.S. degree in software engineering from Central South University, Changsha, China, in 2013. Since Sept. 2018, he has worked as a postdoctoral fellow in the BBCR Group, Department of Electrical and Computer Engineering, University of Waterloo, Canada. His research interests include vehicular ad hoc networks, cloud/edge computing, and big data driven application design.

Wei Quan (M’14) received his Ph.D. degree in Communication and Information System from Beijing University of Posts and Telecommunications (BUPT), Beijing, China, in 2014, and is currently an Associate Professor at the School of Electronic and Information Engineering, BJTU. His research interests include key technologies for network analytics, future Internet, 5G networks, and vehicular networks. He has published more than 20 papers in prestigious international journals and conferences including IEEE Communications Magazine, IEEE Wireless Communications, IEEE Network, IEEE Transactions on Vehicular Technology, IEEE Communications Letters, IFIP Networking, IEEE ICC, and IEEE GLOBECOM. He serves as an Associate Editor of Journal of Internet Technology (JIT), Peer-to-Peer Networking and Applications (PPNA), and IET Networks, and as a technical reviewer for many important international journals. He is also a TPC Member of IEEE ICC (2017, 2018), ACM MOBIMEDIA (2015, 2016, 2017), and IEEE CCIS (2015, 2016). He is a Member of IEEE and ACM, and a Senior Member of CAAI (Chinese Association of Artificial Intelligence).

Conghao Zhou (SM’19) received the B.S. degree from Northeastern University, Shenyang, China, in 2017 and the M.S. degree from the University of Illinois at Chicago, Chicago, IL, USA, in 2018. He is currently working toward the Ph.D. degree with the Department of Electrical and Computer Engineering, University of Waterloo, Waterloo, ON, Canada. His research interests include space-air-ground integration networks and machine learning in wireless networks.

Hongli He received the B.Sc. degree in information engineering from Zhejiang University, Hangzhou, China, in 2014, where he is currently pursuing the Ph.D. degree with the Institute of Information and Communication Engineering. His current research interests include video streaming in vehicular ad hoc networks, LTE in unlicensed spectrum, and edge cloud computing.

Weisen Shi (SM’15) received the B.S. degree from Tianjin University, Tianjin, China, in 2013 and the M.S. degree from Beijing University of Posts and Telecommunications, Beijing, China, in 2016. He is currently working toward the Ph.D. degree with the Department of Electrical and Computer Engineering, University of Waterloo, Waterloo, ON, Canada. His interests include drone communication and networking, network function virtualization, and vehicular networks.

Xuemin (Sherman) Shen (M’97-SM’02-F’09) received the B.Sc. (1982) degree from Dalian Maritime University (China) and the M.Sc. (1987) and Ph.D. (1990) degrees from Rutgers University, New Jersey (USA), all in electrical engineering. He is a University Professor and the Associate Chair for Graduate Studies, Department of Electrical and Computer Engineering, University of Waterloo, Canada. Dr. Shen’s research focuses on resource management, wireless network security, social networks, smart grid, and vehicular ad hoc and sensor networks. He was an elected member of IEEE ComSoc Board of Governor, and the Chair of Distinguished Lecturers Selection Committee. Dr. Shen served as the Technical Program Committee Chair/Co-Chair for IEEE Globecom’16, Infocom’14, IEEE VTC’10 Fall, and Globecom’07, the Symposia Chair for IEEE ICC’10, the Tutorial Chair for IEEE VTC’11 Spring and IEEE ICC’08, the General Co-Chair for ACM Mobihoc’15, Chinacom’07, and QShine’06, and the Chair for IEEE Communications Society Technical Committee on Wireless Communications, and P2P Communications and Networking. He also serves/served as the Editor-in-Chief for IEEE Internet of Things Journal, IEEE Network, Peer-to-Peer Networking and Application, and IET Communications; a Founding Area Editor for IEEE Transactions on Wireless Communications; an Associate Editor for IEEE Transactions on Vehicular Technology, Computer Networks, and ACM/Wireless Networks, etc.; and the Guest Editor for IEEE JSAC, IEEE Wireless Communications, IEEE Communications Magazine, and ACM Mobile Networks and Applications, etc. Dr. Shen received the Excellent Graduate Supervision Award in 2006, and the Premier’s Research Excellence Award (PREA) in 2003 from the Province of Ontario, Canada. Dr. Shen is a registered Professional Engineer of Ontario, Canada, an IEEE Fellow, an Engineering Institute of Canada Fellow, a Canadian Academy of Engineering Fellow, a Royal Society of Canada Fellow, and a Distinguished Lecturer of IEEE Vehicular Technology Society and Communications Society.
