Intelligent Edge: Leveraging Deep Imitation Learning for Mobile Edge Computation Offloading
Shuai Yu, Xu Chen, Lei Yang, Di Wu, Mehdi Bennis, and Junshan Zhang
Digital Object Identifier: 10.1109/MWC.001.1900232

Shuai Yu, Xu Chen (corresponding author), and Di Wu are with Sun Yat-sen University; Lei Yang is with the University of Nevada; Mehdi Bennis is with the University of Oulu; Junshan Zhang is with Arizona State University.
FIGURE 1. The deep multi-label classification model: the system state (subcarriers 1, …, N of the wireless network and CPU cores 1, …, M of the MEC server) is fed into a deep neural network (input, hidden, and output layers), whose output neurons give the offloading decision for each sub-task: local execution on the mobile user, the MEC server, or the remote cloud over the backhaul.
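To make the input side of Fig. 1 concrete, the sketch below flattens the pictured system state into a single DNN input vector. This is a hypothetical encoding for illustration only: the function name `encode_state` and the exact feature set (per-subcarrier conditions, per-core busy flags, backhaul delay) are our assumptions, not the article's definition.

```python
import numpy as np

def encode_state(subcarrier_gains, core_busy, backhaul_delay):
    """Flatten a Fig. 1-style system state (N wireless subcarrier conditions,
    M MEC-server CPU-core busy/idle flags, and the backhaul delay) into one
    input vector for the deep neural network."""
    return np.concatenate([np.asarray(subcarrier_gains, dtype=float),
                           np.asarray(core_busy, dtype=float),
                           [float(backhaul_delay)]])

# Toy example with N = 2 subcarriers and M = 3 CPU cores.
state = encode_state([0.8, 0.3], [1, 0, 1], 0.015)
```

The resulting vector has N + M + 1 entries, matching the input-layer width the network would be built with.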
Then the objective of the optimization problem is to obtain a near-optimal offloading policy π* that can minimize the offloading cost given by ∑_{t∈T} E(S, I_t). Note that the offloading cost is the sum of the costs of the sub-tasks of mobile application A, which is not available immediately; the long-term cost can only be obtained once all the sub-tasks have been processed.

Deep Imitation Learning for Computation Offloading

The optimization problem of minimizing the offloading cost is a combinatorial optimization problem. Thus, it is impossible to achieve the optimal solution in real time by using standard optimization methods. Another possible approach is to utilize RL. Nevertheless, since the action space is defined over the combination of the execution selections for multiple sub-tasks, RL suffers from the curse of dimensionality and hence converges very slowly in practical implementations.

To address these challenges, we explore a novel scheme for autonomous computation offloading decisions by leveraging DIL. Intuitively, we first obtain the demonstrations (i.e., the optimal decision samples) by solving the computation offloading optimization problem in an offline manner. Then, using these demonstrations, we train a DIL model to imitate the optimal decision patterns and generate efficient online computation offloading decisions in real time.

Deep Multi-Label Classification Model for Computation Offloading

As shown in Fig. 1, the optimization problem can be formulated as a multi-label classification [11] problem. Assume that mobile application A consists of T sub-tasks. The input layer of our training model consists of the observation of the application features and network states. The computation offloading decision in the output layer is a T-dimensional vector for the application. If a sub-task is offloaded, its value is 2 (cloud) or 1 (edge); otherwise, it is local. We define the multi-label offloading accuracy as the proportion of correctly predicted labels to the total number of labels. Through this accuracy, we can evaluate the output (i.e., the predicted offloading actions) with respect to the optimal offloading actions.

Figure 2 illustrates the flowchart of our model. It consists of three phases: offline demonstration generation, offline model training, and online decision making. In the following, we describe these phases.

Offline Demonstration Generation: Based on behavioral cloning [4], imitation learning performs supervised learning by imitating demonstrations (i.e., optimal offloading actions). Thus, the objective of this phase is to generate demonstrations to train our DIL framework. We acquire a large number of decision samples by leveraging the offline optimization scheme for solving the optimization problem. In general, when the decision space is:
• Small, we can use an exhaustive approach to obtain the optimal offloading decision by searching the whole action space (there are 3^T possibilities in the space).
• Medium, the problem can be solved by a mixed integer programming solver (e.g., CPLEX).
• Huge, we can leverage approximate offline algorithms to obtain efficient decision samples.
Then the network state S, together with its optimal offloading decision, is recorded as a raw decision sample to train our framework in the next phase.

Offline Model Training: In this phase, we use a deep neural network (DNN) to extract and learn the features of the training data. We conventionally use the rectified linear unit (ReLU) as the activation function for the hidden layers.
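For a small decision space, the exhaustive demonstration generation described above can be sketched as follows. The cost model `toy_cost` is purely hypothetical (a stand-in for the article's offloading cost E(S, I_t)); only the enumeration of the 3^T action space and the selection of the cost-minimizing decision reflect the text.

```python
from itertools import product

def generate_demonstrations(states, cost_fn, num_subtasks):
    """Exhaustively search the 3^T action space (0 = local, 1 = edge,
    2 = cloud) and pair each state with its cost-minimizing decision."""
    demos = []
    for state in states:
        best = min(product((0, 1, 2), repeat=num_subtasks),
                   key=lambda actions: cost_fn(state, actions))
        demos.append((state, best))
    return demos

# Hypothetical toy cost, for illustration only: cloud execution pays the
# backhaul delay, edge is cheap, local is the most expensive.
def toy_cost(state, actions):
    per_action = {0: 1.0, 1: 0.5, 2: 0.3 + state["backhaul_delay"]}
    return sum(per_action[a] for a in actions)

demos = generate_demonstrations([{"backhaul_delay": 0.9}], toy_cost, 3)
```

Each `(state, decision)` pair is one raw decision sample of the kind recorded for the training phase; with T = 6 the search already covers 3^6 = 729 candidate decisions per state, which is why larger spaces need a MIP solver or approximate algorithms.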
Our offloading model inputs the system state S and outputs the offloading decisions I_t (t = 1, 2, …, T). The sigmoid function is used as the output of our model. Note that it can be formulated as a multi-label classification problem to maximize the multi-label (i.e., predicted offloading actions) accuracy. We consider the cross-entropy loss [12] to measure the performance of the model, and use the Adam optimizer [13] to optimize the neural network. The output layer consists of T neurons that represent the offloading actions of the T sub-tasks. If an output neuron is less than 0.5, it denotes local execution; otherwise, offloading.

Online Decision Making: Once the offline model training phase of the DNN is finished, it can be used to make real-time computation offloading decisions in an online manner. At this time, the DNN outputs a sequence of offloading decisions for all sub-tasks of the mobile application.

FIGURE 3. Comparison of offloading decision accuracy.

Proof-of-Concept Performance Evaluation

Simulation Setting

In order to evaluate the performance of our DIL-based offloading scheme, we consider an MEC network consisting of a mobile device and an MEC server. The number of CPU cores of the SCceNB is set to 16 (i.e., M = 16). For the edge network, we consider a Rayleigh-fading environment, and the total bandwidth is divided into 256 subcarriers (i.e., N = 256). The wired (backhaul) delay between the SCceNB and the remote cloud is W ∈ [0.01, 0.02] s. In reality, a mobile application usually consists of a few to dozens of sub-tasks; in this article, the mobile application consists of 6 sub-tasks (i.e., T = 6). The data dependencies and the workloads of the sub-tasks follow uniform distributions, similar to [14]. Note that the random variables for different sub-tasks are independent.

In the offline demonstration generation phase, we use MATLAB to generate 100,000 demonstrations, which means that the mobile application is executed 100,000 times independently under various network environments. At the same time, the samples of the optimal offloading scheme are obtained in this phase. In the online decision making phase, we evaluate the performance of our DIL-based offloading scheme (DIOS) by leveraging a Jupyter notebook. We consider the following eight benchmark schemes from the literature.

Optimal Offloading Scheme: We search the whole action space to find the optimal offloading scheme (OOS).
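The trained model's online decision step, i.e., one ReLU hidden layer feeding T sigmoid output neurons thresholded at 0.5, together with the multi-label accuracy metric, can be sketched as below. The layer sizes and the randomly initialized weights are illustrative assumptions; in the article the weights come from offline training with the cross-entropy loss and the Adam optimizer.

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(x):
    return np.maximum(x, 0.0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Illustrative shapes: an 8-feature system state, one hidden layer of 16
# units, and T = 6 output neurons (one per sub-task).
W1, b1 = rng.normal(size=(8, 16)), np.zeros(16)
W2, b2 = rng.normal(size=(16, 6)), np.zeros(6)

def predict_offloading(state):
    """Forward pass: ReLU hidden layer, sigmoid outputs, 0.5 threshold
    (below 0.5 means local execution, otherwise offload)."""
    hidden = relu(state @ W1 + b1)
    probs = sigmoid(hidden @ W2 + b2)
    return (probs >= 0.5).astype(int)

def multilabel_accuracy(predicted, optimal):
    """Proportion of correctly predicted labels over all labels."""
    return float((predicted == optimal).mean())

decisions = predict_offloading(rng.normal(size=8))
```

Evaluating `multilabel_accuracy` against the optimal actions from the demonstration set is exactly how the predicted offloading actions are scored in Fig. 3.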
DRL-Based Offloading Scheme (DROS): An offloading scheme that is based on the DRL method [9].

Greedy Algorithm-Based Offloading Scheme (GOS): The mobile device chooses offloading actions through a greedy algorithm; that is, at each sub-task execution step the mobile device chooses the sub-action that minimizes the offloading cost.

Random Offloading Scheme (ROS): The offloading decisions are generated randomly.

Shallow Learning-Based Offloading Scheme (SOS): The number of hidden layers is set to 1.

Edge Offloading Scheme (EOS): With coarse offloading strategies, the entire mobile application is offloaded to the MEC server side.

FIGURE 4. Comparison of offloading cost.

Compared to the traditional machine learning methods, deep learning outperforms in processing huge data, since it can precisely learn high-level features (e.g., faces and voices), extract new features automatically for different problems, and take much less time to inference.
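The simpler baselines in the list above (ROS, GOS, and EOS) can be sketched as follows. The function signatures and the `cost_fn` callback are our assumptions for illustration; only the per-scheme decision rules come from the descriptions in the text.

```python
import random

ACTIONS = (0, 1, 2)  # 0 = local, 1 = edge (MEC server), 2 = remote cloud

def ros(state, cost_fn, num_subtasks):
    """Random Offloading Scheme: every sub-task action is drawn at random."""
    return [random.choice(ACTIONS) for _ in range(num_subtasks)]

def gos(state, cost_fn, num_subtasks):
    """Greedy Algorithm-Based Offloading Scheme: at each sub-task execution
    step, pick the sub-action that minimizes the offloading cost so far."""
    chosen = []
    for _ in range(num_subtasks):
        best = min(ACTIONS, key=lambda a: cost_fn(state, chosen + [a]))
        chosen.append(best)
    return chosen

def eos(state, cost_fn, num_subtasks):
    """Edge Offloading Scheme: offload the whole application to the MEC server."""
    return [1] * num_subtasks
```

Note the design difference the figures highlight: GOS commits to locally cheapest sub-actions and can miss globally better plans, whereas OOS pays the full 3^T search and DIOS amortizes that search into offline training.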