IEEE TRANSACTIONS ON COMMUNICATIONS, VOL. 71, NO. 3, MARCH 2023 1755

DRL-Based Offloading for Computation Delay Minimization in Wireless-Powered Multi-Access Edge Computing

Kechen Zheng, Member, IEEE, Guodong Jiang, Xiaoying Liu, Member, IEEE, Kaikai Chi, Member, IEEE, Xinwei Yao, Member, IEEE, and Jiajia Liu, Senior Member, IEEE

Abstract: Wireless power transfer (WPT) and edge computing have been validated as effective ways to solve the energy-limited problem and the computation-capacity-limited problem of wireless devices (WDs), respectively. This paper studies the wireless-powered multi-access edge computing (WP-MEC) network, where WDs conduct either local computing or task offloading for their indivisible computation tasks. We aim to minimize the total computation delay (TCD) when each WD has a computation task to execute, referred to as the total computation delay minimization (TCDM) problem, by jointly optimizing the offloading-decision, WPT duration, and transmission durations of offloading WDs. The TCDM problem is a mixed integer programming (MIP) problem, for which it is challenging to efficiently obtain the optimal or near-optimal solution. To tackle this challenge, we decompose the TCDM problem into the sub-problem of optimizing the WPT duration and transmission durations, and the top-problem of optimizing the offloading decision. For the nonconvex sub-problem, we design a worst-WD-adjusting (WDA) algorithm to efficiently obtain its optimal solution. For the top-problem, under the time-varying channel conditions, traditional optimization methods can hardly determine the optimal or near-optimal offloading decision within the channel coherence duration. To quickly obtain the near-optimal offloading decision, we propose a deep neural networks (DNN)-based deep reinforcement learning (DRL) model, which takes the sub-problem solving as one component for utility evaluation. Finally, numerical results demonstrate that the proposed online DRL-based offloading algorithm achieves the near-minimal TCD with low computational complexity, and is suitable for the fast-fading WP-MEC network.

Index Terms: Multi-access edge computing, wireless power transfer, deep reinforcement learning, computation delay.

Manuscript received 29 April 2022; revised 19 September 2022 and 8 December 2022; accepted 11 January 2023. Date of publication 18 January 2023; date of current version 17 March 2023. This work was supported in part by the National Natural Science Foundation of China under Grant No. 62272414, 61902351, 61902353, in part by the Zhejiang Provincial Natural Science Foundation of China under Grant No. LR20F020003, LY21F020022, LY21F020023, and in part by the Fundamental Research Funds for the Provincial Universities of Zhejiang under Grant No. RF-A2022005. The associate editor coordinating the review of this article and approving it for publication was N. Miridakis. (Corresponding author: Kaikai Chi.)
Kechen Zheng, Guodong Jiang, Xiaoying Liu, Kaikai Chi, and Xinwei Yao are with the School of Computer Science and Technology, Zhejiang University of Technology, Hangzhou 310023, China (e-mail: kechenzheng@zjut.edu.cn; [email protected]; [email protected]; kkchi@zjut.edu.cn; [email protected]).
Jiajia Liu is with the School of Cybersecurity, Northwestern Polytechnical University, Xi'an 710072, China (e-mail: [email protected]).
Color versions of one or more figures in this article are available at https://doi.org/10.1109/TCOMM.2023.3237854.
Digital Object Identifier 10.1109/TCOMM.2023.3237854
0090-6778 © 2023 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See https://www.ieee.org/publications/rights/index.html for more information.
Authorized licensed use limited to: Zhejiang University of Technology. Downloaded on May 01,2023 at 10:53:52 UTC from IEEE Xplore. Restrictions apply.

I. INTRODUCTION

Traditionally, powering wireless devices (WDs) by batteries is popular [1], [2], [3]. However, the capacity of batteries limits the lifetime of WDs, and narrows the application of WDs in practice. With the help of wireless power transfer (WPT) technology [4], [5], WDs are able to harvest energy from the radio frequency (RF) signal to charge the batteries, and extend the lifetime [6].

Along with the rapid development of wireless networks and explosive growth of WDs, there are ever-increasing computation demands from various applications of WDs. To cope with computation-intensive and delay-sensitive computation tasks of WDs, the multi-access edge computing (MEC) [7], [8], [9] technology emerges. With MEC technology, WDs reduce the energy consumption and computational burden by offloading computation tasks to nearby edge servers.

Taking advantage of both WPT and MEC, the wireless-powered multi-access edge computing (WP-MEC) network [10], [11] has been proposed recently. From the perspective of offloading modes in the WP-MEC network, existing works could be divided into two modes: partial offloading and binary offloading. The difference between the two modes is whether the computation task is tackled as a whole. Works [12], [13], [14], [15], [16], [17], [18], [19], [20] focused on the subdividable computation tasks, and adopted partial offloading mode, while other works [21], [22], [23], [24], [25], [26], [27], [28], [29] focused on the computation tasks that are indivisible, and adopted binary offloading mode. From the perspective of performance metrics, existing works on the WP-MEC network have studied the energy consumption [12], [13], [14], [15], [29], delay [16], [17], energy efficiency [19], [25], computation rate [21], [22], [23], [24], [27], [30], and the tradeoff between delay and other key performances [18], [28].

As far as we know, few works have considered the computation delay minimization in the WP-MEC network with the binary offloading mode, while [16] and [17] considered the delay minimization problem of the WP-MEC networks by using the partial offloading mode. In addition, the WDs in this paper offload the computation tasks to the AP by active transmission whereas [16] adopted both the backscatter communication and active transmission. As in most MEC scenarios, the computation tasks in this paper come from WDs
themselves whereas the tasks of the device come from the hybrid access point (H-AP) in [17].

This paper studies the total computation delay minimization (TCDM) problem that jointly optimizes the WPT duration, transmission time allocation, and binary offloading-decision to minimize the total computation delay (TCD). It is very challenging due to the joint optimization of the binary variables and continuous variables, which is the 0-1 mixed integer programming (MIP) problem, and is an NP-hard problem due to [31] and [32]. To solve the TCDM problem, traditional optimization techniques are computationally expensive since they usually require a large number of iterations to reach the optimal or near-optimal solutions. So the traditional optimization techniques not only take a lot of time to reach the solution, which is unacceptable for the delay-sensitive tasks, but also hardly obtain the optimal or near-optimal solutions within the channel coherence duration under the time-varying conditions. As for using machine learning or deep learning models to output the values of all the variables (including binary variables) of the TCDM problem, it is very difficult for the models to learn the near-optimal solutions, and the models may not even converge. To tackle the aforementioned issues, we propose a DRL-based offloading algorithm, which jointly exploits DRL and optimization techniques, to obtain the near-optimal offloading solutions with low computational complexity. The main contributions of this paper are summarized as follows:
• In the multi-user WP-MEC network, we formulate the TCDM problem as a 0-1 MIP problem, where each WD conducts either local computing or task offloading to complete the computation task. We decompose the TCDM problem into the sub-problem and the top-problem. The sub-problem optimizes the WPT duration and transmission time allocation under a given offloading-decision, and the top-problem optimizes the offloading-decision.
• For the nonconvex sub-problem, we design a worst-WD-adjusting (WDA) algorithm to efficiently obtain the optimal WPT duration and transmission time allocation. In the WDA algorithm, in order to decrease the completion delay of the worst local-computing WD that completes the task slowest, the WPT duration is continuously increased or decreased until one of the three events occurs: (a) the completion delay of the worst local-computing WD stops decreasing and begins to increase; (b) a new worst local-computing WD emerges, and its completion delay increases with the adjustment of WPT duration; (c) the completion delay of offloading WDs equals that of local-computing WDs.
• For the top-problem, we design a deep neural networks (DNN)-based DRL model to obtain the near-optimal offloading-decisions with low computational complexity. To be specific, a DNN with an exploration mechanism is used to learn the near-optimal offloading-decisions, with the help of the WDA algorithm to evaluate each candidate offloading-decision.
• Through numerical results, we verify the effectiveness of the DRL-based offloading algorithm. It converges fast and obtains the near-optimal solutions with low computational complexity after converging.

II. RELATED WORK

In [12], [13], [14], [15], [16], [17], [18], [19], and [20], the computation tasks are considered as subdividable and low coupling, thus it is suitable to adopt the partial offloading for them. In [12], a multi-antenna access point (AP) (integrated with an MEC server) conducts the WPT, and each WD uses the harvested energy to execute computation tasks by using the Time Division Multiple Access (TDMA) based partial offloading, in order to minimize the AP's total energy consumption subject to the computation latency constraints. Wang et al. [13] considered the energy consumption minimization of the WP-MEC network, where the energy transmitter employs energy beamforming for WPT and a single user should finish the computation of some tasks by offloading partial tasks to the edge computing server (ECS). Wang et al. [14] further studied the energy-consumption-minimization problem for the multi-user WP-MEC network. Bai et al. [15] considered an intelligent reflecting surface (IRS) assisted and Orthogonal Frequency Division Multiplexing (OFDM) based WP-MEC network, and proposed an algorithm to minimize the energy consumption. Ye et al. [16] minimized the computation delay of the WP-MEC network, where Internet of Things (IoT) nodes offload tasks to the ECS by backscatter communication and active transmission. In [17], the H-AP first transmits data and energy simultaneously to the device, and the device utilizes the harvested energy to process both the local data and the received data by using partial offloading, aiming to minimize the computation delay. Mao et al. [18] considered the long-term tradeoff between energy efficiency and delay in a multi-user WP-MEC network, and proposed an algorithm by transforming the original problem into a series of deterministic optimization problems. Shi et al. [19] considered the non-orthogonal multiple access (NOMA)-based multi-edge user WP-MEC network, aiming to maximize computational energy efficiency by using an iterative algorithm to jointly optimize computational frequency, network time allocation, and transmission power. Min et al. [20] studied a WP-MEC network with multiple ECSs and an IoT device with energy harvesting components, and proposed a DRL-based offloading algorithm to determine the offloading rate and the ECS through the predicted harvested energy, current energy level, and previous transmission rate of each ECS.

In [21], [22], [23], [24], [25], [26], [27], [28], and [29], the computation tasks are indivisible, thus the binary offloading mode is adopted. Bi and Zhang [21] considered a TDMA-based multi-user WP-MEC network, and proposed a decoupled optimization algorithm to jointly optimize the offloading-decision and transmission time allocation to maximize the weighted sum-computation-rate (WSCR). On the basis of [21], [22] considered the time-varying property of wireless channels, and proposed an online learning framework that obtains the optimal time allocation under a given offloading-decision by using the optimization theory, and applies the DNN-based DRL model to obtain the near-optimal offloading-decision. Nguyen et al. [23] considered

ZHENG et al.: DRL-BASED OFFLOADING FOR COMPUTATION DELAY MINIMIZATION IN WP-MEC 1757

the OFDMA-based WP-MEC network, and maximized the total calculation rate through optimizing offloading decision, time allocation, backscatter coefficient and transmit power. Zhou et al. [24] considered the WP-MEC network, where an unmanned aerial vehicle (UAV) is applied for WPT to alleviate the performance degradation caused by RF energy propagation path loss. Through optimizing transmission time allocation, transmit power and UAV trajectory, the WSCR is maximized subject to the UAV's speed constraint. Zhou and Hu [25] considered the multi-user WP-MEC network under both partial and binary offloading modes, and maximized the energy efficiency under the max-min fairness criterion for both TDMA and NOMA. Liu et al. [26] maximized the minimal energy balance among all WDs, where WDs perform all the tasks by one of the following three means: computing locally, offloading to the cloud server, and offloading to the fog server. Zeng et al. [27] considered the NOMA-based WP-MEC network using the binary offloading mode, and proposed a heuristic method based on wireless channel gain to maximize the total computation rate. Zarandi and Tabassum [28] proposed a federated deep reinforcement learning framework to minimize the expected long-term task completion delay and energy consumption of IoT devices. Tang and Wong [29] incorporated the long short-term memory, dueling deep Q-network, and double-DQN techniques to improve the estimation of the long-term cost in mobile edge computing systems.

TABLE I. DEFINITION OF IMPORTANT PARAMETERS

III. PRELIMINARY

A. System Model

We consider a WP-MEC network composed of one H-AP and $I$ WDs. Let $N_i$ represent the $i$-th WD, $i \in \mathcal{I}$ where $\mathcal{I} = \{1, 2, 3, \ldots, I\}$. $N_i$ has a computation task with $S_i$ computation input bits to execute. Each WD is equipped with one single antenna and a rechargeable battery for energy storage. The H-AP is also equipped with one single antenna, and works in the same frequency band as the WDs.

The H-AP is able to broadcast the RF signal to charge WDs through the downlink (DL) channel, and receive the computation tasks from WDs by uplink (UL) channel. WDs harvest energy from the RF signal, store it in the batteries, and consume it for local computation or transmission of the computation tasks. Let $h_i$ denote the gain of the UL channel between H-AP and $N_i$, and $g_i$ denote the gain of the DL channel between H-AP and $N_i$. For simplicity, we consider the channel gains of UL and DL are symmetric, i.e., $h_i = g_i$. The wireless channel gain contains large-scale fading and small-scale fading. The large-scale fading follows the free-space path loss model, and the small-scale fading follows the Rayleigh fading model. According to the free-space path fading model, the large-scale fading $\bar{h}_i$ of UL and DL could be given as

$$\bar{h}_i = A_d \left( \frac{3 \cdot 10^8}{4 \pi f_c d_i} \right)^{d_e}, \tag{1}$$

where $A_d$ denotes the antenna gain, $f_c$ denotes the carrier frequency, $d_i$ denotes the distance between $N_i$ and H-AP, and $d_e$ denotes the path loss exponent. According to the Rayleigh fading model, the channel gain $\mathbf{h} = [h_1, h_2, \ldots, h_I]$ could be given as

$$h_i = \bar{h}_i \gamma_i, \tag{2}$$

where $\gamma_i$ denotes the independent random channel fading factor following an exponential distribution with unit mean. The channel gain samples for computation tasks of a given WD are independent.

The scheme of WPT and task offloading is introduced as follows. In order to avoid mutual interference between WPT and data transmission, and that among different WDs' transmissions, the WP-MEC network follows the TDMA mechanism, which has multiple advantages summarized in [33]. First, the WDs transmit the information about WDs' task workloads $\mathbf{S} = [S_1, S_2, \ldots, S_I]$ to the H-AP. Second, the H-AP adopts the proposed DRL-based offloading algorithm to output the offloading-decision, broadcasts the offloading-decision to the WDs, and conducts WPT with duration $\beta$. Finally, some WDs (also called local-computing WDs) use the harvested energy to execute local computation, and the other WDs (also called offloading WDs) use the harvested energy to offload their tasks to the H-AP based on TDMA. Besides, the overhead due to signaling exchanges between MEC and WDs is assumed to be neglected. Let $\tau_i$ denote the task execution time of WD $N_i$ when it chooses local-computing, and $t_i$ denote the task execution time of WD $N_i$ when it chooses task offloading.

Since the computational ability of ECS is much stronger than WDs, the computing time of ECS could be ignored. Additionally, since the H-AP uses the relatively high transmission power to transmit the computing results to the offloading WDs through DL, the data transmission time of DL can be ignored. The amount of the harvested energy $E_i$ by $N_i$ could be given as

$$E_i = \mu P h_i \beta, \tag{3}$$


where $0 < \mu < 1$ denotes the energy harvesting efficiency, and $P$ denotes the transmit power of the H-AP.

As aforementioned, there are two means for each WD to complete its computation task by consuming the harvested energy. The first means is to compute the task locally, i.e., local computing, and the second means is to offload the task to the H-AP, i.e., task offloading. In this paper, we consider the binary offloading, i.e., $N_i$ chooses either task offloading or local computing to complete its task. Here, $m_i = 0$ represents that $N_i$ chooses local computing, and $m_i = 1$ represents that $N_i$ chooses task offloading. Let $\mathcal{K}$ denote the set of indexes of offloading WDs, and $\mathcal{Z}$ denote the set of indexes of local-computing WDs, where $\mathcal{K} = \{i : m_i = 1, 1 \le i \le I\}$ and $\mathcal{Z} = \{i : m_i = 0, 1 \le i \le I\}$.

An example of the WP-MEC network with four WDs is shown in Fig. 1. During the time interval $(0, \beta]$, the H-AP broadcasts the RF signal to $N_1$, $N_2$, $N_3$, and $N_4$, and meanwhile the four WDs harvest energy from the RF signal. $N_1$ and $N_2$ choose task offloading, and they offload the tasks to the H-AP by TDMA. $t_1$ represents the transmission time of offloading task by $N_1$, and $t_2$ represents that of offloading task by $N_2$. $N_3$ and $N_4$ that choose local computing start work synchronously. $\tau_3$ represents the time consumption of computing task by $N_3$, and $\tau_4$ represents that of computing task by $N_4$. Then the TCD $T$ of the WP-MEC network with four WDs is

$$T = \beta + \max \{ (t_1 + t_2), \max \{ \tau_3, \tau_4 \} \}. \tag{4}$$

Fig. 1. An example of the WP-MEC network and time allocation.

B. Task Offloading

When $N_i$ chooses task offloading (i.e., $i \in \mathcal{K}$), let $P_i$ denote the transmission power of $N_i$, and $P_i = E_i / t_i$ holds. Then the signal-to-noise ratio of $N_i$'s signal at the H-AP is $P_i h_i / \sigma^2$, where $\sigma^2$ denotes the background noise variance. Based on Shannon formula and (3), the data transmission rate of $N_i$ (in bits per second) could be given as

$$r_i = W \log_2 \left( 1 + \frac{\mu P h_i^2 \beta}{t_i \sigma^2} \right), \tag{5}$$

where $W$ denotes the transmission bandwidth. In order to fully offload the task to the H-AP by $N_i$, the following UL data transmission constraint should be satisfied

$$S_i \le t_i r_i = t_i W \log_2 \left( 1 + \frac{\mu P h_i^2 \beta}{t_i \sigma^2} \right). \tag{6}$$

C. Local Computing

When $N_i$ chooses local computing (i.e., $i \in \mathcal{Z}$), after harvesting energy from the RF signal during time interval $(0, \beta]$, it starts to compute the task locally. Let $f_i$ be the speed of the processor in cycles/second of $N_i$ and $f_{\max}$ be the maximum speed of the processor. Due to the limited energy, $N_i$ satisfies the constraint as

$$k f_i^3 \tau_i \le E_i, \tag{7}$$

where $k$ denotes the computation energy efficiency coefficient. The harvested energy may not be used up by local-computing WDs based on (7), while the remaining energy is neglected for future computation tasks due to the energy leakage and the convenience of analysis. Let $\phi$ denote the number of cycles needed by the local-computing WDs to process one bit of task data, where $\phi > 0$. Therefore, $\tau_i$ could be expressed as

$$\tau_i = \frac{S_i \phi}{f_i}. \tag{8}$$

To minimize $\tau_i$ under the given $\beta$, we need to maximize $f_i$ while satisfying the energy constraint in (7). Based on (7) and (8), we infer that $\tau_i$ decreases with $f_i$, and the energy consumption $k f_i^3 \tau_i$ increases with $f_i$. When $f_i = f_{\max}$, the value of $\tau_i$ is

$$\tau_i = \frac{S_i \phi}{f_{\max}}, \tag{9}$$

and the energy consumption is $k f_{\max}^2 S_i \phi$, which corresponds to the WPT duration

$$\beta_i = \frac{k S_i \phi f_{\max}^2}{\mu P h_i}. \tag{10}$$

Therefore, $f_i = f_{\max}$ is feasible and optimal if $E_i \ge k f_{\max}^2 S_i \phi$, i.e., $\beta \ge \beta_i$. On the contrary, if $\beta < \beta_i$, to achieve the maximal $f_i$, all the harvested energy $E_i$ should be consumed, i.e.,

$$k f_i^3 \tau_i = E_i. \tag{11}$$

Based on (3), (8), and (11), we have

$$f_i = \sqrt{\frac{E_i}{k S_i \phi}} = \sqrt{\frac{\mu P h_i \beta}{k S_i \phi}}, \tag{12}$$

and

$$\tau_i = \sqrt{\frac{k \phi^3 S_i^3}{\mu \beta h_i P}}. \tag{13}$$


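Then we summarize the task execution time of local-computing WD $N_i$ with respect to the duration of WPT $\beta$ as

$$\tau_i(\beta) = \begin{cases} \sqrt{\dfrac{k \phi^3 S_i^3}{\mu \beta h_i P}}, & \beta < \beta_i, \\[2ex] \dfrac{S_i \phi}{f_{\max}}, & \beta \ge \beta_i. \end{cases} \tag{14}$$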
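As a minimal sketch of how (3), (10), and (14) fit together, the following Python functions evaluate the harvested energy, the threshold WPT duration, and the local-computing delay for a given WPT duration. The function names and all numerical values are illustrative assumptions and are not taken from the paper.

```python
import math


def harvested_energy(mu, P, h, beta):
    """Harvested energy E_i = mu * P * h_i * beta, as in (3)."""
    return mu * P * h * beta


def beta_threshold(k, S, phi, f_max, mu, P, h):
    """WPT duration beta_i in (10) from which running at f_max is affordable."""
    return k * S * phi * f_max ** 2 / (mu * P * h)


def local_delay(beta, k, S, phi, f_max, mu, P, h):
    """Task execution time tau_i(beta) of a local-computing WD, as in (14)."""
    if beta <= 0:
        raise ValueError("beta must be positive")
    if beta < beta_threshold(k, S, phi, f_max, mu, P, h):
        # Energy-limited regime: all harvested energy is consumed, see (11)-(13).
        return math.sqrt(k * phi ** 3 * S ** 3 / (mu * beta * h * P))
    # Processor-limited regime: the WD runs at f_max, see (9).
    return S * phi / f_max


# Hypothetical parameter values, chosen only for illustration.
params = dict(k=1e-26, S=1e4, phi=100.0, f_max=1e8, mu=0.5, P=3.0, h=1e-6)
b_i = beta_threshold(**params)
tau_fast = local_delay(2 * b_i, **params)  # processor-limited branch
tau_slow = local_delay(b_i / 4, **params)  # energy-limited branch
```

Because the first branch of (14) scales as $\beta^{-1/2}$ and the two branches coincide at $\beta = \beta_i$, shrinking the WPT duration to $\beta_i / 4$ doubles the delay relative to $S_i \phi / f_{\max}$.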

D. Problem Formulation

Based on aforementioned discussions and (16), the TCD consists of two parts. The duration of the first part is $\beta$, and that of the second part can be formulated as

$$\max \left\{ \sum_{k \in \mathcal{K}} t_k, \ \max_{z \in \mathcal{Z}} \tau_z \right\}, \tag{15}$$

where $\sum_{k \in \mathcal{K}} t_k$ represents the total duration of task offloading from the offloading WDs to the H-AP based on TDMA, and $\max_{z \in \mathcal{Z}} \tau_z$ represents the computation time of the worst local-computing WD. Among the WDs that choose local computing, we only need to consider the computation time of the worst WD since every local-computing WD is required to complete its own task. Then the TCD is given as

$$T(\mathbf{m}, \mathbf{t}, \beta) = \beta + \max \left\{ \sum_{i=1}^{I} m_i t_i, \ \max_{z \in \mathcal{Z}} \tau_z \right\}, \tag{16}$$

where $\mathbf{m} = \{m_i \mid i \in \mathcal{I}\}$ and $\mathbf{t} = \{t_i \mid i \in \mathcal{I}\}$. We aim to minimize the TCD $T(\mathbf{m}, \mathbf{t}, \beta)$. Since $\beta$ influences both the local computing and task offloading, we let

$$T_{\mathrm{off}} = \beta + \sum_{i=1}^{I} m_i t_i, \tag{17}$$

and

$$T_{\mathrm{loc}} = \beta + \max_{z \in \mathcal{Z}} \tau_z. \tag{18}$$

Therefore the TCD could be further given as

$$T(\mathbf{m}, \mathbf{t}, \beta) = \max \{ T_{\mathrm{off}}, T_{\mathrm{loc}} \}. \tag{19}$$

In summary, the considered TCDM problem is formulated as follows.

$$(\text{TCDM}): \quad \min_{\mathbf{m}, \mathbf{t}, \beta} \ T(\mathbf{m}, \mathbf{t}, \beta) \tag{20a}$$

$$\text{subject to} \quad m_i S_i \le t_i W \log_2 \left( 1 + \frac{\mu P h_i^2 \beta}{t_i \sigma^2} \right), \quad \forall i \in \mathcal{I}, \tag{20b}$$

$$\tau_i = \begin{cases} (1 - m_i) \sqrt{\dfrac{k \phi^3 S_i^3}{\mu \beta h_i P}}, & \beta < \beta_i, \\[2ex] (1 - m_i) \dfrac{S_i \phi}{f_{\max}}, & \beta \ge \beta_i, \end{cases} \quad \forall i \in \mathcal{I}, \tag{20c}$$

$$\beta \ge 0, \tag{20d}$$

$$t_i \ge 0, \quad \forall i \in \mathcal{I}, \tag{20e}$$

$$m_i \in \{0, 1\}, \quad \forall i \in \mathcal{I}. \tag{20f}$$

Here (20b) ensures that the offloading WD $N_i$ with $m_i = 1$ could transmit all its task data, and (20c) ensures that the local-computing WD $N_i$ with $m_i = 0$ finishes its task workload. The brief proof that the TCDM is NP-hard can be found in Appendix A.

E. Problem Decomposition

Based on (20a)-(20f), the TCDM problem is a MIP problem. To address this problem, we decompose the TCDM problem into the sub-problem and top-problem, and design efficient approaches to solve them, as shown in Fig. 2. The sub-problem that optimizes $\beta$ and $\mathbf{t}$ under the given $\mathbf{m}$, could be formulated as

$$(\text{TCDM-sub}): \quad T(\mathbf{m}) = \min_{\beta, \mathbf{t}} \ T(\mathbf{m}, \mathbf{t}, \beta) \quad \text{subject to (20b), (20c), (20d), (20e)}. \tag{21}$$

The top-problem that optimizes $\mathbf{m}$, could be formulated as

$$(\text{TCDM-top}): \quad \min_{\mathbf{m}} \ T(\mathbf{m}) \quad \text{subject to (20f)}. \tag{22}$$

Notice that TCDM-sub is not convex due to the complex objective function and constraint (20c).

Fig. 2. The decomposition of the TCDM problem.

IV. APPROACH FOR SOLVING PROBLEM TCDM-SUB

In this section, due to the nonconvexity of the sub-problem, we present some properties of the sub-problem, and design an efficient algorithm with low complexity to solve the sub-problem.

A. Properties of the Sub-Problem

In this subsection, with a given $\mathbf{m}$, we explore the tradeoff between $T_{\mathrm{off}}$ and $T_{\mathrm{loc}}$ to reach the minimum TCD. We observe from (17) and (18) that both $T_{\mathrm{off}}$ and $T_{\mathrm{loc}}$ depend on $\beta$. Then we first obtain a reasonable initial value of $\beta$, and adjust it progressively to approach the optimal $\beta$ that minimizes the TCD.

Based on (19), we have two approaches to obtain the reasonable initial value of $\beta$. The first approach is to obtain the $\beta$ that minimizes $T_{\mathrm{loc}}$, and the second approach is to obtain the $\beta$ that minimizes $T_{\mathrm{off}}$. For the first approach, if $T_{\mathrm{off}} > T_{\mathrm{loc}}$ at this value of $\beta$, we need to adjust $\beta$ to decrease $T_{\mathrm{off}}$ at the cost of increasing $T_{\mathrm{loc}}$. The rules about how to adjust the current $\beta$ and when the adjusting is finalized are hard to design. Therefore, we first obtain the value of $\beta$ that minimizes $T_{\mathrm{off}}$ with a given $\mathbf{m}$, and design the rules about adjusting the


current $\beta$ and when the adjusting is finalized. (17) could be replaced as

$$T_{\mathrm{off}} = \beta + \sum_{k \in \mathcal{K}} t_k. \tag{23}$$

We observe from (6) that $t_k$ depends on $\beta$, and $T_{\mathrm{off}}$ could be further expressed as

$$T_{\mathrm{off}}(\beta) = \beta + \sum_{k \in \mathcal{K}} t_k(\beta). \tag{24}$$

As proved in [34], $T_{\mathrm{off}}$ is a strictly convex function of $\beta$, but the closed-form expression of $T_{\mathrm{off}}$ with respect to $\beta$ cannot be obtained. Following [34], the optimal $t_k(\beta)$ ($k \in \mathcal{K}$) under a given $\beta$ is obtained by the fixed-point iteration based on (6), and the golden section search is used to obtain the unique optimal $\beta$, denoted by $\beta^{*}$, that achieves the minimum $T_{\mathrm{off}}$. When we obtain $\beta^{*}$, there are two possible cases when $\beta = \beta^{*}$:
1) $T_{\mathrm{off}} \ge T_{\mathrm{loc}}$, which means that we cannot decrease the TCD by adjusting $\beta$, i.e., $\beta = \beta^{*}$ is optimal.
2) $T_{\mathrm{off}} < T_{\mathrm{loc}}$, which means that we can further decrease the TCD by adjusting $\beta$ to decrease $T_{\mathrm{loc}}$ at the cost of increasing $T_{\mathrm{off}}$.
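The two-level search described above (a fixed-point iteration on (6) for each transmission time, wrapped in a golden section search over the WPT duration) can be sketched in Python as follows. This is only an illustration under stated assumptions: the shorthand `a` stands for $\mu P h_k^2 / \sigma^2$, and the function names, tolerances, search interval, and numerical values are hypothetical; the exact procedure in [34] may differ.

```python
import math


def offload_time(S, W, a, beta, tol=1e-12, max_iter=100_000):
    """Fixed-point iteration for t solving S = t * W * log2(1 + a * beta / t),
    i.e., (6) with equality, where a = mu * P * h**2 / sigma**2 (assumed shorthand)."""
    if S * math.log(2) >= W * a * beta:
        # t * W * log2(1 + a * beta / t) stays below W * a * beta / ln(2),
        # so no finite t can deliver S bits for this beta.
        return math.inf
    t = S / W  # arbitrary positive starting point
    for _ in range(max_iter):
        t_next = S / (W * math.log2(1.0 + a * beta / t))
        if abs(t_next - t) <= tol * t_next:
            return t_next
        t = t_next
    return t


def t_off(beta, tasks, W):
    """T_off(beta) = beta + sum_k t_k(beta), as in (24); tasks holds (S_k, a_k) pairs."""
    return beta + sum(offload_time(S, W, a, beta) for S, a in tasks)


def golden_section_min(f, lo, hi, tol=1e-9):
    """Golden section search for the minimizer of a unimodal f on [lo, hi]."""
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    c = hi - inv_phi * (hi - lo)
    d = lo + inv_phi * (hi - lo)
    fc, fd = f(c), f(d)
    while hi - lo > tol:
        if fc < fd:
            hi, d, fd = d, c, fc
            c = hi - inv_phi * (hi - lo)
            fc = f(c)
        else:
            lo, c, fc = c, d, fd
            d = lo + inv_phi * (hi - lo)
            fd = f(d)
    return (lo + hi) / 2.0


# Hypothetical single-WD example: S = 1e4 bits, W = 2 MHz, a = 0.05.
W = 2e6
tasks = [(1e4, 0.05)]
beta_star = golden_section_min(lambda b: t_off(b, tasks, W), 1e-3, 2.0)
```

With the resulting `beta_star`, comparing `t_off(beta_star, tasks, W)` against $T_{\mathrm{loc}}$ at the same WPT duration distinguishes case 1) from case 2). Returning infinity for infeasible values of $\beta$ keeps the search away from WPT durations that are too short to power the upload.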
Then we consider case 2) as follows.

Definition 1: Call the local-computing WD with the largest task execution time under the current $\beta$ as the worst local-computing WD (WLC-WD), denoted by $N_J$.

With a given $\beta$, $N_J$ could be known according to (14). In Algorithm 1, we adjust the current $\beta$ to decrease $N_J$'s $T_J$ at the cost of increasing $T_{\mathrm{off}}$, where

$$T_J = \beta + \tau_J, \tag{25}$$

such that the TCD is decreased. The following theorem indicates whether increasing $\beta$ or decreasing $\beta$ should be applied to decrease $T_J$, as shown in Fig. 3.

Theorem 1: For each local-computing WD $N_i$, $T_i$ first monotonically decreases with $\beta$ and then monotonically increases with $\beta$.

Proof: According to (14), when $\beta < \beta_i$, the first-order derivative of $T_i$ with respect to $\beta$ is

$$\frac{dT_i}{d\beta} = 1 + \frac{1}{2} \left( \frac{k \phi^3 S_i^3}{\mu \beta h_i P} \right)^{-\frac{1}{2}} \times \left( -\frac{k \phi^3 S_i^3}{\mu h_i P} \right) \times \beta^{-2} = 1 - \frac{1}{2} \beta^{-\frac{3}{2}} \sqrt{\frac{k \phi^3 S_i^3}{\mu h_i P}}. \tag{26}$$

We denote

$$\beta_i^{*} = \sqrt[3]{\frac{k \phi^3 S_i^3}{4 \mu h_i P}}. \tag{27}$$

It could be inferred from (26) and (27) that when $\beta \le \min(\beta_i, \beta_i^{*})$, $\frac{dT_i}{d\beta} \le 0$ holds, and when $\beta_i > \beta > \beta_i^{*}$, $\frac{dT_i}{d\beta} > 0$ holds.

When $\beta \ge \beta_i$, the first-order derivative of $T_i$ with respect to $\beta$ is $\frac{dT_i}{d\beta} = 1 > 0$. This completes the proof.

Remark: Theorem 1 demonstrates the existence of the minimum $T_i$. If $\beta_i^{*}$ is smaller than the critical value of WPT duration $\beta_i$ with $f_i = f_{\max}$, the WPT duration that minimizes $T_i$ is $\beta_i^{*}$, as shown in Fig. 3(a). Otherwise, the WPT duration that minimizes $T_i$ is $\beta_i$, as shown in Fig. 3(b). So we let

$$\beta_i^{th} = \min \{ \beta_i, \beta_i^{*} \}. \tag{28}$$

Fig. 3. Two possible cases of $T_i$ for the local-computing WD $N_i$.

From Theorem 1, we infer that for the current $\beta$, there are 5 different cases for $N_J$: Case A-1, Case A-2, Case A-3, Case B-1, and Case B-2, as shown in Fig. 4. Then we present Procedure 1 to indicate whether increasing $\beta$ or decreasing $\beta$ should be applied to decrease $T_J$.

Procedure 1 (Determining whether increasing $\beta$ or decreasing $\beta$ should be conducted): For Case A-1 and Case B-1, we increase $\beta$ to decrease $T_J$. For Case A-2, Case A-3, and Case B-2, we decrease $\beta$ to decrease $T_J$.

As we continuously increase/decrease the value of $\beta$ to decrease $T_J$, one of the following four events will happen.
• Event 1: $T_J = T_{\mathrm{off}}$ holds when $N_J$ is always the WLC-WD and $\beta \neq \beta_J^{th}$ always holds during the increase/decrease of $\beta$;
• Event 2: $\beta = \beta_J^{th}$ holds when $T_J > T_{\mathrm{off}}$ always holds and $N_J$ is the WLC-WD during the increase/decrease of $\beta$;
• Event 3: The WLC-WD changes from the current one to a new one and the task execution time of the new one increases if $\beta$ continues increasing/decreasing, which happens when $T_J > T_{\mathrm{off}}$ and $\beta \neq \beta_J^{th}$ always hold during the increase/decrease of $\beta$;
• Event 4: The WLC-WD changes from the current one to a new one and the task execution time of the new one decreases if $\beta$ continues increasing/decreasing, which


happens when $T_J > T_{\mathrm{off}}$ and $\beta \neq \beta_J^{th}$ always hold during the increase/decrease of $\beta$.

Fig. 4. Five possible cases of the current WLC-WD $N_J$ when adjusting $\beta$ to decrease $T_J$.

Remark: During the increasing/decreasing of $\beta$ to decrease $T_J$, we know that the TCD $T = \max\{T_{\mathrm{off}}, T_{\mathrm{loc}}\}$ continuously decreases. Therefore, we should adjust the value of $\beta$ in one direction. That is to say, we continuously increase (or decrease) $\beta$ until the TCD cannot be decreased. If Event 1, Event 2, or Event 3 happens, the new $\beta$ is optimal, which will be proved in Theorem 2, and Algorithm 1 terminates. If Event 4 happens, we further increase/decrease the value of $\beta$ to decrease $T_J$ of the new WLC-WD $N_J$.

In the above approach for obtaining the optimal $\beta$, we continuously increase/decrease $\beta$ with a small enough step-size (e.g., 0.001) until one of the above four events happens. However, the above approach is time consuming. We present an efficient procedure, i.e., Procedure 2 in Section IV-B, to determine the threshold value of $\beta$ (denoted by $\beta_{J=\mathrm{off}}$) satisfying $T_J = T_{\mathrm{off}}$, and present another efficient procedure, i.e., Procedure 3 in Section IV-B, to determine the threshold value of $\beta$ (denoted by $\beta_{l \leftarrow J}$), at which $N_l$ becomes the new WLC-WD. With the values of $\beta_{J=\mathrm{off}}$ and $\beta_{l \leftarrow J}$, we discuss which one of the four events happens first. When we decrease $\beta$, the value of $\beta$ would be adjusted to the maximum of $\beta_{J=\mathrm{off}}$, $\beta_J^{th}$, and $\beta_{l \leftarrow J}$, i.e.,

$$\beta = \max \{ \beta_{J=\mathrm{off}}, \beta_J^{th}, \beta_{l \leftarrow J} \}. \tag{29}$$

When we increase $\beta$, the value of $\beta$ would be adjusted to the smallest of $\beta_{J=\mathrm{off}}$, $\beta_J^{th}$, and $\beta_{l \leftarrow J}$, i.e.,

$$\beta = \min \{ \beta_{J=\mathrm{off}}, \beta_J^{th}, \beta_{l \leftarrow J} \}. \tag{30}$$

Then we present the WDA algorithm to solve TCDM-sub in Algorithm 1.

Theorem 2: The optimal solution of TCDM-sub could be obtained by Algorithm 1.

Proof: When Event 1 happens, the current value of $\beta$ is $\beta_{J=\mathrm{off}}$, and it is the optimal solution. Notice that $T_{\mathrm{off}}(\beta)$ is a convex function of $\beta$ with $\beta^{*}$ that minimizes $T_{\mathrm{off}}$. If we continuously increase/decrease $\beta$ from $\beta = \beta^{*}$ to decrease $T_J$ at the cost of increasing $T_{\mathrm{off}}$, we would not increase/decrease $\beta$ when $T_J = T_{\mathrm{off}}$ holds. Otherwise, the TCD begins to increase.

When Event 2 happens, the current value of $\beta$ is $\beta_J^{th}$, and it is the optimal solution. If we increase/decrease $\beta$ from $\beta = \beta_J^{th}$, $T_J$ begins to increase from the minimum value, and the TCD increases.

When Event 3 happens, the current value of $\beta$ is $\beta_{l \leftarrow J}$, and it is the optimal solution. If we increase/decrease $\beta$ from $\beta = \beta_{l \leftarrow J}$, the task execution time of the new WLC-WD increases, which results in the increase of the TCD.

When Event 4 happens, Algorithm 1 begins a new round of increasing/decreasing $\beta$ by decreasing $T_J$ of the new $N_J$ until another event happens. Then we infer that Algorithm 1 terminates at the occurrence of Event 1, Event 2, or Event 3. This completes the proof.

B. Procedures for Obtaining Two Thresholds of $\beta$

Let $T_{\mathrm{off}}|_{\beta=\alpha}$ denote the value of $T_{\mathrm{off}}$ when $\beta = \alpha$, and $T_J|_{\beta=\alpha}$ denote the value of $T_J$ when $\beta = \alpha$.

Procedure 2 (Determining the Threshold Values of $\beta$ and $\mathbf{t}$ That Satisfy $T_J = T_{\mathrm{off}}$): Suppose that the current $N_J$ is always the WLC-WD and $\beta \neq \beta_J^{th}$ always holds when we increase/decrease $\beta$. Then we determine the threshold value of $\beta$ that satisfies $T_J = T_{\mathrm{off}}$. We divide the five cases into two categories based on whether $\beta$ should be increased or decreased to decrease $T_J$.

Category 1: Case A-1 and Case B-1. For both cases, in order to decrease $T_J$, we increase $\beta$. In interval $[\beta, \beta_J^{*}]$ of Case A-1 and interval $[\beta, \beta_J]$ of Case B-1, $T_J$ monotonically


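Algorithm 1: WDA Algorithm
Input: The offloading-decision $m$.
Output: The optimal $\beta^*$ and $t^*$.
1  Obtain the $\beta$ and $t$ that minimize $T_{\mathrm{off}}$ by the method in [34], and set the current $\beta$ and $t$ to these values;
2  if $T_{\mathrm{off}} \geq T_{\mathrm{loc}}$ then
3      $\beta^* = \beta$ and $t^* = t$;
4      return.
5  end
6  Find the WLC-WD $N_J$ by (14) for the current $\beta$;
7  Execute Procedure 1 to determine whether to increase or decrease $\beta$;
8  Obtain $\beta_{J=\mathrm{off}}$ and $t$ by executing Procedure 2;
9  Obtain $\beta_{l\leftarrow J}$ by executing Procedure 3;
10 if the result of Step 7 is decreasing $\beta$ then
11     if $\beta_{J=\mathrm{off}} = \max\{\beta_{J=\mathrm{off}}, \beta_J^{\mathrm{th}}, \beta_{l\leftarrow J}\}$ then
12         $\beta^* = \beta_{J=\mathrm{off}}$ and $t^* = t$;
13         return.
14     else if $\beta_J^{\mathrm{th}} = \max\{\beta_{J=\mathrm{off}}, \beta_J^{\mathrm{th}}, \beta_{l\leftarrow J}\}$ then
15         $\beta^* = \beta_J^{\mathrm{th}}$;
16         Obtain $t^*$ that minimizes $T_{\mathrm{off}}$ by the method in [34];
17         return.
18     else if $\beta_l^{\mathrm{th}} > \beta_{l\leftarrow J}$ then
19         $\beta^* = \beta_{l\leftarrow J}$;
20         Obtain $t^*$ that minimizes $T_{\mathrm{off}}$ by the method in [34];
21         return.
22     else
23         $\beta = \beta_{l\leftarrow J}$ and $J = l$;
24         goto Step 8;
25     end
26 else
27     if $\beta_{J=\mathrm{off}} = \min\{\beta_{J=\mathrm{off}}, \beta_J^{\mathrm{th}}, \beta_{l\leftarrow J}\}$ then
28         $\beta^* = \beta_{J=\mathrm{off}}$ and $t^* = t$;
29         return.
30     else if $\beta_J^{\mathrm{th}} = \min\{\beta_{J=\mathrm{off}}, \beta_J^{\mathrm{th}}, \beta_{l\leftarrow J}\}$ then
31         $\beta^* = \beta_J^{\mathrm{th}}$;
32         Obtain $t^*$ that minimizes $T_{\mathrm{off}}$ by the method in [34];
33         return.
34     else if $\beta_l^{\mathrm{th}} < \beta_{l\leftarrow J}$ then
35         $\beta^* = \beta_{l\leftarrow J}$;
36         Obtain $t^*$ that minimizes $T_{\mathrm{off}}$ by the method in [34];
37         return.
38     else
39         $\beta = \beta_{l\leftarrow J}$ and $J = l$;
40         goto Step 8;
41     end
42 end

decreases with $\beta$, and $T_{\mathrm{off}}$ monotonically increases with $\beta$. By taking Case A-1 as an example, we use the following way to obtain $\beta_{J=\mathrm{off}}$.

Notice that for the current $\beta$, $T_{\mathrm{off}} < T_J$ holds. We compare $T_{\mathrm{off}}|_{\beta=\beta_J^{\mathrm{th}}}$ and $T_J|_{\beta=\beta_J^{\mathrm{th}}}$. If $T_{\mathrm{off}}|_{\beta=\beta_J^{\mathrm{th}}} < T_J|_{\beta=\beta_J^{\mathrm{th}}}$, there is no feasible solution of $\beta$ in $[\beta, \beta_J^{\mathrm{th}}]$ that satisfies $T_{\mathrm{off}} = T_J$, and we set $\beta_{J=\mathrm{off}} = +\infty$. If $T_{\mathrm{off}}|_{\beta=\beta_J^{\mathrm{th}}} > T_J|_{\beta=\beta_J^{\mathrm{th}}}$, we obtain $\beta_{J=\mathrm{off}}$ that satisfies $T_{\mathrm{off}}|_{\beta=\beta_{J=\mathrm{off}}} = T_J|_{\beta=\beta_{J=\mathrm{off}}}$ by the bisection search method.

Category 2: Case A-2, Case A-3, and Case B-2. First, let us consider Cases A-2 and A-3. In order to decrease $T_J$, we decrease $\beta$ in interval $[\beta_J^{\mathrm{th}}, \beta]$ of Cases A-2 and A-3, where $T_J$ monotonically increases with $\beta$ and $T_{\mathrm{off}}$ monotonically decreases with $\beta$. If $T_{\mathrm{off}}|_{\beta=\beta_J^{\mathrm{th}}} < T_J|_{\beta=\beta_J^{\mathrm{th}}}$, there is no feasible solution of $\beta$ in $[\beta_J^{\mathrm{th}}, \beta]$ that satisfies $T_{\mathrm{off}} = T_J$, and we set $\beta_{J=\mathrm{off}} = -\infty$. Otherwise, $\beta_{J=\mathrm{off}}$ is obtained by the bisection search method. The analysis of Case B-2 is similar to that of Cases A-2 and A-3.

Procedure 3 (Determining the Threshold Value of $\beta$ at Which the WLC-WD Changes):

Suppose that $T_J > T_{\mathrm{off}}$ always holds and $\beta \neq \beta_J^{\mathrm{th}}$ always holds when we increase/decrease $\beta$. Then we determine the threshold value of $\beta$ at which the WLC-WD changes.

We divide the five cases into three categories. Category 1: Case A-1, Case A-2, and Case B-1, where $\beta \leq \bar{\beta}_J$ holds during the increase/decrease of $\beta$. Category 2: Case B-2, where $\beta > \bar{\beta}_J$ holds during the decrease of $\beta$. Category 3: Case A-3, where $\beta > \bar{\beta}_J$ holds at first and then $\beta \leq \bar{\beta}_J$ holds during the decrease of $\beta$.

Category 1: Case A-1, Case A-2, and Case B-1. We observe from Fig. 4 that, in order to decrease $T_J$, we need to increase $\beta$ in the interval $[\beta, \beta_J^{\mathrm{th}}]$ of Case A-1, increase $\beta$ in the interval $[\beta, \bar{\beta}_J]$ of Case B-1, and decrease $\beta$ in the interval $[\beta_J^{\mathrm{th}}, \beta]$ of Case A-2.

Based on (14), we know that for all the three cases, $T_J$ in the above intervals is

$$T_J = \beta + \sqrt{\frac{k\phi^3 S_J^3}{\mu \beta h_J P}}, \tag{31}$$

and then we analyze the value of $\beta_{l\leftarrow J}$.

Proposition 1: For Case A-1 and Case B-1, when we increase $\beta$ in $[\beta, \beta_J^{\mathrm{th}}]$, $\beta_{l\leftarrow J}$ could be given as follows. If $M \neq \emptyset$, $\beta_{l\leftarrow J} = \min M$; otherwise $\beta_{l\leftarrow J} = +\infty$, where $M$ is the following set

$$M = \left\{ \frac{k\phi (f_{\max})^2 S_J^3}{\mu P h_J S_l^2} : \frac{k\phi (f_{\max})^2 S_J^3}{\mu P h_J S_l^2} \in [\beta, \beta_J^{\mathrm{th}}] \cap [\bar{\beta}_l, \infty),\ l \in \mathcal{Z} - \{J\} \right\}. \tag{32}$$

Proof: Based on (14), $N_l$ becomes the new WLC-WD if one of the two following conditions is satisfied.

The first condition: for some $l \in \mathcal{Z} - \{J\}$, $\beta_l^{(A1)}$ satisfies $\beta_l^{(A1)} \in [\beta, \beta_J^{\mathrm{th}}]$, $\beta_l^{(A1)} \leq \bar{\beta}_l$, and

$$\beta_l^{(A1)} + \sqrt{\frac{k\phi^3 S_J^3}{\mu \beta_l^{(A1)} h_J P}} = \beta_l^{(A1)} + \sqrt{\frac{k\phi^3 S_l^3}{\mu \beta_l^{(A1)} h_l P}}. \tag{33}$$

Obviously, there is no feasible solution of $\beta_l^{(A1)}$ in (33).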
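As a brief aside on Procedure 2, the bisection search for $\beta_{J=\mathrm{off}}$ in Category 1 can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names `local_time_energy_limited` and `bisect_crossing`, the tolerance, and the iteration cap are assumptions introduced here, and $T_{\mathrm{off}}$ is passed in as a generic callable because its closed form comes from the method in [34], which is not reproduced in this section. The sketch assumes the Category 1 setting described above, where $T_{\mathrm{off}}$ increases with $\beta$, $T_J$ decreases with $\beta$, and $T_{\mathrm{off}} < T_J$ holds at the current $\beta$.

```python
import math
from typing import Callable


def local_time_energy_limited(beta: float, k: float, phi: float, S: float,
                              mu: float, h: float, P: float) -> float:
    """T_J as in (31): WPT duration plus energy-limited local computing time."""
    return beta + math.sqrt(k * phi**3 * S**3 / (mu * beta * h * P))


def bisect_crossing(t_off: Callable[[float], float],
                    t_j: Callable[[float], float],
                    lo: float, hi: float,
                    tol: float = 1e-9, max_iter: int = 200) -> float:
    """Find beta in [lo, hi] with t_off(beta) = t_j(beta) (Category 1 setting).

    Assumes t_off is increasing, t_j is decreasing, and t_off(lo) < t_j(lo).
    Returns +inf when t_off is still below t_j at the right end point,
    mirroring the convention beta_{J=off} = +inf used in Procedure 2.
    """
    def gap(b: float) -> float:
        return t_off(b) - t_j(b)

    if gap(hi) < 0:
        return math.inf
    for _ in range(max_iter):
        mid = 0.5 * (lo + hi)
        if gap(mid) < 0:
            lo = mid   # crossing lies to the right of mid
        else:
            hi = mid   # crossing lies at or to the left of mid
        if hi - lo < tol:
            break
    return 0.5 * (lo + hi)


# Toy usage with hypothetical curves (not the paper's T_off):
# t_off(b) = 2b is increasing and t_j(b) = 1 - b is decreasing.
beta_cross = bisect_crossing(lambda b: 2.0 * b, lambda b: 1.0 - b, 0.1, 0.5)
```

In practice `t_j` would be built from `local_time_energy_limited` with the parameters of the current WLC-WD, and the right end point would be the end of the interval considered in the relevant case.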


The second condition: for some $l \in \mathcal{Z} - \{J\}$, $\beta_l^{(A1)}$ satisfies $\beta_l^{(A1)} \in [\beta, \beta_J^{\mathrm{th}}]$, $\beta_l^{(A1)} \geq \bar{\beta}_l$, and

$$\beta_l^{(A1)} + \sqrt{\frac{k\phi^3 S_J^3}{\mu \beta_l^{(A1)} h_J P}} = \beta_l^{(A1)} + \frac{S_l \phi}{f_{\max}}, \tag{34}$$

i.e.,

$$\beta_l^{(A1)} = \frac{k\phi (f_{\max})^2 S_J^3}{\mu P h_J S_l^2}. \tag{35}$$

Notice that there may be one or more $\beta_l^{(A1)}$ satisfying the second condition for $l \in \mathcal{Z} - \{J\}$. When we increase $\beta$, the WD with the smallest $\beta_l^{(A1)}$ first becomes the new WLC-WD. If there does not exist $\beta_l^{(A1)}$ satisfying the second condition for $l \in \mathcal{Z} - \{J\}$, the current $N_J$ is always the worst. This completes the proof.

The proofs of Propositions 2-4 are similar to that of Proposition 1, and they are omitted here.

Proposition 2: For Case A-2, when we decrease $\beta$ in $[\beta_J^{\mathrm{th}}, \beta]$, $\beta_{l\leftarrow J}$ could be given as follows. If $M \neq \emptyset$, then $\beta_{l\leftarrow J} = \max M$; else $\beta_{l\leftarrow J} = -\infty$, where $M$ is the following set

$$M = \left\{ \frac{k\phi (f_{\max})^2 S_J^3}{\mu P h_J S_l^2} : \frac{k\phi (f_{\max})^2 S_J^3}{\mu P h_J S_l^2} \in [\beta_J^{\mathrm{th}}, \beta] \cap [\bar{\beta}_l, \infty),\ l \in \mathcal{Z} - \{J\} \right\}. \tag{36}$$

Category 2: Case B-2. We observe from Fig. 4 that, in order to decrease $T_J$, we need to decrease $\beta$ in the interval $[\bar{\beta}_J, \beta]$ of Case B-2.

Based on (14), we know that in the above interval, the expression for $T_J$ of $N_J$ is given as

$$T_J = \beta + \frac{S_J \phi}{f_{\max}}, \tag{37}$$

and then we have the following proposition about the value of $\beta_{l\leftarrow J}$.

Proposition 3: For Case B-2, when we decrease $\beta$ in $[\bar{\beta}_J, \beta]$, $\beta_{l\leftarrow J}$ could be given as follows. If $M \neq \emptyset$, then $\beta_{l\leftarrow J} = \max M$; else $\beta_{l\leftarrow J} = -\infty$, where $M$ is the following set

$$M = \left\{ \frac{k\phi (f_{\max})^2 S_l^3}{\mu P h_l S_J^2} : \frac{k\phi (f_{\max})^2 S_l^3}{\mu P h_l S_J^2} \in [\bar{\beta}_J, \beta] \cap [0, \bar{\beta}_l],\ l \in \mathcal{Z} - \{J\} \right\}. \tag{38}$$

Category 3: Case A-3. We observe from Fig. 4 that, in order to decrease $T_J$, we need to decrease $\beta$ in the interval $[\beta_J^{\mathrm{th}}, \beta]$ of Case A-3.

Based on (14), we know that in the above interval, the expression for $T_J$ of $N_J$ is

$$T_J = \beta + \frac{S_J \phi}{f_{\max}} \tag{39}$$

in sub-interval $[\bar{\beta}_J, \beta]$, and

$$T_J = \beta + \sqrt{\frac{k\phi^3 S_J^3}{\mu \beta h_J P}} \tag{40}$$

in sub-interval $[\beta_J^{\mathrm{th}}, \bar{\beta}_J]$. Then we have the following proposition about the value of $\beta_{l\leftarrow J}$.

Proposition 4: For Case A-3, when we decrease $\beta$, $\beta_{l\leftarrow J}$ could be given as follows. If $M_1 \neq \emptyset$, then $\beta_{l\leftarrow J} = \max M_1$, where $M_1$ is the following set

$$M_1 = \left\{ \frac{k\phi (f_{\max})^2 S_l^3}{\mu P h_l S_J^2} : \frac{k\phi (f_{\max})^2 S_l^3}{\mu P h_l S_J^2} \in [\bar{\beta}_J, \beta] \cap [0, \bar{\beta}_l],\ l \in \mathcal{Z} - \{J\} \right\}. \tag{41}$$

Otherwise, if $M_2 \neq \emptyset$, then $\beta_{l\leftarrow J} = \max M_2$; else $\beta_{l\leftarrow J} = -\infty$, where $M_2$ is the following set

$$M_2 = \left\{ \frac{k\phi (f_{\max})^2 S_J^3}{\mu P h_J S_l^2} : \frac{k\phi (f_{\max})^2 S_J^3}{\mu P h_J S_l^2} \in [\beta_J^{\mathrm{th}}, \bar{\beta}_J] \cap [\bar{\beta}_l, \infty),\ l \in \mathcal{Z} - \{J\} \right\}. \tag{42}$$

C. Computational Complexity

In Algorithm 1, obtaining the $\beta$ that minimizes $T_{\mathrm{off}}$ by the method in [34] has the complexity $O(|\mathcal{K}|)$. In Step 6, finding the WLC-WD $N_J$ by (14) has the complexity $O(|\mathcal{Z}|)$. The complexity of Procedure 1 is $O(1)$, that of Procedure 2 is $O(|\mathcal{K}|)$, and that of Procedure 3 is $O(|\mathcal{Z}|)$. Moreover, numerical results in Section VI validate that the number of iterations Algorithm 1 needs to obtain $\beta^*$ is limited and is less than the number of WDs. Hence the computation complexity of Algorithm 1 is $O(I)$.

V. DRL-BASED OFFLOADING ALGORITHM

The WDA algorithm solves the sub-problem in (21) and obtains the optimal WPT duration $\beta$ and transmission time allocation $t$ under a given $m$. In this section, we propose a DNN-based DRL model to solve the top-problem in (22), which is shown in Fig. 5.

1) Taking the channel gains $h$ and WDs' task workloads $S$ as input, a fully connected DNN is applied to output a relaxed offloading-decision $\hat{m}$, and each element of $\hat{m}$ lies in $[0, 1]$.
2) Although the DNN only outputs one offloading-decision $\hat{m}$, we apply an exploration mechanism to generate $L$ candidate offloading-decisions on the basis of $\hat{m}$.
3) For every generated candidate offloading-decision $m_n$ ($n = 1, 2, \ldots, L$), we obtain its corresponding optimal $\beta$, $t$, and TCD $T(m_n)$ by the WDA algorithm. We obtain the optimal $m$ with the minimal TCD among $L$ candidate offloading-decisions, denoted by $m^*$. Then


we output $(m^*, \beta^*, t^*)$ as the offloading solution. Meanwhile, the new sample $(h, S, m^*)$ is stored in the replay memory.
4) A batch of samples is randomly chosen to train the DNN in each training interval.

Fig. 5. The structure of the proposed DRL-based offloading algorithm.

The DRL-based offloading algorithm in Fig. 5 is composed of the following key parts:

• Policy function: We aim to train the DNN to learn the optimal offloading-decision $m^*$ to minimize the TCD. The input and output of the DNN could be mapped as a policy $\pi$ by

$$\hat{m} = \pi_\theta(h, S), \tag{43}$$

where $\theta$ represents the network parameters in the DNN, and $\hat{m}$ represents the action of DRL. We use $T(m_n)$ in Fig. 5 as the reward of candidate action $m_n$. The network parameters of the DNN are randomly initialized.

• Exploration mechanism: The DNN outputs a relaxed offloading-decision $\hat{m} = \pi_\theta(h, S)$ with the policy $\pi_\theta$ as

$$\hat{m} = \{\hat{m}_i \mid \hat{m}_i \in [0, 1],\ i \in \mathcal{I}\}, \tag{44}$$

where $\hat{m}_i$ represents the offloading-decision of $N_i$. To improve the training samples, the exploration mechanism generates $L$ offloading-decisions based on $\hat{m}$ as

$$\hat{m} \rightarrow \{m_n \mid m_n \in \{0, 1\}^I,\ n = 1, \ldots, L\}. \tag{45}$$

The exploration mechanism of the DRL-based offloading algorithm is similar to that in [22], which is given as follows.

1) The first candidate offloading-decision, denoted by $m_1$, is given as

$$m_{1,i} = \begin{cases} 0, & \hat{m}_i \leq 0.5, \\ 1, & \hat{m}_i > 0.5, \end{cases} \tag{46}$$

where $i = 1, 2, \ldots, I$.

2) The remaining $L - 1$ offloading-decisions are generated as follows. First, we sort all the WDs according to the distance between 0.5 and $\hat{m}_i$ in the ascending order, and obtain the distance sequence as $|\hat{m}_{(1)} - 0.5| \leq |\hat{m}_{(2)} - 0.5| \leq \ldots \leq |\hat{m}_{(i)} - 0.5| \leq \ldots \leq |\hat{m}_{(I)} - 0.5|$. Here, $\hat{m}_{(i)}$ is the $i$-th closest element of $\hat{m}$ to 0.5. Then, the $n$-th candidate offloading-decision $m_n$ ($n = 2, 3, \ldots, L$) is obtained based on $\hat{m}_{(n-1)}$ as

$$m_{n,i} = \begin{cases} 0, & \hat{m}_i < \hat{m}_{(n-1)}, \\ 0, & \hat{m}_i = \hat{m}_{(n-1)} \text{ and } \hat{m}_{(n-1)} > 0.5, \\ 1, & \hat{m}_i = \hat{m}_{(n-1)} \text{ and } \hat{m}_{(n-1)} \leq 0.5, \\ 1, & \hat{m}_i > \hat{m}_{(n-1)}, \end{cases} \tag{47}$$

where $i = 1, 2, \ldots, I$. Let us give an example to explain the exploration mechanism. In the example, $L = 5$ and $\hat{m} = [0.1, 0.35, 0.6, 0.7, 0.8]$ hold. The distance sequence of $\hat{m}$ is $|0.6 - 0.5| \leq |0.35 - 0.5| \leq |0.7 - 0.5| \leq |0.8 - 0.5| \leq |0.1 - 0.5|$. Then we obtain $\hat{m}_{(1)} = 0.6$, $\hat{m}_{(2)} = 0.35$, $\hat{m}_{(3)} = 0.7$, $\hat{m}_{(4)} = 0.8$, and $\hat{m}_{(5)} = 0.1$. Based on (47), the five generated candidate offloading-decisions are $m_1 = [0, 0, 1, 1, 1]$, $m_2 = [0, 0, 0, 1, 1]$, $m_3 = [0, 1, 1, 1, 1]$, $m_4 = [0, 0, 0, 0, 1]$, and $m_5 = [0, 0, 0, 0, 0]$.

• Policy update: A batch of samples is randomly chosen to train the DNN in each training interval. Let $\mathcal{T}$ denote the set of indexes of the chosen samples, and parameters $\theta$ could be updated by the Adam algorithm [35] to decrease the averaged cross-entropy loss as

$$L(\theta) = -\frac{1}{|\mathcal{T}|} \sum_{c \in \mathcal{T}} \left( (m_c^*)^{\top} \log \pi_\theta(h_c, S_c) + (1 - m_c^*)^{\top} \log\left(1 - \pi_\theta(h_c, S_c)\right) \right),$$

where the superscript $\top$ represents the transpose operator, $\{(h_c, S_c, m_c^*) \mid c \in \mathcal{T}\}$ represents a batch of samples randomly chosen from the replay memory, and $|\mathcal{T}|$ denotes the size of $\mathcal{T}$.

Trained by iteratively executing the aforementioned key parts, the DNN-based DRL model continuously obtains better


samples due to the use of the exploration mechanism, and thus continuously enhances the DNN parameters. After a number of training intervals, the DNN outputs the near-optimal offloading-decisions and the WDA algorithm obtains the near-optimal $\beta^*$ and $t^*$, even under pairs of channel gains $h$ and task workloads $S$ that have never been experienced. The complete DRL-based offloading algorithm is summarized in Algorithm 2, where $i$ represents the $i$-th round of determining the WDs' task offloading solution.

Algorithm 2: DRL-Based Offloading Algorithm to Solve the TCDM Problem
Input: The channel gains $h$ and task workloads $S$
Output: Offloading-decision $m^*$, $\beta^*$ and $t^*$
1  Randomly initialize the parameters $\theta$ of the DNN;
2  Set training interval $\psi$;
3  for $i = 1, 2, \ldots$ do
4      Use the DNN to generate the relaxed offloading-decision $\hat{m} = \pi_\theta(h, S)$;
5      Generate $L$ candidate offloading-decisions $\{m_n\}$ by (45);
6      For each $m_n$, compute $T(m_n)$ by Algorithm 1;
7      Obtain $m^* = \arg\min_{\{m_n\}} T(m_n)$, $\beta^*$ and $t^*$;
8      Store the sample $(h, S, m^*)$ in the replay memory;
9      if $i \bmod \psi = 0$ then
10         A batch of samples $\{(h_c, S_c, m_c^*) \mid c \in \mathcal{T}\}$ is chosen from the replay memory;
11         Update the training parameters $\theta$ of the policy $\pi_\theta$ by the Adam algorithm with the chosen samples;
12     end
13 end

Then we analyze the computational complexity of Algorithm 2. For each round of determining the WDs' task offloading solution, using the DNN to generate the relaxed offloading-decision $\hat{m}$ in Step 4 has the complexity $O(I)$. The sorting algorithm has the complexity $O(I \log(I))$, and generating $L$ candidate offloading-decisions has the complexity $O(LI)$. Thus the complexity of Step 5 is $O(\max\{I \log(I), LI\})$. In Step 6, the computation complexity of Algorithm 1 is $O(I)$. The complexity of Step 7 is $O(I)$, and that of Step 8 is a constant. Steps 9-12 are executed only during the training phase, and do not need to be executed if the DNN converges. Therefore, the overall complexity is $O(\max\{I \log(I), LI\})$.

VI. NUMERICAL RESULTS

In this section, we provide numerical results to validate the TCD achieved by the proposed DRL-based offloading algorithm. The parameter setting of the TCDM problem is introduced as follows. For the wireless channel gain, the distance $d_i$ between $N_i$ and the H-AP is randomly distributed in $[3.5, 7]$ meters, the antenna gain $A_d = 4.11$, the carrier frequency $f_c = 715$ MHz, the path loss exponent $d_e = 2.5$, and the energy harvesting efficiency $\mu = 0.7$. The simple computation tasks that are individable are considered, the task workload $S_i$ of $N_i$ is randomly distributed in the range $[5, 200]$ bits, computation energy efficiency coefficient $k = 10^{-26}$, maximum computing speed $f_{\max} = 15$ Mbits/s, and the number of cycles that every WD needs to process one bit of task data $\phi = 100$ [36]. The communication bandwidth $W = 1$ MHz and the background noise variance $\sigma^2 = 10^{-12}$ Watt.

In addition, a fully connected DNN, composed of one input layer, two hidden layers, and one output layer, is applied to implement the DRL-based offloading algorithm by the PyTorch framework using Python. The first hidden layer has 120 neurons, and the second hidden layer has 80 neurons.

We introduce the normalized TCD $\hat{T}$ to evaluate the solution obtained by the DRL-based offloading algorithm. Here, $\hat{T}$ is defined as

$$\hat{T} = \frac{T^*(m, t, \beta)}{T(m, t, \beta)}, \tag{48}$$

where $T(m, t, \beta)$ represents the TCD of the solution obtained by the proposed DRL-based offloading algorithm, and $T^*(m, t, \beta)$ represents the TCD of the optimal solution obtained by traversing all $2^I$ offloading-decisions and solving the sub-problem by Algorithm 1.

Fig. 6 shows the convergence performance of Algorithm 2 in terms of the normalized TCD $\hat{T}$ when $I = 8$. The parameters of the DNN in Fig. 6 include memory size = 512, batch size = 64, training interval = 10, and learning rate = 0.01. Fig. 6 plots the average $\hat{T}$ over the 300 most recent samples. We observe from Fig. 6 that, when the number of training samples is larger than 2000, the DRL-based algorithm achieves an average $\hat{T}$ of more than 0.98. Moreover, after 8000 training samples, the average $\hat{T}$ is larger than 0.98. This indicates that the proposed DRL-based offloading algorithm obtains the near-minimal TCD.

Fig. 6. Convergence performance of the proposed DRL-based algorithm with respect to the normalized TCD $\hat{T}$ when $I = 8$.

Fig. 7 shows the impact of four parameters on the DNN training by the moving average $\hat{T}$ in the WP-MEC network with 8 WDs. Fig. 7(a) shows $\hat{T}$ under different replay memory sizes, with batch size, training interval and learning rate being 64, 10 and 0.01, respectively. We observe that the DNN has good convergence even if the replay memory is small. For example, even when the replay memory size is 128, the moving average $\hat{T}$ is larger than 0.97 after about 3000 training samples. In the following we set the memory size as 512. Fig. 7(b) shows $\hat{T}$ under different training batch sizes, with replay memory


size, training interval and learning rate being 512, 10 and 0.01, respectively. We observe that the DNN has good convergence even if the batch size is small. In the following the batch size is set as 64. Fig. 7(c) shows $\hat{T}$ under different training intervals, with memory size, batch size and learning rate being 512, 64 and 0.01, respectively. In Fig. 7(c), we observe that the DNN with a small training interval (such as 5, 10, 20) has better convergence performance. However, a smaller training interval means more frequent updates of the DNN parameters, and the training overhead may be large. Therefore, we set the training interval as 10. Fig. 7(d) shows $\hat{T}$ under different learning rates, with memory size, batch size and training interval being 512, 64 and 10, respectively. In Fig. 7(d), the DRL-based offloading algorithm with a small learning rate (such as 0.001 and 0.0001) converges slowly. Consequently, we choose the learning rate as 0.01.

Fig. 7. The impact of four parameters on the moving average $\hat{T}$ when $I = 8$.

Fig. 8 evaluates the moving average $\hat{T}$ of the WP-MEC network with 5 WDs. By comparing Figs. 7-8, we observe that, under the same four parameters, the number of WDs does not impact the convergence performance of $\hat{T}$.

Fig. 8. The impact of four parameters on the moving average $\hat{T}$ when $I = 5$.

We compare the DRL-based offloading algorithm with five schemes:

• Edge-computing scheme: All the WDs complete the tasks by offloading them to the H-AP.
• Local-computing scheme: All the WDs locally compute their own computation tasks.
• Channel-quality-descending scheme: We sort $I$ WDs in a descending order of the channel gain to the H-AP. In the $i$-th round with $i = 1, \ldots, I$, the top $i$ WDs perform task offloading, and the other $I - i$ WDs perform local


computing. The minimal computation delay among the $I$ rounds is adopted.
• Coordinate descent (CD) scheme [21]: The CD scheme iteratively swaps the offloading mode, from local computing to task offloading or vice versa, for every WD in each round that results in a reduction of the computation delay, until the computation delay cannot be reduced further.
• Deep deterministic policy gradient (DDPG) scheme: For $I$ WDs, the gains of the channels between the WDs and the H-AP, and the task workloads are input to the neural networks using DDPG, which output the WDs' offloading decisions, the corresponding transmission time of the offloading task, and the duration of WPT $\beta$. Then we adjust the offloading decisions to the binary offloading, and derive the corresponding TCD according to (16) as the reward to train the policy of DDPG.

The first two schemes have lower computation complexity than the DRL-based offloading algorithm, while the last three schemes [21] have higher computation complexity. Every comparison scheme achieves worse TCD performance as shown in Figs. 9-12.

Figs. 9 and 10 show the average TCD of the WP-MEC network with different numbers of WDs $I$. For the DRL-based offloading algorithm, we adopt the trained DNN that has learned from 10000 rounds of task offloading with different wireless channel gains and WDs' task workloads. The TCD is averaged over 4000 independent samples. We observe that the average TCD achieved by the DRL-based offloading algorithm is smaller than those of the five comparison schemes. Moreover, with the increase of WDs, the TCD under the proposed algorithm increases slowly and the TCD reduction achieved by the proposed DRL-based offloading algorithm increases. Besides, compared with the local-computing scheme, the TCD under the edge-computing scheme is greatly decreased due to the low computation capability of WDs. Furthermore, with the increase of $d_i$, the channel gain in (1) decreases, and offloading WDs need a longer time to fully offload the task to the H-AP based on (5)-(6). Therefore, as shown in Figs. 9 and 10, the TCD increases with distance $d_i$.

Fig. 9. TCD of the WP-MEC network under different numbers of WDs, with $P = 3$ W and $d_i$ randomly distributed in $[3.5, 7]$ meters.

Fig. 10. TCD of the WP-MEC network under different numbers of WDs, with $P = 3$ W and $d_i$ randomly distributed in $[7, 12]$ meters.

Fig. 11 shows the cumulative distribution function (CDF) of TCD in the WP-MEC network with 10 WDs and transmit power $P = 3$ W, which corresponds to the $I = 10$ case in Fig. 9. We observe that the WP-MEC network under the proposed scheme, the edge-computing scheme, and the coordinate descent scheme achieves less than 1 ms computation delay for more than 90% of independent samples. The WP-MEC network under the local-computing scheme and the DDPG scheme achieves larger than 2 ms computation delay for the vast majority of independent samples.

Fig. 11. CDF of TCD in the WP-MEC network with 10 WDs and transmit power $P = 3$ W.

Fig. 12. TCD of the WP-MEC network with 10 WDs under different transmit power of the H-AP.

Fig. 12 shows the average TCD of the WP-MEC network under different transmit power of the H-AP. The TCD is averaged over 4000 independent samples. We observe that the average TCD achieved by the DRL-based offloading algorithm is smaller than those of the five baseline schemes. Additionally,


with the increase of the H-AP's transmit power, the average TCD decreases. This indicates that, with larger transmit power of the H-AP, WDs have larger energy harvesting power, so that the offloading WDs offload their tasks in less time, and the local-computing WDs execute the computation tasks in less time.

VII. CONCLUSION AND FUTURE WORKS

This paper studied the WP-MEC network where WDs conduct binary task offloading for their computation tasks. The considered TCDM problem was formulated as a MIP problem. To tackle this challenge, we decomposed the TCDM problem into the sub-problem of optimizing the WPT duration and transmission durations, and the top-problem of optimizing the offloading decision. For the nonconvex sub-problem, we designed the WDA algorithm to efficiently obtain its optimal solution. For the top-problem, we proposed a DNN-based DRL model to quickly obtain the near-optimal offloading decision. The proposed online DRL-based offloading algorithm converges fast during the training process. It is able to achieve the near-minimal TCD with low computational complexity and is appropriate for the fast-fading WP-MEC network, and it also holds for the scenario where some of the other fading models are adopted and individable computation tasks need to be finished. With the increase of WDs, the TCD achieved by the DRL-based offloading algorithm increases slowly, and the TCD reduction achieved by the proposed DRL-based offloading algorithm increases.

This paper could be extended in the following directions. First, due to the advantages and disadvantages of TDMA, other promising protocols, such as NOMA, could also be adopted by offloading WDs to offload their computation tasks to the H-AP. Second, from the perspective of computation tasks, both subdividable and individable computation tasks could be considered to be completed by WDs, which is a more practical scenario and more challenging for researchers. Third, from the perspective of energy transfer, the reserved energy of local-computing WDs may be helpful for future computation tasks. Fourth, it is interesting and challenging to implement the DRL-based offloading algorithm in real experiments.

APPENDIX A

Let $A$ denote the current set of WDs, $\mathcal{K}_A$ denote the set of indexes of offloading WDs, and $\mathcal{Z}_A$ denote the set of indexes of local-computing WDs. Based on (16), we have

$$T_A(m, t, \beta) = \beta_A + \max\left\{ \sum_{i=1}^{I_A} m_i t_i,\ \max_{z \in \mathcal{Z}_A} \tau_z \right\}, \tag{49}$$

where $m = \{m_i \in \{0, 1\} \mid i \in A\}$, and $t = \{t_i \mid i \in \mathcal{K}_A\}$. The marginal value $\Delta_x T_A(m, t, \beta)$ denotes the increase of the TCD by adding WD $x$ into $A$, i.e.,

$$\Delta_x T_A(m, t, \beta) = T_{A \cup x}(m, t, \beta) - T_A(m, t, \beta). \tag{50}$$

If WD $x$ performs local computing, which indicates the current $\beta_A$ is enough for WD $x$ to perform local computing and its task workload $S_x$ is low, $\beta_A = \beta_{A \cup x}$ holds, and we only need to consider the change of the worst local-computing WD (WLC-WD). It is easy to infer that $\max_{z \in \mathcal{Z}_A} \tau_z$ is a monotonically non-decreasing function of $A$, and $\Delta_x T_A(m, t, \beta)$ is a monotonically non-increasing function of $A$.

If WD $x$ performs task offloading, which indicates the current $\beta_A$ may not be enough for WD $x$ to perform task offloading and its task workload $S_x$ is high, it is obvious that the WLC-WD does not change after adding WD $x$. Based on (5) and (50), the impact of adding WD $x$ on the total duration of task offloading $\sum_{i=1}^{I_A} m_i t_i$ is independent of $A$. Then it is easy to infer that $(\beta_{A \cup x} - \beta_A)$ is a monotonically non-increasing function of $A$.

To conclude, for all $A \subset B$ and $x \notin B$, $\Delta_x T_A(m, t, \beta) \geq \Delta_x T_B(m, t, \beta)$ is satisfied. Thus $T(m, t, \beta)$ is submodular [32]. Notice that the marginal value $\Delta_x T_A(m, t, \beta)$ could not be negative. Based on Lemma 2 in [31], the TCDM is a monotonic submodular problem and is NP-hard according to [31] and [32].

REFERENCES

[1] M. Stoyanova, Y. Nikoloudakis, S. Panagiotakis, E. Pallis, and E. K. Markakis, “A survey on the Internet of Things (IoT) forensics: Challenges, approaches, and open issues,” IEEE Commun. Surveys Tuts., vol. 22, no. 2, pp. 1191–1221, 2nd Quart., 2020.
[2] K. Zheng, X. Liu, B. Wang, H. Zheng, K. Chi, and Y. Yao, “Throughput maximization of wireless-powered communication networks: An energy threshold approach,” IEEE Trans. Veh. Technol., vol. 70, no. 2, pp. 1292–1306, Feb. 2021.
[3] X. Liu, B. Xu, X. Wang, K. Zheng, K. Chi, and X. Tian, “Impacts of sensing energy and data availability on throughput of energy harvesting cognitive radio networks,” IEEE Trans. Veh. Technol., vol. 72, no. 1, pp. 747–759, Jan. 2023.
[4] K. Zheng, X. Jia, K. Chi, and X. Liu, “DDPG-based joint time and energy management in ambient backscatter-assisted hybrid underlay CRNs,” IEEE Trans. Commun., vol. 71, no. 1, pp. 441–456, Jan. 2023.
[5] K. Chi, Z. Chen, K. Zheng, Y.-H. Zhu, and J. Liu, “Energy provision minimization in wireless powered communication networks with network throughput demand: TDMA or NOMA?” IEEE Trans. Commun., vol. 67, no. 9, pp. 6401–6414, Sep. 2019.
[6] X. Liu, B. Xu, K. Zheng, and H. Zheng, “Throughput maximization of wireless-powered communication network with mobile access points,” IEEE Trans. Wireless Commun., early access, Dec. 5, 2022, doi: 10.1109/TWC.2022.3225085.
[7] Y. Mao, C. You, J. Zhang, K. Huang, and K. B. Letaief, “A survey on mobile edge computing: The communication perspective,” IEEE Commun. Surveys Tuts., vol. 19, no. 4, pp. 2322–2358, 4th Quart., 2017.
[8] T. X. Tran and D. Pompili, “Joint task offloading and resource allocation for multi-server mobile-edge computing networks,” IEEE Trans. Veh. Technol., vol. 68, no. 1, pp. 856–868, Jan. 2019.
[9] B. Zhu, K. Chi, J. Liu, K. Yu, and S. Mumtaz, “Efficient offloading for minimizing task computation delay of NOMA-based multi-access edge computing,” IEEE Trans. Commun., vol. 70, no. 5, pp. 3186–3203, May 2022.
[10] X. Hu, K.-K. Wong, and Y. Zhang, “Wireless-powered edge computing with cooperative UAV: Task, time scheduling and trajectory design,” IEEE Trans. Wireless Commun., vol. 19, no. 12, pp. 8083–8098, Dec. 2020.
[11] Q. Gu, Y. Jian, G. Wang, R. Fan, H. Jiang, and Z. Zhong, “Mobile edge computing via wireless power transfer over multiple fading blocks: An optimal stopping approach,” IEEE Trans. Veh. Technol., vol. 69, no. 9, pp. 10348–10361, Sep. 2020.
[12] F. Wang, J. Xu, X. Wang, and S. Cui, “Joint offloading and computing optimization in wireless powered mobile-edge computing systems,” IEEE Trans. Wireless Commun., vol. 17, no. 3, pp. 1784–1797, Mar. 2018.
[13] F. Wang, J. Xu, and S. Cui, “Optimal energy allocation and task offloading policy for wireless powered mobile edge computing systems,” IEEE Trans. Wireless Commun., vol. 19, no. 4, pp. 2443–2459, Apr. 2020.


[14] F. Wang, H. Xing, and J. Xu, “Real-time resource allocation for wireless powered multiuser mobile edge computing with energy and task causality,” IEEE Trans. Commun., vol. 68, no. 11, pp. 7140–7155, Nov. 2020.
[15] T. Bai, C. Pan, H. Ren, Y. Deng, M. Elkashlan, and A. Nallanathan, “Resource allocation for intelligent reflecting surface aided wireless powered mobile edge computing in OFDM systems,” IEEE Trans. Wireless Commun., vol. 20, no. 8, pp. 5389–5407, Aug. 2021.
[16] Y. Ye, L. Shi, X. Chu, D. Li, and G. Lu, “Delay minimization in wireless powered mobile edge computing with hybrid BackCom and AT,” IEEE Wireless Commun. Lett., vol. 10, no. 7, pp. 1532–1536, Jul. 2021.
[17] J. Park, S. Solanki, S. Baek, and I. Lee, “Latency minimization for wireless powered mobile edge computing networks with nonlinear rectifiers,” IEEE Trans. Veh. Technol., vol. 70, no. 8, pp. 8320–8324, Aug. 2021.
[18] S. Mao, S. Leng, S. Maharjan, and Y. Zhang, “Energy efficiency and delay tradeoff for wireless powered mobile-edge computing systems with multi-access schemes,” IEEE Trans. Wireless Commun., vol. 19, no. 3, pp. 1855–1867, Mar. 2020.
[19] L. Shi, Y. Ye, X. Chu, and G. Lu, “Computation energy efficiency maximization for a NOMA-based WPT-MEC network,” IEEE Internet Things J., vol. 8, no. 13, pp. 10731–10744, Jan. 2021.
[20] M. Min, L. Xiao, Y. Chen, P. Cheng, D. Wu, and W. Zhuang, “Learning-based computation offloading for IoT devices with energy harvesting,” IEEE Trans. Veh. Technol., vol. 68, no. 2, pp. 1930–1941, Feb. 2019.
[21] S. Bi and Y. Zhang, “Computation rate maximization for wireless powered mobile-edge computing with binary computation offloading,” IEEE Trans. Wireless Commun., vol. 17, no. 6, pp. 4177–4190, Jun. 2018.
[22] L. Huang, S. Bi, and Y.-J. A. Zhang, “Deep reinforcement learning for online computation offloading in wireless powered mobile-edge computing networks,” IEEE Trans. Mobile Comput., vol. 19, no. 11, pp. 2581–2593, Nov. 2020.
[23] P. X. Nguyen et al., “Backscatter-assisted data offloading in OFDMA-based wireless-powered mobile edge computing for IoT networks,” IEEE Internet Things J., vol. 8, no. 11, pp. 9233–9243, Jun. 2021.
[24] F. Zhou, Y. Wu, R. Q. Hu, and Y. Qian, “Computation rate maximization in UAV-enabled wireless-powered mobile-edge computing systems,” IEEE J. Sel. Areas Commun., vol. 36, no. 9, pp. 1927–1941, Sep. 2018.
[25] F. Zhou and R. Q. Hu, “Computation efficiency maximization in wireless-powered mobile edge computing networks,” IEEE Trans. Wireless Commun., vol. 19, no. 5, pp. 3170–3184, May 2020.
[26] J. Liu, K. Xiong, D. W. K. Ng, P. Fan, Z. Zhong, and K. B. Letaief, “Max-min energy balance in wireless-powered hierarchical fog-cloud computing networks,” IEEE Trans. Wireless Commun., vol. 19, no. 11, pp. 7064–7080, Nov. 2020.
[27] M. Zeng, R. Du, V. Fodor, and C. Fischione, “Computation rate maximization for wireless powered mobile edge computing with NOMA,” in Proc. IEEE 20th Int. Symp. ‘World Wireless, Mobile Multimedia Netw.’ (WoWMoM), Washington, DC, USA, Jun. 2019, pp. 1–9.
[28] S. Zarandi and H. Tabassum, “Federated double deep Q-learning for joint delay and energy minimization in IoT networks,” in Proc. IEEE Int. Conf. Commun. Workshops (ICC Workshops), Montreal, QC, Canada, Jun. 2021, pp. 1–6.
[29] M. Tang and V. W. S. Wong, “Deep reinforcement learning for task offloading in mobile edge computing systems,” IEEE Trans. Mobile Comput., vol. 21, no. 6, pp. 1985–1997, Jun. 2022.
[30] X. Deng, J. Li, L. Shi, Z. Wei, X. Zhou, and J. Yuan, “Wireless powered mobile edge computing: Dynamic resource allocation and throughput maximization,” IEEE Trans. Mobile Comput., vol. 21, no. 6, pp. 2271–2288, Jun. 2022.
[31] X. Lyu, H. Tian, C. Sengul, and P. Zhang, “Multiuser joint task offloading and resource optimization in proximate clouds,” IEEE Trans. Veh. Technol., vol. 66, no. 4, pp. 3435–3447, Apr. 2017.
[32] S. Fujishige, Submodular Functions and Optimization, vol. 58. Amsterdam, The Netherlands: Elsevier, 2005.
[33] A. Grami, Introduction to Digital Communications, 1st ed. New York, NY, USA: Academic, 2015.
[34] K. Chi, Y. Zhu, Y. Li, L. Huang, and M. Xia, “Minimization of transmission completion time in wireless powered communication networks,” IEEE Internet Things J., vol. 4, no. 5, pp. 1671–1683, Oct. 2017.
[35] D. P. Kingma and J. Ba, “Adam: A method for stochastic optimization,” in Proc. Int. Conf. Learn. Represent. (ICLR), San Diego, CA, USA, May 2015, pp. 1–15.
[36] Y. Wang, M. Sheng, X. Wang, L. Wang, and J. Li, “Mobile-edge computing: Partial computation offloading using dynamic voltage scaling,” IEEE Trans. Commun., vol. 64, no. 10, pp. 4268–4282, Aug. 2016.

Kechen Zheng (Member, IEEE) received the B.E. and Ph.D. degrees in electronic engineering from Shanghai Jiao Tong University, Shanghai, China, in 2013 and 2018, respectively. He is currently an Associate Professor with the School of Computer Science and Technology, Zhejiang University of Technology, Hangzhou, China. He has published more than 40 technical papers in journals and conferences, including IEEE Transactions on Mobile Computing, IEEE Transactions on Wireless Communications, and IEEE Transactions on Communications. His research interests include the IoT, performance evaluation in cognitive networks and social networks, and energy harvesting wireless communication networks. He was a recipient of the Best Paper Award from the International Conference on Networking and Network Applications in 2021.

Guodong Jiang received the B.S. degree from the Zhejiang University of Technology, Hangzhou, China, in 2020, where he is currently pursuing the M.S. degree with the School of Computer Science and Technology. His current research interests include delay minimization of wireless-powered multi-access edge computing.

Xiaoying Liu (Member, IEEE) received the B.E. degree in electronic engineering from the Nanjing University of Science and Technology, Nanjing, China, in 2013, and the Ph.D. degree in electronic engineering from Shanghai Jiao Tong University, Shanghai, China, in 2018. She is currently an Associate Professor with the School of Computer Science and Technology, Zhejiang University of Technology, Hangzhou, China. Her research interests include wireless communications, cognitive radio networks, and energy harvesting networks. She was a recipient of the Best Paper Award from the International Conference on Networking and Network Applications in 2021.

Kaikai Chi (Member, IEEE) received the B.S. and M.S. degrees from Xidian University, Xi’an, China, in 2002 and 2005, respectively, and the Ph.D. degree from Tohoku University, Sendai, Japan, in 2009. He is currently a Professor with the School of Computer Science and Technology, Zhejiang University of Technology, Hangzhou, China. He has published more than 40 refereed technical papers in proceedings and journals, such as IEEE Transactions on Wireless Communications, IEEE Transactions on Vehicular Technology, and IEEE Transactions on Parallel and Distributed Systems. His current research interests include wireless cellular networks, wireless ad hoc networks, and wireless sensor networks. He was a recipient of the Best Paper Award from the IEEE Wireless Communications and Networking Conference in 2008.

Xinwei Yao (Member, IEEE) received the B.S. degree in mechanical engineering and the Ph.D. degree in information engineering from the Zhejiang University of Technology, Hangzhou, China. From March 2012 to February 2013, he was a Visiting Scholar at Loughborough University, Leicestershire, U.K. From August 2015 to July 2016, he was a Visiting Professor at the University at Buffalo, The State University of New York, Buffalo, NY, USA. He is currently a Professor with the School of Computer Science and Technology and the Vice Dean of the Institute for Frontier and Interdisciplinary Sciences, Zhejiang University of Technology. His current research interests include the AIoT, the Internet of Nano-Things, smart crowdsensing and collaboration, terahertz-band communication networks, and electromagnetic nanonetworks. In these areas, he has coauthored more than 100 peer-reviewed scientific publications and four books. He has also been granted more than 20 Chinese patents. He is a member of the ACM. He was a recipient of the Wu Wen-Jun Artificial Intelligence Excellent Youth Award and more than six Prizes of Technological Invention from the Chinese Government. He has served on technical program committees of many IEEE/ACM conferences.

Jiajia Liu (Senior Member, IEEE) received the B.S. degree in computer science from the Harbin Institute of Technology in 2004, the M.S. degree in computer science from Xidian University, China, in 2009, and the Ph.D. degree in information sciences from Tohoku University in 2012. He was a JSPS Special Research Fellow at Tohoku University, Japan, from April 2012 to October 2013, and a Data Analytics Engineer at Aviation Industry Corporation of China, from July 2004 to August 2006. He was a Full Professor at the School of Cyber Engineering and the Director of the Internet of Things Security Research Center, Xidian University, from 2013 to 2018 and from 2016 to 2018, respectively. Since January 2019, he has been a Full Professor with the School of Cybersecurity, Northwestern Polytechnical University. He has published more than 130 peer-reviewed papers in many high-quality publications, including the prestigious IEEE journals and conferences. His research interests include wireless and mobile ad hoc networks, space-air-ground integrated networks, intelligent and connected vehicles, mobile/edge/cloud computing and storage, the Internet of Things security, and 5G. He received the IEEE ComSoc Asia-Pacific Outstanding Young Researcher Award in 2017, the IEEE Transactions on Vehicular Technology (TVT) Top Editor Award in 2017, and the Best Paper Awards from many international conferences, including the IEEE flagship events, such as IEEE GLOBECOM in 2016, IEEE WCNC in 2012 and 2014, and IEEE IC-NIDC in 2018. He was a recipient of the prestigious 2012 Niwa Yasujiro Outstanding Paper Award due to his exceptional contribution to the analytics modeling of two-hop ad hoc mobile networks, which has been regarded by the award committees as the theoretical foundation for analytical evaluation techniques of future ad hoc mobile networks. He was also a recipient of the Tohoku University President Award in 2013, the Graduate School of Information Sciences Dean Award in 2013, the Professor Genkuro Fujino Award in 2012, the Chinese Government Award for Outstanding Ph.D. Students Abroad in 2011, and the RIEC Student Award in 2012. He has served on the Technical Program Committees of numerous international conferences, including as the leading Symposium Co-Chair of the AHSN Symposium for GLOBECOM in 2017, the CRN Symposium for ICC in 2018, and the AHSN Symposium for ICC 2019. He is also a Secretary of the IEEE AHSN TC and a Distinguished Lecturer of the IEEE Communications Society. He has been active in society activities, serving as an Associate Editor for the IEEE Transactions on Wireless Communications since May 2018, the IEEE Transactions on Computers from October 2015 to June 2017, and the IEEE Transactions on Vehicular Technology since January 2016, an Editor for the IEEE Network since July 2015 and the IEEE Transactions on Cognitive Communications and Networking since January 2019, and a Guest Editor of top-ranking international journals, such as IEEE Transactions on Emerging Topics in Computing (TETC), the IEEE Network Magazine, and the IEEE Internet of Things Journal.
