1854785
Publication date
05-01-2023
Licence
This work is made available under the Copyright not evaluated licence and should only be used in accordance
with that licence. For more information on the specific terms, consult the repository record for this item.
Document Version
Accepted version
Published in
IEEE Transactions on Vehicular Technology
Abstract—Vehicular edge computing (VEC) is a new computing paradigm that enhances vehicular performance by introducing both computation offloading and service caching to resource-constrained vehicles and ubiquitous edge servers. Recent developments in autonomous vehicles enable a variety of applications that demand high computing resources and low latency, such as automatic driving and auto navigation. However, the highly dynamic topology of vehicular networks and the limited caching space at resource-constrained edge servers call for an intelligent design of caching placement and computation offloading. Meanwhile, service caching decisions are highly correlated with computation offloading decisions, which poses a great challenge to effectively designing service caching and computation offloading strategies. In this paper, we investigate a joint optimization problem by integrating service caching and computation offloading in a general VEC scenario with time-varying task requests. To minimize the average task processing delay, we formulate the problem as a long-term mixed integer non-linear program (MINLP) and propose an algorithm based on deep reinforcement learning to obtain a suboptimal solution with low computation complexity. The simulation results demonstrate that our proposed scheme exhibits an effective performance improvement in task processing delay compared with other representative benchmark methods.

Index Terms—Vehicular edge computing, service caching, computation offloading, deep reinforcement learning.

I. INTRODUCTION

THE rapid development of the Internet of Things (IoT) and artificial intelligence (AI) has recently led to the emergence of diverse computation-intensive and latency-sensitive vehicular applications, such as augmented reality (AR) navigation and autonomous driving. However, offloading all the computing tasks to the remote cloud results in a heavy backhaul load and unacceptable latency. Through vehicular edge computing (VEC) [1], vehicular users can offload their tasks via vehicle-to-infrastructure (V2I) communications to edge nodes to reduce response latency, which caters for unprecedentedly exploding data traffic and increasingly stringent requirements of vehicular applications [2].

Task computation requires both input task data from users and program data installed on edges, which are defined as content caching and service caching, respectively. Content caching refers to caching the input data needed and the output data generated (e.g., in computational or infotainment applications) at vehicles and edge nodes [3]–[7]. Since these data dominate mobile data traffic, content caching at edge servers can effectively alleviate mobile traffic on backhaul links and reduce content delivery latency [3]. On the other hand, service caching refers to caching the specific programs for task execution [8]–[14]. As a motivating example, in object detection, the input data consists of videos and radar sensor data, and task execution requires the corresponding object detection service program to be cached in the vehicle or the edge server. The input data of an object detection service is typically unique and hardly reusable for other executions. In comparison, service program data in the cache is evidently reusable by future executions of the same type of tasks. Because edge servers have limited caching space, how to selectively cache service programs over space (e.g., at multiple edge servers) and time to achieve optimum transmission and computing performance is crucial for efficient task computation.

Copyright (c) 2022 IEEE. Personal use of this material is permitted. However, permission to use this material for any other purposes must be obtained from the IEEE by sending a request to [email protected].
Manuscript received June 2, 2022; revised September 4, 2022; accepted December 9, 2022. This work was supported in part by Guangzhou Science and Technology Plan Project under Grant 202201010239; in part by Graduate Education & Innovation Project of Guangdong Province under Grant 2022XSLT022; in part by Guangdong Introducing Innovative and Entrepreneurial Teams of The Pearl River Talent Recruitment Program (2021ZT09X044); in part by Guangdong Introducing Outstanding Young Scholars of The Pearl River Talent Recruitment Program (2021QN02X546); in part by Guangdong Provincial Key Laboratory of Photonics Information Technology (No. 2020B121201011); in part by the Science and Technology Program of Guangzhou under Grant 202102020869; and in part by the Guangdong Basic and Applied Basic Research Foundation under Grant 2022A1515110602 and Grant 2022A1515010153. (Corresponding author: Chang Liu.)
Zheng Xue, Chang Liu, Canliang Liao and Guojun Han are with the School of Information Engineering, Guangdong University of Technology, Guangzhou 510006, China (e-mail: [email protected]; [email protected]; [email protected]; [email protected]).
Zhengguo Sheng is with the Department of Engineering and Design, University of Sussex, Brighton, BN1 9RH, U.K. (e-mail: [email protected]).
¹In this paper, we use the terms "edge", "edge node" and "edge server" interchangeably.
The design of optimal computation offloading and service caching faces many challenges in vehicular networks. First, vehicles and edge servers can only cache a small number of service programs at a time due to limited storage space. Thus, which service programs should be cached needs to be decided judiciously. Second, the computing resources on different edge servers may be unevenly distributed. It is critical to balance the computation load by cooperative offloading. Third, the computation offloading decisions and the service caching decisions are closely correlated. Intuitively, we tend to offload a task if the required program is already cached at the edge.¹ Besides, the network status and the available resources of edge servers change dynamically during the movement of vehicles. Therefore, it becomes significant and yet very challenging to design an appropriate service caching and computation offloading strategy in VEC systems.

In this paper, we investigate the joint optimization of service caching and computation offloading for a VEC system with limited storage and computing capacities, taking account of time-varying task requests and dynamic network topology. In order to make full use of the limited caching and computing resources of each node (i.e., vehicles and edge servers) as well as the cooperative offloading between edge servers, we propose a deep reinforcement learning (DRL)-based service caching and computation offloading scheme to provide low-complexity decision making and adaptive resource management. The main contributions of this paper can be summarized as follows:

1) We design a novel edge service caching and computation offloading framework with cooperation among the cloud, edge servers and vehicles, which not only balances the computation load among edge servers, but also enables the integration of caching and computing resources combined with edge intelligence.

2) We minimize the cumulative average task processing delay over a long time-slotted horizon, considering dynamic task requesting, offloading and service caching, as well as dynamic channel conditions between vehicles and edge servers at each time slot. To solve the formulated long-term mixed integer non-linear programming (MINLP) problem, we propose an edge caching and offloading scheme based on deep deterministic policy gradient (DDPG) [15] to efficiently make decisions on task offloading and service caching.

3) We carry out extensive simulations to evaluate the average task processing delay and energy consumption of the proposed scheme in VEC. Numerical results demonstrate that our proposed scheme can achieve better performance compared with the other benchmark approaches.

The rest of the paper is organized as follows. We review the related work in Section II. The system model and the problem formulation are described in Section III. In Section IV, the proposed scheme is presented in detail. Section V provides numerical results, and Section VI concludes this paper.

II. RELATED WORK

In the past few years, computation offloading in mobile edge computing (MEC) has been intensively discussed [16]–[19]. Zhang et al. [16] minimize the average bandwidth consumption with a novel bidirectional computation task model by jointly optimizing the caching and computation offloading policy. Tout et al. [17] find the optimal dissemination of computational tasks within MEC-based infrastructures while satisfying personal needs on a wide range of devices and assuring minimal additional fees imposed by remote execution. Multi-user multi-task computation offloading and resource allocation for mobile edge computing are proposed in [19]. Task offloading in VEC has also been widely studied [1], [2], [20]–[24]. Tan et al. [2] propose a joint communication, caching and computing strategy to minimize the system cost. To guarantee the reliability of communication systems, powerful error correction codes, such as low-density parity-check (LDPC) codes, can be applied to enhance the anti-noise and anti-fading capability [25]–[27]. Qiao et al. [20] minimize the content access cost for a novel cooperative edge caching framework. Ning et al. [21] design a mobility-aware edge computing and caching scheme to maximize mobile network operators' profits. However, all these works are based on the implicit assumption that all the services are available at the edge servers, which is impractical due to their limited storage capacities.

Service caching, which refers to the caching of related programs for computational task execution, can significantly affect the performance of MEC systems, since service caching strategies and computation offloading strategies are always coupled. There has been considerable research focusing on joint service caching, computation offloading and resource allocation for mobile users in MEC systems [8]–[12]. Yan et al. [8] propose an MEC service pricing scheme to coordinate with service caching decisions and control wireless devices' task offloading behaviors in a cellular network. Ko et al. [9] maximize a sum-utility for multi-mobile service caching enabled MEC. Bi et al. [10] formulate a mixed integer nonlinear program (MINLP) that jointly optimizes service caching placement, computation offloading decisions, and system resource allocation to minimize the computation delay and energy consumption of a mobile user. Zhang et al. [12] investigate the joint service caching, computation offloading and resource allocation problem in a general multi-user multi-task scenario and aim to minimize the weighted sum of all users' computation cost and delay cost. However, the joint caching and computing strategy optimization in MEC systems cannot be directly applied to VEC systems. The high mobility of vehicles results in complicated and dynamic topology as well as frequent server switching.

There have been emerging efforts on content caching and computation offloading in VEC [20], [21], [28]–[31]. Tian et al. [30] propose a collaborative computation offloading and content caching method by leveraging DRL for a self-driving system. Wu et al. [31] propose a multi-agent reinforcement learning (RL) algorithm to make decisions on task offloading and edge caching to optimize both the service latency and the energy consumption of vehicles. The important difference between content caching and service caching is that the latter concerns not only storage capacity but also computing capacity. Thus, service caching brings more challenges to the VEC system design. To the best of our knowledge, so far only a few works [13], [14] have investigated the problem of joint optimization of service caching and computation offloading in VEC systems. Tang et al. [13] apply application caching to VEC to optimize the response time for outsourced applications while satisfying a time-slot-spanned energy consumption constraint. The Lyapunov optimization technique is adopted to tackle this constraint, and two greedy heuristic algorithms are incorporated into the drift-plus-penalty based algorithm to help find an approximately optimal solution. Lan et al. [14] propose a fog-based vehicular architecture featuring computation offloading with service caching and design the offloading strategies based on a DRL algorithm to reduce the task computation delay and energy cost.
However, the limitation of storage and computing resources in vehicles or edge servers is not considered in [13], [14]. Based on the existing literature, service caching and computation offloading scenarios involving vehicles and edge servers with limited resources in VEC have not been explored.

In recent years, AI-based algorithms have been successfully applied in numerous related works on emerging vehicular applications and services, which are mostly delay-sensitive. Such smart driving assistance can significantly improve driving safety, reduce energy consumption, and enhance traffic management efficiency [32], [33]. Specifically, RL and DRL have been demonstrated to significantly improve the performance of vehicular task offloading [14], [20], [21], [29]–[31], [34]–[39]. Zhu et al. [35] propose a multi-agent DRL-based computation offloading scheme to minimize the total task execution delay of the considered system during a certain period. Since current vehicular services always contain multiple tasks with data dependency, a knowledge-driven service offloading framework is proposed in [36] by exploring the DRL model to find the long-term optimal service offloading policy. Since DRL agents can react to vehicular environment changes in milliseconds to achieve real-time decision making, they are well suited to complex and highly dynamic vehicular networks with fast-varying channels and computation load.

The above studies on VEC are mostly based on the assumption that service programs are completely available in vehicles or edge servers with limited resources, which is not always feasible in practice. To the best of our knowledge, Refs. [13] and [14] are the only existing literature that do not assume full availability of service programs. However, they have not taken the resource limitation into consideration. To fill this gap, we aim at joint computation offloading and service caching, taking full advantage of the limited resources of vehicles and edge nodes as well as edge cooperative offloading to minimize the average task processing time.

TABLE I
PRIMARY NOTATIONS

Notation — Definition
V — The set of vehicles, V = {1, 2, ..., v, ...}
E — The set of edge nodes, E = {1, 2, ..., e, ..., N_E}
K — The set of service programs, K = {1, 2, ..., k, ..., N_K}
B — The total bandwidth of each edge server
B_{v,e}(t) — The bandwidth of edge server e allocated to vehicle v in time slot t
d_k — The input data size of task k
θ_k — The required storage space of service program k
S_v^V (S_e^E) — The storage capacity of each vehicle (edge)
γ_{v,e} — The signal-to-noise ratio from edge e to vehicle v
g_{v,e}(t) — The uplink gain between edge e and vehicle v in time slot t
f^V (f^E) — The fixed CPU frequency of each vehicle (edge)
κ — The computing energy efficiency parameter
λ — The computation intensity determined by the nature of the service application
c_{v,k}^V (c_{e,k}^E) — The caching decision of vehicle v (edge e) for service program k
o_{v,k}^V (o_{e,k}^E) — The offloading decision of vehicle v (edge e) for task k
N_e^{task}(t) — The total number of tasks received at edge e in time slot t
R_edge (R_cloud) — The transmission rate from edge to edge (cloud)
p_edge (p_cloud) — The transmission power from edge to edge (cloud)

[Fig. 1. System illustration: vehicles perform task offloading and service caching toward an edge pool of interconnected edge servers holding service programs, caching resources, computing resources and energy, with cooperative offloading among the edges and a cloud behind the pool.]

III. SYSTEM MODEL AND PROBLEM FORMULATION

In this section, we first provide the system overview, the communication model, and the service caching and task offloading model. Then we analytically derive the computation delay and energy consumption, and provide an extreme value analysis of the single-task processing delay and the edge node energy consumption. Last, we present our problem formulation, which is essential for service caching and task offloading scheme decision making. The primary notations utilized in the following are summarized in Table I.

A. System Overview

As illustrated in Fig. 1, we consider a general vehicular network consisting of an edge pool (the set of edge nodes is denoted as E = {1, 2, ..., e, ..., N_E}), the cloud and a number of moving vehicles (the set of vehicles is denoted as V = {1, 2, ..., v, ...}). Suppose that there are N_K service programs (e.g., executable .EXE files) corresponding to service-dependent tasks in the system (i.e., running these tasks requires precaching of their corresponding service programs). If the associated service program of a requested task has been cached at the cloud or the edge pool, vehicles can offload a portion of their computing tasks via wireless communication (e.g., cellular vehicle-to-everything (C-V2X)) to the edge pool or the cloud, depending on the trajectory of the vehicle and the location of the cached service program. The edge pool consists of a set of interconnected edge servers that balance their different computing and caching resources. Due to the mobility of vehicles, cooperative edge offloading between multiple edge servers can further improve the caching efficiency. However, unlike the cloud, which has abundant computing and storage resources, the limited computing and caching resources of edge nodes allow only a small set of service programs to be cached at the same time. Therefore, an AI-based management controller (i.e., agent) is typically deployed at the edge pool, which collects information from vehicles and edge servers, and makes decisions on service caching and task offloading. To this end, time is divided into a set of discrete time slots, indexed by {1, 2, ..., t, ..., t_end}, each of which has an equal duration Δt. The service caching and task offloading strategy can be updated at each time slot.
[Fig. 2. Task offloading decisions under the six service caching cases: if the program is cached at the vehicle, the task is executed locally with no offloading (Case 1) or partially offloaded when the nearest edge also caches the program (Case 2); if the program is cached only at the nearest edge, the task is completely offloaded there (Case 3) or further partially offloaded when the edge pool also caches it (Case 4); if the program is cached only in the edge pool, the task is completely offloaded through the nearest edge to the pool (Case 5); otherwise it is completely offloaded to the cloud (Case 6).]
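To make the case logic concrete, the following minimal Python sketch reproduces our reading of the decision flow in Fig. 2; the function name and boolean inputs are illustrative and not part of the paper's implementation.

```python
def caching_case(cached_vehicle: bool, cached_nearest_edge: bool, cached_pool: bool) -> int:
    """Map a service-program placement to the offloading case in Fig. 2."""
    if cached_vehicle:
        # Program available locally: partial offloading to the nearest
        # edge (Case 2) if it also caches the program, else local only.
        return 2 if cached_nearest_edge else 1
    if cached_nearest_edge:
        # Complete upload to the nearest edge; partially forwarded to
        # the edge pool (Case 4) if the pool also caches the program.
        return 4 if cached_pool else 3
    # Neither the vehicle nor the nearest edge caches the program:
    # complete offloading to the pool (Case 5) or to the cloud (Case 6).
    return 5 if cached_pool else 6

assert caching_case(False, True, True) == 4
```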
B. Communication Model

To characterize a practical vehicle moving environment, vehicles with driving speeds ranging from V_min to V_max are considered [4], [5], and each edge node is assumed to be aware of all the vehicles belonging to its coverage area. We consider a multi-channel uplink model, where each edge has an overall bandwidth of B, and each channel has two possible states (i.e., occupied and unoccupied). Note that the allocated spectrum is the same for different edge nodes and the allocated channels are all orthogonal, so that the interference is negligible within the coverage area of the same edge node. We assume that each vehicle can only send one task request in time slot t, with N_e^{task}(t) denoting the total number of tasks received at edge node e in time slot t. In this case, the signal-to-interference-plus-noise ratio (SINR) between vehicle v and edge e in time slot t is given by

\gamma_{v,e}(t) = \frac{g_{v,e}(t)\, p_{v,e}(t)}{\sum_{e=1}^{N_E} g_{v,e}(t) \sum_{v=1}^{N_e^{task}(t)} p_{v,e}(t) + \sigma^2},   (1)

where σ² refers to the variance of the additive white Gaussian noise, g_{v,e}(t) denotes the average channel gain, and p_{v,e}(t) represents the average channel transmission power of edge node e.

If vehicle v needs to offload data to the nearest edge e, the wireless transmission rate at time slot t is calculated based on Shannon's formula:

R_{v,e}(t) = B_{v,e}(t) \log_2(1 + \gamma_{v,e}(t)),   (2)

where B_{v,e}(t) is the bandwidth allocated to vehicle v and B_{v,e}(t) = B / N_e^{task}(t).

C. Service Caching and Task Offloading Model

At the beginning of each time slot t, vehicles entering the range of edge nodes update their task requests, and complete task offloading and computation within this time slot. Before the end of each time slot, the service caching decisions for edge nodes are made by the management controller, while those for vehicles are made by the vehicles themselves based on their interests. Let K = {1, 2, ..., k, ..., N_K} denote the set of service programs. Each task can be represented as {d_k, θ_k}, k ∈ K, where d_k denotes the size of the input data and θ_k denotes the required storage space of service program k. Then we have the following storage capacity constraints:

\sum_{k=1}^{N_K} c_{v,k}^V(t)\, \theta_k \le S_v^V, \quad \forall v, t,   (3)

and

\sum_{k=1}^{N_K} c_{e,k}^E(t)\, \theta_k \le S_e^E, \quad \forall e, t,   (4)

where c_{v,k}^V(t) ∈ {0, 1} and c_{e,k}^E(t) ∈ {0, 1} are binary decision variables denoting whether service program k is cached (i.e., c_{v,k}^V(t) = 1, c_{e,k}^E(t) = 1) or not (i.e., c_{v,k}^V(t) = 0, c_{e,k}^E(t) = 0) on vehicle v and edge node e in time slot t, and S_v^V and S_e^E are the storage capacities of each vehicle and each edge server, respectively.
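As a quick illustration of (1)–(4), the sketch below computes the per-vehicle rate under equal bandwidth sharing and checks an edge storage constraint; all numeric values are hypothetical placeholders rather than the paper's settings.

```python
import numpy as np

# Hypothetical link values, not the paper's settings.
B = 20e6                 # total edge bandwidth (Hz)
n_tasks = 4              # N_e^task(t): tasks served by edge e in slot t
g, p, sigma2 = 1e-6, 0.2, 1e-9
interference = 3e-8      # the double sum in the denominator of Eq. (1)

gamma_ve = g * p / (interference + sigma2)    # SINR, Eq. (1)
B_ve = B / n_tasks                            # equal bandwidth share
R_ve = B_ve * np.log2(1.0 + gamma_ve)         # Shannon rate, Eq. (2)

# Storage constraint of Eq. (4): cached program sizes fit within S_e^E.
theta = np.array([50e6, 30e6, 40e6])          # program sizes theta_k (bits)
c_e = np.array([1, 1, 0])                     # caching decisions c_{e,k}^E(t)
assert (c_e * theta).sum() <= 100e6
```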
[TABLE II: Task offloading ratios of the different caching cases.]

In our system model, a partial offloading strategy is used for the vehicles' tasks in each time slot, as illustrated in Fig. 2. For example, when vehicle v within the communication range of edge e initiates an offloading request for task k, if the corresponding service program is not locally cached, the task will be completely offloaded to the nearest edge. Then, if the edge pool does not have the needed service program for task k precached, the task will be uploaded to the cloud for execution (Case 6 in Fig. 2). If program k is cached in both the vehicle and its nearest edge node, the task can be handled at the edge nodes by partial offloading (Case 2 in Fig. 2). If program k is cached in the nearest edge node and the edge pool but not in the vehicle itself, the vehicle completely offloads task k to the nearest edge node, and then task k is partially offloaded from the nearest edge node to the edge pool (Case 4 in Fig. 2). As can be seen from Fig. 2, tasks are offloaded to different computing nodes based on the corresponding service caching strategies. Therefore, the caching and computing resources need to be properly allocated by the agent to achieve maximum benefit.

D. Computation Delay and Energy Consumption

Following [10], the time and energy consumption for the computation of task k are calculated as

D_k = \frac{\omega_k}{f_k},   (5)

and

\varepsilon_k = \kappa f_k^{\alpha} D_k = \kappa \omega_k (f_k)^{\alpha - 1},   (6)

respectively, where f_k denotes the CPU frequency and is constrained by a maximum frequency f_max (i.e., f_k < f_max), κ (κ > 0) denotes the computing energy efficiency parameter, and α (in this work we assume α = 2) denotes the exponent parameter. ω_k denotes the number of CPU cycles needed for service program k, which is expressed as the input data size d_k multiplied by a factor λ, i.e., ω_k = λ d_k. Here, λ (λ > 0) is determined by the nature of the service application (e.g., its computational complexity) [40]. f^V and f^E denote the fixed CPU frequencies of the vehicle and the edge server, respectively.

When vehicle v within the communication range of the nearest edge e initiates an offloading request for task k, the agent selects the device nodes that task k should be offloaded to according to the cache placement of the service program, and determines the offloading ratio for resource allocation. Let o_{v,k}^V(t) ∈ [0, 1] and o_{e,k}^E(t) ∈ [0, 1] be continuous decision variables denoting the ratios of task k offloaded to the nearest edge and to the edge pool, respectively, so that (1 − o_{v,k}^V(t)) and (1 − o_{e,k}^E(t)) are the remaining fractions executed locally at vehicle v and at the nearest edge e, respectively. Table II lists the mathematical expressions of the task offloading ratio in the different caching cases (corresponding to those in Fig. 2). Based on these, the computation time and transmission time of tasks under the different caching cases are derived as follows.

The local execution delay of task k in vehicle v at time slot t can be given as

T_{v,e,k}^{local}(t) = c_{v,k}^V(t)\,(1 - o_{v,k}^V(t))\, \frac{\lambda d_k}{f^V}.   (7)

If task k needs to be offloaded to the nearest edge e, the time consumption for the input data offloading of task k is

T_{v,e,k}^{up}(t) = \begin{cases} \varphi\left(\sum_{e=1}^{N_E} c_{e,k}^E(t)\right) o_{v,k}^V(t)\, \frac{d_k}{R_{v,e}(t)}, & R_{v,e}(t) > 0, \\ 0, & R_{v,e}(t) = 0, \end{cases}   (8)

where

\varphi(x) = \begin{cases} 1, & \text{if } x > 0, \\ 0, & \text{if } x = 0. \end{cases}   (9)
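A minimal sketch of the computation model in (5), (6) and the indicator (9); the helper names are ours, not the paper's.

```python
def comp_delay(omega_k: float, f_k: float) -> float:
    # Eq. (5): required CPU cycles divided by the CPU frequency.
    return omega_k / f_k

def comp_energy(omega_k: float, f_k: float, kappa: float = 1e-26, alpha: int = 2) -> float:
    # Eq. (6): kappa * f^alpha * D_k = kappa * omega_k * f^(alpha - 1).
    return kappa * omega_k * f_k ** (alpha - 1)

def phi(x: float) -> int:
    # Eq. (9): 1 if at least one edge caches the program, else 0.
    return 1 if x > 0 else 0
```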
In (8), R_{v,e}(t) = 0 indicates that the caching scheme belongs to Case 1, and the task does not need to be offloaded to the edge nodes. The execution time on the nearest edge is expressed as

T_{v,e,k}^{edge}(t) = c_{e,k}^E(t)\, o_{v,k}^V(t)\,(1 - o_{e,k}^E(t))\, \frac{\lambda d_k}{f^E}.   (10)

If the nearest edge server e is unavailable and task k needs to be further offloaded to other edge servers in the edge pool, the uploading time is obtained as

T_{v,e,k}^{up\text{-}pool}(t) = (1 - c_{v,k}^V(t))\, \varphi\left(\sum_{i=1, i \ne e}^{N_E} c_{i,k}^E(t)\right) o_{v,k}^V(t)\, o_{e,k}^E(t) \times \frac{d_k}{R_{edge}},   (11)

where R_edge denotes the transmission rate among edge servers. Then, the delay resulting from the computation at the edge pool can be given as

T_{v,e,k}^{pool}(t) = (1 - c_{v,k}^V(t))\, \varphi\left(\sum_{i=1, i \ne e}^{N_E} c_{i,k}^E(t)\right) o_{v,k}^V(t)\, o_{e,k}^E(t) \times \frac{\lambda d_k}{f^E}.   (12)

Since the output data is typically much smaller than the input task size, we ignore the time delay of the output return process [31]. As the computing and storage resources in the cloud are abundant, we also ignore the queuing and computing latency incurred by the cloud. However, users need to take into account the latency of task offloading to the cloud. Considering the transmission delay and computation delay under the different caching and offloading decisions, the total delay of processing task k in time slot t can be expressed as

T_{v,e,k}^{total}(t) = \max\left\{T_{v,e,k}^{local}(t),\; T_{v,e,k}^{up}(t) + \max\left\{T_{v,e,k}^{edge}(t),\; T_{v,e,k}^{up\text{-}pool}(t) + T_{v,e,k}^{pool}(t)\right\}\right\} + (1 - c_{v,k}^V(t))\left(1 - \varphi\left(\sum_{e=1}^{N_E} c_{e,k}^E(t)\right)\right)\left(\frac{d_k}{R_{v,e}(t)} + \frac{d_k}{R_{cloud}}\right).   (13)

We define the average delay of task processing in time slot t as

T_d(t) = \frac{1}{N_E} \sum_{e=1}^{N_E} \frac{1}{N_e^{task}(t)} \sum_{v=1}^{N_e^{task}(t)} T_{v,e,k}^{total}(t).   (15)

Similarly, the average energy consumption of the edge servers in time slot t can be calculated as

\varepsilon(t) = \frac{1}{N_E} \sum_{e=1}^{N_E} \frac{1}{N_e^{task}(t)} \sum_{v=1}^{N_e^{task}(t)} \varepsilon_{v,e,k}^{total}(t).   (16)

E. Problem Formulation

With the emergence of interactive AR/VR services, user experience is crucial for user stickiness, which is the key to the success of those services. As an important aspect of the user experience, the task processing delay is gradually becoming a crucial wireless network performance metric. The main goal of our work is to design a joint computation offloading and service caching scheme that minimizes the long-term cumulative average task processing time. Specifically, the optimization problem can be described as:

\min_{\{c_{e,k}^E(t),\, o_{v,k}^V(t),\, o_{e,k}^E(t)\}} \sum_{t=1}^{t_{end}} \gamma^{t-1} \left(\frac{T_d(t)}{T_d^{max}}\right),   (17)

s.t.
c_{v,k}^V(t) \in \{0, 1\}, \forall v \in V, \forall k \in K, \forall t \in \{1, 2, ..., t_{end}\},   (17a)
c_{e,k}^E(t) \in \{0, 1\}, \forall e, k, t,   (17b)
0 \le o_{v,k}^V(t) \le 1, \forall v, k, t,   (17c)
0 \le o_{e,k}^E(t) \le 1, \forall e, k, t,   (17d)
0 \le \gamma \le 1,   (17e)
\sum_{k=1}^{N_K} c_{v,k}^V(t)\,\theta_k \le S_v^V, \forall v, t,   (17f)
\sum_{k=1}^{N_K} c_{e,k}^E(t)\,\theta_k \le S_e^E, \forall e, t.   (17g)
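The nested max structure of (13) and the averaging in (15) can be written directly; this is an illustrative sketch, with the Case 6 cloud term passed in as a separate argument.

```python
import numpy as np

def total_delay(t_local, t_up, t_edge, t_up_pool, t_pool, t_cloud=0.0):
    # Eq. (13): local execution runs in parallel with the offloaded
    # branch; the nearest edge computes its share while the rest is
    # forwarded to the pool. The cloud term is non-zero only in Case 6.
    return max(t_local, t_up + max(t_edge, t_up_pool + t_pool)) + t_cloud

def average_delay(delays_per_edge):
    # Eq. (15): average over the tasks at each edge, then over edges.
    return float(np.mean([np.mean(d) for d in delays_per_edge]))

# Example: two edges serving two and three tasks in one slot.
print(average_delay([[1.2, 0.8], [2.0, 1.0, 1.5]]))
```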
F. Extreme Value Analysis

In order to evaluate the maximum and minimum values of T_{v,e,k}^{total}(t), we derive the task processing delay in time slot t under the different caching cases, in which the caching decision variables c_{v,k}^V(t) and c_{e,k}^E(t) are no longer relevant.

1) Case 1:
T_d^{Case1}(t) = \frac{\lambda d_k}{f^V}.   (18)

2) Case 2:
T_d^{Case2}(t) = \max\left\{(1 - o_{v,k}^V(t)) \frac{\lambda d_k}{f^V},\; o_{v,k}^V(t)\left(\frac{d_k}{R_{v,e}(t)} + \frac{\lambda d_k}{f^E}\right)\right\}.   (19)

3) Case 3:
T_d^{Case3}(t) = \frac{d_k}{R_{v,e}(t)} + \frac{\lambda d_k}{f^E}.   (20)

4) Case 4:
T_d^{Case4}(t) = \frac{d_k}{R_{v,e}(t)} + \max\left\{(1 - o_{e,k}^E(t)) \frac{\lambda d_k}{f^E},\; o_{e,k}^E(t)\left(\frac{d_k}{R_{edge}} + \frac{\lambda d_k}{f^E}\right)\right\}.   (21)

5) Case 5:
T_d^{Case5}(t) = \frac{d_k}{R_{v,e}(t)} + \frac{d_k}{R_{edge}} + \frac{\lambda d_k}{f^E}.   (22)

6) Case 6:
T_d^{Case6}(t) = \frac{d_k}{R_{v,e}(t)} + \frac{d_k}{R_{cloud}}.   (23)

From the above analysis, we have

\max\{T_{v,e,k}^{total}(t)\} = \max_{i=1,...,6}\{T_d^{Case\,i}(t)\} = \max\{T_d^{Case1}(t), T_d^{Case5}(t), T_d^{Case6}(t)\} = \max\left\{\frac{\lambda d_k}{f^V},\; \frac{d_k}{R_{v,e}(t)} + \frac{d_k}{R_{edge}} + \frac{\lambda d_k}{f^E},\; \frac{d_k}{R_{v,e}(t)} + \frac{d_k}{R_{cloud}}\right\},   (24)

and

\min\{T_{v,e,k}^{total}(t)\} = \min_{i=1,...,6}\{T_d^{Case\,i}(t)\} = \min\{T_d^{Case2}(t), T_d^{Case4}(t), T_d^{Case6}(t)\}.   (25)

In order to eliminate the offloading decision variables, it is easy to obtain the minimum delays of Case 2 and Case 4 from their properties as piecewise linear functions:

\min\{T_d^{Case2}(t)\} = \frac{d_k (f^E + \lambda R_{v,e}(t))}{\frac{f^V f^E}{\lambda} + f^V R_{v,e}(t) + f^E R_{v,e}(t)},   (26)

where o_{v,k}(t) = 1 \big/ \left(1 + \frac{f^V}{\lambda}\left(\frac{1}{R_{v,e}(t)} + \frac{\lambda}{f^E}\right)\right), and

\min\{T_d^{Case4}(t)\} = \frac{d_k}{R_{v,e}(t)} + \frac{\lambda d_k (\lambda R_{edge} + f^E)}{f^E (2\lambda R_{edge} + f^E)},   (27)

where o_{e,k}(t) = 1 \big/ \left(2 + \frac{f^E}{\lambda R_{edge}}\right).

Hence, the minimum value of T_{v,e,k}^{total}(t) can be expressed as

\min\{T_{v,e,k}^{total}(t)\} = \min\{\min\{T_d^{Case2}(t)\}, \min\{T_d^{Case4}(t)\}, T_d^{Case6}(t)\}.   (28)

The analysis above indicates that the maximum and minimum task processing delays are related to the communication, computing and caching capabilities of each device. Meanwhile, when aiming at minimizing the delay, caching decisions corresponding to the maximum tolerated delay should be avoided as much as possible.

Similarly, in order to evaluate the maximum and minimum values of \varepsilon_{v,e,k}^{total}(t), we derive the energy consumption of the edge servers in time slot t under the different caching cases.

1) Case 1:
\varepsilon^{Case1}(t) = 0.   (29)

2) Case 2:
\varepsilon^{Case2}(t) = \kappa (f^E)^2 \left(o_{v,k}(t) \frac{\lambda d_k}{f^E}\right).   (30)

3) Case 3:
\varepsilon^{Case3}(t) = \kappa (f^E)^2 \left(\frac{\lambda d_k}{f^E}\right).   (31)

4) Case 4:
\varepsilon^{Case4}(t) = \kappa (f^E)^2 \left((1 - o_{e,k}(t)) \frac{\lambda d_k}{f^E} + o_{e,k}(t) \frac{\lambda d_k}{f^E}\right) + o_{e,k}(t)\, p_{edge} \frac{d_k}{R_{edge}} = \kappa f^E \lambda d_k + o_{e,k}(t)\, p_{edge} \frac{d_k}{R_{edge}}.   (32)

5) Case 5:
\varepsilon^{Case5}(t) = \kappa f^E \lambda d_k + p_{edge} \frac{d_k}{R_{edge}}.   (33)

6) Case 6:
\varepsilon^{Case6}(t) = p_{cloud} \frac{d_k}{R_{cloud}}.   (34)

Based on the above analysis, we have

\max\{\varepsilon_{v,e,k}^{total}(t)\} = \max_{i=1,...,6}\{\varepsilon^{Case\,i}(t)\} = \max\{\varepsilon^{Case5}(t), \varepsilon^{Case6}(t)\} = \max\left\{\kappa f^E \lambda d_k + p_{edge} \frac{d_k}{R_{edge}},\; p_{cloud} \frac{d_k}{R_{cloud}}\right\}.   (35)

From (29) it can easily be observed that \min\{\varepsilon_{v,e,k}^{total}(t)\} = 0.

From the analysis of the maximum and minimum values of T_{v,e,k}^{total}(t), it can be seen that in order to reduce the task processing delay, it is necessary to make Case 2 and Case 4 caching decisions as often as possible. That is, tasks should be offloaded to the edge nodes as much as possible. On the other hand, energy consumption is also a critical indicator for edge server operators. Offloading tasks to edge nodes can reduce task processing delays, but it increases the energy consumption of the edge servers.
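The closed forms (26) and (27) follow from equating the two branches of the piecewise-linear delays. The sketch below numerically verifies (26) on a grid of offloading ratios, using hypothetical parameter values rather than those of Table III.

```python
import numpy as np

# Hypothetical parameters (not the paper's Table III values).
d_k, lam = 20e6, 1e2          # input bits, cycles per bit
f_V, f_E = 5e8, 1e9           # CPU frequencies (cycles/s)
R_ve, R_edge = 10e6, 15e6     # link rates (bit/s)

def delay_case2(o):
    # Eq. (19): max of the local branch and the offloaded branch.
    return np.maximum((1 - o) * lam * d_k / f_V,
                      o * (d_k / R_ve + lam * d_k / f_E))

# Closed forms from Eq. (26).
o_star = 1.0 / (1.0 + (f_V / lam) * (1.0 / R_ve + lam / f_E))
t_star = d_k * (f_E + lam * R_ve) / (f_V * f_E / lam + f_V * R_ve + f_E * R_ve)

# Numerical check over a fine grid of offloading ratios.
grid = np.linspace(0.0, 1.0, 100001)
assert np.isclose(delay_case2(grid).min(), t_star, rtol=1e-3)
assert np.isclose(grid[delay_case2(grid).argmin()], o_star, atol=1e-3)
```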
IV. DEEP REINFORCEMENT LEARNING FOR EDGE CACHING AND OFFLOADING

Due to the diverse user demands and the constrained computation and caching resources, it is complex to minimize the cumulative system average delay in (17). Most traditional optimization methods (e.g., convex optimization, game theory, etc.) assume knowledge of key factors in vehicular networks, such as channel conditions and content popularity. However, these key factors are time-varying and unavailable in reality. Such methods can merely achieve optimal or near-optimal results for one snapshot, since they ignore how the current decision exerts long-term influence on resource allocation [21]. DRL is viewed as an efficient way to solve complicated problems in a dynamic environment by optimizing the expected cumulative reward. In DRL, an agent collects the needed information regarding the diverse demands of users and the available resources in the vehicular network. Then the agent takes an action to manage the offloading and caching decisions and optimize the resource allocation. We formulate the joint optimization problem as a discrete-time Markov decision process (MDP) and propose a DDPG-based edge caching and offloading scheme for joint service caching and computation offloading strategy design.

A. Problem Formulation Based on DRL

1) State space: At the initial phase of each time slot t, each edge server gathers all the environmental parameters, which contain the following parts:
• I_{v,k}(t): The request indicator for task k by vehicle v within the coverage range of edge node e at time slot t.
• c_{v,k}^V(t): The service caching indicator of vehicle v within the coverage range of edge node e at time slot t.
• B_{v,e}(t), γ_{v,e}(t): The bandwidth that edge node e allocates to vehicle v and the received SINR of edge node e at time slot t.
• c_{e,k}^E(t): The service caching indicator of edge node e at time slot t.

The state of edge node e at time slot t is denoted as

s_e(t) = \{[\mathbf{I}(t)]_{\rho_{max} \times N_K}, [\mathbf{c}^V(t)]_{\rho_{max} \times N_K}, [\mathbf{B}(t)]_{\rho_{max} \times N_K}, [\boldsymbol{\gamma}(t)]_{\rho_{max} \times N_K}, [\mathbf{c}^E(t)]_{\rho_{max} \times N_K}\},   (36)

where ρ_max denotes the maximum vehicle density within each edge node² and s_e(t) is obtained after dimensionality reduction and normalization. The system state at time slot t consists of the states of all edge nodes, defined as

s_t = \{s_1(t), s_2(t), ..., s_{N_E}(t)\}.   (37)

2) Action space: The agent obtains the states of service caching and the communication information between vehicles and edge nodes, and then decides the offloading ratios of all tasks and updates the service caching of all edge nodes. Here, the corresponding action contains the following parts:
• c_{e,k}^E(t): The service caching indicator of edge node e after completing all task offloading and computation in time slot t.
• o_{v,k}^V(t): The proportion of task k that vehicle v offloads to edge node e in time slot t.
• o_{e,k}^E(t): The proportion of task k that edge node e offloads to the edge pool in time slot t.

The action of edge node e in time slot t is denoted as

a_e(t) = \{[\mathbf{c}^E(t)]_{\rho_{max} \times N_K}, [\mathbf{o}^V(t)]_{\rho_{max} \times N_K}, [\mathbf{o}^E(t)]_{\rho_{max} \times N_K}\},   (38)

where a_e(t) is likewise obtained after dimensionality reduction and normalization. The system action at time slot t consists of the actions of all edge nodes, defined as

a_t = \{a_1(t), a_2(t), ..., a_{N_E}(t)\}.   (39)

3) Reward: The agent's behavior is reward-driven, and the reward should correspond to the objective function. Hence, we set the reward in time slot t as

r_t = r(s_t, a_t) = -\left(\frac{T_d(t)}{T_d^{max}}\right).   (40)

B. DDPG-Based Edge Caching and Offloading Scheme

Since the state space consists of a great amount of dynamic environmental information and the action space contains continuous values, we exploit the deep deterministic policy gradient (DDPG) algorithm, a model-free actor-critic algorithm, to solve the joint computation offloading and service caching problem. The framework of the DDPG-based method is illustrated in Fig. 3, consisting of the primary networks, the target networks and a replay buffer.

Based on the deterministic actor-critic model, we leverage deep neural networks to provide an accurate estimation of the deterministic policy function μ(s_t) and the action-value function Q(s_t, a_t), which should satisfy the following condition:

Q(s_t, \mu(s_t|\theta^{\mu})|\theta^Q) \approx r(s_t, a_t) + \gamma Q(s_{t+1}, \mu(s_{t+1}|\theta^{\mu'})|\theta^{Q'}).   (41)

As shown in Fig. 3, the primary networks use the actor network μ(s_t|θ^μ) and the critic network Q(s_t, a_t|θ^Q) to approximate the policy function and the Q-value function, respectively. In addition, the target networks contain an actor network μ(s_t|θ^{μ'}) and a critic network Q(s_t, a_t|θ^{Q'}) with the same structure. The target Q-value can be represented as

y_t = r(s_t, a_t) + \gamma Q^{\mu}(s_{t+1}, \mu(s_{t+1}|\theta^{\mu'})|\theta^{Q'}).   (42)

We utilize the actor network to explore the policies and the critic network to evaluate them. The actor network architecture is illustrated in Fig. 4; it takes the state s_t as input and outputs an action a_t. The action variable c_{e,k}^E(t) needs to be discretized (i.e., ⌈c_{e,k}^E(t)⌉).

At the beginning of each time slot t, the agent collects environmental information to obtain the current system state s_t. In order to solve the exploration problem of the deterministic policy, we add behavior noise n_t to obtain the action a_t = μ(s_t|θ^μ) + n_t.

²Bold letters are used to denote matrices.
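A sketch of how the per-edge state (36), (37) and the reward (40) might be assembled; the matrix contents are zero placeholders, and the flattening step stands in for the paper's unspecified dimensionality reduction and normalization.

```python
import numpy as np

rho_max, N_K, N_E = 5, 5, 3     # N_K and N_E as in Table III

def edge_state(I, c_V, B_mat, gamma, c_E):
    # Eq. (36): stack the rho_max-by-N_K observation matrices; this
    # flattening is a stand-in for the dimensionality reduction and
    # normalization step.
    return np.concatenate([m.ravel() for m in (I, c_V, B_mat, gamma, c_E)])

def reward(T_d, T_d_max):
    # Eq. (40): negative normalized average task processing delay.
    return -(T_d / T_d_max)

# Eq. (37): the system state concatenates all edge states (zeros here).
zeros = lambda: np.zeros((rho_max, N_K))
s_t = np.concatenate([edge_state(zeros(), zeros(), zeros(), zeros(), zeros())
                      for _ in range(N_E)])
```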
[Fig. 3. The framework of the DDPG-based method for vehicular edge caching and offloading: the agent holds primary actor/critic networks and target actor/critic networks, together with an experience replay buffer from which mini-batches of N transitions (s_t, a_t, r_t, s_{t+1}) are sampled; the critic is trained with the loss function J(θ^Q) and the actor with the policy gradient ∇_{θ^μ}J(θ^μ).]

[Fig. 4. The actor network architecture: the input layer takes the vehicles' task requests, the vehicles' and edges' caching indicators, the channel bandwidth allocated to the vehicles and the received SINR of the edge nodes; fully connected hidden layers of 64, 128, 256 and 512 neurons map the state to the output layer, which gives the proportions of tasks offloaded to the nearest edges and to the edge pool, together with the edges' caching indicators.]

The primary critic network is updated by minimizing the loss function J(θ^Q), and the primary actor network is updated with the policy gradient ∇_{θ^μ}J(θ^μ). The target networks are then softly updated as

\theta^{Q'} \leftarrow \tau \theta^Q + (1 - \tau)\theta^{Q'}, \qquad \theta^{\mu'} \leftarrow \tau \theta^{\mu} + (1 - \tau)\theta^{\mu'},   (45)

where τ is the updating coefficient. The whole process of the DDPG-based edge caching and offloading scheme is presented in Algorithm 1.

Algorithm 1 DDPG-Based Edge Caching and Offloading Algorithm
Input: The initial parameters: γ, θ^μ, θ^{μ'}, θ^Q, θ^{Q'}, τ, M, t_end, D, N.
Output: Primary optimal actor network parameter θ^μ.
1: Initialize the primary networks and the target networks.
2: Empty the experience replay buffer D.
3: for episode = 1, 2, ..., M do
4:   Initialize the observation state s_0.
5:   Add a random Gaussian distributed behavior noise n_t for action exploration.
6:   for t = 1, 2, ..., t_end do
7:     The agent receives the normalized observation state s_t.
8:     Select action a_t = μ(s_t|θ^μ) + n_t.
9:     Perform action a_t, calculate the immediate reward r_t, and obtain the next normalized state s_{t+1}.
10:    if the replay buffer is not full then
11:      Store the transition (s_t, a_t, r_t, s_{t+1}) in the replay buffer D.
12:    else
13:      Randomly replace a transition in the replay buffer D with (s_t, a_t, r_t, s_{t+1}).
14:      Randomly sample a mini-batch of N transitions (s_j, a_j, r_j, s_{j+1}), ∀j = 1, 2, ..., N, from D.
15:      Set y_j = r(s_j, a_j) + γQ^{μ}(s_{j+1}, μ(s_{j+1}|θ^{μ'})|θ^{Q'}).
16:      Update the primary critic network by minimizing the loss J(θ^Q) over the mini-batch.
17:      Update the primary actor network with the policy gradient ∇_{θ^μ}J(θ^μ).
18:      Softly update the target networks according to (45).
19:    end if
20:  end for
21: end for
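The buffer handling of lines 10–14 and the soft update (45) are the mechanical core of Algorithm 1. A minimal sketch follows, with one simplification: the deque drops the oldest transition when full instead of replacing a random one as in line 13.

```python
import random
from collections import deque
import numpy as np

class ReplayBuffer:
    """Experience replay for lines 10-14 of Algorithm 1."""
    def __init__(self, capacity=10000):
        # Simplification: a full deque drops its oldest transition,
        # rather than replacing a random one as in line 13.
        self.buf = deque(maxlen=capacity)

    def store(self, s, a, r, s_next):
        self.buf.append((s, a, r, s_next))

    def sample(self, n=128):
        # Line 14: draw a uniform mini-batch of N transitions.
        return random.sample(self.buf, n)

def soft_update(theta_target: np.ndarray, theta: np.ndarray, tau: float = 0.01):
    # Eq. (45): theta' <- tau * theta + (1 - tau) * theta', in place.
    theta_target *= (1.0 - tau)
    theta_target += tau * theta
```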
TABLE III
SIMULATION PARAMETERS

Parameter — Value
Total number of time slots (t_end) — 40
Duration of each time slot (Δt) — 30 s
Density of vehicles within range of an edge (ρ) — [2, 5]
Communication range of each edge — 500 m
Bandwidth of each edge server (B) — 20 MHz
SINR value — 4–5 dB
Number of edge servers (N_E) — 3
Number of service types (N_K) — 5
Data size of each task (d_k) — 20 Mb
Storage space of each service program (θ_k) — 50 Mb
CPU cycles of each vehicle (f^V) — 5 × 10^8 cycles/s
CPU cycles of each edge server (f^E) — 1 × 10^9 cycles/s
Computation intensity of each task (λ) — 10^5 cycles/bit
Storage capacity of each vehicle (S_v^V) — 50 Mb
Storage capacity of each edge server (S_e^E) — 100 Mb
Computing energy efficiency parameter (κ) — 1 × 10^{-26}
Transmission rate between edge servers (R_edge) — 15 Mbps
Transmission rate from edge to the cloud (R_cloud) — 10 Mbps
Transmission power between edge servers (p_edge) — 1 W
Transmission power from edge to the cloud (p_cloud) — 2 W
Discount factor (γ) — 0.99
Learning rate of the actor network (lr_a) — 0.001
Learning rate of the critic network (lr_c) — 0.002
Soft update coefficient (τ) — 0.01
Size of the mini-batch sample (N) — 128
Size of the experience replay buffer (D) — 10000

A. Experimental Settings

The involved parameters along with their corresponding values are listed in Table III. The parameter settings, including the duration of each time slot Δt, κ, and so on, draw on previous works [10], [12], [31]. For instance, Δt should be set such that, on the one hand, the agent has enough time to make caching and offloading decisions and, on the other hand, tasks offloaded to other nodes can be completely accomplished by the end of the time slot. We use Python 3.6 to create a simulation environment for the considered vehicular edge caching and task offloading system. In the simulation, each vehicle randomly requests a task of interest at the beginning of each time slot. Furthermore, we use TensorFlow 1.14.0 to implement the DDPG-based edge caching and offloading scheme.

We consider the following benchmark methods for performance comparison:
1) Offloading without edge caching: Tasks requested by a vehicle are computed locally or offloaded to the cloud.
2) Offloading based on latency minimization: Task offloading is performed according to the optimal ratio that achieves the minimum latency in each case, based on the DDPG learning algorithm.
3) Offloading based on energy minimization: Task offloading is performed according to the optimal ratio that achieves the minimum energy consumption in each case, based on the DDPG learning algorithm.
4) Random edge caching and offloading: The service caching and task offloading ratios are random in each time slot.
5) Least recently used (LRU) edge caching and offloading: The services requested at an edge server in the previous time slot remain cached in the next time slot, services that have not been requested are randomly replaced [41], and the offloading ratio is determined according to the offloading scheme based on latency minimization.
6) Executing all tasks in the cloud: All tasks are offloaded to the cloud for execution.

B. Simulation Results

First, we compare the total delay per episode of the different schemes based on the DDPG learning algorithm. It can be seen in Fig. 5 that all schemes approach their stable cumulative average delay as the number of episodes increases. Meanwhile, since the energy consumption is related to the task processing delay, we evaluate the total energy consumption per episode in Fig. 6. As the number of episodes increases, except for the offloading without edge caching scheme, the total delay of all the other considered schemes decreases and reaches a stable value, while the total energy consumption increases and reaches a stable value. As analyzed in Sec. III-F, to reduce latency it is necessary to offload tasks to the edge nodes as much as possible, which on the other hand increases the energy consumption of the edge nodes. This is verified in Fig. 5 and Fig. 6. Another notable point is that our proposed scheme achieves the lowest task processing latency for the same energy consumption of the edge nodes. The offloading without edge caching scheme keeps the maximum delay, which is approximately the same as the total delay that the offloading scheme based on energy minimization converges to. This is because, when aiming to minimize energy consumption, the offloading ratios are set to the ones that minimize the energy consumption in all the considered cases. However, the offloading without edge caching scheme consumes the maximum energy. Its curves remain flat because the agent cannot participate in the decision-making, and the edge servers cannot provide computing and caching resources. Our proposed scheme yields the lowest total delay compared with the other benchmark schemes, which demonstrates the efficiency of the DDPG-based edge caching and offloading scheme.

Next, we investigate the effects of vehicle density on the total delay and energy consumption of the different schemes in Fig. 7. As the density of vehicles ρ increases, the number of task requests increases, while the bandwidth allocated to each vehicle and the transmission rate of tasks uploaded to the edge nodes decrease, resulting in an increase in the total task processing delay. When ρ = 2, 3, the total latency of the executing-all-tasks-in-the-cloud scheme is lower than that of the offloading without edge caching scheme. This is because, as each vehicle occupies more bandwidth resources, tasks can be uploaded faster to the cloud, which has powerful computing resources. As the vehicle density continues to increase, this superiority vanishes. It can be observed that our proposed scheme outperforms the other methods in terms of the total delay over the period t_end.