SPECIAL SECTION ON EDGE COMPUTING AND NETWORKING FOR UBIQUITOUS AI

Received June 13, 2020, accepted June 30, 2020, date of publication July 3, 2020, date of current version July 14, 2020.
Digital Object Identifier 10.1109/ACCESS.2020.3007002

Collaborative Edge Computing and Caching With Deep Reinforcement Learning Decision Agents

JIANJI REN, HAICHAO WANG, TINGTING HOU, SHUAI ZHENG, AND CHAOSHENG TANG
College of Computer Science and Technology, Henan Polytechnic University, Jiaozuo 454000, China
Corresponding author: Jianji Ren ([email protected])
The associate editor coordinating the review of this manuscript and approving it for publication was Xu Chen.

ABSTRACT Large amounts of data will be generated by the rapid development of Internet of Things (IoT) technologies and 5th generation mobile networks (5G), and the processing and analysis requirements of this big data will challenge existing networks and processing platforms. As the most promising technology in 5G networks, edge computing will greatly ease the pressure of network transmission and data processing at the edge. In this paper, we consider the coordination of computing and cache resources among multi-level edge computing nodes (ENs); users in this system can offload computing tasks to ENs to improve quality of service (QoS). We aim to maximize the long-term profit at the edge while satisfying the low-latency computing requirements of users, and to jointly optimize the edge-side offloading strategy and resource allocation. However, it is challenging to obtain an optimal strategy in such a dynamic and complex system. To solve the complex resource allocation problem at the edge and give the edge a degree of adaptation and cooperation, we use double deep Q-learning (DDQN) agents to make decisions, which can maximize long-term gains while still making quick decisions. The simulation results prove the effectiveness of DDQN in maximizing revenue when allocating resources at the edge.

INDEX TERMS Collaborative computing, edge computing, optimization strategy.

I. INTRODUCTION

As mobile communication and IoT technologies advance, smart cities, health care systems, etc. are deeply integrated with IoT technologies, and the large amount of data generated poses challenges to data analysis and processing. Although the cloud computing [1] platform provides an efficient computing platform for big data processing, its high bandwidth consumption and high latency are unacceptable for scenarios with low latency requirements such as industrial control and real-time analysis.

In recent years, edge computing [2] has attracted the attention of researchers as a new computing platform. Although edge computing does not have a uniform definition, in essence it deploys computing resources at the edge of the Internet, thereby reducing service delays, mitigating traffic pressure on the backhaul link, and meeting the computational requirements of low-latency applications.

Edge computing, cloud computing, distributed computing, parallel computing, etc. provide the necessary technical means to achieve accurate and fast integrated computing analysis. Cloud computing is based on platform virtualization, distributed storage, and parallel computing, through which flexible computing resources are allocated. Edge computing can be used as an extension of cloud computing [2], [3]; it provides ubiquitous, low-latency, and reliable computing.

However, edge computing does not have the powerful computing capability of cloud computing. When a single computing node receives many computing tasks, it is prone to high latency caused by long task queues. Therefore, edge computing still faces great challenges in deployment and application.

(1) Firstly, the uncertainty of the computing task: due to the uncertainty of factors such as the size of the computing task, the length of the computing time, and the delay limit of the task, the workload between edge computing nodes may vary greatly;


(2) Secondly, the workload scheduling of a single node: the dynamic scheduling of tasks and the allocation of computing resources between nodes when multiple nodes compute collaboratively. Nodes of the same level have the same computing power but differ in their number of tasks, so coordinating the workload balance between nodes while maintaining low latency is of great significance;

(3) Finally, the time interval: the most valuable aspect of edge computing is low-latency computing, so collaborative computing must itself meet low-latency requirements.

To solve the above problems, in this work we use deep reinforcement learning agents to determine the relevant nodes for collaborative computing. Specifically, we use double deep Q-learning [4], [5] to maximize the long-term profit of collaborative computing and to ensure load balancing between nodes.

The rest of the paper is organized as follows. The second part summarizes related work on collaborative computing in edge computing. The third part describes the dynamic system model of edge collaborative computing. The fourth part describes the collaborative computing strategy based on DDQN. We provide the results of simulation experiments in the fifth part. Finally, we summarize our work and discuss directions for future work.

II. RELATED WORK

In recent years, edge computing networks based on multiple access have received extensive attention from academia and industry. Edge computing reduces latency by providing a large number of computing resources for application services that require low latency and high computational power. Cloud computing has become very popular due to its powerful computing and flexible resource allocation strategies; however, because of the long distance between the end device and the cloud, cloud computing services may not provide assurance for low-latency applications in the edge network.

To solve these problems, Edge Computing (EC) [2], [3], [6] has been studied to deploy computing resources closer to the user device, which can effectively improve the Quality of Service (QoS) of applications that require large amounts of computation and low latency. Computing a task at the edge is complicated by factors such as computing, storage, caching, networking, and energy consumption, and it is difficult to devise an offloading strategy under low-latency constraints; therefore, researchers have used game theory to solve such problems. Zheng et al. [7] introduced random games to represent the mobile user's dynamic offloading decision-making process and proposed a multi-agent random learning algorithm to solve the multi-user computation offloading problem. For the problem of MEC multi-user computation offloading in a multi-channel wireless interference environment, Chen et al. [8] proposed a game-theoretic approach and proved its advantages in energy consumption and computing execution time.

In addition, heuristic algorithms or dynamic programming methods can also be used to solve computational offloading problems. Dinh et al. [9] proposed a joint optimization computational offloading framework that improves task allocation decisions and adjusts the CPU frequency of mobile devices. Mao et al. [10] proposed a dynamic computation offloading algorithm based on Lyapunov optimization, which jointly determines the CPU frequency and offloading strategy for the MECO problem with energy harvesting equipment.

For global model training, a sorted list network with multiple losses was proposed by Sheng et al. [11] to speed up training. This method can effectively mine training samples and avoid time-consuming initialization. An online orchestration framework for cross-edge service function chains was proposed by Zhou et al. [12]; the framework dynamically optimizes flow routing and resource allocation jointly to improve overall cost efficiency as much as possible. To sample and improve network decisions in flow-aware software-defined networks, Wang et al. [13] proposed a space-time cooperative sampling (STCS) framework, and the experimental results prove the effectiveness of its sampling.

Recently, researchers have begun to use machine learning or deep learning to optimize the computational offloading strategy for edge computing. Zhang et al. [14] proposed an intermittently connected cloudlet system based on the Markov decision process for the dynamic offloading problem of mobile users. In [15], the authors studied the dynamic service migration problem in the mobile edge cloud and proposed a sequential offloading decision framework based on the Markov decision process.

Li et al. [16] proposed an RL-based optimization framework to solve the resource allocation problem in wireless MEC. The framework optimizes the offloading decision and computing resource allocation by optimizing the total cost of delay and energy consumption of all UEs. Yang et al. [17] proposed a computing resource allocation strategy based on deep reinforcement learning for URLLC edge computing networks with multiple users.

Wang et al. [18] considered the decision-making ability of reinforcement learning and the security of federated learning, and proposed a framework combining the two to optimize communication, caching, and computation on the edge side. Ren et al. [19] considered the dynamic workload and complex radio environment of the IoT, guided the decisions of IoT devices through multiple Deep Reinforcement Learning (DRL) agents, trained the DRL agents in a distributed manner through federated learning, and distributed the agents over multiple edge nodes.

For intelligent IoT applications, a framework based on the cloud-edge architecture was proposed by Liu et al. [20], which applies federated learning to make smart applications available and to address heterogeneity in the IoT environment. Zhou et al. [21] made a comprehensive review of recent studies on edge intelligence (EI). They first reviewed and analyzed the motivation and background of artificial intelligence running at the edge of the network, and then summarized several key technologies at the edge, deep learning frameworks, models, etc.


Edge intelligence builds intelligent edges by integrating DL into the edge computing framework to achieve dynamic and adaptive edge maintenance and management. Wang et al. [22] introduced and discussed the application scenarios of edge intelligence, the methods and technologies used, and the challenges for future work.

The ''Edge Artificial Intelligence'' framework is designed [18] to intelligently use the collaboration between the device and the edge nodes to exchange data and model parameters, thereby improving model training and inference. To deal with complex dynamic control problems, Wang et al. [23] proposed a FADE framework to accelerate training. Shen et al. [24] deployed deep reinforcement learning (DRL) agents on IoT devices to make offloading decisions and used federated learning (FL) to conduct distributed training of the DRL agents. Wu et al. [25] proposed a hierarchical edge artificial intelligence learning framework, HierTrain, which effectively deploys DNN training tasks on a hierarchical MECC structure.

When we consider communication, computing resource allocation, delay constraints, etc., the complexity of the edge computing system becomes very high, and it is challenging to obtain an optimal strategy in such a dynamic and complex system. Deep reinforcement learning is an improvement of reinforcement learning: a deep Q network is used to approximate the Q-value function [4] while avoiding excessively high estimates. It can be used to implement automatic resource allocation in wireless networks.

Therefore, we propose that edge nodes use deep reinforcement learning agents to determine the allocation of computing resources and maximize long-term benefits. Specifically, because of the complexity of the resource allocation problem at the edge, we use DDQN as the decision agent, which gives the edge a degree of adaptation and cooperation and the ability to maximize long-term gains while making quick decisions.

III. SYSTEM MODEL

This paper analyzes the nodes with computing ability in the edge computing environment, as shown in Figure 1. Overall, the system is divided into four levels.

FIGURE 1. Edge computing supported IoT system.

The first is the device layer where the user devices are located, including the various networked devices of the user, such as mobile phones, IoT devices, VR devices, and PCs, which establish connections with the Internet through a wireless network access point or 5G.

Secondly, there are the base stations, cellular networks, wireless network access points, etc. to which the user devices are connected. This equipment is located at the edge of the Internet, connecting users to the Internet, and is closest to the user devices. Placing edge computing nodes here greatly reduces delay and improves the user experience. This paper assumes that the base station to which a user is connected hosts an edge computing node, and this node is marked as a level 1 computing node.

Then there are the level 2 compute nodes. A level 2 compute node is located between the level 1 compute nodes and the cloud computing platform of the core network. It acts as a collaborator for the level 1 compute nodes and can coordinate caching, computation, etc. There are several level 1 compute nodes in an area, and the level 2 compute nodes are close to the user, but not as close as the level 1 nodes.

Finally, the cloud computing platform is located in the core network. The cloud computing platform stores the running environment of the user applications and the latest data, and can release the file image to a computing node near the user when needed.

A. COMMUNICATION MODEL

The system includes β level 1 computing nodes and α level 2 computing nodes, where a level 1 computing node β belongs to {1, 2, . . . , β} and a level 2 computing node α belongs to {1, 2, . . . , α}. There are γ user devices in total, and a user device γ belongs to {1, 2, . . . , γ}. The γ devices are divided into β groups, and the user devices in each group are connected to node β. For quantitative analysis, the time horizon is discretized into time epochs indexed by δ with equal duration (in seconds).

We describe the network model using a single base station β and the user devices γ connected to it. When device γ establishes a connection with base station β, the base station allocates W Hz of spectrum to the device; the channel between the device and the base station experiences time-varying Rayleigh fading and follows a flat fading model.

We denote ζ_δ^β as the channel gain during epoch δ between the device and an EN β, which is assumed static within an epoch and independently drawn from a finite state space ζ. ρ_δ^η is the transmit power, with maximum limit ρ_max^η, and Ψ is the power of interference plus noise. The transmission rate θ of the user device is calculated as follows:

$$\theta = W \log_2\!\bigl(1 + \zeta_\delta^{\beta}\,\rho_\delta^{\eta}/\Psi\bigr) \tag{1}$$
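As a concrete illustration of Eq. (1), the following minimal Python sketch (Python being the language used for the simulations in Section V) computes the uplink rate of one device. The numeric values for bandwidth, channel gain, transmit power, and interference-plus-noise power are illustrative assumptions, not parameters reported in the paper.

```python
import math

def transmission_rate(bandwidth_hz, channel_gain, tx_power, interference_noise):
    """Shannon-type uplink rate of Eq. (1): theta = W * log2(1 + zeta * rho / Psi)."""
    return bandwidth_hz * math.log2(1.0 + channel_gain * tx_power / interference_noise)

# Illustrative values only (not taken from the paper's experiments).
W = 1e6        # W: allocated spectrum, 1 MHz
zeta = 0.8     # zeta_delta^beta: channel gain in the current epoch
rho = 0.5      # rho_delta^eta: transmit power (W), below rho_max
Psi = 1e-3     # Psi: interference-plus-noise power (W)

theta = transmission_rate(W, zeta, rho, Psi)   # bits per second
print(f"uplink rate: {theta / 1e6:.2f} Mbit/s")
```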


B. COMPUTING MODEL

We assume that every computing node in the system can be virtualized and that the application running environment image of each user device can be found in the cloud. Each level 1 compute node has an IP address and a table of the remaining computing resources of the other level 1 nodes connected to it and of the level 2 computing node α of the upper level. Within a certain period of time δ, the node connects to the surrounding nodes and one level 2 node α. The level 1 node β broadcasts its own remaining computing resources χ_β = (C_β, S_β) and accepts the remaining-resource broadcasts χ_β′ from other nodes, updating its local resource table based on the broadcast content.

We treat γ_β as a set of service requesters, where each requester γ belongs to γ_β, connects to the nearest base station β according to its signal strength, and then sends a request to the compute node β where the base station is located, where R_β^γ = (D_s, T_l, C_r, S_r) is the computing request sent from user γ to node β. D_s includes the task data and the globally unique identifier (GUID) of the image file, T_l is the computational delay limit at node β, C_r is the required computing resource size, and S_r is the required storage resource size.

After node β receives the task request R_β^γ, it first checks its remaining computing resources χ_β = (C_β, S_β). If the remaining resources satisfy the user's requirement, C_r < C_β and S_r < S_β, the service is started. During the service start phase, node β searches its local cache for the image resource required by the user's task. If the image resource is in the local cache, the image is loaded from the local cache and the computing service is started. If there is no cached image locally, the node downloads the image file from the cloud platform to node β and then starts the computing service. When the calculation ends, node β returns the computing result to the user device, completing the computing task of this period; it then waits for the user's task of the next period, or the user ends the computing command and pays for the task R_β^γ.

If the remaining computing resources χ_β at node β cannot satisfy the user's requirement, i.e., C_r > C_β or S_r > S_β, the service cannot be started locally, and node β forwards the user's request to a node in its resource table that meets the requirement. If that node x accepts the computing task, the user's computing service is completed by node x, with base station β acting as a relay.

If no node in the resource table meets the user's requirements, the length of the queue determines whether the task is placed in the task queue or offloaded to the cloud. If the queue at node β is not full, φ_β < φ_β^max, the computing task is placed in the local task queue, which is a first-in-first-out (FIFO) queue; if the task queue at node β is full, φ_β = φ_β^max, the computing task is offloaded to the cloud. In summary, the offloading policy ω_β^γ for the user computing task R_β^γ is:

$$\omega_\beta^{\gamma}=\begin{cases}0, & \text{if } C_r<C_\beta,\ S_r<S_\beta \text{ and } \phi_\beta=0 \quad (\text{node } \beta);\\ 1, & \text{if } C_r<C_x,\ S_r<S_x \text{ and } \phi_x=0 \quad (\text{node } x);\\ 2, & \text{if } \phi_\beta<\phi_\beta^{\max} \quad (\text{queue } \phi_\beta);\\ 3, & \text{offload task } R_\beta^{\gamma} \text{ to the cloud.}\end{cases} \tag{2}$$

C. PAYMENT STRATEGY

After the user's computing request R_β^γ is completed by node β and the computing result is returned to the user device, the user device pays node β according to the completion delay λ_δ^β. If the computation is completed within the time limit T_l, the user pays according to the actual delay; if the edge node exceeds the time limit, the user does not pay; if the task fails, the node incurs a penalty η:

$$z_\beta^{\gamma}=\begin{cases}\pi\,\lambda_\delta^{\beta}, & \text{if } \lambda_\delta^{\beta}<T_l,\ \pi \text{ is the price of the edge};\\ 0, & \text{if } \lambda_\delta^{\beta}>T_l;\\ \eta, & \text{if task } R_\beta^{\gamma} \text{ failed, with } \eta<0.\end{cases} \tag{3}$$

The delay λ_δ^β in this paper is defined as the time interval between when the user equipment initiates the computing request and when the device receives the node's computing result. If the computing task is placed at the base station β to which the user device is connected, the computing delay λ_δ^β can be expressed as:

$$\lambda_\delta^{\beta}=\sigma_\gamma+\sigma_{r'}+h_\beta+\sigma_\beta \tag{4}$$

where σ_γ is the time spent transmitting the task R_β^γ and σ_{r'} is the time it takes to transmit the result back. h_β is the time taken by computing node β to switch to the computing task at the base station, which is a small fixed value.

$$\sigma_\gamma=D_s/\theta \tag{5}$$

σ_β is the time required for the computing node to complete the computing task. It is usually related to the CPU frequency f_cpu, the size of the CPU cache c_cpu, and the data size D_s of the task:

$$\sigma_\beta=D_s/(f_{cpu}\, c_{cpu}) \tag{6}$$

If the computing task is placed at a neighboring base station β′, the delay λ_δ^{β′} is:

$$\lambda_\delta^{\beta'}=\sigma_\gamma+\sigma_{r'}+h_{\beta'}+\sigma_{\beta'}+2\, d_\beta^{\beta'}/c \tag{7}$$

where d_β^{β′} is the fixed distance between base station β and the neighbor node β′, and c is the speed of light. Finally, our objective function is:

$$\max \sum_{\beta\in\{\beta,\alpha\}}\sum_{\delta} z_\beta^{\gamma} \qquad \text{s.t. } \forall\gamma\in\gamma,\ \forall\beta\in\{\beta,\alpha\},\quad C_r\ge 0,\ S_r\ge 0 \tag{8}$$
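To make the computing and payment models of Section III concrete, the sketch below strings together the offloading policy (2), the delay models (4)-(7), and the payment rule (3) for a single request. It is a minimal illustration under assumed units; the Node dataclass, the helper names, and all numeric defaults are ours, not part of the paper.

```python
from dataclasses import dataclass

C_LIGHT = 3e8  # speed of light (m/s), used in Eq. (7)

@dataclass
class Node:
    cpu: float       # remaining computing resource C
    storage: float   # remaining storage resource S
    f_cpu: float     # CPU frequency (Hz)
    c_cpu: float     # CPU cache factor used in Eq. (6)
    queue: int       # current queue length phi
    queue_max: int   # phi_max

def offload_decision(req_c, req_s, local, neighbor):
    """Offloading policy of Eq. (2): 0 = local node, 1 = neighbor node x, 2 = local queue, 3 = cloud."""
    if req_c < local.cpu and req_s < local.storage and local.queue == 0:
        return 0
    if req_c < neighbor.cpu and req_s < neighbor.storage and neighbor.queue == 0:
        return 1
    if local.queue < local.queue_max:
        return 2
    return 3

def local_delay(data_size, rate, node, sigma_result=0.0, h_switch=0.001):
    """Eq. (4)-(6): uplink time + result-return time + switching time + computing time."""
    sigma_gamma = data_size / rate                       # Eq. (5)
    sigma_beta = data_size / (node.f_cpu * node.c_cpu)   # Eq. (6)
    return sigma_gamma + sigma_result + h_switch + sigma_beta

def neighbor_delay(data_size, rate, node, distance_m, sigma_result=0.0, h_switch=0.001):
    """Eq. (7): the same terms evaluated at node beta' plus the round-trip propagation 2*d/c."""
    return local_delay(data_size, rate, node, sigma_result, h_switch) + 2.0 * distance_m / C_LIGHT

def payment(delay, deadline, price=1.0, penalty=-30.0, failed=False):
    """Payment rule of Eq. (3): pay by delay if on time, nothing if late, a penalty eta < 0 if failed."""
    if failed:
        return penalty
    return price * delay if delay < deadline else 0.0
```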


IV. COLLABORATIVE COMPUTING STRATEGY BASED ON DOUBLE DEEP Q-LEARNING

To aid understanding of the DDQN agent, we briefly introduce DDQN in this section, starting with reinforcement learning. Reinforcement learning is an important branch of machine learning; agents in reinforcement learning can learn the actions that maximize the return through interaction with the environment. Unlike supervised learning, reinforcement learning does not learn from provided samples. Instead, the agent acts and learns from its own experience in an uncertain environment.

Algorithm 1 Collaborative Computing Framework
1: Initialize:
2: Each edge node loads the DDQN model as its agent;
3: Compute node β broadcasts its remaining computing power χ_β = (C_β, S_β) to other nodes;
4: Node β receives the remaining-computing-power broadcasts and adds them to the resource table C;
5: If there is a computing task R_β^γ = (D_s, T_l, C_r, S_r):
6: The agent takes the node β status s and the task R_β^γ as input, s = C;
7: Generate a decision policy according to action a;
8: Switch(a):
9: case 0: Immediately allocate computing resources and perform the computing task;
10: case 1: Put the task into the local task queue φ and wait to allocate computing resources;
11: case 2: Send the task to a node in the computing resource table;
12: case 3: Send the task to the cloud computing platform;
13: Return: Send the results to the user device;
14: The user pays the node according to the computing delay and the amount of task resources allocated.

Reinforcement learning has two salient features: trial and error, and delayed rewards. Trial and error means weighing the trade-off between exploration and exploitation. Agents will try effective actions that generated rewards in past experience, but in order to obtain higher returns there is also a certain probability of exploring new actions. Agents must take a variety of actions and gradually learn the best of them. Another feature of reinforcement learning is that agents should take a global view, considering not only immediate rewards but also long-term cumulative rewards, which are specified by the reward function.

Model-free reinforcement learning has been successfully combined with deep neural networks as value-function approximators [4]. It can directly use the raw state representation as the neural network input to learn policies for difficult tasks [5]. Q-learning is a model-free reinforcement learning algorithm. The most important component of the Q-learning algorithm is a method for correctly and effectively estimating the Q value. Q-functions can be implemented simply by look-up tables or by function approximators, sometimes nonlinear approximators such as neural networks or even deeper networks. Combining Q-learning with deep neural networks gives the so-called deep Q-network (DQN). The formula for Q-learning is:

$$Q^{\pi}(s,a)=\mathbb{E}\bigl[R_1+\gamma R_2+\cdots \mid S_0=s,\ A_0=a,\ \pi\bigr] \tag{9}$$

The parameter update formula is:

$$\theta_{t+1}=\theta_t+\alpha\bigl(Y_t^{Q}-Q(S_t,A_t;\theta_t)\bigr)\nabla_{\theta_t}Q(S_t,A_t;\theta_t) \tag{10}$$

in which Y_t^Q is defined as:

$$Y_t^{Q}=R_{t+1}+\gamma\,\max_a Q(S_{t+1},a;\theta_t) \tag{11}$$

The target of deep Q-learning is:

$$Y_t^{DQN}=R_{t+1}+\gamma\,\max_a Q(S_{t+1},a;\theta_t') \tag{12}$$

FIGURE 2. Decision Agent Based on DDQN.

The improved DQN is double deep Q-learning; the DDQN-based agent is shown in Figure 2. In conventional DQN, selecting an action and evaluating the selected action both use the same maximum Q value, which results in an overly optimistic estimate of the Q value. In order to alleviate the overestimation problem, the target value in DDQN is designed and updated as

$$Y_t^{Q}=R_{t+1}+\gamma\,Q\bigl(S_{t+1},\ \arg\max_a Q(S_{t+1},a;\theta_t);\ \theta_t\bigr) \tag{13}$$

The target used in the DDQN error function is rewritten as:

$$Y_t^{DoubleQ}=R_{t+1}+\gamma\,Q\bigl(S_{t+1},\ \arg\max_a Q(S_{t+1},a;\theta_t);\ \theta_t'\bigr) \tag{14}$$

Here, the action selection is separated from the generation of the target Q value. This simple technique significantly reduces the overestimation and makes the training process run faster and more reliably.
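A minimal NumPy sketch of the targets in (12) and (14) is given below; q_online_next and q_target_next stand for the outputs of the main and target networks on the next state, and the batch values are invented for illustration only.

```python
import numpy as np

def dqn_target(reward, q_target_next, gamma=0.99):
    """Eq. (12): bootstrap with the maximum Q-value of the target network."""
    return reward + gamma * np.max(q_target_next, axis=1)

def double_dqn_target(reward, q_online_next, q_target_next, gamma=0.99):
    """Eq. (14): select the action with the online network, evaluate it with the target network."""
    best_actions = np.argmax(q_online_next, axis=1)
    evaluated = q_target_next[np.arange(len(best_actions)), best_actions]
    return reward + gamma * evaluated

# Illustrative batch of 3 transitions with 4 possible actions.
rewards = np.array([1.0, -30.0, 2.5])
q_online_next = np.random.randn(3, 4)   # Q(s', a; theta) from the main network
q_target_next = np.random.randn(3, 4)   # Q(s', a; theta^-) from the target network
print(dqn_target(rewards, q_target_next))
print(double_dqn_target(rewards, q_online_next, q_target_next))
```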


Algorithm 2 Collaborative Edge Computing Strategy Based on Double Deep Q-Learning
1: Initialization:
2: Initialize the replay memory R and the memory capacity M;
3: Main deep-Q network with random weights θ;
4: Target deep-Q network with weights θ⁻ = θ;
5: For epoch i in I:
6: Input the system state s into the main Q-network;
7: Compute the Q-value Q(s, a; θ);
8: Input the next system state s′ into the main Q-network;
9: Compute the Q-value Q(s′, a; θ);
10: Input the next system state s′ into the target Q-network;
11: Compute the Q-value Q(s′, a; θ⁻);
12: Compute the target Q-value
13: Y = p(s, a) + γ Q(s′, argmax_a Q(s′, a; θ); θ⁻);
14: Output: action a;
15: Record the changed status s″ and the reward z after action a to memory R;
16: End For
17: Save model.
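The fragment below sketches one training update in the spirit of Algorithm 2, using Keras (listed later among the experiment dependencies). The network architecture and the state/action dimensions are illustrative assumptions; the replay capacity of 200, the learning rate of 0.001, and the 100-step target-network refresh follow the values reported in Section V.

```python
import random
from collections import deque

import numpy as np
from tensorflow import keras

STATE_DIM, N_ACTIONS = 8, 4          # assumed sizes of the resource-table state and the action set
GAMMA, BATCH = 0.99, 32              # illustrative discount factor and mini-batch size

def build_q_network():
    model = keras.Sequential([
        keras.layers.Dense(64, activation="relu", input_shape=(STATE_DIM,)),
        keras.layers.Dense(64, activation="relu"),
        keras.layers.Dense(N_ACTIONS, activation="linear"),
    ])
    model.compile(optimizer=keras.optimizers.Adam(learning_rate=0.001), loss="mse")
    return model

main_net = build_q_network()                   # theta
target_net = build_q_network()                 # theta^-
target_net.set_weights(main_net.get_weights())
replay = deque(maxlen=200)                     # replay memory R with capacity M = 200
# After every interaction: replay.append((state, action, reward, next_state))

def train_step():
    """One double-DQN update over a sampled mini-batch (steps 6-13 of Algorithm 2)."""
    if len(replay) < BATCH:
        return
    batch = random.sample(list(replay), BATCH)
    s, a, r, s2 = (np.array(x) for x in zip(*batch))
    q_online_next = main_net.predict(s2, verbose=0)     # Q(s', a; theta)
    q_target_next = target_net.predict(s2, verbose=0)   # Q(s', a; theta^-)
    best = np.argmax(q_online_next, axis=1)             # action selection by the main network
    y = main_net.predict(s, verbose=0)
    y[np.arange(BATCH), a] = r + GAMMA * q_target_next[np.arange(BATCH), best]
    main_net.train_on_batch(s, y)
    # Every 100 training steps the target network is refreshed:
    # target_net.set_weights(main_net.get_weights())
```

The epsilon-greedy exploration and the environment interaction itself are omitted here; this only illustrates how the separated action selection and evaluation of Eq. (14) appear inside the training step.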

FIGURE 5. The number of failed tasks changes during agent training.

FIGURE 3. Rewards obtained by the agents during training.

V. SIMULATION RESULTS

A. EXPERIMENT SETUP

This paper uses a simulation experiment method: user devices and edge computing nodes are instantiated for simulation through Python programming. The operating system used in the experiment was CentOS 7, the processor was an Intel E5-2650 V4, the storage was a 480 GB SSD plus a 4 TB enterprise hard disk, and the memory was 32 GB. The code interpreter is Python 3.6, and the code runtime dependencies include TensorFlow, Keras, NumPy, SciPy, Matplotlib, CUDA, etc.

The experimental data include the computational tasks that users offload to the edge node in time period i, which are randomly generated by calling the Bernoulli and Poisson functions in the SciPy library. The experiment assumes that the user device has the ability to connect to the network and can offload computing tasks and receive computing results.

Experiments in this paper compared double deep Q-learning (DDQN), deep Q-learning (DQN), dueling deep Q-learning, and natural Q-learning. The learning rate is set to 0.001, the replay memory size is 200, and the total number of training steps is 12000; the neural network update cycle was set to every 100 steps. The penalty for task failure is −30. After a task is successfully completed, the agent receives a reward whose size is closely related to the time the task took to complete, the time the task took to transmit, and the total time spent offloading the computation. In the experiment, the wireless channel was set to 6 different levels.

B. RESULT ANALYSIS

The reward obtained by the agents in the experiment is shown in Figure 3. In the period when training started, the neural network weights had just been initialized and the agent could not give good offloading decisions. At this stage decisions were generated randomly, so the computation of the received offloaded tasks often failed and was punished, and the total reward was negative.

However, as training proceeds and the neural networks are updated, the agent learns to make correct offloading decisions and obtains more rewards than punishments, so the total reward continues to increase with training. It is also evident that agents based on DDQN obtain more rewards during training.

In order to show more intuitively the difference in rewards obtained by the different neural network agents, the zoomed-in total reward curves are shown in Figure 4.

FIGURE 4. Detail of rewards obtained by agents (6000-12000 steps).


Figure 5 shows the change in the number of task failures during the training process. At the beginning of training, task computations failed frequently and the total number of task failures increased rapidly. As training progressed, the rate of increase of task failures decreased and eventually reached a stable level. Observation shows that the DDQN-based agent has fewer task failures during the training process.

In order to show more intuitively the difference in the number of failed tasks of the different neural network agents, the zoomed-in curves are shown in Figure 6; the number of failures of the DDQN-based agent is always lower than that of the other reinforcement learning agents.

FIGURE 6. Detailed changes in the number of failed tasks (6000-12000 steps).
FIGURE 7. Agent training loss.

Figure 7 shows the change of the loss during training. The loss function of DDQN is defined as the square of the difference between the estimated Q value and the target value; the loss is recorded every 6 training steps. The training loss of DDQN is lower than that of the other reinforcement learning methods at the beginning. As the weights of the neural network are updated, the loss becomes smaller and smaller, while the training losses of the several other reinforcement learning agents still fluctuate.

FIGURE 8. Distribution map of transmission time when tasks are offloaded.

The user data transmission times recorded in the experiment are shown in Figure 8. On the whole, the propagation time of user data is roughly normally distributed, but it is irregular. In the experiment, some users have more data but a poor network channel, so the long transmission time leads to high delay, while other users have less data and a better network, so the transmission time is shorter.

FIGURE 9. Value changes in agents.

Figure 9 shows the values calculated by the value function in the agents. At the beginning of training the value cannot be estimated well, but as training progresses the value estimate continues to approach the true level and eventually stabilizes.

VI. CONCLUSION AND FUTURE WORK

In this paper, we considered the bandwidth, computing, and cache resources of the ENs and, benefiting from the powerful learning and decision-making ability of deep learning, maximized the profit of the edge while satisfying the low-latency computing requirements of users at the edge.


In addition, we also considered the horizontal and vertical coordination of caching and computing at the edge, which gives a degree of adaptability and can fully coordinate the computing resources at the edge to maximize their value. However, the DDQN on the EN has a long training period and its effect is initially unstable; it needs to train for a while before it makes good decisions. In addition, when multiple ENs in the same group perform collaborative computing, we have not studied the prioritization of computing resources or a computing-resource bidding strategy. Future work will focus on competitive bidding and allocation priorities. The security of users at the edge is also a focus of future research.

REFERENCES
[1] M. Armbrust, A. Fox, R. Griffith, A. D. Joseph, R. Katz, A. Konwinski, G. Lee, D. Patterson, A. Rabkin, I. Stoica, and M. Zaharia, ''A view of cloud computing,'' Int. J. Networked Distrib. Comput., vol. 53, no. 4, pp. 50–58, 2013.
[2] W. Shi, J. Cao, Q. Zhang, Y. Li, and L. Xu, ''Edge computing: Vision and challenges,'' IEEE Internet Things J., vol. 3, no. 5, pp. 637–646, Oct. 2016.
[3] M. Satyanarayanan, ''The emergence of edge computing,'' Computer, vol. 50, no. 1, pp. 30–39, Jan. 2017.
[4] H. H. Van, A. Guez, and D. Silver, ''Deep reinforcement learning with double Q-learning,'' in Proc. 13th AAAI Conf. Artif. Intell., 2016, pp. 1–13.
[5] V. Mnih, K. Kavukcuoglu, D. Silver, A. A. Rusu, J. Veness, M. G. Bellemare, A. Graves, M. Riedmiller, A. K. Fidjeland, G. Ostrovski, S. Petersen, C. Beattie, A. Sadik, I. Antonoglou, H. King, D. Kumaran, D. Wierstra, S. Legg, and D. Hassabis, ''Human-level control through deep reinforcement learning,'' Nature, vol. 518, no. 7540, p. 529, 2015.
[6] Y. C. Hu, M. Patel, D. Sabella, N. Sprecher, and V. Young, ''Mobile edge computing: A key technology towards 5G,'' ETSI White Paper, vol. 11, no. 11, pp. 1–16, Sep. 2015.
[7] J. Zheng, Y. Cai, Y. Wu, and X. S. Shen, ''Stochastic computation offloading game for mobile cloud computing,'' in Proc. IEEE/CIC Int. Conf. Commun. China (ICCC), Jul. 2016, pp. 1–6.
[8] X. Chen, L. Jiao, W. Li, and X. Fu, ''Efficient multi-user computation offloading for mobile-edge cloud computing,'' IEEE/ACM Trans. Netw., vol. 24, no. 5, pp. 2795–2808, Oct. 2016.
[9] T. Quang Dinh, J. Tang, Q. Duy La, and T. Q. S. Quek, ''Offloading in mobile edge computing: Task allocation and computational frequency scaling,'' IEEE Trans. Commun., vol. 65, no. 8, pp. 3571–3584, Aug. 2017.
[10] Y. Mao, J. Zhang, and K. B. Letaief, ''Dynamic computation offloading for mobile-edge computing with energy harvesting devices,'' IEEE J. Sel. Areas Commun., vol. 34, no. 12, pp. 3590–3605, Dec. 2016.
[11] H. Sheng, Y. Zheng, W. Ke, D. Yu, X. Cheng, W. Lyu, and Z. Xiong, ''Mining hard samples globally and efficiently for person re-identification,'' IEEE Internet Things J., early access, Mar. 13, 2020, doi: 10.1109/JIOT.2020.2980549.
[12] Z. Zhou, Q. Wu, and X. Chen, ''Online orchestration of cross-edge service function chaining for cost-efficient edge computing,'' IEEE J. Sel. Areas Commun., vol. 37, no. 8, pp. 1866–1880, Aug. 2019, doi: 10.1109/JSAC.2019.2927070.
[13] X. Wang, X. Li, S. Pack, Z. Han, and V. C. M. Leung, ''STCS: Spatial-temporal collaborative sampling in flow-aware software defined networks,'' IEEE J. Sel. Areas Commun., vol. 38, no. 6, pp. 999–1013, Jun. 2020, doi: 10.1109/JSAC.2020.2986688.
[14] Y. Zhang, D. Niyato, and P. Wang, ''Offloading in mobile cloudlet systems with intermittent connectivity,'' IEEE Trans. Mobile Comput., vol. 14, no. 12, pp. 2516–2529, Dec. 2015.
[15] S. Wang, R. Urgaonkar, M. Zafer, T. He, K. Chan, and K. K. Leung, ''Dynamic service migration in mobile edge-clouds,'' in Proc. IFIP Netw. Conf. (IFIP Netw.), May 2015, pp. 1–9.
[16] J. Li, H. Gao, T. Lv, and Y. Lu, ''Deep reinforcement learning based computation offloading and resource allocation for MEC,'' in Proc. IEEE Wireless Commun. Netw. Conf. (WCNC), Apr. 2018, pp. 1–6.
[17] T. Yang, Y. Hu, M. C. Gursoy, A. Schmeink, and R. Mathar, ''Deep reinforcement learning based resource allocation in low latency edge computing networks,'' in Proc. 15th Int. Symp. Wireless Commun. Syst. (ISWCS), Aug. 2018, pp. 1–5.
[18] X. Wang, Y. Han, C. Wang, Q. Zhao, X. Chen, and M. Chen, ''In-edge AI: Intelligentizing mobile edge computing, caching and communication by federated learning,'' IEEE Netw., vol. 33, no. 5, pp. 156–165, Sep. 2019.
[19] J. Ren, H. Wang, T. Hou, S. Zheng, and C. Tang, ''Federated learning-based computation offloading optimization in edge computing-supported Internet of Things,'' IEEE Access, vol. 7, pp. 69194–69201, 2019.
[20] D. Liu, X. Chen, Z. Zhou, and Q. Ling, ''HierTrain: Fast hierarchical edge AI learning with hybrid parallelism in mobile-edge-cloud computing,'' IEEE Open J. Commun. Soc., vol. 1, pp. 634–645, 2020, doi: 10.1109/OJCOMS.2020.2994737.
[21] Z. Zhou, X. Chen, E. Li, L. Zeng, K. Luo, and J. Zhang, ''Edge intelligence: Paving the last mile of artificial intelligence with edge computing,'' Proc. IEEE, vol. 107, no. 8, pp. 1738–1762, Aug. 2019, doi: 10.1109/JPROC.2019.2918951.
[22] X. Wang, Y. Han, V. C. M. Leung, D. Niyato, X. Yan, and X. Chen, ''Convergence of edge computing and deep learning: A comprehensive survey,'' IEEE Commun. Surveys Tuts., vol. 22, no. 2, pp. 869–904, 2nd Quart., 2020, doi: 10.1109/COMST.2020.2970550.
[23] X. Wang, C. Wang, X. Li, V. C. M. Leung, and T. Taleb, ''Federated deep reinforcement learning for Internet of Things with decentralized cooperative edge caching,'' IEEE Internet Things J., early access, Apr. 9, 2020, doi: 10.1109/JIOT.2020.2986803.
[24] S. Shen, Y. Han, X. Wang, and Y. Wang, ''Computation offloading with multiple agents in edge-computing-supported IoT,'' ACM Trans. Sensor Netw., vol. 16, no. 1, pp. 1–27, Feb. 2020, doi: 10.1145/3372025.
[25] Q. Wu, K. He, and X. Chen, ''Personalized federated learning for intelligent IoT applications: A cloud-edge based framework,'' IEEE Open J. Comput. Soc., vol. 1, pp. 35–44, 2020, doi: 10.1109/OJCS.2020.2993259.

JIANJI REN received the B.S. degree from the Department of Mathematics, Jinan University, in 2005, and the M.S. and Ph.D. degrees from the School of Computer Science and Engineering, Dong-A University, in 2007 and 2010, respectively. He is currently an Associate Professor with the College of Computer Science and Technology, Henan Polytechnic University. His current research interests include mobile content-centric networks and collaborative caching in edge computing.

HAICHAO WANG received the B.S. degree in natural geography and resource environment from Henan Polytechnic University, Jiaozuo, Henan, China, in 2018, where he is currently pursuing the master's degree in software engineering with the College of Computer Science and Technology (Software College). His research interests include edge computing, edge caching, big data analysis, deep learning, and the Internet of Things technology.

TINGTING HOU is currently pursuing the B.S. degree with the College of Computer Science and Technology, Henan Polytechnic University, Jiaozuo, Henan, China. Her current major interests include edge computing, edge caching, deep learning, big data analysis, and the Internet of Things technology.


SHUAI ZHENG is currently pursuing the B.S. degree with the College of Computer Science and Technology, Henan Polytechnic University, Jiaozuo, Henan, China. His current major interests include edge computing, edge caching, deep learning, big data analysis, and data mining.

CHAOSHENG TANG received the Ph.D. degree in management science and engineering from Yanshan University, Qinhuangdao, Hebei, China, in 2015. His major interests include machine learning, complexity theory, multimedia applications, and online social networks.