Collaborative Edge Computing and Caching With Deep Reinforcement Learning Decision Agents
Received June 13, 2020, accepted June 30, 2020, date of publication July 3, 2020, date of current version July 14, 2020.
Digital Object Identifier 10.1109/ACCESS.2020.3007002
ABSTRACT Large amounts of data will be generated by the rapid development of Internet of Things (IoT) technologies and 5th-generation mobile networks (5G), and the processing and analysis requirements of this big data will challenge existing networks and processing platforms. As one of the most promising technologies in 5G networks, edge computing can greatly ease the pressure on the network and on data processing and analysis at the edge. In this paper, we consider the coordination of computing and cache resources among multi-level edge computing nodes (ENs); users in this system can offload computing tasks to ENs to improve quality of service (QoS). We aim to maximize the long-term profit at the edge while satisfying the low-latency computing requirements of users, jointly optimizing the edge-side offloading strategy and resource allocation. However, obtaining an optimal strategy in such a dynamic and complex system is challenging. To solve the complex resource allocation problem at the edge and to give the edge a degree of adaptation and cooperation, we use double deep Q-learning (DDQN) to make decisions, which maximizes long-term gains while allowing quick decisions. The simulation results prove the effectiveness of DDQN in maximizing revenue when allocating resources at the edge.
multiple nodes. The nodes of the same level have the same computing power, but they differ in the number of tasks. It is therefore important to coordinate the workload balance between nodes while maintaining low latency.

(3) Finally, the time interval: the most valuable property of edge computing is low-latency computing, so collaborative computing must also meet low-latency requirements.

To solve the above problems, in this work we use deep reinforcement learning agents to determine the relevant nodes for collaborative computing. Specifically, we use double deep Q-learning [4], [5] to maximize the long-term profit of collaborative computing and to ensure load balancing between nodes.

The rest of the paper is organized as follows: The second part summarizes related work on collaborative computing in edge computing. The third part describes the dynamic system model of edge collaborative computing. The fourth part describes the collaborative computing strategy based on DDQN. We provide the results of simulation experiments in the fifth part. Finally, we summarize our work and discuss directions for future work.

II. RELATED WORK
In recent years, multi-access edge computing networks have received extensive attention from academia and industry. Edge computing reduces latency by providing a large amount of computing resources to application services that require low latency and high computational power. Although cloud computing has become very popular due to its powerful computing and flexible resource allocation strategies, the long distance between the end device and the cloud means that cloud computing services may not provide assurance for low-latency applications at the network edge.

To solve these problems, Edge Computing (EC) [2], [3], [6] has been studied to deploy computing resources closer to the user device, which can effectively improve the Quality of Service (QoS) of applications that require large amounts of computation and low latency. Computing a task at the edge is complicated by factors such as computing, storage, caching, network, and energy consumption, and it is difficult to devise an offloading strategy under low-latency constraints; researchers have therefore used game theory to solve such problems. Zheng et al. [7] introduced stochastic games to represent the mobile user's dynamic offloading decision process and proposed a multi-agent stochastic learning algorithm to solve the multi-user computation offloading problem. For the problem of MEC multi-user computation offloading in a multi-channel wireless interference environment, Chen et al. [8] proposed a game-theoretic approach and proved the advantages of the algorithm in energy consumption and computing execution time.

In addition, heuristic algorithms or dynamic programming methods can also be used to solve computation offloading problems. Dinh et al. [9] proposed a joint optimization computational offloading framework that can improve task allocation decisions and adjust the CPU frequency of mobile devices. Mao et al. [10] proposed a dynamic computation offloading algorithm based on Lyapunov optimization; the algorithm can jointly determine the CPU frequency and offloading strategy for the MECO problem with energy-harvesting equipment.

For global model training, a sorted-list network with multiple losses was proposed by Sheng et al. [11] to speed up training; this method can effectively mine training samples and avoid time-consuming initialization. An online orchestration framework for cross-edge service function chains was proposed by Zhou et al. [12]; the framework can dynamically and jointly optimize flow routing and resource allocation to improve the overall cost efficiency as much as possible. To sample and improve network decisions in flow-aware software-defined networks, Wang et al. [13] proposed a spatial-temporal collaborative sampling (STCS) framework, and the experimental results prove the effectiveness of its sampling.

Recently, researchers have begun to use machine learning or deep learning to optimize the computation offloading strategy for edge computing. Zhang et al. [14] proposed an intermittently connected cloudlet system based on the Markov decision process for the dynamic offloading problem of mobile users. In the literature [15], the authors studied the dynamic service migration problem in the mobile edge cloud and proposed a sequential offloading decision framework based on the Markov decision process.

Li et al. [16] proposed an RL-based optimization framework to solve the resource allocation problem in wireless MEC. The framework optimizes the offloading decision and computing resource allocation with respect to the total cost of delay and energy consumption of all UEs. Yang et al. [17] proposed a computing resource allocation strategy based on deep reinforcement learning for URLLC edge computing networks with multiple users.

Wang et al. [18] considered the decision-making ability of reinforcement learning and the security of federated learning, and proposed a framework combining the two to optimize communication, caching, and computation on the edge side. Ren et al. [19] considered the dynamic workload and complex radio environment of the IoT, guiding the decisions of IoT devices through multiple Deep Reinforcement Learning (DRL) agents; the DRL agents are trained with federated learning and are themselves distributed across multiple edge nodes. For intelligent IoT applications, a framework based on the cloud-edge architecture was proposed by Liu et al. [20], which applies federated learning to make smart applications available and to address heterogeneity in the IoT environment. Zhou et al. [21] made a comprehensive review of recent studies on edge intelligence (EI). They first reviewed and analyzed the motivation and background of artificial intelligence running at the edge of the network, and then summarized several key technologies at the edge, such as deep learning frameworks and models. Edge intelligence builds intelligent edges by
level. Within a certain period of time $\delta$, the node connects to the surrounding nodes and to one level-2 node $\alpha$. The level-1 node $\beta$ broadcasts its own remaining computing resource $\chi_\beta = (C_\beta, S_\beta)$, accepts the remaining computing resources $\chi_{\beta'}$ broadcast by other nodes, and updates its local resource table based on the broadcast content.

We treat $\gamma_\beta$ as a set of service requesters. Each requester $\gamma \in \gamma_\beta$ connects to the nearest base station $\beta$ according to its signal strength and then sends a request to the compute node $\beta$ at that base station, where $R_\beta^\gamma = (D_s, T_l, C_r, S_r)$ is the computing request sent from user $\gamma$ to node $\beta$. Here $D_s$ includes the task data and the image file's globally unique identifier (GUID), $T_l$ is the computational delay limit, $C_r$ is the required computing resource size, and $S_r$ is the required storage resource size.
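To make the request and resource notation concrete, a minimal Python sketch of these tuples is shown below (the class and field names are illustrative, not from the paper):

```python
from dataclasses import dataclass

@dataclass
class ComputeRequest:
    """User request R = (Ds, Tl, Cr, Sr) sent to a level-1 node."""
    data_size: float         # Ds: task data plus image-file GUID payload
    delay_limit: float       # Tl: computational delay limit (seconds)
    cpu_required: float      # Cr: computing resource size required
    storage_required: float  # Sr: storage resource size required

@dataclass
class NodeResources:
    """Remaining resources chi = (C, S) that a node broadcasts each interval."""
    cpu_free: float      # C: remaining computing resource
    storage_free: float  # S: remaining storage resource
```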
After node $\beta$ receives the task request $R_\beta^\gamma$, it first checks the remaining computing resources $\chi_\beta = (C_\beta, S_\beta)$ at node $\beta$; if the remaining resources meet the user's requirements, $C_r < C_\beta$ and $S_r < S_\beta$, the service is started. During the service-start phase, node $\beta$ searches its local cache area for the image resource required by the user's task. If the image resource is in the local cache area, the image is loaded from there and the computing service is started. If there is no cached image locally, the node downloads the image file from the cloud platform to node $\beta$ and then starts the computing service. When the computation ends, node $\beta$ returns the computing result to the user device, completing the computing task for this period; the user then either submits a task for the next period or ends the computing session and pays for the task $R_\beta^\gamma$.
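The service-start logic above is essentially a cache-aside lookup keyed on the image file's GUID. A minimal sketch, assuming illustrative `node.cache`, `node.download_from_cloud`, and `node.run` interfaces that are not from the paper:

```python
def start_service(node, request, guid):
    """Sketch of the service-start phase: load the task's image from the
    local cache if present, otherwise pull it from the cloud platform."""
    image = node.cache.get(guid)                # look up the image by its GUID
    if image is None:                           # cache miss
        image = node.download_from_cloud(guid)  # fetch image from the cloud platform
        node.cache[guid] = image                # keep the image for later requests
    return node.run(image, request)             # execute and return the result
```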
If the remaining computing resource $\chi_\beta$ at node $\beta$ cannot satisfy the user's computing requirements, $C_r > C_\beta \,\|\, S_r > S_\beta$, the service cannot be started locally, and node $\beta$ forwards the user request to a node in its computing resource table that meets the user's requirements. If that node $x$ accepts the computing task, the user's computing service is completed by node $x$, with base station $\beta$ acting as a relay.

If no node in the computing resource table meets the user's requirements, the queue length determines whether the task is placed in the task queue or offloaded to the cloud. If the queue at node $\beta$ is not full, $\phi_\beta < \phi_\beta^{max}$, the computing task is placed in the local task queue, which follows a first-in-first-out (FIFO) model; if the task queue at node $\beta$ is full, $\phi_\beta = \phi_\beta^{max}$, the computing task is offloaded to the cloud. In summary, the offloading policy $\omega_\beta^\gamma$ for the user computing task $R_\beta^\gamma$ is:

$$\omega_\beta^\gamma = \begin{cases} 0, & \text{if } C_r < C_\beta,\ S_r < S_\beta \text{ and } \phi_\beta = 0 \quad (\text{node } \beta);\\ 1, & \text{if } C_r < C_x,\ S_r < S_x \text{ and } \phi_x = 0 \quad (\text{node } x);\\ 2, & \text{if } C_r < C_x,\ S_r < S_\beta \text{ and } \phi_\beta < \phi_\beta^{max} \quad (\text{queue } \phi_\beta);\\ 3, & \text{offload task } R_\beta^\gamma \text{ to the cloud.} \end{cases} \quad (2)$$
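A minimal Python sketch of this decision cascade, following the prose above and Eq. (2) (the records reuse the dataclasses sketched earlier; peer queue states are simplified away here):

```python
def offload_decision(req, local, peers, queue_len, queue_max):
    """Sketch of the offload policy omega in Eq. (2): serve locally when
    resources suffice and the queue is empty, otherwise forward to a peer
    node x from the resource table, otherwise enqueue locally (FIFO) while
    there is room, otherwise offload to the cloud."""
    if (req.cpu_required < local.cpu_free
            and req.storage_required < local.storage_free
            and queue_len == 0):
        return 0  # case 0: compute at the local node beta
    for peer in peers:  # resource table maintained from periodic broadcasts
        if (req.cpu_required < peer.cpu_free
                and req.storage_required < peer.storage_free):
            return 1  # case 1: forward to node x; beta acts as a relay
    if queue_len < queue_max:
        return 2      # case 2: place the task in beta's FIFO queue
    return 3          # case 3: offload the task to the cloud
```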
C. PAYMENT STRATEGY
After the user's computing request $R_\beta^\gamma$ is completed by node $\beta$ and the computing result is returned to the user device, the user device pays node $\beta$ according to the completion delay $\lambda_\delta^\beta$. If the computation is completed within the time limit $T_l$, the user pays according to the actual delay; if the edge node exceeds the time limit, the user does not pay. With $\pi$ denoting the price of the edge:

$$z_\beta^\gamma = \begin{cases} \pi \cdot \lambda_\delta^\beta, & \text{if } \lambda_\delta^\beta < T_l;\\ 0, & \text{if } \lambda_\delta^\beta > T_l;\\ \eta, & \text{if task } R_\beta^\gamma \text{ failed, with } \eta < 0. \end{cases} \quad (3)$$

The delay $\lambda_\delta^\beta$ in this paper is defined as the time interval between the moment the user equipment initiates the computing request and the moment the device receives the node's computing result. If the computing task is placed at the base station $\beta$ to which the user device is connected, the computing delay $\lambda_\delta^\beta$ can be expressed as:

$$\lambda_\delta^\beta = \sigma_\gamma + \sigma_{r'} + h_\beta + \sigma_\beta \quad (4)$$

where $\sigma_\gamma$ is the time spent transmitting the task $R_\beta^\gamma$ and $\sigma_{r'}$ is the time it takes to transmit the result. $h_\beta$ is the time taken by computing node $\beta$ to switch to the computing task at the base station, which is a small fixed value.

$$\sigma_\gamma = D_s / \theta \quad (5)$$

$\sigma_\beta$ is the time required for the computing node to complete the computing task. It is usually related to the CPU frequency $f_{cpu}$, the CPU cache size $c_{cpu}$, and the data size $D_s$ of the task:

$$\sigma_\beta = D_s / (f_{cpu} \cdot c_{cpu}) \quad (6)$$

If the computing task is placed at a neighboring base station $\beta'$, the delay $\lambda_\delta^{\beta'}$ is:

$$\lambda_\delta^{\beta'} = \sigma_\gamma + \sigma_{r'} + h_{\beta'} + \sigma_{\beta'} + 2 d_\beta^{\beta'} / c \quad (7)$$

where $d_\beta^{\beta'}$ is the fixed distance between base station $\beta$ and the neighbor node $\beta'$, and $c$ is the speed of light. Finally, our objective function is:

$$\max \sum_{\beta \in \{\beta,\alpha\}} \sum_\delta z_\beta^\gamma \quad \text{s.t. } \forall \gamma \in \gamma,\ \forall \beta \in \{\beta,\alpha\}:\ C_r \geq 0,\ S_r \geq 0 \quad (8)$$
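A minimal sketch of the delay and payment model of Eqs. (3)-(7), with parameter names mirroring the symbols above (assumed units: seconds and meters; `theta` is the uplink transmission rate in Eq. (5)):

```python
SPEED_OF_LIGHT = 3.0e8  # c in Eq. (7), m/s

def local_delay(Ds, theta, sigma_r, h_beta, f_cpu, c_cpu):
    """Eqs. (4)-(6): end-to-end delay when the connected base station
    beta computes the task; sigma_r is the result transmission time."""
    sigma_gamma = Ds / theta           # Eq. (5): time to transmit the task
    sigma_beta = Ds / (f_cpu * c_cpu)  # Eq. (6): computation time
    return sigma_gamma + sigma_r + h_beta + sigma_beta  # Eq. (4)

def neighbor_delay(Ds, theta, sigma_r, h_b, f_cpu, c_cpu, distance):
    """Eq. (7): a task computed at neighbor beta' pays an extra round
    trip 2*d/c over the fixed inter-station distance d."""
    return (local_delay(Ds, theta, sigma_r, h_b, f_cpu, c_cpu)
            + 2.0 * distance / SPEED_OF_LIGHT)

def payment(delay, T_l, price, failed=False, eta=-1.0):
    """Eq. (3): the user pays price*delay when the task finishes within
    T_l, pays nothing on timeout; a failed task yields the penalty eta < 0."""
    if failed:
        return eta
    return price * delay if delay < T_l else 0.0
```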
IV. COLLABORATIVE COMPUTING STRATEGY BASED ON DOUBLE DEEP Q-LEARNING
To better understand the DDQN agent, we briefly introduce DDQN here, starting with reinforcement learning. Reinforcement learning is an important branch of machine learning in which agents learn, through interaction with the environment, the actions that maximize the return. Unlike supervised learning, reinforcement learning does not learn from provided samples.
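As a concrete reference, a minimal PyTorch sketch of the double-Q target that distinguishes DDQN [4] from vanilla DQN [5] (the network definitions and tensor shapes are assumptions, not from the paper):

```python
import torch

def ddqn_target(reward, next_state, done, online_net, target_net, gamma=0.99):
    """Double-DQN target: the online network selects the greedy next action
    and the target network evaluates it, which reduces the over-estimation
    bias of vanilla DQN. The squared difference between the online Q-value
    of the taken action and this target gives the loss discussed later."""
    with torch.no_grad():
        next_action = online_net(next_state).argmax(dim=1, keepdim=True)   # action selection
        next_q = target_net(next_state).gather(1, next_action).squeeze(1)  # action evaluation
    return reward + gamma * next_q * (1.0 - done)  # bootstrap only for non-terminal states
```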
FIGURE 8. Distribution map of transmission time when the task is offloaded.

level. Observation shows that the DDQN-based agent has fewer task failures during the training process.

To show more intuitively the difference in the number of failed tasks across the different neural-network agents, the magnified curves are shown in Figure 6. The number of failures of the DDQN-based agent is always lower than that of the other reinforcement learning agents.

Figure 7 shows how the loss changes during training. The loss function of DDQN is defined as the square of the difference between the estimation function and the value function. The loss is recorded every 6 training steps. The training loss of DDQN is lower than that of the other reinforcement learning agents at the beginning; as the weights of the neural network are updated, the loss becomes smaller and smaller, while the training losses of the other reinforcement learning agents still fluctuate.

The user data transmission times recorded in the experiment are shown in Figure 8. On the whole, the propagation time of user data is roughly normally distributed, but irregular. In the experiment, some users have more data but a poor network channel, so their long transmission time leads to high delay; other users have less data and a better network, so their transmission time is shorter.

Figure 9 shows the values estimated by the agent's value function. At the beginning of training the value cannot be estimated well, but as training progresses the value estimate keeps approaching the true level and finally reaches a stable level.

VI. CONCLUSION AND FUTURE WORK
In this paper, we consider the bandwidth, computing, and cache resources of the ENs and, benefiting from the powerful learning ability and decision-making characteristics of deep learning, maximize the edge profit while satisfying the low-latency computing requirements of users at the edge. In addition, we consider horizontal and vertical coordination of caching and computing at the edge, which provides a degree of adaptability and can fully coordinate the computing resources at the edge to maximize their value. However, the DDQN on the EN has a long training period and its effect is initially unstable; it needs to train for a while before making good decisions. In addition, when multiple ENs in the same group perform collaborative computing, we have not studied the prioritization of computing resources or a computing-resource bidding strategy; future work will focus on competitive bidding and allocation priorities. The security of users at the edge is also a focus of future research.
survey,’’ IEEE Commun. Surveys Tuts., vol. 22, no. 2, pp. 869–904,
2nd Quart., 2020, doi: 10.1109/COMST.2020.2970550.
REFERENCES [23] X. Wang, C. Wang, X. Li, V. C. M. Leung, and T. Taleb, ‘‘Federated deep
[1] M. Armbrust, A. Fox, R. Griffith, A. D. Joseph, R. Katz, A. Konwinski, reinforcement learning for Internet of Things with decentralized cooper-
G. Lee, D. Patterson, A. Rabkin, I. Stoica, and M. Zaharia, ‘‘A view of ative edge caching,’’ IEEE Internet Things J., early access, Apr. 9, 2020,
cloud computing,’’ Int. J. Networked Distrib. Comput., vol. 53, no. 4, doi: 10.1109/JIOT.2020.2986803.
pp. 50–58, 2013. [24] S. Shen, Y. Han, X. Wang, and Y. Wang, ‘‘Computation offloading with
[2] W. Shi, J. Cao, Q. Zhang, Y. Li, and L. Xu, ‘‘Edge computing: Vision and multiple agents in edge-computing-supported IoT,’’ ACM Trans. Sensor
challenges,’’ IEEE Internet Things J., vol. 3, no. 5, pp. 637–646, Oct. 2016. Netw., vol. 16, no. 1, pp. 1–27, Feb. 2020, doi: 10.1145/3372025.
[3] M. Satyanarayanan, ‘‘The emergence of edge computing,’’ Computer, [25] Q. Wu, K. He, and X. Chen, ‘‘Personalized federated learning for intel-
vol. 50, no. 1, pp. 30–39, Jan. 2017. ligent IoT applications: A cloud-edge based framework,’’ IEEE Open J.
[4] H. H. Van, A. Guez, and D. Silver, ‘‘Deep reinforcement learning with Comput. Soc., vol. 1, pp. 35–44, 2020, doi: 10.1109/OJCS.2020.2993259.
double Q-learning,’’ in Proc. 13th AAAI Conf. Artif. Intell., 2016, pp. 1–13.
[5] V. Mnih, K. Kavukcuoglu, D. Silver, A. A. Rusu, J. Veness,
M. G. Bellemare, A. Graves, M. Riedmiller, A. K. Fidjeland, G. Ostrovski,
S. Petersen, C. Beattie, A. Sadik, I. Antonoglou, H. King, D. Kumaran,
D. Wierstra, S. Legg, and D. Hassabis, ‘‘Human-level control through
deep reinforcement learning,’’ Nature, vol. 518, no. 7540, p. 529, 2015.
[6] Y. C. Hu, M. Patel, D. Sabella, N. Sprecher, and V. Young, ‘‘Mobile edge JIANJI REN received the B.S. degree from the
computing: A key technology towards 5G,’’ ETSI White Paper, vol. 11, Department of Mathematics, Jinan University,
no. 11, pp. 1–16, Sep. 2015. in 2005, and the M.S. and Ph.D. degrees from
[7] J. Zheng, Y. Cai, Y. Wu, and X. S. Shen, ‘‘Stochastic computation offload- the School of Computer Science and Engineering,
ing game for mobile cloud computing,’’ in Proc. IEEE/CIC Int. Conf. Dong-A University, in 2007 and 2010, respec-
Commun. China (ICCC), Jul. 2016, pp. 1–6. tively. He is currently an Associate Professor
[8] X. Chen, L. Jiao, W. Li, and X. Fu, ‘‘Efficient multi-user computation with the College of Computer Science and Tech-
offloading for mobile-edge cloud computing,’’ IEEE/ACM Trans. Netw., nology, Henan Polytechnic University. His cur-
vol. 24, no. 5, pp. 2795–2808, Oct. 2016. rent research interests include mobile content-
[9] T. Quang Dinh, J. Tang, Q. Duy La, and T. Q. S. Quek, ‘‘Offloading centric networks and collaborative caching in edge
in mobile edge computing: Task allocation and computational frequency computing.
scaling,’’ IEEE Trans. Commun., vol. 65, no. 8, pp. 3571–3584, Aug. 2017.
[10] Y. Mao, J. Zhang, and K. B. Letaief, ‘‘Dynamic computation offloading
for mobile-edge computing with energy harvesting devices,’’ IEEE J. Sel.
Areas Commun., vol. 34, no. 12, pp. 3590–3605, Dec. 2016.
[11] H. Sheng, Y. Zheng, W. Ke, D. Yu, X. Cheng, W. Lyu, and Z. Xiong, ‘‘Min-
ing hard samples globally and efficiently for person re-identification,’’
IEEE Internet Things J., early access, Mar. 13, 2020, doi: 10.1109/ HAICHAO WANG received the B.S. degree in
JIOT.2020.2980549. natural geography and resource environment from
[12] Z. Zhou, Q. Wu, and X. Chen, ‘‘Online orchestration of cross-edge ser- Henan Polytechnic University, Jiaozuo, Henan,
vice function chaining for cost-efficient edge computing,’’ IEEE J. Sel. China, in 2018, where he is currently pursuing
Areas Commun., vol. 37, no. 8, pp. 1866–1880, Aug. 2019, doi: 10. the master’s degree in software engineering with
1109/JSAC.2019.2927070. the College of Computer Science and Technology
[13] X. Wang, X. Li, S. Pack, Z. Han, and V. C. M. Leung, ‘‘STCS: Spatial- (Software College). His research interests include
temporal collaborative sampling in flow-aware software defined net- edge computing, edge caching, big data analy-
works,’’ IEEE J. Sel. Areas Commun., vol. 38, no. 6, pp. 999–1013,
sis, deep learning, and the Internet of Things
Jun. 2020, doi: 10.1109/JSAC.2020.2986688.
technology.
[14] Y. Zhang, D. Niyato, and P. Wang, ‘‘Offloading in mobile cloudlet systems
with intermittent connectivity,’’ IEEE Trans. Mobile Comput., vol. 14,
no. 12, pp. 2516–2529, Dec. 2015.
[15] S. Wang, R. Urgaonkar, M. Zafer, T. He, K. Chan, and K. K. Leung,
‘‘Dynamic service migration in mobile edge-clouds,’’ in Proc. IFIP Netw.
Conf. (IFIP Netw.), May 2015, pp. 1–9.
[16] J. Li, H. Gao, T. Lv, and Y. Lu, ‘‘Deep reinforcement learning based TINGTING HOU is currently pursuing the B.S.
computation offloading and resource allocation for MEC,’’ in Proc. IEEE degree with the College of Computer Science
Wireless Commun. Netw. Conf. (WCNC), Apr. 2018, pp. 1–6. and Technology, Henan Polytechnic University,
[17] T. Yang, Y. Hu, M. C. Gursoy, A. Schmeink, and R. Mathar, ‘‘Deep Jiaozuo, Henan, China. Her current major interests
reinforcement learning based resource allocation in low latency edge include edge computing, edge caching, deep learn-
computing networks,’’ in Proc. 15th Int. Symp. Wireless Commun. Syst. ing, big data analysis, and the Internet of Things
(ISWCS), Aug. 2018, pp. 1–5. technology.
[18] X. Wang, Y. Han, C. Wang, Q. Zhao, X. Chen, and M. Chen, ‘‘In-edge AI:
Intelligentizing mobile edge computing, caching and communication by
federated learning,’’ IEEE Netw., vol. 33, no. 5, pp. 156–165, Sep. 2019.
SHUAI ZHENG is currently pursuing the B.S. CHAOSHENG TANG received the Ph.D. degree
degree with the College of Computer Science in management science and engineering from
and Technology, Henan Polytechnic University, Yanshan University, Qinhuangdao, Hebei, China,
Jiaozuo, Henan, China. His current major interests in 2015. His major interests include machine learn-
include edge computing, edge caching, deep learn- ing, complexity theory, multimedia applications,
ing, big data analysis, and data mining. and online social networks.