0% found this document useful (0 votes)

44 views13 pages

A Multi-Model Edge Computing Offloading Framework For Deep Learning Application Based On Bayesian Optimization

This document summarizes a research paper that proposes a multi-model edge computing offloading framework for deep learning applications. The framework uses NVIDIA Jetson edge devices and GeForce RTX GPU servers to simulate an edge computing environment. It introduces a Bayesian optimization algorithm called MTPE to reduce the total cost of edge computation, including response time and energy consumption, while ensuring accuracy for tasks like face detection. Experimental results show the MTPE algorithm can achieve an optimal solution faster than alternatives and reduce costs by an average of 17.94% compared to a single-model framework.

Uploaded by

Abdullah Mansour Algashwi

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

0% found this document useful (0 votes)

44 views13 pages

A Multi-Model Edge Computing Offloading Framework For Deep Learning Application Based On Bayesian Optimization

Uploaded by

Abdullah Mansour Algashwi

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

You are on page 1/ 13

This article has been accepted for publication in IEEE Internet of Things Journal.

This is the author's version which has not been fully edited and
content may change prior to final publication. Citation information: DOI 10.1109/JIOT.2023.3280162

A Multi-model Edge Computing Offloading

Framework for Deep Learning Application Based
on Bayesian Optimization
Zidi Zhao, Hong Zhang*, Liqiang Wang, and Haijun Huang

Abstract—With the rapid development of Internet of Things vehicles, asset tracking & monitoring, account for around 60
(IoT), data generated by IoT devices are also increasing expo- percent of all connected IoT devices in 2020. The explosive
nentially. The edge computing has alleviated the problems of growth of IoT devices and the development of 5G technology
limited network and transmission delay when processing tasks
of IoT devices in traditional cloud computing. And with the have subsequently promoted their applications in Internet of
popularity of deep learning, more and more terminal devices are Medical Things (IoMT) [2], Internet of Vehicles (IoV) [3] and
embedded with AI (Artificial Intelligence) processors for higher other scenarios [4]–[6]. Unfortunately, various data generated
processing capability at the edge. However, the problems of by IoT devices are not just literal data and pictures, but also
Deep Learning task offloading in heterogeneous edge computing include streaming video that take up a bunch of resources
environment have not been fully investigated. In this paper, a
multi-model edge computing offloading framework is proposed, and are time-critical. In intelligent scenarios, although cloud
using NVIDIA Jetson edge devices (Jetson TX2, Jetson Xavier platforms may provide higher computing capacity, there are
NX, and Jetson Nano) and GeForce RTX GPU servers (RTX3080 many issues with them, such as limited network resources
and RTX2080) to simulate the edge computing environment, and [7], transmission delay [8], privacy leakage [9] and other
make binary computational offloading decisions for face detection problems [10], which could be considerably challenging for
tasks. We also introduce a Bayesian Optimization algorithm,
namely MTPE (Modified Tree-structured Parzen Estimator), to IoT applications. The concepts of edge computing [11], fog
reduce the total cost of edge computation within a time slot computing [12] and cloudlet [13] have been put forward
including response time and energy consumption, and ensure to alleviate these problems by moving computing resources
the accuracy requirements of face detection. In addition, we from the cloud to the edge to reduce cloud computing load
employ the Lyapunov model to obtain the harvesting energy beforehand.
between time slots to keep the energy queue stable. Experiments
reveal that MTPE algorithm can achieve the globally optimal
solution in fewer iterations. The total cost of multi-model edge
computing framework is reduced by an average of 17.94%
compared to a single-model framework. In contrast to the Double
Deep Q-Network (DDQN), our proposed algorithm can decrease
the computational consumption by 23.01% for obtaining the
offloading decision.
Index Terms—edge computing, multi-model, Deep Learning,
Bayesian Optimization, Modified Tree-structured Parzen Esti-
mator, Lyapunov drift function.

I. I NTRODUCTION

I N recent years, the variety and number of Internet of

Things (IoT) devices have been growing rapidly, according
to Statista [1], the number of IoT devices worldwide is
forecast to almost triple from 8.74 billion in 2020 to more
than 25.4 billion IoT devices in 2030. IoT devices in the con-
sumer segment, such as smartphones, connected (autonomous)
The work of Z. Zhao, H. Zhang, and H. Huang was supported in part
by the Natural Science Foundation of Hebei Province of China under Grant
No. F2019201361, by the Science and Technology Research Project of Hebei
Higher Education Institutions under Grant QN2020133. The work of L. Wang
was supported in part by NSF-1952792. (Corresponding author: Hong Zhang.) Fig. 1. Edge computing framework.
Zidi Zhao and Hong Zhang are with the School of Cyber Security
and Computer, Hebei University, Baoding 071002, Hebei, China (e-mail:
[email protected]; [email protected]). The common three-layer edge computing framework, also
Liqiang Wang is with the Department of Computer Science, University of known as the “Cloud-Edge-End” framework [14], is shown in
Central Florida, Orlando, FL, USA (e-mail: [email protected]). Fig. 1. The traditional end device is a device that only has
Haijun Huang is with the School of Cyber Security and Com-
puter, Hebei University, Baoding 071002, Hebei, China (e-mail: huanghai- the function of data collection, and the edge server nearby
[email protected]). performs data processing tasks. However, with the upgrade

Authorized licensed use limited to: De Montfort University. Downloaded on September 17,2023 at 16:02:48 UTC from IEEE Xplore. Restrictions apply.
© 2023 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission.See https://www.ieee.org/publications/rights/index.html for more information.
This article has been accepted for publication in IEEE Internet of Things Journal. This is the author's version which has not been fully edited and
content may change prior to final publication. Citation information: DOI 10.1109/JIOT.2023.3280162

of end devices, they can also perform edge computing with range of scenarios as an example, because face recognition
embedded AI processors [15], such as surveillance camera is conducted on a variety of end and edge devices such as
[16], unmanned aerial vehicle (UAV) [17], which further stations, commercial districts and military bases, meanwhile,
relieve the pressure of the cloud. Furthermore, edge and cloud face recognition often requires good performance on accuracy,
computing are not mutually exclusive. In large scenarios, efficiency, and energy-consumption. In our experiment, we
numerous edge servers can form an edge cloud closer to the deploy face detection models on different Jetson devices
terminal, which can be a part of cloud computing. and edge servers to construct edge computing environments.
An AI application task can be divided into several sub- In particular, we propose a modified Bayesian Optimization
tasks, which can be offloaded according to the capability of algorithm named MTPE to guide the offloading decision.
edge servers and subtask complexity. For example, a face The offloading indicators are regarded as the parameters that
recognition task can be roughly divided into four subtasks: need to be configured by MTPE to minimize the time and
image acquisition, image preprocessing, face detection and energy consumption of face detection, maintain the accuracy
face recognition. The first three steps can be handled by edge at an ideal level, and improve the quality of experience (QoE).
devices, whereas the feature extraction results are usually Finally, we ensure energy stability by minimizing the upper
transmitted to the cloud for face matching and recognition. bound of the Lyapunov drift function, and determining the
Koubaa et al. [18] compared the time consumption of face optimal values of the initial energy and the energy replenished
recognition tasks in cloud, edge and hybrid architectures, and between time slots.
found that the hybrid architecture took the shortest time, which The main contributions of this paper are summarized as
means the cloud and edge can complement each other and follows:
jointly improve task processing efficiency. There are four edge • We propose a novel edge computing offloading frame-
computing offloading schemes between end devices and edge work based on multiple models conduct extensive ex-
servers, which are one-to-one, one-to-many, many-to-one and periment on real end devices including Jetson GPUs and
many-to-many [19]. These schemes can be used in various edge servers with different backbone networks to evaluate
offloading scenarios, which fully embodies the flexibility and our framework on various scenarios. Deploying multiple
scalability of edge computing. models in edge computing environment can extremely
However, there are still several open issues and challenges balance the accuracy and efficiency at different task
in edge computing offloading: 1) Model selection: due to the levels.
heterogeneous environment of the “Cloud-Edge-End” frame- • We introduce a modified Tree-structured Parzen Estimator
work, a single deep learning model may not fit all computing (MTPE) algorithm to minimize the offloading cost of
platforms well. For example, GPU resources of end devices the whole edge computing environment in a single time
cannot support the demands of sophisticated neural networks slot. The optimized objectives such as accuracy, time and
such as Resnet50. Then, we need to choose certain lightweight energy consumption are fully considered in our algorithm
models suitable for heterogeneous end devices to process AI to choose the global optimal solution as possible by
computing requests. 2) Trade-off of accuracy and efficiency: establishing the probability density function.
in practice, the number of tasks generated by end devices • We also provide a dynamic energy adjustment algorithm
in different time slots is not uniform. As computation scale to keep the energy queue stable. The Lyapunov drift
up, the accuracy disparity of different models becomes in- function is employed to describe the energy change of the
creasingly pronounced. Lower end devices may not support dynamic energy queue and minimize these fluctuations by
the high-accuracy requirement, while offloading to the edge limiting the upper bound of the drift function. To obtain
server with a high-accuracy model diminishes the efficiency. the harvesting energy among time slots and the limitation
Therefore, the accuracy and efficiency at various task levels of the initial energy, we calculate the minimum point
cannot be balanced by deploying a single model on edge of a unary quadratic function, which keeps the energy
servers. 3) Complex offloading algorithm: with the expansion adequate throughout the processing cycle.
of the computation scale, offloading algorithms such as meta- The rest of this paper is organized as follows. The related
heuristic approach and deep reinforcement learning become work is summarized in Section II. The system model and our
much more complex than before in terms of initialization algorithm are introduced in Section III. Experimental results
parameters and iterations. 4) Energy supply: the total cost of are given in Section IV, and the conclusion and future work
task offloading within a single time slot is often considered are presented in Section V.
in previous research for edge computing. Yet how to supply
energy among time slots to ensure the stability of energy queue II. R ELATED W ORK
is still a critical problem that needs to be resolved. At present, there are many research achievements in the
In response to these shortcomings and challenges, we pro- field of edge computing. In addition to the common research
pose a multi-model edge computing framework for a many- on offloading problems, Deep Learning technology in AI
to-many offloading scheme to improve the efficiency and applications has been widely integrated into the edge comput-
accuracy of AI applications on edge IoT devices, keep energy ing framework. Wang et al. summarize the concepts of edge
consumption within an ideal range, create a more intelligent intelligence and intelligent edge in [20].
edge computing environment, and make end and edge devices For edge intelligence, most of the research on the appli-
highly cohesive. Here we take face recognition with a wide cation of embedded AI devices in edge computing systems

is related to object detection and recognition. Li et al. [21] dustrial IoT (IIoT) applications. In [40], Xue et al. proposed an
proposed a driver fatigue detection system based on CNN and Efficient offloading scheme for DNN Inference Acceleration
tested the performance of the detection system in real driving in a three-layer collaborative environment. For migration plan,
scenarios. In this paper, Jetson Nano was used as an edge algorithm PSO-GA is applied to obtain the distribution of
computing device for real-time detection to improve robustness DNN layers under the server with the lowest migration delay,
and accuracy of the system. Chang et al. [15] presented and for uploading plan, a Layer Merge Uploading Algorithm
a wearable assistive system based on AI edge computing is proposed to obtain DNN partitions and their upload order
technology and adopted Deep Learning technology for real- with efficient DNN query performance. The authors of [44]
time recognition of zebra crossing images. In [22], Liu et al. studied the application of MEC in the air-to-ground integrated
designed a food recognition system based on edge computing wireless network. They optimized the offloading decision of
to make an accurate dietary assessment and overcome the the DNN model, resource allocation, and UAV route based
problems of system delay and low battery life of mobile on energy consumption and resource constraints. It’s worth
devices in mobile cloud computing. Both [3] and [17] studied noting that the UAV deploys well trained DNN models with
the traffic video surveillance system based on edge computing different model input sizes to satisfy various QoS requirements
platform. Wan et al. [3] offloaded vehicle detection tasks to of IoT devices. These studies considered various influencing
edge nodes with Jetson TX2, while the system designed in [17] factors in edge offloading, such as delay, energy consumption,
was embedded in the UAV to track and detect vehicles in real- communication resources. The algorithm needs to tune a large
time. The authors of [23] proposed a distributed system for number of parameters. Moreover, the tuning process for the
video analysis, which divided the heavy processing of large- proposed algorithms such as reinforcement learning is very
scale video streams into various machine learning tasks and slow, which cannot guarantee timeliness in real-time scenarios.
deployed these tasks as data processing workflows on edge How to combine Deep Learning tasks in real scenes with
devices equipped with neural network hardware accelerators. edge computing, minimize task delay, energy consumption,
Rajavel et al. [24] proposed a video surveillance system based task discarding, cost payment and maximize computing speed
on edge computing for object tracking and behavior recogni- and energy efficiency through computational offloading are the
tion in IoMT. In this paper, the detection of moving objects critical problems in making edge computing come into prac-
was improved by combining background subtraction and the tice. Based on the above summary of the existing work, Deep
DNN algorithm, which brought robustness and intelligence to Learning has been widely applied in edge computing, but how
the distributed video surveillance system. These studies fully to offload Deep Learning tasks under the heterogeneous edge
demonstrated the wide applications of Deep Learning in edge computing framework remains to be discussed. In addition to
computing frameworks. However, they only considered single the time and energy consumption that most people focus on,
model to solve Deep Learning problems rather than multi- ensuring the accuracy of AI application services is also an
model systems when offloading, which did not take the trade- issue that cannot be ignored, which also motivates the work
off between efficiency and accuracy into account. of this paper.
For intelligent edge, how to design edge computing sys-
tem with intelligent offloading scheme under the multiple III. S YSTEM M ODEL AND S OLUTION A LGORITHM
constraints of network, communication, computing power In this section, we introduce the edge computing offloading
and energy consumption [25]–[28] is the key to improve architecture that contains different types of edge devices
the efficiency of edge computing. Common offloading de- running multiple models. We first introduce edge system
cision approaches include AI-based approaches [29] (e.g. architecture including model selection, response time and
Deep Q-Network [30]–[33]), Lyapunov Optimization [34]– energy consumption. Then, we formulate the problem within a
[38], Meta-heuristic algorithm [39]–[43], (non-)Convex Opti- time slot with the offloading decision matrix and provide the
mization [44], [45] etc. Tu et al. [30] proposed the Online solving process of the MTPE algorithm, which is improved
Predictive Offloading algorithm based on Double Deep Q- based on TPE [46]. At last, we provide the energy adjustment
Network (DDQN) and Long Short-Term Memory networks algorithm to keep the energy queue stable between time slots.
for cost minimization, which integrates the processing latency, The notations used in this paper are summarized in Table I.
processing energy consumption and the task throw rate of
latency-sensitive tasks. In [34], a blockchain-enabled IoT-
Edge-Cloud computing architecture that benefits both from A. Edge System Architecture
mobile cloud computing and mobile-edge computing (MEC) As depicted in Fig. 2, we consider a multi-model edge
was proposed. The authors derived an adaptive offloading- computing offloading framework with embedded edge devices
decision algorithm EEDTO by utilizing the Lyapunov opti- (namely, end devices) and edge servers. The entire offloading
mization problem such that the energy consumption of the IoT process is mainly divided into three stages: 1) Stage 1: the
device can be minimized when only sacrificing a little delay. end devices transmit the information of the generated tasks to
Natesha et al. [39] designed a service placement strategy based a central server containing the scheduler within a single time
on meta-heuristic hybrid algorithms MGAPSO and EGAPSO. slot. 2) Stage 2: the target node and model for the task are re-
They implemented a two-level fog computing framework turned to each end device by the central server after scheduling
developed by docker and container techniques to minimize tasks using the scheduler’s offloading method. 3) Stage 3: the
service costs and ensure the quality of service (QoS) for In- end devices offload the tasks to the target nodes, and the target

TABLE I
S UMMARY OF K EY N OTATIONS

Notation Description
M set of edge servers
K set of inference models deployed on the
edge servers
N set of end devices
T set of time slots
S set of tasks generated by N in each time slot
st task set in time slot t
Qk set of the accuracy of model k
Rt task generation rate within a time slot
Dnl response time of end device n
k
Dm response time of server m using model k
l
En energy consumption of executing task on
end device n
k
Em energy consumption of executing task on Fig. 2. Multi-model edge computing offloading framework.
edge server m
t
Dm time consumption of transferring task to
edge server m on both edge and end devices. But the face detection models
t
Em energy consumption of transferring task to with different backbone networks have certain differences in
edge server m efficiency and accuracy as well as the response time. For
Dne total time consumption of executing task on each backbone network, we analyze the accuracy Qk =
edge server m {qe , qm , qh }, k ∈ K at easy, medium and hard task levels,
Ene total energy consumption of executing task on respectively, which is officially classified in dataset “WIDER
edge server m FACE” [47]. Assume that tasks S = {s1 , s2 , ..., sT } are
Enmax maximum energy consumption for a task generated in Bernoulli distribution within a time slot T =
pl power consumption of end device {1, 2, ..., T }, and the task generation rate Rt of all end devices
pe power consumption of edge server determines the specific offloaded model on each edge device
pt transmission power within a time slot, where Rt = |sNt | .
pmax
t maximum transmission power If Rt is at the easy task level, it means few end devices
An , Bnm offloading indicators generate tasks within a time slot so that the overall tasks are
xt group of offloading indicators in time slot t easy to process, and the models with high accuracy may be
Et total energy consumption of executing task in selected to offload. Conversely, Rt at the hard task level means
time slot t tasks are hard to process, and the models with the appropriate
Etmax maximum total energy consumption in time accuracy may be selected. Even if it reduces processing
slot t efficiency, the necessary accuracy must be guaranteed. The
et harvesting energy between each time slot condition of model selection based on task generation rate
emax
t maximum harvesting energy and accuracy is shown in Eq. 1:
B0 initial energy within T 
Bt remaining energy at the end of time slot t  qe ≥ q1 ,
 if Rt < Rlow
qm ≥ q2 , if Rlow ≤ Rt < Rhigh (1)

qh ≥ q3 , if Rhigh ≤ Rt .

nodes send back the results after processing the tasks. The set As Rt increases, the accuracy of the task on each model
of edge servers is denoted as M = {1, 2, ..., M }, and the decreases. For the accuracy of selected models for offloading,
set of face detection inference models deployed on the edge the accuracy threshold satisfies q1 ≥ q2 ≥ q3 ≥ qbase , that
servers is denoted as K = {1, 2, ..., K}. N = {1, 2, ..., N } is, models that meet the accuracy constraints can be selected
is the set of embedded edge devices, and only one inference for offloading. Rlow and Rhigh are denoted to classify the
model in K is deployed on each edge device. From Fig. 2, an complexity of the tasks, which have been determined in dataset
edge cloud composed of edge servers connects to various types “WIDER FACE”. Rt < Rlow = 0.8 for easy tasks, Rlow ≤
of end devices and handles their uplink transmission tasks. Rt < Rhigh = 0.95 for medium tasks, and Rt ≥ Rhigh for
The same type of end devices can deploy different models, hard tasks.
while different types of end devices can also deploy the same We examine actual face detection inference tasks on end
model. Therefore, various model deployment scenarios are devices and edge servers, record and calculate the time and
fully considered in our edge computing framework. energy consumption of each task processed on different de-
The inference process of face detection can be executed vices. Dnl is the local response time on end device n with a

k
single model, while Dm is the time of task executed on the the model selection of each server. As shown in Fig. 3, the
edge server m using model k, in which n ∈ N , m ∈ M. second column in the model matrix values 1, which represents
Then the energy consumption of local execution is as inference with the second model is selected. Similarly, it has
follows: only one column with a value of 1.
Enl = Dnl pl , (2)
where pl is the power consumption on an end device. The
energy consumption of edge execution is calculated similarly:
k k
Em = Dm pe , (3)
where pe is the power consumption on an edge server.
If the task needs to be executed on an edge server, the time
t
of data transfer should also be considered. We denote Dm as
the transmission time consumed by offloading a frame image
to server m and then returning the result to the end device.
t
The energy consumption of data transmission Em is shown in
Eq. 4:
t t
Em = Dm pt , (4)
where pt is the transmission power, 0 < pt ≤ and pmax
t , Fig. 3. Binary offloading matrix.
pmax
t is the maximum transmission power between the server
and the end device. Eq. 5 shows the total time and energy After constructing the system model and the offloading
consumption for executing a task on the edge server: decision matrix, the time and energy consumption of the task
Dne = Dm
k t
+ Dm , generated by device n within a time slot are shown in Eq. 8
(5) and Eq. 9:
Ene = Em
k t
+ Em .
M
X K
X
B. Multi-objective Optimization within a Time Slot Tn = an0 Dnl + anm bm e
n Dn ,
k
(8)
m=1 k=1
Each computing task can be processed on an end de-
vice or offloaded to an edge server with better com- M
X K
X
puting performance and higher precision models. We de- En = an0 Enl + anm bm e
n En ,
k
(9)
note An = {an0 , an1 , ..., anm , ...anM } and Bnm = m=1 k=1
{bnm1 , bm mk mK
n , ..., bn , ..., bn } as the offloading indicators.
2
which are adjusted to the same order of magnitude when
an0 = 1 means the task generated on edge device n is calculating.
processed locally, and anm = 1 means the task is offloaded The offloading indicators are used to limit the selection of
to the server m. bm n = 1 shows that task generated on edge
k
delay and energy consumption. Since the offloading decision
device n is offloaded to the model k of the server m. in each time slot satisfies Eq. 6 and Eq. 7, the results calculated
For each end device n, only one parameter is set to 1 in by Tn and En are the time and energy consumption under the
An , and only one parameter is set to 1 in Bnm if the task is current decision. The objective function of the multi-model
offloaded to the server m, then the relationships in Eq. 6 and edge computing framework for task offloading within a time
Eq. 7 must be satisfied: slot is set in Eq. 10. The time and energy consumption are
M
considered comprehensively, and the goal is to minimize the
total cost of all the tasks, where x is the offloading decision
X
an0 + anm = 1, ∀n ∈ N , (6)
m=1
matrix:
N
X
(
K
X 0, if an0 = 1 P1 arg min f (x) = (αTn + βEn ) (10)
bm
n =
k
, ∀n ∈ N , ∀m ∈ M. (7) m]
x=[An ,Bn n=1
k=1
1, if anm = 1
s.t. (1), (6), (7), (8), (9),
Fig. 3 is a binary offloading matrix, showing the offloading
situation of tasks within a time interval. The large matrix has An , Bnm ∈ {0, 1} , ∀n ∈ N , (10a)
N rows, representing the offloading situation of each edge
device in N . The first column expresses whether the tasks are Tn ≤ t, ∀n ∈ N , (10b)
executed locally, and columns 2 through M + 1 show whether
En ≤ Enmax , ∀n ∈ N , (10c)
the tasks are offloaded to the edge servers. For example,
column M +1 in row N values 1 means end device N offloads 0 < pt ≤ pmax , (10d)
t
its task to server M . Note that each row has only one column
value of 1, and all others are 0. The small matrix represents α + β = 1. (10e)

Constraint (1) keeps the overall accuracy to a great level Algorithm 1 Modified Tree-structured Parzen Estimator Al-
in different task complexity. Constraint (6), (7) and (10a) gorithm
denote the binary selection indicators and limit the values Input:
they can take. Constraint (8) and (9) respectively state the I : number of iterations
time and energy consumed by the task of end device n in Ic : number of candidates per iteration
an offloading period. Constraint (10b) denotes that each task γ : quantile
should be completed within a time slot. Constraint (10c) states st : task set in time slot t
that the energy consumption of each task should be less than Output: x minimizes f (x) in Z
the maximal energy consumption. Constraint (10d) is the limit Initialization: Z = ∅
of transmission power. Constraint (10e) is to assign the weight 1: for i = 1 to Ic do
of time and energy consumption to the total cost. 2: set x to randomly offload 50% of st to the servers
Bayesian Optimization is often used to tune hyperparame- 3: calculate f (x) according to Eq. 10
ters in machine learning models, which is generally regarded 4: Z ← Z ∪ {(x, f (x))}
as a black box optimization problem. During the tuning 5: end for
process, we only observe its outputs based on the given 6: for i = 1 to I do
inputs, and there are no restrictions of concavity and convexity 7: k = ⌈γ · Ic ⌉
for the objective problem. Compared with Grid Search and 8: Zl ← select (x, f (x)) with bottom-k values in Z
Randomized Search, the parameter space and computation 9: Zg ← Z \ Zl
of Bayesian Optimization are greatly reduced, and Bayesian 10: construct l(x) with Zl and g(x) with Zg
Optimization does not require a large number of initial sam- 11: C ← {(xc , f (xc ))|xc ∼ l(x), c = 1, ..., Ic }
l(x)
ples in contrast to meta-heuristic algorithms. Due to these 12: x ← arg maxx∈C g(x)
advantages, we choose Bayesian Optimization to solve our 13: Z ← Z ∪ {(x, f (x))}
offloading decision problem, which can also be viewed as 14: if i > 1 and f (x) < f (xpre ) then
a black box function. The Tree-structured Parzen Estimator 15: if γ > 0.25 then
(TPE) is a classic sequential model-based Bayesian optimiza- f (x )−f (x)
16: fc = fpre (xpre )
tion algorithm, which constructs probability models to improve 17: γ = (1 − 0.1fc )γ
the system performance based on historical measurements. 18: end if
In this paper, we design a Modified Tree-structured Parzen 19: end if
Estimator algorithm (MTPE) that can optimize initial param- 20: xpre = x
eters and dynamically adjust quantile along with iterations. 21: Ic = Ic + 1
Compared with traditional Bayesian Optimization based on 22: end for
the Gaussian process, MTPE based on the Gaussian mixture 23: return x minimizes f (x) in Z
model can achieve better results with higher efficiency and
even solve discrete parameters’ optimization.
With the observed experience set {(x1 , f (x1 )), x2 , f (x2 )),
..., (xq , f (xq ))}, MTPE defines p(x|f (x)) using two probabil- arg min EIy∗ (x) = arg min E[y − y ∗ ]. (12)
ity density functions in Eq. 11 to divide the parameter space x x
into good part and bad part. Note that, x is the solution of a Since we minimize the objective function in Eq. 10 and y is
group of offloading indicators: less than y ∗ for the next point, the formula 12 can be rewritten
in Eq. 13:
(
l(x), if f (x) < f (x)∗
p(x|f (x)) = (11) arg max EIy∗ (x) = arg max E[max(y ∗ − y, 0)]. (13)
g(x), if f (x) ≥ f (x)∗ ,
x x

where f (x)∗ is chosen to be some quantile γ of the observed According to the Bayes formula, posterior probability
target function values, so that quantile γ satisfies p(f (x) < p(y|x) can be calculated by prior probability p(y) and condi-
f (x)∗ ) = γ. The probabilistic surrogate models l(x) and tional probability p(x|y), i.e. p(y|x) = p(x|y)p(y)
p(x) . Therefore,
g(x) are tree-structured hierarchical processes constructed by the above problem is derived as follows:
adaptive Parzen estimators and can be applied to discrete- Z ∞
valued variables. Observations {xi } is used to form density EIy∗ (x) = max(y ∗ − y, 0)p(y|x)dy
l(x) such that corresponding objective function f (xi ) is in −∞
Z y∗
the good part. Conversely, g(x) is the density function of the p(x|y)p(y)
= (y ∗ − y) dy
bad part. −∞ p(x)
Here, just for the sake of the derivation, let y = f (x). R y∗ (14)
γy ∗ − −∞
yp(y)dy
After the above estimation by the probability density function, =
we adopt the Expected Improvement (EI) function as the γ + (1 − γ) g(x)
l(x)
acquisition function to collect the next observation point. The g(x) −1
acquisition process is shown in Eq. 12: ∝ (γ + (1 − γ) ) .
l(x)

T
X
Bt (et − Et ) =B0 (et − E1 ) + B1 (et − E2 ) + ... + BT −1 (et − ET )
t=1
=B0 (et − E1 ) + (B0 − E1 + et )(et − E2 ) + ... + [B0 − E1 − ... − ET −1 + (T − 1)et ](et − ET )
=T et B0 − et [(T − 1)E1 + (T − 2)E2 + ... + ET −1 ] + (1 + 2 + ... + T − 1)e2t
(21)
− B0 (E1 + E2 + ... + ET ) − et [E2 + 2E3 + ... + (T − 1)ET ] + C1
T (T − 1) 2
= et − (T − 1)et [E1 + E2 + ... + ET ] + T et B0 − B0 (E1 + E2 + ... + ET ) + C1
2
T (T − 1) 2
= et − [(T − 1)C2 − T B0 ]et + C1 − C2 B0
2

According to Eq. 13 and Eq. 14, maximizing EI means to Then, we introduce the Lyapunov optimization, defined in
select the next observation point with the minimum value of Eq. 16:
g(x)
l(x) . So, we would like select point x with lower probability
1
L(Bt ) = Bt2 . (16)
under g(x) and higher probability under l(x). 2
However, the TPE algorithm starts from a randomly gen-
The Lyapunov drift can be described in Eq. 17:
erated solution, which is an NP-hard Optimization problem
with a large parameter space. Another issue is the fixed ∆(Bt ) = E[L(Bt ) − L(Bt−1 )|Bt ]. (17)
quantile also makes the result fall into a local optimization
early. Our proposed algorithm in Alg. 1 has the following Then, we infer an upper bound of ∆(Bt ), which can be
improvements: 1) It is known that the inference speeds of edge found in the following lemma 1 [35].
servers are much faster than that of end devices. Therefore, Lemma 1: For the end of each time slot, the upper bound
when initializing the observation points, we set the offloading of the Lyapunov drift function ∆(Bt ) is:
indicators to allow a certain proportion (50-50 policy based on
our experiments) of tasks to perform on the edge servers. It can ∆(Bt ) ≤ E[Bt−1 (et − Et )|Bt ] + C, (18)
dramatically reduce the number of iterations, compared with
selecting the initial parameters randomly. The overall process where C is a constant, and is denoted as:
is described in lines (1)-(5) of Alg. 1; 2) To avoid falling
into local optimization, the initial value of quantile γ is 0.4 (emax
t )2 + (Etmax )2
C= , (19)
by default, and changes dynamically with the output of each 2
iteration. In each iteration, we choose bottom-k (k = ⌈γ · Ic ⌉) limit 0 < et ≤ emax and 0 < Et ≤ Etmax .
t
from Z to generate set Zl , and the rest are in set Zg . For the To keep the energy queue dynamically stable, we minimize
next iteration, if the objective function value of the selected the upper bound in time T , which can be transformed into Eq.
point is less than that of the previous iteration, γ is set to 20:
reduce by 10% of the decline degree fc and stop changing T
until γ is less than 0.25. The overall process is described in 1X
P2 arg min Bt−1 (et − Et ). (20)
lines (6)-(22) of Alg. 1. et T t=1

Fig. 4 shows the dynamic change of the energy queue within

C. Energy Optimization between Time Slots time T , each square represents a time slot, and B0 is the initial
energy.
Through the optimization in the above subsection, we can
figure out the lowest cost for each time slot. To keep the
energy stable during time T , we employ the Lyapunov model
to construct a dynamic energy queue and obtain the energy
that needs to be newly supplied between two adjacent time
slots.
Assume that the remaining energy Bt varies at the end of
time slot t in the following Eq. 15: Fig. 4. Energy queue.

Bt = Bt−1 − Et + et , (15) The detailed derivation of calculating et is shown in Eq.

−1 P
TP T
21, where C1 = Ei Ej , C2 = E1 + E2 + ... + ET .
where Et is the total energy consumption in time slot t, and i=1 j=i+1
et is the energy that is replenished at the end of each time From the deduced result, we find that this is a unary quadratic
slot. function with et as the independent variable. Therefore, value

et that satisfies Eq. 20 can be obtained by computing the including response time and accuracy on real equipment. At
minimum point of the unary quadratic function in Eq. 22: last, we evaluate our proposed MTPE algorithm compared
with the TPE algorithm, two meta-heuristic algorithms and
T B0 − (T − 1)C2
et = − Randomized Search.
T (T − 1)
(22)
C2 B0
= − . A. AI Application on NVIDIA Jetson and RTX GPU
T T −1
Due to et > 0 and the sufficiency of initial energy in time T , In all experiments, we choose Jetson Xavier NX, Jetson TX2
the range of B0 is E1 < B0 < T T−1 C2 . Then, the initial energy and Jetson Nano from the NVIDIA Jetson series as embedded
and the harvesting energy between time slots can be solved edge devices, and use GeForce RTX 3080 and GeForce RTX
by maintaining energy stability in the whole task processing 2080 as edge servers. Different features and configurations
cycle. of these NVIDIA Jetson end devices are shown in Table II.
Theoretically, the AI performance of Jetson Xavier NX is
Algorithm 2 Dynamic Energy Adjustment Algorithm roughly 12.71 times that of Jetson TX2 and 4.51 times that of
Input: Jetson Nano. We install the same system versions on all three
M = {1, 2, ..., m, ..., M }: set of edge servers types of boards, ensuring that our experimental results only
K = {1, 2, ..., k, ..., K}: set of models rely on hardware conditions. Since the driver version on each
N = {1, 2, ..., n, ..., N }: set of end devices edge server is limited by the GPU model, which is shown in
T = {1, 2, ..., t, ..., T }: set of time slots Table III, GeForce RTX 3080 has higher requirements for the
B0 : initial energy GPU driver version.
et : harvesting energy
1 .0 0
1: set S = {s1 , s2 , ..., sT } by Bernoulli distribution
2: set D = ∅, E = ∅
3: for t = 1 to T do 0 .9 5
4: for each task in st do
5: import Dnl , Dm k
, Dmt
from the actual test 0 .9 0
e k t
6: Dn = Dm+ Dm
D ← D ∪ Dnl , Dne
A c c u ra c y

7:
0 .8 5
8: calculate Enl , Em k
, Emt
according to Eq. 2-4
e k t
9: En = Em+ Em
0 .8 0
10: E ← E ∪ Enl , Ene re s n e t5 0
11: end for s h u ffle n e tv 2
m o b ile n e tv 3
12: // call the MTPE algorithm of Alg. 1 0 .7 5 g h o s tn e t
13: xt = MTPE (st , D, E) PN
m o b ile n e t0 .2 5
14: put xt into (10b) to get Et = n=1 En 0 .7 0
e a sy m e d iu m h a rd
15: Bt = Bt−1 − Et + et
16: end for C o m p le x ity

Fig. 5. Accuracy of each model on different difficulty task.

The dynamic change of energy during the whole process is
shown in Alg. 2. In each time slot t, the time consumption Dnl ,
Dne and energy consumption Enl , Ene of executing generated First, we train the single-shot face detection model Reti-
tasks on the end devices and edge servers are calculated naface [48] of various backbone networks on the GeForce
in line (4) to (11). Then, the MTPE algorithm from Alg. RTX 3080 server using the face detection benchmark
1 constructs the objective function and returns the optimal dataset WIDER FACE [47]. The backbone networks includ-
offloading decision. Therefore, we can obtain the total energy ing Mobilenet0.25, Mobilenetv3, Ghostnet, Shufflenetv2 and
consumption Et of time slot t. At the end of each time slot Resnet50 are fully employed in our experiments. By combin-
t, Alg. 2 calculates energy et by Eq. 22 and updates energy ing extra-supervised and self-supervised multi-task learning,
queue Bt . Alg. 1 and Alg. 2 solve the problem of minimizing RetinaFace can perform pixel-level localization of faces at var-
the total cost of offloading within a time slot and the problem ious scales. After model training, we estimate the performance
of keeping the energy queue stable between the time slots in of all kinds of backbone networks on three complex task
the whole cycle T respectively, which correspond to P1 and levels, which is classified by dataset WIDER FACE according
P2. to the number of images. The exact accuracy results of each
model on the easy, medium and hard WIDER FACE testing
sets are shown in Fig. 5. The accuracy of each model on
IV. E XPERIMENTAL R ESULTS the easy testing set is above 90%, especially over 95% of
In this section, we first conduct experiments to compare Resnet50. On the medium testing set, the accuracy remains
the performance of end devices and edge servers. Then we stable without significant reduction, but only Mobilenet0.25
test the inference capability of various face detection models decreases below 90%. However, the accuracy of all models

TABLE II
NVIDIA J ETSON B OARD S PECIFICATION

Jetson Nano Jetson TX2 Jetson Xavier NX

Hexa (6)-core processor with 2x
Quad (4)-Core ARM Cortex-A57 6-core NVIDIA Carmel ARMv8.2
CPU NVIDIA Denver2 64-Bit CPU and 4x
MPCore 64-bit CPU 6 MB L2+4 MB L3
ARM Cortex-A57 cores
Volta GPU with 384 NVIDIA CUDA
GPU 128-core Maxwell GPU 256-core Pascal GPU
cores and 48 Tensor cores
AI Performance 472 GFLOPs (FP16) 1.33 TFLOPs (FP16) 6 TFLOPs (FP16)&21 TOPs (INT8)
Memory 4 GB 64-bit LPDDR4 @25.6GB/s 8 GB 128-bit LPDDR4 @59.7GB/s 8 GB 128-bit LPDDR4 @51.2GB/s
Power 5-10W 7.5-15W 10-15W
Jetpack version 4.4 4.4 4.4
Ubuntu Linux 18.04.5 18.04.5 18.04.5
CUDA version 10.2 10.2 10.2
CuDNN version 8.0 8.0 8.0
Archiconda version 3-0.2.3 3-0.2.3 3-0.2.3

TABLE III From Fig. 6, the response time of each model decreases
E DGE S ERVER S PECIFICATION gradually along with the increase of capability of the end
GeForce RTX GeForce RTX
devices, the model with backbone Mobilenet0.25 is the fastest
3080 2080 among all models. By correlated with Fig. 5, it shows that
CUDA Cores 8704 2944 the model with relatively higher accuracy takes a longer time
GPU Memory 10GB GDDR6X 8GB GDDR6 for inference. For Resnet50, as a complex backbone network,
Enforced Power Limit 320W 215W it takes far more time to execute tasks on end devices than
CentOS Linux 8.5.2111 8.5.2111 other lightweight networks. Especially, the reference time of
Driver version 510.47 470.86 Resnet50 on the Jetson Nano is more than two seconds, so it
CUDA version 11.6 11.4 is unsuitable to deploy on end devices.
CuDNN version 8.2.1 8.2.1
Anaconda version 4.8.2 4.8.2 1 5 0
G e F o rc e R T X 2 0 8 0
G e F o rc e R T X 3 0 8 0

on the hard testing set are greatly reduced, yet only Resnet50
shows an acceptable result (84.43%), compared to 80% or 1 0 0
R e s p o n s e T im e (m s )

even below 75% of the others.

Secondly, we also evaluate the response time on end devices
and edge servers, which is the detected time of each image.
The inference performance of different models on each testing
5 0
equipment is shown in Fig. 6 and Fig. 7.

2 5 0 0
re s n e t5 0
s h u ffle n e tv 2 0
re s n e t5 0 s h u ffle n e tv 2 m o b ile n e tv 3 g h o s tn e t m o b ile n e t0 .2 5
2 0 0 0 m o b ile n e tv 3
g h o s tn e t B a c k b o n e N e tw o rk s
m o b ile n e t0 .2 5
R e s p o n s e T im e (m s )

1 5 0 0 Fig. 7. Response time of each model on edge servers.

From Fig. 7, the response time of each model is relatively

1 0 0 0 low and similar. Compared with end devices, Resnet50 runs
more efficiently on edge servers, which presents the dual
advantages of high accuracy and low latency. Therefore, we
5 0 0
consider deploying one lightweight model on each end device
and multiple models including Resnet50 on edge servers.
0
N a n o T X 2 X a v ie r
E n d D e v ic e s B. Results and Analysis of MTPE Algorithm
In this part of the experiments, we verify the feasibility of
Fig. 6. Response time of each end devices. the multi-model edge computing framework and estimate the

performance of the MTPE algorithm compared with others. edge computing can dramatically reduce computing costs.
pl = 10W , pe = 100W and pt = 1W , which are based on
the actual test data and rated power of devices. The time slot t 3 .0 × 1 0 4

is 1s, and the energy consumption limited for a task within a a ll o n e n d

time slot Enmax is 10J. According to the accuracy measured by 2 .5 × 1 0 4
a ll o n e d g e
e d g e -e n d c o lla b o ra tio n
different complex tasks, q1 and q2 are set to 0.9, and q3 is 0.8.
Therefore, models with accuracy more than 0.9 are accepted 4
2 .0 × 1 0
for easy and medium tasks, whereas models with accuracy

T o ta l C o s t f(x )
better than 0.8 are selected for hard tasks. Due to the high
4
1 .5 × 1 0
requirements for real-time performance in actual applications,
the weight α of Tn should be set to more than 0.5. The larger α
4
is easily conducive to reduce the total response time. However, 1 .0 × 1 0
the lowest response time is often accompanied by the highest
energy consumption. Initially, α is adjusted to 0.8 to balance 5 .0 × 1 0 3

response time and energy consumption simultaneously.

0 .0
1 2 2 4 3 6 4 8 6 0
4
2 .6 × 1 0
N u m b e r o f E n d D e v ic e s
M T P E
4
2 .4 × 1 0 T P E
Fig. 9. Total cost for different offloading environment (α = 0.8, K = 5).
4
2 .2 × 1 0
Next, we compare the total cost of deploying different
numbers of models on the servers. N still ranges from 12 to
T o ta l C o s t f(x )

4
2 .0 × 1 0
60, and the number of models K varies from 1 to 5 as well as
4
1 .8 × 1 0 the complexity of the model added from difficulty to easiness.
4
As shown in Fig. 10, as more models are deployed, the total
1 .6 × 1 0
cost tends to decrease. Multi-model framework with K = 5 is
1 .4 × 1 0 4 reduced by an average of 17.94% compared to a single-model
framework, reaching up to 21.86% when N = 60. Hence,
1 .2 × 1 0 4 auxiliary deployment of lightweight models on edge servers
can further ease the computing burden, and the effect is more
0 5 1 0 1 5 2 0 2 5 3 0
obvious with the increase of tasks.
In te ra tio n s

4
1 .6 × 1 0
Fig. 8. The convergence of MTPE and TPE (α = 0.8, K = 5, N = 60).
K = 1
4 K = 2
1 .4 × 1 0
To consider multiple deployment scenarios in our experi- K = 3
K = 4
ments, end devices come into a group of 12, and each group 1 .2 × 1 0 4
K = 5
contains three types of devices mentioned in Table II and four
T o ta l C o s t f(x )

types of lightweight models in each type of end device. The 1 .0 × 1 0 4

number of edge servers M is 4, which is enough for our edge

computing framework. In Fig. 8, we set N = 60, five groups 8 .0 × 1 0 3

of end devices, and the number of inference models K = 5 to

3
obtain the convergence of the iterative process for the MTPE 6 .0 × 1 0
algorithm and TPE algorithm. We find that the total costs of
3
both algorithms start to drop significantly after 20 iterations, 4 .0 × 1 0

but MTPE converge faster than TPE to search the optimal

3
2 .0 × 1 0
value, and the optimal value is also smaller. Since the results 1 2 2 4 3 6 4 8 6 0
stabilize after 20 iterations, we set the number of iterations of N u m b e r o f E n d D e v ic e s
the MTPE algorithm to 30 in our training process, by default.
When the number of end devices N is varied from 12 to Fig. 10. Total cost for different number of models (α = 0.8).
60 with the fixed number of models K = 5, the total costs
of processing tasks generated by different numbers of devices The model selection under different task complexity with
under three offloading environments are calculated in Fig. 9. the number of end devices N = 60 is shown in Fig. 11.
We observe that with the increase of end devices and the In general, the advantage of multi-model is that lightweight
number of tasks, the total cost of task consumption gradually models can meet basic requirements without selecting complex
increases. However, it is obvious that the cost of edge-end models with long response time. All five models can be
collaboration is 36.10% and 54.03% lower than offloading all selected at the easy task level. However, with the increase
tasks to the servers or end devices, respectively. Therefore, of task complexity, the number of selected models decreases

to meet the requirement of accuracy. At the hard task level, SA and the original TPE algorithm. The result in contrast to
only Resnet50 can be selected for edge devices. the baseline algorithm, Randomized Search, is shown in Fig.
13 with α = 0.8, K = 5. Our MTPE algorithm can obtain
1 .0 the optimal solution by using fewer initialization decision
parameters. It can be concluded that our algorithm MTPE
0 .9
is superior to the DDQN algorithm and the meta-heuristic
0 .8 algorithm for solving the offloading problem in multi-model
edge computing framework, which reduces the total cost by
P ro p o rtio n o f E a c h M o d e l

0 .7
37.79%, 23.01%, 3.14%, 20.09% and 19.90% on average
0 .6
compared with Randomized Search, DDQN, TPE, EGAPSO
0 .5 and SA, respectively.
0 .4
re s n e t5 0 4
s h u ffle n e tv 2 2 .5 × 1 0
0 .3
m o b ile n e tv 3 R a n d o m
0 .2 g h o s tn e t D D Q N
m o b ile n e t0 .2 5 2 .0 × 1 0 4 E G A P S O
0 .1 lo c a l S A
T P E
0 .0 M T P E
e a sy m e d iu m h a rd

T o ta l C o s t f(x )
4
1 .5 × 1 0
C o m p le x ity

4
Fig. 11. Model selection (α = 0.8, K = 5, N = 60). 1 .0 × 1 0

To further observe the influence of the two factors of time 3

5 .0 × 1 0
and energy consumption on the experimental results, we vary
the weight α of the objective function f (x) from 0 to 1
with 0.1 in Fig. 12. We find that the total cost of edge- 0 .0
1 2 2 4 3 6 4 8 6 0
end collaboration keeps decreasing steadily as the value of
N u m b e r o f E n d D e v ic e s
α increases. But with α = 0.5 as the boundary, the total
cost of executing all tasks on end devices locally starts to
Fig. 13. Contrast experiment (α = 0.8, K = 5).
be higher than that of offloading all tasks to servers which
maintains a sharp downward trend. It can be concluded that
the edge servers are beneficial to reduce the total response
time, while the end devices contribute to reducing the total V. C ONCLUSION AND F UTURE W ORK
energy consumption. For this reason, the total cost of edge-
end collaboration can be effectively reduced and maintained In this paper, we proposed a multi-model edge computing
at a stable level. offloading framework, using embedded edge devices NVIDIA
Jetson and GeForce RTX GPU servers to simulate the edge
7 × 1 0 4 computing environment of real AI applications. We compre-
a ll o n e n d
hensively considered the accuracy, time and energy consump-
6 × 1 0 4 a ll o n e d g e tion of inference tasks. To work out the lowest total cost of
e d g e -e n d c o lla b o ra tio n the system within a time slot, we put forward a Bayesian
5 × 1 0 4 Optimization algorithm using MTPE, and theoretically indi-
cated that our algorithm can find an optimal solution under
T o ta l C o s t f(x )

4 × 1 0 4
less iterations. To ensure the stability of the energy queue
between time slots, we also employed the Lyapunov drift
4
3 × 1 0 function to solve the harvesting energy between the time
slots. Through comparative experiments, we verified that the
4
2 × 1 0
multi-model edge computing offloading framework achieved
4
satisfactory results in communication cost and computation
1 × 1 0
cost, and ensured the high accuracy of inference tasks. Com-
0
pared with the original TPE algorithm, the state-of-the-art
0 .0 0 .1 0 .2 0 .3 0 .4 0 .5 0 .6 0 .7 0 .8 0 .9 1 .0 DDQN algorithm and the meta-heuristic algorithms, EGAPSO
α v a l u e and SA, our algorithm was superior for solving offloading
problems at the lowest cost. In the future, we will concentrate
Fig. 12. Total cost for different α Value (K = 5, N = 60). on load balancing, including the scheduling and migration
of containers on edge devices to fully utilize the computing,
Finally, we compare our algorithm with the latest DDQN al- storage, and network resources, which will also improve the
gorithm [30], meta-heuristic hybrid algorithm EGAPSO [39], reliability of the edge computing environment.

R EFERENCES survey,” IEEE Communications Surveys & Tutorials, vol. 22, no. 2,
pp. 869–904, 2020.
[1] “Statista: Number of internet of things (iot) connected devices worldwide [21] X. Li, J. Xia, L. Cao, G. Zhang, and X. Feng, “Driver fatigue detection
from 2019 to 2030(2020)..” https://www.statista.com/statistics/1183457/ based on convolutional neural network and face alignment for edge
iot-connected-devices-worldwide/. computing device,” Proceedings of the Institution of Mechanical Engi-
[2] M. A. Rahman and M. S. Hossain, “An internet-of-medical-things- neers, Part D: Journal of Automobile Engineering, vol. 235, no. 10-11,
enabled edge computing framework for tackling covid-19,” IEEE In- pp. 2699–2711, 2021.
ternet of Things Journal, vol. 8, no. 21, pp. 15847–15854, 2021. [22] C. Liu, Y. Cao, Y. Luo, G. Chen, V. Vokkarane, M. Yunsheng, S. Chen,
[3] S. Wan, S. Ding, and C. Chen, “Edge computing enabled video segmen- and P. Hou, “A new deep learning-based food recognition system for
tation for real-time traffic monitoring in internet of vehicles,” Pattern dietary assessment on an edge computing service infrastructure,” IEEE
Recognition, vol. 121, p. 108146, 2022. Transactions on Services Computing, vol. 11, no. 2, pp. 249–261, 2017.
[4] L. Fan and L. Zhang, “Multi-system fusion based on deep neural [23] A. Rocha Neto, T. P. Silva, T. Batista, F. C. Delicato, P. F. Pires, and
network and cloud edge computing and its application in intelligent F. Lopes, “Leveraging edge intelligence for video analytics in smart city
manufacturing,” Neural Computing and Applications, vol. 34, no. 5, applications,” Information, vol. 12, no. 1, p. 14, 2020.
pp. 3411–3420, 2022. [24] R. Rajavel, S. K. Ravichandran, K. Harimoorthy, P. Nagappan, and
[5] M. A. Guillén, A. Llanes, B. Imbernón, R. Martı́nez-España, A. Bueno- K. R. Gobichettipalayam, “Iot-based smart healthcare video surveillance
Crespo, J.-C. Cano, and J. M. Cecilia, “Performance evaluation of edge- system using edge computing,” Journal of Ambient Intelligence and
computing platforms for the prediction of low temperatures in agriculture Humanized Computing, pp. 1–13, 2021.
using deep learning,” The Journal of Supercomputing, vol. 77, no. 1, [25] M. Lapegna, W. Balzano, N. Meyer, and D. Romano, “Clustering algo-
pp. 818–840, 2021. rithms on low-power and high-performance devices for edge computing
[6] R. Rajavel, S. K. Ravichandran, K. Harimoorthy, P. Nagappan, and environments,” Sensors, vol. 21, no. 16, p. 5395, 2021.
K. R. Gobichettipalayam, “Iot-based smart healthcare video surveillance [26] T. V. Pham, N. N. Q. Tran, H. M. Pham, T. M. Nguyen, and T. Ta Minh,
system using edge computing,” Journal of Ambient Intelligence and “Efficient low-latency dynamic licensing for deep neural network de-
Humanized Computing, vol. 13, no. 6, pp. 3195–3207, 2022. ployment on edge devices,” in 2020 The 3rd International Conference
[7] Z. Ali, Z. H. Abbas, G. Abbas, A. Numani, and M. Bilal, “Smart on Computational Intelligence and Intelligent Systems, pp. 44–49, 2020.
computational offloading for mobile edge computing in next-generation [27] A. Khakimov, I. A. Elgendy, A. Muthanna, E. Mokrov, K. Samouylov,
internet of things networks,” Computer Networks, vol. 198, p. 108356, Y. Maleh, and A. A. Abd El-Latif, “Flexible architecture for deployment
2021. of edge computing applications,” Simulation Modelling Practice and
Theory, vol. 114, p. 102402, 2022.
[8] A. S. Mohammed, K. Venkatachalam, S. Hubálovskỳ, P. Trojovskỳ, and
[28] Z. Shahbazi and Y.-C. Byun, “Improving transactional data system
P. Prabu, “Smart edge computing for 5 g/6 g satellite iot for reducing
based on an edge computing–blockchain–machine learning integrated
inter transmission delay,” Mobile Networks and Applications, pp. 1–10,
framework,” Processes, vol. 9, no. 1, p. 92, 2021.
2022.
[29] S. Iftikhar, S. S. Gill, C. Song, M. Xu, M. S. Aslanpour, A. N. Toosi,
[9] A. I. Tahirkheli, M. Shiraz, B. Hayat, M. Idrees, A. Sajid, R. Ullah,
J. Du, H. Wu, S. Ghosh, D. Chowdhury, et al., “Ai-based fog and
N. Ayub, and K.-I. Kim, “A survey on modern cloud computing
edge computing: A systematic review, taxonomy and future directions,”
security over smart city networks: Threats, vulnerabilities, consequences,
Internet of Things, p. 100674, 2022.
countermeasures, and challenges,” Electronics, vol. 10, no. 15, p. 1811,
[30] Y. Tu, H. Chen, L. Yan, and X. Zhou, “Task offloading based on lstm
2021.
prediction and deep reinforcement learning for efficient edge computing
[10] M. M. Sadeeq, N. M. Abdulkareem, S. R. Zeebaree, D. M. Ahmed, A. S.
in iot,” Future Internet, vol. 14, no. 2, p. 30, 2022.
Sami, and R. R. Zebari, “Iot and cloud computing issues, challenges
[31] J. Wang and L. Wang, “Mobile edge computing task distribution and
and opportunities: A review,” Qubahan Academic Journal, vol. 1, no. 2,
offloading algorithm based on deep reinforcement learning in internet of
pp. 1–7, 2021.
vehicles,” Journal of Ambient Intelligence and Humanized Computing,
[11] W. Shi, C. Jie, Z. Quan, Y. Li, and L. Xu, “Edge computing: Vision and pp. 1–11, 2021.
challenges,” Internet of Things Journal, IEEE, vol. 3, no. 5, pp. 637–646, [32] X. Li, “A computing offloading resource allocation scheme using deep
2016. reinforcement learning in mobile edge computing systems,” Journal of
[12] S. Yi, L. Cheng, and Q. Li, “A survey of fog computing: Concepts, Grid Computing, vol. 19, no. 3, pp. 1–12, 2021.
applications, and issues,” in Proceedings of the 2015 Workshop on [33] L. Ale, N. Zhang, X. Fang, X. Chen, S. Wu, and L. Li, “Delay-aware
Mobile Big Data (Mobidata ’15), 2015. and energy-efficient computation offloading in mobile-edge computing
[13] U. Shaukat, E. Ahmed, Z. Anwar, and F. Xia, “Cloudlet deployment using deep reinforcement learning,” IEEE Transactions on Cognitive
in local wireless networks: Motivation, architectures, applications, and Communications and Networking, vol. 7, no. 3, pp. 881–892, 2021.
open challenges,” Journal of Network & Computer Applications, vol. 62, [34] H. Wu, K. Wolter, P. Jiao, Y. Deng, Y. Zhao, and M. Xu, “Eedto:
no. Feb., pp. 18–40, 2016. an energy-efficient dynamic task offloading algorithm for blockchain-
[14] J. Ren, D. Zhang, S. He, Y. Zhang, and T. Li, “A survey on end- enabled iot-edge-cloud orchestrated computing,” IEEE Internet of Things
edge-cloud orchestrated network computing paradigms: Transparent Journal, vol. 8, no. 4, pp. 2163–2176, 2020.
computing, mobile edge computing, fog computing, and cloudlet,” ACM [35] Z. Chang, L. Liu, X. Guo, and Q. Sheng, “Dynamic resource alloca-
Computing Surveys (CSUR), vol. 52, no. 6, pp. 1–36, 2019. tion and computation offloading for iot fog computing system,” IEEE
[15] W.-J. Chang, L.-B. Chen, C.-Y. Sie, and C.-H. Yang, “An artificial Transactions on Industrial Informatics, vol. 17, no. 5, pp. 3348–3357,
intelligence edge computing-based assistive system for visually impaired 2020.
pedestrian safety at zebra crossings,” IEEE Transactions on Consumer [36] Z. Ning, P. Dong, X. Wang, S. Wang, X. Hu, S. Guo, T. Qiu,
Electronics, vol. 67, no. 1, pp. 3–11, 2020. B. Hu, and R. Y. Kwok, “Distributed and dynamic service placement
[16] X. Kong, K. Wang, S. Wang, X. Wang, X. Jiang, Y. Guo, G. Shen, in pervasive edge computing networks,” IEEE Transactions on Parallel
X. Chen, and Q. Ni, “Real-time mask identification for covid-19: and Distributed Systems, vol. 32, no. 6, pp. 1277–1292, 2020.
An edge-computing-based deep learning framework,” IEEE Internet of [37] S. Bi, L. Huang, H. Wang, and Y.-J. A. Zhang, “Lyapunov-guided
Things Journal, vol. 8, no. 21, pp. 15929–15938, 2021. deep reinforcement learning for stable online computation offloading
[17] N. Balamuralidhar, S. Tilon, and F. Nex, “Multeye: monitoring system in mobile-edge computing networks,” IEEE Transactions on Wireless
for real-time vehicle detection, tracking and speed estimation from uav Communications, vol. 20, no. 11, pp. 7519–7537, 2021.
imagery on edge-computing platforms,” Remote Sensing, vol. 13, no. 4, [38] J. Zhang, L. Zhou, Q. Tang, E. C.-H. Ngai, X. Hu, H. Zhao, and
p. 573, 2021. J. Wei, “Stochastic computation offloading and trajectory scheduling for
[18] A. Koubaa, A. Ammar, A. Kanhouch, and Y. Alhabashi, “Cloud versus uav-assisted mobile edge computing,” IEEE Internet of Things Journal,
edge deployment strategies of real-time face recognition inference,” vol. 6, no. 2, pp. 3688–3699, 2018.
IEEE Transactions on Network Science and Engineering, vol. PP, no. 99, [39] B. Natesha and R. M. R. Guddeti, “Meta-heuristic based hybrid service
pp. 1–1, 2021. placement strategies for two-level fog computing architecture,” Journal
[19] H. Lin, S. Zeadally, Z. Chen, H. Labiod, and L. Wang, “A survey of Network and Systems Management, vol. 30, no. 3, pp. 1–23, 2022.
on computation offloading modeling for edge computing,” Journal of [40] M. Xue, H. Wu, R. Li, M. Xu, and P. Jiao, “Eosdnn: An efficient
Network and Computer Applications, vol. 169, p. 102781, 2020. offloading scheme for dnn inference acceleration in local-edge-cloud
[20] X. Wang, Y. Han, V. C. Leung, D. Niyato, X. Yan, and X. Chen, collaborative environments,” IEEE Transactions on Green Communica-
“Convergence of edge computing and deep learning: A comprehensive tions and Networking, vol. 6, no. 1, pp. 248–264, 2021.

[41] J. Bi, H. Yuan, S. Duanmu, M. Zhou, and A. Abusorrah, “Energy-

optimized partial computation offloading in mobile-edge computing with
genetic simulated-annealing-based particle swarm optimization,” IEEE
Internet of Things Journal, vol. 8, no. 5, pp. 3774–3785, 2020.
[42] L. Kuang, T. Gong, S. OuYang, H. Gao, and S. Deng, “Offloading
decision methods for multiple users with structured tasks in edge
computing for smart cities,” Future Generation Computer Systems,
vol. 105, pp. 717–729, 2020.
[43] G. Li, Y. Liu, J. Wu, D. Lin, and S. Zhao, “Methods of resource
scheduling based on optimized fuzzy clustering in fog computing,”
Sensors, vol. 19, no. 9, p. 2122, 2019.
[44] C. Deng, X. Fang, and X. Wang, “Uav-enabled mobile edge comput-
ing for ai applications: Joint model decision, resource allocation and
trajectory optimization,” IEEE Internet of Things Journal, 2022.
[45] Q. Tang, Z. Fei, B. Li, and Z. Han, “Computation offloading in leo
satellite networks with hybrid cloud and edge computing,” IEEE Internet
of Things Journal, vol. 8, no. 11, pp. 9164–9176, 2021.
[46] J. Bergstra, R. Bardenet, Y. Bengio, and B. Kégl, “Algorithms for hyper-
parameter optimization,” Advances in neural information processing
systems, vol. 24, 2011.
[47] “Wider face: A face detection benchmark..” http://shuoyang1213.me/
WIDERFACE/index.html.
[48] J. Deng, J. Guo, E. Ververas, I. Kotsia, and S. Zafeiriou, “Retinaface:
Single-shot multi-level face localisation in the wild,” in Proceedings of
the IEEE/CVF conference on computer vision and pattern recognition,
pp. 5203–5212, 2020.