
IEEE INTERNET OF THINGS JOURNAL, VOL. 11, NO. 8, 15 APRIL 2024

Dynamic Split Computing Framework in Distributed Serverless Edge Clouds

Haneul Ko, Senior Member, IEEE, Hyeonjae Jeong, Daeyoung Jung, and Sangheon Pack, Senior Member, IEEE

Abstract—Distributed serverless edge clouds and split computing are promising technologies to reduce the inference latency of large-scale deep neural networks (DNNs). In this article, we propose a dynamic split computing framework (DSCF) in distributed serverless edge clouds. In DSCF, the edge cloud orchestrator dynamically determines 1) the splitting point and 2) the warm status maintenance of container instances (i.e., whether or not to maintain each container instance in a warm status). For optimal decisions, we formulate a constrained Markov decision process (CMDP) problem to minimize the inference latency while maintaining the average resource consumption of the distributed edge clouds below a certain level. The optimal stochastic policy can be obtained by converting the CMDP model into a linear programming (LP) model. The evaluation results demonstrate that DSCF can achieve less than half the inference latency of the local computing scheme while maintaining sufficiently low resource consumption of the distributed edge clouds.

Index Terms—Distributed serverless edge cloud, joint optimization, split computing, warm start.

Manuscript received 22 June 2023; revised 28 November 2023; accepted 10 December 2023. Date of publication 13 December 2023; date of current version 9 April 2024. This work was supported by the Institute for Information and Communications Technology Planning and Evaluation (IITP) under Grant 2022-0-01015 and Grant 2022-0-00531. (Corresponding author: Sangheon Pack.)
Haneul Ko is with the Department of Electronic Engineering, Kyung Hee University, Yongin 17104, Gyeonggi, South Korea (e-mail: [email protected]).
Hyeonjae Jeong, Daeyoung Jung, and Sangheon Pack are with the School of Electrical Engineering, Korea University, Seoul 02841, South Korea (e-mail: [email protected]; [email protected]; [email protected]).
Digital Object Identifier 10.1109/JIOT.2023.3342438

I. INTRODUCTION

Deep neural networks (DNNs) are among the most widely used approaches in current intelligent mobile applications and have become increasingly popular due to their accurate and reliable inference ability [1], [2]. Basically, for inference using DNNs, a local computing-based inference method can be considered, in which a mobile device performs all necessary computations for inference [3], [4]. In this method, there is no help from the network side, and thus complex DNN models cannot be executed on the mobile device due to its limited resources.

Meanwhile, serverless computing can also be used for DNN inference.1 That is, a mobile device just defines a function to carry out the inference on the cloud side. Then, it transmits the request message that invokes the container instance carrying out the defined function, together with the raw input data (e.g., a video stream or speech), to the cloud. The container instance then performs the inference and returns the result to the mobile device.

1 Note that, in serverless computing, a mobile device does not need to manage dedicated servers and/or container instances for the inference of DNNs [5], [6], which implies increased resource efficiency and cloud manageability [7], [8].

Meanwhile, edge computing [9], [10] has been introduced as an emerging computing paradigm, in which multiple edge clouds are deployed near mobile devices and perform computing-intensive tasks (e.g., inference). As a result, it potentially addresses the high communication latency of cloud computing caused by the long distance between mobile devices and the cloud.

To take advantage of serverless computing and edge computing jointly, serverless edge computing-based inference models can be considered. However, standalone edge clouds are not sufficient to further reduce the inference latency of large-scale DNN models, and thus we consider distributed serverless edge clouds, where multiple edge clouds are deployed and an orchestrator distributes serverless tasks depending on the status of networking and computing resources. Specifically, a given DNN model is divided into two subnetworks at a certain splitting layer. The first subnetwork, from the first layer to the splitting layer, is called the head model, and the second subnetwork, from the layer after the splitting layer to the last layer, is called the tail model. These head and tail models are constructed as container instances and run at edge clouds. The DNN inference latency then depends on the splitting point and on whether the corresponding container instances are maintained as warm statuses.

In this article, we propose a dynamic split computing framework (DSCF) in which the orchestrator carefully decides the splitting point and the warm status maintenance of container instances (i.e., whether or not to maintain each container instance in a warm status). For optimal decisions, we formulate a constrained Markov decision process (CMDP) problem. The optimal stochastic policy can be obtained by converting the CMDP model into a linear programming (LP) model. The evaluation results show that DSCF can achieve less than half the inference latency of the local computing scheme while maintaining a sufficiently low resource consumption of edge clouds. Furthermore, DSCF can adaptively adjust its policies on the splitting point and the maintenance of warm statuses by considering its operational environment.

The key contributions of this article can be summarized as follows. First, when employing the split computing approach in distributed serverless edge clouds, the inference latency
depends on the splitting point and on whether the corresponding container instances are kept in warm statuses. Therefore, to minimize the DNN inference latency, it is essential to jointly optimize the splitting point and the status maintenance of container instances. Nevertheless, previous works have not addressed this joint optimization; that is, the existing work focused solely on determining the optimal splitting point without considering the status of container instances. Consequently, even with the optimal splitting point, a sufficiently low inference latency cannot be obtained due to the increased latency for container initialization. On the contrary, DSCF jointly optimizes the two decisions on the splitting point and the status maintenance of container instances; thus, it can efficiently reduce the inference latency. Second, to assess the effectiveness of the proposed framework in real-world scenarios, we measured the inference latency of head and tail models under different splitting points and computing environments. Using these measured inference latencies, we conducted comprehensive evaluations and analyzed the results, offering significant guidance for optimal performance in terms of the inference latency and resource consumption of distributed edge clouds.

The remainder of this article is organized as follows. Section II summarizes the related works. Section III presents DSCF, and Section IV describes the CMDP model for the optimal operation of DSCF. Section V presents the evaluation results, followed by the concluding remarks in Section VI.

II. RELATED WORK

Various works have been conducted to reduce the inference latency in split computing [11], [12], [13], [14], [15], [16], [17], [18], [19], [20], [21], [22], [23]. These works can be classified into: 1) 3GPP standardization in split computing [11], [12]; 2) pure split computing strategies [13], [14], [15], [16], [17], [18]; 3) split computing with the concept of early exit [19], [20]; and 4) split computing with compression methods [21], [22], [23].

3GPP analyzed the performance of split computing in various environments [11] and derived some issues to support split computing in mobile networks [12].

Kang et al. [13] designed an automatic model splitting mechanism that consists of deployment and runtime phases. In the deployment phase, prediction models for each layer's performance (i.e., energy consumption and latency) are constructed. Based on the constructed prediction models, in the runtime phase, the splitting point is dynamically decided by considering which metric is more important than the other. Yan et al. [14] conducted a joint optimization on the model placement and the splitting point to minimize a cost function consisting of energy consumption and inference latency, considering the dynamics of wireless channels. In addition, they derived a closed-form solution for a specific DNN model. Eshratifar et al. [15] formulated an integer LP problem to decide on more than one splitting point. They obtained a suboptimal solution by converting the formulated problem to the well-known shortest-path problem. He et al. [16] formulated a joint optimization problem for the splitting point and resource allocation based on a queuing model that evaluates the inference latency. To solve the formulated problem in a practical manner, they decomposed the problem into subproblems and developed a heuristic algorithm. Tang et al. [17] investigated an optimization problem to split the DNN model while taking into account the resource-constrained edge cloud. In addition, they devised an algorithm that exploits the structural aspects of the defined problem to find a solution in polynomial time. Deb et al. [18] proposed a decentralized multiuser computation offloading mechanism using game theory to appropriately offload tasks to nearby edge clouds.

Li et al. [19] proposed a model splitting framework considering the early-exit concept, where the inference can be terminated at an appropriate intermediate layer. The framework jointly optimizes the exit and splitting points to maximize the accuracy of the inference while keeping the inference latency below a certain level. Laskaridis et al. [20] proposed a framework that continuously monitors network and mobile device resources and determines the splitting and early-exit points with consideration of the application service-level agreement (SLA).

Dehkordi et al. [21] formulated an optimization problem to decide the splitting point and the quantization level of the weights in the DNN to reduce the inference latency without degradation of accuracy. In addition, they proposed a heuristic algorithm using properties of a directed acyclic graph (DAG) to obtain a suboptimal solution. Krouka et al. [22] presented a method that performs pruning and compression to further reduce the energy consumption of the mobile device while guaranteeing the accuracy of the inference before splitting the DNN model. Zhou et al. [23] proposed a method using model pruning and compression of the feature map at the splitting point to reduce the inference latency.

However, there is no work considering the split computing approach for DNN inference in distributed serverless edge clouds.

III. DYNAMIC SPLIT COMPUTING FRAMEWORK

Fig. 1. System model.

Fig. 1 shows the proposed DSCF in distributed serverless edge clouds. We consider two edge clouds (i.e., the head and tail edge clouds) in which the head and tail models, respectively, are installed as container instances. Furthermore, these edge clouds are managed by an edge cloud orchestrator. DSCF exploits the split computing approach, in which the whole DNN model with L layers is split into head and tail models. Note that any DNN model can be considered in DSCF if it can be split into head and tail models. For dynamic splitting of the DNN, the head and tail edge clouds have all possible container instances for the head models from the first layer to the lth layer, I_{H,l} (for ∀l), and for the tail models after the lth layer, I_{T,l} (for ∀l), respectively.
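A minimal sketch of this head/tail construction is given below; the toy sequential network and the splitting point l = 4 are illustrative assumptions rather than an actual deployment.

```python
# Minimal sketch: splitting an L-layer sequential DNN into head/tail models at layer l.
import torch
import torch.nn as nn

def split_model(model: nn.Sequential, l: int):
    """Return (head, tail) covering layers 1..l and l+1..L, respectively."""
    layers = list(model.children())
    return nn.Sequential(*layers[:l]), nn.Sequential(*layers[l:])

# Toy stand-in for a DNN; a real deployment would use, e.g., VGG16.
model = nn.Sequential(
    nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
    nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(32, 10),
)

head, tail = split_model(model, l=4)        # head model I_{H,4}, tail model I_{T,4}
x = torch.randn(1, 3, 32, 32)               # raw input sent by the mobile device
intermediate = head(x)                      # output of the head model (intermediate data)
result = tail(intermediate)                 # final inference result at the tail edge cloud
assert torch.allclose(result, model(x))     # splitting preserves the end-to-end output
```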
When the mobile device needs the result of an inference, it requests the execution of the DNN inference from the edge cloud orchestrator. The request message contains the input data for the inference. After receiving the inference request message, the orchestrator decides the splitting point and requests the
execution of the container instance for the corresponding head model from the head edge cloud.2 After receiving this request message, if the container instance for the corresponding head model, I_{H,l}, is not maintained as a warm status, the head edge cloud performs a cold start to initialize the container instance. Then, it conducts the inference and obtains the intermediate data (i.e., the output of the head model). After that, it sends a request message with the intermediate data to the tail edge cloud.3 After receiving the message, the tail edge cloud conducts a cold start to initialize the container instance when the container instance of the corresponding tail model, I_{T,l}, is not in a warm status. After completing the cold start, the tail edge cloud can obtain the final inference result by executing the tail model with the intermediate data as input. After obtaining the result, the tail edge cloud returns it to the mobile device.

2 Since a container instance is used by a single mobile device in general serverless computing [24], [25] and the mobile device is assumed to generate a single request, we do not need to consider the case of concurrent requests for the target container, and thus no queuing latency is considered in this article.

3 Note that if the head edge cloud executes the whole model, it returns the inference result to the mobile device.
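The request flow described above can be sketched as follows; the class and function names, the warm-instance bookkeeping, and the 2.0-s cold-start delay are illustrative assumptions rather than an actual DSCF implementation.

```python
# Illustrative sketch of the DSCF request flow (mobile -> orchestrator -> head -> tail edge cloud).
import time

COLD_START_DELAY = 2.0              # assumed cold-start latency in seconds (cf. [26])

class EdgeCloud:
    """Hosts one container instance per candidate splitting point l."""
    def __init__(self, instances):
        self.instances = instances  # dict: splitting point l -> callable sub-model
        self.warm = set()           # splitting points whose instances are kept warm

    def run(self, l, data):
        if l not in self.warm:      # cold start: initialize the instance before inference
            time.sleep(COLD_START_DELAY)
            self.warm.add(l)
        return self.instances[l](data)

def handle_request(choose_split, head_cloud, tail_cloud, x, L):
    """Orchestrator-side handling of one inference request from a mobile device."""
    l = choose_split(x)                          # decide the splitting point
    if l == 0:                                   # entire model runs at the tail edge cloud
        return tail_cloud.run(0, x)
    intermediate = head_cloud.run(l, x)          # head model I_{H,l}
    if l == L:                                   # entire model already ran at the head edge cloud
        return intermediate
    return tail_cloud.run(l, intermediate)       # tail model I_{T,l}

# Toy usage with dummy "models" that just transform a number.
L = 3
head = EdgeCloud({l: (lambda d, l=l: d + l) for l in range(1, L + 1)})
tail = EdgeCloud({l: (lambda d, l=l: d * 10) for l in range(0, L)})
print(handle_request(lambda x: 2, head, tail, x=1.0, L=L))   # -> (1.0 + 2) * 10 = 30.0
```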
As shown in this procedure, the total inference latency of DSCF consists of 1) the transmission latency of the input data; 2) the initialization and inference latency of the head model; 3) the transmission latency of the intermediate data; and 4) the initialization and inference latency of the tail model.

The inference latency of the head model, the transmission latency of the intermediate data, and the inference latency of the tail model depend on which layer of the DNN is split (i.e., the splitting point) [3]. In addition, they are influenced by the transmission rate between the head and tail edge clouds and by their available computing powers. Specifically, the inference latency of the head model ζ_{H,I} depends on the splitting point and the available computing power of the head edge cloud. Let F_H^l and C_H denote the number of floating-point operations (FLOPs) of the head model with the lth splitting point and the available computing power of the head edge cloud, respectively. Then, the inference latency of the head model can be calculated as ζ_{H,I} = F_H^l / C_H. Similarly, the inference latency of the tail model can be derived as ζ_{T,I} = F_T^l / C_T, where F_T^l and C_T denote the number of FLOPs of the tail model with the lth splitting point and the available computing power of the tail edge cloud, respectively. Meanwhile, the latency to transmit the intermediate data from the head edge cloud to the tail edge cloud, ζ_D, can be obtained as ζ_D = D_l / R, where D_l and R are the intermediate data size at the lth splitting point and the transmission rate between the head and tail edge clouds, respectively. Therefore, to decide the optimal splitting point, the orchestrator collects some information (i.e., the transmission rate between the head and tail edge clouds and the available computing powers of the head and tail edge clouds). Then, considering this information, it dynamically decides the optimal splitting point.
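A minimal sketch of this splitting-point decision is given below; the FLOP counts, intermediate data sizes, and resource values are placeholder assumptions, and cold starts are ignored here (they are handled through the warm-status decision in Section IV).

```python
# Sketch: estimated end-to-end latency for splitting point l and a greedy choice of l.
# zeta_{H,I} = F_H^l / C_H, zeta_D = D_l / R, zeta_{T,I} = F_T^l / C_T (cold starts omitted).

def total_latency(l, F_head, F_tail, D, C_H, C_T, R, zeta_I):
    """Estimated inference latency (s) when the DNN is split after layer l."""
    zeta_H_I = F_head[l] / C_H          # head model inference latency
    zeta_T_I = F_tail[l] / C_T          # tail model inference latency
    zeta_D = D[l] / R                   # intermediate data transmission latency
    return zeta_I + zeta_H_I + zeta_D + zeta_T_I

# Placeholder per-splitting-point profiles and resource status (assumed values).
F_head = {1: 2.0, 2: 6.0, 3: 11.0, 4: 14.0, 5: 15.4}    # cumulative head FLOPs (GFLOP)
F_tail = {l: 15.5 - f for l, f in F_head.items()}        # remaining tail FLOPs (GFLOP)
D      = {1: 12.8, 2: 6.4, 3: 3.2, 4: 1.6, 5: 0.02}      # intermediate data size (MB)
C_H, C_T = 50.0, 100.0                                    # available computing power (GFLOP/s)
R = 131.2 / 8                                             # transmission rate (MB/s)
zeta_I = 0.05                                             # input transmission latency (s)

best_l = min(F_head, key=lambda l: total_latency(l, F_head, F_tail, D, C_H, C_T, R, zeta_I))
print(best_l, round(total_latency(best_l, F_head, F_tail, D, C_H, C_T, R, zeta_I), 3))
```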
However, even if the optimal splitting point is applied, when the container instances for the corresponding head and tail models are not maintained as warm statuses, the edge cloud has to conduct a cold start, which takes a long time (over 2 s [26]). Due to this long cold-start latency (i.e., the initialization latency of the head and tail models), the inference result cannot be returned within a short duration. If the container instances for all head and tail models are maintained as warm statuses, the inference result can be obtained without any concern about cold-start latency, regardless of the chosen splitting point. However, keeping instances warm requires a considerable amount of resources. Thus, if too many container instances are kept in warm statuses, the edge cloud cannot provide more container instances for newly specified functions due to its limited resources [27], [28]. To avoid this situation, the resources consumed to maintain warm statuses should be kept below a particular threshold. That is, the edge cloud orchestrator should carefully decide which container instances are kept in warm statuses and which are not.

In summary, the inference latency is affected by the splitting point and by whether each container instance is maintained in a warm status or not. In addition, the resource consumption of the edge cloud is determined by whether each container instance is maintained in a warm status or not. Therefore, DSCF conducts a joint optimization on the splitting point and the warm status maintenance of container instances. The following section describes the CMDP problem formulation, whose objective is to minimize the inference latency while maintaining the average resource consumption of the distributed edge clouds below a certain level. In addition, the CMDP problem is converted to an equivalent LP model to obtain the optimal solution.

IV. CONSTRAINED MARKOV DECISION PROCESS

In this article, the edge cloud orchestrator decides the splitting point and whether to maintain each container instance in a warm status or not at the time epochs T = {1, 2, 3, . . .}. Meanwhile, the CMDP model is suitable for optimization problems where the agent performs a series of particular actions to minimize (or maximize) the cost (or reward) under constraints [29]. Therefore, we exploit the CMDP model to optimize the decisions of the edge cloud orchestrator. Table I summarizes the important notations.

A. State Space

The total state space S can be defined as

    S = C_H × C_T × R × ∏_l (W_{H,l} × W_{T,l})    (1)
TABLE I: Important Notations.

where C_H and C_T are the state spaces for the available computing powers of the head and tail edge clouds, respectively, and R is the state space for the transmission rate between the head and tail edge clouds. W_{H,l} (or W_{T,l}) denotes the state space indicating whether the container instance for the head model from the first layer to the lth layer, I_{H,l} (or the container instance for the tail model after the lth layer, I_{T,l}), is in a warm status or not.

Let C_E^max denote the maximum computing power of an edge cloud. Since the elements of C_H and C_T can be discretized with the minimum scale u_C (i.e., unit computing power), C_H and C_T can be represented by

    C_H = {u_C, 2u_C, . . . , C_E^max}    (2)

and

    C_T = {u_C, 2u_C, . . . , C_E^max}.    (3)

Similar to C_H and C_T, R can be discretized with the minimum scale u_R (i.e., unit transmission rate). Then, R can be expressed as

    R = {R_min, R_min + u_R, . . . , R_max}    (4)

where R_min and R_max are the minimum and maximum transmission rates, respectively.

A container instance can be in a warm or a cold status. Therefore, W_{H,l} and W_{T,l} can be described as

    W_{H,l} = {0, 1}    (5)

and

    W_{T,l} = {0, 1}    (6)

where W_{H,l} (or W_{T,l}) denotes whether or not I_{H,l} (or I_{T,l}) is in a warm status. That is, if W_{H,l} = 1, I_{H,l} is in a warm status; otherwise, I_{H,l} is in a cold status.

B. Action Space

The total action space can be defined as

    A = A_S × ∏_l (A_{W,H,l} × A_{W,T,l})    (7)

where A_S denotes the splitting point action space. In addition, A_{W,H,l} and A_{W,T,l} represent the warming action spaces for I_{H,l} and I_{T,l}, respectively.

Since the model has L layers, A_S can be represented as

    A_S = {0, 1, . . . , L}.    (8)

Note that A_S = 0 represents the case where the entire model is carried out at the tail edge cloud (i.e., the tail model is equal to the entire model). On the other hand, A_S = L describes the case where the whole model is conducted at the head edge cloud (i.e., the head model is equal to the whole model).

The head and tail edge clouds can make each container instance I_{H,l} or I_{T,l} into a warm status, and therefore A_{W,H,l} and A_{W,T,l} can be represented as

    A_{W,H,l} = {0, 1}    (9)

and

    A_{W,T,l} = {0, 1}.    (10)
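A minimal sketch of the resulting discrete state and action spaces is given below; the values of u_C, C_E^max, u_R, R_min, R_max, and L are illustrative assumptions.

```python
# Sketch: enumerating the discretized state space S and action space A.
import itertools

L = 5                                   # number of candidate splitting layers
u_C, C_E_max = 1, 4                     # unit and maximum computing power
u_R, R_min, R_max = 20, 20, 140         # unit, minimum, and maximum transmission rate

C_H = C_T = list(range(u_C, C_E_max + 1, u_C))          # (2), (3)
R = list(range(R_min, R_max + 1, u_R))                  # (4)
W = [0, 1]                                              # (5), (6): cold / warm

# S = C_H x C_T x R x prod_l (W_{H,l} x W_{T,l})        -- (1)
states = list(itertools.product(C_H, C_T, R, *([W] * (2 * L))))

# A = A_S x prod_l (A_{W,H,l} x A_{W,T,l})              -- (7)-(10)
A_S = list(range(0, L + 1))             # 0: all layers at the tail, L: all layers at the head
actions = list(itertools.product(A_S, *([W] * (2 * L))))

print(len(states), len(actions))        # size of the (S, A) grid used by the LP in Section IV-E
```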
C. Transition Probability

Based on the chosen warming actions A_{W,H,l} and A_{W,T,l}, the next states W'_{H,l} and W'_{T,l} (representing whether the container instance is in a warm status or not) can be decided. Meanwhile, the other state transitions are not affected by the chosen action, and all states change independently. Therefore, the transition probability from the current state S = [C_H, C_T, R, W_{H,l}, W_{T,l}] to the next state S' = [C'_H, C'_T, R', W'_{H,l}, W'_{T,l}] can be described as

    P[S'|S, A] = P[C'_H|C_H] × P[C'_T|C_T] × P[R'|R] × ∏_l P[W'_{H,l}|W_{H,l}, A_{W,H,l}] × P[W'_{T,l}|W_{T,l}, A_{W,T,l}].    (11)
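The factorization in (11) can be transcribed directly as a product of per-component kernels, as in the following sketch; the placeholder kernels stand in for the statistically defined C_H, C_T, and R dynamics and for (12)-(19).

```python
# Sketch: composing P[S'|S, A] as the product of per-component kernels, as in (11).

def transition_prob(s, s_next, a, p_C, p_R, p_W):
    """s, s_next: (C_H, C_T, R, W_H, W_T); W_H, W_T are tuples over splitting points l.
    a: (A_S, A_W_H, A_W_T); the warming actions are tuples over l."""
    C_H, C_T, R, W_H, W_T = s
    C_Hn, C_Tn, Rn, W_Hn, W_Tn = s_next
    _, A_W_H, A_W_T = a
    prob = p_C(C_Hn, C_H) * p_C(C_Tn, C_T) * p_R(Rn, R)
    for l in range(len(W_H)):
        prob *= p_W(W_Hn[l], W_H[l], A_W_H[l])   # P[W'_{H,l} | W_{H,l}, A_{W,H,l}]
        prob *= p_W(W_Tn[l], W_T[l], A_W_T[l])   # P[W'_{T,l} | W_{T,l}, A_{W,T,l}]
    return prob

# Tiny demo with placeholder kernels (uniform for C/R; action-following for warm status).
p_C = lambda c_next, c: 0.25                      # 4 computing-power levels, uniform
p_R = lambda r_next, r: 1.0 / 7                   # 7 transmission-rate levels, uniform
p_W = lambda w_next, w, a_w: 1.0 if w_next == a_w else 0.0
s      = (2, 3, 60, (0, 1), (1, 0))
s_next = (1, 3, 80, (1, 1), (0, 0))
print(transition_prob(s, s_next, (2, (1, 1), (0, 0)), p_C, p_R, p_W))
```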

It is assumed that the times required for container instances I_{H,l} and I_{T,l} to complete a cold start follow exponential distributions with means 1/γ_{H,l} and 1/γ_{T,l}, respectively. Then, if container instances I_{H,l} and I_{T,l} are maintained as cold statuses (i.e., W_{H,l} = 0 and W_{T,l} = 0) and the head and tail edge clouds make those container instances into warm statuses (i.e., A_{W,H,l} = 1 and A_{W,T,l} = 1), the probabilities that the cold starts for I_{H,l} and I_{T,l} are completed within the decision epoch duration τ can be obtained as γ_{H,l}τ and γ_{T,l}τ, respectively [30]. Thus, P[W'_{H,l}|W_{H,l}, A_{W,H,l} = 1] and P[W'_{T,l}|W_{T,l}, A_{W,T,l} = 1] can, respectively, be described as

    P[W'_{H,l}|W_{H,l} = 0, A_{W,H,l} = 1] = { γ_{H,l}τ, if W'_{H,l} = 1; 1 − γ_{H,l}τ, if W'_{H,l} = 0; 0, otherwise }    (12)

and

    P[W'_{T,l}|W_{T,l} = 0, A_{W,T,l} = 1] = { γ_{T,l}τ, if W'_{T,l} = 1; 1 − γ_{T,l}τ, if W'_{T,l} = 0; 0, otherwise }.    (13)

When container instances I_{H,l} and I_{T,l} are maintained as cold statuses (i.e., W_{H,l} = 0 and W_{T,l} = 0) and the head and tail edge clouds do not make I_{H,l} and I_{T,l} into warm statuses (i.e., A_{W,H,l} = 0 and A_{W,T,l} = 0), respectively, W'_{H,l} and W'_{T,l} are always 0. Therefore, the corresponding transition probabilities can be expressed by

    P[W'_{H,l}|W_{H,l} = 0, A_{W,H,l} = 0] = { 1, if W'_{H,l} = 0; 0, otherwise }    (14)

and

    P[W'_{T,l}|W_{T,l} = 0, A_{W,T,l} = 0] = { 1, if W'_{T,l} = 0; 0, otherwise }.    (15)

Meanwhile, if the container instance for the head model I_{H,l} is in a warm status (i.e., W_{H,l} = 1), the container instance can be turned off (i.e., become cold) or be kept in a warm status without any additional delay. This indicates that the next state W'_{H,l} is always the same as the warming action A_{W,H,l}. Consequently, the corresponding transition probabilities are given as

    P[W'_{H,l}|W_{H,l} = 1, A_{W,H,l} = 1] = { 1, if W'_{H,l} = 1; 0, otherwise }    (16)

and

    P[W'_{H,l}|W_{H,l} = 1, A_{W,H,l} = 0] = { 1, if W'_{H,l} = 0; 0, otherwise }.    (17)

Similarly, for the container instance for the tail model I_{T,l} in a warm status (i.e., W_{T,l} = 1), the corresponding transition probabilities are represented as

    P[W'_{T,l}|W_{T,l} = 1, A_{W,T,l} = 1] = { 1, if W'_{T,l} = 1; 0, otherwise }    (18)

and

    P[W'_{T,l}|W_{T,l} = 1, A_{W,T,l} = 0] = { 1, if W'_{T,l} = 0; 0, otherwise }.    (19)

Note that the transition probabilities of the other states (i.e., C_H, C_T, and R) can be defined statistically.
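The warm-status kernel in (12)-(19) can be collected into a single function, as in the following sketch with an assumed value of γτ.

```python
# Sketch: warm-status transition kernel P[W' | W, A_W] per (12)-(19).

def warm_transition_prob(w_next: int, w: int, a_w: int, gamma_tau: float) -> float:
    """w, w_next: current/next warm status (1 = warm, 0 = cold); a_w: warming action."""
    if w == 0 and a_w == 1:                    # (12), (13): cold start may finish within tau
        return gamma_tau if w_next == 1 else 1.0 - gamma_tau
    if w == 0 and a_w == 0:                    # (14), (15): the instance stays cold
        return 1.0 if w_next == 0 else 0.0
    # w == 1: (16)-(19): the next status follows the warming action with no extra delay
    return 1.0 if w_next == a_w else 0.0

gamma_tau = 0.4                                # assumed gamma_{H,l} * tau
for w in (0, 1):
    for a_w in (0, 1):
        probs = [warm_transition_prob(w_next, w, a_w, gamma_tau) for w_next in (0, 1)]
        assert abs(sum(probs) - 1.0) < 1e-9    # each row is a valid distribution
        print(f"W={w}, A_W={a_w}: P[W'=0]={probs[0]:.2f}, P[W'=1]={probs[1]:.2f}")
```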
D. Cost and Constraint Functions

1) Cost Function: We define the cost function so as to minimize the inference latency. The inference latency consists of the transmission latency of the input data ζ_I, the cold-start latency of the head model ζ_{H,C}, the inference latency of the head model ζ_{H,I}, the transmission latency of the intermediate data ζ_D, the cold-start latency of the tail model ζ_{T,C}, and the inference latency of the tail model ζ_{T,I}. Note that the cold-start latencies of the head and tail models, ζ_{H,C} and ζ_{T,C}, are incurred only when the container instances for the corresponding head and tail models are not maintained as warm statuses (i.e., W_{H,l=A_S} = 0 and W_{T,l=A_S} = 0). Therefore, the cost function r(S, A) can be represented by

    r(S, A) = ζ_I + δ[W_{H,l=A_S} = 0]ζ_{H,C} + ζ_{H,I} + ζ_D + δ[W_{T,l=A_S} = 0]ζ_{T,C} + ζ_{T,I}    (20)

where δ[·] is a function that returns 1 if the given condition is true and 0 otherwise.

2) Constraint Functions: The constraint functions c_H(S, A) and c_T(S, A) for the average resource consumption of the head and tail edge clouds can be defined as

    c_H(S, A) = Σ_l δ[W_{H,l} = 1]    (21)

and

    c_T(S, A) = Σ_l δ[W_{T,l} = 1]    (22)

respectively.

E. Optimization Formulation

The average inference latency ζ_L is defined as follows:

    ζ_L = lim sup_{t→∞} (1/t) Σ_t E[r(S_t, A_t)]    (23)

where S_t and A_t are the state and the chosen action at t ∈ T, respectively.

The average resource consumption of the head and tail edge clouds, ψ_H and ψ_T, can be represented as

    ψ_H = lim sup_{t→∞} (1/t) Σ_t E[c_H(S_t, A_t)]    (24)

and

    ψ_T = lim sup_{t→∞} (1/t) Σ_t E[c_T(S_t, A_t)]    (25)

respectively.
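A minimal sketch of evaluating (20)-(22) for a given state and action is given below; the latency components and warm-status vectors are assumed example values.

```python
# Sketch: evaluating the cost r(S, A) in (20) and the constraints c_H, c_T in (21)-(22).

def cost(W_H, W_T, A_S, zeta_I, zeta_H_C, zeta_H_I, zeta_D, zeta_T_C, zeta_T_I):
    """W_H, W_T: tuples of warm statuses per splitting point l (index 0 <-> l = 1)."""
    delta_H = 1 if W_H[A_S - 1] == 0 else 0     # cold start needed at the head edge cloud
    delta_T = 1 if W_T[A_S - 1] == 0 else 0     # cold start needed at the tail edge cloud
    return (zeta_I + delta_H * zeta_H_C + zeta_H_I
            + zeta_D + delta_T * zeta_T_C + zeta_T_I)

def c_H(W_H):                                   # (21): number of warm head instances
    return sum(1 for w in W_H if w == 1)

def c_T(W_T):                                   # (22): number of warm tail instances
    return sum(1 for w in W_T if w == 1)

# Example with assumed latency components (seconds) and splitting point A_S = 3.
W_H, W_T = (0, 0, 1, 0, 0), (0, 0, 0, 0, 1)
print(cost(W_H, W_T, A_S=3, zeta_I=0.05, zeta_H_C=2.0, zeta_H_I=0.4,
           zeta_D=0.2, zeta_T_C=2.0, zeta_T_I=0.3))          # head warm, tail cold -> 2.95
print(c_H(W_H), c_T(W_T))                                     # one warm instance on each side
```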

We can express the CMDP model as follows:

    min_π  ζ_L    (26)
    s.t.   ψ_H ≤ θ_{H,R} and ψ_T ≤ θ_{T,R}    (27)

where π is a policy that specifies the probabilities of choosing a specific action at each state. In addition, θ_{H,R} and θ_{T,R} denote the thresholds for the resource consumption of the head and tail edge clouds, respectively.

To convert the CMDP model to an equivalent LP model, we define the stationary probabilities of state S and action A, ϕ(S, A), as the decision variables of the LP model. Then, the LP model can be represented as

    min_{ϕ(S,A)}  Σ_S Σ_A ϕ(S, A)r(S, A)    (28)

subject to

    Σ_S Σ_A ϕ(S, A)c_H(S, A) ≤ θ_{H,R}    (29)
    Σ_S Σ_A ϕ(S, A)c_T(S, A) ≤ θ_{T,R}    (30)
    Σ_A ϕ(S', A) = Σ_S Σ_A ϕ(S, A)P[S'|S, A]    (31)
    Σ_S Σ_A ϕ(S, A) = 1    (32)

and

    ϕ(S, A) ≥ 0.    (33)

The objective function (28) is to minimize the inference latency. The constraints (29) and (30) correspond to the constraints of the CMDP model in (27). In addition, (31) describes the Chapman–Kolmogorov constraint, and the probability properties are satisfied by the constraints (32) and (33).

Based on the solution ϕ*(S, A) of the LP model above, we can derive the optimal stochastic policy π*(S, A) of the CMDP model. Using the optimal stochastic policy (i.e., the optimal probability distribution), an appropriate action A can be chosen in any specific state S.
Based on the solution ϕ ∗ (S, A) of the LP model mentioned Fig. 2 shows the effect of the threshold θT,R on the resource
above, we can derive the optimal stochastic policy π ∗ (S, A) consumption of the tail edge cloud. As shown in Fig. 2, DSCF
of the CMDP model. Using the optimal stochastic policy (i.e., can minimize the average inference latency ζL [see Fig. 2(a)]
optimal probability distribution), an appropriate action A can while maintaining the average resource consumption ψT of the
be chosen in a specific state S. tail edge clouds below the target threshold θT,R [see Fig. 2(b)].
This is because DSCF performs joint optimization on the
V. E VALUATION R ESULTS splitting point and container status. More precisely, DSCF
To show the effectiveness of DSCF on the inference latency dynamically determines the splitting point taking into account
and resource consumption of the edge cloud, we design the transmission rates and the computing power available from
the following comparison schemes: 1) MOBILE where the the head and tail edge clouds. For example, when the head
inference for the entire DNN model is conducted on the mobile edge cloud has a comparatively high computing power and
device; 2) OPT-ALL-WARM where the optimal splitting point the current transmission rate is insufficient for low-latency
is used4 and all container instances are kept in warm statuses; transmissions of intermediate data, DSCF decides that the
3) SM-ALL-COLD [32] where the DNN model is split at entire execution of the DNN model takes place in the head
a specific layer providing the smallest intermediate data size edge cloud. As another example, if the available computing
and all container instances are maintained as cold statuses; power of the tail edge cloud and the transmission rate are high,
and 4) SM-CO-WARM where the DNN model is split at a only the container instance for the whole DNN model can be
specific layer providing the smallest intermediate data size and maintained as a warm status in the tail edge cloud and conduct
the only corresponding container instances are maintained as the inference.
warm statuses. Meanwhile, from Fig. 2(a), it can be found that ζL of DSCF
decreases as θT,R increases. This is because a larger θT,R
4 The splitting point of OPT-ALL-WARM is the same as that of DSCF. indicates that more container instances for tail models can be

Authorized licensed use limited to: INDIAN INSTITUTE OF INFORMATION TECHNOLOGY. Downloaded on June 29,2024 at 13:38:52 UTC from IEEE Xplore. Restrictions apply.
KO et al.: DYNAMIC SPLIT COMPUTING FRAMEWORK IN DISTRIBUTED SERVERLESS EDGE CLOUDS 14529

maintained as warm statuses (i.e., a larger θ_{T,R} implies that the tail edge cloud can probably conduct the inference without any cold start), which leads to a decreased ζ_L [see Fig. 2(a)]. On the other hand, from Fig. 2(a), it can be seen that ζ_L of the other comparison schemes remains constant regardless of variations in θ_{T,R}. This is because these comparison schemes adhere to a fixed policy without accounting for the resource consumption threshold of the tail edge cloud.

Fig. 2. Effect of θ_{T,R}. (a) Average inference latency. (b) Average resource consumption of the tail edge cloud.

B. Effect of 1/γ

Fig. 3. Effect of 1/γ. (a) Average inference latency. (b) Average resource consumption of the head edge cloud.

Fig. 3(a) and (b) shows the effect of the average cold-start latency, denoted by 1/γ, on the average inference latency ζ_L and the average resource consumption of the head edge cloud, respectively. Interestingly, from Fig. 3(a), it can be seen that the average inference latency ζ_L of DSCF decreases slightly with increasing 1/γ. This can be explained as follows. When a large 1/γ is given, DSCF aggressively maintains container instances in warm statuses to avoid situations where a cold start causes a long average inference latency. This operation of DSCF can be observed in Fig. 3(b) (i.e., the average resource consumption of the edge cloud increases with increasing 1/γ). This result implies that DSCF can achieve a higher performance gain when more complex DNN models are used. Note that the containers for complicated DNN models generally have a longer cold-start latency.

Meanwhile, OPT-ALL-WARM maintains all container instances in warm statuses. Also, in SM-CO-WARM, the DNN model is split at the specific layer providing the smallest intermediate data size, and the corresponding container instances are always maintained as warm statuses. Thus, the average inference latencies of OPT-ALL-WARM and SM-CO-WARM are constant regardless of 1/γ [see Fig. 3(a)]. On the other hand, since no container instance is maintained as a warm status in SM-ALL-COLD, its average inference latency increases significantly as 1/γ increases [see Fig. 3(a)]. This result indicates that the system operators of a serverless architecture should exploit the warm start, especially when the cold-start latency is large.

C. Effect of Average Transmission Rate

Fig. 4 illustrates the effect of the average transmission rate on the average inference latency ζ_L. It can be found that, with
the exception of MOBILE, the average inference latency of all schemes decreases as the average transmission rate increases. This trend is attributed to the fact that higher transmission rates facilitate faster transmission of the intermediate data to the edge cloud.

Fig. 4. Effect of average transmission rate on ζ_L.

Since the intermediate data sizes of SM-ALL-COLD and SM-CO-WARM are the smallest, it can be seen that their inference latency decreases only slightly. Note that SM-ALL-COLD and SM-CO-WARM split the DNN model at the specific layer providing the smallest intermediate data size. Meanwhile, since there is no intermediate data in MOBILE, its inference latency is not affected by the average transmission rate.

VI. CONCLUSION

In this article, we proposed DSCF, in which the optimal splitting point and the status of each container instance are obtained using the CMDP model. The evaluation results demonstrate that 1) DSCF can achieve less than half the inference latency (around 3.5 s) of the local computing-based inference method and 2) DSCF can achieve a higher performance gain (i.e., shorter inference latency) when a longer average cold-start latency is expected (i.e., when more complex neural networks are used). However, DSCF does not simultaneously exploit inter- and intra-layer splitting for a single neural network. By employing both inter- and intra-layer splitting simultaneously, the resources of heterogeneous edge clouds can be leveraged more fully, resulting in a notable decrease in inference latency. Therefore, in our future work, we will extend DSCF to consider both inter- and intra-layer splitting for heterogeneous cloud environments.

REFERENCES

[1] V. Sze, Y.-H. Chen, T.-J. Yang, and J. S. Emer, "Efficient processing of deep neural networks: A tutorial and survey," Proc. IEEE, vol. 105, no. 12, pp. 2295–2329, Dec. 2017.
[2] P. K. Deb, A. Mukherjee, D. Singh, and S. Misra, "Loop-the-loops: Fragmented learning over networks for constrained IoT devices," IEEE Trans. Parallel Distrib. Syst., vol. 34, no. 1, pp. 316–327, Jan. 2023.
[3] Y. Matsubara, M. Levorato, and F. Restuccia, "Split computing and early exiting for deep learning applications: Survey and research challenges," ACM Comput. Surv., vol. 55, no. 5, pp. 1–30, Dec. 2022.
[4] S. Wang, X. Zhang, H. Uchiyama, and H. Matsuda, "HiveMind: Towards cellular native machine learning model splitting," IEEE J. Sel. Areas Commun., vol. 40, no. 2, pp. 626–640, Feb. 2022.
[5] C. Cicconetti, M. Conti, and A. Passarella, "A decentralized framework for serverless edge computing in the Internet of Things," IEEE Trans. Netw. Service Manag., vol. 18, no. 2, pp. 2166–2180, Jun. 2021.
[6] S. Sarkar, R. Wankar, S. N. Srirama, and N. K. Suryadevara, "Serverless management of sensing systems for fog computing framework," IEEE Sensors J., vol. 20, no. 3, pp. 1564–1572, Feb. 2020.
[7] S. Hendrickson, S. Sturdevant, T. Harter, V. Venkataramani, A. C. Arpaci-Dusseau, and R. H. Arpaci-Dusseau, "Serverless computation with OpenLambda," in Proc. USENIX HotCloud, 2016, pp. 33–39.
[8] H. Ko, S. Pack, and V. C. M. Leung, "Performance optimization of serverless computing for latency-guaranteed and energy-efficient task offloading in energy-harvesting Industrial IoT," IEEE Internet Things J., vol. 10, no. 3, pp. 1897–1907, Feb. 2023.
[9] R. Xie, Q. Tang, C. Liang, F. R. Yu, and T. Huang, "Dynamic computation offloading in IoT fog systems with imperfect channel-state information: A POMDP approach," IEEE Internet Things J., vol. 8, no. 1, pp. 345–356, Jan. 2021.
[10] Q. Tang et al., "Decentralized computation offloading in IoT fog computing system with energy harvesting: A Dec-POMDP approach," IEEE Internet Things J., vol. 7, no. 6, pp. 4898–4911, Jun. 2020.
[11] "Study on traffic characteristics and performance requirements for AI/ML model transfer in 5GS, Version 18.2.0," 3GPP, Sophia Antipolis, France, Rep. TR 22.874, Dec. 2021.
[12] "Study on 5G system support for AI/ML-based services, Version 1.0.0," 3GPP, Sophia Antipolis, France, Rep. TR 23.700-80, Sep. 2022.
[13] Y. Kang et al., "Neurosurgeon: Collaborative intelligence between the cloud and mobile edge," in Proc. ASPLOS, 2017, pp. 615–629.
[14] J. Yan, S. Bi, and Y. A. Zhang, "Optimal model placement and online model splitting for device-edge co-inference," 2021, arXiv:2105.13618.
[15] A. E. Eshratifar, M. S. Abrishami, and M. Pedram, "JointDNN: An efficient training and inference engine for intelligent mobile cloud computing services," IEEE Trans. Mobile Comput., vol. 20, no. 2, pp. 565–576, Feb. 2021.
[16] W. He, S. Guo, S. Guo, X. Qiu, and F. Qi, "Joint DNN partition deployment and resource allocation for delay-sensitive deep learning inference in IoT," IEEE Internet Things J., vol. 7, no. 10, pp. 9241–9254, Oct. 2020.
[17] X. Tang, X. Chen, L. Zeng, S. Yu, and L. Chen, "Joint multiuser DNN partitioning and computational resource allocation for collaborative edge intelligence," IEEE Internet Things J., vol. 8, no. 12, pp. 9511–9522, Jun. 2021.
[18] P. K. Deb, C. Roy, A. Roy, and S. Misra, "DEFT: Decentralized multiuser computation offloading in a fog-enabled IoV environment," IEEE Trans. Veh. Technol., vol. 69, no. 12, pp. 15978–15987, Dec. 2020.
[19] E. Li, Z. Zhou, and X. Chen, "Edge intelligence: On-demand deep learning model co-inference with device-edge synergy," in Proc. ACM MECOMM, 2018, pp. 31–36.
[20] S. Laskaridis, S. I. Venieris, M. Almeida, I. Leontiadis, and N. D. Lane, "SPINN: Synergistic progressive inference of neural networks over device and cloud," in Proc. ACM MOBICOM, 2020, pp. 1–15.
[21] A. Dehkordi, N. Vedula, J. Pei, F. Xia, L. Wang, and Y. Zhang, "Auto-Split: A general framework of collaborative edge-cloud AI," in Proc. ACM SIGKDD Conf. KDD, 2021, pp. 2543–2553.
[22] M. Krouka, A. Elgabli, C. B. Issaid, and M. Bennis, "Energy-efficient model compression and splitting for collaborative inference over time-varying channels," 2021, arXiv:2106.00995.
[23] H. Zhou, W. Zhang, C. Wang, X. Ma, and H. Yu, "BBNet: A novel convolutional neural network structure in edge-cloud collaborative inference," MDPI Sens., vol. 21, no. 13, pp. 1–16, Jun. 2021.
[24] AWS Lambda Documentation, Amazon Web Services, Seattle, WA, USA. Accessed: Dec. 31, 2023. [Online]. Available: https://docs.aws.amazon.com/lambda/
[25] G. McGrath and P. Brenner, "Serverless computing: Design, implementation, and performance," in Proc. 37th IEEE ICDCSW, 2017, pp. 405–410.
[26] A. Fuerst and P. Sharma, "FaasCache: Keeping serverless computing alive with greedy-dual caching," in Proc. ACM ASPLOS, 2021, pp. 386–400.
[27] R. Xie, Q. Tang, S. Qiao, H. Zhu, F. R. Yu, and T. Huang, "When serverless computing meets edge computing: Architecture, challenges, and open issues," IEEE Wireless Commun., vol. 28, no. 5, pp. 126–133, Oct. 2021.


[28] O. Ascigil, A. G. Tasiopoulos, T. K. Phan, V. Sourlas, I. Psaras, and G. Pavlou, "Resource provisioning and allocation in function-as-a-service edge-clouds," IEEE Trans. Services Comput., vol. 15, no. 4, pp. 2410–2424, Jul./Aug. 2022.
[29] H. Ko, J. Lee, S. Seo, S. Pack, and V. C. M. Leung, "Joint client selection and bandwidth allocation algorithm for federated learning," IEEE Trans. Mobile Comput., vol. 22, no. 6, pp. 3380–3390, Jun. 2023.
[30] H. Ko and S. Pack, "A software-defined surveillance system with energy harvesting: Design and performance optimization," IEEE Internet Things J., vol. 5, no. 3, pp. 1361–1369, Jun. 2018.
[31] Z. Xu et al., "Energy-aware inference offloading for DNN-driven applications in mobile edge clouds," IEEE Trans. Parallel Distrib. Syst., vol. 32, no. 4, pp. 799–814, Apr. 2021.
[32] Q. Yang, X. Luo, P. Li, T. Miyazaki, and X. Wang, "Computation offloading for fast CNN inference in edge computing," in Proc. ACM RACS, 2019, pp. 101–106.
[33] M. Shilkov, "Comparison of cold starts in serverless functions across AWS, Azure, and GCP," 2021. [Online]. Available: https://mikhail.io/serverless/coldstarts/big3/
[34] K. Kiela, M. Jurgo, V. Macaitis, and R. Navickas, "5G standalone and 4G multi-carrier network-in-a-box using a software defined radio framework," MDPI Sens., vol. 21, no. 16, pp. 1–18, Aug. 2021.

Haneul Ko (Senior Member, IEEE) received the B.S. and Ph.D. degrees from the School of Electrical Engineering, Korea University, Seoul, South Korea, in 2011 and 2016, respectively.
He is currently an Assistant Professor with the Department of Electronic Engineering, Kyung Hee University, Yongin, South Korea. From 2019 to 2022, he was an Assistant Professor with the Department of Computer and Information Science, Korea University (Sejong Campus), Sejong, South Korea. From 2017 to 2018, he was a Postdoctoral Fellow with the University of British Columbia, Vancouver, BC, Canada. From 2016 to 2017, he was a Postdoctoral Fellow of Mobile Network and Communications with Korea University. His research interests include 5G/6G networks, network automation, mobile cloud computing, SDN/NFV, and Future Internet.
Dr. Ko was the recipient of the Minister of Education Award in 2019, the IEEE ComSoc APB Outstanding Young Researcher Award in 2022, and the Korean Institute of Communications and Information Sciences Haedong Young Engineer Award in 2023.

Hyeonjae Jeong received the B.S. degree from Chungnam University, Daejeon, South Korea, in 2019. She is currently pursuing the M.S. and Ph.D. (integrated course) degrees with Korea University, Seoul, South Korea.
Her research interests include distributed computing, federated learning, 5G/6G mobile core networks, mobile-edge computing, and network automation.

Daeyoung Jung received the B.S. degree from the School of Electrical Engineering, Korea University, Seoul, South Korea, in 2020, where he is currently pursuing the M.S. and Ph.D. (integrated course) degrees.
His research interests include 5G/6G networks, network automation, mobile-edge computing, deep reinforcement learning, distributed computing, and future Internet.

Sangheon Pack (Senior Member, IEEE) received the B.S. and Ph.D. degrees in computer engineering from Seoul National University, Seoul, South Korea, in 2000 and 2005, respectively.
In 2007, he joined the faculty of Korea University, Seoul, where he is currently a Professor with the School of Electrical Engineering. From 2005 to 2006, he was a Postdoctoral Fellow with the Broadband Communications Research Group, University of Waterloo, Waterloo, ON, Canada. His research interests include softwarized networking (SDN/NFV), 5G/6G mobile core networks, mobile-edge computing/programmable data plane, and vehicular networking.
Prof. Pack was the recipient of the IEEE/Institute of Electronics and Information Engineers (IEIE) Joint Award for IT Young Engineers Award in 2017, the Korean Institute of Information Scientists and Engineers Young Information Scientist Award in 2017, the Korea University TechnoComplex Crimson Professor in 2015, the Korean Institute of Communications and Information Sciences Haedong Young Scholar Award in 2013, the LG Yonam Foundation Overseas Research Professor Program in 2012, and the IEEE ComSoc APB Outstanding Young Researcher Award in 2009. He served as the TPC Vice-Chair for Information Systems of IEEE WCNC 2020, the Track Chair for IEEE VTC 2020-Fall/2010-Fall and IEEE CCNC 2019, the TPC Chair for IEEE/IEIE ICCE-Asia 2018/2020, EAI Qshine 2016, and ICOIN 2020, the Publication Co-Chair for IEEE INFOCOM 2014 and ACM MobiHoc 2015, the Symposium Chair for IEEE WCSP 2013, the TPC Vice-Chair for ICOIN 2013, and the Publicity Co-Chair for IEEE SECON 2012. He is an Editor of IEEE Internet of Things Journal, Journal of Communications and Networks, and IET Communications, and he was a Guest Editor of IEEE Transactions on Emerging Topics in Computing and IEEE Transactions on Network Science and Engineering.
