Article
Partitioning DNNs for Optimizing Distributed Inference
Performance on Cooperative Edge Devices: A Genetic
Algorithm Approach
Jun Na 1, Handuo Zhang 2, Jiaxin Lian 2 and Bin Zhang 1,*
Abstract: To fully unleash the potential of edge devices, it is popular to cut a neural network into
multiple pieces and distribute them among available edge devices to perform inference cooperatively.
Up to now, the problem of partitioning a deep neural network (DNN), which can result in the optimal
distributed inferencing performance, has not been adequately addressed. This paper proposes a
novel layer-based DNN partitioning approach to obtain an optimal distributed deployment
solution. In order to ensure the applicability of the resulting deployment scheme, this work defines the
partitioning problem as a constrained optimization problem and puts forward an improved genetic
algorithm (GA). Compared with the basic GA, the proposed algorithm achieves a running time
approximately one to three times shorter while producing a better deployment.
inference cooperatively [16,17]. This approach could overcome the problems above by
keeping the inferencing process in the edge network. Nevertheless, it is more challenging
to partition and distribute a neural network to achieve optimal performance, as it is an
NP-hard problem. Although some strategies have been developed in an attempt to split
a DNN into several parts effectively [18–20], most of them focus on reorganizing the
network structure rather than on optimizing the process of finding an optimal solution
from the perspective of actual system operation. Hence,
the problem of partitioning a DNN model to achieve optimal deployment has not been
adequately addressed.
This paper proposes a novel layer-based partitioning approach to obtain an optimal
DNN deployment solution. In order to ensure the applicability of the resulting deployment
scheme, the partitioning problem is defined as a constrained optimization problem and
an improved genetic algorithm (GA) is proposed to ensure the generation of feasible
candidate solutions after each crossover and mutation operation. Compared to the basic
GA, the proposed GA achieves a running time that is one to three times shorter while
obtaining a better deployment. The main contributions of
this paper are as follows:
• Firstly, the DNN model partitioning problem is modeled as a constrained optimization
problem and the corresponding formulation is introduced.
• Secondly, the paper puts forward a novel genetic algorithm that shortens solving time by
ensuring the validity of chromosomes after the crossover and mutation operations.
• Finally, experiments are performed on several existing DNN models, including
AlexNet, ResNet110, MobileNet, and SqueezeNet, to present a more comprehensive
evaluation.
The remainder of this paper is organized as follows: Section 2 gives an overview
of the related work. Section 3 presents the problem definition of the DNN partition
problem. Section 4 introduces the details of the proposed algorithm. Section 5 provides the
experimental results, and Section 6 concludes the paper.
2. Literature Review
As most modern DNNs are constructed by layers, such as the convolutional layer,
the fully connected layer, and the pooling layer, layer-based partitioning is the most intuitive
DNN partitioning strategy. For example, Ref. [14] proposed to partition a CNN model at
the end of the convolutional layer, allocating the convolutional layers at the edge and the
rest of the fully-connected layers at the host. Unlike this fixed partitioning strategy, recent
methodologies have focused on adapting their results to the actual inferencing environment.
Generally, depending on the construction of the target deployment environment, existing
methods are divided into the following two categories.
According to the basic idea of the cloud-assisted approaches, some studies try to divide
a given DNN model into two sets and push the latter part to the cloud server. For example,
Ref. [13] designed a lightweight scheduler named Neurosurgeon to automatically partition
DNN computation between mobile devices and data centers based on neural network
layers. Similarly, Refs. [21–23] adopted the same strategy while introducing further
refinements. In [21], the authors integrated DNN right-sizing to accelerate the inference
by early exiting inference at an intermediate layer. In contrast, Ref. [22] first added early
exiting points to the original network and then partitioned the reformed network into
two parts. To determine the optimal single cut point, all of [13,21,22] applied exhaustive
searching, while [23] solved the problem with mixed-integer linear programming.
To make full use of the available resources in the edge environment, more DNN
partitioning strategies have been emerging to divide a DNN model into more than two
pieces for distributing the inference task among several edge devices. Generally, based on
the object to be partitioned, there are four kinds of main strategies, i.e., partitioning the
inputs [24,25], weights [26], and layers [18,19], as well as hybrid strategies [17,20,27–30].
Appl. Sci. 2022, 12, 10619 3 of 14
Partitioning the inputs or weights addresses the large storage requirements of the inputs
or weights. Partitioning the DNN layers can solve the depth problem of DNN
inferencing. Furthermore, the hybrid strategies aim to solve both problems mentioned
above. For example, Ref. [27] employed input partitioning after layer-based partitioning
to obtain a small enough group of inferencing tasks to be executed. The authors of [20]
proposed fused tile partitioning (FTP) to fuse layers and partition them vertically in a grid
fashion. The authors of [29] modeled a neural network as a data-flow graph where vertices
are input data, operations, or output data and edges are data transfers between vertices.
Then, the problem was transformed into a graph partitioning problem.
Nearly all of the above works take inference delay or energy consumption as the
optimization objectives. Recently, more studies have begun to focus on the joint optimization of
DNN partitioning and resource allocation [31–33]. However, it is still an open and critical
challenge to achieve an optimal DNN distributed deployment. Unlike existing approaches,
this work models the DNN partitioning problem as a constrained optimization problem,
aiming to achieve the optimal inference performance with available resources in the edge
environment. Moreover, it proposes a novel genetic algorithm to optimize the solving
process of the formulated optimization problem.
where tc_{i,j} is the time of executing sub-model p_i on device d_j, and tr_i and ts_i are the times
for receiving the input of p_i and sending the output of p_i, respectively. If tt_i denotes the
total transmission time, then tt_i = tr_i + ts_i and t_{i,j} = tc_{i,j} + tt_i.
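As a minimal sketch of this latency model, the total time for one assignment can be computed directly from its components. The function name and the numeric values below are illustrative, not taken from the paper:

```python
def assignment_time(tc, tr, ts, i, j):
    """Total time t_{i,j} = tc_{i,j} + tt_i, where tt_i = tr_i + ts_i."""
    tt_i = tr[i] + ts[i]                 # receive input + send output of p_i
    return tc[i][j] + tt_i

# Illustrative values (seconds): 2 sub-models, 2 devices.
tc = [[0.10, 0.04], [0.08, 0.03]]        # tc[i][j]: compute time of p_i on d_j
tr = [0.01, 0.02]                        # tr[i]: time to receive the input of p_i
ts = [0.02, 0.01]                        # ts[i]: time to send the output of p_i

print(round(assignment_time(tc, tr, ts, 0, 1), 3))   # 0.07
```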
In addition, because not all sub-models can run directly on any edge device, it also
needs to consider whether an edge device can complete a specific inferencing task according
to its current state. For example, it is necessary to determine that the available memory is
enough and its remaining battery capacity is sufficient. Suppose m_j is the size of available
memory on device d_j and rm_i is the required memory for running sub-model p_i. If p_i can
be executed on device d_j, the following inequality must be true:
rm_i ≤ m_j (2)
Similarly, if ep_j is the average running power of device d_j and c_j is the remaining battery
capacity on device d_j, then if p_i can be executed on device d_j, the following inequality must
be true:
ep_j × t_{i,j} ≤ c_j (3)
Above all, the DNN partitioning problem is formulated as a constrained optimization
problem, aiming to minimize the total execution time of the DNN inferencing under the
given limitations of the edge devices' available memory and energy. The corresponding
objective function is formulated as follows:
min Σ_{i=1}^{n} Σ_{j=1}^{n} α_{i,j} × t_{i,j}
s.t. rm_i ≤ m_j, if α_{i,j} = 1,
ep_j × t_{i,j} ≤ c_j, if α_{i,j} = 1,
Σ_{i=1}^{n} α_{i,j} = 1, ∀ j ∈ {1, . . . , n},
Σ_{j=1}^{n} α_{i,j} = 1, ∀ i ∈ {1, . . . , n},
α_{i,j} ∈ {0, 1}, ∀ i, j ∈ {1, . . . , n}. (4)
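Assuming α is represented as a 0/1 nested list, the formulation above can be evaluated with a short sketch. The function names and example values are illustrative, not from the paper:

```python
def is_feasible(alpha, t, rm, m, ep, c):
    """Check the constraints of Equation (4) for a 0/1 assignment matrix,
    where alpha[i][j] = 1 means sub-model p_i runs on device d_j."""
    n = len(alpha)
    for j in range(n):                       # each device hosts exactly one sub-model
        if sum(alpha[i][j] for i in range(n)) != 1:
            return False
    for i in range(n):                       # each sub-model is placed exactly once
        if sum(alpha[i][j] for j in range(n)) != 1:
            return False
    for i in range(n):
        for j in range(n):
            if alpha[i][j] == 1:
                if rm[i] > m[j]:             # memory constraint (2)
                    return False
                if ep[j] * t[i][j] > c[j]:   # energy constraint (3)
                    return False
    return True

def total_time(alpha, t):
    """Objective of Equation (4): sum of alpha[i][j] * t[i][j]."""
    n = len(alpha)
    return sum(alpha[i][j] * t[i][j] for i in range(n) for j in range(n))

alpha = [[1, 0], [0, 1]]                     # p_1 on d_1, p_2 on d_2
t = [[1.0, 2.0], [3.0, 4.0]]
print(is_feasible(alpha, t, rm=[1, 1], m=[2, 2], ep=[1.0, 1.0], c=[10.0, 10.0]))  # True
print(total_time(alpha, t))                  # 5.0
```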
Figure 1. Illustration of a distributed DNN inference by collaboration between multiple edge devices.
section. On this basis, it puts forward the ideas for the improvements in this work and then
describes the corresponding algorithms in detail.
Figure 2. Illustration of the computing process in the partially mapped crossover operator.
In Figure 2, C1 and C2 are the two parent individuals, while C1′ and C2′ are two new
individuals generated by swapping the subsections of each parent enclosed in the
rectangles. Note that the assumption that each sub-model only contains continuous layers
is broken during the above crossover operation. For example, layers L2 and L5 are grouped
together and deployed to device d2 in the left new individual C1′, while layers L2, L5, and
L6 are grouped together and L1, L3, and L4 are grouped together in the right new individual
C2′. Such deployments lead to extra network bandwidth and device energy consumption
caused by repeated transmission between devices. For example, if the DNN is deployed
according to C1′, the output of L1 will be sent from d1 to d2, and then the output of L2 will
be sent back from d2 to d1. In turn, the output of L4 will be sent from d1 to d2 again. As a
result, the intermediate results need to be transferred four times among the three devices,
twice as many as when deploying according to C1.
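The extra transmissions described above can be counted with a small sketch. The layer-to-device mappings below are assumed from the description of the Figure 2 example and are illustrative only:

```python
def count_transfers(layer_device):
    """Count how many intermediate results cross a device boundary when
    layers execute in order and each output feeds the next layer."""
    return sum(1 for a, b in zip(layer_device, layer_device[1:]) if a != b)

# Assumed mappings for the Figure 2 example (6 layers, 3 devices).
c1      = ["d1", "d1", "d2", "d2", "d3", "d3"]  # contiguous grouping (parent C1)
c1_new  = ["d1", "d2", "d1", "d1", "d2", "d3"]  # L2 and L5 grouped on d2 after crossover

print(count_transfers(c1))       # 2
print(count_transfers(c1_new))   # 4  (twice as many transfers)
```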
Appl. Sci. 2022, 12, 10619 6 of 14
Figure 3. An example of the relationship between a DNN structure, a partitioning chromosome, and
a deployment chromosome.
On this basis, the basic GA needs to be improved in the following two aspects.
• On the one hand, the initial population generation needs to be modified according
to the above chromosome classification. The initialization process should be divided
into two steps: first, randomly generate a partitioning population; then, derive the
corresponding deployment population based on Algorithm 1.
• On the other hand, after selecting excellent individuals out of the deployment
population, the corresponding partitioning population should be extracted for the
subsequent crossover and mutation operations.
Algorithm 3 first predicts and stores the execution time of each DNN layer according
to Equation (1) for calculating individual fitness (lines 2 to 4). Lines 5 and 6 initialize a
deployment population. The while statement from line 9 to line 19 is the main loop of
the algorithm. First, line 10 updates the current number of iterations and line 11 selects
outstanding individuals from the current population into OP. Then, lines 12 and 13 extract
the corresponding partitioning chromosomes from OP and perform crossover and mutation
to generate a new partitioning population PP. Line 14 constructs a new deployment
population according to OP and PP, each individual of which has a corresponding
partitioning chromosome in PP. The stop condition checks whether the individual with the
maximum fitness in the current population has remained unchanged for a specified number
of consecutive iterations. If so, the loop is exited; otherwise, the search continues until
the maximum number of iterations MAXGEN is reached. Finally, the algorithm returns
the deployment chromosome corresponding to the current maximum fitness as the final
optimal deployment scheme.
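The main loop described above can be sketched at a high level as follows. This is not the paper's Algorithm 3: the deployment-population derivation is abstracted into the fitness function, and all names, default parameters, and the toy mutation operator are assumptions. The key idea illustrated is that operating on cut-point (partitioning) chromosomes keeps every sub-model contiguous after variation:

```python
import random

def improved_ga(num_layers, num_parts, fitness_fn, pop_size=20,
                max_gen=100, patience=10, seed=0):
    """Sketch of a GA that evolves partitioning chromosomes, encoded as
    sorted cut points, so crossover/mutation never break contiguity."""
    rng = random.Random(seed)

    def random_partition():
        # num_parts - 1 distinct cut points between layers -> contiguous sub-models
        return sorted(rng.sample(range(1, num_layers), num_parts - 1))

    pop = [random_partition() for _ in range(pop_size)]
    best, best_fit, stall = None, float("-inf"), 0
    for _ in range(max_gen):
        scored = sorted(pop, key=fitness_fn, reverse=True)
        top_fit = fitness_fn(scored[0])
        if top_fit > best_fit:
            best, best_fit, stall = scored[0], top_fit, 0
        else:
            stall += 1
            if stall >= patience:        # fittest individual unchanged long enough
                break
        parents = scored[: pop_size // 2]
        children = []
        while len(children) < pop_size - len(parents):
            p = list(rng.choice(parents))
            k = rng.randrange(len(p))    # mutate one cut point, keeping validity
            choices = [c for c in range(1, num_layers) if c not in p]
            p[k] = rng.choice(choices)
            children.append(sorted(p))
        pop = parents + children
    return best

# Toy fitness: prefer balanced contiguous partitions of 10 layers into 3 parts.
def balance_fitness(cuts):
    bounds = [0] + list(cuts) + [10]
    return -max(b - a for a, b in zip(bounds, bounds[1:]))

print(improved_ga(10, 3, balance_fitness))
```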
According to the optimization objectives described in Section 3, the fitness function is
defined as follows.
fitness(dc) = 10^5 / (Σ_{i=1}^{n} Σ_{j=1}^{n} dc_{i,j} × t_{i,j}), if dc satisfies all constraints;
fitness(dc) = 10^{−6}, if dc does not satisfy all constraints. (6)
In the above fitness function, dc is the α matrix in the formulated problem definition
(shown in Equation (4)) for calculating the fitness of a specific deployment chromosome.
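A direct sketch of Equation (6), assuming the constraint check of Equation (4) is supplied as a boolean flag (the parameter names are illustrative):

```python
def fitness(dc, t, feasible):
    """Fitness of a deployment chromosome dc (Equation (6)):
    10^5 over the total inference time if all constraints hold, and a
    tiny positive value (10^-6) otherwise, so that infeasible individuals
    survive selection only with negligible probability."""
    if not feasible:
        return 1e-6
    n = len(dc)
    total = sum(dc[i][j] * t[i][j] for i in range(n) for j in range(n))
    return 1e5 / total

dc = [[1, 0], [0, 1]]
t = [[1.0, 2.0], [3.0, 4.0]]
print(fitness(dc, t, feasible=True))    # 20000.0
print(fitness(dc, t, feasible=False))   # 1e-06
```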
5. Performance Evaluation
This section evaluates the performance of the proposed DNN partitioning method
on four real-world CNNs. It presents experimental results and compares them to other
existing methodologies to demonstrate that the proposed algorithm can execute given
CNN inference on a group of distributed collaborative edge devices in a shorter time.
Device No.   GFLOPS   I/O Bandwidth (MBPS)   Battery Capacity (J)
1            0.218    250                    140.85
2            9.92     20                     1525.63
3            0.213    500                    135.89
4            13.5     10                     1698.25
5            0.247    300                    140.91
6            3.62     200                    159.45
The experiments are executed on a laptop with an AMD Ryzen 7 5700U CPU and 16 GB of
memory in a PyCharm environment. The following results are collected by running the
same algorithm ten times as a group.
As a result, the proposed algorithm can produce better deployments under different
scenarios compared to the basic genetic algorithm. The performance of some solutions is
even close to the optimal deployment generated by the exhaustive method.
DNN Model    Number of Partitions   Exhaustive Method   Improved GA   Basic GA
AlexNet      3                      490.43              267.20        275.46
AlexNet      5                      7870.74             6563.56       7214.8
AlexNet      7                      3,675,603.00        6563.56       24,991.9
SqueezeNet   3                      490.97              372.15        381.81
SqueezeNet   5                      41,385.8            6568.28       6713.38
SqueezeNet   6                      1,476,833           11,197.10     12,572.80
MobileNet    2                      485.28              231.70        268.69
MobileNet    3                      751.76              335.14        341.12
MobileNet    4                      29,311.00           1238.22       1357.69
ResNet110    2                      530.85              269.12        381.98
ResNet110    3                      12,665.30           1620.24       4086.40
ResNet110    4                      5,217,793.75        7322.81       19,345.10
The above table shows that the running time of all three algorithms increases significantly
with the growing number of devices or DNN layers. However, the improved GA needs
the least time to obtain a better solution. For example, in partitioning AlexNet, the
improved GA needs about 1.84×, 1.26×, and 166.74× less time than the exhaustive
method in each scenario. In partitioning ResNet110 into four parts, the improved GA
saves about 712.13× running time compared to the exhaustive method and nearly 3×
compared to the basic GA. It can be seen that as the problem size grows, the proposed GA
offers better execution efficiency.
6. Conclusions
This paper establishes a dynamic DNN partitioning and deployment system model to
represent the actual application requirements of distributed DNN inferencing in an edge
environment. On this basis, the problem of optimal deployment-oriented DNN partitioning
is modeled as a constrained optimization problem. Considering that the crossover and
mutation operators in a basic genetic algorithm may produce many infeasible solutions,
this work distinguishes two types of chromosomes, i.e., partitioning chromosomes and
deployment chromosomes. It then performs the crossover and mutation operations on
partitioning chromosomes to ensure that reasonable deployment chromosomes are
generated, and produces new deployment chromosomes for the next iteration based on
the updated partitioning population and the selected excellent deployment individuals.
The experimental results show that the proposed algorithm not only results in shorter
inferencing time and lower average device energy cost, but also needs less time to achieve
an optimal deployment.
To further improve this work, a potential future research direction is to reduce the
workload on the CPU by constructing proper mathematical models. In addition, 3D
image-related applications will be considered in the future.
Author Contributions: Conceptualization, J.N. and B.Z.; methodology, J.N., H.Z. and J.L.; validation,
J.L.; investigation, J.N. and H.Z.; data curation, H.Z. and J.L.; writing—original draft preparation,
J.N.; writing—review and editing, J.N.; supervision, B.Z.; funding acquisition, B.Z. All authors have
read and agreed to the published version of the manuscript.
Funding: This research was funded by the Key Project of the National Natural Science Foundation of
China: U1908212.
Institutional Review Board Statement: Not applicable.
Informed Consent Statement: Not applicable.
Data Availability Statement: The datasets can be obtained from http://www.cs.toronto.edu/~kriz/
cifar.html (accessed on 17 November 2021).
Conflicts of Interest: The funders had no role in the design of the study; in the collection, analyses,
or interpretation of data; in the writing of the manuscript; or in the decision to publish the results.
References
1. Dec, G.; Stadnicka, D.; Paśko, Ł.; Mądziel, M.; Figliè, R.; Mazzei, D.; Tyrovolas, M.; Stylios, C.; Navarro, J.; Solé-Beteta, X. Role of
Academics in Transferring Knowledge and Skills on Artificial Intelligence, Internet of Things and Edge Computing. Sensors 2022,
22, 2496. [CrossRef] [PubMed]
2. Paśko, Ł.; Mądziel, M.; Stadnicka, D.; Dec, G.; Carreras-Coch, A.; Solé-Beteta, X.; Pappa, L.; Stylios, C.; Mazzei, D.; Atzeni, D.
Plan and Develop Advanced Knowledge and Skills for Future Industrial Employees in the Field of Artificial Intelligence, Internet
of Things and Edge Computing. Sustainability 2022, 14, 3312.
3. Zhou, Z.; Chen, X.; Li, E.; Zeng, L.; Luo, K.; Zhang, J. Edge intelligence: Paving the last mile of artificial intelligence with edge
computing. Proc. IEEE 2019, 107, 1738–1762. [CrossRef]
4. Murshed, M.S.; Murphy, C.; Hou, D.; Khan, N.; Ananthanarayanan, G.; Hussain, F. Machine learning at the network edge: A
survey. Acm Comput. Surv. 2021, 54, 1–37. [CrossRef]
5. Chen, J.; Ran, X. Deep learning with edge computing: A review. Proc. IEEE 2019, 107, 1655–1674. [CrossRef]
6. Liang, X.; Liu, Y.; Chen, T.; Liu, M.; Yang, Q. Federated transfer reinforcement learning for autonomous driving. arXiv 2019,
arXiv:1910.06001.
7. Zhang, Q.; Sun, H.; Wu, X.; Zhong, H. Edge video analytics for public safety: A review. Proc. IEEE 2019, 107, 1675–1696.
[CrossRef]
8. Liang, F.; Yu, W.; Liu, X.; Griffith, D.; Golmie, N. Toward edge-based deep learning in industrial Internet of Things. IEEE Internet
Things J. 2020, 7, 4329–4341. [CrossRef]
9. Qolomany, B.; Al-Fuqaha, A.; Gupta, A.; Benhaddou, D.; Alwajidi, S.; Qadir, J.; Fong, A.C. Leveraging machine learning and big
data for smart buildings: A comprehensive survey. IEEE Access 2019, 7, 90316–90356. [CrossRef]
10. Cheng, Y.; Wang, D.; Zhou, P.; Zhang, T. A survey of model compression and acceleration for deep neural networks. arXiv 2017,
arXiv:1710.09282.
11. Deng, L.; Li, G.; Han, S.; Shi, L.; Xie, Y. Model compression and hardware acceleration for neural networks: A comprehensive
survey. Proc. IEEE 2020, 108, 485–532. [CrossRef]
12. Choudhary, T.; Mishra, V.; Goswami, A.; Sarangapani, J. A comprehensive survey on model compression and acceleration. Artif.
Intell. Rev. 2020, 53, 5113–5155. [CrossRef]
13. Kang, Y.; Hauswald, J.; Gao, C.; Rovinski, A.; Mudge, T.; Mars, J.; Tang, L. Neurosurgeon: Collaborative intelligence between the
cloud and mobile edge. ACM Sigarch Comput. Archit. News 2017, 45, 615–629. [CrossRef]
14. Ko, J.H.; Na, T.; Amir, M.F.; Mukhopadhyay, S. Edge-host partitioning of deep neural networks with feature space encoding for
resource-constrained internet-of-things platforms. In Proceedings of the 2018 15th IEEE International Conference on Advanced
Video and Signal Based Surveillance (AVSS), Auckland, New Zealand, 27–30 November 2018; IEEE: Piscataway, NJ, USA, 2018;
pp. 1–6.
15. Jeong, H.J.; Lee, H.J.; Shin, C.H.; Moon, S.M. IONN: Incremental offloading of neural network computations from mobile
devices to edge servers. In Proceedings of the ACM Symposium on Cloud Computing, Carlsbad, CA, USA, 11–13 October 2018;
pp. 401–411.
16. Jouhari, M.; Al-Ali, A.; Baccour, E.; Mohamed, A.; Erbad, A.; Guizani, M.; Hamdi, M. Distributed CNN Inference on Resource-
Constrained UAVs for Surveillance Systems: Design and Optimization. IEEE Internet Things J. 2021, 9, 1227–1242. [CrossRef]
17. Tang, E.; Stefanov, T. Low-memory and high-performance CNN inference on distributed systems at the edge. In Proceedings of
the 14th IEEE/ACM International Conference on Utility and Cloud Computing Companion, Leicester, UK, 6–9 December 2021;
pp. 1–8.
18. Zhou, J.; Wang, Y.; Ota, K.; Dong, M. AAIoT: Accelerating artificial intelligence in IoT systems. IEEE Wirel. Commun. Lett. 2019,
8, 825–828. [CrossRef]
19. Zhou, L.; Wen, H.; Teodorescu, R.; Du, D.H. Distributing deep neural networks with containerized partitions at the edge. In
Proceedings of the 2nd USENIX Workshop on Hot Topics in Edge Computing (HotEdge 19), Renton, WA, USA, 9 July 2019.
20. Zhao, Z.; Barijough, K.M.; Gerstlauer, A. Deepthings: Distributed adaptive deep learning inference on resource-constrained iot
edge clusters. IEEE Trans.-Comput.-Aided Des. Integr. Circuits Syst. 2018, 37, 2348–2359. [CrossRef]
21. Li, E.; Zeng, L.; Zhou, Z.; Chen, X. Edge AI: On-demand accelerating deep neural network inference via edge computing. IEEE
Trans. Wirel. Commun. 2019, 19, 447–457. [CrossRef]
22. Wang, H.; Cai, G.; Huang, Z.; Dong, F. ADDA: Adaptive distributed DNN inference acceleration in edge computing environment.
In Proceedings of the 2019 IEEE 25th International Conference on Parallel and Distributed Systems (ICPADS), Tianjin, China, 4–6
December 2019; IEEE: Piscataway, NJ, USA, 2019; pp. 438–445.
23. Gao, M.; Cui, W.; Gao, D.; Shen, R.; Li, J.; Zhou, Y. Deep neural network task partitioning and offloading for mobile edge
computing. In Proceedings of the 2019 IEEE Global Communications Conference (GLOBECOM), Waikoloa, HI, USA, 9–13
December 2019; IEEE: Piscataway, NJ, USA, 2019; pp. 1–6.
24. Mao, J.; Chen, X.; Nixon, K.W.; Krieger, C.; Chen, Y. Modnn: Local distributed mobile computing system for deep neural network.
In Proceedings of the Design, Automation & Test in Europe Conference & Exhibition (DATE), Lausanne, Switzerland, 27–31
March 2017; IEEE: Piscataway, NJ, USA, 2017; pp. 1396–1401.
25. Mao, J.; Yang, Z.; Wen, W.; Wu, C.; Song, L.; Nixon, K.W.; Chen, X.; Li, H.; Chen, Y. Mednn: A distributed mobile system with
enhanced partition and deployment for large-scale dnns. In Proceedings of the 2017 IEEE/ACM International Conference on
Computer-Aided Design (ICCAD), Irvine, CA, USA, 13–16 November 2017; IEEE: Piscataway, NJ, USA, 2017; pp. 751–756.
26. Shahhosseini, S.; Albaqsami, A.; Jasemi, M.; Bagherzadeh, N. Partition pruning: Parallelization-aware pruning for deep neural
networks. arXiv 2019, arXiv:1901.11391.
27. Kilcioglu, E.; Mirghasemi, H.; Stupia, I.; Vandendorpe, L. An energy-efficient fine-grained deep neural network partitioning
scheme for wireless collaborative fog computing. IEEE Access 2021, 9, 79611–79627. [CrossRef]
28. Hadidi, R.; Cao, J.; Woodward, M.; Ryoo, M.S.; Kim, H. Musical chair: Efficient real-time recognition using collaborative iot
devices. arXiv 2018, arXiv:1802.02138.
29. de Oliveira, F.M.C.; Borin, E. Partitioning convolutional neural networks for inference on constrained Internet-of-Things
devices. In Proceedings of the 2018 30th International Symposium on Computer Architecture and High Performance Computing
(SBAC-PAD), Lyon, France, 24–27 September 2018; IEEE: Piscataway, NJ, USA, 2018; pp. 266–273.
30. Mohammed, T.; Joe-Wong, C.; Babbar, R.; Di Francesco, M. Distributed inference acceleration with adaptive DNN partitioning
and offloading. In Proceedings of the IEEE INFOCOM 2020-IEEE Conference on Computer Communications, Toronto, ON,
Canada, 6–9 July 2020; IEEE: Piscataway, NJ, USA, 2020; pp. 854–863.
31. He, W.; Guo, S.; Guo, S.; Qiu, X.; Qi, F. Joint DNN partition deployment and resource allocation for delay-sensitive deep learning
inference in IoT. IEEE Internet Things J. 2020, 7, 9241–9254. [CrossRef]
32. Tang, X.; Chen, X.; Zeng, L.; Yu, S.; Chen, L. Joint multiuser dnn partitioning and computational resource allocation for
collaborative edge intelligence. IEEE Internet Things J. 2020, 8, 9511–9522. [CrossRef]
33. Dong, C.; Hu, S.; Chen, X.; Wen, W. Joint Optimization With DNN Partitioning and Resource Allocation in Mobile Edge
Computing. IEEE Trans. Netw. Serv. Manag. 2021, 18, 3973–3986. [CrossRef]
34. Krizhevsky, A.; Sutskever, I.; Hinton, G.E. Imagenet classification with deep convolutional neural networks. Adv. Neural Inf.
Process. Syst. 2012, 25. [CrossRef]
35. He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on
Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778.
36. Howard, A.G.; Zhu, M.; Chen, B.; Kalenichenko, D.; Wang, W.; Weyand, T.; Andreetto, M.; Adam, H. Mobilenets: Efficient
convolutional neural networks for mobile vision applications. arXiv 2017, arXiv:1704.04861.
37. Iandola, F.N.; Han, S.; Moskewicz, M.W.; Ashraf, K.; Dally, W.J.; Keutzer, K. SqueezeNet: AlexNet-level accuracy with 50x fewer
parameters and <0.5 MB model size. arXiv 2016, arXiv:1602.07360.
38. Krizhevsky, A.; Hinton, G. Learning Multiple Layers of Features from Tiny Images; Technical Report; University of Toronto:
Toronto, ON, Canada, 2009.
39. Qi, H.; Sparks, E.R.; Talwalkar, A. Paleo: A Performance Model for Deep Neural Networks. 2016. Available online:
https://openreview.net/pdf?id=SyVVJ85lg (accessed on 12 June 2021).
40. Tian, X.; Zhu, J.; Xu, T.; Li, Y. Mobility-included DNN partition offloading from mobile devices to edge clouds. Sensors 2021,
21, 229. [CrossRef]