Optimising Deep Learning Split Deployment For IoT Edge Networks
Abstract—The Internet of Things (IoT) often generates large volumes of messy data which are difficult to process efficiently. While deep learning models have demonstrated their suitability for processing this data, their memory and processing requirements make it difficult to deploy them on edge nodes while achieving viable throughput results. Current solutions involve deploying the model in the cloud, but this leads to increased network costs due to the transfer of raw data. However, the layer-based design of deep learning models allows a model to be split into sub-models and deployed separately across IoT nodes. By deploying parts of the model on the edge node and in the cloud, the edge node is able to transmit an intermediate layer's feature output to the following sub-model instead of the raw input data. This reduces the size of the data being transmitted and results in a lower cost to the network. However, selecting the best layer at which to split the model becomes a multi-objective optimisation problem. In this paper, we propose an optimisation method that considers the network cost, input rate and processing overhead in selecting the best layer for splitting a model across an IoT network. We profile several popular model architectures to highlight their performance using this split deployment. Results from simulated and physical tests of the optimal layers are provided to demonstrate the method's effectiveness in real-world applications.

I. INTRODUCTION

The boom of IoT applications and networks in the 21st century creates new opportunities for applications, along with challenges in applying deep learning models [1]. Small, low-powered sensors within IoT networks provide an endless supply of data that is transferred to the cloud for processing. Deep learning is particularly suited for use with IoT datasets due to its ability to comprehend large amounts of data and to work with complex data such as image, audio, and "messy" data that comes raw from its environment. However, one of the main issues that IoT networks face is the bandwidth cost of transferring the data from the low-power sensor nodes to a centralised server that is powerful enough to process the incoming data at a reasonable rate. This is commonly solved by shrinking or compressing the data or reducing the throughput rate [2], all of which have the additional downsides of reducing the available data feature sets for processing or increasing overall latency. Edge computing aims to solve this by placing processing elements close to the sensors, thereby reducing the transfer costs as the data needs to travel shorter distances [3]. As mobile devices become more powerful, IoT offers a platform for bringing deep learning classification closer to individual users for object recognition, augmented reality and image filtering.

When combining the processing abilities of deep learning with edge computing, one of the issues that needs to be addressed is the low throughput rate of running deep learning models on commonly available edge devices. In order to maintain reasonable throughput rates while reducing the network costs, one solution is to split the model and deploy one sub-model on the edge node and the other on the centralised server [4]. Data is partially processed on the edge node and is then transmitted to the server as a smaller file to reduce network costs. For example, if the sensor and edge nodes were deployed on a local network, the network capacity would be much greater and more cost efficient than transmitting to the cloud. By processing the raw data through a distributed model, the amount of data being sent to the cloud could be minimised. However, determining the layer for splitting the model so that it does not lose data throughput or overload the edge node while maximising the benefits is an open research problem. The following research questions have been identified for addressing this problem:

RQ1 What is the optimal layer for splitting a model across an IoT network?
RQ2 What are the primary parameters for determining an optimal layer for splitting a model?

RQ1 covers the primary objective of determining the optimal layer to split a model upon to gain the most benefits. To fulfil RQ1, RQ2 proposes a multi-objective optimisation problem to be solved to determine the optimal layer.

In this paper, we propose a weighted dominance test for determining the optimal layer for splitting a model across an IoT network that considers the throughput rate, latency and device capabilities. This method is tested with both simulated and physical experiments to demonstrate the robustness of our method when deploying several popular convolutional neural network (CNN) models in an IoT application.

The following sections of this paper include an examination of related work in IoT, edge computing, and the usage of deep learning in these fields. Then the proposed new optimisation method and how it can help solve deployment strategy problems are discussed. Then the performance results of our experiments are presented, followed finally by our conclusions and future work.

II. RELATED WORK

In this section we briefly discuss the fields of IoT, edge computing and deep learning, and works in the literature that have combined these fields together.
A. IoT and Edge Computing

IoT started in 1999 as a term for supply chain management [5] and has grown to include a vast range of applications such as health-care, home automation, manufacturing and transportation [6]. IoT revolves around the collection, transmission and processing of data, where data is collected, transmitted and then processed by powerful servers to generate information about the network's environment. This transportation of data is the primary cost for IoT networks due to the amount of data being transmitted. Reducing this network cost has been an ongoing and active research topic in this field.

Edge computing was developed as an alternative architecture in order to solve this problem [7]. By relocating processing to the edge, an IoT network's overall network cost and latency are reduced [8]. The edge computing architecture is shown in Fig. 1 and demonstrates how edge nodes operate in a distributed manner that allows communication between each other. In the current IoT field, edge computing networks have access to an increasing market of low cost devices with reasonable processing capabilities such as the Amazon Echo Dot, Apple Smart Watch, Google Nest and Nvidia Jetson [9], [10]. This has shifted research focus from reducing the processing cost of a model to reducing data transfer and network costs. These improvements in available hardware are opening up the amount of processing able to be moved to the edge layer and will increase the attractiveness of edge computing as a design paradigm.

B. Deep Learning in IoT

Deep learning proposes a method that allows the computer to automatically learn the features of patterns, and integrates the feature learning into the process of building the model. This process aims to reduce the reliance on a priori knowledge for detecting patterns and instead develops this knowledge by training on the data collected from the environment [11]. In many areas, deep learning has achieved greater classification accuracy than humans. Sensor devices in IoT networks collect large amounts of data which can be difficult to process effectively. Deep learning allows for this large quantity of data to be processed effectively and efficiently [12]. For example, Goldcorp Mining [13] uses a combination of IoT and deep learning to predict when their large trucks need maintenance. They now achieve a prediction accuracy greater than 90%, which reduces their maintenance costs by $1.5-2.5 million per year.

Deep learning models are commonly deployed to a central server where data from the sensors can be fed into the model for processing [12]. Once processed, the information produced by the model can be collected, used to inform a decision, or cause an action to be taken. While deep learning models are able to process data at a fast rate, they require a large amount of processing capacity and memory to be able to operate efficiently. This has traditionally limited the deployment options of models to large, powerful servers, thus making them difficult to deploy in an IoT network.

C. Deep Learning in IoT with Edge Computing

In deploying a deep learning model to process the large amount of data from the devices on an IoT network, the issue of efficiency on resource-limited devices has attracted attention from the research community. Studies in the literature demonstrate how integrating cloud and edge servers for pre-processing the large amounts of data significantly reduces the latency and capacity consumption of deep learning models [14]. However, the task of implementing deep learning processing on edge devices is quite difficult and involves multifaceted improvements to hardware devices. The greatest hurdles in deploying a neural network over an IoT network are the bandwidth costs and the ability to provide a high rate of data collection. Distributing the deep learning model across edge nodes and cloud servers was proposed to address these issues [4]. Fig. 2 demonstrates how a model can be distributed between the edge node and the cloud server for image classification. In doing so, the IoT network can send the edge node's feature output to the cloud instead of the raw input data. This reduces the amount of data being transmitted across the IoT network and the associated network costs. However, selecting which layer to split the model at becomes another challenge, as different model segmentation points lead to different intermediate data sizes, latency and network costs [4]. Therefore, determining the best layer to split the model is a non-trivial multi-objective optimisation problem.
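As an illustrative sketch of this idea (not the deployment code used in our later experiments), a single-path Keras model can be cut at a chosen layer index k into an edge sub-model, which emits the intermediate feature map, and a cloud sub-model, which consumes it. The split_model helper and the choice of k = 10 below are assumptions for demonstration only.

    from tensorflow import keras

    def split_model(model, k):
        # Edge sub-model: the original input up to the output of layer k.
        edge_model = keras.Model(inputs=model.input,
                                 outputs=model.layers[k].output)
        # Cloud sub-model: a fresh input matching the intermediate feature
        # shape, followed by the remaining layers applied in order (valid for
        # single-path architectures such as VGG19; branching models need a
        # graph-aware split).
        feature_input = keras.Input(shape=model.layers[k].output_shape[1:])
        x = feature_input
        for layer in model.layers[k + 1:]:
            x = layer(x)
        cloud_model = keras.Model(inputs=feature_input, outputs=x)
        return edge_model, cloud_model

    # Hypothetical usage: the edge node runs edge_model and transmits its
    # output, while the central server runs cloud_model on the received
    # features.
    full_model = keras.applications.VGG19(weights=None)
    edge_model, cloud_model = split_model(full_model, k=10)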
Previous work using deep learning for IoT with edge computing by Li et al. [4] focused on how to maximise the number of tasks that can be deployed to the edge nodes in a simulation environment. One of the limitations of this study is the scope of the simulated environment, where factors such as the processing and memory capacities of physical devices were not taken into consideration. Furthermore, only a limited number of layers were tested, with the study focusing on the optimal segmentation of these layers. For a split model to be deployed in real-world IoT applications, the process for selecting the optimal layer must consider the data throughput rate, input rate and processing capabilities of the edge nodes. These parameters take into consideration the aspects of both the IoT network and the deep learning model that need to be optimised within a distributed IoT model.

III. SPLIT DEPLOYMENT OPTIMISATION

Recent trends in optimising IoT deployments focus on minimising the network cost. However, when deploying a deep learning model across an IoT network, nodes must transfer data to the cloud for processing. In particular, when a model is split between an edge node and the cloud, the amount of data being transmitted from edge node to cloud is dependent on the layer at which the model is split. As such, one of the objectives in optimising a split model is the data throughput, or the rate at which data is being transferred through the IoT network. In traditional approaches to deploying a deep learning model on an IoT network, the data from the sensors is often compressed or lowered in resolution in order to minimise network costs. However, with a split model, as the sub-model is being processed on the edge network, the data rate from the sensors does not need to be minimised. In fact, the maximum rate of input data for the minimum data throughput is ideal in a split model. As more powerful IoT devices become available, the flexibility in the number of layers and the amount of processing that can be done on the edge nodes increases. As this processing can contribute to greater latency between the sensor and the cloud, it is important to minimise the amount of processing done by the edge node without impacting the data throughput. As the optimum values for these parameters depend on the deep learning model being deployed, the processing capabilities of the edge devices, the data rate from the sensors and the network capacity, it is unfeasible to define a specific layer as being optimal for splitting a model in all circumstances.

Determining the optimum layer for splitting a model in a given deployment can be described as a multi-objective optimisation problem that aims to maximise the input rate while minimising the data throughput and processing costs. The multi-objective optimisation function for these parameters is presented in Eq. 1, where the function F(k) aims to maximise the data rate (p_ij) while minimising the output data size ratio (r_kj) and the processing cost of the node (l_kj). This applies for a set of edge nodes E containing nodes e_i, each having a set of deep learning models as tasks T_i containing tasks t_ij. For each task, a model contains layers [1, N_ij], among which we wish to find the optimal layer k to split on.

\max F(k) = \sum_{i=1}^{|E|} \sum_{j=1}^{|T_i|} \left( w_p \cdot \frac{p_{ij}}{M_{ij}} - w_r \cdot r_{kj} - w_l \cdot \frac{l_{kj}}{C_i \cdot O_i} \right)    (1)

Subject to:

\sum_{i=1}^{|E|} b_{ij} \le B_i \cdot V_i    (2)
r_{kj} \le 1    (3)
d_{ij} \cdot \frac{r_{kj}}{b_{ij}} \le Q_j    (4)
p_{ij} \le M_{ij}    (5)
\sum_{j=1}^{|T_i|} l_{kj} \cdot p_{ij} \le C_i \cdot O_i    (6)

As the scale of the values varies greatly between the three parameters, they must be normalised in order to establish meaningful relationships. p_ij is normalised against the maximum data rate of the node (M_ij), and l_kj is normalised against the theoretical processing capacity of the device (C_i · O_i) allocated to tasks T_i, where C_i is the overall maximum and O_i is the available percentage for processing tasks. r_kj is the ratio of the output data size compared to the raw data transmission size and is already a normalised value. This ensures that each parameter is a value in the range [0, 1]. The weights for p_ij, r_kj and l_kj allow the optimisation algorithm to adjust according to the system priorities and are w_p, w_r and w_l respectively.

Our proposed algorithm for solving this multi-objective optimisation problem is a summation of weighted dominance tests for each task allocated to each node, subject to the constraints outlined in Eqs. 2-6. Algorithm 1 describes the selection process to find the optimal layer for splitting a model.

Algorithm 1: SplitEdge Optimisation
  Input:  E, T_i
  Output: S, the set of optimal layers to split on for valid nodes
   1:  S ← null
   2:  foreach e_i ∈ E do
   3:      foreach t_ij ∈ T_i do
   4:          val ← 0; layer ← null          // initialisation
   5:          foreach k ∈ N_ij do
   6:              F(k) ← computeQuality(k)   // as per Eq. 1
   7:              if F(k) > val then
   8:                  val ← F(k)
   9:                  layer ← k
  10:          S ← S ∪ layer
  11:  return S
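A minimal Python sketch of this selection step is given below. The LayerProfile fields and the example numbers are illustrative assumptions used to make the scoring concrete; the score computed in quality() follows Eq. 1 for a single task, and the loop mirrors lines 4-10 of Algorithm 1.

    from typing import Dict, NamedTuple, Optional

    class LayerProfile(NamedTuple):
        input_rate: float   # p_ij: measured input rate when split at this layer
        max_rate: float     # M_ij: maximum data rate of the node
        size_ratio: float   # r_kj: layer output size / raw transmission size
        flops: float        # l_kj: flops processed by the edge sub-model
        capacity: float     # C_i * O_i: capacity allocated to the task

    def quality(p: LayerProfile, w_p=1.0, w_r=1.0, w_l=1.0) -> float:
        # Weighted quality F(k) of splitting at a layer, as per Eq. 1.
        return (w_p * p.input_rate / p.max_rate
                - w_r * p.size_ratio
                - w_l * p.flops / p.capacity)

    def best_split(profiles: Dict[int, LayerProfile]) -> Optional[int]:
        # Keep the candidate layer with the highest F(k), discarding layers
        # that violate constraints (3) and (5).
        val, layer = 0.0, None
        for k, p in profiles.items():
            if p.size_ratio > 1.0 or p.input_rate > p.max_rate:
                continue
            f = quality(p)
            if f > val:
                val, layer = f, k
        return layer

    # Hypothetical per-layer profiles for one task on one edge node.
    profiles = {
        10: LayerProfile(24.0, 30.0, 0.45, 2.1e9, 4.7e11),
        18: LayerProfile(21.0, 30.0, 0.20, 3.4e9, 4.7e11),
    }
    print(best_split(profiles))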
Fig. 3. Parameter values for the multi-objective optimisation: (a) data input rate, (b) computational overhead, (c) data throughput rate. The layer that maximises the data input rate while minimising the computational overhead and throughput rate is the optimum layer at which to split the deep learning model.
In this instance, layer 18, a max pooling layer, is determined to be the optimal layer.

The algorithm first initialises the constant values related to the edge node and the deep learning tasks. Then the algorithm compares the profile of each layer in [1, N_ij] and discards layers that do not adhere to the constraints. The resulting set of potential layers and their profiles are then used to calculate the fitness of splitting the model at the given layer. The layers with the highest return value from F(k) for all tasks in T_i are deemed the optimal layers to split on for these tasks on edge node e_i.

IV. EXPERIMENT EVALUATION

This section describes the experiment goals and environment settings, and then discusses the performance evaluation results.

A. Experiment Goal

The first goal of the experiments is to examine how several popular deep learning models will operate when deployed in a split fashion. By splitting at each layer of a deep learning model and measuring the input rate, data throughput and processing costs, F(k) can be used to quantify the suitability of splitting on that layer. These results were then analysed by comparing the optimum layers for splitting between each model.

Two experiments are performed: one as an artificial simulation running on a single computer, and the second using an edge node connected to a server over a Wi-Fi network. The simulation was conducted to profile the models and their layers in a controlled environment. The cross-platform experiment tested the performance of the determined optimal split layers against a selection of baseline configurations to demonstrate the effectiveness of the proposed optimisation method.

B. Environment Settings

The simulated experiment was conducted using a workstation computer featuring an Intel i7 6700k CPU and an Nvidia GeForce GTX 1080 graphics card. The physical experiment was conducted using an Nvidia Jetson Nano as the edge node device, which features 472 Gflops of processing capacity and a network bandwidth connection measured at 25.9 Mb/s, connected to a workstation computer acting as a central server. The input dataset consisted of images with a 1920x1080p resolution, which were measured to be 1.12 MB on average. Both experiment environments and applications were set up and built using Python 3.6 and the Keras 2.2 framework with TensorFlow 1.13 as the back-end. The five deep learning models used in these experiments are listed in Table I. MnistDense5 is a fully connected sequential model. The other four are publicly available CNN models, each of which is commonly used in deep learning benchmarking. The MnistDense5 model was trained on the MNIST dataset, and the four pre-trained CNN models were trained on the same Dogs vs Cats dataset from Asirra [15] collected from ImageNet. Dogs vs Cats contains 25,000 images of dogs and cats, of which 20,000 images have been used to train the models and the remaining 5,000 images have been used for testing.

Layer numbers in the following experiments vary from the layers calculated by the model authors because in these experiments they are numbered by the TensorFlow profiler. This profiler identifies support layers such as batch normalisation and pooling as individual layers which can be used for splitting. To reduce complexity, split points were also restricted to layers in a model that were the single recipient of all current flows of data. For the ResNet50 and InceptionV3 models, split points are limited to the outsides of residual blocks and InceptionV3 modules, respectively. For single-path networks such as MnistDense5, AlexNet and VGG19, each layer was considered as a viable split point.

C. Experiment Performance Evaluation

The simulated experiment tested the five models by splitting on each layer and measuring their data input rate, computational overhead and data throughput rate. As seen in Fig. 3, the optimum layer for splitting a model can be a complex balance between these three parameters.

The data input rate (Fig. 3a) is the amount of data that the edge node is capable of processing, and is the frame rate normalised against the maximum frame rate (M_ij) of 30 fps. A higher number indicates that the edge node is able to process more of the input data when split at the given layer. The MnistDense5 model displayed a high input rate, as its small layer size and memory footprint per layer allowed it to be run
more quickly and efficiently than the other models, which were impacted by limitations in available memory.

The data throughput rate (Fig. 3c) is the data size ratio of each layer compared to the raw data size, r_kj. Dots are added to each line to indicate potential split points. Values greater than 1 indicate an increase in the size of the data, and thus an increase in data throughput and overall cost if the model were split at this layer. Throughput values below 1 indicate a reduction in data throughput and a reduction in cost. From this graph, we see that models tend to initially increase in bandwidth cost and then decrease as they approach the final layer. Some models fluctuate in throughput size as they transform the feature set from layer to layer; this is observed to happen inside the residual blocks and convolutional layers of ResNet and VGG19 respectively.
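The throughput ratios behind Fig. 3c can be approximated by comparing each layer's output tensor size with the size of the model's input. The sketch below is a simplification: it assumes uncompressed float32 feature maps, measures the ratio against the model's input tensor rather than the raw transmitted file size used in our measurements, and does not apply the single-recipient split-point restriction of Section IV-B.

    import numpy as np
    from tensorflow import keras

    def size_ratios(model):
        # Approximate r_kj per layer: bytes of the layer's output feature map
        # divided by bytes of the model input, assuming float32 tensors.
        input_bytes = np.prod(model.input_shape[1:]) * 4
        ratios = {}
        for k, layer in enumerate(model.layers):
            shape = layer.output_shape
            if isinstance(shape, list):     # skip layers with multiple outputs
                continue
            ratios[k] = float(np.prod(shape[1:]) * 4 / input_bytes)
        return ratios

    for k, r in size_ratios(keras.applications.InceptionV3(weights=None)).items():
        print(k, round(r, 3))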
In Fig. 3b the cumulative floating point operations (flops) for each layer of the models are shown. From this we can see that the processing cost for the models increases linearly at a similar rate for all the models before rapidly increasing at specific points within the models. To keep the amount of processing the edge node needs to complete per input to a minimum, it is ideal to form a split before these points and perform these more intensive computational layers on a server node which has more processing resources.
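For reference, the float operation counts plotted cumulatively in Fig. 3b can be obtained from the TensorFlow 1.x profiler. The sketch below assumes the tf.profiler API available in TensorFlow 1.13 and reports the model-wide total; per-operation breakdowns are available through the same profiler's scope-level options.

    import tensorflow as tf

    def total_flops(model_fn):
        # Build the model in a fresh tf.keras session and ask the TF 1.x
        # profiler for the total float operations of the inference graph.
        tf.keras.backend.clear_session()
        tf.keras.backend.set_learning_phase(0)   # inference mode
        model_fn(weights=None)
        graph = tf.keras.backend.get_session().graph
        opts = tf.profiler.ProfileOptionBuilder.float_operation()
        return tf.profiler.profile(graph, cmd='op', options=opts).total_float_ops

    print(total_flops(tf.keras.applications.InceptionV3))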
In order to demonstrate our optimisation function, we have selected the InceptionV3 model as it is the largest and most complex deep learning model of the five benchmark models. By applying the optimisation function, we can calculate the quality of splitting the model at each layer. The values of each parameter can be seen in Fig. 4. In Fig. 4a, we can see that there is a reduction in the data throughput as the layer index increases, but an increase in both processing cost and input rate. When the parameters are equally weighted, the quality of splitting on each layer (F(k)) can fluctuate between layers, as shown in Fig. 4b, where the layer with the highest F(k) value indicates the optimal layer.

Fig. 4. Determining the optimal layer to split the InceptionV3 model across an IoT network: (a) multi-objective optimisation values, (b) quality (F(k)) of splitting at each layer.

From these results, we can see that the input rate of the models is significantly lower than the theoretical limits indicated by the maximum service capacity in terms of flops. From examination, we found that while a decrease in the flops processed in the split model subsection did result in a higher possible input rate, the relationship was not linear and was also affected by available memory, CPU speed and I/O overhead. We also found that some models performed better in terms of increasing bandwidth reduction at a minimal flop overhead. Inception, AlexNet and VGG19 had multiple split points available to choose from that provided a similar input rate at a lower bandwidth cost than raw data transmission. This is likely due to the architectural design of these models, with multiple layers of feature reduction in their pooling layers. Alternatively, ResNet did not see the same level of benefits as the other models when splitting and was only able to split at the end of its residual blocks rather than on a pooling layer. This meant that the model did not see a reduced bandwidth ratio until the later layers of the model, where the flops had increased. Finally, MnistDense5 performed as expected of such a small model with similarly sized layers. A distinct bandwidth ratio is achieved at the first layer after the input, which remains the same throughout the following layers before the output layer. Such a small model could fairly easily be fully deployed to a reasonably powerful edge node; however, it is clear that by splitting on the second layer there are bandwidth savings to be gained by using a split deployment model.

V. CONCLUSION

In conclusion, this paper examines deep learning deployment within IoT edge networks and how a deep learning model can be split into multiple subsections and deployed to make use of available processing capacity, reduce throughput costs and increase the data input rate of an IoT application. Where previous methods for determining the optimal deployment
arrangements did not take the physical capacities of the edge devices into consideration, the method proposed in this paper uses the data input rate, data throughput rate and the processing capabilities of each node to quantify the quality of splitting a model at a given layer. In our experiments, we performed both simulated and physical tests with five common and widely known deep learning model architectures using split deployment and profiled their characteristics. While we found that split deployment can reduce network costs for each model, certain architectures provide better results than others due to data converging through pooling layers of reduced layer size. We also found that using the profiled flops of a deep learning model to calculate the potential processing time of that model on a specific device, based on its technical specifications, is not an ideal method to estimate inference rates due to other hardware factors at play. This presents an open research question of how to best estimate the processing time of a deep learning model deployed to an edge node so that the layer deployment can be more accurately optimised.
REFERENCES
[1] R. van der Meulen, “Gartner says 8.4 billion connected ”things” will
be in use in 2017, up 31 percent from 2016,” Feb 2017. [Online].
Available: https://www.gartner.com/en/newsroom/press-releases/2017-
02-07-gartner-says-8-billion-connected-things-will-be-in-use-in-2017-
up-31-percent-from-2016
[2] J. Azar, A. Makhoul, M. Barhamgi, and R. Couturier, “An energy
efficient iot data compression approach for edge machine learning,”
Future Generation Computer Systems, vol. 96, pp. 168–175, 2019.
[3] J. Tang, D. Sun, S. Liu, and J.-L. Gaudiot, “Enabling deep learning on
iot devices,” Computer, vol. 50, no. 10, pp. 92–96, 2017.
[4] H. Li, K. Ota, and M. Dong, “Learning iot in edge: Deep learning for
the internet of things with edge computing,” IEEE Network, vol. 32,
no. 1, pp. 96–101, 2018.
[5] K. Ashton, “That internet of things thing,” RFID Journal, vol. 22, no. 7,
pp. 97–114, 2009.
[6] J. Gubbi, R. Buyya, S. Marusic, and M. Palaniswami, “Internet of things
(iot): A vision, architectural elements, and future directions,” Future
generation computer systems, vol. 29, no. 7, pp. 1645–1660, 2013.
[7] X. Qi and C. Liu, “Enabling deep learning on iot edge: Approaches and
evaluation,” in 2018 IEEE/ACM Symposium on Edge Computing (SEC).
IEEE, 2018, pp. 367–372.
[8] H. El-Sayed, S. Sankar, M. Prasad, D. Puthal, A. Gupta, M. Mohanty,
and C.-T. Lin, “Edge of things: the big picture on the integration of
edge, iot and the cloud in a distributed computing environment,” IEEE
Access, vol. 6, pp. 1706–1717, 2017.
[9] S. Lee, K. Son, H. Kim, and J. Park, “Car plate recognition based on cnn
using embedded system with gpu,” in Human System Interactions (HSI),
2017 10th International Conference on. IEEE, 2017, pp. 239–241.
[10] Y. Ukidave, D. Kaeli, U. Gupta, and K. Keville, “Performance of the
nvidia jetson tk1 in hpc,” in Cluster Computing (CLUSTER), 2015 IEEE
International Conference on. IEEE, 2015, pp. 533–534.
[11] Q. Zhang, L. T. Yang, Z. Chen, and P. Li, “A survey on deep learning
for big data,” Information Fusion, vol. 42, pp. 146–157, 2018.
[12] M. Mohammadi, A. Al-Fuqaha, S. Sorour, and M. Guizani, “Deep
learning for iot big data and streaming analytics: A survey,” IEEE
Communications Surveys & Tutorials, vol. 20, no. 4, pp. 2923–2960,
2018.
[13] M. Kranz, “Why industry needs to accelerate iot standards,” IEEE
Internet of Things Magazine, vol. 1, no. 1, pp. 14–18, 2018.
[14] C. Liu, Y. Cao, Y. Luo, G. Chen, V. Vokkarane, M. Yunsheng, S. Chen,
and P. Hou, “A new deep learning-based food recognition system for
dietary assessment on an edge computing service infrastructure,” IEEE
Transactions on Services Computing, vol. 11, no. 2, pp. 249–261, 2018.
[15] J. Elson, J. J. Douceur, J. Howell, and J. Saul, “Asirra: A captcha that exploits interest-aligned manual image categorization,” in Proceedings of 14th ACM Conference on Computer and Communications Security (CCS). Association for Computing Machinery, Inc., October 2007. [Online]. Available: https://www.microsoft.com/en-us/research/publication/asirra-a-captcha-that-exploits-interest-aligned-manual-image-categorization/