IoT and ML Question
Article 01/04/2019
July 2018
Volume 33 Number 7
[Machine Learning]
Imagine that, in the not-too-distant future, you’re the designer of a smart traffic intersection. Your smart intersection has four video cameras connected to an
Internet of Things (IoT) device with a small CPU, similar to a Raspberry Pi. The cameras send video frames to the IoT device, where they’re analyzed using a
machine learning (ML) image-recognition model, and control instructions are then sent to the traffic signals. The small IoT device is connected to Azure
Cloud Services, where information is logged and analyzed offline.
This is an example of ML on an IoT device on the edge. I use the term edge device to mean anything connected to the cloud, where cloud refers to something
like Microsoft Azure or a company’s remote servers. In this article, I’ll explain two ways you can design ML on the edge. Specifically, I’ll describe how to write a
custom model and IO function for a device, and how to use the Microsoft Embedded Learning Library (ELL) set of tools to deploy an optimized ML model to a
device on the edge. The custom IO approach is currently, as I write this article, the most common way to deploy an ML model to an IoT device. The ELL
approach is forward-looking.
Even if you’re not working with ML on IoT devices, there are at least three reasons why you might want to read this article. First, the design principles involved
generalize to other software development scenarios. Second, it’s quite possible that you’ll be working with ML and IoT devices relatively soon. Third, you may
just find the techniques described here interesting in their own right.
Why does ML need to be on the IoT edge? Why not just do all processing in the cloud? IoT devices on the edge can be very inexpensive, but they often have
limited memory, limited processing capability and a limited power supply. In many scenarios, trying to perform ML processing in the cloud has several
drawbacks.
Latency is often a big problem. In the smart traffic intersection example, a delay of more than a fraction of a second could have disastrous consequences.
Additional problems with trying to perform ML in the cloud include reliability (a dropped network connection is typically impossible to predict and difficult to
deal with), network availability (for example, a ship at sea may have connectivity only when a satellite is overhead) and privacy/security (when, for example,
you’re monitoring a patient in a hospital).
https://learn.microsoft.com/en-us/archive/msdn-magazine/2018/july/machine-learning-machine-learning-with-iot-devices-on-the-edge 1/12
2/1/24, 11:22 AM Machine Learning - Machine Learning with IoT Devices on the Edge | Microsoft Learn
This article doesn’t assume you have any particular background or skill set but does assume you have some general software development experience. The
demo programs described in this article (a Python program that uses the CNTK library to create an ML model, a C program that simulates IoT code and a
Python program that uses an ELL model) are too long to present here, but they’re available in the accompanying file download.
Take a look at the screenshot in Figure 1 and the diagram in Figure 2. The two figures show a neural network with four input nodes, five hidden layer processing
nodes and three output layer nodes. The input values are (6.1, 3.1, 5.1, 1.1) and the output values are (0.0321, 0.6458, 0.3221). Figure 1 shows how the model
was developed and trained. I used Visual Studio Code, but there are many alternatives.
This particular example involves predicting the species of an iris flower using input values that represent sepal (a leaf-like structure) length and width and petal
length and width. There are three possible species of flower: setosa, versicolor, virginica. The output values can be interpreted as probabilities (note that they
sum to 1.0) so, because the second value, 0.6458, is largest, the model’s prediction is the second species, versicolor.
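The interpretation step described above can be sketched in a few lines of Python. This is only an illustration: the function name predict_species and the SPECIES list are hypothetical names, while the three output values are the ones quoted in the text.

```python
# Species labels, in the order listed in the article.
SPECIES = ["setosa", "versicolor", "virginica"]

def predict_species(probs):
    """Return the label whose probability is largest."""
    best = max(range(len(probs)), key=lambda i: probs[i])
    return SPECIES[best]

probs = [0.0321, 0.6458, 0.3221]   # output values quoted in the article
print(round(sum(probs), 4))        # 1.0
print(predict_species(probs))      # versicolor
```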
In Figure 2, each line connecting a pair of nodes represents a weight. A weight is just a numeric constant. If nodes are zero-base indexed, from top to bottom,
the weight from input[0] to hidden[0] is 0.2680 and the weight from hidden[4] to output[0] is 0.9381.
Each hidden and output node has a small arrow pointing into the node. These are called biases. The bias for hidden[0] is 0.1164 and the bias for output[0] is
-0.0466.
You can think of a neural network as a complicated math function because it just accepts numeric input and produces numeric output. An ML model on an IoT
device needs to know how to compute output. For the neural network in Figure 2, the first step is to compute the values of the hidden nodes. The value of each
hidden node is the hyperbolic tangent (tanh) function applied to the sum of the products of inputs and associated weights, plus the bias. For hidden[0] the
calculation is:
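hidden[0] = tanh( (input[0] * w[0][0]) + (input[1] * w[1][0]) + (input[2] * w[2][0]) + (input[3] * w[3][0]) + bias[0] )
Here w[i][0] is the weight from input[i] to hidden[0] and bias[0] is the bias for hidden[0], so the first product is 6.1 * 0.2680 and the bias term is 0.1164.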
Hidden nodes [1] through [4] are calculated similarly. The tanh function is called the hidden layer activation function. There are other activation functions that
can be used, such as logistic sigmoid and rectified linear unit, which would give different hidden node values.
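The hidden node calculation can be tried out with a short Python sketch. The input values, the weight 0.2680 and the bias 0.1164 come from the text; the other three weights are placeholders chosen only so the code runs, and the function name hidden_value is hypothetical.

```python
import math

def hidden_value(inputs, weights, bias):
    """tanh of (sum of input * weight products, plus the bias)."""
    total = sum(x * w for x, w in zip(inputs, weights)) + bias
    return math.tanh(total)

inputs = [6.1, 3.1, 5.1, 1.1]
# Weights feeding hidden[0]. Only the first value (0.2680, input[0] to
# hidden[0]) is quoted in the article; the other three are PLACEHOLDERS.
w_to_h0 = [0.2680, 0.01, 0.02, 0.03]
b_h0 = 0.1164                      # bias for hidden[0], quoted in the article

h0 = hidden_value(inputs, w_to_h0, b_h0)
print(h0)                          # always strictly between -1 and 1
```

Swapping math.tanh for a different activation function changes only the last line of hidden_value, which is why the choice of activation is usually treated as part of the model definition.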
After the hidden node values have been computed, the next step is to compute preliminary output node values. A preliminary output node value is just the sum
of products of hidden nodes and associated hidden-to-output weights, plus the bias. In other words, the same calculation as used for hidden nodes, but without
the activation function. For the preliminary value of output[0] the calculation is:
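preliminary output[0] = (hidden[0] * w[0][0]) + (hidden[1] * w[1][0]) + (hidden[2] * w[2][0]) + (hidden[3] * w[3][0]) + (hidden[4] * w[4][0]) + bias[0]
Here w[j][0] is the weight from hidden[j] to output[0] and bias[0] is the bias for output[0], so the last product uses the weight 0.9381 and the bias term is -0.0466.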
The values for output nodes [1] and [2] are calculated in the same way. After the preliminary values of the output nodes have been computed, the final output
node values can be converted to probabilities using the softmax activation function. The softmax function is best explained by example. The calculations for the
final output values are:
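output[k] = exp(pre[k]) / ( exp(pre[0]) + exp(pre[1]) + exp(pre[2]) ), for k = 0, 1, 2
Here pre[k] is the preliminary value of output[k]. Each final value lies between 0 and 1, and the three values sum to 1.0.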
As with the hidden nodes, there are alternative output node activation functions, such as the identity function.
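The two output-layer steps, preliminary values followed by softmax, might look like the following in Python. The preliminary values used here are made up for illustration and are not the values of the iris model. Subtracting the largest value before calling exp is a common numerical precaution that is not mentioned in the article; it doesn’t change the result.

```python
import math

def preliminary_output(hidden, weights, bias):
    """Sum of hidden * weight products plus the bias; no activation."""
    return sum(h * w for h, w in zip(hidden, weights)) + bias

def softmax(values):
    """Scale exp(v) for each v so that the results sum to 1."""
    m = max(values)                          # subtract max for stability
    exps = [math.exp(v - m) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

# HYPOTHETICAL preliminary output values, for illustration only.
pre = [-1.0, 2.0, 1.3]
probs = softmax(pre)
print(probs)
print(sum(probs))
```

With the identity function as the output activation, the preliminary values would be returned unchanged instead of being passed through softmax.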
To summarize, an ML model is all the information needed to accept input data and generate an output prediction. In the case of a neural network, this
information consists of the number of input, hidden and output nodes, the values of the weights and biases, and the types of activation functions used on the hidden and output layer nodes.
OK, but where do the values of the weights and the biases come from? They’re determined by training the model. Training uses a set of data that has known
input values and known, correct output values, and applies an optimization algorithm such as back-propagation to minimize the difference between
computed output values and known, correct output values.
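To make the idea of minimizing the difference concrete, here is a deliberately tiny Python sketch. It isn’t back-propagation for a full neural network: it fits a single weight w in the model y = w * x using plain gradient descent on squared error, which is the same idea at the smallest possible scale. The data, learning rate and function name are all made up for illustration.

```python
def train_weight(data, lr=0.05, epochs=200):
    """Fit y = w * x by gradient descent on squared error."""
    w = 0.0
    for _ in range(epochs):
        for x, y in data:
            error = w * x - y            # computed minus known-correct output
            w -= lr * 2.0 * error * x    # step against gradient of error**2
    return w

# Made-up training data that follows y = 2x exactly.
data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]
w = train_weight(data)
print(w)                                 # converges toward 2.0
```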
There are many other kinds of ML models, such as decision trees and naive Bayes, but the general principles are the same. When using a neural network code
library such as Microsoft CNTK or Google Keras/TensorFlow, the program that trains an ML model will save the model to disk. For example, CNTK and Keras
code resembles:
Python
mp = ".\\Models\\iris_nn.model"
model.save(mp, format=C.ModelFormat.CNTKv2) # CNTK (assumes: import cntk as C)
model.save(".\\Models\\iris_model.h5") # Keras
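A saved model can later be loaded back into memory with code that resembles:
Python
mp = ".\\Models\\iris_nn.model"
model = C.ops.functions.Function.load(mp) # CNTK (assumes: import cntk as C)
model = load_model(".\\Models\\iris_model.h5") # Keras (assumes: from keras.models import load_model)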
Most neural network libraries have a way to save just a model’s weights and biases values to file (as opposed to the entire model).
For example, the 4-5-3 iris model described in the previous section has only (4 * 5) + 5 + (5 * 3) + 3 = 43 weights and biases. But an image classification model
with millions of input pixel values and hundreds of hidden processing nodes can have hundreds of millions, or even billions, of weights and biases. Notice that
the values of all 43 weights and biases of the iris example are shown in Figure 1.
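The parameter count for a fully connected network can be checked with a few lines of Python. The helper name count_params is hypothetical; the arithmetic is the (4 * 5) + 5 + (5 * 3) + 3 calculation from the text, generalized to any list of layer sizes.

```python
def count_params(layer_sizes):
    """Total number of weights plus biases in a fully connected network."""
    total = 0
    for n_in, n_out in zip(layer_sizes, layer_sizes[1:]):
        total += n_in * n_out + n_out    # weights, then one bias per node
    return total

print(count_params([4, 5, 3]))           # 43
```

The same function shows how quickly the count grows: every additional node in a layer adds one weight for each node in the previous layer, plus a bias.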
So, suppose you have a trained ML model. You want to deploy the model to a small, weak IoT device. The simplest solution is to install onto the IoT device the
same neural network library software you used to train the model. Then you can copy the saved trained model file to the IoT device and write code to load the
model and make a prediction. Easy!
Unfortunately, this approach will work only in relatively rare situations where your IoT device is quite powerful—perhaps along the lines of a desktop PC or
laptop. Also, neural network libraries such as CNTK and Keras/TensorFlow were designed to train models quickly and efficiently, but in general they were not
necessarily designed for optimal performance when performing input-output with a trained model. In short, the easy solution for deploying a trained ML model
to an IoT device on the edge is rarely feasible.
The demo program starts by using the gcc C/C++ tool to compile file test.c into an executable on the target device. Here, the target device is just my desktop
PC but there are C/C++ compilers for almost every kind of IoT/CPU device. When run, the demo program displays the values of the weights and biases of the
iris flower example, then uses input values of (6.1, 3.1, 5.1, 1.1) and computes and displays the output values (0.0321, 0.6458, 0.3221). If you compare Figure 3
with Figures 1 and 2, you’ll see the inputs, weights and biases, and outputs are the same (subject to rounding error).
Demo program test.c implements only the neural network input-output process. The program starts by setting up a struct data structure to hold the number of
nodes in each layer, values for the hidden and output layer nodes, and values of the weights and biases:
c++
#include <stdio.h>
#include <stdlib.h>
#include <math.h> // Has tanh()
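typedef struct {
  int ni, nh, no;              // Number of input, hidden and output nodes
  float *h_nodes, *o_nodes;    // Hidden and output node values (no i_nodes)
  float **ih_wts, **ho_wts;    // Input-to-hidden and hidden-to-output weights
  float *h_biases, *o_biases;  // Hidden and output node biases
} nn_t;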
The key lines of code in the demo program main function look like:
The point is that if you know exactly how a simple neural network ML model works, the IO process isn’t magic. You can implement basic IO quite easily.
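As a rough illustration of how little code basic IO needs, here is a Python sketch of the whole input-output process for a single-hidden-layer network with tanh and softmax activation. It isn’t the test.c demo program: the class name NeuralNet is hypothetical, the field names only echo the nn_t struct, and every weight and bias below is a placeholder rather than a trained iris value.

```python
import math

class NeuralNet:
    """Input-output only, for a single-hidden-layer network."""

    def __init__(self, ih_wts, h_biases, ho_wts, o_biases):
        self.ih_wts = ih_wts          # [ni][nh] input-to-hidden weights
        self.h_biases = h_biases      # [nh] hidden node biases
        self.ho_wts = ho_wts          # [nh][no] hidden-to-output weights
        self.o_biases = o_biases      # [no] output node biases

    def predict(self, inputs):
        nh, no = len(self.h_biases), len(self.o_biases)
        # Hidden layer: tanh(sum of products + bias).
        hidden = [
            math.tanh(sum(x * self.ih_wts[i][j] for i, x in enumerate(inputs))
                      + self.h_biases[j])
            for j in range(nh)
        ]
        # Output layer: sum of products + bias, then softmax.
        pre = [
            sum(h * self.ho_wts[j][k] for j, h in enumerate(hidden))
            + self.o_biases[k]
            for k in range(no)
        ]
        m = max(pre)                  # subtract max for numeric stability
        exps = [math.exp(v - m) for v in pre]
        total = sum(exps)
        return [e / total for e in exps]

# PLACEHOLDER parameters for a 4-5-3 network; not the trained iris values.
ni, nh, no = 4, 5, 3
ih = [[0.01 * (i + j + 1) for j in range(nh)] for i in range(ni)]
ho = [[0.02 * (j - k) for k in range(no)] for j in range(nh)]
net = NeuralNet(ih, [0.1] * nh, ho, [0.0] * no)
probs = net.predict([6.1, 3.1, 5.1, 1.1])
print(probs)
```

Loading the 43 trained values into the four arrays would be the only change needed to reproduce a real model’s predictions with this structure.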
The main advantage of using a custom C/C++ IO function is conceptual simplicity. Also, because you’re coding at a very low level (really just one level of
abstraction above assembly language), the generated executable code will typically be very small and run very fast. Additionally, because you have full control
over your IO code, you can use all kinds of tricks to speed up performance or reduce memory footprint. For example, program test.c uses type float but,
depending on the problem scenario, you might be able to use a custom 16-bit fixed-point data type.
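The fixed-point idea can be sketched in Python before committing to it in C. The article doesn’t say which 16-bit layout to use, so the Q8.8 format below (8 integer bits, 8 fractional bits) and all the function names are assumptions made for illustration.

```python
FRAC_BITS = 8                      # Q8.8: 8 integer bits, 8 fractional bits
SCALE = 1 << FRAC_BITS             # 256

def saturate(v):
    """Clamp to the range of a signed 16-bit integer."""
    return max(-32768, min(32767, v))

def to_fixed(x):
    """Convert a float to a signed 16-bit Q8.8 integer."""
    return saturate(int(round(x * SCALE)))

def to_float(q):
    """Convert a Q8.8 integer back to a float."""
    return q / SCALE

def fixed_mul(a, b):
    """Multiply two Q8.8 values; shift right to restore the scale."""
    return saturate((a * b) >> FRAC_BITS)

w = to_fixed(0.2680)               # weight quoted in the article
x = to_fixed(6.1)                  # first input value
print(to_float(fixed_mul(x, w)))   # close to, but not exactly, 6.1 * 0.2680
```

The price of the smaller data type is precision: with 8 fractional bits each stored value is only accurate to about 1/256, so whether the trade is acceptable depends on how sensitive the model’s predictions are to small changes in the weights.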
The main disadvantage of using a custom C/C++ IO approach is that the technique becomes increasingly difficult as the complexity of the trained ML model
increases. For example, an IO function for a single hidden layer neural network with tanh and softmax activation is very easy to implement—taking only about
one day to one week of development effort, depending on many factors, of course. A deep neural network with several hidden layers is somewhat easy to deal
with, maybe a week or two of effort. But implementing the IO functionality of a convolutional neural network (CNN) or a long short-term memory (LSTM)
recurrent neural network is very difficult and would typically require much more than four weeks of development effort.
I suspect that as the use of IoT devices increases, there will be efforts to create open source C/C++ libraries that implement the IO for ML models created by
different neural network libraries such as CNTK and Keras/TensorFlow. Or, if there’s enough demand, the developers of neural network libraries might create
C/C++ IO APIs for IoT devices themselves. If you had such a library, writing custom IO for an IoT device would be relatively simple.
In words, the ELL system accepts an ML model created by a supported library, such as CNTK, or a supported model format, such as Open Neural Network
Exchange (ONNX). The ELL system uses the input ML model and generates an intermediate model as an .ell file. Then the ELL system uses the intermediate .ell
model file to generate executable code of some kind for a supported target device. Put another way, you can think of ELL as a sort of cross-compiler for ML
models.
A more granular explanation of how ELL works is shown on the right side of Figure 4, using the iris flower model example. The process starts with an ML
developer writing a Python program named iris_nn.py to create and save a prediction model named iris_cntk.model, which is in a proprietary binary format. This
process is shown in Figure 1.
The ELL command-line tool cntk_import.py is then used to create an intermediate iris_cntk.ell file, which is stored in JSON format. Next, the ELL command-line
tool wrap.py is used to generate a directory host\build of C/C++ source code files. Note that “host” means to take the settings from the current machine, so a
more common scenario would be something like \pi3\build. Then the cmake.exe C/C++ compiler-build tool is used to generate a Python module of executable
code, containing the logic of the original ML model, named iris_cntk. The target could be a C/C++ executable or a C# executable or whatever is best-suited for
the target IoT device.
The iris_cntk Python module can then be imported by a Python program (use_iris_ell_model.py) on the target device (my desktop PC), as shown in Figure 5.
Notice that the input values (6.1, 3.1, 5.1, 1.1) and output values (0.0321, 0.6457, 0.3221) generated by the ELL system model are the same, to within rounding error, as the values
generated during model development (Figure 1) and the values generated by the custom C/C++ IO function (Figure 3).
The leading “(py36)” before the command prompts in Figure 5 indicates I’m working in a special Python setting called a Conda environment where I’m using
Python version 3.6, which was required at the time I coded my ELL demo.
The code for program use_iris_ell_model.py is shown in Figure 6. The point is that ELL has generated a Python module/package that can be used just like any
other package/module.
Python
# use_iris_ell_model.py
# Python 3.6
import numpy as np
import tutorial_helpers # used to find package
import iris_cntk as m # the ELL module/package
print("\nBegin use ELL model demo \n")
unknown = np.array([[6.1, 3.1, 5.1, 1.1]],
dtype=np.float32)
np.set_printoptions(precision=4, suppress=True)
print("Input to ELL model: ")
print(unknown)
predicted = m.predict(unknown)
print("\nPrediction probabilities: ")
print(predicted)
print("\nEnd ELL demo \n")
The ELL system is still in the very early stages of development, but based on my experience, the system is ready for you to experiment with and is stable enough
for limited production development scenarios.
I expect your reaction to the diagram of the ELL process in Figure 4 and its explanation is something like, “Wow, that’s a lot of steps!” At least, that was my
reaction. Eventually, I expect the ELL system to mature to a point where you can generate a model for deployment to an IoT device along the lines of:
Python
source_model = ".\\iris_cntk.model"
target_model = ".\\iris_cortex_m4.model"
ell_generate(source_model, target_model)
But for now, if you want to explore ELL you’ll have to work with several steps. Luckily, the ELL tutorial from the ELL Web site on which much of this article is
based is very good. I should point out that to get started with ELL you must install ELL on your desktop machine, and installation consists of building C/C++
source code—there’s no .msi installer for ELL (yet).
A cool feature of ELL that isn’t obvious is that it performs some very sophisticated optimization behind the scenes. For example, the ELL team has explored ways
to compress large ML models, including sparsification and pruning techniques, and replacing floating point math with 1-bit math. The ELL team is also looking
at algorithms that can be used in place of neural networks, including improved decision trees and k-DNF classifiers.
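To give a feel for what sparsification and 1-bit math mean, here is a generic Python sketch of two textbook versions of those ideas: zeroing out small-magnitude values, and keeping only the sign of each value. It doesn’t show how ELL implements its optimizations; the function names and the threshold are hypothetical, and the sample numbers are simply the two weights and two biases quoted earlier in the article.

```python
def prune(weights, threshold):
    """Zero out values whose magnitude is below threshold (sparsification)."""
    return [w if abs(w) >= threshold else 0.0 for w in weights]

def binarize(weights):
    """Keep only the sign of each value: a 1-bit representation."""
    return [1 if w >= 0 else -1 for w in weights]

# Sample values quoted in the article (two weights and two biases).
weights = [0.2680, -0.0466, 0.9381, 0.1164]
print(prune(weights, 0.2))     # [0.268, 0.0, 0.9381, 0.0]
print(binarize(weights))       # [1, -1, 1, 1]
```

Zeros don’t need to be stored or multiplied, and sign-only values let multiplications be replaced by much cheaper operations, which is why techniques of this kind can shrink a model and speed it up on a small device.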
The tutorials on the ELL Web site are quite good, but because there are many steps involved, they are a bit long. Let me briefly sketch out the process so you
can get a feel for what installing and using ELL is like. Note that my commands are not syntactically correct; they’re highly simplified to keep the main ideas
clear.
In words, you must have quite a few tools installed before starting, then you pull the ELL source code down from GitHub and then build the ELL executable tools
and Python binding using cmake.
That is, you use ELL tool cntk_import.py to create a .ell file from a CNTK model file. You use wrap.py to generate a lot of C/C++ specific to a particular target IoT
device. And you use cmake to generate executables that encapsulate the original trained ML model’s behavior.
Wrapping Up
To summarize, a machine learning model is all the information needed for a software system to accept input and generate a prediction. Because IoT devices on
the edge often require very fast and reliable performance, it’s sometimes necessary to compute ML predictions directly on a device. However, IoT devices are
often small and weak, so you can’t simply copy a model that was developed on a powerful desktop machine to the device. A standard approach is to write
custom C/C++ code, but this approach doesn’t scale to complex ML models. An emerging approach is the use of ML cross-compilers, such as the Microsoft
Embedded Learning Library.
When fully mature and released, the ELL system will quite likely make developing complex ML models for IoT devices on the edge dramatically easier than it is
today.
Dr. James McCaffrey works for Microsoft Research in Redmond, Wash. He has worked on several Microsoft products, including Internet Explorer and Bing. Dr.
McCaffrey can be reached at [email protected].
Thanks to the following Microsoft technical experts who reviewed this article: Byron Changuion, Chuck Jacobs, Chris Lee and Ricky Loynd