Thesis - Anomaly Detection
maintenance predictability
Niklas Exell
Master’s thesis in Computer Engineering
Supervisor: Jerker Björkqvist
Åbo Akademi University
Faculty of Science and Engineering
Information Technologies
October, 2023
Abstract
Keywords:
TinyML, Machine Learning, Embedded Systems, Edge, Decentralised
Contents
1 Preface
2 Introduction
3 Machine learning
3.1 Supervised Learning
3.2 Unsupervised Learning
3.3 Reinforcement Learning
3.4 Neural Networks
3.4.1 Training
4 Anomaly Detection
4.1 Categories of anomaly detection
4.2 Use cases of anomaly detection
4.3 Anomaly types
4.4 Examples of anomaly detection methods
5 Edge Computing
5.1 Why edge computing?
5.2 Why not edge computing?
5.3 Examples of edge computing
5.4 Microcontrollers on the edge
5.4.1 Are microcontrollers necessary?
5.5 Hybrid
5.6 Environmental impact
6 TinyML
6.1 What is TinyML
6.2 Motivation of TinyML
6.2.1 Power consumption
6.2.2 Cost
6.3 Difference to conventional Machine Learning
6.4 TensorFlow vs TensorFlow Lite vs TensorFlow Lite Micro
6.4.1 TensorFlow
6.4.2 TensorFlow Lite
6.4.3 TensorFlow Lite Micro
6.5 Quantization
6.6 FlatBuffers
6.7 Adapting models for microcontrollers
6.8 Computational and hardware need
6.8.1 Training and deployment
6.8.2 Deployed
8 Conclusion
9 Summary in Swedish
9.1 Inledning (Introduction)
9.2 Maskininlärning (Machine learning)
9.3 Identifiering av avvikelser (Anomaly detection)
9.4 Kantberäkning (Edge computing)
9.5 Maskininlärning på mikrokontroller - TinyML (Machine learning on microcontrollers - TinyML)
9.6 Analys (Analysis)
9.7 Sammanfattning (Summary)
1 Preface
With ever-growing data generation on the edge, processing the data on the edge becomes ever more valuable. Simultaneously, machine learning is growing in popularity. This thesis focuses on the combination of the two in the form of TinyML.
This thesis was written with the guidance of Prof. Jerker Björkqvist who
provided excellent feedback.
2 Introduction
In today's world, with an accelerating number of devices, data is often gathered and later analysed to provide feedback in some form. Most of this computation is done in the cloud, which means that large amounts of data need to be transported to, stored and analysed at a central location. This requires a large amount of computation in one place, high-bandwidth links to send the data to the central location, and a large amount of storage there for the raw data from all the deployed devices equipped with sensors.
This is the motivation for edge computing, which, as the name implies, performs the computation at the edge, as close to where the data is generated as possible. Processing the data close to where it is generated can also lower the feedback latency and therefore improve performance. Doing the computation on the edge nodes also saves cost, since neither a large amount of central computation nor a high-bandwidth link to the central location is needed.
These factors are especially relevant onboard a ship, where a link to the cloud onshore cannot be taken for granted while out at sea. A high-bandwidth uplink, through Ethernet or WiFi while at port or through a cellular network while close to shore, is only available in proximity to the port or shore.
This thesis investigates whether it is possible to predict maintenance needs with machine learning using TinyML, with the computation done on the edge. TinyML enables running ML models on the edge, close to or on the data-gathering embedded devices.
The scope is limited to medium-size marine diesel engines. Similarly to how a marine engineer onboard a ship can listen to the engines in the engine room, recognize that some sound seems off, and sometimes even pinpoint where the change is coming from, it should also be possible to train a model to detect these changes. The hypothesis of this thesis is that it might be possible to train a model to recognize maintenance needs based on the vibrations gathered by accelerometers.
In this thesis, I will analyse data gathered on a Roll-on/Roll-off ferry
that operated in the Baltic Sea between Finland and Sweden. This ship
is equipped with four Wärtsilä 12V32 4SA four-stroke diesel engines. Each
of the four engines had sensor units with accelerometers attached to the
engine block directly and the engine frameset. The engine room also had
two sensor units that were used as a reference as well as to monitor the
compute hardware enclosure temperature. All the described sensors were
retrofitted.
3 Machine learning
Machine Learning, abbreviated ML, is a subset of Artificial Intelligence (AI). In Machine Learning, a machine, meaning a computer, "learns" about a phenomenon of interest by using:
3.2 Unsupervised Learning
In Unsupervised Learning (UL), the data is not labelled.
UL can be broadly classified into probabilistic methods and neural networks:
associated with the neurons and edges. These weights are multipliers that
affect the signal downstream of that neuron or edge. This means that neural
networks are weighted graphs.
Neural networks are often arranged in layers, especially in deep learning. If every neuron is connected to all neurons in the adjacent layers, the network is called fully connected; however, many other connection patterns exist.
The first layer, which receives external input, is called the input layer, and the last layer, which gives the output, is called the output layer. Between the input and output layers there can be zero or more so-called hidden layers. Single-layer and unlayered networks also exist.
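The weighted, layered structure described above can be illustrated with a forward pass through a small fully connected feedforward network. This is a minimal sketch in plain Python, assuming a sigmoid activation; the layer sizes and weight values are hypothetical and not taken from any model in this thesis.

```python
import math


def sigmoid(z: float) -> float:
    """Logistic activation squashing any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def dense_layer(inputs, weights, biases, activation=sigmoid):
    """Fully connected layer: every input is connected to every neuron.

    weights[j][i] is the weight on the edge from input i to neuron j.
    """
    outputs = []
    for neuron_weights, bias in zip(weights, biases):
        # Each weight multiplies the signal travelling along its edge.
        z = sum(w * x for w, x in zip(neuron_weights, inputs)) + bias
        outputs.append(activation(z))
    return outputs


def feedforward(inputs, layers):
    """Pass the input through each (weights, biases) layer in turn."""
    signal = inputs
    for weights, biases in layers:
        signal = dense_layer(signal, weights, biases)
    return signal


# Hypothetical network: 2 inputs -> hidden layer of 3 neurons -> 1 output neuron.
hidden = ([[0.5, -0.2], [0.1, 0.4], [-0.3, 0.8]], [0.0, 0.1, -0.1])
output = ([[1.0, -1.0, 0.5]], [0.2])
print(feedforward([1.0, 2.0], [hidden, output]))
```

Information only flows from the input layer towards the output layer here, which is what makes the network feedforward.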
A neural network can be configured either to only feed information forward through the network (feedforward neural network) or to have a form of memory of earlier input data (recurrent neural network):
• Recurrent Neural Networks (RNN), on the other hand, are set up in such a way that connections can form loops. This means that the output of an RNN depends on multiple successive sets of input data, so an RNN can have internal memory. It can therefore process variable-length input data, or more easily process consecutive data sequences where the data is only meaningful in the correct order, e.g. speech recognition.
One of the most common RNN types is the Long Short-Term Memory (LSTM) ANN, which is an RNN with the addition of long-term memory in the form of an internal state where context relating to the current data sequence can be stored. This long-term memory is used to mitigate the vanishing gradient problem, though LSTMs can also suffer from it. The vanishing gradient problem occurs because, when the error is propagated back over many time steps, the gradient is repeatedly multiplied by factors smaller than one and trends towards zero, so the "memory" of earlier input data vanishes. When the factors are instead larger than one, the gradient can grow towards infinity, which is known as the exploding gradient problem. Examples of uses of LSTM ANNs are speech recognition and machine translation.
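The recurrence and the vanishing gradient can be illustrated with a single scalar RNN unit. This is a minimal sketch of a plain (Elman-style) RNN, not an LSTM; the weights w_x, w_h and b are hypothetical values chosen only for illustration. The hidden state carries information from earlier inputs, while the gradient of the last state with respect to the initial one is a product of one factor per time step, so it shrinks as the sequence grows.

```python
import math


def rnn_states(xs, w_x=0.8, w_h=0.5, b=0.0, h0=0.0):
    """Scalar Elman-style RNN: h_t = tanh(w_x * x_t + w_h * h_{t-1} + b).

    The loop through h_{t-1} is what lets earlier inputs influence later outputs.
    """
    states, h = [], h0
    for x in xs:
        h = math.tanh(w_x * x + w_h * h + b)
        states.append(h)
    return states


def grad_last_wrt_initial(states, w_h=0.5):
    """d h_T / d h_0 by the chain rule: one factor w_h * tanh'(a_t) per time step."""
    grad = 1.0
    for h in states:
        grad *= w_h * (1.0 - h * h)  # tanh'(a_t) = 1 - tanh(a_t)^2 = 1 - h_t^2
    return grad


# The same input repeated over longer and longer sequences.
for length in (5, 20, 50):
    hs = rnn_states([1.0] * length)
    print(length, grad_last_wrt_initial(hs))  # magnitude shrinks towards zero
```

Because every factor here is below one, the product collapses towards zero; LSTMs add a gated internal state to reduce this effect.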
3.4.1 Training
When an Artificial Neural Network (ANN) is trained, the term backpropagation is often used. Backpropagation is an algorithm that propagates errors backwards through the network in order to compute, for each weight, the gradient of the error with respect to that weight; gradient descent then uses these gradients to adjust the weight values. This is done backwards, starting from the outputs, since the desired outputs are known and the errors there can be calculated by comparing the current output with the desired output. Gradient descent adjusts each weight in the direction that minimizes the error, and the new weights are applied at the end of a learning iteration. When training an ANN, multiple learning iterations are used to increase performance.
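In the simplest possible case, a single linear neuron trained on the mean squared error, backpropagation reduces to one application of the chain rule. The sketch below is a minimal illustration rather than the training procedure used in this thesis: it computes the error at the output, derives the gradient for each weight from it, and applies all weight updates at the end of each iteration. The learning rate, iteration count and toy data are assumptions.

```python
def train_linear_neuron(data, lr=0.05, iterations=2000):
    """Fit y = w * x + b by batch gradient descent on the mean squared error.

    Each iteration: forward pass, error at the output, gradient for each weight
    via the chain rule, then all weights updated together at the end.
    """
    w, b = 0.0, 0.0
    n = len(data)
    for _ in range(iterations):
        grad_w = grad_b = 0.0
        for x, target in data:
            y = w * x + b            # forward pass
            error = y - target       # compare with the known desired output
            # Chain rule for L = mean(error^2): dL/dw = 2*error*x/n, dL/db = 2*error/n
            grad_w += 2.0 * error * x / n
            grad_b += 2.0 * error / n
        # Gradient descent: step against the gradient to reduce the error.
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


samples = [(x, 2.0 * x + 1.0) for x in (-2.0, -1.0, 0.0, 1.0, 2.0)]
w, b = train_linear_neuron(samples)
print(round(w, 3), round(b, 3))  # close to the true values 2.0 and 1.0
```

In a multi-layer network the same chain rule is applied layer by layer, from the output layer back towards the input layer.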
Before training is started, the desired type of network needs to be chosen. Depending on the data, the number of inputs and outputs also needs to be chosen before training starts.
When a neural network is trained, some static hyperparameters need to
be set before the training is started. Some relate to the network structure,
and some to the training algorithm.
Some examples of hyperparameters are [3]:
• Network structure:
• Training algorithm:
– Learning rate:
When the network is trained, a learning rate is set, which determines the size of the step taken when adjusting the model. A large learning rate shortens the training time, but at the expense of the precision of the final model. An adaptive learning rate can be applied to decrease training time, increase precision and avoid oscillations of the weights.
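– Momentum:
Momentum helps to avoid oscillations by adding a fraction of the previous weight update to the current one, so that the direction of each step is smoothed over successive updates.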
– Number of epochs:
The number of times the full training data is shown during the
training.
– Batch size:
The number of samples given to the network before the weights are updated.
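How the training-algorithm hyperparameters listed above interact can be shown with mini-batch stochastic gradient descent with classical momentum on a toy linear problem. This is a minimal sketch; the values chosen for the learning rate, momentum, number of epochs and batch size, as well as the data, are assumptions for illustration only.

```python
import random


def sgd_momentum(data, lr=0.01, momentum=0.9, epochs=300, batch_size=2, seed=0):
    """Fit y = w * x + b with mini-batch SGD and classical momentum.

    lr         - learning rate: size of each step
    momentum   - fraction of the previous update carried into the next one
    epochs     - number of full passes over the training data
    batch_size - number of samples used per weight update
    """
    rng = random.Random(seed)
    w = b = 0.0
    vw = vb = 0.0                    # velocity: the running update per parameter
    data = list(data)                # shuffle a copy, not the caller's list
    for _ in range(epochs):
        rng.shuffle(data)            # new sample order every epoch
        for start in range(0, len(data), batch_size):
            batch = data[start:start + batch_size]
            gw = sum(2 * (w * x + b - t) * x for x, t in batch) / len(batch)
            gb = sum(2 * (w * x + b - t) for x, t in batch) / len(batch)
            vw = momentum * vw - lr * gw
            vb = momentum * vb - lr * gb
            w += vw
            b += vb
    return w, b


samples = [(x, 2.0 * x + 1.0) for x in (-2.0, -1.0, 0.0, 1.0, 2.0)]
w_m, b_m = sgd_momentum(samples)
print(round(w_m, 3), round(b_m, 3))  # close to 2.0 and 1.0
```

Raising lr or momentum too far makes the updates overshoot and oscillate, while a smaller batch size gives more, but noisier, updates per epoch.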
4 Anomaly Detection
In maintenance prediction, anomalies can be good predictors of parts starting to fail, of sub-optimal running and of maintenance being needed.
As IBM [4] states, "Anomaly detection is a process in machine learning that identifies data points, events, and observations that deviate from a data set's normal behaviour. And, detecting anomalies from time series data is a pain point that is critical to address for industrial applications." Anomaly detection is therefore of wide interest, since it can help solve many problems.
4.2 Use cases of anomaly detection
Anomaly detection can be applied to many things. Some examples are, as
given in[4]:
Figure 2: Point anomaly, source:[5]
Figure 4: Contextual anomaly, source:[5]
Local Outlier Factor is a density-based method for finding anomalies. For each observation, the k nearest neighbours are calculated. Then, using the computed neighbourhood, the local density is computed as the Local Reachability Density (LRD). Finally, the LOF score is calculated by comparing the LRD of the observation with the LRDs of its nearest neighbours; a score well above 1 means that the observation lies in a sparser region than its neighbours and is therefore likely an anomaly.
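The three LOF steps can be sketched directly in plain Python. This is a brute-force illustration for small 2-D data sets under simplifying assumptions (exactly k neighbours, distance ties ignored, no duplicate points); the data and k = 2 are hypothetical. Libraries such as scikit-learn provide complete implementations.

```python
import math


def lof_scores(points, k=2):
    """Local Outlier Factor for a small set of points (brute force, O(n^2)).

    Simplification: exactly k neighbours are used and distance ties are ignored.
    Assumes no duplicate points (identical points would divide by zero).
    """
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]
    # Step 1: the k nearest neighbours of each point, excluding the point itself.
    neighbours = [
        sorted((j for j in range(n) if j != i), key=lambda j, i=i: dist[i][j])[:k]
        for i in range(n)
    ]
    # k-distance: distance from a point to its k-th nearest neighbour.
    k_distance = [dist[i][neighbours[i][-1]] for i in range(n)]

    def reach_dist(i, j):
        # Reachability distance of point i with respect to neighbour j.
        return max(k_distance[j], dist[i][j])

    # Step 2: Local Reachability Density, the inverse mean reachability distance.
    lrd = [k / sum(reach_dist(i, j) for j in neighbours[i]) for i in range(n)]
    # Step 3: LOF, the mean LRD of the neighbours relative to the point's own LRD.
    return [sum(lrd[j] for j in neighbours[i]) / (k * lrd[i]) for i in range(n)]


# Hypothetical data: a tight square cluster and one distant point.
data = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0), (5.0, 5.0)]
print([round(s, 2) for s in lof_scores(data)])  # the last point scores far above 1
```

Points inside the cluster are as dense as their neighbours and score 1, while the isolated point gets a much larger score.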
Figure 5: Anomaly detection algorithms to choose from, source:[5]
5 Edge Computing
Edge computing at its simplest is putting the computation as close as possible to the edge, where the data is created and used, or, as IBM states:
"Edge computing is a distributed computing framework that brings enterprise applications closer to data sources such as IoT devices or local edge servers. This proximity to data at its source can deliver strong business benefits, including faster insights, improved response times and better bandwidth availability." [6]
Autonomous vehicles are probably one of the most talked-about applications. In self-driving cars, decisions need to be made extremely quickly and very little delay can be tolerated. Hence, the processing needs to be done on the edge, in the car itself, since mistakes can seriously injure or even kill people when large masses move at high speeds close to each other.
Examples of self-driving systems are Tesla's Autopilot on the OEM side and Comma.ai's Openpilot on the aftermarket side. While the implementations of these systems vary, with Tesla using more sensors beyond cameras and Comma focusing on cameras only, on the reasoning that humans only need sight to drive, they both rely on edge computing to make decisions in time.
• Computer vision:
For computer vision, edge computing is clearly the way to go due to the large amount of data gathered by the camera sensor(s), especially with moving objects at high resolution. By processing the images at the point of capture, only the interpreted data needs to be sent forward, and if actuation is needed it can be done immediately once the processing is done. Examples where computer vision is used include input for autonomous vehicles and barcode readers.
data. Often a microcontroller is needed anyway to connect the sensors, so doing the processing on it might be beneficial. That said, the microcontroller needs enough address space and computational power to perform the desired computations while also gathering the sensor data and sustaining communication with other nodes in the system.
However, on some systems where the edge nodes have a strict power
budget, for example, due to running on battery and/or photovoltaic
cells, it might be better to not do any processing on these nodes and
instead send the data to nodes that do not have a strict power budget.
On the other hand, transmitting large amounts of data can also require a considerable amount of power, so some pre-processing on the node might be necessary to achieve the lowest overall power consumption.
Cons:
• Address space: Larger data sets can be loaded on servers and therefore
a larger context could be used in the computation.
5.5 Hybrid
When processing gathered data, the approach does not have to be purely edge computing or purely centralised. Often, a better approach is to do some simple pre-processing on the edge that drastically condenses the data that needs to be sent, while requiring only a small amount of computational performance from the edge nodes. A central server can then further process the condensed data from multiple nodes, with the advantage of the wider context that data from multiple nodes provides. How much of the processing is done where varies considerably with the implementation at hand.
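A hybrid split of this kind can be sketched as an edge node reducing each window of raw samples to a few summary features before sending them. This is a minimal illustration; the window length of 1000 samples and the chosen features (mean, RMS, peak) are assumptions, not the configuration used in this thesis.

```python
import math


def condense_window(samples):
    """Reduce one window of raw accelerometer samples to three summary features."""
    n = len(samples)
    mean = sum(samples) / n
    centred = [s - mean for s in samples]              # remove the DC component
    rms = math.sqrt(sum(c * c for c in centred) / n)   # overall vibration level
    peak = max(abs(c) for c in centred)                # largest excursion
    return {"mean": mean, "rms": rms, "peak": peak}


def edge_preprocess(stream, window=1000):
    """Emit one small feature record per full window instead of `window` raw values."""
    for start in range(0, len(stream) - window + 1, window):
        yield condense_window(stream[start:start + window])


raw = [math.sin(0.05 * i) for i in range(10_000)]   # stand-in for sensor data
records = list(edge_preprocess(raw))
print(len(raw), "raw values ->", len(records) * 3, "values sent")
```

The central server would then combine such records from many nodes; which features are worth keeping depends on the application.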
course highly dependent on what is attempted to be solved and the way the problem at hand is approached. If the problem is simply connecting a number of spread-out sensors and doing some processing of the data from those sensors, using a microcontroller for each node will generally have a smaller environmental impact and a lower cost than a larger system.
The comparison becomes more complicated if different approaches are used, for example doing object tracking with microcontrollers attached to each tracked object versus using a single larger system with the performance to do the task with computer vision. With this in mind, either can be the better choice depending on the application and implementation.
6 TinyML
In this chapter, I draw heavily on, and paraphrase, the book "TinyML: Machine Learning with TensorFlow Lite on Arduino and Ultra-Low-Power Microcontrollers" by Pete Warden & Daniel Situnayake [8].
6.2.2 Cost
When comparing microcontrollers to systems intended to run an operating system as well as the needed application(s), it is noticeable that microcontrollers are much cheaper. As a comparison, the cheapest Raspberry Pi, the Raspberry Pi Zero, costs about 5€ and can be used as a server. More typically, however, a larger x86 server is used, which can cost anywhere from 1000€ to 100 000€. Compared to 32-bit microcontrollers, which are available for much less than 1€, the difference is significant if large deployments are needed. The comparison is not straightforward, though, since a single server can process the data from a large array of sensor-gathering nodes.
Microcontrollers have also benefited, and will continue to benefit, from traditional analogue and electromechanical control circuits being replaced with software-defined alternatives running on microcontrollers. As more microcontrollers are produced for devices installed on the edge, this will continue to bring prices down and increase flexibility.
• An embedded 32-bit chip typically has little RAM available, a few hundred kilobytes, which means that models have to be kept small.
• Dynamic memory allocation is often avoided, since it is not needed, and avoiding it increases reliability and makes the implementation more deterministic.
• Weights are often quantized to 8-bit integers after training, before being loaded onto the device, since floating-point arithmetic hardware is not guaranteed on microcontrollers. Going from 32-bit floating point to 8-bit integers inevitably loses precision; however, training, which requires the largest dynamic range, is still done in 32-bit floating point, so no precision is lost there.
• Not all data types are supported, for example double-precision floating point.
Because of this, TensorFlow Lite can fit into a few hundred kilobytes, which makes it suitable for size-constrained applications. TensorFlow Lite also has good support for 8-bit quantization of networks. Since an 8-bit value takes up a quarter of the space of a 32-bit value, a 75% saving in space can be achieved, assuming that 8-bit values can be densely packed.
6.4.3 TensorFlow Lite Micro
While TensorFlow Lite with its constraints is compact enough for mobile devices, microcontrollers have even tighter constraints, and that is what TensorFlow Lite Micro was created for. When the Google team started creating TensorFlow Lite Micro, they knew that running it on microcontrollers would impose a number of constraints:
• Requires C++11
TensorFlow Lite Micro was written in C++11 to stay consistent with TensorFlow Lite and to avoid having to rewrite it from scratch. The team thus traded support for older devices for the ability to share code with TensorFlow Lite.
When the team developing TensorFlow Lite Micro was deciding how to
implement the model on microcontrollers they compared the advantages and
disadvantages of an interpreted model and code generation.
Interpreted model:
With an interpreted model, the model is loaded into data structures that define it; these are kept separate from the executed code, which is static.
Code generation:
With code generation, the model is turned into C or C++ code, with the parameters stored as data arrays and the architecture expressed as a series of function calls. The generated code often consists of a single large file with a few entry points, which can be included with the rest of the code and then compiled.
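The difference between the two approaches can be illustrated with a toy example. This is only a Python analogy, not how TensorFlow Lite Micro is implemented: the hypothetical MODEL list plays the role of model data read by a static interpreter, while generated_model stands in for code emitted by a code generator, with the parameters written inline.

```python
def fully_connected(x, weights, bias):
    """Dense layer kernel: one output per row of weights."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]


def relu(x):
    return [max(0.0, v) for v in x]


# Interpreted model: the network is data, walked by a fixed interpreter loop.
MODEL = [
    ("fully_connected", {"weights": [[0.5, -1.0], [2.0, 0.25]], "bias": [0.1, -0.2]}),
    ("relu", {}),
]
KERNELS = {"fully_connected": fully_connected, "relu": relu}


def interpret(model, x):
    """Static code; what runs is decided by the model data at run time."""
    for op_name, params in model:
        x = KERNELS[op_name](x, **params)
    return x


# Code generation: the same network emitted as straight-line code with inline constants.
def generated_model(x):
    x = fully_connected(x, [[0.5, -1.0], [2.0, 0.25]], [0.1, -0.2])
    return relu(x)


print(interpret(MODEL, [1.0, 2.0]), generated_model([1.0, 2.0]))
```

Both produce the same result; the interpreted form can swap in a new model without recompiling the interpreter, while the generated form only contains what this particular model needs.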
Here are some key advantages of code generation:
• Ease of build
Since the model is defined directly in code without dependencies, it is easy to build: it can simply be copied into the code base and compiled together with the rest of the needed code.
• Modifiability
Since all the code is in a single file and without dependencies it is easy
to modify it without needing to know and find what parts of libraries
are included.
• Inline data
Since the model is implemented in source code no additional files are
needed and therefore no loading or parsing is needed.
• Code size
If the platform and model are known in advance, only the code that is actually needed has to be included, keeping the size down.
• Upgradability (a disadvantage)
If you have locally modified the generated code and then want to upgrade to a newer version of the framework, it might take a significant amount of work to merge your changes with the updated framework.
However, the team realized that many of the advantages of code genera-
tion can be had by using project generation instead.
Project generation:
In TensorFlow Lite, project generation creates copies of only the files needed to build a model and optionally sets up IDE-specific project files so that the project can be built easily. Project generation retains most of the advantages of code generation and adds some of its own:
• Upgradability: All source files are copies of the originals and are kept in the same place in the folder hierarchy. This means that if local changes are made, upstream upgrades can be merged using standard merge tools.
• Inline data: Model parameters can still be compiled into the program if needed, so no unpacking or parsing is needed. This is done using the FlatBuffer serialization format.
• External dependencies: All the required header and source files are copied into the project, so no dependencies need to be downloaded and installed separately.
The largest advantage of code generation that project generation does not provide automatically is small code size, since the interpreter structure makes it hard to know which code paths will never be called. In TensorFlow Lite this can be resolved by using the OpResolver mechanism to register only the kernel implementations that the application is expected to use.
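The idea behind the OpResolver mechanism can be mimicked with a small registry that only knows the kernels the application explicitly adds. In TensorFlow Lite Micro itself this is done in C++, for example with tflite::MicroMutableOpResolver, and the code-size saving comes from the linker leaving out kernels that are never registered; the Python class and method names below are hypothetical and only illustrate the registration idea.

```python
def relu(x):
    return [max(0.0, v) for v in x]


def scale(x, factor):
    return [factor * v for v in x]


class OpResolver:
    """Registry that only knows the kernels the application explicitly adds."""

    def __init__(self):
        self._kernels = {}

    def add(self, name, kernel):
        self._kernels[name] = kernel

    def find(self, name):
        if name not in self._kernels:
            raise LookupError(f"operator '{name}' is not registered")
        return self._kernels[name]


def run(model, resolver, x):
    # Resolve every operator first, so a missing kernel fails before inference.
    ops = [(resolver.find(name), params) for name, params in model]
    for kernel, params in ops:
        x = kernel(x, **params)
    return x


resolver = OpResolver()
resolver.add("scale", scale)
resolver.add("relu", relu)   # only the two kernels this model uses
model = [("scale", {"factor": 2.0}), ("relu", {})]
print(run(model, resolver, [-1.0, 0.5]))  # [0.0, 1.0]
```

A model that uses an operator the application did not register fails early with a clear error instead of silently pulling in every kernel.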
6.5 Quantization
Models are trained with floating-point numbers, since the values fluctuate vastly during training. Because microcontroller hardware is better suited to integer calculations, such a model needs to be quantized from floating-point numbers to integers before it is deployed to a microcontroller.
A reduction from 32-bit floating point to 8-bit integers also gives a 75% reduction in the storage needed for the completed model, typically without a noticeable impact on inference accuracy.
Another benefit of using 8-bit integers is that many signal processing algorithms also use 8-bit integer multiply-and-accumulate instructions, which means that the same hardware can be utilized for TinyML.
Running a fully quantized model is also more efficient, which gives lower latency on almost all devices.
As quantization is an active research field, there are many opinions on how it should be done. For weights it is fairly easy, since the range of each layer is known after the training process. It is trickier for activations, however, since the range of their output is not known in advance. If too small a range is used, values will be clipped at the maximum and/or minimum, and if too large a range is used, accuracy will suffer.
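The range trade-off can be demonstrated with affine (scale and zero-point) int8 quantization, one common scheme. This is a minimal sketch and not necessarily the exact arithmetic TensorFlow Lite uses; the ranges and values are hypothetical, and a range with rmax greater than rmin is assumed. The last lines also check the 75% storage saving of int8 over float32.

```python
from array import array


def quant_params(rmin, rmax, qmin=-128, qmax=127):
    """Affine int8 parameters such that real_value ~= scale * (q - zero_point).

    Assumes rmax > rmin so that the scale is non-zero.
    """
    rmin, rmax = min(rmin, 0.0), max(rmax, 0.0)  # keep 0.0 exactly representable
    scale = (rmax - rmin) / (qmax - qmin)
    zero_point = round(qmin - rmin / scale)
    return scale, zero_point


def quantize(x, scale, zero_point, qmin=-128, qmax=127):
    q = round(x / scale) + zero_point
    return max(qmin, min(qmax, q))  # anything outside the range is clipped


def dequantize(q, scale, zero_point):
    return scale * (q - zero_point)


def roundtrip_error(x, rmin, rmax):
    s, zp = quant_params(rmin, rmax)
    return abs(dequantize(quantize(x, s, zp), s, zp) - x)


print("too small:", roundtrip_error(2.5, -1.0, 1.0))     # 2.5 is clipped to about 1.0
print("too large:", roundtrip_error(0.1, -100.0, 100.0))  # 0.1 is rounded to 0.0
print("suitable: ", roundtrip_error(0.1, -3.0, 3.0))

# Storage: 1000 float32 weights versus 1000 int8 weights.
n = 1000
float_bytes = array("f", [0.0] * n).itemsize * n
int8_bytes = array("b", [0] * n).itemsize * n
print("saving:", 1 - int8_bytes / float_bytes)  # 0.75
```

With 256 levels available, a wider range means coarser steps between representable values, which is why a range that is just wide enough is preferred.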
When using TensorFlow and TensorFlow Lite, the quantization is done at the same time as the model is converted from a TensorFlow training environment to a TensorFlow Lite graph. Two types of quantization that can be done when converting to a TensorFlow Lite graph are:
• Post-training integer quantization is used to create a model that only contains integers. This means that no floating-point hardware is needed, which is desirable since floating-point hardware can be rare on microcontrollers. However, when doing this quantization, the expected ranges of the inputs need to be supplied in the form of representative example inputs that the model could expect to receive while deployed. Having the right range results in greater accuracy without clipping.
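The role of the representative inputs can be sketched as simple min/max calibration: run example inputs through a layer, record the smallest and largest activations, and check how much of the data seen after deployment would fall outside that range. In TensorFlow Lite this is configured through the converter's representative_dataset setting; the plain-Python sketch below, including the hypothetical ReLU layer and the input ranges, only illustrates the principle and is not a description of the converter's internals.

```python
def activation(x, w=1.5, b=-0.25):
    """Hypothetical layer output ReLU(w * x + b) whose range must be calibrated."""
    return max(0.0, w * x + b)


def calibrate_range(representative_inputs):
    """Min/max calibration: record the activation range seen on example inputs."""
    outputs = [activation(x) for x in representative_inputs]
    return min(outputs), max(outputs)


def clipped_fraction(inputs, rmin, rmax):
    """Share of activations that fall outside the calibrated range."""
    outputs = [activation(x) for x in inputs]
    return sum(1 for v in outputs if v < rmin or v > rmax) / len(outputs)


deployment = [i / 100 for i in range(-200, 201)]          # inputs seen in the field
good = calibrate_range([i / 10 for i in range(-20, 21)])  # covers the same range
narrow = calibrate_range([i / 10 for i in range(0, 6)])   # only small inputs
print("good:", good, clipped_fraction(deployment, *good))
print("narrow:", narrow, clipped_fraction(deployment, *narrow))
```

When the example inputs only cover part of the real operating range, a large share of the deployed activations would be clipped.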
6.6 FlatBuffers
To store the model efficiently, FlatBuffers are used. "FlatBuffers is an efficient cross-platform serialization library for C++, C#, C, Go, Java, Kotlin, JavaScript, Lobster, Lua, TypeScript, PHP, Python, Rust and Swift. It was originally created at Google for game development and other performance-critical applications."
FlatBuffers are well described by Warden et al. [8] and in the white paper by Google [9], and the main points borrowed from there are:
• With the help of schemas, the FlatBuffers compiler creates native code (C, C++, Python, Java, ...).
The motivation for using FlatBuffers is to avoid the need to unpack or parse data [10]. "A FlatBuffer is a binary buffer containing nested objects (structs, tables, vectors,..) organized using offsets so that the data can be traversed in place just like any pointer-based data structure. Unlike most in-memory data structures, however, it uses strict rules of alignment and endianness (always little) to ensure these buffers are cross-platform. Additionally, for objects that are tables, FlatBuffers provides forwards/backwards compatibility and general optionality of fields, to support most forms of format evolution." [9] FlatBuffers rely on a schema describing the object types, from which efficient code for accessing the data is generated.
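The idea of traversing data in place by following offsets can be illustrated with Python's struct module. The layout below is a deliberately simplified, hypothetical one and NOT the real FlatBuffers wire format, which uses vtables and relative offsets; it only shows how a single little-endian value can be read directly from the buffer without unpacking the rest.

```python
import struct


def build_buffer(weights):
    """Serialize a hypothetical layout: [u32 table offset][4 pad bytes][u32 count][f32 * count]."""
    table_offset = 8
    header = struct.pack("<I", table_offset) + bytes(4)
    table = struct.pack("<I", len(weights)) + struct.pack(f"<{len(weights)}f", *weights)
    return header + table


def read_weight(buf, index):
    """Follow the offsets and read one value in place; nothing else is decoded."""
    view = memoryview(buf)                      # no copy of the underlying bytes
    (table_offset,) = struct.unpack_from("<I", view, 0)
    (count,) = struct.unpack_from("<I", view, table_offset)
    if not 0 <= index < count:
        raise IndexError(index)
    (value,) = struct.unpack_from("<f", view, table_offset + 4 + 4 * index)
    return value


buf = build_buffer([0.5, -1.25, 3.0])
print(len(buf), read_weight(buf, 1))  # 24 -1.25
```

The "<" prefix fixes the byte order to little-endian regardless of the host, mirroring the cross-platform rule quoted above.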
6.7 Adapting models for microcontrollers
To be able to run on microcontrollers, the models need to be adapted. This is done with a converter that takes a trained model from Python and creates a TensorFlow Lite file. However, there are some things to consider:
• All values that need to be variables during the training process, such
as weights, need to be turned into constants.
• Since the models are trained on desktops or servers, they can easily become dependent on features of the desktop environment that are not supported on microcontrollers, such as snippets of Python code or advanced operations. This needs to be resolved before deploying onto microcontrollers.
• FlatBuffers are used so that the model data can be loaded into memory without the need to unpack or parse it. A FlatBuffer has exactly the same layout in memory as in its serialized form, which means that the model can be accessed directly from flash memory without needing to be copied or parsed into RAM.
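Embedding the converted model in the firmware is commonly done by turning the .tflite file into a C/C++ byte array, for example with the xxd -i command-line tool. The Python function below is a minimal equivalent sketch; the array name g_model, the 16-byte alignment and the line width are illustrative assumptions. Declaring the array const lets many microcontroller toolchains place it in flash, from where the FlatBuffer can be read directly.

```python
def to_c_array(data: bytes, name: str = "g_model", per_line: int = 12) -> str:
    """Render raw model bytes as C++ source defining a constant byte array."""
    lines = [f"alignas(16) const unsigned char {name}[] = {{"]
    for i in range(0, len(data), per_line):
        chunk = ", ".join(f"0x{b:02x}" for b in data[i:i + per_line])
        lines.append(f"  {chunk},")
    lines.append("};")
    lines.append(f"const unsigned int {name}_len = {len(data)};")
    return "\n".join(lines)


# Usage with a converted model file (the path is hypothetical):
#   with open("model.tflite", "rb") as f:
#       print(to_c_array(f.read()))
print(to_c_array(b"\x00\x01\xfe\xff"))
```

The generated source file is then compiled together with the application, so no file system or parsing step is needed on the device.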
much computation, creating it does. So, in the same way as with conventional ML, the computationally intensive training is done on a workstation or server, which can also use larger training data sets since it can have vast amounts of RAM and non-volatile storage.
A decision also needs to be made on whether a specific set of hardware is to be used for all devices or whether generalized hardware is to be used. In other words, will the system only use a defined type of microcontroller and sensors, or will it be able to use differing hardware? In the latter case, some normalization needs to be added to both:
• The magnitude of the data collected. For example, two different accelerometers might represent the same physical acceleration with different digital magnitudes and data types.
server. This will increase recovery time after a power loss and might not even be possible if network connectivity is only intermittent.
– Writing the model to non-volatile memory. For this, the device
needs to have non-volatile memory that is able to be written to
during runtime.
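The magnitude normalization mentioned above, for systems that mix different accelerometers, can be sketched as converting each sensor's raw counts into a common physical unit. The two sensor descriptions are hypothetical; their sensitivities correspond to a 16-bit and a 12-bit output configured for a ±2 g full scale, and a real deployment would take these values from the datasheets.

```python
STANDARD_GRAVITY = 9.80665  # m/s^2 per g

# Hypothetical sensor descriptions: sensitivity in raw counts (LSB) per g.
SENSORS = {
    "sensor_a": {"lsb_per_g": 16384.0},  # e.g. 16-bit output, +/-2 g full scale
    "sensor_b": {"lsb_per_g": 1024.0},   # e.g. 12-bit output, +/-2 g full scale
}


def to_m_per_s2(raw_counts, sensor):
    """Convert raw integer readings from a given sensor into m/s^2."""
    lsb_per_g = SENSORS[sensor]["lsb_per_g"]
    return [count / lsb_per_g * STANDARD_GRAVITY for count in raw_counts]


# The same physical acceleration (1 g) is reported very differently by the two sensors:
print(to_m_per_s2([16384], "sensor_a"), to_m_per_s2([1024], "sensor_b"))
```

After this step the model sees the same numbers for the same physical motion, regardless of which sensor produced them.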
6.8.2 Deployed
Many of the hardware requirements of the devices used with TinyML come from where the system will be deployed, for how long, and what resources are available.
Looking at the constraints, there are two that oppose each other, power consumption versus computational power and address space:
Power consumption When it comes to the power used, obviously, the less that is used, the longer a device can be deployed with a set amount of energy stored or generated. As Warden [8] states, a power consumption of 1 mW or below can make many new applications possible. As stated earlier, this forces us to use microcontrollers. Since the computational and storage overhead of running even a light operating system could consume more power than is available, not running one is often a must. Putting the device into energy-saving sleep is also significantly more complicated when an OS is involved.
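What a 1 mW budget means in practice can be estimated with simple arithmetic: stored energy divided by average power. The sketch below assumes a coin cell of about 225 mAh at a nominal 3 V (typical CR2032 figures) and ignores self-discharge, voltage drop and converter losses, so the numbers are optimistic upper bounds. The duty-cycle values are hypothetical and show why sleeping most of the time is what brings the average down.

```python
def average_power_mw(active_mw, sleep_mw, duty_cycle):
    """Average power when active for a fraction of the time and asleep otherwise."""
    return active_mw * duty_cycle + sleep_mw * (1.0 - duty_cycle)


def runtime_hours(capacity_mah, voltage_v, avg_power_mw):
    """Idealized runtime: stored energy (mAh * V = mWh) divided by average power (mW)."""
    return capacity_mah * voltage_v / avg_power_mw


# Assumed coin cell: about 225 mAh at a nominal 3 V.
for power in (1.0, 10.0, 100.0):
    hours = runtime_hours(225, 3.0, power)
    print(f"{power:6.1f} mW -> {hours:7.1f} h ({hours / 24:5.1f} days)")

# Hypothetical duty cycling: 10 mW while active 10% of the time, 0.01 mW asleep.
print(average_power_mw(10.0, 0.01, 0.1))
```

Under these assumptions, 1 mW gives roughly four weeks on a single coin cell, while 100 mW lasts only a few hours.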
• TensorFlow Lite Micro code size: To run the model, the TensorFlow Lite Micro code needs to be included so that the neural network and the operators implemented in the model can be used. TensorFlow Lite Micro is designed to work with as little as 20 KB of flash and 4 KB of SRAM in some applications.
• Model data size: The size of the model is very application dependent, and the model needs to be large enough to be able to generalize the phenomena at hand.
Figure 6: Output from Arduino IDE after compilation of the magic wand
example from the TinyML GitHub[11].
of TinyML on accelerometer data, gestures that are made. The three gestures that are recognized are the "wing", "ring" and "slope".
• Accelerometer handler:
Reads the values of the accelerometer in a way applicable to the hardware in use and writes them to the model's input tensor. In the case of the Arduino Nano 33 BLE Sense, the data is also down-sampled from 119 Hz to 25 Hz.
• TFLite interpreter:
Runs the TensorFlow Lite model. This is the interesting part and will
be covered next.
• Model:
Contains the data, learned during the training phase, about the gestures to be recognized.
• Gesture predictor:
Takes the output of the model and decides whether a gesture has been made based on the probability and the number of consecutive positive predictions.
• Output handler:
When a gesture has been recognized, reports which gesture it was via the LED and the serial port, in a way applicable to the hardware in use.
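Two of the components above, the down-sampling in the accelerometer handler and the gesture predictor, can be sketched in a few lines. This is not the code from the magic wand example: linear interpolation is assumed here as one possible way to go from 119 Hz to 25 Hz (without an anti-aliasing filter), and the probability threshold and required number of consecutive predictions are hypothetical values.

```python
def downsample(samples, in_hz=119.0, out_hz=25.0):
    """Resample a signal by linear interpolation at the output sample times."""
    duration = (len(samples) - 1) / in_hz
    n_out = int(duration * out_hz) + 1
    out = []
    for k in range(n_out):
        pos = k * in_hz / out_hz                 # fractional index into the input
        i = min(int(pos), len(samples) - 1)
        frac = pos - i
        nxt = samples[min(i + 1, len(samples) - 1)]
        out.append(samples[i] * (1.0 - frac) + nxt * frac)
    return out


class GesturePredictor:
    """Report a gesture only after it wins confidently several times in a row."""

    def __init__(self, labels, threshold=0.8, required_consecutive=3):
        self.labels = labels
        self.threshold = threshold
        self.required = required_consecutive
        self.last, self.count = None, 0

    def update(self, probabilities):
        best = max(range(len(probabilities)), key=lambda j: probabilities[j])
        if probabilities[best] < self.threshold:
            self.last, self.count = None, 0      # low confidence breaks the streak
            return None
        self.count = self.count + 1 if best == self.last else 1
        self.last = best
        if self.count >= self.required:
            self.count = 0                       # report each streak only once
            return self.labels[best]
        return None


ramp = [float(i) for i in range(120)]            # about one second at 119 Hz
print(len(downsample(ramp)))                     # 26

predictor = GesturePredictor(["wing", "ring", "slope"])
for _ in range(3):
    print(predictor.update([0.9, 0.05, 0.05]))   # None, None, then wing
```

Requiring several confident predictions in a row trades a little latency for fewer false detections from single noisy outputs.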
7.1.1 Performance
When executing the shapes, I tried to mimic the way Pete Warden performed them in his presentations on YouTube. As a result, I can consistently perform detectable wing shapes; however, the detection of the ring and slope shapes is poor. As Warden states, these should be harder to perform than the wing, but I am not sure to what degree.
The execution of the shapes is checked by outputting the accelerometer
data after sub-sampling and axis normalization.
Wing Shape As can be seen in the plot (Figure 10), the accelerometer data used for prediction is somewhat noisy, so detecting a shape is challenging, especially considering that the model needs to be kept very small. On the Z axis, however, a somewhat clear pattern from the motions of the wing shape can be seen: peaks from the direction changes, but with some noise. The X and Y axes are noisier, and it is much harder to see any pattern there other than a slight one from the rotation of the device during the execution. What is also interesting is that the model did not detect the wing shape unless the shape was executed somewhat violently, to the point where the accelerometer clips. Considering all this, the model does well for the wing shape.
Figure 10: Accelerometer data plot of a successful attempt to detect a wing shape. When looking down at the Arduino NANO 33 BLE Sense with the USB port facing us, the axes are: X = Red, Y = Green and Z = Blue.
Ring Shape As can be seen in the plot (Figure 12), the attempts look similar to the wing movement, but on closer analysis there is a difference. In the ring, a roughly constant acceleration towards the centre of the circle should be seen, while the wing execution should have sudden direction-change peaks. However, the model is mostly unable to detect the ring shape being executed.
Figure 11: Illustration of execution of Ring shape [8]
Slope Shape As can be seen in the plot (Figure 14), when executing the Slope shape there is some structure to it, with the three phases of acceleration (start, direction change and stop), but the model seems to struggle to detect the execution.
Figure 14: Accelerometer data plot of a few unsuccessful attempts to detect a Slope shape. When looking down at the Arduino NANO 33 BLE Sense with the USB port facing us, the axes are: X = Red, Y = Green and Z = Blue.
7.1.2 Epilogue
This example has since been removed[12] from the examples in the repository,
but it can still be found in the repository history.
same assistants are also implemented on mobile phones, which are somewhat battery constrained (though less so than battery-powered IoT devices) and benefit from the lighter implementations.
However, once the wake-up phrase has been detected, the following speech,
containing the actual request, is sent to the cloud to be parsed and acted
upon, since it is much more complex to parse.
Both of these systems are, however, largely proprietary, so from the outside
it is hard to know exactly how they work.
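Since the real assistants are proprietary, the split between on-device wake-word detection and cloud processing can only be sketched hypothetically. In the sketch below, `detect_wake_word` stands in for the small on-device model, audio frames are opaque byte chunks, and the request is assumed to be a fixed number of frames; all of these are assumptions, and a real system would rather detect the end of speech.

```python
from typing import Callable, Iterable, List


def gate_to_cloud(frames: Iterable[bytes],
                  detect_wake_word: Callable[[bytes], bool],
                  request_len: int) -> List[List[bytes]]:
    """Return the groups of audio frames that would be sent to the cloud.

    Every frame is first checked by the cheap on-device detector. After a
    detection, the next `request_len` frames (the spoken request) are
    collected and forwarded; the detector is not run on them. A request
    cut off by the end of the stream is dropped.
    """
    requests: List[List[bytes]] = []
    current = None  # frames of the request being collected, if any
    for frame in frames:
        if current is not None:
            current.append(frame)
            if len(current) == request_len:
                requests.append(current)  # hand over to the cloud service
                current = None
        elif detect_wake_word(frame):
            current = []
    return requests


# Usage with a toy detector that "hears" the wake word in frame b"wake".
stream = [b"noise", b"wake", b"turn", b"on", b"noise", b"noise"]
sent = gate_to_cloud(stream, lambda frame: frame == b"wake", request_len=2)
```

Only the frames following the wake word leave the device, which is where the bandwidth and privacy benefits of running the lightweight detector locally come from.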
Figure 15: Inside the enclosure.
The SUs were connected to the two PUs in the two enclosures in the following
order:
– One SU from the first or third engine.
– One SU from the second or fourth engine.
7.3.2 Analyses
For the analysis, an arbitrary data period is chosen, since the data should be
somewhat cyclic because the ferry generally operates on the same route. The
acceleration dataset used consists of 20 million samples of four variables:
• X: 8689
• Y: 3303
• Z: 0
In Figures 17, 18 and 19, the acceleration is plotted against time to show
the general behaviour of the data. We can see that there is some noise in the
data, more so in the x- and z-axes than in the y-axis. Interestingly, the
noise for the x-axis reaches 2500 for the most part but slopes off at the
ends, whereas the noise for the other axes drops to zero at arbitrary points
throughout. From this, we can also see that the z-axis corresponds to up and
down in the real world, since it has a constant DC bias reflecting the
Earth's gravity.
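The reasoning above, identifying the vertical axis from its constant offset, can be sketched by splitting each axis into its mean (DC) part and the remaining vibration (AC) part. This is a minimal illustration on synthetic data; it assumes the vibration itself is zero-mean, which may not hold for rectified or magnitude-like recordings, and the offset of 1000 raw units and the helper names are hypothetical.

```python
import numpy as np


def split_dc(samples):
    """Split (N, 3) accelerometer samples into the per-axis DC component
    (the mean) and the remaining AC component (vibration)."""
    samples = np.asarray(samples, dtype=float)
    dc = samples.mean(axis=0)
    return dc, samples - dc


def vertical_axis(samples):
    """Index (0 = x, 1 = y, 2 = z) of the axis with the largest absolute DC
    component, assumed to be the one aligned with gravity."""
    dc, _ = split_dc(samples)
    return int(np.argmax(np.abs(dc)))


# Assumption: synthetic zero-mean vibration on all axes plus a constant
# offset of 1000 raw units on z standing in for gravity.
rng = np.random.default_rng(0)
samples = rng.normal(0.0, 50.0, size=(10_000, 3))
samples[:, 2] += 1000.0
dc, ac = split_dc(samples)
axis = vertical_axis(samples)
```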
Figure 17: Raw plot of acceleration from the accelerometer x-axis; plot
x-axis = date and y-axis = acceleration
Figure 18: Raw plot of acceleration from the accelerometer y-axis; plot
x-axis = date and y-axis = acceleration
Figure 19: Raw plot of acceleration from the accelerometer z-axis; plot
x-axis = date and y-axis = acceleration
Since the y-axis data is the cleanest, having the fewest outliers, we will use
it, with a threshold of 200 on the maximum value over a time window (the
minimum would work as well), as input data to train a simple model that
determines whether the engine is running. Obviously, this is not something
that requires machine learning to detect, since we can easily know whether or
not the machine is running, but it is an example of simple data that can be
analyzed with TinyML. It shows that a model can be trained on collected
accelerometer data. With data this simple, it is also hard to tell whether
the model is overfitted.
Figure 20: Plot of the windows in which the acceleration has been higher than
the threshold and the engine is assumed to be running. Window length of 1000
samples.
Because my computer had limited resources, I could not train more
sophisticated models: it lacked sufficient RAM, and its limited computing
performance made each iteration a very long process. I therefore settled for
linear regression, since it should be able to mimic what we intuitively did
by choosing a threshold.
Code used for data formatting, training and prediction:
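import glob
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
# Importing data: concatenate the per-file CSVs in file-name order
path = 'data/20200513_20200609/raw_csv/ME1_SU1_csv/acceleration_0*.csv'
df = pd.concat(map(pd.read_csv, sorted(glob.glob(path))), ignore_index=True)
blocksize = 1000  # Window length in samples (as in Figure 20)
threshold = 200   # Y-axis threshold for "engine running"
blocks = len(df.y) // blocksize
# One row per window; trailing samples that do not fill a window are dropped
yblocks = df.y.to_numpy()[:blocks * blocksize].reshape(blocks, blocksize)
ymax = yblocks.max(axis=1)
# Checking if max amplitude in the block is over the threshold
ymaxbin = ymax > threshold
# Assumption: the training lines are not shown in the original; here each
# window's raw samples are the features and ymaxbin is the target
model = LinearRegression().fit(yblocks, ymaxbin.astype(float))
predicted = model.predict(yblocks)
# Averaging the output over 10 samples to get rid of erroneous data samples
n = 10
avgResult = np.average(predicted[:len(predicted) // n * n].reshape(-1, n), axis=1)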
Below is a plot of the predicted output. Although the general shape of the
desired output is there, the output varies significantly even though the
true state should remain stable for long periods.
Plot of the predicted output (y-axis from 0.0 to 1.0).
Even though the problem seems simple, this example, like the TinyML Wand
example, shows that analysing sequential data can be challenging: a simple
linear regression is often not enough, and more sophisticated algorithms are
needed. It also shows that training the model still requires a significant
amount of computing performance, even though it may be possible to run
inference on minimal hardware.
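The instability described above can be quantified by counting how often the thresholded output switches state, before and after block averaging like the averaging step in the code above. The sketch below uses synthetic data; the noise level, the 0.5 decision threshold and the helper names are assumptions for illustration, not values from the ferry dataset.

```python
import numpy as np


def count_flips(states):
    """Number of on/off transitions in a sequence of binary states."""
    states = np.asarray(states, dtype=bool)
    return int(np.count_nonzero(states[1:] != states[:-1]))


def block_average(predicted, n=10):
    """Average consecutive groups of n predictions; a trailing group with
    fewer than n values is dropped."""
    predicted = np.asarray(predicted, dtype=float)
    usable = len(predicted) // n * n
    return predicted[:usable].reshape(-1, n).mean(axis=1)


# Assumption: the engine is off for 200 windows and then on for 300, and
# the regression output is the true state plus Gaussian noise.
rng = np.random.default_rng(0)
truth = np.r_[np.zeros(200), np.ones(300)]
predicted = truth + rng.normal(0.0, 0.4, truth.size)

raw_flips = count_flips(predicted > 0.5)
avg_flips = count_flips(block_average(predicted) > 0.5)
```

Block averaging trades time resolution for stability: each averaged value covers n windows, so a real change of state is reported up to n windows late.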
8 Conclusion
With the vast amounts of data collected today and in the future, there is
plenty that can be done with the data using machine learning, both on
conventional computers with operating systems, perhaps with accelerators, and
on the edge, in some cases on microcontrollers. As this thesis shows, there is
a vast and ever-growing range of ways to perform the analysis, driven by new
research and new implementations:
• In the future, we might also have models that are trained in real time
as data is gathered.
However, as this thesis also shows, it is important that the data is of good
quality, both:
• That the right kind of data is gathered. This, of course, requires planning
ahead with regard to the possible end uses of the gathered data. For
example, if the problem to be solved changes after the data has been
gathered, it might be challenging to use the data.
For these reasons, among others, good-quality data is quite valuable, so
gathering as much and as diverse data as possible can often be of great
value, as long as it is done ethically. Since data needs to be gathered from
the real world, the process cannot be as agile as other parts of computer
engineering.
As this thesis also shows, TinyML and edge computing might not always be the
right choice for applying machine learning in a system. If the processing
needs a holistic view of the data gathered in the system, a more centralized
approach is better. Similarly, if only small amounts of data are generated
and sufficient uplinks are in place, a more centralized approach might be
preferred for its simplicity.
Regarding the analysis, I feel that I did not manage to obtain any truly
meaningful output. However, I think that the task of maintenance
predictability would be better served by a more diverse set of data from the
engines, not just acceleration data. Labelled data of when an engine is not
running optimally or is about to break would also be very valuable.
To conclude, when planning data analysis, it is important to identify the
root requirements and, based on those, select the right kind of processing
and where it is to be done.
9 Summary in Swedish
Title: Applicability of TinyML for predictive maintenance.
9.1 Introduction
In today's world we have a growing number of devices, many of which collect
data in some form. This data is then often used to analyse the system or to
produce some kind of response, either regulation or control. As data volumes
grow explosively, problems can arise in how the data is handled, stored and
processed. To help us handle all the generated data, we can design our
systems in different ways depending on their requirements. This thesis
examines whether it is possible to process the generated data using edge
computing on microcontrollers, specifically machine learning in the form of
TinyML.
9.2 Machine learning
Machine learning is a much-discussed field today because we have vast
amounts of data collected from various kinds of systems. When processing this
data, we cannot always intuitively know how different variables in it relate
to each other, but with machine learning we can use algorithms to process the
data to discover correlations or to build models that describe the behaviour
of the system or systems in which the data was collected. These correlations
and models can then be used to analyse, for example, the performance or
health of the system or of nodes in the system, or to predict the behaviour
of the system or of its nodes.
• In some systems, two or more variables can be extremely correlated, and if
the variables in one or more samples do not follow the correlation, we can
suspect that an anomaly has occurred.
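The correlation-based idea in the bullet above can be sketched by fitting a straight line between two variables and flagging samples whose residual is unusually large. This is a minimal illustration that assumes a linear relationship and a cut-off of k = 3 residual standard deviations; both are assumptions, the helper name is hypothetical, and a plain least-squares fit is itself distorted by any anomalies in the data it is fitted to.

```python
import numpy as np


def correlation_anomalies(x, y, k=3.0):
    """Flag samples that break a linear relationship between x and y.

    Fits y ~ a*x + b by least squares and marks the samples whose
    residual is more than k residual standard deviations from zero.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    a, b = np.polyfit(x, y, 1)
    residual = y - (a * x + b)
    return np.abs(residual) > k * residual.std()


# Assumption: two strongly correlated synthetic signals with one
# injected sample that does not follow the correlation.
rng = np.random.default_rng(1)
x = np.linspace(0.0, 10.0, 200)
y = 2.0 * x + 1.0 + rng.normal(0.0, 0.1, x.size)
y[50] += 5.0
flags = correlation_anomalies(x, y)
```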
9.4 Edge computing
In edge computing, computations are performed at the so-called edge of the
system, that is, on the devices where the data is generated, or physically
relatively close to them. This can give us the following advantages:
• It reduces the amount of raw data that needs to be sent for processing.
• Response time decreases if the nodes need to act on the locally
collected data.
• Energy use may also be constrained by where the node has to be located.
This means that we do not always have an “unlimited” supply of power from
the grid; instead, the nodes may be battery-powered, sometimes with solar
cells and sometimes without. This in turn means that the nodes must be
very energy-efficient.
But there are also advantages to processing data at the edge on
microcontrollers:
• The need to send large amounts of data to a central server is drastically
reduced.
9.6 Analysis
The analysed data was collected from a car ferry travelling in the Bothnian
Bay between Vasa in Finland and Umeå in Sweden. Accelerometer sensors were
placed in the ship's engine room and on the engines and their mounts, and the
data from these sensors was analysed. An arbitrarily chosen time interval was
used for the analysis. The analysis was done with linear regression to
examine how well the algorithm handles classification of the data.
Figure 22: Graph of the training data in which the engine has been assumed to
be running. The data is divided into windows of 1000 data points.
The analysis shows that such a simple algorithm is not very good at
classifying sequential data. The shape of the desired result is clearly
visible in the prediction, but during the periods when the engine is
constantly running, the prediction very frequently jumps to the engine being
off.
9.7 Summary
As with most problems, there is not always one solution that fits every
problem perfectly. TinyML is thus one more tool in the toolbox for solving
problems where machine learning may be the solution, but one still has to
consider what the goal of the solution is. TinyML is a good tool when we want
to reduce network requirements and latency and make maximum use of the
microcontrollers we already have in use. At the same time, we also see that
TinyML does not remove the need for high-quality data in order to make the
most of it.
References
[1] Bruno Stecanella. Support Vector Machines (SVM) Algorithm Explained.
https://monkeylearn.com/blog/introduction-to-support-vector-machines-svm/.
Accessed 2022-05.
[2] Rebecca Bevans. Simple Linear Regression | An Easy Introduction &
Examples. https://www.scribbr.com/statistics/simple-linear-regression/.
Accessed 2022-05.
[3] Pranoy Radhakrishnan. “What are Hyperparameters? and How to tune the
Hyperparameters in a Deep Neural Network?” In: (2017). url:
https://towardsdatascience.com/what-are-hyperparameters-and-how-to-tune-the-hyperparameters-in-a-deep-neural-network-d0604917584a.
[4] What is anomaly detection?
https://developer.ibm.com/learningpaths/get-started-anomaly-detection-api/what-is-anomaly-detection/.
Accessed 2022-04.
[5] Sahil Garg. Algorithm selection for Anomaly Detection.
https://medium.com/analytics-vidhya/algorithm-selection-for-anomaly-detection-ef193fd0d6d1.
Accessed 2022-04.
[6] What is edge computing? https://www.ibm.com/cloud/what-is-edge-computing.
Accessed 2021-06.
[7] What is Edge Computing?
https://www.intel.com/content/www/us/en/edge-computing/what-is-edge-computing.html.
Accessed 2021-06.
[8] Pete Warden and Daniel Situnayake. TinyML: Machine learning with
TensorFlow Lite on Arduino and ultra-low power microcontrollers.
O’Reilly Media Inc., 2020.
[9] FlatBuffers white paper.
https://google.github.io/flatbuffers/flatbuffers_white_paper.html.
Accessed 2021-10.
[10] FlatBuffers. https://google.github.io/flatbuffers/. Accessed 2021-10.
[11] https://github.com/tensorflow/tflite-micro/tree/main/tensorflow/lite/micro/examples/magic_wand.
[12] https://github.com/tensorflow/tflite-micro/commit/bef8fe8bc6183cc4e1ce852579
[13] Andrei-Raoul Morariu, Wictor Lund, Andreas Lundell et al. “Edge-based
Vibration Monitoring of Marine Vessel Engines”. English. In: 12th Symposium
on High-Performance Marine Vehicles. Ed. by Bertram Volker. Symposium on
High-Performance Marine Vehicles: HIPER; Conference date: 12-10-2020 Through
14-10-2020. Germany: Technische Universität Hamburg-Harburg, Oct. 2020,
pp. 239–250.